Patchwork [5/8] virtio-serial-bus: Add support for buffering guest output, throttling guests

Submitter Amit Shah
Date Jan. 7, 2010, 7:31 a.m.
Message ID <1262849506-27132-6-git-send-email-amit.shah@redhat.com>
Permalink /patch/42402/
State New

Comments

Amit Shah - Jan. 7, 2010, 7:31 a.m.
Guests send us one buffer at a time, and current guests send buffers
of 4K bytes. If a guest userspace application writes more than 4K bytes
in one write() syscall, the request is actually sent out as multiple
buffers, each up to 4K in size.

This usually isn't a problem, but for some apps, like VNC, the entire
data has to be sent in one go for copy/paste to work. So if an app on
the guest sends out the guest clipboard contents, they have to reach
the VNC server in one go, just as the guest app sent them.

For this to work, we need the guest to send us START and END markers
for each write request so that we can identify complete buffers and
send them off to ports.

This requires us to buffer all the data that comes in from the guest,
hold it until we have seen all the data belonging to one write request,
merge it into a single buffer, and then send it to the port the data
was destined for.
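A minimal sketch of that reassembly, assuming each guest chunk carries a flags word in which hypothetical CHUNK_START and CHUNK_END bits mark the first and last chunk of one guest write() (the patch's real header layout and flag values may differ). The names struct reasm and reasm_feed() are illustrative only.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical flag bits; the patch defines its own header layout. */
#define CHUNK_START 0x1u
#define CHUNK_END   0x2u

struct reasm {
    unsigned char *buf;   /* merged data of the current write request */
    size_t len, cap;
    int in_progress;      /* saw START, still waiting for END */
    int complete;         /* buf holds one finished write request */
};

/* Feed one chunk from the guest. Returns 1 when a complete guest write()
 * is available in r->buf / r->len (valid until the next call), 0 if more
 * chunks are needed, and -1 on a protocol error or allocation failure. */
int reasm_feed(struct reasm *r, unsigned flags, const void *data, size_t len)
{
    assert(r != NULL);
    if (r->complete) {            /* previous message already handed out */
        r->len = 0;
        r->complete = 0;
    }
    if (flags & CHUNK_START) {    /* a new write request begins */
        r->len = 0;               /* drop any unterminated partial data */
        r->in_progress = 1;
    }
    if (!r->in_progress)
        return -1;                /* data with no preceding START */
    if (r->len + len > r->cap) {  /* grow the merge buffer geometrically */
        size_t ncap = r->cap ? r->cap : 64;
        while (ncap < r->len + len)
            ncap *= 2;
        unsigned char *nbuf = realloc(r->buf, ncap);
        if (nbuf == NULL)
            return -1;
        r->buf = nbuf;
        r->cap = ncap;
    }
    if (len > 0)
        memcpy(r->buf + r->len, data, len);
    r->len += len;
    if (flags & CHUNK_END) {      /* last chunk: the whole write is here */
        r->in_progress = 0;
        r->complete = 1;
        return 1;
    }
    return 0;
}
```

In this sketch a START that arrives while a message is still open silently discards the unterminated data; reporting an error instead would be an equally valid policy.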

We also add support for caching these buffers until a port indicates
it's ready to receive data.

We keep caching the data the guest sends us until a port accepts it.
However, this could lead to an OOM condition: a rogue process on the
guest could keep pumping in data while the host keeps caching it.

We introduce a per-port byte-limit property to alleviate this condition.
When this limit is reached, we send a control message to the guest
telling it not to send us any more data until further notice. When the
number of cached bytes drops below the specified limit, we tell the
guest to resume sending data.
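The throttling logic can be sketched as a byte counter with two transitions. The names struct port_cache, cache_add(), cache_consume() and the EV_* events are hypothetical stand-ins for the patch's own control messages. In this sketch, chunks that arrive after the stop message are still cached, so the limit bounds the cache only approximately.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical events; the patch sends its own control message. */
enum throttle_event { EV_NONE, EV_THROTTLE, EV_UNTHROTTLE };

struct port_cache {
    size_t cached;    /* bytes held on behalf of the port */
    size_t limit;     /* per-port byte limit */
    bool throttled;   /* guest has been told to stop sending */
};

/* Guest data arrived and was cached. */
enum throttle_event cache_add(struct port_cache *p, size_t n)
{
    p->cached += n;
    if (!p->throttled && p->cached >= p->limit) {
        p->throttled = true;
        return EV_THROTTLE;       /* tell the guest to stop */
    }
    return EV_NONE;
}

/* The port consumed n cached bytes. */
enum throttle_event cache_consume(struct port_cache *p, size_t n)
{
    assert(n <= p->cached);
    p->cached -= n;
    if (p->throttled && p->cached < p->limit) {
        p->throttled = false;
        return EV_UNTHROTTLE;     /* tell the guest to resume */
    }
    return EV_NONE;
}
```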

Signed-off-by: Amit Shah <amit.shah@redhat.com>
---
 hw/virtio-serial-bus.c |  309 +++++++++++++++++++++++++++++++++++++++++++++++-
 hw/virtio-serial.c     |   11 +-
 hw/virtio-serial.h     |   39 ++++++
 3 files changed, 352 insertions(+), 7 deletions(-)
Jamie Lokier - Jan. 8, 2010, 1:12 a.m.
Amit Shah wrote:
> Guests send us one buffer at a time. Current guests send buffers sized
> 4K bytes. If guest userspace applications sent out > 4K bytes in one
> write() syscall, the write request actually sends out multiple buffers,
> each of 4K in size.
> 
> This usually isn't a problem but for some apps, like VNC, the entire
> data has to be sent in one go to make copy/paste work fine. So if an app
> on the guest sends out guest clipboard contents, it has to be sent to
> the vnc server in one go as the guest app sent it.
> 
> For this to be done, we need the guest to send us START and END markers
> for each write request so that we can find out complete buffers and send
> them off to ports.

That looks very dubious.  TCP/IP doesn't maintain write boundaries;
neither do pipes, unix domain sockets, pseudo-terminals, and almost
every other modern byte-oriented transport.

So how does VNC transmit the clipboard over TCP/IP to a VNC client,
without those boundaries, and why is it different with virtserialport?

-- Jamie
Amit Shah - Jan. 8, 2010, 5:03 a.m.
On (Fri) Jan 08 2010 [01:12:31], Jamie Lokier wrote:
> Amit Shah wrote:
> > Guests send us one buffer at a time. Current guests send buffers sized
> > 4K bytes. If guest userspace applications sent out > 4K bytes in one
> > write() syscall, the write request actually sends out multiple buffers,
> > each of 4K in size.
> > 
> > This usually isn't a problem but for some apps, like VNC, the entire
> > data has to be sent in one go to make copy/paste work fine. So if an app
> > on the guest sends out guest clipboard contents, it has to be sent to
> > the vnc server in one go as the guest app sent it.
> > 
> > For this to be done, we need the guest to send us START and END markers
> > for each write request so that we can find out complete buffers and send
> > them off to ports.
> 
> That looks very dubious.  TCP/IP doesn't maintain write boundaries;
> neither do pipes, unix domain sockets, pseudo-terminals, and almost
> every other modern byte-oriented transport.
> 
> So how does VNC transmit the clipboard over TCP/IP to a VNC client,
> without those boundaries, and why is it different with virtserialport?

TCP does this in its stack: it waits for the number of bytes written to
be received and then notifies userspace of data availability.

Consider the case where the guest writes 10k of data. The guest gives
us those 10k in 3 chunks: the first containing 4k (minus the header
size), the second containing the next 4k (minus the header size), and
the third containing the remaining data.

I want to flush out this data only when I get all 10k.

		Amit
Jamie Lokier - Jan. 8, 2010, 1:35 p.m.
Amit Shah wrote:
> On (Fri) Jan 08 2010 [01:12:31], Jamie Lokier wrote:
> > Amit Shah wrote:
> > > Guests send us one buffer at a time. Current guests send buffers sized
> > > 4K bytes. If guest userspace applications sent out > 4K bytes in one
> > > write() syscall, the write request actually sends out multiple buffers,
> > > each of 4K in size.
> > > 
> > > This usually isn't a problem but for some apps, like VNC, the entire
> > > data has to be sent in one go to make copy/paste work fine. So if an app
> > > on the guest sends out guest clipboard contents, it has to be sent to
> > > the vnc server in one go as the guest app sent it.
> > > 
> > > For this to be done, we need the guest to send us START and END markers
> > > for each write request so that we can find out complete buffers and send
> > > them off to ports.
> > 
> > That looks very dubious.  TCP/IP doesn't maintain write boundaries;
> > neither do pipes, unix domain sockets, pseudo-terminals, and almost
> > every other modern byte-oriented transport.
> > 
> > So how does VNC transmit the clipboard over TCP/IP to a VNC client,
> > without those boundaries, and why is it different with virtserialport?
> 
> TCP does this in its stack: it waits for the number of bytes written to
> be received and then notifies userspace of data availability.
> 
> In this case, consider the case where the guest writes 10k of data. The
> guest gives us those 10k in 3 chunks: the first containing 4k (- header
> size), the 2nd containing the next 4k (- header size) and the 3rd chunk
> the remaining data.
> 
> I want to flush out this data only when I get all 10k.

No, TCP does not do that.  It does not maintain boundaries, or delay
delivery until a full write is transmitted.  Even if you use TCP_CORK
(Linux specific), that is just a performance hint.

If the sender writes 10k of data in a single write() over TCP, and it
is split into packets of size 4k/4k/2k (assume just over 4k MSS :-), the
receiver will be notified of availability any time after the *first*
packet is received, and the read() call may indeed return less than
10k.  In fact it can be split at any byte position, depending on other
activity.
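On the receiving side, the usual consequence is a loop that keeps calling read() until the expected number of bytes has arrived. A minimal POSIX sketch; read_full() is an illustrative helper, not an existing library function.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Read exactly len bytes. read() on a byte stream may return fewer bytes
 * than requested, split at any position, so loop until done.
 * Returns 0 on success, -1 on error or if EOF arrives early. */
int read_full(int fd, void *buf, size_t len)
{
    unsigned char *p = buf;
    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;         /* interrupted by a signal: retry */
            return -1;
        }
        if (n == 0)
            return -1;            /* peer closed before len bytes arrived */
        p += n;
        len -= (size_t)n;
    }
    return 0;
}
```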

Applications handle this by using their own framing protocol on top of
the TCP byte stream.  For example a simple header saying "expect N
bytes" followed by N bytes, or line delimiters or escape characters.
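A receiver for the "expect N bytes" style of framing can be written as an incremental decoder that accepts input split at any byte position. This sketch assumes a 4-byte big-endian length header and a hypothetical MAX_FRAME cap, which also lets the receiver reject an absurd length instead of trying to buffer it; struct frame_dec and frame_feed() are illustrative names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_FRAME 65536u          /* hypothetical cap on one message */

struct frame_dec {
    unsigned char hdr[4];
    size_t hdr_got;               /* header bytes collected so far */
    uint32_t want;                /* payload length from the header */
    size_t got;                   /* payload bytes collected so far */
    unsigned char payload[MAX_FRAME];
};

/* Feed bytes split at arbitrary points; *used gets the number consumed.
 * Returns 1 when a full frame is in d->payload[0 .. d->want), valid until
 * the next call, 0 when more input is needed, and -1 if the header
 * announces an oversized frame (the caller should drop the stream). */
int frame_feed(struct frame_dec *d, const unsigned char *data, size_t len,
               size_t *used)
{
    size_t i = 0;
    while (d->hdr_got < 4 && i < len)      /* collect the length header */
        d->hdr[d->hdr_got++] = data[i++];
    *used = i;
    if (d->hdr_got < 4)
        return 0;
    d->want = (uint32_t)d->hdr[0] << 24 | (uint32_t)d->hdr[1] << 16 |
              (uint32_t)d->hdr[2] << 8 | (uint32_t)d->hdr[3];
    if (d->want > MAX_FRAME)
        return -1;
    size_t n = len - i;                    /* copy at most what is missing */
    if (n > d->want - d->got)
        n = d->want - d->got;
    memcpy(d->payload + d->got, data + i, n);
    d->got += n;
    *used = i + n;
    if (d->got < d->want)
        return 0;
    d->hdr_got = 0;                        /* ready for the next header */
    d->got = 0;
    return 1;
}
```

Feeding the same two frames in pieces that cut through both a header and a payload yields the same frames as a single large read would.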

Sometimes it looks like TCP is maintaining write boundaries, but it is
just an artifact of its behaviour on many systems, and is not reliable
even on those systems where it seems to happen most of the time.  Even
when connecting to localhost, you cannot rely on that.  I have seen
people write code assuming TCP keeps boundaries, and then some weeks
later they are very confused debugging their code because it is not
reliable...

Since VNC is clearly designed to work over TCP, and is written by
people who know this, I'm wondering why you think it needs to be
different for virtio-serial.

-- Jamie
Anthony Liguori - Jan. 8, 2010, 4:26 p.m.
On 01/08/2010 07:35 AM, Jamie Lokier wrote:
> Sometimes it looks like TCP is maintaining write boundaries, but it is
> just an artifact of its behaviour on many systems, and is not reliable
> even on those systems where it seems to happen most of the time.  Even
> when connecting to localhost, you cannot rely on that.  I have seen
> people write code assuming TCP keeps boundaries, and then some weeks
> later they are very confused debugging their code because it is not
> reliable...
>
> Since VNC is clearly designed to work over TCP, and is written by
> people who know this, I'm wondering why you think it needs to be
> different for virtio-serial.
>    

I'm confused about why the buffering is needed in the first place.

I would think that any buffering should be pushed back to the guest.  
IOW, if there's data available from the char driver but the guest
doesn't have a buffer, don't select on the char driver until the guest
has a buffer available.  If the guest attempts to write data but the 
char driver isn't ready to receive data, don't complete the request 
until the char driver can accept data.
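One way to picture pushing the buffering back to the guest, for host-to-guest data, is a "can read" style check: the device reports how many bytes it could place into buffers the guest has already posted, and the char driver is not polled while that is zero, so unread data stays in the backend rather than in qemu. The sketch models only that accounting; struct guest_rx, port_can_read() and port_read() are hypothetical names, and it assumes all posted buffers have the same size. The guest-to-host direction is the mirror image: a guest write request is simply left uncompleted until the char driver accepts the data.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of empty buffers the guest has posted for us. */
struct guest_rx {
    size_t bufs_queued;   /* empty buffers available to the host */
    size_t buf_size;      /* size of each posted buffer (> 0) */
};

/* How many bytes could be delivered to the guest right now.
 * Zero means: don't read from the char driver at all. */
size_t port_can_read(const struct guest_rx *rx)
{
    return rx->bufs_queued * rx->buf_size;
}

/* Deliver len bytes (len <= port_can_read()). In this model a partly
 * filled buffer is handed back to the guest, so round up. */
void port_read(struct guest_rx *rx, size_t len)
{
    assert(rx->buf_size > 0);
    assert(len <= port_can_read(rx));
    rx->bufs_queued -= (len + rx->buf_size - 1) / rx->buf_size;
}
```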

Where does buffering come in?

Regards,

Anthony Liguori

Amit Shah - Jan. 11, 2010, 8:34 a.m.
On (Fri) Jan 08 2010 [13:35:03], Jamie Lokier wrote:
> 
> Since VNC is clearly designed to work over TCP, and is written by
> people who know this, I'm wondering why you think it needs to be
> different for virtio-serial.

For VNC, to put data from a guest clipboard into the VNC client's
clipboard using the ServerCutText command, the entire buffer has to be
provided after sending the command and the 'length' value.
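For reference, the RFB ServerCutText message (RFC 6143, section 7.6.4) is a message-type byte of 3, three bytes of padding, a 32-bit big-endian length, and then the text. Because the length precedes the text, the server has to know the size of the whole clipboard entry before it can start the message. A minimal encoder sketch; encode_server_cut_text() is an illustrative name, not qemu's VNC code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Build an RFB ServerCutText message into out.
 * Returns the message size, or 0 if out is too small. */
size_t encode_server_cut_text(unsigned char *out, size_t cap,
                              const char *text, uint32_t len)
{
    size_t total = 8 + (size_t)len;
    if (cap < total)
        return 0;
    out[0] = 3;                            /* message-type: ServerCutText */
    out[1] = out[2] = out[3] = 0;          /* padding */
    out[4] = (unsigned char)(len >> 24);   /* length, big-endian */
    out[5] = (unsigned char)(len >> 16);
    out[6] = (unsigned char)(len >> 8);
    out[7] = (unsigned char)len;
    memcpy(out + 8, text, len);            /* the clipboard text itself */
    return total;
}
```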

In this case, if the data from the guest arrives in multiple packets, we
really don't want to call into the write function multiple times. A
single clipboard entry has to be created in the client with the entire
contents, so a single write operation has to be invoked.

For this to happen, there has to be some indication from the guest as to
how much data was written in one write() operation, which will let us
make a single write operation to the vnc client.

		Amit
Amit Shah - Jan. 11, 2010, 8:39 a.m.
On (Fri) Jan 08 2010 [10:26:59], Anthony Liguori wrote:
> On 01/08/2010 07:35 AM, Jamie Lokier wrote:
>> Sometimes it looks like TCP is maintaining write boundaries, but it is
>> just an artifact of its behaviour on many systems, and is not reliable
>> even on those systems where it seems to happen most of the time.  Even
>> when connecting to localhost, you cannot rely on that.  I have seen
>> people write code assuming TCP keeps boundaries, and then some weeks
>> later they are very confused debugging their code because it is not
>> reliable...
>>
>> Since VNC is clearly designed to work over TCP, and is written by
>> people who know this, I'm wondering why you think it needs to be
>> different for virtio-serial.
>>    
>
> I'm confused about why the buffering is needed in the first place.
>
> I would think that any buffering should be pushed back to the guest.   
> IOW, if there's data available from the char driver but the guest
> doesn't have a buffer, don't select on the char driver until the guest
> has a buffer available.  If the guest attempts to write data but the  
> char driver isn't ready to receive data, don't complete the request  
> until the char driver can accept data.

This is a different thing from what Jamie's talking about. A guest or a
host might be interested in communicating data without waiting for the
other end to come up. The other end can just start consuming the data
(even the data that it missed while it wasn't connected) once it's up.

(I can remove this option for now and add it later, if you prefer it
that way.)

		Amit
Jamie Lokier - Jan. 11, 2010, 10:45 a.m.
Amit Shah wrote:
> On (Fri) Jan 08 2010 [13:35:03], Jamie Lokier wrote:
> > Since VNC is clearly designed to work over TCP, and is written by
> > people who know this, I'm wondering why you think it needs to be
> > different for virtio-serial.
> 
> For VNC, to put data from a guest clipboard into the VNC client's
> clipboard using the ServerCutText command, the entire buffer has to be
> provided after sending the command and the 'length' value.

Are you talking about a VNC protocol command between qemu's VNC server
and the user's VNC client, or a private protocol between the guest and
qemu's VNC server?

> In this case, if the data from the guest arrives in multiple packets, we
> really don't want to call into the write function multiple times. A
> single clipboard entry has to be created in the client with the entire
> contents, so a single write operation has to be invoked.

Same question again: *Why do you think the VNC server (in qemu) needs to
see the entire clipboard in a single write from the guest?*

You have already told it the total length to expect.  There is no
ambiguity about where it ends.

There is no need to do any more if the receiver (in qemu) is
implemented correctly with a sane protocol.  That's assuming the guest
sends to qemu's VNC server which then sends it to the user's VNC client.

> For this to happen, there has to be some indication from the guest as to
> how much data was written in one write() operation, which will let us
> make a single write operation to the vnc client.

When it is sent to the user's VNC client, it will be split into
multiple packets by TCP. You *can't* send a single large write over
TCP without getting it split at arbitrary places. It's *impossible*. TCP
doesn't support that. It will split and merge your writes arbitrarily.

So the only interesting part is how it's transmitted from the guest to
qemu's VNC server first. Do you get to design that protocol yourself?

-- Jamie
Amit Shah - Jan. 11, 2010, 11:04 a.m.
On (Mon) Jan 11 2010 [10:45:53], Jamie Lokier wrote:
> Amit Shah wrote:
> > On (Fri) Jan 08 2010 [13:35:03], Jamie Lokier wrote:
> > > Since VNC is clearly designed to work over TCP, and is written by
> > > people who know this, I'm wondering why you think it needs to be
> > > different for virtio-serial.
> > 
> > For VNC, to put data from a guest clipboard into the VNC client's
> > clipboard using the ServerCutText command, the entire buffer has to be
> > provided after sending the command and the 'length' value.
> 
> Are you talking about a VNC protocol command between qemu's VNC server
> and the user's VNC client, or a private protocol between the guest and
> qemu's VNC server?

What happens is:

1. Guest puts something on its clipboard
2. An agent on the guest gets notified of new clipboard contents
3. This agent sends over the entire clipboard contents to qemu via
   virtio-serial
4. virtio-serial sends off this data to the virtio-serial-vnc code
5. ServerCutText message from the vnc backend is sent to the vnc client
6. vnc client's clipboard gets updated
7. You can see guest's clipboard contents in your client's clipboard.

I'm talking about steps 3, 4, 5 here.

> > In this case, if the data from the guest arrives in multiple packets, we
> > really don't want to call into the write function multiple times. A
> > single clipboard entry has to be created in the client with the entire
> > contents, so a single write operation has to be invoked.
> 
> Same question again: *Why do you think the VNC server (in qemu) needs to
> see the entire clipboard in a single write from the guest?*
> 
> You have already told it the total length to expect.  There is no
> ambiguity about where it ends.

Where does the total length come from? It has to come from the guest.
Otherwise, the vnc code will not know if a byte stream contains two
separate clipboard entries or just one huge clipboard entry.

Earlier, I used to send the length of one write as issued by a guest to
qemu. I just changed that to send a START and END flag so that I don't
have to send the length.

If this doesn't explain it, then I think we're not understanding each
other here.

		Amit
Jamie Lokier - Jan. 11, 2010, 11:33 p.m.
Amit Shah wrote:
> > Are you talking about a VNC protocol command between qemu's VNC server
> > and the user's VNC client, or a private protocol between the guest and
> > qemu's VNC server?
> 
> What happens is:
> 
> 1. Guest puts something on its clipboard
> 2. An agent on the guest gets notified of new clipboard contents
> 3. This agent sends over the entire clipboard contents to qemu via
>    virtio-serial
> 4. virtio-serial sends off this data to the virtio-serial-vnc code
> 5. ServerCutText message from the vnc backend is sent to the vnc client
> 6. vnc client's clipboard gets updated
> 7. You can see guest's clipboard contents in your client's clipboard.
> 
> I'm talking about steps 3, 4, 5 here.

Ok. Let's not worry about 5; it doesn't seem relevant, only that the
guest clipboard is sent to the host somehow.

> > You have already told it the total length to expect.  There is no
> > ambiguity about where it ends.
> 
> Where does the total length come from? It has to come from the guest.
> Otherwise, the vnc code will not know if a byte stream contains two
> separate clipboard entries or just one huge clipboard entry.

I see.  So it's a *really simple* protocol where the clipboard entry
is sent by the guest agent with a single write() without any framing bytes?

> Earlier, I used to send the length of one write as issued by a guest to
> qemu. I just changed that to send a START and END flag so that I don't
> have to send the length.

Why not just have the guest agent send a 4-byte header which is the
integer length of the clipboard blob to follow?

I.e. instead of

    int guest_send_clipboard(const char *data, size_t length)
    {
        return write_full(virtio_fd, data, length);
    }

do this:

    int guest_send_clipboard(const char *data, size_t length)
    {
        u32 encoded_length = cpu_to_be32(length);
        int err = write_full(virtio_serial_fd, &encoded_length,
                             sizeof(encoded_length));
        if (err == 0)
            err = write_full(virtio_serial_fd, data, length);
        return err;
    }
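The sender fragment above uses kernel-style helpers (u32, cpu_to_be32) and an undefined write_full(). A self-contained userspace version might look like this; it assumes the port is an ordinary file descriptor passed in by the caller (the fragment uses a global), and write_full() here is a local helper rather than a library call.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>   /* htonl */
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <unistd.h>

/* Write exactly len bytes, retrying on short writes and EINTR.
 * Returns 0 on success, -1 on error. */
static int write_full(int fd, const void *buf, size_t len)
{
    const unsigned char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Send one clipboard blob as a 4-byte big-endian length, then the data. */
int guest_send_clipboard(int fd, const char *data, size_t length)
{
    if (length > UINT32_MAX)
        return -1;                         /* doesn't fit the header */
    uint32_t encoded_length = htonl((uint32_t)length);
    if (write_full(fd, &encoded_length, sizeof(encoded_length)) != 0)
        return -1;
    return write_full(fd, data, length);
}
```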

> If this doesn't explain it, then I think we're not understanding each
> other here.

It does explain it very well, thanks.  I think you're misguided about
the solution :-)

What confused me was you mentioned the VNC ServerCutText command
having to receive the whole data in one go.  ServerCutText isn't
really relevant to this, and clearly is encoded with VNC protocol
framing.  If it was RDP or the SDL client instead of VNC, it would be
something else.  All that matters is getting the clipboard blob from
guest to qemu in one piece, right?

Having the guest agent send a few framing bytes seems very simple, and
would have the added bonus that the same guest agent protocol would
work on a "real" emulated serial port, guest->host TCP, etc. where
virtio-serial isn't available in the guest OS (e.g. older kernels).

I really can't see any merit in making virtio-serial not be a serial
port, being instead like a unix datagram socket, to support a specific
user of virtio-serial when a trivial 4-byte header in the guest agent
code would be easier for that user anyway.

If it did that, I think the name virtio-serial would have to change to
virtio-datagram, because it wouldn't behave like a serial port any
more.  It would also be less useful for things that _do_ want
something like a pipe/serial port.  But why bother?

-- Jamie
Anthony Liguori - Jan. 12, 2010, 12:27 a.m.
On 01/11/2010 05:33 PM, Jamie Lokier wrote:
> Amit Shah wrote:
>    
>>> Are you talking about a VNC protocol command between qemu's VNC server
>>> and the user's VNC client, or a private protocol between the guest and
>>> qemu's VNC server?
>>>        
>> What happens is:
>>
>> 1. Guest puts something on its clipboard
>> 2. An agent on the guest gets notified of new clipboard contents
>> 3. This agent sends over the entire clipboard contents to qemu via
>>     virtio-serial
>> 4. virtio-serial sends off this data to the virtio-serial-vnc code
>> 5. ServerCutText message from the vnc backend is sent to the vnc client
>> 6. vnc client's clipboard gets updated
>> 7. You can see guest's clipboard contents in your client's clipboard.
>>
>> I'm talking about steps 3, 4, 5 here.
>>      
> Ok. Let's not worry about 5; it doesn't seem relevant, only that the
> guest clipboard is sent to the host somehow.
>
>    
>>> You have already told it the total length to expect.  There is no
>>> ambiguity about where it ends.
>>>        
>> Where does the total length come from? It has to come from the guest.
>> Otherwise, the vnc code will not know if a byte stream contains two
>> separate clipboard entries or just one huge clipboard entry.
>>      
> I see.  So it's a *really simple* protocol where the clipboard entry
> is sent by the guest agent with a single write() without any framing bytes?
>
>    
>> Earlier, I used to send the length of one write as issued by a guest to
>> qemu. I just changed that to send a START and END flag so that I don't
>> have to send the length.
>>      
> Why not just have the guest agent send a 4-byte header which is the
> integer length of the clipboard blob to follow?
>
> I.e. instead of
>
>      int guest_send_clipboard(const char *data, size_t length)
>      {
>          return write_full(virtio_fd, data, length);
>      }
>
> do this:
>
>      int guest_send_clipboard(const char *data, size_t length)
>      {
>          u32 encoded_length = cpu_to_be32(length);
>          int err = write_full(virtio_serial_fd, &encoded_length,
>                               sizeof(encoded_length));
>          if (err == 0)
>              err = write_full(virtio_serial_fd, data, length);
>          return err;
>      }
>
>    
>> If this doesn't explain it, then I think we're not understanding each
>> other here.
>>      
> It does explain it very well, thanks.  I think you're misguided about
> the solution :-)
>
> What confused me was you mentioned the VNC ServerCutText command
> having to receive the whole data in one go.  ServerCutText isn't
> really relevant to this, and clearly is encoded with VNC protocol
> framing.  If it was RDP or the SDL client instead of VNC, it would be
> something else.  All that matters is getting the clipboard blob from
> guest to qemu in one piece, right?
>
> Having the guest agent send a few framing bytes seems very simple, and
> would have the added bonus that the same guest agent protocol would
> work on a "real" emulated serial port, guest->host TCP, etc. where
> virtio-serial isn't available in the guest OS (e.g. older kernels).
>
> I really can't see any merit in making virtio-serial not be a serial
> port, being instead like a unix datagram socket, to support a specific
> user of virtio-serial when a trivial 4-byte header in the guest agent
> code would be easier for that user anyway.
>
> If it did that, I think the name virtio-serial would have to change to
> virtio-datagram, because it wouldn't behave like a serial port any
> more.  It would also be less useful for things that _do_ want
> something like a pipe/serial port.  But why bother?
>    

I agree wrt a streaming protocol versus a datagram protocol.  The core
argument IMHO is that the userspace interface is a file descriptor.  
Most programmers are used to assuming that boundaries aren't preserved 
in read/write calls.

Regards,

Anthony Liguori

Anthony Liguori - Jan. 12, 2010, 12:28 a.m.
On 01/11/2010 02:39 AM, Amit Shah wrote:
> On (Fri) Jan 08 2010 [10:26:59], Anthony Liguori wrote:
>    
>> On 01/08/2010 07:35 AM, Jamie Lokier wrote:
>>      
>>> Sometimes it looks like TCP is maintaining write boundaries, but it is
>>> just an artifact of its behaviour on many systems, and is not reliable
>>> even on those systems where it seems to happen most of the time.  Even
>>> when connecting to localhost, you cannot rely on that.  I have seen
>>> people write code assuming TCP keeps boundaries, and then some weeks
>>> later they are very confused debugging their code because it is not
>>> reliable...
>>>
>>> Since VNC is clearly designed to work over TCP, and is written by
>>> people who know this, I'm wondering why you think it needs to be
>>> different for virtio-serial.
>>>
>>>        
>> I'm confused about why the buffering is needed in the first place.
>>
>> I would think that any buffering should be pushed back to the guest.
>> IOW, if there's data available from the char driver but the guest
>> doesn't have a buffer, don't select on the char driver until the guest
>> has a buffer available.  If the guest attempts to write data but the
>> char driver isn't ready to receive data, don't complete the request
>> until the char driver can accept data.
>>      
> This is a different thing from what Jamie's talking about. A guest or a
> host might be interested in communicating data without waiting for the
> other end to come up. The other end can just start consuming the data
> (even the data that it missed while it wasn't connected) once it's up.
>
> (I can remove this option for now and add it later, if you prefer it
> that way.)
>    

If it's not needed by your use case, please remove it.  Doing buffering 
gets tricky because you can't allow an infinite buffer for security 
reasons.  All you end up doing is increasing the size of the buffer 
beyond what the guest and client are capable of doing.  Since you can
still lose data, apps have to be written to handle this.  I think it adds
complexity without a lot of benefit.

Regards,

Anthony Liguori

Amit Shah - Jan. 12, 2010, 7:08 a.m.
On (Mon) Jan 11 2010 [18:28:52], Anthony Liguori wrote:
>>>
>>> I would think that any buffering should be pushed back to the guest.
>>> IOW, if there's data available from the char driver but the guest
>>> doesn't have a buffer, don't select on the char driver until the guest
>>> has a buffer available.  If the guest attempts to write data but the
>>> char driver isn't ready to receive data, don't complete the request
>>> until the char driver can accept data.
>>>      
>> This is a different thing from what Jamie's talking about. A guest or a
>> host might be interested in communicating data without waiting for the
>> other end to come up. The other end can just start consuming the data
>> (even the data that it missed while it wasn't connected) once it's up.
>>
>> (I can remove this option for now and add it later, if you prefer it
>> that way.)
>>    
>
> If it's not needed by your use case, please remove it.  Doing buffering  
> gets tricky because you can't allow an infinite buffer for security  
> reasons.  All you end up doing is increasing the size of the buffer  
> beyond what the guest and client are capable of doing.  Since you can
> still lose data, apps have to be written to handle this.  I think it adds
> complexity without a lot of benefit.

The buffering has to remain anyway since we can't assume that the ports
will consume the entire buffers we pass on to them. So we'll have to
buffer the data till the entire buffer is consumed.
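The bookkeeping described here, where a port may accept only part of a buffer, can be sketched as an offset into the guest's buffer that advances only by what the port reports as consumed. struct pending, pending_tail() and pending_advance() are hypothetical names, not the patch's API.

```c
#include <assert.h>
#include <stddef.h>

/* A guest buffer that a port has not fully consumed yet. */
struct pending {
    const unsigned char *data;   /* buffer taken from the guest */
    size_t len;
    size_t offset;               /* bytes the port has already accepted */
};

/* The unconsumed tail that should be offered to the port next. */
const unsigned char *pending_tail(const struct pending *pb, size_t *len)
{
    *len = pb->len - pb->offset;
    return pb->data + pb->offset;
}

/* Record that the port accepted n bytes. Returns 1 once the whole
 * buffer is consumed and can be given back to the guest, else 0. */
int pending_advance(struct pending *pb, size_t n)
{
    assert(n <= pb->len - pb->offset);
    pb->offset += n;
    return pb->offset == pb->len;
}
```

Keeping the offset with the buffer in the bus means each port only has to report how much it took, which avoids duplicating this logic in every port.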

The alternative is to pass the buffer management off to the individual
ports, which might result in a lot of code duplication, since we can have
many of these ports in different places in the qemu code.

So I guess it's better to leave the buffer management in the bus itself.

Which means we get the 'cache_buffers' functionality essentially for
free.

		Amit
Amit Shah - Jan. 12, 2010, 7:16 a.m.
On (Mon) Jan 11 2010 [23:33:56], Jamie Lokier wrote:
> Amit Shah wrote:
> > > Are you talking about a VNC protocol command between qemu's VNC server
> > > and the user's VNC client, or a private protocol between the guest and
> > > qemu's VNC server?
> > 
> > What happens is:
> > 
> > 1. Guest puts something on its clipboard
> > 2. An agent on the guest gets notified of new clipboard contents
> > 3. This agent sends over the entire clipboard contents to qemu via
> >    virtio-serial
> > 4. virtio-serial sends off this data to the virtio-serial-vnc code
> > 5. ServerCutText message from the vnc backend is sent to the vnc client
> > 6. vnc client's clipboard gets updated
> > 7. You can see guest's clipboard contents in your client's clipboard.
> > 
> > I'm talking about steps 3, 4, 5 here.
> 
> Ok. Let's not worry about 5; it doesn't seem relevant, only that the
> guest clipboard is sent to the host somehow.

Actually, it is important...

> > > You have already told it the total length to expect.  There is no
> > > ambiguity about where it ends.
> > 
> > Where does the total length come from? It has to come from the guest.
> > Otherwise, the vnc code will not know if a byte stream contains two
> > separate clipboard entries or just one huge clipboard entry.
> 
> I see.  So it's a *really simple* protocol where the clipboard entry
> is sent by the guest agent with a single write() without any framing bytes?
> 
> > Earlier, I used to send the length of one write as issued by a guest to
> > qemu. I just changed that to send a START and END flag so that I don't
> > have to send the length.
> 
> Why not just have the guest agent send a 4-byte header which is the
> integer length of the clipboard blob to follow?
> 
> I.e. instead of
> 
>     int guest_send_clipboard(const char *data, size_t length)
>     {
>         return write_full(virtio_fd, data, length);
>     }
> 
> do this:
> 
>     int guest_send_clipboard(const char *data, size_t length)
>     {
>         u32 encoded_length = cpu_to_be32(length);
>         int err = write_full(virtio_serial_fd, &encoded_length,
>                              sizeof(encoded_length));
>         if (err == 0)
>             err = write_full(virtio_serial_fd, data, length);
>         return err;
>     }
> 
> > If this doesn't explain it, then I think we're not understanding each
> > other here.
> 
> It does explain it very well, thanks.  I think you're misguided about
> the solution :-)

The above solution you specify works if it's assumed that we hold off
writes to the vnc client till we get a complete buffer according to the
header received.

Now, a header might contain the length 10000, meaning 10000 bytes are to
be expected.

What if the write() on the guest fails after writing 8000 bytes? There's
no way for us to signal that.

So this vnc port might just be waiting for all 10000 bytes to be
received, and it may never receive anything more.

Or, it might receive the start of the next clipboard entry and it could
be interpreted as data from the previous copy.
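The failure mode described above can be reproduced with a naive decoder for the 4-byte length framing proposed earlier in the thread: if the guest announces 10 bytes but only 8 arrive before the next message, the receiver takes the first two bytes of the next header as payload and then misreads the rest. decode_frames() is an illustrative helper, and the byte values in the test are made up for the demonstration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode "4-byte big-endian length + payload" frames from a byte buffer.
 * Stores each complete frame's length in lens[] and returns the count.
 * *waiting_for is set to the length of an incomplete trailing frame
 * (the receiver would block waiting for it), or 0 if there is none. */
size_t decode_frames(const unsigned char *s, size_t len,
                     uint32_t *lens, size_t max, uint32_t *waiting_for)
{
    size_t pos = 0, count = 0;
    *waiting_for = 0;
    while (count < max && len - pos >= 4) {
        uint32_t n = (uint32_t)s[pos] << 24 | (uint32_t)s[pos + 1] << 16 |
                     (uint32_t)s[pos + 2] << 8 | (uint32_t)s[pos + 3];
        if (len - pos - 4 < n) {
            *waiting_for = n;   /* incomplete: wait for more bytes */
            break;
        }
        lens[count++] = n;
        pos += 4 + n;
    }
    return count;
}
```

With the test's input, the leftover bytes 00 02 68 69 are read as a length of 157801, so the decoder waits for data that will never come and the second message ("hi") is lost.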

> What confused me was you mentioned the VNC ServerCutText command
> having to receive the whole data in one go.  ServerCutText isn't
> really relevant to this,

It is relevant. You can't split up one ServerCutText command in multiple
buffers. You can also not execute any other commands while one command
is in progress, so you have to hold off on executing ServerCutText till
all the data is available. And you can't reliably do that from guest
userspace because of the previously-mentioned scenario.

> I really can't see any merit in making virtio-serial not be a serial
> port, being instead like a unix datagram socket, to support a specific
> user of virtio-serial when a trivial 4-byte header in the guest agent
> code would be easier for that user anyway.

BTW I don't really want this either; I can get rid of it if everyone agrees
we won't support clipboard writes > 4k over vnc, or if there's a better
idea.

		Amit
Anthony Liguori - Jan. 12, 2010, 3 p.m.
On 01/12/2010 01:16 AM, Amit Shah wrote:
> BTW I don't really want this either; I can get rid of it if everyone agrees
> we won't support clipboard writes > 4k over vnc, or if there's a better
> idea.
>    

Why bother trying to preserve message boundaries?   I think that's the 
fundamental question.

Regards,

Anthony Liguori

Amit Shah - Jan. 12, 2010, 3:13 p.m.
On (Tue) Jan 12 2010 [09:00:52], Anthony Liguori wrote:
> On 01/12/2010 01:16 AM, Amit Shah wrote:
>> BTW I don't really want this either; I can get rid of it if everyone agrees
>> we won't support clipboard writes > 4k over vnc, or if there's a better
>> idea.
>>    
>
> Why bother trying to preserve message boundaries?   I think that's the  
> fundamental question.

For the VNC clipboard copy-paste case, I explained that in the previous
couple of mails in this thread.

There might be other use cases, but I don't know of any.

		Amit
Anthony Liguori - Jan. 12, 2010, 3:46 p.m.
On 01/12/2010 09:13 AM, Amit Shah wrote:
> On (Tue) Jan 12 2010 [09:00:52], Anthony Liguori wrote:
>> On 01/12/2010 01:16 AM, Amit Shah wrote:
>>> BTW, I don't really want this either; I can get rid of it if everyone agrees
>>> we won't support clipboard writes > 4k over VNC, or if there's a better
>>> idea.
>> Why bother trying to preserve message boundaries?   I think that's the
>> fundamental question.
> For the VNC clipboard copy-paste case, I explained that in the previous
> couple of mails in this thread.

It didn't make sense to me.  I think the assumption has to be that the 
client can send corrupt data and the host has to handle it.

Regards,

Anthony Liguori

> There might be other use cases, but I don't know of any.
Amit Shah - Jan. 12, 2010, 3:49 p.m.
On (Tue) Jan 12 2010 [09:46:55], Anthony Liguori wrote:
> On 01/12/2010 09:13 AM, Amit Shah wrote:
>> On (Tue) Jan 12 2010 [09:00:52], Anthony Liguori wrote:
>>> On 01/12/2010 01:16 AM, Amit Shah wrote:
>>>> BTW, I don't really want this either; I can get rid of it if everyone agrees
>>>> we won't support clipboard writes > 4k over VNC, or if there's a better
>>>> idea.
>>> Why bother trying to preserve message boundaries?   I think that's the
>>> fundamental question.
>> For the VNC clipboard copy-paste case, I explained that in the previous
>> couple of mails in this thread.
>
> It didn't make sense to me.  I think the assumption has to be that the
> client can send corrupt data and the host has to handle it.

You mean if the guest kernel sends the wrong flags? Or doesn't set the
flags? Can you explain what scenario you're talking about?

		Amit
Anthony Liguori - Jan. 12, 2010, 3:55 p.m.
On 01/12/2010 09:49 AM, Amit Shah wrote:
> On (Tue) Jan 12 2010 [09:46:55], Anthony Liguori wrote:
>> On 01/12/2010 09:13 AM, Amit Shah wrote:
>>> On (Tue) Jan 12 2010 [09:00:52], Anthony Liguori wrote:
>>>> On 01/12/2010 01:16 AM, Amit Shah wrote:
>>>>> BTW, I don't really want this either; I can get rid of it if everyone agrees
>>>>> we won't support clipboard writes > 4k over VNC, or if there's a better
>>>>> idea.
>>>> Why bother trying to preserve message boundaries?   I think that's the
>>>> fundamental question.
>>> For the VNC clipboard copy-paste case, I explained that in the previous
>>> couple of mails in this thread.
>> It didn't make sense to me.  I think the assumption has to be that the
>> client can send corrupt data and the host has to handle it.
> You mean if the guest kernel sends the wrong flags? Or doesn't set the
> flags? Can you explain what scenario you're talking about?

It's very likely that you'll have to implement some sort of protocol on 
top of virtio-serial.  It won't always just be simple strings.

If you have a simple datagram protocol that contains two ints and a
string, it's going to have to be encoded like
<int a><int b><int len><char data[len]>.  You need to validate that len
fits within the boundaries and deal with len being less than the boundary.
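As an illustration of the validation described above, here is a sketch of a parser for the `<int a><int b><int len><char data[len]>` layout. The thread does not fix a wire format, so 32-bit big-endian fields and a caller-supplied `max_len` are assumptions; `parse_record()`, `struct record` and `enum parse_status` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded record; data points into the caller's buffer. */
struct record {
    int32_t a, b;
    uint32_t len;
    const uint8_t *data;
};

enum parse_status { PARSE_OK, PARSE_NEED_MORE, PARSE_BAD_LEN };

/* Assumed wire order: 32-bit big-endian. */
static uint32_t get_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Parse one <a><b><len><data[len]> record from buf[0..avail).
 * len is checked against max_len before it is trusted, and a record
 * that is only partly present is reported as PARSE_NEED_MORE.
 */
static enum parse_status parse_record(const uint8_t *buf, size_t avail,
                                      uint32_t max_len, struct record *out,
                                      size_t *consumed)
{
    const size_t hdr_len = 12;           /* three 32-bit fields */
    uint32_t len;

    assert(buf != NULL && out != NULL && consumed != NULL);
    if (avail < hdr_len)
        return PARSE_NEED_MORE;
    len = get_be32(buf + 8);
    if (len > max_len)
        return PARSE_BAD_LEN;            /* corrupt or hostile length */
    if (avail - hdr_len < len)
        return PARSE_NEED_MORE;          /* len is sane but data is short */
    out->a = (int32_t)get_be32(buf);
    out->b = (int32_t)get_be32(buf + 4);
    out->len = len;
    out->data = buf + hdr_len;
    *consumed = hdr_len + len;
    return PARSE_OK;
}
```

Checking `len` against a limit before waiting for more data matters: otherwise a bogus length makes the receiver wait forever or allocate without bound.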

If you've got a command protocol where you send the guest something 
and then expect a response, you have to deal with the fact that the 
guest may never respond.  Having well defined message boundaries does 
not help the general problem and it only helps in the most trivial cases.
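The "guest may never respond" case usually comes down to a bounded wait. The sketch below assumes the reply arrives on an ordinary POSIX file descriptor (inside QEMU the port would really be backed by a chardev), and `wait_for_reply()` is a hypothetical helper built on `poll()`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/*
 * Wait up to timeout_ms for fd to become readable. Returns 1 if data
 * (or hang-up) is pending, 0 on timeout, -1 on error. On EINTR the
 * wait restarts with the full timeout, which is a simplification.
 */
static int wait_for_reply(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int rc;

    assert(fd >= 0);
    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);

    if (rc <= 0)
        return rc;                 /* 0: timed out, -1: poll() failed */
    return (pfd.revents & (POLLIN | POLLHUP)) ? 1 : -1;
}
```

Message boundaries do not remove the need for this timeout; a peer that never answers looks the same whether or not the transport preserves datagrams.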

Basically, it boils down to a lot of complexity for something that isn't 
going to be helpful in most circumstances.

Regards,

Anthony Liguori

Amit Shah - Jan. 12, 2010, 4:04 p.m.
On (Tue) Jan 12 2010 [09:55:41], Anthony Liguori wrote:
> On 01/12/2010 09:49 AM, Amit Shah wrote:
>> On (Tue) Jan 12 2010 [09:46:55], Anthony Liguori wrote:
>>> On 01/12/2010 09:13 AM, Amit Shah wrote:
>>>> On (Tue) Jan 12 2010 [09:00:52], Anthony Liguori wrote:
>>>>> On 01/12/2010 01:16 AM, Amit Shah wrote:
>>>>>> BTW, I don't really want this either; I can get rid of it if everyone agrees
>>>>>> we won't support clipboard writes > 4k over VNC, or if there's a better
>>>>>> idea.
>>>>> Why bother trying to preserve message boundaries?   I think that's the
>>>>> fundamental question.
>>>> For the VNC clipboard copy-paste case, I explained that in the previous
>>>> couple of mails in this thread.
>>> It didn't make sense to me.  I think the assumption has to be that the
>>> client can send corrupt data and the host has to handle it.
>> You mean if the guest kernel sends the wrong flags? Or doesn't set the
>> flags? Can you explain what scenario you're talking about?
>
> It's very likely that you'll have to implement some sort of protocol on  
> top of virtio-serial.  It won't always just be simple strings.

Yes, virtio-serial is just meant to be a transport, agnostic of whatever
data or protocols ride over it.

> If you have a simple datagram protocol that contains two ints and a
> string, it's going to have to be encoded like
> <int a><int b><int len><char data[len]>.  You need to validate that len
> fits within the boundaries and deal with len being less than the boundary.
>
> If you've got a command protocol where you send the guest something
> and then expect a response, you have to deal with the fact that the  
> guest may never respond.  Having well defined message boundaries does  
> not help the general problem and it only helps in the most trivial cases.
>
> Basically, it boils down to a lot of complexity for something that isn't  
> going to be helpful in most circumstances.

I don't know why you're saying virtio-serial-bus does (or needs to do)
any of this.

		Amit
Markus Armbruster - Jan. 13, 2010, 5:14 p.m.
Anthony Liguori <anthony@codemonkey.ws> writes:

> On 01/12/2010 01:16 AM, Amit Shah wrote:
>> BTW, I don't really want this either; I can get rid of it if everyone agrees
>> we won't support clipboard writes > 4k over VNC, or if there's a better
>> idea.
>
> Why bother trying to preserve message boundaries?   I think that's the
> fundamental question.

Yes.  Either it's a datagram or a stream pipe.  I always thought it
would be a stream pipe, as the name "serial" suggests.

As to the clipboard use case: same problem exists with any old stream
pipe, including TCP, same solutions apply.  If you told the peer "I'm
going to send you 12345 bytes now", and your stream pipe chokes after
7890 bytes, you retry until everything got through.  If you want to be
able to abort a partial transfer and start a new one, you layer a
protocol suitable for that on top of your stream pipe.
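The "retry until everything got through" approach, sketched in C for a blocking POSIX file descriptor such as a guest's virtio-serial port device. The 4-byte big-endian length header and the helper names `write_all()`, `read_all()` and `send_frame()` are assumptions for illustration, not part of virtio-serial.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Keep calling write() until all len bytes are out (blocking fd assumed). */
static int write_all(int fd, const void *buf, size_t len)
{
    const uint8_t *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted: retry the same bytes */
            return -1;             /* hard error: caller must resync */
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Keep calling read() until exactly len bytes arrive; -1 on EOF or error. */
static int read_all(int fd, void *buf, size_t len)
{
    uint8_t *p = buf;

    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0)
            return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical frame: 4-byte big-endian length followed by the payload. */
static int send_frame(int fd, const void *payload, uint32_t len)
{
    const uint8_t hdr[4] = {
        (uint8_t)(len >> 24), (uint8_t)(len >> 16),
        (uint8_t)(len >> 8), (uint8_t)len,
    };

    assert(payload != NULL || len == 0);
    if (write_all(fd, hdr, sizeof hdr) < 0)
        return -1;
    return write_all(fd, payload, len);
}
```

With this framing, a short write is simply resumed, so the host never needs START/END markers to find the end of a clipboard transfer; only a hard error leaves the stream in an unknown state.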
Anthony Liguori - Jan. 13, 2010, 6:31 p.m.
On 01/13/2010 11:14 AM, Markus Armbruster wrote:
> Anthony Liguori <anthony@codemonkey.ws> writes:
>
>> On 01/12/2010 01:16 AM, Amit Shah wrote:
>>> BTW, I don't really want this either; I can get rid of it if everyone agrees
>>> we won't support clipboard writes > 4k over VNC, or if there's a better
>>> idea.
>> Why bother trying to preserve message boundaries?   I think that's the
>> fundamental question.
> Yes.  Either it's a datagram or a stream pipe.  I always thought it
> would be a stream pipe, as the name "serial" suggests.

And if it's a datagram, then we should accept that there will be a fixed 
max message size which is pretty common in all datagram protocols.  That 
fixed size should be no larger than what the transport supports so in 
this case, it would be 4k.

If a guest wants to send larger messages, it must build a continuation 
protocol on top of the datagram protocol.
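A continuation protocol of this kind could look like the sketch below. The 4096-byte limit comes from the discussion; the one-byte header with a `FLAG_MORE` bit and the names `make_dgram()`, `struct reassembler` and `feed_dgram()` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_DGRAM 4096                  /* transport limit from the thread */
#define HDR_LEN   1                     /* hypothetical 1-byte header */
#define MAX_CHUNK (MAX_DGRAM - HDR_LEN)
#define FLAG_MORE 0x01                  /* another fragment follows */

/*
 * Build fragment number idx of msg[0..len) into dgram, which must hold
 * MAX_DGRAM bytes. Returns the datagram length, or 0 when idx is past
 * the end. An empty message still produces one datagram.
 */
static size_t make_dgram(const uint8_t *msg, size_t len, size_t idx,
                         uint8_t *dgram)
{
    size_t off = idx * MAX_CHUNK;
    size_t chunk;

    if (off >= len && !(len == 0 && idx == 0))
        return 0;
    chunk = (len - off < MAX_CHUNK) ? len - off : MAX_CHUNK;
    dgram[0] = (off + chunk < len) ? FLAG_MORE : 0;
    if (chunk > 0)
        memcpy(dgram + HDR_LEN, msg + off, chunk);
    return HDR_LEN + chunk;
}

/* Hypothetical receiver state: fragments are appended to buf. */
struct reassembler {
    uint8_t *buf;
    size_t cap, used;
};

/*
 * Returns 1 when a full message is in buf[0..used), 0 if more is
 * needed, -1 if the message would exceed cap (a receiver-set limit).
 */
static int feed_dgram(struct reassembler *r, const uint8_t *dgram, size_t n)
{
    size_t payload;

    assert(n >= HDR_LEN && n <= MAX_DGRAM);
    payload = n - HDR_LEN;
    if (payload > r->cap - r->used)
        return -1;
    memcpy(r->buf + r->used, dgram + HDR_LEN, payload);
    r->used += payload;
    return (dgram[0] & FLAG_MORE) ? 0 : 1;
}
```

Because the receiver, not the transport, chooses `cap`, the memory bound that the patch enforces with `byte_limit` moves into the application protocol.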

Regards,

Anthony Liguori

Patch

diff --git a/hw/virtio-serial-bus.c b/hw/virtio-serial-bus.c
index 20d9580..c947143 100644
--- a/hw/virtio-serial-bus.c
+++ b/hw/virtio-serial-bus.c
@@ -44,6 +44,20 @@  struct VirtIOSerial {
     struct virtio_console_config config;
 };
 
+/* This struct holds individual buffers received for each port */
+typedef struct VirtIOSerialPortBuffer {
+    QTAILQ_ENTRY(VirtIOSerialPortBuffer) next;
+
+    uint8_t *buf;
+
+    size_t len; /* length of the buffer */
+    size_t offset; /* offset from which to consume data in the buffer */
+
+    uint32_t flags; /* Sent by guest (start of data stream, end of stream) */
+
+    bool previous_failure; /* Did sending out this buffer fail previously? */
+} VirtIOSerialPortBuffer;
+
 static VirtIOSerialPort *find_port_by_id(VirtIOSerial *vser, uint32_t id)
 {
     VirtIOSerialPort *port;
@@ -157,6 +171,198 @@  static size_t send_control_event(VirtIOSerialPort *port, uint16_t event,
     return send_control_msg(port, &cpkt, sizeof(cpkt));
 }
 
+static void init_buf(VirtIOSerialPortBuffer *buf, uint8_t *buffer, size_t len)
+{
+    buf->buf = buffer;
+    buf->len = len;
+    buf->offset = 0;
+    buf->flags = 0;
+    buf->previous_failure = false;
+}
+
+static VirtIOSerialPortBuffer *alloc_buf(size_t len)
+{
+    VirtIOSerialPortBuffer *buf;
+
+    buf = qemu_malloc(sizeof(*buf));
+    buf->buf = qemu_malloc(len);
+
+    init_buf(buf, buf->buf, len);
+
+    return buf;
+}
+
+static void free_buf(VirtIOSerialPortBuffer *buf)
+{
+    qemu_free(buf->buf);
+    qemu_free(buf);
+}
+
+static size_t get_complete_data_size(VirtIOSerialPort *port)
+{
+    VirtIOSerialPortBuffer *buf;
+    size_t size;
+    bool is_complete, start_seen;
+
+    size = 0;
+    is_complete = false;
+    start_seen = false;
+    QTAILQ_FOREACH(buf, &port->unflushed_buffers, next) {
+        size += buf->len - buf->offset;
+
+        if (buf->flags & VIRTIO_CONSOLE_HDR_END_DATA) {
+            is_complete = true;
+            break;
+        }
+        if (buf == QTAILQ_FIRST(&port->unflushed_buffers)
+            && !(buf->flags & VIRTIO_CONSOLE_HDR_START_DATA)) {
+
+            /* There's some data that arrived without a START flag. Flush it. */
+            is_complete = true;
+            break;
+        }
+
+        if (buf->flags & VIRTIO_CONSOLE_HDR_START_DATA) {
+            if (start_seen) {
+                /*
+                 * There's some data that arrived without an END
+                 * flag. Flush it.
+                 */
+                size -= buf->len - buf->offset;
+                is_complete = true;
+                break;
+            }
+            start_seen = true;
+        }
+    }
+    return is_complete ? size : 0;
+}
+
+/*
+ * The guest could have sent the data corresponding to one write
+ * request split up in multiple buffers. The first buffer has the
+ * VIRTIO_CONSOLE_HDR_START_DATA flag set and the last buffer has the
+ * VIRTIO_CONSOLE_HDR_END_DATA flag set. Using this information, merge
+ * the parts into one buf here to process it for output.
+ */
+static VirtIOSerialPortBuffer *get_complete_buf(VirtIOSerialPort *port)
+{
+    VirtIOSerialPortBuffer *buf, *buf2;
+    uint8_t *outbuf;
+    size_t out_offset, out_size;
+
+    out_size = get_complete_data_size(port);
+    if (!out_size)
+        return NULL;
+
+    buf = QTAILQ_FIRST(&port->unflushed_buffers);
+    if (buf->len - buf->offset == out_size) {
+        QTAILQ_REMOVE(&port->unflushed_buffers, buf, next);
+        return buf;
+    }
+    out_offset = 0;
+    outbuf = qemu_malloc(out_size);
+
+    QTAILQ_FOREACH_SAFE(buf, &port->unflushed_buffers, next, buf2) {
+        size_t copy_size;
+
+        copy_size = buf->len - buf->offset;
+        memcpy(outbuf + out_offset, buf->buf + buf->offset, copy_size);
+        out_offset += copy_size;
+
+        QTAILQ_REMOVE(&port->unflushed_buffers, buf, next);
+        qemu_free(buf->buf);
+
+        if (out_offset == out_size) {
+            break;
+        }
+        qemu_free(buf);
+    }
+    init_buf(buf, outbuf, out_size);
+    buf->flags = VIRTIO_CONSOLE_HDR_START_DATA | VIRTIO_CONSOLE_HDR_END_DATA;
+
+    return buf;
+}
+
+/* Call with the unflushed_buffers_lock held */
+static void flush_queue(VirtIOSerialPort *port)
+{
+    VirtIOSerialPortBuffer *buf;
+    size_t out_size;
+    ssize_t ret;
+
+    /*
+     * If a device is interested in buffering packets till it's
+     * opened, cache the data the guest sends us till a connection is
+     * established.
+     */
+    if (!port->host_connected && port->cache_buffers) {
+        return;
+    }
+
+    while ((buf = get_complete_buf(port))) {
+        out_size = buf->len - buf->offset;
+        if (!port->host_connected) {
+            /*
+             * Caching is disabled and host is not connected, so
+             * discard the buffer. Do this only after merging the
+             * buffer as a port can get connected in the middle of
+             * dropping buffers and the port will end up getting the
+             * incomplete output.
+             */
+            port->nr_bytes -= out_size;
+            free_buf(buf);
+            continue;
+        }
+
+        ret = port->info->have_data(port, buf->buf + buf->offset, out_size);
+        if (ret < 0 || (size_t)ret < out_size) {
+            QTAILQ_INSERT_HEAD(&port->unflushed_buffers, buf, next);
+        }
+        if (ret <= 0) {
+            /* We're not progressing at all */
+            if (buf->previous_failure) {
+                break;
+            }
+            buf->previous_failure = true;
+        } else {
+            buf->offset += ret;
+            port->nr_bytes -= ret;
+            buf->previous_failure = false;
+        }
+        if (!(buf->len - buf->offset)) {
+            free_buf(buf);
+        }
+    }
+
+    if (port->host_throttled && port->nr_bytes < port->byte_limit) {
+        port->host_throttled = false;
+        send_control_event(port, VIRTIO_CONSOLE_THROTTLE_PORT, 0);
+    }
+}
+
+static void flush_all_ports(VirtIOSerial *vser)
+{
+    struct VirtIOSerialPort *port;
+
+    QTAILQ_FOREACH(port, &vser->ports, next) {
+        if (port->has_activity) {
+            port->has_activity = false;
+            flush_queue(port);
+        }
+    }
+}
+
+static void remove_port_buffers(VirtIOSerialPort *port)
+{
+    struct VirtIOSerialPortBuffer *buf, *buf2;
+
+    QTAILQ_FOREACH_SAFE(buf, &port->unflushed_buffers, next, buf2) {
+        QTAILQ_REMOVE(&port->unflushed_buffers, buf, next);
+        free_buf(buf);
+    }
+}
+
 /* Functions for use inside qemu to open and read from/write to ports */
 int virtio_serial_open(VirtIOSerialPort *port)
 {
@@ -168,6 +374,10 @@  int virtio_serial_open(VirtIOSerialPort *port)
     port->host_connected = true;
     send_control_event(port, VIRTIO_CONSOLE_PORT_OPEN, 1);
 
+    /* Flush any buffers that were cached while the port was closed */
+    if (port->cache_buffers && port->info->have_data) {
+        flush_queue(port);
+    }
     return 0;
 }
 
@@ -176,6 +386,9 @@  int virtio_serial_close(VirtIOSerialPort *port)
     port->host_connected = false;
     send_control_event(port, VIRTIO_CONSOLE_PORT_OPEN, 0);
 
+    if (!port->cache_buffers) {
+        remove_port_buffers(port);
+    }
     return 0;
 }
 
@@ -265,6 +478,14 @@  static void handle_control_message(VirtIOSerial *vser, void *buf)
             qemu_free(buffer);
         }
 
+        /*
+         * We also want to signal to the guest whether or not the port
+         * is set to caching the buffers when disconnected.
+         */
+        if (port->cache_buffers) {
+            send_control_event(port, VIRTIO_CONSOLE_CACHE_BUFFERS, 1);
+        }
+
         if (port->host_connected) {
             send_control_event(port, VIRTIO_CONSOLE_PORT_OPEN, 1);
         }
@@ -315,6 +536,10 @@  static void control_out(VirtIODevice *vdev, VirtQueue *vq)
 
 /*
  * Guest wrote something to some port.
+ *
+ * Flush the data in the entire chunk that we received rather than
+ * splitting it into multiple buffers. VNC clients don't consume split
+ * buffers.
  */
 static void handle_output(VirtIODevice *vdev, VirtQueue *vq)
 {
@@ -325,6 +550,7 @@  static void handle_output(VirtIODevice *vdev, VirtQueue *vq)
 
     while (virtqueue_pop(vq, &elem)) {
         VirtIOSerialPort *port;
+        VirtIOSerialPortBuffer *buf;
         struct virtio_console_header header;
         int header_len;
 
@@ -333,10 +559,14 @@  static void handle_output(VirtIODevice *vdev, VirtQueue *vq)
         if (elem.out_sg[0].iov_len < header_len) {
             goto next_buf;
         }
+        if (header_len) {
+            memcpy(&header, elem.out_sg[0].iov_base, header_len);
+        }
         port = find_port_by_vq(vser, vq);
         if (!port) {
             goto next_buf;
         }
+
         /*
          * A port may not have any handler registered for consuming the
          * data that the guest sends or it may not have a chardev associated
@@ -347,13 +577,38 @@  static void handle_output(VirtIODevice *vdev, VirtQueue *vq)
         }
 
         /* The guest always sends only one sg */
-        port->info->have_data(port, elem.out_sg[0].iov_base + header_len,
-                              elem.out_sg[0].iov_len - header_len);
+        buf = alloc_buf(elem.out_sg[0].iov_len - header_len);
+        memcpy(buf->buf, elem.out_sg[0].iov_base + header_len, buf->len);
+
+        if (header_len) {
+            /*
+             * The first buffer of a guest write request carries
+             * START_DATA and the last one carries END_DATA; this lets
+             * us find where each request begins and ends.
+             */
+            buf->flags = ldl_p(&header.flags)
+                & (VIRTIO_CONSOLE_HDR_START_DATA | VIRTIO_CONSOLE_HDR_END_DATA);
+        } else {
+            /* We always want to flush all the buffers in this case */
+            buf->flags = VIRTIO_CONSOLE_HDR_START_DATA
+                | VIRTIO_CONSOLE_HDR_END_DATA;
+        }
+
+        QTAILQ_INSERT_TAIL(&port->unflushed_buffers, buf, next);
+        port->nr_bytes += buf->len;
+        port->has_activity = true;
 
+        if (!port->host_throttled && port->byte_limit &&
+            port->nr_bytes >= port->byte_limit) {
+
+            port->host_throttled = true;
+            send_control_event(port, VIRTIO_CONSOLE_THROTTLE_PORT, 1);
+        }
     next_buf:
         virtqueue_push(vq, &elem, elem.out_sg[0].iov_len);
     }
     virtio_notify(vdev, vq);
+    flush_all_ports(vser);
 }
 
 static void handle_input(VirtIODevice *vdev, VirtQueue *vq)
@@ -386,6 +641,7 @@  static void virtio_serial_save(QEMUFile *f, void *opaque)
     VirtIOSerial *s = opaque;
     VirtIOSerialPort *port;
     uint32_t nr_active_ports;
+    unsigned int nr_bufs;
 
     /* The virtio device */
     virtio_save(&s->vdev, f);
@@ -408,14 +664,35 @@  static void virtio_serial_save(QEMUFile *f, void *opaque)
      * Items in struct VirtIOSerialPort.
      */
     QTAILQ_FOREACH(port, &s->ports, next) {
+        VirtIOSerialPortBuffer *buf;
+
         /*
          * We put the port number because we may not have an active
          * port at id 0 that's reserved for a console port, or in case
          * of ports that might have gotten unplugged
          */
         qemu_put_be32s(f, &port->id);
+        qemu_put_be64(f, port->byte_limit);
+        qemu_put_be64(f, port->nr_bytes);
         qemu_put_byte(f, port->guest_connected);
+        qemu_put_byte(f, port->host_throttled);
+
+        /* All the pending buffers from active ports */
+        nr_bufs = 0;
+        QTAILQ_FOREACH(buf, &port->unflushed_buffers, next) {
+            nr_bufs++;
+        }
+        qemu_put_be32s(f, &nr_bufs);
+        if (!nr_bufs) {
+            continue;
+        }
 
+        QTAILQ_FOREACH(buf, &port->unflushed_buffers, next) {
+            qemu_put_be64(f, buf->len);
+            qemu_put_be64(f, buf->offset);
+            qemu_put_be32s(f, &buf->flags);
+            qemu_put_buffer(f, buf->buf, buf->len);
+        }
     }
 }
 
@@ -448,13 +725,34 @@  static int virtio_serial_load(QEMUFile *f, void *opaque, int version_id)
 
     /* Items in struct VirtIOSerialPort */
     for (i = 0; i < nr_active_ports; i++) {
+        VirtIOSerialPortBuffer *buf;
         uint32_t id;
+        unsigned int nr_bufs;
 
         id = qemu_get_be32(f);
         port = find_port_by_id(s, id);
 
+        port->byte_limit = qemu_get_be64(f);
+        port->nr_bytes   = qemu_get_be64(f);
         port->guest_connected = qemu_get_byte(f);
+        port->host_throttled = qemu_get_byte(f);
+
+        /* All the pending buffers from active ports */
+        qemu_get_be32s(f, &nr_bufs);
+        if (!nr_bufs) {
+            continue;
+        }
+        for (; nr_bufs; nr_bufs--) {
+            size_t len;
 
+            len = qemu_get_be64(f);
+            buf = alloc_buf(len);
+
+            buf->offset = qemu_get_be64(f);
+            qemu_get_be32s(f, &buf->flags);
+            qemu_get_buffer(f, buf->buf, buf->len);
+            QTAILQ_INSERT_TAIL(&port->unflushed_buffers, buf, next);
+        }
     }
 
     return 0;
@@ -490,6 +788,10 @@  static void virtser_bus_dev_print(Monitor *mon, DeviceState *qdev, int indent)
                    indent, "", port->guest_connected);
     monitor_printf(mon, "%*s dev-prop-int: host_connected: %d\n",
                    indent, "", port->host_connected);
+    monitor_printf(mon, "%*s dev-prop-int: host_throttled: %d\n",
+                   indent, "", port->host_throttled);
+    monitor_printf(mon, "%*s dev-prop-int: nr_bytes: %zu\n",
+                   indent, "", port->nr_bytes);
 }
 
 static int virtser_port_qdev_init(DeviceState *qdev, DeviceInfo *base)
@@ -520,6 +822,7 @@  static int virtser_port_qdev_init(DeviceState *qdev, DeviceInfo *base)
     if (ret) {
         return ret;
     }
+    QTAILQ_INIT(&port->unflushed_buffers);
 
     port->id = plugging_port0 ? 0 : port->vser->config.nr_ports++;
 
@@ -570,6 +873,8 @@  static int virtser_port_qdev_exit(DeviceState *qdev)
     if (port->info->exit)
         port->info->exit(dev);
 
+    remove_port_buffers(port);
+
     return 0;
 }
 
diff --git a/hw/virtio-serial.c b/hw/virtio-serial.c
index 470446b..fd27c33 100644
--- a/hw/virtio-serial.c
+++ b/hw/virtio-serial.c
@@ -66,13 +66,14 @@  static int virtconsole_initfn(VirtIOSerialDevice *dev)
 
     port->info = dev->info;
 
-    port->is_console = true;
-
     /*
-     * For console ports, just assume the guest is ready to accept our
-     * data.
+     * We're not interested in data the guest sends while nothing is
+     * connected on the host side. Just ignore it instead of saving it
+     * for later consumption.
      */
-    port->guest_connected = true;
+    port->cache_buffers = 0;
+
+    port->is_console = true;
 
     if (vcon->chr) {
         qemu_chr_add_handlers(vcon->chr, chr_can_read, chr_read, chr_event,
diff --git a/hw/virtio-serial.h b/hw/virtio-serial.h
index 5505841..acb601d 100644
--- a/hw/virtio-serial.h
+++ b/hw/virtio-serial.h
@@ -49,12 +49,18 @@  struct virtio_console_header {
     uint32_t flags;		/* Some message between host and guest */
 };
 
+/* Messages between host and guest */
+#define VIRTIO_CONSOLE_HDR_START_DATA	(1 << 0)
+#define VIRTIO_CONSOLE_HDR_END_DATA	(1 << 1)
+
 /* Some events for the internal messages (control packets) */
 #define VIRTIO_CONSOLE_PORT_READY	0
 #define VIRTIO_CONSOLE_CONSOLE_PORT	1
 #define VIRTIO_CONSOLE_RESIZE		2
 #define VIRTIO_CONSOLE_PORT_OPEN	3
 #define VIRTIO_CONSOLE_PORT_NAME	4
+#define VIRTIO_CONSOLE_THROTTLE_PORT	5
+#define VIRTIO_CONSOLE_CACHE_BUFFERS	6
 
 /* == In-qemu interface == */
 
@@ -96,6 +102,13 @@  struct VirtIOSerialPort {
     char *name;
 
     /*
+     * This list holds buffers pushed by the guest in case the guest
+     * sent incomplete messages or the host connection was down and
+     * the device requested to cache the data.
+     */
+    QTAILQ_HEAD(, VirtIOSerialPortBuffer) unflushed_buffers;
+
+    /*
      * This id helps identify ports between the guest and the host.
      * The guest sends a "header" with this id with each data packet
      * that it sends and the host can then find out which associated
@@ -103,6 +116,27 @@  struct VirtIOSerialPort {
      */
     uint32_t id;
 
+    /*
+     * Each port can specify the limit on number of bytes that can be
+     * outstanding in the unread buffers. This is to prevent any OOM
+     * situation if a rogue process on the guest keeps injecting
+     * data.
+     */
+    size_t byte_limit;
+
+    /*
+     * The number of bytes we have queued up in our unread queue
+     */
+    size_t nr_bytes;
+
+    /*
+     * This boolean, when set, means "queue data that gets sent to
+     * this port when the host is not connected". The queued data, if
+     * any, is then sent out to the port when the host connection is
+     * opened.
+     */
+    uint8_t cache_buffers;
+
     /* Identify if this is a port that binds with hvc in the guest */
     uint8_t is_console;
 
@@ -110,6 +144,11 @@  struct VirtIOSerialPort {
     bool guest_connected;
     /* Is this device open for IO on the host? */
     bool host_connected;
+    /* Have we sent a throttle message to the guest? */
+    bool host_throttled;
+
+    /* Did this port get data in the recent handle_output call? */
+    bool has_activity;
 };
 
 struct VirtIOSerialPortInfo {