Patchwork virtio-serial-bus: replay guest_open on migration

Submitter Alon Levy
Date July 26, 2011, 1:03 p.m.
Message ID <1311685434-25282-1-git-send-email-alevy@redhat.com>
Permalink /patch/106870/
State New

Comments

Alon Levy - July 26, 2011, 1:03 p.m.
Signed-off-by: Alon Levy <alevy@redhat.com>
---
 hw/virtio-serial-bus.c |    8 +++++++-
 1 files changed, 7 insertions(+), 1 deletions(-)
Markus Armbruster - July 27, 2011, 5:45 a.m.
Alon Levy <alevy@redhat.com> writes:

> Signed-off-by: Alon Levy <alevy@redhat.com>
> ---
>  hw/virtio-serial-bus.c |    8 +++++++-
>  1 files changed, 7 insertions(+), 1 deletions(-)
>
> diff --git a/hw/virtio-serial-bus.c b/hw/virtio-serial-bus.c
> index c5eb931..7a652ff 100644
> --- a/hw/virtio-serial-bus.c
> +++ b/hw/virtio-serial-bus.c
> @@ -618,14 +618,20 @@ static int virtio_serial_load(QEMUFile *f, void *opaque, int version_id)
>      for (i = 0; i < nr_active_ports; i++) {
>          uint32_t id;
>          bool host_connected;
> +        VirtIOSerialPortInfo *info;
>  
>          id = qemu_get_be32(f);
>          port = find_port_by_id(s, id);
>          if (!port) {
>              return -EINVAL;
>          }
> -
>          port->guest_connected = qemu_get_byte(f);
> +        info = DO_UPCAST(VirtIOSerialPortInfo, qdev, port->dev.info);
> +        if (port->guest_connected && info->guest_open) {
> +            /* replay guest open */
> +            info->guest_open(port);
> +
> +        }
>          host_connected = qemu_get_byte(f);
>          if (host_connected != port->host_connected) {
>              /*

The patch makes enough sense to me, but the commit message is
insufficient.  Why do you have to replay?  And what's being fixed?
Alon Levy - July 27, 2011, 7:07 a.m.
On Wed, Jul 27, 2011 at 07:45:25AM +0200, Markus Armbruster wrote:
> Alon Levy <alevy@redhat.com> writes:
> 
> > Signed-off-by: Alon Levy <alevy@redhat.com>
> > ---
> >  hw/virtio-serial-bus.c |    8 +++++++-
> >  1 files changed, 7 insertions(+), 1 deletions(-)
> >
> > diff --git a/hw/virtio-serial-bus.c b/hw/virtio-serial-bus.c
> > index c5eb931..7a652ff 100644
> > --- a/hw/virtio-serial-bus.c
> > +++ b/hw/virtio-serial-bus.c
> > @@ -618,14 +618,20 @@ static int virtio_serial_load(QEMUFile *f, void *opaque, int version_id)
> >      for (i = 0; i < nr_active_ports; i++) {
> >          uint32_t id;
> >          bool host_connected;
> > +        VirtIOSerialPortInfo *info;
> >  
> >          id = qemu_get_be32(f);
> >          port = find_port_by_id(s, id);
> >          if (!port) {
> >              return -EINVAL;
> >          }
> > -
> >          port->guest_connected = qemu_get_byte(f);
> > +        info = DO_UPCAST(VirtIOSerialPortInfo, qdev, port->dev.info);
> > +        if (port->guest_connected && info->guest_open) {
> > +            /* replay guest open */
> > +            info->guest_open(port);
> > +
> > +        }
> >          host_connected = qemu_get_byte(f);
> >          if (host_connected != port->host_connected) {
> >              /*
> 
> The patch makes enough sense to me, but the commit message is
> insufficient.  Why do you have to replay?  And what's being fixed?

When migrating a host with a spice agent running, the mouse becomes
non-operational after the migration. This is rhbz #718463, currently on
spice-server, but it seems this is a qemu-kvm issue. The problem is that
after migration spice doesn't know the guest agent is open. Spice is just
a char dev here. And a chardev cannot query its device; the device has
to let the chardev know when it is open. Right now after migration the
chardev which is recreated is in its default state, which assumes the
guest is disconnected. Char devices carry no information across migration,
but the virtio-serial does already carry the guest_connected state. This
patch passes that bit to the chardev.
Markus Armbruster - July 27, 2011, 9:21 a.m.
Alon Levy <alevy@redhat.com> writes:

> On Wed, Jul 27, 2011 at 07:45:25AM +0200, Markus Armbruster wrote:
>> Alon Levy <alevy@redhat.com> writes:
>> 
>> > Signed-off-by: Alon Levy <alevy@redhat.com>
>> > ---
>> >  hw/virtio-serial-bus.c |    8 +++++++-
>> >  1 files changed, 7 insertions(+), 1 deletions(-)
>> >
>> > diff --git a/hw/virtio-serial-bus.c b/hw/virtio-serial-bus.c
>> > index c5eb931..7a652ff 100644
>> > --- a/hw/virtio-serial-bus.c
>> > +++ b/hw/virtio-serial-bus.c
>> > @@ -618,14 +618,20 @@ static int virtio_serial_load(QEMUFile *f, void *opaque, int version_id)
>> >      for (i = 0; i < nr_active_ports; i++) {
>> >          uint32_t id;
>> >          bool host_connected;
>> > +        VirtIOSerialPortInfo *info;
>> >  
>> >          id = qemu_get_be32(f);
>> >          port = find_port_by_id(s, id);
>> >          if (!port) {
>> >              return -EINVAL;
>> >          }
>> > -
>> >          port->guest_connected = qemu_get_byte(f);
>> > +        info = DO_UPCAST(VirtIOSerialPortInfo, qdev, port->dev.info);
>> > +        if (port->guest_connected && info->guest_open) {
>> > +            /* replay guest open */
>> > +            info->guest_open(port);
>> > +
>> > +        }
>> >          host_connected = qemu_get_byte(f);
>> >          if (host_connected != port->host_connected) {
>> >              /*
>> 
>> The patch makes enough sense to me, but the commit message is
>> insufficient.  Why do you have to replay?  And what's being fixed?
>
> When migrating a host with with a spice agent running the mouse becomes
> non operational after the migration. This is rhbz #718463, currently on
> spice-server but it seems this is a qemu-kvm issue. The problem is that
> after migration spice doesn't know the guest agent is open. Spice is just
> a char dev here. And a chardev cannot query it's device, the device has
> to let the chardev know when it is open. Right now after migration the
> chardev which is recreated is in it's default state, which assumes the
> guest is disconnected. Char devices carry no information across migration,
> but the virtio-serial does already carry the guest_connected state. This
> patch passes that bit to the chardev.

Put this information in the commit message and resend?
Amit Shah - July 27, 2011, 10:20 a.m.
On (Wed) 27 Jul 2011 [10:07:56], Alon Levy wrote:
> On Wed, Jul 27, 2011 at 07:45:25AM +0200, Markus Armbruster wrote:
> > Alon Levy <alevy@redhat.com> writes:
> > 
> > > Signed-off-by: Alon Levy <alevy@redhat.com>
> > > ---
> > >  hw/virtio-serial-bus.c |    8 +++++++-
> > >  1 files changed, 7 insertions(+), 1 deletions(-)
> > >
> > > diff --git a/hw/virtio-serial-bus.c b/hw/virtio-serial-bus.c
> > > index c5eb931..7a652ff 100644
> > > --- a/hw/virtio-serial-bus.c
> > > +++ b/hw/virtio-serial-bus.c
> > > @@ -618,14 +618,20 @@ static int virtio_serial_load(QEMUFile *f, void *opaque, int version_id)
> > >      for (i = 0; i < nr_active_ports; i++) {
> > >          uint32_t id;
> > >          bool host_connected;
> > > +        VirtIOSerialPortInfo *info;
> > >  
> > >          id = qemu_get_be32(f);
> > >          port = find_port_by_id(s, id);
> > >          if (!port) {
> > >              return -EINVAL;
> > >          }
> > > -
> > >          port->guest_connected = qemu_get_byte(f);
> > > +        info = DO_UPCAST(VirtIOSerialPortInfo, qdev, port->dev.info);
> > > +        if (port->guest_connected && info->guest_open) {
> > > +            /* replay guest open */
> > > +            info->guest_open(port);
> > > +
> > > +        }
> > >          host_connected = qemu_get_byte(f);
> > >          if (host_connected != port->host_connected) {
> > >              /*
> > 
> > The patch makes enough sense to me, but the commit message is
> > insufficient.  Why do you have to replay?  And what's being fixed?
> 
> When migrating a host with with a spice agent running the mouse becomes
> non operational after the migration. This is rhbz #718463, currently on
> spice-server but it seems this is a qemu-kvm issue. The problem is that
> after migration spice doesn't know the guest agent is open. Spice is just
> a char dev here. And a chardev cannot query it's device, the device has
> to let the chardev know when it is open. Right now after migration the
> chardev which is recreated is in it's default state, which assumes the
> guest is disconnected. Char devices carry no information across migration,
> but the virtio-serial does already carry the guest_connected state. This
> patch passes that bit to the chardev.

It's not guaranteed all ports will be chardevs.

My thinking was this can be handled by qemu-char-spice.c since it can
add a new migration section and if an image is being restored and the
guest agent channel is open after migration finishes, it can continue
its work from there.  What's the benefit of all virtio-serial ports
receiving a guest_open() event in this case?

Also, we'll be lying that a guest opened, since a guest was opened
much earlier, before migration.  Nothing has changed as far as the
guest is concerned, this is just some host-side tracking that has to
be done post-migrate, which belongs in individual devices / ports.

So I'm not completely sure that this is the right place for such a
notification.  However, if others feel this is fine, I'll accept the
patch.

(Also, when resending, make sure the whitespace changes don't go
through.)

Thanks,
		Amit
Alon Levy - July 27, 2011, 11:09 a.m.
On Wed, Jul 27, 2011 at 03:50:11PM +0530, Amit Shah wrote:
> On (Wed) 27 Jul 2011 [10:07:56], Alon Levy wrote:
> > On Wed, Jul 27, 2011 at 07:45:25AM +0200, Markus Armbruster wrote:
> > > Alon Levy <alevy@redhat.com> writes:
> > > 
> > > > Signed-off-by: Alon Levy <alevy@redhat.com>
> > > > ---
> > > >  hw/virtio-serial-bus.c |    8 +++++++-
> > > >  1 files changed, 7 insertions(+), 1 deletions(-)
> > > >
> > > > diff --git a/hw/virtio-serial-bus.c b/hw/virtio-serial-bus.c
> > > > index c5eb931..7a652ff 100644
> > > > --- a/hw/virtio-serial-bus.c
> > > > +++ b/hw/virtio-serial-bus.c
> > > > @@ -618,14 +618,20 @@ static int virtio_serial_load(QEMUFile *f, void *opaque, int version_id)
> > > >      for (i = 0; i < nr_active_ports; i++) {
> > > >          uint32_t id;
> > > >          bool host_connected;
> > > > +        VirtIOSerialPortInfo *info;
> > > >  
> > > >          id = qemu_get_be32(f);
> > > >          port = find_port_by_id(s, id);
> > > >          if (!port) {
> > > >              return -EINVAL;
> > > >          }
> > > > -
> > > >          port->guest_connected = qemu_get_byte(f);
> > > > +        info = DO_UPCAST(VirtIOSerialPortInfo, qdev, port->dev.info);
> > > > +        if (port->guest_connected && info->guest_open) {
> > > > +            /* replay guest open */
> > > > +            info->guest_open(port);
> > > > +
> > > > +        }
> > > >          host_connected = qemu_get_byte(f);
> > > >          if (host_connected != port->host_connected) {
> > > >              /*
> > > 
> > > The patch makes enough sense to me, but the commit message is
> > > insufficient.  Why do you have to replay?  And what's being fixed?
> > 
> > When migrating a host with with a spice agent running the mouse becomes
> > non operational after the migration. This is rhbz #718463, currently on
> > spice-server but it seems this is a qemu-kvm issue. The problem is that
> > after migration spice doesn't know the guest agent is open. Spice is just
> > a char dev here. And a chardev cannot query it's device, the device has
> > to let the chardev know when it is open. Right now after migration the
> > chardev which is recreated is in it's default state, which assumes the
> > guest is disconnected. Char devices carry no information across migration,
> > but the virtio-serial does already carry the guest_connected state. This
> > patch passes that bit to the chardev.
> 
> It's not guaranteed all ports will be chardevs.
> 

The wording may be off, but the code isn't - it doesn't check for chardev,
it calls the same guest_open callback that is called when guest_connected
is changed from 0 to 1. So the logic is:
1. Start from scratch.
2. guest_connected 0->1
3. info->guest_open(port)

And on migration target:
4. if (guest_connected)
5.  info->guest_open(port)

If that callback was non-NULL it would be called in a non-migration scenario
as well, so no reason to care if it is specifically a chardev or not, let alone
specifically a spice chardev or not.
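
The two paths listed above can be sketched in a few lines of C. This is only
an illustration, not the QEMU code: Port, PortInfo, handle_guest_open(),
load_port() and count_open() are hypothetical stand-ins invented for the
example, and the real virtio-serial structures carry far more state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, stripped-down stand-ins for the port and its callbacks. */
typedef struct Port Port;

typedef struct PortInfo {
    void (*guest_open)(Port *port);   /* optional, may be NULL */
} PortInfo;

struct Port {
    const PortInfo *info;
    bool guest_connected;
    int open_calls;                   /* bookkeeping for the example only */
};

/* Normal path: the guest opens the port, guest_connected goes 0 -> 1. */
static void handle_guest_open(Port *port)
{
    assert(port != NULL);
    if (port->guest_connected) {
        return;                       /* already open, nothing to report */
    }
    port->guest_connected = true;
    if (port->info->guest_open) {
        port->info->guest_open(port);
    }
}

/* Migration target: the flag comes from the saved state and the same
 * callback is replayed, so a freshly created backend learns about it. */
static void load_port(Port *port, bool saved_guest_connected)
{
    assert(port != NULL);
    port->guest_connected = saved_guest_connected;
    if (port->guest_connected && port->info->guest_open) {
        port->info->guest_open(port);
    }
}

/* Example backend callback that only counts how often it was told. */
static void count_open(Port *port)
{
    port->open_calls++;
}
```

In both paths the backend sees exactly one guest_open call per port that the
guest has open, and a port type without the callback is left alone.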

> My thinking was this can be handled by qemu-char-spice.c since it can
> add a new migration section and if an image is being restored and the
> guest agent channel is open after migration finishes, it can continue
> its work from there.  What's the benefit of all virtio-serial ports
> receiving a guest_open() event in this case?
> 

Consistency between non-migration guest open and migrating when guest connected.

> Also, we'll be lying that a guest opened, since a guest was opened
> much earlier, before migration.  Nothing has changed as far as the
> guest is concerned, this is just some host-side tracking that has to
> be done post-migrate, which belongs in individual devices / ports.

The callback is there on purpose; surely some qemu-side users exist. While
I understand the lying part about the time, it is worse to lie completely
by not mentioning the guest has opened the port.

> 
> So I'm not completely sure that this is the right place for such a
> notification.  However, if others feel this is fine, I'll accept the
> patch.
> 
> (Also, when resending, make sure the whitespace changes don't go
> through.)
> 

I removed them. We can continue this on the patch - I didn't cc you since
I thought you had already agreed and just wanted Armbru/Juan to take a look.

> Thanks,
> 		Amit
>
Amit Shah - July 27, 2011, 12:05 p.m.
On (Wed) 27 Jul 2011 [14:09:45], Alon Levy wrote:

> > Also, we'll be lying that a guest opened, since a guest was opened
> > much earlier, before migration.  Nothing has changed as far as the
> > guest is concerned, this is just some host-side tracking that has to
> > be done post-migrate, which belongs in individual devices / ports.
> 
> The callback is there on purpose, some qemu side users exist surely. While
> I understand the lying part about the time, it is worst to lie completely
> by not mentioning the guest has opened the port.

Guest has not re-opened the port.  When the guest opened the port,
spice did get the callback called.  After migration, guest state has
not changed.  Why should you get the callback again?  If you depend on
guest connectedness, after migration, just ensure you do whatever is
necessary.  I think there's no need to involve any other code in this.

		Amit
Alon Levy - July 27, 2011, 12:27 p.m.
On Wed, Jul 27, 2011 at 05:35:28PM +0530, Amit Shah wrote:
> On (Wed) 27 Jul 2011 [14:09:45], Alon Levy wrote:
> 
> > > Also, we'll be lying that a guest opened, since a guest was opened
> > > much earlier, before migration.  Nothing has changed as far as the
> > > guest is concerned, this is just some host-side tracking that has to
> > > be done post-migrate, which belongs in individual devices / ports.
> > 
> > The callback is there on purpose, some qemu side users exist surely. While
> > I understand the lying part about the time, it is worst to lie completely
> > by not mentioning the guest has opened the port.
> 
> Guest has not re-opened the port.  When the guest opened the port,
> spice did get the callback called.  After migration, guest state has
> not changed.  Why should you get the callback again?
Again, my point is that from the chardev pov this *is* the first callback.
You say the timing is wrong - correct, but I don't see any part in the api
that talks about timing, i.e. no guarantee to be broken.

Actually searching for current users it appears it is just spicevmc (spice-qemu-char.c).

> If you depend on
> guest connectedness, after migration, just ensure you do whatever is
> necessary.  I think there's no need to involve any other code in this.

There is no migration mechanism for char devices. There is no migration
mechanism for the spice server. All data that is passed is done via device migration -
qxl or in this case I suggest virtio-serial.

Do you think chardevices should be migratable? That would also work.

Do you think a separate callback would be better worded? Or add a "migrated" boolean?
Both are ugly, I agree.

The chardev does receive parameters, so it's possible to have the remote vm start with
a spicevmc chardev that has "migrated=1" as part of the command line. That looks much
uglier than this patch to me.

> 
> 		Amit
>
Markus Armbruster - July 27, 2011, 2:02 p.m.
Alon Levy <alevy@redhat.com> writes:

> On Wed, Jul 27, 2011 at 05:35:28PM +0530, Amit Shah wrote:
>> On (Wed) 27 Jul 2011 [14:09:45], Alon Levy wrote:
>> 
>> > > Also, we'll be lying that a guest opened, since a guest was opened
>> > > much earlier, before migration.  Nothing has changed as far as the
>> > > guest is concerned, this is just some host-side tracking that has to
>> > > be done post-migrate, which belongs in individual devices / ports.
>> > 
>> > The callback is there on purpose, some qemu side users exist surely. While
>> > I understand the lying part about the time, it is worst to lie completely
>> > by not mentioning the guest has opened the port.
>> 
>> Guest has not re-opened the port.  When the guest opened the port,
>> spice did get the callback called.  After migration, guest state has
>> not changed.  Why should you get the callback again?
> Again, my point is that from the chardev pov this *is* the first callback.
> You say the timing is wrong - correct, but I don't see any part in the api
> that talks about timing, i.e. no gurantee to be broken.
>
> Actually searching for current users it appears it is just spicevmc (spice-qemu-char.c).
>
>> If you depend on
>> guest connectedness, after migration, just ensure you do whatever is
>> necessary.  I think there's no need to involve any other code in this.
>
> There is no migration mechanism for char devices. There is no migration
> mechanism for the spice server. All data that is passed is done via device migration -
> qxl or in this case I suggest virtio-serial.
>
> Do you think chardevices should be migratable? that would also work.

No.

We migrate device models (frontends).  The backends get recreated from
scratch.

> Do you think a separate callback would be better worded? or add a "migrated" boolean?
> both are ugly, I agree.

I don't see the need for two separate callbacks.  From the backend's
point of view, there's no difference between normal startup and
migration startup.

If the name "guest_open" bothers you, find another one.  guest_connect?
The guest (or rather the device model) connects to a newly created
backend chardev.  The guest just doesn't know it (the device model does).

> The chardev does receive parameters, it's possible to have the remote vm start with
> a spicevmc chardev that has "migrated=1" as part of the command line. That looks much
> uglier then this patch to me.

There's no precedent for "migrated=1".

For what it's worth, my recent block layer series also has device models
run backend callbacks after migration.  Check out "[PATCH 37/55]
ide/atapi: Preserve tray state on migration" and "[PATCH 38/55]
scsi-disk: Preserve tray state on migration".
Anthony Liguori - July 27, 2011, 2:16 p.m.
On 07/27/2011 02:07 AM, Alon Levy wrote:
> On Wed, Jul 27, 2011 at 07:45:25AM +0200, Markus Armbruster wrote:
>> Alon Levy<alevy@redhat.com>  writes:
>>
>>> Signed-off-by: Alon Levy<alevy@redhat.com>
>>> ---
>>>   hw/virtio-serial-bus.c |    8 +++++++-
>>>   1 files changed, 7 insertions(+), 1 deletions(-)
>>>
>>> diff --git a/hw/virtio-serial-bus.c b/hw/virtio-serial-bus.c
>>> index c5eb931..7a652ff 100644
>>> --- a/hw/virtio-serial-bus.c
>>> +++ b/hw/virtio-serial-bus.c
>>> @@ -618,14 +618,20 @@ static int virtio_serial_load(QEMUFile *f, void *opaque, int version_id)
>>>       for (i = 0; i<  nr_active_ports; i++) {
>>>           uint32_t id;
>>>           bool host_connected;
>>> +        VirtIOSerialPortInfo *info;
>>>
>>>           id = qemu_get_be32(f);
>>>           port = find_port_by_id(s, id);
>>>           if (!port) {
>>>               return -EINVAL;
>>>           }
>>> -
>>>           port->guest_connected = qemu_get_byte(f);
>>> +        info = DO_UPCAST(VirtIOSerialPortInfo, qdev, port->dev.info);
>>> +        if (port->guest_connected&&  info->guest_open) {
>>> +            /* replay guest open */
>>> +            info->guest_open(port);
>>> +
>>> +        }
>>>           host_connected = qemu_get_byte(f);
>>>           if (host_connected != port->host_connected) {
>>>               /*
>>
>> The patch makes enough sense to me, but the commit message is
>> insufficient.  Why do you have to replay?  And what's being fixed?
>
> When migrating a host with with a spice agent running the mouse becomes
> non operational after the migration. This is rhbz #718463, currently on
> spice-server but it seems this is a qemu-kvm issue. The problem is that
> after migration spice doesn't know the guest agent is open.

The problem is that guest_open is a hack.

You want connection semantics.  You need the following information from 
the backend and the client:

1) backend is associated with a transport.  The transport may disconnect 
at any point in time.  The backend needs to have explicit state 
transitions associated with the transport disconnecting and connecting.

2) the client may disconnect and reconnect at any point in time.  A 
device model reset is a disconnect followed by a reconnect.

This gives you the following matrix of states:

A: backend-connected, client-connected
B: backend-disconnected, client-disconnected
C: backend-connected, client-disconnected
D: backend-disconnected, client-connected

The state transition diagram looks like this:

B: for some devices, immediately goto C.  other devices, on accept() 
goto D.  if in B and client connects, goto D

C: if transport disconnects, goto B. if client connects, goto A

D: if transport connects, goto A.  if client disconnects, goto B

A: if transport disconnects, goto B, if client disconnects, goto C

The problem is that guest_open() is a poor approximation of 
'client-connected' and it's not used universally.  We need to introduce 
proper state tracking to the character device layer and we need to have 
a proper connection function that is used by all char device clients.

Semantically, write should only be allowed in states A and D.  Read 
should only be allowed in states A and C.

C and D should have very well defined semantics about what happens to 
the data that is written.  Arguably, read/write should not be allowed in 
states C/D.

Device reset should always trigger a client reconnect.  Migration resets 
devices so migration would Just Work if we modelled the state 
transitions appropriately.
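
The state matrix and transitions above can be sketched as a small C state
machine. This is a rough illustration under assumptions, not an existing
QEMU interface: ConnState, ConnEvent, conn_next(), conn_reset(),
conn_can_read() and conn_can_write() are names made up for the example. Only
the transitions spelled out above are encoded; the device-specific rules for
leaving B ("immediately goto C", accept()) are not modelled, and any event
not listed leaves the state unchanged.

```c
#include <assert.h>
#include <stdbool.h>

/* The four states from the matrix above. */
typedef enum {
    STATE_A,  /* backend-connected,    client-connected    */
    STATE_B,  /* backend-disconnected, client-disconnected */
    STATE_C,  /* backend-connected,    client-disconnected */
    STATE_D   /* backend-disconnected, client-connected    */
} ConnState;

typedef enum {
    EV_TRANSPORT_CONNECT,
    EV_TRANSPORT_DISCONNECT,
    EV_CLIENT_CONNECT,
    EV_CLIENT_DISCONNECT
} ConnEvent;

/* Apply one event, following the transition list as written above. */
static ConnState conn_next(ConnState s, ConnEvent ev)
{
    switch (s) {
    case STATE_A:
        if (ev == EV_TRANSPORT_DISCONNECT) return STATE_B;
        if (ev == EV_CLIENT_DISCONNECT)    return STATE_C;
        break;
    case STATE_B:
        if (ev == EV_CLIENT_CONNECT)       return STATE_D;
        break;
    case STATE_C:
        if (ev == EV_TRANSPORT_DISCONNECT) return STATE_B;
        if (ev == EV_CLIENT_CONNECT)       return STATE_A;
        break;
    case STATE_D:
        if (ev == EV_TRANSPORT_CONNECT)    return STATE_A;
        if (ev == EV_CLIENT_DISCONNECT)    return STATE_B;
        break;
    }
    return s;  /* unlisted events are ignored in this sketch */
}

/* A device model reset is a client disconnect followed by a reconnect. */
static ConnState conn_reset(ConnState s)
{
    assert(s == STATE_A || s == STATE_D);  /* client must be connected */
    return conn_next(conn_next(s, EV_CLIENT_DISCONNECT), EV_CLIENT_CONNECT);
}

/* Write is only allowed in A and D, read only in A and C. */
static bool conn_can_write(ConnState s) { return s == STATE_A || s == STATE_D; }
static bool conn_can_read(ConnState s)  { return s == STATE_A || s == STATE_C; }
```

With this encoding a reset returns to the state it started from, but the
backend observes an explicit disconnect and connect pair on the way.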

Regards,

Anthony Liguori
Alon Levy - July 27, 2011, 2:49 p.m.
On Wed, Jul 27, 2011 at 09:16:33AM -0500, Anthony Liguori wrote:
> On 07/27/2011 02:07 AM, Alon Levy wrote:
> >On Wed, Jul 27, 2011 at 07:45:25AM +0200, Markus Armbruster wrote:
> >>Alon Levy<alevy@redhat.com>  writes:
> >>
> >>>Signed-off-by: Alon Levy<alevy@redhat.com>
> >>>---
> >>>  hw/virtio-serial-bus.c |    8 +++++++-
> >>>  1 files changed, 7 insertions(+), 1 deletions(-)
> >>>
> >>>diff --git a/hw/virtio-serial-bus.c b/hw/virtio-serial-bus.c
> >>>index c5eb931..7a652ff 100644
> >>>--- a/hw/virtio-serial-bus.c
> >>>+++ b/hw/virtio-serial-bus.c
> >>>@@ -618,14 +618,20 @@ static int virtio_serial_load(QEMUFile *f, void *opaque, int version_id)
> >>>      for (i = 0; i<  nr_active_ports; i++) {
> >>>          uint32_t id;
> >>>          bool host_connected;
> >>>+        VirtIOSerialPortInfo *info;
> >>>
> >>>          id = qemu_get_be32(f);
> >>>          port = find_port_by_id(s, id);
> >>>          if (!port) {
> >>>              return -EINVAL;
> >>>          }
> >>>-
> >>>          port->guest_connected = qemu_get_byte(f);
> >>>+        info = DO_UPCAST(VirtIOSerialPortInfo, qdev, port->dev.info);
> >>>+        if (port->guest_connected&&  info->guest_open) {
> >>>+            /* replay guest open */
> >>>+            info->guest_open(port);
> >>>+
> >>>+        }
> >>>          host_connected = qemu_get_byte(f);
> >>>          if (host_connected != port->host_connected) {
> >>>              /*
> >>
> >>The patch makes enough sense to me, but the commit message is
> >>insufficient.  Why do you have to replay?  And what's being fixed?
> >
> >When migrating a host with with a spice agent running the mouse becomes
> >non operational after the migration. This is rhbz #718463, currently on
> >spice-server but it seems this is a qemu-kvm issue. The problem is that
> >after migration spice doesn't know the guest agent is open.
> 
> The problem is that guest_open is a hack.
> 
> You want connection semantics.  You need the following information
> from the backend and the client:
> 
> 1) backend is associated with a transport.  The transport may
> disconnect at any point in time.  The backend needs to have explicit
> state transitions associated with the transport disconnecting and
> connecting.
> 
> 2) the client may disconnect and reconnect at any point in time.  A
> device model reset is a disconnect followed by a reconnect.
> 
> This gives you the following matrix of states:
> 
> A: backend-connected, client-connected
> B: backend-disconnected, client-disconnected
> C: backend-connected, client-disconnected
> D: backend-disconnected, client-connected
> 
> The state transition diagram looks like this:
> 
> B: for some devices, immediately goto C.  other devices, on accept()
> goto D.  if in B and client connects, goto D
> 
> C: if transport disconnects, goto B. if client connects, goto A
> 
> D: if transport connects, goto A.  if client disconects, goto B
> 
> A: if transport disconnects, goto B, if client disconnects, goto C
> 
> The problem is that guest_open() is a poor approximation of
> 'client-connected' and it's not used universally.  We need to
> introduce proper state tracking to the character device layer and we
> need to have a proper connection function that is used by all char
> device clients.
> 
> Semantically, write should only be allowed in states A and D.  Read
> should only be allowed in states A and C.
> 
> C and D should have very well defined semantics about what happens
> to the data that is written  Arguably, read/write should not be
> allowed in states C/D.
> 
> Device reset should always trigger a client reconnect.  Migration
> resets devices so migration would Just Work if we modelled the state
> transitions appropriately.
Are you saying currently on migration the guest (client above) always
receives an event from virtio? The guest_open callback happens when
a guest operation happens, not when the device gets reset, unless the latter
triggers the former, but I don't understand how that would happen since a
reset can happen while the guest isn't ready to handle anything (guest is
booting).
I do see a virtio_pci_reset does a virtio_reset which sends the
VIRTIO_NO_VECTOR interrupt, but I don't understand what happens after that.

Besides, I understand the need to fix the connection semantics of chardevs,
but the situation is broken right now and even if someone were to write this
I don't believe you would just take it to 0.15.0, would you?

Also, the conversation is still ongoing but Armbru mentioned some relevant cases
in http://lists.nongnu.org/archive/html/qemu-devel/2011-07/msg03221.html
backing the fix-the-hack approach (at least for 0.15.0).

> 
> Regards,
> 
> Anthony Liguori
>
Anthony Liguori - July 27, 2011, 3:01 p.m.
On 07/27/2011 09:49 AM, Alon Levy wrote:
> On Wed, Jul 27, 2011 at 09:16:33AM -0500, Anthony Liguori wrote:
>> On 07/27/2011 02:07 AM, Alon Levy wrote:
>>> On Wed, Jul 27, 2011 at 07:45:25AM +0200, Markus Armbruster wrote:
>>>> Alon Levy<alevy@redhat.com>   writes:
>>>>
>>>>> Signed-off-by: Alon Levy<alevy@redhat.com>
>>>>> ---
>>>>>   hw/virtio-serial-bus.c |    8 +++++++-
>>>>>   1 files changed, 7 insertions(+), 1 deletions(-)
>>>>>
>>>>> diff --git a/hw/virtio-serial-bus.c b/hw/virtio-serial-bus.c
>>>>> index c5eb931..7a652ff 100644
>>>>> --- a/hw/virtio-serial-bus.c
>>>>> +++ b/hw/virtio-serial-bus.c
>>>>> @@ -618,14 +618,20 @@ static int virtio_serial_load(QEMUFile *f, void *opaque, int version_id)
>>>>>       for (i = 0; i<   nr_active_ports; i++) {
>>>>>           uint32_t id;
>>>>>           bool host_connected;
>>>>> +        VirtIOSerialPortInfo *info;
>>>>>
>>>>>           id = qemu_get_be32(f);
>>>>>           port = find_port_by_id(s, id);
>>>>>           if (!port) {
>>>>>               return -EINVAL;
>>>>>           }
>>>>> -
>>>>>           port->guest_connected = qemu_get_byte(f);
>>>>> +        info = DO_UPCAST(VirtIOSerialPortInfo, qdev, port->dev.info);
>>>>> +        if (port->guest_connected&&   info->guest_open) {
>>>>> +            /* replay guest open */
>>>>> +            info->guest_open(port);
>>>>> +
>>>>> +        }
>>>>>           host_connected = qemu_get_byte(f);
>>>>>           if (host_connected != port->host_connected) {
>>>>>               /*
>>>>
>>>> The patch makes enough sense to me, but the commit message is
>>>> insufficient.  Why do you have to replay?  And what's being fixed?
>>>
>>> When migrating a host with with a spice agent running the mouse becomes
>>> non operational after the migration. This is rhbz #718463, currently on
>>> spice-server but it seems this is a qemu-kvm issue. The problem is that
>>> after migration spice doesn't know the guest agent is open.
>>
>> The problem is that guest_open is a hack.
>>
>> You want connection semantics.  You need the following information
>> from the backend and the client:
>>
>> 1) backend is associated with a transport.  The transport may
>> disconnect at any point in time.  The backend needs to have explicit
>> state transitions associated with the transport disconnecting and
>> connecting.
>>
>> 2) the client may disconnect and reconnect at any point in time.  A
>> device model reset is a disconnect followed by a reconnect.
>>
>> This gives you the following matrix of states:
>>
>> A: backend-connected, client-connected
>> B: backend-disconnected, client-disconnected
>> C: backend-connected, client-disconnected
>> D: backend-disconnected, client-connected
>>
>> The state transition diagram looks like this:
>>
>> B: for some devices, immediately goto C.  other devices, on accept()
>> goto D.  if in B and client connects, goto D
>>
>> C: if transport disconnects, goto B. if client connects, goto A
>>
>> D: if transport connects, goto A.  if client disconects, goto B
>>
>> A: if transport disconnects, goto B, if client disconnects, goto C
>>
>> The problem is that guest_open() is a poor approximation of
>> 'client-connected' and it's not used universally.  We need to
>> introduce proper state tracking to the character device layer and we
>> need to have a proper connection function that is used by all char
>> device clients.
>>
>> Semantically, write should only be allowed in states A and D.  Read
>> should only be allowed in states A and C.
>>
>> C and D should have very well defined semantics about what happens
>> to the data that is written  Arguably, read/write should not be
>> allowed in states C/D.
>>
>> Device reset should always trigger a client reconnect.  Migration
>> resets devices so migration would Just Work if we modelled the state
>> transitions appropriately.
> Are you saying currently on migration the guest (client above) always
> receives an event from virtio?

No.  Everything is all over the place today.  I'm saying this is how it 
should work:

All character devices need to start in the same state.  Today, as it 
stands, everything but spicevmc starts in D.  spicevmc starts in B. 
guest_open()/guest_close() assumes the device starts in D.

guest_open/guest_close is a fundamentally broken interface because it 
only works for spicevmc.

We need to generalize guest_open/guest_close and make all backends start 
in D.  That means we need to add guest_open() calls to the initfns of 
any device that uses a CharDriverState.

And the natural place to add a guest_close() is to the destroy functions 
of any device.

reset() is semantically a close followed by an open, so for all devices 
that use CDSs, we should add a guest_close() followed by a guest_open().

Now, for virtio-serial, since it supports multiple ports, in its 
reset() path, it needs to do close() followed by open.  Since 
vmstate_load() is basically an extension of reset(), you'll have to 
open() any new ports that appear in that path.
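
A rough sketch of that proposal, in self-contained C with stand-in types. The
names Port, PortInfo, bus_reset and bus_load are hypothetical; the real
structures live in hw/virtio-serial-bus.c and differ from these. The sketch
assumes reset is modelled as close followed by open on every port, and that
the load path replays guest_open for each port the incoming state marks as
guest-connected, which is also what the patch under discussion does.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins, not the QEMU definitions. */
typedef struct Port Port;

typedef struct {
    void (*guest_open)(Port *port);  /* optional, may be NULL */
    void (*guest_close)(Port *port); /* optional, may be NULL */
} PortInfo;

struct Port {
    const PortInfo *info;
    bool guest_connected;
    int opens;  /* demo counters so the callbacks are observable */
    int closes;
};

/* Demo callbacks: just count how often they were invoked. */
static void demo_open(Port *p)  { p->opens++; }
static void demo_close(Port *p) { p->closes++; }

/* Close a port; the callback fires only if the port was open. */
static void port_close(Port *p)
{
    if (p->guest_connected && p->info->guest_close) {
        p->info->guest_close(p);
    }
    p->guest_connected = false;
}

/* Open a port and notify the backend if it cares. */
static void port_open(Port *p)
{
    p->guest_connected = true;
    if (p->info->guest_open) {
        p->info->guest_open(p);
    }
}

/* reset() modelled as close followed by open on every port. */
static void bus_reset(Port *ports, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        port_close(&ports[i]);
        port_open(&ports[i]);
    }
}

/* Load path: restore the saved flag and replay guest_open if it is set. */
static void bus_load(Port *ports, const bool *saved, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        ports[i].guest_connected = saved[i];
        if (saved[i] && ports[i].info->guest_open) {
            ports[i].info->guest_open(&ports[i]);
        }
    }
}
```

Whether reset should really imply a guest-visible close and open is exactly
what is disputed in this thread, so treat this as an illustration of the
proposal rather than agreed behaviour.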

> The guest_open callback happens when
> a guest operation happens, not when the device gets reset, unless the latter
> triggers the former, but I don't understand how that would happen since a
> reset can happen while the guest isn't ready to handle anything (guest is
> booting).
> I do see a virtio_pci_reset does a virtio_reset which sends the
> VIRTIO_NO_VECTOR interrupt, but I don't understand what happens after that.
>
> Besides, I understand the need to fix the connection semantics of chardevs,
> but the situation is broken right now and even if someone were to write this
> I don't believe you would just take it to 0.15.0, would you?

No, guest_open() is fundamentally broken right now.  It's not that hard 
to fix, but it's too much for 0.15.0, I suspect.

>
> Also, the conversation is still ongoing but Armbru mentioned some ''relevant-cases''
> in http://lists.nongnu.org/archive/html/qemu-devel/2011-07/msg03221.html
> backing the fix-the-hack approach (at least for 0.15.0).

The problem is that this isn't the only place where this problem will 
occur.  libvirt just implemented reboot via shutdown and reset, which 
means that if you don't do the right thing in reset(), you're going to 
have the same problem there.

Fixing it right isn't hard, let's focus on doing it the right way.

Regards,

Anthony Liguori

Alon Levy - July 27, 2011, 3:32 p.m.
On Wed, Jul 27, 2011 at 10:01:57AM -0500, Anthony Liguori wrote:
> On 07/27/2011 09:49 AM, Alon Levy wrote:
> >On Wed, Jul 27, 2011 at 09:16:33AM -0500, Anthony Liguori wrote:
> >>On 07/27/2011 02:07 AM, Alon Levy wrote:
> >>>On Wed, Jul 27, 2011 at 07:45:25AM +0200, Markus Armbruster wrote:
> >>>>Alon Levy <alevy@redhat.com> writes:
> >>>>
> >>>>>Signed-off-by: Alon Levy <alevy@redhat.com>
> >>>>>---
> >>>>>  hw/virtio-serial-bus.c |    8 +++++++-
> >>>>>  1 files changed, 7 insertions(+), 1 deletions(-)
> >>>>>
> >>>>>diff --git a/hw/virtio-serial-bus.c b/hw/virtio-serial-bus.c
> >>>>>index c5eb931..7a652ff 100644
> >>>>>--- a/hw/virtio-serial-bus.c
> >>>>>+++ b/hw/virtio-serial-bus.c
> >>>>>@@ -618,14 +618,20 @@ static int virtio_serial_load(QEMUFile *f, void *opaque, int version_id)
> >>>>>      for (i = 0; i < nr_active_ports; i++) {
> >>>>>          uint32_t id;
> >>>>>          bool host_connected;
> >>>>>+        VirtIOSerialPortInfo *info;
> >>>>>
> >>>>>          id = qemu_get_be32(f);
> >>>>>          port = find_port_by_id(s, id);
> >>>>>          if (!port) {
> >>>>>              return -EINVAL;
> >>>>>          }
> >>>>>-
> >>>>>          port->guest_connected = qemu_get_byte(f);
> >>>>>+        info = DO_UPCAST(VirtIOSerialPortInfo, qdev, port->dev.info);
> >>>>>+        if (port->guest_connected && info->guest_open) {
> >>>>>+            /* replay guest open */
> >>>>>+            info->guest_open(port);
> >>>>>+
> >>>>>+        }
> >>>>>          host_connected = qemu_get_byte(f);
> >>>>>          if (host_connected != port->host_connected) {
> >>>>>              /*
> >>>>
> >>>>The patch makes enough sense to me, but the commit message is
> >>>>insufficient.  Why do you have to replay?  And what's being fixed?
> >>>
> >>>When migrating a host with a spice agent running the mouse becomes
> >>>non operational after the migration. This is rhbz #718463, currently on
> >>>spice-server but it seems this is a qemu-kvm issue. The problem is that
> >>>after migration spice doesn't know the guest agent is open.
> >>
> >>The problem is that guest_open is a hack.
> >>
> >>You want connection semantics.  You need the following information
> >>from the backend and the client:
> >>
> >>1) backend is associated with a transport.  The transport may
> >>disconnect at any point in time.  The backend needs to have explicit
> >>state transitions associated with the transport disconnecting and
> >>connecting.
> >>
> >>2) the client may disconnect and reconnect at any point in time.  A
> >>device model reset is a disconnect followed by a reconnect.
> >>
> >>This gives you the following matrix of states:
> >>
> >>A: backend-connected, client-connected
> >>B: backend-disconnected, client-disconnected
> >>C: backend-connected, client-disconnected
> >>D: backend-disconnected, client-connected
> >>
> >>The state transition diagram looks like this:
> >>
> >>B: for some devices, immediately goto C.  other devices, on accept()
> >>goto D.  if in B and client connects, goto D
> >>
> >>C: if transport disconnects, goto B. if client connects, goto A
> >>
> >>D: if transport connects, goto A.  if client disconnects, goto B
> >>
> >>A: if transport disconnects, goto B, if client disconnects, goto C
> >>
> >>The problem is that guest_open() is a poor approximation of
> >>'client-connected' and it's not used universally.  We need to
> >>introduce proper state tracking to the character device layer and we
> >>need to have a proper connection function that is used by all char
> >>device clients.
> >>
> >>Semantically, write should only be allowed in states A and D.  Read
> >>should only be allowed in states A and C.
> >>
> >>C and D should have very well defined semantics about what happens
> >>to the data that is written.  Arguably, read/write should not be
> >>allowed in states C/D.
> >>
> >>Device reset should always trigger a client reconnect.  Migration
> >>resets devices so migration would Just Work if we modelled the state
> >>transitions appropriately.
> >Are you saying currently on migration the guest (client above) always
> >receives an event from virtio?
> 
> No.  Everything is all over the place today.  I'm saying this is how
> it should work:
> 
> All character devices need to start in the same state.  As it stands
> today, everything but spicevmc starts in D; spicevmc starts in B.
> guest_open()/guest_close() assume the device starts in D.
> 

I think that's squashing two concepts into one. The guest open/close
is mirroring a state inside the guest. The client_connected is a different
thing. A reset of the device does not equal a guest_close + guest_open,
since the guest isn't aware of it at all (the whole reset path I've
followed was wrong - it happens while the guest is stopped, it's only
started after migration is finished, so the guest sees nothing).

> guest_open/guest_close is a fundamentally broken interface because
> it only works for spicevmc.

It's a third bit. The client concept is really the device, not the
guest. And currently nothing is interested in it but spicevmc, so
there is no harm in it being unimplemented by any other chardev.

> 
> We need to generalize guest_open/guest_close and make all backends
> start in D.  That means we need to add guest_open() calls to the
> initfns of any device that uses a CharDriverState.
> 
> And the natural place to add a guest_close() is to the destroy
> functions of any device.
> 
> reset() is semantically a close followed by an open, so for all
> devices that use CDSs, we should add a guest_close() followed by a
> guest_open().
> 
> Now, for virtio-serial, since it supports multiple ports, in its
> reset() path, it needs to do close() followed by open.  Since
> vmstate_load() is basically an extension of reset(), you'll have to
> open() any new ports that appear in that path.
> 
> >The guest_open callback happens when
> >a guest operation happens, not when the device gets reset, unless the latter
> >triggers the former, but I don't understand how that would happen since a
> >reset can happen while the guest isn't ready to handle anything (guest is
> >booting).
> >I do see a virtio_pci_reset does a virtio_reset which sends the
> >VIRTIO_NO_VECTOR interrupt, but I don't understand what happens after that.
> >
> >Besides, I understand the need to fix the connection semantics of chardevs,
> >but the situation is broken right now and even if someone were to write this
> >I don't believe you would just take it to 0.15.0, would you?
> 
> No, guest_open() is fundamentally broken right now.  It's not that
> hard to fix, but it's too much for 0.15.0, I suspect.
> 
> >
> >Also, the conversation is still ongoing but Armbru mentioned some ''relevant-cases''
> >in http://lists.nongnu.org/archive/html/qemu-devel/2011-07/msg03221.html
> >backing the fix-the-hack approach (at least for 0.15.0).
> 
> The problem is that this isn't the only place where this problem will
> occur.  libvirt just implemented reboot via shutdown and reset, which
> means that if you don't do the right thing in reset(), you're going
> to have the same problem there.
> 
> Fixing it right isn't hard, let's focus on doing it the right way.
> 

Patch

diff --git a/hw/virtio-serial-bus.c b/hw/virtio-serial-bus.c
index c5eb931..7a652ff 100644
--- a/hw/virtio-serial-bus.c
+++ b/hw/virtio-serial-bus.c
@@ -618,14 +618,20 @@  static int virtio_serial_load(QEMUFile *f, void *opaque, int version_id)
     for (i = 0; i < nr_active_ports; i++) {
         uint32_t id;
         bool host_connected;
+        VirtIOSerialPortInfo *info;
 
         id = qemu_get_be32(f);
         port = find_port_by_id(s, id);
         if (!port) {
             return -EINVAL;
         }
-
         port->guest_connected = qemu_get_byte(f);
+        info = DO_UPCAST(VirtIOSerialPortInfo, qdev, port->dev.info);
+        if (port->guest_connected && info->guest_open) {
+            /* replay guest open */
+            info->guest_open(port);
+
+        }
         host_connected = qemu_get_byte(f);
         if (host_connected != port->host_connected) {
             /*