Patchwork qemu-ga behavior on virtio-serial unplug

login
register
mail settings
Submitter Michael Roth
Date June 20, 2013, 3:12 p.m.
Message ID <20130620151230.GD7866@vm>
Download mbox | patch
Permalink /patch/252997/
State New
Headers show

Comments

Michael Roth - June 20, 2013, 3:12 p.m.
On Wed, Jun 19, 2013 at 01:17:57PM +0200, Laszlo Ersek wrote:
> Hello Michael,
> 
> this is with reference to
> <https://bugzilla.redhat.com/show_bug.cgi?id=907733>.
> 
> Ever since the initial qemu-ga commit AFAICS an exception for
> virtio-serial has existed, when reading EOF from the channel.
> 
> For isa-serial, EOF results in the client connection being closed. I
> assume this exits the glib main loop somehow (otherwise qemu-ga would
> just sit there and do nothing, as no further connections are accepted I
> think).

I think it would actually do the latter, unfortunately. It's distinct
from virtio-serial handling in that we remove the GSource by return false
in the event handler (qga/main.c:channel_event_cb), but I think we'd
need to drop in a g_main_loop_quit() to actually get qemu-ga to exit in
that scenario.

This doesn't normally get triggered though, as isa-serial does not send
EOF when nothing is connected to the chardev backend, but instead just
blocks. Might till make sense to make qemu-ga exit in this case though
since it won't be doing anything useful and wrapper scripts would at
least have some indication that something is up.

> 
> For a unix domain socket, we can continue accepting new connections
> after reading EOF.
> 
> For virtio-serial, EOF means "no host-side process is connected". In
> this case we sleep a bit and go back to reading from the same fd (and
> this turns into a sleep loop until the next host-side process connects).
> 
> 
> Can we tell "virtio-serial port unplug" from "no host-side process"?
> Because in the former case qemu-ga should really close the channel (and
> maybe exit (*)), otherwise the unplug won't complete in the guest kernel.
> 
> According to Amit's comments in the RHBZ, at unplug a SIGIO is
> delivered, and a POLLHUP event is reported. However,
> 
> (a) I think the glib iochannel abstraction doesn't allow us to tell
> POLLHUP apart from reading EOF;

AFAICT we can actually access the POLLHUP event via the 'condition' param
that gets passed to the cb, but the issue is we also get POLLUP when
the chardev backend isn't open.

> 
> (b) delivery of an unhandled SIGIO seems to terminate the victim
> process. qemu-ga doesn't seem to either catch or block SIGIO, which is
> why I think a SIGIO signal is not sent in reality (maybe we should ask
> for it first?)
> 
> ... Actually I'm confused about this as well. The virtio-serial port
> *is* opened with O_ASYNC (and on Solaris, it is replaced with an
> I_SETSIG ioctl()). What's the reason for this? g_io_channel_unix_new()
> doesn't seem to list it as a requirement, and qemu-ga doesn't seem to
> handle SIGIO.

At some point I played around with trying to use SIGIO to handle channel
resets and whatnot (since we're also supposed to get one when someone
opens the chardev backend and causes VIRTIO_CONSOLE_PORT_OPEN to get
sent). I don't think I ever got it working, SIGIO doesn't seem to get
sent, so that O_ASYNC might just be a relic from that.

I tried installing a handler retested host-connect as well as hot
unplug and still don't seem to be getting the signal. Not sure if i'm
doing something wrong or if there's an issue with the guest driver.

I did notice something interesting though:

1371740628.596505: debug: cb: condition: 17, status: 2
1371740628.596541: debug: received EOF
1371740628.395726: debug: cb: condition: 17, status: 2
1371740628.395760: debug: received EOF
1371740628.496035: debug: cb: condition: 17, status: 2
1371740628.496072: debug: received EOF
1371740628.596505: debug: cb: condition: 17, status: 2
1371740628.596541: debug: received EOF

<host opened chardev backend>

1371740634.195524: debug: cb: condition: 1, status: 1
1371740634.195556: debug: read data, count: 25, data:
{'execute':'guest-ping'}

1371740634.195634: debug: process_event: called
1371740634.195660: debug: processing command
1371740634.196007: debug: sending data, count: 15

<virtio-serial unplugged>

1371740644.113346: debug: cb: condition: 16, status: 2
1371740644.113379: debug: received EOF
1371740644.213694: debug: cb: condition: 16, status: 2
1371740644.213725: debug: received EOF
1371740644.314041: debug: cb: condition: 16, status: 2
1371740644.314168: debug: received EOF

i.e. we got the POLLHUP if we read from an
unconnected-but-present port, and we *don't* get the POLLHUP
if the port has been unplugged.

And in none of these cases do the SIGIO seem to be sent.

Here's the debug stuff i added:


> 
> In any case we'd need a way to tell "host side close" from "port unplug".
> 

Poking around a bit it seems that the SIGIO can be tied to a specific
kind of event we can extract via siginfo_t. Right now it looks like
virtio-serial is hard-coded to set POLL_OUT, but with some driver
changes we could maybe tie unplug to POLL_HUP or something.

So with some driver changes qemu-ga can be made to handle this
gracefully, but otherwise I don't have any good ideas.

The only that comes to mind is adding a 'quit' command to qemu-ga that
libvirt could call prior to unplug, and once we get around to integrating
qemu-ga into qmp qemu could issue it internally as part of the unplug.
This isn't consumable for other stuff uses virtio-serial though so I
think working SIGIO into doing what we want is probably the best
approach.

There's also taking advantage of the above behavior (EOF and no POLLHUP
means we were hot-unplugged) but based I what I've read that's not the
intended behavior.

> (*) Then, there's the question what to do after unplug. Should we keep
> reopening the same virtio-serial port (and sleeping a bit in-between)?
> Or exit and let udev / systemd restart qemu-ga on any new virtio-serial
> port, passing -p accordingly?

Event-based would be pretty spiffy, but that doesn't preclude us from
adding a "--wait-for-channel" type of option that plays a little more
nicely with a watchdog-style deployment.

> 
> Thanks
> Laszlo
>
Michael Roth - June 20, 2013, 3:20 p.m.
On Thu, Jun 20, 2013 at 10:12:30AM -0500, mdroth wrote:
> On Wed, Jun 19, 2013 at 01:17:57PM +0200, Laszlo Ersek wrote:
> > Hello Michael,
> > 
> > this is with reference to
> > <https://bugzilla.redhat.com/show_bug.cgi?id=907733>.
> > 
> > Ever since the initial qemu-ga commit AFAICS an exception for
> > virtio-serial has existed, when reading EOF from the channel.
> > 
> > For isa-serial, EOF results in the client connection being closed. I
> > assume this exits the glib main loop somehow (otherwise qemu-ga would
> > just sit there and do nothing, as no further connections are accepted I
> > think).
> 
> I think it would actually do the latter, unfortunately. It's distinct
> from virtio-serial handling in that we remove the GSource by return false
> in the event handler (qga/main.c:channel_event_cb), but I think we'd
> need to drop in a g_main_loop_quit() to actually get qemu-ga to exit in
> that scenario.
> 
> This doesn't normally get triggered though, as isa-serial does not send
> EOF when nothing is connected to the chardev backend, but instead just
> blocks. Might till make sense to make qemu-ga exit in this case though
> since it won't be doing anything useful and wrapper scripts would at
> least have some indication that something is up.
> 
> > 
> > For a unix domain socket, we can continue accepting new connections
> > after reading EOF.
> > 
> > For virtio-serial, EOF means "no host-side process is connected". In
> > this case we sleep a bit and go back to reading from the same fd (and
> > this turns into a sleep loop until the next host-side process connects).
> > 
> > 
> > Can we tell "virtio-serial port unplug" from "no host-side process"?
> > Because in the former case qemu-ga should really close the channel (and
> > maybe exit (*)), otherwise the unplug won't complete in the guest kernel.
> > 
> > According to Amit's comments in the RHBZ, at unplug a SIGIO is
> > delivered, and a POLLHUP event is reported. However,
> > 
> > (a) I think the glib iochannel abstraction doesn't allow us to tell
> > POLLHUP apart from reading EOF;
> 
> AFAICT we can actually access the POLLHUP event via the 'condition' param
> that gets passed to the cb, but the issue is we also get POLLUP when
> the chardev backend isn't open.
> 
> > 
> > (b) delivery of an unhandled SIGIO seems to terminate the victim
> > process. qemu-ga doesn't seem to either catch or block SIGIO, which is
> > why I think a SIGIO signal is not sent in reality (maybe we should ask
> > for it first?)
> > 
> > ... Actually I'm confused about this as well. The virtio-serial port
> > *is* opened with O_ASYNC (and on Solaris, it is replaced with an
> > I_SETSIG ioctl()). What's the reason for this? g_io_channel_unix_new()
> > doesn't seem to list it as a requirement, and qemu-ga doesn't seem to
> > handle SIGIO.
> 
> At some point I played around with trying to use SIGIO to handle channel
> resets and whatnot (since we're also supposed to get one when someone
> opens the chardev backend and causes VIRTIO_CONSOLE_PORT_OPEN to get
> sent). I don't think I ever got it working, SIGIO doesn't seem to get
> sent, so that O_ASYNC might just be a relic from that.
> 
> I tried installing a handler retested host-connect as well as hot
> unplug and still don't seem to be getting the signal. Not sure if i'm
> doing something wrong or if there's an issue with the guest driver.
> 
> I did notice something interesting though:
> 
> 1371740628.596505: debug: cb: condition: 17, status: 2
> 1371740628.596541: debug: received EOF
> 1371740628.395726: debug: cb: condition: 17, status: 2
> 1371740628.395760: debug: received EOF
> 1371740628.496035: debug: cb: condition: 17, status: 2
> 1371740628.496072: debug: received EOF
> 1371740628.596505: debug: cb: condition: 17, status: 2
> 1371740628.596541: debug: received EOF
> 
> <host opened chardev backend>
> 
> 1371740634.195524: debug: cb: condition: 1, status: 1
> 1371740634.195556: debug: read data, count: 25, data:
> {'execute':'guest-ping'}
> 
> 1371740634.195634: debug: process_event: called
> 1371740634.195660: debug: processing command
> 1371740634.196007: debug: sending data, count: 15
> 
> <virtio-serial unplugged>
> 
> 1371740644.113346: debug: cb: condition: 16, status: 2
> 1371740644.113379: debug: received EOF
> 1371740644.213694: debug: cb: condition: 16, status: 2
> 1371740644.213725: debug: received EOF
> 1371740644.314041: debug: cb: condition: 16, status: 2
> 1371740644.314168: debug: received EOF
> 
> i.e. we got the POLLHUP if we read from an
> unconnected-but-present port, and we *don't* get the POLLHUP
> if the port has been unplugged.

Er...silly me, POLLHUP=16, POLLIN=1 for glib, so I mixed them
up. For unplugged case we get POLLHUP, for unconnected case we get
POLLIN | POLLHUP, so that might actually be enough to distinguish
unplug if this is the intended behavior.

Amit, can you confirm?

> 
> And in none of these cases do the SIGIO seem to be sent.
> 
> Here's the debug stuff i added:
> 
> diff --git a/qga/main.c b/qga/main.c
> index 0e04e73..7f9a628 100644
> --- a/qga/main.c
> +++ b/qga/main.c
> @@ -140,6 +140,11 @@ static void quit_handler(int sig)
>      }
>  }
>  
> +static void sigio_handler(int sig)
> +{
> +    g_debug("got sigio: %d", sig);
> +}
> +
>  #ifndef _WIN32
>  static gboolean register_signal_handlers(void)
>  {
> @@ -158,6 +163,13 @@ static gboolean register_signal_handlers(void)
>          g_error("error configuring signal handler: %s", strerror(errno));
>      }
>  
> +    memset(&sigact, 0, sizeof(struct sigaction));
> +    sigact.sa_handler = sigio_handler;
> +    ret = sigaction(SIGIO, &sigact, NULL);
> +    if (ret == -1) {
> +        g_error("error configuring signal handler: %s", strerror(errno));
> +    }
> +
>      return true;
>  }
>  
> @@ -627,6 +639,7 @@ static gboolean channel_event_cb(GIOCondition condition, gpointer data)
>      gsize count;
>      GError *err = NULL;
>      GIOStatus status = ga_channel_read(s->channel, buf, QGA_READ_COUNT_DEFAULT, &count);
> +    g_debug("cb: condition: %d, status: %d", condition, status);
>      if (err != NULL) {
>          g_warning("error reading channel: %s", err->message);
>          g_error_free(err);
> 
> > 
> > In any case we'd need a way to tell "host side close" from "port unplug".
> > 
> 
> Poking around a bit it seems that the SIGIO can be tied to a specific
> kind of event we can extract via siginfo_t. Right now it looks like
> virtio-serial is hard-coded to set POLL_OUT, but with some driver
> changes we could maybe tie unplug to POLL_HUP or something.
> 
> So with some driver changes qemu-ga can be made to handle this
> gracefully, but otherwise I don't have any good ideas.
> 
> The only that comes to mind is adding a 'quit' command to qemu-ga that
> libvirt could call prior to unplug, and once we get around to integrating
> qemu-ga into qmp qemu could issue it internally as part of the unplug.
> This isn't consumable for other stuff uses virtio-serial though so I
> think working SIGIO into doing what we want is probably the best
> approach.
> 
> There's also taking advantage of the above behavior (EOF and no POLLHUP
> means we were hot-unplugged) but based I what I've read that's not the
> intended behavior.
> 
> > (*) Then, there's the question what to do after unplug. Should we keep
> > reopening the same virtio-serial port (and sleeping a bit in-between)?
> > Or exit and let udev / systemd restart qemu-ga on any new virtio-serial
> > port, passing -p accordingly?
> 
> Event-based would be pretty spiffy, but that doesn't preclude us from
> adding a "--wait-for-channel" type of option that plays a little more
> nicely with a watchdog-style deployment.
> 
> > 
> > Thanks
> > Laszlo
> >
Amit Shah - Aug. 13, 2013, noon
Hi,

(Sorry for the late reply!)

On (Thu) 20 Jun 2013 [10:20:01], mdroth wrote:
> On Thu, Jun 20, 2013 at 10:12:30AM -0500, mdroth wrote:
> > On Wed, Jun 19, 2013 at 01:17:57PM +0200, Laszlo Ersek wrote:
> > > Hello Michael,
> > > 
> > > this is with reference to
> > > <https://bugzilla.redhat.com/show_bug.cgi?id=907733>.
> > > 
> > > Ever since the initial qemu-ga commit AFAICS an exception for
> > > virtio-serial has existed, when reading EOF from the channel.
> > > 
> > > For isa-serial, EOF results in the client connection being closed. I
> > > assume this exits the glib main loop somehow (otherwise qemu-ga would
> > > just sit there and do nothing, as no further connections are accepted I
> > > think).
> > 
> > I think it would actually do the latter, unfortunately. It's distinct
> > from virtio-serial handling in that we remove the GSource by return false
> > in the event handler (qga/main.c:channel_event_cb), but I think we'd
> > need to drop in a g_main_loop_quit() to actually get qemu-ga to exit in
> > that scenario.
> > 
> > This doesn't normally get triggered though, as isa-serial does not send
> > EOF when nothing is connected to the chardev backend, but instead just
> > blocks. Might till make sense to make qemu-ga exit in this case though
> > since it won't be doing anything useful and wrapper scripts would at
> > least have some indication that something is up.
> > 
> > > 
> > > For a unix domain socket, we can continue accepting new connections
> > > after reading EOF.
> > > 
> > > For virtio-serial, EOF means "no host-side process is connected". In
> > > this case we sleep a bit and go back to reading from the same fd (and
> > > this turns into a sleep loop until the next host-side process connects).
> > > 
> > > 
> > > Can we tell "virtio-serial port unplug" from "no host-side process"?
> > > Because in the former case qemu-ga should really close the channel (and
> > > maybe exit (*)), otherwise the unplug won't complete in the guest kernel.

Port unplug will give an error return, and 'no host-side process' will
give EOF.  A bug was fixed recently in 3.11-rc5, and that fix is
queued for the -stable kernels as well.

Upstream kernel commit 96f97a83910cdb9d89d127c5ee523f8fc040a804

> > > According to Amit's comments in the RHBZ, at unplug a SIGIO is
> > > delivered, and a POLLHUP event is reported. However,
> > > 
> > > (a) I think the glib iochannel abstraction doesn't allow us to tell
> > > POLLHUP apart from reading EOF;
> > 
> > AFAICT we can actually access the POLLHUP event via the 'condition' param
> > that gets passed to the cb, but the issue is we also get POLLUP when
> > the chardev backend isn't open.
> > 
> > > 
> > > (b) delivery of an unhandled SIGIO seems to terminate the victim
> > > process. qemu-ga doesn't seem to either catch or block SIGIO, which is
> > > why I think a SIGIO signal is not sent in reality (maybe we should ask
> > > for it first?)
> > > 
> > > ... Actually I'm confused about this as well. The virtio-serial port
> > > *is* opened with O_ASYNC (and on Solaris, it is replaced with an
> > > I_SETSIG ioctl()). What's the reason for this? g_io_channel_unix_new()
> > > doesn't seem to list it as a requirement, and qemu-ga doesn't seem to
> > > handle SIGIO.
> > 
> > At some point I played around with trying to use SIGIO to handle channel
> > resets and whatnot (since we're also supposed to get one when someone
> > opens the chardev backend and causes VIRTIO_CONSOLE_PORT_OPEN to get
> > sent). I don't think I ever got it working, SIGIO doesn't seem to get
> > sent, so that O_ASYNC might just be a relic from that.
> > 
> > I tried installing a handler retested host-connect as well as hot
> > unplug and still don't seem to be getting the signal. Not sure if i'm
> > doing something wrong or if there's an issue with the guest driver.
> > 
> > I did notice something interesting though:
> > 
> > 1371740628.596505: debug: cb: condition: 17, status: 2
> > 1371740628.596541: debug: received EOF
> > 1371740628.395726: debug: cb: condition: 17, status: 2
> > 1371740628.395760: debug: received EOF
> > 1371740628.496035: debug: cb: condition: 17, status: 2
> > 1371740628.496072: debug: received EOF
> > 1371740628.596505: debug: cb: condition: 17, status: 2
> > 1371740628.596541: debug: received EOF
> > 
> > <host opened chardev backend>
> > 
> > 1371740634.195524: debug: cb: condition: 1, status: 1
> > 1371740634.195556: debug: read data, count: 25, data:
> > {'execute':'guest-ping'}
> > 
> > 1371740634.195634: debug: process_event: called
> > 1371740634.195660: debug: processing command
> > 1371740634.196007: debug: sending data, count: 15
> > 
> > <virtio-serial unplugged>
> > 
> > 1371740644.113346: debug: cb: condition: 16, status: 2
> > 1371740644.113379: debug: received EOF
> > 1371740644.213694: debug: cb: condition: 16, status: 2
> > 1371740644.213725: debug: received EOF
> > 1371740644.314041: debug: cb: condition: 16, status: 2
> > 1371740644.314168: debug: received EOF
> > 
> > i.e. we got the POLLHUP if we read from an
> > unconnected-but-present port, and we *don't* get the POLLHUP
> > if the port has been unplugged.
> 
> Er...silly me, POLLHUP=16, POLLIN=1 for glib, so I mixed them
> up. For unplugged case we get POLLHUP, for unconnected case we get
> POLLIN | POLLHUP, so that might actually be enough to distinguish
> unplug if this is the intended behavior.
> 
> Amit, can you confirm?

For unconnected, you should get POLLHUP.  For unplug, even poll should
return error now.

> > And in none of these cases do the SIGIO seem to be sent.

Yes, that was a bug where SIGIO wasn't sent when port was unplugged.
Fixed in upstream kernel 92d3453815fbe74d539c86b60dab39ecdf01bb99, and
will trickle to -stable kernels soon.

> > > In any case we'd need a way to tell "host side close" from "port unplug".
> > > 
> > 
> > Poking around a bit it seems that the SIGIO can be tied to a specific
> > kind of event we can extract via siginfo_t. Right now it looks like
> > virtio-serial is hard-coded to set POLL_OUT, but with some driver
> > changes we could maybe tie unplug to POLL_HUP or something.
> > 
> > So with some driver changes qemu-ga can be made to handle this
> > gracefully, but otherwise I don't have any good ideas.
> > 
> > The only that comes to mind is adding a 'quit' command to qemu-ga that
> > libvirt could call prior to unplug, and once we get around to integrating
> > qemu-ga into qmp qemu could issue it internally as part of the unplug.
> > This isn't consumable for other stuff uses virtio-serial though so I
> > think working SIGIO into doing what we want is probably the best
> > approach.
> > 
> > There's also taking advantage of the above behavior (EOF and no POLLHUP
> > means we were hot-unplugged) but based I what I've read that's not the
> > intended behavior.
> > 
> > > (*) Then, there's the question what to do after unplug. Should we keep
> > > reopening the same virtio-serial port (and sleeping a bit in-between)?
> > > Or exit and let udev / systemd restart qemu-ga on any new virtio-serial
> > > port, passing -p accordingly?
> > 
> > Event-based would be pretty spiffy, but that doesn't preclude us from
> > adding a "--wait-for-channel" type of option that plays a little more
> > nicely with a watchdog-style deployment.

Thanks,

		Amit

Patch

diff --git a/qga/main.c b/qga/main.c
index 0e04e73..7f9a628 100644
--- a/qga/main.c
+++ b/qga/main.c
@@ -140,6 +140,11 @@  static void quit_handler(int sig)
     }
 }
 
+static void sigio_handler(int sig)
+{
+    g_debug("got sigio: %d", sig);
+}
+
 #ifndef _WIN32
 static gboolean register_signal_handlers(void)
 {
@@ -158,6 +163,13 @@  static gboolean register_signal_handlers(void)
         g_error("error configuring signal handler: %s", strerror(errno));
     }
 
+    memset(&sigact, 0, sizeof(struct sigaction));
+    sigact.sa_handler = sigio_handler;
+    ret = sigaction(SIGIO, &sigact, NULL);
+    if (ret == -1) {
+        g_error("error configuring signal handler: %s", strerror(errno));
+    }
+
     return true;
 }
 
@@ -627,6 +639,7 @@  static gboolean channel_event_cb(GIOCondition condition, gpointer data)
     gsize count;
     GError *err = NULL;
     GIOStatus status = ga_channel_read(s->channel, buf, QGA_READ_COUNT_DEFAULT, &count);
+    g_debug("cb: condition: %d, status: %d", condition, status);
     if (err != NULL) {
         g_warning("error reading channel: %s", err->message);
         g_error_free(err);