Message ID | 1365600207-21685-1-git-send-email-pbonzini@redhat.com |
---|---|
State | New |
Headers | show |
On (Wed) 10 Apr 2013 [15:23:27], Paolo Bonzini wrote: > After attaching the source, we have to remove the reference we hold > to it, because we do not hold anymore a pointer to the source. > > If we do not do this, removing the source will not finalize it and > will not drop the "real" I/O watch source. > > This showed up when backporting the new flow control patches to older > versions of QEMU that still used select. The whole select then failed > with EBADF (poll instead will reporting POLLNVAL on a single pollfd) > and QEMU froze. This patch doesn't apply directly to master, applies with some fuzz. However, this patch causes qemu freeze. My testcase is: Open chardev on host Write something to a virtserialport in guest Close chardev on host Keep writing to virtserialport in guest When I apply the patch to the old qemu version with select, that starts working fine with the testcase above. There's a slight difference in my old qemu tree, I have Hans's "virtio-console: Remove any pending watches on close" patch applied, which makes use of the tag obtained on adding the watch. That patch hasn't found its way to master yet, but it should go in soon. Amit
Il 10/04/2013 19:59, Amit Shah ha scritto: > On (Wed) 10 Apr 2013 [15:23:27], Paolo Bonzini wrote: >> After attaching the source, we have to remove the reference we hold >> to it, because we do not hold anymore a pointer to the source. >> >> If we do not do this, removing the source will not finalize it and >> will not drop the "real" I/O watch source. >> >> This showed up when backporting the new flow control patches to older >> versions of QEMU that still used select. The whole select then failed >> with EBADF (poll instead will reporting POLLNVAL on a single pollfd) >> and QEMU froze. > > This patch doesn't apply directly to master, applies with some fuzz. > However, this patch causes qemu freeze. My testcase is: > > Open chardev on host > Write something to a virtserialport in guest > Close chardev on host > Keep writing to virtserialport in guest > > When I apply the patch to the old qemu version with select, that > starts working fine with the testcase above. I cannot replicate the freeze. The patch works on both old and new versions of QEMU. My testcases are: 1) on host, nc -l -p 12345 on host, start qemu in guest, cat > /dev/vport0p1 in guest, write something on host, close nc in guest, write something in guest, ^D and poweroff 2) on host, nc -l -p 12345 on host, start qemu in guest, echo abc > /dev/vport0p1 on host, close nc in guest, echo abc > /dev/vport0p1 in guest, poweroff > There's a slight difference in my old qemu tree, I have Hans's > "virtio-console: Remove any pending watches on close" patch applied, > which makes use of the tag obtained on adding the watch. That patch > hasn't found its way to master yet, but it should go in soon. I don't have that patch in my (new) tree. It's vanilla upstream QEMU. Paolo
On (Thu) 11 Apr 2013 [10:58:30], Paolo Bonzini wrote: > Il 10/04/2013 19:59, Amit Shah ha scritto: > > On (Wed) 10 Apr 2013 [15:23:27], Paolo Bonzini wrote: > >> After attaching the source, we have to remove the reference we hold > >> to it, because we do not hold anymore a pointer to the source. > >> > >> If we do not do this, removing the source will not finalize it and > >> will not drop the "real" I/O watch source. > >> > >> This showed up when backporting the new flow control patches to older > >> versions of QEMU that still used select. The whole select then failed > >> with EBADF (poll instead will reporting POLLNVAL on a single pollfd) > >> and QEMU froze. > > > > This patch doesn't apply directly to master, applies with some fuzz. > > However, this patch causes qemu freeze. My testcase is: > > > > Open chardev on host > > Write something to a virtserialport in guest > > Close chardev on host > > Keep writing to virtserialport in guest > > > > When I apply the patch to the old qemu version with select, that > > starts working fine with the testcase above. > > I cannot replicate the freeze. The patch works on both old and new > versions of QEMU. My testcases are: > > 1) on host, nc -l -p 12345 > on host, start qemu > in guest, cat > /dev/vport0p1 > in guest, write something > on host, close nc > in guest, write something > in guest, ^D and poweroff > > 2) on host, nc -l -p 12345 > on host, start qemu > in guest, echo abc > /dev/vport0p1 > on host, close nc > in guest, echo abc > /dev/vport0p1 > in guest, poweroff Can you try multiple writes from the guest? At least 3-4? QEMU doesn't detect a backend getting closed right away (another bug), so the freeze doesn't trigger til qemu detects there's no chardev anymore. > > There's a slight difference in my old qemu tree, I have Hans's > > "virtio-console: Remove any pending watches on close" patch applied, > > which makes use of the tag obtained on adding the watch. That patch > > hasn't found its way to master yet, but it should go in soon. > > I don't have that patch in my (new) tree. It's vanilla upstream QEMU. Yep, I tested upstream QEMU from master as well. (It's just my 'old' qemu tree which has Hans's patches too.) Amit
Il 12/04/2013 11:24, Amit Shah ha scritto: > Can you try multiple writes from the guest? At least 3-4? QEMU > doesn't detect a backend getting closed right away (another bug), so > the freeze doesn't trigger til qemu detects there's no chardev > anymore. All writes after the second will hang and ^C will return bash: echo: write error: Interrupted system call Same for "yes > /dev/vport0p1". The writes hang as soon as I exit nc on the host, and ^C exits cleanly to the shell. I think this patch is obvious. You might be seeing another bug that should be fixed separately. Paolo >>> There's a slight difference in my old qemu tree, I have Hans's >>> "virtio-console: Remove any pending watches on close" patch applied, >>> which makes use of the tag obtained on adding the watch. That patch >>> hasn't found its way to master yet, but it should go in soon. >> >> I don't have that patch in my (new) tree. It's vanilla upstream QEMU. > > Yep, I tested upstream QEMU from master as well. (It's just my 'old' > qemu tree which has Hans's patches too.) > > Amit > >
Applied. Thanks. Regards, Anthony Liguori
On 04/10/13 15:23, Paolo Bonzini wrote: > After attaching the source, we have to remove the reference we hold > to it, because we do not hold anymore a pointer to the source. > > If we do not do this, removing the source will not finalize it and > will not drop the "real" I/O watch source. > > This showed up when backporting the new flow control patches to older > versions of QEMU that still used select. The whole select then failed > with EBADF (poll instead will reporting POLLNVAL on a single pollfd) > and QEMU froze. I get freezes now in master, bisecting points to this patch. Reproducer: "qemu -serial pty". qemu is pretty much unusable with libvirt now as libvirt uses pty chardevs by default for serial & monitor ... (gdb) bt #0 __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136 #1 0x00007f4141ce7388 in _L_lock_854 () from /lib64/libpthread.so.0 #2 0x00007f4141ce7257 in __pthread_mutex_lock (mutex=0x7f4145639128) at pthread_mutex_lock.c:61 #3 0x00007f4142f41c37 in ?? () from /lib64/libglib-2.0.so.0 #4 0x00007f41439ff1b1 in io_watch_poll_finalize (source=<value optimized out>) at /home/kraxel/projects/qemu/qemu-char.c:647 #5 0x00007f4142f4182a in ?? () from /lib64/libglib-2.0.so.0 #6 0x00007f4142f41b85 in ?? () from /lib64/libglib-2.0.so.0 #7 0x00007f4142f4416e in g_source_remove () from /lib64/libglib-2.0.so.0 #8 0x00007f4143a02f38 in pty_chr_state (chr=0x7f4145644b70, connected=<value optimized out>) at /home/kraxel/projects/qemu/qemu-char.c:1151 #9 0x00007f4143a0303c in pty_chr_read (chan=<value optimized out>, cond=<value optimized out>, opaque=0x7f4145644b70) at /home/kraxel/projects/qemu/qemu-char.c:1116 #10 0x00007f4142f41f0e in g_main_context_dispatch () from /lib64/libglib-2.0.so.0 #11 0x00007f41439d8259 in glib_pollfds_poll (nonblocking=<value optimized out>) at /home/kraxel/projects/qemu/main-loop.c:187 #12 os_host_main_loop_wait (nonblocking=<value optimized out>) at /home/kraxel/projects/qemu/main-loop.c:232 #13 main_loop_wait (nonblocking=<value optimized out>) at /home/kraxel/projects/qemu/main-loop.c:468 #14 0x00007f4143a4f055 in main_loop (argc=<value optimized out>, argv=<value optimized out>, envp=<value optimized out>) at /home/kraxel/projects/qemu/vl.c:2039 #15 main (argc=<value optimized out>, argv=<value optimized out>, envp=<value optimized out>) at /home/kraxel/projects/qemu/vl.c:4432 cheers, Gerd
Il 16/04/2013 11:15, Gerd Hoffmann ha scritto: > On 04/10/13 15:23, Paolo Bonzini wrote: >> After attaching the source, we have to remove the reference we hold >> to it, because we do not hold anymore a pointer to the source. >> >> If we do not do this, removing the source will not finalize it and >> will not drop the "real" I/O watch source. >> >> This showed up when backporting the new flow control patches to older >> versions of QEMU that still used select. The whole select then failed >> with EBADF (poll instead will reporting POLLNVAL on a single pollfd) >> and QEMU froze. > > I get freezes now in master, bisecting points to this patch. > > Reproducer: "qemu -serial pty". > > qemu is pretty much unusable with libvirt now as libvirt uses pty > chardevs by default for serial & monitor ... I'm not sure why all users of qemu_chr_fe_add_watch believe that the watch will be one-shot. This is definitely not what g_io_create_watch does... Paolo
Ji, On 04/16/2013 11:34 AM, Paolo Bonzini wrote: > Il 16/04/2013 11:15, Gerd Hoffmann ha scritto: >> On 04/10/13 15:23, Paolo Bonzini wrote: >>> After attaching the source, we have to remove the reference we hold >>> to it, because we do not hold anymore a pointer to the source. >>> >>> If we do not do this, removing the source will not finalize it and >>> will not drop the "real" I/O watch source. >>> >>> This showed up when backporting the new flow control patches to older >>> versions of QEMU that still used select. The whole select then failed >>> with EBADF (poll instead will reporting POLLNVAL on a single pollfd) >>> and QEMU froze. >> >> I get freezes now in master, bisecting points to this patch. >> >> Reproducer: "qemu -serial pty". >> >> qemu is pretty much unusable with libvirt now as libvirt uses pty >> chardevs by default for serial & monitor ... > > I'm not sure why all users of qemu_chr_fe_add_watch believe that the > watch will be one-shot. This is definitely not what g_io_create_watch > does... It is supposed to depend on the return value of the callback, if you return False, then the source should be removed, in essence making the watch one-shot, see: http://www.gtk.org/api/2.6/glib/glib-The-Main-Event-Loop.html#GSourceFunc Which specifically says: Returns: it should return FALSE if the source should be removed. And using sources in a one-shot mode by making the callback return false is quite normal in the glib world. Regards, Hans
Il 17/04/2013 17:36, Hans de Goede ha scritto: >>> >> >> I'm not sure why all users of qemu_chr_fe_add_watch believe that the >> watch will be one-shot. This is definitely not what g_io_create_watch >> does... > > It is supposed to depend on the return value of the callback, if you > return False, then the source should be removed, in essence making > the watch one-shot, see: > http://www.gtk.org/api/2.6/glib/glib-The-Main-Event-Loop.html#GSourceFunc Yeah, I found that later. :) Paolo
diff --git a/qemu-char.c b/qemu-char.c index 37117b5..2caef56 100644 --- a/qemu-char.c +++ b/qemu-char.c @@ -610,6 +610,7 @@ static guint io_add_watch_poll(GIOChannel *channel, gpointer user_data) { IOWatchPoll *iwp; + int tag; iwp = (IOWatchPoll *) g_source_new(&io_watch_poll_funcs, sizeof(IOWatchPoll)); iwp->fd_can_read = fd_can_read; @@ -618,7 +619,9 @@ static guint io_add_watch_poll(GIOChannel *channel, iwp->fd_read = (GSourceFunc) fd_read; iwp->src = NULL; - return g_source_attach(&iwp->parent, NULL); + tag = g_source_attach(&iwp->parent, NULL); + g_source_unref(&iwp->parent); + return tag; } static GIOChannel *io_channel_from_fd(int fd)
After attaching the source, we have to remove the reference we hold to it, because we do not hold anymore a pointer to the source. If we do not do this, removing the source will not finalize it and will not drop the "real" I/O watch source. This showed up when backporting the new flow control patches to older versions of QEMU that still used select. The whole select then failed with EBADF (poll instead will reporting POLLNVAL on a single pollfd) and QEMU froze. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> --- qemu-char.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)