diff mbox

qemu-char: another io_add_watch_poll fix

Message ID 1365600207-21685-1-git-send-email-pbonzini@redhat.com
State New
Headers show

Commit Message

Paolo Bonzini April 10, 2013, 1:23 p.m. UTC
After attaching the source, we have to remove the reference we hold
to it, because we do not hold anymore a pointer to the source.

If we do not do this, removing the source will not finalize it and
will not drop the "real" I/O watch source.

This showed up when backporting the new flow control patches to older
versions of QEMU that still used select.  The whole select then failed
with EBADF (poll instead will reporting POLLNVAL on a single pollfd)
and QEMU froze.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 qemu-char.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

Comments

Amit Shah April 10, 2013, 5:59 p.m. UTC | #1
On (Wed) 10 Apr 2013 [15:23:27], Paolo Bonzini wrote:
> After attaching the source, we have to remove the reference we hold
> to it, because we do not hold anymore a pointer to the source.
> 
> If we do not do this, removing the source will not finalize it and
> will not drop the "real" I/O watch source.
> 
> This showed up when backporting the new flow control patches to older
> versions of QEMU that still used select.  The whole select then failed
> with EBADF (poll instead will reporting POLLNVAL on a single pollfd)
> and QEMU froze.

This patch doesn't apply directly to master, applies with some fuzz.
However, this patch causes qemu freeze.  My testcase is:

Open chardev on host
Write something to a virtserialport in guest
Close chardev on host
Keep writing to virtserialport in guest

When I apply the patch to the old qemu version with select, that
starts working fine with the testcase above.

There's a slight difference in my old qemu tree, I have Hans's
"virtio-console: Remove any pending watches on close" patch applied,
which makes use of the tag obtained on adding the watch.  That patch
hasn't found its way to master yet, but it should go in soon.

		Amit
Paolo Bonzini April 11, 2013, 8:58 a.m. UTC | #2
Il 10/04/2013 19:59, Amit Shah ha scritto:
> On (Wed) 10 Apr 2013 [15:23:27], Paolo Bonzini wrote:
>> After attaching the source, we have to remove the reference we hold
>> to it, because we do not hold anymore a pointer to the source.
>>
>> If we do not do this, removing the source will not finalize it and
>> will not drop the "real" I/O watch source.
>>
>> This showed up when backporting the new flow control patches to older
>> versions of QEMU that still used select.  The whole select then failed
>> with EBADF (poll instead will reporting POLLNVAL on a single pollfd)
>> and QEMU froze.
> 
> This patch doesn't apply directly to master, applies with some fuzz.
> However, this patch causes qemu freeze.  My testcase is:
> 
> Open chardev on host
> Write something to a virtserialport in guest
> Close chardev on host
> Keep writing to virtserialport in guest
> 
> When I apply the patch to the old qemu version with select, that
> starts working fine with the testcase above.

I cannot replicate the freeze.  The patch works on both old and new
versions of QEMU.  My testcases are:

1) on host, nc -l -p 12345
   on host, start qemu
   in guest, cat > /dev/vport0p1
   in guest, write something
   on host, close nc
   in guest, write something
   in guest, ^D and poweroff

2) on host, nc -l -p 12345
   on host, start qemu
   in guest, echo abc > /dev/vport0p1
   on host, close nc
   in guest, echo abc > /dev/vport0p1
   in guest, poweroff

> There's a slight difference in my old qemu tree, I have Hans's
> "virtio-console: Remove any pending watches on close" patch applied,
> which makes use of the tag obtained on adding the watch.  That patch
> hasn't found its way to master yet, but it should go in soon.

I don't have that patch in my (new) tree.  It's vanilla upstream QEMU.

Paolo
Amit Shah April 12, 2013, 9:24 a.m. UTC | #3
On (Thu) 11 Apr 2013 [10:58:30], Paolo Bonzini wrote:
> Il 10/04/2013 19:59, Amit Shah ha scritto:
> > On (Wed) 10 Apr 2013 [15:23:27], Paolo Bonzini wrote:
> >> After attaching the source, we have to remove the reference we hold
> >> to it, because we do not hold anymore a pointer to the source.
> >>
> >> If we do not do this, removing the source will not finalize it and
> >> will not drop the "real" I/O watch source.
> >>
> >> This showed up when backporting the new flow control patches to older
> >> versions of QEMU that still used select.  The whole select then failed
> >> with EBADF (poll instead will reporting POLLNVAL on a single pollfd)
> >> and QEMU froze.
> > 
> > This patch doesn't apply directly to master, applies with some fuzz.
> > However, this patch causes qemu freeze.  My testcase is:
> > 
> > Open chardev on host
> > Write something to a virtserialport in guest
> > Close chardev on host
> > Keep writing to virtserialport in guest
> > 
> > When I apply the patch to the old qemu version with select, that
> > starts working fine with the testcase above.
> 
> I cannot replicate the freeze.  The patch works on both old and new
> versions of QEMU.  My testcases are:
> 
> 1) on host, nc -l -p 12345
>    on host, start qemu
>    in guest, cat > /dev/vport0p1
>    in guest, write something
>    on host, close nc
>    in guest, write something
>    in guest, ^D and poweroff
> 
> 2) on host, nc -l -p 12345
>    on host, start qemu
>    in guest, echo abc > /dev/vport0p1
>    on host, close nc
>    in guest, echo abc > /dev/vport0p1
>    in guest, poweroff

Can you try multiple writes from the guest?  At least 3-4?  QEMU
doesn't detect a backend getting closed right away (another bug), so
the freeze doesn't trigger til qemu detects there's no chardev
anymore.

> > There's a slight difference in my old qemu tree, I have Hans's
> > "virtio-console: Remove any pending watches on close" patch applied,
> > which makes use of the tag obtained on adding the watch.  That patch
> > hasn't found its way to master yet, but it should go in soon.
> 
> I don't have that patch in my (new) tree.  It's vanilla upstream QEMU.

Yep, I tested upstream QEMU from master as well.  (It's just my 'old'
qemu tree which has Hans's patches too.)

		Amit
Paolo Bonzini April 12, 2013, 10:31 a.m. UTC | #4
Il 12/04/2013 11:24, Amit Shah ha scritto:
> Can you try multiple writes from the guest?  At least 3-4?  QEMU
> doesn't detect a backend getting closed right away (another bug), so
> the freeze doesn't trigger til qemu detects there's no chardev
> anymore.

All writes after the second will hang and ^C will return

bash: echo: write error: Interrupted system call

Same for "yes > /dev/vport0p1".  The writes hang as soon as I exit nc on
the host, and ^C exits cleanly to the shell.

I think this patch is obvious.  You might be seeing another bug that
should be fixed separately.

Paolo

>>> There's a slight difference in my old qemu tree, I have Hans's
>>> "virtio-console: Remove any pending watches on close" patch applied,
>>> which makes use of the tag obtained on adding the watch.  That patch
>>> hasn't found its way to master yet, but it should go in soon.
>>
>> I don't have that patch in my (new) tree.  It's vanilla upstream QEMU.
> 
> Yep, I tested upstream QEMU from master as well.  (It's just my 'old'
> qemu tree which has Hans's patches too.)
> 
> 		Amit
> 
>
Anthony Liguori April 15, 2013, 9:10 p.m. UTC | #5
Applied.  Thanks.

Regards,

Anthony Liguori
Gerd Hoffmann April 16, 2013, 9:15 a.m. UTC | #6
On 04/10/13 15:23, Paolo Bonzini wrote:
> After attaching the source, we have to remove the reference we hold
> to it, because we do not hold anymore a pointer to the source.
> 
> If we do not do this, removing the source will not finalize it and
> will not drop the "real" I/O watch source.
> 
> This showed up when backporting the new flow control patches to older
> versions of QEMU that still used select.  The whole select then failed
> with EBADF (poll instead will reporting POLLNVAL on a single pollfd)
> and QEMU froze.

I get freezes now in master, bisecting points to this patch.

Reproducer: "qemu -serial pty".

qemu is pretty much unusable with libvirt now as libvirt uses pty
chardevs by default for serial & monitor ...

(gdb) bt
#0  __lll_lock_wait () at
../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
#1  0x00007f4141ce7388 in _L_lock_854 () from /lib64/libpthread.so.0
#2  0x00007f4141ce7257 in __pthread_mutex_lock (mutex=0x7f4145639128) at
pthread_mutex_lock.c:61
#3  0x00007f4142f41c37 in ?? () from /lib64/libglib-2.0.so.0
#4  0x00007f41439ff1b1 in io_watch_poll_finalize (source=<value
optimized out>)
    at /home/kraxel/projects/qemu/qemu-char.c:647
#5  0x00007f4142f4182a in ?? () from /lib64/libglib-2.0.so.0
#6  0x00007f4142f41b85 in ?? () from /lib64/libglib-2.0.so.0
#7  0x00007f4142f4416e in g_source_remove () from /lib64/libglib-2.0.so.0
#8  0x00007f4143a02f38 in pty_chr_state (chr=0x7f4145644b70,
connected=<value optimized out>)
    at /home/kraxel/projects/qemu/qemu-char.c:1151
#9  0x00007f4143a0303c in pty_chr_read (chan=<value optimized out>,
cond=<value optimized out>,
    opaque=0x7f4145644b70) at /home/kraxel/projects/qemu/qemu-char.c:1116
#10 0x00007f4142f41f0e in g_main_context_dispatch () from
/lib64/libglib-2.0.so.0
#11 0x00007f41439d8259 in glib_pollfds_poll (nonblocking=<value
optimized out>)
    at /home/kraxel/projects/qemu/main-loop.c:187
#12 os_host_main_loop_wait (nonblocking=<value optimized out>)
    at /home/kraxel/projects/qemu/main-loop.c:232
#13 main_loop_wait (nonblocking=<value optimized out>)
    at /home/kraxel/projects/qemu/main-loop.c:468
#14 0x00007f4143a4f055 in main_loop (argc=<value optimized out>,
argv=<value optimized out>,
    envp=<value optimized out>) at /home/kraxel/projects/qemu/vl.c:2039
#15 main (argc=<value optimized out>, argv=<value optimized out>,
envp=<value optimized out>)
    at /home/kraxel/projects/qemu/vl.c:4432

cheers,
  Gerd
Paolo Bonzini April 16, 2013, 9:34 a.m. UTC | #7
Il 16/04/2013 11:15, Gerd Hoffmann ha scritto:
> On 04/10/13 15:23, Paolo Bonzini wrote:
>> After attaching the source, we have to remove the reference we hold
>> to it, because we do not hold anymore a pointer to the source.
>>
>> If we do not do this, removing the source will not finalize it and
>> will not drop the "real" I/O watch source.
>>
>> This showed up when backporting the new flow control patches to older
>> versions of QEMU that still used select.  The whole select then failed
>> with EBADF (poll instead will reporting POLLNVAL on a single pollfd)
>> and QEMU froze.
> 
> I get freezes now in master, bisecting points to this patch.
> 
> Reproducer: "qemu -serial pty".
> 
> qemu is pretty much unusable with libvirt now as libvirt uses pty
> chardevs by default for serial & monitor ...

I'm not sure why all users of qemu_chr_fe_add_watch believe that the
watch will be one-shot.  This is definitely not what g_io_create_watch
does...

Paolo
Hans de Goede April 17, 2013, 3:36 p.m. UTC | #8
Ji,

On 04/16/2013 11:34 AM, Paolo Bonzini wrote:
> Il 16/04/2013 11:15, Gerd Hoffmann ha scritto:
>> On 04/10/13 15:23, Paolo Bonzini wrote:
>>> After attaching the source, we have to remove the reference we hold
>>> to it, because we do not hold anymore a pointer to the source.
>>>
>>> If we do not do this, removing the source will not finalize it and
>>> will not drop the "real" I/O watch source.
>>>
>>> This showed up when backporting the new flow control patches to older
>>> versions of QEMU that still used select.  The whole select then failed
>>> with EBADF (poll instead will reporting POLLNVAL on a single pollfd)
>>> and QEMU froze.
>>
>> I get freezes now in master, bisecting points to this patch.
>>
>> Reproducer: "qemu -serial pty".
>>
>> qemu is pretty much unusable with libvirt now as libvirt uses pty
>> chardevs by default for serial & monitor ...
>
> I'm not sure why all users of qemu_chr_fe_add_watch believe that the
> watch will be one-shot.  This is definitely not what g_io_create_watch
> does...

It is supposed to depend on the return value of the callback, if you
return False, then the source should be removed, in essence making
the watch one-shot, see:
http://www.gtk.org/api/2.6/glib/glib-The-Main-Event-Loop.html#GSourceFunc

Which specifically says:

Returns:	it should return FALSE if the source should be removed.

And using sources in a one-shot mode by making the callback return false
is quite normal in the glib world.

Regards,

Hans
Paolo Bonzini April 17, 2013, 4:54 p.m. UTC | #9
Il 17/04/2013 17:36, Hans de Goede ha scritto:
>>>
>>
>> I'm not sure why all users of qemu_chr_fe_add_watch believe that the
>> watch will be one-shot.  This is definitely not what g_io_create_watch
>> does...
> 
> It is supposed to depend on the return value of the callback, if you
> return False, then the source should be removed, in essence making
> the watch one-shot, see:
> http://www.gtk.org/api/2.6/glib/glib-The-Main-Event-Loop.html#GSourceFunc

Yeah, I found that later. :)

Paolo
diff mbox

Patch

diff --git a/qemu-char.c b/qemu-char.c
index 37117b5..2caef56 100644
--- a/qemu-char.c
+++ b/qemu-char.c
@@ -610,6 +610,7 @@  static guint io_add_watch_poll(GIOChannel *channel,
                                gpointer user_data)
 {
     IOWatchPoll *iwp;
+    int tag;
 
     iwp = (IOWatchPoll *) g_source_new(&io_watch_poll_funcs, sizeof(IOWatchPoll));
     iwp->fd_can_read = fd_can_read;
@@ -618,7 +619,9 @@  static guint io_add_watch_poll(GIOChannel *channel,
     iwp->fd_read = (GSourceFunc) fd_read;
     iwp->src = NULL;
 
-    return g_source_attach(&iwp->parent, NULL);
+    tag = g_source_attach(&iwp->parent, NULL);
+    g_source_unref(&iwp->parent);
+    return tag;
 }
 
 static GIOChannel *io_channel_from_fd(int fd)