diff mbox

xen-unstable qemu-upstream: libxl: error: libxl_qmp.c:440:qmp_next: timeout, libxl_qmp.c:705:libxl__qmp_initialize: Failed to connect to QMP

Message ID 715415426.20130416212325@eikelenboom.it
State New
Headers show

Commit Message

Sander Eikelenboom April 16, 2013, 7:23 p.m. UTC
Hi Stefano,

When trying to start a hvm quest with xen-unstable and qemu-upstream, it ends up stuck:

xc: info: VIRTUAL MEMORY ARRANGEMENT:
  Loader:        0000000000100000->000000000019c864
  Modules:       0000000000000000->0000000000000000
  TOTAL:         0000000000000000->000000002f800000
  ENTRY ADDRESS: 0000000000100000
xc: info: PHYSICAL MEMORY ALLOCATION:
  4KB PAGES: 0x0000000000000200
  2MB PAGES: 0x000000000000017b
  1GB PAGES: 0x0000000000000000
libxl: error: libxl_qmp.c:440:qmp_next: timeout
libxl: error: libxl_qmp.c:705:libxl__qmp_initialize: Failed to connect to QMP

Bisecting pointed to this commit:

commit 0ca5aa4f4c4a8bcc73988dd52a536241d35e5223
Author: Paolo Bonzini <pbonzini@redhat.com>
Date:   Wed Apr 10 15:23:27 2013 +0200

    qemu-char: another io_add_watch_poll fix

    After attaching the source, we have to remove the reference we hold
    to it, because we do not hold anymore a pointer to the source.

    If we do not do this, removing the source will not finalize it and
    will not drop the "real" I/O watch source.

    This showed up when backporting the new flow control patches to older
    versions of QEMU that still used select.  The whole select then failed
    with EBADF (poll instead will reporting POLLNVAL on a single pollfd)
    and QEMU froze.

    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    Message-id: 1365600207-21685-1-git-send-email-pbonzini@redhat.com
    Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>


Which changed:


 Any thing that springs to mind ?

 --
 Sander

Comments

Stefano Stabellini April 18, 2013, 1:27 p.m. UTC | #1
On Tue, 16 Apr 2013, Sander Eikelenboom wrote:
> Hi Stefano,
> 
> When trying to start a hvm quest with xen-unstable and qemu-upstream, it ends up stuck:
> 
> xc: info: VIRTUAL MEMORY ARRANGEMENT:
>   Loader:        0000000000100000->000000000019c864
>   Modules:       0000000000000000->0000000000000000
>   TOTAL:         0000000000000000->000000002f800000
>   ENTRY ADDRESS: 0000000000100000
> xc: info: PHYSICAL MEMORY ALLOCATION:
>   4KB PAGES: 0x0000000000000200
>   2MB PAGES: 0x000000000000017b
>   1GB PAGES: 0x0000000000000000
> libxl: error: libxl_qmp.c:440:qmp_next: timeout
> libxl: error: libxl_qmp.c:705:libxl__qmp_initialize: Failed to connect to QMP
> 
> Bisecting pointed to this commit:
> 
> commit 0ca5aa4f4c4a8bcc73988dd52a536241d35e5223
> Author: Paolo Bonzini <pbonzini@redhat.com>
> Date:   Wed Apr 10 15:23:27 2013 +0200
> 
>     qemu-char: another io_add_watch_poll fix
> 
>     After attaching the source, we have to remove the reference we hold
>     to it, because we do not hold anymore a pointer to the source.
> 
>     If we do not do this, removing the source will not finalize it and
>     will not drop the "real" I/O watch source.
> 
>     This showed up when backporting the new flow control patches to older
>     versions of QEMU that still used select.  The whole select then failed
>     with EBADF (poll instead will reporting POLLNVAL on a single pollfd)
>     and QEMU froze.
> 
>     Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>     Message-id: 1365600207-21685-1-git-send-email-pbonzini@redhat.com
>     Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
> 
> 
> Which changed:
> 
> --- a/qemu-char.c
> +++ b/qemu-char.c
> @@ -663,6 +663,7 @@ static guint io_add_watch_poll(GIOChannel *channel,
>                                 gpointer user_data)
>  {
>      IOWatchPoll *iwp;
> +    int tag;
>  
>      iwp = (IOWatchPoll *) g_source_new(&io_watch_poll_funcs, sizeof(IOWatchPoll));
>      iwp->fd_can_read = fd_can_read;
> @@ -671,7 +672,9 @@ static guint io_add_watch_poll(GIOChannel *channel,
>      iwp->fd_read = (GSourceFunc) fd_read;
>      iwp->src = NULL;
>  
> -    return g_source_attach(&iwp->parent, NULL);
> +    tag = g_source_attach(&iwp->parent, NULL);
> +    g_source_unref(&iwp->parent);
> +    return tag;
>  }
>  
>  #ifndef _WIN32
> 
>  Any thing that springs to mind ?

Nope, sorry.
Given the commit that your bisection identified and the error message,
it's clear that it's a QMP connection issue from libxl. I would start
looking in libxl_qmp.c.
Maybe Anthony (CC'ed) has some ideas.
Paolo Bonzini April 18, 2013, 1:32 p.m. UTC | #2
Il 18/04/2013 15:27, Stefano Stabellini ha scritto:
>> > 
>> >  Any thing that springs to mind ?
> Nope, sorry.
> Given the commit that your bisection identified and the error message,
> it's clear that it's a QMP connection issue from libxl. I would start
> looking in libxl_qmp.c.
> Maybe Anthony (CC'ed) has some ideas.

What host is this?  It is likely a glib or qemu-char bug (hopefully
 the deadlock that Gerd found on RHEL6) that was latent up to this patch.

Paolo
Sander Eikelenboom April 18, 2013, 11:21 p.m. UTC | #3
Thursday, April 18, 2013, 3:32:37 PM, you wrote:

> Il 18/04/2013 15:27, Stefano Stabellini ha scritto:
>>> > 
>>> >  Any thing that springs to mind ?
>> Nope, sorry.
>> Given the commit that your bisection identified and the error message,
>> it's clear that it's a QMP connection issue from libxl. I would start
>> looking in libxl_qmp.c.
>> Maybe Anthony (CC'ed) has some ideas.

> What host is this?  It is likely a glib or qemu-char bug (hopefully
>  the deadlock that Gerd found on RHEL6) that was latent up to this patch.

The host/Dom0 is Debian squeeze (glibc ) on xen-unstable, using qemu-upstream.
The guest is Debian wheezy.

Previously (older xen-unstable and qemu-upstream (from yesterdays tree)), it worked without connecting the console (serial=pty and using "xl create -c") and only reverting 0ca5aa4f4c4a8bcc73988dd52a536241d35e5223 from qemu-upstream.

With the latest xen-unstable (764012376a096a45bced88ee2ec1a6c17c6c22c7) and qemu-upstream (09dada400328d75daf79e3eca1e48e024fec148d):

Without reverting anything, and using serial=pty and connecting the console at boot with "xl create -c", i get:
    xc: info: VIRTUAL MEMORY ARRANGEMENT:
      Loader:        0000000000100000->000000000019c864
      Modules:       0000000000000000->0000000000000000
      TOTAL:         0000000000000000->000000002f800000
      ENTRY ADDRESS: 0000000000100000
    xc: info: PHYSICAL MEMORY ALLOCATION:
      4KB PAGES: 0x0000000000000200
      2MB PAGES: 0x000000000000017b
      1GB PAGES: 0x0000000000000000
    libxl: error: libxl_qmp.c:440:qmp_next: timeout
    libxl: error: libxl_qmp.c:705:libxl__qmp_initialize: Failed to connect to QMP
    libxl: error: libxl_qmp.c:440:qmp_next: timeout
    libxl: error: libxl_qmp.c:705:libxl__qmp_initialize: Failed to connect to QMP
    libxl: error: libxl_qmp.c:690:libxl__qmp_initialize: Connection error: Resource temporarily unavailable
    Daemon running with PID 10174
    xenconsole: Could not read tty from store: No such file or directory
    libxl: error: libxl_exec.c:118:libxl_report_child_exitstatus: console child [0] exited with error status 2

and qemu log only shows:
    char device redirected to /dev/pts/17 (label serial0)






With reverting commit 0ca5aa4f4c4a8bcc73988dd52a536241d35e5223, and using serial=pty and connecting the console at boot with "xl create -c", i get:

The Guest starts fine.

and qemu log only shows:
    char device redirected to /dev/pts/16 (label serial0)
    main-loop: WARNING: I/O thread spun for 1000 iterations
    xen be: vkbd-0: initialise() failed
    xen be: vkbd-0: initialise() failed
    xen be: vkbd-0: initialise() failed






With reverting commit 0ca5aa4f4c4a8bcc73988dd52a536241d35e5223, and using serial=pty and NOT connecting the console at boot with "xl create" (omitting the -c), i get:

    Parsing config from /etc/xen/domU/production/security.cfg
    xc: info: VIRTUAL MEMORY ARRANGEMENT:
      Loader:        0000000000100000->000000000019c864
      Modules:       0000000000000000->0000000000000000
      TOTAL:         0000000000000000->000000002f800000
      ENTRY ADDRESS: 0000000000100000
    xc: info: PHYSICAL MEMORY ALLOCATION:
      4KB PAGES: 0x0000000000000200
      2MB PAGES: 0x000000000000017b
      1GB PAGES: 0x0000000000000000
    Daemon running with PID 468

So the guest starts, but it runs very very very very slowly.
When using "xl console", and after i hit a key, i see the kernel boot output being printed very slowly with about 1 character per second ...

and qemu log only shows:
    char device redirected to /dev/pts/16 (label serial0)
    main-loop: WARNING: I/O thread spun for 1000 iterations



So it looks like it's a output buffer that's filled to the max that makes it grind to a halt.

--
Sander

> Paolo
Paolo Bonzini April 19, 2013, 7:27 a.m. UTC | #4
Il 19/04/2013 01:21, Sander Eikelenboom ha scritto:> 
> Thursday, April 18, 2013, 3:32:37 PM, you wrote:
> 
>> Il 18/04/2013 15:27, Stefano Stabellini ha scritto:
>>>>>
>>>>>  Any thing that springs to mind ?
>>> Nope, sorry.
>>> Given the commit that your bisection identified and the error message,
>>> it's clear that it's a QMP connection issue from libxl. I would start
>>> looking in libxl_qmp.c.
>>> Maybe Anthony (CC'ed) has some ideas.
> 
>> What host is this?  It is likely a glib or qemu-char bug (hopefully
>>  the deadlock that Gerd found on RHEL6) that was latent up to this patch.
> 
> The host/Dom0 is Debian squeeze (glibc ) on xen-unstable, using qemu-upstream.
> The guest is Debian wheezy.

What version is glib on the host?

> and qemu log only shows:
>     char device redirected to /dev/pts/16 (label serial0)

Ok, there are patches on the list already for this issue.

Paolo
Sander Eikelenboom April 19, 2013, 7:52 a.m. UTC | #5
Friday, April 19, 2013, 9:27:34 AM, you wrote:

> Il 19/04/2013 01:21, Sander Eikelenboom ha scritto:> 
>> Thursday, April 18, 2013, 3:32:37 PM, you wrote:
>> 
>>> Il 18/04/2013 15:27, Stefano Stabellini ha scritto:
>>>>>>
>>>>>>  Any thing that springs to mind ?
>>>> Nope, sorry.
>>>> Given the commit that your bisection identified and the error message,
>>>> it's clear that it's a QMP connection issue from libxl. I would start
>>>> looking in libxl_qmp.c.
>>>> Maybe Anthony (CC'ed) has some ideas.
>> 
>>> What host is this?  It is likely a glib or qemu-char bug (hopefully
>>>  the deadlock that Gerd found on RHEL6) that was latent up to this patch.
>> 
>> The host/Dom0 is Debian squeeze (glibc ) on xen-unstable, using qemu-upstream.
>> The guest is Debian wheezy.

> What version is glib on the host?
Ah i knew i forgot to look something up ..

ii  libglib2.0-0                                                                   2.24.2-1                           The GLib library of C routines
ii  libglib2.0-data                                                                2.24.2-1                           Common files for GLib library
ii  libglib2.0-dev                                                                 2.24.2-1                           Development files for the GLib library


>> and qemu log only shows:
>>     char device redirected to /dev/pts/16 (label serial0)

> Ok, there are patches on the list already for this issue.

> Paolo
Sander Eikelenboom April 19, 2013, 7:56 a.m. UTC | #6
Friday, April 19, 2013, 9:27:34 AM, you wrote:

> Il 19/04/2013 01:21, Sander Eikelenboom ha scritto:> 
>> Thursday, April 18, 2013, 3:32:37 PM, you wrote:
>> 
>>> Il 18/04/2013 15:27, Stefano Stabellini ha scritto:
>>>>>>
>>>>>>  Any thing that springs to mind ?
>>>> Nope, sorry.
>>>> Given the commit that your bisection identified and the error message,
>>>> it's clear that it's a QMP connection issue from libxl. I would start
>>>> looking in libxl_qmp.c.
>>>> Maybe Anthony (CC'ed) has some ideas.
>> 
>>> What host is this?  It is likely a glib or qemu-char bug (hopefully
>>>  the deadlock that Gerd found on RHEL6) that was latent up to this patch.
>> 
>> The host/Dom0 is Debian squeeze (glibc ) on xen-unstable, using qemu-upstream.
>> The guest is Debian wheezy.

> What version is glib on the host?

Somewhat more precise:

~# /lib/libc.so.6
GNU C Library (Debian EGLIBC 2.11.3-4) stable release version 2.11.3, by Roland McGrath et al.
Copyright (C) 2009 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
Compiled by GNU CC version 4.4.5.
Compiled on a Linux 2.6.32 system on 2012-06-06.
Available extensions:
        crypt add-on version 2.1 by Michael Glad and others
        GNU Libidn by Simon Josefsson
        Native POSIX Threads Library by Ulrich Drepper et al
        BIND-8.2.3-T5B
For bug reporting instructions, please see:
<http://www.debian.org/Bugs/>.


>> and qemu log only shows:
>>     char device redirected to /dev/pts/16 (label serial0)

> Ok, there are patches on the list already for this issue.

> Paolo
Paolo Bonzini April 19, 2013, 8:04 a.m. UTC | #7
Il 19/04/2013 09:52, Sander Eikelenboom ha scritto:
>> > What version is glib on the host?
> Ah i knew i forgot to look something up ..
> 
> ii  libglib2.0-0                                                                   2.24.2-1                           The GLib library of C routines
> ii  libglib2.0-data                                                                2.24.2-1                           Common files for GLib library
> ii  libglib2.0-dev                                                                 2.24.2-1                           Development files for the GLib library

This very likely has the same deadlock bug that Gerd found on RHEL6.

Paolo
Sander Eikelenboom April 19, 2013, 12:56 p.m. UTC | #8
Friday, April 19, 2013, 10:04:22 AM, you wrote:

> Il 19/04/2013 09:52, Sander Eikelenboom ha scritto:
>>> > What version is glib on the host?
>> Ah i knew i forgot to look something up ..
>> 
>> ii  libglib2.0-0                                                                   2.24.2-1                           The GLib library of C routines
>> ii  libglib2.0-data                                                                2.24.2-1                           Common files for GLib library
>> ii  libglib2.0-dev                                                                 2.24.2-1                           Development files for the GLib library

> This very likely has the same deadlock bug that Gerd found on RHEL6.

Any ideas on how to get things fixed ?

> Paolo
Paolo Bonzini April 19, 2013, 1 p.m. UTC | #9
Il 19/04/2013 14:56, Sander Eikelenboom ha scritto:
> 
> Friday, April 19, 2013, 10:04:22 AM, you wrote:
> 
>> Il 19/04/2013 09:52, Sander Eikelenboom ha scritto:
>>>>> What version is glib on the host?
>>> Ah i knew i forgot to look something up ..
>>>
>>> ii  libglib2.0-0                                                                   2.24.2-1                           The GLib library of C routines
>>> ii  libglib2.0-data                                                                2.24.2-1                           Common files for GLib library
>>> ii  libglib2.0-dev                                                                 2.24.2-1                           Development files for the GLib library
> 
>> This very likely has the same deadlock bug that Gerd found on RHEL6.
> 
> Any ideas on how to get things fixed ?

Either I or Gerd will test the prototype patch I posted yesterday, and
it will be ok in -rc0.

Paolo
Sander Eikelenboom April 19, 2013, 4:53 p.m. UTC | #10
Friday, April 19, 2013, 3:00:56 PM, you wrote:

> Il 19/04/2013 14:56, Sander Eikelenboom ha scritto:
>> 
>> Friday, April 19, 2013, 10:04:22 AM, you wrote:
>> 
>>> Il 19/04/2013 09:52, Sander Eikelenboom ha scritto:
>>>>>> What version is glib on the host?
>>>> Ah i knew i forgot to look something up ..
>>>>
>>>> ii  libglib2.0-0                                                                   2.24.2-1                           The GLib library of C routines
>>>> ii  libglib2.0-data                                                                2.24.2-1                           Common files for GLib library
>>>> ii  libglib2.0-dev                                                                 2.24.2-1                           Development files for the GLib library
>> 
>>> This very likely has the same deadlock bug that Gerd found on RHEL6.
>> 
>> Any ideas on how to get things fixed ?

> Either I or Gerd will test the prototype patch I posted yesterday, and
> it will be ok in -rc0.

This prototype patch fixes the guest start issue for me.

Thx,

Sander

> Paolo
Paolo Bonzini April 19, 2013, 5 p.m. UTC | #11
Il 19/04/2013 18:53, Sander Eikelenboom ha scritto:
> 
> Friday, April 19, 2013, 3:00:56 PM, you wrote:
> 
>> Il 19/04/2013 14:56, Sander Eikelenboom ha scritto:
>>>
>>> Friday, April 19, 2013, 10:04:22 AM, you wrote:
>>>
>>>> Il 19/04/2013 09:52, Sander Eikelenboom ha scritto:
>>>>>>> What version is glib on the host?
>>>>> Ah i knew i forgot to look something up ..
>>>>>
>>>>> ii  libglib2.0-0                                                                   2.24.2-1                           The GLib library of C routines
>>>>> ii  libglib2.0-data                                                                2.24.2-1                           Common files for GLib library
>>>>> ii  libglib2.0-dev                                                                 2.24.2-1                           Development files for the GLib library
>>>
>>>> This very likely has the same deadlock bug that Gerd found on RHEL6.
>>>
>>> Any ideas on how to get things fixed ?
> 
>> Either I or Gerd will test the prototype patch I posted yesterday, and
>> it will be ok in -rc0.
> 
> This prototype patch fixes the guest start issue for me.

Cool, I just reposted the 4 patches together.  Can you reply with the
text "Tested-by: Sander Eikelenboom <linux@eikelenboom.it>"?

Paolo
Sander Eikelenboom April 19, 2013, 5:03 p.m. UTC | #12
Friday, April 19, 2013, 7:00:11 PM, you wrote:

> Il 19/04/2013 18:53, Sander Eikelenboom ha scritto:
>> 
>> Friday, April 19, 2013, 3:00:56 PM, you wrote:
>> 
>>> Il 19/04/2013 14:56, Sander Eikelenboom ha scritto:
>>>>
>>>> Friday, April 19, 2013, 10:04:22 AM, you wrote:
>>>>
>>>>> Il 19/04/2013 09:52, Sander Eikelenboom ha scritto:
>>>>>>>> What version is glib on the host?
>>>>>> Ah i knew i forgot to look something up ..
>>>>>>
>>>>>> ii  libglib2.0-0                                                                   2.24.2-1                           The GLib library of C routines
>>>>>> ii  libglib2.0-data                                                                2.24.2-1                           Common files for GLib library
>>>>>> ii  libglib2.0-dev                                                                 2.24.2-1                           Development files for the GLib library
>>>>
>>>>> This very likely has the same deadlock bug that Gerd found on RHEL6.
>>>>
>>>> Any ideas on how to get things fixed ?
>> 
>>> Either I or Gerd will test the prototype patch I posted yesterday, and
>>> it will be ok in -rc0.
>> 
>> This prototype patch fixes the guest start issue for me.

> Cool, I just reposted the 4 patches together.  Can you reply with the
> text "Tested-by: Sander Eikelenboom <linux@eikelenboom.it>"?

I'm not subscribed to the qemu-devel list, so unfortunately, i have no message-id to reply to.
However i'm fine with you sticking the tested-by on it.

--
Sander

> Paolo
Sander Eikelenboom April 20, 2013, 2:36 p.m. UTC | #13
Friday, April 19, 2013, 7:15:59 PM, you wrote:

> Il 19/04/2013 19:11, Sander Eikelenboom ha scritto:
>> BTW, is there a relation with the "main-loop: WARNING: I/O thread spun for 1000 iterations" ?
>> I'm still seeing that one popping up in the log.

> If you want to try to up the count to 10000 or 100000 it would be nice.

> If it doesn't disappear with 100000 it's definitely a bug.

It's somewhere between 10000 and 50000

> Paolo
diff mbox

Patch

--- a/qemu-char.c
+++ b/qemu-char.c
@@ -663,6 +663,7 @@  static guint io_add_watch_poll(GIOChannel *channel,
                                gpointer user_data)
 {
     IOWatchPoll *iwp;
+    int tag;
 
     iwp = (IOWatchPoll *) g_source_new(&io_watch_poll_funcs, sizeof(IOWatchPoll));
     iwp->fd_can_read = fd_can_read;
@@ -671,7 +672,9 @@  static guint io_add_watch_poll(GIOChannel *channel,
     iwp->fd_read = (GSourceFunc) fd_read;
     iwp->src = NULL;
 
-    return g_source_attach(&iwp->parent, NULL);
+    tag = g_source_attach(&iwp->parent, NULL);
+    g_source_unref(&iwp->parent);
+    return tag;
 }
 
 #ifndef _WIN32