Patchwork Abort in monitor_puts.

login
register
mail settings
Submitter Luiz Capitulino
Date March 22, 2013, 9:39 p.m.
Message ID <20130322173904.66d2f5ce@doriath>
Download mbox | patch
Permalink /patch/230268/
State New
Headers show

Comments

Luiz Capitulino - March 22, 2013, 9:39 p.m.
On Fri, 22 Mar 2013 16:50:39 -0400
Luiz Capitulino <lcapitulino@redhat.com> wrote:

> On Fri, 22 Mar 2013 10:17:58 +0100
> KONRAD Frédéric <fred.konrad@greensocs.com> wrote:
> 
> > Hi,
> > 
> > Seems there is an issue with the current git (found by toddf on IRC).
> > 
> > To reproduce:
> > 
> > ./qemu-system-x86_64 --monitor stdio --nographic
> > 
> > and put "?" it should abort.
> > 
> > Here is the backtrace:
> > 
> > #0  0x00007f77cd347935 in raise () from /lib64/libc.so.6
> > #1  0x00007f77cd3490e8 in abort () from /lib64/libc.so.6
> > #2  0x00007f77cd3406a2 in __assert_fail_base () from /lib64/libc.so.6
> > #3  0x00007f77cd340752 in __assert_fail () from /lib64/libc.so.6
> > #4  0x00007f77d1c1f226 in monitor_puts (mon=<optimized out>,
> >      str=<optimized out>) at 
> 
> Yes, it's easy to reproduce. Bisect says:
> 
> f628926bb423fa8a7e0b114511400ea9df38b76a is the first bad commit
> commit f628926bb423fa8a7e0b114511400ea9df38b76a
> Author: Gerd Hoffmann <kraxel@redhat.com>
> Date:   Tue Mar 19 10:57:56 2013 +0100
> 
>     fix monitor
>     
>     chardev flow control broke monitor, fix it by adding watch support.
>     
>     Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
> 
> My impression is that monitor_puts() in being called in parallel.

Not all.

What's happening is that qemu_chr_fe_write() is returning < 0,
mon->outbuf_index is not reset and is full, this causes the assert in
monitor_puts() to trig.

The previous version of monitor_flush() ignores errors, and everything
works, so doing the same thing here fixes the problem :)

For some reason I'm unable to see what the error code is. Gerd, do you think
the patch below is reasonable? If it's not, how should we handle errors here?
Gerd Hoffmann - March 25, 2013, 7:42 a.m.
On 03/22/13 22:39, Luiz Capitulino wrote:
> On Fri, 22 Mar 2013 16:50:39 -0400
> Luiz Capitulino <lcapitulino@redhat.com> wrote:
> 
>> On Fri, 22 Mar 2013 10:17:58 +0100
>> KONRAD Frédéric <fred.konrad@greensocs.com> wrote:
>>
>>> Hi,
>>>
>>> Seems there is an issue with the current git (found by toddf on IRC).
>>>
>>> To reproduce:
>>>
>>> ./qemu-system-x86_64 --monitor stdio --nographic
>>>
>>> and put "?" it should abort.
>>>
>>> Here is the backtrace:
>>>
>>> #0  0x00007f77cd347935 in raise () from /lib64/libc.so.6
>>> #1  0x00007f77cd3490e8 in abort () from /lib64/libc.so.6
>>> #2  0x00007f77cd3406a2 in __assert_fail_base () from /lib64/libc.so.6
>>> #3  0x00007f77cd340752 in __assert_fail () from /lib64/libc.so.6
>>> #4  0x00007f77d1c1f226 in monitor_puts (mon=<optimized out>,
>>>      str=<optimized out>) at 
>>
>> Yes, it's easy to reproduce. Bisect says:
>>
>> f628926bb423fa8a7e0b114511400ea9df38b76a is the first bad commit
>> commit f628926bb423fa8a7e0b114511400ea9df38b76a
>> Author: Gerd Hoffmann <kraxel@redhat.com>
>> Date:   Tue Mar 19 10:57:56 2013 +0100
>>
>>     fix monitor
>>     
>>     chardev flow control broke monitor, fix it by adding watch support.
>>     
>>     Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
>>
>> My impression is that monitor_puts() in being called in parallel.
> 
> Not all.
> 
> What's happening is that qemu_chr_fe_write() is returning < 0,
> mon->outbuf_index is not reset and is full, this causes the assert in
> monitor_puts() to trig.
> 
> The previous version of monitor_flush() ignores errors, and everything
> works, so doing the same thing here fixes the problem :)

No, ignoring errors breaks qmp because the output isn't valid json any
more when you cut off something ...

> For some reason I'm unable to see what the error code is. Gerd, do you think
> the patch below is reasonable? If it's not, how should we handle errors here?

No, it's not.

Ignoring the error for errno = EAGAIN breaks flow control.

Ignoring the error for errno != EAGAIN (and maybe logging a debug
message) would be ok, but I suspect it's actually EAGAIN you get here.

Just go for a larger buffer?

cheers,
  Gerd
Luiz Capitulino - March 25, 2013, 11:56 a.m.
On Mon, 25 Mar 2013 08:42:57 +0100
Gerd Hoffmann <kraxel@redhat.com> wrote:

> On 03/22/13 22:39, Luiz Capitulino wrote:
> > On Fri, 22 Mar 2013 16:50:39 -0400
> > Luiz Capitulino <lcapitulino@redhat.com> wrote:
> > 
> >> On Fri, 22 Mar 2013 10:17:58 +0100
> >> KONRAD Frédéric <fred.konrad@greensocs.com> wrote:
> >>
> >>> Hi,
> >>>
> >>> Seems there is an issue with the current git (found by toddf on IRC).
> >>>
> >>> To reproduce:
> >>>
> >>> ./qemu-system-x86_64 --monitor stdio --nographic
> >>>
> >>> and put "?" it should abort.
> >>>
> >>> Here is the backtrace:
> >>>
> >>> #0  0x00007f77cd347935 in raise () from /lib64/libc.so.6
> >>> #1  0x00007f77cd3490e8 in abort () from /lib64/libc.so.6
> >>> #2  0x00007f77cd3406a2 in __assert_fail_base () from /lib64/libc.so.6
> >>> #3  0x00007f77cd340752 in __assert_fail () from /lib64/libc.so.6
> >>> #4  0x00007f77d1c1f226 in monitor_puts (mon=<optimized out>,
> >>>      str=<optimized out>) at 
> >>
> >> Yes, it's easy to reproduce. Bisect says:
> >>
> >> f628926bb423fa8a7e0b114511400ea9df38b76a is the first bad commit
> >> commit f628926bb423fa8a7e0b114511400ea9df38b76a
> >> Author: Gerd Hoffmann <kraxel@redhat.com>
> >> Date:   Tue Mar 19 10:57:56 2013 +0100
> >>
> >>     fix monitor
> >>     
> >>     chardev flow control broke monitor, fix it by adding watch support.
> >>     
> >>     Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
> >>
> >> My impression is that monitor_puts() in being called in parallel.
> > 
> > Not all.
> > 
> > What's happening is that qemu_chr_fe_write() is returning < 0,
> > mon->outbuf_index is not reset and is full, this causes the assert in
> > monitor_puts() to trig.
> > 
> > The previous version of monitor_flush() ignores errors, and everything
> > works, so doing the same thing here fixes the problem :)
> 
> No, ignoring errors breaks qmp because the output isn't valid json any
> more when you cut off something ...

What you mean "when you cut off"? When the other side disconnects? Do we care?

> > For some reason I'm unable to see what the error code is. Gerd, do you think
> > the patch below is reasonable? If it's not, how should we handle errors here?
> 
> No, it's not.
> 
> Ignoring the error for errno = EAGAIN breaks flow control.
> 
> Ignoring the error for errno != EAGAIN (and maybe logging a debug
> message) would be ok, but I suspect it's actually EAGAIN you get here.
> 
> Just go for a larger buffer?

That's simple, but it's not a real fix. We hit that problem because
the help output is a large one. I'd guess that this is easily reproduced
with something like QIDL, which (iirc) generates long json output on QMP.

Looks like we need a dynamic buffer there.

Other ideas?
Gerd Hoffmann - March 25, 2013, 8:11 p.m.
Hi,

>>> The previous version of monitor_flush() ignores errors, and everything
>>> works, so doing the same thing here fixes the problem :)
>>
>> No, ignoring errors breaks qmp because the output isn't valid json any
>> more when you cut off something ...
> 
> What you mean "when you cut off"? When the other side disconnects? Do we care?

errno = EAGAIN means "kernel buffers full, can't accept your data atm,
try again later".  Simply ignoring this will throw away the data which
didn't fit, which will for the receiver look like someone cut off some
data from the response ...

>> Just go for a larger buffer?
> 
> That's simple, but it's not a real fix. We hit that problem because
> the help output is a large one. I'd guess that this is easily reproduced
> with something like QIDL, which (iirc) generates long json output on QMP.
> 
> Looks like we need a dynamic buffer there.

Yes.

Or generate the data piecewise (i.e. in monitor_unblocked which is
called back when the kernel has room again).  Which is probably only
worth the trouble if we have _really_ big responses (megabytes).

cheers,
  Gerd

Patch

diff --git a/monitor.c b/monitor.c
index cfb5d64..ecfe97c 100644
--- a/monitor.c
+++ b/monitor.c
@@ -274,12 +274,11 @@  void monitor_flush(Monitor *mon)
 
     if (mon && mon->outbuf_index != 0 && !mon->mux_out) {
         rc = qemu_chr_fe_write(mon->chr, mon->outbuf, mon->outbuf_index);
-        if (rc == mon->outbuf_index) {
+        if (rc == mon->outbuf_index || rc < 0) {
             /* all flushed */
             mon->outbuf_index = 0;
             return;
-        }
-        if (rc > 0) {
+        } else {
             /* partinal write */
             memmove(mon->outbuf, mon->outbuf + rc, mon->outbuf_index - rc);
             mon->outbuf_index -= rc;