diff mbox

[v2] monitor: work around delayed CHR_EVENT_OPENED events

Message ID 1369582419-28205-1-git-send-email-mdroth@linux.vnet.ibm.com
State New
Headers show

Commit Message

Michael Roth May 26, 2013, 3:33 p.m. UTC
In the past, CHR_EVENT_OPENED events were emitted via a pre-expired
QEMUTimer. Due to timers being processing at the tail end of each main
loop iteration, this generally meant that such events would be emitted
within the same main loop iteration, prior any client data being read
by tcp/unix socket server backends.

With 9f939df955a4152aad69a19a77e0898631bb2c18, this was switched to
a "bottom-half" that is registered via g_idle_add(). This makes it
likely that the event won't be sent until a subsequent iteration, and
also adds the possibility that such events will race with the
processing of client data.

In cases where we rely on the CHR_EVENT_OPENED event being delivered
prior to any client data being read, this may cause unexpected behavior.

In the case of a QMP monitor session using a socket backend, the delayed
processing of the CHR_EVENT_OPENED event can lead to a situation where
a previous session, where 'qmp_capabilities' has already processed, can
cause the rejection of 'qmp_capabilities' for a subsequent session,
since we reset capabilities in response to CHR_EVENT_OPENED, which may
not yet have been delivered. This can be reproduced with the following
command, generally within 50 or so iterations:

  mdroth@loki:~$ cat cmds
  {'execute':'qmp_capabilities'}
  {'execute':'query-cpus'}
  mdroth@loki:~$ while true; do if socat stdio unix-connect:/tmp/qmp.sock
  <cmds | grep CommandNotFound &>/dev/null; then echo failed; break; else
  echo ok; fi; done;
  ok
  ok
  failed
  mdroth@loki:~$

As a work-around, we reset capabilities as part of the CHR_EVENT_CLOSED
hook, which gets called as part of the GIOChannel cb associated with the
client rather than a bottom-half, and is thus guaranteed to be delivered
prior to accepting any subsequent client sessions.

This does not address the more general problem of whether or not there
are chardev users that rely on CHR_EVENT_OPENED being delivered prior to
client data, and whether or not we should consider moving CHR_EVENT_OPENED
"in-band" with connection establishment as a general solution, but fixes
QMP for the time being.

Reported-by: Stefan Priebe <s.priebe@profihost.ag>
Cc: qemu-stable@nongnu.org
Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
v1->v2:
* remove command_mode reset from CHR_EVENT_OPENED case, since this
  might still cause a race

 monitor.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Luiz Capitulino May 27, 2013, 3:24 p.m. UTC | #1
On Sun, 26 May 2013 10:33:39 -0500
Michael Roth <mdroth@linux.vnet.ibm.com> wrote:

> In the past, CHR_EVENT_OPENED events were emitted via a pre-expired
> QEMUTimer. Due to timers being processing at the tail end of each main
> loop iteration, this generally meant that such events would be emitted
> within the same main loop iteration, prior any client data being read
> by tcp/unix socket server backends.
> 
> With 9f939df955a4152aad69a19a77e0898631bb2c18, this was switched to
> a "bottom-half" that is registered via g_idle_add(). This makes it
> likely that the event won't be sent until a subsequent iteration, and
> also adds the possibility that such events will race with the
> processing of client data.
> 
> In cases where we rely on the CHR_EVENT_OPENED event being delivered
> prior to any client data being read, this may cause unexpected behavior.
> 
> In the case of a QMP monitor session using a socket backend, the delayed
> processing of the CHR_EVENT_OPENED event can lead to a situation where
> a previous session, where 'qmp_capabilities' has already processed, can
> cause the rejection of 'qmp_capabilities' for a subsequent session,
> since we reset capabilities in response to CHR_EVENT_OPENED, which may
> not yet have been delivered. This can be reproduced with the following
> command, generally within 50 or so iterations:
> 
>   mdroth@loki:~$ cat cmds
>   {'execute':'qmp_capabilities'}
>   {'execute':'query-cpus'}
>   mdroth@loki:~$ while true; do if socat stdio unix-connect:/tmp/qmp.sock
>   <cmds | grep CommandNotFound &>/dev/null; then echo failed; break; else
>   echo ok; fi; done;
>   ok
>   ok
>   failed
>   mdroth@loki:~$
> 
> As a work-around, we reset capabilities as part of the CHR_EVENT_CLOSED
> hook, which gets called as part of the GIOChannel cb associated with the
> client rather than a bottom-half, and is thus guaranteed to be delivered
> prior to accepting any subsequent client sessions.
> 
> This does not address the more general problem of whether or not there
> are chardev users that rely on CHR_EVENT_OPENED being delivered prior to
> client data, and whether or not we should consider moving CHR_EVENT_OPENED
> "in-band" with connection establishment as a general solution, but fixes
> QMP for the time being.
> 
> Reported-by: Stefan Priebe <s.priebe@profihost.ag>
> Cc: qemu-stable@nongnu.org
> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>

Thanks for debugging this Michael. I'm going to apply your patch to the qmp
branch because we can't let this broken (esp. in -stable) but this is papering
over the real bug...

> ---
> v1->v2:
> * remove command_mode reset from CHR_EVENT_OPENED case, since this
>   might still cause a race
> 
>  monitor.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/monitor.c b/monitor.c
> index 6ce2a4e..f1953a0 100644
> --- a/monitor.c
> +++ b/monitor.c
> @@ -4649,7 +4649,6 @@ static void monitor_control_event(void *opaque, int event)
>  
>      switch (event) {
>      case CHR_EVENT_OPENED:
> -        mon->mc->command_mode = 0;
>          data = get_qmp_greeting();
>          monitor_json_emitter(mon, data);
>          qobject_decref(data);
> @@ -4660,6 +4659,7 @@ static void monitor_control_event(void *opaque, int event)
>          json_message_parser_init(&mon->mc->parser, handle_qmp_command);
>          mon_refcount--;
>          monitor_fdsets_cleanup();
> +        mon->mc->command_mode = 0;
>          break;
>      }
>  }
Anthony Liguori May 27, 2013, 4:16 p.m. UTC | #2
Luiz Capitulino <lcapitulino@redhat.com> writes:

> On Sun, 26 May 2013 10:33:39 -0500
> Michael Roth <mdroth@linux.vnet.ibm.com> wrote:
>
>> In the past, CHR_EVENT_OPENED events were emitted via a pre-expired
>> QEMUTimer. Due to timers being processing at the tail end of each main
>> loop iteration, this generally meant that such events would be emitted
>> within the same main loop iteration, prior any client data being read
>> by tcp/unix socket server backends.
>> 
>> With 9f939df955a4152aad69a19a77e0898631bb2c18, this was switched to
>> a "bottom-half" that is registered via g_idle_add(). This makes it
>> likely that the event won't be sent until a subsequent iteration, and
>> also adds the possibility that such events will race with the
>> processing of client data.
>> 
>> In cases where we rely on the CHR_EVENT_OPENED event being delivered
>> prior to any client data being read, this may cause unexpected behavior.
>> 
>> In the case of a QMP monitor session using a socket backend, the delayed
>> processing of the CHR_EVENT_OPENED event can lead to a situation where
>> a previous session, where 'qmp_capabilities' has already processed, can
>> cause the rejection of 'qmp_capabilities' for a subsequent session,
>> since we reset capabilities in response to CHR_EVENT_OPENED, which may
>> not yet have been delivered. This can be reproduced with the following
>> command, generally within 50 or so iterations:
>> 
>>   mdroth@loki:~$ cat cmds
>>   {'execute':'qmp_capabilities'}
>>   {'execute':'query-cpus'}
>>   mdroth@loki:~$ while true; do if socat stdio unix-connect:/tmp/qmp.sock
>>   <cmds | grep CommandNotFound &>/dev/null; then echo failed; break; else
>>   echo ok; fi; done;
>>   ok
>>   ok
>>   failed
>>   mdroth@loki:~$
>> 
>> As a work-around, we reset capabilities as part of the CHR_EVENT_CLOSED
>> hook, which gets called as part of the GIOChannel cb associated with the
>> client rather than a bottom-half, and is thus guaranteed to be delivered
>> prior to accepting any subsequent client sessions.
>> 
>> This does not address the more general problem of whether or not there
>> are chardev users that rely on CHR_EVENT_OPENED being delivered prior to
>> client data, and whether or not we should consider moving CHR_EVENT_OPENED
>> "in-band" with connection establishment as a general solution, but fixes
>> QMP for the time being.
>> 
>> Reported-by: Stefan Priebe <s.priebe@profihost.ag>
>> Cc: qemu-stable@nongnu.org
>> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
>
> Thanks for debugging this Michael. I'm going to apply your patch to the qmp
> branch because we can't let this broken (esp. in -stable) but this is papering
> over the real bug...

Do we really need OPENED to happen in a bottom half?  Shouldn't the open
event handlers register the callback instead if they need it?

Regards,

Anthony Liguori

>
>> ---
>> v1->v2:
>> * remove command_mode reset from CHR_EVENT_OPENED case, since this
>>   might still cause a race
>> 
>>  monitor.c |    2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/monitor.c b/monitor.c
>> index 6ce2a4e..f1953a0 100644
>> --- a/monitor.c
>> +++ b/monitor.c
>> @@ -4649,7 +4649,6 @@ static void monitor_control_event(void *opaque, int event)
>>  
>>      switch (event) {
>>      case CHR_EVENT_OPENED:
>> -        mon->mc->command_mode = 0;
>>          data = get_qmp_greeting();
>>          monitor_json_emitter(mon, data);
>>          qobject_decref(data);
>> @@ -4660,6 +4659,7 @@ static void monitor_control_event(void *opaque, int event)
>>          json_message_parser_init(&mon->mc->parser, handle_qmp_command);
>>          mon_refcount--;
>>          monitor_fdsets_cleanup();
>> +        mon->mc->command_mode = 0;
>>          break;
>>      }
>>  }
Michael Roth May 27, 2013, 5:59 p.m. UTC | #3
On Mon, May 27, 2013 at 11:16:01AM -0500, Anthony Liguori wrote:
> Luiz Capitulino <lcapitulino@redhat.com> writes:
> 
> > On Sun, 26 May 2013 10:33:39 -0500
> > Michael Roth <mdroth@linux.vnet.ibm.com> wrote:
> >
> >> In the past, CHR_EVENT_OPENED events were emitted via a pre-expired
> >> QEMUTimer. Due to timers being processing at the tail end of each main
> >> loop iteration, this generally meant that such events would be emitted
> >> within the same main loop iteration, prior any client data being read
> >> by tcp/unix socket server backends.
> >> 
> >> With 9f939df955a4152aad69a19a77e0898631bb2c18, this was switched to
> >> a "bottom-half" that is registered via g_idle_add(). This makes it
> >> likely that the event won't be sent until a subsequent iteration, and
> >> also adds the possibility that such events will race with the
> >> processing of client data.
> >> 
> >> In cases where we rely on the CHR_EVENT_OPENED event being delivered
> >> prior to any client data being read, this may cause unexpected behavior.
> >> 
> >> In the case of a QMP monitor session using a socket backend, the delayed
> >> processing of the CHR_EVENT_OPENED event can lead to a situation where
> >> a previous session, where 'qmp_capabilities' has already processed, can
> >> cause the rejection of 'qmp_capabilities' for a subsequent session,
> >> since we reset capabilities in response to CHR_EVENT_OPENED, which may
> >> not yet have been delivered. This can be reproduced with the following
> >> command, generally within 50 or so iterations:
> >> 
> >>   mdroth@loki:~$ cat cmds
> >>   {'execute':'qmp_capabilities'}
> >>   {'execute':'query-cpus'}
> >>   mdroth@loki:~$ while true; do if socat stdio unix-connect:/tmp/qmp.sock
> >>   <cmds | grep CommandNotFound &>/dev/null; then echo failed; break; else
> >>   echo ok; fi; done;
> >>   ok
> >>   ok
> >>   failed
> >>   mdroth@loki:~$
> >> 
> >> As a work-around, we reset capabilities as part of the CHR_EVENT_CLOSED
> >> hook, which gets called as part of the GIOChannel cb associated with the
> >> client rather than a bottom-half, and is thus guaranteed to be delivered
> >> prior to accepting any subsequent client sessions.
> >> 
> >> This does not address the more general problem of whether or not there
> >> are chardev users that rely on CHR_EVENT_OPENED being delivered prior to
> >> client data, and whether or not we should consider moving CHR_EVENT_OPENED
> >> "in-band" with connection establishment as a general solution, but fixes
> >> QMP for the time being.
> >> 
> >> Reported-by: Stefan Priebe <s.priebe@profihost.ag>
> >> Cc: qemu-stable@nongnu.org
> >> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> >
> > Thanks for debugging this Michael. I'm going to apply your patch to the qmp
> > branch because we can't let this broken (esp. in -stable) but this is papering
> > over the real bug...
> 
> Do we really need OPENED to happen in a bottom half?  Shouldn't the open
> event handlers register the callback instead if they need it?

I think that's the more general fix, but wasn't sure if there were
historical reasons why we didn't do this in the first place.

Looking at it more closely it does seem like we need a general fix
sooner rather than later though, and I don't see any reason why we can't
move BH registration into the actual OPENED hooks as you suggest:

hw/usb/ccid-card-passthru.c
 - currently affected? no
    debug hook only
 - needs OPENED BH? no

hw/usb/redirect.c
 - currently affected? yes
    can_read handler checks for dev->parser != NULL, which may be
    true if CLOSED BH has been executed yet. In the past, OPENED
    quiesced outstanding CLOSED events prior to us reading client data.
 - need OPENED BH? possibly
    seems to only be needed by CLOSED events, which can be issued by
    USBDevice, so we do teardown/usb_detach in BH. For OPENED, this
    may not apply. if we do issue a BH, we'd need to modify can_read
    handler to avoid race.

hw/usb/dev-serial.c
 - currently affected? no
    can_read checks for dev.attached != NULL. set to NULL
    synchronously in CLOSED, will not accept reads until OPENED gets
    called and sets dev.attached
 - need opened BH?  no

hw/char/sclpconsole.c
 - currently affected? no
    guarded by can_read handler
 - need opened BH? no

hw/char/ipoctal232.c
 - currently affected? no
    debug hook only
 - need opened BH? no

hw/char/virtio-console.c
 - currently affected? no
    OPENED/CLOSED map to vq events handled asynchronously. can_read
    checks for guest_connected state rather than anything set by OPENED
 - need opened BH? no

qtest.c
 - currently affected? yes
    can_read handler doesn't check for qtest_opened == true, can read
    data before OPENED event is processed
 - need opened BH? no

monitor.c
 - current affected? yes
    negotiated session state can temporarilly leak into a new session
 - need opened BH? no

gdbstub.c
 - currently affected? yes
    can fail to pause machine prior to starting gdb session
 - need opened BH? no

So we have a number of cases that can be fixed by avoiding the use of
the generic BH, and only 1 possible cause where we might need a
device-specific BH.

At first glance anyway. So if this all seems reasonable I can send a
more general fix shortly.

> 
> Regards,
> 
> Anthony Liguori
> 
> >
> >> ---
> >> v1->v2:
> >> * remove command_mode reset from CHR_EVENT_OPENED case, since this
> >>   might still cause a race
> >> 
> >>  monitor.c |    2 +-
> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >> 
> >> diff --git a/monitor.c b/monitor.c
> >> index 6ce2a4e..f1953a0 100644
> >> --- a/monitor.c
> >> +++ b/monitor.c
> >> @@ -4649,7 +4649,6 @@ static void monitor_control_event(void *opaque, int event)
> >>  
> >>      switch (event) {
> >>      case CHR_EVENT_OPENED:
> >> -        mon->mc->command_mode = 0;
> >>          data = get_qmp_greeting();
> >>          monitor_json_emitter(mon, data);
> >>          qobject_decref(data);
> >> @@ -4660,6 +4659,7 @@ static void monitor_control_event(void *opaque, int event)
> >>          json_message_parser_init(&mon->mc->parser, handle_qmp_command);
> >>          mon_refcount--;
> >>          monitor_fdsets_cleanup();
> >> +        mon->mc->command_mode = 0;
> >>          break;
> >>      }
> >>  }
>
Luiz Capitulino May 29, 2013, 5:27 p.m. UTC | #4
On Mon, 27 May 2013 12:59:25 -0500
mdroth <mdroth@linux.vnet.ibm.com> wrote:

> On Mon, May 27, 2013 at 11:16:01AM -0500, Anthony Liguori wrote:
> > Luiz Capitulino <lcapitulino@redhat.com> writes:
> > 
> > > On Sun, 26 May 2013 10:33:39 -0500
> > > Michael Roth <mdroth@linux.vnet.ibm.com> wrote:
> > >
> > >> In the past, CHR_EVENT_OPENED events were emitted via a pre-expired
> > >> QEMUTimer. Due to timers being processing at the tail end of each main
> > >> loop iteration, this generally meant that such events would be emitted
> > >> within the same main loop iteration, prior any client data being read
> > >> by tcp/unix socket server backends.
> > >> 
> > >> With 9f939df955a4152aad69a19a77e0898631bb2c18, this was switched to
> > >> a "bottom-half" that is registered via g_idle_add(). This makes it
> > >> likely that the event won't be sent until a subsequent iteration, and
> > >> also adds the possibility that such events will race with the
> > >> processing of client data.
> > >> 
> > >> In cases where we rely on the CHR_EVENT_OPENED event being delivered
> > >> prior to any client data being read, this may cause unexpected behavior.
> > >> 
> > >> In the case of a QMP monitor session using a socket backend, the delayed
> > >> processing of the CHR_EVENT_OPENED event can lead to a situation where
> > >> a previous session, where 'qmp_capabilities' has already processed, can
> > >> cause the rejection of 'qmp_capabilities' for a subsequent session,
> > >> since we reset capabilities in response to CHR_EVENT_OPENED, which may
> > >> not yet have been delivered. This can be reproduced with the following
> > >> command, generally within 50 or so iterations:
> > >> 
> > >>   mdroth@loki:~$ cat cmds
> > >>   {'execute':'qmp_capabilities'}
> > >>   {'execute':'query-cpus'}
> > >>   mdroth@loki:~$ while true; do if socat stdio unix-connect:/tmp/qmp.sock
> > >>   <cmds | grep CommandNotFound &>/dev/null; then echo failed; break; else
> > >>   echo ok; fi; done;
> > >>   ok
> > >>   ok
> > >>   failed
> > >>   mdroth@loki:~$
> > >> 
> > >> As a work-around, we reset capabilities as part of the CHR_EVENT_CLOSED
> > >> hook, which gets called as part of the GIOChannel cb associated with the
> > >> client rather than a bottom-half, and is thus guaranteed to be delivered
> > >> prior to accepting any subsequent client sessions.
> > >> 
> > >> This does not address the more general problem of whether or not there
> > >> are chardev users that rely on CHR_EVENT_OPENED being delivered prior to
> > >> client data, and whether or not we should consider moving CHR_EVENT_OPENED
> > >> "in-band" with connection establishment as a general solution, but fixes
> > >> QMP for the time being.
> > >> 
> > >> Reported-by: Stefan Priebe <s.priebe@profihost.ag>
> > >> Cc: qemu-stable@nongnu.org
> > >> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> > >
> > > Thanks for debugging this Michael. I'm going to apply your patch to the qmp
> > > branch because we can't let this broken (esp. in -stable) but this is papering
> > > over the real bug...
> > 
> > Do we really need OPENED to happen in a bottom half?  Shouldn't the open
> > event handlers register the callback instead if they need it?
> 
> I think that's the more general fix, but wasn't sure if there were
> historical reasons why we didn't do this in the first place.
> 
> Looking at it more closely it does seem like we need a general fix
> sooner rather than later though, and I don't see any reason why we can't
> move BH registration into the actual OPENED hooks as you suggest:
> 
> hw/usb/ccid-card-passthru.c
>  - currently affected? no
>     debug hook only
>  - needs OPENED BH? no
> 
> hw/usb/redirect.c
>  - currently affected? yes
>     can_read handler checks for dev->parser != NULL, which may be
>     true if CLOSED BH has been executed yet. In the past, OPENED
>     quiesced outstanding CLOSED events prior to us reading client data.
>  - need OPENED BH? possibly
>     seems to only be needed by CLOSED events, which can be issued by
>     USBDevice, so we do teardown/usb_detach in BH. For OPENED, this
>     may not apply. if we do issue a BH, we'd need to modify can_read
>     handler to avoid race.
> 
> hw/usb/dev-serial.c
>  - currently affected? no
>     can_read checks for dev.attached != NULL. set to NULL
>     synchronously in CLOSED, will not accept reads until OPENED gets
>     called and sets dev.attached
>  - need opened BH?  no
> 
> hw/char/sclpconsole.c
>  - currently affected? no
>     guarded by can_read handler
>  - need opened BH? no
> 
> hw/char/ipoctal232.c
>  - currently affected? no
>     debug hook only
>  - need opened BH? no
> 
> hw/char/virtio-console.c
>  - currently affected? no
>     OPENED/CLOSED map to vq events handled asynchronously. can_read
>     checks for guest_connected state rather than anything set by OPENED
>  - need opened BH? no
> 
> qtest.c
>  - currently affected? yes
>     can_read handler doesn't check for qtest_opened == true, can read
>     data before OPENED event is processed
>  - need opened BH? no
> 
> monitor.c
>  - current affected? yes
>     negotiated session state can temporarilly leak into a new session
>  - need opened BH? no
> 
> gdbstub.c
>  - currently affected? yes
>     can fail to pause machine prior to starting gdb session
>  - need opened BH? no
> 
> So we have a number of cases that can be fixed by avoiding the use of
> the generic BH, and only 1 possible cause where we might need a
> device-specific BH.
> 
> At first glance anyway. So if this all seems reasonable I can send a
> more general fix shortly.

Michael, I've dropped your first patch and am taking it that you're
going to cook this more general fix.

> 
> > 
> > Regards,
> > 
> > Anthony Liguori
> > 
> > >
> > >> ---
> > >> v1->v2:
> > >> * remove command_mode reset from CHR_EVENT_OPENED case, since this
> > >>   might still cause a race
> > >> 
> > >>  monitor.c |    2 +-
> > >>  1 file changed, 1 insertion(+), 1 deletion(-)
> > >> 
> > >> diff --git a/monitor.c b/monitor.c
> > >> index 6ce2a4e..f1953a0 100644
> > >> --- a/monitor.c
> > >> +++ b/monitor.c
> > >> @@ -4649,7 +4649,6 @@ static void monitor_control_event(void *opaque, int event)
> > >>  
> > >>      switch (event) {
> > >>      case CHR_EVENT_OPENED:
> > >> -        mon->mc->command_mode = 0;
> > >>          data = get_qmp_greeting();
> > >>          monitor_json_emitter(mon, data);
> > >>          qobject_decref(data);
> > >> @@ -4660,6 +4659,7 @@ static void monitor_control_event(void *opaque, int event)
> > >>          json_message_parser_init(&mon->mc->parser, handle_qmp_command);
> > >>          mon_refcount--;
> > >>          monitor_fdsets_cleanup();
> > >> +        mon->mc->command_mode = 0;
> > >>          break;
> > >>      }
> > >>  }
> > 
>
Michael Roth May 29, 2013, 9:03 p.m. UTC | #5
On Wed, May 29, 2013 at 01:27:33PM -0400, Luiz Capitulino wrote:
> On Mon, 27 May 2013 12:59:25 -0500
> mdroth <mdroth@linux.vnet.ibm.com> wrote:
> 
> > On Mon, May 27, 2013 at 11:16:01AM -0500, Anthony Liguori wrote:
> > > Luiz Capitulino <lcapitulino@redhat.com> writes:
> > > 
> > > > On Sun, 26 May 2013 10:33:39 -0500
> > > > Michael Roth <mdroth@linux.vnet.ibm.com> wrote:
> > > >
> > > >> In the past, CHR_EVENT_OPENED events were emitted via a pre-expired
> > > >> QEMUTimer. Due to timers being processing at the tail end of each main
> > > >> loop iteration, this generally meant that such events would be emitted
> > > >> within the same main loop iteration, prior any client data being read
> > > >> by tcp/unix socket server backends.
> > > >> 
> > > >> With 9f939df955a4152aad69a19a77e0898631bb2c18, this was switched to
> > > >> a "bottom-half" that is registered via g_idle_add(). This makes it
> > > >> likely that the event won't be sent until a subsequent iteration, and
> > > >> also adds the possibility that such events will race with the
> > > >> processing of client data.
> > > >> 
> > > >> In cases where we rely on the CHR_EVENT_OPENED event being delivered
> > > >> prior to any client data being read, this may cause unexpected behavior.
> > > >> 
> > > >> In the case of a QMP monitor session using a socket backend, the delayed
> > > >> processing of the CHR_EVENT_OPENED event can lead to a situation where
> > > >> a previous session, where 'qmp_capabilities' has already processed, can
> > > >> cause the rejection of 'qmp_capabilities' for a subsequent session,
> > > >> since we reset capabilities in response to CHR_EVENT_OPENED, which may
> > > >> not yet have been delivered. This can be reproduced with the following
> > > >> command, generally within 50 or so iterations:
> > > >> 
> > > >>   mdroth@loki:~$ cat cmds
> > > >>   {'execute':'qmp_capabilities'}
> > > >>   {'execute':'query-cpus'}
> > > >>   mdroth@loki:~$ while true; do if socat stdio unix-connect:/tmp/qmp.sock
> > > >>   <cmds | grep CommandNotFound &>/dev/null; then echo failed; break; else
> > > >>   echo ok; fi; done;
> > > >>   ok
> > > >>   ok
> > > >>   failed
> > > >>   mdroth@loki:~$
> > > >> 
> > > >> As a work-around, we reset capabilities as part of the CHR_EVENT_CLOSED
> > > >> hook, which gets called as part of the GIOChannel cb associated with the
> > > >> client rather than a bottom-half, and is thus guaranteed to be delivered
> > > >> prior to accepting any subsequent client sessions.
> > > >> 
> > > >> This does not address the more general problem of whether or not there
> > > >> are chardev users that rely on CHR_EVENT_OPENED being delivered prior to
> > > >> client data, and whether or not we should consider moving CHR_EVENT_OPENED
> > > >> "in-band" with connection establishment as a general solution, but fixes
> > > >> QMP for the time being.
> > > >> 
> > > >> Reported-by: Stefan Priebe <s.priebe@profihost.ag>
> > > >> Cc: qemu-stable@nongnu.org
> > > >> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> > > >
> > > > Thanks for debugging this Michael. I'm going to apply your patch to the qmp
> > > > branch because we can't let this broken (esp. in -stable) but this is papering
> > > > over the real bug...
> > > 
> > > Do we really need OPENED to happen in a bottom half?  Shouldn't the open
> > > event handlers register the callback instead if they need it?
> > 
> > I think that's the more general fix, but wasn't sure if there were
> > historical reasons why we didn't do this in the first place.
> > 
> > Looking at it more closely it does seem like we need a general fix
> > sooner rather than later though, and I don't see any reason why we can't
> > move BH registration into the actual OPENED hooks as you suggest:
> > 
> > hw/usb/ccid-card-passthru.c
> >  - currently affected? no
> >     debug hook only
> >  - needs OPENED BH? no
> > 
> > hw/usb/redirect.c
> >  - currently affected? yes
> >     can_read handler checks for dev->parser != NULL, which may be
> >     true if CLOSED BH has been executed yet. In the past, OPENED
> >     quiesced outstanding CLOSED events prior to us reading client data.
> >  - need OPENED BH? possibly
> >     seems to only be needed by CLOSED events, which can be issued by
> >     USBDevice, so we do teardown/usb_detach in BH. For OPENED, this
> >     may not apply. if we do issue a BH, we'd need to modify can_read
> >     handler to avoid race.
> > 
> > hw/usb/dev-serial.c
> >  - currently affected? no
> >     can_read checks for dev.attached != NULL. set to NULL
> >     synchronously in CLOSED, will not accept reads until OPENED gets
> >     called and sets dev.attached
> >  - need opened BH?  no
> > 
> > hw/char/sclpconsole.c
> >  - currently affected? no
> >     guarded by can_read handler
> >  - need opened BH? no
> > 
> > hw/char/ipoctal232.c
> >  - currently affected? no
> >     debug hook only
> >  - need opened BH? no
> > 
> > hw/char/virtio-console.c
> >  - currently affected? no
> >     OPENED/CLOSED map to vq events handled asynchronously. can_read
> >     checks for guest_connected state rather than anything set by OPENED
> >  - need opened BH? no
> > 
> > qtest.c
> >  - currently affected? yes
> >     can_read handler doesn't check for qtest_opened == true, can read
> >     data before OPENED event is processed
> >  - need opened BH? no
> > 
> > monitor.c
> >  - current affected? yes
> >     negotiated session state can temporarilly leak into a new session
> >  - need opened BH? no
> > 
> > gdbstub.c
> >  - currently affected? yes
> >     can fail to pause machine prior to starting gdb session
> >  - need opened BH? no
> > 
> > So we have a number of cases that can be fixed by avoiding the use of
> > the generic BH, and only 1 possible cause where we might need a
> > device-specific BH.
> > 
> > At first glance anyway. So if this all seems reasonable I can send a
> > more general fix shortly.
> 
> Michael, I've dropped your first patch and am taking it that you're
> going to cook this more general fix.
> 

Yup, baking now

> > 
> > > 
> > > Regards,
> > > 
> > > Anthony Liguori
> > > 
> > > >
> > > >> ---
> > > >> v1->v2:
> > > >> * remove command_mode reset from CHR_EVENT_OPENED case, since this
> > > >>   might still cause a race
> > > >> 
> > > >>  monitor.c |    2 +-
> > > >>  1 file changed, 1 insertion(+), 1 deletion(-)
> > > >> 
> > > >> diff --git a/monitor.c b/monitor.c
> > > >> index 6ce2a4e..f1953a0 100644
> > > >> --- a/monitor.c
> > > >> +++ b/monitor.c
> > > >> @@ -4649,7 +4649,6 @@ static void monitor_control_event(void *opaque, int event)
> > > >>  
> > > >>      switch (event) {
> > > >>      case CHR_EVENT_OPENED:
> > > >> -        mon->mc->command_mode = 0;
> > > >>          data = get_qmp_greeting();
> > > >>          monitor_json_emitter(mon, data);
> > > >>          qobject_decref(data);
> > > >> @@ -4660,6 +4659,7 @@ static void monitor_control_event(void *opaque, int event)
> > > >>          json_message_parser_init(&mon->mc->parser, handle_qmp_command);
> > > >>          mon_refcount--;
> > > >>          monitor_fdsets_cleanup();
> > > >> +        mon->mc->command_mode = 0;
> > > >>          break;
> > > >>      }
> > > >>  }
> > > 
> > 
>
Michael Roth May 29, 2013, 10:29 p.m. UTC | #6
On Wed, May 29, 2013 at 01:27:33PM -0400, Luiz Capitulino wrote:
> On Mon, 27 May 2013 12:59:25 -0500
> mdroth <mdroth@linux.vnet.ibm.com> wrote:
> 
> > On Mon, May 27, 2013 at 11:16:01AM -0500, Anthony Liguori wrote:
> > > Luiz Capitulino <lcapitulino@redhat.com> writes:
> > > 
> > > > On Sun, 26 May 2013 10:33:39 -0500
> > > > Michael Roth <mdroth@linux.vnet.ibm.com> wrote:
> > > >
> > > >> In the past, CHR_EVENT_OPENED events were emitted via a pre-expired
> > > >> QEMUTimer. Due to timers being processing at the tail end of each main
> > > >> loop iteration, this generally meant that such events would be emitted
> > > >> within the same main loop iteration, prior any client data being read
> > > >> by tcp/unix socket server backends.
> > > >> 
> > > >> With 9f939df955a4152aad69a19a77e0898631bb2c18, this was switched to
> > > >> a "bottom-half" that is registered via g_idle_add(). This makes it
> > > >> likely that the event won't be sent until a subsequent iteration, and
> > > >> also adds the possibility that such events will race with the
> > > >> processing of client data.
> > > >> 
> > > >> In cases where we rely on the CHR_EVENT_OPENED event being delivered
> > > >> prior to any client data being read, this may cause unexpected behavior.
> > > >> 
> > > >> In the case of a QMP monitor session using a socket backend, the delayed
> > > >> processing of the CHR_EVENT_OPENED event can lead to a situation where
> > > >> a previous session, where 'qmp_capabilities' has already processed, can
> > > >> cause the rejection of 'qmp_capabilities' for a subsequent session,
> > > >> since we reset capabilities in response to CHR_EVENT_OPENED, which may
> > > >> not yet have been delivered. This can be reproduced with the following
> > > >> command, generally within 50 or so iterations:
> > > >> 
> > > >>   mdroth@loki:~$ cat cmds
> > > >>   {'execute':'qmp_capabilities'}
> > > >>   {'execute':'query-cpus'}
> > > >>   mdroth@loki:~$ while true; do if socat stdio unix-connect:/tmp/qmp.sock
> > > >>   <cmds | grep CommandNotFound &>/dev/null; then echo failed; break; else
> > > >>   echo ok; fi; done;
> > > >>   ok
> > > >>   ok
> > > >>   failed
> > > >>   mdroth@loki:~$
> > > >> 
> > > >> As a work-around, we reset capabilities as part of the CHR_EVENT_CLOSED
> > > >> hook, which gets called as part of the GIOChannel cb associated with the
> > > >> client rather than a bottom-half, and is thus guaranteed to be delivered
> > > >> prior to accepting any subsequent client sessions.
> > > >> 
> > > >> This does not address the more general problem of whether or not there
> > > >> are chardev users that rely on CHR_EVENT_OPENED being delivered prior to
> > > >> client data, and whether or not we should consider moving CHR_EVENT_OPENED
> > > >> "in-band" with connection establishment as a general solution, but fixes
> > > >> QMP for the time being.
> > > >> 
> > > >> Reported-by: Stefan Priebe <s.priebe@profihost.ag>
> > > >> Cc: qemu-stable@nongnu.org
> > > >> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> > > >
> > > > Thanks for debugging this Michael. I'm going to apply your patch to the qmp
> > > > branch because we can't let this broken (esp. in -stable) but this is papering
> > > > over the real bug...
> > > 
> > > Do we really need OPENED to happen in a bottom half?  Shouldn't the open
> > > event handlers register the callback instead if they need it?
> > 
> > I think that's the more general fix, but wasn't sure if there were
> > historical reasons why we didn't do this in the first place.
> > 
> > Looking at it more closely it does seem like we need a general fix
> > sooner rather than later though, and I don't see any reason why we can't
> > move BH registration into the actual OPENED hooks as you suggest:
> > 
> > hw/usb/ccid-card-passthru.c
> >  - currently affected? no
> >     debug hook only
> >  - needs OPENED BH? no
> > 
> > hw/usb/redirect.c
> >  - currently affected? yes
> >     can_read handler checks for dev->parser != NULL, which may be
> >     true if CLOSED BH has been executed yet. In the past, OPENED
> >     quiesced outstanding CLOSED events prior to us reading client data.
> >  - need OPENED BH? possibly
> >     seems to only be needed by CLOSED events, which can be issued by
> >     USBDevice, so we do teardown/usb_detach in BH. For OPENED, this
> >     may not apply. if we do issue a BH, we'd need to modify can_read
> >     handler to avoid race.
> > 
> > hw/usb/dev-serial.c
> >  - currently affected? no
> >     can_read checks for dev.attached != NULL. set to NULL
> >     synchronously in CLOSED, will not accept reads until OPENED gets
> >     called and sets dev.attached
> >  - need opened BH?  no
> > 
> > hw/char/sclpconsole.c
> >  - currently affected? no
> >     guarded by can_read handler
> >  - need opened BH? no
> > 
> > hw/char/ipoctal232.c
> >  - currently affected? no
> >     debug hook only
> >  - need opened BH? no
> > 
> > hw/char/virtio-console.c
> >  - currently affected? no
> >     OPENED/CLOSED map to vq events handled asynchronously. can_read
> >     checks for guest_connected state rather than anything set by OPENED
> >  - need opened BH? no
> > 
> > qtest.c
> >  - currently affected? yes
> >     can_read handler doesn't check for qtest_opened == true, can read
> >     data before OPENED event is processed
> >  - need opened BH? no
> > 
> > monitor.c
> >  - current affected? yes
> >     negotiated session state can temporarilly leak into a new session
> >  - need opened BH? no
> > 
> > gdbstub.c
> >  - currently affected? yes
> >     can fail to pause machine prior to starting gdb session
> >  - need opened BH? no
> > 
> > So we have a number of cases that can be fixed by avoiding the use of
> > the generic BH, and only 1 possible cause where we might need a
> > device-specific BH.
> > 
> > At first glance anyway. So if this all seems reasonable I can send a
> > more general fix shortly.
> 
> Michael, I've dropped your first patch and am taking it that you're
> going to cook this more general fix.

fix sent:

http://article.gmane.org/gmane.comp.emulators.qemu/213863

> 
> > 
> > > 
> > > Regards,
> > > 
> > > Anthony Liguori
> > > 
> > > >
> > > >> ---
> > > >> v1->v2:
> > > >> * remove command_mode reset from CHR_EVENT_OPENED case, since this
> > > >>   might still cause a race
> > > >> 
> > > >>  monitor.c |    2 +-
> > > >>  1 file changed, 1 insertion(+), 1 deletion(-)
> > > >> 
> > > >> diff --git a/monitor.c b/monitor.c
> > > >> index 6ce2a4e..f1953a0 100644
> > > >> --- a/monitor.c
> > > >> +++ b/monitor.c
> > > >> @@ -4649,7 +4649,6 @@ static void monitor_control_event(void *opaque, int event)
> > > >>  
> > > >>      switch (event) {
> > > >>      case CHR_EVENT_OPENED:
> > > >> -        mon->mc->command_mode = 0;
> > > >>          data = get_qmp_greeting();
> > > >>          monitor_json_emitter(mon, data);
> > > >>          qobject_decref(data);
> > > >> @@ -4660,6 +4659,7 @@ static void monitor_control_event(void *opaque, int event)
> > > >>          json_message_parser_init(&mon->mc->parser, handle_qmp_command);
> > > >>          mon_refcount--;
> > > >>          monitor_fdsets_cleanup();
> > > >> +        mon->mc->command_mode = 0;
> > > >>          break;
> > > >>      }
> > > >>  }
> > > 
> > 
>
Stefan Priebe - Profihost AG May 30, 2013, 6:58 a.m. UTC | #7
Am 30.05.2013 00:29, schrieb mdroth:
> On Wed, May 29, 2013 at 01:27:33PM -0400, Luiz Capitulino wrote:
>> On Mon, 27 May 2013 12:59:25 -0500
>> mdroth <mdroth@linux.vnet.ibm.com> wrote:
>>
>>> On Mon, May 27, 2013 at 11:16:01AM -0500, Anthony Liguori wrote:
>>>> Luiz Capitulino <lcapitulino@redhat.com> writes:
>>>>
>>>>> On Sun, 26 May 2013 10:33:39 -0500
>>>>> Michael Roth <mdroth@linux.vnet.ibm.com> wrote:
>>>>>
>>>>>> In the past, CHR_EVENT_OPENED events were emitted via a pre-expired
>>>>>> QEMUTimer. Due to timers being processing at the tail end of each main
>>>>>> loop iteration, this generally meant that such events would be emitted
>>>>>> within the same main loop iteration, prior any client data being read
>>>>>> by tcp/unix socket server backends.
>>>>>>
>>>>>> With 9f939df955a4152aad69a19a77e0898631bb2c18, this was switched to
>>>>>> a "bottom-half" that is registered via g_idle_add(). This makes it
>>>>>> likely that the event won't be sent until a subsequent iteration, and
>>>>>> also adds the possibility that such events will race with the
>>>>>> processing of client data.
>>>>>>
>>>>>> In cases where we rely on the CHR_EVENT_OPENED event being delivered
>>>>>> prior to any client data being read, this may cause unexpected behavior.
>>>>>>
>>>>>> In the case of a QMP monitor session using a socket backend, the delayed
>>>>>> processing of the CHR_EVENT_OPENED event can lead to a situation where
>>>>>> a previous session, where 'qmp_capabilities' has already processed, can
>>>>>> cause the rejection of 'qmp_capabilities' for a subsequent session,
>>>>>> since we reset capabilities in response to CHR_EVENT_OPENED, which may
>>>>>> not yet have been delivered. This can be reproduced with the following
>>>>>> command, generally within 50 or so iterations:
>>>>>>
>>>>>>   mdroth@loki:~$ cat cmds
>>>>>>   {'execute':'qmp_capabilities'}
>>>>>>   {'execute':'query-cpus'}
>>>>>>   mdroth@loki:~$ while true; do if socat stdio unix-connect:/tmp/qmp.sock
>>>>>>   <cmds | grep CommandNotFound &>/dev/null; then echo failed; break; else
>>>>>>   echo ok; fi; done;
>>>>>>   ok
>>>>>>   ok
>>>>>>   failed
>>>>>>   mdroth@loki:~$
>>>>>>
>>>>>> As a work-around, we reset capabilities as part of the CHR_EVENT_CLOSED
>>>>>> hook, which gets called as part of the GIOChannel cb associated with the
>>>>>> client rather than a bottom-half, and is thus guaranteed to be delivered
>>>>>> prior to accepting any subsequent client sessions.
>>>>>>
>>>>>> This does not address the more general problem of whether or not there
>>>>>> are chardev users that rely on CHR_EVENT_OPENED being delivered prior to
>>>>>> client data, and whether or not we should consider moving CHR_EVENT_OPENED
>>>>>> "in-band" with connection establishment as a general solution, but fixes
>>>>>> QMP for the time being.
>>>>>>
>>>>>> Reported-by: Stefan Priebe <s.priebe@profihost.ag>
>>>>>> Cc: qemu-stable@nongnu.org
>>>>>> Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
>>>>>
>>>>> Thanks for debugging this Michael. I'm going to apply your patch to the qmp
>>>>> branch because we can't let this broken (esp. in -stable) but this is papering
>>>>> over the real bug...
>>>>
>>>> Do we really need OPENED to happen in a bottom half?  Shouldn't the open
>>>> event handlers register the callback instead if they need it?
>>>
>>> I think that's the more general fix, but wasn't sure if there were
>>> historical reasons why we didn't do this in the first place.
>>>
>>> Looking at it more closely it does seem like we need a general fix
>>> sooner rather than later though, and I don't see any reason why we can't
>>> move BH registration into the actual OPENED hooks as you suggest:
>>>
>>> hw/usb/ccid-card-passthru.c
>>>  - currently affected? no
>>>     debug hook only
>>>  - needs OPENED BH? no
>>>
>>> hw/usb/redirect.c
>>>  - currently affected? yes
>>>     can_read handler checks for dev->parser != NULL, which may be
>>>     true if CLOSED BH has been executed yet. In the past, OPENED
>>>     quiesced outstanding CLOSED events prior to us reading client data.
>>>  - need OPENED BH? possibly
>>>     seems to only be needed by CLOSED events, which can be issued by
>>>     USBDevice, so we do teardown/usb_detach in BH. For OPENED, this
>>>     may not apply. if we do issue a BH, we'd need to modify can_read
>>>     handler to avoid race.
>>>
>>> hw/usb/dev-serial.c
>>>  - currently affected? no
>>>     can_read checks for dev.attached != NULL. set to NULL
>>>     synchronously in CLOSED, will not accept reads until OPENED gets
>>>     called and sets dev.attached
>>>  - need opened BH?  no
>>>
>>> hw/char/sclpconsole.c
>>>  - currently affected? no
>>>     guarded by can_read handler
>>>  - need opened BH? no
>>>
>>> hw/char/ipoctal232.c
>>>  - currently affected? no
>>>     debug hook only
>>>  - need opened BH? no
>>>
>>> hw/char/virtio-console.c
>>>  - currently affected? no
>>>     OPENED/CLOSED map to vq events handled asynchronously. can_read
>>>     checks for guest_connected state rather than anything set by OPENED
>>>  - need opened BH? no
>>>
>>> qtest.c
>>>  - currently affected? yes
>>>     can_read handler doesn't check for qtest_opened == true, can read
>>>     data before OPENED event is processed
>>>  - need opened BH? no
>>>
>>> monitor.c
>>>  - current affected? yes
>>>     negotiated session state can temporarilly leak into a new session
>>>  - need opened BH? no
>>>
>>> gdbstub.c
>>>  - currently affected? yes
>>>     can fail to pause machine prior to starting gdb session
>>>  - need opened BH? no
>>>
>>> So we have a number of cases that can be fixed by avoiding the use of
>>> the generic BH, and only 1 possible cause where we might need a
>>> device-specific BH.
>>>
>>> At first glance anyway. So if this all seems reasonable I can send a
>>> more general fix shortly.
>>
>> Michael, I've dropped your first patch and am taking it that you're
>> going to cook this more general fix.
> 
> fix sent:
> 
> http://article.gmane.org/gmane.comp.emulators.qemu/213863

thanks seems to work fine

Stefan
diff mbox

Patch

diff --git a/monitor.c b/monitor.c
index 6ce2a4e..f1953a0 100644
--- a/monitor.c
+++ b/monitor.c
@@ -4649,7 +4649,6 @@  static void monitor_control_event(void *opaque, int event)
 
     switch (event) {
     case CHR_EVENT_OPENED:
-        mon->mc->command_mode = 0;
         data = get_qmp_greeting();
         monitor_json_emitter(mon, data);
         qobject_decref(data);
@@ -4660,6 +4659,7 @@  static void monitor_control_event(void *opaque, int event)
         json_message_parser_init(&mon->mc->parser, handle_qmp_command);
         mon_refcount--;
         monitor_fdsets_cleanup();
+        mon->mc->command_mode = 0;
         break;
     }
 }