Patchwork monitor: Protect outbuf from concurrent access

Submitter Luiz Capitulino
Date Sept. 1, 2011, 7:35 p.m.
Message ID <20110901163545.71ba1515@doriath>
Permalink /patch/112968/
State New

Comments

Luiz Capitulino - Sept. 1, 2011, 7:35 p.m.
Sometimes, when having lots of VMs running on a RHEV host and the user
attempts to close a SPICE window, libvirt will get corrupted json from
QEMU.

After some investigation, I found out that the problem is that different
SPICE threads are calling monitor functions (such as
monitor_protocol_event()) in parallel which causes concurrent access
to the monitor's internal buffer outbuf[].

This fixes the problem by protecting accesses to outbuf[] with a mutex.

Honestly speaking, I'm not completely sure this is the best thing to do
because the monitor itself and other qemu subsystems are not thread safe,
so having subsystems like SPICE assuming the contrary seems a bit
catastrophic to me...

Anyways, this commit fixes the problem at hand.

Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
---
 monitor.c |   16 +++++++++++++++-
 1 files changed, 15 insertions(+), 1 deletions(-)
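
The locking idea described in the commit message can be sketched outside
QEMU with plain pthreads. Every name below (OutBuf, outbuf_puts,
outbuf_flush, the sink array standing in for the chardev) is hypothetical
and only for illustration; this is not QEMU code and not the patch itself.
The sketch also makes two choices that differ from the patch: it checks the
pointer for NULL before touching the lock, and it flushes through an
internal helper that expects the lock to be held, instead of unlocking and
relocking around the flush.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

#define LINES_PER_THREAD 200

/* Hypothetical stand-in for the monitor: a small line buffer plus a lock. */
typedef struct OutBuf {
    char buf[64];
    size_t index;
    pthread_mutex_t lock;
    char sink[8192];          /* collects flushed bytes (the "chardev") */
    size_t sink_len;
} OutBuf;

static void outbuf_init(OutBuf *ob)
{
    memset(ob, 0, sizeof(*ob));
    pthread_mutex_init(&ob->lock, NULL);
}

/* Caller must hold ob->lock. Data is dropped if the sink is full. */
static void outbuf_flush_locked(OutBuf *ob)
{
    if (ob->index != 0 && ob->sink_len + ob->index <= sizeof(ob->sink)) {
        memcpy(ob->sink + ob->sink_len, ob->buf, ob->index);
        ob->sink_len += ob->index;
    }
    ob->index = 0;
}

static void outbuf_flush(OutBuf *ob)
{
    if (!ob) {
        return;               /* check the pointer before using the lock */
    }
    pthread_mutex_lock(&ob->lock);
    outbuf_flush_locked(ob);
    pthread_mutex_unlock(&ob->lock);
}

/* Flush at every end of line or if the buffer is full. The lock is held
 * for the whole string, so two callers cannot interleave their output. */
static void outbuf_puts(OutBuf *ob, const char *str)
{
    if (!ob) {
        return;
    }
    pthread_mutex_lock(&ob->lock);
    for (; *str != '\0'; str++) {
        assert(ob->index < sizeof(ob->buf));
        ob->buf[ob->index++] = *str;
        if (ob->index >= sizeof(ob->buf) - 1 || *str == '\n') {
            outbuf_flush_locked(ob);
        }
    }
    pthread_mutex_unlock(&ob->lock);
}

typedef struct WriterArg { OutBuf *ob; const char *line; } WriterArg;

static void *writer(void *opaque)
{
    WriterArg *wa = opaque;
    for (int i = 0; i < LINES_PER_THREAD; i++) {
        outbuf_puts(wa->ob, wa->line);
    }
    return NULL;
}

/* Two threads write complete lines; returns the number of intact lines
 * found in the sink, or -1 if any line was corrupted by interleaving. */
static int outbuf_demo(void)
{
    OutBuf ob;
    WriterArg a = { &ob, "AAAAAAAA\n" }, b = { &ob, "BBBBBBBB\n" };
    pthread_t ta, tb;
    int lines = 0;
    size_t i = 0;

    outbuf_init(&ob);
    pthread_create(&ta, NULL, writer, &a);
    pthread_create(&tb, NULL, writer, &b);
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);
    outbuf_flush(&ob);

    for (; i + 9 <= ob.sink_len; i += 9, lines++) {
        if (memcmp(ob.sink + i, a.line, 9) != 0 &&
            memcmp(ob.sink + i, b.line, 9) != 0) {
            return -1;
        }
    }
    pthread_mutex_destroy(&ob.lock);
    return i == ob.sink_len ? lines : -1;
}
```

Calling outbuf_demo() from a test program (built with -pthread) is one way
to exercise it. Note that this only protects the buffer itself; it says
nothing about state reached through the flush path, which is the concern
raised later in the thread.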
Daniel P. Berrange - Sept. 1, 2011, 7:47 p.m.
On Thu, Sep 01, 2011 at 04:35:45PM -0300, Luiz Capitulino wrote:
> Sometimes, when having lots of VMs running on a RHEV host and the user
> attempts to close a SPICE window, libvirt will get corrupted json from
> QEMU.
> 
> After some investigation, I found out that the problem is that different
> SPICE threads are calling monitor functions (such as
> monitor_protocol_event()) in parallel which causes concurrent access
> to the monitor's internal buffer outbuf[].
> 
> This fixes the problem by protecting accesses to outbuf[] with a mutex.
> 
> Honestly speaking, I'm not completely sure this is the best thing to do
> because the monitor itself and other qemu subsystems are not thread safe,
> so having subsystems like SPICE assuming the contrary seems a bit
> catastrophic to me...
> 
> Anyways, this commit fixes the problem at hand.

IMHO this patch should be applied to stable-0.15 as is, since it is
an important fix for SPICE, and this highly targeted mutex lock has
a low risk of regressions elsewhere.

I'd also apply it for master now, but at the same time perhaps start
work on adding broader locking that covers all APIs that monitor.c
exposes to internal QEMU code, so we're future proofed against other
surprises.

> Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>

  Signed-off-by: Daniel P. Berrange <berrange@redhat.com>


Regards,
Daniel
Jan Kiszka - Sept. 1, 2011, 9:03 p.m.
On 2011-09-01 21:35, Luiz Capitulino wrote:
> Sometimes, when having lots of VMs running on a RHEV host and the user
> attempts to close a SPICE window, libvirt will get corrupted json from
> QEMU.
> 
> After some investigation, I found out that the problem is that different
> SPICE threads are calling monitor functions (such as
> monitor_protocol_event()) in parallel which causes concurrent access
> to the monitor's internal buffer outbuf[].
> 
> This fixes the problem by protecting accesses to outbuf[] with a mutex.
> 
> Honestly speaking, I'm not completely sure this is the best thing to do
> because the monitor itself and other qemu subsystems are not thread safe,
> so having subsystems like SPICE assuming the contrary seems a bit
> catastrophic to me...

I fully agree.

...

> @@ -246,10 +248,14 @@ static int monitor_read_password(Monitor *mon, ReadLineFunc *readline_func,
>  
>  void monitor_flush(Monitor *mon)
>  {
> +    qemu_mutex_lock(&mon->mutex);
> +
>      if (mon && mon->outbuf_index != 0 && !mon->mux_out) {
>          qemu_chr_fe_write(mon->chr, mon->outbuf, mon->outbuf_index);
>          mon->outbuf_index = 0;
>      }
> +
> +    qemu_mutex_unlock(&mon->mutex);

Here is another example of things that can break due to "optimistic"
parallelization: What protects the chardev state that will be touched by
calling qemu_chr_fe_write? Even when ignoring mux'ed channels for now, I
bet there are code paths that modify the state without holding the
frontend lock (i.e. Monitor::mutex).

Jan
Anthony Liguori - Sept. 2, 2011, 1:34 a.m.
On 09/01/2011 02:35 PM, Luiz Capitulino wrote:
> Sometimes, when having lots of VMs running on a RHEV host and the user
> attempts to close a SPICE window, libvirt will get corrupted json from
> QEMU.
>
> After some investigation, I found out that the problem is that different
> SPICE threads are calling monitor functions (such as
> monitor_protocol_event()) in parallel which causes concurrent access
> to the monitor's internal buffer outbuf[].
>
> This fixes the problem by protecting accesses to outbuf[] with a mutex.
>
> Honestly speaking, I'm not completely sure this is the best thing to do
> because the monitor itself and other qemu subsystems are not thread safe,
> so having subsystems like SPICE assuming the contrary seems a bit
> catastrophic to me...
>
> Anyways, this commit fixes the problem at hand.

Nack.

This is absolutely a Spice bug.  Spice should not be calling into QEMU 
code from multiple threads.  It should only call into QEMU code while 
it's holding the qemu_mutex.

The right way to fix this is probably to make all of the 
SpiceCoreInterface callbacks simply write to a file descriptor which can 
then wake up QEMU to do the operation on behalf of it.   It's ugly but 
the libspice interface is far too tied to QEMU internals in the first 
place which is the root of the problem.

Regards,

Anthony Liguori
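
The file descriptor approach suggested above can be illustrated with a
pipe and poll(). This is a minimal sketch under stated assumptions, not the
SpiceCoreInterface or any QEMU API: Event, post_event, handle_event and
run_main_loop are hypothetical names. Worker threads only write a small
fixed-size record to the pipe; the single main-loop thread wakes up, reads
the record, and performs the operation on their behalf, so the
non-thread-safe state is only ever touched from one thread. It relies on
POSIX guaranteeing that pipe writes of at most PIPE_BUF bytes are atomic.

```c
#include <assert.h>
#include <poll.h>
#include <pthread.h>
#include <unistd.h>

#define NUM_WORKERS 4
#define EVENTS_PER_WORKER 100

typedef struct Event { int source; int seq; } Event;

static int wake_fd[2];            /* [0]: main loop reads, [1]: workers write */
static int next_seq[NUM_WORKERS]; /* touched only by the main-loop thread */
static int out_of_order;

/* Called from worker threads. It touches nothing but the pipe. */
static void post_event(int source, int seq)
{
    Event ev = { source, seq };
    ssize_t n = write(wake_fd[1], &ev, sizeof(ev));
    assert(n == (ssize_t)sizeof(ev));
    (void)n;
}

/* Runs only in the main-loop thread, so it needs no locking. */
static void handle_event(const Event *ev)
{
    if (ev->seq != next_seq[ev->source]) {
        out_of_order = 1;
    }
    next_seq[ev->source] = ev->seq + 1;
}

static void *worker(void *opaque)
{
    int id = *(int *)opaque;
    for (int i = 0; i < EVENTS_PER_WORKER; i++) {
        post_event(id, i);
    }
    return NULL;
}

/* Wait for the pipe to become readable and handle one event per wakeup. */
static int run_main_loop(int expected)
{
    struct pollfd pfd = { .fd = wake_fd[0], .events = POLLIN };
    int handled = 0;

    while (handled < expected) {
        Event ev;
        if (poll(&pfd, 1, -1) < 0) {
            break;
        }
        if (read(wake_fd[0], &ev, sizeof(ev)) != (ssize_t)sizeof(ev)) {
            break;
        }
        handle_event(&ev);
        handled++;
    }
    return handled;
}

/* Returns the number of events handled, or -1 on error or reordering. */
static int fd_wakeup_demo(void)
{
    pthread_t th[NUM_WORKERS];
    int ids[NUM_WORKERS];
    int handled;

    if (pipe(wake_fd) != 0) {
        return -1;
    }
    out_of_order = 0;
    for (int i = 0; i < NUM_WORKERS; i++) {
        next_seq[i] = 0;
        ids[i] = i;
        pthread_create(&th[i], NULL, worker, &ids[i]);
    }
    handled = run_main_loop(NUM_WORKERS * EVENTS_PER_WORKER);
    for (int i = 0; i < NUM_WORKERS; i++) {
        pthread_join(th[i], NULL);
    }
    close(wake_fd[0]);
    close(wake_fd[1]);
    return out_of_order ? -1 : handled;
}
```

The cost of this design is an extra hop through the main loop for every
callback, which is presumably the "ugly" part; the benefit is that the
callers no longer need to know anything about the locking rules of the code
they trigger.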

>
> Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
> ---
>   monitor.c |   16 +++++++++++++++-
>   1 files changed, 15 insertions(+), 1 deletions(-)
>
> diff --git a/monitor.c b/monitor.c
> index 04f465a..61d4d93 100644
> --- a/monitor.c
> +++ b/monitor.c
> @@ -57,6 +57,7 @@
>   #include "json-parser.h"
>   #include "osdep.h"
>   #include "cpu.h"
> +#include "qemu-thread.h"
>   #ifdef CONFIG_SIMPLE_TRACE
>   #include "trace.h"
>   #endif
> @@ -144,6 +145,7 @@ struct Monitor {
>       int suspend_cnt;
>       uint8_t outbuf[1024];
>       int outbuf_index;
> +    QemuMutex mutex;
>       ReadLineState *rs;
>       MonitorControl *mc;
>       CPUState *mon_cpu;
> @@ -246,10 +248,14 @@ static int monitor_read_password(Monitor *mon, ReadLineFunc *readline_func,
>
>   void monitor_flush(Monitor *mon)
>   {
> +    qemu_mutex_lock(&mon->mutex);
> +
>       if (mon && mon->outbuf_index != 0 && !mon->mux_out) {
>           qemu_chr_fe_write(mon->chr, mon->outbuf, mon->outbuf_index);
>           mon->outbuf_index = 0;
>       }
> +
> +    qemu_mutex_unlock(&mon->mutex);
>   }
>
>   /* flush at every end of line or if the buffer is full */
> @@ -257,6 +263,8 @@ static void monitor_puts(Monitor *mon, const char *str)
>   {
>       char c;
>
> +    qemu_mutex_lock(&mon->mutex);
> +
>       for(;;) {
>           c = *str++;
>           if (c == '\0')
> @@ -265,9 +273,14 @@ static void monitor_puts(Monitor *mon, const char *str)
>               mon->outbuf[mon->outbuf_index++] = '\r';
>           mon->outbuf[mon->outbuf_index++] = c;
>           if (mon->outbuf_index >= (sizeof(mon->outbuf) - 1)
> -            || c == '\n')
> +            || c == '\n') {
> +            qemu_mutex_unlock(&mon->mutex);
>               monitor_flush(mon);
> +            qemu_mutex_lock(&mon->mutex);
> +        }
>       }
> +
> +    qemu_mutex_unlock(&mon->mutex);
>   }
>
>   void monitor_vprintf(Monitor *mon, const char *fmt, va_list ap)
> @@ -5275,6 +5288,7 @@ void monitor_init(CharDriverState *chr, int flags)
>
>       mon = g_malloc0(sizeof(*mon));
>
> +    qemu_mutex_init(&mon->mutex);
>       mon->chr = chr;
>       mon->flags = flags;
>       if (flags & MONITOR_USE_READLINE) {
Daniel P. Berrange - Sept. 2, 2011, 9:41 a.m.
On Thu, Sep 01, 2011 at 08:34:35PM -0500, Anthony Liguori wrote:
> On 09/01/2011 02:35 PM, Luiz Capitulino wrote:
> >Sometimes, when having lots of VMs running on a RHEV host and the user
> >attempts to close a SPICE window, libvirt will get corrupted json from
> >QEMU.
> >
> >After some investigation, I found out that the problem is that different
> >SPICE threads are calling monitor functions (such as
> >monitor_protocol_event()) in parallel which causes concurrent access
> >to the monitor's internal buffer outbuf[].
> >
> >This fixes the problem by protecting accesses to outbuf[] with a mutex.
> >
> >Honestly speaking, I'm not completely sure this is the best thing to do
> >because the monitor itself and other qemu subsystems are not thread safe,
> >so having subsystems like SPICE assuming the contrary seems a bit
> >catastrophic to me...
> >
> >Anyways, this commit fixes the problem at hand.
> 
> Nack.
> 
> This is absolutely a Spice bug.  Spice should not be calling into
> QEMU code from multiple threads.  It should only call into QEMU code
> while it's holding the qemu_mutex.
> 
> The right way to fix this is probably to make all of the
> SpiceCoreInterface callbacks simply write to a file descriptor which
> can then wake up QEMU to do the operation on behalf of it.   It's
> ugly but the libspice interface is far too tied to QEMU internals in
> the first place which is the root of the problem.

This feels like a rather short-term approach to fixing the problem
to me. As QEMU becomes increasingly multi-threaded, there is a high
likelihood that we'll get other code in QEMU which wants to use the
monitor from multiple threads. The monitor code in QEMU is fairly
well isolated & thus comparatively easy to make threadsafe, so I
don't see why we wouldn't want to do that & avoid any chance of this
type of problem recurring in the future.

IMHO, "fixing" SPICE is not fixing the bug at all, it is just removing
the trigger of the bug in the monitor.

Regards,
Daniel
Jan Kiszka - Sept. 2, 2011, 11:26 a.m.
On 2011-09-02 11:41, Daniel P. Berrange wrote:
> On Thu, Sep 01, 2011 at 08:34:35PM -0500, Anthony Liguori wrote:
>> On 09/01/2011 02:35 PM, Luiz Capitulino wrote:
>>> Sometimes, when having lots of VMs running on a RHEV host and the user
>>> attempts to close a SPICE window, libvirt will get corrupted json from
>>> QEMU.
>>>
>>> After some investigation, I found out that the problem is that different
>>> SPICE threads are calling monitor functions (such as
>>> monitor_protocol_event()) in parallel which causes concurrent access
>>> to the monitor's internal buffer outbuf[].
>>>
>>> This fixes the problem by protecting accesses to outbuf[] with a mutex.
>>>
>>> Honestly speaking, I'm not completely sure this is the best thing to do
>>> because the monitor itself and other qemu subsystems are not thread safe,
>>> so having subsystems like SPICE assuming the contrary seems a bit
>>> catastrophic to me...
>>>
>>> Anyways, this commit fixes the problem at hand.
>>
>> Nack.
>>
>> This is absolutely a Spice bug.  Spice should not be calling into
>> QEMU code from multiple threads.  It should only call into QEMU code
>> while it's holding the qemu_mutex.
>>
>> The right way to fix this is probably to make all of the
>> SpiceCoreInterface callbacks simply write to a file descriptor which
>> can then wake up QEMU to do the operation on behalf of it.   It's
>> ugly but the libspice interface is far too tied to QEMU internals in
>> the first place which is the root of the problem.
> 
> This feels like a rather short-term approach to fixing the problem
> to me. As QEMU becomes increasingly multi-threaded, there is a high
> likelihood that we'll get other code in QEMU which wants to use the
> monitor from multiple threads. The monitor code in QEMU is fairly
> well isolated & thus comparatively easy to make threadsafe, so I

As pointed out before, this assumption is not correct.

> don't see why we wouldn't want to do that & avoid any chance of this
> type of problem recurring in the future.
> 
> IMHO, "fixing" SPICE is not fixing the bug at all, it is just removing
> the trigger of the bug in the monitor.

Until we have officially thread-safe subsystems, SPICE must take the
qemu_global_mutex before calling core services. This patch does not make
the monitor thread-safe as it does not address indirectly called services.

Jan
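
The point about indirectly called services can be made concrete with a
small pthreads sketch. The names here (global_lock, core_emit_event,
backend_write) are hypothetical stand-ins, not QEMU symbols: the idea is
only that a helper thread takes one global lock before calling any core
service, so state reached indirectly through that call is covered too,
which a lock private to one subsystem would not guarantee.

```c
#include <assert.h>
#include <pthread.h>

#define NUM_THREADS 4
#define CALLS_PER_THREAD 1000
#define EVENT_LEN 8

/* Hypothetical global lock guarding all non-thread-safe core state. */
static pthread_mutex_t global_lock = PTHREAD_MUTEX_INITIALIZER;

static long events_emitted;   /* state of the service called directly */
static long backend_bytes;    /* state of a service called indirectly */

/* Neither function takes a lock: both assume the caller holds global_lock. */
static void backend_write(int len)
{
    backend_bytes += len;
}

static void core_emit_event(int len)
{
    events_emitted++;
    backend_write(len);       /* indirect call into another subsystem */
}

/* A helper thread must take the global lock around every core call. */
static void *helper_thread(void *opaque)
{
    (void)opaque;
    for (int i = 0; i < CALLS_PER_THREAD; i++) {
        pthread_mutex_lock(&global_lock);
        core_emit_event(EVENT_LEN);
        pthread_mutex_unlock(&global_lock);
    }
    return NULL;
}

/* Returns the number of events emitted, or -1 if the two pieces of state
 * disagree (which would indicate a lost update). */
static long global_lock_demo(void)
{
    pthread_t th[NUM_THREADS];

    events_emitted = 0;
    backend_bytes = 0;
    for (int i = 0; i < NUM_THREADS; i++) {
        pthread_create(&th[i], NULL, helper_thread, NULL);
    }
    for (int i = 0; i < NUM_THREADS; i++) {
        pthread_join(th[i], NULL);
    }
    return backend_bytes == events_emitted * EVENT_LEN ? events_emitted : -1;
}
```

With the lock taken in helper_thread, both counters stay consistent even
though backend_write knows nothing about threads. Moving the lock inside
core_emit_event alone would leave any other caller of backend_write
unprotected, which mirrors the objection to a monitor-only mutex.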

Patch

diff --git a/monitor.c b/monitor.c
index 04f465a..61d4d93 100644
--- a/monitor.c
+++ b/monitor.c
@@ -57,6 +57,7 @@ 
 #include "json-parser.h"
 #include "osdep.h"
 #include "cpu.h"
+#include "qemu-thread.h"
 #ifdef CONFIG_SIMPLE_TRACE
 #include "trace.h"
 #endif
@@ -144,6 +145,7 @@  struct Monitor {
     int suspend_cnt;
     uint8_t outbuf[1024];
     int outbuf_index;
+    QemuMutex mutex;
     ReadLineState *rs;
     MonitorControl *mc;
     CPUState *mon_cpu;
@@ -246,10 +248,14 @@  static int monitor_read_password(Monitor *mon, ReadLineFunc *readline_func,
 
 void monitor_flush(Monitor *mon)
 {
+    qemu_mutex_lock(&mon->mutex);
+
     if (mon && mon->outbuf_index != 0 && !mon->mux_out) {
         qemu_chr_fe_write(mon->chr, mon->outbuf, mon->outbuf_index);
         mon->outbuf_index = 0;
     }
+
+    qemu_mutex_unlock(&mon->mutex);
 }
 
 /* flush at every end of line or if the buffer is full */
@@ -257,6 +263,8 @@  static void monitor_puts(Monitor *mon, const char *str)
 {
     char c;
 
+    qemu_mutex_lock(&mon->mutex);
+
     for(;;) {
         c = *str++;
         if (c == '\0')
@@ -265,9 +273,14 @@  static void monitor_puts(Monitor *mon, const char *str)
             mon->outbuf[mon->outbuf_index++] = '\r';
         mon->outbuf[mon->outbuf_index++] = c;
         if (mon->outbuf_index >= (sizeof(mon->outbuf) - 1)
-            || c == '\n')
+            || c == '\n') {
+            qemu_mutex_unlock(&mon->mutex);
             monitor_flush(mon);
+            qemu_mutex_lock(&mon->mutex);
+        }
     }
+
+    qemu_mutex_unlock(&mon->mutex);
 }
 
 void monitor_vprintf(Monitor *mon, const char *fmt, va_list ap)
@@ -5275,6 +5288,7 @@  void monitor_init(CharDriverState *chr, int flags)
 
     mon = g_malloc0(sizeof(*mon));
 
+    qemu_mutex_init(&mon->mutex);
     mon->chr = chr;
     mon->flags = flags;
     if (flags & MONITOR_USE_READLINE) {