Message ID | 20110901163545.71ba1515@doriath |
---|---|
State | New |
On Thu, Sep 01, 2011 at 04:35:45PM -0300, Luiz Capitulino wrote:
> Sometimes, when having lots of VMs running on a RHEV host and the user
> attempts to close a SPICE window, libvirt will get corrupted json from
> QEMU.
>
> After some investigation, I found out that the problem is that different
> SPICE threads are calling monitor functions (such as
> monitor_protocol_event()) in parallel which causes concurrent access
> to the monitor's internal buffer outbuf[].
>
> This fixes the problem by protecting accesses to outbuf[] with a mutex.
>
> Honestly speaking, I'm not completely sure this is the best thing to do
> because the monitor itself and other qemu subsystems are not thread safe,
> so having subsystems like SPICE assuming the contrary seems a bit
> catastrophic to me...
>
> Anyways, this commit fixes the problem at hand.

IMHO this patch should be applied to stable-0.15 as is, since it is an important fix for SPICE, and this highly targeted mutex lock has a low risk of regressions elsewhere.

I'd also apply it to master now, but at the same time perhaps start work on adding broader locking that covers all APIs that monitor.c exposes to internal QEMU code, so we're future-proofed against other surprises.

> Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>

Regards,
Daniel
On 2011-09-01 21:35, Luiz Capitulino wrote:
> Sometimes, when having lots of VMs running on a RHEV host and the user
> attempts to close a SPICE window, libvirt will get corrupted json from
> QEMU.
>
> After some investigation, I found out that the problem is that different
> SPICE threads are calling monitor functions (such as
> monitor_protocol_event()) in parallel which causes concurrent access
> to the monitor's internal buffer outbuf[].
>
> This fixes the problem by protecting accesses to outbuf[] with a mutex.
>
> Honestly speaking, I'm not completely sure this is the best thing to do
> because the monitor itself and other qemu subsystems are not thread safe,
> so having subsystems like SPICE assuming the contrary seems a bit
> catastrophic to me...

I fully agree.

...

> @@ -246,10 +248,14 @@ static int monitor_read_password(Monitor *mon, ReadLineFunc *readline_func,
>
>  void monitor_flush(Monitor *mon)
>  {
> +    qemu_mutex_lock(&mon->mutex);
> +
>      if (mon && mon->outbuf_index != 0 && !mon->mux_out) {
>          qemu_chr_fe_write(mon->chr, mon->outbuf, mon->outbuf_index);
>          mon->outbuf_index = 0;
>      }
> +
> +    qemu_mutex_unlock(&mon->mutex);

Here is another example of things that can break due to "optimistic" parallelization: what protects the chardev state that will be touched by calling qemu_chr_fe_write()? Even when ignoring mux'ed channels for now, I bet there are code paths that modify that state without holding the frontend lock (i.e. Monitor::mutex).

Jan
On 09/01/2011 02:35 PM, Luiz Capitulino wrote:
> Sometimes, when having lots of VMs running on a RHEV host and the user
> attempts to close a SPICE window, libvirt will get corrupted json from
> QEMU.
>
> After some investigation, I found out that the problem is that different
> SPICE threads are calling monitor functions (such as
> monitor_protocol_event()) in parallel which causes concurrent access
> to the monitor's internal buffer outbuf[].
>
> This fixes the problem by protecting accesses to outbuf[] with a mutex.
>
> Honestly speaking, I'm not completely sure this is the best thing to do
> because the monitor itself and other qemu subsystems are not thread safe,
> so having subsystems like SPICE assuming the contrary seems a bit
> catastrophic to me...
>
> Anyways, this commit fixes the problem at hand.

Nack.

This is absolutely a Spice bug. Spice should not be calling into
QEMU code from multiple threads. It should only call into QEMU code
while it's holding the qemu_mutex.

The right way to fix this is probably to make all of the
SpiceCoreInterface callbacks simply write to a file descriptor which
can then wake up QEMU to do the operation on behalf of it. It's
ugly but the libspice interface is far too tied to QEMU internals in
the first place which is the root of the problem.
Regards,
Anthony Liguori
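Anthony's suggestion, having the SpiceCoreInterface callbacks write to a file descriptor so that QEMU's own thread performs the operation, is a form of pipe-based wakeup. The minimal POSIX sketch below illustrates only that pattern, under stated assumptions: `defer_event()`, `emit_event_unsafe()` and `deferral_demo()` are hypothetical names, a blocking `read()` stands in for a real main loop that would `poll()` the descriptor alongside its others, and nothing here reflects the actual libspice or QEMU interfaces.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical names throughout: this is not QEMU's main loop or the
 * libspice SpiceCoreInterface, only the deferral pattern. */
#define NWORKERS 3
#define EVENTS_PER_WORKER 100

typedef struct Event {
    int id;
    struct Event *next;
} Event;

static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;
static Event *queue_head, *queue_tail;  /* protected by queue_lock */
static int wakeup_fd[2];                /* [0] read end, [1] write end */
static pthread_t main_thread;
static int handled, handled_off_main;   /* touched only by the main loop */

/* Stand-in for non-thread-safe core code (e.g. an event emitter). */
static void emit_event_unsafe(const Event *ev)
{
    (void)ev;
    if (!pthread_equal(pthread_self(), main_thread))
        handled_off_main++;
    handled++;
}

/* Safe to call from any thread: queue the request, then wake the main loop. */
static void defer_event(int id)
{
    Event *ev = malloc(sizeof(*ev));
    assert(ev != NULL);
    ev->id = id;
    ev->next = NULL;

    pthread_mutex_lock(&queue_lock);
    if (queue_tail != NULL)
        queue_tail->next = ev;
    else
        queue_head = ev;
    queue_tail = ev;
    pthread_mutex_unlock(&queue_lock);

    char byte = 1;
    ssize_t n = write(wakeup_fd[1], &byte, 1);  /* one byte per request */
    assert(n == 1);
    (void)n;
}

static void *worker(void *arg)
{
    int base = (int)(intptr_t)arg * EVENTS_PER_WORKER;
    for (int i = 0; i < EVENTS_PER_WORKER; i++)
        defer_event(base + i);
    return NULL;
}

/* Main-loop side. Returns the number of events handled, or -1 if any
 * event was handled off the calling (main) thread. */
int deferral_demo(void)
{
    const int total = NWORKERS * EVENTS_PER_WORKER;
    pthread_t t[NWORKERS];

    if (pipe(wakeup_fd) != 0)
        return -1;
    main_thread = pthread_self();
    for (int i = 0; i < NWORKERS; i++)
        pthread_create(&t[i], NULL, worker, (void *)(intptr_t)i);

    while (handled < total) {
        char buf[64];
        /* Block until at least one wakeup byte arrives. */
        if (read(wakeup_fd[0], buf, sizeof(buf)) <= 0)
            break;

        /* Detach the whole queue under the lock, then run the unsafe
         * code without holding it. */
        pthread_mutex_lock(&queue_lock);
        Event *list = queue_head;
        queue_head = queue_tail = NULL;
        pthread_mutex_unlock(&queue_lock);

        while (list != NULL) {
            Event *next = list->next;
            emit_event_unsafe(list);
            free(list);
            list = next;
        }
    }

    for (int i = 0; i < NWORKERS; i++)
        pthread_join(t[i], NULL);
    close(wakeup_fd[0]);
    close(wakeup_fd[1]);
    return handled_off_main == 0 ? handled : -1;
}
```

Called once, `deferral_demo()` should return 300 (3 workers, 100 events each), because every event runs on the calling thread. Since each request writes its byte only after it is queued, a drain can never miss a request whose wakeup byte was already consumed. The cost is an extra queue and a thread hop per batch, in exchange for never running the non-thread-safe code concurrently.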
On Thu, Sep 01, 2011 at 08:34:35PM -0500, Anthony Liguori wrote:
> On 09/01/2011 02:35 PM, Luiz Capitulino wrote:
> >Sometimes, when having lots of VMs running on a RHEV host and the user
> >attempts to close a SPICE window, libvirt will get corrupted json from
> >QEMU.
> >
> >After some investigation, I found out that the problem is that different
> >SPICE threads are calling monitor functions (such as
> >monitor_protocol_event()) in parallel which causes concurrent access
> >to the monitor's internal buffer outbuf[].
> >
> >This fixes the problem by protecting accesses to outbuf[] with a mutex.
> >
> >Honestly speaking, I'm not completely sure this is the best thing to do
> >because the monitor itself and other qemu subsystems are not thread safe,
> >so having subsystems like SPICE assuming the contrary seems a bit
> >catastrophic to me...
> >
> >Anyways, this commit fixes the problem at hand.
>
> Nack.
>
> This is absolutely a Spice bug. Spice should not be calling into
> QEMU code from multiple threads. It should only call into QEMU code
> while it's holding the qemu_mutex.
>
> The right way to fix this is probably to make all of the
> SpiceCoreInterface callbacks simply write to a file descriptor which
> can then wake up QEMU to do the operation on behalf of it. It's
> ugly but the libspice interface is far too tied to QEMU internals in
> the first place which is the root of the problem.

This feels like a rather short-term approach to fixing the problem
to me. As QEMU becomes increasingly multi-threaded, there is high
likelihood that we'll get other code in QEMU which wants to use the
monitor from multiple threads. The monitor code in QEMU is fairly
well isolated & thus comparatively easy to make threadsafe, so I
don't see why we wouldn't want to do that & avoid any chance of this
type of problem recurring in the future.

IMHO, "fixing" SPICE is not fixing the bug at all, it is just removing
the trigger of the bug in the monitor.

Regards,
Daniel
On 2011-09-02 11:41, Daniel P. Berrange wrote:
> On Thu, Sep 01, 2011 at 08:34:35PM -0500, Anthony Liguori wrote:
>> On 09/01/2011 02:35 PM, Luiz Capitulino wrote:
>>> Sometimes, when having lots of VMs running on a RHEV host and the user
>>> attempts to close a SPICE window, libvirt will get corrupted json from
>>> QEMU.
>>>
>>> After some investigation, I found out that the problem is that different
>>> SPICE threads are calling monitor functions (such as
>>> monitor_protocol_event()) in parallel which causes concurrent access
>>> to the monitor's internal buffer outbuf[].
>>>
>>> This fixes the problem by protecting accesses to outbuf[] with a mutex.
>>>
>>> Honestly speaking, I'm not completely sure this is the best thing to do
>>> because the monitor itself and other qemu subsystems are not thread safe,
>>> so having subsystems like SPICE assuming the contrary seems a bit
>>> catastrophic to me...
>>>
>>> Anyways, this commit fixes the problem at hand.
>>
>> Nack.
>>
>> This is absolutely a Spice bug. Spice should not be calling into
>> QEMU code from multiple threads. It should only call into QEMU code
>> while it's holding the qemu_mutex.
>>
>> The right way to fix this is probably to make all of the
>> SpiceCoreInterface callbacks simply write to a file descriptor which
>> can then wake up QEMU to do the operation on behalf of it. It's
>> ugly but the libspice interface is far too tied to QEMU internals in
>> the first place which is the root of the problem.
>
> This feels like a rather short-term approach to fixing the problem
> to me. As QEMU becomes increasingly multi-threaded, there is high
> likelihood that we'll get other code in QEMU which wants to use the
> monitor from multiple threads. The monitor code in QEMU is fairly
> well isolated & thus comparatively easy to make threadsafe, so I

As pointed out before, this assumption is not correct.

> don't see why we wouldn't want to do that & avoid any chance of this
> type of problem recurring in the future.
>
> IMHO, "fixing" SPICE is not fixing the bug at all, it is just removing
> the trigger of the bug in the monitor.

Until we have officially thread-safe subsystems, SPICE must take the qemu_global_mutex before calling core services. This patch does not make the monitor thread-safe, as it does not address indirectly called services.

Jan
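Jan's point, that callers must take a global lock before entering core services because those services in turn touch other state, can be illustrated with a hypothetical "big lock" sketch. Here `big_lock`, `core_emit()` and `chardev_write()` are invented stand-ins, not `qemu_global_mutex` or the real chardev API. The core does no locking of its own; holding the lock in the caller also serializes the indirectly called `chardev_write()`, which a mutex private to the output buffer would not cover.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

/* Hypothetical sketch: the "core" below has no locking of its own.
 * Callers on other threads hold big_lock for the entire call, which
 * also covers state the core touches indirectly (the chardev stand-in). */
#define NTHREADS 4
#define NCALLS 1000
#define MSG "{\"event\": \"EXAMPLE\"}"

static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;

static char line_buf[128];        /* monitor-like output buffer */
static size_t chardev_bytes;      /* chardev-like backend state */
static unsigned chardev_writes;
static int in_core;               /* flags overlapping calls into the core */

static void chardev_write(const char *buf, size_t len)  /* called indirectly */
{
    (void)buf;
    chardev_bytes += len;
    chardev_writes++;
}

static void core_emit(const char *msg)                  /* called directly */
{
    assert(!in_core);             /* may fire if two threads ever overlapped */
    in_core = 1;
    size_t n = strlen(msg);
    if (n > sizeof(line_buf) - 1)
        n = sizeof(line_buf) - 1; /* leave room for the newline */
    memcpy(line_buf, msg, n);
    line_buf[n] = '\n';
    chardev_write(line_buf, n + 1);
    in_core = 0;
}

static void *spice_like_thread(void *arg)
{
    (void)arg;
    for (int i = 0; i < NCALLS; i++) {
        pthread_mutex_lock(&big_lock);    /* taken by the caller, not the core */
        core_emit(MSG);
        pthread_mutex_unlock(&big_lock);
    }
    return NULL;
}

/* Returns 1 if the backend state matches the number of calls made. */
int big_lock_demo(void)
{
    pthread_t t[NTHREADS];
    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, spice_like_thread, NULL);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    return chardev_writes == (unsigned)(NTHREADS * NCALLS) &&
           chardev_bytes == (size_t)NTHREADS * NCALLS * (strlen(MSG) + 1);
}
```

`big_lock_demo()` should return 1. The trade-off is coarse granularity: every caller serializes on one lock, but no subsystem has to be audited for thread safety before it can be called from another thread.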
```diff
diff --git a/monitor.c b/monitor.c
index 04f465a..61d4d93 100644
--- a/monitor.c
+++ b/monitor.c
@@ -57,6 +57,7 @@
 #include "json-parser.h"
 #include "osdep.h"
 #include "cpu.h"
+#include "qemu-thread.h"
 #ifdef CONFIG_SIMPLE_TRACE
 #include "trace.h"
 #endif
@@ -144,6 +145,7 @@ struct Monitor {
     int suspend_cnt;
     uint8_t outbuf[1024];
     int outbuf_index;
+    QemuMutex mutex;
     ReadLineState *rs;
     MonitorControl *mc;
     CPUState *mon_cpu;
@@ -246,10 +248,14 @@ static int monitor_read_password(Monitor *mon, ReadLineFunc *readline_func,
 
 void monitor_flush(Monitor *mon)
 {
+    qemu_mutex_lock(&mon->mutex);
+
     if (mon && mon->outbuf_index != 0 && !mon->mux_out) {
         qemu_chr_fe_write(mon->chr, mon->outbuf, mon->outbuf_index);
         mon->outbuf_index = 0;
     }
+
+    qemu_mutex_unlock(&mon->mutex);
 }
 
 /* flush at every end of line or if the buffer is full */
@@ -257,6 +263,8 @@ static void monitor_puts(Monitor *mon, const char *str)
 {
     char c;
 
+    qemu_mutex_lock(&mon->mutex);
+
     for(;;) {
         c = *str++;
         if (c == '\0')
@@ -265,9 +273,14 @@ static void monitor_puts(Monitor *mon, const char *str)
             mon->outbuf[mon->outbuf_index++] = '\r';
         mon->outbuf[mon->outbuf_index++] = c;
         if (mon->outbuf_index >= (sizeof(mon->outbuf) - 1)
-            || c == '\n')
+            || c == '\n') {
+            qemu_mutex_unlock(&mon->mutex);
             monitor_flush(mon);
+            qemu_mutex_lock(&mon->mutex);
+        }
     }
+
+    qemu_mutex_unlock(&mon->mutex);
 }
 
 void monitor_vprintf(Monitor *mon, const char *fmt, va_list ap)
@@ -5275,6 +5288,7 @@ void monitor_init(CharDriverState *chr, int flags)
 
     mon = g_malloc0(sizeof(*mon));
 
+    qemu_mutex_init(&mon->mutex);
     mon->chr = chr;
     mon->flags = flags;
     if (flags & MONITOR_USE_READLINE) {
```
Sometimes, when having lots of VMs running on a RHEV host and the user
attempts to close a SPICE window, libvirt will get corrupted json from
QEMU.

After some investigation, I found out that the problem is that different
SPICE threads are calling monitor functions (such as
monitor_protocol_event()) in parallel which causes concurrent access
to the monitor's internal buffer outbuf[].

This fixes the problem by protecting accesses to outbuf[] with a mutex.

Honestly speaking, I'm not completely sure this is the best thing to do
because the monitor itself and other qemu subsystems are not thread safe,
so having subsystems like SPICE assuming the contrary seems a bit
catastrophic to me...

Anyways, this commit fixes the problem at hand.

Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>

---
 monitor.c |   16 +++++++++++++++-
 1 files changed, 15 insertions(+), 1 deletions(-)
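The race described in the commit message can be reproduced outside QEMU with a minimal pthreads sketch of a mutex-protected output buffer. It assumes POSIX threads in place of `QemuMutex`, and `Writer`, `writer_puts()`, `writer_flush_locked()` and `writer_demo()` are hypothetical names, not QEMU's API. One deliberate difference from the patch: reading the diff, `monitor_puts()` releases the mutex before calling `monitor_flush()` and re-acquires it afterwards, so when a line fills `outbuf[]` before its `'\n'`, another thread appears able to take the mutex in that window and append its own text between the two halves. The sketch instead flushes through a helper that expects the lock to already be held.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the monitor's output path: a small buffer
 * that is flushed to a sink on '\n' or when it fills up. */
#define OUTBUF_SIZE 16   /* deliberately shorter than one line */
#define NTHREADS 4
#define NLINES 200

typedef struct {
    pthread_mutex_t lock;
    char outbuf[OUTBUF_SIZE];
    size_t outbuf_index;
    char sink[1 << 16];   /* stands in for the chardev */
    size_t sink_len;
} Writer;

static Writer w = { .lock = PTHREAD_MUTEX_INITIALIZER };

/* Caller must hold wr->lock. */
static void writer_flush_locked(Writer *wr)
{
    assert(wr->sink_len + wr->outbuf_index <= sizeof(wr->sink));
    memcpy(wr->sink + wr->sink_len, wr->outbuf, wr->outbuf_index);
    wr->sink_len += wr->outbuf_index;
    wr->outbuf_index = 0;
}

static void writer_puts(Writer *wr, const char *str)
{
    pthread_mutex_lock(&wr->lock);
    for (const char *p = str; *p != '\0'; p++) {
        wr->outbuf[wr->outbuf_index++] = *p;
        if (wr->outbuf_index == OUTBUF_SIZE || *p == '\n') {
            /* Flush without dropping the lock, so no other thread can
             * append between two chunks of a line that overflows. */
            writer_flush_locked(wr);
        }
    }
    pthread_mutex_unlock(&wr->lock);
}

static void *worker(void *arg)
{
    char line[64];
    /* 36 characters plus '\n', so each line needs several flushes. */
    snprintf(line, sizeof(line), "thread %d: abcdefghijklmnopqrstuvwxyz\n",
             (int)(intptr_t)arg);
    for (int i = 0; i < NLINES; i++)
        writer_puts(&w, line);
    return NULL;
}

/* Runs the threads once; returns the number of well-formed lines in the
 * sink, or -1 if any line was corrupted by interleaved output. */
int writer_demo(void)
{
    pthread_t t[NTHREADS];
    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, worker, (void *)(intptr_t)i);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);

    int lines = 0;
    const char *p = w.sink, *end = w.sink + w.sink_len;
    while (p < end) {
        const char *nl = memchr(p, '\n', (size_t)(end - p));
        int id;
        char rest[32];
        if (nl == NULL || nl - p != 36 ||
            sscanf(p, "thread %d: %26s", &id, rest) != 2 ||
            strcmp(rest, "abcdefghijklmnopqrstuvwxyz") != 0)
            return -1;
        lines++;
        p = nl + 1;
    }
    return lines;
}
```

Called once, `writer_demo()` should return 800 (`NTHREADS * NLINES`); a result of -1 would mean some line was spliced. Because `OUTBUF_SIZE` is only 16 bytes, every 37-byte line is flushed in several chunks, which is the case where releasing the lock between chunks could let another thread's text land mid-line.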