Patchwork monitor: Protect outbuf from concurrent access

Submitter Gerd Hoffmann
Date Sept. 2, 2011, 1:39 p.m.
Message ID <4E60DC77.5020300@redhat.com>
Permalink /patch/113140/
State New

Comments

Gerd Hoffmann - Sept. 2, 2011, 1:39 p.m.
Hi,

>> After some investigation, I found out that the problem is that different
>> SPICE threads are calling monitor functions (such as
>> monitor_protocol_event()) in parallel which causes concurrent access
>> to the monitor's internal buffer outbuf[].

[ adding spice-list to Cc, see qemu-devel for the rest of the thread ]

spice isn't supposed to do that.

/me just added an assert in channel_event() and saw it trigger on display 
channel disconnects.

#0  0x0000003ceba32a45 in raise () from /lib64/libc.so.6
#1  0x0000003ceba34225 in abort () from /lib64/libc.so.6
#2  0x0000003ceba2b9d5 in __assert_fail () from /lib64/libc.so.6
#3  0x0000000000503759 in channel_event (event=3, info=0x35e9340)
     at /home/kraxel/projects/qemu/ui/spice-core.c:223
#4  0x00007f9a77a9921b in reds_channel_event (s=0x35e92c0) at reds.c:400
#5  reds_stream_free (s=0x35e92c0) at reds.c:4981
#6  0x00007f9a77aac8b0 in red_disconnect_channel 
(channel=0x7f9a24069a80) at red_worker.c:8489
#7  0x00007f9a77ab53a8 in handle_dev_input (listener=0x7f9a3211ab20, 
events=<value optimized out>)
     at red_worker.c:10062
#8  0x00007f9a77ab436d in red_worker_main (arg=<value optimized out>) at 
red_worker.c:10304
#9  0x0000003cec2077e1 in start_thread () from /lib64/libpthread.so.0
#10 0x0000003cebae68ed in clone () from /lib64/libc.so.6

IMHO spice server should handle the display channel tear-down in the 
dispatcher instead of the worker thread.  Alon?

>> Anyways, this commit fixes the problem at hand.

Not really.  channel_event() itself isn't thread-safe either: it does 
unlocked list operations, which can also blow up when called from 
different threads.

A patch like the attached (warning: untested) should do as a quick & dirty 
fix for stable.  But IMO we really should fix spice instead.

cheers,
   Gerd
From 7496e573ff6085d3c42d7e65b72c85fd2a7b4a78 Mon Sep 17 00:00:00 2001
From: Gerd Hoffmann <kraxel@redhat.com>
Date: Fri, 2 Sep 2011 15:03:28 +0200
Subject: [PATCH] spice: workaround a spice server bug.

---
 ui/spice-core.c |   21 ++++++++++++++++++++-
 1 files changed, 20 insertions(+), 1 deletions(-)
Anthony Liguori - Sept. 2, 2011, 2:03 p.m.
On 09/02/2011 08:39 AM, Gerd Hoffmann wrote:
> Hi,
>
>>> After some investigation, I found out that the problem is that different
>>> SPICE threads are calling monitor functions (such as
>>> monitor_protocol_event()) in parallel which causes concurrent access
>>> to the monitor's internal buffer outbuf[].
>
> [ adding spice-list to Cc, see qemu-devel for the rest of the thread ]
>
> spice isn't supposed to do that.
>
> /me just added a assert in channel_event() and saw it trigger in display
> channel disconnects.
>
> #0 0x0000003ceba32a45 in raise () from /lib64/libc.so.6
> #1 0x0000003ceba34225 in abort () from /lib64/libc.so.6
> #2 0x0000003ceba2b9d5 in __assert_fail () from /lib64/libc.so.6
> #3 0x0000000000503759 in channel_event (event=3, info=0x35e9340)
> at /home/kraxel/projects/qemu/ui/spice-core.c:223
> #4 0x00007f9a77a9921b in reds_channel_event (s=0x35e92c0) at reds.c:400
> #5 reds_stream_free (s=0x35e92c0) at reds.c:4981
> #6 0x00007f9a77aac8b0 in red_disconnect_channel (channel=0x7f9a24069a80)
> at red_worker.c:8489
> #7 0x00007f9a77ab53a8 in handle_dev_input (listener=0x7f9a3211ab20,
> events=<value optimized out>)
> at red_worker.c:10062
> #8 0x00007f9a77ab436d in red_worker_main (arg=<value optimized out>) at
> red_worker.c:10304
> #9 0x0000003cec2077e1 in start_thread () from /lib64/libpthread.so.0
> #10 0x0000003cebae68ed in clone () from /lib64/libc.so.6
>
> IMHO spice server should handle the display channel tear-down in the
> dispatcher instead of the worker thread. Alon?
>
>>> Anyways, this commit fixes the problem at hand.
>
> Not really. channel_event() itself isn't thread-safe too, it does
> unlocked list operations which can also blow up when called from
> different threads.
>
> A patch like the attached (warning: untested) should do as quick&dirty
> fix for stable. But IMO we really should fix spice instead.

Spice should not be calling *any* QEMU code without holding the global 
mutex.  That includes all of the QObject interactions.

Regards,

Anthony Liguori

>
> cheers,
> Gerd
>
Luiz Capitulino - Sept. 2, 2011, 2:24 p.m.
On Fri, 02 Sep 2011 15:39:03 +0200
Gerd Hoffmann <kraxel@redhat.com> wrote:

>    Hi,
> 
> >> After some investigation, I found out that the problem is that different
> >> SPICE threads are calling monitor functions (such as
> >> monitor_protocol_event()) in parallel which causes concurrent access
> >> to the monitor's internal buffer outbuf[].
> 
> [ adding spice-list to Cc, see qemu-devel for the rest of the thread ]
> 
> spice isn't supposed to do that.
> 
> /me just added a assert in channel_event() and saw it trigger in display 
> channel disconnects.
> 
> #0  0x0000003ceba32a45 in raise () from /lib64/libc.so.6
> #1  0x0000003ceba34225 in abort () from /lib64/libc.so.6
> #2  0x0000003ceba2b9d5 in __assert_fail () from /lib64/libc.so.6
> #3  0x0000000000503759 in channel_event (event=3, info=0x35e9340)
>      at /home/kraxel/projects/qemu/ui/spice-core.c:223
> #4  0x00007f9a77a9921b in reds_channel_event (s=0x35e92c0) at reds.c:400
> #5  reds_stream_free (s=0x35e92c0) at reds.c:4981
> #6  0x00007f9a77aac8b0 in red_disconnect_channel 
> (channel=0x7f9a24069a80) at red_worker.c:8489
> #7  0x00007f9a77ab53a8 in handle_dev_input (listener=0x7f9a3211ab20, 
> events=<value optimized out>)
>      at red_worker.c:10062
> #8  0x00007f9a77ab436d in red_worker_main (arg=<value optimized out>) at 
> red_worker.c:10304
> #9  0x0000003cec2077e1 in start_thread () from /lib64/libpthread.so.0
> #10 0x0000003cebae68ed in clone () from /lib64/libc.so.6
> 
> IMHO spice server should handle the display channel tear-down in the 
> dispatcher instead of the worker thread.  Alon?
> 
> >> Anyways, this commit fixes the problem at hand.
> 
> Not really.  channel_event() itself isn't thread-safe too, it does 
> unlocked list operations which can also blow up when called from 
> different threads.

I thought my patch was at least a candidate for stable, but after this
thread I'm convinced the problem should be fixed in spice instead.

> 
> A patch like the attached (warning: untested) should do as quick&dirty 
> fix for stable.  But IMO we really should fix spice instead.
> 
> cheers,
>    Gerd
>
Anthony Liguori - Sept. 2, 2011, 2:28 p.m.
On 09/02/2011 08:39 AM, Gerd Hoffmann wrote:
> Hi,
>
>>> After some investigation, I found out that the problem is that different
>>> SPICE threads are calling monitor functions (such as
>>> monitor_protocol_event()) in parallel which causes concurrent access
>>> to the monitor's internal buffer outbuf[].
>
> [ adding spice-list to Cc, see qemu-devel for the rest of the thread ]
>
> spice isn't supposed to do that.
>
> /me just added a assert in channel_event() and saw it trigger in display
> channel disconnects.
>
> #0 0x0000003ceba32a45 in raise () from /lib64/libc.so.6
> #1 0x0000003ceba34225 in abort () from /lib64/libc.so.6
> #2 0x0000003ceba2b9d5 in __assert_fail () from /lib64/libc.so.6
> #3 0x0000000000503759 in channel_event (event=3, info=0x35e9340)
> at /home/kraxel/projects/qemu/ui/spice-core.c:223
> #4 0x00007f9a77a9921b in reds_channel_event (s=0x35e92c0) at reds.c:400
> #5 reds_stream_free (s=0x35e92c0) at reds.c:4981
> #6 0x00007f9a77aac8b0 in red_disconnect_channel (channel=0x7f9a24069a80)
> at red_worker.c:8489
> #7 0x00007f9a77ab53a8 in handle_dev_input (listener=0x7f9a3211ab20,
> events=<value optimized out>)
> at red_worker.c:10062
> #8 0x00007f9a77ab436d in red_worker_main (arg=<value optimized out>) at
> red_worker.c:10304
> #9 0x0000003cec2077e1 in start_thread () from /lib64/libpthread.so.0
> #10 0x0000003cebae68ed in clone () from /lib64/libc.so.6
>
> IMHO spice server should handle the display channel tear-down in the
> dispatcher instead of the worker thread. Alon?
>
>>> Anyways, this commit fixes the problem at hand.
>
> Not really. channel_event() itself isn't thread-safe too, it does
> unlocked list operations which can also blow up when called from
> different threads.
>
> A patch like the attached (warning: untested) should do as quick&dirty
> fix for stable. But IMO we really should fix spice instead.

I agree.  I'm not sure I like the idea of still calling QEMU code 
without holding the mutex (even the QObject code).

Can you just use a bottom half to defer this work to the I/O thread? 
Bottom half scheduling has to be signal safe which means it will also be 
thread safe.

Regards,

Anthony Liguori

>
> cheers,
> Gerd
>
Gerd Hoffmann - Sept. 2, 2011, 3:18 p.m.
Hi,

>> A patch like the attached (warning: untested) should do as quick&dirty
>> fix for stable. But IMO we really should fix spice instead.
>
> I agree. I'm not sure I like the idea of still calling QEMU code without
> holding the mutex (even the QObject code).

I thought just creating the objects wasn't an issue, but if you disagree 
we can move the lock up to the head of the function.

> Can you just use a bottom half to defer this work to the I/O thread?
> Bottom half scheduling has to be signal safe which means it will also be
> thread safe.

Not that straightforward, as I would have to pass arguments to the 
bottom half.

cheers,
   Gerd
Anthony Liguori - Sept. 2, 2011, 3:20 p.m.
On 09/02/2011 10:18 AM, Gerd Hoffmann wrote:
> Hi,
>
>>> A patch like the attached (warning: untested) should do as quick&dirty
>>> fix for stable. But IMO we really should fix spice instead.
>>
>> I agree. I'm not sure I like the idea of still calling QEMU code without
>> holding the mutex (even the QObject code).
>
> I though just creating the objects isn't an issue, but if you disagree
> we can just move up the lock to the head of the function.

What I fear is that Spice will assume something is thread safe, but then 
someone will make a change that makes the subsystem non-reentrant.

I'd rather that we have very clear rules about what's thread safe and 
not thread safe.  If you want to audit the QObject subsystem, declare it 
thread safe, and document it as such, that would be okay.  But it needs 
to be systematic, not ad-hoc.

Regards,

Anthony Liguori

>
>> Can you just use a bottom half to defer this work to the I/O thread?
>> Bottom half scheduling has to be signal safe which means it will also be
>> thread safe.
>
> Not that straight forward as I would have to pass arguments to the
> bottom half.
>
> cheers,
> Gerd
>
>
Paolo Bonzini - Sept. 2, 2011, 3:31 p.m.
On 09/02/2011 05:18 PM, Gerd Hoffmann wrote:
>
>> Can you just use a bottom half to defer this work to the I/O thread?
>> Bottom half scheduling has to be signal safe which means it will also be
>> thread safe.
>
> Not that straight forward as I would have to pass arguments to the
> bottom half.

Can you add a variant of qemu_bh_new that accepts a sizeof for the new 
bottom half?  Then the bottom half itself can be passed as the opaque 
and used for the arguments.

Paolo
Anthony Liguori - Sept. 2, 2011, 3:37 p.m.
On 09/02/2011 10:31 AM, Paolo Bonzini wrote:
> On 09/02/2011 05:18 PM, Gerd Hoffmann wrote:
>>
>>> Can you just use a bottom half to defer this work to the I/O thread?
>>> Bottom half scheduling has to be signal safe which means it will also be
>>> thread safe.
>>
>> Not that straight forward as I would have to pass arguments to the
>> bottom half.
>
> Can you add a variant of qemu_bh_new that accepts a sizeof for the new
> bottom half? Then the bottom half itself can be passed as the opaque and
> used for the arguments.

Bottom halves are opaque to the caller.

Passing arguments would require careful consideration of locking too.  I 
think the best way to resolve this is to fix libspice and not try to 
work around the problem in QEMU.

Regards,

Anthony Liguori

>
> Paolo
Gerd Hoffmann - Sept. 5, 2011, 7:48 a.m.
On 09/02/11 17:31, Paolo Bonzini wrote:
> On 09/02/2011 05:18 PM, Gerd Hoffmann wrote:
>>
>>> Can you just use a bottom half to defer this work to the I/O thread?
>>> Bottom half scheduling has to be signal safe which means it will also be
>>> thread safe.
>>
>> Not that straight forward as I would have to pass arguments to the
>> bottom half.
>
> Can you add a variant of qemu_bh_new that accepts a sizeof for the new
> bottom half? Then the bottom half itself can be passed as the opaque and
> used for the arguments.

That wouldn't help.  I would have to create some kind of job queue which 
is then processed by the bottom half.

cheers,
   Gerd

Patch

diff --git a/ui/spice-core.c b/ui/spice-core.c
index dba11f0..c99cdc5 100644
--- a/ui/spice-core.c
+++ b/ui/spice-core.c
@@ -19,6 +19,7 @@ 
 #include <spice-experimental.h>
 
 #include <netdb.h>
+#include <pthread.h>
 
 #include "qemu-common.h"
 #include "qemu-spice.h"
@@ -44,6 +45,8 @@  static char *auth_passwd;
 static time_t auth_expires = TIME_MAX;
 int using_spice = 0;
 
+static pthread_t me;
+
 struct SpiceTimer {
     QEMUTimer *timer;
     QTAILQ_ENTRY(SpiceTimer) next;
@@ -216,6 +219,8 @@  static void channel_event(int event, SpiceChannelEventInfo *info)
     };
     QDict *server, *client;
     QObject *data;
+    bool need_lock = !pthread_equal(me, pthread_self());
+    static int first = 1;
 
     client = qdict_new();
     add_addr_info(client, &info->paddr, info->plen);
@@ -223,6 +228,14 @@  static void channel_event(int event, SpiceChannelEventInfo *info)
     server = qdict_new();
     add_addr_info(server, &info->laddr, info->llen);
 
+    if (need_lock) {
+        qemu_mutex_lock_iothread();
+        if (first) {
+            fprintf(stderr, "You are using a broken spice-server version\n");
+            first = 0;
+        }
+    }
+
     if (event == SPICE_CHANNEL_EVENT_INITIALIZED) {
         qdict_put(server, "auth", qstring_from_str(auth));
         add_channel_info(client, info);
@@ -236,6 +249,10 @@  static void channel_event(int event, SpiceChannelEventInfo *info)
                               QOBJECT(client), QOBJECT(server));
     monitor_protocol_event(qevent[event], data);
     qobject_decref(data);
+
+    if (need_lock) {
+        qemu_mutex_unlock_iothread();
+    }
 }
 
 #else /* SPICE_INTERFACE_CORE_MINOR >= 3 */
@@ -482,7 +499,9 @@  void qemu_spice_init(void)
     spice_image_compression_t compression;
     spice_wan_compression_t wan_compr;
 
-    if (!opts) {
+    me = pthread_self();
+
+    if (!opts) {
         return;
     }
     port = qemu_opt_get_number(opts, "port", 0);