Message-ID: <4E60DC77.5020300@redhat.com>
On 09/02/2011 08:39 AM, Gerd Hoffmann wrote:
>   Hi,
>
>>> After some investigation, I found out that the problem is that different
>>> SPICE threads are calling monitor functions (such as
>>> monitor_protocol_event()) in parallel which causes concurrent access
>>> to the monitor's internal buffer outbuf[].
>
> [ adding spice-list to Cc, see qemu-devel for the rest of the thread ]
>
> spice isn't supposed to do that.
>
> /me just added an assert in channel_event() and saw it trigger in display
> channel disconnects.
>
> #0  0x0000003ceba32a45 in raise () from /lib64/libc.so.6
> #1  0x0000003ceba34225 in abort () from /lib64/libc.so.6
> #2  0x0000003ceba2b9d5 in __assert_fail () from /lib64/libc.so.6
> #3  0x0000000000503759 in channel_event (event=3, info=0x35e9340)
>     at /home/kraxel/projects/qemu/ui/spice-core.c:223
> #4  0x00007f9a77a9921b in reds_channel_event (s=0x35e92c0) at reds.c:400
> #5  reds_stream_free (s=0x35e92c0) at reds.c:4981
> #6  0x00007f9a77aac8b0 in red_disconnect_channel (channel=0x7f9a24069a80)
>     at red_worker.c:8489
> #7  0x00007f9a77ab53a8 in handle_dev_input (listener=0x7f9a3211ab20,
>     events=<value optimized out>) at red_worker.c:10062
> #8  0x00007f9a77ab436d in red_worker_main (arg=<value optimized out>)
>     at red_worker.c:10304
> #9  0x0000003cec2077e1 in start_thread () from /lib64/libpthread.so.0
> #10 0x0000003cebae68ed in clone () from /lib64/libc.so.6
>
> IMHO spice server should handle the display channel tear-down in the
> dispatcher instead of the worker thread.  Alon?
>
>>> Anyways, this commit fixes the problem at hand.
>
> Not really.  channel_event() itself isn't thread-safe either; it does
> unlocked list operations which can also blow up when called from
> different threads.
>
> A patch like the attached (warning: untested) should do as a quick&dirty
> fix for stable.  But IMO we really should fix spice instead.

Spice should not be calling *any* QEMU code without holding the global
mutex.  That includes all of the QObject interactions.

Regards,

Anthony Liguori

> cheers,
>   Gerd
On Fri, 02 Sep 2011 15:39:03 +0200
Gerd Hoffmann <kraxel@redhat.com> wrote:

>   Hi,
>
>>> After some investigation, I found out that the problem is that different
>>> SPICE threads are calling monitor functions (such as
>>> monitor_protocol_event()) in parallel which causes concurrent access
>>> to the monitor's internal buffer outbuf[].
>
> [ adding spice-list to Cc, see qemu-devel for the rest of the thread ]
>
> spice isn't supposed to do that.
>
> /me just added an assert in channel_event() and saw it trigger in display
> channel disconnects.
>
> #0  0x0000003ceba32a45 in raise () from /lib64/libc.so.6
> #1  0x0000003ceba34225 in abort () from /lib64/libc.so.6
> #2  0x0000003ceba2b9d5 in __assert_fail () from /lib64/libc.so.6
> #3  0x0000000000503759 in channel_event (event=3, info=0x35e9340)
>     at /home/kraxel/projects/qemu/ui/spice-core.c:223
> #4  0x00007f9a77a9921b in reds_channel_event (s=0x35e92c0) at reds.c:400
> #5  reds_stream_free (s=0x35e92c0) at reds.c:4981
> #6  0x00007f9a77aac8b0 in red_disconnect_channel (channel=0x7f9a24069a80)
>     at red_worker.c:8489
> #7  0x00007f9a77ab53a8 in handle_dev_input (listener=0x7f9a3211ab20,
>     events=<value optimized out>) at red_worker.c:10062
> #8  0x00007f9a77ab436d in red_worker_main (arg=<value optimized out>)
>     at red_worker.c:10304
> #9  0x0000003cec2077e1 in start_thread () from /lib64/libpthread.so.0
> #10 0x0000003cebae68ed in clone () from /lib64/libc.so.6
>
> IMHO spice server should handle the display channel tear-down in the
> dispatcher instead of the worker thread.  Alon?
>
>>> Anyways, this commit fixes the problem at hand.
>
> Not really.  channel_event() itself isn't thread-safe either; it does
> unlocked list operations which can also blow up when called from
> different threads.

I thought my patch was at least a candidate for stable, but after this
thread I'm convinced the problem should be fixed in spice instead.

> A patch like the attached (warning: untested) should do as a quick&dirty
> fix for stable.  But IMO we really should fix spice instead.
>
> cheers,
>   Gerd
On 09/02/2011 08:39 AM, Gerd Hoffmann wrote:
>   Hi,
>
>>> After some investigation, I found out that the problem is that different
>>> SPICE threads are calling monitor functions (such as
>>> monitor_protocol_event()) in parallel which causes concurrent access
>>> to the monitor's internal buffer outbuf[].
>
> [ adding spice-list to Cc, see qemu-devel for the rest of the thread ]
>
> spice isn't supposed to do that.
>
> /me just added an assert in channel_event() and saw it trigger in display
> channel disconnects.
>
> #0  0x0000003ceba32a45 in raise () from /lib64/libc.so.6
> #1  0x0000003ceba34225 in abort () from /lib64/libc.so.6
> #2  0x0000003ceba2b9d5 in __assert_fail () from /lib64/libc.so.6
> #3  0x0000000000503759 in channel_event (event=3, info=0x35e9340)
>     at /home/kraxel/projects/qemu/ui/spice-core.c:223
> #4  0x00007f9a77a9921b in reds_channel_event (s=0x35e92c0) at reds.c:400
> #5  reds_stream_free (s=0x35e92c0) at reds.c:4981
> #6  0x00007f9a77aac8b0 in red_disconnect_channel (channel=0x7f9a24069a80)
>     at red_worker.c:8489
> #7  0x00007f9a77ab53a8 in handle_dev_input (listener=0x7f9a3211ab20,
>     events=<value optimized out>) at red_worker.c:10062
> #8  0x00007f9a77ab436d in red_worker_main (arg=<value optimized out>)
>     at red_worker.c:10304
> #9  0x0000003cec2077e1 in start_thread () from /lib64/libpthread.so.0
> #10 0x0000003cebae68ed in clone () from /lib64/libc.so.6
>
> IMHO spice server should handle the display channel tear-down in the
> dispatcher instead of the worker thread.  Alon?
>
>>> Anyways, this commit fixes the problem at hand.
>
> Not really.  channel_event() itself isn't thread-safe either; it does
> unlocked list operations which can also blow up when called from
> different threads.
>
> A patch like the attached (warning: untested) should do as a quick&dirty
> fix for stable.  But IMO we really should fix spice instead.

I agree.  I'm not sure I like the idea of still calling QEMU code
without holding the mutex (even the QObject code).

Can you just use a bottom half to defer this work to the I/O thread?
Bottom half scheduling has to be signal safe, which means it will also
be thread safe.

Regards,

Anthony Liguori

> cheers,
>   Gerd
  Hi,

>> A patch like the attached (warning: untested) should do as a quick&dirty
>> fix for stable.  But IMO we really should fix spice instead.
>
> I agree.  I'm not sure I like the idea of still calling QEMU code without
> holding the mutex (even the QObject code).

I thought just creating the objects wasn't an issue, but if you disagree
we can just move the lock up to the head of the function.

> Can you just use a bottom half to defer this work to the I/O thread?
> Bottom half scheduling has to be signal safe which means it will also be
> thread safe.

Not that straightforward, as I would have to pass arguments to the
bottom half.

cheers,
  Gerd
On 09/02/2011 10:18 AM, Gerd Hoffmann wrote:
>   Hi,
>
>>> A patch like the attached (warning: untested) should do as a quick&dirty
>>> fix for stable.  But IMO we really should fix spice instead.
>>
>> I agree.  I'm not sure I like the idea of still calling QEMU code without
>> holding the mutex (even the QObject code).
>
> I thought just creating the objects wasn't an issue, but if you disagree
> we can just move the lock up to the head of the function.

What I fear is that Spice will assume something is thread safe, but then
someone will make a change that makes the subsystem non-reentrant.

I'd rather that we have very clear rules about what's thread safe and
what's not.  If you want to audit the QObject subsystem, declare it
thread safe, and document it as such, that would be okay.  But it needs
to be systematic, not ad hoc.

Regards,

Anthony Liguori

>> Can you just use a bottom half to defer this work to the I/O thread?
>> Bottom half scheduling has to be signal safe which means it will also be
>> thread safe.
>
> Not that straightforward, as I would have to pass arguments to the
> bottom half.
>
> cheers,
>   Gerd
On 09/02/2011 05:18 PM, Gerd Hoffmann wrote:
>
>> Can you just use a bottom half to defer this work to the I/O thread?
>> Bottom half scheduling has to be signal safe which means it will also be
>> thread safe.
>
> Not that straightforward, as I would have to pass arguments to the
> bottom half.

Can you add a variant of qemu_bh_new that accepts a sizeof for the new
bottom half?  Then the bottom half itself can be passed as the opaque
and used for the arguments.

Paolo
On 09/02/2011 10:31 AM, Paolo Bonzini wrote:
> On 09/02/2011 05:18 PM, Gerd Hoffmann wrote:
>>
>>> Can you just use a bottom half to defer this work to the I/O thread?
>>> Bottom half scheduling has to be signal safe which means it will also be
>>> thread safe.
>>
>> Not that straightforward, as I would have to pass arguments to the
>> bottom half.
>
> Can you add a variant of qemu_bh_new that accepts a sizeof for the new
> bottom half?  Then the bottom half itself can be passed as the opaque
> and used for the arguments.

Bottom halves are opaque to the caller.  Passing arguments would require
careful consideration of locking too.

I think the best way to resolve this is to fix libspice and not try to
work around the problem in QEMU.

Regards,

Anthony Liguori

> Paolo
On 09/02/11 17:31, Paolo Bonzini wrote:
> On 09/02/2011 05:18 PM, Gerd Hoffmann wrote:
>>
>>> Can you just use a bottom half to defer this work to the I/O thread?
>>> Bottom half scheduling has to be signal safe which means it will also be
>>> thread safe.
>>
>> Not that straightforward, as I would have to pass arguments to the
>> bottom half.
>
> Can you add a variant of qemu_bh_new that accepts a sizeof for the new
> bottom half?  Then the bottom half itself can be passed as the opaque
> and used for the arguments.

That wouldn't help.  I would have to create some kind of job queue which
is then processed by the bottom half.

cheers,
  Gerd
diff --git a/ui/spice-core.c b/ui/spice-core.c
index dba11f0..c99cdc5 100644
--- a/ui/spice-core.c
+++ b/ui/spice-core.c
@@ -19,6 +19,7 @@
 #include <spice-experimental.h>
 
 #include <netdb.h>
+#include <pthread.h>
 
 #include "qemu-common.h"
 #include "qemu-spice.h"
@@ -44,6 +45,8 @@ static char *auth_passwd;
 static time_t auth_expires = TIME_MAX;
 int using_spice = 0;
 
+static pthread_t me;
+
 struct SpiceTimer {
     QEMUTimer *timer;
     QTAILQ_ENTRY(SpiceTimer) next;
@@ -216,6 +219,8 @@ static void channel_event(int event, SpiceChannelEventInfo *info)
     };
     QDict *server, *client;
     QObject *data;
+    bool need_lock = !pthread_equal(me, pthread_self());
+    static int first = 1;
 
     client = qdict_new();
     add_addr_info(client, &info->paddr, info->plen);
@@ -223,6 +228,14 @@ static void channel_event(int event, SpiceChannelEventInfo *info)
     server = qdict_new();
     add_addr_info(server, &info->laddr, info->llen);
 
+    if (need_lock) {
+        qemu_mutex_lock_iothread();
+        if (first) {
+            fprintf(stderr, "You are using a broken spice-server version\n");
+            first = 0;
+        }
+    }
+
     if (event == SPICE_CHANNEL_EVENT_INITIALIZED) {
         qdict_put(server, "auth", qstring_from_str(auth));
         add_channel_info(client, info);
@@ -236,6 +249,10 @@ static void channel_event(int event, SpiceChannelEventInfo *info)
                           QOBJECT(client), QOBJECT(server));
     monitor_protocol_event(qevent[event], data);
     qobject_decref(data);
+
+    if (need_lock) {
+        qemu_mutex_unlock_iothread();
+    }
 }
 
 #else /* SPICE_INTERFACE_CORE_MINOR >= 3 */
@@ -482,7 +499,9 @@ void qemu_spice_init(void)
     spice_image_compression_t compression;
     spice_wan_compression_t wan_compr;
 
-    if (!opts) {
+    me = pthread_self();
+
+    if (!opts) {
         return;
     }
     port = qemu_opt_get_number(opts, "port", 0);