
[RFC] fix select(2) race between main_loop_wait and qemu_aio_wait

Message ID 1330936455-23802-1-git-send-email-pbonzini@redhat.com
State New

Commit Message

Paolo Bonzini March 5, 2012, 8:34 a.m. UTC
This is quite ugly.  Two threads, one running main_loop_wait and
one running qemu_aio_wait, can race with each other on running the
same iohandler.  The result is that an iohandler could run while the
underlying socket is not readable or writable, with possibly ill effects.

This shows as a failure to boot an IDE disk using the NBD device.
We can consider it a bug in NBD or in the main loop.  The patch fixes
this in main_loop_wait, which is always going to lose the race because
qemu_aio_wait runs select with the global lock held.

Reported-by: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
	Anthony, if you think this is too ugly tell me and I can
	post an NBD fix too.

 main-loop.c |    7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)
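
For illustration only, here is a standalone sketch of the stale-fd-set hazard the commit message describes, written with plain pthreads and a pipe rather than QEMU code; the second thread stands in for a qemu_aio_wait() that consumes the data before the main loop's handler gets to run.

/* race_sketch.c -- illustrative only, not QEMU code.  Build with -pthread. */
#include <errno.h>
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <sys/select.h>
#include <unistd.h>

static int fds[2];                      /* pipe: fds[0] is the read end */

/* Simulates qemu_aio_wait() in another thread: it also sees the fd as
 * readable and consumes the data. */
static void *aio_waiter(void *arg)
{
    char buf[16];
    (void)arg;
    if (read(fds[0], buf, sizeof(buf)) < 0) {
        perror("aio_waiter read");
    }
    return NULL;
}

int main(void)
{
    fd_set rfds;
    pthread_t tid;
    char buf[16];

    if (pipe(fds) < 0 || write(fds[1], "x", 1) != 1) {
        return 1;
    }
    fcntl(fds[0], F_SETFL, O_NONBLOCK);

    /* "main_loop_wait": select() reports fds[0] readable... */
    FD_ZERO(&rfds);
    FD_SET(fds[0], &rfds);
    select(fds[0] + 1, &rfds, NULL, NULL, NULL);

    /* ...but before the handler runs, the other thread drains the fd
     * (pthread_join makes the interleaving deterministic here). */
    pthread_create(&tid, NULL, aio_waiter, NULL);
    pthread_join(&tid, NULL);

    /* The "handler" now runs against a stale fd_set and finds nothing. */
    if (FD_ISSET(fds[0], &rfds) && read(fds[0], buf, sizeof(buf)) < 0) {
        printf("handler ran on a non-readable fd: %s\n", strerror(errno));
    }
    return 0;
}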

Comments

Jan Kiszka March 5, 2012, 9:07 a.m. UTC | #1
On 2012-03-05 09:34, Paolo Bonzini wrote:
> This is quite ugly.  Two threads, one running main_loop_wait and
> one running qemu_aio_wait, can race with each other on running the
> same iohandler.  The result is that an iohandler could run while the
> underlying socket is not readable or writable, with possibly ill effects.

Hmm, isn't it a problem already that a socket is polled by two threads
at the same time? Can't that be avoided?

Long-term, I'd like to cut out certain file descriptors from the main
loop and process them completely in separate threads (for separate
locking, prioritization etc.). Dunno how NBD works, but maybe it should
be reworked like this already.

Jan

> 
> This shows as a failure to boot an IDE disk using the NBD device.
> We can consider it a bug in NBD or in the main loop.  The patch fixes
> this in main_loop_wait, which is always going to lose the race because
> qemu_aio_wait runs select with the global lock held.
> 
> Reported-by: Laurent Vivier <laurent@vivier.eu>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> 	Anthony, if you think this is too ugly tell me and I can
> 	post an NBD fix too.
> 
>  main-loop.c |    7 +++++++
>  1 files changed, 7 insertions(+), 0 deletions(-)
> 
> diff --git a/main-loop.c b/main-loop.c
> index db23de0..3beccff 100644
> --- a/main-loop.c
> +++ b/main-loop.c
> @@ -458,6 +458,13 @@ int main_loop_wait(int nonblocking)
>  
>      if (timeout > 0) {
>          qemu_mutex_lock_iothread();
> +
> +        /* Poll again.  A qemu_aio_wait() on another thread
> +         * could have made the fdsets stale.
> +         */
> +        tv.tv_sec = 0;
> +        tv.tv_usec = 0;
> +        ret = select(nfds + 1, &rfds, &wfds, &xfds, &tv);
>      }
>  
>      glib_select_poll(&rfds, &wfds, &xfds, (ret < 0));
Paolo Bonzini March 5, 2012, 9:25 a.m. UTC | #2
On 05/03/2012 10:07, Jan Kiszka wrote:
> > This is quite ugly.  Two threads, one running main_loop_wait and
> > one running qemu_aio_wait, can race with each other on running the
> > same iohandler.  The result is that an iohandler could run while the
> > underlying socket is not readable or writable, with possibly ill effects.
> 
> Hmm, isn't it a problem already that a socket is polled by two threads
> at the same time? Can't that be avoided?

We still have synchronous I/O in the device models.  That's the root
cause of the bug, I suppose.

> Long-term, I'd like to cut out certain file descriptors from the main
> loop and process them completely in separate threads (for separate
> locking, prioritization etc.). Dunno how NBD works, but maybe it should
> be reworked like this already.

Me too; I even made a very simple proof of concept a couple of weeks ago
(search for a thread "switching the block layer from coroutines to
threads").  It worked, though it is obviously not upstreamable in any way.

In that world order EventNotifiers would replace
qemu_aio_set_fd_handler, and socket-based protocols such as NBD would
run with blocking I/O in their own thread.  In addition to one thread
per I/O request (from a thread pool), there would be one arbiter thread
that reads replies and dispatches them to the appropriate I/O request
thread.  The arbiter thread replaces the read callback in
qemu_aio_set_fd_handler.

The problem is, even though it worked, making this thread-safe is
another story.  I suspect that in practice it is very difficult to do
without resurrecting RCU patches.

Paolo
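
For illustration, here is a standalone sketch of that arbiter-thread model using plain pthreads; the reply layout, the slot structure and every name below are hypothetical, not QEMU or NBD code.

/* One thread per in-flight request blocks on its own condition variable;
 * a single arbiter thread does blocking reads on the socket and wakes the
 * requester whose handle matches the reply.  Illustrative only. */
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>
#include <unistd.h>

struct reply_slot {
    uint64_t handle;            /* matches the handle sent with the request */
    bool done;
    int error;
    pthread_cond_t cond;
    struct reply_slot *next;
};

static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;
static struct reply_slot *inflight;     /* requests awaiting a reply */

/* Called by a request thread before it sends its request. */
static void register_request(struct reply_slot *slot, uint64_t handle)
{
    slot->handle = handle;
    slot->done = false;
    pthread_cond_init(&slot->cond, NULL);
    pthread_mutex_lock(&table_lock);
    slot->next = inflight;
    inflight = slot;
    pthread_mutex_unlock(&table_lock);
}

/* Called by a request thread after it has written its request.
 * (Unlinking the slot again is omitted for brevity.) */
static int wait_for_reply(struct reply_slot *slot)
{
    pthread_mutex_lock(&table_lock);
    while (!slot->done) {
        pthread_cond_wait(&slot->cond, &table_lock);
    }
    pthread_mutex_unlock(&table_lock);
    return slot->error;
}

/* The arbiter: blocking I/O on the socket, no fd handler registration at
 * all.  The on-the-wire reply format here is a made-up placeholder. */
static void *arbiter_thread(void *opaque)
{
    int sockfd = *(int *)opaque;
    struct { uint64_t handle; int error; } reply;

    while (read(sockfd, &reply, sizeof(reply)) == sizeof(reply)) {
        pthread_mutex_lock(&table_lock);
        for (struct reply_slot *s = inflight; s; s = s->next) {
            if (s->handle == reply.handle) {
                s->error = reply.error;
                s->done = true;
                pthread_cond_signal(&s->cond);
                break;
            }
        }
        pthread_mutex_unlock(&table_lock);
    }
    return NULL;
}
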
Avi Kivity March 5, 2012, 2:24 p.m. UTC | #3
On 03/05/2012 11:07 AM, Jan Kiszka wrote:
> On 2012-03-05 09:34, Paolo Bonzini wrote:
> > This is quite ugly.  Two threads, one running main_loop_wait and
> > one running qemu_aio_wait, can race with each other on running the
> > same iohandler.  The result is that an iohandler could run while the
> > underlying socket is not readable or writable, with possibly ill effects.
>
> Hmm, isn't it a problem already that a socket is polled by two threads
> at the same time? Can't that be avoided?

Could it be done simply by adding a mutex there?  It's hardly a clean
fix, but it's not a clean problem.

> Long-term, I'd like to cut out certain file descriptors from the main
> loop and process them completely in separate threads (for separate
> locking, prioritization etc.). Dunno how NBD works, but maybe it should
> be reworked like this already.

Ideally qemu_set_fd_handler2() should be made thread local, and each
device thread would run a copy of the main loop, just working on
different data.
Paolo Bonzini March 5, 2012, 2:30 p.m. UTC | #4
On 05/03/2012 15:24, Avi Kivity wrote:
> On 03/05/2012 11:07 AM, Jan Kiszka wrote:
>> On 2012-03-05 09:34, Paolo Bonzini wrote:
>>> This is quite ugly.  Two threads, one running main_loop_wait and
>>> one running qemu_aio_wait, can race with each other on running the
>>> same iohandler.  The result is that an iohandler could run while the
>>> underlying socket is not readable or writable, with possibly ill effects.
>>
>> Hmm, isn't it a problem already that a socket is polled by two threads
>> at the same time? Can't that be avoided?
> 
> Could it be done simply by adding a mutex there?  It's hardly a clean
> fix, but it's not a clean problem.

Hmm, I don't think so.  It would need to protect execution of the
iohandlers too, and pretty much everything can happen there including a
nested loop.  Of course recursive mutexes exist, but it sounds like too
big an axe.

I could add a generation count updated by qemu_aio_wait(), and rerun the
select() only if the generation count changes during its execution.

Or we can call it an NBD bug.  I'm not against that, but it seemed to me
that the problem is more general.

Paolo
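
For illustration, a sketch of that generation-count idea with assumed helper names; the commented-out calls stand in for the real global-lock acquisition around select(), and none of this is actual QEMU code.

#include <sys/select.h>

static unsigned aio_generation;     /* bumped by the aio path, under the
                                     * global lock, whenever it dispatches
                                     * handlers */

static void note_aio_dispatch(void)
{
    aio_generation++;
}

/* Around the main loop's unlocked select().  nfds is the highest fd, as in
 * main_loop_wait(). */
static int select_with_generation_check(int nfds, fd_set *rfds, fd_set *wfds,
                                        fd_set *xfds, struct timeval *tv)
{
    unsigned gen = aio_generation;  /* read while still holding the lock */
    int ret;

    /* qemu_mutex_unlock_iothread(); */
    ret = select(nfds + 1, rfds, wfds, xfds, tv);
    /* qemu_mutex_lock_iothread(); */

    if (gen != aio_generation) {
        /* Another thread ran handlers meanwhile, so the ready sets may be
         * stale; confirm them with a non-blocking poll, which is what the
         * patch does unconditionally. */
        struct timeval zero = { 0, 0 };
        ret = select(nfds + 1, rfds, wfds, xfds, &zero);
    }
    return ret;
}
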
Jan Kiszka March 5, 2012, 2:30 p.m. UTC | #5
On 2012-03-05 15:24, Avi Kivity wrote:
>> Long-term, I'd like to cut out certain file descriptors from the main
>> loop and process them completely in separate threads (for separate
>> locking, prioritization etc.). Dunno how NBD works, but maybe it should
>> be reworked like this already.
> 
> Ideally qemu_set_fd_handler2() should be made thread local, and each
> device thread would run a copy of the main loop, just working on
> different data.

qemu_set_fd_handler2 may not only be called over an iothread. Rather, we
need an object and associated lock that is related to the io-path (i.e.
frontend device + backend driver). That object has to be passed to
services like qemu_set_fd_handler2.

Jan
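
For illustration, a rough sketch of that "object plus associated lock" idea; the types and names below are hypothetical, not an existing QEMU API.

#include <pthread.h>
#include <stdlib.h>

typedef void IOHandler(void *opaque);

typedef struct FDHandler {
    int fd;
    IOHandler *read_cb;
    IOHandler *write_cb;
    void *opaque;
    struct FDHandler *next;
} FDHandler;

/* One context per io-path (frontend device + backend driver), with its own
 * lock covering handler registration and dispatch. */
typedef struct IOContext {
    pthread_mutex_t lock;
    FDHandler *handlers;
} IOContext;

/* The caller names the context explicitly instead of implicitly targeting
 * the global main loop, which also pins down which lock protects the
 * handler. */
static void io_context_set_fd_handler(IOContext *ctx, int fd,
                                      IOHandler *read_cb,
                                      IOHandler *write_cb, void *opaque)
{
    FDHandler *h;

    pthread_mutex_lock(&ctx->lock);
    for (h = ctx->handlers; h; h = h->next) {
        if (h->fd == fd) {
            break;
        }
    }
    if (!h) {
        h = calloc(1, sizeof(*h));
        if (!h) {
            pthread_mutex_unlock(&ctx->lock);
            return;
        }
        h->fd = fd;
        h->next = ctx->handlers;
        ctx->handlers = h;
    }
    h->read_cb = read_cb;
    h->write_cb = write_cb;
    h->opaque = opaque;
    pthread_mutex_unlock(&ctx->lock);
}
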
Avi Kivity March 5, 2012, 3:14 p.m. UTC | #6
On 03/05/2012 04:30 PM, Paolo Bonzini wrote:
> On 05/03/2012 15:24, Avi Kivity wrote:
> > On 03/05/2012 11:07 AM, Jan Kiszka wrote:
> >> On 2012-03-05 09:34, Paolo Bonzini wrote:
> >>> This is quite ugly.  Two threads, one running main_loop_wait and
> >>> one running qemu_aio_wait, can race with each other on running the
> >>> same iohandler.  The result is that an iohandler could run while the
> >>> underlying socket is not readable or writable, with possibly ill effects.
> >>
> >> Hmm, isn't it a problem already that a socket is polled by two threads
> >> at the same time? Can't that be avoided?
> > 
> > Could it be done simply by adding a mutex there?  It's hardly a clean
> > fix, but it's not a clean problem.
>
> Hmm, I don't think so.  It would need to protect execution of the
> iohandlers too, and pretty much everything can happen there including a
> nested loop.  Of course recursive mutexes exist, but it sounds like too
> big an axe.

The I/O handlers would still use the qemu mutex, no?  We'd just protect
the select() (taking the mutex before releasing the global lock, and
reacquiring it afterwards).

> I could add a generation count updated by qemu_aio_wait(), and rerun the
> select() only if the generation count changes during its execution.
>
> Or we can call it an NBD bug.  I'm not against that, but it seemed to me
> that the problem is more general.

What about making sure all callers of qemu_aio_wait() run from
coroutines (or threads)?  Then they just ask the main thread to wake
them up, instead of dispatching completions themselves.
Paolo Bonzini March 5, 2012, 4:14 p.m. UTC | #7
On 05/03/2012 16:14, Avi Kivity wrote:
>> > Hmm, I don't think so.  It would need to protect execution of the
>> > iohandlers too, and pretty much everything can happen there including a
>> > nested loop.  Of course recursive mutexes exist, but it sounds like too
>> > big an axe.
> The I/O handlers would still use the qemu mutex, no?  We'd just protect
> the select() (taking the mutex before releasing the global lock, and
> reacquiring it afterwards).

Yes, that could work, but it is _really_ ugly.  I still prefer this
patch or fixing NBD.  At least both contain the hack in a single place.

>> > I could add a generation count updated by qemu_aio_wait(), and rerun the
>> > select() only if the generation count changes during its execution.
>> >
>> > Or we can call it an NBD bug.  I'm not against that, but it seemed to me
>> > that the problem is more general.
> What about making sure all callers of qemu_aio_wait() run from
> coroutines (or threads)?  Then they just ask the main thread to wake
> them up, instead of dispatching completions themselves.

That would open another Pandora's box.  The point of having a separate
main loop is that only AIO can happen during qemu_aio_wait() or
qemu_aio_flush().  In particular you don't want the monitor to process
input while you're running another monitor command.

Paolo
Avi Kivity March 5, 2012, 5:35 p.m. UTC | #8
On 03/05/2012 06:14 PM, Paolo Bonzini wrote:
> On 05/03/2012 16:14, Avi Kivity wrote:
> >> > Hmm, I don't think so.  It would need to protect execution of the
> >> > iohandlers too, and pretty much everything can happen there including a
> >> > nested loop.  Of course recursive mutexes exist, but it sounds like too
> >> > big an axe.
> > The I/O handlers would still use the qemu mutex, no?  We'd just protect
> > the select() (taking the mutex before releasing the global lock, and
> > reacquiring it afterwards).
>
> Yes, that could work, but it is _really_ ugly. 

Yes, it is...

>  I still prefer this
> patch or fixing NBD.  At least both contain the hack in a single place.



> >> > I could add a generation count updated by qemu_aio_wait(), and rerun the
> >> > select() only if the generation count changes during its execution.
> >> >
> >> > Or we can call it an NBD bug.  I'm not against that, but it seemed to me
> >> > that the problem is more general.
> > What about making sure all callers of qemu_aio_wait() run from
> > coroutines (or threads)?  Then they just ask the main thread to wake
> > them up, instead of dispatching completions themselves.
>
> That would open another Pandora's box.  The point of having a separate
> main loop is that only AIO can happen during qemu_aio_wait() or
> qemu_aio_flush().  In particular you don't want the monitor to process
> input while you're running another monitor command.

Hmm, yes, we're abusing the type of completion here as a kind of weird
locking.  It's conceptually broken since an aio completion could trigger
anything.  Usually it just involves block format driver and device code,
but in theory, it can affect the state of whoever's running qemu_aio_wait().
Avi Kivity March 5, 2012, 5:39 p.m. UTC | #9
On 03/05/2012 04:30 PM, Jan Kiszka wrote:
> On 2012-03-05 15:24, Avi Kivity wrote:
> >> Long-term, I'd like to cut out certain file descriptors from the main
> >> loop and process them completely in separate threads (for separate
> >> locking, prioritization etc.). Dunno how NBD works, but maybe it should
> >> be reworked like this already.
> > 
> > Ideally qemu_set_fd_handler2() should be made thread local, and each
> > device thread would run a copy of the main loop, just working on
> > different data.
>
> qemu_set_fd_handler2 may not only be called over an iothread. Rather, we
> need an object and associated lock that is related to the io-path (i.e.
> frontend device + backend driver). That object has to be passed to
> services like qemu_set_fd_handler2.

Not sure I like implicit lock-taking.  In particular, how does it
interact with unregistering an fd_handler?
Jan Kiszka March 5, 2012, 5:55 p.m. UTC | #10
On 2012-03-05 18:39, Avi Kivity wrote:
> On 03/05/2012 04:30 PM, Jan Kiszka wrote:
>> On 2012-03-05 15:24, Avi Kivity wrote:
>>>> Long-term, I'd like to cut out certain file descriptors from the main
>>>> loop and process them completely in separate threads (for separate
>>>> locking, prioritization etc.). Dunno how NBD works, but maybe it should
>>>> be reworked like this already.
>>>
>>> Ideally qemu_set_fd_handler2() should be made thread local, and each
>>> device thread would run a copy of the main loop, just working on
>>> different data.
>>
>> qemu_set_fd_handler2 may not only be called over an iothread. Rather, we
>> need an object and associated lock that is related to the io-path (i.e.
>> frontend device + backend driver). That object has to be passed to
>> services like qemu_set_fd_handler2.
> 
> Not sure I like implicit lock-taking.  In particular, how does it
> interact with unregistering an fd_handler?

I wasn't suggesting implicit lock taking, just decoupling from our
infamous global lock. My point is that thread-local won't help here.

Jan
Paolo Bonzini March 6, 2012, 9:01 a.m. UTC | #11
On 05/03/2012 18:35, Avi Kivity wrote:
>>> > > The I/O handlers would still use the qemu mutex, no?  We'd just protect
>>> > > the select() (taking the mutex before releasing the global lock, and
>>> > > reacquiring it afterwards).
>> >
>> > Yes, that could work, but it is _really_ ugly. 
> Yes, it is...
> 

Let's just fix it in NBD then.

Paolo

Patch

diff --git a/main-loop.c b/main-loop.c
index db23de0..3beccff 100644
--- a/main-loop.c
+++ b/main-loop.c
@@ -458,6 +458,13 @@  int main_loop_wait(int nonblocking)
 
     if (timeout > 0) {
         qemu_mutex_lock_iothread();
+
+        /* Poll again.  A qemu_aio_wait() on another thread
+         * could have made the fdsets stale.
+         */
+        tv.tv_sec = 0;
+        tv.tv_usec = 0;
+        ret = select(nfds + 1, &rfds, &wfds, &xfds, &tv);
     }
 
     glib_select_poll(&rfds, &wfds, &xfds, (ret < 0));