Message ID: 1437370031-9070-1-git-send-email-pbonzini@redhat.com
State: New
On Mon, 07/20 07:27, Paolo Bonzini wrote:
> diff --git a/aio-win32.c b/aio-win32.c
> index ea655b0..7afc999 100644
> --- a/aio-win32.c
> +++ b/aio-win32.c
> @@ -337,10 +337,11 @@ bool aio_poll(AioContext *ctx, bool blocking)
>              aio_context_acquire(ctx);
>          }
>
> -        if (first && aio_bh_poll(ctx)) {
> -            progress = true;
> +        if (first) {
> +            event_notifier_test_and_clear(&ctx->notifier);

I'm looking at optimizing it but I don't fully understand the relationship
between aio_prepare and WaitForMultipleObjects. Do they get the same set of
events? What if a new event comes in between, for example, thread worker calls
aio_notify()?

Fam

> +            progress |= aio_bh_poll(ctx);
> +            first = false;
>          }
> -        first = false;
>
>          /* if we have any signaled events, dispatch event */
>          event = NULL;
On Mon, 07/20 15:46, Fam Zheng wrote:
> On Mon, 07/20 07:27, Paolo Bonzini wrote:
> > diff --git a/aio-win32.c b/aio-win32.c
> > index ea655b0..7afc999 100644
> > --- a/aio-win32.c
> > +++ b/aio-win32.c
> > @@ -337,10 +337,11 @@ bool aio_poll(AioContext *ctx, bool blocking)
> >              aio_context_acquire(ctx);
> >          }
> >
> > -        if (first && aio_bh_poll(ctx)) {
> > -            progress = true;
> > +        if (first) {
> > +            event_notifier_test_and_clear(&ctx->notifier);
>
> I'm looking at optimizing it but I don't fully understand the relationship
> between aio_prepare and WaitForMultipleObjects. Do they get the same set of
> events? What if a new event comes in between, for example, thread worker calls
> aio_notify()?

After some reading I think WaitForMultipleObjects is for event notifiers and
aio_prepare is for select() on fd events.

It's a bit trickier than aio-posix, in the first iteration there could be
another event masking ctx->notifier so we don't know if we need to clear it.
But since MSDN says:

"""
... the return value minus WAIT_OBJECT_0 indicates the lpHandles array index
of the object that satisfied the wait. If more than one object became signaled
during the call, this is the array index of the signaled object with the
smallest index value of all the signaled objects.
"""

Maybe we can reverse events[] so that ctx->notifier will be the 0th one. And I
think we can always remove it after first iteration, am I right?

Fam
> > I'm looking at optimizing it but I don't fully understand the relationship
> > between aio_prepare and WaitForMultipleObjects. Do they get the same set of
> > events?
>
> After some reading I think WaitForMultipleObjects is for event notifiers and
> aio_prepare is for select() on fd events.
>
> It's a bit trickier than aio-posix, in the first iteration there could be
> another event masking ctx->notifier so we don't know if we need to clear it.
>
> Maybe we can reverse events[] so that ctx->notifier will be the 0th one. And I
> think we can always remove it after first iteration, am I right?

Yes, that would work.  I am not sure how complex it would be.  You would also
need a solution for the GSource and one (probably similar to aio-posix) for
your epoll implementation.  With ctx->notified at least you can encapsulate it
in aio_notify_accept...

Stefan, any preferences?

Paolo
On Mon, Jul 20, 2015 at 07:27:11AM +0200, Paolo Bonzini wrote:
> event_notifier_test_and_clear must be called before processing events.
> Otherwise, an aio_poll could "eat" the notification before the main
> I/O thread invokes ppoll().  The main I/O thread then never wakes up.
> This is an example of what could happen:
>
>    i/o thread         vcpu thread                   worker thread
>    ---------------------------------------------------------------------
>    lock_iothread
>    notify_me = 1
>    ...
>    unlock_iothread
>                       lock_iothread
>                       notify_me = 3
>                       ppoll
>                       notify_me = 1
>                                                     bh->scheduled = 1
>                                                     event_notifier_set
>                       event_notifier_test_and_clear
>    ppoll
>    *** hang ***

I don't understand this diagram.  Why is event_notifier_test_and_clear()
called by the vcpu thread?  event_notifier_set() was called *after* vcpu
thread's ppoll() returned so I wouldn't expect the vcpu thread to see
the notify.

Was there a previous notify pending?
On 20/07/2015 18:36, Stefan Hajnoczi wrote:
>>>    i/o thread         vcpu thread                   worker thread
>>>    ---------------------------------------------------------------------
>>>    lock_iothread
>>>    notify_me = 1
>>>    ...
>>>    unlock_iothread
>>>                       lock_iothread
>>>                       notify_me = 3
>>>                       ppoll
>>>                       notify_me = 1
>>>                                                     bh->scheduled = 1
>>>                                                     event_notifier_set
>>>                       event_notifier_test_and_clear
>>>    ppoll
>>>    *** hang ***
>
> I don't understand this diagram.  Why is event_notifier_test_and_clear()
> called by the vcpu thread?  event_notifier_set() was called *after* vcpu
> thread's ppoll() returned so I wouldn't expect the vcpu thread to see
> the notify.
>
> Was there a previous notify pending?

Yes, for example a bottom half could have rescheduled itself.

Paolo
On Mon, 07/20 08:42, Paolo Bonzini wrote:
> > > I'm looking at optimizing it but I don't fully understand the relationship
> > > between aio_prepare and WaitForMultipleObjects. Do they get the same set of
> > > events?
> >
> > After some reading I think WaitForMultipleObjects is for event notifiers and
> > aio_prepare is for select() on fd events.
> >
> > It's a bit trickier than aio-posix, in the first iteration there could be
> > another event masking ctx->notifier so we don't know if we need to clear it.
> >
> > Maybe we can reverse events[] so that ctx->notifier will be the 0th one. And I
> > think we can always remove it after first iteration, am I right?
>
> Yes, that would work.  I am not sure how complex it would be.  You would also
> need a solution for the GSource and one (probably similar to aio-posix) for
> your epoll implementation.  With ctx->notified at least you can encapsulate it
> in aio_notify_accept...

Yes, I'll take this into consideration when I refactor for epoll
implementation. For now I think aio_notify_accept is cleaner.

Fam
On 21/07/2015 03:19, Fam Zheng wrote:
>> Yes, that would work.  I am not sure how complex it would be.  You would also
>> need a solution for the GSource and one (probably similar to aio-posix) for
>> your epoll implementation.  With ctx->notified at least you can encapsulate it
>> in aio_notify_accept...
>
> Yes, I'll take this into consideration when I refactor for epoll
> implementation. For now I think aio_notify_accept is cleaner.

Ok, I'll wait for Stefan to pick up the unoptimized version, and then
send v2 of aio_notify_accept.

Paolo
On Mon, Jul 20, 2015 at 07:27:11AM +0200, Paolo Bonzini wrote:
> event_notifier_test_and_clear must be called before processing events.
> Otherwise, an aio_poll could "eat" the notification before the main
> I/O thread invokes ppoll().  The main I/O thread then never wakes up.
> This is an example of what could happen:
>
>    i/o thread         vcpu thread                   worker thread
>    ---------------------------------------------------------------------
>    lock_iothread
>    notify_me = 1
>    ...
>    unlock_iothread
>                       lock_iothread
>                       notify_me = 3
>                       ppoll
>                       notify_me = 1
>                                                     bh->scheduled = 1
>                                                     event_notifier_set
>                       event_notifier_test_and_clear
>    ppoll
>    *** hang ***
>
> In contrast with the previous bug, there wasn't really much debugging
> to do here.  "Tracing" with qemu_clock_get_ns shows pretty much the
> same behavior as in the previous patch, so the only way to find the
> bug is staring at the code.
>
> One could also use a formal model, of course.  The included one shows
> this with three processes: notifier corresponds to a QEMU thread pool
> worker, temporary_waiter to a VCPU thread that invokes aio_poll(),
> waiter to the main I/O thread.  I would be happy to say that the
> formal model found the bug for me, but actually I wrote it after the
> fact.
>
> This patch is a bit of a big hammer.  Fam is looking into optimizations.
>
> Reported-by: Richard W. M. Jones <rjones@redhat.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> v1->v2: only do event_notifier_test_and_clear once for Win32
>
>  aio-posix.c                 |   2 +
>  aio-win32.c                 |   7 ++-
>  async.c                     |   8 ++-
>  docs/aio_notify_bug.promela | 140 ++++++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 153 insertions(+), 4 deletions(-)
>  create mode 100644 docs/aio_notify_bug.promela

Thanks, applied to my block tree:
https://github.com/stefanha/qemu/commits/block

Stefan
diff --git a/aio-posix.c b/aio-posix.c
index 249889f..5c8b266 100644
--- a/aio-posix.c
+++ b/aio-posix.c
@@ -276,6 +276,8 @@ bool aio_poll(AioContext *ctx, bool blocking)
         aio_context_acquire(ctx);
     }
 
+    event_notifier_test_and_clear(&ctx->notifier);
+
     /* if we have any readable fds, dispatch event */
     if (ret > 0) {
         for (i = 0; i < npfd; i++) {
diff --git a/aio-win32.c b/aio-win32.c
index ea655b0..7afc999 100644
--- a/aio-win32.c
+++ b/aio-win32.c
@@ -337,10 +337,11 @@ bool aio_poll(AioContext *ctx, bool blocking)
             aio_context_acquire(ctx);
         }
 
-        if (first && aio_bh_poll(ctx)) {
-            progress = true;
+        if (first) {
+            event_notifier_test_and_clear(&ctx->notifier);
+            progress |= aio_bh_poll(ctx);
+            first = false;
         }
-        first = false;
 
         /* if we have any signaled events, dispatch event */
         event = NULL;
diff --git a/async.c b/async.c
index a232192..d625e8a 100644
--- a/async.c
+++ b/async.c
@@ -203,6 +203,8 @@ aio_ctx_check(GSource *source)
     QEMUBH *bh;
 
     atomic_and(&ctx->notify_me, ~1);
+    event_notifier_test_and_clear(&ctx->notifier);
+
     for (bh = ctx->first_bh; bh; bh = bh->next) {
         if (!bh->deleted && bh->scheduled) {
             return true;
@@ -279,6 +281,10 @@ static void aio_rfifolock_cb(void *opaque)
     aio_notify(opaque);
 }
 
+static void event_notifier_dummy_cb(EventNotifier *e)
+{
+}
+
 AioContext *aio_context_new(Error **errp)
 {
     int ret;
@@ -293,7 +299,7 @@ AioContext *aio_context_new(Error **errp)
     g_source_set_can_recurse(&ctx->source, true);
     aio_set_event_notifier(ctx, &ctx->notifier,
                            (EventNotifierHandler *)
-                           event_notifier_test_and_clear);
+                           event_notifier_dummy_cb);
     ctx->thread_pool = NULL;
     qemu_mutex_init(&ctx->bh_lock);
     rfifolock_init(&ctx->lock, aio_rfifolock_cb, ctx);
diff --git a/docs/aio_notify_bug.promela b/docs/aio_notify_bug.promela
new file mode 100644
index 0000000..b3bfca1
--- /dev/null
+++ b/docs/aio_notify_bug.promela
@@ -0,0 +1,140 @@
+/*
+ * This model describes a bug in aio_notify.  If ctx->notifier is
+ * cleared too late, a wakeup could be lost.
+ *
+ * Author: Paolo Bonzini <pbonzini@redhat.com>
+ *
+ * This file is in the public domain.  If you really want a license,
+ * the WTFPL will do.
+ *
+ * To verify the buggy version:
+ *     spin -a -DBUG docs/aio_notify_bug.promela
+ *     gcc -O2 pan.c
+ *     ./a.out -a -f
+ *
+ * To verify the fixed version:
+ *     spin -a docs/aio_notify_bug.promela
+ *     gcc -O2 pan.c
+ *     ./a.out -a -f
+ *
+ * Add -DCHECK_REQ to test an alternative invariant and the
+ * "notify_me" optimization.
+ */
+
+int notify_me;
+bool event;
+bool req;
+bool notifier_done;
+
+#ifdef CHECK_REQ
+#define USE_NOTIFY_ME 1
+#else
+#define USE_NOTIFY_ME 0
+#endif
+
+active proctype notifier()
+{
+    do
+        :: true -> {
+            req = 1;
+            if
+                :: !USE_NOTIFY_ME || notify_me -> event = 1;
+                :: else -> skip;
+            fi
+        }
+        :: true -> break;
+    od;
+    notifier_done = 1;
+}
+
+#ifdef BUG
+#define AIO_POLL                                    \
+    notify_me++;                                    \
+    if                                              \
+        :: !req -> {                                \
+            if                                      \
+                :: event -> skip;                   \
+            fi;                                     \
+        }                                           \
+        :: else -> skip;                            \
+    fi;                                             \
+    notify_me--;                                    \
+                                                    \
+    req = 0;                                        \
+    event = 0;
+#else
+#define AIO_POLL                                    \
+    notify_me++;                                    \
+    if                                              \
+        :: !req -> {                                \
+            if                                      \
+                :: event -> skip;                   \
+            fi;                                     \
+        }                                           \
+        :: else -> skip;                            \
+    fi;                                             \
+    notify_me--;                                    \
+                                                    \
+    event = 0;                                      \
+    req = 0;
+#endif
+
+active proctype waiter()
+{
+    do
+        :: true -> AIO_POLL;
+    od;
+}
+
+/* Same as waiter(), but disappears after a while. */
+active proctype temporary_waiter()
+{
+    do
+        :: true -> AIO_POLL;
+        :: true -> break;
+    od;
+}
+
+#ifdef CHECK_REQ
+never {
+    do
+        :: req -> goto accept_if_req_not_eventually_false;
+        :: true -> skip;
+    od;
+
+accept_if_req_not_eventually_false:
+    if
+        :: req -> goto accept_if_req_not_eventually_false;
+    fi;
+    assert(0);
+}
+
+#else
+/* There must be infinitely many transitions of event as long
+ * as the notifier does not exit.
+ *
+ * If event stayed always true, the waiters would be busy looping.
+ * If event stayed always false, the waiters would be sleeping
+ * forever.
+ */
+never {
+    do
+        :: !event -> goto accept_if_event_not_eventually_true;
+        :: event -> goto accept_if_event_not_eventually_false;
+        :: true -> skip;
+    od;
+
+accept_if_event_not_eventually_true:
+    if
+        :: !event && notifier_done -> do :: true -> skip; od;
+        :: !event && !notifier_done -> goto accept_if_event_not_eventually_true;
+    fi;
+    assert(0);
+
+accept_if_event_not_eventually_false:
+    if
+        :: event -> goto accept_if_event_not_eventually_false;
+    fi;
+    assert(0);
+}
+#endif
event_notifier_test_and_clear must be called before processing events.
Otherwise, an aio_poll could "eat" the notification before the main
I/O thread invokes ppoll().  The main I/O thread then never wakes up.
This is an example of what could happen:

   i/o thread         vcpu thread                   worker thread
   ---------------------------------------------------------------------
   lock_iothread
   notify_me = 1
   ...
   unlock_iothread
                      lock_iothread
                      notify_me = 3
                      ppoll
                      notify_me = 1
                                                    bh->scheduled = 1
                                                    event_notifier_set
                      event_notifier_test_and_clear
   ppoll
   *** hang ***

In contrast with the previous bug, there wasn't really much debugging
to do here.  "Tracing" with qemu_clock_get_ns shows pretty much the
same behavior as in the previous patch, so the only way to find the
bug is staring at the code.

One could also use a formal model, of course.  The included one shows
this with three processes: notifier corresponds to a QEMU thread pool
worker, temporary_waiter to a VCPU thread that invokes aio_poll(),
waiter to the main I/O thread.  I would be happy to say that the
formal model found the bug for me, but actually I wrote it after the
fact.

This patch is a bit of a big hammer.  Fam is looking into optimizations.

Reported-by: Richard W. M. Jones <rjones@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
v1->v2: only do event_notifier_test_and_clear once for Win32

 aio-posix.c                 |   2 +
 aio-win32.c                 |   7 ++-
 async.c                     |   8 ++-
 docs/aio_notify_bug.promela | 140 ++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 153 insertions(+), 4 deletions(-)
 create mode 100644 docs/aio_notify_bug.promela