[v2] AioContext: fix broken placement of event_notifier_test_and_clear

Message ID 1437370031-9070-1-git-send-email-pbonzini@redhat.com
State New

Commit Message

Paolo Bonzini July 20, 2015, 5:27 a.m. UTC
event_notifier_test_and_clear must be called before processing events.
Otherwise, an aio_poll could "eat" the notification before the main
I/O thread invokes ppoll().  The main I/O thread then never wakes up.
This is an example of what could happen:

   i/o thread         vcpu thread                   worker thread
   ---------------------------------------------------------------------
   lock_iothread
   notify_me = 1
   ...
   unlock_iothread
                      lock_iothread
                      notify_me = 3
                      ppoll
                      notify_me = 1
                                                     bh->scheduled = 1
                                                     event_notifier_set
                      event_notifier_test_and_clear
   ppoll
   *** hang ***
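
The fix is to clear the notifier as soon as the poll returns, before any
bottom halves or handlers are dispatched.  A simplified sketch of the
resulting ordering in aio_poll() (surrounding code approximated; the real
hunks are in the patch below):

    ret = qemu_poll_ns((GPollFD *)pollfds->data, npfd, timeout);
    if (blocking) {
        atomic_sub(&ctx->notify_me, 2);
    }

    /* Clear the notification *before* dispatching anything, so that an
     * aio_notify() from this point on is guaranteed to wake up the next
     * ppoll().
     */
    event_notifier_test_and_clear(&ctx->notifier);

    if (ret > 0) {
        /* dispatch readable fds ... */
    }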

In contrast with the previous bug, there wasn't really much debugging
to do here.  "Tracing" with qemu_clock_get_ns shows pretty much the
same behavior as in the previous patch, so the only way to find the
bug is staring at the code.

One could also use a formal model, of course.  The included one shows
this with three processes: notifier corresponds to a QEMU thread pool
worker, temporary_waiter to a VCPU thread that invokes aio_poll(),
waiter to the main I/O thread.  I would be happy to say that the
formal model found the bug for me, but actually I wrote it after the
fact.

This patch is a bit of a big hammer.  Fam is looking into optimizations.

Reported-by: Richard W. M. Jones <rjones@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
        v1->v2: only do event_notifier_test_and_clear once for Win32

 aio-posix.c                 |   2 +
 aio-win32.c                 |   7 ++-
 async.c                     |   8 ++-
 docs/aio_notify_bug.promela | 140 ++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 153 insertions(+), 4 deletions(-)
 create mode 100644 docs/aio_notify_bug.promela

Comments

Fam Zheng July 20, 2015, 7:46 a.m. UTC | #1
On Mon, 07/20 07:27, Paolo Bonzini wrote:
> diff --git a/aio-win32.c b/aio-win32.c
> index ea655b0..7afc999 100644
> --- a/aio-win32.c
> +++ b/aio-win32.c
> @@ -337,10 +337,11 @@ bool aio_poll(AioContext *ctx, bool blocking)
>              aio_context_acquire(ctx);
>          }
>  
> -        if (first && aio_bh_poll(ctx)) {
> -            progress = true;
> +        if (first) {
> +            event_notifier_test_and_clear(&ctx->notifier);

I'm looking at optimizing it but I don't fully understand the relationship
between aio_prepare and WaitForMultipleObjects. Do they get the same set of
events? What if a new event comes in between, for example a thread pool worker
calling aio_notify()?

Fam

> +            progress |= aio_bh_poll(ctx);
> +            first = false;
>          }
> -        first = false;
>  
>          /* if we have any signaled events, dispatch event */
>          event = NULL;
Fam Zheng July 20, 2015, 10:06 a.m. UTC | #2
On Mon, 07/20 15:46, Fam Zheng wrote:
> On Mon, 07/20 07:27, Paolo Bonzini wrote:
> > diff --git a/aio-win32.c b/aio-win32.c
> > index ea655b0..7afc999 100644
> > --- a/aio-win32.c
> > +++ b/aio-win32.c
> > @@ -337,10 +337,11 @@ bool aio_poll(AioContext *ctx, bool blocking)
> >              aio_context_acquire(ctx);
> >          }
> >  
> > -        if (first && aio_bh_poll(ctx)) {
> > -            progress = true;
> > +        if (first) {
> > +            event_notifier_test_and_clear(&ctx->notifier);
> 
> I'm looking at optimizing it but I don't fully understand the relationship
> between aio_prepare and WaitForMultipleObjects. Do they get the same set of
> events? What if a new event comes in between, for example a thread pool worker
> calling aio_notify()?
> 

After some reading I think WaitForMultipleObjects is for event notifiers and
aio_prepare is for select() on fd events.

It's a bit trickier than aio-posix: in the first iteration there could be
another event masking ctx->notifier, so we don't know if we need to clear it.
But since MSDN says:

"""
... the return value minus WAIT_OBJECT_0 indicates the lpHandles array index of
the object that satisfied the wait. If more than one object became signaled
during the call, this is the array index of the signaled object with the
smallest index value of all the signaled objects.
"""

Maybe we can reverse events[] so that ctx->notifier will be the 0th one. And I
think we can always remove it after the first iteration, am I right?
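
For illustration, a rough sketch of that idea (not the actual aio-win32.c
code; the loop structure is made up, but event_notifier_get_handle() and
WaitForMultipleObjects() behave as documented):

    /* Keep ctx->notifier's HANDLE in slot 0 of events[].  Because
     * WaitForMultipleObjects() reports the smallest signaled index,
     * the notifier can then never be masked by another signaled handle.
     */
    HANDLE events[MAXIMUM_WAIT_OBJECTS];
    DWORD count = 0, ret;

    events[count++] = event_notifier_get_handle(&ctx->notifier);
    /* ... append the AioHandler events after it ... */

    ret = WaitForMultipleObjects(count, events, FALSE, timeout);
    if (ret == WAIT_OBJECT_0) {
        /* Slot 0 fired: clear the notifier before dispatching anything. */
        event_notifier_test_and_clear(&ctx->notifier);
    }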

Fam
Paolo Bonzini July 20, 2015, 12:42 p.m. UTC | #3
> > I'm looking at optimizing it but I don't fully understand the relationship
> > between aio_prepare and WaitForMultipleObjects. Do they get the same set of
> > events?
> 
> After some reading I think WaitForMultipleObjects is for event notifiers and
> aio_prepare is for select() on fd events.
> It's a bit trickier than aio-posix: in the first iteration there could be
> another event masking ctx->notifier, so we don't know if we need to clear it.
> Maybe we can reverse events[] so that ctx->notifier will be the 0th one. And I
> think we can always remove it after the first iteration, am I right?

Yes, that would work. I am not sure how complex it would be. You would also need
a solution for the GSource and one (probably similar to aio-posix) for your
epoll implementation. With ctx->notified at least you can encapsulate it in
aio_notify_accept...
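
For reference, a minimal sketch of what that could look like, assuming a
new "bool notified" field in AioContext (field name and memory-ordering
primitives are assumptions, not code from this patch):

    void aio_notify(AioContext *ctx)
    {
        /* As today, but also record that the notifier was kicked. */
        smp_mb();
        if (ctx->notify_me) {
            event_notifier_set(&ctx->notifier);
            atomic_mb_set(&ctx->notified, true);
        }
    }

    /* aio_poll()/aio_ctx_check() would call this instead of clearing the
     * EventNotifier unconditionally; the notifier is only touched when
     * somebody actually called aio_notify().
     */
    void aio_notify_accept(AioContext *ctx)
    {
        if (atomic_xchg(&ctx->notified, false)) {
            event_notifier_test_and_clear(&ctx->notifier);
        }
    }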

Stefan, any preferences?

Paolo
Stefan Hajnoczi July 20, 2015, 4:36 p.m. UTC | #4
On Mon, Jul 20, 2015 at 07:27:11AM +0200, Paolo Bonzini wrote:
> event_notifier_test_and_clear must be called before processing events.
> Otherwise, an aio_poll could "eat" the notification before the main
> I/O thread invokes ppoll().  The main I/O thread then never wakes up.
> This is an example of what could happen:
> 
>    i/o thread         vcpu thread                   worker thread
>    ---------------------------------------------------------------------
>    lock_iothread
>    notify_me = 1
>    ...
>    unlock_iothread
>                       lock_iothread
>                       notify_me = 3
>                       ppoll
>                       notify_me = 1
>                                                      bh->scheduled = 1
>                                                      event_notifier_set
>                       event_notifier_test_and_clear
>    ppoll
>    *** hang ***

I don't understand this diagram.  Why is event_notifier_test_and_clear()
called by the vcpu thread?  event_notifier_set() was called *after* the vcpu
thread's ppoll() returned, so I wouldn't expect the vcpu thread to see
the notification.

Was there a previous notify pending?
Paolo Bonzini July 20, 2015, 7:50 p.m. UTC | #5
On 20/07/2015 18:36, Stefan Hajnoczi wrote:
>> > 
>> >    i/o thread         vcpu thread                   worker thread
>> >    ---------------------------------------------------------------------
>> >    lock_iothread
>> >    notify_me = 1
>> >    ...
>> >    unlock_iothread
>> >                       lock_iothread
>> >                       notify_me = 3
>> >                       ppoll
>> >                       notify_me = 1
>> >                                                      bh->scheduled = 1
>> >                                                      event_notifier_set
>> >                       event_notifier_test_and_clear
>> >    ppoll
>> >    *** hang ***
> I don't understand this diagram.  Why is event_notifier_test_and_clear()
> called by the vcpu thread?  event_notifier_set() was called *after* the vcpu
> thread's ppoll() returned, so I wouldn't expect the vcpu thread to see
> the notification.
> 
> Was there a previous notify pending?

Yes, for example a bottom half could have rescheduled itself.
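
For instance, a bottom half along these lines (names made up) re-arms itself
from its own callback; each qemu_bh_schedule() goes through aio_notify(), so
a notification can still be pending when the vcpu thread enters aio_poll():

    static QEMUBH *poll_bh;

    static void poll_bh_cb(void *opaque)
    {
        /* ... do some work ... */

        /* Re-arm: sets bh->scheduled = 1 and calls aio_notify(), which
         * may leave ctx->notifier set for whoever polls next.
         */
        qemu_bh_schedule(poll_bh);
    }

    /* setup:
     *   poll_bh = aio_bh_new(ctx, poll_bh_cb, NULL);
     *   qemu_bh_schedule(poll_bh);
     */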

Paolo
Fam Zheng July 21, 2015, 1:19 a.m. UTC | #6
On Mon, 07/20 08:42, Paolo Bonzini wrote:
> 
> > > I'm looking at optimizing it but I don't fully understand the relationship
> > > between aio_prepare and WaitForMultipleObjects. Do they get the same set of
> > > events?
> > 
> > After some reading I think WaitForMultipleObjects is for event notifiers and
> > aio_prepare is for select() on fd events.
> > It's a bit trickier than aio-posix: in the first iteration there could be
> > another event masking ctx->notifier, so we don't know if we need to clear it.
> > Maybe we can reverse events[] so that ctx->notifier will be the 0th one. And I
> > think we can always remove it after the first iteration, am I right?
> 
> Yes, that would work. I am not sure how complex it would be. You would also need
> a solution for the GSource and one (probably similar to aio-posix) for your
> epoll implementation. With ctx->notified at least you can encapsulate it in
> aio_notify_accept...

Yes, I'll take this into consideration when I refactor for the epoll
implementation.  For now I think aio_notify_accept is cleaner.

Fam
Paolo Bonzini July 21, 2015, 6:51 a.m. UTC | #7
On 21/07/2015 03:19, Fam Zheng wrote:
> > Yes, that would work. I am not sure how complex it would be. You would also need
> > a solution for the GSource and one (probably similar to aio-posix) for your
> > epoll implementation. With ctx->notified at least you can encapsulate it in
> > aio_notify_accept...
> 
> Yes, I'll take this into consideration when I refactor for the epoll
> implementation.  For now I think aio_notify_accept is cleaner.

Ok, I'll wait for Stefan to pick up the unoptimized version, and then
send v2 of aio_notify_accept.

Paolo
Stefan Hajnoczi July 21, 2015, 11:04 a.m. UTC | #8
On Mon, Jul 20, 2015 at 07:27:11AM +0200, Paolo Bonzini wrote:
> event_notifier_test_and_clear must be called before processing events.
> Otherwise, an aio_poll could "eat" the notification before the main
> I/O thread invokes ppoll().  The main I/O thread then never wakes up.
> This is an example of what could happen:
> 
>    i/o thread         vcpu thread                   worker thread
>    ---------------------------------------------------------------------
>    lock_iothread
>    notify_me = 1
>    ...
>    unlock_iothread
>                       lock_iothread
>                       notify_me = 3
>                       ppoll
>                       notify_me = 1
>                                                      bh->scheduled = 1
>                                                      event_notifier_set
>                       event_notifier_test_and_clear
>    ppoll
>    *** hang ***
> 
> In contrast with the previous bug, there wasn't really much debugging
> to do here.  "Tracing" with qemu_clock_get_ns shows pretty much the
> same behavior as in the previous patch, so the only way to find the
> bug is staring at the code.
> 
> One could also use a formal model, of course.  The included one shows
> this with three processes: notifier corresponds to a QEMU thread pool
> worker, temporary_waiter to a VCPU thread that invokes aio_poll(),
> waiter to the main I/O thread.  I would be happy to say that the
> formal model found the bug for me, but actually I wrote it after the
> fact.
> 
> This patch is a bit of a big hammer.  Fam is looking into optimizations.
> 
> Reported-by: Richard W. M. Jones <rjones@redhat.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>         v1->v2: only do event_notifier_test_and_clear once for Win32
> 
>  aio-posix.c                 |   2 +
>  aio-win32.c                 |   7 ++-
>  async.c                     |   8 ++-
>  docs/aio_notify_bug.promela | 140 ++++++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 153 insertions(+), 4 deletions(-)
>  create mode 100644 docs/aio_notify_bug.promela

Thanks, applied to my block tree:
https://github.com/stefanha/qemu/commits/block

Stefan

Patch

diff --git a/aio-posix.c b/aio-posix.c
index 249889f..5c8b266 100644
--- a/aio-posix.c
+++ b/aio-posix.c
@@ -276,6 +276,8 @@  bool aio_poll(AioContext *ctx, bool blocking)
         aio_context_acquire(ctx);
     }
 
+    event_notifier_test_and_clear(&ctx->notifier);
+
     /* if we have any readable fds, dispatch event */
     if (ret > 0) {
         for (i = 0; i < npfd; i++) {
diff --git a/aio-win32.c b/aio-win32.c
index ea655b0..7afc999 100644
--- a/aio-win32.c
+++ b/aio-win32.c
@@ -337,10 +337,11 @@  bool aio_poll(AioContext *ctx, bool blocking)
             aio_context_acquire(ctx);
         }
 
-        if (first && aio_bh_poll(ctx)) {
-            progress = true;
+        if (first) {
+            event_notifier_test_and_clear(&ctx->notifier);
+            progress |= aio_bh_poll(ctx);
+            first = false;
         }
-        first = false;
 
         /* if we have any signaled events, dispatch event */
         event = NULL;
diff --git a/async.c b/async.c
index a232192..d625e8a 100644
--- a/async.c
+++ b/async.c
@@ -203,6 +203,8 @@  aio_ctx_check(GSource *source)
     QEMUBH *bh;
 
     atomic_and(&ctx->notify_me, ~1);
+    event_notifier_test_and_clear(&ctx->notifier);
+
     for (bh = ctx->first_bh; bh; bh = bh->next) {
         if (!bh->deleted && bh->scheduled) {
             return true;
@@ -279,6 +281,10 @@  static void aio_rfifolock_cb(void *opaque)
     aio_notify(opaque);
 }
 
+static void event_notifier_dummy_cb(EventNotifier *e)
+{
+}
+
 AioContext *aio_context_new(Error **errp)
 {
     int ret;
@@ -293,7 +299,7 @@  AioContext *aio_context_new(Error **errp)
     g_source_set_can_recurse(&ctx->source, true);
     aio_set_event_notifier(ctx, &ctx->notifier,
                            (EventNotifierHandler *)
-                           event_notifier_test_and_clear);
+                           event_notifier_dummy_cb);
     ctx->thread_pool = NULL;
     qemu_mutex_init(&ctx->bh_lock);
     rfifolock_init(&ctx->lock, aio_rfifolock_cb, ctx);
diff --git a/docs/aio_notify_bug.promela b/docs/aio_notify_bug.promela
new file mode 100644
index 0000000..b3bfca1
--- /dev/null
+++ b/docs/aio_notify_bug.promela
@@ -0,0 +1,140 @@ 
+/*
+ * This model describes a bug in aio_notify.  If ctx->notifier is
+ * cleared too late, a wakeup could be lost.
+ *
+ * Author: Paolo Bonzini <pbonzini@redhat.com>
+ *
+ * This file is in the public domain.  If you really want a license,
+ * the WTFPL will do.
+ *
+ * To verify the buggy version:
+ *     spin -a -DBUG docs/aio_notify_bug.promela
+ *     gcc -O2 pan.c
+ *     ./a.out -a -f
+ *
+ * To verify the fixed version:
+ *     spin -a docs/aio_notify_bug.promela
+ *     gcc -O2 pan.c
+ *     ./a.out -a -f
+ *
+ * Add -DCHECK_REQ to test an alternative invariant and the
+ * "notify_me" optimization.
+ */
+
+int notify_me;
+bool event;
+bool req;
+bool notifier_done;
+
+#ifdef CHECK_REQ
+#define USE_NOTIFY_ME 1
+#else
+#define USE_NOTIFY_ME 0
+#endif
+
+active proctype notifier()
+{
+    do
+        :: true -> {
+            req = 1;
+            if
+               :: !USE_NOTIFY_ME || notify_me -> event = 1;
+               :: else -> skip;
+            fi
+        }
+        :: true -> break;
+    od;
+    notifier_done = 1;
+}
+
+#ifdef BUG
+#define AIO_POLL                                                    \
+    notify_me++;                                                    \
+    if                                                              \
+        :: !req -> {                                                \
+            if                                                      \
+                :: event -> skip;                                   \
+            fi;                                                     \
+        }                                                           \
+        :: else -> skip;                                            \
+    fi;                                                             \
+    notify_me--;                                                    \
+                                                                    \
+    req = 0;                                                        \
+    event = 0;
+#else
+#define AIO_POLL                                                    \
+    notify_me++;                                                    \
+    if                                                              \
+        :: !req -> {                                                \
+            if                                                      \
+                :: event -> skip;                                   \
+            fi;                                                     \
+        }                                                           \
+        :: else -> skip;                                            \
+    fi;                                                             \
+    notify_me--;                                                    \
+                                                                    \
+    event = 0;                                                      \
+    req = 0;
+#endif
+
+active proctype waiter()
+{
+    do
+       :: true -> AIO_POLL;
+    od;
+}
+
+/* Same as waiter(), but disappears after a while.  */
+active proctype temporary_waiter()
+{
+    do
+       :: true -> AIO_POLL;
+       :: true -> break;
+    od;
+}
+
+#ifdef CHECK_REQ
+never {
+    do
+        :: req -> goto accept_if_req_not_eventually_false;
+        :: true -> skip;
+    od;
+
+accept_if_req_not_eventually_false:
+    if
+        :: req -> goto accept_if_req_not_eventually_false;
+    fi;
+    assert(0);
+}
+
+#else
+/* There must be infinitely many transitions of event as long
+ * as the notifier does not exit.
+ *
+ * If event stayed always true, the waiters would be busy looping.
+ * If event stayed always false, the waiters would be sleeping
+ * forever.
+ */
+never {
+    do
+        :: !event    -> goto accept_if_event_not_eventually_true;
+        :: event     -> goto accept_if_event_not_eventually_false;
+        :: true      -> skip;
+    od;
+
+accept_if_event_not_eventually_true:
+    if
+        :: !event && notifier_done  -> do :: true -> skip; od;
+        :: !event && !notifier_done -> goto accept_if_event_not_eventually_true;
+    fi;
+    assert(0);
+
+accept_if_event_not_eventually_false:
+    if
+        :: event     -> goto accept_if_event_not_eventually_false;
+    fi;
+    assert(0);
+}
+#endif