
[v3,2/2] QEMUBH: make AioContext's bh re-entrant

Message ID 1371675569-6516-3-git-send-email-pingfank@linux.vnet.ibm.com
State New

Commit Message

pingfan liu June 19, 2013, 8:59 p.m. UTC
BHs will be used outside the big lock, so introduce a lock to protect
the writers from one another, i.e. the BH adders and deleters. The lock
only affects the writers; a BH's callback does not take this extra lock.
Note that for the same AioContext, aio_bh_poll() cannot run in
parallel yet.

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
 async.c             | 22 ++++++++++++++++++++++
 include/block/aio.h |  5 +++++
 2 files changed, 27 insertions(+)

Comments

Stefan Hajnoczi June 20, 2013, 7:39 a.m. UTC | #1
On Thu, Jun 20, 2013 at 04:59:29AM +0800, Liu Ping Fan wrote:
> BHs will be used outside the big lock, so introduce a lock to protect
> the writers from one another, i.e. the BH adders and deleters. The lock
> only affects the writers; a BH's callback does not take this extra lock.
> Note that for the same AioContext, aio_bh_poll() cannot run in
> parallel yet.
> 
> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> ---
>  async.c             | 22 ++++++++++++++++++++++
>  include/block/aio.h |  5 +++++
>  2 files changed, 27 insertions(+)

qemu_bh_cancel() and qemu_bh_delete() are not modified by this patch.

It seems that calling them from a thread is a little risky because there
is no guarantee that the BH is no longer invoked after a thread calls
these functions.

I think that's worth a comment or do you want them to take the lock so
they become safe?

The other thing I'm unclear on is the ->idle assignment followed
immediately by a ->scheduled assignment.  Without memory barriers
aio_bh_poll() isn't guaranteed to get an ordered view of these updates:
it may see an idle BH as a regular scheduled BH because ->idle is still
0.

Stefan
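
Stefan's ->idle/->scheduled concern can be shown with a small sequential
sketch. The struct and helper names here are illustrative only, not QEMU
code; the mid-update state is written out by hand rather than produced by
a real racing thread:

```c
#include <assert.h>

/* A minimal sketch of the hazard: with the patch's write order
 * (->scheduled before ->idle), a concurrent aio_bh_poll can observe
 * scheduled == 1 while idle is still 0 and treat an idle BH as a
 * regular scheduled one. */
struct bh { int scheduled; int idle; };

/* write order as in the patch's qemu_bh_schedule_idle */
static void schedule_idle(struct bh *b)
{
    b->scheduled = 1;   /* a reader running here sees scheduled=1, idle=0 */
    b->idle = 1;
}

/* the value aio_bh_poll's "ret" takes in that window */
static int poll_ret_in_window(void)
{
    struct bh b = { .scheduled = 1, .idle = 0 };  /* mid-update state */
    return (b.scheduled && !b.idle) ? 1 : 0;      /* wrongly reports busy */
}
```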
Paolo Bonzini June 20, 2013, 8:16 a.m. UTC | #2
On 20/06/2013 09:39, Stefan Hajnoczi wrote:
> qemu_bh_cancel() and qemu_bh_delete() are not modified by this patch.
> 
> It seems that calling them from a thread is a little risky because there
> is no guarantee that the BH is no longer invoked after a thread calls
> these functions.
> 
> I think that's worth a comment or do you want them to take the lock so
> they become safe?

Taking the lock wouldn't help.  The invoking loop of aio_bh_poll runs
lockless.  I think a comment is better.

qemu_bh_cancel is inherently not thread-safe, there's not much you can
do about it.

qemu_bh_delete is safe as long as you wait for the bottom half to stop
before deleting the containing object.  Once we have RCU, deletion of
QOM objects will be RCU-protected.  Hence, a simple way could be to put
the first part of aio_bh_poll() within rcu_read_lock/unlock.

> The other thing I'm unclear on is the ->idle assignment followed
> immediately by a ->scheduled assignment.  Without memory barriers
> aio_bh_poll() isn't guaranteed to get an ordered view of these updates:
> it may see an idle BH as a regular scheduled BH because ->idle is still
> 0.

Right.  You need to order ->idle writes before ->scheduled writes, and
add memory barriers, or alternatively use two bits in ->scheduled so
that you can assign both atomically.

Paolo
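
Paolo's "two bits in ->scheduled" alternative could look roughly like the
following; the flag names and helpers are hypothetical, and in real code
the combined store and load would be atomic:

```c
#include <assert.h>

/* Hypothetical sketch of packing idle and scheduled into one word so
 * that a single store publishes both flags consistently.  None of
 * these names exist in QEMU. */
enum {
    BH_SCHEDULED = 1 << 0,   /* BH is pending */
    BH_IDLE      = 1 << 1,   /* BH was scheduled as idle */
};

/* writer: compute both bits, then publish with one assignment */
static int bh_flags_schedule(int idle)
{
    return BH_SCHEDULED | (idle ? BH_IDLE : 0);
}

/* reader: a single load observes a consistent pair of flags */
static int bh_flags_is_idle(int flags)
{
    return (flags & BH_IDLE) != 0;
}
```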
pingfan liu June 20, 2013, 9:12 a.m. UTC | #3
On Thu, Jun 20, 2013 at 4:16 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 20/06/2013 09:39, Stefan Hajnoczi wrote:
>> qemu_bh_cancel() and qemu_bh_delete() are not modified by this patch.
>>
>> It seems that calling them from a thread is a little risky because there
>> is no guarantee that the BH is no longer invoked after a thread calls
>> these functions.
>>
>> I think that's worth a comment or do you want them to take the lock so
>> they become safe?
>
> Taking the lock wouldn't help.  The invoking loop of aio_bh_poll runs
> lockless.  I think a comment is better.
>
Yes, will document it.
> qemu_bh_cancel is inherently not thread-safe, there's not much you can
> do about it.
>
> qemu_bh_delete is safe as long as you wait for the bottom half to stop
> before deleting the containing object.  Once we have RCU, deletion of
> QOM objects will be RCU-protected.  Hence, a simple way could be to put
> the first part of aio_bh_poll() within rcu_read_lock/unlock.
>
>> The other thing I'm unclear on is the ->idle assignment followed
>> immediately by a ->scheduled assignment.  Without memory barriers
>> aio_bh_poll() isn't guaranteed to get an ordered view of these updates:
>> it may see an idle BH as a regular scheduled BH because ->idle is still
>> 0.
>
> Right.  You need to order ->idle writes before ->scheduled writes, and
> add memory barriers, or alternatively use two bits in ->scheduled so
> that you can assign both atomically.
>
I think by just shifting the position of smp_rmb/wmb in _schedule and _poll
we can achieve this (callbacks will not refer to ->idle).

Regards,
Pingfan

> Paolo
Paolo Bonzini June 20, 2013, 9:19 a.m. UTC | #4
On 20/06/2013 11:12, liu ping fan wrote:
>> Right.  You need to order ->idle writes before ->scheduled writes, and
>> add memory barriers, or alternatively use two bits in ->scheduled so
>> that you can assign both atomically.
>>
> I think by just shifting the position of smp_rmb/wmb in _schedule and _poll
> we can achieve this (callbacks will not refer to ->idle).

Yes, but you also need to swap ->idle and ->scheduled assignments
(aio_bh_poll reads scheduled before idle; qemu_bh_schedule* must write
idle before scheduled).

Paolo
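
The ordering Paolo describes (write ->idle before ->scheduled; read
->scheduled before ->idle) might be sketched like this, with the QEMU
barriers modeled as plain compiler barriers for a single-threaded check:

```c
#include <assert.h>

/* Sequential sketch of the corrected ordering: the writer publishes
 * ->idle before ->scheduled, and the reader loads ->scheduled before
 * ->idle.  smp_wmb/smp_rmb are stand-ins modeled as compiler barriers. */
#define smp_wmb() __asm__ __volatile__("" ::: "memory")
#define smp_rmb() __asm__ __volatile__("" ::: "memory")

struct bh { int scheduled; int idle; };

static void schedule_idle_fixed(struct bh *b)
{
    b->idle = 1;        /* write idle first ... */
    smp_wmb();
    b->scheduled = 1;   /* ... then publish via scheduled */
}

static int poll_sees_idle(struct bh *b)
{
    int sched = b->scheduled;   /* read scheduled first ... */
    smp_rmb();
    return sched && b->idle;    /* ... then idle */
}
```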

Patch

diff --git a/async.c b/async.c
index 90fe906..4b17eb7 100644
--- a/async.c
+++ b/async.c
@@ -47,11 +47,16 @@  QEMUBH *aio_bh_new(AioContext *ctx, QEMUBHFunc *cb, void *opaque)
     bh->ctx = ctx;
     bh->cb = cb;
     bh->opaque = opaque;
+    qemu_mutex_lock(&ctx->bh_lock);
     bh->next = ctx->first_bh;
+    /* Make sure the members are ready before putting bh into the list */
+    smp_wmb();
     ctx->first_bh = bh;
+    qemu_mutex_unlock(&ctx->bh_lock);
     return bh;
 }
 
+/* aio_bh_poll cannot be called concurrently on the same AioContext */
 int aio_bh_poll(AioContext *ctx)
 {
     QEMUBH *bh, **bhp, *next;
@@ -61,12 +66,18 @@  int aio_bh_poll(AioContext *ctx)
 
     ret = 0;
     for (bh = ctx->first_bh; bh; bh = next) {
+        /* Make sure bh is fetched before its members are accessed */
+        smp_read_barrier_depends();
         next = bh->next;
         if (!bh->deleted && bh->scheduled) {
             bh->scheduled = 0;
             if (!bh->idle)
                 ret = 1;
             bh->idle = 0;
+            /* Paired with the write barrier in bh schedule, so that the
+             * callback sees all writes done before the bh was scheduled.
+             */
+            smp_rmb();
             bh->cb(bh->opaque);
         }
     }
@@ -75,6 +86,7 @@  int aio_bh_poll(AioContext *ctx)
 
     /* remove deleted bhs */
     if (!ctx->walking_bh) {
+        qemu_mutex_lock(&ctx->bh_lock);
         bhp = &ctx->first_bh;
         while (*bhp) {
             bh = *bhp;
@@ -85,6 +97,7 @@  int aio_bh_poll(AioContext *ctx)
                 bhp = &bh->next;
             }
         }
+        qemu_mutex_unlock(&ctx->bh_lock);
     }
 
     return ret;
@@ -94,6 +107,10 @@  void qemu_bh_schedule_idle(QEMUBH *bh)
 {
     if (bh->scheduled)
         return;
+    /* Make sure any writes that are needed by the callback are done
+     * before the locations are read in aio_bh_poll.
+     */
+    smp_wmb();
     bh->scheduled = 1;
     bh->idle = 1;
 }
@@ -102,6 +119,10 @@  void qemu_bh_schedule(QEMUBH *bh)
 {
     if (bh->scheduled)
         return;
+    /* Make sure any writes that are needed by the callback are done
+     * before the locations are read in aio_bh_poll.
+     */
+    smp_wmb();
     bh->scheduled = 1;
     bh->idle = 0;
     aio_notify(bh->ctx);
@@ -211,6 +232,7 @@  AioContext *aio_context_new(void)
     ctx = (AioContext *) g_source_new(&aio_source_funcs, sizeof(AioContext));
     ctx->pollfds = g_array_new(FALSE, FALSE, sizeof(GPollFD));
     ctx->thread_pool = NULL;
+    qemu_mutex_init(&ctx->bh_lock);
     event_notifier_init(&ctx->notifier, false);
     aio_set_event_notifier(ctx, &ctx->notifier, 
                            (EventNotifierHandler *)
diff --git a/include/block/aio.h b/include/block/aio.h
index 1836793..686b10f 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -17,6 +17,7 @@ 
 #include "qemu-common.h"
 #include "qemu/queue.h"
 #include "qemu/event_notifier.h"
+#include "qemu/thread.h"
 
 typedef struct BlockDriverAIOCB BlockDriverAIOCB;
 typedef void BlockDriverCompletionFunc(void *opaque, int ret);
@@ -53,6 +54,8 @@  typedef struct AioContext {
      */
     int walking_handlers;
 
+    /* lock to protect the list of bottom halves from concurrent adders and deleters */
+    QemuMutex bh_lock;
     /* Anchor of the list of Bottom Halves belonging to the context */
     struct QEMUBH *first_bh;
 
@@ -127,6 +130,8 @@  void aio_notify(AioContext *ctx);
  * aio_bh_poll: Poll bottom halves for an AioContext.
  *
  * These are internal functions used by the QEMU main loop.
+ * Note that aio_bh_poll cannot be called
+ * concurrently on the same AioContext.
  */
 int aio_bh_poll(AioContext *ctx);