
[v2] thread-pool: fix deadlock when callbacks depend on each other

Message ID 538C248F.2000801@beyond.pl
State New

Commit Message

Marcin Gibuła June 2, 2014, 7:15 a.m. UTC
When two coroutines submit I/O and the first coroutine depends on the 
second one to complete (by calling bdrv_drain_all), a deadlock may occur.

This is because both requests may have completed before the thread pool 
notifier got called. Then, when the notifier gets executed and the first 
coroutine calls aio_poll() to make progress, it will hang forever, as 
the notifier's descriptor has already been marked clear.

This patch fixes this by deferring the clearing of the notifier until no 
completions are pending.
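
Condensed, the patched event_notifier_ready() looks roughly like this; 
it is only an illustrative sketch of the approach, with cancellation 
handling and tracing omitted (see the full patch at the bottom of the 
page):

static void event_notifier_ready(EventNotifier *notifier)
{
    ThreadPool *pool = container_of(notifier, ThreadPool, notifier);
    ThreadPoolElement *elem, *next;

restart:
    QLIST_FOREACH_SAFE(elem, &pool->head, all, next) {
        if (elem->state != THREAD_DONE) {
            continue;   /* cancellation handling omitted in this sketch */
        }
        QLIST_REMOVE(elem, all);
        if (elem->common.cb) {
            atomic_dec(&pool->pending_completions);
            /* Read state before ret.  */
            smp_rmb();
            elem->common.cb(elem->common.opaque, elem->ret);
            qemu_aio_release(elem);
            /* The callback may have re-entered aio_poll(); restart.  */
            goto restart;
        }
        qemu_aio_release(elem);
    }

    /* Do not clear the notifier while completion callbacks are still
     * pending; re-check after clearing to close the race with worker
     * threads that set it in the meantime.  */
    if (atomic_read(&pool->pending_completions)) {
        goto restart;
    }
    event_notifier_test_and_clear(notifier);
    if (atomic_read(&pool->pending_completions)) {
        event_notifier_set(notifier);
        goto restart;
    }
}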

Without this patch, I could reproduce this bug with snapshot-commit in 
about 1 out of 10 tries. With this patch, I couldn't reproduce it any more.

Signed-off-by: Marcin Gibula <m.gibula@beyond.pl>
---


Comments

Stefan Hajnoczi June 4, 2014, 10:01 a.m. UTC | #1
On Mon, Jun 02, 2014 at 09:15:27AM +0200, Marcin Gibuła wrote:
> When two coroutines submit I/O and the first coroutine depends on the
> second one to complete (by calling bdrv_drain_all), a deadlock may occur.

bdrv_drain_all() is a very heavy-weight operation.  Coroutines should
avoid it if possible.  Please post the file/line/function where this
call was made, perhaps there is a better way to wait for the other
coroutine.  This isn't a fix for this bug but it's a cleanup.

> This is because both requests may have completed before the thread pool
> notifier got called. Then, when the notifier gets executed and the first
> coroutine calls aio_poll() to make progress, it will hang forever, as
> the notifier's descriptor has already been marked clear.
> 
> This patch fixes this by deferring the clearing of the notifier until no
> completions are pending.
> 
> Without this patch, I could reproduce this bug with snapshot-commit in
> about 1 out of 10 tries. With this patch, I couldn't reproduce it any more.
> 
> Signed-off-by: Marcin Gibula <m.gibula@beyond.pl>
> ---

This is an interesting bug that definitely needs a test case to prevent
regressions in the future.

Please take a look at tests/test-thread-pool.c and add a test to it.  It
can be reproduced deterministically - just call aio_poll() after the
dummy worker functions have both completed.  Then the next aio_poll()
call in the thread pool callback will suffer the problem you described.

Stefan
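
For illustration, a rough sketch of the kind of test Stefan describes, 
meant to sit in tests/test-thread-pool.c; it assumes the file's existing 
static ctx and pool globals, while dummy_worker(), inner_cb(), outer_cb() 
and test_pending_completions() are hypothetical names invented here:

static int pending_workers;
static bool inner_done;

static int dummy_worker(void *opaque)
{
    /* Runs in a pool thread; just record that the work is finished.  */
    atomic_dec(&pending_workers);
    return 0;
}

static void inner_cb(void *opaque, int ret)
{
    inner_done = true;
}

static void outer_cb(void *opaque, int ret)
{
    /* Mimics a coroutine doing bdrv_drain_all(): wait for the other
     * request from inside a thread-pool completion callback.  Without
     * the fix, the notifier has already been cleared at this point, so
     * this aio_poll() blocks forever.  */
    while (!inner_done) {
        aio_poll(ctx, true);
    }
}

static void test_pending_completions(void)
{
    pending_workers = 2;
    inner_done = false;

    /* Submit the "outer" request last: the pool dispatches the most
     * recently submitted element first, so outer_cb() runs while the
     * other completion is still queued.  */
    thread_pool_submit_aio(pool, dummy_worker, NULL, inner_cb, NULL);
    thread_pool_submit_aio(pool, dummy_worker, NULL, outer_cb, NULL);

    /* Let both dummy workers finish before servicing the notifier, so
     * a single event covers both completions.  */
    while (atomic_read(&pending_workers) > 0) {
        g_usleep(1000);
    }

    aio_poll(ctx, true);
    g_assert(inner_done);
}

The test would then be registered with g_test_add_func() alongside the 
existing cases.
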
Paolo Bonzini June 4, 2014, 10:18 a.m. UTC | #2
On 04/06/2014 12:01, Stefan Hajnoczi wrote:
>> > Without this patch, I could reproduce this bug with snapshot-commit in
>> > about 1 out of 10 tries. With this patch, I couldn't reproduce it any more.
>> >
>> > Signed-off-by: Marcin Gibula <m.gibula@beyond.pl>
>> > ---
> This is an interesting bug that definitely needs a test case to prevent
> regressions in the future.
>
> Please take a look at tests/test-thread-pool.c and add a test to it.  It
> can be reproduced deterministically - just call aio_poll() after the
> dummy worker functions have both completed.  Then the next aio_poll()
> call in the thread pool callback will suffer the problem you described.

The question is whether we want to consider this thread-pool.c behavior 
a real bug or just a misfeature (the real bug being elsewhere).

Even though this patch avoids the performance problems of v1, we would 
have to fix at least two other cases, and it's not obvious (a) that those 
two are the only ones and (b) that those two can be fixed without 
affecting performance.

If the bottom half code is immune to this event notifier problem, 
bdrv_drain/bdrv_drain_all calls in coroutine context can defer the 
actual draining to a bottom half and re-enter the coroutine afterwards; 
we can then audit that all other calls come from the main loop rather 
than aio_poll.

Paolo
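
A minimal sketch of that deferral, assuming QEMU's bottom-half and 
coroutine APIs of the time (qemu_bh_new, qemu_bh_schedule, 
qemu_coroutine_yield); DrainData, drain_bh_cb() and 
co_deferred_drain_all() are illustrative names only, not existing 
functions:

typedef struct {
    Coroutine *co;              /* coroutine waiting for the drain */
} DrainData;

static void drain_bh_cb(void *opaque)
{
    DrainData *data = opaque;

    /* Runs in the main loop, outside coroutine context.  */
    bdrv_drain_all();
    /* Resume the coroutine that scheduled us.  */
    qemu_coroutine_enter(data->co, NULL);
}

static void coroutine_fn co_deferred_drain_all(void)
{
    DrainData data = { .co = qemu_coroutine_self() };
    QEMUBH *bh = qemu_bh_new(drain_bh_cb, &data);

    qemu_bh_schedule(bh);
    /* Yield until drain_bh_cb() re-enters us after the drain is done.  */
    qemu_coroutine_yield();
    qemu_bh_delete(bh);
}
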
Marcin Gibuła June 4, 2014, 10:31 a.m. UTC | #3
On 04.06.2014 12:01, Stefan Hajnoczi wrote:
> On Mon, Jun 02, 2014 at 09:15:27AM +0200, Marcin Gibuła wrote:
>> When two coroutines submit I/O and the first coroutine depends on the
>> second one to complete (by calling bdrv_drain_all), a deadlock may occur.
>
> bdrv_drain_all() is a very heavy-weight operation.  Coroutines should
> avoid it if possible.  Please post the file/line/function where this
> call was made, perhaps there is a better way to wait for the other
> coroutine.  This isn't a fix for this bug but it's a cleanup.

As in the original bug report:

#4  0x00007f699c095c0a in bdrv_drain_all () at 
/var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/block.c:1805
#5  0x00007f699c09c87e in bdrv_close (bs=bs@entry=0x7f699f0bc520) at 
/var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/block.c:1695
#6  0x00007f699c09c5fa in bdrv_delete (bs=0x7f699f0bc520) at 
/var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/block.c:1978
#7  bdrv_unref (bs=0x7f699f0bc520) at 
/var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/block.c:5198
#8  0x00007f699c09c812 in bdrv_drop_intermediate 
(active=active@entry=0x7f699ebfd330, top=top@entry=0x7f699f0bc520, 
base=base@entry=0x7f699eec43d0) at 
/var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/block.c:2567
#9  0x00007f699c0a1963 in commit_run (opaque=0x7f699f17dcc0) at 
/var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/block/commit.c:144
#10 0x00007f699c0e0dca in coroutine_trampoline (i0=<optimized out>, 
i1=<optimized out>) at 
/var/tmp/portage/app-emulation/qemu-2.0.0_rc2/work/qemu-2.0.0-rc2/coroutine-ucontext.c:118 



mirror_run probably has this as well. I didn't check others.

Patch

--- thread-pool.c	2014-04-17 15:44:45.000000000 +0200
+++ thread-pool.c	2014-06-02 09:10:25.442260590 +0200
@@ -76,6 +76,8 @@  struct ThreadPool {
      int new_threads;     /* backlog of threads we need to create */
      int pending_threads; /* threads created but not running yet */
      int pending_cancellations; /* whether we need a cond_broadcast */
+    int pending_completions; /* whether we need to rearm notifier when
+                                executing callback */
      bool stopping;
  };

@@ -110,6 +112,10 @@  static void *worker_thread(void *opaque)
          ret = req->func(req->arg);

          req->ret = ret;
+        if (req->common.cb) {
+            atomic_inc(&pool->pending_completions);
+        }
+
          /* Write ret before state.  */
          smp_wmb();
          req->state = THREAD_DONE;
@@ -173,7 +179,6 @@  static void event_notifier_ready(EventNo
      ThreadPool *pool = container_of(notifier, ThreadPool, notifier);
      ThreadPoolElement *elem, *next;

-    event_notifier_test_and_clear(notifier);
  restart:
      QLIST_FOREACH_SAFE(elem, &pool->head, all, next) {
          if (elem->state != THREAD_CANCELED && elem->state != THREAD_DONE) {
@@ -185,6 +190,8 @@ restart:
          }
          if (elem->state == THREAD_DONE && elem->common.cb) {
              QLIST_REMOVE(elem, all);
+            atomic_dec(&pool->pending_completions);
+
              /* Read state before ret.  */
              smp_rmb();
              elem->common.cb(elem->common.opaque, elem->ret);
@@ -196,6 +203,19 @@ restart:
              qemu_aio_release(elem);
          }
      }
+
+    /* Double test of pending_completions is necessary to
+     * ensure that there is no race between testing it and
+     * clearing notifier.
+     */
+    if (atomic_read(&pool->pending_completions)) {
+        goto restart;
+    }
+    event_notifier_test_and_clear(notifier);
+    if (atomic_read(&pool->pending_completions)) {
+        event_notifier_set(notifier);
+        goto restart;
+    }
  }

  static void thread_pool_cancel(BlockDriverAIOCB *acb)