thread-pool: fix deadlock when callbacks depend on each other

Message ID 538A1F71.4080203@beyond.pl
State New

Commit Message

Marcin Gibuła May 31, 2014, 6:29 p.m. UTC
When two coroutines submit I/O and the first coroutine depends on the second
to complete (by calling bdrv_drain_all), a deadlock may occur.

This is because both requests may have completed before the thread pool
notifier got called. Then, when the notifier is dispatched and the first
coroutine calls aio_poll() to make progress, it hangs forever, because the
notifier's descriptor has already been cleared.

This patch fixes this by rearming the thread pool notifier if more than one
completed request is on the list.

Without this patch, I could reproduce the bug with snapshot-commit in about
1 out of 10 tries. With this patch, I could not reproduce it any more.

Signed-off-by: Marcin Gibula <m.gibula@beyond.pl>
---
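
For readers unfamiliar with how these wakeups coalesce, the race can be
reproduced in miniature with a plain Linux eventfd. The sketch below is only
an illustration with made-up helper names, not QEMU code: two completions
produce a single wakeup, so unless the handler re-arms the descriptor before
running a callback that may wait again, a nested wait blocks forever.

#include <stdint.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/eventfd.h>

static int efd;                  /* stands in for the thread pool's EventNotifier */
static int pending_completions;  /* completed requests whose callbacks have not run */

static void notifier_set(void)
{
    uint64_t one = 1;
    ssize_t r = write(efd, &one, sizeof(one));
    (void)r;
}

static void wait_and_clear(void)     /* stands in for aio_poll() blocking, then clearing */
{
    uint64_t cnt;
    ssize_t r = read(efd, &cnt, sizeof(cnt));  /* blocks until set; resets counter to 0 */
    (void)r;
}

static void run_one_callback(void)
{
    pending_completions--;
    if (pending_completions > 0) {
        notifier_set();              /* re-arm so a nested wait still wakes up */
    }
    printf("callback ran, %d completion(s) still pending\n", pending_completions);
}

int main(void)
{
    efd = eventfd(0, 0);

    /* Two requests complete before the handler runs; the two set() calls
     * coalesce into a single wakeup, just like event_notifier_set(). */
    pending_completions = 2;
    notifier_set();
    notifier_set();

    wait_and_clear();                /* one wakeup consumed for two completions */
    run_one_callback();              /* re-arms because one completion remains */
    wait_and_clear();                /* without the re-arm, this read would hang forever */
    run_one_callback();

    close(efd);
    return 0;
}

Dropping the notifier_set() call from run_one_callback() makes the second
wait_and_clear() block forever, which is exactly the hang described above.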

Comments

Paolo Bonzini June 1, 2014, 6:12 p.m. UTC | #1
On 31/05/2014 20:29, Marcin Gibuła wrote:
>
> @@ -185,6 +191,14 @@ restart:
>          }
>          if (elem->state == THREAD_DONE && elem->common.cb) {
>              QLIST_REMOVE(elem, all);
> +            /* If more completed requests are waiting, the notifier needs
> +             * to be rearmed so the callback can make progress with aio_poll().
> +             */
> +            pool->pending_completions--;
> +            if (pool->pending_completions) {
> +                event_notifier_set(notifier);
> +            }
> +
>              /* Read state before ret.  */
>              smp_rmb();
>              elem->common.cb(elem->common.opaque, elem->ret);

Good catch!  The main problem with the patch is that you need to use 
atomic_inc/atomic_dec to increment and decrement pool->pending_completions.

Secondarily, event_notifier_set is pretty heavy-weight; does it work if 
you wrap the loop like this?

restart:
     QLIST_FOREACH_SAFE(elem, &pool->head, all, next) {
         ...
     }
     if (pool->pending_completions) {
         goto restart;
     }
     event_notifier_test_and_clear(notifier);
     if (pool->pending_completions) {
         event_notifier_set(notifier);
         goto restart;
     }

Finally, the same bug is also in block/linux-aio.c and block/win32-aio.c.

Paolo
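
The intent of that wrapping can be sketched as a small self-contained program
(non-blocking eventfd, invented helper names, simplified relative to the
snippet above, and not the real thread-pool.c): the notifier is cleared first,
pending completions are re-checked afterwards, and the relatively expensive
set() is only paid for when a completion raced with the clear.

#include <stdint.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/eventfd.h>

static int efd;         /* stands in for the thread pool's EventNotifier */
static int pending;     /* completed requests whose callbacks have not run */
static int b_injected;

static void notifier_set(void)
{
    uint64_t one = 1;
    ssize_t r = write(efd, &one, sizeof(one));
    (void)r;
}

static int notifier_test_and_clear(void)
{
    uint64_t cnt = 0;
    ssize_t r = read(efd, &cnt, sizeof(cnt));  /* EFD_NONBLOCK: fails harmlessly if unset */
    (void)r;
    return cnt > 0;
}

static void complete_request(const char *name)
{
    pending++;
    notifier_set();
    printf("request %s completed\n", name);
}

static void handle_ready(void)      /* runs when the eventfd becomes readable */
{
restart:
    while (pending > 0) {
        pending--;                  /* "run" one callback */
        printf("callback ran\n");
    }
    if (!b_injected) {
        /* Simulate a request completing in the window between draining
         * the list and clearing the notifier. */
        b_injected = 1;
        complete_request("B");
    }
    notifier_test_and_clear();
    if (pending > 0) {              /* the re-check catches B */
        notifier_set();             /* only needed on this rare race */
        goto restart;
    }
}

int main(void)
{
    efd = eventfd(0, EFD_NONBLOCK);
    complete_request("A");
    handle_ready();
    printf("still pending: %d\n", pending);
    close(efd);
    return 0;
}

Without the re-check after the clear, request B's wakeup would have been
swallowed and its callback would only run after some unrelated event set the
notifier again.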
Marcin Gibuła June 1, 2014, 7:02 p.m. UTC | #2
> Good catch!  The main problem with the patch is that you need to use
> atomic_inc/atomic_dec to increment and decrement pool->pending_completions.

Ok.

> Secondarily, event_notifier_set is pretty heavy-weight, does it work if
> you wrap the loop like this?
>
> restart:
>      QLIST_FOREACH_SAFE(elem, &pool->head, all, next) {
>          ...
>      }
>      if (pool->pending_completions) {
>          goto restart;
>      }
>      event_notifier_test_and_clear(notifier);
>      if (pool->pending_completions) {
>          event_notifier_set(notifier);
>          goto restart;
>      }

I'll test it tomorrow. I assume you want to avoid calling 
event_notifier_set() until the function is reentered via aio_poll()?

 > Finally, the same bug is also in block/linux-aio.c and
 > block/win32-aio.c.

I can try with linux-aio, but my knowledge of the Windows API is zero...
Paolo Bonzini June 2, 2014, 3:32 p.m. UTC | #3
On 01/06/2014 21:02, Marcin Gibuła wrote:
>> Good catch!  The main problem with the patch is that you need to use
>> atomic_inc/atomic_dec to increment and decrement
>> pool->pending_completions.
>
> Ok.
>
>> Secondarily, event_notifier_set is pretty heavy-weight, does it work if
>> you wrap the loop like this?
>>
>> restart:
>>      QLIST_FOREACH_SAFE(elem, &pool->head, all, next) {
>>          ...
>>      }
>>      if (pool->pending_completions) {
>>          goto restart;
>>      }
>>      event_notifier_test_and_clear(notifier);
>>      if (pool->pending_completions) {
>>          event_notifier_set(notifier);
>>          goto restart;
>>      }
>
> I'll test it tomorrow. I assume you want to avoid calling
> event_notifier_set() until the function is reentered via aio_poll()?

Yes.  But actually, I need to check if it's possible to fix 
bdrv_drain_all.  If you're in coroutine context, you can defer the 
draining to a safe point using a bottom half.  If you're not in 
coroutine context, perhaps bdrv_drain_all has to be made illegal.  Which 
means a bunch of code auditing...

Paolo

>> Finally, the same bug is also in block/linux-aio.c and
>> block/win32-aio.c.
>
> I can try with linux-aio, but my knowledge of the Windows API is zero...
>
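
The idea of deferring the drain to a bottom half can be pictured with a
deliberately toy, self-contained sketch (a hand-rolled deferred-call list with
invented names, standing in for QEMU's real bottom-half machinery): the
completion callback never drains directly, it only queues the drain, and the
main loop runs it later from a context where no notifier is being cleared.

#include <stdio.h>

typedef void DeferredFn(void *opaque);

typedef struct BottomHalf {
    DeferredFn *fn;
    void *opaque;
    struct BottomHalf *next;
} BottomHalf;

static BottomHalf *bh_list;

static void bh_schedule(BottomHalf *bh)
{
    bh->next = bh_list;
    bh_list = bh;
}

static void bh_run_all(void)        /* called from the top of the main loop */
{
    while (bh_list) {
        BottomHalf *bh = bh_list;
        bh_list = bh->next;
        bh->fn(bh->opaque);
    }
}

static void drain_everything(void *opaque)
{
    printf("draining at a safe point: %s\n", (const char *)opaque);
}

static BottomHalf drain_bh = { drain_everything, "outside the completion callback", NULL };

static void completion_callback(void)
{
    /* Unsafe to drain here: we may be deep inside event handling with the
     * notifier already cleared.  Defer the drain instead. */
    bh_schedule(&drain_bh);
}

int main(void)
{
    completion_callback();          /* as if invoked from the event loop */
    bh_run_all();                   /* next main-loop iteration */
    return 0;
}

In QEMU the queueing and scheduling would presumably go through aio_bh_new()
and qemu_bh_schedule(); the sketch only shows the ordering that matters,
namely that the heavy drain runs from the top of the loop rather than from
inside a completion callback.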
Marcin Gibuła June 2, 2014, 3:57 p.m. UTC | #4
>> I'll test it tomorrow. I assume you want to avoid calling
>> event_notifier_set() until the function is reentered via aio_poll()?
>
> Yes.  But actually, I need to check if it's possible to fix
> bdrv_drain_all.  If you're in coroutine context, you can defer the
> draining to a safe point using a bottom half.  If you're not in
> coroutine context, perhaps bdrv_drain_all has to be made illegal.  Which
> means a bunch of code auditing...

For what it's worth, your solution also works fine; I couldn't recreate the 
hang with it. An updated patch proposal was posted earlier today.

Patch

--- thread-pool.c       2014-04-17 15:44:45.000000000 +0200
+++ thread-pool.c       2014-05-31 20:20:26.083011514 +0200
@@ -76,6 +76,8 @@  struct ThreadPool {
      int new_threads;     /* backlog of threads we need to create */
      int pending_threads; /* threads created but not running yet */
      int pending_cancellations; /* whether we need a cond_broadcast */
+    int pending_completions; /* whether we need to rearm notifier when
+                                executing callback */
      bool stopping;
  };

@@ -110,6 +112,10 @@  static void *worker_thread(void *opaque)
          ret = req->func(req->arg);

          req->ret = ret;
+        if (req->common.cb) {
+            pool->pending_completions++;
+        }
+
          /* Write ret before state.  */
          smp_wmb();
          req->state = THREAD_DONE;
@@ -185,6 +191,14 @@  restart:
          }
          if (elem->state == THREAD_DONE && elem->common.cb) {
              QLIST_REMOVE(elem, all);
+            /* If more completed requests are waiting, the notifier needs
+             * to be rearmed so the callback can make progress with aio_poll().
+             */
+            pool->pending_completions--;
+            if (pool->pending_completions) {
+                event_notifier_set(notifier);
+            }
+
              /* Read state before ret.  */
              smp_rmb();
              elem->common.cb(elem->common.opaque, elem->ret);