[PULL,1/1] test-bdrv-drain: fix iothread_join() hang

Message ID 20191014085211.25800-2-stefanha@redhat.com
State New

Commit Message

Stefan Hajnoczi Oct. 14, 2019, 8:52 a.m. UTC
tests/test-bdrv-drain can hang in tests/iothread.c:iothread_run():

  while (!atomic_read(&iothread->stopping)) {
      aio_poll(iothread->ctx, true);
  }

The iothread_join() function works as follows:

  void iothread_join(IOThread *iothread)
  {
      iothread->stopping = true;
      aio_notify(iothread->ctx);
      qemu_thread_join(&iothread->thread);

If iothread_run() checks iothread->stopping before the iothread_join()
thread sets stopping to true, then aio_notify() may be optimized away
and iothread_run() hangs forever in aio_poll().

The correct way to change iothread->stopping is from a BH that executes
within iothread_run().  This ensures that iothread->stopping is checked
after we set it to true.
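
The interleaving described above can be replayed deterministically in a
small single-threaded model. This is only a sketch under stated
assumptions: ModelLoop, model_notify(), model_poll() and the two scenario
functions are hypothetical names, and the notify_me field stands in for
whatever state lets the real aio_notify() skip the wakeup. It is not
QEMU code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of an event loop; not QEMU's AioContext. */
typedef struct {
    bool stopping;       /* the flag tested by the run loop */
    int notify_me;       /* nonzero only while the loop is about to block */
    bool event_pending;  /* stands in for the wakeup event */
    bool bh_scheduled;   /* stands in for a pending stop BH */
} ModelLoop;

/* Models the optimization: no wakeup is sent if nobody is waiting. */
static void model_notify(ModelLoop *l)
{
    if (l->notify_me) {
        l->event_pending = true;
    }
}

/* One blocking poll. Returns false if it would sleep forever. */
static bool model_poll(ModelLoop *l)
{
    l->notify_me = 1;                  /* announce intent to block */
    bool woke = l->event_pending || l->bh_scheduled;
    l->notify_me = 0;
    if (!woke) {
        return false;                  /* nothing will ever wake us */
    }
    l->event_pending = false;
    if (l->bh_scheduled) {
        l->bh_scheduled = false;
        l->stopping = true;            /* body of the stop BH */
    }
    return true;
}

/* Old iothread_join(): flag set and notify sent from the joining thread. */
static bool old_join_hangs(void)
{
    ModelLoop l = {0};
    bool keep_running = !l.stopping;   /* loop thread checks the flag first */
    assert(keep_running);
    (void)keep_running;
    l.stopping = true;                 /* joining thread sets the flag... */
    model_notify(&l);                  /* ...but notify_me == 0, so no wakeup */
    return !model_poll(&l);            /* loop thread now blocks */
}

/* New iothread_join(): the flag is set by a BH inside the loop thread. */
static bool bh_join_hangs(void)
{
    ModelLoop l = {0};
    bool keep_running = !l.stopping;   /* same unlucky ordering as above */
    assert(keep_running);
    (void)keep_running;
    l.bh_scheduled = true;             /* joining thread schedules the BH */
    model_notify(&l);                  /* wakeup may still be skipped */
    if (!model_poll(&l)) {
        return true;
    }
    return !l.stopping;                /* loop re-checks the flag after poll */
}
```

In this model old_join_hangs() returns true and bh_join_hangs() returns
false. The difference is that the pending BH is state the poll itself
inspects before sleeping, so a skipped notification no longer matters.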

This was already fixed for ./iothread.c (note this is a different source
file!) by commit 2362a28ea11c145e1a13ae79342d76dc118a72a6 ("iothread:
fix iothread_stop() race condition"), but not for tests/iothread.c.

Fixes: 0c330a734b51c177ab8488932ac3b0c4d63a718a
       ("aio: introduce aio_co_schedule and aio_co_wake")
Reported-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20191003100103.331-1-stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 tests/iothread.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

Comments

Paolo Bonzini Oct. 14, 2019, 11:11 a.m. UTC | #1
On 14/10/19 10:52, Stefan Hajnoczi wrote:
> tests/test-bdrv-drain can hang in tests/iothread.c:iothread_run():
> 
>   while (!atomic_read(&iothread->stopping)) {
>       aio_poll(iothread->ctx, true);
>   }
> 
> The iothread_join() function works as follows:
> 
>   void iothread_join(IOThread *iothread)
>   {
>       iothread->stopping = true;
>       aio_notify(iothread->ctx);
>       qemu_thread_join(&iothread->thread);
> 
> If iothread_run() checks iothread->stopping before the iothread_join()
> thread sets stopping to true, then aio_notify() may be optimized away
> and iothread_run() hangs forever in aio_poll().
> 
> The correct way to change iothread->stopping is from a BH that executes
> within iothread_run().  This ensures that iothread->stopping is checked
> after we set it to true.
> 
> This was already fixed for ./iothread.c (note this is a different source
> file!) by commit 2362a28ea11c145e1a13ae79342d76dc118a72a6 ("iothread:
> fix iothread_stop() race condition"), but not for tests/iothread.c.

Aha, I did have some kind of déjà vu when sending the patch I have just
sent; let's see if this also fixes the test-aio-multithread assertion
failure.

Note that with this change the atomic read of iothread->stopping can go
away; I can send a separate patch later.

Paolo

> Fixes: 0c330a734b51c177ab8488932ac3b0c4d63a718a
>        ("aio: introduce aio_co_schedule and aio_co_wake")
> Reported-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
> Message-Id: <20191003100103.331-1-stefanha@redhat.com>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  tests/iothread.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/tests/iothread.c b/tests/iothread.c
> index 777d9eea46..13c9fdcd8d 100644
> --- a/tests/iothread.c
> +++ b/tests/iothread.c
> @@ -55,10 +55,16 @@ static void *iothread_run(void *opaque)
>      return NULL;
>  }
>  
> -void iothread_join(IOThread *iothread)
> +static void iothread_stop_bh(void *opaque)
>  {
> +    IOThread *iothread = opaque;
> +
>      iothread->stopping = true;
> -    aio_notify(iothread->ctx);
> +}
> +
> +void iothread_join(IOThread *iothread)
> +{
> +    aio_bh_schedule_oneshot(iothread->ctx, iothread_stop_bh, iothread);
>      qemu_thread_join(&iothread->thread);
>      qemu_cond_destroy(&iothread->init_done_cond);
>      qemu_mutex_destroy(&iothread->init_done_lock);
>

Stefan Hajnoczi Oct. 15, 2019, 8:50 a.m. UTC | #2
On Mon, Oct 14, 2019 at 01:11:41PM +0200, Paolo Bonzini wrote:
> On 14/10/19 10:52, Stefan Hajnoczi wrote:
> > tests/test-bdrv-drain can hang in tests/iothread.c:iothread_run():
> > 
> >   while (!atomic_read(&iothread->stopping)) {
> >       aio_poll(iothread->ctx, true);
> >   }
> > 
> > The iothread_join() function works as follows:
> > 
> >   void iothread_join(IOThread *iothread)
> >   {
> >       iothread->stopping = true;
> >       aio_notify(iothread->ctx);
> >       qemu_thread_join(&iothread->thread);
> > 
> > If iothread_run() checks iothread->stopping before the iothread_join()
> > thread sets stopping to true, then aio_notify() may be optimized away
> > and iothread_run() hangs forever in aio_poll().
> > 
> > The correct way to change iothread->stopping is from a BH that executes
> > within iothread_run().  This ensures that iothread->stopping is checked
> > after we set it to true.
> > 
> > This was already fixed for ./iothread.c (note this is a different source
> > file!) by commit 2362a28ea11c145e1a13ae79342d76dc118a72a6 ("iothread:
> > fix iothread_stop() race condition"), but not for tests/iothread.c.
> 
> Aha, I did have some kind of déjà vu when sending the patch I have just
> sent; let's see if this also fixes the test-aio-multithread assertion
> failure.
> 
> Note that with this change the atomic read of iothread->stopping can go
> away; I can send a separate patch later.

Yes, I thought about the atomic_read() later as well.

Stefan

Patch

diff --git a/tests/iothread.c b/tests/iothread.c
index 777d9eea46..13c9fdcd8d 100644
--- a/tests/iothread.c
+++ b/tests/iothread.c
@@ -55,10 +55,16 @@  static void *iothread_run(void *opaque)
     return NULL;
 }
 
-void iothread_join(IOThread *iothread)
+static void iothread_stop_bh(void *opaque)
 {
+    IOThread *iothread = opaque;
+
     iothread->stopping = true;
-    aio_notify(iothread->ctx);
+}
+
+void iothread_join(IOThread *iothread)
+{
+    aio_bh_schedule_oneshot(iothread->ctx, iothread_stop_bh, iothread);
     qemu_thread_join(&iothread->thread);
     qemu_cond_destroy(&iothread->init_done_cond);
     qemu_mutex_destroy(&iothread->init_done_lock);
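
The shape of the fix can also be tried outside QEMU. The following is a
minimal sketch using plain pthreads, assuming a toy event loop: Loop,
loop_schedule_oneshot(), loop_poll(), loop_stop_cb() and loop_join() are
hypothetical stand-ins for the IOThread, aio_bh_schedule_oneshot(),
aio_poll(), iothread_stop_bh() and iothread_join() of the patch, not
their real implementations.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

typedef void CallbackFn(void *opaque);

typedef struct Callback {
    CallbackFn *fn;
    void *opaque;
    struct Callback *next;
} Callback;

typedef struct {
    pthread_t thread;
    pthread_mutex_t lock;
    pthread_cond_t cond;
    Callback *pending;   /* protected by lock */
    bool stopping;       /* written and read only by the loop thread */
} Loop;

/* Queue fn(opaque) to run once inside the loop thread. */
static void loop_schedule_oneshot(Loop *loop, CallbackFn *fn, void *opaque)
{
    Callback *cb = malloc(sizeof(*cb));
    if (!cb) {
        abort();
    }
    cb->fn = fn;
    cb->opaque = opaque;
    pthread_mutex_lock(&loop->lock);
    cb->next = loop->pending;
    loop->pending = cb;
    pthread_cond_signal(&loop->cond);
    pthread_mutex_unlock(&loop->lock);
}

/* Block until at least one callback is queued, then run all of them. */
static void loop_poll(Loop *loop)
{
    pthread_mutex_lock(&loop->lock);
    while (!loop->pending) {
        pthread_cond_wait(&loop->cond, &loop->lock);
    }
    Callback *list = loop->pending;
    loop->pending = NULL;
    pthread_mutex_unlock(&loop->lock);

    while (list) {
        Callback *cb = list;
        list = cb->next;
        cb->fn(cb->opaque);
        free(cb);
    }
}

static void *loop_run(void *opaque)
{
    Loop *loop = opaque;
    while (!loop->stopping) {   /* always re-checked after callbacks ran */
        loop_poll(loop);
    }
    return NULL;
}

/* Runs in the loop thread, so the flag needs no cross-thread access. */
static void loop_stop_cb(void *opaque)
{
    Loop *loop = opaque;
    assert(pthread_equal(pthread_self(), loop->thread));
    loop->stopping = true;
}

static void loop_start(Loop *loop)
{
    pthread_mutex_init(&loop->lock, NULL);
    pthread_cond_init(&loop->cond, NULL);
    loop->pending = NULL;
    loop->stopping = false;
    if (pthread_create(&loop->thread, NULL, loop_run, loop) != 0) {
        abort();
    }
}

static void loop_join(Loop *loop)
{
    loop_schedule_oneshot(loop, loop_stop_cb, loop);
    pthread_join(loop->thread, NULL);
    pthread_cond_destroy(&loop->cond);
    pthread_mutex_destroy(&loop->lock);
}
```

Calling loop_start() followed by loop_join() on a Loop should return
promptly no matter how the two threads interleave, because the queued
callback is checked under the lock before the loop thread sleeps. Since
only the loop thread touches stopping in this sketch, a plain bool is
enough here, which mirrors the remark in the thread about dropping the
atomic read.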