
[RFC,3/5] job: Drop AioContext lock around aio_poll()

Message ID 20180817170246.14641-4-kwolf@redhat.com
State New
Series Fix some jobs/drain/aio_poll related hangs

Commit Message

Kevin Wolf Aug. 17, 2018, 5:02 p.m. UTC
Similar to AIO_WAIT_WHILE(), job_finish_sync() needs to release the
AioContext lock of the job before calling aio_poll(). Otherwise,
callbacks invoked by aio_poll() could take the lock a second time
and run into a deadlock with a nested AIO_WAIT_WHILE() call.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 job.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Fam Zheng Aug. 24, 2018, 7:22 a.m. UTC | #1
On Fri, 08/17 19:02, Kevin Wolf wrote:
> Similar to AIO_WAIT_WHILE(), job_finish_sync() needs to release the
> AioContext lock of the job before calling aio_poll(). Otherwise,
> callbacks called by aio_poll() would possibly take the lock a second
> time and run into a deadlock with a nested AIO_WAIT_WHILE() call.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  job.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/job.c b/job.c
> index a746bfe70b..6acf55bceb 100644
> --- a/job.c
> +++ b/job.c
> @@ -1016,7 +1016,10 @@ int job_finish_sync(Job *job, void (*finish)(Job *, Error **errp), Error **errp)
>          job_drain(job);
>      }
>      while (!job_is_completed(job)) {
> +        AioContext *aio_context = job->aio_context;
> +        aio_context_release(aio_context);
>          aio_poll(qemu_get_aio_context(), true);
> +        aio_context_acquire(aio_context);
>      }
>      ret = (job_is_cancelled(job) && job->ret == 0) ? -ECANCELED : job->ret;
>      job_unref(job);

Why doesn't this function just use AIO_WAIT_WHILE()?

Fam
Kevin Wolf Sept. 4, 2018, 2:44 p.m. UTC | #2
Am 24.08.2018 um 09:22 hat Fam Zheng geschrieben:
> On Fri, 08/17 19:02, Kevin Wolf wrote:
> > Similar to AIO_WAIT_WHILE(), job_finish_sync() needs to release the
> > AioContext lock of the job before calling aio_poll(). Otherwise,
> > callbacks called by aio_poll() would possibly take the lock a second
> > time and run into a deadlock with a nested AIO_WAIT_WHILE() call.
> > 
> > Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> > ---
> >  job.c | 3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff --git a/job.c b/job.c
> > index a746bfe70b..6acf55bceb 100644
> > --- a/job.c
> > +++ b/job.c
> > @@ -1016,7 +1016,10 @@ int job_finish_sync(Job *job, void (*finish)(Job *, Error **errp), Error **errp)
> >          job_drain(job);
> >      }
> >      while (!job_is_completed(job)) {
> > +        AioContext *aio_context = job->aio_context;
> > +        aio_context_release(aio_context);
> >          aio_poll(qemu_get_aio_context(), true);
> > +        aio_context_acquire(aio_context);
> >      }
> >      ret = (job_is_cancelled(job) && job->ret == 0) ? -ECANCELED : job->ret;
> >      job_unref(job);
> 
> Why doesn't this function just use AIO_WAIT_WHILE()?

I don't have an AioWait here, so this seemed the simplest way. But
thinking more about it, a dummy AioWait should do because otherwise the
code would already hang as it is.

Of course, we need to #include "block/aio-wait.h", which doesn't feel
completely right outside the block layer. But maybe that just means that
the header should be moved somewhere else.

Kevin
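The variant Kevin describes — a dummy AioWait passed to AIO_WAIT_WHILE() — might look something like the following sketch. This is not code from the thread but a guess at the shape, assuming the three-argument AIO_WAIT_WHILE(AioWait *, AioContext *, cond) declared in "block/aio-wait.h" at the time of this series:

```c
/* Sketch only: assumes AIO_WAIT_WHILE(AioWait *, AioContext *, cond)
 * from "block/aio-wait.h"; the surrounding function body is elided. */
#include "block/aio-wait.h"

int job_finish_sync(Job *job, void (*finish)(Job *, Error **errp), Error **errp)
{
    ...
    AioWait dummy_wait = {};

    /* Replaces the manual release/aio_poll()/acquire loop; the macro
     * itself drops the AioContext lock around its internal aio_poll(). */
    AIO_WAIT_WHILE(&dummy_wait, job->aio_context, !job_is_completed(job));
    ...
}
```

As Kevin notes, the dummy object should suffice because a real waiter registration would only matter if the code could already hang as written; the open question is whether including "block/aio-wait.h" outside the block layer is acceptable or the header should move.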

Patch

diff --git a/job.c b/job.c
index a746bfe70b..6acf55bceb 100644
--- a/job.c
+++ b/job.c
@@ -1016,7 +1016,10 @@ int job_finish_sync(Job *job, void (*finish)(Job *, Error **errp), Error **errp)
         job_drain(job);
     }
     while (!job_is_completed(job)) {
+        AioContext *aio_context = job->aio_context;
+        aio_context_release(aio_context);
         aio_poll(qemu_get_aio_context(), true);
+        aio_context_acquire(aio_context);
     }
     ret = (job_is_cancelled(job) && job->ret == 0) ? -ECANCELED : job->ret;
     job_unref(job);