Message ID | 20180817170246.14641-4-kwolf@redhat.com |
---|---|
State | New |
Series | Fix some jobs/drain/aio_poll related hangs |
On Fri, 08/17 19:02, Kevin Wolf wrote:
> Similar to AIO_WAIT_WHILE(), job_finish_sync() needs to release the
> AioContext lock of the job before calling aio_poll(). Otherwise,
> callbacks called by aio_poll() would possibly take the lock a second
> time and run into a deadlock with a nested AIO_WAIT_WHILE() call.
>
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  job.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/job.c b/job.c
> index a746bfe70b..6acf55bceb 100644
> --- a/job.c
> +++ b/job.c
> @@ -1016,7 +1016,10 @@ int job_finish_sync(Job *job, void (*finish)(Job *, Error **errp), Error **errp)
>          job_drain(job);
>      }
>      while (!job_is_completed(job)) {
> +        AioContext *aio_context = job->aio_context;
> +        aio_context_release(aio_context);
>          aio_poll(qemu_get_aio_context(), true);
> +        aio_context_acquire(aio_context);
>      }
>      ret = (job_is_cancelled(job) && job->ret == 0) ? -ECANCELED : job->ret;
>      job_unref(job);

Why doesn't this function just use AIO_WAIT_WHILE()?

Fam
On 24.08.2018 at 09:22, Fam Zheng wrote:
> On Fri, 08/17 19:02, Kevin Wolf wrote:
> > Similar to AIO_WAIT_WHILE(), job_finish_sync() needs to release the
> > AioContext lock of the job before calling aio_poll(). Otherwise,
> > callbacks called by aio_poll() would possibly take the lock a second
> > time and run into a deadlock with a nested AIO_WAIT_WHILE() call.
> >
> > Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> > ---
> >  job.c | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> > diff --git a/job.c b/job.c
> > index a746bfe70b..6acf55bceb 100644
> > --- a/job.c
> > +++ b/job.c
> > @@ -1016,7 +1016,10 @@ int job_finish_sync(Job *job, void (*finish)(Job *, Error **errp), Error **errp)
> >          job_drain(job);
> >      }
> >      while (!job_is_completed(job)) {
> > +        AioContext *aio_context = job->aio_context;
> > +        aio_context_release(aio_context);
> >          aio_poll(qemu_get_aio_context(), true);
> > +        aio_context_acquire(aio_context);
> >      }
> >      ret = (job_is_cancelled(job) && job->ret == 0) ? -ECANCELED : job->ret;
> >      job_unref(job);
>
> Why doesn't this function just use AIO_WAIT_WHILE()?

I don't have an AioWait here, so this seemed the simplest way. But
thinking more about it, a dummy AioWait should do, because otherwise the
code would already hang as it is.

Of course, we need to #include "block/aio-wait.h", which doesn't feel
completely right outside the block layer. But maybe that just means that
the header should be moved somewhere else.

Kevin
diff --git a/job.c b/job.c
index a746bfe70b..6acf55bceb 100644
--- a/job.c
+++ b/job.c
@@ -1016,7 +1016,10 @@ int job_finish_sync(Job *job, void (*finish)(Job *, Error **errp), Error **errp)
         job_drain(job);
     }
     while (!job_is_completed(job)) {
+        AioContext *aio_context = job->aio_context;
+        aio_context_release(aio_context);
         aio_poll(qemu_get_aio_context(), true);
+        aio_context_acquire(aio_context);
     }
     ret = (job_is_cancelled(job) && job->ret == 0) ? -ECANCELED : job->ret;
     job_unref(job);
Similar to AIO_WAIT_WHILE(), job_finish_sync() needs to release the
AioContext lock of the job before calling aio_poll(). Otherwise,
callbacks called by aio_poll() would possibly take the lock a second
time and run into a deadlock with a nested AIO_WAIT_WHILE() call.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 job.c | 3 +++
 1 file changed, 3 insertions(+)