Message ID | 20200401081504.200017-3-s.reiter@proxmox.com |
---|---|
State | New |
Headers | show |
Series | Fix some AIO context locking in jobs | expand |
On 01.04.20 10:15, Stefan Reiter wrote: > job_cancel_sync requires the job's lock to be held, all other callers > already do this (replication_stop, drive_backup_abort, > blockdev_backup_abort, job_cancel_sync_all, cancel_common). I think all other callers come directly from QMP, though, so they have no locks yet. This OTOH is called from a block driver function, so I would assume the BDS context is locked already (or rather, this is executed in the BDS context). I also think that the commit job runs in the same context. So I would assume that this would be a nested lock, which should be unnecessary and might cause problems. Maybe we should just assert that the job’s context is the current context? (Or would that still be problematic because now job_txn_apply() wants to release some context, and that isn’t possible without this patch? I would hope it’s possible, because I think we shouldn’t have to acquire the current context, and should be free to release it...? I have no idea. Maybe we are actually free to reacquire the current context here.) > Signed-off-by: Stefan Reiter <s.reiter@proxmox.com> > --- > block/replication.c | 8 +++++++- > 1 file changed, 7 insertions(+), 1 deletion(-) > > diff --git a/block/replication.c b/block/replication.c > index 413d95407d..17ddc31569 100644 > --- a/block/replication.c > +++ b/block/replication.c > @@ -144,12 +144,18 @@ fail: > static void replication_close(BlockDriverState *bs) > { > BDRVReplicationState *s = bs->opaque; > + Job *commit_job; > + AioContext *commit_ctx; > > if (s->stage == BLOCK_REPLICATION_RUNNING) { > replication_stop(s->rs, false, NULL); > } > if (s->stage == BLOCK_REPLICATION_FAILOVER) { > - job_cancel_sync(&s->commit_job->job); > + commit_job = &s->commit_job->job; > + commit_ctx = commit_job->aio_context; > + aio_context_acquire(commit_ctx); > + job_cancel_sync(commit_job); > + aio_context_release(commit_ctx); Anyway, there’s also the problem that I would guess the job_cancel_sync() might move the job from its current context back into the main context. Then we’d release the wrong context here. Max > } > > if (s->mode == REPLICATION_MODE_SECONDARY) { >
On 02/04/2020 14:41, Max Reitz wrote: > On 01.04.20 10:15, Stefan Reiter wrote: >> job_cancel_sync requires the job's lock to be held, all other callers >> already do this (replication_stop, drive_backup_abort, >> blockdev_backup_abort, job_cancel_sync_all, cancel_common). > > I think all other callers come directly from QMP, though, so they have > no locks yet. This OTOH is called from a block driver function, so I > would assume the BDS context is locked already (or rather, this is > executed in the BDS context). > > I also think that the commit job runs in the same context. So I would > assume that this would be a nested lock, which should be unnecessary and > might cause problems. Maybe we should just assert that the job’s > context is the current context? > > (Or would that still be problematic because now job_txn_apply() wants to > release some context, and that isn’t possible without this patch? I > would hope it’s possible, because I think we shouldn’t have to acquire > the current context, and should be free to release it...? I have no > idea. Maybe we are actually free to reacquire the current context here.) > You're right, this seems to be unnecessary. Adding an assert(commit_job->aio_context == qemu_get_current_aio_context()); to make sure seems like a good idea though. bdrv_close appears to always have the lock on its driver's context held. >> Signed-off-by: Stefan Reiter <s.reiter@proxmox.com> >> --- >> block/replication.c | 8 +++++++- >> 1 file changed, 7 insertions(+), 1 deletion(-) >> >> diff --git a/block/replication.c b/block/replication.c >> index 413d95407d..17ddc31569 100644 >> --- a/block/replication.c >> +++ b/block/replication.c >> @@ -144,12 +144,18 @@ fail: >> static void replication_close(BlockDriverState *bs) >> { >> BDRVReplicationState *s = bs->opaque; >> + Job *commit_job; >> + AioContext *commit_ctx; >> >> if (s->stage == BLOCK_REPLICATION_RUNNING) { >> replication_stop(s->rs, false, NULL); >> } >> if (s->stage == BLOCK_REPLICATION_FAILOVER) { >> - job_cancel_sync(&s->commit_job->job); >> + commit_job = &s->commit_job->job; >> + commit_ctx = commit_job->aio_context; >> + aio_context_acquire(commit_ctx); >> + job_cancel_sync(commit_job); >> + aio_context_release(commit_ctx); > > Anyway, there’s also the problem that I would guess the > job_cancel_sync() might move the job from its current context back into > the main context. Then we’d release the wrong context here. > > Max > >> } >> >> if (s->mode == REPLICATION_MODE_SECONDARY) { >> > >
diff --git a/block/replication.c b/block/replication.c index 413d95407d..17ddc31569 100644 --- a/block/replication.c +++ b/block/replication.c @@ -144,12 +144,18 @@ fail: static void replication_close(BlockDriverState *bs) { BDRVReplicationState *s = bs->opaque; + Job *commit_job; + AioContext *commit_ctx; if (s->stage == BLOCK_REPLICATION_RUNNING) { replication_stop(s->rs, false, NULL); } if (s->stage == BLOCK_REPLICATION_FAILOVER) { - job_cancel_sync(&s->commit_job->job); + commit_job = &s->commit_job->job; + commit_ctx = commit_job->aio_context; + aio_context_acquire(commit_ctx); + job_cancel_sync(commit_job); + aio_context_release(commit_ctx); } if (s->mode == REPLICATION_MODE_SECONDARY) {
job_cancel_sync requires the job's lock to be held, all other callers already do this (replication_stop, drive_backup_abort, blockdev_backup_abort, job_cancel_sync_all, cancel_common). Signed-off-by: Stefan Reiter <s.reiter@proxmox.com> --- block/replication.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-)