Message ID | 20200401081504.200017-2-s.reiter@proxmox.com |
---|---|
State | New |
Headers | show |
Series | Fix some AIO context locking in jobs | expand |
On 01.04.20 10:15, Stefan Reiter wrote: > All callers of job_txn_apply hold a single job's lock, but different > jobs within a transaction can have different contexts, thus we need to > lock each one individually before applying the callback function. > > Similar to job_completed_txn_abort this also requires releasing the > caller's context before and reacquiring it after to avoid recursive > locks which might break AIO_WAIT_WHILE in the callback. > > This also brings to light a different issue: When a callback function in > job_txn_apply moves it's job to a different AIO context, job_exit will > try to release the wrong lock (now that we re-acquire the lock > correctly, previously it would just continue with the old lock, leaving > the job unlocked for the rest of the codepath back to job_exit). Fix > this by not caching the job's context in job_exit and add a comment > about why this is done. > > One test needed adapting, since it calls job_finalize directly, so it > manually needs to acquire the correct context. > > Signed-off-by: Stefan Reiter <s.reiter@proxmox.com> > --- > job.c | 48 ++++++++++++++++++++++++++++++++++--------- > tests/test-blockjob.c | 2 ++ > 2 files changed, 40 insertions(+), 10 deletions(-) > > diff --git a/job.c b/job.c > index 134a07b92e..5fbaaabf78 100644 > --- a/job.c > +++ b/job.c > @@ -136,17 +136,36 @@ static void job_txn_del_job(Job *job) > } > } > > -static int job_txn_apply(JobTxn *txn, int fn(Job *)) > +static int job_txn_apply(Job *job, int fn(Job *)) > { > - Job *job, *next; > + AioContext *inner_ctx; > + Job *other_job, *next; > + JobTxn *txn = job->txn; > int rc = 0; > > - QLIST_FOREACH_SAFE(job, &txn->jobs, txn_list, next) { > - rc = fn(job); > + /* > + * Similar to job_completed_txn_abort, we take each job's lock before > + * applying fn, but since we assume that outer_ctx is held by the caller, > + * we need to release it here to avoid holding the lock twice - which would > + * break AIO_WAIT_WHILE from within fn. > + */ > + aio_context_release(job->aio_context); Hm, is it safe to drop the lock here and then reacquire it later? I.e., is the job in a consistent state in between? I don’t know. Looks like it. Maybe someone else knows better. (I find the job code rather confusing.) > + > + QLIST_FOREACH_SAFE(other_job, &txn->jobs, txn_list, next) { > + inner_ctx = other_job->aio_context; > + aio_context_acquire(inner_ctx); > + rc = fn(other_job); > + aio_context_release(inner_ctx); > if (rc) { > break; > } > } > + > + /* > + * Note that job->aio_context might have been changed by calling fn, so we > + * can't use a local variable to cache it. > + */ > + aio_context_acquire(job->aio_context); But all callers of job_txn_apply() (but job_exit(), which you fix in this patch) cache it. Won’t they release the wrong context then? Or is a context change only possible when job_txn_apply() is called from job_exit()? Max
On 02/04/2020 14:33, Max Reitz wrote: > On 01.04.20 10:15, Stefan Reiter wrote: >> All callers of job_txn_apply hold a single job's lock, but different >> jobs within a transaction can have different contexts, thus we need to >> lock each one individually before applying the callback function. >> >> Similar to job_completed_txn_abort this also requires releasing the >> caller's context before and reacquiring it after to avoid recursive >> locks which might break AIO_WAIT_WHILE in the callback. >> >> This also brings to light a different issue: When a callback function in >> job_txn_apply moves it's job to a different AIO context, job_exit will >> try to release the wrong lock (now that we re-acquire the lock >> correctly, previously it would just continue with the old lock, leaving >> the job unlocked for the rest of the codepath back to job_exit). Fix >> this by not caching the job's context in job_exit and add a comment >> about why this is done. >> >> One test needed adapting, since it calls job_finalize directly, so it >> manually needs to acquire the correct context. >> >> Signed-off-by: Stefan Reiter <s.reiter@proxmox.com> >> --- >> job.c | 48 ++++++++++++++++++++++++++++++++++--------- >> tests/test-blockjob.c | 2 ++ >> 2 files changed, 40 insertions(+), 10 deletions(-) >> >> diff --git a/job.c b/job.c >> index 134a07b92e..5fbaaabf78 100644 >> --- a/job.c >> +++ b/job.c >> @@ -136,17 +136,36 @@ static void job_txn_del_job(Job *job) >> } >> } >> >> -static int job_txn_apply(JobTxn *txn, int fn(Job *)) >> +static int job_txn_apply(Job *job, int fn(Job *)) >> { >> - Job *job, *next; >> + AioContext *inner_ctx; >> + Job *other_job, *next; >> + JobTxn *txn = job->txn; >> int rc = 0; >> >> - QLIST_FOREACH_SAFE(job, &txn->jobs, txn_list, next) { >> - rc = fn(job); >> + /* >> + * Similar to job_completed_txn_abort, we take each job's lock before >> + * applying fn, but since we assume that outer_ctx is held by the caller, >> + * we need to release it here to avoid holding the lock twice - which would >> + * break AIO_WAIT_WHILE from within fn. >> + */ >> + aio_context_release(job->aio_context); > > Hm, is it safe to drop the lock here and then reacquire it later? I.e., > is the job in a consistent state in between? I don’t know. Looks like > it. Maybe someone else knows better. > I would have said so too, but the iotest segfaults Kevin discovered (I can reproduce them after a dozen or so cycles) seem to be triggered by some kind of race, which I can only imagine being caused by it not being safe to drop the lock. It can be resolved by doing a job_ref/unref in job_txn_apply, but that might be only fixing the symptoms and ignoring the problem. > (I find the job code rather confusing.) > That seems to be common around here ;) >> + >> + QLIST_FOREACH_SAFE(other_job, &txn->jobs, txn_list, next) { >> + inner_ctx = other_job->aio_context; >> + aio_context_acquire(inner_ctx); Ignoring the fact that fn() can change a job's lock, one idea I had was to do a if (inner_ctx != job->aio_context) { aio_context_acquire(inner_ctx); } instead of releasing the lock prior. However, that gets complicated when the job's context does change in fn. >> + rc = fn(other_job); >> + aio_context_release(inner_ctx); >> if (rc) { >> break; >> } >> } >> + >> + /* >> + * Note that job->aio_context might have been changed by calling fn, so we >> + * can't use a local variable to cache it. >> + */ >> + aio_context_acquire(job->aio_context); > > But all callers of job_txn_apply() (but job_exit(), which you fix in > this patch) cache it. Won’t they release the wrong context then? Or is > a context change only possible when job_txn_apply() is called from > job_exit()? > You're right that it can probably change for other callers as well (though at least it doesn't seem to currently, since no other test cases fail?). Looking at the analysis Vladimir did [0], there's a few places where that would need fixing. The issue is that all of these places would also need the job_ref/unref part AFAICT, which would make this a bit unwieldy... [0] https://lists.gnu.org/archive/html/qemu-devel/2020-03/msg07867.html > Max >
diff --git a/job.c b/job.c index 134a07b92e..5fbaaabf78 100644 --- a/job.c +++ b/job.c @@ -136,17 +136,36 @@ static void job_txn_del_job(Job *job) } } -static int job_txn_apply(JobTxn *txn, int fn(Job *)) +static int job_txn_apply(Job *job, int fn(Job *)) { - Job *job, *next; + AioContext *inner_ctx; + Job *other_job, *next; + JobTxn *txn = job->txn; int rc = 0; - QLIST_FOREACH_SAFE(job, &txn->jobs, txn_list, next) { - rc = fn(job); + /* + * Similar to job_completed_txn_abort, we take each job's lock before + * applying fn, but since we assume that outer_ctx is held by the caller, + * we need to release it here to avoid holding the lock twice - which would + * break AIO_WAIT_WHILE from within fn. + */ + aio_context_release(job->aio_context); + + QLIST_FOREACH_SAFE(other_job, &txn->jobs, txn_list, next) { + inner_ctx = other_job->aio_context; + aio_context_acquire(inner_ctx); + rc = fn(other_job); + aio_context_release(inner_ctx); if (rc) { break; } } + + /* + * Note that job->aio_context might have been changed by calling fn, so we + * can't use a local variable to cache it. + */ + aio_context_acquire(job->aio_context); return rc; } @@ -774,11 +793,11 @@ static void job_do_finalize(Job *job) assert(job && job->txn); /* prepare the transaction to complete */ - rc = job_txn_apply(job->txn, job_prepare); + rc = job_txn_apply(job, job_prepare); if (rc) { job_completed_txn_abort(job); } else { - job_txn_apply(job->txn, job_finalize_single); + job_txn_apply(job, job_finalize_single); } } @@ -824,10 +843,10 @@ static void job_completed_txn_success(Job *job) assert(other_job->ret == 0); } - job_txn_apply(txn, job_transition_to_pending); + job_txn_apply(job, job_transition_to_pending); /* If no jobs need manual finalization, automatically do so */ - if (job_txn_apply(txn, job_needs_finalize) == 0) { + if (job_txn_apply(job, job_needs_finalize) == 0) { job_do_finalize(job); } } @@ -849,9 +868,10 @@ static void job_completed(Job *job) static void job_exit(void *opaque) { Job *job = (Job *)opaque; - AioContext *ctx = job->aio_context; + AioContext *ctx; - aio_context_acquire(ctx); + job_ref(job); + aio_context_acquire(job->aio_context); /* This is a lie, we're not quiescent, but still doing the completion * callbacks. However, completion callbacks tend to involve operations that @@ -862,6 +882,14 @@ static void job_exit(void *opaque) job_completed(job); + /* + * Note that calling job_completed can move the job to a different + * aio_context, so we cannot cache from above. job_txn_apply takes care of + * acquiring the new lock, and we ref/unref to avoid job_completed freeing + * the job underneath us. + */ + ctx = job->aio_context; + job_unref(job); aio_context_release(ctx); } diff --git a/tests/test-blockjob.c b/tests/test-blockjob.c index 4eeb184caf..7519847912 100644 --- a/tests/test-blockjob.c +++ b/tests/test-blockjob.c @@ -367,7 +367,9 @@ static void test_cancel_concluded(void) aio_poll(qemu_get_aio_context(), true); assert(job->status == JOB_STATUS_PENDING); + aio_context_acquire(job->aio_context); job_finalize(job, &error_abort); + aio_context_release(job->aio_context); assert(job->status == JOB_STATUS_CONCLUDED); cancel_common(s);
All callers of job_txn_apply hold a single job's lock, but different jobs within a transaction can have different contexts, thus we need to lock each one individually before applying the callback function. Similar to job_completed_txn_abort this also requires releasing the caller's context before and reacquiring it after to avoid recursive locks which might break AIO_WAIT_WHILE in the callback. This also brings to light a different issue: When a callback function in job_txn_apply moves it's job to a different AIO context, job_exit will try to release the wrong lock (now that we re-acquire the lock correctly, previously it would just continue with the old lock, leaving the job unlocked for the rest of the codepath back to job_exit). Fix this by not caching the job's context in job_exit and add a comment about why this is done. One test needed adapting, since it calls job_finalize directly, so it manually needs to acquire the correct context. Signed-off-by: Stefan Reiter <s.reiter@proxmox.com> --- job.c | 48 ++++++++++++++++++++++++++++++++++--------- tests/test-blockjob.c | 2 ++ 2 files changed, 40 insertions(+), 10 deletions(-)