
[v4,1/3] job: take each job's lock individually in job_txn_apply

Message ID 20200401081504.200017-2-s.reiter@proxmox.com
State New
Series Fix some AIO context locking in jobs

Commit Message

Stefan Reiter April 1, 2020, 8:15 a.m. UTC
All callers of job_txn_apply hold a single job's lock, but different
jobs within a transaction can have different contexts, thus we need to
lock each one individually before applying the callback function.

Similar to job_completed_txn_abort this also requires releasing the
caller's context before and reacquiring it after to avoid recursive
locks which might break AIO_WAIT_WHILE in the callback.

This also brings to light a different issue: When a callback function in
job_txn_apply moves its job to a different AIO context, job_exit will
try to release the wrong lock (now that we re-acquire the lock
correctly; previously it would just continue with the old lock, leaving
the job unlocked for the rest of the codepath back to job_exit). Fix
this by not caching the job's context in job_exit and adding a comment
about why this is done.

One test needed adapting, since it calls job_finalize directly, so it
needs to acquire the correct context manually.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
---
 job.c                 | 48 ++++++++++++++++++++++++++++++++++---------
 tests/test-blockjob.c |  2 ++
 2 files changed, 40 insertions(+), 10 deletions(-)

Comments

Max Reitz April 2, 2020, 12:33 p.m. UTC | #1
On 01.04.20 10:15, Stefan Reiter wrote:
> All callers of job_txn_apply hold a single job's lock, but different
> jobs within a transaction can have different contexts, thus we need to
> lock each one individually before applying the callback function.
> 
> Similar to job_completed_txn_abort this also requires releasing the
> caller's context before and reacquiring it after to avoid recursive
> locks which might break AIO_WAIT_WHILE in the callback.
> 
> This also brings to light a different issue: When a callback function in
> job_txn_apply moves its job to a different AIO context, job_exit will
> try to release the wrong lock (now that we re-acquire the lock
> correctly, previously it would just continue with the old lock, leaving
> the job unlocked for the rest of the codepath back to job_exit). Fix
> this by not caching the job's context in job_exit and add a comment
> about why this is done.
> 
> One test needed adapting, since it calls job_finalize directly, so it
> manually needs to acquire the correct context.
> 
> Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
> ---
>  job.c                 | 48 ++++++++++++++++++++++++++++++++++---------
>  tests/test-blockjob.c |  2 ++
>  2 files changed, 40 insertions(+), 10 deletions(-)
> 
> diff --git a/job.c b/job.c
> index 134a07b92e..5fbaaabf78 100644
> --- a/job.c
> +++ b/job.c
> @@ -136,17 +136,36 @@ static void job_txn_del_job(Job *job)
>      }
>  }
>  
> -static int job_txn_apply(JobTxn *txn, int fn(Job *))
> +static int job_txn_apply(Job *job, int fn(Job *))
>  {
> -    Job *job, *next;
> +    AioContext *inner_ctx;
> +    Job *other_job, *next;
> +    JobTxn *txn = job->txn;
>      int rc = 0;
>  
> -    QLIST_FOREACH_SAFE(job, &txn->jobs, txn_list, next) {
> -        rc = fn(job);
> +    /*
> +     * Similar to job_completed_txn_abort, we take each job's lock before
> +     * applying fn, but since we assume that outer_ctx is held by the caller,
> +     * we need to release it here to avoid holding the lock twice - which would
> +     * break AIO_WAIT_WHILE from within fn.
> +     */
> +    aio_context_release(job->aio_context);

Hm, is it safe to drop the lock here and then reacquire it later?  I.e.,
is the job in a consistent state in between?  I don’t know.  Looks like
it.  Maybe someone else knows better.

(I find the job code rather confusing.)

> +
> +    QLIST_FOREACH_SAFE(other_job, &txn->jobs, txn_list, next) {
> +        inner_ctx = other_job->aio_context;
> +        aio_context_acquire(inner_ctx);
> +        rc = fn(other_job);
> +        aio_context_release(inner_ctx);
>          if (rc) {
>              break;
>          }
>      }
> +
> +    /*
> +     * Note that job->aio_context might have been changed by calling fn, so we
> +     * can't use a local variable to cache it.
> +     */
> +    aio_context_acquire(job->aio_context);

But all callers of job_txn_apply() (but job_exit(), which you fix in
this patch) cache it.  Won’t they release the wrong context then?  Or is
a context change only possible when job_txn_apply() is called from
job_exit()?

Max
Stefan Reiter April 2, 2020, 3:04 p.m. UTC | #2
On 02/04/2020 14:33, Max Reitz wrote:
> On 01.04.20 10:15, Stefan Reiter wrote:
>> All callers of job_txn_apply hold a single job's lock, but different
>> jobs within a transaction can have different contexts, thus we need to
>> lock each one individually before applying the callback function.
>>
>> Similar to job_completed_txn_abort this also requires releasing the
>> caller's context before and reacquiring it after to avoid recursive
>> locks which might break AIO_WAIT_WHILE in the callback.
>>
>> This also brings to light a different issue: When a callback function in
>> job_txn_apply moves its job to a different AIO context, job_exit will
>> try to release the wrong lock (now that we re-acquire the lock
>> correctly, previously it would just continue with the old lock, leaving
>> the job unlocked for the rest of the codepath back to job_exit). Fix
>> this by not caching the job's context in job_exit and add a comment
>> about why this is done.
>>
>> One test needed adapting, since it calls job_finalize directly, so it
>> manually needs to acquire the correct context.
>>
>> Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
>> ---
>>   job.c                 | 48 ++++++++++++++++++++++++++++++++++---------
>>   tests/test-blockjob.c |  2 ++
>>   2 files changed, 40 insertions(+), 10 deletions(-)
>>
>> diff --git a/job.c b/job.c
>> index 134a07b92e..5fbaaabf78 100644
>> --- a/job.c
>> +++ b/job.c
>> @@ -136,17 +136,36 @@ static void job_txn_del_job(Job *job)
>>       }
>>   }
>>   
>> -static int job_txn_apply(JobTxn *txn, int fn(Job *))
>> +static int job_txn_apply(Job *job, int fn(Job *))
>>   {
>> -    Job *job, *next;
>> +    AioContext *inner_ctx;
>> +    Job *other_job, *next;
>> +    JobTxn *txn = job->txn;
>>       int rc = 0;
>>   
>> -    QLIST_FOREACH_SAFE(job, &txn->jobs, txn_list, next) {
>> -        rc = fn(job);
>> +    /*
>> +     * Similar to job_completed_txn_abort, we take each job's lock before
>> +     * applying fn, but since we assume that outer_ctx is held by the caller,
>> +     * we need to release it here to avoid holding the lock twice - which would
>> +     * break AIO_WAIT_WHILE from within fn.
>> +     */
>> +    aio_context_release(job->aio_context);
> 
> Hm, is it safe to drop the lock here and then reacquire it later?  I.e.,
> is the job in a consistent state in between?  I don’t know.  Looks like
> it.  Maybe someone else knows better.
>

I would have said so too, but the iotest segfaults Kevin discovered (I 
can reproduce them after a dozen or so cycles) seem to be triggered by 
some kind of race, which I can only imagine being caused by it not being 
safe to drop the lock.

It can be resolved by doing a job_ref/unref in job_txn_apply, but that 
might be only fixing the symptoms and ignoring the problem.

> (I find the job code rather confusing.)
> 

That seems to be common around here ;)

>> +
>> +    QLIST_FOREACH_SAFE(other_job, &txn->jobs, txn_list, next) {
>> +        inner_ctx = other_job->aio_context;
>> +        aio_context_acquire(inner_ctx);

Ignoring the fact that fn() can change a job's lock, one idea I had was 
to do a

   if (inner_ctx != job->aio_context) {
       aio_context_acquire(inner_ctx);
   }

instead of releasing the lock prior.
However, that gets complicated when the job's context does change in fn.

>> +        rc = fn(other_job);
>> +        aio_context_release(inner_ctx);
>>           if (rc) {
>>               break;
>>           }
>>       }
>> +
>> +    /*
>> +     * Note that job->aio_context might have been changed by calling fn, so we
>> +     * can't use a local variable to cache it.
>> +     */
>> +    aio_context_acquire(job->aio_context);
> 
> But all callers of job_txn_apply() (but job_exit(), which you fix in
> this patch) cache it.  Won’t they release the wrong context then?  Or is
> a context change only possible when job_txn_apply() is called from
> job_exit()?
> 

You're right that it can probably change for other callers as well 
(though at least it doesn't seem to currently, since no other test cases 
fail?). Looking at the analysis Vladimir did [0], there's a few places 
where that would need fixing.

The issue is that all of these places would also need the job_ref/unref 
part AFAICT, which would make this a bit unwieldy...

[0] https://lists.gnu.org/archive/html/qemu-devel/2020-03/msg07867.html

> Max
>

Patch

diff --git a/job.c b/job.c
index 134a07b92e..5fbaaabf78 100644
--- a/job.c
+++ b/job.c
@@ -136,17 +136,36 @@  static void job_txn_del_job(Job *job)
     }
 }
 
-static int job_txn_apply(JobTxn *txn, int fn(Job *))
+static int job_txn_apply(Job *job, int fn(Job *))
 {
-    Job *job, *next;
+    AioContext *inner_ctx;
+    Job *other_job, *next;
+    JobTxn *txn = job->txn;
     int rc = 0;
 
-    QLIST_FOREACH_SAFE(job, &txn->jobs, txn_list, next) {
-        rc = fn(job);
+    /*
+     * Similar to job_completed_txn_abort, we take each job's lock before
+     * applying fn, but since we assume that outer_ctx is held by the caller,
+     * we need to release it here to avoid holding the lock twice - which would
+     * break AIO_WAIT_WHILE from within fn.
+     */
+    aio_context_release(job->aio_context);
+
+    QLIST_FOREACH_SAFE(other_job, &txn->jobs, txn_list, next) {
+        inner_ctx = other_job->aio_context;
+        aio_context_acquire(inner_ctx);
+        rc = fn(other_job);
+        aio_context_release(inner_ctx);
         if (rc) {
             break;
         }
     }
+
+    /*
+     * Note that job->aio_context might have been changed by calling fn, so we
+     * can't use a local variable to cache it.
+     */
+    aio_context_acquire(job->aio_context);
     return rc;
 }
 
@@ -774,11 +793,11 @@  static void job_do_finalize(Job *job)
     assert(job && job->txn);
 
     /* prepare the transaction to complete */
-    rc = job_txn_apply(job->txn, job_prepare);
+    rc = job_txn_apply(job, job_prepare);
     if (rc) {
         job_completed_txn_abort(job);
     } else {
-        job_txn_apply(job->txn, job_finalize_single);
+        job_txn_apply(job, job_finalize_single);
     }
 }
 
@@ -824,10 +843,10 @@  static void job_completed_txn_success(Job *job)
         assert(other_job->ret == 0);
     }
 
-    job_txn_apply(txn, job_transition_to_pending);
+    job_txn_apply(job, job_transition_to_pending);
 
     /* If no jobs need manual finalization, automatically do so */
-    if (job_txn_apply(txn, job_needs_finalize) == 0) {
+    if (job_txn_apply(job, job_needs_finalize) == 0) {
         job_do_finalize(job);
     }
 }
@@ -849,9 +868,10 @@  static void job_completed(Job *job)
 static void job_exit(void *opaque)
 {
     Job *job = (Job *)opaque;
-    AioContext *ctx = job->aio_context;
+    AioContext *ctx;
 
-    aio_context_acquire(ctx);
+    job_ref(job);
+    aio_context_acquire(job->aio_context);
 
     /* This is a lie, we're not quiescent, but still doing the completion
      * callbacks. However, completion callbacks tend to involve operations that
@@ -862,6 +882,14 @@  static void job_exit(void *opaque)
 
     job_completed(job);
 
+    /*
+     * Note that calling job_completed can move the job to a different
+     * aio_context, so we cannot cache from above. job_txn_apply takes care of
+     * acquiring the new lock, and we ref/unref to avoid job_completed freeing
+     * the job underneath us.
+     */
+    ctx = job->aio_context;
+    job_unref(job);
     aio_context_release(ctx);
 }
 
diff --git a/tests/test-blockjob.c b/tests/test-blockjob.c
index 4eeb184caf..7519847912 100644
--- a/tests/test-blockjob.c
+++ b/tests/test-blockjob.c
@@ -367,7 +367,9 @@  static void test_cancel_concluded(void)
     aio_poll(qemu_get_aio_context(), true);
     assert(job->status == JOB_STATUS_PENDING);
 
+    aio_context_acquire(job->aio_context);
     job_finalize(job, &error_abort);
+    aio_context_release(job->aio_context);
     assert(job->status == JOB_STATUS_CONCLUDED);
 
     cancel_common(s);