[v10,13/21] job: detect change of aiocontext within job coroutine

Message ID 20220725073855.76049-14-eesposit@redhat.com
State New
Series: job: replace AioContext lock with job_mutex

Commit Message

Emanuele Giuseppe Esposito July 25, 2022, 7:38 a.m. UTC
From: Paolo Bonzini <pbonzini@redhat.com>

We want to make sure that access to job->aio_context is always done
under either the BQL or job_mutex. The problem is that using
aio_co_enter(job->aio_context, job->co) in job_start and job_enter_cond
makes the coroutine resume immediately, so we cannot hold the job lock
across the call. And caching the AioContext is not safe either, as it
might change.

job_start runs under the BQL, so it can freely read job->aio_context,
but job_enter_cond does not. To avoid reading job->aio_context there,
use aio_co_wake(): the advantage is that it does not use
job->aio_context at all, but the main disadvantage is that it cannot
detect a change of the job's AioContext.
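
As a reference, here is a minimal sketch of what aio_co_wake() does
internally (based on util/async.c, simplified; memory barriers and special
cases omitted). The point is that the target context comes from the
coroutine itself, not from job->aio_context:

    void aio_co_wake(Coroutine *co)
    {
        /* co->ctx is the AioContext the coroutine was last entered in,
         * so job->aio_context does not need to be read here at all. */
        AioContext *ctx = qatomic_read(&co->ctx);

        aio_co_enter(ctx, co);
    }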

Calling bdrv_try_set_aio_context() will issue the following calls
(simplified):
* in terms of bdrv callbacks:
  .drained_begin -> .set_aio_context -> .drained_end
* in terms of child_job functions:
  child_job_drained_begin -> child_job_set_aio_context -> child_job_drained_end
* in terms of job functions:
  job_pause_locked -> job_set_aio_context -> job_resume_locked

We can see that after setting the new aio_context, job_resume_locked
calls job_enter_cond again, which then invokes aio_co_wake(). But
while job->aio_context has been updated by job_set_aio_context,
job->co->ctx has not changed, so the coroutine would be entered in
the wrong AioContext.
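
An illustrative sequence of the stale-context problem (hypothetical code,
reusing the function names from the call chain above; not actual series
code):

    /* Initially job->aio_context == old_ctx and job->co->ctx == old_ctx. */
    job_set_aio_context(job, new_ctx);   /* job->aio_context = new_ctx */
    job_resume_locked(job);              /* eventually calls job_enter_cond() */
    /*
     * job_enter_cond() -> aio_co_wake(job->co): the target context is taken
     * from job->co->ctx, which is still old_ctx because the coroutine has
     * not been entered since the switch, so it resumes in the wrong
     * AioContext.
     */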

Using aio_co_schedule in job_resume_locked() might seem like a valid
alternative, but the problem is that the BH resuming the coroutine
is not scheduled immediately, and if another
bdrv_try_set_aio_context() runs in the meantime (see
test_propagate_mirror() in test-block-iothread.c), the first schedule
would target the wrong AioContext, and the second set of drains would
not even manage to schedule the coroutine, as job->busy would still be
true from the previous job_resume_locked().
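
A sketch of that race with the hypothetical aio_co_schedule() variant
(illustrative only; aio_co_schedule() merely queues the coroutine behind a
bottom half in the target context):

    /* Hypothetical job_resume_locked() entering the job this way: */
    aio_co_schedule(job->aio_context, job->co);  /* only queues a BH */
    /*
     * If another bdrv_try_set_aio_context() runs before that BH fires, the
     * queued entry still targets the old context, and the second
     * job_resume_locked() -> job_enter_cond() sees job->busy == true and
     * returns without entering the coroutine at all.
     */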

The solution is to stick with aio_co_wake() and, every time the
coroutine resumes from yielding, detect whether job->aio_context has
changed. If so, reschedule the coroutine to the new context.

Check for the AioContext change in job_do_yield_locked because:
1) aio_co_reschedule_self must be called from the running coroutine
2) since child_job_set_aio_context allows changing the AioContext only
   while the job is paused, this is exactly the place where the coroutine
   resumes, before running the JobDriver's code.
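
For convenience, the heart of the change, reproduced from the hunk below
with some extra commentary added for this write-up:

    next_aio_context = job->aio_context;     /* read under job_mutex */
    while (qemu_get_current_aio_context() != next_aio_context) {
        /*
         * We resumed in a stale AioContext: drop the lock, hop over to the
         * job's current context, then re-check, since the context may have
         * changed again while we were rescheduling.
         */
        job_unlock();
        aio_co_reschedule_self(next_aio_context);
        job_lock();
        next_aio_context = job->aio_context;
    }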

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 job.c | 21 +++++++++++++++++++--
 1 file changed, 19 insertions(+), 2 deletions(-)

Comments

Kevin Wolf Aug. 5, 2022, 8:37 a.m. UTC | #1
Am 25.07.2022 um 09:38 hat Emanuele Giuseppe Esposito geschrieben:
> From: Paolo Bonzini <pbonzini@redhat.com>
> 
> We want to make sure access of job->aio_context is always done
> under either BQL or job_mutex.

Is this the goal of this series? If so, it would have been useful to
state it somewhere more obvious, because I had assumed that holding the
BQL would not be considered enough, but that everyone needs to hold the
job_mutex.

> The problem is that using
> aio_co_enter(job->aiocontext, job->co) in job_start and job_enter_cond
> makes the coroutine immediately resume, so we can't hold the job lock.
> And caching it is not safe either, as it might change.
> 
> job_start is under BQL, so it can freely read job->aiocontext, but
> job_enter_cond is not. In order to fix this, use aio_co_wake():
> the advantage is that it won't use job->aiocontext, but the
> main disadvantage is that it won't be able to detect a change of
> job AioContext.
> 
> Calling bdrv_try_set_aio_context() will issue the following calls
> (simplified):
> * in terms of  bdrv callbacks:
>   .drained_begin -> .set_aio_context -> .drained_end
> * in terms of child_job functions:
>   child_job_drained_begin -> child_job_set_aio_context -> child_job_drained_end
> * in terms of job functions:
>   job_pause_locked -> job_set_aio_context -> job_resume_locked
> 
> We can see that after setting the new aio_context, job_resume_locked
> calls again job_enter_cond, which then invokes aio_co_wake(). But
> while job->aiocontext has been set in job_set_aio_context,
> job->co->ctx has not changed, so the coroutine would be entering in
> the wrong aiocontext.
> 
> Using aio_co_schedule in job_resume_locked() might seem as a valid
> alternative, but the problem is that the bh resuming the coroutine
> is not scheduled immediately, and if in the meanwhile another
> bdrv_try_set_aio_context() is run (see test_propagate_mirror() in
> test-block-iothread.c), we would have the first schedule in the
> wrong aiocontext, and the second set of drains won't even manage
> to schedule the coroutine, as job->busy would still be true from
> the previous job_resume_locked().
> 
> The solution is to stick with aio_co_wake(), but then detect every time
> the coroutine resumes back from yielding if job->aio_context
> has changed. If so, we can reschedule it to the new context.

Hm, but with this in place, what does aio_co_wake() actually buy us
compared to aio_co_enter()?

I guess it's a bit simpler code because you don't have to explicitly
specify the AioContext, but we're still going to enter the coroutine in
the wrong AioContext occasionally and have to reschedule it, just like
in the existing code (except that the rescheduling doesn't exist there
yet).

So while I don't disagree with the change, I don't think the
justification in the commit message is right for this part.

> Check for the aiocontext change in job_do_yield_locked because:
> 1) aio_co_reschedule_self requires to be in the running coroutine
> 2) since child_job_set_aio_context allows changing the aiocontext only
>    while the job is paused, this is the exact place where the coroutine
>    resumes, before running JobDriver's code.
> 
> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  job.c | 21 +++++++++++++++++++--
>  1 file changed, 19 insertions(+), 2 deletions(-)
> 
> diff --git a/job.c b/job.c
> index b0729e2bb2..ecec66b44e 100644
> --- a/job.c
> +++ b/job.c
> @@ -585,7 +585,9 @@ void job_enter_cond_locked(Job *job, bool(*fn)(Job *job))
>      timer_del(&job->sleep_timer);
>      job->busy = true;
>      real_job_unlock();
> -    aio_co_enter(job->aio_context, job->co);
> +    job_unlock();
> +    aio_co_wake(job->co);
> +    job_lock();

The addition of job_unlock/lock is unrelated, this was necessary even
before this patch.

>  }
>  
>  void job_enter_cond(Job *job, bool(*fn)(Job *job))
> @@ -611,6 +613,8 @@ void job_enter(Job *job)
>   */
>  static void coroutine_fn job_do_yield_locked(Job *job, uint64_t ns)
>  {
> +    AioContext *next_aio_context;
> +
>      real_job_lock();
>      if (ns != -1) {
>          timer_mod(&job->sleep_timer, ns);
> @@ -622,7 +626,20 @@ static void coroutine_fn job_do_yield_locked(Job *job, uint64_t ns)
>      qemu_coroutine_yield();
>      job_lock();
>  
> -    /* Set by job_enter_cond() before re-entering the coroutine.  */
> +    next_aio_context = job->aio_context;
> +    /*
> +     * Coroutine has resumed, but in the meanwhile the job AioContext
> +     * might have changed via bdrv_try_set_aio_context(), so we need to move
> +     * the coroutine too in the new aiocontext.
> +     */
> +    while (qemu_get_current_aio_context() != next_aio_context) {
> +        job_unlock();
> +        aio_co_reschedule_self(next_aio_context);
> +        job_lock();
> +        next_aio_context = job->aio_context;
> +    }
> +
> +    /* Set by job_enter_cond_locked() before re-entering the coroutine.  */
>      assert(job->busy);
>  }

The actual code changes look good.

Kevin
Emanuele Giuseppe Esposito Aug. 16, 2022, 3:09 p.m. UTC | #2
Am 05/08/2022 um 10:37 schrieb Kevin Wolf:
> Am 25.07.2022 um 09:38 hat Emanuele Giuseppe Esposito geschrieben:
>> From: Paolo Bonzini <pbonzini@redhat.com>
>>
>> We want to make sure access of job->aio_context is always done
>> under either BQL or job_mutex.
> 
> Is this the goal of this series? If so, it would have been useful to
> state somewhere more obvious, because I had assumed that holding the BQL
> would not be considered enough, but everyone needs to hold the job_mutex.

It is the goal for this patch :)
The whole job API can't rely on BQL since there are coroutines running
in another aiocontext.
> 
>> The problem is that using
>> aio_co_enter(job->aiocontext, job->co) in job_start and job_enter_cond
>> makes the coroutine immediately resume, so we can't hold the job lock.
>> And caching it is not safe either, as it might change.
>>
>> job_start is under BQL, so it can freely read job->aiocontext, but
>> job_enter_cond is not. In order to fix this, use aio_co_wake():
>> the advantage is that it won't use job->aiocontext, but the
>> main disadvantage is that it won't be able to detect a change of
>> job AioContext.
>>
>> Calling bdrv_try_set_aio_context() will issue the following calls
>> (simplified):
>> * in terms of  bdrv callbacks:
>>   .drained_begin -> .set_aio_context -> .drained_end
>> * in terms of child_job functions:
>>   child_job_drained_begin -> child_job_set_aio_context -> child_job_drained_end
>> * in terms of job functions:
>>   job_pause_locked -> job_set_aio_context -> job_resume_locked
>>
>> We can see that after setting the new aio_context, job_resume_locked
>> calls again job_enter_cond, which then invokes aio_co_wake(). But
>> while job->aiocontext has been set in job_set_aio_context,
>> job->co->ctx has not changed, so the coroutine would be entering in
>> the wrong aiocontext.
>>
>> Using aio_co_schedule in job_resume_locked() might seem as a valid
>> alternative, but the problem is that the bh resuming the coroutine
>> is not scheduled immediately, and if in the meanwhile another
>> bdrv_try_set_aio_context() is run (see test_propagate_mirror() in
>> test-block-iothread.c), we would have the first schedule in the
>> wrong aiocontext, and the second set of drains won't even manage
>> to schedule the coroutine, as job->busy would still be true from
>> the previous job_resume_locked().
>>
>> The solution is to stick with aio_co_wake(), but then detect every time
>> the coroutine resumes back from yielding if job->aio_context
>> has changed. If so, we can reschedule it to the new context.
> 
> Hm, but with this in place, what does aio_co_wake() actually buy us
> compared to aio_co_enter()?
> 
> I guess it's a bit simpler code because you don't have to explicitly
> specify the AioContext, but we're still going to enter the coroutine in
> the wrong AioContext occasionally and have to reschedule it, just like
> in the existing code (except that the rescheduling doesn't exist there
> yet).
> 
> So while I don't disagree with the change, I don't think the
> justification in the commit message is right for this part.

What do you suggest to change?

> 
>> Check for the aiocontext change in job_do_yield_locked because:
>> 1) aio_co_reschedule_self requires to be in the running coroutine
>> 2) since child_job_set_aio_context allows changing the aiocontext only
>>    while the job is paused, this is the exact place where the coroutine
>>    resumes, before running JobDriver's code.
>>
>> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
>> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>> ---
>>  job.c | 21 +++++++++++++++++++--
>>  1 file changed, 19 insertions(+), 2 deletions(-)
>>
>> diff --git a/job.c b/job.c
>> index b0729e2bb2..ecec66b44e 100644
>> --- a/job.c
>> +++ b/job.c
>> @@ -585,7 +585,9 @@ void job_enter_cond_locked(Job *job, bool(*fn)(Job *job))
>>      timer_del(&job->sleep_timer);
>>      job->busy = true;
>>      real_job_unlock();
>> -    aio_co_enter(job->aio_context, job->co);
>> +    job_unlock();
>> +    aio_co_wake(job->co);
>> +    job_lock();
> 
> The addition of job_unlock/lock is unrelated, this was necessary even
> before this patch.

Ok

> 
>>  }
>>  
>>  void job_enter_cond(Job *job, bool(*fn)(Job *job))
>> @@ -611,6 +613,8 @@ void job_enter(Job *job)
>>   */
>>  static void coroutine_fn job_do_yield_locked(Job *job, uint64_t ns)
>>  {
>> +    AioContext *next_aio_context;
>> +
>>      real_job_lock();
>>      if (ns != -1) {
>>          timer_mod(&job->sleep_timer, ns);
>> @@ -622,7 +626,20 @@ static void coroutine_fn job_do_yield_locked(Job *job, uint64_t ns)
>>      qemu_coroutine_yield();
>>      job_lock();
>>  
>> -    /* Set by job_enter_cond() before re-entering the coroutine.  */
>> +    next_aio_context = job->aio_context;
>> +    /*
>> +     * Coroutine has resumed, but in the meanwhile the job AioContext
>> +     * might have changed via bdrv_try_set_aio_context(), so we need to move
>> +     * the coroutine too in the new aiocontext.
>> +     */
>> +    while (qemu_get_current_aio_context() != next_aio_context) {
>> +        job_unlock();
>> +        aio_co_reschedule_self(next_aio_context);
>> +        job_lock();
>> +        next_aio_context = job->aio_context;
>> +    }
>> +
>> +    /* Set by job_enter_cond_locked() before re-entering the coroutine.  */
>>      assert(job->busy);
>>  }
> 

Emanuele
Kevin Wolf Aug. 17, 2022, 8:34 a.m. UTC | #3
Am 16.08.2022 um 17:09 hat Emanuele Giuseppe Esposito geschrieben:
> 
> 
> Am 05/08/2022 um 10:37 schrieb Kevin Wolf:
> > Am 25.07.2022 um 09:38 hat Emanuele Giuseppe Esposito geschrieben:
> >> From: Paolo Bonzini <pbonzini@redhat.com>
> >>
> >> We want to make sure access of job->aio_context is always done
> >> under either BQL or job_mutex.
> > 
> > Is this the goal of this series? If so, it would have been useful to
> > state somewhere more obvious, because I had assumed that holding the BQL
> > would not be considered enough, but everyone needs to hold the job_mutex.
> 
> It is the goal for this patch :)
> The whole job API can't rely on BQL since there are coroutines running
> in another aiocontext.

Yes, as I saw in patch 14, which describes the goal more clearly in the
commit message and also adds the corresponding documentation to
Job.aio_context. Maybe it would have been clearer if the documentation
were already in this patch.

> >> The problem is that using
> >> aio_co_enter(job->aiocontext, job->co) in job_start and job_enter_cond
> >> makes the coroutine immediately resume, so we can't hold the job lock.
> >> And caching it is not safe either, as it might change.
> >>
> >> job_start is under BQL, so it can freely read job->aiocontext, but
> >> job_enter_cond is not. In order to fix this, use aio_co_wake():
> >> the advantage is that it won't use job->aiocontext, but the
> >> main disadvantage is that it won't be able to detect a change of
> >> job AioContext.
> >>
> >> Calling bdrv_try_set_aio_context() will issue the following calls
> >> (simplified):
> >> * in terms of  bdrv callbacks:
> >>   .drained_begin -> .set_aio_context -> .drained_end
> >> * in terms of child_job functions:
> >>   child_job_drained_begin -> child_job_set_aio_context -> child_job_drained_end
> >> * in terms of job functions:
> >>   job_pause_locked -> job_set_aio_context -> job_resume_locked
> >>
> >> We can see that after setting the new aio_context, job_resume_locked
> >> calls again job_enter_cond, which then invokes aio_co_wake(). But
> >> while job->aiocontext has been set in job_set_aio_context,
> >> job->co->ctx has not changed, so the coroutine would be entering in
> >> the wrong aiocontext.
> >>
> >> Using aio_co_schedule in job_resume_locked() might seem as a valid
> >> alternative, but the problem is that the bh resuming the coroutine
> >> is not scheduled immediately, and if in the meanwhile another
> >> bdrv_try_set_aio_context() is run (see test_propagate_mirror() in
> >> test-block-iothread.c), we would have the first schedule in the
> >> wrong aiocontext, and the second set of drains won't even manage
> >> to schedule the coroutine, as job->busy would still be true from
> >> the previous job_resume_locked().
> >>
> >> The solution is to stick with aio_co_wake(), but then detect every time
> >> the coroutine resumes back from yielding if job->aio_context
> >> has changed. If so, we can reschedule it to the new context.
> > 
> > Hm, but with this in place, what does aio_co_wake() actually buy us
> > compared to aio_co_enter()?
> > 
> > I guess it's a bit simpler code because you don't have to explicitly
> > specify the AioContext, but we're still going to enter the coroutine in
> > the wrong AioContext occasionally and have to reschedule it, just like
> > in the existing code (except that the rescheduling doesn't exist there
> > yet).
> > 
> > So while I don't disagree with the change, I don't think the
> > justification in the commit message is right for this part.
> 
> What do you suggest to change?

The commit message shouldn't pretend that aio_co_wake() solves the
problem (it says "In order to fix this, use aio_co_wake"), even if
that's what you thought at first before you saw that the problem wasn't
fully fixed by it.

I would move the real solution up in the commit message ("In order to
fix this, detect every time..."), and then maybe mention why
aio_co_wake() doesn't solve the problem, but you're leaving it in anyway
because it's nicer than the previous sequence or something like that.

> >> Check for the aiocontext change in job_do_yield_locked because:
> >> 1) aio_co_reschedule_self requires to be in the running coroutine
> >> 2) since child_job_set_aio_context allows changing the aiocontext only
> >>    while the job is paused, this is the exact place where the coroutine
> >>    resumes, before running JobDriver's code.
> >>
> >> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
> >> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> >> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Kevin
Emanuele Giuseppe Esposito Aug. 17, 2022, 11:16 a.m. UTC | #4
Am 17/08/2022 um 10:34 schrieb Kevin Wolf:
> Am 16.08.2022 um 17:09 hat Emanuele Giuseppe Esposito geschrieben:
>>
>>
>> Am 05/08/2022 um 10:37 schrieb Kevin Wolf:
>>> Am 25.07.2022 um 09:38 hat Emanuele Giuseppe Esposito geschrieben:
>>>> From: Paolo Bonzini <pbonzini@redhat.com>
>>>>
>>>> We want to make sure access of job->aio_context is always done
>>>> under either BQL or job_mutex.
>>>
>>> Is this the goal of this series? If so, it would have been useful to
>>> state somewhere more obvious, because I had assumed that holding the BQL
>>> would not be considered enough, but everyone needs to hold the job_mutex.
>>
>> It is the goal for this patch :)
>> The whole job API can't rely on BQL since there are coroutines running
>> in another aiocontext.
> 
> Yes, as I saw in patch 14, which describes the goal more clearly in the
> commit message and also adds the corresponding documentation to
> Job.aio_context. Maybe it would have been clearer if the documentation
> were already in this patch.
> 

I don't understand what you mean here, sorry. Do you mean the
job_set_aio_context documentation? I don't see what is confusing here,
could you please clarify?

>>>> The problem is that using
>>>> aio_co_enter(job->aiocontext, job->co) in job_start and job_enter_cond
>>>> makes the coroutine immediately resume, so we can't hold the job lock.
>>>> And caching it is not safe either, as it might change.
>>>>
>>>> job_start is under BQL, so it can freely read job->aiocontext, but
>>>> job_enter_cond is not. In order to fix this, use aio_co_wake():
>>>> the advantage is that it won't use job->aiocontext, but the
>>>> main disadvantage is that it won't be able to detect a change of
>>>> job AioContext.
>>>>
>>>> Calling bdrv_try_set_aio_context() will issue the following calls
>>>> (simplified):
>>>> * in terms of  bdrv callbacks:
>>>>   .drained_begin -> .set_aio_context -> .drained_end
>>>> * in terms of child_job functions:
>>>>   child_job_drained_begin -> child_job_set_aio_context -> child_job_drained_end
>>>> * in terms of job functions:
>>>>   job_pause_locked -> job_set_aio_context -> job_resume_locked
>>>>
>>>> We can see that after setting the new aio_context, job_resume_locked
>>>> calls again job_enter_cond, which then invokes aio_co_wake(). But
>>>> while job->aiocontext has been set in job_set_aio_context,
>>>> job->co->ctx has not changed, so the coroutine would be entering in
>>>> the wrong aiocontext.
>>>>
>>>> Using aio_co_schedule in job_resume_locked() might seem as a valid
>>>> alternative, but the problem is that the bh resuming the coroutine
>>>> is not scheduled immediately, and if in the meanwhile another
>>>> bdrv_try_set_aio_context() is run (see test_propagate_mirror() in
>>>> test-block-iothread.c), we would have the first schedule in the
>>>> wrong aiocontext, and the second set of drains won't even manage
>>>> to schedule the coroutine, as job->busy would still be true from
>>>> the previous job_resume_locked().
>>>>
>>>> The solution is to stick with aio_co_wake(), but then detect every time
>>>> the coroutine resumes back from yielding if job->aio_context
>>>> has changed. If so, we can reschedule it to the new context.
>>>
>>> Hm, but with this in place, what does aio_co_wake() actually buy us
>>> compared to aio_co_enter()?
>>>
>>> I guess it's a bit simpler code because you don't have to explicitly
>>> specify the AioContext, but we're still going to enter the coroutine in
>>> the wrong AioContext occasionally and have to reschedule it, just like
>>> in the existing code (except that the rescheduling doesn't exist there
>>> yet).
>>>
>>> So while I don't disagree with the change, I don't think the
>>> justification in the commit message is right for this part.
>>
>> What do you suggest to change?
> 
> The commit message shouldn't pretend that aio_co_wake() solves the
> problem (it says "In order to fix this, use aio_co_wake"), even if
> that's what you thought at first before you saw that the problem wasn't
> fully fixed by it.
> 
> I would move the real solution up in the commit message ("In order to
> fix this, detect every time..."), and then maybe mention why
> aio_co_wake() doesn't solve the problem, but you're leaving it in anyway
> because it's nicer than the previous sequence or something like that.

I think the issue is "In order to fix this", i.e. what "this" refers
to. It is not the fix; it is the problem stated in the previous sentence.

Since job_enter_cond is not under BQL, we can't read job->aiocontext,
therefore use aio_co_wake to avoid doing that.

Would it be ok if I replace the paragraph with:

job_start is under BQL, so it can freely read job->aiocontext, but
job_enter_cond is not. In order to avoid reading the job aiocontext, use
aio_co_wake(), since it resorts to job->co->ctx. The
only disadvantage is that it won't be able to detect a change of
job AioContext, as explained below.

Emanuele
> 
>>>> Check for the aiocontext change in job_do_yield_locked because:
>>>> 1) aio_co_reschedule_self requires to be in the running coroutine
>>>> 2) since child_job_set_aio_context allows changing the aiocontext only
>>>>    while the job is paused, this is the exact place where the coroutine
>>>>    resumes, before running JobDriver's code.
>>>>
>>>> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
>>>> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
>>>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> 
> Kevin
>

Patch

diff --git a/job.c b/job.c
index b0729e2bb2..ecec66b44e 100644
--- a/job.c
+++ b/job.c
@@ -585,7 +585,9 @@  void job_enter_cond_locked(Job *job, bool(*fn)(Job *job))
     timer_del(&job->sleep_timer);
     job->busy = true;
     real_job_unlock();
-    aio_co_enter(job->aio_context, job->co);
+    job_unlock();
+    aio_co_wake(job->co);
+    job_lock();
 }
 
 void job_enter_cond(Job *job, bool(*fn)(Job *job))
@@ -611,6 +613,8 @@  void job_enter(Job *job)
  */
 static void coroutine_fn job_do_yield_locked(Job *job, uint64_t ns)
 {
+    AioContext *next_aio_context;
+
     real_job_lock();
     if (ns != -1) {
         timer_mod(&job->sleep_timer, ns);
@@ -622,7 +626,20 @@  static void coroutine_fn job_do_yield_locked(Job *job, uint64_t ns)
     qemu_coroutine_yield();
     job_lock();
 
-    /* Set by job_enter_cond() before re-entering the coroutine.  */
+    next_aio_context = job->aio_context;
+    /*
+     * Coroutine has resumed, but in the meanwhile the job AioContext
+     * might have changed via bdrv_try_set_aio_context(), so we need to move
+     * the coroutine too in the new aiocontext.
+     */
+    while (qemu_get_current_aio_context() != next_aio_context) {
+        job_unlock();
+        aio_co_reschedule_self(next_aio_context);
+        job_lock();
+        next_aio_context = job->aio_context;
+    }
+
+    /* Set by job_enter_cond_locked() before re-entering the coroutine.  */
     assert(job->busy);
 }