
[3/7] jobs: add exit shim

Message ID 20180817190457.8292-4-jsnow@redhat.com
State New
Series jobs: remove job_defer_to_main_loop

Commit Message

John Snow Aug. 17, 2018, 7:04 p.m. UTC
All jobs do the same thing when they leave their running loop:
- Store the return code in a structure
- wait to receive this structure in the main thread
- signal job completion via job_completed

Few jobs do anything beyond exactly this. Consolidate this exit
logic for a net reduction in SLOC.

More seriously, when we utilize job_defer_to_main_loop_bh to call
a function that calls job_completed, job_finalize_single will run
in a context where it has recursively taken the aio_context lock,
which can cause hangs if it puts down a reference that causes a flush.

You can observe this in practice by looking at mirror_exit's careful
placement of job_completed and bdrv_unref calls.

If we centralize job exiting, we can signal job completion from outside
of the aio_context, which should allow for job cleanup code to run with
only one lock, which makes cleanup callbacks less tricky to write.

Signed-off-by: John Snow <jsnow@redhat.com>
---
 include/qemu/job.h |  7 +++++++
 job.c              | 19 +++++++++++++++++++
 2 files changed, 26 insertions(+)
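
For context, the pattern being consolidated looks roughly like the
following sketch. The ExampleExitData and example_* names are
hypothetical, loosely modeled on the pre-patch block jobs (e.g.
mirror); this illustrates the idiom and is not code from the tree.

typedef struct ExampleExitData {
    int ret;
} ExampleExitData;

/* Deferred to the main loop; job_defer_to_main_loop()'s BH acquires the
 * job's AioContext before calling this, so job_completed() -- and any
 * flush triggered by dropping a reference in here -- runs with the lock
 * taken recursively.  This is the hang described above. */
static void example_exit(Job *job, void *opaque)
{
    ExampleExitData *data = opaque;

    /* main-loop-only work, e.g. graph manipulation, goes here */
    job_completed(job, data->ret);
    g_free(data);
}

static int coroutine_fn example_run(Job *job, Error **errp)
{
    ExampleExitData *data = g_new0(ExampleExitData, 1);
    int ret = 0;    /* ... do the actual work, store the return code ... */

    data->ret = ret;
    job_defer_to_main_loop(job, example_exit, data);
    return ret;
}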

Comments

Eric Blake Aug. 20, 2018, 9:16 p.m. UTC | #1
On 08/17/2018 02:04 PM, John Snow wrote:
> All jobs do the same thing when they leave their running loop:
> - Store the return code in a structure
> - wait to receive this structure in the main thread
> - signal job completion via job_completed
> 
> Few jobs do anything beyond exactly this. Consolidate this exit
> logic for a net reduction in SLOC.
> 
> More seriously, when we utilize job_defer_to_main_loop_bh to call
> a function that calls job_completed, job_finalize_single will run
> in a context where it has recursively taken the aio_context lock,
> which can cause hangs if it puts down a reference that causes a flush.
> 
> You can observe this in practice by looking at mirror_exit's careful
> placement of job_completed and bdrv_unref calls.
> 
> If we centralize job exiting, we can signal job completion from outside
> of the aio_context, which should allow for job cleanup code to run with
> only one lock, which makes cleanup callbacks less tricky to write.
> 
> Signed-off-by: John Snow <jsnow@redhat.com>
> ---
>   include/qemu/job.h |  7 +++++++
>   job.c              | 19 +++++++++++++++++++
>   2 files changed, 26 insertions(+)
> 

Reviewed-by: Eric Blake <eblake@redhat.com>
Max Reitz Aug. 22, 2018, 11:43 a.m. UTC | #2
On 2018-08-17 21:04, John Snow wrote:
> All jobs do the same thing when they leave their running loop:
> - Store the return code in a structure
> - wait to receive this structure in the main thread
> - signal job completion via job_completed
> 
> Few jobs do anything beyond exactly this. Consolidate this exit
> logic for a net reduction in SLOC.
> 
> More seriously, when we utilize job_defer_to_main_loop_bh to call
> a function that calls job_completed, job_finalize_single will run
> in a context where it has recursively taken the aio_context lock,
> which can cause hangs if it puts down a reference that causes a flush.
> 
> You can observe this in practice by looking at mirror_exit's careful
> placement of job_completed and bdrv_unref calls.
> 
> If we centralize job exiting, we can signal job completion from outside
> of the aio_context, which should allow for job cleanup code to run with
> only one lock, which makes cleanup callbacks less tricky to write.
> 
> Signed-off-by: John Snow <jsnow@redhat.com>
> ---
>  include/qemu/job.h |  7 +++++++
>  job.c              | 19 +++++++++++++++++++
>  2 files changed, 26 insertions(+)

Currently all jobs do this; the question, of course, is why.  The answer
is that they are block jobs that need to do some graph manipulation
in the main thread, right?

OK, that's reasonable enough; that sounds like even non-block jobs may
need this (i.e. modify some global qemu state, which you can only do in
the main loop).  Interestingly, the create job only calls
job_completed(), which is nowhere documented as needing to be executed in
the main loop.

...on second thought, do we really want to execute job_completed() in the
main loop?  First of all, all of the transactional functions will run in
the main loop.  Which makes sense, but it isn't noted anywhere.
Secondly, we may end up calling JobDriver.user_resume(), which is
probably not something we want to call in the main loop.

OTOH, job_finish_sync() is something that has to be run in the main loop
because it polls the main loop (and as far as my FUSE experiments have
told me, polling a foreign AioContext doesn't work).

So...  I suppose it would be nice if we had a real distinction of which
functions are run in which AioContext.  It seems like we indeed want to
run job_completed() in the main loop, but what to do about the
user_resume() call in job_cancel_async()?

(And it should be noted for all of the transactional methods that they
are called in the main loop.)


OK, so that's that.  Now that I know what it's for, I'd like to ask for
a different name.  exit() means kill the process.  JobDriver.exit() will
not mean kill the job.  That's where I get a headache.

This function is for allowing the job to carry out global qemu state
modifications in the main loop.  Neither is that exiting in the sense
that the job is destroyed (as this is done only later, and the job gets
to take part in it through the transactional callbacks, and .free()),
nor is it exiting in the sense that the job needs to do all
pre-transactional clean-ups here (they are supposed to do that in .run()
-- *unless* something needs the main loop).

I'd like .main_loop_settle().  Or .main_loop_post_run().  I think it's
OK to have names that aren't as cool and terse as possible, when in
return they actually tell you what they're doing.  (Sure,
.main_loop_post_run() sounds really stupid, but it tells you exactly
when the function is called and what it's for.)

(Maybe the problem of all your naming woes really is just that you
always try to find a single word that describes what's going on :-) -- I
don't want to go into ProblemSolveFactoryObserverFactorySingleton
either, but it's OK to use an underscore once in a while.)

Max
Max Reitz Aug. 22, 2018, 11:52 a.m. UTC | #3
On 2018-08-22 13:43, Max Reitz wrote:

[...]

> I'd like .main_loop_settle().  Or .main_loop_post_run().  I think it's
> OK to have names that aren't as cool and terse as possible, when in
> return they actually tell you what they're doing.  (Sure,
> .main_loop_post_run() sounds really stupid, but it tells you exactly
> when the function is called and what it's for.)

Looking at the next patch, I realized that .main_loop_complete() or
.complete_in_main_loop() would work just as well.  (No, I don't see any
confusion with whether you need to call job_completed(), especially
since after this series you can probably make that function static to
job.c.)

Max
John Snow Aug. 22, 2018, 9:45 p.m. UTC | #4
On 08/22/2018 07:52 AM, Max Reitz wrote:
> On 2018-08-22 13:43, Max Reitz wrote:
> 
> [...]
> 
>> I'd like .main_loop_settle().  Or .main_loop_post_run().  I think it's
>> OK to have names that aren't as cool and terse as possible, when in
>> return they actually tell you what they're doing.  (Sure,
>> .main_loop_post_run() sounds really stupid, but it tells you exactly
>> when the function is called and what it's for.)
> 
> Looking at the next patch, I realized that .main_loop_complete() or
> .complete_in_main_loop() would work just as well.  (No, I don't see any
> confusion with whether you need to call job_completed(), especially
> since after this series you can probably make that function static to
> job.c.)
> 
> Max
> 

I'm sorry to announce that after part two of this series, the
callback will be erased completely, so the name is maybe less... important.
John Snow Aug. 22, 2018, 9:52 p.m. UTC | #5
On 08/22/2018 07:43 AM, Max Reitz wrote:
> On 2018-08-17 21:04, John Snow wrote:
>> All jobs do the same thing when they leave their running loop:
>> - Store the return code in a structure
>> - wait to receive this structure in the main thread
>> - signal job completion via job_completed
>>
>> Few jobs do anything beyond exactly this. Consolidate this exit
>> logic for a net reduction in SLOC.
>>
>> More seriously, when we utilize job_defer_to_main_loop_bh to call
>> a function that calls job_completed, job_finalize_single will run
>> in a context where it has recursively taken the aio_context lock,
>> which can cause hangs if it puts down a reference that causes a flush.
>>
>> You can observe this in practice by looking at mirror_exit's careful
>> placement of job_completed and bdrv_unref calls.
>>
>> If we centralize job exiting, we can signal job completion from outside
>> of the aio_context, which should allow for job cleanup code to run with
>> only one lock, which makes cleanup callbacks less tricky to write.
>>
>> Signed-off-by: John Snow <jsnow@redhat.com>
>> ---
>>  include/qemu/job.h |  7 +++++++
>>  job.c              | 19 +++++++++++++++++++
>>  2 files changed, 26 insertions(+)
> 
> Currently all jobs do this; the question, of course, is why.  The answer
> is that they are block jobs that need to do some graph manipulation
> in the main thread, right?
> 

Yep.

> OK, that's reasonable enough; that sounds like even non-block jobs may
> need this (i.e. modify some global qemu state, which you can only do in
> the main loop).  Interestingly, the create job only calls
> job_completed(), which is nowhere documented as needing to be executed in
> the main loop.
> 

Yeah, not all jobs will have anything meaningful to do in the main loop
context. This is one of them.

> ...on second thought, do we really want to execute job_completed() in the
> main loop?  First of all, all of the transactional functions will run in
> the main loop.  Which makes sense, but it isn't noted anywhere.
> Secondly, we may end up calling JobDriver.user_resume(), which is
> probably not something we want to call in the main loop.
> 

I think we need to execute job_completed in the main loop, or otherwise
restructure the code that can run between job_completed and
job_finalize_single so that .prepare/.commit/.abort/.clean run in the
main thread, which is something we want to preserve.

It's simpler just to say that complete will run from the main thread,
like it does presently.

Why would we not want to call user_resume from the main loop? That's
exactly where it's called from, since it gets invoked directly from the
qmp thread.

> OTOH, job_finish_sync() is something that has to be run in the main loop
> because it polls the main loop (and as far as my FUSE experiments have
> told me, polling a foreign AioContext doesn't work).
> 
> So...  I suppose it would be nice if we had a real distinction of which
> functions are run in which AioContext.  It seems like we indeed want to
> run job_completed() in the main loop, but what to do about the
> user_resume() call in job_cancel_async()?
> 

I don't think we need to do anything -- at least, these functions
*already* run from the main loop.

mirror_exit et al get scheduled from job_defer_to_main_loop and call
job_completed there, so it's already always done from the main loop; I'm
just cutting out the part where the jobs have to manually schedule this.

> (And it should be noted for all of the transactional methods that they
> are called in the main loop.)
>
Eric Blake Aug. 22, 2018, 10:01 p.m. UTC | #6
On 08/22/2018 06:43 AM, Max Reitz wrote:
> On 2018-08-17 21:04, John Snow wrote:
>> All jobs do the same thing when they leave their running loop:
>> - Store the return code in a structure
>> - wait to receive this structure in the main thread
>> - signal job completion via job_completed
>>
>> Few jobs do anything beyond exactly this. Consolidate this exit
>> logic for a net reduction in SLOC.
>>

> OK, so that's that.  Now that I know what it's for, I'd like to ask for
> a different name.  exit() means kill the process.  JobDriver.exit() will
> not mean kill the job.  That's where I get a headache.
> 
> This function is for allowing the job to carry out global qemu state
> modifications in the main loop.  Neither is that exiting in the sense
> that the job is destroyed (as this is done only later, and the job gets
> to take part in it through the transactional callbacks, and .free()),
> nor is it exiting in the sense that the job needs to do all
> pre-transactional clean-ups here (they are supposed to do that in .run()
> -- *unless* something needs the main loop).
> 
> I'd like .main_loop_settle().  Or .main_loop_post_run().  I think it's
> OK to have names that aren't as cool and terse as possible, when in
> return they actually tell you what they're doing.  (Sure,
> .main_loop_post_run() sounds really stupid, but it tells you exactly
> when the function is called and what it's for.)
> 
> (Maybe the problem of all your naming woes really is just that you
> always try to find a single word that describes what's going on :-) -- I
> don't want to go into ProblemSolveFactoryObserverFactorySingleton
> either, but it's OK to use an underscore once in a while.)

Does .wrapup or .conclude work any better than .exit for such a one-word 
name that goes away in the next series?  Actually, your suggestion of 
.settle seems reasonable to me (if we want something terser than
.main_loop_settle because the name is going away, but still want a name
that is not as weird as '.exit' when there are more steps still to follow).
John Snow Aug. 22, 2018, 10:04 p.m. UTC | #7
On 08/22/2018 06:01 PM, Eric Blake wrote:
> On 08/22/2018 06:43 AM, Max Reitz wrote:
>> On 2018-08-17 21:04, John Snow wrote:
>>> All jobs do the same thing when they leave their running loop:
>>> - Store the return code in a structure
>>> - wait to receive this structure in the main thread
>>> - signal job completion via job_completed
>>>
>>> Few jobs do anything beyond exactly this. Consolidate this exit
>>> logic for a net reduction in SLOC.
>>>
> 
>> OK, so that's that.  Now that I know what it's for, I'd like to ask for
>> a different name.  exit() means kill the process.  JobDriver.exit() will
>> not mean kill the job.  That's where I get a headache.
>>
>> This function is for allowing the job to carry out global qemu state
>> modifications in the main loop.  Neither is that exiting in the sense
>> that the job is destroyed (as this is done only later, and the job gets
>> to take part in it through the transactional callbacks, and .free()),
>> nor is it exiting in the sense that the job needs to do all
>> pre-transactional clean-ups here (they are supposed to do that in .run()
>> -- *unless* something needs the main loop).
>>
>> I'd like .main_loop_settle().  Or .main_loop_post_run().  I think it's
>> OK to have names that aren't as cool and terse as possible, when in
>> return they actually tell you what they're doing.  (Sure,
>> .main_loop_post_run() sounds really stupid, but it tells you exactly
>> when the function is called and what it's for.)
>>
>> (Maybe the problem of all your naming woes really is just that you
>> always try to find a single word that describes what's going on :-) -- I
>> don't want to go into ProblemSolveFactoryObserverFactorySingleton
>> either, but it's OK to use an underscore once in a while.)
> 
> Does .wrapup or .conclude work any better than .exit for such a one-word
> name that goes away in the next series?  Actually, your suggestion of
> .settle seems reasonable to me (if we want something terser than
> .main_loop_settle because the name is going away, but still want a name
> that is not as weird as '.exit' when there are more steps still to follow).
> 

This is running away from me :)

.exit() goes away after part two of the series, once I refactor all of
these completion functions into their .prepare/.abort/.commit/.clean
components.

It's just important that I do this series ***FIRST***, to avoid
deadlocks in the component callbacks.

--js
Max Reitz Aug. 25, 2018, 12:54 p.m. UTC | #8
On 2018-08-22 23:45, John Snow wrote:
> 
> 
> On 08/22/2018 07:52 AM, Max Reitz wrote:
>> On 2018-08-22 13:43, Max Reitz wrote:
>>
>> [...]
>>
>>> I'd like .main_loop_settle().  Or .main_loop_post_run().  I think it's
>>> OK to have names that aren't as cool and terse as possible, when in
>>> return they actually tell you what they're doing.  (Sure,
>>> .main_loop_post_run() sounds really stupid, but it tells you exactly
>>> when the function is called and what it's for.)
>>
>> Looking at the next patch, I realized that .main_loop_complete() or
>> .complete_in_main_loop() would work just as well.  (No, I don't see any
>> confusion with whether you need to call job_completed(), especially
>> since after this series you can probably make that function static to
>> job.c.)
>>
>> Max
>>
> 
> I'm sorry to announce that after part two of this series, the
> callback will be erased completely, so the name is maybe less... important.

That's what I get for not reading the cover letter.

Max
Max Reitz Aug. 25, 2018, 1:05 p.m. UTC | #9
On 2018-08-22 23:52, John Snow wrote:
> 
> 
> On 08/22/2018 07:43 AM, Max Reitz wrote:
>> On 2018-08-17 21:04, John Snow wrote:
>>> All jobs do the same thing when they leave their running loop:
>>> - Store the return code in a structure
>>> - wait to receive this structure in the main thread
>>> - signal job completion via job_completed
>>>
>>> Few jobs do anything beyond exactly this. Consolidate this exit
>>> logic for a net reduction in SLOC.
>>>
>>> More seriously, when we utilize job_defer_to_main_loop_bh to call
>>> a function that calls job_completed, job_finalize_single will run
>>> in a context where it has recursively taken the aio_context lock,
>>> which can cause hangs if it puts down a reference that causes a flush.
>>>
>>> You can observe this in practice by looking at mirror_exit's careful
>>> placement of job_completed and bdrv_unref calls.
>>>
>>> If we centralize job exiting, we can signal job completion from outside
>>> of the aio_context, which should allow for job cleanup code to run with
>>> only one lock, which makes cleanup callbacks less tricky to write.
>>>
>>> Signed-off-by: John Snow <jsnow@redhat.com>
>>> ---
>>>  include/qemu/job.h |  7 +++++++
>>>  job.c              | 19 +++++++++++++++++++
>>>  2 files changed, 26 insertions(+)
>>
>> Currently all jobs do this; the question, of course, is why.  The answer
>> is that they are block jobs that need to do some graph manipulation
>> in the main thread, right?
>>
> 
> Yep.
> 
>> OK, that's reasonable enough; that sounds like even non-block jobs may
>> need this (i.e. modify some global qemu state, which you can only do in
>> the main loop).  Interestingly, the create job only calls
>> job_completed(), which is nowhere documented as needing to be executed in
>> the main loop.
>>
> 
> Yeah, not all jobs will have anything meaningful to do in the main loop
> context. This is one of them.
> 
>> ...on second thought, do we really want to execute job_completed() in the
>> main loop?  First of all, all of the transactional functions will run in
>> the main loop.  Which makes sense, but it isn't noted anywhere.
>> Secondly, we may end up calling JobDriver.user_resume(), which is
>> probably not something we want to call in the main loop.
>>
> 
> I think we need to execute job_completed in the main loop, or otherwise
> restructure the code that can run between job_completed and
> job_finalize_single so that .prepare/.commit/.abort/.clean run in the
> main thread, which is something we want to preserve.

Sure.

> It's simpler just to say that complete will run from the main thread,
> like it does presently.

Yes, but we don't say that.

> Why would we not want to call user_resume from the main loop? That's
> exactly where it's called from, since it gets invoked directly from the
> qmp thread.

Hmm!  True indeed.

The reason we might not want to do it is that the job may not run
in the main loop, so modifying the job (especially invoking a job
method) may be dangerous without taking precautions.

>> OTOH, job_finish_sync() is something that has to be run in the main loop
>> because it polls the main loop (and as far as my FUSE experiments have
>> told me, polling a foreign AioContext doesn't work).
>>
>> So...  I suppose it would be nice if we had a real distinction of which
>> functions are run in which AioContext.  It seems like we indeed want to
>> run job_completed() in the main loop, but what to do about the
>> user_resume() call in job_cancel_async()?
>>
> 
> I don't think we need to do anything -- at least, these functions
> *already* run from the main loop.

Yeah, but we don't mark that anywhere.  I really don't like that.  Jobs
need to know which of their functions are run in which AioContext.

> mirror_exit et al get scheduled from job_defer_to_main_loop and call
> job_completed there, so it's already always done from the main loop; I'm
> just cutting out the part where the jobs have to manually schedule this.

I'm not saying what you're doing is wrong; I'm just saying that tracking
which things are running in which context is not easy because there are
no comments on how it's supposed to be run.  (Apart from your new
.exit() method, which does say that it's run in the main loop.)

No, I don't find it obvious which functions are run in which context
when I first have to think about the context in which those functions
are used (e.g. user_resume is usually the result of a QMP command, so it's
run in the main loop; the transactional methods are part of completion,
which is done in the main loop, so they are also called in the main
loop; and so on).

But that's not part of this series.  It just occurred to me when
tracking down which function belongs to which context when reviewing
this patch.

Max
John Snow Aug. 27, 2018, 3:54 p.m. UTC | #10
On 08/25/2018 09:05 AM, Max Reitz wrote:
> On 2018-08-22 23:52, John Snow wrote:
>>
>>
>> On 08/22/2018 07:43 AM, Max Reitz wrote:
>>> On 2018-08-17 21:04, John Snow wrote:
>>>> All jobs do the same thing when they leave their running loop:
>>>> - Store the return code in a structure
>>>> - wait to receive this structure in the main thread
>>>> - signal job completion via job_completed
>>>>
>>>> Few jobs do anything beyond exactly this. Consolidate this exit
>>>> logic for a net reduction in SLOC.
>>>>
>>>> More seriously, when we utilize job_defer_to_main_loop_bh to call
>>>> a function that calls job_completed, job_finalize_single will run
>>>> in a context where it has recursively taken the aio_context lock,
>>>> which can cause hangs if it puts down a reference that causes a flush.
>>>>
>>>> You can observe this in practice by looking at mirror_exit's careful
>>>> placement of job_completed and bdrv_unref calls.
>>>>
>>>> If we centralize job exiting, we can signal job completion from outside
>>>> of the aio_context, which should allow for job cleanup code to run with
>>>> only one lock, which makes cleanup callbacks less tricky to write.
>>>>
>>>> Signed-off-by: John Snow <jsnow@redhat.com>
>>>> ---
>>>>  include/qemu/job.h |  7 +++++++
>>>>  job.c              | 19 +++++++++++++++++++
>>>>  2 files changed, 26 insertions(+)
>>>
>>> Currently all jobs do this; the question, of course, is why.  The answer
>>> is that they are block jobs that need to do some graph manipulation
>>> in the main thread, right?
>>>
>>
>> Yep.
>>
>>> OK, that's reasonable enough; that sounds like even non-block jobs may
>>> need this (i.e. modify some global qemu state, which you can only do in
>>> the main loop).  Interestingly, the create job only calls
>>> job_completed(), which is nowhere documented as needing to be executed in
>>> the main loop.
>>>
>>
>> Yeah, not all jobs will have anything meaningful to do in the main loop
>> context. This is one of them.
>>
>>> ...on second thought, do we really want to execute job_completed() in the
>>> main loop?  First of all, all of the transactional functions will run in
>>> the main loop.  Which makes sense, but it isn't noted anywhere.
>>> Secondly, we may end up calling JobDriver.user_resume(), which is
>>> probably not something we want to call in the main loop.
>>>
>>
>> I think we need to execute job_completed in the main loop, or otherwise
>> restructure the code that can run between job_completed and
>> job_finalize_single so that .prepare/.commit/.abort/.clean run in the
>> main thread, which is something we want to preserve.
> 
> Sure.
> 
>> It's simpler just to say that complete will run from the main thread,
>> like it does presently.
> 
> Yes, but we don't say that.
> 
>> Why would we not want to call user_resume from the main loop? That's
>> exactly where it's called from, since it gets invoked directly from the
>> qmp thread.
> 
> Hmm!  True indeed.
> 
> The reason we might not want to do it is that the job may not run
> in the main loop, so modifying the job (especially invoking a job
> method) may be dangerous without taking precautions.
> 
>>> OTOH, job_finish_sync() is something that has to be run in the main loop
>>> because it polls the main loop (and as far as my FUSE experiments have
>>> told me, polling a foreign AioContext doesn't work).
>>>
>>> So...  I suppose it would be nice if we had a real distinction of which
>>> functions are run in which AioContext.  It seems like we indeed want to
>>> run job_completed() in the main loop, but what to do about the
>>> user_resume() call in job_cancel_async()?
>>>
>>
>> I don't think we need to do anything -- at least, these functions
>> *already* run from the main loop.
> 
> Yeah, but we don't mark that anywhere.  I really don't like that.  Jobs
> need to know which of their functions are run in which AioContext.
> 
>> mirror_exit et al get scheduled from job_defer_to_main_loop and call
>> job_completed there, so it's already always done from the main loop; I'm
>> just cutting out the part where the jobs have to manually schedule this.
> 
> I'm not saying what you're doing is wrong; I'm just saying that tracking
> which things are running in which context is not easy because there are
> no comments on how it's supposed to be run.  (Apart from your new
> .exit() method, which does say that it's run in the main loop.)
> 
> No, I don't find it obvious which functions are run in which context
> when I first have to think about the context in which those functions
> are used (e.g. user_resume is usually the result of a QMP command, so it's
> run in the main loop; the transactional methods are part of completion,
> which is done in the main loop, so they are also called in the main
> loop; and so on).
> 
> But that's not part of this series.  It just occurred to me when
> tracking down which function belongs to which context when reviewing
> this patch.
> 
> Max
> 

Oh, I see. I can mark up the functions I/we expect to run in the main
thread with comments above the function implementation. Would that help?

Probably a top-level document would also help... We're overdue for
one after all the recent changes.
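
A sketch of what such markup might look like (the wording here is
hypothetical, not text from the tree):

/* Called in the main loop, usually in response to a QMP command. */
void (*user_resume)(Job *job);

/* The transactional callbacks (.prepare/.commit/.abort/.clean) are
 * all invoked in the main loop, as part of job completion. */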
Max Reitz Aug. 29, 2018, 8:16 a.m. UTC | #11
On 2018-08-27 17:54, John Snow wrote:
> 
> 
> On 08/25/2018 09:05 AM, Max Reitz wrote:
>> On 2018-08-22 23:52, John Snow wrote:
>>>
>>>
>>> On 08/22/2018 07:43 AM, Max Reitz wrote:
>>>> On 2018-08-17 21:04, John Snow wrote:
>>>>> All jobs do the same thing when they leave their running loop:
>>>>> - Store the return code in a structure
>>>>> - wait to receive this structure in the main thread
>>>>> - signal job completion via job_completed
>>>>>
>>>>> Few jobs do anything beyond exactly this. Consolidate this exit
>>>>> logic for a net reduction in SLOC.
>>>>>
>>>>> More seriously, when we utilize job_defer_to_main_loop_bh to call
>>>>> a function that calls job_completed, job_finalize_single will run
>>>>> in a context where it has recursively taken the aio_context lock,
>>>>> which can cause hangs if it puts down a reference that causes a flush.
>>>>>
>>>>> You can observe this in practice by looking at mirror_exit's careful
>>>>> placement of job_completed and bdrv_unref calls.
>>>>>
>>>>> If we centralize job exiting, we can signal job completion from outside
>>>>> of the aio_context, which should allow for job cleanup code to run with
>>>>> only one lock, which makes cleanup callbacks less tricky to write.
>>>>>
>>>>> Signed-off-by: John Snow <jsnow@redhat.com>
>>>>> ---
>>>>>  include/qemu/job.h |  7 +++++++
>>>>>  job.c              | 19 +++++++++++++++++++
>>>>>  2 files changed, 26 insertions(+)
>>>>
>>>> Currently all jobs do this; the question, of course, is why.  The answer
>>>> is that they are block jobs that need to do some graph manipulation
>>>> in the main thread, right?
>>>>
>>>
>>> Yep.
>>>
>>>> OK, that's reasonable enough; that sounds like even non-block jobs may
>>>> need this (i.e. modify some global qemu state, which you can only do in
>>>> the main loop).  Interestingly, the create job only calls
>>>> job_completed(), which is nowhere documented as needing to be executed in
>>>> the main loop.
>>>>
>>>
>>> Yeah, not all jobs will have anything meaningful to do in the main loop
>>> context. This is one of them.
>>>
>>>> ...on second thought, do we really want to execute job_completed() in the
>>>> main loop?  First of all, all of the transactional functions will run in
>>>> the main loop.  Which makes sense, but it isn't noted anywhere.
>>>> Secondly, we may end up calling JobDriver.user_resume(), which is
>>>> probably not something we want to call in the main loop.
>>>>
>>>
>>> I think we need to execute job_completed in the main loop, or otherwise
>>> restructure the code that can run between job_completed and
>>> job_finalize_single so that .prepare/.commit/.abort/.clean run in the
>>> main thread, which is something we want to preserve.
>>
>> Sure.
>>
>>> It's simpler just to say that complete will run from the main thread,
>>> like it does presently.
>>
>> Yes, but we don't say that.
>>
>>> Why would we not want to call user_resume from the main loop? That's
>>> exactly where it's called from, since it gets invoked directly from the
>>> qmp thread.
>>
>> Hmm!  True indeed.
>>
>> The reason we might not want to do it is that the job may not run
>> in the main loop, so modifying the job (especially invoking a job
>> method) may be dangerous without taking precautions.
>>
>>>> OTOH, job_finish_sync() is something that has to be run in the main loop
>>>> because it polls the main loop (and as far as my FUSE experiments have
>>>> told me, polling a foreign AioContext doesn't work).
>>>>
>>>> So...  I suppose it would be nice if we had a real distinction of which
>>>> functions are run in which AioContext.  It seems like we indeed want to
>>>> run job_completed() in the main loop, but what to do about the
>>>> user_resume() call in job_cancel_async()?
>>>>
>>>
>>> I don't think we need to do anything -- at least, these functions
>>> *already* run from the main loop.
>>
>> Yeah, but we don't mark that anywhere.  I really don't like that.  Jobs
>> need to know which of their functions are run in which AioContext.
>>
>>> mirror_exit et al get scheduled from job_defer_to_main_loop and call
>>> job_completed there, so it's already always done from the main loop; I'm
>>> just cutting out the part where the jobs have to manually schedule this.
>>
>> I'm not saying what you're doing is wrong; I'm just saying that tracking
>> which things are running in which context is not easy because there are
>> no comments on how it's supposed to be run.  (Apart from your new
>> .exit() method, which does say that it's run in the main loop.)
>>
>> No, I don't find it obvious which functions are run in which context
>> when I first have to think about the context in which those functions
>> are used (e.g. user_resume is usually the result of a QMP command, so it's
>> run in the main loop; the transactional methods are part of completion,
>> which is done in the main loop, so they are also called in the main
>> loop; and so on).
>>
>> But that's not part of this series.  It just occurred to me when
>> tracking down which function belongs to which context when reviewing
>> this patch.
>>
>> Max
>>
> 
> Oh, I see. I can mark up the functions I/we expect to run in the main
> thread with comments above the function implementation. Would that help?

Sure, that's exactly what I mean. :-)

> Probably a top-level document would also help... We're overdue for
> one after all the recent changes.

If you have the time, sure.

Max

Patch

diff --git a/include/qemu/job.h b/include/qemu/job.h
index 5c92c53ef0..6f5b13751a 100644
--- a/include/qemu/job.h
+++ b/include/qemu/job.h
@@ -204,6 +204,13 @@  struct JobDriver {
      */
     void (*drain)(Job *job);
 
+    /**
+     * If the callback is not NULL, exit will be invoked from the main thread
+     * when the job's coroutine has finished, but before transactional
+     * convergence; before @prepare or @abort.
+     */
+    void (*exit)(Job *job);
+
     /**
      * If the callback is not NULL, prepare will be invoked when all the jobs
      * belonging to the same transaction complete; or upon this job's completion
diff --git a/job.c b/job.c
index c9de1af556..fae8e6047c 100644
--- a/job.c
+++ b/job.c
@@ -535,6 +535,19 @@  void job_drain(Job *job)
     }
 }
 
+static void job_exit(void *opaque)
+{
+    Job *job = (Job *)opaque;
+    AioContext *aio_context = job->aio_context;
+
+    if (job->driver->exit) {
+        aio_context_acquire(aio_context);
+        job->driver->exit(job);
+        aio_context_release(aio_context);
+    }
+    job_completed(job, job->ret);
+}
+
 /**
  * All jobs must allow a pause point before entering their job proper. This
  * ensures that jobs can be paused prior to being started, then resumed later.
@@ -546,6 +559,12 @@  static void coroutine_fn job_co_entry(void *opaque)
     assert(job && job->driver && job->driver->run);
     job_pause_point(job);
     job->ret = job->driver->run(job, &job->err);
+    if (!job->deferred_to_main_loop) {
+        job->deferred_to_main_loop = true;
+        aio_bh_schedule_oneshot(qemu_get_aio_context(),
+                                job_exit,
+                                job);
+    }
 }
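
With the shim in place, a driver-side user might look like the
following sketch; the example_* names are hypothetical, and the point
is that the driver no longer schedules its own BH or calls
job_completed() itself:

static void example_exit(Job *job)
{
    /* Invoked by job_exit() from the main loop, with the job's
     * AioContext acquired exactly once -- a safe place for graph
     * manipulation and bdrv_unref() calls. */
}

static int coroutine_fn example_run(Job *job, Error **errp)
{
    /* ... do the actual work ... */
    return 0;   /* stored in job->ret and passed to job_completed() */
}

static const JobDriver example_job_driver = {
    .instance_size = sizeof(Job),
    .run           = example_run,
    .exit          = example_exit,
};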