[v2,04/11] blockjobs: Always use block_job_get_aio_context

Message ID	1475272849-19990-5-git-send-email-jsnow@redhat.com
State	New
Headers	show Return-Path: <qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org> From: John Snow <jsnow@redhat.com> To: qemu-block@nongnu.org Date: Fri, 30 Sep 2016 18:00:42 -0400 Message-Id: <1475272849-19990-5-git-send-email-jsnow@redhat.com> In-Reply-To: <1475272849-19990-1-git-send-email-jsnow@redhat.com> References: <1475272849-19990-1-git-send-email-jsnow@redhat.com> Subject: [Qemu-devel] [PATCH v2 04/11] blockjobs: Always use block_job_get_aio_context Precedence: list Cc: kwolf@redhat.com, vsementsov@virtuozzo.com, famz@redhat.com, John Snow <jsnow@redhat.com>, jcody@redhat.com, qemu-devel@nongnu.org, stefanha@redhat.com Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: "Qemu-devel" <qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>

John Snow Sept. 30, 2016, 10 p.m. UTC

There are a few places where we're fishing it out for ourselves.
Let's not do that and instead use the helper.

Signed-off-by: John Snow <jsnow@redhat.com>
---
 block/io.c               | 4 ++--
 blockdev.c               | 4 ++--
 blockjob.c               | 2 +-
 include/block/blockjob.h | 9 +++++++++
 qemu-img.c               | 2 +-
 5 files changed, 15 insertions(+), 6 deletions(-)

Kevin Wolf Oct. 5, 2016, 2:02 p.m. UTC | #1

Am 01.10.2016 um 00:00 hat John Snow geschrieben:
> There are a few places where we're fishing it out for ourselves.
> Let's not do that and instead use the helper.
> 
> Signed-off-by: John Snow <jsnow@redhat.com>

That change makes a difference when the block job is running its
completion part after block_job_defer_to_main_loop(). The commit message
could be more explicit about whether this is intentional or whether this
case is expected to happen at all.

I suspect that if it can happen, this is a bug fix. Please check and
update the commit message accordingly.

Kevin

John Snow Oct. 6, 2016, 8:22 p.m. UTC | #2

On 10/05/2016 10:02 AM, Kevin Wolf wrote:
> Am 01.10.2016 um 00:00 hat John Snow geschrieben:
>> There are a few places where we're fishing it out for ourselves.
>> Let's not do that and instead use the helper.
>>
>> Signed-off-by: John Snow <jsnow@redhat.com>
>
> That change makes a difference when the block job is running its
> completion part after block_job_defer_to_main_loop(). The commit message
> could be more explicit about whether this is intentional or whether this
> case is expected to happen at all.
>
> I suspect that if it can happen, this is a bug fix. Please check and
> update the commit message accordingly.
>

Because I'm bad with being concise, I wrote a TLDR at the bottom. 
Otherwise, enjoy this wall of text.

> Kevin
>

Intentional under the premise of:

(1) Acquiring the context for which a job is not actually running under 
is likely incorrect (or at the very least misleading), and

(2) If using the main thread context for any would-be callers is 
incorrect, this is a problem with the job lifetime that needs to be 
corrected anyway.

In general, if we are acquiring the context to secure exclusive access 
to the BlockJob state itself, using the getter here is perfectly safe. 
If we are acquiring context for other reasons, we need to consider more 
carefully.

The callers are:

(A) bdrv_drain_all (block/io)

	Obtains context for the sake of pause/resume. Pauses all jobs before 
draining all BDSes. For starters, pausing a job that has deferred to 
main has no effect (and neither does resuming). This usage appears 
slightly erroneous, though, in that if we are not running from the main 
thread, we are definitely not securing exclusive rights to the block 
job. We could, in theory, race on reads/writes to the pause count field. 
This would be a bugfix.

(B) find_block_job (all monitor context)

	Acquires context as a courtesy for its callers:
	- qmp_block_job_set_speed
	- qmp_block_job_cancel
	- qmp_block_job_pause
	- qmp_block_job_resume
	- qmp_block_job_complete

In an "already deferred to main" sense...  in general, if the job has 
already deferred to main we don't need to acquire the block's context to 
get safe access to the job, because we're already running in the main 
context. Further, none of these functions actually have any meaning for 
a job in such a state.

	- set_speed: Sets speed parameters, harmless either way.
	- cancel: Will set the cancelled boolean, reset iostatus, then attempt 
to enter the job. Since a job that enters the main loop remains busy, 
the enter is a NOP. The BlockBackend AIO context here is therefore 
extraneous, and the getter is safe.
	- pause: Only increments a counter, and will have no effect.
	- resume: Decrements a counter. Attempts to enter(), but as stated 
above this is a NOP.
	- complete: Calls .complete(), for which the only implementation is 
mirror_complete. Uh, this actually seems messy. Looks like there's 
nothing to prevent us from calling this after we've already told it to 
complete once. This could be a legitimate bug that this patch does 
nothing in particular to address. If complete() is shored up such that 
it can be called precisely once, this becomes safe.

(C) qmp_query_block_jobs (monitor context)

	Just a getter. Using get_context is safe in either state.

(D) run_block_job (qemu-img)

	Never called when the context is in the main loop anyway. Effectively 
no change here.

So, with the exception of .complete, I think this is a safe change as it 
stands... However... Paolo wants to complicate my life and get rid of 
this getter for his own fiendish purposes. He suggests pushing down 
context acquisition into blockjob.c directly for any QMP callers:

- qmp_block_job_set_speed -> block_job_set_speed
- qmp_block_job_cancel -> block_job_cancel
- qmp_block_job_pause -> block_job_user_pause
- qmp_block_job_resume -> block_job_user_resume
- qmp_block_job_complete -> block_job_complete
- qmp_query_block_jobs -> block_job_query

Most of these have only one caller in the QMP layer:

block_job_set_speed
block_job_user_pause
block_job_user_resume
block_job_query

These can easily just take the context they need, removing external uses 
of job->blk for purposes of acquiring the context.

block_job_cancel and block_job_complete are different.

block_job_cancel is called in many places, but we can just add a similar 
block_job_user_cancel if we wanted a version which takes care to acquire 
context and one that does not. (Or we could just acquire the context 
regardless, but Paolo warned me ominously that recursive locks are EVIL. 
He sounded serious.)

block_job_complete has no direct callers outside of QMP, but it is also 
used as a callback by block_job_complete_sync, used in qemu-img for 
run_block_job. I can probably rewrite qemu_img to avoid this usage.

TLDR:
- This change should be perfectly safe, but Paolo wants to get rid of 
this usage anyway.
- At least 5/6 uses of external context grabbing can be internalized easily.
- qemu-img's run_block_job needs to be refactored a bit, though I don't 
have an idea for that yet, but as you pointed out it needs to be done 
for the public/private split anyway.
- block_job_complete needs to be touched up no matter what we do.
- The aio_context getter can probably be removed entirely as per Paolo's 
wishes, but I'll have to change bdrv_drain_all a bit. A 
block_job_pause_all and block_job_resume all would work, though that's a 
bit special purpose. I could craft up a block_job_apply_all for the 
purpose instead. (e.g. block_job_apply_all(block_job_pause))

I think that answers everyone's questions...
--js

Paolo Bonzini Oct. 7, 2016, 7:49 a.m. UTC | #3

On 06/10/2016 22:22, John Snow wrote:
> Calls .complete(), for which the only implementation is
> mirror_complete. Uh, this actually seems messy. Looks like there's
> nothing to prevent us from calling this after we've already told it to
> complete once.

Yeah, it should have an

    if (s->should_complete) {
        return;
    }

at the beginning.  I have other mirror.c patches in my queue so I can
take care of that.

> block_job_cancel and block_job_complete are different.
> 
> block_job_cancel is called in many places, but we can just add a similar
> block_job_user_cancel if we wanted a version which takes care to acquire
> context and one that does not. (Or we could just acquire the context
> regardless, but Paolo warned me ominously that recursive locks are EVIL.
> He sounded serious.)

Not that many places:

- block_job_finish_sync calls it, and you can just release/reacquire
around the call to "finish(job, &local_err)".

- there are two callers in blockdev.c, and you can just remove the
acquire/release from blockdev.c if you push it in block_job_cancel.

As to block_job_cancel_sync:

- replication_stop is not acquiring s->secondary_disk->bs's AioContext.

- there is no need to hold the AioContext between ->prepare and ->clean.
 My suggestion is to ref the blockjob after storing it in state->job
(you probably should do that anyway) and unref'ing it in ->clean.  Then
you can call again let block_job_cancel_sync(bs->job) take the
AioContext, which it will do in block_job_finish_sync.

> block_job_complete has no direct callers outside of QMP, but it is also
> used as a callback by block_job_complete_sync, used in qemu-img for
> run_block_job. I can probably rewrite qemu_img to avoid this usage.

No need to: qemu-img is not acquiring the AioContext, so it's okay to
let block_job_complete do that (like block_job_cancel,
block_job_complete will be called by block_job_finish_sync without the
AioContext acquired).

Paolo

John Snow Oct. 13, 2016, 12:49 a.m. UTC | #4

As context to everyone else as to why I'm going down the rabbit hole of 
trying to remove external references to AioContext at all, see 
https://lists.gnu.org/archive/html/qemu-devel/2016-10/msg00795.html

On 10/07/2016 03:49 AM, Paolo Bonzini wrote:
>
>
> On 06/10/2016 22:22, John Snow wrote:
>> Calls .complete(), for which the only implementation is
>> mirror_complete. Uh, this actually seems messy. Looks like there's
>> nothing to prevent us from calling this after we've already told it to
>> complete once.
>
> Yeah, it should have an
>
>     if (s->should_complete) {
>         return;
>     }
>
> at the beginning.  I have other mirror.c patches in my queue so I can
> take care of that.
>

Or something up the stack at block_job_complete so it's not up to job 
implementations? What if the next implementer "forgets."

>> block_job_cancel and block_job_complete are different.
>>
>> block_job_cancel is called in many places, but we can just add a similar
>> block_job_user_cancel if we wanted a version which takes care to acquire
>> context and one that does not. (Or we could just acquire the context
>> regardless, but Paolo warned me ominously that recursive locks are EVIL.
>> He sounded serious.)
>
> Not that many places:
>
> - block_job_finish_sync calls it, and you can just release/reacquire
> around the call to "finish(job, &local_err)".
>

This makes me a little nervous because we went through the trouble of 
creating this callback, but we're going to assume we know that it's a 
public interface that will take the lock for itself (or otherwise does 
not require a lock.)

In practice it works, but it seems needlessly mystifying in terms of 
proving correctness.

> - there are two callers in blockdev.c, and you can just remove the
> acquire/release from blockdev.c if you push it in block_job_cancel.
>

Makes sense; I don't like the association of (bs :: job) here anyway. 
Again we're grabbing context for a job where that job may not even be 
running.

> As to block_job_cancel_sync:
>

Which I didn't audit, because no callers use job->blk to get the 
AioContext before calling this; they use bs if bs->job is present.

> - replication_stop is not acquiring s->secondary_disk->bs's AioContext.
>

Seems like a bug on their part. Would be fixed by having cancel acquire 
context for itself.

> - there is no need to hold the AioContext between ->prepare and ->clean.
>  My suggestion is to ref the blockjob after storing it in state->job
> (you probably should do that anyway) and unref'ing it in ->clean.  Then
> you can call again let block_job_cancel_sync(bs->job) take the
> AioContext, which it will do in block_job_finish_sync.

Yeah, I should be reffing it anyway.

The rest of this... What I think you mean is acquiring and releasing the 
context as needed for EACH of prepare, commit, abort, and clean as 
necessary, right?

And then in this case, it simply wouldn't be necessary for abort, as the 
sync cancel would do it for us.

>
>> block_job_complete has no direct callers outside of QMP, but it is also
>> used as a callback by block_job_complete_sync, used in qemu-img for
>> run_block_job. I can probably rewrite qemu_img to avoid this usage.
>
> No need to: qemu-img is not acquiring the AioContext, so it's okay to
> let block_job_complete do that (like block_job_cancel,
> block_job_complete will be called by block_job_finish_sync without the
> AioContext acquired).
>

Eh? Oh, you're right, it just gets it for the sake of aio_poll.

> Paolo
>

Alright.

Say I *do* push the acquisitions down into blockjob.c. What benefit does 
that provide? Won't I still need the block_job_get_aio_context() 
function (At least internally) to know which context to acquire? This 
would preclude you from deleting it.

Plus... we remove some fairly simple locking mechanisms and then inflate 
it tenfold. I'm not convinced this is an improvement.

As context and a refresher (for me when I re-read this email in 12 
hours,) there are three places externally that are using an AioContext 
lock as acquired from *within* a BlockJob, excluding those that acquire 
a context separately from a Job and use that to reason that accesses to 
the job are safe (For example, blockdev_mark_auto_del.)

(1) QMP interface for job management
(2) bdrv_drain_all, in block/io.c

(1) AFAICT, the QMP interface is concerned with assuring itself it has 
unique access to the BlockJob structure itself, and it doesn't really 
authentically care about the AIOContext itself -- just race-free access 
to the Job.

This is not necessarily buggy today because, even though we grab the 
BlockBackend's context unconditionally, we already know the main/monitor 
thread is not accessing the blockjob. It's still silly, though.

(2) bdrv_drain_all appears to be worried about the same thing; we just 
need to safely deliver pause/resume messages.

I'm less sure about where this can run from, and suspect that if the job 
has deferred to main that this could be buggy. If bdrv_drain_all is 
called from context A and the job is running on context M having 
deferred from B, we may lock against context B (from A) and have racey 
access from between A/M. (Maybe?)

It looks like all of these usages don't actually really care what 
context they're getting, just that they're getting safe access to the 
job itself. If you want to decouple jobs from their BlockBackend's AIO 
Context, perhaps we just need to replace these by a simple mutex...?

As it stands, though, pushing AIOContext acquisitions down into 
blockjob.c isn't actually going to fix anything, is it? Why not just 
leave it be for right now?

Would you be terribly offended if I left this patch as-is for now and we 
can work on removing the AioContext locks afterwards, or are you adamant 
that I cooperate in getting block_job_get_aio_context removed before I 
add more usages?

Sorry I'm being so obtuse about this all.

--js

Paolo Bonzini Oct. 13, 2016, 9:03 a.m. UTC | #5

On 13/10/2016 02:49, John Snow wrote:
> As context to everyone else as to why I'm going down the rabbit hole of
> trying to remove external references to AioContext at all, see
> https://lists.gnu.org/archive/html/qemu-devel/2016-10/msg00795.html
> 
> On 10/07/2016 03:49 AM, Paolo Bonzini wrote:
>>
>>
>> On 06/10/2016 22:22, John Snow wrote:
>>> Calls .complete(), for which the only implementation is
>>> mirror_complete. Uh, this actually seems messy. Looks like there's
>>> nothing to prevent us from calling this after we've already told it to
>>> complete once.
>>
>> Yeah, it should have an
>>
>>     if (s->should_complete) {
>>         return;
>>     }
>>
>> at the beginning.  I have other mirror.c patches in my queue so I can
>> take care of that.
>>
> 
> Or something up the stack at block_job_complete so it's not up to job
> implementations? What if the next implementer "forgets."

Yeah, that would require moving s->should_complete up to BlockJob.

>>> block_job_cancel and block_job_complete are different.
>>>
>>> block_job_cancel is called in many places, but we can just add a similar
>>> block_job_user_cancel if we wanted a version which takes care to acquire
>>> context and one that does not. (Or we could just acquire the context
>>> regardless, but Paolo warned me ominously that recursive locks are EVIL.
>>> He sounded serious.)
>>
>> Not that many places:
>>
>> - block_job_finish_sync calls it, and you can just release/reacquire
>> around the call to "finish(job, &local_err)".
> 
> This makes me a little nervous because we went through the trouble of
> creating this callback, but we're going to assume we know that it's a
> public interface that will take the lock for itself (or otherwise does
> not require a lock.)
> 
> In practice it works, but it seems needlessly mystifying in terms of
> proving correctness.

It's _much_ easier to assume that all callbacks take the lock
themselves.  It's counterintuitive (just like the idea that recursive
locks are bad :)), but the point is that if you second guess the
callbacks and invoke them you might get the locking order wrong.  This
ends up with ugly AB-BA deadlocks.

>> - replication_stop is not acquiring s->secondary_disk->bs's AioContext.
> 
> Seems like a bug on their part. Would be fixed by having cancel acquire
> context for itself.

Yep.

> Yeah, I should be reffing it anyway.
> 
> The rest of this... What I think you mean is acquiring and releasing the
> context as needed for EACH of prepare, commit, abort, and clean as
> necessary, right?
> 
> And then in this case, it simply wouldn't be necessary for abort, as the
> sync cancel would do it for us.

Right.

> Alright.
> 
> Say I *do* push the acquisitions down into blockjob.c. What benefit does
> that provide? Won't I still need the block_job_get_aio_context()
> function (At least internally) to know which context to acquire? This
> would preclude you from deleting it.
> 
> Plus... we remove some fairly simple locking mechanisms and then inflate
> it tenfold. I'm not convinced this is an improvement.

The improvement is that you can now replace the huge lock with a smaller
one (you don't have to do it now, but you could).  Furthermore, the
small lock is a) non-recursive b) a leaf lock.  This means that you've
removed a lot of opportunities for deadlock, and generally made things
easier to reason about.

Furthermore, with large locks you never know what they actually protect;
cue the confusion between aio_context_acquire/release and
bdrv_drained_begin/end.  And it's easier to forget them if you force the
caller to do it; and we have an example with replication.

Again, it can be counterintuitive that it's better, but it is. :)

> (1) QMP interface for job management
> (2) bdrv_drain_all, in block/io.c
> 
> (1) AFAICT, the QMP interface is concerned with assuring itself it has
> unique access to the BlockJob structure itself, and it doesn't really
> authentically care about the AIOContext itself -- just race-free access
> to the Job.

Yes.  The protection you need here is mostly against concurrent
completion of the job (and concurrent manipulation of fields such as
job->busy, job->paused, etc.

> This is not necessarily buggy today because, even though we grab the
> BlockBackend's context unconditionally, we already know the main/monitor
> thread is not accessing the blockjob. It's still silly, though.
> 
> (2) bdrv_drain_all appears to be worried about the same thing; we just
> need to safely deliver pause/resume messages.
> 
> I'm less sure about where this can run from, and suspect that if the job
> has deferred to main that this could be buggy. If bdrv_drain_all is
> called from context A and the job is running on context M having
> deferred from B, we may lock against context B (from A) and have racey
> access from between A/M. (Maybe?)

bdrv_drain_all is BQL-only (because it acquires more than one AioContext).

> Would you be terribly offended if I left this patch as-is for now and we
> can work on removing the AioContext locks afterwards, or are you adamant
> that I cooperate in getting block_job_get_aio_context removed before I
> add more usages?

Removing block_job_get_aio_context is necessary to fix dataplane bugs,
so I'd really prefer to have that in 2.8.  But why do you actually need
this patch at all?  You can do neither thing---not push the lock down
and not add more usages---which is always the best. :)

Paolo

[v2,04/11] blockjobs: Always use block_job_get_aio_context

Commit Message

Comments

Patch