Message ID | 1475272849-19990-5-git-send-email-jsnow@redhat.com |
---|---|
State | New |
Headers | show |
Am 01.10.2016 um 00:00 hat John Snow geschrieben: > There are a few places where we're fishing it out for ourselves. > Let's not do that and instead use the helper. > > Signed-off-by: John Snow <jsnow@redhat.com> That change makes a difference when the block job is running its completion part after block_job_defer_to_main_loop(). The commit message could be more explicit about whether this is intentional or whether this case is expected to happen at all. I suspect that if it can happen, this is a bug fix. Please check and update the commit message accordingly. Kevin
On 10/05/2016 10:02 AM, Kevin Wolf wrote: > Am 01.10.2016 um 00:00 hat John Snow geschrieben: >> There are a few places where we're fishing it out for ourselves. >> Let's not do that and instead use the helper. >> >> Signed-off-by: John Snow <jsnow@redhat.com> > > That change makes a difference when the block job is running its > completion part after block_job_defer_to_main_loop(). The commit message > could be more explicit about whether this is intentional or whether this > case is expected to happen at all. > > I suspect that if it can happen, this is a bug fix. Please check and > update the commit message accordingly. > Because I'm bad with being concise, I wrote a TLDR at the bottom. Otherwise, enjoy this wall of text. > Kevin > Intentional under the premise of: (1) Acquiring the context for which a job is not actually running under is likely incorrect (or at the very least misleading), and (2) If using the main thread context for any would-be callers is incorrect, this is a problem with the job lifetime that needs to be corrected anyway. In general, if we are acquiring the context to secure exclusive access to the BlockJob state itself, using the getter here is perfectly safe. If we are acquiring context for other reasons, we need to consider more carefully. The callers are: (A) bdrv_drain_all (block/io) Obtains context for the sake of pause/resume. Pauses all jobs before draining all BDSes. For starters, pausing a job that has deferred to main has no effect (and neither does resuming). This usage appears slightly erroneous, though, in that if we are not running from the main thread, we are definitely not securing exclusive rights to the block job. We could, in theory, race on reads/writes to the pause count field. This would be a bugfix. (B) find_block_job (all monitor context) Acquires context as a courtesy for its callers: - qmp_block_job_set_speed - qmp_block_job_cancel - qmp_block_job_pause - qmp_block_job_resume - qmp_block_job_complete In an "already deferred to main" sense... in general, if the job has already deferred to main we don't need to acquire the block's context to get safe access to the job, because we're already running in the main context. Further, none of these functions actually have any meaning for a job in such a state. - set_speed: Sets speed parameters, harmless either way. - cancel: Will set the cancelled boolean, reset iostatus, then attempt to enter the job. Since a job that enters the main loop remains busy, the enter is a NOP. The BlockBackend AIO context here is therefore extraneous, and the getter is safe. - pause: Only increments a counter, and will have no effect. - resume: Decrements a counter. Attempts to enter(), but as stated above this is a NOP. - complete: Calls .complete(), for which the only implementation is mirror_complete. Uh, this actually seems messy. Looks like there's nothing to prevent us from calling this after we've already told it to complete once. This could be a legitimate bug that this patch does nothing in particular to address. If complete() is shored up such that it can be called precisely once, this becomes safe. (C) qmp_query_block_jobs (monitor context) Just a getter. Using get_context is safe in either state. (D) run_block_job (qemu-img) Never called when the context is in the main loop anyway. Effectively no change here. So, with the exception of .complete, I think this is a safe change as it stands... However... Paolo wants to complicate my life and get rid of this getter for his own fiendish purposes. He suggests pushing down context acquisition into blockjob.c directly for any QMP callers: - qmp_block_job_set_speed -> block_job_set_speed - qmp_block_job_cancel -> block_job_cancel - qmp_block_job_pause -> block_job_user_pause - qmp_block_job_resume -> block_job_user_resume - qmp_block_job_complete -> block_job_complete - qmp_query_block_jobs -> block_job_query Most of these have only one caller in the QMP layer: block_job_set_speed block_job_user_pause block_job_user_resume block_job_query These can easily just take the context they need, removing external uses of job->blk for purposes of acquiring the context. block_job_cancel and block_job_complete are different. block_job_cancel is called in many places, but we can just add a similar block_job_user_cancel if we wanted a version which takes care to acquire context and one that does not. (Or we could just acquire the context regardless, but Paolo warned me ominously that recursive locks are EVIL. He sounded serious.) block_job_complete has no direct callers outside of QMP, but it is also used as a callback by block_job_complete_sync, used in qemu-img for run_block_job. I can probably rewrite qemu_img to avoid this usage. TLDR: - This change should be perfectly safe, but Paolo wants to get rid of this usage anyway. - At least 5/6 uses of external context grabbing can be internalized easily. - qemu-img's run_block_job needs to be refactored a bit, though I don't have an idea for that yet, but as you pointed out it needs to be done for the public/private split anyway. - block_job_complete needs to be touched up no matter what we do. - The aio_context getter can probably be removed entirely as per Paolo's wishes, but I'll have to change bdrv_drain_all a bit. A block_job_pause_all and block_job_resume all would work, though that's a bit special purpose. I could craft up a block_job_apply_all for the purpose instead. (e.g. block_job_apply_all(block_job_pause)) I think that answers everyone's questions... --js
On 06/10/2016 22:22, John Snow wrote: > Calls .complete(), for which the only implementation is > mirror_complete. Uh, this actually seems messy. Looks like there's > nothing to prevent us from calling this after we've already told it to > complete once. Yeah, it should have an if (s->should_complete) { return; } at the beginning. I have other mirror.c patches in my queue so I can take care of that. > block_job_cancel and block_job_complete are different. > > block_job_cancel is called in many places, but we can just add a similar > block_job_user_cancel if we wanted a version which takes care to acquire > context and one that does not. (Or we could just acquire the context > regardless, but Paolo warned me ominously that recursive locks are EVIL. > He sounded serious.) Not that many places: - block_job_finish_sync calls it, and you can just release/reacquire around the call to "finish(job, &local_err)". - there are two callers in blockdev.c, and you can just remove the acquire/release from blockdev.c if you push it in block_job_cancel. As to block_job_cancel_sync: - replication_stop is not acquiring s->secondary_disk->bs's AioContext. - there is no need to hold the AioContext between ->prepare and ->clean. My suggestion is to ref the blockjob after storing it in state->job (you probably should do that anyway) and unref'ing it in ->clean. Then you can call again let block_job_cancel_sync(bs->job) take the AioContext, which it will do in block_job_finish_sync. > block_job_complete has no direct callers outside of QMP, but it is also > used as a callback by block_job_complete_sync, used in qemu-img for > run_block_job. I can probably rewrite qemu_img to avoid this usage. No need to: qemu-img is not acquiring the AioContext, so it's okay to let block_job_complete do that (like block_job_cancel, block_job_complete will be called by block_job_finish_sync without the AioContext acquired). Paolo
As context to everyone else as to why I'm going down the rabbit hole of trying to remove external references to AioContext at all, see https://lists.gnu.org/archive/html/qemu-devel/2016-10/msg00795.html On 10/07/2016 03:49 AM, Paolo Bonzini wrote: > > > On 06/10/2016 22:22, John Snow wrote: >> Calls .complete(), for which the only implementation is >> mirror_complete. Uh, this actually seems messy. Looks like there's >> nothing to prevent us from calling this after we've already told it to >> complete once. > > Yeah, it should have an > > if (s->should_complete) { > return; > } > > at the beginning. I have other mirror.c patches in my queue so I can > take care of that. > Or something up the stack at block_job_complete so it's not up to job implementations? What if the next implementer "forgets." >> block_job_cancel and block_job_complete are different. >> >> block_job_cancel is called in many places, but we can just add a similar >> block_job_user_cancel if we wanted a version which takes care to acquire >> context and one that does not. (Or we could just acquire the context >> regardless, but Paolo warned me ominously that recursive locks are EVIL. >> He sounded serious.) > > Not that many places: > > - block_job_finish_sync calls it, and you can just release/reacquire > around the call to "finish(job, &local_err)". > This makes me a little nervous because we went through the trouble of creating this callback, but we're going to assume we know that it's a public interface that will take the lock for itself (or otherwise does not require a lock.) In practice it works, but it seems needlessly mystifying in terms of proving correctness. > - there are two callers in blockdev.c, and you can just remove the > acquire/release from blockdev.c if you push it in block_job_cancel. > Makes sense; I don't like the association of (bs :: job) here anyway. Again we're grabbing context for a job where that job may not even be running. > As to block_job_cancel_sync: > Which I didn't audit, because no callers use job->blk to get the AioContext before calling this; they use bs if bs->job is present. > - replication_stop is not acquiring s->secondary_disk->bs's AioContext. > Seems like a bug on their part. Would be fixed by having cancel acquire context for itself. > - there is no need to hold the AioContext between ->prepare and ->clean. > My suggestion is to ref the blockjob after storing it in state->job > (you probably should do that anyway) and unref'ing it in ->clean. Then > you can call again let block_job_cancel_sync(bs->job) take the > AioContext, which it will do in block_job_finish_sync. Yeah, I should be reffing it anyway. The rest of this... What I think you mean is acquiring and releasing the context as needed for EACH of prepare, commit, abort, and clean as necessary, right? And then in this case, it simply wouldn't be necessary for abort, as the sync cancel would do it for us. > >> block_job_complete has no direct callers outside of QMP, but it is also >> used as a callback by block_job_complete_sync, used in qemu-img for >> run_block_job. I can probably rewrite qemu_img to avoid this usage. > > No need to: qemu-img is not acquiring the AioContext, so it's okay to > let block_job_complete do that (like block_job_cancel, > block_job_complete will be called by block_job_finish_sync without the > AioContext acquired). > Eh? Oh, you're right, it just gets it for the sake of aio_poll. > Paolo > Alright. Say I *do* push the acquisitions down into blockjob.c. What benefit does that provide? Won't I still need the block_job_get_aio_context() function (At least internally) to know which context to acquire? This would preclude you from deleting it. Plus... we remove some fairly simple locking mechanisms and then inflate it tenfold. I'm not convinced this is an improvement. As context and a refresher (for me when I re-read this email in 12 hours,) there are three places externally that are using an AioContext lock as acquired from *within* a BlockJob, excluding those that acquire a context separately from a Job and use that to reason that accesses to the job are safe (For example, blockdev_mark_auto_del.) (1) QMP interface for job management (2) bdrv_drain_all, in block/io.c (1) AFAICT, the QMP interface is concerned with assuring itself it has unique access to the BlockJob structure itself, and it doesn't really authentically care about the AIOContext itself -- just race-free access to the Job. This is not necessarily buggy today because, even though we grab the BlockBackend's context unconditionally, we already know the main/monitor thread is not accessing the blockjob. It's still silly, though. (2) bdrv_drain_all appears to be worried about the same thing; we just need to safely deliver pause/resume messages. I'm less sure about where this can run from, and suspect that if the job has deferred to main that this could be buggy. If bdrv_drain_all is called from context A and the job is running on context M having deferred from B, we may lock against context B (from A) and have racey access from between A/M. (Maybe?) It looks like all of these usages don't actually really care what context they're getting, just that they're getting safe access to the job itself. If you want to decouple jobs from their BlockBackend's AIO Context, perhaps we just need to replace these by a simple mutex...? As it stands, though, pushing AIOContext acquisitions down into blockjob.c isn't actually going to fix anything, is it? Why not just leave it be for right now? Would you be terribly offended if I left this patch as-is for now and we can work on removing the AioContext locks afterwards, or are you adamant that I cooperate in getting block_job_get_aio_context removed before I add more usages? Sorry I'm being so obtuse about this all. --js
On 13/10/2016 02:49, John Snow wrote: > As context to everyone else as to why I'm going down the rabbit hole of > trying to remove external references to AioContext at all, see > https://lists.gnu.org/archive/html/qemu-devel/2016-10/msg00795.html > > On 10/07/2016 03:49 AM, Paolo Bonzini wrote: >> >> >> On 06/10/2016 22:22, John Snow wrote: >>> Calls .complete(), for which the only implementation is >>> mirror_complete. Uh, this actually seems messy. Looks like there's >>> nothing to prevent us from calling this after we've already told it to >>> complete once. >> >> Yeah, it should have an >> >> if (s->should_complete) { >> return; >> } >> >> at the beginning. I have other mirror.c patches in my queue so I can >> take care of that. >> > > Or something up the stack at block_job_complete so it's not up to job > implementations? What if the next implementer "forgets." Yeah, that would require moving s->should_complete up to BlockJob. >>> block_job_cancel and block_job_complete are different. >>> >>> block_job_cancel is called in many places, but we can just add a similar >>> block_job_user_cancel if we wanted a version which takes care to acquire >>> context and one that does not. (Or we could just acquire the context >>> regardless, but Paolo warned me ominously that recursive locks are EVIL. >>> He sounded serious.) >> >> Not that many places: >> >> - block_job_finish_sync calls it, and you can just release/reacquire >> around the call to "finish(job, &local_err)". > > This makes me a little nervous because we went through the trouble of > creating this callback, but we're going to assume we know that it's a > public interface that will take the lock for itself (or otherwise does > not require a lock.) > > In practice it works, but it seems needlessly mystifying in terms of > proving correctness. It's _much_ easier to assume that all callbacks take the lock themselves. It's counterintuitive (just like the idea that recursive locks are bad :)), but the point is that if you second guess the callbacks and invoke them you might get the locking order wrong. This ends up with ugly AB-BA deadlocks. >> - replication_stop is not acquiring s->secondary_disk->bs's AioContext. > > Seems like a bug on their part. Would be fixed by having cancel acquire > context for itself. Yep. > Yeah, I should be reffing it anyway. > > The rest of this... What I think you mean is acquiring and releasing the > context as needed for EACH of prepare, commit, abort, and clean as > necessary, right? > > And then in this case, it simply wouldn't be necessary for abort, as the > sync cancel would do it for us. Right. > Alright. > > Say I *do* push the acquisitions down into blockjob.c. What benefit does > that provide? Won't I still need the block_job_get_aio_context() > function (At least internally) to know which context to acquire? This > would preclude you from deleting it. > > Plus... we remove some fairly simple locking mechanisms and then inflate > it tenfold. I'm not convinced this is an improvement. The improvement is that you can now replace the huge lock with a smaller one (you don't have to do it now, but you could). Furthermore, the small lock is a) non-recursive b) a leaf lock. This means that you've removed a lot of opportunities for deadlock, and generally made things easier to reason about. Furthermore, with large locks you never know what they actually protect; cue the confusion between aio_context_acquire/release and bdrv_drained_begin/end. And it's easier to forget them if you force the caller to do it; and we have an example with replication. Again, it can be counterintuitive that it's better, but it is. :) > (1) QMP interface for job management > (2) bdrv_drain_all, in block/io.c > > (1) AFAICT, the QMP interface is concerned with assuring itself it has > unique access to the BlockJob structure itself, and it doesn't really > authentically care about the AIOContext itself -- just race-free access > to the Job. Yes. The protection you need here is mostly against concurrent completion of the job (and concurrent manipulation of fields such as job->busy, job->paused, etc. > This is not necessarily buggy today because, even though we grab the > BlockBackend's context unconditionally, we already know the main/monitor > thread is not accessing the blockjob. It's still silly, though. > > (2) bdrv_drain_all appears to be worried about the same thing; we just > need to safely deliver pause/resume messages. > > I'm less sure about where this can run from, and suspect that if the job > has deferred to main that this could be buggy. If bdrv_drain_all is > called from context A and the job is running on context M having > deferred from B, we may lock against context B (from A) and have racey > access from between A/M. (Maybe?) bdrv_drain_all is BQL-only (because it acquires more than one AioContext). > Would you be terribly offended if I left this patch as-is for now and we > can work on removing the AioContext locks afterwards, or are you adamant > that I cooperate in getting block_job_get_aio_context removed before I > add more usages? Removing block_job_get_aio_context is necessary to fix dataplane bugs, so I'd really prefer to have that in 2.8. But why do you actually need this patch at all? You can do neither thing---not push the lock down and not add more usages---which is always the best. :) Paolo
diff --git a/block/io.c b/block/io.c index 868b065..f26a503 100644 --- a/block/io.c +++ b/block/io.c @@ -288,7 +288,7 @@ void bdrv_drain_all(void) GSList *aio_ctxs = NULL, *ctx; while ((job = block_job_next(job))) { - AioContext *aio_context = blk_get_aio_context(job->blk); + AioContext *aio_context = block_job_get_aio_context(job); aio_context_acquire(aio_context); block_job_pause(job, false); @@ -347,7 +347,7 @@ void bdrv_drain_all(void) job = NULL; while ((job = block_job_next(job))) { - AioContext *aio_context = blk_get_aio_context(job->blk); + AioContext *aio_context = block_job_get_aio_context(job); aio_context_acquire(aio_context); block_job_resume(job); diff --git a/blockdev.c b/blockdev.c index 268452f..0ac507f 100644 --- a/blockdev.c +++ b/blockdev.c @@ -3596,7 +3596,7 @@ static BlockJob *find_block_job(const char *id, AioContext **aio_context, return NULL; } - *aio_context = blk_get_aio_context(job->blk); + *aio_context = block_job_get_aio_context(job); aio_context_acquire(*aio_context); return job; @@ -3956,7 +3956,7 @@ BlockJobInfoList *qmp_query_block_jobs(Error **errp) for (job = block_job_next(NULL); job; job = block_job_next(job)) { BlockJobInfoList *elem = g_new0(BlockJobInfoList, 1); - AioContext *aio_context = blk_get_aio_context(job->blk); + AioContext *aio_context = block_job_get_aio_context(job); aio_context_acquire(aio_context); elem->value = block_job_query(job); diff --git a/blockjob.c b/blockjob.c index 2a35f50..073d9ce 100644 --- a/blockjob.c +++ b/blockjob.c @@ -78,7 +78,7 @@ BlockJob *block_job_get(const char *id) * block_job_defer_to_main_loop() where it runs in the QEMU main loop. Code * that supports both cases uses this helper function. */ -static AioContext *block_job_get_aio_context(BlockJob *job) +AioContext *block_job_get_aio_context(BlockJob *job) { return job->deferred_to_main_loop ? qemu_get_aio_context() : diff --git a/include/block/blockjob.h b/include/block/blockjob.h index 081f6c2..6f28c73 100644 --- a/include/block/blockjob.h +++ b/include/block/blockjob.h @@ -513,4 +513,13 @@ void block_job_txn_unref(BlockJobTxn *txn); */ void block_job_txn_add_job(BlockJobTxn *txn, BlockJob *job); +/** + * block_job_get_aio_context: + * @job: Job to get the aio_context for + * + * Fetch the current context for the given BlockJob. May be the main loop if + * the job has already deferred to main for final cleanup. + */ +AioContext *block_job_get_aio_context(BlockJob *job); + #endif diff --git a/qemu-img.c b/qemu-img.c index ceffefe..204fa9c 100644 --- a/qemu-img.c +++ b/qemu-img.c @@ -793,7 +793,7 @@ static void common_block_job_cb(void *opaque, int ret) static void run_block_job(BlockJob *job, Error **errp) { - AioContext *aio_context = blk_get_aio_context(job->blk); + AioContext *aio_context = block_job_get_aio_context(job); do { aio_poll(aio_context, true);
There are a few places where we're fishing it out for ourselves. Let's not do that and instead use the helper. Signed-off-by: John Snow <jsnow@redhat.com> --- block/io.c | 4 ++-- blockdev.c | 4 ++-- blockjob.c | 2 +- include/block/blockjob.h | 9 +++++++++ qemu-img.c | 2 +- 5 files changed, 15 insertions(+), 6 deletions(-)