Message ID | 1444650651-26227-6-git-send-email-famz@redhat.com |
---|---|
State | New |
Headers | show |
Am 12.10.2015 um 13:50 hat Fam Zheng geschrieben: > The semantics is that after bdrv_drained_begin(bs), bs will not get new external > requests until the matching bdrv_drained_end(bs). > > Signed-off-by: Fam Zheng <famz@redhat.com> > --- > block.c | 2 ++ > block/io.c | 18 ++++++++++++++++++ > include/block/block.h | 19 +++++++++++++++++++ > include/block/block_int.h | 2 ++ > 4 files changed, 41 insertions(+) > > diff --git a/block.c b/block.c > index 1f90b47..9b28a07 100644 > --- a/block.c > +++ b/block.c > @@ -2058,6 +2058,8 @@ static void bdrv_move_feature_fields(BlockDriverState *bs_dest, > bs_dest->device_list = bs_src->device_list; > bs_dest->blk = bs_src->blk; > > + bs_dest->quiesce_counter = bs_src->quiesce_counter; > + > memcpy(bs_dest->op_blockers, bs_src->op_blockers, > sizeof(bs_dest->op_blockers)); > } This feels wrong. As I understand it, bdrv_drained_begin/end works on specific nodes and not on trees. Including the field in bdrv_move_feature_fields() means that it moves to the top of the tree (i.e. it stays at in the same C object, which however belongs to a different logical node now). What I could imagine is that you did this so you can use bdrv_draind_end() on the same BDS as you called bdrv_drained_start() on. However, that's not the interface of bdrv_swap(), which really means that the BDSes are swapped. So with this hunk you just end up having a bug that cancels out the weirdness of the bdrv_swap() interface. If you rebase on my bdrv_swap() removal series, things become a bit more obvious. If you don't, you should drop this hunk and change some bdrv_drained_end() calls, e.g. in the next patch, you'd have to call bdrv_drained_begin(state->old_bs), but bdrv_drained_end(state->new_bs). The rest of this patch looks good. Kevin
On 12/10/2015 13:50, Fam Zheng wrote: > +void bdrv_drained_begin(BlockDriverState *bs) > +{ > + if (bs->quiesce_counter++) { > + return; > + } > + aio_disable_external(bdrv_get_aio_context(bs)); > + bdrv_drain(bs); > +} I think bdrv_drain should be called unconditionally, i.e. before the "if". This should also solve Kevin's doubt about new allocating write request reenabling the timer: any write request from the drained section happens normally, until you get a nested drain request and then the callback completes the requests. Paolo
Am 12.10.2015 um 16:27 hat Paolo Bonzini geschrieben: > > > On 12/10/2015 13:50, Fam Zheng wrote: > > +void bdrv_drained_begin(BlockDriverState *bs) > > +{ > > + if (bs->quiesce_counter++) { > > + return; > > + } > > + aio_disable_external(bdrv_get_aio_context(bs)); > > + bdrv_drain(bs); > > +} > > I think bdrv_drain should be called unconditionally, i.e. before the > "if". This should also solve Kevin's doubt about new allocating write > request reenabling the timer: any write request from the drained section > happens normally, until you get a nested drain request and then the > callback completes the requests. This would mean that once you've sent an I/O request inside a drain section, you have to expect that more internal I/O might be going on after the request has completed. If you don't want this, you have to issue another bdrv_drain() or use a nested bdrv_drained_begin/end() section. Sounds reasonable enough to me, but I guess this should be explicitly documented. Kevin
On 13/10/2015 11:31, Kevin Wolf wrote: > This would mean that once you've sent an I/O request inside a drain > section, you have to expect that more internal I/O might be going on > after the request has completed. If you don't want this, you have to > issue another bdrv_drain() or use a nested bdrv_drained_begin/end() > section. Yes. > Sounds reasonable enough to me, but I guess this should be explicitly > documented. I agree. Perhaps bdrv_drained_begin/end() could be renamed to bdrv_drain_and_lock() / bdrv_unlock()? Paolo
Am 13.10.2015 um 12:39 hat Paolo Bonzini geschrieben: > > > On 13/10/2015 11:31, Kevin Wolf wrote: > > This would mean that once you've sent an I/O request inside a drain > > section, you have to expect that more internal I/O might be going on > > after the request has completed. If you don't want this, you have to > > issue another bdrv_drain() or use a nested bdrv_drained_begin/end() > > section. > > Yes. > > > Sounds reasonable enough to me, but I guess this should be explicitly > > documented. > > I agree. Perhaps bdrv_drained_begin/end() could be renamed to > bdrv_drain_and_lock() / bdrv_unlock()? It's not very obvious what bdrv_unlock() refers to, so I prefer the current naming. Just making sure that the comment for bdrv_drained_begin explains the exact semantics should be good enough. Kevin
diff --git a/block.c b/block.c index 1f90b47..9b28a07 100644 --- a/block.c +++ b/block.c @@ -2058,6 +2058,8 @@ static void bdrv_move_feature_fields(BlockDriverState *bs_dest, bs_dest->device_list = bs_src->device_list; bs_dest->blk = bs_src->blk; + bs_dest->quiesce_counter = bs_src->quiesce_counter; + memcpy(bs_dest->op_blockers, bs_src->op_blockers, sizeof(bs_dest->op_blockers)); } diff --git a/block/io.c b/block/io.c index 94e18e6..5c088d5 100644 --- a/block/io.c +++ b/block/io.c @@ -2618,3 +2618,21 @@ void bdrv_flush_io_queue(BlockDriverState *bs) } bdrv_start_throttled_reqs(bs); } + +void bdrv_drained_begin(BlockDriverState *bs) +{ + if (bs->quiesce_counter++) { + return; + } + aio_disable_external(bdrv_get_aio_context(bs)); + bdrv_drain(bs); +} + +void bdrv_drained_end(BlockDriverState *bs) +{ + assert(bs->quiesce_counter > 0); + if (--bs->quiesce_counter > 0) { + return; + } + aio_enable_external(bdrv_get_aio_context(bs)); +} diff --git a/include/block/block.h b/include/block/block.h index 2dd6630..c4f6eef 100644 --- a/include/block/block.h +++ b/include/block/block.h @@ -619,4 +619,23 @@ void bdrv_flush_io_queue(BlockDriverState *bs); BlockAcctStats *bdrv_get_stats(BlockDriverState *bs); +/** + * bdrv_drained_begin: + * + * Begin a quiesced section for exclusive access to the BDS, by disabling + * external request sources including NBD server and device model. Note that + * this doesn't block timers or coroutines from submitting more requests, which + * means block_job_pause is still necessary. + * + * This function can be recursive. + */ +void bdrv_drained_begin(BlockDriverState *bs); + +/** + * bdrv_drained_end: + * + * End a quiescent section started by bdrv_drained_begin(). + */ +void bdrv_drained_end(BlockDriverState *bs); + #endif diff --git a/include/block/block_int.h b/include/block/block_int.h index 14ad4c3..7c58221 100644 --- a/include/block/block_int.h +++ b/include/block/block_int.h @@ -456,6 +456,8 @@ struct BlockDriverState { /* threshold limit for writes, in bytes. "High water mark". */ uint64_t write_threshold_offset; NotifierWithReturn write_threshold_notifier; + + int quiesce_counter; };
The semantics is that after bdrv_drained_begin(bs), bs will not get new external requests until the matching bdrv_drained_end(bs). Signed-off-by: Fam Zheng <famz@redhat.com> --- block.c | 2 ++ block/io.c | 18 ++++++++++++++++++ include/block/block.h | 19 +++++++++++++++++++ include/block/block_int.h | 2 ++ 4 files changed, 41 insertions(+)