diff mbox

[RFC,2/3] block-backend: add drained_begin / drained_end ops

Message ID 20170316004603.20609-3-jsnow@redhat.com
State New
Headers show

Commit Message

John Snow March 16, 2017, 12:46 a.m. UTC
Signed-off-by: John Snow <jsnow@redhat.com>

---

RFC questions:

- Does the presence of blk->quiesce_counter relieve the burden of needing
  blk->public.io_limits_disabled? I could probably eliminate this counter
  entirely and just spy on the root node's quiescent state at key moments
  instead. I am confident I'm traipsing on delicate drain semantics.

- Should I treat the separation of a quisced BDS/BB as a drained_end request
  as an analogue to how blk_insert_bs (in this patch) handles this?

Signed-off-by: John Snow <jsnow@redhat.com>
---
 block/block-backend.c          | 25 +++++++++++++++++++++++++
 include/sysemu/block-backend.h |  8 ++++++++
 2 files changed, 33 insertions(+)

Comments

Paolo Bonzini March 16, 2017, 8:25 a.m. UTC | #1
On 16/03/2017 01:46, John Snow wrote:
> 
> - Does the presence of blk->quiesce_counter relieve the burden of needing
>   blk->public.io_limits_disabled? I could probably eliminate this counter
>   entirely and just spy on the root node's quiescent state at key moments
>   instead. I am confident I'm traipsing on delicate drain semantics.

Tricky question.  I believe io_limits_disabled is a bit of a hack.

Certainly you could get rid of disable_external now that we have
drained_begin/drained_end, but it would be a separate patch.

> - Should I treat the separation of a quisced BDS/BB as a drained_end request
>   as an analogue to how blk_insert_bs (in this patch) handles this?

I think so.

Paolo
Kevin Wolf March 16, 2017, 5:25 p.m. UTC | #2
Am 16.03.2017 um 01:46 hat John Snow geschrieben:
> Signed-off-by: John Snow <jsnow@redhat.com>
> 
> ---
> 
> RFC questions:
> 
> - Does the presence of blk->quiesce_counter relieve the burden of needing
>   blk->public.io_limits_disabled? I could probably eliminate this counter
>   entirely and just spy on the root node's quiescent state at key moments
>   instead. I am confident I'm traipsing on delicate drain semantics.

Not for 2.9 anyway. :-)

> - Should I treat the separation of a quisced BDS/BB as a drained_end request
>   as an analogue to how blk_insert_bs (in this patch) handles this?

It's already taken care of, you don't need to add anything for that
(see below).

> Signed-off-by: John Snow <jsnow@redhat.com>
> ---
>  block/block-backend.c          | 25 +++++++++++++++++++++++++
>  include/sysemu/block-backend.h |  8 ++++++++
>  2 files changed, 33 insertions(+)
> 
> diff --git a/block/block-backend.c b/block/block-backend.c
> index 5742c09..eb85e8b 100644
> --- a/block/block-backend.c
> +++ b/block/block-backend.c
> @@ -65,6 +65,8 @@ struct BlockBackend {
>      bool allow_write_beyond_eof;
>  
>      NotifierList remove_bs_notifiers, insert_bs_notifiers;
> +
> +    int quiesce_counter;
>  };
>  
>  typedef struct BlockBackendAIOCB {
> @@ -559,6 +561,11 @@ int blk_insert_bs(BlockBackend *blk, BlockDriverState *bs, Error **errp)
>      }
>      bdrv_ref(bs);
>  
> +    /* The new BDS may be quiescent, we should attempt to match */
> +    if (bs->quiesce_counter) {
> +        blk_root_drained_begin(blk->root);
> +    }

Do you really need this hunk? BB and BDS should already be kept in sync,
it's just the BB and its user that aren't. This is the call chain that
already leads to blk_root_drained_begin():

blk_insert_bs
    bdrv_root_attach_child
        bdrv_replace_child
            bdrv_replace_child_noperm
                child->role->drained_begin(child)

Did you check whether blk->public.io_limits_disabled ever goes back to 0
with your patch? I think it's increased twice and decreased only once.

To answer your RFC question, bdrv_replace_child_noperm() also takes care
of calling drained_end() when the BDS is detached from the BB.

>      notifier_list_notify(&blk->insert_bs_notifiers, blk);
>      if (blk->public.throttle_state) {
>          throttle_timers_attach_aio_context(
> @@ -705,6 +712,11 @@ void blk_set_dev_ops(BlockBackend *blk, const BlockDevOps *ops,
>  
>      blk->dev_ops = ops;
>      blk->dev_opaque = opaque;
> +
> +    /* Are we currently quiesced? Should we enforce this right now? */
> +    if (blk->quiesce_counter && ops->drained_begin) {
> +        ops->drained_begin(opaque);
> +    }
>  }
>  
>  /*
> @@ -1870,9 +1882,15 @@ static void blk_root_drained_begin(BdrvChild *child)
>  {
>      BlockBackend *blk = child->opaque;
>  
> +    blk->quiesce_counter++;
> +
>      /* Note that blk->root may not be accessible here yet if we are just
>       * attaching to a BlockDriverState that is drained. Use child instead. */
>  
> +    if (blk->dev_ops && blk->dev_ops->drained_begin) {
> +        blk->dev_ops->drained_begin(blk->dev_opaque);
> +    }

Should we do this only if blk->quiesce_counter was 0? (And
dev_ops->drained_end() when it reaches 0 again.)

The difference is that the implementation of the callback would have to
have its own counter as it is in this patch. Not really that bad, but at
the moment I don't see a reason why we would need to support nested
drained sections for dev_ops.

> +
>      if (blk->public.io_limits_disabled++ == 0) {
>          throttle_group_restart_blk(blk);
>      }
> @@ -1882,6 +1900,13 @@ static void blk_root_drained_end(BdrvChild *child)
>  {
>      BlockBackend *blk = child->opaque;
>  
> +    assert(blk->quiesce_counter);
> +    blk->quiesce_counter--;
> +
>      assert(blk->public.io_limits_disabled);
>      --blk->public.io_limits_disabled;
> +
> +    if (blk->dev_ops && blk->dev_ops->drained_end) {
> +        blk->dev_ops->drained_end(blk->dev_opaque);
> +    }

The basic idea looks good to me.

Kevin
John Snow March 16, 2017, 7:58 p.m. UTC | #3
On 03/16/2017 01:25 PM, Kevin Wolf wrote:
> Am 16.03.2017 um 01:46 hat John Snow geschrieben:
>> Signed-off-by: John Snow <jsnow@redhat.com>
>>
>> ---
>>
>> RFC questions:
>>
>> - Does the presence of blk->quiesce_counter relieve the burden of needing
>>   blk->public.io_limits_disabled? I could probably eliminate this counter
>>   entirely and just spy on the root node's quiescent state at key moments
>>   instead. I am confident I'm traipsing on delicate drain semantics.
> 
> Not for 2.9 anyway. :-)
> 
>> - Should I treat the separation of a quisced BDS/BB as a drained_end request
>>   as an analogue to how blk_insert_bs (in this patch) handles this?
> 
> It's already taken care of, you don't need to add anything for that
> (see below).
> 
>> Signed-off-by: John Snow <jsnow@redhat.com>
>> ---
>>  block/block-backend.c          | 25 +++++++++++++++++++++++++
>>  include/sysemu/block-backend.h |  8 ++++++++
>>  2 files changed, 33 insertions(+)
>>
>> diff --git a/block/block-backend.c b/block/block-backend.c
>> index 5742c09..eb85e8b 100644
>> --- a/block/block-backend.c
>> +++ b/block/block-backend.c
>> @@ -65,6 +65,8 @@ struct BlockBackend {
>>      bool allow_write_beyond_eof;
>>  
>>      NotifierList remove_bs_notifiers, insert_bs_notifiers;
>> +
>> +    int quiesce_counter;
>>  };
>>  
>>  typedef struct BlockBackendAIOCB {
>> @@ -559,6 +561,11 @@ int blk_insert_bs(BlockBackend *blk, BlockDriverState *bs, Error **errp)
>>      }
>>      bdrv_ref(bs);
>>  
>> +    /* The new BDS may be quiescent, we should attempt to match */
>> +    if (bs->quiesce_counter) {
>> +        blk_root_drained_begin(blk->root);
>> +    }
> 
> Do you really need this hunk? BB and BDS should already be kept in sync,
> it's just the BB and its user that aren't. This is the call chain that
> already leads to blk_root_drained_begin():
> 

Oh, good call..!

> blk_insert_bs
>     bdrv_root_attach_child
>         bdrv_replace_child
>             bdrv_replace_child_noperm
>                 child->role->drained_begin(child)
> 
> Did you check whether blk->public.io_limits_disabled ever goes back to 0
> with your patch? I think it's increased twice and decreased only once.
> 

I didn't, but with your change above, it works out just fine.

> To answer your RFC question, bdrv_replace_child_noperm() also takes care
> of calling drained_end() when the BDS is detached from the BB.
> 

Excellent.

>>      notifier_list_notify(&blk->insert_bs_notifiers, blk);
>>      if (blk->public.throttle_state) {
>>          throttle_timers_attach_aio_context(
>> @@ -705,6 +712,11 @@ void blk_set_dev_ops(BlockBackend *blk, const BlockDevOps *ops,
>>  
>>      blk->dev_ops = ops;
>>      blk->dev_opaque = opaque;
>> +
>> +    /* Are we currently quiesced? Should we enforce this right now? */
>> +    if (blk->quiesce_counter && ops->drained_begin) {
>> +        ops->drained_begin(opaque);
>> +    }
>>  }
>>  
>>  /*
>> @@ -1870,9 +1882,15 @@ static void blk_root_drained_begin(BdrvChild *child)
>>  {
>>      BlockBackend *blk = child->opaque;
>>  
>> +    blk->quiesce_counter++;
>> +
>>      /* Note that blk->root may not be accessible here yet if we are just
>>       * attaching to a BlockDriverState that is drained. Use child instead. */
>>  
>> +    if (blk->dev_ops && blk->dev_ops->drained_begin) {
>> +        blk->dev_ops->drained_begin(blk->dev_opaque);
>> +    }
> 
> Should we do this only if blk->quiesce_counter was 0? (And
> dev_ops->drained_end() when it reaches 0 again.)
> 

Ah yes, actually, that led to the question about .io_limits_disabled,
because I could have both begin/end only operate on the edge between 0/1.

> The difference is that the implementation of the callback would have to
> have its own counter as it is in this patch. Not really that bad, but at
> the moment I don't see a reason why we would need to support nested
> drained sections for dev_ops.
> 

Coincidentally, block_job_pause already supports counters! Entirely by
accident this worked.

>> +
>>      if (blk->public.io_limits_disabled++ == 0) {
>>          throttle_group_restart_blk(blk);
>>      }
>> @@ -1882,6 +1900,13 @@ static void blk_root_drained_end(BdrvChild *child)
>>  {
>>      BlockBackend *blk = child->opaque;
>>  
>> +    assert(blk->quiesce_counter);
>> +    blk->quiesce_counter--;
>> +
>>      assert(blk->public.io_limits_disabled);
>>      --blk->public.io_limits_disabled;
>> +
>> +    if (blk->dev_ops && blk->dev_ops->drained_end) {
>> +        blk->dev_ops->drained_end(blk->dev_opaque);
>> +    }
> 
> The basic idea looks good to me.
> 
> Kevin
> 

I'll send a v2 shortly, ever-so-slightly touched up.

--js
diff mbox

Patch

diff --git a/block/block-backend.c b/block/block-backend.c
index 5742c09..eb85e8b 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -65,6 +65,8 @@  struct BlockBackend {
     bool allow_write_beyond_eof;
 
     NotifierList remove_bs_notifiers, insert_bs_notifiers;
+
+    int quiesce_counter;
 };
 
 typedef struct BlockBackendAIOCB {
@@ -559,6 +561,11 @@  int blk_insert_bs(BlockBackend *blk, BlockDriverState *bs, Error **errp)
     }
     bdrv_ref(bs);
 
+    /* The new BDS may be quiescent, we should attempt to match */
+    if (bs->quiesce_counter) {
+        blk_root_drained_begin(blk->root);
+    }
+
     notifier_list_notify(&blk->insert_bs_notifiers, blk);
     if (blk->public.throttle_state) {
         throttle_timers_attach_aio_context(
@@ -705,6 +712,11 @@  void blk_set_dev_ops(BlockBackend *blk, const BlockDevOps *ops,
 
     blk->dev_ops = ops;
     blk->dev_opaque = opaque;
+
+    /* Are we currently quiesced? Should we enforce this right now? */
+    if (blk->quiesce_counter && ops->drained_begin) {
+        ops->drained_begin(opaque);
+    }
 }
 
 /*
@@ -1870,9 +1882,15 @@  static void blk_root_drained_begin(BdrvChild *child)
 {
     BlockBackend *blk = child->opaque;
 
+    blk->quiesce_counter++;
+
     /* Note that blk->root may not be accessible here yet if we are just
      * attaching to a BlockDriverState that is drained. Use child instead. */
 
+    if (blk->dev_ops && blk->dev_ops->drained_begin) {
+        blk->dev_ops->drained_begin(blk->dev_opaque);
+    }
+
     if (blk->public.io_limits_disabled++ == 0) {
         throttle_group_restart_blk(blk);
     }
@@ -1882,6 +1900,13 @@  static void blk_root_drained_end(BdrvChild *child)
 {
     BlockBackend *blk = child->opaque;
 
+    assert(blk->quiesce_counter);
+    blk->quiesce_counter--;
+
     assert(blk->public.io_limits_disabled);
     --blk->public.io_limits_disabled;
+
+    if (blk->dev_ops && blk->dev_ops->drained_end) {
+        blk->dev_ops->drained_end(blk->dev_opaque);
+    }
 }
diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index 096c17f..c6f4408 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -58,6 +58,14 @@  typedef struct BlockDevOps {
      * Runs when the size changed (e.g. monitor command block_resize)
      */
     void (*resize_cb)(void *opaque);
+    /*
+     * Runs when the backend receives a drain request.
+     */
+    void (*drained_begin)(void *opaque);
+    /*
+     * Runs when the backend's drain request ends.
+     */
+    void (*drained_end)(void *opaque);
 } BlockDevOps;
 
 /* This struct is embedded in (the private) BlockBackend struct and contains