diff mbox

[v5,17/38] block: Add BlockBackendRootState

Message ID 1442589793-7105-18-git-send-email-mreitz@redhat.com
State New
Headers show

Commit Message

Max Reitz Sept. 18, 2015, 3:22 p.m. UTC
This structure will store some of the state of the root BDS if the BDS
tree is removed, so that state can be restored once a new BDS tree is
inserted.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
---
 block/block-backend.c          | 37 +++++++++++++++++++++++++++++++++++++
 include/block/block_int.h      | 10 ++++++++++
 include/qemu/typedefs.h        |  1 +
 include/sysemu/block-backend.h |  2 ++
 4 files changed, 50 insertions(+)

Comments

Kevin Wolf Sept. 22, 2015, 2:17 p.m. UTC | #1
Am 18.09.2015 um 17:22 hat Max Reitz geschrieben:
> This structure will store some of the state of the root BDS if the BDS
> tree is removed, so that state can be restored once a new BDS tree is
> inserted.

This is magic that is bound to bite us sooner or later. I see that we
have to do this in order to maintain compatibility, but imagine we'll
have multiple BlockBackends sitting on top of the same BDS. They'll be
fighting for whose throttling settings are applied. With this approach,
the winner is the BlockBackend that most recently went from no medium to
medium, which might just be the most surprising way to solve the
problem.

I'll detail one aspect of the problem below.

The correct way to solve this seems to be that each BB has its own I/O
throttling filter. Actually, if we could lift the throttling code to
BlockBackend, that might solve the problem.

I guess the same applies for the other fields in BlockBackendRootState.

> Signed-off-by: Max Reitz <mreitz@redhat.com>
> Reviewed-by: Eric Blake <eblake@redhat.com>
> ---
>  block/block-backend.c          | 37 +++++++++++++++++++++++++++++++++++++
>  include/block/block_int.h      | 10 ++++++++++
>  include/qemu/typedefs.h        |  1 +
>  include/sysemu/block-backend.h |  2 ++
>  4 files changed, 50 insertions(+)
> 
> diff --git a/block/block-backend.c b/block/block-backend.c
> index 1f2cd9b..9590be5 100644
> --- a/block/block-backend.c
> +++ b/block/block-backend.c
> @@ -13,6 +13,7 @@
>  #include "sysemu/block-backend.h"
>  #include "block/block_int.h"
>  #include "block/blockjob.h"
> +#include "block/throttle-groups.h"
>  #include "sysemu/blockdev.h"
>  #include "sysemu/sysemu.h"
>  #include "qapi-event.h"
> @@ -37,6 +38,10 @@ struct BlockBackend {
>      /* the block size for which the guest device expects atomicity */
>      int guest_block_size;
>  
> +    /* If the BDS tree is removed, some of its options are stored here (which
> +     * can be used to restore those options in the new BDS on insert) */
> +    BlockBackendRootState root_state;
> +
>      /* I/O stats (display with "info blockstats"). */
>      BlockAcctStats stats;
>  
> @@ -161,6 +166,7 @@ static void blk_delete(BlockBackend *blk)
>          bdrv_unref(blk->bs);
>          blk->bs = NULL;
>      }
> +    g_free(blk->root_state.throttle_group_name);
>      /* Avoid double-remove after blk_hide_on_behalf_of_hmp_drive_del() */
>      if (blk->name[0]) {
>          QTAILQ_REMOVE(&blk_backends, blk, link);
> @@ -1050,3 +1056,34 @@ int blk_probe_geometry(BlockBackend *blk, HDGeometry *geo)
>  {
>      return bdrv_probe_geometry(blk->bs, geo);
>  }
> +
> +/*
> + * Updates the BlockBackendRootState object with data from the currently
> + * attached BlockDriverState.
> + */
> +void blk_update_root_state(BlockBackend *blk)
> +{
> +    assert(blk->bs);
> +
> +    blk->root_state.open_flags    = blk->bs->open_flags;
> +    blk->root_state.read_only     = blk->bs->read_only;
> +    blk->root_state.detect_zeroes = blk->bs->detect_zeroes;
> +
> +    blk->root_state.io_limits_enabled = blk->bs->io_limits_enabled;
> +
> +    g_free(blk->root_state.throttle_group_name);
> +    if (blk->bs->throttle_state) {
> +        throttle_get_config(blk->bs->throttle_state,
> +                            &blk->root_state.throttle_config);
> +        blk->root_state.throttle_group_name =
> +            g_strdup(throttle_group_get_name(blk->bs));

The throttling group name is essentially a weak reference. I think
making it strong is worth considering.

If you attach a new BDS, if the old throttling group still exists, the
new BDS is re-added there, as expected. If the throttling group has got
changed limits meanwhile, you need to choose whether to apply the limits
saved here or use the new limits for the BDS. I guess the latter will be
the correct choice.

If the old throttling group doesn't exist any more, a new one is created
with the old name and limits. If two BlockBackends eject their media at
separate times and the throttle group goes away, reinstantiating the
throttle group by attaching a BDS to both BBs will use the config of the
BB that got a new medium first (as opposed to: the latest setting the
throttling group had before it was freed).

If the old throttling group was freed, but then a new throttling group
was created with the same name, we get attached to that new group with
its new limits. This could be slightly surprising as well.

Kevin
Max Reitz Sept. 22, 2015, 3 p.m. UTC | #2
On 22.09.2015 16:17, Kevin Wolf wrote:
> Am 18.09.2015 um 17:22 hat Max Reitz geschrieben:
>> This structure will store some of the state of the root BDS if the BDS
>> tree is removed, so that state can be restored once a new BDS tree is
>> inserted.
> 
> This is magic that is bound to bite us sooner or later. I see that we
> have to do this in order to maintain compatibility, but imagine we'll
> have multiple BlockBackends sitting on top of the same BDS. They'll be
> fighting for whose throttling settings are applied. With this approach,
> the winner is the BlockBackend that most recently went from no medium to
> medium, which might just be the most surprising way to solve the
> problem.

Well, my first answer to this is that as far as I remember and as far as
I can see right now, it's only qmp_blockdev_change_medium() that
accesses this structure. This is effectively a legacy command, even
though it is introduced by this series.

When using the "atomic" operations, this structure will not be used.

So let's examine what cases of multiple BBs on top of the same BDS we
can have, that are relevant here: One of the BBs always has to be the BB
belonging to a device such as a CD drive, otherwise we cannot invoke the
change command. The other BB(s) can be either:
(1) Also BBs belonging to CDD-like devices (let's call them "tray
    devices")
(2) Other BBs such as block job BBs or NBD BBs

In the first case, the scenario you described can indeed occur, but only
if the user is indeed using the change command. I'm very much inclined
to say it's the user's own fault for having multiple tray device BBs on
top of a single BDS and managing them with the change command. In any
case, with the current state of having throttling at the BDS level, we
are guarenteed to break something, at least.

In the second case, throttling will change for all the attached BBs
whenever the BDS is inserted into a throttled tray device. This seems
correct to me, at least as long as we have throttling on the BDS level.

> I'll detail one aspect of the problem below.
> 
> The correct way to solve this seems to be that each BB has its own I/O
> throttling filter. Actually, if we could lift the throttling code to
> BlockBackend, that might solve the problem.

So yes, as long as we have throttling on the BDS level, something may
always break when having multiple BBs per BDS, because you'd probably
want throttling to be on the BB level. But lifting that doesn't seem a
trivial task, and I don't really know whether I want this in this series.

Especially considering that right now you cannot have multiple BBs per BDS.

All in all: Yes, before we allow multiple BBs per BDS we want throttling
to be on the BB level. But this is nothing that should be done in or for
this series, since right now we do not allow multiple BBs per BDS.

> I guess the same applies for the other fields in BlockBackendRootState.
> 
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> Reviewed-by: Eric Blake <eblake@redhat.com>
>> ---
>>  block/block-backend.c          | 37 +++++++++++++++++++++++++++++++++++++
>>  include/block/block_int.h      | 10 ++++++++++
>>  include/qemu/typedefs.h        |  1 +
>>  include/sysemu/block-backend.h |  2 ++
>>  4 files changed, 50 insertions(+)
>>
>> diff --git a/block/block-backend.c b/block/block-backend.c
>> index 1f2cd9b..9590be5 100644
>> --- a/block/block-backend.c
>> +++ b/block/block-backend.c
>> @@ -13,6 +13,7 @@
>>  #include "sysemu/block-backend.h"
>>  #include "block/block_int.h"
>>  #include "block/blockjob.h"
>> +#include "block/throttle-groups.h"
>>  #include "sysemu/blockdev.h"
>>  #include "sysemu/sysemu.h"
>>  #include "qapi-event.h"
>> @@ -37,6 +38,10 @@ struct BlockBackend {
>>      /* the block size for which the guest device expects atomicity */
>>      int guest_block_size;
>>  
>> +    /* If the BDS tree is removed, some of its options are stored here (which
>> +     * can be used to restore those options in the new BDS on insert) */
>> +    BlockBackendRootState root_state;
>> +
>>      /* I/O stats (display with "info blockstats"). */
>>      BlockAcctStats stats;
>>  
>> @@ -161,6 +166,7 @@ static void blk_delete(BlockBackend *blk)
>>          bdrv_unref(blk->bs);
>>          blk->bs = NULL;
>>      }
>> +    g_free(blk->root_state.throttle_group_name);
>>      /* Avoid double-remove after blk_hide_on_behalf_of_hmp_drive_del() */
>>      if (blk->name[0]) {
>>          QTAILQ_REMOVE(&blk_backends, blk, link);
>> @@ -1050,3 +1056,34 @@ int blk_probe_geometry(BlockBackend *blk, HDGeometry *geo)
>>  {
>>      return bdrv_probe_geometry(blk->bs, geo);
>>  }
>> +
>> +/*
>> + * Updates the BlockBackendRootState object with data from the currently
>> + * attached BlockDriverState.
>> + */
>> +void blk_update_root_state(BlockBackend *blk)
>> +{
>> +    assert(blk->bs);
>> +
>> +    blk->root_state.open_flags    = blk->bs->open_flags;
>> +    blk->root_state.read_only     = blk->bs->read_only;
>> +    blk->root_state.detect_zeroes = blk->bs->detect_zeroes;
>> +
>> +    blk->root_state.io_limits_enabled = blk->bs->io_limits_enabled;
>> +
>> +    g_free(blk->root_state.throttle_group_name);
>> +    if (blk->bs->throttle_state) {
>> +        throttle_get_config(blk->bs->throttle_state,
>> +                            &blk->root_state.throttle_config);
>> +        blk->root_state.throttle_group_name =
>> +            g_strdup(throttle_group_get_name(blk->bs));
> 
> The throttling group name is essentially a weak reference. I think
> making it strong is worth considering.
> 
> If you attach a new BDS, if the old throttling group still exists, the
> new BDS is re-added there, as expected. If the throttling group has got
> changed limits meanwhile, you need to choose whether to apply the limits
> saved here or use the new limits for the BDS. I guess the latter will be
> the correct choice.
> 
> If the old throttling group doesn't exist any more, a new one is created
> with the old name and limits. If two BlockBackends eject their media at
> separate times and the throttle group goes away, reinstantiating the
> throttle group by attaching a BDS to both BBs will use the config of the
> BB that got a new medium first (as opposed to: the latest setting the
> throttling group had before it was freed).
> 
> If the old throttling group was freed, but then a new throttling group
> was created with the same name, we get attached to that new group with
> its new limits. This could be slightly surprising as well.

Yes, I'll see to making it a strong reference, and to fetching the
throttle config from the throttle group (probably by using
throttle_group_unregister_bs()+throttle_group_register_bs()
consecutively in qmp_blockdev_change_medium(), and dropping
BBRS.throttle_config).

Max
Alberto Garcia Sept. 29, 2015, 11:17 a.m. UTC | #3
On Tue 22 Sep 2015 05:00:05 PM CEST, Max Reitz wrote:

>> The correct way to solve this seems to be that each BB has its own
>> I/O throttling filter. Actually, if we could lift the throttling code
>> to BlockBackend, that might solve the problem.
>
> So yes, as long as we have throttling on the BDS level, something may
> always break when having multiple BBs per BDS, because you'd probably
> want throttling to be on the BB level. But lifting that doesn't seem a
> trivial task, and I don't really know whether I want this in this
> series.
>
> Especially considering that right now you cannot have multiple BBs per
> BDS.
>
> All in all: Yes, before we allow multiple BBs per BDS we want
> throttling to be on the BB level. But this is nothing that should be
> done in or for this series, since right now we do not allow multiple
> BBs per BDS.

I agree that it makes sense to move throttling to the BB level. It's
probably not trivial but I don't think it's too complex either. I can
give it a look once this series has been merged.

Berto
Eric Blake Sept. 29, 2015, 12:30 p.m. UTC | #4
On 09/29/2015 05:17 AM, Alberto Garcia wrote:
> On Tue 22 Sep 2015 05:00:05 PM CEST, Max Reitz wrote:
> 
>>> The correct way to solve this seems to be that each BB has its own
>>> I/O throttling filter. Actually, if we could lift the throttling code
>>> to BlockBackend, that might solve the problem.
>>
>> So yes, as long as we have throttling on the BDS level, something may
>> always break when having multiple BBs per BDS, because you'd probably
>> want throttling to be on the BB level. But lifting that doesn't seem a
>> trivial task, and I don't really know whether I want this in this
>> series.
>>
>> Especially considering that right now you cannot have multiple BBs per
>> BDS.
>>
>> All in all: Yes, before we allow multiple BBs per BDS we want
>> throttling to be on the BB level. But this is nothing that should be
>> done in or for this series, since right now we do not allow multiple
>> BBs per BDS.
> 
> I agree that it makes sense to move throttling to the BB level. It's
> probably not trivial but I don't think it's too complex either. I can
> give it a look once this series has been merged.

Ultimately, I think we want throttling at both levels.  I can make the
argument that in a cloud, a guest should not be able to consume more
than X resources (throttling at the BB, regardless of what BDS tree it
is associated with); but I can also argue that I may want to throttle
specific BDS (a backing chain with network backing file and local active
file: throttle the backing file to limit network traffic, and the guest
gets faster as it moves more local; or conversely, a common backing file
and one-off guest: throttle the active layer to do performance analysis
of the bottlenecks where the guest differs from the common backing layer).
Kevin Wolf Sept. 29, 2015, 1:12 p.m. UTC | #5
Am 29.09.2015 um 14:30 hat Eric Blake geschrieben:
> On 09/29/2015 05:17 AM, Alberto Garcia wrote:
> > On Tue 22 Sep 2015 05:00:05 PM CEST, Max Reitz wrote:
> > 
> >>> The correct way to solve this seems to be that each BB has its own
> >>> I/O throttling filter. Actually, if we could lift the throttling code
> >>> to BlockBackend, that might solve the problem.
> >>
> >> So yes, as long as we have throttling on the BDS level, something may
> >> always break when having multiple BBs per BDS, because you'd probably
> >> want throttling to be on the BB level. But lifting that doesn't seem a
> >> trivial task, and I don't really know whether I want this in this
> >> series.
> >>
> >> Especially considering that right now you cannot have multiple BBs per
> >> BDS.
> >>
> >> All in all: Yes, before we allow multiple BBs per BDS we want
> >> throttling to be on the BB level. But this is nothing that should be
> >> done in or for this series, since right now we do not allow multiple
> >> BBs per BDS.
> > 
> > I agree that it makes sense to move throttling to the BB level. It's
> > probably not trivial but I don't think it's too complex either. I can
> > give it a look once this series has been merged.
> 
> Ultimately, I think we want throttling at both levels.  I can make the
> argument that in a cloud, a guest should not be able to consume more
> than X resources (throttling at the BB, regardless of what BDS tree it
> is associated with); but I can also argue that I may want to throttle
> specific BDS (a backing chain with network backing file and local active
> file: throttle the backing file to limit network traffic, and the guest
> gets faster as it moves more local; or conversely, a common backing file
> and one-off guest: throttle the active layer to do performance analysis
> of the bottlenecks where the guest differs from the common backing layer).

Yes, but the throttling that we currently have is the one that belongs
into BlockBackend. This becomes very clear when you consider that the
throttling moves to the new top when you create live snapshots.

Throttling for individual BDSes should be done by a filter block driver.
In fact, I also suspect that the BB-level filtering should eventually be
done by a filter block driver, but that's something for which we don't
have a clear design yet.

Kevin
diff mbox

Patch

diff --git a/block/block-backend.c b/block/block-backend.c
index 1f2cd9b..9590be5 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -13,6 +13,7 @@ 
 #include "sysemu/block-backend.h"
 #include "block/block_int.h"
 #include "block/blockjob.h"
+#include "block/throttle-groups.h"
 #include "sysemu/blockdev.h"
 #include "sysemu/sysemu.h"
 #include "qapi-event.h"
@@ -37,6 +38,10 @@  struct BlockBackend {
     /* the block size for which the guest device expects atomicity */
     int guest_block_size;
 
+    /* If the BDS tree is removed, some of its options are stored here (which
+     * can be used to restore those options in the new BDS on insert) */
+    BlockBackendRootState root_state;
+
     /* I/O stats (display with "info blockstats"). */
     BlockAcctStats stats;
 
@@ -161,6 +166,7 @@  static void blk_delete(BlockBackend *blk)
         bdrv_unref(blk->bs);
         blk->bs = NULL;
     }
+    g_free(blk->root_state.throttle_group_name);
     /* Avoid double-remove after blk_hide_on_behalf_of_hmp_drive_del() */
     if (blk->name[0]) {
         QTAILQ_REMOVE(&blk_backends, blk, link);
@@ -1050,3 +1056,34 @@  int blk_probe_geometry(BlockBackend *blk, HDGeometry *geo)
 {
     return bdrv_probe_geometry(blk->bs, geo);
 }
+
+/*
+ * Updates the BlockBackendRootState object with data from the currently
+ * attached BlockDriverState.
+ */
+void blk_update_root_state(BlockBackend *blk)
+{
+    assert(blk->bs);
+
+    blk->root_state.open_flags    = blk->bs->open_flags;
+    blk->root_state.read_only     = blk->bs->read_only;
+    blk->root_state.detect_zeroes = blk->bs->detect_zeroes;
+
+    blk->root_state.io_limits_enabled = blk->bs->io_limits_enabled;
+
+    g_free(blk->root_state.throttle_group_name);
+    if (blk->bs->throttle_state) {
+        throttle_get_config(blk->bs->throttle_state,
+                            &blk->root_state.throttle_config);
+        blk->root_state.throttle_group_name =
+            g_strdup(throttle_group_get_name(blk->bs));
+    } else {
+        blk->root_state.throttle_config = (ThrottleConfig){};
+        blk->root_state.throttle_group_name = NULL;
+    }
+}
+
+BlockBackendRootState *blk_get_root_state(BlockBackend *blk)
+{
+    return &blk->root_state;
+}
diff --git a/include/block/block_int.h b/include/block/block_int.h
index bd35570..ca1eefa 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -449,6 +449,16 @@  struct BlockDriverState {
     NotifierWithReturn write_threshold_notifier;
 };
 
+struct BlockBackendRootState {
+    int open_flags;
+    bool read_only;
+    BlockdevDetectZeroesOptions detect_zeroes;
+
+    bool io_limits_enabled;
+    ThrottleConfig throttle_config;
+    char *throttle_group_name;
+};
+
 
 /* Essential block drivers which must always be statically linked into qemu, and
  * which therefore can be accessed without using bdrv_find_format() */
diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
index 97ac727..5105cc7 100644
--- a/include/qemu/typedefs.h
+++ b/include/qemu/typedefs.h
@@ -11,6 +11,7 @@  typedef struct AddressSpace AddressSpace;
 typedef struct AioContext AioContext;
 typedef struct AudioState AudioState;
 typedef struct BlockBackend BlockBackend;
+typedef struct BlockBackendRootState BlockBackendRootState;
 typedef struct BlockDriverState BlockDriverState;
 typedef struct BusClass BusClass;
 typedef struct BusState BusState;
diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index eafcef0..52e35a1 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -163,6 +163,8 @@  void blk_add_close_notifier(BlockBackend *blk, Notifier *notify);
 void blk_io_plug(BlockBackend *blk);
 void blk_io_unplug(BlockBackend *blk);
 BlockAcctStats *blk_get_stats(BlockBackend *blk);
+BlockBackendRootState *blk_get_root_state(BlockBackend *blk);
+void blk_update_root_state(BlockBackend *blk);
 
 void *blk_aio_get(const AIOCBInfo *aiocb_info, BlockBackend *blk,
                   BlockCompletionFunc *cb, void *opaque);