Message ID | 1442589793-7105-18-git-send-email-mreitz@redhat.com |
---|---|
State | New |
Headers | show |
Am 18.09.2015 um 17:22 hat Max Reitz geschrieben: > This structure will store some of the state of the root BDS if the BDS > tree is removed, so that state can be restored once a new BDS tree is > inserted. This is magic that is bound to bite us sooner or later. I see that we have to do this in order to maintain compatibility, but imagine we'll have multiple BlockBackends sitting on top of the same BDS. They'll be fighting for whose throttling settings are applied. With this approach, the winner is the BlockBackend that most recently went from no medium to medium, which might just be the most surprising way to solve the problem. I'll detail one aspect of the problem below. The correct way to solve this seems to be that each BB has its own I/O throttling filter. Actually, if we could lift the throttling code to BlockBackend, that might solve the problem. I guess the same applies for the other fields in BlockBackendRootState. > Signed-off-by: Max Reitz <mreitz@redhat.com> > Reviewed-by: Eric Blake <eblake@redhat.com> > --- > block/block-backend.c | 37 +++++++++++++++++++++++++++++++++++++ > include/block/block_int.h | 10 ++++++++++ > include/qemu/typedefs.h | 1 + > include/sysemu/block-backend.h | 2 ++ > 4 files changed, 50 insertions(+) > > diff --git a/block/block-backend.c b/block/block-backend.c > index 1f2cd9b..9590be5 100644 > --- a/block/block-backend.c > +++ b/block/block-backend.c > @@ -13,6 +13,7 @@ > #include "sysemu/block-backend.h" > #include "block/block_int.h" > #include "block/blockjob.h" > +#include "block/throttle-groups.h" > #include "sysemu/blockdev.h" > #include "sysemu/sysemu.h" > #include "qapi-event.h" > @@ -37,6 +38,10 @@ struct BlockBackend { > /* the block size for which the guest device expects atomicity */ > int guest_block_size; > > + /* If the BDS tree is removed, some of its options are stored here (which > + * can be used to restore those options in the new BDS on insert) */ > + BlockBackendRootState root_state; > + > /* I/O stats (display with "info blockstats"). */ > BlockAcctStats stats; > > @@ -161,6 +166,7 @@ static void blk_delete(BlockBackend *blk) > bdrv_unref(blk->bs); > blk->bs = NULL; > } > + g_free(blk->root_state.throttle_group_name); > /* Avoid double-remove after blk_hide_on_behalf_of_hmp_drive_del() */ > if (blk->name[0]) { > QTAILQ_REMOVE(&blk_backends, blk, link); > @@ -1050,3 +1056,34 @@ int blk_probe_geometry(BlockBackend *blk, HDGeometry *geo) > { > return bdrv_probe_geometry(blk->bs, geo); > } > + > +/* > + * Updates the BlockBackendRootState object with data from the currently > + * attached BlockDriverState. > + */ > +void blk_update_root_state(BlockBackend *blk) > +{ > + assert(blk->bs); > + > + blk->root_state.open_flags = blk->bs->open_flags; > + blk->root_state.read_only = blk->bs->read_only; > + blk->root_state.detect_zeroes = blk->bs->detect_zeroes; > + > + blk->root_state.io_limits_enabled = blk->bs->io_limits_enabled; > + > + g_free(blk->root_state.throttle_group_name); > + if (blk->bs->throttle_state) { > + throttle_get_config(blk->bs->throttle_state, > + &blk->root_state.throttle_config); > + blk->root_state.throttle_group_name = > + g_strdup(throttle_group_get_name(blk->bs)); The throttling group name is essentially a weak reference. I think making it strong is worth considering. If you attach a new BDS, if the old throttling group still exists, the new BDS is re-added there, as expected. If the throttling group has got changed limits meanwhile, you need to choose whether to apply the limits saved here or use the new limits for the BDS. I guess the latter will be the correct choice. If the old throttling group doesn't exist any more, a new one is created with the old name and limits. If two BlockBackends eject their media at separate times and the throttle group goes away, reinstantiating the throttle group by attaching a BDS to both BBs will use the config of the BB that got a new medium first (as opposed to: the latest setting the throttling group had before it was freed). If the old throttling group was freed, but then a new throttling group was created with the same name, we get attached to that new group with its new limits. This could be slightly surprising as well. Kevin
On 22.09.2015 16:17, Kevin Wolf wrote: > Am 18.09.2015 um 17:22 hat Max Reitz geschrieben: >> This structure will store some of the state of the root BDS if the BDS >> tree is removed, so that state can be restored once a new BDS tree is >> inserted. > > This is magic that is bound to bite us sooner or later. I see that we > have to do this in order to maintain compatibility, but imagine we'll > have multiple BlockBackends sitting on top of the same BDS. They'll be > fighting for whose throttling settings are applied. With this approach, > the winner is the BlockBackend that most recently went from no medium to > medium, which might just be the most surprising way to solve the > problem. Well, my first answer to this is that as far as I remember and as far as I can see right now, it's only qmp_blockdev_change_medium() that accesses this structure. This is effectively a legacy command, even though it is introduced by this series. When using the "atomic" operations, this structure will not be used. So let's examine what cases of multiple BBs on top of the same BDS we can have, that are relevant here: One of the BBs always has to be the BB belonging to a device such as a CD drive, otherwise we cannot invoke the change command. The other BB(s) can be either: (1) Also BBs belonging to CDD-like devices (let's call them "tray devices") (2) Other BBs such as block job BBs or NBD BBs In the first case, the scenario you described can indeed occur, but only if the user is indeed using the change command. I'm very much inclined to say it's the user's own fault for having multiple tray device BBs on top of a single BDS and managing them with the change command. In any case, with the current state of having throttling at the BDS level, we are guarenteed to break something, at least. In the second case, throttling will change for all the attached BBs whenever the BDS is inserted into a throttled tray device. This seems correct to me, at least as long as we have throttling on the BDS level. > I'll detail one aspect of the problem below. > > The correct way to solve this seems to be that each BB has its own I/O > throttling filter. Actually, if we could lift the throttling code to > BlockBackend, that might solve the problem. So yes, as long as we have throttling on the BDS level, something may always break when having multiple BBs per BDS, because you'd probably want throttling to be on the BB level. But lifting that doesn't seem a trivial task, and I don't really know whether I want this in this series. Especially considering that right now you cannot have multiple BBs per BDS. All in all: Yes, before we allow multiple BBs per BDS we want throttling to be on the BB level. But this is nothing that should be done in or for this series, since right now we do not allow multiple BBs per BDS. > I guess the same applies for the other fields in BlockBackendRootState. > >> Signed-off-by: Max Reitz <mreitz@redhat.com> >> Reviewed-by: Eric Blake <eblake@redhat.com> >> --- >> block/block-backend.c | 37 +++++++++++++++++++++++++++++++++++++ >> include/block/block_int.h | 10 ++++++++++ >> include/qemu/typedefs.h | 1 + >> include/sysemu/block-backend.h | 2 ++ >> 4 files changed, 50 insertions(+) >> >> diff --git a/block/block-backend.c b/block/block-backend.c >> index 1f2cd9b..9590be5 100644 >> --- a/block/block-backend.c >> +++ b/block/block-backend.c >> @@ -13,6 +13,7 @@ >> #include "sysemu/block-backend.h" >> #include "block/block_int.h" >> #include "block/blockjob.h" >> +#include "block/throttle-groups.h" >> #include "sysemu/blockdev.h" >> #include "sysemu/sysemu.h" >> #include "qapi-event.h" >> @@ -37,6 +38,10 @@ struct BlockBackend { >> /* the block size for which the guest device expects atomicity */ >> int guest_block_size; >> >> + /* If the BDS tree is removed, some of its options are stored here (which >> + * can be used to restore those options in the new BDS on insert) */ >> + BlockBackendRootState root_state; >> + >> /* I/O stats (display with "info blockstats"). */ >> BlockAcctStats stats; >> >> @@ -161,6 +166,7 @@ static void blk_delete(BlockBackend *blk) >> bdrv_unref(blk->bs); >> blk->bs = NULL; >> } >> + g_free(blk->root_state.throttle_group_name); >> /* Avoid double-remove after blk_hide_on_behalf_of_hmp_drive_del() */ >> if (blk->name[0]) { >> QTAILQ_REMOVE(&blk_backends, blk, link); >> @@ -1050,3 +1056,34 @@ int blk_probe_geometry(BlockBackend *blk, HDGeometry *geo) >> { >> return bdrv_probe_geometry(blk->bs, geo); >> } >> + >> +/* >> + * Updates the BlockBackendRootState object with data from the currently >> + * attached BlockDriverState. >> + */ >> +void blk_update_root_state(BlockBackend *blk) >> +{ >> + assert(blk->bs); >> + >> + blk->root_state.open_flags = blk->bs->open_flags; >> + blk->root_state.read_only = blk->bs->read_only; >> + blk->root_state.detect_zeroes = blk->bs->detect_zeroes; >> + >> + blk->root_state.io_limits_enabled = blk->bs->io_limits_enabled; >> + >> + g_free(blk->root_state.throttle_group_name); >> + if (blk->bs->throttle_state) { >> + throttle_get_config(blk->bs->throttle_state, >> + &blk->root_state.throttle_config); >> + blk->root_state.throttle_group_name = >> + g_strdup(throttle_group_get_name(blk->bs)); > > The throttling group name is essentially a weak reference. I think > making it strong is worth considering. > > If you attach a new BDS, if the old throttling group still exists, the > new BDS is re-added there, as expected. If the throttling group has got > changed limits meanwhile, you need to choose whether to apply the limits > saved here or use the new limits for the BDS. I guess the latter will be > the correct choice. > > If the old throttling group doesn't exist any more, a new one is created > with the old name and limits. If two BlockBackends eject their media at > separate times and the throttle group goes away, reinstantiating the > throttle group by attaching a BDS to both BBs will use the config of the > BB that got a new medium first (as opposed to: the latest setting the > throttling group had before it was freed). > > If the old throttling group was freed, but then a new throttling group > was created with the same name, we get attached to that new group with > its new limits. This could be slightly surprising as well. Yes, I'll see to making it a strong reference, and to fetching the throttle config from the throttle group (probably by using throttle_group_unregister_bs()+throttle_group_register_bs() consecutively in qmp_blockdev_change_medium(), and dropping BBRS.throttle_config). Max
On Tue 22 Sep 2015 05:00:05 PM CEST, Max Reitz wrote: >> The correct way to solve this seems to be that each BB has its own >> I/O throttling filter. Actually, if we could lift the throttling code >> to BlockBackend, that might solve the problem. > > So yes, as long as we have throttling on the BDS level, something may > always break when having multiple BBs per BDS, because you'd probably > want throttling to be on the BB level. But lifting that doesn't seem a > trivial task, and I don't really know whether I want this in this > series. > > Especially considering that right now you cannot have multiple BBs per > BDS. > > All in all: Yes, before we allow multiple BBs per BDS we want > throttling to be on the BB level. But this is nothing that should be > done in or for this series, since right now we do not allow multiple > BBs per BDS. I agree that it makes sense to move throttling to the BB level. It's probably not trivial but I don't think it's too complex either. I can give it a look once this series has been merged. Berto
On 09/29/2015 05:17 AM, Alberto Garcia wrote: > On Tue 22 Sep 2015 05:00:05 PM CEST, Max Reitz wrote: > >>> The correct way to solve this seems to be that each BB has its own >>> I/O throttling filter. Actually, if we could lift the throttling code >>> to BlockBackend, that might solve the problem. >> >> So yes, as long as we have throttling on the BDS level, something may >> always break when having multiple BBs per BDS, because you'd probably >> want throttling to be on the BB level. But lifting that doesn't seem a >> trivial task, and I don't really know whether I want this in this >> series. >> >> Especially considering that right now you cannot have multiple BBs per >> BDS. >> >> All in all: Yes, before we allow multiple BBs per BDS we want >> throttling to be on the BB level. But this is nothing that should be >> done in or for this series, since right now we do not allow multiple >> BBs per BDS. > > I agree that it makes sense to move throttling to the BB level. It's > probably not trivial but I don't think it's too complex either. I can > give it a look once this series has been merged. Ultimately, I think we want throttling at both levels. I can make the argument that in a cloud, a guest should not be able to consume more than X resources (throttling at the BB, regardless of what BDS tree it is associated with); but I can also argue that I may want to throttle specific BDS (a backing chain with network backing file and local active file: throttle the backing file to limit network traffic, and the guest gets faster as it moves more local; or conversely, a common backing file and one-off guest: throttle the active layer to do performance analysis of the bottlenecks where the guest differs from the common backing layer).
Am 29.09.2015 um 14:30 hat Eric Blake geschrieben: > On 09/29/2015 05:17 AM, Alberto Garcia wrote: > > On Tue 22 Sep 2015 05:00:05 PM CEST, Max Reitz wrote: > > > >>> The correct way to solve this seems to be that each BB has its own > >>> I/O throttling filter. Actually, if we could lift the throttling code > >>> to BlockBackend, that might solve the problem. > >> > >> So yes, as long as we have throttling on the BDS level, something may > >> always break when having multiple BBs per BDS, because you'd probably > >> want throttling to be on the BB level. But lifting that doesn't seem a > >> trivial task, and I don't really know whether I want this in this > >> series. > >> > >> Especially considering that right now you cannot have multiple BBs per > >> BDS. > >> > >> All in all: Yes, before we allow multiple BBs per BDS we want > >> throttling to be on the BB level. But this is nothing that should be > >> done in or for this series, since right now we do not allow multiple > >> BBs per BDS. > > > > I agree that it makes sense to move throttling to the BB level. It's > > probably not trivial but I don't think it's too complex either. I can > > give it a look once this series has been merged. > > Ultimately, I think we want throttling at both levels. I can make the > argument that in a cloud, a guest should not be able to consume more > than X resources (throttling at the BB, regardless of what BDS tree it > is associated with); but I can also argue that I may want to throttle > specific BDS (a backing chain with network backing file and local active > file: throttle the backing file to limit network traffic, and the guest > gets faster as it moves more local; or conversely, a common backing file > and one-off guest: throttle the active layer to do performance analysis > of the bottlenecks where the guest differs from the common backing layer). Yes, but the throttling that we currently have is the one that belongs into BlockBackend. This becomes very clear when you consider that the throttling moves to the new top when you create live snapshots. Throttling for individual BDSes should be done by a filter block driver. In fact, I also suspect that the BB-level filtering should eventually be done by a filter block driver, but that's something for which we don't have a clear design yet. Kevin
diff --git a/block/block-backend.c b/block/block-backend.c index 1f2cd9b..9590be5 100644 --- a/block/block-backend.c +++ b/block/block-backend.c @@ -13,6 +13,7 @@ #include "sysemu/block-backend.h" #include "block/block_int.h" #include "block/blockjob.h" +#include "block/throttle-groups.h" #include "sysemu/blockdev.h" #include "sysemu/sysemu.h" #include "qapi-event.h" @@ -37,6 +38,10 @@ struct BlockBackend { /* the block size for which the guest device expects atomicity */ int guest_block_size; + /* If the BDS tree is removed, some of its options are stored here (which + * can be used to restore those options in the new BDS on insert) */ + BlockBackendRootState root_state; + /* I/O stats (display with "info blockstats"). */ BlockAcctStats stats; @@ -161,6 +166,7 @@ static void blk_delete(BlockBackend *blk) bdrv_unref(blk->bs); blk->bs = NULL; } + g_free(blk->root_state.throttle_group_name); /* Avoid double-remove after blk_hide_on_behalf_of_hmp_drive_del() */ if (blk->name[0]) { QTAILQ_REMOVE(&blk_backends, blk, link); @@ -1050,3 +1056,34 @@ int blk_probe_geometry(BlockBackend *blk, HDGeometry *geo) { return bdrv_probe_geometry(blk->bs, geo); } + +/* + * Updates the BlockBackendRootState object with data from the currently + * attached BlockDriverState. + */ +void blk_update_root_state(BlockBackend *blk) +{ + assert(blk->bs); + + blk->root_state.open_flags = blk->bs->open_flags; + blk->root_state.read_only = blk->bs->read_only; + blk->root_state.detect_zeroes = blk->bs->detect_zeroes; + + blk->root_state.io_limits_enabled = blk->bs->io_limits_enabled; + + g_free(blk->root_state.throttle_group_name); + if (blk->bs->throttle_state) { + throttle_get_config(blk->bs->throttle_state, + &blk->root_state.throttle_config); + blk->root_state.throttle_group_name = + g_strdup(throttle_group_get_name(blk->bs)); + } else { + blk->root_state.throttle_config = (ThrottleConfig){}; + blk->root_state.throttle_group_name = NULL; + } +} + +BlockBackendRootState *blk_get_root_state(BlockBackend *blk) +{ + return &blk->root_state; +} diff --git a/include/block/block_int.h b/include/block/block_int.h index bd35570..ca1eefa 100644 --- a/include/block/block_int.h +++ b/include/block/block_int.h @@ -449,6 +449,16 @@ struct BlockDriverState { NotifierWithReturn write_threshold_notifier; }; +struct BlockBackendRootState { + int open_flags; + bool read_only; + BlockdevDetectZeroesOptions detect_zeroes; + + bool io_limits_enabled; + ThrottleConfig throttle_config; + char *throttle_group_name; +}; + /* Essential block drivers which must always be statically linked into qemu, and * which therefore can be accessed without using bdrv_find_format() */ diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h index 97ac727..5105cc7 100644 --- a/include/qemu/typedefs.h +++ b/include/qemu/typedefs.h @@ -11,6 +11,7 @@ typedef struct AddressSpace AddressSpace; typedef struct AioContext AioContext; typedef struct AudioState AudioState; typedef struct BlockBackend BlockBackend; +typedef struct BlockBackendRootState BlockBackendRootState; typedef struct BlockDriverState BlockDriverState; typedef struct BusClass BusClass; typedef struct BusState BusState; diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h index eafcef0..52e35a1 100644 --- a/include/sysemu/block-backend.h +++ b/include/sysemu/block-backend.h @@ -163,6 +163,8 @@ void blk_add_close_notifier(BlockBackend *blk, Notifier *notify); void blk_io_plug(BlockBackend *blk); void blk_io_unplug(BlockBackend *blk); BlockAcctStats *blk_get_stats(BlockBackend *blk); +BlockBackendRootState *blk_get_root_state(BlockBackend *blk); +void blk_update_root_state(BlockBackend *blk); void *blk_aio_get(const AIOCBInfo *aiocb_info, BlockBackend *blk, BlockCompletionFunc *cb, void *opaque);