diff mbox series

[v3,3/5] qcow2: Introduce an option for sufficient L2 cache for the entire image

Message ID 20180724200343.13733-4-lbloch@janustech.com
State New
Headers show
Series [v3,1/5,for-3.0] A grammar fix | expand

Commit Message

Leonid Bloch July 24, 2018, 8:03 p.m. UTC
An option "l2-cache-full" is introduced to automatically set the qcow2
L2 cache to a sufficient value for covering the entire image. The memory
overhead when using this option is not big (1 MB for each 8 GB of
virtual image size with the default cluster size) and it can noticeably
improve performance when using large images with frequent I/O.
Previously, for this functionality the correct L2 cache size needed to
be calculated manually or with a script, and then this size needed to be
passed to the "l2-cache-size" option. Now it is sufficient to just pass
the boolean "l2-cache-full" option.

Signed-off-by: Leonid Bloch <lbloch@janustech.com>
---
 block/qcow2.c        | 35 ++++++++++++++++++++++++++++-------
 block/qcow2.h        |  1 +
 qapi/block-core.json |  8 +++++++-
 qemu-options.hx      |  6 +++++-
 4 files changed, 41 insertions(+), 9 deletions(-)

Comments

Kevin Wolf July 25, 2018, 8:26 a.m. UTC | #1
Am 24.07.2018 um 22:03 hat Leonid Bloch geschrieben:
> An option "l2-cache-full" is introduced to automatically set the qcow2
> L2 cache to a sufficient value for covering the entire image. The memory
> overhead when using this option is not big (1 MB for each 8 GB of
> virtual image size with the default cluster size) and it can noticeably
> improve performance when using large images with frequent I/O.
> Previously, for this functionality the correct L2 cache size needed to
> be calculated manually or with a script, and then this size needed to be
> passed to the "l2-cache-size" option. Now it is sufficient to just pass
> the boolean "l2-cache-full" option.
> 
> Signed-off-by: Leonid Bloch <lbloch@janustech.com>

> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index d40d5ecc3b..c584059e23 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -2812,7 +2812,12 @@
>  #                         refcount block caches in bytes (since 2.2)
>  #
>  # @l2-cache-size:         the maximum size of the L2 table cache in
> -#                         bytes (since 2.2)
> +#                         bytes (mutually exclusive with l2-cache-full)
> +#                         (since 2.2)
> +#
> +# @l2-cache-full:         make the L2 table cache large enough to cover the
> +#                         entire image (mutually exclusive with l2-cache-size)
> +#                         (since 3.1)
>  #
>  # @l2-cache-entry-size:   the size of each entry in the L2 cache in
>  #                         bytes. It must be a power of two between 512
> @@ -2840,6 +2845,7 @@
>              '*overlap-check': 'Qcow2OverlapChecks',
>              '*cache-size': 'int',
>              '*l2-cache-size': 'int',
> +            '*l2-cache-full': 'bool',
>              '*l2-cache-entry-size': 'int',
>              '*refcount-cache-size': 'int',
>              '*cache-clean-interval': 'int',

Only looking at the external interface for now, I wonder whether it
would be nicer not to have two mutually exclusive options, but to make
l2-cache-size an alternate that can take either an int like before
(meaning the number of bytes) or a string/enum (with the only accepted
value "full" for now).

Another interesting question is whether 'full' shouldn't keep meaning
full throughout the lifetime of the BlockDriverState, i.e. should it
keep adapting to the new size when the image size changes?

Kevin
Eric Blake July 25, 2018, 12:22 p.m. UTC | #2
On 07/25/2018 03:26 AM, Kevin Wolf wrote:

>> @@ -2840,6 +2845,7 @@
>>               '*overlap-check': 'Qcow2OverlapChecks',
>>               '*cache-size': 'int',
>>               '*l2-cache-size': 'int',
>> +            '*l2-cache-full': 'bool',
>>               '*l2-cache-entry-size': 'int',
>>               '*refcount-cache-size': 'int',
>>               '*cache-clean-interval': 'int',
> 
> Only looking at the external interface for now, I wonder whether it
> would be nicer not to have two mutually exclusive options, but to make
> l2-cache-size an alternate that can take either an int like before
> (meaning the number of bytes) or a string/enum (with the only accepted
> value "full" for now).

That does sound interesting.

> 
> Another interesting question is whether 'full' shouldn't keep meaning
> full throughout the lifetime of the BlockDriverState, i.e. should it
> keep adapting to the new size when the image size changes?

Do we even resize the cache now for image size changes? If we use an 
enum, we could have two different values depending on whether the chosen 
cache size remains fixed or also tries to resize when the image grows.
Leonid Bloch July 25, 2018, 12:32 p.m. UTC | #3
On 07/25/2018 03:22 PM, Eric Blake wrote:

     On 07/25/2018 03:26 AM, Kevin Wolf wrote:

     Only looking at the external interface for now, I wonder whether it
     would be nicer not to have two mutually exclusive options, but to
     make
     l2-cache-size an alternate that can take either an int like before
     (meaning the number of bytes) or a string/enum (with the only
     accepted
     value "full" for now).

     That does sound interesting.

   This does, but currently QEMU supports QEMU_OPT_STRING, QEMU_OPT_BOOL,
   QEMU_OPT_NUMBER, and QEMU_OPT_SIZE. Looks like it will require a more
   fundamental change to accept an option that can be either a string or a
   size.

     Another interesting question is whether 'full' shouldn't keep
     meaning
     full throughout the lifetime of the BlockDriverState, i.e. should it
     keep adapting to the new size when the image size changes?

     Do we even resize the cache now for image size changes? If we use an
     enum, we could have two different values depending on whether the
     chosen cache size remains fixed or also tries to resize when the
     image grows.

   Is it even possible to change the virtual disk image size online?
   Found a problem with my previous patch: the property was not actually
   set as a proper boolean option. Also, fixing the error output in iotest
   103 (thanks Kevin for the catch!). V5 is on the way.
Kevin Wolf July 25, 2018, 1:32 p.m. UTC | #4
Am 25.07.2018 um 14:32 hat Leonid Bloch geschrieben:
> On 07/25/2018 03:22 PM, Eric Blake wrote:
> 
>     On 07/25/2018 03:26 AM, Kevin Wolf wrote:
> 
>         Only looking at the external interface for now, I wonder whether it
>         would be nicer not to have two mutually exclusive options, but to make
>         l2-cache-size an alternate that can take either an int like before
>         (meaning the number of bytes) or a string/enum (with the only accepted
>         value "full" for now).
> 
>     That does sound interesting.
> 
> This does, but currently QEMU supports QEMU_OPT_STRING, QEMU_OPT_BOOL,
> QEMU_OPT_NUMBER, and QEMU_OPT_SIZE. Looks like it will require a more
> fundamental change to accept an option that can be either a string or a size.

Hm, yes, good point. We wouldn't be able to parse the options purely
with QemuOpts any more. So we would have to manually check for 'full' in
the QDict before calling qemu_opts_absorb_qdict(). If it's there, we
would have to process it and then delete it from the QDict before we
feed the QDict to qemu_opts_absorb_qdict(), which would only accept a
number there. A bit ugly, but should be workable.

Maybe this is really the time that we should convert qcow2 to use the
QAPI types anyway, like some of the protocol drivers do internally now.
Obviously, this is out of scope for this series, but it gives a
perspective for how to get rid of the ugliness again.

>         Another interesting question is whether 'full' shouldn't keep meaning
>         full throughout the lifetime of the BlockDriverState, i.e. should it
>         keep adapting to the new size when the image size changes?
> 
> 
>     Do we even resize the cache now for image size changes? If we use an enum,
>     we could have two different values depending on whether the chosen cache
>     size remains fixed or also tries to resize when the image grows.

We don't because we only support absolute cache sizes today. 'full'
would be the first one that is relative to the image size.

> Is it even possible to change the virtual disk image size online?

Yes, this is what qcow2_co_truncate() does (can be invoked, amongst
others, with the QMP command 'block_resize').

> Found a problem with my previous patch: the property was not actually set as a
> proper boolean option. Also, fixing the error output in iotest 103 (thanks
> Kevin for the catch!). V5 is on the way.

Maybe give the alternate thing a try with v5, as everyone seems to agree
that it's a nicer interface if we can make it work.

Also, a meta-comment: Leonid, would you mind sending plain text emails
instead of HTML-only?

Kevin
Leonid Bloch July 25, 2018, 3:23 p.m. UTC | #5
On 07/25/2018 04:32 PM, Kevin Wolf wrote:
> Am 25.07.2018 um 14:32 hat Leonid Bloch geschrieben:
>> On 07/25/2018 03:22 PM, Eric Blake wrote:
>>
>>      On 07/25/2018 03:26 AM, Kevin Wolf wrote:
>>
>>          Only looking at the external interface for now, I wonder whether it
>>          would be nicer not to have two mutually exclusive options, but to make
>>          l2-cache-size an alternate that can take either an int like before
>>          (meaning the number of bytes) or a string/enum (with the only accepted
>>          value "full" for now).
>>
>>      That does sound interesting.
>>
>> This does, but currently QEMU supports QEMU_OPT_STRING, QEMU_OPT_BOOL,
>> QEMU_OPT_NUMBER, and QEMU_OPT_SIZE. Looks like it will require a more
>> fundamental change to accept an option that can be either a string or a size.
> 
> Hm, yes, good point. We wouldn't be able to parse the options purely
> with QemuOpts any more. So we would have to manually check for 'full' in
> the QDict before calling qemu_opts_absorb_qdict(). If it's there, we
> would have to process it and then delete it from the QDict before we
> feed the QDict to qemu_opts_absorb_qdict(), which would only accept a
> number there. A bit ugly, but should be workable.
> 
> Maybe this is really the time that we should convert qcow2 to use the
> QAPI types anyway, like some of the protocol drivers do internally now.
> Obviously, this is out of scope for this series, but it gives a
> perspective for how to get rid of the ugliness again.

I need to look into that. Thanks for the idea. But indeed looks like out 
of scope for this series.

> 
>>          Another interesting question is whether 'full' shouldn't keep meaning
>>          full throughout the lifetime of the BlockDriverState, i.e. should it
>>          keep adapting to the new size when the image size changes?
>>
>>
>>      Do we even resize the cache now for image size changes? If we use an enum,
>>      we could have two different values depending on whether the chosen cache
>>      size remains fixed or also tries to resize when the image grows.
> 
> We don't because we only support absolute cache sizes today. 'full'
> would be the first one that is relative to the image size.
> 
>> Is it even possible to change the virtual disk image size online?
> 
> Yes, this is what qcow2_co_truncate() does (can be invoked, amongst
> others, with the QMP command 'block_resize').

Cool! This does look like a good idea to resize the L2 cache 
accordingly, but maybe this is out of scope for now as well? The purpose 
of the current series is just to provide an option to automatically 
calculate the needed L2 cache size for covering the entire image, 
instead of using an external script to do that and feed the output to 
l2-cache-size.

> 
>> Found a problem with my previous patch: the property was not actually set as a
>> proper boolean option. Also, fixing the error output in iotest 103 (thanks
>> Kevin for the catch!). V5 is on the way.
> 
> Maybe give the alternate thing a try with v5, as everyone seems to agree
> that it's a nicer interface if we can make it work.

You mean with QDict? I'll look into that now. But already sent v5 before 
reading this email.

> 
> Also, a meta-comment: Leonid, would you mind sending plain text emails
> instead of HTML-only?

Sure, Kevin! Sorry, I thought Thunderbird as configured here is sending 
plain text. I was wrong. Now it should be fine. Thanks for the remark.

Leonid.

> 
> Kevin
>
Kevin Wolf July 25, 2018, 3:53 p.m. UTC | #6
Am 25.07.2018 um 17:23 hat Leonid Bloch geschrieben:
> On 07/25/2018 04:32 PM, Kevin Wolf wrote:
> > Am 25.07.2018 um 14:32 hat Leonid Bloch geschrieben:
> > > On 07/25/2018 03:22 PM, Eric Blake wrote:
> > > 
> > >      On 07/25/2018 03:26 AM, Kevin Wolf wrote:
> > > 
> > >          Only looking at the external interface for now, I wonder whether it
> > >          would be nicer not to have two mutually exclusive options, but to make
> > >          l2-cache-size an alternate that can take either an int like before
> > >          (meaning the number of bytes) or a string/enum (with the only accepted
> > >          value "full" for now).
> > > 
> > >      That does sound interesting.
> > > 
> > > This does, but currently QEMU supports QEMU_OPT_STRING, QEMU_OPT_BOOL,
> > > QEMU_OPT_NUMBER, and QEMU_OPT_SIZE. Looks like it will require a more
> > > fundamental change to accept an option that can be either a string or a size.
> > 
> > Hm, yes, good point. We wouldn't be able to parse the options purely
> > with QemuOpts any more. So we would have to manually check for 'full' in
> > the QDict before calling qemu_opts_absorb_qdict(). If it's there, we
> > would have to process it and then delete it from the QDict before we
> > feed the QDict to qemu_opts_absorb_qdict(), which would only accept a
> > number there. A bit ugly, but should be workable.
> > 
> > Maybe this is really the time that we should convert qcow2 to use the
> > QAPI types anyway, like some of the protocol drivers do internally now.
> > Obviously, this is out of scope for this series, but it gives a
> > perspective for how to get rid of the ugliness again.
> 
> I need to look into that. Thanks for the idea. But indeed looks like
> out of scope for this series.

QAPIfication of .bdrv_open() is a long-term goal for all block drivers
anyway, so if you're really interested in this, any work towards it is
welcome. But it definitely won't be needed to get the cache size thing
in. I only mentioned it to show that we wouldn't be stuck with the
direct QDict access forever.

> > >          Another interesting question is whether 'full' shouldn't keep meaning
> > >          full throughout the lifetime of the BlockDriverState, i.e. should it
> > >          keep adapting to the new size when the image size changes?
> > > 
> > > 
> > >      Do we even resize the cache now for image size changes? If we use an enum,
> > >      we could have two different values depending on whether the chosen cache
> > >      size remains fixed or also tries to resize when the image grows.
> > 
> > We don't because we only support absolute cache sizes today. 'full'
> > would be the first one that is relative to the image size.
> > 
> > > Is it even possible to change the virtual disk image size online?
> > 
> > Yes, this is what qcow2_co_truncate() does (can be invoked, amongst
> > others, with the QMP command 'block_resize').
> 
> Cool! This does look like a good idea to resize the L2 cache accordingly,
> but maybe this is out of scope for now as well? The purpose of the current
> series is just to provide an option to automatically calculate the needed L2
> cache size for covering the entire image, instead of using an external
> script to do that and feed the output to l2-cache-size.

I'm not sure. Probably both ways can be defended, but if I as a user
said l2-cache-size=full, and then resized my 10 GB image to 20 GB, I
think I would expect that the cache still covers the whole image and not
only half of it. I never said I wanted enough cache for 10 GB, but I
said that I wanted enough cache to cover the whole image.

The thing is that once we decide not to resize the cache yet (and
document the option accordingly), it becomes ABI and we can't change
that later any more. We would have to introduce another option that
enables adaption to changed image sizes. As I don't see a reason why you
could want l2-cache-size=full, but no adaption, I'd prefer to implement
it right away.

And I think it might be as simple as adding something like this at the
end of qcow2_co_truncate():

    /* Update cache size for l2-cache-size=full */
    qcow2_update_options(bs, bs->options, s->flags, NULL);

I'm ignoring errors here because I think the resize was still successful
if we couldn't update the cache size, but that could be discussed.

> > > Found a problem with my previous patch: the property was not actually set as a
> > > proper boolean option. Also, fixing the error output in iotest 103 (thanks
> > > Kevin for the catch!). V5 is on the way.
> > 
> > Maybe give the alternate thing a try with v5, as everyone seems to agree
> > that it's a nicer interface if we can make it work.
> 
> You mean with QDict? I'll look into that now. But already sent v5 before
> reading this email.

Yes, with reading it from the QDict. (Or whatever the simplest way is
that results in the right external interface, but I suppose this is the
one.)

> > Also, a meta-comment: Leonid, would you mind sending plain text emails
> > instead of HTML-only?
> 
> Sure, Kevin! Sorry, I thought Thunderbird as configured here is sending
> plain text. I was wrong. Now it should be fine. Thanks for the remark.

Thanks, that looks better!

Kevin
Daniel P. Berrangé July 25, 2018, 3:59 p.m. UTC | #7
On Wed, Jul 25, 2018 at 06:23:45PM +0300, Leonid Bloch wrote:
> On 07/25/2018 04:32 PM, Kevin Wolf wrote:
> > >          Another interesting question is whether 'full' shouldn't keep meaning
> > >          full throughout the lifetime of the BlockDriverState, i.e. should it
> > >          keep adapting to the new size when the image size changes?
> > > 
> > > 
> > >      Do we even resize the cache now for image size changes? If we use an enum,
> > >      we could have two different values depending on whether the chosen cache
> > >      size remains fixed or also tries to resize when the image grows.
> > 
> > We don't because we only support absolute cache sizes today. 'full'
> > would be the first one that is relative to the image size.
> > 
> > > Is it even possible to change the virtual disk image size online?
> > 
> > Yes, this is what qcow2_co_truncate() does (can be invoked, amongst
> > others, with the QMP command 'block_resize').
> 
> Cool! This does look like a good idea to resize the L2 cache accordingly,
> but maybe this is out of scope for now as well? The purpose of the current
> series is just to provide an option to automatically calculate the needed L2
> cache size for covering the entire image, instead of using an external
> script to do that and feed the output to l2-cache-size.

Personally if I saw the description of the option and then found it didn't
"do the right thing" when resizing I'd consider the option broken. So I
think we should make it deal with resizes right away rather than implementing
a known bug.


Regards,
Daniel
Leonid Bloch July 26, 2018, 12:24 p.m. UTC | #8
>> You mean with QDict? I'll look into that now. But already sent v5 before
>> reading this email.
> 
> Yes, with reading it from the QDict. (Or whatever the simplest way is
> that results in the right external interface, but I suppose this is the
> one.)

Well, there is a problem with that: I can easily isolate
l2-cache-size from QDict, check if it is "full", and if it is - do 
whatever is needed, and delete this option before parsing. But what if 
it is "foo"? It will not get deleted, and the regular QEMU_OPT_SIZE 
parsing error will appear, stating that l2-cache-size "expects a 
non-negative number..." - no word about that it can expect "full" as 
well. Now, one can try to modify local_err->msg for this particular 
option, but this will require substantial additional logic. I think 
considering this, it would be easier to stick with a dedicated option, 
l2-cache-full.

Do you think there is a smarter way to parse the l2-cache-size option, 
so it would accept both size and "full", while handling errors 
correctly? It seems more elegant to have a single option, but the 
internal handling will be more elegant and simpler with two mutually 
exclusive options.

By the way, the L2 cache resizes now on image resize. Will send the 
changes in v6. Thanks for the suggestion!

Leonid.
Kevin Wolf July 26, 2018, 2:42 p.m. UTC | #9
Am 26.07.2018 um 14:24 hat Leonid Bloch geschrieben:
> > > You mean with QDict? I'll look into that now. But already sent v5 before
> > > reading this email.
> > 
> > Yes, with reading it from the QDict. (Or whatever the simplest way is
> > that results in the right external interface, but I suppose this is the
> > one.)
> 
> Well, there is a problem with that: I can easily isolate
> l2-cache-size from QDict, check if it is "full", and if it is - do whatever
> is needed, and delete this option before parsing. But what if it is "foo"?
> It will not get deleted, and the regular QEMU_OPT_SIZE parsing error will
> appear, stating that l2-cache-size "expects a non-negative number..." - no
> word about that it can expect "full" as well. Now, one can try to modify
> local_err->msg for this particular option, but this will require substantial
> additional logic. I think considering this, it would be easier to stick with
> a dedicated option, l2-cache-full.
> 
> Do you think there is a smarter way to parse the l2-cache-size option, so it
> would accept both size and "full", while handling errors correctly? It seems
> more elegant to have a single option, but the internal handling will be more
> elegant and simpler with two mutually exclusive options.

I think we can live with the suboptimal error message for a while. Once
qcow2 is QAPIfied, it should become easy to improve it. Let's not choose
a worse design (that stays forever) for a temporarily better error
message.

> By the way, the L2 cache resizes now on image resize. Will send the changes
> in v6. Thanks for the suggestion!

Sounds good!

Kevin
Leonid Bloch July 26, 2018, 2:50 p.m. UTC | #10
On 07/26/2018 05:42 PM, Kevin Wolf wrote:
> Am 26.07.2018 um 14:24 hat Leonid Bloch geschrieben:
>>>> You mean with QDict? I'll look into that now. But already sent v5 before
>>>> reading this email.
>>>
>>> Yes, with reading it from the QDict. (Or whatever the simplest way is
>>> that results in the right external interface, but I suppose this is the
>>> one.)
>>
>> Well, there is a problem with that: I can easily isolate
>> l2-cache-size from QDict, check if it is "full", and if it is - do whatever
>> is needed, and delete this option before parsing. But what if it is "foo"?
>> It will not get deleted, and the regular QEMU_OPT_SIZE parsing error will
>> appear, stating that l2-cache-size "expects a non-negative number..." - no
>> word about that it can expect "full" as well. Now, one can try to modify
>> local_err->msg for this particular option, but this will require substantial
>> additional logic. I think considering this, it would be easier to stick with
>> a dedicated option, l2-cache-full.
>>
>> Do you think there is a smarter way to parse the l2-cache-size option, so it
>> would accept both size and "full", while handling errors correctly? It seems
>> more elegant to have a single option, but the internal handling will be more
>> elegant and simpler with two mutually exclusive options.
> 
> I think we can live with the suboptimal error message for a while. Once
> qcow2 is QAPIfied, it should become easy to improve it. Let's not choose
> a worse design (that stays forever) for a temporarily better error
> message.

OK. I'll add a TODO then.

> 
>> By the way, the L2 cache resizes now on image resize. Will send the changes
>> in v6. Thanks for the suggestion!
> 
> Sounds good!
> 
> Kevin
>
Leonid Bloch July 26, 2018, 7:43 p.m. UTC | #11
On 07/26/2018 05:50 PM, Leonid Bloch wrote:
> On 07/26/2018 05:42 PM, Kevin Wolf wrote:
>> Am 26.07.2018 um 14:24 hat Leonid Bloch geschrieben:
>>>>> You mean with QDict? I'll look into that now. But already sent v5 
>>>>> before
>>>>> reading this email.
>>>>
>>>> Yes, with reading it from the QDict. (Or whatever the simplest way is
>>>> that results in the right external interface, but I suppose this is the
>>>> one.)
>>>
>>> Well, there is a problem with that: I can easily isolate
>>> l2-cache-size from QDict, check if it is "full", and if it is - do 
>>> whatever
>>> is needed, and delete this option before parsing. But what if it is 
>>> "foo"?
>>> It will not get deleted, and the regular QEMU_OPT_SIZE parsing error 
>>> will
>>> appear, stating that l2-cache-size "expects a non-negative number..." 
>>> - no
>>> word about that it can expect "full" as well. Now, one can try to modify
>>> local_err->msg for this particular option, but this will require 
>>> substantial
>>> additional logic. I think considering this, it would be easier to 
>>> stick with
>>> a dedicated option, l2-cache-full.
>>>
>>> Do you think there is a smarter way to parse the l2-cache-size 
>>> option, so it
>>> would accept both size and "full", while handling errors correctly? 
>>> It seems
>>> more elegant to have a single option, but the internal handling will 
>>> be more
>>> elegant and simpler with two mutually exclusive options.
>>
>> I think we can live with the suboptimal error message for a while. Once
>> qcow2 is QAPIfied, it should become easy to improve it. Let's not choose
>> a worse design (that stays forever) for a temporarily better error
>> message.
> 
> OK. I'll add a TODO then.

Another problem without a dedicated option, is that if l2-cache-size is 
processed and deleted, it can not be read again when needed for 
resizing. Without some *extremely* dirty tricks, that is.

Can QAPI be the solution? I've seen some examples, and it looks like the 
qcow2 driver already uses QAPI to some extent - I mean all the qdict 
stuff is from QAPI, no? Can you please point me to an example of how 
QAPI can solve the issue of an option that can accept both a size and a 
string?

Thanks,
Leonid.

>>
>>> By the way, the L2 cache resizes now on image resize. Will send the 
>>> changes
>>> in v6. Thanks for the suggestion!
>>
>> Sounds good!
>>
>> Kevin
>>
Kevin Wolf July 26, 2018, 8:19 p.m. UTC | #12
Am 26.07.2018 um 21:43 hat Leonid Bloch geschrieben:
> On 07/26/2018 05:50 PM, Leonid Bloch wrote:
> > On 07/26/2018 05:42 PM, Kevin Wolf wrote:
> > > Am 26.07.2018 um 14:24 hat Leonid Bloch geschrieben:
> > > > > > You mean with QDict? I'll look into that now. But
> > > > > > already sent v5 before
> > > > > > reading this email.
> > > > > 
> > > > > Yes, with reading it from the QDict. (Or whatever the simplest way is
> > > > > that results in the right external interface, but I suppose this is the
> > > > > one.)
> > > > 
> > > > Well, there is a problem with that: I can easily isolate
> > > > l2-cache-size from QDict, check if it is "full", and if it is -
> > > > do whatever
> > > > is needed, and delete this option before parsing. But what if it
> > > > is "foo"?
> > > > It will not get deleted, and the regular QEMU_OPT_SIZE parsing
> > > > error will
> > > > appear, stating that l2-cache-size "expects a non-negative
> > > > number..." - no
> > > > word about that it can expect "full" as well. Now, one can try to modify
> > > > local_err->msg for this particular option, but this will require
> > > > substantial
> > > > additional logic. I think considering this, it would be easier
> > > > to stick with
> > > > a dedicated option, l2-cache-full.
> > > > 
> > > > Do you think there is a smarter way to parse the l2-cache-size
> > > > option, so it
> > > > would accept both size and "full", while handling errors
> > > > correctly? It seems
> > > > more elegant to have a single option, but the internal handling
> > > > will be more
> > > > elegant and simpler with two mutually exclusive options.
> > > 
> > > I think we can live with the suboptimal error message for a while. Once
> > > qcow2 is QAPIfied, it should become easy to improve it. Let's not choose
> > > a worse design (that stays forever) for a temporarily better error
> > > message.
> > 
> > OK. I'll add a TODO then.
> 
> Another problem without a dedicated option, is that if l2-cache-size is
> processed and deleted, it can not be read again when needed for resizing.
> Without some *extremely* dirty tricks, that is.

Are you sure? The way that the .bdrv_open() callbacks work, _all_
options that are processed are removed from the QDict. The difference is
just whether qemu_opts_absorb_qdict() removes them or you do that
manually. In the end, if options are left in the QDict, the block layer
returns an error because that means that an unknown option was
specified.

I suppose you use bs->options while resizing. This is a copy of the
QDict which is not affected by the removal of processed options.

> Can QAPI be the solution? I've seen some examples, and it looks like the
> qcow2 driver already uses QAPI to some extent - I mean all the qdict stuff
> is from QAPI, no? Can you please point me to an example of how QAPI can
> solve the issue of an option that can accept both a size and a string?

By using QAPI I mean using types like BlockdevOptionsQcow2. These types
are directly generated from the JSON schema, where it's possible to
specify an alternate of a string and an integer.

One example for an alternate in the JSON schema is BlockdevRef, where
you can give a string (to reference a node-name) or an inline
declaration of a block device. In JSON, it looks like this:

    { 'alternate': 'BlockdevRef',
      'data': { 'definition': 'BlockdevOptions',
                'reference': 'str' } }

And the generated C type for it is:

    struct BlockdevRef {
        QType type;
        union { /* union tag is @type */
            BlockdevOptions definition;
            char *reference;
        } u;
    };

In our case, we would get a union of int64_t and char* instead, and
still the QType type as a discriminator.

Kevin
Leonid Bloch July 26, 2018, 9:08 p.m. UTC | #13
On July 26, 2018 11:19:03 PM EEST, Kevin Wolf <kwolf@redhat.com> wrote:
>Am 26.07.2018 um 21:43 hat Leonid Bloch geschrieben:
>> On 07/26/2018 05:50 PM, Leonid Bloch wrote:
>> > On 07/26/2018 05:42 PM, Kevin Wolf wrote:
>> > > Am 26.07.2018 um 14:24 hat Leonid Bloch geschrieben:
>> > > > > > You mean with QDict? I'll look into that now. But
>> > > > > > already sent v5 before
>> > > > > > reading this email.
>> > > > > 
>> > > > > Yes, with reading it from the QDict. (Or whatever the
>simplest way is
>> > > > > that results in the right external interface, but I suppose
>this is the
>> > > > > one.)
>> > > > 
>> > > > Well, there is a problem with that: I can easily isolate
>> > > > l2-cache-size from QDict, check if it is "full", and if it is -
>> > > > do whatever
>> > > > is needed, and delete this option before parsing. But what if
>it
>> > > > is "foo"?
>> > > > It will not get deleted, and the regular QEMU_OPT_SIZE parsing
>> > > > error will
>> > > > appear, stating that l2-cache-size "expects a non-negative
>> > > > number..." - no
>> > > > word about that it can expect "full" as well. Now, one can try
>to modify
>> > > > local_err->msg for this particular option, but this will
>require
>> > > > substantial
>> > > > additional logic. I think considering this, it would be easier
>> > > > to stick with
>> > > > a dedicated option, l2-cache-full.
>> > > > 
>> > > > Do you think there is a smarter way to parse the l2-cache-size
>> > > > option, so it
>> > > > would accept both size and "full", while handling errors
>> > > > correctly? It seems
>> > > > more elegant to have a single option, but the internal handling
>> > > > will be more
>> > > > elegant and simpler with two mutually exclusive options.
>> > > 
>> > > I think we can live with the suboptimal error message for a
>while. Once
>> > > qcow2 is QAPIfied, it should become easy to improve it. Let's not
>choose
>> > > a worse design (that stays forever) for a temporarily better
>error
>> > > message.
>> > 
>> > OK. I'll add a TODO then.
>> 
>> Another problem without a dedicated option, is that if l2-cache-size
>is
>> processed and deleted, it can not be read again when needed for
>resizing.
>> Without some *extremely* dirty tricks, that is.
>
>Are you sure? The way that the .bdrv_open() callbacks work, _all_
>options that are processed are removed from the QDict. The difference
>is
>just whether qemu_opts_absorb_qdict() removes them or you do that
>manually. In the end, if options are left in the QDict, the block layer
>returns an error because that means that an unknown option was
>specified.
>
>I suppose you use bs->options while resizing. This is a copy of the
>QDict which is not affected by the removal of processed options.

This probably explains why it works on the first resize, but fails on the second onward.

>
>> Can QAPI be the solution? I've seen some examples, and it looks like
>the
>> qcow2 driver already uses QAPI to some extent - I mean all the qdict
>stuff
>> is from QAPI, no? Can you please point me to an example of how QAPI
>can
>> solve the issue of an option that can accept both a size and a
>string?
>
>By using QAPI I mean using types like BlockdevOptionsQcow2. These types
>are directly generated from the JSON schema, where it's possible to
>specify an alternate of a string and an integer.
>
>One example for an alternate in the JSON schema is BlockdevRef, where
>you can give a string (to reference a node-name) or an inline
>declaration of a block device. In JSON, it looks like this:
>
>    { 'alternate': 'BlockdevRef',
>      'data': { 'definition': 'BlockdevOptions',
>                'reference': 'str' } }
>
>And the generated C type for it is:
>
>    struct BlockdevRef {
>        QType type;
>        union { /* union tag is @type */
>            BlockdevOptions definition;
>            char *reference;
>        } u;
>    };
>
>In our case, we would get a union of int64_t and char* instead, and
>still the QType type as a discriminator.

Thanks for the explanation!
But qcow2 already has the options in the JSON file, they're just not used for some reason?

Leonid.

>
>Kevin
Kevin Wolf July 26, 2018, 9:28 p.m. UTC | #14
Am 26.07.2018 um 23:08 hat Leonid Bloch geschrieben:
> On July 26, 2018 11:19:03 PM EEST, Kevin Wolf <kwolf@redhat.com> wrote:
> >Am 26.07.2018 um 21:43 hat Leonid Bloch geschrieben:
> >> On 07/26/2018 05:50 PM, Leonid Bloch wrote:
> >> > On 07/26/2018 05:42 PM, Kevin Wolf wrote:
> >> > > Am 26.07.2018 um 14:24 hat Leonid Bloch geschrieben:
> >> > > > > > You mean with QDict? I'll look into that now. But
> >> > > > > > already sent v5 before
> >> > > > > > reading this email.
> >> > > > > 
> >> > > > > Yes, with reading it from the QDict. (Or whatever the
> >simplest way is
> >> > > > > that results in the right external interface, but I suppose
> >this is the
> >> > > > > one.)
> >> > > > 
> >> > > > Well, there is a problem with that: I can easily isolate
> >> > > > l2-cache-size from QDict, check if it is "full", and if it is -
> >> > > > do whatever
> >> > > > is needed, and delete this option before parsing. But what if
> >it
> >> > > > is "foo"?
> >> > > > It will not get deleted, and the regular QEMU_OPT_SIZE parsing
> >> > > > error will
> >> > > > appear, stating that l2-cache-size "expects a non-negative
> >> > > > number..." - no
> >> > > > word about that it can expect "full" as well. Now, one can try
> >to modify
> >> > > > local_err->msg for this particular option, but this will
> >require
> >> > > > substantial
> >> > > > additional logic. I think considering this, it would be easier
> >> > > > to stick with
> >> > > > a dedicated option, l2-cache-full.
> >> > > > 
> >> > > > Do you think there is a smarter way to parse the l2-cache-size
> >> > > > option, so it
> >> > > > would accept both size and "full", while handling errors
> >> > > > correctly? It seems
> >> > > > more elegant to have a single option, but the internal handling
> >> > > > will be more
> >> > > > elegant and simpler with two mutually exclusive options.
> >> > > 
> >> > > I think we can live with the suboptimal error message for a
> >while. Once
> >> > > qcow2 is QAPIfied, it should become easy to improve it. Let's not
> >choose
> >> > > a worse design (that stays forever) for a temporarily better
> >error
> >> > > message.
> >> > 
> >> > OK. I'll add a TODO then.
> >> 
> >> Another problem without a dedicated option, is that if l2-cache-size
> >is
> >> processed and deleted, it can not be read again when needed for
> >resizing.
> >> Without some *extremely* dirty tricks, that is.
> >
> >Are you sure? The way that the .bdrv_open() callbacks work, _all_
> >options that are processed are removed from the QDict. The difference
> >is
> >just whether qemu_opts_absorb_qdict() removes them or you do that
> >manually. In the end, if options are left in the QDict, the block layer
> >returns an error because that means that an unknown option was
> >specified.
> >
> >I suppose you use bs->options while resizing. This is a copy of the
> >QDict which is not affected by the removal of processed options.
> 
> This probably explains why it works on the first resize, but fails on
> the second onward.

That makes sense. You probably need to clone the QDict before passing
it to qcow2_update_options(), with qdict_clone_shallow().

> >> Can QAPI be the solution? I've seen some examples, and it looks like
> >the
> >> qcow2 driver already uses QAPI to some extent - I mean all the qdict
> >stuff
> >> is from QAPI, no? Can you please point me to an example of how QAPI
> >can
> >> solve the issue of an option that can accept both a size and a
> >string?
> >
> >By using QAPI I mean using types like BlockdevOptionsQcow2. These types
> >are directly generated from the JSON schema, where it's possible to
> >specify an alternate of a string and an integer.
> >
> >One example for an alternate in the JSON schema is BlockdevRef, where
> >you can give a string (to reference a node-name) or an inline
> >declaration of a block device. In JSON, it looks like this:
> >
> >    { 'alternate': 'BlockdevRef',
> >      'data': { 'definition': 'BlockdevOptions',
> >                'reference': 'str' } }
> >
> >And the generated C type for it is:
> >
> >    struct BlockdevRef {
> >        QType type;
> >        union { /* union tag is @type */
> >            BlockdevOptions definition;
> >            char *reference;
> >        } u;
> >    };
> >
> >In our case, we would get a union of int64_t and char* instead, and
> >still the QType type as a discriminator.
> 
> Thanks for the explanation!
> But qcow2 already has the options in the JSON file, they're just not
> used for some reason?

The block layer configuration is a bit of a mess because of backwards
compatibility. The JSON schema is in fact used for -blockdev and for the
blockdev-add QMP command, but not for the old option -drive and
drive_add.

In the case of -blockdev/blockdev-add, after validating the input
against the schema, the QAPI objects are converted to QDicts so they can
use the same code path as the old code. Eventually, we want to use QAPI
objects all the way, but that requires either expressing all -drive
options in terms of QAPI object or getting rid of -driver altogether.
This is still going to take some time.

Kevin
Leonid Bloch July 26, 2018, 9:51 p.m. UTC | #15
On 07/27/2018 12:28 AM, Kevin Wolf wrote>
> That makes sense. You probably need to clone the QDict before passing
> it to qcow2_update_options(), with qdict_clone_shallow().

Sounds like a solution! Great, thanks!

>>>> Can QAPI be the solution? I've seen some examples, and it looks like
>>> the
>>>> qcow2 driver already uses QAPI to some extent - I mean all the qdict
>>> stuff
>>>> is from QAPI, no? Can you please point me to an example of how QAPI
>>> can
>>>> solve the issue of an option that can accept both a size and a
>>> string?
>>>
>>> By using QAPI I mean using types like BlockdevOptionsQcow2. These types
>>> are directly generated from the JSON schema, where it's possible to
>>> specify an alternate of a string and an integer.
>>>
>>> One example for an alternate in the JSON schema is BlockdevRef, where
>>> you can give a string (to reference a node-name) or an inline
>>> declaration of a block device. In JSON, it looks like this:
>>>
>>>     { 'alternate': 'BlockdevRef',
>>>       'data': { 'definition': 'BlockdevOptions',
>>>                 'reference': 'str' } }
>>>
>>> And the generated C type for it is:
>>>
>>>     struct BlockdevRef {
>>>         QType type;
>>>         union { /* union tag is @type */
>>>             BlockdevOptions definition;
>>>             char *reference;
>>>         } u;
>>>     };
>>>
>>> In our case, we would get a union of int64_t and char* instead, and
>>> still the QType type as a discriminator.
>>
>> Thanks for the explanation!
>> But qcow2 already has the options in the JSON file, they're just not
>> used for some reason?
> 
> The block layer configuration is a bit of a mess because of backwards
> compatibility. The JSON schema is in fact used for -blockdev and for the
> blockdev-add QMP command, but not for the old option -drive and
> drive_add.
> 
> In the case of -blockdev/blockdev-add, after validating the input
> against the schema, the QAPI objects are converted to QDicts so they can
> use the same code path as the old code. Eventually, we want to use QAPI
> objects all the way, but that requires either expressing all -drive
> options in terms of QAPI object or getting rid of -driver altogether.
> This is still going to take some time.

Thanks for the details! It's clear now!

Leonid.

> 
> Kevin
>
diff mbox series

Patch

diff --git a/block/qcow2.c b/block/qcow2.c
index ec9e6238a0..101b8b474b 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -695,6 +695,11 @@  static QemuOptsList qcow2_runtime_opts = {
             .type = QEMU_OPT_SIZE,
             .help = "Maximum L2 table cache size",
         },
+        {
+            .name = QCOW2_OPT_L2_CACHE_FULL,
+            .type = QEMU_OPT_BOOL,
+            .help = "Create full coverage of the image with the L2 cache",
+        },
         {
             .name = QCOW2_OPT_L2_CACHE_ENTRY_SIZE,
             .type = QEMU_OPT_SIZE,
@@ -779,10 +784,12 @@  static void read_cache_sizes(BlockDriverState *bs, QemuOpts *opts,
     BDRVQcow2State *s = bs->opaque;
     uint64_t combined_cache_size;
     bool l2_cache_size_set, refcount_cache_size_set, combined_cache_size_set;
+    bool l2_cache_full_set;
     int min_refcount_cache = MIN_REFCOUNT_CACHE_SIZE * s->cluster_size;
 
     combined_cache_size_set = qemu_opt_get(opts, QCOW2_OPT_CACHE_SIZE);
     l2_cache_size_set = qemu_opt_get(opts, QCOW2_OPT_L2_CACHE_SIZE);
+    l2_cache_full_set = qemu_opt_get(opts, QCOW2_OPT_L2_CACHE_FULL);
     refcount_cache_size_set = qemu_opt_get(opts, QCOW2_OPT_REFCOUNT_CACHE_SIZE);
 
     combined_cache_size = qemu_opt_get_size(opts, QCOW2_OPT_CACHE_SIZE, 0);
@@ -793,6 +800,17 @@  static void read_cache_sizes(BlockDriverState *bs, QemuOpts *opts,
     *l2_cache_entry_size = qemu_opt_get_size(
         opts, QCOW2_OPT_L2_CACHE_ENTRY_SIZE, s->cluster_size);
 
+    uint64_t virtual_disk_size = bs->total_sectors * BDRV_SECTOR_SIZE;
+    uint64_t max_l2_cache = virtual_disk_size / (s->cluster_size / 8);
+
+    if (l2_cache_size_set && l2_cache_full_set) {
+        error_setg(errp, QCOW2_OPT_L2_CACHE_SIZE " and "
+                   QCOW2_OPT_L2_CACHE_FULL " may not be set at the same time");
+        return;
+    } else if (l2_cache_full_set) {
+        *l2_cache_size = max_l2_cache;
+    }
+
     if (combined_cache_size_set) {
         if (l2_cache_size_set && refcount_cache_size_set) {
             error_setg(errp, QCOW2_OPT_CACHE_SIZE ", " QCOW2_OPT_L2_CACHE_SIZE
@@ -800,8 +818,14 @@  static void read_cache_sizes(BlockDriverState *bs, QemuOpts *opts,
                        "at the same time");
             return;
         } else if (*l2_cache_size > combined_cache_size) {
-            error_setg(errp, QCOW2_OPT_L2_CACHE_SIZE " may not exceed "
-                       QCOW2_OPT_CACHE_SIZE);
+            if (l2_cache_full_set) {
+                error_setg(errp, QCOW2_OPT_CACHE_SIZE " must be greater than "
+                           "the full L2 cache if " QCOW2_OPT_L2_CACHE_FULL
+                           " is used");
+            } else {
+                error_setg(errp, QCOW2_OPT_L2_CACHE_SIZE " may not exceed "
+                           QCOW2_OPT_CACHE_SIZE);
+            }
             return;
         } else if (*refcount_cache_size > combined_cache_size) {
             error_setg(errp, QCOW2_OPT_REFCOUNT_CACHE_SIZE " may not exceed "
@@ -809,14 +833,11 @@  static void read_cache_sizes(BlockDriverState *bs, QemuOpts *opts,
             return;
         }
 
-        if (l2_cache_size_set) {
+        if (l2_cache_size_set || l2_cache_full_set) {
             *refcount_cache_size = combined_cache_size - *l2_cache_size;
         } else if (refcount_cache_size_set) {
             *l2_cache_size = combined_cache_size - *refcount_cache_size;
         } else {
-            uint64_t virtual_disk_size = bs->total_sectors * BDRV_SECTOR_SIZE;
-            uint64_t max_l2_cache = virtual_disk_size / (s->cluster_size / 8);
-
             /* Assign as much memory as possible to the L2 cache, and
              * use the remainder for the refcount cache */
             if (combined_cache_size >= max_l2_cache + min_refcount_cache) {
@@ -829,7 +850,7 @@  static void read_cache_sizes(BlockDriverState *bs, QemuOpts *opts,
             }
         }
     } else {
-        if (!l2_cache_size_set) {
+        if (!l2_cache_size_set && !l2_cache_full_set) {
             *l2_cache_size = MAX(DEFAULT_L2_CACHE_BYTE_SIZE,
                                  (uint64_t)DEFAULT_L2_CACHE_CLUSTERS
                                  * s->cluster_size);
diff --git a/block/qcow2.h b/block/qcow2.h
index 81b844e936..151e014bd8 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -97,6 +97,7 @@ 
 #define QCOW2_OPT_OVERLAP_BITMAP_DIRECTORY "overlap-check.bitmap-directory"
 #define QCOW2_OPT_CACHE_SIZE "cache-size"
 #define QCOW2_OPT_L2_CACHE_SIZE "l2-cache-size"
+#define QCOW2_OPT_L2_CACHE_FULL "l2-cache-full"
 #define QCOW2_OPT_L2_CACHE_ENTRY_SIZE "l2-cache-entry-size"
 #define QCOW2_OPT_REFCOUNT_CACHE_SIZE "refcount-cache-size"
 #define QCOW2_OPT_CACHE_CLEAN_INTERVAL "cache-clean-interval"
diff --git a/qapi/block-core.json b/qapi/block-core.json
index d40d5ecc3b..c584059e23 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2812,7 +2812,12 @@ 
 #                         refcount block caches in bytes (since 2.2)
 #
 # @l2-cache-size:         the maximum size of the L2 table cache in
-#                         bytes (since 2.2)
+#                         bytes (mutually exclusive with l2-cache-full)
+#                         (since 2.2)
+#
+# @l2-cache-full:         make the L2 table cache large enough to cover the
+#                         entire image (mutually exclusive with l2-cache-size)
+#                         (since 3.1)
 #
 # @l2-cache-entry-size:   the size of each entry in the L2 cache in
 #                         bytes. It must be a power of two between 512
@@ -2840,6 +2845,7 @@ 
             '*overlap-check': 'Qcow2OverlapChecks',
             '*cache-size': 'int',
             '*l2-cache-size': 'int',
+            '*l2-cache-full': 'bool',
             '*l2-cache-entry-size': 'int',
             '*refcount-cache-size': 'int',
             '*cache-clean-interval': 'int',
diff --git a/qemu-options.hx b/qemu-options.hx
index ef0706c359..6d417cb267 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -754,7 +754,7 @@  image file)
 The maximum total size of the L2 table and refcount block caches in bytes
 
 @item l2-cache-size
-The maximum size of the L2 table cache.
+The maximum size of the L2 table cache. (Mutually exclusive with l2-cache-full)
 (default: if cache-size is not defined - 1048576 bytes or 8 clusters,
 whichever is larger; if cache-size is defined and is large enough to
 accommodate enough L2 cache to cover the entire virtual size of the image plus
@@ -762,6 +762,10 @@  the minimal amount of refcount cache - enough to cover the entire image;
 if cache-size is defined and is not large enough - as much as possible while
 leaving space for the needed refcount cache)
 
+@item l2-cache-full
+Make the L2 table cache large enough to cover the entire image (mutually
+exclusive with l2-cache-size) (on/off; default: off)
+
 @item refcount-cache-size
 The maximum size of the refcount block cache in bytes
 (default: 4 times the cluster size, or if cache-size is defined and is large