diff mbox series

[v3,09/33] block: Add generic bdrv_inherited_options()

Message ID 20200218124242.584644-10-mreitz@redhat.com
State New
Headers show
Series block: Introduce real BdrvChildRole | expand

Commit Message

Max Reitz Feb. 18, 2020, 12:42 p.m. UTC
After the series this patch belongs to, we want to have a common
BdrvChildClass that encompasses all of child_file, child_format, and
child_backing.  Such a single class needs a single .inherit_options()
implementation, and this patch introduces it.

The next patch will show how the existing implementations can fall back
to it just by passing appropriate BdrvChildRole and parent_is_format
values.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
---
 block.c | 84 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 84 insertions(+)

Comments

Kevin Wolf May 6, 2020, 10:37 a.m. UTC | #1
Am 18.02.2020 um 13:42 hat Max Reitz geschrieben:
> After the series this patch belongs to, we want to have a common
> BdrvChildClass that encompasses all of child_file, child_format, and
> child_backing.  Such a single class needs a single .inherit_options()
> implementation, and this patch introduces it.
> 
> The next patch will show how the existing implementations can fall back
> to it just by passing appropriate BdrvChildRole and parent_is_format
> values.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> Reviewed-by: Eric Blake <eblake@redhat.com>
> ---
>  block.c | 84 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 84 insertions(+)
> 
> diff --git a/block.c b/block.c
> index c33f0e9b42..9179b9b604 100644
> --- a/block.c
> +++ b/block.c
> @@ -998,6 +998,90 @@ static void bdrv_temp_snapshot_options(int *child_flags, QDict *child_options,
>      *child_flags &= ~BDRV_O_NATIVE_AIO;
>  }
>  
> +/*
> + * Returns the options and flags that a generic child of a BDS should
> + * get, based on the given options and flags for the parent BDS.
> + */
> +static void __attribute__((unused))
> +    bdrv_inherited_options(BdrvChildRole role, bool parent_is_format,
> +                           int *child_flags, QDict *child_options,
> +                           int parent_flags, QDict *parent_options)
> +{
> +    int flags = parent_flags;
> +
> +    /*
> +     * First, decide whether to set, clear, or leave BDRV_O_PROTOCOL.
> +     * Generally, the question to answer is: Should this child be
> +     * format-probed by default?
> +     */
> +
> +    /*
> +     * Pure and non-filtered data children of non-format nodes should
> +     * be probed by default (even when the node itself has BDRV_O_PROTOCOL
> +     * set).  This only affects a very limited set of drivers (namely
> +     * quorum and blkverify when this comment was written).
> +     * Force-clear BDRV_O_PROTOCOL then.
> +     */
> +    if (!parent_is_format &&
> +        (role & (BDRV_CHILD_DATA | BDRV_CHILD_METADATA |
> +                 BDRV_CHILD_FILTERED)) ==
> +            BDRV_CHILD_DATA)

You could avoid the odd indentation (I can't decide whether or not it
should be one space more to align correctly) and probably also make the
expression more readable if you split it into:

    (role & BDRV_CHILD_DATA) &&
    !(role & (BDRV_CHILD_METADATA | BDRV_CHILD_FILTERED))

> +    {
> +        flags &= ~BDRV_O_PROTOCOL;
> +    }
> +
> +    /*
> +     * All children of format nodes (except for COW children) and all
> +     * metadata children in general should never be format-probed.
> +     * Force-set BDRV_O_PROTOCOL then.
> +     */
> +    if ((parent_is_format && !(role & BDRV_CHILD_COW)) ||
> +        (role & BDRV_CHILD_METADATA))
> +    {
> +        flags |= BDRV_O_PROTOCOL;
> +    }
> +
> +    /*
> +     * If the cache mode isn't explicitly set, inherit direct and no-flush from
> +     * the parent.
> +     */
> +    qdict_copy_default(child_options, parent_options, BDRV_OPT_CACHE_DIRECT);
> +    qdict_copy_default(child_options, parent_options, BDRV_OPT_CACHE_NO_FLUSH);
> +    qdict_copy_default(child_options, parent_options, BDRV_OPT_FORCE_SHARE);
> +
> +    if (role & BDRV_CHILD_COW) {
> +        /* backing files are always opened read-only */

Not "always", just by default.

> +        qdict_set_default_str(child_options, BDRV_OPT_READ_ONLY, "on");
> +        qdict_set_default_str(child_options, BDRV_OPT_AUTO_READ_ONLY, "off");
> +    } else {
> +        /* Inherit the read-only option from the parent if it's not set */
> +        qdict_copy_default(child_options, parent_options, BDRV_OPT_READ_ONLY);
> +        qdict_copy_default(child_options, parent_options,
> +                           BDRV_OPT_AUTO_READ_ONLY);
> +    }
> +
> +    if (parent_is_format && !(role & BDRV_CHILD_COW)) {
> +        /*
> +         * Our format drivers take care to send flushes and respect
> +         * unmap policy, so we can default to enable both on lower
> +         * layers regardless of the corresponding parent options.
> +         */
> +        qdict_set_default_str(child_options, BDRV_OPT_DISCARD, "unmap");
> +    }

Why the restriction to format here? Don't we break "unmap" propagation
through filters with this?

It would probably also be a good question why we don't propagate it to
the backing file, but this is preexisting.

> +    /* Clear flags that only apply to the top layer */
> +    flags &= ~(BDRV_O_SNAPSHOT | BDRV_O_NO_BACKING | BDRV_O_COPY_ON_READ);
> +
> +    if (role & BDRV_CHILD_METADATA) {
> +        flags &= ~BDRV_O_NO_IO;
> +    }

Hm... This is interesting, but I guess it makes sense. It just never was
a hard rule that a format driver must not access data even internally
with NO_IO, but I think it holds true.

> +    if (role & BDRV_CHILD_COW) {
> +        flags &= ~BDRV_O_TEMPORARY;
> +    }

We could probably have a long discussion about whether this is right in
theory, but in practice BDRV_O_TEMPORARY only exists for snapshot=on,
for which we know that it's always qcow2 with a file and a backing
child. And there is no doubt that we make the right distinction in this
case.

> +    *child_flags = flags;
> +}
> +
>  /*
>   * Returns the options and flags that bs->file should get if a protocol driver
>   * is expected, based on the given options and flags for the parent BDS
> -- 
> 2.24.1

Kevin
Kevin Wolf May 6, 2020, 1:11 p.m. UTC | #2
Am 06.05.2020 um 12:37 hat Kevin Wolf geschrieben:
> Am 18.02.2020 um 13:42 hat Max Reitz geschrieben:
> > After the series this patch belongs to, we want to have a common
> > BdrvChildClass that encompasses all of child_file, child_format, and
> > child_backing.  Such a single class needs a single .inherit_options()
> > implementation, and this patch introduces it.
> > 
> > The next patch will show how the existing implementations can fall back
> > to it just by passing appropriate BdrvChildRole and parent_is_format
> > values.
> > 
> > Signed-off-by: Max Reitz <mreitz@redhat.com>
> > Reviewed-by: Eric Blake <eblake@redhat.com>
> > ---
> >  block.c | 84 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 84 insertions(+)
> > 
> > diff --git a/block.c b/block.c
> > index c33f0e9b42..9179b9b604 100644
> > --- a/block.c
> > +++ b/block.c
> > @@ -998,6 +998,90 @@ static void bdrv_temp_snapshot_options(int *child_flags, QDict *child_options,
> >      *child_flags &= ~BDRV_O_NATIVE_AIO;
> >  }
> >  
> > +/*
> > + * Returns the options and flags that a generic child of a BDS should
> > + * get, based on the given options and flags for the parent BDS.
> > + */
> > +static void __attribute__((unused))
> > +    bdrv_inherited_options(BdrvChildRole role, bool parent_is_format,
> > +                           int *child_flags, QDict *child_options,
> > +                           int parent_flags, QDict *parent_options)
> > +{
> > +    int flags = parent_flags;
> > +
> > +    /*
> > +     * First, decide whether to set, clear, or leave BDRV_O_PROTOCOL.
> > +     * Generally, the question to answer is: Should this child be
> > +     * format-probed by default?
> > +     */

Just for clarity: Do you know a good reason to ever leave it (i.e.
inherit it from the parent), except that that's what we have always been
doing for backing files? Though of course, only formats have backing
files, so the flag would never be set in practice in this case.

> > +    /*
> > +     * Pure and non-filtered data children of non-format nodes should
> > +     * be probed by default (even when the node itself has BDRV_O_PROTOCOL
> > +     * set).  This only affects a very limited set of drivers (namely
> > +     * quorum and blkverify when this comment was written).
> > +     * Force-clear BDRV_O_PROTOCOL then.
> > +     */
> > +    if (!parent_is_format &&
> > +        (role & (BDRV_CHILD_DATA | BDRV_CHILD_METADATA |
> > +                 BDRV_CHILD_FILTERED)) ==
> > +            BDRV_CHILD_DATA)
> 
> You could avoid the odd indentation (I can't decide whether or not it
> should be one space more to align correctly) and probably also make the
> expression more readable if you split it into:
> 
>     (role & BDRV_CHILD_DATA) &&
>     !(role & (BDRV_CHILD_METADATA | BDRV_CHILD_FILTERED))
> 
> > +    {
> > +        flags &= ~BDRV_O_PROTOCOL;
> > +    }
> > +
> > +    /*
> > +     * All children of format nodes (except for COW children) and all
> > +     * metadata children in general should never be format-probed.
> > +     * Force-set BDRV_O_PROTOCOL then.
> > +     */
> > +    if ((parent_is_format && !(role & BDRV_CHILD_COW)) ||
> > +        (role & BDRV_CHILD_METADATA))
> > +    {
> > +        flags |= BDRV_O_PROTOCOL;
> > +    }
> > +
> > +    /*
> > +     * If the cache mode isn't explicitly set, inherit direct and no-flush from
> > +     * the parent.
> > +     */
> > +    qdict_copy_default(child_options, parent_options, BDRV_OPT_CACHE_DIRECT);
> > +    qdict_copy_default(child_options, parent_options, BDRV_OPT_CACHE_NO_FLUSH);
> > +    qdict_copy_default(child_options, parent_options, BDRV_OPT_FORCE_SHARE);
> > +
> > +    if (role & BDRV_CHILD_COW) {
> > +        /* backing files are always opened read-only */
> 
> Not "always", just by default.
> 
> > +        qdict_set_default_str(child_options, BDRV_OPT_READ_ONLY, "on");
> > +        qdict_set_default_str(child_options, BDRV_OPT_AUTO_READ_ONLY, "off");
> > +    } else {
> > +        /* Inherit the read-only option from the parent if it's not set */
> > +        qdict_copy_default(child_options, parent_options, BDRV_OPT_READ_ONLY);
> > +        qdict_copy_default(child_options, parent_options,
> > +                           BDRV_OPT_AUTO_READ_ONLY);
> > +    }
> > +
> > +    if (parent_is_format && !(role & BDRV_CHILD_COW)) {
> > +        /*
> > +         * Our format drivers take care to send flushes and respect
> > +         * unmap policy, so we can default to enable both on lower
> > +         * layers regardless of the corresponding parent options.
> > +         */
> > +        qdict_set_default_str(child_options, BDRV_OPT_DISCARD, "unmap");
> > +    }
> 
> Why the restriction to format here? Don't we break "unmap" propagation
> through filters with this?
> 
> It would probably also be a good question why we don't propagate it to
> the backing file, but this is preexisting.

Some patches later, I think the fix is an else branch that copies the
flag from parent_options.

Kevin
Max Reitz May 7, 2020, 8:49 a.m. UTC | #3
On 06.05.20 12:37, Kevin Wolf wrote:
> Am 18.02.2020 um 13:42 hat Max Reitz geschrieben:
>> After the series this patch belongs to, we want to have a common
>> BdrvChildClass that encompasses all of child_file, child_format, and
>> child_backing.  Such a single class needs a single .inherit_options()
>> implementation, and this patch introduces it.
>>
>> The next patch will show how the existing implementations can fall back
>> to it just by passing appropriate BdrvChildRole and parent_is_format
>> values.
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> Reviewed-by: Eric Blake <eblake@redhat.com>
>> ---
>>  block.c | 84 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 84 insertions(+)
>>
>> diff --git a/block.c b/block.c
>> index c33f0e9b42..9179b9b604 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -998,6 +998,90 @@ static void bdrv_temp_snapshot_options(int *child_flags, QDict *child_options,
>>      *child_flags &= ~BDRV_O_NATIVE_AIO;
>>  }
>>  
>> +/*
>> + * Returns the options and flags that a generic child of a BDS should
>> + * get, based on the given options and flags for the parent BDS.
>> + */
>> +static void __attribute__((unused))
>> +    bdrv_inherited_options(BdrvChildRole role, bool parent_is_format,
>> +                           int *child_flags, QDict *child_options,
>> +                           int parent_flags, QDict *parent_options)
>> +{
>> +    int flags = parent_flags;
>> +
>> +    /*
>> +     * First, decide whether to set, clear, or leave BDRV_O_PROTOCOL.
>> +     * Generally, the question to answer is: Should this child be
>> +     * format-probed by default?
>> +     */
>> +
>> +    /*
>> +     * Pure and non-filtered data children of non-format nodes should
>> +     * be probed by default (even when the node itself has BDRV_O_PROTOCOL
>> +     * set).  This only affects a very limited set of drivers (namely
>> +     * quorum and blkverify when this comment was written).
>> +     * Force-clear BDRV_O_PROTOCOL then.
>> +     */
>> +    if (!parent_is_format &&
>> +        (role & (BDRV_CHILD_DATA | BDRV_CHILD_METADATA |
>> +                 BDRV_CHILD_FILTERED)) ==
>> +            BDRV_CHILD_DATA)
> 
> You could avoid the odd indentation (I can't decide whether or not it
> should be one space more to align correctly) and probably also make the
> expression more readable if you split it into:
> 
>     (role & BDRV_CHILD_DATA) &&
>     !(role & (BDRV_CHILD_METADATA | BDRV_CHILD_FILTERED))

Yes, looks good.

>> +    {
>> +        flags &= ~BDRV_O_PROTOCOL;
>> +    }
>> +
>> +    /*
>> +     * All children of format nodes (except for COW children) and all
>> +     * metadata children in general should never be format-probed.
>> +     * Force-set BDRV_O_PROTOCOL then.
>> +     */
>> +    if ((parent_is_format && !(role & BDRV_CHILD_COW)) ||
>> +        (role & BDRV_CHILD_METADATA))
>> +    {
>> +        flags |= BDRV_O_PROTOCOL;
>> +    }
>> +
>> +    /*
>> +     * If the cache mode isn't explicitly set, inherit direct and no-flush from
>> +     * the parent.
>> +     */
>> +    qdict_copy_default(child_options, parent_options, BDRV_OPT_CACHE_DIRECT);
>> +    qdict_copy_default(child_options, parent_options, BDRV_OPT_CACHE_NO_FLUSH);
>> +    qdict_copy_default(child_options, parent_options, BDRV_OPT_FORCE_SHARE);
>> +
>> +    if (role & BDRV_CHILD_COW) {
>> +        /* backing files are always opened read-only */
> 
> Not "always", just by default.

OK.  I just copied the comment from bdrv_backing_options().

>> +        qdict_set_default_str(child_options, BDRV_OPT_READ_ONLY, "on");
>> +        qdict_set_default_str(child_options, BDRV_OPT_AUTO_READ_ONLY, "off");
>> +    } else {
>> +        /* Inherit the read-only option from the parent if it's not set */
>> +        qdict_copy_default(child_options, parent_options, BDRV_OPT_READ_ONLY);
>> +        qdict_copy_default(child_options, parent_options,
>> +                           BDRV_OPT_AUTO_READ_ONLY);
>> +    }
>> +
>> +    if (parent_is_format && !(role & BDRV_CHILD_COW)) {
>> +        /*
>> +         * Our format drivers take care to send flushes and respect
>> +         * unmap policy, so we can default to enable both on lower
>> +         * layers regardless of the corresponding parent options.
>> +         */
>> +        qdict_set_default_str(child_options, BDRV_OPT_DISCARD, "unmap");
>> +    }
> 
> Why the restriction to format here? Don't we break "unmap" propagation
> through filters with this?

Right now (before this series), the behavior seems ambiguous, in that
for filters that use bs->file, it is set, but for those that use
bs->backing, it isn’t.

But I suspect the main reason for what I did is the way I interpreted
the comment (which before this series only mentions block drivers in
general, not specifically format drivers): It sounded to me as if the
block driver needed to respect the unmap policy, and I didn’t think
filters did that.  So it was my understanding that filter drivers would
just propagate discards and thus we couldn’t default-enable unmap on
their children.

But I was wrong, the block driver doesn’t need to respect anything,
because bdrv_co_pdiscard() already does.

So I suppose it should indeed be enabled for all children, with the
comment changed to express that it isn’t any block driver that respects
unmap policy, but bdrv_co_pdiscard(), e.g.:

bdrv_co_pdiscard() respects unmap policy for the parent, so we can
default to enable it on lower layers regardless of the parent option.

> It would probably also be a good question why we don't propagate it to
> the backing file, but this is preexisting.

I suppose we should, although it’s irrelevant, so.  I suppose I’ll just
drop the parent_is_format, adjust the comment and that should be fine
for this series.

Max
Max Reitz May 7, 2020, 9:18 a.m. UTC | #4
On 06.05.20 15:11, Kevin Wolf wrote:
> Am 06.05.2020 um 12:37 hat Kevin Wolf geschrieben:
>> Am 18.02.2020 um 13:42 hat Max Reitz geschrieben:
>>> After the series this patch belongs to, we want to have a common
>>> BdrvChildClass that encompasses all of child_file, child_format, and
>>> child_backing.  Such a single class needs a single .inherit_options()
>>> implementation, and this patch introduces it.
>>>
>>> The next patch will show how the existing implementations can fall back
>>> to it just by passing appropriate BdrvChildRole and parent_is_format
>>> values.
>>>
>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>> Reviewed-by: Eric Blake <eblake@redhat.com>
>>> ---
>>>  block.c | 84 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>  1 file changed, 84 insertions(+)
>>>
>>> diff --git a/block.c b/block.c
>>> index c33f0e9b42..9179b9b604 100644
>>> --- a/block.c
>>> +++ b/block.c
>>> @@ -998,6 +998,90 @@ static void bdrv_temp_snapshot_options(int *child_flags, QDict *child_options,
>>>      *child_flags &= ~BDRV_O_NATIVE_AIO;
>>>  }
>>>  
>>> +/*
>>> + * Returns the options and flags that a generic child of a BDS should
>>> + * get, based on the given options and flags for the parent BDS.
>>> + */
>>> +static void __attribute__((unused))
>>> +    bdrv_inherited_options(BdrvChildRole role, bool parent_is_format,
>>> +                           int *child_flags, QDict *child_options,
>>> +                           int parent_flags, QDict *parent_options)
>>> +{
>>> +    int flags = parent_flags;
>>> +
>>> +    /*
>>> +     * First, decide whether to set, clear, or leave BDRV_O_PROTOCOL.
>>> +     * Generally, the question to answer is: Should this child be
>>> +     * format-probed by default?
>>> +     */
> 
> Just for clarity: Do you know a good reason to ever leave it (i.e.
> inherit it from the parent), except that that's what we have always been
> doing for backing files? Though of course, only formats have backing
> files, so the flag would never be set in practice in this case.

It seems correct for filters.

[...]

>>> +    if (parent_is_format && !(role & BDRV_CHILD_COW)) {
>>> +        /*
>>> +         * Our format drivers take care to send flushes and respect
>>> +         * unmap policy, so we can default to enable both on lower
>>> +         * layers regardless of the corresponding parent options.
>>> +         */
>>> +        qdict_set_default_str(child_options, BDRV_OPT_DISCARD, "unmap");
>>> +    }
>>
>> Why the restriction to format here? Don't we break "unmap" propagation
>> through filters with this?
>>
>> It would probably also be a good question why we don't propagate it to
>> the backing file, but this is preexisting.
> 
> Some patches later, I think the fix is an else branch that copies the
> flag from parent_options.

I thought about the same thing, but is that really necessary if
bdrv_co_pdiscard() will already suppress discards on the parent if unmap
is false?

Max
Kevin Wolf May 7, 2020, 11:19 a.m. UTC | #5
Am 07.05.2020 um 10:49 hat Max Reitz geschrieben:
> On 06.05.20 12:37, Kevin Wolf wrote:
> > Am 18.02.2020 um 13:42 hat Max Reitz geschrieben:
> >> +    if (role & BDRV_CHILD_COW) {
> >> +        /* backing files are always opened read-only */
> > 
> > Not "always", just by default.
> 
> OK.  I just copied the comment from bdrv_backing_options().

Yes, sorry, I noticed this only later (it's how review goes when a move
is split into a copy in one patch and a removal later). I don't insist
on a change if you prefer to have a clean copy.

> >> +        qdict_set_default_str(child_options, BDRV_OPT_READ_ONLY, "on");
> >> +        qdict_set_default_str(child_options, BDRV_OPT_AUTO_READ_ONLY, "off");
> >> +    } else {
> >> +        /* Inherit the read-only option from the parent if it's not set */
> >> +        qdict_copy_default(child_options, parent_options, BDRV_OPT_READ_ONLY);
> >> +        qdict_copy_default(child_options, parent_options,
> >> +                           BDRV_OPT_AUTO_READ_ONLY);
> >> +    }
> >> +
> >> +    if (parent_is_format && !(role & BDRV_CHILD_COW)) {
> >> +        /*
> >> +         * Our format drivers take care to send flushes and respect
> >> +         * unmap policy, so we can default to enable both on lower
> >> +         * layers regardless of the corresponding parent options.
> >> +         */
> >> +        qdict_set_default_str(child_options, BDRV_OPT_DISCARD, "unmap");
> >> +    }
> > 
> > Why the restriction to format here? Don't we break "unmap" propagation
> > through filters with this?
> 
> Right now (before this series), the behavior seems ambiguous, in that
> for filters that use bs->file, it is set, but for those that use
> bs->backing, it isn’t.

It's probably easy to agree that this is a bug.

> But I suspect the main reason for what I did is the way I interpreted
> the comment (which before this series only mentions block drivers in
> general, not specifically format drivers): It sounded to me as if the
> block driver needed to respect the unmap policy, and I didn’t think
> filters did that.  So it was my understanding that filter drivers would
> just propagate discards and thus we couldn’t default-enable unmap on
> their children.

This was actually my thought as well. And in order to propagate, we have
to copy the option from parent_options, no? Currently it will always be
disabled (unless specified explicitly for the node) because that's the
default setting for unmap.

> But I was wrong, the block driver doesn’t need to respect anything,
> because bdrv_co_pdiscard() already does.
> 
> So I suppose it should indeed be enabled for all children, with the
> comment changed to express that it isn’t any block driver that respects
> unmap policy, but bdrv_co_pdiscard(), e.g.:
> 
> bdrv_co_pdiscard() respects unmap policy for the parent, so we can
> default to enable it on lower layers regardless of the parent option.

This would restore the behaviour before this series for child_file and
child_format. It would be different in that it also enables "unmap" for
backing files, which should be okay.

> > It would probably also be a good question why we don't propagate it to
> > the backing file, but this is preexisting.
> 
> I suppose we should, although it’s irrelevant, so.  I suppose I’ll just
> drop the parent_is_format, adjust the comment and that should be fine
> for this series.

Isn't it relevant for zero writes during active commit? (The "normal"
intermediate commit job doesn't even try to optimise zero blocks...)

The job will have its own BdrvChild to access the node, but option
inheritance happens only from the parent that "owns" the backing file,
so if a qcow2 image doesn't set "unmap" on its COW child, unmap will be
disabled for the active commit job, too.

(Oops. Turns out that it's not the case because the 'unmap' option for
the job exists only for drive-mirror. blockdev-mirror passes true
unconditionally and block-commit passes false unconditionally. I'm
always amazed how consistently we fail to be consistent.

But I think using zero writes with MAY_UNMAP in a commit job is an
obvious extension, so considering it now can't hurt anyway.)

Kevin
Max Reitz May 7, 2020, 11:34 a.m. UTC | #6
On 07.05.20 13:19, Kevin Wolf wrote:
> Am 07.05.2020 um 10:49 hat Max Reitz geschrieben:
>> On 06.05.20 12:37, Kevin Wolf wrote:
>>> Am 18.02.2020 um 13:42 hat Max Reitz geschrieben:
>>>> +    if (role & BDRV_CHILD_COW) {
>>>> +        /* backing files are always opened read-only */
>>>
>>> Not "always", just by default.
>>
>> OK.  I just copied the comment from bdrv_backing_options().
> 
> Yes, sorry, I noticed this only later (it's how review goes when a move
> is split into a copy in one patch and a removal later). I don't insist
> on a change if you prefer to have a clean copy.
> 
>>>> +        qdict_set_default_str(child_options, BDRV_OPT_READ_ONLY, "on");
>>>> +        qdict_set_default_str(child_options, BDRV_OPT_AUTO_READ_ONLY, "off");
>>>> +    } else {
>>>> +        /* Inherit the read-only option from the parent if it's not set */
>>>> +        qdict_copy_default(child_options, parent_options, BDRV_OPT_READ_ONLY);
>>>> +        qdict_copy_default(child_options, parent_options,
>>>> +                           BDRV_OPT_AUTO_READ_ONLY);
>>>> +    }
>>>> +
>>>> +    if (parent_is_format && !(role & BDRV_CHILD_COW)) {
>>>> +        /*
>>>> +         * Our format drivers take care to send flushes and respect
>>>> +         * unmap policy, so we can default to enable both on lower
>>>> +         * layers regardless of the corresponding parent options.
>>>> +         */
>>>> +        qdict_set_default_str(child_options, BDRV_OPT_DISCARD, "unmap");
>>>> +    }
>>>
>>> Why the restriction to format here? Don't we break "unmap" propagation
>>> through filters with this?
>>
>> Right now (before this series), the behavior seems ambiguous, in that
>> for filters that use bs->file, it is set, but for those that use
>> bs->backing, it isn’t.
> 
> It's probably easy to agree that this is a bug.
> 
>> But I suspect the main reason for what I did is the way I interpreted
>> the comment (which before this series only mentions block drivers in
>> general, not specifically format drivers): It sounded to me as if the
>> block driver needed to respect the unmap policy, and I didn’t think
>> filters did that.  So it was my understanding that filter drivers would
>> just propagate discards and thus we couldn’t default-enable unmap on
>> their children.
> 
> This was actually my thought as well. And in order to propagate, we have
> to copy the option from parent_options, no? Currently it will always be
> disabled (unless specified explicitly for the node) because that's the
> default setting for unmap.
> 
>> But I was wrong, the block driver doesn’t need to respect anything,
>> because bdrv_co_pdiscard() already does.
>>
>> So I suppose it should indeed be enabled for all children, with the
>> comment changed to express that it isn’t any block driver that respects
>> unmap policy, but bdrv_co_pdiscard(), e.g.:
>>
>> bdrv_co_pdiscard() respects unmap policy for the parent, so we can
>> default to enable it on lower layers regardless of the parent option.
> 
> This would restore the behaviour before this series for child_file and
> child_format. It would be different in that it also enables "unmap" for
> backing files, which should be okay.
> 
>>> It would probably also be a good question why we don't propagate it to
>>> the backing file, but this is preexisting.
>>
>> I suppose we should, although it’s irrelevant, so.  I suppose I’ll just
>> drop the parent_is_format, adjust the comment and that should be fine
>> for this series.
> 
> Isn't it relevant for zero writes during active commit? (The "normal"
> intermediate commit job doesn't even try to optimise zero blocks...)
> 
> The job will have its own BdrvChild to access the node, but option
> inheritance happens only from the parent that "owns" the backing file,
> so if a qcow2 image doesn't set "unmap" on its COW child, unmap will be
> disabled for the active commit job, too.
> 
> (Oops. Turns out that it's not the case because the 'unmap' option for
> the job exists only for drive-mirror. blockdev-mirror passes true
> unconditionally and block-commit passes false unconditionally. I'm
> always amazed how consistently we fail to be consistent.

Alles kaputt :)

> But I think using zero writes with MAY_UNMAP in a commit job is an
> obvious extension, so considering it now can't hurt anyway.)

OK.  So I’ll just make unmap=on the default always.

Max
diff mbox series

Patch

diff --git a/block.c b/block.c
index c33f0e9b42..9179b9b604 100644
--- a/block.c
+++ b/block.c
@@ -998,6 +998,90 @@  static void bdrv_temp_snapshot_options(int *child_flags, QDict *child_options,
     *child_flags &= ~BDRV_O_NATIVE_AIO;
 }
 
+/*
+ * Returns the options and flags that a generic child of a BDS should
+ * get, based on the given options and flags for the parent BDS.
+ */
+static void __attribute__((unused))
+    bdrv_inherited_options(BdrvChildRole role, bool parent_is_format,
+                           int *child_flags, QDict *child_options,
+                           int parent_flags, QDict *parent_options)
+{
+    int flags = parent_flags;
+
+    /*
+     * First, decide whether to set, clear, or leave BDRV_O_PROTOCOL.
+     * Generally, the question to answer is: Should this child be
+     * format-probed by default?
+     */
+
+    /*
+     * Pure and non-filtered data children of non-format nodes should
+     * be probed by default (even when the node itself has BDRV_O_PROTOCOL
+     * set).  This only affects a very limited set of drivers (namely
+     * quorum and blkverify when this comment was written).
+     * Force-clear BDRV_O_PROTOCOL then.
+     */
+    if (!parent_is_format &&
+        (role & (BDRV_CHILD_DATA | BDRV_CHILD_METADATA |
+                 BDRV_CHILD_FILTERED)) ==
+            BDRV_CHILD_DATA)
+    {
+        flags &= ~BDRV_O_PROTOCOL;
+    }
+
+    /*
+     * All children of format nodes (except for COW children) and all
+     * metadata children in general should never be format-probed.
+     * Force-set BDRV_O_PROTOCOL then.
+     */
+    if ((parent_is_format && !(role & BDRV_CHILD_COW)) ||
+        (role & BDRV_CHILD_METADATA))
+    {
+        flags |= BDRV_O_PROTOCOL;
+    }
+
+    /*
+     * If the cache mode isn't explicitly set, inherit direct and no-flush from
+     * the parent.
+     */
+    qdict_copy_default(child_options, parent_options, BDRV_OPT_CACHE_DIRECT);
+    qdict_copy_default(child_options, parent_options, BDRV_OPT_CACHE_NO_FLUSH);
+    qdict_copy_default(child_options, parent_options, BDRV_OPT_FORCE_SHARE);
+
+    if (role & BDRV_CHILD_COW) {
+        /* backing files are always opened read-only */
+        qdict_set_default_str(child_options, BDRV_OPT_READ_ONLY, "on");
+        qdict_set_default_str(child_options, BDRV_OPT_AUTO_READ_ONLY, "off");
+    } else {
+        /* Inherit the read-only option from the parent if it's not set */
+        qdict_copy_default(child_options, parent_options, BDRV_OPT_READ_ONLY);
+        qdict_copy_default(child_options, parent_options,
+                           BDRV_OPT_AUTO_READ_ONLY);
+    }
+
+    if (parent_is_format && !(role & BDRV_CHILD_COW)) {
+        /*
+         * Our format drivers take care to send flushes and respect
+         * unmap policy, so we can default to enable both on lower
+         * layers regardless of the corresponding parent options.
+         */
+        qdict_set_default_str(child_options, BDRV_OPT_DISCARD, "unmap");
+    }
+
+    /* Clear flags that only apply to the top layer */
+    flags &= ~(BDRV_O_SNAPSHOT | BDRV_O_NO_BACKING | BDRV_O_COPY_ON_READ);
+
+    if (role & BDRV_CHILD_METADATA) {
+        flags &= ~BDRV_O_NO_IO;
+    }
+    if (role & BDRV_CHILD_COW) {
+        flags &= ~BDRV_O_TEMPORARY;
+    }
+
+    *child_flags = flags;
+}
+
 /*
  * Returns the options and flags that bs->file should get if a protocol driver
  * is expected, based on the given options and flags for the parent BDS