[1/2] commit: Add top-node/base-node options

Message ID 20180810162658.6562-2-kwolf@redhat.com
State New
Series
  • commit: Add top-node/base-node options

Commit Message

Kevin Wolf Aug. 10, 2018, 4:26 p.m.
The block-commit QMP command required specifying the top and base nodes
of the commit jobs using the file name of that node. While this works
in simple cases (local files with absolute paths), the file names
generated for more complicated setups can be hard to predict.

This adds two new options top-node and base-node to the command, which
allow specifying node names instead. They are mutually exclusive with
the old options.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 qapi/block-core.json | 24 ++++++++++++++++++------
 blockdev.c           | 32 ++++++++++++++++++++++++++++++--
 2 files changed, 48 insertions(+), 8 deletions(-)
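
As a minimal sketch of the mutual exclusion described in the commit message, assuming the check is done on the has_* flags before anything else: the helper name check_commit_args and the error strings below are hypothetical and are not taken from blockdev.c.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper, not the code from blockdev.c: returns NULL when the
 * combination of options is acceptable, or a static error message when a
 * node-name option is combined with its file-name counterpart.
 */
static const char *check_commit_args(bool has_base_node, bool has_base,
                                     bool has_top_node, bool has_top)
{
    if (has_base_node && has_base) {
        return "'base-node' and 'base' cannot be specified together";
    }
    if (has_top_node && has_top) {
        return "'top-node' and 'top' cannot be specified together";
    }
    /* Any other combination, including none of the four, is accepted. */
    return NULL;
}
```

Mixing styles across the two ends (for example base-node together with top) passes this check, because only each pair is exclusive.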

Comments

Eric Blake Aug. 10, 2018, 5:33 p.m. | #1
On 08/10/2018 11:26 AM, Kevin Wolf wrote:
> The block-commit QMP command required specifying the top and base nodes
> of the commit jobs using the file name of that node. While this works
> in simple cases (local files with absolute paths), the file names
> generated for more complicated setups can be hard to predict.
> 
> This adds two new options top-node and base-node to the command, which
> allow specifying node names instead. They are mutually exclusive with
> the old options.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>   qapi/block-core.json | 24 ++++++++++++++++++------
>   blockdev.c           | 32 ++++++++++++++++++++++++++++++--
>   2 files changed, 48 insertions(+), 8 deletions(-)
> 
> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index 5b9084a394..91dd075c84 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -1455,12 +1455,23 @@
>   #
>   # @device:  the device name or node-name of a root node
>   #
> -# @base:   The file name of the backing image to write data into.
> -#                    If not specified, this is the deepest backing image.
> +# @base-node: The node name of the backing image to write data into.
> +#             If not specified, this is the deepest backing image.
> +#             (since: 2.10)

I'd word this as (since 3.1)...

>   #
> -# @top:    The file name of the backing image within the image chain,
> -#                    which contains the topmost data to be committed down. If
> -#                    not specified, this is the active layer.
> +# @base: Same as @base-node, except that it is a file name rather than a node
> +#        name. This must be the exact filename string that was used to open the
> +#        node; other strings, even if addressing the same file, are not
> +#        accepted (deprecated, use @base-node instead)

...and this as (since 2.10). When we finish the deprecation and remove 
@base, then we might consolidate the 'since' documentation at that time, 
but until then, I think listing the two separate releases gives users an 
idea of how far back they might have been using the deprecated code, and 
when the preferred form was introduced.

> +#
> +# @top-node: The node name of the backing image within the image chain
> +#            which contains the topmost data to be committed down. If
> +#            not specified, this is the active layer. (since: 2.10)
> +#
> +# @top: Same as @top-node, except that it is a file name rather than a node
> +#       name. This must be the exact filename string that was used to open the
> +#       node; other strings, even if addressing the same file, are not
> +#       accepted (deprecated, use @base-node instead)

Likewise.

Actually, do we NEED new arguments? Can we just make @base and @top 
accept either an exact file name OR a node name?  On the other hand, new 
arguments are introspectible, overloading the old argument to take two 
forms is not.  So that doesn't help :(

Or, here's an idea:

Keep the name @base and @top, but add a new '*by-node':'bool' parameter, 
defaulting to false for now, but perhaps with a deprecation warning that 
we'll switch the default to true in one release and delete the parameter 
altogether in an even later release. When false, @base and @top are 
filenames, as before; when true, @base and @top are node names instead. 
Introspectible, nicer names in the long run, and avoids having to detect 
a user providing two mutually-exclusive options at once.

> +++ b/blockdev.c
> @@ -3308,7 +3308,9 @@ out:
>   }
>   
>   void qmp_block_commit(bool has_job_id, const char *job_id, const char *device,
> +                      bool has_base_node, const char *base_node,
>                         bool has_base, const char *base,
> +                      bool has_top_node, const char *top_node,
>                         bool has_top, const char *top,
>                         bool has_backing_file, const char *backing_file,
>                         bool has_speed, int64_t speed,

Getting to be a long signature. Should we use 'boxed':true in the QAPI 
file to make this easier to write?  (Separate commit)
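
A rough sketch of what a boxed calling convention could look like, assuming all arguments are collected in one struct that is passed by pointer. The type CommitArgs and the helper commit_base_selector are hypothetical illustrations, not generated QAPI output.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical argument struct; field names mirror the parameters above. */
typedef struct CommitArgs {
    const char *device;
    bool has_base_node;
    const char *base_node;
    bool has_base;
    const char *base;
    bool has_top_node;
    const char *top_node;
    bool has_top;
    const char *top;
    bool has_speed;
    int64_t speed;
} CommitArgs;

/*
 * With a boxed struct, a helper takes one pointer instead of a long list of
 * has_x/x pairs. Returns the string selecting the base, or NULL when neither
 * option was given (meaning: use the deepest backing image).
 */
static const char *commit_base_selector(const CommitArgs *args)
{
    if (args->has_base_node) {
        return args->base_node;
    }
    if (args->has_base) {
        return args->base;
    }
    return NULL;
}
```

Adding a further optional argument then only changes the struct, not every function signature that forwards the arguments.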

Kevin Wolf Aug. 13, 2018, 9:08 a.m. | #2
Am 10.08.2018 um 19:33 hat Eric Blake geschrieben:
> On 08/10/2018 11:26 AM, Kevin Wolf wrote:
> > The block-commit QMP command required specifying the top and base nodes
> > of the commit jobs using the file name of that node. While this works
> > in simple cases (local files with absolute paths), the file names
> > generated for more complicated setups can be hard to predict.
> > 
> > This adds two new options top-node and base-node to the command, which
> > allow specifying node names instead. They are mutually exclusive with
> > the old options.
> > 
> > Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> > ---
> >   qapi/block-core.json | 24 ++++++++++++++++++------
> >   blockdev.c           | 32 ++++++++++++++++++++++++++++++--
> >   2 files changed, 48 insertions(+), 8 deletions(-)
> > 
> > diff --git a/qapi/block-core.json b/qapi/block-core.json
> > index 5b9084a394..91dd075c84 100644
> > --- a/qapi/block-core.json
> > +++ b/qapi/block-core.json
> > @@ -1455,12 +1455,23 @@
> >   #
> >   # @device:  the device name or node-name of a root node
> >   #
> > -# @base:   The file name of the backing image to write data into.
> > -#                    If not specified, this is the deepest backing image.
> > +# @base-node: The node name of the backing image to write data into.
> > +#             If not specified, this is the deepest backing image.
> > +#             (since: 2.10)
> 
> I'd word this as (since 3.1)...

Whoops. Apparently I didn't read the documentation change carefully
enough when resurrecting this patch from an old branch.

> >  #
> > -# @top:    The file name of the backing image within the image chain,
> > -#                    which contains the topmost data to be committed down. If
> > -#                    not specified, this is the active layer.
> > +# @base: Same as @base-node, except that it is a file name rather than a node
> > +#        name. This must be the exact filename string that was used to open the
> > +#        node; other strings, even if addressing the same file, are not
> > +#        accepted (deprecated, use @base-node instead)
> 
> ...and this as (since 2.10).

No, 2.10 is just completely wrong. @base has existed since the command
was introduced, which is commit ed61fc10e8c or QEMU 1.3.

> When we finish the deprecation and remove @base, then we might
> consolidate the 'since' documentation at that time, but until then, I
> think listing the two separate releases gives users an idea of how far
> back they might have been using the deprecated code, and when the
> preferred form was introduced.

Yes, obviously.

> > +#
> > +# @top-node: The node name of the backing image within the image chain
> > +#            which contains the topmost data to be committed down. If
> > +#            not specified, this is the active layer. (since: 2.10)
> > +#
> > +# @top: Same as @top-node, except that it is a file name rather than a node
> > +#       name. This must be the exact filename string that was used to open the
> > +#       node; other strings, even if addressing the same file, are not
> > +#       accepted (deprecated, use @base-node instead)
> 
> Likewise.
> 
> Actually, do we NEED new arguments? Can we just make @base and @top accept
> either an exact file name OR a node name?

No, no, no, no, no!

You can't tell whether "foo" is a file name or a node name, and they
could both exist at the same time, so it would be ambiguous. We should
avoid mixing semantically different things in a single field whenever
it's possible.

The reason why node name and BlockBackend name can be used in the same
option is that they share a name space, i.e. if there is already a node
name "foo", trying to create a BlockBackend "foo" will fail, and vice
versa.
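
The shared name space can be pictured with a toy registry, sketched below under the assumption that both kinds of names live in a single table. The types name_registry and name_kind and both functions are hypothetical and do not correspond to QEMU data structures.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_NAMES 16

enum name_kind { NAME_NODE, NAME_BACKEND };

/* Toy registry: one table holds node names and BlockBackend names alike. */
struct name_registry {
    const char *names[MAX_NAMES]; /* caller keeps the strings alive */
    enum name_kind kinds[MAX_NAMES];
    size_t count;
};

/* Fails if the name is already taken by an object of either kind. */
static bool registry_add(struct name_registry *r, const char *name,
                         enum name_kind kind)
{
    for (size_t i = 0; i < r->count; i++) {
        if (strcmp(r->names[i], name) == 0) {
            return false;
        }
    }
    if (r->count == MAX_NAMES) {
        return false;
    }
    r->names[r->count] = name;
    r->kinds[r->count] = kind;
    r->count++;
    return true;
}

/* A lookup is unambiguous because each name can be registered only once. */
static bool registry_lookup(const struct name_registry *r, const char *name,
                            enum name_kind *kind)
{
    for (size_t i = 0; i < r->count; i++) {
        if (strcmp(r->names[i], name) == 0) {
            *kind = r->kinds[i];
            return true;
        }
    }
    return false;
}
```

File names have no such registry: nothing prevents a file called "foo" from coexisting with a node called "foo", which is why a single field accepting both would be ambiguous.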

> On the other hand, new arguments are introspectible, overloading the
> old argument to take two forms is not.
> So that doesn't help :(

That, too, yes.

> Or, here's an idea:
> 
> Keep the name @base and @top, but add a new '*by-node':'bool' parameter,
> defaulting to false for now, but perhaps with a deprecation warning that
> we'll switch the default to true in one release and delete the parameter
> altogether in an even later release. When false, @base and @top are
> filenames, as before; when true, @base and @top are node names instead.
> Introspectible, nicer names in the long run, and avoids having to detect a
> user providing two mutually-exclusive options at once.

I don't like options that completely change the semantics of other
options, but maybe that's just personal preference.

Anyway, thinking about the long term for block-commit is useless, the
command is just broken (for example, the @device option doesn't make any
sense) and will have to be replaced. But libvirt needs something _now_
for the -blockdev support, so I decided to add this as a quick hack
before we get the proper replacement.

I think it makes more sense to create a new blockdev-commit (which
would be a name more in line with the other block job commands) and
deprecate the old block-commit command as a whole.

> > +++ b/blockdev.c
> > @@ -3308,7 +3308,9 @@ out:
> >   }
> >   void qmp_block_commit(bool has_job_id, const char *job_id, const char *device,
> > +                      bool has_base_node, const char *base_node,
> >                         bool has_base, const char *base,
> > +                      bool has_top_node, const char *top_node,
> >                         bool has_top, const char *top,
> >                         bool has_backing_file, const char *backing_file,
> >                         bool has_speed, int64_t speed,
> 
> Getting to be a long signature. Should we use 'boxed':true in the QAPI file
> to make this easier to write?  (Separate commit)

It's an option.

Has any progress been made on the plan to support defaults in QAPI, so
that we could get rid of the has_* parameters?

Kevin

Markus Armbruster Aug. 13, 2018, 9:35 a.m. | #3
Kevin Wolf <kwolf@redhat.com> writes:

> Am 10.08.2018 um 19:33 hat Eric Blake geschrieben:
>> On 08/10/2018 11:26 AM, Kevin Wolf wrote:
>> > The block-commit QMP command required specifying the top and base nodes
>> > of the commit jobs using the file name of that node. While this works
>> > in simple cases (local files with absolute paths), the file names
>> > generated for more complicated setups can be hard to predict.
>> > 
>> > This adds two new options top-node and base-node to the command, which
>> > allow specifying node names instead. They are mutually exclusive with
>> > the old options.
>> > 
>> > Signed-off-by: Kevin Wolf <kwolf@redhat.com>
>> > ---
>> >   qapi/block-core.json | 24 ++++++++++++++++++------
>> >   blockdev.c           | 32 ++++++++++++++++++++++++++++++--
>> >   2 files changed, 48 insertions(+), 8 deletions(-)
>> > 
>> > diff --git a/qapi/block-core.json b/qapi/block-core.json
>> > index 5b9084a394..91dd075c84 100644
>> > --- a/qapi/block-core.json
>> > +++ b/qapi/block-core.json
[...]
>> Or, here's an idea:
>> 
>> Keep the name @base and @top, but add a new '*by-node':'bool' parameter,
>> defaulting to false for now, but perhaps with a deprecation warning that
>> we'll switch the default to true in one release and delete the parameter
>> altogether in an even later release. When false, @base and @top are
>> filenames, as before; when true, @base and @top are node names instead.
>> Introspectible, nicer names in the long run, and avoids having to detect a
>> user providing two mutually-exclusive options at once.
>
> I don't like options that completely change the semantics of other
> options, but maybe that's just personal preference.

I happen to share it.

> Anyway, thinking about the long term for block-commit is useless, the
> command is just broken (for example, the @device option doesn't make any
> sense) and will have to be replaced. But libvirt needs something _now_
> for the -blockdev support, so I decided to add this as a quick hack
> before we get the proper replacement.
>
> I think it makes more sense to create a new blockdev-commit (which
> would be a name more in line with the other block job commands) and
> deprecate the old block-commit command as a whole.
>
>> > +++ b/blockdev.c
>> > @@ -3308,7 +3308,9 @@ out:
>> >   }
>> >   void qmp_block_commit(bool has_job_id, const char *job_id, const char *device,
>> > +                      bool has_base_node, const char *base_node,
>> >                         bool has_base, const char *base,
>> > +                      bool has_top_node, const char *top_node,
>> >                         bool has_top, const char *top,
>> >                         bool has_backing_file, const char *backing_file,
>> >                         bool has_speed, int64_t speed,
>> 
>> Getting to be a long signature. Should we use 'boxed':true in the QAPI file
>> to make this easier to write?  (Separate commit)
>
> It's an option.
>
> Has any progress been made on the plan to support defaults in QAPI, so
> that we could get rid of the has_* parameters?

No.  It's one of those things that keep getting pushed out by more
important or urgent stuff.

I expect it to be straightforward, if tedious.

Eric Blake Aug. 13, 2018, 3:08 p.m. | #4
On 08/13/2018 04:35 AM, Markus Armbruster wrote:

>>> Or, here's an idea:
>>>
>>> Keep the name @base and @top, but add a new '*by-node':'bool' parameter,
>>> defaulting to false for now, but perhaps with a deprecation warning that
>>> we'll switch the default to true in one release and delete the parameter
>>> altogether in an even later release. When false, @base and @top are
>>> filenames, as before; when true, @base and @top are node names instead.
>>> Introspectible, nicer names in the long run, and avoids having to detect a
>>> user providing two mutually-exclusive options at once.
>>
>> I don't like options that completely change the semantics of other
>> options, but maybe that's just personal preference.
> 
> I happen to share it.

Okay, we'll ditch that idea as a non-starter.

> 
>> Anyway, thinking about the long term for block-commit is useless, the
>> command is just broken (for example, the @device option doesn't make any
>> sense) and will have to be replaced. But libvirt needs something _now_
>> for the -blockdev support, so I decided to add this as a quick hack
>> before we get the proper replacement.
>>
>> I think it makes more sense to create a new blockdev-commit (which
>> would be a name more in line with the other block job commands) and
>> deprecate the old block-commit command as a whole.

Okay, looks like a good plan for the long term, and thus a good 
rationale for the short-term choices. The commit message could call that 
out.


>> Has any progress been made on the plan to support defaults in QAPI, so
>> that we could get rid of the has_* parameters?
> 
> No.  It's one of those things that keep getting pushed out by more
> important or urgent stuff.
> 
> I expect it to be straightforward, if tedious.

In part, Marc-Andre's work to get conditional compilation in has gotten 
us closer, in that we can have 'name':{'type':'foo','if':'...'} instead 
of 'name':'type', since that dict for conditional compilation is also 
where we would stick in default values.

Max Reitz Aug. 13, 2018, 4:40 p.m. | #5
On 2018-08-10 18:26, Kevin Wolf wrote:
> The block-commit QMP command required specifying the top and base nodes
> of the commit jobs using the file name of that node. While this works
> in simple cases (local files with absolute paths), the file names
> generated for more complicated setups can be hard to predict.
> 
> This adds two new options top-node and base-node to the command, which
> allow specifying node names instead. They are mutually exclusive with
> the old options.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  qapi/block-core.json | 24 ++++++++++++++++++------
>  blockdev.c           | 32 ++++++++++++++++++++++++++++++--
>  2 files changed, 48 insertions(+), 8 deletions(-)

Looks good to me, but you made me a bit cautious with your talk of how
many pitfalls you've encountered on your way to do this change...

Max

Peter Krempa Aug. 28, 2018, 2:26 p.m. | #6
On Fri, Aug 10, 2018 at 18:26:57 +0200, Kevin Wolf wrote:
> The block-commit QMP command required specifying the top and base nodes
> of the commit jobs using the file name of that node. While this works
> in simple cases (local files with absolute paths), the file names
> generated for more complicated setups can be hard to predict.
> 
> This adds two new options top-node and base-node to the command, which
> allow specifying node names instead. They are mutually exclusive with
> the old options.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  qapi/block-core.json | 24 ++++++++++++++++++------
>  blockdev.c           | 32 ++++++++++++++++++++++++++++++--
>  2 files changed, 48 insertions(+), 8 deletions(-)

While the below is not strictly relevant to this patch, using
block-commit is currently not possible with -blockdev, so the new
arguments are not very useful as things stand.

With the new options I'm getting:

{"execute":"block-commit",
 "arguments": { "device":"libvirt-3-format",
                "job-id":"libvirt-3-format",
                "top-node":"libvirt-8-format",
                "base-node":"libvirt-9-format",
                "auto-finalize":true,
                "auto-dismiss":false},
 "id":"libvirt-16"}

{"id":"libvirt-16",
 "error":{"class":"GenericError",
          "desc":"Block node is read-only"}}

I'm pointing into the backing chain so the files are declared as read-only.

It works just fine if I open them as read-write with
-blockdev/blockdev-add, but that is obviously not correct, as you then
can't share parts of the backing chain with other VMs due to image
locking.

libvirt-3-format is read-write and all other node names are readonly in
the above example.

The same also happens when using filenames:

{"execute":"block-commit",
 "arguments" : {"device":"libvirt-3-format",
                "job-id":"libvirt-3-format",
                "top":"/var/lib/libvirt/images/rhel7.3.1483615252",
                "base":"/var/lib/libvirt/images/rhel7.3.1483605924",
                "auto-finalize":true,
                "auto-dismiss":false},
 "id":"libvirt-13"}

{"id":"libvirt-13","error":{"class":"GenericError","desc":"Block node is read-only"}}


When I use the drive alias rather than the node-name for the 'device'
argument it works as expected.

{"execute":"block-commit",
 "arguments": { "device":"drive-virtio-disk0",
                "job-id":"drive-virtio-disk0",
                "top":"/var/lib/libvirt/images/rhel7.3.1483536402",
                "base":"/var/lib/libvirt/images/rhel7.3.1483545313"},
 "id":"libvirt-18"}


I was not able to find anything that would allow reopening the file R/W
for the block-commit operation, but I suspect it should be done
automatically, as it was before -blockdev.

Peter

Kevin Wolf Sept. 3, 2018, 3:03 p.m. | #7
Am 28.08.2018 um 16:26 hat Peter Krempa geschrieben:
> On Fri, Aug 10, 2018 at 18:26:57 +0200, Kevin Wolf wrote:
> > The block-commit QMP command required specifying the top and base nodes
> > of the commit jobs using the file name of that node. While this works
> > in simple cases (local files with absolute paths), the file names
> > generated for more complicated setups can be hard to predict.
> > 
> > This adds two new options top-node and base-node to the command, which
> > allow specifying node names instead. They are mutually exclusive with
> > the old options.
> > 
> > Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> > ---
> >  qapi/block-core.json | 24 ++++++++++++++++++------
> >  blockdev.c           | 32 ++++++++++++++++++++++++++++++--
> >  2 files changed, 48 insertions(+), 8 deletions(-)
> 
> While the below is not strictly relevant to this patch usage
> block-commit is not possible when using -blockdev. Thus the new
> arguments are not very useful otherwise.
> 
> With the new options I'm getting:
> 
> {"execute":"block-commit",
>  "arguments": { "device":"libvirt-3-format",
>                 "job-id":"libvirt-3-format",
>                 "top-node":"libvirt-8-format",
>                 "base-node":"libvirt-9-format",
>                 "auto-finalize":true,
>                 "auto-dismiss":false},
>  "id":"libvirt-16"}
> 
> {"id":"libvirt-16",
>  "error":{"class":"GenericError",
>           "desc":"Block node is read-only"}}
> 
> I'm pointing into the backing chain so the files are declared as read-only.
> 
> It works just-fine if I open them as read-write with
> -blockdev/blockdev-add but that obviously is not correct as you can't
> then share parts of the backing chain with other VMs due to image
> locking.
> 
> libvirt-3-format is read-write and all other node names are readonly in
> the above example.
> 
> The same also happens when using filenames:
> 
> {"execute":"block-commit",
>  "arguments" : {"device":"libvirt-3-format",
>                 "job-id":"libvirt-3-format",
>                 "top":"/var/lib/libvirt/images/rhel7.3.1483615252",
>                 "base":"/var/lib/libvirt/images/rhel7.3.1483605924",
>                 "auto-finalize":true,
>                 "auto-dismiss":false},
>  "id":"libvirt-13"}
> 
> {"id":"libvirt-13","error":{"class":"GenericError","desc":"Block node is read-only"}}

I see what's happening here. So we have a graph like this:

     guest device
          |
          v
    overlay-format -------> backing-format
    [read-only=off]         [read-only=on]
          |                        |
          v                        v
    overlay-proto           backing-proto
    [read-only=off]         [read-only=on]

The difference between your -blockdev use and -drive is that you
explicitly specify the read-only option for backing-proto (and you use a
separate -blockdev option anyway), so it doesn't just inherit it from
backing-format.

Now, when the commit job tries to reopen backing-format, your explicit
read-only=on for backing-proto still takes precedence and the node stays
read-only. If you hadn't used a separate -blockdev for backing-proto,
but included it in the definition for backing-format and left out the
read-only option, it would have inherited the option and reopen would
adjust both nodes. This is what happens with -drive.
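
The precedence rule can be illustrated with a toy model, assuming each node records whether its read-only setting was given explicitly or inherited from its parent. The struct node and the functions reopen and chain_writable are hypothetical and much simpler than the real reopen code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model of a format node with a protocol node below it. */
struct node {
    bool read_only;          /* effective state */
    bool explicit_read_only; /* true if the user set read-only explicitly */
    struct node *child;      /* node below this one, or NULL */
};

/*
 * Reopen the given node with a new read-only setting. Children that merely
 * inherited the option follow along; a child with an explicit setting keeps
 * it, and propagation stops there.
 */
static void reopen(struct node *n, bool read_only)
{
    n->read_only = read_only;
    for (struct node *c = n->child; c != NULL; c = c->child) {
        if (c->explicit_read_only) {
            break; /* the explicit option takes precedence */
        }
        c->read_only = read_only;
    }
}

/* A commit target is only usable if every node in its chain is writable. */
static bool chain_writable(const struct node *n)
{
    for (; n != NULL; n = n->child) {
        if (n->read_only) {
            return false;
        }
    }
    return true;
}
```

In this model the -blockdev case corresponds to a protocol node with explicit_read_only set, so reopening only the format node leaves the chain read-only, while the -drive case corresponds to an inheriting protocol node that follows the format node.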

So essentially, I guess, all places that want to switch between
read-only and read-write need to learn which other nodes (apart from the
top-level node they are interested in) must be reopened as well.

This looks a bit messy. :-/

Any good ideas anyone?

Kevin

Alberto Garcia Sept. 4, 2018, 1:13 p.m. | #8
On Mon 03 Sep 2018 05:03:11 PM CEST, Kevin Wolf wrote:
>> libvirt-3-format is read-write and all other node names are readonly in
>> the above example.
>> 
>> The same also happens when using filenames:
>> 
>> {"execute":"block-commit",
>>  "arguments" : {"device":"libvirt-3-format",
>>                 "job-id":"libvirt-3-format",
>>                 "top":"/var/lib/libvirt/images/rhel7.3.1483615252",
>>                 "base":"/var/lib/libvirt/images/rhel7.3.1483605924",
>>                 "auto-finalize":true,
>>                 "auto-dismiss":false},
>>  "id":"libvirt-13"}
>> 
>> {"id":"libvirt-13","error":{"class":"GenericError","desc":"Block node is read-only"}}
>
> I see what's happening here. So we have a graph like this:
>
>      guest device
>           |
>           v
>     overlay-format -------> backing-format
>     [read-only=off]         [read-only=on]
>           |                        |
>           v                        v
>     overlay-proto           backing-proto
>     [read-only=off]         [read-only=on]
>
> The difference between your -blockdev use and -drive is that you
> explicitly specify the read-only option for backing-proto (and you use
> a separate -blockdev option anyway), so it doesn't just inherit it
> from backing-format.

Are these format and protocol block devices opened with four separate
-blockdev parameters? Is that how libvirt does it?

Berto

Peter Krempa Sept. 4, 2018, 2:17 p.m. | #9
On Tue, Sep 04, 2018 at 15:13:44 +0200, Alberto Garcia wrote:
> On Mon 03 Sep 2018 05:03:11 PM CEST, Kevin Wolf wrote:
> >> libvirt-3-format is read-write and all other node names are readonly in
> >> the above example.
> >> 
> >> The same also happens when using filenames:
> >> 
> >> {"execute":"block-commit",
> >>  "arguments" : {"device":"libvirt-3-format",
> >>                 "job-id":"libvirt-3-format",
> >>                 "top":"/var/lib/libvirt/images/rhel7.3.1483615252",
> >>                 "base":"/var/lib/libvirt/images/rhel7.3.1483605924",
> >>                 "auto-finalize":true,
> >>                 "auto-dismiss":false},
> >>  "id":"libvirt-13"}
> >> 
> >> {"id":"libvirt-13","error":{"class":"GenericError","desc":"Block node is read-only"}}
> >
> > I see what's happening here. So we have a graph like this:
> >
> >      guest device
> >           |
> >           v
> >     overlay-format -------> backing-format
> >     [read-only=off]         [read-only=on]
> >           |                        |
> >           v                        v
> >     overlay-proto           backing-proto
> >     [read-only=off]         [read-only=on]
> >
> > The difference between your -blockdev use and -drive is that you
> > explicitly specify the read-only option for backing-proto (and you use
> > a separate -blockdev option anyway), so it doesn't just inherit it
> > from backing-format.
> 
> Are these format and protocol block devices opened with four separate
> -blockdev parameters? Is that how libvirt does it?

Yes. This goes along with the fact that for 'blockdev-create' you need
to blockdev-add the file which you want to format, but the formatted
file is not automatically added.

If we used the approach where the protocol layer is opened as part of
the format layer, it would complicate the snapshot code, where we need
to add a file and then format it to qcow2. It would mean that we'd have
to blockdev-add a file, format it via blockdev-create, then blockdev-del
it and open it together with the format layer. Otherwise the disk
hot-unplug code would be plain crazy.

Peter Krempa Sept. 4, 2018, 2:21 p.m. | #10
On Mon, Sep 03, 2018 at 17:03:11 +0200, Kevin Wolf wrote:
> Am 28.08.2018 um 16:26 hat Peter Krempa geschrieben:
> > On Fri, Aug 10, 2018 at 18:26:57 +0200, Kevin Wolf wrote:
> > > The block-commit QMP command required specifying the top and base nodes
> > > of the commit jobs using the file name of that node. While this works
> > > in simple cases (local files with absolute paths), the file names
> > > generated for more complicated setups can be hard to predict.
> > > 
> > > This adds two new options top-node and base-node to the command, which
> > > allow specifying node names instead. They are mutually exclusive with
> > > the old options.
> > > 
> > > Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> > > ---
> > >  qapi/block-core.json | 24 ++++++++++++++++++------
> > >  blockdev.c           | 32 ++++++++++++++++++++++++++++++--
> > >  2 files changed, 48 insertions(+), 8 deletions(-)
> > 
> > While the below is not strictly relevant to this patch usage
> > block-commit is not possible when using -blockdev. Thus the new
> > arguments are not very useful otherwise.
> > 
> > With the new options I'm getting:
> > 
> > {"execute":"block-commit",
> >  "arguments": { "device":"libvirt-3-format",
> >                 "job-id":"libvirt-3-format",
> >                 "top-node":"libvirt-8-format",
> >                 "base-node":"libvirt-9-format",
> >                 "auto-finalize":true,
> >                 "auto-dismiss":false},
> >  "id":"libvirt-16"}
> > 
> > {"id":"libvirt-16",
> >  "error":{"class":"GenericError",
> >           "desc":"Block node is read-only"}}
> > 
> > I'm pointing into the backing chain so the files are declared as read-only.
> > 
> > It works just-fine if I open them as read-write with
> > -blockdev/blockdev-add but that obviously is not correct as you can't
> > then share parts of the backing chain with other VMs due to image
> > locking.
> > 
> > libvirt-3-format is read-write and all other node names are readonly in
> > the above example.
> > 
> > The same also happens when using filenames:
> > 
> > {"execute":"block-commit",
> >  "arguments" : {"device":"libvirt-3-format",
> >                 "job-id":"libvirt-3-format",
> >                 "top":"/var/lib/libvirt/images/rhel7.3.1483615252",
> >                 "base":"/var/lib/libvirt/images/rhel7.3.1483605924",
> >                 "auto-finalize":true,
> >                 "auto-dismiss":false},
> >  "id":"libvirt-13"}
> > 
> > {"id":"libvirt-13","error":{"class":"GenericError","desc":"Block node is read-only"}}
> 
> I see what's happening here. So we have a graph like this:
> 
>      guest device
>           |
>           v
>     overlay-format -------> backing-format
>     [read-only=off]         [read-only=on]
>           |                        |
>           v                        v
>     overlay-proto           backing-proto
>     [read-only=off]         [read-only=on]
> 
> The difference between your -blockdev use and -drive is that you
> explicitly specify the read-only option for backing-proto (and you use a
> separate -blockdev option anyway), so it doesn't just inherit it from
> backing-format.
> 
> Now, when the commit job tries to reopen backing-format, your explicit
> read-only=on for backing-proto still takes precedence and the node stays
> read-only. If you hadn't used a separate -blockdev for backing-proto,
> but included it in the definition for backing-format and left out the
> read-only option, it would have inherited the option and reopen would
> adjust both nodes. This is what happens with -drive.
> 
> So essentially, I guess, all places that want to switch between
> read-only and read-write need to learn which other nodes (apart from the
> top-level node they are interested in) must be reopened as well.

In theory we can always open the protocol layer read-write if that does
not conflict with the image locking code (I did not test that).

Opening them as a dependency of the format layer instead would
complicate things with blockdev-create, where that is not possible; it
would require blockdev-add (proto), blockdev-create,
blockdev-del (proto), blockdev-add (format+proto).

Alberto Garcia Sept. 4, 2018, 2:42 p.m. | #11
On Tue 04 Sep 2018 04:17:30 PM CEST, Peter Krempa wrote:
>> >> libvirt-3-format is read-write and all other node names are
>> >> readonly in the above example.
>> >> 
>> >> The same also happens when using filenames:
>> >> 
>> >> {"execute":"block-commit",
>> >>  "arguments" : {"device":"libvirt-3-format",
>> >>                 "job-id":"libvirt-3-format",
>> >>                 "top":"/var/lib/libvirt/images/rhel7.3.1483615252",
>> >>                 "base":"/var/lib/libvirt/images/rhel7.3.1483605924",
>> >>                 "auto-finalize":true,
>> >>                 "auto-dismiss":false},
>> >>  "id":"libvirt-13"}
>> >> 
>> >> {"id":"libvirt-13","error":{"class":"GenericError","desc":"Block node is read-only"}}
>> >
>> > I see what's happening here. So we have a graph like this:
>> >
>> >      guest device
>> >           |
>> >           v
>> >     overlay-format -------> backing-format
>> >     [read-only=off]         [read-only=on]
>> >           |                        |
>> >           v                        v
>> >     overlay-proto           backing-proto
>> >     [read-only=off]         [read-only=on]
>> >
>> > The difference between your -blockdev use and -drive is that you
>> > explicitly specify the read-only option for backing-proto (and you use
>> > a separate -blockdev option anyway), so it doesn't just inherit it
>> > from backing-format.
>> 
>> Are these format and protocol block devices opened with four separate
>> -blockdev parameters? Is that how libvirt does it?
>
> Yes. This goes along with the fact that for 'blockdev-create' you need
> to blockdev-add the file which you want to format, but the formatted
> file is not automatically added.
>
> If we'd use the approach where the protocol layer is opened as part of
> the format layer it would complicate the snapshot code where we need
> to add a file and then format it to qcow2. It would mean that we'd
> have to blockdev-add a file, format it via blockdev-create, then
> blockdev-del it and open it together with the format layer. Otherwise
> the disk hot-unplug code would be plain crazy.

Do you need to add the protocol layer in order to format it, though? :-?

(I'm just trying to understand how this works, I'm not too familiar with
blockdev-create)

{'execute': 'blockdev-create',
 'arguments': {'job-id': 'job0',
               'options': {'driver': 'file',
                           'filename': 'test.qcow2',
                           'size': 0}}}
{'execute': 'job-dismiss', 'arguments': {'id': 'job0'}}

{'execute': 'blockdev-create',
 'arguments': {'job-id': 'job1',
               'options': { 'driver': 'qcow2',
                            'size': 1048576,
                            'file': {'driver': 'file',
                                     'filename': 'test.qcow2'}}}}
{'execute': 'job-dismiss', 'arguments': {'id': 'job1'}}

Berto
Peter Krempa Sept. 4, 2018, 3 p.m. | #12
On Tue, Sep 04, 2018 at 16:42:17 +0200, Alberto Garcia wrote:
> On Tue 04 Sep 2018 04:17:30 PM CEST, Peter Krempa wrote:
> >> >> libvirt-3-format is read-write and all other node names are
> >> >> readonly in the above example.
> >> >> 
> >> >> The same also happens when using filenames:
> >> >> 
> >> >> {"execute":"block-commit",
> >> >>  "arguments" : {"device":"libvirt-3-format",
> >> >>                 "job-id":"libvirt-3-format",
> >> >>                 "top":"/var/lib/libvirt/images/rhel7.3.1483615252",
> >> >>                 "base":"/var/lib/libvirt/images/rhel7.3.1483605924",
> >> >>                 "auto-finalize":true,
> >> >>                 "auto-dismiss":false},
> >> >>  "id":"libvirt-13"}
> >> >> 
> >> >> {"id":"libvirt-13","error":{"class":"GenericError","desc":"Block node is read-only"}}
> >> >
> >> > I see what's happening here. So we have a graph like this:
> >> >
> >> >      guest device
> >> >           |
> >> >           v
> >> >     overlay-format -------> backing-format
> >> >     [read-only=off]         [read-only=on]
> >> >           |                        |
> >> >           v                        v
> >> >     overlay-proto           backing-proto
> >> >     [read-only=off]         [read-only=on]
> >> >
> >> > The difference between your -blockdev use and -drive is that you
> >> > explicitly specify the read-only option for backing-proto (and you use
> >> > a separate -blockdev option anyway), so it doesn't just inherit it
> >> > from backing-format.
> >> 
> >> Are these format and protocol block devices opened with four separate
> >> -blockdev parameters? Is that how libvirt does it?
> >
> > Yes. This goes along with the fact that for 'blockdev-create' you need
> > to blockdev-add the file which you want to format, but the formatted
> > file is not automatically added.
> >
> > If we'd use the approach where the protocol layer is opened as part of
> > the format layer it would complicate the snapshot code where we need
> > to add a file and then format it to qcow2. It would mean that we'd
> > have to blockdev-add a file, format it via blockdev-create, then
> > blockdev-del it and open it together with the format layer. Otherwise
> > the disk hot-unplug code would be plain crazy.
> 
> Do you need to add the protocol layer in order to format it, though? :-?
> 
> (I'm just trying to understand how this works, I'm not too familiar with
> blockdev-create)
> 
> {'execute': 'blockdev-create',
>  'arguments': {'job-id': 'job0',
>                'options': {'driver': 'file',
>                            'filename': 'test.qcow2',
>                            'size': 0}}}
> {'execute': 'job-dismiss', 'arguments': {'id': 'job0'}}
> 
> {'execute': 'blockdev-create',
>  'arguments': {'job-id': 'job1',
>                'options': { 'driver': 'qcow2',
>                             'size': 1048576,
>                             'file': {'driver': 'file',
>                                      'filename': 'test.qcow2'}}}}
> {'execute': 'job-dismiss', 'arguments': {'id': 'job1'}}

Honestly I've reused the existing approach and did not try without
actually adding the protocol layer.

I remember being told some time ago to specify both layers explicitly.
Since it's not yet enabled in libvirt we theoretically could change to
one -blockdev for format+protocol but in that case we need some kind
of guarantee that every (even new) feature will work with it.

Switching between those approaches once we enable it upstream will not
be possible without adding a lot of compatibility code.
Kevin Wolf Sept. 4, 2018, 3:34 p.m. | #13
Am 04.09.2018 um 17:00 hat Peter Krempa geschrieben:
> On Tue, Sep 04, 2018 at 16:42:17 +0200, Alberto Garcia wrote:
> > On Tue 04 Sep 2018 04:17:30 PM CEST, Peter Krempa wrote:
> > >> >> libvirt-3-format is read-write and all other node names are
> > >> >> readonly in the above example.
> > >> >> 
> > >> >> The same also happens when using filenames:
> > >> >> 
> > >> >> {"execute":"block-commit",
> > >> >>  "arguments" : {"device":"libvirt-3-format",
> > >> >>                 "job-id":"libvirt-3-format",
> > >> >>                 "top":"/var/lib/libvirt/images/rhel7.3.1483615252",
> > >> >>                 "base":"/var/lib/libvirt/images/rhel7.3.1483605924",
> > >> >>                 "auto-finalize":true,
> > >> >>                 "auto-dismiss":false},
> > >> >>  "id":"libvirt-13"}
> > >> >> 
> > >> >> {"id":"libvirt-13","error":{"class":"GenericError","desc":"Block node is read-only"}}
> > >> >
> > >> > I see what's happening here. So we have a graph like this:
> > >> >
> > >> >      guest device
> > >> >           |
> > >> >           v
> > >> >     overlay-format -------> backing-format
> > >> >     [read-only=off]         [read-only=on]
> > >> >           |                        |
> > >> >           v                        v
> > >> >     overlay-proto           backing-proto
> > >> >     [read-only=off]         [read-only=on]
> > >> >
> > >> > The difference between your -blockdev use and -drive is that you
> > >> > explicitly specify the read-only option for backing-proto (and you use
> > >> > a separate -blockdev option anyway), so it doesn't just inherit it
> > >> > from backing-format.
> > >> 
> > >> Are these format and protocol block devices opened with four separate
> > >> -blockdev parameters? Is that how libvirt does it?
> > >
> > > Yes. This goes along with the fact that for 'blockdev-create' you need
> > > to blockdev-add the file which you want to format, but the formatted
> > > file is not automatically added.
> > >
> > > If we'd use the approach where the protocol layer is opened as part of
> > > the format layer it would complicate the snapshot code where we need
> > > to add a file and then format it to qcow2. It would mean that we'd
> > > have to blockdev-add a file, format it via blockdev-create, then
> > > blockdev-del it and open it together with the format layer. Otherwise
> > > the disk hot-unplug code would be plain crazy.
> > 
> > Do you need to add the protocol layer in order to format it, though? :-?
> > 
> > (I'm just trying to understand how this works, I'm not too familiar with
> > blockdev-create)
> > 
> > {'execute': 'blockdev-create',
> >  'arguments': {'job-id': 'job0',
> >                'options': {'driver': 'file',
> >                            'filename': 'test.qcow2',
> >                            'size': 0}}}
> > {'execute': 'job-dismiss', 'arguments': {'id': 'job0'}}
> > 
> > {'execute': 'blockdev-create',
> >  'arguments': {'job-id': 'job1',
> >                'options': { 'driver': 'qcow2',
> >                             'size': 1048576,
> >                             'file': {'driver': 'file',
> >                                      'filename': 'test.qcow2'}}}}
> > {'execute': 'job-dismiss', 'arguments': {'id': 'job1'}}
> 
> Honestly I've reused the existing approach and did not try without
> actually adding the protocol layer.
> 
> I remember being told some time ago to specify both layers explicitly.
> Since it's not yet enabled in libvirt we theoretically could change to
> one -blockdev for format+protocol but in that case we need some kind
> of guarantee that every (even new) feature will work with it.
> 
> Switching between those approaches once we enable it upstream will not
> be possible without adding a lot of compatibility code.

Yeah, I think specifying both layers explicitly is cleaner. This should
probably be solved some way inside QEMU.

The read-only option for the backend isn't that useful anyway. Maybe we
should do away with it, at least for its current purpose, and just rely
on write permissions taken by parents. We could then either silently
ignore (and deprecate) the read-only backend option or we could change
its semantics to mean "never allow a writer on this node".

Kevin
Peter Krempa Sept. 5, 2018, 12:38 p.m. | #14
On Tue, Sep 04, 2018 at 17:34:36 +0200, Kevin Wolf wrote:
> Am 04.09.2018 um 17:00 hat Peter Krempa geschrieben:
> > On Tue, Sep 04, 2018 at 16:42:17 +0200, Alberto Garcia wrote:
> > > On Tue 04 Sep 2018 04:17:30 PM CEST, Peter Krempa wrote:

[...]

> > I remember being told some time ago to specify both layers explicitly.
> > Since it's not yet enabled in libvirt we theoretically could change to
> > one -blockdev for format+protocol but in that case we need some kind
> > of guarantee that every (even new) feature will work with it.
> > 
> > Switching between those approaches once we enable it upstream will not
> > be possible without adding a lot of compatibility code.
> 
> Yeah, I think specifying both layers explicitly is cleaner. This should
> probably be solved some way inside QEMU.
> 
> The read-only option for the backend isn't that useful anyway. Maybe we
> should do away with it, at least for its current purpose, and just rely
> on write permissions taken by parents. We could then either silently
> ignore (and deprecate) the read-only backend option or we could change
> its semantics to mean "never allow a writer on this node".

So I tried that approach and it seems to work just fine with files
including sharing part of the read-only backing chain with other VMs
without the image locking mechanism ruining the day.

block-commit is able to reopen the format layers and works as expected.

Unfortunately though the 'read-only' option is actually useful as the
curl-driver does not work without it:

-blockdev {"driver":"http","url":"http://ftp.sjtu.edu.cn:80/ubuntu-cd/12.04/ubuntu-12.04.5-alternate-amd64.iso","node-name":"libvirt-2-storage","discard":"unmap"}: curl block device does not support writes

We obviously can encode that knowledge into libvirt but it will be hard
to undo if qemu eventually supports writes in the curl driver.

Which other protocol drivers don't support writes, in case we have to go
this way?
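
To make the "encode that knowledge into libvirt" option concrete, here is a rough sketch of what such a table could look like. It is written in Python for brevity and is not libvirt code: `blockdev_props()` and `WRITE_UNSUPPORTED_DRIVERS` are hypothetical names, the example URL is a placeholder, and only "http" is confirmed by the error above; listing the other curl-based protocols is an assumption.

```python
import json

# Assumption: protocol drivers that refuse to open without read-only.
# Only "http" is confirmed by the error message in this thread; the other
# curl-based protocols are assumed to behave the same way.
WRITE_UNSUPPORTED_DRIVERS = {"http", "https", "ftp", "ftps"}


def blockdev_props(driver, node_name, want_read_only=False, **options):
    """Build -blockdev JSON properties, forcing read-only where required."""
    props = {"driver": driver, "node-name": node_name}
    props.update(options)
    if want_read_only or driver in WRITE_UNSUPPORTED_DRIVERS:
        props["read-only"] = True
    return props


print(json.dumps(blockdev_props("http", "libvirt-2-storage",
                                url="http://example.org/image.iso")))
print(json.dumps(blockdev_props("file", "libvirt-1-storage",
                                filename="/path/to/image")))
```

The drawback is the one noted above: a hard-coded table like this has to be revisited whenever a driver gains write support.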
Eric Blake Sept. 5, 2018, 1:48 p.m. | #15
On 09/05/2018 07:38 AM, Peter Krempa wrote:

> block-commit is able to reopen the format layers and works as expected.
> 
> Unfortunately though the 'read-only' option is actually useful as the
> curl-driver does not work without it:
> 
> -blockdev {"driver":"http","url":"http://ftp.sjtu.edu.cn:80/ubuntu-cd/12.04/ubuntu-12.04.5-alternate-amd64.iso","node-name":"libvirt-2-storage","discard":"unmap"}: curl block device does not support writes
> 
> We obviously can encode that knowledge into libvirt but it will be hard
> to undo if qemu eventually supports writes in the curl driver.
> 
> Which other protocol drivers don't support writes, in case we have to go
> this way?

When an NBD server exports an image as read-only, the NBD block client
cannot request write permissions.  But that's a runtime discovery 
process, not a limitation of the block driver itself.
Peter Krempa Sept. 5, 2018, 2:02 p.m. | #16
On Wed, Sep 05, 2018 at 08:48:15 -0500, Eric Blake wrote:
> On 09/05/2018 07:38 AM, Peter Krempa wrote:
> 
> > block-commit is able to reopen the format layers and works as expected.
> > 
> > Unfortunately though the 'read-only' option is actually useful as the
> > curl-driver does not work without it:
> > 
> > -blockdev {"driver":"http","url":"http://ftp.sjtu.edu.cn:80/ubuntu-cd/12.04/ubuntu-12.04.5-alternate-amd64.iso","node-name":"libvirt-2-storage","discard":"unmap"}: curl block device does not support writes
> > 
> > We obviously can encode that knowledge into libvirt but it will be hard
> > to undo if qemu eventually supports writes in the curl driver.
> > 
> > Which other protocol drivers don't support writes, in case we have to go
> > this way?
> 
> When an NBD server exports an image as read-only, the NBD block client
> cannot request write permissions.  But that's a runtime discovery process,
> not a limitation of the block driver itself.

Hmmm, that's unfortunate. Because in some cases we don't know this fact
upfront in libvirt and we also don't know whether a user might attempt
to block-commit at some time.

We probably do need a way to specify that we want
'read-write-if-possible' behaviour.

Peter
Kevin Wolf Sept. 5, 2018, 3:04 p.m. | #17
Am 05.09.2018 um 16:02 hat Peter Krempa geschrieben:
> On Wed, Sep 05, 2018 at 08:48:15 -0500, Eric Blake wrote:
> > On 09/05/2018 07:38 AM, Peter Krempa wrote:
> > 
> > > block-commit is able to reopen the format layers and works as expected.
> > > 
> > > Unfortunately though the 'read-only' option is actually useful as the
> > > curl-driver does not work without it:
> > > 
> > > -blockdev {"driver":"http","url":"http://ftp.sjtu.edu.cn:80/ubuntu-cd/12.04/ubuntu-12.04.5-alternate-amd64.iso","node-name":"libvirt-2-storage","discard":"unmap"}: curl block device does not support writes
> > > 
> > > We obviously can encode that knowledge into libvirt but it will be hard
> > > to undo if qemu eventually supports writes in the curl driver.
> > > 
> > > Which other protocol drivers don't support writes, in case we have to go
> > > this way?
> > 
> > When an NBD server exports an image as read-only, the NBD block client
> > cannot request write permissions.  But that's a runtime discovery process,
> > not a limitation of the block driver itself.
> 
> Hmmm, that's unfortunate. Because in some cases we don't know this fact
> > upfront in libvirt and we also don't know whether a user might attempt
> to block-commit at some time.
> 
> We probably do need a way to specify that we want
> 'read-write-if-possible' behaviour.

So after all, maybe we should try whether a read-only=auto is possible,
which would reopen the image file on demand (depending on whether some
user of the node requested BLK_PERM_WRITE etc.)

Kevin
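
As a rough illustration of the read-only=auto idea, the toy model below reopens a node read-write when the first user asks for write permission and back to read-only when the last one goes away. It is a sketch in Python under stated assumptions, not a proposal for the actual implementation: `AutoNode` and its methods are hypothetical, and the real mechanism would hang off the block layer's permission system (BLK_PERM_WRITE) rather than a set of names.

```python
class AutoNode:
    """Toy node with read-only=auto semantics (hypothetical model)."""

    def __init__(self, name, backend_supports_writes=True):
        self.name = name
        # False models a backend like the curl driver or a read-only NBD export.
        self.backend_supports_writes = backend_supports_writes
        self.writers = set()    # users currently holding write permission
        self.read_only = True   # opened read-only until someone needs more
        self.reopen_count = 0

    def _reopen(self, read_only):
        self.read_only = read_only
        self.reopen_count += 1

    def request_write(self, user):
        """A user of the node asks for write permission."""
        if not self.backend_supports_writes:
            raise PermissionError(f"{self.name}: backend does not support writes")
        if not self.writers:
            self._reopen(read_only=False)   # first writer: reopen read-write
        self.writers.add(user)

    def release_write(self, user):
        """A user of the node drops its write permission."""
        self.writers.discard(user)
        if not self.writers and not self.read_only:
            self._reopen(read_only=True)    # last writer gone: back to read-only


node = AutoNode("backing-proto")
node.request_write("commit-job")
print(node.read_only)   # False while the commit job holds write permission
node.release_write("commit-job")
print(node.read_only)   # True again once the job is done
```

In this model a backend that cannot be written to only fails at the moment a writer actually appears, which matches the 'read-write-if-possible' behaviour asked for above.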

Patch

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 5b9084a394..91dd075c84 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -1455,12 +1455,23 @@ 
 #
 # @device:  the device name or node-name of a root node
 #
-# @base:   The file name of the backing image to write data into.
-#                    If not specified, this is the deepest backing image.
+# @base-node: The node name of the backing image to write data into.
+#             If not specified, this is the deepest backing image.
+#             (since: 2.10)
 #
-# @top:    The file name of the backing image within the image chain,
-#                    which contains the topmost data to be committed down. If
-#                    not specified, this is the active layer.
+# @base: Same as @base-node, except that it is a file name rather than a node
+#        name. This must be the exact filename string that was used to open the
+#        node; other strings, even if addressing the same file, are not
+#        accepted (deprecated, use @base-node instead)
+#
+# @top-node: The node name of the backing image within the image chain
+#            which contains the topmost data to be committed down. If
+#            not specified, this is the active layer. (since: 2.10)
+#
+# @top: Same as @top-node, except that it is a file name rather than a node
+#       name. This must be the exact filename string that was used to open the
+#       node; other strings, even if addressing the same file, are not
+#       accepted (deprecated, use @top-node instead)
 #
 # @backing-file:  The backing file string to write into the overlay
 #                           image of 'top'.  If 'top' is the active layer,
@@ -1516,7 +1527,8 @@ 
 #
 ##
 { 'command': 'block-commit',
-  'data': { '*job-id': 'str', 'device': 'str', '*base': 'str', '*top': 'str',
+  'data': { '*job-id': 'str', 'device': 'str', '*base-node': 'str',
+            '*base': 'str', '*top-node': 'str', '*top': 'str',
             '*backing-file': 'str', '*speed': 'int',
             '*filter-node-name': 'str' } }
 
diff --git a/blockdev.c b/blockdev.c
index dcf8c8d2ab..064c8fb3f5 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -3308,7 +3308,9 @@  out:
 }
 
 void qmp_block_commit(bool has_job_id, const char *job_id, const char *device,
+                      bool has_base_node, const char *base_node,
                       bool has_base, const char *base,
+                      bool has_top_node, const char *top_node,
                       bool has_top, const char *top,
                       bool has_backing_file, const char *backing_file,
                       bool has_speed, int64_t speed,
@@ -3360,7 +3362,20 @@  void qmp_block_commit(bool has_job_id, const char *job_id, const char *device,
     /* default top_bs is the active layer */
     top_bs = bs;
 
-    if (has_top && top) {
+    if (has_top_node && has_top) {
+        error_setg(errp, "'top-node' and 'top' are mutually exclusive");
+        goto out;
+    } else if (has_top_node) {
+        top_bs = bdrv_lookup_bs(NULL, top_node, errp);
+        if (top_bs == NULL) {
+            goto out;
+        }
+        if (!bdrv_chain_contains(bs, top_bs)) {
+            error_setg(errp, "'%s' is not in this backing file chain",
+                       top_node);
+            goto out;
+        }
+    } else if (has_top && top) {
         if (strcmp(bs->filename, top) != 0) {
             top_bs = bdrv_find_backing_image(bs, top);
         }
@@ -3373,7 +3388,20 @@  void qmp_block_commit(bool has_job_id, const char *job_id, const char *device,
 
     assert(bdrv_get_aio_context(top_bs) == aio_context);
 
-    if (has_base && base) {
+    if (has_base_node && has_base) {
+        error_setg(errp, "'base-node' and 'base' are mutually exclusive");
+        goto out;
+    } else if (has_base_node) {
+        base_bs = bdrv_lookup_bs(NULL, base_node, errp);
+        if (base_bs == NULL) {
+            goto out;
+        }
+        if (!bdrv_chain_contains(top_bs, base_bs)) {
+            error_setg(errp, "'%s' is not in this backing file chain",
+                       base_node);
+            goto out;
+        }
+    } else if (has_base && base) {
         base_bs = bdrv_find_backing_image(top_bs, base);
     } else {
         base_bs = bdrv_find_base(top_bs);