diff mbox

[v5,3/4] qmp: add monitor command to add/remove a child

Message ID 1442907862-21376-4-git-send-email-wency@cn.fujitsu.com
State New
Headers show

Commit Message

Wen Congyang Sept. 22, 2015, 7:44 a.m. UTC
The new QMP command name is x-blockdev-child-add, and x-blockdev-child-del.
It justs for adding/removing quorum's child now, and don't support all
kinds of children, nor all block drivers. So it is experimental now.

Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
---
 blockdev.c           | 48 +++++++++++++++++++++++++++++++++++++++++
 qapi/block-core.json | 34 +++++++++++++++++++++++++++++
 qmp-commands.hx      | 61 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 143 insertions(+)

Comments

Alberto Garcia Oct. 7, 2015, 2:33 p.m. UTC | #1
On Tue 22 Sep 2015 09:44:21 AM CEST, Wen Congyang wrote:
> The new QMP command name is x-blockdev-child-add, and x-blockdev-child-del.
> It justs for adding/removing quorum's child now, and don't support all
> kinds of children, nor all block drivers. So it is experimental now.

Better "So it is experimental for now". Otherwise it suggests that it
was not experimental before but now it is.

> +void qmp_x_blockdev_child_add(const char *parent, const char *child,
> +                              Error **errp)
> +{
> +    BlockDriverState *parent_bs, *child_bs;
> +    Error *local_err = NULL;
> +
> +    parent_bs = bdrv_lookup_bs(parent, parent, &local_err);
> +    if (!parent_bs) {
> +        error_propagate(errp, local_err);
> +        return;
> +    }

I think this is fine as it is now, but since you are checking the value
of parent_bs to see if bdrv_lookup_bs() succeeded you can use errp
directly instead and skip the error_propagate() call.

> +void qmp_x_blockdev_child_del(const char *parent, const char *child,
> +                              Error **errp)
> +{
> +    BlockDriverState *parent_bs, *child_bs;
> +    Error *local_err = NULL;
> +
> +    parent_bs = bdrv_lookup_bs(parent, parent, &local_err);
> +    if (!parent_bs) {
> +        error_propagate(errp, local_err);
> +        return;
> +    }

Same here.

> +##
> +# @x-blockdev-child-add
> +#
> +# Add a new child to the parent BDS. Currently only the Quorum driver
> +# implements this feature. This is useful to fix a broken quorum
> child.

Again, I would not use 'BDS' here. "Add a new child to an existing block
device", or something like that.

> +# @parent: graph node name or id which the child will be added to.
> +#
> +# @child: graph node name that will be added.
> +#
> +# Note: this command is experimental, and not a stable API.

"[..] and does not have a stable API".

> +x-blockdev-child-add
> +------------
> +
> +Add a child to a quorum node.
> +
> +Arguments:
> +
> +- "parent": the quorum's id or node name
> +- "child": the child node-name which will be added
> +

I would use the same kind of description than in the json file. "Add a
child to an existing block device. Currently only Quorum supports this
feature". Same for the 'x-blockdev-child-del' documentation.

Berto
Max Reitz Oct. 7, 2015, 7:42 p.m. UTC | #2
On 22.09.2015 09:44, Wen Congyang wrote:
> The new QMP command name is x-blockdev-child-add, and x-blockdev-child-del.
> It justs for adding/removing quorum's child now, and don't support all
> kinds of children,

It does support all kinds of children for quorum, doesn't it?

>                    nor all block drivers. So it is experimental now.

Well, that is not really a reason why we would have to make it
experimental. For instance, blockdev-add (although some might argue it
actually is experimental...) doesn't support all block drivers either.

The reason I am hesitant of adding an experimental QMP interface that is
actually visible to the user (compare x-image in blkverify and blkdebug,
which are not documented and not to be used by the user) is twofold:

(1) At some point we have to say "OK, this is good enough now" and make
    it stable. What would that point be? Who can guarantee that we
    wouldn't want to make any interface changes after that point? Would
    we actually remember to revisit this function once in a while and
    consider making it stable?

(2) While marking things experimental *should* keep people from using it
    in their tools, nobody can guarantee that it *does* keep them from
    doing so. So we may find ourselves in the situation of having to
    keep a compatibility interface for an experimental feature...

For the second point, you should also consider how useful this feature
is to management tools. Just being able to remove and attach children
from a quorum node seems very useful on its own. I don't see why we
should wait for having support for other block drivers; also, for most
block drivers there is no meaningful way of adding or removing children
as nicely as that is possible for quorum.

E.g. you may have a block filter in the future where you want to
exchange its child BDS. This exchange should be an atomic operation, so
we cannot use this interface there anyway. For quorum, such an exchange
does not need to be atomic, since you can just add the new child first
and remove the old one afterwards.

So maybe in the future we get some block driver other than quorum for
which adding and removing children (as opposed to atomically exchanging
them) makes sense, but for now I can only see quorum. Therefore, that
this works for quorum only is in my opinion not a reason to make it
experimental. I think we actually want to keep it that way.

So the question would then be: What ways can you imagine to change this
interface, which would necessitate an incompatible change, therefore
calling for an experimental interface?

(My point is that with such an experimental interface, management tools
cannot use it, even though it'd be a very nice functionality to have)

> 
> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
> ---
>  blockdev.c           | 48 +++++++++++++++++++++++++++++++++++++++++
>  qapi/block-core.json | 34 +++++++++++++++++++++++++++++
>  qmp-commands.hx      | 61 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 143 insertions(+)
> 
> diff --git a/blockdev.c b/blockdev.c
> index 32b04b4..8da0ffb 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -3086,6 +3086,54 @@ fail:
>      qmp_output_visitor_cleanup(ov);
>  }
>  
> +void qmp_x_blockdev_child_add(const char *parent, const char *child,
> +                              Error **errp)
> +{
> +    BlockDriverState *parent_bs, *child_bs;
> +    Error *local_err = NULL;
> +
> +    parent_bs = bdrv_lookup_bs(parent, parent, &local_err);
> +    if (!parent_bs) {
> +        error_propagate(errp, local_err);
> +        return;
> +    }
> +
> +    child_bs = bdrv_find_node(child);
> +    if (!child_bs) {
> +        error_setg(errp, "Node '%s' not found", child);
> +        return;
> +    }
> +
> +    bdrv_add_child(parent_bs, child_bs, &local_err);
> +    if (local_err) {
> +        error_propagate(errp, local_err);
> +    }

You can just pass errp to bdrv_add_child().

> +}
> +
> +void qmp_x_blockdev_child_del(const char *parent, const char *child,
> +                              Error **errp)
> +{
> +    BlockDriverState *parent_bs, *child_bs;
> +    Error *local_err = NULL;
> +
> +    parent_bs = bdrv_lookup_bs(parent, parent, &local_err);
> +    if (!parent_bs) {
> +        error_propagate(errp, local_err);
> +        return;
> +    }
> +
> +    child_bs = bdrv_find_node(child);
> +    if (!child_bs) {
> +        error_setg(errp, "Node '%s' not found", child);
> +        return;
> +    }
> +
> +    bdrv_del_child(parent_bs, child_bs, &local_err);
> +    if (local_err) {
> +        error_propagate(errp, local_err);
> +    }

Same here.

Max

> +}
> +
>  BlockJobInfoList *qmp_query_block_jobs(Error **errp)
>  {
>      BlockJobInfoList *head = NULL, **p_next = &head;
> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index bb2189e..9418f05 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -2114,3 +2114,37 @@
>  ##
>  { 'command': 'block-set-write-threshold',
>    'data': { 'node-name': 'str', 'write-threshold': 'uint64' } }
> +
> +##
> +# @x-blockdev-child-add
> +#
> +# Add a new child to the parent BDS. Currently only the Quorum driver
> +# implements this feature. This is useful to fix a broken quorum child.
> +#
> +# @parent: graph node name or id which the child will be added to.
> +#
> +# @child: graph node name that will be added.
> +#
> +# Note: this command is experimental, and not a stable API.
> +#
> +# Since: 2.5
> +##
> +{ 'command': 'x-blockdev-child-add',
> +  'data' : { 'parent': 'str', 'child': 'str' } }
> +
> +##
> +# @x-blockdev-child-del
> +#
> +# Remove a child from the parent BDS. Currently only the Quorum driver
> +# implements this feature. This is useful to fix a broken quorum child.
> +# Note, you can't remove a child if it would bring the quorum below its
> +# threshold.
> +#
> +# @parent: graph node name or id from which the child will removed.
> +#
> +# @child: graph node name that will be removed.
> +#
> +# Since: 2.5
> +##
> +{ 'command': 'x-blockdev-child-del',
> +  'data' : { 'parent': 'str', 'child': 'str' } }
> diff --git a/qmp-commands.hx b/qmp-commands.hx
> index 66f0300..36e75b1 100644
> --- a/qmp-commands.hx
> +++ b/qmp-commands.hx
> @@ -3916,6 +3916,67 @@ Example (2):
>  EQMP
>  
>      {
> +        .name       = "x-blockdev-child-add",
> +        .args_type  = "parent:B,child:B",
> +        .mhandler.cmd_new = qmp_marshal_x_blockdev_child_add,
> +    },
> +
> +SQMP
> +x-blockdev-child-add
> +------------
> +
> +Add a child to a quorum node.
> +
> +Arguments:
> +
> +- "parent": the quorum's id or node name
> +- "child": the child node-name which will be added
> +
> +Note: this command is experimental, and not a stable API. It doesn't
> +support all kinds of children, nor all block drivers.
> +
> +Example:
> +
> +-> { "execute": blockdev-add",
> +    "arguments": { "options": { "driver": "raw",
> +                                "node-name": "new_node",
> +                                "id": "test_new_node",
> +                                "file": { "driver": "file",
> +                                          "filename": "test.raw" } } } }
> +<- { "return": {} }
> +-> { "execute": "x-blockdev-child-add",
> +    "arguments": { "parent": "disk1", "child": "new_node" } }
> +<- { "return": {} }
> +
> +EQMP
> +
> +    {
> +        .name        = "x-blockdev-child-del",
> +        .args_type   = "parent:B,child:B",
> +        .mhandler.cmd_new = qmp_marshal_x_blockdev_child_del,
> +    },
> +
> +SQMP
> +x-blockdev-child-del
> +------------
> +
> +Delete a child from a quorum node. It can be used to remove a broken
> +quorum child.
> +
> +Arguments:
> +
> +- "parent": the quorum's id or node name
> +- "child": the child node-name which will be removed
> +
> +Example:
> +
> +-> { "execute": "x-blockdev-child-del",
> +    "arguments": { "parent": "disk1", "child": "new_node" } }
> +<- { "return": {} }
> +
> +EQMP
> +
> +    {
>          .name       = "query-named-block-nodes",
>          .args_type  = "",
>          .mhandler.cmd_new = qmp_marshal_query_named_block_nodes,
>
Markus Armbruster Oct. 8, 2015, 6:15 a.m. UTC | #3
Max Reitz <mreitz@redhat.com> writes:

> On 22.09.2015 09:44, Wen Congyang wrote:
>> The new QMP command name is x-blockdev-child-add, and x-blockdev-child-del.
>> It justs for adding/removing quorum's child now, and don't support all
>> kinds of children,
>
> It does support all kinds of children for quorum, doesn't it?
>
>>                    nor all block drivers. So it is experimental now.
>
> Well, that is not really a reason why we would have to make it
> experimental. For instance, blockdev-add (although some might argue it
> actually is experimental...) doesn't support all block drivers either.

Yup, and not calling it x-blockdev-add until it's done was a mistake.
People tried using it, then found its current limitations the painful
way.  Not nice.

> The reason I am hesitant of adding an experimental QMP interface that is
> actually visible to the user (compare x-image in blkverify and blkdebug,
> which are not documented and not to be used by the user) is twofold:
>
> (1) At some point we have to say "OK, this is good enough now" and make
>     it stable. What would that point be? Who can guarantee that we
>     wouldn't want to make any interface changes after that point?

Nobody can, just like for any other interface.  So?

The x- prefix enables work spanning multiple releases.  Until the
feature is complete, we have a hard time seeing the whole picture, and
therefore the risk of interface mistakes is higher than normal.  Once
it's complete, we drop the x-.

>                                                                   Would
>     we actually remember to revisit this function once in a while and
>     consider making it stable?

Has that been a problem in the past?

If the feature is of any use, there's always been mounting pressure to
finish the job and drop the x-.  If it's of no use, not dropping the x-
would do no harm.  The opposite, actually.

> (2) While marking things experimental *should* keep people from using it
>     in their tools, nobody can guarantee that it *does* keep them from
>     doing so. So we may find ourselves in the situation of having to
>     keep a compatibility interface for an experimental feature...

You can't force people not to do stupid things, but you *can* improve
their chances at avoiding them.  Clearly marking experimental stuff
improves chances.

> For the second point, you should also consider how useful this feature
> is to management tools. Just being able to remove and attach children
> from a quorum node seems very useful on its own. I don't see why we
> should wait for having support for other block drivers; also, for most
> block drivers there is no meaningful way of adding or removing children
> as nicely as that is possible for quorum.

Okay, this is an argument I might be able to buy.

> E.g. you may have a block filter in the future where you want to
> exchange its child BDS. This exchange should be an atomic operation, so
> we cannot use this interface there anyway. For quorum, such an exchange
> does not need to be atomic, since you can just add the new child first
> and remove the old one afterwards.
>
> So maybe in the future we get some block driver other than quorum for
> which adding and removing children (as opposed to atomically exchanging
> them) makes sense, but for now I can only see quorum. Therefore, that
> this works for quorum only is in my opinion not a reason to make it
> experimental. I think we actually want to keep it that way.

Are you telling us the existing interface is insufficiently general?
That the general interface neeeds to support atomic replacement?

If yes, why isn't the general interface is what we should do for quorum?
Delete is atomic replacement by nothing, add is atomic replacement of
nothing.

> So the question would then be: What ways can you imagine to change this
> interface, which would necessitate an incompatible change, therefore
> calling for an experimental interface?

Yes, that's an important question.

Another important question is whether this is the interface we want.

A secondary question is whether the incompleteness of the implementation
demands an x- to warn users.

> (My point is that with such an experimental interface, management tools
> cannot use it, even though it'd be a very nice functionality to have)

How much work is it to finish the job?  Unless it's a substantial chunk,
debating x- is much ado about nothing: just finish the job already :)
Alberto Garcia Oct. 8, 2015, 8:29 a.m. UTC | #4
On Thu 08 Oct 2015 08:15:25 AM CEST, Markus Armbruster wrote:
>> For the second point, you should also consider how useful this
>> feature is to management tools. Just being able to remove and attach
>> children from a quorum node seems very useful on its own. I don't see
>> why we should wait for having support for other block drivers; also,
>> for most block drivers there is no meaningful way of adding or
>> removing children as nicely as that is possible for quorum.
>
> Okay, this is an argument I might be able to buy.

Note that if we want to make this interface stable there's one use case
missing: there's currently no way to change the vote threshold.

This is maybe not so important for the COLO use case, but for the
general case of adding and removing children from a quorum node having
the possibility to change the threshold makes a lot of sense.

That would probably require a its own API ('quorum-set-threshold' or
something like that) so I don't think it has an effect on these
child-add and child-del commands, but I wanted to mention it here anyway
in case someone sees something that I'm overlooking.

Berto
Kevin Wolf Oct. 8, 2015, 10:03 a.m. UTC | #5
Am 08.10.2015 um 10:29 hat Alberto Garcia geschrieben:
> On Thu 08 Oct 2015 08:15:25 AM CEST, Markus Armbruster wrote:
> >> For the second point, you should also consider how useful this
> >> feature is to management tools. Just being able to remove and attach
> >> children from a quorum node seems very useful on its own. I don't see
> >> why we should wait for having support for other block drivers; also,
> >> for most block drivers there is no meaningful way of adding or
> >> removing children as nicely as that is possible for quorum.
> >
> > Okay, this is an argument I might be able to buy.
> 
> Note that if we want to make this interface stable there's one use case
> missing: there's currently no way to change the vote threshold.
> 
> This is maybe not so important for the COLO use case, but for the
> general case of adding and removing children from a quorum node having
> the possibility to change the threshold makes a lot of sense.
> 
> That would probably require a its own API ('quorum-set-threshold' or
> something like that) so I don't think it has an effect on these
> child-add and child-del commands, but I wanted to mention it here anyway
> in case someone sees something that I'm overlooking.

The right way to change the voting threshold is bdrv_reopen(). Adding
support to quorum for changing the threshold this way should be trivial.
The hard part is how to expose it in QMP, which is why I've only added
it to qemu-io so far (which happens to be accessible from HMP).

If you need this functionality, let's start a new email thread and
discuss the right QMP interface for blockdev-reopen (or blockdev-
set-options or whatever you would call it).

Kevin
Alberto Garcia Oct. 8, 2015, 10:13 a.m. UTC | #6
On Thu 08 Oct 2015 12:03:45 PM CEST, Kevin Wolf wrote:
>> Note that if we want to make this interface stable there's one use
>> case missing: there's currently no way to change the vote threshold.
>> 
> The right way to change the voting threshold is bdrv_reopen(). Adding
> support to quorum for changing the threshold this way should be
> trivial.

Ah, and that would mean implementing quorum_reopen_*. Works for me.

> The hard part is how to expose it in QMP, which is why I've only added
> it to qemu-io so far (which happens to be accessible from HMP).
>
> If you need this functionality, let's start a new email thread and
> discuss the right QMP interface for blockdev-reopen (or blockdev-
> set-options or whatever you would call it).

Ok, sounds good. Thanks!

Berto
Kevin Wolf Oct. 8, 2015, 11:02 a.m. UTC | #7
Am 08.10.2015 um 08:15 hat Markus Armbruster geschrieben:
> Max Reitz <mreitz@redhat.com> writes:
> > E.g. you may have a block filter in the future where you want to
> > exchange its child BDS. This exchange should be an atomic operation, so
> > we cannot use this interface there anyway. For quorum, such an exchange
> > does not need to be atomic, since you can just add the new child first
> > and remove the old one afterwards.
> >
> > So maybe in the future we get some block driver other than quorum for
> > which adding and removing children (as opposed to atomically exchanging
> > them) makes sense, but for now I can only see quorum. Therefore, that
> > this works for quorum only is in my opinion not a reason to make it
> > experimental. I think we actually want to keep it that way.
> 
> Are you telling us the existing interface is insufficiently general?
> That the general interface neeeds to support atomic replacement?
> 
> If yes, why isn't the general interface is what we should do for quorum?
> Delete is atomic replacement by nothing, add is atomic replacement of
> nothing.

The general thing is what we used to call "dynamic reconfiguration". If
we want a single command to enable it (which would be great if we
could), we need to some more design work first. Atomic replacement might
be the operation we're looking for, but we have to be sure.

So far we haven't thought about dynamic reconfiguation enough that we
would know the right solution, but enough that we know it's hard. That
would be an argument for me that makes adding an x-* command now
acceptable. On the other hand, the fact that we want a single command in
the end makes me want to keep it experimental.


What types of dynamic reconfiguration do we need to support? I'll start
with a small list, feel free to extend it:

* Replace a child node with another node. This works pretty much
  everywhere in the tree - including the root, i.e. BlockBackend!
  Working just on BDSes doesn't seem to be enough.

* Adding a child to a node that can take additional children (e.g.
  quorum can take an arbitrary number; but also COW image formats have
  an option child, so you could add a backing file to a node originally
  opened with BDRV_O_NO_BACKING)

  Same as atomically replacing nothing by a node.

* Removing an optional child from a node that remains valid with that
  child removed. The same examples apply.

  Same as atomically replacing a child by nothing.

* Add a new node between two existing nodes. An example is taking a live
  snapshot, which inserts a new node between the BlockBackend and the
  first BDS. Or it could be used to insert a filter somewhere in the
  graph.

  Same as creating the new node pointing to node B (or dynamically
  adding it) and then atomically replacing the reference of node A that
  pointed to B with a reference to the new node.

* Add a new node between multiple existing nodes. This is one of the
  examples we always used to discuss with dynamic reconfiguration:

      base <- sn1 <- sn2 <--+-- virtio-blk
                            |
                            +-- NBD server

  Adding a filter could result in this:

        base <- sn1 <- sn2 <- throttling <--+-- virtio-blk
                                          |
                                          +-- NBD server

  Or this:

        base <- sn1 <- sn2 <--+-- throttling <- virtio-blk
                              |
                              +-- NBD server

  Or this:

        base <- sn1 <- sn2 <--+-- virtio-blk
                              |
                              +-- throttling <- NBD server

  All of these are different kinds of "adding a filter", and all of
  them are valid operations that a user could want to perform.

  Case 2 and 3 are really just "add a new node between two existing
  nodes", as explained above.

  Case 1 is new: We still create the throttling node so that it already
  points to sn2, but now we have to atomically change the children of
  two things (the BlockBackends of virtio-blk and the NBD server). Not a
  problem per se because we can just do that, but it raises the question
  whether the atomic replacement operation needs to be transactionable.

* Remove a node between two (or more) other nodes.

  Same as atomically replacing a child by a grandchild. For more than
  two involved nodes, again a transactional version might be needed.

So at the first sight, this operation seems to work as the general
interface for dynamic reconfiguration.


One problem we discussed earlier that I'm not sure whether it's related
is filter nodes inserted automatically once we change e.g. the I/O
throttling QMP commands to add a throttling filter BDS to the graph.

If the user creates nodes A and B, but sets throttling options, they
might end up with a chain like this:

    A <- throttling <- B

Now imagine that they want to add another filter between A and B, let's
say blkdebug. They would need to know that they have to insert the new
node between A and throttling or B and throttling, but not between A and
B. If they tried to insert it between A and B, the algorithm above says
that they would let blkdebug point to A, and replace B's child with
blkdebug, the resulting tree wouldn't be throttled any more!

    A <- blkdebug <- B

         throttling (ref = 0 -> delete)

Even if they knew that they have to consider the throttling node, it
currently wouldn't have a node-name, and with Jeff's autogenerated names
it wouldn't be predictable.

Maybe the dynamic reconfiguration interface does need to be a bit
cleverer.


Anyway, after writing all of this, I'm almost convinced now that an
experimental interface is the right thing to do in this patch series.

Kevin
Kevin Wolf Oct. 8, 2015, 11:10 a.m. UTC | #8
Am 08.10.2015 um 13:02 hat Kevin Wolf geschrieben:
> Am 08.10.2015 um 08:15 hat Markus Armbruster geschrieben:
> > Max Reitz <mreitz@redhat.com> writes:
> > > E.g. you may have a block filter in the future where you want to
> > > exchange its child BDS. This exchange should be an atomic operation, so
> > > we cannot use this interface there anyway. For quorum, such an exchange
> > > does not need to be atomic, since you can just add the new child first
> > > and remove the old one afterwards.
> > >
> > > So maybe in the future we get some block driver other than quorum for
> > > which adding and removing children (as opposed to atomically exchanging
> > > them) makes sense, but for now I can only see quorum. Therefore, that
> > > this works for quorum only is in my opinion not a reason to make it
> > > experimental. I think we actually want to keep it that way.
> > 
> > Are you telling us the existing interface is insufficiently general?
> > That the general interface neeeds to support atomic replacement?
> > 
> > If yes, why isn't the general interface is what we should do for quorum?
> > Delete is atomic replacement by nothing, add is atomic replacement of
> > nothing.
> 
> The general thing is what we used to call "dynamic reconfiguration". If
> we want a single command to enable it (which would be great if we
> could), we need to some more design work first. Atomic replacement might
> be the operation we're looking for, but we have to be sure.
> 
> So far we haven't thought about dynamic reconfiguation enough that we
> would know the right solution, but enough that we know it's hard. That
> would be an argument for me that makes adding an x-* command now
> acceptable. On the other hand, the fact that we want a single command in
> the end makes me want to keep it experimental.
> 
> 
> What types of dynamic reconfiguration do we need to support? I'll start
> with a small list, feel free to extend it:
> 
> * Replace a child node with another node. This works pretty much
>   everywhere in the tree - including the root, i.e. BlockBackend!
>   Working just on BDSes doesn't seem to be enough.
> 
> * Adding a child to a node that can take additional children (e.g.
>   quorum can take an arbitrary number; but also COW image formats have
>   an option child, so you could add a backing file to a node originally
>   opened with BDRV_O_NO_BACKING)
> 
>   Same as atomically replacing nothing by a node.
> 
> * Removing an optional child from a node that remains valid with that
>   child removed. The same examples apply.
> 
>   Same as atomically replacing a child by nothing.
> 
> * Add a new node between two existing nodes. An example is taking a live
>   snapshot, which inserts a new node between the BlockBackend and the
>   first BDS. Or it could be used to insert a filter somewhere in the
>   graph.
> 
>   Same as creating the new node pointing to node B (or dynamically
>   adding it) and then atomically replacing the reference of node A that
>   pointed to B with a reference to the new node.
> 
> * Add a new node between multiple existing nodes. This is one of the
>   examples we always used to discuss with dynamic reconfiguration:
> 
>       base <- sn1 <- sn2 <--+-- virtio-blk
>                             |
>                             +-- NBD server
> 
>   Adding a filter could result in this:
> 
>         base <- sn1 <- sn2 <- throttling <--+-- virtio-blk
>                                           |
>                                           +-- NBD server
> 
>   Or this:
> 
>         base <- sn1 <- sn2 <--+-- throttling <- virtio-blk
>                               |
>                               +-- NBD server
> 
>   Or this:
> 
>         base <- sn1 <- sn2 <--+-- virtio-blk
>                               |
>                               +-- throttling <- NBD server
> 
>   All of these are different kinds of "adding a filter", and all of
>   them are valid operations that a user could want to perform.
> 
>   Case 2 and 3 are really just "add a new node between two existing
>   nodes", as explained above.
> 
>   Case 1 is new: We still create the throttling node so that it already
>   points to sn2, but now we have to atomically change the children of
>   two things (the BlockBackends of virtio-blk and the NBD server). Not a
>   problem per se because we can just do that, but it raises the question
>   whether the atomic replacement operation needs to be transactionable.
> 
> * Remove a node between two (or more) other nodes.
> 
>   Same as atomically replacing a child by a grandchild. For more than
>   two involved nodes, again a transactional version might be needed.
> 
> So at the first sight, this operation seems to work as the general
> interface for dynamic reconfiguration.

And actually, after re-reading the other branch of this email thread
where I replied to Berto that he wants to use bdrv_reopen() to change
the voting threshold for quorum, it occurred to me that bdrv_reopen()
should possibly also be this atomatic replacement function.

After all, the references to other nodes are blockdev-add options, and
if we implement bdrv_reopen() fully so that it can change any options
that can be given to blockdev-add, that means that we must be able to
replace children.

Kevin

> One problem we discussed earlier that I'm not sure whether it's related
> is filter nodes inserted automatically once we change e.g. the I/O
> throttling QMP commands to add a throttling filter BDS to the graph.
> 
> If the user creates nodes A and B, but sets throttling options, they
> might end up with a chain like this:
> 
>     A <- throttling <- B
> 
> Now imagine that they want to add another filter between A and B, let's
> say blkdebug. They would need to know that they have to insert the new
> node between A and throttling or B and throttling, but not between A and
> B. If they tried to insert it between A and B, the algorithm above says
> that they would let blkdebug point to A, and replace B's child with
> blkdebug, the resulting tree wouldn't be throttled any more!
> 
>     A <- blkdebug <- B
> 
>          throttling (ref = 0 -> delete)
> 
> Even if they knew that they have to consider the throttling node, it
> currently wouldn't have a node-name, and with Jeff's autogenerated names
> it wouldn't be predictable.
> 
> Maybe the dynamic reconfiguration interface does need to be a bit
> cleverer.
> 
> 
> Anyway, after writing all of this, I'm almost convinced now that an
> experimental interface is the right thing to do in this patch series.
> 
> Kevin
>
Max Reitz Oct. 9, 2015, 4:13 p.m. UTC | #9
On 08.10.2015 08:15, Markus Armbruster wrote:
> Max Reitz <mreitz@redhat.com> writes:
> 
>> On 22.09.2015 09:44, Wen Congyang wrote:
>>> The new QMP command name is x-blockdev-child-add, and x-blockdev-child-del.
>>> It justs for adding/removing quorum's child now, and don't support all
>>> kinds of children,
>>
>> It does support all kinds of children for quorum, doesn't it?
>>
>>>                    nor all block drivers. So it is experimental now.
>>
>> Well, that is not really a reason why we would have to make it
>> experimental. For instance, blockdev-add (although some might argue it
>> actually is experimental...) doesn't support all block drivers either.
> 
> Yup, and not calling it x-blockdev-add until it's done was a mistake.
> People tried using it, then found its current limitations the painful
> way.  Not nice.

I knew I should have written s/some might/Markus does/. ;-)

>> The reason I am hesitant of adding an experimental QMP interface that is
>> actually visible to the user (compare x-image in blkverify and blkdebug,
>> which are not documented and not to be used by the user) is twofold:
>>
>> (1) At some point we have to say "OK, this is good enough now" and make
>>     it stable. What would that point be? Who can guarantee that we
>>     wouldn't want to make any interface changes after that point?
> 
> Nobody can, just like for any other interface.  So?

The main question is "what would that point be". As I can see you're
arguing that that point would be "once people want to use it", but I'm
arguing that people want to use it today or we wouldn't need this
interface at all.

I'm against adding external experimental interface because having
external interface indicates that someone wants to use them, but making
them experimental indicates that nobody should use them.

This interface is added for the COLO series. The documentation added in
patch 5 there explains usage of COLO with x-child-add. I don't think
that should be there, because it's experimental. But why have an
external interface if nobody should use it anyway?

> The x- prefix enables work spanning multiple releases.  Until the
> feature is complete, we have a hard time seeing the whole picture, and
> therefore the risk of interface mistakes is higher than normal.  Once
> it's complete, we drop the x-.

I'm arguing the feature is complete as far as what it's supposed to do goes.

>>                                                                   Would
>>     we actually remember to revisit this function once in a while and
>>     consider making it stable?
> 
> Has that been a problem in the past?

I don't know, because I never witnessed an external experimental
interface, but I haven't been closely involved with qemu for too long.

> If the feature is of any use, there's always been mounting pressure to
> finish the job and drop the x-.  If it's of no use, not dropping the x-
> would do no harm.  The opposite, actually.

This is of use, however.

I do see your point, but I'm still arguing that I don't see why we need
an external interface then. But COLO needs an external interface (which
management tools should be able to use!) so there's that.

>> (2) While marking things experimental *should* keep people from using it
>>     in their tools, nobody can guarantee that it *does* keep them from
>>     doing so. So we may find ourselves in the situation of having to
>>     keep a compatibility interface for an experimental feature...
> 
> You can't force people not to do stupid things, but you *can* improve
> their chances at avoiding them.  Clearly marking experimental stuff
> improves chances.

OK, right.

>> For the second point, you should also consider how useful this feature
>> is to management tools. Just being able to remove and attach children
>> from a quorum node seems very useful on its own. I don't see why we
>> should wait for having support for other block drivers; also, for most
>> block drivers there is no meaningful way of adding or removing children
>> as nicely as that is possible for quorum.
> 
> Okay, this is an argument I might be able to buy.
> 
>> E.g. you may have a block filter in the future where you want to
>> exchange its child BDS. This exchange should be an atomic operation, so
>> we cannot use this interface there anyway. For quorum, such an exchange
>> does not need to be atomic, since you can just add the new child first
>> and remove the old one afterwards.
>>
>> So maybe in the future we get some block driver other than quorum for
>> which adding and removing children (as opposed to atomically exchanging
>> them) makes sense, but for now I can only see quorum. Therefore, that
>> this works for quorum only is in my opinion not a reason to make it
>> experimental. I think we actually want to keep it that way.
> 
> Are you telling us the existing interface is insufficiently general?
> That the general interface neeeds to support atomic replacement?

The general interface for all block drivers and situations (as described
by Kevin), yes. The interface for adding/removing children from quorum
in regards to what COLO needs, no.

> If yes, why isn't the general interface is what we should do for quorum?
> Delete is atomic replacement by nothing, add is atomic replacement of
> nothing.

Yes, but I personally don't have a problem with macro functions. I don't
see the harm in implementing blockdev-child-{add,del} now the way they
are and later replacing them by calls to the general function.

But why should we do that?

Some people really want COLO in better sooner than later. Telling them
to wait for until the block layer is as nice as we want it to be is not
really an option I'd be willing to accept. I don't see why we should
delay the COLO work just so that we don't have these two (macro) functions.

The alternative would be to keep it experimental and then tell every
COLO user to use the general function once it's available. But since
COLO users are probably mostly management tools, that seems much more
difficult to me than to just keep these two macro functions as legacy in
qemu.

>> So the question would then be: What ways can you imagine to change this
>> interface, which would necessitate an incompatible change, therefore
>> calling for an experimental interface?
> 
> Yes, that's an important question.
> 
> Another important question is whether this is the interface we want.

I can see why this is an important question to you, but to me it is not
so much. As I argued above, I'm not opposed to adding interface that are
actually not what we want, but that are what users want, and that are
easy to implement with the interface that we want. It's what I call a
“macro function”.

> A secondary question is whether the incompleteness of the implementation
> demands an x- to warn users.

We don't want to shoehorn generality into these two functions, but add a
new one anyway. We might want to add optional parameters later on (e.g.
changing the threshold), but that would be a compatible change.

>> (My point is that with such an experimental interface, management tools
>> cannot use it, even though it'd be a very nice functionality to have)
> 
> How much work is it to finish the job?  Unless it's a substantial chunk,
> debating x- is much ado about nothing: just finish the job already :)

We have proven in the past to need a significant amount of time for even
for non-substantial chunks. Often, non-substantial chunks turn out to be
substantial when actually tackling them.

It basically comes down to whether the COLO authors are willing to wait
for us to do the job. And frankly, if I were them, I wouldn't be.

Max
Max Reitz Oct. 9, 2015, 4:14 p.m. UTC | #10
On 08.10.2015 10:29, Alberto Garcia wrote:
> On Thu 08 Oct 2015 08:15:25 AM CEST, Markus Armbruster wrote:
>>> For the second point, you should also consider how useful this
>>> feature is to management tools. Just being able to remove and attach
>>> children from a quorum node seems very useful on its own. I don't see
>>> why we should wait for having support for other block drivers; also,
>>> for most block drivers there is no meaningful way of adding or
>>> removing children as nicely as that is possible for quorum.
>>
>> Okay, this is an argument I might be able to buy.
> 
> Note that if we want to make this interface stable there's one use case
> missing: there's currently no way to change the vote threshold.

Besides what Kevin said: If you add a new function, that would be
independent from these two functions. If want to add it as an optional
parameter to blockdev-add-child so the change is done atomically, that
wouldn't be an incompatible interface change either.

Max

> This is maybe not so important for the COLO use case, but for the
> general case of adding and removing children from a quorum node having
> the possibility to change the threshold makes a lot of sense.
> 
> That would probably require a its own API ('quorum-set-threshold' or
> something like that) so I don't think it has an effect on these
> child-add and child-del commands, but I wanted to mention it here anyway
> in case someone sees something that I'm overlooking.
> 
> Berto
>
Dr. David Alan Gilbert Oct. 9, 2015, 4:42 p.m. UTC | #11
* Max Reitz (mreitz@redhat.com) wrote:
> On 08.10.2015 08:15, Markus Armbruster wrote:
> > Max Reitz <mreitz@redhat.com> writes:
> > 
> >> On 22.09.2015 09:44, Wen Congyang wrote:
> >>> The new QMP command name is x-blockdev-child-add, and x-blockdev-child-del.
> >>> It justs for adding/removing quorum's child now, and don't support all
> >>> kinds of children,
> >>
> >> It does support all kinds of children for quorum, doesn't it?
> >>
> >>>                    nor all block drivers. So it is experimental now.
> >>
> >> Well, that is not really a reason why we would have to make it
> >> experimental. For instance, blockdev-add (although some might argue it
> >> actually is experimental...) doesn't support all block drivers either.
> > 
> > Yup, and not calling it x-blockdev-add until it's done was a mistake.
> > People tried using it, then found its current limitations the painful
> > way.  Not nice.
> 
> I knew I should have written s/some might/Markus does/. ;-)
> 
> >> The reason I am hesitant of adding an experimental QMP interface that is
> >> actually visible to the user (compare x-image in blkverify and blkdebug,
> >> which are not documented and not to be used by the user) is twofold:
> >>
> >> (1) At some point we have to say "OK, this is good enough now" and make
> >>     it stable. What would that point be? Who can guarantee that we
> >>     wouldn't want to make any interface changes after that point?
> > 
> > Nobody can, just like for any other interface.  So?
> 
> The main question is "what would that point be". As I can see you're
> arguing that that point would be "once people want to use it", but I'm
> arguing that people want to use it today or we wouldn't need this
> interface at all.
> 
> I'm against adding external experimental interface because having
> external interface indicates that someone wants to use them, but making
> them experimental indicates that nobody should use them.
> 
> This interface is added for the COLO series. The documentation added in
> patch 5 there explains usage of COLO with x-child-add. I don't think
> that should be there, because it's experimental. But why have an
> external interface if nobody should use it anyway?

Because it lets people move forward; the COLO series is pretty huge, there
already seem to be side discussions spawning off about dynamic reconfiguration
of stuff, who knows how long those will take to pan out.
Adding the experimental stuff makes it easier for people to try and
get some feedback on.
If everyone turns out to love it then it only takes a trivial patch to promote
it; if people actually realise there is a better interface then it's
no problem to change it either - x- doesn't stop any one using it, but it
does remove their right to moan if it changes.

Dave

> 
> > The x- prefix enables work spanning multiple releases.  Until the
> > feature is complete, we have a hard time seeing the whole picture, and
> > therefore the risk of interface mistakes is higher than normal.  Once
> > it's complete, we drop the x-.
> 
> I'm arguing the feature is complete as far as what it's supposed to do goes.
> 
> >>                                                                   Would
> >>     we actually remember to revisit this function once in a while and
> >>     consider making it stable?
> > 
> > Has that been a problem in the past?
> 
> I don't know, because I never witnessed an external experimental
> interface, but I haven't been closely involved with qemu for too long.
> 
> > If the feature is of any use, there's always been mounting pressure to
> > finish the job and drop the x-.  If it's of no use, not dropping the x-
> > would do no harm.  The opposite, actually.
> 
> This is of use, however.
> 
> I do see your point, but I'm still arguing that I don't see why we need
> an external interface then. But COLO needs an external interface (which
> management tools should be able to use!) so there's that.
> 
> >> (2) While marking things experimental *should* keep people from using it
> >>     in their tools, nobody can guarantee that it *does* keep them from
> >>     doing so. So we may find ourselves in the situation of having to
> >>     keep a compatibility interface for an experimental feature...
> > 
> > You can't force people not to do stupid things, but you *can* improve
> > their chances at avoiding them.  Clearly marking experimental stuff
> > improves chances.
> 
> OK, right.
> 
> >> For the second point, you should also consider how useful this feature
> >> is to management tools. Just being able to remove and attach children
> >> from a quorum node seems very useful on its own. I don't see why we
> >> should wait for having support for other block drivers; also, for most
> >> block drivers there is no meaningful way of adding or removing children
> >> as nicely as that is possible for quorum.
> > 
> > Okay, this is an argument I might be able to buy.
> > 
> >> E.g. you may have a block filter in the future where you want to
> >> exchange its child BDS. This exchange should be an atomic operation, so
> >> we cannot use this interface there anyway. For quorum, such an exchange
> >> does not need to be atomic, since you can just add the new child first
> >> and remove the old one afterwards.
> >>
> >> So maybe in the future we get some block driver other than quorum for
> >> which adding and removing children (as opposed to atomically exchanging
> >> them) makes sense, but for now I can only see quorum. Therefore, that
> >> this works for quorum only is in my opinion not a reason to make it
> >> experimental. I think we actually want to keep it that way.
> > 
> > Are you telling us the existing interface is insufficiently general?
> > That the general interface neeeds to support atomic replacement?
> 
> The general interface for all block drivers and situations (as described
> by Kevin), yes. The interface for adding/removing children from quorum
> in regards to what COLO needs, no.
> 
> > If yes, why isn't the general interface is what we should do for quorum?
> > Delete is atomic replacement by nothing, add is atomic replacement of
> > nothing.
> 
> Yes, but I personally don't have a problem with macro functions. I don't
> see the harm in implementing blockdev-child-{add,del} now the way they
> are and later replacing them by calls to the general function.
> 
> But why should we do that?
> 
> Some people really want COLO in better sooner than later. Telling them
> to wait for until the block layer is as nice as we want it to be is not
> really an option I'd be willing to accept. I don't see why we should
> delay the COLO work just so that we don't have these two (macro) functions.
> 
> The alternative would be to keep it experimental and then tell every
> COLO user to use the general function once it's available. But since
> COLO users are probably mostly management tools, that seems much more
> difficult to me than to just keep these two macro functions as legacy in
> qemu.
> 
> >> So the question would then be: What ways can you imagine to change this
> >> interface, which would necessitate an incompatible change, therefore
> >> calling for an experimental interface?
> > 
> > Yes, that's an important question.
> > 
> > Another important question is whether this is the interface we want.
> 
> I can see why this is an important question to you, but to me it is not
> so much. As I argued above, I'm not opposed to adding interface that are
> actually not what we want, but that are what users want, and that are
> easy to implement with the interface that we want. It's what I call a
> “macro function”.
> 
> > A secondary question is whether the incompleteness of the implementation
> > demands an x- to warn users.
> 
> We don't want to shoehorn generality into these two functions, but add a
> new one anyway. We might want to add optional parameters later on (e.g.
> changing the threshold), but that would be a compatible change.
> 
> >> (My point is that with such an experimental interface, management tools
> >> cannot use it, even though it'd be a very nice functionality to have)
> > 
> > How much work is it to finish the job?  Unless it's a substantial chunk,
> > debating x- is much ado about nothing: just finish the job already :)
> 
> We have proven in the past to need a significant amount of time for even
> for non-substantial chunks. Often, non-substantial chunks turn out to be
> substantial when actually tackling them.
> 
> It basically comes down to whether the COLO authors are willing to wait
> for us to do the job. And frankly, if I were them, I wouldn't be.
> 
> Max
> 


--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Max Reitz Oct. 9, 2015, 6:24 p.m. UTC | #12
On 09.10.2015 18:42, Dr. David Alan Gilbert wrote:
> * Max Reitz (mreitz@redhat.com) wrote:
>> On 08.10.2015 08:15, Markus Armbruster wrote:
>>> Max Reitz <mreitz@redhat.com> writes:
>>>
>>>> On 22.09.2015 09:44, Wen Congyang wrote:
>>>>> The new QMP command name is x-blockdev-child-add, and x-blockdev-child-del.
>>>>> It justs for adding/removing quorum's child now, and don't support all
>>>>> kinds of children,
>>>>
>>>> It does support all kinds of children for quorum, doesn't it?
>>>>
>>>>>                    nor all block drivers. So it is experimental now.
>>>>
>>>> Well, that is not really a reason why we would have to make it
>>>> experimental. For instance, blockdev-add (although some might argue it
>>>> actually is experimental...) doesn't support all block drivers either.
>>>
>>> Yup, and not calling it x-blockdev-add until it's done was a mistake.
>>> People tried using it, then found its current limitations the painful
>>> way.  Not nice.
>>
>> I knew I should have written s/some might/Markus does/. ;-)
>>
>>>> The reason I am hesitant of adding an experimental QMP interface that is
>>>> actually visible to the user (compare x-image in blkverify and blkdebug,
>>>> which are not documented and not to be used by the user) is twofold:
>>>>
>>>> (1) At some point we have to say "OK, this is good enough now" and make
>>>>     it stable. What would that point be? Who can guarantee that we
>>>>     wouldn't want to make any interface changes after that point?
>>>
>>> Nobody can, just like for any other interface.  So?
>>
>> The main question is "what would that point be". As I can see you're
>> arguing that that point would be "once people want to use it", but I'm
>> arguing that people want to use it today or we wouldn't need this
>> interface at all.
>>
>> I'm against adding external experimental interface because having
>> external interface indicates that someone wants to use them, but making
>> them experimental indicates that nobody should use them.
>>
>> This interface is added for the COLO series. The documentation added in
>> patch 5 there explains usage of COLO with x-child-add. I don't think
>> that should be there, because it's experimental. But why have an
>> external interface if nobody should use it anyway?
> 
> Because it lets people move forward; the COLO series is pretty huge, there
> already seem to be side discussions spawning off about dynamic reconfiguration
> of stuff, who knows how long those will take to pan out.

Yes, and my point is that with these functions
(blockdev-child-{add,del}) the result of that side discussion doesn't
matter.

> Adding the experimental stuff makes it easier for people to try and
> get some feedback on.

The thing is, I cannot imagine any feedback that would necessitate an
incompatible change. “I want to change quorum's options while
adding/removing children” can easily be accomplished with an additional
optional parameter.

But I do know that we want to keep things experimental exactly because
there can be feedback which I cannot imagine right now.

> If everyone turns out to love it then it only takes a trivial patch to promote
> it; if people actually realise there is a better interface then it's
> no problem to change it either - x- doesn't stop any one using it,

But it should, shouldn't it? No management tool should be using an x-
command, as far as I know. And these are functions which are clearly
designed for management tools.

If management tools are indeed free to use x- functions, then I'm
completely fine with making these experimental for now. It's just that
it looks to me like “Hey, look, we have these two new functions you can
use!” and then, two versions later we remove them because we have a
general reconfiguration option, and we'll say “It's your own fault for
using experimental functions” if someone complains. That sounds
hypocritical to me, but I'm probably being to “legal” here.

(i.e. it's more like “Hey, look, two new cool functions! But don't use
them.” which sounds like a contradiction to me, whereas it actually
means “Feel free to use them but don't blame us”)

tl;dr: May management tools use x- functions? And is it actually
conceivable for them to do so? If so, my whole argument becomes moot, so
let's make these functions x-.

Mainly I'd like to know about some example where we had an x- function
in the past. Markus seemed to imply that was the case.

Max

>                                                                    but it
> does remove their right to moan if it changes.
> 
> Dave
Markus Armbruster Oct. 12, 2015, 7:56 a.m. UTC | #13
Max Reitz <mreitz@redhat.com> writes:

> On 08.10.2015 08:15, Markus Armbruster wrote:
>> Max Reitz <mreitz@redhat.com> writes:
>> 
>>> On 22.09.2015 09:44, Wen Congyang wrote:
>>>> The new QMP command name is x-blockdev-child-add, and x-blockdev-child-del.
>>>> It justs for adding/removing quorum's child now, and don't support all
>>>> kinds of children,
>>>
>>> It does support all kinds of children for quorum, doesn't it?
>>>
>>>>                    nor all block drivers. So it is experimental now.
>>>
>>> Well, that is not really a reason why we would have to make it
>>> experimental. For instance, blockdev-add (although some might argue it
>>> actually is experimental...) doesn't support all block drivers either.
>> 
>> Yup, and not calling it x-blockdev-add until it's done was a mistake.
>> People tried using it, then found its current limitations the painful
>> way.  Not nice.
>
> I knew I should have written s/some might/Markus does/. ;-)

:)

>>> The reason I am hesitant of adding an experimental QMP interface that is
>>> actually visible to the user (compare x-image in blkverify and blkdebug,
>>> which are not documented and not to be used by the user) is twofold:
>>>
>>> (1) At some point we have to say "OK, this is good enough now" and make
>>>     it stable. What would that point be? Who can guarantee that we
>>>     wouldn't want to make any interface changes after that point?
>> 
>> Nobody can, just like for any other interface.  So?
>
> The main question is "what would that point be". As I can see you're
> arguing that that point would be "once people want to use it", but I'm
> arguing that people want to use it today or we wouldn't need this
> interface at all.
>
> I'm against adding external experimental interface because having
> external interface indicates that someone wants to use them, but making
> them experimental indicates that nobody should use them.

Make that "nobody should use them in anger just yet."

They can and should be used to develop stuff.  Developing non-trivial
interfaces without actual users is risky.  Sometimes, you can't see
shortcomings in an interface until you try to use it.  Successful actual
use can build confidence the experimental interface is in fact ready to
be cast in stone.

> This interface is added for the COLO series. The documentation added in
> patch 5 there explains usage of COLO with x-child-add. I don't think
> that should be there, because it's experimental. But why have an
> external interface if nobody should use it anyway?
>
>> The x- prefix enables work spanning multiple releases.  Until the
>> feature is complete, we have a hard time seeing the whole picture, and
>> therefore the risk of interface mistakes is higher than normal.  Once
>> it's complete, we drop the x-.
>
> I'm arguing the feature is complete as far as what it's supposed to do goes.

When you say "the feature is complete", you're arguing this specific
interface is ready.  When you say you're "against adding external
experimental interface", you're arguing proper use of x-.  Let's try to
keep the discussion of principles separate from the discussion of the
specific instance.

On the former: maybe the interface is ready, but I can't judge offhand.
All I can do is ask questions.

On the latter: I emphatically disagree with the idea that experimental
interfaces are to be avoided because "someone wants to use them".

>>>                                                                   Would
>>>     we actually remember to revisit this function once in a while and
>>>     consider making it stable?
>> 
>> Has that been a problem in the past?
>
> I don't know, because I never witnessed an external experimental
> interface, but I haven't been closely involved with qemu for too long.

QMP itself started experimental, and was declared stable after fairly
heated discussion.

I think we've been dropping x- prefixes pretty routinely.  A quick,
superficial search finds commit 41310c6 (x-rdma) and commit 467b3f3
(x-iothread).

[...]
Markus Armbruster Oct. 12, 2015, 7:58 a.m. UTC | #14
"Dr. David Alan Gilbert" <dgilbert@redhat.com> writes:

> * Max Reitz (mreitz@redhat.com) wrote:
>> On 08.10.2015 08:15, Markus Armbruster wrote:
>> > Max Reitz <mreitz@redhat.com> writes:
>> > 
>> >> On 22.09.2015 09:44, Wen Congyang wrote:
>> >>> The new QMP command name is x-blockdev-child-add, and
>> >>> x-blockdev-child-del.
>> >>> It justs for adding/removing quorum's child now, and don't support all
>> >>> kinds of children,
>> >>
>> >> It does support all kinds of children for quorum, doesn't it?
>> >>
>> >>>                    nor all block drivers. So it is experimental now.
>> >>
>> >> Well, that is not really a reason why we would have to make it
>> >> experimental. For instance, blockdev-add (although some might argue it
>> >> actually is experimental...) doesn't support all block drivers either.
>> > 
>> > Yup, and not calling it x-blockdev-add until it's done was a mistake.
>> > People tried using it, then found its current limitations the painful
>> > way.  Not nice.
>> 
>> I knew I should have written s/some might/Markus does/. ;-)
>> 
>> >> The reason I am hesitant of adding an experimental QMP interface that is
>> >> actually visible to the user (compare x-image in blkverify and blkdebug,
>> >> which are not documented and not to be used by the user) is twofold:
>> >>
>> >> (1) At some point we have to say "OK, this is good enough now" and make
>> >>     it stable. What would that point be? Who can guarantee that we
>> >>     wouldn't want to make any interface changes after that point?
>> > 
>> > Nobody can, just like for any other interface.  So?
>> 
>> The main question is "what would that point be". As I can see you're
>> arguing that that point would be "once people want to use it", but I'm
>> arguing that people want to use it today or we wouldn't need this
>> interface at all.
>> 
>> I'm against adding external experimental interface because having
>> external interface indicates that someone wants to use them, but making
>> them experimental indicates that nobody should use them.
>> 
>> This interface is added for the COLO series. The documentation added in
>> patch 5 there explains usage of COLO with x-child-add. I don't think
>> that should be there, because it's experimental. But why have an
>> external interface if nobody should use it anyway?
>
> Because it lets people move forward; the COLO series is pretty huge, there
> already seem to be side discussions spawning off about dynamic reconfiguration
> of stuff, who knows how long those will take to pan out.
> Adding the experimental stuff makes it easier for people to try and
> get some feedback on.
> If everyone turns out to love it then it only takes a trivial patch to promote
> it; if people actually realise there is a better interface then it's
> no problem to change it either - x- doesn't stop any one using it, but it
> does remove their right to moan if it changes.

Exactly.
Dr. David Alan Gilbert Oct. 12, 2015, 8:07 a.m. UTC | #15
* Max Reitz (mreitz@redhat.com) wrote:
> On 09.10.2015 18:42, Dr. David Alan Gilbert wrote:
> > * Max Reitz (mreitz@redhat.com) wrote:
> >> On 08.10.2015 08:15, Markus Armbruster wrote:
> >>> Max Reitz <mreitz@redhat.com> writes:
> >>>
> >>>> On 22.09.2015 09:44, Wen Congyang wrote:
> >>>>> The new QMP command name is x-blockdev-child-add, and x-blockdev-child-del.
> >>>>> It justs for adding/removing quorum's child now, and don't support all
> >>>>> kinds of children,
> >>>>
> >>>> It does support all kinds of children for quorum, doesn't it?
> >>>>
> >>>>>                    nor all block drivers. So it is experimental now.
> >>>>
> >>>> Well, that is not really a reason why we would have to make it
> >>>> experimental. For instance, blockdev-add (although some might argue it
> >>>> actually is experimental...) doesn't support all block drivers either.
> >>>
> >>> Yup, and not calling it x-blockdev-add until it's done was a mistake.
> >>> People tried using it, then found its current limitations the painful
> >>> way.  Not nice.
> >>
> >> I knew I should have written s/some might/Markus does/. ;-)
> >>
> >>>> The reason I am hesitant of adding an experimental QMP interface that is
> >>>> actually visible to the user (compare x-image in blkverify and blkdebug,
> >>>> which are not documented and not to be used by the user) is twofold:
> >>>>
> >>>> (1) At some point we have to say "OK, this is good enough now" and make
> >>>>     it stable. What would that point be? Who can guarantee that we
> >>>>     wouldn't want to make any interface changes after that point?
> >>>
> >>> Nobody can, just like for any other interface.  So?
> >>
> >> The main question is "what would that point be". As I can see you're
> >> arguing that that point would be "once people want to use it", but I'm
> >> arguing that people want to use it today or we wouldn't need this
> >> interface at all.
> >>
> >> I'm against adding external experimental interface because having
> >> external interface indicates that someone wants to use them, but making
> >> them experimental indicates that nobody should use them.
> >>
> >> This interface is added for the COLO series. The documentation added in
> >> patch 5 there explains usage of COLO with x-child-add. I don't think
> >> that should be there, because it's experimental. But why have an
> >> external interface if nobody should use it anyway?
> > 
> > Because it lets people move forward; the COLO series is pretty huge, there
> > already seem to be side discussions spawning off about dynamic reconfiguration
> > of stuff, who knows how long those will take to pan out.
> 
> Yes, and my point is that with these functions
> (blockdev-child-{add,del}) the result of that side discussion doesn't
> matter.
> 
> > Adding the experimental stuff makes it easier for people to try and
> > get some feedback on.
> 
> The thing is, I cannot imagine any feedback that would necessitate an
> incompatible change. “I want to change quorum's options while
> adding/removing children” can easily be accomplished with an additional
> optional parameter.
> 
> But I do know that we want to keep things experimental exactly because
> there can be feedback which I cannot imagine right now.
> 
> > If everyone turns out to love it then it only takes a trivial patch to promote
> > it; if people actually realise there is a better interface then it's
> > no problem to change it either - x- doesn't stop any one using it,
> 
> But it should, shouldn't it? No management tool should be using an x-
> command, as far as I know. And these are functions which are clearly
> designed for management tools.
> 
> If management tools are indeed free to use x- functions, then I'm
> completely fine with making these experimental for now. It's just that
> it looks to me like “Hey, look, we have these two new functions you can
> use!” and then, two versions later we remove them because we have a
> general reconfiguration option, and we'll say “It's your own fault for
> using experimental functions” if someone complains. That sounds
> hypocritical to me, but I'm probably being to “legal” here.
>
> (i.e. it's more like “Hey, look, two new cool functions! But don't use
> them.” which sounds like a contradiction to me, whereas it actually
> means “Feel free to use them but don't blame us”)
> 
> tl;dr: May management tools use x- functions? And is it actually
> conceivable for them to do so? If so, my whole argument becomes moot, so
> let's make these functions x-.

My guess is the libvirt guys wont take the code to drive the x- methods;
but it still makes it easier if someone wants to try this stuff out, they
wont need to apply 2/3 sets of COLO code and then any management tools.

> Mainly I'd like to know about some example where we had an x- function
> in the past. Markus seemed to imply that was the case.

The RDMA code used to have x- for migration protocol and some of the
capabilities; we've recently added Jason Herne's cpu throttling with
similar x- flags (1626fee3bdbb295d5e8aff800f7621357bb376d6),
and input-send-event got moved into the x- world (df5b2adb7398d71016ee469f71e52075ed95e04e)
which is much worse than it starting out there.

Dave

> 
> Max
> 
> >                                                                    but it
> > does remove their right to moan if it changes.
> > 
> > Dave
> 
> 


--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Kevin Wolf Oct. 12, 2015, 8:18 a.m. UTC | #16
Am 09.10.2015 um 20:24 hat Max Reitz geschrieben:
> On 09.10.2015 18:42, Dr. David Alan Gilbert wrote:
> > * Max Reitz (mreitz@redhat.com) wrote:
> >> On 08.10.2015 08:15, Markus Armbruster wrote:
> >>> Max Reitz <mreitz@redhat.com> writes:
> >>>
> >>>> On 22.09.2015 09:44, Wen Congyang wrote:
> >>>>> The new QMP command name is x-blockdev-child-add, and x-blockdev-child-del.
> >>>>> It justs for adding/removing quorum's child now, and don't support all
> >>>>> kinds of children,
> >>>>
> >>>> It does support all kinds of children for quorum, doesn't it?
> >>>>
> >>>>>                    nor all block drivers. So it is experimental now.
> >>>>
> >>>> Well, that is not really a reason why we would have to make it
> >>>> experimental. For instance, blockdev-add (although some might argue it
> >>>> actually is experimental...) doesn't support all block drivers either.
> >>>
> >>> Yup, and not calling it x-blockdev-add until it's done was a mistake.
> >>> People tried using it, then found its current limitations the painful
> >>> way.  Not nice.
> >>
> >> I knew I should have written s/some might/Markus does/. ;-)
> >>
> >>>> The reason I am hesitant of adding an experimental QMP interface that is
> >>>> actually visible to the user (compare x-image in blkverify and blkdebug,
> >>>> which are not documented and not to be used by the user) is twofold:
> >>>>
> >>>> (1) At some point we have to say "OK, this is good enough now" and make
> >>>>     it stable. What would that point be? Who can guarantee that we
> >>>>     wouldn't want to make any interface changes after that point?
> >>>
> >>> Nobody can, just like for any other interface.  So?
> >>
> >> The main question is "what would that point be". As I can see you're
> >> arguing that that point would be "once people want to use it", but I'm
> >> arguing that people want to use it today or we wouldn't need this
> >> interface at all.
> >>
> >> I'm against adding external experimental interface because having
> >> external interface indicates that someone wants to use them, but making
> >> them experimental indicates that nobody should use them.
> >>
> >> This interface is added for the COLO series. The documentation added in
> >> patch 5 there explains usage of COLO with x-child-add. I don't think
> >> that should be there, because it's experimental. But why have an
> >> external interface if nobody should use it anyway?
> > 
> > Because it lets people move forward; the COLO series is pretty huge, there
> > already seem to be side discussions spawning off about dynamic reconfiguration
> > of stuff, who knows how long those will take to pan out.
> 
> Yes, and my point is that with these functions
> (blockdev-child-{add,del}) the result of that side discussion doesn't
> matter.
> 
> > Adding the experimental stuff makes it easier for people to try and
> > get some feedback on.
> 
> The thing is, I cannot imagine any feedback that would necessitate an
> incompatible change. “I want to change quorum's options while
> adding/removing children” can easily be accomplished with an additional
> optional parameter.
> 
> But I do know that we want to keep things experimental exactly because
> there can be feedback which I cannot imagine right now.
> 
> > If everyone turns out to love it then it only takes a trivial patch to promote
> > it; if people actually realise there is a better interface then it's
> > no problem to change it either - x- doesn't stop any one using it,
> 
> But it should, shouldn't it? No management tool should be using an x-
> command, as far as I know. And these are functions which are clearly
> designed for management tools.

It should stop people from using it in production, but it shouldn't stop
them from using it for development and testing.

We know that child-add/del is probably not the interface that we want to
have in the end (and I would like to avoid accumulating tons of
compatibility commands once we have what we want).

If the COLO people say that they need an experimental command in order
to make progress, that's fine with me. I think we'll all agree that
while 'blockdev-add' can't reasonably be used in production yet, without
it we couldn't have made much of the progress in the block layer that we
made in the past year. If COLO people are in the same situation, let's
give them what they need, without setting an unwanted interface in
stone.

> If management tools are indeed free to use x- functions, then I'm
> completely fine with making these experimental for now. It's just that
> it looks to me like “Hey, look, we have these two new functions you can
> use!” and then, two versions later we remove them because we have a
> general reconfiguration option, and we'll say “It's your own fault for
> using experimental functions” if someone complains. That sounds
> hypocritical to me, but I'm probably being to “legal” here.

Experimental features in management tools (e.g. in some feature branch)
can use them, they just can't rely on it keeping working.

> (i.e. it's more like “Hey, look, two new cool functions! But don't use
> them.” which sounds like a contradiction to me, whereas it actually
> means “Feel free to use them but don't blame us”)

"Hey, look, two new functions for an incomplete feature! Use them if you
want to help with development, but otherwise stay clear of them."

Kevin
Max Reitz Oct. 12, 2015, 4:27 p.m. UTC | #17
On 07.10.2015 21:42, Max Reitz wrote:
> On 22.09.2015 09:44, Wen Congyang wrote:
>> The new QMP command name is x-blockdev-child-add, and x-blockdev-child-del.
>> It justs for adding/removing quorum's child now, and don't support all
>> kinds of children,
> 
> It does support all kinds of children for quorum, doesn't it?
> 
>>                    nor all block drivers. So it is experimental now.
> 
> Well, that is not really a reason why we would have to make it
> experimental. For instance, blockdev-add (although some might argue it
> actually is experimental...) doesn't support all block drivers either.

OK, after a rather long discussion, my opinion has changed. Adding them
as experimental interfaces is good (although the reason noted here is
not exactly what I feel is the reason that came out in the discussion).

Thanks to everyone who argued with me! I took a good chunk of your time,
and I'll have you know that I'm grateful for it.

Max
Markus Armbruster Oct. 21, 2015, 8:27 a.m. UTC | #18
Sorry for my slow reply.

Kevin Wolf <kwolf@redhat.com> writes:

> Am 08.10.2015 um 13:02 hat Kevin Wolf geschrieben:
>> Am 08.10.2015 um 08:15 hat Markus Armbruster geschrieben:
>> > Max Reitz <mreitz@redhat.com> writes:
>> > > E.g. you may have a block filter in the future where you want to
>> > > exchange its child BDS. This exchange should be an atomic operation, so
>> > > we cannot use this interface there anyway. For quorum, such an exchange
>> > > does not need to be atomic, since you can just add the new child first
>> > > and remove the old one afterwards.
>> > >
>> > > So maybe in the future we get some block driver other than quorum for
>> > > which adding and removing children (as opposed to atomically exchanging
>> > > them) makes sense, but for now I can only see quorum. Therefore, that
>> > > this works for quorum only is in my opinion not a reason to make it
>> > > experimental. I think we actually want to keep it that way.
>> > 
>> > Are you telling us the existing interface is insufficiently general?
>> > That the general interface neeeds to support atomic replacement?
>> > 
>> > If yes, why isn't the general interface is what we should do for quorum?
>> > Delete is atomic replacement by nothing, add is atomic replacement of
>> > nothing.
>> 
>> The general thing is what we used to call "dynamic reconfiguration". If
>> we want a single command to enable it (which would be great if we
>> could), we need to some more design work first. Atomic replacement might
>> be the operation we're looking for, but we have to be sure.
>> 
>> So far we haven't thought about dynamic reconfiguation enough that we
>> would know the right solution, but enough that we know it's hard.

Yes.

>>                                                                   That
>> would be an argument for me that makes adding an x-* command now
>> acceptable. On the other hand, the fact that we want a single command in
>> the end makes me want to keep it experimental.

We have a bit more time to think until the release calcifies interfaces.

>> What types of dynamic reconfiguration do we need to support? I'll start
>> with a small list, feel free to extend it:
>> 
>> * Replace a child node with another node. This works pretty much
>>   everywhere in the tree - including the root, i.e. BlockBackend!
>>   Working just on BDSes doesn't seem to be enough.

Two ways to replace a node:

1. Operation on a node, i.e. replace node O by node N *everywhere*.

2. Operation on an edge, i.e. replace an edge to node O by an edge to
   node N.

Less abstract:

1. Node replacement takes a BDS *O.  It either replaces in place
   (something we've worked hard to get rid of), or needs to find all
   variables containing the value O.

2. Edge replacement takes a BDS **O.  Naturally, whoever contains that
   thing may need to be notified, so the actual interface may well be a
   bit more involved.

>> * Adding a child to a node that can take additional children (e.g.
>>   quorum can take an arbitrary number; but also COW image formats have
>>   an option child, so you could add a backing file to a node originally
>>   opened with BDRV_O_NO_BACKING)
>> 
>>   Same as atomically replacing nothing by a node.

Yes.

>> * Removing an optional child from a node that remains valid with that
>>   child removed. The same examples apply.
>> 
>>   Same as atomically replacing a child by nothing.

Yes.

>> * Add a new node between two existing nodes. An example is taking a live
>>   snapshot, which inserts a new node between the BlockBackend and the
>>   first BDS. Or it could be used to insert a filter somewhere in the
>>   graph.
>> 
>>   Same as creating the new node pointing to node B (or dynamically
>>   adding it) and then atomically replacing the reference of node A that
>>   pointed to B with a reference to the new node.

Footnote: between creation of the new node and the atomic replace, the
old node has another parent.  Requires support for a BDS to have
multiple parents, but that's a given, I think.

>> * Add a new node between multiple existing nodes. This is one of the
>>   examples we always used to discuss with dynamic reconfiguration:
>> 
>>       base <- sn1 <- sn2 <--+-- virtio-blk
>>                             |
>>                             +-- NBD server
>> 
>>   Adding a filter could result in this:
>> 
>>         base <- sn1 <- sn2 <- throttling <--+-- virtio-blk
>>                                           |
>>                                           +-- NBD server
>> 
>>   Or this:
>> 
>>         base <- sn1 <- sn2 <--+-- throttling <- virtio-blk
>>                               |
>>                               +-- NBD server
>> 
>>   Or this:
>> 
>>         base <- sn1 <- sn2 <--+-- virtio-blk
>>                               |
>>                               +-- throttling <- NBD server
>> 
>>   All of these are different kinds of "adding a filter", and all of
>>   them are valid operations that a user could want to perform.
>> 
>>   Case 2 and 3 are really just "add a new node between two existing
>>   nodes", as explained above.

Yes.

>>   Case 1 is new: We still create the throttling node so that it already
>>   points to sn2, but now we have to atomically change the children of
>>   two things (the BlockBackends of virtio-blk and the NBD server). Not a
>>   problem per se because we can just do that, but it raises the question
>>   whether the atomic replacement operation needs to be transactionable.

Where "transactionable" means you can bind it together with other
transactionable operations, and the resulting transaction either
succeeds or fails completely.  Right?

In increasing flexibility order:

a. atomic single replacement, not transactionable

b. atomic multiple replacements, not transactionable

c. single replacement, transactionable

In your throttling example, having to insert the throttle in two
separate operations (either of which can fail) seems tolerable.

Can we think of an example where a similar multi-replacement should
either succeed or fail completely?  I.e. one that needs at least b.

Can we think of an example where we need c?

>> * Remove a node between two (or more) other nodes.
>> 
>>   Same as atomically replacing a child by a grandchild. For more than
>>   two involved nodes, again a transactional version might be needed.
>> 
>> So at the first sight, this operation seems to work as the general
>> interface for dynamic reconfiguration.

Yes, we seem to be onto something here.  Finally!

> And actually, after re-reading the other branch of this email thread
> where I replied to Berto that he wants to use bdrv_reopen() to change
> the voting threshold for quorum, it occurred to me that bdrv_reopen()
> should possibly also be this atomatic replacement function.
>
> After all, the references to other nodes are blockdev-add options, and
> if we implement bdrv_reopen() fully so that it can change any options
> that can be given to blockdev-add, that means that we must be able to
> replace children.

You got a point.  Do you think implementing it will be hard?

>
> Kevin
>
>> One problem we discussed earlier that I'm not sure whether it's related
>> is filter nodes inserted automatically once we change e.g. the I/O
>> throttling QMP commands to add a throttling filter BDS to the graph.
>> 
>> If the user creates nodes A and B, but sets throttling options, they
>> might end up with a chain like this:
>> 
>>     A <- throttling <- B

Is "sets throttling options" == "uses the (then) legacy interface for
throttling instead of the more powerful 'insert throttling node'
interface"?

>> Now imagine that they want to add another filter between A and B, let's
>> say blkdebug. They would need to know that they have to insert the new
>> node between A and throttling or B and throttling, but not between A and
>> B. If they tried to insert it between A and B, the algorithm above says
>> that they would let blkdebug point to A, and replace B's child with
>> blkdebug, the resulting tree wouldn't be throttled any more!
>> 
>>     A <- blkdebug <- B
>> 
>>          throttling (ref = 0 -> delete)

Can we phrase the operation differently?  Instead of "insert between A
and B (silently replacing everything that is now between A and B)",
say one of

1a. Replace node A by A <- blkdebug

1b. Replace node B by blkdebug <- B

2a. Replace edge A <- B by <- blkdebug <-
    Impossible in the current state, because there is no such edge.

2b. Replace edge A <- by <- blkdebug <-

2c. Replace edge <- B by <- blkdebug <-

See also node vs. edge replacement above.

>> Even if they knew that they have to consider the throttling node, it
>> currently wouldn't have a node-name, and with Jeff's autogenerated names
>> it wouldn't be predictable.
>> 
>> Maybe the dynamic reconfiguration interface does need to be a bit
>> cleverer.
>> 
>> 
>> Anyway, after writing all of this, I'm almost convinced now that an
>> experimental interface is the right thing to do in this patch series.

Let's do experimental now, and reconsider before the release.
Wen Congyang Oct. 26, 2015, 2:04 a.m. UTC | #19
On 10/21/2015 04:27 PM, Markus Armbruster wrote:
> Sorry for my slow reply.
> 
> Kevin Wolf <kwolf@redhat.com> writes:
> 
>> Am 08.10.2015 um 13:02 hat Kevin Wolf geschrieben:
>>> Am 08.10.2015 um 08:15 hat Markus Armbruster geschrieben:
>>>> Max Reitz <mreitz@redhat.com> writes:
>>>>> E.g. you may have a block filter in the future where you want to
>>>>> exchange its child BDS. This exchange should be an atomic operation, so
>>>>> we cannot use this interface there anyway. For quorum, such an exchange
>>>>> does not need to be atomic, since you can just add the new child first
>>>>> and remove the old one afterwards.
>>>>>
>>>>> So maybe in the future we get some block driver other than quorum for
>>>>> which adding and removing children (as opposed to atomically exchanging
>>>>> them) makes sense, but for now I can only see quorum. Therefore, that
>>>>> this works for quorum only is in my opinion not a reason to make it
>>>>> experimental. I think we actually want to keep it that way.
>>>>
>>>> Are you telling us the existing interface is insufficiently general?
>>>> That the general interface neeeds to support atomic replacement?
>>>>
>>>> If yes, why isn't the general interface is what we should do for quorum?
>>>> Delete is atomic replacement by nothing, add is atomic replacement of
>>>> nothing.
>>>
>>> The general thing is what we used to call "dynamic reconfiguration". If
>>> we want a single command to enable it (which would be great if we
>>> could), we need to some more design work first. Atomic replacement might
>>> be the operation we're looking for, but we have to be sure.
>>>
>>> So far we haven't thought about dynamic reconfiguation enough that we
>>> would know the right solution, but enough that we know it's hard.
> 
> Yes.
> 
>>>                                                                   That
>>> would be an argument for me that makes adding an x-* command now
>>> acceptable. On the other hand, the fact that we want a single command in
>>> the end makes me want to keep it experimental.
> 
> We have a bit more time to think until the release calcifies interfaces.
> 
>>> What types of dynamic reconfiguration do we need to support? I'll start
>>> with a small list, feel free to extend it:
>>>
>>> * Replace a child node with another node. This works pretty much
>>>   everywhere in the tree - including the root, i.e. BlockBackend!
>>>   Working just on BDSes doesn't seem to be enough.
> 
> Two ways to replace a node:
> 
> 1. Operation on a node, i.e. replace node O by node N *everywhere*.
> 
> 2. Operation on an edge, i.e. replace an edge to node O by an edge to
>    node N.
> 
> Less abstract:
> 
> 1. Node replacement takes a BDS *O.  It either replaces in place
>    (something we've worked hard to get rid of), or needs to find all
>    variables containing the value O.
> 
> 2. Edge replacement takes a BDS **O.  Naturally, whoever contains that
>    thing may need to be notified, so the actual interface may well be a
>    bit more involved.
> 
>>> * Adding a child to a node that can take additional children (e.g.
>>>   quorum can take an arbitrary number; but also COW image formats have
>>>   an option child, so you could add a backing file to a node originally
>>>   opened with BDRV_O_NO_BACKING)
>>>
>>>   Same as atomically replacing nothing by a node.
> 
> Yes.
> 
>>> * Removing an optional child from a node that remains valid with that
>>>   child removed. The same examples apply.
>>>
>>>   Same as atomically replacing a child by nothing.
> 
> Yes.
> 
>>> * Add a new node between two existing nodes. An example is taking a live
>>>   snapshot, which inserts a new node between the BlockBackend and the
>>>   first BDS. Or it could be used to insert a filter somewhere in the
>>>   graph.
>>>
>>>   Same as creating the new node pointing to node B (or dynamically
>>>   adding it) and then atomically replacing the reference of node A that
>>>   pointed to B with a reference to the new node.
> 
> Footnote: between creation of the new node and the atomic replace, the
> old node has another parent.  Requires support for a BDS to have
> multiple parents, but that's a given, I think.
> 
>>> * Add a new node between multiple existing nodes. This is one of the
>>>   examples we always used to discuss with dynamic reconfiguration:
>>>
>>>       base <- sn1 <- sn2 <--+-- virtio-blk
>>>                             |
>>>                             +-- NBD server
>>>
>>>   Adding a filter could result in this:
>>>
>>>         base <- sn1 <- sn2 <- throttling <--+-- virtio-blk
>>>                                           |
>>>                                           +-- NBD server
>>>
>>>   Or this:
>>>
>>>         base <- sn1 <- sn2 <--+-- throttling <- virtio-blk
>>>                               |
>>>                               +-- NBD server
>>>
>>>   Or this:
>>>
>>>         base <- sn1 <- sn2 <--+-- virtio-blk
>>>                               |
>>>                               +-- throttling <- NBD server
>>>
>>>   All of these are different kinds of "adding a filter", and all of
>>>   them are valid operations that a user could want to perform.
>>>
>>>   Case 2 and 3 are really just "add a new node between two existing
>>>   nodes", as explained above.
> 
> Yes.
> 
>>>   Case 1 is new: We still create the throttling node so that it already
>>>   points to sn2, but now we have to atomically change the children of
>>>   two things (the BlockBackends of virtio-blk and the NBD server). Not a
>>>   problem per se because we can just do that, but it raises the question
>>>   whether the atomic replacement operation needs to be transactionable.
> 
> Where "transactionable" means you can bind it together with other
> transactionable operations, and the resulting transaction either
> succeeds or fails completely.  Right?
> 
> In increasing flexibility order:
> 
> a. atomic single replacement, not transactionable
> 
> b. atomic multiple replacements, not transactionable
> 
> c. single replacement, transactionable
> 
> In your throttling example, having to insert the throttle in two
> separate operations (either of which can fail) seems tolerable.
> 
> Can we think of an example where a similar multi-replacement should
> either succeed or fail completely?  I.e. one that needs at least b.
> 
> Can we think of an example where we need c?
> 
>>> * Remove a node between two (or more) other nodes.
>>>
>>>   Same as atomically replacing a child by a grandchild. For more than
>>>   two involved nodes, again a transactional version might be needed.
>>>
>>> So at the first sight, this operation seems to work as the general
>>> interface for dynamic reconfiguration.
> 
> Yes, we seem to be onto something here.  Finally!
> 
>> And actually, after re-reading the other branch of this email thread
>> where I replied to Berto that he wants to use bdrv_reopen() to change
>> the voting threshold for quorum, it occurred to me that bdrv_reopen()
>> should possibly also be this atomatic replacement function.
>>
>> After all, the references to other nodes are blockdev-add options, and
>> if we implement bdrv_reopen() fully so that it can change any options
>> that can be given to blockdev-add, that means that we must be able to
>> replace children.
> 
> You got a point.  Do you think implementing it will be hard?
> 
>>
>> Kevin
>>
>>> One problem we discussed earlier that I'm not sure whether it's related
>>> is filter nodes inserted automatically once we change e.g. the I/O
>>> throttling QMP commands to add a throttling filter BDS to the graph.
>>>
>>> If the user creates nodes A and B, but sets throttling options, they
>>> might end up with a chain like this:
>>>
>>>     A <- throttling <- B
> 
> Is "sets throttling options" == "uses the (then) legacy interface for
> throttling instead of the more powerful 'insert throttling node'
> interface"?
> 
>>> Now imagine that they want to add another filter between A and B, let's
>>> say blkdebug. They would need to know that they have to insert the new
>>> node between A and throttling or B and throttling, but not between A and
>>> B. If they tried to insert it between A and B, the algorithm above says
>>> that they would let blkdebug point to A, and replace B's child with
>>> blkdebug, the resulting tree wouldn't be throttled any more!
>>>
>>>     A <- blkdebug <- B
>>>
>>>          throttling (ref = 0 -> delete)
> 
> Can we phrase the operation differently?  Instead of "insert between A
> and B (silently replacing everything that is now between A and B)",
> say one of
> 
> 1a. Replace node A by A <- blkdebug
> 
> 1b. Replace node B by blkdebug <- B
> 
> 2a. Replace edge A <- B by <- blkdebug <-
>     Impossible in the current state, because there is no such edge.

What does 'edge' mean?

Thanks
Wen Congyang

> 
> 2b. Replace edge A <- by <- blkdebug <-
> 
> 2c. Replace edge <- B by <- blkdebug <-
> 
> See also node vs. edge replacement above.
> 
>>> Even if they knew that they have to consider the throttling node, it
>>> currently wouldn't have a node-name, and with Jeff's autogenerated names
>>> it wouldn't be predictable.
>>>
>>> Maybe the dynamic reconfiguration interface does need to be a bit
>>> cleverer.
>>>
>>>
>>> Anyway, after writing all of this, I'm almost convinced now that an
>>> experimental interface is the right thing to do in this patch series.
> 
> Let's do experimental now, and reconsider before the release.
> 
> .
>
Markus Armbruster Oct. 26, 2015, 7:24 a.m. UTC | #20
Wen Congyang <wency@cn.fujitsu.com> writes:

> On 10/21/2015 04:27 PM, Markus Armbruster wrote:
[...]
>> Can we phrase the operation differently?  Instead of "insert between A
>> and B (silently replacing everything that is now between A and B)",
>> say one of
>> 
>> 1a. Replace node A by A <- blkdebug
>> 
>> 1b. Replace node B by blkdebug <- B
>> 
>> 2a. Replace edge A <- B by <- blkdebug <-
>>     Impossible in the current state, because there is no such edge.
>
> What does 'edge' mean?

It's graph terminology: the BB and BDS serve as the graph's nodes
(a.k.a. vertices), and the pointers connecting them serve as the graph's
edges.

[...]
Wen Congyang Oct. 26, 2015, 7:25 a.m. UTC | #21
On 10/26/2015 03:24 PM, Markus Armbruster wrote:
> Wen Congyang <wency@cn.fujitsu.com> writes:
> 
>> On 10/21/2015 04:27 PM, Markus Armbruster wrote:
> [...]
>>> Can we phrase the operation differently?  Instead of "insert between A
>>> and B (silently replacing everything that is now between A and B)",
>>> say one of
>>>
>>> 1a. Replace node A by A <- blkdebug
>>>
>>> 1b. Replace node B by blkdebug <- B
>>>
>>> 2a. Replace edge A <- B by <- blkdebug <-
>>>     Impossible in the current state, because there is no such edge.
>>
>> What does 'edge' mean?
> 
> It's graph terminology: the BB and BDS serve as the graph's nodes
> (a.k.a. vertices), and the pointers connecting them serve as the graph's
> edges.

Thanks

Wen Congyang

> 
> [...]
> .
>
diff mbox

Patch

diff --git a/blockdev.c b/blockdev.c
index 32b04b4..8da0ffb 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -3086,6 +3086,54 @@  fail:
     qmp_output_visitor_cleanup(ov);
 }
 
+void qmp_x_blockdev_child_add(const char *parent, const char *child,
+                              Error **errp)
+{
+    BlockDriverState *parent_bs, *child_bs;
+    Error *local_err = NULL;
+
+    parent_bs = bdrv_lookup_bs(parent, parent, &local_err);
+    if (!parent_bs) {
+        error_propagate(errp, local_err);
+        return;
+    }
+
+    child_bs = bdrv_find_node(child);
+    if (!child_bs) {
+        error_setg(errp, "Node '%s' not found", child);
+        return;
+    }
+
+    bdrv_add_child(parent_bs, child_bs, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+    }
+}
+
+void qmp_x_blockdev_child_del(const char *parent, const char *child,
+                              Error **errp)
+{
+    BlockDriverState *parent_bs, *child_bs;
+    Error *local_err = NULL;
+
+    parent_bs = bdrv_lookup_bs(parent, parent, &local_err);
+    if (!parent_bs) {
+        error_propagate(errp, local_err);
+        return;
+    }
+
+    child_bs = bdrv_find_node(child);
+    if (!child_bs) {
+        error_setg(errp, "Node '%s' not found", child);
+        return;
+    }
+
+    bdrv_del_child(parent_bs, child_bs, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+    }
+}
+
 BlockJobInfoList *qmp_query_block_jobs(Error **errp)
 {
     BlockJobInfoList *head = NULL, **p_next = &head;
diff --git a/qapi/block-core.json b/qapi/block-core.json
index bb2189e..9418f05 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2114,3 +2114,37 @@ 
 ##
 { 'command': 'block-set-write-threshold',
   'data': { 'node-name': 'str', 'write-threshold': 'uint64' } }
+
+##
+# @x-blockdev-child-add
+#
+# Add a new child to the parent BDS. Currently only the Quorum driver
+# implements this feature. This is useful to fix a broken quorum child.
+#
+# @parent: graph node name or id which the child will be added to.
+#
+# @child: graph node name that will be added.
+#
+# Note: this command is experimental, and not a stable API.
+#
+# Since: 2.5
+##
+{ 'command': 'x-blockdev-child-add',
+  'data' : { 'parent': 'str', 'child': 'str' } }
+
+##
+# @x-blockdev-child-del
+#
+# Remove a child from the parent BDS. Currently only the Quorum driver
+# implements this feature. This is useful to fix a broken quorum child.
+# Note, you can't remove a child if it would bring the quorum below its
+# threshold.
+#
+# @parent: graph node name or id from which the child will removed.
+#
+# @child: graph node name that will be removed.
+#
+# Since: 2.5
+##
+{ 'command': 'x-blockdev-child-del',
+  'data' : { 'parent': 'str', 'child': 'str' } }
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 66f0300..36e75b1 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -3916,6 +3916,67 @@  Example (2):
 EQMP
 
     {
+        .name       = "x-blockdev-child-add",
+        .args_type  = "parent:B,child:B",
+        .mhandler.cmd_new = qmp_marshal_x_blockdev_child_add,
+    },
+
+SQMP
+x-blockdev-child-add
+------------
+
+Add a child to a quorum node.
+
+Arguments:
+
+- "parent": the quorum's id or node name
+- "child": the child node-name which will be added
+
+Note: this command is experimental, and not a stable API. It doesn't
+support all kinds of children, nor all block drivers.
+
+Example:
+
+-> { "execute": blockdev-add",
+    "arguments": { "options": { "driver": "raw",
+                                "node-name": "new_node",
+                                "id": "test_new_node",
+                                "file": { "driver": "file",
+                                          "filename": "test.raw" } } } }
+<- { "return": {} }
+-> { "execute": "x-blockdev-child-add",
+    "arguments": { "parent": "disk1", "child": "new_node" } }
+<- { "return": {} }
+
+EQMP
+
+    {
+        .name        = "x-blockdev-child-del",
+        .args_type   = "parent:B,child:B",
+        .mhandler.cmd_new = qmp_marshal_x_blockdev_child_del,
+    },
+
+SQMP
+x-blockdev-child-del
+------------
+
+Delete a child from a quorum node. It can be used to remove a broken
+quorum child.
+
+Arguments:
+
+- "parent": the quorum's id or node name
+- "child": the child node-name which will be removed
+
+Example:
+
+-> { "execute": "x-blockdev-child-del",
+    "arguments": { "parent": "disk1", "child": "new_node" } }
+<- { "return": {} }
+
+EQMP
+
+    {
         .name       = "query-named-block-nodes",
         .args_type  = "",
         .mhandler.cmd_new = qmp_marshal_query_named_block_nodes,