diff mbox series

[15/22] mirror: Prevent loops

Message ID 20190920152804.12875-16-mreitz@redhat.com
State New
Headers show
Series block: Fix check_to_replace_node() | expand

Commit Message

Max Reitz Sept. 20, 2019, 3:27 p.m. UTC
While bdrv_replace_node() will not follow through with it, a specific
@replaces asks the mirror job to create a loop.

For example, say both the source and the target share a child where the
source is a filter; by letting @replaces point to the common child, you
ask for a loop.

Or if you use @replaces in drive-mirror with sync=none and
mode=absolute-paths, you generally ask for a loop (@replaces must point
to a child of the source, and sync=none makes the source the backing
file of the target after the job).

bdrv_replace_node() will not create those loops, but it by doing so, it
ignores the user-requested configuration, which is not ideally either.
(In the first example above, the target's child will remain what it was,
which may still be reasonable.  But in the second example, the target
will just not become a child of the source, which is precisely what was
requested with @replaces.)

So prevent such configurations, both before the job, and before it
actually completes.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 include/block/block_int.h |  3 +++
 block.c                   | 30 ++++++++++++++++++++++++
 block/mirror.c            | 19 +++++++++++++++-
 blockdev.c                | 48 ++++++++++++++++++++++++++++++++++++++-
 4 files changed, 98 insertions(+), 2 deletions(-)

Comments

Vladimir Sementsov-Ogievskiy Sept. 26, 2019, 1:08 p.m. UTC | #1
20.09.2019 18:27, Max Reitz wrote:
> While bdrv_replace_node() will not follow through with it, a specific
> @replaces asks the mirror job to create a loop.
> 
> For example, say both the source and the target share a child where the
> source is a filter; by letting @replaces point to the common child, you
> ask for a loop.


source[filter]
   |
   v
child  <----- target

and we want target to be a child if itself. OK, it's a loop.

> 
> Or if you use @replaces in drive-mirror with sync=none and
> mode=absolute-paths, you generally ask for a loop (@replaces must point
> to a child of the source, and sync=none makes the source the backing
> file of the target after the job).
> 
> bdrv_replace_node() will not create those loops, but it by doing so, it

s/but it/but ?

> ignores the user-requested configuration, which is not ideally either.
> (In the first example above, the target's child will remain what it was,
> which may still be reasonable.  But in the second example, the target
> will just not become a child of the source, which is precisely what was
> requested with @replaces.)

so you say that second example is bad [1]

> 
> So prevent such configurations, both before the job, and before it
> actually completes.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>   include/block/block_int.h |  3 +++
>   block.c                   | 30 ++++++++++++++++++++++++
>   block/mirror.c            | 19 +++++++++++++++-
>   blockdev.c                | 48 ++++++++++++++++++++++++++++++++++++++-
>   4 files changed, 98 insertions(+), 2 deletions(-)
> 
> diff --git a/include/block/block_int.h b/include/block/block_int.h
> index 70f26530c9..8a26a0d88a 100644
> --- a/include/block/block_int.h
> +++ b/include/block/block_int.h
> @@ -1256,6 +1256,9 @@ void bdrv_format_default_perms(BlockDriverState *bs, BdrvChild *c,
>   bool bdrv_recurse_can_replace(BlockDriverState *bs,
>                                 BlockDriverState *to_replace);
>   
> +bool bdrv_is_child_of(BlockDriverState *child, BlockDriverState *parent,
> +                      int min_level);
> +
>   /*
>    * Default implementation for drivers to pass bdrv_co_block_status() to
>    * their file.
> diff --git a/block.c b/block.c
> index 0d9b3de98f..332191fb47 100644
> --- a/block.c
> +++ b/block.c
> @@ -6260,6 +6260,36 @@ out:
>       return to_replace_bs;
>   }
>   
> +/*
> + * Return true iff @child is a (recursive) child of @parent, with at
> + * least @min_level edges between them.
> + *
> + * (If @min_level == 0, return true if @child == @parent.  For
> + * @min_level == 1, @child needs to be at least a real child; for
> + * @min_level == 2, it needs to be at least a grand-child; and so on.)
> + */
> +bool bdrv_is_child_of(BlockDriverState *child, BlockDriverState *parent,
> +                      int min_level)
> +{
> +    BdrvChild *c;
> +
> +    if (child == parent && min_level <= 0) {
> +        return true;
> +    }
> +
> +    if (!parent) {
> +        return false;
> +    }
> +
> +    QLIST_FOREACH(c, &parent->children, next) {
> +        if (bdrv_is_child_of(child, c->bs, min_level - 1)) {
> +            return true;
> +        }
> +    }
> +
> +    return false;
> +}
> +
>   /**
>    * Iterates through the list of runtime option keys that are said to
>    * be "strong" for a BDS.  An option is called "strong" if it changes
> diff --git a/block/mirror.c b/block/mirror.c
> index d877637e1e..f30a6933d8 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -701,7 +701,24 @@ static int mirror_exit_common(Job *job)
>            * there.
>            */
>           if (bdrv_recurse_can_replace(src, to_replace)) {
> -            bdrv_replace_node(to_replace, target_bs, &local_err);
> +            /*
> +             * It is OK for @to_replace to be an immediate child of
> +             * @target_bs, because that is what happens with
> +             * drive-mirror sync=none mode=absolute-paths: target_bs's
> +             * backing file will be the source node, which is also
> +             * to_replace (by default).
> +             * bdrv_replace_node() handles this case by not letting
> +             * target_bs->backing point to itself, but to the source
> +             * still.

but here you said [1] is good

> +             */
> +            if (!bdrv_is_child_of(to_replace, target_bs, 2)) {
> +                bdrv_replace_node(to_replace, target_bs, &local_err);
> +            } else {


So, we allow bdrv_replace_node automatically handle case whith immediate child of target is
replaces.. But if consider several additional filters above target (so target is a filter),
we not allow it. Why filtered case is worse?

> +                error_setg(&local_err, "Can no longer replace '%s' by '%s', "
> +                           "because the former is now a child of the latter, "
> +                           "and doing so would thus create a loop",
> +                           to_replace->node_name, target_bs->node_name);
> +            }
>           } else {
>               error_setg(&local_err, "Can no longer replace '%s' by '%s', "
>                          "because it can no longer be guaranteed that doing so "
> diff --git a/blockdev.c b/blockdev.c
> index 0420bc29be..27344247d5 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -3845,7 +3845,7 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
>       }
>   
>       if (has_replaces) {
> -        BlockDriverState *to_replace_bs;
> +        BlockDriverState *to_replace_bs, *target_backing_bs;
>           AioContext *replace_aio_context;
>           int64_t bs_size, replace_size;
>   
> @@ -3860,6 +3860,52 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
>               return;
>           }
>   
> +        if (bdrv_is_child_of(to_replace_bs, target, 1)) {
> +            error_setg(errp, "Replacing %s by %s would result in a loop, "
> +                       "because the former is a child of the latter",
> +                       to_replace_bs->node_name, target->node_name);
> +            return;
> +        }
> +
> +        if (backing_mode == MIRROR_SOURCE_BACKING_CHAIN ||
> +            backing_mode == MIRROR_OPEN_BACKING_CHAIN)
> +        {
> +            /*
> +             * While we do not quite know what OPEN_BACKING_CHAIN
> +             * (used for mode=existing) will yield, it is probably
> +             * best to restrict it exactly like SOURCE_BACKING_CHAIN,
> +             * because that is our best guess.
> +             */
> +            switch (sync) {
> +            case MIRROR_SYNC_MODE_FULL:
> +                target_backing_bs = NULL;
> +                break;
> +
> +            case MIRROR_SYNC_MODE_TOP:
> +                target_backing_bs = backing_bs(bs);
> +                break;
> +
> +            case MIRROR_SYNC_MODE_NONE:
> +                target_backing_bs = bs;
> +                break;
> +
> +            default:
> +                abort();
> +            }
> +        } else {
> +            assert(backing_mode == MIRROR_LEAVE_BACKING_CHAIN);
> +            target_backing_bs = backing_bs(target);
> +        }
> +
> +        if (bdrv_is_child_of(to_replace_bs, target_backing_bs, 0)) {
> +            error_setg(errp, "Replacing '%s' by '%s' with this sync mode would "
> +                       "result in a loop, because the former would be a child "
> +                       "of the latter's backing file ('%s') after the mirror "
> +                       "job", to_replace_bs->node_name, target->node_name,
> +                       target_backing_bs->node_name);
> +            return;
> +        }
> +
>           replace_aio_context = bdrv_get_aio_context(to_replace_bs);
>           aio_context_acquire(replace_aio_context);
>           replace_size = bdrv_getlength(to_replace_bs);
>
Max Reitz Oct. 2, 2019, 12:36 p.m. UTC | #2
On 26.09.19 15:08, Vladimir Sementsov-Ogievskiy wrote:
> 20.09.2019 18:27, Max Reitz wrote:
>> While bdrv_replace_node() will not follow through with it, a specific
>> @replaces asks the mirror job to create a loop.
>>
>> For example, say both the source and the target share a child where the
>> source is a filter; by letting @replaces point to the common child, you
>> ask for a loop.
> 
> 
> source[filter]
>    |
>    v
> child  <----- target
> 
> and we want target to be a child if itself. OK, it's a loop.
> 
>>
>> Or if you use @replaces in drive-mirror with sync=none and
>> mode=absolute-paths, you generally ask for a loop (@replaces must point
>> to a child of the source, and sync=none makes the source the backing
>> file of the target after the job).
>>
>> bdrv_replace_node() will not create those loops, but it by doing so, it
> 
> s/but it/but ?

Yep.

>> ignores the user-requested configuration, which is not ideally either.
>> (In the first example above, the target's child will remain what it was,
>> which may still be reasonable.  But in the second example, the target
>> will just not become a child of the source, which is precisely what was
>> requested with @replaces.)
> 
> so you say that second example is bad [1]
> 
>>
>> So prevent such configurations, both before the job, and before it
>> actually completes.
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>   include/block/block_int.h |  3 +++
>>   block.c                   | 30 ++++++++++++++++++++++++
>>   block/mirror.c            | 19 +++++++++++++++-
>>   blockdev.c                | 48 ++++++++++++++++++++++++++++++++++++++-
>>   4 files changed, 98 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/block/block_int.h b/include/block/block_int.h
>> index 70f26530c9..8a26a0d88a 100644
>> --- a/include/block/block_int.h
>> +++ b/include/block/block_int.h
>> @@ -1256,6 +1256,9 @@ void bdrv_format_default_perms(BlockDriverState *bs, BdrvChild *c,
>>   bool bdrv_recurse_can_replace(BlockDriverState *bs,
>>                                 BlockDriverState *to_replace);
>>   
>> +bool bdrv_is_child_of(BlockDriverState *child, BlockDriverState *parent,
>> +                      int min_level);
>> +
>>   /*
>>    * Default implementation for drivers to pass bdrv_co_block_status() to
>>    * their file.
>> diff --git a/block.c b/block.c
>> index 0d9b3de98f..332191fb47 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -6260,6 +6260,36 @@ out:
>>       return to_replace_bs;
>>   }
>>   
>> +/*
>> + * Return true iff @child is a (recursive) child of @parent, with at
>> + * least @min_level edges between them.
>> + *
>> + * (If @min_level == 0, return true if @child == @parent.  For
>> + * @min_level == 1, @child needs to be at least a real child; for
>> + * @min_level == 2, it needs to be at least a grand-child; and so on.)
>> + */
>> +bool bdrv_is_child_of(BlockDriverState *child, BlockDriverState *parent,
>> +                      int min_level)
>> +{
>> +    BdrvChild *c;
>> +
>> +    if (child == parent && min_level <= 0) {
>> +        return true;
>> +    }
>> +
>> +    if (!parent) {
>> +        return false;
>> +    }
>> +
>> +    QLIST_FOREACH(c, &parent->children, next) {
>> +        if (bdrv_is_child_of(child, c->bs, min_level - 1)) {
>> +            return true;
>> +        }
>> +    }
>> +
>> +    return false;
>> +}
>> +
>>   /**
>>    * Iterates through the list of runtime option keys that are said to
>>    * be "strong" for a BDS.  An option is called "strong" if it changes
>> diff --git a/block/mirror.c b/block/mirror.c
>> index d877637e1e..f30a6933d8 100644
>> --- a/block/mirror.c
>> +++ b/block/mirror.c
>> @@ -701,7 +701,24 @@ static int mirror_exit_common(Job *job)
>>            * there.
>>            */
>>           if (bdrv_recurse_can_replace(src, to_replace)) {
>> -            bdrv_replace_node(to_replace, target_bs, &local_err);
>> +            /*
>> +             * It is OK for @to_replace to be an immediate child of
>> +             * @target_bs, because that is what happens with
>> +             * drive-mirror sync=none mode=absolute-paths: target_bs's
>> +             * backing file will be the source node, which is also
>> +             * to_replace (by default).
>> +             * bdrv_replace_node() handles this case by not letting
>> +             * target_bs->backing point to itself, but to the source
>> +             * still.
> 
> but here you said [1] is good

Well.  Not quite.

In [1] I say that sync=none mode=absolute-paths replaces=some-node will
not work as intended, because the target will not replace some-node.
(bdrv_replace_node() prevents that.)

Here, it’s about when there is no @replaces option.  Then, the target
will absolutely replace the source, and target->backing will point to
the source.

Here’s what happens (as far as I understand):

Without @replaces:

target->backing is first set to the source.

Then, target replaces the source node.  bdrv_replace_node() will not
create loops, so it keeps the target->backing link as it is.

With replaces=some-node (must be a (recursive) child of the source):

target->backing is first set to the source.

Then target replaces some-node.  bdrv_replace_node() will not create
loops, so it keeps all links that would connect the source and the
target as they are.  Thus, the target will actually not really replace
some-node.

Example:

quorum --children.0--> to-replace

Now we mirror from quorum to some target (target-node) with sync=none,
mode=absolute-paths, and replaces=to-replace.

What we’re effectively asking for is this:

quorum --children.0--> target-node
  ^                         |
  +---------backing---------+

Which of course doesn’t work.  It makes sense if you break up the loop
by imagining that quorum simply won’t read from children.0, but in
reality it is a loop, so it can’t work.

What happens though is this:

  quorum --children.0--> to-replace
     ^
  backing
     |
target-node

The target node does not replace to-replace, because changing the
children.0 link to point to target-node would create a loop.  Thus, the
user doesn’t get what they want.

(Note that if to-replace has other parents, target-node would absolutely
replace it for them.)

>> +             */
>> +            if (!bdrv_is_child_of(to_replace, target_bs, 2)) {
>> +                bdrv_replace_node(to_replace, target_bs, &local_err);
>> +            } else {
> 
> 
> So, we allow bdrv_replace_node automatically handle case whith immediate child of target is
> replaces.. But if consider several additional filters above target (so target is a filter),
> we not allow it. Why filtered case is worse?

Because if it’s an immediate child, there are no other nodes involved.
Just one link stays as it is (e.g. target->backing simply still points
to the source node).

If it’s an indirect child, then there are other nodes involved and it’s
likely the user won’t get what they want (as in the example above, where
the replacement just won’t happen).


The pragmatic explanation is “We need to allow immediate children,
because that’s the no-@replaces sync=none mode=absolute-paths case.  We
need to forbid everything else because it’d probably lead to unexpected
results.”

Max
diff mbox series

Patch

diff --git a/include/block/block_int.h b/include/block/block_int.h
index 70f26530c9..8a26a0d88a 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -1256,6 +1256,9 @@  void bdrv_format_default_perms(BlockDriverState *bs, BdrvChild *c,
 bool bdrv_recurse_can_replace(BlockDriverState *bs,
                               BlockDriverState *to_replace);
 
+bool bdrv_is_child_of(BlockDriverState *child, BlockDriverState *parent,
+                      int min_level);
+
 /*
  * Default implementation for drivers to pass bdrv_co_block_status() to
  * their file.
diff --git a/block.c b/block.c
index 0d9b3de98f..332191fb47 100644
--- a/block.c
+++ b/block.c
@@ -6260,6 +6260,36 @@  out:
     return to_replace_bs;
 }
 
+/*
+ * Return true iff @child is a (recursive) child of @parent, with at
+ * least @min_level edges between them.
+ *
+ * (If @min_level == 0, return true if @child == @parent.  For
+ * @min_level == 1, @child needs to be at least a real child; for
+ * @min_level == 2, it needs to be at least a grand-child; and so on.)
+ */
+bool bdrv_is_child_of(BlockDriverState *child, BlockDriverState *parent,
+                      int min_level)
+{
+    BdrvChild *c;
+
+    if (child == parent && min_level <= 0) {
+        return true;
+    }
+
+    if (!parent) {
+        return false;
+    }
+
+    QLIST_FOREACH(c, &parent->children, next) {
+        if (bdrv_is_child_of(child, c->bs, min_level - 1)) {
+            return true;
+        }
+    }
+
+    return false;
+}
+
 /**
  * Iterates through the list of runtime option keys that are said to
  * be "strong" for a BDS.  An option is called "strong" if it changes
diff --git a/block/mirror.c b/block/mirror.c
index d877637e1e..f30a6933d8 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -701,7 +701,24 @@  static int mirror_exit_common(Job *job)
          * there.
          */
         if (bdrv_recurse_can_replace(src, to_replace)) {
-            bdrv_replace_node(to_replace, target_bs, &local_err);
+            /*
+             * It is OK for @to_replace to be an immediate child of
+             * @target_bs, because that is what happens with
+             * drive-mirror sync=none mode=absolute-paths: target_bs's
+             * backing file will be the source node, which is also
+             * to_replace (by default).
+             * bdrv_replace_node() handles this case by not letting
+             * target_bs->backing point to itself, but to the source
+             * still.
+             */
+            if (!bdrv_is_child_of(to_replace, target_bs, 2)) {
+                bdrv_replace_node(to_replace, target_bs, &local_err);
+            } else {
+                error_setg(&local_err, "Can no longer replace '%s' by '%s', "
+                           "because the former is now a child of the latter, "
+                           "and doing so would thus create a loop",
+                           to_replace->node_name, target_bs->node_name);
+            }
         } else {
             error_setg(&local_err, "Can no longer replace '%s' by '%s', "
                        "because it can no longer be guaranteed that doing so "
diff --git a/blockdev.c b/blockdev.c
index 0420bc29be..27344247d5 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -3845,7 +3845,7 @@  static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
     }
 
     if (has_replaces) {
-        BlockDriverState *to_replace_bs;
+        BlockDriverState *to_replace_bs, *target_backing_bs;
         AioContext *replace_aio_context;
         int64_t bs_size, replace_size;
 
@@ -3860,6 +3860,52 @@  static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
             return;
         }
 
+        if (bdrv_is_child_of(to_replace_bs, target, 1)) {
+            error_setg(errp, "Replacing %s by %s would result in a loop, "
+                       "because the former is a child of the latter",
+                       to_replace_bs->node_name, target->node_name);
+            return;
+        }
+
+        if (backing_mode == MIRROR_SOURCE_BACKING_CHAIN ||
+            backing_mode == MIRROR_OPEN_BACKING_CHAIN)
+        {
+            /*
+             * While we do not quite know what OPEN_BACKING_CHAIN
+             * (used for mode=existing) will yield, it is probably
+             * best to restrict it exactly like SOURCE_BACKING_CHAIN,
+             * because that is our best guess.
+             */
+            switch (sync) {
+            case MIRROR_SYNC_MODE_FULL:
+                target_backing_bs = NULL;
+                break;
+
+            case MIRROR_SYNC_MODE_TOP:
+                target_backing_bs = backing_bs(bs);
+                break;
+
+            case MIRROR_SYNC_MODE_NONE:
+                target_backing_bs = bs;
+                break;
+
+            default:
+                abort();
+            }
+        } else {
+            assert(backing_mode == MIRROR_LEAVE_BACKING_CHAIN);
+            target_backing_bs = backing_bs(target);
+        }
+
+        if (bdrv_is_child_of(to_replace_bs, target_backing_bs, 0)) {
+            error_setg(errp, "Replacing '%s' by '%s' with this sync mode would "
+                       "result in a loop, because the former would be a child "
+                       "of the latter's backing file ('%s') after the mirror "
+                       "job", to_replace_bs->node_name, target->node_name,
+                       target_backing_bs->node_name);
+            return;
+        }
+
         replace_aio_context = bdrv_get_aio_context(to_replace_bs);
         aio_context_acquire(replace_aio_context);
         replace_size = bdrv_getlength(to_replace_bs);