diff mbox series

[v2,2/3] qemu-img: rebase: Reduce reads on in-chain rebase

Message ID 20190502135828.42797-3-shmuel.eiderman@oracle.com
State New
Headers show
Series qemu-img: rebase: Improve/optimize rebase operation | expand

Commit Message

Sam Eiderman May 2, 2019, 1:58 p.m. UTC
In the following case:

(base) A <- B <- C (tip)

when running:

    qemu-img rebase -b A C

QEMU would read all sectors not allocated in the file being rebased (C)
and compare them to the new base image (A), regardless of whether they
were changed or even allocated anywhere along the chain between the new
base and the top image (B). This causes many unneeded reads when
rebasing an image which represents a small diff of a large disk, as it
would read most of the disk's sectors.

Instead, use bdrv_is_allocated_above() to reduce the number of
unnecessary reads.

Reviewed-by: Karl Heubaum <karl.heubaum@oracle.com>
Signed-off-by: Sam Eiderman <shmuel.eiderman@oracle.com>
Signed-off-by: Eyal Moscovici <eyal.moscovici@oracle.com>
---
 qemu-img.c | 25 ++++++++++++++++++++++++-
 1 file changed, 24 insertions(+), 1 deletion(-)

Comments

Max Reitz May 23, 2019, 2:01 p.m. UTC | #1
On 02.05.19 15:58, Sam Eiderman wrote:
> In the following case:
> 
> (base) A <- B <- C (tip)
> 
> when running:
> 
>     qemu-img rebase -b A C
> 
> QEMU would read all sectors not allocated in the file being rebased (C)
> and compare them to the new base image (A), regardless of whether they
> were changed or even allocated anywhere along the chain between the new
> base and the top image (B). This causes many unneeded reads when
> rebasing an image which represents a small diff of a large disk, as it
> would read most of the disk's sectors.
> 
> Instead, use bdrv_is_allocated_above() to reduce the number of
> unnecessary reads.
> 
> Reviewed-by: Karl Heubaum <karl.heubaum@oracle.com>
> Signed-off-by: Sam Eiderman <shmuel.eiderman@oracle.com>
> Signed-off-by: Eyal Moscovici <eyal.moscovici@oracle.com>
> ---
>  qemu-img.c | 25 ++++++++++++++++++++++++-
>  1 file changed, 24 insertions(+), 1 deletion(-)
> 
> diff --git a/qemu-img.c b/qemu-img.c
> index d9b609b3f0..7f20858cb9 100644
> --- a/qemu-img.c
> +++ b/qemu-img.c

[...]

> @@ -3422,6 +3428,23 @@ static int img_rebase(int argc, char **argv)
>                  continue;
>              }
>  
> +            if (prefix_chain_bs) {
> +                /*
> +                 * If cluster wasn't changed since prefix_chain, we don't need
> +                 * to take action
> +                 */
> +                ret = bdrv_is_allocated_above(bs, prefix_chain_bs,
> +                                              offset, n, &n);

This will always return true because it definitely is allocated in @bs,
or we wouldn’t be here.  (We just checked that with
bdrv_is_allocated().)  I think @top should be backing_bs(bs).

Max

> +                if (ret < 0) {
> +                    error_report("error while reading image metadata: %s",
> +                                 strerror(-ret));
> +                    goto out;
> +                }
> +                if (!ret) {
> +                    continue;
> +                }
> +            }
> +
>              /*
>               * Read old and new backing file and take into consideration that
>               * backing files may be smaller than the COW image.
>
Sam Eiderman May 23, 2019, 2:09 p.m. UTC | #2
> On 23 May 2019, at 17:01, Max Reitz <mreitz@redhat.com> wrote:
> 
> On 02.05.19 15:58, Sam Eiderman wrote:
>> In the following case:
>> 
>> (base) A <- B <- C (tip)
>> 
>> when running:
>> 
>>    qemu-img rebase -b A C
>> 
>> QEMU would read all sectors not allocated in the file being rebased (C)
>> and compare them to the new base image (A), regardless of whether they
>> were changed or even allocated anywhere along the chain between the new
>> base and the top image (B). This causes many unneeded reads when
>> rebasing an image which represents a small diff of a large disk, as it
>> would read most of the disk's sectors.
>> 
>> Instead, use bdrv_is_allocated_above() to reduce the number of
>> unnecessary reads.
>> 
>> Reviewed-by: Karl Heubaum <karl.heubaum@oracle.com>
>> Signed-off-by: Sam Eiderman <shmuel.eiderman@oracle.com>
>> Signed-off-by: Eyal Moscovici <eyal.moscovici@oracle.com>
>> ---
>> qemu-img.c | 25 ++++++++++++++++++++++++-
>> 1 file changed, 24 insertions(+), 1 deletion(-)
>> 
>> diff --git a/qemu-img.c b/qemu-img.c
>> index d9b609b3f0..7f20858cb9 100644
>> --- a/qemu-img.c
>> +++ b/qemu-img.c
> 
> [...]
> 
>> @@ -3422,6 +3428,23 @@ static int img_rebase(int argc, char **argv)
>>                 continue;
>>             }
>> 
>> +            if (prefix_chain_bs) {
>> +                /*
>> +                 * If cluster wasn't changed since prefix_chain, we don't need
>> +                 * to take action
>> +                 */
>> +                ret = bdrv_is_allocated_above(bs, prefix_chain_bs,
>> +                                              offset, n, &n);
> 
> This will always return true because it definitely is allocated in @bs,
> or we wouldn’t be here.  (We just checked that with
> bdrv_is_allocated().)  I think @top should be backing_bs(bs).
> 
> Max

I don’t think that’s true:

Examine the case where we have the following chain:

A <- B <- C

When we rebase C directly over A: qemu-img rebase -b A C

We must check for every offset (sector): bdrv_is_allocated_above(C, A, offset, n, &n);

If a sector from C is allocated above A - it may have been changed - so we need to do a read from A and a read from C and compare.
If the sector is not allocated above, it was not changed - we don’t need to read from A or C.

Sam


> 
>> +                if (ret < 0) {
>> +                    error_report("error while reading image metadata: %s",
>> +                                 strerror(-ret));
>> +                    goto out;
>> +                }
>> +                if (!ret) {
>> +                    continue;
>> +                }
>> +            }
>> +
>>             /*
>>              * Read old and new backing file and take into consideration that
>>              * backing files may be smaller than the COW image.
Max Reitz May 23, 2019, 2:26 p.m. UTC | #3
On 23.05.19 16:09, Sam Eiderman wrote:
> 
> 
>> On 23 May 2019, at 17:01, Max Reitz <mreitz@redhat.com
>> <mailto:mreitz@redhat.com>> wrote:
>>
>> On 02.05.19 15:58, Sam Eiderman wrote:
>>> In the following case:
>>>
>>> (base) A <- B <- C (tip)
>>>
>>> when running:
>>>
>>>    qemu-img rebase -b A C
>>>
>>> QEMU would read all sectors not allocated in the file being rebased (C)
>>> and compare them to the new base image (A), regardless of whether they
>>> were changed or even allocated anywhere along the chain between the new
>>> base and the top image (B). This causes many unneeded reads when
>>> rebasing an image which represents a small diff of a large disk, as it
>>> would read most of the disk's sectors.
>>>
>>> Instead, use bdrv_is_allocated_above() to reduce the number of
>>> unnecessary reads.
>>>
>>> Reviewed-by: Karl Heubaum <karl.heubaum@oracle.com
>>> <mailto:karl.heubaum@oracle.com>>
>>> Signed-off-by: Sam Eiderman <shmuel.eiderman@oracle.com
>>> <mailto:shmuel.eiderman@oracle.com>>
>>> Signed-off-by: Eyal Moscovici <eyal.moscovici@oracle.com
>>> <mailto:eyal.moscovici@oracle.com>>
>>> ---
>>> qemu-img.c | 25 ++++++++++++++++++++++++-
>>> 1 file changed, 24 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/qemu-img.c b/qemu-img.c
>>> index d9b609b3f0..7f20858cb9 100644
>>> --- a/qemu-img.c
>>> +++ b/qemu-img.c
>>
>> [...]
>>
>>> @@ -3422,6 +3428,23 @@ static int img_rebase(int argc, char **argv)
>>>                 continue;
>>>             }
>>>
>>> +            if (prefix_chain_bs) {
>>> +                /*
>>> +                 * If cluster wasn't changed since prefix_chain, we
>>> don't need
>>> +                 * to take action
>>> +                 */
>>> +                ret = bdrv_is_allocated_above(bs, prefix_chain_bs,
>>> +                                              offset, n, &n);
>>
>> This will always return true because it definitely is allocated in @bs,
>> or we wouldn’t be here.  (We just checked that with
>> bdrv_is_allocated().)  I think @top should be backing_bs(bs).
>>
>> Max
> 
> I don’t think that’s true:
> 
> Examine the case where we have the following chain:
> 
> A <- B <- C
> 
> When we rebase C directly over A: qemu-img rebase -b A C
> 
> We must check for every offset (sector): bdrv_is_allocated_above(C, A,
> offset, n, &n);
> 
> If a sector from C is allocated above A - it may have been changed - so
> we need to do a read from A and a read from C and compare.
> If the sector is not allocated above, it was not changed - we don’t need
> to read from A or C.

First: Oops, somehow I inverted the bdrv_is_allocated() check in my
head.  (For context: I mean this part above this hunk here:

    /* If the cluster is allocated, we don't need to take action */
    ret = bdrv_is_allocated(bs, offset, n, &n);
    if (ret < 0) {
        error_report("error while reading image metadata: %s",
                     strerror(-ret));
        goto out;
    }



    if (ret) {
        continue;
    }

)  So at this point, the range definitely is *not* allocated in @bs.

But second: That still means that we do not have to check @bs itself,
because we already did.  We know the range isn’t allocated there, so we
can start at its backing file.

On a more abstract level: No, we do not need to read all sectors from A
and C and compare them if they are allocated anywhere above A.  If they
are allocated in C, we’re good, because all we’d do is write them back
to C (which is a no-op).  That’s exactly what the existing
bdrv_is_allocated() check is for.

So we only need to know whether the sectors are allocated above the base
(A) and below the top (C), so in your example, whether they are
allocated in B.  If they are, we need to compare and potentially copy,
if they are not, we can skip them.

So my claim that bdrv_is_allocated_above() would always return true is
wrong, but it still should use backing_bs(bs) for the top because we
have checked @bs already.

Max
Sam Eiderman May 23, 2019, 3:37 p.m. UTC | #4
> On 23 May 2019, at 17:26, Max Reitz <mreitz@redhat.com> wrote:
> 
> On 23.05.19 16:09, Sam Eiderman wrote:
>> 
>> 
>>> On 23 May 2019, at 17:01, Max Reitz <mreitz@redhat.com
>>> <mailto:mreitz@redhat.com>> wrote:
>>> 
>>> On 02.05.19 15:58, Sam Eiderman wrote:
>>>> In the following case:
>>>> 
>>>> (base) A <- B <- C (tip)
>>>> 
>>>> when running:
>>>> 
>>>>    qemu-img rebase -b A C
>>>> 
>>>> QEMU would read all sectors not allocated in the file being rebased (C)
>>>> and compare them to the new base image (A), regardless of whether they
>>>> were changed or even allocated anywhere along the chain between the new
>>>> base and the top image (B). This causes many unneeded reads when
>>>> rebasing an image which represents a small diff of a large disk, as it
>>>> would read most of the disk's sectors.
>>>> 
>>>> Instead, use bdrv_is_allocated_above() to reduce the number of
>>>> unnecessary reads.
>>>> 
>>>> Reviewed-by: Karl Heubaum <karl.heubaum@oracle.com
>>>> <mailto:karl.heubaum@oracle.com>>
>>>> Signed-off-by: Sam Eiderman <shmuel.eiderman@oracle.com
>>>> <mailto:shmuel.eiderman@oracle.com>>
>>>> Signed-off-by: Eyal Moscovici <eyal.moscovici@oracle.com
>>>> <mailto:eyal.moscovici@oracle.com>>
>>>> ---
>>>> qemu-img.c | 25 ++++++++++++++++++++++++-
>>>> 1 file changed, 24 insertions(+), 1 deletion(-)
>>>> 
>>>> diff --git a/qemu-img.c b/qemu-img.c
>>>> index d9b609b3f0..7f20858cb9 100644
>>>> --- a/qemu-img.c
>>>> +++ b/qemu-img.c
>>> 
>>> [...]
>>> 
>>>> @@ -3422,6 +3428,23 @@ static int img_rebase(int argc, char **argv)
>>>>                 continue;
>>>>             }
>>>> 
>>>> +            if (prefix_chain_bs) {
>>>> +                /*
>>>> +                 * If cluster wasn't changed since prefix_chain, we
>>>> don't need
>>>> +                 * to take action
>>>> +                 */
>>>> +                ret = bdrv_is_allocated_above(bs, prefix_chain_bs,
>>>> +                                              offset, n, &n);
>>> 
>>> This will always return true because it definitely is allocated in @bs,
>>> or we wouldn’t be here.  (We just checked that with
>>> bdrv_is_allocated().)  I think @top should be backing_bs(bs).
>>> 
>>> Max
>> 
>> I don’t think that’s true:
>> 
>> Examine the case where we have the following chain:
>> 
>> A <- B <- C
>> 
>> When we rebase C directly over A: qemu-img rebase -b A C
>> 
>> We must check for every offset (sector): bdrv_is_allocated_above(C, A,
>> offset, n, &n);
>> 
>> If a sector from C is allocated above A - it may have been changed - so
>> we need to do a read from A and a read from C and compare.
>> If the sector is not allocated above, it was not changed - we don’t need
>> to read from A or C.
> 
> First: Oops, somehow I inverted the bdrv_is_allocated() check in my
> head.  (For context: I mean this part above this hunk here:
> 
>    /* If the cluster is allocated, we don't need to take action */
>    ret = bdrv_is_allocated(bs, offset, n, &n);
>    if (ret < 0) {
>        error_report("error while reading image metadata: %s",
>                     strerror(-ret));
>        goto out;
>    }
> 
> 
> 
>    if (ret) {
>        continue;
>    }
> 
> )  So at this point, the range definitely is *not* allocated in @bs.
> 
> But second: That still means that we do not have to check @bs itself,
> because we already did.  We know the range isn’t allocated there, so we
> can start at its backing file.
> 
> On a more abstract level: No, we do not need to read all sectors from A
> and C and compare them if they are allocated anywhere above A.  If they
> are allocated in C, we’re good, because all we’d do is write them back
> to C (which is a no-op).  That’s exactly what the existing
> bdrv_is_allocated() check is for.
> 
> So we only need to know whether the sectors are allocated above the base
> (A) and below the top (C), so in your example, whether they are
> allocated in B.  If they are, we need to compare and potentially copy,
> if they are not, we can skip them.
> 
> So my claim that bdrv_is_allocated_above() would always return true is
> wrong, but it still should use backing_bs(bs) for the top because we
> have checked @bs already.

I see your point, basically we save a single iteration in the loop at bdrv_is_allocated_above.
I’ll submit a v4 patch series.

Sam

> 
> Max
>
diff mbox series

Patch

diff --git a/qemu-img.c b/qemu-img.c
index d9b609b3f0..7f20858cb9 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -3152,7 +3152,7 @@  static int img_rebase(int argc, char **argv)
     BlockBackend *blk = NULL, *blk_old_backing = NULL, *blk_new_backing = NULL;
     uint8_t *buf_old = NULL;
     uint8_t *buf_new = NULL;
-    BlockDriverState *bs = NULL;
+    BlockDriverState *bs = NULL, *prefix_chain_bs = NULL;
     char *filename;
     const char *fmt, *cache, *src_cache, *out_basefmt, *out_baseimg;
     int c, flags, src_flags, ret;
@@ -3343,6 +3343,12 @@  static int img_rebase(int argc, char **argv)
                 goto out;
             }
 
+            /*
+             * Find out whether we rebase an image on top of a previous image
+             * in its chain.
+             */
+            prefix_chain_bs = bdrv_find_backing_image(bs, out_real_path);
+
             blk_new_backing = blk_new_open(out_real_path, NULL,
                                            options, src_flags, &local_err);
             g_free(out_real_path);
@@ -3422,6 +3428,23 @@  static int img_rebase(int argc, char **argv)
                 continue;
             }
 
+            if (prefix_chain_bs) {
+                /*
+                 * If cluster wasn't changed since prefix_chain, we don't need
+                 * to take action
+                 */
+                ret = bdrv_is_allocated_above(bs, prefix_chain_bs,
+                                              offset, n, &n);
+                if (ret < 0) {
+                    error_report("error while reading image metadata: %s",
+                                 strerror(-ret));
+                    goto out;
+                }
+                if (!ret) {
+                    continue;
+                }
+            }
+
             /*
              * Read old and new backing file and take into consideration that
              * backing files may be smaller than the COW image.