diff mbox

qemu-img: use blk_co_pwrite_zeroes for zero sectors when compressed

Message ID 1492677526-4739-1-git-send-email-lidongchen@tencent.com
State New
Headers show

Commit Message

858585 jemmy April 20, 2017, 8:38 a.m. UTC
From: Lidong Chen <lidongchen@tencent.com>

when the buffer is zero, blk_co_pwrite_zeroes is more effectively than
blk_co_pwritev with BDRV_REQ_WRITE_COMPRESSED. this patch can reduces
the time when converts the qcow2 image with lots of zero.

Signed-off-by: Lidong Chen <lidongchen@tencent.com>
---
 qemu-img.c | 19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

Comments

858585 jemmy April 20, 2017, 8:52 a.m. UTC | #1
On Thu, Apr 20, 2017 at 4:38 PM,  <jemmy858585@gmail.com> wrote:
> From: Lidong Chen <lidongchen@tencent.com>
>
> when the buffer is zero, blk_co_pwrite_zeroes is more effectively than
> blk_co_pwritev with BDRV_REQ_WRITE_COMPRESSED. this patch can reduces
> the time when converts the qcow2 image with lots of zero.
>

the original qcow2 file which have lots of cluster unallocated:
[root]# qemu-img info /mnt/img2016111016860868_old.qcow2
image: /mnt/img2016111016860868_old.qcow2
file format: qcow2
virtual size: 20G (21474836480 bytes)
disk size: 214M (224460800 bytes)
cluster_size: 65536
backing file: /baseimage/img2015122818606660/img2015122818606660.qcow2

the time used for qemu-img convert:
[root ~]# time /root/kvm/bin/qemu-img convert -c -p -o
backing_file=/baseimage/img2015122818606660/img2015122818606660.qcow2
-O
qcow2 /mnt/img2016111016860868.qcow2 /mnt/img2016111016860868_old.qcow2
    (100.00/100%)
real    0m29.456s
user    0m29.345s
sys     0m0.481s

run dd if=/dev/zero of=./test bs=65536 in guest os
then convert again.

before apply this patch:
[root~]# time /root/kvm/bin/qemu-img convert -c -p -o
backing_file=/baseimage/img2015122818606660/img2015122818606660.qcow2
-O qcow2 /mnt/img2016111016860868.qcow2 /mnt/img2016111016860868_new.qcow2
    (100.00/100%)

real    5m35.617s
user    5m33.417s
sys     0m10.699s

after apply this patch:
[root~]# time /root/kvm/bin/qemu-img convert -c -p -o
backing_file=/baseimage/img2015122818606660/img2015122818606660.qcow2
-O
qcow2 /mnt/img2016111016860868.qcow2 /mnt/img2016111016860868_new1.qcow2
    (100.00/100%)

real    0m51.189s
user    0m35.239s
sys     0m14.251s

the time reduce from 5m35.617s to 0m51.189s.

[root ]# ll /mnt/img2016111016860868* -h
-rw-r--r-- 1 root root 254M Apr 20 14:50 /mnt/img2016111016860868_new.qcow2
-rw-r--r-- 1 root root 232M Apr 20 15:27 /mnt/img2016111016860868_new1.qcow2

the size reduce from 254M to 232M.

> Signed-off-by: Lidong Chen <lidongchen@tencent.com>
> ---
>  qemu-img.c | 19 +++++++++++++------
>  1 file changed, 13 insertions(+), 6 deletions(-)
>
> diff --git a/qemu-img.c b/qemu-img.c
> index b220cf7..0256539 100644
> --- a/qemu-img.c
> +++ b/qemu-img.c
> @@ -1675,13 +1675,20 @@ static int coroutine_fn convert_co_write(ImgConvertState *s, int64_t sector_num,
>               * write if the buffer is completely zeroed and we're allowed to
>               * keep the target sparse. */
>              if (s->compressed) {
> -                if (s->has_zero_init && s->min_sparse &&
> -                    buffer_is_zero(buf, n * BDRV_SECTOR_SIZE))
> -                {
> -                    assert(!s->target_has_backing);
> -                    break;
> +                if (buffer_is_zero(buf, n * BDRV_SECTOR_SIZE)) {
> +                    if (s->has_zero_init && s->min_sparse) {
> +                        assert(!s->target_has_backing);
> +                        break;
> +                    } else {
> +                        ret = blk_co_pwrite_zeroes(s->target,
> +                                           sector_num << BDRV_SECTOR_BITS,
> +                                           n << BDRV_SECTOR_BITS, 0);
> +                        if (ret < 0) {
> +                            return ret;
> +                        }
> +                        break;
> +                    }
>                  }
> -
>                  iov.iov_base = buf;
>                  iov.iov_len = n << BDRV_SECTOR_BITS;
>                  qemu_iovec_init_external(&qiov, &iov, 1);
> --
> 1.8.3.1
>
Kevin Wolf April 20, 2017, 10 a.m. UTC | #2
Am 20.04.2017 um 10:38 hat jemmy858585@gmail.com geschrieben:
> From: Lidong Chen <lidongchen@tencent.com>
> 
> when the buffer is zero, blk_co_pwrite_zeroes is more effectively than
> blk_co_pwritev with BDRV_REQ_WRITE_COMPRESSED. this patch can reduces
> the time when converts the qcow2 image with lots of zero.
> 
> Signed-off-by: Lidong Chen <lidongchen@tencent.com>

Good catch, using blk_co_pwrite_zeroes() makes sense even for compressed
images.

> diff --git a/qemu-img.c b/qemu-img.c
> index b220cf7..0256539 100644
> --- a/qemu-img.c
> +++ b/qemu-img.c
> @@ -1675,13 +1675,20 @@ static int coroutine_fn convert_co_write(ImgConvertState *s, int64_t sector_num,
>               * write if the buffer is completely zeroed and we're allowed to
>               * keep the target sparse. */
>              if (s->compressed) {
> -                if (s->has_zero_init && s->min_sparse &&
> -                    buffer_is_zero(buf, n * BDRV_SECTOR_SIZE))
> -                {
> -                    assert(!s->target_has_backing);
> -                    break;
> +                if (buffer_is_zero(buf, n * BDRV_SECTOR_SIZE)) {
> +                    if (s->has_zero_init && s->min_sparse) {
> +                        assert(!s->target_has_backing);
> +                        break;
> +                    } else {
> +                        ret = blk_co_pwrite_zeroes(s->target,
> +                                           sector_num << BDRV_SECTOR_BITS,
> +                                           n << BDRV_SECTOR_BITS, 0);
> +                        if (ret < 0) {
> +                            return ret;
> +                        }
> +                        break;
> +                    }
>                  }

If s->min_sparse == 0, we may neither skip the write not use
blk_co_pwrite_zeroes(), because this requires actual full allocation
with explicit zero sectors.

Of course, if you fix this, what you end up with here is a duplicate of
the code path for non-compressed images. The remaining difference seems
to be the BDRV_REQ_WRITE_COMPRESSED flag and buffer_is_zero() vs.
is_allocated_sectors_min() (because uncompressed clusters can be written
partially, but compressed clusters can't).

So I suppose that instead of just fixing the above bug, we could actually
mostly unify the two code paths, if you want to have a try at it.

Kevin
858585 jemmy April 21, 2017, 2:58 a.m. UTC | #3
On Thu, Apr 20, 2017 at 6:00 PM, Kevin Wolf <kwolf@redhat.com> wrote:
> Am 20.04.2017 um 10:38 hat jemmy858585@gmail.com geschrieben:
>> From: Lidong Chen <lidongchen@tencent.com>
>>
>> when the buffer is zero, blk_co_pwrite_zeroes is more effectively than
>> blk_co_pwritev with BDRV_REQ_WRITE_COMPRESSED. this patch can reduces
>> the time when converts the qcow2 image with lots of zero.
>>
>> Signed-off-by: Lidong Chen <lidongchen@tencent.com>
>
> Good catch, using blk_co_pwrite_zeroes() makes sense even for compressed
> images.
>
>> diff --git a/qemu-img.c b/qemu-img.c
>> index b220cf7..0256539 100644
>> --- a/qemu-img.c
>> +++ b/qemu-img.c
>> @@ -1675,13 +1675,20 @@ static int coroutine_fn convert_co_write(ImgConvertState *s, int64_t sector_num,
>>               * write if the buffer is completely zeroed and we're allowed to
>>               * keep the target sparse. */
>>              if (s->compressed) {
>> -                if (s->has_zero_init && s->min_sparse &&
>> -                    buffer_is_zero(buf, n * BDRV_SECTOR_SIZE))
>> -                {
>> -                    assert(!s->target_has_backing);
>> -                    break;
>> +                if (buffer_is_zero(buf, n * BDRV_SECTOR_SIZE)) {
>> +                    if (s->has_zero_init && s->min_sparse) {
>> +                        assert(!s->target_has_backing);
>> +                        break;
>> +                    } else {
>> +                        ret = blk_co_pwrite_zeroes(s->target,
>> +                                           sector_num << BDRV_SECTOR_BITS,
>> +                                           n << BDRV_SECTOR_BITS, 0);
>> +                        if (ret < 0) {
>> +                            return ret;
>> +                        }
>> +                        break;
>> +                    }
>>                  }
>
> If s->min_sparse == 0, we may neither skip the write not use
> blk_co_pwrite_zeroes(), because this requires actual full allocation
> with explicit zero sectors.
>
> Of course, if you fix this, what you end up with here is a duplicate of
> the code path for non-compressed images. The remaining difference seems
> to be the BDRV_REQ_WRITE_COMPRESSED flag and buffer_is_zero() vs.
> is_allocated_sectors_min() (because uncompressed clusters can be written
> partially, but compressed clusters can't).

I have a try to unify the code.

I don't understand why use  is_allocated_sectors_min when don't compressed.
the s->min_sparse is 8 default, which is smaller than cluster_sectors.

if a cluster which data is  8 sector zero and 8 sector non-zero
repeated, it will call
blk_co_pwritev and blk_co_pwrite_zeroes many times for a cluster.

why not compare the zero by cluster_sectors size?

>
> So I suppose that instead of just fixing the above bug, we could actually
> mostly unify the two code paths, if you want to have a try at it.
>
> Kevin
858585 jemmy April 21, 2017, 5:37 a.m. UTC | #4
On Fri, Apr 21, 2017 at 10:58 AM, 858585 jemmy <jemmy858585@gmail.com> wrote:
> On Thu, Apr 20, 2017 at 6:00 PM, Kevin Wolf <kwolf@redhat.com> wrote:
>> Am 20.04.2017 um 10:38 hat jemmy858585@gmail.com geschrieben:
>>> From: Lidong Chen <lidongchen@tencent.com>
>>>
>>> when the buffer is zero, blk_co_pwrite_zeroes is more effectively than
>>> blk_co_pwritev with BDRV_REQ_WRITE_COMPRESSED. this patch can reduces
>>> the time when converts the qcow2 image with lots of zero.
>>>
>>> Signed-off-by: Lidong Chen <lidongchen@tencent.com>
>>
>> Good catch, using blk_co_pwrite_zeroes() makes sense even for compressed
>> images.
>>
>>> diff --git a/qemu-img.c b/qemu-img.c
>>> index b220cf7..0256539 100644
>>> --- a/qemu-img.c
>>> +++ b/qemu-img.c
>>> @@ -1675,13 +1675,20 @@ static int coroutine_fn convert_co_write(ImgConvertState *s, int64_t sector_num,
>>>               * write if the buffer is completely zeroed and we're allowed to
>>>               * keep the target sparse. */
>>>              if (s->compressed) {
>>> -                if (s->has_zero_init && s->min_sparse &&
>>> -                    buffer_is_zero(buf, n * BDRV_SECTOR_SIZE))
>>> -                {
>>> -                    assert(!s->target_has_backing);
>>> -                    break;
>>> +                if (buffer_is_zero(buf, n * BDRV_SECTOR_SIZE)) {
>>> +                    if (s->has_zero_init && s->min_sparse) {
>>> +                        assert(!s->target_has_backing);
>>> +                        break;
>>> +                    } else {
>>> +                        ret = blk_co_pwrite_zeroes(s->target,
>>> +                                           sector_num << BDRV_SECTOR_BITS,
>>> +                                           n << BDRV_SECTOR_BITS, 0);
>>> +                        if (ret < 0) {
>>> +                            return ret;
>>> +                        }
>>> +                        break;
>>> +                    }
>>>                  }
>>
>> If s->min_sparse == 0, we may neither skip the write not use
>> blk_co_pwrite_zeroes(), because this requires actual full allocation
>> with explicit zero sectors.
>>
>> Of course, if you fix this, what you end up with here is a duplicate of
>> the code path for non-compressed images. The remaining difference seems
>> to be the BDRV_REQ_WRITE_COMPRESSED flag and buffer_is_zero() vs.
>> is_allocated_sectors_min() (because uncompressed clusters can be written
>> partially, but compressed clusters can't).
>
> I have a try to unify the code.
>
> I don't understand why use  is_allocated_sectors_min when don't compressed.
> the s->min_sparse is 8 default, which is smaller than cluster_sectors.
>
> if a cluster which data is  8 sector zero and 8 sector non-zero
> repeated, it will call
> blk_co_pwritev and blk_co_pwrite_zeroes many times for a cluster.
>
> why not compare the zero by cluster_sectors size?

I write this code, run in guest os.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main()
{
    char *zero;
    char *nonzero;
    FILE* fp = fopen("./test.dat", "ab");

    zero = malloc(sizeof(char)*512*8);
    nonzero = malloc(sizeof(char)*512*8);

    memset(zero, 0, sizeof(char)*512*8);
    memset(nonzero, 1, sizeof(char)*512*8);

    while (1) {
        fwrite(zero, sizeof(char)*512*8, 1, fp);
        fwrite(nonzero, sizeof(char)*512*8, 1, fp);
    }
    fclose(fp);
}

qemu-img info /mnt/img2016111016860868.qcow2
image: /mnt/img2016111016860868.qcow2
file format: qcow2
virtual size: 20G (21474836480 bytes)
disk size: 19G (20061552640 bytes)
cluster_size: 65536
backing file: /baseimage/img2016042213665396/img2016042213665396.qcow2

use -S 65536 option.

time /root/kvm/bin/qemu-img convert -p -B
/baseimage/img2016042213665396/img2016042213665396.qcow2 -O qcow2
/mnt/img2016111016860868.qcow2 /mnt/img2017041611162809_zip_new.qcow2
-S 65536
    (100.00/100%)

real    0m32.203s
user    0m5.165s
sys     0m27.887s

time /root/kvm/bin/qemu-img convert -p -B
/baseimage/img2016042213665396/img2016042213665396.qcow2 -O qcow2
/mnt/img2016111016860868.qcow2 /mnt/img2017041611162809_zip_new.qcow2
    (100.00/100%)

real    1m38.665s
user    0m45.418s
sys     1m7.518s

should we set cluster_sectors as the default value of s->min_sparse?

>
>>
>> So I suppose that instead of just fixing the above bug, we could actually
>> mostly unify the two code paths, if you want to have a try at it.
>>
>> Kevin
858585 jemmy April 23, 2017, 12:53 p.m. UTC | #5
On Fri, Apr 21, 2017 at 1:37 PM, 858585 jemmy <jemmy858585@gmail.com> wrote:
> On Fri, Apr 21, 2017 at 10:58 AM, 858585 jemmy <jemmy858585@gmail.com> wrote:
>> On Thu, Apr 20, 2017 at 6:00 PM, Kevin Wolf <kwolf@redhat.com> wrote:
>>> Am 20.04.2017 um 10:38 hat jemmy858585@gmail.com geschrieben:
>>>> From: Lidong Chen <lidongchen@tencent.com>
>>>>
>>>> when the buffer is zero, blk_co_pwrite_zeroes is more effectively than
>>>> blk_co_pwritev with BDRV_REQ_WRITE_COMPRESSED. this patch can reduces
>>>> the time when converts the qcow2 image with lots of zero.
>>>>
>>>> Signed-off-by: Lidong Chen <lidongchen@tencent.com>
>>>
>>> Good catch, using blk_co_pwrite_zeroes() makes sense even for compressed
>>> images.
>>>
>>>> diff --git a/qemu-img.c b/qemu-img.c
>>>> index b220cf7..0256539 100644
>>>> --- a/qemu-img.c
>>>> +++ b/qemu-img.c
>>>> @@ -1675,13 +1675,20 @@ static int coroutine_fn convert_co_write(ImgConvertState *s, int64_t sector_num,
>>>>               * write if the buffer is completely zeroed and we're allowed to
>>>>               * keep the target sparse. */
>>>>              if (s->compressed) {
>>>> -                if (s->has_zero_init && s->min_sparse &&
>>>> -                    buffer_is_zero(buf, n * BDRV_SECTOR_SIZE))
>>>> -                {
>>>> -                    assert(!s->target_has_backing);
>>>> -                    break;
>>>> +                if (buffer_is_zero(buf, n * BDRV_SECTOR_SIZE)) {
>>>> +                    if (s->has_zero_init && s->min_sparse) {
>>>> +                        assert(!s->target_has_backing);
>>>> +                        break;
>>>> +                    } else {
>>>> +                        ret = blk_co_pwrite_zeroes(s->target,
>>>> +                                           sector_num << BDRV_SECTOR_BITS,
>>>> +                                           n << BDRV_SECTOR_BITS, 0);
>>>> +                        if (ret < 0) {
>>>> +                            return ret;
>>>> +                        }
>>>> +                        break;
>>>> +                    }
>>>>                  }
>>>
>>> If s->min_sparse == 0, we may neither skip the write not use
>>> blk_co_pwrite_zeroes(), because this requires actual full allocation
>>> with explicit zero sectors.
>>>
>>> Of course, if you fix this, what you end up with here is a duplicate of
>>> the code path for non-compressed images. The remaining difference seems
>>> to be the BDRV_REQ_WRITE_COMPRESSED flag and buffer_is_zero() vs.
>>> is_allocated_sectors_min() (because uncompressed clusters can be written
>>> partially, but compressed clusters can't).
>>
>> I have a try to unify the code.
>>
>> I don't understand why use  is_allocated_sectors_min when don't compressed.
>> the s->min_sparse is 8 default, which is smaller than cluster_sectors.
>>
>> if a cluster which data is  8 sector zero and 8 sector non-zero
>> repeated, it will call
>> blk_co_pwritev and blk_co_pwrite_zeroes many times for a cluster.
>>
>> why not compare the zero by cluster_sectors size?
>
> I write this code, run in guest os.
>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
>
> int main()
> {
>     char *zero;
>     char *nonzero;
>     FILE* fp = fopen("./test.dat", "ab");
>
>     zero = malloc(sizeof(char)*512*8);
>     nonzero = malloc(sizeof(char)*512*8);
>
>     memset(zero, 0, sizeof(char)*512*8);
>     memset(nonzero, 1, sizeof(char)*512*8);
>
>     while (1) {
>         fwrite(zero, sizeof(char)*512*8, 1, fp);
>         fwrite(nonzero, sizeof(char)*512*8, 1, fp);
>     }
>     fclose(fp);
> }
>
> qemu-img info /mnt/img2016111016860868.qcow2
> image: /mnt/img2016111016860868.qcow2
> file format: qcow2
> virtual size: 20G (21474836480 bytes)
> disk size: 19G (20061552640 bytes)
> cluster_size: 65536
> backing file: /baseimage/img2016042213665396/img2016042213665396.qcow2
>
> use -S 65536 option.
>
> time /root/kvm/bin/qemu-img convert -p -B
> /baseimage/img2016042213665396/img2016042213665396.qcow2 -O qcow2
> /mnt/img2016111016860868.qcow2 /mnt/img2017041611162809_zip_new.qcow2
> -S 65536
>     (100.00/100%)
>
> real    0m32.203s
> user    0m5.165s
> sys     0m27.887s
>
> time /root/kvm/bin/qemu-img convert -p -B
> /baseimage/img2016042213665396/img2016042213665396.qcow2 -O qcow2
> /mnt/img2016111016860868.qcow2 /mnt/img2017041611162809_zip_new.qcow2
>     (100.00/100%)
>
> real    1m38.665s
> user    0m45.418s
> sys     1m7.518s
>
> should we set cluster_sectors as the default value of s->min_sparse?

change the default value of s->min_sparse will break the API.
qemu-img --help describe that the default value is 4k.

  '-S' indicates the consecutive number of bytes (defaults to 4k) that must
       contain only zeros for qemu-img to create a sparse image during
       conversion. If the number of bytes is 0, the source will not be
scanned for
       unallocated or zero sectors, and the destination image will always be
       fully allocated

>
>>
>>>
>>> So I suppose that instead of just fixing the above bug, we could actually
>>> mostly unify the two code paths, if you want to have a try at it.
>>>
>>> Kevin
diff mbox

Patch

diff --git a/qemu-img.c b/qemu-img.c
index b220cf7..0256539 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1675,13 +1675,20 @@  static int coroutine_fn convert_co_write(ImgConvertState *s, int64_t sector_num,
              * write if the buffer is completely zeroed and we're allowed to
              * keep the target sparse. */
             if (s->compressed) {
-                if (s->has_zero_init && s->min_sparse &&
-                    buffer_is_zero(buf, n * BDRV_SECTOR_SIZE))
-                {
-                    assert(!s->target_has_backing);
-                    break;
+                if (buffer_is_zero(buf, n * BDRV_SECTOR_SIZE)) {
+                    if (s->has_zero_init && s->min_sparse) {
+                        assert(!s->target_has_backing);
+                        break;
+                    } else {
+                        ret = blk_co_pwrite_zeroes(s->target,
+                                           sector_num << BDRV_SECTOR_BITS,
+                                           n << BDRV_SECTOR_BITS, 0);
+                        if (ret < 0) {
+                            return ret;
+                        }
+                        break;
+                    }
                 }
-
                 iov.iov_base = buf;
                 iov.iov_len = n << BDRV_SECTOR_BITS;
                 qemu_iovec_init_external(&qiov, &iov, 1);