diff mbox series

[RFC,2/2] qemu-img convert: Fix sparseness detection

Message ID 20210415152214.279844-3-kwolf@redhat.com
State New
Headers show
Series qemu-img convert: Fix sparseness detection | expand

Commit Message

Kevin Wolf April 15, 2021, 3:22 p.m. UTC
In order to avoid RMW cycles, is_allocated_sectors() treats zeroed areas
like non-zero data if the end of the checked area isn't aligned. This
can improve the efficiency of the conversion and was introduced in
commit 8dcd3c9b91a.

However, it comes with a correctness problem: qemu-img convert is
supposed to sparsify areas that contain only zeros, which it doesn't do
any more. It turns out that this even happens when not only the
unaligned area is zeroed, but also the blocks before and after it. In
the bug report, conversion of a fragmented 10G image containing only
zeros resulted in an image consuming 2.82 GiB even though the expected
size is only 4 KiB.

As a tradeoff between both, let's ignore zeroed sectors only after
non-zero data to fix the alignment, but if we're only looking at zeros,
keep them as such, even if it may mean additional RMW cycles.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1882917
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 qemu-img.c                 | 18 ++++--------------
 tests/qemu-iotests/122.out | 12 ++++--------
 2 files changed, 8 insertions(+), 22 deletions(-)

Comments

Vladimir Sementsov-Ogievskiy April 20, 2021, 2:31 p.m. UTC | #1
15.04.2021 18:22, Kevin Wolf wrote:
> In order to avoid RMW cycles, is_allocated_sectors() treats zeroed areas
> like non-zero data if the end of the checked area isn't aligned. This
> can improve the efficiency of the conversion and was introduced in
> commit 8dcd3c9b91a.
> 
> However, it comes with a correctness problem: qemu-img convert is
> supposed to sparsify areas that contain only zeros, which it doesn't do
> any more. It turns out that this even happens when not only the
> unaligned area is zeroed, but also the blocks before and after it. In
> the bug report, conversion of a fragmented 10G image containing only
> zeros resulted in an image consuming 2.82 GiB even though the expected
> size is only 4 KiB.
> 
> As a tradeoff between both, let's ignore zeroed sectors only after
> non-zero data to fix the alignment, but if we're only looking at zeros,
> keep them as such, even if it may mean additional RMW cycles.
> 

Hmm.. If I understand correctly, we are going to do unaligned write-zero. And that helps. Doesn't that mean that alignment is wrongly detected?

> Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1882917
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>   qemu-img.c                 | 18 ++++--------------
>   tests/qemu-iotests/122.out | 12 ++++--------
>   2 files changed, 8 insertions(+), 22 deletions(-)
> 
> diff --git a/qemu-img.c b/qemu-img.c
> index a5993682aa..ca4eba2dd1 100644
> --- a/qemu-img.c
> +++ b/qemu-img.c
> @@ -1168,20 +1168,10 @@ static int is_allocated_sectors(const uint8_t *buf, int n, int *pnum,
>       }
>   
>       tail = (sector_num + i) & (alignment - 1);
> -    if (tail) {
> -        if (is_zero && i <= tail) {
> -            /* treat unallocated areas which only consist
> -             * of a small tail as allocated. */
> -            is_zero = false;
> -        }
> -        if (!is_zero) {
> -            /* align up end offset of allocated areas. */
> -            i += alignment - tail;
> -            i = MIN(i, n);
> -        } else {
> -            /* align down end offset of zero areas. */
> -            i -= tail;
> -        }
> +    if (tail && !is_zero) {
> +        /* align up end offset of allocated areas. */
> +        i += alignment - tail;
> +        i = MIN(i, n);
>       }
>       *pnum = i;
>       return !is_zero;
> diff --git a/tests/qemu-iotests/122.out b/tests/qemu-iotests/122.out
> index dcc44a2304..fe0ea34164 100644
> --- a/tests/qemu-iotests/122.out
> +++ b/tests/qemu-iotests/122.out
> @@ -199,11 +199,9 @@ convert -S 4k
>   [{ "start": 0, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": OFFSET},
>   { "start": 4096, "length": 4096, "depth": 0, "zero": true, "data": false},
>   { "start": 8192, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": OFFSET},
> -{ "start": 12288, "length": 4096, "depth": 0, "zero": true, "data": false},
> -{ "start": 16384, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": OFFSET},
> -{ "start": 20480, "length": 46080, "depth": 0, "zero": true, "data": false},
> -{ "start": 66560, "length": 1024, "depth": 0, "zero": false, "data": true, "offset": OFFSET},
> -{ "start": 67584, "length": 67041280, "depth": 0, "zero": true, "data": false}]
> +{ "start": 12288, "length": 5120, "depth": 0, "zero": true, "data": false},
> +{ "start": 17408, "length": 3072, "depth": 0, "zero": false, "data": true, "offset": OFFSET},
> +{ "start": 20480, "length": 67088384, "depth": 0, "zero": true, "data": false}]
>   
>   convert -c -S 4k
>   [{ "start": 0, "length": 1024, "depth": 0, "zero": false, "data": true},
> @@ -215,9 +213,7 @@ convert -c -S 4k
>   
>   convert -S 8k
>   [{ "start": 0, "length": 24576, "depth": 0, "zero": false, "data": true, "offset": OFFSET},
> -{ "start": 24576, "length": 41984, "depth": 0, "zero": true, "data": false},
> -{ "start": 66560, "length": 1024, "depth": 0, "zero": false, "data": true, "offset": OFFSET},
> -{ "start": 67584, "length": 67041280, "depth": 0, "zero": true, "data": false}]
> +{ "start": 24576, "length": 67084288, "depth": 0, "zero": true, "data": false}]
>   
>   convert -c -S 8k
>   [{ "start": 0, "length": 1024, "depth": 0, "zero": false, "data": true},
>
Kevin Wolf April 20, 2021, 3:04 p.m. UTC | #2
Am 20.04.2021 um 16:31 hat Vladimir Sementsov-Ogievskiy geschrieben:
> 15.04.2021 18:22, Kevin Wolf wrote:
> > In order to avoid RMW cycles, is_allocated_sectors() treats zeroed areas
> > like non-zero data if the end of the checked area isn't aligned. This
> > can improve the efficiency of the conversion and was introduced in
> > commit 8dcd3c9b91a.
> > 
> > However, it comes with a correctness problem: qemu-img convert is
> > supposed to sparsify areas that contain only zeros, which it doesn't do
> > any more. It turns out that this even happens when not only the
> > unaligned area is zeroed, but also the blocks before and after it. In
> > the bug report, conversion of a fragmented 10G image containing only
> > zeros resulted in an image consuming 2.82 GiB even though the expected
> > size is only 4 KiB.
> > 
> > As a tradeoff between both, let's ignore zeroed sectors only after
> > non-zero data to fix the alignment, but if we're only looking at zeros,
> > keep them as such, even if it may mean additional RMW cycles.
> > 
> 
> Hmm.. If I understand correctly, we are going to do unaligned
> write-zero. And that helps.

This can happen (mostly raw images on block devices, I think?), but
usually it just means skipping the write because we know that the target
image is already zeroed.

What it does mean is that if the next part is data, we'll have an
unaligned data write.

> Doesn't that mean that alignment is wrongly detected?

The problem is that you can have bdrv_block_status_above() return the
same allocation status multiple times in a row, but *pnum can be
unaligned for the conversion.

We only look at a single range returned by it when detecting the
alignment, so it could be that we have zero buffers for both 0-11 and
12-16 and detect two misaligned ranges, when both together are a
perfectly aligned zeroed range.

In theory we could try to do some lookahead and merge ranges where
possible, which should give us the perfect result, but it would make the
code considerably more complicated. (Whether we want to merge them
doesn't only depend on the block status, but possibly also on the
content of a DATA range.)

Kevin
Vladimir Sementsov-Ogievskiy April 20, 2021, 4:52 p.m. UTC | #3
20.04.2021 18:04, Kevin Wolf wrote:
> Am 20.04.2021 um 16:31 hat Vladimir Sementsov-Ogievskiy geschrieben:
>> 15.04.2021 18:22, Kevin Wolf wrote:
>>> In order to avoid RMW cycles, is_allocated_sectors() treats zeroed areas
>>> like non-zero data if the end of the checked area isn't aligned. This
>>> can improve the efficiency of the conversion and was introduced in
>>> commit 8dcd3c9b91a.
>>>
>>> However, it comes with a correctness problem: qemu-img convert is
>>> supposed to sparsify areas that contain only zeros, which it doesn't do
>>> any more. It turns out that this even happens when not only the
>>> unaligned area is zeroed, but also the blocks before and after it. In
>>> the bug report, conversion of a fragmented 10G image containing only
>>> zeros resulted in an image consuming 2.82 GiB even though the expected
>>> size is only 4 KiB.
>>>
>>> As a tradeoff between both, let's ignore zeroed sectors only after
>>> non-zero data to fix the alignment, but if we're only looking at zeros,
>>> keep them as such, even if it may mean additional RMW cycles.
>>>
>>
>> Hmm.. If I understand correctly, we are going to do unaligned
>> write-zero. And that helps.
> 
> This can happen (mostly raw images on block devices, I think?), but
> usually it just means skipping the write because we know that the target
> image is already zeroed.
> 
> What it does mean is that if the next part is data, we'll have an
> unaligned data write.
> 
>> Doesn't that mean that alignment is wrongly detected?
> 
> The problem is that you can have bdrv_block_status_above() return the
> same allocation status multiple times in a row, but *pnum can be
> unaligned for the conversion.
> 
> We only look at a single range returned by it when detecting the
> alignment, so it could be that we have zero buffers for both 0-11 and
> 12-16 and detect two misaligned ranges, when both together are a
> perfectly aligned zeroed range.
> 
> In theory we could try to do some lookahead and merge ranges where
> possible, which should give us the perfect result, but it would make the
> code considerably more complicated. (Whether we want to merge them
> doesn't only depend on the block status, but possibly also on the
> content of a DATA range.)
> 
> Kevin
> 

Oh, I understand now the problem, thanks for explanation.

Hmm, yes that means, that if the whole buf is zero, is_allocated_sectors must not align it down, to be possibly "merged" with next chunk if it is zero too.

But it's still good to align zeroes down, if data starts somewhere inside the buf, isn't it?

what about something like this:

diff --git a/qemu-img.c b/qemu-img.c
index babb5573ab..d1704584a0 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1167,19 +1167,39 @@ static int is_allocated_sectors(const uint8_t *buf, int n, int *pnum,
          }
      }
  
+    if (i == n) {
+        /*
+         * The whole buf is the same.
+         *
+         * if it's data, just return it. It's the old behavior.
+         *
+         * if it's zero, just return too. It will work good if target is alredy
+         * zeroed. And if next chunk is zero too we'll have no RMW and no reason
+         * to write data.
+         */
+        *pnum = i;
+        return !is_zero;
+    }
+
      tail = (sector_num + i) & (alignment - 1);
      if (tail) {
          if (is_zero && i <= tail) {
-            /* treat unallocated areas which only consist
-             * of a small tail as allocated. */
+            /*
+             * For sure next sector after i is data, and it will rewrite this
+             * tail anyway due to RMW. So, let's just write data now.
+             */
              is_zero = false;
          }
          if (!is_zero) {
-            /* align up end offset of allocated areas. */
+            /* If possible, align up end offset of allocated areas. */
              i += alignment - tail;
              i = MIN(i, n);
          } else {
-            /* align down end offset of zero areas. */
+            /*
+             * For sure next sector after i is data, and it will rewrite this
+             * tail anyway due to RMW. Better is avoid RMW and write zeroes up
+             * to aligned bound.
+             */
              i -= tail;
          }
      }
Peter Lieven May 19, 2021, 1:24 p.m. UTC | #4
Am 20.04.21 um 18:52 schrieb Vladimir Sementsov-Ogievskiy:
> 20.04.2021 18:04, Kevin Wolf wrote:
>> Am 20.04.2021 um 16:31 hat Vladimir Sementsov-Ogievskiy geschrieben:
>>> 15.04.2021 18:22, Kevin Wolf wrote:
>>>> In order to avoid RMW cycles, is_allocated_sectors() treats zeroed areas
>>>> like non-zero data if the end of the checked area isn't aligned. This
>>>> can improve the efficiency of the conversion and was introduced in
>>>> commit 8dcd3c9b91a.
>>>>
>>>> However, it comes with a correctness problem: qemu-img convert is
>>>> supposed to sparsify areas that contain only zeros, which it doesn't do
>>>> any more. It turns out that this even happens when not only the
>>>> unaligned area is zeroed, but also the blocks before and after it. In
>>>> the bug report, conversion of a fragmented 10G image containing only
>>>> zeros resulted in an image consuming 2.82 GiB even though the expected
>>>> size is only 4 KiB.
>>>>
>>>> As a tradeoff between both, let's ignore zeroed sectors only after
>>>> non-zero data to fix the alignment, but if we're only looking at zeros,
>>>> keep them as such, even if it may mean additional RMW cycles.
>>>>
>>>
>>> Hmm.. If I understand correctly, we are going to do unaligned
>>> write-zero. And that helps.
>>
>> This can happen (mostly raw images on block devices, I think?), but
>> usually it just means skipping the write because we know that the target
>> image is already zeroed.
>>
>> What it does mean is that if the next part is data, we'll have an
>> unaligned data write.
>>
>>> Doesn't that mean that alignment is wrongly detected?
>>
>> The problem is that you can have bdrv_block_status_above() return the
>> same allocation status multiple times in a row, but *pnum can be
>> unaligned for the conversion.
>>
>> We only look at a single range returned by it when detecting the
>> alignment, so it could be that we have zero buffers for both 0-11 and
>> 12-16 and detect two misaligned ranges, when both together are a
>> perfectly aligned zeroed range.
>>
>> In theory we could try to do some lookahead and merge ranges where
>> possible, which should give us the perfect result, but it would make the
>> code considerably more complicated. (Whether we want to merge them
>> doesn't only depend on the block status, but possibly also on the
>> content of a DATA range.)
>>
>> Kevin
>>
>
> Oh, I understand now the problem, thanks for explanation.
>
> Hmm, yes that means, that if the whole buf is zero, is_allocated_sectors must not align it down, to be possibly "merged" with next chunk if it is zero too.
>
> But it's still good to align zeroes down, if data starts somewhere inside the buf, isn't it?
>
> what about something like this:
>
> diff --git a/qemu-img.c b/qemu-img.c
> index babb5573ab..d1704584a0 100644
> --- a/qemu-img.c
> +++ b/qemu-img.c
> @@ -1167,19 +1167,39 @@ static int is_allocated_sectors(const uint8_t *buf, int n, int *pnum,
>          }
>      }
>  
> +    if (i == n) {
> +        /*
> +         * The whole buf is the same.
> +         *
> +         * if it's data, just return it. It's the old behavior.
> +         *
> +         * if it's zero, just return too. It will work good if target is alredy
> +         * zeroed. And if next chunk is zero too we'll have no RMW and no reason
> +         * to write data.
> +         */
> +        *pnum = i;
> +        return !is_zero;
> +    }
> +
>      tail = (sector_num + i) & (alignment - 1);
>      if (tail) {
>          if (is_zero && i <= tail) {
> -            /* treat unallocated areas which only consist
> -             * of a small tail as allocated. */
> +            /*
> +             * For sure next sector after i is data, and it will rewrite this
> +             * tail anyway due to RMW. So, let's just write data now.
> +             */
>              is_zero = false;
>          }
>          if (!is_zero) {
> -            /* align up end offset of allocated areas. */
> +            /* If possible, align up end offset of allocated areas. */
>              i += alignment - tail;
>              i = MIN(i, n);
>          } else {
> -            /* align down end offset of zero areas. */
> +            /*
> +             * For sure next sector after i is data, and it will rewrite this
> +             * tail anyway due to RMW. Better is avoid RMW and write zeroes up
> +             * to aligned bound.
> +             */
>              i -= tail;
>          }
>      }
>
>

I think we forgot to follow up on this. Has anyone tested this suggestion?

Otherwise, I would try to rerun the tests I did with the my old and Kevins suggestion.


Peter
Kevin Wolf May 19, 2021, 4:48 p.m. UTC | #5
Am 19.05.2021 um 15:24 hat Peter Lieven geschrieben:
> Am 20.04.21 um 18:52 schrieb Vladimir Sementsov-Ogievskiy:
> > 20.04.2021 18:04, Kevin Wolf wrote:
> >> Am 20.04.2021 um 16:31 hat Vladimir Sementsov-Ogievskiy geschrieben:
> >>> 15.04.2021 18:22, Kevin Wolf wrote:
> >>>> In order to avoid RMW cycles, is_allocated_sectors() treats zeroed areas
> >>>> like non-zero data if the end of the checked area isn't aligned. This
> >>>> can improve the efficiency of the conversion and was introduced in
> >>>> commit 8dcd3c9b91a.
> >>>>
> >>>> However, it comes with a correctness problem: qemu-img convert is
> >>>> supposed to sparsify areas that contain only zeros, which it doesn't do
> >>>> any more. It turns out that this even happens when not only the
> >>>> unaligned area is zeroed, but also the blocks before and after it. In
> >>>> the bug report, conversion of a fragmented 10G image containing only
> >>>> zeros resulted in an image consuming 2.82 GiB even though the expected
> >>>> size is only 4 KiB.
> >>>>
> >>>> As a tradeoff between both, let's ignore zeroed sectors only after
> >>>> non-zero data to fix the alignment, but if we're only looking at zeros,
> >>>> keep them as such, even if it may mean additional RMW cycles.
> >>>>
> >>>
> >>> Hmm.. If I understand correctly, we are going to do unaligned
> >>> write-zero. And that helps.
> >>
> >> This can happen (mostly raw images on block devices, I think?), but
> >> usually it just means skipping the write because we know that the target
> >> image is already zeroed.
> >>
> >> What it does mean is that if the next part is data, we'll have an
> >> unaligned data write.
> >>
> >>> Doesn't that mean that alignment is wrongly detected?
> >>
> >> The problem is that you can have bdrv_block_status_above() return the
> >> same allocation status multiple times in a row, but *pnum can be
> >> unaligned for the conversion.
> >>
> >> We only look at a single range returned by it when detecting the
> >> alignment, so it could be that we have zero buffers for both 0-11 and
> >> 12-16 and detect two misaligned ranges, when both together are a
> >> perfectly aligned zeroed range.
> >>
> >> In theory we could try to do some lookahead and merge ranges where
> >> possible, which should give us the perfect result, but it would make the
> >> code considerably more complicated. (Whether we want to merge them
> >> doesn't only depend on the block status, but possibly also on the
> >> content of a DATA range.)
> >>
> >> Kevin
> >>
> >
> > Oh, I understand now the problem, thanks for explanation.
> >
> > Hmm, yes that means, that if the whole buf is zero, is_allocated_sectors must not align it down, to be possibly "merged" with next chunk if it is zero too.
> >
> > But it's still good to align zeroes down, if data starts somewhere inside the buf, isn't it?
> >
> > what about something like this:
> >
> > diff --git a/qemu-img.c b/qemu-img.c
> > index babb5573ab..d1704584a0 100644
> > --- a/qemu-img.c
> > +++ b/qemu-img.c
> > @@ -1167,19 +1167,39 @@ static int is_allocated_sectors(const uint8_t *buf, int n, int *pnum,
> >          }
> >      }
> >  
> > +    if (i == n) {
> > +        /*
> > +         * The whole buf is the same.
> > +         *
> > +         * if it's data, just return it. It's the old behavior.
> > +         *
> > +         * if it's zero, just return too. It will work good if target is alredy
> > +         * zeroed. And if next chunk is zero too we'll have no RMW and no reason
> > +         * to write data.
> > +         */
> > +        *pnum = i;
> > +        return !is_zero;
> > +    }
> > +
> >      tail = (sector_num + i) & (alignment - 1);
> >      if (tail) {
> >          if (is_zero && i <= tail) {
> > -            /* treat unallocated areas which only consist
> > -             * of a small tail as allocated. */
> > +            /*
> > +             * For sure next sector after i is data, and it will rewrite this
> > +             * tail anyway due to RMW. So, let's just write data now.
> > +             */
> >              is_zero = false;
> >          }
> >          if (!is_zero) {
> > -            /* align up end offset of allocated areas. */
> > +            /* If possible, align up end offset of allocated areas. */
> >              i += alignment - tail;
> >              i = MIN(i, n);
> >          } else {
> > -            /* align down end offset of zero areas. */
> > +            /*
> > +             * For sure next sector after i is data, and it will rewrite this
> > +             * tail anyway due to RMW. Better is avoid RMW and write zeroes up
> > +             * to aligned bound.
> > +             */
> >              i -= tail;
> >          }
> >      }
> 
> I think we forgot to follow up on this. Has anyone tested this
> suggestion?
> 
> Otherwise, I would try to rerun the tests I did with the my old and
> Kevins suggestion.

I noticed earlier this week that these patches are still in my
development branch, but didn't actually pick it up again yet. So feel
free to try it out.

Kevin
Peter Lieven Dec. 3, 2021, 11:17 a.m. UTC | #6
Am 19.05.21 um 18:48 schrieb Kevin Wolf:
> Am 19.05.2021 um 15:24 hat Peter Lieven geschrieben:
>> Am 20.04.21 um 18:52 schrieb Vladimir Sementsov-Ogievskiy:
>>> 20.04.2021 18:04, Kevin Wolf wrote:
>>>> Am 20.04.2021 um 16:31 hat Vladimir Sementsov-Ogievskiy geschrieben:
>>>>> 15.04.2021 18:22, Kevin Wolf wrote:
>>>>>> In order to avoid RMW cycles, is_allocated_sectors() treats zeroed areas
>>>>>> like non-zero data if the end of the checked area isn't aligned. This
>>>>>> can improve the efficiency of the conversion and was introduced in
>>>>>> commit 8dcd3c9b91a.
>>>>>>
>>>>>> However, it comes with a correctness problem: qemu-img convert is
>>>>>> supposed to sparsify areas that contain only zeros, which it doesn't do
>>>>>> any more. It turns out that this even happens when not only the
>>>>>> unaligned area is zeroed, but also the blocks before and after it. In
>>>>>> the bug report, conversion of a fragmented 10G image containing only
>>>>>> zeros resulted in an image consuming 2.82 GiB even though the expected
>>>>>> size is only 4 KiB.
>>>>>>
>>>>>> As a tradeoff between both, let's ignore zeroed sectors only after
>>>>>> non-zero data to fix the alignment, but if we're only looking at zeros,
>>>>>> keep them as such, even if it may mean additional RMW cycles.
>>>>>>
>>>>> Hmm.. If I understand correctly, we are going to do unaligned
>>>>> write-zero. And that helps.
>>>> This can happen (mostly raw images on block devices, I think?), but
>>>> usually it just means skipping the write because we know that the target
>>>> image is already zeroed.
>>>>
>>>> What it does mean is that if the next part is data, we'll have an
>>>> unaligned data write.
>>>>
>>>>> Doesn't that mean that alignment is wrongly detected?
>>>> The problem is that you can have bdrv_block_status_above() return the
>>>> same allocation status multiple times in a row, but *pnum can be
>>>> unaligned for the conversion.
>>>>
>>>> We only look at a single range returned by it when detecting the
>>>> alignment, so it could be that we have zero buffers for both 0-11 and
>>>> 12-16 and detect two misaligned ranges, when both together are a
>>>> perfectly aligned zeroed range.
>>>>
>>>> In theory we could try to do some lookahead and merge ranges where
>>>> possible, which should give us the perfect result, but it would make the
>>>> code considerably more complicated. (Whether we want to merge them
>>>> doesn't only depend on the block status, but possibly also on the
>>>> content of a DATA range.)
>>>>
>>>> Kevin
>>>>
>>> Oh, I understand now the problem, thanks for explanation.
>>>
>>> Hmm, yes that means, that if the whole buf is zero, is_allocated_sectors must not align it down, to be possibly "merged" with next chunk if it is zero too.
>>>
>>> But it's still good to align zeroes down, if data starts somewhere inside the buf, isn't it?
>>>
>>> what about something like this:
>>>
>>> diff --git a/qemu-img.c b/qemu-img.c
>>> index babb5573ab..d1704584a0 100644
>>> --- a/qemu-img.c
>>> +++ b/qemu-img.c
>>> @@ -1167,19 +1167,39 @@ static int is_allocated_sectors(const uint8_t *buf, int n, int *pnum,
>>>          }
>>>      }
>>>  
>>> +    if (i == n) {
>>> +        /*
>>> +         * The whole buf is the same.
>>> +         *
>>> +         * if it's data, just return it. It's the old behavior.
>>> +         *
>>> +         * if it's zero, just return too. It will work good if target is alredy
>>> +         * zeroed. And if next chunk is zero too we'll have no RMW and no reason
>>> +         * to write data.
>>> +         */
>>> +        *pnum = i;
>>> +        return !is_zero;
>>> +    }
>>> +
>>>      tail = (sector_num + i) & (alignment - 1);
>>>      if (tail) {
>>>          if (is_zero && i <= tail) {
>>> -            /* treat unallocated areas which only consist
>>> -             * of a small tail as allocated. */
>>> +            /*
>>> +             * For sure next sector after i is data, and it will rewrite this
>>> +             * tail anyway due to RMW. So, let's just write data now.
>>> +             */
>>>              is_zero = false;
>>>          }
>>>          if (!is_zero) {
>>> -            /* align up end offset of allocated areas. */
>>> +            /* If possible, align up end offset of allocated areas. */
>>>              i += alignment - tail;
>>>              i = MIN(i, n);
>>>          } else {
>>> -            /* align down end offset of zero areas. */
>>> +            /*
>>> +             * For sure next sector after i is data, and it will rewrite this
>>> +             * tail anyway due to RMW. Better is avoid RMW and write zeroes up
>>> +             * to aligned bound.
>>> +             */
>>>              i -= tail;
>>>          }
>>>      }
>> I think we forgot to follow up on this. Has anyone tested this
>> suggestion?
>>
>> Otherwise, I would try to rerun the tests I did with the my old and
>> Kevins suggestion.
> I noticed earlier this week that these patches are still in my
> development branch, but didn't actually pick it up again yet. So feel
> free to try it out.


It seems this time I forgot to follow up. Is this topic still open?


Best,

Peter
Vladimir Sementsov-Ogievskiy Dec. 3, 2021, 11:04 p.m. UTC | #7
03.12.2021 14:17, Peter Lieven wrote:
> Am 19.05.21 um 18:48 schrieb Kevin Wolf:
>> Am 19.05.2021 um 15:24 hat Peter Lieven geschrieben:
>>> Am 20.04.21 um 18:52 schrieb Vladimir Sementsov-Ogievskiy:
>>>> 20.04.2021 18:04, Kevin Wolf wrote:
>>>>> Am 20.04.2021 um 16:31 hat Vladimir Sementsov-Ogievskiy geschrieben:
>>>>>> 15.04.2021 18:22, Kevin Wolf wrote:
>>>>>>> In order to avoid RMW cycles, is_allocated_sectors() treats zeroed areas
>>>>>>> like non-zero data if the end of the checked area isn't aligned. This
>>>>>>> can improve the efficiency of the conversion and was introduced in
>>>>>>> commit 8dcd3c9b91a.
>>>>>>>
>>>>>>> However, it comes with a correctness problem: qemu-img convert is
>>>>>>> supposed to sparsify areas that contain only zeros, which it doesn't do
>>>>>>> any more. It turns out that this even happens when not only the
>>>>>>> unaligned area is zeroed, but also the blocks before and after it. In
>>>>>>> the bug report, conversion of a fragmented 10G image containing only
>>>>>>> zeros resulted in an image consuming 2.82 GiB even though the expected
>>>>>>> size is only 4 KiB.
>>>>>>>
>>>>>>> As a tradeoff between both, let's ignore zeroed sectors only after
>>>>>>> non-zero data to fix the alignment, but if we're only looking at zeros,
>>>>>>> keep them as such, even if it may mean additional RMW cycles.
>>>>>>>
>>>>>> Hmm.. If I understand correctly, we are going to do unaligned
>>>>>> write-zero. And that helps.
>>>>> This can happen (mostly raw images on block devices, I think?), but
>>>>> usually it just means skipping the write because we know that the target
>>>>> image is already zeroed.
>>>>>
>>>>> What it does mean is that if the next part is data, we'll have an
>>>>> unaligned data write.
>>>>>
>>>>>> Doesn't that mean that alignment is wrongly detected?
>>>>> The problem is that you can have bdrv_block_status_above() return the
>>>>> same allocation status multiple times in a row, but *pnum can be
>>>>> unaligned for the conversion.
>>>>>
>>>>> We only look at a single range returned by it when detecting the
>>>>> alignment, so it could be that we have zero buffers for both 0-11 and
>>>>> 12-16 and detect two misaligned ranges, when both together are a
>>>>> perfectly aligned zeroed range.
>>>>>
>>>>> In theory we could try to do some lookahead and merge ranges where
>>>>> possible, which should give us the perfect result, but it would make the
>>>>> code considerably more complicated. (Whether we want to merge them
>>>>> doesn't only depend on the block status, but possibly also on the
>>>>> content of a DATA range.)
>>>>>
>>>>> Kevin
>>>>>
>>>> Oh, I understand now the problem, thanks for explanation.
>>>>
>>>> Hmm, yes that means, that if the whole buf is zero, is_allocated_sectors must not align it down, to be possibly "merged" with next chunk if it is zero too.
>>>>
>>>> But it's still good to align zeroes down, if data starts somewhere inside the buf, isn't it?
>>>>
>>>> what about something like this:
>>>>
>>>> diff --git a/qemu-img.c b/qemu-img.c
>>>> index babb5573ab..d1704584a0 100644
>>>> --- a/qemu-img.c
>>>> +++ b/qemu-img.c
>>>> @@ -1167,19 +1167,39 @@ static int is_allocated_sectors(const uint8_t *buf, int n, int *pnum,
>>>>           }
>>>>       }
>>>>   
>>>> +    if (i == n) {
>>>> +        /*
>>>> +         * The whole buf is the same.
>>>> +         *
>>>> +         * if it's data, just return it. It's the old behavior.
>>>> +         *
>>>> +         * if it's zero, just return too. It will work good if target is alredy
>>>> +         * zeroed. And if next chunk is zero too we'll have no RMW and no reason
>>>> +         * to write data.
>>>> +         */
>>>> +        *pnum = i;
>>>> +        return !is_zero;
>>>> +    }
>>>> +
>>>>       tail = (sector_num + i) & (alignment - 1);
>>>>       if (tail) {
>>>>           if (is_zero && i <= tail) {
>>>> -            /* treat unallocated areas which only consist
>>>> -             * of a small tail as allocated. */
>>>> +            /*
>>>> +             * For sure next sector after i is data, and it will rewrite this
>>>> +             * tail anyway due to RMW. So, let's just write data now.
>>>> +             */
>>>>               is_zero = false;
>>>>           }
>>>>           if (!is_zero) {
>>>> -            /* align up end offset of allocated areas. */
>>>> +            /* If possible, align up end offset of allocated areas. */
>>>>               i += alignment - tail;
>>>>               i = MIN(i, n);
>>>>           } else {
>>>> -            /* align down end offset of zero areas. */
>>>> +            /*
>>>> +             * For sure next sector after i is data, and it will rewrite this
>>>> +             * tail anyway due to RMW. Better is avoid RMW and write zeroes up
>>>> +             * to aligned bound.
>>>> +             */
>>>>               i -= tail;
>>>>           }
>>>>       }
>>> I think we forgot to follow up on this. Has anyone tested this
>>> suggestion?
>>>
>>> Otherwise, I would try to rerun the tests I did with the my old and
>>> Kevins suggestion.
>> I noticed earlier this week that these patches are still in my
>> development branch, but didn't actually pick it up again yet. So feel
>> free to try it out.
> 
> 
> It seems this time I forgot to follow up. Is this topic still open?
> 

Most probably yes :) I now checked, that my proposed diff is still applicable to master and don't break compilation. So, if you have some test, you can check if it works better with the change.
Peter Lieven Dec. 4, 2021, 9:53 a.m. UTC | #8
> Am 04.12.2021 um 00:04 schrieb Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>:
> 
> 03.12.2021 14:17, Peter Lieven wrote:
>>> Am 19.05.21 um 18:48 schrieb Kevin Wolf:
>>> Am 19.05.2021 um 15:24 hat Peter Lieven geschrieben:
>>>> Am 20.04.21 um 18:52 schrieb Vladimir Sementsov-Ogievskiy:
>>>>> 20.04.2021 18:04, Kevin Wolf wrote:
>>>>>> Am 20.04.2021 um 16:31 hat Vladimir Sementsov-Ogievskiy geschrieben:
>>>>>>> 15.04.2021 18:22, Kevin Wolf wrote:
>>>>>>>> In order to avoid RMW cycles, is_allocated_sectors() treats zeroed areas
>>>>>>>> like non-zero data if the end of the checked area isn't aligned. This
>>>>>>>> can improve the efficiency of the conversion and was introduced in
>>>>>>>> commit 8dcd3c9b91a.
>>>>>>>> 
>>>>>>>> However, it comes with a correctness problem: qemu-img convert is
>>>>>>>> supposed to sparsify areas that contain only zeros, which it doesn't do
>>>>>>>> any more. It turns out that this even happens when not only the
>>>>>>>> unaligned area is zeroed, but also the blocks before and after it. In
>>>>>>>> the bug report, conversion of a fragmented 10G image containing only
>>>>>>>> zeros resulted in an image consuming 2.82 GiB even though the expected
>>>>>>>> size is only 4 KiB.
>>>>>>>> 
>>>>>>>> As a tradeoff between both, let's ignore zeroed sectors only after
>>>>>>>> non-zero data to fix the alignment, but if we're only looking at zeros,
>>>>>>>> keep them as such, even if it may mean additional RMW cycles.
>>>>>>>> 
>>>>>>> Hmm.. If I understand correctly, we are going to do unaligned
>>>>>>> write-zero. And that helps.
>>>>>> This can happen (mostly raw images on block devices, I think?), but
>>>>>> usually it just means skipping the write because we know that the target
>>>>>> image is already zeroed.
>>>>>> 
>>>>>> What it does mean is that if the next part is data, we'll have an
>>>>>> unaligned data write.
>>>>>> 
>>>>>>> Doesn't that mean that alignment is wrongly detected?
>>>>>> The problem is that you can have bdrv_block_status_above() return the
>>>>>> same allocation status multiple times in a row, but *pnum can be
>>>>>> unaligned for the conversion.
>>>>>> 
>>>>>> We only look at a single range returned by it when detecting the
>>>>>> alignment, so it could be that we have zero buffers for both 0-11 and
>>>>>> 12-16 and detect two misaligned ranges, when both together are a
>>>>>> perfectly aligned zeroed range.
>>>>>> 
>>>>>> In theory we could try to do some lookahead and merge ranges where
>>>>>> possible, which should give us the perfect result, but it would make the
>>>>>> code considerably more complicated. (Whether we want to merge them
>>>>>> doesn't only depend on the block status, but possibly also on the
>>>>>> content of a DATA range.)
>>>>>> 
>>>>>> Kevin
>>>>>> 
>>>>> Oh, I understand now the problem, thanks for explanation.
>>>>> 
>>>>> Hmm, yes that means, that if the whole buf is zero, is_allocated_sectors must not align it down, to be possibly "merged" with next chunk if it is zero too.
>>>>> 
>>>>> But it's still good to align zeroes down, if data starts somewhere inside the buf, isn't it?
>>>>> 
>>>>> what about something like this:
>>>>> 
>>>>> diff --git a/qemu-img.c b/qemu-img.c
>>>>> index babb5573ab..d1704584a0 100644
>>>>> --- a/qemu-img.c
>>>>> +++ b/qemu-img.c
>>>>> @@ -1167,19 +1167,39 @@ static int is_allocated_sectors(const uint8_t *buf, int n, int *pnum,
>>>>>          }
>>>>>      }
>>>>>  +    if (i == n) {
>>>>> +        /*
>>>>> +         * The whole buf is the same.
>>>>> +         *
>>>>> +         * if it's data, just return it. It's the old behavior.
>>>>> +         *
>>>>> +         * if it's zero, just return too. It will work good if target is alredy
>>>>> +         * zeroed. And if next chunk is zero too we'll have no RMW and no reason
>>>>> +         * to write data.
>>>>> +         */
>>>>> +        *pnum = i;
>>>>> +        return !is_zero;
>>>>> +    }
>>>>> +
>>>>>      tail = (sector_num + i) & (alignment - 1);
>>>>>      if (tail) {
>>>>>          if (is_zero && i <= tail) {
>>>>> -            /* treat unallocated areas which only consist
>>>>> -             * of a small tail as allocated. */
>>>>> +            /*
>>>>> +             * For sure next sector after i is data, and it will rewrite this
>>>>> +             * tail anyway due to RMW. So, let's just write data now.
>>>>> +             */
>>>>>              is_zero = false;
>>>>>          }
>>>>>          if (!is_zero) {
>>>>> -            /* align up end offset of allocated areas. */
>>>>> +            /* If possible, align up end offset of allocated areas. */
>>>>>              i += alignment - tail;
>>>>>              i = MIN(i, n);
>>>>>          } else {
>>>>> -            /* align down end offset of zero areas. */
>>>>> +            /*
>>>>> +             * For sure next sector after i is data, and it will rewrite this
>>>>> +             * tail anyway due to RMW. Better is avoid RMW and write zeroes up
>>>>> +             * to aligned bound.
>>>>> +             */
>>>>>              i -= tail;
>>>>>          }
>>>>>      }
>>>> I think we forgot to follow up on this. Has anyone tested this
>>>> suggestion?
>>>> 
>>>> Otherwise, I would try to rerun the tests I did with the my old and
>>>> Kevins suggestion.
>>> I noticed earlier this week that these patches are still in my
>>> development branch, but didn't actually pick it up again yet. So feel
>>> free to try it out.
>> It seems this time I forgot to follow up. Is this topic still open?
> 
> Most probably yes :) I now checked, that my proposed diff is still applicable to master and don't break compilation. So, if you have some test, you can check if it works better with the change.

Unfortunately the reproducer in the original ticket does not work for me. It might also depend on the XFS version used.
What I can say is that your patch has no negative impact for my use cases.
If I find the time I will try the reproducer on CentOS7

Best,
Peter
Peter Lieven Dec. 17, 2021, 2:43 p.m. UTC | #9
Am 04.12.21 um 00:04 schrieb Vladimir Sementsov-Ogievskiy:
> 03.12.2021 14:17, Peter Lieven wrote:
>> Am 19.05.21 um 18:48 schrieb Kevin Wolf:
>>> Am 19.05.2021 um 15:24 hat Peter Lieven geschrieben:
>>>> Am 20.04.21 um 18:52 schrieb Vladimir Sementsov-Ogievskiy:
>>>>> 20.04.2021 18:04, Kevin Wolf wrote:
>>>>>> Am 20.04.2021 um 16:31 hat Vladimir Sementsov-Ogievskiy geschrieben:
>>>>>>> 15.04.2021 18:22, Kevin Wolf wrote:
>>>>>>>> In order to avoid RMW cycles, is_allocated_sectors() treats zeroed areas
>>>>>>>> like non-zero data if the end of the checked area isn't aligned. This
>>>>>>>> can improve the efficiency of the conversion and was introduced in
>>>>>>>> commit 8dcd3c9b91a.
>>>>>>>>
>>>>>>>> However, it comes with a correctness problem: qemu-img convert is
>>>>>>>> supposed to sparsify areas that contain only zeros, which it doesn't do
>>>>>>>> any more. It turns out that this even happens when not only the
>>>>>>>> unaligned area is zeroed, but also the blocks before and after it. In
>>>>>>>> the bug report, conversion of a fragmented 10G image containing only
>>>>>>>> zeros resulted in an image consuming 2.82 GiB even though the expected
>>>>>>>> size is only 4 KiB.
>>>>>>>>
>>>>>>>> As a tradeoff between both, let's ignore zeroed sectors only after
>>>>>>>> non-zero data to fix the alignment, but if we're only looking at zeros,
>>>>>>>> keep them as such, even if it may mean additional RMW cycles.
>>>>>>>>
>>>>>>> Hmm.. If I understand correctly, we are going to do unaligned
>>>>>>> write-zero. And that helps.
>>>>>> This can happen (mostly raw images on block devices, I think?), but
>>>>>> usually it just means skipping the write because we know that the target
>>>>>> image is already zeroed.
>>>>>>
>>>>>> What it does mean is that if the next part is data, we'll have an
>>>>>> unaligned data write.
>>>>>>
>>>>>>> Doesn't that mean that alignment is wrongly detected?
>>>>>> The problem is that you can have bdrv_block_status_above() return the
>>>>>> same allocation status multiple times in a row, but *pnum can be
>>>>>> unaligned for the conversion.
>>>>>>
>>>>>> We only look at a single range returned by it when detecting the
>>>>>> alignment, so it could be that we have zero buffers for both 0-11 and
>>>>>> 12-16 and detect two misaligned ranges, when both together are a
>>>>>> perfectly aligned zeroed range.
>>>>>>
>>>>>> In theory we could try to do some lookahead and merge ranges where
>>>>>> possible, which should give us the perfect result, but it would make the
>>>>>> code considerably more complicated. (Whether we want to merge them
>>>>>> doesn't only depend on the block status, but possibly also on the
>>>>>> content of a DATA range.)
>>>>>>
>>>>>> Kevin
>>>>>>
>>>>> Oh, I understand now the problem, thanks for explanation.
>>>>>
>>>>> Hmm, yes that means, that if the whole buf is zero, is_allocated_sectors must not align it down, to be possibly "merged" with next chunk if it is zero too.
>>>>>
>>>>> But it's still good to align zeroes down, if data starts somewhere inside the buf, isn't it?
>>>>>
>>>>> what about something like this:
>>>>>
>>>>> diff --git a/qemu-img.c b/qemu-img.c
>>>>> index babb5573ab..d1704584a0 100644
>>>>> --- a/qemu-img.c
>>>>> +++ b/qemu-img.c
>>>>> @@ -1167,19 +1167,39 @@ static int is_allocated_sectors(const uint8_t *buf, int n, int *pnum,
>>>>>           }
>>>>>       }
>>>>>   +    if (i == n) {
>>>>> +        /*
>>>>> +         * The whole buf is the same.
>>>>> +         *
>>>>> +         * if it's data, just return it. It's the old behavior.
>>>>> +         *
>>>>> +         * if it's zero, just return too. It will work good if target is alredy
>>>>> +         * zeroed. And if next chunk is zero too we'll have no RMW and no reason
>>>>> +         * to write data.
>>>>> +         */
>>>>> +        *pnum = i;
>>>>> +        return !is_zero;
>>>>> +    }
>>>>> +
>>>>>       tail = (sector_num + i) & (alignment - 1);
>>>>>       if (tail) {
>>>>>           if (is_zero && i <= tail) {
>>>>> -            /* treat unallocated areas which only consist
>>>>> -             * of a small tail as allocated. */
>>>>> +            /*
>>>>> +             * For sure next sector after i is data, and it will rewrite this
>>>>> +             * tail anyway due to RMW. So, let's just write data now.
>>>>> +             */
>>>>>               is_zero = false;
>>>>>           }
>>>>>           if (!is_zero) {
>>>>> -            /* align up end offset of allocated areas. */
>>>>> +            /* If possible, align up end offset of allocated areas. */
>>>>>               i += alignment - tail;
>>>>>               i = MIN(i, n);
>>>>>           } else {
>>>>> -            /* align down end offset of zero areas. */
>>>>> +            /*
>>>>> +             * For sure next sector after i is data, and it will rewrite this
>>>>> +             * tail anyway due to RMW. Better is avoid RMW and write zeroes up
>>>>> +             * to aligned bound.
>>>>> +             */
>>>>>               i -= tail;
>>>>>           }
>>>>>       }
>>>> I think we forgot to follow up on this. Has anyone tested this
>>>> suggestion?
>>>>
>>>> Otherwise, I would try to rerun the tests I did with the my old and
>>>> Kevins suggestion.
>>> I noticed earlier this week that these patches are still in my
>>> development branch, but didn't actually pick it up again yet. So feel
>>> free to try it out.
>>
>>
>> It seems this time I forgot to follow up. Is this topic still open?
>>
>
> Most probably yes :) I now checked, that my proposed diff is still applicable to master and don't break compilation. So, if you have some test, you can check if it works better with the change.
>
Hi Vladimir, hi Kevin,


I was able to reproduce on CentOS 7 with qemu 5.1.0. The effect of the bad file is not as worse as in the ticket, but Vladimirs proposed fix solves the issue. I can also confirm that it does not

introduce additional overhead to my use cases. Vladimir, can you send a proper path please?

[root@VCORE-188 qemu]# ./qemu-img create -f raw -o extent_size_hint=1M /mnt/test.raw 10G
Formatting '/mnt/test.raw', fmt=raw size=10737418240 extent_size_hint=1048576
[root@VCORE-188 qemu]# ./qemu-img bench -f raw -t none -n -w /mnt/test.raw -c 1000000 -S 8192 -o 0
Sending 1000000 write requests, 4096 bytes each, 64 in parallel (starting at offset 0, step size 8192)
Run completed in 397.917 seconds.
[root@VCORE-188 qemu]# filefrag /mnt/test.raw
/mnt/test.raw: 1000000 extents found
[root@VCORE-188 qemu]# ./qemu-img bench -f raw -t none -n -w /mnt/test.raw -c 1000000 -S 8192 -o 4096
Sending 1000000 write requests, 4096 bytes each, 64 in parallel (starting at offset 4096, step size 8192)
Run completed in 200.162 seconds.
[root@VCORE-188 qemu]# filefrag /mnt/test.raw
/mnt/test.raw: 2000000 extents found
[root@VCORE-188 qemu]# ./qemu-img convert -p -f raw -O raw /mnt/test.raw /mnt/convert.raw
    (100.00/100%)
[root@VCORE-188 qemu]# ./qemu-img info /mnt/convert.raw
image: /mnt/convert.raw
file format: raw
virtual size: 10 GiB (10737418240 bytes)
disk size: 11.3 MiB


11.3 MiB is not as worse as the several GB from the ticket, but I see a disk size where there should be no usage at all.


[root@VCORE-188 qemu]# vi qemu-img.c
[root@VCORE-188 qemu]# make -f Makefile -j8 qemu-img
make[1]: Entering directory `/usr/src/qemu/slirp'
make[1]: Für das Ziel »all« ist nichts zu tun.
make[1]: Leaving directory `/usr/src/qemu/slirp'
  CC      qemu-img.o
  LINK    qemu-img

applied Vladimirs fix with the handling of (i==n) here.


[root@VCORE-188 qemu]# ./qemu-img convert -p -f raw -O raw /mnt/test.raw /mnt/convert.raw
    (100.00/100%)
[root@VCORE-188 qemu]# ./qemu-img info /mnt/convert.raw
image: /mnt/convert.raw
file format: raw
virtual size: 10 GiB (10737418240 bytes)
disk size: 4 KiB


Best,

Peter
diff mbox series

Patch

diff --git a/qemu-img.c b/qemu-img.c
index a5993682aa..ca4eba2dd1 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1168,20 +1168,10 @@  static int is_allocated_sectors(const uint8_t *buf, int n, int *pnum,
     }
 
     tail = (sector_num + i) & (alignment - 1);
-    if (tail) {
-        if (is_zero && i <= tail) {
-            /* treat unallocated areas which only consist
-             * of a small tail as allocated. */
-            is_zero = false;
-        }
-        if (!is_zero) {
-            /* align up end offset of allocated areas. */
-            i += alignment - tail;
-            i = MIN(i, n);
-        } else {
-            /* align down end offset of zero areas. */
-            i -= tail;
-        }
+    if (tail && !is_zero) {
+        /* align up end offset of allocated areas. */
+        i += alignment - tail;
+        i = MIN(i, n);
     }
     *pnum = i;
     return !is_zero;
diff --git a/tests/qemu-iotests/122.out b/tests/qemu-iotests/122.out
index dcc44a2304..fe0ea34164 100644
--- a/tests/qemu-iotests/122.out
+++ b/tests/qemu-iotests/122.out
@@ -199,11 +199,9 @@  convert -S 4k
 [{ "start": 0, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": OFFSET},
 { "start": 4096, "length": 4096, "depth": 0, "zero": true, "data": false},
 { "start": 8192, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": OFFSET},
-{ "start": 12288, "length": 4096, "depth": 0, "zero": true, "data": false},
-{ "start": 16384, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": OFFSET},
-{ "start": 20480, "length": 46080, "depth": 0, "zero": true, "data": false},
-{ "start": 66560, "length": 1024, "depth": 0, "zero": false, "data": true, "offset": OFFSET},
-{ "start": 67584, "length": 67041280, "depth": 0, "zero": true, "data": false}]
+{ "start": 12288, "length": 5120, "depth": 0, "zero": true, "data": false},
+{ "start": 17408, "length": 3072, "depth": 0, "zero": false, "data": true, "offset": OFFSET},
+{ "start": 20480, "length": 67088384, "depth": 0, "zero": true, "data": false}]
 
 convert -c -S 4k
 [{ "start": 0, "length": 1024, "depth": 0, "zero": false, "data": true},
@@ -215,9 +213,7 @@  convert -c -S 4k
 
 convert -S 8k
 [{ "start": 0, "length": 24576, "depth": 0, "zero": false, "data": true, "offset": OFFSET},
-{ "start": 24576, "length": 41984, "depth": 0, "zero": true, "data": false},
-{ "start": 66560, "length": 1024, "depth": 0, "zero": false, "data": true, "offset": OFFSET},
-{ "start": 67584, "length": 67041280, "depth": 0, "zero": true, "data": false}]
+{ "start": 24576, "length": 67084288, "depth": 0, "zero": true, "data": false}]
 
 convert -c -S 8k
 [{ "start": 0, "length": 1024, "depth": 0, "zero": false, "data": true},