diff mbox series

[1/4] block/io: fix bdrv_co_block_status_above

Message ID 20191116163410.12129-2-vsementsov@virtuozzo.com
State New
Headers show
Series fix & merge block_status_above and is_allocated_above | expand

Commit Message

Vladimir Sementsov-Ogievskiy Nov. 16, 2019, 4:34 p.m. UTC
bdrv_co_block_status_above has several problems with handling short
backing files:

1. With want_zeros=true, it may return ret with BDRV_BLOCK_ZERO but
without BDRV_BLOCK_ALLOCATED flag, when actually short backing file
which produces these after-EOF zeros is inside requested backing
sequesnce.

2. With want_zeros=false, it will just stop inside requested region, if
we have unallocated region in top node when underlying backing is
short.

Fix these things, making logic about short backing files clearer.

Note that 154 output changed, because now bdrv_block_status_above don't
merge unallocated zeros with zeros after EOF (which are actually
"allocated" in POV of read from backing-chain top) and is_zero() just
don't understand that the whole head or tail is zero. We may update
is_zero to call bdrv_block_status_above several times, or add flag to
bdrv_block_status_above that we are not interested in ALLOCATED flag,
so ranges with different ALLOCATED status may be merged, but actually,
it seems that we'd better don't care about this corner case.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/io.c                 | 41 ++++++++++++++++++++++++++++----------
 tests/qemu-iotests/154.out |  4 ++--
 2 files changed, 32 insertions(+), 13 deletions(-)

Comments

Kevin Wolf Nov. 25, 2019, 4 p.m. UTC | #1
Am 16.11.2019 um 17:34 hat Vladimir Sementsov-Ogievskiy geschrieben:
> bdrv_co_block_status_above has several problems with handling short
> backing files:
> 
> 1. With want_zeros=true, it may return ret with BDRV_BLOCK_ZERO but
> without BDRV_BLOCK_ALLOCATED flag, when actually short backing file
> which produces these after-EOF zeros is inside requested backing
> sequesnce.

s/sequesnce/sequence/

> 
> 2. With want_zeros=false, it will just stop inside requested region, if
> we have unallocated region in top node when underlying backing is
> short.

I honestly don't understand this one. Can you rephrase/explain in more
detail what you mean by "stop inside [the] requested region"?

> Fix these things, making logic about short backing files clearer.
> 
> Note that 154 output changed, because now bdrv_block_status_above don't
> merge unallocated zeros with zeros after EOF (which are actually
> "allocated" in POV of read from backing-chain top) and is_zero() just
> don't understand that the whole head or tail is zero. We may update
> is_zero to call bdrv_block_status_above several times, or add flag to
> bdrv_block_status_above that we are not interested in ALLOCATED flag,
> so ranges with different ALLOCATED status may be merged, but actually,
> it seems that we'd better don't care about this corner case.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  block/io.c                 | 41 ++++++++++++++++++++++++++++----------
>  tests/qemu-iotests/154.out |  4 ++--
>  2 files changed, 32 insertions(+), 13 deletions(-)
> 
> diff --git a/block/io.c b/block/io.c
> index f75777f5ea..4d7fa99bd2 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -2434,25 +2434,44 @@ static int coroutine_fn bdrv_co_block_status_above(BlockDriverState *bs,
>          ret = bdrv_co_block_status(p, want_zero, offset, bytes, pnum, map,
>                                     file);
>          if (ret < 0) {
> -            break;
> +            return ret;
>          }
> -        if (ret & BDRV_BLOCK_ZERO && ret & BDRV_BLOCK_EOF && !first) {
> +        if (*pnum == 0) {
> +            if (first) {
> +                return ret;
> +            }
> +
>              /*
> -             * Reading beyond the end of the file continues to read
> -             * zeroes, but we can only widen the result to the
> -             * unallocated length we learned from an earlier
> -             * iteration.
> +             * Reads from bs for selected region will return zeroes, produced
> +             * because current level is short. We should consider it as
> +             * allocated.

"the selected region"
"the current level"

> +             * TODO: Should we report p as file here?

I think that would make sense.

>               */
> +            assert(ret & BDRV_BLOCK_EOF);

Can this assertion be moved above the if (first)?

>              *pnum = bytes;
> +            return BDRV_BLOCK_ZERO | BDRV_BLOCK_ALLOCATED;
>          }
> -        if (ret & (BDRV_BLOCK_ZERO | BDRV_BLOCK_DATA)) {
> -            break;
> +        if (ret & BDRV_BLOCK_ALLOCATED) {
> +            /* We've found the node and the status, we must return. */
> +
> +            if (ret & BDRV_BLOCK_ZERO && ret & BDRV_BLOCK_EOF && !first) {
> +                /*
> +                 * This level also responsible for reads after EOF inside
> +                 * unallocated region in previous level.

"is also responsible"
"the unallocated region in the previous level"

> +                 */
> +                *pnum = bytes;
> +            }
> +
> +            return ret;
>          }
> -        /* [offset, pnum] unallocated on this layer, which could be only
> -         * the first part of [offset, bytes].  */

Any reason for deleting this comment? I think it's still valid.

> -        bytes = MIN(bytes, *pnum);
> +
> +        /* Proceed to backing */
> +        assert(*pnum <= bytes);
> +        bytes = *pnum;
>          first = false;
>      }
> +
>      return ret;
>  }

Kevin
Vladimir Sementsov-Ogievskiy Nov. 26, 2019, 7:26 a.m. UTC | #2
25.11.2019 19:00, Kevin Wolf wrote:
> Am 16.11.2019 um 17:34 hat Vladimir Sementsov-Ogievskiy geschrieben:
>> bdrv_co_block_status_above has several problems with handling short
>> backing files:
>>
>> 1. With want_zeros=true, it may return ret with BDRV_BLOCK_ZERO but
>> without BDRV_BLOCK_ALLOCATED flag, when actually short backing file
>> which produces these after-EOF zeros is inside requested backing
>> sequesnce.
> 
> s/sequesnce/sequence/
> 
>>
>> 2. With want_zeros=false, it will just stop inside requested region, if
>> we have unallocated region in top node when underlying backing is
>> short.
> 
> I honestly don't understand this one. Can you rephrase/explain in more
> detail what you mean by "stop inside [the] requested region"?

Hmm, yes, bad description. I mean, it may return pnum=0 prior to actual EOF,
because of EOF of short backing file.

> 
>> Fix these things, making logic about short backing files clearer.
>>
>> Note that 154 output changed, because now bdrv_block_status_above don't
>> merge unallocated zeros with zeros after EOF (which are actually
>> "allocated" in POV of read from backing-chain top) and is_zero() just
>> don't understand that the whole head or tail is zero. We may update
>> is_zero to call bdrv_block_status_above several times, or add flag to
>> bdrv_block_status_above that we are not interested in ALLOCATED flag,
>> so ranges with different ALLOCATED status may be merged, but actually,
>> it seems that we'd better don't care about this corner case.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>   block/io.c                 | 41 ++++++++++++++++++++++++++++----------
>>   tests/qemu-iotests/154.out |  4 ++--
>>   2 files changed, 32 insertions(+), 13 deletions(-)
>>
>> diff --git a/block/io.c b/block/io.c
>> index f75777f5ea..4d7fa99bd2 100644
>> --- a/block/io.c
>> +++ b/block/io.c
>> @@ -2434,25 +2434,44 @@ static int coroutine_fn bdrv_co_block_status_above(BlockDriverState *bs,
>>           ret = bdrv_co_block_status(p, want_zero, offset, bytes, pnum, map,
>>                                      file);
>>           if (ret < 0) {
>> -            break;
>> +            return ret;
>>           }
>> -        if (ret & BDRV_BLOCK_ZERO && ret & BDRV_BLOCK_EOF && !first) {
>> +        if (*pnum == 0) {
>> +            if (first) {
>> +                return ret;
>> +            }
>> +
>>               /*
>> -             * Reading beyond the end of the file continues to read
>> -             * zeroes, but we can only widen the result to the
>> -             * unallocated length we learned from an earlier
>> -             * iteration.
>> +             * Reads from bs for selected region will return zeroes, produced
>> +             * because current level is short. We should consider it as
>> +             * allocated.
> 
> "the selected region"
> "the current level"
> 
>> +             * TODO: Should we report p as file here?
> 
> I think that would make sense.
> 
>>                */
>> +            assert(ret & BDRV_BLOCK_EOF);
> 
> Can this assertion be moved above the if (first)?

it may correspond to requested bytes==0.. But we can check it separately
before for loop and move this assertion.

> 
>>               *pnum = bytes;
>> +            return BDRV_BLOCK_ZERO | BDRV_BLOCK_ALLOCATED;
>>           }
>> -        if (ret & (BDRV_BLOCK_ZERO | BDRV_BLOCK_DATA)) {
>> -            break;
>> +        if (ret & BDRV_BLOCK_ALLOCATED) {
>> +            /* We've found the node and the status, we must return. */
>> +
>> +            if (ret & BDRV_BLOCK_ZERO && ret & BDRV_BLOCK_EOF && !first) {
>> +                /*
>> +                 * This level also responsible for reads after EOF inside
>> +                 * unallocated region in previous level.
> 
> "is also responsible"
> "the unallocated region in the previous level"
> 
>> +                 */
>> +                *pnum = bytes;
>> +            }
>> +
>> +            return ret;
>>           }
>> -        /* [offset, pnum] unallocated on this layer, which could be only
>> -         * the first part of [offset, bytes].  */
> 
> Any reason for deleting this comment? I think it's still valid.

Hmm, I decided that it's obvious and shorter comment is enough.
I can leave it, of course.

> 
>> -        bytes = MIN(bytes, *pnum);
>> +
>> +        /* Proceed to backing */
>> +        assert(*pnum <= bytes);
>> +        bytes = *pnum;
>>           first = false;
>>       }
>> +
>>       return ret;
>>   }
> 
> Kevin
>
Kevin Wolf Nov. 26, 2019, 2:20 p.m. UTC | #3
Am 26.11.2019 um 08:26 hat Vladimir Sementsov-Ogievskiy geschrieben:
> 25.11.2019 19:00, Kevin Wolf wrote:
> > Am 16.11.2019 um 17:34 hat Vladimir Sementsov-Ogievskiy geschrieben:
> >> bdrv_co_block_status_above has several problems with handling short
> >> backing files:
> >>
> >> 1. With want_zeros=true, it may return ret with BDRV_BLOCK_ZERO but
> >> without BDRV_BLOCK_ALLOCATED flag, when actually short backing file
> >> which produces these after-EOF zeros is inside requested backing
> >> sequesnce.
> > 
> > s/sequesnce/sequence/
> > 
> >>
> >> 2. With want_zeros=false, it will just stop inside requested region, if
> >> we have unallocated region in top node when underlying backing is
> >> short.
> > 
> > I honestly don't understand this one. Can you rephrase/explain in more
> > detail what you mean by "stop inside [the] requested region"?
> 
> Hmm, yes, bad description. I mean, it may return pnum=0 prior to actual EOF,
> because of EOF of short backing file.

Ah, yes, that's true. Definitely mention pnum=0 in the comment, this
explanation is much clearer.

> >> Fix these things, making logic about short backing files clearer.
> >>
> >> Note that 154 output changed, because now bdrv_block_status_above don't
> >> merge unallocated zeros with zeros after EOF (which are actually
> >> "allocated" in POV of read from backing-chain top) and is_zero() just
> >> don't understand that the whole head or tail is zero. We may update
> >> is_zero to call bdrv_block_status_above several times, or add flag to
> >> bdrv_block_status_above that we are not interested in ALLOCATED flag,
> >> so ranges with different ALLOCATED status may be merged, but actually,
> >> it seems that we'd better don't care about this corner case.
> >>
> >> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> >> ---
> >>   block/io.c                 | 41 ++++++++++++++++++++++++++++----------
> >>   tests/qemu-iotests/154.out |  4 ++--
> >>   2 files changed, 32 insertions(+), 13 deletions(-)
> >>
> >> diff --git a/block/io.c b/block/io.c
> >> index f75777f5ea..4d7fa99bd2 100644
> >> --- a/block/io.c
> >> +++ b/block/io.c
> >> @@ -2434,25 +2434,44 @@ static int coroutine_fn bdrv_co_block_status_above(BlockDriverState *bs,
> >>           ret = bdrv_co_block_status(p, want_zero, offset, bytes, pnum, map,
> >>                                      file);
> >>           if (ret < 0) {
> >> -            break;
> >> +            return ret;
> >>           }
> >> -        if (ret & BDRV_BLOCK_ZERO && ret & BDRV_BLOCK_EOF && !first) {
> >> +        if (*pnum == 0) {
> >> +            if (first) {
> >> +                return ret;
> >> +            }
> >> +
> >>               /*
> >> -             * Reading beyond the end of the file continues to read
> >> -             * zeroes, but we can only widen the result to the
> >> -             * unallocated length we learned from an earlier
> >> -             * iteration.
> >> +             * Reads from bs for selected region will return zeroes, produced
> >> +             * because current level is short. We should consider it as
> >> +             * allocated.
> > 
> > "the selected region"
> > "the current level"
> > 
> >> +             * TODO: Should we report p as file here?
> > 
> > I think that would make sense.
> > 
> >>                */
> >> +            assert(ret & BDRV_BLOCK_EOF);
> > 
> > Can this assertion be moved above the if (first)?
> 
> it may correspond to requested bytes==0.. But we can check it separately
> before for loop and move this assertion.

Ah, right. Don't bother then, it's fine either way.

Kevin
diff mbox series

Patch

diff --git a/block/io.c b/block/io.c
index f75777f5ea..4d7fa99bd2 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2434,25 +2434,44 @@  static int coroutine_fn bdrv_co_block_status_above(BlockDriverState *bs,
         ret = bdrv_co_block_status(p, want_zero, offset, bytes, pnum, map,
                                    file);
         if (ret < 0) {
-            break;
+            return ret;
         }
-        if (ret & BDRV_BLOCK_ZERO && ret & BDRV_BLOCK_EOF && !first) {
+        if (*pnum == 0) {
+            if (first) {
+                return ret;
+            }
+
             /*
-             * Reading beyond the end of the file continues to read
-             * zeroes, but we can only widen the result to the
-             * unallocated length we learned from an earlier
-             * iteration.
+             * Reads from bs for selected region will return zeroes, produced
+             * because current level is short. We should consider it as
+             * allocated.
+             *
+             * TODO: Should we report p as file here?
              */
+            assert(ret & BDRV_BLOCK_EOF);
             *pnum = bytes;
+            return BDRV_BLOCK_ZERO | BDRV_BLOCK_ALLOCATED;
         }
-        if (ret & (BDRV_BLOCK_ZERO | BDRV_BLOCK_DATA)) {
-            break;
+        if (ret & BDRV_BLOCK_ALLOCATED) {
+            /* We've found the node and the status, we must return. */
+
+            if (ret & BDRV_BLOCK_ZERO && ret & BDRV_BLOCK_EOF && !first) {
+                /*
+                 * This level also responsible for reads after EOF inside
+                 * unallocated region in previous level.
+                 */
+                *pnum = bytes;
+            }
+
+            return ret;
         }
-        /* [offset, pnum] unallocated on this layer, which could be only
-         * the first part of [offset, bytes].  */
-        bytes = MIN(bytes, *pnum);
+
+        /* Proceed to backing */
+        assert(*pnum <= bytes);
+        bytes = *pnum;
         first = false;
     }
+
     return ret;
 }
 
diff --git a/tests/qemu-iotests/154.out b/tests/qemu-iotests/154.out
index fa3673317f..a203dfcadd 100644
--- a/tests/qemu-iotests/154.out
+++ b/tests/qemu-iotests/154.out
@@ -310,13 +310,13 @@  wrote 512/512 bytes at offset 134217728
 512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 2048/2048 bytes allocated at offset 128 MiB
 [{ "start": 0, "length": 134217728, "depth": 1, "zero": true, "data": false},
-{ "start": 134217728, "length": 2048, "depth": 0, "zero": true, "data": false}]
+{ "start": 134217728, "length": 2048, "depth": 0, "zero": false, "data": true, "offset": OFFSET}]
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134219776 backing_file=TEST_DIR/t.IMGFMT.base
 wrote 512/512 bytes at offset 134219264
 512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 2048/2048 bytes allocated at offset 128 MiB
 [{ "start": 0, "length": 134217728, "depth": 1, "zero": true, "data": false},
-{ "start": 134217728, "length": 2048, "depth": 0, "zero": true, "data": false}]
+{ "start": 134217728, "length": 2048, "depth": 0, "zero": false, "data": true, "offset": OFFSET}]
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134219776 backing_file=TEST_DIR/t.IMGFMT.base
 wrote 1024/1024 bytes at offset 134218240
 1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)