diff mbox series

[v4,20/30] qcow2: Add subcluster support to discard_in_l2_slice()

Message ID 99b45e3beb4a38b17eb50fcde1e09cdefdb99724.1584468723.git.berto@igalia.com
State New
Headers show
Series Add subcluster allocation to qcow2 | expand

Commit Message

Alberto Garcia March 17, 2020, 6:16 p.m. UTC
Two changes are needed in this function:

1) A full discard deallocates a cluster so we can skip the operation if
   it is already unallocated. With extended L2 entries however if any
   of the subclusters has the 'all zeroes' bit set then we have to
   clear it.

2) Setting the QCOW_OFLAG_ZERO bit of the L2 entry is forbidden if an
   image has extended L2 entries. Instead, the individual 'all zeroes'
   bits must be used.

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 block/qcow2-cluster.c | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

Comments

Max Reitz April 9, 2020, 10:05 a.m. UTC | #1
On 17.03.20 19:16, Alberto Garcia wrote:
> Two changes are needed in this function:
> 
> 1) A full discard deallocates a cluster so we can skip the operation if
>    it is already unallocated. With extended L2 entries however if any
>    of the subclusters has the 'all zeroes' bit set then we have to
>    clear it.
> 
> 2) Setting the QCOW_OFLAG_ZERO bit of the L2 entry is forbidden if an
>    image has extended L2 entries. Instead, the individual 'all zeroes'
>    bits must be used.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>  block/qcow2-cluster.c | 18 +++++++++++++++---
>  1 file changed, 15 insertions(+), 3 deletions(-)
> 
> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
> index 746006a117..824c710760 100644
> --- a/block/qcow2-cluster.c
> +++ b/block/qcow2-cluster.c
> @@ -1790,12 +1790,20 @@ static int discard_in_l2_slice(BlockDriverState *bs, uint64_t offset,
>           * TODO We might want to use bdrv_block_status(bs) here, but we're
>           * holding s->lock, so that doesn't work today.
>           *
> -         * If full_discard is true, the sector should not read back as zeroes,
> +         * If full_discard is true, the cluster should not read back as zeroes,
>           * but rather fall through to the backing file.
>           */
>          switch (qcow2_get_cluster_type(bs, old_l2_entry)) {
>          case QCOW2_CLUSTER_UNALLOCATED:
> -            if (full_discard || !bs->backing) {
> +            if (full_discard) {
> +                /* If the image has extended L2 entries we can only
> +                 * skip this operation if the L2 bitmap is zero. */
> +                uint64_t bitmap = has_subclusters(s) ?
> +                    get_l2_bitmap(s, l2_slice, l2_index + i) : 0;

Isn’t this bitmap only valid for standard clusters?  In this case, the
whole cluster is unallocated, so the bitmap shouldn’t be relevant, AFAIU.

Max

> +                if (bitmap == 0) {
> +                    continue;
> +                }
> +            } else if (!bs->backing) {
>                  continue;
>              }
>              break;
Alberto Garcia April 10, 2020, 12:47 p.m. UTC | #2
On Thu 09 Apr 2020 12:05:12 PM CEST, Max Reitz wrote:
>>          switch (qcow2_get_cluster_type(bs, old_l2_entry)) {
>>          case QCOW2_CLUSTER_UNALLOCATED:
>> -            if (full_discard || !bs->backing) {
>> +            if (full_discard) {
>> +                /* If the image has extended L2 entries we can only
>> +                 * skip this operation if the L2 bitmap is zero. */
>> +                uint64_t bitmap = has_subclusters(s) ?
>> +                    get_l2_bitmap(s, l2_slice, l2_index + i) : 0;
>
> Isn’t this bitmap only valid for standard clusters?  In this case, the
> whole cluster is unallocated, so the bitmap shouldn’t be relevant,
> AFAIU.

I'm not sure if I follow you.

An unallocated cluster can still have QCOW_OFLAG_SUB_ZERO set in some of
its subclusters. Those read as zeroes and the rest go to the backing
file.

After a full discard all subclusters should be completely deallocated so
those bits should be cleared.

If the bitmap is already 0 (the whole cluster is already unallocated) or
if the image does not have extended L2 entries (which also means that
the whole cluster is already unallocated) then we can skip the discard.

Berto
Max Reitz April 14, 2020, 10:13 a.m. UTC | #3
On 10.04.20 14:47, Alberto Garcia wrote:
> On Thu 09 Apr 2020 12:05:12 PM CEST, Max Reitz wrote:
>>>          switch (qcow2_get_cluster_type(bs, old_l2_entry)) {
>>>          case QCOW2_CLUSTER_UNALLOCATED:
>>> -            if (full_discard || !bs->backing) {
>>> +            if (full_discard) {
>>> +                /* If the image has extended L2 entries we can only
>>> +                 * skip this operation if the L2 bitmap is zero. */
>>> +                uint64_t bitmap = has_subclusters(s) ?
>>> +                    get_l2_bitmap(s, l2_slice, l2_index + i) : 0;
>>
>> Isn’t this bitmap only valid for standard clusters?  In this case, the
>> whole cluster is unallocated, so the bitmap shouldn’t be relevant,
>> AFAIU.
> 
> I'm not sure if I follow you.
> 
> An unallocated cluster can still have QCOW_OFLAG_SUB_ZERO set in some of
> its subclusters. Those read as zeroes and the rest go to the backing
> file.

Hm, right, this is the only way to have non-preallocated zero clusters
after all.

I suppose I read the spec wrong and assumed somehow that unallocated
clusters don’t use “standard cluster descriptors”, so their bitmap usage
would be undefined.  Don’t know how that happened.

> After a full discard all subclusters should be completely deallocated so
> those bits should be cleared.
> 
> If the bitmap is already 0 (the whole cluster is already unallocated) or
> if the image does not have extended L2 entries (which also means that
> the whole cluster is already unallocated) then we can skip the discard.

Yep, seems right.

Reviewed-by: Max Reitz <mreitz@redhat.com>
Vladimir Sementsov-Ogievskiy April 22, 2020, 11:35 a.m. UTC | #4
17.03.2020 21:16, Alberto Garcia wrote:
> Two changes are needed in this function:
> 
> 1) A full discard deallocates a cluster so we can skip the operation if
>     it is already unallocated. With extended L2 entries however if any
>     of the subclusters has the 'all zeroes' bit set then we have to
>     clear it.
> 
> 2) Setting the QCOW_OFLAG_ZERO bit of the L2 entry is forbidden if an
>     image has extended L2 entries. Instead, the individual 'all zeroes'
>     bits must be used.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>   block/qcow2-cluster.c | 18 +++++++++++++++---
>   1 file changed, 15 insertions(+), 3 deletions(-)
> 
> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
> index 746006a117..824c710760 100644
> --- a/block/qcow2-cluster.c
> +++ b/block/qcow2-cluster.c
> @@ -1790,12 +1790,20 @@ static int discard_in_l2_slice(BlockDriverState *bs, uint64_t offset,
>            * TODO We might want to use bdrv_block_status(bs) here, but we're
>            * holding s->lock, so that doesn't work today.
>            *
> -         * If full_discard is true, the sector should not read back as zeroes,
> +         * If full_discard is true, the cluster should not read back as zeroes,
>            * but rather fall through to the backing file.
>            */
>           switch (qcow2_get_cluster_type(bs, old_l2_entry)) {
>           case QCOW2_CLUSTER_UNALLOCATED:
> -            if (full_discard || !bs->backing) {
> +            if (full_discard) {
> +                /* If the image has extended L2 entries we can only
> +                 * skip this operation if the L2 bitmap is zero. */
> +                uint64_t bitmap = has_subclusters(s) ?
> +                    get_l2_bitmap(s, l2_slice, l2_index + i) : 0;
> +                if (bitmap == 0) {
> +                    continue;
> +                }
> +            } else if (!bs->backing) {
>                   continue;
>               }

Hmm, so you do continue if full_discard is false AND bitmap != 0 & !bs->backing,
but you do not continue if full_discard is true AND bitmap != 0 & !bs->backing (as you will not go to "else") branch.

Seems it's a mistake.

I think, correct condition is

if (!bs->backing || full_discard && !get_l2_bitmap(s, l2_slice, l2_index + i))

, but, for doing so we also need


--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -565,6 +565,7 @@ static inline uint64_t get_l2_entry(BDRVQcow2State *s, uint64_t *l2_slice,
      return be64_to_cpu(l2_slice[idx]);
  }

+/* Return l2-entry bitmap if image has subclusters and 0 otherwise. */
  static inline uint64_t get_l2_bitmap(BDRVQcow2State *s, uint64_t *l2_slice,
                                       int idx)
  {
@@ -572,7 +573,6 @@ static inline uint64_t get_l2_bitmap(BDRVQcow2State *s, uint64_t *l2_slice,
          idx *= l2_entry_size(s) / sizeof(uint64_t);
          return be64_to_cpu(l2_slice[idx + 1]);
      } else {
-        /* For convenience only; the caller should ignore this value. */
          return 0;
      }
  }

or if you don't want, keep it explicit

if (!bs->backing || full_discard && (!has_subclusters(s) || !get_l2_bitmap(s, l2_slice, l2_index + i)))


=====

In case QCOW2_CLUSTER_ZERO_PLAIN, worth assert !has_subclusters(s) or mark image corrupted ?

>               break;
> @@ -1817,7 +1825,11 @@ static int discard_in_l2_slice(BlockDriverState *bs, uint64_t offset,
>   
>           /* First remove L2 entries */
>           qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
> -        if (!full_discard && s->qcow_version >= 3) {
> +        if (has_subclusters(s)) {
> +            set_l2_entry(s, l2_slice, l2_index + i, 0);
> +            set_l2_bitmap(s, l2_slice, l2_index + i,
> +                          full_discard ? 0 : QCOW_L2_BITMAP_ALL_ZEROES);
> +        } else if (!full_discard && s->qcow_version >= 3) {
>               set_l2_entry(s, l2_slice, l2_index + i, QCOW_OFLAG_ZERO);
>           } else {
>               set_l2_entry(s, l2_slice, l2_index + i, 0);
>
Alberto Garcia April 22, 2020, 5:42 p.m. UTC | #5
On Wed 22 Apr 2020 01:35:25 PM CEST, Vladimir Sementsov-Ogievskiy wrote:
> 17.03.2020 21:16, Alberto Garcia wrote:
>> Two changes are needed in this function:
>> 
>> 1) A full discard deallocates a cluster so we can skip the operation if
>>     it is already unallocated. With extended L2 entries however if any
>>     of the subclusters has the 'all zeroes' bit set then we have to
>>     clear it.
>> 
>> 2) Setting the QCOW_OFLAG_ZERO bit of the L2 entry is forbidden if an
>>     image has extended L2 entries. Instead, the individual 'all zeroes'
>>     bits must be used.
>> 
>> Signed-off-by: Alberto Garcia <berto@igalia.com>
>> ---
>>   block/qcow2-cluster.c | 18 +++++++++++++++---
>>   1 file changed, 15 insertions(+), 3 deletions(-)
>> 
>> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
>> index 746006a117..824c710760 100644
>> --- a/block/qcow2-cluster.c
>> +++ b/block/qcow2-cluster.c
>> @@ -1790,12 +1790,20 @@ static int discard_in_l2_slice(BlockDriverState *bs, uint64_t offset,
>>            * TODO We might want to use bdrv_block_status(bs) here, but we're
>>            * holding s->lock, so that doesn't work today.
>>            *
>> -         * If full_discard is true, the sector should not read back as zeroes,
>> +         * If full_discard is true, the cluster should not read back as zeroes,
>>            * but rather fall through to the backing file.
>>            */
>>           switch (qcow2_get_cluster_type(bs, old_l2_entry)) {
>>           case QCOW2_CLUSTER_UNALLOCATED:
>> -            if (full_discard || !bs->backing) {
>> +            if (full_discard) {
>> +                /* If the image has extended L2 entries we can only
>> +                 * skip this operation if the L2 bitmap is zero. */
>> +                uint64_t bitmap = has_subclusters(s) ?
>> +                    get_l2_bitmap(s, l2_slice, l2_index + i) : 0;
>> +                if (bitmap == 0) {
>> +                    continue;
>> +                }
>> +            } else if (!bs->backing) {
>>                   continue;
>>               }
>
> Hmm, so you do continue if full_discard is false AND bitmap != 0 &
> !bs->backing,

> but you do not continue if full_discard is true AND bitmap != 0 &
> !bs->backing (as you will not go to "else") branch.

1. If full_discard is true it means that the entry and the bitmap should
   always be set to 0, regardless of whether there's a backing file or
   any other consideration.

   This is used e.g when shrinking an image, or by qcow2_make_empty().

   We can only skip this operation if both the entry and the bitmap are
   already 0 (the former we know because of QCOW2_CLUSTER_UNALLOCATED).

2. If full_discard is false it means that we must ensure that the
   cluster reads back as zeroes, but there's no need to clear the bitmap
   (in fact we must set QCOW_OFLAG_ZERO or QCOW_L2_BITMAP_ALL_ZEROES
   depending on the type of image).

   We can skip this operation if there's no backing file and the cluster
   is already unallocated (because then we know that it already reads as
   zeroes).

   One optimization would be to skip the operation also if the image has
   subclusters and the bitmap is QCOW_L2_BITMAP_ALL_ZEROES, I can do
   that for the next version.

> In case QCOW2_CLUSTER_ZERO_PLAIN, worth assert !has_subclusters(s) or
> mark image corrupted ?

I think that should be handled directly in qcow2_get_cluster_type().

There's currently an inconsistency now that I think of it: if an image
has subclusters and QCOW_OFLAG_ZERO set then qcow2_get_cluster_type()
returns QCOW2_CLUSTER_ZERO_* but qcow2_get_subcluster_type() returns
QCOW2_SUBCLUSTER_INVALID.

Two alternatives:

  - We add QCOW2_CLUSTER_INVALID so we get an error in both
    cases. Problem: any function that calls qcow2_get_cluster_type()
    should be modified to handle that.

  - We ignore QCOW_OFLAG_ZERO. Simpler, and it would allow us to use
    that bit in the future if we wanted.

Berto
Vladimir Sementsov-Ogievskiy April 22, 2020, 6:09 p.m. UTC | #6
22.04.2020 20:42, Alberto Garcia wrote:
> On Wed 22 Apr 2020 01:35:25 PM CEST, Vladimir Sementsov-Ogievskiy wrote:
>> 17.03.2020 21:16, Alberto Garcia wrote:
>>> Two changes are needed in this function:
>>>
>>> 1) A full discard deallocates a cluster so we can skip the operation if
>>>      it is already unallocated. With extended L2 entries however if any
>>>      of the subclusters has the 'all zeroes' bit set then we have to
>>>      clear it.
>>>
>>> 2) Setting the QCOW_OFLAG_ZERO bit of the L2 entry is forbidden if an
>>>      image has extended L2 entries. Instead, the individual 'all zeroes'
>>>      bits must be used.
>>>
>>> Signed-off-by: Alberto Garcia <berto@igalia.com>
>>> ---
>>>    block/qcow2-cluster.c | 18 +++++++++++++++---
>>>    1 file changed, 15 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
>>> index 746006a117..824c710760 100644
>>> --- a/block/qcow2-cluster.c
>>> +++ b/block/qcow2-cluster.c
>>> @@ -1790,12 +1790,20 @@ static int discard_in_l2_slice(BlockDriverState *bs, uint64_t offset,
>>>             * TODO We might want to use bdrv_block_status(bs) here, but we're
>>>             * holding s->lock, so that doesn't work today.
>>>             *
>>> -         * If full_discard is true, the sector should not read back as zeroes,
>>> +         * If full_discard is true, the cluster should not read back as zeroes,
>>>             * but rather fall through to the backing file.
>>>             */
>>>            switch (qcow2_get_cluster_type(bs, old_l2_entry)) {
>>>            case QCOW2_CLUSTER_UNALLOCATED:
>>> -            if (full_discard || !bs->backing) {
>>> +            if (full_discard) {
>>> +                /* If the image has extended L2 entries we can only
>>> +                 * skip this operation if the L2 bitmap is zero. */
>>> +                uint64_t bitmap = has_subclusters(s) ?
>>> +                    get_l2_bitmap(s, l2_slice, l2_index + i) : 0;
>>> +                if (bitmap == 0) {
>>> +                    continue;
>>> +                }
>>> +            } else if (!bs->backing) {
>>>                    continue;
>>>                }
>>
>> Hmm, so you do continue if full_discard is false AND bitmap != 0 &
>> !bs->backing,
> 
>> but you do not continue if full_discard is true AND bitmap != 0 &
>> !bs->backing (as you will not go to "else") branch.
> 
> 1. If full_discard is true it means that the entry and the bitmap should
>     always be set to 0, regardless of whether there's a backing file or
>     any other consideration.
> 
>     This is used e.g when shrinking an image, or by qcow2_make_empty().
> 
>     We can only skip this operation if both the entry and the bitmap are
>     already 0 (the former we know because of QCOW2_CLUSTER_UNALLOCATED).

Ah, understand, sorry. I thought that behavior was changed accidentally, but it is for purpose. With old code cluster is already unallocated, but with subclusters we may have some ZERO_PLAIN subclusters.

> 
> 2. If full_discard is false it means that we must ensure that the
>     cluster reads back as zeroes, but there's no need to clear the bitmap
>     (in fact we must set QCOW_OFLAG_ZERO or QCOW_L2_BITMAP_ALL_ZEROES
>     depending on the type of image).
> 
>     We can skip this operation if there's no backing file and the cluster
>     is already unallocated (because then we know that it already reads as
>     zeroes).
> 
>     One optimization would be to skip the operation also if the image has
>     subclusters and the bitmap is QCOW_L2_BITMAP_ALL_ZEROES, I can do
>     that for the next version.
> 
>> In case QCOW2_CLUSTER_ZERO_PLAIN, worth assert !has_subclusters(s) or
>> mark image corrupted ?
> 
> I think that should be handled directly in qcow2_get_cluster_type().
> 
> There's currently an inconsistency now that I think of it: if an image
> has subclusters and QCOW_OFLAG_ZERO set then qcow2_get_cluster_type()
> returns QCOW2_CLUSTER_ZERO_* but qcow2_get_subcluster_type() returns
> QCOW2_SUBCLUSTER_INVALID.
> 
> Two alternatives:
> 
>    - We add QCOW2_CLUSTER_INVALID so we get an error in both
>      cases. Problem: any function that calls qcow2_get_cluster_type()
>      should be modified to handle that.
> 
>    - We ignore QCOW_OFLAG_ZERO. Simpler, and it would allow us to use
>      that bit in the future if we wanted.
> 

Hmm. Actually we don't check other reserved bits. But ZERO bit is risky, we may miss data corruptions during transmission to the qcow2-subclusters world. So I'm for the first variant if it's not too huge.
Alberto Garcia April 23, 2020, 2:18 p.m. UTC | #7
On Wed 22 Apr 2020 08:09:53 PM CEST, Vladimir Sementsov-Ogievskiy wrote:
>> There's currently an inconsistency now that I think of it: if an image
>> has subclusters and QCOW_OFLAG_ZERO set then qcow2_get_cluster_type()
>> returns QCOW2_CLUSTER_ZERO_* but qcow2_get_subcluster_type() returns
>> QCOW2_SUBCLUSTER_INVALID.
>> 
>> Two alternatives:
>> 
>>    - We add QCOW2_CLUSTER_INVALID so we get an error in both
>>      cases. Problem: any function that calls qcow2_get_cluster_type()
>>      should be modified to handle that.
>> 
>>    - We ignore QCOW_OFLAG_ZERO. Simpler, and it would allow us to use
>>      that bit in the future if we wanted.
>> 
>
> Hmm. Actually we don't check other reserved bits. But ZERO bit is
> risky, we may miss data corruptions during transmission to the
> qcow2-subclusters world.

That's the best argument for checking that bit.

> So I'm for the first variant if it's not too huge.

The other problem is that if we ever want to use that bit for something
else then we would need to add an incompatible feature. If we just
ignore it now then we may be able to make it a compatible feature. But
the chances for that are low I think, and we still have 8 available bits
in the L2 entry.

Berto
diff mbox series

Patch

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 746006a117..824c710760 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1790,12 +1790,20 @@  static int discard_in_l2_slice(BlockDriverState *bs, uint64_t offset,
          * TODO We might want to use bdrv_block_status(bs) here, but we're
          * holding s->lock, so that doesn't work today.
          *
-         * If full_discard is true, the sector should not read back as zeroes,
+         * If full_discard is true, the cluster should not read back as zeroes,
          * but rather fall through to the backing file.
          */
         switch (qcow2_get_cluster_type(bs, old_l2_entry)) {
         case QCOW2_CLUSTER_UNALLOCATED:
-            if (full_discard || !bs->backing) {
+            if (full_discard) {
+                /* If the image has extended L2 entries we can only
+                 * skip this operation if the L2 bitmap is zero. */
+                uint64_t bitmap = has_subclusters(s) ?
+                    get_l2_bitmap(s, l2_slice, l2_index + i) : 0;
+                if (bitmap == 0) {
+                    continue;
+                }
+            } else if (!bs->backing) {
                 continue;
             }
             break;
@@ -1817,7 +1825,11 @@  static int discard_in_l2_slice(BlockDriverState *bs, uint64_t offset,
 
         /* First remove L2 entries */
         qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
-        if (!full_discard && s->qcow_version >= 3) {
+        if (has_subclusters(s)) {
+            set_l2_entry(s, l2_slice, l2_index + i, 0);
+            set_l2_bitmap(s, l2_slice, l2_index + i,
+                          full_discard ? 0 : QCOW_L2_BITMAP_ALL_ZEROES);
+        } else if (!full_discard && s->qcow_version >= 3) {
             set_l2_entry(s, l2_slice, l2_index + i, QCOW_OFLAG_ZERO);
         } else {
             set_l2_entry(s, l2_slice, l2_index + i, 0);