
[13/17] qcow2: Add new autoclear feature for all zero image

Message ID 20200131174436.2961874-14-eblake@redhat.com
State New
Series Improve qcow2 all-zero detection

Commit Message

Eric Blake Jan. 31, 2020, 5:44 p.m. UTC
With the recent introduction of BDRV_ZERO_OPEN, we can optimize
various qemu-img operations if we know the destination starts life
with all zero content.  For an image with no cluster allocations and
no backing file, this was already trivial with BDRV_ZERO_CREATE; but
for a fully preallocated image, it does not scale to crawl through the
entire L1/L2 tree to see if every cluster is currently marked as a
zero cluster.  But it is quite easy to add an autoclear bit to the
qcow2 file itself: the bit will be set after newly creating an image
or after qcow2_make_empty, and cleared on any other modification
(including by an older qemu that doesn't recognize the bit).
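
The autoclear behaviour described above can be sketched as follows. This
is only an illustration, not code from the patch: the three bit values
match the block/qcow2.h hunk, but the helper names
(autoclear_for_write, autoclear_after_data_write, image_claims_all_zero)
are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as defined in the block/qcow2.h hunk of this patch. */
#define AUTOCLEAR_BITMAPS        (UINT64_C(1) << 0)
#define AUTOCLEAR_DATA_FILE_RAW  (UINT64_C(1) << 1)
#define AUTOCLEAR_ALL_ZERO       (UINT64_C(1) << 2)

/*
 * Hypothetical helper: the autoclear field a writer stores back before it
 * modifies the image.  'known_mask' holds the bits this writer understands;
 * every other bit is dropped, which is how an older qemu ends up clearing
 * the all-zero bit without knowing what it means.
 */
static uint64_t autoclear_for_write(uint64_t on_disk, uint64_t known_mask)
{
    return on_disk & known_mask;
}

/* Hypothetical helper: any data modification invalidates the claim. */
static uint64_t autoclear_after_data_write(uint64_t features)
{
    return features & ~AUTOCLEAR_ALL_ZERO;
}

/* Hypothetical helper: does the header claim the image reads as zero? */
static bool image_claims_all_zero(uint64_t features)
{
    return (features & AUTOCLEAR_ALL_ZERO) != 0;
}
```

With a known mask of only the bitmaps and raw-data-file bits (the value
QCOW2_AUTOCLEAR_MASK keeps in this patch), autoclear_for_write() drops
the all-zero bit, which is the "mask the bit out" behaviour mentioned
for bisection.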

This patch documents the new bit, independently of implementing the
places in code that should set it (which means that for bisection
purposes, it is safer to still mask the bit out when opening an image
with the bit set).

A few iotests have updated output due to the larger number of named
header features.
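
The size change in the iotests output follows from the feature name
table layout in docs/interop/qcow2.txt, where each entry is 48 bytes
(type, bit number, 46-byte zero-padded name). A small sketch of the
arithmetic, with struct and function names that are illustrative only:

```c
#include <stddef.h>
#include <stdint.h>

/* One feature name table entry; field layout per docs/interop/qcow2.txt. */
struct feature_name_entry {
    uint8_t type;      /* 0 = incompatible, 1 = compatible, 2 = autoclear */
    uint8_t bit;       /* bit number within that feature field */
    char    name[46];  /* feature name, padded with zeros */
};

/* Length in bytes of a feature name table with the given entry count. */
static size_t feature_table_len(size_t named_features)
{
    return named_features * sizeof(struct feature_name_entry);
}
```

Going from six to seven named features grows the extension from 288 to
336 bytes, and the same 48-byte (0x30) shift moves backing_file_offset
from 0x1d8 to 0x208 in 031.out.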

Signed-off-by: Eric Blake <eblake@redhat.com>

---
RFC: As written, this patch defines the bit to be clear if any
cluster defers to a backing file. But the block layer would handle
things just fine if we instead allowed the bit to be set if all
clusters allocated in this image are zero, even if there are other
clusters not allocated.  Or maybe we want TWO bits: one if all
clusters allocated here are known zero, and a second if we know that
there are any clusters that defer to a backing image.
---
 block/qcow2.c              |  9 +++++++++
 block/qcow2.h              |  3 +++
 docs/interop/qcow2.txt     | 12 +++++++++++-
 qapi/block-core.json       |  4 ++++
 tests/qemu-iotests/031.out |  8 ++++----
 tests/qemu-iotests/036.out |  4 ++--
 tests/qemu-iotests/061.out | 14 +++++++-------
 7 files changed, 40 insertions(+), 14 deletions(-)

Comments

Vladimir Sementsov-Ogievskiy Feb. 3, 2020, 5:45 p.m. UTC | #1
31.01.2020 20:44, Eric Blake wrote:
> With the recent introduction of BDRV_ZERO_OPEN, we can optimize
> various qemu-img operations if we know the destination starts life
> with all zero content.  For an image with no cluster allocations and
> no backing file, this was already trivial with BDRV_ZERO_CREATE; but
> for a fully preallocated image, it does not scale to crawl through the
> entire L1/L2 tree to see if every cluster is currently marked as a
> zero cluster.  But it is quite easy to add an autoclear bit to the
> qcow2 file itself: the bit will be set after newly creating an image
> or after qcow2_make_empty, and cleared on any other modification
> (including by an older qemu that doesn't recognize the bit).
> 
> This patch documents the new bit, independently of implementing the
> places in code that should set it (which means that for bisection
> purposes, it is safer to still mask the bit out when opening an image
> with the bit set).
> 
> A few iotests have updated output due to the larger number of named
> header features.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> 
> ---
> RFC: As defined in this patch, I defined the bit to be clear if any
> cluster defers to a backing file. But the block layer would handle
> things just fine if we instead allowed the bit to be set if all
> clusters allocated in this image are zero, even if there are other
> clusters not allocated.  Or maybe we want TWO bits: one if all
> clusters allocated here are known zero, and a second if we know that
> there are any clusters that defer to a backing image.
> ---
>   block/qcow2.c              |  9 +++++++++
>   block/qcow2.h              |  3 +++
>   docs/interop/qcow2.txt     | 12 +++++++++++-
>   qapi/block-core.json       |  4 ++++
>   tests/qemu-iotests/031.out |  8 ++++----
>   tests/qemu-iotests/036.out |  4 ++--
>   tests/qemu-iotests/061.out | 14 +++++++-------
>   7 files changed, 40 insertions(+), 14 deletions(-)
> 
> diff --git a/block/qcow2.c b/block/qcow2.c
> index 9f2371925737..20cce9410c84 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -2859,6 +2859,11 @@ int qcow2_update_header(BlockDriverState *bs)
>                   .bit  = QCOW2_AUTOCLEAR_DATA_FILE_RAW_BITNR,
>                   .name = "raw external data",
>               },
> +            {
> +                .type = QCOW2_FEAT_TYPE_AUTOCLEAR,
> +                .bit  = QCOW2_AUTOCLEAR_ALL_ZERO_BITNR,
> +                .name = "all zero",
> +            },
>           };
> 
>           ret = header_ext_add(buf, QCOW2_EXT_MAGIC_FEATURE_TABLE,
> @@ -4874,6 +4879,10 @@ static ImageInfoSpecific *qcow2_get_specific_info(BlockDriverState *bs,
>               .corrupt            = s->incompatible_features &
>                                     QCOW2_INCOMPAT_CORRUPT,
>               .has_corrupt        = true,
> +            .all_zero           = s->autoclear_features &
> +                                  QCOW2_AUTOCLEAR_ALL_ZERO,
> +            .has_all_zero       = s->autoclear_features &
> +                                  QCOW2_AUTOCLEAR_ALL_ZERO,
>               .refcount_bits      = s->refcount_bits,
>               .has_bitmaps        = !!bitmaps,
>               .bitmaps            = bitmaps,
> diff --git a/block/qcow2.h b/block/qcow2.h
> index 094212623257..6fc2d323d753 100644
> --- a/block/qcow2.h
> +++ b/block/qcow2.h
> @@ -237,11 +237,14 @@ enum {
>   enum {
>       QCOW2_AUTOCLEAR_BITMAPS_BITNR       = 0,
>       QCOW2_AUTOCLEAR_DATA_FILE_RAW_BITNR = 1,
> +    QCOW2_AUTOCLEAR_ALL_ZERO_BITNR      = 2,
>       QCOW2_AUTOCLEAR_BITMAPS             = 1 << QCOW2_AUTOCLEAR_BITMAPS_BITNR,
>       QCOW2_AUTOCLEAR_DATA_FILE_RAW       = 1 << QCOW2_AUTOCLEAR_DATA_FILE_RAW_BITNR,
> +    QCOW2_AUTOCLEAR_ALL_ZERO            = 1 << QCOW2_AUTOCLEAR_ALL_ZERO_BITNR,
> 
>       QCOW2_AUTOCLEAR_MASK                = QCOW2_AUTOCLEAR_BITMAPS
>                                           | QCOW2_AUTOCLEAR_DATA_FILE_RAW,
> +    /* TODO: Add _ALL_ZERO to _MASK once it is handled correctly */
>   };
> 
>   enum qcow2_discard_type {
> diff --git a/docs/interop/qcow2.txt b/docs/interop/qcow2.txt
> index 8510d74c8079..d435363a413c 100644
> --- a/docs/interop/qcow2.txt
> +++ b/docs/interop/qcow2.txt
> @@ -153,7 +153,17 @@ in the description of a field.
>                                   File bit (incompatible feature bit 1) is also
>                                   set.
> 
> -                    Bits 2-63:  Reserved (set to 0)
> +                    Bit 2:      All zero image bit
> +                                If this bit is set, the entire image reads
> +                                as all zeroes. This can be useful for
> +                                detecting just-created images even when
> +                                clusters are preallocated, which in turn
> +                                can be used to optimize image copying.
> +
> +                                This bit should not be set if any cluster
> +                                in the image defers to a backing file.

Hmm. The term "defers to a backing file" is not defined in the spec. And, as I
understand it, it can't be defined by design. A backing file may be added/removed/changed
dynamically, and the qcow2 driver will not know about it. So, the only way to
be sure that clusters do not defer to a backing file is to make them
ZERO clusters (not UNALLOCATED). But this is inefficient, as we'd have to
allocate all L2 tables.

So, I think it is better to define this flag as "all allocated clusters are zero".

Hmm, interesting: in the qcow2 spec, "allocated" means allocated on disk with an
offset. So, a ZERO cluster is actually an unallocated cluster with bit 0 of its
L2 entry set to 1. On the other hand, the qemu block layer considers ZERO
clusters "allocated" (from the POV of the backing chain).

So, if we define it as "all allocated clusters are zero", we are done:
other clusters are either unallocated and MAY refer to backing, so we
can say nothing about their read-as-zero status at the level of qcow2
spec, or unallocated with zero-bit set, which are normal ZERO clusters.

So, at the level of the qcow2 driver, I think it's better to consider only this
image. Still, we can implement a generic bdrv_is_all_zeros, which will
check all layers (or at least, check that bs->backing is NULL).


> +
> +                    Bits 3-63:  Reserved (set to 0)
> 
>            96 -  99:  refcount_order
>                       Describes the width of a reference count block entry (width
> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index ef94a296868f..af837ed5af33 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -71,6 +71,9 @@
>   # @corrupt: true if the image has been marked corrupt; only valid for
>   #           compat >= 1.1 (since 2.2)
>   #
> +# @all-zero: present and true only if the image is known to read as all
> +#            zeroes (since 5.0)
> +#
>   # @refcount-bits: width of a refcount entry in bits (since 2.3)
>   #
>   # @encrypt: details about encryption parameters; only set if image
> @@ -87,6 +90,7 @@
>         '*data-file-raw': 'bool',
>         '*lazy-refcounts': 'bool',
>         '*corrupt': 'bool',
> +      '*all-zero': 'bool',
>         'refcount-bits': 'int',
>         '*encrypt': 'ImageInfoSpecificQCow2Encryption',
>         '*bitmaps': ['Qcow2BitmapInfo']
> diff --git a/tests/qemu-iotests/031.out b/tests/qemu-iotests/031.out
> index 46f97c5a4ea4..bb1afa7b87f6 100644
> --- a/tests/qemu-iotests/031.out
> +++ b/tests/qemu-iotests/031.out
> @@ -117,7 +117,7 @@ header_length             104
> 
>   Header extension:
>   magic                     0x6803f857
> -length                    288
> +length                    336
>   data                      <binary>
> 
>   Header extension:
> @@ -150,7 +150,7 @@ header_length             104
> 
>   Header extension:
>   magic                     0x6803f857
> -length                    288
> +length                    336
>   data                      <binary>
> 
>   Header extension:
> @@ -164,7 +164,7 @@ No errors were found on the image.
> 
>   magic                     0x514649fb
>   version                   3
> -backing_file_offset       0x1d8
> +backing_file_offset       0x208
>   backing_file_size         0x17
>   cluster_bits              16
>   size                      67108864
> @@ -188,7 +188,7 @@ data                      'host_device'
> 
>   Header extension:
>   magic                     0x6803f857
> -length                    288
> +length                    336
>   data                      <binary>
> 
>   Header extension:
> diff --git a/tests/qemu-iotests/036.out b/tests/qemu-iotests/036.out
> index 23b699ce0622..e409acf60e2b 100644
> --- a/tests/qemu-iotests/036.out
> +++ b/tests/qemu-iotests/036.out
> @@ -26,7 +26,7 @@ compatible_features       []
>   autoclear_features        [63]
>   Header extension:
>   magic                     0x6803f857
> -length                    288
> +length                    336
>   data                      <binary>
> 
> 
> @@ -38,7 +38,7 @@ compatible_features       []
>   autoclear_features        []
>   Header extension:
>   magic                     0x6803f857
> -length                    288
> +length                    336
>   data                      <binary>
> 
>   *** done
> diff --git a/tests/qemu-iotests/061.out b/tests/qemu-iotests/061.out
> index 413cc4e0f4ab..d873f79bb606 100644
> --- a/tests/qemu-iotests/061.out
> +++ b/tests/qemu-iotests/061.out
> @@ -26,7 +26,7 @@ header_length             104
> 
>   Header extension:
>   magic                     0x6803f857
> -length                    288
> +length                    336
>   data                      <binary>
> 
>   magic                     0x514649fb
> @@ -84,7 +84,7 @@ header_length             104
> 
>   Header extension:
>   magic                     0x6803f857
> -length                    288
> +length                    336
>   data                      <binary>
> 
>   magic                     0x514649fb
> @@ -140,7 +140,7 @@ header_length             104
> 
>   Header extension:
>   magic                     0x6803f857
> -length                    288
> +length                    336
>   data                      <binary>
> 
>   ERROR cluster 5 refcount=0 reference=1
> @@ -195,7 +195,7 @@ header_length             104
> 
>   Header extension:
>   magic                     0x6803f857
> -length                    288
> +length                    336
>   data                      <binary>
> 
>   magic                     0x514649fb
> @@ -264,7 +264,7 @@ header_length             104
> 
>   Header extension:
>   magic                     0x6803f857
> -length                    288
> +length                    336
>   data                      <binary>
> 
>   read 65536/65536 bytes at offset 44040192
> @@ -298,7 +298,7 @@ header_length             104
> 
>   Header extension:
>   magic                     0x6803f857
> -length                    288
> +length                    336
>   data                      <binary>
> 
>   ERROR cluster 5 refcount=0 reference=1
> @@ -327,7 +327,7 @@ header_length             104
> 
>   Header extension:
>   magic                     0x6803f857
> -length                    288
> +length                    336
>   data                      <binary>
> 
>   read 131072/131072 bytes at offset 0
>
Eric Blake Feb. 4, 2020, 1:12 p.m. UTC | #2
On 2/3/20 11:45 AM, Vladimir Sementsov-Ogievskiy wrote:
> 31.01.2020 20:44, Eric Blake wrote:
>> With the recent introduction of BDRV_ZERO_OPEN, we can optimize
>> various qemu-img operations if we know the destination starts life
>> with all zero content.  For an image with no cluster allocations and
>> no backing file, this was already trivial with BDRV_ZERO_CREATE; but
>> for a fully preallocated image, it does not scale to crawl through the
>> entire L1/L2 tree to see if every cluster is currently marked as a
>> zero cluster.  But it is quite easy to add an autoclear bit to the
>> qcow2 file itself: the bit will be set after newly creating an image
>> or after qcow2_make_empty, and cleared on any other modification
>> (including by an older qemu that doesn't recognize the bit).
>>
>> This patch documents the new bit, independently of implementing the
>> places in code that should set it (which means that for bisection
>> purposes, it is safer to still mask the bit out when opening an image
>> with the bit set).
>>
>> A few iotests have updated output due to the larger number of named
>> header features.
>>
>> Signed-off-by: Eric Blake <eblake@redhat.com>
>>
>> ---
>> RFC: As defined in this patch, I defined the bit to be clear if any
>> cluster defers to a backing file. But the block layer would handle
>> things just fine if we instead allowed the bit to be set if all
>> clusters allocated in this image are zero, even if there are other
>> clusters not allocated.  Or maybe we want TWO bits: one if all
>> clusters allocated here are known zero, and a second if we know that
>> there are any clusters that defer to a backing image.

>> -                    Bits 2-63:  Reserved (set to 0)
>> +                    Bit 2:      All zero image bit
>> +                                If this bit is set, the entire image reads
>> +                                as all zeroes. This can be useful for
>> +                                detecting just-created images even when
>> +                                clusters are preallocated, which in turn
>> +                                can be used to optimize image copying.
>> +
>> +                                This bit should not be set if any cluster
>> +                                in the image defers to a backing file.
> 
> Hmm. The term "defers to a backing file" is not defined in the spec. And, as I
> understand it, it can't be defined by design. A backing file may be added/removed/changed
> dynamically, and the qcow2 driver will not know about it. So, the only way to
> be sure that clusters do not defer to a backing file is to make them
> ZERO clusters (not UNALLOCATED). But this is inefficient, as we'd have to
> allocate all L2 tables.
> 
> So, I think it is better to define this flag as "all allocated clusters are zero".

That was precisely the topic of my RFC question.

I _do_ think it is simpler to report that 'all clusters where content 
comes from _this_ image read as zero', leaving unallocated clusters as 
zero only if 1. there is no backing image, or 2. the backing image also 
reads as all zero (recursing as needed).  I'll spin v2 of these patches 
along those lines, although I'm hoping for more review on the rest of 
the series, first.

> 
> Hmm, interesting: in the qcow2 spec, "allocated" means allocated on disk with an
> offset. So, a ZERO cluster is actually an unallocated cluster with bit 0 of its
> L2 entry set to 1. On the other hand, the qemu block layer considers ZERO
> clusters "allocated" (from the POV of the backing chain).

I really want the definition to be 'any cluster whose contents come from 
this layer' (the qemu-io definition of allocated, not necessarily the 
qcow2 definition of allocated), which picks up BOTH types of qcow2 zero 
clusters (those preallocated but marked 0, where the contents of the 
allocated area are indeterminate but never read, and those unallocated 
but marked 0 which do not defer to the backing layer).  Whether or not 
the cluster is allocated is less important than whether the image reads 
as 0 at that cluster.
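
To make the two notions of "allocated" concrete, here is a rough sketch
that classifies a standard L2 entry. The bit layout (bit 0 = reads as
zeroes, bits 9-55 = host cluster offset, bit 62 = compressed) follows
docs/interop/qcow2.txt; the enum and function names are made up for
illustration, and the sketch assumes no external data file.

```c
#include <stdbool.h>
#include <stdint.h>

#define L2E_ZERO_FLAG    (UINT64_C(1) << 0)            /* reads as zeroes */
#define L2E_OFFSET_MASK  UINT64_C(0x00fffffffffffe00)  /* bits 9-55 */
#define L2E_COMPRESSED   (UINT64_C(1) << 62)

enum cluster_kind {
    CLUSTER_UNALLOCATED,  /* no offset, no zero flag: may defer to backing */
    CLUSTER_ZERO_PLAIN,   /* zero flag set, no host offset */
    CLUSTER_ZERO_ALLOC,   /* zero flag set, host offset preallocated */
    CLUSTER_DATA,         /* normal or compressed data cluster */
};

/* Illustrative classifier for one L2 entry (assumes no external data file). */
static enum cluster_kind classify_l2_entry(uint64_t entry)
{
    if (entry & L2E_COMPRESSED) {
        return CLUSTER_DATA;
    }
    if (entry & L2E_ZERO_FLAG) {
        return (entry & L2E_OFFSET_MASK) ? CLUSTER_ZERO_ALLOC
                                         : CLUSTER_ZERO_PLAIN;
    }
    return (entry & L2E_OFFSET_MASK) ? CLUSTER_DATA : CLUSTER_UNALLOCATED;
}

/* "Allocated" in the block-layer sense: content comes from this layer. */
static bool content_from_this_layer(enum cluster_kind kind)
{
    return kind != CLUSTER_UNALLOCATED;
}

/* Both zero-cluster flavours read as zero regardless of any backing file. */
static bool reads_as_zero_here(enum cluster_kind kind)
{
    return kind == CLUSTER_ZERO_PLAIN || kind == CLUSTER_ZERO_ALLOC;
}
```

Under the definition I want, the bit may be set only when every cluster
for which content_from_this_layer() is true also satisfies
reads_as_zero_here().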

But I think that you are right that an alternative definition of 'all 
allocated clusters are zero' will give the same results when crawling 
through the backing chain to learn if the overall image reads as zero, 
and that's all the more that we can expect out of this bit.

> 
> So, if we define it as "all allocated clusters are zero", we are done:
> other clusters are either unallocated and MAY refer to backing, so we
> can say nothing about their read-as-zero status at the level of qcow2
> spec, or unallocated with zero-bit set, which are normal ZERO clusters.
> 
> So, at the level of the qcow2 driver, I think it's better to consider only this
> image. Still, we can implement a generic bdrv_is_all_zeros, which will
> check all layers (or at least, check that bs->backing is NULL).

The earlier parts of this series, which renamed bdrv_has_zero_init() to 
bdrv_known_zeroes(), do just that: the function already handles recursion 
through the backing chain, and insists that an image is all zeroes with 
respect to BDRV_ZERO_OPEN only if all layers of the backing chain agree.
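
As a toy model of that rule (this is not the bdrv_known_zeroes()
implementation; struct layer and chain_reads_as_zero are invented for
the example), the whole chain reads as zero only if every layer makes
the claim for its own contents:

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for one image in a backing chain. */
struct layer {
    bool all_zero;                /* this layer's own contents read as zero */
    const struct layer *backing;  /* next layer down, or NULL if none */
};

/*
 * Walk from the top image down through its backing chain; a single layer
 * that cannot promise zeroes spoils the result for the whole chain.
 */
static bool chain_reads_as_zero(const struct layer *top)
{
    for (const struct layer *l = top; l != NULL; l = l->backing) {
        if (!l->all_zero) {
            return false;
        }
    }
    return true;
}
```

So a per-image bit that only covers "clusters whose contents come from
this layer" is enough, as long as the caller walks the chain.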
Vladimir Sementsov-Ogievskiy Feb. 4, 2020, 1:29 p.m. UTC | #3
04.02.2020 16:12, Eric Blake wrote:
> On 2/3/20 11:45 AM, Vladimir Sementsov-Ogievskiy wrote:
>> 31.01.2020 20:44, Eric Blake wrote:
>>> With the recent introduction of BDRV_ZERO_OPEN, we can optimize
>>> various qemu-img operations if we know the destination starts life
>>> with all zero content.  For an image with no cluster allocations and
>>> no backing file, this was already trivial with BDRV_ZERO_CREATE; but
>>> for a fully preallocated image, it does not scale to crawl through the
>>> entire L1/L2 tree to see if every cluster is currently marked as a
>>> zero cluster.  But it is quite easy to add an autoclear bit to the
>>> qcow2 file itself: the bit will be set after newly creating an image
>>> or after qcow2_make_empty, and cleared on any other modification
>>> (including by an older qemu that doesn't recognize the bit).
>>>
>>> This patch documents the new bit, independently of implementing the
>>> places in code that should set it (which means that for bisection
>>> purposes, it is safer to still mask the bit out when opening an image
>>> with the bit set).
>>>
>>> A few iotests have updated output due to the larger number of named
>>> header features.
>>>
>>> Signed-off-by: Eric Blake <eblake@redhat.com>
>>>
>>> ---
>>> RFC: As defined in this patch, I defined the bit to be clear if any
>>> cluster defers to a backing file. But the block layer would handle
>>> things just fine if we instead allowed the bit to be set if all
>>> clusters allocated in this image are zero, even if there are other
>>> clusters not allocated.  Or maybe we want TWO bits: one if all
>>> clusters allocated here are known zero, and a second if we know that
>>> there are any clusters that defer to a backing image.
> 
>>> -                    Bits 2-63:  Reserved (set to 0)
>>> +                    Bit 2:      All zero image bit
>>> +                                If this bit is set, the entire image reads
>>> +                                as all zeroes. This can be useful for
>>> +                                detecting just-created images even when
>>> +                                clusters are preallocated, which in turn
>>> +                                can be used to optimize image copying.
>>> +
>>> +                                This bit should not be set if any cluster
>>> +                                in the image defers to a backing file.
>>
>> Hmm. The term "defers to a backing file" is not defined in the spec. And, as I
>> understand it, it can't be defined by design. A backing file may be added/removed/changed
>> dynamically, and the qcow2 driver will not know about it. So, the only way to
>> be sure that clusters do not defer to a backing file is to make them
>> ZERO clusters (not UNALLOCATED). But this is inefficient, as we'd have to
>> allocate all L2 tables.
>>
>> So, I think it is better to define this flag as "all allocated clusters are zero".
> 
> That was precisely the topic of my RFC question.

Yes, and this is what I think about it :)  It looks like I worded my reply as
if I hadn't seen the RFC and took this for a final patch;
sorry for that.

> 
> I _do_ think it is simpler to report that 'all clusters where content comes from _this_ image read as zero', leaving unallocated clusters as zero only if 1. there is no backing image, or 2. the backing image also reads as all zero (recursing as needed).  I'll spin v2 of these patches along those lines, although I'm hoping for more review on the rest of the series, first.

Still, I'm not sure that it makes sense to consider backing at all. In my view,
backing is up to the user. The user may load the backing file that is specified in the
qcow2 header, but at the same time, the user may choose some other backing file.
The backing file is an "external" thing, so it may be better not to rely on it.

> 
>>
>> Hmm, interesting: in the qcow2 spec, "allocated" means allocated on disk with an
>> offset. So, a ZERO cluster is actually an unallocated cluster with bit 0 of its
>> L2 entry set to 1. On the other hand, the qemu block layer considers ZERO
>> clusters "allocated" (from the POV of the backing chain).
> 
> I really want the definition to be 'any cluster whose contents come from this layer' (the qemu-io definition of allocated, not necessarily the qcow2 definition of allocated), which picks up BOTH types of qcow2 zero clusters (those preallocated but marked 0, where the contents of the allocated area are indeterminate but never read, and those unallocated but marked 0 which do not defer to the backing layer).  Whether or not the cluster is allocated is less important than whether the image reads as 0 at that cluster.
> 
> But I think that you are right that an alternative definition of 'all allocated clusters are zero' will give the same results when crawling through the backing chain to learn if the overall image reads as zero, and that's all the more that we can expect out of this bit.

Yes, the two are equivalent, because unallocated clusters marked as ZERO read as zero anyway.

> 
>>
>> So, if we define it as "all allocated clusters are zero", we are done:
>> other clusters are either unallocated and MAY refer to backing, so we
>> can say nothing about their read-as-zero status at the level of qcow2
>> spec, or unallocated with zero-bit set, which are normal ZERO clusters.
>>
>> So, at the level of the qcow2 driver, I think it's better to consider only this
>> image. Still, we can implement a generic bdrv_is_all_zeros, which will
>> check all layers (or at least, check that bs->backing is NULL).
> 
> The earlier parts of this series, which renamed bdrv_has_zero_init() to bdrv_known_zeroes(), do just that: the function already handles recursion through the backing chain, and insists that an image is all zeroes with respect to BDRV_ZERO_OPEN only if all layers of the backing chain agree.
> 

Great. I'll look at other patches soon.

Patch

diff --git a/block/qcow2.c b/block/qcow2.c
index 9f2371925737..20cce9410c84 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2859,6 +2859,11 @@  int qcow2_update_header(BlockDriverState *bs)
                 .bit  = QCOW2_AUTOCLEAR_DATA_FILE_RAW_BITNR,
                 .name = "raw external data",
             },
+            {
+                .type = QCOW2_FEAT_TYPE_AUTOCLEAR,
+                .bit  = QCOW2_AUTOCLEAR_ALL_ZERO_BITNR,
+                .name = "all zero",
+            },
         };

         ret = header_ext_add(buf, QCOW2_EXT_MAGIC_FEATURE_TABLE,
@@ -4874,6 +4879,10 @@  static ImageInfoSpecific *qcow2_get_specific_info(BlockDriverState *bs,
             .corrupt            = s->incompatible_features &
                                   QCOW2_INCOMPAT_CORRUPT,
             .has_corrupt        = true,
+            .all_zero           = s->autoclear_features &
+                                  QCOW2_AUTOCLEAR_ALL_ZERO,
+            .has_all_zero       = s->autoclear_features &
+                                  QCOW2_AUTOCLEAR_ALL_ZERO,
             .refcount_bits      = s->refcount_bits,
             .has_bitmaps        = !!bitmaps,
             .bitmaps            = bitmaps,
diff --git a/block/qcow2.h b/block/qcow2.h
index 094212623257..6fc2d323d753 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -237,11 +237,14 @@  enum {
 enum {
     QCOW2_AUTOCLEAR_BITMAPS_BITNR       = 0,
     QCOW2_AUTOCLEAR_DATA_FILE_RAW_BITNR = 1,
+    QCOW2_AUTOCLEAR_ALL_ZERO_BITNR      = 2,
     QCOW2_AUTOCLEAR_BITMAPS             = 1 << QCOW2_AUTOCLEAR_BITMAPS_BITNR,
     QCOW2_AUTOCLEAR_DATA_FILE_RAW       = 1 << QCOW2_AUTOCLEAR_DATA_FILE_RAW_BITNR,
+    QCOW2_AUTOCLEAR_ALL_ZERO            = 1 << QCOW2_AUTOCLEAR_ALL_ZERO_BITNR,

     QCOW2_AUTOCLEAR_MASK                = QCOW2_AUTOCLEAR_BITMAPS
                                         | QCOW2_AUTOCLEAR_DATA_FILE_RAW,
+    /* TODO: Add _ALL_ZERO to _MASK once it is handled correctly */
 };

 enum qcow2_discard_type {
diff --git a/docs/interop/qcow2.txt b/docs/interop/qcow2.txt
index 8510d74c8079..d435363a413c 100644
--- a/docs/interop/qcow2.txt
+++ b/docs/interop/qcow2.txt
@@ -153,7 +153,17 @@  in the description of a field.
                                 File bit (incompatible feature bit 1) is also
                                 set.

-                    Bits 2-63:  Reserved (set to 0)
+                    Bit 2:      All zero image bit
+                                If this bit is set, the entire image reads
+                                as all zeroes. This can be useful for
+                                detecting just-created images even when
+                                clusters are preallocated, which in turn
+                                can be used to optimize image copying.
+
+                                This bit should not be set if any cluster
+                                in the image defers to a backing file.
+
+                    Bits 3-63:  Reserved (set to 0)

          96 -  99:  refcount_order
                     Describes the width of a reference count block entry (width
diff --git a/qapi/block-core.json b/qapi/block-core.json
index ef94a296868f..af837ed5af33 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -71,6 +71,9 @@ 
 # @corrupt: true if the image has been marked corrupt; only valid for
 #           compat >= 1.1 (since 2.2)
 #
+# @all-zero: present and true only if the image is known to read as all
+#            zeroes (since 5.0)
+#
 # @refcount-bits: width of a refcount entry in bits (since 2.3)
 #
 # @encrypt: details about encryption parameters; only set if image
@@ -87,6 +90,7 @@ 
       '*data-file-raw': 'bool',
       '*lazy-refcounts': 'bool',
       '*corrupt': 'bool',
+      '*all-zero': 'bool',
       'refcount-bits': 'int',
       '*encrypt': 'ImageInfoSpecificQCow2Encryption',
       '*bitmaps': ['Qcow2BitmapInfo']
diff --git a/tests/qemu-iotests/031.out b/tests/qemu-iotests/031.out
index 46f97c5a4ea4..bb1afa7b87f6 100644
--- a/tests/qemu-iotests/031.out
+++ b/tests/qemu-iotests/031.out
@@ -117,7 +117,7 @@  header_length             104

 Header extension:
 magic                     0x6803f857
-length                    288
+length                    336
 data                      <binary>

 Header extension:
@@ -150,7 +150,7 @@  header_length             104

 Header extension:
 magic                     0x6803f857
-length                    288
+length                    336
 data                      <binary>

 Header extension:
@@ -164,7 +164,7 @@  No errors were found on the image.

 magic                     0x514649fb
 version                   3
-backing_file_offset       0x1d8
+backing_file_offset       0x208
 backing_file_size         0x17
 cluster_bits              16
 size                      67108864
@@ -188,7 +188,7 @@  data                      'host_device'

 Header extension:
 magic                     0x6803f857
-length                    288
+length                    336
 data                      <binary>

 Header extension:
diff --git a/tests/qemu-iotests/036.out b/tests/qemu-iotests/036.out
index 23b699ce0622..e409acf60e2b 100644
--- a/tests/qemu-iotests/036.out
+++ b/tests/qemu-iotests/036.out
@@ -26,7 +26,7 @@  compatible_features       []
 autoclear_features        [63]
 Header extension:
 magic                     0x6803f857
-length                    288
+length                    336
 data                      <binary>


@@ -38,7 +38,7 @@  compatible_features       []
 autoclear_features        []
 Header extension:
 magic                     0x6803f857
-length                    288
+length                    336
 data                      <binary>

 *** done
diff --git a/tests/qemu-iotests/061.out b/tests/qemu-iotests/061.out
index 413cc4e0f4ab..d873f79bb606 100644
--- a/tests/qemu-iotests/061.out
+++ b/tests/qemu-iotests/061.out
@@ -26,7 +26,7 @@  header_length             104

 Header extension:
 magic                     0x6803f857
-length                    288
+length                    336
 data                      <binary>

 magic                     0x514649fb
@@ -84,7 +84,7 @@  header_length             104

 Header extension:
 magic                     0x6803f857
-length                    288
+length                    336
 data                      <binary>

 magic                     0x514649fb
@@ -140,7 +140,7 @@  header_length             104

 Header extension:
 magic                     0x6803f857
-length                    288
+length                    336
 data                      <binary>

 ERROR cluster 5 refcount=0 reference=1
@@ -195,7 +195,7 @@  header_length             104

 Header extension:
 magic                     0x6803f857
-length                    288
+length                    336
 data                      <binary>

 magic                     0x514649fb
@@ -264,7 +264,7 @@  header_length             104

 Header extension:
 magic                     0x6803f857
-length                    288
+length                    336
 data                      <binary>

 read 65536/65536 bytes at offset 44040192
@@ -298,7 +298,7 @@  header_length             104

 Header extension:
 magic                     0x6803f857
-length                    288
+length                    336
 data                      <binary>

 ERROR cluster 5 refcount=0 reference=1
@@ -327,7 +327,7 @@  header_length             104

 Header extension:
 magic                     0x6803f857
-length                    288
+length                    336
 data                      <binary>

 read 131072/131072 bytes at offset 0