diff mbox series

[for-4.1?,1/4] qcow2: Fix .bdrv_has_zero_init()

Message ID 20190715104508.7568-2-mreitz@redhat.com
State New
Headers show
Series block: Fix three .bdrv_has_zero_init()s | expand

Commit Message

Max Reitz July 15, 2019, 10:45 a.m. UTC
If a qcow2 file is preallocated, it can no longer guarantee that it
initially appears as filled with zeroes.

So implement .bdrv_has_zero_init() by checking whether the file is
preallocated; if so, forward the call to the underlying storage node,
except for when it is encrypted: Encrypted preallocated images always
return effectively random data, so .bdrv_has_zero_init() must always
return 0 for them.

Reported-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/qcow2.c | 90 ++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 89 insertions(+), 1 deletion(-)

Comments

Kevin Wolf July 16, 2019, 4:54 p.m. UTC | #1
Am 15.07.2019 um 12:45 hat Max Reitz geschrieben:
> If a qcow2 file is preallocated, it can no longer guarantee that it
> initially appears as filled with zeroes.
> 
> So implement .bdrv_has_zero_init() by checking whether the file is
> preallocated; if so, forward the call to the underlying storage node,
> except for when it is encrypted: Encrypted preallocated images always
> return effectively random data, so .bdrv_has_zero_init() must always
> return 0 for them.
> 
> Reported-by: Stefano Garzarella <sgarzare@redhat.com>
> Signed-off-by: Max Reitz <mreitz@redhat.com>

Hm... This patch only really works directly after image creation (which
is indeed where .bdrv_has_zero_init is used). Why do we have to have a
full qcow2_is_zero() that loops over the whole image just to find out
whether it's preallocated? Wouldn't looking at a single data cluster be
enough?

Kevin
Max Reitz July 17, 2019, 8:37 a.m. UTC | #2
On 16.07.19 18:54, Kevin Wolf wrote:
> Am 15.07.2019 um 12:45 hat Max Reitz geschrieben:
>> If a qcow2 file is preallocated, it can no longer guarantee that it
>> initially appears as filled with zeroes.
>>
>> So implement .bdrv_has_zero_init() by checking whether the file is
>> preallocated; if so, forward the call to the underlying storage node,
>> except for when it is encrypted: Encrypted preallocated images always
>> return effectively random data, so .bdrv_has_zero_init() must always
>> return 0 for them.
>>
>> Reported-by: Stefano Garzarella <sgarzare@redhat.com>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
> 
> Hm... This patch only really works directly after image creation (which
> is indeed where .bdrv_has_zero_init is used). Why do we have to have a
> full qcow2_is_zero() that loops over the whole image just to find out
> whether it's preallocated? Wouldn't looking at a single data cluster be
> enough?

Hm.  I would like to agree (because you’re right), but now I see that
the callers of bdrv_has_zero_init() don’t necessarily hold to that
convention.

For example, qemu-img convert has the -n flag, but that doesn’t stop it
from invoking bdrv_has_zero_init().

Which is a bug, of course.

$ ./qemu-img create -f qcow2 src.qcow2 64M
$ ./qemu-img create -f qcow2 dest.qcow2 64M
$ ./qemu-io -c 'write -P 42 0 64M' dest.qcow2
$ ./qemu-img convert -n src.qcow2 dest.qcow2
$ ./qemu-img compare src.qcow2 dest.qcow2
Content mismatch at offset 0!

Aw, man, why does this keep happening... :-/

OK, so qemu-img convert -n is easy to fix.

But there are more callers:

mirror: Uses this function to inquire whether it needs to zero the
target before actually doing something useful.  There is no guarantee
that the target is a new image.  Well, it just isn’t with mode=existing
or blockdev-mirror.

parallels: Whether to write zeroes to newly added image areas.  That
actually sounds correct, because those new areas cannot point to any
data yet.
Well, maybe not correct, because bdrv_has_zero_init() is not the same as
“when this image grows, new areas will be zero”, but at least
bdrv_hsa_zero_init() will return false if the the latter is false.

vhdx: Similarly to parallels, it uses this information to check whether
it needs to zero new areas when growing an image file.

raw/vmdk/vpc: Just passing through info from their storage child.


Hm, OK.  So mirror and qemu-img need fixing.  That sounds possible.

Max
diff mbox series

Patch

diff --git a/block/qcow2.c b/block/qcow2.c
index 039bdc2f7e..730fd53890 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -4631,6 +4631,94 @@  static ImageInfoSpecific *qcow2_get_specific_info(BlockDriverState *bs,
     return spec_info;
 }
 
+/*
+ * Return 1 if the file only contains zero and unallocated clusters.
+ * Return 0 if it contains compressed or normal clusters.
+ * Return -errno on error.
+ */
+static int qcow2_is_zero(BlockDriverState *bs)
+{
+    BDRVQcow2State *s = bs->opaque;
+    int l1_i;
+    int ret;
+
+    if (bs->backing) {
+        return 0;
+    }
+
+    for (l1_i = 0; l1_i < s->l1_size; l1_i++) {
+        uint64_t l2_offset = s->l1_table[l1_i] & L1E_OFFSET_MASK;
+        int slice_start_i;
+
+        if (!l2_offset) {
+            continue;
+        }
+
+        for (slice_start_i = 0; slice_start_i < s->l2_size;
+             slice_start_i += s->l2_slice_size)
+        {
+            uint64_t *l2_slice;
+            int l2_slice_i;
+
+            ret = qcow2_cache_get(bs, s->l2_table_cache,
+                                  l2_offset + slice_start_i * sizeof(uint64_t),
+                                  (void **)&l2_slice);
+            if (ret < 0) {
+                return ret;
+            }
+
+            for (l2_slice_i = 0; l2_slice_i < s->l2_slice_size; l2_slice_i++) {
+                uint64_t l2_entry = be64_to_cpu(l2_slice[l2_slice_i]);
+
+                switch (qcow2_get_cluster_type(bs, l2_entry)) {
+                case QCOW2_CLUSTER_UNALLOCATED:
+                case QCOW2_CLUSTER_ZERO_PLAIN:
+                case QCOW2_CLUSTER_ZERO_ALLOC:
+                    break;
+
+                case QCOW2_CLUSTER_NORMAL:
+                case QCOW2_CLUSTER_COMPRESSED:
+                    qcow2_cache_put(s->l2_table_cache, (void **)&l2_slice);
+                    return 0;
+
+                default:
+                    abort();
+                }
+            }
+
+            qcow2_cache_put(s->l2_table_cache, (void **)&l2_slice);
+        }
+    }
+
+    return 1;
+}
+
+static int qcow2_has_zero_init(BlockDriverState *bs)
+{
+    BDRVQcow2State *s = bs->opaque;
+    int ret;
+
+    if (qemu_in_coroutine()) {
+        qemu_co_mutex_lock(&s->lock);
+    }
+    /* Check preallocation status */
+    ret = qcow2_is_zero(bs);
+    if (qemu_in_coroutine()) {
+        qemu_co_mutex_unlock(&s->lock);
+    }
+    if (ret < 0) {
+        return 0;
+    }
+
+    if (ret == 1) {
+        return 1;
+    } else if (bs->encrypted) {
+        return 0;
+    } else {
+        return bdrv_has_zero_init(s->data_file->bs);
+    }
+}
+
 static int qcow2_save_vmstate(BlockDriverState *bs, QEMUIOVector *qiov,
                               int64_t pos)
 {
@@ -5186,7 +5274,7 @@  BlockDriver bdrv_qcow2 = {
     .bdrv_child_perm      = bdrv_format_default_perms,
     .bdrv_co_create_opts  = qcow2_co_create_opts,
     .bdrv_co_create       = qcow2_co_create,
-    .bdrv_has_zero_init = bdrv_has_zero_init_1,
+    .bdrv_has_zero_init   = qcow2_has_zero_init,
     .bdrv_co_block_status = qcow2_co_block_status,
 
     .bdrv_co_preadv         = qcow2_co_preadv,