Patchwork [v2,2/3] qcow2: Implement bdrv_amend_options

Submitter Max Reitz
Date Aug. 29, 2013, 11:20 a.m.
Message ID <1377775241-26278-3-git-send-email-mreitz@redhat.com>
Permalink /patch/270797/
State New

Comments

Max Reitz - Aug. 29, 2013, 11:20 a.m.
Implement bdrv_amend_options for compat, size, backing_file, backing_fmt
and lazy_refcounts.

Downgrading images from compat=1.1 to compat=0.10 is achieved through
handling all incompatible flags accordingly, clearing all compatible and
autoclear flags and expanding all zero clusters.

Signed-off-by: Max Reitz <mreitz@redhat.com>
---
 block/qcow2-cluster.c | 154 ++++++++++++++++++++++++++++++++++++++++++
 block/qcow2.c         | 184 ++++++++++++++++++++++++++++++++++++++++++++++++++
 block/qcow2.h         |   2 +
 3 files changed, 340 insertions(+)
Eric Blake - Aug. 29, 2013, 12:45 p.m.
On 08/29/2013 05:20 AM, Max Reitz wrote:
> Implement bdrv_amend_options for compat, size, backing_file, backing_fmt
> and lazy_refcounts.
> 
> Downgrading images from compat=1.1 to compat=0.10 is achieved through
> handling all incompatible flags accordingly, clearing all compatible and
> autoclear flags and expanding all zero clusters.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---

> +/*
> + * Expands all zero clusters on the image; important for downgrading to a qcow2
> + * version which doesn't yet support metadata zero clusters.

Do we have to fully write 0 blocks into the image no matter what, or are
there cases where, when the file has a backing image and we know the
backing image has 0 bytes at the same offset, we could flatten by
removing the cluster and letting the reference defer to the backing
file?  It's always safer to write 0 blocks into this image, but it may
be worth considering whether we need the (ability) to try the alternate
method as it results in a smaller file and potentially faster conversion.


> +
> +    /* the refcount order might be different in newer images - however, qemu
> +     * doesn't support anything different than 4 anyway, so nothing to fix
> +     * there */

This sounds risky.  Wouldn't it be safer to error out if the image
didn't have a refcount order of 4, than to just ignore it; on the
grounds that if qemu DOES add support for non-4 refcount order, an error
will at least alert someone to the fact that they need to add some
(potentially complicated) code here?
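
The check suggested above could look roughly like the sketch below. It is a self-contained model, not QEMU code: `HeaderModel`, its `refcount_order` field and `model_check_refcount_order` are hypothetical names. The only facts taken from the thread are that qemu handles just a refcount order of 4 and that a loud error is preferred over silently ignoring other values.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the header fields that matter for this check. */
typedef struct {
    int qcow_version;        /* 2 for compat=0.10, 3 for compat=1.1 */
    uint32_t refcount_order; /* refcount width is 1 << refcount_order bits */
} HeaderModel;

/* The only refcount order a version 2 image can carry (16-bit refcounts). */
#define MODEL_V2_REFCOUNT_ORDER 4

/*
 * Returns 0 if the image may be downgraded as far as refcounts are concerned,
 * or -ENOTSUP so that a future non-4 refcount order is reported instead of
 * being ignored.
 */
int model_check_refcount_order(const HeaderModel *h)
{
    assert(h != NULL);
    if (h->refcount_order != MODEL_V2_REFCOUNT_ORDER) {
        return -ENOTSUP;
    }
    return 0;
}
```

Returning an error here means that whoever later adds support for other refcount widths hits the failure during testing and knows the downgrade path needs conversion code.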
Max Reitz - Aug. 29, 2013, 12:52 p.m.
On 29.08.2013 14:45, Eric Blake wrote:
> On 08/29/2013 05:20 AM, Max Reitz wrote:
>> Implement bdrv_amend_options for compat, size, backing_file, backing_fmt
>> and lazy_refcounts.
>>
>> Downgrading images from compat=1.1 to compat=0.10 is achieved through
>> handling all incompatible flags accordingly, clearing all compatible and
>> autoclear flags and expanding all zero clusters.
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>> +/*
>> + * Expands all zero clusters on the image; important for downgrading to a qcow2
>> + * version which doesn't yet support metadata zero clusters.
> Do we have to fully write 0 blocks into the image no matter what, or are
> there cases where, when the file has a backing image and we know the
> backing image has 0 bytes at the same offset, we could flatten by
> removing the cluster and letting the reference defer to the backing
> file?  It's always safer to write 0 blocks into this image, but it may
> be worth considering whether we need the (ability) to try the alternate
> method as it results in a smaller file and potentially faster conversion.
This seems non-trivial to optimize to me (at least doing that check 
fast), at least too non-trivial for implementing it solely for an image 
version downgrade (which nobody who is concerned about image size should 
do anyway, imho).

For non-backed images however, we could certainly just unallocate the 
blocks, I guess, since the spec explicitly states for that case that "if 
a cluster is unallocated, read requests […] shall read zeros for all 
parts that are not covered by the backing file" (also applies if there 
is no backing file at all).

>> +
>> +    /* the refcount order might be different in newer images - however, qemu
>> +     * doesn't support anything different than 4 anyway, so nothing to fix
>> +     * there */
> This sounds risky.  Wouldn't it be safer to error out if the image
> didn't have a refcount order of 4, than to just ignore it; on the
> grounds that if qemu DOES add support for non-4 refcount order, an error
> will at least alert someone to the fact that they need to add some
> (potentially complicated) code here?
>
Oh, yes, of course. I'll fix it.


Max
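
A rough sketch of the per-entry decision discussed above, under stated assumptions: the `MODEL_` constants mirror the L2 entry bit layout described in the qcow2 specification, and `model_downgrade_l2_entry` is a hypothetical helper, not a function from this patch. It only computes the replacement entry. Allocating and zero-filling the data cluster, or freeing a preallocated one, is left to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* L2 entry bits for standard clusters; names are local to this sketch. */
#define MODEL_OFLAG_COPIED     (1ULL << 63)
#define MODEL_OFLAG_COMPRESSED (1ULL << 62)
#define MODEL_OFLAG_ZERO       (1ULL << 0)
#define MODEL_L2E_OFFSET_MASK  0x00fffffffffffe00ULL

/*
 * Returns the L2 entry a version 2 image should carry in place of l2_entry.
 * data_offset is the host offset of a cluster the caller has already filled
 * with zeros; it is only used when a backing file exists.
 */
uint64_t model_downgrade_l2_entry(uint64_t l2_entry, bool has_backing_file,
                                  uint64_t data_offset)
{
    /* Bit 0 belongs to the offset field in compressed entries, so those
     * are never zero clusters. */
    if ((l2_entry & MODEL_OFLAG_COMPRESSED) ||
        !(l2_entry & MODEL_OFLAG_ZERO)) {
        return l2_entry;
    }

    if (!has_backing_file) {
        /* Unallocated clusters read as zeros when nothing backs them.
         * A preallocated cluster would additionally have to be freed. */
        return 0;
    }

    /* With a backing file the zeros must be materialised in this image. */
    assert((data_offset & ~MODEL_L2E_OFFSET_MASK) == 0);
    return data_offset | MODEL_OFLAG_COPIED;
}
```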
Kevin Wolf - Aug. 29, 2013, 1 p.m.
On 29.08.2013 at 14:52, Max Reitz wrote:
> On 29.08.2013 14:45, Eric Blake wrote:
> >On 08/29/2013 05:20 AM, Max Reitz wrote:
> >>Implement bdrv_amend_options for compat, size, backing_file, backing_fmt
> >>and lazy_refcounts.
> >>
> >>Downgrading images from compat=1.1 to compat=0.10 is achieved through
> >>handling all incompatible flags accordingly, clearing all compatible and
> >>autoclear flags and expanding all zero clusters.
> >>
> >>Signed-off-by: Max Reitz <mreitz@redhat.com>
> >>---
> >>+/*
> >>+ * Expands all zero clusters on the image; important for downgrading to a qcow2
> >>+ * version which doesn't yet support metadata zero clusters.
> >Do we have to fully write 0 blocks into the image no matter what, or are
> >there cases where, when the file has a backing image and we know the
> >backing image has 0 bytes at the same offset, we could flatten by
> >removing the cluster and letting the reference defer to the backing
> >file?  It's always safer to write 0 blocks into this image, but it may
> >be worth considering whether we need the (ability) to try the alternate
> >method as it results in a smaller file and potentially faster conversion.
> This seems non-trivial to optimize to me (at least doing that check
> fast), at least too non-trivial for implementing it solely for an
> image version downgrade (which nobody who is concerned about image
> size should do anyway, imho).
> 
> For non-backed images however, we could certainly just unallocate
> the blocks, I guess, since the spec explicitly states for that case
> that "if a cluster is unallocated, read requests […] shall read
> zeros for all parts that are not covered by the backing file" (also
> applies if there is no backing file at all).

Yup, checking for !bs->backing_hd is easy, so simply deallocating in
this case sounds like a good idea.

Reading from the backing file and checking if the buffer is zero isn't
_that_ complicated either, but at least the conversion speed won't be
improved by doing this. If we already had Paolo's bdrv_get_block_status,
we could try that, but as it is today I don't think it's worth doing
anything else here.

Downgrading an image is an unusual operation anyway.

Kevin
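
For illustration, the "read from the backing file and check if the buffer is zero" approach mentioned above comes down to a scan like the one below. This is a plain byte-wise sketch with a hypothetical name, `model_buffer_is_all_zero`, and is not the helper QEMU itself would use. It also shows why conversion speed would not improve: every cluster still has to be read in full before the decision can be made.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Returns true if all len bytes of buf are zero. The caller is assumed to
 * have read one cluster's worth of data from the backing file into buf.
 */
bool model_buffer_is_all_zero(const uint8_t *buf, size_t len)
{
    size_t i;

    assert(buf != NULL || len == 0);
    for (i = 0; i < len; i++) {
        if (buf[i] != 0) {
            return false; /* backing data is non-zero: zeros must be written */
        }
    }
    return true; /* backing data is zero: the cluster could be dropped */
}
```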

Patch

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index cca76d4..06e6165 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -1476,3 +1476,157 @@  fail:
 
     return ret;
 }
+
+/*
+ * Expands all zero clusters in a specific L1 table.
+ */
+static int expand_zero_clusters_in_l1(BlockDriverState *bs, uint64_t *l1_table,
+                                      int l1_size)
+{
+    BDRVQcowState *s = bs->opaque;
+    bool is_active_l1 = (l1_table == s->l1_table);
+    uint64_t *l2_table = NULL;
+    int ret;
+    int i, j;
+
+    if (!is_active_l1) {
+        /* inactive L2 tables require a buffer to be stored in when loading
+         * them from disk */
+        l2_table = g_malloc(s->cluster_size);
+    }
+
+    for (i = 0; i < l1_size; i++) {
+        uint64_t l2_offset = l1_table[i] & L1E_OFFSET_MASK;
+        bool l2_dirty = false;
+
+        if (!l2_offset) {
+            /* unallocated */
+            continue;
+        }
+
+        if (is_active_l1) {
+            /* get active L2 tables from cache */
+            ret = qcow2_cache_get(bs, s->l2_table_cache, l2_offset,
+                    (void **)&l2_table);
+        } else {
+            /* load inactive L2 tables from disk */
+            ret = bdrv_read(bs->file, l2_offset / BDRV_SECTOR_SIZE,
+                    (void *)l2_table, s->cluster_sectors);
+        }
+        if (ret < 0) {
+            goto fail;
+        }
+
+        for (j = 0; j < s->l2_size; j++) {
+            uint64_t l2_entry = be64_to_cpu(l2_table[j]);
+            int64_t offset;
+
+            if (qcow2_get_cluster_type(l2_entry) != QCOW2_CLUSTER_ZERO) {
+                continue;
+            }
+
+            offset = l2_entry & L2E_OFFSET_MASK;
+            if (!offset) {
+                /* not preallocated */
+                offset = qcow2_alloc_clusters(bs, s->cluster_size);
+                if (offset < 0) {
+                    ret = offset;
+                    goto fail;
+                }
+            }
+
+            ret = bdrv_write_zeroes(bs->file, offset / BDRV_SECTOR_SIZE,
+                                    s->cluster_sectors);
+            if (ret < 0) {
+                qcow2_free_clusters(bs, offset, s->cluster_size,
+                        QCOW2_DISCARD_ALWAYS);
+                goto fail;
+            }
+
+            l2_table[j] = cpu_to_be64(offset | QCOW_OFLAG_COPIED);
+            l2_dirty = true;
+        }
+
+        if (is_active_l1) {
+            if (l2_dirty) {
+                qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
+                qcow2_cache_depends_on_flush(s->l2_table_cache);
+            }
+            ret = qcow2_cache_put(bs, s->l2_table_cache, (void **)&l2_table);
+            if (ret < 0) {
+                l2_table = NULL;
+                goto fail;
+            }
+        } else {
+            if (l2_dirty) {
+                ret = bdrv_write(bs->file, l2_offset / BDRV_SECTOR_SIZE,
+                        (void *)l2_table, s->cluster_sectors);
+                if (ret < 0) {
+                    goto fail;
+                }
+            }
+        }
+    }
+
+    ret = 0;
+
+fail:
+    if (l2_table) {
+        if (!is_active_l1) {
+            g_free(l2_table);
+        } else {
+            if (ret < 0) {
+                qcow2_cache_put(bs, s->l2_table_cache, (void **)&l2_table);
+            } else {
+                ret = qcow2_cache_put(bs, s->l2_table_cache,
+                        (void **)&l2_table);
+            }
+        }
+    }
+    return ret;
+}
+
+/*
+ * Expands all zero clusters on the image; important for downgrading to a qcow2
+ * version which doesn't yet support metadata zero clusters.
+ */
+int qcow2_expand_zero_clusters(BlockDriverState *bs)
+{
+    BDRVQcowState *s = bs->opaque;
+    uint64_t *l1_table = NULL;
+    int ret;
+    int i, j;
+
+    ret = expand_zero_clusters_in_l1(bs, s->l1_table, s->l1_size);
+    if (ret < 0) {
+        goto fail;
+    }
+
+    for (i = 0; i < s->nb_snapshots; i++) {
+        int l1_sectors = (s->snapshots[i].l1_size * sizeof(uint64_t) +
+                BDRV_SECTOR_SIZE - 1) / BDRV_SECTOR_SIZE;
+
+        l1_table = g_realloc(l1_table, l1_sectors * BDRV_SECTOR_SIZE);
+
+        ret = bdrv_read(bs->file, s->snapshots[i].l1_table_offset /
+                BDRV_SECTOR_SIZE, (void *)l1_table, l1_sectors);
+        if (ret < 0) {
+            goto fail;
+        }
+
+        for (j = 0; j < s->snapshots[i].l1_size; j++) {
+            be64_to_cpus(&l1_table[j]);
+        }
+
+        ret = expand_zero_clusters_in_l1(bs, l1_table, s->snapshots[i].l1_size);
+        if (ret < 0) {
+            goto fail;
+        }
+    }
+
+    ret = 0;
+
+fail:
+    g_free(l1_table);
+    return ret;
+}
diff --git a/block/qcow2.c b/block/qcow2.c
index 78097e5..2ed7d64 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1735,6 +1735,189 @@  static int qcow2_load_vmstate(BlockDriverState *bs, uint8_t *buf,
     return ret;
 }
 
+/*
+ * Downgrades an image's version. To achieve this, any incompatible features
+ * have to be removed.
+ */
+static int qcow2_downgrade(BlockDriverState *bs, int target_version)
+{
+    BDRVQcowState *s = bs->opaque;
+    int current_version = s->qcow_version;
+    int ret;
+
+    if (target_version == current_version) {
+        return 0;
+    } else if (target_version > current_version) {
+        return -EINVAL;
+    } else if (target_version != 2) {
+        return -EINVAL;
+    }
+
+    /* clear incompatible features */
+    if (s->incompatible_features & QCOW2_INCOMPAT_DIRTY) {
+        ret = qcow2_mark_clean(bs);
+        if (ret < 0) {
+            return ret;
+        }
+    }
+
+    if (s->incompatible_features) {
+        return -ENOTSUP;
+    }
+
+    /* since we can ignore compatible features, we can set them to 0 as well */
+    s->compatible_features = 0;
+    /* if lazy refcounts have been used, they have already been fixed through
+     * clearing the dirty flag */
+
+    /* clearing autoclear features is trivial */
+    s->autoclear_features = 0;
+
+    /* the refcount order might be different in newer images - however, qemu
+     * doesn't support anything different than 4 anyway, so nothing to fix
+     * there */
+
+    ret = qcow2_expand_zero_clusters(bs);
+    if (ret < 0) {
+        return ret;
+    }
+
+    s->qcow_version = target_version;
+    ret = qcow2_update_header(bs);
+    if (ret < 0) {
+        s->qcow_version = current_version;
+        return ret;
+    }
+    return 0;
+}
+
+static int qcow2_amend_options(BlockDriverState *bs,
+                               QEMUOptionParameter *options)
+{
+    BDRVQcowState *s = bs->opaque;
+    int old_version = s->qcow_version, new_version = old_version;
+    uint64_t new_size = 0;
+    const char *backing_file = NULL, *backing_format = NULL;
+    bool lazy_refcounts = s->use_lazy_refcounts;
+    int ret;
+    int i;
+
+    for (i = 0; options[i].name; i++)
+    {
+        if (!strcmp(options[i].name, "compat")) {
+            if (!options[i].value.s) {
+                /* preserve default */
+            } else if (!strcmp(options[i].value.s, "0.10")) {
+                new_version = 2;
+            } else if (!strcmp(options[i].value.s, "1.1")) {
+                new_version = 3;
+            } else {
+                fprintf(stderr, "Unknown compatibility level %s.\n",
+                        options[i].value.s);
+                return -EINVAL;
+            }
+        } else if (!strcmp(options[i].name, "preallocation")) {
+            if (options[i].assigned) {
+                fprintf(stderr, "Cannot change preallocation mode.\n");
+                return -ENOTSUP;
+            }
+        } else if (!strcmp(options[i].name, "size")) {
+            new_size = options[i].value.n;
+        } else if (!strcmp(options[i].name, "backing_file")) {
+            backing_file = options[i].value.s;
+        } else if (!strcmp(options[i].name, "backing_fmt")) {
+            backing_format = options[i].value.s;
+        } else if (!strcmp(options[i].name, "encryption")) {
+            if (options[i].assigned &&
+                (options[i].value.n != !!s->crypt_method)) {
+                fprintf(stderr, "Changing the encryption flag is not "
+                        "supported.\n");
+                return -ENOTSUP;
+            }
+        } else if (!strcmp(options[i].name, "cluster_size")) {
+            if (options[i].assigned && (options[i].value.n != s->cluster_size))
+            {
+                fprintf(stderr, "Changing the cluster size is not "
+                        "supported.\n");
+                return -ENOTSUP;
+            }
+        } else if (!strcmp(options[i].name, "lazy_refcounts")) {
+            if (options[i].assigned) {
+                lazy_refcounts = options[i].value.n;
+            }
+        } else {
+            /* if this assertion fails, this probably means a new option was
+             * added without having it covered here */
+            assert(false);
+        }
+    }
+
+    if (new_version != old_version) {
+        if (new_version > old_version) {
+            /* Upgrade */
+            s->qcow_version = new_version;
+            ret = qcow2_update_header(bs);
+            if (ret < 0) {
+                s->qcow_version = old_version;
+                return ret;
+            }
+        } else {
+            ret = qcow2_downgrade(bs, new_version);
+            if (ret < 0) {
+                return ret;
+            }
+        }
+    }
+
+    if (new_size) {
+        ret = qcow2_truncate(bs, new_size);
+        if (ret < 0) {
+            return ret;
+        }
+    }
+
+    if (backing_file || backing_format) {
+        ret = qcow2_change_backing_file(bs, backing_file ?: bs->backing_file,
+                                        backing_format ?: bs->backing_format);
+        if (ret < 0) {
+            return ret;
+        }
+    }
+
+    if (s->use_lazy_refcounts != lazy_refcounts) {
+        if (lazy_refcounts) {
+            if (s->qcow_version < 3) {
+                fprintf(stderr, "Lazy refcounts only supported with compatibility "
+                        "level 1.1 and above (use compat=1.1 or greater)\n");
+                return -EINVAL;
+            }
+            s->compatible_features |= QCOW2_COMPAT_LAZY_REFCOUNTS;
+            ret = qcow2_update_header(bs);
+            if (ret < 0) {
+                s->compatible_features &= ~QCOW2_COMPAT_LAZY_REFCOUNTS;
+                return ret;
+            }
+            s->use_lazy_refcounts = true;
+        } else {
+            /* make image clean first */
+            ret = qcow2_mark_clean(bs);
+            if (ret < 0) {
+                return ret;
+            }
+            /* now disallow lazy refcounts */
+            s->compatible_features &= ~QCOW2_COMPAT_LAZY_REFCOUNTS;
+            ret = qcow2_update_header(bs);
+            if (ret < 0) {
+                s->compatible_features |= QCOW2_COMPAT_LAZY_REFCOUNTS;
+                return ret;
+            }
+            s->use_lazy_refcounts = false;
+        }
+    }
+
+    return 0;
+}
+
 static QEMUOptionParameter qcow2_create_options[] = {
     {
         .name = BLOCK_OPT_SIZE,
@@ -1818,6 +2001,7 @@  static BlockDriver bdrv_qcow2 = {
 
     .create_options = qcow2_create_options,
     .bdrv_check = qcow2_check,
+    .bdrv_amend_options = qcow2_amend_options,
 };
 
 static void bdrv_qcow2_init(void)
diff --git a/block/qcow2.h b/block/qcow2.h
index dba9771..84109de 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -408,6 +408,8 @@  int qcow2_discard_clusters(BlockDriverState *bs, uint64_t offset,
     int nb_sectors);
 int qcow2_zero_clusters(BlockDriverState *bs, uint64_t offset, int nb_sectors);
 
+int qcow2_expand_zero_clusters(BlockDriverState *bs);
+
 /* qcow2-snapshot.c functions */
 int qcow2_snapshot_create(BlockDriverState *bs, QEMUSnapshotInfo *sn_info);
 int qcow2_snapshot_goto(BlockDriverState *bs, const char *snapshot_id);