
[v7,6/9] qcow2: add bdrv_measure() support

Message ID 20170613133329.23653-7-stefanha@redhat.com
State New

Commit Message

Stefan Hajnoczi June 13, 2017, 1:33 p.m. UTC
Use qcow2_calc_prealloc_size() to get the required file size.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
---
v7:
 * Check that qcow2 supports the image file size [Berto]
---
 block/qcow2.c | 133 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 133 insertions(+)

Comments

Alberto Garcia June 13, 2017, 3:07 p.m. UTC | #1
On Tue 13 Jun 2017 03:33:26 PM CEST, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> Use qcow2_calc_prealloc_size() to get the required file size.
>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> Reviewed-by: Alberto Garcia <berto@igalia.com>

You kept my R-b here but one of the changes was in this patch:

> +    info = g_new(BlockMeasureInfo, 1);
> +    info->fully_allocated =
> +        qcow2_calc_prealloc_size(virtual_size, cluster_size,
> +                                 ctz32(refcount_bits));
> +    if (DIV_ROUND_UP(info->fully_allocated, cluster_size) > INT_MAX) {
> +        g_free(info);
> +        error_setg(&local_err, "The image size is too large "
> +                               "(try using a larger cluster size)");
> +        goto err;
> +    }

This has the opposite problem from the previous version: valid image
sizes are now rejected by the 'measure' command.

$ qemu-img create -f qcow2 img.qcow2 1P
Formatting 'img.qcow2', fmt=qcow2 size=1125899906842624 encryption=off cluster_size=65536 lazy_refcounts=off refcount_bits=16

$ build/qemu-img measure -O qcow2 --size 1P
qemu-img: The image size is too large (try using a larger cluster size)

The actual limit is:

#define QCOW_MAX_L1_SIZE 0x2000000

That's 4194304 8-byte entries, and each one can address cluster_size^2 / 8
bytes (one L2 table of cluster_size / 8 entries, each mapping one cluster).

So using that formula, here is the maximum virtual size depending on the
cluster size:

|--------------+------------------|
| Cluster size | Max virtual size |
|--------------+------------------|
| 512 bytes    | 128 GB           |
| 1 KB         | 512 GB           |
| 2 KB         | 2 TB             |
| 4 KB         | 8 TB             |
| 8 KB         | 32 TB            |
| 16 KB        | 128 TB           |
| 32 KB        | 512 TB           |
| 64 KB        | 2 PB             |
| 128 KB       | 8 PB             |
| 256 KB       | 32 PB            |
| 512 KB       | 128 PB           |
| 1 MB         | 512 PB           |
| 2 MB         | 2 EB             |
|--------------+------------------|
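The table can be reproduced with a short C sketch. It assumes, as stated above, that QCOW_MAX_L1_SIZE is a byte count and that L1 and L2 entries are 8 bytes each; the names `qcow2_max_virtual_size()` and `qcow2_print_limits()` are hypothetical and not part of QEMU. All sizes are powers of two, so the table's GB/TB/PB/EB are binary units.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Value quoted in the review (block/qcow2.h); a size in bytes. */
#define QCOW_MAX_L1_SIZE 0x2000000

/* Hypothetical helper: largest virtual size an L1 table of
 * QCOW_MAX_L1_SIZE bytes can address for a given cluster size.
 * Each 8-byte L1 entry points to one L2 table occupying one cluster;
 * that table holds cluster_size / 8 entries, each mapping one cluster,
 * so one L1 entry covers cluster_size^2 / 8 bytes. */
uint64_t qcow2_max_virtual_size(uint64_t cluster_size)
{
    uint64_t l1_entries = QCOW_MAX_L1_SIZE / sizeof(uint64_t); /* 4194304 */
    uint64_t l2_entries = cluster_size / sizeof(uint64_t);

    /* qcow2 cluster sizes are powers of two from 512 bytes to 2 MB */
    assert(cluster_size >= 512 && cluster_size <= 2 * 1024 * 1024);
    assert((cluster_size & (cluster_size - 1)) == 0);

    return l1_entries * l2_entries * cluster_size;
}

/* Print one row per cluster size, mirroring the table above. */
void qcow2_print_limits(void)
{
    for (uint64_t cs = 512; cs <= 2 * 1024 * 1024; cs *= 2) {
        printf("%8llu bytes -> %llu bytes\n",
               (unsigned long long)cs,
               (unsigned long long)qcow2_max_virtual_size(cs));
    }
}
```

For example, `qcow2_max_virtual_size(65536)` yields 2^51 bytes, the 2 PB of the 64 KB row, and 2 MB clusters give 2^61 bytes (2 EB).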

I just created a 2 EB image and it works fine, Linux can detect it
without problems, I can create a file system, etc.

If you specify a larger size, qcow2_grow_l1_table() fails with -EFBIG.

Berto
Stefan Hajnoczi June 14, 2017, 12:25 p.m. UTC | #2
On Tue, Jun 13, 2017 at 05:07:13PM +0200, Alberto Garcia wrote:
> On Tue 13 Jun 2017 03:33:26 PM CEST, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > Use qcow2_calc_prealloc_size() to get the required file size.
> >
> > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > Reviewed-by: Alberto Garcia <berto@igalia.com>
> 
> You kept my R-b here but one of the changes was in this patch:

Sorry, I forgot to remove it when making the change.

> > +    info = g_new(BlockMeasureInfo, 1);
> > +    info->fully_allocated =
> > +        qcow2_calc_prealloc_size(virtual_size, cluster_size,
> > +                                 ctz32(refcount_bits));
> > +    if (DIV_ROUND_UP(info->fully_allocated, cluster_size) > INT_MAX) {
> > +        g_free(info);
> > +        error_setg(&local_err, "The image size is too large "
> > +                               "(try using a larger cluster size)");
> > +        goto err;
> > +    }
> 
> This has the opposite problem from the previous version: valid image
> sizes are now rejected by the 'measure' command.
> 
> $ qemu-img create -f qcow2 img.qcow2 1P
> Formatting 'img.qcow2', fmt=qcow2 size=1125899906842624 encryption=off cluster_size=65536 lazy_refcounts=off refcount_bits=16
> 
> $ build/qemu-img measure -O qcow2 --size 1P
> qemu-img: The image size is too large (try using a larger cluster size)

Hmm... if the host file size (not the virtual disk size) is 1P, then
qemu-img check fails:

int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
                          BdrvCheckMode fix)
{
    int64_t size, highest_cluster, nb_clusters;
    ...

    size = bdrv_getlength(bs->file->bs);
    if (size < 0) {
        res->check_errors++;
        return size;
    }

    nb_clusters = size_to_clusters(s, size);
    if (nb_clusters > INT_MAX) {
        res->check_errors++;
        return -EFBIG;
    }

This is also where I got the limit from.  It was introduced in:

commit 0abe740f1de899737242bcba1fb4a9857f7a3087
Author: Kevin Wolf <kwolf@redhat.com>
Date:   Wed Mar 26 13:05:52 2014 +0100

    qcow2: Protect against some integer overflows in bdrv_check

At that point the code used ints, so the overflow check was necessary.
Now the code uses 64-bit types, so the INT_MAX check is obsolete.
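A minimal sketch of why the guard fires for the 1P example: with 64 KB clusters, a 1 PiB (2^50-byte) file spans 2^34 clusters, well above INT_MAX (2^31 - 1 with the usual 32-bit int), while a 64-bit count holds it easily. `clusters_for_size()` and `exceeds_int_max_guard()` are hypothetical stand-ins for the rounding done by `size_to_clusters()` and the `nb_clusters > INT_MAX` test quoted above, not QEMU code.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: clusters needed to hold 'size' bytes, rounding up
 * (the intent of size_to_clusters(), not QEMU's implementation). */
int64_t clusters_for_size(uint64_t size, uint64_t cluster_size)
{
    assert(cluster_size > 0);
    return (int64_t)((size + cluster_size - 1) / cluster_size);
}

/* Hypothetical helper: true if an "nb_clusters > INT_MAX" guard like the
 * one in qcow2_check_refcounts() would reject a file of 'size' bytes.
 * The count itself is computed in 64 bits, so no overflow occurs here. */
bool exceeds_int_max_guard(uint64_t size, uint64_t cluster_size)
{
    return clusters_for_size(size, cluster_size) > INT_MAX;
}
```

With these, `exceeds_int_max_guard(1ULL << 50, 65536)` is true, whereas 2 MB clusters need only 2^29 clusters for the same size and pass the guard.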

I will send a separate patch to fix qemu-img check.

> The actual limit is:
> 
> #define QCOW_MAX_L1_SIZE 0x2000000
> 
> That's 4194304 8-byte entries, and each one can address cluster_size^2 / 8
> bytes (one L2 table of cluster_size / 8 entries, each mapping one cluster).
> 
> So using that formula, here is the maximum virtual size depending on the
> cluster size:
> 
> |--------------+------------------|
> | Cluster size | Max virtual size |
> |--------------+------------------|
> | 512 bytes    | 128 GB           |
> | 1 KB         | 512 GB           |
> | 2 KB         | 2 TB             |
> | 4 KB         | 8 TB             |
> | 8 KB         | 32 TB            |
> | 16 KB        | 128 TB           |
> | 32 KB        | 512 TB           |
> | 64 KB        | 2 PB             |
> | 128 KB       | 8 PB             |
> | 256 KB       | 32 PB            |
> | 512 KB       | 128 PB           |
> | 1 MB         | 512 PB           |
> | 2 MB         | 2 EB             |
> |--------------+------------------|
> 
> I just created a 2 EB image and it works fine, Linux can detect it
> without problems, I can create a file system, etc.
> 
> If you specify a larger size, qcow2_grow_l1_table() fails with -EFBIG.

Thanks, will fix!

Stefan

Patch

diff --git a/block/qcow2.c b/block/qcow2.c
index fd0fba5..ff7f568 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2958,6 +2958,138 @@  static coroutine_fn int qcow2_co_flush_to_os(BlockDriverState *bs)
     return 0;
 }
 
+static BlockMeasureInfo *qcow2_measure(QemuOpts *opts, BlockDriverState *in_bs,
+                                       Error **errp)
+{
+    Error *local_err = NULL;
+    BlockMeasureInfo *info;
+    uint64_t required = 0; /* bytes that contribute to required size */
+    uint64_t virtual_size; /* disk size as seen by guest */
+    uint64_t refcount_bits;
+    size_t cluster_size;
+    int version;
+    char *optstr;
+    PreallocMode prealloc;
+    bool has_backing_file;
+
+    /* Parse image creation options */
+    cluster_size = qcow2_opt_get_cluster_size_del(opts, &local_err);
+    if (local_err) {
+        goto err;
+    }
+
+    version = qcow2_opt_get_version_del(opts, &local_err);
+    if (local_err) {
+        goto err;
+    }
+
+    refcount_bits = qcow2_opt_get_refcount_bits_del(opts, version, &local_err);
+    if (local_err) {
+        goto err;
+    }
+
+    optstr = qemu_opt_get_del(opts, BLOCK_OPT_PREALLOC);
+    prealloc = qapi_enum_parse(PreallocMode_lookup, optstr,
+                               PREALLOC_MODE__MAX, PREALLOC_MODE_OFF,
+                               &local_err);
+    g_free(optstr);
+    if (local_err) {
+        goto err;
+    }
+
+    optstr = qemu_opt_get_del(opts, BLOCK_OPT_BACKING_FILE);
+    has_backing_file = !!optstr;
+    g_free(optstr);
+
+    virtual_size = align_offset(qemu_opt_get_size_del(opts, BLOCK_OPT_SIZE, 0),
+                                cluster_size);
+
+    /* Account for input image */
+    if (in_bs) {
+        int64_t ssize = bdrv_getlength(in_bs);
+        if (ssize < 0) {
+            error_setg_errno(&local_err, -ssize,
+                             "Unable to get image virtual_size");
+            goto err;
+        }
+
+        virtual_size = align_offset(ssize, cluster_size);
+
+        if (has_backing_file) {
+            /* We don't know how much of the backing chain is shared by the input
+             * image and the new image file.  In the worst case the new image's
+             * backing file has nothing in common with the input image.  Be
+             * conservative and assume all clusters need to be written.
+             */
+            required = virtual_size;
+        } else {
+            int cluster_sectors = cluster_size / BDRV_SECTOR_SIZE;
+            int64_t sector_num;
+            int pnum = 0;
+
+            for (sector_num = 0;
+                 sector_num < ssize / BDRV_SECTOR_SIZE;
+                 sector_num += pnum) {
+                int nb_sectors = MIN(ssize / BDRV_SECTOR_SIZE - sector_num,
+                                     INT_MAX);
+                BlockDriverState *file;
+                int64_t ret;
+
+                ret = bdrv_get_block_status_above(in_bs, NULL,
+                                                  sector_num, nb_sectors,
+                                                  &pnum, &file);
+                if (ret < 0) {
+                    error_setg_errno(&local_err, -ret,
+                                     "Unable to get block status");
+                    goto err;
+                }
+
+                if (ret & BDRV_BLOCK_ZERO) {
+                    /* Skip zero regions (safe with no backing file) */
+                } else if ((ret & (BDRV_BLOCK_DATA | BDRV_BLOCK_ALLOCATED)) ==
+                           (BDRV_BLOCK_DATA | BDRV_BLOCK_ALLOCATED)) {
+                    /* Extend pnum to end of cluster for next iteration */
+                    pnum = ROUND_UP(sector_num + pnum, cluster_sectors) -
+                           sector_num;
+
+                    /* Count clusters we've seen */
+                    required += (sector_num % cluster_sectors + pnum) *
+                                BDRV_SECTOR_SIZE;
+                }
+            }
+        }
+    }
+
+    /* Take into account preallocation.  Nothing special is needed for
+     * PREALLOC_MODE_METADATA since metadata is always counted.
+     */
+    if (prealloc == PREALLOC_MODE_FULL || prealloc == PREALLOC_MODE_FALLOC) {
+        required = virtual_size;
+    }
+
+    info = g_new(BlockMeasureInfo, 1);
+    info->fully_allocated =
+        qcow2_calc_prealloc_size(virtual_size, cluster_size,
+                                 ctz32(refcount_bits));
+    if (DIV_ROUND_UP(info->fully_allocated, cluster_size) > INT_MAX) {
+        g_free(info);
+        error_setg(&local_err, "The image size is too large "
+                               "(try using a larger cluster size)");
+        goto err;
+    }
+
+    /* Remove data clusters that are not required.  This overestimates the
+     * required size because metadata needed for the fully allocated file is
+     * still counted.
+     */
+    info->required = info->fully_allocated - virtual_size + required;
+    return info;
+
+err:
+    error_propagate(errp, local_err);
+    return NULL;
+}
+
 static int qcow2_get_info(BlockDriverState *bs, BlockDriverInfo *bdi)
 {
     BDRVQcow2State *s = bs->opaque;
@@ -3505,6 +3637,7 @@  BlockDriver bdrv_qcow2 = {
     .bdrv_snapshot_delete   = qcow2_snapshot_delete,
     .bdrv_snapshot_list     = qcow2_snapshot_list,
     .bdrv_snapshot_load_tmp = qcow2_snapshot_load_tmp,
+    .bdrv_measure           = qcow2_measure,
     .bdrv_get_info          = qcow2_get_info,
     .bdrv_get_specific_info = qcow2_get_specific_info,
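The cluster rounding in the allocated-data branch of `qcow2_measure()` can be checked in isolation. This is a sketch assuming 512-byte sectors (standing in for `BDRV_SECTOR_SIZE`); `count_data_extent()` and `round_up()` are hypothetical helpers, not QEMU functions. Like the patch, it extends the extent to the end of its last cluster and counts bytes from the start of its first cluster, so a partially written cluster is charged in full.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SIZE 512 /* stand-in for BDRV_SECTOR_SIZE */

/* Round n up to a multiple of d (d > 0); local stand-in for ROUND_UP(). */
static int64_t round_up(int64_t n, int64_t d)
{
    return (n + d - 1) / d * d;
}

/* Hypothetical helper modelling the BDRV_BLOCK_DATA | BDRV_BLOCK_ALLOCATED
 * branch: given a data extent of *pnum sectors at sector_num, widen *pnum
 * so the next iteration starts on a cluster boundary, and return the bytes
 * this extent adds to 'required', measured from its first cluster start. */
uint64_t count_data_extent(int64_t sector_num, int64_t *pnum,
                           int64_t cluster_sectors)
{
    assert(cluster_sectors > 0 && *pnum > 0);

    *pnum = round_up(sector_num + *pnum, cluster_sectors) - sector_num;
    return (uint64_t)(sector_num % cluster_sectors + *pnum) * SECTOR_SIZE;
}
```

With 64 KB clusters (128 sectors), a 2-sector data extent at sector 3 is widened to 125 sectors and contributes 65536 bytes, one full cluster; the loop then resumes at sector 128, the next cluster boundary.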