Message ID | 1409088987-17207-4-git-send-email-mreitz@redhat.com |
---|---|
State | New |
Headers | show |
On 08/26/2014 03:36 PM, Max Reitz wrote: > bdrv_make_empty() is currently only called if the current image > represents an external snapshot that has been committed to its base > image; it is therefore unlikely to have internal snapshots. In this > case, bdrv_make_empty() can be greatly sped up by emptying the L1 and > refcount table (while having the dirty flag set) and creating a trivial > refcount structure. > > If there are snapshots, fall back to the simple implementation (discard > all clusters). > > Signed-off-by: Max Reitz <mreitz@redhat.com> > --- > block/blkdebug.c | 2 + > block/qcow2.c | 137 ++++++++++++++++++++++++++++++++++++++++++++------ > include/block/block.h | 2 + > 3 files changed, 126 insertions(+), 15 deletions(-) > > + } else { > + int l1_clusters; > + int64_t offset; > + uint64_t *new_reftable; > + uint8_t l1_ofs_rt_ofs_cls[20]; /* L1 offset; RT offset and clusters */ > + uint64_t rt_entry; > + > + ret = qcow2_cache_empty(bs, s->l2_table_cache); > if (ret < 0) { > - break; > + return ret; > + } > + > + ret = qcow2_cache_empty(bs, s->refcount_block_cache); > + if (ret < 0) { > + return ret; > + } > + > + /* Refcounts will be broken utterly */ > + ret = qcow2_mark_dirty(bs); qcow2_mark_dirty does assert(s->qcow_version >= 3). But this code can be reached when committing a qcow2 0.10 compat-level file, right? > + /* "Create" an empty reftable (one cluster) directly after the image > + * header and an empty L1 table three clusters after the image header; > + * the cluster between those two will be used as the first refblock */ > + cpu_to_be64w((uint64_t *)&l1_ofs_rt_ofs_cls[ 0], 3 * s->cluster_size); > + cpu_to_be64w((uint64_t *)&l1_ofs_rt_ofs_cls[ 8], s->cluster_size); > + cpu_to_be32w((uint32_t *)&l1_ofs_rt_ofs_cls[16], 1); The offsets here feel a bit magic.
On 10/10/2014 06:32 AM, Eric Blake wrote: > On 08/26/2014 03:36 PM, Max Reitz wrote: >> bdrv_make_empty() is currently only called if the current image >> represents an external snapshot that has been committed to its base >> image; it is therefore unlikely to have internal snapshots. In this >> case, bdrv_make_empty() can be greatly sped up by emptying the L1 and >> refcount table (while having the dirty flag set) and creating a trivial >> refcount structure. >> >> If there are snapshots, fall back to the simple implementation (discard >> all clusters). Brain-wave... >> + /* Refcounts will be broken utterly */ >> + ret = qcow2_mark_dirty(bs); > > qcow2_mark_dirty does assert(s->qcow_version >= 3). But this code can > be reached when committing a qcow2 0.10 compat-level file, right? What if you use the fallback of the slower code on compat=0.10 files? It's no worse than it has always been, and after all, this patch is about adding an optimization, not adding new behavior. It's okay if the optimization is only usable on compat=1.1 files.
Am 10.10.2014 um 14:32 schrieb Eric Blake: > On 08/26/2014 03:36 PM, Max Reitz wrote: >> bdrv_make_empty() is currently only called if the current image >> represents an external snapshot that has been committed to its base >> image; it is therefore unlikely to have internal snapshots. In this >> case, bdrv_make_empty() can be greatly sped up by emptying the L1 and >> refcount table (while having the dirty flag set) and creating a trivial >> refcount structure. >> >> If there are snapshots, fall back to the simple implementation (discard >> all clusters). >> >> Signed-off-by: Max Reitz <mreitz@redhat.com> >> --- >> block/blkdebug.c | 2 + >> block/qcow2.c | 137 ++++++++++++++++++++++++++++++++++++++++++++------ >> include/block/block.h | 2 + >> 3 files changed, 126 insertions(+), 15 deletions(-) >> >> + } else { >> + int l1_clusters; >> + int64_t offset; >> + uint64_t *new_reftable; >> + uint8_t l1_ofs_rt_ofs_cls[20]; /* L1 offset; RT offset and clusters */ >> + uint64_t rt_entry; >> + >> + ret = qcow2_cache_empty(bs, s->l2_table_cache); >> if (ret < 0) { >> - break; >> + return ret; >> + } >> + >> + ret = qcow2_cache_empty(bs, s->refcount_block_cache); >> + if (ret < 0) { >> + return ret; >> + } >> + >> + /* Refcounts will be broken utterly */ >> + ret = qcow2_mark_dirty(bs); > qcow2_mark_dirty does assert(s->qcow_version >= 3). But this code can > be reached when committing a qcow2 0.10 compat-level file, right? You're right, this code doesn't work for compat=0.10. As you proposed in your own answer to this, I'll just use the fallback code in that case (in v13) >> + /* "Create" an empty reftable (one cluster) directly after the image >> + * header and an empty L1 table three clusters after the image header; >> + * the cluster between those two will be used as the first refblock */ >> + cpu_to_be64w((uint64_t *)&l1_ofs_rt_ofs_cls[ 0], 3 * s->cluster_size); >> + cpu_to_be64w((uint64_t *)&l1_ofs_rt_ofs_cls[ 8], s->cluster_size); >> + cpu_to_be32w((uint32_t *)&l1_ofs_rt_ofs_cls[16], 1); > The offsets here feel a bit magic. Well, such byte offsets are everywhere in the qcow2 code which does such small writes to the image header. Making them non-magic would mean offsetof(...) - offsetof(...). Can and will do, but I probably won't find it much more readable. ;-) Max
diff --git a/block/blkdebug.c b/block/blkdebug.c index 69b330e..a21678d 100644 --- a/block/blkdebug.c +++ b/block/blkdebug.c @@ -198,6 +198,8 @@ static const char *event_names[BLKDBG_EVENT_MAX] = { [BLKDBG_PWRITEV] = "pwritev", [BLKDBG_PWRITEV_ZERO] = "pwritev_zero", [BLKDBG_PWRITEV_DONE] = "pwritev_done", + + [BLKDBG_EMPTY_IMAGE_PREPARE] = "empty_image_prepare", }; static int get_event_by_name(const char *name, BlkDebugEvent *event) diff --git a/block/qcow2.c b/block/qcow2.c index 2efd9b5..e475151 100644 --- a/block/qcow2.c +++ b/block/qcow2.c @@ -2155,24 +2155,131 @@ fail: static int qcow2_make_empty(BlockDriverState *bs) { + BDRVQcowState *s = bs->opaque; int ret = 0; - uint64_t start_sector; - int sector_step = INT_MAX / BDRV_SECTOR_SIZE; - for (start_sector = 0; start_sector < bs->total_sectors; - start_sector += sector_step) - { - /* As this function is generally used after committing an external - * snapshot, QCOW2_DISCARD_SNAPSHOT seems appropriate. Also, the - * default action for this kind of discard is to pass the discard, - * which will ideally result in an actually smaller image file, as - * is probably desired. */ - ret = qcow2_discard_clusters(bs, start_sector * BDRV_SECTOR_SIZE, - MIN(sector_step, - bs->total_sectors - start_sector), - QCOW2_DISCARD_SNAPSHOT, true); + if (s->snapshots) { + uint64_t start_sector; + int sector_step = INT_MAX / BDRV_SECTOR_SIZE; + + /* If there are snapshots, every active cluster has to be discarded */ + + for (start_sector = 0; start_sector < bs->total_sectors; + start_sector += sector_step) + { + /* As this function is generally used after committing an external + * snapshot, QCOW2_DISCARD_SNAPSHOT seems appropriate. Also, the + * default action for this kind of discard is to pass the discard, + * which will ideally result in an actually smaller image file, as + * is probably desired. */ + ret = qcow2_discard_clusters(bs, start_sector * BDRV_SECTOR_SIZE, + MIN(sector_step, + bs->total_sectors - start_sector), + QCOW2_DISCARD_SNAPSHOT, true); + if (ret < 0) { + break; + } + } + } else { + int l1_clusters; + int64_t offset; + uint64_t *new_reftable; + uint8_t l1_ofs_rt_ofs_cls[20]; /* L1 offset; RT offset and clusters */ + uint64_t rt_entry; + + ret = qcow2_cache_empty(bs, s->l2_table_cache); if (ret < 0) { - break; + return ret; + } + + ret = qcow2_cache_empty(bs, s->refcount_block_cache); + if (ret < 0) { + return ret; + } + + /* Refcounts will be broken utterly */ + ret = qcow2_mark_dirty(bs); + if (ret < 0) { + return ret; + } + + l1_clusters = DIV_ROUND_UP(s->l1_size, + s->cluster_size / sizeof(uint64_t)); + new_reftable = g_try_new0(uint64_t, s->cluster_size / sizeof(uint64_t)); + if (!new_reftable) { + return -ENOMEM; + } + + BLKDBG_EVENT(bs->file, BLKDBG_EMPTY_IMAGE_PREPARE); + + /* Overwrite enough clusters at the beginning of the sectors to place + * the refcount table, a refcount block and the L1 table in; this may + * overwrite parts of the existing refcount and L1 table, which is not + * an issue because the dirty flag is set, complete data loss is in fact + * desired and partial data loss is consequently fine as well */ + ret = bdrv_write_zeroes(bs->file, s->cluster_size / BDRV_SECTOR_SIZE, + (2 + l1_clusters) * s->cluster_size / + BDRV_SECTOR_SIZE, 0); + if (ret < 0) { + g_free(new_reftable); + return ret; + } + + BLKDBG_EVENT(bs->file, BLKDBG_L1_UPDATE); + BLKDBG_EVENT(bs->file, BLKDBG_REFTABLE_UPDATE); + + /* "Create" an empty reftable (one cluster) directly after the image + * header and an empty L1 table three clusters after the image header; + * the cluster between those two will be used as the first refblock */ + cpu_to_be64w((uint64_t *)&l1_ofs_rt_ofs_cls[ 0], 3 * s->cluster_size); + cpu_to_be64w((uint64_t *)&l1_ofs_rt_ofs_cls[ 8], s->cluster_size); + cpu_to_be32w((uint32_t *)&l1_ofs_rt_ofs_cls[16], 1); + ret = bdrv_pwrite_sync(bs->file, offsetof(QCowHeader, l1_table_offset), + l1_ofs_rt_ofs_cls, sizeof(l1_ofs_rt_ofs_cls)); + if (ret < 0) { + g_free(new_reftable); + return ret; + } + + s->l1_table_offset = 3 * s->cluster_size; + memset(s->l1_table, 0, s->l1_size * sizeof(uint64_t)); + + s->refcount_table_offset = s->cluster_size; + s->refcount_table_size = s->cluster_size / sizeof(uint64_t); + + g_free(s->refcount_table); + s->refcount_table = new_reftable; + + BLKDBG_EVENT(bs->file, BLKDBG_REFBLOCK_ALLOC); + + /* Enter the first refblock into the reftable */ + rt_entry = cpu_to_be64(2 * s->cluster_size); + ret = bdrv_pwrite_sync(bs->file, s->cluster_size, + &rt_entry, sizeof(rt_entry)); + if (ret < 0) { + return ret; + } + + s->refcount_table[0] = 2 * s->cluster_size; + + ret = bdrv_truncate(bs->file, (3 + l1_clusters) * s->cluster_size); + if (ret < 0) { + return ret; + } + + s->free_cluster_index = 0; + offset = qcow2_alloc_clusters(bs, 3 * s->cluster_size + + s->l1_size * sizeof(uint64_t)); + if (offset < 0) { + return offset; + } else if (offset > 0) { + error_report("First cluster in emptied image is in use"); + abort(); + } + + ret = qcow2_mark_clean(bs); + if (ret < 0) { + return ret; } } diff --git a/include/block/block.h b/include/block/block.h index 8f4ad16..7ac4caf 100644 --- a/include/block/block.h +++ b/include/block/block.h @@ -557,6 +557,8 @@ typedef enum { BLKDBG_PWRITEV_ZERO, BLKDBG_PWRITEV_DONE, + BLKDBG_EMPTY_IMAGE_PREPARE, + BLKDBG_EVENT_MAX, } BlkDebugEvent;
bdrv_make_empty() is currently only called if the current image represents an external snapshot that has been committed to its base image; it is therefore unlikely to have internal snapshots. In this case, bdrv_make_empty() can be greatly sped up by emptying the L1 and refcount table (while having the dirty flag set) and creating a trivial refcount structure. If there are snapshots, fall back to the simple implementation (discard all clusters). Signed-off-by: Max Reitz <mreitz@redhat.com> --- block/blkdebug.c | 2 + block/qcow2.c | 137 ++++++++++++++++++++++++++++++++++++++++++++------ include/block/block.h | 2 + 3 files changed, 126 insertions(+), 15 deletions(-)