From patchwork Wed Dec 12 13:46:24 2012 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Paolo Bonzini X-Patchwork-Id: 205519 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from lists.gnu.org (lists.gnu.org [208.118.235.17]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (Client did not present a certificate) by ozlabs.org (Postfix) with ESMTPS id 8CB042C0092 for ; Thu, 13 Dec 2012 00:47:57 +1100 (EST) Received: from localhost ([::1]:47057 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1TimfD-0007To-MY for incoming@patchwork.ozlabs.org; Wed, 12 Dec 2012 08:47:55 -0500 Received: from eggs.gnu.org ([208.118.235.92]:58987) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Timei-0007Id-0r for qemu-devel@nongnu.org; Wed, 12 Dec 2012 08:47:25 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1TimeX-0003jL-HR for qemu-devel@nongnu.org; Wed, 12 Dec 2012 08:47:23 -0500 Received: from mail-ie0-f173.google.com ([209.85.223.173]:62359) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1TimeX-0003jH-Am for qemu-devel@nongnu.org; Wed, 12 Dec 2012 08:47:13 -0500 Received: by mail-ie0-f173.google.com with SMTP id e13so1768458iej.4 for ; Wed, 12 Dec 2012 05:47:12 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=sender:from:to:cc:subject:date:message-id:x-mailer:in-reply-to :references; bh=rQBTmnAHkXOrBWPcylGaWuPz02x8lb6ZrMz2gFt/0GU=; b=Fz0OXpEd4NkSMwifPy9E9H83wjaJnuOxFumYytc2M/4b65pc8GsBfcAyewKwb5160P kEIXvzs/Ds+9Kl4ClBgmDWAZ5cGyjMbd/69GrnVk0/Xk1Kjcw8MMCuIjuMFiS2erBavg Jj1JZ+1C864hI9OGw/Iju6RUSmdqAAv5BUxnzT/8nuy/+EoGGGF6yfCgCQCkIOuUQERy k0GAK0DgczR2sulVcmMp8DXJYyswd8zWH6kHSJOtn2qGlOnqqE2/HYaekYySw0HllfzM XjwfrIIxVd/vZNyO0d1F5Mn2qqRBRxZDdJ1EO8aZ2vs15KLPM4n4IWXGP4P+Ai1wVEae ik4Q== Received: by 10.50.40.229 with SMTP id a5mr13511119igl.59.1355320032825; Wed, 12 Dec 2012 05:47:12 -0800 (PST) Received: from yakj.usersys.redhat.com (93-34-219-150.ip51.fastwebnet.it. [93.34.219.150]) by mx.google.com with ESMTPS id fv6sm1786818igc.17.2012.12.12.05.47.09 (version=TLSv1/SSLv3 cipher=OTHER); Wed, 12 Dec 2012 05:47:11 -0800 (PST) From: Paolo Bonzini To: qemu-devel@nongnu.org Date: Wed, 12 Dec 2012 14:46:24 +0100 Message-Id: <1355319999-30627-6-git-send-email-pbonzini@redhat.com> X-Mailer: git-send-email 1.8.0.1 In-Reply-To: <1355319999-30627-1-git-send-email-pbonzini@redhat.com> References: <1355319999-30627-1-git-send-email-pbonzini@redhat.com> X-detected-operating-system: by eggs.gnu.org: GNU/Linux 3.x [fuzzy] X-Received-From: 209.85.223.173 Cc: kwolf@redhat.com, jcody@redhat.com, stefanha@redhat.com Subject: [Qemu-devel] [PATCH 05/20] mirror: perform COW if the cluster size is bigger than the granularity X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org When mirroring runs, the backing files for the target may not yet be ready. However, this means that a copy-on-write operation on the target would fill the missing sectors with zeros. Copy-on-write only happens if the granularity of the dirty bitmap is smaller than the cluster size (and only for clusters that are allocated in the source after the job has started copying). So far, the granularity was fixed to 1MB; to avoid the problem we detected the situation and required the backing files to be available in that case only. However, we want to lower the granularity for efficiency, so we need a better solution. The solution is to always copy a whole cluster the first time it is touched. The code keeps a bitmap of clusters that have already been allocated by the mirroring job, and only does "manual" copy-on-write if the chunk being copied is zero in the bitmap. Signed-off-by: Paolo Bonzini Reviewed-by: Eric Blake --- block/mirror.c | 60 +++++++++++++++++++++++++++++++++++++++------- blockdev.c | 15 +++--------- tests/qemu-iotests/041 | 21 ++++++++++++++++ tests/qemu-iotests/041.out | 4 ++-- trace-events | 1 + 5 files changed, 78 insertions(+), 23 deletions(-) diff --git a/block/mirror.c b/block/mirror.c index 30bb267..14ce1a7 100644 --- a/block/mirror.c +++ b/block/mirror.c @@ -15,6 +15,7 @@ #include "blockjob.h" #include "block_int.h" #include "qemu/ratelimit.h" +#include "bitmap.h" enum { /* @@ -36,6 +37,8 @@ typedef struct MirrorBlockJob { bool synced; bool should_complete; int64_t sector_num; + size_t buf_size; + unsigned long *cow_bitmap; HBitmapIter hbi; uint8_t *buf; } MirrorBlockJob; @@ -60,7 +63,7 @@ static int coroutine_fn mirror_iteration(MirrorBlockJob *s, BlockDriverState *target = s->target; QEMUIOVector qiov; int ret, nb_sectors; - int64_t end; + int64_t end, sector_num, cluster_num; struct iovec iov; s->sector_num = hbitmap_iter_next(&s->hbi); @@ -71,22 +74,41 @@ static int coroutine_fn mirror_iteration(MirrorBlockJob *s, assert(s->sector_num >= 0); } + /* If we have no backing file yet in the destination, and the cluster size + * is very large, we need to do COW ourselves. The first time a cluster is + * copied, copy it entirely. + * + * Because both BDRV_SECTORS_PER_DIRTY_CHUNK and the cluster size are + * powers of two, the number of sectors to copy cannot exceed one cluster. + */ + sector_num = s->sector_num; + nb_sectors = BDRV_SECTORS_PER_DIRTY_CHUNK; + cluster_num = sector_num / BDRV_SECTORS_PER_DIRTY_CHUNK; + if (s->cow_bitmap && !test_bit(cluster_num, s->cow_bitmap)) { + trace_mirror_cow(s, sector_num); + bdrv_round_to_clusters(s->target, + sector_num, BDRV_SECTORS_PER_DIRTY_CHUNK, + §or_num, &nb_sectors); + bitmap_set(s->cow_bitmap, sector_num / BDRV_SECTORS_PER_DIRTY_CHUNK, + nb_sectors / BDRV_SECTORS_PER_DIRTY_CHUNK); + } + end = s->common.len >> BDRV_SECTOR_BITS; - nb_sectors = MIN(BDRV_SECTORS_PER_DIRTY_CHUNK, end - s->sector_num); - bdrv_reset_dirty(source, s->sector_num, nb_sectors); + nb_sectors = MIN(nb_sectors, end - sector_num); + bdrv_reset_dirty(source, sector_num, nb_sectors); /* Copy the dirty cluster. */ iov.iov_base = s->buf; iov.iov_len = nb_sectors * 512; qemu_iovec_init_external(&qiov, &iov, 1); - trace_mirror_one_iteration(s, s->sector_num, nb_sectors); - ret = bdrv_co_readv(source, s->sector_num, nb_sectors, &qiov); + trace_mirror_one_iteration(s, sector_num, nb_sectors); + ret = bdrv_co_readv(source, sector_num, nb_sectors, &qiov); if (ret < 0) { *p_action = mirror_error_action(s, true, -ret); goto fail; } - ret = bdrv_co_writev(target, s->sector_num, nb_sectors, &qiov); + ret = bdrv_co_writev(target, sector_num, nb_sectors, &qiov); if (ret < 0) { *p_action = mirror_error_action(s, false, -ret); s->synced = false; @@ -96,7 +118,7 @@ static int coroutine_fn mirror_iteration(MirrorBlockJob *s, fail: /* Try again later. */ - bdrv_set_dirty(source, s->sector_num, nb_sectors); + bdrv_set_dirty(source, sector_num, nb_sectors); return ret; } @@ -104,7 +126,9 @@ static void coroutine_fn mirror_run(void *opaque) { MirrorBlockJob *s = opaque; BlockDriverState *bs = s->common.bs; - int64_t sector_num, end; + int64_t sector_num, end, length; + BlockDriverInfo bdi; + char backing_filename[1024]; int ret = 0; int n; @@ -118,8 +142,23 @@ static void coroutine_fn mirror_run(void *opaque) return; } + /* If we have no backing file yet in the destination, we cannot let + * the destination do COW. Instead, we copy sectors around the + * dirty data if needed. We need a bitmap to do that. + */ + bdrv_get_backing_filename(s->target, backing_filename, + sizeof(backing_filename)); + if (backing_filename[0] && !s->target->backing_hd) { + bdrv_get_info(s->target, &bdi); + if (s->buf_size < bdi.cluster_size) { + s->buf_size = bdi.cluster_size; + length = (bdrv_getlength(bs) + BLOCK_SIZE - 1) / BLOCK_SIZE; + s->cow_bitmap = bitmap_new(length); + } + } + end = s->common.len >> BDRV_SECTOR_BITS; - s->buf = qemu_blockalign(bs, BLOCK_SIZE); + s->buf = qemu_blockalign(bs, s->buf_size); if (s->mode != MIRROR_SYNC_MODE_NONE) { /* First part, loop on the sectors and initialize the dirty bitmap. */ @@ -234,6 +273,7 @@ static void coroutine_fn mirror_run(void *opaque) immediate_exit: g_free(s->buf); + g_free(s->cow_bitmap); bdrv_set_dirty_tracking(bs, false); bdrv_iostatus_disable(s->target); if (s->should_complete && ret == 0) { @@ -320,6 +360,8 @@ void mirror_start(BlockDriverState *bs, BlockDriverState *target, s->on_target_error = on_target_error; s->target = target; s->mode = mode; + s->buf_size = BLOCK_SIZE; + bdrv_set_dirty_tracking(bs, true); bdrv_set_enable_write_cache(s->target, true); bdrv_set_on_error(s->target, on_target_error, on_target_error); diff --git a/blockdev.c b/blockdev.c index e73fd6e..f309dba 100644 --- a/blockdev.c +++ b/blockdev.c @@ -1196,7 +1196,6 @@ void qmp_drive_mirror(const char *device, const char *target, bool has_on_target_error, BlockdevOnError on_target_error, Error **errp) { - BlockDriverInfo bdi; BlockDriverState *bs; BlockDriverState *source, *target_bs; BlockDriver *proto_drv; @@ -1287,6 +1286,9 @@ void qmp_drive_mirror(const char *device, const char *target, return; } + /* Mirroring takes care of copy-on-write using the source's backing + * file. + */ target_bs = bdrv_new(""); ret = bdrv_open(target_bs, target, flags | BDRV_O_NO_BACKING, drv); @@ -1296,17 +1298,6 @@ void qmp_drive_mirror(const char *device, const char *target, return; } - /* We need a backing file if we will copy parts of a cluster. */ - if (bdrv_get_info(target_bs, &bdi) >= 0 && bdi.cluster_size != 0 && - bdi.cluster_size >= BDRV_SECTORS_PER_DIRTY_CHUNK * 512) { - ret = bdrv_open_backing_file(target_bs); - if (ret < 0) { - bdrv_delete(target_bs); - error_set(errp, QERR_OPEN_FILE_FAILED, target); - return; - } - } - mirror_start(bs, target_bs, speed, sync, on_source_error, on_target_error, block_job_cb, bs, &local_err); if (local_err != NULL) { diff --git a/tests/qemu-iotests/041 b/tests/qemu-iotests/041 index c6eb851..d0e9a7f 100755 --- a/tests/qemu-iotests/041 +++ b/tests/qemu-iotests/041 @@ -292,6 +292,27 @@ class TestMirrorNoBacking(ImageMirroringTestCase): self.assertTrue(self.compare_images(test_img, target_img), 'target image does not match source after mirroring') + def test_large_cluster(self): + self.assert_no_active_mirrors() + + # qemu-img create fails if the image is not there + qemu_img('create', '-f', iotests.imgfmt, '-o', 'size=%d' + %(TestMirrorNoBacking.image_len), target_backing_img) + qemu_img('create', '-f', iotests.imgfmt, '-o', 'cluster_size=%d,backing_file=%s' + % (TestMirrorNoBacking.image_len, target_backing_img), target_img) + os.remove(target_backing_img) + + result = self.vm.qmp('drive-mirror', device='drive0', sync='full', + mode='existing', target=target_img) + self.assert_qmp(result, 'return', {}) + + self.complete_and_wait() + result = self.vm.qmp('query-block') + self.assert_qmp(result, 'return[0]/inserted/file', target_img) + self.vm.shutdown() + self.assertTrue(self.compare_images(test_img, target_img), + 'target image does not match source after mirroring') + class TestReadErrors(ImageMirroringTestCase): image_len = 2 * 1024 * 1024 # MB diff --git a/tests/qemu-iotests/041.out b/tests/qemu-iotests/041.out index 71009c2..4176bb9 100644 --- a/tests/qemu-iotests/041.out +++ b/tests/qemu-iotests/041.out @@ -1,5 +1,5 @@ -.................. +................... ---------------------------------------------------------------------- -Ran 18 tests +Ran 19 tests OK diff --git a/trace-events b/trace-events index b868ac6..b318b4e 100644 --- a/trace-events +++ b/trace-events @@ -84,6 +84,7 @@ mirror_before_flush(void *s) "s %p" mirror_before_drain(void *s, int64_t cnt) "s %p dirty count %"PRId64 mirror_before_sleep(void *s, int64_t cnt, int synced) "s %p dirty count %"PRId64" synced %d" mirror_one_iteration(void *s, int64_t sector_num, int nb_sectors) "s %p sector_num %"PRId64" nb_sectors %d" +mirror_cow(void *s, int64_t sector_num) "s %p sector_num %"PRId64 # blockdev.c qmp_block_job_cancel(void *job) "job %p"