From patchwork Fri Jan 18 16:22:48 2013 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Paolo Bonzini X-Patchwork-Id: 213669 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from lists.gnu.org (lists.gnu.org [208.118.235.17]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (Client did not present a certificate) by ozlabs.org (Postfix) with ESMTPS id 6AC972C0084 for ; Sat, 19 Jan 2013 03:23:04 +1100 (EST) Received: from localhost ([::1]:33473 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1TwEic-0005s1-Io for incoming@patchwork.ozlabs.org; Fri, 18 Jan 2013 11:23:02 -0500 Received: from eggs.gnu.org ([208.118.235.92]:51253) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1TwEiU-0005rX-6F for qemu-devel@nongnu.org; Fri, 18 Jan 2013 11:22:56 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1TwEiS-0003CD-OH for qemu-devel@nongnu.org; Fri, 18 Jan 2013 11:22:54 -0500 Received: from mx1.redhat.com ([209.132.183.28]:20236) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1TwEiS-0003C9-GH for qemu-devel@nongnu.org; Fri, 18 Jan 2013 11:22:52 -0500 Received: from int-mx10.intmail.prod.int.phx2.redhat.com (int-mx10.intmail.prod.int.phx2.redhat.com [10.5.11.23]) by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id r0IGMpNO026559 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK) for ; Fri, 18 Jan 2013 11:22:51 -0500 Received: from yakj.usersys.redhat.com (ovpn-112-26.ams2.redhat.com [10.36.112.26]) by int-mx10.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id r0IGMnEq024792; Fri, 18 Jan 2013 11:22:49 -0500 Message-ID: <50F976D8.60303@redhat.com> Date: Fri, 18 Jan 2013 17:22:48 +0100 From: Paolo Bonzini User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130110 Thunderbird/17.0.2 MIME-Version: 1.0 To: Kevin Wolf References: <1358357479-7912-1-git-send-email-pbonzini@redhat.com> <1358357479-7912-6-git-send-email-pbonzini@redhat.com> <50F9668A.4080308@redhat.com> In-Reply-To: <50F9668A.4080308@redhat.com> X-Scanned-By: MIMEDefang 2.68 on 10.5.11.23 X-detected-operating-system: by eggs.gnu.org: GNU/Linux 3.x X-Received-From: 209.132.183.28 Cc: qemu-devel@nongnu.org, stefanha@redhat.com Subject: Re: [Qemu-devel] [PATCH v2 05/12] mirror: perform COW if the cluster size is bigger than the granularity X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Il 18/01/2013 16:13, Kevin Wolf ha scritto: > Am 16.01.2013 18:31, schrieb Paolo Bonzini: >> When mirroring runs, the backing files for the target may not yet be >> ready. However, this means that a copy-on-write operation on the target >> would fill the missing sectors with zeros. Copy-on-write only happens >> if the granularity of the dirty bitmap is smaller than the cluster size >> (and only for clusters that are allocated in the source after the job >> has started copying). So far, the granularity was fixed to 1MB; to avoid >> the problem we detected the situation and required the backing files to >> be available in that case only. >> >> However, we want to lower the granularity for efficiency, so we need >> a better solution. The solution is to always copy a whole cluster the >> first time it is touched. The code keeps a bitmap of clusters that >> have already been allocated by the mirroring job, and only does "manual" >> copy-on-write if the chunk being copied is zero in the bitmap. >> >> Signed-off-by: Paolo Bonzini >> --- >> v1->v2: rebased for moved include files >> >> block/mirror.c | 60 +++++++++++++++++++++++++++++++++++++------ >> blockdev.c | 15 ++--------- >> tests/qemu-iotests/041 | 21 +++++++++++++++ >> tests/qemu-iotests/041.out | 4 +- >> trace-events | 1 + >> 5 files changed, 78 insertions(+), 23 deletions(-) >> >> diff --git a/block/mirror.c b/block/mirror.c >> index 20cb1e7..ee45e2e 100644 >> --- a/block/mirror.c >> +++ b/block/mirror.c >> @@ -15,6 +15,7 @@ >> #include "block/blockjob.h" >> #include "block/block_int.h" >> #include "qemu/ratelimit.h" >> +#include "qemu/bitmap.h" >> >> enum { >> /* >> @@ -36,6 +37,8 @@ typedef struct MirrorBlockJob { >> bool synced; >> bool should_complete; >> int64_t sector_num; >> + size_t buf_size; >> + unsigned long *cow_bitmap; >> HBitmapIter hbi; >> uint8_t *buf; >> } MirrorBlockJob; >> @@ -60,7 +63,7 @@ static int coroutine_fn mirror_iteration(MirrorBlockJob *s, >> BlockDriverState *target = s->target; >> QEMUIOVector qiov; >> int ret, nb_sectors; >> - int64_t end; >> + int64_t end, sector_num, cluster_num; >> struct iovec iov; >> >> s->sector_num = hbitmap_iter_next(&s->hbi); >> @@ -71,22 +74,41 @@ static int coroutine_fn mirror_iteration(MirrorBlockJob *s, >> assert(s->sector_num >= 0); >> } >> >> + /* If we have no backing file yet in the destination, and the cluster size >> + * is very large, we need to do COW ourselves. The first time a cluster is >> + * copied, copy it entirely. >> + * >> + * Because both BDRV_SECTORS_PER_DIRTY_CHUNK and the cluster size are >> + * powers of two, the number of sectors to copy cannot exceed one cluster. >> + */ >> + sector_num = s->sector_num; >> + nb_sectors = BDRV_SECTORS_PER_DIRTY_CHUNK; >> + cluster_num = sector_num / BDRV_SECTORS_PER_DIRTY_CHUNK; >> + if (s->cow_bitmap && !test_bit(cluster_num, s->cow_bitmap)) { >> + trace_mirror_cow(s, sector_num); >> + bdrv_round_to_clusters(s->target, >> + sector_num, BDRV_SECTORS_PER_DIRTY_CHUNK, >> + §or_num, &nb_sectors); >> + bitmap_set(s->cow_bitmap, sector_num / BDRV_SECTORS_PER_DIRTY_CHUNK, >> + nb_sectors / BDRV_SECTORS_PER_DIRTY_CHUNK); > > Here the bit in the cow_bitmap is set before the COW has actually been > performed. It could still fail. > >> + } >> + >> end = s->common.len >> BDRV_SECTOR_BITS; >> - nb_sectors = MIN(BDRV_SECTORS_PER_DIRTY_CHUNK, end - s->sector_num); >> - bdrv_reset_dirty(source, s->sector_num, nb_sectors); >> + nb_sectors = MIN(nb_sectors, end - sector_num); >> + bdrv_reset_dirty(source, sector_num, nb_sectors); >> >> /* Copy the dirty cluster. */ >> iov.iov_base = s->buf; >> iov.iov_len = nb_sectors * 512; >> qemu_iovec_init_external(&qiov, &iov, 1); >> >> - trace_mirror_one_iteration(s, s->sector_num, nb_sectors); >> - ret = bdrv_co_readv(source, s->sector_num, nb_sectors, &qiov); >> + trace_mirror_one_iteration(s, sector_num, nb_sectors); >> + ret = bdrv_co_readv(source, sector_num, nb_sectors, &qiov); >> if (ret < 0) { >> *p_action = mirror_error_action(s, true, -ret); >> goto fail; >> } >> - ret = bdrv_co_writev(target, s->sector_num, nb_sectors, &qiov); >> + ret = bdrv_co_writev(target, sector_num, nb_sectors, &qiov); >> if (ret < 0) { >> *p_action = mirror_error_action(s, false, -ret); >> s->synced = false; >> @@ -96,7 +118,7 @@ static int coroutine_fn mirror_iteration(MirrorBlockJob *s, >> >> fail: >> /* Try again later. */ >> - bdrv_set_dirty(source, s->sector_num, nb_sectors); >> + bdrv_set_dirty(source, sector_num, nb_sectors); > > If it does, we mark the whole cluster dirty now, but in the cow_bitmap > it's still marked at present on the target. When restarting the job, > wouldn't it copy only the start of the cluster next time and corrupt the > rest of it? Yes, very good catch. I think this should fix it. I haven't written a testcase for it, it's tricky but should be doable. Do you want me to respin, or can it be done as a followup? I would prefer a followup also because it will give a better pointer when we backport this fix to the RHEL6 code. Paolo diff --git a/block/mirror.c b/block/mirror.c index 82abc2f..0fc140a 100644 --- a/block/mirror.c +++ b/block/mirror.c @@ -87,6 +87,9 @@ static void mirror_iteration_done(MirrorOp *op) cluster_num = op->sector_num / s->granularity; nb_chunks = op->nb_sectors / s->granularity; bitmap_clear(s->in_flight_bitmap, cluster_num, nb_chunks); + if (s->cow_bitmap) { + bitmap_set(s->cow_bitmap, cluster_num, nb_chunks); + } trace_mirror_iteration_done(s, op->sector_num, op->nb_sectors); g_slice_free(MirrorOp, op); @@ -217,9 +220,6 @@ static void coroutine_fn mirror_iteration(MirrorBlockJob *s) /* We have enough free space to copy these sectors. */ bitmap_set(s->in_flight_bitmap, next_cluster, added_chunks); - if (s->cow_bitmap) { - bitmap_set(s->cow_bitmap, next_cluster, added_chunks); - } nb_sectors += added_sectors; nb_chunks += added_chunks;