From patchwork Tue Jul 24 11:04:22 2012 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Paolo Bonzini X-Patchwork-Id: 172876 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from lists.gnu.org (lists.gnu.org [208.118.235.17]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (Client did not present a certificate) by ozlabs.org (Postfix) with ESMTPS id D9D9E2C0090 for ; Tue, 24 Jul 2012 22:45:24 +1000 (EST) Received: from localhost ([::1]:47798 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1SteUN-0002ll-2G for incoming@patchwork.ozlabs.org; Tue, 24 Jul 2012 08:45:23 -0400 Received: from eggs.gnu.org ([208.118.235.92]:57114) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Stcx6-0000QN-Te for qemu-devel@nongnu.org; Tue, 24 Jul 2012 07:06:58 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1Stcx0-0005wP-6w for qemu-devel@nongnu.org; Tue, 24 Jul 2012 07:06:56 -0400 Received: from mail-gg0-f173.google.com ([209.85.161.173]:58807) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Stcx0-0005ql-0X for qemu-devel@nongnu.org; Tue, 24 Jul 2012 07:06:50 -0400 Received: by mail-gg0-f173.google.com with SMTP id p1so6580043ggn.4 for ; Tue, 24 Jul 2012 04:06:49 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=sender:from:to:cc:subject:date:message-id:x-mailer:in-reply-to :references; bh=/5UaYtWUJ/OyWLwKjQYG5OjjZomc0T0fVGP2nE1Ue6o=; b=ubxZlnAOKyeac5rv5YiF9D5zU4A0rSjw12Z/Ov6tA8F3Y7ioE75HXuenJJjE3Vv751 0OPMB8LUpE207IXcy+5jNKASD4JTXkTAHBtudXSh8zWpIh0sQowgjlpqzvtNF7w2c828 Ev0joJiWH/fyukj/kRkUmoAtdndMUZOrhPJ9hHOlUR6S1dZ3uQ8T7mjDyrgjdC2h5MSL Dxbn4BMGmB4Rm0HYoPKHtk3PiexflElH4b+a/g6gIz5iBmHPa6nUfqrY7+KUAZC7utlW GwZurgLaJxCbo/CAPxrCMgc6BYcKwehdhcrMqYVe3PE337MfKL1MNd4lohKfCU0fQHo4 U1VA== Received: by 10.42.38.200 with SMTP id d8mr9210886ice.19.1343128009689; Tue, 24 Jul 2012 04:06:49 -0700 (PDT) Received: from yakj.usersys.redhat.com (93-34-189-113.ip51.fastwebnet.it. [93.34.189.113]) by mx.google.com with ESMTPS id if4sm1752561igc.10.2012.07.24.04.06.46 (version=TLSv1/SSLv3 cipher=OTHER); Tue, 24 Jul 2012 04:06:48 -0700 (PDT) From: Paolo Bonzini To: qemu-devel@nongnu.org Date: Tue, 24 Jul 2012 13:04:22 +0200 Message-Id: <1343127865-16608-45-git-send-email-pbonzini@redhat.com> X-Mailer: git-send-email 1.7.10.4 In-Reply-To: <1343127865-16608-1-git-send-email-pbonzini@redhat.com> References: <1343127865-16608-1-git-send-email-pbonzini@redhat.com> X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 209.85.161.173 Cc: kwolf@redhat.com, jcody@redhat.com, eblake@redhat.com, stefanha@linux.vnet.ibm.com Subject: [Qemu-devel] [PATCH 44/47] mirror: switch mirror_iteration to AIO X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org There is really no change in the behavior of the job here, since there is still a maximum of one in-flight I/O operation between the source and the target. However, this patch already introduces moves the copy logic from mirror_iteration to AIO callbacks; it also adds the logic to count in-flight operations, and only complete the job after they have finished. Some care is required in the error and cancellation cases, in order to avoid access to dangling pointers (and consequent corruption). Signed-off-by: Paolo Bonzini --- block/mirror.c | 161 ++++++++++++++++++++++++++++++++++++++++++-------------- trace-events | 2 + 2 files changed, 123 insertions(+), 40 deletions(-) diff --git a/block/mirror.c b/block/mirror.c index 81a600b..971c923 100644 --- a/block/mirror.c +++ b/block/mirror.c @@ -17,7 +17,7 @@ #include "qemu/ratelimit.h" #include "bitmap.h" -#define SLICE_TIME 100000000ULL /* ns */ +#define SLICE_TIME 100000000ULL /* ns */ typedef struct MirrorBlockJob { BlockJob common; @@ -33,17 +33,78 @@ typedef struct MirrorBlockJob { unsigned long *cow_bitmap; HBitmapIter hbi; uint8_t *buf; + + int in_flight; + int ret; } MirrorBlockJob; -static int coroutine_fn mirror_iteration(MirrorBlockJob *s, - BlockErrorAction *p_action) +typedef struct MirrorOp { + MirrorBlockJob *s; + QEMUIOVector qiov; + struct iovec iov; + int64_t sector_num; + int nb_sectors; +} MirrorOp; + +static void mirror_iteration_done(MirrorOp *op) { + MirrorBlockJob *s = op->s; + + s->in_flight--; + trace_mirror_iteration_done(s, op->sector_num, op->nb_sectors); + g_slice_free(MirrorOp, op); + qemu_coroutine_enter(s->common.co, NULL); +} + +static void mirror_write_complete(void *opaque, int ret) +{ + MirrorOp *op = opaque; + MirrorBlockJob *s = op->s; BlockDriverState *source = s->common.bs; BlockDriverState *target = s->target; - QEMUIOVector qiov; - int ret, nb_sectors, nb_sectors_chunk; + BlockErrorAction action; + + if (ret < 0) { + bdrv_set_dirty(source, op->sector_num, op->nb_sectors); + action = block_job_error_action(&s->common, target, s->on_target_error, + false, -ret); + s->synced = false; + if (action == BDRV_ACTION_REPORT && s->ret >= 0) { + s->ret = ret; + } + } + mirror_iteration_done(op); +} + +static void mirror_read_complete(void *opaque, int ret) +{ + MirrorOp *op = opaque; + MirrorBlockJob *s = op->s; + BlockDriverState *source = s->common.bs; + BlockDriverState *target = s->target; + BlockErrorAction action; + if (ret < 0) { + bdrv_set_dirty(source, op->sector_num, op->nb_sectors); + action = block_job_error_action(&s->common, source, s->on_source_error, + true, -ret); + s->synced = false; + if (action == BDRV_ACTION_REPORT && s->ret >= 0) { + s->ret = ret; + } + + mirror_iteration_done(op); + return; + } + bdrv_aio_writev(target, op->sector_num, &op->qiov, op->nb_sectors, + mirror_write_complete, op); +} + +static void coroutine_fn mirror_iteration(MirrorBlockJob *s) +{ + BlockDriverState *source = s->common.bs; + int nb_sectors, nb_sectors_chunk; int64_t end, sector_num, cluster_num; - struct iovec iov; + MirrorOp *op; s->sector_num = hbitmap_iter_next(&s->hbi); if (s->sector_num < 0) { @@ -74,34 +135,30 @@ static int coroutine_fn mirror_iteration(MirrorBlockJob *s, end = s->common.len >> BDRV_SECTOR_BITS; nb_sectors = MIN(nb_sectors, end - sector_num); + + /* Allocate a MirrorOp that is used as an AIO callback. */ + op = g_slice_new(MirrorOp); + op->s = s; + op->iov.iov_base = s->buf; + op->iov.iov_len = nb_sectors * 512; + op->sector_num = sector_num; + op->nb_sectors = nb_sectors; + qemu_iovec_init_external(&op->qiov, &op->iov, 1); + bdrv_reset_dirty(source, sector_num, nb_sectors); /* Copy the dirty cluster. */ - iov.iov_base = s->buf; - iov.iov_len = nb_sectors * 512; - qemu_iovec_init_external(&qiov, &iov, 1); - + s->in_flight++; trace_mirror_one_iteration(s, sector_num, nb_sectors); - ret = bdrv_co_readv(source, sector_num, nb_sectors, &qiov); - if (ret < 0) { - *p_action = block_job_error_action(&s->common, source, - s->on_source_error, true, -ret); - s->synced = false; - goto fail; - } - ret = bdrv_co_writev(target, sector_num, nb_sectors, &qiov); - if (ret < 0) { - *p_action = block_job_error_action(&s->common, target, - s->on_target_error, false, -ret); - s->synced = false; - goto fail; - } - return 0; + bdrv_aio_readv(source, sector_num, &op->qiov, nb_sectors, + mirror_read_complete, op); +} -fail: - /* Try again later. */ - bdrv_set_dirty(source, sector_num, nb_sectors); - return ret; +static void mirror_drain(MirrorBlockJob *s) +{ + while (s->in_flight > 0) { + qemu_coroutine_yield(); + } } static void coroutine_fn mirror_run(void *opaque) @@ -109,6 +166,7 @@ static void coroutine_fn mirror_run(void *opaque) MirrorBlockJob *s = opaque; BlockDriverState *bs = s->common.bs; int64_t sector_num, end, nb_sectors_chunk, length; + uint64_t last_pause_ns; BlockDriverInfo bdi; char backing_filename[1024]; int ret = 0; @@ -168,22 +226,37 @@ static void coroutine_fn mirror_run(void *opaque) } bdrv_dirty_iter_init(bs, &s->hbi); + last_pause_ns = qemu_get_clock_ns(rt_clock); for (;;) { uint64_t delay_ns; int64_t cnt; bool should_complete; + if (s->ret < 0) { + ret = s->ret; + break; + } + cnt = bdrv_get_dirty_count(bs); - if (cnt != 0) { - BlockErrorAction action = BDRV_ACTION_REPORT; - ret = mirror_iteration(s, &action); - if (ret < 0 && action == BDRV_ACTION_REPORT) { - break; + + /* Note that even when no rate limit is applied we need to yield + * periodically with no pending I/O so that qemu_aio_flush() returns. + * We do so every SLICE_TIME milliseconds, or when there is an error, + * or when the source is clean, whichever comes first. + */ + if (qemu_get_clock_ns(rt_clock) - last_pause_ns < SLICE_TIME && + s->common.iostatus == BLOCK_DEVICE_IO_STATUS_OK) { + if (s->in_flight > 0) { + trace_mirror_yield(s, s->in_flight, cnt); + qemu_coroutine_yield(); + continue; + } else if (cnt != 0) { + mirror_iteration(s); + continue; } - cnt = bdrv_get_dirty_count(bs); } - if (cnt != 0) { + if (s->in_flight > 0 || cnt != 0) { should_complete = false; } else { trace_mirror_before_flush(s); @@ -228,15 +301,12 @@ static void coroutine_fn mirror_run(void *opaque) delay_ns = 0; } - /* Note that even when no rate limit is applied we need to yield - * with no pending I/O here so that qemu_aio_flush() returns. - */ block_job_sleep_ns(&s->common, rt_clock, delay_ns); if (block_job_is_cancelled(&s->common)) { break; } } else if (!should_complete) { - delay_ns = (cnt == 0 ? SLICE_TIME : 0); + delay_ns = (s->in_flight == 0 && cnt == 0 ? SLICE_TIME : 0); block_job_sleep_ns(&s->common, rt_clock, delay_ns); } else if (cnt == 0) { /* The two disks are in sync. Exit and report successful @@ -246,9 +316,20 @@ static void coroutine_fn mirror_run(void *opaque) s->common.cancelled = false; break; } + last_pause_ns = qemu_get_clock_ns(rt_clock); + } + + if (s->in_flight > 0) { + /* We get here only if something went wrong. Either the job failed, + * or it was cancelled prematurely so that we do not guarantee that + * the target is a copy of the source. + */ + assert(ret < 0 || (!s->synced && block_job_is_cancelled(&s->common))); + mirror_drain(s); } immediate_exit: + assert(s->in_flight == 0); g_free(s->buf); g_free(s->cow_bitmap); bdrv_set_dirty_tracking(bs, 0); diff --git a/trace-events b/trace-events index 6b504d8..fe20bd7 100644 --- a/trace-events +++ b/trace-events @@ -83,6 +83,8 @@ mirror_before_drain(void *s, int64_t cnt) "s %p dirty count %"PRId64 mirror_before_sleep(void *s, int64_t cnt, int synced) "s %p dirty count %"PRId64" synced %d" mirror_one_iteration(void *s, int64_t sector_num, int nb_sectors) "s %p sector_num %"PRId64" nb_sectors %d" mirror_cow(void *s, int64_t sector_num) "s %p sector_num %"PRId64 +mirror_iteration_done(void *s, int64_t sector_num, int nb_sectors) "s %p sector_num %"PRId64" nb_sectors %d" +mirror_yield(void *s, int64_t cnt, int in_flight) "s %p dirty count %"PRId64" in_flight %d" # blockdev.c qmp_block_job_cancel(void *job) "job %p"