From patchwork Mon Jul 9 16:37:16 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vladimir Sementsov-Ogievskiy X-Patchwork-Id: 941459 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=nongnu.org (client-ip=2001:4830:134:3::11; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=virtuozzo.com Received: from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 41PWL24ywyz9ryt for ; Tue, 10 Jul 2018 02:40:58 +1000 (AEST) Received: from localhost ([::1]:43327 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fcZDc-00054L-CE for incoming@patchwork.ozlabs.org; Mon, 09 Jul 2018 12:40:56 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:57850) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fcZAH-0002bq-7J for qemu-devel@nongnu.org; Mon, 09 Jul 2018 12:37:30 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1fcZAB-0002cr-AP for qemu-devel@nongnu.org; Mon, 09 Jul 2018 12:37:29 -0400 Received: from relay.sw.ru ([185.231.240.75]:32922) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1fcZAB-0002ax-1p; Mon, 09 Jul 2018 12:37:23 -0400 Received: from vz-out.virtuozzo.com ([185.231.240.5] helo=kvm.sw.ru) by relay.sw.ru with esmtp (Exim 4.90_1) (envelope-from ) id 1fcZA7-00062Y-Tc; Mon, 09 Jul 2018 19:37:19 +0300 From: Vladimir Sementsov-Ogievskiy To: qemu-devel@nongnu.org, qemu-block@nongnu.org Date: Mon, 9 Jul 2018 19:37:16 +0300 Message-Id: <20180709163719.87107-2-vsementsov@virtuozzo.com> X-Mailer: git-send-email 2.11.1 In-Reply-To: <20180709163719.87107-1-vsementsov@virtuozzo.com> References: <20180709163719.87107-1-vsementsov@virtuozzo.com> X-detected-operating-system: by eggs.gnu.org: GNU/Linux 3.x [fuzzy] X-Received-From: 185.231.240.75 Subject: [Qemu-devel] [PATCH v5 1/4] block/io: fix copy_range X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: kwolf@redhat.com, vsementsov@virtuozzo.com, famz@redhat.com, ronniesahlberg@gmail.com, jcody@redhat.com, pl@kamp.de, mreitz@redhat.com, stefanha@redhat.com, den@openvz.org, pbonzini@redhat.com, jsnow@redhat.com Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: "Qemu-devel" Here two things are fixed: 1. Architecture On each recursion step, we go to the child of src or dst, only for one of them. So, it's wrong to create tracked requests for both on each step. It leads to tracked requests duplication. 2. Wait for serializing requests on write path independently of BDRV_REQ_NO_SERIALISING Before commit 9ded4a01149 "backup: Use copy offloading", BDRV_REQ_NO_SERIALISING was used for only one case: read in copy-on-write operation during backup. Also, the flag was handled only on read path (in bdrv_co_preadv and bdrv_aligned_preadv). After 9ded4a01149, flag is used for not waiting serializing operations on backup target (in same case of copy-on-write operation). This behavior change is unsubstantiated and potentially dangerous, let's drop it and add additional asserts and documentation. Signed-off-by: Vladimir Sementsov-Ogievskiy Reviewed-by: Fam Zheng --- include/block/block.h | 11 +++++++++++ block/io.c | 42 +++++++++++++++++++++++++++--------------- 2 files changed, 38 insertions(+), 15 deletions(-) diff --git a/include/block/block.h b/include/block/block.h index bc76b1e59f..5320f82ac9 100644 --- a/include/block/block.h +++ b/include/block/block.h @@ -50,6 +50,17 @@ typedef enum { * opened with BDRV_O_UNMAP. */ BDRV_REQ_MAY_UNMAP = 0x4, + + /* The BDRV_REQ_NO_SERIALISING flag is only valid for reads and means that + * we don't want wait_serialising_requests() during read operation. + * + * This flag is used for backup copy on write operation, when we need to + * read old data before write (write notifier triggered). It is okay since + * we already waited for other serializing requests in the initiating write + * (see bdrv_aligned_pwritev), and it is necessary if the initiating write + * is already serializing (without the flag, the read would deadlock + * waiting for the serialising write to complete). + */ BDRV_REQ_NO_SERIALISING = 0x8, BDRV_REQ_FUA = 0x10, BDRV_REQ_WRITE_COMPRESSED = 0x20, diff --git a/block/io.c b/block/io.c index 1a2272fad3..94b6ec94e2 100644 --- a/block/io.c +++ b/block/io.c @@ -1572,6 +1572,8 @@ static int coroutine_fn bdrv_aligned_pwritev(BdrvChild *child, max_transfer = QEMU_ALIGN_DOWN(MIN_NON_ZERO(bs->bl.max_transfer, INT_MAX), align); + /* BDRV_REQ_NO_SERIALISING is only for read operation */ + assert(!(flags & BDRV_REQ_NO_SERIALISING)); waited = wait_serialising_requests(req); assert(!waited || !req->serialising); assert(req->overlap_offset <= offset); @@ -2896,7 +2898,7 @@ static int coroutine_fn bdrv_co_copy_range_internal(BdrvChild *src, BdrvRequestFlags flags, bool recurse_src) { - BdrvTrackedRequest src_req, dst_req; + BdrvTrackedRequest req; int ret; if (!dst || !dst->bs) { @@ -2923,32 +2925,42 @@ static int coroutine_fn bdrv_co_copy_range_internal(BdrvChild *src, || src->bs->encrypted || dst->bs->encrypted) { return -ENOTSUP; } - bdrv_inc_in_flight(src->bs); - bdrv_inc_in_flight(dst->bs); - tracked_request_begin(&src_req, src->bs, src_offset, - bytes, BDRV_TRACKED_READ); - tracked_request_begin(&dst_req, dst->bs, dst_offset, - bytes, BDRV_TRACKED_WRITE); - if (!(flags & BDRV_REQ_NO_SERIALISING)) { - wait_serialising_requests(&src_req); - wait_serialising_requests(&dst_req); - } if (recurse_src) { + bdrv_inc_in_flight(src->bs); + tracked_request_begin(&req, src->bs, src_offset, bytes, + BDRV_TRACKED_READ); + + if (!(flags & BDRV_REQ_NO_SERIALISING)) { + wait_serialising_requests(&req); + } + ret = src->bs->drv->bdrv_co_copy_range_from(src->bs, src, src_offset, dst, dst_offset, bytes, flags); + + tracked_request_end(&req); + bdrv_dec_in_flight(src->bs); } else { + bdrv_inc_in_flight(dst->bs); + tracked_request_begin(&req, dst->bs, dst_offset, bytes, + BDRV_TRACKED_WRITE); + + /* BDRV_REQ_NO_SERIALISING is only for read operation, + * so we ignore it in flags. + */ + wait_serialising_requests(&req); + ret = dst->bs->drv->bdrv_co_copy_range_to(dst->bs, src, src_offset, dst, dst_offset, bytes, flags); + + tracked_request_end(&req); + bdrv_dec_in_flight(dst->bs); } - tracked_request_end(&src_req); - tracked_request_end(&dst_req); - bdrv_dec_in_flight(src->bs); - bdrv_dec_in_flight(dst->bs); + return ret; } From patchwork Mon Jul 9 16:37:17 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vladimir Sementsov-Ogievskiy X-Patchwork-Id: 941445 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=nongnu.org (client-ip=2001:4830:134:3::11; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=virtuozzo.com Received: from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 41PWH23Wskz9ryt for ; Tue, 10 Jul 2018 02:38:22 +1000 (AEST) Received: from localhost ([::1]:43311 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fcZB6-0002ge-3P for incoming@patchwork.ozlabs.org; Mon, 09 Jul 2018 12:38:20 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:57834) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fcZAG-0002bk-83 for qemu-devel@nongnu.org; Mon, 09 Jul 2018 12:37:30 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1fcZAB-0002d7-Eq for qemu-devel@nongnu.org; Mon, 09 Jul 2018 12:37:28 -0400 Received: from relay.sw.ru ([185.231.240.75]:32928) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1fcZAB-0002ay-2C; Mon, 09 Jul 2018 12:37:23 -0400 Received: from vz-out.virtuozzo.com ([185.231.240.5] helo=kvm.sw.ru) by relay.sw.ru with esmtp (Exim 4.90_1) (envelope-from ) id 1fcZA8-00062Y-2B; Mon, 09 Jul 2018 19:37:20 +0300 From: Vladimir Sementsov-Ogievskiy To: qemu-devel@nongnu.org, qemu-block@nongnu.org Date: Mon, 9 Jul 2018 19:37:17 +0300 Message-Id: <20180709163719.87107-3-vsementsov@virtuozzo.com> X-Mailer: git-send-email 2.11.1 In-Reply-To: <20180709163719.87107-1-vsementsov@virtuozzo.com> References: <20180709163719.87107-1-vsementsov@virtuozzo.com> X-detected-operating-system: by eggs.gnu.org: GNU/Linux 3.x [fuzzy] X-Received-From: 185.231.240.75 Subject: [Qemu-devel] [PATCH v5 2/4] block: split flags in copy_range X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: kwolf@redhat.com, vsementsov@virtuozzo.com, famz@redhat.com, ronniesahlberg@gmail.com, jcody@redhat.com, pl@kamp.de, mreitz@redhat.com, stefanha@redhat.com, den@openvz.org, pbonzini@redhat.com, jsnow@redhat.com Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: "Qemu-devel" Pass read flags and write flags separately. This is needed to handle coming BDRV_REQ_NO_SERIALISING clearly in following patches. Signed-off-by: Vladimir Sementsov-Ogievskiy Reviewed-by: Fam Zheng --- include/block/block.h | 3 ++- include/block/block_int.h | 14 +++++++++---- include/sysemu/block-backend.h | 3 ++- block/backup.c | 2 +- block/block-backend.c | 5 +++-- block/file-posix.c | 21 +++++++++++-------- block/io.c | 46 +++++++++++++++++++++++------------------- block/iscsi.c | 9 ++++++--- block/qcow2.c | 20 +++++++++--------- block/raw-format.c | 24 ++++++++++++++-------- qemu-img.c | 2 +- 11 files changed, 90 insertions(+), 59 deletions(-) diff --git a/include/block/block.h b/include/block/block.h index 5320f82ac9..57a96d4988 100644 --- a/include/block/block.h +++ b/include/block/block.h @@ -690,5 +690,6 @@ void bdrv_unregister_buf(BlockDriverState *bs, void *host); **/ int coroutine_fn bdrv_co_copy_range(BdrvChild *src, uint64_t src_offset, BdrvChild *dst, uint64_t dst_offset, - uint64_t bytes, BdrvRequestFlags flags); + uint64_t bytes, BdrvRequestFlags read_flags, + BdrvRequestFlags write_flags); #endif diff --git a/include/block/block_int.h b/include/block/block_int.h index af71b414be..119b6e4ea5 100644 --- a/include/block/block_int.h +++ b/include/block/block_int.h @@ -218,7 +218,8 @@ struct BlockDriver { BdrvChild *dst, uint64_t dst_offset, uint64_t bytes, - BdrvRequestFlags flags); + BdrvRequestFlags read_flags, + BdrvRequestFlags write_flags); /* Map [offset, offset + nbytes) range onto a child of bs to copy data to, * and invoke bdrv_co_copy_range_to(child, src, ...), or perform the copy @@ -234,7 +235,8 @@ struct BlockDriver { BdrvChild *dst, uint64_t dst_offset, uint64_t bytes, - BdrvRequestFlags flags); + BdrvRequestFlags read_flags, + BdrvRequestFlags write_flags); /* * Building block for bdrv_block_status[_above] and @@ -1153,10 +1155,14 @@ void blockdev_close_all_bdrv_states(void); int coroutine_fn bdrv_co_copy_range_from(BdrvChild *src, uint64_t src_offset, BdrvChild *dst, uint64_t dst_offset, - uint64_t bytes, BdrvRequestFlags flags); + uint64_t bytes, + BdrvRequestFlags read_flags, + BdrvRequestFlags write_flags); int coroutine_fn bdrv_co_copy_range_to(BdrvChild *src, uint64_t src_offset, BdrvChild *dst, uint64_t dst_offset, - uint64_t bytes, BdrvRequestFlags flags); + uint64_t bytes, + BdrvRequestFlags read_flags, + BdrvRequestFlags write_flags); int refresh_total_sectors(BlockDriverState *bs, int64_t hint); diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h index 8d03d493c2..830d873f24 100644 --- a/include/sysemu/block-backend.h +++ b/include/sysemu/block-backend.h @@ -234,6 +234,7 @@ void blk_unregister_buf(BlockBackend *blk, void *host); int coroutine_fn blk_co_copy_range(BlockBackend *blk_in, int64_t off_in, BlockBackend *blk_out, int64_t off_out, - int bytes, BdrvRequestFlags flags); + int bytes, BdrvRequestFlags read_flags, + BdrvRequestFlags write_flags); #endif diff --git a/block/backup.c b/block/backup.c index 81895ddbe2..f3e4e814b6 100644 --- a/block/backup.c +++ b/block/backup.c @@ -163,7 +163,7 @@ static int coroutine_fn backup_cow_with_offload(BackupBlockJob *job, hbitmap_reset(job->copy_bitmap, start / job->cluster_size, nr_clusters); ret = blk_co_copy_range(blk, start, job->target, start, nbytes, - is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0); + is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0, 0); if (ret < 0) { trace_backup_do_cow_copy_range_fail(job, start, ret); hbitmap_set(job->copy_bitmap, start / job->cluster_size, diff --git a/block/block-backend.c b/block/block-backend.c index 6b75bca317..ac8c3e0b1c 100644 --- a/block/block-backend.c +++ b/block/block-backend.c @@ -2218,7 +2218,8 @@ void blk_unregister_buf(BlockBackend *blk, void *host) int coroutine_fn blk_co_copy_range(BlockBackend *blk_in, int64_t off_in, BlockBackend *blk_out, int64_t off_out, - int bytes, BdrvRequestFlags flags) + int bytes, BdrvRequestFlags read_flags, + BdrvRequestFlags write_flags) { int r; r = blk_check_byte_request(blk_in, off_in, bytes); @@ -2231,5 +2232,5 @@ int coroutine_fn blk_co_copy_range(BlockBackend *blk_in, int64_t off_in, } return bdrv_co_copy_range(blk_in->root, off_in, blk_out->root, off_out, - bytes, flags); + bytes, read_flags, write_flags); } diff --git a/block/file-posix.c b/block/file-posix.c index 98987b80f1..4fec8cb53c 100644 --- a/block/file-posix.c +++ b/block/file-posix.c @@ -2589,18 +2589,23 @@ static void raw_abort_perm_update(BlockDriverState *bs) raw_handle_perm_lock(bs, RAW_PL_ABORT, 0, 0, NULL); } -static int coroutine_fn raw_co_copy_range_from(BlockDriverState *bs, - BdrvChild *src, uint64_t src_offset, - BdrvChild *dst, uint64_t dst_offset, - uint64_t bytes, BdrvRequestFlags flags) +static int coroutine_fn raw_co_copy_range_from( + BlockDriverState *bs, BdrvChild *src, uint64_t src_offset, + BdrvChild *dst, uint64_t dst_offset, uint64_t bytes, + BdrvRequestFlags read_flags, BdrvRequestFlags write_flags) { - return bdrv_co_copy_range_to(src, src_offset, dst, dst_offset, bytes, flags); + return bdrv_co_copy_range_to(src, src_offset, dst, dst_offset, bytes, + read_flags, write_flags); } static int coroutine_fn raw_co_copy_range_to(BlockDriverState *bs, - BdrvChild *src, uint64_t src_offset, - BdrvChild *dst, uint64_t dst_offset, - uint64_t bytes, BdrvRequestFlags flags) + BdrvChild *src, + uint64_t src_offset, + BdrvChild *dst, + uint64_t dst_offset, + uint64_t bytes, + BdrvRequestFlags read_flags, + BdrvRequestFlags write_flags) { BDRVRawState *s = bs->opaque; BDRVRawState *src_s; diff --git a/block/io.c b/block/io.c index 94b6ec94e2..5d5651ad84 100644 --- a/block/io.c +++ b/block/io.c @@ -2890,13 +2890,11 @@ void bdrv_unregister_buf(BlockDriverState *bs, void *host) } } -static int coroutine_fn bdrv_co_copy_range_internal(BdrvChild *src, - uint64_t src_offset, - BdrvChild *dst, - uint64_t dst_offset, - uint64_t bytes, - BdrvRequestFlags flags, - bool recurse_src) +static int coroutine_fn bdrv_co_copy_range_internal( + BdrvChild *src, uint64_t src_offset, BdrvChild *dst, + uint64_t dst_offset, uint64_t bytes, + BdrvRequestFlags read_flags, BdrvRequestFlags write_flags, + bool recurse_src) { BdrvTrackedRequest req; int ret; @@ -2908,8 +2906,8 @@ static int coroutine_fn bdrv_co_copy_range_internal(BdrvChild *src, if (ret) { return ret; } - if (flags & BDRV_REQ_ZERO_WRITE) { - return bdrv_co_pwrite_zeroes(dst, dst_offset, bytes, flags); + if (write_flags & BDRV_REQ_ZERO_WRITE) { + return bdrv_co_pwrite_zeroes(dst, dst_offset, bytes, write_flags); } if (!src || !src->bs) { @@ -2931,14 +2929,15 @@ static int coroutine_fn bdrv_co_copy_range_internal(BdrvChild *src, tracked_request_begin(&req, src->bs, src_offset, bytes, BDRV_TRACKED_READ); - if (!(flags & BDRV_REQ_NO_SERIALISING)) { + if (!(read_flags & BDRV_REQ_NO_SERIALISING)) { wait_serialising_requests(&req); } ret = src->bs->drv->bdrv_co_copy_range_from(src->bs, src, src_offset, dst, dst_offset, - bytes, flags); + bytes, + read_flags, write_flags); tracked_request_end(&req); bdrv_dec_in_flight(src->bs); @@ -2947,15 +2946,15 @@ static int coroutine_fn bdrv_co_copy_range_internal(BdrvChild *src, tracked_request_begin(&req, dst->bs, dst_offset, bytes, BDRV_TRACKED_WRITE); - /* BDRV_REQ_NO_SERIALISING is only for read operation, - * so we ignore it in flags. - */ + /* BDRV_REQ_NO_SERIALISING is only for read operation */ + assert(!(write_flags & BDRV_REQ_NO_SERIALISING)); wait_serialising_requests(&req); ret = dst->bs->drv->bdrv_co_copy_range_to(dst->bs, src, src_offset, dst, dst_offset, - bytes, flags); + bytes, + read_flags, write_flags); tracked_request_end(&req); bdrv_dec_in_flight(dst->bs); @@ -2970,10 +2969,12 @@ static int coroutine_fn bdrv_co_copy_range_internal(BdrvChild *src, * semantics. */ int coroutine_fn bdrv_co_copy_range_from(BdrvChild *src, uint64_t src_offset, BdrvChild *dst, uint64_t dst_offset, - uint64_t bytes, BdrvRequestFlags flags) + uint64_t bytes, + BdrvRequestFlags read_flags, + BdrvRequestFlags write_flags) { return bdrv_co_copy_range_internal(src, src_offset, dst, dst_offset, - bytes, flags, true); + bytes, read_flags, write_flags, true); } /* Copy range from @src to @dst. @@ -2982,19 +2983,22 @@ int coroutine_fn bdrv_co_copy_range_from(BdrvChild *src, uint64_t src_offset, * semantics. */ int coroutine_fn bdrv_co_copy_range_to(BdrvChild *src, uint64_t src_offset, BdrvChild *dst, uint64_t dst_offset, - uint64_t bytes, BdrvRequestFlags flags) + uint64_t bytes, + BdrvRequestFlags read_flags, + BdrvRequestFlags write_flags) { return bdrv_co_copy_range_internal(src, src_offset, dst, dst_offset, - bytes, flags, false); + bytes, read_flags, write_flags, false); } int coroutine_fn bdrv_co_copy_range(BdrvChild *src, uint64_t src_offset, BdrvChild *dst, uint64_t dst_offset, - uint64_t bytes, BdrvRequestFlags flags) + uint64_t bytes, BdrvRequestFlags read_flags, + BdrvRequestFlags write_flags) { return bdrv_co_copy_range_from(src, src_offset, dst, dst_offset, - bytes, flags); + bytes, read_flags, write_flags); } static void bdrv_parent_cb_resize(BlockDriverState *bs) diff --git a/block/iscsi.c b/block/iscsi.c index ead2bd5aa7..38b917a1e5 100644 --- a/block/iscsi.c +++ b/block/iscsi.c @@ -2193,9 +2193,11 @@ static int coroutine_fn iscsi_co_copy_range_from(BlockDriverState *bs, BdrvChild *dst, uint64_t dst_offset, uint64_t bytes, - BdrvRequestFlags flags) + BdrvRequestFlags read_flags, + BdrvRequestFlags write_flags) { - return bdrv_co_copy_range_to(src, src_offset, dst, dst_offset, bytes, flags); + return bdrv_co_copy_range_to(src, src_offset, dst, dst_offset, bytes, + read_flags, write_flags); } static struct scsi_task *iscsi_xcopy_task(int param_len) @@ -2332,7 +2334,8 @@ static int coroutine_fn iscsi_co_copy_range_to(BlockDriverState *bs, BdrvChild *dst, uint64_t dst_offset, uint64_t bytes, - BdrvRequestFlags flags) + BdrvRequestFlags read_flags, + BdrvRequestFlags write_flags) { IscsiLun *dst_lun = dst->bs->opaque; IscsiLun *src_lun; diff --git a/block/qcow2.c b/block/qcow2.c index 33b61b7480..867ce02d50 100644 --- a/block/qcow2.c +++ b/block/qcow2.c @@ -3253,13 +3253,14 @@ static int coroutine_fn qcow2_co_copy_range_from(BlockDriverState *bs, BdrvChild *src, uint64_t src_offset, BdrvChild *dst, uint64_t dst_offset, - uint64_t bytes, BdrvRequestFlags flags) + uint64_t bytes, BdrvRequestFlags read_flags, + BdrvRequestFlags write_flags) { BDRVQcow2State *s = bs->opaque; int ret; unsigned int cur_bytes; /* number of bytes in current iteration */ BdrvChild *child = NULL; - BdrvRequestFlags cur_flags; + BdrvRequestFlags cur_write_flags; assert(!bs->encrypted); qemu_co_mutex_lock(&s->lock); @@ -3268,7 +3269,7 @@ qcow2_co_copy_range_from(BlockDriverState *bs, uint64_t copy_offset = 0; /* prepare next request */ cur_bytes = MIN(bytes, INT_MAX); - cur_flags = flags; + cur_write_flags = write_flags; ret = qcow2_get_cluster_offset(bs, src_offset, &cur_bytes, ©_offset); if (ret < 0) { @@ -3280,20 +3281,20 @@ qcow2_co_copy_range_from(BlockDriverState *bs, if (bs->backing && bs->backing->bs) { int64_t backing_length = bdrv_getlength(bs->backing->bs); if (src_offset >= backing_length) { - cur_flags |= BDRV_REQ_ZERO_WRITE; + cur_write_flags |= BDRV_REQ_ZERO_WRITE; } else { child = bs->backing; cur_bytes = MIN(cur_bytes, backing_length - src_offset); copy_offset = src_offset; } } else { - cur_flags |= BDRV_REQ_ZERO_WRITE; + cur_write_flags |= BDRV_REQ_ZERO_WRITE; } break; case QCOW2_CLUSTER_ZERO_PLAIN: case QCOW2_CLUSTER_ZERO_ALLOC: - cur_flags |= BDRV_REQ_ZERO_WRITE; + cur_write_flags |= BDRV_REQ_ZERO_WRITE; break; case QCOW2_CLUSTER_COMPRESSED: @@ -3317,7 +3318,7 @@ qcow2_co_copy_range_from(BlockDriverState *bs, ret = bdrv_co_copy_range_from(child, copy_offset, dst, dst_offset, - cur_bytes, cur_flags); + cur_bytes, read_flags, cur_write_flags); qemu_co_mutex_lock(&s->lock); if (ret < 0) { goto out; @@ -3338,7 +3339,8 @@ static int coroutine_fn qcow2_co_copy_range_to(BlockDriverState *bs, BdrvChild *src, uint64_t src_offset, BdrvChild *dst, uint64_t dst_offset, - uint64_t bytes, BdrvRequestFlags flags) + uint64_t bytes, BdrvRequestFlags read_flags, + BdrvRequestFlags write_flags) { BDRVQcow2State *s = bs->opaque; int offset_in_cluster; @@ -3382,7 +3384,7 @@ qcow2_co_copy_range_to(BlockDriverState *bs, ret = bdrv_co_copy_range_to(src, src_offset, bs->file, cluster_offset + offset_in_cluster, - cur_bytes, flags); + cur_bytes, read_flags, write_flags); qemu_co_mutex_lock(&s->lock); if (ret < 0) { goto fail; diff --git a/block/raw-format.c b/block/raw-format.c index b78da564d4..a3591985f6 100644 --- a/block/raw-format.c +++ b/block/raw-format.c @@ -498,9 +498,13 @@ static int raw_probe_geometry(BlockDriverState *bs, HDGeometry *geo) } static int coroutine_fn raw_co_copy_range_from(BlockDriverState *bs, - BdrvChild *src, uint64_t src_offset, - BdrvChild *dst, uint64_t dst_offset, - uint64_t bytes, BdrvRequestFlags flags) + BdrvChild *src, + uint64_t src_offset, + BdrvChild *dst, + uint64_t dst_offset, + uint64_t bytes, + BdrvRequestFlags read_flags, + BdrvRequestFlags write_flags) { int ret; @@ -509,13 +513,17 @@ static int coroutine_fn raw_co_copy_range_from(BlockDriverState *bs, return ret; } return bdrv_co_copy_range_from(bs->file, src_offset, dst, dst_offset, - bytes, flags); + bytes, read_flags, write_flags); } static int coroutine_fn raw_co_copy_range_to(BlockDriverState *bs, - BdrvChild *src, uint64_t src_offset, - BdrvChild *dst, uint64_t dst_offset, - uint64_t bytes, BdrvRequestFlags flags) + BdrvChild *src, + uint64_t src_offset, + BdrvChild *dst, + uint64_t dst_offset, + uint64_t bytes, + BdrvRequestFlags read_flags, + BdrvRequestFlags write_flags) { int ret; @@ -524,7 +532,7 @@ static int coroutine_fn raw_co_copy_range_to(BlockDriverState *bs, return ret; } return bdrv_co_copy_range_to(src, src_offset, bs->file, dst_offset, bytes, - flags); + read_flags, write_flags); } BlockDriver bdrv_raw = { diff --git a/qemu-img.c b/qemu-img.c index 7651d8172c..f4074ebf75 100644 --- a/qemu-img.c +++ b/qemu-img.c @@ -1783,7 +1783,7 @@ static int coroutine_fn convert_co_copy_range(ImgConvertState *s, int64_t sector ret = blk_co_copy_range(blk, offset, s->target, sector_num << BDRV_SECTOR_BITS, - n << BDRV_SECTOR_BITS, 0); + n << BDRV_SECTOR_BITS, 0, 0); if (ret < 0) { return ret; } From patchwork Mon Jul 9 16:37:18 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vladimir Sementsov-Ogievskiy X-Patchwork-Id: 941444 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=nongnu.org (client-ip=2001:4830:134:3::11; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=virtuozzo.com Received: from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 41PWH10FPYz9s1B for ; Tue, 10 Jul 2018 02:38:19 +1000 (AEST) Received: from localhost ([::1]:43310 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fcZB0-0002cV-Et for incoming@patchwork.ozlabs.org; Mon, 09 Jul 2018 12:38:14 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:57831) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fcZAG-0002bi-7P for qemu-devel@nongnu.org; Mon, 09 Jul 2018 12:37:29 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1fcZAB-0002cy-Aq for qemu-devel@nongnu.org; Mon, 09 Jul 2018 12:37:28 -0400 Received: from relay.sw.ru ([185.231.240.75]:32926) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1fcZAB-0002az-1z; Mon, 09 Jul 2018 12:37:23 -0400 Received: from vz-out.virtuozzo.com ([185.231.240.5] helo=kvm.sw.ru) by relay.sw.ru with esmtp (Exim 4.90_1) (envelope-from ) id 1fcZA8-00062Y-6C; Mon, 09 Jul 2018 19:37:20 +0300 From: Vladimir Sementsov-Ogievskiy To: qemu-devel@nongnu.org, qemu-block@nongnu.org Date: Mon, 9 Jul 2018 19:37:18 +0300 Message-Id: <20180709163719.87107-4-vsementsov@virtuozzo.com> X-Mailer: git-send-email 2.11.1 In-Reply-To: <20180709163719.87107-1-vsementsov@virtuozzo.com> References: <20180709163719.87107-1-vsementsov@virtuozzo.com> X-detected-operating-system: by eggs.gnu.org: GNU/Linux 3.x [fuzzy] X-Received-From: 185.231.240.75 Subject: [Qemu-devel] [PATCH v5 3/4] block: add BDRV_REQ_SERIALISING flag X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: kwolf@redhat.com, vsementsov@virtuozzo.com, famz@redhat.com, ronniesahlberg@gmail.com, jcody@redhat.com, pl@kamp.de, mreitz@redhat.com, stefanha@redhat.com, den@openvz.org, pbonzini@redhat.com, jsnow@redhat.com Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: "Qemu-devel" Serialized writes should be used in copy-on-write of backup(sync=none) for image fleecing scheme. We need to change an assert in bdrv_aligned_pwritev, added in 28de2dcd88de. The assert may fail now, because call to wait_serialising_requests here may become first call to it for this request with serializing flag set. It occurs if the request is aligned (otherwise, we should already set serializing flag before calling bdrv_aligned_pwritev and correspondingly waited for all intersecting requests). However, for aligned requests, we should not care about outdating of previously read data, as there no such data. Therefore, let's just update an assert to not care about aligned requests. Signed-off-by: Vladimir Sementsov-Ogievskiy Reviewed-by: Fam Zheng --- include/block/block.h | 13 ++++++++++++- block/io.c | 26 +++++++++++++++++++++++++- 2 files changed, 37 insertions(+), 2 deletions(-) diff --git a/include/block/block.h b/include/block/block.h index 57a96d4988..6623d15432 100644 --- a/include/block/block.h +++ b/include/block/block.h @@ -69,8 +69,19 @@ typedef enum { * content. */ BDRV_REQ_WRITE_UNCHANGED = 0x40, + /* BDRV_REQ_SERIALISING forces request serialisation for writes. + * It is used to ensure that writes to the backing file of a backup process + * target cannot race with a read of the backup target that defers to the + * backing file. + * + * Note, that BDRV_REQ_SERIALISING is _not_ opposite in meaning to + * BDRV_REQ_NO_SERIALISING. A more descriptive name for the latter might be + * _DO_NOT_WAIT_FOR_SERIALISING, except that is too long. + */ + BDRV_REQ_SERIALISING = 0x80, + /* Mask of valid flags */ - BDRV_REQ_MASK = 0x7f, + BDRV_REQ_MASK = 0xff, } BdrvRequestFlags; typedef struct BlockSizes { diff --git a/block/io.c b/block/io.c index 5d5651ad84..102cb0e1ba 100644 --- a/block/io.c +++ b/block/io.c @@ -623,6 +623,16 @@ static void mark_request_serialising(BdrvTrackedRequest *req, uint64_t align) req->overlap_bytes = MAX(req->overlap_bytes, overlap_bytes); } +static bool is_request_serialising_and_aligned(BdrvTrackedRequest *req) +{ + /* if request is serialising, overlap_offset and overlap_bytes are set, so + * we can check is request aligned. Otherwise don't care and return false + */ + + return req->serialising && (req->offset == req->overlap_offset) && + (req->bytes == req->overlap_bytes); +} + /** * Round a region to cluster boundaries */ @@ -1291,6 +1301,9 @@ static int coroutine_fn bdrv_aligned_preadv(BdrvChild *child, mark_request_serialising(req, bdrv_get_cluster_size(bs)); } + /* BDRV_REQ_SERIALISING is only for write operation */ + assert(!(flags & BDRV_REQ_SERIALISING)); + if (!(flags & BDRV_REQ_NO_SERIALISING)) { wait_serialising_requests(req); } @@ -1574,8 +1587,14 @@ static int coroutine_fn bdrv_aligned_pwritev(BdrvChild *child, /* BDRV_REQ_NO_SERIALISING is only for read operation */ assert(!(flags & BDRV_REQ_NO_SERIALISING)); + + if (flags & BDRV_REQ_SERIALISING) { + mark_request_serialising(req, bdrv_get_cluster_size(bs)); + } + waited = wait_serialising_requests(req); - assert(!waited || !req->serialising); + assert(!waited || !req->serialising || + is_request_serialising_and_aligned(req)); assert(req->overlap_offset <= offset); assert(offset + bytes <= req->overlap_offset + req->overlap_bytes); if (flags & BDRV_REQ_WRITE_UNCHANGED) { @@ -2929,6 +2948,8 @@ static int coroutine_fn bdrv_co_copy_range_internal( tracked_request_begin(&req, src->bs, src_offset, bytes, BDRV_TRACKED_READ); + /* BDRV_REQ_SERIALISING is only for write operation */ + assert(!(read_flags & BDRV_REQ_SERIALISING)); if (!(read_flags & BDRV_REQ_NO_SERIALISING)) { wait_serialising_requests(&req); } @@ -2948,6 +2969,9 @@ static int coroutine_fn bdrv_co_copy_range_internal( /* BDRV_REQ_NO_SERIALISING is only for read operation */ assert(!(write_flags & BDRV_REQ_NO_SERIALISING)); + if (write_flags & BDRV_REQ_SERIALISING) { + mark_request_serialising(&req, bdrv_get_cluster_size(dst->bs)); + } wait_serialising_requests(&req); ret = dst->bs->drv->bdrv_co_copy_range_to(dst->bs, From patchwork Mon Jul 9 16:37:19 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vladimir Sementsov-Ogievskiy X-Patchwork-Id: 941442 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=nongnu.org (client-ip=2001:4830:134:3::11; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=virtuozzo.com Received: from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 41PWH10FLyz9s19 for ; Tue, 10 Jul 2018 02:38:19 +1000 (AEST) Received: from localhost ([::1]:43308 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fcZB0-0002cJ-CJ for incoming@patchwork.ozlabs.org; Mon, 09 Jul 2018 12:38:14 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:57832) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fcZAG-0002bj-7j for qemu-devel@nongnu.org; Mon, 09 Jul 2018 12:37:29 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1fcZAB-0002cm-9i for qemu-devel@nongnu.org; Mon, 09 Jul 2018 12:37:28 -0400 Received: from relay.sw.ru ([185.231.240.75]:32916) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1fcZAB-0002aw-22; Mon, 09 Jul 2018 12:37:23 -0400 Received: from vz-out.virtuozzo.com ([185.231.240.5] helo=kvm.sw.ru) by relay.sw.ru with esmtp (Exim 4.90_1) (envelope-from ) id 1fcZA8-00062Y-AF; Mon, 09 Jul 2018 19:37:20 +0300 From: Vladimir Sementsov-Ogievskiy To: qemu-devel@nongnu.org, qemu-block@nongnu.org Date: Mon, 9 Jul 2018 19:37:19 +0300 Message-Id: <20180709163719.87107-5-vsementsov@virtuozzo.com> X-Mailer: git-send-email 2.11.1 In-Reply-To: <20180709163719.87107-1-vsementsov@virtuozzo.com> References: <20180709163719.87107-1-vsementsov@virtuozzo.com> X-detected-operating-system: by eggs.gnu.org: GNU/Linux 3.x [fuzzy] X-Received-From: 185.231.240.75 Subject: [Qemu-devel] [PATCH v5 4/4] block/backup: fix fleecing scheme: use serialized writes X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: kwolf@redhat.com, vsementsov@virtuozzo.com, famz@redhat.com, ronniesahlberg@gmail.com, jcody@redhat.com, pl@kamp.de, mreitz@redhat.com, stefanha@redhat.com, den@openvz.org, pbonzini@redhat.com, jsnow@redhat.com Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: "Qemu-devel" Fleecing scheme works as follows: we want a kind of temporary snapshot of active drive A. We create temporary image B, with B->backing = A. Then we start backup(sync=none) from A to B. From this point, B reads as point-in-time snapshot of A (A continues to be active drive, accepting guest IO). This scheme needs some additional synchronization between reads from B and backup COW operations, otherwise, the following situation is theoretically possible: (assume B is qcow2, client is NBD client, reading from B) 1. client starts reading and take qcow2 mutex in qcow2_co_preadv, and goes up to l2 table loading (assume cache miss) 2) guest write => backup COW => qcow2 write => try to take qcow2 mutex => waiting 3. l2 table loaded, we see that cluster is UNALLOCATED, go to "case QCOW2_CLUSTER_UNALLOCATED" and unlock mutex before bdrv_co_preadv(bs->backing, ...) 4) aha, mutex unlocked, backup COW continues, and we finally finish guest write and change cluster in our active disk A 5. actually, do bdrv_co_preadv(bs->backing, ...) and read _new updated_ data. To avoid this, let's make backup writes serializing, to not intersect with reads from B. Note: we expand range of handled cases from (sync=none and B->backing = A) to just (A in backing chain of B), to finally allow safe reading from B during backup for all cases when A in backing chain of B, i.e. B formally looks like point-in-time snapshot of A. Signed-off-by: Vladimir Sementsov-Ogievskiy Reviewed-by: Fam Zheng --- block/backup.c | 20 ++++++++++++++------ 1 file changed, 14 insertions(+), 6 deletions(-) diff --git a/block/backup.c b/block/backup.c index f3e4e814b6..319fc922e8 100644 --- a/block/backup.c +++ b/block/backup.c @@ -47,6 +47,8 @@ typedef struct BackupBlockJob { HBitmap *copy_bitmap; bool use_copy_range; int64_t copy_range_size; + + bool serialize_target_writes; } BackupBlockJob; static const BlockJobDriver backup_job_driver; @@ -102,6 +104,8 @@ static int coroutine_fn backup_cow_with_bounce_buffer(BackupBlockJob *job, QEMUIOVector qiov; BlockBackend *blk = job->common.blk; int nbytes; + int read_flags = is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0; + int write_flags = job->serialize_target_writes ? BDRV_REQ_SERIALISING : 0; hbitmap_reset(job->copy_bitmap, start / job->cluster_size, 1); nbytes = MIN(job->cluster_size, job->len - start); @@ -112,8 +116,7 @@ static int coroutine_fn backup_cow_with_bounce_buffer(BackupBlockJob *job, iov.iov_len = nbytes; qemu_iovec_init_external(&qiov, &iov, 1); - ret = blk_co_preadv(blk, start, qiov.size, &qiov, - is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0); + ret = blk_co_preadv(blk, start, qiov.size, &qiov, read_flags); if (ret < 0) { trace_backup_do_cow_read_fail(job, start, ret); if (error_is_read) { @@ -124,11 +127,11 @@ static int coroutine_fn backup_cow_with_bounce_buffer(BackupBlockJob *job, if (qemu_iovec_is_zero(&qiov)) { ret = blk_co_pwrite_zeroes(job->target, start, - qiov.size, BDRV_REQ_MAY_UNMAP); + qiov.size, write_flags | BDRV_REQ_MAY_UNMAP); } else { ret = blk_co_pwritev(job->target, start, - qiov.size, &qiov, - job->compress ? BDRV_REQ_WRITE_COMPRESSED : 0); + qiov.size, &qiov, write_flags | + (job->compress ? BDRV_REQ_WRITE_COMPRESSED : 0)); } if (ret < 0) { trace_backup_do_cow_write_fail(job, start, ret); @@ -156,6 +159,8 @@ static int coroutine_fn backup_cow_with_offload(BackupBlockJob *job, int nr_clusters; BlockBackend *blk = job->common.blk; int nbytes; + int read_flags = is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0; + int write_flags = job->serialize_target_writes ? BDRV_REQ_SERIALISING : 0; assert(QEMU_IS_ALIGNED(job->copy_range_size, job->cluster_size)); nbytes = MIN(job->copy_range_size, end - start); @@ -163,7 +168,7 @@ static int coroutine_fn backup_cow_with_offload(BackupBlockJob *job, hbitmap_reset(job->copy_bitmap, start / job->cluster_size, nr_clusters); ret = blk_co_copy_range(blk, start, job->target, start, nbytes, - is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0, 0); + read_flags, write_flags); if (ret < 0) { trace_backup_do_cow_copy_range_fail(job, start, ret); hbitmap_set(job->copy_bitmap, start / job->cluster_size, @@ -701,6 +706,9 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs, sync_bitmap : NULL; job->compress = compress; + /* Detect image-fleecing (and similar) schemes */ + job->serialize_target_writes = bdrv_chain_contains(target, bs); + /* If there is no backing file on the target, we cannot rely on COW if our * backup cluster size is smaller than the target cluster size. Even for * targets with a backing file, try to avoid COW if possible. */