From patchwork Fri May 11 12:08:14 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Fam Zheng X-Patchwork-Id: 911967 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=nongnu.org (client-ip=2001:4830:134:3::11; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=redhat.com Received: from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 40j86R1R4Pz9s2L for ; Fri, 11 May 2018 22:09:51 +1000 (AEST) Received: from localhost ([::1]:39824 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fH6rs-0002VC-PO for incoming@patchwork.ozlabs.org; Fri, 11 May 2018 08:09:48 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:46584) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fH6qw-0002LW-Lp for qemu-devel@nongnu.org; Fri, 11 May 2018 08:08:53 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1fH6qv-00034o-9x for qemu-devel@nongnu.org; Fri, 11 May 2018 08:08:50 -0400 Received: from mx3-rdu2.redhat.com ([66.187.233.73]:34330 helo=mx1.redhat.com) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1fH6qs-00032N-7A; Fri, 11 May 2018 08:08:46 -0400 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.rdu2.redhat.com [10.11.54.5]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id D3DE9400187E; Fri, 11 May 2018 12:08:45 +0000 (UTC) Received: from lemon.usersys.redhat.com (ovpn-12-175.pek2.redhat.com [10.72.12.175]) by smtp.corp.redhat.com (Postfix) with ESMTP id 7F40A6B5B6; Fri, 11 May 2018 12:08:40 +0000 (UTC) From: Fam Zheng To: qemu-devel@nongnu.org Date: Fri, 11 May 2018 20:08:14 +0800 Message-Id: <20180511120823.7892-2-famz@redhat.com> In-Reply-To: <20180511120823.7892-1-famz@redhat.com> References: <20180511120823.7892-1-famz@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.11.54.5 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.5]); Fri, 11 May 2018 12:08:45 +0000 (UTC) X-Greylist: inspected by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.5]); Fri, 11 May 2018 12:08:45 +0000 (UTC) for IP:'10.11.54.5' DOMAIN:'int-mx05.intmail.prod.int.rdu2.redhat.com' HELO:'smtp.corp.redhat.com' FROM:'famz@redhat.com' RCPT:'' X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] [fuzzy] X-Received-From: 66.187.233.73 Subject: [Qemu-devel] [PATCH v4 01/10] block: Introduce API for copy offloading X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Kevin Wolf , Fam Zheng , Stefan Hajnoczi , qemu-block@nongnu.org, Peter Lieven , Max Reitz , Ronnie Sahlberg , Paolo Bonzini Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: "Qemu-devel" Introduce the bdrv_co_copy_range() API for copy offloading. Block drivers implementing this API support efficient copy operations that avoid reading each block from the source device and writing it to the destination devices. Examples of copy offload primitives are SCSI EXTENDED COPY and Linux copy_file_range(2). Signed-off-by: Fam Zheng Reviewed-by: Stefan Hajnoczi --- block/io.c | 96 +++++++++++++++++++++++++++++++++++++++++++++++ include/block/block.h | 32 ++++++++++++++++ include/block/block_int.h | 38 +++++++++++++++++++ 3 files changed, 166 insertions(+) diff --git a/block/io.c b/block/io.c index bd9a19a9c4..79e0195bc6 100644 --- a/block/io.c +++ b/block/io.c @@ -2826,3 +2826,99 @@ void bdrv_unregister_buf(BlockDriverState *bs, void *host) bdrv_unregister_buf(child->bs, host); } } + +static int bdrv_co_copy_range_internal(BdrvChild *src, + uint64_t src_offset, + BdrvChild *dst, + uint64_t dst_offset, + uint64_t bytes, BdrvRequestFlags flags, + bool recurse_src) +{ + int ret; + + if (!src || !dst || !src->bs || !dst->bs) { + return -ENOMEDIUM; + } + ret = bdrv_check_byte_request(src->bs, src_offset, bytes); + if (ret) { + return ret; + } + + ret = bdrv_check_byte_request(dst->bs, dst_offset, bytes); + if (ret) { + return ret; + } + if (flags & BDRV_REQ_ZERO_WRITE) { + return bdrv_co_pwrite_zeroes(dst, dst_offset, bytes, flags); + } + + if (!src->bs->drv->bdrv_co_copy_range_from + || !dst->bs->drv->bdrv_co_copy_range_to + || src->bs->encrypted || dst->bs->encrypted) { + return -ENOTSUP; + } + if (recurse_src) { + return src->bs->drv->bdrv_co_copy_range_from(src->bs, + src, src_offset, + dst, dst_offset, + bytes, flags); + } else { + return dst->bs->drv->bdrv_co_copy_range_to(dst->bs, + src, src_offset, + dst, dst_offset, + bytes, flags); + } +} + +/* Copy range from @src to @dst. + * + * See the comment of bdrv_co_copy_range for the parameter and return value + * semantics. */ +int bdrv_co_copy_range_from(BdrvChild *src, uint64_t src_offset, + BdrvChild *dst, uint64_t dst_offset, + uint64_t bytes, BdrvRequestFlags flags) +{ + return bdrv_co_copy_range_internal(src, src_offset, dst, dst_offset, + bytes, flags, true); +} + +/* Copy range from @src to @dst. + * + * See the comment of bdrv_co_copy_range for the parameter and return value + * semantics. */ +int bdrv_co_copy_range_to(BdrvChild *src, uint64_t src_offset, + BdrvChild *dst, uint64_t dst_offset, + uint64_t bytes, BdrvRequestFlags flags) +{ + return bdrv_co_copy_range_internal(src, src_offset, dst, dst_offset, + bytes, flags, false); +} + +int bdrv_co_copy_range(BdrvChild *src, uint64_t src_offset, + BdrvChild *dst, uint64_t dst_offset, + uint64_t bytes, BdrvRequestFlags flags) +{ + BdrvTrackedRequest src_req, dst_req; + BlockDriverState *src_bs = src->bs; + BlockDriverState *dst_bs = dst->bs; + int ret; + + bdrv_inc_in_flight(src_bs); + bdrv_inc_in_flight(dst_bs); + tracked_request_begin(&src_req, src_bs, src_offset, + bytes, BDRV_TRACKED_READ); + tracked_request_begin(&dst_req, dst_bs, dst_offset, + bytes, BDRV_TRACKED_WRITE); + + wait_serialising_requests(&src_req); + wait_serialising_requests(&dst_req); + ret = bdrv_co_copy_range_from(src, src_offset, + dst, dst_offset, + bytes, flags); + + tracked_request_end(&src_req); + tracked_request_end(&dst_req); + bdrv_dec_in_flight(src_bs); + bdrv_dec_in_flight(dst_bs); + return ret; +} diff --git a/include/block/block.h b/include/block/block.h index cdec3639a3..aa3a5326a8 100644 --- a/include/block/block.h +++ b/include/block/block.h @@ -604,4 +604,36 @@ bool bdrv_can_store_new_dirty_bitmap(BlockDriverState *bs, const char *name, */ void bdrv_register_buf(BlockDriverState *bs, void *host, size_t size); void bdrv_unregister_buf(BlockDriverState *bs, void *host); + +/** + * + * bdrv_co_copy_range: + * + * Do offloaded copy between two children. If the operation is not implemented + * by the driver, or if the backend storage doesn't support it, a negative + * error code will be returned. + * + * Note: block layer doesn't emulate or fallback to a bounce buffer approach + * because usually the caller shouldn't attempt offloaded copy any more (e.g. + * calling copy_file_range(2)) after the first error, thus it should fall back + * to a read+write path in the caller level. + * + * @src: Source child to copy data from + * @src_offset: offset in @src image to read data + * @dst: Destination child to copy data to + * @dst_offset: offset in @dst image to write data + * @bytes: number of bytes to copy + * @flags: request flags. Must be one of: + * 0 - actually read data from src; + * BDRV_REQ_ZERO_WRITE - treat the @src range as zero data and do zero + * write on @dst as if bdrv_co_pwrite_zeroes is + * called. Used to simplify caller code, or + * during BlockDriver.bdrv_co_copy_range_from() + * recursion. + * + * Returns: 0 if succeeded; negative error code if failed. + **/ +int bdrv_co_copy_range(BdrvChild *src, uint64_t src_offset, + BdrvChild *dst, uint64_t dst_offset, + uint64_t bytes, BdrvRequestFlags flags); #endif diff --git a/include/block/block_int.h b/include/block/block_int.h index c4dd1d4bb8..bf6f0b061a 100644 --- a/include/block/block_int.h +++ b/include/block/block_int.h @@ -206,6 +206,37 @@ struct BlockDriver { int coroutine_fn (*bdrv_co_pdiscard)(BlockDriverState *bs, int64_t offset, int bytes); + /* Map [offset, offset + nbytes) range onto a child of @bs to copy from, + * and invoke bdrv_co_copy_range_from(child, ...), or invoke + * bdrv_co_copy_range_to() if @bs is the leaf child to copy data from. + * + * See the comment of bdrv_co_copy_range for the parameter and return value + * semantics. + */ + int coroutine_fn (*bdrv_co_copy_range_from)(BlockDriverState *bs, + BdrvChild *src, + uint64_t offset, + BdrvChild *dst, + uint64_t dst_offset, + uint64_t bytes, + BdrvRequestFlags flags); + + /* Map [offset, offset + nbytes) range onto a child of bs to copy data to, + * and invoke bdrv_co_copy_range_to(child, src, ...), or perform the copy + * operation if @bs is the leaf and @src has the same BlockDriver. Return + * -ENOTSUP if @bs is the leaf but @src has a different BlockDriver. + * + * See the comment of bdrv_co_copy_range for the parameter and return value + * semantics. + */ + int coroutine_fn (*bdrv_co_copy_range_to)(BlockDriverState *bs, + BdrvChild *src, + uint64_t src_offset, + BdrvChild *dst, + uint64_t dst_offset, + uint64_t bytes, + BdrvRequestFlags flags); + /* * Building block for bdrv_block_status[_above] and * bdrv_is_allocated[_above]. The driver should answer only @@ -1090,4 +1121,11 @@ void bdrv_dec_in_flight(BlockDriverState *bs); void blockdev_close_all_bdrv_states(void); +int bdrv_co_copy_range_from(BdrvChild *src, uint64_t src_offset, + BdrvChild *dst, uint64_t dst_offset, + uint64_t bytes, BdrvRequestFlags flags); +int bdrv_co_copy_range_to(BdrvChild *src, uint64_t src_offset, + BdrvChild *dst, uint64_t dst_offset, + uint64_t bytes, BdrvRequestFlags flags); + #endif /* BLOCK_INT_H */