From patchwork Thu Jul 12 13:00:10 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Lieven X-Patchwork-Id: 943011 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=nongnu.org (client-ip=2001:4830:134:3::11; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=kamp.de Received: from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 41RGK76lvzz9s2M for ; Thu, 12 Jul 2018 23:01:14 +1000 (AEST) Received: from localhost ([::1]:59647 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fdbDb-0001J6-8w for incoming@patchwork.ozlabs.org; Thu, 12 Jul 2018 09:01:11 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:48382) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fdbCx-0001Fu-2J for qemu-devel@nongnu.org; Thu, 12 Jul 2018 09:00:37 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1fdbCr-0006RF-BE for qemu-devel@nongnu.org; Thu, 12 Jul 2018 09:00:31 -0400 Received: from mx-v6.kamp.de ([2a02:248:0:51::16]:55481 helo=mx01.kamp.de) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1fdbCr-0006Pv-1l for qemu-devel@nongnu.org; Thu, 12 Jul 2018 09:00:25 -0400 Received: (qmail 1560 invoked by uid 89); 12 Jul 2018 13:00:14 -0000 Received: from [195.62.97.192] by client-16-kamp (envelope-from , uid 89) with qmail-scanner-2010/03/19-MF (clamdscan: 0.100.1/24745. avast: 1.2.2/17010300. spamassassin: 3.4.1. Clear:RC:1(195.62.97.192):. Processed in 0.06294 secs); 12 Jul 2018 13:00:14 -0000 Received: from kerio.kamp.de ([195.62.97.192]) by mx01.kamp.de with ESMTPS (DHE-RSA-AES256-SHA encrypted); 12 Jul 2018 13:00:14 -0000 X-GL_Whitelist: yes X-Footer: a2FtcC5kZQ== Received: from submission.kamp.de ([195.62.97.28]) by kerio.kamp.de with ESMTPS (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256 bits)) for qemu-devel@nongnu.org; Thu, 12 Jul 2018 15:00:10 +0200 Received: (qmail 23841 invoked from network); 12 Jul 2018 13:00:12 -0000 Received: from lieven-vm.kamp-intra.net (HELO lieven-vm-neu) (relay@kamp.de@::ffff:172.21.12.69) by submission.kamp.de with ESMTPS (DHE-RSA-AES256-GCM-SHA384 encrypted) ESMTPA; 12 Jul 2018 13:00:12 -0000 Received: by lieven-vm-neu (Postfix, from userid 1060) id 1D71F20193; Thu, 12 Jul 2018 15:00:12 +0200 (CEST) From: Peter Lieven To: qemu-devel@nongnu.org, qemu-block@nongnu.org Date: Thu, 12 Jul 2018 15:00:10 +0200 Message-Id: <1531400410-27059-1-git-send-email-pl@kamp.de> X-Mailer: git-send-email 1.9.1 X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 2a02:248:0:51::16 Subject: [Qemu-devel] [PATCH V6] qemu-img: align result of is_allocated_sectors X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: kwolf@redhat.com, Peter Lieven , mreitz@redhat.com Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: "Qemu-devel" We currently don't enforce that the sparse segments we detect during convert are aligned. This leads to unnecessary and costly read-modify-write cycles either internally in Qemu or in the background on the storage device as nearly all modern filesystems or hardware have a 4k alignment internally. This patch modifies is_allocated_sectors so that its *pnum result will always end at an alignment boundary. This way all requests will end at an alignment boundary. The start of all requests will also be aligned as long as the results of get_block_status do not lead to an unaligned offset. The number of RMW cycles when converting an example image [1] to a raw device that has 4k sector size is about 4600 4k read requests to perform a total of about 15000 write requests. With this path the additional 4600 read requests are eliminated while the number of total write requests stays constant. [1] https://cloud-images.ubuntu.com/releases/16.04/release/ubuntu-16.04-server-cloudimg-amd64-disk1.vmdk Signed-off-by: Peter Lieven --- V5->V6: - fix output of iotest 122 [Kevin] V4->V5: - is_zero is a bool [Kevin] - treat zero areas as allocated if i <= tail to avoid *pnum underflow [Kevin] V3->V4: - only focus on the end offset in is_allocated_sectors [Kevin] V2->V3: - ensure that s.alignment is a power of 2 - correctly handle n < alignment in is_allocated_sectors if sector_num % alignment > 0. V1->V2: - take the current sector offset into account [Max] - try to figure out the target alignment [Max] qemu-img.c | 44 ++++++++++++++++++++++++++++++++++++++------ tests/qemu-iotests/122.out | 18 ++++++++---------- 2 files changed, 46 insertions(+), 16 deletions(-) diff --git a/qemu-img.c b/qemu-img.c index 7651d81..9021ab0 100644 --- a/qemu-img.c +++ b/qemu-img.c @@ -1105,11 +1105,15 @@ static int64_t find_nonzero(const uint8_t *buf, int64_t n) * * 'pnum' is set to the number of sectors (including and immediately following * the first one) that are known to be in the same allocated/unallocated state. + * The function will try to align the end offset to alignment boundaries so + * that the request will at least end aligned and consequtive requests will + * also start at an aligned offset. */ -static int is_allocated_sectors(const uint8_t *buf, int n, int *pnum) +static int is_allocated_sectors(const uint8_t *buf, int n, int *pnum, + int64_t sector_num, int alignment) { bool is_zero; - int i; + int i, tail; if (n <= 0) { *pnum = 0; @@ -1122,6 +1126,23 @@ static int is_allocated_sectors(const uint8_t *buf, int n, int *pnum) break; } } + + tail = (sector_num + i) & (alignment - 1); + if (tail) { + if (is_zero && i <= tail) { + /* treat unallocated areas which only consist + * of a small tail as allocated. */ + is_zero = false; + } + if (!is_zero) { + /* align up end offset of allocated areas. */ + i += alignment - tail; + i = MIN(i, n); + } else { + /* align down end offset of zero areas. */ + i -= tail; + } + } *pnum = i; return !is_zero; } @@ -1132,7 +1153,7 @@ static int is_allocated_sectors(const uint8_t *buf, int n, int *pnum) * breaking up write requests for only small sparse areas. */ static int is_allocated_sectors_min(const uint8_t *buf, int n, int *pnum, - int min) + int min, int64_t sector_num, int alignment) { int ret; int num_checked, num_used; @@ -1141,7 +1162,7 @@ static int is_allocated_sectors_min(const uint8_t *buf, int n, int *pnum, min = n; } - ret = is_allocated_sectors(buf, n, pnum); + ret = is_allocated_sectors(buf, n, pnum, sector_num, alignment); if (!ret) { return ret; } @@ -1149,13 +1170,15 @@ static int is_allocated_sectors_min(const uint8_t *buf, int n, int *pnum, num_used = *pnum; buf += BDRV_SECTOR_SIZE * *pnum; n -= *pnum; + sector_num += *pnum; num_checked = num_used; while (n > 0) { - ret = is_allocated_sectors(buf, n, pnum); + ret = is_allocated_sectors(buf, n, pnum, sector_num, alignment); buf += BDRV_SECTOR_SIZE * *pnum; n -= *pnum; + sector_num += *pnum; num_checked += *pnum; if (ret) { num_used = num_checked; @@ -1560,6 +1583,7 @@ typedef struct ImgConvertState { bool wr_in_order; bool copy_range; int min_sparse; + int alignment; size_t cluster_sectors; size_t buf_sectors; long num_coroutines; @@ -1724,7 +1748,8 @@ static int coroutine_fn convert_co_write(ImgConvertState *s, int64_t sector_num, * zeroed. */ if (!s->min_sparse || (!s->compressed && - is_allocated_sectors_min(buf, n, &n, s->min_sparse)) || + is_allocated_sectors_min(buf, n, &n, s->min_sparse, + sector_num, s->alignment)) || (s->compressed && !buffer_is_zero(buf, n * BDRV_SECTOR_SIZE))) { @@ -2368,6 +2393,13 @@ static int img_convert(int argc, char **argv) out_bs->bl.pdiscard_alignment >> BDRV_SECTOR_BITS))); + /* try to align the write requests to the destination to avoid unnecessary + * RMW cycles. */ + s.alignment = MAX(pow2floor(s.min_sparse), + DIV_ROUND_UP(out_bs->bl.request_alignment, + BDRV_SECTOR_SIZE)); + assert(is_power_of_2(s.alignment)); + if (skip_create) { int64_t output_sectors = blk_nb_sectors(s.target); if (output_sectors < 0) { diff --git a/tests/qemu-iotests/122.out b/tests/qemu-iotests/122.out index 6c7ee1d..c576705 100644 --- a/tests/qemu-iotests/122.out +++ b/tests/qemu-iotests/122.out @@ -194,12 +194,12 @@ wrote 1024/1024 bytes at offset 17408 1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) convert -S 4k -[{ "start": 0, "length": 1024, "depth": 0, "zero": false, "data": true, "offset": OFFSET}, -{ "start": 1024, "length": 7168, "depth": 0, "zero": true, "data": false}, -{ "start": 8192, "length": 1024, "depth": 0, "zero": false, "data": true, "offset": OFFSET}, -{ "start": 9216, "length": 8192, "depth": 0, "zero": true, "data": false}, -{ "start": 17408, "length": 1024, "depth": 0, "zero": false, "data": true, "offset": OFFSET}, -{ "start": 18432, "length": 67090432, "depth": 0, "zero": true, "data": false}] +[{ "start": 0, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": OFFSET}, +{ "start": 4096, "length": 4096, "depth": 0, "zero": true, "data": false}, +{ "start": 8192, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": OFFSET}, +{ "start": 12288, "length": 4096, "depth": 0, "zero": true, "data": false}, +{ "start": 16384, "length": 4096, "depth": 0, "zero": false, "data": true, "offset": OFFSET}, +{ "start": 20480, "length": 67088384, "depth": 0, "zero": true, "data": false}] convert -c -S 4k [{ "start": 0, "length": 1024, "depth": 0, "zero": false, "data": true}, @@ -210,10 +210,8 @@ convert -c -S 4k { "start": 18432, "length": 67090432, "depth": 0, "zero": true, "data": false}] convert -S 8k -[{ "start": 0, "length": 9216, "depth": 0, "zero": false, "data": true, "offset": OFFSET}, -{ "start": 9216, "length": 8192, "depth": 0, "zero": true, "data": false}, -{ "start": 17408, "length": 1024, "depth": 0, "zero": false, "data": true, "offset": OFFSET}, -{ "start": 18432, "length": 67090432, "depth": 0, "zero": true, "data": false}] +[{ "start": 0, "length": 24576, "depth": 0, "zero": false, "data": true, "offset": OFFSET}, +{ "start": 24576, "length": 67084288, "depth": 0, "zero": true, "data": false}] convert -c -S 8k [{ "start": 0, "length": 1024, "depth": 0, "zero": false, "data": true},