From patchwork Wed Dec 7 12:10:58 2011
From: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
To: qemu-devel@nongnu.org
Cc: Kevin Wolf, Marcelo Tosatti, Stefan Hajnoczi
Subject: [Qemu-devel] [PATCH v2 2/3] qed: add zero write detection support
Date: Wed, 7 Dec 2011 12:10:58 +0000
Message-Id: <1323259859-8709-3-git-send-email-stefanha@linux.vnet.ibm.com>
In-Reply-To: <1323259859-8709-1-git-send-email-stefanha@linux.vnet.ibm.com>
References: <1323259859-8709-1-git-send-email-stefanha@linux.vnet.ibm.com>
X-Patchwork-Id: 129948
X-Mailer: git-send-email 1.7.7.3

The QED image format is able to efficiently represent clusters containing
zeroes with a magic offset value. This patch implements zero write detection
for allocating writes so that image streaming can copy over zero clusters
from a backing file without expanding the image file unnecessarily.

This is based on code by Anthony Liguori.
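For readers unfamiliar with the detection step: the scan is a plain word-by-word
comparison against zero. A minimal standalone sketch of the idea (this helper is
hypothetical and not part of the patch; qed_is_zero_write() in the diff below
applies the same logic per iovec):

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: returns true only if len is a multiple of 8 bytes
 * and every 64-bit word in buf is zero, mirroring the per-iovec scan in
 * qed_is_zero_write() below.
 */
static bool buffer_is_all_zeroes(const void *buf, size_t len)
{
    const uint64_t *v = buf;
    size_t i;

    if (len & 0x07) {
        return false;           /* not a whole number of 64-bit words */
    }
    for (i = 0; i < len / sizeof(v[0]); i++) {
        if (v[i] != 0) {
            return false;       /* found non-zero data */
        }
    }
    return true;
}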
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
---
 block/qed.c |   80 +++++++++++++++++++++++++++++++++++++++++++++++++++++------
 1 files changed, 72 insertions(+), 8 deletions(-)

diff --git a/block/qed.c b/block/qed.c
index 8da3ebe..db4246a 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -941,9 +941,8 @@ static void qed_aio_write_l1_update(void *opaque, int ret)
 /**
  * Update L2 table with new cluster offsets and write them out
  */
-static void qed_aio_write_l2_update(void *opaque, int ret)
+static void qed_aio_write_l2_update(QEDAIOCB *acb, int ret, uint64_t offset)
 {
-    QEDAIOCB *acb = opaque;
     BDRVQEDState *s = acb_to_s(acb);
     bool need_alloc = acb->find_cluster_ret == QED_CLUSTER_L1;
     int index;
@@ -959,7 +958,7 @@ static void qed_aio_write_l2_update(void *opaque, int ret)
     index = qed_l2_index(s, acb->cur_pos);
     qed_update_l2_table(s, acb->request.l2_table->table, index,
                         acb->cur_nclusters,
-                        acb->cur_cluster);
+                        offset);
 
     if (need_alloc) {
         /* Write out the whole new L2 table */
@@ -976,6 +975,51 @@ err:
     qed_aio_complete(acb, ret);
 }
 
+static void qed_aio_write_l2_update_cb(void *opaque, int ret)
+{
+    QEDAIOCB *acb = opaque;
+    qed_aio_write_l2_update(acb, ret, acb->cur_cluster);
+}
+
+/**
+ * Determine if we have a zero write to a block of clusters
+ *
+ * We validate that the write is aligned to a cluster boundary, and that it's
+ * a multiple of cluster size with all zeros.
+ */
+static bool qed_is_zero_write(QEDAIOCB *acb)
+{
+    BDRVQEDState *s = acb_to_s(acb);
+    int i;
+
+    if (!qed_offset_is_cluster_aligned(s, acb->cur_pos)) {
+        return false;
+    }
+
+    if (!qed_offset_is_cluster_aligned(s, acb->cur_qiov.size)) {
+        return false;
+    }
+
+    for (i = 0; i < acb->cur_qiov.niov; i++) {
+        struct iovec *iov = &acb->cur_qiov.iov[i];
+        uint64_t *v;
+        int j;
+
+        if ((iov->iov_len & 0x07)) {
+            return false;
+        }
+
+        v = iov->iov_base;
+        for (j = 0; j < iov->iov_len; j += sizeof(v[0])) {
+            if (v[j >> 3]) {
+                return false;
+            }
+        }
+    }
+
+    return true;
+}
+
 /**
  * Flush new data clusters before updating the L2 table
  *
@@ -990,7 +1034,7 @@ static void qed_aio_write_flush_before_l2_update(void *opaque, int ret)
     QEDAIOCB *acb = opaque;
     BDRVQEDState *s = acb_to_s(acb);
 
-    if (!bdrv_aio_flush(s->bs->file, qed_aio_write_l2_update, opaque)) {
+    if (!bdrv_aio_flush(s->bs->file, qed_aio_write_l2_update_cb, opaque)) {
         qed_aio_complete(acb, -EIO);
     }
 }
@@ -1019,7 +1063,7 @@ static void qed_aio_write_main(void *opaque, int ret)
         if (s->bs->backing_hd) {
             next_fn = qed_aio_write_flush_before_l2_update;
         } else {
-            next_fn = qed_aio_write_l2_update;
+            next_fn = qed_aio_write_l2_update_cb;
         }
     }
 
@@ -1081,6 +1125,18 @@ static bool qed_should_set_need_check(BDRVQEDState *s)
     return !(s->header.features & QED_F_NEED_CHECK);
 }
 
+static void qed_aio_write_zero_cluster(void *opaque, int ret)
+{
+    QEDAIOCB *acb = opaque;
+
+    if (ret) {
+        qed_aio_complete(acb, ret);
+        return;
+    }
+
+    qed_aio_write_l2_update(acb, 0, 1);
+}
+
 /**
  * Write new data cluster
  *
@@ -1092,6 +1148,7 @@ static bool qed_should_set_need_check(BDRVQEDState *s)
 static void qed_aio_write_alloc(QEDAIOCB *acb, size_t len)
 {
     BDRVQEDState *s = acb_to_s(acb);
+    BlockDriverCompletionFunc *cb;
 
     /* Cancel timer when the first allocating request comes in */
     if (QSIMPLEQ_EMPTY(&s->allocating_write_reqs)) {
@@ -1109,14 +1166,21 @@ static void qed_aio_write_alloc(QEDAIOCB *acb, size_t len)
 
     acb->cur_nclusters = qed_bytes_to_clusters(s,
             qed_offset_into_cluster(s, acb->cur_pos) + len);
-    acb->cur_cluster = qed_alloc_clusters(s, acb->cur_nclusters);
     qemu_iovec_copy(&acb->cur_qiov, acb->qiov, acb->qiov_offset, len);
 
+    /* Zero write detection */
+    if (s->bs->zero_detection && qed_is_zero_write(acb)) {
+        cb = qed_aio_write_zero_cluster;
+    } else {
+        cb = qed_aio_write_prefill;
+        acb->cur_cluster = qed_alloc_clusters(s, acb->cur_nclusters);
+    }
+
     if (qed_should_set_need_check(s)) {
         s->header.features |= QED_F_NEED_CHECK;
-        qed_write_header(s, qed_aio_write_prefill, acb);
+        qed_write_header(s, cb, acb);
     } else {
-        qed_aio_write_prefill(acb, 0);
+        cb(acb, 0);
     }
 }
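For context on the magic value: in the QED format a table entry of 0 means the
cluster is unallocated (reads fall through to the backing file), while the
special offset 1 marks a cluster that reads back as all zeroes without any
space being consumed; that is why qed_aio_write_zero_cluster() above passes 1
as the new L2 entry. A hypothetical sketch of how an entry would be
interpreted (the helper name below is made up for illustration and does not
exist in the tree):

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Illustration only, assuming the special values from the QED format:
 * 0 = unallocated, 1 = zero cluster, anything else = file offset of a
 * data cluster.
 */
static void print_l2_entry_meaning(uint64_t offset)
{
    if (offset == 0) {
        printf("unallocated: read falls through to the backing file\n");
    } else if (offset == 1) {
        printf("zero cluster: reads as zeroes, no data allocated\n");
    } else {
        printf("data cluster at image file offset %" PRIu64 "\n", offset);
    }
}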