From patchwork Thu Jan 20 17:10:10 2011 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kevin Wolf X-Patchwork-Id: 79730 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from lists.gnu.org (lists.gnu.org [199.232.76.165]) by ozlabs.org (Postfix) with ESMTP id D3F5BB710A for ; Fri, 21 Jan 2011 04:17:47 +1100 (EST) Received: from localhost ([127.0.0.1]:52811 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1Pfy3m-0004hx-U1 for incoming@patchwork.ozlabs.org; Thu, 20 Jan 2011 12:12:35 -0500 Received: from [140.186.70.92] (port=46842 helo=eggs.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1Pfy0E-00048T-DF for qemu-devel@nongnu.org; Thu, 20 Jan 2011 12:09:01 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1Pfy0A-0002x9-91 for qemu-devel@nongnu.org; Thu, 20 Jan 2011 12:08:51 -0500 Received: from mx1.redhat.com ([209.132.183.28]:21961) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Pfy09-0002wu-Vi for qemu-devel@nongnu.org; Thu, 20 Jan 2011 12:08:50 -0500 Received: from int-mx12.intmail.prod.int.phx2.redhat.com (int-mx12.intmail.prod.int.phx2.redhat.com [10.5.11.25]) by mx1.redhat.com (8.13.8/8.13.8) with ESMTP id p0KH8k0Y013434 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Thu, 20 Jan 2011 12:08:46 -0500 Received: from dhcp-5-188.str.redhat.com (vpn1-5-247.ams2.redhat.com [10.36.5.247]) by int-mx12.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id p0KH8fIf026822; Thu, 20 Jan 2011 12:08:44 -0500 From: Kevin Wolf To: qemu-devel@nongnu.org Date: Thu, 20 Jan 2011 18:10:10 +0100 Message-Id: <1295543412-24012-2-git-send-email-kwolf@redhat.com> In-Reply-To: <1295543412-24012-1-git-send-email-kwolf@redhat.com> References: <1295543412-24012-1-git-send-email-kwolf@redhat.com> X-Scanned-By: MIMEDefang 2.68 on 10.5.11.25 X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. Cc: kwolf@redhat.com, stefanha@gmail.com Subject: [Qemu-devel] [PATCH v4 1/3] qcow2: Add QcowCache X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: qemu-devel.nongnu.org List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org This adds some new cache functions to qcow2 which can be used for caching refcount blocks and L2 tables. When used with cache=writethrough they work like the old caching code which is spread all over qcow2, so for this case we have merely a cleanup. The interesting case is with writeback caching (this includes cache=none) where data isn't written to disk immediately but only kept in cache initially. This leads to some form of metadata write batching which avoids the current "write to refcount block, flush, write to L2 table" pattern for each single request when a lot of cluster allocations happen. Instead, cache entries are only written out if its required to maintain the right order. In the pure cluster allocation case this means that all metadata updates for requests are done in memory initially and on sync, first the refcount blocks are written to disk, then fsync, then L2 tables. This improves performance of scenarios with lots of cluster allocations noticably (e.g. installation or after taking a snapshot). Signed-off-by: Kevin Wolf --- Makefile.objs | 2 +- block/qcow2-cache.c | 290 +++++++++++++++++++++++++++++++++++++++++++++++++++ block/qcow2.h | 19 ++++ 3 files changed, 310 insertions(+), 1 deletions(-) create mode 100644 block/qcow2-cache.c diff --git a/Makefile.objs b/Makefile.objs index c3e52c5..65889c9 100644 --- a/Makefile.objs +++ b/Makefile.objs @@ -19,7 +19,7 @@ block-obj-$(CONFIG_POSIX) += posix-aio-compat.o block-obj-$(CONFIG_LINUX_AIO) += linux-aio.o block-nested-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vvfat.o -block-nested-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o +block-nested-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o qcow2-cache.o block-nested-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o block-nested-y += qed-check.o block-nested-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c new file mode 100644 index 0000000..f7c4e2a --- /dev/null +++ b/block/qcow2-cache.c @@ -0,0 +1,290 @@ +/* + * L2/refcount table cache for the QCOW2 format + * + * Copyright (c) 2010 Kevin Wolf + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to deal + * in the Software without restriction, including without limitation the rights + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell + * copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN + * THE SOFTWARE. + */ + +#include "block_int.h" +#include "qemu-common.h" +#include "qcow2.h" + +typedef struct Qcow2CachedTable { + void* table; + int64_t offset; + bool dirty; + int cache_hits; + int ref; +} Qcow2CachedTable; + +struct Qcow2Cache { + int size; + Qcow2CachedTable* entries; + struct Qcow2Cache* depends; + bool writethrough; +}; + +Qcow2Cache *qcow2_cache_create(BlockDriverState *bs, int num_tables, + bool writethrough) +{ + BDRVQcowState *s = bs->opaque; + Qcow2Cache *c; + int i; + + c = qemu_mallocz(sizeof(*c)); + c->size = num_tables; + c->entries = qemu_mallocz(sizeof(*c->entries) * num_tables); + c->writethrough = writethrough; + + for (i = 0; i < c->size; i++) { + c->entries[i].table = qemu_blockalign(bs, s->cluster_size); + } + + return c; +} + +int qcow2_cache_destroy(BlockDriverState* bs, Qcow2Cache *c) +{ + int i; + + for (i = 0; i < c->size; i++) { + assert(c->entries[i].ref == 0); + qemu_vfree(c->entries[i].table); + } + + qemu_free(c->entries); + qemu_free(c); + + return 0; +} + +static int qcow2_cache_flush_dependency(BlockDriverState *bs, Qcow2Cache *c) +{ + int ret; + + ret = qcow2_cache_flush(bs, c->depends); + if (ret < 0) { + return ret; + } + + c->depends = NULL; + return 0; +} + +static int qcow2_cache_entry_flush(BlockDriverState *bs, Qcow2Cache *c, int i) +{ + BDRVQcowState *s = bs->opaque; + int ret; + + if (!c->entries[i].dirty || !c->entries[i].offset) { + return 0; + } + + if (c->depends) { + ret = qcow2_cache_flush_dependency(bs, c); + if (ret < 0) { + return ret; + } + } + + ret = bdrv_pwrite(bs->file, c->entries[i].offset, c->entries[i].table, + s->cluster_size); + if (ret < 0) { + return ret; + } + + c->entries[i].dirty = false; + + return 0; +} + +int qcow2_cache_flush(BlockDriverState *bs, Qcow2Cache *c) +{ + int result = 0; + int ret; + int i; + + for (i = 0; i < c->size; i++) { + ret = qcow2_cache_entry_flush(bs, c, i); + if (ret < 0 && result != -ENOSPC) { + result = ret; + } + } + + if (result == 0) { + ret = bdrv_flush(bs->file); + if (ret < 0) { + result = ret; + } + } + + return result; +} + +int qcow2_cache_set_dependency(BlockDriverState *bs, Qcow2Cache *c, + Qcow2Cache *dependency) +{ + int ret; + + if (dependency->depends) { + ret = qcow2_cache_flush_dependency(bs, dependency); + if (ret < 0) { + return ret; + } + } + + if (c->depends && (c->depends != dependency)) { + ret = qcow2_cache_flush_dependency(bs, c); + if (ret < 0) { + return ret; + } + } + + c->depends = dependency; + return 0; +} + +static int qcow2_cache_find_entry_to_replace(Qcow2Cache *c) +{ + int i; + int min_count = INT_MAX; + int min_index = -1; + + + for (i = 0; i < c->size; i++) { + if (c->entries[i].ref) { + continue; + } + + if (c->entries[i].cache_hits < min_count) { + min_index = i; + min_count = c->entries[i].cache_hits; + } + + /* Give newer hits priority */ + /* TODO Check how to optimize the replacement strategy */ + c->entries[i].cache_hits /= 2; + } + + if (min_index == -1) { + /* This can't happen in current synchronous code, but leave the check + * here as a reminder for whoever starts using AIO with the cache */ + abort(); + } + return min_index; +} + +static int qcow2_cache_do_get(BlockDriverState *bs, Qcow2Cache *c, + uint64_t offset, void **table, bool read_from_disk) +{ + BDRVQcowState *s = bs->opaque; + int i; + int ret; + + /* Check if the table is already cached */ + for (i = 0; i < c->size; i++) { + if (c->entries[i].offset == offset) { + goto found; + } + } + + /* If not, write a table back and replace it */ + i = qcow2_cache_find_entry_to_replace(c); + if (i < 0) { + return i; + } + + ret = qcow2_cache_entry_flush(bs, c, i); + if (ret < 0) { + return ret; + } + + c->entries[i].offset = 0; + if (read_from_disk) { + ret = bdrv_pread(bs->file, offset, c->entries[i].table, s->cluster_size); + if (ret < 0) { + return ret; + } + } + + /* Give the table some hits for the start so that it won't be replaced + * immediately. The number 32 is completely arbitrary. */ + c->entries[i].cache_hits = 32; + c->entries[i].offset = offset; + + /* And return the right table */ +found: + c->entries[i].cache_hits++; + c->entries[i].ref++; + *table = c->entries[i].table; + return 0; +} + +int qcow2_cache_get(BlockDriverState *bs, Qcow2Cache *c, uint64_t offset, + void **table) +{ + return qcow2_cache_do_get(bs, c, offset, table, true); +} + +int qcow2_cache_get_empty(BlockDriverState *bs, Qcow2Cache *c, uint64_t offset, + void **table) +{ + return qcow2_cache_do_get(bs, c, offset, table, false); +} + +int qcow2_cache_put(BlockDriverState *bs, Qcow2Cache *c, void **table) +{ + int i; + + for (i = 0; i < c->size; i++) { + if (c->entries[i].table == *table) { + goto found; + } + } + return -ENOENT; + +found: + c->entries[i].ref--; + *table = NULL; + + assert(c->entries[i].ref >= 0); + + if (c->writethrough) { + return qcow2_cache_entry_flush(bs, c, i); + } else { + return 0; + } +} + +void qcow2_cache_entry_mark_dirty(Qcow2Cache *c, void *table) +{ + int i; + + for (i = 0; i < c->size; i++) { + if (c->entries[i].table == table) { + goto found; + } + } + abort(); + +found: + c->entries[i].dirty = true; +} + diff --git a/block/qcow2.h b/block/qcow2.h index 5217bea..e5473e1 100644 --- a/block/qcow2.h +++ b/block/qcow2.h @@ -78,6 +78,9 @@ typedef struct QCowSnapshot { uint64_t vm_clock_nsec; } QCowSnapshot; +struct Qcow2Cache; +typedef struct Qcow2Cache Qcow2Cache; + typedef struct BDRVQcowState { int cluster_bits; int cluster_size; @@ -215,4 +218,20 @@ int qcow2_snapshot_load_tmp(BlockDriverState *bs, const char *snapshot_name); void qcow2_free_snapshots(BlockDriverState *bs); int qcow2_read_snapshots(BlockDriverState *bs); +/* qcow2-cache.c functions */ +Qcow2Cache *qcow2_cache_create(BlockDriverState *bs, int num_tables, + bool writethrough); +int qcow2_cache_destroy(BlockDriverState* bs, Qcow2Cache *c); + +void qcow2_cache_entry_mark_dirty(Qcow2Cache *c, void *table); +int qcow2_cache_flush(BlockDriverState *bs, Qcow2Cache *c); +int qcow2_cache_set_dependency(BlockDriverState *bs, Qcow2Cache *c, + Qcow2Cache *dependency); + +int qcow2_cache_get(BlockDriverState *bs, Qcow2Cache *c, uint64_t offset, + void **table); +int qcow2_cache_get_empty(BlockDriverState *bs, Qcow2Cache *c, uint64_t offset, + void **table); +int qcow2_cache_put(BlockDriverState *bs, Qcow2Cache *c, void **table); + #endif