From patchwork Wed Jun 8 14:50:25 2011
X-Patchwork-Submitter: Dmitry Konishchev
X-Patchwork-Id: 99486
From: Dmitry Konishchev
To: qemu-devel@nongnu.org
Cc: Kevin Wolf, Dmitry Konishchev
Date: Wed, 8 Jun 2011 18:50:25 +0400
Message-Id: <1307544625-22907-1-git-send-email-konishchev@gmail.com>
Subject: [Qemu-devel] [PATCH] CPU consumption optimization of 'qemu-img convert' using bdrv_is_allocated()

This patch optimizes the 'qemu-img convert' operation for volumes which are
almost fully unallocated.

Here are the results of simple tests:

We have a snapshot of a volume:
$ qemu-img info snapshot.qcow2
image: snapshot.qcow2
file format: qcow2
virtual size: 5.0G (5372805120 bytes)
disk size: 4.0G
cluster_size: 65536

Create a volume from the snapshot and use it a little:
$ qemu-img create -f qcow2 -o backing_file=snapshot.qcow2 volume.qcow2

For volumes which are almost fully allocated there is a small regression:
$ time qemu-img convert -O qcow2 volume.qcow2 volume_snapshot.qcow2
real 2m43.864s
user 0m9.257s
sys 0m40.559s
$ time qemu-img-patched convert -O qcow2 volume.qcow2 volume_snapshot.qcow2
real 2m46.899s
user 0m9.749s
sys 0m40.471s

Now create a volume which is almost fully unallocated:
$ qemu-img create -f qcow2 -o backing_file=snapshot.qcow2 volume.qcow2 1T

Here CPU consumption drops by more than half (user time falls from 4m13s to
1m43s):
$ time qemu-img convert -O qcow2 volume.qcow2 volume_snapshot.qcow2
real 6m40.985s
user 4m13.832s
sys 0m33.738s
$ time qemu-img-patched convert -O qcow2 volume.qcow2 volume_snapshot.qcow2
real 4m28.448s
user 1m43.882s
sys 0m33.894s

Signed-off-by: Dmitry Konishchev
---
 qemu-img.c |  168 +++++++++++++++++++++++++++++++++++++++++++++---------------
 1 files changed, 127 insertions(+), 41 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index 4f162d1..9d905ed 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -38,6 +38,8 @@ typedef struct img_cmd_t {
     int (*handler)(int argc, char **argv);
 } img_cmd_t;
 
+static const int SECTOR_SIZE = 512;
+
 /* Default to cache=writeback as data integrity is not important for qemu-tcg. */
 #define BDRV_O_FLAGS BDRV_O_CACHE_WB
 
@@ -531,7 +533,7 @@ static int is_not_zero(const uint8_t *sector, int len)
 }
 
 /*
- * Returns true iff the first sector pointed to by 'buf' contains at least
+ * Returns true if the first sector pointed to by 'buf' contains at least
  * a non-NUL byte.
  *
  * 'pnum' is set to the number of sectors (including and immediately following
@@ -590,15 +592,15 @@ static int compare_sectors(const uint8_t *buf1, const uint8_t *buf2, int n,
 
 static int img_convert(int argc, char **argv)
 {
-    int c, ret = 0, n, n1, bs_n, bs_i, compress, cluster_size, cluster_sectors;
+    int c, ret = 0, n, cur_n, bs_n, bs_i, compress, cluster_size, cluster_sectors;
     int progress = 0;
     const char *fmt, *out_fmt, *out_baseimg, *out_filename;
     BlockDriver *drv, *proto_drv;
     BlockDriverState **bs = NULL, *out_bs = NULL;
     int64_t total_sectors, nb_sectors, sector_num, bs_offset;
     uint64_t bs_sectors;
+    uint64_t *bs_geometry = NULL;
     uint8_t * buf = NULL;
-    const uint8_t *buf1;
     BlockDriverInfo bdi;
     QEMUOptionParameter *param = NULL, *create_options = NULL;
     QEMUOptionParameter *out_baseimg_param;
@@ -874,14 +876,21 @@ static int img_convert(int argc, char **argv)
         /* signal EOF to align */
         bdrv_write_compressed(out_bs, 0, NULL, 0);
     } else {
+        int backing_depth;
+        int bs_i_prev = -1;
+        float progress = 100;
+        BlockDriverState *cur_bs;
         int has_zero_init = bdrv_has_zero_init(out_bs);
 
         sector_num = 0; // total number of sectors converted so far
         nb_sectors = total_sectors - sector_num;
-        local_progress = (float)100 /
-            (nb_sectors / MIN(nb_sectors, IO_BUF_SIZE / 512));
 
         for(;;) {
+            if (total_sectors) {
+                progress = (long double) sector_num / total_sectors * 100;
+            }
+            qemu_progress_print(progress, 0);
+
             nb_sectors = total_sectors - sector_num;
             if (nb_sectors <= 0) {
                 break;
@@ -893,15 +902,38 @@ static int img_convert(int argc, char **argv)
             }
 
             while (sector_num - bs_offset >= bs_sectors) {
-                bs_i ++;
-                assert (bs_i < bs_n);
+                bs_i++;
+                assert(bs_i < bs_n);
                 bs_offset += bs_sectors;
                 bdrv_get_geometry(bs[bs_i], &bs_sectors);
+
                 /* printf("changing part: sector_num=%" PRId64 ", bs_i=%d, "
                    "bs_offset=%" PRId64 ", bs_sectors=%" PRId64 "\n",
                    sector_num, bs_i, bs_offset, bs_sectors); */
             }
 
+            if (bs_i != bs_i_prev) {
+                /* Getting geometry of the image and all its backing images */
+
+                backing_depth = 1;
+                cur_bs = bs[bs_i];
+                while (( cur_bs = cur_bs->backing_hd )) {
+                    backing_depth++;
+                }
+
+                bs_geometry = (uint64_t *) qemu_realloc(
+                    bs_geometry, backing_depth * sizeof(uint64_t));
+
+                backing_depth = 1;
+                cur_bs = bs[bs_i];
+                *bs_geometry = bs_sectors;
+                while (( cur_bs = cur_bs->backing_hd )) {
+                    bdrv_get_geometry(cur_bs, bs_geometry + backing_depth++);
+                }
+
+                bs_i_prev = bs_i;
+            }
+
             if (n > bs_offset + bs_sectors - sector_num) {
                 n = bs_offset + bs_sectors - sector_num;
             }
@@ -912,55 +944,109 @@ static int img_convert(int argc, char **argv)
                    are present in both the output's and input's base images (no
                    need to copy them). */
                 if (out_baseimg) {
-                    if (!bdrv_is_allocated(bs[bs_i], sector_num - bs_offset,
-                                           n, &n1)) {
-                        sector_num += n1;
+                    if (!bdrv_is_allocated(bs[bs_i], sector_num - bs_offset, n, &cur_n)) {
+                        sector_num += cur_n;
                         continue;
                     }
-                    /* The next 'n1' sectors are allocated in the input image. Copy
+                    /* The next 'cur_n' sectors are allocated in the input image. Copy
                        only those as they may be followed by unallocated sectors. */
-                    n = n1;
+                    n = cur_n;
                 }
-            } else {
-                n1 = n;
             }
 
-            ret = bdrv_read(bs[bs_i], sector_num - bs_offset, buf, n);
-            if (ret < 0) {
-                error_report("error while reading");
-                goto out;
-            }
-            /* NOTE: at the same time we convert, we do not write zero
-               sectors to have a chance to compress the image. Ideally, we
-               should add a specific call to have the info to go faster */
-            buf1 = buf;
-            while (n > 0) {
-                /* If the output image is being created as a copy on write image,
-                   copy all sectors even the ones containing only NUL bytes,
-                   because they may differ from the sectors in the base image.
-
-                   If the output is to a host device, we also write out
-                   sectors that are entirely 0, since whatever data was
-                   already there is garbage, not 0s.  */
-                if (!has_zero_init || out_baseimg ||
-                    is_allocated_sectors(buf1, n, &n1)) {
-                    ret = bdrv_write(out_bs, sector_num, buf1, n1);
-                    if (ret < 0) {
-                        error_report("error while writing");
-                        goto out;
+            /* If the output image is being created as a copy on write image,
+               copy all sectors even the ones containing only zero bytes,
+               because they may differ from the sectors in the base image.
+
+               If the output is to a host device, we also write out
+               sectors that are entirely 0, since whatever data was
+               already there is garbage, not 0s. */
+            if (!has_zero_init || out_baseimg) {
+                ret = bdrv_read(bs[bs_i], sector_num - bs_offset, buf, n);
+                if (ret < 0) {
+                    error_report("error while reading");
+                    goto out;
+                }
+
+                ret = bdrv_write(out_bs, sector_num, buf, n);
+                if (ret < 0) {
+                    error_report("error while writing");
+                    goto out;
+                }
+
+                sector_num += n;
+            } else {
+                /* Look for the sectors in the image and if they are not
+                   allocated - sequentially in all its backing images.
+
+                   Write only non-zero bytes to the output image.  */
+
+                uint64_t cur_sectors;
+                uint64_t bs_sector;
+                int allocated_num;
+                int sector_found;
+
+                while (n > 0) {
+                    cur_bs = bs[bs_i];
+                    bs_sector = sector_num - bs_offset;
+                    backing_depth = 0;
+                    sector_found = 0;
+
+                    do {
+                        cur_sectors = bs_geometry[backing_depth++];
+
+                        if (bs_sector >= cur_sectors) {
+                            continue;
+                        }
+
+                        if (bs_sector + n <= cur_sectors) {
+                            cur_n = n;
+                        } else {
+                            cur_n = cur_sectors - bs_sector;
+                        }
+
+                        if (bdrv_is_allocated(cur_bs, bs_sector, cur_n, &allocated_num)) {
+                            const uint8_t *cur_buf = buf;
+                            sector_found = 1;
+
+                            ret = bdrv_read(cur_bs, bs_sector, buf, allocated_num);
+                            if (ret < 0) {
+                                error_report("error while reading");
+                                goto out;
+                            }
+
+                            while (allocated_num > 0) {
+                                if (is_allocated_sectors(cur_buf, allocated_num, &cur_n)) {
+                                    ret = bdrv_write(out_bs, sector_num, cur_buf, cur_n);
+                                    if (ret < 0) {
+                                        error_report("error while writing");
+                                        goto out;
+                                    }
+                                }
+
+                                n -= cur_n;
+                                sector_num += cur_n;
+                                allocated_num -= cur_n;
+                                cur_buf += cur_n * SECTOR_SIZE;
+                            }
+
+                            break;
+                        }
+                    } while(( cur_bs = cur_bs->backing_hd ));
+
+                    if (!sector_found) {
+                        sector_num++;
+                        n--;
                     }
                 }
-                sector_num += n1;
-                n -= n1;
-                buf1 += n1 * 512;
             }
-            qemu_progress_print(local_progress, 100);
         }
     }
 out:
     qemu_progress_end();
     free_option_parameters(create_options);
     free_option_parameters(param);
+    qemu_free(bs_geometry);
     qemu_free(buf);
     if (out_bs) {
         bdrv_delete(out_bs);
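
For readers who want to follow the new else-branch in isolation, the lookup it
performs (try the image itself, then each backing image in turn, and skip the
sector if nobody has it allocated) can be sketched roughly as below. This is
only a toy model under stated assumptions: ToyImage, toy_is_allocated() and
toy_find_allocated() are hypothetical names invented for illustration and are
not part of QEMU; the allocation map is a plain byte array rather than a real
image format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a block image; NOT QEMU's BlockDriverState. */
typedef struct ToyImage {
    uint64_t nb_sectors;        /* image size in sectors */
    const uint8_t *allocated;   /* allocated[i] != 0: sector i is allocated here */
    struct ToyImage *backing;   /* next image in the backing chain, or NULL */
} ToyImage;

/*
 * Toy allocation query (assumed semantics): returns whether 'sector' is
 * allocated in 'img' and stores in *pnum how many of the next 'n' sectors,
 * starting at 'sector', share that same state.
 */
static int toy_is_allocated(const ToyImage *img, uint64_t sector, int n,
                            int *pnum)
{
    int state, run = 1;

    assert(n > 0 && sector + (uint64_t)n <= img->nb_sectors);
    state = img->allocated[sector] != 0;
    while (run < n && (img->allocated[sector + run] != 0) == state) {
        run++;
    }
    *pnum = run;
    return state;
}

/*
 * Walk the chain from the top image down. Returns the first image in which
 * 'sector' is allocated and sets *pnum to the length of the allocated run
 * (limited by 'n' and by that image's size). Returns NULL when no image in
 * the chain has the sector allocated; *pnum is then 1, so the caller can
 * skip a single sector, as the patch does.
 */
static const ToyImage *toy_find_allocated(const ToyImage *top, uint64_t sector,
                                          int n, int *pnum)
{
    for (const ToyImage *img = top; img != NULL; img = img->backing) {
        int cur_n, run;

        if (sector >= img->nb_sectors) {
            continue;           /* this image is too short; try its backing */
        }
        cur_n = (sector + (uint64_t)n <= img->nb_sectors)
                    ? n : (int)(img->nb_sectors - sector);
        if (toy_is_allocated(img, sector, cur_n, &run)) {
            *pnum = run;
            return img;
        }
    }
    *pnum = 1;
    return NULL;
}
```

In this model a caller would loop over a range of sectors, read and copy the
run reported by toy_find_allocated() when it returns an image, and advance by
one sector when it returns NULL. No data is read for unallocated ranges, which
is where the CPU saving for mostly empty volumes would come from.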