From patchwork Fri Jan 29 19:04:40 2010
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Christoph Hellwig
X-Patchwork-Id: 44012
Date: Fri, 29 Jan 2010 20:04:40 +0100
From: Christoph Hellwig
To: qemu-devel@nongnu.org
Message-ID: <20100129190440.GA25287@lst.de>
References: <20100129190417.GA25237@lst.de>
In-Reply-To: <20100129190417.GA25237@lst.de>
Cc: "Martin K. Petersen"
Subject: [Qemu-devel] [PATCH 2/4] block: add block topology options

Add three new suboptions for the drive option to export block topology
information to the guest.  This is needed to get optimal I/O alignment
for RAID arrays or SSDs.

The options are:

 - physical_block_size to specify the physical block size of the device;
   this is going to increase from 512 bytes to 4096 bytes for many
   modern storage devices.
 - min_io_size to specify the minimal I/O size without performance
   impact; this is typically set to the RAID chunk size for arrays.
 - opt_io_size to specify the optimal sustained I/O size; this is
   typically the RAID stripe width for arrays.

I decided not to auto-probe these values from blkid, which would easily
be possible, as I don't know how to deal with these issues on migration.

Note that we specifically only set the physical_block_size, and not the
logical one, which is the unit all I/O is described in.  The reason is
that IDE does not support increasing the logical block size, and at
least for now I want to stick to one mechanism in queue and allow for
easy switching of transports for a given backing image.  That would not
be possible if scsi and virtio used real 4k sectors while ide only used
the physical block exponent.
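As a rough illustration of how the three values relate (this is not part
of the patch), the sketch below mirrors the exponent loop from
bdrv_get_physical_block_exp() and the divisibility checks in drive_init().
The helper names physical_block_exp and topology_is_valid are hypothetical,
and unlike the patch the sketch treats 0 as "not set" for min_io_size and
opt_io_size.

```c
#include <stdbool.h>

/*
 * Hypothetical stand-alone helpers for illustration only; they are not
 * functions added by this patch.
 */

/*
 * Number of doublings needed to get from a 512-byte logical block to the
 * physical block size, using the same loop as bdrv_get_physical_block_exp().
 */
unsigned int physical_block_exp(unsigned int physical_block_size)
{
    unsigned int exp = 0, size;

    for (size = physical_block_size; size > 512; size >>= 1) {
        exp++;
    }
    return exp;
}

/*
 * Check the relationships described in the commit message.
 * Assumption of this sketch: 0 means "not set" for min_io_size and
 * opt_io_size, so an unset value is never used as a divisor.
 */
bool topology_is_valid(unsigned long physical_block_size,
                       unsigned long min_io_size,
                       unsigned long opt_io_size)
{
    if (physical_block_size < 512) {
        return false;
    }
    if (min_io_size && (min_io_size % physical_block_size)) {
        return false;
    }
    if (opt_io_size && min_io_size && (opt_io_size % min_io_size)) {
        return false;
    }
    return true;
}
```

For a four-disk stripe with 64 KiB chunks on drives with 4096-byte
sectors, physical_block_size=4096, min_io_size=65536 and
opt_io_size=262144 pass these checks, and the exponent for 4096 is 3
(512 << 3 == 4096).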
Signed-off-by: Christoph Hellwig

Index: qemu/block.c
===================================================================
--- qemu.orig/block.c	2010-01-29 11:07:50.083004364 +0100
+++ qemu/block.c	2010-01-29 11:08:32.940004255 +0100
@@ -1028,6 +1028,51 @@ int bdrv_enable_write_cache(BlockDriverS
     return bs->enable_write_cache;
 }
 
+unsigned int bdrv_get_physical_block_size(BlockDriverState *bs)
+{
+    return bs->physical_block_size;
+}
+
+unsigned int bdrv_get_physical_block_exp(BlockDriverState *bs)
+{
+    unsigned int exp = 0, size;
+
+    for (size = bs->physical_block_size; size > 512; size >>= 1) {
+        exp++;
+    }
+
+    return exp;
+}
+
+void bdrv_set_physical_block_size(BlockDriverState *bs,
+        unsigned int physical_block_size)
+{
+    bs->physical_block_size = physical_block_size;
+}
+
+
+unsigned int bdrv_get_min_io_size(BlockDriverState *bs)
+{
+    return bs->min_io_size;
+}
+
+void bdrv_set_min_io_size(BlockDriverState *bs,
+        unsigned int min_io_size)
+{
+    bs->min_io_size = min_io_size;
+}
+
+unsigned int bdrv_get_opt_io_size(BlockDriverState *bs)
+{
+    return bs->opt_io_size;
+}
+
+void bdrv_set_opt_io_size(BlockDriverState *bs,
+        unsigned int opt_io_size)
+{
+    bs->opt_io_size = opt_io_size;
+}
+
 /* XXX: no longer used */
 void bdrv_set_change_cb(BlockDriverState *bs,
                         void (*change_cb)(void *opaque), void *opaque)
Index: qemu/block.h
===================================================================
--- qemu.orig/block.h	2010-01-29 11:07:50.089004011 +0100
+++ qemu/block.h	2010-01-29 11:08:32.940004255 +0100
@@ -152,6 +152,16 @@ int bdrv_is_inserted(BlockDriverState *b
 int bdrv_media_changed(BlockDriverState *bs);
 int bdrv_is_locked(BlockDriverState *bs);
 void bdrv_set_locked(BlockDriverState *bs, int locked);
+unsigned int bdrv_get_physical_block_size(BlockDriverState *bs);
+unsigned int bdrv_get_physical_block_exp(BlockDriverState *bs);
+void bdrv_set_physical_block_size(BlockDriverState *bs,
+        unsigned int physical_block_size);
+unsigned int bdrv_get_min_io_size(BlockDriverState *bs);
+void bdrv_set_min_io_size(BlockDriverState *bs,
+        unsigned int min_io_size);
+unsigned int bdrv_get_opt_io_size(BlockDriverState *bs);
+void bdrv_set_opt_io_size(BlockDriverState *bs,
+        unsigned int opt_io_size);
 int bdrv_eject(BlockDriverState *bs, int eject_flag);
 void bdrv_set_change_cb(BlockDriverState *bs,
                         void (*change_cb)(void *opaque), void *opaque);
Index: qemu/block_int.h
===================================================================
--- qemu.orig/block_int.h	2010-01-29 11:07:50.096004065 +0100
+++ qemu/block_int.h	2010-01-29 11:08:32.941003474 +0100
@@ -173,6 +173,14 @@ struct BlockDriverState {
        drivers. They are not used by the block driver */
     int cyls, heads, secs, translation;
     int type;
+
+    /*
+     * Topology information, all optional.
+     */
+    unsigned int physical_block_size;
+    unsigned int min_io_size;
+    unsigned int opt_io_size;
+
     char device_name[32];
     unsigned long *dirty_bitmap;
     BlockDriverState *next;
Index: qemu/qemu-config.c
===================================================================
--- qemu.orig/qemu-config.c	2010-01-29 11:07:50.133004032 +0100
+++ qemu/qemu-config.c	2010-01-29 11:08:32.944025367 +0100
@@ -78,6 +78,15 @@ QemuOptsList qemu_drive_opts = {
         },{
             .name = "readonly",
             .type = QEMU_OPT_BOOL,
+        },{
+            .name = "physical_block_size",
+            .type = QEMU_OPT_NUMBER,
+        },{
+            .name = "min_io_size",
+            .type = QEMU_OPT_NUMBER,
+        },{
+            .name = "opt_io_size",
+            .type = QEMU_OPT_NUMBER,
         },
         { /* end if list */ }
     },
Index: qemu/vl.c
===================================================================
--- qemu.orig/vl.c	2010-01-29 11:07:50.141004284 +0100
+++ qemu/vl.c	2010-01-29 11:08:32.947003820 +0100
@@ -1904,6 +1904,9 @@ DriveInfo *drive_init(QemuOpts *opts, vo
     int index;
     int cache;
     int aio = 0;
+    unsigned long physical_block_size = 512;
+    unsigned long min_io_size = 0;
+    unsigned long opt_io_size = 0;
     int ro = 0;
     int bdrv_flags;
     int on_read_error, on_write_error;
@@ -2053,6 +2056,32 @@ DriveInfo *drive_init(QemuOpts *opts, vo
     }
 #endif
 
+    if ((buf = qemu_opt_get(opts, "physical_block_size")) != NULL) {
+        physical_block_size = strtoul(buf, NULL, 10);
+        if (physical_block_size < 512) {
+            fprintf(stderr, "physical_block_size must be at least 512 bytes\n");
+            return NULL;
+        }
+    }
+
+    if ((buf = qemu_opt_get(opts, "min_io_size")) != NULL) {
+        min_io_size = strtoul(buf, NULL, 10);
+        if (!min_io_size || (min_io_size % physical_block_size)) {
+            fprintf(stderr,
+                    "min_io_size must be a multiple of physical_block_size\n");
+            return NULL;
+        }
+    }
+
+    if ((buf = qemu_opt_get(opts, "opt_io_size")) != NULL) {
+        opt_io_size = strtoul(buf, NULL, 10);
+        if (!opt_io_size || (min_io_size && (opt_io_size % min_io_size))) {
+            fprintf(stderr,
+                    "opt_io_size must be a multiple of min_io_size\n");
+            return NULL;
+        }
+    }
+
     if ((buf = qemu_opt_get(opts, "format")) != NULL) {
         if (strcmp(buf, "?") == 0) {
             fprintf(stderr, "qemu: Supported formats:");
@@ -2257,6 +2286,12 @@ DriveInfo *drive_init(QemuOpts *opts, vo
         return NULL;
     }
 
+    bdrv_set_physical_block_size(dinfo->bdrv, physical_block_size);
+    if (min_io_size)
+        bdrv_set_min_io_size(dinfo->bdrv, min_io_size);
+    if (opt_io_size)
+        bdrv_set_opt_io_size(dinfo->bdrv, opt_io_size);
+
     if (bdrv_key_required(dinfo->bdrv))
         autostart = 0;
     *fatal_error = 0;
Index: qemu/qemu-options.hx
===================================================================
--- qemu.orig/qemu-options.hx	2010-01-29 11:07:50.149004395 +0100
+++ qemu/qemu-options.hx	2010-01-29 19:36:06.118256061 +0100
@@ -104,6 +104,7 @@ DEF("drive", HAS_ARG, QEMU_OPTION_drive,
     "       [,cyls=c,heads=h,secs=s[,trans=t]][,snapshot=on|off]\n"
     "       [,cache=writethrough|writeback|none][,format=f][,serial=s]\n"
     "       [,addr=A][,id=name][,aio=threads|native][,readonly=on|off]\n"
+    "       [,physical_block_size=size][,min_io_size=size][,opt_io_size=size]\n"
     "                use 'file' as a drive image\n")
 DEF("set", HAS_ARG, QEMU_OPTION_set,
     "-set group.id.arg=value\n"
@@ -149,6 +150,15 @@ an untrusted format header.
 This option specifies the serial number to assign to the device.
 @item addr=@var{addr}
 Specify the controller's PCI address (if=virtio only).
+@item physical_block_size=@var{size}
+Report a physical block size larger than the logical block size of 512 bytes.
+@item min_io_size=@var{size}
+Report a minimum I/O size or optimum I/O granularity. This is the smallest
+I/O size the device can perform without a performance penalty. For RAID
+devices this should be set to the stripe chunk size.
+@item opt_io_size=@var{size}
+Report an optimal I/O size, which is the device's preferred unit for
+sustained I/O. This should be set to the stripe width for RAID devices.
 @end table
 
 By default, writethrough caching is used for all block device. This means that