From patchwork Fri Apr 15 13:40:55 2011
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Stefan Hajnoczi
X-Patchwork-Id: 91376
From: Stefan Hajnoczi
Date: Fri, 15 Apr 2011 14:40:55 +0100
Message-Id: <1302874855-14736-1-git-send-email-stefanha@linux.vnet.ibm.com>
X-Mailer: git-send-email 1.7.4.1
Cc: Kevin Wolf, Anthony Liguori, Stefan Hajnoczi, Khoa Huynh,
 Badari Pulavarty, Christoph Hellwig
Subject: [Qemu-devel] [PATCH] raw-posix: Linearize direct I/O on Linux NFS
X-BeenThere: qemu-devel@nongnu.org

The Linux NFS client issues separate NFS requests for vectored direct I/O
writes.  For example, a pwritev() with 8 elements results in 8 write
requests to the server.  This is very inefficient and a kernel-side fix is
not trivial or likely to be available soon.

This patch detects files on NFS and uses the QEMU_AIO_MISALIGNED flag to
force requests to bounce through a linear buffer.

Khoa Huynh reports the following ffsb benchmark results over 1 Gbit
Ethernet:

Test (threads=8)                 unpatched  patched
                                    (MB/s)   (MB/s)
Large File Creates (bs=256 KB)        20.5    112.0
Sequential Reads (bs=256 KB)          58.7    112.0
Large File Creates (bs=8 KB)           5.2      5.8
Sequential Reads (bs=8 KB)            46.7     80.9
Random Reads (bs=8 KB)                 8.7     23.4
Random Writes (bs=8 KB)               39.6     44.0
Mail Server (bs=8 KB)                 10.2     23.6

Test (threads=1)                 unpatched  patched
                                    (MB/s)   (MB/s)
Large File Creates (bs=256 KB)        14.5     49.8
Sequential Reads (bs=256 KB)          87.9     83.9
Large File Creates (bs=8 KB)           4.8      4.8
Sequential Reads (bs=8 KB)            23.2     23.1
Random Reads (bs=8 KB)                 4.8      4.7
Random Writes (bs=8 KB)                9.4     12.8
Mail Server (bs=8 KB)                  5.4      7.3

Signed-off-by: Stefan Hajnoczi
---
 block/raw-posix.c |   55 ++++++++++++++++++++++++++++++++++++++++++++--------
 1 files changed, 46 insertions(+), 9 deletions(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index 6b72470..40b7c61 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -49,8 +49,10 @@
 #ifdef __linux__
 #include
 #include
+#include
 #include
 #include
+#include
 #endif
 #if defined (__FreeBSD__) || defined(__FreeBSD_kernel__)
 #include
@@ -124,6 +126,7 @@ typedef struct BDRVRawState {
 #endif
     uint8_t *aligned_buf;
     unsigned aligned_buf_size;
+    bool force_linearize;
 #ifdef CONFIG_XFS
     bool is_xfs : 1;
 #endif
@@ -136,6 +139,32 @@ static int64_t raw_getlength(BlockDriverState *bs);
 static int cdrom_reopen(BlockDriverState *bs);
 #endif
 
+#if defined(__linux__)
+static bool is_vectored_io_slow(int fd, int open_flags)
+{
+    struct statfs stfs;
+    int ret;
+
+    do {
+        ret = fstatfs(fd, &stfs);
+    } while (ret != 0 && errno == EINTR);
+
+    /*
+     * Linux NFS client splits vectored direct I/O requests into separate NFS
+     * requests so it is faster to submit a single buffer instead.
+     */
+    if (!ret && stfs.f_type == NFS_SUPER_MAGIC && (open_flags & O_DIRECT)) {
+        return true;
+    }
+    return false;
+}
+#else /* !defined(__linux__) */
+static bool is_vectored_io_slow(int fd, int open_flags)
+{
+    return false;
+}
+#endif
+
 static int raw_open_common(BlockDriverState *bs, const char *filename,
                            int bdrv_flags, int open_flags)
 {
@@ -167,6 +196,7 @@ static int raw_open_common(BlockDriverState *bs, const char *filename,
     }
     s->fd = fd;
     s->aligned_buf = NULL;
+    s->force_linearize = is_vectored_io_slow(fd, s->open_flags);
 
     if ((bdrv_flags & BDRV_O_NOCACHE)) {
         /*
@@ -536,20 +566,27 @@ static BlockDriverAIOCB *raw_aio_submit(BlockDriverState *bs,
         return NULL;
 
     /*
+     * Check if buffers need to be copied into a single linear buffer.
+     */
+    if (s->force_linearize && qiov->niov > 1) {
+        type |= QEMU_AIO_MISALIGNED;
+    }
+
+    /*
      * If O_DIRECT is used the buffer needs to be aligned on a sector
-     * boundary. Check if this is the case or telll the low-level
+     * boundary. Check if this is the case or tell the low-level
      * driver that it needs to copy the buffer.
      */
-    if (s->aligned_buf) {
-        if (!qiov_is_aligned(bs, qiov)) {
-            type |= QEMU_AIO_MISALIGNED;
+    if (s->aligned_buf && !qiov_is_aligned(bs, qiov)) {
+        type |= QEMU_AIO_MISALIGNED;
+    }
+
 #ifdef CONFIG_LINUX_AIO
-        } else if (s->use_aio) {
-            return laio_submit(bs, s->aio_ctx, s->fd, sector_num, qiov,
-                               nb_sectors, cb, opaque, type);
-#endif
-        }
+    if (s->use_aio && (type & QEMU_AIO_MISALIGNED) == 0) {
+        return laio_submit(bs, s->aio_ctx, s->fd, sector_num, qiov,
+                           nb_sectors, cb, opaque, type);
     }
+#endif
 
     return paio_submit(bs, s->fd, sector_num, qiov, nb_sectors,
                        cb, opaque, type);
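
Note for readers: the patch relies on the existing QEMU_AIO_MISALIGNED path
to do the copying, so the bounce itself does not appear in the diff. The
following standalone sketch illustrates roughly what "linearizing" a vectored
write means. It is not QEMU code: linearized_pwrite() and its alignment
parameter are hypothetical names chosen for illustration, and a real
O_DIRECT caller would also need the offset and total length to meet the
device's alignment requirements, which this sketch does not enforce.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of QEMU): copy every element of iov[] into
 * one contiguous, aligned bounce buffer and submit it with a single
 * pwrite() instead of a pwritev() with iovcnt elements.
 *
 * alignment must be a power of two and a multiple of sizeof(void *),
 * as required by posix_memalign().
 *
 * Returns the number of bytes written, or -1 with errno set.
 */
static ssize_t linearized_pwrite(int fd, const struct iovec *iov, int iovcnt,
                                 off_t offset, size_t alignment)
{
    size_t total = 0;
    void *mem = NULL;
    char *p;
    ssize_t ret;
    int i, err;

    assert(iov != NULL && iovcnt > 0);

    /* Total payload size across all vector elements. */
    for (i = 0; i < iovcnt; i++) {
        total += iov[i].iov_len;
    }
    if (total == 0) {
        return 0;
    }

    /* posix_memalign() returns an error number rather than setting errno. */
    err = posix_memalign(&mem, alignment, total);
    if (err != 0) {
        errno = err;
        return -1;
    }

    /* Gather the scattered buffers into the linear bounce buffer. */
    p = mem;
    for (i = 0; i < iovcnt; i++) {
        memcpy(p, iov[i].iov_base, iov[i].iov_len);
        p += iov[i].iov_len;
    }

    /* One write call for the whole payload; retry if interrupted. */
    do {
        ret = pwrite(fd, mem, total, offset);
    } while (ret < 0 && errno == EINTR);

    free(mem);
    return ret;
}
```

The cost is one extra memcpy() per request. On the NFS direct I/O path
described above, the benchmark figures suggest that this copy is cheaper than
having the client issue one NFS request per vector element.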