From patchwork Fri Jun 3 01:02:33 2011
X-Patchwork-Submitter: Christoph Hellwig
X-Patchwork-Id: 98498
Date: Thu, 2 Jun 2011 21:02:33 -0400
From: Christoph Hellwig
To: Vivek Goyal
Cc: Ted Ts'o, Dave Chinner, linux-ext4@vger.kernel.org
Subject: Re: Query about DIO/AIO WRITE throttling and ext4 serialization
Message-ID: <20110603010233.GA17726@infradead.org>
References: <20110601215049.GC17449@redhat.com>
 <20110602012209.GQ561@dastard>
 <20110602141716.GD18712@redhat.com>
 <20110602143633.GE18712@redhat.com>
 <20110602155610.GF18712@redhat.com>
 <20110602235153.GV561@dastard>
 <20110603002714.GA27129@redhat.com>
 <20110603004300.GE16306@thunk.org>
 <20110603005403.GB27129@redhat.com>
In-Reply-To: <20110603005403.GB27129@redhat.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
X-Mailing-List: linux-ext4@vger.kernel.org

On Thu, Jun 02, 2011 at 08:54:03PM -0400, Vivek Goyal wrote:
> Just wondering why ext4 and XFS behavior are different and which is a
> more appropriate behavior. ext4 does not seem to be waiting for all
> pending AIO/DIO to finish while XFS does.

They're both wrong.  Ext4 completely misses support in fsync or sync
to catch pending unwritten extent conversions, and thus fails to obey
the data integrity guarantee.  XFS is being rather stupid about the
amount of synchronization it requires.  The untested patch below should
help with avoiding the synchronization if you're purely doing
overwrites:

Index: xfs/fs/xfs/linux-2.6/xfs_aops.c
===================================================================
--- xfs.orig/fs/xfs/linux-2.6/xfs_aops.c	2011-06-03 09:54:52.964337556 +0900
+++ xfs/fs/xfs/linux-2.6/xfs_aops.c	2011-06-03 09:57:06.877674259 +0900
@@ -270,7 +270,7 @@ xfs_finish_ioend_sync(
  * (vs. incore size).
  */
 STATIC xfs_ioend_t *
-xfs_alloc_ioend(
+__xfs_alloc_ioend(
 	struct inode		*inode,
 	unsigned int		type)
 {
@@ -290,7 +290,6 @@ xfs_alloc_ioend(
 	ioend->io_inode = inode;
 	ioend->io_buffer_head = NULL;
 	ioend->io_buffer_tail = NULL;
-	atomic_inc(&XFS_I(ioend->io_inode)->i_iocount);
 	ioend->io_offset = 0;
 	ioend->io_size = 0;
 	ioend->io_iocb = NULL;
@@ -300,6 +299,18 @@ xfs_alloc_ioend(
 	return ioend;
 }
 
+STATIC xfs_ioend_t *
+xfs_alloc_ioend(
+	struct inode		*inode,
+	unsigned int		type)
+{
+	struct xfs_ioend	*ioend;
+
+	ioend = __xfs_alloc_ioend(inode, type);
+	atomic_inc(&XFS_I(ioend->io_inode)->i_iocount);
+	return ioend;
+}
+
 STATIC int
 xfs_map_blocks(
 	struct inode		*inode,
@@ -1318,6 +1329,7 @@ xfs_end_io_direct_write(
 	 */
 	iocb->private = NULL;
 
+	atomic_inc(&XFS_I(ioend->io_inode)->i_iocount);
 	ioend->io_offset = offset;
 	ioend->io_size = size;
 	if (private && size > 0)
@@ -1354,7 +1366,7 @@ xfs_vm_direct_IO(
 	ssize_t			ret;
 
 	if (rw & WRITE) {
-		iocb->private = xfs_alloc_ioend(inode, IO_DIRECT);
+		iocb->private = __xfs_alloc_ioend(inode, IO_DIRECT);
 
 		ret = __blockdev_direct_IO(rw, iocb, inode, bdev, iov,
 					    offset, nr_segs,
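
For anyone who wants to poke at the counting change outside the kernel, here is a rough user-space model of what the patch appears to do: split allocation from the i_iocount increment, so a direct I/O write only bumps the counter once its completion handler runs. Everything named model_* is hypothetical and made up for this sketch, malloc and C11 atomics stand in for the kernel allocator and atomic_t, and nothing here is the actual XFS code.

```c
/* User-space model only: all model_* names are hypothetical, not XFS code. */
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct model_inode {
	atomic_int iocount;		/* stands in for i_iocount */
};

struct model_ioend {
	struct model_inode *inode;
};

/* Allocate without touching the counter (the role of __xfs_alloc_ioend). */
static struct model_ioend *model_alloc_ioend_nocount(struct model_inode *ip)
{
	struct model_ioend *ioend = malloc(sizeof(*ioend));

	if (ioend)
		ioend->inode = ip;
	return ioend;
}

/* Allocate and count right away (the role of the new xfs_alloc_ioend wrapper). */
static struct model_ioend *model_alloc_ioend(struct model_inode *ip)
{
	struct model_ioend *ioend = model_alloc_ioend_nocount(ip);

	if (ioend)
		atomic_fetch_add(&ioend->inode->iocount, 1);
	return ioend;
}

/* Direct write completion: the counter is only raised at this point. */
static void model_end_io_direct(struct model_ioend *ioend)
{
	atomic_fetch_add(&ioend->inode->iocount, 1);
}

/* Finish the ioend: drop the count taken earlier and free the structure. */
static void model_finish_ioend(struct model_ioend *ioend)
{
	int old = atomic_fetch_sub(&ioend->inode->iocount, 1);

	assert(old > 0);	/* a finish must pair with an earlier increment */
	free(ioend);
}
```

In this model a direct write that is still in flight leaves iocount at zero, so anything waiting for the counter to drain would not be held up by it. Whether that is safe for the real i_iocount waiters is exactly what the untested patch would need checking for.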