From patchwork Thu Oct 9 15:11:20 2008
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Chris Mason
X-Patchwork-Id: 3598
Return-Path:
X-Original-To: patchwork-incoming@ozlabs.org
Delivered-To: patchwork-incoming@ozlabs.org
Received: from vger.kernel.org (vger.kernel.org [209.132.176.167])
	by ozlabs.org (Postfix) with ESMTP id 74407DE161
	for ; Fri, 10 Oct 2008 02:12:45 +1100 (EST)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1754092AbYJIPMm (ORCPT ); Thu, 9 Oct 2008 11:12:42 -0400
Received: (majordomo@vger.kernel.org) by vger.kernel.org
	id S1756151AbYJIPMl (ORCPT ); Thu, 9 Oct 2008 11:12:41 -0400
Received: from rgminet01.oracle.com ([148.87.113.118]:50777 "EHLO
	rgminet01.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1754092AbYJIPMk (ORCPT );
	Thu, 9 Oct 2008 11:12:40 -0400
Received: from agmgw1.us.oracle.com (agmgw1.us.oracle.com [152.68.180.212])
	by rgminet01.oracle.com (Switch-3.2.4/Switch-3.1.6) with ESMTP
	id m99FBhc8008497; Thu, 9 Oct 2008 09:11:46 -0600
Received: from acsmt704.oracle.com (acsmt704.oracle.com [141.146.40.82])
	by agmgw1.us.oracle.com (Switch-3.2.0/Switch-3.2.0) with ESMTP
	id m99FBcuN000940; Thu, 9 Oct 2008 09:11:38 -0600
Received: from [192.168.1.15] (/72.225.43.119) by default (Oracle Beehive
	Gateway v4.0) with ESMTP ; Thu, 09 Oct 2008 15:11:37 +0000
Subject: Re: [PATCH] Improve buffered streaming write ordering
From: Chris Mason
To: Dave Chinner
Cc: "Aneesh Kumar K.V", Andrew Morton, linux-kernel, linux-fsdevel, ext4,
	Christoph Hellwig
In-Reply-To: <20081002234309.GH30001@disturbed>
References: <1222886451.9158.34.camel@think.oraclecorp.com>
	<20081001215239.ee2ae63f.akpm@linux-foundation.org>
	<1222950054.6745.18.camel@think.oraclecorp.com>
	<20081002181856.GB29613@skywalker>
	<20081002234309.GH30001@disturbed>
Date: Thu, 09 Oct 2008 11:11:20 -0400
Message-Id: <1223565080.14090.28.camel@think.oraclecorp.com>
Mime-Version: 1.0
X-Mailer: Evolution 2.22.2
X-Brightmail-Tracker: AAAAAQAAAAI=
X-Whitelist: TRUE
Sender: linux-ext4-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-ext4@vger.kernel.org

On Fri, 2008-10-03 at 09:43 +1000, Dave Chinner wrote:
> On Thu, Oct 02, 2008 at 11:48:56PM +0530, Aneesh Kumar K.V wrote:
> > On Thu, Oct 02, 2008 at 08:20:54AM -0400, Chris Mason wrote:
> > > On Wed, 2008-10-01 at 21:52 -0700, Andrew Morton wrote:
> > > For a 4.5GB streaming buffered write, this printk inside
> > > ext4_da_writepage shows up 37,2429 times in /var/log/messages.
> > >
> >
> > Part of that can happen due to shrink_page_list -> pageout -> writepage
> > call back with lots of unallocated buffer_heads(blocks).
>
> Quite frankly, a simple streaming buffered write should *never*
> trigger writeback from the LRU in memory reclaim. That indicates
> that some feedback loop has broken down and we are not cleaning
> pages fast enough or perhaps in the correct order. Page reclaim in
> this case should be reclaiming clean pages (those that have already
> been written back), not writing back random dirty pages.

Here are some go-faster stripes for the XFS buffered writeback. This
patch has a lot of debatable features to it, but the idea is to show
which knobs are slowing us down today.

The first change is to avoid calling balance_dirty_pages_ratelimited on
every page. When we know we're doing a largeish write it makes more
sense to balance things less often, so the patch batches the calls to
roughly one per 256 pages. This might just mean our ratelimit_pages
magic value is too small.

The second change makes XFS bump wbc->nr_to_write by a factor of four
(suggested by Christoph), which probably makes delalloc go in bigger
chunks.

On unpatched kernels, XFS does streaming writes to my 4 drive array at
around 205MB/s. With the patch below, I come in at 326MB/s. O_DIRECT
runs at 330MB/s, so that's pretty good.

With just the nr_to_write change, I get around 315MB/s.
With just the balance_dirty_pages_ratelimited_nr change, I get around
240MB/s.

-chris

---
diff --git a/fs/xfs/linux-2.6/xfs_aops.c b/fs/xfs/linux-2.6/xfs_aops.c
index a44d68e..c72bd54 100644
--- a/fs/xfs/linux-2.6/xfs_aops.c
+++ b/fs/xfs/linux-2.6/xfs_aops.c
@@ -944,6 +944,9 @@ xfs_page_state_convert(
 	int			trylock = 0;
 	int			all_bh = unmapped;
 
+
+	wbc->nr_to_write *= 4;
+
 	if (startio) {
 		if (wbc->sync_mode == WB_SYNC_NONE && wbc->nonblocking)
 			trylock |= BMAPI_TRYLOCK;
diff --git a/mm/filemap.c b/mm/filemap.c
index 876bc59..b6c26e3 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2389,6 +2389,7 @@ static ssize_t generic_perform_write(struct file *file,
 	long status = 0;
 	ssize_t written = 0;
 	unsigned int flags = 0;
+	unsigned long nr = 0;
 
 	/*
 	 * Copies from kernel address space cannot fail (NFSD is a big user).
@@ -2460,11 +2461,17 @@ again:
 		}
 		pos += copied;
 		written += copied;
-
-		balance_dirty_pages_ratelimited(mapping);
+		nr++;
+		if (nr > 256) {
+			balance_dirty_pages_ratelimited_nr(mapping, nr);
+			nr = 0;
+		}
 
 	} while (iov_iter_count(i));
 
+	if (nr)
+		balance_dirty_pages_ratelimited_nr(mapping, nr);
+
 	return written ? written : status;
 }
 
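
For anyone who wants to poke at the batching threshold without booting
a kernel, here is a rough userspace sketch of the counting logic in the
mm/filemap.c hunk. It is not kernel code: fake_balance_nr(), struct
balance_stats and simulate_write() are made-up names for illustration,
the stub only counts calls instead of throttling anything, and it
assumes one loop iteration per page, which the real loop only
approximates.

```c
#include <assert.h>

/* Hypothetical bookkeeping: how often the balance hook ran and how many
 * pages it was told about in total. */
struct balance_stats {
	unsigned long calls;
	unsigned long pages;
};

/* Hypothetical stand-in for balance_dirty_pages_ratelimited_nr().
 * It does no throttling, it only records the call. */
static void fake_balance_nr(struct balance_stats *st, unsigned long nr)
{
	st->calls++;
	st->pages += nr;
}

/* Same counting scheme as the patched loop: accumulate pages in nr,
 * report them once nr exceeds batch, then report any leftover at the
 * end. Assumes one iteration per page written. */
static struct balance_stats simulate_write(unsigned long npages,
					   unsigned long batch)
{
	struct balance_stats st = { 0, 0 };
	unsigned long nr = 0;
	unsigned long i;

	for (i = 0; i < npages; i++) {
		nr++;
		if (nr > batch) {
			fake_balance_nr(&st, nr);
			nr = 0;
		}
	}

	/* Final flush, so the tail of the write is still accounted for. */
	if (nr)
		fake_balance_nr(&st, nr);

	/* Every page must have been reported exactly once. */
	assert(st.pages == npages);
	return st;
}
```

Because the test is nr > 256 rather than nr >= 256, each full batch
reports 257 pages. simulate_write(1000, 256) therefore makes four calls
(three batches of 257 plus a tail of 229), while a batch of 0 falls back
to one call per page, like the unpatched loop.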