From patchwork Mon Aug 16 12:32:55 2010 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Stefan Bader X-Patchwork-Id: 61805 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from chlorine.canonical.com (chlorine.canonical.com [91.189.94.204]) by ozlabs.org (Postfix) with ESMTP id 95DC4B6F0E for ; Mon, 16 Aug 2010 22:33:06 +1000 (EST) Received: from localhost ([127.0.0.1] helo=chlorine.canonical.com) by chlorine.canonical.com with esmtp (Exim 4.69) (envelope-from ) id 1Okys7-0000Nd-UZ; Mon, 16 Aug 2010 13:33:00 +0100 Received: from adelie.canonical.com ([91.189.90.139]) by chlorine.canonical.com with esmtp (Exim 4.69) (envelope-from ) id 1Okys6-0000NX-7i for kernel-team@lists.ubuntu.com; Mon, 16 Aug 2010 13:32:58 +0100 Received: from hutte.canonical.com ([91.189.90.181]) by adelie.canonical.com with esmtp (Exim 4.69 #1 (Debian)) id 1Okys4-0001DY-Ui; Mon, 16 Aug 2010 13:32:56 +0100 Received: from p5b2e6fb6.dip.t-dialin.net ([91.46.111.182] helo=[192.168.2.121]) by hutte.canonical.com with esmtpsa (TLS-1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.69) (envelope-from ) id 1Okys4-0005Il-OS; Mon, 16 Aug 2010 13:32:56 +0100 Message-ID: <4C692FF7.7050702@canonical.com> Date: Mon, 16 Aug 2010 14:32:55 +0200 From: Stefan Bader User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.11) Gecko/20100713 Lightning/1.0b1 Thunderbird/3.0.6 MIME-Version: 1.0 To: Jens Axboe Subject: Re: [stable] [PATCH 00/16]: Fix writeback regressions in 2.6.32 References: <1281432544-20831-1-git-send-email-stefan.bader@canonical.com> <20100811195931.GI23566@kroah.com> <4C630414.7050400@kernel.dk> <4C636A79.7020704@fusionio.com> In-Reply-To: <4C636A79.7020704@fusionio.com> X-Enigmail-Version: 1.0.1 Cc: Greg KH , kernel-team@lists.ubuntu.com, Jan Kara , stable@kernel.org X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.9 Precedence: list List-Id: Kernel team discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: kernel-team-bounces@lists.ubuntu.com Errors-To: kernel-team-bounces@lists.ubuntu.com On 08/12/2010 05:28 AM, Jens Axboe wrote: > On 08/11/2010 04:12 PM, Jens Axboe wrote: >> On 08/11/2010 03:59 PM, Greg KH wrote: >>> On Tue, Aug 10, 2010 at 11:28:48AM +0200, Stefan Bader wrote: >>>> Impact: 2.6.32 is facing some serious performance regression which are visible >>>> plainly when unmounting filesystems as async writeback degrades to sync in >>>> that case. But there seem to be other less visible problems as well. >>>> I believe this set of patches also would be benefitial to go into upstream >>>> stable. >>>> >>>> Fix: This series of patches was picked and backported when needed to apply >>>> to 2.6.32.y (there is another set for 2.6.34.y[1]). It probably does not >>>> solve all currently known problems but testing has shown considerable >>>> improvements. At the beginning of the set there are some patches that were >>>> picked to make the later patches applicable. Only one of them is a bit more >>>> intrusive the others rather trivial. >>>> >>>> Testcase: Unmounting a fs that still has larger amounts of data to write back >>>> to disk will take noticably longer time to complete (IIRC might be 30min). >>>> Also general responsiveness was reported to improve. Currently there are no >>>> known regressions reported when testing this backport. >>> >>> Jens, what do you think about this series? >> >> I think it is scary :-) >> >> But seriously, we need to do something about this umount regression and >> there's likely no way around this series. Doing the full backport is >> the best option. Stefan, to what extent did you test this backport? > > So I discussed this with Jan tonight, and a better solution may be > something similar to my first attempt at fixing this patch earlier. > Basically just pass in the info on whether this is done from umount or > not. The bug here is that umount already holds the sb sem when doing > WB_SYNC_NONE writeback, so we skip all dirty inodes for that pass and > end up doing everything as WB_SYNC_ALL. If we pass in this lock > information, then wb_writeback() knows when to skip pinning the sb. > > The below patch is totally untested. Stefan, care to give it a spin with > the test case? This would be a lot less intrusive than the full > backport, which is primarily backporting cleanups and optimizations that > we really need not put in 2.6.32-stable if at all avoidable. > While applying the patch for testing I saw that stable 2.6.32 carries commit b78a38dca6e04634ddc718e315712b45abcf92fd Author: Eric Sandeen Date: Wed Dec 23 07:57:07 2009 -0500 fs-writeback: Add helper function to start writeback if idle commit 17bd55d037a02b04d9119511cfd1a4b985d20f63 upstream. which needs an additional change in writeback_inodes_sb_if_idle(). Making an educated guess I changed that call into the "not locked" variant (patch attached). Is that correct? -Stefan Acked-by: Jan Kara From e8f179b9dc92fe91d7379a3ac330a57a2ef9fc87 Mon Sep 17 00:00:00 2001 From: Jens Axboe Date: Fri, 13 Aug 2010 14:45:16 +0200 Subject: [PATCH] writeback: Alternate fix for umount regression Signed-off-by: Stefan Bader --- fs/fs-writeback.c | 18 +++++++++++------- fs/sync.c | 2 +- fs/ubifs/budget.c | 2 +- include/linux/backing-dev.h | 2 +- include/linux/writeback.h | 4 +++- mm/page-writeback.c | 2 +- 6 files changed, 18 insertions(+), 12 deletions(-) diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c index 460e5b7..bc87976 100644 --- a/fs/fs-writeback.c +++ b/fs/fs-writeback.c @@ -45,6 +45,7 @@ struct wb_writeback_args { int for_kupdate:1; int range_cyclic:1; int for_background:1; + int locked:1; }; /* @@ -252,13 +253,14 @@ static void bdi_sync_writeback(struct backing_dev_info *bdi, * */ void bdi_start_writeback(struct backing_dev_info *bdi, struct super_block *sb, - long nr_pages) + long nr_pages, int locked) { struct wb_writeback_args args = { .sb = sb, .sync_mode = WB_SYNC_NONE, .nr_pages = nr_pages, .range_cyclic = 1, + .locked = locked, }; /* @@ -585,7 +587,7 @@ static int pin_sb_for_writeback(struct writeback_control *wbc, /* * Caller must already hold the ref for this */ - if (wbc->sync_mode == WB_SYNC_ALL) { + if (wbc->sync_mode == WB_SYNC_ALL || wbc->locked) { WARN_ON(!rwsem_is_locked(&sb->s_umount)); return 0; } @@ -758,6 +760,7 @@ static long wb_writeback(struct bdi_writeback *wb, .older_than_this = NULL, .for_kupdate = args->for_kupdate, .range_cyclic = args->range_cyclic, + .locked = args->locked, }; unsigned long oldest_jif; long wrote = 0; @@ -912,7 +915,7 @@ long wb_do_writeback(struct bdi_writeback *wb, int force_wait) * If this isn't a data integrity operation, just notify * that we have seen this work and we are now starting it. */ - if (args.sync_mode == WB_SYNC_NONE) + if (args.sync_mode == WB_SYNC_NONE && !args.locked) wb_clear_pending(wb, work); wrote += wb_writeback(wb, &args); @@ -921,7 +924,7 @@ long wb_do_writeback(struct bdi_writeback *wb, int force_wait) * This is a data integrity writeback, so only do the * notification when we have completed the work. */ - if (args.sync_mode == WB_SYNC_ALL) + if (args.sync_mode == WB_SYNC_ALL || args.locked) wb_clear_pending(wb, work); } @@ -1206,13 +1209,14 @@ static void wait_sb_inodes(struct super_block *sb) /** * writeback_inodes_sb - writeback dirty inodes from given super_block * @sb: the superblock + * @locked: sb already pinned * * Start writeback on some inodes on this super_block. No guarantees are made * on how many (if any) will be written, and this function does not wait * for IO completion of submitted IO. The number of pages submitted is * returned. */ -void writeback_inodes_sb(struct super_block *sb) +void writeback_inodes_sb(struct super_block *sb, int locked) { unsigned long nr_dirty = global_page_state(NR_FILE_DIRTY); unsigned long nr_unstable = global_page_state(NR_UNSTABLE_NFS); @@ -1221,7 +1225,7 @@ void writeback_inodes_sb(struct super_block *sb) nr_to_write = nr_dirty + nr_unstable + (inodes_stat.nr_inodes - inodes_stat.nr_unused); - bdi_start_writeback(sb->s_bdi, sb, nr_to_write); + bdi_start_writeback(sb->s_bdi, sb, nr_to_write, locked); } EXPORT_SYMBOL(writeback_inodes_sb); @@ -1235,7 +1239,7 @@ EXPORT_SYMBOL(writeback_inodes_sb); int writeback_inodes_sb_if_idle(struct super_block *sb) { if (!writeback_in_progress(sb->s_bdi)) { - writeback_inodes_sb(sb); + writeback_inodes_sb(sb, 0); return 1; } else return 0; diff --git a/fs/sync.c b/fs/sync.c index d104591..41d543e 100644 --- a/fs/sync.c +++ b/fs/sync.c @@ -37,7 +37,7 @@ static int __sync_filesystem(struct super_block *sb, int wait) /* Avoid doing twice syncing and cache pruning for quota sync */ if (!wait) { writeout_quota_sb(sb, -1); - writeback_inodes_sb(sb); + writeback_inodes_sb(sb, 1); } else { sync_quota_sb(sb, -1); sync_inodes_sb(sb); diff --git a/fs/ubifs/budget.c b/fs/ubifs/budget.c index 076ca50..d16a8eb 100644 --- a/fs/ubifs/budget.c +++ b/fs/ubifs/budget.c @@ -62,7 +62,7 @@ */ static void shrink_liability(struct ubifs_info *c, int nr_to_write) { - writeback_inodes_sb(c->vfs_sb); + writeback_inodes_sb(c->vfs_sb, 0); } /** diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h index b449e73..ce997ba 100644 --- a/include/linux/backing-dev.h +++ b/include/linux/backing-dev.h @@ -102,7 +102,7 @@ int bdi_register(struct backing_dev_info *bdi, struct device *parent, int bdi_register_dev(struct backing_dev_info *bdi, dev_t dev); void bdi_unregister(struct backing_dev_info *bdi); void bdi_start_writeback(struct backing_dev_info *bdi, struct super_block *sb, - long nr_pages); + long nr_pages, int locked); int bdi_writeback_task(struct bdi_writeback *wb); int bdi_has_dirty_io(struct backing_dev_info *bdi); diff --git a/include/linux/writeback.h b/include/linux/writeback.h index dc52482..4578dc1 100644 --- a/include/linux/writeback.h +++ b/include/linux/writeback.h @@ -52,6 +52,8 @@ struct writeback_control { unsigned for_reclaim:1; /* Invoked from the page allocator */ unsigned range_cyclic:1; /* range_start is cyclic */ unsigned more_io:1; /* more io to be dispatched */ + unsigned locked:1; /* sb pinned for WB_SYNC_NONE */ + /* * write_cache_pages() won't update wbc->nr_to_write and * mapping->writeback_index if no_nrwrite_index_update @@ -68,7 +70,7 @@ struct writeback_control { */ struct bdi_writeback; int inode_wait(void *); -void writeback_inodes_sb(struct super_block *); +void writeback_inodes_sb(struct super_block *, int); int writeback_inodes_sb_if_idle(struct super_block *); void sync_inodes_sb(struct super_block *); void writeback_inodes_wbc(struct writeback_control *wbc); diff --git a/mm/page-writeback.c b/mm/page-writeback.c index 2c5d792..e73dbb7 100644 --- a/mm/page-writeback.c +++ b/mm/page-writeback.c @@ -597,7 +597,7 @@ static void balance_dirty_pages(struct address_space *mapping, (!laptop_mode && ((global_page_state(NR_FILE_DIRTY) + global_page_state(NR_UNSTABLE_NFS)) > background_thresh))) - bdi_start_writeback(bdi, NULL, 0); + bdi_start_writeback(bdi, NULL, 0, 0); } void set_page_dirty_balance(struct page *page, int page_mkwrite) -- 1.7.0.4