From patchwork Tue May 10 14:35:09 2011
From: Mel Gorman
Date: Tue, 10 May 2011 15:35:09 +0100
To: James Bottomley
Cc: Mel Gorman, Jan Kara, colin.king@canonical.com, Chris Mason,
	linux-fsdevel, linux-mm, linux-kernel, linux-ext4
Subject: Re: [BUG] fatal hang untarring 90GB file, possibly writeback related.
Message-ID: <20110510143509.GD4146@suse.de>
In-Reply-To: <1305036064.6737.8.camel@mulgrave.site>

On Tue, May 10, 2011 at 09:01:04AM -0500, James Bottomley wrote:
> On Tue, 2011-05-10 at 11:21 +0100, Mel Gorman wrote:
> > I really would like to hear if the fix makes a big difference or
> > if we need to consider forcing SLUB high-order allocations to bail
> > at the first sign of trouble (e.g. by masking out __GFP_WAIT in
> > allocate_slab). Even with the fix applied, kswapd might be waking
> > up less, but processes will still be getting stalled in direct
> > compaction and direct reclaim, so it would still be jittery.
>
> "the fix" being this
>
> https://lkml.org/lkml/2011/3/5/121

Drop this for the moment. It was a long shot at best, and there is
little evidence the problem is in this area.

I'm attaching two patches. The first is the NO_KSWAPD one to stop
kswapd being woken up by SLUB's speculative high-order allocations.
The second one is more drastic and prevents SLUB entering direct
reclaim or compaction at all; it applies on top of patch 1. These are
both untested and, I'm afraid, a bit rushed as well :(

From 59220aa310c0ba60afee29eeea1e602f4a374c60 Mon Sep 17 00:00:00 2001
From: Mel Gorman
Date: Tue, 10 May 2011 15:30:20 +0100
Subject: [PATCH] mm: slub: Do not take expensive steps for SLUB's
 speculative high-order allocations

To avoid locking and per-cpu overhead, SLUB optimistically uses
high-order allocations and falls back to lower orders if they fail.
However, by simply trying to allocate, the caller can enter compaction
or reclaim - both of which are likely to cost more than the benefit
of using high-order pages in SLUB.

On a desktop system, two users have reported that the system locks up,
with kswapd using large amounts of CPU; using SLAB instead of SLUB
makes the problem go away.

This patch prevents SLUB taking any expensive steps when trying to
satisfy high-order allocations. Instead, it is expected to fall back
to smaller orders more aggressively.

Not-signed-off-yet: Mel Gorman
---
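Note for reviewers: the page_alloc.c hunk is needed because, once
patch 2 clears __GFP_WAIT, SLUB's speculative attempts would look like
atomic allocations, and gfp_to_alloc_flags() would grant them
ALLOC_HARDER access to the page reserves. The guard keeps
__GFP_NO_KSWAPD allocations out of that path. A simplified sketch of
the post-patch logic (not the exact upstream code; the rt-task branch
and other details are omitted):

	static int gfp_to_alloc_flags(gfp_t gfp_mask)
	{
		int alloc_flags = ALLOC_WMARK_MIN | ALLOC_CPUSET;
		const gfp_t wait = gfp_mask & __GFP_WAIT;
		const gfp_t wakes_kswapd = !(gfp_mask & __GFP_NO_KSWAPD);

		/* __GFP_HIGH maps directly onto ALLOC_HIGH */
		alloc_flags |= (__force int) (gfp_mask & __GFP_HIGH);

		/*
		 * Only callers that cannot sleep AND are allowed to wake
		 * kswapd are treated as genuinely atomic. SLUB's
		 * speculative attempts carry __GFP_NO_KSWAPD, so they no
		 * longer dig into the reserves via ALLOC_HARDER.
		 */
		if (!wait && wakes_kswapd &&
		    !(gfp_mask & __GFP_NOMEMALLOC))
			alloc_flags |= ALLOC_HARDER;

		return alloc_flags;
	}
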
 mm/page_alloc.c |    3 ++-
 mm/slub.c       |    3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 9f8a97b..f160d93 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1972,6 +1972,7 @@ gfp_to_alloc_flags(gfp_t gfp_mask)
 {
 	int alloc_flags = ALLOC_WMARK_MIN | ALLOC_CPUSET;
 	const gfp_t wait = gfp_mask & __GFP_WAIT;
+	const gfp_t wakes_kswapd = !(gfp_mask & __GFP_NO_KSWAPD);
 
 	/* __GFP_HIGH is assumed to be the same as ALLOC_HIGH to save a branch. */
 	BUILD_BUG_ON(__GFP_HIGH != (__force gfp_t) ALLOC_HIGH);
@@ -1984,7 +1985,7 @@ gfp_to_alloc_flags(gfp_t gfp_mask)
 	 */
 	alloc_flags |= (__force int) (gfp_mask & __GFP_HIGH);
 
-	if (!wait) {
+	if (!wait && wakes_kswapd) {
 		/*
 		 * Not worth trying to allocate harder for
 		 * __GFP_NOMEMALLOC even if it can't schedule.
diff --git a/mm/slub.c b/mm/slub.c
index 98c358d..1071723 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1170,7 +1170,8 @@ static struct page *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
 	 * Let the initial higher-order allocation fail under memory pressure
 	 * so we fall-back to the minimum order allocation.
 	 */
-	alloc_gfp = (flags | __GFP_NOWARN | __GFP_NORETRY | __GFP_NO_KSWAPD) & ~__GFP_NOFAIL;
+	alloc_gfp = (flags | __GFP_NOWARN | __GFP_NORETRY | __GFP_NO_KSWAPD) &
+			~(__GFP_NOFAIL | __GFP_WAIT);
 
 	page = alloc_slab_page(alloc_gfp, node, oo);
 	if (unlikely(!page)) {
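For reference, the behaviour the two patches combine to produce: the
initial higher-order attempt in allocate_slab() becomes fully atomic -
no kswapd wakeup, no direct reclaim, no compaction - and only the
minimum-order fallback, which keeps the caller's original flags, can
still block. A simplified sketch of that flow (debug hooks, statistics
and other details omitted):

	static struct page *allocate_slab(struct kmem_cache *s,
					  gfp_t flags, int node)
	{
		struct page *page;
		struct kmem_cache_order_objects oo = s->oo;
		gfp_t alloc_gfp;

		/*
		 * Speculative high-order attempt: no warnings, no
		 * retries, no kswapd wakeup and, after this patch, no
		 * reclaim or compaction since __GFP_WAIT is cleared.
		 */
		alloc_gfp = (flags | __GFP_NOWARN | __GFP_NORETRY |
				__GFP_NO_KSWAPD) &
				~(__GFP_NOFAIL | __GFP_WAIT);
		page = alloc_slab_page(alloc_gfp, node, oo);
		if (unlikely(!page)) {
			/*
			 * Fragmentation may have defeated the high-order
			 * attempt; retry at the minimum order with the
			 * caller's original flags, which may still sleep.
			 */
			oo = s->min;
			page = alloc_slab_page(flags, node, oo);
		}

		return page;
	}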