From patchwork Fri May 13 00:47:05 2011
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: James Bottomley
X-Patchwork-Id: 95408
Return-Path:
X-Original-To: patchwork-incoming@ozlabs.org
Delivered-To: patchwork-incoming@ozlabs.org
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id E2E1CB6F06 for ; Fri, 13 May 2011 10:47:21 +1000 (EST)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757886Ab1EMArQ (ORCPT ); Thu, 12 May 2011 20:47:16 -0400
Received: from bedivere.hansenpartnership.com ([66.63.167.143]:39047 "EHLO bedivere.hansenpartnership.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752789Ab1EMArQ (ORCPT ); Thu, 12 May 2011 20:47:16 -0400
Received: from localhost (localhost [127.0.0.1]) by bedivere.hansenpartnership.com (Postfix) with ESMTP id 9A1D48EE125; Thu, 12 May 2011 17:47:15 -0700 (PDT)
Received: from bedivere.hansenpartnership.com ([127.0.0.1]) by localhost (bedivere.hansenpartnership.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id pa4WVSuHTy6t; Thu, 12 May 2011 17:47:15 -0700 (PDT)
Received: from [192.168.2.10] (dagonet.hansenpartnership.com [76.243.235.53]) by bedivere.hansenpartnership.com (Postfix) with ESMTPSA id B6B508EE0EB; Thu, 12 May 2011 17:47:13 -0700 (PDT)
Subject: Re: [PATCH 3/3] mm: slub: Default slub_max_order to 0
From: James Bottomley
To: Johannes Weiner
Cc: Pekka Enberg, Christoph Lameter, Mel Gorman, Andrew Morton, Colin King, Raghavendra D Prabhu, Jan Kara, Chris Mason, Rik van Riel, linux-fsdevel, linux-mm, linux-kernel, linux-ext4
In-Reply-To: <20110512221506.GM16531@cmpxchg.org>
References: <1305127773-10570-4-git-send-email-mgorman@suse.de> <1305213359.2575.46.camel@mulgrave.site> <1305214993.2575.50.camel@mulgrave.site> <1305215742.27848.40.camel@jaguar> <1305225467.2575.66.camel@mulgrave.site> <1305229447.2575.71.camel@mulgrave.site>
	<1305230652.2575.72.camel@mulgrave.site> <1305237882.2575.100.camel@mulgrave.site> <20110512221506.GM16531@cmpxchg.org>
Date: Thu, 12 May 2011 19:47:05 -0500
Message-ID: <1305247626.2575.111.camel@mulgrave.site>
Mime-Version: 1.0
X-Mailer: Evolution 2.32.1
Sender: linux-ext4-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-ext4@vger.kernel.org

On Fri, 2011-05-13 at 00:15 +0200, Johannes Weiner wrote:
> On Thu, May 12, 2011 at 05:04:41PM -0500, James Bottomley wrote:
> > On Thu, 2011-05-12 at 15:04 -0500, James Bottomley wrote:
> > > Confirmed, I'm afraid ... I can trigger the problem with all three
> > > patches under PREEMPT. It's not a hang this time, it's just kswapd
> > > taking 100% system time on 1 CPU and it won't calm down after I unload
> > > the system.
> >
> > Just on a "if you don't know what's wrong poke about and see" basis, I
> > sliced out all the complex logic in sleeping_prematurely() and, as far
> > as I can tell, it cures the problem behaviour. I've loaded up the
> > system, and taken the tar load generator through three runs without
> > producing a spinning kswapd (this is PREEMPT). I'll try with a
> > non-PREEMPT kernel shortly.
> >
> > What this seems to say is that there's a problem with the complex logic
> > in sleeping_prematurely(). I'm pretty sure hacking up
> > sleeping_prematurely() just to dump all the calculations is the wrong
> > thing to do, but perhaps someone can see what the right thing is ...
>
> I think I see the problem: the boolean logic of sleeping_prematurely()
> is odd. If it returns true, kswapd will keep running. So if
> pgdat_balanced() returns true, kswapd should go to sleep.
>
> This?

I was going to say this was a winner, but on the third untar run on
non-PREEMPT, I hit the kswapd livelock.
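Johannes's point about the inverted return value can be modelled in a few lines of C. This is a minimal user-space sketch, not kernel code: model_sleeping_prematurely() and its boolean inputs are hypothetical stand-ins for the per-zone watermark scan and pgdat_balanced(). A true return means sleeping now would be premature, so kswapd keeps running.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of the sleeping_prematurely() return convention.
 * true  => sleeping would be premature, kswapd keeps running
 * false => kswapd may go to sleep
 *
 * node_balanced stands in for pgdat_balanced(); all_zones_ok stands in
 * for the order-0 per-zone watermark result.  Both are assumed inputs.
 */
bool model_sleeping_prematurely(int order, bool node_balanced,
				bool all_zones_ok)
{
	assert(order >= 0);

	if (order)
		/* The buggy form "return node_balanced;" keeps kswapd
		 * awake precisely when the node is already balanced. */
		return !node_balanced;

	return !all_zones_ok;
}
```

With the negation in place, a balanced node on a high-order wakeup yields false, so kswapd sleeps.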
It's got much farther than previous attempts, which all hung on the first
run, but I think the essential problem is still (at least on this machine)
that sleeping_prematurely() is doing too much work for the wakeup storm
that allocators are causing.

Something that rate-limits the amount of time we spend in the watermark
calculations, like the patch below (which incorporates your pgdat fix),
seems to be much more stable (I've not run it for three full runs yet, but
kswapd CPU time is way lower so far). The heuristic here is that if we're
asked to make the calculation more than ten times in 1/10 of a second, we
stop and sleep anyway.

James

---

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 0665520..545250c 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2249,12 +2249,32 @@ static bool sleeping_prematurely(pg_data_t *pgdat, int order, long remaining,
 {
 	int i;
 	unsigned long balanced = 0;
-	bool all_zones_ok = true;
+	bool all_zones_ok = true, ret;
+	static int returned_true = 0;
+	static unsigned long prev_jiffies = 0;
 
+
 	/* If a direct reclaimer woke kswapd within HZ/10, it's premature */
 	if (remaining)
 		return true;
 
+	/* rate limit our entry to the watermark calculations */
+	if (time_after(prev_jiffies + HZ/10, jiffies)) {
+		/* previously returned false, do so again */
+		if (returned_true == 0)
+			return false;
+		/* or we've done the true calculation too many times */
+		if (returned_true++ > 10)
+			return false;
+
+		return true;
+	} else {
+		/* haven't been here for a while, reset the true count */
+		returned_true = 0;
+	}
+
+	prev_jiffies = jiffies;
+
 	/* Check the watermark levels */
 	for (i = 0; i < pgdat->nr_zones; i++) {
 		struct zone *zone = pgdat->node_zones + i;
@@ -2286,9 +2306,16 @@ static bool sleeping_prematurely(pg_data_t *pgdat, int order, long remaining,
 	 * must be balanced
 	 */
 	if (order)
-		return pgdat_balanced(pgdat, balanced, classzone_idx);
+		ret = !pgdat_balanced(pgdat, balanced, classzone_idx);
+	else
+		ret = !all_zones_ok;
+
+	if (ret)
+		returned_true++;
 	else
-		return !all_zones_ok;
+		returned_true = 0;
+
+	return ret;
 }
 
 /*
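The rate-limiting heuristic in the patch above can be exercised outside the kernel with a small model. This is a sketch under stated assumptions, not the patch itself: struct premature_rl, rl_sleeping_prematurely() and MODEL_HZ are hypothetical names, 'now' stands in for jiffies, and the watermark result is passed in as a boolean (already inverted as in the fixed code) rather than computed. In the real patch the expensive scan is skipped inside the window; here the argument is simply ignored there.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed tick rate for the model; the kernel's HZ is a config option. */
#define MODEL_HZ 1000UL

/* Wrap-safe "a is later than b", the same test as the kernel's time_after(). */
static bool model_time_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* Hypothetical holder for the two function-level statics in the patch. */
struct premature_rl {
	int returned_true;          /* consecutive "keep running" answers */
	unsigned long prev_jiffies; /* time of the last full watermark check */
};

/* Returns true if kswapd should keep running, false if it should sleep. */
bool rl_sleeping_prematurely(struct premature_rl *rl, unsigned long now,
			     bool watermarks_say_premature)
{
	bool ret;

	assert(rl != NULL);

	if (model_time_after(rl->prev_jiffies + MODEL_HZ / 10, now)) {
		/* Within 1/10 s of the last full check: replay a "sleep". */
		if (rl->returned_true == 0)
			return false;
		/* Too many "keep running" answers in this window: sleep. */
		if (rl->returned_true++ > 10)
			return false;
		return true;
	}

	/* Window expired: reset the count and do the "full" check. */
	rl->returned_true = 0;
	rl->prev_jiffies = now;

	ret = watermarks_say_premature;
	if (ret)
		rl->returned_true++;
	else
		rl->returned_true = 0;
	return ret;
}
```

After a full check that says "premature", the model keeps kswapd running for ten more calls inside the same 1/10 s window and then forces it to sleep; a full check that says "balanced" makes every call in the window return false. One design consequence visible in the patch: because the counter and timestamp are function-level statics, a single copy is shared by every node's kswapd and updated without locking, which the explicit struct in the sketch makes obvious.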