Message ID | 1305247626.2575.111.camel@mulgrave.site |
---|---|
State | Not Applicable, archived |
On Thu, 2011-05-12 at 19:47 -0500, James Bottomley wrote:
> On Fri, 2011-05-13 at 00:15 +0200, Johannes Weiner wrote:
> > On Thu, May 12, 2011 at 05:04:41PM -0500, James Bottomley wrote:
> > > On Thu, 2011-05-12 at 15:04 -0500, James Bottomley wrote:
> > > > Confirmed, I'm afraid ... I can trigger the problem with all three
> > > > patches under PREEMPT. It's not a hang this time, it's just kswapd
> > > > taking 100% system time on 1 CPU and it won't calm down after I unload
> > > > the system.
> > >
> > > Just on a "if you don't know what's wrong poke about and see" basis, I
> > > sliced out all the complex logic in sleeping_prematurely() and, as far
> > > as I can tell, it cures the problem behaviour. I've loaded up the
> > > system, and taken the tar load generator through three runs without
> > > producing a spinning kswapd (this is PREEMPT). I'll try with a
> > > non-PREEMPT kernel shortly.
> > >
> > > What this seems to say is that there's a problem with the complex logic
> > > in sleeping_prematurely(). I'm pretty sure hacking up
> > > sleeping_prematurely() just to dump all the calculations is the wrong
> > > thing to do, but perhaps someone can see what the right thing is ...
> >
> > I think I see the problem: the boolean logic of sleeping_prematurely()
> > is odd. If it returns true, kswapd will keep running. So if
> > pgdat_balanced() returns true, kswapd should go to sleep.
> >
> > This?
>
> I was going to say this was a winner, but on the third untar run on
> non-PREEMPT, I hit the kswapd livelock. It's got much farther than
> previous attempts, which all hung on the first run, but I think the
> essential problem is still (at least on this machine) that
> sleeping_prematurely() is doing too much work for the wakeup storm that
> allocators are causing.
>
> Something that ratelimits the amount of time we spend in the watermark
> calculations, like the below (which incorporates your pgdat fix) seems
> to be much more stable (I've not run it for three full runs yet, but
> kswapd CPU time is way lower so far).

I've hammered it for several hours now with multiple loads; I can't seem
to break it (famous last words, of course).

James

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
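Johannes's point about the inverted return value can be checked with a small user-space model. This is a sketch, not kernel code: `premature_before_fix()`, `premature_after_fix()` and the `node_balanced` flag are hypothetical names standing in for `sleeping_prematurely()` and the result of `pgdat_balanced()`; the real function also scans every zone's watermarks to produce those inputs.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of the final decision in sleeping_prematurely().
 * Convention: returning true means "too early to sleep", so kswapd
 * keeps reclaiming. node_balanced stands in for pgdat_balanced();
 * all_zones_ok mirrors the order-0 watermark check.
 */

/* Before the fix: a balanced node makes kswapd keep running. */
static bool premature_before_fix(int order, bool node_balanced, bool all_zones_ok)
{
	assert(order >= 0); /* allocation orders are non-negative */
	if (order)
		return node_balanced;
	return !all_zones_ok;
}

/* After the fix: a balanced node lets kswapd go to sleep. */
static bool premature_after_fix(int order, bool node_balanced, bool all_zones_ok)
{
	assert(order >= 0);
	if (order)
		return !node_balanced;
	return !all_zones_ok;
}
```

Only the high-order branch changes; for order-0 requests both versions give the same answer.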
```diff
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 0665520..545250c 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2249,12 +2249,32 @@ static bool sleeping_prematurely(pg_data_t *pgdat, int order, long remaining,
 {
 	int i;
 	unsigned long balanced = 0;
-	bool all_zones_ok = true;
+	bool all_zones_ok = true, ret;
+	static int returned_true = 0;
+	static unsigned long prev_jiffies = 0;
+
 
 	/* If a direct reclaimer woke kswapd within HZ/10, it's premature */
 	if (remaining)
 		return true;
 
+	/* rate limit our entry to the watermark calculations */
+	if (time_after(prev_jiffies + HZ/10, jiffies)) {
+		/* previously returned false, do so again */
+		if (returned_true == 0)
+			return false;
+		/* or we've done the true calculation too many times */
+		if (returned_true++ > 10)
+			return false;
+
+		return true;
+	} else {
+		/* haven't been here for a while, reset the true count */
+		returned_true = 0;
+	}
+
+	prev_jiffies = jiffies;
+
 	/* Check the watermark levels */
 	for (i = 0; i < pgdat->nr_zones; i++) {
 		struct zone *zone = pgdat->node_zones + i;
@@ -2286,9 +2306,16 @@ static bool sleeping_prematurely(pg_data_t *pgdat, int order, long remaining,
 	 * must be balanced
 	 */
 	if (order)
-		return pgdat_balanced(pgdat, balanced, classzone_idx);
+		ret = !pgdat_balanced(pgdat, balanced, classzone_idx);
+	else
+		ret = !all_zones_ok;
+
+	if (ret)
+		returned_true++;
 	else
-		return !all_zones_ok;
+		returned_true = 0;
+
+	return ret;
 }
 
 /*
```
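The rate limiting added in the first hunk can be exercised outside the kernel. The following is a minimal user-space sketch under stated assumptions: `HZ` is fixed at 1000, `time_after()` is re-declared with the kernel's wrap-safe comparison (minus the kernel's type checking), and the hypothetical `struct kswapd_sleep_ratelimit` holds the state that the patch keeps in function-local statics. The watermark scan is replaced by a caller-supplied `scan_result`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define HZ 1000UL /* assumption: 1000 ticks per second for this simulation */

/* Wrap-safe "a is later than b", modelled on the kernel's time_after(). */
#define time_after(a, b) ((long)((b) - (a)) < 0)

/* Hypothetical container for the patch's static locals. */
struct kswapd_sleep_ratelimit {
	int returned_true;          /* consecutive "premature" answers */
	unsigned long prev_jiffies; /* tick of the last full watermark scan */
	int full_scans;             /* demo counter: how often the scan ran */
};

/*
 * Mirrors the patch's control flow. scan_result stands in for the
 * expensive watermark check, which only runs once the HZ/10 window
 * since the last full scan has expired.
 */
static bool rl_sleeping_prematurely(struct kswapd_sleep_ratelimit *s,
				    unsigned long jiffies, bool scan_result)
{
	bool ret;

	assert(s != NULL);

	/* Still inside the HZ/10 window: answer without scanning. */
	if (time_after(s->prev_jiffies + HZ / 10, jiffies)) {
		/* Last scan said "may sleep": keep saying so. */
		if (s->returned_true == 0)
			return false;
		/* Too many cached "premature" answers: let kswapd sleep. */
		if (s->returned_true++ > 10)
			return false;
		return true;
	}

	/* Window expired: reset the count and do a full scan. */
	s->returned_true = 0;
	s->prev_jiffies = jiffies;
	s->full_scans++;

	ret = scan_result;
	if (ret)
		s->returned_true++;
	else
		s->returned_true = 0;
	return ret;
}
```

With these numbers, a "premature" result from a full scan is repeated for at most ten further calls inside the window; after that the function answers false until the window expires, so kswapd gets a chance to sleep even under a wakeup storm. Note that because the patch uses `static` locals, this state is shared by every node rather than kept per `pg_data_t`.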