From patchwork Wed Jul 20 21:47:14 2011
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Tim Gardner
X-Patchwork-Id: 105845
Return-Path:
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Received: from chlorine.canonical.com (chlorine.canonical.com [91.189.94.204])
 by ozlabs.org (Postfix) with ESMTP id 15F5CB6F75
 for ; Thu, 21 Jul 2011 07:48:27 +1000 (EST)
Received: from localhost ([127.0.0.1] helo=chlorine.canonical.com)
 by chlorine.canonical.com with esmtp (Exim 4.71) (envelope-from )
 id 1Qjecr-0005tO-1K; Wed, 20 Jul 2011 21:48:17 +0000
Received: from mail.tpi.com ([70.99.223.143])
 by chlorine.canonical.com with esmtp (Exim 4.71) (envelope-from )
 id 1Qjecl-0005oJ-CY for kernel-team@lists.ubuntu.com;
 Wed, 20 Jul 2011 21:48:11 +0000
Received: from sepang.rtg.net (mail.tpi.com [70.99.223.143])
 by mail.tpi.com (Postfix) with ESMTP id CA8BFA4762
 for ; Wed, 20 Jul 2011 14:47:43 -0700 (PDT)
Received: by sepang.rtg.net (Postfix, from userid 1000)
 id CFFF8F8AFC; Wed, 20 Jul 2011 15:47:14 -0600 (MDT)
To: kernel-team@lists.ubuntu.com
Subject: Natty SRU: vmscan: fix a livelock in kswapd
Message-Id: <20110720214714.CFFF8F8AFC@sepang.rtg.net>
Date: Wed, 20 Jul 2011 15:47:14 -0600 (MDT)
From: timg@tpi.com (Tim Gardner)
X-BeenThere: kernel-team@lists.ubuntu.com
X-Mailman-Version: 2.1.13
Precedence: list
List-Id: Kernel team discussions
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
MIME-Version: 1.0
Sender: kernel-team-bounces@lists.ubuntu.com
Errors-To: kernel-team-bounces@lists.ubuntu.com

From 64ba82a4b63a45fd0c102dd97ea81c59fc522b76 Mon Sep 17 00:00:00 2001
From: Shaohua Li
Date: Tue, 19 Jul 2011 08:49:26 -0700
Subject: [PATCH] vmscan: fix a livelock in kswapd

BugLink: http://bugs.launchpad.net/bugs/813797

I'm running a workload which triggers a lot of swap in a machine with 4
nodes.  After I kill the workload, I found a kswapd livelock.
Sometimes kswapd3 or kswapd2 are keeping running and I can't access
filesystem, but most memory is free.

This looks like a regression since commit 08951e545918c159 ("mm: vmscan:
correct check for kswapd sleeping in sleeping_prematurely").

Node 2 and 3 have only ZONE_NORMAL, but balance_pgdat() will return 0 for
classzone_idx.  The reason is end_zone in balance_pgdat() is 0 by default,
if all zones have watermark ok, end_zone will keep 0.

Later sleeping_prematurely() always returns true.  Because this is an
order 3 wakeup, and if classzone_idx is 0, both balanced_pages and
present_pages in pgdat_balanced() are 0.  We add a special case here.  If
a zone has no page, we think it's balanced.  This fixes the livelock.

Signed-off-by: Shaohua Li
Acked-by: Mel Gorman
Cc: Minchan Kim
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
(cherry picked from commit 4746efded84d7c5a9c8d64d4c6e814ff0cf9fb42)

Signed-off-by: Tim Gardner
Acked-by: Andy Whitcroft
Acked-by: Stefan Bader
---
 mm/vmscan.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 4b8b37c..1e0eefe 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2245,7 +2245,8 @@ static bool pgdat_balanced(pg_data_t *pgdat, unsigned long balanced_pages,
 	for (i = 0; i <= classzone_idx; i++)
 		present_pages += pgdat->node_zones[i].present_pages;
 
-	return balanced_pages > (present_pages >> 2);
+	/* A special case here: if zone has no page, we think it's balanced */
+	return balanced_pages >= (present_pages >> 2);
 }
 
 /* is kswapd sleeping prematurely? */
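
For readers who want to see the comparison in isolation, here is a minimal
userspace sketch of the check before and after the change. It is not kernel
code: struct pgdat_sketch, SKETCH_NR_ZONES, present_upto(), balanced_before(),
balanced_after() and check_empty_classzone() are hypothetical names made up
for this illustration, and placing the only populated zone at index 2 is an
assumption about the zone layout.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative zone count; the kernel's MAX_NR_ZONES depends on the config. */
#define SKETCH_NR_ZONES 4

/* Hypothetical, simplified stand-in for pg_data_t: per-zone page counts only. */
struct pgdat_sketch {
	unsigned long present_pages[SKETCH_NR_ZONES];
};

/* Sum present pages over zones 0..classzone_idx, as the loop in the hunk does. */
static unsigned long present_upto(const struct pgdat_sketch *pgdat,
				  int classzone_idx)
{
	unsigned long present_pages = 0;
	int i;

	for (i = 0; i <= classzone_idx; i++)
		present_pages += pgdat->present_pages[i];

	return present_pages;
}

/* Comparison before the patch: strictly more than a quarter must be balanced. */
bool balanced_before(const struct pgdat_sketch *pgdat,
		     unsigned long balanced_pages, int classzone_idx)
{
	return balanced_pages > (present_upto(pgdat, classzone_idx) >> 2);
}

/* Comparison after the patch: at least a quarter, so 0 of 0 counts as balanced. */
bool balanced_after(const struct pgdat_sketch *pgdat,
		    unsigned long balanced_pages, int classzone_idx)
{
	return balanced_pages >= (present_upto(pgdat, classzone_idx) >> 2);
}

/*
 * The case from the commit message: the node's only populated zone sits above
 * index 0 (index 2 is assumed here), but classzone_idx comes back as 0, so
 * both balanced_pages and the summed present_pages are 0.
 */
void check_empty_classzone(void)
{
	struct pgdat_sketch node = { .present_pages = { 0, 0, 262144, 0 } };

	assert(!balanced_before(&node, 0, 0));	/* 0 > 0 is false */
	assert(balanced_after(&node, 0, 0));	/* 0 >= 0 is true */
}
```

In this sketch the old comparison can never report an empty classzone range
as balanced, which matches the description of sleeping_prematurely() always
returning true. Note that ">=" also changes the result when balanced_pages is
exactly a quarter of present_pages, not only in the empty case.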