From patchwork Tue Jul 15 21:29:31 2014 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kamal Mostafa X-Patchwork-Id: 370225 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from huckleberry.canonical.com (huckleberry.canonical.com [91.189.94.19]) by ozlabs.org (Postfix) with ESMTP id 515B3140080; Wed, 16 Jul 2014 07:33:41 +1000 (EST) Received: from localhost ([127.0.0.1] helo=huckleberry.canonical.com) by huckleberry.canonical.com with esmtp (Exim 4.76) (envelope-from ) id 1X7ALy-00005u-Iw; Tue, 15 Jul 2014 21:33:38 +0000 Received: from youngberry.canonical.com ([91.189.89.112]) by huckleberry.canonical.com with esmtp (Exim 4.76) (envelope-from ) id 1X7AI1-0005wm-Iv for kernel-team@lists.ubuntu.com; Tue, 15 Jul 2014 21:29:33 +0000 Received: from c-67-160-228-185.hsd1.ca.comcast.net ([67.160.228.185] helo=fourier) by youngberry.canonical.com with esmtpsa (TLS1.0:DHE_RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1X7AI1-0006tp-CB; Tue, 15 Jul 2014 21:29:33 +0000 Received: from kamal by fourier with local (Exim 4.82) (envelope-from ) id 1X7AHz-0002fS-CC; Tue, 15 Jul 2014 14:29:31 -0700 From: Kamal Mostafa To: Mel Gorman Subject: [3.13.y.z extended stable] Patch "mm: vmscan: do not throttle based on pfmemalloc reserves if node has no ZONE_NORMAL" has been added to staging queue Date: Tue, 15 Jul 2014 14:29:31 -0700 Message-Id: <1405459771-10223-1-git-send-email-kamal@canonical.com> X-Mailer: git-send-email 1.9.1 X-Extended-Stable: 3.13 Cc: Andrew Morton , Linus Torvalds , Kamal Mostafa , kernel-team@lists.ubuntu.com X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.14 Precedence: list List-Id: Kernel team discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: kernel-team-bounces@lists.ubuntu.com This is a note to let you know that I have just added a patch titled mm: vmscan: do not throttle based on pfmemalloc reserves if node has no ZONE_NORMAL to the linux-3.13.y-queue branch of the 3.13.y.z extended stable tree which can be found at: http://kernel.ubuntu.com/git?p=ubuntu/linux.git;a=shortlog;h=refs/heads/linux-3.13.y-queue This patch is scheduled to be released in version 3.13.11.5. If you, or anyone else, feels it should not be added to this tree, please reply to this email. For more information about the 3.13.y.z tree, see https://wiki.ubuntu.com/Kernel/Dev/ExtendedStable Thanks. -Kamal ------ From 7cfdc0c7d34154d3e3df96861066b9159282c5e9 Mon Sep 17 00:00:00 2001 From: Mel Gorman Date: Wed, 4 Jun 2014 16:07:35 -0700 Subject: mm: vmscan: do not throttle based on pfmemalloc reserves if node has no ZONE_NORMAL commit 675becce15f320337499bc1a9356260409a5ba29 upstream. throttle_direct_reclaim() is meant to trigger during swap-over-network during which the min watermark is treated as a pfmemalloc reserve. It throttes on the first node in the zonelist but this is flawed. The user-visible impact is that a process running on CPU whose local memory node has no ZONE_NORMAL will stall for prolonged periods of time, possibly indefintely. This is due to throttle_direct_reclaim thinking the pfmemalloc reserves are depleted when in fact they don't exist on that node. On a NUMA machine running a 32-bit kernel (I know) allocation requests from CPUs on node 1 would detect no pfmemalloc reserves and the process gets throttled. This patch adjusts throttling of direct reclaim to throttle based on the first node in the zonelist that has a usable ZONE_NORMAL or lower zone. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Mel Gorman Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Kamal Mostafa --- mm/vmscan.c | 43 +++++++++++++++++++++++++++++++++++++------ 1 file changed, 37 insertions(+), 6 deletions(-) -- 1.9.1 diff --git a/mm/vmscan.c b/mm/vmscan.c index 05e6095..d873137 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -2484,10 +2484,17 @@ static bool pfmemalloc_watermark_ok(pg_data_t *pgdat) for (i = 0; i <= ZONE_NORMAL; i++) { zone = &pgdat->node_zones[i]; + if (!populated_zone(zone)) + continue; + pfmemalloc_reserve += min_wmark_pages(zone); free_pages += zone_page_state(zone, NR_FREE_PAGES); } + /* If there are no reserves (unexpected config) then do not throttle */ + if (!pfmemalloc_reserve) + return true; + wmark_ok = free_pages > pfmemalloc_reserve / 2; /* kswapd must be awake if processes are being throttled */ @@ -2512,9 +2519,9 @@ static bool pfmemalloc_watermark_ok(pg_data_t *pgdat) static bool throttle_direct_reclaim(gfp_t gfp_mask, struct zonelist *zonelist, nodemask_t *nodemask) { + struct zoneref *z; struct zone *zone; - int high_zoneidx = gfp_zone(gfp_mask); - pg_data_t *pgdat; + pg_data_t *pgdat = NULL; /* * Kernel threads should not be throttled as they may be indirectly @@ -2533,10 +2540,34 @@ static bool throttle_direct_reclaim(gfp_t gfp_mask, struct zonelist *zonelist, if (fatal_signal_pending(current)) goto out; - /* Check if the pfmemalloc reserves are ok */ - first_zones_zonelist(zonelist, high_zoneidx, NULL, &zone); - pgdat = zone->zone_pgdat; - if (pfmemalloc_watermark_ok(pgdat)) + /* + * Check if the pfmemalloc reserves are ok by finding the first node + * with a usable ZONE_NORMAL or lower zone. The expectation is that + * GFP_KERNEL will be required for allocating network buffers when + * swapping over the network so ZONE_HIGHMEM is unusable. + * + * Throttling is based on the first usable node and throttled processes + * wait on a queue until kswapd makes progress and wakes them. There + * is an affinity then between processes waking up and where reclaim + * progress has been made assuming the process wakes on the same node. + * More importantly, processes running on remote nodes will not compete + * for remote pfmemalloc reserves and processes on different nodes + * should make reasonable progress. + */ + for_each_zone_zonelist_nodemask(zone, z, zonelist, + gfp_mask, nodemask) { + if (zone_idx(zone) > ZONE_NORMAL) + continue; + + /* Throttle based on the first usable node */ + pgdat = zone->zone_pgdat; + if (pfmemalloc_watermark_ok(pgdat)) + goto out; + break; + } + + /* If no zone was usable by the allocation flags then do not throttle */ + if (!pgdat) goto out; /* Account for the throttling */