From patchwork Thu Mar 5 18:05:49 2015
X-Patchwork-Submitter: Nishanth Aravamudan
X-Patchwork-Id: 446844
Date: Thu, 5 Mar 2015 10:05:49 -0800
From: Nishanth Aravamudan
To: Michael Ellerman
Cc: Raghavendra K T, Paul Mackerras, Anton Blanchard, David Rientjes,
 Tejun Heo, linuxppc-dev@lists.ozlabs.org
Subject: [RFC PATCH] powerpc/numa: reset node_possible_map to only
 node_online_map
Message-ID: <20150305180549.GA29601@linux.vnet.ibm.com>

Raghu noticed an issue with excessive memory allocation on power with a
simple cgroup test: specifically, mem_cgroup_css_alloc -> for_each_node
-> alloc_mem_cgroup_per_zone_info() ends up blowing up the kmalloc-2048
slab (on the order of 200MB for 400 cgroup directories).
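For reference, the offending path has roughly the following shape (a
simplified sketch of the mm/memcontrol.c code named above; the memcg
argument and the error label here are paraphrased, not quoted):

	int node;

	/*
	 * for_each_node() walks node_possible_map, which defaults to
	 * NODE_MASK_ALL, so this loop runs once per *possible* node --
	 * 256 times with NODES_SHIFT = 8 -- for every cgroup created,
	 * even though only a handful of nodes are ever online.
	 */
	for_each_node(node) {
		/* one ~2KB per-node allocation per cgroup directory */
		if (alloc_mem_cgroup_per_zone_info(memcg, node))
			goto free_out;
	}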
The underlying issue is that NODES_SHIFT on power is 8 (256 NUMA nodes
possible), which sizes node_possible_map, and node_possible_map in turn
defines the iteration space of for_each_node. In practice, we never see
a system with 256 NUMA nodes, and in fact we do not support node
hotplug on power in the first place, so the nodes that are online when
we come up are the nodes that will be present for the lifetime of this
kernel.

So let's, at least, drop the NUMA possible map down to the online map
at runtime. This is similar to what x86 does in its initialization
routines.

One could alternatively use nodes_and(node_possible_map,
node_possible_map, node_online_map), but I think the cost of anding the
two masks will always be higher than zeroing the map and setting the
few bits that are actually online, as this patch does.

Signed-off-by: Nishanth Aravamudan

---
While looking at this, I noticed that nr_node_ids seems to be a
misnomer: it is not the number of nodes but (one past) the maximum node
ID. With sparse NUMA node numbering, you might have only two possible
NUMA nodes, yet, to make loops over node IDs work, nr_node_ids will be,
e.g., 17. Should it be renamed?

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 0257a7d659ef..24de29b3651b 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -958,9 +958,17 @@ void __init initmem_init(void)
 
 	memblock_dump_all();
 
+	/*
+	 * zero out the possible nodes after we parse the device-tree,
+	 * so that we lower the maximum NUMA node ID to what is actually
+	 * present.
+	 */
+	nodes_clear(node_possible_map);
+
 	for_each_online_node(nid) {
 		unsigned long start_pfn, end_pfn;
 
+		node_set(nid, node_possible_map);
 		get_pfn_range_for_nid(nid, &start_pfn, &end_pfn);
 		setup_node_data(nid, start_pfn, end_pfn);
 		sparse_memory_present_with_active_regions(nid);
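For comparison, the nodes_and() alternative mentioned above would be a
one-liner in the same function, rather than the clear-and-set pair in
the hunk (an illustrative sketch only, not part of this patch):

	/*
	 * Rejected alternative: mask the possible map down to the
	 * online map in a separate pass, instead of rebuilding it bit
	 * by bit inside the for_each_online_node() loop we already run.
	 */
	nodes_and(node_possible_map, node_possible_map, node_online_map);

Either form leaves node_possible_map equal to node_online_map by the
time initmem_init() returns; the patch uses clear-and-set because the
loop over the online nodes is already there.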