From patchwork Tue Nov 28 22:58:36 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Michael Bringmann X-Patchwork-Id: 842354 Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Received: from lists.ozlabs.org (lists.ozlabs.org [IPv6:2401:3900:2:1::3]) (using TLSv1.2 with cipher ADH-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 3ymfKs5S5kz9sDB for ; Wed, 29 Nov 2017 10:01:21 +1100 (AEDT) Received: from lists.ozlabs.org (lists.ozlabs.org [IPv6:2401:3900:2:1::3]) by lists.ozlabs.org (Postfix) with ESMTP id 3ymfKs4VqtzDrvP for ; Wed, 29 Nov 2017 10:01:21 +1100 (AEDT) X-Original-To: linuxppc-dev@lists.ozlabs.org Delivered-To: linuxppc-dev@lists.ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=linux.vnet.ibm.com (client-ip=148.163.156.1; helo=mx0a-001b2d01.pphosted.com; envelope-from=mwb@linux.vnet.ibm.com; receiver=) Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by lists.ozlabs.org (Postfix) with ESMTPS id 3ymfH66Fh7zDrLt for ; Wed, 29 Nov 2017 09:58:58 +1100 (AEDT) Received: from pps.filterd (m0098410.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.16.0.21/8.16.0.21) with SMTP id vASMreD0058124 for ; Tue, 28 Nov 2017 17:58:56 -0500 Received: from e13.ny.us.ibm.com (e13.ny.us.ibm.com [129.33.205.203]) by mx0a-001b2d01.pphosted.com with ESMTP id 2ehbb2yrhh-1 (version=TLSv1.2 cipher=AES256-SHA bits=256 verify=NOT) for ; Tue, 28 Nov 2017 17:58:55 -0500 Received: from localhost by e13.ny.us.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Tue, 28 Nov 2017 17:58:54 -0500 Received: from b01cxnp22033.gho.pok.ibm.com (9.57.198.23) by e13.ny.us.ibm.com (146.89.104.200) with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted; Tue, 28 Nov 2017 17:58:52 -0500 Received: from b01ledav003.gho.pok.ibm.com (b01ledav003.gho.pok.ibm.com [9.57.199.108]) by b01cxnp22033.gho.pok.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id vASMwqlt41156760; Tue, 28 Nov 2017 22:58:52 GMT Received: from b01ledav003.gho.pok.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id B5C85B204D; Tue, 28 Nov 2017 17:56:00 -0500 (EST) Received: from oc5000245537.ibm.com (unknown [9.53.92.243]) by b01ledav003.gho.pok.ibm.com (Postfix) with ESMTP id 48E3BB2046; Tue, 28 Nov 2017 17:56:00 -0500 (EST) To: linuxppc-dev@lists.ozlabs.org From: Michael Bringmann Subject: [PATCH V8 1/3] powerpc/nodes: Ensure enough nodes avail for operations In-Reply-To: <79bd4c96-2381-c9b1-8222-b350e069655b@linux.vnet.ibm.com> Organization: IBM Linux Technology Center Date: Tue, 28 Nov 2017 16:58:36 -0600 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.4.0 MIME-Version: 1.0 Content-Language: en-US X-TM-AS-GCONF: 00 x-cbid: 17112822-0008-0000-0000-000002A77977 X-IBM-SpamModules-Scores: X-IBM-SpamModules-Versions: BY=3.00008121; HX=3.00000241; KW=3.00000007; PH=3.00000004; SC=3.00000242; SDB=6.00952622; UDB=6.00481259; IPR=6.00732703; BA=6.00005719; NDR=6.00000001; ZLA=6.00000005; ZF=6.00000009; ZB=6.00000000; ZP=6.00000000; ZH=6.00000000; ZU=6.00000002; MB=3.00018236; XFM=3.00000015; UTC=2017-11-28 22:58:53 X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused x-cbparentid: 17112822-0009-0000-0000-000037711D5A Message-Id: <4faec177-a704-1404-c7ae-7f064153de2b@linux.vnet.ibm.com> X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10432:, , definitions=2017-11-28_12:, , signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 priorityscore=1501 malwarescore=0 suspectscore=1 phishscore=0 bulkscore=0 spamscore=0 clxscore=1015 lowpriorityscore=0 impostorscore=0 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1709140000 definitions=main-1711280305 X-BeenThere: linuxppc-dev@lists.ozlabs.org X-Mailman-Version: 2.1.24 Precedence: list List-Id: Linux on PowerPC Developers Mail List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Nathan Fontenot , Michael Bringmann Errors-To: linuxppc-dev-bounces+patchwork-incoming=ozlabs.org@lists.ozlabs.org Sender: "Linuxppc-dev" On powerpc systems which allow 'hot-add' of CPU or memory resources, it may occur that the new resources are to be inserted into nodes that were not used for these resources at bootup. In the kernel, any node that is used must be defined and initialized. These empty nodes may occur when, * Dedicated vs. shared resources. Shared resources require information such as the VPHN hcall for CPU assignment to nodes. Associativity decisions made based on dedicated resource rules, such as associativity properties in the device tree, may vary from decisions made using the values returned by the VPHN hcall. * memoryless nodes at boot. Nodes need to be defined as 'possible' at boot for operation with other code modules. Previously, the powerpc code would limit the set of possible nodes to those which have memory assigned at boot, and were thus online. Subsequent add/remove of CPUs or memory would only work with this subset of possible nodes. * memoryless nodes with CPUs at boot. Due to the previous restriction on nodes, nodes that had CPUs but no memory were being collapsed into other nodes that did have memory at boot. In practice this meant that the node assignment presented by the runtime kernel differed from the affinity and associativity attributes presented by the device tree or VPHN hcalls. Nodes that might be known to the pHyp were not 'possible' in the runtime kernel because they did not have memory at boot. This patch ensures that sufficient nodes are defined to support configuration requirements after boot, as well as at boot. This patch set fixes a couple of problems. * Nodes known to powerpc to be memoryless at boot, but to have CPUs in them are allowed to be 'possible' and 'online'. Memory allocations for those nodes are taken from another node that does have memory until and if memory is hot-added to the node. * Nodes which have no resources assigned at boot, but which may still be referenced subsequently by affinity or associativity attributes, are kept in the list of 'possible' nodes for powerpc. Hot-add of memory or CPUs to the system can reference these nodes and bring them online instead of redirecting to one of the set of nodes that were known to have memory at boot. This patch extracts the value of the lowest domain level (number of allocable resources) from the device tree property "ibm,max-associativity-domains" to use as the maximum number of nodes to setup as possibly available in the system. This new setting will override the instruction, nodes_and(node_possible_map, node_possible_map, node_online_map); presently seen in the function arch/powerpc/mm/numa.c:initmem_init(). If the "ibm,max-associativity-domains" property is not present at boot, no operation will be performed to define or enable additional nodes, or enable the above 'nodes_and()'. Signed-off-by: Michael Bringmann Reviewed-by: Nathan Fontenot --- Changes in V8: -- Remove unneeded pr_info() statement --- arch/powerpc/mm/numa.c | 37 ++++++++++++++++++++++++++++++++++--- 1 file changed, 34 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c index adb6364f..735e3fd 100644 --- a/arch/powerpc/mm/numa.c +++ b/arch/powerpc/mm/numa.c @@ -892,6 +892,34 @@ static void __init setup_node_data(int nid, u64 start_pfn, u64 end_pfn) NODE_DATA(nid)->node_spanned_pages = spanned_pages; } +static void __init find_possible_nodes(void) +{ + struct device_node *rtas; + u32 numnodes, i; + + if (min_common_depth <= 0) + return; + + rtas = of_find_node_by_path("/rtas"); + if (!rtas) + return; + + if (of_property_read_u32_index(rtas, + "ibm,max-associativity-domains", + min_common_depth, &numnodes)) + goto out; + + for (i = 0; i < numnodes; i++) { + if (!node_possible(i)) { + setup_node_data(i, 0, 0); + node_set(i, node_possible_map); + } + } + +out: + of_node_put(rtas); +} + void __init initmem_init(void) { int nid, cpu; @@ -905,12 +933,15 @@ void __init initmem_init(void) memblock_dump_all(); /* - * Reduce the possible NUMA nodes to the online NUMA nodes, - * since we do not support node hotplug. This ensures that we - * lower the maximum NUMA node ID to what is actually present. + * Modify the set of possible NUMA nodes to reflect information + * available about the set of online nodes, and the set of nodes + * that we expect to make use of for this platform's affinity + * calculations. */ nodes_and(node_possible_map, node_possible_map, node_online_map); + find_possible_nodes(); + for_each_online_node(nid) { unsigned long start_pfn, end_pfn;