diff mbox series

[v2,1/1] powerpc/numa: Online a node if PHB is attached.

Message ID 20240517142531.3273464-3-nilay@linux.ibm.com (mailing list archive)
State New
Headers show
Series powerpc/numa: Make cpu/memory less numa-node online | expand

Checks

Context Check Description
snowpatch_ozlabs/github-powerpc_ppctests success Successfully ran 8 jobs.
snowpatch_ozlabs/github-powerpc_selftests success Successfully ran 8 jobs.
snowpatch_ozlabs/github-powerpc_sparse success Successfully ran 4 jobs.
snowpatch_ozlabs/github-powerpc_clang success Successfully ran 6 jobs.
snowpatch_ozlabs/github-powerpc_kernel_qemu success Successfully ran 23 jobs.

Commit Message

Nilay Shroff May 17, 2024, 2:25 p.m. UTC
In the current design, a NUMA node is onlined only if that node
is attached to CPU or memory. With this design, if any PCI/IO
device is found to be attached to a NUMA node which is not online,
then the NUMA node id of the corresponding PCI/IO device is set
to NUMA_NO_NODE (-1). This may hurt the performance of a PCIe
device whose assigned node id is -1, because in that case the
distance between two nodes can no longer be calculated accurately.
In particular, a multi-controller NVMe PCIe disk cannot calculate
the node distance correctly if a PCIe NVMe controller is attached
to a PCI host bridge whose NUMA node id is NUMA_NO_NODE. Fix this
by ensuring that a CPU-/memory-less NUMA node is onlined if it is
attached to a PCI host bridge.

Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
---
 arch/powerpc/mm/numa.c                     | 14 +++++++++++++-
 arch/powerpc/platforms/pseries/pci_dlpar.c | 14 ++++++++++++++
 2 files changed, 27 insertions(+), 1 deletion(-)

Comments

Srikar Dronamraju May 20, 2024, 5:16 p.m. UTC | #1
* Nilay Shroff <nilay@linux.ibm.com> [2024-05-17 19:55:23]:

Hi Nilay,

> [...]

Looks good to me.

Reviewed-by: Srikar Dronamraju <srikar@linux.ibm.com>
Krishna Kumar May 24, 2024, 1:31 p.m. UTC | #2
On 5/17/24 19:55, Nilay Shroff wrote:
> [...]
>
> Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>

Thanks for fixing this. Looks good to me.

Reviewed-by: Krishna Kumar <krishnak@linux.ibm.com>

Patch

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index a490724e84ad..aa89899f0c1a 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -896,7 +896,7 @@  static int __init numa_setup_drmem_lmb(struct drmem_lmb *lmb,
 
 static int __init parse_numa_properties(void)
 {
-	struct device_node *memory;
+	struct device_node *memory, *pci;
 	int default_nid = 0;
 	unsigned long i;
 	const __be32 *associativity;
@@ -1010,6 +1010,18 @@  static int __init parse_numa_properties(void)
 			goto new_range;
 	}
 
+	for_each_node_by_name(pci, "pci") {
+		int nid = NUMA_NO_NODE;
+
+		associativity = of_get_associativity(pci);
+		if (associativity) {
+			nid = associativity_to_nid(associativity);
+			initialize_form1_numa_distance(associativity);
+		}
+		if (likely(nid >= 0) && !node_online(nid))
+			node_set_online(nid);
+	}
+
 	/*
 	 * Now do the same thing for each MEMBLOCK listed in the
 	 * ibm,dynamic-memory property in the
diff --git a/arch/powerpc/platforms/pseries/pci_dlpar.c b/arch/powerpc/platforms/pseries/pci_dlpar.c
index 4448386268d9..52e2623a741d 100644
--- a/arch/powerpc/platforms/pseries/pci_dlpar.c
+++ b/arch/powerpc/platforms/pseries/pci_dlpar.c
@@ -11,6 +11,7 @@ 
 
 #include <linux/pci.h>
 #include <linux/export.h>
+#include <linux/node.h>
 #include <asm/pci-bridge.h>
 #include <asm/ppc-pci.h>
 #include <asm/firmware.h>
@@ -21,9 +22,22 @@ 
 struct pci_controller *init_phb_dynamic(struct device_node *dn)
 {
 	struct pci_controller *phb;
+	int nid;
 
 	pr_debug("PCI: Initializing new hotplug PHB %pOF\n", dn);
 
+	nid = of_node_to_nid(dn);
+	if (likely(nid >= 0)) {
+		if (!node_online(nid)) {
+			if (__register_one_node(nid)) {
+				pr_err("PCI: Failed to register node %d\n", nid);
+			} else {
+				update_numa_distance(dn);
+				node_set_online(nid);
+			}
+		}
+	}
+
 	phb = pcibios_alloc_controller(dn);
 	if (!phb)
 		return NULL;