
[1/3] powerpc/pseries: Simplify cpu readd to use drc_index

Message ID 20190516023706.50118-1-tyreld@linux.ibm.com (mailing list archive)
State Rejected
Series [1/3] powerpc/pseries: Simplify cpu readd to use drc_index

Checks

Context Check Description
snowpatch_ozlabs/apply_patch success Successfully applied on branch next (8150a153c013aa2dd1ffae43370b89ac1347a7fb)
snowpatch_ozlabs/checkpatch warning total: 0 errors, 1 warnings, 0 checks, 39 lines checked

Commit Message

Tyrel Datwyler May 16, 2019, 2:37 a.m. UTC
The current dlpar_cpu_readd() takes in a cpu_id and uses that to look up
the cpu's device_node so that we can get at the ibm,my-drc-index
property. The only user of cpu readd is an OF notifier callback. This
callback already has a reference to the device_node and therefore can
retrieve the drc_index from the device_node.

This patch simplifies dlpar_cpu_readd() to take a drc_index directly and
does away with an unnecessary device_node lookup.

Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/topology.h          |  2 +-
 arch/powerpc/mm/numa.c                       |  6 +++---
 arch/powerpc/platforms/pseries/hotplug-cpu.c | 10 +---------
 3 files changed, 5 insertions(+), 13 deletions(-)

Comments

Nathan Lynch May 16, 2019, 7:17 p.m. UTC | #1
Tyrel Datwyler <tyreld@linux.ibm.com> writes:
> The current dlpar_cpu_readd() takes in a cpu_id and uses that to look up
> the cpu's device_node so that we can get at the ibm,my-drc-index
> property. The only user of cpu readd is an OF notifier callback. This
> callback already has a reference to the device_node and therefore can
> retrieve the drc_index from the device_node.

dlpar_cpu_readd is a hack to try to change the CPU-node relationship at
runtime without destabilizing the system. It doesn't accomplish that and
it should just be removed (and I'm working on that).
Tyrel Datwyler May 17, 2019, 10:58 p.m. UTC | #2
On 05/16/2019 12:17 PM, Nathan Lynch wrote:
> Tyrel Datwyler <tyreld@linux.ibm.com> writes:
>> The current dlpar_cpu_readd() takes in a cpu_id and uses that to look up
>> the cpu's device_node so that we can get at the ibm,my-drc-index
>> property. The only user of cpu readd is an OF notifier callback. This
>> callback already has a reference to the device_node and therefore can
>> retrieve the drc_index from the device_node.
> 
> dlpar_cpu_readd is a hack to try to change the CPU-node relationship at
> runtime without destabilizing the system. It doesn't accomplish that and
> it should just be removed (and I'm working on that).
> 

I will politely disagree. We've done exactly this from userspace for years. My
experience still suggests that memory affinity is the problem area, and that the
work to push this all into the kernel originally was poorly tested.

-Tyrel
Nathan Lynch May 20, 2019, 3:01 p.m. UTC | #3
Tyrel Datwyler <tyreld@linux.vnet.ibm.com> writes:

> On 05/16/2019 12:17 PM, Nathan Lynch wrote:
>> Tyrel Datwyler <tyreld@linux.ibm.com> writes:
>>> The current dlpar_cpu_readd() takes in a cpu_id and uses that to look up
>>> the cpu's device_node so that we can get at the ibm,my-drc-index
>>> property. The only user of cpu readd is an OF notifier callback. This
>>> callback already has a reference to the device_node and therefore can
>>> retrieve the drc_index from the device_node.
>> 
>> dlpar_cpu_readd is a hack to try to change the CPU-node relationship at
>> runtime without destabilizing the system. It doesn't accomplish that and
>> it should just be removed (and I'm working on that).
>> 
>
> I will politely disagree. We've done exactly this from userspace for
> years. My experience still suggests that memory affinity is the
> problem area, and that the work to push this all into the kernel
> originally was poorly tested.

Kernel implementation details aside, how do you change the cpu-node
relationship at runtime without breaking NUMA-aware applications? Is
this not a fundamental issue to address before adding code like this?
Tyrel Datwyler June 3, 2019, 12:11 a.m. UTC | #4
On 05/20/2019 08:01 AM, Nathan Lynch wrote:
> Tyrel Datwyler <tyreld@linux.vnet.ibm.com> writes:
> 
>> On 05/16/2019 12:17 PM, Nathan Lynch wrote:
>>> Tyrel Datwyler <tyreld@linux.ibm.com> writes:
>>>> The current dlpar_cpu_readd() takes in a cpu_id and uses that to look up
>>>> the cpu's device_node so that we can get at the ibm,my-drc-index
>>>> property. The only user of cpu readd is an OF notifier callback. This
>>>> callback already has a reference to the device_node and therefore can
>>>> retrieve the drc_index from the device_node.
>>>
>>> dlpar_cpu_readd is a hack to try to change the CPU-node relationship at
>>> runtime without destabilizing the system. It doesn't accomplish that and
>>> it should just be removed (and I'm working on that).
>>>
>>
>> I will politely disagree. We've done exactly this from userspace for
>> years. My experience still suggests that memory affinity is the
>> problem area, and that the work to push this all into the kernel
>> originally was poorly tested.
> 
> Kernel implementation details aside, how do you change the cpu-node
> relationship at runtime without breaking NUMA-aware applications? Is
> this not a fundamental issue to address before adding code like this?
> 

If that is the concern, then hotplug in general already breaks them. Take, for
example, the removal of a faulty processor and then the addition of a new one.
It is quite possible that the new processor is in a different NUMA node. Keep in
mind that in this scenario the new processor and its threads get the same logical
cpu ids as the faulty processor we just removed.

Now we have to ask who is right and who is wrong. In this case the kernel data
structures reflect the correct NUMA topology. However, did the NUMA-aware
application or libnuma make an assumption that specific sets of logical cpu ids
are always in the same NUMA node?

-Tyrel
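
To make the assumption under discussion concrete, here is a minimal userspace
sketch of an application that caches the logical-cpu-to-node mapping once at
startup. Only the libnuma calls (numa_available(), numa_num_configured_cpus(),
numa_node_of_cpu()) are real interfaces; the caching scheme and names are
hypothetical.

#include <numa.h>
#include <stdio.h>
#include <stdlib.h>

static int *cached_cpu_node;	/* indexed by logical cpu id */
static int ncpus;

/* Record each logical cpu's node once; never refreshed afterwards. */
static void cache_topology_at_startup(void)
{
	int cpu;

	if (numa_available() < 0) {
		fprintf(stderr, "libnuma not available\n");
		exit(1);
	}

	ncpus = numa_num_configured_cpus();
	cached_cpu_node = calloc(ncpus, sizeof(int));
	if (!cached_cpu_node)
		exit(1);

	for (cpu = 0; cpu < ncpus; cpu++)
		cached_cpu_node[cpu] = numa_node_of_cpu(cpu);
}

int main(void)
{
	cache_topology_at_startup();

	/*
	 * If logical cpu 4 is later removed and re-added backed by a
	 * processor in a different node, cached_cpu_node[4] is stale,
	 * and nothing tells the application to refresh it.
	 */
	if (ncpus > 4)
		printf("cpu 4 was on node %d at startup\n", cached_cpu_node[4]);
	return 0;
}

Built against libnuma (-lnuma), the cached values are whatever the kernel
reported at startup; a later remove/add that reuses the same logical ids on a
different node invalidates them silently from the application's point of view.
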
Nathan Lynch June 4, 2019, 5:21 p.m. UTC | #5
Tyrel Datwyler <tyreld@linux.vnet.ibm.com> writes:
> On 05/20/2019 08:01 AM, Nathan Lynch wrote:
>> Kernel implementation details aside, how do you change the cpu-node
>> relationship at runtime without breaking NUMA-aware applications? Is
>> this not a fundamental issue to address before adding code like this?
>> 
>
> If that is the concern, then hotplug in general already breaks
> them. Take, for example, the removal of a faulty processor and then
> the addition of a new one. It is quite possible that the new
> processor is in a different NUMA node. Keep in mind that in this
> scenario the new processor and its threads get the same logical cpu
> ids as the faulty processor we just removed.

Yes, the problem is re-use of a logical CPU id with a node id that
differs from the one it was initially assigned, and there are several
ways to get into that situation on this platform. We probably need to be
more careful in how we allocate a spot in the CPU maps for a newly-added
processor. I believe the algorithm is simple first-fit right now, and it
doesn't take into account prior NUMA relationships.
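
As a rough illustration of that first-fit behavior, here is a simplified
sketch; the real slot assignment happens in pseries_add_processor() and is
more involved, and the function below is hypothetical, though the cpumask
helpers it uses are real kernel APIs. Note that nothing in it consults the
node a slot was previously associated with.

#include <linux/cpumask.h>
#include <linux/errno.h>
#include <linux/gfp.h>

/* Hypothetical: hand out the first free logical cpu ids, first-fit. */
static int assign_logical_cpu_ids(int nthreads)
{
	cpumask_var_t candidates;
	int cpu, assigned = 0;

	if (!alloc_cpumask_var(&candidates, GFP_KERNEL))
		return -ENOMEM;

	/* Candidate slots: possible but not currently present. */
	cpumask_xor(candidates, cpu_possible_mask, cpu_present_mask);

	for_each_cpu(cpu, candidates) {
		set_cpu_present(cpu, true);	/* first free slot wins */
		if (++assigned == nthreads)
			break;
	}

	free_cpumask_var(candidates);
	return assigned == nthreads ? 0 : -ENOSPC;
}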


> Now we have to ask who is right and who is wrong. In this case the
> kernel data structures reflect the correct NUMA topology. However,
> did the NUMA-aware application or libnuma make an assumption that
> specific sets of logical cpu ids are always in the same NUMA node?

Yes, and that assumption is widespread because people tend to develop on
an architecture where this kind of stuff doesn't happen (at least not
yet).

And I don't really agree that the current behavior reflects what is
actually going on. When Linux running in a PowerVM LPAR receives a
notification to change the NUMA properties of a processor at runtime,
it's because the platform has changed the physical characteristics of
the partition. I.e. you're now using a different physical processor,
with different relationships to the other resources in the system. Even
if it didn't destabilize the kernel (by changing the result of
cpu_to_node() when various subsystems assume it will be static),
continuing to use the same logical CPU ids on the new processor obscures
what has actually happened. And we have developers who have told us that
this behavior - changing the logical cpu<->node relationship at runtime
- is something their existing NUMA-aware applications cannot handle.
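
To illustrate why a changing cpu_to_node() is a problem inside the kernel as
well, consider a hypothetical driver fragment (the structure and names are
made up; the APIs are real) that records each cpu's node once at init and
places its per-cpu buffers accordingly:

#include <linux/cpumask.h>
#include <linux/errno.h>
#include <linux/mm.h>
#include <linux/percpu.h>
#include <linux/slab.h>
#include <linux/topology.h>

struct example_buf {
	void *data;
	int home_node;		/* node recorded at allocation time */
};

static struct example_buf __percpu *example_bufs;

static int example_init(void)
{
	int cpu;

	example_bufs = alloc_percpu(struct example_buf);
	if (!example_bufs)
		return -ENOMEM;

	for_each_online_cpu(cpu) {
		struct example_buf *b = per_cpu_ptr(example_bufs, cpu);

		/* Snapshot taken once; never revisited if the mapping changes. */
		b->home_node = cpu_to_node(cpu);
		b->data = kmalloc_node(PAGE_SIZE, GFP_KERNEL, b->home_node);
		if (!b->data)
			return -ENOMEM;	/* error unwinding omitted for brevity */
	}
	return 0;
}

If the node behind one of those cpus later changes at runtime, every buffer
placed this way is now on the wrong node, and the driver has no notification
path through which to fix it.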

Patch

diff --git a/arch/powerpc/include/asm/topology.h b/arch/powerpc/include/asm/topology.h
index f85e2b01c3df..c906d9ec9013 100644
--- a/arch/powerpc/include/asm/topology.h
+++ b/arch/powerpc/include/asm/topology.h
@@ -133,7 +133,7 @@ static inline void shared_proc_topology_init(void) {}
 #define topology_core_cpumask(cpu)	(per_cpu(cpu_core_map, cpu))
 #define topology_core_id(cpu)		(cpu_to_core_id(cpu))
 
-int dlpar_cpu_readd(int cpu);
+int dlpar_cpu_readd(u32 drc_index);
 #endif
 #endif
 
diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 57e64273cb33..40c0b6da12c2 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -1479,9 +1479,9 @@ static int dt_update_callback(struct notifier_block *nb,
 	case OF_RECONFIG_UPDATE_PROPERTY:
 		if (of_node_is_type(update->dn, "cpu") &&
 		    !of_prop_cmp(update->prop->name, "ibm,associativity")) {
-			u32 core_id;
-			of_property_read_u32(update->dn, "reg", &core_id);
-			rc = dlpar_cpu_readd(core_id);
+			u32 drc_index;
+			of_property_read_u32(update->dn, "ibm,my-drc-index", &drc_index);
+			rc = dlpar_cpu_readd(drc_index);
 			rc = NOTIFY_OK;
 		}
 		break;
diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c
index 97feb6e79f1a..2dfa9416ce54 100644
--- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
+++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
@@ -802,18 +802,10 @@ static int dlpar_cpu_add_by_count(u32 cpus_to_add)
 	return rc;
 }
 
-int dlpar_cpu_readd(int cpu)
+int dlpar_cpu_readd(u32 drc_index)
 {
-	struct device_node *dn;
-	struct device *dev;
-	u32 drc_index;
 	int rc;
 
-	dev = get_cpu_device(cpu);
-	dn = dev->of_node;
-
-	rc = of_property_read_u32(dn, "ibm,my-drc-index", &drc_index);
-
 	rc = dlpar_cpu_remove_by_index(drc_index);
 	if (!rc)
 		rc = dlpar_cpu_add(drc_index);