[RFC] powerpc/numa: Use VPHN based node ID information on shared processor LPARs

Message ID 1444813335-4009-1-git-send-email-khandual@linux.vnet.ibm.com (mailing list archive)
State Changes Requested

Commit Message

Anshuman Khandual Oct. 14, 2015, 9:02 a.m. UTC
On shared processor LPARs, H_HOME_NODE_ASSOCIATIVITY hcall provides the
dynamic virtual-physical mapping for any given processor. Currently we
use VPHN node ID information only after getting either a PRRN or a VPHN
event. But during boot time inside the function numa_setup_cpu, we still
query the OF device tree for the node ID value, which might differ from
what the H_HOME_NODE_ASSOCIATIVITY hcall returns. If no PRRN or VPHN
event arrives after boot, all node-cpu mappings remain incorrect
thereafter.

With this proposed change, numa_setup_cpu will try to override the OF
device tree fetched node ID information with H_HOME_NODE_ASSOCIATIVITY
hcall fetched node ID value. Right now the shared processor property of
the LPAR cannot be queried at that point, because VPA initialization
happens after numa_setup_cpu during boot. So the initmem_init call has
been moved after ppc_md.setup_arch inside setup_arch.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
---
Before the change:
# numactl -H
available: 2 nodes (0,3)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
node 0 size: 0 MB
node 0 free: 0 MB
node 3 cpus:
node 3 size: 16315 MB
node 3 free: 15716 MB
node distances:
node   0   3 
  0:  10  20 
  3:  20  10 
 
After the change:
# numactl -H
available: 2 nodes (0,3)
node 0 cpus:
node 0 size: 0 MB
node 0 free: 0 MB
node 3 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
node 3 size: 16315 MB
node 3 free: 15537 MB
node distances:
node   0   3 
  0:  10  20 
  3:  20  10

 arch/powerpc/kernel/setup_64.c |  2 +-
 arch/powerpc/mm/numa.c         | 27 ++++++++++++++++++++++++---
 2 files changed, 25 insertions(+), 4 deletions(-)

Comments

Michael Ellerman Oct. 14, 2015, 9:19 a.m. UTC | #1
On Wed, 2015-10-14 at 14:32 +0530, Anshuman Khandual wrote:
> On shared processor LPARs, H_HOME_NODE_ASSOCIATIVITY hcall provides the
> dynamic virtual-physical mapping for any given processor. Currently we
> use VPHN node ID information only after getting either a PRRN or a VPHN
> event. But during boot time inside the function numa_setup_cpu, we still
> query the OF device tree for the node ID value which might be different
> than what can be fetched from the H_HOME_NODE_ASSOCIATIVITY hcall. In a
> scenario where there are no PRRN or VPHN event after boot, all node-cpu
> mapping will remain incorrect there after.
> 
> With this proposed change, numa_setup_cpu will try to override the OF
> device tree fetched node ID information with H_HOME_NODE_ASSOCIATIVITY
> hcall fetched node ID value. Right now shared processor property of the
> LPAR cannot be queried as VPA inializaion happens after numa_setup_cpu
> during boot time. So initmem_init function has been moved after ppc_md.
> setup_arch inside setup_arch during boot.

I would be *very* reluctant to change the order of initmem_init() vs
setup_arch().

At a minimum you'd need to go through every setup_arch() implementation and
carefully determine if the ordering of what it does matters vs initmem_init().
And then you'd need to test on every affected platform.

So I suggest you think of a different way to do it if at all possible.

cheers
Anshuman Khandual Oct. 14, 2015, 10:13 a.m. UTC | #2
On 10/14/2015 02:49 PM, Michael Ellerman wrote:
> On Wed, 2015-10-14 at 14:32 +0530, Anshuman Khandual wrote:
>> On shared processor LPARs, H_HOME_NODE_ASSOCIATIVITY hcall provides the
>> dynamic virtual-physical mapping for any given processor. Currently we
>> use VPHN node ID information only after getting either a PRRN or a VPHN
>> event. But during boot time inside the function numa_setup_cpu, we still
>> query the OF device tree for the node ID value which might be different
>> than what can be fetched from the H_HOME_NODE_ASSOCIATIVITY hcall. In a
>> scenario where there are no PRRN or VPHN event after boot, all node-cpu
>> mapping will remain incorrect there after.
>>
>> With this proposed change, numa_setup_cpu will try to override the OF
>> device tree fetched node ID information with H_HOME_NODE_ASSOCIATIVITY
>> hcall fetched node ID value. Right now shared processor property of the
>> LPAR cannot be queried as VPA inializaion happens after numa_setup_cpu
>> during boot time. So initmem_init function has been moved after ppc_md.
>> setup_arch inside setup_arch during boot.
> 
> I would be *very* reluctant to change the order of initmem_init() vs
> setup_arch().
> 
> At a minimum you'd need to go through every setup_arch() implementation and
> carefully determine if the ordering of what it does matters vs initmem_init().
> And then you'd need to test on every affected platform.
> 
> So I suggest you think of a different way to do it if at all possible.

vpa_init() is called inside pSeries_setup_arch, which is
ppc_md.setup_arch for the platform. It is called directly for the boot
cpu and through smp_init_pseries_xics for the other cpus on the system.
I am not sure why vpa_init() is called from XICS init, though.

If we can move all these vpa_init() calls from pSeries_setup_arch
to initmem_init, just before calling numa_setup_cpu, the VPA area
would be initialized when we need it during boot. Will look in
this direction.
Michael Ellerman Oct. 16, 2015, 2:24 a.m. UTC | #3
On Wed, 2015-10-14 at 15:43 +0530, Anshuman Khandual wrote:
> On 10/14/2015 02:49 PM, Michael Ellerman wrote:
> > On Wed, 2015-10-14 at 14:32 +0530, Anshuman Khandual wrote:
> >> On shared processor LPARs, H_HOME_NODE_ASSOCIATIVITY hcall provides the
> >> dynamic virtual-physical mapping for any given processor. Currently we
> >> use VPHN node ID information only after getting either a PRRN or a VPHN
> >> event. But during boot time inside the function numa_setup_cpu, we still
> >> query the OF device tree for the node ID value which might be different
> >> than what can be fetched from the H_HOME_NODE_ASSOCIATIVITY hcall. In a
> >> scenario where there are no PRRN or VPHN event after boot, all node-cpu
> >> mapping will remain incorrect there after.
> >>
> >> With this proposed change, numa_setup_cpu will try to override the OF
> >> device tree fetched node ID information with H_HOME_NODE_ASSOCIATIVITY
> >> hcall fetched node ID value. Right now shared processor property of the
> >> LPAR cannot be queried as VPA inializaion happens after numa_setup_cpu
> >> during boot time. So initmem_init function has been moved after ppc_md.
> >> setup_arch inside setup_arch during boot.
> > 
> > I would be *very* reluctant to change the order of initmem_init() vs
> > setup_arch().
> > 
> > At a minimum you'd need to go through every setup_arch() implementation and
> > carefully determine if the ordering of what it does matters vs initmem_init().
> > And then you'd need to test on every affected platform.
> > 
> > So I suggest you think of a different way to do it if at all possible.
> 
> vpa_init() is being called inside pSeries_setup_arch which is ppc_md
> .setup_arch for the platform. Its called directly for the boot cpu
> and through smp_init_pseries_xics for other cpus on the system. Not
> sure what is the reason behind calling vpa_init() from XICS init
> though.
> 
> If we can move all these vpa_init() calls from pSeries_setup_arch
> to initmem_init just before calling numa_setup_cpu, the VPA area
> would be initialized when we need it during boot. Will look in
> this direction.

Back up a bit. The dependency on vpa_init() is only because you want to call
lppaca_shared_proc() right?

But do you really need to? What happens if you call VPHN on a non-shared proc
machine? Does it 1) give you something sane or 2) give you an error or 3) give
you a junk value?

If it's either of 1 or 2 then you should be OK to just call it. You either use
the value it returned which is sane or you see the error and just fall back to
the device tree nid.

cheers
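
The "just call it and fall back" idea above can be sketched in plain C,
outside the kernel. This is a toy model under stated assumptions, not the
patch: struct vphn_result, vphn_query_fn, boot_nid() and the two stub
queries are hypothetical names invented for illustration. It covers only
cases 1 (sane value) and 2 (error); a junk value (case 3) would not be
detected by this check.

```c
#include <assert.h>

/* Hypothetical result of a VPHN query; not a kernel type. */
struct vphn_result {
	int ok;   /* nonzero if the hypervisor call succeeded */
	int nid;  /* node ID derived from the returned associativity */
};

/* Hypothetical hook standing in for the real hcall path. */
typedef struct vphn_result (*vphn_query_fn)(unsigned int cpu);

/*
 * Pick the boot-time node ID for a cpu: try VPHN unconditionally and
 * keep the device-tree value whenever the query fails or returns a
 * negative node ID.
 */
static int boot_nid(unsigned int cpu, int dt_nid, vphn_query_fn query)
{
	struct vphn_result r;

	assert(query != 0);
	r = query(cpu);
	if (r.ok && r.nid >= 0)
		return r.nid;
	return dt_nid;
}

/* Stub: the hypervisor answers with node 3 for every cpu. */
static struct vphn_result query_sane(unsigned int cpu)
{
	struct vphn_result r = { 1, 3 };

	(void)cpu;
	return r;
}

/* Stub: the hypervisor call fails for every cpu. */
static struct vphn_result query_error(unsigned int cpu)
{
	struct vphn_result r = { 0, -1 };

	(void)cpu;
	return r;
}
```

With these stubs, boot_nid(0, 0, query_sane) picks the VPHN node, while
boot_nid(0, 0, query_error) keeps the device-tree node, and neither path
needs to know whether the LPAR uses shared processors.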
Michael Ellerman Oct. 16, 2015, 2:27 a.m. UTC | #4
On Wed, 2015-10-14 at 14:32 +0530, Anshuman Khandual wrote:
> On shared processor LPARs, H_HOME_NODE_ASSOCIATIVITY hcall provides the
> dynamic virtual-physical mapping for any given processor. Currently we
> use VPHN node ID information only after getting either a PRRN or a VPHN
> event. But during boot time inside the function numa_setup_cpu, we still
> query the OF device tree for the node ID value which might be different
> than what can be fetched from the H_HOME_NODE_ASSOCIATIVITY hcall. In a
> scenario where there are no PRRN or VPHN event after boot, all node-cpu
> mapping will remain incorrect there after.
> 
> With this proposed change, numa_setup_cpu will try to override the OF
> device tree fetched node ID information with H_HOME_NODE_ASSOCIATIVITY
> hcall fetched node ID value. Right now shared processor property of the
> LPAR cannot be queried as VPA inializaion happens after numa_setup_cpu
> during boot time. So initmem_init function has been moved after ppc_md.
> setup_arch inside setup_arch during boot.
> 
> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
> index 8b9502a..e404d05 100644
> --- a/arch/powerpc/mm/numa.c
> +++ b/arch/powerpc/mm/numa.c
> @@ -553,6 +557,17 @@ static int numa_setup_cpu(unsigned long lcpu)
>  
>  	nid = of_node_to_nid_single(cpu);
>  
> +	/*
> +	 * Override the OF device tree fetched node number
> +	 * with VPHN based node number in case of a shared
> +	 * processor LPAR on PHYP platform.
> +	 */
> +#ifdef CONFIG_PPC_SPLPAR
> +	if (lppaca_shared_proc(get_lppaca())) {
> +		nid = vphn_get_node(lcpu);
> +	}
> +#endif


That logic exposes a potential problem which you don't seem to have addressed.

You're not updating the logic in of_node_to_nid[_single](), instead you're
overriding it in *this one location*. But what about other code that uses
of_node_to_nid()? It will still get the old device-tree value and so will have
the wrong nid, won't it?

cheers
Anshuman Khandual Oct. 16, 2015, 5:55 a.m. UTC | #5
On 10/16/2015 07:54 AM, Michael Ellerman wrote:
> On Wed, 2015-10-14 at 15:43 +0530, Anshuman Khandual wrote:
>> On 10/14/2015 02:49 PM, Michael Ellerman wrote:
>>> On Wed, 2015-10-14 at 14:32 +0530, Anshuman Khandual wrote:
>>>> On shared processor LPARs, H_HOME_NODE_ASSOCIATIVITY hcall provides the
>>>> dynamic virtual-physical mapping for any given processor. Currently we
>>>> use VPHN node ID information only after getting either a PRRN or a VPHN
>>>> event. But during boot time inside the function numa_setup_cpu, we still
>>>> query the OF device tree for the node ID value which might be different
>>>> than what can be fetched from the H_HOME_NODE_ASSOCIATIVITY hcall. In a
>>>> scenario where there are no PRRN or VPHN event after boot, all node-cpu
>>>> mapping will remain incorrect there after.
>>>>
>>>> With this proposed change, numa_setup_cpu will try to override the OF
>>>> device tree fetched node ID information with H_HOME_NODE_ASSOCIATIVITY
>>>> hcall fetched node ID value. Right now shared processor property of the
>>>> LPAR cannot be queried as VPA inializaion happens after numa_setup_cpu
>>>> during boot time. So initmem_init function has been moved after ppc_md.
>>>> setup_arch inside setup_arch during boot.
>>>
>>> I would be *very* reluctant to change the order of initmem_init() vs
>>> setup_arch().
>>>
>>> At a minimum you'd need to go through every setup_arch() implementation and
>>> carefully determine if the ordering of what it does matters vs initmem_init().
>>> And then you'd need to test on every affected platform.
>>>
>>> So I suggest you think of a different way to do it if at all possible.
>>
>> vpa_init() is being called inside pSeries_setup_arch which is ppc_md
>> .setup_arch for the platform. Its called directly for the boot cpu
>> and through smp_init_pseries_xics for other cpus on the system. Not
>> sure what is the reason behind calling vpa_init() from XICS init
>> though.
>>
>> If we can move all these vpa_init() calls from pSeries_setup_arch
>> to initmem_init just before calling numa_setup_cpu, the VPA area
>> would be initialized when we need it during boot. Will look in
>> this direction.
> 
> Back up a bit. The dependency on vpa_init() is only because you want to call
> lppaca_shared_proc() right?

Right.

> 
> But do you really need to? What happens if you call VPHN on a non-shared proc
> machine? Does it 1) give you something sane or 2) give you an error or 3) give
> you a junk value?
> 
> If it's either of 1 or 2 then you should be OK to just call it. You either use
> the value it returned which is sane or you see the error and just fall back to
> the device tree nid.

Most probably it will return a sane value without any error. But the
decision to override the DT fetched value will be based on whether
we are running on a shared processor LPAR or not, hence the dependency
on lppaca_shared_proc(). In case of an error from VPHN on a shared
processor LPAR, we will still have the DT fetched value to fall
back on (will update the logic in the patch for this).
Anshuman Khandual Oct. 16, 2015, 5:55 a.m. UTC | #6
On 10/16/2015 07:57 AM, Michael Ellerman wrote:
> On Wed, 2015-10-14 at 14:32 +0530, Anshuman Khandual wrote:
>> On shared processor LPARs, H_HOME_NODE_ASSOCIATIVITY hcall provides the
>> dynamic virtual-physical mapping for any given processor. Currently we
>> use VPHN node ID information only after getting either a PRRN or a VPHN
>> event. But during boot time inside the function numa_setup_cpu, we still
>> query the OF device tree for the node ID value which might be different
>> than what can be fetched from the H_HOME_NODE_ASSOCIATIVITY hcall. In a
>> scenario where there are no PRRN or VPHN event after boot, all node-cpu
>> mapping will remain incorrect there after.
>>
>> With this proposed change, numa_setup_cpu will try to override the OF
>> device tree fetched node ID information with H_HOME_NODE_ASSOCIATIVITY
>> hcall fetched node ID value. Right now shared processor property of the
>> LPAR cannot be queried as VPA inializaion happens after numa_setup_cpu
>> during boot time. So initmem_init function has been moved after ppc_md.
>> setup_arch inside setup_arch during boot.
>>
>> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
>> index 8b9502a..e404d05 100644
>> --- a/arch/powerpc/mm/numa.c
>> +++ b/arch/powerpc/mm/numa.c
>> @@ -553,6 +557,17 @@ static int numa_setup_cpu(unsigned long lcpu)
>>  
>>  	nid = of_node_to_nid_single(cpu);
>>  
>> +	/*
>> +	 * Override the OF device tree fetched node number
>> +	 * with VPHN based node number in case of a shared
>> +	 * processor LPAR on PHYP platform.
>> +	 */
>> +#ifdef CONFIG_PPC_SPLPAR
>> +	if (lppaca_shared_proc(get_lppaca())) {
>> +		nid = vphn_get_node(lcpu);
>> +	}
>> +#endif
> 
> 
> That logic exposes a potential problem which you don't seem to have addressed.

You are right.

> 
> You're not updating the logic in of_node_to_nid[_single](), instead you're
> overriding it in *this one location*. But what about other code that uses
> of_node_to_nid()? It will still get the old device-tree value and so will have
> the wrong nid, won't it?

Yeah it will. of_node_to_nid() calls of_node_to_nid_single(), so we
can move this VPHN override logic into of_node_to_nid_single() to
make it available across the board. But the original problem remains:
vpa_init() runs too late for the lppaca_shared_proc() check to be
usable at boot time inside numa_setup_cpu().
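
To illustrate why placing the override in the shared helper matters, here
is a small user-space sketch. It is a toy model under stated assumptions:
struct dev_node, shared_proc_lpar, node_to_nid_single(), node_to_nid() and
setup_cpu_nid() are hypothetical stand-ins, and the real
of_node_to_nid() does more than simply forward to the helper.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a node with two sources of node ID. */
struct dev_node {
	int dt_nid;    /* node ID read from the device tree */
	int vphn_nid;  /* node ID from VPHN, negative if unavailable */
};

/* Hypothetical flag standing in for the shared processor check. */
static int shared_proc_lpar = 1;

/* The single place where the VPHN override is applied. */
static int node_to_nid_single(const struct dev_node *np)
{
	assert(np != NULL);
	if (shared_proc_lpar && np->vphn_nid >= 0)
		return np->vphn_nid;
	return np->dt_nid;
}

/* Generic lookup used by code outside the cpu setup path. */
static int node_to_nid(const struct dev_node *np)
{
	return node_to_nid_single(np);
}

/* Cpu setup path: no private override is needed any more. */
static int setup_cpu_nid(const struct dev_node *cpu_node)
{
	return node_to_nid_single(cpu_node);
}
```

Because both callers go through node_to_nid_single(), they cannot disagree
about the node ID, which is the inconsistency raised in the review. The
sketch does not address the ordering problem with vpa_init().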

Patch

diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index bdcbb71..56026b7 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -694,7 +694,6 @@  void __init setup_arch(char **cmdline_p)
 	exc_lvl_early_init();
 	emergency_stack_init();
 
-	initmem_init();
 
 #ifdef CONFIG_DUMMY_CONSOLE
 	conswitchp = &dummy_con;
@@ -703,6 +702,7 @@  void __init setup_arch(char **cmdline_p)
 	if (ppc_md.setup_arch)
 		ppc_md.setup_arch();
 
+	initmem_init();
 	paging_init();
 
 	/* Initialize the MMU context management stuff */
diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 8b9502a..e404d05 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -41,6 +41,10 @@ 
 #include <asm/setup.h>
 #include <asm/vdso.h>
 
+#ifdef CONFIG_PPC_SPLPAR
+static int vphn_get_node(unsigned int cpu);
+#endif
+
 static int numa_enabled = 1;
 
 static char *cmdline __initdata;
@@ -553,6 +557,17 @@  static int numa_setup_cpu(unsigned long lcpu)
 
 	nid = of_node_to_nid_single(cpu);
 
+	/*
+	 * Override the OF device tree fetched node number
+	 * with VPHN based node number in case of a shared
+	 * processor LPAR on PHYP platform.
+	 */
+#ifdef CONFIG_PPC_SPLPAR
+	if (lppaca_shared_proc(get_lppaca())) {
+		nid = vphn_get_node(lcpu);
+	}
+#endif
+
 out_present:
 	if (nid < 0 || !node_online(nid))
 		nid = first_online_node;
@@ -1364,6 +1379,14 @@  static int update_lookup_table(void *data)
 	return 0;
 }
 
+static int vphn_get_node(unsigned int cpu)
+{
+	__be32 associativity[VPHN_ASSOC_BUFSIZE] = {0};
+
+	vphn_get_associativity(cpu, associativity);
+	return associativity_to_nid(associativity);
+}
+
 /*
  * Update the node maps and sysfs entries for each cpu whose home node
  * has changed. Returns 1 when the topology has changed, and 0 otherwise.
@@ -1372,7 +1395,6 @@  int arch_update_cpu_topology(void)
 {
 	unsigned int cpu, sibling, changed = 0;
 	struct topology_update_data *updates, *ud;
-	__be32 associativity[VPHN_ASSOC_BUFSIZE] = {0};
 	cpumask_t updated_cpus;
 	struct device *dev;
 	int weight, new_nid, i = 0;
@@ -1408,8 +1430,7 @@  int arch_update_cpu_topology(void)
 		}
 
 		/* Use associativity from first thread for all siblings */
-		vphn_get_associativity(cpu, associativity);
-		new_nid = associativity_to_nid(associativity);
+		new_nid = vphn_get_node(cpu);
 		if (new_nid < 0 || !node_online(new_nid))
 			new_nid = first_online_node;