Message ID | 1444813335-4009-1-git-send-email-khandual@linux.vnet.ibm.com (mailing list archive) |
---|---|
State | Changes Requested |
On Wed, 2015-10-14 at 14:32 +0530, Anshuman Khandual wrote:
> On shared processor LPARs, H_HOME_NODE_ASSOCIATIVITY hcall provides the
> dynamic virtual-physical mapping for any given processor. Currently we
> use VPHN node ID information only after getting either a PRRN or a VPHN
> event. But during boot time inside the function numa_setup_cpu, we still
> query the OF device tree for the node ID value which might be different
> than what can be fetched from the H_HOME_NODE_ASSOCIATIVITY hcall. In a
> scenario where there are no PRRN or VPHN events after boot, all node-cpu
> mapping will remain incorrect thereafter.
>
> With this proposed change, numa_setup_cpu will try to override the OF
> device tree fetched node ID information with H_HOME_NODE_ASSOCIATIVITY
> hcall fetched node ID value. Right now shared processor property of the
> LPAR cannot be queried as VPA initialization happens after numa_setup_cpu
> during boot time. So initmem_init function has been moved after
> ppc_md.setup_arch inside setup_arch during boot.

I would be *very* reluctant to change the order of initmem_init() vs
setup_arch().

At a minimum you'd need to go through every setup_arch() implementation and
carefully determine if the ordering of what it does matters vs initmem_init().
And then you'd need to test on every affected platform.

So I suggest you think of a different way to do it if at all possible.

cheers
On 10/14/2015 02:49 PM, Michael Ellerman wrote:
> On Wed, 2015-10-14 at 14:32 +0530, Anshuman Khandual wrote:
>> On shared processor LPARs, H_HOME_NODE_ASSOCIATIVITY hcall provides the
>> dynamic virtual-physical mapping for any given processor. Currently we
>> use VPHN node ID information only after getting either a PRRN or a VPHN
>> event. But during boot time inside the function numa_setup_cpu, we still
>> query the OF device tree for the node ID value which might be different
>> than what can be fetched from the H_HOME_NODE_ASSOCIATIVITY hcall. In a
>> scenario where there are no PRRN or VPHN events after boot, all node-cpu
>> mapping will remain incorrect thereafter.
>>
>> With this proposed change, numa_setup_cpu will try to override the OF
>> device tree fetched node ID information with H_HOME_NODE_ASSOCIATIVITY
>> hcall fetched node ID value. Right now shared processor property of the
>> LPAR cannot be queried as VPA initialization happens after numa_setup_cpu
>> during boot time. So initmem_init function has been moved after
>> ppc_md.setup_arch inside setup_arch during boot.
>
> I would be *very* reluctant to change the order of initmem_init() vs
> setup_arch().
>
> At a minimum you'd need to go through every setup_arch() implementation and
> carefully determine if the ordering of what it does matters vs initmem_init().
> And then you'd need to test on every affected platform.
>
> So I suggest you think of a different way to do it if at all possible.

vpa_init() is being called inside pSeries_setup_arch, which is
ppc_md.setup_arch for the platform. It's called directly for the boot cpu
and through smp_init_pseries_xics for the other cpus on the system. Not
sure what the reason is for calling vpa_init() from XICS init, though.

If we can move all these vpa_init() calls from pSeries_setup_arch
to initmem_init, just before calling numa_setup_cpu, the VPA area
would be initialized when we need it during boot. Will look in
this direction.
On Wed, 2015-10-14 at 15:43 +0530, Anshuman Khandual wrote:
> On 10/14/2015 02:49 PM, Michael Ellerman wrote:
> > On Wed, 2015-10-14 at 14:32 +0530, Anshuman Khandual wrote:
> >> On shared processor LPARs, H_HOME_NODE_ASSOCIATIVITY hcall provides the
> >> dynamic virtual-physical mapping for any given processor. Currently we
> >> use VPHN node ID information only after getting either a PRRN or a VPHN
> >> event. But during boot time inside the function numa_setup_cpu, we still
> >> query the OF device tree for the node ID value which might be different
> >> than what can be fetched from the H_HOME_NODE_ASSOCIATIVITY hcall. In a
> >> scenario where there are no PRRN or VPHN events after boot, all node-cpu
> >> mapping will remain incorrect thereafter.
> >>
> >> With this proposed change, numa_setup_cpu will try to override the OF
> >> device tree fetched node ID information with H_HOME_NODE_ASSOCIATIVITY
> >> hcall fetched node ID value. Right now shared processor property of the
> >> LPAR cannot be queried as VPA initialization happens after numa_setup_cpu
> >> during boot time. So initmem_init function has been moved after
> >> ppc_md.setup_arch inside setup_arch during boot.
> >
> > I would be *very* reluctant to change the order of initmem_init() vs
> > setup_arch().
> >
> > At a minimum you'd need to go through every setup_arch() implementation and
> > carefully determine if the ordering of what it does matters vs initmem_init().
> > And then you'd need to test on every affected platform.
> >
> > So I suggest you think of a different way to do it if at all possible.
>
> vpa_init() is being called inside pSeries_setup_arch, which is
> ppc_md.setup_arch for the platform. It's called directly for the boot cpu
> and through smp_init_pseries_xics for the other cpus on the system. Not
> sure what the reason is for calling vpa_init() from XICS init, though.
>
> If we can move all these vpa_init() calls from pSeries_setup_arch
> to initmem_init, just before calling numa_setup_cpu, the VPA area
> would be initialized when we need it during boot. Will look in
> this direction.

Back up a bit. The dependency on vpa_init() is only because you want to call
lppaca_shared_proc() right?

But do you really need to? What happens if you call VPHN on a non-shared proc
machine? Does it 1) give you something sane or 2) give you an error or 3) give
you a junk value?

If it's either of 1 or 2 then you should be OK to just call it. You either use
the value it returned which is sane or you see the error and just fall back to
the device tree nid.

cheers
On Wed, 2015-10-14 at 14:32 +0530, Anshuman Khandual wrote:
> On shared processor LPARs, H_HOME_NODE_ASSOCIATIVITY hcall provides the
> dynamic virtual-physical mapping for any given processor. Currently we
> use VPHN node ID information only after getting either a PRRN or a VPHN
> event. But during boot time inside the function numa_setup_cpu, we still
> query the OF device tree for the node ID value which might be different
> than what can be fetched from the H_HOME_NODE_ASSOCIATIVITY hcall. In a
> scenario where there are no PRRN or VPHN events after boot, all node-cpu
> mapping will remain incorrect thereafter.
>
> With this proposed change, numa_setup_cpu will try to override the OF
> device tree fetched node ID information with H_HOME_NODE_ASSOCIATIVITY
> hcall fetched node ID value. Right now shared processor property of the
> LPAR cannot be queried as VPA initialization happens after numa_setup_cpu
> during boot time. So initmem_init function has been moved after
> ppc_md.setup_arch inside setup_arch during boot.
>
> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
> index 8b9502a..e404d05 100644
> --- a/arch/powerpc/mm/numa.c
> +++ b/arch/powerpc/mm/numa.c
> @@ -553,6 +557,17 @@ static int numa_setup_cpu(unsigned long lcpu)
>
>  	nid = of_node_to_nid_single(cpu);
>
> +	/*
> +	 * Override the OF device tree fetched node number
> +	 * with VPHN based node number in case of a shared
> +	 * processor LPAR on PHYP platform.
> +	 */
> +#ifdef CONFIG_PPC_SPLPAR
> +	if (lppaca_shared_proc(get_lppaca())) {
> +		nid = vphn_get_node(lcpu);
> +	}
> +#endif

That logic exposes a potential problem which you don't seem to have addressed.

You're not updating the logic in of_node_to_nid[_single](), instead you're
overriding it in *this one location*. But what about other code that uses
of_node_to_nid()? It will still get the old device-tree value and so will have
the wrong nid, won't it?

cheers
On 10/16/2015 07:54 AM, Michael Ellerman wrote:
> On Wed, 2015-10-14 at 15:43 +0530, Anshuman Khandual wrote:
>> On 10/14/2015 02:49 PM, Michael Ellerman wrote:
>>> On Wed, 2015-10-14 at 14:32 +0530, Anshuman Khandual wrote:
>>>> On shared processor LPARs, H_HOME_NODE_ASSOCIATIVITY hcall provides the
>>>> dynamic virtual-physical mapping for any given processor. Currently we
>>>> use VPHN node ID information only after getting either a PRRN or a VPHN
>>>> event. But during boot time inside the function numa_setup_cpu, we still
>>>> query the OF device tree for the node ID value which might be different
>>>> than what can be fetched from the H_HOME_NODE_ASSOCIATIVITY hcall. In a
>>>> scenario where there are no PRRN or VPHN events after boot, all node-cpu
>>>> mapping will remain incorrect thereafter.
>>>>
>>>> With this proposed change, numa_setup_cpu will try to override the OF
>>>> device tree fetched node ID information with H_HOME_NODE_ASSOCIATIVITY
>>>> hcall fetched node ID value. Right now shared processor property of the
>>>> LPAR cannot be queried as VPA initialization happens after numa_setup_cpu
>>>> during boot time. So initmem_init function has been moved after
>>>> ppc_md.setup_arch inside setup_arch during boot.
>>>
>>> I would be *very* reluctant to change the order of initmem_init() vs
>>> setup_arch().
>>>
>>> At a minimum you'd need to go through every setup_arch() implementation and
>>> carefully determine if the ordering of what it does matters vs initmem_init().
>>> And then you'd need to test on every affected platform.
>>>
>>> So I suggest you think of a different way to do it if at all possible.
>>
>> vpa_init() is being called inside pSeries_setup_arch, which is
>> ppc_md.setup_arch for the platform. It's called directly for the boot cpu
>> and through smp_init_pseries_xics for the other cpus on the system. Not
>> sure what the reason is for calling vpa_init() from XICS init, though.
>>
>> If we can move all these vpa_init() calls from pSeries_setup_arch
>> to initmem_init, just before calling numa_setup_cpu, the VPA area
>> would be initialized when we need it during boot. Will look in
>> this direction.
>
> Back up a bit. The dependency on vpa_init() is only because you want to call
> lppaca_shared_proc() right?

Right.

>
> But do you really need to? What happens if you call VPHN on a non-shared proc
> machine? Does it 1) give you something sane or 2) give you an error or 3) give
> you a junk value?
>
> If it's either of 1 or 2 then you should be OK to just call it. You either use
> the value it returned which is sane or you see the error and just fall back to
> the device tree nid.

Most probably it will be a sane value without any error. But the decision
to override the DT fetched value will be based on whether we are running
on a shared processor LPAR or not. Hence the dependency on
lppaca_shared_proc(). In case of an error from VPHN on a shared processor
LPAR, we will still have the DT fetched value to fall back on (will update
the logic in the patch for this).
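The selection logic discussed in the two messages above (override the device-tree nid only on a shared processor LPAR, and keep the device-tree nid if VPHN reports an error) can be sketched in plain userspace C. This is only an illustration under stated assumptions, not the patch: `mock_dt_nid()`, `mock_vphn_nid()` and `pick_boot_nid()` are hypothetical stand-ins for `of_node_to_nid_single()`, `vphn_get_node()` and the body of `numa_setup_cpu()`, and the node numbers 0 and 3 are borrowed from the `numactl -H` output in the patch description.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the device-tree lookup: always node 0 here. */
static int mock_dt_nid(int cpu)
{
	(void)cpu;
	return 0;
}

/*
 * Hypothetical stand-in for the VPHN lookup: node 3 on success, a
 * negative value when the (simulated) hcall fails.
 */
static int mock_vphn_nid(int cpu, bool hcall_fails)
{
	(void)cpu;
	return hcall_fails ? -1 : 3;
}

/*
 * Start from the device-tree nid and override it only when running on a
 * shared processor LPAR *and* VPHN returned a usable node number.
 */
static int pick_boot_nid(int cpu, bool shared_proc, bool hcall_fails)
{
	int nid;

	assert(cpu >= 0);
	nid = mock_dt_nid(cpu);

	if (shared_proc) {
		int vphn_nid = mock_vphn_nid(cpu, hcall_fails);

		if (vphn_nid >= 0)
			nid = vphn_nid;	/* sane VPHN value wins */
		/* otherwise fall back to the device-tree nid */
	}

	return nid;
}
```

With these mocks, `pick_boot_nid(0, true, false)` selects the VPHN node, while a failed hcall or a dedicated (non-shared) partition leaves the device-tree node in place. Dropping the `shared_proc` test gives the variant suggested earlier in the thread, where VPHN is called unconditionally and only the error path falls back.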
On 10/16/2015 07:57 AM, Michael Ellerman wrote:
> On Wed, 2015-10-14 at 14:32 +0530, Anshuman Khandual wrote:
>> On shared processor LPARs, H_HOME_NODE_ASSOCIATIVITY hcall provides the
>> dynamic virtual-physical mapping for any given processor. Currently we
>> use VPHN node ID information only after getting either a PRRN or a VPHN
>> event. But during boot time inside the function numa_setup_cpu, we still
>> query the OF device tree for the node ID value which might be different
>> than what can be fetched from the H_HOME_NODE_ASSOCIATIVITY hcall. In a
>> scenario where there are no PRRN or VPHN events after boot, all node-cpu
>> mapping will remain incorrect thereafter.
>>
>> With this proposed change, numa_setup_cpu will try to override the OF
>> device tree fetched node ID information with H_HOME_NODE_ASSOCIATIVITY
>> hcall fetched node ID value. Right now shared processor property of the
>> LPAR cannot be queried as VPA initialization happens after numa_setup_cpu
>> during boot time. So initmem_init function has been moved after
>> ppc_md.setup_arch inside setup_arch during boot.
>>
>> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
>> index 8b9502a..e404d05 100644
>> --- a/arch/powerpc/mm/numa.c
>> +++ b/arch/powerpc/mm/numa.c
>> @@ -553,6 +557,17 @@ static int numa_setup_cpu(unsigned long lcpu)
>>
>>  	nid = of_node_to_nid_single(cpu);
>>
>> +	/*
>> +	 * Override the OF device tree fetched node number
>> +	 * with VPHN based node number in case of a shared
>> +	 * processor LPAR on PHYP platform.
>> +	 */
>> +#ifdef CONFIG_PPC_SPLPAR
>> +	if (lppaca_shared_proc(get_lppaca())) {
>> +		nid = vphn_get_node(lcpu);
>> +	}
>> +#endif
>
>
> That logic exposes a potential problem which you don't seem to have addressed.

You are right.

>
> You're not updating the logic in of_node_to_nid[_single](), instead you're
> overriding it in *this one location*. But what about other code that uses
> of_node_to_nid()? It will still get the old device-tree value and so will have
> the wrong nid, won't it?

Yeah it will. of_node_to_nid() calls of_node_to_nid_single(), so we can
move this VPHN override logic inside of_node_to_nid_single() to make it
available across the board. But the original timing problem remains:
vpa_init() has to run early enough for the lppaca_shared_proc() check to
be usable at boot time inside numa_setup_cpu().
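The idea of moving the override into the single lookup helper, so that every caller sees the same node number, can be sketched as follows. This is a rough userspace illustration under stated assumptions: `struct mock_cpu`, `lookup_nid_single()`, `lookup_nid()` and `setup_cpu_nid()` are hypothetical names standing in for `of_node_to_nid_single()`, `of_node_to_nid()` and `numa_setup_cpu()`. It also glosses over a detail the thread does not settle, namely how the real helper, which takes a device node, would obtain the cpu id that VPHN needs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-cpu view of the two sources of node information. */
struct mock_cpu {
	bool shared_proc;	/* running on a shared processor LPAR? */
	int dt_nid;		/* node number from the device tree */
	int vphn_nid;		/* node number from VPHN, negative on error */
};

/*
 * Single point of truth: the override lives here, so no caller can end
 * up with the stale device-tree value by accident.
 */
static int lookup_nid_single(const struct mock_cpu *c)
{
	assert(c != NULL);

	if (c->shared_proc && c->vphn_nid >= 0)
		return c->vphn_nid;

	return c->dt_nid;
}

/* Generic caller, standing in for other users of the lookup. */
static int lookup_nid(const struct mock_cpu *c)
{
	return lookup_nid_single(c);
}

/* Boot-time caller, standing in for the cpu setup path. */
static int setup_cpu_nid(const struct mock_cpu *c)
{
	return lookup_nid_single(c);
}
```

Because both callers go through `lookup_nid_single()`, they cannot disagree about the node of a given cpu, which is the inconsistency raised in the review. The sketch does not address the ordering problem: in the kernel the `shared_proc` information is only available once the VPA has been initialized.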
diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index bdcbb71..56026b7 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -694,7 +694,6 @@ void __init setup_arch(char **cmdline_p)
 	exc_lvl_early_init();
 	emergency_stack_init();
 
-	initmem_init();
 
 #ifdef CONFIG_DUMMY_CONSOLE
 	conswitchp = &dummy_con;
@@ -703,6 +702,7 @@ void __init setup_arch(char **cmdline_p)
 	if (ppc_md.setup_arch)
 		ppc_md.setup_arch();
 
+	initmem_init();
 	paging_init();
 
 	/* Initialize the MMU context management stuff */
diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 8b9502a..e404d05 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -41,6 +41,10 @@
 #include <asm/setup.h>
 #include <asm/vdso.h>
 
+#ifdef CONFIG_PPC_SPLPAR
+static int vphn_get_node(unsigned int cpu);
+#endif
+
 static int numa_enabled = 1;
 
 static char *cmdline __initdata;
@@ -553,6 +557,17 @@ static int numa_setup_cpu(unsigned long lcpu)
 
 	nid = of_node_to_nid_single(cpu);
 
+	/*
+	 * Override the OF device tree fetched node number
+	 * with VPHN based node number in case of a shared
+	 * processor LPAR on PHYP platform.
+	 */
+#ifdef CONFIG_PPC_SPLPAR
+	if (lppaca_shared_proc(get_lppaca())) {
+		nid = vphn_get_node(lcpu);
+	}
+#endif
+
 out_present:
 	if (nid < 0 || !node_online(nid))
 		nid = first_online_node;
@@ -1364,6 +1379,14 @@ static int update_lookup_table(void *data)
 	return 0;
 }
 
+static int vphn_get_node(unsigned int cpu)
+{
+	__be32 associativity[VPHN_ASSOC_BUFSIZE] = {0};
+
+	vphn_get_associativity(cpu, associativity);
+	return associativity_to_nid(associativity);
+}
+
 /*
  * Update the node maps and sysfs entries for each cpu whose home node
  * has changed. Returns 1 when the topology has changed, and 0 otherwise.
@@ -1372,7 +1395,6 @@ int arch_update_cpu_topology(void)
 {
 	unsigned int cpu, sibling, changed = 0;
 	struct topology_update_data *updates, *ud;
-	__be32 associativity[VPHN_ASSOC_BUFSIZE] = {0};
 	cpumask_t updated_cpus;
 	struct device *dev;
 	int weight, new_nid, i = 0;
@@ -1408,8 +1430,7 @@ int arch_update_cpu_topology(void)
 		}
 
 		/* Use associativity from first thread for all siblings */
-		vphn_get_associativity(cpu, associativity);
-		new_nid = associativity_to_nid(associativity);
+		new_nid = vphn_get_node(cpu);
 		if (new_nid < 0 || !node_online(new_nid))
 			new_nid = first_online_node;
 
On shared processor LPARs, H_HOME_NODE_ASSOCIATIVITY hcall provides the
dynamic virtual-physical mapping for any given processor. Currently we
use VPHN node ID information only after getting either a PRRN or a VPHN
event. But during boot time inside the function numa_setup_cpu, we still
query the OF device tree for the node ID value which might be different
than what can be fetched from the H_HOME_NODE_ASSOCIATIVITY hcall. In a
scenario where there are no PRRN or VPHN events after boot, all node-cpu
mapping will remain incorrect thereafter.

With this proposed change, numa_setup_cpu will try to override the OF
device tree fetched node ID information with H_HOME_NODE_ASSOCIATIVITY
hcall fetched node ID value. Right now shared processor property of the
LPAR cannot be queried as VPA initialization happens after numa_setup_cpu
during boot time. So initmem_init function has been moved after
ppc_md.setup_arch inside setup_arch during boot.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
---
Before the change:
# numactl -H
available: 2 nodes (0,3)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
node 0 size: 0 MB
node 0 free: 0 MB
node 3 cpus:
node 3 size: 16315 MB
node 3 free: 15716 MB
node distances:
node   0   3
  0:  10  20
  3:  20  10

After the change:
# numactl -H
available: 2 nodes (0,3)
node 0 cpus:
node 0 size: 0 MB
node 0 free: 0 MB
node 3 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
node 3 size: 16315 MB
node 3 free: 15537 MB
node distances:
node   0   3
  0:  10  20
  3:  20  10

 arch/powerpc/kernel/setup_64.c |  2 +-
 arch/powerpc/mm/numa.c         | 27 ++++++++++++++++++++++++---
 2 files changed, 25 insertions(+), 4 deletions(-)