Message ID | 20190629083629.29037-1-aneesh.kumar@linux.ibm.com (mailing list archive) |
---|---|
State | Not Applicable |
Headers | show |
Series | ["RFC,PATCH",1/2] powerpc/mm: Fix node look up with numa=off boot | expand |
Context | Check | Description |
---|---|---|
snowpatch_ozlabs/apply_patch | success | Successfully applied on branch next (c7d64b560ce80d8c44f082eee8352f0778a73195) |
snowpatch_ozlabs/checkpatch | warning | total: 0 errors, 1 warnings, 0 checks, 30 lines checked |
On 6/29/19 2:06 PM, Aneesh Kumar K.V wrote: > If we boot with numa=off, we need to make sure we return NUMA_NO_NODE when > looking up associativity details of resources. Without this, we hit crash > like below > > BUG: Unable to handle kernel data access at 0x40000000008 > Faulting instruction address: 0xc000000008f31704 > cpu 0x1b: Vector: 380 (Data SLB Access) at [c00000000b9bb320] > pc: c000000008f31704: _raw_spin_lock+0x14/0x100 > lr: c0000000083f41fc: ____cache_alloc_node+0x5c/0x290 > sp: c00000000b9bb5b0 > msr: 800000010280b033 > dar: 40000000008 > current = 0xc00000000b9a2700 > paca = 0xc00000000a740c00 irqmask: 0x03 irq_happened: 0x01 > pid = 1, comm = swapper/27 > Linux version 5.2.0-rc4-00925-g74e188c620b1 (root@linux-d8ip) (gcc version 7.4.1 20190424 [gcc-7-branch revision 270538] (SUSE Linux)) #34 SMP Sat Jun 29 00:41:02 EDT 2019 > enter ? for help > [link register ] c0000000083f41fc ____cache_alloc_node+0x5c/0x290 > [c00000000b9bb5b0] 0000000000000dc0 (unreliable) > [c00000000b9bb5f0] c0000000083f48c8 kmem_cache_alloc_node_trace+0x138/0x360 > [c00000000b9bb670] c000000008aa789c devres_alloc_node+0x4c/0xa0 > [c00000000b9bb6a0] c000000008337218 devm_memremap+0x58/0x130 > [c00000000b9bb6f0] c000000008aed00c devm_nsio_enable+0xdc/0x170 > [c00000000b9bb780] c000000008af3b6c nd_pmem_probe+0x4c/0x180 > [c00000000b9bb7b0] c000000008ad84cc nvdimm_bus_probe+0xac/0x260 > [c00000000b9bb840] c000000008aa0628 really_probe+0x148/0x500 > [c00000000b9bb8d0] c000000008aa0d7c driver_probe_device+0x19c/0x1d0 > [c00000000b9bb950] c000000008aa11bc device_driver_attach+0xcc/0x100 > [c00000000b9bb990] c000000008aa12ec __driver_attach+0xfc/0x1e0 > [c00000000b9bba10] c000000008a9d0a4 bus_for_each_dev+0xb4/0x130 > [c00000000b9bba70] c000000008a9fc04 driver_attach+0x34/0x50 > [c00000000b9bba90] c000000008a9f118 bus_add_driver+0x1d8/0x300 > [c00000000b9bbb20] c000000008aa2358 driver_register+0x98/0x1a0 > [c00000000b9bbb90] c000000008ad7e6c __nd_driver_register+0x5c/0x100 > [c00000000b9bbbf0] c0000000093efbac nd_pmem_driver_init+0x34/0x48 > [c00000000b9bbc10] c0000000080106c0 do_one_initcall+0x60/0x2d0 > [c00000000b9bbce0] c00000000938463c kernel_init_freeable+0x384/0x48c > [c00000000b9bbdb0] c000000008010a5c kernel_init+0x2c/0x160 > [c00000000b9bbe20] c00000000800ba54 ret_from_kernel_thread+0x5c/0x68 > > Reported-and-debugged-by: Vaibhav Jain <vaibhav@linux.ibm.com> > Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> > --- > arch/powerpc/mm/numa.c | 8 ++++++-- > 1 file changed, 6 insertions(+), 2 deletions(-) > > diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c > index 917904d2fe97..f6d68baeaa96 100644 > --- a/arch/powerpc/mm/numa.c > +++ b/arch/powerpc/mm/numa.c > @@ -212,7 +212,7 @@ static int associativity_to_nid(const __be32 *associativity) > { > int nid = NUMA_NO_NODE; > > - if (min_common_depth == -1) > + if (min_common_depth == -1 || !numa_enabled) > goto out; > > if (of_read_number(associativity, 1) >= min_common_depth) > @@ -416,10 +416,14 @@ static int of_get_assoc_arrays(struct assoc_arrays *aa) > static int of_drconf_to_nid_single(struct drmem_lmb *lmb) > { > struct assoc_arrays aa = { .arrays = NULL }; > + /* is that correct? */ > int default_nid = 0; > int nid = default_nid; > int rc, index; > > + if (!numa_enabled) > + return NUMA_NO_NODE; > + I guess we should have here. modified arch/powerpc/mm/numa.c @@ -416,12 +416,11 @@ static int of_get_assoc_arrays(struct assoc_arrays *aa) static int of_drconf_to_nid_single(struct drmem_lmb *lmb) { struct assoc_arrays aa = { .arrays = NULL }; - /* is that correct? */ int default_nid = 0; int nid = default_nid; int rc, index; - if (!numa_enabled) + if ((min_common_depth < 0) || !numa_enabled) return NUMA_NO_NODE; rc = of_get_assoc_arrays(&aa); Nathan, Can you check this? > rc = of_get_assoc_arrays(&aa); > if (rc) > return default_nid; > @@ -808,7 +812,7 @@ static void __init find_possible_nodes(void) > struct device_node *rtas; > u32 numnodes, i; > > - if (min_common_depth <= 0) > + if (min_common_depth <= 0 || !numa_enabled) > return; > > rtas = of_find_node_by_path("/rtas"); > -aneesh
"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes: > I guess we should have here. > > modified arch/powerpc/mm/numa.c > @@ -416,12 +416,11 @@ static int of_get_assoc_arrays(struct assoc_arrays > *aa) > static int of_drconf_to_nid_single(struct drmem_lmb *lmb) > { > struct assoc_arrays aa = { .arrays = NULL }; > - /* is that correct? */ > int default_nid = 0; > int nid = default_nid; > int rc, index; > > - if (!numa_enabled) > + if ((min_common_depth < 0) || !numa_enabled) > return NUMA_NO_NODE; > > rc = of_get_assoc_arrays(&aa); > > > Nathan, > > Can you check this? Looks like it would do the right thing. Just checking: do people still need numa=off? Seems like it's a maintenance burden :-)
On 7/1/19 10:12 PM, Nathan Lynch wrote: > "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes: >> I guess we should have here. >> >> modified arch/powerpc/mm/numa.c >> @@ -416,12 +416,11 @@ static int of_get_assoc_arrays(struct assoc_arrays >> *aa) >> static int of_drconf_to_nid_single(struct drmem_lmb *lmb) >> { >> struct assoc_arrays aa = { .arrays = NULL }; >> - /* is that correct? */ >> int default_nid = 0; >> int nid = default_nid; >> int rc, index; >> >> - if (!numa_enabled) >> + if ((min_common_depth < 0) || !numa_enabled) >> return NUMA_NO_NODE; >> >> rc = of_get_assoc_arrays(&aa); >> >> >> Nathan, >> >> Can you check this? > > Looks like it would do the right thing. > > Just checking: do people still need numa=off? Seems like it's a > maintenance burden :-) > That is used in kdump kernel. -aneesh
"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes: >> Just checking: do people still need numa=off? Seems like it's a >> maintenance burden :-) >> > > That is used in kdump kernel. I see, thanks.
Nathan Lynch <nathanl@linux.ibm.com> writes: > "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes: >>> Just checking: do people still need numa=off? Seems like it's a >>> maintenance burden :-) >>> >> >> That is used in kdump kernel. > > I see, thanks. That doesn't mean it's a good idea :) Does it actually reduce memory usage much? Last time I dug into the kdump kernel's usage of weird command line flags none of them really did anything useful. cheers
Michael Ellerman <mpe@ellerman.id.au> writes: > Nathan Lynch <nathanl@linux.ibm.com> writes: >> "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes: >>>> Just checking: do people still need numa=off? Seems like it's a >>>> maintenance burden :-) >>>> >>> >>> That is used in kdump kernel. >> >> I see, thanks. > > That doesn't mean it's a good idea :) > > Does it actually reduce memory usage much? Last time I dug into the > kdump kernel's usage of weird command line flags none of them really did > anything useful. I think it's intended to work around bugs in numa initialization, e.g. https://www.suse.com/support/kb/doc/?id=7023399 Hopefully the original bug with numa/kdump interaction has been fixed?
diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c index 917904d2fe97..f6d68baeaa96 100644 --- a/arch/powerpc/mm/numa.c +++ b/arch/powerpc/mm/numa.c @@ -212,7 +212,7 @@ static int associativity_to_nid(const __be32 *associativity) { int nid = NUMA_NO_NODE; - if (min_common_depth == -1) + if (min_common_depth == -1 || !numa_enabled) goto out; if (of_read_number(associativity, 1) >= min_common_depth) @@ -416,10 +416,14 @@ static int of_get_assoc_arrays(struct assoc_arrays *aa) static int of_drconf_to_nid_single(struct drmem_lmb *lmb) { struct assoc_arrays aa = { .arrays = NULL }; + /* is that correct? */ int default_nid = 0; int nid = default_nid; int rc, index; + if (!numa_enabled) + return NUMA_NO_NODE; + rc = of_get_assoc_arrays(&aa); if (rc) return default_nid; @@ -808,7 +812,7 @@ static void __init find_possible_nodes(void) struct device_node *rtas; u32 numnodes, i; - if (min_common_depth <= 0) + if (min_common_depth <= 0 || !numa_enabled) return; rtas = of_find_node_by_path("/rtas");
If we boot with numa=off, we need to make sure we return NUMA_NO_NODE when looking up associativity details of resources. Without this, we hit crash like below BUG: Unable to handle kernel data access at 0x40000000008 Faulting instruction address: 0xc000000008f31704 cpu 0x1b: Vector: 380 (Data SLB Access) at [c00000000b9bb320] pc: c000000008f31704: _raw_spin_lock+0x14/0x100 lr: c0000000083f41fc: ____cache_alloc_node+0x5c/0x290 sp: c00000000b9bb5b0 msr: 800000010280b033 dar: 40000000008 current = 0xc00000000b9a2700 paca = 0xc00000000a740c00 irqmask: 0x03 irq_happened: 0x01 pid = 1, comm = swapper/27 Linux version 5.2.0-rc4-00925-g74e188c620b1 (root@linux-d8ip) (gcc version 7.4.1 20190424 [gcc-7-branch revision 270538] (SUSE Linux)) #34 SMP Sat Jun 29 00:41:02 EDT 2019 enter ? for help [link register ] c0000000083f41fc ____cache_alloc_node+0x5c/0x290 [c00000000b9bb5b0] 0000000000000dc0 (unreliable) [c00000000b9bb5f0] c0000000083f48c8 kmem_cache_alloc_node_trace+0x138/0x360 [c00000000b9bb670] c000000008aa789c devres_alloc_node+0x4c/0xa0 [c00000000b9bb6a0] c000000008337218 devm_memremap+0x58/0x130 [c00000000b9bb6f0] c000000008aed00c devm_nsio_enable+0xdc/0x170 [c00000000b9bb780] c000000008af3b6c nd_pmem_probe+0x4c/0x180 [c00000000b9bb7b0] c000000008ad84cc nvdimm_bus_probe+0xac/0x260 [c00000000b9bb840] c000000008aa0628 really_probe+0x148/0x500 [c00000000b9bb8d0] c000000008aa0d7c driver_probe_device+0x19c/0x1d0 [c00000000b9bb950] c000000008aa11bc device_driver_attach+0xcc/0x100 [c00000000b9bb990] c000000008aa12ec __driver_attach+0xfc/0x1e0 [c00000000b9bba10] c000000008a9d0a4 bus_for_each_dev+0xb4/0x130 [c00000000b9bba70] c000000008a9fc04 driver_attach+0x34/0x50 [c00000000b9bba90] c000000008a9f118 bus_add_driver+0x1d8/0x300 [c00000000b9bbb20] c000000008aa2358 driver_register+0x98/0x1a0 [c00000000b9bbb90] c000000008ad7e6c __nd_driver_register+0x5c/0x100 [c00000000b9bbbf0] c0000000093efbac nd_pmem_driver_init+0x34/0x48 [c00000000b9bbc10] c0000000080106c0 do_one_initcall+0x60/0x2d0 [c00000000b9bbce0] c00000000938463c kernel_init_freeable+0x384/0x48c [c00000000b9bbdb0] c000000008010a5c kernel_init+0x2c/0x160 [c00000000b9bbe20] c00000000800ba54 ret_from_kernel_thread+0x5c/0x68 Reported-and-debugged-by: Vaibhav Jain <vaibhav@linux.ibm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> --- arch/powerpc/mm/numa.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)