diff mbox series

["RFC,PATCH",1/2] powerpc/mm: Fix node look up with numa=off boot

Message ID 20190629083629.29037-1-aneesh.kumar@linux.ibm.com (mailing list archive)
State Not Applicable
Headers show
Series ["RFC,PATCH",1/2] powerpc/mm: Fix node look up with numa=off boot | expand

Checks

Context Check Description
snowpatch_ozlabs/apply_patch success Successfully applied on branch next (c7d64b560ce80d8c44f082eee8352f0778a73195)
snowpatch_ozlabs/checkpatch warning total: 0 errors, 1 warnings, 0 checks, 30 lines checked

Commit Message

Aneesh Kumar K V June 29, 2019, 8:36 a.m. UTC
If we boot with numa=off, we need to make sure we return NUMA_NO_NODE when
looking up associativity details of resources. Without this, we hit crash
like below

BUG: Unable to handle kernel data access at 0x40000000008
Faulting instruction address: 0xc000000008f31704
cpu 0x1b: Vector: 380 (Data SLB Access) at [c00000000b9bb320]
    pc: c000000008f31704: _raw_spin_lock+0x14/0x100
    lr: c0000000083f41fc: ____cache_alloc_node+0x5c/0x290
    sp: c00000000b9bb5b0
   msr: 800000010280b033
   dar: 40000000008
  current = 0xc00000000b9a2700
  paca    = 0xc00000000a740c00   irqmask: 0x03   irq_happened: 0x01
    pid   = 1, comm = swapper/27
Linux version 5.2.0-rc4-00925-g74e188c620b1 (root@linux-d8ip) (gcc version 7.4.1 20190424 [gcc-7-branch revision 270538] (SUSE Linux)) #34 SMP Sat Jun 29 00:41:02 EDT 2019
enter ? for help
[link register   ] c0000000083f41fc ____cache_alloc_node+0x5c/0x290
[c00000000b9bb5b0] 0000000000000dc0 (unreliable)
[c00000000b9bb5f0] c0000000083f48c8 kmem_cache_alloc_node_trace+0x138/0x360
[c00000000b9bb670] c000000008aa789c devres_alloc_node+0x4c/0xa0
[c00000000b9bb6a0] c000000008337218 devm_memremap+0x58/0x130
[c00000000b9bb6f0] c000000008aed00c devm_nsio_enable+0xdc/0x170
[c00000000b9bb780] c000000008af3b6c nd_pmem_probe+0x4c/0x180
[c00000000b9bb7b0] c000000008ad84cc nvdimm_bus_probe+0xac/0x260
[c00000000b9bb840] c000000008aa0628 really_probe+0x148/0x500
[c00000000b9bb8d0] c000000008aa0d7c driver_probe_device+0x19c/0x1d0
[c00000000b9bb950] c000000008aa11bc device_driver_attach+0xcc/0x100
[c00000000b9bb990] c000000008aa12ec __driver_attach+0xfc/0x1e0
[c00000000b9bba10] c000000008a9d0a4 bus_for_each_dev+0xb4/0x130
[c00000000b9bba70] c000000008a9fc04 driver_attach+0x34/0x50
[c00000000b9bba90] c000000008a9f118 bus_add_driver+0x1d8/0x300
[c00000000b9bbb20] c000000008aa2358 driver_register+0x98/0x1a0
[c00000000b9bbb90] c000000008ad7e6c __nd_driver_register+0x5c/0x100
[c00000000b9bbbf0] c0000000093efbac nd_pmem_driver_init+0x34/0x48
[c00000000b9bbc10] c0000000080106c0 do_one_initcall+0x60/0x2d0
[c00000000b9bbce0] c00000000938463c kernel_init_freeable+0x384/0x48c
[c00000000b9bbdb0] c000000008010a5c kernel_init+0x2c/0x160
[c00000000b9bbe20] c00000000800ba54 ret_from_kernel_thread+0x5c/0x68

Reported-and-debugged-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 arch/powerpc/mm/numa.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

Comments

Aneesh Kumar K V June 29, 2019, 9:03 a.m. UTC | #1
On 6/29/19 2:06 PM, Aneesh Kumar K.V wrote:
> If we boot with numa=off, we need to make sure we return NUMA_NO_NODE when
> looking up associativity details of resources. Without this, we hit crash
> like below
> 
> BUG: Unable to handle kernel data access at 0x40000000008
> Faulting instruction address: 0xc000000008f31704
> cpu 0x1b: Vector: 380 (Data SLB Access) at [c00000000b9bb320]
>      pc: c000000008f31704: _raw_spin_lock+0x14/0x100
>      lr: c0000000083f41fc: ____cache_alloc_node+0x5c/0x290
>      sp: c00000000b9bb5b0
>     msr: 800000010280b033
>     dar: 40000000008
>    current = 0xc00000000b9a2700
>    paca    = 0xc00000000a740c00   irqmask: 0x03   irq_happened: 0x01
>      pid   = 1, comm = swapper/27
> Linux version 5.2.0-rc4-00925-g74e188c620b1 (root@linux-d8ip) (gcc version 7.4.1 20190424 [gcc-7-branch revision 270538] (SUSE Linux)) #34 SMP Sat Jun 29 00:41:02 EDT 2019
> enter ? for help
> [link register   ] c0000000083f41fc ____cache_alloc_node+0x5c/0x290
> [c00000000b9bb5b0] 0000000000000dc0 (unreliable)
> [c00000000b9bb5f0] c0000000083f48c8 kmem_cache_alloc_node_trace+0x138/0x360
> [c00000000b9bb670] c000000008aa789c devres_alloc_node+0x4c/0xa0
> [c00000000b9bb6a0] c000000008337218 devm_memremap+0x58/0x130
> [c00000000b9bb6f0] c000000008aed00c devm_nsio_enable+0xdc/0x170
> [c00000000b9bb780] c000000008af3b6c nd_pmem_probe+0x4c/0x180
> [c00000000b9bb7b0] c000000008ad84cc nvdimm_bus_probe+0xac/0x260
> [c00000000b9bb840] c000000008aa0628 really_probe+0x148/0x500
> [c00000000b9bb8d0] c000000008aa0d7c driver_probe_device+0x19c/0x1d0
> [c00000000b9bb950] c000000008aa11bc device_driver_attach+0xcc/0x100
> [c00000000b9bb990] c000000008aa12ec __driver_attach+0xfc/0x1e0
> [c00000000b9bba10] c000000008a9d0a4 bus_for_each_dev+0xb4/0x130
> [c00000000b9bba70] c000000008a9fc04 driver_attach+0x34/0x50
> [c00000000b9bba90] c000000008a9f118 bus_add_driver+0x1d8/0x300
> [c00000000b9bbb20] c000000008aa2358 driver_register+0x98/0x1a0
> [c00000000b9bbb90] c000000008ad7e6c __nd_driver_register+0x5c/0x100
> [c00000000b9bbbf0] c0000000093efbac nd_pmem_driver_init+0x34/0x48
> [c00000000b9bbc10] c0000000080106c0 do_one_initcall+0x60/0x2d0
> [c00000000b9bbce0] c00000000938463c kernel_init_freeable+0x384/0x48c
> [c00000000b9bbdb0] c000000008010a5c kernel_init+0x2c/0x160
> [c00000000b9bbe20] c00000000800ba54 ret_from_kernel_thread+0x5c/0x68
> 
> Reported-and-debugged-by: Vaibhav Jain <vaibhav@linux.ibm.com>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> ---
>   arch/powerpc/mm/numa.c | 8 ++++++--
>   1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
> index 917904d2fe97..f6d68baeaa96 100644
> --- a/arch/powerpc/mm/numa.c
> +++ b/arch/powerpc/mm/numa.c
> @@ -212,7 +212,7 @@ static int associativity_to_nid(const __be32 *associativity)
>   {
>   	int nid = NUMA_NO_NODE;
>   
> -	if (min_common_depth == -1)
> +	if (min_common_depth == -1 || !numa_enabled)
>   		goto out;
>   
>   	if (of_read_number(associativity, 1) >= min_common_depth)
> @@ -416,10 +416,14 @@ static int of_get_assoc_arrays(struct assoc_arrays *aa)
>   static int of_drconf_to_nid_single(struct drmem_lmb *lmb)
>   {
>   	struct assoc_arrays aa = { .arrays = NULL };
> +	/* is that correct? */
>   	int default_nid = 0;
>   	int nid = default_nid;
>   	int rc, index;
>   
> +	if (!numa_enabled)
> +		return NUMA_NO_NODE;
> +


I guess we should have here.

modified   arch/powerpc/mm/numa.c
@@ -416,12 +416,11 @@ static int of_get_assoc_arrays(struct assoc_arrays 
*aa)
  static int of_drconf_to_nid_single(struct drmem_lmb *lmb)
  {
  	struct assoc_arrays aa = { .arrays = NULL };
-	/* is that correct? */
  	int default_nid = 0;
  	int nid = default_nid;
  	int rc, index;

-	if (!numa_enabled)
+	if ((min_common_depth < 0) || !numa_enabled)
  		return NUMA_NO_NODE;

  	rc = of_get_assoc_arrays(&aa);


Nathan,

Can you check this?

>   	rc = of_get_assoc_arrays(&aa);
>   	if (rc)
>   		return default_nid;
> @@ -808,7 +812,7 @@ static void __init find_possible_nodes(void)
>   	struct device_node *rtas;
>   	u32 numnodes, i;
>   
> -	if (min_common_depth <= 0)
> +	if (min_common_depth <= 0 || !numa_enabled)
>   		return;
>   
>   	rtas = of_find_node_by_path("/rtas");
> 

-aneesh
Nathan Lynch July 1, 2019, 4:42 p.m. UTC | #2
"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes:
> I guess we should have here.
>
> modified   arch/powerpc/mm/numa.c
> @@ -416,12 +416,11 @@ static int of_get_assoc_arrays(struct assoc_arrays 
> *aa)
>   static int of_drconf_to_nid_single(struct drmem_lmb *lmb)
>   {
>   	struct assoc_arrays aa = { .arrays = NULL };
> -	/* is that correct? */
>   	int default_nid = 0;
>   	int nid = default_nid;
>   	int rc, index;
>
> -	if (!numa_enabled)
> +	if ((min_common_depth < 0) || !numa_enabled)
>   		return NUMA_NO_NODE;
>
>   	rc = of_get_assoc_arrays(&aa);
>
>
> Nathan,
>
> Can you check this?

Looks like it would do the right thing.

Just checking: do people still need numa=off? Seems like it's a
maintenance burden :-)
Aneesh Kumar K V July 2, 2019, 2:45 a.m. UTC | #3
On 7/1/19 10:12 PM, Nathan Lynch wrote:
> "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes:
>> I guess we should have here.
>>
>> modified   arch/powerpc/mm/numa.c
>> @@ -416,12 +416,11 @@ static int of_get_assoc_arrays(struct assoc_arrays
>> *aa)
>>    static int of_drconf_to_nid_single(struct drmem_lmb *lmb)
>>    {
>>    	struct assoc_arrays aa = { .arrays = NULL };
>> -	/* is that correct? */
>>    	int default_nid = 0;
>>    	int nid = default_nid;
>>    	int rc, index;
>>
>> -	if (!numa_enabled)
>> +	if ((min_common_depth < 0) || !numa_enabled)
>>    		return NUMA_NO_NODE;
>>
>>    	rc = of_get_assoc_arrays(&aa);
>>
>>
>> Nathan,
>>
>> Can you check this?
> 
> Looks like it would do the right thing.
> 
> Just checking: do people still need numa=off? Seems like it's a
> maintenance burden :-)
> 

That is used in kdump kernel.

-aneesh
Nathan Lynch July 2, 2019, 3:57 p.m. UTC | #4
"Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes:
>> Just checking: do people still need numa=off? Seems like it's a
>> maintenance burden :-)
>> 
>
> That is used in kdump kernel.

I see, thanks.
Michael Ellerman July 4, 2019, 2:29 p.m. UTC | #5
Nathan Lynch <nathanl@linux.ibm.com> writes:
> "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes:
>>> Just checking: do people still need numa=off? Seems like it's a
>>> maintenance burden :-)
>>> 
>>
>> That is used in kdump kernel.
>
> I see, thanks.

That doesn't mean it's a good idea :)

Does it actually reduce memory usage much? Last time I dug into the
kdump kernel's usage of weird command line flags none of them really did
anything useful.

cheers
Nathan Lynch July 9, 2019, 2:56 p.m. UTC | #6
Michael Ellerman <mpe@ellerman.id.au> writes:
> Nathan Lynch <nathanl@linux.ibm.com> writes:
>> "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes:
>>>> Just checking: do people still need numa=off? Seems like it's a
>>>> maintenance burden :-)
>>>> 
>>>
>>> That is used in kdump kernel.
>>
>> I see, thanks.
>
> That doesn't mean it's a good idea :)
>
> Does it actually reduce memory usage much? Last time I dug into the
> kdump kernel's usage of weird command line flags none of them really did
> anything useful.

I think it's intended to work around bugs in numa initialization, e.g.

https://www.suse.com/support/kb/doc/?id=7023399

Hopefully the original bug with numa/kdump interaction has been fixed?
diff mbox series

Patch

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 917904d2fe97..f6d68baeaa96 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -212,7 +212,7 @@  static int associativity_to_nid(const __be32 *associativity)
 {
 	int nid = NUMA_NO_NODE;
 
-	if (min_common_depth == -1)
+	if (min_common_depth == -1 || !numa_enabled)
 		goto out;
 
 	if (of_read_number(associativity, 1) >= min_common_depth)
@@ -416,10 +416,14 @@  static int of_get_assoc_arrays(struct assoc_arrays *aa)
 static int of_drconf_to_nid_single(struct drmem_lmb *lmb)
 {
 	struct assoc_arrays aa = { .arrays = NULL };
+	/* is that correct? */
 	int default_nid = 0;
 	int nid = default_nid;
 	int rc, index;
 
+	if (!numa_enabled)
+		return NUMA_NO_NODE;
+
 	rc = of_get_assoc_arrays(&aa);
 	if (rc)
 		return default_nid;
@@ -808,7 +812,7 @@  static void __init find_possible_nodes(void)
 	struct device_node *rtas;
 	u32 numnodes, i;
 
-	if (min_common_depth <= 0)
+	if (min_common_depth <= 0 || !numa_enabled)
 		return;
 
 	rtas = of_find_node_by_path("/rtas");