[v2] powerpc/perf: Fix memory allocation for core-imc based on num_possible_cpus()

Message ID 1526452518-12611-1-git-send-email-anju@linux.vnet.ibm.com
State Accepted
Commit d2032678e57fc508d7878307badde8f89b632ba3
Headers show
Series
  • [v2] powerpc/perf: Fix memory allocation for core-imc based on num_possible_cpus()
Related show

Commit Message

Anju T Sudhakar May 16, 2018, 6:35 a.m.
Currently memory is allocated for core-imc based on cpu_present_mask,
which has bit 'cpu' set iff cpu is populated. We use (cpu number / threads
per core) as the array index to access the memory.

Under some circumstances firmware marks a CPU as GUARDed CPU and boot the
system, until cleared of errors, these CPU's are unavailable for all
subsequent boots. GUARDed CPUs are possible but not present from linux
view, so it blows a hole when we assume the max length of our allocation
is driven by our max present cpus, where as one of the cpus might be online
and be beyond the max present cpus, due to the hole. 
So (cpu number / threads per core) value bounds the array index and leads
to memory overflow.

Call trace observed during a guard test:

Faulting instruction address: 0xc000000000149f1c
cpu 0x69: Vector: 380 (Data Access Out of Range) at [c000003fea303420]
    pc:c000000000149f1c: prefetch_freepointer+0x14/0x30
    lr:c00000000014e0f8: __kmalloc+0x1a8/0x1ac
    sp:c000003fea3036a0
   msr:9000000000009033
   dar:c9c54b2c91dbf6b7
  current = 0xc000003fea2c0000
  paca    = 0xc00000000fddd880	 softe: 3	 irq_happened: 0x01
    pid   = 1, comm = swapper/104
Linux version 4.16.7-openpower1 (smc@smc-desktop) (gcc version 6.4.0
(Buildroot 2018.02.1-00006-ga8d1126)) #2 SMP Fri May 4 16:44:54 PDT 2018
enter ? for help
call trace:
	 __kmalloc+0x1a8/0x1ac
	 (unreliable)
	 init_imc_pmu+0x7f4/0xbf0
	 opal_imc_counters_probe+0x3fc/0x43c
	 platform_drv_probe+0x48/0x80
	 driver_probe_device+0x22c/0x308
	 __driver_attach+0xa0/0xd8
	 bus_for_each_dev+0x88/0xb4
	 driver_attach+0x2c/0x40
	 bus_add_driver+0x1e8/0x228
	 driver_register+0xd0/0x114
	 __platform_driver_register+0x50/0x64
	 opal_imc_driver_init+0x24/0x38
	 do_one_initcall+0x150/0x15c
	 kernel_init_freeable+0x250/0x254
	 kernel_init+0x1c/0x150
	 ret_from_kernel_thread+0x5c/0xc8

Allocating memory for core-imc based on cpu_possible_mask, which has
bit 'cpu' set iff cpu is populatable, will fix this issue.

Reported-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com>
Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Reviewed-by: Balbir Singh <bsingharora@gmail.com>
---
 arch/powerpc/perf/imc-pmu.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

Anju T Sudhakar May 16, 2018, 6:46 a.m. | #1
On Wednesday 16 May 2018 12:18 PM, ppaidipe wrote:
> On 2018-05-16 12:05, Anju T Sudhakar wrote:
>> Currently memory is allocated for core-imc based on cpu_present_mask,
>> which has bit 'cpu' set iff cpu is populated. We use (cpu number / 
>> threads
>> per core) as the array index to access the memory.
>>
>> Under some circumstances firmware marks a CPU as GUARDed CPU and boot 
>> the
>> system, until cleared of errors, these CPU's are unavailable for all
>> subsequent boots. GUARDed CPUs are possible but not present from linux
>> view, so it blows a hole when we assume the max length of our allocation
>> is driven by our max present cpus, where as one of the cpus might be 
>> online
>> and be beyond the max present cpus, due to the hole.
>> So (cpu number / threads per core) value bounds the array index and 
>> leads
>> to memory overflow.
>>
>> Call trace observed during a guard test:
>>
>> Faulting instruction address: 0xc000000000149f1c
>> cpu 0x69: Vector: 380 (Data Access Out of Range) at [c000003fea303420]
>>     pc:c000000000149f1c: prefetch_freepointer+0x14/0x30
>>     lr:c00000000014e0f8: __kmalloc+0x1a8/0x1ac
>>     sp:c000003fea3036a0
>>    msr:9000000000009033
>>    dar:c9c54b2c91dbf6b7
>>   current = 0xc000003fea2c0000
>>   paca    = 0xc00000000fddd880     softe: 3     irq_happened: 0x01
>>     pid   = 1, comm = swapper/104
>> Linux version 4.16.7-openpower1 (smc@smc-desktop) (gcc version 6.4.0
>> (Buildroot 2018.02.1-00006-ga8d1126)) #2 SMP Fri May 4 16:44:54 PDT 2018
>> enter ? for help
>> call trace:
>>      __kmalloc+0x1a8/0x1ac
>>      (unreliable)
>>      init_imc_pmu+0x7f4/0xbf0
>>      opal_imc_counters_probe+0x3fc/0x43c
>>      platform_drv_probe+0x48/0x80
>>      driver_probe_device+0x22c/0x308
>>      __driver_attach+0xa0/0xd8
>>      bus_for_each_dev+0x88/0xb4
>>      driver_attach+0x2c/0x40
>>      bus_add_driver+0x1e8/0x228
>>      driver_register+0xd0/0x114
>>      __platform_driver_register+0x50/0x64
>>      opal_imc_driver_init+0x24/0x38
>>      do_one_initcall+0x150/0x15c
>>      kernel_init_freeable+0x250/0x254
>>      kernel_init+0x1c/0x150
>>      ret_from_kernel_thread+0x5c/0xc8
>>
>> Allocating memory for core-imc based on cpu_possible_mask, which has
>> bit 'cpu' set iff cpu is populatable, will fix this issue.
>>
>> Reported-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com>
>> Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
>> Reviewed-by: Balbir Singh <bsingharora@gmail.com>
>
> I have verified this fix with both normal and kexec boot multiple 
> times when
> system is having GARDed cores. Not seen any crash/memory corruption 
> issues with
> this.
>
> Tested-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com>
>
>

Thanks.

-Anju

>> ---
>>  arch/powerpc/perf/imc-pmu.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
>> index d7532e7..75fb23c 100644
>> --- a/arch/powerpc/perf/imc-pmu.c
>> +++ b/arch/powerpc/perf/imc-pmu.c
>> @@ -1146,7 +1146,7 @@ static int init_nest_pmu_ref(void)
>>
>>  static void cleanup_all_core_imc_memory(void)
>>  {
>> -    int i, nr_cores = DIV_ROUND_UP(num_present_cpus(), 
>> threads_per_core);
>> +    int i, nr_cores = DIV_ROUND_UP(num_possible_cpus(), 
>> threads_per_core);
>>      struct imc_mem_info *ptr = core_imc_pmu->mem_info;
>>      int size = core_imc_pmu->counter_mem_size;
>>
>> @@ -1264,7 +1264,7 @@ static int imc_mem_init(struct imc_pmu *pmu_ptr,
>> struct device_node *parent,
>>          if (!pmu_ptr->pmu.name)
>>              return -ENOMEM;
>>
>> -        nr_cores = DIV_ROUND_UP(num_present_cpus(), threads_per_core);
>> +        nr_cores = DIV_ROUND_UP(num_possible_cpus(), threads_per_core);
>>          pmu_ptr->mem_info = kcalloc(nr_cores, sizeof(struct 
>> imc_mem_info),
>>                                  GFP_KERNEL);
ppaidipe May 16, 2018, 6:48 a.m. | #2
On 2018-05-16 12:05, Anju T Sudhakar wrote:
> Currently memory is allocated for core-imc based on cpu_present_mask,
> which has bit 'cpu' set iff cpu is populated. We use (cpu number / 
> threads
> per core) as the array index to access the memory.
> 
> Under some circumstances firmware marks a CPU as GUARDed CPU and boot 
> the
> system, until cleared of errors, these CPU's are unavailable for all
> subsequent boots. GUARDed CPUs are possible but not present from linux
> view, so it blows a hole when we assume the max length of our 
> allocation
> is driven by our max present cpus, where as one of the cpus might be 
> online
> and be beyond the max present cpus, due to the hole.
> So (cpu number / threads per core) value bounds the array index and 
> leads
> to memory overflow.
> 
> Call trace observed during a guard test:
> 
> Faulting instruction address: 0xc000000000149f1c
> cpu 0x69: Vector: 380 (Data Access Out of Range) at [c000003fea303420]
>     pc:c000000000149f1c: prefetch_freepointer+0x14/0x30
>     lr:c00000000014e0f8: __kmalloc+0x1a8/0x1ac
>     sp:c000003fea3036a0
>    msr:9000000000009033
>    dar:c9c54b2c91dbf6b7
>   current = 0xc000003fea2c0000
>   paca    = 0xc00000000fddd880	 softe: 3	 irq_happened: 0x01
>     pid   = 1, comm = swapper/104
> Linux version 4.16.7-openpower1 (smc@smc-desktop) (gcc version 6.4.0
> (Buildroot 2018.02.1-00006-ga8d1126)) #2 SMP Fri May 4 16:44:54 PDT 
> 2018
> enter ? for help
> call trace:
> 	 __kmalloc+0x1a8/0x1ac
> 	 (unreliable)
> 	 init_imc_pmu+0x7f4/0xbf0
> 	 opal_imc_counters_probe+0x3fc/0x43c
> 	 platform_drv_probe+0x48/0x80
> 	 driver_probe_device+0x22c/0x308
> 	 __driver_attach+0xa0/0xd8
> 	 bus_for_each_dev+0x88/0xb4
> 	 driver_attach+0x2c/0x40
> 	 bus_add_driver+0x1e8/0x228
> 	 driver_register+0xd0/0x114
> 	 __platform_driver_register+0x50/0x64
> 	 opal_imc_driver_init+0x24/0x38
> 	 do_one_initcall+0x150/0x15c
> 	 kernel_init_freeable+0x250/0x254
> 	 kernel_init+0x1c/0x150
> 	 ret_from_kernel_thread+0x5c/0xc8
> 
> Allocating memory for core-imc based on cpu_possible_mask, which has
> bit 'cpu' set iff cpu is populatable, will fix this issue.
> 
> Reported-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com>
> Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
> Reviewed-by: Balbir Singh <bsingharora@gmail.com>

I have verified this fix with both normal and kexec boot multiple times 
when
system is having GARDed cores. Not seen any crash/memory corruption 
issues with
this.

Tested-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com>


> ---
>  arch/powerpc/perf/imc-pmu.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
> index d7532e7..75fb23c 100644
> --- a/arch/powerpc/perf/imc-pmu.c
> +++ b/arch/powerpc/perf/imc-pmu.c
> @@ -1146,7 +1146,7 @@ static int init_nest_pmu_ref(void)
> 
>  static void cleanup_all_core_imc_memory(void)
>  {
> -	int i, nr_cores = DIV_ROUND_UP(num_present_cpus(), threads_per_core);
> +	int i, nr_cores = DIV_ROUND_UP(num_possible_cpus(), 
> threads_per_core);
>  	struct imc_mem_info *ptr = core_imc_pmu->mem_info;
>  	int size = core_imc_pmu->counter_mem_size;
> 
> @@ -1264,7 +1264,7 @@ static int imc_mem_init(struct imc_pmu *pmu_ptr,
> struct device_node *parent,
>  		if (!pmu_ptr->pmu.name)
>  			return -ENOMEM;
> 
> -		nr_cores = DIV_ROUND_UP(num_present_cpus(), threads_per_core);
> +		nr_cores = DIV_ROUND_UP(num_possible_cpus(), threads_per_core);
>  		pmu_ptr->mem_info = kcalloc(nr_cores, sizeof(struct imc_mem_info),
>  								GFP_KERNEL);
Michael Ellerman May 21, 2018, 10:01 a.m. | #3
On Wed, 2018-05-16 at 06:35:18 UTC, Anju T Sudhakar wrote:
> Currently memory is allocated for core-imc based on cpu_present_mask,
> which has bit 'cpu' set iff cpu is populated. We use (cpu number / threads
> per core) as the array index to access the memory.
> 
> Under some circumstances firmware marks a CPU as GUARDed CPU and boot the
> system, until cleared of errors, these CPU's are unavailable for all
> subsequent boots. GUARDed CPUs are possible but not present from linux
> view, so it blows a hole when we assume the max length of our allocation
> is driven by our max present cpus, where as one of the cpus might be online
> and be beyond the max present cpus, due to the hole. 
> So (cpu number / threads per core) value bounds the array index and leads
> to memory overflow.
> 
> Call trace observed during a guard test:
> 
> Faulting instruction address: 0xc000000000149f1c
> cpu 0x69: Vector: 380 (Data Access Out of Range) at [c000003fea303420]
>     pc:c000000000149f1c: prefetch_freepointer+0x14/0x30
>     lr:c00000000014e0f8: __kmalloc+0x1a8/0x1ac
>     sp:c000003fea3036a0
>    msr:9000000000009033
>    dar:c9c54b2c91dbf6b7
>   current = 0xc000003fea2c0000
>   paca    = 0xc00000000fddd880	 softe: 3	 irq_happened: 0x01
>     pid   = 1, comm = swapper/104
> Linux version 4.16.7-openpower1 (smc@smc-desktop) (gcc version 6.4.0
> (Buildroot 2018.02.1-00006-ga8d1126)) #2 SMP Fri May 4 16:44:54 PDT 2018
> enter ? for help
> call trace:
> 	 __kmalloc+0x1a8/0x1ac
> 	 (unreliable)
> 	 init_imc_pmu+0x7f4/0xbf0
> 	 opal_imc_counters_probe+0x3fc/0x43c
> 	 platform_drv_probe+0x48/0x80
> 	 driver_probe_device+0x22c/0x308
> 	 __driver_attach+0xa0/0xd8
> 	 bus_for_each_dev+0x88/0xb4
> 	 driver_attach+0x2c/0x40
> 	 bus_add_driver+0x1e8/0x228
> 	 driver_register+0xd0/0x114
> 	 __platform_driver_register+0x50/0x64
> 	 opal_imc_driver_init+0x24/0x38
> 	 do_one_initcall+0x150/0x15c
> 	 kernel_init_freeable+0x250/0x254
> 	 kernel_init+0x1c/0x150
> 	 ret_from_kernel_thread+0x5c/0xc8
> 
> Allocating memory for core-imc based on cpu_possible_mask, which has
> bit 'cpu' set iff cpu is populatable, will fix this issue.
> 
> Reported-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com>
> Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
> Reviewed-by: Balbir Singh <bsingharora@gmail.com>
> Tested-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com>

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/d2032678e57fc508d7878307badde8

cheers

Patch

diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
index d7532e7..75fb23c 100644
--- a/arch/powerpc/perf/imc-pmu.c
+++ b/arch/powerpc/perf/imc-pmu.c
@@ -1146,7 +1146,7 @@  static int init_nest_pmu_ref(void)
 
 static void cleanup_all_core_imc_memory(void)
 {
-	int i, nr_cores = DIV_ROUND_UP(num_present_cpus(), threads_per_core);
+	int i, nr_cores = DIV_ROUND_UP(num_possible_cpus(), threads_per_core);
 	struct imc_mem_info *ptr = core_imc_pmu->mem_info;
 	int size = core_imc_pmu->counter_mem_size;
 
@@ -1264,7 +1264,7 @@  static int imc_mem_init(struct imc_pmu *pmu_ptr, struct device_node *parent,
 		if (!pmu_ptr->pmu.name)
 			return -ENOMEM;
 
-		nr_cores = DIV_ROUND_UP(num_present_cpus(), threads_per_core);
+		nr_cores = DIV_ROUND_UP(num_possible_cpus(), threads_per_core);
 		pmu_ptr->mem_info = kcalloc(nr_cores, sizeof(struct imc_mem_info),
 								GFP_KERNEL);