powerpc/perf: Fix nest-imc cpuhotplug callback failure

Anju T Sudhakar Dec. 5, 2017, 5:30 a.m.
Call trace observed during boot:                                                
Faulting instruction address: 0xc000000000248340                                
cpu 0x0: Vector: 380 (Data Access Out of Range) at [c000000ff66fb850]           
    pc: c000000000248340: event_function_call+0x50/0x1f0                        
    lr: c00000000024878c: perf_remove_from_context+0x3c/0x100                   
    sp: c000000ff66fbad0                                                        
   msr: 9000000000009033                                                        
   dar: 7d20e2a6f92d03c0                                                        
  current = 0xc000000ff6679200                                                  
  paca    = 0xc00000000fd40000   softe: 0  irq_happened: 0x01                   
    pid   = 14, comm = cpuhp/0                                                  
Linux version 4.14.0-rc2-42789-ge8eae4b (rgrimm@XXXX) (gcc version 5.4.0        
20160609 (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.4)) #1 SMP Thu Nov 16 14:35:14 CST    
enter ? for help                                                                
[c000000ff66fbb80] c00000000024878c perf_remove_from_context+0x3c/0x100         
[c000000ff66fbbc0] c00000000024e84c perf_pmu_migrate_context+0x10c/0x380        
[c000000ff66fbc60] c0000000000ca050 ppc_nest_imc_cpu_offline+0x1b0/0x210        
[c000000ff66fbcb0] c0000000000d5d54 cpuhp_invoke_callback+0x194/0x620           
[c000000ff66fbd20] c0000000000d702c cpuhp_thread_fun+0x7c/0x1b0                 
[c000000ff66fbd60] c00000000010ad90 smpboot_thread_fn+0x290/0x2a0               
[c000000ff66fbdc0] c000000000104818 kthread+0x168/0x1b0                         
[c000000ff66fbe30] c00000000000b5a0 ret_from_kernel_thread+0x5c/0xbc            

While registering the cpuhotplug callbacks for nest-imc, if the cpuhotplug
online path fails for any node in a multi-node system (because the OPAL call
to stop the nest-imc counters fails for that node),
ppc_nest_imc_cpu_offline() gets invoked for the other nodes that returned
successfully from the cpuhotplug online path.

The call trace above is generated because the ppc_nest_imc_cpu_offline()
path tries to migrate the event context when the nest-imc counters are not
even initialized.

Add a check to ensure that nest-imc is registered before migrating the
event context.
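
The failure sequence described above can be modelled in user space. The
following is a minimal sketch, not kernel code: struct sim_state,
sim_online(), sim_offline(), sim_setup_state(), NR_NODES and FAILING_NODE
are hypothetical names invented for illustration, and the per-CPU hotplug
machinery is simplified to one callback per node. It assumes that a failed
online callback causes the offline callback to run for every node that had
already come online, and it counts how often the offline path would reach
the context migration step while the PMU is still unregistered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_NODES     4
#define FAILING_NODE 2	/* assumption: the online callback fails on this node */

/* Hypothetical stand-ins for the driver state referenced in the patch. */
struct sim_state {
	const void *nest_pmus;		/* NULL until the PMU is registered */
	bool designated[NR_NODES];	/* models nest_imc_cpumask */
	int migrations;			/* migrations attempted without a PMU */
};

/* Online path: fails on FAILING_NODE, modelling the failed OPAL call. */
static int sim_online(struct sim_state *s, int node)
{
	assert(node >= 0 && node < NR_NODES);
	if (node == FAILING_NODE)
		return -1;
	s->designated[node] = true;
	return 0;
}

/* Offline path: 'guarded' selects the behaviour with the added check. */
static int sim_offline(struct sim_state *s, int node, bool guarded)
{
	if (!s->designated[node])
		return 0;
	s->designated[node] = false;

	/* The added check: nothing to migrate if the PMU is not registered. */
	if (guarded && !s->nest_pmus)
		return 0;

	/* In the kernel this is where the event context would be migrated. */
	s->migrations++;
	return 0;
}

/* Models callback registration with rollback on the first failure. */
static int sim_setup_state(struct sim_state *s, bool guarded)
{
	for (int node = 0; node < NR_NODES; node++) {
		int ret = sim_online(s, node);

		if (ret) {
			/* Undo the nodes that already came online. */
			while (--node >= 0)
				sim_offline(s, node, guarded);
			return ret;
		}
	}
	return 0;
}
```

With a zero-initialized struct sim_state, sim_setup_state(&s, false) returns
the online error and leaves s.migrations at 2 (nodes 0 and 1 are rolled
back), whereas sim_setup_state(&s, true) returns the same error with
s.migrations at 0. The error from the online path still reaches the caller
in both cases, which is why the offline path can safely return zero.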

Madhavan Srinivasan has recently sent a skiboot patch that adds a check in
the skiboot code to make sure that the microcode is initialized on all the
chips before enabling the nest units.

Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>  
Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
 arch/powerpc/perf/imc-pmu.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)


diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
index 0ead3cd..9daa1c3 100644
--- a/arch/powerpc/perf/imc-pmu.c
+++ b/arch/powerpc/perf/imc-pmu.c
@@ -309,6 +309,20 @@  static int ppc_nest_imc_cpu_offline(unsigned int cpu)
 	if (!cpumask_test_and_clear_cpu(cpu, &nest_imc_cpumask))
 		return 0;
 
+	/*
+	 * Check whether nest_imc is registered. We could end up here
+	 * if the cpuhotplug callback registration fails. i.e. callback
+	 * invokes the offline path for all successfully registered nodes.
+	 * At this stage, nest_imc pmu will not be registered and we
+	 * should return here.
+	 *
+	 * We return with a zero since this is not an offline failure.
+	 * And cpuhp_setup_state() returns the actual failure reason
+	 * to the caller, which in turn will call the cleanup routine.
+	 */
+	if (!nest_pmus)
+		return 0;
+
 	/*
 	 * Now that this cpu is one of the designated,
 	 * find a next cpu a) which is online and b) in same chip.