Message ID | 150783587481.992.16332072755377718653.stgit@jupiter.in.ibm.com |
---|---|
State | Accepted |
Series | opal/cpu: Mark the core as bad while disabling threads of the core. |
Mahesh J Salgaonkar <mahesh@linux.vnet.ibm.com> writes:
> From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
>
> If any of the core fails to sync its TB during chipTOD initialization,
> all the threads of that core are disabled. But this does not make
> linux kernel to ignore the core/cpus. It crashes while bringing them up
> with below backtrace:
>
> [ 38.883898] kexec_core: Starting new kernel
> cpu 0x0: Vector: 300 (Data Access) at [c0000003f277b730]
>     pc: c0000000001b9890: internal_create_group+0x30/0x304
>     lr: c0000000001b9880: internal_create_group+0x20/0x304
>     sp: c0000003f277b9b0
>    msr: 900000000280b033
>    dar: 40
>  dsisr: 40000000
>   current = 0xc0000003f9f41000
>   paca    = 0xc00000000fe00000 softe: 0 irq_happened: 0x01
>     pid   = 2572, comm = kexec
> Linux version 4.13.2-openpower1 (jenkins@p89) (gcc version 6.4.0 (Buildroot 2017.08-00006-g319c6e1)) #1 SMP Wed Sep 20 05:42:11 UTC 2017
> enter ? for help
> [c0000003f277b9b0] c0000000008a8780 (unreliable)
> [c0000003f277ba50] c00000000041c3ac topology_add_dev+0x2c/0x40
> [c0000003f277ba70] c00000000006b078 cpuhp_invoke_callback+0x88/0x170
> [c0000003f277bac0] c00000000006b22c cpuhp_up_callbacks+0x54/0xb8
> [c0000003f277bb10] c00000000006bc68 cpu_up+0x11c/0x168
> [c0000003f277bbc0] c00000000002f0e0 default_machine_kexec+0x1fc/0x274
> [c0000003f277bc50] c00000000002e2d8 machine_kexec+0x50/0x58
> [c0000003f277bc70] c0000000000de4e8 kernel_kexec+0x98/0xb4
> [c0000003f277bce0] c00000000008b0f0 SyS_reboot+0x1c8/0x1f4
> [c0000003f277be30] c00000000000b118 system_call+0x58/0x6c
> --- Exception: c01 (System Call) at 00007fff7f775074
> SP (7fffe6c7bf10) is in userspace
> 0:mon>
>
> This patch fixes this issue by marking the core status device property as
> "bad".
>
> Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
> ---
>  core/cpu.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)

So, this is certainly an improvement over the current situation, although
we should perhaps think about a centralized way to do things like this
when we discover during boot that a CPU/core shouldn't be used.

Merged to master as of 5b1c330fd0b08d1244d61e7ef85be9475eef9796
diff --git a/core/cpu.c b/core/cpu.c
index 78565b5..be0e451 100644
--- a/core/cpu.c
+++ b/core/cpu.c
@@ -766,14 +766,24 @@ void cpu_remove_node(const struct cpu_thread *t)
 void cpu_disable_all_threads(struct cpu_thread *cpu)
 {
 	unsigned int i;
+	struct dt_property *p;
 
 	for (i = 0; i <= cpu_max_pir; i++) {
 		struct cpu_thread *t = &cpu_stacks[i].cpu;
 
 		if (t->primary == cpu->primary)
 			t->state = cpu_state_disabled;
+
 	}
 
+	/* Mark this core as bad so that Linux kernel don't use this CPU. */
+	prlog(PR_DEBUG, "CPU: Mark CPU bad (PIR 0x%04x)...\n", cpu->pir);
+	p = __dt_find_property(cpu->node, "status");
+	if (p)
+		dt_del_property(cpu->node, p);
+
+	dt_add_property_string(cpu->node, "status", "bad");
+
 	/* XXX Do something to actually stop the core */
 }
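
The effect of replacing the "status" property can be illustrated with a small standalone sketch. It is not skiboot or kernel code: `toy_node`, `toy_prop`, `toy_find_prop`, `toy_set_prop` and `toy_node_available` are hypothetical names invented for this example. It assumes the usual device-tree convention that a consumer treats a node as usable only when "status" is absent or reads "okay" (or "ok"). Unlike the patch, which deletes the old property and adds a new one, the sketch simply overwrites the value in place.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for a device-tree node.
 * These are NOT skiboot's struct dt_node / struct dt_property. */
#define TOY_MAX_PROPS 8

struct toy_prop {
	const char *name;
	const char *value;
};

struct toy_node {
	struct toy_prop props[TOY_MAX_PROPS];
	size_t nprops;
};

/* Return the property called `name`, or NULL if the node has none. */
static struct toy_prop *toy_find_prop(struct toy_node *n, const char *name)
{
	size_t i;

	assert(n != NULL);
	for (i = 0; i < n->nprops; i++)
		if (strcmp(n->props[i].name, name) == 0)
			return &n->props[i];
	return NULL;
}

/* Set `name` to `value`, replacing any existing property of that name.
 * Returns 0 on success, -1 if the node has no room for a new property. */
static int toy_set_prop(struct toy_node *n, const char *name,
			const char *value)
{
	struct toy_prop *p = toy_find_prop(n, name);

	if (p) {
		p->value = value;
		return 0;
	}
	if (n->nprops == TOY_MAX_PROPS)
		return -1;
	n->props[n->nprops].name = name;
	n->props[n->nprops].value = value;
	n->nprops++;
	return 0;
}

/* Consumer-side check (assumed convention): the node is usable when
 * "status" is absent or is "okay"/"ok"; any other value, such as
 * "bad", makes it unavailable. */
static int toy_node_available(struct toy_node *n)
{
	struct toy_prop *p = toy_find_prop(n, "status");

	if (!p)
		return 1;
	return strcmp(p->value, "okay") == 0 || strcmp(p->value, "ok") == 0;
}
```

Calling `toy_set_prop(&core, "status", "bad")` on a node that already carries `status = "okay"` leaves exactly one "status" property, and `toy_node_available(&core)` then returns 0. That single-property outcome is the one the patch aims for by removing any existing "status" before adding the new one.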