
powerpc, kexec: Fix "Processor X is stuck" issue during kexec from ST mode

Message ID 53922FB8.6070408@linux.vnet.ibm.com (mailing list archive)
State Not Applicable

Commit Message

Srivatsa S. Bhat June 6, 2014, 9:16 p.m. UTC
On 06/06/2014 05:59 PM, Srivatsa S. Bhat wrote:
> On 06/04/2014 03:39 AM, Benjamin Herrenschmidt wrote:
>> On Wed, 2014-06-04 at 01:58 +0530, Srivatsa S. Bhat wrote:
>>> Yep, that makes sense. But unfortunately I don't have enough insight into
>>> why exactly powerpc has to online the CPUs before doing a kexec. I just
>>> know from the commit log and the comment mentioned above (and from my own
>>> experiments) that the CPUs will get stuck if they were offline. Perhaps
>>> somebody more knowledgeable can explain this in detail and suggest a proper
>>> long-term solution.
>>>
>>> Matt, Ben, any thoughts on this?
>>
>> The problem is with our "soft offline" which we do on some platforms. When we
>> offline we don't actually send the CPUs back to firmware or anything like that.
>>
>> We put them into a very low power loop inside Linux.
>>
>> The new kernel has no way to extract them from that loop. So we must re-"online"
>> them before we kexec so they can be passed to the new kernel normally (or returned
>> to firmware like we do on powernv).
>>
> 
> Thanks a lot for the explanation Ben!
> 
> I thought about this and this is what I think: whether the CPU is in the kernel
> or in the firmware is a hard-boundary. But once we know it is still in the
> kernel, whether it is online or offline is a soft-boundary, something that
> ideally shouldn't make any difference to kexec.
> 
> Then I looked at what is that special state that kexec expects the online CPUs
> to be in, before performing kexec, and I found that that state is entered via
> kexec_smp_down().
> 
> Which means, if we poke the soft-offline CPUs and make them execute
> kexec_smp_down(), we should be able to do a successful kexec without having to
> actually online them. After all, the core kexec code doesn't mandate that they
> should be online. So if we satisfy powerpc's requirement that all the CPUs are
> in a sane state, that should be good enough. (This would be similar to how the
> subcore code wakes up offline CPUs to perform the split-core procedure).
> 
> I know, this is all theory for now since I haven't tested it yet, but I think
> we can make this work.
> 
> Below are the 4 preliminary patches I have so far to implement this.
> 

And with the following hunk added (which I had forgotten earlier), it worked just
fine on powernv :-)



I tried putting the machine into ST mode, and in a separate experiment, I kept
just CPU 0 online in the first kernel, and then issued a kexec. The second kernel
booted successfully with all the CPUs in both the cases.

I haven't explored the crashed-kernel case though; it might need some auditing
to check whether the code handles that as well.
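
For readers without the tree at hand, here is a minimal userspace sketch of the
optional-hook pattern the hunk relies on: call a platform callback only if the
platform registered one, then walk the CPUs and wake the offline ones. This is
not the kernel code. NR_CPUS, cpu_online_map, prepare_calls and
example_wake_prepare() are hypothetical stand-ins, the struct is reduced to the
one hook, and "waking" is modelled as flipping a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define NR_CPUS 8	/* hypothetical: small CPU count for illustration */

/* Stand-in for the machine-description struct; only the hook matters here. */
struct machdep_calls {
	void (*kexec_wake_prepare)(void);	/* optional, may be NULL */
};

static struct machdep_calls ppc_md;	/* zero-initialised, so the hook is NULL */
static bool cpu_online_map[NR_CPUS];	/* hypothetical model of the online mask */
static int prepare_calls;		/* counts hook invocations */

/* Hypothetical platform hook; a real platform would do its wake-up setup here. */
static void example_wake_prepare(void)
{
	prepare_calls++;
}

/* Returns the number of CPUs that were offline and have been "woken". */
static int wake_offline_cpus(void)
{
	int cpu, woken = 0;

	/* Platforms that need no preparation simply leave the hook NULL. */
	if (ppc_md.kexec_wake_prepare)
		ppc_md.kexec_wake_prepare();

	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		assert(cpu >= 0 && cpu < NR_CPUS);
		if (!cpu_online_map[cpu]) {
			printf("kexec: Waking offline cpu %d.\n", cpu);
			cpu_online_map[cpu] = true;	/* model only: mark as woken */
			woken++;
		}
	}
	return woken;
}
```

The NULL check keeps the generic path unchanged for platforms that do not set
the hook, so only a platform that needs extra setup before the wake-up has to
provide one.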

Regards,
Srivatsa S. Bhat

Comments

Joel Stanley June 12, 2014, 6:39 a.m. UTC | #1
Hi Srivatsa,

On Sat, Jun 7, 2014 at 7:16 AM, Srivatsa S. Bhat
<srivatsa.bhat@linux.vnet.ibm.com> wrote:
> And with the following hunk added (which I had forgotten earlier), it worked just
> fine on powernv :-)

How are the patches coming along?

I just hung a machine here while attempting to kexec. It appears to
have onlined all of the secondary threads, and then hung here:

kexec: Waking offline cpu 1.
kvm: enabling virtualization on CPU1
kexec: Waking offline cpu 2.
kvm: enabling virtualization on CPU2
kexec: Waking offline cpu 3.
kvm: enabling virtualization on CPU3
kexec: Waking offline cpu 5.
kvm: enabling virtualization on CPU5
[...]
kvm: enabling virtualization on CPU63
kexec: waiting for cpu 1 (physical 1) to enter OPAL
kexec: waiting for cpu 2 (physical 2) to enter OPAL
kexec: waiting for cpu 3 (physical 3) to enter OPAL

I'm running benh's next branch as of this morning, and SMT was off.

Could you please post your latest patches as a series? I will test them here.

Cheers,

Joel

Patch

diff --git a/arch/powerpc/kernel/machine_kexec_64.c b/arch/powerpc/kernel/machine_kexec_64.c
index 2ef6c58..84e91293 100644
--- a/arch/powerpc/kernel/machine_kexec_64.c
+++ b/arch/powerpc/kernel/machine_kexec_64.c
@@ -243,6 +243,9 @@  static void wake_offline_cpus(void)
 {
 	int cpu = 0;
 
+	if (ppc_md.kexec_wake_prepare)
+		ppc_md.kexec_wake_prepare();
+
 	for_each_present_cpu(cpu) {
 		if (!cpu_online(cpu)) {
 			printk(KERN_INFO "kexec: Waking offline cpu %d.\n",