[v2] powerpc/book3s: mce: Move add_taint() later in virtual mode.

Submitted by Mahesh Salgaonkar on April 18, 2017, 4:38 p.m.

Details

Message ID 149253349689.12722.1516352063236500616.stgit@jupiter.in.ibm.com
State New

Commit Message

From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

machine_check_early() gets called in real mode. The very first time
add_taint() is called, it prints a warning, which ends up making an OPAL
call (through the OPAL_CALL wrapper) to write it to the console. If the
very first machine check arrives while we are already in OPAL, we are
doomed: OPAL_CALL overwrites PACASAVEDMSR (reached through r13), and when
MCE handling is done, the original OPAL call uses this new MSR on its way
back through opal_return. This usually leads to unexpected behaviour or a
kernel panic. Instead, move the add_taint() call later, into virtual mode,
where it is safe to call.

This is broken with the current firmware level; we have simply been lucky
so far not to take the very first MCE while in OPAL. It is easily
reproducible on Mambo. This should go to stable.

Fixes: 27ea2c420cad ("powerpc: Set the correct kernel taint on machine check errors.")
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/mce.c   |    2 ++
 arch/powerpc/kernel/traps.c |    4 ++--
 2 files changed, 4 insertions(+), 2 deletions(-)

Comments

Daniel Axtens April 20, 2017, 11:09 a.m.
Hi Mahesh,

> diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c
> index a1475e6..b23b323 100644
> --- a/arch/powerpc/kernel/mce.c
> +++ b/arch/powerpc/kernel/mce.c
> @@ -221,6 +221,8 @@ static void machine_check_process_queued_event(struct irq_work *work)
>  {
>  	int index;
>  
> +	add_taint(TAINT_MACHINE_CHECK, LOCKDEP_NOW_UNRELIABLE);
> +
This bit makes sense...

> diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
> index ff365f9..af97e81 100644
> --- a/arch/powerpc/kernel/traps.c
> +++ b/arch/powerpc/kernel/traps.c
> @@ -741,6 +739,8 @@ void machine_check_exception(struct pt_regs *regs)
>  
>  	__this_cpu_inc(irq_stat.mce_exceptions);
>  
> +	add_taint(TAINT_MACHINE_CHECK, LOCKDEP_NOW_UNRELIABLE);
> +

But this bit I'm not sure about.

Isn't machine_check_exception called from asm in
kernel/exceptions-64s.S? As in, it's called really early/in real mode?

Regards,
Daniel

>  	/* See if any machine dependent calls. In theory, we would want
>  	 * to call the CPU first, and call the ppc_md. one if the CPU
>  	 * one returns a positive number. However there is existing code
Michael Ellerman April 21, 2017, 4:07 a.m.
Daniel Axtens <dja@axtens.net> writes:
>> diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c
>> index a1475e6..b23b323 100644
>> --- a/arch/powerpc/kernel/mce.c
>> +++ b/arch/powerpc/kernel/mce.c
>> @@ -221,6 +221,8 @@ static void machine_check_process_queued_event(struct irq_work *work)
>>  {
>>  	int index;
>>  
>> +	add_taint(TAINT_MACHINE_CHECK, LOCKDEP_NOW_UNRELIABLE);
>> +
> This bit makes sense...
>
>> diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
>> index ff365f9..af97e81 100644
>> --- a/arch/powerpc/kernel/traps.c
>> +++ b/arch/powerpc/kernel/traps.c
>> @@ -741,6 +739,8 @@ void machine_check_exception(struct pt_regs *regs)
>>  
>>  	__this_cpu_inc(irq_stat.mce_exceptions);
>>  
>> +	add_taint(TAINT_MACHINE_CHECK, LOCKDEP_NOW_UNRELIABLE);
>> +
>
> But this bit I'm not sure about.
>
> Isn't machine_check_exception called from asm in
> kernel/exceptions-64s.S? As in, it's called really early/in real mode?

It is called from there, in asm, but not from real mode AFAICS.

There's a call from machine_check_common(), we're already in virtual
mode there.

The other call is from unrecover_mce(), and both places that call it do
so via rfid, using PACAKMSR, which should turn on virtual mode.


But none of that really matters. The fundamental issue here is we can't
recursively call OPAL, that's what matters.

So if we were in OPAL and take an MCE, then we must not call OPAL again
from the MCE handler.

This fixes one case where we know that can happen, but AFAICS we are not
protected in general from it.

For example if we take an MCE in OPAL, decide it's not recoverable and
go to unrecover_mce(), that will call machine_check_exception() which
can then call OPAL via printk.

Or maybe there's a check in there somewhere that makes it OK, but it's
not clear to me.

cheers

Patch

diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c
index a1475e6..b23b323 100644
--- a/arch/powerpc/kernel/mce.c
+++ b/arch/powerpc/kernel/mce.c
@@ -221,6 +221,8 @@ static void machine_check_process_queued_event(struct irq_work *work)
 {
 	int index;
 
+	add_taint(TAINT_MACHINE_CHECK, LOCKDEP_NOW_UNRELIABLE);
+
 	/*
 	 * For now just print it to console.
 	 * TODO: log this error event to FSP or nvram.
diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index ff365f9..af97e81 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -306,8 +306,6 @@ long machine_check_early(struct pt_regs *regs)
 
 	__this_cpu_inc(irq_stat.mce_exceptions);
 
-	add_taint(TAINT_MACHINE_CHECK, LOCKDEP_NOW_UNRELIABLE);
-
 	if (cur_cpu_spec && cur_cpu_spec->machine_check_early)
 		handled = cur_cpu_spec->machine_check_early(regs);
 	return handled;
@@ -741,6 +739,8 @@ void machine_check_exception(struct pt_regs *regs)
 
 	__this_cpu_inc(irq_stat.mce_exceptions);
 
+	add_taint(TAINT_MACHINE_CHECK, LOCKDEP_NOW_UNRELIABLE);
+
 	/* See if any machine dependent calls. In theory, we would want
 	 * to call the CPU first, and call the ppc_md. one if the CPU
 	 * one returns a positive number. However there is existing code