
[RESEND] For machine check occurring while in guest, KVM layer tries recovery

Message ID 20150317092707.16806.62378.stgit@mars
State Rejected
Delegated to: Paul Mackerras

Commit Message

Mahesh J Salgaonkar March 17, 2015, 9:27 a.m. UTC
From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

and delivers the MCE to the guest if recovery fails. For recovered errors
we simply resume normal guest execution. However, there are cases where
we may hit an MCE in the guest with MSR(RI=0), which means the MCE
interrupt is not recoverable: the guest cannot continue normally and
should go down the panic path. The current implementation does not check
for MSR(RI=0), which can cause the guest to crash with "Bad kernel stack
pointer" instead of a machine check oops message.

[26281.490060] Bad kernel stack pointer 3fff9ccce5b0 at c00000000000490c
[26281.490434] Oops: Bad kernel stack pointer, sig: 6 [#1]
[26281.490472] SMP NR_CPUS=2048 NUMA pSeries

This patch fixes the issue by checking for MSR(RI=0) in the KVM layer and
forwarding the unrecoverable interrupt to the guest, which then panics
with a proper machine check Oops message.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S |   12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

Comments

Paul Mackerras March 23, 2015, 3:32 a.m. UTC
On Tue, Mar 17, 2015 at 02:57:48PM +0530, Mahesh J Salgaonkar wrote:
> From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
> 
> and deliver MCE to guest if recovery is failed. For recovered errors
> we just go back to normal functioning of guest. But there are cases
> where we may hit MCE in guest with MSR(RI=0), which means MCE interrupt is
> not recoverable and guest can not function normally it should go down to
> panic path. The current implementation does not have check for MSR(RI=0)
> which can cause guest to crash with Bad kernel stack pointer instead of
> machine check oops message.
> 
> [26281.490060] Bad kernel stack pointer 3fff9ccce5b0 at c00000000000490c
> [26281.490434] Oops: Bad kernel stack pointer, sig: 6 [#1]
> [26281.490472] SMP NR_CPUS=2048 NUMA pSeries
> 
> This patch fixes this issue by checking MSR(RI=0) in KVM layer and forwarding
> unrecoverable interrupt to guest which then panics with proper machine check
> Oops message.
> 
> Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
> ---
>  arch/powerpc/kvm/book3s_hv_rmhandlers.S |   12 ++++++++----
>  1 file changed, 8 insertions(+), 4 deletions(-)

The patch itself is fine, but you need a proper headline (something
like "KVM: PPC: Book3S HV: Inform guest of unrecoverable machine
checks" perhaps) as the subject of the email, and you need to post the
patch to both the kvm@vger.kernel.org list and the
kvm-ppc@vger.kernel.org list.  Also, the English in the patch
description could use some improvement.

Acked-by: Paul Mackerras <paulus@samba.org>

Patch

diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index bb94e6f..258f46d 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -2063,7 +2063,6 @@  machine_check_realmode:
 	mr	r3, r9		/* get vcpu pointer */
 	bl	kvmppc_realmode_machine_check
 	nop
-	cmpdi	r3, 0		/* Did we handle MCE ? */
 	ld	r9, HSTATE_KVM_VCPU(r13)
 	li	r12, BOOK3S_INTERRUPT_MACHINE_CHECK
 	/*
@@ -2076,13 +2075,18 @@  machine_check_realmode:
 	 * The old code used to return to host for unhandled errors which
 	 * was causing guest to hang with soft lockups inside guest and
 	 * makes it difficult to recover guest instance.
+	 *
+	 * if we receive machine check with MSR(RI=0) then deliver it to
+	 * guest as machine check causing guest to crash.
 	 */
-	ld	r10, VCPU_PC(r9)
 	ld	r11, VCPU_MSR(r9)
+	andi.	r10, r11, MSR_RI	/* check for unrecoverable exception */
+	beq	1f			/* Deliver a machine check to guest */
+	ld	r10, VCPU_PC(r9)
+	cmpdi	r3, 0		/* Did we handle MCE ? */
 	bne	2f	/* Continue guest execution. */
 	/* If not, deliver a machine check.  SRR0/1 are already set */
-	li	r10, BOOK3S_INTERRUPT_MACHINE_CHECK
-	ld	r11, VCPU_MSR(r9)
+1:	li	r10, BOOK3S_INTERRUPT_MACHINE_CHECK
 	bl	kvmppc_msr_interrupt
 2:	b	fast_interrupt_c_return