diff mbox

[RFC,7/9] powerpc: Decode and save machine check event.

Message ID 20130807093913.5389.80206.stgit@mars.in.ibm.com (mailing list archive)
State Changes Requested
Headers show

Commit Message

Mahesh J Salgaonkar Aug. 7, 2013, 9:39 a.m. UTC
From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

Now that we handle machine check in linux, the MCE decoding should also
take place in linux host. This info is crucial to log before we go down
in case we can not handle the machine check errors. This patch decodes
and populates a machine check event which contain high level meaning full
MCE information.

We do this in real mode C code with ME bit on. The MCE information is still
available on emergency stack (in pt_regs structure format). Even if we take
another exception at this point the MCE early handler will allocate a new
stack frame on top of current one. So when we return back here we still have
our MCE information safe on current stack.

We use per cpu buffer to save high level MCE information. Each per cpu buffer
is an array of machine check event structure indexed by per cpu counter
mce_nest_count. The mce_nest_count is incremented every time we enter
machine check early handler in real mode to get the current free slot
(index = mce_nest_count - 1). The mce_nest_count is decremented once the
MCE info is consumed by virtual mode machine exception handler.

This patch provides save_mce_event and get_mce_event generic routine that
can be used by machine check handlers to populate and retrieve the event.

This patch also updates kvm code to invoke get_mce_event to retrieve generic
mce event rather than paca->opal_mce_evt.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/mce.h        |  119 ++++++++++++++++++++++++++
 arch/powerpc/kernel/Makefile          |    2 
 arch/powerpc/kernel/mce.c             |  152 +++++++++++++++++++++++++++++++++
 arch/powerpc/kernel/mce_power.c       |  116 +++++++++++++++++++++++--
 arch/powerpc/kvm/book3s_hv_ras.c      |   24 ++---
 arch/powerpc/platforms/powernv/opal.c |   35 ++++----
 6 files changed, 407 insertions(+), 41 deletions(-)
 create mode 100644 arch/powerpc/kernel/mce.c

Comments

Scott Wood Aug. 7, 2013, 6:41 p.m. UTC | #1
On Wed, 2013-08-07 at 15:09 +0530, Mahesh J Salgaonkar wrote:
> diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
> index a1aba53..0b5b04a 100644
> --- a/arch/powerpc/kernel/Makefile
> +++ b/arch/powerpc/kernel/Makefile
> @@ -35,7 +35,7 @@ obj-y				:= cputable.o ptrace.o syscalls.o \
>  				   misc_$(CONFIG_WORD_SIZE).o vdso32/
>  obj-$(CONFIG_PPC64)		+= setup_64.o sys_ppc32.o \
>  				   signal_64.o ptrace32.o \
> -				   paca.o nvram_64.o firmware.o
> +				   paca.o nvram_64.o firmware.o mce.o

What happens when we build mce.o on book3e?

Please don't take things that are specific to certain hardware and try
to apply them to all powerpc.

For that matter, it would be nice if book3s-specific (or POWER-specific)
patches had that in the subject prefix, just as booke-specific patches
do.

-Scott
Mahesh J Salgaonkar Aug. 8, 2013, 3:40 a.m. UTC | #2
On 08/08/2013 12:11 AM, Scott Wood wrote:
> On Wed, 2013-08-07 at 15:09 +0530, Mahesh J Salgaonkar wrote:
>> diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
>> index a1aba53..0b5b04a 100644
>> --- a/arch/powerpc/kernel/Makefile
>> +++ b/arch/powerpc/kernel/Makefile
>> @@ -35,7 +35,7 @@ obj-y				:= cputable.o ptrace.o syscalls.o \
>>  				   misc_$(CONFIG_WORD_SIZE).o vdso32/
>>  obj-$(CONFIG_PPC64)		+= setup_64.o sys_ppc32.o \
>>  				   signal_64.o ptrace32.o \
>> -				   paca.o nvram_64.o firmware.o
>> +				   paca.o nvram_64.o firmware.o mce.o
> 
> What happens when we build mce.o on book3e?
> 
> Please don't take things that are specific to certain hardware and try
> to apply them to all powerpc.

sure. I will move it under obj-$(CONFIG_PPC_BOOK3S_64).

> 
> For that matter, it would be nice if book3s-specific (or POWER-specific)
> patches had that in the subject prefix, just as booke-specific patches
> do.

Sure will follow this for future versions.

Thanks,
-Mahesh.
Paul Mackerras Aug. 8, 2013, 5:14 a.m. UTC | #3
On Wed, Aug 07, 2013 at 03:09:13PM +0530, Mahesh J Salgaonkar wrote:
> From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
> 
> Now that we handle machine check in linux, the MCE decoding should also
> take place in linux host. This info is crucial to log before we go down
> in case we can not handle the machine check errors. This patch decodes
> and populates a machine check event which contain high level meaning full
> MCE information.

A couple of things worry me about this patch:

First, there is the fact that we can only do get_mce_event() once for
a given machine check.  You call it in kvmppc_realmode_mc_power7(),
which is fine, but if it is not something we recognize and can handle
we will proceed to exit the guest and jump to machine_check_fwnmi,
which will then proceed to machine_check_common() and then
opal_machine_check(), where you have added another call to
get_mce_event(), which will probably underflow your little per-cpu
stack of machine check events.

Secondly, we shouldn't call save_mce_event() if we're not in
hypervisor mode, since per-cpu variables are not in general accessible
in real mode when running under a hypervisor with a limited real-mode
area (RMA).

Paul.
Mahesh J Salgaonkar Aug. 8, 2013, 1:19 p.m. UTC | #4
On 08/08/2013 10:44 AM, Paul Mackerras wrote:
> On Wed, Aug 07, 2013 at 03:09:13PM +0530, Mahesh J Salgaonkar wrote:
>> From: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
>>
>> Now that we handle machine check in linux, the MCE decoding should also
>> take place in linux host. This info is crucial to log before we go down
>> in case we can not handle the machine check errors. This patch decodes
>> and populates a machine check event which contain high level meaning full
>> MCE information.
> 
> A couple of things worry me about this patch:
> 
> First, there is the fact that we can only do get_mce_event() once for
> a given machine check.  You call it in kvmppc_realmode_mc_power7(),
> which is fine, but if it is not something we recognize and can handle
> we will proceed to exit the guest and jump to machine_check_fwnmi,
> which will then proceed to machine_check_common() and then
> opal_machine_check(), where you have added another call to
> get_mce_event(), which will probably underflow your little per-cpu
> stack of machine check events.

Ouch! I missed that. Will work on fixing it.

> 
> Secondly, we shouldn't call save_mce_event() if we're not in
> hypervisor mode, since per-cpu variables are not in general accessible
> in real mode when running under a hypervisor with a limited real-mode
> area (RMA).

Does that mean in real mode we can never be able to access per cpu
variable? OR do I need to use some tweaks to access those?

Thanks,
-Mahesh.
Benjamin Herrenschmidt Aug. 8, 2013, 1:33 p.m. UTC | #5
On Thu, 2013-08-08 at 18:49 +0530, Mahesh Jagannath Salgaonkar wrote:
> > Secondly, we shouldn't call save_mce_event() if we're not in
> > hypervisor mode, since per-cpu variables are not in general
> accessible
> > in real mode when running under a hypervisor with a limited
> real-mode
> > area (RMA).
> 
> Does that mean in real mode we can never be able to access per cpu
> variable? OR do I need to use some tweaks to access those?

Right, all you can access is stuff that was specifically allocated to be
in the RMA during boot, such as the PACA's.

Ben.
diff mbox

Patch

diff --git a/arch/powerpc/include/asm/mce.h b/arch/powerpc/include/asm/mce.h
index 6866062..cf57f3d 100644
--- a/arch/powerpc/include/asm/mce.h
+++ b/arch/powerpc/include/asm/mce.h
@@ -66,5 +66,124 @@ 
 
 #define P8_DSISR_MC_SLB_ERRORS		(P7_DSISR_MC_SLB_ERRORS | \
 					 P8_DSISR_MC_ERAT_MULTIHIT_SEC)
+enum MCE_Version {
+	MCE_V1 = 1,
+};
+
+enum MCE_Severity {
+	MCE_SEV_NO_ERROR = 0,
+	MCE_SEV_WARNING = 1,
+	MCE_SEV_ERROR_SYNC = 2,
+	MCE_SEV_FATAL = 3,
+};
+
+enum MCE_Disposition {
+	MCE_DISPOSITION_RECOVERED = 0,
+	MCE_DISPOSITION_NOT_RECOVERED = 1,
+};
+
+enum MCE_Initiator {
+	MCE_INITIATOR_UNKNOWN = 0,
+	MCE_INITIATOR_CPU = 1,
+};
+
+enum MCE_ErrorType {
+	MCE_ERROR_TYPE_UNKNOWN = 0,
+	MCE_ERROR_TYPE_UE = 1,
+	MCE_ERROR_TYPE_SLB = 2,
+	MCE_ERROR_TYPE_ERAT = 3,
+	MCE_ERROR_TYPE_TLB = 4,
+};
+
+enum MCE_UeErrorType {
+	MCE_UE_ERROR_INDETERMINATE = 0,
+	MCE_UE_ERROR_IFETCH = 1,
+	MCE_UE_ERROR_PAGE_TABLE_WALK_IFETCH = 2,
+	MCE_UE_ERROR_LOAD_STORE = 3,
+	MCE_UE_ERROR_PAGE_TABLE_WALK_LOAD_STORE = 4,
+};
+
+enum MCE_SlbErrorType {
+	MCE_SLB_ERROR_INDETERMINATE = 0,
+	MCE_SLB_ERROR_PARITY = 1,
+	MCE_SLB_ERROR_MULTIHIT = 2,
+};
+
+enum MCE_EratErrorType {
+	MCE_ERAT_ERROR_INDETERMINATE = 0,
+	MCE_ERAT_ERROR_PARITY = 1,
+	MCE_ERAT_ERROR_MULTIHIT = 2,
+};
+
+enum MCE_TlbErrorType {
+	MCE_TLB_ERROR_INDETERMINATE = 0,
+	MCE_TLB_ERROR_PARITY = 1,
+	MCE_TLB_ERROR_MULTIHIT = 2,
+};
+
+struct machine_check_event {
+	enum MCE_Version	version:8;	/* 0x00 */
+	uint8_t			in_use;		/* 0x01 */
+	enum MCE_Severity	severity:8;	/* 0x02 */
+	enum MCE_Initiator	initiator:8;	/* 0x03 */
+	enum MCE_ErrorType	error_type:8;	/* 0x04 */
+	enum MCE_Disposition	disposition:8;	/* 0x05 */
+	uint8_t			reserved_1[2];	/* 0x06 */
+	uint64_t		gpr3;		/* 0x08 */
+	uint64_t		srr0;		/* 0x10 */
+	uint64_t		srr1;		/* 0x18 */
+	union {					/* 0x20 */
+		struct {
+			enum MCE_UeErrorType ue_error_type:8;
+			uint8_t		effective_address_provided;
+			uint8_t		physical_address_provided;
+			uint8_t		reserved_1[5];
+			uint64_t	effective_address;
+			uint64_t	physical_address;
+			uint8_t		reserved_2[8];
+		} ue_error;
+
+		struct {
+			enum MCE_SlbErrorType slb_error_type:8;
+			uint8_t		effective_address_provided;
+			uint8_t		reserved_1[6];
+			uint64_t	effective_address;
+			uint8_t		reserved_2[16];
+		} slb_error;
+
+		struct {
+			enum MCE_EratErrorType erat_error_type:8;
+			uint8_t		effective_address_provided;
+			uint8_t		reserved_1[6];
+			uint64_t	effective_address;
+			uint8_t		reserved_2[16];
+		} erat_error;
+
+		struct {
+			enum MCE_TlbErrorType tlb_error_type:8;
+			uint8_t		effective_address_provided;
+			uint8_t		reserved_1[6];
+			uint64_t	effective_address;
+			uint8_t		reserved_2[16];
+		} tlb_error;
+	} u;
+};
+
+struct mce_error_info {
+	enum MCE_ErrorType error_type:8;
+	union {
+		enum MCE_UeErrorType ue_error_type:8;
+		enum MCE_SlbErrorType slb_error_type:8;
+		enum MCE_EratErrorType erat_error_type:8;
+		enum MCE_TlbErrorType tlb_error_type:8;
+	} u;
+	uint8_t		reserved[2];
+};
+
+#define MAX_MC_EVT	100
+
+extern void save_mce_event(struct pt_regs *regs, long handled,
+			   struct mce_error_info *mce_err, uint64_t addr);
+extern int get_mce_event(struct machine_check_event *mce);
 
 #endif /* __ASM_PPC64_MCE_H__ */
diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index a1aba53..0b5b04a 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -35,7 +35,7 @@  obj-y				:= cputable.o ptrace.o syscalls.o \
 				   misc_$(CONFIG_WORD_SIZE).o vdso32/
 obj-$(CONFIG_PPC64)		+= setup_64.o sys_ppc32.o \
 				   signal_64.o ptrace32.o \
-				   paca.o nvram_64.o firmware.o
+				   paca.o nvram_64.o firmware.o mce.o
 obj-$(CONFIG_HAVE_HW_BREAKPOINT)	+= hw_breakpoint.o
 obj-$(CONFIG_PPC_BOOK3S_64)	+= cpu_setup_ppc970.o cpu_setup_pa6t.o
 obj-$(CONFIG_PPC_BOOK3S_64)	+= cpu_setup_power.o
diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c
new file mode 100644
index 0000000..a523d8b
--- /dev/null
+++ b/arch/powerpc/kernel/mce.c
@@ -0,0 +1,152 @@ 
+/*
+ * Machine check exception handling.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright 2013 IBM Corporation
+ * Author: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
+ */
+
+#undef DEBUG
+#define pr_fmt(fmt) "mce: " fmt
+
+#include <linux/types.h>
+#include <linux/ptrace.h>
+#include <linux/percpu.h>
+#include <linux/export.h>
+#include <asm/mce.h>
+
+static DEFINE_PER_CPU(int, mce_nest_count);
+static DEFINE_PER_CPU(struct machine_check_event[MAX_MC_EVT], mce_event);
+
+static void mce_set_error_info(struct machine_check_event *mce,
+			       struct mce_error_info *mce_err)
+{
+	mce->error_type = mce_err->error_type;
+	switch (mce_err->error_type) {
+	case MCE_ERROR_TYPE_UE:
+		mce->u.ue_error.ue_error_type = mce_err->u.ue_error_type;
+		break;
+	case MCE_ERROR_TYPE_SLB:
+		mce->u.slb_error.slb_error_type = mce_err->u.slb_error_type;
+		break;
+	case MCE_ERROR_TYPE_ERAT:
+		mce->u.erat_error.erat_error_type = mce_err->u.erat_error_type;
+		break;
+	case MCE_ERROR_TYPE_TLB:
+		mce->u.tlb_error.tlb_error_type = mce_err->u.tlb_error_type;
+		break;
+	case MCE_ERROR_TYPE_UNKNOWN:
+	default:
+		break;
+	}
+}
+
+/*
+ * Decode and save high level MCE information into per cpu buffer which
+ * is an array of machine_check_event structure.
+ */
+void save_mce_event(struct pt_regs *regs, long handled,
+		    struct mce_error_info *mce_err,
+		    uint64_t addr)
+{
+	uint64_t srr1;
+	int index = __get_cpu_var(mce_nest_count)++;
+	struct machine_check_event *mce = &__get_cpu_var(mce_event[index]);
+
+	/*
+	 * Return if we don't have enough space to log mce event.
+	 * mce_nest_count may go beyond MAX_MC_EVT but that's ok,
+	 * the check below will stop buffer overrun.
+	 */
+	if (index >= MAX_MC_EVT)
+		return;
+
+	/* Populate generic machine check info */
+	mce->version = MCE_V1;
+	mce->srr0 = regs->nip;
+	mce->srr1 = regs->msr;
+	mce->gpr3 = regs->gpr[3];
+	mce->in_use = 1;
+
+	mce->initiator = MCE_INITIATOR_CPU;
+	if (handled)
+		mce->disposition = MCE_DISPOSITION_RECOVERED;
+	else
+		mce->disposition = MCE_DISPOSITION_NOT_RECOVERED;
+	mce->severity = MCE_SEV_ERROR_SYNC;
+
+	srr1 = regs->msr;
+
+	/*
+	 * Populate the mce error_type and type-specific error_type.
+	 */
+	mce_set_error_info(mce, mce_err);
+
+	if (!addr)
+		return;
+
+	if (mce->error_type == MCE_ERROR_TYPE_TLB) {
+		mce->u.tlb_error.effective_address_provided = true;
+		mce->u.tlb_error.effective_address = addr;
+	} else if (mce->error_type == MCE_ERROR_TYPE_SLB) {
+		mce->u.slb_error.effective_address_provided = true;
+		mce->u.slb_error.effective_address = addr;
+	} else if (mce->error_type == MCE_ERROR_TYPE_ERAT) {
+		mce->u.erat_error.effective_address_provided = true;
+		mce->u.erat_error.effective_address = addr;
+	} else if (mce->error_type == MCE_ERROR_TYPE_UE) {
+		mce->u.ue_error.effective_address_provided = true;
+		mce->u.ue_error.effective_address = addr;
+	}
+	return;
+}
+
+/*
+ * get_mce_event:
+ *	mce	Pointer to machine_check_event structure to be filled.
+ *
+ *	return	1 = success
+ *		0 = failure
+ *
+ * get_mce_event() will be called by platform specific machine check
+ * handle routine and in KVM.
+ * When we call get_mce_event(), we are still in interrupt context and
+ * preemption will not be scheduled until ret_from_expect() routine
+ * is called.
+ */
+int get_mce_event(struct machine_check_event *mce)
+{
+	int index = __get_cpu_var(mce_nest_count) - 1;
+	struct machine_check_event *mc_evt;
+	int ret = 0;
+
+	/* Sanity check */
+	if (!mce || (index < 0))
+		return ret;
+
+	/* Check if we have MCE info to process. */
+	if (index < MAX_MC_EVT) {
+		mc_evt = &__get_cpu_var(mce_event[index]);
+		/* Copy the event structure and release the original */
+		*mce = *mc_evt;
+		mc_evt->in_use = 0;
+		ret = 1;
+	}
+	/* Decrement the count to free the slot */
+	__get_cpu_var(mce_nest_count)--;
+
+	return ret;
+}
diff --git a/arch/powerpc/kernel/mce_power.c b/arch/powerpc/kernel/mce_power.c
index 949d102..c153d9c 100644
--- a/arch/powerpc/kernel/mce_power.c
+++ b/arch/powerpc/kernel/mce_power.c
@@ -136,22 +136,116 @@  static long mce_handle_ierror_p7(uint64_t srr1)
 	return handled;
 }
 
+static void mce_get_common_ierror(struct mce_error_info *mce_err, uint64_t srr1)
+{
+	switch (P7_SRR1_MC_IFETCH(srr1)) {
+	case P7_SRR1_MC_IFETCH_SLB_PARITY:
+		mce_err->error_type = MCE_ERROR_TYPE_SLB;
+		mce_err->u.slb_error_type = MCE_SLB_ERROR_PARITY;
+		break;
+	case P7_SRR1_MC_IFETCH_SLB_MULTIHIT:
+		mce_err->error_type = MCE_ERROR_TYPE_SLB;
+		mce_err->u.slb_error_type = MCE_SLB_ERROR_MULTIHIT;
+		break;
+	case P7_SRR1_MC_IFETCH_TLB_MULTIHIT:
+		mce_err->error_type = MCE_ERROR_TYPE_TLB;
+		mce_err->u.tlb_error_type = MCE_TLB_ERROR_MULTIHIT;
+		break;
+	case P7_SRR1_MC_IFETCH_UE:
+	case P7_SRR1_MC_IFETCH_UE_IFU_INTERNAL:
+		mce_err->error_type = MCE_ERROR_TYPE_UE;
+		mce_err->u.ue_error_type = MCE_UE_ERROR_IFETCH;
+		break;
+	case P7_SRR1_MC_IFETCH_UE_TLB_RELOAD:
+		mce_err->error_type = MCE_ERROR_TYPE_UE;
+		mce_err->u.ue_error_type =
+				MCE_UE_ERROR_PAGE_TABLE_WALK_IFETCH;
+		break;
+	}
+}
+
+static void mce_get_ierror_p7(struct mce_error_info *mce_err, uint64_t srr1)
+{
+	mce_get_common_ierror(mce_err, srr1);
+	if (P7_SRR1_MC_IFETCH(srr1) == P7_SRR1_MC_IFETCH_SLB_BOTH) {
+		mce_err->error_type = MCE_ERROR_TYPE_SLB;
+		mce_err->u.slb_error_type = MCE_SLB_ERROR_INDETERMINATE;
+	}
+}
+
+static void mce_get_derror_p7(struct mce_error_info *mce_err, uint64_t dsisr)
+{
+	if (dsisr & P7_DSISR_MC_UE) {
+		mce_err->error_type = MCE_ERROR_TYPE_UE;
+		mce_err->u.ue_error_type = MCE_UE_ERROR_LOAD_STORE;
+	} else if (dsisr & P7_DSISR_MC_UE_TABLEWALK) {
+		mce_err->error_type = MCE_ERROR_TYPE_UE;
+		mce_err->u.ue_error_type =
+				MCE_UE_ERROR_PAGE_TABLE_WALK_LOAD_STORE;
+	} else if (dsisr & P7_DSISR_MC_ERAT_MULTIHIT) {
+		mce_err->error_type = MCE_ERROR_TYPE_ERAT;
+		mce_err->u.erat_error_type = MCE_ERAT_ERROR_MULTIHIT;
+	} else if (dsisr & P7_DSISR_MC_SLB_MULTIHIT) {
+		mce_err->error_type = MCE_ERROR_TYPE_SLB;
+		mce_err->u.slb_error_type = MCE_SLB_ERROR_MULTIHIT;
+	} else if (dsisr & P7_DSISR_MC_SLB_PARITY_MFSLB) {
+		mce_err->error_type = MCE_ERROR_TYPE_SLB;
+		mce_err->u.slb_error_type = MCE_SLB_ERROR_PARITY;
+	} else if (dsisr & P7_DSISR_MC_TLB_MULTIHIT_MFTLB) {
+		mce_err->error_type = MCE_ERROR_TYPE_TLB;
+		mce_err->u.tlb_error_type = MCE_TLB_ERROR_MULTIHIT;
+	} else if (dsisr & P7_DSISR_MC_SLB_MULTIHIT_PARITY) {
+		mce_err->error_type = MCE_ERROR_TYPE_SLB;
+		mce_err->u.slb_error_type = MCE_SLB_ERROR_INDETERMINATE;
+	}
+}
+
 long __machine_check_early_realmode_p7(struct pt_regs *regs)
 {
-	uint64_t srr1;
+	uint64_t srr1, addr;
 	long handled = 1;
+	struct mce_error_info mce_error_info = { 0 };
 
 	srr1 = regs->msr;
 
-	if (P7_SRR1_MC_LOADSTORE(srr1))
+	/*
+	 * Handle memory errors depending whether this was a load/store or
+	 * ifetch exception. Also, populate the mce error_type and
+	 * type-specific error_type from either SRR1 or DSISR, depending
+	 * whether this was a load/store or ifetch exception
+	 */
+	if (P7_SRR1_MC_LOADSTORE(srr1)) {
 		handled = mce_handle_derror_p7(regs->dsisr);
-	else
+		mce_get_derror_p7(&mce_error_info, regs->dsisr);
+		addr = regs->dar;
+	} else {
 		handled = mce_handle_ierror_p7(srr1);
+		mce_get_ierror_p7(&mce_error_info, srr1);
+		addr = regs->nip;
+	}
 
-	/* TODO: Decode machine check reason. */
+	save_mce_event(regs, handled, &mce_error_info, addr);
 	return handled;
 }
 
+static void mce_get_ierror_p8(struct mce_error_info *mce_err, uint64_t srr1)
+{
+	mce_get_common_ierror(mce_err, srr1);
+	if (P7_SRR1_MC_IFETCH(srr1) == P8_SRR1_MC_IFETCH_ERAT_MULTIHIT) {
+		mce_err->error_type = MCE_ERROR_TYPE_ERAT;
+		mce_err->u.erat_error_type = MCE_ERAT_ERROR_MULTIHIT;
+	}
+}
+
+static void mce_get_derror_p8(struct mce_error_info *mce_err, uint64_t dsisr)
+{
+	mce_get_derror_p7(mce_err, dsisr);
+	if (dsisr & P8_DSISR_MC_ERAT_MULTIHIT_SEC) {
+		mce_err->error_type = MCE_ERROR_TYPE_ERAT;
+		mce_err->u.erat_error_type = MCE_ERAT_ERROR_MULTIHIT;
+	}
+}
+
 static long mce_handle_ierror_p8(uint64_t srr1)
 {
 	long handled = 0;
@@ -172,16 +266,22 @@  static long mce_handle_derror_p8(uint64_t dsisr)
 
 long __machine_check_early_realmode_p8(struct pt_regs *regs)
 {
-	uint64_t srr1;
+	uint64_t srr1, addr;
 	long handled = 1;
+	struct mce_error_info mce_error_info = { 0 };
 
 	srr1 = regs->msr;
 
-	if (P7_SRR1_MC_LOADSTORE(srr1))
+	if (P7_SRR1_MC_LOADSTORE(srr1)) {
 		handled = mce_handle_derror_p8(regs->dsisr);
-	else
+		mce_get_derror_p8(&mce_error_info, regs->dsisr);
+		addr = regs->dar;
+	} else {
 		handled = mce_handle_ierror_p8(srr1);
+		mce_get_ierror_p8(&mce_error_info, srr1);
+		addr = regs->nip;
+	}
 
-	/* TODO: Decode machine check reason. */
+	save_mce_event(regs, handled, &mce_error_info, addr);
 	return handled;
 }
diff --git a/arch/powerpc/kvm/book3s_hv_ras.c b/arch/powerpc/kvm/book3s_hv_ras.c
index 5c427b4..cddeba3 100644
--- a/arch/powerpc/kvm/book3s_hv_ras.c
+++ b/arch/powerpc/kvm/book3s_hv_ras.c
@@ -12,6 +12,7 @@ 
 #include <linux/kvm_host.h>
 #include <linux/kernel.h>
 #include <asm/opal.h>
+#include <asm/mce.h>
 
 /* SRR1 bits for machine check on POWER7 */
 #define SRR1_MC_LDSTERR		(1ul << (63-42))
@@ -67,9 +68,7 @@  static void reload_slb(struct kvm_vcpu *vcpu)
 static long kvmppc_realmode_mc_power7(struct kvm_vcpu *vcpu)
 {
 	unsigned long srr1 = vcpu->arch.shregs.msr;
-#ifdef CONFIG_PPC_POWERNV
-	struct opal_machine_check_event *opal_evt;
-#endif
+	struct machine_check_event mce_evt;
 	long handled = 1;
 
 	if (srr1 & SRR1_MC_LDSTERR) {
@@ -109,22 +108,19 @@  static long kvmppc_realmode_mc_power7(struct kvm_vcpu *vcpu)
 		handled = 0;
 	}
 
-#ifdef CONFIG_PPC_POWERNV
 	/*
-	 * See if OPAL has already handled the condition.
-	 * We assume that if the condition is recovered then OPAL
+	 * See if we have already handled the condition in the linux host.
+	 * We assume that if the condition is recovered then linux host
 	 * will have generated an error log event that we will pick
 	 * up and log later.
 	 */
-	opal_evt = local_paca->opal_mc_evt;
-	if (opal_evt->version == OpalMCE_V1 &&
-	    (opal_evt->severity == OpalMCE_SEV_NO_ERROR ||
-	     opal_evt->disposition == OpalMCE_DISPOSITION_RECOVERED))
-		handled = 1;
+	if (!get_mce_event(&mce_evt))
+		return handled;
 
-	if (handled)
-		opal_evt->in_use = 0;
-#endif
+	if (mce_evt.version == MCE_V1 &&
+	    (mce_evt.severity == MCE_SEV_NO_ERROR ||
+	     mce_evt.disposition == MCE_DISPOSITION_RECOVERED))
+		handled = 1;
 
 	return handled;
 }
diff --git a/arch/powerpc/platforms/powernv/opal.c b/arch/powerpc/platforms/powernv/opal.c
index 106301f..bd1afa4 100644
--- a/arch/powerpc/platforms/powernv/opal.c
+++ b/arch/powerpc/platforms/powernv/opal.c
@@ -19,6 +19,7 @@ 
 #include <linux/slab.h>
 #include <asm/opal.h>
 #include <asm/firmware.h>
+#include <asm/mce.h>
 
 #include "powernv.h"
 
@@ -245,8 +246,7 @@  int opal_put_chars(uint32_t vtermno, const char *data, int total_len)
 
 int opal_machine_check(struct pt_regs *regs)
 {
-	struct opal_machine_check_event *opal_evt = get_paca()->opal_mc_evt;
-	struct opal_machine_check_event evt;
+	struct machine_check_event evt;
 	const char *level, *sevstr, *subtype;
 	static const char *opal_mc_ue_types[] = {
 		"Indeterminate",
@@ -271,30 +271,29 @@  int opal_machine_check(struct pt_regs *regs)
 		"Multihit",
 	};
 
-	/* Copy the event structure and release the original */
-	evt = *opal_evt;
-	opal_evt->in_use = 0;
+	if (!get_mce_event(&evt))
+		return 0;
 
 	/* Print things out */
-	if (evt.version != OpalMCE_V1) {
+	if (evt.version != MCE_V1) {
 		pr_err("Machine Check Exception, Unknown event version %d !\n",
 		       evt.version);
 		return 0;
 	}
 	switch(evt.severity) {
-	case OpalMCE_SEV_NO_ERROR:
+	case MCE_SEV_NO_ERROR:
 		level = KERN_INFO;
 		sevstr = "Harmless";
 		break;
-	case OpalMCE_SEV_WARNING:
+	case MCE_SEV_WARNING:
 		level = KERN_WARNING;
 		sevstr = "";
 		break;
-	case OpalMCE_SEV_ERROR_SYNC:
+	case MCE_SEV_ERROR_SYNC:
 		level = KERN_ERR;
 		sevstr = "Severe";
 		break;
-	case OpalMCE_SEV_FATAL:
+	case MCE_SEV_FATAL:
 	default:
 		level = KERN_ERR;
 		sevstr = "Fatal";
@@ -302,12 +301,12 @@  int opal_machine_check(struct pt_regs *regs)
 	}
 
 	printk("%s%s Machine check interrupt [%s]\n", level, sevstr,
-	       evt.disposition == OpalMCE_DISPOSITION_RECOVERED ?
+	       evt.disposition == MCE_DISPOSITION_RECOVERED ?
 	       "Recovered" : "[Not recovered");
 	printk("%s  Initiator: %s\n", level,
-	       evt.initiator == OpalMCE_INITIATOR_CPU ? "CPU" : "Unknown");
+	       evt.initiator == MCE_INITIATOR_CPU ? "CPU" : "Unknown");
 	switch(evt.error_type) {
-	case OpalMCE_ERROR_TYPE_UE:
+	case MCE_ERROR_TYPE_UE:
 		subtype = evt.u.ue_error.ue_error_type <
 			ARRAY_SIZE(opal_mc_ue_types) ?
 			opal_mc_ue_types[evt.u.ue_error.ue_error_type]
@@ -320,7 +319,7 @@  int opal_machine_check(struct pt_regs *regs)
 			printk("%s      Physial address: %016llx\n",
 			       level, evt.u.ue_error.physical_address);
 		break;
-	case OpalMCE_ERROR_TYPE_SLB:
+	case MCE_ERROR_TYPE_SLB:
 		subtype = evt.u.slb_error.slb_error_type <
 			ARRAY_SIZE(opal_mc_slb_types) ?
 			opal_mc_slb_types[evt.u.slb_error.slb_error_type]
@@ -330,7 +329,7 @@  int opal_machine_check(struct pt_regs *regs)
 			printk("%s    Effective address: %016llx\n",
 			       level, evt.u.slb_error.effective_address);
 		break;
-	case OpalMCE_ERROR_TYPE_ERAT:
+	case MCE_ERROR_TYPE_ERAT:
 		subtype = evt.u.erat_error.erat_error_type <
 			ARRAY_SIZE(opal_mc_erat_types) ?
 			opal_mc_erat_types[evt.u.erat_error.erat_error_type]
@@ -340,7 +339,7 @@  int opal_machine_check(struct pt_regs *regs)
 			printk("%s    Effective address: %016llx\n",
 			       level, evt.u.erat_error.effective_address);
 		break;
-	case OpalMCE_ERROR_TYPE_TLB:
+	case MCE_ERROR_TYPE_TLB:
 		subtype = evt.u.tlb_error.tlb_error_type <
 			ARRAY_SIZE(opal_mc_tlb_types) ?
 			opal_mc_tlb_types[evt.u.tlb_error.tlb_error_type]
@@ -351,11 +350,11 @@  int opal_machine_check(struct pt_regs *regs)
 			       level, evt.u.tlb_error.effective_address);
 		break;
 	default:
-	case OpalMCE_ERROR_TYPE_UNKNOWN:
+	case MCE_ERROR_TYPE_UNKNOWN:
 		printk("%s  Error type: Unknown\n", level);
 		break;
 	}
-	return evt.severity == OpalMCE_SEV_FATAL ? 0 : 1;
+	return evt.severity == MCE_SEV_FATAL ? 0 : 1;
 }
 
 static irqreturn_t opal_interrupt(int irq, void *data)