[v2,3/5] powerpc/mce: Hookup derror (load/store) UE errors

Message ID 20170913061049.13256-4-bsingharora@gmail.com
State New
Headers show
Series
  • Revisit MCE handling for UE Errors
Related show

Commit Message

Balbir Singh Sept. 13, 2017, 6:10 a.m.
Extract physical_address for UE errors by walking the page
tables for the mm and address at the NIP, to extract the
instruction. Then use the instruction to find the effective
address via analyse_instr().

We might have page table walking races, but we expect them to
be rare, the physical address extraction is best effort. The idea
is to then hook up this infrastructure to memory failure eventually.

Signed-off-by: Balbir Singh <bsingharora@gmail.com>
---
 arch/powerpc/include/asm/mce.h  |  2 +-
 arch/powerpc/kernel/mce.c       |  6 +++-
 arch/powerpc/kernel/mce_power.c | 80 ++++++++++++++++++++++++++++++++++++++---
 3 files changed, 81 insertions(+), 7 deletions(-)

Comments

Nicholas Piggin Sept. 13, 2017, 6:21 a.m. | #1
On Wed, 13 Sep 2017 16:10:47 +1000
Balbir Singh <bsingharora@gmail.com> wrote:

> Extract physical_address for UE errors by walking the page
> tables for the mm and address at the NIP, to extract the
> instruction. Then use the instruction to find the effective
> address via analyse_instr().
> 
> We might have page table walking races, but we expect them to
> be rare, the physical address extraction is best effort. The idea
> is to then hook up this infrastructure to memory failure eventually.

This all looks pretty good to me, you can probably update these
changelogs now because you are hooking it into memory failure.

I wonder if it would be worth skipping the instruction analysis and
page table walk if we've recursed up to the maximum MCE depth, just
in case we're hitting MCEs in part of that code or data.

Thanks,
Nick
Balbir Singh Sept. 13, 2017, 6:26 a.m. | #2
On Wed, Sep 13, 2017 at 4:21 PM, Nicholas Piggin <npiggin@gmail.com> wrote:
> On Wed, 13 Sep 2017 16:10:47 +1000
> Balbir Singh <bsingharora@gmail.com> wrote:
>
>> Extract physical_address for UE errors by walking the page
>> tables for the mm and address at the NIP, to extract the
>> instruction. Then use the instruction to find the effective
>> address via analyse_instr().
>>
>> We might have page table walking races, but we expect them to
>> be rare, the physical address extraction is best effort. The idea
>> is to then hook up this infrastructure to memory failure eventually.
>
> This all looks pretty good to me, you can probably update these
> changelogs now because you are hooking it into memory failure.

Yep, the eventually can probably go, I meant in the next patch.
The following patch then hooks this up into memory_failure

>
> I wonder if it would be worth skipping the instruction analysis and
> page table walk if we've recursed up to the maximum MCE depth, just
> in case we're hitting MCEs in part of that code or data.

Yep, good idea. Would you be OK if we did this after this small series
got merged? Since that would mean that we got a UE error
while processing the our third machine check exception, I think the
probability of us running into that is low, but I'd definitely like to do that
once these changes are merged.

Balbir Singh.
Nicholas Piggin Sept. 13, 2017, 8:56 a.m. | #3
On Wed, 13 Sep 2017 16:26:59 +1000
Balbir Singh <bsingharora@gmail.com> wrote:

> On Wed, Sep 13, 2017 at 4:21 PM, Nicholas Piggin <npiggin@gmail.com> wrote:
> > On Wed, 13 Sep 2017 16:10:47 +1000
> > Balbir Singh <bsingharora@gmail.com> wrote:
> >  
> >> Extract physical_address for UE errors by walking the page
> >> tables for the mm and address at the NIP, to extract the
> >> instruction. Then use the instruction to find the effective
> >> address via analyse_instr().
> >>
> >> We might have page table walking races, but we expect them to
> >> be rare, the physical address extraction is best effort. The idea
> >> is to then hook up this infrastructure to memory failure eventually.  
> >
> > This all looks pretty good to me, you can probably update these
> > changelogs now because you are hooking it into memory failure.  
> 
> Yep, the eventually can probably go, I meant in the next patch.
> The following patch then hooks this up into memory_failure
> 
> >
> > I wonder if it would be worth skipping the instruction analysis and
> > page table walk if we've recursed up to the maximum MCE depth, just
> > in case we're hitting MCEs in part of that code or data.  
> 
> Yep, good idea. Would you be OK if we did this after this small series
> got merged?

I don't mind much, but I'd have thought being that it's all new code,
adding the check would be pretty easy.

    if (get_paca()->in_mce == 4) {}

(Probably with the '4' appropriately #defined out of here and the
exception-64s.S code)

> Since that would mean that we got a UE error
> while processing the our third machine check exception, I think the
> probability of us running into that is low, but I'd definitely like to do that
> once these changes are merged.

If we're getting UEs in the machine check code or walking kernel
page tables though, it will just keep recurring. Unlikely yes, but
it's still a slight regression.

Thanks,
Nick

Patch

diff --git a/arch/powerpc/include/asm/mce.h b/arch/powerpc/include/asm/mce.h
index 75292c7..3a1226e 100644
--- a/arch/powerpc/include/asm/mce.h
+++ b/arch/powerpc/include/asm/mce.h
@@ -204,7 +204,7 @@  struct mce_error_info {
 
 extern void save_mce_event(struct pt_regs *regs, long handled,
 			   struct mce_error_info *mce_err, uint64_t nip,
-			   uint64_t addr);
+			   uint64_t addr, uint64_t phys_addr);
 extern int get_mce_event(struct machine_check_event *mce, bool release);
 extern void release_mce_event(void);
 extern void machine_check_queue_event(void);
diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c
index f351adf..c7acc33 100644
--- a/arch/powerpc/kernel/mce.c
+++ b/arch/powerpc/kernel/mce.c
@@ -82,7 +82,7 @@  static void mce_set_error_info(struct machine_check_event *mce,
  */
 void save_mce_event(struct pt_regs *regs, long handled,
 		    struct mce_error_info *mce_err,
-		    uint64_t nip, uint64_t addr)
+		    uint64_t nip, uint64_t addr, uint64_t phys_addr)
 {
 	int index = __this_cpu_inc_return(mce_nest_count) - 1;
 	struct machine_check_event *mce = this_cpu_ptr(&mce_event[index]);
@@ -140,6 +140,10 @@  void save_mce_event(struct pt_regs *regs, long handled,
 	} else if (mce->error_type == MCE_ERROR_TYPE_UE) {
 		mce->u.ue_error.effective_address_provided = true;
 		mce->u.ue_error.effective_address = addr;
+		if (phys_addr != ULONG_MAX) {
+			mce->u.ue_error.physical_address_provided = true;
+			mce->u.ue_error.physical_address = phys_addr;
+		}
 	}
 	return;
 }
diff --git a/arch/powerpc/kernel/mce_power.c b/arch/powerpc/kernel/mce_power.c
index b76ca19..2dfbbe0 100644
--- a/arch/powerpc/kernel/mce_power.c
+++ b/arch/powerpc/kernel/mce_power.c
@@ -27,6 +27,29 @@ 
 #include <asm/mmu.h>
 #include <asm/mce.h>
 #include <asm/machdep.h>
+#include <asm/pgtable.h>
+#include <asm/pte-walk.h>
+#include <asm/sstep.h>
+
+/*
+ * Convert an address related to an mm to a PFN. NOTE: we are in real
+ * mode, we could potentially race with page table updates.
+ */
+static unsigned long addr_to_pfn(struct mm_struct *mm, unsigned long addr)
+{
+	pte_t *ptep;
+	unsigned long flags;
+
+	local_irq_save(flags);
+	if (mm == current->mm)
+		ptep = find_current_mm_pte(mm->pgd, addr, NULL, NULL);
+	else
+		ptep = find_init_mm_pte(addr, NULL);
+	local_irq_restore(flags);
+	if (!ptep || pte_special(*ptep))
+		return ULONG_MAX;
+	return pte_pfn(*ptep);
+}
 
 static void flush_tlb_206(unsigned int num_sets, unsigned int action)
 {
@@ -421,6 +444,48 @@  static const struct mce_derror_table mce_p9_derror_table[] = {
   MCE_INITIATOR_CPU,   MCE_SEV_ERROR_SYNC, },
 { 0, false, 0, 0, 0, 0 } };
 
+static int mce_find_instr_ea_and_pfn(struct pt_regs *regs, uint64_t *addr,
+					uint64_t *phys_addr)
+{
+	/*
+	 * Carefully look at the NIP to determine
+	 * the instruction to analyse. Reading the NIP
+	 * in real-mode is tricky and can lead to recursive
+	 * faults
+	 */
+	int instr;
+	struct mm_struct *mm;
+	unsigned long nip = regs->nip;
+	unsigned long pfn, instr_addr;
+	struct instruction_op op;
+	struct pt_regs tmp = *regs;
+
+	if (user_mode(regs))
+		mm = current->mm;
+	else
+		mm = &init_mm;
+
+	pfn = addr_to_pfn(mm, nip);
+	if (pfn != ULONG_MAX) {
+		instr_addr = (pfn << PAGE_SHIFT) + (nip & ~PAGE_MASK);
+		instr = *(unsigned int *)(instr_addr);
+		if (!analyse_instr(&op, &tmp, instr)) {
+			pfn = addr_to_pfn(mm, op.ea);
+			*addr = op.ea;
+			*phys_addr = pfn;
+			return 0;
+		}
+		/*
+		 * analyse_instr() might fail if the instruction
+		 * is not a load/store, although this is unexpected
+		 * for load/store errors or if we got the NIP
+		 * wrong
+		 */
+	}
+	*addr = 0;
+	return -1;
+}
+
 static int mce_handle_ierror(struct pt_regs *regs,
 		const struct mce_ierror_table table[],
 		struct mce_error_info *mce_err, uint64_t *addr)
@@ -489,7 +554,8 @@  static int mce_handle_ierror(struct pt_regs *regs,
 
 static int mce_handle_derror(struct pt_regs *regs,
 		const struct mce_derror_table table[],
-		struct mce_error_info *mce_err, uint64_t *addr)
+		struct mce_error_info *mce_err, uint64_t *addr,
+		uint64_t *phys_addr)
 {
 	uint64_t dsisr = regs->dsisr;
 	int handled = 0;
@@ -555,7 +621,10 @@  static int mce_handle_derror(struct pt_regs *regs,
 		mce_err->initiator = table[i].initiator;
 		if (table[i].dar_valid)
 			*addr = regs->dar;
-
+		else if (mce_err->severity == MCE_SEV_ERROR_SYNC &&
+				table[i].error_type == MCE_ERROR_TYPE_UE) {
+			mce_find_instr_ea_and_pfn(regs, addr, phys_addr);
+		}
 		found = 1;
 	}
 
@@ -592,19 +661,20 @@  static long mce_handle_error(struct pt_regs *regs,
 		const struct mce_ierror_table itable[])
 {
 	struct mce_error_info mce_err = { 0 };
-	uint64_t addr;
+	uint64_t addr, phys_addr;
 	uint64_t srr1 = regs->msr;
 	long handled;
 
 	if (SRR1_MC_LOADSTORE(srr1))
-		handled = mce_handle_derror(regs, dtable, &mce_err, &addr);
+		handled = mce_handle_derror(regs, dtable, &mce_err, &addr,
+				&phys_addr);
 	else
 		handled = mce_handle_ierror(regs, itable, &mce_err, &addr);
 
 	if (!handled && mce_err.error_type == MCE_ERROR_TYPE_UE)
 		handled = mce_handle_ue_error(regs);
 
-	save_mce_event(regs, handled, &mce_err, regs->nip, addr);
+	save_mce_event(regs, handled, &mce_err, regs->nip, addr, phys_addr);
 
 	return handled;
 }