[RFC,3/3] core/vm: try to handle recoverable MCEs by turning off VMM

Message ID 20190605023616.26893-4-npiggin@gmail.com
State New
Series
  • WIP VMM for OPAL boot

Checks

Context | Check | Description
snowpatch_ozlabs/snowpatch_job_snowpatch-skiboot-dco | fail | Signed-off-by missing
snowpatch_ozlabs/snowpatch_job_snowpatch-skiboot | fail | Test snowpatch/job/snowpatch-skiboot on branch master
snowpatch_ozlabs/apply_patch | success | Successfully applied on branch master (0a2f8fbf931491ed97c1d11a5ae85b9d30338162)

Commit Message

Nicholas Piggin June 5, 2019, 2:36 a.m. UTC
This is not foolproof: local mappings and IO accessors will not cope
with the VMM being switched off, but it may catch some cases. IO
accessors in particular won't recover properly, because they won't
switch to the real-mode cache-inhibited instructions after the VMM is
turned off. That could be solved by marking IO accessors as
non-recoverable, or by providing an exception recovery address which
retries the access.
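
As an illustration of the second option, here is a minimal sketch of a
recovery-address lookup, in the spirit of an exception fixup table. It is
not part of this patch: struct mce_fixup, mce_find_fixup() and the idea of
keeping one table entry per IO access instruction are all hypothetical
names and assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: one entry per instruction that may take a recoverable MCE. */
struct mce_fixup {
	uint64_t fault_nip;	/* address of the IO access instruction */
	uint64_t retry_nip;	/* address of code that retries the access */
};

/*
 * Return the retry address registered for nip, or 0 if the faulting
 * instruction has no fixup and the MCE must be treated as fatal.
 */
static uint64_t mce_find_fixup(const struct mce_fixup *table, size_t n,
			       uint64_t nip)
{
	size_t i;

	assert(table != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (table[i].fault_nip == nip)
			return table[i].retry_nip;
	}
	return 0;
}
```

Under these assumptions the 0x200 handler would look up the faulting nip
and, on a hit, resume at the returned address instead of re-executing the
original instruction with translation off.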

This is mostly just a debugging aid. Real MCE recovery for VMM faults
would have to flush the SLB and translation caches.
---
 core/exceptions.c | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

Patch

diff --git a/core/exceptions.c b/core/exceptions.c
index 89b4451ab..8dafed676 100644
--- a/core/exceptions.c
+++ b/core/exceptions.c
@@ -41,6 +41,21 @@  static void dump_regs(struct stack_frame *stack)
 
 #define EXCEPTION_MAX_STR 320
 
+static void print_recoverable_mce_vm(struct stack_frame *stack, uint64_t nip, uint64_t msr)
+{
+	char buf[EXCEPTION_MAX_STR];
+	size_t l;
+
+	l = 0;
+	l += snprintf(buf + l, EXCEPTION_MAX_STR - l,
+		"Recoverable MCE with VM on at "REG"   ", nip);
+	l += snprintf_symbol(buf + l, EXCEPTION_MAX_STR - l, nip);
+	l += snprintf(buf + l, EXCEPTION_MAX_STR - l, "  MSR "REG, msr);
+	prerror("%s\n", buf);
+	dump_regs(stack);
+	prerror("Continuing with VM off\n");
+}
+
 void exception_entry(struct stack_frame *stack)
 {
 	bool fatal = false;
@@ -92,6 +107,17 @@  void exception_entry(struct stack_frame *stack)
 		break;
 
 	case 0x200:
+		if (this_cpu()->vm_local_map_inuse)
+			fatal = true; /* local map is non-linear */
+
+		if (!fatal && (msr & (MSR_IR|MSR_DR))) {
+			print_recoverable_mce_vm(stack, nip, msr);
+			/* Turn off VM and try again */
+			this_cpu()->vm_setup = false;
+			stack->srr1 &= ~(MSR_IR|MSR_DR);
+			goto out;
+		}
+
 		fatal = true;
 		prerror("***********************************************\n");
 		l += snprintf(buf + l, EXCEPTION_MAX_STR - l,