
cpu: Cleanup AMR and IAMR when re-initializing CPUs

Message ID 1498602601.3651.36.camel@kernel.crashing.org
State Accepted

Commit Message

Benjamin Herrenschmidt June 27, 2017, 10:30 p.m. UTC
There's a bug in current Linux kernels: they leave stale values in the
AMR and IAMR registers across kexec and don't sanitize them on boot.
This breaks kexec under some circumstances (such as booting a hash
kernel from a radix one on P9 DD2.0).

The long term fix is in Linux, but this workaround is a reasonable
way of "sanitizing" those SPRs when Linux calls opal_reinit_cpus()
and shouldn't have adverse effects.

We could also use the same mechanism to clean up other things in the
future, such as restoring some other SPRs to their default values.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---

Balbir, can you sort out the kernel? It should clean up AMR/IAMR
on entry at least, and preferably on kexec too.

 core/cpu.c          | 29 +++++++++++++++++++++++++++++
 include/processor.h |  2 ++
 2 files changed, 31 insertions(+)
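The commit message suggests the same mechanism could later restore other SPRs to their default values. One way that might look is a table of (SPR, default value) pairs walked by the per-thread cleanup routine. This is a hypothetical, host-side sketch only: `sim_spr[]` and `sim_mtspr()` stand in for the real register file and skiboot's `mtspr()`, and `cleanup_sprs[]` and `cpu_cleanup_one_table()` are invented names. Only the AMR/IAMR entries and their SPR numbers come from the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* SPR numbers as defined by the patch in include/processor.h. */
#define SPR_AMR  0x01d
#define SPR_IAMR 0x03d

/* Hypothetical host-side stand-in for one hardware thread's SPR file
 * (SPR numbers are 10 bits wide, so 1024 slots cover them all). */
static uint64_t sim_spr[1024];

/* Hypothetical replacement for mtspr() so the sketch runs off-target. */
static void sim_mtspr(unsigned int spr, uint64_t val)
{
	sim_spr[spr] = val;
}

/* Hypothetical table of SPRs to restore on opal_reinit_cpus(),
 * paired with the value each one should be reset to. */
struct spr_default {
	unsigned int spr;
	uint64_t value;
};

static const struct spr_default cleanup_sprs[] = {
	{ SPR_AMR,  0 },
	{ SPR_IAMR, 0 },
	/* further SPRs would be appended here */
};

/* Per-thread cleanup: walk the table and restore each default. */
static void cpu_cleanup_one_table(void)
{
	for (size_t i = 0; i < sizeof(cleanup_sprs) / sizeof(cleanup_sprs[0]); i++)
		sim_mtspr(cleanup_sprs[i].spr, cleanup_sprs[i].value);
}
```

With a table, covering another SPR becomes a one-line change, while the per-CPU dispatch in `cpu_cleanup_all()` stays untouched.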

Comments

Stewart Smith July 3, 2017, 3:21 a.m. UTC | #1
Merged to master (and 5.7-rc1) as of d98c46b55801015a887fa27752db421280a48f7b

Patch

diff --git a/core/cpu.c b/core/cpu.c
index 75c7008..a0c395f 100644
--- a/core/cpu.c
+++ b/core/cpu.c
@@ -1077,6 +1077,27 @@  void cpu_set_radix_mode(void)
 	cpu_change_all_hid0(&req);
 }
 
+static void cpu_cleanup_one(void *param __unused)
+{
+	mtspr(SPR_AMR, 0);
+	mtspr(SPR_IAMR, 0);
+}
+
+static int64_t cpu_cleanup_all(void)
+{
+	struct cpu_thread *cpu;
+
+	for_each_available_cpu(cpu) {
+		if (cpu == this_cpu()) {
+			cpu_cleanup_one(NULL);
+			continue;
+		}
+		cpu_wait_job(cpu_queue_job(cpu, "cpu_cleanup",
+					   cpu_cleanup_one, NULL), true);
+	}
+	return OPAL_SUCCESS;
+}
+
 void cpu_fast_reboot_complete(void)
 {
 	/* Fast reboot will have cleared HID0:HILE */
@@ -1132,6 +1153,14 @@  static int64_t opal_reinit_cpus(uint64_t flags)
 	this_cpu()->in_reinit = true;
 	unlock(&reinit_lock);
 
+	/*
+	 * This cleans up a few things left over by Linux
+	 * that can cause problems in cases such as radix->hash
+	 * transitions. Ideally Linux should do it but doing it
+	 * here works around existing broken kernels.
+	 */
+	cpu_cleanup_all();
+
 	/* If HILE change via HID0 is supported ... */
 	if (hile_supported &&
 	    (flags & (OPAL_REINIT_CPUS_HILE_BE |
diff --git a/include/processor.h b/include/processor.h
index af3fd2b..2e1ac37 100644
--- a/include/processor.h
+++ b/include/processor.h
@@ -51,6 +51,8 @@ 
 #define SPR_SRR0	0x01a	/* RW: Exception save/restore reg 0 */
 #define SPR_SRR1	0x01b	/* RW: Exception save/restore reg 1 */
 #define SPR_CFAR	0x01c	/* RW: Come From Address Register */
+#define SPR_AMR		0x01d	/* RW: Authority Mask Register */
+#define SPR_IAMR	0x03d	/* RW: Instruction Authority Mask Register */
 #define SPR_RPR		0x0ba   /* RW: Relative Priority Register */
 #define SPR_TBRL	0x10c	/* RO: Timebase low */
 #define SPR_TBRU	0x10d	/* RO: Timebase high */
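Because each hardware thread has its own copy of AMR and IAMR, `cpu_cleanup_all()` has to run the `mtspr` writes on every thread rather than once on the caller. The following host-side simulation sketches that dispatch pattern under stated assumptions: `_Thread_local` variables model per-thread SPR state, and a POSIX thread created and joined per CPU stands in for `cpu_queue_job()` followed by `cpu_wait_job()`. All `_sim` names are hypothetical; skiboot itself queues jobs on CPU threads that are already running rather than spawning new ones.

```c
#include <pthread.h>
#include <stdint.h>

#define NR_SIM_CPUS 4

/* Simulated per-thread SPRs: each thread starts with stale junk, and a
 * write in one thread does not affect the others, like mtspr on POWER. */
static _Thread_local uint64_t sim_amr = UINT64_C(0xffffffffffffffff);
static _Thread_local uint64_t sim_iamr = UINT64_C(0x5555555555555555);

/* Values observed on each simulated CPU after its cleanup ran. */
static uint64_t seen_amr[NR_SIM_CPUS], seen_iamr[NR_SIM_CPUS];

/* Mirrors cpu_cleanup_one(): clears the registers of the calling thread. */
static void cpu_cleanup_one_sim(void)
{
	sim_amr = 0;
	sim_iamr = 0;
}

static void record_sim(int cpu)
{
	seen_amr[cpu] = sim_amr;
	seen_iamr[cpu] = sim_iamr;
}

/* Job body executed on the "target CPU" thread. */
static void *cpu_job_entry_sim(void *arg)
{
	int cpu = *(int *)arg;

	cpu_cleanup_one_sim();
	record_sim(cpu);
	return NULL;
}

/* Mirrors cpu_cleanup_all(): run locally on this CPU, otherwise
 * dispatch to the target thread and block until the job completes. */
static int cpu_cleanup_all_sim(int this_cpu)
{
	for (int cpu = 0; cpu < NR_SIM_CPUS; cpu++) {
		if (cpu == this_cpu) {
			cpu_cleanup_one_sim();
			record_sim(cpu);
			continue;
		}
		pthread_t t;
		int arg = cpu;
		if (pthread_create(&t, NULL, cpu_job_entry_sim, &arg) != 0)
			return -1;
		/* Like cpu_wait_job(): wait before moving to the next CPU. */
		if (pthread_join(t, NULL) != 0)
			return -1;
	}
	return 0;
}
```

Clearing only the caller's registers would leave every other simulated CPU holding the junk values, which is why the patch queues a job per CPU.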