diff mbox series

[v2,1/2] mpipl: Delay MPIPL registration until OPAL init is complete

Message ID 20200617071610.127371-1-hegdevasant@linux.vnet.ibm.com
State Accepted
Headers show
Series [v2,1/2] mpipl: Delay MPIPL registration until OPAL init is complete | expand

Checks

Context Check Description
snowpatch_ozlabs/apply_patch success Successfully applied on branch master (fe70fbb78d33abea788a3221bc409a7c50c019c3)
snowpatch_ozlabs/snowpatch_job_snowpatch-skiboot success Test snowpatch/job/snowpatch-skiboot on branch master
snowpatch_ozlabs/snowpatch_job_snowpatch-skiboot-dco success Signed-off-by present

Commit Message

Vasant Hegde June 17, 2020, 7:16 a.m. UTC
If OPAL boot fails after MPIPL init (opal_mpipl_init()) then we call MPIPL
boot instead of reboot. BMC is not aware of MPIPL. Hence it may result in
continuous MPIPL loop (boot -> crash -> MPIPL -> boot).

If OPAL boot fails (before loading kernel) then its better to call reboot.
So that BMC can detect `n` number of boot failures (generally n = 3) and
stop booting. That way we can avoid continuous loop.

This patch moves MPIPL init to the end of init process (just before starting
kernel). So that if we fail to boot OPAL we call normal reboot.

Also this patch introduces new function to detect MPIPL is enabled or not
(is_mpipl_enabled()). And in assert() path we check for this function
instead of `dump` DT node. So that it will make sure we will not call
MPIPL until opal_mpipl_init is complete.

Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
---
Changes in v2:
  - Moved mpipl_init just before checksum_romem(). Othesewise fast
    reboot fails.

-Vasant

 core/init.c         |  7 ++++---
 core/opal-dump.c    | 10 ++++++++++
 hw/sbe-p9.c         |  2 +-
 include/opal-dump.h |  3 +++
 4 files changed, 18 insertions(+), 4 deletions(-)

Comments

Oliver O'Halloran June 23, 2020, 1:48 a.m. UTC | #1
On Wed, Jun 17, 2020 at 5:16 PM Vasant Hegde
<hegdevasant@linux.vnet.ibm.com> wrote:
>
> If OPAL boot fails after MPIPL init (opal_mpipl_init()) then we call MPIPL
> boot instead of reboot. BMC is not aware of MPIPL. Hence it may result in
> continuous MPIPL loop (boot -> crash -> MPIPL -> boot).
>
> If OPAL boot fails (before loading kernel) then its better to call reboot.
> So that BMC can detect `n` number of boot failures (generally n = 3) and
> stop booting. That way we can avoid continuous loop.
>
> This patch moves MPIPL init to the end of init process (just before starting
> kernel). So that if we fail to boot OPAL we call normal reboot.
>
> Also this patch introduces new function to detect MPIPL is enabled or not
> (is_mpipl_enabled()). And in assert() path we check for this function
> instead of `dump` DT node. So that it will make sure we will not call
> MPIPL until opal_mpipl_init is complete.

Why do we even allow repeated MPIPLs? Just seems like a good way to
get stuck in a boot loop.

This patch sorta helps, but we've got the same problem if we happen to
hit a crash while petitboot is running. Maybe we should only MPIPL if
the host kernel explicitly requests one rather than hacking it into
the termination path like we're doing now.
Vasant Hegde June 23, 2020, 10:31 a.m. UTC | #2
On 6/23/20 7:18 AM, Oliver O'Halloran wrote:
> On Wed, Jun 17, 2020 at 5:16 PM Vasant Hegde
> <hegdevasant@linux.vnet.ibm.com> wrote:
>>
>> If OPAL boot fails after MPIPL init (opal_mpipl_init()) then we call MPIPL
>> boot instead of reboot. BMC is not aware of MPIPL. Hence it may result in
>> continuous MPIPL loop (boot -> crash -> MPIPL -> boot).
>>
>> If OPAL boot fails (before loading kernel) then its better to call reboot.
>> So that BMC can detect `n` number of boot failures (generally n = 3) and
>> stop booting. That way we can avoid continuous loop.
>>
>> This patch moves MPIPL init to the end of init process (just before starting
>> kernel). So that if we fail to boot OPAL we call normal reboot.
>>
>> Also this patch introduces new function to detect MPIPL is enabled or not
>> (is_mpipl_enabled()). And in assert() path we check for this function
>> instead of `dump` DT node. So that it will make sure we will not call
>> MPIPL until opal_mpipl_init is complete.
> 
> Why do we even allow repeated MPIPLs? Just seems like a good way to
> get stuck in a boot loop.
> 
> This patch sorta helps, but we've got the same problem if we happen to
> hit a crash while petitboot is running. Maybe we should only MPIPL if
> the host kernel explicitly requests one rather than hacking it into
> the termination path like we're doing now.
> 

Yes. We should improve crash path.

I think for now this is good enough. If petitboot crashes it will call xstop() 
and BMC
can detect it.

-Vasant
Oliver O'Halloran July 3, 2020, 5:37 a.m. UTC | #3
On Wed, Jun 17, 2020 at 5:16 PM Vasant Hegde
<hegdevasant@linux.vnet.ibm.com> wrote:
>
> If OPAL boot fails after MPIPL init (opal_mpipl_init()) then we call MPIPL
> boot instead of reboot. BMC is not aware of MPIPL. Hence it may result in
> continuous MPIPL loop (boot -> crash -> MPIPL -> boot).
>
> If OPAL boot fails (before loading kernel) then its better to call reboot.
> So that BMC can detect `n` number of boot failures (generally n = 3) and
> stop booting. That way we can avoid continuous loop.
>
> This patch moves MPIPL init to the end of init process (just before starting
> kernel). So that if we fail to boot OPAL we call normal reboot.
>
> Also this patch introduces new function to detect MPIPL is enabled or not
> (is_mpipl_enabled()). And in assert() path we check for this function
> instead of `dump` DT node. So that it will make sure we will not call
> MPIPL until opal_mpipl_init is complete.

Thanks, series merged as of 9f5374b46aab
diff mbox series

Patch

diff --git a/core/init.c b/core/init.c
index 9f3b8c6fd..0a593cd1a 100644
--- a/core/init.c
+++ b/core/init.c
@@ -635,6 +635,10 @@  void __noreturn load_and_boot_kernel(bool is_reboot)
 	patch_traps(false);
 	cpu_set_hile_mode(false); /* Clear HILE on all CPUs */
 
+	/* init MPIPL */
+	if (!is_reboot)
+		opal_mpipl_init();
+
 	checksum_romem();
 
 	debug_descriptor.state_flags |= OPAL_BOOT_COMPLETE;
@@ -1363,9 +1367,6 @@  void __noreturn __nomcount main_cpu_entry(const void *fdt)
 	/* Create the LPC bus interrupt-map on P9 */
 	lpc_finalize_interrupts();
 
-	/* init opal dump */
-	opal_mpipl_init();
-
 	/* Add the list of interrupts going to OPAL */
 	add_opal_interrupts();
 
diff --git a/core/opal-dump.c b/core/opal-dump.c
index a31ecbe4d..ca6bf0648 100644
--- a/core/opal-dump.c
+++ b/core/opal-dump.c
@@ -67,6 +67,8 @@  static int opal_mpipl_max_tags = MAX_OPAL_MPIPL_TAGS;
 
 static u64 opal_dump_addr, opal_dump_size;
 
+static bool mpipl_enabled;
+
 static int opal_mpipl_add_entry(u8 region, u64 src, u64 dest, u64 size)
 {
 	int i;
@@ -526,6 +528,11 @@  void opal_mpipl_reserve_mem(void)
 		       arch_regs_dest, arch_regs_size);
 }
 
+bool is_mpipl_enabled(void)
+{
+	return mpipl_enabled;
+}
+
 void opal_mpipl_init(void)
 {
 	void *mdst_base = (void *)MDST_TABLE_BASE;
@@ -575,4 +582,7 @@  void opal_mpipl_init(void)
 	opal_register(OPAL_MPIPL_UPDATE, opal_mpipl_update, 4);
 	opal_register(OPAL_MPIPL_REGISTER_TAG, opal_mpipl_register_tag, 2);
 	opal_register(OPAL_MPIPL_QUERY_TAG, opal_mpipl_query_tag, 2);
+
+	/* Enable MPIPL */
+	mpipl_enabled = true;
 }
diff --git a/hw/sbe-p9.c b/hw/sbe-p9.c
index 18caa0a28..24ed91b93 100644
--- a/hw/sbe-p9.c
+++ b/hw/sbe-p9.c
@@ -949,7 +949,7 @@  void p9_sbe_terminate(void)
 	struct proc_chip *chip;
 
 	/* Return if MPIPL is not supported */
-	if (!dt_find_by_path(opal_node, "dump"))
+	if (!is_mpipl_enabled())
 		return;
 
 	/* Unregister flash. It will request BMC MBOX reset */
diff --git a/include/opal-dump.h b/include/opal-dump.h
index 060712e99..bc1b8768d 100644
--- a/include/opal-dump.h
+++ b/include/opal-dump.h
@@ -127,4 +127,7 @@  void opal_mpipl_save_crashing_pir(void);
 /* Reserve memory to capture OPAL dump */
 extern void opal_mpipl_reserve_mem(void);
 
+/* Check MPIPL enabled or not */
+extern bool is_mpipl_enabled(void);
+
 #endif	/* __OPAL_DUMP_H */