Message ID | 20200511213152.24952-2-erichte@linux.ibm.com |
---|---|
State | Changes Requested |
Headers | show |
Series | Add initial secure variable storage and backend drivers | expand |
Context | Check | Description |
---|---|---|
snowpatch_ozlabs/apply_patch | success | Successfully applied on branch master (0f1937ef40fca0c3212a9dff1010b832a24fb063) |
snowpatch_ozlabs/snowpatch_job_snowpatch-skiboot | success | Test snowpatch/job/snowpatch-skiboot on branch master |
snowpatch_ozlabs/snowpatch_job_snowpatch-skiboot-dco | success | Signed-off-by present |
On Mon, 2020-05-11 at 16:31 -0500, Eric Richter wrote: > Halting the boot via an abort() call will cause the BMC to keep > restarting the machine indefinitely. The BMC is supposed to detect the host bootlooping and stop powering it on after 5 or so. > Ending via platform.terminate() > should be cleaner and prevent needless bootloops. abort() is a #defined as assert(0) which results in a call to assert_fail(): void __noreturn assert_fail(const char *msg, const char *file, unsigned int line, const char *function) { static bool in_abort = false; (void)function; if (in_abort) for (;;) ; in_abort = true; /* snip comment */ prlog(PR_EMERG, "assert failed at %s:%u: %s\n", file, line, msg); backtrace(); /* Save crashing CPU details */ opal_mpipl_save_crashing_pir(); if (platform.terminate) platform.terminate(msg); for (;;) ; } So I'm not really seeing how this is an improvement. The terminate callback is usually ipmi_terminate() which will also trigger an MPIPL on the systems which support it. That might be the cause of your boot loop? Vasant, do you have anything to add?
On 5/11/20 10:49 PM, Oliver O'Halloran wrote: > On Mon, 2020-05-11 at 16:31 -0500, Eric Richter wrote: >> Halting the boot via an abort() call will cause the BMC to keep >> restarting the machine indefinitely. > > The BMC is supposed to detect the host bootlooping and stop powering it > on after 5 or so. > >> Ending via platform.terminate() >> should be cleaner and prevent needless bootloops. > > abort() is a #defined as assert(0) which results in a call to > assert_fail(): > > void __noreturn assert_fail(const char *msg, const char *file, > unsigned int line, const char *function) > { > static bool in_abort = false; > > (void)function; > if (in_abort) > for (;;) ; > in_abort = true; > > /* snip comment */ > prlog(PR_EMERG, "assert failed at %s:%u: %s\n", file, line, msg); > backtrace(); > > /* Save crashing CPU details */ > opal_mpipl_save_crashing_pir(); > > if (platform.terminate) > platform.terminate(msg); > > for (;;) ; > } > > So I'm not really seeing how this is an improvement. The terminate > callback is usually ipmi_terminate() which will also trigger an MPIPL > on the systems which support it. That might be the cause of your boot > loop? > Admittedly, I took the off-list suggestion directly, and didn't look deep into the terminate() behavior. Is there a better method to guaranteed halt the machine that could be used instead? There is no reason to attempt a reboot/MPIPL if a secureboot trap is hit. > Vasant, do you have anything to add? > >
On 5/12/20 9:19 AM, Oliver O'Halloran wrote: > On Mon, 2020-05-11 at 16:31 -0500, Eric Richter wrote: >> Halting the boot via an abort() call will cause the BMC to keep >> restarting the machine indefinitely. Sorry for delay. I missed to respond to couple of emails. If MPIPL is enabled its resulting in calling MPIPL, not reboot. So BMC will not be aware of MPIPL boots. Hence we endup in continuous loop. > > The BMC is supposed to detect the host bootlooping and stop powering it > on after 5 or so. > Correct. I think its 3 continuous boot failures. >> Ending via platform.terminate() >> should be cleaner and prevent needless bootloops. > > abort() is a #defined as assert(0) which results in a call to > assert_fail(): > > void __noreturn assert_fail(const char *msg, const char *file, > unsigned int line, const char *function) > { > static bool in_abort = false; > > (void)function; > if (in_abort) > for (;;) ; > in_abort = true; > > /* snip comment */ > prlog(PR_EMERG, "assert failed at %s:%u: %s\n", file, line, msg); > backtrace(); > > /* Save crashing CPU details */ > opal_mpipl_save_crashing_pir(); > > if (platform.terminate) > platform.terminate(msg); > > for (;;) ; > } > > So I'm not really seeing how this is an improvement. The terminate > callback is usually ipmi_terminate() which will also trigger an MPIPL > on the systems which support it. That might be the cause of your boot > loop? > > Vasant, do you have anything to add? Problem today is, it trigger MPIPL if it find `dump` node in DT. I think we should defer MPIPL enablement until our init is complete. I will fix this and send patch soon. -Vasant
On 5/21/20 12:37 AM, Eric Richter wrote: > On 5/11/20 10:49 PM, Oliver O'Halloran wrote: >> On Mon, 2020-05-11 at 16:31 -0500, Eric Richter wrote: >>> Halting the boot via an abort() call will cause the BMC to keep >>> restarting the machine indefinitely. >> >> The BMC is supposed to detect the host bootlooping and stop powering it >> on after 5 or so. >> >>> Ending via platform.terminate() >>> should be cleaner and prevent needless bootloops. >> >> abort() is a #defined as assert(0) which results in a call to >> assert_fail(): >> >> void __noreturn assert_fail(const char *msg, const char *file, >> unsigned int line, const char *function) >> { >> static bool in_abort = false; >> >> (void)function; >> if (in_abort) >> for (;;) ; >> in_abort = true; >> >> /* snip comment */ >> prlog(PR_EMERG, "assert failed at %s:%u: %s\n", file, line, msg); >> backtrace(); >> >> /* Save crashing CPU details */ >> opal_mpipl_save_crashing_pir(); >> >> if (platform.terminate) >> platform.terminate(msg); >> >> for (;;) ; >> } >> >> So I'm not really seeing how this is an improvement. The terminate >> callback is usually ipmi_terminate() which will also trigger an MPIPL >> on the systems which support it. That might be the cause of your boot >> loop? >> > > Admittedly, I took the off-list suggestion directly, and didn't look deep into the terminate() behavior. Is there a better method to guaranteed halt the machine that could be used instead? There is no reason to attempt a reboot/MPIPL if a secureboot trap is hit. I have posted patch to fix MPIPL behavior in boot path [1]. With that assert() should work fine. [1] https://lists.ozlabs.org/pipermail/skiboot/2020-June/017075.html Let me know if you still hit any issue with assert(). -Vasant
diff --git a/libstb/secureboot.c b/libstb/secureboot.c index c8697216..2a4b975e 100644 --- a/libstb/secureboot.c +++ b/libstb/secureboot.c @@ -27,7 +27,7 @@ static struct { { IBM_SECUREBOOT_V2, "ibm,secureboot-v2" }, }; -static void secureboot_enforce(void) +void secureboot_enforce(void) { /* Sanity check */ if (!secure_mode) @@ -39,8 +39,7 @@ static void secureboot_enforce(void) * extra info to BMC other than just abort. Terminate Immediate * Attention ? (TI) */ - prlog(PR_EMERG, "secure mode enforced, aborting.\n"); - abort(); + platform.terminate("secure mode enforced, aborting.\n"); } bool secureboot_is_compatible(struct dt_node *node, int *version, const char **compat) diff --git a/libstb/secureboot.h b/libstb/secureboot.h index 0792dd5a..721b28de 100644 --- a/libstb/secureboot.h +++ b/libstb/secureboot.h @@ -15,6 +15,7 @@ enum secureboot_version { IBM_SECUREBOOT_V2, }; +void secureboot_enforce(void); bool secureboot_is_compatible(struct dt_node *node, int *version, const char **compat); void secureboot_init(void);
Halting the boot via an abort() call will cause the BMC to keep restarting the machine indefinitely. Ending via platform.terminate() should be cleaner and prevent needless bootloops. This patch also exposes secureboot_enforce() for future secvar use to cease the boot. Signed-off-by: Eric Richter <erichte@linux.ibm.com> --- libstb/secureboot.c | 5 ++--- libstb/secureboot.h | 1 + 2 files changed, 3 insertions(+), 3 deletions(-)