diff mbox series

[v4,01/18] libstb/secureboot: use platform.terminate instead of hard abort

Message ID 20200511213152.24952-2-erichte@linux.ibm.com
State Changes Requested
Headers show
Series Add initial secure variable storage and backend drivers | expand

Checks

Context Check Description
snowpatch_ozlabs/snowpatch_job_snowpatch-skiboot-dco success Signed-off-by present
snowpatch_ozlabs/snowpatch_job_snowpatch-skiboot success Test snowpatch/job/snowpatch-skiboot on branch master
snowpatch_ozlabs/apply_patch success Successfully applied on branch master (0f1937ef40fca0c3212a9dff1010b832a24fb063)

Commit Message

Eric Richter May 11, 2020, 9:31 p.m. UTC
Halting the boot via an abort() call will cause the BMC to keep
restarting the machine indefinitely. Ending via platform.terminate()
should be cleaner and prevent needless bootloops.

This patch also exposes secureboot_enforce() for future secvar use to
cease the boot.

Signed-off-by: Eric Richter <erichte@linux.ibm.com>
---
 libstb/secureboot.c | 5 ++---
 libstb/secureboot.h | 1 +
 2 files changed, 3 insertions(+), 3 deletions(-)

Comments

Oliver O'Halloran May 12, 2020, 3:49 a.m. UTC | #1
On Mon, 2020-05-11 at 16:31 -0500, Eric Richter wrote:
> Halting the boot via an abort() call will cause the BMC to keep
> restarting the machine indefinitely.

The BMC is supposed to detect the host bootlooping and stop powering it
on after 5 or so.

> Ending via platform.terminate()
> should be cleaner and prevent needless bootloops.

abort() is a #defined as assert(0) which results in a call to
assert_fail():

void __noreturn assert_fail(const char *msg, const char *file,
                                unsigned int line, const char *function)
{
        static bool in_abort = false;

        (void)function;
        if (in_abort)
                for (;;) ;
        in_abort = true;

        /* snip comment */
        prlog(PR_EMERG, "assert failed at %s:%u: %s\n", file, line, msg);
        backtrace();

        /* Save crashing CPU details */
        opal_mpipl_save_crashing_pir();

        if (platform.terminate)
                platform.terminate(msg);

        for (;;) ;
}

So I'm not really seeing how this is an improvement. The terminate
callback is usually ipmi_terminate() which will also trigger an MPIPL
on the systems which support it. That might be the cause of your boot
loop?

Vasant, do you have anything to add?
Eric Richter May 20, 2020, 7:07 p.m. UTC | #2
On 5/11/20 10:49 PM, Oliver O'Halloran wrote:
> On Mon, 2020-05-11 at 16:31 -0500, Eric Richter wrote:
>> Halting the boot via an abort() call will cause the BMC to keep
>> restarting the machine indefinitely.
> 
> The BMC is supposed to detect the host bootlooping and stop powering it
> on after 5 or so.
> 
>> Ending via platform.terminate()
>> should be cleaner and prevent needless bootloops.
> 
> abort() is a #defined as assert(0) which results in a call to
> assert_fail():
> 
> void __noreturn assert_fail(const char *msg, const char *file,
>                                 unsigned int line, const char *function)
> {
>         static bool in_abort = false;
> 
>         (void)function;
>         if (in_abort)
>                 for (;;) ;
>         in_abort = true;
> 
>         /* snip comment */
>         prlog(PR_EMERG, "assert failed at %s:%u: %s\n", file, line, msg);
>         backtrace();
> 
>         /* Save crashing CPU details */
>         opal_mpipl_save_crashing_pir();
> 
>         if (platform.terminate)
>                 platform.terminate(msg);
> 
>         for (;;) ;
> }
> 
> So I'm not really seeing how this is an improvement. The terminate
> callback is usually ipmi_terminate() which will also trigger an MPIPL
> on the systems which support it. That might be the cause of your boot
> loop?
> 

Admittedly, I took the off-list suggestion directly, and didn't look deep into the terminate() behavior. Is there a better method to guaranteed halt the machine that could be used instead? There is no reason to attempt a reboot/MPIPL if a secureboot trap is hit.

> Vasant, do you have anything to add?
> 
>
Vasant Hegde June 13, 2020, 12:42 p.m. UTC | #3
On 5/12/20 9:19 AM, Oliver O'Halloran wrote:
> On Mon, 2020-05-11 at 16:31 -0500, Eric Richter wrote:
>> Halting the boot via an abort() call will cause the BMC to keep
>> restarting the machine indefinitely.

Sorry for delay. I missed to respond to couple of emails.

If MPIPL is enabled its resulting in calling MPIPL, not reboot. So BMC will not 
be aware of MPIPL boots. Hence we endup in continuous loop.

> 
> The BMC is supposed to detect the host bootlooping and stop powering it
> on after 5 or so.
> 

Correct. I think its 3 continuous boot failures.

>> Ending via platform.terminate()
>> should be cleaner and prevent needless bootloops.
> 
> abort() is a #defined as assert(0) which results in a call to
> assert_fail():
> 
> void __noreturn assert_fail(const char *msg, const char *file,
>                                  unsigned int line, const char *function)
> {
>          static bool in_abort = false;
> 
>          (void)function;
>          if (in_abort)
>                  for (;;) ;
>          in_abort = true;
> 
>          /* snip comment */
>          prlog(PR_EMERG, "assert failed at %s:%u: %s\n", file, line, msg);
>          backtrace();
> 
>          /* Save crashing CPU details */
>          opal_mpipl_save_crashing_pir();
> 
>          if (platform.terminate)
>                  platform.terminate(msg);
> 
>          for (;;) ;
> }
> 
> So I'm not really seeing how this is an improvement. The terminate
> callback is usually ipmi_terminate() which will also trigger an MPIPL
> on the systems which support it. That might be the cause of your boot
> loop?
> 
> Vasant, do you have anything to add?

Problem today is, it trigger MPIPL if it find `dump` node in DT. I think we 
should defer MPIPL enablement until our init is complete. I will fix this and 
send patch soon.

-Vasant
Vasant Hegde June 15, 2020, 6:19 a.m. UTC | #4
On 5/21/20 12:37 AM, Eric Richter wrote:
> On 5/11/20 10:49 PM, Oliver O'Halloran wrote:
>> On Mon, 2020-05-11 at 16:31 -0500, Eric Richter wrote:
>>> Halting the boot via an abort() call will cause the BMC to keep
>>> restarting the machine indefinitely.
>>
>> The BMC is supposed to detect the host bootlooping and stop powering it
>> on after 5 or so.
>>
>>> Ending via platform.terminate()
>>> should be cleaner and prevent needless bootloops.
>>
>> abort() is a #defined as assert(0) which results in a call to
>> assert_fail():
>>
>> void __noreturn assert_fail(const char *msg, const char *file,
>>                                  unsigned int line, const char *function)
>> {
>>          static bool in_abort = false;
>>
>>          (void)function;
>>          if (in_abort)
>>                  for (;;) ;
>>          in_abort = true;
>>
>>          /* snip comment */
>>          prlog(PR_EMERG, "assert failed at %s:%u: %s\n", file, line, msg);
>>          backtrace();
>>
>>          /* Save crashing CPU details */
>>          opal_mpipl_save_crashing_pir();
>>
>>          if (platform.terminate)
>>                  platform.terminate(msg);
>>
>>          for (;;) ;
>> }
>>
>> So I'm not really seeing how this is an improvement. The terminate
>> callback is usually ipmi_terminate() which will also trigger an MPIPL
>> on the systems which support it. That might be the cause of your boot
>> loop?
>>
> 
> Admittedly, I took the off-list suggestion directly, and didn't look deep into the terminate() behavior. Is there a better method to guaranteed halt the machine that could be used instead? There is no reason to attempt a reboot/MPIPL if a secureboot trap is hit.

I have posted patch to fix MPIPL behavior in boot path [1]. With that assert() 
should work fine.

[1] https://lists.ozlabs.org/pipermail/skiboot/2020-June/017075.html

Let me know if you still hit any issue with assert().

-Vasant
diff mbox series

Patch

diff --git a/libstb/secureboot.c b/libstb/secureboot.c
index c8697216..2a4b975e 100644
--- a/libstb/secureboot.c
+++ b/libstb/secureboot.c
@@ -27,7 +27,7 @@  static struct {
 	{ IBM_SECUREBOOT_V2, "ibm,secureboot-v2" },
 };
 
-static void secureboot_enforce(void)
+void secureboot_enforce(void)
 {
 	/* Sanity check */
 	if (!secure_mode)
@@ -39,8 +39,7 @@  static void secureboot_enforce(void)
 	 * extra info to BMC other than just abort.  Terminate Immediate
 	 * Attention ? (TI)
 	 */
-	prlog(PR_EMERG, "secure mode enforced, aborting.\n");
-	abort();
+	platform.terminate("secure mode enforced, aborting.\n");
 }
 
 bool secureboot_is_compatible(struct dt_node *node, int *version, const char **compat)
diff --git a/libstb/secureboot.h b/libstb/secureboot.h
index 0792dd5a..721b28de 100644
--- a/libstb/secureboot.h
+++ b/libstb/secureboot.h
@@ -15,6 +15,7 @@  enum secureboot_version {
 	IBM_SECUREBOOT_V2,
 };
 
+void secureboot_enforce(void);
 bool secureboot_is_compatible(struct dt_node *node, int *version, const char **compat);
 void secureboot_init(void);