diff mbox series

astbmc: Handle failure to initialise raw flash

Message ID 20190320061620.27598-1-andrew@aj.id.au
State Accepted
Headers show
Series astbmc: Handle failure to initialise raw flash | expand

Checks

Context Check Description
snowpatch_ozlabs/apply_patch success Successfully applied on branch master (b392d785eb49630b9f00fef8d17944ed82b2c1fe)
snowpatch_ozlabs/snowpatch_job_snowpatch-skiboot success Test snowpatch/job/snowpatch-skiboot on branch master
snowpatch_ozlabs/snowpatch_job_snowpatch-skiboot-dco success Signed-off-by present

Commit Message

Andrew Jeffery March 20, 2019, 6:16 a.m. UTC
Initialising raw flash lead to a dead assignment to rc. Check the return
code and take the failure path as necessary. Both before and after the
fix we see output along the lines of the following when flash_init()
fails:

[   53.283182881,7] IRQ: Registering 0800..0ff7 ops @0x300d4b98 (data 0x3052b9d8)
[   53.283184335,7] IRQ: Registering 0ff8..0fff ops @0x300d4bc8 (data 0x3052b9d8)
[   53.283185513,7] PHB#0000: Initializing PHB...
[   53.288260827,4] FLASH: Can't load resource id:0. No system flash found
[   53.288354442,4] FLASH: Can't load resource id:1. No system flash found
[   53.342933439,3] CAPP: Error loading ucode lid. index=200ea
[   53.462749486,2] NVRAM: Failed to load
[   53.462819095,2] NVRAM: Failed to load
[   53.462894236,2] NVRAM: Failed to load
[   53.462967071,2] NVRAM: Failed to load
[   53.463033077,2] NVRAM: Failed to load
[   53.463144847,2] NVRAM: Failed to load

Eventually followed by:

[   57.216942479,5] INIT: platform wait for kernel load failed
[   57.217051132,5] INIT: Assuming kernel at 0x20000000
[   57.217127508,3] INIT: ELF header not found. Assuming raw binary.
[   57.217249886,2] NVRAM: Failed to load
[   57.221294487,0] FATAL: Kernel is zeros, can't execute!
[   57.221397429,0] Assert fail: core/init.c:615:0
[   57.221471414,0] Aborting!
CPU 0028 Backtrace:
 S: 0000000031d43c60 R: 000000003001b274   ._abort+0x4c
 S: 0000000031d43ce0 R: 000000003001b2f0   .assert_fail+0x34
 S: 0000000031d43d60 R: 0000000030014814   .load_and_boot_kernel+0xae4
 S: 0000000031d43e30 R: 0000000030015164   .main_cpu_entry+0x680
 S: 0000000031d43f00 R: 0000000030002718   boot_entry+0x1c0
 --- OPAL boot ---

Analysis of the execution paths suggests we'll always "safely" end this
way due the setup sequence for the blocklevel callbacks in flash_init()
and error handling in blocklevel_get_info(), and there's no current risk
of executing from unexpected memory locations. As such the issue is
reduced to down to a fix for poor error hygene in the original change
and a resolution for a Coverity warning (famous last words etc).

Fixes: c826e1ca9e5b ("astbmc: Try IPMI HIOMAP for P8 (again)")
Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
---
 platforms/astbmc/pnor.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

Comments

Vasant Hegde March 20, 2019, 8:28 a.m. UTC | #1
On 03/20/2019 11:46 AM, Andrew Jeffery wrote:
> Initialising raw flash lead to a dead assignment to rc. Check the return
> code and take the failure path as necessary. Both before and after the
> fix we see output along the lines of the following when flash_init()
> fails:

If flash_init() fails, then we are sure we are going to crash at later stage.
May be we should assert() if falsh_init() fails. But that will change current flow.

Between I think this should go to Stable # 6.2.x. I will grab this patch for 
next stable release.

-Vasant
Andrew Jeffery March 20, 2019, 8:34 a.m. UTC | #2
On Wed, 20 Mar 2019, at 18:58, Vasant Hegde wrote:
> On 03/20/2019 11:46 AM, Andrew Jeffery wrote:
> > Initialising raw flash lead to a dead assignment to rc. Check the return
> > code and take the failure path as necessary. Both before and after the
> > fix we see output along the lines of the following when flash_init()
> > fails:
> 
> If flash_init() fails, then we are sure we are going to crash at later 
> stage.
> May be we should assert() if falsh_init() fails. But that will change 
> current flow.
> 
> Between I think this should go to Stable # 6.2.x. I will grab this patch for 
> next stable release.

Stewart and I were debating whether it should go to stable. If you're happy to pick it up I'm fine with that; we had concluded that it was somewhat cosmetic as things stand now.

Andrew

> 
> -Vasant
> 
>
Stewart Smith March 29, 2019, 4:35 a.m. UTC | #3
"Andrew Jeffery" <andrew@aj.id.au> writes:
> On Wed, 20 Mar 2019, at 18:58, Vasant Hegde wrote:
>> On 03/20/2019 11:46 AM, Andrew Jeffery wrote:
>> > Initialising raw flash lead to a dead assignment to rc. Check the return
>> > code and take the failure path as necessary. Both before and after the
>> > fix we see output along the lines of the following when flash_init()
>> > fails:
>> 
>> If flash_init() fails, then we are sure we are going to crash at later 
>> stage.
>> May be we should assert() if falsh_init() fails. But that will change 
>> current flow.
>> 
>> Between I think this should go to Stable # 6.2.x. I will grab this patch for 
>> next stable release.
>
> Stewart and I were debating whether it should go to stable. If you're
> happy to pick it up I'm fine with that; we had concluded that it was
> somewhat cosmetic as things stand now.

I've picked it up for master as of
6225d11924933f0541fd3b1a6192a7fcda279f1f.

The problem I mentioned on internal slack with Mambo and STB payloads
was actually coming from 8d8a9ca30453bc92977409966e6006f8c18f4be5 and
some poor bisecting / makefile dependencies. So I fixed it in merging
that and drank heavily as I remembered exactly how we work out where on
earth the payload is and how we jump to it.

I'd be deeply keen for us to simplify that code path and be very very
very predictable (maybe standardise on things being in the DT if we're
not loading from flash?). I think the biggest users of "kernel is
already in memory" are sim and BML.
Stewart Smith March 29, 2019, 4:59 a.m. UTC | #4
"Andrew Jeffery" <andrew@aj.id.au> writes:
> On Wed, 20 Mar 2019, at 18:58, Vasant Hegde wrote:
>> On 03/20/2019 11:46 AM, Andrew Jeffery wrote:
>> > Initialising raw flash lead to a dead assignment to rc. Check the return
>> > code and take the failure path as necessary. Both before and after the
>> > fix we see output along the lines of the following when flash_init()
>> > fails:
>> 
>> If flash_init() fails, then we are sure we are going to crash at later 
>> stage.
>> May be we should assert() if falsh_init() fails. But that will change 
>> current flow.
>> 
>> Between I think this should go to Stable # 6.2.x. I will grab this patch for 
>> next stable release.
>
> Stewart and I were debating whether it should go to stable. If you're
> happy to pick it up I'm fine with that; we had concluded that it was
> somewhat cosmetic as things stand now.

Oh, I meant to mention, I'm going to *completely* leave that decision up
to Vasant to bikeshed. I'd err on the side of probably not bringing it
to stable, but there's an argument to be had to be more conservative and
die reliably in the first place you know there's trouble if only to
avoid some crazy problem we haven't yet thought of/discovered.
diff mbox series

Patch

diff --git a/platforms/astbmc/pnor.c b/platforms/astbmc/pnor.c
index 1c364351e065..6c1d287220b4 100644
--- a/platforms/astbmc/pnor.c
+++ b/platforms/astbmc/pnor.c
@@ -79,8 +79,11 @@  int pnor_init(void)
 		goto fail;
 	}
 
-	if (style == raw_flash || style == raw_mem)
+	if (style == raw_flash || style == raw_mem) {
 	    rc = flash_init(pnor_ctrl, &bl, NULL);
+	    if (rc)
+		goto fail;
+	}
 
 	rc = flash_register(bl);
 	if (!rc)