Message ID | 20190320061620.27598-1-andrew@aj.id.au |
---|---|
State | Accepted |
Headers | show |
Series | astbmc: Handle failure to initialise raw flash | expand |
Context | Check | Description |
---|---|---|
snowpatch_ozlabs/apply_patch | success | Successfully applied on branch master (b392d785eb49630b9f00fef8d17944ed82b2c1fe) |
snowpatch_ozlabs/snowpatch_job_snowpatch-skiboot | success | Test snowpatch/job/snowpatch-skiboot on branch master |
snowpatch_ozlabs/snowpatch_job_snowpatch-skiboot-dco | success | Signed-off-by present |
On 03/20/2019 11:46 AM, Andrew Jeffery wrote: > Initialising raw flash lead to a dead assignment to rc. Check the return > code and take the failure path as necessary. Both before and after the > fix we see output along the lines of the following when flash_init() > fails: If flash_init() fails, then we are sure we are going to crash at later stage. May be we should assert() if falsh_init() fails. But that will change current flow. Between I think this should go to Stable # 6.2.x. I will grab this patch for next stable release. -Vasant
On Wed, 20 Mar 2019, at 18:58, Vasant Hegde wrote: > On 03/20/2019 11:46 AM, Andrew Jeffery wrote: > > Initialising raw flash lead to a dead assignment to rc. Check the return > > code and take the failure path as necessary. Both before and after the > > fix we see output along the lines of the following when flash_init() > > fails: > > If flash_init() fails, then we are sure we are going to crash at later > stage. > May be we should assert() if falsh_init() fails. But that will change > current flow. > > Between I think this should go to Stable # 6.2.x. I will grab this patch for > next stable release. Stewart and I were debating whether it should go to stable. If you're happy to pick it up I'm fine with that; we had concluded that it was somewhat cosmetic as things stand now. Andrew > > -Vasant > >
"Andrew Jeffery" <andrew@aj.id.au> writes: > On Wed, 20 Mar 2019, at 18:58, Vasant Hegde wrote: >> On 03/20/2019 11:46 AM, Andrew Jeffery wrote: >> > Initialising raw flash lead to a dead assignment to rc. Check the return >> > code and take the failure path as necessary. Both before and after the >> > fix we see output along the lines of the following when flash_init() >> > fails: >> >> If flash_init() fails, then we are sure we are going to crash at later >> stage. >> May be we should assert() if falsh_init() fails. But that will change >> current flow. >> >> Between I think this should go to Stable # 6.2.x. I will grab this patch for >> next stable release. > > Stewart and I were debating whether it should go to stable. If you're > happy to pick it up I'm fine with that; we had concluded that it was > somewhat cosmetic as things stand now. I've picked it up for master as of 6225d11924933f0541fd3b1a6192a7fcda279f1f. The problem I mentioned on internal slack with Mambo and STB payloads was actually coming from 8d8a9ca30453bc92977409966e6006f8c18f4be5 and some poor bisecting / makefile dependencies. So I fixed it in merging that and drank heavily as I remembered exactly how we work out where on earth the payload is and how we jump to it. I'd be deeply keen for us to simplify that code path and be very very very predictable (maybe standardise on things being in the DT if we're not loading from flash?). I think the biggest users of "kernel is already in memory" are sim and BML.
"Andrew Jeffery" <andrew@aj.id.au> writes: > On Wed, 20 Mar 2019, at 18:58, Vasant Hegde wrote: >> On 03/20/2019 11:46 AM, Andrew Jeffery wrote: >> > Initialising raw flash lead to a dead assignment to rc. Check the return >> > code and take the failure path as necessary. Both before and after the >> > fix we see output along the lines of the following when flash_init() >> > fails: >> >> If flash_init() fails, then we are sure we are going to crash at later >> stage. >> May be we should assert() if falsh_init() fails. But that will change >> current flow. >> >> Between I think this should go to Stable # 6.2.x. I will grab this patch for >> next stable release. > > Stewart and I were debating whether it should go to stable. If you're > happy to pick it up I'm fine with that; we had concluded that it was > somewhat cosmetic as things stand now. Oh, I meant to mention, I'm going to *completely* leave that decision up to Vasant to bikeshed. I'd err on the side of probably not bringing it to stable, but there's an argument to be had to be more conservative and die reliably in the first place you know there's trouble if only to avoid some crazy problem we haven't yet thought of/discovered.
diff --git a/platforms/astbmc/pnor.c b/platforms/astbmc/pnor.c index 1c364351e065..6c1d287220b4 100644 --- a/platforms/astbmc/pnor.c +++ b/platforms/astbmc/pnor.c @@ -79,8 +79,11 @@ int pnor_init(void) goto fail; } - if (style == raw_flash || style == raw_mem) + if (style == raw_flash || style == raw_mem) { rc = flash_init(pnor_ctrl, &bl, NULL); + if (rc) + goto fail; + } rc = flash_register(bl); if (!rc)
Initialising raw flash lead to a dead assignment to rc. Check the return code and take the failure path as necessary. Both before and after the fix we see output along the lines of the following when flash_init() fails: [ 53.283182881,7] IRQ: Registering 0800..0ff7 ops @0x300d4b98 (data 0x3052b9d8) [ 53.283184335,7] IRQ: Registering 0ff8..0fff ops @0x300d4bc8 (data 0x3052b9d8) [ 53.283185513,7] PHB#0000: Initializing PHB... [ 53.288260827,4] FLASH: Can't load resource id:0. No system flash found [ 53.288354442,4] FLASH: Can't load resource id:1. No system flash found [ 53.342933439,3] CAPP: Error loading ucode lid. index=200ea [ 53.462749486,2] NVRAM: Failed to load [ 53.462819095,2] NVRAM: Failed to load [ 53.462894236,2] NVRAM: Failed to load [ 53.462967071,2] NVRAM: Failed to load [ 53.463033077,2] NVRAM: Failed to load [ 53.463144847,2] NVRAM: Failed to load Eventually followed by: [ 57.216942479,5] INIT: platform wait for kernel load failed [ 57.217051132,5] INIT: Assuming kernel at 0x20000000 [ 57.217127508,3] INIT: ELF header not found. Assuming raw binary. [ 57.217249886,2] NVRAM: Failed to load [ 57.221294487,0] FATAL: Kernel is zeros, can't execute! [ 57.221397429,0] Assert fail: core/init.c:615:0 [ 57.221471414,0] Aborting! CPU 0028 Backtrace: S: 0000000031d43c60 R: 000000003001b274 ._abort+0x4c S: 0000000031d43ce0 R: 000000003001b2f0 .assert_fail+0x34 S: 0000000031d43d60 R: 0000000030014814 .load_and_boot_kernel+0xae4 S: 0000000031d43e30 R: 0000000030015164 .main_cpu_entry+0x680 S: 0000000031d43f00 R: 0000000030002718 boot_entry+0x1c0 --- OPAL boot --- Analysis of the execution paths suggests we'll always "safely" end this way due the setup sequence for the blocklevel callbacks in flash_init() and error handling in blocklevel_get_info(), and there's no current risk of executing from unexpected memory locations. As such the issue is reduced to down to a fix for poor error hygene in the original change and a resolution for a Coverity warning (famous last words etc). Fixes: c826e1ca9e5b ("astbmc: Try IPMI HIOMAP for P8 (again)") Signed-off-by: Andrew Jeffery <andrew@aj.id.au> --- platforms/astbmc/pnor.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)