Message ID | 20181018075612.20530-1-andrew@aj.id.au |
---|---|
State | Accepted |
Series | lpc: Clear sync no-response field prior to device probe |
Context | Check | Description |
---|---|---|
snowpatch_ozlabs/apply_patch | success | master/apply_patch Successfully applied |
snowpatch_ozlabs/make_check | success | Test make_check on branch master |
On 18/10/2018 10:56, Andrew Jeffery wrote:
> Artem Senichev reported[1] his P8 platform was failing to boot from
> a43e9a66aae9 ("astbmc: Fail SFC init if SIO is unavailable") with the
> following error:
>
> [ 110.097168975,3] PLAT: Failed to open PNOR flash controller
>
> I reproduced this behaviour on a Palmetto; we need to ensure the state
> of the no-response error bit is clear before proceeding with the presence
> test.
>
> The fix appears to resolve the failure to open the PNOR flash controller
> on Palmetto and doesn't change the expected behaviour on Witherspoon.
>
> [1] https://github.com/open-power/skiboot/issues/197
>
> Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
> ---
>  hw/lpc.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/hw/lpc.c b/hw/lpc.c
> index c55d47638ee9..20e54c99cd73 100644
> --- a/hw/lpc.c
> +++ b/hw/lpc.c
> @@ -473,6 +473,7 @@ static const struct lpc_error_entry lpc_error_table[] = {
>  static int64_t lpc_probe_prepare(struct lpcm *lpc)
>  {
>  	const uint32_t irqmask_addr = lpc_reg_opb_base + LPC_HC_IRQMASK;
> +	const uint32_t irqstat_addr = lpc_reg_opb_base + LPC_HC_IRQSTAT;
>  	uint32_t irqmask;
>  	int rc;
>
> @@ -481,7 +482,11 @@ static int64_t lpc_probe_prepare(struct lpcm *lpc)
>  		return rc;
>
>  	irqmask &= ~LPC_HC_IRQ_SYNC_NORESP_ERR;
> -	return opb_write(lpc, irqmask_addr, irqmask, 4);
> +	rc = opb_write(lpc, irqmask_addr, irqmask, 4);
> +	if (rc)
> +		return rc;
> +
> +	return opb_write(lpc, irqstat_addr, LPC_HC_IRQ_SYNC_NORESP_ERR, 4);
>  }
>
>  static int64_t lpc_probe_test(struct lpcm *lpc)
>

Tested-by: Artem Senichev <a.senichev@yadro.com>
On Thu, 18 Oct 2018 at 18:29, Andrew Jeffery <andrew@aj.id.au> wrote:
>
> Artem Senichev reported[1] his P8 platform was failing to boot from
> a43e9a66aae9 ("astbmc: Fail SFC init if SIO is unavailable") with the
> following error:
>
> [ 110.097168975,3] PLAT: Failed to open PNOR flash controller

Yep, this bricked my Garrison:

[ 118.463273159,3] PLAT: Failed to open PNOR flash controller
[ 119.309908447,2] NVRAM: Failed to load
[ 119.309999680,2] NVRAM: Failed to load
[ 119.311393404,2] NVRAM: Failed to load
[ 119.312534959,2] NVRAM: Failed to load
[ 119.312612048,2] NVRAM: Failed to load

With your patch it looked okay. I did see this on the next boot:

[ 138.038277373,7] OPAL: Start CPU 0x00ee (PIR 0x00ee) -> 0x000000000000a96c
[ 138.047297723,7] OPAL: Start CPU 0x00ef (PIR 0x00ef) -> 0x000000000000a96c
[ 138.097362833,3] LPC[000]: Got SYNC no-response error. Error address reg: 0xd001002f
[ 138.097391393,6] IPMI: dropping non severe PEL event
[ 138.100886198,7] UART: IRQ functional !
[ 138.100939322,7] PHB#0009: Got interrupt 0x000057ff
[ 138.112595579,7] IPMI Get Message Flags: 02
[ 138.112995295,7] IPMI Get Message Flags: 02
[ 138.113537667,7] IPMI read event 35 complete: 16 bytes. cc: 00
[ 138.113542495,6] IPMI: dropping System Event Record SEL
[ 138.113998903,7] IPMI: Got error response 0x80

>
> I reproduced this behaviour on a Palmetto; we need to ensure the state
> of the no-response error bit is clear before proceeding with the presence
> test.
>
> The fix appears to resolve the failure to open the PNOR flash controller
> on Palmetto and doesn't change the expected behaviour on Witherspoon.
>
> [1] https://github.com/open-power/skiboot/issues/197
>
> Signed-off-by: Andrew Jeffery <andrew@aj.id.au>

Tested-by: Joel Stanley <joel@jms.id.au>
On Thu, 25 Oct 2018, at 15:04, Joel Stanley wrote:
> On Thu, 18 Oct 2018 at 18:29, Andrew Jeffery <andrew@aj.id.au> wrote:
> >
> > Artem Senichev reported[1] his P8 platform was failing to boot from
> > a43e9a66aae9 ("astbmc: Fail SFC init if SIO is unavailable") with the
> > following error:
> >
> > [ 110.097168975,3] PLAT: Failed to open PNOR flash controller
>
> Yep, this bricked my Garrison:

Agh. Sorry.

>
> [ 118.463273159,3] PLAT: Failed to open PNOR flash controller
> [ 119.309908447,2] NVRAM: Failed to load
> [ 119.309999680,2] NVRAM: Failed to load
> [ 119.311393404,2] NVRAM: Failed to load
> [ 119.312534959,2] NVRAM: Failed to load
> [ 119.312612048,2] NVRAM: Failed to load
>
> With your patch it looked okay. I did see this on the next boot:
>
> [ 138.038277373,7] OPAL: Start CPU 0x00ee (PIR 0x00ee) -> 0x000000000000a96c
> [ 138.047297723,7] OPAL: Start CPU 0x00ef (PIR 0x00ef) -> 0x000000000000a96c
> [ 138.097362833,3] LPC[000]: Got SYNC no-response error. Error address reg: 0xd001002f
> [ 138.097391393,6] IPMI: dropping non severe PEL event

Hmm, yeah that's curious. I must admit I didn't get all the way to the
bottom of the problem; the patch I sent fixes a correctness issue with
the test, which happens to have the side-effect of allowing the machine
to boot. However, I don't know why the LPCHC is in this error state to
begin with. I should look into that at some point.

> [ 138.100886198,7] UART: IRQ functional !
> [ 138.100939322,7] PHB#0009: Got interrupt 0x000057ff
> [ 138.112595579,7] IPMI Get Message Flags: 02
> [ 138.112995295,7] IPMI Get Message Flags: 02
> [ 138.113537667,7] IPMI read event 35 complete: 16 bytes. cc: 00
> [ 138.113542495,6] IPMI: dropping System Event Record SEL
> [ 138.113998903,7] IPMI: Got error response 0x80
>
> >
> > I reproduced this behaviour on a Palmetto; we need to ensure the state
> > of the no-response error bit is clear before proceeding with the presence
> > test.
> >
> > The fix appears to resolve the failure to open the PNOR flash controller
> > on Palmetto and doesn't change the expected behaviour on Witherspoon.
> >
> > [1] https://github.com/open-power/skiboot/issues/197
> >
> > Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
>
> Tested-by: Joel Stanley <joel@jms.id.au>

Cheers

Andrew
Andrew Jeffery <andrew@aj.id.au> writes:
> Artem Senichev reported[1] his P8 platform was failing to boot from
> a43e9a66aae9 ("astbmc: Fail SFC init if SIO is unavailable") with the
> following error:
>
> [ 110.097168975,3] PLAT: Failed to open PNOR flash controller
>
> I reproduced this behaviour on a Palmetto; we need to ensure the state
> of the no-response error bit is clear before proceeding with the presence
> test.
>
> The fix appears to resolve the failure to open the PNOR flash controller
> on Palmetto and doesn't change the expected behaviour on Witherspoon.
>
> [1] https://github.com/open-power/skiboot/issues/197
>
> Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
> ---
>  hw/lpc.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)

Thanks, and thanks to Artem and Joel for testing (and apologies for
breaking P8 things in the first place, I really need a regular OpenBMC
based P8 machine for testing).

Merged to master as of 7194e92cc700bfcc6f12f5fc12da06ef936bd2b8
On Fri, 26 Oct 2018 at 09:41, Stewart Smith <stewart@linux.ibm.com> wrote:
> Thanks, and thanks to Artem and Joel for testing (and apologies for
> breaking P8 things in the first place, I really need a regular OpenBMC
> based P8 machine for testing).

FWIW, this is not related to OpenBMC. I hit the failure on an AMI machine.

Cheers,

Joel
Joel Stanley <joel@jms.id.au> writes:
> On Fri, 26 Oct 2018 at 09:41, Stewart Smith <stewart@linux.ibm.com> wrote:
>> Thanks, and thanks to Artem and Joel for testing (and apologies for
>> breaking P8 things in the first place, I really need a regular OpenBMC
>> based P8 machine for testing).
>
> FWIW, this is not related to OpenBMC. I hit the failure on an AMI
> machine.

I just need firmware that can operate a host console reliably, and
that's not AMI BMC firmware.
diff --git a/hw/lpc.c b/hw/lpc.c
index c55d47638ee9..20e54c99cd73 100644
--- a/hw/lpc.c
+++ b/hw/lpc.c
@@ -473,6 +473,7 @@ static const struct lpc_error_entry lpc_error_table[] = {
 static int64_t lpc_probe_prepare(struct lpcm *lpc)
 {
 	const uint32_t irqmask_addr = lpc_reg_opb_base + LPC_HC_IRQMASK;
+	const uint32_t irqstat_addr = lpc_reg_opb_base + LPC_HC_IRQSTAT;
 	uint32_t irqmask;
 	int rc;

@@ -481,7 +482,11 @@ static int64_t lpc_probe_prepare(struct lpcm *lpc)
 		return rc;

 	irqmask &= ~LPC_HC_IRQ_SYNC_NORESP_ERR;
-	return opb_write(lpc, irqmask_addr, irqmask, 4);
+	rc = opb_write(lpc, irqmask_addr, irqmask, 4);
+	if (rc)
+		return rc;
+
+	return opb_write(lpc, irqstat_addr, LPC_HC_IRQ_SYNC_NORESP_ERR, 4);
 }

 static int64_t lpc_probe_test(struct lpcm *lpc)
Artem Senichev reported[1] his P8 platform was failing to boot from
a43e9a66aae9 ("astbmc: Fail SFC init if SIO is unavailable") with the
following error:

[ 110.097168975,3] PLAT: Failed to open PNOR flash controller

I reproduced this behaviour on a Palmetto; we need to ensure the state
of the no-response error bit is clear before proceeding with the presence
test.

The fix appears to resolve the failure to open the PNOR flash controller
on Palmetto and doesn't change the expected behaviour on Witherspoon.

[1] https://github.com/open-power/skiboot/issues/197

Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
---
 hw/lpc.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)
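
For readers without the hardware, the reasoning in the commit message can be sketched as a small host-side simulation. This is not skiboot code: the sim_* names, the struct layout, and the bit value are all hypothetical, and the status register is assumed to be sticky and write-1-to-clear, which is what the patch's write of LPC_HC_IRQ_SYNC_NORESP_ERR to LPC_HC_IRQSTAT suggests. The sketch shows how a stale no-response bit makes a present device look absent unless it is cleared before the probe access.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit position, chosen for illustration only. */
#define SIM_IRQ_SYNC_NORESP_ERR 0x00400000u

/* Simulated LPC host controller state (not the real register layout). */
struct sim_lpc {
	uint32_t irqmask;
	uint32_t irqstat;	/* assumed sticky, write-1-to-clear */
	int device_present;
};

/* Plain register write for the mask. */
static void sim_write_irqmask(struct sim_lpc *lpc, uint32_t val)
{
	lpc->irqmask = val;
}

/* Assumed write-1-to-clear: each 1 bit written clears that status bit. */
static void sim_write_irqstat(struct sim_lpc *lpc, uint32_t val)
{
	lpc->irqstat &= ~val;
}

/* A bus access to an absent device latches the no-response bit. */
static void sim_access(struct sim_lpc *lpc)
{
	if (!lpc->device_present)
		lpc->irqstat |= SIM_IRQ_SYNC_NORESP_ERR;
}

/*
 * Mirrors the shape of the patched lpc_probe_prepare(): mask the
 * interrupt, then (optionally) clear any stale status bit.
 */
static void sim_probe_prepare(struct sim_lpc *lpc, int clear_stale)
{
	sim_write_irqmask(lpc, lpc->irqmask & ~SIM_IRQ_SYNC_NORESP_ERR);
	if (clear_stale)
		sim_write_irqstat(lpc, SIM_IRQ_SYNC_NORESP_ERR);
}

/* Returns 1 if the device appears present, 0 otherwise. */
static int sim_probe(struct sim_lpc *lpc, int clear_stale)
{
	assert(lpc != NULL);
	sim_probe_prepare(lpc, clear_stale);
	sim_access(lpc);
	return !(lpc->irqstat & SIM_IRQ_SYNC_NORESP_ERR);
}
```

In this model, a controller that starts with the status bit already set and a device that is actually present reports absent from sim_probe(&lpc, 0) and present from sim_probe(&lpc, 1). As noted in the thread, why the real controller is in that error state to begin with was not determined.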