lpc: Clear sync no-response field prior to device probe

Message ID 20181018075612.20530-1-andrew@aj.id.au
State Accepted

Checks

Context                        Check    Description
snowpatch_ozlabs/apply_patch   success  master/apply_patch Successfully applied
snowpatch_ozlabs/make_check    success  Test make_check on branch master

Commit Message

Andrew Jeffery Oct. 18, 2018, 7:56 a.m. UTC
Artem Senichev reported[1] his P8 platform was failing to boot from
a43e9a66aae9 ("astbmc: Fail SFC init if SIO is unavailable") with the
following error:

[  110.097168975,3] PLAT: Failed to open PNOR flash controller

I reproduced this behaviour on a Palmetto; we need to ensure the state
of the no-response error bit is clear before proceeding with the presence
test.

The fix appears to resolve the failure to open the PNOR flash controller
on Palmetto and doesn't change the expected behaviour on Witherspoon.

[1] https://github.com/open-power/skiboot/issues/197

Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
---
 hw/lpc.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)
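
To illustrate why a stale status bit can make the presence test misreport a device, here is a minimal, self-contained sketch. It is a toy model, not skiboot code: the `sim_*` names, the bit value, and the way presence is derived from the status bit are all assumptions made for illustration. It also assumes the status register is write-one-to-clear, which is what writing LPC_HC_IRQ_SYNC_NORESP_ERR to LPC_HC_IRQSTAT in this patch suggests.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit position, chosen for illustration only. */
#define SIM_SYNC_NORESP_ERR 0x1u

/* Toy model of the two host-controller registers involved. */
struct sim_lpchc {
	uint32_t irqmask;
	uint32_t irqstat;
};

/* Assumed write-one-to-clear: each set bit in val clears that status bit. */
static void sim_irqstat_write(struct sim_lpchc *hc, uint32_t val)
{
	hc->irqstat &= ~val;
}

/* Mask the interrupt; optionally also clear stale status, as the patch does. */
static void sim_probe_prepare(struct sim_lpchc *hc, int clear_stale)
{
	hc->irqmask &= ~SIM_SYNC_NORESP_ERR;
	if (clear_stale)
		sim_irqstat_write(hc, SIM_SYNC_NORESP_ERR);
}

/* Access the device, then report presence from the status bit. */
static int sim_device_present(struct sim_lpchc *hc, int device_responds)
{
	if (!device_responds)
		hc->irqstat |= SIM_SYNC_NORESP_ERR;
	return !(hc->irqstat & SIM_SYNC_NORESP_ERR);
}

/* Both cases start with an error bit left over from earlier activity. */
static void sim_selfcheck(void)
{
	struct sim_lpchc hc = {
		.irqmask = SIM_SYNC_NORESP_ERR,
		.irqstat = SIM_SYNC_NORESP_ERR,
	};

	/* Without the clear, a responding device is judged absent. */
	sim_probe_prepare(&hc, 0);
	assert(!sim_device_present(&hc, 1));

	/* With the clear, the same device is judged present. */
	hc.irqstat = SIM_SYNC_NORESP_ERR;
	sim_probe_prepare(&hc, 1);
	assert(sim_device_present(&hc, 1));
}
```

Calling `sim_selfcheck()` exercises both paths. In this model the only difference between the failing and passing case is whether the status bit is cleared before the device is accessed.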

Comments

Artem Senichev Oct. 18, 2018, 9:11 a.m. UTC | #1
On 18/10/2018 10:56, Andrew Jeffery wrote:
> Artem Senichev reported[1] his P8 platform was failing to boot from
> a43e9a66aae9 ("astbmc: Fail SFC init if SIO is unavailable") with the
> following error:
> 
> [  110.097168975,3] PLAT: Failed to open PNOR flash controller
> 
> I reproduced this behaviour on a Palmetto; we need to ensure the state
> of the no-response error bit is clear before proceeding with the presence
> test.
> 
> The fix appears to resolve the failure to open the PNOR flash controller
> on Palmetto and doesn't change the expected behaviour on Witherspoon.
> 
> [1] https://github.com/open-power/skiboot/issues/197
> 
> Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
> ---
>   hw/lpc.c | 7 ++++++-
>   1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/lpc.c b/hw/lpc.c
> index c55d47638ee9..20e54c99cd73 100644
> --- a/hw/lpc.c
> +++ b/hw/lpc.c
> @@ -473,6 +473,7 @@ static const struct lpc_error_entry lpc_error_table[] = {
>   static int64_t lpc_probe_prepare(struct lpcm *lpc)
>   {
>   	const uint32_t irqmask_addr = lpc_reg_opb_base + LPC_HC_IRQMASK;
> +	const uint32_t irqstat_addr = lpc_reg_opb_base + LPC_HC_IRQSTAT;
>   	uint32_t irqmask;
>   	int rc;
>   
> @@ -481,7 +482,11 @@ static int64_t lpc_probe_prepare(struct lpcm *lpc)
>   		return rc;
>   
>   	irqmask &= ~LPC_HC_IRQ_SYNC_NORESP_ERR;
> -	return opb_write(lpc, irqmask_addr, irqmask, 4);
> +	rc = opb_write(lpc, irqmask_addr, irqmask, 4);
> +	if (rc)
> +		return rc;
> +
> +	return opb_write(lpc, irqstat_addr, LPC_HC_IRQ_SYNC_NORESP_ERR, 4);
>   }
>   
>   static int64_t lpc_probe_test(struct lpcm *lpc)
> 
Tested-by: Artem Senichev <a.senichev@yadro.com>
Joel Stanley Oct. 25, 2018, 4:34 a.m. UTC | #2
On Thu, 18 Oct 2018 at 18:29, Andrew Jeffery <andrew@aj.id.au> wrote:
>
> Artem Senichev reported[1] his P8 platform was failing to boot from
> a43e9a66aae9 ("astbmc: Fail SFC init if SIO is unavailable") with the
> following error:
>
> [  110.097168975,3] PLAT: Failed to open PNOR flash controller

Yep, this bricked my Garrison:

[  118.463273159,3] PLAT: Failed to open PNOR flash controller
[  119.309908447,2] NVRAM: Failed to load
[  119.309999680,2] NVRAM: Failed to load
[  119.311393404,2] NVRAM: Failed to load
[  119.312534959,2] NVRAM: Failed to load
[  119.312612048,2] NVRAM: Failed to load

With your patch it looked okay. I did see this on the next boot:

[  138.038277373,7] OPAL: Start CPU 0x00ee (PIR 0x00ee) -> 0x000000000000a96c
[  138.047297723,7] OPAL: Start CPU 0x00ef (PIR 0x00ef) -> 0x000000000000a96c
[  138.097362833,3] LPC[000]: Got SYNC no-response error. Error address reg: 0xd001002f
[  138.097391393,6] IPMI: dropping non severe PEL event
[  138.100886198,7] UART: IRQ functional !
[  138.100939322,7] PHB#0009: Got interrupt 0x000057ff
[  138.112595579,7] IPMI Get Message Flags: 02
[  138.112995295,7] IPMI Get Message Flags: 02
[  138.113537667,7] IPMI read event 35 complete: 16 bytes. cc: 00
[  138.113542495,6] IPMI: dropping System Event Record SEL
[  138.113998903,7] IPMI: Got error response 0x80

>
> I reproduced this behaviour on a Palmetto; we need to ensure the state
> of the no-response error bit is clear before proceeding with the presence
> test.
>
> The fix appears to resolve the failure to open the PNOR flash controller
> on Palmetto and doesn't change the expected behaviour on Witherspoon.
>
> [1] https://github.com/open-power/skiboot/issues/197
>
> Signed-off-by: Andrew Jeffery <andrew@aj.id.au>

Tested-by: Joel Stanley <joel@jms.id.au>
Andrew Jeffery Oct. 25, 2018, 6:17 a.m. UTC | #3
On Thu, 25 Oct 2018, at 15:04, Joel Stanley wrote:
> On Thu, 18 Oct 2018 at 18:29, Andrew Jeffery <andrew@aj.id.au> wrote:
> >
> > Artem Senichev reported[1] his P8 platform was failing to boot from
> > a43e9a66aae9 ("astbmc: Fail SFC init if SIO is unavailable") with the
> > following error:
> >
> > [  110.097168975,3] PLAT: Failed to open PNOR flash controller
> 
> Yep, this bricked my Garrison:

Agh. Sorry.

> 
> [  118.463273159,3] PLAT: Failed to open PNOR flash controller
> [  119.309908447,2] NVRAM: Failed to load
> [  119.309999680,2] NVRAM: Failed to load
> [  119.311393404,2] NVRAM: Failed to load
> [  119.312534959,2] NVRAM: Failed to load
> [  119.312612048,2] NVRAM: Failed to load
> 
> With your patch it looked okay. I did see this on the next boot:
> 
> [  138.038277373,7] OPAL: Start CPU 0x00ee (PIR 0x00ee) -> 0x000000000000a96c
> [  138.047297723,7] OPAL: Start CPU 0x00ef (PIR 0x00ef) -> 0x000000000000a96c
> [  138.097362833,3] LPC[000]: Got SYNC no-response error. Error address reg: 0xd001002f
> [  138.097391393,6] IPMI: dropping non severe PEL event

Hmm, yeah, that's curious. I must admit I didn't get all the way to the bottom of the problem. The patch I sent fixes a correctness issue with the test, which happens to have the side effect of allowing the machine to boot. However, I don't know why the LPCHC is in this error state to begin with. I should look into that at some point.

> [  138.100886198,7] UART: IRQ functional !
> [  138.100939322,7] PHB#0009: Got interrupt 0x000057ff
> [  138.112595579,7] IPMI Get Message Flags: 02
> [  138.112995295,7] IPMI Get Message Flags: 02
> [  138.113537667,7] IPMI read event 35 complete: 16 bytes. cc: 00
> [  138.113542495,6] IPMI: dropping System Event Record SEL
> [  138.113998903,7] IPMI: Got error response 0x80
> 
> >
> > I reproduced this behaviour on a Palmetto; we need to ensure the state
> > of the no-response error bit is clear before proceeding with the presence
> > test.
> >
> > The fix appears to resolve the failure to open the PNOR flash controller
> > on Palmetto and doesn't change the expected behaviour on Witherspoon.
> >
> > [1] https://github.com/open-power/skiboot/issues/197
> >
> > Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
> 
> Tested-by: Joel Stanley <joel@jms.id.au>

Cheers

Andrew
Stewart Smith Oct. 25, 2018, 11:10 p.m. UTC | #4
Andrew Jeffery <andrew@aj.id.au> writes:
> Artem Senichev reported[1] his P8 platform was failing to boot from
> a43e9a66aae9 ("astbmc: Fail SFC init if SIO is unavailable") with the
> following error:
>
> [  110.097168975,3] PLAT: Failed to open PNOR flash controller
>
> I reproduced this behaviour on a Palmetto; we need to ensure the state
> of the no-response error bit is clear before proceeding with the presence
> test.
>
> The fix appears to resolve the failure to open the PNOR flash controller
> on Palmetto and doesn't change the expected behaviour on Witherspoon.
>
> [1] https://github.com/open-power/skiboot/issues/197
>
> Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
> ---
>  hw/lpc.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)

Thanks, and thanks to Artem and Joel for testing (and apologies for
breaking P8 things in the first place, I really need a regular OpenBMC
based P8 machine for testing).

Merged to master as of 7194e92cc700bfcc6f12f5fc12da06ef936bd2b8
Joel Stanley Oct. 25, 2018, 11:16 p.m. UTC | #5
On Fri, 26 Oct 2018 at 09:41, Stewart Smith <stewart@linux.ibm.com> wrote:
> Thanks, and thanks to Artem and Joel for testing (and apologies for
> breaking P8 things in the first place, I really need a regular OpenBMC
> based P8 machine for testing).

FWIW, this is not related to OpenBMC. I hit the failure on an AMI machine.

Cheers,

Joel
Stewart Smith Oct. 26, 2018, 5:55 a.m. UTC | #6
Joel Stanley <joel@jms.id.au> writes:
> On Fri, 26 Oct 2018 at 09:41, Stewart Smith <stewart@linux.ibm.com> wrote:
>> Thanks, and thanks to Artem and Joel for testing (and apologies for
>> breaking P8 things in the first place, I really need a regular OpenBMC
>> based P8 machine for testing).
>
> FWIW, this is not related to OpenBMC. I hit the failure on an AMI
> machine.

I just need firmware that can operate a host console reliably, and
that's not AMI BMC firmware.

Patch

diff --git a/hw/lpc.c b/hw/lpc.c
index c55d47638ee9..20e54c99cd73 100644
--- a/hw/lpc.c
+++ b/hw/lpc.c
@@ -473,6 +473,7 @@  static const struct lpc_error_entry lpc_error_table[] = {
 static int64_t lpc_probe_prepare(struct lpcm *lpc)
 {
 	const uint32_t irqmask_addr = lpc_reg_opb_base + LPC_HC_IRQMASK;
+	const uint32_t irqstat_addr = lpc_reg_opb_base + LPC_HC_IRQSTAT;
 	uint32_t irqmask;
 	int rc;
 
@@ -481,7 +482,11 @@  static int64_t lpc_probe_prepare(struct lpcm *lpc)
 		return rc;
 
 	irqmask &= ~LPC_HC_IRQ_SYNC_NORESP_ERR;
-	return opb_write(lpc, irqmask_addr, irqmask, 4);
+	rc = opb_write(lpc, irqmask_addr, irqmask, 4);
+	if (rc)
+		return rc;
+
+	return opb_write(lpc, irqstat_addr, LPC_HC_IRQ_SYNC_NORESP_ERR, 4);
 }
 
 static int64_t lpc_probe_test(struct lpcm *lpc)
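
The control flow of the patched lpc_probe_prepare() can be tried outside firmware with a small mock. The sketch below is an illustration under stated assumptions, not the skiboot implementation: `mock_bus`, `mock_write`, `mock_probe_prepare`, the register indices, and the bit value are hypothetical, and the status register is again assumed to be write-one-to-clear. It shows that a failed mask write is returned immediately and the status clear is never attempted.

```c
#include <stdint.h>

/* Hypothetical register indices and bit value, for illustration only. */
enum { MOCK_IRQMASK, MOCK_IRQSTAT, MOCK_NREGS };
#define MOCK_NORESP_BIT 0x1u

struct mock_bus {
	uint32_t regs[MOCK_NREGS];
	int fail_on;	/* register index whose write fails, or -1 for none */
	int writes;	/* number of writes attempted */
};

/* Stand-in for a bus write; returns 0 on success, -1 on injected failure. */
static int mock_write(struct mock_bus *b, int reg, uint32_t val)
{
	b->writes++;
	if (reg == b->fail_on)
		return -1;
	if (reg == MOCK_IRQSTAT)
		b->regs[reg] &= ~val;	/* assumed write-one-to-clear */
	else
		b->regs[reg] = val;
	return 0;
}

/* Same shape as the patched function: mask first, then clear stale status. */
static int mock_probe_prepare(struct mock_bus *b)
{
	uint32_t mask = b->regs[MOCK_IRQMASK] & ~MOCK_NORESP_BIT;
	int rc;

	rc = mock_write(b, MOCK_IRQMASK, mask);
	if (rc)
		return rc;

	return mock_write(b, MOCK_IRQSTAT, MOCK_NORESP_BIT);
}
```

With `fail_on` set to -1 both registers end up with the bit cleared after two writes. With `fail_on` set to `MOCK_IRQMASK` the function returns after a single write and the status bit is left untouched.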