powerpc/fsl: Add support for pci(e) machine check exception on E500MC / E5500

Message ID 1412009312-7400-1-git-send-email-linux@roeck-us.net (mailing list archive)
State Changes Requested
Delegated to: Scott Wood

Commit Message

Guenter Roeck Sept. 29, 2014, 4:48 p.m. UTC
From: Jojy G Varghese <jojyv@juniper.net>

For E500MC and E5500, a machine check exception in pci(e) memory space
crashes the kernel.

Testing shows that the MCAR(U) register is zero on an MC exception for the
E5500 core. At the same time, the DEAR register has been found to hold the
address of the faulting load during an MC exception on this core.

This fix changes the current behavior to fix up the result register
and instruction pointer in the case of a load operation on a faulty
PCI address.

The changes are:
- Added the hook to pci machine check handling to the e500mc machine check
  exception handler.
- For the E5500 core, load faulting address from SPRN_DEAR register.
  As mentioned above, this is necessary because the E5500 core does not
  report the fault address in the MCAR register.

Cc: Scott Wood <scottwood@freescale.com>
Signed-off-by: Jojy G Varghese <jojyv@juniper.net>
[Guenter Roeck: updated description]
Signed-off-by: Guenter Roeck <groeck@juniper.net>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
---
 arch/powerpc/kernel/traps.c   | 3 ++-
 arch/powerpc/sysdev/fsl_pci.c | 5 +++++
 2 files changed, 7 insertions(+), 1 deletion(-)
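
For readers without the diff at hand: the traps.c change amounts to chaining a
second fixup handler behind the existing RapidIO one. The following is a
minimal user-space sketch of that short-circuit pattern, not the kernel code.
`struct fake_regs`, the two stub handlers, and the address windows are all
made up for illustration.

```c
/* Hypothetical stand-in for the kernel's struct pt_regs. */
struct fake_regs {
	unsigned long fault_addr;
};

/*
 * Each stub returns 1 if it recognises the faulting address and "fixes it
 * up", 0 otherwise. The address windows are illustrative only.
 */
static int rio_handler(struct fake_regs *regs)
{
	return regs->fault_addr >= 0xa0000000UL &&
	       regs->fault_addr < 0xb0000000UL;
}

static int pci_handler(struct fake_regs *regs)
{
	return regs->fault_addr >= 0xc0000000UL &&
	       regs->fault_addr < 0xe0000000UL;
}

/*
 * Same shape as "a(regs) || b(regs)" in the patch: the second handler only
 * runs if the first one did not claim the fault, and the result is 0 or 1.
 */
static int load_error_recoverable(struct fake_regs *regs)
{
	return rio_handler(regs) || pci_handler(regs);
}
```

Because `||` short-circuits, the PCI handler in this sketch is never consulted
for a fault the RapidIO handler already claimed.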

Comments

Scott Wood Sept. 29, 2014, 6:36 p.m. UTC | #1
On Mon, 2014-09-29 at 09:48 -0700, Guenter Roeck wrote:
> From: Jojy G Varghese <jojyv@juniper.net>
> 
> For E500MC and E5500, a machine check exception in pci(e) memory space
> crashes the kernel.
> 
> Testing shows that the MCAR(U) register is zero on a MC exception for the
> E5500 core. At the same time, DEAR register has been found to have the
> address of the faulty load address during an MC exception for this core.
> 
> This fix changes the current behavior to fixup the result register
> and instruction pointers in the case of a load operation on a faulty
> PCI address.
> 
> The changes are:
> - Added the hook to pci machine check handing to the e500mc machine check
>   exception handler.
> - For the E5500 core, load faulting address from SPRN_DEAR register.
>   As mentioned above, this is necessary because the E5500 core does not
>   report the fault address in the MCAR register.
> 
> Cc: Scott Wood <scottwood@freescale.com>
> Signed-off-by: Jojy G Varghese <jojyv@juniper.net>
> [Guenter Roeck: updated description]
> Signed-off-by: Guenter Roeck <groeck@juniper.net>
> Signed-off-by: Guenter Roeck <linux@roeck-us.net>
> ---
>  arch/powerpc/kernel/traps.c   | 3 ++-
>  arch/powerpc/sysdev/fsl_pci.c | 5 +++++
>  2 files changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
> index 0dc43f9..ecb709b 100644
> --- a/arch/powerpc/kernel/traps.c
> +++ b/arch/powerpc/kernel/traps.c
> @@ -494,7 +494,8 @@ int machine_check_e500mc(struct pt_regs *regs)
>  	int recoverable = 1;
>  
>  	if (reason & MCSR_LD) {
> -		recoverable = fsl_rio_mcheck_exception(regs);
> +		recoverable = fsl_rio_mcheck_exception(regs) ||
> +			fsl_pci_mcheck_exception(regs);
>  		if (recoverable == 1)
>  			goto silent_out;
>  	}
> diff --git a/arch/powerpc/sysdev/fsl_pci.c b/arch/powerpc/sysdev/fsl_pci.c
> index c507767..bdb956b 100644
> --- a/arch/powerpc/sysdev/fsl_pci.c
> +++ b/arch/powerpc/sysdev/fsl_pci.c
> @@ -1021,6 +1021,11 @@ int fsl_pci_mcheck_exception(struct pt_regs *regs)
>  #endif
>  	addr += mfspr(SPRN_MCAR);
>  
> +#ifdef CONFIG_E5500_CPU
> +	if (mfspr(SPRN_EPCR) & SPRN_EPCR_ICM)
> +		addr = PFN_PHYS(vmalloc_to_pfn((void *)mfspr(SPRN_DEAR)));
> +#endif

Kconfig tells you what hardware is supported, not what hardware you're
actually running on.

Jia Hongtao, do you know anything about this issue?  Is there an
erratum?  What chips are affected by the erratum covered by
<http://patchwork.ozlabs.org/patch/240239/>?

Can we rely on DEAR or is this just a side effect of likely having taken
a TLB miss for the address recently?  Perhaps we should use the
instruction emulation to determine the effective address instead.

Guenter, is this patch intended to deal with an erratum or are you
covering up legitimate errors?

-Scott
Guenter Roeck Sept. 29, 2014, 7:06 p.m. UTC | #2
On Mon, Sep 29, 2014 at 01:36:06PM -0500, Scott Wood wrote:
> On Mon, 2014-09-29 at 09:48 -0700, Guenter Roeck wrote:
> > From: Jojy G Varghese <jojyv@juniper.net>
> > 
> > For E500MC and E5500, a machine check exception in pci(e) memory space
> > crashes the kernel.
> > 
> > Testing shows that the MCAR(U) register is zero on a MC exception for the
> > E5500 core. At the same time, DEAR register has been found to have the
> > address of the faulty load address during an MC exception for this core.
> > 
> > This fix changes the current behavior to fixup the result register
> > and instruction pointers in the case of a load operation on a faulty
> > PCI address.
> > 
> > The changes are:
> > - Added the hook to pci machine check handing to the e500mc machine check
> >   exception handler.
> > - For the E5500 core, load faulting address from SPRN_DEAR register.
> >   As mentioned above, this is necessary because the E5500 core does not
> >   report the fault address in the MCAR register.
> > 
> > Cc: Scott Wood <scottwood@freescale.com>
> > Signed-off-by: Jojy G Varghese <jojyv@juniper.net>
> > [Guenter Roeck: updated description]
> > Signed-off-by: Guenter Roeck <groeck@juniper.net>
> > Signed-off-by: Guenter Roeck <linux@roeck-us.net>
> > ---
> >  arch/powerpc/kernel/traps.c   | 3 ++-
> >  arch/powerpc/sysdev/fsl_pci.c | 5 +++++
> >  2 files changed, 7 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
> > index 0dc43f9..ecb709b 100644
> > --- a/arch/powerpc/kernel/traps.c
> > +++ b/arch/powerpc/kernel/traps.c
> > @@ -494,7 +494,8 @@ int machine_check_e500mc(struct pt_regs *regs)
> >  	int recoverable = 1;
> >  
> >  	if (reason & MCSR_LD) {
> > -		recoverable = fsl_rio_mcheck_exception(regs);
> > +		recoverable = fsl_rio_mcheck_exception(regs) ||
> > +			fsl_pci_mcheck_exception(regs);
> >  		if (recoverable == 1)
> >  			goto silent_out;
> >  	}
> > diff --git a/arch/powerpc/sysdev/fsl_pci.c b/arch/powerpc/sysdev/fsl_pci.c
> > index c507767..bdb956b 100644
> > --- a/arch/powerpc/sysdev/fsl_pci.c
> > +++ b/arch/powerpc/sysdev/fsl_pci.c
> > @@ -1021,6 +1021,11 @@ int fsl_pci_mcheck_exception(struct pt_regs *regs)
> >  #endif
> >  	addr += mfspr(SPRN_MCAR);
> >  
> > +#ifdef CONFIG_E5500_CPU
> > +	if (mfspr(SPRN_EPCR) & SPRN_EPCR_ICM)
> > +		addr = PFN_PHYS(vmalloc_to_pfn((void *)mfspr(SPRN_DEAR)));
> > +#endif
> 
> Kconfig tells you what hardware is supported, not what hardware you're
> actually running on.
> 
Hi Scott,

Good point. Jojy, I guess we'll have to check if the CPU is actually an E5500.
Can you look into that?

> Jia Hongtao, do you know anything about this issue?  Is there an
> erratum?  What chips are affected by the the erratum covered by
> <http://patchwork.ozlabs.org/patch/240239/>?
> 
We already have and use the above patch(es) in our kernel. It works fine
for E500 (P2020), but does not address E5500 (P5020/P5040).

> Can we rely on DEAR or is this just a side effect of likely having taken
> a TLB miss for the address recently?  Perhaps we should use the
> instruction emulation to determine the effective address instead.
> 
> Guenter, is this patch intended to deal with an erratum or are you
> covering up legitimate errors?
> 
Those are errors related to PCIe hotplug, and are seen with unexpected PCIe
device removals (triggered, for example, by removing power from a PCIe adapter).
The behavior we see on E5500 is quite similar to the same behavior on E500:
If unhandled, the CPU keeps executing the same instruction over and over again
if there is an error on a PCIe access and thus stalls. I don't know if this
is considered an erratum or expected behavior, but it is one we have to address
since we have to be able to handle that condition. Ultimately, we'll want to
implement PCIe error handlers for the affected drivers, but that will be a next
step.

Please let me know if you have a better solution to address this problem.

Thanks,
Guenter
Jojy Varghese Sept. 29, 2014, 11:03 p.m. UTC | #3
On 9/29/14 12:06 PM, "Guenter Roeck" <linux@roeck-us.net> wrote:

>On Mon, Sep 29, 2014 at 01:36:06PM -0500, Scott Wood wrote:
>> On Mon, 2014-09-29 at 09:48 -0700, Guenter Roeck wrote:
>> > From: Jojy G Varghese <jojyv@juniper.net>
>> >
>> > For E500MC and E5500, a machine check exception in pci(e) memory space
>> > crashes the kernel.
>> >
>> > Testing shows that the MCAR(U) register is zero on a MC exception for the
>> > E5500 core. At the same time, DEAR register has been found to have the
>> > address of the faulty load address during an MC exception for this core.
>> >
>> > This fix changes the current behavior to fixup the result register
>> > and instruction pointers in the case of a load operation on a faulty
>> > PCI address.
>> >
>> > The changes are:
>> > - Added the hook to pci machine check handing to the e500mc machine check
>> >   exception handler.
>> > - For the E5500 core, load faulting address from SPRN_DEAR register.
>> >   As mentioned above, this is necessary because the E5500 core does not
>> >   report the fault address in the MCAR register.
>> >
>> > Cc: Scott Wood <scottwood@freescale.com>
>> > Signed-off-by: Jojy G Varghese <jojyv@juniper.net>
>> > [Guenter Roeck: updated description]
>> > Signed-off-by: Guenter Roeck <groeck@juniper.net>
>> > Signed-off-by: Guenter Roeck <linux@roeck-us.net>
>> > ---
>> >  arch/powerpc/kernel/traps.c   | 3 ++-
>> >  arch/powerpc/sysdev/fsl_pci.c | 5 +++++
>> >  2 files changed, 7 insertions(+), 1 deletion(-)
>> >
>> > diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
>> > index 0dc43f9..ecb709b 100644
>> > --- a/arch/powerpc/kernel/traps.c
>> > +++ b/arch/powerpc/kernel/traps.c
>> > @@ -494,7 +494,8 @@ int machine_check_e500mc(struct pt_regs *regs)
>> >  	int recoverable = 1;
>> >  
>> >  	if (reason & MCSR_LD) {
>> > -		recoverable = fsl_rio_mcheck_exception(regs);
>> > +		recoverable = fsl_rio_mcheck_exception(regs) ||
>> > +			fsl_pci_mcheck_exception(regs);
>> >  		if (recoverable == 1)
>> >  			goto silent_out;
>> >  	}
>> > diff --git a/arch/powerpc/sysdev/fsl_pci.c b/arch/powerpc/sysdev/fsl_pci.c
>> > index c507767..bdb956b 100644
>> > --- a/arch/powerpc/sysdev/fsl_pci.c
>> > +++ b/arch/powerpc/sysdev/fsl_pci.c
>> > @@ -1021,6 +1021,11 @@ int fsl_pci_mcheck_exception(struct pt_regs *regs)
>> >  #endif
>> >  	addr += mfspr(SPRN_MCAR);
>> >  
>> > +#ifdef CONFIG_E5500_CPU
>> > +	if (mfspr(SPRN_EPCR) & SPRN_EPCR_ICM)
>> > +		addr = PFN_PHYS(vmalloc_to_pfn((void *)mfspr(SPRN_DEAR)));
>> > +#endif
>>
>> Kconfig tells you what hardware is supported, not what hardware you're
>> actually running on.
>>
>Hi Scott,
>
>Good point. Jojy, guess we'll have to check if the CPU is actually an E5500.
>Can you look into that ?

"/proc/cpuinfo" shows the CPU as "e5500". Scott, are you suggesting that
we use a runtime method of determining the CPU type (cpu_spec's cpu_name,
for example)?
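
One possible shape for such a runtime check is to decode the Processor Version
Register instead of relying on Kconfig. The sketch below is user-space C, so
the PVR is passed in as a plain integer rather than read with mfspr(). The
version constants are what I believe arch/powerpc/include/asm/reg.h defines
for these cores and should be verified there; the helper names are made up.

```c
#include <stdint.h>

/*
 * Assumed PVR version fields for the Freescale cores discussed here;
 * check against asm/reg.h before relying on them.
 */
#define PVR_VER_E500MC	0x8023u
#define PVR_VER_E5500	0x8024u

/* Upper 16 bits of the PVR identify the core, lower 16 bits the revision. */
static inline uint32_t pvr_ver(uint32_t pvr)
{
	return (pvr >> 16) & 0xffffu;
}

static inline uint32_t pvr_rev(uint32_t pvr)
{
	return pvr & 0xffffu;
}

/* Hypothetical helper: true only when the running core reports e5500. */
static int core_is_e5500(uint32_t pvr)
{
	return pvr_ver(pvr) == PVR_VER_E5500;
}
```

A check like this depends on the core the kernel is actually running on, not
on which CPU optimization option the kernel was built with.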


>
>> Jia Hongtao, do you know anything about this issue?  Is there an
>> erratum?  What chips are affected by the the erratum covered by
>> <http://patchwork.ozlabs.org/patch/240239/>?
>>
>We already have and use the above patch(es) in our kernel. It works fine
>for E500 (P2020), but does not address E5500 (P5020/P5040).
>
>> Can we rely on DEAR or is this just a side effect of likely having taken
>> a TLB miss for the address recently?  Perhaps we should use the
>> instruction emulation to determine the effective address instead.
>>
>> Guenter, is this patch intended to deal with an erratum or are you
>> covering up legitimate errors?
>>
>Those are errors related to PCIe hotplug, and are seen with unexpected PCIe
>device removals (triggered, for example, by removing power from a PCIe adapter).
>The behavior we see on E5500 is quite similar to the same behavior on E500:
>If unhandled, the CPU keeps executing the same instruction over and over again
>if there is an error on a PCIe access and thus stalls. I don't know if this
>is considered an erratum or expected behavior, but it is one we have to address
>since we have to be able to handle that condition. Ultimately, we'll want to
>implement PCIe error handlers for the affected drivers, but that will be a next
>step.


According to the spec, MCAR is supposed to hold the faulty data address,
but for the E5500 core we found that MCAR is zero. You are right that the
DEAR entry could be the result of a TLB miss, but that's the register we
could rely on.

What do you mean by "instruction emulation"? Are you suggesting that we
examine the RD, RS registers for the instruction?



>
>Please let me know if you have a better solution to address this problem.
>
>Thanks,
>Guenter



Thanks
Jojy
Guenter Roeck Sept. 30, 2014, 3:50 p.m. UTC | #4
On Mon, Sep 29, 2014 at 06:31:06PM -0500, Scott Wood wrote:
> On Mon, 2014-09-29 at 23:03 +0000, Jojy Varghese wrote:
> > 
> > On 9/29/14 12:06 PM, "Guenter Roeck" <linux@roeck-us.net> wrote:
> > 
> > >On Mon, Sep 29, 2014 at 01:36:06PM -0500, Scott Wood wrote:
> > >> On Mon, 2014-09-29 at 09:48 -0700, Guenter Roeck wrote:
> > >> > From: Jojy G Varghese <jojyv@juniper.net>
> > >> > 
> > >> > For E500MC and E5500, a machine check exception in pci(e) memory space
> > >> > crashes the kernel.
> > >> > 
> > >> > Testing shows that the MCAR(U) register is zero on a MC exception for
> > >>the
> > >> > E5500 core. At the same time, DEAR register has been found to have the
> > >> > address of the faulty load address during an MC exception for this
> > >>core.
> > >> > 
> > >> > This fix changes the current behavior to fixup the result register
> > >> > and instruction pointers in the case of a load operation on a faulty
> > >> > PCI address.
> > >> > 
> > >> > The changes are:
> > >> > - Added the hook to pci machine check handing to the e500mc machine
> > >>check
> > >> >   exception handler.
> > >> > - For the E5500 core, load faulting address from SPRN_DEAR register.
> > >> >   As mentioned above, this is necessary because the E5500 core does
> > >>not
> > >> >   report the fault address in the MCAR register.
> > >> > 
> > >> > Cc: Scott Wood <scottwood@freescale.com>
> > >> > Signed-off-by: Jojy G Varghese <jojyv@juniper.net>
> > >> > [Guenter Roeck: updated description]
> > >> > Signed-off-by: Guenter Roeck <groeck@juniper.net>
> > >> > Signed-off-by: Guenter Roeck <linux@roeck-us.net>
> > >> > ---
> > >> >  arch/powerpc/kernel/traps.c   | 3 ++-
> > >> >  arch/powerpc/sysdev/fsl_pci.c | 5 +++++
> > >> >  2 files changed, 7 insertions(+), 1 deletion(-)
> > >> > 
> > >> > diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
> > >> > index 0dc43f9..ecb709b 100644
> > >> > --- a/arch/powerpc/kernel/traps.c
> > >> > +++ b/arch/powerpc/kernel/traps.c
> > >> > @@ -494,7 +494,8 @@ int machine_check_e500mc(struct pt_regs *regs)
> > >> >  	int recoverable = 1;
> > >> >  
> > >> >  	if (reason & MCSR_LD) {
> > >> > -		recoverable = fsl_rio_mcheck_exception(regs);
> > >> > +		recoverable = fsl_rio_mcheck_exception(regs) ||
> > >> > +			fsl_pci_mcheck_exception(regs);
> > >> >  		if (recoverable == 1)
> > >> >  			goto silent_out;
> > >> >  	}
> > >> > diff --git a/arch/powerpc/sysdev/fsl_pci.c
> > >>b/arch/powerpc/sysdev/fsl_pci.c
> > >> > index c507767..bdb956b 100644
> > >> > --- a/arch/powerpc/sysdev/fsl_pci.c
> > >> > +++ b/arch/powerpc/sysdev/fsl_pci.c
> > >> > @@ -1021,6 +1021,11 @@ int fsl_pci_mcheck_exception(struct pt_regs
> > >>*regs)
> > >> >  #endif
> > >> >  	addr += mfspr(SPRN_MCAR);
> > >> >  
> > >> > +#ifdef CONFIG_E5500_CPU
> > >> > +	if (mfspr(SPRN_EPCR) & SPRN_EPCR_ICM)
> > >> > +		addr = PFN_PHYS(vmalloc_to_pfn((void *)mfspr(SPRN_DEAR)));
> > >> > +#endif
> > >> 
> > >> Kconfig tells you what hardware is supported, not what hardware you're
> > >> actually running on.
> 
> Plus, CONFIG_E5500_CPU may not even be set when running on an e5500, as
> it is used for selecting GCC optimization settings.  You could have
> CONFIG_GENERIC_CPU instead.
> 
> And the subject says "E500MC / E5500", not just "E5500". :-)
> 
> > >Hi Scott,
> > >
> > >Good point. Jojy, guess we'll have to check if the CPU is actually an
> > >E5500.
> > >Can you look into that ?
> > 
> > 
> > "/proc/cpuinfo" shows the cpu as "e5500". Scott, are you suggesting that
> > we use a runtime method of determining the cpu type (cpu_spec's cpu_name
> > for
> > example).  
> 
> Yes, if there's a bug to be worked around, and we don't want to apply
> the workaround unconditionally, you should use PVR to determine whether
> you're running on an affected core.
> 
> > >> Can we rely on DEAR or is this just a side effect of likely having taken
> > >> a TLB miss for the address recently?  Perhaps we should use the
> > >> instruction emulation to determine the effective address instead.
> > >> 
> > >> Guenter, is this patch intended to deal with an erratum or are you
> > >> covering up legitimate errors?
> > >> 
> >
> > >Those are errors related to PCIe hotplug, and are seen with unexpected
> > >PCIe
> > >device removals (triggered, for example, by removing power from a PCIe
> > >adapter).
> > >The behavior we see on E5500 is quite similar to the same behavior on
> > >E500:
> > >If unhandled, the CPU keeps executing the same instruction over and over
> > >again
> > >if there is an error on a PCIe access and thus stalls. I don't know if
> > >this
> > >is considered an erratum or expected behavior, but it is one we have to
> > >address
> > >since we have to be able to handle that condition. 
> 
> The reason I ask is that the handling for e500 was described as an
> erratum workaround.  If it is an erratum it would be nice to know the
> erratum number and the full list of affected chips.
> 
My understanding, which may be wrong, was that this is expected behavior,
at least for E5500. I actually thought I had seen it somewhere in the
specification (response to PCIe errors), but I don't recall where exactly.

At least for my part I am not aware of an erratum.

> > >Ultimately, we'll want
> > >to
> > >implement PCIe error handlers for the affected drivers, but that will be
> > >a next
> > >step.
> 
> For now can we at least print a ratelimited error message?  I don't like
> the idea of silently ignoring these errors.  I suppose it's a separate
> issue from extending the workaround to cover e500mc, though.
> 
I don't really like the idea of printing an error message pretty much each
time an unexpected hotplug event occurs.

> > According to the spec, we MCAR is supposed to hold the faulty data address
> > but for 5500 core, we found that MCAR is zero.
> 
> Which specific chip and revision did you see this on?  What is the value
> in MCSR?
> 
Jojy can answer that, at least for P5020. We have seen it on P5040 as well,
though, so it is not just limited to one chip/revision.

Guenter

> > You are right that DEAR entry could
> > be a result of a TLB miss but that's the register we could rely on.
> 
> If it's the result of a previous TLB miss then we can't rely on it.  The
> translation might have been loaded into the TLB before the hotplug
> event, or there might have been an interrupt between loading the
> translation into the TLB and using the translation.
> 
> > What do you mean by "instruction emulation"? 
> 
> mcheck_handle_load()
> 
> > Are you suggesting that we
> > examine the RD, RS 
> > registers for the instruction?
> 
> Yes, if we don't have a simpler reliable source of the address.
> 
> -Scott
> 
>
Jojy Varghese Sept. 30, 2014, 8:15 p.m. UTC | #5
On 9/30/14 8:50 AM, "Guenter Roeck" <linux@roeck-us.net> wrote:

>On Mon, Sep 29, 2014 at 06:31:06PM -0500, Scott Wood wrote:
>> On Mon, 2014-09-29 at 23:03 +0000, Jojy Varghese wrote:
>> >
>> > On 9/29/14 12:06 PM, "Guenter Roeck" <linux@roeck-us.net> wrote:
>> >
>> > >On Mon, Sep 29, 2014 at 01:36:06PM -0500, Scott Wood wrote:
>> > >> On Mon, 2014-09-29 at 09:48 -0700, Guenter Roeck wrote:
>> > >> > From: Jojy G Varghese <jojyv@juniper.net>
>> > >> >
>> > >> > For E500MC and E5500, a machine check exception in pci(e) memory space
>> > >> > crashes the kernel.
>> > >> >
>> > >> > Testing shows that the MCAR(U) register is zero on a MC exception for the
>> > >> > E5500 core. At the same time, DEAR register has been found to have the
>> > >> > address of the faulty load address during an MC exception for this core.
>> > >> >
>> > >> > This fix changes the current behavior to fixup the result register
>> > >> > and instruction pointers in the case of a load operation on a faulty
>> > >> > PCI address.
>> > >> >
>> > >> > The changes are:
>> > >> > - Added the hook to pci machine check handing to the e500mc machine check
>> > >> >   exception handler.
>> > >> > - For the E5500 core, load faulting address from SPRN_DEAR register.
>> > >> >   As mentioned above, this is necessary because the E5500 core does not
>> > >> >   report the fault address in the MCAR register.
>> > >> >
>> > >> > Cc: Scott Wood <scottwood@freescale.com>
>> > >> > Signed-off-by: Jojy G Varghese <jojyv@juniper.net>
>> > >> > [Guenter Roeck: updated description]
>> > >> > Signed-off-by: Guenter Roeck <groeck@juniper.net>
>> > >> > Signed-off-by: Guenter Roeck <linux@roeck-us.net>
>> > >> > ---
>> > >> >  arch/powerpc/kernel/traps.c   | 3 ++-
>> > >> >  arch/powerpc/sysdev/fsl_pci.c | 5 +++++
>> > >> >  2 files changed, 7 insertions(+), 1 deletion(-)
>> > >> >
>> > >> > diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
>> > >> > index 0dc43f9..ecb709b 100644
>> > >> > --- a/arch/powerpc/kernel/traps.c
>> > >> > +++ b/arch/powerpc/kernel/traps.c
>> > >> > @@ -494,7 +494,8 @@ int machine_check_e500mc(struct pt_regs *regs)
>> > >> >  	int recoverable = 1;
>> > >> >  
>> > >> >  	if (reason & MCSR_LD) {
>> > >> > -		recoverable = fsl_rio_mcheck_exception(regs);
>> > >> > +		recoverable = fsl_rio_mcheck_exception(regs) ||
>> > >> > +			fsl_pci_mcheck_exception(regs);
>> > >> >  		if (recoverable == 1)
>> > >> >  			goto silent_out;
>> > >> >  	}
>> > >> > diff --git a/arch/powerpc/sysdev/fsl_pci.c b/arch/powerpc/sysdev/fsl_pci.c
>> > >> > index c507767..bdb956b 100644
>> > >> > --- a/arch/powerpc/sysdev/fsl_pci.c
>> > >> > +++ b/arch/powerpc/sysdev/fsl_pci.c
>> > >> > @@ -1021,6 +1021,11 @@ int fsl_pci_mcheck_exception(struct pt_regs *regs)
>> > >> >  #endif
>> > >> >  	addr += mfspr(SPRN_MCAR);
>> > >> >  
>> > >> > +#ifdef CONFIG_E5500_CPU
>> > >> > +	if (mfspr(SPRN_EPCR) & SPRN_EPCR_ICM)
>> > >> > +		addr = PFN_PHYS(vmalloc_to_pfn((void *)mfspr(SPRN_DEAR)));
>> > >> > +#endif
>> > >>
>> > >> Kconfig tells you what hardware is supported, not what hardware you're
>> > >> actually running on.
>>
>> Plus, CONFIG_E5500_CPU may not even be set when running on an e5500, as
>> it is used for selecting GCC optimization settings.  You could have
>> CONFIG_GENERIC_CPU instead.
>>
>> And the subject says "E500MC / E5500", not just "E5500". :-)
>>
>> > >Hi Scott,
>> > >
>> > >Good point. Jojy, guess we'll have to check if the CPU is actually an E5500.
>> > >Can you look into that ?
>> >
>> >
>> > "/proc/cpuinfo" shows the cpu as "e5500". Scott, are you suggesting that
>> > we use a runtime method of determining the cpu type (cpu_spec's cpu_name
>> > for example).
>>
>> Yes, if there's a bug to be worked around, and we don't want to apply
>> the workaround unconditionally, you should use PVR to determine whether
>> you're running on an affected core.
>>
>> > >> Can we rely on DEAR or is this just a side effect of likely having taken
>> > >> a TLB miss for the address recently?  Perhaps we should use the
>> > >> instruction emulation to determine the effective address instead.
>> > >>
>> > >> Guenter, is this patch intended to deal with an erratum or are you
>> > >> covering up legitimate errors?
>> > >>
>> >
>> > >Those are errors related to PCIe hotplug, and are seen with unexpected PCIe
>> > >device removals (triggered, for example, by removing power from a PCIe adapter).
>> > >The behavior we see on E5500 is quite similar to the same behavior on E500:
>> > >If unhandled, the CPU keeps executing the same instruction over and over again
>> > >if there is an error on a PCIe access and thus stalls. I don't know if this
>> > >is considered an erratum or expected behavior, but it is one we have to address
>> > >since we have to be able to handle that condition.
>>
>> The reason I ask is that the handling for e500 was described as an
>> erratum workaround.  If it is an erratum it would be nice to know the
>> erratum number and the full list of affected chips.
>>
>My understanding, which may be wrong, was that this is expected behavior,
>at least for E5500. I actually thought I had seen it somewhere in the
>specification (response to PCIe errors), but I don't recall where exactly.
>
>At least for my part I am not aware of an erratum.
>
>> > >Ultimately, we'll want to
>> > >implement PCIe error handlers for the affected drivers, but that will be a next
>> > >step.
>>
>> For now can we at least print a ratelimited error message?  I don't like
>> the idea of silently ignoring these errors.  I suppose it's a separate
>> issue from extending the workaround to cover e500mc, though.
>>
>I don't really like the idea of printing an error message pretty much each time
>when an unexpected hotplug event occurs.
>
>> > According to the spec, we MCAR is supposed to hold the faulty data address
>> > but for 5500 core, we found that MCAR is zero.
>>
>> Which specific chip and revision did you see this on?  What is the value
>> in MCSR?
>>
>Jojy can answer that, at least for P5020. We have seen it on P5040 as well,
>though, so it is not just limited to one chip/revision.


The specifics are:
PVR: 0x80240012
Instruction that causes the MC exception: lwbrx

The faulty load address is also present in RB, so we could change the
logic to use that instead of DEAR. What I don't know is whether there are
other cases that escape the current logic.
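
To make the RB idea concrete, here is a small user-space sketch of deriving
the effective address from the instruction word and the saved GPRs, which is
roughly what "instruction emulation" would mean for an X-form indexed load
such as lwbrx. The field layout and the lwbrx extended opcode (534) follow my
reading of the Power ISA. The struct and function names are invented for this
example and are not kernel APIs. The result is an effective address, so it
would still need translating before comparison against PCI windows.

```c
#include <stdint.h>

#define OP_31		31u	/* primary opcode shared by X-form loads */
#define XOP_LWBRX	534u	/* extended opcode of lwbrx (assumed from the ISA) */

/* Decoded fields of an X-form instruction (hypothetical helper type). */
struct xform {
	unsigned rt, ra, rb, xop;
};

/* Returns 1 and fills *out if inst has primary opcode 31, else returns 0. */
static int decode_xform(uint32_t inst, struct xform *out)
{
	if ((inst >> 26) != OP_31)
		return 0;
	out->rt = (inst >> 21) & 0x1f;
	out->ra = (inst >> 16) & 0x1f;
	out->rb = (inst >> 11) & 0x1f;
	out->xop = (inst >> 1) & 0x3ff;
	return 1;
}

/*
 * EA = (RA|0) + (RB): an RA field of 0 means the constant zero,
 * not the contents of GPR0.
 */
static uint64_t xform_ea(const struct xform *x, const uint64_t gpr[32])
{
	uint64_t base = x->ra ? gpr[x->ra] : 0;

	return base + gpr[x->rb];
}
```

With RA encoded as 0, the effective address is simply the value in RB, which
would match the observation that the faulting address shows up in RB.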


>
>Guenter
>
>> > You are right that DEAR entry could
>> > be a result of a TLB miss but that's the register we could rely on.
>>
>> If it's the result of a previous TLB miss then we can't rely on it.  The
>> translation might have been loaded into the TLB before the hotplug
>> event, or there might have been an interrupt between loading the
>> translation into the TLB and using the translation.
>>
>> > What do you mean by "instruction emulation"?
>>
>> mcheck_handle_load()
>>
>> > Are you suggesting that we
>> > examine the RD, RS
>> > registers for the instruction?
>>
>> Yes, if we don't have a simpler reliable source of the address.
>>
>> -Scott
Scott Wood Oct. 1, 2014, 12:43 a.m. UTC | #6
On Tue, 2014-09-30 at 08:50 -0700, Guenter Roeck wrote:
> On Mon, Sep 29, 2014 at 06:31:06PM -0500, Scott Wood wrote:
> > On Mon, 2014-09-29 at 23:03 +0000, Jojy Varghese wrote:
> > > 
> > > On 9/29/14 12:06 PM, "Guenter Roeck" <linux@roeck-us.net> wrote:
> > > 
> > > >Those are errors related to PCIe hotplug, and are seen with unexpected
> > > >PCIe
> > > >device removals (triggered, for example, by removing power from a PCIe
> > > >adapter).
> > > >The behavior we see on E5500 is quite similar to the same behavior on
> > > >E500:
> > > >If unhandled, the CPU keeps executing the same instruction over and over
> > > >again
> > > >if there is an error on a PCIe access and thus stalls. I don't know if
> > > >this
> > > >is considered an erratum or expected behavior, but it is one we have to
> > > >address
> > > >since we have to be able to handle that condition. 
> > 
> > The reason I ask is that the handling for e500 was described as an
> > erratum workaround.  If it is an erratum it would be nice to know the
> > erratum number and the full list of affected chips.
> > 
> My understanding, which may be wrong, was that this is expected behavior,
> at least for E5500. I actually thought I had seen it somewhere in the
> specification (response to PCIe errors), but I don't recall where exactly.
> 
> At least for my part I am not aware of an erratum.

Jia Hongtao, can you comment here?

> > > >Ultimately, we'll want
> > > >to
> > > >implement PCIe error handlers for the affected drivers, but that will be
> > > >a next
> > > >step.
> > 
> > For now can we at least print a ratelimited error message?  I don't like
> > the idea of silently ignoring these errors.  I suppose it's a separate
> > issue from extending the workaround to cover e500mc, though.
> > 
> I don't really like the idea of printing an error message pretty much each time
> when an unexpected hotplug event occurs.

Unexpected events seem like the sort of thing you'd want to log, but my
concern is that this might not be the only cause of PCI errors.
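
As a rough illustration of what "ratelimited" means here, the sketch below
implements a simple window-plus-burst limiter in user-space C. In the kernel
one would presumably reach for the existing helpers such as
pr_err_ratelimited() rather than open-coding this. The struct, the function
name, and the tick-based time argument are all assumptions made for the
example.

```c
#include <stdint.h>

/* Hypothetical limiter state: at most `burst` messages per `interval` ticks. */
struct ratelimit {
	uint64_t interval;	/* window length, in caller-defined ticks */
	unsigned burst;		/* messages allowed per window */
	uint64_t window_start;	/* tick at which the current window began */
	unsigned printed;	/* messages let through in this window */
	unsigned missed;	/* messages suppressed so far */
};

/* Returns 1 if the caller may log now, 0 if the message should be dropped. */
static int ratelimit_ok(struct ratelimit *rl, uint64_t now)
{
	/* Start a fresh window once the current one has expired. */
	if (now - rl->window_start >= rl->interval) {
		rl->window_start = now;
		rl->printed = 0;
	}
	if (rl->printed < rl->burst) {
		rl->printed++;
		return 1;
	}
	rl->missed++;
	return 0;
}
```

A storm of machine checks from one surprise removal would then produce a
bounded number of log lines while the suppressed count stays available.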

-Scott
Hongtao Jia Oct. 8, 2014, 3:08 a.m. UTC | #7
> -----Original Message-----
> From: Wood Scott-B07421
> Sent: Tuesday, September 30, 2014 2:36 AM
> To: Guenter Roeck
> Cc: Benjamin Herrenschmidt; Paul Mackerras; Michael Ellerman;
> linuxppc-dev@lists.ozlabs.org; linux-kernel@vger.kernel.org; Jojy G Varghese;
> Guenter Roeck; Jia Hongtao-B38951
> Subject: Re: [PATCH] powerpc/fsl: Add support for pci(e) machine check
> exception on E500MC / E5500
>
> On Mon, 2014-09-29 at 09:48 -0700, Guenter Roeck wrote:
> > From: Jojy G Varghese <jojyv@juniper.net>
> >
> > For E500MC and E5500, a machine check exception in pci(e) memory space
> > crashes the kernel.
> >
> > Testing shows that the MCAR(U) register is zero on a MC exception for the
> > E5500 core. At the same time, DEAR register has been found to have the
> > address of the faulty load address during an MC exception for this core.
> >
> > This fix changes the current behavior to fixup the result register and
> > instruction pointers in the case of a load operation on a faulty PCI
> > address.
> >
> > The changes are:
> > - Added the hook to pci machine check handing to the e500mc machine check
> >   exception handler.
> > - For the E5500 core, load faulting address from SPRN_DEAR register.
> >   As mentioned above, this is necessary because the E5500 core does not
> >   report the fault address in the MCAR register.
> >
> > Cc: Scott Wood <scottwood@freescale.com>
> > Signed-off-by: Jojy G Varghese <jojyv@juniper.net>
> > [Guenter Roeck: updated description]
> > Signed-off-by: Guenter Roeck <groeck@juniper.net>
> > Signed-off-by: Guenter Roeck <linux@roeck-us.net>
> > ---
> >  arch/powerpc/kernel/traps.c   | 3 ++-
> >  arch/powerpc/sysdev/fsl_pci.c | 5 +++++
> >  2 files changed, 7 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
> > index 0dc43f9..ecb709b 100644
> > --- a/arch/powerpc/kernel/traps.c
> > +++ b/arch/powerpc/kernel/traps.c
> > @@ -494,7 +494,8 @@ int machine_check_e500mc(struct pt_regs *regs)
> >  	int recoverable = 1;
> >
> >  	if (reason & MCSR_LD) {
> > -		recoverable = fsl_rio_mcheck_exception(regs);
> > +		recoverable = fsl_rio_mcheck_exception(regs) ||
> > +			fsl_pci_mcheck_exception(regs);
> >  		if (recoverable == 1)
> >  			goto silent_out;
> >  	}
> > diff --git a/arch/powerpc/sysdev/fsl_pci.c b/arch/powerpc/sysdev/fsl_pci.c
> > index c507767..bdb956b 100644
> > --- a/arch/powerpc/sysdev/fsl_pci.c
> > +++ b/arch/powerpc/sysdev/fsl_pci.c
> > @@ -1021,6 +1021,11 @@ int fsl_pci_mcheck_exception(struct pt_regs *regs)
> >  #endif
> >  	addr += mfspr(SPRN_MCAR);
> >
> > +#ifdef CONFIG_E5500_CPU
> > +	if (mfspr(SPRN_EPCR) & SPRN_EPCR_ICM)
> > +		addr = PFN_PHYS(vmalloc_to_pfn((void *)mfspr(SPRN_DEAR)));
> > +#endif
>
> Kconfig tells you what hardware is supported, not what hardware you're
> actually running on.
>
> Jia Hongtao, do you know anything about this issue?  Is there an erratum?


Sorry for the late response; I just returned from vacation.
I am not aware of this issue.

> What chips are affected by the the erratum covered by

> <http://patchwork.ozlabs.org/patch/240239/>?


The MPC8544, MPC8548, and MPC8572 are affected by this erratum.
I checked the P4080, which uses e500mc, and found no such erratum.

> 

> Can we rely on DEAR or is this just a side effect of likely having taken

> a TLB miss for the address recently?  Perhaps we should use the

> instruction emulation to determine the effective address instead.

> 

> Guenter, is this patch intended to deal with an erratum or are you

> covering up legitimate errors?

> 

> -Scott

>
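
Illustrative sketch (not from the thread): the suggestion above to derive the effective address from the faulting instruction, rather than trusting DEAR, can be modeled for the simplest case, a non-update D-form load such as lwz. This is a user-space model, not kernel code. The names model_regs, is_dform_load and dform_load_ea are hypothetical, 64-bit address arithmetic is assumed, and a real handler would also have to cover indexed (X-form) and update-form loads.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the GPR array in struct pt_regs. */
struct model_regs {
	uint64_t gpr[32];
};

/* Primary opcodes of the non-update D-form loads modeled here. */
static bool is_dform_load(uint32_t insn)
{
	switch (insn >> 26) {
	case 32: /* lwz */
	case 34: /* lbz */
	case 40: /* lhz */
	case 42: /* lha */
		return true;
	default:
		return false;
	}
}

/*
 * Compute EA = (RA|0) + EXTS(D) for a D-form load.
 * Returns false if the instruction is not one of the loads above,
 * in which case *ea is left untouched.
 */
static bool dform_load_ea(const struct model_regs *regs, uint32_t insn,
			  uint64_t *ea)
{
	unsigned int ra = (insn >> 16) & 0x1f;	/* base register field */
	int16_t d = (int16_t)(insn & 0xffff);	/* signed displacement */
	uint64_t base;

	if (!is_dform_load(insn))
		return false;

	/* RA == 0 means "no base register", not r0. */
	base = ra ? regs->gpr[ra] : 0;
	*ea = base + (uint64_t)(int64_t)d;
	return true;
}
```

For example, 0x80640008 encodes lwz r3,8(r4), so with r4 = 0x1000 the computed effective address is 0x1008. The result is still an effective address and would need translating before it can be compared against a PCI memory window.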
Hongtao Jia Oct. 8, 2014, 3:10 a.m. UTC | #8
> -----Original Message-----

> From: Wood Scott-B07421

> Sent: Wednesday, October 01, 2014 8:44 AM

> To: Guenter Roeck

> Cc: Jojy Varghese; Benjamin Herrenschmidt; Paul Mackerras; Michael

> Ellerman; linuxppc-dev@lists.ozlabs.org; linux-kernel@vger.kernel.org;

> Guenter Roeck; Jia Hongtao-B38951

> Subject: Re: [PATCH] powerpc/fsl: Add support for pci(e) machine check

> exception on E500MC / E5500

> 

> On Tue, 2014-09-30 at 08:50 -0700, Guenter Roeck wrote:

> > On Mon, Sep 29, 2014 at 06:31:06PM -0500, Scott Wood wrote:

> > > On Mon, 2014-09-29 at 23:03 +0000, Jojy Varghese wrote:

> > > >

> > > > On 9/29/14 12:06 PM, "Guenter Roeck" <linux@roeck-us.net> wrote:

> > > >

> > > > >Those are errors related to PCIe hotplug, and are seen with

> > > > >unexpected PCIe device removals (triggered, for example, by

> > > > >removing power from a PCIe adapter).

> > > > >The behavior we see on E5500 is quite similar to the same

> > > > >behavior on

> > > > >E500:

> > > > >If unhandled, the CPU keeps executing the same instruction over

> > > > >and over again if there is an error on a PCIe access and thus

> > > > >stalls. I don't know if this is considered an erratum or expected

> > > > >behavior, but it is one we have to address since we have to be

> > > > >able to handle that condition.

> > >

> > > The reason I ask is that the handling for e500 was described as an

> > > erratum workaround.  If it is an erratum it would be nice to know

> > > the erratum number and the full list of affected chips.

> > >

> > My understanding, which may be wrong, was that this is expected

> > behavior, at least for E5500. I actually thought I had seen it

> > somewhere in the specification (response to PCIe errors), but I don't

> > recall where exactly.

> >

> > At least for my part I am not aware of an erratum.

> 

> Jia Hongtao, can you comment here?


I did not find any related erratum either.

> 

> > > > >Ultimately, we'll want

> > > > >to

> > > > >implement PCIe error handlers for the affected drivers, but that

> > > > >will be a next step.

> > >

> > > For now can we at least print a ratelimited error message?  I don't

> > > like the idea of silently ignoring these errors.  I suppose it's a

> > > separate issue from extending the workaround to cover e500mc, though.

> > >

> > I don't really like the idea of printing an error message pretty much

> > each time when an unexpected hotplug event occurs.

> 

> Unexpected events seem like the sort of thing you'd want to log, but my

> concern is that this might not be the only cause of PCI errors.

> 

> -Scott

>
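
Illustrative sketch (not from the thread): the ratelimited error message requested above would, in the kernel, most likely go through an existing helper such as pr_err_ratelimited(). The user-space model below only shows the idea behind such a limiter: allow a small burst of messages per time window and count the rest as suppressed. The struct msg_ratelimit and the function msg_ratelimit_ok are hypothetical names, and time is passed in as an abstract tick count.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical limiter: at most `burst` messages per `interval` ticks. */
struct msg_ratelimit {
	uint64_t interval;	/* window length in ticks */
	unsigned int burst;	/* messages allowed per window */
	uint64_t window_start;	/* tick at which the current window began */
	unsigned int printed;	/* messages emitted in the current window */
	unsigned int missed;	/* total messages suppressed so far */
};

/* Returns true if the caller may print a message at time `now`. */
static bool msg_ratelimit_ok(struct msg_ratelimit *rl, uint64_t now)
{
	/* Open a fresh window once the current one has expired. */
	if (now - rl->window_start >= rl->interval) {
		rl->window_start = now;
		rl->printed = 0;
	}

	if (rl->printed < rl->burst) {
		rl->printed++;
		return true;
	}

	rl->missed++;
	return false;
}
```

With a limiter like this, a burst of machine checks caused by one surprise removal produces a bounded number of log lines, while PCI errors from other causes would still leave a trace.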
Scott Wood Oct. 8, 2014, 11:48 p.m. UTC | #9
On Tue, 2014-10-07 at 22:08 -0500, Jia Hongtao-B38951 wrote:
> 
> > -----Original Message-----
> > From: Wood Scott-B07421
> > Sent: Tuesday, September 30, 2014 2:36 AM
> > To: Guenter Roeck
> > Cc: Benjamin Herrenschmidt; Paul Mackerras; Michael Ellerman; linuxppc-
> > dev@lists.ozlabs.org; linux-kernel@vger.kernel.org; Jojy G Varghese;
> > Guenter Roeck; Jia Hongtao-B38951
> > Subject: Re: [PATCH] powerpc/fsl: Add support for pci(e) machine check
> > exception on E500MC / E5500
> > 
> > On Mon, 2014-09-29 at 09:48 -0700, Guenter Roeck wrote:
> > > From: Jojy G Varghese <jojyv@juniper.net>
> > >
> > > For E500MC and E5500, a machine check exception in pci(e) memory space
> > > crashes the kernel.
> > >
> > > Testing shows that the MCAR(U) register is zero on a MC exception for
> > > the
> > > E5500 core. At the same time, DEAR register has been found to have the
> > > address of the faulty load address during an MC exception for this core.
> > >
> > > This fix changes the current behavior to fixup the result register and
> > > instruction pointers in the case of a load operation on a faulty PCI
> > > address.
> > >
> > > The changes are:
> > > - Added the hook to pci machine check handing to the e500mc machine
> > check
> > >   exception handler.
> > > - For the E5500 core, load faulting address from SPRN_DEAR register.
> > >   As mentioned above, this is necessary because the E5500 core does not
> > >   report the fault address in the MCAR register.
> > >
> > > Cc: Scott Wood <scottwood@freescale.com>
> > > Signed-off-by: Jojy G Varghese <jojyv@juniper.net> [Guenter Roeck:
> > > updated description]
> > > Signed-off-by: Guenter Roeck <groeck@juniper.net>
> > > Signed-off-by: Guenter Roeck <linux@roeck-us.net>
> > > ---
> > >  arch/powerpc/kernel/traps.c   | 3 ++-
> > >  arch/powerpc/sysdev/fsl_pci.c | 5 +++++
> > >  2 files changed, 7 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
> > > index 0dc43f9..ecb709b 100644
> > > --- a/arch/powerpc/kernel/traps.c
> > > +++ b/arch/powerpc/kernel/traps.c
> > > @@ -494,7 +494,8 @@ int machine_check_e500mc(struct pt_regs *regs)
> > >  	int recoverable = 1;
> > >
> > >  	if (reason & MCSR_LD) {
> > > -		recoverable = fsl_rio_mcheck_exception(regs);
> > > +		recoverable = fsl_rio_mcheck_exception(regs) ||
> > > +			fsl_pci_mcheck_exception(regs);
> > >  		if (recoverable == 1)
> > >  			goto silent_out;
> > >  	}
> > > diff --git a/arch/powerpc/sysdev/fsl_pci.c
> > > b/arch/powerpc/sysdev/fsl_pci.c index c507767..bdb956b 100644
> > > --- a/arch/powerpc/sysdev/fsl_pci.c
> > > +++ b/arch/powerpc/sysdev/fsl_pci.c
> > > @@ -1021,6 +1021,11 @@ int fsl_pci_mcheck_exception(struct pt_regs
> > > *regs)  #endif
> > >  	addr += mfspr(SPRN_MCAR);
> > >
> > > +#ifdef CONFIG_E5500_CPU
> > > +	if (mfspr(SPRN_EPCR) & SPRN_EPCR_ICM)
> > > +		addr = PFN_PHYS(vmalloc_to_pfn((void *)mfspr(SPRN_DEAR)));
> > #endif
> > 
> > Kconfig tells you what hardware is supported, not what hardware you're
> > actually running on.
> > 
> > Jia Hongtao, do you know anything about this issue?  Is there an erratum?
> 
> Sorry for the late response, I just return from my vacation.
> I don't know this issue.
> 
> > What chips are affected by the the erratum covered by
> > <http://patchwork.ozlabs.org/patch/240239/>?
> 
> MPC8544, MPC8548, MPC8572 are affected by this erratum.

What is the erratum number?

> I checked P4080 which using e500mc and no such erratum is found.

What is the erratum behavior, and how does it differ from the problem
that Jojy and Guenter are trying to solve?

-Scott
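
Illustrative sketch (not from the thread): the Kconfig remark quoted above is about the difference between a compile-time #ifdef, which only says the kernel was built with e5500 support, and a run-time check of the core the code is actually executing on. One way to model a run-time check is to compare the version field of the Processor Version Register. The code below is a user-space model with hypothetical names (model_pvr_ver, pick_fault_addr). The two constants are meant to mirror PVR_VER_E500MC and PVR_VER_E5500 from arch/powerpc/include/asm/reg.h and should be verified against the tree. The EPCR[ICM] test and the DEAR address translation from the patch are deliberately left out.

```c
#include <stdint.h>

/* Assumed to match PVR_VER_E500MC / PVR_VER_E5500 in the kernel headers. */
#define MODEL_PVR_VER_E500MC	0x8023u
#define MODEL_PVR_VER_E5500	0x8024u

/* The core version lives in the upper 16 bits of the PVR. */
static unsigned int model_pvr_ver(uint32_t pvr)
{
	return (pvr >> 16) & 0xffff;
}

/*
 * Pick the fault address source based on the running core, not on
 * build configuration. `dear_phys` stands for an already-translated
 * DEAR value; `mcar` for the address reported in MCAR(U).
 */
static uint64_t pick_fault_addr(uint32_t pvr, uint64_t mcar,
				uint64_t dear_phys)
{
	if (model_pvr_ver(pvr) == MODEL_PVR_VER_E5500)
		return dear_phys;

	return mcar;
}
```

A kernel built with support for several cores would then take the DEAR path only when actually running on an e5500, and keep using MCAR elsewhere. Whether DEAR can be relied on at all is the separate question raised earlier in the thread.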
Hongtao Jia Oct. 9, 2014, 2:18 a.m. UTC | #10
> -----Original Message-----

> From: Wood Scott-B07421

> Sent: Thursday, October 09, 2014 7:48 AM

> To: Jia Hongtao-B38951

> Cc: Guenter Roeck; Benjamin Herrenschmidt; Paul Mackerras; Michael

> Ellerman; linuxppc-dev@lists.ozlabs.org; linux-kernel@vger.kernel.org;

> Jojy G Varghese; Guenter Roeck

> Subject: Re: [PATCH] powerpc/fsl: Add support for pci(e) machine check

> exception on E500MC / E5500

> 

> On Tue, 2014-10-07 at 22:08 -0500, Jia Hongtao-B38951 wrote:

> >

> > > -----Original Message-----

> > > From: Wood Scott-B07421

> > > Sent: Tuesday, September 30, 2014 2:36 AM

> > > To: Guenter Roeck

> > > Cc: Benjamin Herrenschmidt; Paul Mackerras; Michael Ellerman;

> > > linuxppc- dev@lists.ozlabs.org; linux-kernel@vger.kernel.org; Jojy G

> > > Varghese; Guenter Roeck; Jia Hongtao-B38951

> > > Subject: Re: [PATCH] powerpc/fsl: Add support for pci(e) machine

> > > check exception on E500MC / E5500

> > >

> > > On Mon, 2014-09-29 at 09:48 -0700, Guenter Roeck wrote:

> > > > From: Jojy G Varghese <jojyv@juniper.net>

> > > >

> > > > For E500MC and E5500, a machine check exception in pci(e) memory

> > > > space crashes the kernel.

> > > >

> > > > Testing shows that the MCAR(U) register is zero on a MC exception

> > > > for the

> > > > E5500 core. At the same time, DEAR register has been found to have

> > > > the address of the faulty load address during an MC exception for

> > > > this core.

> > > >

> > > > This fix changes the current behavior to fixup the result register

> > > > and instruction pointers in the case of a load operation on a

> > > > faulty PCI address.

> > > >

> > > > The changes are:

> > > > - Added the hook to pci machine check handing to the e500mc

> > > > machine

> > > check

> > > >   exception handler.

> > > > - For the E5500 core, load faulting address from SPRN_DEAR register.

> > > >   As mentioned above, this is necessary because the E5500 core does

> > > >   not

> > > >   report the fault address in the MCAR register.

> > > >

> > > > Cc: Scott Wood <scottwood@freescale.com>

> > > > Signed-off-by: Jojy G Varghese <jojyv@juniper.net> [Guenter Roeck:

> > > > updated description]

> > > > Signed-off-by: Guenter Roeck <groeck@juniper.net>

> > > > Signed-off-by: Guenter Roeck <linux@roeck-us.net>

> > > > ---

> > > >  arch/powerpc/kernel/traps.c   | 3 ++-

> > > >  arch/powerpc/sysdev/fsl_pci.c | 5 +++++

> > > >  2 files changed, 7 insertions(+), 1 deletion(-)

> > > >

> > > > diff --git a/arch/powerpc/kernel/traps.c

> > > > b/arch/powerpc/kernel/traps.c index 0dc43f9..ecb709b 100644

> > > > --- a/arch/powerpc/kernel/traps.c

> > > > +++ b/arch/powerpc/kernel/traps.c

> > > > @@ -494,7 +494,8 @@ int machine_check_e500mc(struct pt_regs *regs)

> > > >  	int recoverable = 1;

> > > >

> > > >  	if (reason & MCSR_LD) {

> > > > -		recoverable = fsl_rio_mcheck_exception(regs);

> > > > +		recoverable = fsl_rio_mcheck_exception(regs) ||

> > > > +			fsl_pci_mcheck_exception(regs);

> > > >  		if (recoverable == 1)

> > > >  			goto silent_out;

> > > >  	}

> > > > diff --git a/arch/powerpc/sysdev/fsl_pci.c

> > > > b/arch/powerpc/sysdev/fsl_pci.c index c507767..bdb956b 100644

> > > > --- a/arch/powerpc/sysdev/fsl_pci.c

> > > > +++ b/arch/powerpc/sysdev/fsl_pci.c

> > > > @@ -1021,6 +1021,11 @@ int fsl_pci_mcheck_exception(struct pt_regs

> > > > *regs)  #endif

> > > >  	addr += mfspr(SPRN_MCAR);

> > > >

> > > > +#ifdef CONFIG_E5500_CPU

> > > > +	if (mfspr(SPRN_EPCR) & SPRN_EPCR_ICM)

> > > > +		addr = PFN_PHYS(vmalloc_to_pfn((void *)mfspr(SPRN_DEAR)));

> > > #endif

> > >

> > > Kconfig tells you what hardware is supported, not what hardware

> > > you're actually running on.

> > >

> > > Jia Hongtao, do you know anything about this issue?  Is there an

> > > erratum?

> >

> > Sorry for the late response, I just return from my vacation.

> > I don't know this issue.

> >

> > > What chips are affected by the the erratum covered by

> > > <http://patchwork.ozlabs.org/patch/240239/>?

> >

> > MPC8544, MPC8548, MPC8572 are affected by this erratum.

> 

> What is the erratum number?


The erratum number differs from chip to chip:
MPC8544: PCIe 4
MPC8548: PCI-Ex 39
MPC8572: PCI-Ex 3

> 

> > I checked P4080 which using e500mc and no such erratum is found.

> 

> What is the erratum behavior, and how does it differ from the problem

> that Jojy and Guenter are trying to solve?


Here is the description of the erratum:

"When its link goes down, the PCI Express controller clears all outstanding transactions with an
error indicator and sends a link down exception to the interrupt controller if
PEX_PME_MES_DISR[LDDD] = 0. If, however, any transactions are sent to the controller
after the link down event, they will be accepted by the controller and wait for the link to come
back up before starting any timeout counters (e.g. completion timeout). There is no mechanism
to cancel the new transactions short of a device HRESET."

For e500mc, the behavior Jojy and Guenter describe looks like the same erratum as on e500, but I am not 100% sure.

For e5500, I do not fully understand the behavior yet.

> 

> -Scott

>
Patch

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 0dc43f9..ecb709b 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -494,7 +494,8 @@  int machine_check_e500mc(struct pt_regs *regs)
 	int recoverable = 1;
 
 	if (reason & MCSR_LD) {
-		recoverable = fsl_rio_mcheck_exception(regs);
+		recoverable = fsl_rio_mcheck_exception(regs) ||
+			fsl_pci_mcheck_exception(regs);
 		if (recoverable == 1)
 			goto silent_out;
 	}
diff --git a/arch/powerpc/sysdev/fsl_pci.c b/arch/powerpc/sysdev/fsl_pci.c
index c507767..bdb956b 100644
--- a/arch/powerpc/sysdev/fsl_pci.c
+++ b/arch/powerpc/sysdev/fsl_pci.c
@@ -1021,6 +1021,11 @@  int fsl_pci_mcheck_exception(struct pt_regs *regs)
 #endif
 	addr += mfspr(SPRN_MCAR);
 
+#ifdef CONFIG_E5500_CPU
+	if (mfspr(SPRN_EPCR) & SPRN_EPCR_ICM)
+		addr = PFN_PHYS(vmalloc_to_pfn((void *)mfspr(SPRN_DEAR)));
+#endif
+
 	if (is_in_pci_mem_space(addr)) {
 		if (user_mode(regs)) {
 			pagefault_disable();