diff mbox

[Vivid,SRU] powerpc/eeh: Fix fenced PHB caused by eeh_slot_error_detail()

Message ID 1449088437-25624-1-git-send-email-tim.gardner@canonical.com
State New
Headers show

Commit Message

Tim Gardner Dec. 2, 2015, 8:33 p.m. UTC
From: Gavin Shan <gwshan@linux.vnet.ibm.com>

BugLink: http://bugs.launchpad.net/bugs/1522071

The config space of some PCI devices can't be accessed when their
PEs are in frozen state. Otherwise, fenced PHB might be seen.
Those PEs are identified with flag EEH_PE_CFG_RESTRICTED, meaing
EEH_PE_CFG_BLOCKED is set automatically when the PE is put to
frozen state (EEH_PE_ISOLATED). eeh_slot_error_detail() restores
PCI device BARs with eeh_pe_restore_bars(), which then calls
eeh_ops->restore_config() to reinitialize the PCI device in
(OPAL) firmware. eeh_ops->restore_config() produces PCI config
access that causes fenced PHB. The problem was reported on below
adapter:

   0001:01:00.0 0200: 14e4:168e (rev 10)
   0001:01:00.0 Ethernet controller: Broadcom Corporation \
                NetXtreme II BCM57810 10 Gigabit Ethernet (rev 10)

This fixes the issue by skipping eeh_pe_restore_bars() in
eeh_slot_error_detail() when EEH_PE_CFG_BLOCKED is set for the PE.

Fixes: b6541db1 ("powerpc/eeh: Block PCI config access upon frozen PE")
Cc: stable@vger.kernel.org # v4.0+
Reported-by: Manvanthara B. Puttashankar <mputtash@in.ibm.com>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
(cherry picked from commit 259800135c654a098d9f0adfdd3d1f20eef1f231)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
---
 arch/powerpc/kernel/eeh.c | 21 ++++++++++++++++++---
 1 file changed, 18 insertions(+), 3 deletions(-)

Comments

Andy Whitcroft Dec. 4, 2015, 11:57 a.m. UTC | #1
On Wed, Dec 02, 2015 at 01:33:57PM -0700, tim.gardner@canonical.com wrote:
> From: Gavin Shan <gwshan@linux.vnet.ibm.com>
> 
> BugLink: http://bugs.launchpad.net/bugs/1522071
> 
> The config space of some PCI devices can't be accessed when their
> PEs are in frozen state. Otherwise, fenced PHB might be seen.
> Those PEs are identified with flag EEH_PE_CFG_RESTRICTED, meaing
> EEH_PE_CFG_BLOCKED is set automatically when the PE is put to
> frozen state (EEH_PE_ISOLATED). eeh_slot_error_detail() restores
> PCI device BARs with eeh_pe_restore_bars(), which then calls
> eeh_ops->restore_config() to reinitialize the PCI device in
> (OPAL) firmware. eeh_ops->restore_config() produces PCI config
> access that causes fenced PHB. The problem was reported on below
> adapter:
> 
>    0001:01:00.0 0200: 14e4:168e (rev 10)
>    0001:01:00.0 Ethernet controller: Broadcom Corporation \
>                 NetXtreme II BCM57810 10 Gigabit Ethernet (rev 10)
> 
> This fixes the issue by skipping eeh_pe_restore_bars() in
> eeh_slot_error_detail() when EEH_PE_CFG_BLOCKED is set for the PE.
> 
> Fixes: b6541db1 ("powerpc/eeh: Block PCI config access upon frozen PE")
> Cc: stable@vger.kernel.org # v4.0+
> Reported-by: Manvanthara B. Puttashankar <mputtash@in.ibm.com>
> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
> (cherry picked from commit 259800135c654a098d9f0adfdd3d1f20eef1f231)
> Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
> ---
>  arch/powerpc/kernel/eeh.c | 21 ++++++++++++++++++---
>  1 file changed, 18 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
> index 3b2252e..79150db 100644
> --- a/arch/powerpc/kernel/eeh.c
> +++ b/arch/powerpc/kernel/eeh.c
> @@ -306,11 +306,26 @@ void eeh_slot_error_detail(struct eeh_pe *pe, int severity)
>  	if (!(pe->type & EEH_PE_PHB)) {
>  		if (eeh_has_flag(EEH_ENABLE_IO_FOR_LOG))
>  			eeh_pci_enable(pe, EEH_OPT_THAW_MMIO);
> +
> +		/*
> +		 * The config space of some PCI devices can't be accessed
> +		 * when their PEs are in frozen state. Otherwise, fenced
> +		 * PHB might be seen. Those PEs are identified with flag
> +		 * EEH_PE_CFG_RESTRICTED, indicating EEH_PE_CFG_BLOCKED
> +		 * is set automatically when the PE is put to EEH_PE_ISOLATED.
> +		 *
> +		 * Restoring BARs possibly triggers PCI config access in
> +		 * (OPAL) firmware and then causes fenced PHB. If the
> +		 * PCI config is blocked with flag EEH_PE_CFG_BLOCKED, it's
> +		 * pointless to restore BARs and dump config space.
> +		 */
>  		eeh_ops->configure_bridge(pe);
> -		eeh_pe_restore_bars(pe);
> +		if (!(pe->state & EEH_PE_CFG_BLOCKED)) {
> +			eeh_pe_restore_bars(pe);
>  
> -		pci_regs_buf[0] = 0;
> -		eeh_pe_traverse(pe, eeh_dump_pe_log, &loglen);
> +			pci_regs_buf[0] = 0;
> +			eeh_pe_traverse(pe, eeh_dump_pe_log, &loglen);
> +		}
>  	}
>  
>  	eeh_ops->get_log(pe, severity, pci_regs_buf, loglen);


Looks to do what is claimed.  

Acked-by: Andy Whitcroft <apw@canonical.com>

-apw
Stefan Bader Dec. 8, 2015, 11:08 a.m. UTC | #2
cherry pick though upstream confusingly tagged it as only 4.0+, but looks sane
enough and if it applies and compiles...
Andy Whitcroft Dec. 8, 2015, 11:26 a.m. UTC | #3
Applied to vivid master-next.

-apw
Andy Whitcroft Dec. 8, 2015, 11:27 a.m. UTC | #4
On Tue, Dec 08, 2015 at 12:08:34PM +0100, Stefan Bader wrote:
> cherry pick though upstream confusingly tagged it as only 4.0+, but looks sane
> enough and if it applies and compiles...

Tim,

This does apply cleanly and builds, but it might be worth circling with
IBM on this and confirming we are not missing a trick by applying it
before 4.0.

-apw
Michael Ellerman Dec. 8, 2015, 11:39 a.m. UTC | #5
On Tue, 2015-12-08 at 11:27 +0000, Andy Whitcroft wrote:
> On Tue, Dec 08, 2015 at 12:08:34PM +0100, Stefan Bader wrote:
> > cherry pick though upstream confusingly tagged it as only 4.0+, but looks sane
> > enough and if it applies and compiles...
>
> Tim,
>
> This does apply cleanly and builds, but it might be worth circling with
> IBM on this and confirming we are not missing a trick by applying it
> before 4.0.

It looks like Gavin added the 4.0+ annotation himself:

  https://patchwork.ozlabs.org/patch/511744/

I don't remember discussing it with him, and I can't find any chat logs or
anything. I suspect it's probably just wrongly tagged, and should go back to at
least match the Fixes line, which is v3.18.

So I think you should probably take it.

cheers
Andy Whitcroft Dec. 8, 2015, 12:56 p.m. UTC | #6
On Tue, Dec 08, 2015 at 10:39:21PM +1100, Michael Ellerman wrote:
> On Tue, 2015-12-08 at 11:27 +0000, Andy Whitcroft wrote:
> > On Tue, Dec 08, 2015 at 12:08:34PM +0100, Stefan Bader wrote:
> > > cherry pick though upstream confusingly tagged it as only 4.0+, but looks sane
> > > enough and if it applies and compiles...
> >
> > Tim,
> >
> > This does apply cleanly and builds, but it might be worth circling with
> > IBM on this and confirming we are not missing a trick by applying it
> > before 4.0.
> 
> It looks like Gavin added the 4.0+ annotation himself:
> 
>   https://patchwork.ozlabs.org/patch/511744/
> 
> I don't remember discussing it with him, and I can't find any chat logs or
> anything. I suspect it's probably just wrongly tagged, and should go back to at
> least match the Fixes line, which is v3.18.
> 
> So I think you should probably take it.

Thanks Michael for the clarification.  I have applied this for the next upload.

-apw
diff mbox

Patch

diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index 3b2252e..79150db 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -306,11 +306,26 @@  void eeh_slot_error_detail(struct eeh_pe *pe, int severity)
 	if (!(pe->type & EEH_PE_PHB)) {
 		if (eeh_has_flag(EEH_ENABLE_IO_FOR_LOG))
 			eeh_pci_enable(pe, EEH_OPT_THAW_MMIO);
+
+		/*
+		 * The config space of some PCI devices can't be accessed
+		 * when their PEs are in frozen state. Otherwise, fenced
+		 * PHB might be seen. Those PEs are identified with flag
+		 * EEH_PE_CFG_RESTRICTED, indicating EEH_PE_CFG_BLOCKED
+		 * is set automatically when the PE is put to EEH_PE_ISOLATED.
+		 *
+		 * Restoring BARs possibly triggers PCI config access in
+		 * (OPAL) firmware and then causes fenced PHB. If the
+		 * PCI config is blocked with flag EEH_PE_CFG_BLOCKED, it's
+		 * pointless to restore BARs and dump config space.
+		 */
 		eeh_ops->configure_bridge(pe);
-		eeh_pe_restore_bars(pe);
+		if (!(pe->state & EEH_PE_CFG_BLOCKED)) {
+			eeh_pe_restore_bars(pe);
 
-		pci_regs_buf[0] = 0;
-		eeh_pe_traverse(pe, eeh_dump_pe_log, &loglen);
+			pci_regs_buf[0] = 0;
+			eeh_pe_traverse(pe, eeh_dump_pe_log, &loglen);
+		}
 	}
 
 	eeh_ops->get_log(pe, severity, pci_regs_buf, loglen);