diff mbox

[1/6] powerpc/eeh: Don't collect PCI-CFG data on PHB

Message ID 1372154461-29674-2-git-send-email-shangw@linux.vnet.ibm.com (mailing list archive)
State Accepted, archived
Commit 812d5cdc7dc97791a0838c47f5e8762f1b25a308
Headers show

Commit Message

Gavin Shan June 25, 2013, 10 a.m. UTC
When the PHB is fenced or dead, it's pointless to collect the data
from PCI config space of subordinate PCI devices since it should
return 0xFF's. It also has potential risk to incur additional errors.
The patch avoids collecting PCI-CFG data while PHB is in fenced or
dead state.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/eeh.c |   34 ++++++++++++++++++++++++----------
 1 files changed, 24 insertions(+), 10 deletions(-)

Comments

Benjamin Herrenschmidt June 25, 2013, 11:55 a.m. UTC | #1
On Tue, 2013-06-25 at 18:00 +0800, Gavin Shan wrote:
> +       /*
> +        * When the PHB is fenced or dead, it's pointless to collect
> +        * the data from PCI config space because it should return
> +        * 0xFF's. For ER, we still retrieve the data from the PCI
> +        * config space.
> +        */
> +       if (eeh_probe_mode_dev() &&
> +           (pe->type & EEH_PE_PHB) &&
> +           (pe->state & (EEH_PE_ISOLATED | EEH_PE_PHB_DEAD)))
> +               valid_cfg_log = false;
> +

I'm still unsure about that one. EEH_PE_ISOLATED could be the result
of a normal ER of PE#0 (which can happen for various reasons other
than a fence) in which case the config space is available and
interesting.

I would either not bother and collect the FF's, or make this specific
to fence and only fence.

Cheers,
Ben.
Benjamin Herrenschmidt June 25, 2013, 11:56 a.m. UTC | #2
On Tue, 2013-06-25 at 18:00 +0800, Gavin Shan wrote:
> +               pci_regs_buf[0] = 0;
> +               eeh_pe_for_each_dev(pe, edev) {
> +                       loglen += eeh_gather_pci_data(edev, pci_regs_buf,
> +                                                     EEH_PCI_REGS_LOG_LEN);
> +               }
> +       }

Unless I'm mistaken, this is buggy and will overwrite the content of
pci_regs_buf for every device (they will all write over the same
portion of the log).

Ben.
Gavin Shan June 25, 2013, 11:49 p.m. UTC | #3
On Tue, Jun 25, 2013 at 09:55:15PM +1000, Benjamin Herrenschmidt wrote:
>On Tue, 2013-06-25 at 18:00 +0800, Gavin Shan wrote:
>> +       /*
>> +        * When the PHB is fenced or dead, it's pointless to collect
>> +        * the data from PCI config space because it should return
>> +        * 0xFF's. For ER, we still retrieve the data from the PCI
>> +        * config space.
>> +        */
>> +       if (eeh_probe_mode_dev() &&
>> +           (pe->type & EEH_PE_PHB) &&
>> +           (pe->state & (EEH_PE_ISOLATED | EEH_PE_PHB_DEAD)))
>> +               valid_cfg_log = false;
>> +
>
>I'm still unsure about that one. EEH_PE_ISOLATED could be the result
>of a normal ER of PE#0 (which can happen for various reasons other
>than a fence) in which case the config space is available and
>interesting.
>

It's something like the followings. For ER on PE#0, we will have
PE with type of EEH_PE_BUS marked as isolated, instead of the
one with EEH_PE_PHB.


	[ EEH_PE_PHB] <---> [ EEH_PE_PHB] <---> [ EEH_PE_PHB]
				  |
			    [ EEH_PE_BUS ] PE#0
				  |
		        -------------------------
			|			|
		   [ EEH_PE_BUS ] PE#1	   [ EEH_PE_BUS] PE#2

>I would either not bother and collect the FF's, or make this specific
>to fence and only fence.
>

I'd like to keep it specific to fenced PHB and it's already be that :-)

Thanks,
Gavin
Gavin Shan June 25, 2013, 11:50 p.m. UTC | #4
On Tue, Jun 25, 2013 at 09:56:06PM +1000, Benjamin Herrenschmidt wrote:
>On Tue, 2013-06-25 at 18:00 +0800, Gavin Shan wrote:
>> +               pci_regs_buf[0] = 0;
>> +               eeh_pe_for_each_dev(pe, edev) {
>> +                       loglen += eeh_gather_pci_data(edev, pci_regs_buf,
>> +                                                     EEH_PCI_REGS_LOG_LEN);
>> +               }
>> +       }
>
>Unless I'm mistaken, this is buggy and will overwrite the content of
>pci_regs_buf for every device (they will all write over the same
>portion of the log).
>

No, you're right. I'm going to fix it in next revision :-)

Thanks,
Gavin
Benjamin Herrenschmidt June 25, 2013, 11:57 p.m. UTC | #5
On Wed, 2013-06-26 at 07:49 +0800, Gavin Shan wrote:
> It's something like the followings. For ER on PE#0, we will have
> PE with type of EEH_PE_BUS marked as isolated, instead of the
> one with EEH_PE_PHB.
> 
> 
>         [ EEH_PE_PHB] <---> [ EEH_PE_PHB] <---> [ EEH_PE_PHB]
>                                   |
>                             [ EEH_PE_BUS ] PE#0
>                                   |

So we actually have two PEs here ? One real (PE#0) and one imaginary
(PHB PE) with no PE# associated ?

>                         -------------------------
>                         |                       |
>                    [ EEH_PE_BUS ] PE#1     [ EEH_PE_BUS] PE#2
> 
> >I would either not bother and collect the FF's, or make this specific
> >to fence and only fence.
> >
> 
> I'd like to keep it specific to fenced PHB and it's already be
> that :-)

Cheers,
Ben.
Gavin Shan June 26, 2013, 12:12 a.m. UTC | #6
On Wed, Jun 26, 2013 at 09:57:26AM +1000, Benjamin Herrenschmidt wrote:
>On Wed, 2013-06-26 at 07:49 +0800, Gavin Shan wrote:
>> It's something like the followings. For ER on PE#0, we will have
>> PE with type of EEH_PE_BUS marked as isolated, instead of the
>> one with EEH_PE_PHB.
>> 
>> 
>>         [ EEH_PE_PHB] <---> [ EEH_PE_PHB] <---> [ EEH_PE_PHB]
>>                                   |
>>                             [ EEH_PE_BUS ] PE#0
>>                                   |
>
>So we actually have two PEs here ? One real (PE#0) and one imaginary
>(PHB PE) with no PE# associated ?
>

Yes, The (PHB PE) is actually a container to all PEs under the
PHB ;-)

>>                         -------------------------
>>                         |                       |
>>                    [ EEH_PE_BUS ] PE#1     [ EEH_PE_BUS] PE#2
>> 
>> >I would either not bother and collect the FF's, or make this specific
>> >to fence and only fence.
>> >
>> 
>> I'd like to keep it specific to fenced PHB and it's already be
>> that :-)

Thanks,
Gavin
diff mbox

Patch

diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index 951a632..60deb42 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -232,16 +232,30 @@  void eeh_slot_error_detail(struct eeh_pe *pe, int severity)
 {
 	size_t loglen = 0;
 	struct eeh_dev *edev;
+	bool valid_cfg_log = true;
 
-	eeh_pci_enable(pe, EEH_OPT_THAW_MMIO);
-	eeh_ops->configure_bridge(pe);
-	eeh_pe_restore_bars(pe);
-
-	pci_regs_buf[0] = 0;
-	eeh_pe_for_each_dev(pe, edev) {
-		loglen += eeh_gather_pci_data(edev, pci_regs_buf,
-				EEH_PCI_REGS_LOG_LEN);
-        }
+	/*
+	 * When the PHB is fenced or dead, it's pointless to collect
+	 * the data from PCI config space because it should return
+	 * 0xFF's. For ER, we still retrieve the data from the PCI
+	 * config space.
+	 */
+	if (eeh_probe_mode_dev() &&
+	    (pe->type & EEH_PE_PHB) &&
+	    (pe->state & (EEH_PE_ISOLATED | EEH_PE_PHB_DEAD)))
+		valid_cfg_log = false;
+
+	if (valid_cfg_log) {
+		eeh_pci_enable(pe, EEH_OPT_THAW_MMIO);
+		eeh_ops->configure_bridge(pe);
+		eeh_pe_restore_bars(pe);
+
+		pci_regs_buf[0] = 0;
+		eeh_pe_for_each_dev(pe, edev) {
+			loglen += eeh_gather_pci_data(edev, pci_regs_buf,
+						      EEH_PCI_REGS_LOG_LEN);
+		}
+	}
 
 	eeh_ops->get_log(pe, severity, pci_regs_buf, loglen);
 }