Message ID | 1372154461-29674-2-git-send-email-shangw@linux.vnet.ibm.com (mailing list archive) |
---|---|
State | Accepted, archived |
Commit | 812d5cdc7dc97791a0838c47f5e8762f1b25a308 |
Headers | show |
On Tue, 2013-06-25 at 18:00 +0800, Gavin Shan wrote: > + /* > + * When the PHB is fenced or dead, it's pointless to collect > + * the data from PCI config space because it should return > + * 0xFF's. For ER, we still retrieve the data from the PCI > + * config space. > + */ > + if (eeh_probe_mode_dev() && > + (pe->type & EEH_PE_PHB) && > + (pe->state & (EEH_PE_ISOLATED | EEH_PE_PHB_DEAD))) > + valid_cfg_log = false; > + I'm still unsure about that one. EEH_PE_ISOLATED could be the result of a normal ER of PE#0 (which can happen for various reasons other than a fence) in which case the config space is available and interesting. I would either not bother and collect the FF's, or make this specific to fence and only fence. Cheers, Ben.
On Tue, 2013-06-25 at 18:00 +0800, Gavin Shan wrote: > + pci_regs_buf[0] = 0; > + eeh_pe_for_each_dev(pe, edev) { > + loglen += eeh_gather_pci_data(edev, pci_regs_buf, > + EEH_PCI_REGS_LOG_LEN); > + } > + } Unless I'm mistaken, this is buggy and will overwrite the content of pci_regs_buf for every device (they will all write over the same portion of the log). Ben.
On Tue, Jun 25, 2013 at 09:55:15PM +1000, Benjamin Herrenschmidt wrote: >On Tue, 2013-06-25 at 18:00 +0800, Gavin Shan wrote: >> + /* >> + * When the PHB is fenced or dead, it's pointless to collect >> + * the data from PCI config space because it should return >> + * 0xFF's. For ER, we still retrieve the data from the PCI >> + * config space. >> + */ >> + if (eeh_probe_mode_dev() && >> + (pe->type & EEH_PE_PHB) && >> + (pe->state & (EEH_PE_ISOLATED | EEH_PE_PHB_DEAD))) >> + valid_cfg_log = false; >> + > >I'm still unsure about that one. EEH_PE_ISOLATED could be the result >of a normal ER of PE#0 (which can happen for various reasons other >than a fence) in which case the config space is available and >interesting. > It's something like the followings. For ER on PE#0, we will have PE with type of EEH_PE_BUS marked as isolated, instead of the one with EEH_PE_PHB. [ EEH_PE_PHB] <---> [ EEH_PE_PHB] <---> [ EEH_PE_PHB] | [ EEH_PE_BUS ] PE#0 | ------------------------- | | [ EEH_PE_BUS ] PE#1 [ EEH_PE_BUS] PE#2 >I would either not bother and collect the FF's, or make this specific >to fence and only fence. > I'd like to keep it specific to fenced PHB and it's already be that :-) Thanks, Gavin
On Tue, Jun 25, 2013 at 09:56:06PM +1000, Benjamin Herrenschmidt wrote: >On Tue, 2013-06-25 at 18:00 +0800, Gavin Shan wrote: >> + pci_regs_buf[0] = 0; >> + eeh_pe_for_each_dev(pe, edev) { >> + loglen += eeh_gather_pci_data(edev, pci_regs_buf, >> + EEH_PCI_REGS_LOG_LEN); >> + } >> + } > >Unless I'm mistaken, this is buggy and will overwrite the content of >pci_regs_buf for every device (they will all write over the same >portion of the log). > No, you're right. I'm going to fix it in next revision :-) Thanks, Gavin
On Wed, 2013-06-26 at 07:49 +0800, Gavin Shan wrote: > It's something like the followings. For ER on PE#0, we will have > PE with type of EEH_PE_BUS marked as isolated, instead of the > one with EEH_PE_PHB. > > > [ EEH_PE_PHB] <---> [ EEH_PE_PHB] <---> [ EEH_PE_PHB] > | > [ EEH_PE_BUS ] PE#0 > | So we actually have two PEs here ? One real (PE#0) and one imaginary (PHB PE) with no PE# associated ? > ------------------------- > | | > [ EEH_PE_BUS ] PE#1 [ EEH_PE_BUS] PE#2 > > >I would either not bother and collect the FF's, or make this specific > >to fence and only fence. > > > > I'd like to keep it specific to fenced PHB and it's already be > that :-) Cheers, Ben.
On Wed, Jun 26, 2013 at 09:57:26AM +1000, Benjamin Herrenschmidt wrote: >On Wed, 2013-06-26 at 07:49 +0800, Gavin Shan wrote: >> It's something like the followings. For ER on PE#0, we will have >> PE with type of EEH_PE_BUS marked as isolated, instead of the >> one with EEH_PE_PHB. >> >> >> [ EEH_PE_PHB] <---> [ EEH_PE_PHB] <---> [ EEH_PE_PHB] >> | >> [ EEH_PE_BUS ] PE#0 >> | > >So we actually have two PEs here ? One real (PE#0) and one imaginary >(PHB PE) with no PE# associated ? > Yes, The (PHB PE) is actually a container to all PEs under the PHB ;-) >> ------------------------- >> | | >> [ EEH_PE_BUS ] PE#1 [ EEH_PE_BUS] PE#2 >> >> >I would either not bother and collect the FF's, or make this specific >> >to fence and only fence. >> > >> >> I'd like to keep it specific to fenced PHB and it's already be >> that :-) Thanks, Gavin
diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c index 951a632..60deb42 100644 --- a/arch/powerpc/kernel/eeh.c +++ b/arch/powerpc/kernel/eeh.c @@ -232,16 +232,30 @@ void eeh_slot_error_detail(struct eeh_pe *pe, int severity) { size_t loglen = 0; struct eeh_dev *edev; + bool valid_cfg_log = true; - eeh_pci_enable(pe, EEH_OPT_THAW_MMIO); - eeh_ops->configure_bridge(pe); - eeh_pe_restore_bars(pe); - - pci_regs_buf[0] = 0; - eeh_pe_for_each_dev(pe, edev) { - loglen += eeh_gather_pci_data(edev, pci_regs_buf, - EEH_PCI_REGS_LOG_LEN); - } + /* + * When the PHB is fenced or dead, it's pointless to collect + * the data from PCI config space because it should return + * 0xFF's. For ER, we still retrieve the data from the PCI + * config space. + */ + if (eeh_probe_mode_dev() && + (pe->type & EEH_PE_PHB) && + (pe->state & (EEH_PE_ISOLATED | EEH_PE_PHB_DEAD))) + valid_cfg_log = false; + + if (valid_cfg_log) { + eeh_pci_enable(pe, EEH_OPT_THAW_MMIO); + eeh_ops->configure_bridge(pe); + eeh_pe_restore_bars(pe); + + pci_regs_buf[0] = 0; + eeh_pe_for_each_dev(pe, edev) { + loglen += eeh_gather_pci_data(edev, pci_regs_buf, + EEH_PCI_REGS_LOG_LEN); + } + } eeh_ops->get_log(pe, severity, pci_regs_buf, loglen); }
When the PHB is fenced or dead, it's pointless to collect the data from PCI config space of subordinate PCI devices since it should return 0xFF's. It also has potential risk to incur additional errors. The patch avoids collecting PCI-CFG data while PHB is in fenced or dead state. Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com> --- arch/powerpc/kernel/eeh.c | 34 ++++++++++++++++++++++++---------- 1 files changed, 24 insertions(+), 10 deletions(-)