Message ID | 20180411033758.20794-1-mikey@neuling.org (mailing list archive) |
---|---|
State | Accepted |
Commit | 13a83eac373c49c0a081cbcd137e79210fe78acd |
Series | powerpc/eeh: Fix enabling bridge MMIO windows |
On Wed, 2018-04-11 at 13:37 +1000, Michael Neuling wrote:
> On boot we save the configuration space of PCIe bridges. We do this so
> when we get an EEH event and everything gets reset that we can restore
> them.
>
> Unfortunately we save this state before we've enabled the MMIO space
> on the bridges. Hence if we have to reset the bridge when we come back
> MMIO is not enabled and we end up taking an PE freeze when the driver
> starts accessing again.
>
> This patch forces the memory/MMIO and bus mastering on when restoring
> bridges on EEH. Ideally we'd do this correctly by saving the
> configuration space writes later, but that will have to come later in
> a larger EEH rewrite. For now we have this simple fix.
>
> The original bug can be triggered on a boston machine by doing:
>   echo 0x8000000000000000 > /sys/kernel/debug/powerpc/PCI0001/err_injct_outbound
> On boston, this PHB has a PCIe switch on it. Without this patch,
> you'll see two EEH events, 1 expected and 1 the failure we are fixing
> here. The second EEH event causes the anything under the PHB to
> disappear (i.e. the i40e eth).
>
> With this patch, only 1 EEH event occurs and devices properly recover.
>
> Reported-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com>
> Signed-off-by: Michael Neuling <mikey@neuling.org>
> Cc: stable@vger.kernel.org

Acked-by: Russell Currey <ruscur@russell.cc>

On Wed, 2018-04-11 at 03:37:58 UTC, Michael Neuling wrote:
> On boot we save the configuration space of PCIe bridges. We do this so
> when we get an EEH event and everything gets reset that we can restore
> them.
>
> Unfortunately we save this state before we've enabled the MMIO space
> on the bridges. Hence if we have to reset the bridge when we come back
> MMIO is not enabled and we end up taking an PE freeze when the driver
> starts accessing again.
>
> This patch forces the memory/MMIO and bus mastering on when restoring
> bridges on EEH. Ideally we'd do this correctly by saving the
> configuration space writes later, but that will have to come later in
> a larger EEH rewrite. For now we have this simple fix.
>
> The original bug can be triggered on a boston machine by doing:
>   echo 0x8000000000000000 > /sys/kernel/debug/powerpc/PCI0001/err_injct_outbound
> On boston, this PHB has a PCIe switch on it. Without this patch,
> you'll see two EEH events, 1 expected and 1 the failure we are fixing
> here. The second EEH event causes the anything under the PHB to
> disappear (i.e. the i40e eth).
>
> With this patch, only 1 EEH event occurs and devices properly recover.
>
> Reported-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com>
> Signed-off-by: Michael Neuling <mikey@neuling.org>
> Cc: stable@vger.kernel.org
> Acked-by: Russell Currey <ruscur@russell.cc>

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/13a83eac373c49c0a081cbcd137e79

cheers

diff --git a/arch/powerpc/kernel/eeh_pe.c b/arch/powerpc/kernel/eeh_pe.c
index 2d4956e97a..ee5a67d57a 100644
--- a/arch/powerpc/kernel/eeh_pe.c
+++ b/arch/powerpc/kernel/eeh_pe.c
@@ -807,7 +807,8 @@ static void eeh_restore_bridge_bars(struct eeh_dev *edev)
 	eeh_ops->write_config(pdn, 15*4, 4, edev->config_space[15]);
 
 	/* PCI Command: 0x4 */
-	eeh_ops->write_config(pdn, PCI_COMMAND, 4, edev->config_space[1]);
+	eeh_ops->write_config(pdn, PCI_COMMAND, 4, edev->config_space[1] |
+			      PCI_COMMAND_MEMORY | PCI_COMMAND_MASTER);
 
 	/* Check the PCIe link is ready */
 	eeh_bridge_check_link(edev);

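For readers who want to see the effect of the change outside the kernel, here is a minimal userspace sketch of the restore step. It is only a model: struct model_bridge and the model_* functions are hypothetical names invented for illustration and are not part of eeh_pe.c. The register offset and bit values are the standard PCI ones (PCI_COMMAND at 0x04, PCI_COMMAND_MEMORY 0x2, PCI_COMMAND_MASTER 0x4). The model assumes, as the patch does, that the command register lives in the low half of dword 1 of the saved config space.

```c
#include <assert.h>
#include <stdint.h>

/* Standard PCI values, matching include/uapi/linux/pci_regs.h. */
#define PCI_COMMAND        0x04 /* byte offset of the command register */
#define PCI_COMMAND_MEMORY 0x2  /* respond to memory-space (MMIO) accesses */
#define PCI_COMMAND_MASTER 0x4  /* allow the device to master the bus */

/* Hypothetical stand-in for a bridge; not a kernel structure. */
struct model_bridge {
	uint32_t saved[16]; /* boot-time snapshot, like edev->config_space */
	uint32_t live[16];  /* what the (modelled) hardware holds now */
};

/* Pre-patch behaviour: write the saved command dword back unchanged. */
static void model_restore_old(struct model_bridge *b)
{
	b->live[PCI_COMMAND / 4] = b->saved[PCI_COMMAND / 4];
}

/* Patched behaviour: force memory decode and bus mastering on. */
static void model_restore_fixed(struct model_bridge *b)
{
	b->live[PCI_COMMAND / 4] = b->saved[PCI_COMMAND / 4] |
				   PCI_COMMAND_MEMORY | PCI_COMMAND_MASTER;
}

/* Non-zero if the bridge would forward MMIO accesses. */
static int model_mmio_enabled(const struct model_bridge *b)
{
	return (b->live[PCI_COMMAND / 4] & PCI_COMMAND_MEMORY) != 0;
}

/* Walk through the failure and the fix on a snapshot taken too early. */
static void model_demo(void)
{
	struct model_bridge b = { { 0 }, { 0 } };

	/* Snapshot taken before MMIO was enabled: command bits all clear. */
	b.saved[PCI_COMMAND / 4] = 0;

	model_restore_old(&b);
	assert(!model_mmio_enabled(&b)); /* driver access would freeze the PE */

	model_restore_fixed(&b);
	assert(model_mmio_enabled(&b));  /* bridge forwards MMIO again */
}
```

Calling model_demo() exercises both paths. The OR leaves every other bit of the saved dword intact, including the status bits in the upper 16 bits, so the model adds the two enable bits without otherwise altering the snapshot.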
On boot we save the configuration space of PCIe bridges. We do this so
that when we get an EEH event and everything gets reset, we can restore
them.

Unfortunately we save this state before we've enabled the MMIO space
on the bridges. Hence, if we have to reset the bridge, MMIO is not
enabled when we come back, and we end up taking a PE freeze when the
driver starts accessing the device again.

This patch forces memory/MMIO and bus mastering on when restoring
bridges on EEH. Ideally we'd do this correctly by saving the
configuration space later, but that will have to come in a larger EEH
rewrite. For now we have this simple fix.

The original bug can be triggered on a boston machine by doing:
  echo 0x8000000000000000 > /sys/kernel/debug/powerpc/PCI0001/err_injct_outbound
On boston, this PHB has a PCIe switch on it. Without this patch,
you'll see two EEH events: one expected, and one that is the failure we
are fixing here. The second EEH event causes anything under the PHB to
disappear (i.e. the i40e eth).

With this patch, only one EEH event occurs and devices properly recover.

Reported-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Cc: stable@vger.kernel.org
---
 arch/powerpc/kernel/eeh_pe.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)