powerpc/eeh: Fix enabling bridge MMIO windows

Message ID 20180411033758.20794-1-mikey@neuling.org
State Accepted
Commit 13a83eac373c49c0a081cbcd137e79210fe78acd
Headers show
Series
  • powerpc/eeh: Fix enabling bridge MMIO windows
Related show

Commit Message

Michael Neuling April 11, 2018, 3:37 a.m.
On boot we save the configuration space of PCIe bridges. We do this so
when we get an EEH event and everything gets reset that we can restore
them.

Unfortunately we save this state before we've enabled the MMIO space
on the bridges. Hence if we have to reset the bridge when we come back
MMIO is not enabled and we end up taking an PE freeze when the driver
starts accessing again.

This patch forces the memory/MMIO and bus mastering on when restoring
bridges on EEH. Ideally we'd do this correctly by saving the
configuration space writes later, but that will have to come later in
a larger EEH rewrite. For now we have this simple fix.

The original bug can be triggered on a boston machine by doing:
  echo 0x8000000000000000 > /sys/kernel/debug/powerpc/PCI0001/err_injct_outbound
On boston, this PHB has a PCIe switch on it.  Without this patch,
you'll see two EEH events, 1 expected and 1 the failure we are fixing
here. The second EEH event causes the anything under the PHB to
disappear (i.e. the i40e eth).

With this patch, only 1 EEH event occurs and devices properly recover.

Reported-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Cc: stable@vger.kernel.org
---
 arch/powerpc/kernel/eeh_pe.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Russell Currey April 19, 2018, 1:15 a.m. | #1
On Wed, 2018-04-11 at 13:37 +1000, Michael Neuling wrote:
> On boot we save the configuration space of PCIe bridges. We do this
> so
> when we get an EEH event and everything gets reset that we can
> restore
> them.
> 
> Unfortunately we save this state before we've enabled the MMIO space
> on the bridges. Hence if we have to reset the bridge when we come
> back
> MMIO is not enabled and we end up taking an PE freeze when the driver
> starts accessing again.
> 
> This patch forces the memory/MMIO and bus mastering on when restoring
> bridges on EEH. Ideally we'd do this correctly by saving the
> configuration space writes later, but that will have to come later in
> a larger EEH rewrite. For now we have this simple fix.
> 
> The original bug can be triggered on a boston machine by doing:
>   echo 0x8000000000000000 >
> /sys/kernel/debug/powerpc/PCI0001/err_injct_outbound
> On boston, this PHB has a PCIe switch on it.  Without this patch,
> you'll see two EEH events, 1 expected and 1 the failure we are fixing
> here. The second EEH event causes the anything under the PHB to
> disappear (i.e. the i40e eth).
> 
> With this patch, only 1 EEH event occurs and devices properly
> recover.
> 
> Reported-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com>
> Signed-off-by: Michael Neuling <mikey@neuling.org>
> Cc: stable@vger.kernel.org

Acked-by: Russell Currey <ruscur@russell.cc>
Michael Ellerman April 19, 2018, 1:42 p.m. | #2
On Wed, 2018-04-11 at 03:37:58 UTC, Michael Neuling wrote:
> On boot we save the configuration space of PCIe bridges. We do this so
> when we get an EEH event and everything gets reset that we can restore
> them.
> 
> Unfortunately we save this state before we've enabled the MMIO space
> on the bridges. Hence if we have to reset the bridge when we come back
> MMIO is not enabled and we end up taking an PE freeze when the driver
> starts accessing again.
> 
> This patch forces the memory/MMIO and bus mastering on when restoring
> bridges on EEH. Ideally we'd do this correctly by saving the
> configuration space writes later, but that will have to come later in
> a larger EEH rewrite. For now we have this simple fix.
> 
> The original bug can be triggered on a boston machine by doing:
>   echo 0x8000000000000000 > /sys/kernel/debug/powerpc/PCI0001/err_injct_outbound
> On boston, this PHB has a PCIe switch on it.  Without this patch,
> you'll see two EEH events, 1 expected and 1 the failure we are fixing
> here. The second EEH event causes the anything under the PHB to
> disappear (i.e. the i40e eth).
> 
> With this patch, only 1 EEH event occurs and devices properly recover.
> 
> Reported-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com>
> Signed-off-by: Michael Neuling <mikey@neuling.org>
> Cc: stable@vger.kernel.org
> Acked-by: Russell Currey <ruscur@russell.cc>

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/13a83eac373c49c0a081cbcd137e79

cheers

Patch

diff --git a/arch/powerpc/kernel/eeh_pe.c b/arch/powerpc/kernel/eeh_pe.c
index 2d4956e97a..ee5a67d57a 100644
--- a/arch/powerpc/kernel/eeh_pe.c
+++ b/arch/powerpc/kernel/eeh_pe.c
@@ -807,7 +807,8 @@  static void eeh_restore_bridge_bars(struct eeh_dev *edev)
 	eeh_ops->write_config(pdn, 15*4, 4, edev->config_space[15]);
 
 	/* PCI Command: 0x4 */
-	eeh_ops->write_config(pdn, PCI_COMMAND, 4, edev->config_space[1]);
+	eeh_ops->write_config(pdn, PCI_COMMAND, 4, edev->config_space[1] |
+			      PCI_COMMAND_MEMORY | PCI_COMMAND_MASTER);
 
 	/* Check the PCIe link is ready */
 	eeh_bridge_check_link(edev);