Message ID | bfb486992d040a2359d30260599216694532e2ce.1521064564.git.joseph.salisbury@canonical.com |
---|---|
State | New |
Series | [Bionic,1/1] nvme-pci: Fix EEH failure on ppc |
On 14/03/18 22:03, Joseph Salisbury wrote:
> From: Wen Xiong <wenxiong@linux.vnet.ibm.com>
>
> BugLink: http://bugs.launchpad.net/bugs/1753371
>
> Triggering PPC EEH detection and handling requires a memory mapped read
> failure. The NVMe driver removed the periodic health check MMIO, so
> there's no early detection mechanism to trigger the recovery. Instead,
> the detection now happens when the nvme driver handles an IO timeout
> event. This takes the pci channel offline, so we do not want the driver
> to proceed with escalating its own recovery efforts that may conflict
> with the EEH handler.
>
> This patch ensures the driver will observe the channel was set to offline
> after a failed MMIO read and resets the IO timer so the EEH handler has
> a chance to recover the device.
>
> Signed-off-by: Wen Xiong <wenxiong@linux.vnet.ibm.com>
> [updated change log]
> Signed-off-by: Keith Busch <keith.busch@intel.com>
>
> (cherry picked from commit 651438bb0af5213f1f70d66e75bf11d08cb5537a)
> Signed-off-by: Joseph Salisbury <joseph.salisbury@canonical.com>
> ---
>  drivers/nvme/host/pci.c | 13 +++++++------
>  1 file changed, 7 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 4276ebf..3a0fcb7 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -1148,12 +1148,6 @@ static bool nvme_should_reset(struct nvme_dev *dev, u32 csts)
>  	if (!(csts & NVME_CSTS_CFS) && !nssro)
>  		return false;
>  
> -	/* If PCI error recovery process is happening, we cannot reset or
> -	 * the recovery mechanism will surely fail.
> -	 */
> -	if (pci_channel_offline(to_pci_dev(dev->dev)))
> -		return false;
> -
>  	return true;
>  }
>  
> @@ -1184,6 +1178,13 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req, bool reserved)
>  	struct nvme_command cmd;
>  	u32 csts = readl(dev->bar + NVME_REG_CSTS);
>  
> +	/* If PCI error recovery process is happening, we cannot reset or
> +	 * the recovery mechanism will surely fail.
> +	 */
> +	mb();
> +	if (pci_channel_offline(to_pci_dev(dev->dev)))
> +		return BLK_EH_RESET_TIMER;
> +
>  	/*
>  	 * Reset immediately if the controller is failed
>  	 */
>

Clean upstream cherry pick, looks good to me.

Acked-by: Colin Ian King <colin.king@canonical.com>
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 4276ebf..3a0fcb7 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1148,12 +1148,6 @@ static bool nvme_should_reset(struct nvme_dev *dev, u32 csts)
 	if (!(csts & NVME_CSTS_CFS) && !nssro)
 		return false;
 
-	/* If PCI error recovery process is happening, we cannot reset or
-	 * the recovery mechanism will surely fail.
-	 */
-	if (pci_channel_offline(to_pci_dev(dev->dev)))
-		return false;
-
 	return true;
 }
 
@@ -1184,6 +1178,13 @@ static enum blk_eh_timer_return nvme_timeout(struct request *req, bool reserved)
 	struct nvme_command cmd;
 	u32 csts = readl(dev->bar + NVME_REG_CSTS);
 
+	/* If PCI error recovery process is happening, we cannot reset or
+	 * the recovery mechanism will surely fail.
+	 */
+	mb();
+	if (pci_channel_offline(to_pci_dev(dev->dev)))
+		return BLK_EH_RESET_TIMER;
+
 	/*
 	 * Reset immediately if the controller is failed
 	 */
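
For readers without a ppc box, the ordering the patch relies on can be sketched in user space. This is only an illustration, not the driver code: every `sketch_*` / `SKETCH_*` name is hypothetical, the "frozen link" flag stands in for a real EEH event, and the handler is reduced to the decision the change log describes (the failed CSTS read comes first, then the offline check, then the driver's own reset). The `mb()` from the patch has no equivalent here because the model is single-threaded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for BLK_EH_RESET_TIMER / BLK_EH_HANDLED. */
enum sketch_eh_return { SKETCH_EH_RESET_TIMER, SKETCH_EH_HANDLED };

/* Controller Fatal Status bit in CSTS (bit 1). */
#define SKETCH_CSTS_CFS 0x2u

/* Hypothetical model of a device; not struct nvme_dev. */
struct sketch_dev {
	bool link_frozen;     /* assumed: platform has isolated the device */
	bool channel_offline; /* models what pci_channel_offline() reports */
	uint32_t csts;        /* value a healthy CSTS read would return */
	int resets_scheduled; /* counts driver-initiated resets */
};

/*
 * Models readl(dev->bar + NVME_REG_CSTS). On a frozen link the read
 * fails and returns all ones; in this model that failed read is also
 * the moment the channel is marked offline.
 */
static uint32_t sketch_read_csts(struct sketch_dev *dev)
{
	if (dev->link_frozen) {
		dev->channel_offline = true;
		return 0xFFFFFFFFu;
	}
	return dev->csts;
}

/* Simplified timeout handler: abort and disable paths are left out. */
enum sketch_eh_return sketch_timeout(struct sketch_dev *dev)
{
	uint32_t csts;

	assert(dev != NULL);
	csts = sketch_read_csts(dev);

	/* Checked after the read, so a failed read is already visible. */
	if (dev->channel_offline)
		return SKETCH_EH_RESET_TIMER;

	/* Controller reports a fatal status: the driver resets it. */
	if (csts & SKETCH_CSTS_CFS) {
		dev->resets_scheduled++;
		return SKETCH_EH_HANDLED;
	}

	/* Otherwise keep waiting (a simplification of the real handler). */
	return SKETCH_EH_RESET_TIMER;
}
```

Note that an all-ones CSTS value has the CFS bit set, so in this model a frozen link would look like a failed controller if the offline check did not come first. Calling `sketch_timeout()` on a device with `link_frozen` set rearms the timer and leaves `resets_scheduled` at zero, while a device whose `csts` has `SKETCH_CSTS_CFS` set and a healthy link gets one reset scheduled.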