diff mbox series

[1/4] PCI: pciehp: Ignore Link Down/Up caused by error-induced Hot Reset

Message ID 251f4edcc04c14f873ff1c967bc686169cd07d2d.1627638184.git.lukas@wunner.de
State New
Headers show
Series pciehp error recovery fix + cleanups | expand

Commit Message

Lukas Wunner July 31, 2021, 12:39 p.m. UTC
Stuart Hayes reports that an error handled by DPC at a Root Port results
in pciehp gratuitously bringing down a subordinate hotplug port:

  RP -- UP -- DP -- UP -- DP (hotplug) -- EP

pciehp brings the slot down because the Link to the Endpoint goes down.
That is caused by a Hot Reset being propagated as a result of DPC.
Per PCIe Base Spec 5.0, section 6.6.1 "Conventional Reset":

  For a Switch, the following must cause a hot reset to be sent on all
  Downstream Ports: [...]

  * The Data Link Layer of the Upstream Port reporting DL_Down status.
    In Switches that support Link speeds greater than 5.0 GT/s, the
    Upstream Port must direct the LTSSM of each Downstream Port to the
    Hot Reset state, but not hold the LTSSMs in that state. This permits
    each Downstream Port to begin Link training immediately after its
    hot reset completes. This behavior is recommended for all Switches.

  * Receiving a hot reset on the Upstream Port.

Once DPC recovers, pcie_do_recovery() walks down the hierarchy and
invokes pcie_portdrv_slot_reset() to restore each port's config space.
At that point, a hotplug interrupt is signaled per PCIe Base Spec r5.0,
section 6.7.3.4 "Software Notification of Hot-Plug Events":

  If the Port is enabled for edge-triggered interrupt signaling using
  MSI or MSI-X, an interrupt message must be sent every time the logical
  AND of the following conditions transitions from FALSE to TRUE: [...]

  * The Hot-Plug Interrupt Enable bit in the Slot Control register is
    set to 1b.

  * At least one hot-plug event status bit in the Slot Status register
    and its associated enable bit in the Slot Control register are both
    set to 1b.

Prevent pciehp from gratuitously bringing down the slot by clearing the
error-induced Data Link Layer State Changed event before restoring
config space.  Afterwards, check whether the link has unexpectedly
failed to retrain and synthesize a DLLSC event if so.

Allow each pcie_port_service_driver (one of them being pciehp) to define
a slot_reset callback and re-use the existing pm_iter() function to
iterate over the callbacks.

Thereby, the Endpoint driver remains bound throughout error recovery and
may restore the device to working state.

Surprise removal during error recovery is detected through a Presence
Detect Changed event.  The hotplug port is expected to not signal that
event as a result of a Hot Reset.

The issue isn't DPC-specific, it also occurs when an error is handled by
AER through aer_root_reset().  So while the issue was noticed only now,
it's been around since 2006 when AER support was first introduced.

Fixes: 6c2b374d7485 ("PCI-Express AER implemetation: AER core and aerdriver")
Link: https://lore.kernel.org/linux-pci/08c046b0-c9f2-3489-eeef-7e7aca435bb9@gmail.com/
Reported-by: Stuart Hayes <stuart.w.hayes@gmail.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: stable@vger.kernel.org # v2.6.19+: ba952824e6c1: PCI/portdrv: Report reset for frozen channel
Cc: Keith Busch <kbusch@kernel.org>
---
 drivers/pci/Kconfig               |  3 +++
 drivers/pci/hotplug/pciehp.h      |  2 ++
 drivers/pci/hotplug/pciehp_core.c |  4 ++++
 drivers/pci/hotplug/pciehp_hpc.c  | 28 ++++++++++++++++++++++++++++
 drivers/pci/pcie/portdrv.h        |  2 ++
 drivers/pci/pcie/portdrv_core.c   | 20 ++++++++++----------
 drivers/pci/pcie/portdrv_pci.c    |  3 +++
 7 files changed, 52 insertions(+), 10 deletions(-)

Comments

Stuart Hayes Sept. 29, 2021, 8:40 p.m. UTC | #1
On 7/31/2021 7:39 AM, Lukas Wunner wrote:
> Stuart Hayes reports that an error handled by DPC at a Root Port results
> in pciehp gratuitously bringing down a subordinate hotplug port:
> 
>    RP -- UP -- DP -- UP -- DP (hotplug) -- EP
> 
> pciehp brings the slot down because the Link to the Endpoint goes down.
> That is caused by a Hot Reset being propagated as a result of DPC.
> Per PCIe Base Spec 5.0, section 6.6.1 "Conventional Reset":
> 
>    For a Switch, the following must cause a hot reset to be sent on all
>    Downstream Ports: [...]
> 
>    * The Data Link Layer of the Upstream Port reporting DL_Down status.
>      In Switches that support Link speeds greater than 5.0 GT/s, the
>      Upstream Port must direct the LTSSM of each Downstream Port to the
>      Hot Reset state, but not hold the LTSSMs in that state. This permits
>      each Downstream Port to begin Link training immediately after its
>      hot reset completes. This behavior is recommended for all Switches.
> 
>    * Receiving a hot reset on the Upstream Port.
> 
> Once DPC recovers, pcie_do_recovery() walks down the hierarchy and
> invokes pcie_portdrv_slot_reset() to restore each port's config space.
> At that point, a hotplug interrupt is signaled per PCIe Base Spec r5.0,
> section 6.7.3.4 "Software Notification of Hot-Plug Events":
> 
>    If the Port is enabled for edge-triggered interrupt signaling using
>    MSI or MSI-X, an interrupt message must be sent every time the logical
>    AND of the following conditions transitions from FALSE to TRUE: [...]
> 
>    * The Hot-Plug Interrupt Enable bit in the Slot Control register is
>      set to 1b.
> 
>    * At least one hot-plug event status bit in the Slot Status register
>      and its associated enable bit in the Slot Control register are both
>      set to 1b.
> 
> Prevent pciehp from gratuitously bringing down the slot by clearing the
> error-induced Data Link Layer State Changed event before restoring
> config space.  Afterwards, check whether the link has unexpectedly
> failed to retrain and synthesize a DLLSC event if so.
> 
> Allow each pcie_port_service_driver (one of them being pciehp) to define
> a slot_reset callback and re-use the existing pm_iter() function to
> iterate over the callbacks.
> 
> Thereby, the Endpoint driver remains bound throughout error recovery and
> may restore the device to working state.
> 
> Surprise removal during error recovery is detected through a Presence
> Detect Changed event.  The hotplug port is expected to not signal that
> event as a result of a Hot Reset.
> 
> The issue isn't DPC-specific, it also occurs when an error is handled by
> AER through aer_root_reset().  So while the issue was noticed only now,
> it's been around since 2006 when AER support was first introduced.
> 
> Fixes: 6c2b374d7485 ("PCI-Express AER implemetation: AER core and aerdriver")
> Link: https://lore.kernel.org/linux-pci/08c046b0-c9f2-3489-eeef-7e7aca435bb9@gmail.com/
> Reported-by: Stuart Hayes <stuart.w.hayes@gmail.com>
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> Cc: stable@vger.kernel.org # v2.6.19+: ba952824e6c1: PCI/portdrv: Report reset for frozen channel
> Cc: Keith Busch <kbusch@kernel.org>

Tested-by: Stuart Hayes <stuart.w.hayes@gmail.com>
Bjorn Helgaas Oct. 7, 2021, 11 p.m. UTC | #2
On Sat, Jul 31, 2021 at 02:39:01PM +0200, Lukas Wunner wrote:
> Stuart Hayes reports that an error handled by DPC at a Root Port results
> in pciehp gratuitously bringing down a subordinate hotplug port:
> 
>   RP -- UP -- DP -- UP -- DP (hotplug) -- EP
> 
> pciehp brings the slot down because the Link to the Endpoint goes down.
> That is caused by a Hot Reset being propagated as a result of DPC.
> Per PCIe Base Spec 5.0, section 6.6.1 "Conventional Reset":
> 
>   For a Switch, the following must cause a hot reset to be sent on all
>   Downstream Ports: [...]
> 
>   * The Data Link Layer of the Upstream Port reporting DL_Down status.
>     In Switches that support Link speeds greater than 5.0 GT/s, the
>     Upstream Port must direct the LTSSM of each Downstream Port to the
>     Hot Reset state, but not hold the LTSSMs in that state. This permits
>     each Downstream Port to begin Link training immediately after its
>     hot reset completes. This behavior is recommended for all Switches.
> 
>   * Receiving a hot reset on the Upstream Port.
> 
> Once DPC recovers, pcie_do_recovery() walks down the hierarchy and
> invokes pcie_portdrv_slot_reset() to restore each port's config space.
> At that point, a hotplug interrupt is signaled per PCIe Base Spec r5.0,
> section 6.7.3.4 "Software Notification of Hot-Plug Events":
> 
>   If the Port is enabled for edge-triggered interrupt signaling using
>   MSI or MSI-X, an interrupt message must be sent every time the logical
>   AND of the following conditions transitions from FALSE to TRUE: [...]
> 
>   * The Hot-Plug Interrupt Enable bit in the Slot Control register is
>     set to 1b.
> 
>   * At least one hot-plug event status bit in the Slot Status register
>     and its associated enable bit in the Slot Control register are both
>     set to 1b.
> 
> Prevent pciehp from gratuitously bringing down the slot by clearing the
> error-induced Data Link Layer State Changed event before restoring
> config space.  Afterwards, check whether the link has unexpectedly
> failed to retrain and synthesize a DLLSC event if so.
> 
> Allow each pcie_port_service_driver (one of them being pciehp) to define
> a slot_reset callback and re-use the existing pm_iter() function to
> iterate over the callbacks.
> 
> Thereby, the Endpoint driver remains bound throughout error recovery and
> may restore the device to working state.
> 
> Surprise removal during error recovery is detected through a Presence
> Detect Changed event.  The hotplug port is expected to not signal that
> event as a result of a Hot Reset.
> 
> The issue isn't DPC-specific, it also occurs when an error is handled by
> AER through aer_root_reset().  So while the issue was noticed only now,
> it's been around since 2006 when AER support was first introduced.
> 
> Fixes: 6c2b374d7485 ("PCI-Express AER implemetation: AER core and aerdriver")
> Link: https://lore.kernel.org/linux-pci/08c046b0-c9f2-3489-eeef-7e7aca435bb9@gmail.com/
> Reported-by: Stuart Hayes <stuart.w.hayes@gmail.com>
> Signed-off-by: Lukas Wunner <lukas@wunner.de>
> Cc: stable@vger.kernel.org # v2.6.19+: ba952824e6c1: PCI/portdrv: Report reset for frozen channel

Have you tried backporting this to v2.6.19?  I bet it's tough.  Not
sure we should suggest that stable pick this up unless there's a
reasonable path to do that.

> Cc: Keith Busch <kbusch@kernel.org>
> ---
>  drivers/pci/Kconfig               |  3 +++
>  drivers/pci/hotplug/pciehp.h      |  2 ++
>  drivers/pci/hotplug/pciehp_core.c |  4 ++++
>  drivers/pci/hotplug/pciehp_hpc.c  | 28 ++++++++++++++++++++++++++++
>  drivers/pci/pcie/portdrv.h        |  2 ++
>  drivers/pci/pcie/portdrv_core.c   | 20 ++++++++++----------
>  drivers/pci/pcie/portdrv_pci.c    |  3 +++
>  7 files changed, 52 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/pci/Kconfig b/drivers/pci/Kconfig
> index 0c473d75e625..a295d3c48927 100644
> --- a/drivers/pci/Kconfig
> +++ b/drivers/pci/Kconfig
> @@ -182,6 +182,9 @@ config PCI_LABEL
>  	def_bool y if (DMI || ACPI)
>  	select NLS
>  
> +config PCI_ERROR_RECOVERY
> +	def_bool PCIEAER || EEH

I'm having a hard time connecting this to the code.  

pcie_portdrv_slot_reset() is the .slot_reset() method for
pcie_portdriver.  When CONFIG_PCIEPORTBUS=y, portdrv is bound to every
PCIe port and RCEC.

The callers of pci_driver->err_handler->slot_reset() that I see are:

  eeh_report_reset		# arch/powerpc/kernel/eeh_driver.c
  report_slot_reset		# drivers/pci/pcie/err.c
  pci_error_handlers		# drivers/misc/cxl/guest.c
  cxl_pci_slot_reset		# drivers/misc/cxl/pci.c
  pcifront_common_process	# drivers/pci/xen-pcifront.c

I guess the cxl and xen cases probably do not involve PCIe ports;
they're probably only concerned with endpoints, so maybe we can
exclude those.

But this still seems like it's kind of dangling.  It's not obvious to
me why pciehp_slot_reset() should be inside that #ifdef.  Do we need
the #ifdef for a functional reason, or is it there because we know it
will never be called?  If the latter, I don't think the savings are
worth it.

>  config PCI_HYPERV
>  	tristate "Hyper-V PCI Frontend"
>  	depends on X86_64 && HYPERV && PCI_MSI && PCI_MSI_IRQ_DOMAIN && SYSFS
> diff --git a/drivers/pci/hotplug/pciehp.h b/drivers/pci/hotplug/pciehp.h
> index d4a930881054..7f24fe30a898 100644
> --- a/drivers/pci/hotplug/pciehp.h
> +++ b/drivers/pci/hotplug/pciehp.h
> @@ -189,6 +189,8 @@ int pciehp_get_attention_status(struct hotplug_slot *hotplug_slot, u8 *status);
>  int pciehp_set_raw_indicator_status(struct hotplug_slot *h_slot, u8 status);
>  int pciehp_get_raw_indicator_status(struct hotplug_slot *h_slot, u8 *status);
>  
> +int pciehp_slot_reset(struct pcie_device *dev);
> +
>  static inline const char *slot_name(struct controller *ctrl)
>  {
>  	return hotplug_slot_name(&ctrl->hotplug_slot);
> diff --git a/drivers/pci/hotplug/pciehp_core.c b/drivers/pci/hotplug/pciehp_core.c
> index ad3393930ecb..46a62237dcc8 100644
> --- a/drivers/pci/hotplug/pciehp_core.c
> +++ b/drivers/pci/hotplug/pciehp_core.c
> @@ -351,6 +351,10 @@ static struct pcie_port_service_driver hpdriver_portdrv = {
>  	.runtime_suspend = pciehp_runtime_suspend,
>  	.runtime_resume	= pciehp_runtime_resume,
>  #endif	/* PM */
> +
> +#ifdef	CONFIG_PCI_ERROR_RECOVERY
> +	.slot_reset	= pciehp_slot_reset,
> +#endif
>  };
>  
>  int __init pcie_hp_init(void)
> diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
> index 9d06939736c0..72ef22d0c2c9 100644
> --- a/drivers/pci/hotplug/pciehp_hpc.c
> +++ b/drivers/pci/hotplug/pciehp_hpc.c
> @@ -862,6 +862,34 @@ void pcie_disable_interrupt(struct controller *ctrl)
>  	pcie_write_cmd(ctrl, 0, mask);
>  }
>  
> +#ifdef CONFIG_PCI_ERROR_RECOVERY
> +/**
> + * pciehp_slot_reset() - ignore link event caused by error-induced hot reset
> + * @dev: PCI Express port service device
> + *
> + * Called from pcie_portdrv_slot_reset() after AER or DPC initiated a reset
> + * further up in the hierarchy to recover from an error.  The reset was
> + * propagated down to this hotplug port.  Ignore the resulting link flap.
> + * If the link failed to retrain successfully, synthesize the ignored event.
> + * Surprise removal during reset is detected through Presence Detect Changed.
> + */
> +int pciehp_slot_reset(struct pcie_device *dev)
> +{
> +	struct controller *ctrl = get_service_data(dev);
> +
> +	if (ctrl->state != ON_STATE)
> +		return 0;
> +
> +	pcie_capability_write_word(dev->port, PCI_EXP_SLTSTA,
> +				   PCI_EXP_SLTSTA_DLLSC);
> +
> +	if (!pciehp_check_link_active(ctrl))
> +		pciehp_request(ctrl, PCI_EXP_SLTSTA_DLLSC);
> +
> +	return 0;
> +}
> +#endif
> +
>  /*
>   * pciehp has a 1:1 bus:slot relationship so we ultimately want a secondary
>   * bus reset of the bridge, but at the same time we want to ensure that it is
> diff --git a/drivers/pci/pcie/portdrv.h b/drivers/pci/pcie/portdrv.h
> index 2ff5724b8f13..92a776d211ec 100644
> --- a/drivers/pci/pcie/portdrv.h
> +++ b/drivers/pci/pcie/portdrv.h
> @@ -85,6 +85,7 @@ struct pcie_port_service_driver {
>  	int (*runtime_suspend)(struct pcie_device *dev);
>  	int (*runtime_resume)(struct pcie_device *dev);
>  
> +	int (*slot_reset)(struct pcie_device *dev);
>  	/* Device driver may resume normal operations */
>  	void (*error_resume)(struct pci_dev *dev);
>  
> @@ -110,6 +111,7 @@ void pcie_port_service_unregister(struct pcie_port_service_driver *new);
>  
>  extern struct bus_type pcie_port_bus_type;
>  int pcie_port_device_register(struct pci_dev *dev);
> +int pcie_port_device_iter(struct device *dev, void *data);
>  #ifdef CONFIG_PM
>  int pcie_port_device_suspend(struct device *dev);
>  int pcie_port_device_resume_noirq(struct device *dev);
> diff --git a/drivers/pci/pcie/portdrv_core.c b/drivers/pci/pcie/portdrv_core.c
> index e1fed6649c41..ebcec7daa245 100644
> --- a/drivers/pci/pcie/portdrv_core.c
> +++ b/drivers/pci/pcie/portdrv_core.c
> @@ -362,24 +362,24 @@ int pcie_port_device_register(struct pci_dev *dev)
>  	return status;
>  }
>  
> -#ifdef CONFIG_PM
> -typedef int (*pcie_pm_callback_t)(struct pcie_device *);
> +typedef int (*pcie_callback_t)(struct pcie_device *);
>  
> -static int pm_iter(struct device *dev, void *data)
> +int pcie_port_device_iter(struct device *dev, void *data)

I like this change, and it seems like it's basically a rename that
could be split off from rest to make the slot_reset part a little more
focused.

>  {
>  	struct pcie_port_service_driver *service_driver;
>  	size_t offset = *(size_t *)data;
> -	pcie_pm_callback_t cb;
> +	pcie_callback_t cb;
>  
>  	if ((dev->bus == &pcie_port_bus_type) && dev->driver) {
>  		service_driver = to_service_driver(dev->driver);
> -		cb = *(pcie_pm_callback_t *)((void *)service_driver + offset);
> +		cb = *(pcie_callback_t *)((void *)service_driver + offset);
>  		if (cb)
>  			return cb(to_pcie_device(dev));
>  	}
>  	return 0;
>  }
>  
> +#ifdef CONFIG_PM
>  /**
>   * pcie_port_device_suspend - suspend port services associated with a PCIe port
>   * @dev: PCI Express port to handle
> @@ -387,13 +387,13 @@ static int pm_iter(struct device *dev, void *data)
>  int pcie_port_device_suspend(struct device *dev)
>  {
>  	size_t off = offsetof(struct pcie_port_service_driver, suspend);
> -	return device_for_each_child(dev, &off, pm_iter);
> +	return device_for_each_child(dev, &off, pcie_port_device_iter);
>  }
>  
>  int pcie_port_device_resume_noirq(struct device *dev)
>  {
>  	size_t off = offsetof(struct pcie_port_service_driver, resume_noirq);
> -	return device_for_each_child(dev, &off, pm_iter);
> +	return device_for_each_child(dev, &off, pcie_port_device_iter);
>  }
>  
>  /**
> @@ -403,7 +403,7 @@ int pcie_port_device_resume_noirq(struct device *dev)
>  int pcie_port_device_resume(struct device *dev)
>  {
>  	size_t off = offsetof(struct pcie_port_service_driver, resume);
> -	return device_for_each_child(dev, &off, pm_iter);
> +	return device_for_each_child(dev, &off, pcie_port_device_iter);
>  }
>  
>  /**
> @@ -413,7 +413,7 @@ int pcie_port_device_resume(struct device *dev)
>  int pcie_port_device_runtime_suspend(struct device *dev)
>  {
>  	size_t off = offsetof(struct pcie_port_service_driver, runtime_suspend);
> -	return device_for_each_child(dev, &off, pm_iter);
> +	return device_for_each_child(dev, &off, pcie_port_device_iter);
>  }
>  
>  /**
> @@ -423,7 +423,7 @@ int pcie_port_device_runtime_suspend(struct device *dev)
>  int pcie_port_device_runtime_resume(struct device *dev)
>  {
>  	size_t off = offsetof(struct pcie_port_service_driver, runtime_resume);
> -	return device_for_each_child(dev, &off, pm_iter);
> +	return device_for_each_child(dev, &off, pcie_port_device_iter);
>  }
>  #endif /* PM */
>  
> diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
> index c7ff1eea225a..1af74c3d9d5d 100644
> --- a/drivers/pci/pcie/portdrv_pci.c
> +++ b/drivers/pci/pcie/portdrv_pci.c
> @@ -160,6 +160,9 @@ static pci_ers_result_t pcie_portdrv_error_detected(struct pci_dev *dev,
>  
>  static pci_ers_result_t pcie_portdrv_slot_reset(struct pci_dev *dev)
>  {
> +	size_t off = offsetof(struct pcie_port_service_driver, slot_reset);
> +	device_for_each_child(&dev->dev, &off, pcie_port_device_iter);
> +
>  	pci_restore_state(dev);
>  	pci_save_state(dev);
>  	return PCI_ERS_RESULT_RECOVERED;
> -- 
> 2.31.1
>
Lukas Wunner Oct. 10, 2021, 9:14 a.m. UTC | #3
On Thu, Oct 07, 2021 at 06:00:20PM -0500, Bjorn Helgaas wrote:
> On Sat, Jul 31, 2021 at 02:39:01PM +0200, Lukas Wunner wrote:
> > The issue isn't DPC-specific, it also occurs when an error is handled by
> > AER through aer_root_reset().  So while the issue was noticed only now,
> > it's been around since 2006 when AER support was first introduced.
> > 
> > Fixes: 6c2b374d7485 ("PCI-Express AER implemetation: AER core and aerdriver")
[...]
> > Cc: stable@vger.kernel.org # v2.6.19+: ba952824e6c1: PCI/portdrv: Report reset for frozen channel
> 
> Have you tried backporting this to v2.6.19?  I bet it's tough.  Not
> sure we should suggest that stable pick this up unless there's a
> reasonable path to do that.

The oldest kernel.org stable release that still receives updates is v4.4.
There may be older distribution kernels out there which continue to be
supported.  We don't know for sure, so I think it's customary to tag
the release that introduced an issue and leave it to the stable and
distribution maintainers to choose what they backport.

It's true that patches often do not apply cleanly to older releases.
There are some good folks who regularly take up the thankless task of
backporting (Sudip Mukherjee to name but one), but I've also frequently
backported patches myself where necessary.


> > +config PCI_ERROR_RECOVERY
> > +	def_bool PCIEAER || EEH
> 
> I'm having a hard time connecting this to the code.
[...]
> But this still seems like it's kind of dangling.  It's not obvious to
> me why pciehp_slot_reset() should be inside that #ifdef.  Do we need
> the #ifdef for a functional reason, or is it there because we know it
> will never be called?  If the latter, I don't think the savings are
> worth it.

The motivation for the #ifdef was merely to reduce code size if neither
of PCIEAER or EEH is enabled.  Happy to remove it.

I have a different question though.  We've often discussed deprecating
portdrv and moving port services to the core instead.  I've stumbled
across commit a39bd851dccf ("PCI/PM: Clear PCIe PME Status bit in core,
not PCIe port driver"), wherein you moved pcie_pme_root_status_cleanup()
from portdrv to the core, which allowed you to drop the ->resume_noirq
callback from portdrv.

I've been thinking what moving port services to the core would look like
and what your vision for that might be.  If that commit is any indication,
it seems you'd probably prefer that pciehp_slot_reset() (as introduced in
the present patch) is called directly from report_slot_reset() in
drivers/pci/pcie/err.c.

That would eliminate much of the plumbing contained in the present patch
to allow each port service driver to define a ->slot_reset callback and
iterate over those callbacks.  pciehp is probably the only port service
which requires special handling upon ->slot_reset and the same goes for
a lot of the error handling and power management callbacks defined by
port services.  So all that plumbing is probably just a very roundabout
way of doing things.

When calling into port service code from core code, one needs to find
the port service's driver data.  For now we can resort to
pcie_port_find_device(), but I imagine once we move all port services
to the core, we'll be able to access their data directly from
struct pci_dev, e.g. via pdev->hotplug or pdev->pcie->hotplug pointers.

Does that make sense to you?

Thanks,

Lukas
Bjorn Helgaas Oct. 11, 2021, 5:25 p.m. UTC | #4
On Sun, Oct 10, 2021 at 11:14:07AM +0200, Lukas Wunner wrote:
> On Thu, Oct 07, 2021 at 06:00:20PM -0500, Bjorn Helgaas wrote:
> > On Sat, Jul 31, 2021 at 02:39:01PM +0200, Lukas Wunner wrote:
> > > The issue isn't DPC-specific, it also occurs when an error is handled by
> > > AER through aer_root_reset().  So while the issue was noticed only now,
> > > it's been around since 2006 when AER support was first introduced.
> > > 
> > > Fixes: 6c2b374d7485 ("PCI-Express AER implemetation: AER core and aerdriver")
> [...]
> > > Cc: stable@vger.kernel.org # v2.6.19+: ba952824e6c1: PCI/portdrv: Report reset for frozen channel
> > 
> > Have you tried backporting this to v2.6.19?  I bet it's tough.  Not
> > sure we should suggest that stable pick this up unless there's a
> > reasonable path to do that.
> 
> The oldest kernel.org stable release that still receives updates is v4.4.
> There may be older distribution kernels out there which continue to be
> supported.  We don't know for sure, so I think it's customary to tag
> the release that introduced an issue and leave it to the stable and
> distribution maintainers to choose what they backport.
> 
> It's true that patches often do not apply cleanly to older releases.
> There are some good folks who regularly take up the thankless task of
> backporting (Sudip Mukherjee to name but one), but I've also frequently
> backported patches myself where necessary.
> 
> 
> > > +config PCI_ERROR_RECOVERY
> > > +	def_bool PCIEAER || EEH
> > 
> > I'm having a hard time connecting this to the code.
> [...]
> > But this still seems like it's kind of dangling.  It's not obvious to
> > me why pciehp_slot_reset() should be inside that #ifdef.  Do we need
> > the #ifdef for a functional reason, or is it there because we know it
> > will never be called?  If the latter, I don't think the savings are
> > worth it.
> 
> The motivation for the #ifdef was merely to reduce code size if neither
> of PCIEAER or EEH is enabled.  Happy to remove it.

That'd be great.

> I have a different question though.  We've often discussed deprecating
> portdrv and moving port services to the core instead.  I've stumbled
> across commit a39bd851dccf ("PCI/PM: Clear PCIe PME Status bit in core,
> not PCIe port driver"), wherein you moved pcie_pme_root_status_cleanup()
> from portdrv to the core, which allowed you to drop the ->resume_noirq
> callback from portdrv.
> 
> I've been thinking what moving port services to the core would look like
> and what your vision for that might be.  If that commit is any indication,
> it seems you'd probably prefer that pciehp_slot_reset() (as introduced in
> the present patch) is called directly from report_slot_reset() in
> drivers/pci/pcie/err.c.
> 
> That would eliminate much of the plumbing contained in the present patch
> to allow each port service driver to define a ->slot_reset callback and
> iterate over those callbacks.  pciehp is probably the only port service
> which requires special handling upon ->slot_reset and the same goes for
> a lot of the error handling and power management callbacks defined by
> port services.  So all that plumbing is probably just a very roundabout
> way of doing things.
> 
> When calling into port service code from core code, one needs to find
> the port service's driver data.  For now we can resort to
> pcie_port_find_device(), but I imagine once we move all port services
> to the core, we'll be able to access their data directly from
> struct pci_dev, e.g. via pdev->hotplug or pdev->pcie->hotplug pointers.

We managed to get rid of pcie_port_find_service() recently, and I'd
love to get rid of pcie_port_find_device() as well.  I'd like to get
rid of struct pcie_device, too, but that's visible via sysfs and would
be harder.  But maybe we could at least stop using it internally.

I don't know exactly what the ->slot_reset() path would look like, and
I'm not suggesting you need to change that for this patch.

Bjorn
diff mbox series

Patch

diff --git a/drivers/pci/Kconfig b/drivers/pci/Kconfig
index 0c473d75e625..a295d3c48927 100644
--- a/drivers/pci/Kconfig
+++ b/drivers/pci/Kconfig
@@ -182,6 +182,9 @@  config PCI_LABEL
 	def_bool y if (DMI || ACPI)
 	select NLS
 
+config PCI_ERROR_RECOVERY
+	def_bool PCIEAER || EEH
+
 config PCI_HYPERV
 	tristate "Hyper-V PCI Frontend"
 	depends on X86_64 && HYPERV && PCI_MSI && PCI_MSI_IRQ_DOMAIN && SYSFS
diff --git a/drivers/pci/hotplug/pciehp.h b/drivers/pci/hotplug/pciehp.h
index d4a930881054..7f24fe30a898 100644
--- a/drivers/pci/hotplug/pciehp.h
+++ b/drivers/pci/hotplug/pciehp.h
@@ -189,6 +189,8 @@  int pciehp_get_attention_status(struct hotplug_slot *hotplug_slot, u8 *status);
 int pciehp_set_raw_indicator_status(struct hotplug_slot *h_slot, u8 status);
 int pciehp_get_raw_indicator_status(struct hotplug_slot *h_slot, u8 *status);
 
+int pciehp_slot_reset(struct pcie_device *dev);
+
 static inline const char *slot_name(struct controller *ctrl)
 {
 	return hotplug_slot_name(&ctrl->hotplug_slot);
diff --git a/drivers/pci/hotplug/pciehp_core.c b/drivers/pci/hotplug/pciehp_core.c
index ad3393930ecb..46a62237dcc8 100644
--- a/drivers/pci/hotplug/pciehp_core.c
+++ b/drivers/pci/hotplug/pciehp_core.c
@@ -351,6 +351,10 @@  static struct pcie_port_service_driver hpdriver_portdrv = {
 	.runtime_suspend = pciehp_runtime_suspend,
 	.runtime_resume	= pciehp_runtime_resume,
 #endif	/* PM */
+
+#ifdef	CONFIG_PCI_ERROR_RECOVERY
+	.slot_reset	= pciehp_slot_reset,
+#endif
 };
 
 int __init pcie_hp_init(void)
diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index 9d06939736c0..72ef22d0c2c9 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -862,6 +862,34 @@  void pcie_disable_interrupt(struct controller *ctrl)
 	pcie_write_cmd(ctrl, 0, mask);
 }
 
+#ifdef CONFIG_PCI_ERROR_RECOVERY
+/**
+ * pciehp_slot_reset() - ignore link event caused by error-induced hot reset
+ * @dev: PCI Express port service device
+ *
+ * Called from pcie_portdrv_slot_reset() after AER or DPC initiated a reset
+ * further up in the hierarchy to recover from an error.  The reset was
+ * propagated down to this hotplug port.  Ignore the resulting link flap.
+ * If the link failed to retrain successfully, synthesize the ignored event.
+ * Surprise removal during reset is detected through Presence Detect Changed.
+ */
+int pciehp_slot_reset(struct pcie_device *dev)
+{
+	struct controller *ctrl = get_service_data(dev);
+
+	if (ctrl->state != ON_STATE)
+		return 0;
+
+	pcie_capability_write_word(dev->port, PCI_EXP_SLTSTA,
+				   PCI_EXP_SLTSTA_DLLSC);
+
+	if (!pciehp_check_link_active(ctrl))
+		pciehp_request(ctrl, PCI_EXP_SLTSTA_DLLSC);
+
+	return 0;
+}
+#endif
+
 /*
  * pciehp has a 1:1 bus:slot relationship so we ultimately want a secondary
  * bus reset of the bridge, but at the same time we want to ensure that it is
diff --git a/drivers/pci/pcie/portdrv.h b/drivers/pci/pcie/portdrv.h
index 2ff5724b8f13..92a776d211ec 100644
--- a/drivers/pci/pcie/portdrv.h
+++ b/drivers/pci/pcie/portdrv.h
@@ -85,6 +85,7 @@  struct pcie_port_service_driver {
 	int (*runtime_suspend)(struct pcie_device *dev);
 	int (*runtime_resume)(struct pcie_device *dev);
 
+	int (*slot_reset)(struct pcie_device *dev);
 	/* Device driver may resume normal operations */
 	void (*error_resume)(struct pci_dev *dev);
 
@@ -110,6 +111,7 @@  void pcie_port_service_unregister(struct pcie_port_service_driver *new);
 
 extern struct bus_type pcie_port_bus_type;
 int pcie_port_device_register(struct pci_dev *dev);
+int pcie_port_device_iter(struct device *dev, void *data);
 #ifdef CONFIG_PM
 int pcie_port_device_suspend(struct device *dev);
 int pcie_port_device_resume_noirq(struct device *dev);
diff --git a/drivers/pci/pcie/portdrv_core.c b/drivers/pci/pcie/portdrv_core.c
index e1fed6649c41..ebcec7daa245 100644
--- a/drivers/pci/pcie/portdrv_core.c
+++ b/drivers/pci/pcie/portdrv_core.c
@@ -362,24 +362,24 @@  int pcie_port_device_register(struct pci_dev *dev)
 	return status;
 }
 
-#ifdef CONFIG_PM
-typedef int (*pcie_pm_callback_t)(struct pcie_device *);
+typedef int (*pcie_callback_t)(struct pcie_device *);
 
-static int pm_iter(struct device *dev, void *data)
+int pcie_port_device_iter(struct device *dev, void *data)
 {
 	struct pcie_port_service_driver *service_driver;
 	size_t offset = *(size_t *)data;
-	pcie_pm_callback_t cb;
+	pcie_callback_t cb;
 
 	if ((dev->bus == &pcie_port_bus_type) && dev->driver) {
 		service_driver = to_service_driver(dev->driver);
-		cb = *(pcie_pm_callback_t *)((void *)service_driver + offset);
+		cb = *(pcie_callback_t *)((void *)service_driver + offset);
 		if (cb)
 			return cb(to_pcie_device(dev));
 	}
 	return 0;
 }
 
+#ifdef CONFIG_PM
 /**
  * pcie_port_device_suspend - suspend port services associated with a PCIe port
  * @dev: PCI Express port to handle
@@ -387,13 +387,13 @@  static int pm_iter(struct device *dev, void *data)
 int pcie_port_device_suspend(struct device *dev)
 {
 	size_t off = offsetof(struct pcie_port_service_driver, suspend);
-	return device_for_each_child(dev, &off, pm_iter);
+	return device_for_each_child(dev, &off, pcie_port_device_iter);
 }
 
 int pcie_port_device_resume_noirq(struct device *dev)
 {
 	size_t off = offsetof(struct pcie_port_service_driver, resume_noirq);
-	return device_for_each_child(dev, &off, pm_iter);
+	return device_for_each_child(dev, &off, pcie_port_device_iter);
 }
 
 /**
@@ -403,7 +403,7 @@  int pcie_port_device_resume_noirq(struct device *dev)
 int pcie_port_device_resume(struct device *dev)
 {
 	size_t off = offsetof(struct pcie_port_service_driver, resume);
-	return device_for_each_child(dev, &off, pm_iter);
+	return device_for_each_child(dev, &off, pcie_port_device_iter);
 }
 
 /**
@@ -413,7 +413,7 @@  int pcie_port_device_resume(struct device *dev)
 int pcie_port_device_runtime_suspend(struct device *dev)
 {
 	size_t off = offsetof(struct pcie_port_service_driver, runtime_suspend);
-	return device_for_each_child(dev, &off, pm_iter);
+	return device_for_each_child(dev, &off, pcie_port_device_iter);
 }
 
 /**
@@ -423,7 +423,7 @@  int pcie_port_device_runtime_suspend(struct device *dev)
 int pcie_port_device_runtime_resume(struct device *dev)
 {
 	size_t off = offsetof(struct pcie_port_service_driver, runtime_resume);
-	return device_for_each_child(dev, &off, pm_iter);
+	return device_for_each_child(dev, &off, pcie_port_device_iter);
 }
 #endif /* PM */
 
diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
index c7ff1eea225a..1af74c3d9d5d 100644
--- a/drivers/pci/pcie/portdrv_pci.c
+++ b/drivers/pci/pcie/portdrv_pci.c
@@ -160,6 +160,9 @@  static pci_ers_result_t pcie_portdrv_error_detected(struct pci_dev *dev,
 
 static pci_ers_result_t pcie_portdrv_slot_reset(struct pci_dev *dev)
 {
+	size_t off = offsetof(struct pcie_port_service_driver, slot_reset);
+	device_for_each_child(&dev->dev, &off, pcie_port_device_iter);
+
 	pci_restore_state(dev);
 	pci_save_state(dev);
 	return PCI_ERS_RESULT_RECOVERED;