mbox series

[v12,00/15] Add RCEC handling to PCI/AER

Message ID 20201121001036.8560-1-sean.v.kelley@intel.com
Headers show
Series Add RCEC handling to PCI/AER | expand

Message

Kelley, Sean V Nov. 21, 2020, 12:10 a.m. UTC
Changes since v11 [1] and based on pci/master tree [2]:

- No functional changes. Tested with aer injection.

- Merge RCEC class code and extended capability patch with usage.
- Apply same optimization for pci_pcie_type(dev) calls in
drivers/pci/pcie/portdrv_pci.c and drivers/pci/pcie/aer.c.
(Kuppuswamy Sathyanarayanan)

[1] https://lore.kernel.org/linux-pci/20201117191954.1322844-1-sean.v.kelley@intel.com/
[2] https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git/log/


Root Complex Event Collectors (RCEC) provide support for terminating error
and PME messages from Root Complex Integrated Endpoints (RCiEPs).  An RCEC
resides on a Bus in the Root Complex. Multiple RCECs can in fact reside on
a single bus. An RCEC will explicitly declare supported RCiEPs through the
Root Complex Endpoint Association Extended Capability.

(See PCIe 5.0-1, sections 1.3.2.3 (RCiEP), and 7.9.10 (RCEC Ext. Cap.))

The kernel lacks handling for these RCECs and the error messages received
from their respective associated RCiEPs. More recently, a new CPU
interconnect, Compute eXpress Link (CXL) depends on RCEC capabilities for
purposes of error messaging from CXL 1.1 supported RCiEP devices.

DocLink: https://www.computeexpresslink.org/

This use case is not limited to CXL. Existing hardware today includes
support for RCECs, such as the Denverton microserver product
family. Future hardware will be forthcoming.

(See Intel Document, Order number: 33061-003US)

So services such as AER or PME could be associated with an RCEC driver.
In the case of CXL, if an RCiEP (i.e., CXL 1.1 device) is associated with a
platform's RCEC it shall signal PME and AER error conditions through that
RCEC.

Towards the above use cases, add the missing RCEC class and extend the
PCIe Root Port and service drivers to allow association of RCiEPs to their
respective parent RCEC and facilitate handling of terminating error and PME
messages.

Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> #non-native/no RCEC


Qiuxu Zhuo (3):
  PCI/RCEC: Bind RCEC devices to the Root Port driver
  PCI/RCEC: Add RCiEP's linked RCEC to AER/ERR
  PCI/AER: Add RCEC AER error injection support

Sean V Kelley (12):
  AER: aer_root_reset() non-native handling
  PCI/RCEC: Cache RCEC capabilities in pci_init_capabilities()
  PCI/ERR: Rename reset_link() to reset_subordinates()
  PCI/ERR: Simplify by using pci_upstream_bridge()
  PCI/ERR: Simplify by computing pci_pcie_type() once
  PCI/ERR: Use "bridge" for clarity in pcie_do_recovery()
  PCI/ERR: Avoid negated conditional for clarity
  PCI/ERR: Add pci_walk_bridge() to pcie_do_recovery()
  PCI/ERR: Limit AER resets in pcie_do_recovery()
  PCI/RCEC: Add pcie_link_rcec() to associate RCiEPs
  PCI/AER: Add pcie_walk_rcec() to RCEC AER handling
  PCI/PME: Add pcie_walk_rcec() to RCEC PME handling

 drivers/pci/pci.h               |  29 ++++-
 drivers/pci/pcie/Makefile       |   2 +-
 drivers/pci/pcie/aer.c          |  89 +++++++++++----
 drivers/pci/pcie/aer_inject.c   |   5 +-
 drivers/pci/pcie/err.c          |  93 +++++++++++-----
 drivers/pci/pcie/pme.c          |  16 ++-
 drivers/pci/pcie/portdrv_core.c |   9 +-
 drivers/pci/pcie/portdrv_pci.c  |  13 ++-
 drivers/pci/pcie/rcec.c         | 190 ++++++++++++++++++++++++++++++++
 drivers/pci/probe.c             |   2 +
 include/linux/pci.h             |   5 +
 include/linux/pci_ids.h         |   1 +
 include/uapi/linux/pci_regs.h   |   7 ++
 13 files changed, 393 insertions(+), 68 deletions(-)
 create mode 100644 drivers/pci/pcie/rcec.c

--
2.29.2

Comments

Bjorn Helgaas Nov. 21, 2020, 4:26 a.m. UTC | #1
On Fri, Nov 20, 2020 at 04:10:21PM -0800, Sean V Kelley wrote:
> Changes since v11 [1] and based on pci/master tree [2]:
> 
> - No functional changes. Tested with aer injection.
> 
> - Merge RCEC class code and extended capability patch with usage.
> - Apply same optimization for pci_pcie_type(dev) calls in
> drivers/pci/pcie/portdrv_pci.c and drivers/pci/pcie/aer.c.
> (Kuppuswamy Sathyanarayanan)
> 
> [1] https://lore.kernel.org/linux-pci/20201117191954.1322844-1-sean.v.kelley@intel.com/
> [2] https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git/log/
> 
> 
> Root Complex Event Collectors (RCEC) provide support for terminating error
> and PME messages from Root Complex Integrated Endpoints (RCiEPs).  An RCEC
> resides on a Bus in the Root Complex. Multiple RCECs can in fact reside on
> a single bus. An RCEC will explicitly declare supported RCiEPs through the
> Root Complex Endpoint Association Extended Capability.
> 
> (See PCIe 5.0-1, sections 1.3.2.3 (RCiEP), and 7.9.10 (RCEC Ext. Cap.))
> 
> The kernel lacks handling for these RCECs and the error messages received
> from their respective associated RCiEPs. More recently, a new CPU
> interconnect, Compute eXpress Link (CXL) depends on RCEC capabilities for
> purposes of error messaging from CXL 1.1 supported RCiEP devices.
> 
> DocLink: https://www.computeexpresslink.org/
> 
> This use case is not limited to CXL. Existing hardware today includes
> support for RCECs, such as the Denverton microserver product
> family. Future hardware will be forthcoming.
> 
> (See Intel Document, Order number: 33061-003US)
> 
> So services such as AER or PME could be associated with an RCEC driver.
> In the case of CXL, if an RCiEP (i.e., CXL 1.1 device) is associated with a
> platform's RCEC it shall signal PME and AER error conditions through that
> RCEC.
> 
> Towards the above use cases, add the missing RCEC class and extend the
> PCIe Root Port and service drivers to allow association of RCiEPs to their
> respective parent RCEC and facilitate handling of terminating error and PME
> messages.
> 
> Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> #non-native/no RCEC
> 
> 
> Qiuxu Zhuo (3):
>   PCI/RCEC: Bind RCEC devices to the Root Port driver
>   PCI/RCEC: Add RCiEP's linked RCEC to AER/ERR
>   PCI/AER: Add RCEC AER error injection support
> 
> Sean V Kelley (12):
>   AER: aer_root_reset() non-native handling
>   PCI/RCEC: Cache RCEC capabilities in pci_init_capabilities()
>   PCI/ERR: Rename reset_link() to reset_subordinates()
>   PCI/ERR: Simplify by using pci_upstream_bridge()
>   PCI/ERR: Simplify by computing pci_pcie_type() once
>   PCI/ERR: Use "bridge" for clarity in pcie_do_recovery()
>   PCI/ERR: Avoid negated conditional for clarity
>   PCI/ERR: Add pci_walk_bridge() to pcie_do_recovery()
>   PCI/ERR: Limit AER resets in pcie_do_recovery()
>   PCI/RCEC: Add pcie_link_rcec() to associate RCiEPs
>   PCI/AER: Add pcie_walk_rcec() to RCEC AER handling
>   PCI/PME: Add pcie_walk_rcec() to RCEC PME handling
> 
>  drivers/pci/pci.h               |  29 ++++-
>  drivers/pci/pcie/Makefile       |   2 +-
>  drivers/pci/pcie/aer.c          |  89 +++++++++++----
>  drivers/pci/pcie/aer_inject.c   |   5 +-
>  drivers/pci/pcie/err.c          |  93 +++++++++++-----
>  drivers/pci/pcie/pme.c          |  16 ++-
>  drivers/pci/pcie/portdrv_core.c |   9 +-
>  drivers/pci/pcie/portdrv_pci.c  |  13 ++-
>  drivers/pci/pcie/rcec.c         | 190 ++++++++++++++++++++++++++++++++
>  drivers/pci/probe.c             |   2 +
>  include/linux/pci.h             |   5 +
>  include/linux/pci_ids.h         |   1 +
>  include/uapi/linux/pci_regs.h   |   7 ++
>  13 files changed, 393 insertions(+), 68 deletions(-)
>  create mode 100644 drivers/pci/pcie/rcec.c

Good timing, I was just tidying up v11 :)

Anyway, I applied this to pci/err for v5.11, thanks!

Now I see a Tested-by from Jonathan above; this cover letter doesn't
become part of the git history, so probably I should add that to each
individual patch, or maybe just the relevant ones if there are some
that it wouldn't apply to.  I'll tidy that up next week.

Minor procedural things I fixed up already because I think they're a
consequence of you building on a previous branch I published: patches
you post shouldn't include Signed-off-by; you should add your own when
you write the patch or are part of the delivery path, but you
shouldn't add *mine*.  I add that when I apply them.

I also removed the first Link: tags since they also look like they're
from an older version.  You don't need to add those; I add those
automatically so they point to the mailing list message where the
patch was posted.