mbox series

[v5,00/11] PCI: Improve PCIe Capability RMW concurrency control

Message ID 20230717120503.15276-1-ilpo.jarvinen@linux.intel.com
Headers show
Series PCI: Improve PCIe Capability RMW concurrency control | expand

Message

Ilpo Järvinen July 17, 2023, 12:04 p.m. UTC
PCI Express Capability RMW accessors don't properly protect against
concurrent access. Link Control Register is written by a number of
things in the kernel in a RMW fashion without any concurrency control.
This could in the unlucky case lead to losing one of the updates. One
of the most obvious path which can race with most of the other LNKCTL
RMW operations seems to be ASPM policy sysfs write which triggers
LNKCTL update. Similarly, Root Control Register can be concurrently
accessed by AER and PME.

Make pcie_capability_clear_and_set_word() (and other RMW accessors that
call it) to use a per device spinlock to protect the RMW operations to
the Capability Registers that require locking. Convert open-coded
LNKCTL RMW operations to use pcie_capability_clear_and_set_word() to
benefit from the locking.

There's also a related series which improves ASPM service driver and
device driver coordination by removing out-of-band ASPM state
management from device drivers (which will remove some of the code
fragments changed by this series but it has higher regression
potential which is why it seems prudent to do these changes in two
steps):
  https://lore.kernel.org/linux-pci/20230602114751.19671-1-ilpo.jarvinen@linux.intel.com/T/#t

v5:
- Remove reversed logic from a conditional
- Use a variable for CCC setup

v4:
- Rebased on top of pci/main
- Added patch to update documentation

v3:
- Split link retraining change off from ASPM patch & reorder it earlier
- Adjust changelog to take into account the move of link retraining
  code into PCI core and no longer refer to ASPM (currently in
  pci/enumeration branch)
- based on top of pci/main

v2:
- Keep the RMW ops caller API the same
- Make pcie_capability_clear_and_set_word() a wrapper that uses
  locked/unlocked variant based on the capability reg
- Extracted LNKCTL2 changes out from this series to keep this purely
  a series which fixes something (LNKCTL2 RMW lock is necessary only
  when PCIe BW control is introduced).
- Added Fixes tags (it's a bit rathole but yeah, they're there now).
- Renamed cap_lock to pcie_cap_lock
- Changed ath1* to clear the ASPMC field before setting it

Ilpo Järvinen (11):
  PCI: Add locking to RMW PCI Express Capability Register accessors
  PCI: Make link retraining use RMW accessors for changing LNKCTL
  PCI: pciehp: Use RMW accessors for changing LNKCTL
  PCI/ASPM: Use RMW accessors for changing LNKCTL
  drm/amdgpu: Use RMW accessors for changing LNKCTL
  drm/radeon: Use RMW accessors for changing LNKCTL
  net/mlx5: Use RMW accessors for changing LNKCTL
  wifi: ath11k: Use RMW accessors for changing LNKCTL
  wifi: ath12k: Use RMW accessors for changing LNKCTL
  wifi: ath10k: Use RMW accessors for changing LNKCTL
  PCI: Document the Capability accessor RMW improvements

 Documentation/PCI/pciebus-howto.rst           | 14 ++++---
 drivers/gpu/drm/amd/amdgpu/cik.c              | 36 +++++-------------
 drivers/gpu/drm/amd/amdgpu/si.c               | 36 +++++-------------
 drivers/gpu/drm/radeon/cik.c                  | 36 +++++-------------
 drivers/gpu/drm/radeon/si.c                   | 37 +++++--------------
 .../ethernet/mellanox/mlx5/core/fw_reset.c    |  9 +----
 drivers/net/wireless/ath/ath10k/pci.c         |  9 +++--
 drivers/net/wireless/ath/ath11k/pci.c         | 10 +++--
 drivers/net/wireless/ath/ath12k/pci.c         | 10 +++--
 drivers/pci/access.c                          | 20 ++++++++--
 drivers/pci/hotplug/pciehp_hpc.c              | 12 ++----
 drivers/pci/pci.c                             |  8 +---
 drivers/pci/pcie/aspm.c                       | 30 +++++++--------
 drivers/pci/probe.c                           |  1 +
 include/linux/pci.h                           | 34 ++++++++++++++++-
 15 files changed, 136 insertions(+), 166 deletions(-)

Comments

Bjorn Helgaas Aug. 10, 2023, 4:17 p.m. UTC | #1
On Mon, Jul 17, 2023 at 03:04:52PM +0300, Ilpo Järvinen wrote:
> PCI Express Capability RMW accessors don't properly protect against
> concurrent access. Link Control Register is written by a number of
> things in the kernel in a RMW fashion without any concurrency control.
> This could in the unlucky case lead to losing one of the updates. One
> of the most obvious path which can race with most of the other LNKCTL
> RMW operations seems to be ASPM policy sysfs write which triggers
> LNKCTL update. Similarly, Root Control Register can be concurrently
> accessed by AER and PME.
> 
> Make pcie_capability_clear_and_set_word() (and other RMW accessors that
> call it) to use a per device spinlock to protect the RMW operations to
> the Capability Registers that require locking. Convert open-coded
> LNKCTL RMW operations to use pcie_capability_clear_and_set_word() to
> benefit from the locking.
> 
> There's also a related series which improves ASPM service driver and
> device driver coordination by removing out-of-band ASPM state
> management from device drivers (which will remove some of the code
> fragments changed by this series but it has higher regression
> potential which is why it seems prudent to do these changes in two
> steps):
>   https://lore.kernel.org/linux-pci/20230602114751.19671-1-ilpo.jarvinen@linux.intel.com/T/#t
> 
> v5:
> - Remove reversed logic from a conditional
> - Use a variable for CCC setup
> 
> v4:
> - Rebased on top of pci/main
> - Added patch to update documentation
> 
> v3:
> - Split link retraining change off from ASPM patch & reorder it earlier
> - Adjust changelog to take into account the move of link retraining
>   code into PCI core and no longer refer to ASPM (currently in
>   pci/enumeration branch)
> - based on top of pci/main
> 
> v2:
> - Keep the RMW ops caller API the same
> - Make pcie_capability_clear_and_set_word() a wrapper that uses
>   locked/unlocked variant based on the capability reg
> - Extracted LNKCTL2 changes out from this series to keep this purely
>   a series which fixes something (LNKCTL2 RMW lock is necessary only
>   when PCIe BW control is introduced).
> - Added Fixes tags (it's a bit rathole but yeah, they're there now).
> - Renamed cap_lock to pcie_cap_lock
> - Changed ath1* to clear the ASPMC field before setting it
> 
> Ilpo Järvinen (11):
>   PCI: Add locking to RMW PCI Express Capability Register accessors
>   PCI: Make link retraining use RMW accessors for changing LNKCTL
>   PCI: pciehp: Use RMW accessors for changing LNKCTL
>   PCI/ASPM: Use RMW accessors for changing LNKCTL
>   drm/amdgpu: Use RMW accessors for changing LNKCTL
>   drm/radeon: Use RMW accessors for changing LNKCTL
>   net/mlx5: Use RMW accessors for changing LNKCTL
>   wifi: ath11k: Use RMW accessors for changing LNKCTL
>   wifi: ath12k: Use RMW accessors for changing LNKCTL
>   wifi: ath10k: Use RMW accessors for changing LNKCTL
>   PCI: Document the Capability accessor RMW improvements
> 
>  Documentation/PCI/pciebus-howto.rst           | 14 ++++---
>  drivers/gpu/drm/amd/amdgpu/cik.c              | 36 +++++-------------
>  drivers/gpu/drm/amd/amdgpu/si.c               | 36 +++++-------------
>  drivers/gpu/drm/radeon/cik.c                  | 36 +++++-------------
>  drivers/gpu/drm/radeon/si.c                   | 37 +++++--------------
>  .../ethernet/mellanox/mlx5/core/fw_reset.c    |  9 +----
>  drivers/net/wireless/ath/ath10k/pci.c         |  9 +++--
>  drivers/net/wireless/ath/ath11k/pci.c         | 10 +++--
>  drivers/net/wireless/ath/ath12k/pci.c         | 10 +++--
>  drivers/pci/access.c                          | 20 ++++++++--
>  drivers/pci/hotplug/pciehp_hpc.c              | 12 ++----
>  drivers/pci/pci.c                             |  8 +---
>  drivers/pci/pcie/aspm.c                       | 30 +++++++--------
>  drivers/pci/probe.c                           |  1 +
>  include/linux/pci.h                           | 34 ++++++++++++++++-
>  15 files changed, 136 insertions(+), 166 deletions(-)

Applied to pci/pcie-rmw for v6.6, thanks!

I removed the stable tags because we don't know of any actual problems
these fix.

Bjorn