[0/4] Address error and recovery for AER and DPC

Message ID 1514370022-4431-1-git-send-email-poza@codeaurora.org

Message

Oza Pawandeep Dec. 27, 2017, 10:20 a.m. UTC
This patch set brings in support for DPC and AER to co-exist and not to
race for recovery.

The current implementation of AER and of error message broadcasting to the
EP driver is tightly coupled and limited to the AER service driver.
It is important to factor out broadcasting and the other link handling
callbacks, so that they are handled appropriately not only when AER gets
triggered, but also when DPC gets triggered, or when both get triggered
simultaneously (e.g. on ERR_FATAL).
With the code modularized, the race between AER and DPC is handled
gracefully: e.g. when DPC is active and has kicked in, AER should not
attempt recovery, because DPC takes care of it.
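
To illustrate the intended arbitration, here is a minimal userspace sketch,
not the code from the patches. All names in it (struct port_model,
port_recover(), enum err_source) are hypothetical and only model the idea
that a shared recovery path lets AER back off when DPC already owns the
recovery for a port.

```c
#include <stdbool.h>

/* Hypothetical, simplified model; none of these names come from the patches. */
enum err_source { ERR_SRC_AER, ERR_SRC_DPC };

struct port_model {
	bool dpc_active;     /* assumption: DPC service is bound and has triggered */
	int  recoveries_run; /* how many times recovery actually ran */
};

/*
 * Shared recovery entry point, callable from either service.
 * Returns true if the caller performed recovery, false if it backed off.
 */
static bool port_recover(struct port_model *port, enum err_source src)
{
	/* When DPC is active, AER leaves recovery to DPC. */
	if (src == ERR_SRC_AER && port->dpc_active)
		return false;

	port->recoveries_run++;
	return true;
}
```

With dpc_active set, a call with ERR_SRC_AER returns false and only the
ERR_SRC_DPC call bumps recoveries_run, so a simultaneous ERR_FATAL results
in a single recovery in this model.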

DPC should enumerate the devices after recovering the link, which is
achieved by implementing error_resume callback.
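
As a rough illustration of that flow, the following userspace sketch models
a port whose children disappear on a DPC trigger and are enumerated again
from a resume callback once the link is back. Apart from the callback name
error_resume, everything here (struct port_model, struct service_ops,
dpc_trigger(), dpc_recover_link()) is a hypothetical stand-in and not the
kernel implementation.

```c
#include <stddef.h>

/* Hypothetical userspace model; names are illustrative, not from the patches. */
struct port_model;

struct service_ops {
	/* Called once the link has been recovered. */
	void (*error_resume)(struct port_model *port);
};

struct port_model {
	int link_up;
	int devices_present;    /* devices physically behind the port */
	int devices_enumerated; /* devices currently known to the bus */
	const struct service_ops *ops;
};

/* Resume callback: re-enumerate whatever sits behind the port. */
static void dpc_error_resume(struct port_model *port)
{
	port->devices_enumerated = port->devices_present;
}

static const struct service_ops dpc_ops = {
	.error_resume = dpc_error_resume,
};

/* DPC trigger: the link goes down and the children are removed. */
static void dpc_trigger(struct port_model *port)
{
	port->link_up = 0;
	port->devices_enumerated = 0;
}

/* Recovery: bring the link back, then let the service resume. */
static void dpc_recover_link(struct port_model *port)
{
	port->link_up = 1;
	if (port->ops && port->ops->error_resume)
		port->ops->error_resume(port);
}
```

In this model, calling dpc_trigger() followed by dpc_recover_link() on a
port that uses dpc_ops leaves devices_enumerated equal to devices_present
again.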

Oza Pawandeep (4):
  PCI/AER: factor out error reporting from AER
  PCI/DPC/AER: Address Concurrency between AER and DPC
  PCI/ERR: Do not do recovery if DPC service is active
  PCI/DPC: Enumerate the devices after DPC trigger event

 drivers/acpi/apei/ghes.c               |   2 +-
 drivers/pci/pcie/Makefile              |   2 +-
 drivers/pci/pcie/aer/aerdrv.h          |  30 ---
 drivers/pci/pcie/aer/aerdrv_core.c     | 306 +------------------------
 drivers/pci/pcie/aer/aerdrv_errprint.c |  27 ++-
 drivers/pci/pcie/pcie-dpc.c            | 127 ++++++++++-
 drivers/pci/pcie/pcie-err.c            | 392 +++++++++++++++++++++++++++++++++
 drivers/pci/pcie/portdrv.h             |   2 +
 include/linux/aer.h                    |   4 -
 include/linux/pci.h                    |  23 ++
 10 files changed, 569 insertions(+), 346 deletions(-)
 create mode 100644 drivers/pci/pcie/pcie-err.c

Comments

Keith Busch Dec. 28, 2017, 5:34 p.m. UTC | #1
On Wed, Dec 27, 2017 at 02:20:18AM -0800, Oza Pawandeep wrote:
> DPC should enumerate the devices after recovering the link, which is
> achieved by implementing error_resume callback.

Wouldn't that race with the link-up event that pciehp currently handles?

Oza Pawandeep Dec. 29, 2017, 5:15 a.m. UTC | #2
On 2017-12-28 23:04, Keith Busch wrote:
> On Wed, Dec 27, 2017 at 02:20:18AM -0800, Oza Pawandeep wrote:
>> DPC should enumerate the devices after recovering the link, which is
>> achieved by implementing error_resume callback.
> 
> Wouldn't that race with the link-up event that pciehp currently 
> handles?

The enumeration is done under pci_lock_rescan_remove(), so it should not
race with the pciehp link-up handling.
I was able to test with and without pciehp on our platform, and things 
seemed to be okay.

Regards,
Oza.