Message ID | 20240108194642.30460-1-mattc@purestorage.com |
---|---|
State | New |
Headers | show |
Series | [1/1] PCI/portdrv: Allow DPC if the OS controls AER natively. | expand |
On 1/8/2024 11:46 AM, Matthew W Carlis wrote: > Hello again, sorry for the delayed response.. I have been on PTO. The above patch > doesn't fix the problem in our systems as host->native_dpc is not set due to > not using or having support for Error Disconnect Recovery (EDR). I wonder if The only condition we check before requesting DPC control is whether OS supports both EDR and DPC. As long as you have enabled related configs (IS_ENABLED(CONFIG_PCIE_DPC) && IS_ENABLED(CONFIG_PCIE_EDR)), this will be true. Enabling these configs does not mean you are using it. Any reason why you don't enable it in your case? > host->native_dpc is a misleading name a way... Misleading in the sense that > setting host->native_aer implies firmware intends the OS to control AER, whereas > host->native_dpc being set appears to have an additional requirement on the > use/support of EDR in addition to DPC. When I was working on the patch as > submitted I had been thinking about all of these fields & my thinking was > as follows.. The kernel requires host->native_aer in order to control AER, but > it could control DPC whether host->native_dpc is set or unset. Therefore, if > the kernel will control AER it should also control DPC on any capable devices. > Of course there is also the requirement of having built with CONFIG_PCIE_AER > & CONFIG_PCIE_DPC. Please advise if my understanding of all this is incorrect.. > > Thanks, > -Matt > > I included an update to the patch submitted in chain which should remove the > build error that occured when CONFIG_PCIE_AER was not set. Including it > in case my understanding of EDR/DPC/etc is correct. > > diff --git a/drivers/pci/pcie/portdrv.c b/drivers/pci/pcie/portdrv.c > index 14a4b89a3b83..2fc006f12988 100644 > --- a/drivers/pci/pcie/portdrv.c > +++ b/drivers/pci/pcie/portdrv.c > @@ -257,12 +257,18 @@ static int get_port_device_capability(struct pci_dev *dev) > } > > /* > + * _OSC AER Control is required by the OS & requires OS to control AER, > + * but _OSC DPC Control isn't required by the OS to control DPC; however > + * it does require the OS to control DPC. _OSC DPC Control also requres > + * _OSC EDR Control (Error Disconnect Recovery) (PCI Firmware - DPC ECN rev3.2) > + * PCI_Express_Base 6.1, 6.2.11 Determination of DPC Control recommends > + * platform fw or OS always link control of DPC to AER. > + * > * With dpc-native, allow Linux to use DPC even if it doesn't have > * permission to use AER. > */ > if (pci_find_ext_capability(dev, PCI_EXT_CAP_ID_DPC) && > - pci_aer_available() && > - (pcie_ports_dpc_native || (services & PCIE_PORT_SERVICE_AER))) > + pci_aer_available() && (pcie_ports_dpc_native || host->native_aer)) > services |= PCIE_PORT_SERVICE_DPC; > > if (pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM || >
A small part is probably historical; we've been using DPC on PCIe switches since before there was any EDR support in the kernel. It looks like there was a PCIe DPC ECN as early as Feb 2012, but this EDR/DPC fw ECN didn't come in till Jan 2019 & kernel support for ECN was even later. Its not immediately clear I would want to use EDR in my newer architecures & then there are also the older architecures still requiring support. When I submitted this patch I came at it with the approach of trying to keep the old behavior & still support the newer EDR behavior. Bjorns patch from Dec 28 2023 would seem to change the behavior for both root ports & switch ports, requiring them to set _OSC Control Field bit 7 (DPC) and _OSC Support Field bit 7 (EDR) or a kernel command line value. I think no matter what, we want to ensure that PCIe Root Ports and PCIe switches arrive at the same policy here. Should we consider CONFIG_PCIEAER or CONFIG_PCIEDPC as any amount of directive for the OS to use AER/DPC? In addition we have kernel command line arguments for pcieports=(compat/native/dpc-native) and pci=noaer. There are perhaps some others I'm not aware of. Then, there are the PCIe capabilities of the devies & bios settings for AER FW/OS controls, etc. I'm not sure if it strikes me as the right thing to now require users to specify additional fields to use DPC when they had been using it happily before. Perhaps the condition should be: > if (pci_find_ext_capability(dev, PCI_EXT_CAP_ID_DPC) && > - pci_aer_available() && > - (pcie_ports_dpc_native || (services & PCIE_PORT_SERVICE_AER))) > + pci_aer_available() && (pcie_ports_dpc_native || > + (host->native_aer && !IS_ENABLED(CONFIG_PCIE_EDR)))) i.e: Use DPC if we set the command line argument or use DPC if we are are using EDR's _OSC DPC field, or use DPC if we have AER & there isn't EDR support?
On 1/8/2024 4:15 PM, Matthew W Carlis wrote: > A small part is probably historical; we've been using DPC on PCIe switches > since before there was any EDR support in the kernel. It looks like there > was a PCIe DPC ECN as early as Feb 2012, but this EDR/DPC fw ECN didn't come in > till Jan 2019 & kernel support for ECN was even later. Its not immediately > clear I would want to use EDR in my newer architecures & then there are also > the older architecures still requiring support. When I submitted this patch I > came at it with the approach of trying to keep the old behavior & still support > the newer EDR behavior. Bjorns patch from Dec 28 2023 would seem to change Just advertising the support for EDR in OS should not change any functional behavior. EDR will be used only if your firmware take DPC control and sends EDR notification. Since your kernel has EDR source support, why not enable the relevant config? or did I not understand the issue correctly? > the behavior for both root ports & switch ports, requiring them to set > _OSC Control Field bit 7 (DPC) and _OSC Support Field bit 7 (EDR) or a kernel > command line value. I think no matter what, we want to ensure that PCIe Root > Ports and PCIe switches arrive at the same policy here. > > Should we consider CONFIG_PCIEAER or CONFIG_PCIEDPC as any amount of directive > for the OS to use AER/DPC? In addition we have kernel command line arguments No, I don't think we should use CONFIG options in actual support check. > for pcieports=(compat/native/dpc-native) and pci=noaer. There are perhaps some > others I'm not aware of. Then, there are the PCIe capabilities of the devies > & bios settings for AER FW/OS controls, etc. I'm not sure if it strikes me as the > right thing to now require users to specify additional fields to use DPC when > they had been using it happily before. > > Perhaps the condition should be: >> if (pci_find_ext_capability(dev, PCI_EXT_CAP_ID_DPC) && >> - pci_aer_available() && >> - (pcie_ports_dpc_native || (services & PCIE_PORT_SERVICE_AER))) >> + pci_aer_available() && (pcie_ports_dpc_native || >> + (host->native_aer && !IS_ENABLED(CONFIG_PCIE_EDR)))) > > i.e: Use DPC if we set the command line argument or use DPC if we are are using > EDR's _OSC DPC field, or use DPC if we have AER & there isn't EDR support? >
On 1/10/2024 8:41 AM, Kuppuswamy Sathyanarayanan wrote: > > > On 1/8/2024 4:15 PM, Matthew W Carlis wrote: >> A small part is probably historical; we've been using DPC on PCIe switches >> since before there was any EDR support in the kernel. It looks like there >> was a PCIe DPC ECN as early as Feb 2012, but this EDR/DPC fw ECN didn't come in >> till Jan 2019 & kernel support for ECN was even later. Its not immediately >> clear I would want to use EDR in my newer architecures & then there are also >> the older architecures still requiring support. When I submitted this patch I >> came at it with the approach of trying to keep the old behavior & still support >> the newer EDR behavior. Bjorns patch from Dec 28 2023 would seem to change > > Just advertising the support for EDR in OS should not change any functional > behavior. EDR will be used only if your firmware take DPC control and sends > EDR notification. Since your kernel has EDR source support, why not enable > the relevant config? or did I not understand the issue correctly? > Hi Bjorn, Since requesting DPC control needs both EDR and DPC support in kernel, I am wondering whether we need two different configs for it. IMO, it makes sense to merge them under one config. I don't see EDR working without DPC support. Since DPC control also need support for EDR, I think we don't need to differentiate them. What you think? file: drivers/acpi/pci_root.c + /* + * Per the Downstream Port Containment Related Enhancements ECN to + * the PCI Firmware Spec, r3.2, sec 4.5.1, table 4-5, + * OSC_PCI_EXPRESS_DPC_CONTROL indicates the OS supports both DPC + * and EDR. + */ + if (IS_ENABLED(CONFIG_PCIE_DPC) && IS_ENABLED(CONFIG_PCIE_EDR)) + control |= OSC_PCI_EXPRESS_DPC_CONTROL; >> the behavior for both root ports & switch ports, requiring them to set >> _OSC Control Field bit 7 (DPC) and _OSC Support Field bit 7 (EDR) or a kernel >> command line value. I think no matter what, we want to ensure that PCIe Root >> Ports and PCIe switches arrive at the same policy here. >> >> Should we consider CONFIG_PCIEAER or CONFIG_PCIEDPC as any amount of directive >> for the OS to use AER/DPC? In addition we have kernel command line arguments > > No, I don't think we should use CONFIG options in actual support check. > >> for pcieports=(compat/native/dpc-native) and pci=noaer. There are perhaps some >> others I'm not aware of. Then, there are the PCIe capabilities of the devies >> & bios settings for AER FW/OS controls, etc. I'm not sure if it strikes me as the >> right thing to now require users to specify additional fields to use DPC when >> they had been using it happily before. >> >> Perhaps the condition should be: >>> if (pci_find_ext_capability(dev, PCI_EXT_CAP_ID_DPC) && >>> - pci_aer_available() && >>> - (pcie_ports_dpc_native || (services & PCIE_PORT_SERVICE_AER))) >>> + pci_aer_available() && (pcie_ports_dpc_native || >>> + (host->native_aer && !IS_ENABLED(CONFIG_PCIE_EDR)))) >> >> i.e: Use DPC if we set the command line argument or use DPC if we are are using >> EDR's _OSC DPC field, or use DPC if we have AER & there isn't EDR support? >> >
On Mon, Jan 08, 2024 at 05:15:08PM -0700, Matthew W Carlis wrote: > A small part is probably historical; we've been using DPC on PCIe > switches since before there was any EDR support in the kernel. It > looks like there was a PCIe DPC ECN as early as Feb 2012, but this > EDR/DPC fw ECN didn't come in till Jan 2019 & kernel support for ECN > was even later. Its not immediately clear I would want to use EDR in > my newer architecures & then there are also the older architecures > still requiring support. When I submitted this patch I came at it > with the approach of trying to keep the old behavior & still support > the newer EDR behavior. Bjorns patch from Dec 28 2023 would seem to > change the behavior for both root ports & switch ports, requiring > them to set _OSC Control Field bit 7 (DPC) and _OSC Support Field > bit 7 (EDR) or a kernel command line value. I think no matter what, > we want to ensure that PCIe Root Ports and PCIe switches arrive at > the same policy here. Is there an approved DPC ECN to the PCI Firmware spec that adds DPC control negotiation, but does *not* add the EDR requirement? I'm looking at https://members.pcisig.com/wg/PCI-SIG/document/previewpdf/12888, which seems to be the final "Downstream Port Containment Related Enhancements" ECN, which is dated 1/28/2019 and applies to the PCI Firmware spec r3.2. It adds bit 7, "PCI Express Downstream Port Containment Configuration control", to the passed-in _OSC Control field, which indicates that the OS supports both "native OS control and firmware ownership models (i.e. Error Disconnect Recover notification) of Downstream Port Containment." It also adds the dependency that "If the OS sets bit 7 of the Control field, it must set bit 7 of the Support field, indicating support for the Error Disconnect Recover event." So I'm trying to figure out if the "support DPC but not EDR" situation was ever a valid place to be. Maybe it's a mistake to have separate CONFIG_PCIE_DPC and CONFIG_PCIE_EDR options. CONFIG_PCIE_EDR depends on CONFIG_ACPI, so the situation is a little bit murky on non-ACPI systems that support DPC. Bjorn
On 1/22/24 11:32 AM, Bjorn Helgaas wrote: > On Mon, Jan 08, 2024 at 05:15:08PM -0700, Matthew W Carlis wrote: >> A small part is probably historical; we've been using DPC on PCIe >> switches since before there was any EDR support in the kernel. It >> looks like there was a PCIe DPC ECN as early as Feb 2012, but this >> EDR/DPC fw ECN didn't come in till Jan 2019 & kernel support for ECN >> was even later. Its not immediately clear I would want to use EDR in >> my newer architecures & then there are also the older architecures >> still requiring support. When I submitted this patch I came at it >> with the approach of trying to keep the old behavior & still support >> the newer EDR behavior. Bjorns patch from Dec 28 2023 would seem to >> change the behavior for both root ports & switch ports, requiring >> them to set _OSC Control Field bit 7 (DPC) and _OSC Support Field >> bit 7 (EDR) or a kernel command line value. I think no matter what, >> we want to ensure that PCIe Root Ports and PCIe switches arrive at >> the same policy here. > Is there an approved DPC ECN to the PCI Firmware spec that adds DPC > control negotiation, but does *not* add the EDR requirement? > > I'm looking at > https://members.pcisig.com/wg/PCI-SIG/document/previewpdf/12888, which > seems to be the final "Downstream Port Containment Related > Enhancements" ECN, which is dated 1/28/2019 and applies to the PCI > Firmware spec r3.2. > > It adds bit 7, "PCI Express Downstream Port Containment Configuration > control", to the passed-in _OSC Control field, which indicates that > the OS supports both "native OS control and firmware ownership models > (i.e. Error Disconnect Recover notification) of Downstream Port > Containment." > > It also adds the dependency that "If the OS sets bit 7 of the Control > field, it must set bit 7 of the Support field, indicating support for > the Error Disconnect Recover event." > > So I'm trying to figure out if the "support DPC but not EDR" situation > was ever a valid place to be. Maybe it's a mistake to have separate > CONFIG_PCIE_DPC and CONFIG_PCIE_EDR options. My understanding is also similar. I have raised the same point in https://lore.kernel.org/all/3c02a6d6-917e-486c-ad41-bdf176639ff2@linux.intel.com/ IMO, we don't need a separate config for EDR. I don't think user can gain anything with disabling EDR and enabling DPC. As long as firmware does not user EDR support, just compiling the code should be harmless. So we can either remove it, or select it by default if user selects DPC config. > > CONFIG_PCIE_EDR depends on CONFIG_ACPI, so the situation is a little > bit murky on non-ACPI systems that support DPC. If we are going to remove the EDR config, it might need #ifdef CONFIG_ACPI changes in edr.c to not compile ACPI specific code. Alternative choice is to compile edr.c with CONFIG_ACPI. > > Bjorn >
On Mon, Jan 22, 2024 at 06:37:48PM -0800, Kuppuswamy Sathyanarayanan wrote: > > On 1/22/24 11:32 AM, Bjorn Helgaas wrote: > > On Mon, Jan 08, 2024 at 05:15:08PM -0700, Matthew W Carlis wrote: > >> A small part is probably historical; we've been using DPC on PCIe > >> switches since before there was any EDR support in the kernel. It > >> looks like there was a PCIe DPC ECN as early as Feb 2012, but this > >> EDR/DPC fw ECN didn't come in till Jan 2019 & kernel support for ECN > >> was even later. Its not immediately clear I would want to use EDR in > >> my newer architecures & then there are also the older architecures > >> still requiring support. When I submitted this patch I came at it > >> with the approach of trying to keep the old behavior & still support > >> the newer EDR behavior. Bjorns patch from Dec 28 2023 would seem to > >> change the behavior for both root ports & switch ports, requiring > >> them to set _OSC Control Field bit 7 (DPC) and _OSC Support Field > >> bit 7 (EDR) or a kernel command line value. I think no matter what, > >> we want to ensure that PCIe Root Ports and PCIe switches arrive at > >> the same policy here. > > Is there an approved DPC ECN to the PCI Firmware spec that adds DPC > > control negotiation, but does *not* add the EDR requirement? > > > > I'm looking at > > https://members.pcisig.com/wg/PCI-SIG/document/previewpdf/12888, which > > seems to be the final "Downstream Port Containment Related > > Enhancements" ECN, which is dated 1/28/2019 and applies to the PCI > > Firmware spec r3.2. > > > > It adds bit 7, "PCI Express Downstream Port Containment Configuration > > control", to the passed-in _OSC Control field, which indicates that > > the OS supports both "native OS control and firmware ownership models > > (i.e. Error Disconnect Recover notification) of Downstream Port > > Containment." > > > > It also adds the dependency that "If the OS sets bit 7 of the Control > > field, it must set bit 7 of the Support field, indicating support for > > the Error Disconnect Recover event." > > > > So I'm trying to figure out if the "support DPC but not EDR" situation > > was ever a valid place to be. Maybe it's a mistake to have separate > > CONFIG_PCIE_DPC and CONFIG_PCIE_EDR options. > > My understanding is also similar. I have raised the same point in > https://lore.kernel.org/all/3c02a6d6-917e-486c-ad41-bdf176639ff2@linux.intel.com/ Ah, sorry, I missed that. > IMO, we don't need a separate config for EDR. I don't think user can > gain anything with disabling EDR and enabling DPC. As long as > firmware does not user EDR support, just compiling the code should > be harmless. > > So we can either remove it, or select it by default if user selects > DPC config. > > > CONFIG_PCIE_EDR depends on CONFIG_ACPI, so the situation is a little > > bit murky on non-ACPI systems that support DPC. > > If we are going to remove the EDR config, it might need #ifdef > CONFIG_ACPI changes in edr.c to not compile ACPI specific code. > Alternative choice is to compile edr.c with CONFIG_ACPI. Right. I think we should probably remove CONFIG_PCIE_EDR completely and make everything controlled by CONFIG_PCIE_DPC. edr.c only provides two interfaces: pci_acpi_add_edr_notifier() and pci_acpi_remove_edr_notifier(), which are only used by pci-acpi.c, which is only compiled if CONFIG_ACPI, so we could probably also compile edr.c only if CONFIG_ACPI. And the declarations in include/linux/pci-acpi.h could probably be moved to drivers/pci/pci.h since they're never used outside drivers/pci/. Bjorn
Hello again! I'm glad that I'm not the only person with a little confusion about the FW ECN regarding DPC/EDR. I would argue that DPC wasn't tied to EDR & shouldn't have been because DPC was added in PCI Base Spec Rev 3.1 in 2014, but there wasn't an EDR ECN till ~2020. Anyway, that's the way it goes.. I don't want to burden the kernel with making some impossible boot time decision here. Perhaps most of the machines in the world using DPC will soon use EDR/SFI etc. My use cases are a bit out of the ordinary & the ACPI specifications don't seem to have given us a mechanism for the kernel to conclude it can use DPC without EDR support... Shall I submit a patch removing CONFIG_PCIE_EDR? Perhaps the exercise would inform me about whether its code should be in CONFIG_PCIE_DPC or CONFIG_ACPI. Thanks, -Matt
On Tue, Jan 23, 2024 at 04:18:34PM -0700, Matthew W Carlis wrote: > Hello again! I'm glad that I'm not the only person with a little > confusion about the FW ECN regarding DPC/EDR. I would argue that DPC > wasn't tied to EDR & shouldn't have been because DPC was added in > PCI Base Spec Rev 3.1 in 2014, but there wasn't an EDR ECN till > ~2020. Anyway, that's the way it goes.. It does involve several different specs (PCIe Base for the DPC hardware feature, ACPI for the OS/firmware EDR interface, PCI Firmware for the EDR support and DPC ownership negotiation), so maybe that helps explain the muddle. > I don't want to burden the kernel with making some impossible boot > time decision here. Perhaps most of the machines in the world using > DPC will soon use EDR/SFI etc. My use cases are a bit out of the > ordinary & the ACPI specifications don't seem to have given us a > mechanism for the kernel to conclude it can use DPC without EDR > support... I don't know anything about SFI and I don't see any required connection between DPC/EDR/SFI in the PCI Firmware spec. The PCI Firmware spec requires ACPI OSes to support EDR if they want to use DPC. But I don't know that the firmware is required to actually implement the EDR functionality. Non-ACPI OSes are presumed to own DPC and all other PCIe hardware features, and I don't think EDR would be in the picture since it's an ACPI thing. > Shall I submit a patch removing CONFIG_PCIE_EDR? Perhaps the > exercise would inform me about whether its code should be in > CONFIG_PCIE_DPC or CONFIG_ACPI. That would be great! I would say CONFIG_PCIE_EDR should go away and edr.c should only be compiled if CONFIG_PCIE_DPC=y and CONFIG_ACPI=y. Bjorn
On Tue, Jan 23, 2024 at 09:59:21AM -0600, Bjorn Helgaas wrote: > On Mon, Jan 22, 2024 at 06:37:48PM -0800, Kuppuswamy Sathyanarayanan wrote: > > On 1/22/24 11:32 AM, Bjorn Helgaas wrote: > > > On Mon, Jan 08, 2024 at 05:15:08PM -0700, Matthew W Carlis wrote: > > >> A small part is probably historical; we've been using DPC on PCIe > > >> switches since before there was any EDR support in the kernel. It > > >> looks like there was a PCIe DPC ECN as early as Feb 2012, but this > > >> EDR/DPC fw ECN didn't come in till Jan 2019 & kernel support for ECN > > >> was even later. Its not immediately clear I would want to use EDR in > > >> my newer architecures & then there are also the older architecures > > >> still requiring support. When I submitted this patch I came at it > > >> with the approach of trying to keep the old behavior & still support > > >> the newer EDR behavior. Bjorns patch from Dec 28 2023 would seem to > > >> change the behavior for both root ports & switch ports, requiring > > >> them to set _OSC Control Field bit 7 (DPC) and _OSC Support Field > > >> bit 7 (EDR) or a kernel command line value. I think no matter what, > > >> we want to ensure that PCIe Root Ports and PCIe switches arrive at > > >> the same policy here. > > > Is there an approved DPC ECN to the PCI Firmware spec that adds DPC > > > control negotiation, but does *not* add the EDR requirement? > > > > > > I'm looking at > > > https://members.pcisig.com/wg/PCI-SIG/document/previewpdf/12888, which > > > seems to be the final "Downstream Port Containment Related > > > Enhancements" ECN, which is dated 1/28/2019 and applies to the PCI > > > Firmware spec r3.2. > > > > > > It adds bit 7, "PCI Express Downstream Port Containment Configuration > > > control", to the passed-in _OSC Control field, which indicates that > > > the OS supports both "native OS control and firmware ownership models > > > (i.e. Error Disconnect Recover notification) of Downstream Port > > > Containment." > > > > > > It also adds the dependency that "If the OS sets bit 7 of the Control > > > field, it must set bit 7 of the Support field, indicating support for > > > the Error Disconnect Recover event." > > > > > > So I'm trying to figure out if the "support DPC but not EDR" situation > > > was ever a valid place to be. Maybe it's a mistake to have separate > > > CONFIG_PCIE_DPC and CONFIG_PCIE_EDR options. > > > > My understanding is also similar. I have raised the same point in > > https://lore.kernel.org/all/3c02a6d6-917e-486c-ad41-bdf176639ff2@linux.intel.com/ > > Ah, sorry, I missed that. > > > IMO, we don't need a separate config for EDR. I don't think user can > > gain anything with disabling EDR and enabling DPC. As long as > > firmware does not user EDR support, just compiling the code should > > be harmless. > > > > So we can either remove it, or select it by default if user selects > > DPC config. > > > > > CONFIG_PCIE_EDR depends on CONFIG_ACPI, so the situation is a little > > > bit murky on non-ACPI systems that support DPC. > > > > If we are going to remove the EDR config, it might need #ifdef > > CONFIG_ACPI changes in edr.c to not compile ACPI specific code. > > Alternative choice is to compile edr.c with CONFIG_ACPI. > > Right. I think we should probably remove CONFIG_PCIE_EDR completely > and make everything controlled by CONFIG_PCIE_DPC. In the PCI Firmware spec, r3.3, sec 4.5.1, table 4-4, the description of "Error Disconnect Recover Supported" hints at the possibility for an OS to support EDR but not DPC: In the context of PCIe, support for Error Disconnect Recover implies that the operating system will invalidate the software state associated with child devices of the port without attempting to access the child device hardware. *If* the operating system supports Downstream Port Containment (DPC), ..., the operating system shall attempt to recover the child devices if the port implements the Downstream Port Containment Extended Capability. On the other hand, sec 4.6.12 and the implementation note there with the EDR flow seem to assume the OS *does* support DPC and can clear error status and bring ports out of DPC even if the OS hasn't been granted control of DPC. EDR is an ACPI feature, and I guess it might be theoretically possible to use EDR on an ACPI system with some non-DPC error containment feature like powerpc EEH. But obviously powerpc doesn't use ACPI, and a hypothetical ACPI system with non-DPC error containment would have to add support for that containment in edr_handle_event(). So while I'm not 100% sure that making CONFIG_PCIE_DPC control both DPC and EDR is completely correct, I guess for now I still think using CONFIG_PCIE_DPC for both DPC and EDR seems like a reasonable plan because we have no support for EDR *without* DPC. Bjorn
[+cc Mahesh, Oliver, linuxppc-dev, since I mentioned powerpc below. Probably not of interest since this is about the ACPI EDR feature, but just FYI] On Wed, Feb 21, 2024 at 05:11:04PM -0600, Bjorn Helgaas wrote: > On Tue, Jan 23, 2024 at 09:59:21AM -0600, Bjorn Helgaas wrote: > > On Mon, Jan 22, 2024 at 06:37:48PM -0800, Kuppuswamy Sathyanarayanan wrote: > > > On 1/22/24 11:32 AM, Bjorn Helgaas wrote: > > > > On Mon, Jan 08, 2024 at 05:15:08PM -0700, Matthew W Carlis wrote: > > > >> A small part is probably historical; we've been using DPC on PCIe > > > >> switches since before there was any EDR support in the kernel. It > > > >> looks like there was a PCIe DPC ECN as early as Feb 2012, but this > > > >> EDR/DPC fw ECN didn't come in till Jan 2019 & kernel support for ECN > > > >> was even later. Its not immediately clear I would want to use EDR in > > > >> my newer architecures & then there are also the older architecures > > > >> still requiring support. When I submitted this patch I came at it > > > >> with the approach of trying to keep the old behavior & still support > > > >> the newer EDR behavior. Bjorns patch from Dec 28 2023 would seem to > > > >> change the behavior for both root ports & switch ports, requiring > > > >> them to set _OSC Control Field bit 7 (DPC) and _OSC Support Field > > > >> bit 7 (EDR) or a kernel command line value. I think no matter what, > > > >> we want to ensure that PCIe Root Ports and PCIe switches arrive at > > > >> the same policy here. > > > > Is there an approved DPC ECN to the PCI Firmware spec that adds DPC > > > > control negotiation, but does *not* add the EDR requirement? > > > > > > > > I'm looking at > > > > https://members.pcisig.com/wg/PCI-SIG/document/previewpdf/12888, which > > > > seems to be the final "Downstream Port Containment Related > > > > Enhancements" ECN, which is dated 1/28/2019 and applies to the PCI > > > > Firmware spec r3.2. > > > > > > > > It adds bit 7, "PCI Express Downstream Port Containment Configuration > > > > control", to the passed-in _OSC Control field, which indicates that > > > > the OS supports both "native OS control and firmware ownership models > > > > (i.e. Error Disconnect Recover notification) of Downstream Port > > > > Containment." > > > > > > > > It also adds the dependency that "If the OS sets bit 7 of the Control > > > > field, it must set bit 7 of the Support field, indicating support for > > > > the Error Disconnect Recover event." > > > > > > > > So I'm trying to figure out if the "support DPC but not EDR" situation > > > > was ever a valid place to be. Maybe it's a mistake to have separate > > > > CONFIG_PCIE_DPC and CONFIG_PCIE_EDR options. > > > > > > My understanding is also similar. I have raised the same point in > > > https://lore.kernel.org/all/3c02a6d6-917e-486c-ad41-bdf176639ff2@linux.intel.com/ > > > > Ah, sorry, I missed that. > > > > > IMO, we don't need a separate config for EDR. I don't think user can > > > gain anything with disabling EDR and enabling DPC. As long as > > > firmware does not user EDR support, just compiling the code should > > > be harmless. > > > > > > So we can either remove it, or select it by default if user selects > > > DPC config. > > > > > > > CONFIG_PCIE_EDR depends on CONFIG_ACPI, so the situation is a little > > > > bit murky on non-ACPI systems that support DPC. > > > > > > If we are going to remove the EDR config, it might need #ifdef > > > CONFIG_ACPI changes in edr.c to not compile ACPI specific code. > > > Alternative choice is to compile edr.c with CONFIG_ACPI. > > > > Right. I think we should probably remove CONFIG_PCIE_EDR completely > > and make everything controlled by CONFIG_PCIE_DPC. > > In the PCI Firmware spec, r3.3, sec 4.5.1, table 4-4, the description > of "Error Disconnect Recover Supported" hints at the possibility for > an OS to support EDR but not DPC: > > In the context of PCIe, support for Error Disconnect Recover implies > that the operating system will invalidate the software state > associated with child devices of the port without attempting to > access the child device hardware. *If* the operating system supports > Downstream Port Containment (DPC), ..., the operating system shall > attempt to recover the child devices if the port implements the > Downstream Port Containment Extended Capability. > > On the other hand, sec 4.6.12 and the implementation note there with > the EDR flow seem to assume the OS *does* support DPC and can clear > error status and bring ports out of DPC even if the OS hasn't been > granted control of DPC. > > EDR is an ACPI feature, and I guess it might be theoretically possible > to use EDR on an ACPI system with some non-DPC error containment > feature like powerpc EEH. But obviously powerpc doesn't use ACPI, and > a hypothetical ACPI system with non-DPC error containment would have > to add support for that containment in edr_handle_event(). > > So while I'm not 100% sure that making CONFIG_PCIE_DPC control both > DPC and EDR is completely correct, I guess for now I still think using > CONFIG_PCIE_DPC for both DPC and EDR seems like a reasonable plan > because we have no support for EDR *without* DPC. > > Bjorn
diff --git a/drivers/pci/pcie/portdrv.c b/drivers/pci/pcie/portdrv.c index 14a4b89a3b83..2fc006f12988 100644 --- a/drivers/pci/pcie/portdrv.c +++ b/drivers/pci/pcie/portdrv.c @@ -257,12 +257,18 @@ static int get_port_device_capability(struct pci_dev *dev) } /* + * _OSC AER Control is required by the OS & requires OS to control AER, + * but _OSC DPC Control isn't required by the OS to control DPC; however + * it does require the OS to control DPC. _OSC DPC Control also requres + * _OSC EDR Control (Error Disconnect Recovery) (PCI Firmware - DPC ECN rev3.2) + * PCI_Express_Base 6.1, 6.2.11 Determination of DPC Control recommends + * platform fw or OS always link control of DPC to AER. + * * With dpc-native, allow Linux to use DPC even if it doesn't have * permission to use AER. */ if (pci_find_ext_capability(dev, PCI_EXT_CAP_ID_DPC) && - pci_aer_available() && - (pcie_ports_dpc_native || (services & PCIE_PORT_SERVICE_AER))) + pci_aer_available() && (pcie_ports_dpc_native || host->native_aer)) services |= PCIE_PORT_SERVICE_DPC; if (pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM ||