diff mbox series

[v9,7/8] PCI/DPC: Clear AER registers in EDR mode

Message ID d4e51ce547feb6251ee2c64a4141c6dc772717ac.1570145778.git.sathyanarayanan.kuppuswamy@linux.intel.com
State Superseded
Delegated to: Bjorn Helgaas
Headers show
Series Add Error Disconnect Recover (EDR) support | expand

Commit Message

Kuppuswamy Sathyanarayanan Oct. 3, 2019, 11:39 p.m. UTC
From: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>

As per PCI firmware specification r3.2 Downstream Port Containment
Related Enhancements ECN, OS is responsible for clearing the AER
registers in EDR mode. So clear AER registers in dpc_process_error()
function.

Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Acked-by: Keith Busch <keith.busch@intel.com>
---
 drivers/pci/pcie/dpc.c | 4 ++++
 1 file changed, 4 insertions(+)

Comments

Bjorn Helgaas Oct. 28, 2019, 11:27 p.m. UTC | #1
On Thu, Oct 03, 2019 at 04:39:03PM -0700, sathyanarayanan.kuppuswamy@linux.intel.com wrote:
> From: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> 
> As per PCI firmware specification r3.2 Downstream Port Containment
> Related Enhancements ECN, 

Specific reference, please, e.g., the section/table/figure of the PCI
Firmware Spec being modified by the ECN.

> OS is responsible for clearing the AER
> registers in EDR mode. So clear AER registers in dpc_process_error()
> function.
> 
> Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> Acked-by: Keith Busch <keith.busch@intel.com>
> ---
>  drivers/pci/pcie/dpc.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
> index fafc55c00fe0..de2d892bc7c4 100644
> --- a/drivers/pci/pcie/dpc.c
> +++ b/drivers/pci/pcie/dpc.c
> @@ -275,6 +275,10 @@ static void dpc_process_error(struct dpc_dev *dpc)
>  		pci_aer_clear_fatal_status(pdev);
>  	}
>  
> +	/* In EDR mode, OS is responsible for clearing AER registers */
> +	if (dpc->firmware_dpc)

I guess "EDR mode" is effectively the same as "firmware-first mode"?

At least, the only place we set "firmware_dpc = 1" is:

  +       if (pcie_aer_get_firmware_first(pdev))
  +               dpc->firmware_dpc = 1;

If they're the same, why do we need two different names for it?

> +		pci_cleanup_aer_error_status_regs(pdev);
> +
>  	/*
>  	 * Irrespective of whether the DPC event is triggered by
>  	 * ERR_FATAL or ERR_NONFATAL, since the link is already down,
> -- 
> 2.21.0
>
Kuppuswamy Sathyanarayanan Oct. 29, 2019, 8:04 p.m. UTC | #2
On 10/28/19 4:27 PM, Bjorn Helgaas wrote:
> On Thu, Oct 03, 2019 at 04:39:03PM -0700, sathyanarayanan.kuppuswamy@linux.intel.com wrote:
>> From: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
>>
>> As per PCI firmware specification r3.2 Downstream Port Containment
>> Related Enhancements ECN,
> Specific reference, please, e.g., the section/table/figure of the PCI
> Firmware Spec being modified by the ECN.
Ok. I will include it.
>
>> OS is responsible for clearing the AER
>> registers in EDR mode. So clear AER registers in dpc_process_error()
>> function.
>>
>> Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
>> Acked-by: Keith Busch <keith.busch@intel.com>
>> ---
>>   drivers/pci/pcie/dpc.c | 4 ++++
>>   1 file changed, 4 insertions(+)
>>
>> diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
>> index fafc55c00fe0..de2d892bc7c4 100644
>> --- a/drivers/pci/pcie/dpc.c
>> +++ b/drivers/pci/pcie/dpc.c
>> @@ -275,6 +275,10 @@ static void dpc_process_error(struct dpc_dev *dpc)
>>   		pci_aer_clear_fatal_status(pdev);
>>   	}
>>   
>> +	/* In EDR mode, OS is responsible for clearing AER registers */
>> +	if (dpc->firmware_dpc)
> I guess "EDR mode" is effectively the same as "firmware-first mode"?
Yes, EDR mode is an upgrade to FF mode in which firmware allows
OS to share some of it job by sending ACPI notification. If you don't get
ACPI notification, EDR mode is effectively same as FF mode.
>
> At least, the only place we set "firmware_dpc = 1" is:
>
>    +       if (pcie_aer_get_firmware_first(pdev))
>    +               dpc->firmware_dpc = 1;
>
> If they're the same, why do we need two different names for it?
For better readability and performance, I tried to cache the value of
pcie_aer_get_firmware_first() result in DPC driver.
>
>> +		pci_cleanup_aer_error_status_regs(pdev);
>> +
>>   	/*
>>   	 * Irrespective of whether the DPC event is triggered by
>>   	 * ERR_FATAL or ERR_NONFATAL, since the link is already down,
>> -- 
>> 2.21.0
>>
Bjorn Helgaas Oct. 29, 2019, 10:48 p.m. UTC | #3
On Tue, Oct 29, 2019 at 01:04:29PM -0700, Kuppuswamy Sathyanarayanan wrote:
> 
> On 10/28/19 4:27 PM, Bjorn Helgaas wrote:
> > On Thu, Oct 03, 2019 at 04:39:03PM -0700, sathyanarayanan.kuppuswamy@linux.intel.com wrote:
> > > From: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> > > 
> > > As per PCI firmware specification r3.2 Downstream Port Containment
> > > Related Enhancements ECN,
> > Specific reference, please, e.g., the section/table/figure of the PCI
> > Firmware Spec being modified by the ECN.
> Ok. I will include it.
> > 
> > > OS is responsible for clearing the AER
> > > registers in EDR mode. So clear AER registers in dpc_process_error()
> > > function.
> > > 
> > > Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> > > Acked-by: Keith Busch <keith.busch@intel.com>
> > > ---
> > >   drivers/pci/pcie/dpc.c | 4 ++++
> > >   1 file changed, 4 insertions(+)
> > > 
> > > diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
> > > index fafc55c00fe0..de2d892bc7c4 100644
> > > --- a/drivers/pci/pcie/dpc.c
> > > +++ b/drivers/pci/pcie/dpc.c
> > > @@ -275,6 +275,10 @@ static void dpc_process_error(struct dpc_dev *dpc)
> > >   		pci_aer_clear_fatal_status(pdev);
> > >   	}
> > > +	/* In EDR mode, OS is responsible for clearing AER registers */
> > > +	if (dpc->firmware_dpc)
> >
> > I guess "EDR mode" is effectively the same as "firmware-first mode"?
>
> Yes, EDR mode is an upgrade to FF mode in which firmware allows OS
> to share some of it job by sending ACPI notification. If you don't
> get ACPI notification, EDR mode is effectively same as FF mode.

Hmm, somehow the connection between FF and EDR needs to be clear from
the code, so people who weren't involved in the development of EDR and
don't even have access to the specs/ECNs can make sense out of this.

> > At least, the only place we set "firmware_dpc = 1" is:
> > 
> >    +       if (pcie_aer_get_firmware_first(pdev))
> >    +               dpc->firmware_dpc = 1;
> > 
> > If they're the same, why do we need two different names for it?
> For better readability and performance, I tried to cache the value of
> pcie_aer_get_firmware_first() result in DPC driver.

pcie_aer_get_firmware_first() already caches the value, so I don't
think you're gaining any useful performance here, and having two
different names *decreases* readability.

I do agree that pcie_aer_get_firmware_first() is not optimally
implemented.  I think we should probably look up the firmware-first
indication explicitly during enumeration so we don't have to bother
with the dev->__aer_firmware_first_valid thing.  And if we got rid of
all those leading underscores, it would probably run faster, too ;)

> > > +		pci_cleanup_aer_error_status_regs(pdev);
> > > +
> > >   	/*
> > >   	 * Irrespective of whether the DPC event is triggered by
> > >   	 * ERR_FATAL or ERR_NONFATAL, since the link is already down,
> > > -- 
> > > 2.21.0
> > > 
> -- 
> Sathyanarayanan Kuppuswamy
> Linux kernel developer
>
Kuppuswamy Sathyanarayanan Oct. 29, 2019, 11:37 p.m. UTC | #4
On 10/29/19 3:48 PM, Bjorn Helgaas wrote:
> On Tue, Oct 29, 2019 at 01:04:29PM -0700, Kuppuswamy Sathyanarayanan wrote:
>> On 10/28/19 4:27 PM, Bjorn Helgaas wrote:
>>> On Thu, Oct 03, 2019 at 04:39:03PM -0700, sathyanarayanan.kuppuswamy@linux.intel.com wrote:
>>>> From: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
>>>>
>>>> As per PCI firmware specification r3.2 Downstream Port Containment
>>>> Related Enhancements ECN,
>>> Specific reference, please, e.g., the section/table/figure of the PCI
>>> Firmware Spec being modified by the ECN.
>> Ok. I will include it.
>>>> OS is responsible for clearing the AER
>>>> registers in EDR mode. So clear AER registers in dpc_process_error()
>>>> function.
>>>>
>>>> Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
>>>> Acked-by: Keith Busch <keith.busch@intel.com>
>>>> ---
>>>>    drivers/pci/pcie/dpc.c | 4 ++++
>>>>    1 file changed, 4 insertions(+)
>>>>
>>>> diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
>>>> index fafc55c00fe0..de2d892bc7c4 100644
>>>> --- a/drivers/pci/pcie/dpc.c
>>>> +++ b/drivers/pci/pcie/dpc.c
>>>> @@ -275,6 +275,10 @@ static void dpc_process_error(struct dpc_dev *dpc)
>>>>    		pci_aer_clear_fatal_status(pdev);
>>>>    	}
>>>> +	/* In EDR mode, OS is responsible for clearing AER registers */
>>>> +	if (dpc->firmware_dpc)
>>> I guess "EDR mode" is effectively the same as "firmware-first mode"?
>> Yes, EDR mode is an upgrade to FF mode in which firmware allows OS
>> to share some of it job by sending ACPI notification. If you don't
>> get ACPI notification, EDR mode is effectively same as FF mode.
May be I can add some documentation in code to explain the EDR mode better.
> Hmm, somehow the connection between FF and EDR needs to be clear from
> the code, so people who weren't involved in the development of EDR and
> don't even have access to the specs/ECNs can make sense out of this.
>
>>> At least, the only place we set "firmware_dpc = 1" is:
>>>
>>>     +       if (pcie_aer_get_firmware_first(pdev))
>>>     +               dpc->firmware_dpc = 1;
>>>
>>> If they're the same, why do we need two different names for it?
>> For better readability and performance, I tried to cache the value of
>> pcie_aer_get_firmware_first() result in DPC driver.
> pcie_aer_get_firmware_first() already caches the value, so I don't
> think you're gaining any useful performance here, and having two
> different names *decreases* readability.
Ok. I can replace "firmware_dpc" with pcie_aer_get_firmware_first() calls.
>
> I do agree that pcie_aer_get_firmware_first() is not optimally
> implemented.  I think we should probably look up the firmware-first
> indication explicitly during enumeration so we don't have to bother
> with the dev->__aer_firmware_first_valid thing.  And if we got rid of
> all those leading underscores, it would probably run faster, too ;)
I agree that pcie_aer_get_firmware_first() can be optimized. I can submit a
patch for it once this patch set is merged.
>
>>>> +		pci_cleanup_aer_error_status_regs(pdev);
>>>> +
>>>>    	/*
>>>>    	 * Irrespective of whether the DPC event is triggered by
>>>>    	 * ERR_FATAL or ERR_NONFATAL, since the link is already down,
>>>> -- 
>>>> 2.21.0
>>>>
>> -- 
>> Sathyanarayanan Kuppuswamy
>> Linux kernel developer
>>
diff mbox series

Patch

diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
index fafc55c00fe0..de2d892bc7c4 100644
--- a/drivers/pci/pcie/dpc.c
+++ b/drivers/pci/pcie/dpc.c
@@ -275,6 +275,10 @@  static void dpc_process_error(struct dpc_dev *dpc)
 		pci_aer_clear_fatal_status(pdev);
 	}
 
+	/* In EDR mode, OS is responsible for clearing AER registers */
+	if (dpc->firmware_dpc)
+		pci_cleanup_aer_error_status_regs(pdev);
+
 	/*
 	 * Irrespective of whether the DPC event is triggered by
 	 * ERR_FATAL or ERR_NONFATAL, since the link is already down,