[v9,5/8] PCI/AER: Allow clearing Error Status Register in FF mode
diff mbox series

Message ID a6de212ec82651ca6361e21926497a26c590d037.1570145778.git.sathyanarayanan.kuppuswamy@linux.intel.com
State Superseded
Delegated to: Bjorn Helgaas
Headers show
Series
  • Add Error Disconnect Recover (EDR) support
Related show

Commit Message

Kuppuswamy Sathyanarayanan Oct. 3, 2019, 11:39 p.m. UTC
From: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>

As per PCI firmware specification r3.2 Downstream Port Containment
Related Enhancements ECN, sec 4.5.1, table 4-6, Error Disconnect
Recover (EDR) support allows OS to handle error recovery and
clearing Error Registers even in FF mode. So remove FF mode checks in
pci_cleanup_aer_uncorrect_error_status(), pci_aer_clear_fatal_status()
and pci_cleanup_aer_error_status_regs() functions.

Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Acked-by: Keith Busch <keith.busch@intel.com>
---
 drivers/pci/pcie/aer.c | 12 ++----------
 1 file changed, 2 insertions(+), 10 deletions(-)

Comments

Bjorn Helgaas Oct. 28, 2019, 11:22 p.m. UTC | #1
On Thu, Oct 03, 2019 at 04:39:01PM -0700, sathyanarayanan.kuppuswamy@linux.intel.com wrote:
> From: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> 
> As per PCI firmware specification r3.2 Downstream Port Containment
> Related Enhancements ECN, sec 4.5.1, table 4-6,

That table adds bit 7, "DPC config control" to the _OSC control field
and talks about modifying registers in the DPC capability.

It doesn't say anything about firmware-first or about the OS modifying
registers in the AER capability.  But this patch doesn't do anything
related to DPC or _OSC.

> Error Disconnect
> Recover (EDR) support allows OS to handle error recovery and
> clearing Error Registers even in FF mode. So remove FF mode checks in
> pci_cleanup_aer_uncorrect_error_status(), pci_aer_clear_fatal_status()
> and pci_cleanup_aer_error_status_regs() functions.
> 
> Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> Acked-by: Keith Busch <keith.busch@intel.com>
> ---
>  drivers/pci/pcie/aer.c | 12 ++----------
>  1 file changed, 2 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index b45bc47d04fe..e1509bb900c5 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -383,9 +383,6 @@ int pci_cleanup_aer_uncorrect_error_status(struct pci_dev *dev)
>  	if (!pos)
>  		return -EIO;
>  
> -	if (pcie_aer_get_firmware_first(dev))
> -		return -EIO;
> -
>  	/* Clear status bits for ERR_NONFATAL errors only */
>  	pci_read_config_dword(dev, pos + PCI_ERR_UNCOR_STATUS, &status);
>  	pci_read_config_dword(dev, pos + PCI_ERR_UNCOR_SEVER, &sev);
> @@ -406,9 +403,6 @@ void pci_aer_clear_fatal_status(struct pci_dev *dev)
>  	if (!pos)
>  		return;
>  
> -	if (pcie_aer_get_firmware_first(dev))
> -		return;
> -
>  	/* Clear status bits for ERR_FATAL errors only */
>  	pci_read_config_dword(dev, pos + PCI_ERR_UNCOR_STATUS, &status);
>  	pci_read_config_dword(dev, pos + PCI_ERR_UNCOR_SEVER, &sev);

> @@ -430,9 +424,6 @@ int pci_cleanup_aer_error_status_regs(struct pci_dev *dev)
>  	if (!pos)
>  		return -EIO;
>  
> -	if (pcie_aer_get_firmware_first(dev))
> -		return -EIO;
> -
>  	port_type = pci_pcie_type(dev);
>  	if (port_type == PCI_EXP_TYPE_ROOT_PORT) {
>  		pci_read_config_dword(dev, pos + PCI_ERR_ROOT_STATUS, &status);
> @@ -455,7 +446,8 @@ void pci_aer_init(struct pci_dev *dev)
>  	if (dev->aer_cap)
>  		dev->aer_stats = kzalloc(sizeof(struct aer_stats), GFP_KERNEL);
>  
> -	pci_cleanup_aer_error_status_regs(dev);
> +	if (!pcie_aer_get_firmware_first(dev))
> +		pci_cleanup_aer_error_status_regs(dev);

This effectively moves the "if (pcie_aer_get_firmware_first())" check
from pci_cleanup_aer_error_status_regs() into one of the callers.  But
there are two other callers: pci_aer_init() and pci_restore_state().
Do they need the change, or do you want to cleanup the AER error
registers there, but not here?

>  }
>  
>  void pci_aer_exit(struct pci_dev *dev)
> -- 
> 2.21.0
>
Kuppuswamy Sathyanarayanan Oct. 29, 2019, 7:58 p.m. UTC | #2
On 10/28/19 4:22 PM, Bjorn Helgaas wrote:
> On Thu, Oct 03, 2019 at 04:39:01PM -0700, sathyanarayanan.kuppuswamy@linux.intel.com wrote:
>> From: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
>>
>> As per PCI firmware specification r3.2 Downstream Port Containment
>> Related Enhancements ECN, sec 4.5.1, table 4-6,
> That table adds bit 7, "DPC config control" to the _OSC control field
> and talks about modifying registers in the DPC capability.
>
> It doesn't say anything about firmware-first or about the OS modifying
> registers in the AER capability.  But this patch doesn't do anything
> related to DPC or _OSC.

Expectation of clearing AER registers is not explicitly mentioned in 
above DPC ECN. But it
has been clarified in the following update to this ECN.

Please check the implementation note in page 10 of
https://members.pcisig.com/wg/PCI-SIG/document/previewpdf/13563.

It  specifies that in EDR mode, firmware expects OS to clear error 
registers (Check for
following string "Bring Portout of DPC, clear port error status, assign 
bus numbers to child
devices").

I will include this ECN reference in follow up update.

>
>> Error Disconnect
>> Recover (EDR) support allows OS to handle error recovery and
>> clearing Error Registers even in FF mode. So remove FF mode checks in
>> pci_cleanup_aer_uncorrect_error_status(), pci_aer_clear_fatal_status()
>> and pci_cleanup_aer_error_status_regs() functions.
>>
>> Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
>> Acked-by: Keith Busch <keith.busch@intel.com>
>> ---
>>   drivers/pci/pcie/aer.c | 12 ++----------
>>   1 file changed, 2 insertions(+), 10 deletions(-)
>>
>> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
>> index b45bc47d04fe..e1509bb900c5 100644
>> --- a/drivers/pci/pcie/aer.c
>> +++ b/drivers/pci/pcie/aer.c
>> @@ -383,9 +383,6 @@ int pci_cleanup_aer_uncorrect_error_status(struct pci_dev *dev)
>>   	if (!pos)
>>   		return -EIO;
>>   
>> -	if (pcie_aer_get_firmware_first(dev))
>> -		return -EIO;
>> -
>>   	/* Clear status bits for ERR_NONFATAL errors only */
>>   	pci_read_config_dword(dev, pos + PCI_ERR_UNCOR_STATUS, &status);
>>   	pci_read_config_dword(dev, pos + PCI_ERR_UNCOR_SEVER, &sev);
>> @@ -406,9 +403,6 @@ void pci_aer_clear_fatal_status(struct pci_dev *dev)
>>   	if (!pos)
>>   		return;
>>   
>> -	if (pcie_aer_get_firmware_first(dev))
>> -		return;
>> -
>>   	/* Clear status bits for ERR_FATAL errors only */
>>   	pci_read_config_dword(dev, pos + PCI_ERR_UNCOR_STATUS, &status);
>>   	pci_read_config_dword(dev, pos + PCI_ERR_UNCOR_SEVER, &sev);
>> @@ -430,9 +424,6 @@ int pci_cleanup_aer_error_status_regs(struct pci_dev *dev)
>>   	if (!pos)
>>   		return -EIO;
>>   
>> -	if (pcie_aer_get_firmware_first(dev))
>> -		return -EIO;
>> -
>>   	port_type = pci_pcie_type(dev);
>>   	if (port_type == PCI_EXP_TYPE_ROOT_PORT) {
>>   		pci_read_config_dword(dev, pos + PCI_ERR_ROOT_STATUS, &status);
>> @@ -455,7 +446,8 @@ void pci_aer_init(struct pci_dev *dev)
>>   	if (dev->aer_cap)
>>   		dev->aer_stats = kzalloc(sizeof(struct aer_stats), GFP_KERNEL);
>>   
>> -	pci_cleanup_aer_error_status_regs(dev);
>> +	if (!pcie_aer_get_firmware_first(dev))
>> +		pci_cleanup_aer_error_status_regs(dev);
> This effectively moves the "if (pcie_aer_get_firmware_first())" check
> from pci_cleanup_aer_error_status_regs() into one of the callers.  But
> there are two other callers: pci_aer_init() and pci_restore_state().
> Do they need the change, or do you want to cleanup the AER error
> registers there, but not here?
Good catch. I have added this check to pci_aer_init(). But it needs to 
be added to
pci_restore_state() as well. Instead of moving the checks to the caller, 
If you agree,
I could change the API to pci_cleanup_aer_error_status_regs(struct 
pci_dev *dev, bool skip_ff_check) and
let the caller decide whether they want skip the check or not.
>
>>   }
>>   
>>   void pci_aer_exit(struct pci_dev *dev)
>> -- 
>> 2.21.0
>>
Bjorn Helgaas Oct. 29, 2019, 11:03 p.m. UTC | #3
On Tue, Oct 29, 2019 at 12:58:14PM -0700, Kuppuswamy Sathyanarayanan wrote:
> 
> On 10/28/19 4:22 PM, Bjorn Helgaas wrote:
> > On Thu, Oct 03, 2019 at 04:39:01PM -0700, sathyanarayanan.kuppuswamy@linux.intel.com wrote:
> > > From: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>

> > > @@ -430,9 +424,6 @@ int pci_cleanup_aer_error_status_regs(struct pci_dev *dev)
> > >   	if (!pos)
> > >   		return -EIO;
> > > -	if (pcie_aer_get_firmware_first(dev))
> > > -		return -EIO;
> > > -
> > >   	port_type = pci_pcie_type(dev);
> > >   	if (port_type == PCI_EXP_TYPE_ROOT_PORT) {
> > >   		pci_read_config_dword(dev, pos + PCI_ERR_ROOT_STATUS, &status);
> > > @@ -455,7 +446,8 @@ void pci_aer_init(struct pci_dev *dev)
> > >   	if (dev->aer_cap)
> > >   		dev->aer_stats = kzalloc(sizeof(struct aer_stats), GFP_KERNEL);
> > > -	pci_cleanup_aer_error_status_regs(dev);
> > > +	if (!pcie_aer_get_firmware_first(dev))
> > > +		pci_cleanup_aer_error_status_regs(dev);
> >
> > This effectively moves the "if (pcie_aer_get_firmware_first())" check
> > from pci_cleanup_aer_error_status_regs() into one of the callers.  But
> > there are two other callers: pci_aer_init() and pci_restore_state().
> > Do they need the change, or do you want to cleanup the AER error
> > registers there, but not here?
>
> Good catch. I have added this check to pci_aer_init(). But it needs
> to be added to pci_restore_state() as well. Instead of moving the
> checks to the caller, If you agree, I could change the API to
> pci_cleanup_aer_error_status_regs(struct pci_dev *dev, bool
> skip_ff_check) and let the caller decide whether they want skip the
> check or not.

If all callers of pci_cleanup_aer_error_status_regs() would have to
check pcie_aer_get_firmware_first(), I don't understand why you're
moving the check at all.
Kuppuswamy Sathyanarayanan Oct. 29, 2019, 11:17 p.m. UTC | #4
On 10/29/19 4:03 PM, Bjorn Helgaas wrote:
> On Tue, Oct 29, 2019 at 12:58:14PM -0700, Kuppuswamy Sathyanarayanan wrote:
>> On 10/28/19 4:22 PM, Bjorn Helgaas wrote:
>>> On Thu, Oct 03, 2019 at 04:39:01PM -0700, sathyanarayanan.kuppuswamy@linux.intel.com wrote:
>>>> From: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
>>>> @@ -430,9 +424,6 @@ int pci_cleanup_aer_error_status_regs(struct pci_dev *dev)
>>>>    	if (!pos)
>>>>    		return -EIO;
>>>> -	if (pcie_aer_get_firmware_first(dev))
>>>> -		return -EIO;
>>>> -
>>>>    	port_type = pci_pcie_type(dev);
>>>>    	if (port_type == PCI_EXP_TYPE_ROOT_PORT) {
>>>>    		pci_read_config_dword(dev, pos + PCI_ERR_ROOT_STATUS, &status);
>>>> @@ -455,7 +446,8 @@ void pci_aer_init(struct pci_dev *dev)
>>>>    	if (dev->aer_cap)
>>>>    		dev->aer_stats = kzalloc(sizeof(struct aer_stats), GFP_KERNEL);
>>>> -	pci_cleanup_aer_error_status_regs(dev);
>>>> +	if (!pcie_aer_get_firmware_first(dev))
>>>> +		pci_cleanup_aer_error_status_regs(dev);
>>> This effectively moves the "if (pcie_aer_get_firmware_first())" check
>>> from pci_cleanup_aer_error_status_regs() into one of the callers.  But
>>> there are two other callers: pci_aer_init() and pci_restore_state().
>>> Do they need the change, or do you want to cleanup the AER error
>>> registers there, but not here?
>> Good catch. I have added this check to pci_aer_init(). But it needs
>> to be added to pci_restore_state() as well. Instead of moving the
>> checks to the caller, If you agree, I could change the API to
>> pci_cleanup_aer_error_status_regs(struct pci_dev *dev, bool
>> skip_ff_check) and let the caller decide whether they want skip the
>> check or not.
> If all callers of pci_cleanup_aer_error_status_regs() would have to
> check pcie_aer_get_firmware_first(), I don't understand why you're
> moving the check at all.

We need exception for the call made from DPC driver. If 
pcie_aer_get_firmware_first()
call is made from DPC driver during EDR mode execution, then FF mode 
check needs
to be skipped.

>

Patch
diff mbox series

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index b45bc47d04fe..e1509bb900c5 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -383,9 +383,6 @@  int pci_cleanup_aer_uncorrect_error_status(struct pci_dev *dev)
 	if (!pos)
 		return -EIO;
 
-	if (pcie_aer_get_firmware_first(dev))
-		return -EIO;
-
 	/* Clear status bits for ERR_NONFATAL errors only */
 	pci_read_config_dword(dev, pos + PCI_ERR_UNCOR_STATUS, &status);
 	pci_read_config_dword(dev, pos + PCI_ERR_UNCOR_SEVER, &sev);
@@ -406,9 +403,6 @@  void pci_aer_clear_fatal_status(struct pci_dev *dev)
 	if (!pos)
 		return;
 
-	if (pcie_aer_get_firmware_first(dev))
-		return;
-
 	/* Clear status bits for ERR_FATAL errors only */
 	pci_read_config_dword(dev, pos + PCI_ERR_UNCOR_STATUS, &status);
 	pci_read_config_dword(dev, pos + PCI_ERR_UNCOR_SEVER, &sev);
@@ -430,9 +424,6 @@  int pci_cleanup_aer_error_status_regs(struct pci_dev *dev)
 	if (!pos)
 		return -EIO;
 
-	if (pcie_aer_get_firmware_first(dev))
-		return -EIO;
-
 	port_type = pci_pcie_type(dev);
 	if (port_type == PCI_EXP_TYPE_ROOT_PORT) {
 		pci_read_config_dword(dev, pos + PCI_ERR_ROOT_STATUS, &status);
@@ -455,7 +446,8 @@  void pci_aer_init(struct pci_dev *dev)
 	if (dev->aer_cap)
 		dev->aer_stats = kzalloc(sizeof(struct aer_stats), GFP_KERNEL);
 
-	pci_cleanup_aer_error_status_regs(dev);
+	if (!pcie_aer_get_firmware_first(dev))
+		pci_cleanup_aer_error_status_regs(dev);
 }
 
 void pci_aer_exit(struct pci_dev *dev)