diff mbox series

[v5,8/9] PCI/DPC: Add support for DPC recovery on NON_FATAL errors

Message ID 211d4bf8f856c6aa5454751e25ab5c90970960ff.1563912591.git.sathyanarayanan.kuppuswamy@linux.intel.com
State Superseded
Delegated to: Bjorn Helgaas
Headers show
Series Add Error Disconnect Recover (EDR) support | expand

Commit Message

Kuppuswamy Sathyanarayanan July 23, 2019, 8:21 p.m. UTC
From: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>

Currently, in native mode, DPC driver is configured to trigger DPC only
for FATAL errors and hence it only supports port recovery for FATAL
errors. But with Error Disconnect Recover (EDR) support, DPC
configuration is done by firmware, and hence we should expect DPC
triggered for both FATAL/NON-FATAL errors. So add support for DPC
recovery with NON-FATAL errors.

Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
---
 drivers/pci/pcie/dpc.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

Comments

Keith Busch July 25, 2019, 4:50 p.m. UTC | #1
On Tue, Jul 23, 2019 at 01:21:50PM -0700, sathyanarayanan.kuppuswamy@linux.intel.com wrote:
> From: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> 
> Currently, in native mode, DPC driver is configured to trigger DPC only
> for FATAL errors and hence it only supports port recovery for FATAL
> errors. But with Error Disconnect Recover (EDR) support, DPC
> configuration is done by firmware, and hence we should expect DPC
> triggered for both FATAL/NON-FATAL errors. So add support for DPC
> recovery with NON-FATAL errors.
> 
> Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> ---
>  drivers/pci/pcie/dpc.c | 11 ++++++++---
>  1 file changed, 8 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
> index 6e350149d793..5d328812aea9 100644
> --- a/drivers/pci/pcie/dpc.c
> +++ b/drivers/pci/pcie/dpc.c
> @@ -267,15 +267,20 @@ static void dpc_process_error(struct dpc_dev *dpc)
>  	/* show RP PIO error detail information */
>  	if (dpc->rp_extensions && reason == 3 && ext_reason == 0)
>  		dpc_process_rp_pio_error(dpc);
> -	else if (reason == 0 &&
> +	else if (reason <= 2 &&
>  		 dpc_get_aer_uncorrect_severity(pdev, &info) &&
>  		 aer_get_device_error_info(pdev, &info)) {
>  		aer_print_error(pdev, &info);
>  		pci_cleanup_aer_uncorrect_error_status(pdev);
> -		pci_aer_clear_fatal_status(pdev);
> +		if (reason != 1)
> +			pci_aer_clear_fatal_status(pdev);

I'm not quite sure I understand the above. If the reason is 1 or 2,
then the DSP received an error message from something downstream. In
otherwords, the port was notified an error occured somewhere, but it
was not the device that detected that error, so we should not expect
aer_print_error on that device to show anything useful. Right?

>  	}
>  
> -	/* We configure DPC so it only triggers on ERR_FATAL */
> +	/*
> +	 * Irrespective of whether the DPC event is triggered by
> +	 * ERR_FATAL or ERR_NONFATAL, since the link is already down,
> +	 * use the FATAL error recovery path for both cases.
> +	 */
>  	pcie_do_recovery(pdev, pci_channel_io_frozen, PCIE_PORT_SERVICE_DPC);
>  }
>  
> -- 
> 2.21.0
>
Kuppuswamy Sathyanarayanan July 26, 2019, 8:54 p.m. UTC | #2
On 7/25/19 9:50 AM, Keith Busch wrote:
> On Tue, Jul 23, 2019 at 01:21:50PM -0700, sathyanarayanan.kuppuswamy@linux.intel.com wrote:
>> From: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
>>
>> Currently, in native mode, DPC driver is configured to trigger DPC only
>> for FATAL errors and hence it only supports port recovery for FATAL
>> errors. But with Error Disconnect Recover (EDR) support, DPC
>> configuration is done by firmware, and hence we should expect DPC
>> triggered for both FATAL/NON-FATAL errors. So add support for DPC
>> recovery with NON-FATAL errors.
>>
>> Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
>> ---
>>   drivers/pci/pcie/dpc.c | 11 ++++++++---
>>   1 file changed, 8 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
>> index 6e350149d793..5d328812aea9 100644
>> --- a/drivers/pci/pcie/dpc.c
>> +++ b/drivers/pci/pcie/dpc.c
>> @@ -267,15 +267,20 @@ static void dpc_process_error(struct dpc_dev *dpc)
>>   	/* show RP PIO error detail information */
>>   	if (dpc->rp_extensions && reason == 3 && ext_reason == 0)
>>   		dpc_process_rp_pio_error(dpc);
>> -	else if (reason == 0 &&
>> +	else if (reason <= 2 &&
>>   		 dpc_get_aer_uncorrect_severity(pdev, &info) &&
>>   		 aer_get_device_error_info(pdev, &info)) {
>>   		aer_print_error(pdev, &info);
>>   		pci_cleanup_aer_uncorrect_error_status(pdev);
>> -		pci_aer_clear_fatal_status(pdev);
>> +		if (reason != 1)
>> +			pci_aer_clear_fatal_status(pdev);
> I'm not quite sure I understand the above. If the reason is 1 or 2,
> then the DSP received an error message from something downstream. In
> otherwords, the port was notified an error occured somewhere, but it
> was not the device that detected that error, so we should not expect
> aer_print_error on that device to show anything useful. Right?

My intention here was to just clear the AER registers. EDR spec expects 
OS to clear AER registers during DPC event (irrespective of DPC trigger 
reason). Consider a case where DPC is triggered due to ERR_FATAL 
message, but at the same time the port has some correctable errors 
logged. So before handling DPC, we need to clear the AER registers. 
Since there is no AER driver to handle AER registers in EDR mode, we 
need to clear AER registers in the same code path. But I got your point 
about there is nothing to report/print if the reason != 0. I think the 
correct way to clear all AER registers is to use 
pci_cleanup_aer_error_status_regs(). I will fix this and send a new 
patch set.

>
>>   	}
>>   
>> -	/* We configure DPC so it only triggers on ERR_FATAL */
>> +	/*
>> +	 * Irrespective of whether the DPC event is triggered by
>> +	 * ERR_FATAL or ERR_NONFATAL, since the link is already down,
>> +	 * use the FATAL error recovery path for both cases.
>> +	 */
>>   	pcie_do_recovery(pdev, pci_channel_io_frozen, PCIE_PORT_SERVICE_DPC);
>>   }
>>   
>> -- 
>> 2.21.0
>>
diff mbox series

Patch

diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
index 6e350149d793..5d328812aea9 100644
--- a/drivers/pci/pcie/dpc.c
+++ b/drivers/pci/pcie/dpc.c
@@ -267,15 +267,20 @@  static void dpc_process_error(struct dpc_dev *dpc)
 	/* show RP PIO error detail information */
 	if (dpc->rp_extensions && reason == 3 && ext_reason == 0)
 		dpc_process_rp_pio_error(dpc);
-	else if (reason == 0 &&
+	else if (reason <= 2 &&
 		 dpc_get_aer_uncorrect_severity(pdev, &info) &&
 		 aer_get_device_error_info(pdev, &info)) {
 		aer_print_error(pdev, &info);
 		pci_cleanup_aer_uncorrect_error_status(pdev);
-		pci_aer_clear_fatal_status(pdev);
+		if (reason != 1)
+			pci_aer_clear_fatal_status(pdev);
 	}
 
-	/* We configure DPC so it only triggers on ERR_FATAL */
+	/*
+	 * Irrespective of whether the DPC event is triggered by
+	 * ERR_FATAL or ERR_NONFATAL, since the link is already down,
+	 * use the FATAL error recovery path for both cases.
+	 */
 	pcie_do_recovery(pdev, pci_channel_io_frozen, PCIE_PORT_SERVICE_DPC);
 }