[1/4] PCI/AER: Return approrpiate value when AER is not supported

Message ID 20171219210643.24615-1-keith.busch@intel.com
State Accepted
Headers show
Series
  • [1/4] PCI/AER: Return approrpiate value when AER is not supported
Related show

Commit Message

Keith Busch Dec. 19, 2017, 9:06 p.m.
Getting the AER information is documented to return 0 when it failed to
get the information.

Signed-off-by: Keith Busch <keith.busch@intel.com>
---
 drivers/pci/pcie/aer/aerdrv_core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Bjorn Helgaas Dec. 21, 2017, 1:04 a.m. | #1
s/approrpiate/appropriate/

(But maybe should be restated altogether, see below.)

On Tue, Dec 19, 2017 at 02:06:40PM -0700, Keith Busch wrote:
> Getting the AER information is documented to return 0 when it failed to
> get the information.

I think this case is either impossible (if we only call this function
for devices known to support AER), or it fixes an actual bug (the
caller would call aer_print_error() when it shouldn't, and potentially
print garbage).  Right?

If the former, I vote for removing the test.  If the latter, the
changelog should mention that it fixes a bug.

> Signed-off-by: Keith Busch <keith.busch@intel.com>
> ---
>  drivers/pci/pcie/aer/aerdrv_core.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/pcie/aer/aerdrv_core.c b/drivers/pci/pcie/aer/aerdrv_core.c
> index 744805232155..ea0dc1cc7fc7 100644
> --- a/drivers/pci/pcie/aer/aerdrv_core.c
> +++ b/drivers/pci/pcie/aer/aerdrv_core.c
> @@ -660,7 +660,7 @@ static int get_device_error_info(struct pci_dev *dev, struct aer_err_info *info)
>  
>  	/* The device might not support AER */
>  	if (!pos)
> -		return 1;
> +		return 0;
>  
>  	if (info->severity == AER_CORRECTABLE) {
>  		pci_read_config_dword(dev, pos + PCI_ERR_COR_STATUS,
> -- 
> 2.13.6
>
Dongdong Liu Dec. 21, 2017, 3:53 a.m. | #2
Current code APEI AER and native AER seems a little different.
APEI AER ,the device must have AER capability, then can do recovery.
Native AER seems the device under the ROOT PORT can do not have the
AER capability. Current patch seems can fix the Inconsistency.
But as PCIe 4.0r1.0 6.2.5 Sequence of Device Error Signaling and Logging
Operations describe,It seems the device can implement the Device Status reg,
but do not have AER capability.So for such devices, can we use AER driver to
call device error handler callback to do recovery?

Thanks,
Dongdong
在 2017/12/21 9:04, Bjorn Helgaas 写道:
> s/approrpiate/appropriate/
>
> (But maybe should be restated altogether, see below.)
>
> On Tue, Dec 19, 2017 at 02:06:40PM -0700, Keith Busch wrote:
>> Getting the AER information is documented to return 0 when it failed to
>> get the information.
>
> I think this case is either impossible (if we only call this function
> for devices known to support AER), or it fixes an actual bug (the
> caller would call aer_print_error() when it shouldn't, and potentially
> print garbage).  Right?
>
> If the former, I vote for removing the test.  If the latter, the
> changelog should mention that it fixes a bug.
>
>> Signed-off-by: Keith Busch <keith.busch@intel.com>
>> ---
>>  drivers/pci/pcie/aer/aerdrv_core.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/pci/pcie/aer/aerdrv_core.c b/drivers/pci/pcie/aer/aerdrv_core.c
>> index 744805232155..ea0dc1cc7fc7 100644
>> --- a/drivers/pci/pcie/aer/aerdrv_core.c
>> +++ b/drivers/pci/pcie/aer/aerdrv_core.c
>> @@ -660,7 +660,7 @@ static int get_device_error_info(struct pci_dev *dev, struct aer_err_info *info)
>>
>>  	/* The device might not support AER */
>>  	if (!pos)
>> -		return 1;
>> +		return 0;
>>
>>  	if (info->severity == AER_CORRECTABLE) {
>>  		pci_read_config_dword(dev, pos + PCI_ERR_COR_STATUS,
>> --
>> 2.13.6
>>
>
> .
>
Keith Busch Dec. 21, 2017, 2:59 p.m. | #3
On Wed, Dec 20, 2017 at 05:04:52PM -0800, Bjorn Helgaas wrote:
> On Tue, Dec 19, 2017 at 02:06:40PM -0700, Keith Busch wrote:
> > Getting the AER information is documented to return 0 when it failed to
> > get the information.
> 
> I think this case is either impossible (if we only call this function
> for devices known to support AER), or it fixes an actual bug (the
> caller would call aer_print_error() when it shouldn't, and potentially
> print garbage).  Right?
> 
> If the former, I vote for removing the test.  If the latter, the
> changelog should mention that it fixes a bug.

I just spotted this mistake when I was changing the function to
non-static. I've never observed bogus data printed in real life.

In the current AER handling, the only way we could incorrectly get there
is if the root port's Error Source ID Register somehow contains the ID
of a device that doesn't have AER capabilities, which would of course
be broken.

So if we trust hardware, we should are safe to remove the check.
Bjorn Helgaas March 20, 2018, 11:02 p.m. | #4
On Tue, Dec 19, 2017 at 02:06:40PM -0700, Keith Busch wrote:
> Getting the AER information is documented to return 0 when it failed to
> get the information.
> 
> Signed-off-by: Keith Busch <keith.busch@intel.com>

Keith, I'm sorry, but I lost track of where we're at with this series.
Can you repost a fresh version if it's still applicable?

Don't worry about rebasing it to anything other than v4.16-rc1 (my
"master" branch).  I might have added some conflicting things, but
I'll take care of resolving any conflicts.

> ---
>  drivers/pci/pcie/aer/aerdrv_core.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/pcie/aer/aerdrv_core.c b/drivers/pci/pcie/aer/aerdrv_core.c
> index 744805232155..ea0dc1cc7fc7 100644
> --- a/drivers/pci/pcie/aer/aerdrv_core.c
> +++ b/drivers/pci/pcie/aer/aerdrv_core.c
> @@ -660,7 +660,7 @@ static int get_device_error_info(struct pci_dev *dev, struct aer_err_info *info)
>  
>  	/* The device might not support AER */
>  	if (!pos)
> -		return 1;
> +		return 0;
>  
>  	if (info->severity == AER_CORRECTABLE) {
>  		pci_read_config_dword(dev, pos + PCI_ERR_COR_STATUS,
> -- 
> 2.13.6
>
Keith Busch March 21, 2018, 10:27 p.m. | #5
On Tue, Mar 20, 2018 at 04:02:01PM -0700, Bjorn Helgaas wrote:
> On Tue, Dec 19, 2017 at 02:06:40PM -0700, Keith Busch wrote:
> > Getting the AER information is documented to return 0 when it failed to
> > get the information.
> > 
> > Signed-off-by: Keith Busch <keith.busch@intel.com>
> 
> Keith, I'm sorry, but I lost track of where we're at with this series.
> Can you repost a fresh version if it's still applicable?
> 
> Don't worry about rebasing it to anything other than v4.16-rc1 (my
> "master" branch).  I might have added some conflicting things, but
> I'll take care of resolving any conflicts.

No problem, I think this is still applicable, and can be done without
conflicting with what Oza and Sinan are working on (it took a few tries,
but it finally clicked for me on how that works).

I'll rebase a new series and should have it posted by end of week.

Thanks,
Keith

Patch

diff --git a/drivers/pci/pcie/aer/aerdrv_core.c b/drivers/pci/pcie/aer/aerdrv_core.c
index 744805232155..ea0dc1cc7fc7 100644
--- a/drivers/pci/pcie/aer/aerdrv_core.c
+++ b/drivers/pci/pcie/aer/aerdrv_core.c
@@ -660,7 +660,7 @@  static int get_device_error_info(struct pci_dev *dev, struct aer_err_info *info)
 
 	/* The device might not support AER */
 	if (!pos)
-		return 1;
+		return 0;
 
 	if (info->severity == AER_CORRECTABLE) {
 		pci_read_config_dword(dev, pos + PCI_ERR_COR_STATUS,