diff mbox series

[1/3] PCI/AER: Use 'Correctable' and 'Uncorrectable' spec terms for errors

Message ID 20231206224231.732765-2-helgaas@kernel.org (mailing list archive)
State Handled Elsewhere
Series PCI/AER: Clean up logging

Commit Message

Bjorn Helgaas Dec. 6, 2023, 10:42 p.m. UTC
From: Bjorn Helgaas <bhelgaas@google.com>

The PCIe spec classifies errors as either "Correctable" or "Uncorrectable".
Previously we printed these as "Corrected" or "Uncorrected".  To avoid
confusion, use the same terms as the spec.

One confusing situation is when one agent detects an error, but another
agent is responsible for recovery, e.g., by re-attempting the operation.
The first agent may log a "correctable" error but it has not yet been
corrected.  The recovery agent must report an uncorrectable error if it is
unable to recover.  If we print the first agent's error as "Corrected", it
gives the false impression that it has already been resolved.

Sample message change:

  - pcieport 0000:00:1c.5: AER: Corrected error received: 0000:00:1c.5
  + pcieport 0000:00:1c.5: AER: Correctable error received: 0000:00:1c.5

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
---
 drivers/pci/pcie/aer.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
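
As a rough illustration of the sample message change above, the sketch below indexes a severity string table to build the log line. It is not the code in drivers/pci/pcie/aer.c: the sketch_* names, the "Unknown" fallback, and the exact format string are hypothetical, and the index values 0/1/2 are assumed to mirror the kernel's AER_NONFATAL, AER_FATAL, and AER_CORRECTABLE constants.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical severity indices; assumed to match the order of the
 * strings in aer_error_severity_string[] from the patch. */
enum sketch_severity {
	SKETCH_NONFATAL = 0,
	SKETCH_FATAL = 1,
	SKETCH_CORRECTABLE = 2,
};

/* Same three strings as the patched table, in the same order. */
const char *const sketch_severity_string[] = {
	"Uncorrectable (Non-Fatal)",
	"Uncorrectable (Fatal)",
	"Correctable",
};

/* Look up the spec term for a severity; out-of-range values fall back
 * to "Unknown" (a choice made for this sketch only). */
const char *sketch_severity_name(unsigned int severity)
{
	size_t n = sizeof(sketch_severity_string) /
		   sizeof(sketch_severity_string[0]);

	return severity < n ? sketch_severity_string[severity] : "Unknown";
}

/* Format a message shaped like the sample in the commit log.
 * Returns the snprintf() result. */
int sketch_format_received(char *buf, size_t len, const char *port,
			   unsigned int severity, const char *source)
{
	return snprintf(buf, len, "pcieport %s: AER: %s error received: %s",
			port, sketch_severity_name(severity), source);
}
```

Calling sketch_format_received() with SKETCH_CORRECTABLE and "0000:00:1c.5" for both port and source fills the buffer with the "+" line from the sample. Because only the table entries change, every message that looks up its severity this way picks up the spec terms without touching the call sites.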

Comments

Jonathan Cameron Dec. 8, 2023, 2:36 p.m. UTC | #1
On Wed,  6 Dec 2023 16:42:29 -0600
Bjorn Helgaas <helgaas@kernel.org> wrote:

> From: Bjorn Helgaas <bhelgaas@google.com>
> 
> The PCIe spec classifies errors as either "Correctable" or "Uncorrectable".
> Previously we printed these as "Corrected" or "Uncorrected".  To avoid
> confusion, use the same terms as the spec.
> 
> One confusing situation is when one agent detects an error, but another
> agent is responsible for recovery, e.g., by re-attempting the operation.
> The first agent may log a "correctable" error but it has not yet been
> corrected.  The recovery agent must report an uncorrectable error if it is
> unable to recover.  If we print the first agent's error as "Corrected", it
> gives the false impression that it has already been resolved.
> 
> Sample message change:
> 
>   - pcieport 0000:00:1c.5: AER: Corrected error received: 0000:00:1c.5
>   + pcieport 0000:00:1c.5: AER: Correctable error received: 0000:00:1c.5
> 
> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Good to tidy this up. FWIW
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Terry Bowman Dec. 12, 2023, 3 p.m. UTC | #2
Hi Bjorn,

Will help prevent confusion. LGTM. 

On 12/6/23 16:42, Bjorn Helgaas wrote:
> From: Bjorn Helgaas <bhelgaas@google.com>
> 
> The PCIe spec classifies errors as either "Correctable" or "Uncorrectable".
> Previously we printed these as "Corrected" or "Uncorrected".  To avoid
> confusion, use the same terms as the spec.
> 
> One confusing situation is when one agent detects an error, but another
> agent is responsible for recovery, e.g., by re-attempting the operation.
> The first agent may log a "correctable" error but it has not yet been
> corrected.  The recovery agent must report an uncorrectable error if it is
> unable to recover.  If we print the first agent's error as "Corrected", it
> gives the false impression that it has already been resolved.
> 
> Sample message change:
> 
>   - pcieport 0000:00:1c.5: AER: Corrected error received: 0000:00:1c.5
>   + pcieport 0000:00:1c.5: AER: Correctable error received: 0000:00:1c.5
> 
> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
> ---
>  drivers/pci/pcie/aer.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index 42a3bd35a3e1..20db80018b5d 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -436,9 +436,9 @@ void pci_aer_exit(struct pci_dev *dev)
>   * AER error strings
>   */
>  static const char *aer_error_severity_string[] = {
> -	"Uncorrected (Non-Fatal)",
> -	"Uncorrected (Fatal)",
> -	"Corrected"
> +	"Uncorrectable (Non-Fatal)",
> +	"Uncorrectable (Fatal)",
> +	"Correctable"
>  };
>  
>  static const char *aer_error_layer[] = {
Bjorn Helgaas Dec. 12, 2023, 9:23 p.m. UTC | #3
On Tue, Dec 12, 2023 at 09:00:24AM -0600, Terry Bowman wrote:
> Hi Bjorn,
> 
> Will help prevent confusion. LGTM. 

Thanks a lot for taking a look at these!  I'd like to give you credit
in the log, e.g., "Reviewed-by: Terry Bowman <Terry.Bowman@amd.com>",
but I'm OCD enough that I don't want to translate "LGTM" into that all
by myself.

If you want that credit (and, I guess, the privilege of being cc'd
when we find that these patches break something :)), just reply again
with that actual "Reviewed-by:" text in it.

Bjorn
Bowman, Terry Dec. 12, 2023, 10:42 p.m. UTC | #4
No problem. You can add my "Reviewed-by". Thanks.

Regards,
Terry
On 12/12/2023 3:23 PM, Bjorn Helgaas wrote:
> On Tue, Dec 12, 2023 at 09:00:24AM -0600, Terry Bowman wrote:
>> Hi Bjorn,
>>
>> Will help prevent confusion. LGTM.
> Thanks a lot for taking a look at these!  I'd like to give you credit
> in the log, e.g., "Reviewed-by: Terry Bowman <Terry.Bowman@amd.com>",
> but I'm OCD enough that I don't want to translate "LGTM" into that all
> by myself.
>
> If you want that credit (and, I guess, the privilege of being cc'd
> when we find that these patches break something :)), just reply again
> with that actual "Reviewed-by:" text in it.
>
> Bjorn
Kuppuswamy Sathyanarayanan Jan. 2, 2024, 7:23 p.m. UTC | #5
On 12/6/2023 2:42 PM, Bjorn Helgaas wrote:
> From: Bjorn Helgaas <bhelgaas@google.com>
> 
> The PCIe spec classifies errors as either "Correctable" or "Uncorrectable".
> Previously we printed these as "Corrected" or "Uncorrected".  To avoid
> confusion, use the same terms as the spec.
> 
> One confusing situation is when one agent detects an error, but another
> agent is responsible for recovery, e.g., by re-attempting the operation.
> The first agent may log a "correctable" error but it has not yet been
> corrected.  The recovery agent must report an uncorrectable error if it is
> unable to recover.  If we print the first agent's error as "Corrected", it
> gives the false impression that it has already been resolved.
> 
> Sample message change:
> 
>   - pcieport 0000:00:1c.5: AER: Corrected error received: 0000:00:1c.5
>   + pcieport 0000:00:1c.5: AER: Correctable error received: 0000:00:1c.5
> 
> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
> ---

Looks good to me.

Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>

>  drivers/pci/pcie/aer.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index 42a3bd35a3e1..20db80018b5d 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -436,9 +436,9 @@ void pci_aer_exit(struct pci_dev *dev)
>   * AER error strings
>   */
>  static const char *aer_error_severity_string[] = {
> -	"Uncorrected (Non-Fatal)",
> -	"Uncorrected (Fatal)",
> -	"Corrected"
> +	"Uncorrectable (Non-Fatal)",
> +	"Uncorrectable (Fatal)",
> +	"Correctable"
>  };
>  
>  static const char *aer_error_layer[] = {

Patch

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 42a3bd35a3e1..20db80018b5d 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -436,9 +436,9 @@  void pci_aer_exit(struct pci_dev *dev)
  * AER error strings
  */
 static const char *aer_error_severity_string[] = {
-	"Uncorrected (Non-Fatal)",
-	"Uncorrected (Fatal)",
-	"Corrected"
+	"Uncorrectable (Non-Fatal)",
+	"Uncorrectable (Fatal)",
+	"Correctable"
 };
 
 static const char *aer_error_layer[] = {