diff mbox series

[V1] PCI/AER: Configure ECRC only AER is native

Message ID 20230111203116.4896-1-vidyas@nvidia.com
State New
Headers show
Series [V1] PCI/AER: Configure ECRC only AER is native | expand

Commit Message

Vidya Sagar Jan. 11, 2023, 8:31 p.m. UTC
As the ECRC configuration bits are part of AER registers, configure
ECRC only if AER is natively owned by the kernel.

Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
---
 drivers/pci/pcie/aer.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Kuppuswamy Sathyanarayanan Jan. 11, 2023, 9:42 p.m. UTC | #1
On 1/11/23 12:31 PM, Vidya Sagar wrote:
> As the ECRC configuration bits are part of AER registers, configure
> ECRC only if AER is natively owned by the kernel.

ecrc command line option takes "bios/on/off" as possible options. It
does not clarify whether "on/off" choices can only be used if AER is
owned by OS or it can override the ownership of ECRC configuration 
similar to pcie_ports=native option. Maybe that needs to be clarified.

> 
> Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
> ---
>  drivers/pci/pcie/aer.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index e2d8a74f83c3..730b47bdcdef 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -184,6 +184,9 @@ static int disable_ecrc_checking(struct pci_dev *dev)
>   */
>  void pcie_set_ecrc_checking(struct pci_dev *dev)
>  {
> +	if (!pcie_aer_is_native(dev))
> +		return;
> +
>  	switch (ecrc_policy) {
>  	case ECRC_POLICY_DEFAULT:
>  		return;
Bjorn Helgaas Jan. 11, 2023, 11:10 p.m. UTC | #2
On Wed, Jan 11, 2023 at 01:42:21PM -0800, Sathyanarayanan Kuppuswamy wrote:
> On 1/11/23 12:31 PM, Vidya Sagar wrote:
> > As the ECRC configuration bits are part of AER registers, configure
> > ECRC only if AER is natively owned by the kernel.
> 
> ecrc command line option takes "bios/on/off" as possible options. It
> does not clarify whether "on/off" choices can only be used if AER is
> owned by OS or it can override the ownership of ECRC configuration 
> similar to pcie_ports=native option. Maybe that needs to be clarified.

Good point, what do you think of an update like this:

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 6cfa6e3996cf..f7b40a439194 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4296,7 +4296,9 @@
 				specified, e.g., 12@pci:8086:9c22:103c:198f
 				for 4096-byte alignment.
 		ecrc=		Enable/disable PCIe ECRC (transaction layer
-				end-to-end CRC checking).
+				end-to-end CRC checking).  Only effective
+				if OS has native AER control (either granted by
+				ACPI _OSC or forced via "pcie_ports=native").
 				bios: Use BIOS/firmware settings. This is the
 				the default.
 				off: Turn ECRC off

I don't know whether the "ecrc=" parameter is really needed.  If we
were adding it today, I would ask "why not enable ECRC wherever it is
supported?"  If there are devices where it's broken, we could always
add quirks to disable it on a case-by-case basis.

But I think the patch below is the right thing to do for now.  Vidya,
did you trip over an issue because of this, e.g., a conflict between
firmware use of AER and Linux use of it?  If so, maybe we could
mention a symptom on the commit log.  But my guess is you probably
found this by inspection.

Bjorn

> > Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
> > ---
> >  drivers/pci/pcie/aer.c | 3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> > index e2d8a74f83c3..730b47bdcdef 100644
> > --- a/drivers/pci/pcie/aer.c
> > +++ b/drivers/pci/pcie/aer.c
> > @@ -184,6 +184,9 @@ static int disable_ecrc_checking(struct pci_dev *dev)
> >   */
> >  void pcie_set_ecrc_checking(struct pci_dev *dev)
> >  {
> > +	if (!pcie_aer_is_native(dev))
> > +		return;
> > +
> >  	switch (ecrc_policy) {
> >  	case ECRC_POLICY_DEFAULT:
> >  		return;
> 
> -- 
> Sathyanarayanan Kuppuswamy
> Linux Kernel Developer
Kuppuswamy Sathyanarayanan Jan. 11, 2023, 11:27 p.m. UTC | #3
Hi,

On 1/11/23 3:10 PM, Bjorn Helgaas wrote:
> On Wed, Jan 11, 2023 at 01:42:21PM -0800, Sathyanarayanan Kuppuswamy wrote:
>> On 1/11/23 12:31 PM, Vidya Sagar wrote:
>>> As the ECRC configuration bits are part of AER registers, configure
>>> ECRC only if AER is natively owned by the kernel.
>>
>> ecrc command line option takes "bios/on/off" as possible options. It
>> does not clarify whether "on/off" choices can only be used if AER is
>> owned by OS or it can override the ownership of ECRC configuration 
>> similar to pcie_ports=native option. Maybe that needs to be clarified.
> 
> Good point, what do you think of an update like this:
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 6cfa6e3996cf..f7b40a439194 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -4296,7 +4296,9 @@
>  				specified, e.g., 12@pci:8086:9c22:103c:198f
>  				for 4096-byte alignment.
>  		ecrc=		Enable/disable PCIe ECRC (transaction layer
> -				end-to-end CRC checking).
> +				end-to-end CRC checking).  Only effective
> +				if OS has native AER control (either granted by
> +				ACPI _OSC or forced via "pcie_ports=native").
>  				bios: Use BIOS/firmware settings. This is the
>  				the default.
>  				off: Turn ECRC off

Looks fine. But do we even need "bios" option? Since it is the default
value, I am not sure why we need to list that as an option again. IMO
this could be removed.

> 
> I don't know whether the "ecrc=" parameter is really needed.  If we
> were adding it today, I would ask "why not enable ECRC wherever it is
> supported?"  If there are devices where it's broken, we could always
> add quirks to disable it on a case-by-case basis.

Checking the original patch which added it, it looks like the intention
is to give option to boost performance over integrity.

commit 43c16408842b0eeb367c23a6fa540ce69f99e347
Author: Andrew Patterson <andrew.patterson@hp.com>
Date:   Wed Apr 22 16:52:09 2009 -0600

    PCI: Add support for turning PCIe ECRC on or off
    
    Adds support for PCI Express transaction layer end-to-end CRC checking
    (ECRC).  This patch will enable/disable ECRC checking by setting/clearing
    the ECRC Check Enable and/or ECRC Generation Enable bits for devices that
    support ECRC.
    
    The ECRC setting is controlled by the "pci=ecrc=<policy>" command-line
    option. If this option is not set or is set to 'bios", the enable and
    generation bits are left in whatever state that firmware/BIOS set them to.
    The "off" setting turns them off, and the "on" option turns them on (if the
    device supports it).
    
    Turning ECRC on or off can be a data integrity versus performance
    tradeoff.  In theory, turning it on will catch more data errors, turning
    it off means possibly better performance since CRC does not need to be
    calculated by the PCIe hardware and packet sizes are reduced.


> 
> But I think the patch below is the right thing to do for now.  Vidya,

Agree.

> did you trip over an issue because of this, e.g., a conflict between
> firmware use of AER and Linux use of it?  If so, maybe we could
> mention a symptom on the commit log.  But my guess is you probably
> found this by inspection.
> 
> Bjorn
> 
>>> Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
>>> ---
>>>  drivers/pci/pcie/aer.c | 3 +++
>>>  1 file changed, 3 insertions(+)
>>>
>>> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
>>> index e2d8a74f83c3..730b47bdcdef 100644
>>> --- a/drivers/pci/pcie/aer.c
>>> +++ b/drivers/pci/pcie/aer.c
>>> @@ -184,6 +184,9 @@ static int disable_ecrc_checking(struct pci_dev *dev)
>>>   */
>>>  void pcie_set_ecrc_checking(struct pci_dev *dev)
>>>  {
>>> +	if (!pcie_aer_is_native(dev))
>>> +		return;
>>> +
>>>  	switch (ecrc_policy) {
>>>  	case ECRC_POLICY_DEFAULT:
>>>  		return;
>>
>> -- 
>> Sathyanarayanan Kuppuswamy
>> Linux Kernel Developer
Vidya Sagar Jan. 12, 2023, 3:33 a.m. UTC | #4
On 1/12/2023 4:57 AM, Sathyanarayanan Kuppuswamy wrote:
> External email: Use caution opening links or attachments
> 
> 
> Hi,
> 
> On 1/11/23 3:10 PM, Bjorn Helgaas wrote:
>> On Wed, Jan 11, 2023 at 01:42:21PM -0800, Sathyanarayanan Kuppuswamy wrote:
>>> On 1/11/23 12:31 PM, Vidya Sagar wrote:
>>>> As the ECRC configuration bits are part of AER registers, configure
>>>> ECRC only if AER is natively owned by the kernel.
>>>
>>> ecrc command line option takes "bios/on/off" as possible options. It
>>> does not clarify whether "on/off" choices can only be used if AER is
>>> owned by OS or it can override the ownership of ECRC configuration
>>> similar to pcie_ports=native option. Maybe that needs to be clarified.
>>
>> Good point, what do you think of an update like this:
>>
>> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
>> index 6cfa6e3996cf..f7b40a439194 100644
>> --- a/Documentation/admin-guide/kernel-parameters.txt
>> +++ b/Documentation/admin-guide/kernel-parameters.txt
>> @@ -4296,7 +4296,9 @@
>>                                specified, e.g., 12@pci:8086:9c22:103c:198f
>>                                for 4096-byte alignment.
>>                ecrc=           Enable/disable PCIe ECRC (transaction layer
>> -                             end-to-end CRC checking).
>> +                             end-to-end CRC checking).  Only effective
>> +                             if OS has native AER control (either granted by
>> +                             ACPI _OSC or forced via "pcie_ports=native").
>>                                bios: Use BIOS/firmware settings. This is the
>>                                the default.
>>                                off: Turn ECRC off
I'm also fine with this change. I'll take it in V2.
> 
> Looks fine. But do we even need "bios" option? Since it is the default
> value, I am not sure why we need to list that as an option again. IMO
> this could be removed.
I think we still need bios option. For example, consider a system where 
BIOS needs to keep ECRC enabled for integrity reasons but if kernel 
doesn't want it for perf reasons, then, kernel can always use 'ecrc=off' 
option.
> 
>>
>> I don't know whether the "ecrc=" parameter is really needed.  If we
>> were adding it today, I would ask "why not enable ECRC wherever it is
>> supported?"  If there are devices where it's broken, we could always
>> add quirks to disable it on a case-by-case basis.
> 
> Checking the original patch which added it, it looks like the intention
> is to give option to boost performance over integrity.
> 
> commit 43c16408842b0eeb367c23a6fa540ce69f99e347
> Author: Andrew Patterson <andrew.patterson@hp.com>
> Date:   Wed Apr 22 16:52:09 2009 -0600
> 
>      PCI: Add support for turning PCIe ECRC on or off
> 
>      Adds support for PCI Express transaction layer end-to-end CRC checking
>      (ECRC).  This patch will enable/disable ECRC checking by setting/clearing
>      the ECRC Check Enable and/or ECRC Generation Enable bits for devices that
>      support ECRC.
> 
>      The ECRC setting is controlled by the "pci=ecrc=<policy>" command-line
>      option. If this option is not set or is set to 'bios", the enable and
>      generation bits are left in whatever state that firmware/BIOS set them to.
>      The "off" setting turns them off, and the "on" option turns them on (if the
>      device supports it).
> 
>      Turning ECRC on or off can be a data integrity versus performance
>      tradeoff.  In theory, turning it on will catch more data errors, turning
>      it off means possibly better performance since CRC does not need to be
>      calculated by the PCIe hardware and packet sizes are reduced.
> 
> 
>>
>> But I think the patch below is the right thing to do for now.  Vidya,
> 
> Agree.
> 
>> did you trip over an issue because of this, e.g., a conflict between
>> firmware use of AER and Linux use of it?  If so, maybe we could
>> mention a symptom on the commit log.  But my guess is you probably
>> found this by inspection.
Not really. I was just checking when does kernel touch ECRC settings and 
happened to find this where it configures ECRC irrespective of its 
ownership of AER registers.

>>
>> Bjorn
>>
>>>> Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
>>>> ---
>>>>   drivers/pci/pcie/aer.c | 3 +++
>>>>   1 file changed, 3 insertions(+)
>>>>
>>>> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
>>>> index e2d8a74f83c3..730b47bdcdef 100644
>>>> --- a/drivers/pci/pcie/aer.c
>>>> +++ b/drivers/pci/pcie/aer.c
>>>> @@ -184,6 +184,9 @@ static int disable_ecrc_checking(struct pci_dev *dev)
>>>>    */
>>>>   void pcie_set_ecrc_checking(struct pci_dev *dev)
>>>>   {
>>>> +   if (!pcie_aer_is_native(dev))
>>>> +           return;
>>>> +
>>>>      switch (ecrc_policy) {
>>>>      case ECRC_POLICY_DEFAULT:
>>>>              return;
>>>
>>> --
>>> Sathyanarayanan Kuppuswamy
>>> Linux Kernel Developer
> 
> --
> Sathyanarayanan Kuppuswamy
> Linux Kernel Developer
Kuppuswamy Sathyanarayanan Jan. 12, 2023, 3:48 a.m. UTC | #5
On 1/11/23 7:33 PM, Vidya Sagar wrote:
> I think we still need bios option. For example, consider a system where BIOS needs to keep ECRC enabled for integrity reasons but if kernel doesn't want it for perf reasons, then, kernel can always use 'ecrc=off' option.

I agree that "on" and "off" option makes sense. Since the kernel defaults ecrc setting to "bios", why again allow it as a command line option?
Vidya Sagar Jan. 12, 2023, 4:59 a.m. UTC | #6
On 1/12/2023 9:18 AM, Sathyanarayanan Kuppuswamy wrote:
> External email: Use caution opening links or attachments
> 
> 
> On 1/11/23 7:33 PM, Vidya Sagar wrote:
>> I think we still need bios option. For example, consider a system where BIOS needs to keep ECRC enabled for integrity reasons but if kernel doesn't want it for perf reasons, then, kernel can always use 'ecrc=off' option.
> 
> I agree that "on" and "off" option makes sense. Since the kernel defaults ecrc setting to "bios", why again allow it as a command line option?

Agree. "on" and "off" are fine but "default" is redundant. Do you want 
me to push a change to remove that as part of this patch itself? I think
it is more like a cleanup and should go separately.

> 
> --
> Sathyanarayanan Kuppuswamy
> Linux Kernel Developer
Kuppuswamy Sathyanarayanan Jan. 12, 2023, 5:06 a.m. UTC | #7
On 1/11/23 8:59 PM, Vidya Sagar wrote:
> 
> 
> On 1/12/2023 9:18 AM, Sathyanarayanan Kuppuswamy wrote:
>> External email: Use caution opening links or attachments
>>
>>
>> On 1/11/23 7:33 PM, Vidya Sagar wrote:
>>> I think we still need bios option. For example, consider a system where BIOS needs to keep ECRC enabled for integrity reasons but if kernel doesn't want it for perf reasons, then, kernel can always use 'ecrc=off' option.
>>
>> I agree that "on" and "off" option makes sense. Since the kernel defaults ecrc setting to "bios", why again allow it as a command line option?
> 
> Agree. "on" and "off" are fine but "default" is redundant. Do you want me to push a change to remove that as part of this patch itself? I think
> it is more like a cleanup and should go separately.

IMO, the "bios" option cleanup and command line update from Bjorn can be in one patch, and your change could be a separate patch. But it is
up to you and Bjorn.

> 
>>
>> -- 
>> Sathyanarayanan Kuppuswamy
>> Linux Kernel Developer
Vidya Sagar Jan. 12, 2023, 7:12 a.m. UTC | #8
On 1/12/2023 10:36 AM, Sathyanarayanan Kuppuswamy wrote:
> External email: Use caution opening links or attachments
> 
> 
> On 1/11/23 8:59 PM, Vidya Sagar wrote:
>>
>>
>> On 1/12/2023 9:18 AM, Sathyanarayanan Kuppuswamy wrote:
>>> External email: Use caution opening links or attachments
>>>
>>>
>>> On 1/11/23 7:33 PM, Vidya Sagar wrote:
>>>> I think we still need bios option. For example, consider a system where BIOS needs to keep ECRC enabled for integrity reasons but if kernel doesn't want it for perf reasons, then, kernel can always use 'ecrc=off' option.
>>>
>>> I agree that "on" and "off" option makes sense. Since the kernel defaults ecrc setting to "bios", why again allow it as a command line option?
>>
>> Agree. "on" and "off" are fine but "default" is redundant. Do you want me to push a change to remove that as part of this patch itself? I think
>> it is more like a cleanup and should go separately.
> 
> IMO, the "bios" option cleanup and command line update from Bjorn can be in one patch, and your change could be a separate patch. But it is
> up to you and Bjorn.

I think Bjorn's command line suggestion should go along with my patch 
otherwise the ECRC control through command line doesn't work if OS 
doesn't own the AER. So, it helps to make it explicit that the 'ecrc=' 
option works only if either kernel has native AER control or 
'pcie_ports' is set to 'native'

> 
>>
>>>
>>> --
>>> Sathyanarayanan Kuppuswamy
>>> Linux Kernel Developer
> 
> --
> Sathyanarayanan Kuppuswamy
> Linux Kernel Developer
Bjorn Helgaas Jan. 12, 2023, 6:23 p.m. UTC | #9
On Wed, Jan 11, 2023 at 03:27:51PM -0800, Sathyanarayanan Kuppuswamy wrote:
> On 1/11/23 3:10 PM, Bjorn Helgaas wrote:
> > On Wed, Jan 11, 2023 at 01:42:21PM -0800, Sathyanarayanan Kuppuswamy wrote:
> >> On 1/11/23 12:31 PM, Vidya Sagar wrote:
> >>> As the ECRC configuration bits are part of AER registers, configure
> >>> ECRC only if AER is natively owned by the kernel.
> >>
> >> ecrc command line option takes "bios/on/off" as possible options. It
> >> does not clarify whether "on/off" choices can only be used if AER is
> >> owned by OS or it can override the ownership of ECRC configuration 
> >> similar to pcie_ports=native option. Maybe that needs to be clarified.
> > 
> > Good point, what do you think of an update like this:
> > 
> > diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> > index 6cfa6e3996cf..f7b40a439194 100644
> > --- a/Documentation/admin-guide/kernel-parameters.txt
> > +++ b/Documentation/admin-guide/kernel-parameters.txt
> > @@ -4296,7 +4296,9 @@
> >  				specified, e.g., 12@pci:8086:9c22:103c:198f
> >  				for 4096-byte alignment.
> >  		ecrc=		Enable/disable PCIe ECRC (transaction layer
> > -				end-to-end CRC checking).
> > +				end-to-end CRC checking).  Only effective
> > +				if OS has native AER control (either granted by
> > +				ACPI _OSC or forced via "pcie_ports=native").
> >  				bios: Use BIOS/firmware settings. This is the
> >  				the default.
> >  				off: Turn ECRC off
> 
> Looks fine. But do we even need "bios" option? Since it is the default
> value, I am not sure why we need to list that as an option again. IMO
> this could be removed.

I agree, it seems pointless.

> > I don't know whether the "ecrc=" parameter is really needed.  If we
> > were adding it today, I would ask "why not enable ECRC wherever it is
> > supported?"  If there are devices where it's broken, we could always
> > add quirks to disable it on a case-by-case basis.
> 
> Checking the original patch which added it, it looks like the intention
> is to give option to boost performance over integrity.
> 
> commit 43c16408842b0eeb367c23a6fa540ce69f99e347
> Author: Andrew Patterson <andrew.patterson@hp.com>
> Date:   Wed Apr 22 16:52:09 2009 -0600
> 
>     PCI: Add support for turning PCIe ECRC on or off
>     
>     Adds support for PCI Express transaction layer end-to-end CRC checking
>     (ECRC).  This patch will enable/disable ECRC checking by setting/clearing
>     the ECRC Check Enable and/or ECRC Generation Enable bits for devices that
>     support ECRC.
>     
>     The ECRC setting is controlled by the "pci=ecrc=<policy>" command-line
>     option. If this option is not set or is set to 'bios", the enable and
>     generation bits are left in whatever state that firmware/BIOS set them to.
>     The "off" setting turns them off, and the "on" option turns them on (if the
>     device supports it).
>     
>     Turning ECRC on or off can be a data integrity versus performance
>     tradeoff.  In theory, turning it on will catch more data errors, turning
>     it off means possibly better performance since CRC does not need to be
>     calculated by the PCIe hardware and packet sizes are reduced.

Ah, right, and I think I was even part of the conversation when this
was added :)

I'm not sure I would make the same choice today, though.  IMHO it's
kind of hard to defend choosing performance over data integrity.

If a platform really wants to sacrifice integrity for performance, it
could retain control of AER, and after Vidya's patch, Linux will leave
the ECRC configuration alone.

Straw-man: If Linux owns AER and ECRC is supported, enable ECRC by
default.  Retain "ecrc=off" to turn it off, but drop a note in dmesg
and taint the kernel.

Bjorn
diff mbox series

Patch

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index e2d8a74f83c3..730b47bdcdef 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -184,6 +184,9 @@  static int disable_ecrc_checking(struct pci_dev *dev)
  */
 void pcie_set_ecrc_checking(struct pci_dev *dev)
 {
+	if (!pcie_aer_is_native(dev))
+		return;
+
 	switch (ecrc_policy) {
 	case ECRC_POLICY_DEFAULT:
 		return;