diff mbox series

PCI/ASPM: fix unexpected behavior when re-enabling L1

Message ID 99e1891d-cd15-5e7b-6ac8-8c6dc5d138ec@gmail.com
State New
Headers show
Series PCI/ASPM: fix unexpected behavior when re-enabling L1 | expand

Commit Message

Heiner Kallweit Aug. 26, 2023, 11:10 a.m. UTC
After the referenced commit we may see L1 sub-states being active
unexpectedly. Following scenario as an example:
r8169 disables L1 because of known hardware issues on a number of
systems. Implicitly L1.1 and L1.2 are disabled too.
On my system L1 and L1.1 work fine, but L1.2 causes missed
rx packets. Therefore I write 1 to aspm_l1_1.
This removes ASPM_STATE_L1 from the disabled modes and therefore
unexpectedly enables also L1.2. So return to the old behavior.

A comment in the commit message of the referenced change correctly points
out that this behavior is inconsistent with aspm_attr_store_common().
So change aspm_attr_store_common() accordingly.

Fixes: fb097dcd5a28 ("PCI/ASPM: Disable only ASPM_STATE_L1 when driver disables L1")
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
---
 drivers/pci/pcie/aspm.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Comments

Bjorn Helgaas Oct. 2, 2023, 3:14 p.m. UTC | #1
[+cc Sathy, Lukas]

On Sat, Aug 26, 2023 at 01:10:35PM +0200, Heiner Kallweit wrote:
> After the referenced commit we may see L1 sub-states being active
> unexpectedly. Following scenario as an example:
> r8169 disables L1 because of known hardware issues on a number of
> systems. Implicitly L1.1 and L1.2 are disabled too.
> On my system L1 and L1.1 work fine, but L1.2 causes missed
> rx packets. Therefore I write 1 to aspm_l1_1.
> This removes ASPM_STATE_L1 from the disabled modes and therefore
> unexpectedly enables also L1.2. So return to the old behavior.
> 
> A comment in the commit message of the referenced change correctly points
> out that this behavior is inconsistent with aspm_attr_store_common().
> So change aspm_attr_store_common() accordingly.

I think we should split this into a pure revert of fb097dcd5a28 with
the description of the unintended consequence, followed by another
patch to change aspm_attr_store_common().

I guess the existing aspm_attr_store_common() behavior allows a
similar unexpected behavior?  For example, in this sequence:

  - Write 0 to "l1_aspm" to disable L1
  - Write 1 to "l1_1_aspm" to enable L1.1

does L1.2 get implicitly enabled as well even though that's clearly
not what the user intended?

There's also the separate question of how the sysfs file and the
pci_disable_link_state() API should interact.  Drivers use that API
when they know about a defect in their device, but the user can
override that via syfs.

In [1], we have a similar situation with D3cold support, where we're
thinking that we should not allow a user to use sysfs to override that
driver knowledge.

Bjorn

[1] https://lore.kernel.org/r/b8a7f4af2b73f6b506ad8ddee59d747cbf834606.1695025365.git.lukas@wunner.de

> Fixes: fb097dcd5a28 ("PCI/ASPM: Disable only ASPM_STATE_L1 when driver disables L1")
> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
> ---
>  drivers/pci/pcie/aspm.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> index 3dafba0b5..6d3788257 100644
> --- a/drivers/pci/pcie/aspm.c
> +++ b/drivers/pci/pcie/aspm.c
> @@ -1063,7 +1063,7 @@ static int __pci_disable_link_state(struct pci_dev *pdev, int state, bool sem)
>  	if (state & PCIE_LINK_STATE_L0S)
>  		link->aspm_disable |= ASPM_STATE_L0S;
>  	if (state & PCIE_LINK_STATE_L1)
> -		link->aspm_disable |= ASPM_STATE_L1;
> +		link->aspm_disable |= ASPM_STATE_L1 | ASPM_STATE_L1SS;
>  	if (state & PCIE_LINK_STATE_L1_1)
>  		link->aspm_disable |= ASPM_STATE_L1_1;
>  	if (state & PCIE_LINK_STATE_L1_2)
> @@ -1251,6 +1251,8 @@ static ssize_t aspm_attr_store_common(struct device *dev,
>  			link->aspm_disable &= ~ASPM_STATE_L1;
>  	} else {
>  		link->aspm_disable |= state;
> +		if (state & ASPM_STATE_L1)
> +			link->aspm_disable |= ASPM_STATE_L1SS;
>  	}
>  
>  	pcie_config_aspm_link(link, policy_to_aspm_state(link));
> -- 
> 2.42.0
>
Kuppuswamy Sathyanarayanan Oct. 3, 2023, 3:33 a.m. UTC | #2
On 10/2/2023 8:14 AM, Bjorn Helgaas wrote:
> [+cc Sathy, Lukas]
> 
> On Sat, Aug 26, 2023 at 01:10:35PM +0200, Heiner Kallweit wrote:
>> After the referenced commit we may see L1 sub-states being active
>> unexpectedly. Following scenario as an example:
>> r8169 disables L1 because of known hardware issues on a number of
>> systems. Implicitly L1.1 and L1.2 are disabled too.
>> On my system L1 and L1.1 work fine, but L1.2 causes missed
>> rx packets. Therefore I write 1 to aspm_l1_1.
>> This removes ASPM_STATE_L1 from the disabled modes and therefore
>> unexpectedly enables also L1.2. So return to the old behavior.

IIUC, the above-mentioned SysFS issue will be fixed by your change to
aspm_attr_store_common(), right?

Can you elaborate why you need the following change?

>> @@ -1063,7 +1063,7 @@ static int __pci_disable_link_state(struct pci_dev *pdev, int state, bool sem)
>>  	if (state & PCIE_LINK_STATE_L1)
>> -		link->aspm_disable |= ASPM_STATE_L1;
>> +		link->aspm_disable |= ASPM_STATE_L1 | ASPM_STATE_L1SS;

>>
>> A comment in the commit message of the referenced change correctly points
>> out that this behavior is inconsistent with aspm_attr_store_common().
>> So change aspm_attr_store_common() accordingly.
> 
> I think we should split this into a pure revert of fb097dcd5a28 with
> the description of the unintended consequence, followed by another
> patch to change aspm_attr_store_common().
> 

I agree, the revert and new change should be split into two patches.

> I guess the existing aspm_attr_store_common() behavior allows a
> similar unexpected behavior?  For example, in this sequence:
> 
>   - Write 0 to "l1_aspm" to disable L1
>   - Write 1 to "l1_1_aspm" to enable L1.1
> 
> does L1.2 get implicitly enabled as well even though that's clearly
> not what the user intended?
> 
> There's also the separate question of how the sysfs file and the
> pci_disable_link_state() API should interact.  Drivers use that API
> when they know about a defect in their device, but the user can
> override that via syfs.
> 
> In [1], we have a similar situation with D3cold support, where we're
> thinking that we should not allow a user to use sysfs to override that
> driver knowledge.
> 
> Bjorn
> 
> [1] https://lore.kernel.org/r/b8a7f4af2b73f6b506ad8ddee59d747cbf834606.1695025365.git.lukas@wunner.de
> 
>> Fixes: fb097dcd5a28 ("PCI/ASPM: Disable only ASPM_STATE_L1 when driver disables L1")
>> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
>> ---
>>  drivers/pci/pcie/aspm.c | 4 +++-
>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
>> index 3dafba0b5..6d3788257 100644
>> --- a/drivers/pci/pcie/aspm.c
>> +++ b/drivers/pci/pcie/aspm.c
>> @@ -1063,7 +1063,7 @@ static int __pci_disable_link_state(struct pci_dev *pdev, int state, bool sem)
>>  	if (state & PCIE_LINK_STATE_L0S)
>>  		link->aspm_disable |= ASPM_STATE_L0S;
>>  	if (state & PCIE_LINK_STATE_L1)
>> -		link->aspm_disable |= ASPM_STATE_L1;
>> +		link->aspm_disable |= ASPM_STATE_L1 | ASPM_STATE_L1SS;
>>  	if (state & PCIE_LINK_STATE_L1_1)
>>  		link->aspm_disable |= ASPM_STATE_L1_1;
>>  	if (state & PCIE_LINK_STATE_L1_2)
>> @@ -1251,6 +1251,8 @@ static ssize_t aspm_attr_store_common(struct device *dev,
>>  			link->aspm_disable &= ~ASPM_STATE_L1;
>>  	} else {
>>  		link->aspm_disable |= state;
>> +		if (state & ASPM_STATE_L1)
>> +			link->aspm_disable |= ASPM_STATE_L1SS;
>>  	}
>>  
>>  	pcie_config_aspm_link(link, policy_to_aspm_state(link));
>> -- 
>> 2.42.0
>>
Heiner Kallweit Oct. 10, 2023, 8:29 p.m. UTC | #3
On 02.10.2023 17:14, Bjorn Helgaas wrote:
> [+cc Sathy, Lukas]
> 
> On Sat, Aug 26, 2023 at 01:10:35PM +0200, Heiner Kallweit wrote:
>> After the referenced commit we may see L1 sub-states being active
>> unexpectedly. Following scenario as an example:
>> r8169 disables L1 because of known hardware issues on a number of
>> systems. Implicitly L1.1 and L1.2 are disabled too.
>> On my system L1 and L1.1 work fine, but L1.2 causes missed
>> rx packets. Therefore I write 1 to aspm_l1_1.
>> This removes ASPM_STATE_L1 from the disabled modes and therefore
>> unexpectedly enables also L1.2. So return to the old behavior.
>>
>> A comment in the commit message of the referenced change correctly points
>> out that this behavior is inconsistent with aspm_attr_store_common().
>> So change aspm_attr_store_common() accordingly.
> 
> I think we should split this into a pure revert of fb097dcd5a28 with
> the description of the unintended consequence, followed by another
> patch to change aspm_attr_store_common().
> 
OK

> I guess the existing aspm_attr_store_common() behavior allows a
> similar unexpected behavior?  For example, in this sequence:
> 
>   - Write 0 to "l1_aspm" to disable L1
>   - Write 1 to "l1_1_aspm" to enable L1.1
> 
> does L1.2 get implicitly enabled as well even though that's clearly
> not what the user intended?
> 
Right, it's the same here. Therefore the second change in the patch.

> There's also the separate question of how the sysfs file and the
> pci_disable_link_state() API should interact.  Drivers use that API
> when they know about a defect in their device, but the user can
> override that via syfs.
> 
> In [1], we have a similar situation with D3cold support, where we're
> thinking that we should not allow a user to use sysfs to override that
> driver knowledge.
> 
In my r8169 use case ASPM works fine on one system, but causes issues
on another one with same NIC chip version. So it may be a wild mix of
NIC hw erratum, BIOS bug, mainboard chipset incompatibilities, etc.
Therefore I disable L1 per default, and allow users to re-enable
L1/L1.1/L1.2 if their system isn't affected.

> Bjorn
> 
> [1] https://lore.kernel.org/r/b8a7f4af2b73f6b506ad8ddee59d747cbf834606.1695025365.git.lukas@wunner.de
> 
>> Fixes: fb097dcd5a28 ("PCI/ASPM: Disable only ASPM_STATE_L1 when driver disables L1")
>> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
>> ---
>>  drivers/pci/pcie/aspm.c | 4 +++-
>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
>> index 3dafba0b5..6d3788257 100644
>> --- a/drivers/pci/pcie/aspm.c
>> +++ b/drivers/pci/pcie/aspm.c
>> @@ -1063,7 +1063,7 @@ static int __pci_disable_link_state(struct pci_dev *pdev, int state, bool sem)
>>  	if (state & PCIE_LINK_STATE_L0S)
>>  		link->aspm_disable |= ASPM_STATE_L0S;
>>  	if (state & PCIE_LINK_STATE_L1)
>> -		link->aspm_disable |= ASPM_STATE_L1;
>> +		link->aspm_disable |= ASPM_STATE_L1 | ASPM_STATE_L1SS;
>>  	if (state & PCIE_LINK_STATE_L1_1)
>>  		link->aspm_disable |= ASPM_STATE_L1_1;
>>  	if (state & PCIE_LINK_STATE_L1_2)
>> @@ -1251,6 +1251,8 @@ static ssize_t aspm_attr_store_common(struct device *dev,
>>  			link->aspm_disable &= ~ASPM_STATE_L1;
>>  	} else {
>>  		link->aspm_disable |= state;
>> +		if (state & ASPM_STATE_L1)
>> +			link->aspm_disable |= ASPM_STATE_L1SS;
>>  	}
>>  
>>  	pcie_config_aspm_link(link, policy_to_aspm_state(link));
>> -- 
>> 2.42.0
>>
Heiner Kallweit Oct. 10, 2023, 8:33 p.m. UTC | #4
On 03.10.2023 05:33, Kuppuswamy Sathyanarayanan wrote:
> 
> 
> On 10/2/2023 8:14 AM, Bjorn Helgaas wrote:
>> [+cc Sathy, Lukas]
>>
>> On Sat, Aug 26, 2023 at 01:10:35PM +0200, Heiner Kallweit wrote:
>>> After the referenced commit we may see L1 sub-states being active
>>> unexpectedly. Following scenario as an example:
>>> r8169 disables L1 because of known hardware issues on a number of
>>> systems. Implicitly L1.1 and L1.2 are disabled too.
>>> On my system L1 and L1.1 work fine, but L1.2 causes missed
>>> rx packets. Therefore I write 1 to aspm_l1_1.
>>> This removes ASPM_STATE_L1 from the disabled modes and therefore
>>> unexpectedly enables also L1.2. So return to the old behavior.
> 
> IIUC, the above-mentioned SysFS issue will be fixed by your change to
> aspm_attr_store_common(), right?
> 
In my use case the driver uses pci_disable_link_state() to disable L1
and implicitly also L1.1 and L1.2. The following change is needed so
that L1.2 remains disabled if L1 is removed from the list of disabled
states later. That's the revert we're talking about.

> Can you elaborate why you need the following change?
> 
>>> @@ -1063,7 +1063,7 @@ static int __pci_disable_link_state(struct pci_dev *pdev, int state, bool sem)
>>>  	if (state & PCIE_LINK_STATE_L1)
>>> -		link->aspm_disable |= ASPM_STATE_L1;
>>> +		link->aspm_disable |= ASPM_STATE_L1 | ASPM_STATE_L1SS;
> 
>>>
>>> A comment in the commit message of the referenced change correctly points
>>> out that this behavior is inconsistent with aspm_attr_store_common().
>>> So change aspm_attr_store_common() accordingly.
>>
>> I think we should split this into a pure revert of fb097dcd5a28 with
>> the description of the unintended consequence, followed by another
>> patch to change aspm_attr_store_common().
>>
> 
> I agree, the revert and new change should be split into two patches.
> 
>> I guess the existing aspm_attr_store_common() behavior allows a
>> similar unexpected behavior?  For example, in this sequence:
>>
>>   - Write 0 to "l1_aspm" to disable L1
>>   - Write 1 to "l1_1_aspm" to enable L1.1
>>
>> does L1.2 get implicitly enabled as well even though that's clearly
>> not what the user intended?
>>
>> There's also the separate question of how the sysfs file and the
>> pci_disable_link_state() API should interact.  Drivers use that API
>> when they know about a defect in their device, but the user can
>> override that via syfs.
>>
>> In [1], we have a similar situation with D3cold support, where we're
>> thinking that we should not allow a user to use sysfs to override that
>> driver knowledge.
>>
>> Bjorn
>>
>> [1] https://lore.kernel.org/r/b8a7f4af2b73f6b506ad8ddee59d747cbf834606.1695025365.git.lukas@wunner.de
>>
>>> Fixes: fb097dcd5a28 ("PCI/ASPM: Disable only ASPM_STATE_L1 when driver disables L1")
>>> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
>>> ---
>>>  drivers/pci/pcie/aspm.c | 4 +++-
>>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
>>> index 3dafba0b5..6d3788257 100644
>>> --- a/drivers/pci/pcie/aspm.c
>>> +++ b/drivers/pci/pcie/aspm.c
>>> @@ -1063,7 +1063,7 @@ static int __pci_disable_link_state(struct pci_dev *pdev, int state, bool sem)
>>>  	if (state & PCIE_LINK_STATE_L0S)
>>>  		link->aspm_disable |= ASPM_STATE_L0S;
>>>  	if (state & PCIE_LINK_STATE_L1)
>>> -		link->aspm_disable |= ASPM_STATE_L1;
>>> +		link->aspm_disable |= ASPM_STATE_L1 | ASPM_STATE_L1SS;
>>>  	if (state & PCIE_LINK_STATE_L1_1)
>>>  		link->aspm_disable |= ASPM_STATE_L1_1;
>>>  	if (state & PCIE_LINK_STATE_L1_2)
>>> @@ -1251,6 +1251,8 @@ static ssize_t aspm_attr_store_common(struct device *dev,
>>>  			link->aspm_disable &= ~ASPM_STATE_L1;
>>>  	} else {
>>>  		link->aspm_disable |= state;
>>> +		if (state & ASPM_STATE_L1)
>>> +			link->aspm_disable |= ASPM_STATE_L1SS;
>>>  	}
>>>  
>>>  	pcie_config_aspm_link(link, policy_to_aspm_state(link));
>>> -- 
>>> 2.42.0
>>>
>
diff mbox series

Patch

diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
index 3dafba0b5..6d3788257 100644
--- a/drivers/pci/pcie/aspm.c
+++ b/drivers/pci/pcie/aspm.c
@@ -1063,7 +1063,7 @@  static int __pci_disable_link_state(struct pci_dev *pdev, int state, bool sem)
 	if (state & PCIE_LINK_STATE_L0S)
 		link->aspm_disable |= ASPM_STATE_L0S;
 	if (state & PCIE_LINK_STATE_L1)
-		link->aspm_disable |= ASPM_STATE_L1;
+		link->aspm_disable |= ASPM_STATE_L1 | ASPM_STATE_L1SS;
 	if (state & PCIE_LINK_STATE_L1_1)
 		link->aspm_disable |= ASPM_STATE_L1_1;
 	if (state & PCIE_LINK_STATE_L1_2)
@@ -1251,6 +1251,8 @@  static ssize_t aspm_attr_store_common(struct device *dev,
 			link->aspm_disable &= ~ASPM_STATE_L1;
 	} else {
 		link->aspm_disable |= state;
+		if (state & ASPM_STATE_L1)
+			link->aspm_disable |= ASPM_STATE_L1SS;
 	}
 
 	pcie_config_aspm_link(link, policy_to_aspm_state(link));