
e1000e: bump up timeout to wait when ME un-configure ULP mode

Message ID 20200323191639.48826-1-aaron.ma@canonical.com
State Superseded
Delegated to: Jeff Kirsher
Series e1000e: bump up timeout to wait when ME un-configure ULP mode

Commit Message

Aaron Ma March 23, 2020, 7:16 p.m. UTC
ME takes 2+ seconds to un-configure ULP mode done after resume
from s2idle on some ThinkPad laptops.
Without enough wait, reset and re-init will fail with error.

Fixes: f15bb6dde738cc8fa0 ("e1000e: Add support for S0ix")
BugLink: https://bugs.launchpad.net/bugs/1865570
Signed-off-by: Aaron Ma <aaron.ma@canonical.com>
---
 drivers/net/ethernet/intel/e1000e/ich8lan.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
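The hunk only changes the iteration bound of the polling loop in e1000_disable_ulp_lpt_lp(). Judging from the comments (30 iterations for 300 ms, 250 for 2.5 s), each iteration appears to wait about 10 ms; the sleep call itself is outside the quoted context, so that step size is an assumption. The user-space sketch below models the loop with a simulated register. `struct fake_fwsm`, `fake_fwsm_read()`, `poll_ulp_cfg_done()`, `FAKE_ULP_CFG_DONE` and `ERR_PHY` are hypothetical stand-ins for the FWSM register, the E1000_FWSM_ULP_CFG_DONE bit and -E1000_ERR_PHY, not driver code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; the bit value is illustrative only. */
#define FAKE_ULP_CFG_DONE 0x00000400u
#define ERR_PHY (-2)

/* Simulated FWSM register: the "ME" keeps ULP_CFG_DONE set for the
 * first busy_reads reads, then clears it. */
struct fake_fwsm {
    unsigned reads;
    unsigned busy_reads;
};

static uint32_t fake_fwsm_read(struct fake_fwsm *r)
{
    return (r->reads++ < r->busy_reads) ? FAKE_ULP_CFG_DONE : 0;
}

/* Same shape as the driver loop: fail once the counter reaches max_iters.
 * The driver is assumed to sleep ~10 ms per iteration (omitted here), so
 * max_iters = 30 is ~300 ms and max_iters = 250 is ~2.5 s. */
static int poll_ulp_cfg_done(struct fake_fwsm *r, unsigned max_iters)
{
    unsigned i = 0;

    assert(r != NULL);
    while (fake_fwsm_read(r) & FAKE_ULP_CFG_DONE) {
        if (i++ == max_iters)
            return ERR_PHY;
        /* a ~10 ms sleep would go here in the driver (assumed) */
    }
    return 0;
}
```

With an ME that keeps the bit set for 220 polls (roughly 2.2 s under the 10 ms assumption), a fresh `struct fake_fwsm` polled with a bound of 30 yields `ERR_PHY`, while one polled with a bound of 250 yields 0.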

Comments

Kai-Heng Feng March 25, 2020, 4:17 a.m. UTC | #1
Hi Aaron,

> On Mar 24, 2020, at 03:16, Aaron Ma <aaron.ma@canonical.com> wrote:
> 
> ME takes 2+ seconds to un-configure ULP mode done after resume
> from s2idle on some ThinkPad laptops.
> Without enough wait, reset and re-init will fail with error.

Thanks, this patch solves the issue. We can drop the DMI quirk in favor of this patch.

> 
> Fixes: f15bb6dde738cc8fa0 ("e1000e: Add support for S0ix")
> BugLink: https://bugs.launchpad.net/bugs/1865570
> Signed-off-by: Aaron Ma <aaron.ma@canonical.com>
> ---
> drivers/net/ethernet/intel/e1000e/ich8lan.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/e1000e/ich8lan.c b/drivers/net/ethernet/intel/e1000e/ich8lan.c
> index b4135c50e905..147b15a2f8b3 100644
> --- a/drivers/net/ethernet/intel/e1000e/ich8lan.c
> +++ b/drivers/net/ethernet/intel/e1000e/ich8lan.c
> @@ -1240,9 +1240,9 @@ static s32 e1000_disable_ulp_lpt_lp(struct e1000_hw *hw, bool force)
> 			ew32(H2ME, mac_reg);
> 		}
> 
> -		/* Poll up to 300msec for ME to clear ULP_CFG_DONE. */
> +		/* Poll up to 2.5sec for ME to clear ULP_CFG_DONE. */
> 		while (er32(FWSM) & E1000_FWSM_ULP_CFG_DONE) {
> -			if (i++ == 30) {
> +			if (i++ == 250) {
> 				ret_val = -E1000_ERR_PHY;
> 				goto out;
> 			}

The return value was not caught by the caller, so the error ends up unnoticed.
Maybe let the caller check the return value of e1000_disable_ulp_lpt_lp()?

Kai-Heng

> -- 
> 2.17.1
>
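Kai-Heng's suggestion can be sketched as a caller that checks the result instead of discarding it. `fake_disable_ulp()` and `fake_resume()` are hypothetical; the real e1000e call sites are not quoted in this thread, and how the driver should react to the error (abort the resume, retry, or only log) is a separate design decision.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for -E1000_ERR_PHY. */
#define ERR_PHY (-2)

/* Hypothetical stand-in for e1000_disable_ulp_lpt_lp(): fails when the ME
 * does not clear ULP_CFG_DONE within the polling budget. */
static int fake_disable_ulp(int me_done_in_time)
{
    return me_done_in_time ? 0 : ERR_PHY;
}

/* Hypothetical resume step: checks the result and stops before reset and
 * re-init instead of letting the failure go unnoticed. */
static int fake_resume(int me_done_in_time, int *reinit_done)
{
    int ret;

    assert(reinit_done != NULL);
    *reinit_done = 0;

    ret = fake_disable_ulp(me_done_in_time);
    if (ret) {
        fprintf(stderr, "disabling ULP failed: %d\n", ret);
        return ret; /* propagate so the caller of resume sees the error */
    }

    /* reset and re-init would run here */
    *reinit_done = 1;
    return 0;
}
```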
Sasha Neftin March 25, 2020, 6:36 a.m. UTC | #2
On 3/25/2020 06:17, Kai-Heng Feng wrote:
> Hi Aaron,
> 
>> On Mar 24, 2020, at 03:16, Aaron Ma <aaron.ma@canonical.com> wrote:
>>
>> ME takes 2+ seconds to un-configure ULP mode done after resume
>> from s2idle on some ThinkPad laptops.
>> Without enough wait, reset and re-init will fail with error.
> 
> Thanks, this patch solves the issue. We can drop the DMI quirk in favor of this patch.
> 
>>
>> Fixes: f15bb6dde738cc8fa0 ("e1000e: Add support for S0ix")
>> BugLink: https://bugs.launchpad.net/bugs/1865570
>> Signed-off-by: Aaron Ma <aaron.ma@canonical.com>
>> ---
>> drivers/net/ethernet/intel/e1000e/ich8lan.c | 4 ++--
>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/intel/e1000e/ich8lan.c b/drivers/net/ethernet/intel/e1000e/ich8lan.c
>> index b4135c50e905..147b15a2f8b3 100644
>> --- a/drivers/net/ethernet/intel/e1000e/ich8lan.c
>> +++ b/drivers/net/ethernet/intel/e1000e/ich8lan.c
>> @@ -1240,9 +1240,9 @@ static s32 e1000_disable_ulp_lpt_lp(struct e1000_hw *hw, bool force)
>> 			ew32(H2ME, mac_reg);
>> 		}
>>
>> -		/* Poll up to 300msec for ME to clear ULP_CFG_DONE. */
>> +		/* Poll up to 2.5sec for ME to clear ULP_CFG_DONE. */
>> 		while (er32(FWSM) & E1000_FWSM_ULP_CFG_DONE) {
>> -			if (i++ == 30) {
>> +			if (i++ == 250) {
>> 				ret_val = -E1000_ERR_PHY;
>> 				goto out;
>> 			}
> 
> The return value was not caught by the caller, so the error ends up unnoticed.
> Maybe let the caller check the return value of e1000_disable_ulp_lpt_lp()?
> 
> Kai-Heng
Hello Kai-Heng and Aaron,
I am a bit confused. In our previous conversation, you said the ME was not running.
Let me chime in. Increasing the delay won't solve the problem; it will just
mask it. We need to understand why the ME takes so long. What is the
problem with this specific system?
So basically, a system without a running ME should work for you.

Meanwhile, I prefer to keep the DMI quirk.
Thanks,
Sasha
> 
>> -- 
>> 2.17.1
>>
>
Kai-Heng Feng March 25, 2020, 6:39 a.m. UTC | #3
Hi Sasha,

> On Mar 25, 2020, at 14:36, Neftin, Sasha <sasha.neftin@intel.com> wrote:
> 
> On 3/25/2020 06:17, Kai-Heng Feng wrote:
>> Hi Aaron,
>>> On Mar 24, 2020, at 03:16, Aaron Ma <aaron.ma@canonical.com> wrote:
>>> 
>>> ME takes 2+ seconds to un-configure ULP mode done after resume
>>> from s2idle on some ThinkPad laptops.
>>> Without enough wait, reset and re-init will fail with error.
>> Thanks, this patch solves the issue. We can drop the DMI quirk in favor of this patch.
>>> 
>>> Fixes: f15bb6dde738cc8fa0 ("e1000e: Add support for S0ix")
>>> BugLink: https://bugs.launchpad.net/bugs/1865570
>>> Signed-off-by: Aaron Ma <aaron.ma@canonical.com>
>>> ---
>>> drivers/net/ethernet/intel/e1000e/ich8lan.c | 4 ++--
>>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>> 
>>> diff --git a/drivers/net/ethernet/intel/e1000e/ich8lan.c b/drivers/net/ethernet/intel/e1000e/ich8lan.c
>>> index b4135c50e905..147b15a2f8b3 100644
>>> --- a/drivers/net/ethernet/intel/e1000e/ich8lan.c
>>> +++ b/drivers/net/ethernet/intel/e1000e/ich8lan.c
>>> @@ -1240,9 +1240,9 @@ static s32 e1000_disable_ulp_lpt_lp(struct e1000_hw *hw, bool force)
>>> 			ew32(H2ME, mac_reg);
>>> 		}
>>> 
>>> -		/* Poll up to 300msec for ME to clear ULP_CFG_DONE. */
>>> +		/* Poll up to 2.5sec for ME to clear ULP_CFG_DONE. */
>>> 		while (er32(FWSM) & E1000_FWSM_ULP_CFG_DONE) {
>>> -			if (i++ == 30) {
>>> +			if (i++ == 250) {
>>> 				ret_val = -E1000_ERR_PHY;
>>> 				goto out;
>>> 			}
>> The return value was not caught by the caller, so the error ends up unnoticed.
>> Maybe let the caller check the return value of e1000_disable_ulp_lpt_lp()?
>> Kai-Heng
> Hello Kai-Heng and Aaron,
> I am a bit confused. In our previous conversation, you said the ME was not running.

Yes, I can confirm Intel AMT is disabled in the BIOS menu. Is Intel AMT what "ME" refers to in this context?

How do I check whether it's really disabled under Linux?

Kai-Heng

> Let me chime in. Increasing the delay won't solve the problem; it will just mask it. We need to understand why the ME takes so long. What is the problem with this specific system?
> So basically, a system without a running ME should work for you.
> 
> Meanwhile, I prefer to keep the DMI quirk.
> Thanks,
> Sasha
>>> -- 
>>> 2.17.1
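One rough check from user space, assuming the kernel's MEI driver (mei_me) is loaded, is whether the ME host interface character device, typically /dev/mei0, exists. This is only a heuristic: the node's presence suggests the ME is up and reachable from the host, but its absence does not prove the ME is halted, and neither says whether the ME is managing the i219. `mei_node_present()` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Returns 1 if the device node at path exists, 0 otherwise.
 * Assumption: the MEI driver exposes the ME host interface as /dev/mei0.
 * Presence does not say whether the ME manages the LAN device, and absence
 * does not prove the ME firmware is halted. */
static int mei_node_present(const char *path)
{
    assert(path != NULL);
    return access(path, F_OK) == 0;
}
```

For example, `mei_node_present("/dev/mei0")`; listing `/dev/mei*` from a shell gives the same information.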
Tsai, Rex March 25, 2020, 6:42 a.m. UTC | #4
Hello Kai-Heng,
If you are using a vPro system, the ME LAN driver is always alive, and there is no way to disable it unless you build a new BIOS. Is this also the case for the Lenovo system?

Rex Tsai | Intel Client LAN Engineer | +1 (503) 264-0517

-----Original Message-----
From: Kai-Heng Feng <kai.heng.feng@canonical.com> 
Sent: Tuesday, March 24, 2020 11:40 PM
To: Neftin, Sasha <sasha.neftin@intel.com>
Cc: Aaron Ma <aaron.ma@canonical.com>; Kirsher, Jeffrey T <jeffrey.t.kirsher@intel.com>; David Miller <davem@davemloft.net>; moderated list:INTEL ETHERNET DRIVERS <intel-wired-lan@lists.osuosl.org>; open list:NETWORKING DRIVERS <netdev@vger.kernel.org>; open list <linux-kernel@vger.kernel.org>; Lifshits, Vitaly <vitaly.lifshits@intel.com>; Tsai, Rex <rex.tsai@intel.com>
Subject: Re: [PATCH] e1000e: bump up timeout to wait when ME un-configure ULP mode

Aaron Ma March 25, 2020, 6:43 a.m. UTC | #5
On 3/25/20 2:36 PM, Neftin, Sasha wrote:
> On 3/25/2020 06:17, Kai-Heng Feng wrote:
>> Hi Aaron,
>>
>>> On Mar 24, 2020, at 03:16, Aaron Ma <aaron.ma@canonical.com> wrote:
>>>
>>> ME takes 2+ seconds to un-configure ULP mode done after resume
>>> from s2idle on some ThinkPad laptops.
>>> Without enough wait, reset and re-init will fail with error.
>>
>> Thanks, this patch solves the issue. We can drop the DMI quirk in
>> favor of this patch.
>>
>>>
>>> Fixes: f15bb6dde738cc8fa0 ("e1000e: Add support for S0ix")
>>> BugLink: https://bugs.launchpad.net/bugs/1865570
>>> Signed-off-by: Aaron Ma <aaron.ma@canonical.com>
>>> ---
>>> drivers/net/ethernet/intel/e1000e/ich8lan.c | 4 ++--
>>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/net/ethernet/intel/e1000e/ich8lan.c
>>> b/drivers/net/ethernet/intel/e1000e/ich8lan.c
>>> index b4135c50e905..147b15a2f8b3 100644
>>> --- a/drivers/net/ethernet/intel/e1000e/ich8lan.c
>>> +++ b/drivers/net/ethernet/intel/e1000e/ich8lan.c
>>> @@ -1240,9 +1240,9 @@ static s32 e1000_disable_ulp_lpt_lp(struct
>>> e1000_hw *hw, bool force)
>>>             ew32(H2ME, mac_reg);
>>>         }
>>>
>>> -        /* Poll up to 300msec for ME to clear ULP_CFG_DONE. */
>>> +        /* Poll up to 2.5sec for ME to clear ULP_CFG_DONE. */
>>>         while (er32(FWSM) & E1000_FWSM_ULP_CFG_DONE) {
>>> -            if (i++ == 30) {
>>> +            if (i++ == 250) {
>>>                 ret_val = -E1000_ERR_PHY;
>>>                 goto out;
>>>             }
>>
>> The return value was not caught by the caller, so the error ends up
>> unnoticed.
>> Maybe let the caller check the return value of
>> e1000_disable_ulp_lpt_lp()?
>>
>> Kai-Heng
> Hello Kai-Heng and Aaron,
> I am a bit confused. In our previous conversation, you said the ME was not running.
> Let me chime in. Increasing the delay won't solve the problem; it will just
> mask it. We need to understand why the ME takes so long. What is the
> problem with this specific system?
> So basically, a system without a running ME should work for you.

On some laptops the ME is working; that's why the issue only reproduces on some laptops.
In this case, the i219 is controlled by the ME.

The quirk only covers one model, but the issue reproduces on more models,
so it won't work.

Regards,
Aaron

> 
> Meanwhile, I prefer to keep the DMI quirk.
> Thanks,
> Sasha
>>
>>> -- 
>>> 2.17.1
>>>
>>
>
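Aaron's objection to the DMI quirk can be illustrated with a sketch of how such a quirk table works: each entry names specific systems, so a model that is not listed gets no workaround, whereas a timeout change applies to every system. In the kernel this role is played by `struct dmi_system_id`, `DMI_MATCH()` and `dmi_check_system()`; the table type, `quirk_matches()` and the vendor/product strings below are hypothetical placeholders, not the driver's actual quirk. The sketch compares strings exactly, whereas `DMI_MATCH()` matches substrings.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical quirk entry; the strings below are placeholders. */
struct quirk_entry {
    const char *sys_vendor;
    const char *product_name;
};

static const struct quirk_entry example_quirks[] = {
    { "VendorA", "Model-1" },
    { NULL, NULL } /* terminator */
};

/* Returns 1 if vendor and product exactly match an entry, 0 otherwise. */
static int quirk_matches(const struct quirk_entry *table,
                         const char *vendor, const char *product)
{
    assert(table != NULL && vendor != NULL && product != NULL);
    for (; table->sys_vendor != NULL; table++) {
        if (strcmp(table->sys_vendor, vendor) == 0 &&
            strcmp(table->product_name, product) == 0)
            return 1;
    }
    return 0;
}
```

A second affected model (`"Model-2"` here) stays uncovered until someone adds an entry for it, which is the maintenance burden Aaron describes.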
Aaron Ma March 25, 2020, 6:49 a.m. UTC | #6
On 3/25/20 2:42 PM, Tsai, Rex wrote:
> Hello Kai-Heng,
> If you are using a vPro system, the ME LAN driver is always alive, and there is no way to disable it unless you build a new BIOS. Is this also the case for the Lenovo system?

Right, some new models from Lenovo.

Regards,
Aaron

> 
> Rex Tsai | Intel Client LAN Engineer | +1 (503) 264-0517
> 
> -----Original Message-----
> From: Kai-Heng Feng <kai.heng.feng@canonical.com> 
> Sent: Tuesday, March 24, 2020 11:40 PM
> To: Neftin, Sasha <sasha.neftin@intel.com>
> Cc: Aaron Ma <aaron.ma@canonical.com>; Kirsher, Jeffrey T <jeffrey.t.kirsher@intel.com>; David Miller <davem@davemloft.net>; moderated list:INTEL ETHERNET DRIVERS <intel-wired-lan@lists.osuosl.org>; open list:NETWORKING DRIVERS <netdev@vger.kernel.org>; open list <linux-kernel@vger.kernel.org>; Lifshits, Vitaly <vitaly.lifshits@intel.com>; Tsai, Rex <rex.tsai@intel.com>
> Subject: Re: [PATCH] e1000e: bump up timeout to wait when ME un-configure ULP mode
> 
Sasha Neftin March 25, 2020, 1:58 p.m. UTC | #7
On 3/25/2020 08:43, Aaron Ma wrote:
> 
> 
> On 3/25/20 2:36 PM, Neftin, Sasha wrote:
>> On 3/25/2020 06:17, Kai-Heng Feng wrote:
>>> Hi Aaron,
>>>
>>>> On Mar 24, 2020, at 03:16, Aaron Ma <aaron.ma@canonical.com> wrote:
>>>>
>>>> ME takes 2+ seconds to un-configure ULP mode done after resume
>>>> from s2idle on some ThinkPad laptops.
>>>> Without enough wait, reset and re-init will fail with error.
>>>
>>> Thanks, this patch solves the issue. We can drop the DMI quirk in
>>> favor of this patch.
>>>
>>>>
>>>> Fixes: f15bb6dde738cc8fa0 ("e1000e: Add support for S0ix")
>>>> BugLink: https://bugs.launchpad.net/bugs/1865570
>>>> Signed-off-by: Aaron Ma <aaron.ma@canonical.com>
>>>> ---
>>>> drivers/net/ethernet/intel/e1000e/ich8lan.c | 4 ++--
>>>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/net/ethernet/intel/e1000e/ich8lan.c
>>>> b/drivers/net/ethernet/intel/e1000e/ich8lan.c
>>>> index b4135c50e905..147b15a2f8b3 100644
>>>> --- a/drivers/net/ethernet/intel/e1000e/ich8lan.c
>>>> +++ b/drivers/net/ethernet/intel/e1000e/ich8lan.c
>>>> @@ -1240,9 +1240,9 @@ static s32 e1000_disable_ulp_lpt_lp(struct
>>>> e1000_hw *hw, bool force)
>>>>              ew32(H2ME, mac_reg);
>>>>          }
>>>>
>>>> -        /* Poll up to 300msec for ME to clear ULP_CFG_DONE. */
>>>> +        /* Poll up to 2.5sec for ME to clear ULP_CFG_DONE. */
>>>>          while (er32(FWSM) & E1000_FWSM_ULP_CFG_DONE) {
>>>> -            if (i++ == 30) {
>>>> +            if (i++ == 250) {
>>>>                  ret_val = -E1000_ERR_PHY;
>>>>                  goto out;
>>>>              }
>>>
>>> The return value was not caught by the caller, so the error ends up
>>> unnoticed.
>>> Maybe let the caller check the return value of
>>> e1000_disable_ulp_lpt_lp()?
>>>
>>> Kai-Heng
>> Hello Kai-Heng and Aaron,
>> I am a bit confused. In our previous conversation, you said the ME was not running.
>> Let me chime in. Increasing the delay won't solve the problem; it will just
>> mask it. We need to understand why the ME takes so long. What is the
>> problem with this specific system?
>> So basically, a system without a running ME should work for you.
> 
> On some laptops the ME is working; that's why the issue only reproduces on some laptops.
> In this case, the i219 is controlled by the ME.
> 
Who can explain why the ME requires so much time on this system?
We probably need to work with the ME/BIOS vendor to understand it.
A delay will just mask the real problem; we need to find the root cause.
> The quirk only covers one model, but the issue reproduces on more models,
> so it won't work.
> 
> Regards,
> Aaron
> 
>>
>> Meanwhile, I prefer to keep the DMI quirk.
>> Thanks,
>> Sasha
>>>
>>>> -- 
>>>> 2.17.1
>>>>
>>>
>>
Aaron Ma March 25, 2020, 2:07 p.m. UTC | #8
On 3/25/20 9:58 PM, Neftin, Sasha wrote:
> On 3/25/2020 08:43, Aaron Ma wrote:
>>
>>
>> On 3/25/20 2:36 PM, Neftin, Sasha wrote:
>>> On 3/25/2020 06:17, Kai-Heng Feng wrote:
>>>> Hi Aaron,
>>>>
>>>>> On Mar 24, 2020, at 03:16, Aaron Ma <aaron.ma@canonical.com> wrote:
>>>>>
>>>>> ME takes 2+ seconds to un-configure ULP mode done after resume
>>>>> from s2idle on some ThinkPad laptops.
>>>>> Without enough wait, reset and re-init will fail with error.
>>>>
>>>> Thanks, this patch solves the issue. We can drop the DMI quirk in
>>>> favor of this patch.
>>>>
>>>>>
>>>>> Fixes: f15bb6dde738cc8fa0 ("e1000e: Add support for S0ix")
>>>>> BugLink: https://bugs.launchpad.net/bugs/1865570
>>>>> Signed-off-by: Aaron Ma <aaron.ma@canonical.com>
>>>>> ---
>>>>> drivers/net/ethernet/intel/e1000e/ich8lan.c | 4 ++--
>>>>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/drivers/net/ethernet/intel/e1000e/ich8lan.c
>>>>> b/drivers/net/ethernet/intel/e1000e/ich8lan.c
>>>>> index b4135c50e905..147b15a2f8b3 100644
>>>>> --- a/drivers/net/ethernet/intel/e1000e/ich8lan.c
>>>>> +++ b/drivers/net/ethernet/intel/e1000e/ich8lan.c
>>>>> @@ -1240,9 +1240,9 @@ static s32 e1000_disable_ulp_lpt_lp(struct
>>>>> e1000_hw *hw, bool force)
>>>>>              ew32(H2ME, mac_reg);
>>>>>          }
>>>>>
>>>>> -        /* Poll up to 300msec for ME to clear ULP_CFG_DONE. */
>>>>> +        /* Poll up to 2.5sec for ME to clear ULP_CFG_DONE. */
>>>>>          while (er32(FWSM) & E1000_FWSM_ULP_CFG_DONE) {
>>>>> -            if (i++ == 30) {
>>>>> +            if (i++ == 250) {
>>>>>                  ret_val = -E1000_ERR_PHY;
>>>>>                  goto out;
>>>>>              }
>>>>
>>>> The return value was not caught by the caller, so the error ends up
>>>> unnoticed.
>>>> Maybe let the caller check the return value of
>>>> e1000_disable_ulp_lpt_lp()?
>>>>
>>>> Kai-Heng
>>> Hello Kai-Heng and Aaron,
>>> I am a bit confused. In our previous conversation, you said the ME was not running.
>>> Let me chime in. Increasing the delay won't solve the problem; it will just
>>> mask it. We need to understand why the ME takes so long. What is the
>>> problem with this specific system?
>>> So basically, a system without a running ME should work for you.
>>
>> On some laptops the ME is working; that's why the issue only reproduces on some laptops.
>> In this case, the i219 is controlled by the ME.
>>
> Who can explain why the ME requires so much time on this system?
> We probably need to work with the ME/BIOS vendor to understand it.
> A delay will just mask the real problem; we need to find the root cause.

IMHO, it is likely the ME checking the link status while the link is
disconnected; that's also why the driver polls up to 5 seconds for the
Cable Disconnected indication when enabling ULP.

The reason is the same for both disabling and enabling ULP mode.

I agree that we should ask the ME side to check it too.

Regards,
Aaron

>> The quirk only covers one model, but the issue reproduces on more models,
>> so it won't work.
>>
>> Regards,
>> Aaron
>>
>>>
>>> Meanwhile, I prefer to keep the DMI quirk.
>>> Thanks,
>>> Sasha
>>>>
>>>>> -- 
>>>>> 2.17.1
>>>>>
>>>>
>>>
>
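Aaron notes that both the enable and the disable paths wait on the ME. A shared helper that polls for a bit to become set or cleared, bounded by elapsed time rather than an iteration count and reporting how long the wait actually took, could serve both paths and also collect the kind of timing data a root-cause investigation would need. This is a user-space sketch under those assumptions; `poll_bit_timeout()`, `reg_read_fn` and `struct fake_reg` are hypothetical. In kernel code, the established idiom for this pattern is `readl_poll_timeout()` and the related macros in `include/linux/iopoll.h`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical register accessor: returns the current register value. */
typedef uint32_t (*reg_read_fn)(void *ctx);

/* Simulated register for demonstration: bit 0 reads as set once
 * set_after reads have happened. */
struct fake_reg {
    unsigned reads;
    unsigned set_after;
};

static uint32_t fake_reg_read(void *ctx)
{
    struct fake_reg *r = ctx;
    return (r->reads++ >= r->set_after) ? 1u : 0u;
}

static long ms_between(const struct timespec *a, const struct timespec *b)
{
    long long ns = (long long)(b->tv_sec - a->tv_sec) * 1000000000LL +
                   (b->tv_nsec - a->tv_nsec);
    return (long)(ns / 1000000LL);
}

/* Poll until (value & mask) is nonzero (want_set) or zero (!want_set).
 * The bound is wall-clock time, not an iteration count. Returns 0 on
 * success and -1 on timeout; *elapsed_ms, if non-NULL, receives the time
 * spent, which could be logged to see how long the ME really takes. */
static int poll_bit_timeout(reg_read_fn read_reg, void *ctx, uint32_t mask,
                            bool want_set, long timeout_ms, long step_ms,
                            long *elapsed_ms)
{
    struct timespec start, now, step;

    assert(read_reg != NULL && timeout_ms >= 0 && step_ms > 0);
    step.tv_sec = step_ms / 1000;
    step.tv_nsec = (step_ms % 1000) * 1000000L;
    clock_gettime(CLOCK_MONOTONIC, &start);

    for (;;) {
        bool is_set = (read_reg(ctx) & mask) != 0;
        long spent;

        clock_gettime(CLOCK_MONOTONIC, &now);
        spent = ms_between(&start, &now);
        if (elapsed_ms != NULL)
            *elapsed_ms = spent;
        if (is_set == want_set)
            return 0;
        if (spent >= timeout_ms)
            return -1;
        nanosleep(&step, NULL);
    }
}
```

Bounding by time means the total wait exceeds the timeout by at most about one step plus scheduling slack, whereas with an iteration counter any oversleep in each step accumulates across all iterations.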
Paul Menzel March 25, 2020, 3:49 p.m. UTC | #9
Dear Linux folks,


Am 25.03.20 um 14:58 schrieb Neftin, Sasha:
> On 3/25/2020 08:43, Aaron Ma wrote:

>> On 3/25/20 2:36 PM, Neftin, Sasha wrote:
>>> On 3/25/2020 06:17, Kai-Heng Feng wrote:

>>>>> On Mar 24, 2020, at 03:16, Aaron Ma <aaron.ma@canonical.com> wrote:
>>>>>
>>>>> ME takes 2+ seconds to un-configure ULP mode done after resume
>>>>> from s2idle on some ThinkPad laptops.
>>>>> Without enough wait, reset and re-init will fail with error.
>>>>
>>>> Thanks, this patch solves the issue. We can drop the DMI quirk in
>>>> favor of this patch.
>>>>
>>>>> Fixes: f15bb6dde738cc8fa0 ("e1000e: Add support for S0ix")
>>>>> BugLink: https://bugs.launchpad.net/bugs/1865570
>>>>> Signed-off-by: Aaron Ma <aaron.ma@canonical.com>
>>>>> ---
>>>>> drivers/net/ethernet/intel/e1000e/ich8lan.c | 4 ++--
>>>>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/drivers/net/ethernet/intel/e1000e/ich8lan.c
>>>>> b/drivers/net/ethernet/intel/e1000e/ich8lan.c
>>>>> index b4135c50e905..147b15a2f8b3 100644
>>>>> --- a/drivers/net/ethernet/intel/e1000e/ich8lan.c
>>>>> +++ b/drivers/net/ethernet/intel/e1000e/ich8lan.c
>>>>> @@ -1240,9 +1240,9 @@ static s32 e1000_disable_ulp_lpt_lp(struct
>>>>> e1000_hw *hw, bool force)
>>>>>              ew32(H2ME, mac_reg);
>>>>>          }
>>>>>
>>>>> -        /* Poll up to 300msec for ME to clear ULP_CFG_DONE. */
>>>>> +        /* Poll up to 2.5sec for ME to clear ULP_CFG_DONE. */
>>>>>          while (er32(FWSM) & E1000_FWSM_ULP_CFG_DONE) {
>>>>> -            if (i++ == 30) {
>>>>> +            if (i++ == 250) {
>>>>>                  ret_val = -E1000_ERR_PHY;
>>>>>                  goto out;
>>>>>              }
>>>>
>>>> The return value was not caught by the caller, so the error ends up
>>>> unnoticed.
>>>> Maybe let the caller check the return value of
>>>> e1000_disable_ulp_lpt_lp()?

>>> I am a bit confused. In our previous conversation, you said the ME was not running.
>>> Let me chime in. Increasing the delay won't solve the problem; it will just
>>> mask it. We need to understand why the ME takes so long. What is the
>>> problem with this specific system?
>>> So basically, a system without a running ME should work for you.
>>
>> On some laptops the ME is working; that's why the issue only reproduces on some laptops.
>> In this case, the i219 is controlled by the ME.
>
> Who can explain why the ME requires so much time on this system?
> We probably need to work with the ME/BIOS vendor to understand it.
> A delay will just mask the real problem; we need to find the root cause.
>> The quirk only covers one model, but the issue reproduces on more models,
>> so it won't work.

(Where is Aaron’s reply? It wasn’t delivered to me yet.)

As this is clearly a regression, please revert the commit for now, and 
then find a way to correctly implement S0ix support. Linux’ regression 
policy demands that as no fix has been found since v5.5-rc1. Changing 
Intel ME settings, even if it would work around the issue, is not an 
acceptable solution. Delaying the resume time is also not a solution.

Regarding Intel Management Engine, only Intel knows what it does and 
what the error is, as the ME firmware is proprietary and closed.

Lastly, there is no way to fully disable the Intel Management Engine. 
The HAP stuff claims to stop the Intel ME execution, but nobody knows 
if it’s successful.

Aaron, Kai-Heng, can you send the revert?


Kind regards,

Paul
Kai-Heng Feng March 26, 2020, 11:29 a.m. UTC | #10
Hi Paul,

> On Mar 25, 2020, at 23:49, Paul Menzel <pmenzel@molgen.mpg.de> wrote:
> 
> Dear Linux folks,
> 
> 
> Am 25.03.20 um 14:58 schrieb Neftin, Sasha:
>> On 3/25/2020 08:43, Aaron Ma wrote:
> 
>>> On 3/25/20 2:36 PM, Neftin, Sasha wrote:
>>>> On 3/25/2020 06:17, Kai-Heng Feng wrote:
> 
>>>>>> On Mar 24, 2020, at 03:16, Aaron Ma <aaron.ma@canonical.com> wrote:
>>>>>> 
>>>>>> ME takes 2+ seconds to un-configure ULP mode done after resume
>>>>>> from s2idle on some ThinkPad laptops.
>>>>>> Without enough wait, reset and re-init will fail with error.
>>>>> 
>>>>> Thanks, this patch solves the issue. We can drop the DMI quirk in
>>>>> favor of this patch.
>>>>> 
>>>>>> Fixes: f15bb6dde738cc8fa0 ("e1000e: Add support for S0ix")
>>>>>> BugLink: https://bugs.launchpad.net/bugs/1865570
>>>>>> Signed-off-by: Aaron Ma <aaron.ma@canonical.com>
>>>>>> ---
>>>>>> drivers/net/ethernet/intel/e1000e/ich8lan.c | 4 ++--
>>>>>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>>>>> 
>>>>>> diff --git a/drivers/net/ethernet/intel/e1000e/ich8lan.c
>>>>>> b/drivers/net/ethernet/intel/e1000e/ich8lan.c
>>>>>> index b4135c50e905..147b15a2f8b3 100644
>>>>>> --- a/drivers/net/ethernet/intel/e1000e/ich8lan.c
>>>>>> +++ b/drivers/net/ethernet/intel/e1000e/ich8lan.c
>>>>>> @@ -1240,9 +1240,9 @@ static s32 e1000_disable_ulp_lpt_lp(struct
>>>>>> e1000_hw *hw, bool force)
>>>>>>              ew32(H2ME, mac_reg);
>>>>>>          }
>>>>>> 
>>>>>> -        /* Poll up to 300msec for ME to clear ULP_CFG_DONE. */
>>>>>> +        /* Poll up to 2.5sec for ME to clear ULP_CFG_DONE. */
>>>>>>          while (er32(FWSM) & E1000_FWSM_ULP_CFG_DONE) {
>>>>>> -            if (i++ == 30) {
>>>>>> +            if (i++ == 250) {
>>>>>>                  ret_val = -E1000_ERR_PHY;
>>>>>>                  goto out;
>>>>>>              }
>>>>> 
>>>>> The return value was not caught by the caller, so the error ends up
>>>>> unnoticed.
>>>>> Maybe let the caller check the return value of
>>>>> e1000_disable_ulp_lpt_lp()?
> 
>>>> I am a bit confused. In our previous conversation, you said the ME was not running.
>>>> Let me chime in. Increasing the delay won't solve the problem; it will just
>>>> mask it. We need to understand why the ME takes so long. What is the
>>>> problem with this specific system?
>>>> So basically, a system without a running ME should work for you.
>>> 
>>> On some laptops the ME is working; that's why the issue only reproduces on some laptops.
>>> In this case, the i219 is controlled by the ME.
>> 
>> Who can explain why the ME requires so much time on this system?
>> We probably need to work with the ME/BIOS vendor to understand it.
>> A delay will just mask the real problem; we need to find the root cause.
>>> The quirk only covers one model, but the issue reproduces on more models,
>>> so it won't work.
> 
> (Where is Aaron’s reply? It wasn’t delivered to me yet.)
> 
> As this is clearly a regression, please revert the commit for now, and then find a way to correctly implement S0ix support. Linux’ regression policy demands that as no fix has been found since v5.5-rc1. Changing Intel ME settings, even if it would work around the issue, is not an acceptable solution. Delaying the resume time is also not a solution.

The S0ix patch reduces power consumption and finally makes s2idle an acceptable sleep method.
So I'd say it's a fix, albeit one that introduced a regression.

> 
> Regarding Intel Management Engine, only Intel knows what it does and what the error is, as the ME firmware is proprietary and closed.
> 
> Lastly, there is no way to fully disable the Intel Management Engine. The HAP stuff claims to stop the Intel ME execution, but nobody knows if it’s successful.
> 
> Aaron, Kai-Heng, can you send the revert?

I consider it an important fix for s2idle, so I don't think reverting is appropriate.

Kai-Heng

> 
> 
> Kind regards,
> 
> Paul
> 
>
Paul Menzel March 26, 2020, 11:41 a.m. UTC | #11
Dear Kai-Heng,


Am 26.03.20 um 12:29 schrieb Kai-Heng Feng:

>> On Mar 25, 2020, at 23:49, Paul Menzel <pmenzel@molgen.mpg.de> wrote:

>> Am 25.03.20 um 14:58 schrieb Neftin, Sasha:
>>> On 3/25/2020 08:43, Aaron Ma wrote:
>>
>>>> On 3/25/20 2:36 PM, Neftin, Sasha wrote:
>>>>> On 3/25/2020 06:17, Kai-Heng Feng wrote:
>>
>>>>>>> On Mar 24, 2020, at 03:16, Aaron Ma <aaron.ma@canonical.com> wrote:
>>>>>>>
>>>>>>> ME takes 2+ seconds to un-configure ULP mode done after resume
>>>>>>> from s2idle on some ThinkPad laptops.
>>>>>>> Without enough wait, reset and re-init will fail with error.
>>>>>>
>>>>>> Thanks, this patch solves the issue. We can drop the DMI quirk in
>>>>>> favor of this patch.
>>>>>>
>>>>>>> Fixes: f15bb6dde738cc8fa0 ("e1000e: Add support for S0ix")
>>>>>>> BugLink: https://bugs.launchpad.net/bugs/1865570
>>>>>>> Signed-off-by: Aaron Ma <aaron.ma@canonical.com>
>>>>>>> ---
>>>>>>> drivers/net/ethernet/intel/e1000e/ich8lan.c | 4 ++--
>>>>>>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/net/ethernet/intel/e1000e/ich8lan.c
>>>>>>> b/drivers/net/ethernet/intel/e1000e/ich8lan.c
>>>>>>> index b4135c50e905..147b15a2f8b3 100644
>>>>>>> --- a/drivers/net/ethernet/intel/e1000e/ich8lan.c
>>>>>>> +++ b/drivers/net/ethernet/intel/e1000e/ich8lan.c
>>>>>>> @@ -1240,9 +1240,9 @@ static s32 e1000_disable_ulp_lpt_lp(struct
>>>>>>> e1000_hw *hw, bool force)
>>>>>>>               ew32(H2ME, mac_reg);
>>>>>>>           }
>>>>>>>
>>>>>>> -        /* Poll up to 300msec for ME to clear ULP_CFG_DONE. */
>>>>>>> +        /* Poll up to 2.5sec for ME to clear ULP_CFG_DONE. */
>>>>>>>           while (er32(FWSM) & E1000_FWSM_ULP_CFG_DONE) {
>>>>>>> -            if (i++ == 30) {
>>>>>>> +            if (i++ == 250) {
>>>>>>>                   ret_val = -E1000_ERR_PHY;
>>>>>>>                   goto out;
>>>>>>>               }
>>>>>>
>>>>>> The return value was not caught by the caller, so the error ends up
>>>>>> unnoticed.
>>>>>> Maybe let the caller check the return value of
>>>>>> e1000_disable_ulp_lpt_lp()?
>>
>>>>> I am a bit confused. In our previous conversation, you said the ME was not running.
>>>>> Let me chime in. Increasing the delay won't solve the problem; it will just
>>>>> mask it. We need to understand why the ME takes so long. What is the
>>>>> problem with this specific system?
>>>>> So basically, a system without a running ME should work for you.
>>>>
>>>> On some laptops the ME is working; that's why the issue only reproduces on some laptops.
>>>> In this case, the i219 is controlled by the ME.
>>>
>>> Who can explain why the ME requires so much time on this system?
>>> We probably need to work with the ME/BIOS vendor to understand it.
>>> A delay will just mask the real problem; we need to find the root cause.
>>>> The quirk only covers one model, but the issue reproduces on more models,
>>>> so it won't work.
>>
>> (Where is Aaron’s reply? It wasn’t delivered to me yet.)
>>
>> As this is clearly a regression, please revert the commit for now,
>> and then find a way to correctly implement S0ix support. Linux’
>> regression policy demands that as no fix has been found since
>> v5.5-rc1. Changing Intel ME settings, even if it would work around
>> the issue, is not an acceptable solution. Delaying the resume time
>> is also not a solution.
> 
> The S0ix patch reduces power consumption and finally makes s2idle an
> acceptable sleep method. So I'd say it's a fix, albeit one that
> introduced a regression.
> 
>> Regarding Intel Management Engine, only Intel knows what it does
>> and what the error is, as the ME firmware is proprietary and
>> closed.
>> 
>> Lastly, there is no way to fully disable the Intel Management
>> Engine. The HAP stuff claims to stop the Intel ME execution, but
>> nobody knows whether it’s successful.
>> 
>> Aaron, Kai-Heng, can you send the revert?
> 
> I consider it an important fix for s2idle; I don't think
> reverting is appropriate.

If there is a fix with no other regression, I agree. But there has not
been one, so please revert. It doesn’t matter if the commit introducing
the regression fixes something else. It gets too complicated to decide
which regression is worth it and which is not. The Linux kernel guarantees
its users that they can update at any time without breaking user space
(in this case, suspend/resume). Please read Linus’ several messages about
that. His message [1] addresses exactly your arguments.

> Yeah, HELL NO!
> 
> Guess what? You're wrong. YOU ARE MISSING THE #1 KERNEL RULE.
> 
> We do not regress, and we do not regress exactly because you are 100% wrong.
> 
> And the reason you state for your opinion is in fact exactly *WHY* you
> are wrong.
> 
> Your "good reasons" are pure and utter garbage.
> 
> The whole point of "we do not regress" is so that people can upgrade
> the kernel and never have to worry about it.
> 
>> Kernel had a bug which has been fixed
> 
> That is *ENTIRELY* immaterial.
> 
> Guys, whether something was buggy or not DOES NOT MATTER.

So, please (also Intel developers), please adhere to this rule, and 
revert the commit.


Kind regards,

Paul


[1]: https://lkml.org/lkml/2018/8/3/621
Sasha Neftin March 26, 2020, 2:34 p.m. UTC | #12
On 3/26/2020 13:41, Paul Menzel wrote:
> Dear Kai-Heng,
> 
> 
> Am 26.03.20 um 12:29 schrieb Kai-Heng Feng:
> 
>>> On Mar 25, 2020, at 23:49, Paul Menzel <pmenzel@molgen.mpg.de> wrote:
> 
>>> Am 25.03.20 um 14:58 schrieb Neftin, Sasha:
>>>> On 3/25/2020 08:43, Aaron Ma wrote:
>>>
>>>>> On 3/25/20 2:36 PM, Neftin, Sasha wrote:
>>>>>> On 3/25/2020 06:17, Kai-Heng Feng wrote:
>>>
>>>>>>>> On Mar 24, 2020, at 03:16, Aaron Ma <aaron.ma@canonical.com> wrote:
>>>>>>>>
>>>>>>>> ME takes 2+ seconds to un-configure ULP mode done after resume
>>>>>>>> from s2idle on some ThinkPad laptops.
>>>>>>>> Without enough wait, reset and re-init will fail with error.
>>>>>>>
>>>>>>> Thanks, this patch solves the issue. We can drop the DMI quirk in
>>>>>>> favor of this patch.
>>>>>>>
>>>>>>>> Fixes: f15bb6dde738cc8fa0 ("e1000e: Add support for S0ix")
>>>>>>>> BugLink: https://bugs.launchpad.net/bugs/1865570
>>>>>>>> Signed-off-by: Aaron Ma <aaron.ma@canonical.com>
>>>>>>>> ---
>>>>>>>> drivers/net/ethernet/intel/e1000e/ich8lan.c | 4 ++--
>>>>>>>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/drivers/net/ethernet/intel/e1000e/ich8lan.c
>>>>>>>> b/drivers/net/ethernet/intel/e1000e/ich8lan.c
>>>>>>>> index b4135c50e905..147b15a2f8b3 100644
>>>>>>>> --- a/drivers/net/ethernet/intel/e1000e/ich8lan.c
>>>>>>>> +++ b/drivers/net/ethernet/intel/e1000e/ich8lan.c
>>>>>>>> @@ -1240,9 +1240,9 @@ static s32 e1000_disable_ulp_lpt_lp(struct
>>>>>>>> e1000_hw *hw, bool force)
>>>>>>>>               ew32(H2ME, mac_reg);
>>>>>>>>           }
>>>>>>>>
>>>>>>>> -        /* Poll up to 300msec for ME to clear ULP_CFG_DONE. */
>>>>>>>> +        /* Poll up to 2.5sec for ME to clear ULP_CFG_DONE. */
>>>>>>>>           while (er32(FWSM) & E1000_FWSM_ULP_CFG_DONE) {
>>>>>>>> -            if (i++ == 30) {
>>>>>>>> +            if (i++ == 250) {
>>>>>>>>                   ret_val = -E1000_ERR_PHY;
>>>>>>>>                   goto out;
>>>>>>>>               }
>>>>>>>
>>>>>>> The return value is not checked by the caller, so the error goes
>>>>>>> unnoticed.
>>>>>>> Maybe the caller should check the return value of
>>>>>>> e1000_disable_ulp_lpt_lp()?
>>>
>>>>>> I am a bit confused. In our previous conversation you told me ME was
>>>>>> not running. Let me chime in. Increasing the delay won't solve the
>>>>>> problem; it will just mask it. We need to understand why ME takes so
>>>>>> much time. What is the problem with this specific system?
>>>>>> So basically, a system without ME should work for you.
>>>>>
>>>>> On some laptops the ME is active, which is why the issue only reproduces
>>>>> on some laptops. In this case the i219 is controlled by the ME.
>>>>
>>>> Who can explain why ME requires so much time on this system?
>>>> We probably need to work with the ME/BIOS vendor to understand it.
>>>> A delay will just mask the real problem; we need to find the root cause.
>>>>> The quirk only covers one model type, but the issue is reproduced on
>>>>> more models, so it won't work.
>>>
>>> (Where is Aaron’s reply? It wasn’t delivered to me yet.)
>>>
>>> As this is clearly a regression, please revert the commit for now,
>>> and then find a way to correctly implement S0ix support. Linux’
>>> regression policy demands that as no fix has been found since
>>> v5.5-rc1. Changing Intel ME settings, even if it would work around
>>> the issue, is not an acceptable solution. Delaying the resume time
>>> is also not a solution.
>>
>> The S0ix patch can reduce power consumption and finally makes s2idle an
>> acceptable sleep method. So I'd say it's a fix, albeit one that
>> introduced a regression.
>>
>>> Regarding Intel Management Engine, only Intel knows what it does
>>> and what the error is, as the ME firmware is proprietary and
>>> closed.
>>>
>>> Lastly, there is no way to fully disable the Intel Management
>>> Engine. The HAP stuff claims to stop the Intel ME execution, but
>>> nobody knows whether it’s successful.
>>>
>>> Aaron, Kai-Heng, can you send the revert?
>>
>> I consider it an important fix for s2idle; I don't think
>> reverting is appropriate.
> 
> If there is a fix with no other regression, I agree. But there has not
> been one, so please revert. It doesn’t matter if the commit introducing
> the regression fixes something else. It gets too complicated to decide
> which regression is worth it and which is not. The Linux kernel guarantees
> its users that they can update at any time without breaking user space
> (in this case, suspend/resume). Please read Linus’ several messages about
> that. His message [1] addresses exactly your arguments.
> 
Reverting is not an option. S0ix is supported on non-ME systems, as
approved by the Intel design team and the power management domain owner.
I thought vendors should provide a non-ME BIOS. Our PAE will work toward
meeting this.
>> Yeah, HELL NO!
>>
>> Guess what? You're wrong. YOU ARE MISSING THE #1 KERNEL RULE.
>>
>> We do not regress, and we do not regress exactly because you are 100% 
>> wrong.
>>
>> And the reason you state for your opinion is in fact exactly *WHY* you
>> are wrong.
>>
>> Your "good reasons" are pure and utter garbage.
>>
>> The whole point of "we do not regress" is so that people can upgrade
>> the kernel and never have to worry about it.
>>
>>> Kernel had a bug which has been fixed
>>
>> That is *ENTIRELY* immaterial.
>>
>> Guys, whether something was buggy or not DOES NOT MATTER.
> 
> So, please (also Intel developers), please adhere to this rule, and 
> revert the commit.
> 
> 
> Kind regards,
> 
> Paul
> 
> 
> [1]: https://lkml.org/lkml/2018/8/3/621
> _______________________________________________
> Intel-wired-lan mailing list
> Intel-wired-lan@osuosl.org
> https://lists.osuosl.org/mailman/listinfo/intel-wired-lan
Paul Menzel March 26, 2020, 6:37 p.m. UTC | #13
Dear Jeff, dear David,


Could you please comment as maintainers?


Am 26.03.20 um 15:34 schrieb Neftin, Sasha:
> On 3/26/2020 13:41, Paul Menzel wrote:

>> Am 26.03.20 um 12:29 schrieb Kai-Heng Feng:
>>
>>>> On Mar 25, 2020, at 23:49, Paul Menzel <pmenzel@molgen.mpg.de> wrote:
>>
>>>> Am 25.03.20 um 14:58 schrieb Neftin, Sasha:
>>>>> On 3/25/2020 08:43, Aaron Ma wrote:
>>>>
>>>>>> On 3/25/20 2:36 PM, Neftin, Sasha wrote:
>>>>>>> On 3/25/2020 06:17, Kai-Heng Feng wrote:
>>>>
>>>>>>>>> On Mar 24, 2020, at 03:16, Aaron Ma <aaron.ma@canonical.com> 
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> ME takes 2+ seconds to un-configure ULP mode done after resume
>>>>>>>>> from s2idle on some ThinkPad laptops.
>>>>>>>>> Without enough wait, reset and re-init will fail with error.
>>>>>>>>
>>>>>>>> Thanks, this patch solves the issue. We can drop the DMI quirk in
>>>>>>>> favor of this patch.
>>>>>>>>
>>>>>>>>> Fixes: f15bb6dde738cc8fa0 ("e1000e: Add support for S0ix")
>>>>>>>>> BugLink: https://bugs.launchpad.net/bugs/1865570
>>>>>>>>> Signed-off-by: Aaron Ma <aaron.ma@canonical.com>
>>>>>>>>> ---
>>>>>>>>> drivers/net/ethernet/intel/e1000e/ich8lan.c | 4 ++--
>>>>>>>>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>>>>>>>>
>>>>>>>>> diff --git a/drivers/net/ethernet/intel/e1000e/ich8lan.c
>>>>>>>>> b/drivers/net/ethernet/intel/e1000e/ich8lan.c
>>>>>>>>> index b4135c50e905..147b15a2f8b3 100644
>>>>>>>>> --- a/drivers/net/ethernet/intel/e1000e/ich8lan.c
>>>>>>>>> +++ b/drivers/net/ethernet/intel/e1000e/ich8lan.c
>>>>>>>>> @@ -1240,9 +1240,9 @@ static s32 e1000_disable_ulp_lpt_lp(struct
>>>>>>>>> e1000_hw *hw, bool force)
>>>>>>>>>               ew32(H2ME, mac_reg);
>>>>>>>>>           }
>>>>>>>>>
>>>>>>>>> -        /* Poll up to 300msec for ME to clear ULP_CFG_DONE. */
>>>>>>>>> +        /* Poll up to 2.5sec for ME to clear ULP_CFG_DONE. */
>>>>>>>>>           while (er32(FWSM) & E1000_FWSM_ULP_CFG_DONE) {
>>>>>>>>> -            if (i++ == 30) {
>>>>>>>>> +            if (i++ == 250) {
>>>>>>>>>                   ret_val = -E1000_ERR_PHY;
>>>>>>>>>                   goto out;
>>>>>>>>>               }
>>>>>>>>
>>>>>>>> The return value is not checked by the caller, so the error goes
>>>>>>>> unnoticed.
>>>>>>>> Maybe the caller should check the return value of
>>>>>>>> e1000_disable_ulp_lpt_lp()?
>>>>
>>>>>>> I am a bit confused. In our previous conversation you told me ME was
>>>>>>> not running. Let me chime in. Increasing the delay won't solve the
>>>>>>> problem; it will just mask it. We need to understand why ME takes so
>>>>>>> much time. What is the problem with this specific system?
>>>>>>> So basically, a system without ME should work for you.
>>>>>>
>>>>>> On some laptops the ME is active, which is why the issue only
>>>>>> reproduces on some laptops. In this case the i219 is controlled by the ME.
>>>>>
>>>>> Who can explain why ME requires so much time on this system?
>>>>> We probably need to work with the ME/BIOS vendor to understand it.
>>>>> A delay will just mask the real problem; we need to find the root cause.
>>>>>> The quirk only covers one model type, but the issue is reproduced on
>>>>>> more models, so it won't work.
>>>>
>>>> (Where is Aaron’s reply? It wasn’t delivered to me yet.)
>>>>
>>>> As this is clearly a regression, please revert the commit for now,
>>>> and then find a way to correctly implement S0ix support. Linux’
>>>> regression policy demands that as no fix has been found since
>>>> v5.5-rc1. Changing Intel ME settings, even if it would work around
>>>> the issue, is not an acceptable solution. Delaying the resume time
>>>> is also not a solution.
>>>
>>> The S0ix patch can reduce power consumption and finally makes s2idle an
>>> acceptable sleep method. So I'd say it's a fix, albeit one that
>>> introduced a regression.
>>>
>>>> Regarding Intel Management Engine, only Intel knows what it does
>>>> and what the error is, as the ME firmware is proprietary and
>>>> closed.
>>>>
>>>> Lastly, there is no way to fully disable the Intel Management
>>>> Engine. The HAP stuff claims to stop the Intel ME execution, but
>>>> nobody knows whether it’s successful.
>>>>
>>>> Aaron, Kai-Heng, can you send the revert?
>>>
>>> I consider it an important fix for s2idle; I don't think
>>> reverting is appropriate.
>>
>> If there is a fix with no other regression, I agree. But there has not
>> been one, so please revert. It doesn’t matter if the commit introducing
>> the regression fixes something else. It gets too complicated to decide
>> which regression is worth it and which is not. The Linux kernel guarantees
>> its users that they can update at any time without breaking user space
>> (in this case, suspend/resume). Please read Linus’ several messages about
>> that. His message [1] addresses exactly your arguments.
>>
> Reverting is not an option. S0ix is supported on non-ME systems, as
> approved by the Intel design team and the power management domain owner.
> I thought vendors should provide a non-ME BIOS. Our PAE will work toward
> meeting this.

Did you read Linus’ messages? It doesn’t matter.

Requiring people to change system firmware settings is a no-go.

>>> Yeah, HELL NO!
>>>
>>> Guess what? You're wrong. YOU ARE MISSING THE #1 KERNEL RULE.
>>>
>>> We do not regress, and we do not regress exactly because you are 
>>> 100% wrong.
>>>
>>> And the reason you state for your opinion is in fact exactly *WHY* you
>>> are wrong.
>>>
>>> Your "good reasons" are pure and utter garbage.
>>>
>>> The whole point of "we do not regress" is so that people can upgrade
>>> the kernel and never have to worry about it.
>>>
>>>> Kernel had a bug which has been fixed
>>>
>>> That is *ENTIRELY* immaterial.
>>>
>>> Guys, whether something was buggy or not DOES NOT MATTER.
>>
>> So, please (also Intel developers), please adhere to this rule, and 
>> revert the commit.


Kind regards,

Paul


>> [1]: https://lkml.org/lkml/2018/8/3/621
David Laight March 28, 2020, 10:55 a.m. UTC | #14
From: Kai-Heng Feng
> Sent: 26 March 2020 11:30
...
> > Regarding Intel Management Engine, only Intel knows what it does and what the error is, as the ME
> > firmware is proprietary and closed.
> >
> > Lastly, there is no way to fully disable the Intel Management Engine.
> > The HAP stuff claims to stop the Intel ME execution, but nobody knows
> > whether it’s successful.

This isn't the only 'bug' caused by the ME logic.

Some systems occasionally spin for many multiples of 50us on any
write to any MAC register - eg to indicate there is a packet to tx.

I really don't understand WTF this ME is playing at on an unmanaged
desktop system - if it receives or sends a packet it is most likely
to be some kind of security attack.
I'm not even sure it needs access during the boot sequence.
Maybe there are some features to get the console output over
ethernet - but they have to be enabled in the BIOS.

We have some small server boards (for 1U systems) that have a
separate ethernet interface for (I think) the ME code.
Better, except you plug a cable in and wonder why it doesn't work.

	David

Hans de Goede April 2, 2020, 12:31 p.m. UTC | #15
Hi,

On 3/23/20 8:16 PM, Aaron Ma wrote:
> ME takes 2+ seconds to un-configure ULP mode done after resume
> from s2idle on some ThinkPad laptops.
> Without enough wait, reset and re-init will fail with error.
> 
> Fixes: f15bb6dde738cc8fa0 ("e1000e: Add support for S0ix")
> BugLink: https://bugs.launchpad.net/bugs/1865570
> Signed-off-by: Aaron Ma <aaron.ma@canonical.com>

I have been testing this bug because this is being reported against
Fedora 32 too:

https://bugzilla.redhat.com/show_bug.cgi?id=1816621

I can confirm that this patch fixes the problem of both
an X1 7th gen and an X1 8th gen no longer suspending after
a suspend/resume cycle.

Not only does it fix that, before this patch the kernel
would regularly log the following error on these laptops
independent of suspend/resume activity:

e1000e 0000:00:1f.6 enp0s31f6: Hardware Error

These messages are now also gone. So it seems that the timeout
is really just too short.

I can agree that it would be good to better understand this;
and/or to get the ME firmware fixed to not take so long.

But in my experience, when dealing with e.g. embedded controllers
in various laptops, sometimes the firmware of these devices
simply takes a long time for certain things.

This fixes a real problem on a popular laptop model,
and since it just extends a timeout, it is a pretty harmless
(no chance of regressions) fix. As such, since there seems
to be no other solution in sight, can we please move forward
with this fix for now?

Regards,

Hans





> ---
>   drivers/net/ethernet/intel/e1000e/ich8lan.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/e1000e/ich8lan.c b/drivers/net/ethernet/intel/e1000e/ich8lan.c
> index b4135c50e905..147b15a2f8b3 100644
> --- a/drivers/net/ethernet/intel/e1000e/ich8lan.c
> +++ b/drivers/net/ethernet/intel/e1000e/ich8lan.c
> @@ -1240,9 +1240,9 @@ static s32 e1000_disable_ulp_lpt_lp(struct e1000_hw *hw, bool force)
>   			ew32(H2ME, mac_reg);
>   		}
>   
> -		/* Poll up to 300msec for ME to clear ULP_CFG_DONE. */
> +		/* Poll up to 2.5sec for ME to clear ULP_CFG_DONE. */
>   		while (er32(FWSM) & E1000_FWSM_ULP_CFG_DONE) {
> -			if (i++ == 30) {
> +			if (i++ == 250) {
>   				ret_val = -E1000_ERR_PHY;
>   				goto out;
>   			}
>
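To make the numbers in the patch concrete, here is a minimal user-space sketch (not driver code) of the same poll loop. It assumes each iteration sleeps about 10 ms, which is what the old comment's pairing of 30 iterations with 300 msec implies, and it models the ME as a hypothetical flag that clears after a configurable delay. `fake_ulp_cfg_done()` and `poll_ulp_cfg_done()` are made-up names, and time is simulated rather than slept, so it runs instantly.

```c
#include <stdbool.h>

/* Assumed sleep per iteration: the old comment pairs 30 iterations with
 * 300 msec, i.e. roughly 10 ms per pass. */
#define POLL_INTERVAL_MS 10

/* Hypothetical stand-in for "er32(FWSM) & E1000_FWSM_ULP_CFG_DONE": the
 * bit stays set until the simulated ME finishes at me_delay_ms. */
static bool fake_ulp_cfg_done(int now_ms, int me_delay_ms)
{
	return now_ms < me_delay_ms;
}

/* Same loop shape as the patched code. Returns the simulated time (ms)
 * at which the bit cleared, or -1 when the iteration limit is hit. */
static int poll_ulp_cfg_done(int limit, int me_delay_ms)
{
	int i = 0;
	int now_ms = 0;

	while (fake_ulp_cfg_done(now_ms, me_delay_ms)) {
		if (i++ == limit)
			return -1;              /* driver: ret_val = -E1000_ERR_PHY */
		now_ms += POLL_INTERVAL_MS;     /* simulated sleep, no real delay */
	}
	return now_ms;
}
```

With an ME that needs 2100 ms (an assumed value for "2+ seconds"), `poll_ulp_cfg_done(30, 2100)` gives up with -1 after 300 ms of simulated waiting, while `poll_ulp_cfg_done(250, 2100)` returns 2100. If the bit never clears, the `i++ == 250` check fires after 250 sleeps, about 2.5 s, which matches the new comment; real sleeps can overshoot, so the wall-clock bound in the driver is likely somewhat longer.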
Aaron Ma April 3, 2020, 3:15 a.m. UTC | #16
Hi Jeffrey:

I received the email saying that you applied this patch to the next-queue
branch dev-queue.

But after this branch was rebased to v5.6, I can't find it.

Will you apply it again?

Thanks,
Aaron

On 4/2/20 8:31 PM, Hans de Goede wrote:
> 
> This fixes a real problem on a popular laptop model,
> and since it just extends a timeout, it is a pretty harmless
> (no chance of regressions) fix. As such, since there seems
> to be no other solution in sight, can we please move forward
> with this fix for now?
Paul Menzel April 3, 2020, 7:37 a.m. UTC | #17
Dear Linux folks,


Am 03.04.20 um 05:15 schrieb Aaron Ma:

> I received the email saying that you applied this patch to the next-queue
> branch dev-queue.
> 
> But after this branch was rebased to v5.6, I can't find it.
> 
> Will you apply it again?

I really urge you to write more elaborate commit messages.

The exact error is not listed. The known regressed devices are not
listed, so potential testers cannot easily test it, and affected people
cannot find it when searching the Linux git history.

How did you find out that ME takes more than two seconds?

Also, it’s not clear what effect increasing the timeout has. Does the
whole resume process take longer, or just getting the Ethernet device
back up?

Lastly, in case of the timeout, an error message should be printed, so
that people with even more broken ME devices get a clue.

Without this information, how can anybody know whether this is the right
fix, and how can distributions know whether it should be backported?


Kind regards,

Paul
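Paul's request for an error message on timeout, together with Kai-Heng's earlier remark that the caller ignores the return value of `e1000_disable_ulp_lpt_lp()`, could look roughly like the following user-space sketch. Everything in it is hypothetical: `read_ulp_cfg_done()` stands in for the FWSM read, `disable_ulp()` for the driver function, `resume_nic()` for its caller, `fprintf(stderr, ...)` for whatever logging helper the driver would actually use, and `ERR_ULP_TIMEOUT` is a placeholder rather than the driver's `-E1000_ERR_PHY`.

```c
#include <stdbool.h>
#include <stdio.h>

#define ULP_POLL_LIMIT   250   /* iterations, as in the patch */
#define ULP_POLL_STEP_MS 10    /* assumed sleep per iteration */
#define ERR_ULP_TIMEOUT  (-1)  /* hypothetical error code */

/* Hypothetical register model: the done bit stays set for the next
 * busy_polls reads, then reads as cleared. */
static int busy_polls;

static bool read_ulp_cfg_done(void)
{
	if (busy_polls > 0) {
		busy_polls--;
		return true;
	}
	return false;
}

/* Poll for the ME to clear the bit; on timeout, log it instead of
 * failing silently. */
static int disable_ulp(void)
{
	int i = 0;

	while (read_ulp_cfg_done()) {
		if (i++ == ULP_POLL_LIMIT) {
			fprintf(stderr,
				"ULP_CFG_DONE still set after ~%d ms, ME not responding\n",
				ULP_POLL_LIMIT * ULP_POLL_STEP_MS);
			return ERR_ULP_TIMEOUT;
		}
		/* a real driver would sleep ~ULP_POLL_STEP_MS here */
	}
	return 0;
}

/* The caller checks the result rather than discarding it. */
static int resume_nic(void)
{
	int ret = disable_ulp();

	if (ret) {
		fprintf(stderr, "resume: disabling ULP failed (%d)\n", ret);
		return ret;
	}
	/* ... reset and re-init would follow here ... */
	return 0;
}
```

Whether a timeout should abort the resume or only warn and continue is a policy question the thread does not settle; the sketch only shows where the message and the return-value check would go.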
Sasha Neftin April 5, 2020, 6:46 a.m. UTC | #18
On 4/3/2020 06:15, Aaron Ma wrote:
> Hi Jeffrey:
> 
> I received the email saying that you applied this patch to the next-queue
> branch dev-queue.
> 
> But after this branch was rebased to v5.6, I can't find it.
> 
> Will you apply it again?
Aaron, Kai,
The S0ix flow is supported only on non-ME systems. Our PAE is working to
communicate this to OS vendors. You should get a BIOS option to disable ME
on your system.
This fix will just mask the real problem on a specific system; it won't
solve the problem. I suggest recalling this patch and putting the Lenovo
Carbon in the DMI blacklist.
> 
> Thanks,
> Aaron
> 
> On 4/2/20 8:31 PM, Hans de Goede wrote:
>>
>> This fixes a real problem on a popular laptop model,
>> and since it just extends a timeout, it is a pretty harmless
>> (no chance of regressions) fix. As such, since there seems
>> to be no other solution in sight, can we please move forward
>> with this fix for now?
Thanks,
Sasha

Patch

diff --git a/drivers/net/ethernet/intel/e1000e/ich8lan.c b/drivers/net/ethernet/intel/e1000e/ich8lan.c
index b4135c50e905..147b15a2f8b3 100644
--- a/drivers/net/ethernet/intel/e1000e/ich8lan.c
+++ b/drivers/net/ethernet/intel/e1000e/ich8lan.c
@@ -1240,9 +1240,9 @@  static s32 e1000_disable_ulp_lpt_lp(struct e1000_hw *hw, bool force)
 			ew32(H2ME, mac_reg);
 		}
 
-		/* Poll up to 300msec for ME to clear ULP_CFG_DONE. */
+		/* Poll up to 2.5sec for ME to clear ULP_CFG_DONE. */
 		while (er32(FWSM) & E1000_FWSM_ULP_CFG_DONE) {
-			if (i++ == 30) {
+			if (i++ == 250) {
 				ret_val = -E1000_ERR_PHY;
 				goto out;
 			}