e1000e: Ignore TSYNCRXCTL when getting I219 clock attributes

Message ID 20180510072835.5549-1-bpoirier@suse.com
State Accepted
Delegated to: Jeff Kirsher
Series
  • e1000e: Ignore TSYNCRXCTL when getting I219 clock attributes

Commit Message

Benjamin Poirier May 10, 2018, 7:28 a.m.
There have been multiple reports of crashes that look like
kernel: RIP: 0010:[<ffffffff8110303f>] timecounter_read+0xf/0x50
[...]
kernel: Call Trace:
kernel:  [<ffffffffa0806b0f>] e1000e_phc_gettime+0x2f/0x60 [e1000e]
kernel:  [<ffffffffa0806c5d>] e1000e_systim_overflow_work+0x1d/0x80 [e1000e]
kernel:  [<ffffffff810992c5>] process_one_work+0x155/0x440
kernel:  [<ffffffff81099e16>] worker_thread+0x116/0x4b0
kernel:  [<ffffffff8109f422>] kthread+0xd2/0xf0
kernel:  [<ffffffff8163184f>] ret_from_fork+0x3f/0x70

These crashes occur because e1000e_systim_reset() skips the
timecounter_init() call if e1000e_get_base_timinca() returns -EINVAL, which
leads to a NULL pointer dereference in timecounter_read().
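
For illustration, the failure mode can be modelled in user space. This is a
minimal sketch, not driver code: the structs are simplified stand-ins for
the kernel's cyclecounter and timecounter, and every model_* name is
hypothetical. It assumes the timecounter starts out zeroed, so its
cyclecounter pointer stays NULL whenever the init call is skipped.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the kernel structures; not the real definitions. */
struct cyclecounter {
	uint64_t (*read)(const struct cyclecounter *cc);
};

struct timecounter {
	const struct cyclecounter *cc;	/* stays NULL until init runs */
	uint64_t cycle_last;
};

/* Hypothetical counter source that always reports the same cycle count. */
uint64_t model_systim_read(const struct cyclecounter *cc)
{
	(void)cc;
	return 42;
}

/* Mirrors the idea of timecounter_init(): bind the counter, take a snapshot. */
void model_timecounter_init(struct timecounter *tc,
			    const struct cyclecounter *cc)
{
	tc->cc = cc;
	tc->cycle_last = cc->read(cc);
}

/*
 * Model of the reset path: when the timinca lookup reports an error,
 * return early and never reach the init call.
 */
int model_systim_reset(struct timecounter *tc, const struct cyclecounter *cc,
		       int timinca_ret)
{
	if (timinca_ret)
		return timinca_ret;

	model_timecounter_init(tc, cc);
	return 0;
}

/* A later read would call tc->cc->read(); report whether that is safe. */
int model_timecounter_usable(const struct timecounter *tc)
{
	return tc->cc != NULL && tc->cc->read != NULL;
}
```

Calling model_systim_reset() with -EINVAL on a zeroed timecounter leaves
model_timecounter_usable() returning 0, which corresponds to the state in
which the periodic overflow work would dereference a NULL pointer.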

Commit 83129b37ef35 ("e1000e: fix systim issues", v4.2-rc1) reworked
e1000e_get_base_timinca() in such a way that it can return -EINVAL for
e1000_pch_spt if the SYSCFI bit is not set in TSYNCRXCTL.

Some experimentation has shown that on I219 (e1000_pch_spt, "MAC: 12")
adapters, the E1000_TSYNCRXCTL_SYSCFI flag is unstable; TSYNCRXCTL reads
sometimes don't have the SYSCFI bit set. Retrying the read shortly after
finds the bit to be set. This was observed at boot (probe) as well as at
link up and link down.

Moreover, the PHC (PTP Hardware Clock) seems to operate normally even after
reads where SYSCFI=0. Therefore, remove this register read and
unconditionally set the clock parameters.
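
The behavioural difference can be sketched outside the driver. This is a
hedged user-space model, not the patched function: the model_* names are
hypothetical, and the MODEL_* constants are assumed values chosen for
illustration, so check the driver headers for the real ones. The "before"
variant depends on the SYSCFI bit in the register value it is given; the
"after" variant ignores it.

```c
#include <errno.h>
#include <stdint.h>

/* Assumed values for illustration; the driver headers are authoritative. */
#define MODEL_TSYNCRXCTL_SYSCFI		0x00000020u
#define MODEL_INCPERIOD_24MHZ		3u
#define MODEL_INCVALUE_24MHZ		125u
#define MODEL_INCVALUE_SHIFT_24MHZ	14u

struct model_clock_params {
	uint32_t incperiod;
	uint32_t incvalue;
	uint32_t shift;
};

/* Fill in the fixed 24MHz parameters. */
void model_set_24mhz(struct model_clock_params *p)
{
	p->incperiod = MODEL_INCPERIOD_24MHZ;
	p->incvalue = MODEL_INCVALUE_24MHZ;
	p->shift = MODEL_INCVALUE_SHIFT_24MHZ;
}

/* Before: a register read without SYSCFI set turns into -EINVAL. */
int model_timinca_before(uint32_t tsyncrxctl, struct model_clock_params *p)
{
	if (tsyncrxctl & MODEL_TSYNCRXCTL_SYSCFI) {
		model_set_24mhz(p);
		return 0;
	}
	return -EINVAL;
}

/* After: the register value is ignored and the parameters are always set. */
int model_timinca_after(uint32_t tsyncrxctl, struct model_clock_params *p)
{
	(void)tsyncrxctl;
	model_set_24mhz(p);
	return 0;
}
```

With a register value of 0, model_timinca_before() fails and leaves the
parameters untouched, while model_timinca_after() succeeds and yields the
same parameters the old code produced when the bit happened to be set.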

Reported-by: Achim Mildenberger <admin@fph.physik.uni-karlsruhe.de>
Message-Id: <20180425065243.g5mqewg5irkwgwgv@f2>
Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=1075876
Fixes: 83129b37ef35 ("e1000e: fix systim issues")
Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
---
 drivers/net/ethernet/intel/e1000e/netdev.c | 15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)

Comments

Keller, Jacob E May 10, 2018, 6:42 p.m. | #1

Given testing showing that the clock operates fine regardless of the register read, I think this is probably fine. Normally I believe the register was used to check which frequency was in use, but it doesn't seem to serve that purpose here.

Thanks,
Jake
Sasha Neftin May 13, 2018, 6:55 a.m. | #2
On 5/10/2018 21:42, Keller, Jacob E wrote:
> Given testing showing that the clock operates fine regardless of the register read, I think this is probably fine. Normally I believe the register was used to check which frequency was in use, but it doesn't seem to serve that purpose here.
> 
> Thanks,
> Jake
> 
I've checked our specification; it looks like only 24MHz is used for this
product. Hopefully no platform with a different clock has been
distributed. So, let's pick up this change.
Brown, Aaron F May 23, 2018, 12:44 a.m. | #3

Tested-by: Aaron Brown <aaron.f.brown@intel.com>

Patch

diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
index ec4a9759a6f2..3afb1f3b6f91 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -3546,15 +3546,12 @@  s32 e1000e_get_base_timinca(struct e1000_adapter *adapter, u32 *timinca)
 		}
 		break;
 	case e1000_pch_spt:
-		if (er32(TSYNCRXCTL) & E1000_TSYNCRXCTL_SYSCFI) {
-			/* Stable 24MHz frequency */
-			incperiod = INCPERIOD_24MHZ;
-			incvalue = INCVALUE_24MHZ;
-			shift = INCVALUE_SHIFT_24MHZ;
-			adapter->cc.shift = shift;
-			break;
-		}
-		return -EINVAL;
+		/* Stable 24MHz frequency */
+		incperiod = INCPERIOD_24MHZ;
+		incvalue = INCVALUE_24MHZ;
+		shift = INCVALUE_SHIFT_24MHZ;
+		adapter->cc.shift = shift;
+		break;
 	case e1000_pch_cnp:
 		if (er32(TSYNCRXCTL) & E1000_TSYNCRXCTL_SYSCFI) {
 			/* Stable 24MHz frequency */