[v1] e1000e: allow non-monotonic SYSTIM readings

Message ID 20181023123739.12489-1-mlichvar@redhat.com
State Accepted
Delegated to: Jeff Kirsher

Commit Message

Miroslav Lichvar Oct. 23, 2018, 12:37 p.m. UTC
It seems that with some NICs supported by the e1000e driver, a SYSTIM
reading may occasionally be a few microseconds before the previous reading
and, if enabled, also pass e1000e_sanitize_systim() without reaching the maximum
number of rereads, even if the function is modified to check three
consecutive readings (i.e. it doesn't look like a double read error).
This causes an underflow in the timecounter and the PHC time jumps hours
ahead.

This was observed on 82574, I217 and I219. The fastest way to reproduce
it is to run a program that continuously calls the PTP_SYS_OFFSET ioctl
on the PHC.

Modify e1000e_phc_gettime() to use timecounter_cyc2time() instead of
timecounter_read() in order to allow non-monotonic SYSTIM readings and
prevent the PHC from jumping.

Cc: Jacob Keller <jacob.e.keller@intel.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com>
---

Notes:
    RFC->v1:
    - Removed unnecessary call of PTP gettime64() in
      e1000e_systim_overflow_work()

 drivers/net/ethernet/intel/e1000e/ptp.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)
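
The difference between the two timecounter calls named in the commit message can be modelled in user space. This is only a rough sketch: the struct and function names are hypothetical, the scaling is reduced to a plain shift, and fractional nanoseconds and the counter mask are ignored. It is meant to show why an out-of-order reading makes a "read"-style update jump far ahead while a "cyc2time"-style conversion does not.

```c
#include <stdint.h>

/* Simplified user-space model of a timecounter. All names here are
 * hypothetical; fractional nanoseconds and the counter mask are ignored.
 */
struct tc_model {
	uint64_t cycle_last;	/* counter value at the last update */
	uint64_t nsec;		/* time in ns that corresponds to cycle_last */
	unsigned int shift;	/* assumed scaling: 2^shift counter units per ns */
};

/* Modelled on timecounter_read(): always assumes `now` is newer than
 * cycle_last and advances the stored state.
 */
static uint64_t tc_model_read(struct tc_model *tc, uint64_t now)
{
	uint64_t delta = now - tc->cycle_last;	/* wraps if now is older */

	tc->nsec += delta >> tc->shift;
	tc->cycle_last = now;
	return tc->nsec;
}

/* Modelled on timecounter_cyc2time(): converts without touching the state
 * and treats a huge forward delta as a small step into the past.
 */
static uint64_t tc_model_cyc2time(const struct tc_model *tc, uint64_t cycles)
{
	uint64_t delta = cycles - tc->cycle_last;

	if (delta > UINT64_MAX / 2)
		return tc->nsec - ((tc->cycle_last - cycles) >> tc->shift);
	return tc->nsec + (delta >> tc->shift);
}
```

With an assumed shift of 18, a reading that is 2 microseconds older than cycle_last makes tc_model_read() advance the time by 2^46 - 2000 ns, roughly 19.5 hours, whereas tc_model_cyc2time() returns a time 2000 ns before the stored one.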

Comments

Jacob Keller Oct. 23, 2018, 4:32 p.m. UTC | #1
> -----Original Message-----
> From: Miroslav Lichvar [mailto:mlichvar@redhat.com]
> Sent: Tuesday, October 23, 2018 5:38 AM
> To: intel-wired-lan@lists.osuosl.org
> Cc: Miroslav Lichvar <mlichvar@redhat.com>; Keller, Jacob E
> <jacob.e.keller@intel.com>; Richard Cochran <richardcochran@gmail.com>
> Subject: [PATCH v1] e1000e: allow non-monotonic SYSTIM readings
> 
> It seems that with some NICs supported by the e1000e driver, a SYSTIM
> reading may occasionally be a few microseconds before the previous reading
> and, if enabled, also pass e1000e_sanitize_systim() without reaching the maximum
> number of rereads, even if the function is modified to check three
> consecutive readings (i.e. it doesn't look like a double read error).
> This causes an underflow in the timecounter and the PHC time jumps hours
> ahead.
> 

Weird issue, but I think this is a better solution than returning garbage time data like we were before.

> This was observed on 82574, I217 and I219. The fastest way to reproduce
> it is to run a program that continuously calls the PTP_SYS_OFFSET ioctl
> on the PHC.
> 
> Modify e1000e_phc_gettime() to use timecounter_cyc2time() instead of
> timecounter_read() in order to allow non-monotonic SYSTIM readings and
> prevent the PHC from jumping.
> 

Thanks for the patch. This looks good to me.

Acked-by: Jacob Keller <jacob.e.keller@intel.com>

> Cc: Jacob Keller <jacob.e.keller@intel.com>
> Cc: Richard Cochran <richardcochran@gmail.com>
> Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com>
> ---
> 
> Notes:
>     RFC->v1:
>     - Removed unnecessary call of PTP gettime64() in
>       e1000e_systim_overflow_work()
> 
>  drivers/net/ethernet/intel/e1000e/ptp.c | 13 ++++++++++---
>  1 file changed, 10 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/e1000e/ptp.c b/drivers/net/ethernet/intel/e1000e/ptp.c
> index 37c76945ad9b..e1f821edbc21 100644
> --- a/drivers/net/ethernet/intel/e1000e/ptp.c
> +++ b/drivers/net/ethernet/intel/e1000e/ptp.c
> @@ -173,10 +173,14 @@ static int e1000e_phc_gettime(struct ptp_clock_info *ptp, struct timespec64 *ts)
>  	struct e1000_adapter *adapter = container_of(ptp, struct e1000_adapter,
>  						     ptp_clock_info);
>  	unsigned long flags;
> -	u64 ns;
> +	u64 cycles, ns;
> 
>  	spin_lock_irqsave(&adapter->systim_lock, flags);
> -	ns = timecounter_read(&adapter->tc);
> +
> +	/* Use timecounter_cyc2time() to allow non-monotonic SYSTIM readings */
> +	cycles = adapter->cc.read(&adapter->cc);
> +	ns = timecounter_cyc2time(&adapter->tc, cycles);
> +
>  	spin_unlock_irqrestore(&adapter->systim_lock, flags);
> 
>  	*ts = ns_to_timespec64(ns);
> @@ -232,9 +236,12 @@ static void e1000e_systim_overflow_work(struct work_struct *work)
>  						     systim_overflow_work.work);
>  	struct e1000_hw *hw = &adapter->hw;
>  	struct timespec64 ts;
> +	u64 ns;
> 
> -	adapter->ptp_clock_info.gettime64(&adapter->ptp_clock_info, &ts);
> +	/* Update the timecounter */
> +	ns = timecounter_read(&adapter->tc);
> 
> +	ts = ns_to_timespec64(ns);
>  	e_dbg("SYSTIM overflow check at %lld.%09lu\n",
>  	      (long long) ts.tv_sec, ts.tv_nsec);
> 
> --
> 2.17.1
Miroslav Lichvar Oct. 24, 2018, 9:46 a.m. UTC | #2
On Tue, Oct 23, 2018 at 04:32:50PM +0000, Keller, Jacob E wrote:
> > It seems that with some NICs supported by the e1000e driver, a SYSTIM
> > reading may occasionally be a few microseconds before the previous reading
> > and, if enabled, also pass e1000e_sanitize_systim() without reaching the maximum
> > number of rereads, even if the function is modified to check three
> > consecutive readings (i.e. it doesn't look like a double read error).
> > This causes an underflow in the timecounter and the PHC time jumps hours
> > ahead.
> > 
> 
> Weird issue, but I think this is a better solution than returning garbage time data like we were before.

It is indeed a weird issue. I think one explanation could be a double
overflow of SYSTIML with the unreliable latching of SYSTIMH.

If my math is right, depending on the frequency of the clock the
SYSTIML register overflows about every 8, 16, or 262 microseconds.
That seems too short to reliably contain reading of two registers.
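
The periods quoted above are consistent with a 64-bit counter whose low word wraps every 2^32 units while the counter advances by 2^k units per nanosecond. A minimal sketch of that arithmetic follows. The function name is hypothetical, and the exponents 19, 18 and 14 are assumptions picked to reproduce the 8, 16 and 262 microsecond figures; they are not stated in this thread.

```c
#include <stdint.h>

/* Nanoseconds between SYSTIML wraps, assuming the 64-bit counter advances
 * by 2^frac_bits units per nanosecond (hypothetical scaling parameter,
 * expected to be well below 32).
 */
static uint64_t systiml_wrap_period_ns(unsigned int frac_bits)
{
	return (UINT64_C(1) << 32) >> frac_bits;
}
```

Under those assumptions, systiml_wrap_period_ns(19) is 8192 ns, systiml_wrap_period_ns(18) is 16384 ns, and systiml_wrap_period_ns(14) is 262144 ns.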

Let's say the first reading of SYSTIML is 0xffff0000 and the second
reading is 0xff000000. An overflow is detected. But before SYSTIMH is
read for the second time, another overflow may happen, which will
cause the returned time to be ahead of the true PHC time and the next
correct reading may be out-of-order.
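
The scenario above can be replayed with a small model. This is a simplified sketch of a "read low, read high, re-read on wrap" sequence, not the driver's actual code; the function name and the idea of passing in the true counter value at each register access are assumptions made for illustration.

```c
#include <stdint.h>

/* t[0..3] are the true 64-bit counter values at the moments of the four
 * register accesses: SYSTIML, SYSTIMH, SYSTIML again, SYSTIMH again.
 * Returns the value the reader would assemble from those accesses.
 */
static uint64_t systim_read_model(const uint64_t t[4])
{
	uint32_t lo = (uint32_t)t[0];
	uint32_t hi = (uint32_t)(t[1] >> 32);
	uint32_t lo2 = (uint32_t)t[2];

	if (lo > lo2) {
		/* SYSTIML wrapped: keep the second low word, re-read high */
		hi = (uint32_t)(t[3] >> 32);
		lo = lo2;
	}
	return ((uint64_t)hi << 32) | lo;
}
```

For example, with the low word at 0xffff0000 on the first read, 0xff000000 on the second read (one wrap later), and the high word read only after a second wrap, the assembled value combines the newest high word with a stale, nearly full low word. It is therefore ahead of the true counter at the time of the last access, and a correct reading taken shortly afterwards compares as older.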

I'm wondering whether the commit 37b12910 ("e1000e: Fix tight loop
implementation of systime read algorithm") made this more likely to
happen (if it really is what happens).

The best fix might be to use a much smaller INCVALUE, so that the
double overflow cannot happen, and implement the frequency adjustment
in software, similarly to the system clock. This could be reused in
other drivers that don't support a one-step clock in order to simplify
their code.
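
One way to picture a software frequency adjustment is to leave the hardware counter free-running and scale the cycles-to-nanoseconds multiplier instead. The sketch below assumes a mult/shift conversion; the function names and the example numbers are hypothetical and are not taken from the driver or from any proposed patch.

```c
#include <stdint.h>

/* Scale a base multiplier by a frequency offset given in parts per billion.
 * Assumes |ppb| < 1e9 so that the result stays positive.
 */
static uint32_t sw_adjust_mult(uint32_t base_mult, int32_t ppb)
{
	int64_t diff = (int64_t)base_mult * ppb / 1000000000;

	return (uint32_t)((int64_t)base_mult + diff);
}

/* Convert a cycle delta to nanoseconds; the caller must keep
 * cycles * mult below 2^64.
 */
static uint64_t sw_cycles_to_ns(uint64_t cycles, uint32_t mult,
				unsigned int shift)
{
	return (cycles * mult) >> shift;
}
```

For an assumed 25 MHz counter (40 ns per cycle) with shift 20 and a base multiplier of 40 << 20, an adjustment of +100000 ppb makes 25,000,000 cycles convert to slightly more than one second (about 100 microseconds more, minus rounding), while the hardware increment stays untouched.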
Jacob Keller Oct. 24, 2018, 5:51 p.m. UTC | #3
> -----Original Message-----
> From: Miroslav Lichvar [mailto:mlichvar@redhat.com]
> Sent: Wednesday, October 24, 2018 2:46 AM
> To: Keller, Jacob E <jacob.e.keller@intel.com>
> Cc: intel-wired-lan@lists.osuosl.org; Richard Cochran <richardcochran@gmail.com>
> Subject: Re: [PATCH v1] e1000e: allow non-monotonic SYSTIM readings
> 
> > Weird issue, but I think this is a better solution than returning garbage time data
> like we were before.
> 
> It is indeed a weird issue. I think one explanation could be a double
> overflow of SYSTIML with the unreliable latching of SYSTIMH.
> 

Makes some sense.

> If my math is right, depending on the frequency of the clock the
> SYSTIML register overflows about every 8, 16, or 262 microseconds.
> That seems too short to reliably contain reading of two registers.
> 

Right. In theory the hardware is supposed to be latching the values, but we know that's problematic on some of the parts in this driver.

> Let's say the first reading of SYSTIML is 0xffff0000 and the second
> reading is 0xff000000. An overflow is detected. But before SYSTIMH is
> read for the second time, another overflow may happen, which will
> cause the returned time to be ahead of the true PHC time and the next
> correct reading may be out-of-order.
> 

Right.

> I'm wondering whether the commit 37b12910 ("e1000e: Fix tight loop
> implementation of systime read algorithm") made this more likely to
> happen (if it really is what happens).

If your analysis is correct, that makes sense.

> 
> The best fix might be to use a much smaller INCVALUE, so that the
> double overflow cannot happen, and implement the frequency adjustment
> in software, similarly to the system clock. This could be reused in
> other drivers that don't support a one-step clock in order to simplify
> their code.
> 

This makes sense. Especially since we already use a timecounter, we already don't report exactly what the hardware register indicates. This can be confusing if using hardware timer controls, or if some setup tries to read timestamps out-of-band from the PTP clock interface. But I don't think that's a major concern if we're already using a timecounter.

Thanks,
Jake

> --
> Miroslav Lichvar
Brown, Aaron F Nov. 3, 2018, 2:10 a.m. UTC | #4
> From: Intel-wired-lan [mailto:intel-wired-lan-bounces@osuosl.org] On
> Behalf Of Miroslav Lichvar
> Sent: Tuesday, October 23, 2018 5:38 AM
> To: intel-wired-lan@lists.osuosl.org
> Cc: Richard Cochran <richardcochran@gmail.com>
> Subject: [Intel-wired-lan] [PATCH v1] e1000e: allow non-monotonic SYSTIM
> readings
> 
> It seems that with some NICs supported by the e1000e driver, a SYSTIM
> reading may occasionally be a few microseconds before the previous reading
> and, if enabled, also pass e1000e_sanitize_systim() without reaching the maximum
> number of rereads, even if the function is modified to check three
> consecutive readings (i.e. it doesn't look like a double read error).
> This causes an underflow in the timecounter and the PHC time jumps hours
> ahead.
> 
> This was observed on 82574, I217 and I219. The fastest way to reproduce
> it is to run a program that continuously calls the PTP_SYS_OFFSET ioctl
> on the PHC.
> 
> Modify e1000e_phc_gettime() to use timecounter_cyc2time() instead of
> timecounter_read() in order to allow non-monotonic SYSTIM readings and
> prevent the PHC from jumping.
> 
> Cc: Jacob Keller <jacob.e.keller@intel.com>
> Cc: Richard Cochran <richardcochran@gmail.com>
> Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com>
> ---
> 
> Notes:
>     RFC->v1:
>     - Removed unnecessary call of PTP gettime64() in
>       e1000e_systim_overflow_work()
> 
>  drivers/net/ethernet/intel/e1000e/ptp.c | 13 ++++++++++---
>  1 file changed, 10 insertions(+), 3 deletions(-)
> 

Tested-by: Aaron Brown <aaron.f.brown@intel.com>

Patch

diff --git a/drivers/net/ethernet/intel/e1000e/ptp.c b/drivers/net/ethernet/intel/e1000e/ptp.c
index 37c76945ad9b..e1f821edbc21 100644
--- a/drivers/net/ethernet/intel/e1000e/ptp.c
+++ b/drivers/net/ethernet/intel/e1000e/ptp.c
@@ -173,10 +173,14 @@  static int e1000e_phc_gettime(struct ptp_clock_info *ptp, struct timespec64 *ts)
 	struct e1000_adapter *adapter = container_of(ptp, struct e1000_adapter,
 						     ptp_clock_info);
 	unsigned long flags;
-	u64 ns;
+	u64 cycles, ns;
 
 	spin_lock_irqsave(&adapter->systim_lock, flags);
-	ns = timecounter_read(&adapter->tc);
+
+	/* Use timecounter_cyc2time() to allow non-monotonic SYSTIM readings */
+	cycles = adapter->cc.read(&adapter->cc);
+	ns = timecounter_cyc2time(&adapter->tc, cycles);
+
 	spin_unlock_irqrestore(&adapter->systim_lock, flags);
 
 	*ts = ns_to_timespec64(ns);
@@ -232,9 +236,12 @@  static void e1000e_systim_overflow_work(struct work_struct *work)
 						     systim_overflow_work.work);
 	struct e1000_hw *hw = &adapter->hw;
 	struct timespec64 ts;
+	u64 ns;
 
-	adapter->ptp_clock_info.gettime64(&adapter->ptp_clock_info, &ts);
+	/* Update the timecounter */
+	ns = timecounter_read(&adapter->tc);
 
+	ts = ns_to_timespec64(ns);
 	e_dbg("SYSTIM overflow check at %lld.%09lu\n",
 	      (long long) ts.tv_sec, ts.tv_nsec);