diff mbox series

[net-next,v3,2/4] net: mdio: add PTP offset compensation to mdiobus_write_sts

Message ID 20190820084833.6019-3-hubert.feurstein@vahle.at
State Changes Requested
Delegated to: David Miller
Headers show
Series Improve phc2sys precision for mv88e6xxx switch in combination with imx6-fec | expand

Commit Message

Hubert Feurstein Aug. 20, 2019, 8:48 a.m. UTC
From: Hubert Feurstein <h.feurstein@gmail.com>

The slow MDIO access introduces quite a big offset (~13us) to the PTP
system time synchronisation. With this patch the driver has the possibility
to set the correct offset which can then be compensated.

Signed-off-by: Hubert Feurstein <h.feurstein@gmail.com>
---
 drivers/net/phy/mdio_bus.c | 12 ++++++++++++
 include/linux/phy.h        |  8 ++++++++
 2 files changed, 20 insertions(+)

Comments

Miroslav Lichvar Aug. 20, 2019, 9:49 a.m. UTC | #1
On Tue, Aug 20, 2019 at 10:48:31AM +0200, Hubert Feurstein wrote:

> +	/* PTP offset compensation:
> +	 * After the MDIO access is completed (from the chip perspective), the
> +	 * switch chip will snapshot the PHC timestamp. To make sure our system
> +	 * timestamp corresponds to the PHC timestamp, we have to add the
> +	 * duration of this MDIO access to sts->post_ts. Linuxptp's phc2sys
> +	 * takes the average of pre_ts and post_ts to calculate the final
> +	 * system timestamp. With this in mind, we have to add ptp_sts_offset
> +	 * twice to post_ts, in order to not introduce an constant time offset.
> +	 */
> +	if (sts)
> +		timespec64_add_ns(&sts->post_ts, 2 * bus->ptp_sts_offset);

This correction looks good to me.

Is the MDIO write delay constant in reality, or does it at least have
an upper bound? That is, is it always true that the post_ts timestamp
does not point to a time before the PHC timestamp was actually taken?

This is important to not break the estimation of maximum error in the
measured offset. Applications using the ioctl may assume that the
maximum error is (post_ts-pre_ts)/2 (i.e. half of the delay printed by
phc2sys). That would not work if the delay could be occasionally 50
microseconds for instance, i.e. the post_ts timestamp would be earlier
than the PHC timestamp.
Hubert Feurstein Aug. 20, 2019, 12:29 p.m. UTC | #2
Hi Miroslav,

Am Di., 20. Aug. 2019 um 11:49 Uhr schrieb Miroslav Lichvar
<mlichvar@redhat.com>:
>
> On Tue, Aug 20, 2019 at 10:48:31AM +0200, Hubert Feurstein wrote:
>
> > +     /* PTP offset compensation:
> > +      * After the MDIO access is completed (from the chip perspective), the
> > +      * switch chip will snapshot the PHC timestamp. To make sure our system
> > +      * timestamp corresponds to the PHC timestamp, we have to add the
> > +      * duration of this MDIO access to sts->post_ts. Linuxptp's phc2sys
> > +      * takes the average of pre_ts and post_ts to calculate the final
> > +      * system timestamp. With this in mind, we have to add ptp_sts_offset
> > +      * twice to post_ts, in order to not introduce an constant time offset.
> > +      */
> > +     if (sts)
> > +             timespec64_add_ns(&sts->post_ts, 2 * bus->ptp_sts_offset);
>
> This correction looks good to me.
>
> Is the MDIO write delay constant in reality, or does it at least have
> an upper bound? That is, is it always true that the post_ts timestamp
> does not point to a time before the PHC timestamp was actually taken?
>
> This is important to not break the estimation of maximum error in the
> measured offset. Applications using the ioctl may assume that the
> maximum error is (post_ts-pre_ts)/2 (i.e. half of the delay printed by
> phc2sys). That would not work if the delay could be occasionally 50
> microseconds for instance, i.e. the post_ts timestamp would be earlier
> than the PHC timestamp.
>
If the timestamps are taken in the MDIO driver (imx-fec in my case), then
everything is quite deterministic (see results in the cover letter). Of course,
it still can be improved slightly, by splitting up the writel into iowmb and
write_relaxed and disable the interrupts while capturing the timestamps
(as I did in my original RFC patch). But phc2sys takes by default 5 measurements
and uses the one with the smallest delay, so this shouldn't be necessary.

Although, by adding 2 * ptp_sts_offset the system timestamp to post_ts
the timestamp is aligned with the PHC timestamp, but this also increases
the delay which is reported by phc2sys (~26us). But the maximum error
which must be expected is definitely much less (< 1us). So maybe it is better
to shift both timestamps pre_ts and post_ts by ptp_sts_offset.

Hubert
Andrew Lunn Aug. 20, 2019, 1:26 p.m. UTC | #3
On Tue, Aug 20, 2019 at 11:49:03AM +0200, Miroslav Lichvar wrote:
> On Tue, Aug 20, 2019 at 10:48:31AM +0200, Hubert Feurstein wrote:
> 
> > +	/* PTP offset compensation:
> > +	 * After the MDIO access is completed (from the chip perspective), the
> > +	 * switch chip will snapshot the PHC timestamp. To make sure our system
> > +	 * timestamp corresponds to the PHC timestamp, we have to add the
> > +	 * duration of this MDIO access to sts->post_ts. Linuxptp's phc2sys
> > +	 * takes the average of pre_ts and post_ts to calculate the final
> > +	 * system timestamp. With this in mind, we have to add ptp_sts_offset
> > +	 * twice to post_ts, in order to not introduce an constant time offset.
> > +	 */
> > +	if (sts)
> > +		timespec64_add_ns(&sts->post_ts, 2 * bus->ptp_sts_offset);
> 
> This correction looks good to me.
> 
> Is the MDIO write delay constant in reality, or does it at least have
> an upper bound? That is, is it always true that the post_ts timestamp
> does not point to a time before the PHC timestamp was actually taken?

The post_ts could be before the target hardware does anything. The
write triggers an MDIO bus transaction, sending about 64 bits of data
down a wire at around 2.5Mbps. So there is a minimum delay of 25uS
just sending the bits down the wire. It is unclear to me exactly when
the post_ts is taken, has the hardware actually sent the bits, or has
it just initiated sending the bits? In this case, there is an
interrupt sometime later indicating the transaction has completed, so
my guess would be, post_ts indicates the transaction has been
initiated.

Also, how long does the device on the end of the bus actually take to
decode the bits on the wire and do what it needs to do?

> This is important to not break the estimation of maximum error in the
> measured offset. Applications using the ioctl may assume that the
> maximum error is (post_ts-pre_ts)/2 (i.e. half of the delay printed by
> phc2sys). That would not work if the delay could be occasionally 50
> microseconds for instance, i.e. the post_ts timestamp would be earlier
> than the PHC timestamp.

Given my understanding of the hardware, post_ts-pre_ts should be
constant. But what it exactly measures is not clearly defined :-(

And different hardware will have different definitions.

But the real point is, by doing these timestamps here, we are as close
as possible to where the action really happens, and we are minimising
the number of undefined things we are measuring, so in general, the
error is minimised.

    Andrew
Miroslav Lichvar Aug. 20, 2019, 2:25 p.m. UTC | #4
On Tue, Aug 20, 2019 at 02:29:27PM +0200, Hubert Feurstein wrote:
> Am Di., 20. Aug. 2019 um 11:49 Uhr schrieb Miroslav Lichvar
> > This is important to not break the estimation of maximum error in the
> > measured offset. Applications using the ioctl may assume that the
> > maximum error is (post_ts-pre_ts)/2 (i.e. half of the delay printed by
> > phc2sys). That would not work if the delay could be occasionally 50
> > microseconds for instance, i.e. the post_ts timestamp would be earlier
> > than the PHC timestamp.
> >
> If the timestamps are taken in the MDIO driver (imx-fec in my case), then
> everything is quite deterministic (see results in the cover letter). Of course,
> it still can be improved slightly, by splitting up the writel into iowmb and
> write_relaxed and disable the interrupts while capturing the timestamps
> (as I did in my original RFC patch). But phc2sys takes by default 5 measurements
> and uses the one with the smallest delay, so this shouldn't be necessary.

The delay that phc2sys sees is the difference between post_ts and
pre_ts, which doesn't contain the actual delay, right? So, I'm not
sure how well the phc2sys filtering actually works. Even if the
measured delay was related to the write delay, would be 5 measurements
always enough to get a "correct" sample?

> Although, by adding 2 * ptp_sts_offset the system timestamp to post_ts
> the timestamp is aligned with the PHC timestamp, but this also increases
> the delay which is reported by phc2sys (~26us). But the maximum error
> which must be expected is definitely much less (< 1us). So maybe it is better
> to shift both timestamps pre_ts and post_ts by ptp_sts_offset.

If you could guarantee that [pre_ts + ptp_sts_offset, post_ts +
ptp_sts_offset] always contains the PHC timestamps, then that would be
great. From what Andrew is writing, this doesn't seem to be the case.

I'd suggest a different approach:
- specify a minimum delay for the write and a minimum delay for the
  completion (if they are not equal)
- take a second "post" system timestamp after the completion
- adjust pre_ts and post_ts so that the middle point is equal to what
  you have now, but the interval contains both
  pre_ts + min_write_delay and post2_ts - min_completion_delay

This way you get the best stability and also a delay that correctly
estimates the maximum error.
Andrew Lunn Aug. 20, 2019, 3:23 p.m. UTC | #5
> - take a second "post" system timestamp after the completion

For this hardware, completion is an interrupt, which has a lot of
jitter on it. But this hardware is odd, in that it uses an
interrupt. Every other MDIO bus controller uses polled IO, with an
mdelay(10) or similar between each poll. So the jitter is going to be
much larger.

Even though the FEC is special with its interrupt completion, i would
like to see the solution being reasonably generic so that others can
copy it into other MDIO bus drivers. That is what is nice about taking
the time stamp around the write which triggers the bus transaction. It
is independent of interrupt or polled, and should mean about the same
thing for different vendors hardware.

      Andrew
Miroslav Lichvar Aug. 20, 2019, 3:40 p.m. UTC | #6
On Tue, Aug 20, 2019 at 05:23:06PM +0200, Andrew Lunn wrote:
> > - take a second "post" system timestamp after the completion
> 
> For this hardware, completion is an interrupt, which has a lot of
> jitter on it. But this hardware is odd, in that it uses an
> interrupt. Every other MDIO bus controller uses polled IO, with an
> mdelay(10) or similar between each poll. So the jitter is going to be
> much larger.

I think a large jitter is ok in this case. We just need to timestamp
something that we know for sure happened after the PHC timestamp. It
should have no impact on the offset and its stability, just the
reported delay. A test with phc2sys should be able to confirm that.
phc2sys selects the measurement with the shortest delay, which has
least uncertainty. I'd say that applies to both interrupt and polling.

If it is difficult to specify the minimum interrupt delay, I'd still
prefer an overly pessimistic interval assuming a zero delay.
Hubert Feurstein Aug. 20, 2019, 4:56 p.m. UTC | #7
Am Di., 20. Aug. 2019 um 17:40 Uhr schrieb Miroslav Lichvar
<mlichvar@redhat.com>:
>
> On Tue, Aug 20, 2019 at 05:23:06PM +0200, Andrew Lunn wrote:
> > > - take a second "post" system timestamp after the completion
> >
> > For this hardware, completion is an interrupt, which has a lot of
> > jitter on it. But this hardware is odd, in that it uses an
> > interrupt. Every other MDIO bus controller uses polled IO, with an
> > mdelay(10) or similar between each poll. So the jitter is going to be
> > much larger.
>
> I think a large jitter is ok in this case. We just need to timestamp
> something that we know for sure happened after the PHC timestamp. It
> should have no impact on the offset and its stability, just the
> reported delay. A test with phc2sys should be able to confirm that.
> phc2sys selects the measurement with the shortest delay, which has
> least uncertainty. I'd say that applies to both interrupt and polling.
>
> If it is difficult to specify the minimum interrupt delay, I'd still
> prefer an overly pessimistic interval assuming a zero delay.
>
Currently I do not see the benefit from this. The original intention was to
compensate for the remaining offset as good as possible. The current code
of phc2sys uses the delay only for the filtering of the measurement record
with the shortest delay and for reporting and statistics. Why not simple shift
the timestamps with the offset to the point where we expect the PHC timestamp
to be captured, and we have a very good result compared to where we came
from.

Hubert
Miroslav Lichvar Aug. 21, 2019, 8:07 a.m. UTC | #8
On Tue, Aug 20, 2019 at 06:56:56PM +0200, Hubert Feurstein wrote:
> Am Di., 20. Aug. 2019 um 17:40 Uhr schrieb Miroslav Lichvar
> > I think a large jitter is ok in this case. We just need to timestamp
> > something that we know for sure happened after the PHC timestamp. It
> > should have no impact on the offset and its stability, just the
> > reported delay. A test with phc2sys should be able to confirm that.
> > phc2sys selects the measurement with the shortest delay, which has
> > least uncertainty. I'd say that applies to both interrupt and polling.
> >
> > If it is difficult to specify the minimum interrupt delay, I'd still
> > prefer an overly pessimistic interval assuming a zero delay.
> >
> Currently I do not see the benefit from this. The original intention was to
> compensate for the remaining offset as good as possible.

That's ok, but IMHO the change should not break the assumptions of
existing application and users.

> The current code
> of phc2sys uses the delay only for the filtering of the measurement record
> with the shortest delay and for reporting and statistics. Why not simple shift
> the timestamps with the offset to the point where we expect the PHC timestamp
> to be captured, and we have a very good result compared to where we came
> from.

Because those reports/statistics are important in calculation of
maximum error. If someone had a requirement for a clock to be accurate
to 1.5 microseconds and the ioctl returned a delay indicating a
sufficient accuracy when in reality it could be worse, that would be a
problem.

BTW, phc2sys is not the only user of the ioctl.
Hubert Feurstein Aug. 21, 2019, 9:53 a.m. UTC | #9
Am Mi., 21. Aug. 2019 um 10:07 Uhr schrieb Miroslav Lichvar
<mlichvar@redhat.com>:
> > Currently I do not see the benefit from this. The original intention was to
> > compensate for the remaining offset as good as possible.
>
> That's ok, but IMHO the change should not break the assumptions of
> existing application and users.
>
> > The current code
> > of phc2sys uses the delay only for the filtering of the measurement record
> > with the shortest delay and for reporting and statistics. Why not simple shift
> > the timestamps with the offset to the point where we expect the PHC timestamp
> > to be captured, and we have a very good result compared to where we came
> > from.
>
> Because those reports/statistics are important in calculation of
> maximum error. If someone had a requirement for a clock to be accurate
> to 1.5 microseconds and the ioctl returned a delay indicating a
> sufficient accuracy when in reality it could be worse, that would be a
> problem.
>
Ok, I understand your point. But including the MDIO completion into
delay calculation
will indicate a much wore precision as it actually is. When the MDIO
driver implements
the PTP system timestamping as follows ...

  ptp_read_system_prets(bus->ptp_sts);
  writel(value, mdio-reg)
  ptp_read_system_postts(bus->ptp_sts);

... then we catch already the error caused by interrupts which hit the
pre/post_ts section.
Now we only have the additional error of one MDIO clock cycle
(~400ns). Because I expect
the MDIO controller to shift out the MDIO frame on the next MDIO clock
cycle. So if I subtract
one MDIO clock cycle from pre_ts and add one MDIO clock cycle to
post_ts the error indication
would be sufficiently corrected IMHO. And then we can shift both
timestamps in the switch driver
(in the gettimex handler) to compensate the switch depending offset.
What do you think?

Hubert
Vladimir Oltean Aug. 21, 2019, 10:19 a.m. UTC | #10
On Wed, 21 Aug 2019 at 12:53, Hubert Feurstein <h.feurstein@gmail.com> wrote:
>
> Am Mi., 21. Aug. 2019 um 10:07 Uhr schrieb Miroslav Lichvar
> <mlichvar@redhat.com>:
> > > Currently I do not see the benefit from this. The original intention was to
> > > compensate for the remaining offset as good as possible.
> >
> > That's ok, but IMHO the change should not break the assumptions of
> > existing application and users.
> >
> > > The current code
> > > of phc2sys uses the delay only for the filtering of the measurement record
> > > with the shortest delay and for reporting and statistics. Why not simple shift
> > > the timestamps with the offset to the point where we expect the PHC timestamp
> > > to be captured, and we have a very good result compared to where we came
> > > from.
> >
> > Because those reports/statistics are important in calculation of
> > maximum error. If someone had a requirement for a clock to be accurate
> > to 1.5 microseconds and the ioctl returned a delay indicating a
> > sufficient accuracy when in reality it could be worse, that would be a
> > problem.
> >
> Ok, I understand your point. But including the MDIO completion into
> delay calculation
> will indicate a much wore precision as it actually is. When the MDIO
> driver implements
> the PTP system timestamping as follows ...
>
>   ptp_read_system_prets(bus->ptp_sts);
>   writel(value, mdio-reg)
>   ptp_read_system_postts(bus->ptp_sts);
>
> ... then we catch already the error caused by interrupts which hit the
> pre/post_ts section.
> Now we only have the additional error of one MDIO clock cycle
> (~400ns). Because I expect
> the MDIO controller to shift out the MDIO frame on the next MDIO clock
> cycle. So if I subtract

How do you know?
The MDIO controller is a memory-mapped peripheral so there will be a
transaction on the core <-> peripheral interconnect in the SoC.
Depending on the system load, the transaction might not be
instantaneous as you think. Additionally there will be clock domain
crossings in the MDIO controller when transferring the command from
the platform clock to the peripheral clock, which might also add some
jitter.
MDIO frames may also begin with 32 bits of preamble, with some
controllers being able to suppress it. Have you checked/accounted for
this?
The only reliable moment when you know the MDIO command has completed
is when the controller says it has. If there is additional jitter in
waiting for the completion event caused by the GIC and the scheduling
of the ISR, then just switch the driver to poll mode, like everybody
else.

> one MDIO clock cycle from pre_ts and add one MDIO clock cycle to
> post_ts the error indication
> would be sufficiently corrected IMHO. And then we can shift both
> timestamps in the switch driver
> (in the gettimex handler) to compensate the switch depending offset.
> What do you think?
>
> Hubert

Regards,
-Vladimir
Miroslav Lichvar Aug. 21, 2019, 10:19 a.m. UTC | #11
On Wed, Aug 21, 2019 at 11:53:12AM +0200, Hubert Feurstein wrote:
> Am Mi., 21. Aug. 2019 um 10:07 Uhr schrieb Miroslav Lichvar
> > Because those reports/statistics are important in calculation of
> > maximum error. If someone had a requirement for a clock to be accurate
> > to 1.5 microseconds and the ioctl returned a delay indicating a
> > sufficient accuracy when in reality it could be worse, that would be a
> > problem.
> >
> Ok, I understand your point. But including the MDIO completion into
> delay calculation
> will indicate a much wore precision as it actually is.

That's ok. It's the same with PCIe devices. It takes about 500 ns to
read a PCI register, so we know in the worst case the offset is
accurate to 250 ns. It's probably much better, maybe to 50 ns, but we
don't really know. We don't know how much asymmetry there is in the
PCIe delay (it certainly is not zero), or how much time the NIC
actually needs to respond to the command and when exactly it reads the
clock.

> When the MDIO
> driver implements
> the PTP system timestamping as follows ...
> 
>   ptp_read_system_prets(bus->ptp_sts);
>   writel(value, mdio-reg)
>   ptp_read_system_postts(bus->ptp_sts);
> 
> ... then we catch already the error caused by interrupts which hit the
> pre/post_ts section.
> Now we only have the additional error of one MDIO clock cycle
> (~400ns). Because I expect
> the MDIO controller to shift out the MDIO frame on the next MDIO clock
> cycle.

Is this always the case?

> So if I subtract
> one MDIO clock cycle from pre_ts and add one MDIO clock cycle to
> post_ts the error indication
> would be sufficiently corrected IMHO.

If I understand it correctly, this ignores the time needed for the
frame to be received, decoded and the clock to be read.
diff mbox series

Patch

diff --git a/drivers/net/phy/mdio_bus.c b/drivers/net/phy/mdio_bus.c
index 4dba2714495e..50a37cf46f96 100644
--- a/drivers/net/phy/mdio_bus.c
+++ b/drivers/net/phy/mdio_bus.c
@@ -739,6 +739,18 @@  int __mdiobus_write_sts(struct mii_bus *bus, int addr, u32 regnum, u16 val,
 	if (!(bus->flags & MII_BUS_F_PTP_STS_SUPPORTED))
 		ptp_read_system_postts(sts);
 
+	/* PTP offset compensation:
+	 * After the MDIO access is completed (from the chip perspective), the
+	 * switch chip will snapshot the PHC timestamp. To make sure our system
+	 * timestamp corresponds to the PHC timestamp, we have to add the
+	 * duration of this MDIO access to sts->post_ts. Linuxptp's phc2sys
+	 * takes the average of pre_ts and post_ts to calculate the final
+	 * system timestamp. With this in mind, we have to add ptp_sts_offset
+	 * twice to post_ts, in order to not introduce an constant time offset.
+	 */
+	if (sts)
+		timespec64_add_ns(&sts->post_ts, 2 * bus->ptp_sts_offset);
+
 	return retval;
 }
 EXPORT_SYMBOL(__mdiobus_write_sts);
diff --git a/include/linux/phy.h b/include/linux/phy.h
index 0b33662e0320..615df9c7f2c3 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -283,8 +283,16 @@  struct mii_bus {
 	 * The ptp_read_system_*ts functions already check the ptp_sts pointer.
 	 * The MII_BUS_F_PTP_STS_SUPPORTED-bit must be set in flags, when the
 	 * MDIO bus driver takes the timestamps as described above.
+	 *
+	 * @ptp_sts_offset: This is the compensation offset for the system
+	 * timestamp which is introduced by the slow MDIO access duration. An
+	 * MDIO access consists of 32 clock cycles. Usually the MDIO bus runs
+	 * at ~2.5MHz, so we have to compensate ~12800ns offset.
+	 * Set the ptp_sts_offset to the exact duration of one MDIO frame
+	 * (= 32 * clock-period) in nano-seconds.
 	 */
 	struct ptp_system_timestamp *ptp_sts;
+	u32 ptp_sts_offset;
 };
 
 #define to_mii_bus(d) container_of(d, struct mii_bus, dev)