[RFC,v3,net-next,00/18] Time based packet transmission

Message ID 20180307011230.24001-1-jesus.sanchez-palencia@intel.com

Jesus Sanchez-Palencia March 7, 2018, 1:12 a.m. UTC
This series is the v3 of the Time based packet transmission RFC, which was
originally proposed by Richard Cochran (v1: https://lwn.net/Articles/733962/ )
and further developed by us with the addition of the tbs qdisc
(v2: https://lwn.net/Articles/744797/ ).

It introduces a new socket option (SO_TXTIME), a new qdisc (tbs) and
implements support for hw offloading on the igb driver for the Intel
i210 NIC. The tbs qdisc also supports a SW best-effort mode that can be
used as a fallback.

The main changes since v2 can be found below.

Fixes since v2:
 - skb->tstamp is only cleared on the forwarding path;
 - ktime_t is no longer the type used for timestamps (s64 is);
 - get_unaligned() is now used for copying data from the cmsg header;
 - added getsockopt() support for SO_TXTIME;
 - restricted SO_TXTIME input range to [0,1];
 - removed ns_capable() check from __sock_cmsg_send();
 - the qdisc control struct now uses a 32-bit bitmap for config flags;
 - fixed qdisc backlog decrement bug;
 - 'overlimits' is now incremented on dequeue() drops in addition to the
   'dropped' counter;

Interface changes since v2:
 * CMSG interface:
   - added a per-packet clockid parameter to the cmsg (SCM_CLOCKID);
   - added a per-packet drop_if_late flag to the cmsg (SCM_DROP_IF_LATE);
 * tc-tbs:
   - clockid now receives a string;
     e.g.: CLOCK_REALTIME or /dev/ptp0
   - offload is now a standalone argument (i.e. no more offload 1);
   - sorting is now an argument that enables txtime based sorting provided
     by the qdisc;
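
As a rough illustration of the two kinds of clockid string mentioned above, the
sketch below maps a name such as CLOCK_REALTIME or a device path such as
/dev/ptp0 to a numeric clockid. It is written in Python for brevity and is not
taken from the tc patch: resolve_clockid() is a hypothetical helper, and the
only kernel detail relied upon is the FD_TO_CLOCKID() convention for dynamic
clocks.

```python
import os
import time

CLOCKFD = 3  # low bits marking a file-descriptor-based (dynamic) clock on Linux


def fd_to_clockid(fd):
    """Python rendering of the kernel's FD_TO_CLOCKID() macro."""
    return (~fd << 3) | CLOCKFD


def resolve_clockid(name):
    """Hypothetical helper: map a clockid string to (clockid, fd).

    fd is None for fixed clocks. For a device path the descriptor must stay
    open for as long as the returned clockid is in use.
    """
    if name.startswith("/dev/"):
        fd = os.open(name, os.O_RDWR)
        return fd_to_clockid(fd), fd
    # Fixed clocks are looked up by name among the constants in the time module.
    return getattr(time, name), None
```

Dynamic clockids are negative by construction, which is what lets code tell
them apart from the fixed CLOCK_* ids.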

Design changes since v2:
 - Now on the dequeue() path, tbs only drops an expired packet if it has the
   skb->tc_drop_if_late flag set. In practical terms, this will define if
   the semantics of txtime on a system is "not earlier than" or "not later
   than" a given timestamp;
 - Now on the enqueue() path, the qdisc will drop a packet if its clockid
   doesn't match the qdisc's one;
 - Sorting the packets based on their txtime is now an option for the qdisc.
   Effectively, this means it can be configured in 4 modes: HW offload or
   SW best-effort, sorting enabled or disabled;
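
The design points above can be summarized in a small model. This is only a
sketch in Python, not the sch_tbs.c code: the Packet class and the function
names are made up for illustration, and "expired" is assumed to mean that the
txtime is earlier than the current time on the qdisc's clock.

```python
from dataclasses import dataclass
from itertools import product


@dataclass
class Packet:
    txtime_ns: int      # requested transmission time
    clockid: int        # clock the txtime refers to
    drop_if_late: bool  # per-packet flag carried in the cmsg


def enqueue_accepts(pkt, qdisc_clockid):
    """Enqueue rule: a packet whose clockid differs from the qdisc's is dropped."""
    return pkt.clockid == qdisc_clockid


def dequeue_drops(pkt, now_ns):
    """Dequeue rule: an expired packet is dropped only if it carries the flag."""
    expired = pkt.txtime_ns < now_ns  # assumption: strict comparison
    return expired and pkt.drop_if_late


# The four configurations: offload on/off combined with sorting on/off.
MODES = [
    ("HW offload" if offload else "SW best-effort",
     "sorting" if sorting else "no sorting")
    for offload, sorting in product((True, False), repeat=2)
]
```

With drop_if_late unset, a late packet is still sent ("not earlier than"
semantics). With it set, the same packet is discarded ("not later than").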


The tbs qdisc is designed to buffer packets until a configurable time before
their deadline (tx time). If sorting is enabled, regardless of HW offload or SW
fallback mode, the qdisc uses an rbtree internally so the buffered packets are
always ordered by earliest deadline.
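
To make the earliest-deadline ordering concrete, here is a minimal Python
sketch. It deliberately swaps the rbtree for a binary heap (heapq), which gives
the same "smallest txtime first" behaviour for this purpose. DeadlineQueue and
its methods are illustrative names, not part of the series.

```python
import heapq
import itertools


class DeadlineQueue:
    """Buffers packets and always hands back the one with the earliest txtime."""

    def __init__(self):
        self._heap = []
        # Monotonic counter used as a tie-breaker so that packets with equal
        # txtimes leave in the order they arrived.
        self._seq = itertools.count()

    def enqueue(self, txtime_ns, payload):
        heapq.heappush(self._heap, (txtime_ns, next(self._seq), payload))

    def peek_txtime(self):
        """Earliest buffered deadline, or None when the queue is empty."""
        return self._heap[0][0] if self._heap else None

    def dequeue(self):
        txtime_ns, _, payload = heapq.heappop(self._heap)
        return txtime_ns, payload

    def __len__(self):
        return len(self._heap)
```

Peeking at the earliest deadline is what a watchdog timer would need in order
to know when to wake up next.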

If sorting is disabled, then for HW offload the qdisc will use a 'raw' FIFO
through qdisc_enqueue_tail() / qdisc_dequeue_head(), whereas for SW best-effort,
it will use a 'scheduled' FIFO.

The other configurable parameter from the tbs qdisc is the clockid to be used.
In order to provide that, this series adds a new API to pkt_sched.h (i.e.
qdisc_watchdog_init_clockid()).

The tbs qdisc will drop any packets with a transmission time in the past or
when a deadline is missed if SCM_DROP_IF_LATE is set. Queueing packets in
advance plus configuring the delta parameter for the system correctly makes
all the difference in reducing the number of drops. Moreover, note that the
delta parameter ends up defining the Tx time when SW best-effort is used
given that the timestamps won't be used by the NIC in this case.
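
The role of delta can be shown with a few lines of arithmetic. The Python
sketch below assumes all times are nanoseconds on the qdisc's clock. The
function names are hypothetical, and the "too late" check is one reading of
the drop rule described above.

```python
def wakeup_time_ns(txtime_ns, delta_ns):
    """Instant at which the qdisc would release the packet to the next layer."""
    return txtime_ns - delta_ns


def is_releasable(txtime_ns, delta_ns, now_ns):
    """True once the current time has reached txtime minus delta."""
    return now_ns >= wakeup_time_ns(txtime_ns, delta_ns)


def enqueued_too_late(txtime_ns, now_ns):
    """A txtime that is already in the past at enqueue time cannot be honoured."""
    return txtime_ns < now_ns


# Example: a packet due at t = 1 s with delta = 100000 ns leaves the qdisc
# at t = 0.9999 s.
example_wakeup = wakeup_time_ns(1_000_000_000, 100_000)
```

In SW best-effort mode nothing downstream holds the packet back, so this
release instant is effectively when transmission starts. That is why a delta
that is too large sends early and one that is too small risks missing the
deadline.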

Examples:

# SW best-effort with sorting #

    $ tc qdisc replace dev enp2s0 parent root handle 100 mqprio num_tc 3 \
               map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 queues 1@0 1@1 2@2 hw 0

    $ tc qdisc add dev enp2s0 parent 100:1 tbs delta 100000 \
               clockid CLOCK_REALTIME sorting

    In this example first the mqprio qdisc is set up, then the tbs qdisc is
    configured onto the first hw Tx queue using SW best-effort with sorting
    enabled. Also, it is configured so the timestamps on each packet are in
    reference to the clockid CLOCK_REALTIME and so packets are dequeued from
    the qdisc 100000 nanoseconds before their transmission time.


# HW offload without sorting #

    $ tc qdisc replace dev enp2s0 parent root handle 100 mqprio num_tc 3 \
               map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 queues 1@0 1@1 2@2 hw 0

    $ tc qdisc add dev enp2s0 parent 100:1 tbs offload

    In this example, the Qdisc will use HW offload for the control of the
    transmission time through the network adapter. It is implicitly assumed
    that the timestamps in skbuffs are in reference to the interface's PHC, and
    setting any other valid clockid would be treated as an error. Because
    there is no scheduling being performed in the qdisc, setting a delta != 0
    would also be considered an error.


# HW offload with sorting #

    $ tc qdisc replace dev enp2s0 parent root handle 100 mqprio num_tc 3 \
               map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 queues 1@0 1@1 2@2 hw 0

    $ tc qdisc add dev enp2s0 parent 100:1 tbs offload delta 100000 \
               clockid CLOCK_REALTIME sorting

    Here, the Qdisc will use HW offload for the txtime control again,
    but now sorting will be enabled, and thus there will be scheduling being
    performed by the qdisc. That is done based on the clockid CLOCK_REALTIME
    and packets leave the Qdisc "delta" (100000) nanoseconds before
    their transmission time. Because this will be using HW offload and
    since dynamic clocks are not supported by the hrtimer, the system clock
    and the PHC clock must be synchronized for this mode to behave as expected.


For testing, we've followed an approach similar to the v1 and v2 testing, and
no significant changes in the results were observed. An updated version of
udp_tai.c is attached to this cover letter.

Lastly, most of the To Dos we still have before a final patchset are related
to further testing the igb support:
 - testing with L2 only talkers + AF_PACKET sockets;
 - testing tbs in conjunction with cbs;

Thanks for all the feedback so far,
Jesus


Jesus Sanchez-Palencia (12):
  sock: Fix SO_ZEROCOPY switch case
  net: Clear skb->tstamp only on the forwarding path
  posix-timers: Add CLOCKID_INVALID mask
  net: SO_TXTIME: Add clockid and drop_if_late params
  net: ipv4: raw: Handle remaining txtime parameters
  net: ipv4: udp: Handle remaining txtime parameters
  net: packet: Handle remaining txtime parameters
  net/sched: Add HW offloading capability to TBS
  igb: Refactor igb_configure_cbs()
  igb: Only change Tx arbitration when CBS is on
  igb: Refactor igb_offload_cbs()
  igb: Add support for TBS offload

Richard Cochran (4):
  net: Add a new socket option for a future transmit time.
  net: ipv4: raw: Hook into time based transmission.
  net: ipv4: udp: Hook into time based transmission.
  net: packet: Hook into time based transmission.

Vinicius Costa Gomes (2):
  net/sched: Allow creating a Qdisc watchdog with other clocks
  net/sched: Introduce the TBS Qdisc

 arch/alpha/include/uapi/asm/socket.h           |   5 +
 arch/frv/include/uapi/asm/socket.h             |   5 +
 arch/ia64/include/uapi/asm/socket.h            |   5 +
 arch/m32r/include/uapi/asm/socket.h            |   5 +
 arch/mips/include/uapi/asm/socket.h            |   5 +
 arch/mn10300/include/uapi/asm/socket.h         |   5 +
 arch/parisc/include/uapi/asm/socket.h          |   5 +
 arch/s390/include/uapi/asm/socket.h            |   5 +
 arch/sparc/include/uapi/asm/socket.h           |   5 +
 arch/xtensa/include/uapi/asm/socket.h          |   5 +
 drivers/net/ethernet/intel/igb/e1000_defines.h |  16 +
 drivers/net/ethernet/intel/igb/igb.h           |   1 +
 drivers/net/ethernet/intel/igb/igb_main.c      | 239 +++++++---
 include/linux/netdevice.h                      |   2 +
 include/linux/posix-timers.h                   |   1 +
 include/linux/skbuff.h                         |   3 +
 include/net/pkt_sched.h                        |   7 +
 include/net/sock.h                             |   4 +
 include/uapi/asm-generic/socket.h              |   5 +
 include/uapi/linux/pkt_sched.h                 |  18 +
 net/core/skbuff.c                              |   1 -
 net/core/sock.c                                |  44 +-
 net/ipv4/raw.c                                 |   7 +
 net/ipv4/udp.c                                 |  10 +-
 net/packet/af_packet.c                         |  19 +
 net/sched/Kconfig                              |  11 +
 net/sched/Makefile                             |   1 +
 net/sched/sch_api.c                            |  11 +-
 net/sched/sch_tbs.c                            | 591 +++++++++++++++++++++++++
 29 files changed, 978 insertions(+), 63 deletions(-)
 create mode 100644 net/sched/sch_tbs.c

Comments

Richard Cochran March 7, 2018, 5:28 a.m. UTC | #1
On Tue, Mar 06, 2018 at 05:12:12PM -0800, Jesus Sanchez-Palencia wrote:
> Design changes since v2:
>  - Now on the dequeue() path, tbs only drops an expired packet if it has the
>    skb->tc_drop_if_late flag set. In practical terms, this will define if
>    the semantics of txtime on a system is "not earlier than" or "not later
>    than" a given timestamp;
>  - Now on the enqueue() path, the qdisc will drop a packet if its clockid
>    doesn't match the qdisc's one;
>  - Sorting the packets based on their txtime is now an option for the qdisc.
>    Effectively, this means it can be configured in 4 modes: HW offload or
>    SW best-effort, sorting enabled or disabled;

While all of this makes the series and the configuration more complex,
still I like the fact that the interface offers these different modes.

Looking forward to testing this...

Thanks,
Richard

Henrik Austad March 8, 2018, 2:09 p.m. UTC | #2
On Tue, Mar 06, 2018 at 05:12:12PM -0800, Jesus Sanchez-Palencia wrote:
> This series is the v3 of the Time based packet transmission RFC, which was
> originally proposed by Richard Cochran (v1: https://lwn.net/Articles/733962/ )
> and further developed by us with the addition of the tbs qdisc
> (v2: https://lwn.net/Articles/744797/ ).

Nice!

> It introduces a new socket option (SO_TXTIME), a new qdisc (tbs) and
> implements support for hw offloading on the igb driver for the Intel
> i210 NIC. The tbs qdisc also supports SW best effort that can be used
> as a fallback.
> 
> The main changes since v2 can be found below.
> 
> Fixes since v2:
>  - skb->tstamp is only cleared on the forwarding path;
>  - ktime_t is no longer the type used for timestamps (s64 is);
>  - get_unaligned() is now used for copying data from the cmsg header;
>  - added getsockopt() support for SO_TXTIME;
>  - restricted SO_TXTIME input range to [0,1];
>  - removed ns_capable() check from __sock_cmsg_send();
>  - the qdisc control struct now uses a 32-bit bitmap for config flags;
>  - fixed qdisc backlog decrement bug;
>  - 'overlimits' is now incremented on dequeue() drops in addition to the
>    'dropped' counter;
> 
> Interface changes since v2:
>  * CMSG interface:
>    - added a per-packet clockid parameter to the cmsg (SCM_CLOCKID);
>    - added a per-packet drop_if_late flag to the cmsg (SCM_DROP_IF_LATE);
>  * tc-tbs:
>    - clockid now receives a string;
>      e.g.: CLOCK_REALTIME or /dev/ptp0
>    - offload is now a standalone argument (i.e. no more offload 1);
>    - sorting is now an argument that enables txtime based sorting provided
>      by the qdisc;
> 
> Design changes since v2:
>  - Now on the dequeue() path, tbs only drops an expired packet if it has the
>    skb->tc_drop_if_late flag set. In practical terms, this will define if
>    the semantics of txtime on a system is "not earlier than" or "not later
>    than" a given timestamp;
>  - Now on the enqueue() path, the qdisc will drop a packet if its clockid
>    doesn't match the qdisc's one;
>  - Sorting the packets based on their txtime is now an option for the qdisc.
>    Effectively, this means it can be configured in 4 modes: HW offload or
>    SW best-effort, sorting enabled or disabled;

A lot of new knobs. I see the need; I would've liked to have fewer, but
you've documented them pretty well. Perhaps we should add something to
Documentation/ at some stage?

Anyway, the patches applied cleanly so I gave them a (very) quick spin,
using udp_tai and, on the other end, tcpdump to grab the frames.

Setting up with hw offload and sorting in qdisc.

Sender (every 10ms) (4.16-rc4 on a core2duo 1.8GHz w/i210 and max_rss
bypass as dual-core and i210 are not friends):

udp_tai -c1 -i eth2 -p 20 -P 10000000

Receiver (imx7, kernel 4.9.11):
chrt -r 20 tcpdump -i eth0 ether host a0:36:9f:3f:c0:b8 | grep "UDP, length 256" > tai_imx7.log

Note: this involves 2 switches and a somewhat hackish kernel running on the
receiver, so these numbers can only improve.

count    2340.000000
mean        0.043770
std         0.047784
min         0.009025
25%         0.010003
50%         0.010010
75%         0.109998
max         0.120060

I have to dig more into why this is happening, a lot of frames are delayed much
more than I'd expect, but at this stage I'm pretty sure this is pebkac. One
obvious fix is to move some hw around and do a direct link, but I didn't have
time for that right now.

I'm very interested in doing what Richard's original test was when he used 
ptp-synched clocks and also used hw receive-time and compared with expected 
tx-time. So, while I'm getting that up and running, I thought I should 
share the early results.

-Henrik

Jesus Sanchez-Palencia March 8, 2018, 6:06 p.m. UTC | #3
Hi,


On 03/08/2018 06:09 AM, Henrik Austad wrote:

(...)

> 
> A lot of new knobs. I see the need; I would've liked to have fewer, but
> you've documented them pretty well. Perhaps we should add something to
> Documentation/ at some stage?

Sure. The idea is to work on that once the interfaces have been accepted.


> 
> Anyway, the patches applied cleanly so I gave them a (very) quick spin,
> using udp_tai and, on the other end, tcpdump to grab the frames.
> 
> Setting up with hw offload and sorting in qdisc.
> 
> Sender (every 10ms) (4.16-rc4 on a core2duo 1.8GHz w/i210 and max_rss
> bypass as dual-core and i210 are not friends):
> 
> udp_tai -c1 -i eth2 -p 20 -P 10000000
> 
> Receiver (imx7, kernel 4.9.11):
> chrt -r 20 tcpdump -i eth0 ether host a0:36:9f:3f:c0:b8 | grep "UDP, length 256" > tai_imx7.log
> 
> Note: this involves 2 switches and a somewhat hackish kernel running on the
> receiver, so these numbers can only improve.
> 
> count    2340.000000
> mean        0.043770
> std         0.047784
> min         0.009025
> 25%         0.010003
> 50%         0.010010
> 75%         0.109998
> max         0.120060
> 

Thanks for giving it a shot.

But I'm not sure I follow the numbers above, sorry :/
Are you computing the packet's Rx timestamp offset from the (expected) Tx time?


> I have to dig more into why this is happening, a lot of frames are delayed much
> more than I'd expect, but at this stage I'm pretty sure this is pebkac. One
> obvious fix is to move some hw around and do a direct link, but I didn't have
> time for that right now.
> 
> I'm very interested in doing what Richard's original test was when he used 
> ptp-synched clocks and also used hw receive-time and compared with expected 
> tx-time. So, while I'm getting that up and running, I thought I should 
> share the early results.


Sure, thanks. Which delta and clockid are you using, please?
Also, was this clock synchronized to the PHC? You need that for hw offload with
sorting enabled.

Thanks,
Jesus

(...)

Henrik Austad March 8, 2018, 10:54 p.m. UTC | #4
On Thu, Mar 08, 2018 at 10:06:46AM -0800, Jesus Sanchez-Palencia wrote:
> Hi,
> 
> 
> On 03/08/2018 06:09 AM, Henrik Austad wrote:
> 
> (...)
> 
> > 
> > A lot of new knobs. I see the need; I would've liked to have fewer, but
> > you've documented them pretty well. Perhaps we should add something to
> > Documentation/ at some stage?
> 
> Sure. The idea is to work on that once the interfaces have been accepted.

Yeah, probably a good idea.

> > Anyway, the patches applied cleanly so I gave them a (very) quick spin,
> > using udp_tai and, on the other end, tcpdump to grab the frames.
> > 
> > Setting up with hw offload and sorting in qdisc.
> > 
> > Sender (every 10ms) (4.16-rc4 on a core2duo 1.8GHz w/i210 and max_rss
> > bypass as dual-core and i210 are not friends):
> > 
> > udp_tai -c1 -i eth2 -p 20 -P 10000000
> > 
> > Receiver (imx7, kernel 4.9.11):
> > chrt -r 20 tcpdump -i eth0 ether host a0:36:9f:3f:c0:b8 | grep "UDP, length 256" > tai_imx7.log
> > 
> > Note: this involves 2 switches and a somewhat hackish kernel running on the
> > receiver, so these numbers can only improve.
> > 
> > count    2340.000000
> > mean        0.043770
> > std         0.047784
> > min         0.009025
> > 25%         0.010003
> > 50%         0.010010
> > 75%         0.109998
> > max         0.120060
> > 
> 
> Thanks for giving it a shot.
> 
> But I'm not sure I follow the numbers above, sorry :/
> Are you computing the packet's Rx timestamp offset from the (expected) Tx time?

Just looking at the timestamp when the frames were received. They should be
sent at regular intervals if I read udp_tai.c correctly, so the assumption
was that the timestamps from tcpdump should give an inkling of how well it
worked.

I set it up to send a frame every 10ms and computed the diff between each 
UDP packet received. Nothing fancy, just tcpdump and grep for the 
timestamp and look at the distribution.
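
For reference, the gap computation described here can be sketched in a few
lines of Python. This is not the script used for the numbers above. It assumes
tcpdump's default line prefix (HH:MM:SS.ffffff) and a capture that does not
cross midnight, and the function names are illustrative.

```python
import statistics
from datetime import datetime


def arrival_seconds(line):
    """Parse the leading HH:MM:SS.ffffff timestamp of a tcpdump text line."""
    stamp = line.split(maxsplit=1)[0]
    t = datetime.strptime(stamp, "%H:%M:%S.%f")
    return t.hour * 3600 + t.minute * 60 + t.second + t.microsecond / 1e6


def inter_arrival(lines):
    """Gaps, in seconds, between consecutive received frames."""
    times = [arrival_seconds(line) for line in lines if line.strip()]
    return [later - earlier for earlier, later in zip(times, times[1:])]


def summarize(gaps):
    """Count, mean, sample std, min, quartiles and max of the gaps."""
    q1, median, q3 = statistics.quantiles(gaps, n=4, method="inclusive")
    return {
        "count": len(gaps),
        "mean": statistics.mean(gaps),
        "std": statistics.stdev(gaps),
        "min": min(gaps),
        "25%": q1,
        "50%": median,
        "75%": q3,
        "max": max(gaps),
    }
```

With a 10ms sending period, an undisturbed path should give gaps clustered
around 0.010 s. Gaps near 0.110 s would mean roughly ten periods passed between
two received frames.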

> > I have to dig more into why this is happening, a lot of frames are delayed much
> > more than I'd expect, but at this stage I'm pretty sure this is pebkac. One
> > obvious fix is to move some hw around and do a direct link, but I didn't have
> > time for that right now.
> > 
> > I'm very interested in doing what Richard's original test was when he used 
> > ptp-synched clocks and also used hw receive-time and compared with expected 
> > tx-time. So, while I'm getting that up and running, I thought I should 
> > share the early results.
> 
> Sure, thanks. Which delta and clockid are you using, please?

I used the example provided in -00,

tc qdisc replace dev eth2 parent root handle 100 mqprio num_tc 3 \
 map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 queues 1@0 1@1 2@2 hw 0

tc qdisc add dev eth2 parent 100:1 tbs offload delta 100000 clockid \
 CLOCK_REALTIME sorting

> Also, was this clock synchronized to the PHC? You need that for hw offload with
> sorting enabled.

Hmm, good point, no, NIC clock was not synchronized, I'll do that in the 
next round for both sender and receiver!

-henrik

Jesus Sanchez-Palencia March 8, 2018, 11:58 p.m. UTC | #5
Hi,


On 03/08/2018 02:54 PM, Henrik Austad wrote:
> Just looking at the timestamp when the frames were received. They should be
> sent at regular intervals if I read udp_tai.c correctly, so the assumption
> was that the timestamps from tcpdump should give an inkling of how well it
> worked.
> 
> I set it up to send a frame every 10ms and computed the diff between each 
> UDP packet received. Nothing fancy, just tcpdump and grep for the 
> timestamp and look at the distribution.

Ok, I see it now. Just as a reference, this is how I've been running tcpdump on
my tests:

$ tcpdump -i enp3s0 -w foo.pcap -j adapter_unsynced \
	-tt --time-stamp-precision=nano udp port 7788 -c 10000


> 
>>> I have to dig more into why this is happening, a lot of frames are delayed much
>>> more than I'd expect, but at this stage I'm pretty sure this is pebkac. One
>>> obvious fix is to move some hw around and do a direct link, but I didn't have
>>> time for that right now.
>>>
>>> I'm very interested in doing what Richard's original test was when he used 
>>> ptp-synched clocks and also used hw receive-time and compared with expected 
>>> tx-time. So, while I'm getting that up and running, I thought I should 
>>> share the early results.
>>
>> Sure, thanks. Which delta and clockid are you using, please?
> 
> I used the example provided in -00,
> 
> tc qdisc replace dev eth2 parent root handle 100 mqprio num_tc 3 \
>  map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 queues 1@0 1@1 2@2 hw 0
> 
> tc qdisc add dev eth2 parent 100:1 tbs offload delta 100000 clockid \
>  CLOCK_REALTIME sorting


The delta value is highly dependent on the system. I recommend playing around
with it a bit before running long tests. On my KabyLake desktop I noticed that
150us is quite a reliable value, for example (same kernel as yours, and no
preempt-rt applied). But that does not seem to be the issue here.



> 
>> Also, was this clock synchronized to the PHC? You need that for hw offload with
>> sorting enabled.
> 
> Hmm, good point, no, NIC clock was not synchronized, I'll do that in the 
> next round for both sender and receiver!

Oh, then you need to get that set up first. Here I synchronize both PHCs over the
network first with ptp4l:

Rx) $ ptp4l --summary_interval=3 -i enp3s0 -m -2
Tx) $ ptp4l --summary_interval=3 -i enp3s0 -s -m -2 &

My Rx is the PTP master and the Tx is the PTP slave.
Then I synchronize the PHC to the system clock on the Tx side only:

Tx) $ phc2sys -a -r -r -u 8 &


And udp_tai is using CLOCK_REALTIME. The UTC vs TAI 37s offset makes no
difference for this test specifically because I compensate for it when
calculating the offsets on the Rx side.
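
As an illustration of that compensation, a sketch of the arithmetic in Python
follows. It assumes the Rx hardware timestamp is on the PHC (TAI) timescale and
the txtime carried by the packet is on CLOCK_REALTIME (UTC). The function name
is made up for this example.

```python
NSEC_PER_SEC = 1_000_000_000
TAI_UTC_OFFSET_NS = 37 * NSEC_PER_SEC  # the 37s offset mentioned above


def tx_to_rx_offset_ns(rx_hw_ts_tai_ns, txtime_utc_ns):
    """Rx timestamp minus expected Tx time, with both brought to UTC."""
    rx_utc_ns = rx_hw_ts_tai_ns - TAI_UTC_OFFSET_NS
    return rx_utc_ns - txtime_utc_ns
```

A frame scheduled for some txtime and timestamped 2500 ns plus 37 s later on
the TAI scale therefore shows an offset of 2500 ns.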

For the next patchset version I will be providing a more complete set of testing
instructions. I hope that helps for now.


Thanks,
Jesus