diff mbox

[RFC,3/7] net: add option to get information about timestamped packets

Message ID 20170412141737.5881-4-mlichvar@redhat.com
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

Miroslav Lichvar April 12, 2017, 2:17 p.m. UTC
Extend the skb_shared_hwtstamps structure with the index of the
real interface which received or transmitted the packet and the length
of the packet at layer 2. Add a SOF_TIMESTAMPING_OPT_PKTINFO flag to
the SO_TIMESTAMPING option to allow applications to get this information
as struct scm_ts_pktinfo in SCM_TIMESTAMPING_PKTINFO control message.

The index is mainly useful with bonding, bridges and other virtual
interfaces, where IP_PKTINFO doesn't provide the index of the real
interface. Applications may need it to determine which PHC made the
timestamp. With the L2 length it is possible to transpose preamble
timestamps to trailer timestamps, which are used in the NTP protocol.

Instead of adding a new check to a common path that might slow down
processing of received packets without hardware timestamps, the new
fields are expected to be set by the drivers together with the hardware
timestamp using the skb_hw_timestamp() function.

CC: Richard Cochran <richardcochran@gmail.com>
CC: Willem de Bruijn <willemb@google.com>
Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com>
---
 Documentation/networking/timestamping.txt |  8 ++++++++
 include/linux/skbuff.h                    | 22 ++++++++++++++++++++++
 include/uapi/asm-generic/socket.h         |  2 ++
 include/uapi/linux/errqueue.h             |  8 ++++++++
 include/uapi/linux/net_tstamp.h           |  3 ++-
 net/core/skbuff.c                         | 12 +++++++++---
 net/socket.c                              | 11 ++++++++++-
 7 files changed, 61 insertions(+), 5 deletions(-)

Comments

Willem de Bruijn April 13, 2017, 2:37 p.m. UTC | #1
On Wed, Apr 12, 2017 at 10:17 AM, Miroslav Lichvar <mlichvar@redhat.com> wrote:
> Extend the skb_shared_hwtstamps structure with the index of the
> real interface which received or transmitted the packet and the length
> of the packet at layer 2.

The original packet is received along with the timestamp. Why is this L2
length needed?

> Add a SOF_TIMESTAMPING_OPT_PKTINFO flag to
> the SO_TIMESTAMPING option to allow applications to get this information
> as struct scm_ts_pktinfo in SCM_TIMESTAMPING_PKTINFO control message.

This patch saves skb->dev->ifindex, which is the same as existing
SOF_TIMESTAMPING_OPT_CMSG. See also the bug fix for that
feature I sent yesterday: http://patchwork.ozlabs.org/patch/750197/

If the intent is to return a different ifindex, I would still suggest using
the existing pktinfo infrastructure, but changing the ifindex that is
recorded.
Miroslav Lichvar April 13, 2017, 3:18 p.m. UTC | #2
On Thu, Apr 13, 2017 at 10:37:07AM -0400, Willem de Bruijn wrote:
> On Wed, Apr 12, 2017 at 10:17 AM, Miroslav Lichvar <mlichvar@redhat.com> wrote:
> > Extend the skb_shared_hwtstamps structure with the index of the
> > real interface which received or transmitted the packet and the length
> > of the packet at layer 2.
> 
> The original packet is received along with the timestamp.

But only outgoing packets, right?

> Why is this L2 length needed?

It's needed for incoming packets to allow converting of preamble
timestamps to trailer timestamps.

> > Add a SOF_TIMESTAMPING_OPT_PKTINFO flag to
> > the SO_TIMESTAMPING option to allow applications to get this information
> > as struct scm_ts_pktinfo in SCM_TIMESTAMPING_PKTINFO control message.
> 
> This patch saves skb->dev->ifindex, which is the same as existing
> SOF_TIMESTAMPING_OPT_CMSG. See also the bug fix for that
> feature I sent yesterday: http://patchwork.ozlabs.org/patch/750197/

The main point is that it provides the index of the device which
received the packet. It does duplicate the functionality of OPT_CMSG +
IP_PKTINFO for outgoing packets, but I thought it might be useful with
the TSONLY option.

BTW, the original ifindex used to be in skb->skb_iif, but that changed
in b6858177.

> If the intent is to return a different ifindex, I would still suggest using
> the existing pktinfo infrastructure, but changing the ifindex that is
> recorded.

How would the application get the l2 length? If this (l2 length,
if_index) tuple is specific to timestamping, I think it would make
sense to keep it out of the IP layer.
Willem de Bruijn April 13, 2017, 4:16 p.m. UTC | #3
On Thu, Apr 13, 2017 at 11:18 AM, Miroslav Lichvar <mlichvar@redhat.com> wrote:
> On Thu, Apr 13, 2017 at 10:37:07AM -0400, Willem de Bruijn wrote:
>> On Wed, Apr 12, 2017 at 10:17 AM, Miroslav Lichvar <mlichvar@redhat.com> wrote:
>> > Extend the skb_shared_hwtstamps structure with the index of the
>> > real interface which received or transmitted the packet and the length
>> > of the packet at layer 2.
>>
>> The original packet is received along with the timestamp.
>
> But only outgoing packets, right?

Timestamps for incoming packets are also passed alongside the original packet.

>> Why is this L2 length needed?
>
> It's needed for incoming packets to allow converting of preamble
> timestamps to trailer timestamps.

Receiving the mac length of a packet sounds like a feature independent
from timestamping. Either an ioctl similar to SIOCGIFMTU or, if it may
vary due to existince of vlan headers, a new independent cmsg at the
SOL_SOCKET layer.

>> > Add a SOF_TIMESTAMPING_OPT_PKTINFO flag to
>> > the SO_TIMESTAMPING option to allow applications to get this information
>> > as struct scm_ts_pktinfo in SCM_TIMESTAMPING_PKTINFO control message.
>>
>> This patch saves skb->dev->ifindex, which is the same as existing
>> SOF_TIMESTAMPING_OPT_CMSG. See also the bug fix for that
>> feature I sent yesterday: http://patchwork.ozlabs.org/patch/750197/
>
> The main point is that it provides the index of the device which
> received the packet. It does duplicate the functionality of OPT_CMSG +
> IP_PKTINFO for outgoing packets, but I thought it might be useful with
> the TSONLY option.

Agreed. I'd prefer to reuse the existing option for that and just extend it
to work together with TSONLY.

We will have to set serr->header.h4.iif from something other than skb->dev
if the skb was allocated fresh in __skb_tstamp_tx without the device
association.
Miroslav Lichvar April 24, 2017, 9 a.m. UTC | #4
On Thu, Apr 13, 2017 at 12:16:09PM -0400, Willem de Bruijn wrote:
> On Thu, Apr 13, 2017 at 11:18 AM, Miroslav Lichvar <mlichvar@redhat.com> wrote:
> > On Thu, Apr 13, 2017 at 10:37:07AM -0400, Willem de Bruijn wrote:
> >> Why is this L2 length needed?
> >
> > It's needed for incoming packets to allow converting of preamble
> > timestamps to trailer timestamps.
> 
> Receiving the mac length of a packet sounds like a feature independent
> from timestamping.

I agree, but so far nobody suggested another use for this information.
Do you have any suggestions?

The idea was that if it is useful only with HW timestamping, it would
be better to save it only with the timestamp, so there is no
performance impact in the more common case when HW timestamping is
disabled. Am I overly cautious here?

> Either an ioctl similar to SIOCGIFMTU or, if it may
> vary due to existince of vlan headers, a new independent cmsg at the
> SOL_SOCKET layer.

It's not just the VLAN headers. The length of the IP header may vary
with IP options, so the offset of the UDP data in the packet cannot be
assumed to be constant.

Now I'm wondering if it's actually necessary to save the original
value of skb->mac_len + skb->len. Would "skb->data - skb->head -
skb->mac_header + skb->len" always work as the L2 length for received
packets at the time when the cmsg is prepared?

As for the original ifindex, it seems to me it does need to be saved
to a new field since __netif_receive_skb_core() intentionally
overwrites skb->skb_iif. What would be the best place for it, sk_buff
or skb_shared_info?

And would it really be acceptable to save it for all packets in
__netif_receive_skb_core(), even when HW timestamping is disabled?
Seeing how the code and the data structures were optimized over time,
I have a feeling it would not be accepted.
Willem de Bruijn April 24, 2017, 3:18 p.m. UTC | #5
On Mon, Apr 24, 2017 at 5:00 AM, Miroslav Lichvar <mlichvar@redhat.com> wrote:
> On Thu, Apr 13, 2017 at 12:16:09PM -0400, Willem de Bruijn wrote:
>> On Thu, Apr 13, 2017 at 11:18 AM, Miroslav Lichvar <mlichvar@redhat.com> wrote:
>> > On Thu, Apr 13, 2017 at 10:37:07AM -0400, Willem de Bruijn wrote:
>> >> Why is this L2 length needed?
>> >
>> > It's needed for incoming packets to allow converting of preamble
>> > timestamps to trailer timestamps.
>>
>> Receiving the mac length of a packet sounds like a feature independent
>> from timestamping.
>
> I agree, but so far nobody suggested another use for this information.
> Do you have any suggestions?
>
> The idea was that if it is useful only with HW timestamping, it would
> be better to save it only with the timestamp, so there is no
> performance impact in the more common case when HW timestamping is
> disabled. Am I overly cautious here?

The additional cost of a cmsg is zero for sockets that have no cmsg
enabled, due to

        if (inet->cmsg_flags)
                ip_cmsg_recv_offset(msg, sk, skb, sizeof(struct udphdr), off);

But you might be right that there are no uses outside the specific
timestamp requirement you have, so if you prefer to use a timestamp
option, I won't object further.

>> Either an ioctl similar to SIOCGIFMTU or, if it may
>> vary due to existince of vlan headers, a new independent cmsg at the
>> SOL_SOCKET layer.

The latter would require adding the SOL_SOCKET level cmsg processing
infra. It is simpler to just add it at the INET/INET6 levels.

> It's not just the VLAN headers. The length of the IP header may vary
> with IP options, so the offset of the UDP data in the packet cannot be
> assumed to be constant.

As well as tunnels.

> Now I'm wondering if it's actually necessary to save the original
> value of skb->mac_len + skb->len.

Computing it on recv if needed is definitely preferable to computing
on enqueue and storing in an intermediate variable.

> Would "skb->data - skb->head -
> skb->mac_header + skb->len" always work as the L2 length for received
> packets at the time when the cmsg is prepared?

(skb->data - skb->head) - skb->mac_header computes the length
of data before the mac, such as reserve? Do you mean skb->data -
skb->mac_header (or - skb_mac_offset(skb))?

> As for the original ifindex, it seems to me it does need to be saved
> to a new field since __netif_receive_skb_core() intentionally
> overwrites skb->skb_iif. What would be the best place for it, sk_buff
> or skb_shared_info?

Finding storage space on the receive path will not be easy.

One shortcut to avoid storing this information explicitly is to look up
the device from skb->napi_id.

> And would it really be acceptable to save it for all packets in
> __netif_receive_skb_core(), even when HW timestamping is disabled?
> Seeing how the code and the data structures were optimized over time,
> I have a feeling it would not be accepted.

Incurring this cost on all packets for such a rare edge case does sound
like a non-starter.

It can be called only if the netstamp_needed static key is enabled (false),
in __net_timestamp, though.
Miroslav Lichvar April 25, 2017, 1:56 p.m. UTC | #6
On Mon, Apr 24, 2017 at 11:18:13AM -0400, Willem de Bruijn wrote:
> On Mon, Apr 24, 2017 at 5:00 AM, Miroslav Lichvar <mlichvar@redhat.com> wrote:
> > Would "skb->data - skb->head -
> > skb->mac_header + skb->len" always work as the L2 length for received
> > packets at the time when the cmsg is prepared?
> 
> (skb->data - skb->head) - skb->mac_header computes the length
> of data before the mac, such as reserve?

data - head includes the reserve, but mac_header does too, so I think
it should be just the length of MAC header and everything up to the
data.

> Do you mean skb->data -
> skb->mac_header (or - skb_mac_offset(skb))?

That would give me a pointer? If I used skb_mac_offset(), the total
length would be just skb->len - skb_mac_offset()?

> > As for the original ifindex, it seems to me it does need to be saved
> > to a new field since __netif_receive_skb_core() intentionally
> > overwrites skb->skb_iif. What would be the best place for it, sk_buff
> > or skb_shared_info?
> 
> Finding storage space on the receive path will not be easy.
> 
> One shortcut to avoid storing this information explicitly is to look up
> the device from skb->napi_id.

Thanks. This looks promising. It will depend on CONFIG_NET_RX_BUSY_POLL,
but I guess that's ok. It nicely isolates all costs to the timestamping
option.
Willem de Bruijn April 25, 2017, 5:23 p.m. UTC | #7
On Tue, Apr 25, 2017 at 9:56 AM, Miroslav Lichvar <mlichvar@redhat.com> wrote:
> On Mon, Apr 24, 2017 at 11:18:13AM -0400, Willem de Bruijn wrote:
>> On Mon, Apr 24, 2017 at 5:00 AM, Miroslav Lichvar <mlichvar@redhat.com> wrote:
>> > Would "skb->data - skb->head -
>> > skb->mac_header + skb->len" always work as the L2 length for received
>> > packets at the time when the cmsg is prepared?
>>
>> (skb->data - skb->head) - skb->mac_header computes the length
>> of data before the mac, such as reserve?
>
> data - head includes the reserve, but mac_header does too, so I think
> it should be just the length of MAC header and everything up to the
> data.
>
>> Do you mean skb->data -
>> skb->mac_header (or - skb_mac_offset(skb))?
>
> That would give me a pointer? If I used skb_mac_offset(), the total
> length would be just skb->len - skb_mac_offset()?

It appears so. The only existing caller first checks
skb_mac_header_was_set(skb).
diff mbox

Patch

diff --git a/Documentation/networking/timestamping.txt b/Documentation/networking/timestamping.txt
index 96f5069..ed04aaa 100644
--- a/Documentation/networking/timestamping.txt
+++ b/Documentation/networking/timestamping.txt
@@ -193,6 +193,14 @@  SOF_TIMESTAMPING_OPT_STATS:
   the transmit timestamps, such as how long a certain block of
   data was limited by peer's receiver window.
 
+SOF_TIMESTAMPING_OPT_PKTINFO:
+
+  Optional information about timestamped packets. It includes the
+  index of the real interface which received or transmitted the
+  packet and its length at layer 2. If the device driver provides
+  this information, it will be attached in struct scm_ts_pktinfo as
+  a separate control message of type SCM_TIMESTAMPING_PKTINFO.
+
 New applications are encouraged to pass SOF_TIMESTAMPING_OPT_ID to
 disambiguate timestamps and SOF_TIMESTAMPING_OPT_TSONLY to operate
 regardless of the setting of sysctl net.core.tstamp_allow_data.
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 741d75c..e91685a 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -349,6 +349,8 @@  static inline void skb_frag_size_sub(skb_frag_t *frag, int delta)
  * struct skb_shared_hwtstamps - hardware time stamps
  * @hwtstamp:	hardware time stamp transformed into duration
  *		since arbitrary point in time
+ * @if_index:	index of the interface which timestamped the packet
+ * @pkt_length:	length of the packet
  *
  * Software time stamps generated by ktime_get_real() are stored in
  * skb->tstamp.
@@ -361,6 +363,8 @@  static inline void skb_frag_size_sub(skb_frag_t *frag, int delta)
  */
 struct skb_shared_hwtstamps {
 	ktime_t	hwtstamp;
+	int if_index;
+	int pkt_length;
 };
 
 /* Definitions for tx_flags in struct skb_shared_info */
@@ -3322,6 +3326,24 @@  static inline void skb_tx_timestamp(struct sk_buff *skb)
 }
 
 /**
+ * skb_hw_timestamp - set hardware timestamp with packet information
+ *
+ * @skb: A socket buffer.
+ * @hwtstamp: The hardware timestamp.
+ * @if_index: The index of the interface which timestamped the packet.
+ * @pkt_len: The length of the packet.
+ */
+static inline void skb_hw_timestamp(struct sk_buff *skb, ktime_t hwtstamp,
+				    int if_index, int pkt_length)
+{
+	struct skb_shared_hwtstamps *hwtstamps = skb_hwtstamps(skb);
+
+	hwtstamps->hwtstamp = hwtstamp;
+	hwtstamps->if_index = if_index;
+	hwtstamps->pkt_length = pkt_length;
+}
+
+/**
  * skb_complete_wifi_ack - deliver skb with wifi status
  *
  * @skb: the original outgoing packet
diff --git a/include/uapi/asm-generic/socket.h b/include/uapi/asm-generic/socket.h
index 2b48856..a5f6e81 100644
--- a/include/uapi/asm-generic/socket.h
+++ b/include/uapi/asm-generic/socket.h
@@ -100,4 +100,6 @@ 
 
 #define SO_COOKIE		57
 
+#define SCM_TIMESTAMPING_PKTINFO	58
+
 #endif /* __ASM_GENERIC_SOCKET_H */
diff --git a/include/uapi/linux/errqueue.h b/include/uapi/linux/errqueue.h
index 07bdce1..66d752f 100644
--- a/include/uapi/linux/errqueue.h
+++ b/include/uapi/linux/errqueue.h
@@ -43,4 +43,12 @@  enum {
 	SCM_TSTAMP_ACK,		/* data acknowledged by peer */
 };
 
+/**
+ *	struct scm_ts_pktinfo - information about HW-timestamped packets
+ */
+struct scm_ts_pktinfo {
+	int if_index;
+	int pkt_length;
+};
+
 #endif /* _UAPI_LINUX_ERRQUEUE_H */
diff --git a/include/uapi/linux/net_tstamp.h b/include/uapi/linux/net_tstamp.h
index 0749fb1..8397ecd 100644
--- a/include/uapi/linux/net_tstamp.h
+++ b/include/uapi/linux/net_tstamp.h
@@ -26,8 +26,9 @@  enum {
 	SOF_TIMESTAMPING_OPT_CMSG = (1<<10),
 	SOF_TIMESTAMPING_OPT_TSONLY = (1<<11),
 	SOF_TIMESTAMPING_OPT_STATS = (1<<12),
+	SOF_TIMESTAMPING_OPT_PKTINFO = (1<<13),
 
-	SOF_TIMESTAMPING_LAST = SOF_TIMESTAMPING_OPT_STATS,
+	SOF_TIMESTAMPING_LAST = SOF_TIMESTAMPING_OPT_PKTINFO,
 	SOF_TIMESTAMPING_MASK = (SOF_TIMESTAMPING_LAST - 1) |
 				 SOF_TIMESTAMPING_LAST
 };
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 9f78109..7ca251f 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -3888,10 +3888,16 @@  void __skb_tstamp_tx(struct sk_buff *orig_skb,
 		skb_shinfo(skb)->tskey = skb_shinfo(orig_skb)->tskey;
 	}
 
-	if (hwtstamps)
-		*skb_hwtstamps(skb) = *hwtstamps;
-	else
+	if (hwtstamps) {
+		skb_hwtstamps(skb)->hwtstamp = hwtstamps->hwtstamp;
+		if (sk->sk_tsflags & SOF_TIMESTAMPING_OPT_PKTINFO) {
+			skb_hwtstamps(skb)->if_index = orig_skb->dev->ifindex;
+			skb_hwtstamps(skb)->pkt_length = orig_skb->mac_len +
+							 orig_skb->len;
+		}
+	} else {
 		skb->tstamp = ktime_get_real();
+	}
 
 	__skb_complete_tx_timestamp(skb, sk, tstype, opt_stats);
 }
diff --git a/net/socket.c b/net/socket.c
index eea9970..f272019 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -670,6 +670,7 @@  void __sock_recv_timestamp(struct msghdr *msg, struct sock *sk,
 {
 	int need_software_tstamp = sock_flag(sk, SOCK_RCVTSTAMP);
 	struct scm_timestamping tss;
+	struct scm_ts_pktinfo ts_pktinfo;
 	int empty = 1;
 	struct skb_shared_hwtstamps *shhwtstamps =
 		skb_hwtstamps(skb);
@@ -699,8 +700,16 @@  void __sock_recv_timestamp(struct msghdr *msg, struct sock *sk,
 		empty = 0;
 	if (shhwtstamps &&
 	    (sk->sk_tsflags & SOF_TIMESTAMPING_RAW_HARDWARE) &&
-	    ktime_to_timespec_cond(shhwtstamps->hwtstamp, tss.ts + 2))
+	    ktime_to_timespec_cond(shhwtstamps->hwtstamp, tss.ts + 2)) {
+		if (sk->sk_tsflags & SOF_TIMESTAMPING_OPT_PKTINFO &&
+		    shhwtstamps->if_index) {
+			ts_pktinfo.if_index = shhwtstamps->if_index;
+			ts_pktinfo.pkt_length = shhwtstamps->pkt_length;
+			put_cmsg(msg, SOL_SOCKET, SCM_TIMESTAMPING_PKTINFO,
+				 sizeof(ts_pktinfo), &ts_pktinfo);
+		}
 		empty = 0;
+	}
 	if (!empty) {
 		put_cmsg(msg, SOL_SOCKET,
 			 SCM_TIMESTAMPING, sizeof(tss), &tss);