diff mbox series

[RFC,v2,net-next,01/10] net: Add a new socket option for a future transmit time.

Message ID 20180117230621.26074-2-jesus.sanchez-palencia@intel.com
State RFC
Headers show
Series Time based packet transmission | expand

Commit Message

Jesus Sanchez-Palencia Jan. 17, 2018, 11:06 p.m. UTC
From: Richard Cochran <rcochran@linutronix.de>

This patch introduces SO_TXTIME.  User space enables this option in
order to pass a desired future transmit time in a CMSG when calling
sendmsg(2).

A new field is added to struct sockcm_cookie, and the tstamp from
skbuffs will be used later on.

Signed-off-by: Richard Cochran <rcochran@linutronix.de>
Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
---
 arch/alpha/include/uapi/asm/socket.h   |  3 +++
 arch/frv/include/uapi/asm/socket.h     |  3 +++
 arch/ia64/include/uapi/asm/socket.h    |  3 +++
 arch/m32r/include/uapi/asm/socket.h    |  3 +++
 arch/mips/include/uapi/asm/socket.h    |  3 +++
 arch/mn10300/include/uapi/asm/socket.h |  3 +++
 arch/parisc/include/uapi/asm/socket.h  |  3 +++
 arch/s390/include/uapi/asm/socket.h    |  3 +++
 arch/sparc/include/uapi/asm/socket.h   |  3 +++
 arch/xtensa/include/uapi/asm/socket.h  |  3 +++
 include/net/sock.h                     |  2 ++
 include/uapi/asm-generic/socket.h      |  3 +++
 net/core/sock.c                        | 16 ++++++++++++++++
 13 files changed, 51 insertions(+)

Comments

Miroslav Lichvar Jan. 18, 2018, 8:42 a.m. UTC | #1
On Wed, Jan 17, 2018 at 03:06:12PM -0800, Jesus Sanchez-Palencia wrote:
> From: Richard Cochran <rcochran@linutronix.de>
> 
> This patch introduces SO_TXTIME.  User space enables this option in
> order to pass a desired future transmit time in a CMSG when calling
> sendmsg(2).
> 
> A new field is added to struct sockcm_cookie, and the tstamp from
> skbuffs will be used later on.

In the discussion about the v1 patchset, there was a question if the
cmsg should include a clockid_t. Without that, how can an application
prevent the packet from being sent using an incorrect clock, e.g.
the system clock when it expects it to be a PHC, or a different PHC
when the socket is not bound to a specific interface?

At least in some applications it would be preferred to not sent a
packet at all instead of sending it at a wrong time.

Please keep in mind that the PHCs and the system clock don't have to
be synchronized to each other. If I understand the rest of the series
correctly, there is an assumption that the PHCs are keeping time in
TAI and CLOCK_TAI can be used as a fallback.
Richard Cochran Jan. 18, 2018, 5:11 p.m. UTC | #2
On Wed, Jan 17, 2018 at 03:06:12PM -0800, Jesus Sanchez-Palencia wrote:
> @@ -2130,6 +2137,15 @@ int __sock_cmsg_send(struct sock *sk, struct msghdr *msg, struct cmsghdr *cmsg,
>  		sockc->tsflags &= ~SOF_TIMESTAMPING_TX_RECORD_MASK;
>  		sockc->tsflags |= tsflags;
>  		break;
> +	case SO_TXTIME:
> +		if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
> +			return -EPERM;
> +		if (!sock_flag(sk, SOCK_TXTIME))
> +			return -EINVAL;
> +		if (cmsg->cmsg_len != CMSG_LEN(sizeof(ktime_t)))
> +			return -EINVAL;
> +		sockc->transmit_time = *(ktime_t *)CMSG_DATA(cmsg);

As pointed out in the first series' review:

  No guarantee the CMSG is properly aligned on arches that might trap
  on unaligned access.

Thanks,
Richard
Richard Cochran Jan. 18, 2018, 5:13 p.m. UTC | #3
On Thu, Jan 18, 2018 at 09:42:27AM +0100, Miroslav Lichvar wrote:
> In the discussion about the v1 patchset, there was a question if the
> cmsg should include a clockid_t. Without that, how can an application
> prevent the packet from being sent using an incorrect clock, e.g.
> the system clock when it expects it to be a PHC, or a different PHC
> when the socket is not bound to a specific interface?

Right, the clockid_t should be passed in through the CMSG along with
the time.
 
Thanks,
Richard
Willem de Bruijn Jan. 19, 2018, 9:15 p.m. UTC | #4
On Wed, Jan 17, 2018 at 6:06 PM, Jesus Sanchez-Palencia
<jesus.sanchez-palencia@intel.com> wrote:
> From: Richard Cochran <rcochran@linutronix.de>
>
> This patch introduces SO_TXTIME.  User space enables this option in
> order to pass a desired future transmit time in a CMSG when calling
> sendmsg(2).
>
> A new field is added to struct sockcm_cookie, and the tstamp from
> skbuffs will be used later on.
>
> Signed-off-by: Richard Cochran <rcochran@linutronix.de>
> Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
> ---

> diff --git a/net/core/sock.c b/net/core/sock.c
> index abf4cbff99b2..37ef4b33fd92 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -1061,6 +1061,13 @@ int sock_setsockopt(struct socket *sock, int level, int optname,
>                         sock_valbool_flag(sk, SOCK_ZEROCOPY, valbool);
>                 break;
>
> +       case SO_TXTIME:
> +               if (ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
> +                       sock_valbool_flag(sk, SOCK_TXTIME, valbool);
> +               else
> +                       ret = -EPERM;
> +               break;
> +
>         default:
>                 ret = -ENOPROTOOPT;
>                 break;

Please add getsockopt alongside setsockopt.

I would also restrict input to [0, 1] exactly to allow for future extensions.

If using ns_capable, skb->tstamp must continue to be scrubbed when traversing
network namespaces.

> @@ -2130,6 +2137,15 @@ int __sock_cmsg_send(struct sock *sk, struct msghdr *msg, struct cmsghdr *cmsg,
>                 sockc->tsflags &= ~SOF_TIMESTAMPING_TX_RECORD_MASK;
>                 sockc->tsflags |= tsflags;
>                 break;
> +       case SO_TXTIME:
> +               if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
> +                       return -EPERM;
> +               if (!sock_flag(sk, SOCK_TXTIME))
> +                       return -EINVAL;

No need for ns_capable check on each packet when already required to
toggle socket option.

> +               if (cmsg->cmsg_len != CMSG_LEN(sizeof(ktime_t)))
> +                       return -EINVAL;

I don't see any existing reference to ktime_t in include/uapi. Just use a s64?
Richard Cochran Jan. 20, 2018, 2:09 a.m. UTC | #5
On Fri, Jan 19, 2018 at 04:15:46PM -0500, Willem de Bruijn wrote:
> > +               if (cmsg->cmsg_len != CMSG_LEN(sizeof(ktime_t)))
> > +                       return -EINVAL;
> 
> I don't see any existing reference to ktime_t in include/uapi. Just use a s64?

Agreed.  I didn't see the point of switching to ktime, either.

Thanks,
Richard
Jesus Sanchez-Palencia Jan. 23, 2018, 6:12 p.m. UTC | #6
On 01/18/2018 09:11 AM, Richard Cochran wrote:
> On Wed, Jan 17, 2018 at 03:06:12PM -0800, Jesus Sanchez-Palencia wrote:
>> @@ -2130,6 +2137,15 @@ int __sock_cmsg_send(struct sock *sk, struct msghdr *msg, struct cmsghdr *cmsg,
>>  		sockc->tsflags &= ~SOF_TIMESTAMPING_TX_RECORD_MASK;
>>  		sockc->tsflags |= tsflags;
>>  		break;
>> +	case SO_TXTIME:
>> +		if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
>> +			return -EPERM;
>> +		if (!sock_flag(sk, SOCK_TXTIME))
>> +			return -EINVAL;
>> +		if (cmsg->cmsg_len != CMSG_LEN(sizeof(ktime_t)))
>> +			return -EINVAL;
>> +		sockc->transmit_time = *(ktime_t *)CMSG_DATA(cmsg);
> 
> As pointed out in the first series' review:
> 
>   No guarantee the CMSG is properly aligned on arches that might trap
>   on unaligned access.


Yes, it will be fixed on the next version. We should probably fix the other
cases on this function as well then.

Thanks,
Jesus

> 
> Thanks,
> Richard
>
Jesus Sanchez-Palencia Jan. 23, 2018, 6:24 p.m. UTC | #7
Hi,


On 01/19/2018 01:15 PM, Willem de Bruijn wrote:
> On Wed, Jan 17, 2018 at 6:06 PM, Jesus Sanchez-Palencia
> <jesus.sanchez-palencia@intel.com> wrote:
>> From: Richard Cochran <rcochran@linutronix.de>
>>
>> This patch introduces SO_TXTIME.  User space enables this option in
>> order to pass a desired future transmit time in a CMSG when calling
>> sendmsg(2).
>>
>> A new field is added to struct sockcm_cookie, and the tstamp from
>> skbuffs will be used later on.
>>
>> Signed-off-by: Richard Cochran <rcochran@linutronix.de>
>> Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
>> ---
> 
>> diff --git a/net/core/sock.c b/net/core/sock.c
>> index abf4cbff99b2..37ef4b33fd92 100644
>> --- a/net/core/sock.c
>> +++ b/net/core/sock.c
>> @@ -1061,6 +1061,13 @@ int sock_setsockopt(struct socket *sock, int level, int optname,
>>                         sock_valbool_flag(sk, SOCK_ZEROCOPY, valbool);
>>                 break;
>>
>> +       case SO_TXTIME:
>> +               if (ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
>> +                       sock_valbool_flag(sk, SOCK_TXTIME, valbool);
>> +               else
>> +                       ret = -EPERM;
>> +               break;
>> +
>>         default:
>>                 ret = -ENOPROTOOPT;
>>                 break;
> 
> Please add getsockopt alongside setsockopt.
> 
> I would also restrict input to [0, 1] exactly to allow for future extensions.


Ok, will do.


> 
> If using ns_capable, skb->tstamp must continue to be scrubbed when traversing
> network namespaces.


I was planning to follow Eric's suggestion and move the tstamp scrubbing out of
skb_scrub_packet() into ____dev_forward_skb() instead. Would that break when
traversing namespaces?



> 
>> @@ -2130,6 +2137,15 @@ int __sock_cmsg_send(struct sock *sk, struct msghdr *msg, struct cmsghdr *cmsg,
>>                 sockc->tsflags &= ~SOF_TIMESTAMPING_TX_RECORD_MASK;
>>                 sockc->tsflags |= tsflags;
>>                 break;
>> +       case SO_TXTIME:
>> +               if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
>> +                       return -EPERM;
>> +               if (!sock_flag(sk, SOCK_TXTIME))
>> +                       return -EINVAL;
> 
> No need for ns_capable check on each packet when already required to
> toggle socket option.


Ok. SO_MARK is doing the same so it might have "mis-inspired" me. I should
probably fix both.



> 
>> +               if (cmsg->cmsg_len != CMSG_LEN(sizeof(ktime_t)))
>> +                       return -EINVAL;
> 
> I don't see any existing reference to ktime_t in include/uapi. Just use a s64?


Sure, will fix.


Thanks,
Jesus
Willem de Bruijn Jan. 23, 2018, 8:02 p.m. UTC | #8
>> If using ns_capable, skb->tstamp must continue to be scrubbed when traversing
>> network namespaces.
>
>
> I was planning to follow Eric's suggestion and move the tstamp scrubbing out of
> skb_scrub_packet() into ____dev_forward_skb() instead. Would that break when
> traversing namespaces?

That implies namespace traversal, so sounds perfect for this purpose.

>>
>>> @@ -2130,6 +2137,15 @@ int __sock_cmsg_send(struct sock *sk, struct msghdr *msg, struct cmsghdr *cmsg,
>>>                 sockc->tsflags &= ~SOF_TIMESTAMPING_TX_RECORD_MASK;
>>>                 sockc->tsflags |= tsflags;
>>>                 break;
>>> +       case SO_TXTIME:
>>> +               if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
>>> +                       return -EPERM;
>>> +               if (!sock_flag(sk, SOCK_TXTIME))
>>> +                       return -EINVAL;
>>
>> No need for ns_capable check on each packet when already required to
>> toggle socket option.
>
>
> Ok. SO_MARK is doing the same so it might have "mis-inspired" me. I should
> probably fix both.

The SO_MARK cmsg does need a check on each invocation,
because it is not conditional on a sock_flag like SO_TXTIME.
Vinicius Costa Gomes Jan. 23, 2018, 9:22 p.m. UTC | #9
Hi,

Miroslav Lichvar <mlichvar@redhat.com> writes:

> On Wed, Jan 17, 2018 at 03:06:12PM -0800, Jesus Sanchez-Palencia wrote:
>> From: Richard Cochran <rcochran@linutronix.de>
>> 
>> This patch introduces SO_TXTIME.  User space enables this option in
>> order to pass a desired future transmit time in a CMSG when calling
>> sendmsg(2).
>> 
>> A new field is added to struct sockcm_cookie, and the tstamp from
>> skbuffs will be used later on.
>
> In the discussion about the v1 patchset, there was a question if the
> cmsg should include a clockid_t. Without that, how can an application
> prevent the packet from being sent using an incorrect clock, e.g.
> the system clock when it expects it to be a PHC, or a different PHC
> when the socket is not bound to a specific interface?
>
> At least in some applications it would be preferred to not sent a
> packet at all instead of sending it at a wrong time.

Including the clockid in a CMSG field does make sense. Will add it in
the next version of this series.

What I think would be the ideal scenario would be if the clockid
parameter to the TBS Qdisc would not be necessary (if offload was
enabled), but that's not quite possible right now, because there's no
support for using the hrtimer infrastructure with dynamic clocks
(/dev/ptp*).

What I am thinking is to keep the clockid parameter for the Qdisc (and
add support for expressing the clockid in friendlier ways, as requested
later in this thread), but I can't think of a way to add support for
using /dev/ptp* clocks without first having hrtimer support them.

And the behavior would be to drop any packets with a clockid not
matching the Qdisc clockid.

How does this sound?


>
> Please keep in mind that the PHCs and the system clock don't have to
> be synchronized to each other. If I understand the rest of the series
> correctly, there is an assumption that the PHCs are keeping time in
> TAI and CLOCK_TAI can be used as a fallback.

You understand correctly, that's because of whole hrtimer issue.

>
> -- 
> Miroslav Lichvar


Cheers,
--
Vinicius
Richard Cochran Jan. 24, 2018, 3:04 a.m. UTC | #10
On Tue, Jan 23, 2018 at 01:22:37PM -0800, Vinicius Costa Gomes wrote:
> What I think would be the ideal scenario would be if the clockid
> parameter to the TBS Qdisc would not be necessary (if offload was
> enabled), but that's not quite possible right now, because there's no
> support for using the hrtimer infrastructure with dynamic clocks
> (/dev/ptp*).

We don't need hrtimer for HW offloading.  Just enqueue the packets.  I
thought we agreed that user space get the ordering correct.  In fact,
davem insisted on it, IIRC.

Thanks,
Richard
Vinicius Costa Gomes Jan. 24, 2018, 10:46 p.m. UTC | #11
Hi Richard,

Richard Cochran <richardcochran@gmail.com> writes:

> On Tue, Jan 23, 2018 at 01:22:37PM -0800, Vinicius Costa Gomes wrote:
>> What I think would be the ideal scenario would be if the clockid
>> parameter to the TBS Qdisc would not be necessary (if offload was
>> enabled), but that's not quite possible right now, because there's no
>> support for using the hrtimer infrastructure with dynamic clocks
>> (/dev/ptp*).
>
> We don't need hrtimer for HW offloading.  Just enqueue the packets.  I
> thought we agreed that user space get the ordering correct.  In fact,
> davem insisted on it, IIRC.

About the ordering of packets, From here [1], there are 3 clear points
(in my understanding):

1. Re-ordering of TX descriptors on the device queue should/must not
   happen;

2. Out of order requests are an error;

3. Timestamps in the past are an error;

The only robust way that we could think of about keeping the the packets
in order for the device queue is re-ordering packets in the Qdisc.

We tried to reach out for confirmation [2] of this understanding but
didn't receive any word.

Even if we reach a decision that the Qdisc should not re-order packets
(we wouldn't have any dependency on hrtimers in the offload case, as you
pointed out), we still need hrtimers for the software implementation.

So, I guess, the problem remains, if it's possible for the user to
express a /dev/ptp* clock, what should we do? 

>
> Thanks,
> Richard

Cheers,
--
Vinicius


[1] https://patchwork.ozlabs.org/comment/1770302/

[2] https://patchwork.ozlabs.org/comment/1816492/q
Miroslav Lichvar Jan. 25, 2018, 9:12 a.m. UTC | #12
On Fri, Jan 19, 2018 at 06:09:15PM -0800, Richard Cochran wrote:
> On Fri, Jan 19, 2018 at 04:15:46PM -0500, Willem de Bruijn wrote:
> > > +               if (cmsg->cmsg_len != CMSG_LEN(sizeof(ktime_t)))
> > > +                       return -EINVAL;
> > 
> > I don't see any existing reference to ktime_t in include/uapi. Just use a s64?
> 
> Agreed.  I didn't see the point of switching to ktime, either.

Do I understand it correctly that no other interface is using
nanoseconds since 1970? We probably don't have to worry about year
2262 yet, but wouldn't it be better to make it consistent with the
timestamping API using timespec? Or is it just better to avoid the
64/32-bit mess of time_t?
Richard Cochran Jan. 25, 2018, 4:52 p.m. UTC | #13
On Thu, Jan 25, 2018 at 10:12:25AM +0100, Miroslav Lichvar wrote:
> Do I understand it correctly that no other interface is using
> nanoseconds since 1970? We probably don't have to worry about year
> 2262 yet, but wouldn't it be better to make it consistent with the
> timestamping API using timespec? Or is it just better to avoid the
> 64/32-bit mess of time_t?

I prefer a single 64 bit nanoseconds field:

- Applications won't have to convert to timespec.

- Avoids the time_t issue.

Thanks,
Richard
Richard Cochran Jan. 26, 2018, 2:12 a.m. UTC | #14
On Wed, Jan 24, 2018 at 02:46:24PM -0800, Vinicius Costa Gomes wrote:
> The only robust way that we could think of about keeping the the packets
> in order for the device queue is re-ordering packets in the Qdisc.

Right, but you cannot afford the overhead of the timerqueue when using
HW offload, when the HW device sits on a PCIe bus.  Many serious TSN
applications (like industrial controls) will want to have just one
packet queued, readying the next one just in time for the next
deadline.  The control loops are sensitive to the feedback interval.
 
> Even if we reach a decision that the Qdisc should not re-order packets
> (we wouldn't have any dependency on hrtimers in the offload case, as you
> pointed out), we still need hrtimers for the software implementation.

Fine.
 
> So, I guess, the problem remains, if it's possible for the user to
> express a /dev/ptp* clock, what should we do? 

Thinking a bit more, it doesn't make sense to have a user choice for
the HW offloading case.  The clock should implicitly be the device
clock, always.  Using any other clock would make no sense.

Thanks,
Richard
Jesus Sanchez-Palencia Feb. 1, 2018, 12:49 a.m. UTC | #15
Hi,


On 01/18/2018 09:13 AM, Richard Cochran wrote:
> On Thu, Jan 18, 2018 at 09:42:27AM +0100, Miroslav Lichvar wrote:
>> In the discussion about the v1 patchset, there was a question if the
>> cmsg should include a clockid_t. Without that, how can an application
>> prevent the packet from being sent using an incorrect clock, e.g.
>> the system clock when it expects it to be a PHC, or a different PHC
>> when the socket is not bound to a specific interface?
> 
> Right, the clockid_t should be passed in through the CMSG along with
> the time.

While implementing this today it crossed my mind that why don't we have the
clockid_t set per socket (e.g. as an argument to SO_TXTIME) instead of per packet?

The only use-case that we could think of that would be 'blocked' was using
sendmmsg() to send a packet to different interfaces with a single syscall, but
I'm not sure how common that is.

What do you think?

Thanks,
Jesus


>  
> Thanks,
> Richard
>
Richard Cochran Feb. 1, 2018, 4:16 a.m. UTC | #16
On Wed, Jan 31, 2018 at 04:49:36PM -0800, Jesus Sanchez-Palencia wrote:
> While implementing this today it crossed my mind that why don't we have the
> clockid_t set per socket (e.g. as an argument to SO_TXTIME) instead of per packet?

Sounds good to me.

Thanks,
Richard
Miroslav Lichvar Feb. 1, 2018, 9:27 a.m. UTC | #17
On Wed, Jan 31, 2018 at 04:49:36PM -0800, Jesus Sanchez-Palencia wrote:
> On 01/18/2018 09:13 AM, Richard Cochran wrote:
> > Right, the clockid_t should be passed in through the CMSG along with
> > the time.
> 
> While implementing this today it crossed my mind that why don't we have the
> clockid_t set per socket (e.g. as an argument to SO_TXTIME) instead of per packet?

I suspect that might have an impact on the performance. Even if the
application doesn't use sendmmsg(), it would possibly have to call
setsockopt() before each sendmsg() to change the clockid_t, right?

If clockid_t could be set per packet, a special value could be used
to allow sending on interfaces that don't support it.

> The only use-case that we could think of that would be 'blocked' was using
> sendmmsg() to send a packet to different interfaces with a single syscall, but
> I'm not sure how common that is.

The SO_TXTIME option will make sendmmsg() useful in applications where
it wasn't before. For instance, an NTP server will be able to batch
multiple responses as their transmit timestamps can be set accurately
in advance and it's no longer necessary to send the responses as soon
as they are assembled.

I think it would be nice the sendmmsg() calls didn't have to be split
by clockid_t.
Jesus Sanchez-Palencia Feb. 1, 2018, 8:55 p.m. UTC | #18
Hi,


On 02/01/2018 01:27 AM, Miroslav Lichvar wrote:
> On Wed, Jan 31, 2018 at 04:49:36PM -0800, Jesus Sanchez-Palencia wrote:
>> On 01/18/2018 09:13 AM, Richard Cochran wrote:
>>> Right, the clockid_t should be passed in through the CMSG along with
>>> the time.
>>
>> While implementing this today it crossed my mind that why don't we have the
>> clockid_t set per socket (e.g. as an argument to SO_TXTIME) instead of per packet?
> 
> I suspect that might have an impact on the performance. Even if the
> application doesn't use sendmmsg(), it would possibly have to call
> setsockopt() before each sendmsg() to change the clockid_t, right?


Yes. On the other hand, for applications that will be using only 1 clockid_t,
keeping it per packet will also have an impact as we'll be copying the same
value from the cmsg cookie into sk_buffs over and over.


> 
> If clockid_t could be set per packet, a special value could be used
> to allow sending on interfaces that don't support it.
> 
>> The only use-case that we could think of that would be 'blocked' was using
>> sendmmsg() to send a packet to different interfaces with a single syscall, but
>> I'm not sure how common that is.
> 
> The SO_TXTIME option will make sendmmsg() useful in applications where
> it wasn't before. For instance, an NTP server will be able to batch
> multiple responses as their transmit timestamps can be set accurately
> in advance and it's no longer necessary to send the responses as soon
> as they are assembled.
> 
> I think it would be nice the sendmmsg() calls didn't have to be split
> by clockid_t.


OK, fair enough. I will keep it per-packet for now as initially agreed and we
can revisit this later if needed.


Thanks,
Jesus
Jesus Sanchez-Palencia Feb. 12, 2018, 10:39 p.m. UTC | #19
Hi,


On 01/18/2018 12:42 AM, Miroslav Lichvar wrote:
> On Wed, Jan 17, 2018 at 03:06:12PM -0800, Jesus Sanchez-Palencia wrote:
>> From: Richard Cochran <rcochran@linutronix.de>
>>
>> This patch introduces SO_TXTIME.  User space enables this option in
>> order to pass a desired future transmit time in a CMSG when calling
>> sendmsg(2).
>>
>> A new field is added to struct sockcm_cookie, and the tstamp from
>> skbuffs will be used later on.
> 
> In the discussion about the v1 patchset, there was a question if the
> cmsg should include a clockid_t. Without that, how can an application
> prevent the packet from being sent using an incorrect clock, e.g.
> the system clock when it expects it to be a PHC, or a different PHC
> when the socket is not bound to a specific interface?
> 
> At least in some applications it would be preferred to not sent a
> packet at all instead of sending it at a wrong time.
> 
> Please keep in mind that the PHCs and the system clock don't have to
> be synchronized to each other. If I understand the rest of the series
> correctly, there is an assumption that the PHCs are keeping time in
> TAI and CLOCK_TAI can be used as a fallback.


Just to double-check, imagine that I've configured the qdisc for
SW best-effort and with clockid CLOCK_REALTIME. When it receives a
packet with the clockid of a /dev/ptpX, the qdisc should just drop that
packet, right?

Or would this block any use-cases that I couldn't think of ?

Thanks,
Jesus
Miroslav Lichvar Feb. 13, 2018, 9:56 a.m. UTC | #20
On Mon, Feb 12, 2018 at 02:39:06PM -0800, Jesus Sanchez-Palencia wrote:
> On 01/18/2018 12:42 AM, Miroslav Lichvar wrote:
> > Please keep in mind that the PHCs and the system clock don't have to
> > be synchronized to each other. If I understand the rest of the series
> > correctly, there is an assumption that the PHCs are keeping time in
> > TAI and CLOCK_TAI can be used as a fallback.
> 
> Just to double-check, imagine that I've configured the qdisc for
> SW best-effort and with clockid CLOCK_REALTIME. When it receives a
> packet with the clockid of a /dev/ptpX, the qdisc should just drop that
> packet, right?

Yes, I think it should drop it. The kernel does not know the offset
between the two clocks (they don't even have to be synchronized), so
it cannot convert a PHC-based TX time to the system time.
diff mbox series

Patch

diff --git a/arch/alpha/include/uapi/asm/socket.h b/arch/alpha/include/uapi/asm/socket.h
index be14f16149d5..065fb372e355 100644
--- a/arch/alpha/include/uapi/asm/socket.h
+++ b/arch/alpha/include/uapi/asm/socket.h
@@ -112,4 +112,7 @@ 
 
 #define SO_ZEROCOPY		60
 
+#define SO_TXTIME		61
+#define SCM_TXTIME		SO_TXTIME
+
 #endif /* _UAPI_ASM_SOCKET_H */
diff --git a/arch/frv/include/uapi/asm/socket.h b/arch/frv/include/uapi/asm/socket.h
index 9168e78fa32a..0e95f45cd058 100644
--- a/arch/frv/include/uapi/asm/socket.h
+++ b/arch/frv/include/uapi/asm/socket.h
@@ -105,5 +105,8 @@ 
 
 #define SO_ZEROCOPY		60
 
+#define SO_TXTIME		61
+#define SCM_TXTIME		SO_TXTIME
+
 #endif /* _ASM_SOCKET_H */
 
diff --git a/arch/ia64/include/uapi/asm/socket.h b/arch/ia64/include/uapi/asm/socket.h
index 3efba40adc54..c872c4e6bafb 100644
--- a/arch/ia64/include/uapi/asm/socket.h
+++ b/arch/ia64/include/uapi/asm/socket.h
@@ -114,4 +114,7 @@ 
 
 #define SO_ZEROCOPY		60
 
+#define SO_TXTIME		61
+#define SCM_TXTIME		SO_TXTIME
+
 #endif /* _ASM_IA64_SOCKET_H */
diff --git a/arch/m32r/include/uapi/asm/socket.h b/arch/m32r/include/uapi/asm/socket.h
index cf5018e82c3d..65276c95b8df 100644
--- a/arch/m32r/include/uapi/asm/socket.h
+++ b/arch/m32r/include/uapi/asm/socket.h
@@ -105,4 +105,7 @@ 
 
 #define SO_ZEROCOPY		60
 
+#define SO_TXTIME		61
+#define SCM_TXTIME		SO_TXTIME
+
 #endif /* _ASM_M32R_SOCKET_H */
diff --git a/arch/mips/include/uapi/asm/socket.h b/arch/mips/include/uapi/asm/socket.h
index 49c3d4795963..71370fb3ceef 100644
--- a/arch/mips/include/uapi/asm/socket.h
+++ b/arch/mips/include/uapi/asm/socket.h
@@ -123,4 +123,7 @@ 
 
 #define SO_ZEROCOPY		60
 
+#define SO_TXTIME		61
+#define SCM_TXTIME		SO_TXTIME
+
 #endif /* _UAPI_ASM_SOCKET_H */
diff --git a/arch/mn10300/include/uapi/asm/socket.h b/arch/mn10300/include/uapi/asm/socket.h
index b35eee132142..d029a40b1b55 100644
--- a/arch/mn10300/include/uapi/asm/socket.h
+++ b/arch/mn10300/include/uapi/asm/socket.h
@@ -105,4 +105,7 @@ 
 
 #define SO_ZEROCOPY		60
 
+#define SO_TXTIME		61
+#define SCM_TXTIME		SO_TXTIME
+
 #endif /* _ASM_SOCKET_H */
diff --git a/arch/parisc/include/uapi/asm/socket.h b/arch/parisc/include/uapi/asm/socket.h
index 1d0fdc3b5d22..061b9cf2a779 100644
--- a/arch/parisc/include/uapi/asm/socket.h
+++ b/arch/parisc/include/uapi/asm/socket.h
@@ -104,4 +104,7 @@ 
 
 #define SO_ZEROCOPY		0x4035
 
+#define SO_TXTIME		0x4036
+#define SCM_TXTIME		SO_TXTIME
+
 #endif /* _UAPI_ASM_SOCKET_H */
diff --git a/arch/s390/include/uapi/asm/socket.h b/arch/s390/include/uapi/asm/socket.h
index 3510c0fd06f4..39d901476ee5 100644
--- a/arch/s390/include/uapi/asm/socket.h
+++ b/arch/s390/include/uapi/asm/socket.h
@@ -111,4 +111,7 @@ 
 
 #define SO_ZEROCOPY		60
 
+#define SO_TXTIME		61
+#define SCM_TXTIME		SO_TXTIME
+
 #endif /* _ASM_SOCKET_H */
diff --git a/arch/sparc/include/uapi/asm/socket.h b/arch/sparc/include/uapi/asm/socket.h
index d58520c2e6ff..7ea35e5601b6 100644
--- a/arch/sparc/include/uapi/asm/socket.h
+++ b/arch/sparc/include/uapi/asm/socket.h
@@ -101,6 +101,9 @@ 
 
 #define SO_ZEROCOPY		0x003e
 
+#define SO_TXTIME		0x003f
+#define SCM_TXTIME		SO_TXTIME
+
 /* Security levels - as per NRL IPv6 - don't actually do anything */
 #define SO_SECURITY_AUTHENTICATION		0x5001
 #define SO_SECURITY_ENCRYPTION_TRANSPORT	0x5002
diff --git a/arch/xtensa/include/uapi/asm/socket.h b/arch/xtensa/include/uapi/asm/socket.h
index 75a07b8119a9..1de07a7f7680 100644
--- a/arch/xtensa/include/uapi/asm/socket.h
+++ b/arch/xtensa/include/uapi/asm/socket.h
@@ -116,4 +116,7 @@ 
 
 #define SO_ZEROCOPY		60
 
+#define SO_TXTIME		61
+#define SCM_TXTIME		SO_TXTIME
+
 #endif	/* _XTENSA_SOCKET_H */
diff --git a/include/net/sock.h b/include/net/sock.h
index 73b7830b0bb8..927af34f3e2c 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -777,6 +777,7 @@  enum sock_flags {
 	SOCK_FILTER_LOCKED, /* Filter cannot be changed anymore */
 	SOCK_SELECT_ERR_QUEUE, /* Wake select on error queue */
 	SOCK_RCU_FREE, /* wait rcu grace period in sk_destruct() */
+	SOCK_TXTIME,
 };
 
 #define SK_FLAGS_TIMESTAMP ((1UL << SOCK_TIMESTAMP) | (1UL << SOCK_TIMESTAMPING_RX_SOFTWARE))
@@ -1567,6 +1568,7 @@  void sock_kzfree_s(struct sock *sk, void *mem, int size);
 void sk_send_sigurg(struct sock *sk);
 
 struct sockcm_cookie {
+	ktime_t transmit_time;
 	u32 mark;
 	u16 tsflags;
 };
diff --git a/include/uapi/asm-generic/socket.h b/include/uapi/asm-generic/socket.h
index 0ae758c90e54..a12692e5f7a8 100644
--- a/include/uapi/asm-generic/socket.h
+++ b/include/uapi/asm-generic/socket.h
@@ -107,4 +107,7 @@ 
 
 #define SO_ZEROCOPY		60
 
+#define SO_TXTIME		61
+#define SCM_TXTIME		SO_TXTIME
+
 #endif /* __ASM_GENERIC_SOCKET_H */
diff --git a/net/core/sock.c b/net/core/sock.c
index abf4cbff99b2..37ef4b33fd92 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1061,6 +1061,13 @@  int sock_setsockopt(struct socket *sock, int level, int optname,
 			sock_valbool_flag(sk, SOCK_ZEROCOPY, valbool);
 		break;
 
+	case SO_TXTIME:
+		if (ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
+			sock_valbool_flag(sk, SOCK_TXTIME, valbool);
+		else
+			ret = -EPERM;
+		break;
+
 	default:
 		ret = -ENOPROTOOPT;
 		break;
@@ -2130,6 +2137,15 @@  int __sock_cmsg_send(struct sock *sk, struct msghdr *msg, struct cmsghdr *cmsg,
 		sockc->tsflags &= ~SOF_TIMESTAMPING_TX_RECORD_MASK;
 		sockc->tsflags |= tsflags;
 		break;
+	case SO_TXTIME:
+		if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
+			return -EPERM;
+		if (!sock_flag(sk, SOCK_TXTIME))
+			return -EINVAL;
+		if (cmsg->cmsg_len != CMSG_LEN(sizeof(ktime_t)))
+			return -EINVAL;
+		sockc->transmit_time = *(ktime_t *)CMSG_DATA(cmsg);
+		break;
 	/* SCM_RIGHTS and SCM_CREDENTIALS are semantically in SOL_UNIX. */
 	case SCM_RIGHTS:
 	case SCM_CREDENTIALS: