Message ID | alpine.DEB.1.00.1004292246440.12776@pokey.mtv.corp.google.com |
---|---|
State | Rejected, archived |
Delegated to: | David Miller |
Headers | show |
From: Tom Herbert <therbert@google.com> Date: Thu, 29 Apr 2010 23:07:54 -0700 (PDT) > Implement SO_TIMESTAMP{NS} for TCP. When this socket option is enabled > on a TCP socket, a timestamp for received data can be returned in the > ancillary data of a recvmsg with control message type SCM_TIMESTAMP{NS}. > The timestamp chosen is that of the skb most recently received from > which data was copied. This is useful in debugging and timing > network operations. > > Signed-off-by: Tom Herbert <therbert@google.com> That's not what you're implementing here. You're only updating the socket timestamp if the SKB passed into the update function has a more recent timestamp. There is nothing that says the timestamps have to be increasing and with retransmits and such if it were me I would want to see the real timestamp even if it was earlier than the most recently reported timestamp. I don't know, I really don't like this feature at all. SO_TIMESTAMP is really meant for datagram oriented sockets, where there is a clearly defined "packet" whose timestamp you get. A TCP receive can involve hundreds of tiny packets so the timestamp can lack any meaning. All these new checks and branches for a feature of questionable value. If you can modify you apps to grab this information you can also probe for the information using external probing tools. Sorry, I don't think I'll be applying this. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Tom Herbert <therbert@google.com> Date: Fri, 30 Apr 2010 00:58:32 -0700 >> All these new checks and branches for a feature of questionable value. > >> If you can modify you apps to grab this information you can also probe >> for the information using external probing tools. >> > I don't see an nice way to do that, we're profiling a significant > percentage of millions of connections over thousands of paths as part > of standard operations while incurring negligible overhead. The app > can can easily timestamp its operations, but without some mechanism > for getting timestamps out of a TCP connection, the networking portion > of servicing requests is pretty much a black box in that. If other people have an opinion about this, now would be the time to speak up. :-) -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, 30 Apr 2010, David Miller wrote: > From: Tom Herbert <therbert@google.com> > Date: Fri, 30 Apr 2010 00:58:32 -0700 > > >> All these new checks and branches for a feature of questionable value. > > > >> If you can modify you apps to grab this information you can also probe > >> for the information using external probing tools. > >> > > I don't see an nice way to do that, we're profiling a significant > > percentage of millions of connections over thousands of paths as part > > of standard operations while incurring negligible overhead. The app > > can can easily timestamp its operations, but without some mechanism > > for getting timestamps out of a TCP connection, the networking portion > > of servicing requests is pretty much a black box in that. > > If other people have an opinion about this, now would be the time > to speak up. :-) Not being a kernel hacker, I will naively ask if the kernel tracing facility could somehow be used to provide the desired info (or could be modified to provide it). -Bill -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
>> I don't see an nice way to do that, we're profiling a significant >> percentage of millions of connections over thousands of paths as part >> of standard operations while incurring negligible overhead. The app >> can can easily timestamp its operations, but without some mechanism >> for getting timestamps out of a TCP connection, the networking portion >> of servicing requests is pretty much a black box in that. > > If other people have an opinion about this, now would be the time > to speak up. :-) > The use case that motivated this patch is really the same as that of UDP in that application is receiving messages that it wants to to time stamp; in the case of TCP the application extracts the frames out of the stream. The lack of a timestamp to discern when a message was received over TCP is readily apparent when designing a message based ULP that can dynamically select which protocol to run over. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> Not being a kernel hacker, I will naively ask if the kernel tracing > facility could somehow be used to provide the desired info (or could > be modified to provide it). > We did consider kernel tracing (more in the context of implementing RFC 4898). In the case of trying get per packet timestamps, correlating a ktrace event with an application message is probably too high to make it practical. If it weren't for the cost of timestamp'ing every single skb being received, we'd probably have SO_TIMESTAMP turned on permanently for many connections. For now we're settling for a percentage of messages for sampling. Tom -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Le vendredi 30 avril 2010 à 22:40 -0700, Tom Herbert a écrit : > > Not being a kernel hacker, I will naively ask if the kernel tracing > > facility could somehow be used to provide the desired info (or could > > be modified to provide it). > > > > We did consider kernel tracing (more in the context of implementing > RFC 4898). In the case of trying get per packet timestamps, > correlating a ktrace event with an application message is probably too > high to make it practical. If it weren't for the cost of > timestamp'ing every single skb being received, we'd probably have > SO_TIMESTAMP turned on permanently for many connections. For now > we're settling for a percentage of messages for sampling. Tom, did you tried to reuse existing skb or sk tstamps ? -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Apr 30, 2010 at 04:41:15PM -0700, David Miller wrote: > If other people have an opinion about this, now would be the time > to speak up. :-) I have to say I agree with David. The "receive timestamp" for a TCP recv() call is completely meaningless. Each byte in the stream arguably could have a set of receive timestamps, being the timestamp of the underlying IPv4 packet containing a fragment of a TCP segment that covered that byte. One recv() call could cover many packets, many recv() calls could be required to consume one packet. We just don't know from userland. The point about IPv4 fragments in UDP is a reasonable one; that because of IPv4 fragmentation there are still potentially multiple timestamps that could be relevant to a single UDP recv() call. But no two recv() calls can possibly relate to the same IPv4 fragments, so I feel this is more defined. Plus, of all the IPv4 fragments that go into a single UDP packet, one of them is special - the first one, the one containing the UDP header. We could easily say "the timestamp of a UDP recv() call shall be the time at which its header was received, even if other fragments arrived before or after it". We cannot make any such distinction for some window in a TCP stream. All TCP segments are indistinct in this manner.
diff --git a/include/linux/tcp.h b/include/linux/tcp.h index a778ee0..7dbb662 100644 --- a/include/linux/tcp.h +++ b/include/linux/tcp.h @@ -314,6 +314,7 @@ struct tcp_sock { u32 snd_sml; /* Last byte of the most recently transmitted small packet */ u32 rcv_tstamp; /* timestamp of last received ACK (for keepalives) */ u32 lsndtime; /* timestamp of last sent data packet (for restart window) */ + ktime_t lrxcopytime; /* timestamp of newest data in copy to user space */ /* Data for direct copy to user */ struct { @@ -466,6 +467,15 @@ static inline struct tcp_sock *tcp_sk(const struct sock *sk) return (struct tcp_sock *)sk; } +static inline void tcp_update_lrxcopytime(struct sock *sk, + const struct sk_buff *skb) +{ + struct tcp_sock *tp = tcp_sk(sk); + + if (unlikely(skb->tstamp.tv64 > tp->lrxcopytime.tv64)) + tp->lrxcopytime = skb->tstamp; +} + struct tcp_timewait_sock { struct inet_timewait_sock tw_sk; u32 tw_rcv_nxt; diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 8ce2974..c7f107a 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -1381,6 +1381,27 @@ int tcp_read_sock(struct sock *sk, read_descriptor_t *desc, return copied; } +static inline void tcp_sock_recv_timestamp(struct msghdr *msg, struct sock *sk) +{ + struct tcp_sock *tp = tcp_sk(sk); + + if (likely(!sock_flag(sk, SOCK_RCVTSTAMP))) + return; + + if (msg->msg_controllen >= sizeof(struct timeval) && + tp->lrxcopytime.tv64) { + if (sock_flag(sk, SOCK_RCVTSTAMPNS)) { + struct timespec ts = ktime_to_timespec(tp->lrxcopytime); + put_cmsg(msg, SOL_SOCKET, SCM_TIMESTAMPNS, + sizeof(ts), &ts); + } else { + struct timeval tv = ktime_to_timeval(tp->lrxcopytime); + put_cmsg(msg, SOL_SOCKET, SCM_TIMESTAMP, + sizeof(tv), &tv); + } + } +} + /* * This routine copies from a sock struct into the user buffer. * @@ -1414,6 +1435,7 @@ int tcp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg, goto out; timeo = sock_rcvtimeo(sk, nonblock); + tp->lrxcopytime.tv64 = 0; /* Urgent data needs to be handled specially. */ if (flags & MSG_OOB) @@ -1691,6 +1713,8 @@ do_prequeue: break; } } + + tcp_update_lrxcopytime(sk, skb); } *seq += used; @@ -1758,6 +1782,8 @@ skip_copy: * on connected socket. I was just happy when found this 8) --ANK */ + tcp_sock_recv_timestamp(msg, sk); + /* Clean up data we have read: This will do ACK frames. */ tcp_cleanup_rbuf(sk, copied); diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index e82162c..b94ad16 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -4397,6 +4397,7 @@ static void tcp_data_queue(struct sock *sk, struct sk_buff *skb) tp->copied_seq += chunk; eaten = (chunk == skb->len && !th->fin); tcp_rcv_space_adjust(sk); + tcp_update_lrxcopytime(sk, skb); } local_bh_disable(); } @@ -5061,6 +5062,7 @@ static int tcp_copy_to_iovec(struct sock *sk, struct sk_buff *skb, int hlen) tp->ucopy.len -= chunk; tp->copied_seq += chunk; tcp_rcv_space_adjust(sk); + tcp_update_lrxcopytime(sk, skb); } local_bh_disable(); @@ -5120,6 +5122,7 @@ static int tcp_dma_try_early_copy(struct sock *sk, struct sk_buff *skb, tp->ucopy.len -= chunk; tp->copied_seq += chunk; tcp_rcv_space_adjust(sk); + tcp_update_lrxcopytime(sk, skb); if ((tp->ucopy.len == 0) || (tcp_flag_word(tcp_hdr(skb)) & TCP_FLAG_PSH) ||
Implement SO_TIMESTAMP{NS} for TCP. When this socket option is enabled on a TCP socket, a timestamp for received data can be returned in the ancillary data of a recvmsg with control message type SCM_TIMESTAMP{NS}. The timestamp chosen is that of the skb most recently received from which data was copied. This is useful in debugging and timing network operations. Signed-off-by: Tom Herbert <therbert@google.com> --- -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html