Message ID | CADVnQymwaUNVgzqM97vfb=nfMFzVWmK8wR7fQ8Wm3i_c8+Z0jA@mail.gmail.com |
---|---|
State | RFC, archived |
Delegated to: | David Miller |
Headers | show |
On Fri, 2013-12-13 at 09:15 -0500, Neal Cardwell wrote: > Seems like a nice improvement, but if we apply this patch then AFAICT > to get the Nagle-enabled case right we also have to update > tcp_minshall_update() to notice these new non-MSS-aligned segments > going out, and count those as non-full-size segments for the > minshall-nagle check (to ensure we have no more than one outstanding > un-ACKed sub-MSS packet). Maybe something like (please excuse the > formatting): > > diff --git a/include/net/tcp.h b/include/net/tcp.h > index 70e55d2..a2ec237 100644 > --- a/include/net/tcp.h > +++ b/include/net/tcp.h > @@ -980,7 +980,8 @@ bool tcp_is_cwnd_limited(const struct sock *sk, > u32 in_flight); > static inline void tcp_minshall_update(struct tcp_sock *tp, unsigned int mss, > const struct sk_buff *skb) > { > - if (skb->len < mss) > + if (skb->len < mss || > + tcp_skb_pcount(skb) * tcp_skb_mss(skb) > skb->len) > tp->snd_sml = TCP_SKB_CB(skb)->end_seq; > } Very good point Neal, but dont you think tcp_skb_mss(skb) is equal to mss at this point ? (We normally have synced with tso_segs = tcp_init_tso_segs(sk, skb, mss_now);) (Just trying to make this code more understandable...) Also I think we should move this helper out of include/net/tcp.h, we only use it from tcp_output.c I'll submit a v2, rewording the comment in front of tcp_mss_split_point() Thanks -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Dec 13, 2013 at 9:54 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote: > On Fri, 2013-12-13 at 09:15 -0500, Neal Cardwell wrote: > >> Seems like a nice improvement, but if we apply this patch then AFAICT >> to get the Nagle-enabled case right we also have to update >> tcp_minshall_update() to notice these new non-MSS-aligned segments >> going out, and count those as non-full-size segments for the >> minshall-nagle check (to ensure we have no more than one outstanding >> un-ACKed sub-MSS packet). Maybe something like (please excuse the >> formatting): >> >> diff --git a/include/net/tcp.h b/include/net/tcp.h >> index 70e55d2..a2ec237 100644 >> --- a/include/net/tcp.h >> +++ b/include/net/tcp.h >> @@ -980,7 +980,8 @@ bool tcp_is_cwnd_limited(const struct sock *sk, >> u32 in_flight); >> static inline void tcp_minshall_update(struct tcp_sock *tp, unsigned int mss, >> const struct sk_buff *skb) >> { >> - if (skb->len < mss) >> + if (skb->len < mss || >> + tcp_skb_pcount(skb) * tcp_skb_mss(skb) > skb->len) >> tp->snd_sml = TCP_SKB_CB(skb)->end_seq; >> } > > Very good point Neal, but dont you think tcp_skb_mss(skb) is equal to > mss at this point ? (We normally have synced with tso_segs = > tcp_init_tso_segs(sk, skb, mss_now);) > > (Just trying to make this code more understandable...) > > Also I think we should move this helper out of include/net/tcp.h, we > only use it from tcp_output.c > > I'll submit a v2, rewording the comment in front of > tcp_mss_split_point() Yes, I like your ideas to use mss_now instead, move tcp_minshall_update() to tcp_output.c (next to tcp_minshall_check()?), and update the comment in front of tcp_mss_split_point(). And given that mss_now is more sane than tcp_skb_mss(skb) (which is zero for one-MSS skbs) I think maybe we can make it something like: static inline void tcp_minshall_update(struct tcp_sock *tp, unsigned int mss_now, const struct sk_buff *skb) { if (skb->len < tcp_skb_pcount(skb) * mss_now) tp->snd_sml = TCP_SKB_CB(skb)->end_seq; } neal -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> From: Neal Cardwell ... > Seems like a nice improvement, but if we apply this patch then AFAICT > to get the Nagle-enabled case right we also have to update > tcp_minshall_update() to notice these new non-MSS-aligned segments > going out, and count those as non-full-size segments for the > minshall-nagle check (to ensure we have no more than one outstanding > un-ACKed sub-MSS packet). Maybe something like (please excuse the > formatting): This sort of begs the question about how Nagle should work. IIRC Nagle just suppresses short segments when there is unacked data? [1] If you have sent a TSO packet then nagle will always be 'waiting for an ack', so should only send full segments. What is questionable is whether you should send the final short segment, or buffer it waiting for further data from the application to fill the segment (or an ack from the remote system). If you split the data (as I think the code used to) then presumably with nagle the final short segment won't actually be sent (until timeout or an ack is received). So the transmitted segments are likely to all be full. OTOH with the change you'll send a partial segment. If this only happens when the tx socket buffer (etc) is empty it is probably an improvement! Run vi in a large window and page forwards, the data displayed is larger than a segment, so you have to wait for the nagle timeout before the entire screen is updated. Since the data is a single write() it would be a single TSO send - and you want it all to get sent. David [1] So that single characters typed into rlogin get sent together when the remote system finally finishes processing the previous one(s). While ftp can still send bulk data without waiting for responses. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, 2013-12-13 at 11:22 -0500, Neal Cardwell wrote: > Yes, I like your ideas to use mss_now instead, move > tcp_minshall_update() to tcp_output.c (next to tcp_minshall_check()?), > and update the comment in front of tcp_mss_split_point(). > > And given that mss_now is more sane than tcp_skb_mss(skb) (which is > zero for one-MSS skbs) I think maybe we can make it something like: > > static inline void tcp_minshall_update(struct tcp_sock *tp, unsigned > int mss_now, > const struct sk_buff *skb) > { > if (skb->len < tcp_skb_pcount(skb) * mss_now) > tp->snd_sml = TCP_SKB_CB(skb)->end_seq; > } This was indeed what I got ;) -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, 2013-12-13 at 16:58 +0000, David Laight wrote: > > From: Neal Cardwell > ... > > Seems like a nice improvement, but if we apply this patch then AFAICT > > to get the Nagle-enabled case right we also have to update > > tcp_minshall_update() to notice these new non-MSS-aligned segments > > going out, and count those as non-full-size segments for the > > minshall-nagle check (to ensure we have no more than one outstanding > > un-ACKed sub-MSS packet). Maybe something like (please excuse the > > formatting): > > This sort of begs the question about how Nagle should work. > IIRC Nagle just suppresses short segments when there is unacked data? [1] > > If you have sent a TSO packet then nagle will always be 'waiting for an ack', > so should only send full segments. What is questionable is whether you should > send the final short segment, or buffer it waiting for further data from > the application to fill the segment (or an ack from the remote system). Point is the current code does a grouped send of the two skbs, without any wait or even release of socket lock, allowing a tcp_sendmsg() to aggregate/coalesce more data to the last (partial) segment. Nagle test is properly done _before_ the call to tcp_mss_split_point() I think the current behavior of tcp_mss_split_point() is a leftover of the old days, when David Miller (and others) added TSO support to the stack. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/include/net/tcp.h b/include/net/tcp.h index 70e55d2..a2ec237 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -980,7 +980,8 @@ bool tcp_is_cwnd_limited(const struct sock *sk, u32 in_flight); static inline void tcp_minshall_update(struct tcp_sock *tp, unsigned int mss, const struct sk_buff *skb) { - if (skb->len < mss) + if (skb->len < mss || + tcp_skb_pcount(skb) * tcp_skb_mss(skb) > skb->len) tp->snd_sml = TCP_SKB_CB(skb)->end_seq; }