
tcp: properly increase rcv_ssthresh for ofo packets

Message ID 1378488958.31445.47.camel@edumazet-glaptop
State Accepted, archived
Delegated to: David Miller

Commit Message

Eric Dumazet Sept. 6, 2013, 5:35 p.m. UTC
From: Eric Dumazet <edumazet@google.com>

TCP receive window handling is multi-staged.

A socket has a memory budget, static or dynamic, in sk_rcvbuf.

Because we do not really know how this memory budget translates to
a TCP window (payload), TCP announces a small initial window
(about 20 MSS).

When a packet is received, we increase TCP rcv_win depending
on the payload/truesize ratio of this packet. Good citizen
packets give a hint that it's reasonable to have rcv_win = sk_rcvbuf/2.

This heuristic takes place in tcp_grow_window().
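
As a rough illustration of the heuristic described above, here is a
userspace sketch. It is not the kernel implementation: struct rx_model
and model_grow_window() are hypothetical names, and the "payload is at
least half of truesize" test, the 2 * MSS growth step, and the clamp
standing in for sk_rcvbuf/2 are simplifying assumptions of this sketch.
The kernel's fallback path for less efficient packets is omitted.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified receiver state; not the kernel's struct tcp_sock. */
struct rx_model {
	uint32_t rcv_ssthresh;	/* current cap on the advertised window, bytes */
	uint32_t window_clamp;	/* upper bound, assumed here to be sk_rcvbuf / 2 */
	uint32_t mss;		/* maximum segment size, bytes */
};

/*
 * Grow the window cap if the packet is a "good citizen", i.e. its payload
 * is at least half of the memory it consumes (truesize). This threshold is
 * an assumption of the sketch. Returns the number of bytes the cap grew by.
 */
static uint32_t model_grow_window(struct rx_model *m, uint32_t payload_len,
				  uint32_t truesize)
{
	uint32_t incr, room;

	/* A buffer can never be smaller than the payload it carries. */
	assert(truesize >= payload_len);

	if (m->rcv_ssthresh >= m->window_clamp)
		return 0;	/* already at the clamp */
	if (truesize / 2 > payload_len)
		return 0;	/* too much overhead per payload byte */

	/* Assumed growth step of two segments, limited by the clamp. */
	incr = 2 * m->mss;
	room = m->window_clamp - m->rcv_ssthresh;
	if (incr > room)
		incr = room;
	m->rcv_ssthresh += incr;
	return incr;
}
```

With this model, a 1460-byte payload carried in a 2304-byte buffer grows
the cap, while a 100-byte payload in the same buffer does not.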

The problem is that we currently call tcp_grow_window() only for
in-order packets.

This means that reordering or packet loss stalls the growth of
rcv_win, and senders are unable to benefit from fast recovery
or proper reordering level detection.

A packet stored in the OFO (out-of-order) queue is not a bad citizen.
It should count toward window growth just as in-order packets do.
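
To make the effect concrete, the following userspace sketch replays a
short packet trace twice: once growing the window cap only for in-order
packets, and once counting out-of-order packets too. Everything here is
hypothetical (struct pkt, replay(), the half-of-truesize test and the
2 * MSS step are assumptions of the sketch, not kernel code); it only
illustrates why skipping OFO packets stalls window growth after a loss.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packet record used only by this sketch. */
struct pkt {
	uint32_t payload_len;	/* bytes of TCP payload */
	uint32_t truesize;	/* bytes of memory consumed by the buffer */
	bool in_order;		/* false if it would land in the OFO queue */
};

/*
 * Replay a trace and return the final window cap. When count_ofo is false,
 * only in-order packets may grow the cap; when true, out-of-order packets
 * are treated the same way as in-order ones.
 */
static uint32_t replay(const struct pkt *trace, size_t n, uint32_t start,
		       uint32_t clamp, uint32_t mss, bool count_ofo)
{
	uint32_t ssthresh = start;
	size_t i;

	assert(start <= clamp);

	for (i = 0; i < n; i++) {
		uint32_t incr = 2 * mss;	/* assumed growth step */

		if (!trace[i].in_order && !count_ofo)
			continue;	/* OFO packet ignored */
		if (trace[i].truesize / 2 > trace[i].payload_len)
			continue;	/* poor payload/truesize ratio */
		if (incr > clamp - ssthresh)
			incr = clamp - ssthresh;
		ssthresh += incr;
	}
	return ssthresh;
}
```

For a trace with one in-order packet followed by four out-of-order
packets (a single loss early in a burst), the first mode grows the cap
once and the second grows it five times.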

In our traces, we very often see the sender limited by small Linux
receive windows, even though Linux hosts use autotuning (DRS) and
should allow rcv_win to grow to ~3MB.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
---
 net/ipv4/tcp_input.c |    5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

David Miller Sept. 6, 2013, 6:48 p.m. UTC | #1
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 06 Sep 2013 10:35:58 -0700

Applied.

Patch

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 1969e16..28708d3 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4141,6 +4141,7 @@  static void tcp_data_queue_ofo(struct sock *sk, struct sk_buff *skb)
 		if (!tcp_try_coalesce(sk, skb1, skb, &fragstolen)) {
 			__skb_queue_after(&tp->out_of_order_queue, skb1, skb);
 		} else {
+			tcp_grow_window(sk, skb);
 			kfree_skb_partial(skb, fragstolen);
 			skb = NULL;
 		}
@@ -4216,8 +4217,10 @@  add_sack:
 	if (tcp_is_sack(tp))
 		tcp_sack_new_ofo_skb(sk, seq, end_seq);
 end:
-	if (skb)
+	if (skb) {
+		tcp_grow_window(sk, skb);
 		skb_set_owner_r(skb, sk);
+	}
 }
 
 static int __must_check tcp_queue_rcv(struct sock *sk, struct sk_buff *skb, int hdrlen,