Message ID | 20171211234253.102924-1-ycheng@google.com |
---|---|
State | Accepted, archived |
Delegated to: | David Miller |
Series | [net-next] tcp: allow TLP in ECN CWR |
From: Yuchung Cheng <ycheng@google.com>
Date: Mon, 11 Dec 2017 15:42:53 -0800

> From: Neal Cardwell <ncardwell@google.com>
>
> This patch enables tail loss probe in cwnd reduction (CWR) state
> to detect potential losses. Prior to this patch, since the sender
> uses PRR to determine the cwnd in CWR state, the combination of
> CWR+PRR plus tcp_tso_should_defer() could cause unnecessary stalls
> upon losses: PRR makes cwnd so gentle that tcp_tso_should_defer()
> defers sending to wait for more ACKs. The ACKs may not come due to
> packet losses.
>
> Disallowing TLP when there is unused cwnd had the primary effect
> of disallowing TLP when there is TSO deferral, Nagle deferral,
> or we hit the rwin limit. This is because basically every application
> write() or incoming ACK will cause us to run tcp_write_xmit()
> to see if we can send more, and then if we sent something we call
> tcp_schedule_loss_probe() to see if we should schedule a TLP. At
> that point, there are a few common reasons why some cwnd budget
> could still be unused:
>
> (a) rwin limit
> (b) Nagle check
> (c) TSO deferral
> (d) TSQ
>
> For (d), after the next packet tx completion the TSQ mechanism
> will allow us to send more packets, so we don't really need a
> TLP (in practice it shouldn't matter whether we schedule one
> or not). But for (a), (b), (c) the sender won't send any more
> packets until it gets another ACK. If the whole flight was
> lost, or all the ACKs were lost, then we won't get any more ACKs,
> and ideally we should schedule and send a TLP to get more feedback.
> In particular, for a long time we have wanted some kind of timer for
> TSO deferral, and at least this would give us some kind of timer.
>
> Reported-by: Steve Ibanez <sibanez@stanford.edu>
> Signed-off-by: Neal Cardwell <ncardwell@google.com>
> Signed-off-by: Yuchung Cheng <ycheng@google.com>
> Reviewed-by: Nandita Dukkipati <nanditad@google.com>
> Reviewed-by: Eric Dumazet <edumazet@google.com>

Applied, thanks.
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index a4d214c7b506..04be9f833927 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2414,15 +2414,12 @@ bool tcp_schedule_loss_probe(struct sock *sk, bool advancing_rto)
 
 	early_retrans = sock_net(sk)->ipv4.sysctl_tcp_early_retrans;
 	/* Schedule a loss probe in 2*RTT for SACK capable connections
-	 * in Open state, that are either limited by cwnd or application.
+	 * not in loss recovery, that are either limited by cwnd or application.
 	 */
 	if ((early_retrans != 3 && early_retrans != 4) ||
 	    !tp->packets_out || !tcp_is_sack(tp) ||
-	    icsk->icsk_ca_state != TCP_CA_Open)
-		return false;
-
-	if ((tp->snd_cwnd > tcp_packets_in_flight(tp)) &&
-	     !tcp_write_queue_empty(sk))
+	    (icsk->icsk_ca_state != TCP_CA_Open &&
+	     icsk->icsk_ca_state != TCP_CA_CWR))
 		return false;
 
 	/* Probe timeout is 2*rtt. Add minimum RTO to account
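
For readers who want to compare the two eligibility tests side by side, here is a minimal userspace sketch of the condition in this hunk before and after the change. It is not kernel code: `struct tlp_conn`, the `CA_*` enum, and both function names are hypothetical stand-ins for the `tcp_sock` / `inet_connection_sock` fields the real check reads. It models only the early `return false` test shown in the diff, not the rest of tcp_schedule_loss_probe().

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's TCP_CA_* congestion states. */
enum ca_state { CA_OPEN, CA_DISORDER, CA_CWR, CA_RECOVERY, CA_LOSS };

/* Hypothetical snapshot of the values the eligibility test looks at. */
struct tlp_conn {
	int early_retrans;       /* models sysctl_tcp_early_retrans */
	unsigned int packets_out;
	bool sack_ok;            /* models tcp_is_sack(tp) */
	enum ca_state ca_state;  /* models icsk->icsk_ca_state */
	unsigned int snd_cwnd;
	unsigned int in_flight;  /* models tcp_packets_in_flight(tp) */
	bool write_queue_empty;  /* models tcp_write_queue_empty(sk) */
};

/* Conditions common to both versions of the check. */
bool tlp_basic_ok(const struct tlp_conn *c)
{
	return (c->early_retrans == 3 || c->early_retrans == 4) &&
	       c->packets_out && c->sack_ok;
}

/* Before the patch: Open state only, and no unused cwnd with data pending. */
bool tlp_eligible_before(const struct tlp_conn *c)
{
	if (!tlp_basic_ok(c) || c->ca_state != CA_OPEN)
		return false;
	if (c->snd_cwnd > c->in_flight && !c->write_queue_empty)
		return false;
	return true;
}

/* After the patch: Open or CWR state, and the unused-cwnd test is gone. */
bool tlp_eligible_after(const struct tlp_conn *c)
{
	if (!tlp_basic_ok(c) ||
	    (c->ca_state != CA_OPEN && c->ca_state != CA_CWR))
		return false;
	return true;
}
```

Under these assumptions, a SACK connection in CWR with cwnd 10, 4 packets in flight, and data still queued (the TSO-deferral case from the commit message) is rejected by `tlp_eligible_before()` and accepted by `tlp_eligible_after()`. The same holds in Open state, where only the dropped unused-cwnd test blocked the probe. A connection in Recovery or Loss is rejected by both.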