[net-next,0/8] improving TCP behavior on host congestion

Message ID 20190116230535.162758-1-ycheng@google.com

Yuchung Cheng Jan. 16, 2019, 11:05 p.m. UTC
This patch set aims to improve how TCP handles local qdisc congestion
by simplifying the previous implementation. Previously, when an
skb failed to (re)transmit due to local qdisc congestion or another
resource issue, TCP refrained from setting the skb timestamp or the
recovery start time.

This design makes determining when to abort a stalling socket more
complicated, because the timestamps of these transmission attempts
are missing and the stack has to infer when the original attempt
happened. As a by-product, a socket may disregard the system timeout
limit (i.e. sysctl net.ipv4.tcp_retries2 or USER_TIMEOUT option)
and continue to retry until the transmission is successful.

In data-center environments, where the TCP RTO is small, this can
cause the socket to retry frequently for a long time during qdisc
congestion.

The solution is to first unconditionally timestamp the skb and the
recovery attempt, then retry more conservatively (twice a second) on
local qdisc congestion while still aborting the socket according to
the system limit.
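
As a rough illustration of the idea above, here is a minimal user-space
sketch, not the code from this series. It assumes the usual Linux
defaults of a 200 ms minimum RTO and a 120 s maximum RTO, and all
function and macro names are hypothetical. Once every attempt is
timestamped, the abort decision reduces to comparing the elapsed time
since the first attempt against the total time that a given number of
exponentially backed-off retries would take.

```c
/* Illustrative sketch only: names and constants are assumptions,
 * not taken from the patches in this series.
 */
#define RTO_MAX_MS            120000u /* assumed cap on a single RTO */
#define LOCAL_RETRY_DELAY_MS  500u    /* "twice a second" from the text */

/* floor(log2(v)) for v >= 1; returns 0 for v == 0 */
static unsigned int ilog2_u32(unsigned int v)
{
	unsigned int r = 0;

	while (v >>= 1)
		r++;
	return r;
}

/* Total time covered by `boundary` backed-off retries starting from
 * rto_base_ms (assumed >= 1): the RTO doubles until it reaches
 * RTO_MAX_MS, after which each further retry adds RTO_MAX_MS.
 */
static unsigned int model_timeout_ms(unsigned int boundary,
				     unsigned int rto_base_ms)
{
	unsigned int thresh = ilog2_u32(RTO_MAX_MS / rto_base_ms);

	if (boundary <= thresh)
		return ((2u << boundary) - 1) * rto_base_ms;
	return ((2u << thresh) - 1) * rto_base_ms +
	       (boundary - thresh) * RTO_MAX_MS;
}

/* With the first attempt always timestamped (start_ms), the abort
 * check is a plain elapsed-time comparison against the modeled limit.
 */
static int should_abort(unsigned int now_ms, unsigned int start_ms,
			unsigned int boundary, unsigned int rto_base_ms)
{
	return now_ms - start_ms >= model_timeout_ms(boundary, rto_base_ms);
}

/* On local congestion, wait a fixed half second instead of the
 * (possibly much smaller) RTO before trying again.
 */
static unsigned int retry_delay_ms(int local_congestion,
				   unsigned int rto_ms)
{
	return local_congestion ? LOCAL_RETRY_DELAY_MS : rto_ms;
}
```

Under these assumptions, model_timeout_ms(15, 200) gives 924600 ms,
i.e. roughly the 15-minute limit commonly associated with the default
tcp_retries2 value of 15.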

Yuchung Cheng (8):
  tcp: exit if nothing to retransmit on RTO timeout
  tcp: always timestamp on every skb transmission
  tcp: always set retrans_stamp on recovery
  tcp: properly track retry time on passive Fast Open
  tcp: create a helper to model exponential backoff
  tcp: simplify window probe aborting on USER_TIMEOUT
  tcp: retry more conservatively on local congestion
  tcp: less aggressive window probing on local congestion

 net/ipv4/tcp_output.c | 47 ++++++++++--------------
 net/ipv4/tcp_timer.c  | 83 ++++++++++++++++++-------------------------
 2 files changed, 54 insertions(+), 76 deletions(-)

David Miller Jan. 17, 2019, 11:12 p.m. UTC | #1
From: Yuchung Cheng <ycheng@google.com>
Date: Wed, 16 Jan 2019 15:05:27 -0800

> This patch set aims to improve how TCP handles local qdisc congestion
> by simplifying the previous implementation. Previously, when an
> skb failed to (re)transmit due to local qdisc congestion or another
> resource issue, TCP refrained from setting the skb timestamp or the
> recovery start time.
> 
> This design makes determining when to abort a stalling socket more
> complicated, because the timestamps of these transmission attempts
> are missing and the stack has to infer when the original attempt
> happened. As a by-product, a socket may disregard the system timeout
> limit (i.e. sysctl net.ipv4.tcp_retries2 or USER_TIMEOUT option)
> and continue to retry until the transmission is successful.
> 
> In data-center environments, where the TCP RTO is small, this can
> cause the socket to retry frequently for a long time during qdisc
> congestion.
> 
> The solution is to first unconditionally timestamp the skb and the
> recovery attempt, then retry more conservatively (twice a second) on
> local qdisc congestion while still aborting the socket according to
> the system limit.

Series applied, thanks.