From patchwork Wed May 16 23:40:10 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yuchung Cheng X-Patchwork-Id: 915010 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=google.com header.i=@google.com header.b="bnlJ6ES/"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 40mWCJ2vW6z9s1w for ; Thu, 17 May 2018 09:40:44 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751619AbeEPXkk (ORCPT ); Wed, 16 May 2018 19:40:40 -0400 Received: from mail-wm0-f68.google.com ([74.125.82.68]:50217 "EHLO mail-wm0-f68.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751273AbeEPXki (ORCPT ); Wed, 16 May 2018 19:40:38 -0400 Received: by mail-wm0-f68.google.com with SMTP id t11-v6so5339198wmt.0 for ; Wed, 16 May 2018 16:40:37 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=l7fukBO06lP3b9qWLRIZdqzGt+eODzULQImZyp6BEYU=; b=bnlJ6ES/5wUWFvANqhnXR2sqD7hpNPyQBfBpNUkCP6i2SJ21Jk01rWB+7+ADwel9tB aJDIz9exqxiEr6ZqmTwhMNUK2/3KfcT1zHovH7hpbky3YdxVLNUCv6aaVIUW9WyHwdRT /aFI1C+iFm24Uv++L836CWi0St8ra/baEHCHZexHktgcH544Oa/g4cj1gS3LaQeVQ5iW bkoFXwxXLd/3+05vws2tVvhhePqrHTkn5AaFwvv+spTbzfPYB08q1vSeV3F+LIjK71Ly Esceggoa/fGtuHFlCjLJX3asr3DYz4HlMuyOnC8yy18wl8Y3u9l/W3HmhaTKGJSgNHai VEQw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; 
c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=l7fukBO06lP3b9qWLRIZdqzGt+eODzULQImZyp6BEYU=; b=ZnFnQSwRVLR1GvpqsoIUaPpvnhhZZM8o4amVT8MBc6sGvuu/nrZKrWxgYChSP1Rc0K 4BhTkfgV5aniWPbn103E0nR3hDC5cCEfcdxElM8M26XozA6sTq0NyatiPRAisEmW6ktZ Dvo2pvJn+c4EBm9w31Ro+GvJeWUKguvfkuew0AGxzuc3lgGN2hoIBlQfkxZ5DVWS6IQC F+3Nu3guQ/yEaKerm0rnz7btQqenpzcvFxmlmE0lTBpvmNTmkrWYrH4WecVzbPt94g3F TqzYIJkKHtwXmWvOSdMPrptg2yq5U5lIEb+uHjdWpnCeHnxaEwwbNyh+NjkK+D3vTPQK FWtA== X-Gm-Message-State: ALKqPwdXNRCTJFmDYbf03gYAhsUVCr++ahb86wghxYpGcZgBptWbcX5W XUZYC4UW4t4egqzzNBl+J1KxYA== X-Google-Smtp-Source: AB8JxZoYjKYOXj00uGv3u2IGNLgkiltGHLOdJuxeX1rxDqHkq1Yo6LMrIrsHcSUsVkB+lcasAGQuaQ== X-Received: by 2002:a1c:7151:: with SMTP id m78-v6mr179963wmc.150.1526514036515; Wed, 16 May 2018 16:40:36 -0700 (PDT) Received: from ycheng2.svl.corp.google.com ([2620:15c:2c4:201:d660:6c0b:8a4f:4c77]) by smtp.gmail.com with ESMTPSA id b11-v6sm4796488wrf.50.2018.05.16.16.40.34 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 16 May 2018 16:40:35 -0700 (PDT) From: Yuchung Cheng To: davem@davemloft.net Cc: netdev@vger.kernel.org, edumazet@google.com, ncardwell@google.com, soheil@google.com, priyarjha@google.com, Yuchung Cheng Subject: [PATCH net-next 1/8] tcp: support DUPACK threshold in RACK Date: Wed, 16 May 2018 16:40:10 -0700 Message-Id: <20180516234017.172775-2-ycheng@google.com> X-Mailer: git-send-email 2.17.0.441.gb46fe60e1d-goog In-Reply-To: <20180516234017.172775-1-ycheng@google.com> References: <20180516234017.172775-1-ycheng@google.com> Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org This patch adds support for the classic DUPACK threshold rule (#DupThresh) in RACK. 
When the number of packets SACKed is greater than or equal to the
threshold, RACK sets the reordering window to zero, which immediately
marks all the unsacked packets below the highest SACKed sequence as
lost. Since this approach is known not to work well with reordering,
RACK only uses it if no reordering has been observed.

The DUPACK threshold rule is a particularly useful extension to the
fast recoveries triggered by the RACK reordering timer. Examples are
data-center transfers where the RTT is much smaller than a timer tick,
and high-RTT paths where the default RTT/4 window may take too long.

Note that this patch differs slightly from RFC6675. RFC6675 considers
a packet lost when at least #DupThresh higher-sequence packets are
SACKed.

With RACK, for connections that have seen reordering, RACK continues
to use a dynamically-adaptive time-based reordering window to detect
losses. But for connections on which we have not yet seen reordering,
this patch considers a packet lost when at least one higher-sequence
packet is SACKed and the total number of SACKed packets is at least
DupThresh. For example, suppose a connection has not seen reordering,
and sends 10 packets, and packets 3, 5, 7 are SACKed. RFC6675
considers packets 1 and 2 lost. RACK considers packets 1, 2, 4, 6
lost.

There is some small risk of spurious retransmits here due to
reordering. However, this is mostly limited to the first flight of a
connection on which the sender receives SACKs from reordering. And
RFC 6675 and FACK loss detection have a similar risk on the first
flight with reordering (it's just that the risk of spurious
retransmits from reordering was slightly narrower for those older
algorithms due to the margin of 3*MSS).

Also the minimum reordering window is reduced from 1 msec to 0 to
recover quicker on short RTT transfers. Therefore RACK is more
aggressive in marking packets lost during recovery to reduce the
reordering window timeouts.

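For illustration only (this is not part of the patch), the 10-packet example above can be reproduced with a small user-space sketch. It assumes DupThresh is 3, that packets were sent in sequence order, and that no reordering has been seen. The helper names mark_rfc6675() and mark_rack_dupthresh() and the boolean-array scoreboard are hypothetical and do not exist in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DUPTHRESH 3	/* assumed classic DUPACK threshold */

/* Hypothetical helper, RFC6675-style rule: packet i is lost when at
 * least DUPTHRESH packets with a higher sequence have been SACKed.
 */
static void mark_rfc6675(const bool *sacked, bool *lost, size_t n)
{
	size_t higher = 0;	/* SACKed packets above the current index */
	size_t i;

	assert(sacked && lost);
	for (i = n; i-- > 0; ) {
		lost[i] = !sacked[i] && higher >= DUPTHRESH;
		if (sacked[i])
			higher++;
	}
}

/* Hypothetical helper, the rule described above for a connection that
 * has not seen reordering: once DUPTHRESH packets are SACKed in total,
 * every unsacked packet below the highest SACKed one is lost.
 */
static void mark_rack_dupthresh(const bool *sacked, bool *lost, size_t n)
{
	size_t total = 0, highest = 0, i;

	assert(sacked && lost);
	for (i = 0; i < n; i++) {
		lost[i] = false;
		if (sacked[i]) {
			total++;
			highest = i;
		}
	}
	if (total < DUPTHRESH)
		return;
	for (i = 0; i < highest; i++)
		lost[i] = !sacked[i];
}
```

With packets 3, 5 and 7 SACKed out of 10 (array indices 2, 4 and 6), the first helper marks packets 1 and 2, and the second marks packets 1, 2, 4 and 6, matching the example in the text.
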
Signed-off-by: Yuchung Cheng
Signed-off-by: Neal Cardwell
Reviewed-by: Eric Dumazet
Reviewed-by: Soheil Hassas Yeganeh
Reviewed-by: Priyaranjan Jha
---
 Documentation/networking/ip-sysctl.txt |  1 +
 include/net/tcp.h                      |  1 +
 net/ipv4/tcp_recovery.c                | 40 +++++++++++++++++---------
 3 files changed, 29 insertions(+), 13 deletions(-)

diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index 59afc9a10b4f..13bbac50dc8b 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -451,6 +451,7 @@ tcp_recovery - INTEGER
 	RACK: 0x1 enables the RACK loss detection for fast detection of lost
 	      retransmissions and tail drops.
 	RACK: 0x2 makes RACK's reordering window static (min_rtt/4).
+	RACK: 0x4 disables RACK's DUPACK threshold heuristic
 
 	Default: 0x1
 
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 3b1d617b0110..85000c85ddcd 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -245,6 +245,7 @@ extern long sysctl_tcp_mem[3];
 
 #define TCP_RACK_LOSS_DETECTION  0x1 /* Use RACK to detect losses */
 #define TCP_RACK_STATIC_REO_WND  0x2 /* Use static RACK reo wnd */
+#define TCP_RACK_NO_DUPTHRESH    0x4 /* Do not use DUPACK threshold in RACK */
 
 extern atomic_long_t tcp_memory_allocated;
 extern struct percpu_counter tcp_sockets_allocated;
diff --git a/net/ipv4/tcp_recovery.c b/net/ipv4/tcp_recovery.c
index 3a81720ac0c4..1c1bdf12a96f 100644
--- a/net/ipv4/tcp_recovery.c
+++ b/net/ipv4/tcp_recovery.c
@@ -21,6 +21,32 @@ static bool tcp_rack_sent_after(u64 t1, u64 t2, u32 seq1, u32 seq2)
 	return t1 > t2 || (t1 == t2 && after(seq1, seq2));
 }
 
+u32 tcp_rack_reo_wnd(const struct sock *sk)
+{
+	struct tcp_sock *tp = tcp_sk(sk);
+
+	if (!tp->rack.reord) {
+		/* If reordering has not been observed, be aggressive during
+		 * the recovery or starting the recovery by DUPACK threshold.
+		 */
+		if (inet_csk(sk)->icsk_ca_state >= TCP_CA_Recovery)
+			return 0;
+
+		if (tp->sacked_out >= tp->reordering &&
+		    !(sock_net(sk)->ipv4.sysctl_tcp_recovery & TCP_RACK_NO_DUPTHRESH))
+			return 0;
+	}
+
+	/* To be more reordering resilient, allow min_rtt/4 settling delay.
+	 * Use min_rtt instead of the smoothed RTT because reordering is
+	 * often a path property and less related to queuing or delayed ACKs.
+	 * Upon receiving DSACKs, linearly increase the window up to the
+	 * smoothed RTT.
+	 */
+	return min((tcp_min_rtt(tp) >> 2) * tp->rack.reo_wnd_steps,
+		   tp->srtt_us >> 3);
+}
+
 /* RACK loss detection (IETF draft draft-ietf-tcpm-rack-01):
  *
  * Marks a packet lost, if some packet sent later has been (s)acked.
@@ -44,23 +70,11 @@ static bool tcp_rack_sent_after(u64 t1, u64 t2, u32 seq1, u32 seq2)
 static void tcp_rack_detect_loss(struct sock *sk, u32 *reo_timeout)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
-	u32 min_rtt = tcp_min_rtt(tp);
 	struct sk_buff *skb, *n;
 	u32 reo_wnd;
 
 	*reo_timeout = 0;
-	/* To be more reordering resilient, allow min_rtt/4 settling delay
-	 * (lower-bounded to 1000uS). We use min_rtt instead of the smoothed
-	 * RTT because reordering is often a path property and less related
-	 * to queuing or delayed ACKs.
-	 */
-	reo_wnd = 1000;
-	if ((tp->rack.reord || inet_csk(sk)->icsk_ca_state < TCP_CA_Recovery) &&
-	    min_rtt != ~0U) {
-		reo_wnd = max((min_rtt >> 2) * tp->rack.reo_wnd_steps, reo_wnd);
-		reo_wnd = min(reo_wnd, tp->srtt_us >> 3);
-	}
-
+	reo_wnd = tcp_rack_reo_wnd(sk);
 	list_for_each_entry_safe(skb, n, &tp->tsorted_sent_queue,
 				 tcp_tsorted_anchor) {
 		struct tcp_skb_cb *scb = TCP_SKB_CB(skb);

From patchwork Wed May 16 23:40:11 2018
X-Patchwork-Submitter: Yuchung Cheng
X-Patchwork-Id: 915011
X-Patchwork-Delegate: davem@davemloft.net
From: Yuchung Cheng
To: davem@davemloft.net
Cc: netdev@vger.kernel.org, edumazet@google.com, ncardwell@google.com, soheil@google.com, priyarjha@google.com, Yuchung Cheng
Subject: [PATCH net-next 2/8] tcp: disable RFC6675 loss detection
Date: Wed, 16 May 2018 16:40:11 -0700
Message-Id: <20180516234017.172775-3-ycheng@google.com>
In-Reply-To: <20180516234017.172775-1-ycheng@google.com>
References: <20180516234017.172775-1-ycheng@google.com>

This patch disables RFC6675 loss detection and makes bit 0x1 of sysctl
net.ipv4.tcp_recovery control a binary choice between RACK (1) and
RFC6675 (0).

Signed-off-by: Yuchung Cheng
Signed-off-by: Neal Cardwell
Reviewed-by: Eric Dumazet
Reviewed-by: Soheil Hassas Yeganeh
Reviewed-by: Priyaranjan Jha
---
 Documentation/networking/ip-sysctl.txt |  3 ++-
 net/ipv4/tcp_input.c                   | 12 ++++++++----
 2 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index 13bbac50dc8b..ea304a23c8d7 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -449,7 +449,8 @@ tcp_recovery - INTEGER
 	features.
 
 	RACK: 0x1 enables the RACK loss detection for fast detection of lost
-	      retransmissions and tail drops.
+	      retransmissions and tail drops. It also subsumes and disables
+	      RFC6675 recovery for SACK connections.
 	RACK: 0x2 makes RACK's reordering window static (min_rtt/4).
 	RACK: 0x4 disables RACK's DUPACK threshold heuristic
 
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index b188e0d75edd..ccbe04f80040 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2035,6 +2035,11 @@ static inline int tcp_dupack_heuristics(const struct tcp_sock *tp)
 	return tp->sacked_out + 1;
 }
 
+static bool tcp_is_rack(const struct sock *sk)
+{
+	return sock_net(sk)->ipv4.sysctl_tcp_recovery & TCP_RACK_LOSS_DETECTION;
+}
+
 /* Linux NewReno/SACK/ECN state machine.
  * --------------------------------------
  *
@@ -2141,7 +2146,7 @@ static bool tcp_time_to_recover(struct sock *sk, int flag)
 		return true;
 
 	/* Not-A-Trick#2 : Classic rule... */
-	if (tcp_dupack_heuristics(tp) > tp->reordering)
+	if (!tcp_is_rack(sk) && tcp_dupack_heuristics(tp) > tp->reordering)
 		return true;
 
 	return false;
@@ -2722,8 +2727,7 @@ static void tcp_rack_identify_loss(struct sock *sk, int *ack_flag)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 
-	/* Use RACK to detect loss */
-	if (sock_net(sk)->ipv4.sysctl_tcp_recovery & TCP_RACK_LOSS_DETECTION) {
+	if (tcp_is_rack(sk)) {
 		u32 prior_retrans = tp->retrans_out;
 
 		tcp_rack_mark_lost(sk);
@@ -2862,7 +2866,7 @@ static void tcp_fastretrans_alert(struct sock *sk, const u32 prior_snd_una,
 		fast_rexmit = 1;
 	}
 
-	if (do_lost)
+	if (!tcp_is_rack(sk) && do_lost)
 		tcp_update_scoreboard(sk, fast_rexmit);
 	*rexmit = REXMIT_LOST;
 }

From patchwork Wed May 16 23:40:12 2018
X-Patchwork-Submitter: Yuchung Cheng
X-Patchwork-Id: 915012
X-Patchwork-Delegate: davem@davemloft.net
From: Yuchung Cheng
To: davem@davemloft.net
Cc: netdev@vger.kernel.org, edumazet@google.com, ncardwell@google.com, soheil@google.com, priyarjha@google.com, Yuchung Cheng
Subject: [PATCH net-next 3/8] tcp: simpler NewReno implementation
Date: Wed, 16 May 2018 16:40:12 -0700
Message-Id: <20180516234017.172775-4-ycheng@google.com>
In-Reply-To: <20180516234017.172775-1-ycheng@google.com>
References: <20180516234017.172775-1-ycheng@google.com>

This is a rewrite of the NewReno loss recovery implementation that is
simpler and standalone, for readability and better performance by
using fewer states.

Note that NewReno refers to RFC6582 as a modification to the fast
recovery algorithm. It is used only if the connection does not support
SACK in Linux. It should not be confused with the Reno (AIMD)
congestion control.

Signed-off-by: Yuchung Cheng
Signed-off-by: Neal Cardwell
Reviewed-by: Eric Dumazet
Reviewed-by: Soheil Hassas Yeganeh
Reviewed-by: Priyaranjan Jha
---
 include/net/tcp.h       |  1 +
 net/ipv4/tcp_input.c    | 19 +++++++++++--------
 net/ipv4/tcp_recovery.c | 27 +++++++++++++++++++++++++++
 3 files changed, 39 insertions(+), 8 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 85000c85ddcd..d7f81325bee5 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1878,6 +1878,7 @@ void tcp_v4_init(void);
 void tcp_init(void);
 
 /* tcp_recovery.c */
+void tcp_newreno_mark_lost(struct sock *sk, bool snd_una_advanced);
 extern void tcp_rack_mark_lost(struct sock *sk);
 extern void tcp_rack_advance(struct tcp_sock *tp, u8 sacked, u32 end_seq,
 			     u64 xmit_time);
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index ccbe04f80040..076206873e3e 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2223,9 +2223,7 @@ static void tcp_update_scoreboard(struct sock *sk, int fast_rexmit)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 
-	if (tcp_is_reno(tp)) {
-		tcp_mark_head_lost(sk, 1, 1);
-	} else {
+	if (tcp_is_sack(tp)) {
 		int sacked_upto = tp->sacked_out - tp->reordering;
 		if (sacked_upto >= 0)
 			tcp_mark_head_lost(sk, sacked_upto, 0);
@@ -2723,11 +2721,16 @@ static bool tcp_try_undo_partial(struct sock *sk, u32 prior_snd_una)
 	return false;
 }
 
-static void tcp_rack_identify_loss(struct sock *sk, int *ack_flag)
+static void tcp_identify_packet_loss(struct sock *sk, int *ack_flag)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 
-	if (tcp_is_rack(sk)) {
+	if (tcp_rtx_queue_empty(sk))
+		return;
+
+	if (unlikely(tcp_is_reno(tp))) {
+		tcp_newreno_mark_lost(sk, *ack_flag & FLAG_SND_UNA_ADVANCED);
+	} else if (tcp_is_rack(sk)) {
 		u32 prior_retrans = tp->retrans_out;
 
 		tcp_rack_mark_lost(sk);
@@ -2823,11 +2826,11 @@ static void tcp_fastretrans_alert(struct sock *sk, const u32 prior_snd_una,
 			tcp_try_keep_open(sk);
 			return;
 		}
-		tcp_rack_identify_loss(sk, ack_flag);
+		tcp_identify_packet_loss(sk, ack_flag);
 		break;
 	case TCP_CA_Loss:
 		tcp_process_loss(sk, flag, is_dupack, rexmit);
-		tcp_rack_identify_loss(sk, ack_flag);
+		tcp_identify_packet_loss(sk, ack_flag);
 		if (!(icsk->icsk_ca_state == TCP_CA_Open ||
 		      (*ack_flag & FLAG_LOST_RETRANS)))
 			return;
@@ -2844,7 +2847,7 @@ static void tcp_fastretrans_alert(struct sock *sk, const u32 prior_snd_una,
 		if (icsk->icsk_ca_state <= TCP_CA_Disorder)
 			tcp_try_undo_dsack(sk);
 
-		tcp_rack_identify_loss(sk, ack_flag);
+		tcp_identify_packet_loss(sk, ack_flag);
 		if (!tcp_time_to_recover(sk, flag)) {
 			tcp_try_to_open(sk, flag);
 			return;
diff --git a/net/ipv4/tcp_recovery.c b/net/ipv4/tcp_recovery.c
index 1c1bdf12a96f..299b0e38aa9a 100644
--- a/net/ipv4/tcp_recovery.c
+++ b/net/ipv4/tcp_recovery.c
@@ -216,3 +216,30 @@ void tcp_rack_update_reo_wnd(struct sock *sk, struct rate_sample *rs)
 		tp->rack.reo_wnd_steps = 1;
 	}
 }
+
+/* RFC6582 NewReno recovery for non-SACK connection. It simply retransmits
+ * the next unacked packet upon receiving
+ * a) three or more DUPACKs to start the fast recovery
+ * b) an ACK acknowledging new data during the fast recovery.
+ */
+void tcp_newreno_mark_lost(struct sock *sk, bool snd_una_advanced)
+{
+	const u8 state = inet_csk(sk)->icsk_ca_state;
+	struct tcp_sock *tp = tcp_sk(sk);
+
+	if ((state < TCP_CA_Recovery && tp->sacked_out >= tp->reordering) ||
+	    (state == TCP_CA_Recovery && snd_una_advanced)) {
+		struct sk_buff *skb = tcp_rtx_queue_head(sk);
+		u32 mss;
+
+		if (TCP_SKB_CB(skb)->sacked & TCPCB_LOST)
+			return;
+
+		mss = tcp_skb_mss(skb);
+		if (tcp_skb_pcount(skb) > 1 && skb->len > mss)
+			tcp_fragment(sk, TCP_FRAG_IN_RTX_QUEUE, skb,
+				     mss, mss, GFP_ATOMIC);
+
+		tcp_skb_mark_lost_uncond_verify(tp, skb);
+	}
+}

From patchwork Wed May 16 23:40:13 2018
X-Patchwork-Submitter: Yuchung Cheng
X-Patchwork-Id: 915017
X-Patchwork-Delegate: davem@davemloft.net
From: Yuchung Cheng
To: davem@davemloft.net
Cc: netdev@vger.kernel.org, edumazet@google.com, ncardwell@google.com, soheil@google.com, priyarjha@google.com, Yuchung Cheng
Subject: [PATCH net-next 4/8] tcp: account lost retransmit after timeout
Date: Wed, 16 May 2018 16:40:13 -0700
Message-Id: <20180516234017.172775-5-ycheng@google.com>
In-Reply-To: <20180516234017.172775-1-ycheng@google.com>
References: <20180516234017.172775-1-ycheng@google.com>

The previous approach for the lost and retransmit bits was to
wipe the slate clean: zero all the lost and retransmit bits,
correspondingly zero the lost_out and retrans_out counters, and
then add back the lost bits (and correspondingly increment lost_out).

The new approach is to treat this very much like marking packets
lost in fast recovery. We don’t wipe the slate clean. We just say
that for all packets that were not yet marked sacked or lost, we now
mark them as lost in exactly the same way we do for fast recovery.

This fixes the lost retransmit accounting at RTO time and greatly
simplifies the RTO code by sharing much of the logic with Fast
Recovery.

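For illustration only (this is not kernel code), the difference between the two approaches can be sketched with a toy user-space model. The flag names, struct toy_conn, and both helpers are hypothetical. The model also assumes that a retransmission still outstanding at RTO time is what gets counted as a lost retransmit.

```c
#include <assert.h>
#include <stddef.h>

/* Toy per-packet flags, loosely modelled on the kernel's TCPCB_* bits.
 * Names and values are illustrative only.
 */
enum {
	PKT_SACKED  = 1 << 0,
	PKT_RETRANS = 1 << 1,
	PKT_LOST    = 1 << 2,
};

struct toy_conn {
	unsigned int lost_out;		/* packets currently marked lost */
	unsigned int retrans_out;	/* retransmissions still outstanding */
	unsigned int lost_retrans;	/* retransmissions declared lost */
};

/* Old approach: wipe the slate clean, then re-mark every unsacked
 * packet as lost. Which packets had been retransmitted is forgotten.
 */
static void rto_wipe_and_remark(struct toy_conn *c, unsigned char *pkts,
				size_t n)
{
	size_t i;

	assert(c && pkts);
	c->lost_out = 0;
	c->retrans_out = 0;
	for (i = 0; i < n; i++) {
		pkts[i] &= PKT_SACKED;	/* drop the LOST and RETRANS bits */
		if (!(pkts[i] & PKT_SACKED)) {
			pkts[i] |= PKT_LOST;
			c->lost_out++;
		}
	}
}

/* New approach: keep existing state and mark packets the same way fast
 * recovery would. Only packets not yet sacked or lost are newly counted.
 */
static void rto_mark_lost(struct toy_conn *c, unsigned char *pkts, size_t n)
{
	size_t i;

	assert(c && pkts);
	for (i = 0; i < n; i++) {
		if (pkts[i] & PKT_SACKED)
			continue;
		if (!(pkts[i] & PKT_LOST)) {
			pkts[i] |= PKT_LOST;
			c->lost_out++;
		}
		if (pkts[i] & PKT_RETRANS) {
			/* Assumption: an outstanding retransmission at RTO
			 * time is accounted as a lost retransmit.
			 */
			pkts[i] &= ~PKT_RETRANS;
			c->retrans_out--;
			c->lost_retrans++;
		}
	}
}
```

In this model both helpers end with the same lost_out, but only rto_mark_lost() records that a retransmission was lost, because the old approach discards the retransmit bit before looking at it.
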
Signed-off-by: Yuchung Cheng Signed-off-by: Neal Cardwell Reviewed-by: Eric Dumazet Reviewed-by: Soheil Hassas Yeganeh Reviewed-by: Priyaranjan Jha --- include/net/tcp.h | 1 + net/ipv4/tcp_input.c | 18 +++--------------- net/ipv4/tcp_recovery.c | 4 ++-- 3 files changed, 6 insertions(+), 17 deletions(-) diff --git a/include/net/tcp.h b/include/net/tcp.h index d7f81325bee5..402484ed9b57 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -1878,6 +1878,7 @@ void tcp_v4_init(void); void tcp_init(void); /* tcp_recovery.c */ +void tcp_mark_skb_lost(struct sock *sk, struct sk_buff *skb); void tcp_newreno_mark_lost(struct sock *sk, bool snd_una_advanced); extern void tcp_rack_mark_lost(struct sock *sk); extern void tcp_rack_advance(struct tcp_sock *tp, u8 sacked, u32 end_seq, diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 076206873e3e..6fb0a28977a0 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -1929,7 +1929,6 @@ void tcp_enter_loss(struct sock *sk) struct sk_buff *skb; bool new_recovery = icsk->icsk_ca_state < TCP_CA_Recovery; bool is_reneg; /* is receiver reneging on SACKs? */ - bool mark_lost; /* Reduce ssthresh if it has not yet been made inside this window. */ if (icsk->icsk_ca_state <= TCP_CA_Disorder || @@ -1945,9 +1944,6 @@ void tcp_enter_loss(struct sock *sk) tp->snd_cwnd_cnt = 0; tp->snd_cwnd_stamp = tcp_jiffies32; - tp->retrans_out = 0; - tp->lost_out = 0; - if (tcp_is_reno(tp)) tcp_reset_reno_sack(tp); @@ -1959,21 +1955,13 @@ void tcp_enter_loss(struct sock *sk) /* Mark SACK reneging until we recover from this loss event. 
*/ tp->is_sack_reneg = 1; } - tcp_clear_all_retrans_hints(tp); - skb_rbtree_walk_from(skb) { - mark_lost = (!(TCP_SKB_CB(skb)->sacked & TCPCB_SACKED_ACKED) || - is_reneg); - if (mark_lost) - tcp_sum_lost(tp, skb); - TCP_SKB_CB(skb)->sacked &= (~TCPCB_TAGBITS)|TCPCB_SACKED_ACKED; - if (mark_lost) { + if (is_reneg) TCP_SKB_CB(skb)->sacked &= ~TCPCB_SACKED_ACKED; - TCP_SKB_CB(skb)->sacked |= TCPCB_LOST; - tp->lost_out += tcp_skb_pcount(skb); - } + tcp_mark_skb_lost(sk, skb); } tcp_verify_left_out(tp); + tcp_clear_all_retrans_hints(tp); /* Timeout in disordered state after receiving substantial DUPACKs * suggests that the degree of reordering is over-estimated. diff --git a/net/ipv4/tcp_recovery.c b/net/ipv4/tcp_recovery.c index 299b0e38aa9a..b2f9be388bf3 100644 --- a/net/ipv4/tcp_recovery.c +++ b/net/ipv4/tcp_recovery.c @@ -2,7 +2,7 @@ #include #include -static void tcp_rack_mark_skb_lost(struct sock *sk, struct sk_buff *skb) +void tcp_mark_skb_lost(struct sock *sk, struct sk_buff *skb) { struct tcp_sock *tp = tcp_sk(sk); @@ -95,7 +95,7 @@ static void tcp_rack_detect_loss(struct sock *sk, u32 *reo_timeout) remaining = tp->rack.rtt_us + reo_wnd - tcp_stamp_us_delta(tp->tcp_mstamp, skb->skb_mstamp); if (remaining <= 0) { - tcp_rack_mark_skb_lost(sk, skb); + tcp_mark_skb_lost(sk, skb); list_del_init(&skb->tcp_tsorted_anchor); } else { /* Record maximum wait time */ From patchwork Wed May 16 23:40:14 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yuchung Cheng X-Patchwork-Id: 915016 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=reject dis=none) 
header.from=google.com Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=google.com header.i=@google.com header.b="j2ozgPBO"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 40mWCp1sY2z9s1B for ; Thu, 17 May 2018 09:41:10 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752044AbeEPXkw (ORCPT ); Wed, 16 May 2018 19:40:52 -0400 Received: from mail-wm0-f65.google.com ([74.125.82.65]:37554 "EHLO mail-wm0-f65.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751984AbeEPXkt (ORCPT ); Wed, 16 May 2018 19:40:49 -0400 Received: by mail-wm0-f65.google.com with SMTP id l1-v6so5771493wmb.2 for ; Wed, 16 May 2018 16:40:48 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=JRCU66xhNE0CNPJv49O2TGAPNchnP44WInBqDAUzIJA=; b=j2ozgPBOTEDdMksB0vJnTogtnJ6/o2naYq953aTEy3SrHCxHQlEnAsRq3XgSZFwaWN bJXzP0fsd1RRlLkY6eF2NIMv12aLTfJMKHRyMa9L5ory94UUZnYhovhvY3usOZMQPfG3 P2T1WsSb1C3YoQ1KUtTipfns4oVMEJ77qPTyTvLpvvh4alvIKYLhZgy5VtD9XQWneweq lSKVl97zRrxdvB1g/zDopvLq1MoxvChgQsqoJ3uDNlqga6LxEHN2xVUyP5+sJKqrYZ6t 8cYRhkm4WP9Y/eVS1Q3OUVkBf303DDuuIXDUNX875EaQqrkQmSDSsrrAncLdh867U5/k 77Cg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=JRCU66xhNE0CNPJv49O2TGAPNchnP44WInBqDAUzIJA=; b=IdEOnvex3QmIlTXt5LCgGia0ikDV3Vi1LMY7sO4Omsr7pmvSQ/mmsXRIYPYHisRR7y B/y6VNDBcCDklvEKl2Y4MgKhgDXEFbWym669E9aQjPGe16a+4q2ToUuzEHFAo6MZfR3O oyB6aSix2/lNeGciui5MraYg/2lB4w1TxaT/1Gr7lVBHOLL3i935d4BFhC6jEX4O6f0N 7/oig0b+RLXA1jdqNzK9PjjGF6hvhbKSZSvyqUGE7I0yu3tRWHQURPH5hwzvHMk9BCM5 6hF1zvt5nWFRF2xoQ7HoO5HaYmjXf4NFtVzLxJOGelIUht00+AOsc5GPBjGi1YDFBQGX tS4A== X-Gm-Message-State: ALKqPwfTHXpDEUG4tpiuVwgOXP8LoFtplzY7tpdR9IRFdC/ncHrH76xF 
Q6VLv6ij/BHQ2/CmN9CCH2EpHA== X-Google-Smtp-Source: AB8JxZpXq5A3jtSDwzSxFJvKSnx2974elpW+wqjPlztDfbkVOHi5IBIWaUlGk+EEGWfVaU7ZeVUKUw== X-Received: by 2002:a1c:acc2:: with SMTP id v185-v6mr167634wme.67.1526514047391; Wed, 16 May 2018 16:40:47 -0700 (PDT) Received: from ycheng2.svl.corp.google.com ([2620:15c:2c4:201:d660:6c0b:8a4f:4c77]) by smtp.gmail.com with ESMTPSA id b11-v6sm4796488wrf.50.2018.05.16.16.40.44 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 16 May 2018 16:40:46 -0700 (PDT) From: Yuchung Cheng To: davem@davemloft.net Cc: netdev@vger.kernel.org, edumazet@google.com, ncardwell@google.com, soheil@google.com, priyarjha@google.com, Yuchung Cheng Subject: [PATCH net-next 5/8] tcp: new helper tcp_timeout_mark_lost Date: Wed, 16 May 2018 16:40:14 -0700 Message-Id: <20180516234017.172775-6-ycheng@google.com> X-Mailer: git-send-email 2.17.0.441.gb46fe60e1d-goog In-Reply-To: <20180516234017.172775-1-ycheng@google.com> References: <20180516234017.172775-1-ycheng@google.com> Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Refactor using a new helper, tcp_timeout_mark_lost(), that marks packets lost upon RTO. Signed-off-by: Yuchung Cheng Signed-off-by: Neal Cardwell Reviewed-by: Eric Dumazet Reviewed-by: Soheil Hassas Yeganeh Reviewed-by: Priyaranjan Jha --- net/ipv4/tcp_input.c | 50 +++++++++++++++++++++++++------------------- 1 file changed, 29 insertions(+), 21 deletions(-) diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 6fb0a28977a0..af32accda2a9 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -1917,18 +1917,43 @@ static inline void tcp_init_undo(struct tcp_sock *tp) tp->undo_retrans = tp->retrans_out ? : -1; } -/* Enter Loss state. If we detect SACK reneging, forget all SACK information +/* If we detect SACK reneging, forget all SACK information * and reset tags completely, otherwise preserve SACKs. 
If receiver * dropped its ofo queue, we will know this due to reneging detection. */ +static void tcp_timeout_mark_lost(struct sock *sk) +{ + struct tcp_sock *tp = tcp_sk(sk); + struct sk_buff *skb; + bool is_reneg; /* is receiver reneging on SACKs? */ + + skb = tcp_rtx_queue_head(sk); + is_reneg = skb && (TCP_SKB_CB(skb)->sacked & TCPCB_SACKED_ACKED); + if (is_reneg) { + NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPSACKRENEGING); + tp->sacked_out = 0; + /* Mark SACK reneging until we recover from this loss event. */ + tp->is_sack_reneg = 1; + } else if (tcp_is_reno(tp)) { + tcp_reset_reno_sack(tp); + } + + skb_rbtree_walk_from(skb) { + if (is_reneg) + TCP_SKB_CB(skb)->sacked &= ~TCPCB_SACKED_ACKED; + tcp_mark_skb_lost(sk, skb); + } + tcp_verify_left_out(tp); + tcp_clear_all_retrans_hints(tp); +} + +/* Enter Loss state. */ void tcp_enter_loss(struct sock *sk) { const struct inet_connection_sock *icsk = inet_csk(sk); struct tcp_sock *tp = tcp_sk(sk); struct net *net = sock_net(sk); - struct sk_buff *skb; bool new_recovery = icsk->icsk_ca_state < TCP_CA_Recovery; - bool is_reneg; /* is receiver reneging on SACKs? */ /* Reduce ssthresh if it has not yet been made inside this window. */ if (icsk->icsk_ca_state <= TCP_CA_Disorder || @@ -1944,24 +1969,7 @@ void tcp_enter_loss(struct sock *sk) tp->snd_cwnd_cnt = 0; tp->snd_cwnd_stamp = tcp_jiffies32; - if (tcp_is_reno(tp)) - tcp_reset_reno_sack(tp); - - skb = tcp_rtx_queue_head(sk); - is_reneg = skb && (TCP_SKB_CB(skb)->sacked & TCPCB_SACKED_ACKED); - if (is_reneg) { - NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPSACKRENEGING); - tp->sacked_out = 0; - /* Mark SACK reneging until we recover from this loss event. 
*/ - tp->is_sack_reneg = 1; - } - skb_rbtree_walk_from(skb) { - if (is_reneg) - TCP_SKB_CB(skb)->sacked &= ~TCPCB_SACKED_ACKED; - tcp_mark_skb_lost(sk, skb); - } - tcp_verify_left_out(tp); - tcp_clear_all_retrans_hints(tp); + tcp_timeout_mark_lost(sk); /* Timeout in disordered state after receiving substantial DUPACKs * suggests that the degree of reordering is over-estimated. From patchwork Wed May 16 23:40:15 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yuchung Cheng X-Patchwork-Id: 915013 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=google.com header.i=@google.com header.b="QeyNWofB"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 40mWCZ4MZgz9s1B for ; Thu, 17 May 2018 09:40:58 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752061AbeEPXky (ORCPT ); Wed, 16 May 2018 19:40:54 -0400 Received: from mail-wr0-f194.google.com ([209.85.128.194]:37067 "EHLO mail-wr0-f194.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751610AbeEPXkv (ORCPT ); Wed, 16 May 2018 19:40:51 -0400 Received: by mail-wr0-f194.google.com with SMTP id h5-v6so3734994wrm.4 for ; Wed, 16 May 2018 16:40:50 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=sou2VGV6UZNqsZ1Vh1Snz1y8S0BCitbu9y1fFtt4Gss=; 
b=QeyNWofBAP4zLDWgos7PwDWAtbsABYh5xO1S2xlkthUibqr7tU0I6ERPgRp/gSXcow yZy5Yapph7ExIUOC2PmFcPjdn65GBFWim95k1EJksRHbGU0cfvIXuI9XNWuCH+tteJ1/ Xatp55KFLNFXXUQs9R9yyzUk/WjzYSbyUo+QkvVqKxZDTTlDVXyRONDDrWdVQIeXQAWr 1OkG2+XwaON9ElnJ0FKQuPArFy7496zJ1hE/IgmcgoNhMU/7plzdCrCSiGwqiVjOowLn GcSAXzPtUIK1/68rIGhNMv9EPgTOw0x8ZkDXbKjX17g89NLm0lE5EXo0qsdXxNIYBDUU H3NQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=sou2VGV6UZNqsZ1Vh1Snz1y8S0BCitbu9y1fFtt4Gss=; b=mPXFKIEiNyWxD/Uj1G3v6Z4JDa8yY9OPtvP1SixxTFAMxPq5ATDd63JggK8y+as1qE p1WHS2/XqteGHm1+oFNn8uCS9i0NMsuyZudDFQPnL7ZYUiYA+IWxGY3YCp1VLat7ZcC2 YVR9oxG6EaXusjreikfkYO9StVprYxl5NR3s3S1iBUfIMdxy+AmsKEDRE1NYnC4uIUoK /xbP87CY1R3YC53Q5QOK2EbbBoX0mTwGCoJHNVkcy3Rj4WGoVT7b/Msl20gsGu+ZmINg PBDLAceTsZnQDNohTFamz3fo71LNTvn7+rUOcFthQBr6t1C0G9gLhkQEWeEEdz83MduV +vlw== X-Gm-Message-State: ALKqPwc1dAGpWP658Pnm9gkkuaT4/y5ZUINEM4lgaHlDEP3Q3My9vH8v 2KrdNjv31NDFe+8t4U66LlGOOg== X-Google-Smtp-Source: AB8JxZqQWd4HTl82KTk3yiaBkCvWIvaXQAJC1RbE0VcIEeuCFA/FwggwBbLQOzpq1vBgJSnG635FRQ== X-Received: by 2002:adf:9c01:: with SMTP id f1-v6mr2176536wrc.171.1526514050066; Wed, 16 May 2018 16:40:50 -0700 (PDT) Received: from ycheng2.svl.corp.google.com ([2620:15c:2c4:201:d660:6c0b:8a4f:4c77]) by smtp.gmail.com with ESMTPSA id b11-v6sm4796488wrf.50.2018.05.16.16.40.47 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 16 May 2018 16:40:49 -0700 (PDT) From: Yuchung Cheng To: davem@davemloft.net Cc: netdev@vger.kernel.org, edumazet@google.com, ncardwell@google.com, soheil@google.com, priyarjha@google.com, Yuchung Cheng Subject: [PATCH net-next 6/8] tcp: separate loss marking and state update on RTO Date: Wed, 16 May 2018 16:40:15 -0700 Message-Id: <20180516234017.172775-7-ycheng@google.com> X-Mailer: git-send-email 2.17.0.441.gb46fe60e1d-goog In-Reply-To: <20180516234017.172775-1-ycheng@google.com> References: 
<20180516234017.172775-1-ycheng@google.com> Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Previously, when TCP timed out, it first updated cwnd and ssthresh, marked packets lost, and then updated the congestion state again. This was fine because everything not yet delivered was marked lost, so the inflight was always 0 and cwnd could safely be set to 1 to retransmit one packet on timeout. But the inflight may not always be 0 on timeout if TCP changes to mark packets lost based on packet sent time. Therefore we must first mark the packets lost, then set the cwnd based on the (updated) inflight. This is not a pure refactor. Congestion control may potentially break if it uses the (not yet updated) inflight to compute ssthresh. Fortunately, no existing congestion control module does that. The change also alters the inflight seen when the CA_EVENT_LOSS event is delivered; only westwood processes that event, and it does not use inflight. This change has two other minor side benefits: 1) it is consistent with Fast Recovery, in that the inflight is updated before tcp_enter_recovery flips the state to CA_Recovery. 2) it avoids intertwining loss marking with state update, making the code more readable. Signed-off-by: Yuchung Cheng Signed-off-by: Neal Cardwell Reviewed-by: Eric Dumazet Reviewed-by: Soheil Hassas Yeganeh Reviewed-by: Priyaranjan Jha --- net/ipv4/tcp_input.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index af32accda2a9..1ccc97b368c7 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -1955,6 +1955,8 @@ void tcp_enter_loss(struct sock *sk) struct net *net = sock_net(sk); bool new_recovery = icsk->icsk_ca_state < TCP_CA_Recovery; + tcp_timeout_mark_lost(sk); + /* Reduce ssthresh if it has not yet been made inside this window. 
*/ if (icsk->icsk_ca_state <= TCP_CA_Disorder || !after(tp->high_seq, tp->snd_una) || @@ -1969,8 +1971,6 @@ void tcp_enter_loss(struct sock *sk) tp->snd_cwnd_cnt = 0; tp->snd_cwnd_stamp = tcp_jiffies32; - tcp_timeout_mark_lost(sk); - /* Timeout in disordered state after receiving substantial DUPACKs * suggests that the degree of reordering is over-estimated. */ From patchwork Wed May 16 23:40:16 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yuchung Cheng X-Patchwork-Id: 915014 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=google.com header.i=@google.com header.b="ukQuUuKe"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 40mWCb5j2zz9s1w for ; Thu, 17 May 2018 09:40:59 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752081AbeEPXk4 (ORCPT ); Wed, 16 May 2018 19:40:56 -0400 Received: from mail-wr0-f193.google.com ([209.85.128.193]:42902 "EHLO mail-wr0-f193.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752056AbeEPXky (ORCPT ); Wed, 16 May 2018 19:40:54 -0400 Received: by mail-wr0-f193.google.com with SMTP id t16-v6so560195wrm.9 for ; Wed, 16 May 2018 16:40:53 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=UUPArnUpOht+7uOSwSPGsnn+/KI2TmUKNFk6S2tKytQ=; 
b=ukQuUuKezXru6Aw2D8yz6rC8CP00LtKX6Osbb2F+jhek7eNkJCg3qXqz9cdtQeI4NN /TA9VFkW9didT42iWvO1SdpqjvQaDftypINy3nLhZA40MAPREnAOrDXh+CojoBka1fa0 JSq/pES8ctkTvec1haDDfTMjv5F/vJ+56yl7m10UI/3/btlOVX5xaiBguV2k+vQl+W3C Z2jiK68deTiLrhNuxp/3NhFVkknHl1Yl4UEvGgUt2Yei1/ug1DhVShikIxQu0YUNNQ2N Tt26ZqLz9guX1ofj4Dpa7oMUnRu08FAYo4nxRBmWQtZnjhJ+mrkI+Nu+TjMy7Yg9AlUb Amng== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=UUPArnUpOht+7uOSwSPGsnn+/KI2TmUKNFk6S2tKytQ=; b=HHZwOL5Bdal5oMghxgd62oHlNAcKZDmtzrf56rvOPNu4Ls6zZQXWuJqNronRRH6v1m Wno3nOIXNOSeGtFNp9j1EqWZhv3fPBNgL23yoaLmQOhxpuMIEIedTucNvzKE1JuoGarM X3uigW4xsgASABjq+JVe+9+S22en6qHZBz2Qp084lzRriIcrs5fz4l/F6Lp8v/ujxn/k uy9Jnei8iVer3J+e1rnmewxsR0t71Zyu7UlyRzOFF1E0aKLS+76afqBCzuc0kLEz5OYo 8vr2BYpLTeZuPA4a9HXF02qF5CCW6cBZij/kIsq2iGBu8aRtahRmgv26buJXbxRhQe9f E51w== X-Gm-Message-State: ALKqPwciZSlc4YEE92PvzIo82TfFzdMhw906tyMsp9VnL96vZrJ+wfUf 28exbmfZVaGt69bVtJdZxbxqnyzVMBY= X-Google-Smtp-Source: AB8JxZpfEOuIpurgiRMsWSJTuzmhds20xMjuvtGm38aBpqpfATbV10OPU6YU4kxAbwP0VlkIO9Qm1Q== X-Received: by 2002:adf:db4f:: with SMTP id f15-v6mr2641929wrj.212.1526514052770; Wed, 16 May 2018 16:40:52 -0700 (PDT) Received: from ycheng2.svl.corp.google.com ([2620:15c:2c4:201:d660:6c0b:8a4f:4c77]) by smtp.gmail.com with ESMTPSA id b11-v6sm4796488wrf.50.2018.05.16.16.40.50 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 16 May 2018 16:40:51 -0700 (PDT) From: Yuchung Cheng To: davem@davemloft.net Cc: netdev@vger.kernel.org, edumazet@google.com, ncardwell@google.com, soheil@google.com, priyarjha@google.com, Yuchung Cheng Subject: [PATCH net-next 7/8] tcp: new helper tcp_rack_skb_timeout Date: Wed, 16 May 2018 16:40:16 -0700 Message-Id: <20180516234017.172775-8-ycheng@google.com> X-Mailer: git-send-email 2.17.0.441.gb46fe60e1d-goog In-Reply-To: <20180516234017.172775-1-ycheng@google.com> References: 
<20180516234017.172775-1-ycheng@google.com> Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Create and export a new helper tcp_rack_skb_timeout and move tcp_is_rack to prepare the final RTO change. Signed-off-by: Yuchung Cheng Signed-off-by: Neal Cardwell Reviewed-by: Eric Dumazet Reviewed-by: Soheil Hassas Yeganeh Reviewed-by: Priyaranjan Jha --- include/net/tcp.h | 2 ++ net/ipv4/tcp_input.c | 10 +++++----- net/ipv4/tcp_recovery.c | 9 +++++++-- 3 files changed, 14 insertions(+), 7 deletions(-) diff --git a/include/net/tcp.h b/include/net/tcp.h index 402484ed9b57..b46d0f9adbdb 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -1880,6 +1880,8 @@ void tcp_init(void); /* tcp_recovery.c */ void tcp_mark_skb_lost(struct sock *sk, struct sk_buff *skb); void tcp_newreno_mark_lost(struct sock *sk, bool snd_una_advanced); +extern s32 tcp_rack_skb_timeout(struct tcp_sock *tp, struct sk_buff *skb, + u32 reo_wnd); extern void tcp_rack_mark_lost(struct sock *sk); extern void tcp_rack_advance(struct tcp_sock *tp, u8 sacked, u32 end_seq, u64 xmit_time); diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 1ccc97b368c7..ba8a8e3464aa 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -1917,6 +1917,11 @@ static inline void tcp_init_undo(struct tcp_sock *tp) tp->undo_retrans = tp->retrans_out ? : -1; } +static bool tcp_is_rack(const struct sock *sk) +{ + return sock_net(sk)->ipv4.sysctl_tcp_recovery & TCP_RACK_LOSS_DETECTION; +} + /* If we detect SACK reneging, forget all SACK information * and reset tags completely, otherwise preserve SACKs. If receiver * dropped its ofo queue, we will know this due to reneging detection. 
@@ -2031,11 +2036,6 @@ static inline int tcp_dupack_heuristics(const struct tcp_sock *tp) return tp->sacked_out + 1; } -static bool tcp_is_rack(const struct sock *sk) -{ - return sock_net(sk)->ipv4.sysctl_tcp_recovery & TCP_RACK_LOSS_DETECTION; -} - /* Linux NewReno/SACK/ECN state machine. * -------------------------------------- * diff --git a/net/ipv4/tcp_recovery.c b/net/ipv4/tcp_recovery.c index b2f9be388bf3..30cbfb69b1de 100644 --- a/net/ipv4/tcp_recovery.c +++ b/net/ipv4/tcp_recovery.c @@ -47,6 +47,12 @@ u32 tcp_rack_reo_wnd(const struct sock *sk) tp->srtt_us >> 3); } +s32 tcp_rack_skb_timeout(struct tcp_sock *tp, struct sk_buff *skb, u32 reo_wnd) +{ + return tp->rack.rtt_us + reo_wnd - + tcp_stamp_us_delta(tp->tcp_mstamp, skb->skb_mstamp); +} + /* RACK loss detection (IETF draft draft-ietf-tcpm-rack-01): * * Marks a packet lost, if some packet sent later has been (s)acked. @@ -92,8 +98,7 @@ static void tcp_rack_detect_loss(struct sock *sk, u32 *reo_timeout) /* A packet is lost if it has not been s/acked beyond * the recent RTT plus the reordering window. 
*/ - remaining = tp->rack.rtt_us + reo_wnd - - tcp_stamp_us_delta(tp->tcp_mstamp, skb->skb_mstamp); + remaining = tcp_rack_skb_timeout(tp, skb, reo_wnd); if (remaining <= 0) { tcp_mark_skb_lost(sk, skb); list_del_init(&skb->tcp_tsorted_anchor); From patchwork Wed May 16 23:40:17 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yuchung Cheng X-Patchwork-Id: 915015 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=google.com header.i=@google.com header.b="JL4W0A+C"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 40mWCg2rY7z9s1B for ; Thu, 17 May 2018 09:41:03 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752149AbeEPXlA (ORCPT ); Wed, 16 May 2018 19:41:00 -0400 Received: from mail-wr0-f196.google.com ([209.85.128.196]:37076 "EHLO mail-wr0-f196.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752139AbeEPXk5 (ORCPT ); Wed, 16 May 2018 19:40:57 -0400 Received: by mail-wr0-f196.google.com with SMTP id h5-v6so3735119wrm.4 for ; Wed, 16 May 2018 16:40:56 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=gBEgR3aDu1GRkmjhbtnIZy7WAT62VMlhyEPNn4jyr2w=; b=JL4W0A+CPE33da9C05gJXDhTuF82ua18na2LpMP4WM7FtyukNPcTXZxgxbDUgbnzqr KwHFgLaQH90yfSU18O+2uAwT7Hj6rSL/gS8qfj61R8KiY1kgxGD1yTrCTqmpFbPx/UGg 
ReMNbWUgFLjZrkaUQOkV5GFhkyWQESnxasN9YMGnTDsAHaWdbjrMunILHKLPLUIJvxQm 50okz1M5pLA/e7crfHUzV50jjeXdXuRqtyRjxI443ykOAnE9XEHSQ5irbyCHUShT1MjL kmxKIP3UgnJnR8IyfC3PEn7Xsof/NY8MmJ0KrwaGPXXxATvmfBWwWy953KuR/HmVqGd2 cMCg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=gBEgR3aDu1GRkmjhbtnIZy7WAT62VMlhyEPNn4jyr2w=; b=Hc9r9bSGpGKwpjuurK08S9AHiLinJSy32SsOGohtPjFcX0maLvoPK1GgvOoRXZ46X/ KUpD0jTk9so0M71V5K1I8kKtjVSYqQasSyNxBTEsHdf51aqxZgjckZASsCpIVTuOa0C9 M2l2/GC0/8dySpeFwjGaB5/+DlOLbw7eDGXy1rzJOlPoC6GfyJFiThVSrQAJzp0TvTcb GigMvKL+Y2t0MDoC8j8a8ouRnOcgQwAikLClEYgIPCxzVO+DIT4HZsrppXTXl1p66ItM nOy3UxOPZdjXdl+dj+mMKpLa26whEOOUHP9m3w5L9j55y8zLuSO82gJDUcV2mzJ49t5g wh8g== X-Gm-Message-State: ALKqPwd+ZgYWsqEUgtNYNYTxZ0gN9oei67rFr1bOBdRt78DTrAUHUpvU UPeaSL7h0o5jr2UUGbMuZRpQag== X-Google-Smtp-Source: AB8JxZqUsrou3bkJTz5eQbyBSsCJzQ29QwYkFlPiCF2M2FWy885wU0LhI1YUpLbhjhGSFq3pYwh9Ow== X-Received: by 2002:adf:a9e6:: with SMTP id b93-v6mr2598630wrd.234.1526514055431; Wed, 16 May 2018 16:40:55 -0700 (PDT) Received: from ycheng2.svl.corp.google.com ([2620:15c:2c4:201:d660:6c0b:8a4f:4c77]) by smtp.gmail.com with ESMTPSA id b11-v6sm4796488wrf.50.2018.05.16.16.40.52 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 16 May 2018 16:40:54 -0700 (PDT) From: Yuchung Cheng To: davem@davemloft.net Cc: netdev@vger.kernel.org, edumazet@google.com, ncardwell@google.com, soheil@google.com, priyarjha@google.com, Yuchung Cheng Subject: [PATCH net-next 8/8] tcp: don't mark recently sent packets lost on RTO Date: Wed, 16 May 2018 16:40:17 -0700 Message-Id: <20180516234017.172775-9-ycheng@google.com> X-Mailer: git-send-email 2.17.0.441.gb46fe60e1d-goog In-Reply-To: <20180516234017.172775-1-ycheng@google.com> References: <20180516234017.172775-1-ycheng@google.com> Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org An RTO 
event indicates the head has not been acked for a long time after its last (re)transmission. But the other packets are not necessarily lost if they were sent only recently (for example, because the sender was application-limited). This patch avoids marking packets sent within the last RTT as lost on an RTO event, using logic similar to TCP RACK loss detection. Normally the head (SND.UNA) would be marked lost since the RTO should fire strictly after the head was sent. An exception is when the most recent RACK RTT measurement is larger than the (previous) RTO. To address this exception the head is always marked lost. Congestion control interaction: since we may not mark every packet lost, the congestion window may be more than 1 (inflight plus 1). But only one packet will be retransmitted after RTO, since tcp_retransmit_timer() calls tcp_retransmit_skb(...,segs=1). The connection still performs slow start from one packet (with Cubic congestion control). This commit was tested in an A/B test with Google web servers, and showed a reduction of 2% in (spurious) retransmits post timeout (SlowStartRetrans), and correspondingly reduced DSACKs (DSACKIgnoredOld) by 7%. Signed-off-by: Yuchung Cheng Signed-off-by: Neal Cardwell Reviewed-by: Eric Dumazet Reviewed-by: Soheil Hassas Yeganeh Reviewed-by: Priyaranjan Jha --- net/ipv4/tcp_input.c | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index ba8a8e3464aa..0bf032839548 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -1929,11 +1929,11 @@ static bool tcp_is_rack(const struct sock *sk) static void tcp_timeout_mark_lost(struct sock *sk) { struct tcp_sock *tp = tcp_sk(sk); - struct sk_buff *skb; + struct sk_buff *skb, *head; bool is_reneg; /* is receiver reneging on SACKs? 
*/ - skb = tcp_rtx_queue_head(sk); - is_reneg = skb && (TCP_SKB_CB(skb)->sacked & TCPCB_SACKED_ACKED); + head = tcp_rtx_queue_head(sk); + is_reneg = head && (TCP_SKB_CB(head)->sacked & TCPCB_SACKED_ACKED); if (is_reneg) { NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPSACKRENEGING); tp->sacked_out = 0; @@ -1943,9 +1943,13 @@ static void tcp_timeout_mark_lost(struct sock *sk) tcp_reset_reno_sack(tp); } + skb = head; skb_rbtree_walk_from(skb) { if (is_reneg) TCP_SKB_CB(skb)->sacked &= ~TCPCB_SACKED_ACKED; + else if (tcp_is_rack(sk) && skb != head && + tcp_rack_skb_timeout(tp, skb, 0) > 0) + continue; /* Don't mark recently sent ones lost yet */ tcp_mark_skb_lost(sk, skb); } tcp_verify_left_out(tp); @@ -1972,7 +1976,7 @@ void tcp_enter_loss(struct sock *sk) tcp_ca_event(sk, CA_EVENT_LOSS); tcp_init_undo(tp); } - tp->snd_cwnd = 1; + tp->snd_cwnd = tcp_packets_in_flight(tp) + 1; tp->snd_cwnd_cnt = 0; tp->snd_cwnd_stamp = tcp_jiffies32;
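
For readers following the series, the RTO marking rule of patch 8/8 can be illustrated outside the kernel with a small userspace model. This is only a sketch under stated assumptions, not the kernel code: struct pkt, pkt_remaining_us() and rto_mark_lost() are hypothetical names, the retransmit queue is modelled as a plain array with q[0] as the head, retransmitted packets are ignored when counting inflight, and SACKed packets are assumed never to be marked lost (the body of tcp_mark_skb_lost() is not shown in this series excerpt).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one skb on the retransmit queue. */
struct pkt {
	uint64_t sent_us;	/* last (re)transmission time, in usec */
	bool sacked;		/* models TCPCB_SACKED_ACKED */
	bool lost;		/* models TCPCB_LOST */
};

/* Same arithmetic as tcp_rack_skb_timeout(), done in 64-bit signed math:
 * time left before the packet is overdue; <= 0 means overdue.
 */
static int64_t pkt_remaining_us(uint32_t rack_rtt_us, uint32_t reo_wnd_us,
				uint64_t now_us, const struct pkt *p)
{
	return (int64_t)rack_rtt_us + (int64_t)reo_wnd_us -
	       (int64_t)(now_us - p->sent_us);
}

/* Models the loop in tcp_timeout_mark_lost(); q[0] is the head (SND.UNA).
 * Returns the resulting cwnd, i.e. inflight + 1.
 */
static uint32_t rto_mark_lost(struct pkt *q, size_t n, bool rack_enabled,
			      uint32_t rack_rtt_us, uint64_t now_us)
{
	uint32_t inflight = 0;
	bool is_reneg;
	size_t i;

	assert(q != NULL || n == 0);
	is_reneg = n > 0 && q[0].sacked;

	for (i = 0; i < n; i++) {
		/* Sent within the last RACK RTT; the head never qualifies. */
		bool recent = !is_reneg && rack_enabled && i != 0 &&
			      pkt_remaining_us(rack_rtt_us, 0, now_us, &q[i]) > 0;

		if (is_reneg)
			q[i].sacked = false;	/* forget SACK state on reneging */
		/* Assumption: SACKed packets are never marked lost. */
		if (!recent && !q[i].sacked)
			q[i].lost = true;
		if (!q[i].sacked && !q[i].lost)
			inflight++;
	}
	return inflight + 1;
}
```

With a RACK RTT of 100 ms and four unSACKed packets sent 300, 250, 50 and 10 ms before the timeout, the model marks the first two lost and leaves the last two in flight, so cwnd becomes 3. With RACK disabled, all four are marked lost and cwnd is 1, matching the behaviour before this patch.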