From patchwork Wed Jan 17 20:11:00 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yuchung Cheng
X-Patchwork-Id: 862589
X-Patchwork-Delegate: davem@davemloft.net
From: Yuchung Cheng
To: davem@davemloft.net
Cc: netdev@vger.kernel.org, edumazet@google.com, ncardwell@google.com,
 soheil@google.com, Yuchung Cheng
Subject: [PATCH 1/2 net-next] tcp: avoid min-RTT overestimation from delayed ACKs
Date: Wed, 17 Jan 2018 12:11:00 -0800
Message-Id: <20180117201101.14137-2-ycheng@google.com>
X-Mailer: git-send-email 2.16.0.rc1.238.g530d649a79-goog
In-Reply-To: <20180117201101.14137-1-ycheng@google.com>
References: <20180117201101.14137-1-ycheng@google.com>
X-Mailing-List: netdev@vger.kernel.org

This patch avoids having the TCP sender or congestion control
overestimate the min RTT by orders of magnitude. This happens when
all the samples in the windowed filter are one-packet transfers,
such as small requests and health-check-like chit-chat, which is fairly
common for applications using persistent connections.
This patch tries to conservatively label and skip RTT samples obtained
from this type of workload.

Signed-off-by: Yuchung Cheng
Signed-off-by: Soheil Hassas Yeganeh
Acked-by: Neal Cardwell
Acked-by: Eric Dumazet
---
 net/ipv4/tcp_input.c | 23 +++++++++++++++++++++--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index ff71b18d9682..2c6797134553 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -97,6 +97,7 @@ int sysctl_tcp_max_orphans __read_mostly = NR_FILE;
 #define FLAG_SACK_RENEGING	0x2000 /* snd_una advanced to a sacked seq */
 #define FLAG_UPDATE_TS_RECENT	0x4000 /* tcp_replace_ts_recent() */
 #define FLAG_NO_CHALLENGE_ACK	0x8000 /* do not call tcp_send_challenge_ack() */
+#define FLAG_ACK_MAYBE_DELAYED	0x10000 /* Likely a delayed ACK */
 
 #define FLAG_ACKED		(FLAG_DATA_ACKED|FLAG_SYN_ACKED)
 #define FLAG_NOT_DUP		(FLAG_DATA|FLAG_WIN_UPDATE|FLAG_ACKED)
@@ -2857,11 +2858,18 @@ static void tcp_fastretrans_alert(struct sock *sk, const u32 prior_snd_una,
 	*rexmit = REXMIT_LOST;
 }
 
-static void tcp_update_rtt_min(struct sock *sk, u32 rtt_us)
+static void tcp_update_rtt_min(struct sock *sk, u32 rtt_us, const int flag)
 {
 	u32 wlen = sock_net(sk)->ipv4.sysctl_tcp_min_rtt_wlen * HZ;
 	struct tcp_sock *tp = tcp_sk(sk);
 
+	if ((flag & FLAG_ACK_MAYBE_DELAYED) && rtt_us > tcp_min_rtt(tp)) {
+		/* If the remote keeps returning delayed ACKs, eventually
+		 * the min filter would pick it up and overestimate the
+		 * prop. delay when it expires. Skip suspected delayed ACKs.
+		 */
+		return;
+	}
 	minmax_running_min(&tp->rtt_min, wlen, tcp_jiffies32,
 			   rtt_us ? : jiffies_to_usecs(1));
 }
@@ -2901,7 +2909,7 @@ static bool tcp_ack_update_rtt(struct sock *sk, const int flag,
 	 * always taken together with ACK, SACK, or TS-opts. Any negative
 	 * values will be skipped with the seq_rtt_us < 0 check above.
 	 */
-	tcp_update_rtt_min(sk, ca_rtt_us);
+	tcp_update_rtt_min(sk, ca_rtt_us, flag);
 	tcp_rtt_estimator(sk, seq_rtt_us);
 	tcp_set_rto(sk);
 
@@ -3125,6 +3133,17 @@ static int tcp_clean_rtx_queue(struct sock *sk, u32 prior_fack,
 	if (likely(first_ackt) && !(flag & FLAG_RETRANS_DATA_ACKED)) {
 		seq_rtt_us = tcp_stamp_us_delta(tp->tcp_mstamp, first_ackt);
 		ca_rtt_us = tcp_stamp_us_delta(tp->tcp_mstamp, last_ackt);
+
+		if (pkts_acked == 1 && last_in_flight < tp->mss_cache &&
+		    last_in_flight && !prior_sacked && fully_acked &&
+		    sack->rate->prior_delivered + 1 == tp->delivered &&
+		    !(flag & (FLAG_CA_ALERT | FLAG_SYN_ACKED))) {
+			/* Conservatively mark a delayed ACK. It's typically
+			 * from a lone runt packet over the round trip to
+			 * a receiver w/o out-of-order or CE events.
+			 */
+			flag |= FLAG_ACK_MAYBE_DELAYED;
+		}
 	}
 	if (sack->first_sackt) {
 		sack_rtt_us = tcp_stamp_us_delta(tp->tcp_mstamp, sack->first_sackt);

From patchwork Wed Jan 17 20:11:01 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yuchung Cheng
X-Patchwork-Id: 862590
X-Patchwork-Delegate: davem@davemloft.net
From: Yuchung Cheng
To: davem@davemloft.net
Cc: netdev@vger.kernel.org, edumazet@google.com, ncardwell@google.com,
 soheil@google.com, Yuchung Cheng
Subject: [PATCH 2/2 net-next] tcp: avoid min RTT bloat by skipping RTT from delayed-ACK in BBR
Date: Wed, 17 Jan 2018 12:11:01 -0800
Message-Id: <20180117201101.14137-3-ycheng@google.com>
X-Mailer: git-send-email 2.16.0.rc1.238.g530d649a79-goog
In-Reply-To: <20180117201101.14137-1-ycheng@google.com>
References: <20180117201101.14137-1-ycheng@google.com>
X-Mailing-List: netdev@vger.kernel.org

A persistent connection may send a tiny amount of data (e.g. health-check)
for a long period of time. BBR's windowed min RTT filter may then see only
RTT samples from delayed ACKs, causing BBR to grossly over-estimate the
path delay, depending on how long the ACK was delayed at the receiver.

This patch skips RTT samples that are likely coming from delayed ACKs. Note
that it is possible the sender never obtains a valid measure to set the
min RTT. In this case BBR will continue to set cwnd to the initial window,
which seems fine because the connection is a thin stream.

Signed-off-by: Yuchung Cheng
Acked-by: Neal Cardwell
Acked-by: Soheil Hassas Yeganeh
Acked-by: Priyaranjan Jha
---
 include/net/tcp.h    | 1 +
 net/ipv4/tcp_bbr.c   | 3 ++-
 net/ipv4/tcp_input.c | 1 +
 3 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 6939e69d3c37..5a1d26a18599 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -953,6 +953,7 @@ struct rate_sample {
 	u32  prior_in_flight;	/* in flight before this ACK */
 	bool is_app_limited;	/* is sample from packet with bubble in pipe? */
 	bool is_retrans;	/* is sample from retransmission? */
+	bool is_ack_delayed;	/* is this (likely) a delayed ACK? */
 };
 
 struct tcp_congestion_ops {
diff --git a/net/ipv4/tcp_bbr.c b/net/ipv4/tcp_bbr.c
index 8322f26e770e..785712be5b0d 100644
--- a/net/ipv4/tcp_bbr.c
+++ b/net/ipv4/tcp_bbr.c
@@ -766,7 +766,8 @@ static void bbr_update_min_rtt(struct sock *sk, const struct rate_sample *rs)
 	filter_expired = after(tcp_jiffies32,
 			       bbr->min_rtt_stamp + bbr_min_rtt_win_sec * HZ);
 	if (rs->rtt_us >= 0 &&
-	    (rs->rtt_us <= bbr->min_rtt_us || filter_expired)) {
+	    (rs->rtt_us <= bbr->min_rtt_us ||
+	     (filter_expired && !rs->is_ack_delayed))) {
 		bbr->min_rtt_us = rs->rtt_us;
 		bbr->min_rtt_stamp = tcp_jiffies32;
 	}
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 2c6797134553..cfa51cfd2d99 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3633,6 +3633,7 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
 
 	delivered = tp->delivered - delivered;	/* freshly ACKed or SACKed */
 	lost = tp->lost - lost;			/* freshly marked lost */
+	rs.is_ack_delayed = !!(flag & FLAG_ACK_MAYBE_DELAYED);
 	tcp_rate_gen(sk, delivered, lost, is_sack_reneg, sack_state.rate);
 	tcp_cong_control(sk, ack, delivered, flag, sack_state.rate);
 	tcp_xmit_recovery(sk, rexmit);
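
Illustration (not part of the patch): the min RTT rule in the BBR hunk can
be modelled in userspace roughly as below. This is only a sketch under
stated assumptions. The types struct rtt_sample and struct min_rtt_filter
and the function min_rtt_update() are hypothetical names invented for this
example, time is a plain tick counter standing in for jiffies, and none of
it is kernel code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two rate_sample fields the rule reads. */
struct rtt_sample {
	int64_t rtt_us;       /* measured RTT in usec, negative if invalid */
	bool is_ack_delayed;  /* sample is suspected to include a delayed-ACK wait */
};

/* Hypothetical stand-in for BBR's min RTT state. */
struct min_rtt_filter {
	uint32_t min_rtt_us;  /* current min RTT estimate */
	uint32_t stamp;       /* tick at which the estimate was last refreshed */
	uint32_t win_ticks;   /* window length in ticks */
};

/*
 * Apply one sample. A sample at or below the current estimate is always
 * accepted. A higher sample may replace the estimate only once the window
 * has expired, and only if it is not a suspected delayed ACK.
 * Returns true if the estimate was updated.
 */
static bool min_rtt_update(struct min_rtt_filter *f,
			   const struct rtt_sample *s, uint32_t now)
{
	/* Wrap-safe "now is after stamp + window" comparison. */
	bool expired = (int32_t)(now - (f->stamp + f->win_ticks)) > 0;

	if (s->rtt_us >= 0 &&
	    (s->rtt_us <= (int64_t)f->min_rtt_us ||
	     (expired && !s->is_ack_delayed))) {
		f->min_rtt_us = (uint32_t)s->rtt_us;
		f->stamp = now;
		return true;
	}
	return false;
}
```

With an estimate of 10000 usec and an expired window, a 50000 usec sample
flagged as delayed leaves the estimate alone, while an unflagged 12000 usec
sample replaces it. A flagged sample that is lower than the current estimate
is still accepted, which matches the first branch of the condition in the
patch.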