From patchwork Wed Jan 17 20:11:00 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yuchung Cheng
X-Patchwork-Id: 862589
X-Patchwork-Delegate: davem@davemloft.net
From: Yuchung Cheng
To: davem@davemloft.net
Cc: netdev@vger.kernel.org, edumazet@google.com, ncardwell@google.com,
 soheil@google.com, Yuchung Cheng
Subject: [PATCH 1/2 net-next] tcp: avoid min-RTT overestimation from delayed ACKs
Date: Wed, 17 Jan 2018 12:11:00 -0800
Message-Id: <20180117201101.14137-2-ycheng@google.com>
X-Mailer: git-send-email 2.16.0.rc1.238.g530d649a79-goog
In-Reply-To: <20180117201101.14137-1-ycheng@google.com>
References: <20180117201101.14137-1-ycheng@google.com>
Sender: netdev-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: netdev@vger.kernel.org

This patch avoids having the TCP sender or congestion control
overestimate the min RTT by orders of magnitude. This happens when
all the samples in the windowed filter come from one-packet transfers,
such as small requests and health-check-like chit-chat, which is fairly
common for applications using persistent connections.
This patch tries to conservatively label and skip RTT samples obtained
from this type of workload.

Signed-off-by: Yuchung Cheng
Signed-off-by: Soheil Hassas Yeganeh
Acked-by: Neal Cardwell
Acked-by: Eric Dumazet
---
 net/ipv4/tcp_input.c | 23 +++++++++++++++++++++--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index ff71b18d9682..2c6797134553 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -97,6 +97,7 @@ int sysctl_tcp_max_orphans __read_mostly = NR_FILE;
 #define FLAG_SACK_RENEGING	0x2000 /* snd_una advanced to a sacked seq */
 #define FLAG_UPDATE_TS_RECENT	0x4000 /* tcp_replace_ts_recent() */
 #define FLAG_NO_CHALLENGE_ACK	0x8000 /* do not call tcp_send_challenge_ack() */
+#define FLAG_ACK_MAYBE_DELAYED	0x10000 /* Likely a delayed ACK */
 
 #define FLAG_ACKED		(FLAG_DATA_ACKED|FLAG_SYN_ACKED)
 #define FLAG_NOT_DUP		(FLAG_DATA|FLAG_WIN_UPDATE|FLAG_ACKED)
@@ -2857,11 +2858,18 @@ static void tcp_fastretrans_alert(struct sock *sk, const u32 prior_snd_una,
 	*rexmit = REXMIT_LOST;
 }
 
-static void tcp_update_rtt_min(struct sock *sk, u32 rtt_us)
+static void tcp_update_rtt_min(struct sock *sk, u32 rtt_us, const int flag)
 {
 	u32 wlen = sock_net(sk)->ipv4.sysctl_tcp_min_rtt_wlen * HZ;
 	struct tcp_sock *tp = tcp_sk(sk);
 
+	if ((flag & FLAG_ACK_MAYBE_DELAYED) && rtt_us > tcp_min_rtt(tp)) {
+		/* If the remote keeps returning delayed ACKs, eventually
+		 * the min filter would pick it up and overestimate the
+		 * prop. delay when it expires. Skip suspected delayed ACKs.
+		 */
+		return;
+	}
 	minmax_running_min(&tp->rtt_min, wlen, tcp_jiffies32,
 			   rtt_us ? : jiffies_to_usecs(1));
 }
@@ -2901,7 +2909,7 @@ static bool tcp_ack_update_rtt(struct sock *sk, const int flag,
 	 * always taken together with ACK, SACK, or TS-opts. Any negative
 	 * values will be skipped with the seq_rtt_us < 0 check above.
 	 */
-	tcp_update_rtt_min(sk, ca_rtt_us);
+	tcp_update_rtt_min(sk, ca_rtt_us, flag);
 	tcp_rtt_estimator(sk, seq_rtt_us);
 	tcp_set_rto(sk);
 
@@ -3125,6 +3133,17 @@ static int tcp_clean_rtx_queue(struct sock *sk, u32 prior_fack,
 	if (likely(first_ackt) && !(flag & FLAG_RETRANS_DATA_ACKED)) {
 		seq_rtt_us = tcp_stamp_us_delta(tp->tcp_mstamp, first_ackt);
 		ca_rtt_us = tcp_stamp_us_delta(tp->tcp_mstamp, last_ackt);
+
+		if (pkts_acked == 1 && last_in_flight < tp->mss_cache &&
+		    last_in_flight && !prior_sacked && fully_acked &&
+		    sack->rate->prior_delivered + 1 == tp->delivered &&
+		    !(flag & (FLAG_CA_ALERT | FLAG_SYN_ACKED))) {
+			/* Conservatively mark a delayed ACK. It's typically
+			 * from a lone runt packet over the round trip to
+			 * a receiver w/o out-of-order or CE events.
+			 */
+			flag |= FLAG_ACK_MAYBE_DELAYED;
+		}
 	}
 	if (sack->first_sackt) {
 		sack_rtt_us = tcp_stamp_us_delta(tp->tcp_mstamp, sack->first_sackt);
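
Illustrative note, not part of the patch: the two pieces of logic above
(marking a suspected delayed ACK, then refusing to let such a sample raise
the min RTT) can be modeled in userspace roughly as follows. This is only a
sketch under assumptions. struct ack_sample, struct min_filter,
SKETCH_ACK_MAYBE_DELAYED, mark_maybe_delayed() and update_rtt_min() are
hypothetical names invented for the example, and the single-value filter is
a deliberate simplification of the kernel's windowed minmax filter.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bit; mirrors FLAG_ACK_MAYBE_DELAYED from the patch. */
#define SKETCH_ACK_MAYBE_DELAYED 0x10000

/* Hypothetical per-ACK summary of the conditions the patch tests. */
struct ack_sample {
	uint32_t pkts_acked;      /* packets newly acked by this ACK */
	uint32_t last_in_flight;  /* bytes in flight when the acked packet was sent */
	uint32_t mss;             /* current MSS in bytes */
	bool     prior_sacked;    /* anything was SACKed before this ACK */
	bool     fully_acked;     /* the packet was acked in full */
	uint32_t delivered_delta; /* packets delivered by this ACK */
	bool     ca_alert_or_syn; /* congestion alert or SYN acked */
};

/* Return the flag when the ACK looks like a delayed ACK for a lone runt packet. */
static int mark_maybe_delayed(const struct ack_sample *s)
{
	if (s->pkts_acked == 1 &&
	    s->last_in_flight && s->last_in_flight < s->mss &&
	    !s->prior_sacked && s->fully_acked &&
	    s->delivered_delta == 1 && !s->ca_alert_or_syn)
		return SKETCH_ACK_MAYBE_DELAYED;
	return 0;
}

/* Simplified min filter: one value plus the time it was recorded. */
struct min_filter {
	uint32_t min_us;
	uint32_t stamp;
};

/* Feed one RTT sample; suspected delayed ACKs may lower the min but never raise it. */
static void update_rtt_min(struct min_filter *f, uint32_t win, uint32_t now,
			   uint32_t rtt_us, int flag)
{
	if ((flag & SKETCH_ACK_MAYBE_DELAYED) && rtt_us > f->min_us)
		return;
	/* Accept a new minimum, or any sample once the old one has expired. */
	if (rtt_us <= f->min_us || now - f->stamp > win) {
		f->min_us = rtt_us;
		f->stamp = now;
	}
}
```

In this model, a 41 ms sample arriving after the window has expired replaces
a 1 ms minimum when it is unflagged, but is ignored when flagged. That is
the overestimation the patch is meant to avoid.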