From patchwork Fri Oct 23 20:50:12 2015
X-Patchwork-Submitter: Bendik Rønning Opstad
X-Patchwork-Id: 535260
X-Patchwork-Delegate: davem@davemloft.net
From: Bendik Rønning Opstad
To: "David S. Miller", Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI,
	Patrick McHardy, Jonathan Corbet
Cc: Eric Dumazet, Neal Cardwell, Tom Herbert, Yuchung Cheng, Paolo Abeni,
	Erik Kline, Hannes Frederic Sowa, Al Viro, Jiri Pirko,
	Alexander Duyck, Florian Westphal, Daniel Lee,
	Marcelo Ricardo Leitner, Daniel Borkmann, Willem de Bruijn,
	Linus Lüssing, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	linux-api@vger.kernel.org, Andreas Petlund, Carsten Griwodz,
	Pål Halvorsen, Jonas Markussen, Kristian Evensen,
	Kenneth Klette Jonassen, Bendik Rønning Opstad
Subject: [PATCH RFC net-next 1/2] tcp: Add DPIFL thin stream detection mechanism
Date: Fri, 23 Oct 2015 22:50:12 +0200
Message-Id: <1445633413-3532-2-git-send-email-bro.devel+kernel@gmail.com>
X-Mailer: git-send-email 1.9.1
In-Reply-To: <1445633413-3532-1-git-send-email-bro.devel+kernel@gmail.com>
References: <1445633413-3532-1-git-send-email-bro.devel+kernel@gmail.com>
X-Mailing-List: netdev@vger.kernel.org

The existing mechanism for detecting thin streams (tcp_stream_is_thin)
is based on a static limit of less than 4 packets in flight. This treats
streams differently depending on the connection's RTT, such that a stream
on a high RTT link may never be considered thin, whereas the same
application would produce a stream that would always be thin in a low RTT
scenario (e.g. data center).

By calculating a dynamic packets in flight limit (DPIFL), the thin stream
detection will be independent of the RTT and treat streams equally based
on the transmission pattern, i.e.
the inter-transmission time (ITT).

Cc: Andreas Petlund
Cc: Carsten Griwodz
Cc: Pål Halvorsen
Cc: Jonas Markussen
Cc: Kristian Evensen
Cc: Kenneth Klette Jonassen
Signed-off-by: Bendik Rønning Opstad
---
 Documentation/networking/ip-sysctl.txt |  8 ++++++++
 include/linux/tcp.h                    |  6 ++++++
 include/net/tcp.h                      | 20 ++++++++++++++++++++
 net/ipv4/sysctl_net_ipv4.c             |  9 +++++++++
 net/ipv4/tcp.c                         |  3 +++
 5 files changed, 46 insertions(+)

diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index 85752c8..b841a76 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -700,6 +700,14 @@ tcp_thin_dupack - BOOLEAN
 	Documentation/networking/tcp-thin.txt
 	Default: 0
 
+tcp_thin_dpifl_itt_lower_bound - INTEGER
+	Controls the lower bound for ITT (inter-transmission time) threshold
+	for when a stream is considered thin. The value is specified in
+	microseconds, and may not be lower than 10000 (10 ms). This threshold
+	is used to calculate a dynamic packets in flight limit (DPIFL) which
+	is used to classify whether a stream is thin.
+	Default: 10000
+
 tcp_limit_output_bytes - INTEGER
 	Controls TCP Small Queue limit per tcp socket.
 	TCP bulk sender tends to increase packets in flight until it
diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index c906f45..fc885db 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -269,6 +269,12 @@ struct tcp_sock {
 	struct sk_buff* lost_skb_hint;
 	struct sk_buff *retransmit_skb_hint;
 
+	/* The limit used to identify when a stream is thin based on a minimum
+	 * allowed inter-transmission time (ITT) in microseconds. This is used
+	 * to dynamically calculate a max packets in flight limit (DPIFL).
+	 */
+	int thin_dpifl_itt_lower_bound;
+
 	/* OOO segments go in this list. Note that socket lock must be held,
 	 * as we do not use sk_buff_head lock.
 	 */
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 4fc457b..6534836 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -215,6 +215,7 @@ void tcp_time_wait(struct sock *sk, int state, int timeo);
 
 /* TCP thin-stream limits */
 #define TCP_THIN_LINEAR_RETRIES 6       /* After 6 linear retries, do exp. backoff */
+#define TCP_THIN_DPIFL_ITT_LOWER_BOUND_MIN 10000  /* Minimum lower bound is 10 ms (10000 usec) */
 
 /* TCP initial congestion window as per draft-hkchu-tcpm-initcwnd-01 */
 #define TCP_INIT_CWND		10
@@ -274,6 +275,7 @@ extern int sysctl_tcp_workaround_signed_windows;
 extern int sysctl_tcp_slow_start_after_idle;
 extern int sysctl_tcp_thin_linear_timeouts;
 extern int sysctl_tcp_thin_dupack;
+extern int sysctl_tcp_thin_dpifl_itt_lower_bound;
 extern int sysctl_tcp_early_retrans;
 extern int sysctl_tcp_limit_output_bytes;
 extern int sysctl_tcp_challenge_ack_limit;
@@ -1631,6 +1633,24 @@ static inline bool tcp_stream_is_thin(struct tcp_sock *tp)
 	return tp->packets_out < 4 && !tcp_in_initial_slowstart(tp);
 }
 
+/**
+ * tcp_stream_is_thin_dpifl() - Tests if the stream is thin based on dynamic PIF
+ *                              limit
+ * @tp: the tcp_sock struct
+ *
+ * Return: true if current packets in flight (PIF) count is lower than
+ *         the dynamic PIF limit, else false
+ */
+static inline bool tcp_stream_is_thin_dpifl(const struct tcp_sock *tp)
+{
+	u64 dpif_lim = tp->srtt_us >> 3;
+	/* Divide by thin_dpifl_itt_lower_bound, the minimum allowed ITT
+	 * (Inter-transmission time) in usecs.
+	 */
+	do_div(dpif_lim, tp->thin_dpifl_itt_lower_bound);
+	return tcp_packets_in_flight(tp) < dpif_lim;
+}
+
 /* /proc */
 enum tcp_seq_states {
 	TCP_SEQ_STATE_LISTENING,
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index 25300c5..917fdde 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -42,6 +42,7 @@ static int tcp_syn_retries_min = 1;
 static int tcp_syn_retries_max = MAX_TCP_SYNCNT;
 static int ip_ping_group_range_min[] = { 0, 0 };
 static int ip_ping_group_range_max[] = { GID_T_MAX, GID_T_MAX };
+static int tcp_thin_dpifl_itt_lower_bound_min = TCP_THIN_DPIFL_ITT_LOWER_BOUND_MIN;
 
 /* Update system visible IP port range */
 static void set_local_port_range(struct net *net, int range[2])
@@ -709,6 +710,14 @@ static struct ctl_table ipv4_table[] = {
 		.proc_handler	= proc_dointvec
 	},
 	{
+		.procname	= "tcp_thin_dpifl_itt_lower_bound",
+		.data		= &sysctl_tcp_thin_dpifl_itt_lower_bound,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= &proc_dointvec_minmax,
+		.extra1		= &tcp_thin_dpifl_itt_lower_bound_min,
+	},
+	{
 		.procname	= "tcp_early_retrans",
 		.data		= &sysctl_tcp_early_retrans,
 		.maxlen		= sizeof(int),
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 0cfa7c0..f712d7c 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -287,6 +287,8 @@ int sysctl_tcp_min_tso_segs __read_mostly = 2;
 
 int sysctl_tcp_autocorking __read_mostly = 1;
 
+int sysctl_tcp_thin_dpifl_itt_lower_bound __read_mostly = TCP_THIN_DPIFL_ITT_LOWER_BOUND_MIN;
+
 struct percpu_counter tcp_orphan_count;
 EXPORT_SYMBOL_GPL(tcp_orphan_count);
 
@@ -406,6 +408,7 @@ void tcp_init_sock(struct sock *sk)
 	u64_stats_init(&tp->syncp);
 
 	tp->reordering = sysctl_tcp_reordering;
+	tp->thin_dpifl_itt_lower_bound = sysctl_tcp_thin_dpifl_itt_lower_bound;
 	tcp_enable_early_retrans(tp);
 	tcp_assign_congestion_control(sk);
 
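
For illustration only, the DPIFL calculation in tcp_stream_is_thin_dpifl()
can be mirrored in user space to see how the limit scales with RTT. This is
a minimal sketch and not part of the patch: struct dpifl_state and the
helper names below are hypothetical stand-ins for the tcp_sock fields the
patch reads, and the static check omits the !tcp_in_initial_slowstart()
condition of tcp_stream_is_thin(). It assumes, as in the kernel, that
srtt_us holds the smoothed RTT in usecs left-shifted by 3.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-space stand-in for the tcp_sock fields used by the
 * patch; this is not the kernel structure.
 */
struct dpifl_state {
	uint32_t srtt_us;            /* smoothed RTT << 3, in usecs */
	uint32_t packets_in_flight;  /* current packets in flight (PIF) */
	int32_t itt_lower_bound_us;  /* minimum ITT in usecs, >= 10000 */
};

/* Dynamic PIF limit: smoothed RTT divided by the ITT lower bound,
 * rounded down (plain division replaces the kernel's do_div()).
 */
static inline uint64_t dpifl_limit(const struct dpifl_state *s)
{
	uint64_t srtt = s->srtt_us >> 3;

	return srtt / (uint64_t)s->itt_lower_bound_us;
}

/* Thin if the PIF count is strictly below the dynamic limit. */
static inline bool stream_is_thin_dpifl(const struct dpifl_state *s)
{
	return s->packets_in_flight < dpifl_limit(s);
}

/* Simplified static check (fewer than 4 packets), for comparison only. */
static inline bool stream_is_thin_static(const struct dpifl_state *s)
{
	return s->packets_in_flight < 4;
}
```

With the default lower bound of 10000 usecs, a 100 ms RTT gives a limit of
10 packets in flight, so a stream with 6 packets in flight counts as thin
here but not under the static limit. Because the division rounds down, an
RTT below the lower bound gives a limit of 0, and this sketch then never
classifies the stream as thin.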