From patchwork Tue Oct 22 23:10:49 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Cong Wang
X-Patchwork-Id: 1181702
X-Patchwork-Delegate: davem@davemloft.net
Return-Path:
X-Original-To: patchwork-incoming-netdev@ozlabs.org
Delivered-To: patchwork-incoming-netdev@ozlabs.org
Authentication-Results: ozlabs.org; spf=none (no SPF record)
        smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67;
        helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org;
        receiver=)
Authentication-Results: ozlabs.org;
        dmarc=pass (p=none dis=none) header.from=gmail.com
Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected)
        header.d=gmail.com header.i=@gmail.com header.b="ZL4C6hDz";
        dkim-atps=neutral
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
        by ozlabs.org (Postfix) with ESMTP id 46yTmv67wxz9sPV
        for ; Wed, 23 Oct 2019 10:12:31 +1100 (AEDT)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S2389624AbfJVXLQ (ORCPT ); Tue, 22 Oct 2019 19:11:16 -0400
Received: from mail-pg1-f195.google.com ([209.85.215.195]:38028 "EHLO
        mail-pg1-f195.google.com" rhost-flags-OK-OK-OK-OK)
        by vger.kernel.org with ESMTP id S1728076AbfJVXLQ (ORCPT );
        Tue, 22 Oct 2019 19:11:16 -0400
Received: by mail-pg1-f195.google.com with SMTP id w3so10883759pgt.5
        for ; Tue, 22 Oct 2019 16:11:15 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=20161025;
        h=from:to:cc:subject:date:message-id:in-reply-to:references
        :mime-version:content-transfer-encoding;
        bh=HqYClyx4AWhlD7zebpm8wQ+3M2kiFr4SH53W39ZQQmQ=;
        b=ZL4C6hDztk1gj/qkRQT9QZxgSk1aKpst2mHHcumYKojWJqHh6zLfjTZ5LXzzwK6KYV
        STvUAc0ZUCJrnfZj6prtcAZZaW/obp34l4M1lJRyQYiWWBHGDgCmNuzsM2SVU7zVO0D5
        kTyfmf2Xwsk+j9b1XU5/UnwT2RI1YDOd/Uc8JbL5rBbgWX01wMprsBw1p3dLF1IWhTR8
        bATR/PoWtogrcfQplEUeo+5NSXJ7eGdoK6ch5nDOphbPded/pRszkEMxh5NK04S8MPco
        KOBk8XsWlWwXq3fMZHUwp+7HfGMmasA8j1hJ5mTs2eujCdpn6jGB78lE37bLUO6KJKfK
        U58Q==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20161025;
        h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to
        :references:mime-version:content-transfer-encoding;
        bh=HqYClyx4AWhlD7zebpm8wQ+3M2kiFr4SH53W39ZQQmQ=;
        b=rC2BN67xCAqmcsksWUYP0TWW6N2wcV6PnvwVZDFDzNlNtYXX3qQQIiUt9RDZa/HTZI
        XysmQo9NmY4uQae0XSSD3dNl3DoZai03mIKdu1c7NALiIZ+qjoB85L+c9MKaAn8QCUpv
        BwMnFfx07ORKqf03JR/EA1x8MmWF6aUtLc9JNskfmcRTXW/ZG5sxAusde9euu5TYETUo
        TQZHI21xOMe6i25e+r82dL3I+G8Trkt+CQ+E+WccL9J4kgkTZf8YrbZbdGMxVRy0nVnc
        wvwMcyCnvCenD9NtaG/ciEH1uZebn7cn4783VS/ynxNsPp71fv79NrEwvXkKNKwPhvEk
        WnzQ==
X-Gm-Message-State: APjAAAVJZghD6n6Y12/kfRhCn0TZ24Yv+gn3R1ky+FL9gsi/buXnKyMp
        F2Ygt4d33QKCcX696tDKfnkOpUZm
X-Google-Smtp-Source: APXvYqxxzjkvGW+tlWJgvfF7RA9EL3a3xYTgywzVqZ9GAaILi5UNnTkVcWaqzxD5u0v8xW0PhhPcyg==
X-Received: by 2002:a17:90b:d90:: with SMTP id
        bg16mr7704318pjb.143.1571785875312;
        Tue, 22 Oct 2019 16:11:15 -0700 (PDT)
Received: from tw-172-25-31-76.office.twttr.net ([8.25.197.24])
        by smtp.gmail.com with ESMTPSA id
        j24sm20619284pff.71.2019.10.22.16.11.14
        (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
        Tue, 22 Oct 2019 16:11:14 -0700 (PDT)
From: Cong Wang
To: netdev@vger.kernel.org
Cc: ycheng@google.com, ncardwell@google.com, edumazet@google.com,
        Cong Wang
Subject: [Patch net-next 1/3] tcp: get rid of ICSK_TIME_EARLY_RETRANS
Date: Tue, 22 Oct 2019 16:10:49 -0700
Message-Id: <20191022231051.30770-2-xiyou.wangcong@gmail.com>
X-Mailer: git-send-email 2.21.0
In-Reply-To: <20191022231051.30770-1-xiyou.wangcong@gmail.com>
References: <20191022231051.30770-1-xiyou.wangcong@gmail.com>
MIME-Version: 1.0
Sender: netdev-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: netdev@vger.kernel.org

After commit bec41a11dd3d ("tcp: remove early retransmit")
ICSK_TIME_EARLY_RETRANS is no longer effective, so we can remove
its definition too.

Cc: Yuchung Cheng
Cc: Eric Dumazet
Signed-off-by: Cong Wang
---
 include/net/inet_connection_sock.h | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h
index 895546058a20..e46958460739 100644
--- a/include/net/inet_connection_sock.h
+++ b/include/net/inet_connection_sock.h
@@ -142,9 +142,8 @@ struct inet_connection_sock {
 #define ICSK_TIME_RETRANS	1	/* Retransmit timer */
 #define ICSK_TIME_DACK		2	/* Delayed ack timer */
 #define ICSK_TIME_PROBE0	3	/* Zero window probe timer */
-#define ICSK_TIME_EARLY_RETRANS 4	/* Early retransmit timer */
-#define ICSK_TIME_LOSS_PROBE	5	/* Tail loss probe timer */
-#define ICSK_TIME_REO_TIMEOUT	6	/* Reordering timer */
+#define ICSK_TIME_LOSS_PROBE	4	/* Tail loss probe timer */
+#define ICSK_TIME_REO_TIMEOUT	5	/* Reordering timer */
 
 static inline struct inet_connection_sock *inet_csk(const struct sock *sk)
 {
@@ -227,8 +226,7 @@ static inline void inet_csk_reset_xmit_timer(struct sock *sk, const int what,
 	}
 
 	if (what == ICSK_TIME_RETRANS || what == ICSK_TIME_PROBE0 ||
-	    what == ICSK_TIME_EARLY_RETRANS || what == ICSK_TIME_LOSS_PROBE ||
-	    what == ICSK_TIME_REO_TIMEOUT) {
+	    what == ICSK_TIME_LOSS_PROBE || what == ICSK_TIME_REO_TIMEOUT) {
 		icsk->icsk_pending = what;
 		icsk->icsk_timeout = jiffies + when;
 		sk_reset_timer(sk, &icsk->icsk_retransmit_timer, icsk->icsk_timeout);

From patchwork Tue Oct 22 23:10:50 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Cong Wang
X-Patchwork-Id: 1181704
X-Patchwork-Delegate: davem@davemloft.net
Return-Path:
X-Original-To: patchwork-incoming-netdev@ozlabs.org
Delivered-To: patchwork-incoming-netdev@ozlabs.org
Authentication-Results: ozlabs.org; spf=none (no SPF record)
        smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67;
        helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org;
        receiver=)
Authentication-Results: ozlabs.org;
        dmarc=pass (p=none dis=none) header.from=gmail.com
Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected)
        header.d=gmail.com header.i=@gmail.com header.b="ciZl7tXF";
        dkim-atps=neutral
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
        by ozlabs.org (Postfix) with ESMTP id 46yTmx0jhbz9sPc
        for ; Wed, 23 Oct 2019 10:12:33 +1100 (AEDT)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S2389651AbfJVXLT (ORCPT ); Tue, 22 Oct 2019 19:11:19 -0400
Received: from mail-pf1-f194.google.com ([209.85.210.194]:42227 "EHLO
        mail-pf1-f194.google.com" rhost-flags-OK-OK-OK-OK)
        by vger.kernel.org with ESMTP id S2389647AbfJVXLR (ORCPT );
        Tue, 22 Oct 2019 19:11:17 -0400
Received: by mail-pf1-f194.google.com with SMTP id 21so350242pfj.9
        for ; Tue, 22 Oct 2019 16:11:16 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=20161025;
        h=from:to:cc:subject:date:message-id:in-reply-to:references
        :mime-version:content-transfer-encoding;
        bh=P2c/8plpwMSofPQHA6q0oGffHZXYNWk1bB7ctoUuHgE=;
        b=ciZl7tXFeXTGBFSw3CTUlwWi84eAllahDryhi2Vzg6687TNqgat7ZvwOpAIRfUZz72
        wwd0Vi4pvUtlNzBBR4wgdUpGxLEfCwRjAeLAQpq46loyQEh5TEjRiGWDGOV0Ye3pyO/y
        1AOOpTZ1sPK0ecZxsgD7E3ILJJLdriJphQukwNx3QpJ77tv7q7pjiLyP9ZJEu97rf1Vt
        YtZSLU5+NyTdzSLCejIvMY3CkM5bftNfWEgfhkKOv070/w1Tku38iylPKqqHm8nFu9zI
        +PdLVs7yWHlxqaou9+9I/K6LLZezqHC6/GuIu6hIzLXKO8GicBDPAAr0N4oQwibTv6ao
        lm9w==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20161025;
        h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to
        :references:mime-version:content-transfer-encoding;
        bh=P2c/8plpwMSofPQHA6q0oGffHZXYNWk1bB7ctoUuHgE=;
        b=s+kXzN17qPeJSCRElgz0nV1IJX31jRiM3LF/0YQlM0/lm3j7q4M0/FbS6s/3K7CwkG
        nUhty9mmGJhhrFnaD35GcMpK4iNRfeU+M3hodEOSP6SItGTapO/R4dnHe94NucuUn8eZ
        HYOlycxyWKKpo3zzjwWjzann83cRidNxYG5sgoeV5kRtPGCV7cFQEYHsM8IXhsRkgxh+
        frYlwackfJXbXjUX4Qbs2t5u9pUYOgsfbOqg844rr/Ru5j0rOf8dhN2yAgZz/sGNWHm5
        AbzYientl3WcpDg2dcc/72GXkcilQMiaEfrX/4V7ao8f1Wyukfh8terKiLKED5/3X3+S
        5MjQ==
X-Gm-Message-State: APjAAAUdNzPO/0Yf1zn+yPR9e4mXLBbqq8mNnxtYyYHVr+hfhMYMpa/R
        hWKY71/uzBvUykUCQlcuo2dF1oA+
X-Google-Smtp-Source: APXvYqy2aAZMGUoutyA6wPtcH35sJn+cfJA08EfyE3sP66ksEpmjXfr7UvBcA6D+0K/0+2+hwUTc1w==
X-Received: by 2002:a62:37c7:: with SMTP id
        e190mr7355670pfa.130.1571785876260;
        Tue, 22 Oct 2019 16:11:16 -0700 (PDT)
Received: from tw-172-25-31-76.office.twttr.net ([8.25.197.24])
        by smtp.gmail.com with ESMTPSA id
        j24sm20619284pff.71.2019.10.22.16.11.15
        (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
        Tue, 22 Oct 2019 16:11:15 -0700 (PDT)
From: Cong Wang
To: netdev@vger.kernel.org
Cc: ycheng@google.com, ncardwell@google.com, edumazet@google.com,
        Cong Wang
Subject: [Patch net-next 2/3] tcp: make tcp_send_loss_probe() boolean
Date: Tue, 22 Oct 2019 16:10:50 -0700
Message-Id: <20191022231051.30770-3-xiyou.wangcong@gmail.com>
X-Mailer: git-send-email 2.21.0
In-Reply-To: <20191022231051.30770-1-xiyou.wangcong@gmail.com>
References: <20191022231051.30770-1-xiyou.wangcong@gmail.com>
MIME-Version: 1.0
Sender: netdev-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: netdev@vger.kernel.org

Let tcp_send_loss_probe() return whether a TLP has been sent or not.
This is needed by the following patch.

Cc: Eric Dumazet
Signed-off-by: Cong Wang
---
 include/net/tcp.h     | 2 +-
 net/ipv4/tcp_output.c | 7 +++++--
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index ab4eb5eb5d07..0ee5400e751c 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -581,7 +581,7 @@ void tcp_push_one(struct sock *, unsigned int mss_now);
 void __tcp_send_ack(struct sock *sk, u32 rcv_nxt);
 void tcp_send_ack(struct sock *sk);
 void tcp_send_delayed_ack(struct sock *sk);
-void tcp_send_loss_probe(struct sock *sk);
+bool tcp_send_loss_probe(struct sock *sk);
 bool tcp_schedule_loss_probe(struct sock *sk, bool advancing_rto);
 void tcp_skb_collapse_tstamp(struct sk_buff *skb,
 			     const struct sk_buff *next_skb);
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 0488607c5cd3..9822820edca4 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2539,12 +2539,13 @@ static bool skb_still_in_host_queue(const struct sock *sk,
 /* When probe timeout (PTO) fires, try send a new segment if possible, else
  * retransmit the last segment.
  */
-void tcp_send_loss_probe(struct sock *sk)
+bool tcp_send_loss_probe(struct sock *sk)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 	struct sk_buff *skb;
 	int pcount;
 	int mss = tcp_current_mss(sk);
+	bool sent = false;
 
 	skb = tcp_send_head(sk);
 	if (skb && tcp_snd_wnd_test(tp, skb, mss)) {
@@ -2560,7 +2561,7 @@ void tcp_send_loss_probe(struct sock *sk)
 			  "invalid inflight: %u state %u cwnd %u mss %d\n",
 			  tp->packets_out, sk->sk_state, tp->snd_cwnd, mss);
 		inet_csk(sk)->icsk_pending = 0;
-		return;
+		return false;
 	}
 
 	/* At most one outstanding TLP retransmission. */
@@ -2592,11 +2593,13 @@ void tcp_send_loss_probe(struct sock *sk)
 	tp->tlp_high_seq = tp->snd_nxt;
 
 probe_sent:
+	sent = true;
 	NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPLOSSPROBES);
 	/* Reset s.t. tcp_rearm_rto will restart timer from now */
 	inet_csk(sk)->icsk_pending = 0;
 rearm_timer:
 	tcp_rearm_rto(sk);
+	return sent;
 }
 
 /* Push out any pending frames which were held back due to

From patchwork Tue Oct 22 23:10:51 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Cong Wang
X-Patchwork-Id: 1181705
X-Patchwork-Delegate: davem@davemloft.net
Return-Path:
X-Original-To: patchwork-incoming-netdev@ozlabs.org
Delivered-To: patchwork-incoming-netdev@ozlabs.org
Authentication-Results: ozlabs.org; spf=none (no SPF record)
        smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67;
        helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org;
        receiver=)
Authentication-Results: ozlabs.org;
        dmarc=pass (p=none dis=none) header.from=gmail.com
Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected)
        header.d=gmail.com header.i=@gmail.com header.b="G14Lj8hb";
        dkim-atps=neutral
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
        by ozlabs.org (Postfix) with ESMTP id 46yTmx4vV8z9sPV
        for ; Wed, 23 Oct 2019 10:12:33 +1100 (AEDT)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S2389660AbfJVXLU (ORCPT ); Tue, 22 Oct 2019 19:11:20 -0400
Received: from mail-pg1-f193.google.com ([209.85.215.193]:40973 "EHLO
        mail-pg1-f193.google.com" rhost-flags-OK-OK-OK-OK)
        by vger.kernel.org with ESMTP id S2389652AbfJVXLU (ORCPT );
        Tue, 22 Oct 2019 19:11:20 -0400
Received: by mail-pg1-f193.google.com with SMTP id t3so10874709pga.8
        for ; Tue, 22 Oct 2019 16:11:19 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=20161025;
        h=from:to:cc:subject:date:message-id:in-reply-to:references
        :mime-version:content-transfer-encoding;
        bh=ygGNdoBgn8tpdMEsmC6SVnJ6LIduM2eem+TxJOEVG/k=;
        b=G14Lj8hbHDoUOoIIylPbRcewfCyfMY8zChE17X9+sBe+ezAKQ8uftfJTdEBVudJrRb
        ZdzmENcjdCuGS0bgZIJgCfJs0YRK23Dk5Jj77/TNwE6gPX+OBgR6ineGDwd57KEksSeG
        lsUQ30lxgzvAUdzSG+MbNcq/YQl6HYrD+SVJh9hlVYlclGvP3HeURCUqVjWi2i+6hyu/
        XoUsBiR8pXrH8UrTZFsvQz1qYXrWLFK2u06yjfi/BofxHEFt0qPxkfKpS9a50hFcudf2
        oYWRmBW9yQ2tsDoYY8mOOtfDYocO3RWZeK6tS7zlAhsbtVn1LXcGXjcfZXZe8YZAAzEq
        wgmg==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20161025;
        h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to
        :references:mime-version:content-transfer-encoding;
        bh=ygGNdoBgn8tpdMEsmC6SVnJ6LIduM2eem+TxJOEVG/k=;
        b=cL95VTiS1NpkUzVAbi03Ncs+bWSORqpFxVbvrq/i8Jz34OTLIGhCbQoitmjSyDIglU
        oQ+K8DVrdNZ9FPeeBDjlG59KZDDcG28XKQt6Gc0yM7I3VY90fFM7weRG/qcFPKquMNmK
        Qs0vDbSnpabMjDP3uXb7HEYKB46Rqgoo43Pi17/hBXbleZ6yXC1vcFg/La3x173Vny5Z
        UopZ9+Lwkk6xkpNfODyuOHlDgM8M7s+QEOl+YInXnxIrUzv1w/sP85GccwzI0vA6ZAbm
        wO3RULqFvGFkN7FqzAyYEr7nhf5B4WccBtvuqqhWWzEPJvzpgSmRCRKmIj/jdrJ9ABeZ
        f/1A==
X-Gm-Message-State: APjAAAUxt4annCcSCveqLNDe193Q5Ngd+c9a1Wxhdvzbx7ft96QQWjIa
        qX0xwQnpoNIwMl8VLBQyXJy54Jaa
X-Google-Smtp-Source: APXvYqyEUkllX4RrIEnLRG92WQPtFpMhcRebcY2IM5lbeXqx3z+fxJCKYsPDvF+KXrT+gXPlEC6XUQ==
X-Received: by 2002:a62:750d:: with SMTP id
        q13mr6912364pfc.58.1571785878869;
        Tue, 22 Oct 2019 16:11:18 -0700 (PDT)
Received: from tw-172-25-31-76.office.twttr.net ([8.25.197.24])
        by smtp.gmail.com with ESMTPSA id
        j24sm20619284pff.71.2019.10.22.16.11.16
        (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
        Tue, 22 Oct 2019 16:11:16 -0700 (PDT)
From: Cong Wang
To: netdev@vger.kernel.org
Cc: ycheng@google.com, ncardwell@google.com, edumazet@google.com,
        Cong Wang
Subject: [Patch net-next 3/3] tcp: decouple TLP timer from RTO timer
Date: Tue, 22 Oct 2019 16:10:51 -0700
Message-Id: <20191022231051.30770-4-xiyou.wangcong@gmail.com>
X-Mailer: git-send-email 2.21.0
In-Reply-To: <20191022231051.30770-1-xiyou.wangcong@gmail.com>
References: <20191022231051.30770-1-xiyou.wangcong@gmail.com>
MIME-Version: 1.0
Sender: netdev-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: netdev@vger.kernel.org

Currently RTO, TLP and PROBE0 all share the same
timer instance in the kernel and use icsk->icsk_pending to dispatch the
work. This causes spinlock contention when the timer is reset too
frequently, as clearly shown in the perf report:

   61.72%    61.71%  swapper  [kernel.kallsyms]  [k] queued_spin_lock_slowpath
   ...
    - 58.83% tcp_v4_rcv
      - 58.80% tcp_v4_do_rcv
         - 58.80% tcp_rcv_established
            - 52.88% __tcp_push_pending_frames
               - 52.88% tcp_write_xmit
                  - 28.16% tcp_event_new_data_sent
                     - 28.15% sk_reset_timer
                        + mod_timer
                  - 24.68% tcp_schedule_loss_probe
                     - 24.68% sk_reset_timer
                        + 24.68% mod_timer

This patch decouples the TLP timer from the RTO timer by adding a new
timer instance, but still uses icsk->icsk_pending to dispatch, in order
to minimize the risk of this patch.

After this patch, the CPU time spent in tcp_write_xmit() dropped to
10.92%.

Cc: Eric Dumazet
Signed-off-by: Cong Wang
---
 include/net/inet_connection_sock.h |  9 +++++--
 include/net/tcp.h                  |  1 +
 net/dccp/timer.c                   |  2 +-
 net/ipv4/inet_connection_sock.c    |  5 +++-
 net/ipv4/inet_diag.c               |  8 ++++--
 net/ipv4/tcp_input.c               |  8 ++++--
 net/ipv4/tcp_ipv4.c                |  6 +++--
 net/ipv4/tcp_output.c              |  1 +
 net/ipv4/tcp_timer.c               | 43 +++++++++++++++++++++++++++---
 net/ipv6/tcp_ipv6.c                |  6 +++--
 10 files changed, 73 insertions(+), 16 deletions(-)

diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h
index e46958460739..2a129fc6b522 100644
--- a/include/net/inet_connection_sock.h
+++ b/include/net/inet_connection_sock.h
@@ -121,6 +121,7 @@ struct inet_connection_sock {
 		__u16		  last_seg_size; /* Size of last incoming segment */
 		__u16		  rcv_mss;	 /* MSS used for delayed ACK decisions */
 	} icsk_ack;
+	struct timer_list	  icsk_tlp_timer;
 	struct {
 		int		  enabled;
 
@@ -170,7 +171,8 @@ enum inet_csk_ack_state_t {
 void inet_csk_init_xmit_timers(struct sock *sk,
 			       void (*retransmit_handler)(struct timer_list *),
 			       void (*delack_handler)(struct timer_list *),
-			       void (*keepalive_handler)(struct timer_list *));
+			       void (*keepalive_handler)(struct timer_list *),
+			       void (*tlp_handler)(struct timer_list *));
 void inet_csk_clear_xmit_timers(struct sock *sk);
 
 static inline void inet_csk_schedule_ack(struct sock *sk)
@@ -226,7 +228,7 @@ static inline void inet_csk_reset_xmit_timer(struct sock *sk, const int what,
 	}
 
 	if (what == ICSK_TIME_RETRANS || what == ICSK_TIME_PROBE0 ||
-	    what == ICSK_TIME_LOSS_PROBE || what == ICSK_TIME_REO_TIMEOUT) {
+	    what == ICSK_TIME_REO_TIMEOUT) {
 		icsk->icsk_pending = what;
 		icsk->icsk_timeout = jiffies + when;
 		sk_reset_timer(sk, &icsk->icsk_retransmit_timer, icsk->icsk_timeout);
@@ -234,6 +236,9 @@ static inline void inet_csk_reset_xmit_timer(struct sock *sk, const int what,
 		icsk->icsk_ack.pending |= ICSK_ACK_TIMER;
 		icsk->icsk_ack.timeout = jiffies + when;
 		sk_reset_timer(sk, &icsk->icsk_delack_timer, icsk->icsk_ack.timeout);
+	} else if (what == ICSK_TIME_LOSS_PROBE) {
+		icsk->icsk_pending = what;
+		sk_reset_timer(sk, &icsk->icsk_tlp_timer, jiffies + when);
 	} else {
 		pr_debug("inet_csk BUG: unknown timer value\n");
 	}
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 0ee5400e751c..3319d2b6b1c4 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -331,6 +331,7 @@ ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset,
 void tcp_release_cb(struct sock *sk);
 void tcp_wfree(struct sk_buff *skb);
 void tcp_write_timer_handler(struct sock *sk);
+void tcp_tail_loss_probe_handler(struct sock *sk);
 void tcp_delack_timer_handler(struct sock *sk);
 int tcp_ioctl(struct sock *sk, int cmd, unsigned long arg);
 int tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb);
diff --git a/net/dccp/timer.c b/net/dccp/timer.c
index c0b3672637c4..897a0469e8f1 100644
--- a/net/dccp/timer.c
+++ b/net/dccp/timer.c
@@ -246,7 +246,7 @@ void dccp_init_xmit_timers(struct sock *sk)
 	tasklet_init(&dp->dccps_xmitlet, dccp_write_xmitlet, (unsigned long)sk);
 	timer_setup(&dp->dccps_xmit_timer, dccp_write_xmit_timer, 0);
 	inet_csk_init_xmit_timers(sk, &dccp_write_timer, &dccp_delack_timer,
-				  &dccp_keepalive_timer);
+				  &dccp_keepalive_timer, NULL);
 }
 
 static ktime_t dccp_timestamp_seed;
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index eb30fc1770de..4b279a86308e 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -503,12 +503,14 @@ EXPORT_SYMBOL(inet_csk_accept);
 void inet_csk_init_xmit_timers(struct sock *sk,
 			       void (*retransmit_handler)(struct timer_list *t),
 			       void (*delack_handler)(struct timer_list *t),
-			       void (*keepalive_handler)(struct timer_list *t))
+			       void (*keepalive_handler)(struct timer_list *t),
+			       void (*tlp_handler)(struct timer_list *t))
 {
 	struct inet_connection_sock *icsk = inet_csk(sk);
 
 	timer_setup(&icsk->icsk_retransmit_timer, retransmit_handler, 0);
 	timer_setup(&icsk->icsk_delack_timer, delack_handler, 0);
+	timer_setup(&icsk->icsk_tlp_timer, tlp_handler, 0);
 	timer_setup(&sk->sk_timer, keepalive_handler, 0);
 	icsk->icsk_pending = icsk->icsk_ack.pending = 0;
 }
@@ -522,6 +524,7 @@ void inet_csk_clear_xmit_timers(struct sock *sk)
 
 	sk_stop_timer(sk, &icsk->icsk_retransmit_timer);
 	sk_stop_timer(sk, &icsk->icsk_delack_timer);
+	sk_stop_timer(sk, &icsk->icsk_tlp_timer);
 	sk_stop_timer(sk, &sk->sk_timer);
 }
 EXPORT_SYMBOL(inet_csk_clear_xmit_timers);
diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c
index 7dc79b973e6e..e87fe87571a1 100644
--- a/net/ipv4/inet_diag.c
+++ b/net/ipv4/inet_diag.c
@@ -221,8 +221,7 @@ int inet_sk_diag_fill(struct sock *sk, struct inet_connection_sock *icsk,
 	}
 
 	if (icsk->icsk_pending == ICSK_TIME_RETRANS ||
-	    icsk->icsk_pending == ICSK_TIME_REO_TIMEOUT ||
-	    icsk->icsk_pending == ICSK_TIME_LOSS_PROBE) {
+	    icsk->icsk_pending == ICSK_TIME_REO_TIMEOUT) {
 		r->idiag_timer = 1;
 		r->idiag_retrans = icsk->icsk_retransmits;
 		r->idiag_expires =
@@ -232,6 +231,11 @@ int inet_sk_diag_fill(struct sock *sk, struct inet_connection_sock *icsk,
 		r->idiag_retrans = icsk->icsk_probes_out;
 		r->idiag_expires =
 			jiffies_to_msecs(icsk->icsk_timeout - jiffies);
+	} else if (icsk->icsk_pending == ICSK_TIME_LOSS_PROBE) {
+		r->idiag_timer = 1;
+		r->idiag_retrans = icsk->icsk_retransmits;
+		r->idiag_expires =
+			jiffies_to_msecs(icsk->icsk_tlp_timer.expires - jiffies);
 	} else if (timer_pending(&sk->sk_timer)) {
 		r->idiag_timer = 2;
 		r->idiag_retrans = icsk->icsk_probes_out;
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index a2e52ad7cdab..71cbb486ef85 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3008,8 +3008,12 @@ void tcp_rearm_rto(struct sock *sk)
 			 */
 			rto = usecs_to_jiffies(max_t(int, delta_us, 1));
 		}
-		tcp_reset_xmit_timer(sk, ICSK_TIME_RETRANS, rto,
-				     TCP_RTO_MAX, tcp_rtx_queue_head(sk));
+		if (icsk->icsk_pending == ICSK_TIME_LOSS_PROBE)
+			tcp_reset_xmit_timer(sk, ICSK_TIME_LOSS_PROBE, rto,
+					     TCP_RTO_MAX, tcp_rtx_queue_head(sk));
+		else
+			tcp_reset_xmit_timer(sk, ICSK_TIME_RETRANS, rto,
+					     TCP_RTO_MAX, tcp_rtx_queue_head(sk));
 	}
 }
 
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index c616f0ad1fa0..f5e34fe7b2e6 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -2434,13 +2434,15 @@ static void get_tcp4_sock(struct sock *sk, struct seq_file *f, int i)
 	int state;
 
 	if (icsk->icsk_pending == ICSK_TIME_RETRANS ||
-	    icsk->icsk_pending == ICSK_TIME_REO_TIMEOUT ||
-	    icsk->icsk_pending == ICSK_TIME_LOSS_PROBE) {
+	    icsk->icsk_pending == ICSK_TIME_REO_TIMEOUT) {
 		timer_active = 1;
 		timer_expires = icsk->icsk_timeout;
 	} else if (icsk->icsk_pending == ICSK_TIME_PROBE0) {
 		timer_active = 4;
 		timer_expires = icsk->icsk_timeout;
+	} else if (icsk->icsk_pending == ICSK_TIME_LOSS_PROBE) {
+		timer_active = 1;
+		timer_expires = icsk->icsk_tlp_timer.expires;
 	} else if (timer_pending(&sk->sk_timer)) {
 		timer_active = 2;
 		timer_expires = sk->sk_timer.expires;
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 9822820edca4..9038d7d61d0f 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -882,6 +882,7 @@ void tcp_release_cb(struct sock *sk)
 
 	if (flags & TCPF_WRITE_TIMER_DEFERRED) {
 		tcp_write_timer_handler(sk);
+		tcp_tail_loss_probe_handler(sk);
 		__sock_put(sk);
 	}
 	if (flags & TCPF_DELACK_TIMER_DEFERRED) {
diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
index dd5a6317a801..f112aa979e8c 100644
--- a/net/ipv4/tcp_timer.c
+++ b/net/ipv4/tcp_timer.c
@@ -591,9 +591,6 @@ void tcp_write_timer_handler(struct sock *sk)
 	case ICSK_TIME_REO_TIMEOUT:
 		tcp_rack_reo_timeout(sk);
 		break;
-	case ICSK_TIME_LOSS_PROBE:
-		tcp_send_loss_probe(sk);
-		break;
 	case ICSK_TIME_RETRANS:
 		icsk->icsk_pending = 0;
 		tcp_retransmit_timer(sk);
@@ -626,6 +623,42 @@ static void tcp_write_timer(struct timer_list *t)
 	sock_put(sk);
 }
 
+void tcp_tail_loss_probe_handler(struct sock *sk)
+{
+	struct inet_connection_sock *icsk = inet_csk(sk);
+
+	if ((1 << sk->sk_state) & (TCPF_CLOSE | TCPF_LISTEN) ||
+	    icsk->icsk_pending != ICSK_TIME_LOSS_PROBE)
+		goto out;
+
+	if (timer_pending(&icsk->icsk_tlp_timer))
+		goto out;
+
+	tcp_mstamp_refresh(tcp_sk(sk));
+	if (tcp_send_loss_probe(sk))
+		icsk->icsk_retransmits++;
+out:
+	sk_mem_reclaim(sk);
+}
+
+static void tcp_tail_loss_probe_timer(struct timer_list *t)
+{
+	struct inet_connection_sock *icsk =
+			from_timer(icsk, t, icsk_tlp_timer);
+	struct sock *sk = &icsk->icsk_inet.sk;
+
+	bh_lock_sock(sk);
+	if (!sock_owned_by_user(sk)) {
+		tcp_tail_loss_probe_handler(sk);
+	} else {
+		/* delegate our work to tcp_release_cb() */
+		if (!test_and_set_bit(TCP_WRITE_TIMER_DEFERRED, &sk->sk_tsq_flags))
+			sock_hold(sk);
+	}
+	bh_unlock_sock(sk);
+	sock_put(sk);
+}
+
 void tcp_syn_ack_timeout(const struct request_sock *req)
 {
 	struct net *net = read_pnet(&inet_rsk(req)->ireq_net);
@@ -758,7 +791,9 @@ static enum hrtimer_restart tcp_compressed_ack_kick(struct hrtimer *timer)
 void tcp_init_xmit_timers(struct sock *sk)
 {
 	inet_csk_init_xmit_timers(sk, &tcp_write_timer, &tcp_delack_timer,
-				  &tcp_keepalive_timer);
+				  &tcp_keepalive_timer,
+				  &tcp_tail_loss_probe_timer);
+
 	hrtimer_init(&tcp_sk(sk)->pacing_timer, CLOCK_MONOTONIC,
 		     HRTIMER_MODE_ABS_PINNED_SOFT);
 	tcp_sk(sk)->pacing_timer.function = tcp_pace_kick;
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 4804b6dc5e65..7cc8dbe412af 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1874,13 +1874,15 @@ static void get_tcp6_sock(struct seq_file *seq, struct sock *sp, int i)
 	srcp = ntohs(inet->inet_sport);
 
 	if (icsk->icsk_pending == ICSK_TIME_RETRANS ||
-	    icsk->icsk_pending == ICSK_TIME_REO_TIMEOUT ||
-	    icsk->icsk_pending == ICSK_TIME_LOSS_PROBE) {
+	    icsk->icsk_pending == ICSK_TIME_REO_TIMEOUT) {
 		timer_active = 1;
 		timer_expires = icsk->icsk_timeout;
 	} else if (icsk->icsk_pending == ICSK_TIME_PROBE0) {
 		timer_active = 4;
 		timer_expires = icsk->icsk_timeout;
+	} else if (icsk->icsk_pending == ICSK_TIME_LOSS_PROBE) {
+		timer_active = 1;
+		timer_expires = icsk->icsk_tlp_timer.expires;
 	} else if (timer_pending(&sp->sk_timer)) {
 		timer_active = 2;
 		timer_expires = sp->sk_timer.expires;