From patchwork Sat Mar 23 08:05:40 2019
X-Patchwork-Submitter: Lawrence Brakmo
X-Patchwork-Id: 1062053
X-Patchwork-Delegate: bpf@iogearbox.net
From: brakmo
To: netdev
CC: Martin Lau, Alexei Starovoitov, Daniel Borkmann, Eric Dumazet, Kernel Team
Subject: [PATCH bpf-next 5/7] bpf: sysctl for probe_on_drop
Date: Sat, 23 Mar 2019 01:05:40 -0700
Message-ID: <20190323080542.173569-6-brakmo@fb.com>
In-Reply-To: <20190323080542.173569-1-brakmo@fb.com>
References: <20190323080542.173569-1-brakmo@fb.com>
X-Mailing-List: netdev@vger.kernel.org

When queue_xmit drops a packet in __tcp_transmit_skb and packets_out
is 0, it is beneficial to set a short probe timer. Otherwise the
throughput of the flow can suffer, because the flow may have to wait
for the probe timer to fire before it starts sending again. The default
value for the probe timer is at least 200ms; this patch sets it to 20ms
when a packet is dropped and there are no other packets in flight.

This patch introduces a new sysctl, tcp_probe_on_drop_ms (stored in
sysctl_tcp_probe_on_drop_ms), that specifies the duration of the probe
timer for the case described above. The allowed values are between 0
and TCP_RTO_MIN. A value of 0 disables the short probe timer.

Signed-off-by: Lawrence Brakmo
---
 include/net/netns/ipv4.h   |  1 +
 net/ipv4/sysctl_net_ipv4.c | 10 ++++++++++
 net/ipv4/tcp_ipv4.c        |  1 +
 net/ipv4/tcp_output.c      | 18 +++++++++++++++---
 4 files changed, 27 insertions(+), 3 deletions(-)

diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
index 104a6669e344..d5716a193883 100644
--- a/include/net/netns/ipv4.h
+++ b/include/net/netns/ipv4.h
@@ -165,6 +165,7 @@ struct netns_ipv4 {
 	int sysctl_tcp_wmem[3];
 	int sysctl_tcp_rmem[3];
 	int sysctl_tcp_comp_sack_nr;
+	int sysctl_tcp_probe_on_drop_ms;
 	unsigned long sysctl_tcp_comp_sack_delay_ns;
 	struct inet_timewait_death_row tcp_death_row;
 	int sysctl_max_syn_backlog;
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index ba0fc4b18465..50837e66313f 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -49,6 +49,7 @@ static int ip_ping_group_range_min[] = { 0, 0 };
 static int ip_ping_group_range_max[] = { GID_T_MAX, GID_T_MAX };
 static int comp_sack_nr_max = 255;
 static u32 u32_max_div_HZ = UINT_MAX / HZ;
+static int probe_on_drop_max = TCP_RTO_MIN;
 
 /* obsolete */
 static int sysctl_tcp_low_latency __read_mostly;
@@ -1219,6 +1220,15 @@ static struct ctl_table ipv4_net_table[] = {
 		.extra1		= &zero,
 		.extra2		= &comp_sack_nr_max,
 	},
+	{
+		.procname	= "tcp_probe_on_drop_ms",
+		.data		= &init_net.ipv4.sysctl_tcp_probe_on_drop_ms,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= &zero,
+		.extra2		= &probe_on_drop_max,
+	},
 	{
 		.procname	= "udp_rmem_min",
 		.data		= &init_net.ipv4.sysctl_udp_rmem_min,
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 277d71239d75..5aba95850d61 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -2679,6 +2679,7 @@ static int __net_init tcp_sk_init(struct net *net)
 	spin_lock_init(&net->ipv4.tcp_fastopen_ctx_lock);
 	net->ipv4.sysctl_tcp_fastopen_blackhole_timeout = 60 * 60;
 	atomic_set(&net->ipv4.tfo_active_disable_times, 0);
+	net->ipv4.sysctl_tcp_probe_on_drop_ms = 20;
 
 	/* Reno is always built in */
 	if (!net_eq(net, &init_net) &&
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 4522579aaca2..95a0102fde3b 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1158,9 +1158,21 @@ static int __tcp_transmit_skb(struct sock *sk, struct sk_buff *skb,
 
 	err = icsk->icsk_af_ops->queue_xmit(sk, skb, &inet->cork.fl);
 
-	if (unlikely(err > 0)) {
-		tcp_enter_cwr(sk);
-		err = net_xmit_eval(err);
+	if (unlikely(err)) {
+		if (unlikely(err > 0)) {
+			tcp_enter_cwr(sk);
+			err = net_xmit_eval(err);
+		}
+		/* Packet was dropped. If there are no packets out,
+		 * we may need to depend on probe timer to start sending
+		 * again. Hence, use a smaller value (the sysctl is in ms).
+		 */
+		if (!tp->packets_out && !inet_csk(sk)->icsk_pending &&
+		    sock_net(sk)->ipv4.sysctl_tcp_probe_on_drop_ms > 0) {
+			tcp_reset_xmit_timer(sk, ICSK_TIME_PROBE0,
+				msecs_to_jiffies(sock_net(sk)->ipv4.sysctl_tcp_probe_on_drop_ms),
+				TCP_RTO_MAX, NULL);
+		}
 	}
 	if (!err && oskb) {
 		tcp_update_skb_after_send(sk, oskb, prior_wstamp);
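
For readers who want to reason about the change outside the kernel, the
decision made in __tcp_transmit_skb can be approximated by a small
user-space model. This is only a sketch under stated assumptions, not kernel
code: struct model_flow, probe_on_drop_valid(), probe_timeout_ms(),
ms_to_ticks() and MODEL_PROBE_MAX_MS are hypothetical names invented for
illustration, and the 200 ms upper bound assumes TCP_RTO_MIN (HZ/5 jiffies)
corresponds to 200 ms. Because the sysctl is expressed in milliseconds while
kernel timers count jiffies, the sketch also models a round-up conversion
for a given HZ.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model of the patch's decision logic.
 * None of these names are kernel APIs.
 */
#define MODEL_PROBE_MAX_MS 200	/* assumes TCP_RTO_MIN (HZ/5 jiffies) == 200 ms */

struct model_flow {
	unsigned int packets_out;	/* segments currently in flight */
	bool timer_pending;		/* models icsk_pending != 0 */
};

/* Range check in the spirit of a min/max sysctl handler with bounds [0, max]. */
static bool probe_on_drop_valid(int ms)
{
	return ms >= 0 && ms <= MODEL_PROBE_MAX_MS;
}

/* Timeout in ms to arm after a transmit attempt, or 0 to leave timers alone. */
static int probe_timeout_ms(const struct model_flow *f, int xmit_err,
			    int probe_on_drop_ms)
{
	if (!xmit_err)
		return 0;	/* packet was queued, nothing to do */
	if (f->packets_out || f->timer_pending)
		return 0;	/* an ACK or an already armed timer restarts sending */
	if (probe_on_drop_ms <= 0)
		return 0;	/* 0 disables the short probe timer */
	return probe_on_drop_ms;
}

/* Milliseconds to timer ticks at a given HZ, rounding up. */
static unsigned long ms_to_ticks(unsigned int ms, unsigned int hz)
{
	assert(hz > 0);
	return ((unsigned long)ms * hz + 999UL) / 1000UL;
}
```

In this model a 20 ms setting is 20 ticks at HZ=1000 but only 5 ticks at
HZ=250; passing the raw value 20 as a tick count at HZ=250 would instead
arm an 80 ms timer.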