From patchwork Wed Jun 27 02:34:03 2018
X-Patchwork-Submitter: Lawrence Brakmo
X-Patchwork-Id: 935233
X-Patchwork-Delegate: davem@davemloft.net
From: Lawrence Brakmo
To: netdev
CC: Kernel Team, Blake Matheny, Alexei Starovoitov, Eric Dumazet
Subject: [PATCH net-next v2] tcp: force cwnd at least 2 in tcp_cwnd_reduction
Date: Tue, 26 Jun 2018 19:34:03 -0700
Message-ID: <20180627023403.3395818-1-brakmo@fb.com>
X-Mailer: git-send-email 2.17.1

When using dctcp and doing RPCs, if the last packet of a request is ECN
marked as having seen congestion (CE), the sender can decrease its cwnd
to 1. As a result, it will only send one packet when a new request is
sent. In some instances this results in high tail latencies.

For example, in one setup there are 3 hosts sending to a 4th one, with
each sender having 3 flows (1 stream, 1 1MB back-to-back RPCs and
1 10KB back-to-back RPCs).
The following table shows the 99% and 99.9% latencies for both Cubic
and dctcp:

           Cubic 99%  Cubic 99.9%  dctcp 99%  dctcp 99.9%
 1MB RPCs    3.5ms       6.0ms       43ms        208ms
10KB RPCs    1.0ms       2.5ms       53ms        212ms

On 4.11, pcap traces indicate that in some instances the 1st packet of
the RPC is received but no ACK is sent before the packet is
retransmitted. On 4.11 netstat shows TCP timeouts, with some of them
spurious.

On 4.16, we don't see retransmits in netstat but the high tail
latencies are still there.

Forcing cwnd to be at least 2 in tcp_cwnd_reduction fixes the problem
with the high tail latencies. The latencies now look like this:

           dctcp 99%  dctcp 99.9%
 1MB RPCs    3.8ms       4.4ms
10KB RPCs    168us       211us

Another group working with dctcp saw the same issue with production
traffic, and it was solved with this patch.

The only open question is whether it is safe to always use 2, or
whether it would be better to use min(2, snd_ssthresh) (which could
still trigger the problem).

v2: fixed compiler warning in max function arguments

Signed-off-by: Lawrence Brakmo
---
 net/ipv4/tcp_input.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 76ca88f63b70..282bd85322b0 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2477,7 +2477,7 @@ void tcp_cwnd_reduction(struct sock *sk, int newly_acked_sacked, int flag)
 	}
 	/* Force a fast retransmit upon entering fast recovery */
 	sndcnt = max(sndcnt, (tp->prr_out ? 0 : 1));
-	tp->snd_cwnd = tcp_packets_in_flight(tp) + sndcnt;
+	tp->snd_cwnd = max((int)tcp_packets_in_flight(tp) + sndcnt, 2);
 }
 
 static inline void tcp_end_cwnd_reduction(struct sock *sk)
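
For illustration only (not part of the patch): below is a small
user-space sketch of the final assignment in tcp_cwnd_reduction(),
showing how cwnd can end up at 1 when nothing is left in flight and PRR
permits a single segment, and how the floor of 2 used here compares
with the min(2, snd_ssthresh) alternative mentioned above. The variable
names mirror the kernel ones, but the inputs (packets_in_flight = 0,
sndcnt = 1, snd_ssthresh = 1) are assumed worst-case values for the
sake of the example, not numbers taken from the traces.

/*
 * Toy model, not kernel code: mimics the cwnd computation at the end
 * of tcp_cwnd_reduction() to compare the unpatched formula, the
 * patched floor of 2, and the min(2, snd_ssthresh) variant.
 */
#include <stdio.h>

/* Stand-ins for the kernel's min()/max() macros. */
static int min_int(int a, int b) { return a < b ? a : b; }
static int max_int(int a, int b) { return a > b ? a : b; }

int main(void)
{
	/* Assumed worst case: nothing in flight, PRR allows one segment,
	 * and ssthresh has collapsed to 1.
	 */
	int packets_in_flight = 0;
	int sndcnt = 1;
	int snd_ssthresh = 1;

	int unpatched = packets_in_flight + sndcnt;
	int patched   = max_int(packets_in_flight + sndcnt, 2);
	int alt       = max_int(packets_in_flight + sndcnt,
				min_int(2, snd_ssthresh));

	printf("unpatched cwnd           : %d\n", unpatched); /* 1 */
	printf("patched cwnd (floor 2)   : %d\n", patched);    /* 2 */
	printf("min(2, ssthresh) variant : %d\n", alt);        /* 1 */
	return 0;
}

As the last line shows, the min(2, snd_ssthresh) form can still leave
cwnd at 1 if ssthresh itself is 1, which is the concern raised above.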