From patchwork Fri Oct 20 18:05:39 2017
X-Patchwork-Submitter: Lawrence Brakmo
X-Patchwork-Id: 828760
X-Patchwork-Delegate: davem@davemloft.net
From: Lawrence Brakmo
To: netdev
CC: Kernel Team, Alexei Starovoitov, Daniel Borkmann, Blake Matheny, Lawrence Brakmo
Subject: [PATCH net-next 1/5] bpf: add support for BPF_SOCK_OPS_BASE_RTT
Date: Fri, 20 Oct 2017 11:05:39 -0700
Message-ID: <20171020180543.4156833-2-brakmo@fb.com>
In-Reply-To: <20171020180543.4156833-1-brakmo@fb.com>

A congestion control algorithm can call the BPF socket_ops program to
request the base RTT. The base RTT can depend on the congestion control
algorithm and is meant to represent a congestion threshold: RTTs above
it indicate congestion. This is especially useful for flows within a
datacenter (DC), where the base RTT is easy to obtain.

Providing a base RTT solves a basic problem in RTT-based congestion
avoidance algorithms (such as Vegas, NV and BBR). Although it is easy
to measure the base RTT when the network is not congested, it is very
difficult to do so when the network is heavily congested.
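The difficulty can be sketched in a few lines of userspace C. This is an illustration under stated assumptions, not code from this patch set. It assumes the algorithm estimates its base RTT as the minimum RTT it has observed (as Vegas does). `min_rtt_us()` and `queueing_delay_us()` are hypothetical helpers, and the sample values are made up.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: approximate a flow's base RTT as the smallest RTT
 * sample it has observed (a Vegas-style estimate).
 * Returns UINT32_MAX when no samples are available. */
static uint32_t min_rtt_us(const uint32_t *samples_us, size_t n)
{
	uint32_t min = UINT32_MAX;

	for (size_t i = 0; i < n; i++)
		if (samples_us[i] < min)
			min = samples_us[i];
	return min;
}

/* Hypothetical helper: the part of an RTT sample that the flow attributes
 * to queueing, i.e. how far the sample sits above its base RTT. */
static uint32_t queueing_delay_us(uint32_t rtt_us, uint32_t base_rtt_us)
{
	return rtt_us > base_rtt_us ? rtt_us - base_rtt_us : 0;
}
```

For example, suppose one flow started on an idle path and saw samples of {20, 22, 95, 110} us. It keeps a base RTT of 20 us. A flow that only ever saw {98, 105, 112} us settles on 98 us instead. For the same 110 us sample, the first flow attributes 90 us to queueing and the second only 12 us, so the newer flow reacts far less to the same congestion.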
Newer connections measure an inflated base RTT, which leads to
unfairness (newer flows with a larger base RTT get more bandwidth). As
a result, RTT-based congestion avoidance algorithms tend to update their
base RTTs to improve fairness. In very congested networks this can lead
to base RTT inflation, reducing the ability of these algorithms to
prevent congestion.

Note that in my experiments with TCP-NV, the base RTT provided can be
much larger than the actual hardware RTT. For example, when
experimenting with hosts within a rack where the hardware RTT is
16-20us, I used base RTTs of up to 150us. A larger base RTT makes the
congestion avoidance algorithm allow more queueing. When there are only
a few flows, the main effect is larger measured RTTs and RPC latencies
due to the increased queueing. When there are many flows, a larger base
RTT can lead to more congestion and more packet drops. In this setup,
where the hardware RTT is 20us, a base RTT of 80us produces good
results.

This patch only introduces BPF_SOCK_OPS_BASE_RTT; a later patch in this
set adds support for using it in TCP-NV. Further study and testing are
needed before support can be added to other delay-based congestion
avoidance algorithms.

Signed-off-by: Lawrence Brakmo
Acked-by: Alexei Starovoitov
Acked-by: Daniel Borkmann
---
 include/uapi/linux/bpf.h | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index d83f95e..1aca744 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -955,6 +955,13 @@ enum {
 	BPF_SOCK_OPS_NEEDS_ECN,		/* If connection's congestion control
 					 * needs ECN
 					 */
+	BPF_SOCK_OPS_BASE_RTT,		/* Get base RTT. The correct value is
+					 * based on the path and may be
+					 * dependent on the congestion control
+					 * algorithm. In general it indicates
+					 * a congestion threshold. RTTs above
+					 * this indicate congestion
+					 */
 };
 
 #define TCP_BPF_IW		1001	/* Set TCP initial congestion window */
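A congestion control algorithm might consume the new operation roughly as modeled below in plain userspace C. This is a hedged mock rather than BPF or kernel code:

- `enum mock_sock_op`, `struct mock_sock_ops`, `mock_basertt_prog()` and the `same_rack` flag are hypothetical.
- The program's answer is carried in a `reply` field, and a non-positive reply is assumed to mean "no value".
- The value 80 is taken from the 80us figure in the description above, assuming microsecond units.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mock types: these are NOT the kernel's struct bpf_sock_ops. */
enum mock_sock_op {
	MOCK_OP_NEEDS_ECN,	/* stands in for BPF_SOCK_OPS_NEEDS_ECN */
	MOCK_OP_BASE_RTT,	/* stands in for BPF_SOCK_OPS_BASE_RTT */
};

struct mock_sock_ops {
	enum mock_sock_op op;	/* question asked by the CC module */
	bool same_rack;		/* hypothetical: peer is in the same rack */
	int32_t reply;		/* program's answer; <= 0 means "no value" (assumed) */
};

/* Mock "sock_ops program": supplies a fixed base RTT (assumed to be in
 * microseconds) for intra-rack peers and declines everything else. */
static void mock_basertt_prog(struct mock_sock_ops *ops)
{
	if (ops->op == MOCK_OP_BASE_RTT && ops->same_rack)
		ops->reply = 80;	/* value reported to work well for a 20us rack */
	else
		ops->reply = -1;
}

/* Mock congestion-control side: ask the program first and fall back to
 * the algorithm's own estimate when no positive reply comes back. */
static uint32_t mock_choose_base_rtt_us(bool same_rack, uint32_t own_estimate_us)
{
	struct mock_sock_ops ops = {
		.op = MOCK_OP_BASE_RTT,
		.same_rack = same_rack,
		.reply = 0,
	};

	mock_basertt_prog(&ops);
	return ops.reply > 0 ? (uint32_t)ops.reply : own_estimate_us;
}

/* The base RTT acts as a congestion threshold: RTTs above it indicate
 * congestion. */
static bool mock_rtt_indicates_congestion(uint32_t rtt_us, uint32_t base_rtt_us)
{
	return rtt_us > base_rtt_us;
}
```

With this mock, an intra-rack flow whose own estimate is 150 us gets 80 us from the program. A flow to a remote peer keeps its own 150 us estimate. A 100 us sample therefore counts as congestion only against the 80 us threshold. In this sketch the fallback preserves the algorithm's own behavior whenever the program declines to answer.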