From patchwork Sat Feb 23 01:06:55 2019
X-Patchwork-Submitter: Lawrence Brakmo
X-Patchwork-Id: 1047230
X-Patchwork-Delegate: bpf@iogearbox.net
From: brakmo
To: netdev
CC: Martin Lau, Alexei Starovoitov, Daniel Borkmann, Eric Dumazet, Kernel Team
Subject: [PATCH v2 bpf-next 1/9] bpf: Remove const from get_func_proto
Date: Fri, 22 Feb 2019 17:06:55 -0800
Message-ID: <20190223010703.678070-2-brakmo@fb.com>
In-Reply-To: <20190223010703.678070-1-brakmo@fb.com>
References: <20190223010703.678070-1-brakmo@fb.com>
X-Mailing-List: netdev@vger.kernel.org

From: Martin KaFai Lau

The next patch needs to set a bit in "prog" in cg_skb_func_proto().
Hence, a "const struct bpf_prog *" second argument will no longer
work.

This patch removes the "const" from get_func_proto and makes the
needed changes to all get_func_proto implementations to avoid
compiler errors.
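A minimal user-space sketch of why the const has to go. Everything here (struct prog, enum func_id, cg_skb_like_func_proto(), resolve_helper()) is a hypothetical simplification, not the kernel's definitions: a get_func_proto-style callback records a one-bit fact about the program while resolving a helper, which only compiles when the prog pointer is not const.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the kernel types. */
enum func_id { FUNC_TCP_SOCK, FUNC_TCP_ENTER_CWR, FUNC_OTHER };

struct func_proto {
	const char *name;
};

struct prog {
	unsigned int enforce_expected_attach_type:1;
};

/* Callback type after this patch: the prog argument is no longer const. */
typedef const struct func_proto *(*get_func_proto_t)(enum func_id id,
						     struct prog *prog);

static const struct func_proto tcp_sock_proto = { "tcp_sock" };
static const struct func_proto tcp_enter_cwr_proto = { "tcp_enter_cwr" };

static const struct func_proto *
cg_skb_like_func_proto(enum func_id id, struct prog *prog)
{
	switch (id) {
	case FUNC_TCP_SOCK:
		return &tcp_sock_proto;
	case FUNC_TCP_ENTER_CWR:
		/* This store would not compile with "const struct prog *prog". */
		prog->enforce_expected_attach_type = 1;
		return &tcp_enter_cwr_proto;
	default:
		return NULL; /* unknown helper: the call would be rejected */
	}
}

/* Verifier-side lookup, done once per helper call in the program. */
static const struct func_proto *
resolve_helper(get_func_proto_t get_func_proto, enum func_id id,
	       struct prog *prog)
{
	return get_func_proto(id, prog);
}
```

Resolving FUNC_TCP_SOCK leaves the bit clear; only resolving FUNC_TCP_ENTER_CWR sets it, which is the side effect the next patch relies on.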
Signed-off-by: Martin KaFai Lau
Signed-off-by: Lawrence Brakmo
---
 drivers/media/rc/bpf-lirc.c |  2 +-
 include/linux/bpf.h         |  2 +-
 kernel/bpf/cgroup.c         |  2 +-
 kernel/trace/bpf_trace.c    | 10 +++++-----
 net/core/filter.c           | 30 +++++++++++++++---------------
 5 files changed, 23 insertions(+), 23 deletions(-)

diff --git a/drivers/media/rc/bpf-lirc.c b/drivers/media/rc/bpf-lirc.c
index 390a722e6211..6adb7f734cb9 100644
--- a/drivers/media/rc/bpf-lirc.c
+++ b/drivers/media/rc/bpf-lirc.c
@@ -82,7 +82,7 @@ static const struct bpf_func_proto rc_pointer_rel_proto = {
 };
 
 static const struct bpf_func_proto *
-lirc_mode2_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
+lirc_mode2_func_proto(enum bpf_func_id func_id, struct bpf_prog *prog)
 {
 	switch (func_id) {
 	case BPF_FUNC_rc_repeat:
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index de18227b3d95..d5ba2fc01af3 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -287,7 +287,7 @@ struct bpf_verifier_ops {
 	/* return eBPF function prototype for verification */
 	const struct bpf_func_proto *
 	(*get_func_proto)(enum bpf_func_id func_id,
-			  const struct bpf_prog *prog);
+			  struct bpf_prog *prog);
 
 	/* return true if 'size' wide access at offset 'off' within bpf_context
 	 * with 'type' (read or write) is allowed
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index 4e807973aa80..0de0f5d98b46 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -701,7 +701,7 @@ int __cgroup_bpf_check_dev_permission(short dev_type, u32 major, u32 minor,
 EXPORT_SYMBOL(__cgroup_bpf_check_dev_permission);
 
 static const struct bpf_func_proto *
-cgroup_dev_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
+cgroup_dev_func_proto(enum bpf_func_id func_id, struct bpf_prog *prog)
 {
 	switch (func_id) {
 	case BPF_FUNC_map_lookup_elem:
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index f1a86a0d881d..0d2f60828d7d 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -561,7 +561,7 @@ static const struct bpf_func_proto bpf_probe_read_str_proto = {
 };
 
 static const struct bpf_func_proto *
-tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
+tracing_func_proto(enum bpf_func_id func_id, struct bpf_prog *prog)
 {
 	switch (func_id) {
 	case BPF_FUNC_map_lookup_elem:
@@ -610,7 +610,7 @@ tracing_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 }
 
 static const struct bpf_func_proto *
-kprobe_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
+kprobe_prog_func_proto(enum bpf_func_id func_id, struct bpf_prog *prog)
 {
 	switch (func_id) {
 	case BPF_FUNC_perf_event_output:
@@ -726,7 +726,7 @@ static const struct bpf_func_proto bpf_get_stack_proto_tp = {
 };
 
 static const struct bpf_func_proto *
-tp_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
+tp_prog_func_proto(enum bpf_func_id func_id, struct bpf_prog *prog)
 {
 	switch (func_id) {
 	case BPF_FUNC_perf_event_output:
@@ -790,7 +790,7 @@ static const struct bpf_func_proto bpf_perf_prog_read_value_proto = {
 };
 
 static const struct bpf_func_proto *
-pe_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
+pe_prog_func_proto(enum bpf_func_id func_id, struct bpf_prog *prog)
 {
 	switch (func_id) {
 	case BPF_FUNC_perf_event_output:
@@ -873,7 +873,7 @@ static const struct bpf_func_proto bpf_get_stack_proto_raw_tp = {
 };
 
 static const struct bpf_func_proto *
-raw_tp_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
+raw_tp_prog_func_proto(enum bpf_func_id func_id, struct bpf_prog *prog)
 {
 	switch (func_id) {
 	case BPF_FUNC_perf_event_output:
diff --git a/net/core/filter.c b/net/core/filter.c
index 85749f6ec789..97916eedfe69 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5508,7 +5508,7 @@ bpf_base_func_proto(enum bpf_func_id func_id)
 }
 
 static const struct bpf_func_proto *
-sock_filter_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
+sock_filter_func_proto(enum bpf_func_id func_id, struct bpf_prog *prog)
 {
 	switch (func_id) {
 	/* inet and inet6 sockets are created in a process
@@ -5524,7 +5524,7 @@ sock_filter_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 }
 
 static const struct bpf_func_proto *
-sock_addr_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
+sock_addr_func_proto(enum bpf_func_id func_id, struct bpf_prog *prog)
 {
 	switch (func_id) {
 	/* inet and inet6 sockets are created in a process
@@ -5558,7 +5558,7 @@ sock_addr_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 }
 
 static const struct bpf_func_proto *
-sk_filter_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
+sk_filter_func_proto(enum bpf_func_id func_id, struct bpf_prog *prog)
 {
 	switch (func_id) {
 	case BPF_FUNC_skb_load_bytes:
@@ -5575,7 +5575,7 @@ sk_filter_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 }
 
 static const struct bpf_func_proto *
-cg_skb_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
+cg_skb_func_proto(enum bpf_func_id func_id, struct bpf_prog *prog)
 {
 	switch (func_id) {
 	case BPF_FUNC_get_local_storage:
@@ -5592,7 +5592,7 @@ cg_skb_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 }
 
 static const struct bpf_func_proto *
-tc_cls_act_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
+tc_cls_act_func_proto(enum bpf_func_id func_id, struct bpf_prog *prog)
 {
 	switch (func_id) {
 	case BPF_FUNC_skb_store_bytes:
@@ -5685,7 +5685,7 @@ tc_cls_act_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 }
 
 static const struct bpf_func_proto *
-xdp_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
+xdp_func_proto(enum bpf_func_id func_id, struct bpf_prog *prog)
 {
 	switch (func_id) {
 	case BPF_FUNC_perf_event_output:
@@ -5723,7 +5723,7 @@ const struct bpf_func_proto bpf_sock_map_update_proto __weak;
 const struct bpf_func_proto bpf_sock_hash_update_proto __weak;
 
 static const struct bpf_func_proto *
-sock_ops_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
+sock_ops_func_proto(enum bpf_func_id func_id, struct bpf_prog *prog)
 {
 	switch (func_id) {
 	case BPF_FUNC_setsockopt:
@@ -5751,7 +5751,7 @@ const struct bpf_func_proto bpf_msg_redirect_map_proto __weak;
 const struct bpf_func_proto bpf_msg_redirect_hash_proto __weak;
 
 static const struct bpf_func_proto *
-sk_msg_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
+sk_msg_func_proto(enum bpf_func_id func_id, struct bpf_prog *prog)
 {
 	switch (func_id) {
 	case BPF_FUNC_msg_redirect_map:
@@ -5777,7 +5777,7 @@ const struct bpf_func_proto bpf_sk_redirect_map_proto __weak;
 const struct bpf_func_proto bpf_sk_redirect_hash_proto __weak;
 
 static const struct bpf_func_proto *
-sk_skb_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
+sk_skb_func_proto(enum bpf_func_id func_id, struct bpf_prog *prog)
 {
 	switch (func_id) {
 	case BPF_FUNC_skb_store_bytes:
@@ -5812,7 +5812,7 @@ sk_skb_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 }
 
 static const struct bpf_func_proto *
-flow_dissector_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
+flow_dissector_func_proto(enum bpf_func_id func_id, struct bpf_prog *prog)
 {
 	switch (func_id) {
 	case BPF_FUNC_skb_load_bytes:
@@ -5823,7 +5823,7 @@ flow_dissector_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 }
 
 static const struct bpf_func_proto *
-lwt_out_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
+lwt_out_func_proto(enum bpf_func_id func_id, struct bpf_prog *prog)
 {
 	switch (func_id) {
 	case BPF_FUNC_skb_load_bytes:
@@ -5850,7 +5850,7 @@ lwt_out_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 }
 
 static const struct bpf_func_proto *
-lwt_in_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
+lwt_in_func_proto(enum bpf_func_id func_id, struct bpf_prog *prog)
 {
 	switch (func_id) {
 	case BPF_FUNC_lwt_push_encap:
@@ -5861,7 +5861,7 @@ lwt_in_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 }
 
 static const struct bpf_func_proto *
-lwt_xmit_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
+lwt_xmit_func_proto(enum bpf_func_id func_id, struct bpf_prog *prog)
 {
 	switch (func_id) {
 	case BPF_FUNC_skb_get_tunnel_key:
@@ -5898,7 +5898,7 @@ lwt_xmit_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 }
 
 static const struct bpf_func_proto *
-lwt_seg6local_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
+lwt_seg6local_func_proto(enum bpf_func_id func_id, struct bpf_prog *prog)
 {
 	switch (func_id) {
 #if IS_ENABLED(CONFIG_IPV6_SEG6_BPF)
@@ -8124,7 +8124,7 @@ static const struct bpf_func_proto sk_reuseport_load_bytes_relative_proto = {
 
 static const struct bpf_func_proto *
 sk_reuseport_func_proto(enum bpf_func_id func_id,
-			const struct bpf_prog *prog)
+			struct bpf_prog *prog)
 {
 	switch (func_id) {
 	case BPF_FUNC_sk_select_reuseport:

From patchwork Sat Feb 23 01:06:56 2019
X-Patchwork-Submitter: Lawrence Brakmo
X-Patchwork-Id: 1047232
X-Patchwork-Delegate: bpf@iogearbox.net
From: brakmo
To: netdev
CC: Martin Lau, Alexei Starovoitov, Daniel Borkmann, Eric Dumazet, Kernel Team
Subject: [PATCH v2 bpf-next 2/9] bpf: Add bpf helper bpf_tcp_enter_cwr
Date: Fri, 22 Feb 2019 17:06:56 -0800
Message-ID: <20190223010703.678070-3-brakmo@fb.com>
In-Reply-To: <20190223010703.678070-1-brakmo@fb.com>
References: <20190223010703.678070-1-brakmo@fb.com>
X-Mailing-List: netdev@vger.kernel.org

From: Martin KaFai Lau

This patch adds a new bpf helper BPF_FUNC_tcp_enter_cwr
"int bpf_tcp_enter_cwr(struct bpf_tcp_sock *tp)".
It is added to BPF_PROG_TYPE_CGROUP_SKB, which can be attached to the
egress path, where the bpf prog is called by ip_finish_output() or
ip6_finish_output(). The verifier ensures that the argument is a
tcp_sock.

This helper makes a tcp_sock enter the CWR state. A bpf_prog can use
it to enforce a per-cgroupv2 egress network bandwidth limit. A later
patch adds a sample program showing how it can be used to limit
bandwidth usage per cgroupv2.

To ensure the helper is only called from BPF_CGROUP_INET_EGRESS,
attr->expected_attach_type must be set to BPF_CGROUP_INET_EGRESS at
load time if the prog uses this new helper. The newly added
prog->enforce_expected_attach_type bit is also set if the prog uses
this helper. The bit is needed for backward compatibility, because
prog->expected_attach_type has so far been ignored for
BPF_PROG_TYPE_CGROUP_SKB. At attach time, prog->expected_attach_type
is only enforced if the prog->enforce_expected_attach_type bit is set,
i.e. only if the prog uses this new helper.
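The three checks described above (load time, helper resolution, attach time) can be modeled in a few lines of user-space C. This is a hypothetical sketch under stated assumptions: the enum values, struct prog, load_check(), resolve_tcp_enter_cwr() and attach_check() are illustrative names, not kernel APIs, and the model ignores every other program type.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the checks described above; names are not the kernel's. */
enum attach_type { CGROUP_INET_INGRESS, CGROUP_INET_EGRESS, CGROUP_SOCK_OPS };

struct prog {
	enum attach_type expected_attach_type;
	unsigned int enforce_expected_attach_type:1;
};

/* Load time: a CGROUP_SKB prog may only declare ingress or egress. */
static int load_check(enum attach_type expected)
{
	switch (expected) {
	case CGROUP_INET_INGRESS:
	case CGROUP_INET_EGRESS:
		return 0;
	default:
		return -EINVAL;
	}
}

/* Verification time: is bpf_tcp_enter_cwr offered to this prog? */
static bool resolve_tcp_enter_cwr(struct prog *prog)
{
	if (prog->expected_attach_type != CGROUP_INET_EGRESS)
		return false; /* helper not offered, so the prog fails to load */
	prog->enforce_expected_attach_type = 1;
	return true;
}

/* Attach time: the expected type is enforced only when the bit is set. */
static int attach_check(const struct prog *prog, enum attach_type attach)
{
	return prog->enforce_expected_attach_type &&
	       prog->expected_attach_type != attach ? -EINVAL : 0;
}
```

A prog that never uses the helper keeps the old behavior and can be attached anywhere, while a prog that does use it is pinned to egress.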
Signed-off-by: Lawrence Brakmo
Signed-off-by: Martin KaFai Lau
---
 include/linux/bpf.h      |  1 +
 include/linux/filter.h   |  3 ++-
 include/uapi/linux/bpf.h |  9 ++++++++-
 kernel/bpf/syscall.c     | 12 ++++++++++++
 kernel/bpf/verifier.c    |  4 ++++
 net/core/filter.c        | 25 +++++++++++++++++++++++++
 6 files changed, 52 insertions(+), 2 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index d5ba2fc01af3..2d54ba7cf9dd 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -195,6 +195,7 @@ enum bpf_arg_type {
 	ARG_PTR_TO_SOCKET,	/* pointer to bpf_sock */
 	ARG_PTR_TO_SPIN_LOCK,	/* pointer to bpf_spin_lock */
 	ARG_PTR_TO_SOCK_COMMON,	/* pointer to sock_common */
+	ARG_PTR_TO_TCP_SOCK,    /* pointer to tcp_sock */
 };
 
 /* type of values returned from helper functions */
diff --git a/include/linux/filter.h b/include/linux/filter.h
index f32b3eca5a04..c6e878bdc5a6 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -510,7 +510,8 @@ struct bpf_prog {
 				blinded:1,	/* Was blinded */
 				is_func:1,	/* program is a bpf function */
 				kprobe_override:1, /* Do we override a kprobe? */
-				has_callchain_buf:1; /* callchain buffer allocated? */
+				has_callchain_buf:1, /* callchain buffer allocated? */
+				enforce_expected_attach_type:1; /* Enforce expected_attach_type checking at attach time */
 	enum bpf_prog_type	type;		/* Type of BPF program */
 	enum bpf_attach_type	expected_attach_type; /* For some prog types */
 	u32			len;		/* Number of filter blocks */
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index bcdd2474eee7..95b5058fa945 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -2359,6 +2359,12 @@ union bpf_attr {
  * 	Return
  * 		A **struct bpf_tcp_sock** pointer on success, or NULL in
  * 		case of failure.
+ *
+ * int bpf_tcp_enter_cwr(struct bpf_tcp_sock *tp)
+ *	Description
+ *		Make a tcp_sock enter CWR state.
+ *	Return
+ *		0 on success, or a negative error in case of failure.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -2457,7 +2463,8 @@ union bpf_attr {
 	FN(spin_lock),			\
 	FN(spin_unlock),		\
 	FN(sk_fullsock),		\
-	FN(tcp_sock),
+	FN(tcp_sock),			\
+	FN(tcp_enter_cwr),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index ec7c552af76b..9a478f2875cd 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1482,6 +1482,14 @@ bpf_prog_load_check_attach_type(enum bpf_prog_type prog_type,
 		default:
 			return -EINVAL;
 		}
+	case BPF_PROG_TYPE_CGROUP_SKB:
+		switch (expected_attach_type) {
+		case BPF_CGROUP_INET_INGRESS:
+		case BPF_CGROUP_INET_EGRESS:
+			return 0;
+		default:
+			return -EINVAL;
+		}
 	default:
 		return 0;
 	}
@@ -1725,6 +1733,10 @@ static int bpf_prog_attach_check_attach_type(const struct bpf_prog *prog,
 	case BPF_PROG_TYPE_CGROUP_SOCK:
 	case BPF_PROG_TYPE_CGROUP_SOCK_ADDR:
 		return attach_type == prog->expected_attach_type ? 0 : -EINVAL;
+	case BPF_PROG_TYPE_CGROUP_SKB:
+		return prog->enforce_expected_attach_type &&
+			prog->expected_attach_type != attach_type ?
+			-EINVAL : 0;
 	default:
 		return 0;
 	}
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 1b9496c41383..95fb385c6f3c 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -2424,6 +2424,10 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 regno,
 			return -EFAULT;
 		}
 		meta->ptr_id = reg->id;
+	} else if (arg_type == ARG_PTR_TO_TCP_SOCK) {
+		expected_type = PTR_TO_TCP_SOCK;
+		if (type != expected_type)
+			goto err_type;
 	} else if (arg_type == ARG_PTR_TO_SPIN_LOCK) {
 		if (meta->func_id == BPF_FUNC_spin_lock) {
 			if (process_spin_lock(env, regno, true))
diff --git a/net/core/filter.c b/net/core/filter.c
index 97916eedfe69..ca57ef25279c 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5426,6 +5426,24 @@ static const struct bpf_func_proto bpf_tcp_sock_proto = {
 	.arg1_type	= ARG_PTR_TO_SOCK_COMMON,
 };
 
+BPF_CALL_1(bpf_tcp_enter_cwr, struct tcp_sock *, tp)
+{
+	struct sock *sk = (struct sock *)tp;
+
+	if (sk->sk_state == TCP_ESTABLISHED) {
+		tcp_enter_cwr(sk);
+		return 0;
+	}
+
+	return -EINVAL;
+}
+
+static const struct bpf_func_proto bpf_tcp_enter_cwr_proto = {
+	.func        = bpf_tcp_enter_cwr,
+	.gpl_only    = false,
+	.ret_type    = RET_INTEGER,
+	.arg1_type    = ARG_PTR_TO_TCP_SOCK,
+};
 #endif /* CONFIG_INET */
 
 bool bpf_helper_changes_pkt_data(void *func)
@@ -5585,6 +5603,13 @@ cg_skb_func_proto(enum bpf_func_id func_id, struct bpf_prog *prog)
 #ifdef CONFIG_INET
 	case BPF_FUNC_tcp_sock:
 		return &bpf_tcp_sock_proto;
+	case BPF_FUNC_tcp_enter_cwr:
+		if (prog->expected_attach_type == BPF_CGROUP_INET_EGRESS) {
+			prog->enforce_expected_attach_type = 1;
+			return &bpf_tcp_enter_cwr_proto;
+		} else {
+			return NULL;
+		}
 #endif
 	default:
 		return sk_filter_func_proto(func_id, prog);

From patchwork Sat Feb 23 01:06:57 2019
X-Patchwork-Submitter: Lawrence Brakmo
X-Patchwork-Id: 1047231
X-Patchwork-Delegate: bpf@iogearbox.net
From: brakmo
To: netdev
CC: Martin Lau, Alexei Starovoitov, Daniel Borkmann, Eric Dumazet, Kernel Team
Subject: [PATCH v2 bpf-next 3/9] bpf: Test bpf_tcp_enter_cwr in test_verifier
Date: Fri, 22 Feb 2019 17:06:57 -0800
Message-ID: <20190223010703.678070-4-brakmo@fb.com>
In-Reply-To: <20190223010703.678070-1-brakmo@fb.com>
References: <20190223010703.678070-1-brakmo@fb.com>
X-Mailing-List: netdev@vger.kernel.org

This test ensures that the verifier checks that arg1 of
BPF_FUNC_tcp_enter_cwr is of type ARG_PTR_TO_TCP_SOCK.
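The two test cases can be read against a small model of the new ARG_PTR_TO_TCP_SOCK branch in check_func_arg(). This is a hypothetical user-space sketch, not verifier code: the enum, the reg_type_str table and check_tcp_sock_arg() are assumptions chosen to reproduce the error string the first test expects. In the first test, R1 holds skb->sk, which is a sock_common pointer after the NULL check, so the call is rejected; in the second, bpf_tcp_sock() has already produced a tcp_sock pointer, so the call is accepted.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical subset of verifier register types, with their log names. */
enum reg_type { PTR_TO_SOCK_COMMON, PTR_TO_SOCKET, PTR_TO_TCP_SOCK };

static const char *const reg_type_str[] = {
	[PTR_TO_SOCK_COMMON] = "sock_common",
	[PTR_TO_SOCKET]      = "sock",
	[PTR_TO_TCP_SOCK]    = "tcp_sock",
};

/*
 * Model of an ARG_PTR_TO_TCP_SOCK argument check: the register must hold
 * exactly a tcp_sock pointer; anything else is rejected with a log line.
 */
static int check_tcp_sock_arg(int regno, enum reg_type type,
			      char *buf, size_t buf_len)
{
	const enum reg_type expected_type = PTR_TO_TCP_SOCK;

	if (type != expected_type) {
		snprintf(buf, buf_len, "R%d type=%s expected=%s", regno,
			 reg_type_str[type], reg_type_str[expected_type]);
		return -EACCES;
	}
	buf[0] = '\0';
	return 0;
}
```

Because the check is an exact type match, a sock_common or full-socket pointer never satisfies it; the program must first narrow the pointer with bpf_tcp_sock().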
Signed-off-by: Martin KaFai Lau Signed-off-by: Lawrence Brakmo --- tools/testing/selftests/bpf/verifier/sock.c | 33 +++++++++++++++++++++ 1 file changed, 33 insertions(+) diff --git a/tools/testing/selftests/bpf/verifier/sock.c b/tools/testing/selftests/bpf/verifier/sock.c index 0ddfdf76aba5..b07a083eeb59 100644 --- a/tools/testing/selftests/bpf/verifier/sock.c +++ b/tools/testing/selftests/bpf/verifier/sock.c @@ -382,3 +382,36 @@ .result = REJECT, .errstr = "type=tcp_sock expected=sock", }, +{ + "bpf_tcp_enter_cwr(skb->sk)", + .insns = { + BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_1, offsetof(struct __sk_buff, sk)), + BPF_JMP_IMM(BPF_JNE, BPF_REG_1, 0, 2), + BPF_MOV64_IMM(BPF_REG_0, 0), + BPF_EXIT_INSN(), + BPF_EMIT_CALL(BPF_FUNC_tcp_enter_cwr), + BPF_MOV64_IMM(BPF_REG_0, 0), + BPF_EXIT_INSN(), + }, + .prog_type = BPF_PROG_TYPE_CGROUP_SKB, + .result = REJECT, + .errstr = "type=sock_common expected=tcp_sock", +}, +{ + "bpf_tcp_enter_cwr(bpf_tcp_sock(skb->sk))", + .insns = { + BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_1, offsetof(struct __sk_buff, sk)), + BPF_JMP_IMM(BPF_JNE, BPF_REG_1, 0, 2), + BPF_MOV64_IMM(BPF_REG_0, 0), + BPF_EXIT_INSN(), + BPF_EMIT_CALL(BPF_FUNC_tcp_sock), + BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1), + BPF_EXIT_INSN(), + BPF_MOV64_REG(BPF_REG_1, BPF_REG_0), + BPF_EMIT_CALL(BPF_FUNC_tcp_enter_cwr), + BPF_MOV64_IMM(BPF_REG_0, 0), + BPF_EXIT_INSN(), + }, + .prog_type = BPF_PROG_TYPE_CGROUP_SKB, + .result = ACCEPT, +}, From patchwork Sat Feb 23 01:06:58 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lawrence Brakmo X-Patchwork-Id: 1047236 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) 
Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=fb.com Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=fb.com header.i=@fb.com header.b="FHzS45dr"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 445qnQ5Xv5z9sBF for ; Sat, 23 Feb 2019 12:07:38 +1100 (AEDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727686AbfBWBHh (ORCPT ); Fri, 22 Feb 2019 20:07:37 -0500 Received: from mx0b-00082601.pphosted.com ([67.231.153.30]:44628 "EHLO mx0b-00082601.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725814AbfBWBH3 (ORCPT ); Fri, 22 Feb 2019 20:07:29 -0500 Received: from pps.filterd (m0109332.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.16.0.27/8.16.0.27) with SMTP id x1N0x0Uj020528 for ; Fri, 22 Feb 2019 17:07:27 -0800 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-type; s=facebook; bh=keJb9k0TuvmHBo0p+AQA422QrszLTKbFzrT2nPDe1a0=; b=FHzS45drQls1FtD+P2y0RipD0juYvuf90stPq2vonoKJO5LwAeWlzYTo+eq0rQUQVAei /JxrTievGR/RR3UOqWgIQ+coqcpZYjSc/fgBvShq0MEmayJx8BofOwz6hdwlrjk9/AVJ cp379OHVjt2bjuW87lca9h4Ssbmilz2rJTs= Received: from maileast.thefacebook.com ([199.201.65.23]) by mx0a-00082601.pphosted.com with ESMTP id 2qtubkr4es-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-SHA384 bits=256 verify=NOT) for ; Fri, 22 Feb 2019 17:07:27 -0800 Received: from mx-out.facebook.com (2620:10d:c0a1:3::13) by mail.thefacebook.com (2620:10d:c021:18::176) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA) id 15.1.1531.3; Fri, 22 Feb 2019 17:07:26 -0800 Received: by devbig009.ftw2.facebook.com (Postfix, from userid 10340) id 526F55AE1524; Fri, 22 Feb 2019 17:07:25 -0800 (PST) Smtp-Origin-Hostprefix: devbig From: brakmo Smtp-Origin-Hostname: 
devbig009.ftw2.facebook.com To: netdev CC: Martin Lau , Alexei Starovoitov , Daniel Borkmann , Eric Dumazet , Kernel Team Smtp-Origin-Cluster: ftw2c04 Subject: [PATCH v2 bpf-next 4/9] bpf: add bpf helper bpf_skb_ecn_set_ce Date: Fri, 22 Feb 2019 17:06:58 -0800 Message-ID: <20190223010703.678070-5-brakmo@fb.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20190223010703.678070-1-brakmo@fb.com> References: <20190223010703.678070-1-brakmo@fb.com> X-FB-Internal: Safe MIME-Version: 1.0 X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:, , definitions=2019-02-23_01:, , signatures=0 X-Proofpoint-Spam-Reason: safe X-FB-Internal: Safe Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org This patch adds a new bpf helper BPF_FUNC_skb_ecn_set_ce "int bpf_skb_ecn_set_ce(struct sk_buff *skb)". It is added to BPF_PROG_TYPE_CGROUP_SKB typed bpf_prog which currently can be attached to the ingress and egress path. The helper is needed because his type of bpf_prog cannot modify the skb directly. This helper is used to set the ECN field of ECN capable IP packets to ce (congestion encountered) in the IPv6 or IPv4 header of the skb. It can be used by a bpf_prog to manage egress or ingress network bandwdith limit per cgroupv2 by inducing an ECN response in the TCP sender. This works best when using DCTCP. Signed-off-by: Lawrence Brakmo --- include/uapi/linux/bpf.h | 10 +++++++++- net/core/filter.c | 14 ++++++++++++++ 2 files changed, 23 insertions(+), 1 deletion(-) diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 95b5058fa945..fc646f3eaf9b 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -2365,6 +2365,13 @@ union bpf_attr { * Make a tcp_sock enter CWR state. * Return * 0 on success, or a negative error in case of failure. 
+ * + * int bpf_skb_ecn_set_ce(struct sk_buf *skb) + * Description + * Sets ECN of IP header to ce (congestion encountered) if + * current value is ect (ECN capable). Works with IPv6 and IPv4. + * Return + * 1 if set, 0 if not set. */ #define __BPF_FUNC_MAPPER(FN) \ FN(unspec), \ @@ -2464,7 +2471,8 @@ union bpf_attr { FN(spin_unlock), \ FN(sk_fullsock), \ FN(tcp_sock), \ - FN(tcp_enter_cwr), + FN(tcp_enter_cwr), \ + FN(skb_ecn_set_ce), /* integer value in 'imm' field of BPF_CALL instruction selects which helper * function eBPF program intends to call diff --git a/net/core/filter.c b/net/core/filter.c index ca57ef25279c..955369c6ed30 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -5444,6 +5444,18 @@ static const struct bpf_func_proto bpf_tcp_enter_cwr_proto = { .ret_type = RET_INTEGER, .arg1_type = ARG_PTR_TO_TCP_SOCK, }; + +BPF_CALL_1(bpf_skb_ecn_set_ce, struct sk_buff *, skb) +{ + return INET_ECN_set_ce(skb); +} + +static const struct bpf_func_proto bpf_skb_ecn_set_ce_proto = { + .func = bpf_skb_ecn_set_ce, + .gpl_only = false, + .ret_type = RET_INTEGER, + .arg1_type = ARG_PTR_TO_CTX, +}; #endif /* CONFIG_INET */ bool bpf_helper_changes_pkt_data(void *func) @@ -5610,6 +5622,8 @@ cg_skb_func_proto(enum bpf_func_id func_id, struct bpf_prog *prog) } else { return NULL; } + case BPF_FUNC_skb_ecn_set_ce: + return &bpf_skb_ecn_set_ce_proto; #endif default: return sk_filter_func_proto(func_id, prog); From patchwork Sat Feb 23 01:06:59 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lawrence Brakmo X-Patchwork-Id: 1047234 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: 
From: brakmo
To: netdev
CC:
	Martin Lau , Alexei Starovoitov , Daniel Borkmann , Eric Dumazet , Kernel Team
Subject: [PATCH v2 bpf-next 5/9] bpf: Add bpf helper bpf_tcp_check_probe_timer
Date: Fri, 22 Feb 2019 17:06:59 -0800
Message-ID: <20190223010703.678070-6-brakmo@fb.com>
In-Reply-To: <20190223010703.678070-1-brakmo@fb.com>
References: <20190223010703.678070-1-brakmo@fb.com>

This patch adds a new bpf helper BPF_FUNC_tcp_check_probe_timer
"int bpf_tcp_check_probe_timer(struct bpf_tcp_sock *tp, u32 when_us)".
It is added to BPF_PROG_TYPE_CGROUP_SKB typed bpf_prog which currently can
be attached to the ingress and egress path.

To ensure it is only called from BPF_CGROUP_INET_EGRESS, the
attr->expected_attach_type must be specified as BPF_CGROUP_INET_EGRESS
during load time if the prog uses this new helper.
The newly added prog->enforce_expected_attach_type bit will also be set
if this new helper is used. This bit is needed for backward compatibility
because prog->expected_attach_type is currently ignored for
BPF_PROG_TYPE_CGROUP_SKB. During attach time, prog->expected_attach_type
is only enforced if the prog->enforce_expected_attach_type bit is set,
i.e. prog->expected_attach_type is only enforced if this new helper is
used by the prog.

The function forces when_us to be at least TCP_TIMEOUT_MIN (currently
2 jiffies) and no more than TCP_RTO_MIN (currently 200ms).

When using a bpf_prog to limit the egress bandwidth of a cgroup,
it can happen that we drop a packet of a connection that has no
packets out. In this case, the connection may not retry sending
the packet until the probe timer fires.
Since the default value of the probe timer is at least 200ms, this can
introduce link underutilization (i.e. the cgroup egress bandwidth being
smaller than the specified rate) and thus increased tail latency.
This helper function allows for setting a smaller probe timer.

Signed-off-by: Lawrence Brakmo
---
 include/uapi/linux/bpf.h | 12 +++++++++++-
 net/core/filter.c        | 32 ++++++++++++++++++++++++++++++++
 2 files changed, 43 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index fc646f3eaf9b..5d0bed852800 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -2372,6 +2372,15 @@ union bpf_attr {
  * 		current value is ect (ECN capable). Works with IPv6 and IPv4.
  * 	Return
  * 		1 if set, 0 if not set.
+ *
+ * int bpf_tcp_check_probe_timer(struct bpf_tcp_sock *tp, int when_us)
+ * 	Description
+ * 		Checks that there are no packets out and there is no pending
+ * 		timer. If both of these are true, it bounds when_us by
+ * 		TCP_TIMEOUT_MIN (2 jiffies) or TCP_RTO_MIN (200ms) and
+ * 		sets the probe timer.
+ * 	Return
+ * 		0
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -2472,7 +2481,8 @@ union bpf_attr {
 	FN(sk_fullsock),		\
 	FN(tcp_sock),			\
 	FN(tcp_enter_cwr),		\
-	FN(skb_ecn_set_ce),
+	FN(skb_ecn_set_ce),		\
+	FN(tcp_check_probe_timer),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
diff --git a/net/core/filter.c b/net/core/filter.c
index 955369c6ed30..7d7026768840 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5456,6 +5456,31 @@ static const struct bpf_func_proto bpf_skb_ecn_set_ce_proto = {
 	.ret_type	= RET_INTEGER,
 	.arg1_type	= ARG_PTR_TO_CTX,
 };
+
+BPF_CALL_2(bpf_tcp_check_probe_timer, struct tcp_sock *, tp, u32, when_us)
+{
+	struct sock *sk = (struct sock *) tp;
+	unsigned long when = usecs_to_jiffies(when_us);
+
+	if (!tp->packets_out && !inet_csk(sk)->icsk_pending) {
+		if (when < TCP_TIMEOUT_MIN)
+			when = TCP_TIMEOUT_MIN;
+		else if (when > TCP_RTO_MIN)
+			when = TCP_RTO_MIN;
+
+		tcp_reset_xmit_timer(sk, ICSK_TIME_PROBE0,
+				     when, TCP_RTO_MAX, NULL);
+	}
+	return 0;
+}
+
+static const struct bpf_func_proto bpf_tcp_check_probe_timer_proto = {
+	.func		= bpf_tcp_check_probe_timer,
+	.gpl_only	= false,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_TCP_SOCK,
+	.arg2_type	= ARG_ANYTHING,
+};
 #endif /* CONFIG_INET */
 
 bool bpf_helper_changes_pkt_data(void *func)
@@ -5624,6 +5649,13 @@ cg_skb_func_proto(enum bpf_func_id func_id, struct bpf_prog *prog)
 		}
 	case BPF_FUNC_skb_ecn_set_ce:
 		return &bpf_skb_ecn_set_ce_proto;
+	case BPF_FUNC_tcp_check_probe_timer:
+		if (prog->expected_attach_type == BPF_CGROUP_INET_EGRESS) {
+			prog->enforce_expected_attach_type = 1;
+			return &bpf_tcp_check_probe_timer_proto;
+		} else {
+			return NULL;
+		}
 #endif
 	default:
 		return sk_filter_func_proto(func_id, prog);

From patchwork Sat Feb 23 01:07:00 2019
X-Patchwork-Id: 1047235
From: brakmo
To: netdev
CC: Martin Lau , Alexei Starovoitov , Daniel Borkmann , Eric Dumazet , Kernel Team
Subject: [PATCH v2 bpf-next 6/9] bpf: sync bpf.h to tools and update bpf_helpers.h
Date: Fri, 22 Feb 2019 17:07:00 -0800
Message-ID: <20190223010703.678070-7-brakmo@fb.com>
In-Reply-To: <20190223010703.678070-1-brakmo@fb.com>
References: <20190223010703.678070-1-brakmo@fb.com>

This patch syncs the uapi bpf.h to tools/ and also updates
bpf_helpers.h in tools/.

Signed-off-by: Lawrence Brakmo
---
 tools/include/uapi/linux/bpf.h            | 27 ++++++++++++++++++++++-
 tools/testing/selftests/bpf/bpf_helpers.h |  6 +++++
 2 files changed, 32 insertions(+), 1 deletion(-)

diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index bcdd2474eee7..5d0bed852800 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -2359,6 +2359,28 @@ union bpf_attr {
  * 	Return
  * 		A **struct bpf_tcp_sock** pointer on success, or NULL in
  * 		case of failure.
+ *
+ * int bpf_tcp_enter_cwr(struct bpf_tcp_sock *tp)
+ * 	Description
+ * 		Make a tcp_sock enter CWR state.
+ * 	Return
+ * 		0 on success, or a negative error in case of failure.
+ *
+ * int bpf_skb_ecn_set_ce(struct sk_buff *skb)
+ * 	Description
+ * 		Sets ECN of IP header to ce (congestion encountered) if
+ * 		current value is ect (ECN capable). Works with IPv6 and IPv4.
+ * 	Return
+ * 		1 if set, 0 if not set.
+ *
+ * int bpf_tcp_check_probe_timer(struct bpf_tcp_sock *tp, int when_us)
+ * 	Description
+ * 		Checks that there are no packets out and there is no pending
+ * 		timer. If both of these are true, it bounds when_us by
+ * 		TCP_TIMEOUT_MIN (2 jiffies) or TCP_RTO_MIN (200ms) and
+ * 		sets the probe timer.
+ * 	Return
+ * 		0
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -2457,7 +2479,10 @@ union bpf_attr {
 	FN(spin_lock),			\
 	FN(spin_unlock),		\
 	FN(sk_fullsock),		\
-	FN(tcp_sock),
+	FN(tcp_sock),			\
+	FN(tcp_enter_cwr),		\
+	FN(skb_ecn_set_ce),		\
+	FN(tcp_check_probe_timer),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
diff --git a/tools/testing/selftests/bpf/bpf_helpers.h b/tools/testing/selftests/bpf/bpf_helpers.h
index d9999f1ed1d2..8aec59624ebc 100644
--- a/tools/testing/selftests/bpf/bpf_helpers.h
+++ b/tools/testing/selftests/bpf/bpf_helpers.h
@@ -180,6 +180,12 @@ static struct bpf_sock *(*bpf_sk_fullsock)(struct bpf_sock *sk) =
 	(void *) BPF_FUNC_sk_fullsock;
 static struct bpf_tcp_sock *(*bpf_tcp_sock)(struct bpf_sock *sk) =
 	(void *) BPF_FUNC_tcp_sock;
+static int (*bpf_tcp_enter_cwr)(struct bpf_tcp_sock *tp) =
+	(void *) BPF_FUNC_tcp_enter_cwr;
+static int (*bpf_skb_ecn_set_ce)(void *ctx) =
+	(void *) BPF_FUNC_skb_ecn_set_ce;
+static int (*bpf_tcp_check_probe_timer)(struct bpf_tcp_sock *tp, int when_us) =
+	(void *) BPF_FUNC_tcp_check_probe_timer;
 
 /* llvm builtin functions that eBPF C program may use to
  * emit BPF_LD_ABS and BPF_LD_IND instructions

From patchwork Sat Feb 23 01:07:01 2019
X-Patchwork-Id: 1047237
From: brakmo
To: netdev
CC: Martin Lau , Alexei Starovoitov , Daniel Borkmann , Eric Dumazet , Kernel Team
Subject: [PATCH v2 bpf-next 7/9] bpf: Sample NRM BPF program to limit egress bw
Date: Fri, 22 Feb 2019 17:07:01 -0800
Message-ID: <20190223010703.678070-8-brakmo@fb.com>
In-Reply-To: <20190223010703.678070-1-brakmo@fb.com>
References: <20190223010703.678070-1-brakmo@fb.com>

A cgroup skb BPF program to limit cgroup output bandwidth.
It uses a modified virtual token bucket queue to limit average
egress bandwidth. The implementation uses credits instead of tokens.
Negative credits imply that queueing would have happened (this is
a virtual queue, so no queueing is done by it; however, queueing may
occur at the actual qdisc, which is not used for rate limiting).

This implementation uses 3 thresholds, one to start marking packets and
the other two to drop packets:
                                 CREDIT
       - <--------------------------|------------------------> +
             |    |          |      0
             |  Large pkt    |
             |  drop thresh  |
  Small pkt drop        Mark threshold
      thresh

The effect of marking depends on the type of packet:
a) If the packet is ECN enabled and it is a TCP packet, then the packet
   is ECN marked. The current mark threshold is tuned for DCTCP.
b) If the packet is a TCP packet, then we probabilistically call tcp_cwr
   to reduce the congestion window.
   The current implementation uses a linear distribution (0% probability
   at marking threshold, 100% probability at drop threshold).
c) If the packet is not a TCP packet, then it is dropped.

If the credit is below the drop threshold, the packet is dropped. If it
is a TCP packet, then it also calls tcp_cwr since packets dropped by
a cgroup skb BPF program do not automatically trigger a call to
tcp_cwr in the current kernel code.

This BPF program actually uses 2 drop thresholds, one threshold
for larger packets (> 120 bytes) and another for smaller packets. This
protects smaller packets such as SYNs, ACKs, etc.

The default bandwidth limit is set at 1Gbps but this can be changed by
a user program through a shared BPF map. In addition, by default this BPF
program does not limit connections using loopback. This behavior can be
overridden by the user program. There is also an option to calculate
some statistics, such as percent of packets marked or dropped, which
the user program can access.
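For illustration, the credit accounting and threshold checks described above can be modeled in plain user-space C. This is a hedged sketch, not code from this series: it reuses the constants and the CREDIT_PER_NS() scaling from nrm_kern.h in this patch, while struct vqueue, vq_init(), vq_packet() and enum verdict are hypothetical names. It assumes a single thread (no bpf_spin_lock) and that every packet is an ECN-capable TCP packet, so the probabilistic tcp_cwr path and the non-TCP cases are not modeled; a MARK verdict stands for setting CE.

```c
#include <assert.h>
#include <stdint.h>

/* Constants and rate scaling copied from nrm_kern.h in this patch. */
#define MAX_BYTES_PER_PACKET	1500
#define MARK_THRESH		(80 * MAX_BYTES_PER_PACKET)
#define DROP_THRESH		(80 * 5 * MAX_BYTES_PER_PACKET)
#define LARGE_PKT_DROP_THRESH	(DROP_THRESH - (15 * MAX_BYTES_PER_PACKET))
#define LARGE_PKT_THRESH	120
#define MAX_CREDIT		(100 * MAX_BYTES_PER_PACKET)
/* rate is in bytes per ns << 20 */
#define CREDIT_PER_NS(delta, rate) ((((uint64_t)(delta)) * (rate)) >> 20)

/* Hypothetical verdicts: MARK stands for "set ECN CE". */
enum verdict { PASS, MARK, DROP };

/* Hypothetical single-threaded stand-in for struct nrm_vqueue. */
struct vqueue {
	uint64_t lasttime;	/* ns */
	int64_t credit;		/* bytes; negative means virtual queueing */
	uint32_t rate;		/* bytes per ns << 20 */
};

static void vq_init(struct vqueue *q, uint64_t now, uint32_t rate_mbps)
{
	q->lasttime = now;
	q->credit = MAX_CREDIT;		/* INIT_CREDIT == MAX_CREDIT in nrm_kern.h */
	q->rate = rate_mbps * 128;	/* same approximation as nrm_init_vqueue() */
}

/* Charge a len-byte packet sent at time now (ns) and classify it,
 * assuming it is an ECN-capable TCP packet.
 */
static enum verdict vq_packet(struct vqueue *q, uint64_t now, int len)
{
	int64_t delta = (int64_t)(now - q->lasttime);
	int64_t credit = q->credit;

	assert(len > 0);
	/* delta <= 0: a packet with a later timestamp already refilled */
	if (delta > 0) {
		uint64_t add = CREDIT_PER_NS(delta, q->rate);

		q->lasttime = now;
		/* refill, but never above MAX_CREDIT */
		if (add > (uint64_t)(MAX_CREDIT - credit))
			credit = MAX_CREDIT;
		else
			credit += (int64_t)add;
	}
	credit -= len;
	q->credit = credit;

	/* large packets hit the (less negative) large-packet threshold first */
	if (credit < -DROP_THRESH ||
	    (len > LARGE_PKT_THRESH && credit < -LARGE_PKT_DROP_THRESH)) {
		q->credit += len;	/* a dropped packet gives its credit back */
		return DROP;
	}
	if (credit < -MARK_THRESH)
		return MARK;
	return PASS;
}
```

With the 1024 Mbps default used by nrm_init_vqueue(), rate = 1024 * 128 = 131072, i.e. 131072 / 2^20 = 0.125 bytes/ns (1000 Mbps). Starting from MAX_CREDIT, a burst of 1500-byte packets sent at the same instant passes the first 180 packets, marks from the 181st on, and drops the 486th (returning its credit); a 1500-byte packet sent one second later passes again because the refill is capped at MAX_CREDIT.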

A later patch provides such a program (nrm.c).

Signed-off-by: Lawrence Brakmo
---
 samples/bpf/Makefile       |   2 +
 samples/bpf/nrm.h          |  31 ++++++
 samples/bpf/nrm_kern.h     | 137 ++++++++++++++++++++++++++
 samples/bpf/nrm_out_kern.c | 190 +++++++++++++++++++++++++++++++++++++
 4 files changed, 360 insertions(+)
 create mode 100644 samples/bpf/nrm.h
 create mode 100644 samples/bpf/nrm_kern.h
 create mode 100644 samples/bpf/nrm_out_kern.c

diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index a0ef7eddd0b3..897b467066fd 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -167,6 +167,7 @@ always += xdpsock_kern.o
 always += xdp_fwd_kern.o
 always += task_fd_query_kern.o
 always += xdp_sample_pkts_kern.o
+always += nrm_out_kern.o
 
 KBUILD_HOSTCFLAGS += -I$(objtree)/usr/include
 KBUILD_HOSTCFLAGS += -I$(srctree)/tools/lib/
@@ -266,6 +267,7 @@ $(BPF_SAMPLES_PATH)/*.c: verify_target_bpf $(LIBBPF)
 $(src)/*.c: verify_target_bpf $(LIBBPF)
 
 $(obj)/tracex5_kern.o: $(obj)/syscall_nrs.h
+$(obj)/nrm_out_kern.o: $(src)/nrm.h $(src)/nrm_kern.h
 
 # asm/sysreg.h - inline assembly used by it is incompatible with llvm.
 # But, there is no easy way to fix it, so just exclude it since it is
diff --git a/samples/bpf/nrm.h b/samples/bpf/nrm.h
new file mode 100644
index 000000000000..ea89d6027ff0
--- /dev/null
+++ b/samples/bpf/nrm.h
@@ -0,0 +1,31 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright (c) 2019 Facebook
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ *
+ * Include file for NRM programs
+ */
+struct nrm_vqueue {
+	struct bpf_spin_lock lock;
+	/* 4 byte hole */
+	unsigned long long lasttime;	/* In ns */
+	int credit;			/* In bytes */
+	unsigned int rate;		/* In bytes per NS << 20 */
+};
+
+struct nrm_queue_stats {
+	unsigned long rate;		/* in Mbps*/
+	unsigned long stats:1,		/* get NRM stats (marked, dropped,..) */
+		loopback:1;		/* also limit flows using loopback */
+	unsigned long long pkts_marked;
+	unsigned long long bytes_marked;
+	unsigned long long pkts_dropped;
+	unsigned long long bytes_dropped;
+	unsigned long long pkts_total;
+	unsigned long long bytes_total;
+	unsigned long long firstPacketTime;
+	unsigned long long lastPacketTime;
+};
diff --git a/samples/bpf/nrm_kern.h b/samples/bpf/nrm_kern.h
new file mode 100644
index 000000000000..e48d4d2944a9
--- /dev/null
+++ b/samples/bpf/nrm_kern.h
@@ -0,0 +1,137 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright (c) 2019 Facebook
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ *
+ * Include file for sample NRM BPF programs
+ */
+#define KBUILD_MODNAME "foo"
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include "bpf_endian.h"
+#include "bpf_helpers.h"
+#include "nrm.h"
+
+#define DROP_PKT	0
+#define ALLOW_PKT	1
+#define TCP_ECN_OK	1
+
+#define NRM_DEBUG 0  // Set to 1 to enable debugging
+#if NRM_DEBUG
+#define bpf_printk(fmt, ...)					\
+({								\
+	char ____fmt[] = fmt;					\
+	bpf_trace_printk(____fmt, sizeof(____fmt),		\
+			 ##__VA_ARGS__);			\
+})
+#else
+#define bpf_printk(fmt, ...)
+#endif
+
+#define INITIAL_CREDIT_PACKETS	100
+#define MAX_BYTES_PER_PACKET	1500
+#define MARK_THRESH		(80 * MAX_BYTES_PER_PACKET)
+#define DROP_THRESH		(80 * 5 * MAX_BYTES_PER_PACKET)
+#define LARGE_PKT_DROP_THRESH	(DROP_THRESH - (15 * MAX_BYTES_PER_PACKET))
+#define MARK_REGION_SIZE	(LARGE_PKT_DROP_THRESH - MARK_THRESH)
+#define LARGE_PKT_THRESH	120
+#define MAX_CREDIT		(100 * MAX_BYTES_PER_PACKET)
+#define INIT_CREDIT		(INITIAL_CREDIT_PACKETS * MAX_BYTES_PER_PACKET)
+
+// rate in bytes per ns << 20
+#define CREDIT_PER_NS(delta, rate) ((((u64)(delta)) * (rate)) >> 20)
+
+struct bpf_map_def SEC("maps") queue_state = {
+	.type = BPF_MAP_TYPE_CGROUP_STORAGE,
+	.key_size = sizeof(struct bpf_cgroup_storage_key),
+	.value_size = sizeof(struct nrm_vqueue),
+};
+BPF_ANNOTATE_KV_PAIR(queue_state, struct bpf_cgroup_storage_key,
+		     struct nrm_vqueue);
+
+struct bpf_map_def SEC("maps") queue_stats = {
+	.type = BPF_MAP_TYPE_ARRAY,
+	.key_size = sizeof(u32),
+	.value_size = sizeof(struct nrm_queue_stats),
+	.max_entries = 1,
+};
+BPF_ANNOTATE_KV_PAIR(queue_stats, int, struct nrm_queue_stats);
+
+struct nrm_pkt_info {
+	bool	is_ip;
+	bool	is_tcp;
+	short	ecn;
+};
+
+static __always_inline void nrm_get_pkt_info(struct __sk_buff *skb,
+					     struct nrm_pkt_info *pkti)
+{
+	struct iphdr iph;
+	struct ipv6hdr *ip6h;
+
+	bpf_skb_load_bytes(skb, 0, &iph, 12);
+	if (iph.version == 6) {
+		ip6h = (struct ipv6hdr *)&iph;
+		pkti->is_ip = true;
+		pkti->is_tcp = (ip6h->nexthdr == 6);
+		pkti->ecn = (ip6h->flow_lbl[0] >> 4) & INET_ECN_MASK;
+	} else if (iph.version == 4) {
+		pkti->is_ip = true;
+		pkti->is_tcp = (iph.protocol == 6);
+		pkti->ecn = iph.tos & INET_ECN_MASK;
+	} else {
+		pkti->is_ip = false;
+		pkti->is_tcp = false;
+		pkti->ecn = 0;
+	}
+}
+
+static __always_inline void nrm_init_vqueue(struct nrm_vqueue *qdp, int rate)
+{
+	bpf_printk("Initializing queue_state, rate:%d\n", rate * 128);
+	qdp->lasttime = bpf_ktime_get_ns();
+	qdp->credit = INIT_CREDIT;
+	qdp->rate = rate * 128;
+}
+
+static __always_inline void nrm_update_stats(struct nrm_queue_stats *qsp,
+					     int len,
+					     unsigned long long curtime,
+					     bool congestion_flag,
+					     bool drop_flag)
+{
+	if (qsp != NULL) {
+		// Following is needed for work conserving
+		__sync_add_and_fetch(&(qsp->bytes_total), len);
+		if (qsp->stats) {
+			// Optionally update statistics
+			if (qsp->firstPacketTime == 0)
+				qsp->firstPacketTime = curtime;
+			qsp->lastPacketTime = curtime;
+			__sync_add_and_fetch(&(qsp->pkts_total), 1);
+			if (congestion_flag) {
+				__sync_add_and_fetch(&(qsp->pkts_marked), 1);
+				__sync_add_and_fetch(&(qsp->bytes_marked), len);
+			}
+			if (drop_flag) {
+				__sync_add_and_fetch(&(qsp->pkts_dropped), 1);
+				__sync_add_and_fetch(&(qsp->bytes_dropped),
+						     len);
+			}
+		}
+	}
+}
diff --git a/samples/bpf/nrm_out_kern.c b/samples/bpf/nrm_out_kern.c
new file mode 100644
index 000000000000..2d4c5a647daa
--- /dev/null
+++ b/samples/bpf/nrm_out_kern.c
@@ -0,0 +1,190 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2019 Facebook
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ *
+ * Sample Network Resource Manager (NRM) BPF program.
+ *
+ * A cgroup skb BPF egress program to limit cgroup output bandwidth.
+ * It uses a modified virtual token bucket queue to limit average
+ * egress bandwidth. The implementation uses credits instead of tokens.
+ * Negative credits imply that queueing would have happened (this is
+ * a virtual queue, so no queueing is done by it; however, queueing may
+ * occur at the actual qdisc, which is not used for rate limiting).
+ *
+ * This implementation uses 3 thresholds, one to start marking packets and
+ * the other two to drop packets:
+ *                                  CREDIT
+ *        - <--------------------------|------------------------> +
+ *              |    |          |      0
+ *              |  Large pkt    |
+ *              |  drop thresh  |
+ *   Small pkt drop        Mark threshold
+ *       thresh
+ *
+ * The effect of marking depends on the type of packet:
+ * a) If the packet is ECN enabled and it is a TCP packet, then the packet
+ *    is ECN marked.
+ * b) If the packet is a TCP packet, then we probabilistically call tcp_cwr
+ *    to reduce the congestion window. The current implementation uses a linear
+ *    distribution (0% probability at marking threshold, 100% probability
+ *    at drop threshold).
+ * c) If the packet is not a TCP packet, then it is dropped.
+ *
+ * If the credit is below the drop threshold, the packet is dropped. If it
+ * is a TCP packet, then it also calls tcp_cwr since packets dropped by
+ * a cgroup skb BPF program do not automatically trigger a call to
+ * tcp_cwr in the current kernel code.
+ *
+ * This BPF program actually uses 2 drop thresholds, one threshold
+ * for larger packets (> 120 bytes) and another for smaller packets. This
+ * protects smaller packets such as SYNs, ACKs, etc.
+ *
+ * The default bandwidth limit is set at 1Gbps but this can be changed by
+ * a user program through a shared BPF map. In addition, by default this BPF
+ * program does not limit connections using loopback. This behavior can be
+ * overridden by the user program. There is also an option to calculate
+ * some statistics, such as percent of packets marked or dropped, which
+ * the user program can access.
+ *
+ * A later patch provides such a program (nrm.c).
+ */
+
+#include "nrm_kern.h"
+
+SEC("cgroup_skb/egress")
+int _nrm_out_cg(struct __sk_buff *skb)
+{
+	struct nrm_pkt_info pkti;
+	int len = skb->len;
+	unsigned int queue_index = 0;
+	unsigned long long curtime;
+	int credit;
+	signed long long delta = 0, zero = 0;
+	int max_credit = MAX_CREDIT;
+	bool congestion_flag = false;
+	bool drop_flag = false;
+	bool cwr_flag = false;
+	struct nrm_vqueue *qdp;
+	struct nrm_queue_stats *qsp = NULL;
+	int rv = ALLOW_PKT;
+
+	qsp = bpf_map_lookup_elem(&queue_stats, &queue_index);
+	if (qsp != NULL && !qsp->loopback && (skb->ifindex == 1))
+		return ALLOW_PKT;
+
+	nrm_get_pkt_info(skb, &pkti);
+
+	// We may want to account for the length of headers in len
+	// calculation, like ETH header + overhead, especially if it
+	// is a gso packet. But I am not doing it right now.
+
+	qdp = bpf_get_local_storage(&queue_state, 0);
+	if (!qdp)
+		return ALLOW_PKT;
+	else if (qdp->lasttime == 0)
+		nrm_init_vqueue(qdp, 1024);
+
+	curtime = bpf_ktime_get_ns();
+
+	// Begin critical section
+	bpf_spin_lock(&qdp->lock);
+	credit = qdp->credit;
+	delta = curtime - qdp->lasttime;
+	/* delta < 0 implies that another process with a curtime greater
+	 * than ours beat us to the critical section and already added
+	 * the new credit, so we should not add it ourselves
+	 */
+	if (delta > 0) {
+		qdp->lasttime = curtime;
+		credit += CREDIT_PER_NS(delta, qdp->rate);
+		if (credit > MAX_CREDIT)
+			credit = MAX_CREDIT;
+	}
+	credit -= len;
+	qdp->credit = credit;
+	bpf_spin_unlock(&qdp->lock);
+	// End critical section
+
+	// Check if we should update rate
+	if (qsp != NULL && (qsp->rate * 128) != qdp->rate) {
+		qdp->rate = qsp->rate * 128;
+		bpf_printk("Updating rate: %d (1sec:%llu bits)\n",
+			   (int)qdp->rate,
+			   CREDIT_PER_NS(1000000000, qdp->rate) * 8);
+	}
+
+	// Set flags (drop, congestion, cwr)
+	// Dropping => we are congested, so ignore congestion flag
+	if (pkti.is_ip) {
+		if (credit < -DROP_THRESH ||
+		    (len > LARGE_PKT_THRESH &&
+		     credit < -LARGE_PKT_DROP_THRESH)) {
+			// Very congested, set drop flag
+			drop_flag = true;
+			if (pkti.is_tcp && pkti.ecn == 0)
+				cwr_flag = true;
+		} else if (credit < 0) {
+			// Congested, set congestion flag
+			if (pkti.is_tcp || pkti.ecn) {
+				if (credit < -MARK_THRESH)
+					congestion_flag = true;
+				else
+					congestion_flag = false;
+			} else {
+				congestion_flag = true;
+			}
+		}
+
+		if (congestion_flag) {
+			if (!pkti.ecn || !bpf_skb_ecn_set_ce(skb)) {
+				if (pkti.is_tcp) {
+					u32 rand = bpf_get_prandom_u32();
+
+					if (-credit >= MARK_THRESH +
+					    (rand % MARK_REGION_SIZE)) {
+						// Do cong avoidance
+						cwr_flag = true;
+					}
+				} else if (len > LARGE_PKT_THRESH) {
+					// Problem if too many small packets?
+					drop_flag = true;
+					congestion_flag = false;
+				}
+			}
+		}
+
+		if (pkti.is_tcp && (drop_flag || cwr_flag)) {
+			struct bpf_sock *sk;
+			struct bpf_tcp_sock *tp = NULL;
+
+			sk = skb->sk;
+			if (sk) {
+				sk = bpf_sk_fullsock(sk);
+				if (sk)
+					tp = bpf_tcp_sock(sk);
+			}
+			if (tp && drop_flag)
+				bpf_tcp_check_probe_timer(tp, 20000);
+			if (tp && cwr_flag)
+				bpf_tcp_enter_cwr(tp);
+		}
+
+		if (drop_flag)
+			rv = DROP_PKT;
+
+	} else if (credit < -MARK_THRESH) {
+		drop_flag = true;
+		rv = DROP_PKT;
+	}
+
+	nrm_update_stats(qsp, len, curtime, congestion_flag, drop_flag);
+
+	if (rv == DROP_PKT)
+		__sync_add_and_fetch(&(qdp->credit), len);
+
+	return rv;
+}
+char _license[] SEC("license") = "GPL";

From patchwork Sat Feb 23 01:07:02 2019
X-Patchwork-Id: 1047238
From: brakmo
To: netdev
CC: Martin Lau , Alexei Starovoitov , Daniel Borkmann , Eric Dumazet , Kernel Team
Subject: [PATCH v2 bpf-next 8/9] bpf: User program for testing NRM
Date: Fri, 22 Feb 2019 17:07:02 -0800
Message-ID: <20190223010703.678070-9-brakmo@fb.com>
In-Reply-To: <20190223010703.678070-1-brakmo@fb.com>
References: <20190223010703.678070-1-brakmo@fb.com>

The program nrm creates a cgroup and attaches a BPF program to the
cgroup for testing NRM for egress traffic. One still needs to create
network traffic. This can be done through netesto, netperf or iperf3.
A follow-up patch contains a script to create traffic.

USAGE: nrm [-d] [-l] [-n ] [-r ] [-s] [-t ] [-w] [-h] [prog]
  Where:
    -d        Print BPF trace debug buffer
    -l        Also limit flows doing loopback
    -n <#>    To create cgroup "/nrm#" and attach prog.
              Default is /nrm1
              This is convenient when testing NRM in more than 1 cgroup
    -r        Rate limit in Mbps
    -s        Get NRM stats (marked, dropped, etc.)
    -t