From patchwork Tue Aug 11 09:35:29 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nicolas Rybowski X-Patchwork-Id: 1343225 X-Patchwork-Delegate: matthieu.baerts@tessares.net Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.01.org (client-ip=198.145.21.10; helo=ml01.01.org; envelope-from=mptcp-bounces@lists.01.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=tessares.net Authentication-Results: ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=tessares-net.20150623.gappssmtp.com header.i=@tessares-net.20150623.gappssmtp.com header.a=rsa-sha256 header.s=20150623 header.b=FKGO5kbj; dkim-atps=neutral Received: from ml01.01.org (ml01.01.org [198.145.21.10]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4BQnmN55QYz9sTM for ; Tue, 11 Aug 2020 19:37:08 +1000 (AEST) Received: from ml01.vlan13.01.org (localhost [IPv6:::1]) by ml01.01.org (Postfix) with ESMTP id A505312F10D19; Tue, 11 Aug 2020 02:37:06 -0700 (PDT) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=2a00:1450:4864:20::644; helo=mail-ej1-x644.google.com; envelope-from=nicolas.rybowski@tessares.net; receiver= Received: from mail-ej1-x644.google.com (mail-ej1-x644.google.com [IPv6:2a00:1450:4864:20::644]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id D363912F088E3 for ; Tue, 11 Aug 2020 02:37:03 -0700 (PDT) Received: by mail-ej1-x644.google.com with SMTP id d6so12371621ejr.5 for ; Tue, 11 Aug 2020 02:37:03 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=tessares-net.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=feNQy4HaOBB3oHc8oiXYrtw1QKxUgNHYoWbfFfLPrhE=; b=FKGO5kbjNFy5+AQO/lN55aiDIAuW/zjfsNPGYtFBUuF7TeESj4YfUc/KiiO8xIvsjM 6Tl0SurNYdYeaix8zTXhb+4sV88d+MXgPwm5GiospwtOd0B9Mo0U2CCNJn8aEeMN5aF6 fdqYLPQJc/5nw8nylgk1RddvTgZJuQUMm5Ek9nHRsVV5TIgd0Xz6eNk4VQfPL0jxHenh TlMPzhIbEOQlBvqRwkp3lgfebT/1pyaujm1rRfUyxeeXsrD+W4MjeIWVzwHaAvucbvYi 2/BZ7YmjuJphF7LVnTMtY03KSe32kBAZ7Ua4E4pxv1PQ1Ic9wdTg6p3rzG/P0SbmQvuX exdg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=feNQy4HaOBB3oHc8oiXYrtw1QKxUgNHYoWbfFfLPrhE=; b=i0/evWhNXt3xVKN4Dk8nXGyrUtWN5oL6+OGTxTQlNEv1MaRgwBPKuj0tBGEmJFndzI 9E6ohTEG6A0lIYYnaWJGCEmpK+tDqfmZFcwONZZInxFrbiVcAckiRn3l2wv1HENKNamR sKFHUo0lzl65g18rwshiGIQ/TUlfTI+AEPF7sFQWmDDIdSvrTdB0MuR+1ZtdIu6XYICl Nt0lF+owNGCfW/XhCdWwzqy9Ctn5DVMaZ7Y+i1G7t+osJV8VMrGMl3X5CzN87UAg1CDA xtbtgP2tvFJMbx3S0pYFMnOyJl1HQQunvGAyFOeOhrLHznaAp9QQ/BADHAyKP/IToGd0 UWJA== X-Gm-Message-State: AOAM5332QVqM99lMBG2WLNSFUrEMxGpdyCaah3CaIYzB/8dyalhK9mn6 Db0hiv0BKcufJqfqcikWQeOlaonB4vyfzg== X-Google-Smtp-Source: ABdhPJySjMsupEKrHKtegQhHzNMiVtqmGO5Pnbk0MJoCqGsJAOXMuRDISB3fR/iHta/72YZkmoRcDw== X-Received: by 2002:a17:906:5f8f:: with SMTP id a15mr20356426eju.291.1597138621873; Tue, 11 Aug 2020 02:37:01 -0700 (PDT) Received: from localhost.localdomain (223.60-242-81.adsl-dyn.isp.belgacom.be. [81.242.60.223]) by smtp.gmail.com with ESMTPSA id z10sm14649456eje.122.2020.08.11.02.37.01 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 11 Aug 2020 02:37:01 -0700 (PDT) From: Nicolas Rybowski To: mptcp@lists.01.org Date: Tue, 11 Aug 2020 11:35:29 +0200 Message-Id: <20200811093531.27768-2-nicolas.rybowski@tessares.net> X-Mailer: git-send-email 2.28.0 In-Reply-To: <20200811093531.27768-1-nicolas.rybowski@tessares.net> References: <20200811093531.27768-1-nicolas.rybowski@tessares.net> MIME-Version: 1.0 Message-ID-Hash: RR2HYUZCIA2IP45H7ZUV2XO6GQ2EAHUK X-Message-ID-Hash: RR2HYUZCIA2IP45H7ZUV2XO6GQ2EAHUK X-MailFrom: nicolas.rybowski@tessares.net X-Mailman-Rule-Hits: member-moderation X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address CC: gregory.detal@tessares.net, Nicolas Rybowski X-Mailman-Version: 3.1.1 Precedence: list Subject: [MPTCP] [PATCH mptcp-next v2 1/3] bpf: expose is_mptcp flag to bpf_tcp_sock List-Id: Discussions regarding MPTCP upstreaming Archived-At: List-Archive: List-Help: List-Post: List-Subscribe: List-Unsubscribe: is_mptcp is a field from struct tcp_sock used to indicate that the current tcp_sock is part of the MPTCP protocol. In this protocol, a first socket (mptcp_sock) is created with sk_protocol set to IPPROTO_MPTCP (=262) for control purpose but it isn't directly on the wire. This is the role of the subflow (kernel) sockets which are classical tcp_sock with sk_protocol set to IPPROTO_TCP. The only way to differentiate such sockets from plain TCP sockets is the is_mptcp field from tcp_sock. Such an exposure in BPF is thus required to be able to differentiate plain TCP sockets from MPTCP subflow sockets in BPF_PROG_TYPE_SOCK_OPS programs. The choice has been made to silently pass the case when CONFIG_MPTCP is unset by defaulting is_mptcp to 0 in order to make BPF independent of the MPTCP configuration. Another solution is to make the verifier fail in 'bpf_tcp_sock_is_valid_ctx_access' but this will add an additional '#ifdef CONFIG_MPTCP' in the BPF code and a same injected BPF program will not run if MPTCP is not set. An example use-case is provided in https://github.com/multipath-tcp/mptcp_net-next/tree/scripts/bpf/examples Suggested-by: Matthieu Baerts Acked-by: Matthieu Baerts Signed-off-by: Nicolas Rybowski --- include/uapi/linux/bpf.h | 1 + net/core/filter.c | 9 ++++++++- tools/include/uapi/linux/bpf.h | 1 + 3 files changed, 10 insertions(+), 1 deletion(-) diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index b134e679e9db..1a0caa4abc2d 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -3864,6 +3864,7 @@ struct bpf_tcp_sock { __u32 delivered; /* Total data packets delivered incl. rexmits */ __u32 delivered_ce; /* Like the above but only ECE marked packets */ __u32 icsk_retransmits; /* Number of unrecovered [RTO] timeouts */ + __u32 is_mptcp; /* Is MPTCP subflow? */ }; struct bpf_sock_tuple { diff --git a/net/core/filter.c b/net/core/filter.c index 7124f0fe6974..ce83e1ec5259 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -5742,7 +5742,7 @@ bool bpf_tcp_sock_is_valid_access(int off, int size, enum bpf_access_type type, struct bpf_insn_access_aux *info) { if (off < 0 || off >= offsetofend(struct bpf_tcp_sock, - icsk_retransmits)) + is_mptcp)) return false; if (off % size != 0) @@ -5876,6 +5876,13 @@ u32 bpf_tcp_sock_convert_ctx_access(enum bpf_access_type type, case offsetof(struct bpf_tcp_sock, icsk_retransmits): BPF_INET_SOCK_GET_COMMON(icsk_retransmits); break; + case offsetof(struct bpf_tcp_sock, is_mptcp): +#ifdef CONFIG_MPTCP + BPF_TCP_SOCK_GET_COMMON(is_mptcp); +#else + *insn++ = BPF_MOV32_IMM(si->dst_reg, 0); +#endif + break; } return insn - insn_buf; diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h index b134e679e9db..1a0caa4abc2d 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -3864,6 +3864,7 @@ struct bpf_tcp_sock { __u32 delivered; /* Total data packets delivered incl. rexmits */ __u32 delivered_ce; /* Like the above but only ECE marked packets */ __u32 icsk_retransmits; /* Number of unrecovered [RTO] timeouts */ + __u32 is_mptcp; /* Is MPTCP subflow? */ }; struct bpf_sock_tuple { From patchwork Tue Aug 11 09:35:30 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nicolas Rybowski X-Patchwork-Id: 1343226 X-Patchwork-Delegate: matthieu.baerts@tessares.net Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.01.org (client-ip=198.145.21.10; helo=ml01.01.org; envelope-from=mptcp-bounces@lists.01.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=tessares.net Authentication-Results: ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=tessares-net.20150623.gappssmtp.com header.i=@tessares-net.20150623.gappssmtp.com header.a=rsa-sha256 header.s=20150623 header.b=fmxxEFRi; dkim-atps=neutral Received: from ml01.01.org (ml01.01.org [198.145.21.10]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4BQnmS16Mrz9sTM for ; Tue, 11 Aug 2020 19:37:12 +1000 (AEST) Received: from ml01.vlan13.01.org (localhost [IPv6:::1]) by ml01.01.org (Postfix) with ESMTP id B124212F10D1B; Tue, 11 Aug 2020 02:37:10 -0700 (PDT) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=2a00:1450:4864:20::541; helo=mail-ed1-x541.google.com; envelope-from=nicolas.rybowski@tessares.net; receiver= Received: from mail-ed1-x541.google.com (mail-ed1-x541.google.com [IPv6:2a00:1450:4864:20::541]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id CE5C112F10D1A for ; Tue, 11 Aug 2020 02:37:07 -0700 (PDT) Received: by mail-ed1-x541.google.com with SMTP id i26so8566111edv.4 for ; Tue, 11 Aug 2020 02:37:07 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=tessares-net.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=6IYLBlXtSnjpd2UboN+lCe+/sCY5S1/J8LD4k1K4IXs=; b=fmxxEFRiu7khyXKMJcl4f8CK6k+sakxxsgVMpOyKw3xObqSMMpy1NXDl02iDfJW8jm iP6oqSzJYtEZ5U0MSspjOVlBRqAn/NlKO6pnm4Fxj9Knbd1dp0bjelMKVdpVMh6BdQ0g gtvbjaCbJPa3ZBSv3VZ6gHWQdXIC97zsdHv/Lb+1I969PwAQr83zda7ZcZYtM0LFPaht yQknhY6LjYh6QHvMqHqqDPixNRM0mnSpR7StJlmOva2da8WBv8zVhrZl8VXG6FaGSdBI 02uy8fXx0VanMG7krzQ2WNbylLtUMKc8s4G2WtaMmpGbjT4vuT1X/kbVN7t1N9hE/QIa 4Hvw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=6IYLBlXtSnjpd2UboN+lCe+/sCY5S1/J8LD4k1K4IXs=; b=egLdnqb7rC6z6Q3i20dNmReBf/g+CgSw0m2mV8xvoxrxqUKIgsbB+o3OCtWXH+unNw Xsw4I2U3lSvht9ImII+Kn85Vu6uX1649cDMzRC1BCDZXt1NLt4Gu9XVtesjTzV0XpXFS 1ToKYMPr0L7/y4gE74cOkeUxu33ICxX6+dGlstGvIzLcwpH0buxtALbXywzW+PCf18IM xJehj9qUYzd7b7AGUmpAlnZUtDHFG2Kxo9o0QKVOi2Q/NObblqxFL2HFfUdONGDLGSZt AHjkwUL2wcJwPc16jKiaS7mCM19/f5dxKbS+x2EmFaOJR17JeTC7/6XNCodBbOsCTct0 RTtg== X-Gm-Message-State: AOAM532i6Sp/cUxAiO6xsf7P3Xdk9+ouhfqENGhZ9i3kV16L9K8KwLLv 2z5pgPQrsAOrsA+ejbYvfx6+B2d0lVOKoQ== X-Google-Smtp-Source: ABdhPJx8gfXh9Nz6DTDclhB17k0TuaIj52qVLmc/PIig0O0Ah4aIgiT3I8dnFGuFAxcGB14QLuf1Uw== X-Received: by 2002:aa7:d3d9:: with SMTP id o25mr25848749edr.362.1597138625898; Tue, 11 Aug 2020 02:37:05 -0700 (PDT) Received: from localhost.localdomain (223.60-242-81.adsl-dyn.isp.belgacom.be. [81.242.60.223]) by smtp.gmail.com with ESMTPSA id z10sm14649456eje.122.2020.08.11.02.37.05 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 11 Aug 2020 02:37:05 -0700 (PDT) From: Nicolas Rybowski To: mptcp@lists.01.org Date: Tue, 11 Aug 2020 11:35:30 +0200 Message-Id: <20200811093531.27768-3-nicolas.rybowski@tessares.net> X-Mailer: git-send-email 2.28.0 In-Reply-To: <20200811093531.27768-1-nicolas.rybowski@tessares.net> References: <20200811093531.27768-1-nicolas.rybowski@tessares.net> MIME-Version: 1.0 Message-ID-Hash: RFHOAGYQJVFADUWVEOYK265L5DWS2SQU X-Message-ID-Hash: RFHOAGYQJVFADUWVEOYK265L5DWS2SQU X-MailFrom: nicolas.rybowski@tessares.net X-Mailman-Rule-Hits: member-moderation X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address CC: gregory.detal@tessares.net, Nicolas Rybowski X-Mailman-Version: 3.1.1 Precedence: list Subject: [MPTCP] [PATCH mptcp-next v2 2/3] mptcp: attach subflow socket to parent cgroup List-Id: Discussions regarding MPTCP upstreaming Archived-At: List-Archive: List-Help: List-Post: List-Subscribe: List-Unsubscribe: It has been observed that the kernel sockets created for the subflows (except the first one) are not in the same cgroup as their parents. That's because the additional subflows are created by kernel workers. This is a problem with eBPF programs attached to the parent's cgroup won't be executed for the children. But also with any other features of CGroup linked to a sk. This patch fixes this behaviour. As the subflow sockets are created by the kernel, we can't use 'mem_cgroup_sk_alloc' because of the current context being the one of the kworker. This is why we have to do low level memcg manipulation, if required. Suggested-by: Matthieu Baerts Acked-by: Matthieu Baerts Signed-off-by: Nicolas Rybowski --- Notes: v1 -> v2: - update cgroup attachment code in subflow.c due to additional #ifdef net/mptcp/subflow.c | 27 +++++++++++++++++++++++++++ 1 file changed, 27 insertions(+) diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c index 3c2ca2108ac2..82f6d2d9e39e 100644 --- a/net/mptcp/subflow.c +++ b/net/mptcp/subflow.c @@ -1109,6 +1109,30 @@ int __mptcp_subflow_connect(struct sock *sk, const struct mptcp_addr_info *loc, return err; } +static void mptcp_attach_cgroup(struct sock *parent, struct sock *child) +{ +#ifdef CONFIG_SOCK_CGROUP_DATA + struct sock_cgroup_data *parent_skcd = &parent->sk_cgrp_data, + *child_skcd = &child->sk_cgrp_data; + + /* only the additional subflows created by kworkers have to be modified */ + if (cgroup_id(sock_cgroup_ptr(parent_skcd)) != + cgroup_id(sock_cgroup_ptr(child_skcd))) { +#ifdef CONFIG_MEMCG + struct mem_cgroup *memcg = parent->sk_memcg; + + mem_cgroup_sk_free(child); + if (memcg && css_tryget(&memcg->css)) + child->sk_memcg = memcg; +#endif /* CONFIG_MEMCG */ + + cgroup_sk_free(child_skcd); + *child_skcd = *parent_skcd; + cgroup_sk_clone(child_skcd); + } +#endif /* CONFIG_SOCK_CGROUP_DATA */ +} + int mptcp_subflow_create_socket(struct sock *sk, struct socket **new_sock) { struct mptcp_subflow_context *subflow; @@ -1129,6 +1153,9 @@ int mptcp_subflow_create_socket(struct sock *sk, struct socket **new_sock) lock_sock(sf->sk); + /* the newly created socket has to be in the same cgroup as its parent */ + mptcp_attach_cgroup(sk, sf->sk); + /* kernel sockets do not by default acquire net ref, but TCP timer * needs it. */ From patchwork Tue Aug 11 09:35:31 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nicolas Rybowski X-Patchwork-Id: 1343228 X-Patchwork-Delegate: matthieu.baerts@tessares.net Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.01.org (client-ip=2001:19d0:306:5::1; helo=ml01.01.org; envelope-from=mptcp-bounces@lists.01.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=tessares.net Authentication-Results: ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=tessares-net.20150623.gappssmtp.com header.i=@tessares-net.20150623.gappssmtp.com header.a=rsa-sha256 header.s=20150623 header.b=mlq+QTzS; dkim-atps=neutral Received: from ml01.01.org (ml01.01.org [IPv6:2001:19d0:306:5::1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4BQnmY1QgDz9sTS for ; Tue, 11 Aug 2020 19:37:17 +1000 (AEST) Received: from ml01.vlan13.01.org (localhost [IPv6:::1]) by ml01.01.org (Postfix) with ESMTP id C5E4F12F10D1E; Tue, 11 Aug 2020 02:37:15 -0700 (PDT) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=2a00:1450:4864:20::541; helo=mail-ed1-x541.google.com; envelope-from=nicolas.rybowski@tessares.net; receiver= Received: from mail-ed1-x541.google.com (mail-ed1-x541.google.com [IPv6:2a00:1450:4864:20::541]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits)) (No client certificate requested) by ml01.01.org (Postfix) with ESMTPS id 37A1912F10D08 for ; Tue, 11 Aug 2020 02:37:14 -0700 (PDT) Received: by mail-ed1-x541.google.com with SMTP id di22so8547291edb.12 for ; Tue, 11 Aug 2020 02:37:14 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=tessares-net.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=kveaitru+JOMXKIeqMzXUmhDPuhg3tdJOshWorWCRcg=; b=mlq+QTzSvL9ZK7CUsORwUF7loPRPEJskJOgJb02TkZSsB/woQe+2ZlRDMO4+LDkdYk LR8srkq64tf9LqRuztBhbgiL86No+zQ+FyzQq8JW0kHj7QcyNdoDqofwxD1RFqMjp4bR h13B8GH+eEFNJI6b+NFBQGwFc0X8AljdVbW4UBUuki/ZsPO3CeQITKjd6oFn/++SBVv6 DSQhchDEre5G0UTDPeJutNtO89Revmb+uqf9DDGhLiOxstg0ve/qXgq2AYVHPHvvaUZH fULk0byJNIuBc/gOc1+bDRJGR8l9QU8JkgCuCsot+lEuXNnb+EzwSaMAe5KWJ5VlDpM4 iEUw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=kveaitru+JOMXKIeqMzXUmhDPuhg3tdJOshWorWCRcg=; b=r02rK0dpW5wlxsYJ3vWZIkb+cPNemGF23qGIYRruxUAeb7MzM+YavudEaSBD1T+77t P3nbYNUhr+Ls+aumkwKPmpfxWWbnTzAeBU5NZnF3uPGmbY0eTuwYhgQcaexvGB7QZyDB 9emoLOSAtiOjD2gzCOYZS0eRZpz1l/iMri1MxtQO/bOkJdYh/aOM+STGgmD1aOWY7rMc hnEn3MsKYl3D89HblPHe9QOopUnz5ui8RxxoO4/thuwy6SWjFfmpjLFTPbOh9ka2QIhm 5evzMuzJld2baDp5X9bGFoD6CZLVgBiQaTALIiHpfxCDjMpUoi2q+OFI/oxxYKlhC8iQ fH1A== X-Gm-Message-State: AOAM5302u2imINFFKIyl1FP7bl/fWmbCDXU1NuJJFb7MVLqGh7xBmi2p ezFeUxhZlc+lLrf9NtahjB3EqhScYHhjoA== X-Google-Smtp-Source: ABdhPJyqqjb3AQSF2mIq595/Kxc8h9ijlJo1mfiqsFHoL8hdx68HC31AxL3z5ROh58UF44NqTU1wvA== X-Received: by 2002:a50:e047:: with SMTP id g7mr24560575edl.290.1597138632204; Tue, 11 Aug 2020 02:37:12 -0700 (PDT) Received: from localhost.localdomain (223.60-242-81.adsl-dyn.isp.belgacom.be. [81.242.60.223]) by smtp.gmail.com with ESMTPSA id z10sm14649456eje.122.2020.08.11.02.37.11 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 11 Aug 2020 02:37:11 -0700 (PDT) From: Nicolas Rybowski To: mptcp@lists.01.org Date: Tue, 11 Aug 2020 11:35:31 +0200 Message-Id: <20200811093531.27768-4-nicolas.rybowski@tessares.net> X-Mailer: git-send-email 2.28.0 In-Reply-To: <20200811093531.27768-1-nicolas.rybowski@tessares.net> References: <20200811093531.27768-1-nicolas.rybowski@tessares.net> MIME-Version: 1.0 Message-ID-Hash: UF75NINME2DNLHH4HYQVWIYM4NAQCX3R X-Message-ID-Hash: UF75NINME2DNLHH4HYQVWIYM4NAQCX3R X-MailFrom: nicolas.rybowski@tessares.net X-Mailman-Rule-Hits: member-moderation X-Mailman-Rule-Misses: dmarc-mitigation; no-senders; approved; emergency; loop; banned-address CC: gregory.detal@tessares.net, Nicolas Rybowski X-Mailman-Version: 3.1.1 Precedence: list Subject: [MPTCP] [PATCH mptcp-next v2 3/3] bpf: add 'bpf_mptcp_sock' structure and helper List-Id: Discussions regarding MPTCP upstreaming Archived-At: List-Archive: List-Help: List-Post: List-Subscribe: List-Unsubscribe: In order to precisely identify the parent MPTCP connection of a subflow, it is required to access the mptcp_sock's token which uniquely identify a MPTCP connection. This patch adds a new structure 'bpf_mptcp_sock' exposing the 'token' field of the 'mptcp_sock' extracted from a subflow's 'tcp_sock'. It also adds the declaration of a new BPF helper of the same name to expose the newly defined structure in the userspace BPF API. This is the foundation to expose more MPTCP-specific fields through BPF. Currently, it is limited to the field 'token' of the msk but it is easily extensible. Acked-by: Matthieu Baerts Signed-off-by: Nicolas Rybowski --- Notes: v1 -> v2: - move new BPF helper implementation in net/mptcp/bpf.c - add bpf.c in Makefile of net/mptcp - minor cosmetic changes: alignment of function's arguments on open parenthesis include/linux/bpf.h | 33 ++++++++++++++++ include/uapi/linux/bpf.h | 13 ++++++ kernel/bpf/verifier.c | 30 ++++++++++++++ net/core/filter.c | 4 ++ net/mptcp/Makefile | 2 + net/mptcp/bpf.c | 72 ++++++++++++++++++++++++++++++++++ scripts/bpf_helpers_doc.py | 2 + tools/include/uapi/linux/bpf.h | 13 ++++++ 8 files changed, 169 insertions(+) create mode 100644 net/mptcp/bpf.c diff --git a/include/linux/bpf.h b/include/linux/bpf.h index cef4ef0d2b4e..6eed7d870a52 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -281,6 +281,7 @@ enum bpf_return_type { RET_PTR_TO_SOCK_COMMON_OR_NULL, /* returns a pointer to a sock_common or NULL */ RET_PTR_TO_ALLOC_MEM_OR_NULL, /* returns a pointer to dynamically allocated memory or NULL */ RET_PTR_TO_BTF_ID_OR_NULL, /* returns a pointer to a btf_id or NULL */ + RET_PTR_TO_MPTCP_SOCK_OR_NULL, /* returns a pointer to mptcp_sock or NULL */ }; /* eBPF function prototype used by verifier to allow BPF_CALLs from eBPF programs @@ -360,6 +361,8 @@ enum bpf_reg_type { PTR_TO_RDONLY_BUF_OR_NULL, /* reg points to a readonly buffer or NULL */ PTR_TO_RDWR_BUF, /* reg points to a read/write buffer */ PTR_TO_RDWR_BUF_OR_NULL, /* reg points to a read/write buffer or NULL */ + PTR_TO_MPTCP_SOCK, /* reg points to struct mptcp_sock */ + PTR_TO_MPTCP_SOCK_OR_NULL, /* reg points to struct mptcp_sock or NULL */ }; /* The information passed from prog-specific *_is_valid_access @@ -1734,6 +1737,7 @@ extern const struct bpf_func_proto bpf_skc_to_tcp_sock_proto; extern const struct bpf_func_proto bpf_skc_to_tcp_timewait_sock_proto; extern const struct bpf_func_proto bpf_skc_to_tcp_request_sock_proto; extern const struct bpf_func_proto bpf_skc_to_udp6_sock_proto; +extern const struct bpf_func_proto bpf_mptcp_sock_proto; const struct bpf_func_proto *bpf_tracing_func_proto( enum bpf_func_id func_id, const struct bpf_prog *prog); @@ -1790,6 +1794,35 @@ struct sk_reuseport_kern { u32 reuseport_id; bool bind_inany; }; + +#ifdef CONFIG_MPTCP +bool bpf_mptcp_sock_is_valid_access(int off, int size, + enum bpf_access_type type, + struct bpf_insn_access_aux *info); + +u32 bpf_mptcp_sock_convert_ctx_access(enum bpf_access_type type, + const struct bpf_insn *si, + struct bpf_insn *insn_buf, + struct bpf_prog *prog, + u32 *target_size); +#else /* CONFIG_MPTCP */ +static inline bool bpf_mptcp_sock_is_valid_access(int off, int size, + enum bpf_access_type type, + struct bpf_insn_access_aux *info) +{ + return false; +} + +static inline u32 bpf_mptcp_sock_convert_ctx_access(enum bpf_access_type type, + const struct bpf_insn *si, + struct bpf_insn *insn_buf, + struct bpf_prog *prog, + u32 *target_size) +{ + return 0; +} +#endif /* CONFIG_MPTCP */ + bool bpf_tcp_sock_is_valid_access(int off, int size, enum bpf_access_type type, struct bpf_insn_access_aux *info); diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 1a0caa4abc2d..8d51973cbf6f 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -3394,6 +3394,14 @@ union bpf_attr { * A non-negative value equal to or less than *size* on success, * or a negative error in case of failure. * + * struct bpf_mptcp_sock *bpf_mptcp_sock(struct bpf_sock *sk) + * Description + * This helper gets a **struct bpf_mptcp_sock** pointer from a + * **struct bpf_sock** pointer. + * Return + * A **struct bpf_mptcp_sock** pointer on success, or **NULL** in + * case of failure. + * */ #define __BPF_FUNC_MAPPER(FN) \ FN(unspec), \ @@ -3538,6 +3546,7 @@ union bpf_attr { FN(skc_to_tcp_request_sock), \ FN(skc_to_udp6_sock), \ FN(get_task_stack), \ + FN(mptcp_sock), \ /* */ /* integer value in 'imm' field of BPF_CALL instruction selects which helper @@ -3867,6 +3876,10 @@ struct bpf_tcp_sock { __u32 is_mptcp; /* Is MPTCP subflow? */ }; +struct bpf_mptcp_sock { + __u32 token; /* msk token */ +}; + struct bpf_sock_tuple { union { struct { diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index b6ccfce3bf4c..fae0afa84800 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -391,6 +391,7 @@ static bool type_is_sk_pointer(enum bpf_reg_type type) return type == PTR_TO_SOCKET || type == PTR_TO_SOCK_COMMON || type == PTR_TO_TCP_SOCK || + type == PTR_TO_MPTCP_SOCK || type == PTR_TO_XDP_SOCK; } @@ -398,6 +399,7 @@ static bool reg_type_not_null(enum bpf_reg_type type) { return type == PTR_TO_SOCKET || type == PTR_TO_TCP_SOCK || + type == PTR_TO_MPTCP_SOCK || type == PTR_TO_MAP_VALUE || type == PTR_TO_SOCK_COMMON; } @@ -408,6 +410,7 @@ static bool reg_type_may_be_null(enum bpf_reg_type type) type == PTR_TO_SOCKET_OR_NULL || type == PTR_TO_SOCK_COMMON_OR_NULL || type == PTR_TO_TCP_SOCK_OR_NULL || + type == PTR_TO_MPTCP_SOCK_OR_NULL || type == PTR_TO_BTF_ID_OR_NULL || type == PTR_TO_MEM_OR_NULL || type == PTR_TO_RDONLY_BUF_OR_NULL || @@ -426,6 +429,8 @@ static bool reg_type_may_be_refcounted_or_null(enum bpf_reg_type type) type == PTR_TO_SOCKET_OR_NULL || type == PTR_TO_TCP_SOCK || type == PTR_TO_TCP_SOCK_OR_NULL || + type == PTR_TO_MPTCP_SOCK || + type == PTR_TO_MPTCP_SOCK_OR_NULL || type == PTR_TO_MEM || type == PTR_TO_MEM_OR_NULL; } @@ -499,6 +504,8 @@ static const char * const reg_type_str[] = { [PTR_TO_SOCK_COMMON_OR_NULL] = "sock_common_or_null", [PTR_TO_TCP_SOCK] = "tcp_sock", [PTR_TO_TCP_SOCK_OR_NULL] = "tcp_sock_or_null", + [PTR_TO_MPTCP_SOCK] = "mptcp_sock", + [PTR_TO_MPTCP_SOCK_OR_NULL] = "mptcp_sock_or_null", [PTR_TO_TP_BUFFER] = "tp_buffer", [PTR_TO_XDP_SOCK] = "xdp_sock", [PTR_TO_BTF_ID] = "ptr_", @@ -2176,6 +2183,8 @@ static bool is_spillable_regtype(enum bpf_reg_type type) case PTR_TO_SOCK_COMMON_OR_NULL: case PTR_TO_TCP_SOCK: case PTR_TO_TCP_SOCK_OR_NULL: + case PTR_TO_MPTCP_SOCK: + case PTR_TO_MPTCP_SOCK_OR_NULL: case PTR_TO_XDP_SOCK: case PTR_TO_BTF_ID: case PTR_TO_BTF_ID_OR_NULL: @@ -2777,6 +2786,9 @@ static int check_sock_access(struct bpf_verifier_env *env, int insn_idx, case PTR_TO_TCP_SOCK: valid = bpf_tcp_sock_is_valid_access(off, size, t, &info); break; + case PTR_TO_MPTCP_SOCK: + valid = bpf_mptcp_sock_is_valid_access(off, size, t, &info); + break; case PTR_TO_XDP_SOCK: valid = bpf_xdp_sock_is_valid_access(off, size, t, &info); break; @@ -2935,6 +2947,9 @@ static int check_ptr_alignment(struct bpf_verifier_env *env, case PTR_TO_TCP_SOCK: pointer_desc = "tcp_sock "; break; + case PTR_TO_MPTCP_SOCK: + pointer_desc = "mptcp_sock "; + break; case PTR_TO_XDP_SOCK: pointer_desc = "xdp_sock "; break; @@ -4899,6 +4914,10 @@ static int check_helper_call(struct bpf_verifier_env *env, int func_id, int insn mark_reg_known_zero(env, regs, BPF_REG_0); regs[BPF_REG_0].type = PTR_TO_TCP_SOCK_OR_NULL; regs[BPF_REG_0].id = ++env->id_gen; + } else if (fn->ret_type == RET_PTR_TO_MPTCP_SOCK_OR_NULL) { + mark_reg_known_zero(env, regs, BPF_REG_0); + regs[BPF_REG_0].type = PTR_TO_MPTCP_SOCK_OR_NULL; + regs[BPF_REG_0].id = ++env->id_gen; } else if (fn->ret_type == RET_PTR_TO_ALLOC_MEM_OR_NULL) { mark_reg_known_zero(env, regs, BPF_REG_0); regs[BPF_REG_0].type = PTR_TO_MEM_OR_NULL; @@ -5226,6 +5245,8 @@ static int adjust_ptr_min_max_vals(struct bpf_verifier_env *env, case PTR_TO_SOCK_COMMON_OR_NULL: case PTR_TO_TCP_SOCK: case PTR_TO_TCP_SOCK_OR_NULL: + case PTR_TO_MPTCP_SOCK: + case PTR_TO_MPTCP_SOCK_OR_NULL: case PTR_TO_XDP_SOCK: verbose(env, "R%d pointer arithmetic on %s prohibited\n", dst, reg_type_str[ptr_reg->type]); @@ -6880,6 +6901,8 @@ static void mark_ptr_or_null_reg(struct bpf_func_state *state, reg->type = PTR_TO_SOCK_COMMON; } else if (reg->type == PTR_TO_TCP_SOCK_OR_NULL) { reg->type = PTR_TO_TCP_SOCK; + } else if (reg->type == PTR_TO_MPTCP_SOCK_OR_NULL) { + reg->type = PTR_TO_MPTCP_SOCK; } else if (reg->type == PTR_TO_BTF_ID_OR_NULL) { reg->type = PTR_TO_BTF_ID; } else if (reg->type == PTR_TO_MEM_OR_NULL) { @@ -8242,6 +8265,8 @@ static bool regsafe(struct bpf_reg_state *rold, struct bpf_reg_state *rcur, case PTR_TO_SOCK_COMMON_OR_NULL: case PTR_TO_TCP_SOCK: case PTR_TO_TCP_SOCK_OR_NULL: + case PTR_TO_MPTCP_SOCK: + case PTR_TO_MPTCP_SOCK_OR_NULL: case PTR_TO_XDP_SOCK: /* Only valid matches are exact, which memcmp() above * would have accepted @@ -8769,6 +8794,8 @@ static bool reg_type_mismatch_ok(enum bpf_reg_type type) case PTR_TO_SOCK_COMMON_OR_NULL: case PTR_TO_TCP_SOCK: case PTR_TO_TCP_SOCK_OR_NULL: + case PTR_TO_MPTCP_SOCK: + case PTR_TO_MPTCP_SOCK_OR_NULL: case PTR_TO_XDP_SOCK: case PTR_TO_BTF_ID: case PTR_TO_BTF_ID_OR_NULL: @@ -9889,6 +9916,9 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env) case PTR_TO_TCP_SOCK: convert_ctx_access = bpf_tcp_sock_convert_ctx_access; break; + case PTR_TO_MPTCP_SOCK: + convert_ctx_access = bpf_mptcp_sock_convert_ctx_access; + break; case PTR_TO_XDP_SOCK: convert_ctx_access = bpf_xdp_sock_convert_ctx_access; break; diff --git a/net/core/filter.c b/net/core/filter.c index ce83e1ec5259..59411822df1a 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -6560,6 +6560,10 @@ sock_ops_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog) case BPF_FUNC_tcp_sock: return &bpf_tcp_sock_proto; #endif /* CONFIG_INET */ +#ifdef CONFIG_MPTCP + case BPF_FUNC_mptcp_sock: + return &bpf_mptcp_sock_proto; +#endif /* CONFIG_MPTCP */ default: return bpf_base_func_proto(func_id); } diff --git a/net/mptcp/Makefile b/net/mptcp/Makefile index a611968be4d7..aae2ede220ed 100644 --- a/net/mptcp/Makefile +++ b/net/mptcp/Makefile @@ -10,3 +10,5 @@ obj-$(CONFIG_INET_MPTCP_DIAG) += mptcp_diag.o mptcp_crypto_test-objs := crypto_test.o mptcp_token_test-objs := token_test.o obj-$(CONFIG_MPTCP_KUNIT_TESTS) += mptcp_crypto_test.o mptcp_token_test.o + +obj-$(CONFIG_BPF) += bpf.o diff --git a/net/mptcp/bpf.c b/net/mptcp/bpf.c new file mode 100644 index 000000000000..5332469fbb28 --- /dev/null +++ b/net/mptcp/bpf.c @@ -0,0 +1,72 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Multipath TCP + * + * Copyright (c) 2020, Tessares SA. + * + * Author: Nicolas Rybowski + * + */ + +#include + +#include "protocol.h" + +bool bpf_mptcp_sock_is_valid_access(int off, int size, enum bpf_access_type type, + struct bpf_insn_access_aux *info) +{ + if (off < 0 || off >= offsetofend(struct bpf_mptcp_sock, token)) + return false; + + if (off % size != 0) + return false; + + switch (off) { + default: + return size == sizeof(__u32); + } +} + +u32 bpf_mptcp_sock_convert_ctx_access(enum bpf_access_type type, + const struct bpf_insn *si, + struct bpf_insn *insn_buf, + struct bpf_prog *prog, u32 *target_size) +{ + struct bpf_insn *insn = insn_buf; + +#define BPF_MPTCP_SOCK_GET_COMMON(FIELD) \ + do { \ + BUILD_BUG_ON(sizeof_field(struct mptcp_sock, FIELD) > \ + sizeof_field(struct bpf_mptcp_sock, FIELD)); \ + *insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct mptcp_sock, FIELD), \ + si->dst_reg, si->src_reg, \ + offsetof(struct mptcp_sock, FIELD)); \ + } while (0) + + if (insn > insn_buf) + return insn - insn_buf; + + switch (si->off) { + case offsetof(struct bpf_mptcp_sock, token): + BPF_MPTCP_SOCK_GET_COMMON(token); + break; + } + + return insn - insn_buf; +} + +BPF_CALL_1(bpf_mptcp_sock, struct sock *, sk) +{ + if (sk_fullsock(sk) && sk->sk_protocol == IPPROTO_TCP && sk_is_mptcp(sk)) { + struct mptcp_subflow_context *mptcp_sfc = mptcp_subflow_ctx(sk); + + return (unsigned long)mptcp_sfc->conn; + } + return (unsigned long)NULL; +} + +const struct bpf_func_proto bpf_mptcp_sock_proto = { + .func = bpf_mptcp_sock, + .gpl_only = false, + .ret_type = RET_PTR_TO_MPTCP_SOCK_OR_NULL, + .arg1_type = ARG_PTR_TO_SOCK_COMMON, +}; diff --git a/scripts/bpf_helpers_doc.py b/scripts/bpf_helpers_doc.py index 5bfa448b4704..4db41a1c344a 100755 --- a/scripts/bpf_helpers_doc.py +++ b/scripts/bpf_helpers_doc.py @@ -428,6 +428,7 @@ class PrinterHelpers(Printer): 'struct tcp_request_sock', 'struct udp6_sock', 'struct task_struct', + 'struct bpf_mptcp_sock', 'struct __sk_buff', 'struct sk_msg_md', @@ -472,6 +473,7 @@ class PrinterHelpers(Printer): 'struct tcp_request_sock', 'struct udp6_sock', 'struct task_struct', + 'struct bpf_mptcp_sock', } mapped_types = { 'u8': '__u8', diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h index 1a0caa4abc2d..8d51973cbf6f 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -3394,6 +3394,14 @@ union bpf_attr { * A non-negative value equal to or less than *size* on success, * or a negative error in case of failure. * + * struct bpf_mptcp_sock *bpf_mptcp_sock(struct bpf_sock *sk) + * Description + * This helper gets a **struct bpf_mptcp_sock** pointer from a + * **struct bpf_sock** pointer. + * Return + * A **struct bpf_mptcp_sock** pointer on success, or **NULL** in + * case of failure. + * */ #define __BPF_FUNC_MAPPER(FN) \ FN(unspec), \ @@ -3538,6 +3546,7 @@ union bpf_attr { FN(skc_to_tcp_request_sock), \ FN(skc_to_udp6_sock), \ FN(get_task_stack), \ + FN(mptcp_sock), \ /* */ /* integer value in 'imm' field of BPF_CALL instruction selects which helper @@ -3867,6 +3876,10 @@ struct bpf_tcp_sock { __u32 is_mptcp; /* Is MPTCP subflow? */ }; +struct bpf_mptcp_sock { + __u32 token; /* msk token */ +}; + struct bpf_sock_tuple { union { struct {