From patchwork Sun Nov 25 18:09:17 2018
X-Patchwork-Submitter: Aaron Conole
X-Patchwork-Id: 1002869
X-Patchwork-Delegate: pablo@netfilter.org
From: Aaron Conole
To: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, netfilter-devel@vger.kernel.org,
    coreteam@netfilter.org, Alexei Starovoitov, Daniel Borkmann,
    Pablo Neira Ayuso, Jozsef Kadlecsik, Florian Westphal,
    John Fastabend, Jesper Brouer, "David S. Miller",
    Andy Gospodarek, Rony Efraim, Simon Horman, Marcelo Leitner
Subject: [RFC -next v0 1/3] bpf: modular maps
Date: Sun, 25 Nov 2018 13:09:17 -0500
Message-Id: <20181125180919.13996-2-aconole@bytheb.org>
In-Reply-To: <20181125180919.13996-1-aconole@bytheb.org>
References: <20181125180919.13996-1-aconole@bytheb.org>
X-Mailing-List: netfilter-devel@vger.kernel.org

This commit allows map operations to be provided by a loadable kernel
module (LKM), rather than needing to be baked into the kernel at compile
time.
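To make the intended usage concrete, here is a minimal, hypothetical sketch
of how an out-of-tree module might register its own map operations through
the bpf_map_insert_ops() hook added below.  Only bpf_map_insert_ops(),
struct bpf_map_ops and BPF_MAP_TYPE_FLOWMAP (added in patch 3/3, where the
core marks the slot with the &loadable_map placeholder) come from this
series; every other name is illustrative.

/* Hypothetical module registering map ops for a slot that the core has
 * wired to the &loadable_map placeholder.  Note the RFC provides no
 * un-registration hook, so a real module would have to stay loaded.
 */
#include <linux/module.h>
#include <linux/err.h>
#include <linux/bpf.h>

static struct bpf_map *example_map_alloc(union bpf_attr *attr)
{
	return ERR_PTR(-EOPNOTSUPP);	/* a real module allocates its map here */
}

static void example_map_free(struct bpf_map *map)
{
}

static const struct bpf_map_ops example_map_ops = {
	.map_alloc = example_map_alloc,
	.map_free  = example_map_free,
};

static int __init example_map_init(void)
{
	/* bpf_map_insert_ops() re-checks CAP_SYS_ADMIN and silently
	 * ignores slots that are already claimed.
	 */
	bpf_map_insert_ops(BPF_MAP_TYPE_FLOWMAP, &example_map_ops);
	return 0;
}
module_init(example_map_init);

MODULE_LICENSE("GPL");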
Miller" , Andy Gospodarek , Rony Efraim , Simon Horman , Marcelo Leitner Subject: [RFC -next v0 1/3] bpf: modular maps Date: Sun, 25 Nov 2018 13:09:17 -0500 Message-Id: <20181125180919.13996-2-aconole@bytheb.org> X-Mailer: git-send-email 2.19.1 In-Reply-To: <20181125180919.13996-1-aconole@bytheb.org> References: <20181125180919.13996-1-aconole@bytheb.org> MIME-Version: 1.0 Sender: netfilter-devel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netfilter-devel@vger.kernel.org This commit allows for map operations to be loaded by an lkm, rather than needing to be baked into the kernel at compile time. Signed-off-by: Aaron Conole --- include/linux/bpf.h | 6 +++++ init/Kconfig | 8 +++++++ kernel/bpf/syscall.c | 57 +++++++++++++++++++++++++++++++++++++++++++- 3 files changed, 70 insertions(+), 1 deletion(-) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 33014ae73103..bf4531f076ca 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -553,6 +553,7 @@ static inline int bpf_map_attr_numa_node(const union bpf_attr *attr) struct bpf_prog *bpf_prog_get_type_path(const char *name, enum bpf_prog_type type); int array_map_alloc_check(union bpf_attr *attr); +void bpf_map_insert_ops(size_t id, const struct bpf_map_ops *ops); #else /* !CONFIG_BPF_SYSCALL */ static inline struct bpf_prog *bpf_prog_get(u32 ufd) @@ -665,6 +666,11 @@ static inline struct bpf_prog *bpf_prog_get_type_path(const char *name, { return ERR_PTR(-EOPNOTSUPP); } + +static inline void bpf_map_insert_ops(size_t id, + const struct bpf_map_ops *ops) +{ +} #endif /* CONFIG_BPF_SYSCALL */ static inline struct bpf_prog *bpf_prog_get_type(u32 ufd, diff --git a/init/Kconfig b/init/Kconfig index a4112e95724a..aa4eb98af656 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -1489,6 +1489,14 @@ config BPF_JIT_ALWAYS_ON Enables BPF JIT and removes BPF interpreter to avoid speculative execution of BPF instructions by the interpreter +config BPF_LOADABLE_MAPS + bool "Allow map types to be loaded with modules" + depends on BPF_SYSCALL && MODULES + help + Enables BPF map types to be provided by loadable modules + instead of always compiled in. Maps provided dynamically + may only be used by super users. + config USERFAULTFD bool "Enable userfaultfd() system call" select ANON_INODES diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index cf5040fd5434..fa1db9ab81e1 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -49,6 +49,8 @@ static DEFINE_SPINLOCK(map_idr_lock); int sysctl_unprivileged_bpf_disabled __read_mostly; +const struct bpf_map_ops loadable_map = {}; + static const struct bpf_map_ops * const bpf_map_types[] = { #define BPF_PROG_TYPE(_id, _ops) #define BPF_MAP_TYPE(_id, _ops) \ @@ -58,6 +60,15 @@ static const struct bpf_map_ops * const bpf_map_types[] = { #undef BPF_MAP_TYPE }; +static const struct bpf_map_ops * bpf_loadable_map_types[] = { +#define BPF_PROG_TYPE(_id, _ops) +#define BPF_MAP_TYPE(_id, _ops) \ + [_id] = NULL, +#include +#undef BPF_PROG_TYPE +#undef BPF_MAP_TYPE +}; + /* * If we're handed a bigger struct than we know of, ensure all the unknown bits * are 0 - i.e. new user-space does not rely on any kernel feature extensions @@ -105,6 +116,48 @@ const struct bpf_map_ops bpf_map_offload_ops = { .map_check_btf = map_check_no_btf, }; +/* + * Fills in the modular ops map, provided that the entry is not already + * filled, and that the caller has CAP_SYS_ADMIN. 
+ */
+void bpf_map_insert_ops(size_t id, const struct bpf_map_ops *ops)
+{
+#ifdef CONFIG_BPF_LOADABLE_MAPS
+	if (!capable(CAP_SYS_ADMIN))
+		return;
+
+	if (id >= ARRAY_SIZE(bpf_loadable_map_types))
+		return;
+
+	id = array_index_nospec(id, ARRAY_SIZE(bpf_loadable_map_types));
+	if (bpf_loadable_map_types[id] == NULL)
+		bpf_loadable_map_types[id] = ops;
+#endif
+}
+EXPORT_SYMBOL_GPL(bpf_map_insert_ops);
+
+static const struct bpf_map_ops *find_loadable_ops(u32 type)
+{
+	struct user_struct *user = get_current_user();
+	const struct bpf_map_ops *ops = NULL;
+
+	if (user->uid.val)
+		goto done;
+
+#ifdef CONFIG_BPF_LOADABLE_MAPS
+	if (!capable(CAP_SYS_ADMIN))
+		goto done;
+
+	if (type >= ARRAY_SIZE(bpf_loadable_map_types))
+		goto done;
+
+	type = array_index_nospec(type, ARRAY_SIZE(bpf_loadable_map_types));
+	ops = bpf_loadable_map_types[type];
+#endif
+
+done:
+	free_uid(user);
+	return ops;
+}
+
 static struct bpf_map *find_and_alloc_map(union bpf_attr *attr)
 {
 	const struct bpf_map_ops *ops;
@@ -115,7 +168,8 @@ static struct bpf_map *find_and_alloc_map(union bpf_attr *attr)
 	if (type >= ARRAY_SIZE(bpf_map_types))
 		return ERR_PTR(-EINVAL);
 	type = array_index_nospec(type, ARRAY_SIZE(bpf_map_types));
-	ops = bpf_map_types[type];
+	ops = (bpf_map_types[type] != &loadable_map) ? bpf_map_types[type] :
+		find_loadable_ops(type);
 	if (!ops)
 		return ERR_PTR(-EINVAL);

@@ -180,6 +234,7 @@ int bpf_map_precharge_memlock(u32 pages)
 		return -EPERM;
 	return 0;
 }
+EXPORT_SYMBOL_GPL(bpf_map_precharge_memlock);

 static int bpf_charge_memlock(struct user_struct *user, u32 pages)
 {
From patchwork Sun Nov 25 18:09:18 2018
X-Patchwork-Submitter: Aaron Conole
X-Patchwork-Id: 1002873
X-Patchwork-Delegate: pablo@netfilter.org
From: Aaron Conole
To: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, netfilter-devel@vger.kernel.org,
    coreteam@netfilter.org, Alexei Starovoitov, Daniel Borkmann,
    Pablo Neira Ayuso, Jozsef Kadlecsik, Florian Westphal,
    John Fastabend, Jesper Brouer, "David S. Miller",
    Andy Gospodarek, Rony Efraim, Simon Horman, Marcelo Leitner
Subject: [RFC -next v0 2/3] netfilter: nf_flow_table: support a new 'snoop' mode
Date: Sun, 25 Nov 2018 13:09:18 -0500
Message-Id: <20181125180919.13996-3-aconole@bytheb.org>
In-Reply-To: <20181125180919.13996-1-aconole@bytheb.org>
References: <20181125180919.13996-1-aconole@bytheb.org>
X-Mailing-List: netfilter-devel@vger.kernel.org

This patch adds the ability for a flow table to receive updates on all
flows added to or removed from any flow table in the system.  This will
allow other subsystems in the kernel to register a lookup mechanism into
the nftables connection tracker for those connections which should be
sent to a flow offload table.

Each flow table can now carry a set of flags; if one of those flags is
the new 'snoop' flag, the table is updated whenever a flow entry is added
to or removed from any other flow table.
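For reference, an in-kernel consumer only needs to set the new flag before
initialising its table to start receiving clones of every offloaded flow.
The sketch below is an assumption about what such a consumer might look
like (patch 3/3 does the equivalent inside its BPF map allocation); only
NF_FLOWTABLE_F_SNOOP, the flags field, nf_flow_table_init() and
nf_flow_table_free() come from the existing code plus this patch.

#include <linux/string.h>
#include <net/netfilter/nf_flow_table.h>

static struct nf_flowtable snoop_table;

static int snoop_table_setup(void)
{
	memset(&snoop_table, 0, sizeof(snoop_table));
	/* Mark the table as a snooper before registering it, so that
	 * flow_offload_add() clones new flows into it from now on. */
	snoop_table.flags |= NF_FLOWTABLE_F_SNOOP;
	return nf_flow_table_init(&snoop_table);
}

static void snoop_table_teardown(void)
{
	nf_flow_table_free(&snoop_table);
}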
Signed-off-by: Aaron Conole
---
 include/net/netfilter/nf_flow_table.h    |  5 +++
 include/uapi/linux/netfilter/nf_tables.h |  2 ++
 net/netfilter/nf_flow_table_core.c       | 44 ++++++++++++++++++++++--
 net/netfilter/nf_tables_api.c            | 13 ++++++-
 4 files changed, 60 insertions(+), 4 deletions(-)

diff --git a/include/net/netfilter/nf_flow_table.h b/include/net/netfilter/nf_flow_table.h
index 77e2761d4f2f..3fdfeb17f500 100644
--- a/include/net/netfilter/nf_flow_table.h
+++ b/include/net/netfilter/nf_flow_table.h
@@ -20,9 +20,14 @@ struct nf_flowtable_type {
 	struct module *owner;
 };

+enum nf_flowtable_flags {
+	NF_FLOWTABLE_F_SNOOP = 0x1,
+};
+
 struct nf_flowtable {
 	struct list_head list;
 	struct rhashtable rhashtable;
+	u32 flags;
 	const struct nf_flowtable_type *type;
 	struct delayed_work gc_work;
 };
diff --git a/include/uapi/linux/netfilter/nf_tables.h b/include/uapi/linux/netfilter/nf_tables.h
index 7de4f1bdaf06..f1cfe30aecde 100644
--- a/include/uapi/linux/netfilter/nf_tables.h
+++ b/include/uapi/linux/netfilter/nf_tables.h
@@ -1482,6 +1482,7 @@ enum nft_object_attributes {
  * @NFTA_FLOWTABLE_HOOK: netfilter hook configuration (NLA_U32)
  * @NFTA_FLOWTABLE_USE: number of references to this flow table (NLA_U32)
  * @NFTA_FLOWTABLE_HANDLE: object handle (NLA_U64)
+ * @NFTA_FLOWTABLE_FLAGS: flags (NLA_U32)
  */
 enum nft_flowtable_attributes {
 	NFTA_FLOWTABLE_UNSPEC,
@@ -1491,6 +1492,7 @@ enum nft_flowtable_attributes {
 	NFTA_FLOWTABLE_USE,
 	NFTA_FLOWTABLE_HANDLE,
 	NFTA_FLOWTABLE_PAD,
+	NFTA_FLOWTABLE_FLAGS,
 	__NFTA_FLOWTABLE_MAX
 };
 #define NFTA_FLOWTABLE_MAX	(__NFTA_FLOWTABLE_MAX - 1)
diff --git a/net/netfilter/nf_flow_table_core.c b/net/netfilter/nf_flow_table_core.c
index b7a4816add76..289a2299eea2 100644
--- a/net/netfilter/nf_flow_table_core.c
+++ b/net/netfilter/nf_flow_table_core.c
@@ -15,6 +15,7 @@ struct flow_offload_entry {
 	struct flow_offload	flow;
 	struct nf_conn		*ct;
+	struct nf_flow_route	route;
 	struct rcu_head		rcu_head;
 };

@@ -78,6 +79,7 @@ flow_offload_alloc(struct nf_conn *ct, struct nf_flow_route *route)
 		goto err_dst_cache_reply;

 	entry->ct = ct;
+	entry->route = *route;

 	flow_offload_fill_dir(flow, ct, route, FLOW_OFFLOAD_DIR_ORIGINAL);
 	flow_offload_fill_dir(flow, ct, route, FLOW_OFFLOAD_DIR_REPLY);
@@ -100,6 +102,18 @@ flow_offload_alloc(struct nf_conn *ct, struct nf_flow_route *route)
 }
 EXPORT_SYMBOL_GPL(flow_offload_alloc);

+static struct flow_offload *flow_offload_clone(struct flow_offload *flow)
+{
+	struct flow_offload *clone_flow_val;
+	struct flow_offload_entry *e;
+
+	e = container_of(flow, struct flow_offload_entry, flow);
+
+	clone_flow_val = flow_offload_alloc(e->ct, &e->route);
+
+	return clone_flow_val;
+}
+
 static void flow_offload_fixup_tcp(struct ip_ct_tcp *tcp)
 {
 	tcp->state = TCP_CONNTRACK_ESTABLISHED;
@@ -182,7 +196,7 @@ static const struct rhashtable_params nf_flow_offload_rhash_params = {
 	.automatic_shrinking	= true,
 };

-int flow_offload_add(struct nf_flowtable *flow_table, struct flow_offload *flow)
+static void __flow_offload_add(struct nf_flowtable *flow_table, struct flow_offload *flow)
 {
 	flow->timeout = (u32)jiffies;

@@ -192,12 +206,30 @@ int flow_offload_add(struct nf_flowtable *flow_table, struct flow_offload *flow)
 	rhashtable_insert_fast(&flow_table->rhashtable,
 			       &flow->tuplehash[FLOW_OFFLOAD_DIR_REPLY].node,
 			       nf_flow_offload_rhash_params);
+}
+
+int flow_offload_add(struct nf_flowtable *flow_table, struct flow_offload *flow)
+{
+	struct nf_flowtable *flowtable;
+
+	__flow_offload_add(flow_table, flow);
+
+	mutex_lock(&flowtable_lock);
&flowtables, list) { + if (flowtable != flow_table && + flowtable->flags & NF_FLOWTABLE_F_SNOOP) { + struct flow_offload *flow_clone = + flow_offload_clone(flow); + __flow_offload_add(flowtable, flow_clone); + } + } + mutex_unlock(&flowtable_lock); return 0; } EXPORT_SYMBOL_GPL(flow_offload_add); -static void flow_offload_del(struct nf_flowtable *flow_table, - struct flow_offload *flow) +static void __flow_offload_del(struct nf_flowtable *flow_table, + struct flow_offload *flow) { struct flow_offload_entry *e; @@ -210,6 +242,12 @@ static void flow_offload_del(struct nf_flowtable *flow_table, e = container_of(flow, struct flow_offload_entry, flow); clear_bit(IPS_OFFLOAD_BIT, &e->ct->status); +} + +static void flow_offload_del(struct nf_flowtable *flow_table, + struct flow_offload *flow) +{ + __flow_offload_del(flow_table, flow); flow_offload_free(flow); } diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index 42487d01a3ed..8148de9f9a54 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -5569,6 +5569,15 @@ static int nf_tables_newflowtable(struct net *net, struct sock *nlsk, if (err < 0) goto err3; + if (nla[NFTA_FLOWTABLE_FLAGS]) { + flowtable->data.flags = + ntohl(nla_get_be32(nla[NFTA_FLOWTABLE_FLAGS])); + if (flowtable->data.flags & ~NF_FLOWTABLE_F_SNOOP) { + err = -EINVAL; + goto err4; + } + } + err = nf_tables_flowtable_parse_hook(&ctx, nla[NFTA_FLOWTABLE_HOOK], flowtable); if (err < 0) @@ -5694,7 +5703,9 @@ static int nf_tables_fill_flowtable_info(struct sk_buff *skb, struct net *net, nla_put_string(skb, NFTA_FLOWTABLE_NAME, flowtable->name) || nla_put_be32(skb, NFTA_FLOWTABLE_USE, htonl(flowtable->use)) || nla_put_be64(skb, NFTA_FLOWTABLE_HANDLE, cpu_to_be64(flowtable->handle), - NFTA_FLOWTABLE_PAD)) + NFTA_FLOWTABLE_PAD) || + nla_put_be32(skb, NFTA_FLOWTABLE_FLAGS, + htonl(flowtable->data.flags))) goto nla_put_failure; nest = nla_nest_start(skb, NFTA_FLOWTABLE_HOOK); From patchwork Sun Nov 25 18:09:19 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Aaron Conole X-Patchwork-Id: 1002871 X-Patchwork-Delegate: pablo@netfilter.org Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netfilter-devel-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=bytheb.org Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=bytheb-org.20150623.gappssmtp.com header.i=@bytheb-org.20150623.gappssmtp.com header.b="pn9O2FAf"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 432ykR6p6Rz9s9J for ; Mon, 26 Nov 2018 05:09:51 +1100 (AEDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726746AbeKZFBV (ORCPT ); Mon, 26 Nov 2018 00:01:21 -0500 Received: from mail-it1-f193.google.com ([209.85.166.193]:39691 "EHLO mail-it1-f193.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726655AbeKZFBU (ORCPT ); Mon, 26 Nov 2018 00:01:20 -0500 Received: by mail-it1-f193.google.com with SMTP id m15so24183765itl.4 for ; Sun, 25 Nov 2018 10:09:42 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytheb-org.20150623.gappssmtp.com; s=20150623; 
From patchwork Sun Nov 25 18:09:19 2018
X-Patchwork-Submitter: Aaron Conole
X-Patchwork-Id: 1002871
X-Patchwork-Delegate: pablo@netfilter.org
From: Aaron Conole
To: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, netfilter-devel@vger.kernel.org,
    coreteam@netfilter.org, Alexei Starovoitov, Daniel Borkmann,
    Pablo Neira Ayuso, Jozsef Kadlecsik, Florian Westphal,
    John Fastabend, Jesper Brouer, "David S. Miller",
    Andy Gospodarek, Rony Efraim, Simon Horman, Marcelo Leitner
Subject: [RFC -next v0 3/3] netfilter: nf_flow_table_bpf_map: introduce new loadable bpf map
Date: Sun, 25 Nov 2018 13:09:19 -0500
Message-Id: <20181125180919.13996-4-aconole@bytheb.org>
In-Reply-To: <20181125180919.13996-1-aconole@bytheb.org>
References: <20181125180919.13996-1-aconole@bytheb.org>
X-Mailing-List: netfilter-devel@vger.kernel.org

This commit introduces a new loadable map that allows an eBPF program to
query the flow offload tables for specific flow information.  For now,
that information is limited to the input and output interface indexes.
Future enhancements would be to include connection tracking details, such
as state and metadata, and to allow for window validation.
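To illustrate the intended consumer, here is a hypothetical XDP program
that performs a lookup in the new map type.  Only BPF_MAP_TYPE_FLOWMAP and
struct bpf_flow_map come from this series (the program must be built
against the patched uapi headers); the samples/bpf-style helper headers,
the map definition idiom and the hard-coded tuple are assumptions for the
sake of the sketch.

#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/in.h>
#include "bpf_helpers.h"	/* SEC(), struct bpf_map_def, bpf_map_lookup_elem() */
#include "bpf_endian.h"		/* bpf_htons(), bpf_htonl() */

struct bpf_map_def SEC("maps") flow_table_map = {
	.type        = BPF_MAP_TYPE_FLOWMAP,
	.key_size    = sizeof(struct bpf_flow_map),
	.value_size  = sizeof(struct bpf_flow_map),
	.max_entries = 1024,
};

SEC("xdp")
int xdp_flow_lookup(struct xdp_md *ctx)
{
	struct bpf_flow_map key = {};

	/* A real program would parse the packet headers here; the values
	 * below only illustrate which fields the kernel lookup consumes
	 * (network byte order, as in struct bpf_flow_keys). */
	key.flow.addr_proto = bpf_htons(ETH_P_IP);
	key.flow.ip_proto   = IPPROTO_TCP;
	key.flow.ipv4_src   = bpf_htonl(0xc0a80001);	/* 192.168.0.1 */
	key.flow.ipv4_dst   = bpf_htonl(0xc0a80002);	/* 192.168.0.2 */
	key.flow.sport      = bpf_htons(12345);
	key.flow.dport      = bpf_htons(80);
	key.iifindex        = ctx->ingress_ifindex;

	if (bpf_map_lookup_elem(&flow_table_map, &key))
		return XDP_TX;		/* flow is known to the offload table */

	return XDP_PASS;
}

char _license[] SEC("license") = "GPL";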
Signed-off-by: Aaron Conole
---
 include/linux/bpf_types.h                 |   2 +
 include/uapi/linux/bpf.h                  |   7 +
 net/netfilter/Kconfig                     |   9 +
 net/netfilter/Makefile                    |   1 +
 net/netfilter/nf_flow_table_bpf_flowmap.c | 202 ++++++++++++++++++++++
 5 files changed, 221 insertions(+)
 create mode 100644 net/netfilter/nf_flow_table_bpf_flowmap.c

diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
index 44d9ab4809bd..82d3038cf6c3 100644
--- a/include/linux/bpf_types.h
+++ b/include/linux/bpf_types.h
@@ -71,3 +71,5 @@ BPF_MAP_TYPE(BPF_MAP_TYPE_REUSEPORT_SOCKARRAY, reuseport_array_ops)
 #endif
 BPF_MAP_TYPE(BPF_MAP_TYPE_QUEUE, queue_map_ops)
 BPF_MAP_TYPE(BPF_MAP_TYPE_STACK, stack_map_ops)
+
+BPF_MAP_TYPE(BPF_MAP_TYPE_FLOWMAP, loadable_map)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 852dc17ab47a..fb77c8c5c209 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -131,6 +131,7 @@ enum bpf_map_type {
 	BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE,
 	BPF_MAP_TYPE_QUEUE,
 	BPF_MAP_TYPE_STACK,
+	BPF_MAP_TYPE_FLOWMAP,
 };

 enum bpf_prog_type {
@@ -2942,4 +2943,10 @@ struct bpf_flow_keys {
 	};
 };

+struct bpf_flow_map {
+	struct bpf_flow_keys flow;
+	__u32 iifindex;
+	__u32 oifindex;
+};
+
 #endif /* _UAPI__LINUX_BPF_H__ */
diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index 2ab870ef233a..30f1bc9084be 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -709,6 +709,15 @@ config NF_FLOW_TABLE

 	  To compile it as a module, choose M here.

+config NF_FLOW_TABLE_BPF
+	tristate "Netfilter flowtable BPF map"
+	depends on NF_FLOW_TABLE
+	depends on BPF_LOADABLE_MAPS
+	help
+	  This option adds support for retrieving flow table entries
+	  via a loadable BPF map.
+
+	  To compile it as a module, choose M here.
+
 config NETFILTER_XTABLES
 	tristate "Netfilter Xtables support (required for ip_tables)"
 	default m if NETFILTER_ADVANCED=n
diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index 4ddf3ef51ece..8dba928a03fd 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -121,6 +121,7 @@ obj-$(CONFIG_NFT_FWD_NETDEV)	+= nft_fwd_netdev.o

 # flow table infrastructure
 obj-$(CONFIG_NF_FLOW_TABLE)	+= nf_flow_table.o
+obj-$(CONFIG_NF_FLOW_TABLE_BPF)	+= nf_flow_table_bpf_flowmap.o
 nf_flow_table-objs		:= nf_flow_table_core.o nf_flow_table_ip.o

 obj-$(CONFIG_NF_FLOW_TABLE_INET) += nf_flow_table_inet.o
diff --git a/net/netfilter/nf_flow_table_bpf_flowmap.c b/net/netfilter/nf_flow_table_bpf_flowmap.c
new file mode 100644
index 000000000000..577985560883
--- /dev/null
+++ b/net/netfilter/nf_flow_table_bpf_flowmap.c
@@ -0,0 +1,202 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright (c) 2018, Aaron Conole
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ */
+
+#include
+#include
+#include
+#include
+#include
+#include
+
+struct flow_map_internal {
+	struct bpf_map map;
+	struct nf_flowtable net_flow_table;
+};
+
+static void flow_map_init_from_attr(struct bpf_map *map, union bpf_attr *attr)
+{
+	map->map_type = attr->map_type;
+	map->key_size = attr->key_size;
+	map->value_size = attr->value_size;
+	map->max_entries = attr->max_entries;
+	map->map_flags = attr->map_flags;
+	map->numa_node = bpf_map_attr_numa_node(attr);
+}
+
+static struct bpf_map *flow_map_alloc(union bpf_attr *attr)
+{
+	struct flow_map_internal *fmap_ret;
+	u64 cost;
+	int err;
+
+	if (!capable(CAP_NET_ADMIN))
+		return ERR_PTR(-EPERM);
+
+	if (attr->max_entries == 0 ||
+	    attr->key_size != sizeof(struct bpf_flow_map) ||
+	    attr->value_size != sizeof(struct bpf_flow_map))
+		return ERR_PTR(-EINVAL);
+
+	fmap_ret = kzalloc(sizeof(*fmap_ret), GFP_USER);
+	if (!fmap_ret)
+		return ERR_PTR(-ENOMEM);
+
+	flow_map_init_from_attr(&fmap_ret->map, attr);
+
+	cost = (u64)fmap_ret->map.max_entries * sizeof(struct flow_offload);
+	if (cost >= U32_MAX - PAGE_SIZE) {
+		kfree(&fmap_ret);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	fmap_ret->map.pages = round_up(cost, PAGE_SIZE) >> PAGE_SHIFT;
+
+	/* if map size is larger than memlock limit, reject it early */
+	if ((err = bpf_map_precharge_memlock(fmap_ret->map.pages))) {
+		kfree(&fmap_ret);
+		return ERR_PTR(err);
+	}
+
+	memset(&fmap_ret->net_flow_table, 0, sizeof(fmap_ret->net_flow_table));
+	fmap_ret->net_flow_table.flags |= NF_FLOWTABLE_F_SNOOP;
+	nf_flow_table_init(&fmap_ret->net_flow_table);
+
+	return &fmap_ret->map;
+}
+
+static void flow_map_free(struct bpf_map *map)
+{
+	struct flow_map_internal *fmap = container_of(map,
+						      struct flow_map_internal,
+						      map);
+
+	nf_flow_table_free(&fmap->net_flow_table);
+	synchronize_rcu();
+	kfree(fmap);
+}
+
+static void flow_walk(struct flow_offload *flow, void *data)
+{
+	printk("Flow offload dir0: %x:%d -> %x:%d, %u, %u, %d, %u\n",
+	       flow->tuplehash[0].tuple.src_v4.s_addr,
+	       flow->tuplehash[0].tuple.src_port,
+	       flow->tuplehash[0].tuple.dst_v4.s_addr,
+	       flow->tuplehash[0].tuple.dst_port,
+	       flow->tuplehash[0].tuple.l3proto,
+	       flow->tuplehash[0].tuple.l4proto,
+	       flow->tuplehash[0].tuple.iifidx,
+	       flow->tuplehash[0].tuple.dir);
+
+	printk("Flow offload dir1: %x:%d -> %x:%d, %u, %u, %d, %u\n",
+	       flow->tuplehash[1].tuple.src_v4.s_addr,
+	       flow->tuplehash[1].tuple.src_port,
+	       flow->tuplehash[1].tuple.dst_v4.s_addr,
+	       flow->tuplehash[1].tuple.dst_port,
+	       flow->tuplehash[1].tuple.l3proto,
+	       flow->tuplehash[1].tuple.l4proto,
+	       flow->tuplehash[1].tuple.iifidx,
+	       flow->tuplehash[1].tuple.dir);
+}
+
+static void *flow_map_lookup_elem(struct bpf_map *map, void *key)
+{
+	struct flow_map_internal *fmap = container_of(map,
+						      struct flow_map_internal, map);
+	struct bpf_flow_map *internal_key = (struct bpf_flow_map *)key;
+	struct flow_offload_tuple_rhash *hash_ret;
+	struct flow_offload_tuple lookup_key;
+
+	memset(&lookup_key, 0, sizeof(lookup_key));
+	lookup_key.src_port = ntohs(internal_key->flow.sport);
+	lookup_key.dst_port = ntohs(internal_key->flow.dport);
+	lookup_key.dir = 0;
+
+	if (internal_key->flow.addr_proto == htons(ETH_P_IP)) {
+		lookup_key.l3proto = AF_INET;
+		lookup_key.src_v4.s_addr = ntohl(internal_key->flow.ipv4_src);
+		lookup_key.dst_v4.s_addr = ntohl(internal_key->flow.ipv4_dst);
+	} else if (internal_key->flow.addr_proto == htons(ETH_P_IPV6)) {
+		lookup_key.l3proto = AF_INET6;
+		memcpy(&lookup_key.src_v6,
+		       internal_key->flow.ipv6_src,
+		       sizeof(lookup_key.src_v6));
memcpy(&lookup_key.dst_v6, + internal_key->flow.ipv6_dst, + sizeof(lookup_key.dst_v6)); + } else + return NULL; + + lookup_key.l4proto = (u8)internal_key->flow.ip_proto; + lookup_key.iifidx = internal_key->iifindex; + + printk("Flow offload lookup: %x:%d -> %x:%d, %u, %u, %d, %u\n", + lookup_key.src_v4.s_addr, lookup_key.src_port, + lookup_key.dst_v4.s_addr, lookup_key.dst_port, + lookup_key.l3proto, lookup_key.l4proto, + lookup_key.iifidx, lookup_key.dir); + hash_ret = flow_offload_lookup(&fmap->net_flow_table, &lookup_key); + if (!hash_ret) { + memcpy(&lookup_key.src_v6, internal_key->flow.ipv6_src, + sizeof(lookup_key.src_v6)); + memcpy(&lookup_key.dst_v6, internal_key->flow.ipv6_dst, + sizeof(lookup_key.dst_v6)); + lookup_key.src_port = internal_key->flow.dport; + lookup_key.dst_port = internal_key->flow.sport; + lookup_key.dir = 1; + hash_ret = flow_offload_lookup(&fmap->net_flow_table, + &lookup_key); + } + + if (!hash_ret) { + printk("No flow found, but table is: %d\n", + atomic_read(&fmap->net_flow_table.rhashtable.nelems)); + nf_flow_table_iterate(&fmap->net_flow_table, flow_walk, NULL); + return NULL; + } + + printk("Flow matched!\n"); + return key; +} + +static int flow_map_get_next_key(struct bpf_map *map, void *key, void *next_key) +{ + return 0; +} + +static int flow_map_check_no_btf(const struct bpf_map *map, + const struct btf_type *key_type, + const struct btf_type *value_type) +{ + return -ENOTSUPP; +} + +const struct bpf_map_ops flow_map_ops = { + .map_alloc = flow_map_alloc, + .map_free = flow_map_free, + .map_get_next_key = flow_map_get_next_key, + .map_lookup_elem = flow_map_lookup_elem, + .map_check_btf = flow_map_check_no_btf, +}; + +static int __init flow_map_init(void) +{ + bpf_map_insert_ops(BPF_MAP_TYPE_FLOWMAP, &flow_map_ops); + return 0; +} + +module_init(flow_map_init); + +MODULE_LICENSE("GPL"); +MODULE_AUTHOR("Aaron Conole ");