From patchwork Fri Dec 22 19:27:26 2017
X-Patchwork-Id: 852525
From: Pablo Neira Ayuso
To: netfilter-devel@vger.kernel.org
Cc: netdev@vger.kernel.org, f.fainelli@gmail.com, simon.horman@netronome.com, ronye@mellanox.com, jiri@mellanox.com, nbd@nbd.name, john@phrozen.org, kubakici@wp.pl, fw@strlen.de
Subject: [PATCH nf-next, v3 1/7] netfilter: nf_tables: add flow table netlink frontend
Date: Fri, 22 Dec 2017 20:27:26 +0100
Message-Id: <20171222192732.13188-2-pablo@netfilter.org>
In-Reply-To: <20171222192732.13188-1-pablo@netfilter.org>

This patch introduces a netlink control plane to create, delete and dump
flow tables. Flow tables are identified by name; rules use this name to
refer to a specific flow table. Flow tables use the rhashtable class and
a generic garbage collector to remove expired entries.

This also adds the infrastructure to register different flow table types,
so we can add one for each layer 3 protocol family.
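For illustration only, the following userspace sketch serializes the attribute payload that an NFT_MSG_NEWFLOWTABLE request carries, following the nesting defined by the new uapi enums: NFTA_FLOWTABLE_HOOK wraps the hook number and priority (both big-endian u32, read by the kernel with ntohl(nla_get_be32())) plus a nested NFTA_FLOWTABLE_HOOK_DEVS list of NFTA_DEVICE_NAME strings. It assumes netlink's 4-byte attribute alignment, and struct attr_hdr, ATTR_ALIGN, ATTR_F_NESTED, struct msgbuf and build_newflowtable() are hypothetical local helpers, not libmnl or libnftnl API. The nlmsghdr/nfgenmsg header is omitted, and since NFT_MSG_NEWFLOWTABLE is registered as a .call_batch handler, a real request also has to be wrapped in an nfnetlink batch.

```c
#include <arpa/inet.h>  /* htonl() */
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Attribute numbers copied from the uapi enums added by this patch. */
enum { NFTA_FLOWTABLE_TABLE = 1, NFTA_FLOWTABLE_NAME = 2, NFTA_FLOWTABLE_HOOK = 3 };
enum { NFTA_FLOWTABLE_HOOK_NUM = 1, NFTA_FLOWTABLE_HOOK_PRIORITY = 2,
       NFTA_FLOWTABLE_HOOK_DEVS = 3 };
enum { NFTA_DEVICE_NAME = 1 };

/* Hypothetical local stand-ins for struct nlattr, NLA_ALIGN() and NLA_F_NESTED. */
struct attr_hdr { uint16_t len; uint16_t type; };
#define ATTR_ALIGN(n)  (((n) + 3u) & ~(size_t)3u)
#define ATTR_F_NESTED  0x8000u

struct msgbuf { unsigned char data[512]; size_t off; };

/* Append one attribute (header, payload, zero padding).
 * Returns the attribute's offset, or -1 if it does not fit. */
static long put_attr(struct msgbuf *b, unsigned type, const void *payload, size_t plen)
{
	size_t len = sizeof(struct attr_hdr) + plen;
	struct attr_hdr h = { (uint16_t)len, (uint16_t)type };
	long start = (long)b->off;

	if (b->off + ATTR_ALIGN(len) > sizeof(b->data))
		return -1;
	memcpy(b->data + b->off, &h, sizeof(h));
	if (plen)
		memcpy(b->data + b->off + sizeof(h), payload, plen);
	memset(b->data + b->off + len, 0, ATTR_ALIGN(len) - len);
	b->off += ATTR_ALIGN(len);
	return start;
}

/* Close a nest opened with put_attr(..., NULL, 0): patch in its total length. */
static void nest_end(struct msgbuf *b, long start)
{
	struct attr_hdr h;

	assert(start >= 0 && (size_t)start < b->off);
	memcpy(&h, b->data + start, sizeof(h));
	h.len = (uint16_t)(b->off - (size_t)start);
	memcpy(b->data + start, &h, sizeof(h));
}

static long put_str(struct msgbuf *b, unsigned type, const char *s)
{
	return put_attr(b, type, s, strlen(s) + 1);  /* include the NUL */
}

static long put_be32(struct msgbuf *b, unsigned type, uint32_t v)
{
	uint32_t be = htonl(v);

	return put_attr(b, type, &be, sizeof(be));
}

/* Attribute payload of an NFT_MSG_NEWFLOWTABLE request (what follows
 * nlmsghdr + nfgenmsg). Returns the number of bytes written, or -1. */
static long build_newflowtable(struct msgbuf *b, const char *table,
			       const char *name, uint32_t hooknum, int32_t prio,
			       const char *const devs[], int ndevs)
{
	long hook, devnest;
	int i;

	b->off = 0;
	if (put_str(b, NFTA_FLOWTABLE_TABLE, table) < 0 ||
	    put_str(b, NFTA_FLOWTABLE_NAME, name) < 0)
		return -1;
	hook = put_attr(b, NFTA_FLOWTABLE_HOOK | ATTR_F_NESTED, NULL, 0);
	if (hook < 0 ||
	    put_be32(b, NFTA_FLOWTABLE_HOOK_NUM, hooknum) < 0 ||
	    put_be32(b, NFTA_FLOWTABLE_HOOK_PRIORITY, (uint32_t)prio) < 0)
		return -1;
	devnest = put_attr(b, NFTA_FLOWTABLE_HOOK_DEVS | ATTR_F_NESTED, NULL, 0);
	if (devnest < 0)
		return -1;
	for (i = 0; i < ndevs; i++)
		if (put_str(b, NFTA_DEVICE_NAME, devs[i]) < 0)
			return -1;
	nest_end(b, devnest);
	nest_end(b, hook);
	return (long)b->off;
}
```

With table "filter", flow table "f" and devices eth0/eth1, the payload is 68 bytes: the 48-byte hook nest contains the 28-byte device nest. Real userspace tools would normally rely on a netlink helper library rather than hand-rolled encoding like this.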
Signed-off-by: Pablo Neira Ayuso --- include/net/netfilter/nf_flow_table.h | 23 + include/net/netfilter/nf_tables.h | 48 ++ include/uapi/linux/netfilter/nf_tables.h | 53 +++ net/netfilter/nf_tables_api.c | 747 ++++++++++++++++++++++++++++++- 4 files changed, 870 insertions(+), 1 deletion(-) create mode 100644 include/net/netfilter/nf_flow_table.h diff --git a/include/net/netfilter/nf_flow_table.h b/include/net/netfilter/nf_flow_table.h new file mode 100644 index 000000000000..3a0779589281 --- /dev/null +++ b/include/net/netfilter/nf_flow_table.h @@ -0,0 +1,23 @@ +#ifndef _NF_FLOW_TABLE_H +#define _NF_FLOW_TABLE_H + +#include + +struct nf_flowtable; + +struct nf_flowtable_type { + struct list_head list; + int family; + void (*gc)(struct work_struct *work); + const struct rhashtable_params *params; + nf_hookfn *hook; + struct module *owner; +}; + +struct nf_flowtable { + struct rhashtable rhashtable; + const struct nf_flowtable_type *type; + struct delayed_work gc_work; +}; + +#endif /* _FLOW_OFFLOAD_H */ diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h index 0f5b12a4ad09..624928d22589 100644 --- a/include/net/netfilter/nf_tables.h +++ b/include/net/netfilter/nf_tables.h @@ -8,6 +8,7 @@ #include #include #include +#include #include #define NFT_JUMP_STACK_SIZE 16 @@ -942,6 +943,7 @@ unsigned int nft_do_chain(struct nft_pktinfo *pkt, void *priv); * @chains: chains in the table * @sets: sets in the table * @objects: stateful objects in the table + * @flowtables: flow tables in the table * @hgenerator: handle generator state * @use: number of chain references to this table * @flags: table flag (see enum nft_table_flags) @@ -953,6 +955,7 @@ struct nft_table { struct list_head chains; struct list_head sets; struct list_head objects; + struct list_head flowtables; u64 hgenerator; u32 use; u16 flags:14, @@ -1091,6 +1094,44 @@ int nft_register_obj(struct nft_object_type *obj_type); void nft_unregister_obj(struct nft_object_type *obj_type); 
/** + * struct nft_flowtable - nf_tables flow table + * + * @list: flow table list node in table list + * @table: the table the flow table is contained in + * @name: name of this flow table + * @hooknum: hook number + * @priority: hook priority + * @ops_len: number of hooks in array + * @genmask: generation mask + * @use: number of references to this flow table + * @data: rhashtable and garbage collector + * @ops: array of hooks + */ +struct nft_flowtable { + struct list_head list; + struct nft_table *table; + char *name; + int hooknum; + int priority; + int ops_len; + u32 genmask:2, + use:30; + /* runtime data below here */ + struct nf_hook_ops *ops ____cacheline_aligned; + struct nf_flowtable data; +}; + +struct nft_flowtable *nf_tables_flowtable_lookup(const struct nft_table *table, + const struct nlattr *nla, + u8 genmask); +void nft_flow_table_iterate(struct net *net, + void (*iter)(struct nf_flowtable *flowtable, void *data), + void *data); + +void nft_register_flowtable_type(struct nf_flowtable_type *type); +void nft_unregister_flowtable_type(struct nf_flowtable_type *type); + +/** * struct nft_traceinfo - nft tracing information and state * * @pkt: pktinfo currently processed @@ -1326,4 +1367,11 @@ struct nft_trans_obj { #define nft_trans_obj(trans) \ (((struct nft_trans_obj *)trans->data)->obj) +struct nft_trans_flowtable { + struct nft_flowtable *flowtable; +}; + +#define nft_trans_flowtable(trans) \ + (((struct nft_trans_flowtable *)trans->data)->flowtable) + #endif /* _NET_NF_TABLES_H */ diff --git a/include/uapi/linux/netfilter/nf_tables.h b/include/uapi/linux/netfilter/nf_tables.h index 871afa4871bf..9ba0f4c13de6 100644 --- a/include/uapi/linux/netfilter/nf_tables.h +++ b/include/uapi/linux/netfilter/nf_tables.h @@ -91,6 +91,9 @@ enum nft_verdicts { * @NFT_MSG_GETOBJ: get a stateful object (enum nft_obj_attributes) * @NFT_MSG_DELOBJ: delete a stateful object (enum nft_obj_attributes) * @NFT_MSG_GETOBJ_RESET: get and reset a stateful object (enum 
nft_obj_attributes) + * @NFT_MSG_NEWFLOWTABLE: add new flow table (enum nft_flowtable_attributes) + * @NFT_MSG_GETFLOWTABLE: get flow table (enum nft_flowtable_attributes) + * @NFT_MSG_DELFLOWTABLE: delete flow table (enum nft_flowtable_attributes) */ enum nf_tables_msg_types { NFT_MSG_NEWTABLE, @@ -115,6 +118,9 @@ enum nf_tables_msg_types { NFT_MSG_GETOBJ, NFT_MSG_DELOBJ, NFT_MSG_GETOBJ_RESET, + NFT_MSG_NEWFLOWTABLE, + NFT_MSG_GETFLOWTABLE, + NFT_MSG_DELFLOWTABLE, NFT_MSG_MAX, }; @@ -1307,6 +1313,53 @@ enum nft_object_attributes { #define NFTA_OBJ_MAX (__NFTA_OBJ_MAX - 1) /** + * enum nft_flowtable_attributes - nf_tables flow table netlink attributes + * + * @NFTA_FLOWTABLE_TABLE: name of the table containing the expression (NLA_STRING) + * @NFTA_FLOWTABLE_NAME: name of this flow table (NLA_STRING) + * @NFTA_FLOWTABLE_HOOK: netfilter hook configuration(NLA_U32) + * @NFTA_FLOWTABLE_USE: number of references to this flow table (NLA_U32) + */ +enum nft_flowtable_attributes { + NFTA_FLOWTABLE_UNSPEC, + NFTA_FLOWTABLE_TABLE, + NFTA_FLOWTABLE_NAME, + NFTA_FLOWTABLE_HOOK, + NFTA_FLOWTABLE_USE, + __NFTA_FLOWTABLE_MAX +}; +#define NFTA_FLOWTABLE_MAX (__NFTA_FLOWTABLE_MAX - 1) + +/** + * enum nft_flowtable_hook_attributes - nf_tables flow table hook netlink attributes + * + * @NFTA_FLOWTABLE_HOOK_NUM: netfilter hook number (NLA_U32) + * @NFTA_FLOWTABLE_HOOK_PRIORITY: netfilter hook priority (NLA_U32) + * @NFTA_FLOWTABLE_HOOK_DEVS: input devices this flow table is bound to (NLA_NESTED) + */ +enum nft_flowtable_hook_attributes { + NFTA_FLOWTABLE_HOOK_UNSPEC, + NFTA_FLOWTABLE_HOOK_NUM, + NFTA_FLOWTABLE_HOOK_PRIORITY, + NFTA_FLOWTABLE_HOOK_DEVS, + __NFTA_FLOWTABLE_HOOK_MAX +}; +#define NFTA_FLOWTABLE_HOOK_MAX (__NFTA_FLOWTABLE_HOOK_MAX - 1) + +/** + * enum nft_device_attributes - nf_tables device netlink attributes + * + * @NFTA_DEVICE_NAME: name of this device (NLA_STRING) + */ +enum nft_devices_attributes { + NFTA_DEVICE_UNSPEC, + NFTA_DEVICE_NAME, + __NFTA_DEVICE_MAX +}; 
+#define NFTA_DEVICE_MAX (__NFTA_DEVICE_MAX - 1) + + +/** * enum nft_trace_attributes - nf_tables trace netlink attributes * * @NFTA_TRACE_TABLE: name of the table (NLA_STRING) diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index 64e1ee091225..efd9405a8a5e 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -17,6 +17,7 @@ #include #include #include +#include #include #include #include @@ -24,6 +25,7 @@ static LIST_HEAD(nf_tables_expressions); static LIST_HEAD(nf_tables_objects); +static LIST_HEAD(nf_tables_flowtables); /** * nft_register_afinfo - register nf_tables address family info @@ -348,6 +350,40 @@ static int nft_delobj(struct nft_ctx *ctx, struct nft_object *obj) return err; } +static int nft_trans_flowtable_add(struct nft_ctx *ctx, int msg_type, + struct nft_flowtable *flowtable) +{ + struct nft_trans *trans; + + trans = nft_trans_alloc(ctx, msg_type, + sizeof(struct nft_trans_flowtable)); + if (trans == NULL) + return -ENOMEM; + + if (msg_type == NFT_MSG_NEWFLOWTABLE) + nft_activate_next(ctx->net, flowtable); + + nft_trans_flowtable(trans) = flowtable; + list_add_tail(&trans->list, &ctx->net->nft.commit_list); + + return 0; +} + +static int nft_delflowtable(struct nft_ctx *ctx, + struct nft_flowtable *flowtable) +{ + int err; + + err = nft_trans_flowtable_add(ctx, NFT_MSG_DELFLOWTABLE, flowtable); + if (err < 0) + return err; + + nft_deactivate_next(ctx->net, flowtable); + ctx->table->use--; + + return err; +} + /* * Tables */ @@ -733,6 +769,7 @@ static int nf_tables_newtable(struct net *net, struct sock *nlsk, INIT_LIST_HEAD(&table->chains); INIT_LIST_HEAD(&table->sets); INIT_LIST_HEAD(&table->objects); + INIT_LIST_HEAD(&table->flowtables); table->flags = flags; nft_ctx_init(&ctx, net, skb, nlh, afi, table, NULL, nla); @@ -754,10 +791,11 @@ static int nf_tables_newtable(struct net *net, struct sock *nlsk, static int nft_flush_table(struct nft_ctx *ctx) { - int err; + struct nft_flowtable 
*flowtable, *nft; struct nft_chain *chain, *nc; struct nft_object *obj, *ne; struct nft_set *set, *ns; + int err; list_for_each_entry(chain, &ctx->table->chains, list) { if (!nft_is_active_next(ctx->net, chain)) @@ -783,6 +821,12 @@ static int nft_flush_table(struct nft_ctx *ctx) goto out; } + list_for_each_entry_safe(flowtable, nft, &ctx->table->flowtables, list) { + err = nft_delflowtable(ctx, flowtable); + if (err < 0) + goto out; + } + list_for_each_entry_safe(obj, ne, &ctx->table->objects, list) { err = nft_delobj(ctx, obj); if (err < 0) @@ -4779,6 +4823,605 @@ static void nf_tables_obj_notify(const struct nft_ctx *ctx, ctx->afi->family, ctx->report, GFP_KERNEL); } +/* + * Flow tables + */ +void nft_register_flowtable_type(struct nf_flowtable_type *type) +{ + nfnl_lock(NFNL_SUBSYS_NFTABLES); + list_add_tail_rcu(&type->list, &nf_tables_flowtables); + nfnl_unlock(NFNL_SUBSYS_NFTABLES); +} +EXPORT_SYMBOL_GPL(nft_register_flowtable_type); + +void nft_unregister_flowtable_type(struct nf_flowtable_type *type) +{ + nfnl_lock(NFNL_SUBSYS_NFTABLES); + list_del_rcu(&type->list); + nfnl_unlock(NFNL_SUBSYS_NFTABLES); +} +EXPORT_SYMBOL_GPL(nft_unregister_flowtable_type); + +static const struct nla_policy nft_flowtable_policy[NFTA_FLOWTABLE_MAX + 1] = { + [NFTA_FLOWTABLE_TABLE] = { .type = NLA_STRING, + .len = NFT_NAME_MAXLEN - 1 }, + [NFTA_FLOWTABLE_NAME] = { .type = NLA_STRING, + .len = NFT_NAME_MAXLEN - 1 }, + [NFTA_FLOWTABLE_HOOK] = { .type = NLA_NESTED }, +}; + +struct nft_flowtable *nf_tables_flowtable_lookup(const struct nft_table *table, + const struct nlattr *nla, + u8 genmask) +{ + struct nft_flowtable *flowtable; + + list_for_each_entry(flowtable, &table->flowtables, list) { + if (!nla_strcmp(nla, flowtable->name) && + nft_active_genmask(flowtable, genmask)) + return flowtable; + } + return ERR_PTR(-ENOENT); +} +EXPORT_SYMBOL_GPL(nf_tables_flowtable_lookup); + +#define NFT_FLOWTABLE_DEVICE_MAX 8 + +static int nf_tables_parse_devices(const struct nft_ctx *ctx, + 
const struct nlattr *attr, + struct net_device *dev_array[], int *len) +{ + const struct nlattr *tmp; + struct net_device *dev; + char ifname[IFNAMSIZ]; + int rem, n = 0, err; + + nla_for_each_nested(tmp, attr, rem) { + if (nla_type(tmp) != NFTA_DEVICE_NAME) { + err = -EINVAL; + goto err1; + } + + nla_strlcpy(ifname, tmp, IFNAMSIZ); + dev = dev_get_by_name(ctx->net, ifname); + if (!dev) { + err = -ENOENT; + goto err1; + } + + dev_array[n++] = dev; + if (n == NFT_FLOWTABLE_DEVICE_MAX) { + err = -EFBIG; + goto err1; + } + } + if (!len) + return -EINVAL; + + err = 0; +err1: + *len = n; + return err; +} + +static const struct nla_policy nft_flowtable_hook_policy[NFTA_FLOWTABLE_HOOK_MAX + 1] = { + [NFTA_FLOWTABLE_HOOK_NUM] = { .type = NLA_U32 }, + [NFTA_FLOWTABLE_HOOK_PRIORITY] = { .type = NLA_U32 }, + [NFTA_FLOWTABLE_HOOK_DEVS] = { .type = NLA_NESTED }, +}; + +static int nf_tables_flowtable_parse_hook(const struct nft_ctx *ctx, + const struct nlattr *attr, + struct nft_flowtable *flowtable) +{ + struct net_device *dev_array[NFT_FLOWTABLE_DEVICE_MAX]; + struct nlattr *tb[NFTA_FLOWTABLE_HOOK_MAX + 1]; + struct nf_hook_ops *ops; + int hooknum, priority; + int err, n = 0, i; + + err = nla_parse_nested(tb, NFTA_FLOWTABLE_HOOK_MAX, attr, + nft_flowtable_hook_policy, NULL); + if (err < 0) + return err; + + if (!tb[NFTA_FLOWTABLE_HOOK_NUM] || + !tb[NFTA_FLOWTABLE_HOOK_PRIORITY] || + !tb[NFTA_FLOWTABLE_HOOK_DEVS]) + return -EINVAL; + + hooknum = ntohl(nla_get_be32(tb[NFTA_FLOWTABLE_HOOK_NUM])); + if (hooknum >= ctx->afi->nhooks) + return -EINVAL; + + priority = ntohl(nla_get_be32(tb[NFTA_FLOWTABLE_HOOK_PRIORITY])); + + err = nf_tables_parse_devices(ctx, tb[NFTA_FLOWTABLE_HOOK_DEVS], + dev_array, &n); + if (err < 0) + goto err1; + + ops = kmalloc(sizeof(struct nf_hook_ops) * n, GFP_KERNEL); + if (!ops) { + err = -ENOMEM; + goto err1; + } + + flowtable->ops = ops; + flowtable->ops_len = n; + + for (i = 0; i < n; i++) { + flowtable->ops[i].pf = NFPROTO_NETDEV; + 
flowtable->ops[i].hooknum = hooknum; + flowtable->ops[i].priority = priority; + flowtable->ops[i].priv = &flowtable->data.rhashtable; + flowtable->ops[i].hook = flowtable->data.type->hook; + flowtable->ops[i].dev = dev_array[i]; + } + + err = 0; +err1: + for (i = 0; i < n; i++) + dev_put(dev_array[i]); + + return err; +} + +static const struct nf_flowtable_type * +__nft_flowtable_type_get(const struct nft_af_info *afi) +{ + const struct nf_flowtable_type *type; + + list_for_each_entry(type, &nf_tables_flowtables, list) { + if (afi->family == type->family) + return type; + } + return NULL; +} + +static const struct nf_flowtable_type * +nft_flowtable_type_get(const struct nft_af_info *afi) +{ + const struct nf_flowtable_type *type; + + type = __nft_flowtable_type_get(afi); + if (type != NULL && try_module_get(type->owner)) + return type; + +#ifdef CONFIG_MODULES + if (type == NULL) { + nfnl_unlock(NFNL_SUBSYS_NFTABLES); + request_module("nf-flowtable-%u", afi->family); + nfnl_lock(NFNL_SUBSYS_NFTABLES); + if (__nft_flowtable_type_get(afi)) + return ERR_PTR(-EAGAIN); + } +#endif + return ERR_PTR(-ENOENT); +} + +void nft_flow_table_iterate(struct net *net, + void (*iter)(struct nf_flowtable *flowtable, void *data), + void *data) +{ + struct nft_flowtable *flowtable; + const struct nft_af_info *afi; + const struct nft_table *table; + + rcu_read_lock(); + list_for_each_entry_rcu(afi, &net->nft.af_info, list) { + list_for_each_entry_rcu(table, &afi->tables, list) { + list_for_each_entry_rcu(flowtable, &table->flowtables, list) { + iter(&flowtable->data, data); + } + } + } + rcu_read_unlock(); +} +EXPORT_SYMBOL_GPL(nft_flow_table_iterate); + +static void nft_unregister_flowtable_net_hooks(struct net *net, + struct nft_flowtable *flowtable) +{ + int i; + + for (i = 0; i < flowtable->ops_len; i++) { + if (!flowtable->ops[i].dev) + continue; + + nf_unregister_net_hook(net, &flowtable->ops[i]); + } +} + +static int nf_tables_newflowtable(struct net *net, struct sock *nlsk, + 
struct sk_buff *skb, + const struct nlmsghdr *nlh, + const struct nlattr * const nla[], + struct netlink_ext_ack *extack) +{ + const struct nfgenmsg *nfmsg = nlmsg_data(nlh); + const struct nf_flowtable_type *type; + u8 genmask = nft_genmask_next(net); + int family = nfmsg->nfgen_family; + struct nft_flowtable *flowtable; + struct nft_af_info *afi; + struct nft_table *table; + struct nft_ctx ctx; + int err, i, k; + + if (!nla[NFTA_FLOWTABLE_TABLE] || + !nla[NFTA_FLOWTABLE_NAME] || + !nla[NFTA_FLOWTABLE_HOOK]) + return -EINVAL; + + afi = nf_tables_afinfo_lookup(net, family, true); + if (IS_ERR(afi)) + return PTR_ERR(afi); + + table = nf_tables_table_lookup(afi, nla[NFTA_FLOWTABLE_TABLE], genmask); + if (IS_ERR(table)) + return PTR_ERR(table); + + flowtable = nf_tables_flowtable_lookup(table, nla[NFTA_FLOWTABLE_NAME], + genmask); + if (IS_ERR(flowtable)) { + err = PTR_ERR(flowtable); + if (err != -ENOENT) + return err; + } else { + if (nlh->nlmsg_flags & NLM_F_EXCL) + return -EEXIST; + + return 0; + } + + nft_ctx_init(&ctx, net, skb, nlh, afi, table, NULL, nla); + + flowtable = kzalloc(sizeof(*flowtable), GFP_KERNEL); + if (!flowtable) + return -ENOMEM; + + flowtable->table = table; + flowtable->name = nla_strdup(nla[NFTA_FLOWTABLE_NAME], GFP_KERNEL); + if (!flowtable->name) { + err = -ENOMEM; + goto err1; + } + + type = nft_flowtable_type_get(afi); + if (IS_ERR(type)) { + err = PTR_ERR(type); + goto err2; + } + + flowtable->data.type = type; + err = rhashtable_init(&flowtable->data.rhashtable, type->params); + if (err < 0) + goto err3; + + err = nf_tables_flowtable_parse_hook(&ctx, nla[NFTA_FLOWTABLE_HOOK], + flowtable); + if (err < 0) + goto err3; + + for (i = 0; i < flowtable->ops_len; i++) { + err = nf_register_net_hook(net, &flowtable->ops[i]); + if (err < 0) + goto err4; + } + + err = nft_trans_flowtable_add(&ctx, NFT_MSG_NEWFLOWTABLE, flowtable); + if (err < 0) + goto err5; + + INIT_DEFERRABLE_WORK(&flowtable->data.gc_work, type->gc); + 
queue_delayed_work(system_power_efficient_wq, + &flowtable->data.gc_work, HZ); + + list_add_tail_rcu(&flowtable->list, &table->flowtables); + table->use++; + + return 0; +err5: + i = flowtable->ops_len - 1; +err4: + for (k = i; k >= 0; k--) + nf_unregister_net_hook(net, &flowtable->ops[i]); + + kfree(flowtable->ops); +err3: + module_put(type->owner); +err2: + kfree(flowtable->name); +err1: + kfree(flowtable); + return err; +} + +static int nf_tables_delflowtable(struct net *net, struct sock *nlsk, + struct sk_buff *skb, + const struct nlmsghdr *nlh, + const struct nlattr * const nla[], + struct netlink_ext_ack *extack) +{ + const struct nfgenmsg *nfmsg = nlmsg_data(nlh); + u8 genmask = nft_genmask_next(net); + int family = nfmsg->nfgen_family; + struct nft_flowtable *flowtable; + struct nft_af_info *afi; + struct nft_table *table; + struct nft_ctx ctx; + + afi = nf_tables_afinfo_lookup(net, family, true); + if (IS_ERR(afi)) + return PTR_ERR(afi); + + table = nf_tables_table_lookup(afi, nla[NFTA_FLOWTABLE_TABLE], genmask); + if (IS_ERR(table)) + return PTR_ERR(table); + + flowtable = nf_tables_flowtable_lookup(table, nla[NFTA_FLOWTABLE_NAME], + genmask); + if (IS_ERR(flowtable)) + return PTR_ERR(flowtable); + if (flowtable->use > 0) + return -EBUSY; + + nft_ctx_init(&ctx, net, skb, nlh, afi, table, NULL, nla); + + return nft_delflowtable(&ctx, flowtable); +} + +static int nf_tables_fill_flowtable_info(struct sk_buff *skb, struct net *net, + u32 portid, u32 seq, int event, + u32 flags, int family, + struct nft_flowtable *flowtable) +{ + struct nlattr *nest, *nest_devs; + struct nfgenmsg *nfmsg; + struct nlmsghdr *nlh; + int i; + + event = nfnl_msg_type(NFNL_SUBSYS_NFTABLES, event); + nlh = nlmsg_put(skb, portid, seq, event, sizeof(struct nfgenmsg), flags); + if (nlh == NULL) + goto nla_put_failure; + + nfmsg = nlmsg_data(nlh); + nfmsg->nfgen_family = family; + nfmsg->version = NFNETLINK_V0; + nfmsg->res_id = htons(net->nft.base_seq & 0xffff); + + if 
(nla_put_string(skb, NFTA_FLOWTABLE_TABLE, flowtable->table->name) || + nla_put_string(skb, NFTA_FLOWTABLE_NAME, flowtable->name) || + nla_put_be32(skb, NFTA_FLOWTABLE_USE, htonl(flowtable->use))) + goto nla_put_failure; + + nest = nla_nest_start(skb, NFTA_FLOWTABLE_HOOK); + if (nla_put_be32(skb, NFTA_FLOWTABLE_HOOK_NUM, htonl(flowtable->hooknum)) || + nla_put_be32(skb, NFTA_FLOWTABLE_HOOK_PRIORITY, htonl(flowtable->priority))) + goto nla_put_failure; + + nest_devs = nla_nest_start(skb, NFTA_FLOWTABLE_HOOK_DEVS); + if (!nest_devs) + goto nla_put_failure; + + for (i = 0; i < flowtable->ops_len; i++) { + if (flowtable->ops[i].dev && + nla_put_string(skb, NFTA_DEVICE_NAME, + flowtable->ops[i].dev->name)) + goto nla_put_failure; + } + nla_nest_end(skb, nest_devs); + nla_nest_end(skb, nest); + + nlmsg_end(skb, nlh); + return 0; + +nla_put_failure: + nlmsg_trim(skb, nlh); + return -1; +} + +struct nft_flowtable_filter { + char *table; +}; + +static int nf_tables_dump_flowtable(struct sk_buff *skb, + struct netlink_callback *cb) +{ + const struct nfgenmsg *nfmsg = nlmsg_data(cb->nlh); + struct nft_flowtable_filter *filter = cb->data; + unsigned int idx = 0, s_idx = cb->args[0]; + struct net *net = sock_net(skb->sk); + int family = nfmsg->nfgen_family; + struct nft_flowtable *flowtable; + const struct nft_af_info *afi; + const struct nft_table *table; + + rcu_read_lock(); + cb->seq = net->nft.base_seq; + + list_for_each_entry_rcu(afi, &net->nft.af_info, list) { + if (family != NFPROTO_UNSPEC && family != afi->family) + continue; + + list_for_each_entry_rcu(table, &afi->tables, list) { + list_for_each_entry_rcu(flowtable, &table->flowtables, list) { + if (!nft_is_active(net, flowtable)) + goto cont; + if (idx < s_idx) + goto cont; + if (idx > s_idx) + memset(&cb->args[1], 0, + sizeof(cb->args) - sizeof(cb->args[0])); + if (filter && filter->table[0] && + strcmp(filter->table, table->name)) + goto cont; + + if (nf_tables_fill_flowtable_info(skb, net, 
NETLINK_CB(cb->skb).portid, + cb->nlh->nlmsg_seq, + NFT_MSG_NEWFLOWTABLE, + NLM_F_MULTI | NLM_F_APPEND, + afi->family, flowtable) < 0) + goto done; + + nl_dump_check_consistent(cb, nlmsg_hdr(skb)); +cont: + idx++; + } + } + } +done: + rcu_read_unlock(); + + cb->args[0] = idx; + return skb->len; +} + +static int nf_tables_dump_flowtable_done(struct netlink_callback *cb) +{ + struct nft_flowtable_filter *filter = cb->data; + + if (!filter) + return 0; + + kfree(filter->table); + kfree(filter); + + return 0; +} + +static struct nft_flowtable_filter * +nft_flowtable_filter_alloc(const struct nlattr * const nla[]) +{ + struct nft_flowtable_filter *filter; + + filter = kzalloc(sizeof(*filter), GFP_KERNEL); + if (!filter) + return ERR_PTR(-ENOMEM); + + if (nla[NFTA_FLOWTABLE_TABLE]) { + filter->table = nla_strdup(nla[NFTA_FLOWTABLE_TABLE], + GFP_KERNEL); + if (!filter->table) { + kfree(filter); + return ERR_PTR(-ENOMEM); + } + } + return filter; +} + +static int nf_tables_getflowtable(struct net *net, struct sock *nlsk, + struct sk_buff *skb, + const struct nlmsghdr *nlh, + const struct nlattr * const nla[], + struct netlink_ext_ack *extack) +{ + const struct nfgenmsg *nfmsg = nlmsg_data(nlh); + u8 genmask = nft_genmask_cur(net); + int family = nfmsg->nfgen_family; + struct nft_flowtable *flowtable; + const struct nft_af_info *afi; + const struct nft_table *table; + struct sk_buff *skb2; + int err; + + if (nlh->nlmsg_flags & NLM_F_DUMP) { + struct netlink_dump_control c = { + .dump = nf_tables_dump_flowtable, + .done = nf_tables_dump_flowtable_done, + }; + + if (nla[NFTA_FLOWTABLE_TABLE]) { + struct nft_flowtable_filter *filter; + + filter = nft_flowtable_filter_alloc(nla); + if (IS_ERR(filter)) + return -ENOMEM; + + c.data = filter; + } + return netlink_dump_start(nlsk, skb, nlh, &c); + } + + if (!nla[NFTA_FLOWTABLE_NAME]) + return -EINVAL; + + afi = nf_tables_afinfo_lookup(net, family, false); + if (IS_ERR(afi)) + return PTR_ERR(afi); + + table = 
nf_tables_table_lookup(afi, nla[NFTA_FLOWTABLE_TABLE], genmask); + if (IS_ERR(table)) + return PTR_ERR(table); + + flowtable = nf_tables_flowtable_lookup(table, nla[NFTA_FLOWTABLE_NAME], + genmask); + if (IS_ERR(table)) + return PTR_ERR(flowtable); + + skb2 = alloc_skb(NLMSG_GOODSIZE, GFP_KERNEL); + if (!skb2) + return -ENOMEM; + + err = nf_tables_fill_flowtable_info(skb2, net, NETLINK_CB(skb).portid, + nlh->nlmsg_seq, + NFT_MSG_NEWFLOWTABLE, 0, family, + flowtable); + if (err < 0) + goto err; + + return nlmsg_unicast(nlsk, skb2, NETLINK_CB(skb).portid); +err: + kfree_skb(skb2); + return err; +} + +static void nf_tables_flowtable_notify(struct nft_ctx *ctx, + struct nft_flowtable *flowtable, + int event) +{ + struct sk_buff *skb; + int err; + + if (ctx->report && + !nfnetlink_has_listeners(ctx->net, NFNLGRP_NFTABLES)) + return; + + skb = nlmsg_new(NLMSG_GOODSIZE, GFP_KERNEL); + if (skb == NULL) + goto err; + + err = nf_tables_fill_flowtable_info(skb, ctx->net, ctx->portid, + ctx->seq, event, 0, + ctx->afi->family, flowtable); + if (err < 0) { + kfree_skb(skb); + goto err; + } + + nfnetlink_send(skb, ctx->net, ctx->portid, NFNLGRP_NFTABLES, + ctx->report, GFP_KERNEL); + return; +err: + nfnetlink_set_err(ctx->net, ctx->portid, NFNLGRP_NFTABLES, -ENOBUFS); +} + +static void nft_flowtable_destroy(void *ptr, void *arg) +{ + kfree(ptr); +} + +static void nf_tables_flowtable_destroy(struct nft_flowtable *flowtable) +{ + cancel_delayed_work_sync(&flowtable->data.gc_work); + kfree(flowtable->name); + rhashtable_free_and_destroy(&flowtable->data.rhashtable, + nft_flowtable_destroy, NULL); + module_put(flowtable->data.type->owner); +} + static int nf_tables_fill_gen_info(struct sk_buff *skb, struct net *net, u32 portid, u32 seq) { @@ -4809,6 +5452,49 @@ static int nf_tables_fill_gen_info(struct sk_buff *skb, struct net *net, return -EMSGSIZE; } +static void nft_flowtable_event(unsigned long event, struct net_device *dev, + struct nft_flowtable *flowtable) +{ + int i; + + for 
(i = 0; i < flowtable->ops_len; i++) { + if (flowtable->ops[i].dev != dev) + continue; + + nf_unregister_net_hook(dev_net(dev), &flowtable->ops[i]); + flowtable->ops[i].dev = NULL; + break; + } +} + +static int nf_tables_flowtable_event(struct notifier_block *this, + unsigned long event, void *ptr) +{ + struct net_device *dev = netdev_notifier_info_to_dev(ptr); + struct nft_flowtable *flowtable; + struct nft_table *table; + struct nft_af_info *afi; + + if (event != NETDEV_UNREGISTER) + return 0; + + nfnl_lock(NFNL_SUBSYS_NFTABLES); + list_for_each_entry(afi, &dev_net(dev)->nft.af_info, list) { + list_for_each_entry(table, &afi->tables, list) { + list_for_each_entry(flowtable, &table->flowtables, list) { + nft_flowtable_event(event, dev, flowtable); + } + } + } + nfnl_unlock(NFNL_SUBSYS_NFTABLES); + + return NOTIFY_DONE; +} + +static struct notifier_block nf_tables_flowtable_notifier = { + .notifier_call = nf_tables_flowtable_event, +}; + static void nf_tables_gen_notify(struct net *net, struct sk_buff *skb, int event) { @@ -4961,6 +5647,21 @@ static const struct nfnl_callback nf_tables_cb[NFT_MSG_MAX] = { .attr_count = NFTA_OBJ_MAX, .policy = nft_obj_policy, }, + [NFT_MSG_NEWFLOWTABLE] = { + .call_batch = nf_tables_newflowtable, + .attr_count = NFTA_FLOWTABLE_MAX, + .policy = nft_flowtable_policy, + }, + [NFT_MSG_GETFLOWTABLE] = { + .call = nf_tables_getflowtable, + .attr_count = NFTA_FLOWTABLE_MAX, + .policy = nft_flowtable_policy, + }, + [NFT_MSG_DELFLOWTABLE] = { + .call_batch = nf_tables_delflowtable, + .attr_count = NFTA_FLOWTABLE_MAX, + .policy = nft_flowtable_policy, + }, }; static void nft_chain_commit_update(struct nft_trans *trans) @@ -5006,6 +5707,9 @@ static void nf_tables_commit_release(struct nft_trans *trans) case NFT_MSG_DELOBJ: nft_obj_destroy(nft_trans_obj(trans)); break; + case NFT_MSG_DELFLOWTABLE: + nf_tables_flowtable_destroy(nft_trans_flowtable(trans)); + break; } kfree(trans); } @@ -5124,6 +5828,21 @@ static int nf_tables_commit(struct net 
*net, struct sk_buff *skb) nf_tables_obj_notify(&trans->ctx, nft_trans_obj(trans), NFT_MSG_DELOBJ); break; + case NFT_MSG_NEWFLOWTABLE: + nft_clear(net, nft_trans_flowtable(trans)); + nf_tables_flowtable_notify(&trans->ctx, + nft_trans_flowtable(trans), + NFT_MSG_NEWFLOWTABLE); + nft_trans_destroy(trans); + break; + case NFT_MSG_DELFLOWTABLE: + list_del_rcu(&nft_trans_flowtable(trans)->list); + nf_tables_flowtable_notify(&trans->ctx, + nft_trans_flowtable(trans), + NFT_MSG_DELFLOWTABLE); + nft_unregister_flowtable_net_hooks(net, + nft_trans_flowtable(trans)); + break; } } @@ -5161,6 +5880,9 @@ static void nf_tables_abort_release(struct nft_trans *trans) case NFT_MSG_NEWOBJ: nft_obj_destroy(nft_trans_obj(trans)); break; + case NFT_MSG_NEWFLOWTABLE: + nf_tables_flowtable_destroy(nft_trans_flowtable(trans)); + break; } kfree(trans); } @@ -5251,6 +5973,17 @@ static int nf_tables_abort(struct net *net, struct sk_buff *skb) nft_clear(trans->ctx.net, nft_trans_obj(trans)); nft_trans_destroy(trans); break; + case NFT_MSG_NEWFLOWTABLE: + trans->ctx.table->use--; + list_del_rcu(&nft_trans_flowtable(trans)->list); + nft_unregister_flowtable_net_hooks(net, + nft_trans_flowtable(trans)); + break; + case NFT_MSG_DELFLOWTABLE: + trans->ctx.table->use++; + nft_clear(trans->ctx.net, nft_trans_flowtable(trans)); + nft_trans_destroy(trans); + break; } } @@ -5802,6 +6535,7 @@ EXPORT_SYMBOL_GPL(__nft_release_basechain); /* Called by nft_unregister_afinfo() from __net_exit path, nfnl_lock is held. 
*/ static void __nft_release_afinfo(struct net *net, struct nft_af_info *afi) { + struct nft_flowtable *flowtable, *nf; struct nft_table *table, *nt; struct nft_chain *chain, *nc; struct nft_object *obj, *ne; @@ -5816,6 +6550,9 @@ static void __nft_release_afinfo(struct net *net, struct nft_af_info *afi) list_for_each_entry(chain, &table->chains, list) nf_tables_unregister_hooks(net, table, chain, afi->nops); + list_for_each_entry(flowtable, &table->flowtables, list) + nf_unregister_net_hooks(net, flowtable->ops, + flowtable->ops_len); /* No packets are walking on these chains anymore. */ ctx.table = table; list_for_each_entry(chain, &table->chains, list) { @@ -5826,6 +6563,11 @@ static void __nft_release_afinfo(struct net *net, struct nft_af_info *afi) nf_tables_rule_destroy(&ctx, rule); } } + list_for_each_entry_safe(flowtable, nf, &table->flowtables, list) { + list_del(&flowtable->list); + table->use--; + nf_tables_flowtable_destroy(flowtable); + } list_for_each_entry_safe(set, ns, &table->sets, list) { list_del(&set->list); table->use--; @@ -5869,6 +6611,8 @@ static int __init nf_tables_module_init(void) if (err < 0) goto err3; + register_netdevice_notifier(&nf_tables_flowtable_notifier); + pr_info("nf_tables: (c) 2007-2009 Patrick McHardy \n"); return register_pernet_subsys(&nf_tables_net_ops); err3: @@ -5883,6 +6627,7 @@ static void __exit nf_tables_module_exit(void) { unregister_pernet_subsys(&nf_tables_net_ops); nfnetlink_subsys_unregister(&nf_tables_subsys); + unregister_netdevice_notifier(&nf_tables_flowtable_notifier); rcu_barrier(); nf_tables_core_module_exit(); kfree(info); From patchwork Fri Dec 22 19:27:27 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pablo Neira Ayuso X-Patchwork-Id: 852535 X-Patchwork-Delegate: pablo@netfilter.org Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; 
From: Pablo Neira Ayuso
To: netfilter-devel@vger.kernel.org
Cc: netdev@vger.kernel.org, f.fainelli@gmail.com, simon.horman@netronome.com, ronye@mellanox.com, jiri@mellanox.com, nbd@nbd.name, john@phrozen.org, kubakici@wp.pl, fw@strlen.de
Subject: [PATCH nf-next, v3 2/7] netfilter: add generic flow table infrastructure
Date: Fri, 22 Dec 2017 20:27:27 +0100
Message-Id: <20171222192732.13188-3-pablo@netfilter.org>
X-Mailer: git-send-email 2.11.0
In-Reply-To: <20171222192732.13188-1-pablo@netfilter.org>
References: <20171222192732.13188-1-pablo@netfilter.org>

This patch defines the API to interact with flow tables: it allows adding,
deleting and looking up entries in the flow table. It also adds the generic
garbage collection code that removes expired entries, i.e. entries for which
no traffic has been seen for a while.

Users of the flow table infrastructure can delete entries via
flow_offload_dead(), which sets the dying bit; this signals the garbage
collector to release the entry from user context.
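The entry lifecycle described above can be modelled in a few lines of userspace C. The lifecycle is: add, look up in either direction, mark as dying, and periodic garbage collection of expired entries. This is a simplified sketch, not the kernel API, and it makes three substitutions:

- All `model_*` names are hypothetical.
- A fixed array stands in for the rhashtable.
- Plain `uint32_t` ticks stand in for jiffies.

The expiry test reuses the wraparound-safe signed comparison that nf_flow_has_expired() applies to `flow->timeout`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model of the flow table; not the kernel API. */
#define MODEL_FLOW_DYING 0x4u /* plays the role of FLOW_OFFLOAD_DYING */
#define MODEL_TABLE_SIZE 8

struct model_tuple {
	uint32_t src, dst;
	uint16_t sport, dport;
};

struct model_flow {
	struct model_tuple tuple[2]; /* [0] = original, [1] = reply */
	uint32_t flags;
	uint32_t timeout;            /* absolute expiry time, in ticks */
	bool in_use;
};

static struct model_flow table[MODEL_TABLE_SIZE];

static bool tuple_eq(const struct model_tuple *a, const struct model_tuple *b)
{
	return a->src == b->src && a->dst == b->dst &&
	       a->sport == b->sport && a->dport == b->dport;
}

/* Insert one flow; both of its directions become reachable by lookup. */
static struct model_flow *model_flow_add(const struct model_tuple *orig,
					 const struct model_tuple *reply,
					 uint32_t timeout)
{
	for (size_t i = 0; i < MODEL_TABLE_SIZE; i++) {
		if (!table[i].in_use) {
			table[i].tuple[0] = *orig;
			table[i].tuple[1] = *reply;
			table[i].flags = 0;
			table[i].timeout = timeout;
			table[i].in_use = true;
			return &table[i];
		}
	}
	return NULL; /* table full */
}

/* Look up a packet tuple; on a hit, *dir says which direction matched. */
static struct model_flow *model_flow_lookup(const struct model_tuple *t, int *dir)
{
	for (size_t i = 0; i < MODEL_TABLE_SIZE; i++) {
		if (!table[i].in_use)
			continue;
		for (int d = 0; d < 2; d++) {
			if (tuple_eq(&table[i].tuple[d], t)) {
				*dir = d;
				return &table[i];
			}
		}
	}
	return NULL;
}

/* Like flow_offload_dead(): only set a flag; the collector frees it later. */
static void model_flow_dead(struct model_flow *f)
{
	f->flags |= MODEL_FLOW_DYING;
}

/* Wraparound-safe expiry test, same shape as nf_flow_has_expired(). */
static bool model_flow_expired(const struct model_flow *f, uint32_t now)
{
	return (int32_t)(f->timeout - now) <= 0;
}

/* One collector pass: drop expired or dying flows, return how many. */
static int model_flow_gc(uint32_t now)
{
	int removed = 0;

	for (size_t i = 0; i < MODEL_TABLE_SIZE; i++) {
		struct model_flow *f = &table[i];

		if (f->in_use && (model_flow_expired(f, now) ||
				  (f->flags & MODEL_FLOW_DYING))) {
			f->in_use = false;
			removed++;
		}
	}
	return removed;
}
```

In the model, as in the description above, marking a flow dead only sets a flag. The entry actually disappears on the next collector pass. The sketch does not model RCU or concurrent lookups.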
Signed-off-by: Pablo Neira Ayuso
---
 include/net/netfilter/nf_flow_table.h |  94 ++++++++
 net/netfilter/Kconfig                 |   7 +
 net/netfilter/Makefile                |   3 +
 net/netfilter/nf_flow_table.c         | 434 ++++++++++++++++++++++++++++++++++
 4 files changed, 538 insertions(+)
 create mode 100644 net/netfilter/nf_flow_table.c

diff --git a/include/net/netfilter/nf_flow_table.h b/include/net/netfilter/nf_flow_table.h
index 3a0779589281..161f71ca78a0 100644
--- a/include/net/netfilter/nf_flow_table.h
+++ b/include/net/netfilter/nf_flow_table.h
@@ -1,7 +1,12 @@
 #ifndef _NF_FLOW_TABLE_H
 #define _NF_FLOW_TABLE_H
 
+#include
+#include
+#include
 #include
+#include
+#include
 
 struct nf_flowtable;
 
@@ -20,4 +25,93 @@ struct nf_flowtable {
 	struct delayed_work		gc_work;
 };
 
+enum flow_offload_tuple_dir {
+	FLOW_OFFLOAD_DIR_ORIGINAL,
+	FLOW_OFFLOAD_DIR_REPLY,
+	__FLOW_OFFLOAD_DIR_MAX		= FLOW_OFFLOAD_DIR_REPLY,
+};
+#define FLOW_OFFLOAD_DIR_MAX	(__FLOW_OFFLOAD_DIR_MAX + 1)
+
+struct flow_offload_tuple {
+	union {
+		struct in_addr		src_v4;
+		struct in6_addr		src_v6;
+	};
+	union {
+		struct in_addr		dst_v4;
+		struct in6_addr		dst_v6;
+	};
+	struct {
+		__be16			src_port;
+		__be16			dst_port;
+	};
+
+	int				iifidx;
+
+	u8				l3proto;
+	u8				l4proto;
+	u8				dir;
+
+	int				oifidx;
+
+	struct dst_entry		*dst_cache;
+};
+
+struct flow_offload_tuple_rhash {
+	struct rhash_head		node;
+	struct flow_offload_tuple	tuple;
+};
+
+#define FLOW_OFFLOAD_SNAT	0x1
+#define FLOW_OFFLOAD_DNAT	0x2
+#define FLOW_OFFLOAD_DYING	0x4
+
+struct flow_offload {
+	struct flow_offload_tuple_rhash		tuplehash[FLOW_OFFLOAD_DIR_MAX];
+	u32					flags;
+	union {
+		/* Your private driver data here. */
+		u32		timeout;
+	};
+};
+
+#define NF_FLOW_TIMEOUT (30 * HZ)
+
+struct nf_flow_route {
+	struct {
+		struct dst_entry	*dst;
+		int			ifindex;
+	} tuple[FLOW_OFFLOAD_DIR_MAX];
+};
+
+struct flow_offload *flow_offload_alloc(struct nf_conn *ct,
+					struct nf_flow_route *route);
+void flow_offload_free(struct flow_offload *flow);
+
+int flow_offload_add(struct nf_flowtable *flow_table, struct flow_offload *flow);
+void flow_offload_del(struct nf_flowtable *flow_table, struct flow_offload *flow);
+struct flow_offload_tuple_rhash *flow_offload_lookup(struct nf_flowtable *flow_table,
+						     struct flow_offload_tuple *tuple);
+int nf_flow_table_iterate(struct nf_flowtable *flow_table,
+			  void (*iter)(struct flow_offload *flow, void *data),
+			  void *data);
+void nf_flow_offload_work_gc(struct work_struct *work);
+extern const struct rhashtable_params nf_flow_offload_rhash_params;
+
+void flow_offload_dead(struct flow_offload *flow);
+
+int nf_flow_snat_port(const struct flow_offload *flow,
+		      struct sk_buff *skb, unsigned int thoff,
+		      u8 protocol, enum flow_offload_tuple_dir dir);
+int nf_flow_dnat_port(const struct flow_offload *flow,
+		      struct sk_buff *skb, unsigned int thoff,
+		      u8 protocol, enum flow_offload_tuple_dir dir);
+
+struct flow_ports {
+	__be16 source, dest;
+};
+
+#define MODULE_ALIAS_NF_FLOWTABLE(family)	\
+	MODULE_ALIAS("nf-flowtable-" __stringify(family))
+
 #endif /* _FLOW_OFFLOAD_H */
diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index e4a13cc8a2e7..af0f58322515 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -649,6 +649,13 @@ endif # NF_TABLES_NETDEV
 
 endif # NF_TABLES
 
+config NF_FLOW_TABLE
+	tristate "Netfilter flow table module"
+	help
+	  This option adds the flow table core infrastructure.
+
+	  To compile it as a module, choose M here.
+
 config NETFILTER_XTABLES
 	tristate "Netfilter Xtables support (required for ip_tables)"
 	default m if NETFILTER_ADVANCED=n
diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index d3891c93edd6..1f7d92bd571a 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -106,6 +106,9 @@ obj-$(CONFIG_NFT_FIB_NETDEV)	+= nft_fib_netdev.o
 obj-$(CONFIG_NFT_DUP_NETDEV)	+= nft_dup_netdev.o
 obj-$(CONFIG_NFT_FWD_NETDEV)	+= nft_fwd_netdev.o
 
+# flow table infrastructure
+obj-$(CONFIG_NF_FLOW_TABLE)	+= nf_flow_table.o
+
 # generic X tables
 obj-$(CONFIG_NETFILTER_XTABLES) += x_tables.o xt_tcpudp.o
 
diff --git a/net/netfilter/nf_flow_table.c b/net/netfilter/nf_flow_table.c
new file mode 100644
index 000000000000..e1024b17b910
--- /dev/null
+++ b/net/netfilter/nf_flow_table.c
@@ -0,0 +1,434 @@
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+struct flow_offload_entry {
+	struct flow_offload	flow;
+	struct nf_conn		*ct;
+	struct rcu_head		rcu_head;
+};
+
+struct flow_offload *
+flow_offload_alloc(struct nf_conn *ct, struct nf_flow_route *route)
+{
+	struct flow_offload_entry *entry;
+	struct flow_offload *flow;
+
+	if (unlikely(nf_ct_is_dying(ct) ||
+	    !atomic_inc_not_zero(&ct->ct_general.use)))
+		return NULL;
+
+	entry = kzalloc(sizeof(*entry), GFP_ATOMIC);
+	if (!entry)
+		goto err_ct_refcnt;
+
+	flow = &entry->flow;
+
+	if (!dst_hold_safe(route->tuple[FLOW_OFFLOAD_DIR_ORIGINAL].dst))
+		goto err_dst_cache_original;
+
+	if (!dst_hold_safe(route->tuple[FLOW_OFFLOAD_DIR_REPLY].dst))
+		goto err_dst_cache_reply;
+
+	entry->ct = ct;
+
+	switch (ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.src.l3num) {
+	case NFPROTO_IPV4:
+		flow->tuplehash[FLOW_OFFLOAD_DIR_ORIGINAL].tuple.src_v4 =
+			ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.src.u3.in;
+		flow->tuplehash[FLOW_OFFLOAD_DIR_ORIGINAL].tuple.dst_v4 =
+			ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.dst.u3.in;
+		flow->tuplehash[FLOW_OFFLOAD_DIR_REPLY].tuple.src_v4 =
+			ct->tuplehash[IP_CT_DIR_REPLY].tuple.src.u3.in;
+		flow->tuplehash[FLOW_OFFLOAD_DIR_REPLY].tuple.dst_v4 =
+			ct->tuplehash[IP_CT_DIR_REPLY].tuple.dst.u3.in;
+		flow->tuplehash[FLOW_OFFLOAD_DIR_ORIGINAL].tuple.l3proto =
+			ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.src.l3num;
+		flow->tuplehash[FLOW_OFFLOAD_DIR_ORIGINAL].tuple.l4proto =
+			ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.dst.protonum;
+		flow->tuplehash[FLOW_OFFLOAD_DIR_REPLY].tuple.l3proto =
+			ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.src.l3num;
+		flow->tuplehash[FLOW_OFFLOAD_DIR_REPLY].tuple.l4proto =
+			ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.dst.protonum;
+		break;
+	case NFPROTO_IPV6:
+		flow->tuplehash[FLOW_OFFLOAD_DIR_ORIGINAL].tuple.src_v6 =
+			ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.src.u3.in6;
+		flow->tuplehash[FLOW_OFFLOAD_DIR_ORIGINAL].tuple.dst_v6 =
+			ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.dst.u3.in6;
+		flow->tuplehash[FLOW_OFFLOAD_DIR_REPLY].tuple.src_v6 =
+			ct->tuplehash[IP_CT_DIR_REPLY].tuple.src.u3.in6;
+		flow->tuplehash[FLOW_OFFLOAD_DIR_REPLY].tuple.dst_v6 =
+			ct->tuplehash[IP_CT_DIR_REPLY].tuple.dst.u3.in6;
+		flow->tuplehash[FLOW_OFFLOAD_DIR_ORIGINAL].tuple.l3proto =
+			ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.src.l3num;
+		flow->tuplehash[FLOW_OFFLOAD_DIR_ORIGINAL].tuple.l4proto =
+			ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.dst.protonum;
+		flow->tuplehash[FLOW_OFFLOAD_DIR_REPLY].tuple.l3proto =
+			ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.src.l3num;
+		flow->tuplehash[FLOW_OFFLOAD_DIR_REPLY].tuple.l4proto =
+			ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.dst.protonum;
+		break;
+	}
+
+	flow->tuplehash[FLOW_OFFLOAD_DIR_ORIGINAL].tuple.dst_cache =
+		  route->tuple[FLOW_OFFLOAD_DIR_ORIGINAL].dst;
+	flow->tuplehash[FLOW_OFFLOAD_DIR_REPLY].tuple.dst_cache =
+		  route->tuple[FLOW_OFFLOAD_DIR_REPLY].dst;
+
+	flow->tuplehash[FLOW_OFFLOAD_DIR_ORIGINAL].tuple.src_port =
+		ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.src.u.tcp.port;
+	flow->tuplehash[FLOW_OFFLOAD_DIR_ORIGINAL].tuple.dst_port =
+		ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.dst.u.tcp.port;
+	flow->tuplehash[FLOW_OFFLOAD_DIR_REPLY].tuple.src_port =
+		ct->tuplehash[IP_CT_DIR_REPLY].tuple.src.u.tcp.port;
+	flow->tuplehash[FLOW_OFFLOAD_DIR_REPLY].tuple.dst_port =
+		ct->tuplehash[IP_CT_DIR_REPLY].tuple.dst.u.tcp.port;
+
+	flow->tuplehash[FLOW_OFFLOAD_DIR_ORIGINAL].tuple.dir =
+						FLOW_OFFLOAD_DIR_ORIGINAL;
+	flow->tuplehash[FLOW_OFFLOAD_DIR_REPLY].tuple.dir =
+						FLOW_OFFLOAD_DIR_REPLY;
+
+	flow->tuplehash[FLOW_OFFLOAD_DIR_ORIGINAL].tuple.iifidx =
+		route->tuple[FLOW_OFFLOAD_DIR_ORIGINAL].ifindex;
+	flow->tuplehash[FLOW_OFFLOAD_DIR_ORIGINAL].tuple.oifidx =
+		route->tuple[FLOW_OFFLOAD_DIR_REPLY].ifindex;
+	flow->tuplehash[FLOW_OFFLOAD_DIR_REPLY].tuple.iifidx =
+		route->tuple[FLOW_OFFLOAD_DIR_REPLY].ifindex;
+	flow->tuplehash[FLOW_OFFLOAD_DIR_REPLY].tuple.oifidx =
+		route->tuple[FLOW_OFFLOAD_DIR_ORIGINAL].ifindex;
+
+	if (ct->status & IPS_SRC_NAT)
+		flow->flags |= FLOW_OFFLOAD_SNAT;
+	else if (ct->status & IPS_DST_NAT)
+		flow->flags |= FLOW_OFFLOAD_DNAT;
+
+	return flow;
+
+err_dst_cache_reply:
+	dst_release(route->tuple[FLOW_OFFLOAD_DIR_ORIGINAL].dst);
+err_dst_cache_original:
+	kfree(entry);
+err_ct_refcnt:
+	nf_ct_put(ct);
+
+	return NULL;
+}
+EXPORT_SYMBOL_GPL(flow_offload_alloc);
+
+void flow_offload_free(struct flow_offload *flow)
+{
+	struct flow_offload_entry *e;
+
+	dst_release(flow->tuplehash[FLOW_OFFLOAD_DIR_ORIGINAL].tuple.dst_cache);
+	dst_release(flow->tuplehash[FLOW_OFFLOAD_DIR_REPLY].tuple.dst_cache);
+	e = container_of(flow, struct flow_offload_entry, flow);
+	kfree(e);
+}
+EXPORT_SYMBOL_GPL(flow_offload_free);
+
+void flow_offload_dead(struct flow_offload *flow)
+{
+	flow->flags |= FLOW_OFFLOAD_DYING;
+}
+EXPORT_SYMBOL_GPL(flow_offload_dead);
+
+int flow_offload_add(struct nf_flowtable *flow_table, struct flow_offload *flow)
+{
+	flow->timeout = (u32)jiffies;
+
+	rhashtable_insert_fast(&flow_table->rhashtable,
+			       &flow->tuplehash[FLOW_OFFLOAD_DIR_ORIGINAL].node,
+			       *flow_table->type->params);
+	rhashtable_insert_fast(&flow_table->rhashtable,
+			       &flow->tuplehash[FLOW_OFFLOAD_DIR_REPLY].node,
+			       *flow_table->type->params);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(flow_offload_add);
+
+void flow_offload_del(struct nf_flowtable *flow_table,
+		      struct flow_offload *flow)
+{
+	struct flow_offload_entry *e;
+
+	rhashtable_remove_fast(&flow_table->rhashtable,
+			       &flow->tuplehash[FLOW_OFFLOAD_DIR_ORIGINAL].node,
+			       *flow_table->type->params);
+	rhashtable_remove_fast(&flow_table->rhashtable,
+			       &flow->tuplehash[FLOW_OFFLOAD_DIR_REPLY].node,
+			       *flow_table->type->params);
+
+	e = container_of(flow, struct flow_offload_entry, flow);
+	kfree_rcu(e, rcu_head);
+}
+EXPORT_SYMBOL_GPL(flow_offload_del);
+
+struct flow_offload_tuple_rhash *
+flow_offload_lookup(struct nf_flowtable *flow_table,
+		    struct flow_offload_tuple *tuple)
+{
+	return rhashtable_lookup_fast(&flow_table->rhashtable, tuple,
+				      *flow_table->type->params);
+}
+EXPORT_SYMBOL_GPL(flow_offload_lookup);
+
+static void nf_flow_release_ct(const struct flow_offload *flow)
+{
+	struct flow_offload_entry *e;
+
+	e = container_of(flow, struct flow_offload_entry, flow);
+	nf_ct_delete(e->ct, 0, 0);
+	nf_ct_put(e->ct);
+}
+
+int nf_flow_table_iterate(struct nf_flowtable *flow_table,
+			  void (*iter)(struct flow_offload *flow, void *data),
+			  void *data)
+{
+	struct flow_offload_tuple_rhash *tuplehash;
+	struct rhashtable_iter hti;
+	struct flow_offload *flow;
+	int err;
+
+	rhashtable_walk_init(&flow_table->rhashtable, &hti, GFP_KERNEL);
+	err = rhashtable_walk_start(&hti);
+	if (err && err != -EAGAIN)
+		goto out;
+
+	while ((tuplehash = rhashtable_walk_next(&hti))) {
+		if (IS_ERR(tuplehash)) {
+			err = PTR_ERR(tuplehash);
+			if (err != -EAGAIN)
+				goto out;
+
+			continue;
+		}
+		if (tuplehash->tuple.dir)
+			continue;
+
+		flow = container_of(tuplehash, struct flow_offload, tuplehash[0]);
+
+		iter(flow, data);
+	}
+out:
+	rhashtable_walk_stop(&hti);
+	rhashtable_walk_exit(&hti);
+
+	return err;
+}
+EXPORT_SYMBOL_GPL(nf_flow_table_iterate);
+
+static inline bool nf_flow_has_expired(const struct flow_offload *flow)
+{
+	return (__s32)(flow->timeout - (u32)jiffies) <= 0;
+}
+
+static inline bool nf_flow_is_dying(const struct flow_offload *flow)
+{
+	return flow->flags & FLOW_OFFLOAD_DYING;
+}
+
+void nf_flow_offload_work_gc(struct work_struct *work)
+{
+	struct flow_offload_tuple_rhash *tuplehash;
+	struct nf_flowtable *flow_table;
+	struct rhashtable_iter hti;
+	struct flow_offload *flow;
+	int err;
+
+	flow_table = container_of(work, struct nf_flowtable, gc_work.work);
+
+	rhashtable_walk_init(&flow_table->rhashtable, &hti, GFP_KERNEL);
+	err = rhashtable_walk_start(&hti);
+	if (err && err != -EAGAIN)
+		goto out;
+
+	while ((tuplehash = rhashtable_walk_next(&hti))) {
+		if (IS_ERR(tuplehash)) {
+			err = PTR_ERR(tuplehash);
+			if (err != -EAGAIN)
+				goto out;
+
+			continue;
+		}
+		if (tuplehash->tuple.dir)
+			continue;
+
+		flow = container_of(tuplehash, struct flow_offload, tuplehash[0]);
+
+		if (nf_flow_has_expired(flow) ||
+		    nf_flow_is_dying(flow)) {
+			flow_offload_del(flow_table, flow);
+			nf_flow_release_ct(flow);
+		}
+	}
+
+	rhashtable_walk_stop(&hti);
+	rhashtable_walk_exit(&hti);
+out:
+	queue_delayed_work(system_power_efficient_wq, &flow_table->gc_work, HZ);
+}
+EXPORT_SYMBOL_GPL(nf_flow_offload_work_gc);
+
+static u32 flow_offload_hash(const void *data, u32 len, u32 seed)
+{
+	const struct flow_offload_tuple *tuple = data;
+
+	return jhash(tuple, offsetof(struct flow_offload_tuple, dir), seed);
+}
+
+static u32 flow_offload_hash_obj(const void *data, u32 len, u32 seed)
+{
+	const struct flow_offload_tuple_rhash *tuplehash = data;
+
+	return jhash(&tuplehash->tuple, offsetof(struct flow_offload_tuple, dir), seed);
+}
+
+static int flow_offload_hash_cmp(struct rhashtable_compare_arg *arg,
+					const void *ptr)
+{
+	const struct flow_offload_tuple *tuple = arg->key;
+	const struct flow_offload_tuple_rhash *x = ptr;
+
+	if (memcmp(&x->tuple, tuple, offsetof(struct flow_offload_tuple, dir)))
+		return 1;
+
+	return 0;
+}
+
+const struct rhashtable_params nf_flow_offload_rhash_params = {
+	.head_offset		= offsetof(struct flow_offload_tuple_rhash, node),
+	.hashfn			= flow_offload_hash,
+	.obj_hashfn		= flow_offload_hash_obj,
+	.obj_cmpfn		= flow_offload_hash_cmp,
+	.automatic_shrinking	= true,
+};
+EXPORT_SYMBOL_GPL(nf_flow_offload_rhash_params);
+
+static int nf_flow_nat_port_tcp(struct sk_buff *skb, unsigned int thoff,
+				__be16 port, __be16 new_port)
+{
+	struct tcphdr *tcph;
+
+	if (!pskb_may_pull(skb, thoff + sizeof(*tcph)) ||
+	    skb_try_make_writable(skb, thoff + sizeof(*tcph)))
+		return -1;
+
+	tcph = (void *)(skb_network_header(skb) + thoff);
+	inet_proto_csum_replace2(&tcph->check, skb, port, new_port, true);
+
+	return 0;
+}
+
+static int nf_flow_nat_port_udp(struct sk_buff *skb, unsigned int thoff,
+				__be16 port, __be16 new_port)
+{
+	struct udphdr *udph;
+
+	if (!pskb_may_pull(skb, thoff + sizeof(*udph)) ||
+	    skb_try_make_writable(skb, thoff + sizeof(*udph)))
+		return -1;
+
+	udph = (void *)(skb_network_header(skb) + thoff);
+	if (udph->check || skb->ip_summed == CHECKSUM_PARTIAL) {
+		inet_proto_csum_replace2(&udph->check, skb, port,
+					 new_port, true);
+		if (!udph->check)
+			udph->check = CSUM_MANGLED_0;
+	}
+
+	return 0;
+}
+
+static int nf_flow_nat_port(struct sk_buff *skb, unsigned int thoff,
+			    u8 protocol, __be16 port, __be16 new_port)
+{
+	switch (protocol) {
+	case IPPROTO_TCP:
+		if (nf_flow_nat_port_tcp(skb, thoff, port, new_port) < 0)
+			return NF_DROP;
+		break;
+	case IPPROTO_UDP:
+		if (nf_flow_nat_port_udp(skb, thoff, port, new_port) < 0)
+			return NF_DROP;
+		break;
+	}
+
+	return 0;
+}
+
+int nf_flow_snat_port(const struct flow_offload *flow,
+		      struct sk_buff *skb, unsigned int thoff,
+		      u8 protocol, enum flow_offload_tuple_dir dir)
+{
+	struct flow_ports *hdr;
+	__be16 port, new_port;
+
+	if (!pskb_may_pull(skb, thoff + sizeof(*hdr)) ||
+	    skb_try_make_writable(skb, thoff + sizeof(*hdr)))
+		return -1;
+
+	hdr = (void *)(skb_network_header(skb) + thoff);
+
+	switch (dir) {
+	case FLOW_OFFLOAD_DIR_ORIGINAL:
+		port = hdr->source;
+		new_port = flow->tuplehash[FLOW_OFFLOAD_DIR_REPLY].tuple.dst_port;
+		hdr->source = new_port;
+		break;
+	case FLOW_OFFLOAD_DIR_REPLY:
+		port = hdr->dest;
+		new_port = flow->tuplehash[FLOW_OFFLOAD_DIR_ORIGINAL].tuple.src_port;
+		hdr->dest = new_port;
+		break;
+	default:
+		return -1;
+	}
+
+	return nf_flow_nat_port(skb, thoff, protocol, port, new_port);
+}
+EXPORT_SYMBOL_GPL(nf_flow_snat_port);
+
+int nf_flow_dnat_port(const struct flow_offload *flow,
+		      struct sk_buff *skb, unsigned int thoff,
+		      u8 protocol, enum flow_offload_tuple_dir dir)
+{
+	struct flow_ports *hdr;
+	__be16 port, new_port;
+
+	if (!pskb_may_pull(skb, thoff + sizeof(*hdr)) ||
+	    skb_try_make_writable(skb, thoff + sizeof(*hdr)))
+		return -1;
+
+	hdr = (void *)(skb_network_header(skb) + thoff);
+
+	switch (dir) {
+	case FLOW_OFFLOAD_DIR_ORIGINAL:
+		port = hdr->dest;
+		new_port = flow->tuplehash[FLOW_OFFLOAD_DIR_REPLY].tuple.src_port;
+		hdr->dest = new_port;
+		break;
+	case FLOW_OFFLOAD_DIR_REPLY:
+		port = hdr->source;
+		new_port = flow->tuplehash[FLOW_OFFLOAD_DIR_ORIGINAL].tuple.dst_port;
+		hdr->source = new_port;
+		break;
+	default:
+		return -1;
+	}
+
+	return nf_flow_nat_port(skb, thoff, protocol, port, new_port);
+}
+EXPORT_SYMBOL_GPL(nf_flow_dnat_port);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Pablo Neira Ayuso ");

From patchwork Fri Dec 22 19:27:28 2017
From: Pablo Neira Ayuso
To: netfilter-devel@vger.kernel.org
Cc: netdev@vger.kernel.org, f.fainelli@gmail.com, simon.horman@netronome.com, ronye@mellanox.com, jiri@mellanox.com, nbd@nbd.name, john@phrozen.org, kubakici@wp.pl, fw@strlen.de
Subject: [PATCH nf-next,v3 3/7] netfilter: flow table support for IPv4
Date: Fri, 22 Dec 2017 20:27:28 +0100
Message-Id: <20171222192732.13188-4-pablo@netfilter.org>
X-Mailer: git-send-email 2.11.0
In-Reply-To: <20171222192732.13188-1-pablo@netfilter.org>
References: <20171222192732.13188-1-pablo@netfilter.org>

This patch adds the IPv4 flow table type, which implements the datapath
flow table that forwards IPv4 traffic. The rationale is:

1) Look up the packet in the flow table from the ingress hook.
2) On a hit, decrement the TTL and pass the packet on to the neighbour
   layer for transmission.
3) On a miss, pass the packet up to the classic forwarding path.

This patch also supports layer 3 source and destination NAT.

Signed-off-by: Pablo Neira Ayuso
---
 net/ipv4/netfilter/Kconfig              |   8 +
 net/ipv4/netfilter/Makefile             |   3 +
 net/ipv4/netfilter/nf_flow_table_ipv4.c | 283 ++++++++++++++++++++++++++++++++
 3 files changed, 294 insertions(+)
 create mode 100644 net/ipv4/netfilter/nf_flow_table_ipv4.c

diff --git a/net/ipv4/netfilter/Kconfig b/net/ipv4/netfilter/Kconfig
index c11eb1744ab1..7270771f9565 100644
--- a/net/ipv4/netfilter/Kconfig
+++ b/net/ipv4/netfilter/Kconfig
@@ -77,6 +77,14 @@ config NF_TABLES_ARP
 
 endif # NF_TABLES
 
+config NF_FLOW_TABLE_IPV4
+	select NF_FLOW_TABLE
+	tristate "Netfilter flow table IPv4 module"
+	help
+	  This option adds the flow table IPv4 support.
+
+	  To compile it as a module, choose M here.
+
 config NF_DUP_IPV4
 	tristate "Netfilter IPv4 packet duplication to alternate destination"
 	depends on !NF_CONNTRACK || NF_CONNTRACK
diff --git a/net/ipv4/netfilter/Makefile b/net/ipv4/netfilter/Makefile
index f462fee66ac8..116745275dc0 100644
--- a/net/ipv4/netfilter/Makefile
+++ b/net/ipv4/netfilter/Makefile
@@ -42,6 +42,9 @@ obj-$(CONFIG_NFT_REDIR_IPV4) += nft_redir_ipv4.o
 obj-$(CONFIG_NFT_DUP_IPV4) += nft_dup_ipv4.o
 obj-$(CONFIG_NF_TABLES_ARP) += nf_tables_arp.o
 
+# flow table support
+obj-$(CONFIG_NF_FLOW_TABLE_IPV4) += nf_flow_table_ipv4.o
+
 # generic IP tables
 obj-$(CONFIG_IP_NF_IPTABLES) += ip_tables.o
 
diff --git a/net/ipv4/netfilter/nf_flow_table_ipv4.c b/net/ipv4/netfilter/nf_flow_table_ipv4.c
new file mode 100644
index 000000000000..ac56c0f0492a
--- /dev/null
+++ b/net/ipv4/netfilter/nf_flow_table_ipv4.c
@@ -0,0 +1,283 @@
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+/* For layer 4 checksum field offset. */
+#include
+#include
+
+static int nf_flow_nat_ip_tcp(struct sk_buff *skb, unsigned int thoff,
+			      __be32 addr, __be32 new_addr)
+{
+	struct tcphdr *tcph;
+
+	if (!pskb_may_pull(skb, thoff + sizeof(*tcph)) ||
+	    skb_try_make_writable(skb, thoff + sizeof(*tcph)))
+		return -1;
+
+	tcph = (void *)(skb_network_header(skb) + thoff);
+	inet_proto_csum_replace4(&tcph->check, skb, addr, new_addr, true);
+
+	return 0;
+}
+
+static int nf_flow_nat_ip_udp(struct sk_buff *skb, unsigned int thoff,
+			      __be32 addr, __be32 new_addr)
+{
+	struct udphdr *udph;
+
+	if (!pskb_may_pull(skb, thoff + sizeof(*udph)) ||
+	    skb_try_make_writable(skb, thoff + sizeof(*udph)))
+		return -1;
+
+	udph = (void *)(skb_network_header(skb) + thoff);
+	if (udph->check || skb->ip_summed == CHECKSUM_PARTIAL) {
+		inet_proto_csum_replace4(&udph->check, skb, addr,
+					 new_addr, true);
+		if (!udph->check)
+			udph->check = CSUM_MANGLED_0;
+	}
+
+	return 0;
+}
+
+static int nf_flow_nat_ip_l4proto(struct sk_buff *skb, struct iphdr *iph,
+				  unsigned int thoff, __be32 addr,
+				  __be32 new_addr)
+{
+	switch (iph->protocol) {
+	case IPPROTO_TCP:
+		if (nf_flow_nat_ip_tcp(skb, thoff, addr, new_addr) < 0)
+			return NF_DROP;
+		break;
+	case IPPROTO_UDP:
+		if (nf_flow_nat_ip_udp(skb, thoff, addr, new_addr) < 0)
+			return NF_DROP;
+		break;
+	}
+
+	return 0;
+}
+
+static int nf_flow_snat_ip(const struct flow_offload *flow, struct sk_buff *skb,
+			   struct iphdr *iph, unsigned int thoff,
+			   enum flow_offload_tuple_dir dir)
+{
+	__be32 addr, new_addr;
+
+	switch (dir) {
+	case FLOW_OFFLOAD_DIR_ORIGINAL:
+		addr = iph->saddr;
+		new_addr = flow->tuplehash[FLOW_OFFLOAD_DIR_REPLY].tuple.dst_v4.s_addr;
+		iph->saddr = new_addr;
+		break;
+	case FLOW_OFFLOAD_DIR_REPLY:
+		addr = iph->daddr;
+		new_addr = flow->tuplehash[FLOW_OFFLOAD_DIR_ORIGINAL].tuple.src_v4.s_addr;
+		iph->daddr = new_addr;
+		break;
+	default:
+		return -1;
+	}
+	csum_replace4(&iph->check, addr, new_addr);
+
+	return nf_flow_nat_ip_l4proto(skb, iph, thoff, addr, new_addr);
+}
+
+static int nf_flow_dnat_ip(const struct flow_offload *flow, struct sk_buff *skb,
+			   struct iphdr *iph, unsigned int thoff,
+			   enum flow_offload_tuple_dir dir)
+{
+	__be32 addr, new_addr;
+
+	switch (dir) {
+	case FLOW_OFFLOAD_DIR_ORIGINAL:
+		addr = iph->daddr;
+		new_addr = flow->tuplehash[FLOW_OFFLOAD_DIR_REPLY].tuple.src_v4.s_addr;
+		iph->daddr = new_addr;
+		break;
+	case FLOW_OFFLOAD_DIR_REPLY:
+		addr = iph->saddr;
+		new_addr = flow->tuplehash[FLOW_OFFLOAD_DIR_ORIGINAL].tuple.dst_v4.s_addr;
+		iph->saddr = new_addr;
+		break;
+	default:
+		return -1;
+	}
+
+	return nf_flow_nat_ip_l4proto(skb, iph, thoff, addr, new_addr);
+}
+
+static int nf_flow_nat_ip(const struct flow_offload *flow, struct sk_buff *skb,
+			  enum flow_offload_tuple_dir dir)
+{
+	struct iphdr *iph = ip_hdr(skb);
+	unsigned int thoff = iph->ihl * 4;
+
+	if (flow->flags & FLOW_OFFLOAD_SNAT &&
+	    (nf_flow_snat_port(flow, skb, thoff, iph->protocol, dir) < 0 ||
+	     nf_flow_snat_ip(flow, skb, iph, thoff, dir) < 0))
+		return -1;
+	if (flow->flags & FLOW_OFFLOAD_DNAT &&
+	    (nf_flow_dnat_port(flow, skb, thoff, iph->protocol, dir) < 0 ||
+	     nf_flow_dnat_ip(flow, skb, iph, thoff, dir) < 0))
+		return -1;
+
+	return 0;
+}
+
+static bool ip_has_options(unsigned int thoff)
+{
+	return thoff != sizeof(struct iphdr);
+}
+
+static int nf_flow_tuple_ip(struct sk_buff *skb, const struct net_device *dev,
+			    struct flow_offload_tuple *tuple)
+{
+	struct flow_ports *ports;
+	unsigned int thoff;
+	struct iphdr *iph;
+
+	if (!pskb_may_pull(skb, sizeof(*iph)))
+		return -1;
+
+	iph = ip_hdr(skb);
+	thoff = iph->ihl * 4;
+
+	if (ip_is_fragment(iph) ||
+	    unlikely(ip_has_options(thoff)))
+		return -1;
+
+	if (iph->protocol != IPPROTO_TCP &&
+	    iph->protocol != IPPROTO_UDP)
+		return -1;
+
+	thoff = iph->ihl * 4;
+	if (!pskb_may_pull(skb, thoff + sizeof(*ports)))
+		return -1;
+
+	ports = (struct flow_ports *)(skb_network_header(skb) + thoff);
+
+	tuple->src_v4.s_addr	= iph->saddr;
+	tuple->dst_v4.s_addr	= iph->daddr;
+	tuple->src_port		= ports->source;
+	tuple->dst_port		= ports->dest;
+	tuple->l3proto		= AF_INET;
+	tuple->l4proto		= iph->protocol;
+	tuple->iifidx		= dev->ifindex;
+
+	return 0;
+}
+
+/* Based on ip_exceeds_mtu(). */
+static bool __nf_flow_exceeds_mtu(const struct sk_buff *skb, unsigned int mtu)
+{
+	if (skb->len <= mtu)
+		return false;
+
+	if ((ip_hdr(skb)->frag_off & htons(IP_DF)) == 0)
+		return false;
+
+	if (skb_is_gso(skb) && skb_gso_validate_mtu(skb, mtu))
+		return false;
+
+	return true;
+}
+
+static bool nf_flow_exceeds_mtu(struct sk_buff *skb, const struct rtable *rt)
+{
+	u32 mtu;
+
+	mtu = ip_dst_mtu_maybe_forward(&rt->dst, true);
+	if (__nf_flow_exceeds_mtu(skb, mtu))
+		return true;
+
+	return false;
+}
+
+static unsigned int
+nf_flow_offload_ip_hook(void *priv, struct sk_buff *skb,
+			const struct nf_hook_state *state)
+{
+	struct flow_offload_tuple_rhash *tuplehash;
+	struct nf_flowtable *flow_table = priv;
+	struct flow_offload_tuple tuple = {};
+	enum flow_offload_tuple_dir dir;
+	struct flow_offload *flow;
+	struct net_device *outdev;
+	const struct rtable *rt;
+	struct iphdr *iph;
+	__be32 nexthop;
+
+	if (skb->protocol != htons(ETH_P_IP))
+		return NF_ACCEPT;
+
+	if (nf_flow_tuple_ip(skb, state->in, &tuple) < 0)
+		return NF_ACCEPT;
+
+	tuplehash = flow_offload_lookup(flow_table, &tuple);
+	if (tuplehash == NULL)
+		return NF_ACCEPT;
+
+	outdev = dev_get_by_index_rcu(state->net, tuplehash->tuple.oifidx);
+	if (!outdev)
+		return NF_ACCEPT;
+
+	dir = tuplehash->tuple.dir;
+	flow = container_of(tuplehash, struct flow_offload, tuplehash[dir]);
+
+	rt = (const struct rtable *)flow->tuplehash[dir].tuple.dst_cache;
+	if (unlikely(nf_flow_exceeds_mtu(skb, rt)))
+		return NF_ACCEPT;
+
+	if (skb_try_make_writable(skb, sizeof(*iph)))
+		return NF_DROP;
+
+	if (flow->flags & (FLOW_OFFLOAD_SNAT | FLOW_OFFLOAD_DNAT) &&
+	    nf_flow_nat_ip(flow, skb, dir) < 0)
+		return NF_DROP;
+
+	flow->timeout = (u32)jiffies + NF_FLOW_TIMEOUT;
+	iph = ip_hdr(skb);
+	ip_decrease_ttl(iph);
+
+	skb->dev = outdev;
+	nexthop = rt_nexthop(rt, flow->tuplehash[!dir].tuple.src_v4.s_addr);
+	neigh_xmit(NEIGH_ARP_TABLE, outdev, &nexthop, skb);
+
+	return NF_STOLEN;
+}
+
+static struct nf_flowtable_type flowtable_ipv4 = {
+	.family		= NFPROTO_IPV4,
+	.params		= &nf_flow_offload_rhash_params,
+	.gc		= nf_flow_offload_work_gc,
+	.hook		= nf_flow_offload_ip_hook,
+	.owner		= THIS_MODULE,
+};
+
+static int __init nf_flow_ipv4_module_init(void)
+{
+	nft_register_flowtable_type(&flowtable_ipv4);
+
+	return 0;
+}
+
+static void __exit nf_flow_ipv4_module_exit(void)
+{
+	nft_unregister_flowtable_type(&flowtable_ipv4);
+}
+
+module_init(nf_flow_ipv4_module_init);
+module_exit(nf_flow_ipv4_module_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Pablo Neira Ayuso ");
+MODULE_ALIAS_NF_FLOWTABLE(AF_INET);

From patchwork Fri Dec 22 19:27:29 2017
99) id 38E8DF7319; Fri, 22 Dec 2017 20:28:06 +0100 (CET) X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on antivirus1-rhel7.int X-Spam-Level: X-Spam-Status: No, score=-108.2 required=7.5 tests=ALL_TRUSTED,BAYES_50, SMTPAUTH_US2,USER_IN_WHITELIST autolearn=disabled version=3.4.1 Received: from antivirus1-rhel7.int (localhost [127.0.0.1]) by antivirus1-rhel7.int (Postfix) with ESMTP id BFA39F731C; Fri, 22 Dec 2017 20:28:03 +0100 (CET) Received: from 192.168.1.97 (192.168.1.97) by antivirus1-rhel7.int (F-Secure/fsigk_smtp/550/antivirus1-rhel7.int); Fri, 22 Dec 2017 20:28:03 +0100 (CET) X-Virus-Status: clean(F-Secure/fsigk_smtp/550/antivirus1-rhel7.int) Received: from salvia.here (129.166.216.87.static.jazztel.es [87.216.166.129]) (Authenticated sender: pneira@us.es) by entrada.int (Postfix) with ESMTPA id E3D964265A31; Fri, 22 Dec 2017 20:28:02 +0100 (CET) X-SMTPAUTHUS: auth mail.us.es From: Pablo Neira Ayuso To: netfilter-devel@vger.kernel.org Cc: netdev@vger.kernel.org, f.fainelli@gmail.com, simon.horman@netronome.com, ronye@mellanox.com, jiri@mellanox.com, nbd@nbd.name, john@phrozen.org, kubakici@wp.pl, fw@strlen.de Subject: [PATCH nf-next,v3 4/7] netfilter: flow table support for IPv6 Date: Fri, 22 Dec 2017 20:27:29 +0100 Message-Id: <20171222192732.13188-5-pablo@netfilter.org> X-Mailer: git-send-email 2.11.0 In-Reply-To: <20171222192732.13188-1-pablo@netfilter.org> References: <20171222192732.13188-1-pablo@netfilter.org> X-Virus-Scanned: ClamAV using ClamSMTP Sender: netfilter-devel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netfilter-devel@vger.kernel.org This patch adds the IPv6 flow table type, that implements the datapath flow table to forward IPv6 traffic. This patch exports ip6_dst_mtu_forward() that is required to check for mtu to pass up packets that need PMTUD handling to the classic forwarding path. 
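For illustration, the MTU decision described above can be modeled in userspace C. This is a minimal sketch under stated assumptions, not kernel code: struct pkt_model, model_exceeds_mtu() and model_flow_path() are hypothetical names, the caller is assumed to supply the MTU that the hook obtains from ip6_dst_mtu_forward(), and the GSO case is approximated by comparing one per-segment length against the MTU, which is roughly what skb_gso_validate_mtu() establishes. Unlike the IPv4 variant there is no DF-bit test, since IPv6 routers do not fragment in transit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the packet metadata the
 * IPv6 hook looks at; this is not the kernel's struct sk_buff. */
struct pkt_model {
	unsigned int len;         /* total packet length in bytes */
	bool is_gso;              /* GSO super-packet, segmented later */
	unsigned int gso_seg_len; /* length of each resulting segment */
};

/* Where the packet goes: back to the classic path, or fast path. */
enum model_verdict { MODEL_PASS_UP, MODEL_FAST_PATH };

/* Models __nf_flow_exceeds_mtu() from the IPv6 flow table type.
 * There is no DF-bit test: IPv6 routers never fragment in transit. */
static bool model_exceeds_mtu(const struct pkt_model *p, unsigned int mtu)
{
	assert(p != NULL);

	if (p->len <= mtu)
		return false;

	/* Approximates skb_gso_validate_mtu(): a GSO packet is fine
	 * if every segment it will be split into fits the MTU. */
	if (p->is_gso && p->gso_seg_len <= mtu)
		return false;

	return true;
}

/* Too-big packets are handed back (NF_ACCEPT in the hook) so the
 * classic forwarding path can handle PMTUD for them. */
static enum model_verdict model_flow_path(const struct pkt_model *p,
					  unsigned int mtu)
{
	return model_exceeds_mtu(p, mtu) ? MODEL_PASS_UP : MODEL_FAST_PATH;
}
```

Handing an oversized packet back with NF_ACCEPT, rather than dropping it, lets the classic IPv6 forwarding path send the ICMPv6 Packet Too Big message that PMTUD relies on.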
Signed-off-by: Pablo Neira Ayuso --- include/net/ipv6.h | 2 + net/ipv6/ip6_output.c | 3 +- net/ipv6/netfilter/Kconfig | 8 + net/ipv6/netfilter/Makefile | 3 + net/ipv6/netfilter/nf_flow_table_ipv6.c | 277 ++++++++++++++++++++++++++++++++ 5 files changed, 292 insertions(+), 1 deletion(-) create mode 100644 net/ipv6/netfilter/nf_flow_table_ipv6.c diff --git a/include/net/ipv6.h b/include/net/ipv6.h index 6eac5cf8f1e6..ff069a8e0cde 100644 --- a/include/net/ipv6.h +++ b/include/net/ipv6.h @@ -912,6 +912,8 @@ static inline struct sk_buff *ip6_finish_skb(struct sock *sk) &inet6_sk(sk)->cork); } +unsigned int ip6_dst_mtu_forward(const struct dst_entry *dst); + int ip6_dst_lookup(struct net *net, struct sock *sk, struct dst_entry **dst, struct flowi6 *fl6); struct dst_entry *ip6_dst_lookup_flow(const struct sock *sk, struct flowi6 *fl6, diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c index 43ca864327c7..5ccd082ce182 100644 --- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -362,7 +362,7 @@ static inline int ip6_forward_finish(struct net *net, struct sock *sk, return dst_output(net, sk, skb); } -static unsigned int ip6_dst_mtu_forward(const struct dst_entry *dst) +unsigned int ip6_dst_mtu_forward(const struct dst_entry *dst) { unsigned int mtu; struct inet6_dev *idev; @@ -382,6 +382,7 @@ static unsigned int ip6_dst_mtu_forward(const struct dst_entry *dst) return mtu; } +EXPORT_SYMBOL_GPL(ip6_dst_mtu_forward); static bool ip6_pkt_too_big(const struct sk_buff *skb, unsigned int mtu) { diff --git a/net/ipv6/netfilter/Kconfig b/net/ipv6/netfilter/Kconfig index 6acb2eecd986..806e95375ec8 100644 --- a/net/ipv6/netfilter/Kconfig +++ b/net/ipv6/netfilter/Kconfig @@ -71,6 +71,14 @@ config NFT_FIB_IPV6 endif # NF_TABLES_IPV6 endif # NF_TABLES +config NF_FLOW_TABLE_IPV6 + select NF_FLOW_TABLE + tristate "Netfilter flow table IPv6 module" + help + This option adds the flow table IPv6 support. + + To compile it as a module, choose M here. 
+ config NF_DUP_IPV6 tristate "Netfilter IPv6 packet duplication to alternate destination" depends on !NF_CONNTRACK || NF_CONNTRACK diff --git a/net/ipv6/netfilter/Makefile b/net/ipv6/netfilter/Makefile index fe180c96040e..7dceadbb9eea 100644 --- a/net/ipv6/netfilter/Makefile +++ b/net/ipv6/netfilter/Makefile @@ -44,6 +44,9 @@ obj-$(CONFIG_NFT_REDIR_IPV6) += nft_redir_ipv6.o obj-$(CONFIG_NFT_DUP_IPV6) += nft_dup_ipv6.o obj-$(CONFIG_NFT_FIB_IPV6) += nft_fib_ipv6.o +# flow table support +obj-$(CONFIG_NF_FLOW_TABLE_IPV6) += nf_flow_table_ipv6.o + # matches obj-$(CONFIG_IP6_NF_MATCH_AH) += ip6t_ah.o obj-$(CONFIG_IP6_NF_MATCH_EUI64) += ip6t_eui64.o diff --git a/net/ipv6/netfilter/nf_flow_table_ipv6.c b/net/ipv6/netfilter/nf_flow_table_ipv6.c new file mode 100644 index 000000000000..ab78703154d8 --- /dev/null +++ b/net/ipv6/netfilter/nf_flow_table_ipv6.c @@ -0,0 +1,277 @@ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +/* For layer 4 checksum field offset. 
*/ +#include +#include + +static int nf_flow_nat_ipv6_tcp(struct sk_buff *skb, unsigned int thoff, + struct in6_addr *addr, + struct in6_addr *new_addr) +{ + struct tcphdr *tcph; + + if (!pskb_may_pull(skb, thoff + sizeof(*tcph)) || + skb_try_make_writable(skb, thoff + sizeof(*tcph))) + return -1; + + tcph = (void *)(skb_network_header(skb) + thoff); + inet_proto_csum_replace16(&tcph->check, skb, addr->s6_addr32, + new_addr->s6_addr32, true); + + return 0; +} + +static int nf_flow_nat_ipv6_udp(struct sk_buff *skb, unsigned int thoff, + struct in6_addr *addr, + struct in6_addr *new_addr) +{ + struct udphdr *udph; + + if (!pskb_may_pull(skb, thoff + sizeof(*udph)) || + skb_try_make_writable(skb, thoff + sizeof(*udph))) + return -1; + + udph = (void *)(skb_network_header(skb) + thoff); + if (udph->check || skb->ip_summed == CHECKSUM_PARTIAL) { + inet_proto_csum_replace16(&udph->check, skb, addr->s6_addr32, + new_addr->s6_addr32, true); + if (!udph->check) + udph->check = CSUM_MANGLED_0; + } + + return 0; +} + +static int nf_flow_nat_ipv6_l4proto(struct sk_buff *skb, struct ipv6hdr *ip6h, + unsigned int thoff, struct in6_addr *addr, + struct in6_addr *new_addr) +{ + switch (ip6h->nexthdr) { + case IPPROTO_TCP: + if (nf_flow_nat_ipv6_tcp(skb, thoff, addr, new_addr) < 0) + return NF_DROP; + break; + case IPPROTO_UDP: + if (nf_flow_nat_ipv6_udp(skb, thoff, addr, new_addr) < 0) + return NF_DROP; + break; + } + + return 0; +} + +static int nf_flow_snat_ipv6(const struct flow_offload *flow, + struct sk_buff *skb, struct ipv6hdr *ip6h, + unsigned int thoff, + enum flow_offload_tuple_dir dir) +{ + struct in6_addr addr, new_addr; + + switch (dir) { + case FLOW_OFFLOAD_DIR_ORIGINAL: + addr = ip6h->saddr; + new_addr = flow->tuplehash[FLOW_OFFLOAD_DIR_REPLY].tuple.dst_v6; + ip6h->saddr = new_addr; + break; + case FLOW_OFFLOAD_DIR_REPLY: + addr = ip6h->daddr; + new_addr = flow->tuplehash[FLOW_OFFLOAD_DIR_ORIGINAL].tuple.src_v6; + ip6h->daddr = new_addr; + break; + default: + return 
-1; + } + + return nf_flow_nat_ipv6_l4proto(skb, ip6h, thoff, &addr, &new_addr); +} + +static int nf_flow_dnat_ipv6(const struct flow_offload *flow, + struct sk_buff *skb, struct ipv6hdr *ip6h, + unsigned int thoff, + enum flow_offload_tuple_dir dir) +{ + struct in6_addr addr, new_addr; + + switch (dir) { + case FLOW_OFFLOAD_DIR_ORIGINAL: + addr = ip6h->daddr; + new_addr = flow->tuplehash[FLOW_OFFLOAD_DIR_REPLY].tuple.src_v6; + ip6h->daddr = new_addr; + break; + case FLOW_OFFLOAD_DIR_REPLY: + addr = ip6h->saddr; + new_addr = flow->tuplehash[FLOW_OFFLOAD_DIR_ORIGINAL].tuple.dst_v6; + ip6h->saddr = new_addr; + break; + default: + return -1; + } + + return nf_flow_nat_ipv6_l4proto(skb, ip6h, thoff, &addr, &new_addr); +} + +static int nf_flow_nat_ipv6(const struct flow_offload *flow, + struct sk_buff *skb, + enum flow_offload_tuple_dir dir) +{ + struct ipv6hdr *ip6h = ipv6_hdr(skb); + unsigned int thoff = sizeof(*ip6h); + + if (flow->flags & FLOW_OFFLOAD_SNAT && + (nf_flow_snat_port(flow, skb, thoff, ip6h->nexthdr, dir) < 0 || + nf_flow_snat_ipv6(flow, skb, ip6h, thoff, dir) < 0)) + return -1; + if (flow->flags & FLOW_OFFLOAD_DNAT && + (nf_flow_dnat_port(flow, skb, thoff, ip6h->nexthdr, dir) < 0 || + nf_flow_dnat_ipv6(flow, skb, ip6h, thoff, dir) < 0)) + return -1; + + return 0; +} + +static int nf_flow_tuple_ipv6(struct sk_buff *skb, const struct net_device *dev, + struct flow_offload_tuple *tuple) +{ + struct flow_ports *ports; + struct ipv6hdr *ip6h; + unsigned int thoff; + + if (!pskb_may_pull(skb, sizeof(*ip6h))) + return -1; + + ip6h = ipv6_hdr(skb); + + if (ip6h->nexthdr != IPPROTO_TCP && + ip6h->nexthdr != IPPROTO_UDP) + return -1; + + thoff = sizeof(*ip6h); + if (!pskb_may_pull(skb, thoff + sizeof(*ports))) + return -1; + + ports = (struct flow_ports *)(skb_network_header(skb) + thoff); + + tuple->src_v6 = ip6h->saddr; + tuple->dst_v6 = ip6h->daddr; + tuple->src_port = ports->source; + tuple->dst_port = ports->dest; + tuple->l3proto = AF_INET6; + 
tuple->l4proto = ip6h->nexthdr; + tuple->iifidx = dev->ifindex; + + return 0; +} + +/* Based on ip_exceeds_mtu(). */ +static bool __nf_flow_exceeds_mtu(const struct sk_buff *skb, unsigned int mtu) +{ + if (skb->len <= mtu) + return false; + + if (skb_is_gso(skb) && skb_gso_validate_mtu(skb, mtu)) + return false; + + return true; +} + +static bool nf_flow_exceeds_mtu(struct sk_buff *skb, const struct rt6_info *rt) +{ + u32 mtu; + + mtu = ip6_dst_mtu_forward(&rt->dst); + if (__nf_flow_exceeds_mtu(skb, mtu)) + return true; + + return false; +} + +static unsigned int +nf_flow_ipv6_offload_hook(void *priv, struct sk_buff *skb, + const struct nf_hook_state *state) +{ + struct flow_offload_tuple_rhash *tuplehash; + struct nf_flowtable *flow_table = priv; + struct flow_offload_tuple tuple = {}; + enum flow_offload_tuple_dir dir; + struct flow_offload *flow; + struct net_device *outdev; + struct in6_addr *nexthop; + struct ipv6hdr *ip6h; + struct rt6_info *rt; + + if (skb->protocol != htons(ETH_P_IPV6)) + return NF_ACCEPT; + + if (nf_flow_tuple_ipv6(skb, state->in, &tuple) < 0) + return NF_ACCEPT; + + tuplehash = flow_offload_lookup(flow_table, &tuple); + if (tuplehash == NULL) + return NF_ACCEPT; + + outdev = dev_get_by_index_rcu(state->net, tuplehash->tuple.oifidx); + if (!outdev) + return NF_ACCEPT; + + dir = tuplehash->tuple.dir; + flow = container_of(tuplehash, struct flow_offload, tuplehash[dir]); + + rt = (struct rt6_info *)flow->tuplehash[dir].tuple.dst_cache; + if (unlikely(nf_flow_exceeds_mtu(skb, rt))) + return NF_ACCEPT; + + if (skb_try_make_writable(skb, sizeof(*ip6h))) + return NF_DROP; + + if (flow->flags & (FLOW_OFFLOAD_SNAT | FLOW_OFFLOAD_DNAT) && + nf_flow_nat_ipv6(flow, skb, dir) < 0) + return NF_DROP; + + flow->timeout = (u32)jiffies + NF_FLOW_TIMEOUT; + ip6h = ipv6_hdr(skb); + ip6h->hop_limit--; + + skb->dev = outdev; + nexthop = rt6_nexthop(rt, &flow->tuplehash[!dir].tuple.src_v6); + neigh_xmit(NEIGH_ND_TABLE, outdev, nexthop, skb); + + return
NF_STOLEN; +} + +static struct nf_flowtable_type flowtable_ipv6 = { + .family = NFPROTO_IPV6, + .params = &nf_flow_offload_rhash_params, + .gc = nf_flow_offload_work_gc, + .hook = nf_flow_ipv6_offload_hook, + .owner = THIS_MODULE, +}; + +static int __init nf_flow_ipv6_module_init(void) +{ + nft_register_flowtable_type(&flowtable_ipv6); + + return 0; +} + +static void __exit nf_flow_ipv6_module_exit(void) +{ + nft_unregister_flowtable_type(&flowtable_ipv6); +} + +module_init(nf_flow_ipv6_module_init); +module_exit(nf_flow_ipv6_module_exit); + +MODULE_LICENSE("GPL"); +MODULE_AUTHOR("Pablo Neira Ayuso "); +MODULE_ALIAS_NF_FLOWTABLE(AF_INET6); From patchwork Fri Dec 22 19:27:30 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pablo Neira Ayuso X-Patchwork-Id: 852529 X-Patchwork-Delegate: pablo@netfilter.org From: Pablo Neira Ayuso To: netfilter-devel@vger.kernel.org Cc: netdev@vger.kernel.org, f.fainelli@gmail.com, simon.horman@netronome.com, ronye@mellanox.com, jiri@mellanox.com, nbd@nbd.name, john@phrozen.org, kubakici@wp.pl, fw@strlen.de Subject: [PATCH nf-next, v3 5/7] netfilter: flow table support for the mixed IPv4/IPv6 family Date: Fri, 22 Dec 2017 20:27:30 +0100 Message-Id: <20171222192732.13188-6-pablo@netfilter.org> X-Mailer: git-send-email 2.11.0 In-Reply-To: <20171222192732.13188-1-pablo@netfilter.org> References: <20171222192732.13188-1-pablo@netfilter.org> This patch adds the flow table type for the mixed IPv4/IPv6 (inet) family. Its hook dispatches each packet to the IPv4 or IPv6 flow table hook according to skb->protocol; to make this possible, both family-specific hooks are now exported.
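For illustration, the dispatch done by nf_flow_offload_inet_hook() in this patch can be modeled in userspace C. This is a hedged sketch, not kernel code: model_inet_hook(), model_ipv4_hook(), model_ipv6_hook() and the MODEL_* constants are hypothetical stand-ins, and the two verdicts only mimic NF_ACCEPT and NF_STOLEN. As in the kernel, the protocol value is the EtherType in network byte order (skb->protocol), so it is compared against htons() values.

```c
#include <arpa/inet.h> /* htons() */
#include <assert.h>
#include <stdint.h>

/* Same numeric values as the kernel's ETH_P_IP and ETH_P_IPV6. */
#define MODEL_ETH_P_IP   0x0800
#define MODEL_ETH_P_IPV6 0x86DD

/* Mimics the two verdicts involved: NF_ACCEPT and NF_STOLEN. */
enum model_verdict { MODEL_ACCEPT, MODEL_STOLEN };

/* Hypothetical stand-ins for the exported per-family hooks: they
 * count their calls and pretend every packet was forwarded. */
static int ipv4_calls;
static int ipv6_calls;

static enum model_verdict model_ipv4_hook(void)
{
	ipv4_calls++;
	return MODEL_STOLEN;
}

static enum model_verdict model_ipv6_hook(void)
{
	ipv6_calls++;
	return MODEL_STOLEN;
}

/* Models nf_flow_offload_inet_hook(): proto_be is the EtherType in
 * network byte order, as stored in skb->protocol. Anything that is
 * neither IPv4 nor IPv6 is accepted untouched. */
static enum model_verdict model_inet_hook(uint16_t proto_be)
{
	if (proto_be == htons(MODEL_ETH_P_IP))
		return model_ipv4_hook();
	if (proto_be == htons(MODEL_ETH_P_IPV6))
		return model_ipv6_hook();
	return MODEL_ACCEPT;
}
```

The sketch uses if/else instead of the switch in the patch because glibc's htons() may expand to a function call, which is not a valid case label; the kernel's htons() of a constant is a compile-time constant, so the patch can switch on it directly.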
Signed-off-by: Pablo Neira Ayuso --- include/net/netfilter/nf_flow_table.h | 5 ++++ net/ipv4/netfilter/nf_flow_table_ipv4.c | 3 ++- net/ipv6/netfilter/nf_flow_table_ipv6.c | 7 ++++--- net/netfilter/Kconfig | 8 ++++++ net/netfilter/nf_flow_table_inet.c | 48 +++++++++++++++++++++++++++++++++ 5 files changed, 67 insertions(+), 4 deletions(-) create mode 100644 net/netfilter/nf_flow_table_inet.c diff --git a/include/net/netfilter/nf_flow_table.h b/include/net/netfilter/nf_flow_table.h index 161f71ca78a0..b22b22082733 100644 --- a/include/net/netfilter/nf_flow_table.h +++ b/include/net/netfilter/nf_flow_table.h @@ -111,6 +111,11 @@ struct flow_ports { __be16 source, dest; }; +unsigned int nf_flow_offload_ip_hook(void *priv, struct sk_buff *skb, + const struct nf_hook_state *state); +unsigned int nf_flow_offload_ipv6_hook(void *priv, struct sk_buff *skb, + const struct nf_hook_state *state); + #define MODULE_ALIAS_NF_FLOWTABLE(family) \ MODULE_ALIAS("nf-flowtable-" __stringify(family)) diff --git a/net/ipv4/netfilter/nf_flow_table_ipv4.c b/net/ipv4/netfilter/nf_flow_table_ipv4.c index ac56c0f0492a..b2d01eb25f2c 100644 --- a/net/ipv4/netfilter/nf_flow_table_ipv4.c +++ b/net/ipv4/netfilter/nf_flow_table_ipv4.c @@ -202,7 +202,7 @@ static bool nf_flow_exceeds_mtu(struct sk_buff *skb, const struct rtable *rt) return false; } -static unsigned int +unsigned int nf_flow_offload_ip_hook(void *priv, struct sk_buff *skb, const struct nf_hook_state *state) { @@ -254,6 +254,7 @@ nf_flow_offload_ip_hook(void *priv, struct sk_buff *skb, return NF_STOLEN; } +EXPORT_SYMBOL_GPL(nf_flow_offload_ip_hook); static struct nf_flowtable_type flowtable_ipv4 = { .family = NFPROTO_IPV4, diff --git a/net/ipv6/netfilter/nf_flow_table_ipv6.c b/net/ipv6/netfilter/nf_flow_table_ipv6.c index ab78703154d8..021209be0c3c 100644 --- a/net/ipv6/netfilter/nf_flow_table_ipv6.c +++ b/net/ipv6/netfilter/nf_flow_table_ipv6.c @@ -196,8 +196,8 @@ static bool nf_flow_exceeds_mtu(struct sk_buff *skb, const struct rt6_info *rt) return false; } -static unsigned int -nf_flow_ipv6_offload_hook(void *priv, struct sk_buff *skb, +unsigned int +nf_flow_offload_ipv6_hook(void *priv, struct sk_buff *skb, const struct nf_hook_state *state) { struct flow_offload_tuple_rhash *tuplehash; @@ -248,12 +248,13 @@ nf_flow_ipv6_offload_hook(void *priv, struct sk_buff *skb, return NF_STOLEN; } +EXPORT_SYMBOL_GPL(nf_flow_offload_ipv6_hook); static struct nf_flowtable_type flowtable_ipv6 = { .family = NFPROTO_IPV6, .params = &nf_flow_offload_rhash_params, .gc = nf_flow_offload_work_gc, - .hook = nf_flow_ipv6_offload_hook, + .hook = nf_flow_offload_ipv6_hook, .owner = THIS_MODULE, }; diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig index af0f58322515..0c6256db5a6c 100644 --- a/net/netfilter/Kconfig +++ b/net/netfilter/Kconfig @@ -649,6 +649,14 @@ endif # NF_TABLES_NETDEV endif # NF_TABLES +config NF_FLOW_TABLE_INET + select NF_FLOW_TABLE + tristate "Netfilter flow table mixed IPv4/IPv6 module" + help + This option adds the flow table mixed IPv4/IPv6 support. + + To compile it as a module, choose M here. + config NF_FLOW_TABLE tristate "Netfilter flow table module" help diff --git a/net/netfilter/nf_flow_table_inet.c b/net/netfilter/nf_flow_table_inet.c new file mode 100644 index 000000000000..281209aeba8f --- /dev/null +++ b/net/netfilter/nf_flow_table_inet.c @@ -0,0 +1,48 @@ +#include +#include +#include +#include +#include +#include +#include + +static unsigned int +nf_flow_offload_inet_hook(void *priv, struct sk_buff *skb, + const struct nf_hook_state *state) +{ + switch (skb->protocol) { + case htons(ETH_P_IP): + return nf_flow_offload_ip_hook(priv, skb, state); + case htons(ETH_P_IPV6): + return nf_flow_offload_ipv6_hook(priv, skb, state); + } + + return NF_ACCEPT; +} + +static struct nf_flowtable_type flowtable_inet = { + .family = NFPROTO_INET, + .params = &nf_flow_offload_rhash_params, + .gc = nf_flow_offload_work_gc, + .hook = nf_flow_offload_inet_hook, + .owner = THIS_MODULE, +}; + +static int __init nf_flow_inet_module_init(void) +{ + nft_register_flowtable_type(&flowtable_inet); + + return 0; +} + +static void __exit nf_flow_inet_module_exit(void) +{ + nft_unregister_flowtable_type(&flowtable_inet); +} + +module_init(nf_flow_inet_module_init);
+module_exit(nf_flow_inet_module_exit); + +MODULE_LICENSE("GPL"); +MODULE_AUTHOR("Pablo Neira Ayuso "); +MODULE_ALIAS_NF_FLOWTABLE(1); /* NFPROTO_INET */ From patchwork Fri Dec 22 19:27:31 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pablo Neira Ayuso X-Patchwork-Id: 852531 X-Patchwork-Delegate: pablo@netfilter.org From: Pablo Neira Ayuso To: netfilter-devel@vger.kernel.org Cc: netdev@vger.kernel.org, f.fainelli@gmail.com, simon.horman@netronome.com, ronye@mellanox.com, jiri@mellanox.com, nbd@nbd.name, john@phrozen.org, kubakici@wp.pl, fw@strlen.de Subject: [PATCH nf-next, v3 6/7] netfilter: nf_tables: flow offload expression Date: Fri, 22 Dec 2017 20:27:31 +0100 Message-Id: <20171222192732.13188-7-pablo@netfilter.org> X-Mailer: git-send-email 2.11.0 In-Reply-To: <20171222192732.13188-1-pablo@netfilter.org> References: <20171222192732.13188-1-pablo@netfilter.org> Add a new instruction for the nf_tables VM that allows us to specify, by name, which flows are offloaded into a given flow table. This new instruction creates the flow entry and adds it to the flow table. Only established flows, i.e. flows for which we have seen traffic in both directions, are added to the flow table. You can still decide to offload entries at a later stage, via packet counting or by checking the ct status, in case you want to offload assured conntracks. This new extension depends on the conntrack subsystem.
Signed-off-by: Pablo Neira Ayuso --- include/uapi/linux/netfilter/nf_tables.h | 11 ++ net/netfilter/Kconfig | 7 + net/netfilter/Makefile | 1 + net/netfilter/nft_flow_offload.c | 268 +++++++++++++++++++++++++++++++ 4 files changed, 287 insertions(+) create mode 100644 net/netfilter/nft_flow_offload.c diff --git a/include/uapi/linux/netfilter/nf_tables.h b/include/uapi/linux/netfilter/nf_tables.h index 9ba0f4c13de6..528d832fefb4 100644 --- a/include/uapi/linux/netfilter/nf_tables.h +++ b/include/uapi/linux/netfilter/nf_tables.h @@ -954,6 +954,17 @@ enum nft_ct_attributes { }; #define NFTA_CT_MAX (__NFTA_CT_MAX - 1) +/** + * enum nft_flow_attributes - ct offload expression attributes + * @NFTA_FLOW_TABLE_NAME: flow table name (NLA_STRING) + */ +enum nft_offload_attributes { + NFTA_FLOW_UNSPEC, + NFTA_FLOW_TABLE_NAME, + __NFTA_FLOW_MAX, +}; +#define NFTA_FLOW_MAX (__NFTA_FLOW_MAX - 1) + enum nft_limit_type { NFT_LIMIT_PKTS, NFT_LIMIT_PKT_BYTES diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig index 0c6256db5a6c..1ada46345f3c 100644 --- a/net/netfilter/Kconfig +++ b/net/netfilter/Kconfig @@ -497,6 +497,13 @@ config NFT_CT This option adds the "ct" expression that you can use to match connection tracking information such as the flow state. +config NFT_FLOW_OFFLOAD + depends on NF_CONNTRACK + tristate "Netfilter nf_tables hardware flow offload module" + help + This option adds the "flow_offload" expression that you can use to + choose what flows are placed into the hardware. 
+ config NFT_SET_RBTREE tristate "Netfilter nf_tables rbtree set module" help diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile index 1f7d92bd571a..2c1b8de922f2 100644 --- a/net/netfilter/Makefile +++ b/net/netfilter/Makefile @@ -83,6 +83,7 @@ obj-$(CONFIG_NFT_META) += nft_meta.o obj-$(CONFIG_NFT_RT) += nft_rt.o obj-$(CONFIG_NFT_NUMGEN) += nft_numgen.o obj-$(CONFIG_NFT_CT) += nft_ct.o +obj-$(CONFIG_NFT_FLOW_OFFLOAD) += nft_flow_offload.o obj-$(CONFIG_NFT_LIMIT) += nft_limit.o obj-$(CONFIG_NFT_NAT) += nft_nat.o obj-$(CONFIG_NFT_OBJREF) += nft_objref.o diff --git a/net/netfilter/nft_flow_offload.c b/net/netfilter/nft_flow_offload.c new file mode 100644 index 000000000000..4f16c37acaa3 --- /dev/null +++ b/net/netfilter/nft_flow_offload.c @@ -0,0 +1,268 @@ +#include +#include +#include +#include +#include +#include +#include +#include +#include /* for ipv4 options. */ +#include +#include +#include +#include +#include + +struct nft_flow_offload { + struct nft_flowtable *flowtable; +}; + +static int nft_flow_route(const struct nft_pktinfo *pkt, + const struct nf_conn *ct, + struct nf_flow_route *route, + enum ip_conntrack_dir dir) +{ + struct dst_entry *this_dst = skb_dst(pkt->skb); + struct dst_entry *other_dst = NULL; + const struct nf_afinfo *ai; + struct flowi fl; + + memset(&fl, 0, sizeof(fl)); + switch (nft_pf(pkt)) { + case NFPROTO_IPV4: + fl.u.ip4.daddr = ct->tuplehash[!dir].tuple.dst.u3.ip; + break; + case NFPROTO_IPV6: + fl.u.ip6.daddr = ct->tuplehash[!dir].tuple.dst.u3.in6; + break; + } + + ai = nf_get_afinfo(nft_pf(pkt)); + if (ai) + ai->route(nft_net(pkt), &other_dst, &fl, false); + + if (!other_dst) + return -ENOENT; + + route->tuple[dir].dst = this_dst; + route->tuple[dir].ifindex = nft_in(pkt)->ifindex; + route->tuple[!dir].dst = other_dst; + route->tuple[!dir].ifindex = nft_out(pkt)->ifindex; + + return 0; +} + +static bool nft_flow_offload_skip(struct sk_buff *skb) +{ + struct ip_options *opt = &(IPCB(skb)->opt); + + if (unlikely(opt->optlen)) +
return true; + if (skb_sec_path(skb)) + return true; + + return false; +} + +static void nft_flow_offload_eval(const struct nft_expr *expr, + struct nft_regs *regs, + const struct nft_pktinfo *pkt) +{ + struct nft_flow_offload *priv = nft_expr_priv(expr); + struct nf_flowtable *flowtable = &priv->flowtable->data; + enum ip_conntrack_info ctinfo; + struct nf_flow_route route; + struct flow_offload *flow; + enum ip_conntrack_dir dir; + struct nf_conn *ct; + int ret; + + if (nft_flow_offload_skip(pkt->skb)) + goto out; + + ct = nf_ct_get(pkt->skb, &ctinfo); + if (!ct) + goto out; + + switch (ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.dst.protonum) { + case IPPROTO_TCP: + case IPPROTO_UDP: + break; + default: + goto out; + } + + if (test_bit(IPS_HELPER_BIT, &ct->status)) + goto out; + + if (ctinfo == IP_CT_NEW || + ctinfo == IP_CT_RELATED) + goto out; + + if (test_and_set_bit(IPS_OFFLOAD_BIT, &ct->status)) + goto out; + + dir = CTINFO2DIR(ctinfo); + if (nft_flow_route(pkt, ct, &route, dir) < 0) + goto err_flow_route; + + flow = flow_offload_alloc(ct, &route); + if (!flow) + goto err_flow_alloc; + + ret = flow_offload_add(flowtable, flow); + if (ret < 0) + goto err_flow_add; + + return; + +err_flow_add: + flow_offload_free(flow); +err_flow_alloc: + dst_release(route.tuple[!dir].dst); +err_flow_route: + clear_bit(IPS_OFFLOAD_BIT, &ct->status); +out: + regs->verdict.code = NFT_BREAK; +} + +static int nft_flow_offload_validate(const struct nft_ctx *ctx, + const struct nft_expr *expr, + const struct nft_data **data) +{ + unsigned int hook_mask = (1 << NF_INET_FORWARD); + + return nft_chain_validate_hooks(ctx->chain, hook_mask); +} + +static int nft_flow_offload_init(const struct nft_ctx *ctx, + const struct nft_expr *expr, + const struct nlattr * const tb[]) +{ + struct nft_flow_offload *priv = nft_expr_priv(expr); + u8 genmask = nft_genmask_next(ctx->net); + struct nft_flowtable *flowtable; + + if (!tb[NFTA_FLOW_TABLE_NAME]) + return -EINVAL; + + flowtable = 
nf_tables_flowtable_lookup(ctx->table, + tb[NFTA_FLOW_TABLE_NAME], + genmask); + if (IS_ERR(flowtable)) + return PTR_ERR(flowtable); + + priv->flowtable = flowtable; + flowtable->use++; + + return nf_ct_netns_get(ctx->net, ctx->afi->family); +} + +static void nft_flow_offload_destroy(const struct nft_ctx *ctx, + const struct nft_expr *expr) +{ + struct nft_flow_offload *priv = nft_expr_priv(expr); + + priv->flowtable->use--; + nf_ct_netns_put(ctx->net, ctx->afi->family); +} + +static int nft_flow_offload_dump(struct sk_buff *skb, const struct nft_expr *expr) +{ + struct nft_flow_offload *priv = nft_expr_priv(expr); + + if (nla_put_string(skb, NFTA_FLOW_TABLE_NAME, priv->flowtable->name)) + goto nla_put_failure; + + return 0; + +nla_put_failure: + return -1; +} + +struct nft_expr_type nft_flow_offload_type; +static const struct nft_expr_ops nft_flow_offload_ops = { + .type = &nft_flow_offload_type, + .size = NFT_EXPR_SIZE(sizeof(struct nft_flow_offload)), + .eval = nft_flow_offload_eval, + .init = nft_flow_offload_init, + .destroy = nft_flow_offload_destroy, + .validate = nft_flow_offload_validate, + .dump = nft_flow_offload_dump, +}; + +struct nft_expr_type nft_flow_offload_type __read_mostly = { + .name = "flow_offload", + .ops = &nft_flow_offload_ops, + .maxattr = NFTA_FLOW_MAX, + .owner = THIS_MODULE, +}; + +static void flow_offload_iterate_cleanup(struct flow_offload *flow, void *data) +{ + struct net_device *dev = data; + + if (dev && flow->tuplehash[0].tuple.iifidx != dev->ifindex) + return; + + flow_offload_dead(flow); +} + +static void nft_flow_offload_iterate_cleanup(struct nf_flowtable *flowtable, + void *data) +{ + nf_flow_table_iterate(flowtable, flow_offload_iterate_cleanup, data); +} + +static int flow_offload_netdev_event(struct notifier_block *this, + unsigned long event, void *ptr) +{ + struct net_device *dev = netdev_notifier_info_to_dev(ptr); + + if (event != NETDEV_DOWN) + return NOTIFY_DONE; + + nft_flow_table_iterate(dev_net(dev), 
nft_flow_offload_iterate_cleanup, dev); + + return NOTIFY_DONE; +} + +static struct notifier_block flow_offload_netdev_notifier = { + .notifier_call = flow_offload_netdev_event, +}; + +static int __init nft_flow_offload_module_init(void) +{ + int err; + + register_netdevice_notifier(&flow_offload_netdev_notifier); + + err = nft_register_expr(&nft_flow_offload_type); + if (err < 0) + goto register_expr; + + return 0; + +register_expr: + unregister_netdevice_notifier(&flow_offload_netdev_notifier); + return err; +} + +static void __exit nft_flow_offload_module_exit(void) +{ + struct net *net; + + nft_unregister_expr(&nft_flow_offload_type); + unregister_netdevice_notifier(&flow_offload_netdev_notifier); + rtnl_lock(); + for_each_net(net) + nft_flow_table_iterate(net, nft_flow_offload_iterate_cleanup, NULL); + rtnl_unlock(); +} + +module_init(nft_flow_offload_module_init); +module_exit(nft_flow_offload_module_exit); + +MODULE_LICENSE("GPL"); +MODULE_AUTHOR("Pablo Neira Ayuso "); +MODULE_ALIAS_NFT_EXPR("flow_offload"); From patchwork Fri Dec 22 19:27:32 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pablo Neira Ayuso X-Patchwork-Id: 852537 X-Patchwork-Delegate: pablo@netfilter.org From: Pablo Neira Ayuso To: netfilter-devel@vger.kernel.org Cc: netdev@vger.kernel.org, f.fainelli@gmail.com, simon.horman@netronome.com, ronye@mellanox.com, jiri@mellanox.com, nbd@nbd.name, john@phrozen.org, kubakici@wp.pl, fw@strlen.de Subject: [PATCH RFC nf-next, v3 7/7] netfilter: nf_flow_table: add hardware offload support Date: Fri, 22 Dec 2017 20:27:32 +0100 Message-Id: <20171222192732.13188-8-pablo@netfilter.org> X-Mailer: git-send-email 2.11.0 In-Reply-To: <20171222192732.13188-1-pablo@netfilter.org> References: <20171222192732.13188-1-pablo@netfilter.org>
X-Mailing-List: netfilter-devel@vger.kernel.org

This patch adds the infrastructure to offload flows to hardware, in case
the NIC/switch comes with built-in flow table capabilities.

If the hardware has no flow tables, or they are limited in terms of
features, this falls back to the generic software flow table
implementation.

The software flow table garbage collector skips entries that reside in
the hardware, so the hardware is responsible for releasing these
entries too, via flow_offload_dead(). In the next garbage collector
run, the entries are removed from both the software and the hardware
flow table, from user context.

Signed-off-by: Pablo Neira Ayuso
---
 include/linux/netdevice.h             |   9 +++
 include/net/netfilter/nf_flow_table.h |   6 ++
 net/netfilter/Kconfig                 |   9 +++
 net/netfilter/Makefile                |   1 +
 net/netfilter/nf_flow_table.c         |  13 ++++
 net/netfilter/nf_flow_table_hw.c      | 127 ++++++++++++++++++++++++++++++++++
 net/netfilter/nf_tables_api.c         |   2 +
 net/netfilter/nft_flow_offload.c      |   4 ++
 8 files changed, 171 insertions(+)
 create mode 100644 net/netfilter/nf_flow_table_hw.c

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index f535779d9dc1..5f2919775632 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -826,6 +826,13 @@ struct xfrmdev_ops {
 };
 #endif
 
+struct flow_offload;
+
+enum flow_offload_type {
+	FLOW_OFFLOAD_ADD = 0,
+	FLOW_OFFLOAD_DEL,
+};
+
 /*
  * This structure defines the management hooks for network devices.
  * The following hooks can be defined; unless noted otherwise, they are
@@ -1281,6 +1288,8 @@ struct net_device_ops {
 	int			(*ndo_bridge_dellink)(struct net_device *dev,
 						      struct nlmsghdr *nlh,
 						      u16 flags);
+	int			(*ndo_flow_offload)(enum flow_offload_type type,
+						    struct flow_offload *flow);
 	int			(*ndo_change_carrier)(struct net_device *dev,
 						      bool new_carrier);
 	int			(*ndo_get_phys_port_id)(struct net_device *dev,
diff --git a/include/net/netfilter/nf_flow_table.h b/include/net/netfilter/nf_flow_table.h
index b22b22082733..02ac8c7e4f7f 100644
--- a/include/net/netfilter/nf_flow_table.h
+++ b/include/net/netfilter/nf_flow_table.h
@@ -23,6 +23,7 @@ struct nf_flowtable {
 	struct rhashtable		rhashtable;
 	const struct nf_flowtable_type	*type;
 	struct delayed_work		gc_work;
+	possible_net_t			ft_net;
 };
 
 enum flow_offload_tuple_dir {
@@ -65,6 +66,7 @@ struct flow_offload_tuple_rhash {
 #define FLOW_OFFLOAD_SNAT	0x1
 #define FLOW_OFFLOAD_DNAT	0x2
 #define FLOW_OFFLOAD_DYING	0x4
+#define FLOW_OFFLOAD_HW		0x8
 
 struct flow_offload {
 	struct flow_offload_tuple_rhash		tuplehash[FLOW_OFFLOAD_DIR_MAX];
@@ -116,6 +118,10 @@ unsigned int nf_flow_offload_ip_hook(void *priv, struct sk_buff *skb,
 unsigned int nf_flow_offload_ipv6_hook(void *priv, struct sk_buff *skb,
 				       const struct nf_hook_state *state);
 
+void flow_offload_hw_add(struct net *net, struct flow_offload *flow,
+			 struct nf_conn *ct);
+void flow_offload_hw_del(struct net *net, struct flow_offload *flow);
+
 #define MODULE_ALIAS_NF_FLOWTABLE(family)	\
 	MODULE_ALIAS("nf-flowtable-" __stringify(family))
 
diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index 1ada46345f3c..cc25876cf223 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -671,6 +671,15 @@ config NF_FLOW_TABLE
 
 	  To compile it as a module, choose M here.
 
+config NF_FLOW_TABLE_HW
+	tristate "Netfilter flow table hardware offload module"
+	depends on NF_FLOW_TABLE
+	help
+	  This option adds hardware offload support for the flow table core
+	  infrastructure.
+
+	  To compile it as a module, choose M here.
+
 config NETFILTER_XTABLES
 	tristate "Netfilter Xtables support (required for ip_tables)"
 	default m if NETFILTER_ADVANCED=n
diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index 2c1b8de922f2..1a97a47ad4e8 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -109,6 +109,7 @@ obj-$(CONFIG_NFT_FWD_NETDEV)	+= nft_fwd_netdev.o
 
 # flow table infrastructure
 obj-$(CONFIG_NF_FLOW_TABLE)	+= nf_flow_table.o
+obj-$(CONFIG_NF_FLOW_TABLE_HW)	+= nf_flow_table_hw.o
 
 # generic X tables
 obj-$(CONFIG_NETFILTER_XTABLES) += x_tables.o xt_tcpudp.o
diff --git a/net/netfilter/nf_flow_table.c b/net/netfilter/nf_flow_table.c
index e1024b17b910..a505351980fd 100644
--- a/net/netfilter/nf_flow_table.c
+++ b/net/netfilter/nf_flow_table.c
@@ -237,15 +237,22 @@ static inline bool nf_flow_is_dying(const struct flow_offload *flow)
 	return flow->flags & FLOW_OFFLOAD_DYING;
 }
 
+static inline bool nf_flow_in_hw(const struct flow_offload *flow)
+{
+	return flow->flags & FLOW_OFFLOAD_HW;
+}
+
 void nf_flow_offload_work_gc(struct work_struct *work)
 {
 	struct flow_offload_tuple_rhash *tuplehash;
 	struct nf_flowtable *flow_table;
 	struct rhashtable_iter hti;
 	struct flow_offload *flow;
+	struct net *net;
 	int err;
 
 	flow_table = container_of(work, struct nf_flowtable, gc_work.work);
+	net = read_pnet(&flow_table->ft_net);
 
 	rhashtable_walk_init(&flow_table->rhashtable, &hti, GFP_KERNEL);
 	err = rhashtable_walk_start(&hti);
@@ -265,10 +272,16 @@ void nf_flow_offload_work_gc(struct work_struct *work)
 
 		flow = container_of(tuplehash, struct flow_offload, tuplehash[0]);
 
+		if (nf_flow_in_hw(flow) &&
+		    !nf_flow_is_dying(flow))
+			continue;
+
 		if (nf_flow_has_expired(flow) ||
 		    nf_flow_is_dying(flow)) {
 			flow_offload_del(flow_table, flow);
 			nf_flow_release_ct(flow);
+			if (nf_flow_in_hw(flow))
+				flow_offload_hw_del(net, flow);
 		}
 	}
 
diff --git a/net/netfilter/nf_flow_table_hw.c b/net/netfilter/nf_flow_table_hw.c
new file mode 100644
index 000000000000..2907564c8aec
--- /dev/null
+++ b/net/netfilter/nf_flow_table_hw.c
@@ -0,0 +1,127 @@
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+static DEFINE_SPINLOCK(flow_offload_hw_pending_list_lock);
+static LIST_HEAD(flow_offload_hw_pending_list);
+
+static DEFINE_MUTEX(nf_flow_offload_hw_mutex);
+static struct work_struct nft_flow_offload_hw_work;
+
+struct flow_offload_hw {
+	struct list_head	list;
+	struct flow_offload	*flow;
+	struct nf_conn		*ct;
+	possible_net_t		flow_hw_net;
+};
+
+static int do_flow_offload_hw(struct net *net, struct flow_offload *flow)
+{
+	struct net_device *indev;
+	int ret, ifindex;
+
+	ifindex = flow->tuplehash[FLOW_OFFLOAD_DIR_ORIGINAL].tuple.iifidx;
+	indev = dev_get_by_index(net, ifindex);
+	if (WARN_ON(!indev))
+		return 0;
+
+	mutex_lock(&nf_flow_offload_hw_mutex);
+	ret = indev->netdev_ops->ndo_flow_offload(FLOW_OFFLOAD_ADD, flow);
+	mutex_unlock(&nf_flow_offload_hw_mutex);
+
+	if (ret >= 0)
+		flow->flags |= FLOW_OFFLOAD_HW;
+
+	dev_put(indev);
+
+	return ret;
+}
+
+static void flow_offload_hw_work(struct work_struct *work)
+{
+	struct flow_offload_hw *offload, *next;
+	LIST_HEAD(hw_offload_pending);
+	struct net *net;
+
+	spin_lock_bh(&flow_offload_hw_pending_list_lock);
+	if (!list_empty(&flow_offload_hw_pending_list))
+		list_move_tail(&flow_offload_hw_pending_list, &hw_offload_pending);
+	spin_unlock_bh(&flow_offload_hw_pending_list_lock);
+
+	list_for_each_entry_safe(offload, next, &hw_offload_pending, list) {
+		if (nf_ct_is_dying(offload->ct))
+			goto next;
+
+		net = read_pnet(&offload->flow_hw_net);
+		do_flow_offload_hw(net, offload->flow);
+next:
+		nf_conntrack_put(&offload->ct->ct_general);
+		list_del(&offload->list);
+		kfree(offload);
+	}
+}
+
+void flow_offload_hw_add(struct net *net, struct flow_offload *flow,
+			 struct nf_conn *ct)
+{
+	struct flow_offload_hw *offload;
+
+	offload = kmalloc(sizeof(struct flow_offload_hw), GFP_ATOMIC);
+	if (!offload)
+		return;
+
+	nf_conntrack_get(&ct->ct_general);
+	offload->ct = ct;
+	offload->flow = flow;
+	write_pnet(&offload->flow_hw_net, net);
+
+	spin_lock_bh(&flow_offload_hw_pending_list_lock);
+	list_add_tail(&offload->list, &flow_offload_hw_pending_list);
+	spin_unlock_bh(&flow_offload_hw_pending_list_lock);
+
+	schedule_work(&nft_flow_offload_hw_work);
+}
+EXPORT_SYMBOL_GPL(flow_offload_hw_add);
+
+void flow_offload_hw_del(struct net *net, struct flow_offload *flow)
+{
+	struct net_device *indev;
+	int ret, ifindex;
+
+	ifindex = flow->tuplehash[FLOW_OFFLOAD_DIR_ORIGINAL].tuple.iifidx;
+	indev = dev_get_by_index(net, ifindex);
+	if (WARN_ON(!indev))
+		return;
+
+	mutex_lock(&nf_flow_offload_hw_mutex);
+	ret = indev->netdev_ops->ndo_flow_offload(FLOW_OFFLOAD_DEL, flow);
+	mutex_unlock(&nf_flow_offload_hw_mutex);
+
+	dev_put(indev);
+}
+EXPORT_SYMBOL_GPL(flow_offload_hw_del);
+
+static int __init nf_flow_table_module_init(void)
+{
+	INIT_WORK(&nft_flow_offload_hw_work, flow_offload_hw_work);
+
+	return 0;
+}
+
+static void __exit nf_flow_table_module_exit(void)
+{
+	cancel_work_sync(&nft_flow_offload_hw_work);
+}
+
+module_init(nf_flow_table_module_init);
+module_exit(nf_flow_table_module_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Pablo Neira Ayuso ");
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index efd9405a8a5e..6583d2a0e35b 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -5095,6 +5095,8 @@ static int nf_tables_newflowtable(struct net *net, struct sock *nlsk,
 	}
 
 	flowtable->data.type = type;
+	write_pnet(&flowtable->data.ft_net, net);
+
 	err = rhashtable_init(&flowtable->data.rhashtable, type->params);
 	if (err < 0)
 		goto err3;
diff --git a/net/netfilter/nft_flow_offload.c b/net/netfilter/nft_flow_offload.c
index 4f16c37acaa3..5c8ea236e8a3 100644
--- a/net/netfilter/nft_flow_offload.c
+++ b/net/netfilter/nft_flow_offload.c
@@ -70,6 +70,7 @@ static void nft_flow_offload_eval(const struct nft_expr *expr,
 {
 	struct nft_flow_offload *priv = nft_expr_priv(expr);
 	struct nf_flowtable *flowtable = &priv->flowtable->data;
+	const struct net_device *indev = nft_in(pkt);
 	enum ip_conntrack_info ctinfo;
 	struct nf_flow_route route;
 	struct flow_offload *flow;
@@ -114,6 +115,9 @@ static void nft_flow_offload_eval(const struct nft_expr *expr,
 	if (ret < 0)
 		goto err_flow_add;
 
+	if (indev->netdev_ops->ndo_flow_offload)
+		flow_offload_hw_add(nft_net(pkt), flow, ct);
+
 	return;
 
 err_flow_add: