{"id":805189,"url":"http://patchwork.ozlabs.org/api/1.2/patches/805189/?format=json","web_url":"http://patchwork.ozlabs.org/project/netfilter-devel/patch/20170823220832.32535-1-fw@strlen.de/","project":{"id":26,"url":"http://patchwork.ozlabs.org/api/1.2/projects/26/?format=json","name":"Netfilter Development","link_name":"netfilter-devel","list_id":"netfilter-devel.vger.kernel.org","list_email":"netfilter-devel@vger.kernel.org","web_url":null,"scm_url":null,"webscm_url":null,"list_archive_url":"","list_archive_url_format":"","commit_url_format":""},"msgid":"<20170823220832.32535-1-fw@strlen.de>","list_archive_url":null,"date":"2017-08-23T22:08:32","name":"[nf-next,v2,1/3] netfilter: convert hook list to an array","commit_ref":null,"pull_url":null,"state":"accepted","archived":false,"hash":"880772fdf375ed128491e0d3f3ac608a2d0b55b1","submitter":{"id":1025,"url":"http://patchwork.ozlabs.org/api/1.2/people/1025/?format=json","name":"Florian Westphal","email":"fw@strlen.de"},"delegate":{"id":6139,"url":"http://patchwork.ozlabs.org/api/1.2/users/6139/?format=json","username":"pablo","first_name":"Pablo","last_name":"Neira","email":"pablo@netfilter.org"},"mbox":"http://patchwork.ozlabs.org/project/netfilter-devel/patch/20170823220832.32535-1-fw@strlen.de/mbox/","series":[],"comments":"http://patchwork.ozlabs.org/api/patches/805189/comments/","check":"pending","checks":"http://patchwork.ozlabs.org/api/patches/805189/checks/","tags":{},"related":[],"headers":{"Return-Path":"<netfilter-devel-owner@vger.kernel.org>","X-Original-To":"incoming@patchwork.ozlabs.org","Delivered-To":"patchwork-incoming@bilbo.ozlabs.org","Authentication-Results":"ozlabs.org;\n\tspf=none (mailfrom) smtp.mailfrom=vger.kernel.org\n\t(client-ip=209.132.180.67; helo=vger.kernel.org;\n\tenvelope-from=netfilter-devel-owner@vger.kernel.org;\n\treceiver=<UNKNOWN>)","Received":["from vger.kernel.org (vger.kernel.org [209.132.180.67])\n\tby ozlabs.org (Postfix) with ESMTP id 3xd1lQ1CBMz9sRm\n\tfor <incoming@patchwork.ozlabs.org>;\n\tThu, 24 Aug 2017 08:08:18 +1000 (AEST)","(majordomo@vger.kernel.org) by vger.kernel.org via listexpand\n\tid S1751026AbdHWWIR (ORCPT <rfc822;incoming@patchwork.ozlabs.org>);\n\tWed, 23 Aug 2017 18:08:17 -0400","from Chamillionaire.breakpoint.cc ([146.0.238.67]:60068 \"EHLO\n\tChamillionaire.breakpoint.cc\" rhost-flags-OK-OK-OK-OK)\n\tby vger.kernel.org with ESMTP id S1750715AbdHWWIQ (ORCPT\n\t<rfc822;netfilter-devel@vger.kernel.org>);\n\tWed, 23 Aug 2017 18:08:16 -0400","from fw by Chamillionaire.breakpoint.cc with local (Exim 4.84_2)\n\t(envelope-from <fw@breakpoint.cc>)\n\tid 1dkdmQ-000466-7x; Thu, 24 Aug 2017 00:05:42 +0200"],"From":"Florian Westphal <fw@strlen.de>","To":"<netfilter-devel@vger.kernel.org>","Cc":"Aaron Conole <aconole@bytheb.org>, Florian Westphal <fw@strlen.de>","Subject":"[PATCH nf-next v2 1/3] netfilter: convert hook list to an array","Date":"Thu, 24 Aug 2017 00:08:32 +0200","Message-Id":"<20170823220832.32535-1-fw@strlen.de>","X-Mailer":"git-send-email 2.13.0","In-Reply-To":"<20170823152627.19865-1-fw@strlen.de>","References":"<20170823152627.19865-1-fw@strlen.de>","Sender":"netfilter-devel-owner@vger.kernel.org","Precedence":"bulk","List-ID":"<netfilter-devel.vger.kernel.org>","X-Mailing-List":"netfilter-devel@vger.kernel.org"},"content":"From: Aaron Conole <aconole@bytheb.org>\n\nThis converts the storage and layout of netfilter hook entries from a\nlinked list to an array.  After this commit, hook entries will be\nstored adjacent in memory.  The next pointer is no longer required.\n\nThe ops pointers are stored at the end of the array as they are only\nused in the register/unregister path and in the legacy br_netfilter code.\n\nnf_unregister_net_hooks() is slower than needed as it just calls\nnf_unregister_net_hook in a loop (i.e. at least n synchronize_net()\ncalls), this will be addressed in followup patch.\n\nTest setup:\n - ixgbe 10gbit\n - netperf UDP_STREAM, 64 byte packets\n - 5 hooks: (raw + mangle prerouting, mangle+filter input, inet filter):\nempty mangle and raw prerouting, mangle and filter input hooks:\n353.9\nthis patch:\n364.2\n\nSigned-off-by: Aaron Conole <aconole@bytheb.org>\nSigned-off-by: Florian Westphal <fw@strlen.de>\n---\n Change since v1: use kvzalloc (Eric).\n\n include/linux/netdevice.h         |   2 +-\n include/linux/netfilter.h         |  45 +++---\n include/linux/netfilter_ingress.h |   4 +-\n include/net/netfilter/nf_queue.h  |   2 +-\n include/net/netns/netfilter.h     |   2 +-\n net/bridge/br_netfilter_hooks.c   |  19 ++-\n net/netfilter/core.c              | 297 ++++++++++++++++++++++++++++----------\n net/netfilter/nf_internals.h      |   3 +-\n net/netfilter/nf_queue.c          |  67 +++++----\n 9 files changed, 307 insertions(+), 134 deletions(-)","diff":"diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h\nindex 614642eb7eb7..ca0a30127300 100644\n--- a/include/linux/netdevice.h\n+++ b/include/linux/netdevice.h\n@@ -1811,7 +1811,7 @@ struct net_device {\n #endif\n \tstruct netdev_queue __rcu *ingress_queue;\n #ifdef CONFIG_NETFILTER_INGRESS\n-\tstruct nf_hook_entry __rcu *nf_hooks_ingress;\n+\tstruct nf_hook_entries __rcu *nf_hooks_ingress;\n #endif\n \n \tunsigned char\t\tbroadcast[MAX_ADDR_LEN];\ndiff --git a/include/linux/netfilter.h b/include/linux/netfilter.h\nindex 22f081065d49..f84bca1703cd 100644\n--- a/include/linux/netfilter.h\n+++ b/include/linux/netfilter.h\n@@ -72,25 +72,32 @@ struct nf_hook_ops {\n };\n \n struct nf_hook_entry {\n-\tstruct nf_hook_entry __rcu\t*next;\n \tnf_hookfn\t\t\t*hook;\n \tvoid\t\t\t\t*priv;\n-\tconst struct nf_hook_ops\t*orig_ops;\n };\n \n-static inline void\n-nf_hook_entry_init(struct nf_hook_entry *entry,\tconst struct nf_hook_ops *ops)\n-{\n-\tentry->next = NULL;\n-\tentry->hook = ops->hook;\n-\tentry->priv = ops->priv;\n-\tentry->orig_ops = ops;\n-}\n+struct nf_hook_entries {\n+\tu16\t\t\t\tnum_hook_entries;\n+\t/* padding */\n+\tstruct nf_hook_entry\t\thooks[];\n+\n+\t/* trailer: pointers to original orig_ops of each hook.\n+\t *\n+\t * This is not part of struct nf_hook_entry since its only\n+\t * needed in slow path (hook register/unregister).\n+\t *\n+\t * const struct nf_hook_ops     *orig_ops[]\n+\t */\n+};\n \n-static inline int\n-nf_hook_entry_priority(const struct nf_hook_entry *entry)\n+static inline struct nf_hook_ops **nf_hook_entries_get_hook_ops(const struct nf_hook_entries *e)\n {\n-\treturn entry->orig_ops->priority;\n+\tunsigned int n = e->num_hook_entries;\n+\tconst void *hook_end;\n+\n+\thook_end = &e->hooks[n]; /* this is *past* ->hooks[]! */\n+\n+\treturn (struct nf_hook_ops **)hook_end;\n }\n \n static inline int\n@@ -100,12 +107,6 @@ nf_hook_entry_hookfn(const struct nf_hook_entry *entry, struct sk_buff *skb,\n \treturn entry->hook(entry->priv, skb, state);\n }\n \n-static inline const struct nf_hook_ops *\n-nf_hook_entry_ops(const struct nf_hook_entry *entry)\n-{\n-\treturn entry->orig_ops;\n-}\n-\n static inline void nf_hook_state_init(struct nf_hook_state *p,\n \t\t\t\t      unsigned int hook,\n \t\t\t\t      u_int8_t pf,\n@@ -168,7 +169,7 @@ extern struct static_key nf_hooks_needed[NFPROTO_NUMPROTO][NF_MAX_HOOKS];\n #endif\n \n int nf_hook_slow(struct sk_buff *skb, struct nf_hook_state *state,\n-\t\t struct nf_hook_entry *entry);\n+\t\t const struct nf_hook_entries *e, unsigned int i);\n \n /**\n  *\tnf_hook - call a netfilter hook\n@@ -182,7 +183,7 @@ static inline int nf_hook(u_int8_t pf, unsigned int hook, struct net *net,\n \t\t\t  struct net_device *indev, struct net_device *outdev,\n \t\t\t  int (*okfn)(struct net *, struct sock *, struct sk_buff *))\n {\n-\tstruct nf_hook_entry *hook_head;\n+\tstruct nf_hook_entries *hook_head;\n \tint ret = 1;\n \n #ifdef HAVE_JUMP_LABEL\n@@ -200,7 +201,7 @@ static inline int nf_hook(u_int8_t pf, unsigned int hook, struct net *net,\n \t\tnf_hook_state_init(&state, hook, pf, indev, outdev,\n \t\t\t\t   sk, net, okfn);\n \n-\t\tret = nf_hook_slow(skb, &state, hook_head);\n+\t\tret = nf_hook_slow(skb, &state, hook_head, 0);\n \t}\n \trcu_read_unlock();\n \ndiff --git a/include/linux/netfilter_ingress.h b/include/linux/netfilter_ingress.h\nindex 59476061de86..8d5dae1e2ff8 100644\n--- a/include/linux/netfilter_ingress.h\n+++ b/include/linux/netfilter_ingress.h\n@@ -17,7 +17,7 @@ static inline bool nf_hook_ingress_active(const struct sk_buff *skb)\n /* caller must hold rcu_read_lock */\n static inline int nf_hook_ingress(struct sk_buff *skb)\n {\n-\tstruct nf_hook_entry *e = rcu_dereference(skb->dev->nf_hooks_ingress);\n+\tstruct nf_hook_entries *e = rcu_dereference(skb->dev->nf_hooks_ingress);\n \tstruct nf_hook_state state;\n \tint ret;\n \n@@ -30,7 +30,7 @@ static inline int nf_hook_ingress(struct sk_buff *skb)\n \tnf_hook_state_init(&state, NF_NETDEV_INGRESS,\n \t\t\t   NFPROTO_NETDEV, skb->dev, NULL, NULL,\n \t\t\t   dev_net(skb->dev), NULL);\n-\tret = nf_hook_slow(skb, &state, e);\n+\tret = nf_hook_slow(skb, &state, e, 0);\n \tif (ret == 0)\n \t\treturn -1;\n \ndiff --git a/include/net/netfilter/nf_queue.h b/include/net/netfilter/nf_queue.h\nindex 4454719ff849..39468720fc19 100644\n--- a/include/net/netfilter/nf_queue.h\n+++ b/include/net/netfilter/nf_queue.h\n@@ -10,9 +10,9 @@ struct nf_queue_entry {\n \tstruct list_head\tlist;\n \tstruct sk_buff\t\t*skb;\n \tunsigned int\t\tid;\n+\tunsigned int\t\thook_index;\t/* index in hook_entries->hook[] */\n \n \tstruct nf_hook_state\tstate;\n-\tstruct nf_hook_entry\t*hook;\n \tu16\t\t\tsize; /* sizeof(entry) + saved route keys */\n \n \t/* extra space to store route keys */\ndiff --git a/include/net/netns/netfilter.h b/include/net/netns/netfilter.h\nindex cea396b53a60..72d66c8763d0 100644\n--- a/include/net/netns/netfilter.h\n+++ b/include/net/netns/netfilter.h\n@@ -16,7 +16,7 @@ struct netns_nf {\n #ifdef CONFIG_SYSCTL\n \tstruct ctl_table_header *nf_log_dir_header;\n #endif\n-\tstruct nf_hook_entry __rcu *hooks[NFPROTO_NUMPROTO][NF_MAX_HOOKS];\n+\tstruct nf_hook_entries __rcu *hooks[NFPROTO_NUMPROTO][NF_MAX_HOOKS];\n #if IS_ENABLED(CONFIG_NF_DEFRAG_IPV4)\n \tbool\t\t\tdefrag_ipv4;\n #endif\ndiff --git a/net/bridge/br_netfilter_hooks.c b/net/bridge/br_netfilter_hooks.c\nindex 626f4b2cef16..c2eea1b8737a 100644\n--- a/net/bridge/br_netfilter_hooks.c\n+++ b/net/bridge/br_netfilter_hooks.c\n@@ -985,22 +985,25 @@ int br_nf_hook_thresh(unsigned int hook, struct net *net,\n \t\t      int (*okfn)(struct net *, struct sock *,\n \t\t\t\t  struct sk_buff *))\n {\n-\tstruct nf_hook_entry *elem;\n+\tconst struct nf_hook_entries *e;\n \tstruct nf_hook_state state;\n+\tstruct nf_hook_ops **ops;\n+\tunsigned int i;\n \tint ret;\n \n-\tfor (elem = rcu_dereference(net->nf.hooks[NFPROTO_BRIDGE][hook]);\n-\t     elem && nf_hook_entry_priority(elem) <= NF_BR_PRI_BRNF;\n-\t     elem = rcu_dereference(elem->next))\n-\t\t;\n-\n-\tif (!elem)\n+\te = rcu_dereference(net->nf.hooks[NFPROTO_BRIDGE][hook]);\n+\tif (!e)\n \t\treturn okfn(net, sk, skb);\n \n+\tops = nf_hook_entries_get_hook_ops(e);\n+\tfor (i = 0; i < e->num_hook_entries &&\n+\t      ops[i]->priority <= NF_BR_PRI_BRNF; i++)\n+\t\t;\n+\n \tnf_hook_state_init(&state, hook, NFPROTO_BRIDGE, indev, outdev,\n \t\t\t   sk, net, okfn);\n \n-\tret = nf_hook_slow(skb, &state, elem);\n+\tret = nf_hook_slow(skb, &state, e, i);\n \tif (ret == 1)\n \t\tret = okfn(net, sk, skb);\n \ndiff --git a/net/netfilter/core.c b/net/netfilter/core.c\nindex 974cf2a3795a..1a9e23c9ab98 100644\n--- a/net/netfilter/core.c\n+++ b/net/netfilter/core.c\n@@ -21,7 +21,7 @@\n #include <linux/inetdevice.h>\n #include <linux/proc_fs.h>\n #include <linux/mutex.h>\n-#include <linux/slab.h>\n+#include <linux/mm.h>\n #include <linux/rcupdate.h>\n #include <net/net_namespace.h>\n #include <net/sock.h>\n@@ -62,10 +62,160 @@ EXPORT_SYMBOL(nf_hooks_needed);\n #endif\n \n static DEFINE_MUTEX(nf_hook_mutex);\n+\n+/* max hooks per family/hooknum */\n+#define MAX_HOOK_COUNT\t\t1024\n+\n #define nf_entry_dereference(e) \\\n \trcu_dereference_protected(e, lockdep_is_held(&nf_hook_mutex))\n \n-static struct nf_hook_entry __rcu **nf_hook_entry_head(struct net *net, const struct nf_hook_ops *reg)\n+static struct nf_hook_entries *allocate_hook_entries_size(u16 num)\n+{\n+\tstruct nf_hook_entries *e;\n+\tsize_t alloc = sizeof(*e) +\n+\t\t       sizeof(struct nf_hook_entry) * num +\n+\t\t       sizeof(struct nf_hook_ops *) * num;\n+\n+\tif (num == 0)\n+\t\treturn NULL;\n+\n+\te = kvzalloc(alloc, GFP_KERNEL);\n+\tif (e)\n+\t\te->num_hook_entries = num;\n+\treturn e;\n+}\n+\n+static unsigned int accept_all(void *priv,\n+\t\t\t       struct sk_buff *skb,\n+\t\t\t       const struct nf_hook_state *state)\n+{\n+\treturn NF_ACCEPT; /* ACCEPT makes nf_hook_slow call next hook */\n+}\n+\n+static const struct nf_hook_ops dummy_ops = {\n+\t.hook = accept_all,\n+\t.priority = INT_MIN,\n+};\n+\n+static struct nf_hook_entries *\n+nf_hook_entries_grow(const struct nf_hook_entries *old,\n+\t\t     const struct nf_hook_ops *reg)\n+{\n+\tunsigned int i, alloc_entries, nhooks, old_entries;\n+\tstruct nf_hook_ops **orig_ops = NULL;\n+\tstruct nf_hook_ops **new_ops;\n+\tstruct nf_hook_entries *new;\n+\tbool inserted = false;\n+\n+\talloc_entries = 1;\n+\told_entries = old ? old->num_hook_entries : 0;\n+\n+\tif (old) {\n+\t\torig_ops = nf_hook_entries_get_hook_ops(old);\n+\n+\t\tfor (i = 0; i < old_entries; i++) {\n+\t\t\tif (orig_ops[i] != &dummy_ops)\n+\t\t\t\talloc_entries++;\n+\t\t}\n+\t}\n+\n+\tif (alloc_entries > MAX_HOOK_COUNT)\n+\t\treturn ERR_PTR(-E2BIG);\n+\n+\tnew = allocate_hook_entries_size(alloc_entries);\n+\tif (!new)\n+\t\treturn ERR_PTR(-ENOMEM);\n+\n+\tnew_ops = nf_hook_entries_get_hook_ops(new);\n+\n+\ti = 0;\n+\tnhooks = 0;\n+\twhile (i < old_entries) {\n+\t\tif (orig_ops[i] == &dummy_ops) {\n+\t\t\t++i;\n+\t\t\tcontinue;\n+\t\t}\n+\t\tif (inserted || reg->priority > orig_ops[i]->priority) {\n+\t\t\tnew_ops[nhooks] = (void *)orig_ops[i];\n+\t\t\tnew->hooks[nhooks] = old->hooks[i];\n+\t\t\ti++;\n+\t\t} else {\n+\t\t\tnew_ops[nhooks] = (void *)reg;\n+\t\t\tnew->hooks[nhooks].hook = reg->hook;\n+\t\t\tnew->hooks[nhooks].priv = reg->priv;\n+\t\t\tinserted = true;\n+\t\t}\n+\t\tnhooks++;\n+\t}\n+\n+\tif (!inserted) {\n+\t\tnew_ops[nhooks] = (void *)reg;\n+\t\tnew->hooks[nhooks].hook = reg->hook;\n+\t\tnew->hooks[nhooks].priv = reg->priv;\n+\t}\n+\n+\treturn new;\n+}\n+\n+/*\n+ * __nf_hook_entries_try_shrink - try to shrink hook array\n+ *\n+ * @pp -- location of hook blob\n+ *\n+ * Hook unregistration must always succeed, so to-be-removed hooks\n+ * are replaced by a dummy one that will just move to next hook.\n+ *\n+ * This counts the current dummy hooks, attempts to allocate new blob,\n+ * copies the live hooks, then replaces and discards old one.\n+ *\n+ * return values:\n+ *\n+ * Returns address to free, or NULL.\n+ */\n+static void *__nf_hook_entries_try_shrink(struct nf_hook_entries __rcu **pp)\n+{\n+\tstruct nf_hook_entries *old, *new = NULL;\n+\tunsigned int i, j, skip = 0, hook_entries;\n+\tstruct nf_hook_ops **orig_ops;\n+\tstruct nf_hook_ops **new_ops;\n+\n+\told = nf_entry_dereference(*pp);\n+\tif (WARN_ON_ONCE(!old))\n+\t\treturn NULL;\n+\n+\torig_ops = nf_hook_entries_get_hook_ops(old);\n+\tfor (i = 0; i < old->num_hook_entries; i++) {\n+\t\tif (orig_ops[i] == &dummy_ops)\n+\t\t\tskip++;\n+\t}\n+\n+\t/* if skip == hook_entries all hooks have been removed */\n+\thook_entries = old->num_hook_entries;\n+\tif (skip == hook_entries)\n+\t\tgoto out_assign;\n+\n+\tif (WARN_ON(skip == 0))\n+\t\treturn NULL;\n+\n+\thook_entries -= skip;\n+\tnew = allocate_hook_entries_size(hook_entries);\n+\tif (!new)\n+\t\treturn NULL;\n+\n+\tnew_ops = nf_hook_entries_get_hook_ops(new);\n+\tfor (i = 0, j = 0; i < old->num_hook_entries; i++) {\n+\t\tif (orig_ops[i] == &dummy_ops)\n+\t\t\tcontinue;\n+\t\tnew->hooks[j] = old->hooks[i];\n+\t\tnew_ops[j] = (void *)orig_ops[i];\n+\t\tj++;\n+\t}\n+out_assign:\n+\trcu_assign_pointer(*pp, new);\n+\treturn old;\n+}\n+\n+static struct nf_hook_entries __rcu **nf_hook_entry_head(struct net *net, const struct nf_hook_ops *reg)\n {\n \tif (reg->pf != NFPROTO_NETDEV)\n \t\treturn net->nf.hooks[reg->pf]+reg->hooknum;\n@@ -76,13 +226,14 @@ static struct nf_hook_entry __rcu **nf_hook_entry_head(struct net *net, const st\n \t\t\treturn &reg->dev->nf_hooks_ingress;\n \t}\n #endif\n+\tWARN_ON_ONCE(1);\n \treturn NULL;\n }\n \n int nf_register_net_hook(struct net *net, const struct nf_hook_ops *reg)\n {\n-\tstruct nf_hook_entry __rcu **pp;\n-\tstruct nf_hook_entry *entry, *p;\n+\tstruct nf_hook_entries *p, *new_hooks;\n+\tstruct nf_hook_entries __rcu **pp;\n \n \tif (reg->pf == NFPROTO_NETDEV) {\n #ifndef CONFIG_NETFILTER_INGRESS\n@@ -98,23 +249,18 @@ int nf_register_net_hook(struct net *net, const struct nf_hook_ops *reg)\n \tif (!pp)\n \t\treturn -EINVAL;\n \n-\tentry = kmalloc(sizeof(*entry), GFP_KERNEL);\n-\tif (!entry)\n-\t\treturn -ENOMEM;\n-\n-\tnf_hook_entry_init(entry, reg);\n-\n \tmutex_lock(&nf_hook_mutex);\n \n-\t/* Find the spot in the list */\n-\tfor (; (p = nf_entry_dereference(*pp)) != NULL; pp = &p->next) {\n-\t\tif (reg->priority < nf_hook_entry_priority(p))\n-\t\t\tbreak;\n-\t}\n-\trcu_assign_pointer(entry->next, p);\n-\trcu_assign_pointer(*pp, entry);\n+\tp = nf_entry_dereference(*pp);\n+\tnew_hooks = nf_hook_entries_grow(p, reg);\n+\n+\tif (!IS_ERR(new_hooks))\n+\t\trcu_assign_pointer(*pp, new_hooks);\n \n \tmutex_unlock(&nf_hook_mutex);\n+\tif (IS_ERR(new_hooks))\n+\t\treturn PTR_ERR(new_hooks);\n+\n #ifdef CONFIG_NETFILTER_INGRESS\n \tif (reg->pf == NFPROTO_NETDEV && reg->hooknum == NF_NETDEV_INGRESS)\n \t\tnet_inc_ingress_queue();\n@@ -122,48 +268,74 @@ int nf_register_net_hook(struct net *net, const struct nf_hook_ops *reg)\n #ifdef HAVE_JUMP_LABEL\n \tstatic_key_slow_inc(&nf_hooks_needed[reg->pf][reg->hooknum]);\n #endif\n+\tsynchronize_net();\n+\tBUG_ON(p == new_hooks);\n+\tkvfree(p);\n \treturn 0;\n }\n EXPORT_SYMBOL(nf_register_net_hook);\n \n-static struct nf_hook_entry *\n-__nf_unregister_net_hook(struct net *net, const struct nf_hook_ops *reg)\n+/*\n+ * __nf_unregister_net_hook - remove a hook from blob\n+ *\n+ * @oldp: current address of hook blob\n+ * @unreg: hook to unregister\n+ *\n+ * This cannot fail, hook unregistration must always succeed.\n+ * Therefore replace the to-be-removed hook with a dummy hook.\n+ */\n+static void __nf_unregister_net_hook(struct nf_hook_entries *old,\n+\t\t\t\t     const struct nf_hook_ops *unreg)\n {\n-\tstruct nf_hook_entry __rcu **pp;\n-\tstruct nf_hook_entry *p;\n-\n-\tpp = nf_hook_entry_head(net, reg);\n-\tif (WARN_ON_ONCE(!pp))\n-\t\treturn NULL;\n+\tstruct nf_hook_ops **orig_ops;\n+\tbool found = false;\n+\tunsigned int i;\n \n-\tmutex_lock(&nf_hook_mutex);\n-\tfor (; (p = nf_entry_dereference(*pp)) != NULL; pp = &p->next) {\n-\t\tif (nf_hook_entry_ops(p) == reg) {\n-\t\t\trcu_assign_pointer(*pp, p->next);\n-\t\t\tbreak;\n-\t\t}\n-\t}\n-\tmutex_unlock(&nf_hook_mutex);\n-\tif (!p) {\n-\t\tWARN(1, \"nf_unregister_net_hook: hook not found!\\n\");\n-\t\treturn NULL;\n+\torig_ops = nf_hook_entries_get_hook_ops(old);\n+\tfor (i = 0; i < old->num_hook_entries; i++) {\n+\t\tif (orig_ops[i] != unreg)\n+\t\t\tcontinue;\n+\t\tWRITE_ONCE(old->hooks[i].hook, accept_all);\n+\t\tWRITE_ONCE(orig_ops[i], &dummy_ops);\n+\t\tfound = true;\n+\t\tbreak;\n \t}\n+\n+\tif (found) {\n #ifdef CONFIG_NETFILTER_INGRESS\n-\tif (reg->pf == NFPROTO_NETDEV && reg->hooknum == NF_NETDEV_INGRESS)\n-\t\tnet_dec_ingress_queue();\n+\t\tif (unreg->pf == NFPROTO_NETDEV && unreg->hooknum == NF_NETDEV_INGRESS)\n+\t\t\tnet_dec_ingress_queue();\n #endif\n #ifdef HAVE_JUMP_LABEL\n-\tstatic_key_slow_dec(&nf_hooks_needed[reg->pf][reg->hooknum]);\n+\t\tstatic_key_slow_dec(&nf_hooks_needed[unreg->pf][unreg->hooknum]);\n #endif\n-\n-\treturn p;\n+\t} else {\n+\t\tWARN_ONCE(1, \"hook not found, pf %d num %d\", unreg->pf, unreg->hooknum);\n+\t}\n }\n \n void nf_unregister_net_hook(struct net *net, const struct nf_hook_ops *reg)\n {\n-\tstruct nf_hook_entry *p = __nf_unregister_net_hook(net, reg);\n+\tstruct nf_hook_entries __rcu **pp;\n+\tstruct nf_hook_entries *p;\n \tunsigned int nfq;\n \n+\tpp = nf_hook_entry_head(net, reg);\n+\tif (!pp)\n+\t\treturn;\n+\n+\tmutex_lock(&nf_hook_mutex);\n+\n+\tp = nf_entry_dereference(*pp);\n+\tif (WARN_ON_ONCE(!p)) {\n+\t\tmutex_unlock(&nf_hook_mutex);\n+\t\treturn;\n+\t}\n+\n+\t__nf_unregister_net_hook(p, reg);\n+\n+\tp = __nf_hook_entries_try_shrink(pp);\n+\tmutex_unlock(&nf_hook_mutex);\n \tif (!p)\n \t\treturn;\n \n@@ -173,7 +345,7 @@ void nf_unregister_net_hook(struct net *net, const struct nf_hook_ops *reg)\n \tnfq = nf_queue_nf_hook_drop(net);\n \tif (nfq)\n \t\tsynchronize_net();\n-\tkfree(p);\n+\tkvfree(p);\n }\n EXPORT_SYMBOL(nf_unregister_net_hook);\n \n@@ -200,46 +372,25 @@ EXPORT_SYMBOL(nf_register_net_hooks);\n void nf_unregister_net_hooks(struct net *net, const struct nf_hook_ops *reg,\n \t\t\t     unsigned int hookcount)\n {\n-\tstruct nf_hook_entry *to_free[16];\n-\tunsigned int i, n, nfq;\n-\n-\tdo {\n-\t\tn = min_t(unsigned int, hookcount, ARRAY_SIZE(to_free));\n-\n-\t\tfor (i = 0; i < n; i++)\n-\t\t\tto_free[i] = __nf_unregister_net_hook(net, &reg[i]);\n-\n-\t\tsynchronize_net();\n-\n-\t\t/* need 2nd synchronize_net() if nfqueue is used, skb\n-\t\t * can get reinjected right before nf_queue_hook_drop()\n-\t\t */\n-\t\tnfq = nf_queue_nf_hook_drop(net);\n-\t\tif (nfq)\n-\t\t\tsynchronize_net();\n-\n-\t\tfor (i = 0; i < n; i++)\n-\t\t\tkfree(to_free[i]);\n+\tunsigned int i;\n \n-\t\treg += n;\n-\t\thookcount -= n;\n-\t} while (hookcount > 0);\n+\tfor (i = 0; i < hookcount; i++)\n+\t\tnf_unregister_net_hook(net, &reg[i]);\n }\n EXPORT_SYMBOL(nf_unregister_net_hooks);\n \n /* Returns 1 if okfn() needs to be executed by the caller,\n  * -EPERM for NF_DROP, 0 otherwise.  Caller must hold rcu_read_lock. */\n int nf_hook_slow(struct sk_buff *skb, struct nf_hook_state *state,\n-\t\t struct nf_hook_entry *entry)\n+\t\t const struct nf_hook_entries *e, unsigned int s)\n {\n \tunsigned int verdict;\n \tint ret;\n \n-\tdo {\n-\t\tverdict = nf_hook_entry_hookfn(entry, skb, state);\n+\tfor (; s < e->num_hook_entries; s++) {\n+\t\tverdict = nf_hook_entry_hookfn(&e->hooks[s], skb, state);\n \t\tswitch (verdict & NF_VERDICT_MASK) {\n \t\tcase NF_ACCEPT:\n-\t\t\tentry = rcu_dereference(entry->next);\n \t\t\tbreak;\n \t\tcase NF_DROP:\n \t\t\tkfree_skb(skb);\n@@ -248,8 +399,8 @@ int nf_hook_slow(struct sk_buff *skb, struct nf_hook_state *state,\n \t\t\t\tret = -EPERM;\n \t\t\treturn ret;\n \t\tcase NF_QUEUE:\n-\t\t\tret = nf_queue(skb, state, &entry, verdict);\n-\t\t\tif (ret == 1 && entry)\n+\t\t\tret = nf_queue(skb, state, e, s, verdict);\n+\t\t\tif (ret == 1)\n \t\t\t\tcontinue;\n \t\t\treturn ret;\n \t\tdefault:\n@@ -258,7 +409,7 @@ int nf_hook_slow(struct sk_buff *skb, struct nf_hook_state *state,\n \t\t\t */\n \t\t\treturn 0;\n \t\t}\n-\t} while (entry);\n+\t}\n \n \treturn 1;\n }\ndiff --git a/net/netfilter/nf_internals.h b/net/netfilter/nf_internals.h\nindex 19f00a47a710..bacd6363946e 100644\n--- a/net/netfilter/nf_internals.h\n+++ b/net/netfilter/nf_internals.h\n@@ -13,7 +13,8 @@\n \n /* nf_queue.c */\n int nf_queue(struct sk_buff *skb, struct nf_hook_state *state,\n-\t     struct nf_hook_entry **entryp, unsigned int verdict);\n+\t     const struct nf_hook_entries *entries, unsigned int index,\n+\t     unsigned int verdict);\n unsigned int nf_queue_nf_hook_drop(struct net *net);\n \n /* nf_log.c */\ndiff --git a/net/netfilter/nf_queue.c b/net/netfilter/nf_queue.c\nindex 4f4d80a58fb5..f7e21953b1de 100644\n--- a/net/netfilter/nf_queue.c\n+++ b/net/netfilter/nf_queue.c\n@@ -112,7 +112,8 @@ unsigned int nf_queue_nf_hook_drop(struct net *net)\n EXPORT_SYMBOL_GPL(nf_queue_nf_hook_drop);\n \n static int __nf_queue(struct sk_buff *skb, const struct nf_hook_state *state,\n-\t\t      struct nf_hook_entry *hook_entry, unsigned int queuenum)\n+\t\t      const struct nf_hook_entries *entries,\n+\t\t      unsigned int index, unsigned int queuenum)\n {\n \tint status = -ENOENT;\n \tstruct nf_queue_entry *entry = NULL;\n@@ -140,7 +141,7 @@ static int __nf_queue(struct sk_buff *skb, const struct nf_hook_state *state,\n \t*entry = (struct nf_queue_entry) {\n \t\t.skb\t= skb,\n \t\t.state\t= *state,\n-\t\t.hook\t= hook_entry,\n+\t\t.hook_index = index,\n \t\t.size\t= sizeof(*entry) + afinfo->route_key_size,\n \t};\n \n@@ -163,18 +164,16 @@ static int __nf_queue(struct sk_buff *skb, const struct nf_hook_state *state,\n \n /* Packets leaving via this function must come back through nf_reinject(). */\n int nf_queue(struct sk_buff *skb, struct nf_hook_state *state,\n-\t     struct nf_hook_entry **entryp, unsigned int verdict)\n+\t     const struct nf_hook_entries *entries, unsigned int index,\n+\t     unsigned int verdict)\n {\n-\tstruct nf_hook_entry *entry = *entryp;\n \tint ret;\n \n-\tret = __nf_queue(skb, state, entry, verdict >> NF_VERDICT_QBITS);\n+\tret = __nf_queue(skb, state, entries, index, verdict >> NF_VERDICT_QBITS);\n \tif (ret < 0) {\n \t\tif (ret == -ESRCH &&\n-\t\t    (verdict & NF_VERDICT_FLAG_QUEUE_BYPASS)) {\n-\t\t\t*entryp = rcu_dereference(entry->next);\n+\t\t    (verdict & NF_VERDICT_FLAG_QUEUE_BYPASS))\n \t\t\treturn 1;\n-\t\t}\n \t\tkfree_skb(skb);\n \t}\n \n@@ -183,33 +182,56 @@ int nf_queue(struct sk_buff *skb, struct nf_hook_state *state,\n \n static unsigned int nf_iterate(struct sk_buff *skb,\n \t\t\t       struct nf_hook_state *state,\n-\t\t\t       struct nf_hook_entry **entryp)\n+\t\t\t       const struct nf_hook_entries *hooks,\n+\t\t\t       unsigned int *index)\n {\n-\tunsigned int verdict;\n+\tconst struct nf_hook_entry *hook;\n+\tunsigned int verdict, i = *index;\n \n-\tdo {\n+\twhile (i < hooks->num_hook_entries) {\n+\t\thook = &hooks->hooks[i];\n repeat:\n-\t\tverdict = nf_hook_entry_hookfn((*entryp), skb, state);\n+\t\tverdict = nf_hook_entry_hookfn(hook, skb, state);\n \t\tif (verdict != NF_ACCEPT) {\n \t\t\tif (verdict != NF_REPEAT)\n \t\t\t\treturn verdict;\n \t\t\tgoto repeat;\n \t\t}\n-\t\t*entryp = rcu_dereference((*entryp)->next);\n-\t} while (*entryp);\n+\t\ti++;\n+\t}\n \n+\t*index = i;\n \treturn NF_ACCEPT;\n }\n \n+/* Caller must hold rcu read-side lock */\n void nf_reinject(struct nf_queue_entry *entry, unsigned int verdict)\n {\n-\tstruct nf_hook_entry *hook_entry = entry->hook;\n+\tconst struct nf_hook_entry *hook_entry;\n+\tconst struct nf_hook_entries *hooks;\n \tstruct sk_buff *skb = entry->skb;\n \tconst struct nf_afinfo *afinfo;\n+\tconst struct net *net;\n+\tunsigned int i;\n \tint err;\n+\tu8 pf;\n+\n+\tnet = entry->state.net;\n+\tpf = entry->state.pf;\n+\n+\thooks = rcu_dereference(net->nf.hooks[pf][entry->state.hook]);\n \n \tnf_queue_entry_release_refs(entry);\n \n+\ti = entry->hook_index;\n+\tif (WARN_ON_ONCE(i >= hooks->num_hook_entries)) {\n+\t\tkfree_skb(skb);\n+\t\tkfree(entry);\n+\t\treturn;\n+\t}\n+\n+\thook_entry = &hooks->hooks[i];\n+\n \t/* Continue traversal iff userspace said ok... */\n \tif (verdict == NF_REPEAT)\n \t\tverdict = nf_hook_entry_hookfn(hook_entry, skb, &entry->state);\n@@ -221,27 +243,22 @@ void nf_reinject(struct nf_queue_entry *entry, unsigned int verdict)\n \t}\n \n \tif (verdict == NF_ACCEPT) {\n-\t\thook_entry = rcu_dereference(hook_entry->next);\n-\t\tif (hook_entry)\n next_hook:\n-\t\t\tverdict = nf_iterate(skb, &entry->state, &hook_entry);\n+\t\t++i;\n+\t\tverdict = nf_iterate(skb, &entry->state, hooks, &i);\n \t}\n \n \tswitch (verdict & NF_VERDICT_MASK) {\n \tcase NF_ACCEPT:\n \tcase NF_STOP:\n-okfn:\n \t\tlocal_bh_disable();\n \t\tentry->state.okfn(entry->state.net, entry->state.sk, skb);\n \t\tlocal_bh_enable();\n \t\tbreak;\n \tcase NF_QUEUE:\n-\t\terr = nf_queue(skb, &entry->state, &hook_entry, verdict);\n-\t\tif (err == 1) {\n-\t\t\tif (hook_entry)\n-\t\t\t\tgoto next_hook;\n-\t\t\tgoto okfn;\n-\t\t}\n+\t\terr = nf_queue(skb, &entry->state, hooks, i, verdict);\n+\t\tif (err == 1)\n+\t\t\tgoto next_hook;\n \t\tbreak;\n \tcase NF_STOLEN:\n \t\tbreak;\n","prefixes":["nf-next","v2","1/3"]}