{"id":809399,"url":"http://patchwork.ozlabs.org/api/1.2/patches/809399/?format=json","web_url":"http://patchwork.ozlabs.org/project/netfilter-devel/patch/1504477667-12130-10-git-send-email-pablo@netfilter.org/","project":{"id":26,"url":"http://patchwork.ozlabs.org/api/1.2/projects/26/?format=json","name":"Netfilter Development","link_name":"netfilter-devel","list_id":"netfilter-devel.vger.kernel.org","list_email":"netfilter-devel@vger.kernel.org","web_url":null,"scm_url":null,"webscm_url":null,"list_archive_url":"","list_archive_url_format":"","commit_url_format":""},"msgid":"<1504477667-12130-10-git-send-email-pablo@netfilter.org>","list_archive_url":null,"date":"2017-09-03T22:27:19","name":"[19/47] netfilter: nft_set_rbtree: use seqcount to avoid lock in most cases","commit_ref":null,"pull_url":null,"state":"accepted","archived":false,"hash":"8f6bea934935526cb7283687934d8deadb526551","submitter":{"id":1315,"url":"http://patchwork.ozlabs.org/api/1.2/people/1315/?format=json","name":"Pablo Neira Ayuso","email":"pablo@netfilter.org"},"delegate":{"id":6139,"url":"http://patchwork.ozlabs.org/api/1.2/users/6139/?format=json","username":"pablo","first_name":"Pablo","last_name":"Neira","email":"pablo@netfilter.org"},"mbox":"http://patchwork.ozlabs.org/project/netfilter-devel/patch/1504477667-12130-10-git-send-email-pablo@netfilter.org/mbox/","series":[{"id":1280,"url":"http://patchwork.ozlabs.org/api/1.2/series/1280/?format=json","web_url":"http://patchwork.ozlabs.org/project/netfilter-devel/list/?series=1280","date":"2017-09-03T22:25:42","name":"[01/47] netfilter: expect: add to hash table after expect init","version":1,"mbox":"http://patchwork.ozlabs.org/series/1280/mbox/"}],"comments":"http://patchwork.ozlabs.org/api/patches/809399/comments/","check":"pending","checks":"http://patchwork.ozlabs.org/api/patches/809399/checks/","tags":{},"related":[],"headers":{"Return-Path":"<netfilter-devel-owner@vger.kernel.org>","X-Original-To":"incoming@patchwork.ozlabs.org","Delivered-To":"patchwork-incoming@bilbo.ozlabs.org","Authentication-Results":"ozlabs.org;\n\tspf=none (mailfrom) smtp.mailfrom=vger.kernel.org\n\t(client-ip=209.132.180.67; helo=vger.kernel.org;\n\tenvelope-from=netfilter-devel-owner@vger.kernel.org;\n\treceiver=<UNKNOWN>)","Received":["from vger.kernel.org (vger.kernel.org [209.132.180.67])\n\tby ozlabs.org (Postfix) with ESMTP id 3xlngc2nkjz9s06\n\tfor <incoming@patchwork.ozlabs.org>;\n\tMon,  4 Sep 2017 08:28:28 +1000 (AEST)","(majordomo@vger.kernel.org) by vger.kernel.org via listexpand\n\tid S1753219AbdICW2W (ORCPT <rfc822;incoming@patchwork.ozlabs.org>);\n\tSun, 3 Sep 2017 18:28:22 -0400","from mail.us.es ([193.147.175.20]:51232 \"EHLO mail.us.es\"\n\trhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP\n\tid S1753280AbdICW2U (ORCPT <rfc822; netfilter-devel@vger.kernel.org>);\n\tSun, 3 Sep 2017 18:28:20 -0400","from antivirus1-rhel7.int (unknown [192.168.2.11])\n\tby mail.us.es (Postfix) with ESMTP id 88279190F62\n\tfor <netfilter-devel@vger.kernel.org>;\n\tMon,  4 Sep 2017 00:27:53 +0200 (CEST)","from antivirus1-rhel7.int (localhost [127.0.0.1])\n\tby antivirus1-rhel7.int (Postfix) with ESMTP id 78BFAB502E\n\tfor <netfilter-devel@vger.kernel.org>;\n\tMon,  4 Sep 2017 00:27:53 +0200 (CEST)","by antivirus1-rhel7.int (Postfix, from userid 99)\n\tid 6E473B502C; Mon,  4 Sep 2017 00:27:53 +0200 (CEST)","from antivirus1-rhel7.int (localhost [127.0.0.1])\n\tby antivirus1-rhel7.int (Postfix) with ESMTP id 51157B502D;\n\tMon,  4 Sep 2017 00:27:51 +0200 (CEST)","from 192.168.1.97 (192.168.1.97) by antivirus1-rhel7.int\n\t(F-Secure/fsigk_smtp/550/antivirus1-rhel7.int); \n\tMon, 04 Sep 2017 00:27:51 +0200 (CEST)","from salvia.here (unknown [31.4.193.113])\n\t(Authenticated sender: pneira@us.es)\n\tby entrada.int (Postfix) with ESMTPA id E35F74265A20;\n\tMon,  4 Sep 2017 00:27:50 +0200 (CEST)"],"X-Spam-Checker-Version":"SpamAssassin 3.4.1 (2015-04-28) on\n\tantivirus1-rhel7.int","X-Spam-Level":"","X-Spam-Status":"No, score=-108.2 required=7.5 tests=ALL_TRUSTED,BAYES_50,\n\tSMTPAUTH_US2,USER_IN_WHITELIST autolearn=disabled version=3.4.1","X-Virus-Status":"clean(F-Secure/fsigk_smtp/550/antivirus1-rhel7.int)","X-SMTPAUTHUS":"auth mail.us.es","From":"Pablo Neira Ayuso <pablo@netfilter.org>","To":"netfilter-devel@vger.kernel.org","Cc":"davem@davemloft.net, netdev@vger.kernel.org","Subject":"[PATCH 19/47] netfilter: nft_set_rbtree: use seqcount to avoid lock\n\tin most cases","Date":"Mon,  4 Sep 2017 00:27:19 +0200","Message-Id":"<1504477667-12130-10-git-send-email-pablo@netfilter.org>","X-Mailer":"git-send-email 2.1.4","In-Reply-To":"<1504477667-12130-1-git-send-email-pablo@netfilter.org>","References":"<1504477667-12130-1-git-send-email-pablo@netfilter.org>","X-Virus-Scanned":"ClamAV using ClamSMTP","Sender":"netfilter-devel-owner@vger.kernel.org","Precedence":"bulk","List-ID":"<netfilter-devel.vger.kernel.org>","X-Mailing-List":"netfilter-devel@vger.kernel.org"},"content":"From: Florian Westphal <fw@strlen.de>\n\nswitch to lockless lockup. write side now also increments sequence\ncounter.  On lookup, sample counter value and only take the lock\nif we did not find a match and the counter has changed.\n\nThis avoids need to write to private area in normal (lookup) cases.\n\nIn case we detect a writer (seqretry is true) we fall back to taking\nthe readlock.\n\nThe readlock is also used during dumps to ensure we get a consistent\ntree walk.\n\nSimilar technique (rbtree+seqlock) was used by David Howells in rxrpc.\n\nSigned-off-by: Florian Westphal <fw@strlen.de>\nSigned-off-by: Pablo Neira Ayuso <pablo@netfilter.org>\n---\n net/netfilter/nft_set_rbtree.c | 49 +++++++++++++++++++++++++++++++-----------\n 1 file changed, 37 insertions(+), 12 deletions(-)","diff":"diff --git a/net/netfilter/nft_set_rbtree.c b/net/netfilter/nft_set_rbtree.c\nindex bce5382f1d49..d83a4ec5900d 100644\n--- a/net/netfilter/nft_set_rbtree.c\n+++ b/net/netfilter/nft_set_rbtree.c\n@@ -19,8 +19,9 @@\n #include <net/netfilter/nf_tables.h>\n \n struct nft_rbtree {\n-\trwlock_t\t\tlock;\n \tstruct rb_root\t\troot;\n+\trwlock_t\t\tlock;\n+\tseqcount_t\t\tcount;\n };\n \n struct nft_rbtree_elem {\n@@ -40,8 +41,9 @@ static bool nft_rbtree_equal(const struct nft_set *set, const void *this,\n \treturn memcmp(this, nft_set_ext_key(&interval->ext), set->klen) == 0;\n }\n \n-static bool nft_rbtree_lookup(const struct net *net, const struct nft_set *set,\n-\t\t\t      const u32 *key, const struct nft_set_ext **ext)\n+static bool __nft_rbtree_lookup(const struct net *net, const struct nft_set *set,\n+\t\t\t\tconst u32 *key, const struct nft_set_ext **ext,\n+\t\t\t\tunsigned int seq)\n {\n \tstruct nft_rbtree *priv = nft_set_priv(set);\n \tconst struct nft_rbtree_elem *rbe, *interval = NULL;\n@@ -50,15 +52,17 @@ static bool nft_rbtree_lookup(const struct net *net, const struct nft_set *set,\n \tconst void *this;\n \tint d;\n \n-\tread_lock_bh(&priv->lock);\n-\tparent = priv->root.rb_node;\n+\tparent = rcu_dereference_raw(priv->root.rb_node);\n \twhile (parent != NULL) {\n+\t\tif (read_seqcount_retry(&priv->count, seq))\n+\t\t\treturn false;\n+\n \t\trbe = rb_entry(parent, struct nft_rbtree_elem, node);\n \n \t\tthis = nft_set_ext_key(&rbe->ext);\n \t\td = memcmp(this, key, set->klen);\n \t\tif (d < 0) {\n-\t\t\tparent = parent->rb_left;\n+\t\t\tparent = rcu_dereference_raw(parent->rb_left);\n \t\t\tif (interval &&\n \t\t\t    nft_rbtree_equal(set, this, interval) &&\n \t\t\t    nft_rbtree_interval_end(this) &&\n@@ -66,15 +70,14 @@ static bool nft_rbtree_lookup(const struct net *net, const struct nft_set *set,\n \t\t\t\tcontinue;\n \t\t\tinterval = rbe;\n \t\t} else if (d > 0)\n-\t\t\tparent = parent->rb_right;\n+\t\t\tparent = rcu_dereference_raw(parent->rb_right);\n \t\telse {\n \t\t\tif (!nft_set_elem_active(&rbe->ext, genmask)) {\n-\t\t\t\tparent = parent->rb_left;\n+\t\t\t\tparent = rcu_dereference_raw(parent->rb_left);\n \t\t\t\tcontinue;\n \t\t\t}\n \t\t\tif (nft_rbtree_interval_end(rbe))\n \t\t\t\tgoto out;\n-\t\t\tread_unlock_bh(&priv->lock);\n \n \t\t\t*ext = &rbe->ext;\n \t\t\treturn true;\n@@ -84,15 +87,32 @@ static bool nft_rbtree_lookup(const struct net *net, const struct nft_set *set,\n \tif (set->flags & NFT_SET_INTERVAL && interval != NULL &&\n \t    nft_set_elem_active(&interval->ext, genmask) &&\n \t    !nft_rbtree_interval_end(interval)) {\n-\t\tread_unlock_bh(&priv->lock);\n \t\t*ext = &interval->ext;\n \t\treturn true;\n \t}\n out:\n-\tread_unlock_bh(&priv->lock);\n \treturn false;\n }\n \n+static bool nft_rbtree_lookup(const struct net *net, const struct nft_set *set,\n+\t\t\t      const u32 *key, const struct nft_set_ext **ext)\n+{\n+\tstruct nft_rbtree *priv = nft_set_priv(set);\n+\tunsigned int seq = read_seqcount_begin(&priv->count);\n+\tbool ret;\n+\n+\tret = __nft_rbtree_lookup(net, set, key, ext, seq);\n+\tif (ret || !read_seqcount_retry(&priv->count, seq))\n+\t\treturn ret;\n+\n+\tread_lock_bh(&priv->lock);\n+\tseq = read_seqcount_begin(&priv->count);\n+\tret = __nft_rbtree_lookup(net, set, key, ext, seq);\n+\tread_unlock_bh(&priv->lock);\n+\n+\treturn ret;\n+}\n+\n static int __nft_rbtree_insert(const struct net *net, const struct nft_set *set,\n \t\t\t       struct nft_rbtree_elem *new,\n \t\t\t       struct nft_set_ext **ext)\n@@ -130,7 +150,7 @@ static int __nft_rbtree_insert(const struct net *net, const struct nft_set *set,\n \t\t\t}\n \t\t}\n \t}\n-\trb_link_node(&new->node, parent, p);\n+\trb_link_node_rcu(&new->node, parent, p);\n \trb_insert_color(&new->node, &priv->root);\n \treturn 0;\n }\n@@ -144,7 +164,9 @@ static int nft_rbtree_insert(const struct net *net, const struct nft_set *set,\n \tint err;\n \n \twrite_lock_bh(&priv->lock);\n+\twrite_seqcount_begin(&priv->count);\n \terr = __nft_rbtree_insert(net, set, rbe, ext);\n+\twrite_seqcount_end(&priv->count);\n \twrite_unlock_bh(&priv->lock);\n \n \treturn err;\n@@ -158,7 +180,9 @@ static void nft_rbtree_remove(const struct net *net,\n \tstruct nft_rbtree_elem *rbe = elem->priv;\n \n \twrite_lock_bh(&priv->lock);\n+\twrite_seqcount_begin(&priv->count);\n \trb_erase(&rbe->node, &priv->root);\n+\twrite_seqcount_end(&priv->count);\n \twrite_unlock_bh(&priv->lock);\n }\n \n@@ -264,6 +288,7 @@ static int nft_rbtree_init(const struct nft_set *set,\n \tstruct nft_rbtree *priv = nft_set_priv(set);\n \n \trwlock_init(&priv->lock);\n+\tseqcount_init(&priv->count);\n \tpriv->root = RB_ROOT;\n \treturn 0;\n }\n","prefixes":["19/47"]}