From patchwork Sun Sep 3 22:27:19 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Pablo Neira Ayuso X-Patchwork-Id: 809398 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 3xlngY36Mkz9s06 for ; Mon, 4 Sep 2017 08:28:25 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753285AbdICW2X (ORCPT ); Sun, 3 Sep 2017 18:28:23 -0400 Received: from mail.us.es ([193.147.175.20]:51230 "EHLO mail.us.es" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753269AbdICW2U (ORCPT ); Sun, 3 Sep 2017 18:28:20 -0400 Received: from antivirus1-rhel7.int (unknown [192.168.2.11]) by mail.us.es (Postfix) with ESMTP id 7E2D9190F61 for ; Mon, 4 Sep 2017 00:27:53 +0200 (CEST) Received: from antivirus1-rhel7.int (localhost [127.0.0.1]) by antivirus1-rhel7.int (Postfix) with ESMTP id 6F629B502D for ; Mon, 4 Sep 2017 00:27:53 +0200 (CEST) Received: by antivirus1-rhel7.int (Postfix, from userid 99) id 653E4B5028; Mon, 4 Sep 2017 00:27:53 +0200 (CEST) X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on antivirus1-rhel7.int X-Spam-Level: X-Spam-Status: No, score=-108.2 required=7.5 tests=ALL_TRUSTED,BAYES_50, SMTPAUTH_US2,USER_IN_WHITELIST autolearn=disabled version=3.4.1 Received: from antivirus1-rhel7.int (localhost [127.0.0.1]) by antivirus1-rhel7.int (Postfix) with ESMTP id 51157B502D; Mon, 4 Sep 2017 00:27:51 +0200 (CEST) Received: from 192.168.1.97 (192.168.1.97) by antivirus1-rhel7.int (F-Secure/fsigk_smtp/550/antivirus1-rhel7.int); Mon, 04 Sep 2017 00:27:51 +0200 (CEST) X-Virus-Status: clean(F-Secure/fsigk_smtp/550/antivirus1-rhel7.int) Received: from salvia.here (unknown [31.4.193.113]) (Authenticated sender: pneira@us.es) by entrada.int (Postfix) with ESMTPA id E35F74265A20; Mon, 4 Sep 2017 00:27:50 +0200 (CEST) X-SMTPAUTHUS: auth mail.us.es From: Pablo Neira Ayuso To: netfilter-devel@vger.kernel.org Cc: davem@davemloft.net, netdev@vger.kernel.org Subject: [PATCH 19/47] netfilter: nft_set_rbtree: use seqcount to avoid lock in most cases Date: Mon, 4 Sep 2017 00:27:19 +0200 Message-Id: <1504477667-12130-10-git-send-email-pablo@netfilter.org> X-Mailer: git-send-email 2.1.4 In-Reply-To: <1504477667-12130-1-git-send-email-pablo@netfilter.org> References: <1504477667-12130-1-git-send-email-pablo@netfilter.org> X-Virus-Scanned: ClamAV using ClamSMTP Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org From: Florian Westphal switch to lockless lockup. write side now also increments sequence counter. On lookup, sample counter value and only take the lock if we did not find a match and the counter has changed. This avoids need to write to private area in normal (lookup) cases. In case we detect a writer (seqretry is true) we fall back to taking the readlock. The readlock is also used during dumps to ensure we get a consistent tree walk. Similar technique (rbtree+seqlock) was used by David Howells in rxrpc. Signed-off-by: Florian Westphal Signed-off-by: Pablo Neira Ayuso --- net/netfilter/nft_set_rbtree.c | 49 +++++++++++++++++++++++++++++++----------- 1 file changed, 37 insertions(+), 12 deletions(-) diff --git a/net/netfilter/nft_set_rbtree.c b/net/netfilter/nft_set_rbtree.c index bce5382f1d49..d83a4ec5900d 100644 --- a/net/netfilter/nft_set_rbtree.c +++ b/net/netfilter/nft_set_rbtree.c @@ -19,8 +19,9 @@ #include struct nft_rbtree { - rwlock_t lock; struct rb_root root; + rwlock_t lock; + seqcount_t count; }; struct nft_rbtree_elem { @@ -40,8 +41,9 @@ static bool nft_rbtree_equal(const struct nft_set *set, const void *this, return memcmp(this, nft_set_ext_key(&interval->ext), set->klen) == 0; } -static bool nft_rbtree_lookup(const struct net *net, const struct nft_set *set, - const u32 *key, const struct nft_set_ext **ext) +static bool __nft_rbtree_lookup(const struct net *net, const struct nft_set *set, + const u32 *key, const struct nft_set_ext **ext, + unsigned int seq) { struct nft_rbtree *priv = nft_set_priv(set); const struct nft_rbtree_elem *rbe, *interval = NULL; @@ -50,15 +52,17 @@ static bool nft_rbtree_lookup(const struct net *net, const struct nft_set *set, const void *this; int d; - read_lock_bh(&priv->lock); - parent = priv->root.rb_node; + parent = rcu_dereference_raw(priv->root.rb_node); while (parent != NULL) { + if (read_seqcount_retry(&priv->count, seq)) + return false; + rbe = rb_entry(parent, struct nft_rbtree_elem, node); this = nft_set_ext_key(&rbe->ext); d = memcmp(this, key, set->klen); if (d < 0) { - parent = parent->rb_left; + parent = rcu_dereference_raw(parent->rb_left); if (interval && nft_rbtree_equal(set, this, interval) && nft_rbtree_interval_end(this) && @@ -66,15 +70,14 @@ static bool nft_rbtree_lookup(const struct net *net, const struct nft_set *set, continue; interval = rbe; } else if (d > 0) - parent = parent->rb_right; + parent = rcu_dereference_raw(parent->rb_right); else { if (!nft_set_elem_active(&rbe->ext, genmask)) { - parent = parent->rb_left; + parent = rcu_dereference_raw(parent->rb_left); continue; } if (nft_rbtree_interval_end(rbe)) goto out; - read_unlock_bh(&priv->lock); *ext = &rbe->ext; return true; @@ -84,15 +87,32 @@ static bool nft_rbtree_lookup(const struct net *net, const struct nft_set *set, if (set->flags & NFT_SET_INTERVAL && interval != NULL && nft_set_elem_active(&interval->ext, genmask) && !nft_rbtree_interval_end(interval)) { - read_unlock_bh(&priv->lock); *ext = &interval->ext; return true; } out: - read_unlock_bh(&priv->lock); return false; } +static bool nft_rbtree_lookup(const struct net *net, const struct nft_set *set, + const u32 *key, const struct nft_set_ext **ext) +{ + struct nft_rbtree *priv = nft_set_priv(set); + unsigned int seq = read_seqcount_begin(&priv->count); + bool ret; + + ret = __nft_rbtree_lookup(net, set, key, ext, seq); + if (ret || !read_seqcount_retry(&priv->count, seq)) + return ret; + + read_lock_bh(&priv->lock); + seq = read_seqcount_begin(&priv->count); + ret = __nft_rbtree_lookup(net, set, key, ext, seq); + read_unlock_bh(&priv->lock); + + return ret; +} + static int __nft_rbtree_insert(const struct net *net, const struct nft_set *set, struct nft_rbtree_elem *new, struct nft_set_ext **ext) @@ -130,7 +150,7 @@ static int __nft_rbtree_insert(const struct net *net, const struct nft_set *set, } } } - rb_link_node(&new->node, parent, p); + rb_link_node_rcu(&new->node, parent, p); rb_insert_color(&new->node, &priv->root); return 0; } @@ -144,7 +164,9 @@ static int nft_rbtree_insert(const struct net *net, const struct nft_set *set, int err; write_lock_bh(&priv->lock); + write_seqcount_begin(&priv->count); err = __nft_rbtree_insert(net, set, rbe, ext); + write_seqcount_end(&priv->count); write_unlock_bh(&priv->lock); return err; @@ -158,7 +180,9 @@ static void nft_rbtree_remove(const struct net *net, struct nft_rbtree_elem *rbe = elem->priv; write_lock_bh(&priv->lock); + write_seqcount_begin(&priv->count); rb_erase(&rbe->node, &priv->root); + write_seqcount_end(&priv->count); write_unlock_bh(&priv->lock); } @@ -264,6 +288,7 @@ static int nft_rbtree_init(const struct nft_set *set, struct nft_rbtree *priv = nft_set_priv(set); rwlock_init(&priv->lock); + seqcount_init(&priv->count); priv->root = RB_ROOT; return 0; }