From patchwork Mon Mar 20 10:08:43 2017
X-Patchwork-Submitter: Pablo Neira Ayuso
X-Patchwork-Id: 740903
X-Patchwork-Delegate: davem@davemloft.net
From: Pablo Neira Ayuso
To: netfilter-devel@vger.kernel.org
Cc: davem@davemloft.net, netdev@vger.kernel.org
Subject: [PATCH 15/22] netfilter: nft_set_rbtree: use per-set rwlock to improve the scalability
Date: Mon, 20 Mar 2017 11:08:43 +0100
Message-Id: <1490004530-9128-16-git-send-email-pablo@netfilter.org>
In-Reply-To: <1490004530-9128-1-git-send-email-pablo@netfilter.org>
References: <1490004530-9128-1-git-send-email-pablo@netfilter.org>

From: Liping Zhang

Karel Rericha reported that in his test case, ICMP packets going
through his boxes normally had about 5ms latency. But when running
nft to list the sets with interval flags, latency would go up to
30-100ms. This was observed at router throughput between 600Mbps
and 2Gbps.

This happens because a single global spinlock protects all of the
rbtree sets, so "dumping sets" will inevitably contend with "key
lookup". But both of them are actually _readers_, so it is fine to
convert the spinlock to an rwlock and avoid contention between them.
Also use a per-set rwlock, since each set is independent.
Reported-by: Karel Rericha
Tested-by: Karel Rericha
Signed-off-by: Liping Zhang
Signed-off-by: Pablo Neira Ayuso
---
 net/netfilter/nft_set_rbtree.c | 31 ++++++++++++++++---------------
 1 file changed, 16 insertions(+), 15 deletions(-)

diff --git a/net/netfilter/nft_set_rbtree.c b/net/netfilter/nft_set_rbtree.c
index 78dfbf9588b3..e97e2fb53f0a 100644
--- a/net/netfilter/nft_set_rbtree.c
+++ b/net/netfilter/nft_set_rbtree.c
@@ -18,9 +18,8 @@
 #include <linux/netfilter/nf_tables.h>
 #include <net/netfilter/nf_tables.h>
 
-static DEFINE_SPINLOCK(nft_rbtree_lock);
-
 struct nft_rbtree {
+	rwlock_t	lock;
 	struct rb_root	root;
 };
 
@@ -44,14 +43,14 @@ static bool nft_rbtree_equal(const struct nft_set *set, const void *this,
 static bool nft_rbtree_lookup(const struct net *net, const struct nft_set *set,
 			      const u32 *key, const struct nft_set_ext **ext)
 {
-	const struct nft_rbtree *priv = nft_set_priv(set);
+	struct nft_rbtree *priv = nft_set_priv(set);
 	const struct nft_rbtree_elem *rbe, *interval = NULL;
 	u8 genmask = nft_genmask_cur(net);
 	const struct rb_node *parent;
 	const void *this;
 	int d;
 
-	spin_lock_bh(&nft_rbtree_lock);
+	read_lock_bh(&priv->lock);
 	parent = priv->root.rb_node;
 	while (parent != NULL) {
 		rbe = rb_entry(parent, struct nft_rbtree_elem, node);
@@ -75,7 +74,7 @@ static bool nft_rbtree_lookup(const struct net *net, const struct nft_set *set,
 			}
 			if (nft_rbtree_interval_end(rbe))
 				goto out;
-			spin_unlock_bh(&nft_rbtree_lock);
+			read_unlock_bh(&priv->lock);
 
 			*ext = &rbe->ext;
 			return true;
@@ -85,12 +84,12 @@ static bool nft_rbtree_lookup(const struct net *net, const struct nft_set *set,
 	if (set->flags & NFT_SET_INTERVAL && interval != NULL &&
 	    nft_set_elem_active(&interval->ext, genmask) &&
 	    !nft_rbtree_interval_end(interval)) {
-		spin_unlock_bh(&nft_rbtree_lock);
+		read_unlock_bh(&priv->lock);
 		*ext = &interval->ext;
 		return true;
 	}
 out:
-	spin_unlock_bh(&nft_rbtree_lock);
+	read_unlock_bh(&priv->lock);
 	return false;
 }
 
@@ -140,12 +139,13 @@ static int nft_rbtree_insert(const struct net *net, const struct nft_set *set,
 			     const struct nft_set_elem *elem,
 			     struct nft_set_ext **ext)
 {
+	struct nft_rbtree *priv = nft_set_priv(set);
 	struct nft_rbtree_elem *rbe = elem->priv;
 	int err;
 
-	spin_lock_bh(&nft_rbtree_lock);
+	write_lock_bh(&priv->lock);
 	err = __nft_rbtree_insert(net, set, rbe, ext);
-	spin_unlock_bh(&nft_rbtree_lock);
+	write_unlock_bh(&priv->lock);
 
 	return err;
 }
@@ -157,9 +157,9 @@ static void nft_rbtree_remove(const struct net *net,
 	struct nft_rbtree *priv = nft_set_priv(set);
 	struct nft_rbtree_elem *rbe = elem->priv;
 
-	spin_lock_bh(&nft_rbtree_lock);
+	write_lock_bh(&priv->lock);
 	rb_erase(&rbe->node, &priv->root);
-	spin_unlock_bh(&nft_rbtree_lock);
+	write_unlock_bh(&priv->lock);
 }
 
 static void nft_rbtree_activate(const struct net *net,
@@ -224,12 +224,12 @@ static void nft_rbtree_walk(const struct nft_ctx *ctx,
 			    struct nft_set *set,
 			    struct nft_set_iter *iter)
 {
-	const struct nft_rbtree *priv = nft_set_priv(set);
+	struct nft_rbtree *priv = nft_set_priv(set);
 	struct nft_rbtree_elem *rbe;
 	struct nft_set_elem elem;
 	struct rb_node *node;
 
-	spin_lock_bh(&nft_rbtree_lock);
+	read_lock_bh(&priv->lock);
 	for (node = rb_first(&priv->root); node != NULL; node = rb_next(node)) {
 		rbe = rb_entry(node, struct nft_rbtree_elem, node);
 
@@ -242,13 +242,13 @@ static void nft_rbtree_walk(const struct nft_ctx *ctx,
 
 		iter->err = iter->fn(ctx, set, iter, &elem);
 		if (iter->err < 0) {
-			spin_unlock_bh(&nft_rbtree_lock);
+			read_unlock_bh(&priv->lock);
 			return;
 		}
 cont:
 		iter->count++;
 	}
-	spin_unlock_bh(&nft_rbtree_lock);
+	read_unlock_bh(&priv->lock);
 }
 
 static unsigned int nft_rbtree_privsize(const struct nlattr * const nla[])
@@ -262,6 +262,7 @@ static int nft_rbtree_init(const struct nft_set *set,
 {
 	struct nft_rbtree *priv = nft_set_priv(set);
 
+	rwlock_init(&priv->lock);
 	priv->root = RB_ROOT;
 	return 0;
 }
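
As a quick illustration of the reader/writer pattern the patch adopts,
here is a minimal userspace sketch using POSIX rwlocks in place of the
kernel's rwlock_t. It is an analogue, not kernel code: the kernel must
use the read_lock_bh()/write_lock_bh() variants shown in the diff, since
the lookup path can run from packet processing in softirq context. All
names below (demo_set, demo_lookup, demo_insert) are illustrative and do
not appear in the kernel source. Build with e.g. "cc -pthread demo.c".

/* Minimal sketch of a per-set rwlock, assuming only POSIX threads. */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

struct demo_set {
	pthread_rwlock_t lock;	/* per-set lock, like nft_rbtree's rwlock_t */
	int key;		/* stand-in for the rbtree payload */
};

/* Readers (lookup, dump/walk) may hold the lock concurrently. */
static bool demo_lookup(struct demo_set *s, int key)
{
	bool found;

	pthread_rwlock_rdlock(&s->lock);	/* ~ read_lock_bh() */
	found = (s->key == key);
	pthread_rwlock_unlock(&s->lock);	/* ~ read_unlock_bh() */
	return found;
}

/* Writers (insert, remove) get exclusive access. */
static void demo_insert(struct demo_set *s, int key)
{
	pthread_rwlock_wrlock(&s->lock);	/* ~ write_lock_bh() */
	s->key = key;
	pthread_rwlock_unlock(&s->lock);	/* ~ write_unlock_bh() */
}

int main(void)
{
	struct demo_set s;

	pthread_rwlock_init(&s.lock, NULL);	/* ~ rwlock_init() in nft_rbtree_init() */
	demo_insert(&s, 42);
	printf("lookup(42) = %d\n", demo_lookup(&s, 42));
	pthread_rwlock_destroy(&s.lock);
	return 0;
}

The rwlock pays off here because readers (per-packet lookups and set
dumps) vastly outnumber writers (control-plane inserts and removals),
and making the lock per-set confines writer interference to the one set
actually being modified.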