From patchwork Thu Jan 14 17:03:56 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Sasha Levin X-Patchwork-Id: 567603 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id AB9E0140662 for ; Fri, 15 Jan 2016 04:05:04 +1100 (AEDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754562AbcANREs (ORCPT ); Thu, 14 Jan 2016 12:04:48 -0500 Received: from userp1040.oracle.com ([156.151.31.81]:18385 "EHLO userp1040.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753212AbcANREq (ORCPT ); Thu, 14 Jan 2016 12:04:46 -0500 Received: from aserv0022.oracle.com (aserv0022.oracle.com [141.146.126.234]) by userp1040.oracle.com (Sentrion-MTA-4.3.2/Sentrion-MTA-4.3.2) with ESMTP id u0EH44lA029475 (version=TLSv1 cipher=DHE-RSA-AES256-SHA bits=256 verify=FAIL); Thu, 14 Jan 2016 17:04:05 GMT Received: from aserv0122.oracle.com (aserv0122.oracle.com [141.146.126.236]) by aserv0022.oracle.com (8.13.8/8.13.8) with ESMTP id u0EH41oH006585 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=FAIL); Thu, 14 Jan 2016 17:04:01 GMT Received: from abhmp0004.oracle.com (abhmp0004.oracle.com [141.146.116.10]) by aserv0122.oracle.com (8.13.8/8.13.8) with ESMTP id u0EH41Sd016681; Thu, 14 Jan 2016 17:04:01 GMT Received: from lappy.us.oracle.com (/10.154.115.201) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Thu, 14 Jan 2016 09:04:00 -0800 From: Sasha Levin To: pablo@netfilter.org, kaber@trash.net, kadlec@blackhole.kfki.hu, davem@davemloft.net Cc: netfilter-devel@vger.kernel.org, coreteam@netfilter.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, Sasha Levin Subject: [PATCH v2] netfilter: nf_conntrack: use safer way to lock all buckets Date: Thu, 14 Jan 2016 12:03:56 -0500 Message-Id: <1452791036-2161-1-git-send-email-sasha.levin@oracle.com> X-Mailer: git-send-email 2.5.0 X-Source-IP: aserv0022.oracle.com [141.146.126.234] Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org When we need to lock all buckets in the connection hashtable we'd attempt to lock 1024 spinlocks, which is way more preemption levels than supported by the kernel. Furthermore, this behavior was hidden by checking if lockdep is enabled, and if it was - use only 8 buckets(!). Fix this by using a global lock and synchronize all buckets on it when we need to lock them all. This is pretty heavyweight, but is only done when we need to resize the hashtable, and that doesn't happen often enough (or at all). Signed-off-by: Sasha Levin --- include/net/netfilter/nf_conntrack_core.h | 8 +++--- net/netfilter/nf_conntrack_core.c | 38 +++++++++++++++++++++-------- net/netfilter/nf_conntrack_helper.c | 2 +- net/netfilter/nf_conntrack_netlink.c | 2 +- net/netfilter/nfnetlink_cttimeout.c | 4 +-- 5 files changed, 35 insertions(+), 19 deletions(-) diff --git a/include/net/netfilter/nf_conntrack_core.h b/include/net/netfilter/nf_conntrack_core.h index 788ef58..62e17d1 100644 --- a/include/net/netfilter/nf_conntrack_core.h +++ b/include/net/netfilter/nf_conntrack_core.h @@ -79,12 +79,10 @@ print_tuple(struct seq_file *s, const struct nf_conntrack_tuple *tuple, const struct nf_conntrack_l3proto *l3proto, const struct nf_conntrack_l4proto *proto); -#ifdef CONFIG_LOCKDEP -# define CONNTRACK_LOCKS 8 -#else -# define CONNTRACK_LOCKS 1024 -#endif +#define CONNTRACK_LOCKS 1024 + extern spinlock_t nf_conntrack_locks[CONNTRACK_LOCKS]; +void nf_conntrack_lock(spinlock_t *lock); extern spinlock_t nf_conntrack_expect_lock; diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c index 3cb3cb8..4ccf5ad 100644 --- a/net/netfilter/nf_conntrack_core.c +++ b/net/netfilter/nf_conntrack_core.c @@ -66,6 +66,21 @@ EXPORT_SYMBOL_GPL(nf_conntrack_locks); __cacheline_aligned_in_smp DEFINE_SPINLOCK(nf_conntrack_expect_lock); EXPORT_SYMBOL_GPL(nf_conntrack_expect_lock); +spinlock_t nf_conntrack_locks_all_lock; +bool nf_conntrack_locks_all; + +void nf_conntrack_lock(spinlock_t *lock) +{ + spin_lock(lock); + while (unlikely(nf_conntrack_locks_all)) { + spin_unlock(lock); + spin_lock(&nf_conntrack_locks_all_lock); + spin_unlock(&nf_conntrack_locks_all_lock); + spin_lock(lock); + } +} +EXPORT_SYMBOL_GPL(nf_conntrack_lock); + static void nf_conntrack_double_unlock(unsigned int h1, unsigned int h2) { h1 %= CONNTRACK_LOCKS; @@ -82,12 +97,12 @@ static bool nf_conntrack_double_lock(struct net *net, unsigned int h1, h1 %= CONNTRACK_LOCKS; h2 %= CONNTRACK_LOCKS; if (h1 <= h2) { - spin_lock(&nf_conntrack_locks[h1]); + nf_conntrack_lock(&nf_conntrack_locks[h1]); if (h1 != h2) spin_lock_nested(&nf_conntrack_locks[h2], SINGLE_DEPTH_NESTING); } else { - spin_lock(&nf_conntrack_locks[h2]); + nf_conntrack_lock(&nf_conntrack_locks[h2]); spin_lock_nested(&nf_conntrack_locks[h1], SINGLE_DEPTH_NESTING); } @@ -102,16 +117,19 @@ static void nf_conntrack_all_lock(void) { int i; - for (i = 0; i < CONNTRACK_LOCKS; i++) - spin_lock_nested(&nf_conntrack_locks[i], i); + spin_lock(&nf_conntrack_locks_all_lock); + nf_conntrack_locks_all = true; + + for (i = 0; i < CONNTRACK_LOCKS; i++) { + spin_lock(&nf_conntrack_locks[i]); + spin_unlock(&nf_conntrack_locks[i]); + } } static void nf_conntrack_all_unlock(void) { - int i; - - for (i = 0; i < CONNTRACK_LOCKS; i++) - spin_unlock(&nf_conntrack_locks[i]); + nf_conntrack_locks_all = false; + spin_unlock(&nf_conntrack_locks_all_lock); } unsigned int nf_conntrack_htable_size __read_mostly; @@ -757,7 +775,7 @@ restart: hash = hash_bucket(_hash, net); for (; i < net->ct.htable_size; i++) { lockp = &nf_conntrack_locks[hash % CONNTRACK_LOCKS]; - spin_lock(lockp); + nf_conntrack_lock(lockp); if (read_seqcount_retry(&net->ct.generation, sequence)) { spin_unlock(lockp); goto restart; @@ -1382,7 +1400,7 @@ get_next_corpse(struct net *net, int (*iter)(struct nf_conn *i, void *data), for (; *bucket < net->ct.htable_size; (*bucket)++) { lockp = &nf_conntrack_locks[*bucket % CONNTRACK_LOCKS]; local_bh_disable(); - spin_lock(lockp); + nf_conntrack_lock(lockp); if (*bucket < net->ct.htable_size) { hlist_nulls_for_each_entry(h, n, &net->ct.hash[*bucket], hnnode) { if (NF_CT_DIRECTION(h) != IP_CT_DIR_ORIGINAL) diff --git a/net/netfilter/nf_conntrack_helper.c b/net/netfilter/nf_conntrack_helper.c index bd9d315..3b40ec5 100644 --- a/net/netfilter/nf_conntrack_helper.c +++ b/net/netfilter/nf_conntrack_helper.c @@ -425,7 +425,7 @@ static void __nf_conntrack_helper_unregister(struct nf_conntrack_helper *me, } local_bh_disable(); for (i = 0; i < net->ct.htable_size; i++) { - spin_lock(&nf_conntrack_locks[i % CONNTRACK_LOCKS]); + nf_conntrack_lock(&nf_conntrack_locks[i % CONNTRACK_LOCKS]); if (i < net->ct.htable_size) { hlist_nulls_for_each_entry(h, nn, &net->ct.hash[i], hnnode) unhelp(h, me); diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c index dbb1bb3..355e855 100644 --- a/net/netfilter/nf_conntrack_netlink.c +++ b/net/netfilter/nf_conntrack_netlink.c @@ -840,7 +840,7 @@ ctnetlink_dump_table(struct sk_buff *skb, struct netlink_callback *cb) for (; cb->args[0] < net->ct.htable_size; cb->args[0]++) { restart: lockp = &nf_conntrack_locks[cb->args[0] % CONNTRACK_LOCKS]; - spin_lock(lockp); + nf_conntrack_lock(lockp); if (cb->args[0] >= net->ct.htable_size) { spin_unlock(lockp); goto out; diff --git a/net/netfilter/nfnetlink_cttimeout.c b/net/netfilter/nfnetlink_cttimeout.c index 5d010f2..94837d2 100644 --- a/net/netfilter/nfnetlink_cttimeout.c +++ b/net/netfilter/nfnetlink_cttimeout.c @@ -307,12 +307,12 @@ static void ctnl_untimeout(struct net *net, struct ctnl_timeout *timeout) local_bh_disable(); for (i = 0; i < net->ct.htable_size; i++) { - spin_lock(&nf_conntrack_locks[i % CONNTRACK_LOCKS]); + nf_conntrack_lock(&nf_conntrack_locks[i % CONNTRACK_LOCKS]); if (i < net->ct.htable_size) { hlist_nulls_for_each_entry(h, nn, &net->ct.hash[i], hnnode) untimeout(h, timeout); } - spin_unlock(&nf_conntrack_locks[i % CONNTRACK_LOCKS]); + nf_conntrack_lock(&nf_conntrack_locks[i % CONNTRACK_LOCKS]); } local_bh_enable(); }