From patchwork Thu Apr 28 17:13:41 2016
X-Patchwork-Submitter: Florian Westphal
X-Patchwork-Id: 616345
X-Patchwork-Delegate: pablo@netfilter.org
From: Florian Westphal
To:
Cc: netdev@vger.kernel.org, Florian Westphal
Subject: [PATCH nf-next 2/9] netfilter: conntrack: fix lookup race during hash resize
Date: Thu, 28 Apr 2016 19:13:41 +0200
Message-Id: <1461863628-23350-3-git-send-email-fw@strlen.de>
X-Mailer: git-send-email 2.7.3
In-Reply-To: <1461863628-23350-1-git-send-email-fw@strlen.de>
References: <1461863628-23350-1-git-send-email-fw@strlen.de>
X-Mailing-List: netfilter-devel@vger.kernel.org

When resizing the conntrack hash table at runtime via
echo 42 > /sys/module/nf_conntrack/parameters/hashsize, we are racing with
the conntrack lookup path -- reads can happen in parallel and nothing
prevents readers from observing the newly allocated hash but the old size
(or vice versa).

So access to hash[bucket] can trigger an OOB read in case the table got
expanded and we saw the new size but the old hash pointer (or it got
shrunk and we got the new hash pointer but the size of the old, larger
table):

kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] SMP KASAN
CPU: 0 PID: 3 Comm: ksoftirqd/0 Not tainted 4.6.0-rc2+ #107
[..]
Call Trace:
 [] ? nf_conntrack_tuple_taken+0x12a/0xe90
 [] ? nf_ct_invert_tuplepr+0x221/0x3a0
 [] get_unique_tuple+0xfb3/0x2760

Use the generation counter to obtain the address and length of the same
table.

Also add a synchronize_net() before freeing the old hash.
AFAICS, without it we might access ct_hash[bucket] after ct_hash has been
freed, provided that the lockless reader got delayed by another event:

CPU1                        CPU2
seq_begin
seq_retry
                            resize occurs
                            free oldhash
for_each(oldhash[size])

Note that resize is only supported in init_netns; it took over 2 minutes
of constant resizing+flooding to produce the warning, so this isn't a big
problem in practice.
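For readers not familiar with the pattern, the retry loop this patch adds can
be modelled in plain C11 outside the kernel. This is only an illustrative
sketch under the assumption of a single writer (the resize path runs under
locks); struct table, snapshot_table() and replace_table() are names invented
for this example, and the kernel uses seqcount_t (nf_conntrack_generation)
plus RCU rather than C11 atomics:

/* Stand-alone userspace model of the pointer+size snapshot retry loop.
 * Single writer assumed; readers retry until both fields come from the
 * same (even) generation.
 */
#include <stdatomic.h>
#include <stddef.h>

struct bucket { int dummy; };

struct table {
	_Atomic(struct bucket *) hash;	/* like net->ct.hash */
	_Atomic size_t size;		/* like the hashsize parameter */
	atomic_uint gen;		/* like nf_conntrack_generation */
};

/* Reader: retry until hash pointer and size belong to the same generation. */
static void snapshot_table(struct table *t, struct bucket **hash, size_t *size)
{
	unsigned int seq0, seq1;

	do {
		seq0 = atomic_load_explicit(&t->gen, memory_order_acquire);
		*hash = atomic_load_explicit(&t->hash, memory_order_relaxed);
		*size = atomic_load_explicit(&t->size, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
		seq1 = atomic_load_explicit(&t->gen, memory_order_relaxed);
	} while (seq0 != seq1 || (seq0 & 1));	/* odd: resize in progress */
}

/* Writer: keep the counter odd while both fields are being replaced. */
static void replace_table(struct table *t, struct bucket *new_hash,
			  size_t new_size)
{
	unsigned int seq = atomic_load_explicit(&t->gen, memory_order_relaxed);

	atomic_store_explicit(&t->gen, seq + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&t->hash, new_hash, memory_order_relaxed);
	atomic_store_explicit(&t->size, new_size, memory_order_relaxed);
	atomic_store_explicit(&t->gen, seq + 2, memory_order_release);
}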
Signed-off-by: Florian Westphal
---
 net/netfilter/nf_conntrack_core.c | 24 +++++++++++++++++++-----
 1 file changed, 19 insertions(+), 5 deletions(-)

diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 1b63359..29fa08b 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -469,11 +469,18 @@ ____nf_conntrack_find(struct net *net, const struct nf_conntrack_zone *zone,
 		      const struct nf_conntrack_tuple *tuple, u32 hash)
 {
 	struct nf_conntrack_tuple_hash *h;
+	struct hlist_nulls_head *ct_hash;
 	struct hlist_nulls_node *n;
-	unsigned int bucket = hash_bucket(hash, net);
+	unsigned int bucket, sequence;
 
 begin:
-	hlist_nulls_for_each_entry_rcu(h, n, &net->ct.hash[bucket], hnnode) {
+	do {
+		sequence = read_seqcount_begin(&nf_conntrack_generation);
+		bucket = hash_bucket(hash, net);
+		ct_hash = net->ct.hash;
+	} while (read_seqcount_retry(&nf_conntrack_generation, sequence));
+
+	hlist_nulls_for_each_entry_rcu(h, n, &ct_hash[bucket], hnnode) {
 		if (nf_ct_key_equal(h, tuple, zone)) {
 			NF_CT_STAT_INC_ATOMIC(net, found);
 			return h;
@@ -722,15 +729,21 @@ nf_conntrack_tuple_taken(const struct nf_conntrack_tuple *tuple,
 	struct net *net = nf_ct_net(ignored_conntrack);
 	const struct nf_conntrack_zone *zone;
 	struct nf_conntrack_tuple_hash *h;
+	struct hlist_nulls_head *ct_hash;
+	unsigned int hash, sequence;
 	struct hlist_nulls_node *n;
 	struct nf_conn *ct;
-	unsigned int hash;
 
 	zone = nf_ct_zone(ignored_conntrack);
-	hash = hash_conntrack(net, tuple);
 
 	rcu_read_lock();
-	hlist_nulls_for_each_entry_rcu(h, n, &net->ct.hash[hash], hnnode) {
+	do {
+		sequence = read_seqcount_begin(&nf_conntrack_generation);
+		hash = hash_conntrack(net, tuple);
+		ct_hash = net->ct.hash;
+	} while (read_seqcount_retry(&nf_conntrack_generation, sequence));
+
+	hlist_nulls_for_each_entry_rcu(h, n, &ct_hash[hash], hnnode) {
 		ct = nf_ct_tuplehash_to_ctrack(h);
 		if (ct != ignored_conntrack &&
 		    nf_ct_tuple_equal(tuple, &h->tuple) &&
@@ -1607,6 +1620,7 @@ int nf_conntrack_set_hashsize(const char *val, struct kernel_param *kp)
 	nf_conntrack_all_unlock();
 	local_bh_enable();
 
+	synchronize_net();
 	nf_ct_free_hashtable(old_hash, old_size);
 	return 0;
 }
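As a side note on the synchronize_net() added at the end: the same ordering
can be sketched in userspace with liburcu's classic API. Everything below is
an illustrative model only; lookup(), resize_table() and hash_table are
invented names, synchronize_rcu() stands in for synchronize_net(), and each
reader thread would additionally need rcu_register_thread() (omitted here):

#include <urcu.h>		/* liburcu: rcu_read_lock(), synchronize_rcu(), ... */
#include <stdlib.h>

struct bucket { int chain_len; };

static struct bucket *hash_table;	/* stands in for net->ct.hash */

/* Lockless reader, in the spirit of ____nf_conntrack_find(): the table it
 * snapshotted stays valid for as long as the read-side section is open.
 */
static void lookup(unsigned int b)
{
	struct bucket *h;

	rcu_read_lock();
	h = rcu_dereference(hash_table);
	if (h)
		(void)h[b].chain_len;	/* walk the chain in h[b] ... */
	rcu_read_unlock();
}

/* Resize path, in the spirit of nf_conntrack_set_hashsize(): publish the new
 * table, wait out every reader that may still hold the old pointer, and only
 * then free the old table.
 */
static void resize_table(size_t new_size)
{
	struct bucket *old = hash_table;
	struct bucket *new = calloc(new_size, sizeof(*new));

	if (!new)
		return;
	/* rehashing of the old entries is omitted in this sketch */
	rcu_assign_pointer(hash_table, new);

	synchronize_rcu();	/* readers of 'old' are guaranteed to be done */
	free(old);		/* ... so freeing it here cannot race with them */
}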