From patchwork Sat Jul 2 10:59:25 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Liping Zhang X-Patchwork-Id: 643456 X-Patchwork-Delegate: pablo@netfilter.org Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 3rhVg16nbWz9s9G for ; Sat, 2 Jul 2016 21:00:13 +1000 (AEST) Authentication-Results: ozlabs.org; dkim=fail reason="signature verification failed" (1024-bit key; unprotected) header.d=163.com header.i=@163.com header.b=nmXzfMHi; dkim-atps=neutral Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751213AbcGBLAM (ORCPT ); Sat, 2 Jul 2016 07:00:12 -0400 Received: from m12-16.163.com ([220.181.12.16]:58573 "EHLO m12-16.163.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751128AbcGBLAJ (ORCPT ); Sat, 2 Jul 2016 07:00:09 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=163.com; s=s110527; h=From:Subject:Date:Message-Id:MIME-Version; bh=lXp/E P1Z7xx4MtvUTxiygAdwfsncivcHkYqwK8ojxxs=; b=nmXzfMHiviXbOpXMPwZVM Gc/IfjKoiC9+wm7HEPR2kUVcpwIyMspnybjklTFBDflLaHs21El1xmFtwcqAVUR0 AbbCAfJ+4ERMTNdPbC42uNp8Jn4J+50oZoMQTLhvcHo9l7KRAFTvL+pwNOYC0877 k2YFs8Bk7PbKQx88JWwhVY= Received: from MiWiFi-R2D-srv.localdomain (unknown [180.170.252.41]) by smtp12 (Coremail) with SMTP id EMCowABXmaWnnndXXtNsAg--.15306S3; Sat, 02 Jul 2016 19:00:01 +0800 (CST) From: Liping Zhang To: pablo@netfilter.org Cc: netfilter-devel@vger.kernel.org, Liping Zhang Subject: [PATCH nf 1/3] netfilter: conntrack: fix race between nf_conntrack proc read and hash resize Date: Sat, 2 Jul 2016 18:59:25 +0800 Message-Id: <1467457167-5363-2-git-send-email-zlpnobody@163.com> X-Mailer: git-send-email 2.5.5 In-Reply-To: <1467457167-5363-1-git-send-email-zlpnobody@163.com> References: <1467457167-5363-1-git-send-email-zlpnobody@163.com> MIME-Version: 1.0 X-CM-TRANSID: EMCowABXmaWnnndXXtNsAg--.15306S3 X-Coremail-Antispam: 1Uf129KBjvJXoW3WFWxZF4xKry7Xw1kuFW3ZFb_yoWxtF1fpF nYk343Gry8Gr4YyF4jq34kArnxAws3uaySgry3Cr95Cwsrtw4fCFWfKry2vF98JrWkJr47 Zr4j9w15CFWrtaUanT9S1TB71UUUUU7qnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDUYxBIdaVFxhVjvjDU0xZFpf9x07jfku7UUUUU= X-Originating-IP: [180.170.252.41] X-CM-SenderInfo: x2os00perg5qqrwthudrp/1tbiQBGal1SIPjr+KAAAsS Sender: netfilter-devel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netfilter-devel@vger.kernel.org From: Liping Zhang When we do "cat /proc/net/nf_conntrack", and meanwhile resize the conntrack hash table via /sys/module/nf_conntrack/parameters/hashsize, race will happen, because reader can observe a newly allocated hash but the old size (or vice versa). So oops will happen like follows: BUG: unable to handle kernel NULL pointer dereference at 0000000000000017 IP: [] seq_print_acct+0x11/0x50 [nf_conntrack] Call Trace: [] ? ct_seq_show+0x14e/0x340 [nf_conntrack] [] seq_read+0x2cc/0x390 [] proc_reg_read+0x42/0x70 [] __vfs_read+0x37/0x130 [] ? security_file_permission+0xa0/0xc0 [] vfs_read+0x95/0x140 [] SyS_read+0x55/0xc0 [] entry_SYSCALL_64_fastpath+0x1a/0xa4 It is very easy to reproduce this kernel crash. 1. open one shell and input the following cmds: while : ; do echo $RANDOM > hashsize done 2. open more shells and input the following cmds: while : ; do cat /proc/net/nf_conntrack done 3. just wait a monent, oops will happen soon. The solution in this patch is based on Florian's Commit 5e3c61f98175 ("netfilter: conntrack: fix lookup race during hash resize"). Signed-off-by: Liping Zhang --- include/net/netfilter/nf_conntrack_core.h | 1 + .../netfilter/nf_conntrack_l3proto_ipv4_compat.c | 20 ++++++++++++++++---- net/netfilter/nf_conntrack_core.c | 4 +++- net/netfilter/nf_conntrack_standalone.c | 20 +++++++++++++++----- 4 files changed, 35 insertions(+), 10 deletions(-) diff --git a/include/net/netfilter/nf_conntrack_core.h b/include/net/netfilter/nf_conntrack_core.h index 3e2f332..4f6453a 100644 --- a/include/net/netfilter/nf_conntrack_core.h +++ b/include/net/netfilter/nf_conntrack_core.h @@ -82,6 +82,7 @@ print_tuple(struct seq_file *s, const struct nf_conntrack_tuple *tuple, #define CONNTRACK_LOCKS 1024 extern struct hlist_nulls_head *nf_conntrack_hash; +extern seqcount_t nf_conntrack_generation; extern spinlock_t nf_conntrack_locks[CONNTRACK_LOCKS]; void nf_conntrack_lock(spinlock_t *lock); diff --git a/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4_compat.c b/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4_compat.c index c6f3c40..584899f 100644 --- a/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4_compat.c +++ b/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4_compat.c @@ -26,6 +26,8 @@ struct ct_iter_state { struct seq_net_private p; + struct hlist_nulls_head *hash; + unsigned int htable_size; unsigned int bucket; }; @@ -35,10 +37,10 @@ static struct hlist_nulls_node *ct_get_first(struct seq_file *seq) struct hlist_nulls_node *n; for (st->bucket = 0; - st->bucket < nf_conntrack_htable_size; + st->bucket < st->htable_size; st->bucket++) { n = rcu_dereference( - hlist_nulls_first_rcu(&nf_conntrack_hash[st->bucket])); + hlist_nulls_first_rcu(&st->hash[st->bucket])); if (!is_a_nulls(n)) return n; } @@ -53,11 +55,11 @@ static struct hlist_nulls_node *ct_get_next(struct seq_file *seq, head = rcu_dereference(hlist_nulls_next_rcu(head)); while (is_a_nulls(head)) { if (likely(get_nulls_value(head) == st->bucket)) { - if (++st->bucket >= nf_conntrack_htable_size) + if (++st->bucket >= st->htable_size) return NULL; } head = rcu_dereference( - hlist_nulls_first_rcu(&nf_conntrack_hash[st->bucket])); + hlist_nulls_first_rcu(&st->hash[st->bucket])); } return head; } @@ -75,7 +77,17 @@ static struct hlist_nulls_node *ct_get_idx(struct seq_file *seq, loff_t pos) static void *ct_seq_start(struct seq_file *seq, loff_t *pos) __acquires(RCU) { + struct ct_iter_state *st = seq->private; + unsigned int sequence; + rcu_read_lock(); + + do { + sequence = read_seqcount_begin(&nf_conntrack_generation); + st->htable_size = nf_conntrack_htable_size; + st->hash = nf_conntrack_hash; + } while (read_seqcount_retry(&nf_conntrack_generation, sequence)); + return ct_get_idx(seq, *pos); } diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c index f204274..1c39697 100644 --- a/net/netfilter/nf_conntrack_core.c +++ b/net/netfilter/nf_conntrack_core.c @@ -72,9 +72,11 @@ EXPORT_SYMBOL_GPL(nf_conntrack_expect_lock); struct hlist_nulls_head *nf_conntrack_hash __read_mostly; EXPORT_SYMBOL_GPL(nf_conntrack_hash); +seqcount_t nf_conntrack_generation __read_mostly; +EXPORT_SYMBOL_GPL(nf_conntrack_generation); + static __read_mostly struct kmem_cache *nf_conntrack_cachep; static __read_mostly spinlock_t nf_conntrack_locks_all_lock; -static __read_mostly seqcount_t nf_conntrack_generation; static __read_mostly DEFINE_SPINLOCK(nf_conntrack_locks_all_lock); static __read_mostly bool nf_conntrack_locks_all; diff --git a/net/netfilter/nf_conntrack_standalone.c b/net/netfilter/nf_conntrack_standalone.c index c026c47..2cf484b 100644 --- a/net/netfilter/nf_conntrack_standalone.c +++ b/net/netfilter/nf_conntrack_standalone.c @@ -48,6 +48,8 @@ EXPORT_SYMBOL_GPL(print_tuple); struct ct_iter_state { struct seq_net_private p; + struct hlist_nulls_head *hash; + unsigned int htable_size; unsigned int bucket; u_int64_t time_now; }; @@ -58,9 +60,10 @@ static struct hlist_nulls_node *ct_get_first(struct seq_file *seq) struct hlist_nulls_node *n; for (st->bucket = 0; - st->bucket < nf_conntrack_htable_size; + st->bucket < st->htable_size; st->bucket++) { - n = rcu_dereference(hlist_nulls_first_rcu(&nf_conntrack_hash[st->bucket])); + n = rcu_dereference( + hlist_nulls_first_rcu(&st->hash[st->bucket])); if (!is_a_nulls(n)) return n; } @@ -75,12 +78,11 @@ static struct hlist_nulls_node *ct_get_next(struct seq_file *seq, head = rcu_dereference(hlist_nulls_next_rcu(head)); while (is_a_nulls(head)) { if (likely(get_nulls_value(head) == st->bucket)) { - if (++st->bucket >= nf_conntrack_htable_size) + if (++st->bucket >= st->htable_size) return NULL; } head = rcu_dereference( - hlist_nulls_first_rcu( - &nf_conntrack_hash[st->bucket])); + hlist_nulls_first_rcu(&st->hash[st->bucket])); } return head; } @@ -99,9 +101,17 @@ static void *ct_seq_start(struct seq_file *seq, loff_t *pos) __acquires(RCU) { struct ct_iter_state *st = seq->private; + unsigned int sequence; st->time_now = ktime_get_real_ns(); rcu_read_lock(); + + do { + sequence = read_seqcount_begin(&nf_conntrack_generation); + st->htable_size = nf_conntrack_htable_size; + st->hash = nf_conntrack_hash; + } while (read_seqcount_retry(&nf_conntrack_generation, sequence)); + return ct_get_idx(seq, *pos); }