Patchwork [netfilter,bug] BUG: using smp_processor_id() in preemptible [00000000] code: ssh/9115, caller is ipt_do_table+0xc8/0x559

login
register
mail settings
Submitter Ingo Molnar
Date April 2, 2009, 8:32 p.m.
Message ID <20090402203220.GA30375@elte.hu>
Download mbox | patch
Permalink /patch/25543/
State Rejected
Delegated to: David Miller
Headers show

Comments

Ingo Molnar - April 2, 2009, 8:32 p.m.
* Eric Dumazet <dada1@cosmosbay.com> wrote:

> David put into its tree fix for that a few hours ago
> 
> commit fa9a86ddc8ecd2830a5e773facc250f110300ae7
> 
> (netfilter: iptables: lock free counters) forgot to disable BH
> in arpt_do_table(), ipt_do_table() and  ip6t_do_table()
> 
> Use rcu_read_lock_bh() instead of rcu_read_lock() cures the problem.

ok, got your fix (attached below), thanks Eric for the pointer.

But i think my fix might be slightly better, because it does not 
manipulate the preempt counter and leaves preemption enabled. 

There's no BH context worries since this code did not seem to have 
BH protection before either. (it used a plain read_lock(), not 
read_lock_bh(), AFAICS)

I dont see any preemption worries either. I must be missing 
something :)

With my patch applied the box appears to boot fine and there's no 
syslog flood either. (no heavy testing done though)

	Ingo

--------------->
From fa9a86ddc8ecd2830a5e773facc250f110300ae7 Mon Sep 17 00:00:00 2001
From: Eric Dumazet <dada1@cosmosbay.com>
Date: Thu, 2 Apr 2009 00:53:49 -0700
Subject: [PATCH] netfilter: use rcu_read_bh() in ipt_do_table()

Commit 784544739a25c30637397ace5489eeb6e15d7d49
(netfilter: iptables: lock free counters) forgot to disable BH
in arpt_do_table(), ipt_do_table() and  ip6t_do_table()

Use rcu_read_lock_bh() instead of rcu_read_lock() cures the problem.

Reported-and-bisected-by: Roman Mindalev <r000n@r000n.net>
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Acked-by: Patrick McHardy <kaber@trash.net>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/ipv4/netfilter/arp_tables.c |    4 ++--
 net/ipv4/netfilter/ip_tables.c  |    4 ++--
 net/ipv6/netfilter/ip6_tables.c |    4 ++--
 3 files changed, 6 insertions(+), 6 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ingo Molnar - April 2, 2009, 9:16 p.m.
* Ingo Molnar <mingo@elte.hu> wrote:

> * Eric Dumazet <dada1@cosmosbay.com> wrote:
> 
> > David put into its tree fix for that a few hours ago
> > 
> > commit fa9a86ddc8ecd2830a5e773facc250f110300ae7
> > 
> > (netfilter: iptables: lock free counters) forgot to disable BH
> > in arpt_do_table(), ipt_do_table() and  ip6t_do_table()
> > 
> > Use rcu_read_lock_bh() instead of rcu_read_lock() cures the problem.
> 
> ok, got your fix (attached below), thanks Eric for the pointer.
> 
> But i think my fix might be slightly better, because it does not 
> manipulate the preempt counter and leaves preemption enabled.
> 
> There's no BH context worries since this code did not seem to have 
> BH protection before either. (it used a plain read_lock(), not 
> read_lock_bh(), AFAICS)
> 
> I dont see any preemption worries either. I must be missing 
> something :)

as per the other mail - what i missed was that the old code _did_ 
use read_lock_bh(), which did not get carried over into the 
rcu_read_lock().

So this fix affects basically all things netfilter, not just 
rcu-preempt - a plain rcu_read_lock() doesnt protect against BH 
context interaction.

	Ingo
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Paul E. McKenney - April 4, 2009, 5:23 p.m.
On Thu, Apr 02, 2009 at 11:16:06PM +0200, Ingo Molnar wrote:
> * Ingo Molnar <mingo@elte.hu> wrote:
> > * Eric Dumazet <dada1@cosmosbay.com> wrote:
> > > David put into its tree fix for that a few hours ago
> > > 
> > > commit fa9a86ddc8ecd2830a5e773facc250f110300ae7
> > > 
> > > (netfilter: iptables: lock free counters) forgot to disable BH
> > > in arpt_do_table(), ipt_do_table() and  ip6t_do_table()
> > > 
> > > Use rcu_read_lock_bh() instead of rcu_read_lock() cures the problem.
> > 
> > ok, got your fix (attached below), thanks Eric for the pointer.
> > 
> > But i think my fix might be slightly better, because it does not 
> > manipulate the preempt counter and leaves preemption enabled.
> > 
> > There's no BH context worries since this code did not seem to have 
> > BH protection before either. (it used a plain read_lock(), not 
> > read_lock_bh(), AFAICS)
> > 
> > I dont see any preemption worries either. I must be missing 
> > something :)
> 
> as per the other mail - what i missed was that the old code _did_ 
> use read_lock_bh(), which did not get carried over into the 
> rcu_read_lock().
> 
> So this fix affects basically all things netfilter, not just 
> rcu-preempt - a plain rcu_read_lock() doesnt protect against BH 
> context interaction.

Strangely enough, the original motivation for rcu_read_lock_bh() does not
apply to -rt kernels.  The problem was that denial-of-service workloads
could apply such a heavy interrupt load to a given CPU that it never
got back to process-level execution, thus never passing through any
quiescent states.

So rcu-bh has softirq-level quiescent states, solving that problem,
but by disabling softirq (and thus preemption) across the read-side
critical sections.

But -rt has every point in the code not covered by rcu_read_lock()
as a quiescent state, so should not be vulnerable to that particular
denial-of-service attack.  But rcu-bh has the additional semantic of
excluding BH execution while under rcu_read_lock_bh(), which appears
to be used in this case, and probably others as well.

Interesting corner we have painted ourselves into here...

							Thanx, Paul
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

diff --git a/net/ipv4/netfilter/arp_tables.c b/net/ipv4/netfilter/arp_tables.c
index 35c5f6a..5ba533d 100644
--- a/net/ipv4/netfilter/arp_tables.c
+++ b/net/ipv4/netfilter/arp_tables.c
@@ -253,7 +253,7 @@  unsigned int arpt_do_table(struct sk_buff *skb,
 	indev = in ? in->name : nulldevname;
 	outdev = out ? out->name : nulldevname;
 
-	rcu_read_lock();
+	rcu_read_lock_bh();
 	private = rcu_dereference(table->private);
 	table_base = rcu_dereference(private->entries[smp_processor_id()]);
 
@@ -329,7 +329,7 @@  unsigned int arpt_do_table(struct sk_buff *skb,
 		}
 	} while (!hotdrop);
 
-	rcu_read_unlock();
+	rcu_read_unlock_bh();
 
 	if (hotdrop)
 		return NF_DROP;
diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c
index 82ee7c9..810c0b6 100644
--- a/net/ipv4/netfilter/ip_tables.c
+++ b/net/ipv4/netfilter/ip_tables.c
@@ -339,7 +339,7 @@  ipt_do_table(struct sk_buff *skb,
 
 	IP_NF_ASSERT(table->valid_hooks & (1 << hook));
 
-	rcu_read_lock();
+	rcu_read_lock_bh();
 	private = rcu_dereference(table->private);
 	table_base = rcu_dereference(private->entries[smp_processor_id()]);
 
@@ -437,7 +437,7 @@  ipt_do_table(struct sk_buff *skb,
 		}
 	} while (!hotdrop);
 
-	rcu_read_unlock();
+	rcu_read_unlock_bh();
 
 #ifdef DEBUG_ALLOW_ALL
 	return NF_ACCEPT;
diff --git a/net/ipv6/netfilter/ip6_tables.c b/net/ipv6/netfilter/ip6_tables.c
index e89cfa3..dfed176 100644
--- a/net/ipv6/netfilter/ip6_tables.c
+++ b/net/ipv6/netfilter/ip6_tables.c
@@ -365,7 +365,7 @@  ip6t_do_table(struct sk_buff *skb,
 
 	IP_NF_ASSERT(table->valid_hooks & (1 << hook));
 
-	rcu_read_lock();
+	rcu_read_lock_bh();
 	private = rcu_dereference(table->private);
 	table_base = rcu_dereference(private->entries[smp_processor_id()]);
 
@@ -466,7 +466,7 @@  ip6t_do_table(struct sk_buff *skb,
 #ifdef CONFIG_NETFILTER_DEBUG
 	((struct ip6t_entry *)table_base)->comefrom = NETFILTER_LINK_POISON;
 #endif
-	rcu_read_unlock();
+	rcu_read_unlock_bh();
 
 #ifdef DEBUG_ALLOW_ALL
 	return NF_ACCEPT;