Message ID | 20090402203220.GA30375@elte.hu |
---|---|
State | Rejected, archived |
Delegated to: | David Miller |
* Ingo Molnar <mingo@elte.hu> wrote:

> * Eric Dumazet <dada1@cosmosbay.com> wrote:
>
> > David put into its tree fix for that a few hours ago
> >
> > commit fa9a86ddc8ecd2830a5e773facc250f110300ae7
> >
> > (netfilter: iptables: lock free counters) forgot to disable BH
> > in arpt_do_table(), ipt_do_table() and ip6t_do_table()
> >
> > Use rcu_read_lock_bh() instead of rcu_read_lock() cures the problem.
>
> ok, got your fix (attached below), thanks Eric for the pointer.
>
> But i think my fix might be slightly better, because it does not
> manipulate the preempt counter and leaves preemption enabled.
>
> There's no BH context worries since this code did not seem to have
> BH protection before either. (it used a plain read_lock(), not
> read_lock_bh(), AFAICS)
>
> I dont see any preemption worries either. I must be missing
> something :)

as per the other mail - what i missed was that the old code _did_
use read_lock_bh(), which did not get carried over into the
rcu_read_lock().

So this fix affects basically all things netfilter, not just
rcu-preempt - a plain rcu_read_lock() doesnt protect against BH
context interaction.

	Ingo
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

On Thu, Apr 02, 2009 at 11:16:06PM +0200, Ingo Molnar wrote:
> * Ingo Molnar <mingo@elte.hu> wrote:
> > * Eric Dumazet <dada1@cosmosbay.com> wrote:
> > > David put into its tree fix for that a few hours ago
> > >
> > > commit fa9a86ddc8ecd2830a5e773facc250f110300ae7
> > >
> > > (netfilter: iptables: lock free counters) forgot to disable BH
> > > in arpt_do_table(), ipt_do_table() and ip6t_do_table()
> > >
> > > Use rcu_read_lock_bh() instead of rcu_read_lock() cures the problem.
> >
> > ok, got your fix (attached below), thanks Eric for the pointer.
> >
> > But i think my fix might be slightly better, because it does not
> > manipulate the preempt counter and leaves preemption enabled.
> >
> > There's no BH context worries since this code did not seem to have
> > BH protection before either. (it used a plain read_lock(), not
> > read_lock_bh(), AFAICS)
> >
> > I dont see any preemption worries either. I must be missing
> > something :)
>
> as per the other mail - what i missed was that the old code _did_
> use read_lock_bh(), which did not get carried over into the
> rcu_read_lock().
>
> So this fix affects basically all things netfilter, not just
> rcu-preempt - a plain rcu_read_lock() doesnt protect against BH
> context interaction.

Strangely enough, the original motivation for rcu_read_lock_bh()
does not apply to -rt kernels. The problem was that denial-of-service
workloads could apply such a heavy interrupt load to a given CPU that
it never got back to process-level execution, thus never passing
through any quiescent states. So rcu-bh has softirq-level quiescent
states, solving that problem, but by disabling softirq (and thus
preemption) across the read-side critical sections.

But -rt has every point in the code not covered by rcu_read_lock() as
a quiescent state, so should not be vulnerable to that particular
denial-of-service attack.

But rcu-bh has the additional semantic of excluding BH execution while
under rcu_read_lock_bh(), which appears to be used in this case, and
probably others as well.

Interesting corner we have painted ourselves into here...

							Thanx, Paul

diff --git a/net/ipv4/netfilter/arp_tables.c b/net/ipv4/netfilter/arp_tables.c
index 35c5f6a..5ba533d 100644
--- a/net/ipv4/netfilter/arp_tables.c
+++ b/net/ipv4/netfilter/arp_tables.c
@@ -253,7 +253,7 @@ unsigned int arpt_do_table(struct sk_buff *skb,
 	indev = in ? in->name : nulldevname;
 	outdev = out ? out->name : nulldevname;
 
-	rcu_read_lock();
+	rcu_read_lock_bh();
 	private = rcu_dereference(table->private);
 	table_base = rcu_dereference(private->entries[smp_processor_id()]);
 
@@ -329,7 +329,7 @@ unsigned int arpt_do_table(struct sk_buff *skb,
 		}
 	} while (!hotdrop);
 
-	rcu_read_unlock();
+	rcu_read_unlock_bh();
 
 	if (hotdrop)
 		return NF_DROP;
diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c
index 82ee7c9..810c0b6 100644
--- a/net/ipv4/netfilter/ip_tables.c
+++ b/net/ipv4/netfilter/ip_tables.c
@@ -339,7 +339,7 @@ ipt_do_table(struct sk_buff *skb,
 
 	IP_NF_ASSERT(table->valid_hooks & (1 << hook));
 
-	rcu_read_lock();
+	rcu_read_lock_bh();
 	private = rcu_dereference(table->private);
 	table_base = rcu_dereference(private->entries[smp_processor_id()]);
 
@@ -437,7 +437,7 @@ ipt_do_table(struct sk_buff *skb,
 		}
 	} while (!hotdrop);
 
-	rcu_read_unlock();
+	rcu_read_unlock_bh();
 
 #ifdef DEBUG_ALLOW_ALL
 	return NF_ACCEPT;
diff --git a/net/ipv6/netfilter/ip6_tables.c b/net/ipv6/netfilter/ip6_tables.c
index e89cfa3..dfed176 100644
--- a/net/ipv6/netfilter/ip6_tables.c
+++ b/net/ipv6/netfilter/ip6_tables.c
@@ -365,7 +365,7 @@ ip6t_do_table(struct sk_buff *skb,
 
 	IP_NF_ASSERT(table->valid_hooks & (1 << hook));
 
-	rcu_read_lock();
+	rcu_read_lock_bh();
 	private = rcu_dereference(table->private);
 	table_base = rcu_dereference(private->entries[smp_processor_id()]);
 
@@ -466,7 +466,7 @@ ip6t_do_table(struct sk_buff *skb,
 #ifdef CONFIG_NETFILTER_DEBUG
 	((struct ip6t_entry *)table_base)->comefrom = NETFILTER_LINK_POISON;
 #endif
-	rcu_read_unlock();
+	rcu_read_unlock_bh();
 
 #ifdef DEBUG_ALLOW_ALL
 	return NF_ACCEPT;