Patchwork netfilter: iptables: lock free counters, PREEMPT_RCU=y fix

Submitter Ingo Molnar
Date April 2, 2009, 8:12 p.m.
Message ID <20090402201245.GA29904@elte.hu>
Permalink /patch/25542/
State Rejected
Delegated to: David Miller

Comments

Ingo Molnar - April 2, 2009, 8:12 p.m.
Impact: fix log spam under CONFIG_DEBUG_PREEMPT=y

This recent commit:

   7845447: netfilter: iptables: lock free counters

Converted a couple of netfilter codepaths from read_lock() critical 
sections to lockless rcu_read_lock(). What it forgot about is that 
under CONFIG_PREEMPT=y and CONFIG_PREEMPT_RCU=y these sections can 
be preempted.

Under CONFIG_DEBUG_PREEMPT=y this produces warnings such as:

BUG: using smp_processor_id() in preemptible [00000000] code: ssh/9115
caller is ipt_do_table+0xc8/0x559
Pid: 9115, comm: ssh Tainted: G        W  2.6.29-tip-08646-g45ef7c3-dirty #26231
Call Trace:
 [<c0c0dacf>] ? printk+0x14/0x16
 [<c048c2e6>] debug_smp_processor_id+0xa6/0xbc
 [<c0adbd03>] ipt_do_table+0xc8/0x559
 [<c0c10277>] ? _read_unlock+0x3d/0x49
 [<c0ad68c0>] ? fn_hash_lookup+0x94/0xa0
 [<c0ad330e>] ? __inet_dev_addr_type+0x56/0x8d
 [<c0a87b02>] ? neigh_lookup+0xe5/0x108
 [<c0adc2bc>] ipt_local_hook+0x40/0x50
 [<c0a93e57>] nf_iterate+0x34/0x80
 [<c0aad4b8>] ? dst_output+0x0/0x10
 [<c0a93eea>] nf_hook_slow+0x47/0xa4
 [<c0aad4b8>] ? dst_output+0x0/0x10
 [<c0aaec04>] __ip_local_out+0x78/0x7f
 [<c0aad4b8>] ? dst_output+0x0/0x10
 [<c0aaec1b>] ip_local_out+0x10/0x20
 [<c0aaf416>] ip_queue_xmit+0x2bc/0x332
 [<c0aa8fdf>] ? __ip_route_output_key+0x112/0x77b
 [<c01397fe>] ? local_bh_enable+0x10/0x12
 [<c0abf0e5>] ? tcp_connect+0x32a/0x3bb
 [<c0ab15a7>] ? __inet_hash_nolisten+0x97/0xaf
 [<c0a78dde>] ? __copy_skb_header+0xe/0x13a
 [<c0abf0e5>] ? tcp_connect+0x32a/0x3bb
 [<c0abe955>] ? tcp_transmit_skb+0x5a5/0x61c
 [<c0abe995>] tcp_transmit_skb+0x5e5/0x61c
 [<c0a7b6c0>] ? __alloc_skb+0x54/0x120
 [<c0abefca>] ? tcp_connect+0x20f/0x3bb
 [<c0abf0e5>] tcp_connect+0x32a/0x3bb
 [<c0ac4b68>] tcp_v4_connect+0x466/0x4be
 [<c0acf380>] inet_stream_connect+0x8f/0x212
 [<c018f4e6>] ? might_fault+0x75/0x77
 [<c047f198>] ? copy_from_user+0x2f/0x117
BUG: using smp_processor_id() in preemptible [00000000] code: ssh/9114

Since it appears that the tables are RCU freed, and there are no
non-preempt assumptions in the code, the use of
raw_smp_processor_id() is safe.

[ I also audited all of net/netfilter/*.c for smp_processor_id() use,
  and fixed all places that used them unsafely. ]

Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 net/ipv4/netfilter/arp_tables.c |    2 +-
 net/ipv4/netfilter/ip_tables.c  |    4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)
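
For readers who want to see the shape of the check behind the warning
above, here is a minimal userspace sketch. It is a simplified model, not
kernel code: every model_* name and the MODEL_SOFTIRQ_OFFSET value are
made up for illustration, and the real debug check has more exemptions
than the single condition shown. It assumes only two things that hold in
the kernel: a preemptible RCU read-side section leaves preempt_count
untouched, while disabling bottom halves raises it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Userspace model only. Names and values here are illustrative,
 * not the kernel's definitions. */
#define MODEL_SOFTIRQ_OFFSET 0x100u

struct model_task {
	unsigned int preempt_count; /* 0 means the task is preemptible */
	bool irqs_disabled;
	int cpu;                    /* CPU the task currently runs on */
	unsigned int warnings;      /* debug warnings emitted so far */
};

/* Preemptible RCU: entering a read-side section leaves preempt_count alone. */
static void model_rcu_read_lock(struct model_task *t) { (void)t; }
static void model_rcu_read_unlock(struct model_task *t) { (void)t; }

/* _bh variant: disabling bottom halves raises preempt_count. */
static void model_rcu_read_lock_bh(struct model_task *t)
{
	t->preempt_count += MODEL_SOFTIRQ_OFFSET;
}

static void model_rcu_read_unlock_bh(struct model_task *t)
{
	assert(t->preempt_count >= MODEL_SOFTIRQ_OFFSET);
	t->preempt_count -= MODEL_SOFTIRQ_OFFSET;
}

/* Unchecked variant: returns the current CPU without any sanity check. */
static int model_raw_smp_processor_id(const struct model_task *t)
{
	return t->cpu;
}

/* Checked variant: complains when the caller could migrate to another CPU. */
static int model_smp_processor_id(struct model_task *t)
{
	if (t->preempt_count == 0 && !t->irqs_disabled) {
		t->warnings++;
		printf("BUG: using smp_processor_id() in preemptible [%08x] code\n",
		       t->preempt_count);
	}
	return t->cpu;
}

/* Runs one table lookup and reports how many warnings it triggered. */
unsigned int model_lookup_warnings(bool use_bh_variant, bool use_raw_id)
{
	struct model_task t = { .preempt_count = 0, .irqs_disabled = false,
				.cpu = 0, .warnings = 0 };
	int cpu;

	if (use_bh_variant)
		model_rcu_read_lock_bh(&t);
	else
		model_rcu_read_lock(&t);

	cpu = use_raw_id ? model_raw_smp_processor_id(&t)
			 : model_smp_processor_id(&t);
	(void)cpu; /* a real lookup would index the per-CPU table here */

	if (use_bh_variant)
		model_rcu_read_unlock_bh(&t);
	else
		model_rcu_read_unlock(&t);

	return t.warnings;
}
```

In this model, model_lookup_warnings(false, false) reports one warning,
and the other three combinations report none. Note that the raw variant
only skips the check: it does nothing to keep the task on the same CPU.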

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Stephen Hemminger - April 2, 2009, 8:38 p.m.
On Thu, 2 Apr 2009 22:12:45 +0200
Ingo Molnar <mingo@elte.hu> wrote:

> diff --git a/net/ipv4/netfilter/arp_tables.c b/net/ipv4/netfilter/arp_tables.c
> index 35c5f6a..30baf3e 100644
> --- a/net/ipv4/netfilter/arp_tables.c
> +++ b/net/ipv4/netfilter/arp_tables.c
> @@ -255,7 +255,7 @@ unsigned int arpt_do_table(struct sk_buff *skb,
>  
>  	rcu_read_lock();
>  	private = rcu_dereference(table->private);
> -	table_base = rcu_dereference(private->entries[smp_processor_id()]);
> +	table_base = rcu_dereference(private->entries[raw_smp_processor_id()]);
>  
>  	e = get_entry(table_base, private->hook_entry[hook]);
>  	back = get_entry(table_base, private->underflow[hook]);
> diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c
> index 82ee7c9..eff124e 100644
> --- a/net/ipv4/netfilter/ip_tables.c
> +++ b/net/ipv4/netfilter/ip_tables.c
> @@ -280,7 +280,7 @@ static void trace_packet(struct sk_buff *skb,
>  	char *hookname, *chainname, *comment;
>  	unsigned int rulenum = 0;
>  
> -	table_base = (void *)private->entries[smp_processor_id()];
> +	table_base = (void *)private->entries[raw_smp_processor_id()];
>  	root = get_entry(table_base, private->hook_entry[hook]);
>  
>  	hookname = chainname = (char *)hooknames[hook];
> @@ -341,7 +341,7 @@ ipt_do_table(struct sk_buff *skb,
>  
>  	rcu_read_lock();
>  	private = rcu_dereference(table->private);
> -	table_base = rcu_dereference(private->entries[smp_processor_id()]);
> +	table_base = rcu_dereference(private->entries[raw_smp_processor_id()]);
>  
>  	e = get_entry(table_base, private->hook_entry[hook]);
>  

NAK. The rcu_read_lock() needs to be rcu_read_lock_bh() otherwise RCU
could corrupt the references.
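
The message above does not spell out the failure mode. One plausible
reading, suggested by the mention of softnet-rx processing in the
follow-up, is that a softirq on the same CPU can interrupt a
process-context table walk and touch the same per-CPU data. The
userspace sketch below models that as a lost counter update. It is an
assumption-laden illustration, not kernel code: all model_* names are
hypothetical, and the interleaving is forced by hand rather than by a
real interrupt.

```c
#include <stdbool.h>
#include <stdint.h>

/* Userspace model only; these are not the kernel's structures. */
struct model_cpu {
	uint64_t pcnt;         /* this CPU's private packet counter */
	bool bh_disabled;      /* true while bottom halves are disabled */
	bool softirq_pending;  /* a received packet is waiting for softirq */
};

/* Softirq receive path: counts one packet on the same CPU. */
static void model_softirq_rx(struct model_cpu *c)
{
	c->pcnt += 1;
}

/* Runs the pending softirq, unless bottom halves are disabled. */
static void model_maybe_run_softirq(struct model_cpu *c)
{
	if (c->softirq_pending && !c->bh_disabled) {
		c->softirq_pending = false;
		model_softirq_rx(c);
	}
}

/* Re-enabling bottom halves runs whatever was deferred meanwhile. */
static void model_local_bh_enable(struct model_cpu *c)
{
	c->bh_disabled = false;
	model_maybe_run_softirq(c);
}

/*
 * Process context counts one packet while a second packet arrives in the
 * middle of the non-atomic read-modify-write. Returns the final counter.
 */
uint64_t model_count_two_packets(bool disable_bh)
{
	struct model_cpu cpu = { .pcnt = 0, .bh_disabled = false,
				 .softirq_pending = false };
	uint64_t tmp;

	if (disable_bh)
		cpu.bh_disabled = true;

	tmp = cpu.pcnt;                 /* load the old value */
	cpu.softirq_pending = true;     /* second packet arrives now */
	model_maybe_run_softirq(&cpu);  /* interrupts here if BH is enabled */
	cpu.pcnt = tmp + 1;             /* store: clobbers a softirq increment */

	if (disable_bh)
		model_local_bh_enable(&cpu);

	return cpu.pcnt;
}
```

With bottom halves left enabled the model ends with a count of 1 for two
packets; with them disabled the softirq is deferred until the update has
finished and the count is 2.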
Ingo Molnar - April 2, 2009, 8:52 p.m.
* Stephen Hemminger <shemminger@vyatta.com> wrote:

> NAK. The rcu_read_lock() needs to be rcu_read_lock_bh() otherwise RCU
> could corrupt the references.

Indeed - somehow i missed that it originally used read_lock_bh().

I was wondering about that already: how could get_entry() and
nf-iterator work without worrying about softnet-rx processing? It
just looked too involved to be trivially preemptible.

Thanks, i'll use Eric's patch.

	Ingo

Patch

diff --git a/net/ipv4/netfilter/arp_tables.c b/net/ipv4/netfilter/arp_tables.c
index 35c5f6a..30baf3e 100644
--- a/net/ipv4/netfilter/arp_tables.c
+++ b/net/ipv4/netfilter/arp_tables.c
@@ -255,7 +255,7 @@  unsigned int arpt_do_table(struct sk_buff *skb,
 
 	rcu_read_lock();
 	private = rcu_dereference(table->private);
-	table_base = rcu_dereference(private->entries[smp_processor_id()]);
+	table_base = rcu_dereference(private->entries[raw_smp_processor_id()]);
 
 	e = get_entry(table_base, private->hook_entry[hook]);
 	back = get_entry(table_base, private->underflow[hook]);
diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c
index 82ee7c9..eff124e 100644
--- a/net/ipv4/netfilter/ip_tables.c
+++ b/net/ipv4/netfilter/ip_tables.c
@@ -280,7 +280,7 @@  static void trace_packet(struct sk_buff *skb,
 	char *hookname, *chainname, *comment;
 	unsigned int rulenum = 0;
 
-	table_base = (void *)private->entries[smp_processor_id()];
+	table_base = (void *)private->entries[raw_smp_processor_id()];
 	root = get_entry(table_base, private->hook_entry[hook]);
 
 	hookname = chainname = (char *)hooknames[hook];
@@ -341,7 +341,7 @@  ipt_do_table(struct sk_buff *skb,
 
 	rcu_read_lock();
 	private = rcu_dereference(table->private);
-	table_base = rcu_dereference(private->entries[smp_processor_id()]);
+	table_base = rcu_dereference(private->entries[raw_smp_processor_id()]);
 
 	e = get_entry(table_base, private->hook_entry[hook]);