
[3/3] netfilter: x_tables: fix ordering of jumpstack allocation and table update

Message ID 1382519724-3953-4-git-send-email-pablo@netfilter.org
State Accepted, archived
Delegated to: David Miller

Commit Message

Pablo Neira Ayuso Oct. 23, 2013, 9:15 a.m. UTC
From: Will Deacon <will.deacon@arm.com>

During kernel stability testing on an SMP ARMv7 system, Yalin Wang
reported the following panic from the netfilter code:

  1fe0: 0000001c 5e2d3b10 4007e779 4009e110 60000010 00000032 ff565656 ff545454
  [<c06c48dc>] (ipt_do_table+0x448/0x584) from [<c0655ef0>] (nf_iterate+0x48/0x7c)
  [<c0655ef0>] (nf_iterate+0x48/0x7c) from [<c0655f7c>] (nf_hook_slow+0x58/0x104)
  [<c0655f7c>] (nf_hook_slow+0x58/0x104) from [<c0683bbc>] (ip_local_deliver+0x88/0xa8)
  [<c0683bbc>] (ip_local_deliver+0x88/0xa8) from [<c0683718>] (ip_rcv_finish+0x418/0x43c)
  [<c0683718>] (ip_rcv_finish+0x418/0x43c) from [<c062b1c4>] (__netif_receive_skb+0x4cc/0x598)
  [<c062b1c4>] (__netif_receive_skb+0x4cc/0x598) from [<c062b314>] (process_backlog+0x84/0x158)
  [<c062b314>] (process_backlog+0x84/0x158) from [<c062de84>] (net_rx_action+0x70/0x1dc)
  [<c062de84>] (net_rx_action+0x70/0x1dc) from [<c0088230>] (__do_softirq+0x11c/0x27c)
  [<c0088230>] (__do_softirq+0x11c/0x27c) from [<c008857c>] (do_softirq+0x44/0x50)
  [<c008857c>] (do_softirq+0x44/0x50) from [<c0088614>] (local_bh_enable_ip+0x8c/0xd0)
  [<c0088614>] (local_bh_enable_ip+0x8c/0xd0) from [<c06b0330>] (inet_stream_connect+0x164/0x298)
  [<c06b0330>] (inet_stream_connect+0x164/0x298) from [<c061d68c>] (sys_connect+0x88/0xc8)
  [<c061d68c>] (sys_connect+0x88/0xc8) from [<c000e340>] (ret_fast_syscall+0x0/0x30)
  Code: 2a000021 e59d2028 e59de01c e59f011c (e7824103)
  ---[ end trace da227214a82491bd ]---
  Kernel panic - not syncing: Fatal exception in interrupt

This comes about because CPU1 is executing xt_replace_table in response
to a setsockopt syscall, resulting in:

	ret = xt_jumpstack_alloc(newinfo);
		--> newinfo->jumpstack = kzalloc(size, GFP_KERNEL);

	[...]

	table->private = newinfo;
	newinfo->initial_entries = private->initial_entries;

Meanwhile, CPU0 is handling the network receive path and ends up in
ipt_do_table, resulting in:

	private = table->private;

	[...]

	jumpstack  = (struct ipt_entry **)private->jumpstack[cpu];

On weakly ordered memory architectures, the writes to table->private
and newinfo->jumpstack from CPU1 can be observed out of order by CPU0.
Furthermore, on architectures which don't respect ordering of address
dependencies (i.e. Alpha), the reads from CPU0 can also be re-ordered.

This patch adds an smp_wmb() before the assignment to table->private
(which is essentially publishing newinfo) to ensure that all writes to
newinfo will be observed before plugging it into the table structure.
A dependent-read barrier is also added on the consumer sides, to ensure
the same ordering requirements are also respected there.

Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reported-by: Wang, Yalin <Yalin.Wang@sonymobile.com>
Tested-by: Wang, Yalin <Yalin.Wang@sonymobile.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/ipv4/netfilter/arp_tables.c |    5 +++++
 net/ipv4/netfilter/ip_tables.c  |    5 +++++
 net/ipv6/netfilter/ip6_tables.c |    5 +++++
 net/netfilter/x_tables.c        |    7 ++++++-
 4 files changed, 21 insertions(+), 1 deletion(-)

Comments

David Laight Oct. 23, 2013, 9:45 a.m. UTC | #1
> Subject: [PATCH 3/3] netfilter: x_tables: fix ordering of jumpstack allocation and table update
...
> Meanwhile, CPU0 is handling the network receive path and ends up in
> ipt_do_table, resulting in:
> 
> 	private = table->private;
> 
> 	[...]
> 
> 	jumpstack  = (struct ipt_entry **)private->jumpstack[cpu];
> 
> On weakly ordered memory architectures, the writes to table->private
> and newinfo->jumpstack from CPU1 can be observed out of order by CPU0.
> Furthermore, on architectures which don't respect ordering of address
> dependencies (i.e. Alpha), the reads from CPU0 can also be re-ordered.

Which reads might be out of order?
AFAICT they are strongly sequenced because the second depends on the
value read by the first.
So I don't see why the read barrier is needed.

I presume the above code is tied to a single cpu.

...
> 
> -	table->private = newinfo;
>  	newinfo->initial_entries = private->initial_entries;
> +	/*
> +	 * Ensure contents of newinfo are visible before assigning to
> +	 * private.
> +	 */
> +	smp_wmb();
> +	table->private = newinfo;

Those writes were in the wrong order on all systems.
Also gcc needs to be told not to reorder the writes even on non-smp
systems (if the code might be pre-empted).
So an asm volatile (:::"memory") is needed there even if no specific
synchronisation instruction is needed.

	David



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet Oct. 23, 2013, 12:13 p.m. UTC | #2
On Wed, 2013-10-23 at 10:45 +0100, David Laight wrote:

> Those writes were in the wrong order on all systems.
> Also gcc needs to be told not to reorder the writes even on non-smp
> systems (if the code might be pre-empted).
> So an asm volatile (:::"memory") is needed there even if no specific
> synchronisation instruction is needed.

smp_wmb() contains this compiler barrier already.



Will Deacon Oct. 23, 2013, 4:37 p.m. UTC | #3
Hi David,

On Wed, Oct 23, 2013 at 10:45:04AM +0100, David Laight wrote:
> > Subject: [PATCH 3/3] netfilter: x_tables: fix ordering of jumpstack allocation and table update
> ...
> > Meanwhile, CPU0 is handling the network receive path and ends up in
> > ipt_do_table, resulting in:
> > 
> > 	private = table->private;
> > 
> > 	[...]
> > 
> > 	jumpstack  = (struct ipt_entry **)private->jumpstack[cpu];
> > 
> > On weakly ordered memory architectures, the writes to table->private
> > and newinfo->jumpstack from CPU1 can be observed out of order by CPU0.
> > Furthermore, on architectures which don't respect ordering of address
> > dependencies (i.e. Alpha), the reads from CPU0 can also be re-ordered.
> 
> Which reads might be out of order?
> AFAICT they are strongly sequenced because the second depends on the
> value read by the first.
> So I don't see why the read barrier is needed.

That is why this is a dependent read barrier. Some architectures (e.g.
Alpha) *do* allow dependent reads to be observed out of order, so you can
effectively load the data pointed to by a pointer before you load the
pointer itself!

Take a look at Paul's paper about memory ordering if you're curious:

  http://www.rdrop.com/users/paulmck/scalability/paper/whymb.2009.04.05a.pdf

> > -	table->private = newinfo;
> >  	newinfo->initial_entries = private->initial_entries;
> > +	/*
> > +	 * Ensure contents of newinfo are visible before assigning to
> > +	 * private.
> > +	 */
> > +	smp_wmb();
> > +	table->private = newinfo;
> 
> Those writes were in the wrong order on all systems.
> Also gcc needs to be told not to reorder the writes even on non-smp
> systems (if the code might be pre-empted).
> So an asm volatile (:::"memory") is needed there even if no specific
> synchronisation instruction is needed.

The smp_* barriers expand to barrier() when !CONFIG_SMP, which gives you the
memory clobber you want.

What I'm *not* 100% sure about is the table freeing path. There is a mutex
there for removing the table from a list, but I'm not sure how we ensure
that there are no parallel readers at that point.

Will
Eric Dumazet Oct. 23, 2013, 5:04 p.m. UTC | #4
On Wed, 2013-10-23 at 17:37 +0100, Will Deacon wrote:

> What I'm *not* 100% sure about is the table freeing path. There is a mutex
> there for removing the table from a list, but I'm not sure how we ensure
> that there are no parallel readers at that point.

The sequence is:

xt_replace_table();
get_counters();
xt_free_table_info();

get_counters() is how we ensure that no CPU is still using the old copy
of the table before it is freed.





Patch

diff --git a/net/ipv4/netfilter/arp_tables.c b/net/ipv4/netfilter/arp_tables.c
index 85a4f21..59da7cd 100644
--- a/net/ipv4/netfilter/arp_tables.c
+++ b/net/ipv4/netfilter/arp_tables.c
@@ -271,6 +271,11 @@  unsigned int arpt_do_table(struct sk_buff *skb,
 	local_bh_disable();
 	addend = xt_write_recseq_begin();
 	private = table->private;
+	/*
+	 * Ensure we load private-> members after we've fetched the base
+	 * pointer.
+	 */
+	smp_read_barrier_depends();
 	table_base = private->entries[smp_processor_id()];
 
 	e = get_entry(table_base, private->hook_entry[hook]);
diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c
index d23118d..718dfbd 100644
--- a/net/ipv4/netfilter/ip_tables.c
+++ b/net/ipv4/netfilter/ip_tables.c
@@ -327,6 +327,11 @@  ipt_do_table(struct sk_buff *skb,
 	addend = xt_write_recseq_begin();
 	private = table->private;
 	cpu        = smp_processor_id();
+	/*
+	 * Ensure we load private-> members after we've fetched the base
+	 * pointer.
+	 */
+	smp_read_barrier_depends();
 	table_base = private->entries[cpu];
 	jumpstack  = (struct ipt_entry **)private->jumpstack[cpu];
 	stackptr   = per_cpu_ptr(private->stackptr, cpu);
diff --git a/net/ipv6/netfilter/ip6_tables.c b/net/ipv6/netfilter/ip6_tables.c
index 44400c2..710238f 100644
--- a/net/ipv6/netfilter/ip6_tables.c
+++ b/net/ipv6/netfilter/ip6_tables.c
@@ -349,6 +349,11 @@  ip6t_do_table(struct sk_buff *skb,
 	local_bh_disable();
 	addend = xt_write_recseq_begin();
 	private = table->private;
+	/*
+	 * Ensure we load private-> members after we've fetched the base
+	 * pointer.
+	 */
+	smp_read_barrier_depends();
 	cpu        = smp_processor_id();
 	table_base = private->entries[cpu];
 	jumpstack  = (struct ip6t_entry **)private->jumpstack[cpu];
diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
index 8b03028..227aa11 100644
--- a/net/netfilter/x_tables.c
+++ b/net/netfilter/x_tables.c
@@ -845,8 +845,13 @@  xt_replace_table(struct xt_table *table,
 		return NULL;
 	}
 
-	table->private = newinfo;
 	newinfo->initial_entries = private->initial_entries;
+	/*
+	 * Ensure contents of newinfo are visible before assigning to
+	 * private.
+	 */
+	smp_wmb();
+	table->private = newinfo;
 
 	/*
 	 * Even though table entries have now been swapped, other CPU's