netfilter: fix race in conntrack between dump_table and destroy

Message ID: 20101126135101.4e4b97cc@nehalam
State: Not Applicable, archived
Delegated to: David Miller

Commit Message

stephen hemminger Nov. 26, 2010, 9:51 p.m. UTC
The netlink interface to dump the connection tracking table has a race
when entries are deleted at the same time. A customer reported a crash
and the backtrace showed that ctnetlink_dump_table was running while a
conntrack entry was being destroyed
(see https://bugzilla.vyatta.com/show_bug.cgi?id=6402).

According to the RCU documentation, a reader using hlist_nulls must
handle the case of seeing a deleted entry and must not proceed
further down the linked list.  The old code would continue,
which caused the scan to walk into the free list.
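
To illustrate the hazard, here is a minimal single-threaded userspace
model of a "nulls" list. It is a sketch, not the kernel code: struct node,
make_nulls(), is_nulls(), nulls_value() and walk_bucket() are hypothetical
names for this example, and the end-marker encoding (bucket number shifted
left by one, low bit set) is assumed to mirror the kernel's convention.
A walk that ends on a marker for a different bucket has drifted onto
another chain and cannot be trusted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical list node; 'next' is either a real node or an odd-valued
 * end marker that encodes the bucket the chain belongs to. */
struct node {
	struct node *next;
	int key;
};

/* Build an end marker for the given bucket (low bit set, never a valid
 * pointer to an aligned struct node). */
static struct node *make_nulls(unsigned long bucket)
{
	return (struct node *)(uintptr_t)((bucket << 1) | 1UL);
}

static int is_nulls(const struct node *p)
{
	return ((uintptr_t)p & 1UL) != 0;
}

static unsigned long nulls_value(const struct node *p)
{
	assert(is_nulls(p));
	return (unsigned long)((uintptr_t)p >> 1);
}

/* Walk a chain that the caller believes belongs to 'bucket'.
 * Stores the number of nodes seen in *visited.
 * Returns 1 if the chain ended on this bucket's marker, 0 if the walk
 * ended on another bucket's marker, meaning a node was moved or reused
 * underneath the reader and the result must be discarded. */
static int walk_bucket(const struct node *first, unsigned long bucket,
		       int *visited)
{
	const struct node *p = first;

	*visited = 0;
	while (!is_nulls(p)) {
		(*visited)++;
		p = p->next;
	}
	return nulls_value(p) == bucket;
}
```

In this model, a reader that starts in bucket 3 and sees walk_bucket()
return 0 has followed a node that now lives on another chain. A real
lockless reader would additionally need to hold a reference on each
entry and revalidate it before use, which this sketch does not attempt.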

This patch uses locking (rather than RCU) for this operation, which
is guaranteed safe and no longer requires taking a reference on every
entry visited during the dump operation.
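
The locking idea can be modelled in userspace as follows. This is only
a sketch under stated assumptions: a pthread mutex stands in for
nf_conntrack_lock (it does not model the bottom-half disabling done by
spin_lock_bh), and struct table, struct entry, table_insert(),
table_remove() and table_dump() are hypothetical names, not kernel APIs.
Because the dump and the removal take the same lock, the dump never sees
an entry mid-removal and needs no per-entry reference.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NBUCKETS 4	/* hypothetical table size for this sketch */

struct entry {
	struct entry *next;
	int id;
};

struct table {
	pthread_mutex_t lock;		/* stands in for nf_conntrack_lock */
	struct entry *bucket[NBUCKETS];
};

/* Link an entry at the head of its bucket while holding the lock. */
static void table_insert(struct table *t, struct entry *e)
{
	unsigned int b = (unsigned int)e->id % NBUCKETS;

	pthread_mutex_lock(&t->lock);
	e->next = t->bucket[b];
	t->bucket[b] = e;
	pthread_mutex_unlock(&t->lock);
}

/* Unlink an entry under the same lock the dump uses.
 * Returns 1 if the entry was found and removed, 0 otherwise. */
static int table_remove(struct table *t, struct entry *e)
{
	struct entry **pp;
	int found = 0;

	pthread_mutex_lock(&t->lock);
	for (pp = &t->bucket[(unsigned int)e->id % NBUCKETS]; *pp;
	     pp = &(*pp)->next) {
		if (*pp == e) {
			*pp = e->next;
			e->next = NULL;
			found = 1;
			break;
		}
	}
	pthread_mutex_unlock(&t->lock);
	return found;
}

/* Copy up to 'cap' entry ids into 'out' while holding the lock for the
 * whole traversal. Returns the number of ids written. */
static size_t table_dump(struct table *t, int *out, size_t cap)
{
	size_t n = 0;
	unsigned int b;
	struct entry *e;

	assert(t != NULL && out != NULL);
	pthread_mutex_lock(&t->lock);
	for (b = 0; b < NBUCKETS; b++)
		for (e = t->bucket[b]; e && n < cap; e = e->next)
			out[n++] = e->id;
	pthread_mutex_unlock(&t->lock);
	return n;
}
```

The cost of this design is that writers are blocked for the duration of
a dump, which is the trade the patch makes in exchange for a traversal
that is simple to reason about.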

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Eric Dumazet Nov. 27, 2010, 6:32 a.m. UTC | #1
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>


stephen hemminger Nov. 30, 2010, 5:28 p.m. UTC | #2
This should go in net-2.6 and stable for 2.6.32, 2.6.35, and 2.6.36.

Pablo Neira Ayuso Jan. 9, 2011, 9:32 p.m. UTC | #3
I have put this in my tree:

http://1984.lsi.us.es/git/?p=net-2.6/.git;a=summary

I'll pass it to David for -stable inclusion.

Patch

--- a/net/netfilter/nf_conntrack_netlink.c	2010-11-25 21:49:11.401158365 -0800
+++ b/net/netfilter/nf_conntrack_netlink.c	2010-11-25 22:18:08.164421697 -0800
@@ -642,25 +642,23 @@  ctnetlink_dump_table(struct sk_buff *skb
 	struct nfgenmsg *nfmsg = nlmsg_data(cb->nlh);
 	u_int8_t l3proto = nfmsg->nfgen_family;
 
-	rcu_read_lock();
+	spin_lock_bh(&nf_conntrack_lock);
 	last = (struct nf_conn *)cb->args[1];
 	for (; cb->args[0] < net->ct.htable_size; cb->args[0]++) {
 restart:
-		hlist_nulls_for_each_entry_rcu(h, n, &net->ct.hash[cb->args[0]],
+		hlist_nulls_for_each_entry(h, n, &net->ct.hash[cb->args[0]],
 					 hnnode) {
 			if (NF_CT_DIRECTION(h) != IP_CT_DIR_ORIGINAL)
 				continue;
 			ct = nf_ct_tuplehash_to_ctrack(h);
-			if (!atomic_inc_not_zero(&ct->ct_general.use))
-				continue;
 			/* Dump entries of a given L3 protocol number.
 			 * If it is not specified, ie. l3proto == 0,
 			 * then dump everything. */
 			if (l3proto && nf_ct_l3num(ct) != l3proto)
-				goto releasect;
+				continue;
 			if (cb->args[1]) {
 				if (ct != last)
-					goto releasect;
+					continue;
 				cb->args[1] = 0;
 			}
 			if (ctnetlink_fill_info(skb, NETLINK_CB(cb->skb).pid,
@@ -678,8 +676,6 @@  restart:
 				if (acct)
 					memset(acct, 0, sizeof(struct nf_conn_counter[IP_CT_DIR_MAX]));
 			}
-releasect:
-		nf_ct_put(ct);
 		}
 		if (cb->args[1]) {
 			cb->args[1] = 0;
@@ -687,7 +683,7 @@  releasect:
 		}
 	}
 out:
-	rcu_read_unlock();
+	spin_unlock_bh(&nf_conntrack_lock);
 	if (last)
 		nf_ct_put(last);