
[nf] netfilter: nf_conntrack: fix endless loop on netns deletion

Message ID b41741db7ecfaabe74fbcacac9a628375217ee35.1435765328.git.daniel@iogearbox.net
State Not Applicable
Delegated to: Pablo Neira
Headers show

Commit Message

Daniel Borkmann July 1, 2015, 4:24 p.m. UTC
When adding connection tracking template rules to a netns, f.e. to
configure netfilter zones, the kernel will endlessly busy-loop as soon
as we try to delete the given netns in case there's at least one
template present. Minimal example:

  ip netns add foo
  ip netns exec foo iptables -t raw -A PREROUTING -d 1.2.3.4 -j CT --zone 1
  ip netns del foo

What happens is that when nf_ct_iterate_cleanup() is being called from
nf_conntrack_cleanup_net_list() for a provided netns, we always end up
with a net->ct.count > 0 and thus jump back to i_see_dead_people. We
don't get a soft-lockup as we still have a schedule() point, but the
serving CPU spins on 100% from that point onwards.

Since templates are normally allocated with nf_conntrack_alloc(), we
also bump net->ct.count, but they are never freed via nf_ct_put(). Thus,
when we delete a netns, we also need to check and free them from the
pcpu template list, so that the refcount can actually drop to 0 and
eventually move on with destroying the netns.

Therefore, we add a nf_ct_tmpls_cleanup() function, that is similar to
nf_ct_iterate_cleanup(), but which handles templates that got onto the
list via nf_conntrack_tmpl_insert(). Note that nf_ct_put() needs to be
done outside of the lock protecting the pcpu lists.

Fixes: 252b3e8c1bc0 ("netfilter: xt_CT: fix crash while destroy ct templates")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
---
 (I believe this should be the case since 252b3e8c1bc0.)

 net/netfilter/nf_conntrack_core.c | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

Comments

Florian Westphal July 1, 2015, 4:57 p.m. UTC | #1
Daniel Borkmann <daniel@iogearbox.net> wrote:
> When adding connection tracking template rules to a netns, f.e. to
> configure netfilter zones, the kernel will endlessly busy-loop as soon
> as we try to delete the given netns in case there's at least one
> template present. Minimal example:
> 
>   ip netns add foo
>   ip netns exec foo iptables -t raw -A PREROUTING -d 1.2.3.4 -j CT --zone 1
>   ip netns del foo

[..]

> +static struct nf_conn *get_next_tmpl(struct ct_pcpu *pcpu)
> +{
> +	struct nf_conntrack_tuple_hash *h;
> +	struct hlist_nulls_node *n;
> +	struct nf_conn *ct = NULL;
> +
> +	spin_lock_bh(&pcpu->lock);
> +	hlist_nulls_for_each_entry(h, n, &pcpu->tmpl, hnnode) {
> +		ct = nf_ct_tuplehash_to_ctrack(h);
> +		break;
> +	}
> +	spin_unlock_bh(&pcpu->lock);
> +
> +	return ct;
> +}
> +
> +static void nf_ct_tmpls_cleanup(struct net *net)
> +{
> +	int cpu;
> +
> +	for_each_possible_cpu(cpu) {
> +		struct ct_pcpu *pcpu = per_cpu_ptr(net->ct.pcpu_lists, cpu);
> +		struct nf_conn *ct;
> +
> +		while ((ct = get_next_tmpl(pcpu)) != NULL)
> +			nf_ct_put(ct);
> +	}
> +}

I was worried next call to nf_ct_tmpls_cleanup() might see same ct
again, thus putting it more than once.

But it seems safe, as it runs after a synchronize_net, i.e. the ct refcnt
should always be 1, and thus the nf_ct_put should result in invocation of
the destructor and removal from the template list.

Thanks Daniel!

Acked-by: Florian Westphal <fw@strlen.de>
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Daniel Borkmann July 1, 2015, 9:29 p.m. UTC | #2
On 07/01/2015 06:57 PM, Florian Westphal wrote:
> Daniel Borkmann <daniel@iogearbox.net> wrote:
>> When adding connection tracking template rules to a netns, f.e. to
>> configure netfilter zones, the kernel will endlessly busy-loop as soon
>> as we try to delete the given netns in case there's at least one
>> template present. Minimal example:
>>
>>    ip netns add foo
>>    ip netns exec foo iptables -t raw -A PREROUTING -d 1.2.3.4 -j CT --zone 1
>>    ip netns del foo
>
> [..]
...
> I was worried next call to nf_ct_tmpls_cleanup() might see same ct
> again, thus putting it more than once.
>
> But it seems safe, as it runs after a synchronize_net, i.e. the ct refcnt
> should always be 1, and thus the nf_ct_put should result in invocation of
> the destructor and removal from the template list.

Please drop this patch, it needs changes.

While debugging this further, I noticed the issue is actually a
different one than I originally thought: i.e. when the netns is
removed, the ct template is in fact being freed/ref-dropped via
xt_ct_tg_destroy(), but that happens at a later stage, after
nf_conntrack_cleanup_net_list(), where we test for net->ct.count.

Given that nf_conntrack_cleanup_net_list() tears down all the per-net
ct infrastructure, the template release cannot be deferred until
xt_ct_tg_destroy().

Will try to find a different solution.

Cheers,
Daniel

Patch

diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 13fad86..dec0b3a 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -1409,6 +1409,35 @@  void nf_ct_iterate_cleanup(struct net *net,
 }
 EXPORT_SYMBOL_GPL(nf_ct_iterate_cleanup);
 
+static struct nf_conn *get_next_tmpl(struct ct_pcpu *pcpu)
+{
+	struct nf_conntrack_tuple_hash *h;
+	struct hlist_nulls_node *n;
+	struct nf_conn *ct = NULL;
+
+	spin_lock_bh(&pcpu->lock);
+	hlist_nulls_for_each_entry(h, n, &pcpu->tmpl, hnnode) {
+		ct = nf_ct_tuplehash_to_ctrack(h);
+		break;
+	}
+	spin_unlock_bh(&pcpu->lock);
+
+	return ct;
+}
+
+static void nf_ct_tmpls_cleanup(struct net *net)
+{
+	int cpu;
+
+	for_each_possible_cpu(cpu) {
+		struct ct_pcpu *pcpu = per_cpu_ptr(net->ct.pcpu_lists, cpu);
+		struct nf_conn *ct;
+
+		while ((ct = get_next_tmpl(pcpu)) != NULL)
+			nf_ct_put(ct);
+	}
+}
+
 static int kill_all(struct nf_conn *i, void *data)
 {
 	return 1;
@@ -1488,6 +1517,8 @@  i_see_dead_people:
 	busy = 0;
 	list_for_each_entry(net, net_exit_list, exit_list) {
 		nf_ct_iterate_cleanup(net, kill_all, NULL, 0, 0);
+		nf_ct_tmpls_cleanup(net);
+
 		if (atomic_read(&net->ct.count) != 0)
 			busy = 1;
 	}