Message ID | b41741db7ecfaabe74fbcacac9a628375217ee35.1435765328.git.daniel@iogearbox.net |
---|---|
State | Not Applicable |
Delegated to: | Pablo Neira |
Headers | show |
Daniel Borkmann <daniel@iogearbox.net> wrote: > When adding connection tracking template rules to a netns, f.e. to > configure netfilter zones, the kernel will endlessly busy-loop as soon > as we try to delete the given netns in case there's at least one > template present. Minimal example: > > ip netns add foo > ip netns exec foo iptables -t raw -A PREROUTING -d 1.2.3.4 -j CT --zone 1 > ip netns del foo [..] > +static struct nf_conn *get_next_tmpl(struct ct_pcpu *pcpu) > +{ > + struct nf_conntrack_tuple_hash *h; > + struct hlist_nulls_node *n; > + struct nf_conn *ct = NULL; > + > + spin_lock_bh(&pcpu->lock); > + hlist_nulls_for_each_entry(h, n, &pcpu->tmpl, hnnode) { > + ct = nf_ct_tuplehash_to_ctrack(h); > + break; > + } > + spin_unlock_bh(&pcpu->lock); > + > + return ct; > +} > + > +static void nf_ct_tmpls_cleanup(struct net *net) > +{ > + int cpu; > + > + for_each_possible_cpu(cpu) { > + struct ct_pcpu *pcpu = per_cpu_ptr(net->ct.pcpu_lists, cpu); > + struct nf_conn *ct; > + > + while ((ct = get_next_tmpl(pcpu)) != NULL) > + nf_ct_put(ct); > + } > +} I was worried next call to nf_ct_tmpls_cleanup() might see same ct again, thus putting it more than once. But it seems safe as it runs after a synchronize_net, i.e. ct refcnt should always be 1, and thus the nf_ct_put should result in invocation of destructor & removal from tmplate list. Thanks Daniel! Acked-by: Florian Westpha <fw@strlen.de> -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 07/01/2015 06:57 PM, Florian Westphal wrote: > Daniel Borkmann <daniel@iogearbox.net> wrote: >> When adding connection tracking template rules to a netns, f.e. to >> configure netfilter zones, the kernel will endlessly busy-loop as soon >> as we try to delete the given netns in case there's at least one >> template present. Minimal example: >> >> ip netns add foo >> ip netns exec foo iptables -t raw -A PREROUTING -d 1.2.3.4 -j CT --zone 1 >> ip netns del foo > > [..] ... > I was worried next call to nf_ct_tmpls_cleanup() might see same ct > again, thus putting it more than once. > > But it seems safe as it runs after a synchronize_net, i.e. ct refcnt > should always be 1, and thus the nf_ct_put should result in invocation of > destructor & removal from tmplate list. Please drop this patch, it needs changes. While debugging this further, I noticed the issue seems actually a different one that I thought it was originally: I.e. when the netns is removed, the ct template is in fact being freed/ref-dropped via xt_ct_tg_destroy(), but that happens at a later stage after the nf_conntrack_cleanup_net_list(), where we test for net->ct.count. Given that in nf_conntrack_cleanup_net_list() we tear down all the per net ct infrastructure, they cannot be deferred until xt_ct_tg_destroy(). Will try to find a different solution. Cheers, Daniel -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c index 13fad86..dec0b3a 100644 --- a/net/netfilter/nf_conntrack_core.c +++ b/net/netfilter/nf_conntrack_core.c @@ -1409,6 +1409,35 @@ void nf_ct_iterate_cleanup(struct net *net, } EXPORT_SYMBOL_GPL(nf_ct_iterate_cleanup); +static struct nf_conn *get_next_tmpl(struct ct_pcpu *pcpu) +{ + struct nf_conntrack_tuple_hash *h; + struct hlist_nulls_node *n; + struct nf_conn *ct = NULL; + + spin_lock_bh(&pcpu->lock); + hlist_nulls_for_each_entry(h, n, &pcpu->tmpl, hnnode) { + ct = nf_ct_tuplehash_to_ctrack(h); + break; + } + spin_unlock_bh(&pcpu->lock); + + return ct; +} + +static void nf_ct_tmpls_cleanup(struct net *net) +{ + int cpu; + + for_each_possible_cpu(cpu) { + struct ct_pcpu *pcpu = per_cpu_ptr(net->ct.pcpu_lists, cpu); + struct nf_conn *ct; + + while ((ct = get_next_tmpl(pcpu)) != NULL) + nf_ct_put(ct); + } +} + static int kill_all(struct nf_conn *i, void *data) { return 1; @@ -1488,6 +1517,8 @@ i_see_dead_people: busy = 0; list_for_each_entry(net, net_exit_list, exit_list) { nf_ct_iterate_cleanup(net, kill_all, NULL, 0, 0); + nf_ct_tmpls_cleanup(net); + if (atomic_read(&net->ct.count) != 0) busy = 1; }
When adding connection tracking template rules to a netns, f.e. to configure netfilter zones, the kernel will endlessly busy-loop as soon as we try to delete the given netns in case there's at least one template present. Minimal example: ip netns add foo ip netns exec foo iptables -t raw -A PREROUTING -d 1.2.3.4 -j CT --zone 1 ip netns del foo What happens is that when nf_ct_iterate_cleanup() is being called from nf_conntrack_cleanup_net_list() for a provided netns, we always end up with a net->ct.count > 0 and thus jump back to i_see_dead_people. We don't get a soft-lockup as we still have a schedule() point, but the serving CPU spins on 100% from that point onwards. Since templates are normally allocated with nf_conntrack_alloc(), we also bump net->ct.count, but they are never freed via nf_ct_put(). Thus, when we delete a netns, we also need to check and free them from the pcpu template list, so that the refcount can actually drop to 0 and eventually move on with destroying the netns. Therefore, we add a nf_ct_tmpls_cleanup() function, that is similar to nf_ct_iterate_cleanup(), but which handles templates that got onto the list via nf_conntrack_tmpl_insert(). Note that nf_ct_put() needs to be done outside of the lock protecting the pcpu lists. Fixes: 252b3e8c1bc0 ("netfilter: xt_CT: fix crash while destroy ct templates") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> --- (I believe this should be the case since 252b3e8c1bc0.) net/netfilter/nf_conntrack_core.c | 31 +++++++++++++++++++++++++++++++ 1 file changed, 31 insertions(+)