Message ID | 20220408125837.221673-2-pablo@netfilter.org |
---|---|
State | RFC |
Delegated to: | Pablo Neira |
Headers | show |
Series | [nf-next,RFC,1/2] netfilter: conntrack: add nf_ct_iter_data object for nf_ct_iterate_cleanup*() | expand |
Pablo Neira Ayuso <pablo@netfilter.org> wrote: > 70e9942f17a6 ("netfilter: nf_conntrack: make event callback registration > per-netns") introduced a per-netns callback for events to workaround a > crash when delivering conntrack events on a stale per-netns nfnetlink > kernel socket. > > This patch adds a new flag to the nf_ct_iter_data object to skip event > delivery from the netns cleanup path to address this issue. > > Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> > --- > compiled tested only. > @Florian: Maybe this helps to remove the per-netns nf_conntrack_event_cb > callback without having to update nfnetlink to deal with this corner case? Old crash recipe is (from your changelog of the 'make it pernet' change): 0) make sure nf_conntrack_netlink and nf_conntrack_ipv4 are loaded. 1) container is started. 2) connect to it via lxc-console. 3) generate some traffic with the container to create some conntrack entries in its table. 4) stop the container: you hit one oops because the conntrack table cleanup tries to report the destroy event to user-space but the per-netns nfnetlink socket has already gone (as the nfnetlink socket is per-netns but event callback registration is global). Pernet exit handlers are called in reverse order of the module load order, so normally this means: ctnetlink exit handlers nfnetlink_net_exit_batch, removes nfnl socket nf_conntrack_pernet_exit(), removes entries, Because callback is pernet atm this prevents crash after nfntlink sk has been closed. If thats no longer the case, we need some other way to suppress calls with stale nfnl sk. With the proposed patch series its still possible that we end up in nfnetlink via the ctnl event handler. E.g. gc worker could evit at the right time, or some kfree_skb call ends up dropping last reference. If you really dislike the nfnl changes I will respin without this and will keep the pernet ctnetlink callback.
On Fri, Apr 08, 2022 at 09:34:13PM +0200, Florian Westphal wrote: > Pablo Neira Ayuso <pablo@netfilter.org> wrote: > > 70e9942f17a6 ("netfilter: nf_conntrack: make event callback registration > > per-netns") introduced a per-netns callback for events to workaround a > > crash when delivering conntrack events on a stale per-netns nfnetlink > > kernel socket. > > > > This patch adds a new flag to the nf_ct_iter_data object to skip event > > delivery from the netns cleanup path to address this issue. > > > > Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> > > --- > > compiled tested only. > > @Florian: Maybe this helps to remove the per-netns nf_conntrack_event_cb > > callback without having to update nfnetlink to deal with this corner case? > > Old crash recipe is (from your changelog of the 'make it pernet' change): > > 0) make sure nf_conntrack_netlink and nf_conntrack_ipv4 are loaded. > 1) container is started. > 2) connect to it via lxc-console. > 3) generate some traffic with the container to create some conntrack > entries in its table. > 4) stop the container: you hit one oops because the conntrack table > cleanup tries to report the destroy event to user-space but the > per-netns nfnetlink socket has already gone (as the nfnetlink > socket is per-netns but event callback registration is global). > > Pernet exit handlers are called in reverse order of the module load > order, so normally this means: > > ctnetlink exit handlers > nfnetlink_net_exit_batch, removes nfnl socket > nf_conntrack_pernet_exit(), removes entries, > > Because callback is pernet atm this prevents crash after nfntlink sk > has been closed. > > If thats no longer the case, we need some other way to suppress > calls with stale nfnl sk. > > With the proposed patch series its still possible that we end up > in nfnetlink via the ctnl event handler. > > E.g. gc worker could evit at the right time, or some kfree_skb call > ends up dropping last reference. > > If you really dislike the nfnl changes I will respin without this > and will keep the pernet ctnetlink callback. OK, my patch is not covering all the possible cases then. Probably we can remove the hooks from .pre_exit, then force a run of the garbage collector from there. Then .exit path skips event delivery as my patch does. This would allow to remove the per-netns callback workaround, and all would be handled from nf_conntrack instead?
Pablo Neira Ayuso <pablo@netfilter.org> wrote: > > If you really dislike the nfnl changes I will respin without this > > and will keep the pernet ctnetlink callback. > > OK, my patch is not covering all the possible cases then. > > Probably we can remove the hooks from .pre_exit, then force a run of > the garbage collector from there. Then .exit path skips event delivery > as my patch does. Hmm, sounds tricky, but doabble. > This would allow to remove the per-netns callback workaround, and all > would be handled from nf_conntrack instead? Ok. I will drop the pernet change from this patch set.
diff --git a/include/net/netfilter/nf_conntrack.h b/include/net/netfilter/nf_conntrack.h index 14dd6bbe360c..25687bb80a64 100644 --- a/include/net/netfilter/nf_conntrack.h +++ b/include/net/netfilter/nf_conntrack.h @@ -199,7 +199,12 @@ void nf_ct_netns_put(struct net *net, u8 nfproto); void *nf_ct_alloc_hashtable(unsigned int *sizep, int nulls); int nf_conntrack_hash_check_insert(struct nf_conn *ct); -bool nf_ct_delete(struct nf_conn *ct, u32 pid, int report); + +bool __nf_ct_delete(struct nf_conn *ct, u32 portid, int report, bool skip_event); +static inline bool nf_ct_delete(struct nf_conn *ct, u32 pid, int report) +{ + return __nf_ct_delete(ct, pid, report, false); +} bool nf_ct_get_tuplepr(const struct sk_buff *skb, unsigned int nhoff, u_int16_t l3num, struct net *net, @@ -244,6 +249,7 @@ struct nf_ct_iter_data { void *data; u32 portid; int report; + bool skip_event; }; /* Iterate over all conntracks: if iter returns true, it's deleted. */ diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c index 93c30c16bade..51d248ee28ca 100644 --- a/net/netfilter/nf_conntrack_core.c +++ b/net/netfilter/nf_conntrack_core.c @@ -687,7 +687,7 @@ static void nf_ct_delete_from_lists(struct nf_conn *ct) local_bh_enable(); } -bool nf_ct_delete(struct nf_conn *ct, u32 portid, int report) +bool __nf_ct_delete(struct nf_conn *ct, u32 portid, int report, bool skip_event) { struct nf_conn_tstamp *tstamp; struct net *net; @@ -704,6 +704,9 @@ bool nf_ct_delete(struct nf_conn *ct, u32 portid, int report) tstamp->stop -= jiffies_to_nsecs(-timeout); } + if (skip_event) + goto out; + if (nf_conntrack_event_report(IPCT_DESTROY, ct, portid, report) < 0) { /* destroy event was not delivered. nf_ct_put will @@ -717,6 +720,8 @@ bool nf_ct_delete(struct nf_conn *ct, u32 portid, int report) net = nf_ct_net(ct); if (nf_conntrack_ecache_dwork_pending(net)) nf_conntrack_ecache_work(net, NFCT_ECACHE_DESTROY_SENT); + +out: nf_ct_delete_from_lists(ct); nf_ct_put(ct); return true; @@ -2383,7 +2388,8 @@ static void nf_ct_iterate_cleanup(int (*iter)(struct nf_conn *i, void *data), while ((ct = get_next_corpse(iter, iter_data, &bucket)) != NULL) { /* Time to push up daises... */ - nf_ct_delete(ct, iter_data->portid, iter_data->report); + __nf_ct_delete(ct, iter_data->portid, iter_data->report, + iter_data->skip_event); nf_ct_put(ct); cond_resched(); } @@ -2532,7 +2538,9 @@ void nf_conntrack_cleanup_net(struct net *net) void nf_conntrack_cleanup_net_list(struct list_head *net_exit_list) { - struct nf_ct_iter_data iter_data = {}; + struct nf_ct_iter_data iter_data = { + .skip_event = true, + }; struct net *net; int busy;
70e9942f17a6 ("netfilter: nf_conntrack: make event callback registration per-netns") introduced a per-netns callback for events to workaround a crash when delivering conntrack events on a stale per-netns nfnetlink kernel socket. This patch adds a new flag to the nf_ct_iter_data object to skip event delivery from the netns cleanup path to address this issue. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> --- compiled tested only. @Florian: Maybe this helps to remove the per-netns nf_conntrack_event_cb callback without having to update nfnetlink to deal with this corner case? include/net/netfilter/nf_conntrack.h | 8 +++++++- net/netfilter/nf_conntrack_core.c | 14 +++++++++++--- 2 files changed, 18 insertions(+), 4 deletions(-)