diff mbox series

[nf-next,RFC,2/2] netfilter: conntrack: skip event delivery for the netns exit path

Message ID 20220408125837.221673-2-pablo@netfilter.org
State RFC
Delegated to: Pablo Neira
Headers show
Series [nf-next,RFC,1/2] netfilter: conntrack: add nf_ct_iter_data object for nf_ct_iterate_cleanup*() | expand

Commit Message

Pablo Neira Ayuso April 8, 2022, 12:58 p.m. UTC
70e9942f17a6 ("netfilter: nf_conntrack: make event callback registration
per-netns") introduced a per-netns callback for events to workaround a
crash when delivering conntrack events on a stale per-netns nfnetlink
kernel socket.

This patch adds a new flag to the nf_ct_iter_data object to skip event
delivery from the netns cleanup path to address this issue.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
compiled tested only.

@Florian: Maybe this helps to remove the per-netns nf_conntrack_event_cb
callback without having to update nfnetlink to deal with this corner case?

 include/net/netfilter/nf_conntrack.h |  8 +++++++-
 net/netfilter/nf_conntrack_core.c    | 14 +++++++++++---
 2 files changed, 18 insertions(+), 4 deletions(-)

Comments

Florian Westphal April 8, 2022, 7:34 p.m. UTC | #1
Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> 70e9942f17a6 ("netfilter: nf_conntrack: make event callback registration
> per-netns") introduced a per-netns callback for events to workaround a
> crash when delivering conntrack events on a stale per-netns nfnetlink
> kernel socket.
> 
> This patch adds a new flag to the nf_ct_iter_data object to skip event
> delivery from the netns cleanup path to address this issue.
> 
> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
> ---
> compiled tested only.
> @Florian: Maybe this helps to remove the per-netns nf_conntrack_event_cb
> callback without having to update nfnetlink to deal with this corner case?

Old crash recipe is (from your changelog of the 'make it pernet' change):

 0) make sure nf_conntrack_netlink and nf_conntrack_ipv4 are loaded.
 1) container is started.
 2) connect to it via lxc-console.
 3) generate some traffic with the container to create some conntrack
    entries in its table.
 4) stop the container: you hit one oops because the conntrack table
    cleanup tries to report the destroy event to user-space but the
    per-netns nfnetlink socket has already gone (as the nfnetlink
    socket is per-netns but event callback registration is global).

Pernet exit handlers are called in reverse order of the module load
order, so normally this means:

ctnetlink exit handlers
nfnetlink_net_exit_batch, removes nfnl socket
nf_conntrack_pernet_exit(), removes entries,

Because callback is pernet atm this prevents crash after nfntlink sk
has been closed.

If thats no longer the case, we need some other way to suppress
calls with stale nfnl sk.

With the proposed patch series its still possible that we end up
in nfnetlink via  the ctnl event handler.

E.g. gc worker could evit at the right time, or some kfree_skb call
ends up dropping last reference.

If you really dislike the nfnl changes I will respin without this
and will keep the pernet ctnetlink callback.
Pablo Neira Ayuso April 10, 2022, 3:23 p.m. UTC | #2
On Fri, Apr 08, 2022 at 09:34:13PM +0200, Florian Westphal wrote:
> Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > 70e9942f17a6 ("netfilter: nf_conntrack: make event callback registration
> > per-netns") introduced a per-netns callback for events to workaround a
> > crash when delivering conntrack events on a stale per-netns nfnetlink
> > kernel socket.
> > 
> > This patch adds a new flag to the nf_ct_iter_data object to skip event
> > delivery from the netns cleanup path to address this issue.
> > 
> > Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
> > ---
> > compiled tested only.
> > @Florian: Maybe this helps to remove the per-netns nf_conntrack_event_cb
> > callback without having to update nfnetlink to deal with this corner case?
> 
> Old crash recipe is (from your changelog of the 'make it pernet' change):
> 
>  0) make sure nf_conntrack_netlink and nf_conntrack_ipv4 are loaded.
>  1) container is started.
>  2) connect to it via lxc-console.
>  3) generate some traffic with the container to create some conntrack
>     entries in its table.
>  4) stop the container: you hit one oops because the conntrack table
>     cleanup tries to report the destroy event to user-space but the
>     per-netns nfnetlink socket has already gone (as the nfnetlink
>     socket is per-netns but event callback registration is global).
> 
> Pernet exit handlers are called in reverse order of the module load
> order, so normally this means:
> 
> ctnetlink exit handlers
> nfnetlink_net_exit_batch, removes nfnl socket
> nf_conntrack_pernet_exit(), removes entries,
> 
> Because callback is pernet atm this prevents crash after nfntlink sk
> has been closed.
> 
> If thats no longer the case, we need some other way to suppress
> calls with stale nfnl sk.
> 
> With the proposed patch series its still possible that we end up
> in nfnetlink via  the ctnl event handler.
> 
> E.g. gc worker could evit at the right time, or some kfree_skb call
> ends up dropping last reference.
> 
> If you really dislike the nfnl changes I will respin without this
> and will keep the pernet ctnetlink callback.

OK, my patch is not covering all the possible cases then.

Probably we can remove the hooks from .pre_exit, then force a run of
the garbage collector from there. Then .exit path skips event delivery
as my patch does.

This would allow to remove the per-netns callback workaround, and all
would be handled from nf_conntrack instead?
Florian Westphal April 10, 2022, 3:38 p.m. UTC | #3
Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > If you really dislike the nfnl changes I will respin without this
> > and will keep the pernet ctnetlink callback.
> 
> OK, my patch is not covering all the possible cases then.
> 
> Probably we can remove the hooks from .pre_exit, then force a run of
> the garbage collector from there. Then .exit path skips event delivery
> as my patch does.

Hmm, sounds tricky, but doabble.

> This would allow to remove the per-netns callback workaround, and all
> would be handled from nf_conntrack instead?

Ok. I will drop the pernet change from this patch set.
diff mbox series

Patch

diff --git a/include/net/netfilter/nf_conntrack.h b/include/net/netfilter/nf_conntrack.h
index 14dd6bbe360c..25687bb80a64 100644
--- a/include/net/netfilter/nf_conntrack.h
+++ b/include/net/netfilter/nf_conntrack.h
@@ -199,7 +199,12 @@  void nf_ct_netns_put(struct net *net, u8 nfproto);
 void *nf_ct_alloc_hashtable(unsigned int *sizep, int nulls);
 
 int nf_conntrack_hash_check_insert(struct nf_conn *ct);
-bool nf_ct_delete(struct nf_conn *ct, u32 pid, int report);
+
+bool __nf_ct_delete(struct nf_conn *ct, u32 portid, int report, bool skip_event);
+static inline bool nf_ct_delete(struct nf_conn *ct, u32 pid, int report)
+{
+	return __nf_ct_delete(ct, pid, report, false);
+}
 
 bool nf_ct_get_tuplepr(const struct sk_buff *skb, unsigned int nhoff,
 		       u_int16_t l3num, struct net *net,
@@ -244,6 +249,7 @@  struct nf_ct_iter_data {
 	void *data;
 	u32 portid;
 	int report;
+	bool skip_event;
 };
 
 /* Iterate over all conntracks: if iter returns true, it's deleted. */
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 93c30c16bade..51d248ee28ca 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -687,7 +687,7 @@  static void nf_ct_delete_from_lists(struct nf_conn *ct)
 	local_bh_enable();
 }
 
-bool nf_ct_delete(struct nf_conn *ct, u32 portid, int report)
+bool __nf_ct_delete(struct nf_conn *ct, u32 portid, int report, bool skip_event)
 {
 	struct nf_conn_tstamp *tstamp;
 	struct net *net;
@@ -704,6 +704,9 @@  bool nf_ct_delete(struct nf_conn *ct, u32 portid, int report)
 			tstamp->stop -= jiffies_to_nsecs(-timeout);
 	}
 
+	if (skip_event)
+		goto out;
+
 	if (nf_conntrack_event_report(IPCT_DESTROY, ct,
 				    portid, report) < 0) {
 		/* destroy event was not delivered. nf_ct_put will
@@ -717,6 +720,8 @@  bool nf_ct_delete(struct nf_conn *ct, u32 portid, int report)
 	net = nf_ct_net(ct);
 	if (nf_conntrack_ecache_dwork_pending(net))
 		nf_conntrack_ecache_work(net, NFCT_ECACHE_DESTROY_SENT);
+
+out:
 	nf_ct_delete_from_lists(ct);
 	nf_ct_put(ct);
 	return true;
@@ -2383,7 +2388,8 @@  static void nf_ct_iterate_cleanup(int (*iter)(struct nf_conn *i, void *data),
 	while ((ct = get_next_corpse(iter, iter_data, &bucket)) != NULL) {
 		/* Time to push up daises... */
 
-		nf_ct_delete(ct, iter_data->portid, iter_data->report);
+		__nf_ct_delete(ct, iter_data->portid, iter_data->report,
+			       iter_data->skip_event);
 		nf_ct_put(ct);
 		cond_resched();
 	}
@@ -2532,7 +2538,9 @@  void nf_conntrack_cleanup_net(struct net *net)
 
 void nf_conntrack_cleanup_net_list(struct list_head *net_exit_list)
 {
-	struct nf_ct_iter_data iter_data = {};
+	struct nf_ct_iter_data iter_data = {
+		.skip_event	= true,
+	};
 	struct net *net;
 	int busy;