diff mbox series

[SRU,F:linux-bluefield,v1,1/2] netfilter: conntrack: remove offload_pickup sysctl again

Message ID 1653570184-5304-1-git-send-email-bodong@nvidia.com
State New
Headers show
Series [SRU,F:linux-bluefield,v1,1/2] netfilter: conntrack: remove offload_pickup sysctl again | expand

Commit Message

Bodong Wang May 26, 2022, 1:03 p.m. UTC
From: Florian Westphal <fw@strlen.de>

BugLink: https://bugs.launchpad.net/bugs/1975820

These two sysctls were added because the hardcoded defaults (2 minutes,
tcp, 30 seconds, udp) turned out to be too low for some setups.

They appeared in 5.14-rc1 so it should be fine to remove it again.

Marcelo convinced me that there should be no difference between a flow
that was offloaded vs. a flow that was not wrt. timeout handling.
Thus the default is changed to those for TCP established and UDP stream,
5 days and 120 seconds, respectively.

Marcelo also suggested to account for the timeout value used for the
offloading, this avoids increase beyond the value in the conntrack-sysctl
and will also instantly expire the conntrack entry with altered sysctls.

Example:
   nf_conntrack_udp_timeout_stream=60
   nf_flowtable_udp_timeout=60

This will remove offloaded udp flows after one minute, rather than two.

An earlier version of this patch also cleared the ASSURED bit to
allow nf_conntrack to evict the entry via early_drop (i.e., table full).
However, it looks like we can safely assume that connection timed out
via HW is still in established state, so this isn't needed.

Quoting Oz:
 [..] the hardware sends all packets with a set FIN flags to sw.
 [..] Connections that are aged in hardware are expected to be in the
 established state.

In case it turns out that back-to-sw-path transition can occur for
'dodgy' connections too (e.g., one side disappeared while software-path
would have been in RETRANS timeout), we can adjust this later.

Cc: Oz Shlomo <ozsh@nvidia.com>
Cc: Paul Blakey <paulb@nvidia.com>
Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
(backported from commit 4592ee7f525c4683ec9e290381601fdee50ae110)
Signed-off-by: Bodong Wang <bodong@nvidia.com>
[bodong: remove doc file]

Conflicts:
	Documentation/networking/nf_conntrack-sysctl.rst
---
 include/net/netns/conntrack.h           |  2 --
 net/netfilter/nf_conntrack_proto_tcp.c  |  1 -
 net/netfilter/nf_conntrack_proto_udp.c  |  1 -
 net/netfilter/nf_conntrack_standalone.c | 16 ----------------
 net/netfilter/nf_flow_table_core.c      | 11 ++++++++---
 5 files changed, 8 insertions(+), 23 deletions(-)

Comments

Tim Gardner June 16, 2022, 3:10 p.m. UTC | #1
On 5/26/22 07:03, Bodong Wang wrote:
> From: Florian Westphal <fw@strlen.de>
> 
> BugLink: https://bugs.launchpad.net/bugs/1975820
> 
> These two sysctls were added because the hardcoded defaults (2 minutes,
> tcp, 30 seconds, udp) turned out to be too low for some setups.
> 
> They appeared in 5.14-rc1 so it should be fine to remove it again.
> 
> Marcelo convinced me that there should be no difference between a flow
> that was offloaded vs. a flow that was not wrt. timeout handling.
> Thus the default is changed to those for TCP established and UDP stream,
> 5 days and 120 seconds, respectively.
> 
> Marcelo also suggested to account for the timeout value used for the
> offloading, this avoids increase beyond the value in the conntrack-sysctl
> and will also instantly expire the conntrack entry with altered sysctls.
> 
> Example:
>     nf_conntrack_udp_timeout_stream=60
>     nf_flowtable_udp_timeout=60
> 
> This will remove offloaded udp flows after one minute, rather than two.
> 
> An earlier version of this patch also cleared the ASSURED bit to
> allow nf_conntrack to evict the entry via early_drop (i.e., table full).
> However, it looks like we can safely assume that connection timed out
> via HW is still in established state, so this isn't needed.
> 
> Quoting Oz:
>   [..] the hardware sends all packets with a set FIN flags to sw.
>   [..] Connections that are aged in hardware are expected to be in the
>   established state.
> 
> In case it turns out that back-to-sw-path transition can occur for
> 'dodgy' connections too (e.g., one side disappeared while software-path
> would have been in RETRANS timeout), we can adjust this later.
> 
> Cc: Oz Shlomo <ozsh@nvidia.com>
> Cc: Paul Blakey <paulb@nvidia.com>
> Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
> Signed-off-by: Florian Westphal <fw@strlen.de>
> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
> Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
> (backported from commit 4592ee7f525c4683ec9e290381601fdee50ae110)
> Signed-off-by: Bodong Wang <bodong@nvidia.com>
> [bodong: remove doc file]
> 
> Conflicts:
> 	Documentation/networking/nf_conntrack-sysctl.rst
> ---
>   include/net/netns/conntrack.h           |  2 --
>   net/netfilter/nf_conntrack_proto_tcp.c  |  1 -
>   net/netfilter/nf_conntrack_proto_udp.c  |  1 -
>   net/netfilter/nf_conntrack_standalone.c | 16 ----------------
>   net/netfilter/nf_flow_table_core.c      | 11 ++++++++---
>   5 files changed, 8 insertions(+), 23 deletions(-)
> 
> diff --git a/include/net/netns/conntrack.h b/include/net/netns/conntrack.h
> index 67bbaa6..9e3963c8 100644
> --- a/include/net/netns/conntrack.h
> +++ b/include/net/netns/conntrack.h
> @@ -29,7 +29,6 @@ struct nf_tcp_net {
>   	int tcp_max_retrans;
>   #if IS_ENABLED(CONFIG_NF_FLOW_TABLE)
>   	unsigned int offload_timeout;
> -	unsigned int offload_pickup;
>   #endif
>   };
>   
> @@ -43,7 +42,6 @@ struct nf_udp_net {
>   	unsigned int timeouts[UDP_CT_MAX];
>   #if IS_ENABLED(CONFIG_NF_FLOW_TABLE)
>   	unsigned int offload_timeout;
> -	unsigned int offload_pickup;
>   #endif
>   };
>   
> diff --git a/net/netfilter/nf_conntrack_proto_tcp.c b/net/netfilter/nf_conntrack_proto_tcp.c
> index d8e554c..0e51fb7 100644
> --- a/net/netfilter/nf_conntrack_proto_tcp.c
> +++ b/net/netfilter/nf_conntrack_proto_tcp.c
> @@ -1435,7 +1435,6 @@ void nf_conntrack_tcp_init_net(struct net *net)
>   
>   #if IS_ENABLED(CONFIG_NF_FLOW_TABLE)
>   	tn->offload_timeout = 30 * HZ;
> -	tn->offload_pickup = 120 * HZ;
>   #endif
>   }
>   
> diff --git a/net/netfilter/nf_conntrack_proto_udp.c b/net/netfilter/nf_conntrack_proto_udp.c
> index b3e135e..ee336d7 100644
> --- a/net/netfilter/nf_conntrack_proto_udp.c
> +++ b/net/netfilter/nf_conntrack_proto_udp.c
> @@ -282,7 +282,6 @@ void nf_conntrack_udp_init_net(struct net *net)
>   
>   #if IS_ENABLED(CONFIG_NF_FLOW_TABLE)
>   	un->offload_timeout = 30 * HZ;
> -	un->offload_pickup = 30 * HZ;
>   #endif
>   }
>   
> diff --git a/net/netfilter/nf_conntrack_standalone.c b/net/netfilter/nf_conntrack_standalone.c
> index ac8f12b..1e78ad8 100644
> --- a/net/netfilter/nf_conntrack_standalone.c
> +++ b/net/netfilter/nf_conntrack_standalone.c
> @@ -569,7 +569,6 @@ enum nf_ct_sysctl_index {
>   	NF_SYSCTL_CT_PROTO_TIMEOUT_TCP_UNACK,
>   #if IS_ENABLED(CONFIG_NF_FLOW_TABLE)
>   	NF_SYSCTL_CT_PROTO_TIMEOUT_TCP_OFFLOAD,
> -	NF_SYSCTL_CT_PROTO_TIMEOUT_TCP_OFFLOAD_PICKUP,
>   #endif
>   	NF_SYSCTL_CT_PROTO_TCP_LOOSE,
>   	NF_SYSCTL_CT_PROTO_TCP_LIBERAL,
> @@ -578,7 +577,6 @@ enum nf_ct_sysctl_index {
>   	NF_SYSCTL_CT_PROTO_TIMEOUT_UDP_STREAM,
>   #if IS_ENABLED(CONFIG_NF_FLOW_TABLE)
>   	NF_SYSCTL_CT_PROTO_TIMEOUT_UDP_OFFLOAD,
> -	NF_SYSCTL_CT_PROTO_TIMEOUT_UDP_OFFLOAD_PICKUP,
>   #endif
>   	NF_SYSCTL_CT_PROTO_TIMEOUT_ICMP,
>   	NF_SYSCTL_CT_PROTO_TIMEOUT_ICMPV6,
> @@ -773,12 +771,6 @@ enum nf_ct_sysctl_index {
>   		.mode		= 0644,
>   		.proc_handler	= proc_dointvec_jiffies,
>   	},
> -	[NF_SYSCTL_CT_PROTO_TIMEOUT_TCP_OFFLOAD_PICKUP] = {
> -		.procname	= "nf_flowtable_tcp_pickup",
> -		.maxlen		= sizeof(unsigned int),
> -		.mode		= 0644,
> -		.proc_handler	= proc_dointvec_jiffies,
> -	},
>   #endif
>   	[NF_SYSCTL_CT_PROTO_TCP_LOOSE] = {
>   		.procname	= "nf_conntrack_tcp_loose",
> @@ -821,12 +813,6 @@ enum nf_ct_sysctl_index {
>   		.mode		= 0644,
>   		.proc_handler	= proc_dointvec_jiffies,
>   	},
> -	[NF_SYSCTL_CT_PROTO_TIMEOUT_UDP_OFFLOAD_PICKUP] = {
> -		.procname	= "nf_flowtable_udp_pickup",
> -		.maxlen		= sizeof(unsigned int),
> -		.mode		= 0644,
> -		.proc_handler	= proc_dointvec_jiffies,
> -	},
>   #endif
>   	[NF_SYSCTL_CT_PROTO_TIMEOUT_ICMP] = {
>   		.procname	= "nf_conntrack_icmp_timeout",
> @@ -1006,7 +992,6 @@ static void nf_conntrack_standalone_init_tcp_sysctl(struct net *net,
>   
>   #if IS_ENABLED(CONFIG_NF_FLOW_TABLE)
>   	table[NF_SYSCTL_CT_PROTO_TIMEOUT_TCP_OFFLOAD].data = &tn->offload_timeout;
> -	table[NF_SYSCTL_CT_PROTO_TIMEOUT_TCP_OFFLOAD_PICKUP].data = &tn->offload_pickup;
>   #endif
>   
>   }
> @@ -1098,7 +1083,6 @@ static int nf_conntrack_standalone_init_sysctl(struct net *net)
>   	table[NF_SYSCTL_CT_PROTO_TIMEOUT_UDP_STREAM].data = &un->timeouts[UDP_CT_REPLIED];
>   #if IS_ENABLED(CONFIG_NF_FLOW_TABLE)
>   	table[NF_SYSCTL_CT_PROTO_TIMEOUT_UDP_OFFLOAD].data = &un->offload_timeout;
> -	table[NF_SYSCTL_CT_PROTO_TIMEOUT_UDP_OFFLOAD_PICKUP].data = &un->offload_pickup;
>   #endif
>   
>   	nf_conntrack_standalone_init_tcp_sysctl(net, table);
> diff --git a/net/netfilter/nf_flow_table_core.c b/net/netfilter/nf_flow_table_core.c
> index 8ed2752..ba86bc0 100644
> --- a/net/netfilter/nf_flow_table_core.c
> +++ b/net/netfilter/nf_flow_table_core.c
> @@ -136,7 +136,7 @@ static void flow_offload_fixup_ct_timeout(struct nf_conn *ct)
>   	const struct nf_conntrack_l4proto *l4proto;
>   	struct net *net = nf_ct_net(ct);
>   	int l4num = nf_ct_protonum(ct);
> -	unsigned int timeout;
> +	s32 timeout;
>   
>   	l4proto = nf_ct_l4proto_find(l4num);
>   	if (!l4proto)
> @@ -145,15 +145,20 @@ static void flow_offload_fixup_ct_timeout(struct nf_conn *ct)
>   	if (l4num == IPPROTO_TCP) {
>   		struct nf_tcp_net *tn = nf_tcp_pernet(net);
>   
> -		timeout = tn->offload_pickup;
> +		timeout = tn->timeouts[TCP_CONNTRACK_ESTABLISHED];
> +		timeout -= tn->offload_timeout;
>   	} else if (l4num == IPPROTO_UDP) {
>   		struct nf_udp_net *tn = nf_udp_pernet(net);
>   
> -		timeout = tn->offload_pickup;
> +		timeout = tn->timeouts[UDP_CT_REPLIED];
> +		timeout -= tn->offload_timeout;
>   	} else {
>   		return;
>   	}
>   
> +	if (timeout < 0)
> +		timeout = 0;
> +
>   	if (nf_flow_timeout_delta(ct->timeout) > (__s32)timeout)
>   		ct->timeout = nfct_time_stamp + timeout;
>   }
Acked-by: Tim Gardner <tim.gardner@canonical.com>
Bodong Wang June 21, 2022, 2:57 p.m. UTC | #2
Hi Tim/Zach,

Seems like this patch series is ACKed but not merged to master-next. Could you please check?

Thanks,
Bodong

-----Original Message-----
From: Tim Gardner <tim.gardner@canonical.com> 
Sent: Thursday, June 16, 2022 10:10 AM
To: Bodong Wang <bodong@nvidia.com>; kernel-team@lists.ubuntu.com
Cc: Vladimir Sokolovsky <vlad@nvidia.com>; Maor Dickman <maord@nvidia.com>; Oz Shlomo <ozsh@nvidia.com>
Subject: ACK: [SRU][F:linux-bluefield][PATCH v1 1/2] netfilter: conntrack: remove offload_pickup sysctl again

On 5/26/22 07:03, Bodong Wang wrote:
> From: Florian Westphal <fw@strlen.de>
> 
> BugLink: https://bugs.launchpad.net/bugs/1975820
> 
> These two sysctls were added because the hardcoded defaults (2 
> minutes, tcp, 30 seconds, udp) turned out to be too low for some setups.
> 
> They appeared in 5.14-rc1 so it should be fine to remove it again.
> 
> Marcelo convinced me that there should be no difference between a flow 
> that was offloaded vs. a flow that was not wrt. timeout handling.
> Thus the default is changed to those for TCP established and UDP 
> stream,
> 5 days and 120 seconds, respectively.
> 
> Marcelo also suggested to account for the timeout value used for the 
> offloading, this avoids increase beyond the value in the 
> conntrack-sysctl and will also instantly expire the conntrack entry with altered sysctls.
> 
> Example:
>     nf_conntrack_udp_timeout_stream=60
>     nf_flowtable_udp_timeout=60
> 
> This will remove offloaded udp flows after one minute, rather than two.
> 
> An earlier version of this patch also cleared the ASSURED bit to allow 
> nf_conntrack to evict the entry via early_drop (i.e., table full).
> However, it looks like we can safely assume that connection timed out 
> via HW is still in established state, so this isn't needed.
> 
> Quoting Oz:
>   [..] the hardware sends all packets with a set FIN flags to sw.
>   [..] Connections that are aged in hardware are expected to be in the
>   established state.
> 
> In case it turns out that back-to-sw-path transition can occur for 
> 'dodgy' connections too (e.g., one side disappeared while 
> software-path would have been in RETRANS timeout), we can adjust this later.
> 
> Cc: Oz Shlomo <ozsh@nvidia.com>
> Cc: Paul Blakey <paulb@nvidia.com>
> Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
> Signed-off-by: Florian Westphal <fw@strlen.de>
> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
> Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> (backported 
> from commit 4592ee7f525c4683ec9e290381601fdee50ae110)
> Signed-off-by: Bodong Wang <bodong@nvidia.com>
> [bodong: remove doc file]
> 
> Conflicts:
> 	Documentation/networking/nf_conntrack-sysctl.rst
> ---
>   include/net/netns/conntrack.h           |  2 --
>   net/netfilter/nf_conntrack_proto_tcp.c  |  1 -
>   net/netfilter/nf_conntrack_proto_udp.c  |  1 -
>   net/netfilter/nf_conntrack_standalone.c | 16 ----------------
>   net/netfilter/nf_flow_table_core.c      | 11 ++++++++---
>   5 files changed, 8 insertions(+), 23 deletions(-)
> 
> diff --git a/include/net/netns/conntrack.h 
> b/include/net/netns/conntrack.h index 67bbaa6..9e3963c8 100644
> --- a/include/net/netns/conntrack.h
> +++ b/include/net/netns/conntrack.h
> @@ -29,7 +29,6 @@ struct nf_tcp_net {
>   	int tcp_max_retrans;
>   #if IS_ENABLED(CONFIG_NF_FLOW_TABLE)
>   	unsigned int offload_timeout;
> -	unsigned int offload_pickup;
>   #endif
>   };
>   
> @@ -43,7 +42,6 @@ struct nf_udp_net {
>   	unsigned int timeouts[UDP_CT_MAX];
>   #if IS_ENABLED(CONFIG_NF_FLOW_TABLE)
>   	unsigned int offload_timeout;
> -	unsigned int offload_pickup;
>   #endif
>   };
>   
> diff --git a/net/netfilter/nf_conntrack_proto_tcp.c 
> b/net/netfilter/nf_conntrack_proto_tcp.c
> index d8e554c..0e51fb7 100644
> --- a/net/netfilter/nf_conntrack_proto_tcp.c
> +++ b/net/netfilter/nf_conntrack_proto_tcp.c
> @@ -1435,7 +1435,6 @@ void nf_conntrack_tcp_init_net(struct net *net)
>   
>   #if IS_ENABLED(CONFIG_NF_FLOW_TABLE)
>   	tn->offload_timeout = 30 * HZ;
> -	tn->offload_pickup = 120 * HZ;
>   #endif
>   }
>   
> diff --git a/net/netfilter/nf_conntrack_proto_udp.c 
> b/net/netfilter/nf_conntrack_proto_udp.c
> index b3e135e..ee336d7 100644
> --- a/net/netfilter/nf_conntrack_proto_udp.c
> +++ b/net/netfilter/nf_conntrack_proto_udp.c
> @@ -282,7 +282,6 @@ void nf_conntrack_udp_init_net(struct net *net)
>   
>   #if IS_ENABLED(CONFIG_NF_FLOW_TABLE)
>   	un->offload_timeout = 30 * HZ;
> -	un->offload_pickup = 30 * HZ;
>   #endif
>   }
>   
> diff --git a/net/netfilter/nf_conntrack_standalone.c 
> b/net/netfilter/nf_conntrack_standalone.c
> index ac8f12b..1e78ad8 100644
> --- a/net/netfilter/nf_conntrack_standalone.c
> +++ b/net/netfilter/nf_conntrack_standalone.c
> @@ -569,7 +569,6 @@ enum nf_ct_sysctl_index {
>   	NF_SYSCTL_CT_PROTO_TIMEOUT_TCP_UNACK,
>   #if IS_ENABLED(CONFIG_NF_FLOW_TABLE)
>   	NF_SYSCTL_CT_PROTO_TIMEOUT_TCP_OFFLOAD,
> -	NF_SYSCTL_CT_PROTO_TIMEOUT_TCP_OFFLOAD_PICKUP,
>   #endif
>   	NF_SYSCTL_CT_PROTO_TCP_LOOSE,
>   	NF_SYSCTL_CT_PROTO_TCP_LIBERAL,
> @@ -578,7 +577,6 @@ enum nf_ct_sysctl_index {
>   	NF_SYSCTL_CT_PROTO_TIMEOUT_UDP_STREAM,
>   #if IS_ENABLED(CONFIG_NF_FLOW_TABLE)
>   	NF_SYSCTL_CT_PROTO_TIMEOUT_UDP_OFFLOAD,
> -	NF_SYSCTL_CT_PROTO_TIMEOUT_UDP_OFFLOAD_PICKUP,
>   #endif
>   	NF_SYSCTL_CT_PROTO_TIMEOUT_ICMP,
>   	NF_SYSCTL_CT_PROTO_TIMEOUT_ICMPV6,
> @@ -773,12 +771,6 @@ enum nf_ct_sysctl_index {
>   		.mode		= 0644,
>   		.proc_handler	= proc_dointvec_jiffies,
>   	},
> -	[NF_SYSCTL_CT_PROTO_TIMEOUT_TCP_OFFLOAD_PICKUP] = {
> -		.procname	= "nf_flowtable_tcp_pickup",
> -		.maxlen		= sizeof(unsigned int),
> -		.mode		= 0644,
> -		.proc_handler	= proc_dointvec_jiffies,
> -	},
>   #endif
>   	[NF_SYSCTL_CT_PROTO_TCP_LOOSE] = {
>   		.procname	= "nf_conntrack_tcp_loose",
> @@ -821,12 +813,6 @@ enum nf_ct_sysctl_index {
>   		.mode		= 0644,
>   		.proc_handler	= proc_dointvec_jiffies,
>   	},
> -	[NF_SYSCTL_CT_PROTO_TIMEOUT_UDP_OFFLOAD_PICKUP] = {
> -		.procname	= "nf_flowtable_udp_pickup",
> -		.maxlen		= sizeof(unsigned int),
> -		.mode		= 0644,
> -		.proc_handler	= proc_dointvec_jiffies,
> -	},
>   #endif
>   	[NF_SYSCTL_CT_PROTO_TIMEOUT_ICMP] = {
>   		.procname	= "nf_conntrack_icmp_timeout",
> @@ -1006,7 +992,6 @@ static void 
> nf_conntrack_standalone_init_tcp_sysctl(struct net *net,
>   
>   #if IS_ENABLED(CONFIG_NF_FLOW_TABLE)
>   	table[NF_SYSCTL_CT_PROTO_TIMEOUT_TCP_OFFLOAD].data = &tn->offload_timeout;
> -	table[NF_SYSCTL_CT_PROTO_TIMEOUT_TCP_OFFLOAD_PICKUP].data = &tn->offload_pickup;
>   #endif
>   
>   }
> @@ -1098,7 +1083,6 @@ static int nf_conntrack_standalone_init_sysctl(struct net *net)
>   	table[NF_SYSCTL_CT_PROTO_TIMEOUT_UDP_STREAM].data = &un->timeouts[UDP_CT_REPLIED];
>   #if IS_ENABLED(CONFIG_NF_FLOW_TABLE)
>   	table[NF_SYSCTL_CT_PROTO_TIMEOUT_UDP_OFFLOAD].data = &un->offload_timeout;
> -	table[NF_SYSCTL_CT_PROTO_TIMEOUT_UDP_OFFLOAD_PICKUP].data = &un->offload_pickup;
>   #endif
>   
>   	nf_conntrack_standalone_init_tcp_sysctl(net, table); diff --git 
> a/net/netfilter/nf_flow_table_core.c 
> b/net/netfilter/nf_flow_table_core.c
> index 8ed2752..ba86bc0 100644
> --- a/net/netfilter/nf_flow_table_core.c
> +++ b/net/netfilter/nf_flow_table_core.c
> @@ -136,7 +136,7 @@ static void flow_offload_fixup_ct_timeout(struct nf_conn *ct)
>   	const struct nf_conntrack_l4proto *l4proto;
>   	struct net *net = nf_ct_net(ct);
>   	int l4num = nf_ct_protonum(ct);
> -	unsigned int timeout;
> +	s32 timeout;
>   
>   	l4proto = nf_ct_l4proto_find(l4num);
>   	if (!l4proto)
> @@ -145,15 +145,20 @@ static void flow_offload_fixup_ct_timeout(struct nf_conn *ct)
>   	if (l4num == IPPROTO_TCP) {
>   		struct nf_tcp_net *tn = nf_tcp_pernet(net);
>   
> -		timeout = tn->offload_pickup;
> +		timeout = tn->timeouts[TCP_CONNTRACK_ESTABLISHED];
> +		timeout -= tn->offload_timeout;
>   	} else if (l4num == IPPROTO_UDP) {
>   		struct nf_udp_net *tn = nf_udp_pernet(net);
>   
> -		timeout = tn->offload_pickup;
> +		timeout = tn->timeouts[UDP_CT_REPLIED];
> +		timeout -= tn->offload_timeout;
>   	} else {
>   		return;
>   	}
>   
> +	if (timeout < 0)
> +		timeout = 0;
> +
>   	if (nf_flow_timeout_delta(ct->timeout) > (__s32)timeout)
>   		ct->timeout = nfct_time_stamp + timeout;
>   }
Acked-by: Tim Gardner <tim.gardner@canonical.com>

--
-----------
Tim Gardner
Canonical, Inc
From: Pablo Neira Ayuso <pablo@netfilter.org>

BugLink: https://bugs.launchpad.net/bugs/1975649

This patch addresses three possible problems:

1. ct gc may race to undo the timeout adjustment of the packet path, leaving
   the conntrack entry in place with the internal offload timeout (one day).

2. ct gc removes the ct because the IPS_OFFLOAD_BIT is not set and the CLOSE
   timeout is reached before the flow offload del.

3. tcp ct is always set to ESTABLISHED with a very long timeout
   in flow offload teardown/delete even though the state might be already
   CLOSED. Also as a remark we cannot assume that the FIN or RST packet
   is hitting flow table teardown as the packet might get bumped to the
   slow path in nftables.

This patch resets IPS_OFFLOAD_BIT from flow_offload_teardown(), so
conntrack handles the tcp rst/fin packet which triggers the CLOSE/FIN
state transition.

Moreover, teturn the connection's ownership to conntrack upon teardown
by clearing the offload flag and fixing the established timeout value.
The flow table GC thread will asynchonrnously free the flow table and
hardware offload entries.

Before this patch, the IPS_OFFLOAD_BIT remained set for expired flows on
which is also misleading since the flow is back to classic conntrack
path.

If nf_ct_delete() removes the entry from the conntrack table, then it
calls nf_ct_put() which decrements the refcnt. This is not a problem
because the flowtable holds a reference to the conntrack object from
flow_offload_alloc() path which is released via flow_offload_free().

This patch also updates nft_flow_offload to skip packets in SYN_RECV
state. Since we might miss or bump packets to slow path, we do not know
what will happen there while we are still in SYN_RECV, this patch
postpones offload up to the next packet which also aligns to the
existing behaviour in tc-ct.

flow_offload_teardown() does not reset the existing tcp state from
flow_offload_fixup_tcp() to ESTABLISHED anymore, packets bump to slow
path might have already update the state to CLOSE/FIN.

Joint work with Oz and Sven.

Fixes: 1e5b2471bcc4 ("netfilter: nf_flow_table: teardown flow timeout race")
Signed-off-by: Oz Shlomo <ozsh@nvidia.com>
Signed-off-by: Sven Auhagen <sven.auhagen@voleatech.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
(backported from commit e5eaac2beb54f0a16ff851125082d9faeb475572 linux-next)
Signed-off-by: Bodong Wang <bodong@nvidia.com>
[bodong: ignore the unused functions]

Conflicts:
        net/netfilter/nf_flow_table_core.c
---
 net/netfilter/nf_flow_table_core.c | 33 +++++++--------------------------
 net/netfilter/nft_flow_offload.c   |  3 ++-
 2 files changed, 9 insertions(+), 27 deletions(-)

diff --git a/net/netfilter/nf_flow_table_core.c b/net/netfilter/nf_flow_table_core.c
index ba86bc0..3270bc3 100644
--- a/net/netfilter/nf_flow_table_core.c
+++ b/net/netfilter/nf_flow_table_core.c
@@ -126,12 +126,11 @@ int flow_offload_route_init(struct flow_offload *flow,

 static void flow_offload_fixup_tcp(struct ip_ct_tcp *tcp)
 {
-       tcp->state = TCP_CONNTRACK_ESTABLISHED;
        tcp->seen[0].td_maxwin = 0;
        tcp->seen[1].td_maxwin = 0;
 }

-static void flow_offload_fixup_ct_timeout(struct nf_conn *ct)
+static void flow_offload_fixup_ct(struct nf_conn *ct)
 {
        const struct nf_conntrack_l4proto *l4proto;
        struct net *net = nf_ct_net(ct);
@@ -145,7 +144,9 @@ static void flow_offload_fixup_ct_timeout(struct nf_conn *ct)
        if (l4num == IPPROTO_TCP) {
                struct nf_tcp_net *tn = nf_tcp_pernet(net);

-               timeout = tn->timeouts[TCP_CONNTRACK_ESTABLISHED];
+               flow_offload_fixup_tcp(&ct->proto.tcp);
+
+               timeout = tn->timeouts[ct->proto.tcp.state];
                timeout -= tn->offload_timeout;
        } else if (l4num == IPPROTO_UDP) {
                struct nf_udp_net *tn = nf_udp_pernet(net);
@@ -163,18 +164,6 @@ static void flow_offload_fixup_ct_timeout(struct nf_conn *ct)
                ct->timeout = nfct_time_stamp + timeout;
 }

-static void flow_offload_fixup_ct_state(struct nf_conn *ct)
-{
-       if (nf_ct_protonum(ct) == IPPROTO_TCP)
-               flow_offload_fixup_tcp(&ct->proto.tcp);
-}
-
-static void flow_offload_fixup_ct(struct nf_conn *ct)
-{
-       flow_offload_fixup_ct_state(ct);
-       flow_offload_fixup_ct_timeout(ct);
-}
-
 static void flow_offload_route_release(struct flow_offload *flow)
 {
        dst_release(flow->tuplehash[FLOW_OFFLOAD_DIR_ORIGINAL].tuple.dst_cache);
@@ -312,22 +301,14 @@ static void flow_offload_del(struct nf_flowtable *flow_table,
        rhashtable_remove_fast(&flow_table->rhashtable,
                               &flow->tuplehash[FLOW_OFFLOAD_DIR_REPLY].node,
                               nf_flow_offload_rhash_params);
-
-       clear_bit(IPS_OFFLOAD_BIT, &flow->ct->status);
-
-       if (nf_flow_has_expired(flow))
-               flow_offload_fixup_ct(flow->ct);
-       else
-               flow_offload_fixup_ct_timeout(flow->ct);
-
        flow_offload_free(flow);
 }

 void flow_offload_teardown(struct flow_offload *flow)
 {
+       clear_bit(IPS_OFFLOAD_BIT, &flow->ct->status);
        set_bit(NF_FLOW_TEARDOWN, &flow->flags);
-
-       flow_offload_fixup_ct_state(flow->ct);
+       flow_offload_fixup_ct(flow->ct);
 }
 EXPORT_SYMBOL_GPL(flow_offload_teardown);

@@ -395,7 +376,7 @@ static void nf_flow_offload_gc_step(struct flow_offload *flow, void *data)
        struct nf_flowtable *flow_table = data;

        if (nf_flow_has_expired(flow) || nf_ct_is_dying(flow->ct))
-               set_bit(NF_FLOW_TEARDOWN, &flow->flags);
+               flow_offload_teardown(flow);

        if (test_bit(NF_FLOW_TEARDOWN, &flow->flags)) {
                if (test_bit(NF_FLOW_HW, &flow->flags)) {
diff --git a/net/netfilter/nft_flow_offload.c b/net/netfilter/nft_flow_offload.c
index b70b489..af3f29d 100644
--- a/net/netfilter/nft_flow_offload.c
+++ b/net/netfilter/nft_flow_offload.c
@@ -92,7 +92,8 @@ static void nft_flow_offload_eval(const struct nft_expr *expr,
        case IPPROTO_TCP:
                tcph = skb_header_pointer(pkt->skb, pkt->xt.thoff,
                                          sizeof(_tcph), &_tcph);
-               if (unlikely(!tcph || tcph->fin || tcph->rst))
+               if (unlikely(!tcph || tcph->fin || tcph->rst ||
+                            !nf_conntrack_tcp_established(ct)))
                        goto out;
                break;
        case IPPROTO_UDP:
--
1.8.3.1
Zachary Tahenakos June 21, 2022, 3:05 p.m. UTC | #3
Hi Bodong,

Yes it isn't merged to master-next on purpose. This patch would have 
landed for 2022.06.20, however the previous cycle is being extended out. 
We are prioritizing the urgent patches for the 2022.05.30 cycle to 
ensure those can get out the door smoothly and then once the cycles 
catch up, we can merge this in, among other patches that need to come in 
as well.


-Zack

On 6/21/22 10:57 AM, Bodong Wang wrote:
> Hi Tim/Zach,
>
> Seems like this patch series is ACKed but not merged to master-next. Could you please check?
>
> Thanks,
> Bodong
>
> -----Original Message-----
> From: Tim Gardner <tim.gardner@canonical.com>
> Sent: Thursday, June 16, 2022 10:10 AM
> To: Bodong Wang <bodong@nvidia.com>; kernel-team@lists.ubuntu.com
> Cc: Vladimir Sokolovsky <vlad@nvidia.com>; Maor Dickman <maord@nvidia.com>; Oz Shlomo <ozsh@nvidia.com>
> Subject: ACK: [SRU][F:linux-bluefield][PATCH v1 1/2] netfilter: conntrack: remove offload_pickup sysctl again
>
> On 5/26/22 07:03, Bodong Wang wrote:
>> From: Florian Westphal <fw@strlen.de>
>>
>> BugLink: https://bugs.launchpad.net/bugs/1975820
>>
>> These two sysctls were added because the hardcoded defaults (2
>> minutes, tcp, 30 seconds, udp) turned out to be too low for some setups.
>>
>> They appeared in 5.14-rc1 so it should be fine to remove it again.
>>
>> Marcelo convinced me that there should be no difference between a flow
>> that was offloaded vs. a flow that was not wrt. timeout handling.
>> Thus the default is changed to those for TCP established and UDP
>> stream,
>> 5 days and 120 seconds, respectively.
>>
>> Marcelo also suggested to account for the timeout value used for the
>> offloading, this avoids increase beyond the value in the
>> conntrack-sysctl and will also instantly expire the conntrack entry with altered sysctls.
>>
>> Example:
>>      nf_conntrack_udp_timeout_stream=60
>>      nf_flowtable_udp_timeout=60
>>
>> This will remove offloaded udp flows after one minute, rather than two.
>>
>> An earlier version of this patch also cleared the ASSURED bit to allow
>> nf_conntrack to evict the entry via early_drop (i.e., table full).
>> However, it looks like we can safely assume that connection timed out
>> via HW is still in established state, so this isn't needed.
>>
>> Quoting Oz:
>>    [..] the hardware sends all packets with a set FIN flags to sw.
>>    [..] Connections that are aged in hardware are expected to be in the
>>    established state.
>>
>> In case it turns out that back-to-sw-path transition can occur for
>> 'dodgy' connections too (e.g., one side disappeared while
>> software-path would have been in RETRANS timeout), we can adjust this later.
>>
>> Cc: Oz Shlomo <ozsh@nvidia.com>
>> Cc: Paul Blakey <paulb@nvidia.com>
>> Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
>> Signed-off-by: Florian Westphal <fw@strlen.de>
>> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
>> Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
>> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> (backported
>> from commit 4592ee7f525c4683ec9e290381601fdee50ae110)
>> Signed-off-by: Bodong Wang <bodong@nvidia.com>
>> [bodong: remove doc file]
>>
>> Conflicts:
>> 	Documentation/networking/nf_conntrack-sysctl.rst
>> ---
>>    include/net/netns/conntrack.h           |  2 --
>>    net/netfilter/nf_conntrack_proto_tcp.c  |  1 -
>>    net/netfilter/nf_conntrack_proto_udp.c  |  1 -
>>    net/netfilter/nf_conntrack_standalone.c | 16 ----------------
>>    net/netfilter/nf_flow_table_core.c      | 11 ++++++++---
>>    5 files changed, 8 insertions(+), 23 deletions(-)
>>
>> diff --git a/include/net/netns/conntrack.h
>> b/include/net/netns/conntrack.h index 67bbaa6..9e3963c8 100644
>> --- a/include/net/netns/conntrack.h
>> +++ b/include/net/netns/conntrack.h
>> @@ -29,7 +29,6 @@ struct nf_tcp_net {
>>    	int tcp_max_retrans;
>>    #if IS_ENABLED(CONFIG_NF_FLOW_TABLE)
>>    	unsigned int offload_timeout;
>> -	unsigned int offload_pickup;
>>    #endif
>>    };
>>    
>> @@ -43,7 +42,6 @@ struct nf_udp_net {
>>    	unsigned int timeouts[UDP_CT_MAX];
>>    #if IS_ENABLED(CONFIG_NF_FLOW_TABLE)
>>    	unsigned int offload_timeout;
>> -	unsigned int offload_pickup;
>>    #endif
>>    };
>>    
>> diff --git a/net/netfilter/nf_conntrack_proto_tcp.c
>> b/net/netfilter/nf_conntrack_proto_tcp.c
>> index d8e554c..0e51fb7 100644
>> --- a/net/netfilter/nf_conntrack_proto_tcp.c
>> +++ b/net/netfilter/nf_conntrack_proto_tcp.c
>> @@ -1435,7 +1435,6 @@ void nf_conntrack_tcp_init_net(struct net *net)
>>    
>>    #if IS_ENABLED(CONFIG_NF_FLOW_TABLE)
>>    	tn->offload_timeout = 30 * HZ;
>> -	tn->offload_pickup = 120 * HZ;
>>    #endif
>>    }
>>    
>> diff --git a/net/netfilter/nf_conntrack_proto_udp.c
>> b/net/netfilter/nf_conntrack_proto_udp.c
>> index b3e135e..ee336d7 100644
>> --- a/net/netfilter/nf_conntrack_proto_udp.c
>> +++ b/net/netfilter/nf_conntrack_proto_udp.c
>> @@ -282,7 +282,6 @@ void nf_conntrack_udp_init_net(struct net *net)
>>    
>>    #if IS_ENABLED(CONFIG_NF_FLOW_TABLE)
>>    	un->offload_timeout = 30 * HZ;
>> -	un->offload_pickup = 30 * HZ;
>>    #endif
>>    }
>>    
>> diff --git a/net/netfilter/nf_conntrack_standalone.c
>> b/net/netfilter/nf_conntrack_standalone.c
>> index ac8f12b..1e78ad8 100644
>> --- a/net/netfilter/nf_conntrack_standalone.c
>> +++ b/net/netfilter/nf_conntrack_standalone.c
>> @@ -569,7 +569,6 @@ enum nf_ct_sysctl_index {
>>    	NF_SYSCTL_CT_PROTO_TIMEOUT_TCP_UNACK,
>>    #if IS_ENABLED(CONFIG_NF_FLOW_TABLE)
>>    	NF_SYSCTL_CT_PROTO_TIMEOUT_TCP_OFFLOAD,
>> -	NF_SYSCTL_CT_PROTO_TIMEOUT_TCP_OFFLOAD_PICKUP,
>>    #endif
>>    	NF_SYSCTL_CT_PROTO_TCP_LOOSE,
>>    	NF_SYSCTL_CT_PROTO_TCP_LIBERAL,
>> @@ -578,7 +577,6 @@ enum nf_ct_sysctl_index {
>>    	NF_SYSCTL_CT_PROTO_TIMEOUT_UDP_STREAM,
>>    #if IS_ENABLED(CONFIG_NF_FLOW_TABLE)
>>    	NF_SYSCTL_CT_PROTO_TIMEOUT_UDP_OFFLOAD,
>> -	NF_SYSCTL_CT_PROTO_TIMEOUT_UDP_OFFLOAD_PICKUP,
>>    #endif
>>    	NF_SYSCTL_CT_PROTO_TIMEOUT_ICMP,
>>    	NF_SYSCTL_CT_PROTO_TIMEOUT_ICMPV6,
>> @@ -773,12 +771,6 @@ enum nf_ct_sysctl_index {
>>    		.mode		= 0644,
>>    		.proc_handler	= proc_dointvec_jiffies,
>>    	},
>> -	[NF_SYSCTL_CT_PROTO_TIMEOUT_TCP_OFFLOAD_PICKUP] = {
>> -		.procname	= "nf_flowtable_tcp_pickup",
>> -		.maxlen		= sizeof(unsigned int),
>> -		.mode		= 0644,
>> -		.proc_handler	= proc_dointvec_jiffies,
>> -	},
>>    #endif
>>    	[NF_SYSCTL_CT_PROTO_TCP_LOOSE] = {
>>    		.procname	= "nf_conntrack_tcp_loose",
>> @@ -821,12 +813,6 @@ enum nf_ct_sysctl_index {
>>    		.mode		= 0644,
>>    		.proc_handler	= proc_dointvec_jiffies,
>>    	},
>> -	[NF_SYSCTL_CT_PROTO_TIMEOUT_UDP_OFFLOAD_PICKUP] = {
>> -		.procname	= "nf_flowtable_udp_pickup",
>> -		.maxlen		= sizeof(unsigned int),
>> -		.mode		= 0644,
>> -		.proc_handler	= proc_dointvec_jiffies,
>> -	},
>>    #endif
>>    	[NF_SYSCTL_CT_PROTO_TIMEOUT_ICMP] = {
>>    		.procname	= "nf_conntrack_icmp_timeout",
>> @@ -1006,7 +992,6 @@ static void
>> nf_conntrack_standalone_init_tcp_sysctl(struct net *net,
>>    
>>    #if IS_ENABLED(CONFIG_NF_FLOW_TABLE)
>>    	table[NF_SYSCTL_CT_PROTO_TIMEOUT_TCP_OFFLOAD].data = &tn->offload_timeout;
>> -	table[NF_SYSCTL_CT_PROTO_TIMEOUT_TCP_OFFLOAD_PICKUP].data = &tn->offload_pickup;
>>    #endif
>>    
>>    }
>> @@ -1098,7 +1083,6 @@ static int nf_conntrack_standalone_init_sysctl(struct net *net)
>>    	table[NF_SYSCTL_CT_PROTO_TIMEOUT_UDP_STREAM].data = &un->timeouts[UDP_CT_REPLIED];
>>    #if IS_ENABLED(CONFIG_NF_FLOW_TABLE)
>>    	table[NF_SYSCTL_CT_PROTO_TIMEOUT_UDP_OFFLOAD].data = &un->offload_timeout;
>> -	table[NF_SYSCTL_CT_PROTO_TIMEOUT_UDP_OFFLOAD_PICKUP].data = &un->offload_pickup;
>>    #endif
>>    
>>    	nf_conntrack_standalone_init_tcp_sysctl(net, table); diff --git
>> a/net/netfilter/nf_flow_table_core.c
>> b/net/netfilter/nf_flow_table_core.c
>> index 8ed2752..ba86bc0 100644
>> --- a/net/netfilter/nf_flow_table_core.c
>> +++ b/net/netfilter/nf_flow_table_core.c
>> @@ -136,7 +136,7 @@ static void flow_offload_fixup_ct_timeout(struct nf_conn *ct)
>>    	const struct nf_conntrack_l4proto *l4proto;
>>    	struct net *net = nf_ct_net(ct);
>>    	int l4num = nf_ct_protonum(ct);
>> -	unsigned int timeout;
>> +	s32 timeout;
>>    
>>    	l4proto = nf_ct_l4proto_find(l4num);
>>    	if (!l4proto)
>> @@ -145,15 +145,20 @@ static void flow_offload_fixup_ct_timeout(struct nf_conn *ct)
>>    	if (l4num == IPPROTO_TCP) {
>>    		struct nf_tcp_net *tn = nf_tcp_pernet(net);
>>    
>> -		timeout = tn->offload_pickup;
>> +		timeout = tn->timeouts[TCP_CONNTRACK_ESTABLISHED];
>> +		timeout -= tn->offload_timeout;
>>    	} else if (l4num == IPPROTO_UDP) {
>>    		struct nf_udp_net *tn = nf_udp_pernet(net);
>>    
>> -		timeout = tn->offload_pickup;
>> +		timeout = tn->timeouts[UDP_CT_REPLIED];
>> +		timeout -= tn->offload_timeout;
>>    	} else {
>>    		return;
>>    	}
>>    
>> +	if (timeout < 0)
>> +		timeout = 0;
>> +
>>    	if (nf_flow_timeout_delta(ct->timeout) > (__s32)timeout)
>>    		ct->timeout = nfct_time_stamp + timeout;
>>    }
> Acked-by: Tim Gardner <tim.gardner@canonical.com>
>
> --
> -----------
> Tim Gardner
> Canonical, Inc
Vladimir Sokolovsky June 21, 2022, 3:08 p.m. UTC | #4
Hi Zachary,
So, when should we expect these patches to be merged into the "master-next" branch?

Regards,
Vladimir

-----Original Message-----
From: Zachary Tahenakos <zachary.tahenakos@canonical.com> 
Sent: Tuesday, June 21, 2022 10:05 AM
To: Bodong Wang <bodong@nvidia.com>; Tim Gardner <tim.gardner@canonical.com>; kernel-team@lists.ubuntu.com
Cc: Vladimir Sokolovsky <vlad@nvidia.com>; Maor Dickman <maord@nvidia.com>; Oz Shlomo <ozsh@nvidia.com>
Subject: Re: ACK: [SRU][F:linux-bluefield][PATCH v1 1/2] netfilter: conntrack: remove offload_pickup sysctl again

Hi Bodong,

Yes it isn't merged to master-next on purpose. This patch would have landed for 2022.06.20, however the previous cycle is being extended out. 
We are prioritizing the urgent patches for the 2022.05.30 cycle to ensure those can get out the door smoothly and then once the cycles catch up, we can merge this in, among other patches that need to come in as well.


-Zack

On 6/21/22 10:57 AM, Bodong Wang wrote:
> Hi Tim/Zach,
>
> Seems like this patch series is ACKed but not merged to master-next. Could you please check?
>
> Thanks,
> Bodong
>
> -----Original Message-----
> From: Tim Gardner <tim.gardner@canonical.com>
> Sent: Thursday, June 16, 2022 10:10 AM
> To: Bodong Wang <bodong@nvidia.com>; kernel-team@lists.ubuntu.com
> Cc: Vladimir Sokolovsky <vlad@nvidia.com>; Maor Dickman 
> <maord@nvidia.com>; Oz Shlomo <ozsh@nvidia.com>
> Subject: ACK: [SRU][F:linux-bluefield][PATCH v1 1/2] netfilter: 
> conntrack: remove offload_pickup sysctl again
>
> On 5/26/22 07:03, Bodong Wang wrote:
>> From: Florian Westphal <fw@strlen.de>
>>
>> BugLink: https://bugs.launchpad.net/bugs/1975820
>>
>> These two sysctls were added because the hardcoded defaults (2 
>> minutes, tcp, 30 seconds, udp) turned out to be too low for some setups.
>>
>> They appeared in 5.14-rc1 so it should be fine to remove it again.
>>
>> Marcelo convinced me that there should be no difference between a 
>> flow that was offloaded vs. a flow that was not wrt. timeout handling.
>> Thus the default is changed to those for TCP established and UDP 
>> stream,
>> 5 days and 120 seconds, respectively.
>>
>> Marcelo also suggested to account for the timeout value used for the 
>> offloading, this avoids increase beyond the value in the 
>> conntrack-sysctl and will also instantly expire the conntrack entry with altered sysctls.
>>
>> Example:
>>      nf_conntrack_udp_timeout_stream=60
>>      nf_flowtable_udp_timeout=60
>>
>> This will remove offloaded udp flows after one minute, rather than two.
>>
>> An earlier version of this patch also cleared the ASSURED bit to 
>> allow nf_conntrack to evict the entry via early_drop (i.e., table full).
>> However, it looks like we can safely assume that connection timed out 
>> via HW is still in established state, so this isn't needed.
>>
>> Quoting Oz:
>>    [..] the hardware sends all packets with a set FIN flags to sw.
>>    [..] Connections that are aged in hardware are expected to be in the
>>    established state.
>>
>> In case it turns out that back-to-sw-path transition can occur for 
>> 'dodgy' connections too (e.g., one side disappeared while 
>> software-path would have been in RETRANS timeout), we can adjust this later.
>>
>> Cc: Oz Shlomo <ozsh@nvidia.com>
>> Cc: Paul Blakey <paulb@nvidia.com>
>> Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
>> Signed-off-by: Florian Westphal <fw@strlen.de>
>> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
>> Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
>> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> (backported 
>> from commit 4592ee7f525c4683ec9e290381601fdee50ae110)
>> Signed-off-by: Bodong Wang <bodong@nvidia.com>
>> [bodong: remove doc file]
>>
>> Conflicts:
>> 	Documentation/networking/nf_conntrack-sysctl.rst
>> ---
>>    include/net/netns/conntrack.h           |  2 --
>>    net/netfilter/nf_conntrack_proto_tcp.c  |  1 -
>>    net/netfilter/nf_conntrack_proto_udp.c  |  1 -
>>    net/netfilter/nf_conntrack_standalone.c | 16 ----------------
>>    net/netfilter/nf_flow_table_core.c      | 11 ++++++++---
>>    5 files changed, 8 insertions(+), 23 deletions(-)
>>
>> diff --git a/include/net/netns/conntrack.h 
>> b/include/net/netns/conntrack.h index 67bbaa6..9e3963c8 100644
>> --- a/include/net/netns/conntrack.h
>> +++ b/include/net/netns/conntrack.h
>> @@ -29,7 +29,6 @@ struct nf_tcp_net {
>>    	int tcp_max_retrans;
>>    #if IS_ENABLED(CONFIG_NF_FLOW_TABLE)
>>    	unsigned int offload_timeout;
>> -	unsigned int offload_pickup;
>>    #endif
>>    };
>>    
>> @@ -43,7 +42,6 @@ struct nf_udp_net {
>>    	unsigned int timeouts[UDP_CT_MAX];
>>    #if IS_ENABLED(CONFIG_NF_FLOW_TABLE)
>>    	unsigned int offload_timeout;
>> -	unsigned int offload_pickup;
>>    #endif
>>    };
>>    
>> diff --git a/net/netfilter/nf_conntrack_proto_tcp.c
>> b/net/netfilter/nf_conntrack_proto_tcp.c
>> index d8e554c..0e51fb7 100644
>> --- a/net/netfilter/nf_conntrack_proto_tcp.c
>> +++ b/net/netfilter/nf_conntrack_proto_tcp.c
>> @@ -1435,7 +1435,6 @@ void nf_conntrack_tcp_init_net(struct net *net)
>>    
>>    #if IS_ENABLED(CONFIG_NF_FLOW_TABLE)
>>    	tn->offload_timeout = 30 * HZ;
>> -	tn->offload_pickup = 120 * HZ;
>>    #endif
>>    }
>>    
>> diff --git a/net/netfilter/nf_conntrack_proto_udp.c
>> b/net/netfilter/nf_conntrack_proto_udp.c
>> index b3e135e..ee336d7 100644
>> --- a/net/netfilter/nf_conntrack_proto_udp.c
>> +++ b/net/netfilter/nf_conntrack_proto_udp.c
>> @@ -282,7 +282,6 @@ void nf_conntrack_udp_init_net(struct net *net)
>>    
>>    #if IS_ENABLED(CONFIG_NF_FLOW_TABLE)
>>    	un->offload_timeout = 30 * HZ;
>> -	un->offload_pickup = 30 * HZ;
>>    #endif
>>    }
>>    
>> diff --git a/net/netfilter/nf_conntrack_standalone.c
>> b/net/netfilter/nf_conntrack_standalone.c
>> index ac8f12b..1e78ad8 100644
>> --- a/net/netfilter/nf_conntrack_standalone.c
>> +++ b/net/netfilter/nf_conntrack_standalone.c
>> @@ -569,7 +569,6 @@ enum nf_ct_sysctl_index {
>>    	NF_SYSCTL_CT_PROTO_TIMEOUT_TCP_UNACK,
>>    #if IS_ENABLED(CONFIG_NF_FLOW_TABLE)
>>    	NF_SYSCTL_CT_PROTO_TIMEOUT_TCP_OFFLOAD,
>> -	NF_SYSCTL_CT_PROTO_TIMEOUT_TCP_OFFLOAD_PICKUP,
>>    #endif
>>    	NF_SYSCTL_CT_PROTO_TCP_LOOSE,
>>    	NF_SYSCTL_CT_PROTO_TCP_LIBERAL,
>> @@ -578,7 +577,6 @@ enum nf_ct_sysctl_index {
>>    	NF_SYSCTL_CT_PROTO_TIMEOUT_UDP_STREAM,
>>    #if IS_ENABLED(CONFIG_NF_FLOW_TABLE)
>>    	NF_SYSCTL_CT_PROTO_TIMEOUT_UDP_OFFLOAD,
>> -	NF_SYSCTL_CT_PROTO_TIMEOUT_UDP_OFFLOAD_PICKUP,
>>    #endif
>>    	NF_SYSCTL_CT_PROTO_TIMEOUT_ICMP,
>>    	NF_SYSCTL_CT_PROTO_TIMEOUT_ICMPV6,
>> @@ -773,12 +771,6 @@ enum nf_ct_sysctl_index {
>>    		.mode		= 0644,
>>    		.proc_handler	= proc_dointvec_jiffies,
>>    	},
>> -	[NF_SYSCTL_CT_PROTO_TIMEOUT_TCP_OFFLOAD_PICKUP] = {
>> -		.procname	= "nf_flowtable_tcp_pickup",
>> -		.maxlen		= sizeof(unsigned int),
>> -		.mode		= 0644,
>> -		.proc_handler	= proc_dointvec_jiffies,
>> -	},
>>    #endif
>>    	[NF_SYSCTL_CT_PROTO_TCP_LOOSE] = {
>>    		.procname	= "nf_conntrack_tcp_loose",
>> @@ -821,12 +813,6 @@ enum nf_ct_sysctl_index {
>>    		.mode		= 0644,
>>    		.proc_handler	= proc_dointvec_jiffies,
>>    	},
>> -	[NF_SYSCTL_CT_PROTO_TIMEOUT_UDP_OFFLOAD_PICKUP] = {
>> -		.procname	= "nf_flowtable_udp_pickup",
>> -		.maxlen		= sizeof(unsigned int),
>> -		.mode		= 0644,
>> -		.proc_handler	= proc_dointvec_jiffies,
>> -	},
>>    #endif
>>    	[NF_SYSCTL_CT_PROTO_TIMEOUT_ICMP] = {
>>    		.procname	= "nf_conntrack_icmp_timeout",
>> @@ -1006,7 +992,6 @@ static void
>> nf_conntrack_standalone_init_tcp_sysctl(struct net *net,
>>    
>>    #if IS_ENABLED(CONFIG_NF_FLOW_TABLE)
>>    	table[NF_SYSCTL_CT_PROTO_TIMEOUT_TCP_OFFLOAD].data = &tn->offload_timeout;
>> -	table[NF_SYSCTL_CT_PROTO_TIMEOUT_TCP_OFFLOAD_PICKUP].data = &tn->offload_pickup;
>>    #endif
>>    
>>    }
>> @@ -1098,7 +1083,6 @@ static int nf_conntrack_standalone_init_sysctl(struct net *net)
>>    	table[NF_SYSCTL_CT_PROTO_TIMEOUT_UDP_STREAM].data = &un->timeouts[UDP_CT_REPLIED];
>>    #if IS_ENABLED(CONFIG_NF_FLOW_TABLE)
>>    	table[NF_SYSCTL_CT_PROTO_TIMEOUT_UDP_OFFLOAD].data = &un->offload_timeout;
>> -	table[NF_SYSCTL_CT_PROTO_TIMEOUT_UDP_OFFLOAD_PICKUP].data = &un->offload_pickup;
>>    #endif
>>    
>>    	nf_conntrack_standalone_init_tcp_sysctl(net, table); diff --git 
>> a/net/netfilter/nf_flow_table_core.c
>> b/net/netfilter/nf_flow_table_core.c
>> index 8ed2752..ba86bc0 100644
>> --- a/net/netfilter/nf_flow_table_core.c
>> +++ b/net/netfilter/nf_flow_table_core.c
>> @@ -136,7 +136,7 @@ static void flow_offload_fixup_ct_timeout(struct nf_conn *ct)
>>    	const struct nf_conntrack_l4proto *l4proto;
>>    	struct net *net = nf_ct_net(ct);
>>    	int l4num = nf_ct_protonum(ct);
>> -	unsigned int timeout;
>> +	s32 timeout;
>>    
>>    	l4proto = nf_ct_l4proto_find(l4num);
>>    	if (!l4proto)
>> @@ -145,15 +145,20 @@ static void flow_offload_fixup_ct_timeout(struct nf_conn *ct)
>>    	if (l4num == IPPROTO_TCP) {
>>    		struct nf_tcp_net *tn = nf_tcp_pernet(net);
>>    
>> -		timeout = tn->offload_pickup;
>> +		timeout = tn->timeouts[TCP_CONNTRACK_ESTABLISHED];
>> +		timeout -= tn->offload_timeout;
>>    	} else if (l4num == IPPROTO_UDP) {
>>    		struct nf_udp_net *tn = nf_udp_pernet(net);
>>    
>> -		timeout = tn->offload_pickup;
>> +		timeout = tn->timeouts[UDP_CT_REPLIED];
>> +		timeout -= tn->offload_timeout;
>>    	} else {
>>    		return;
>>    	}
>>    
>> +	if (timeout < 0)
>> +		timeout = 0;
>> +
>>    	if (nf_flow_timeout_delta(ct->timeout) > (__s32)timeout)
>>    		ct->timeout = nfct_time_stamp + timeout;
>>    }
> Acked-by: Tim Gardner <tim.gardner@canonical.com>
>
> --
> -----------
> Tim Gardner
> Canonical, Inc
Zachary Tahenakos June 21, 2022, 5:35 p.m. UTC | #5
Hi Vladimir,

I unfortunately do not have an answer for that.


-Zack

On 6/21/22 11:08 AM, Vladimir Sokolovsky wrote:
> Hi Zachary,
> So, when should we expect these patches to be merged into the "master-next" branch?
>
> Regards,
> Vladimir
>
> -----Original Message-----
> From: Zachary Tahenakos <zachary.tahenakos@canonical.com>
> Sent: Tuesday, June 21, 2022 10:05 AM
> To: Bodong Wang <bodong@nvidia.com>; Tim Gardner <tim.gardner@canonical.com>; kernel-team@lists.ubuntu.com
> Cc: Vladimir Sokolovsky <vlad@nvidia.com>; Maor Dickman <maord@nvidia.com>; Oz Shlomo <ozsh@nvidia.com>
> Subject: Re: ACK: [SRU][F:linux-bluefield][PATCH v1 1/2] netfilter: conntrack: remove offload_pickup sysctl again
>
> Hi Bodong,
>
> Yes it isn't merged to master-next on purpose. This patch would have landed for 2022.06.20, however the previous cycle is being extended out.
> We are prioritizing the urgent patches for the 2022.05.30 cycle to ensure those can get out the door smoothly and then once the cycles catch up, we can merge this in, among other patches that need to come in as well.
>
>
> -Zack
>
> On 6/21/22 10:57 AM, Bodong Wang wrote:
>> Hi Tim/Zach,
>>
>> Seems like this patch series is ACKed but not merged to master-next. Could you please check?
>>
>> Thanks,
>> Bodong
>>
>> -----Original Message-----
>> From: Tim Gardner <tim.gardner@canonical.com>
>> Sent: Thursday, June 16, 2022 10:10 AM
>> To: Bodong Wang <bodong@nvidia.com>; kernel-team@lists.ubuntu.com
>> Cc: Vladimir Sokolovsky <vlad@nvidia.com>; Maor Dickman
>> <maord@nvidia.com>; Oz Shlomo <ozsh@nvidia.com>
>> Subject: ACK: [SRU][F:linux-bluefield][PATCH v1 1/2] netfilter:
>> conntrack: remove offload_pickup sysctl again
>>
>> On 5/26/22 07:03, Bodong Wang wrote:
>>> From: Florian Westphal <fw@strlen.de>
>>>
>>> BugLink: https://bugs.launchpad.net/bugs/1975820
>>>
>>> These two sysctls were added because the hardcoded defaults (2
>>> minutes, tcp, 30 seconds, udp) turned out to be too low for some setups.
>>>
>>> They appeared in 5.14-rc1 so it should be fine to remove it again.
>>>
>>> Marcelo convinced me that there should be no difference between a
>>> flow that was offloaded vs. a flow that was not wrt. timeout handling.
>>> Thus the default is changed to those for TCP established and UDP
>>> stream,
>>> 5 days and 120 seconds, respectively.
>>>
>>> Marcelo also suggested to account for the timeout value used for the
>>> offloading, this avoids increase beyond the value in the
>>> conntrack-sysctl and will also instantly expire the conntrack entry with altered sysctls.
>>>
>>> Example:
>>>       nf_conntrack_udp_timeout_stream=60
>>>       nf_flowtable_udp_timeout=60
>>>
>>> This will remove offloaded udp flows after one minute, rather than two.
>>>
>>> An earlier version of this patch also cleared the ASSURED bit to
>>> allow nf_conntrack to evict the entry via early_drop (i.e., table full).
>>> However, it looks like we can safely assume that connection timed out
>>> via HW is still in established state, so this isn't needed.
>>>
>>> Quoting Oz:
>>>     [..] the hardware sends all packets with a set FIN flags to sw.
>>>     [..] Connections that are aged in hardware are expected to be in the
>>>     established state.
>>>
>>> In case it turns out that back-to-sw-path transition can occur for
>>> 'dodgy' connections too (e.g., one side disappeared while
>>> software-path would have been in RETRANS timeout), we can adjust this later.
>>>
>>> Cc: Oz Shlomo <ozsh@nvidia.com>
>>> Cc: Paul Blakey <paulb@nvidia.com>
>>> Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
>>> Signed-off-by: Florian Westphal <fw@strlen.de>
>>> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
>>> Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
>>> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> (backported
>>> from commit 4592ee7f525c4683ec9e290381601fdee50ae110)
>>> Signed-off-by: Bodong Wang <bodong@nvidia.com>
>>> [bodong: remove doc file]
>>>
>>> Conflicts:
>>> 	Documentation/networking/nf_conntrack-sysctl.rst
>>> ---
>>>     include/net/netns/conntrack.h           |  2 --
>>>     net/netfilter/nf_conntrack_proto_tcp.c  |  1 -
>>>     net/netfilter/nf_conntrack_proto_udp.c  |  1 -
>>>     net/netfilter/nf_conntrack_standalone.c | 16 ----------------
>>>     net/netfilter/nf_flow_table_core.c      | 11 ++++++++---
>>>     5 files changed, 8 insertions(+), 23 deletions(-)
>>>
>>> diff --git a/include/net/netns/conntrack.h
>>> b/include/net/netns/conntrack.h index 67bbaa6..9e3963c8 100644
>>> --- a/include/net/netns/conntrack.h
>>> +++ b/include/net/netns/conntrack.h
>>> @@ -29,7 +29,6 @@ struct nf_tcp_net {
>>>     	int tcp_max_retrans;
>>>     #if IS_ENABLED(CONFIG_NF_FLOW_TABLE)
>>>     	unsigned int offload_timeout;
>>> -	unsigned int offload_pickup;
>>>     #endif
>>>     };
>>>     
>>> @@ -43,7 +42,6 @@ struct nf_udp_net {
>>>     	unsigned int timeouts[UDP_CT_MAX];
>>>     #if IS_ENABLED(CONFIG_NF_FLOW_TABLE)
>>>     	unsigned int offload_timeout;
>>> -	unsigned int offload_pickup;
>>>     #endif
>>>     };
>>>     
>>> diff --git a/net/netfilter/nf_conntrack_proto_tcp.c
>>> b/net/netfilter/nf_conntrack_proto_tcp.c
>>> index d8e554c..0e51fb7 100644
>>> --- a/net/netfilter/nf_conntrack_proto_tcp.c
>>> +++ b/net/netfilter/nf_conntrack_proto_tcp.c
>>> @@ -1435,7 +1435,6 @@ void nf_conntrack_tcp_init_net(struct net *net)
>>>     
>>>     #if IS_ENABLED(CONFIG_NF_FLOW_TABLE)
>>>     	tn->offload_timeout = 30 * HZ;
>>> -	tn->offload_pickup = 120 * HZ;
>>>     #endif
>>>     }
>>>     
>>> diff --git a/net/netfilter/nf_conntrack_proto_udp.c
>>> b/net/netfilter/nf_conntrack_proto_udp.c
>>> index b3e135e..ee336d7 100644
>>> --- a/net/netfilter/nf_conntrack_proto_udp.c
>>> +++ b/net/netfilter/nf_conntrack_proto_udp.c
>>> @@ -282,7 +282,6 @@ void nf_conntrack_udp_init_net(struct net *net)
>>>     
>>>     #if IS_ENABLED(CONFIG_NF_FLOW_TABLE)
>>>     	un->offload_timeout = 30 * HZ;
>>> -	un->offload_pickup = 30 * HZ;
>>>     #endif
>>>     }
>>>     
>>> diff --git a/net/netfilter/nf_conntrack_standalone.c
>>> b/net/netfilter/nf_conntrack_standalone.c
>>> index ac8f12b..1e78ad8 100644
>>> --- a/net/netfilter/nf_conntrack_standalone.c
>>> +++ b/net/netfilter/nf_conntrack_standalone.c
>>> @@ -569,7 +569,6 @@ enum nf_ct_sysctl_index {
>>>     	NF_SYSCTL_CT_PROTO_TIMEOUT_TCP_UNACK,
>>>     #if IS_ENABLED(CONFIG_NF_FLOW_TABLE)
>>>     	NF_SYSCTL_CT_PROTO_TIMEOUT_TCP_OFFLOAD,
>>> -	NF_SYSCTL_CT_PROTO_TIMEOUT_TCP_OFFLOAD_PICKUP,
>>>     #endif
>>>     	NF_SYSCTL_CT_PROTO_TCP_LOOSE,
>>>     	NF_SYSCTL_CT_PROTO_TCP_LIBERAL,
>>> @@ -578,7 +577,6 @@ enum nf_ct_sysctl_index {
>>>     	NF_SYSCTL_CT_PROTO_TIMEOUT_UDP_STREAM,
>>>     #if IS_ENABLED(CONFIG_NF_FLOW_TABLE)
>>>     	NF_SYSCTL_CT_PROTO_TIMEOUT_UDP_OFFLOAD,
>>> -	NF_SYSCTL_CT_PROTO_TIMEOUT_UDP_OFFLOAD_PICKUP,
>>>     #endif
>>>     	NF_SYSCTL_CT_PROTO_TIMEOUT_ICMP,
>>>     	NF_SYSCTL_CT_PROTO_TIMEOUT_ICMPV6,
>>> @@ -773,12 +771,6 @@ enum nf_ct_sysctl_index {
>>>     		.mode		= 0644,
>>>     		.proc_handler	= proc_dointvec_jiffies,
>>>     	},
>>> -	[NF_SYSCTL_CT_PROTO_TIMEOUT_TCP_OFFLOAD_PICKUP] = {
>>> -		.procname	= "nf_flowtable_tcp_pickup",
>>> -		.maxlen		= sizeof(unsigned int),
>>> -		.mode		= 0644,
>>> -		.proc_handler	= proc_dointvec_jiffies,
>>> -	},
>>>     #endif
>>>     	[NF_SYSCTL_CT_PROTO_TCP_LOOSE] = {
>>>     		.procname	= "nf_conntrack_tcp_loose",
>>> @@ -821,12 +813,6 @@ enum nf_ct_sysctl_index {
>>>     		.mode		= 0644,
>>>     		.proc_handler	= proc_dointvec_jiffies,
>>>     	},
>>> -	[NF_SYSCTL_CT_PROTO_TIMEOUT_UDP_OFFLOAD_PICKUP] = {
>>> -		.procname	= "nf_flowtable_udp_pickup",
>>> -		.maxlen		= sizeof(unsigned int),
>>> -		.mode		= 0644,
>>> -		.proc_handler	= proc_dointvec_jiffies,
>>> -	},
>>>     #endif
>>>     	[NF_SYSCTL_CT_PROTO_TIMEOUT_ICMP] = {
>>>     		.procname	= "nf_conntrack_icmp_timeout",
>>> @@ -1006,7 +992,6 @@ static void
>>> nf_conntrack_standalone_init_tcp_sysctl(struct net *net,
>>>     
>>>     #if IS_ENABLED(CONFIG_NF_FLOW_TABLE)
>>>     	table[NF_SYSCTL_CT_PROTO_TIMEOUT_TCP_OFFLOAD].data = &tn->offload_timeout;
>>> -	table[NF_SYSCTL_CT_PROTO_TIMEOUT_TCP_OFFLOAD_PICKUP].data = &tn->offload_pickup;
>>>     #endif
>>>     
>>>     }
>>> @@ -1098,7 +1083,6 @@ static int nf_conntrack_standalone_init_sysctl(struct net *net)
>>>     	table[NF_SYSCTL_CT_PROTO_TIMEOUT_UDP_STREAM].data = &un->timeouts[UDP_CT_REPLIED];
>>>     #if IS_ENABLED(CONFIG_NF_FLOW_TABLE)
>>>     	table[NF_SYSCTL_CT_PROTO_TIMEOUT_UDP_OFFLOAD].data = &un->offload_timeout;
>>> -	table[NF_SYSCTL_CT_PROTO_TIMEOUT_UDP_OFFLOAD_PICKUP].data = &un->offload_pickup;
>>>     #endif
>>>     
>>>     	nf_conntrack_standalone_init_tcp_sysctl(net, table); diff --git
>>> a/net/netfilter/nf_flow_table_core.c
>>> b/net/netfilter/nf_flow_table_core.c
>>> index 8ed2752..ba86bc0 100644
>>> --- a/net/netfilter/nf_flow_table_core.c
>>> +++ b/net/netfilter/nf_flow_table_core.c
>>> @@ -136,7 +136,7 @@ static void flow_offload_fixup_ct_timeout(struct nf_conn *ct)
>>>     	const struct nf_conntrack_l4proto *l4proto;
>>>     	struct net *net = nf_ct_net(ct);
>>>     	int l4num = nf_ct_protonum(ct);
>>> -	unsigned int timeout;
>>> +	s32 timeout;
>>>     
>>>     	l4proto = nf_ct_l4proto_find(l4num);
>>>     	if (!l4proto)
>>> @@ -145,15 +145,20 @@ static void flow_offload_fixup_ct_timeout(struct nf_conn *ct)
>>>     	if (l4num == IPPROTO_TCP) {
>>>     		struct nf_tcp_net *tn = nf_tcp_pernet(net);
>>>     
>>> -		timeout = tn->offload_pickup;
>>> +		timeout = tn->timeouts[TCP_CONNTRACK_ESTABLISHED];
>>> +		timeout -= tn->offload_timeout;
>>>     	} else if (l4num == IPPROTO_UDP) {
>>>     		struct nf_udp_net *tn = nf_udp_pernet(net);
>>>     
>>> -		timeout = tn->offload_pickup;
>>> +		timeout = tn->timeouts[UDP_CT_REPLIED];
>>> +		timeout -= tn->offload_timeout;
>>>     	} else {
>>>     		return;
>>>     	}
>>>     
>>> +	if (timeout < 0)
>>> +		timeout = 0;
>>> +
>>>     	if (nf_flow_timeout_delta(ct->timeout) > (__s32)timeout)
>>>     		ct->timeout = nfct_time_stamp + timeout;
>>>     }
>> Acked-by: Tim Gardner <tim.gardner@canonical.com>
>>
>> --
>> -----------
>> Tim Gardner
>> Canonical, Inc
Zachary Tahenakos July 6, 2022, 1:06 p.m. UTC | #6
Hello Vladimir and all,

This patch set is currently slated to go into the 2022.06.20 f:bluefield 
kernel, however this cycle is currently delayed due to the previous 
cycle and other factors. In order to get bluefield back on schedule, I 
would like to skip the 06.20 cycle and instead have this change set get 
released in the 2022.07.11 cycle. Is this OK for you or is there an 
urgency on this change that requires this to be out before the release 
of the 07.11 kernel.

-Zack

On 6/21/22 11:08 AM, Vladimir Sokolovsky wrote:
> Hi Zachary,
> So, when should we expect these patches to be merged into the "master-next" branch?
>
> Regards,
> Vladimir
>
> -----Original Message-----
> From: Zachary Tahenakos <zachary.tahenakos@canonical.com>
> Sent: Tuesday, June 21, 2022 10:05 AM
> To: Bodong Wang <bodong@nvidia.com>; Tim Gardner <tim.gardner@canonical.com>; kernel-team@lists.ubuntu.com
> Cc: Vladimir Sokolovsky <vlad@nvidia.com>; Maor Dickman <maord@nvidia.com>; Oz Shlomo <ozsh@nvidia.com>
> Subject: Re: ACK: [SRU][F:linux-bluefield][PATCH v1 1/2] netfilter: conntrack: remove offload_pickup sysctl again
>
> Hi Bodong,
>
> Yes it isn't merged to master-next on purpose. This patch would have landed for 2022.06.20, however the previous cycle is being extended out.
> We are prioritizing the urgent patches for the 2022.05.30 cycle to ensure those can get out the door smoothly and then once the cycles catch up, we can merge this in, among other patches that need to come in as well.
>
>
> -Zack
>
> On 6/21/22 10:57 AM, Bodong Wang wrote:
>> Hi Tim/Zach,
>>
>> Seems like this patch series is ACKed but not merged to master-next. Could you please check?
>>
>> Thanks,
>> Bodong
>>
>> -----Original Message-----
>> From: Tim Gardner <tim.gardner@canonical.com>
>> Sent: Thursday, June 16, 2022 10:10 AM
>> To: Bodong Wang <bodong@nvidia.com>; kernel-team@lists.ubuntu.com
>> Cc: Vladimir Sokolovsky <vlad@nvidia.com>; Maor Dickman
>> <maord@nvidia.com>; Oz Shlomo <ozsh@nvidia.com>
>> Subject: ACK: [SRU][F:linux-bluefield][PATCH v1 1/2] netfilter:
>> conntrack: remove offload_pickup sysctl again
>>
>> On 5/26/22 07:03, Bodong Wang wrote:
>>> From: Florian Westphal <fw@strlen.de>
>>>
>>> BugLink: https://bugs.launchpad.net/bugs/1975820
>>>
>>> These two sysctls were added because the hardcoded defaults (2
>>> minutes, tcp, 30 seconds, udp) turned out to be too low for some setups.
>>>
>>> They appeared in 5.14-rc1 so it should be fine to remove it again.
>>>
>>> Marcelo convinced me that there should be no difference between a
>>> flow that was offloaded vs. a flow that was not wrt. timeout handling.
>>> Thus the default is changed to those for TCP established and UDP
>>> stream,
>>> 5 days and 120 seconds, respectively.
>>>
>>> Marcelo also suggested to account for the timeout value used for the
>>> offloading, this avoids increase beyond the value in the
>>> conntrack-sysctl and will also instantly expire the conntrack entry with altered sysctls.
>>>
>>> Example:
>>>       nf_conntrack_udp_timeout_stream=60
>>>       nf_flowtable_udp_timeout=60
>>>
>>> This will remove offloaded udp flows after one minute, rather than two.
>>>
>>> An earlier version of this patch also cleared the ASSURED bit to
>>> allow nf_conntrack to evict the entry via early_drop (i.e., table full).
>>> However, it looks like we can safely assume that connection timed out
>>> via HW is still in established state, so this isn't needed.
>>>
>>> Quoting Oz:
>>>     [..] the hardware sends all packets with a set FIN flags to sw.
>>>     [..] Connections that are aged in hardware are expected to be in the
>>>     established state.
>>>
>>> In case it turns out that back-to-sw-path transition can occur for
>>> 'dodgy' connections too (e.g., one side disappeared while
>>> software-path would have been in RETRANS timeout), we can adjust this later.
>>>
>>> Cc: Oz Shlomo <ozsh@nvidia.com>
>>> Cc: Paul Blakey <paulb@nvidia.com>
>>> Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
>>> Signed-off-by: Florian Westphal <fw@strlen.de>
>>> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
>>> Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
>>> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> (backported
>>> from commit 4592ee7f525c4683ec9e290381601fdee50ae110)
>>> Signed-off-by: Bodong Wang <bodong@nvidia.com>
>>> [bodong: remove doc file]
>>>
>>> Conflicts:
>>> 	Documentation/networking/nf_conntrack-sysctl.rst
>>> ---
>>>     include/net/netns/conntrack.h           |  2 --
>>>     net/netfilter/nf_conntrack_proto_tcp.c  |  1 -
>>>     net/netfilter/nf_conntrack_proto_udp.c  |  1 -
>>>     net/netfilter/nf_conntrack_standalone.c | 16 ----------------
>>>     net/netfilter/nf_flow_table_core.c      | 11 ++++++++---
>>>     5 files changed, 8 insertions(+), 23 deletions(-)
>>>
>>> diff --git a/include/net/netns/conntrack.h
>>> b/include/net/netns/conntrack.h index 67bbaa6..9e3963c8 100644
>>> --- a/include/net/netns/conntrack.h
>>> +++ b/include/net/netns/conntrack.h
>>> @@ -29,7 +29,6 @@ struct nf_tcp_net {
>>>     	int tcp_max_retrans;
>>>     #if IS_ENABLED(CONFIG_NF_FLOW_TABLE)
>>>     	unsigned int offload_timeout;
>>> -	unsigned int offload_pickup;
>>>     #endif
>>>     };
>>>     
>>> @@ -43,7 +42,6 @@ struct nf_udp_net {
>>>     	unsigned int timeouts[UDP_CT_MAX];
>>>     #if IS_ENABLED(CONFIG_NF_FLOW_TABLE)
>>>     	unsigned int offload_timeout;
>>> -	unsigned int offload_pickup;
>>>     #endif
>>>     };
>>>     
>>> diff --git a/net/netfilter/nf_conntrack_proto_tcp.c
>>> b/net/netfilter/nf_conntrack_proto_tcp.c
>>> index d8e554c..0e51fb7 100644
>>> --- a/net/netfilter/nf_conntrack_proto_tcp.c
>>> +++ b/net/netfilter/nf_conntrack_proto_tcp.c
>>> @@ -1435,7 +1435,6 @@ void nf_conntrack_tcp_init_net(struct net *net)
>>>     
>>>     #if IS_ENABLED(CONFIG_NF_FLOW_TABLE)
>>>     	tn->offload_timeout = 30 * HZ;
>>> -	tn->offload_pickup = 120 * HZ;
>>>     #endif
>>>     }
>>>     
>>> diff --git a/net/netfilter/nf_conntrack_proto_udp.c
>>> b/net/netfilter/nf_conntrack_proto_udp.c
>>> index b3e135e..ee336d7 100644
>>> --- a/net/netfilter/nf_conntrack_proto_udp.c
>>> +++ b/net/netfilter/nf_conntrack_proto_udp.c
>>> @@ -282,7 +282,6 @@ void nf_conntrack_udp_init_net(struct net *net)
>>>     
>>>     #if IS_ENABLED(CONFIG_NF_FLOW_TABLE)
>>>     	un->offload_timeout = 30 * HZ;
>>> -	un->offload_pickup = 30 * HZ;
>>>     #endif
>>>     }
>>>     
>>> diff --git a/net/netfilter/nf_conntrack_standalone.c
>>> b/net/netfilter/nf_conntrack_standalone.c
>>> index ac8f12b..1e78ad8 100644
>>> --- a/net/netfilter/nf_conntrack_standalone.c
>>> +++ b/net/netfilter/nf_conntrack_standalone.c
>>> @@ -569,7 +569,6 @@ enum nf_ct_sysctl_index {
>>>     	NF_SYSCTL_CT_PROTO_TIMEOUT_TCP_UNACK,
>>>     #if IS_ENABLED(CONFIG_NF_FLOW_TABLE)
>>>     	NF_SYSCTL_CT_PROTO_TIMEOUT_TCP_OFFLOAD,
>>> -	NF_SYSCTL_CT_PROTO_TIMEOUT_TCP_OFFLOAD_PICKUP,
>>>     #endif
>>>     	NF_SYSCTL_CT_PROTO_TCP_LOOSE,
>>>     	NF_SYSCTL_CT_PROTO_TCP_LIBERAL,
>>> @@ -578,7 +577,6 @@ enum nf_ct_sysctl_index {
>>>     	NF_SYSCTL_CT_PROTO_TIMEOUT_UDP_STREAM,
>>>     #if IS_ENABLED(CONFIG_NF_FLOW_TABLE)
>>>     	NF_SYSCTL_CT_PROTO_TIMEOUT_UDP_OFFLOAD,
>>> -	NF_SYSCTL_CT_PROTO_TIMEOUT_UDP_OFFLOAD_PICKUP,
>>>     #endif
>>>     	NF_SYSCTL_CT_PROTO_TIMEOUT_ICMP,
>>>     	NF_SYSCTL_CT_PROTO_TIMEOUT_ICMPV6,
>>> @@ -773,12 +771,6 @@ enum nf_ct_sysctl_index {
>>>     		.mode		= 0644,
>>>     		.proc_handler	= proc_dointvec_jiffies,
>>>     	},
>>> -	[NF_SYSCTL_CT_PROTO_TIMEOUT_TCP_OFFLOAD_PICKUP] = {
>>> -		.procname	= "nf_flowtable_tcp_pickup",
>>> -		.maxlen		= sizeof(unsigned int),
>>> -		.mode		= 0644,
>>> -		.proc_handler	= proc_dointvec_jiffies,
>>> -	},
>>>     #endif
>>>     	[NF_SYSCTL_CT_PROTO_TCP_LOOSE] = {
>>>     		.procname	= "nf_conntrack_tcp_loose",
>>> @@ -821,12 +813,6 @@ enum nf_ct_sysctl_index {
>>>     		.mode		= 0644,
>>>     		.proc_handler	= proc_dointvec_jiffies,
>>>     	},
>>> -	[NF_SYSCTL_CT_PROTO_TIMEOUT_UDP_OFFLOAD_PICKUP] = {
>>> -		.procname	= "nf_flowtable_udp_pickup",
>>> -		.maxlen		= sizeof(unsigned int),
>>> -		.mode		= 0644,
>>> -		.proc_handler	= proc_dointvec_jiffies,
>>> -	},
>>>     #endif
>>>     	[NF_SYSCTL_CT_PROTO_TIMEOUT_ICMP] = {
>>>     		.procname	= "nf_conntrack_icmp_timeout",
>>> @@ -1006,7 +992,6 @@ static void
>>> nf_conntrack_standalone_init_tcp_sysctl(struct net *net,
>>>     
>>>     #if IS_ENABLED(CONFIG_NF_FLOW_TABLE)
>>>     	table[NF_SYSCTL_CT_PROTO_TIMEOUT_TCP_OFFLOAD].data = &tn->offload_timeout;
>>> -	table[NF_SYSCTL_CT_PROTO_TIMEOUT_TCP_OFFLOAD_PICKUP].data = &tn->offload_pickup;
>>>     #endif
>>>     
>>>     }
>>> @@ -1098,7 +1083,6 @@ static int nf_conntrack_standalone_init_sysctl(struct net *net)
>>>     	table[NF_SYSCTL_CT_PROTO_TIMEOUT_UDP_STREAM].data = &un->timeouts[UDP_CT_REPLIED];
>>>     #if IS_ENABLED(CONFIG_NF_FLOW_TABLE)
>>>     	table[NF_SYSCTL_CT_PROTO_TIMEOUT_UDP_OFFLOAD].data = &un->offload_timeout;
>>> -	table[NF_SYSCTL_CT_PROTO_TIMEOUT_UDP_OFFLOAD_PICKUP].data = &un->offload_pickup;
>>>     #endif
>>>     
>>>     	nf_conntrack_standalone_init_tcp_sysctl(net, table); diff --git
>>> a/net/netfilter/nf_flow_table_core.c
>>> b/net/netfilter/nf_flow_table_core.c
>>> index 8ed2752..ba86bc0 100644
>>> --- a/net/netfilter/nf_flow_table_core.c
>>> +++ b/net/netfilter/nf_flow_table_core.c
>>> @@ -136,7 +136,7 @@ static void flow_offload_fixup_ct_timeout(struct nf_conn *ct)
>>>     	const struct nf_conntrack_l4proto *l4proto;
>>>     	struct net *net = nf_ct_net(ct);
>>>     	int l4num = nf_ct_protonum(ct);
>>> -	unsigned int timeout;
>>> +	s32 timeout;
>>>     
>>>     	l4proto = nf_ct_l4proto_find(l4num);
>>>     	if (!l4proto)
>>> @@ -145,15 +145,20 @@ static void flow_offload_fixup_ct_timeout(struct nf_conn *ct)
>>>     	if (l4num == IPPROTO_TCP) {
>>>     		struct nf_tcp_net *tn = nf_tcp_pernet(net);
>>>     
>>> -		timeout = tn->offload_pickup;
>>> +		timeout = tn->timeouts[TCP_CONNTRACK_ESTABLISHED];
>>> +		timeout -= tn->offload_timeout;
>>>     	} else if (l4num == IPPROTO_UDP) {
>>>     		struct nf_udp_net *tn = nf_udp_pernet(net);
>>>     
>>> -		timeout = tn->offload_pickup;
>>> +		timeout = tn->timeouts[UDP_CT_REPLIED];
>>> +		timeout -= tn->offload_timeout;
>>>     	} else {
>>>     		return;
>>>     	}
>>>     
>>> +	if (timeout < 0)
>>> +		timeout = 0;
>>> +
>>>     	if (nf_flow_timeout_delta(ct->timeout) > (__s32)timeout)
>>>     		ct->timeout = nfct_time_stamp + timeout;
>>>     }
>> Acked-by: Tim Gardner <tim.gardner@canonical.com>
>>
>> --
>> -----------
>> Tim Gardner
>> Canonical, Inc
Vladimir Sokolovsky July 6, 2022, 1:30 p.m. UTC | #7
Hi Zack,
We need to have signed kernel DEBs with these patches not later than July 18. Otherwise, it will be too late for our release.
Can this be done?

Regards,
Vladimir

-----Original Message-----
From: Zachary Tahenakos <zachary.tahenakos@canonical.com> 
Sent: Wednesday, July 6, 2022 8:07 AM
To: Vladimir Sokolovsky <vlad@nvidia.com>; Bodong Wang <bodong@nvidia.com>; Tim Gardner <tim.gardner@canonical.com>; kernel-team@lists.ubuntu.com
Cc: Maor Dickman <maord@nvidia.com>; Oz Shlomo <ozsh@nvidia.com>
Subject: Re: ACK: [SRU][F:linux-bluefield][PATCH v1 1/2] netfilter: conntrack: remove offload_pickup sysctl again

Hello Vladimir and all,

This patch set is currently slated to go into the 2022.06.20 f:bluefield kernel, however this cycle is currently delayed due to the previous cycle and other factors. In order to get bluefield back on schedule, I would like to skip the 06.20 cycle and instead have this change set get released in the 2022.07.11 cycle. Is this OK for you or is there an urgency on this change that requires this to be out before the release of the 07.11 kernel.

-Zack

On 6/21/22 11:08 AM, Vladimir Sokolovsky wrote:
> Hi Zachary,
> So, when should we expect these patches to be merged into the "master-next" branch?
>
> Regards,
> Vladimir
>
> -----Original Message-----
> From: Zachary Tahenakos <zachary.tahenakos@canonical.com>
> Sent: Tuesday, June 21, 2022 10:05 AM
> To: Bodong Wang <bodong@nvidia.com>; Tim Gardner <tim.gardner@canonical.com>; kernel-team@lists.ubuntu.com
> Cc: Vladimir Sokolovsky <vlad@nvidia.com>; Maor Dickman <maord@nvidia.com>; Oz Shlomo <ozsh@nvidia.com>
> Subject: Re: ACK: [SRU][F:linux-bluefield][PATCH v1 1/2] netfilter: conntrack: remove offload_pickup sysctl again
>
> Hi Bodong,
>
> Yes it isn't merged to master-next on purpose. This patch would have landed for 2022.06.20, however the previous cycle is being extended out.
> We are prioritizing the urgent patches for the 2022.05.30 cycle to ensure those can get out the door smoothly and then once the cycles catch up, we can merge this in, among other patches that need to come in as well.
>
>
> -Zack
>
> On 6/21/22 10:57 AM, Bodong Wang wrote:
>> Hi Tim/Zach,
>>
>> Seems like this patch series is ACKed but not merged to master-next. Could you please check?
>>
>> Thanks,
>> Bodong
>>
>> -----Original Message-----
>> From: Tim Gardner <tim.gardner@canonical.com>
>> Sent: Thursday, June 16, 2022 10:10 AM
>> To: Bodong Wang <bodong@nvidia.com>; kernel-team@lists.ubuntu.com
>> Cc: Vladimir Sokolovsky <vlad@nvidia.com>; Maor Dickman
>> <maord@nvidia.com>; Oz Shlomo <ozsh@nvidia.com>
>> Subject: ACK: [SRU][F:linux-bluefield][PATCH v1 1/2] netfilter:
>> conntrack: remove offload_pickup sysctl again
>>
>> On 5/26/22 07:03, Bodong Wang wrote:
>>> From: Florian Westphal <fw@strlen.de>
>>>
>>> BugLink: https://bugs.launchpad.net/bugs/1975820
>>>
>>> These two sysctls were added because the hardcoded defaults (2
>>> minutes, tcp, 30 seconds, udp) turned out to be too low for some setups.
>>>
>>> They appeared in 5.14-rc1 so it should be fine to remove it again.
>>>
>>> Marcelo convinced me that there should be no difference between a
>>> flow that was offloaded vs. a flow that was not wrt. timeout handling.
>>> Thus the default is changed to those for TCP established and UDP
>>> stream,
>>> 5 days and 120 seconds, respectively.
>>>
>>> Marcelo also suggested to account for the timeout value used for the
>>> offloading, this avoids increase beyond the value in the
>>> conntrack-sysctl and will also instantly expire the conntrack entry with altered sysctls.
>>>
>>> Example:
>>>       nf_conntrack_udp_timeout_stream=60
>>>       nf_flowtable_udp_timeout=60
>>>
>>> This will remove offloaded udp flows after one minute, rather than two.
>>>
>>> An earlier version of this patch also cleared the ASSURED bit to
>>> allow nf_conntrack to evict the entry via early_drop (i.e., table full).
>>> However, it looks like we can safely assume that connection timed out
>>> via HW is still in established state, so this isn't needed.
>>>
>>> Quoting Oz:
>>>     [..] the hardware sends all packets with a set FIN flags to sw.
>>>     [..] Connections that are aged in hardware are expected to be in the
>>>     established state.
>>>
>>> In case it turns out that back-to-sw-path transition can occur for
>>> 'dodgy' connections too (e.g., one side disappeared while
>>> software-path would have been in RETRANS timeout), we can adjust this later.
>>>
>>> Cc: Oz Shlomo <ozsh@nvidia.com>
>>> Cc: Paul Blakey <paulb@nvidia.com>
>>> Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
>>> Signed-off-by: Florian Westphal <fw@strlen.de>
>>> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
>>> Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
>>> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> (backported
>>> from commit 4592ee7f525c4683ec9e290381601fdee50ae110)
>>> Signed-off-by: Bodong Wang <bodong@nvidia.com>
>>> [bodong: remove doc file]
>>>
>>> Conflicts:
>>> 	Documentation/networking/nf_conntrack-sysctl.rst
>>> ---
>>>     include/net/netns/conntrack.h           |  2 --
>>>     net/netfilter/nf_conntrack_proto_tcp.c  |  1 -
>>>     net/netfilter/nf_conntrack_proto_udp.c  |  1 -
>>>     net/netfilter/nf_conntrack_standalone.c | 16 ----------------
>>>     net/netfilter/nf_flow_table_core.c      | 11 ++++++++---
>>>     5 files changed, 8 insertions(+), 23 deletions(-)
>>>
>>> diff --git a/include/net/netns/conntrack.h
>>> b/include/net/netns/conntrack.h index 67bbaa6..9e3963c8 100644
>>> --- a/include/net/netns/conntrack.h
>>> +++ b/include/net/netns/conntrack.h
>>> @@ -29,7 +29,6 @@ struct nf_tcp_net {
>>>     	int tcp_max_retrans;
>>>     #if IS_ENABLED(CONFIG_NF_FLOW_TABLE)
>>>     	unsigned int offload_timeout;
>>> -	unsigned int offload_pickup;
>>>     #endif
>>>     };
>>>     
>>> @@ -43,7 +42,6 @@ struct nf_udp_net {
>>>     	unsigned int timeouts[UDP_CT_MAX];
>>>     #if IS_ENABLED(CONFIG_NF_FLOW_TABLE)
>>>     	unsigned int offload_timeout;
>>> -	unsigned int offload_pickup;
>>>     #endif
>>>     };
>>>     
>>> diff --git a/net/netfilter/nf_conntrack_proto_tcp.c
>>> b/net/netfilter/nf_conntrack_proto_tcp.c
>>> index d8e554c..0e51fb7 100644
>>> --- a/net/netfilter/nf_conntrack_proto_tcp.c
>>> +++ b/net/netfilter/nf_conntrack_proto_tcp.c
>>> @@ -1435,7 +1435,6 @@ void nf_conntrack_tcp_init_net(struct net *net)
>>>     
>>>     #if IS_ENABLED(CONFIG_NF_FLOW_TABLE)
>>>     	tn->offload_timeout = 30 * HZ;
>>> -	tn->offload_pickup = 120 * HZ;
>>>     #endif
>>>     }
>>>     
>>> diff --git a/net/netfilter/nf_conntrack_proto_udp.c
>>> b/net/netfilter/nf_conntrack_proto_udp.c
>>> index b3e135e..ee336d7 100644
>>> --- a/net/netfilter/nf_conntrack_proto_udp.c
>>> +++ b/net/netfilter/nf_conntrack_proto_udp.c
>>> @@ -282,7 +282,6 @@ void nf_conntrack_udp_init_net(struct net *net)
>>>     
>>>     #if IS_ENABLED(CONFIG_NF_FLOW_TABLE)
>>>     	un->offload_timeout = 30 * HZ;
>>> -	un->offload_pickup = 30 * HZ;
>>>     #endif
>>>     }
>>>     
>>> diff --git a/net/netfilter/nf_conntrack_standalone.c
>>> b/net/netfilter/nf_conntrack_standalone.c
>>> index ac8f12b..1e78ad8 100644
>>> --- a/net/netfilter/nf_conntrack_standalone.c
>>> +++ b/net/netfilter/nf_conntrack_standalone.c
>>> @@ -569,7 +569,6 @@ enum nf_ct_sysctl_index {
>>>     	NF_SYSCTL_CT_PROTO_TIMEOUT_TCP_UNACK,
>>>     #if IS_ENABLED(CONFIG_NF_FLOW_TABLE)
>>>     	NF_SYSCTL_CT_PROTO_TIMEOUT_TCP_OFFLOAD,
>>> -	NF_SYSCTL_CT_PROTO_TIMEOUT_TCP_OFFLOAD_PICKUP,
>>>     #endif
>>>     	NF_SYSCTL_CT_PROTO_TCP_LOOSE,
>>>     	NF_SYSCTL_CT_PROTO_TCP_LIBERAL,
>>> @@ -578,7 +577,6 @@ enum nf_ct_sysctl_index {
>>>     	NF_SYSCTL_CT_PROTO_TIMEOUT_UDP_STREAM,
>>>     #if IS_ENABLED(CONFIG_NF_FLOW_TABLE)
>>>     	NF_SYSCTL_CT_PROTO_TIMEOUT_UDP_OFFLOAD,
>>> -	NF_SYSCTL_CT_PROTO_TIMEOUT_UDP_OFFLOAD_PICKUP,
>>>     #endif
>>>     	NF_SYSCTL_CT_PROTO_TIMEOUT_ICMP,
>>>     	NF_SYSCTL_CT_PROTO_TIMEOUT_ICMPV6,
>>> @@ -773,12 +771,6 @@ enum nf_ct_sysctl_index {
>>>     		.mode		= 0644,
>>>     		.proc_handler	= proc_dointvec_jiffies,
>>>     	},
>>> -	[NF_SYSCTL_CT_PROTO_TIMEOUT_TCP_OFFLOAD_PICKUP] = {
>>> -		.procname	= "nf_flowtable_tcp_pickup",
>>> -		.maxlen		= sizeof(unsigned int),
>>> -		.mode		= 0644,
>>> -		.proc_handler	= proc_dointvec_jiffies,
>>> -	},
>>>     #endif
>>>     	[NF_SYSCTL_CT_PROTO_TCP_LOOSE] = {
>>>     		.procname	= "nf_conntrack_tcp_loose",
>>> @@ -821,12 +813,6 @@ enum nf_ct_sysctl_index {
>>>     		.mode		= 0644,
>>>     		.proc_handler	= proc_dointvec_jiffies,
>>>     	},
>>> -	[NF_SYSCTL_CT_PROTO_TIMEOUT_UDP_OFFLOAD_PICKUP] = {
>>> -		.procname	= "nf_flowtable_udp_pickup",
>>> -		.maxlen		= sizeof(unsigned int),
>>> -		.mode		= 0644,
>>> -		.proc_handler	= proc_dointvec_jiffies,
>>> -	},
>>>     #endif
>>>     	[NF_SYSCTL_CT_PROTO_TIMEOUT_ICMP] = {
>>>     		.procname	= "nf_conntrack_icmp_timeout",
>>> @@ -1006,7 +992,6 @@ static void
>>> nf_conntrack_standalone_init_tcp_sysctl(struct net *net,
>>>     
>>>     #if IS_ENABLED(CONFIG_NF_FLOW_TABLE)
>>>     	table[NF_SYSCTL_CT_PROTO_TIMEOUT_TCP_OFFLOAD].data = &tn->offload_timeout;
>>> -	table[NF_SYSCTL_CT_PROTO_TIMEOUT_TCP_OFFLOAD_PICKUP].data = &tn->offload_pickup;
>>>     #endif
>>>     
>>>     }
>>> @@ -1098,7 +1083,6 @@ static int nf_conntrack_standalone_init_sysctl(struct net *net)
>>>     	table[NF_SYSCTL_CT_PROTO_TIMEOUT_UDP_STREAM].data = &un->timeouts[UDP_CT_REPLIED];
>>>     #if IS_ENABLED(CONFIG_NF_FLOW_TABLE)
>>>     	table[NF_SYSCTL_CT_PROTO_TIMEOUT_UDP_OFFLOAD].data = &un->offload_timeout;
>>> -	table[NF_SYSCTL_CT_PROTO_TIMEOUT_UDP_OFFLOAD_PICKUP].data = &un->offload_pickup;
>>>     #endif
>>>     
>>>     	nf_conntrack_standalone_init_tcp_sysctl(net, table); diff --git
>>> a/net/netfilter/nf_flow_table_core.c
>>> b/net/netfilter/nf_flow_table_core.c
>>> index 8ed2752..ba86bc0 100644
>>> --- a/net/netfilter/nf_flow_table_core.c
>>> +++ b/net/netfilter/nf_flow_table_core.c
>>> @@ -136,7 +136,7 @@ static void flow_offload_fixup_ct_timeout(struct nf_conn *ct)
>>>     	const struct nf_conntrack_l4proto *l4proto;
>>>     	struct net *net = nf_ct_net(ct);
>>>     	int l4num = nf_ct_protonum(ct);
>>> -	unsigned int timeout;
>>> +	s32 timeout;
>>>     
>>>     	l4proto = nf_ct_l4proto_find(l4num);
>>>     	if (!l4proto)
>>> @@ -145,15 +145,20 @@ static void flow_offload_fixup_ct_timeout(struct nf_conn *ct)
>>>     	if (l4num == IPPROTO_TCP) {
>>>     		struct nf_tcp_net *tn = nf_tcp_pernet(net);
>>>     
>>> -		timeout = tn->offload_pickup;
>>> +		timeout = tn->timeouts[TCP_CONNTRACK_ESTABLISHED];
>>> +		timeout -= tn->offload_timeout;
>>>     	} else if (l4num == IPPROTO_UDP) {
>>>     		struct nf_udp_net *tn = nf_udp_pernet(net);
>>>     
>>> -		timeout = tn->offload_pickup;
>>> +		timeout = tn->timeouts[UDP_CT_REPLIED];
>>> +		timeout -= tn->offload_timeout;
>>>     	} else {
>>>     		return;
>>>     	}
>>>     
>>> +	if (timeout < 0)
>>> +		timeout = 0;
>>> +
>>>     	if (nf_flow_timeout_delta(ct->timeout) > (__s32)timeout)
>>>     		ct->timeout = nfct_time_stamp + timeout;
>>>     }
>> Acked-by: Tim Gardner <tim.gardner@canonical.com>
>>
>> --
>> -----------
>> Tim Gardner
>> Canonical, Inc
Zachary Tahenakos July 6, 2022, 2:13 p.m. UTC | #8
Hey Vladimir,

No that will not be doable for 2022.07.11, so we will try to get this 
done for 2022.06.20.

-Zack

On 7/6/22 9:30 AM, Vladimir Sokolovsky wrote:
> Hi Zack,
> We need to have signed kernel DEBs with these patches not later than July 18. Otherwise, it will be too late for our release.
> Can this be done?
>
> Regards,
> Vladimir
>
> -----Original Message-----
> From: Zachary Tahenakos <zachary.tahenakos@canonical.com>
> Sent: Wednesday, July 6, 2022 8:07 AM
> To: Vladimir Sokolovsky <vlad@nvidia.com>; Bodong Wang <bodong@nvidia.com>; Tim Gardner <tim.gardner@canonical.com>; kernel-team@lists.ubuntu.com
> Cc: Maor Dickman <maord@nvidia.com>; Oz Shlomo <ozsh@nvidia.com>
> Subject: Re: ACK: [SRU][F:linux-bluefield][PATCH v1 1/2] netfilter: conntrack: remove offload_pickup sysctl again
>
> Hello Vladimir and all,
>
> This patch set is currently slated to go into the 2022.06.20 f:bluefield kernel, however this cycle is currently delayed due to the previous cycle and other factors. In order to get bluefield back on schedule, I would like to skip the 06.20 cycle and instead have this change set get released in the 2022.07.11 cycle. Is this OK for you or is there an urgency on this change that requires this to be out before the release of the 07.11 kernel.
>
> -Zack
>
> On 6/21/22 11:08 AM, Vladimir Sokolovsky wrote:
>> Hi Zachary,
>> So, when should we expect these patches to be merged into the "master-next" branch?
>>
>> Regards,
>> Vladimir
>>
>> -----Original Message-----
>> From: Zachary Tahenakos <zachary.tahenakos@canonical.com>
>> Sent: Tuesday, June 21, 2022 10:05 AM
>> To: Bodong Wang <bodong@nvidia.com>; Tim Gardner <tim.gardner@canonical.com>; kernel-team@lists.ubuntu.com
>> Cc: Vladimir Sokolovsky <vlad@nvidia.com>; Maor Dickman <maord@nvidia.com>; Oz Shlomo <ozsh@nvidia.com>
>> Subject: Re: ACK: [SRU][F:linux-bluefield][PATCH v1 1/2] netfilter: conntrack: remove offload_pickup sysctl again
>>
>> Hi Bodong,
>>
>> Yes it isn't merged to master-next on purpose. This patch would have landed for 2022.06.20, however the previous cycle is being extended out.
>> We are prioritizing the urgent patches for the 2022.05.30 cycle to ensure those can get out the door smoothly and then once the cycles catch up, we can merge this in, among other patches that need to come in as well.
>>
>>
>> -Zack
>>
>> On 6/21/22 10:57 AM, Bodong Wang wrote:
>>> Hi Tim/Zach,
>>>
>>> Seems like this patch series is ACKed but not merged to master-next. Could you please check?
>>>
>>> Thanks,
>>> Bodong
>>>
>>> -----Original Message-----
>>> From: Tim Gardner <tim.gardner@canonical.com>
>>> Sent: Thursday, June 16, 2022 10:10 AM
>>> To: Bodong Wang <bodong@nvidia.com>; kernel-team@lists.ubuntu.com
>>> Cc: Vladimir Sokolovsky <vlad@nvidia.com>; Maor Dickman
>>> <maord@nvidia.com>; Oz Shlomo <ozsh@nvidia.com>
>>> Subject: ACK: [SRU][F:linux-bluefield][PATCH v1 1/2] netfilter:
>>> conntrack: remove offload_pickup sysctl again
>>>
>>> On 5/26/22 07:03, Bodong Wang wrote:
>>>> From: Florian Westphal <fw@strlen.de>
>>>>
>>>> BugLink: https://bugs.launchpad.net/bugs/1975820
>>>>
>>>> These two sysctls were added because the hardcoded defaults (2
>>>> minutes, tcp, 30 seconds, udp) turned out to be too low for some setups.
>>>>
>>>> They appeared in 5.14-rc1 so it should be fine to remove it again.
>>>>
>>>> Marcelo convinced me that there should be no difference between a
>>>> flow that was offloaded vs. a flow that was not wrt. timeout handling.
>>>> Thus the default is changed to those for TCP established and UDP
>>>> stream,
>>>> 5 days and 120 seconds, respectively.
>>>>
>>>> Marcelo also suggested to account for the timeout value used for the
>>>> offloading, this avoids increase beyond the value in the
>>>> conntrack-sysctl and will also instantly expire the conntrack entry with altered sysctls.
>>>>
>>>> Example:
>>>>        nf_conntrack_udp_timeout_stream=60
>>>>        nf_flowtable_udp_timeout=60
>>>>
>>>> This will remove offloaded udp flows after one minute, rather than two.
>>>>
>>>> An earlier version of this patch also cleared the ASSURED bit to
>>>> allow nf_conntrack to evict the entry via early_drop (i.e., table full).
>>>> However, it looks like we can safely assume that connection timed out
>>>> via HW is still in established state, so this isn't needed.
>>>>
>>>> Quoting Oz:
>>>>      [..] the hardware sends all packets with a set FIN flags to sw.
>>>>      [..] Connections that are aged in hardware are expected to be in the
>>>>      established state.
>>>>
>>>> In case it turns out that back-to-sw-path transition can occur for
>>>> 'dodgy' connections too (e.g., one side disappeared while
>>>> software-path would have been in RETRANS timeout), we can adjust this later.
>>>>
>>>> Cc: Oz Shlomo <ozsh@nvidia.com>
>>>> Cc: Paul Blakey <paulb@nvidia.com>
>>>> Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
>>>> Signed-off-by: Florian Westphal <fw@strlen.de>
>>>> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
>>>> Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
>>>> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> (backported
>>>> from commit 4592ee7f525c4683ec9e290381601fdee50ae110)
>>>> Signed-off-by: Bodong Wang <bodong@nvidia.com>
>>>> [bodong: remove doc file]
>>>>
>>>> Conflicts:
>>>> 	Documentation/networking/nf_conntrack-sysctl.rst
>>>> ---
>>>>      include/net/netns/conntrack.h           |  2 --
>>>>      net/netfilter/nf_conntrack_proto_tcp.c  |  1 -
>>>>      net/netfilter/nf_conntrack_proto_udp.c  |  1 -
>>>>      net/netfilter/nf_conntrack_standalone.c | 16 ----------------
>>>>      net/netfilter/nf_flow_table_core.c      | 11 ++++++++---
>>>>      5 files changed, 8 insertions(+), 23 deletions(-)
>>>>
>>>> diff --git a/include/net/netns/conntrack.h
>>>> b/include/net/netns/conntrack.h index 67bbaa6..9e3963c8 100644
>>>> --- a/include/net/netns/conntrack.h
>>>> +++ b/include/net/netns/conntrack.h
>>>> @@ -29,7 +29,6 @@ struct nf_tcp_net {
>>>>      	int tcp_max_retrans;
>>>>      #if IS_ENABLED(CONFIG_NF_FLOW_TABLE)
>>>>      	unsigned int offload_timeout;
>>>> -	unsigned int offload_pickup;
>>>>      #endif
>>>>      };
>>>>      
>>>> @@ -43,7 +42,6 @@ struct nf_udp_net {
>>>>      	unsigned int timeouts[UDP_CT_MAX];
>>>>      #if IS_ENABLED(CONFIG_NF_FLOW_TABLE)
>>>>      	unsigned int offload_timeout;
>>>> -	unsigned int offload_pickup;
>>>>      #endif
>>>>      };
>>>>      
>>>> diff --git a/net/netfilter/nf_conntrack_proto_tcp.c
>>>> b/net/netfilter/nf_conntrack_proto_tcp.c
>>>> index d8e554c..0e51fb7 100644
>>>> --- a/net/netfilter/nf_conntrack_proto_tcp.c
>>>> +++ b/net/netfilter/nf_conntrack_proto_tcp.c
>>>> @@ -1435,7 +1435,6 @@ void nf_conntrack_tcp_init_net(struct net *net)
>>>>      
>>>>      #if IS_ENABLED(CONFIG_NF_FLOW_TABLE)
>>>>      	tn->offload_timeout = 30 * HZ;
>>>> -	tn->offload_pickup = 120 * HZ;
>>>>      #endif
>>>>      }
>>>>      
>>>> diff --git a/net/netfilter/nf_conntrack_proto_udp.c
>>>> b/net/netfilter/nf_conntrack_proto_udp.c
>>>> index b3e135e..ee336d7 100644
>>>> --- a/net/netfilter/nf_conntrack_proto_udp.c
>>>> +++ b/net/netfilter/nf_conntrack_proto_udp.c
>>>> @@ -282,7 +282,6 @@ void nf_conntrack_udp_init_net(struct net *net)
>>>>      
>>>>      #if IS_ENABLED(CONFIG_NF_FLOW_TABLE)
>>>>      	un->offload_timeout = 30 * HZ;
>>>> -	un->offload_pickup = 30 * HZ;
>>>>      #endif
>>>>      }
>>>>      
>>>> diff --git a/net/netfilter/nf_conntrack_standalone.c
>>>> b/net/netfilter/nf_conntrack_standalone.c
>>>> index ac8f12b..1e78ad8 100644
>>>> --- a/net/netfilter/nf_conntrack_standalone.c
>>>> +++ b/net/netfilter/nf_conntrack_standalone.c
>>>> @@ -569,7 +569,6 @@ enum nf_ct_sysctl_index {
>>>>      	NF_SYSCTL_CT_PROTO_TIMEOUT_TCP_UNACK,
>>>>      #if IS_ENABLED(CONFIG_NF_FLOW_TABLE)
>>>>      	NF_SYSCTL_CT_PROTO_TIMEOUT_TCP_OFFLOAD,
>>>> -	NF_SYSCTL_CT_PROTO_TIMEOUT_TCP_OFFLOAD_PICKUP,
>>>>      #endif
>>>>      	NF_SYSCTL_CT_PROTO_TCP_LOOSE,
>>>>      	NF_SYSCTL_CT_PROTO_TCP_LIBERAL,
>>>> @@ -578,7 +577,6 @@ enum nf_ct_sysctl_index {
>>>>      	NF_SYSCTL_CT_PROTO_TIMEOUT_UDP_STREAM,
>>>>      #if IS_ENABLED(CONFIG_NF_FLOW_TABLE)
>>>>      	NF_SYSCTL_CT_PROTO_TIMEOUT_UDP_OFFLOAD,
>>>> -	NF_SYSCTL_CT_PROTO_TIMEOUT_UDP_OFFLOAD_PICKUP,
>>>>      #endif
>>>>      	NF_SYSCTL_CT_PROTO_TIMEOUT_ICMP,
>>>>      	NF_SYSCTL_CT_PROTO_TIMEOUT_ICMPV6,
>>>> @@ -773,12 +771,6 @@ enum nf_ct_sysctl_index {
>>>>      		.mode		= 0644,
>>>>      		.proc_handler	= proc_dointvec_jiffies,
>>>>      	},
>>>> -	[NF_SYSCTL_CT_PROTO_TIMEOUT_TCP_OFFLOAD_PICKUP] = {
>>>> -		.procname	= "nf_flowtable_tcp_pickup",
>>>> -		.maxlen		= sizeof(unsigned int),
>>>> -		.mode		= 0644,
>>>> -		.proc_handler	= proc_dointvec_jiffies,
>>>> -	},
>>>>      #endif
>>>>      	[NF_SYSCTL_CT_PROTO_TCP_LOOSE] = {
>>>>      		.procname	= "nf_conntrack_tcp_loose",
>>>> @@ -821,12 +813,6 @@ enum nf_ct_sysctl_index {
>>>>      		.mode		= 0644,
>>>>      		.proc_handler	= proc_dointvec_jiffies,
>>>>      	},
>>>> -	[NF_SYSCTL_CT_PROTO_TIMEOUT_UDP_OFFLOAD_PICKUP] = {
>>>> -		.procname	= "nf_flowtable_udp_pickup",
>>>> -		.maxlen		= sizeof(unsigned int),
>>>> -		.mode		= 0644,
>>>> -		.proc_handler	= proc_dointvec_jiffies,
>>>> -	},
>>>>      #endif
>>>>      	[NF_SYSCTL_CT_PROTO_TIMEOUT_ICMP] = {
>>>>      		.procname	= "nf_conntrack_icmp_timeout",
>>>> @@ -1006,7 +992,6 @@ static void
>>>> nf_conntrack_standalone_init_tcp_sysctl(struct net *net,
>>>>      
>>>>      #if IS_ENABLED(CONFIG_NF_FLOW_TABLE)
>>>>      	table[NF_SYSCTL_CT_PROTO_TIMEOUT_TCP_OFFLOAD].data = &tn->offload_timeout;
>>>> -	table[NF_SYSCTL_CT_PROTO_TIMEOUT_TCP_OFFLOAD_PICKUP].data = &tn->offload_pickup;
>>>>      #endif
>>>>      
>>>>      }
>>>> @@ -1098,7 +1083,6 @@ static int nf_conntrack_standalone_init_sysctl(struct net *net)
>>>>      	table[NF_SYSCTL_CT_PROTO_TIMEOUT_UDP_STREAM].data = &un->timeouts[UDP_CT_REPLIED];
>>>>      #if IS_ENABLED(CONFIG_NF_FLOW_TABLE)
>>>>      	table[NF_SYSCTL_CT_PROTO_TIMEOUT_UDP_OFFLOAD].data = &un->offload_timeout;
>>>> -	table[NF_SYSCTL_CT_PROTO_TIMEOUT_UDP_OFFLOAD_PICKUP].data = &un->offload_pickup;
>>>>      #endif
>>>>      
>>>>      	nf_conntrack_standalone_init_tcp_sysctl(net, table); diff --git
>>>> a/net/netfilter/nf_flow_table_core.c
>>>> b/net/netfilter/nf_flow_table_core.c
>>>> index 8ed2752..ba86bc0 100644
>>>> --- a/net/netfilter/nf_flow_table_core.c
>>>> +++ b/net/netfilter/nf_flow_table_core.c
>>>> @@ -136,7 +136,7 @@ static void flow_offload_fixup_ct_timeout(struct nf_conn *ct)
>>>>      	const struct nf_conntrack_l4proto *l4proto;
>>>>      	struct net *net = nf_ct_net(ct);
>>>>      	int l4num = nf_ct_protonum(ct);
>>>> -	unsigned int timeout;
>>>> +	s32 timeout;
>>>>      
>>>>      	l4proto = nf_ct_l4proto_find(l4num);
>>>>      	if (!l4proto)
>>>> @@ -145,15 +145,20 @@ static void flow_offload_fixup_ct_timeout(struct nf_conn *ct)
>>>>      	if (l4num == IPPROTO_TCP) {
>>>>      		struct nf_tcp_net *tn = nf_tcp_pernet(net);
>>>>      
>>>> -		timeout = tn->offload_pickup;
>>>> +		timeout = tn->timeouts[TCP_CONNTRACK_ESTABLISHED];
>>>> +		timeout -= tn->offload_timeout;
>>>>      	} else if (l4num == IPPROTO_UDP) {
>>>>      		struct nf_udp_net *tn = nf_udp_pernet(net);
>>>>      
>>>> -		timeout = tn->offload_pickup;
>>>> +		timeout = tn->timeouts[UDP_CT_REPLIED];
>>>> +		timeout -= tn->offload_timeout;
>>>>      	} else {
>>>>      		return;
>>>>      	}
>>>>      
>>>> +	if (timeout < 0)
>>>> +		timeout = 0;
>>>> +
>>>>      	if (nf_flow_timeout_delta(ct->timeout) > (__s32)timeout)
>>>>      		ct->timeout = nfct_time_stamp + timeout;
>>>>      }
>>> Acked-by: Tim Gardner <tim.gardner@canonical.com>
>>>
>>> --
>>> -----------
>>> Tim Gardner
>>> Canonical, Inc
Vladimir Sokolovsky July 6, 2022, 2:15 p.m. UTC | #9
Thank you, Zack.

Regards,
Vladimir

-----Original Message-----
From: Zachary Tahenakos <zachary.tahenakos@canonical.com> 
Sent: Wednesday, July 6, 2022 9:14 AM
To: Vladimir Sokolovsky <vlad@nvidia.com>; Bodong Wang <bodong@nvidia.com>; Tim Gardner <tim.gardner@canonical.com>; kernel-team@lists.ubuntu.com
Cc: Maor Dickman <maord@nvidia.com>; Oz Shlomo <ozsh@nvidia.com>
Subject: Re: ACK: [SRU][F:linux-bluefield][PATCH v1 1/2] netfilter: conntrack: remove offload_pickup sysctl again

Hey Vladimir,

No that will not be doable for 2022.07.11, so we will try to get this done for 2022.06.20.

-Zack

On 7/6/22 9:30 AM, Vladimir Sokolovsky wrote:
> Hi Zack,
> We need to have signed kernel DEBs with these patches not later than July 18. Otherwise, it will be too late for our release.
> Can this be done?
>
> Regards,
> Vladimir
>
> -----Original Message-----
> From: Zachary Tahenakos <zachary.tahenakos@canonical.com>
> Sent: Wednesday, July 6, 2022 8:07 AM
> To: Vladimir Sokolovsky <vlad@nvidia.com>; Bodong Wang 
> <bodong@nvidia.com>; Tim Gardner <tim.gardner@canonical.com>; 
> kernel-team@lists.ubuntu.com
> Cc: Maor Dickman <maord@nvidia.com>; Oz Shlomo <ozsh@nvidia.com>
> Subject: Re: ACK: [SRU][F:linux-bluefield][PATCH v1 1/2] netfilter: 
> conntrack: remove offload_pickup sysctl again
>
> Hello Vladimir and all,
>
> This patch set is currently slated to go into the 2022.06.20 f:bluefield kernel, however this cycle is currently delayed due to the previous cycle and other factors. In order to get bluefield back on schedule, I would like to skip the 06.20 cycle and instead have this change set get released in the 2022.07.11 cycle. Is this OK for you or is there an urgency on this change that requires this to be out before the release of the 07.11 kernel.
>
> -Zack
>
> On 6/21/22 11:08 AM, Vladimir Sokolovsky wrote:
>> Hi Zachary,
>> So, when should we expect these patches to be merged into the "master-next" branch?
>>
>> Regards,
>> Vladimir
>>
>> -----Original Message-----
>> From: Zachary Tahenakos <zachary.tahenakos@canonical.com>
>> Sent: Tuesday, June 21, 2022 10:05 AM
>> To: Bodong Wang <bodong@nvidia.com>; Tim Gardner 
>> <tim.gardner@canonical.com>; kernel-team@lists.ubuntu.com
>> Cc: Vladimir Sokolovsky <vlad@nvidia.com>; Maor Dickman 
>> <maord@nvidia.com>; Oz Shlomo <ozsh@nvidia.com>
>> Subject: Re: ACK: [SRU][F:linux-bluefield][PATCH v1 1/2] netfilter: 
>> conntrack: remove offload_pickup sysctl again
>>
>> Hi Bodong,
>>
>> Yes it isn't merged to master-next on purpose. This patch would have landed for 2022.06.20, however the previous cycle is being extended out.
>> We are prioritizing the urgent patches for the 2022.05.30 cycle to ensure those can get out the door smoothly and then once the cycles catch up, we can merge this in, among other patches that need to come in as well.
>>
>>
>> -Zack
>>
>> On 6/21/22 10:57 AM, Bodong Wang wrote:
>>> Hi Tim/Zach,
>>>
>>> Seems like this patch series is ACKed but not merged to master-next. Could you please check?
>>>
>>> Thanks,
>>> Bodong
>>>
>>> -----Original Message-----
>>> From: Tim Gardner <tim.gardner@canonical.com>
>>> Sent: Thursday, June 16, 2022 10:10 AM
>>> To: Bodong Wang <bodong@nvidia.com>; kernel-team@lists.ubuntu.com
>>> Cc: Vladimir Sokolovsky <vlad@nvidia.com>; Maor Dickman 
>>> <maord@nvidia.com>; Oz Shlomo <ozsh@nvidia.com>
>>> Subject: ACK: [SRU][F:linux-bluefield][PATCH v1 1/2] netfilter:
>>> conntrack: remove offload_pickup sysctl again
>>>
>>> On 5/26/22 07:03, Bodong Wang wrote:
>>>> From: Florian Westphal <fw@strlen.de>
>>>>
>>>> BugLink: https://bugs.launchpad.net/bugs/1975820
>>>>
>>>> These two sysctls were added because the hardcoded defaults (2 
>>>> minutes, tcp, 30 seconds, udp) turned out to be too low for some setups.
>>>>
>>>> They appeared in 5.14-rc1 so it should be fine to remove it again.
>>>>
>>>> Marcelo convinced me that there should be no difference between a 
>>>> flow that was offloaded vs. a flow that was not wrt. timeout handling.
>>>> Thus the default is changed to those for TCP established and UDP 
>>>> stream,
>>>> 5 days and 120 seconds, respectively.
>>>>
>>>> Marcelo also suggested to account for the timeout value used for 
>>>> the offloading, this avoids increase beyond the value in the 
>>>> conntrack-sysctl and will also instantly expire the conntrack entry with altered sysctls.
>>>>
>>>> Example:
>>>>        nf_conntrack_udp_timeout_stream=60
>>>>        nf_flowtable_udp_timeout=60
>>>>
>>>> This will remove offloaded udp flows after one minute, rather than two.
>>>>
>>>> An earlier version of this patch also cleared the ASSURED bit to 
>>>> allow nf_conntrack to evict the entry via early_drop (i.e., table full).
>>>> However, it looks like we can safely assume that connection timed 
>>>> out via HW is still in established state, so this isn't needed.
>>>>
>>>> Quoting Oz:
>>>>      [..] the hardware sends all packets with a set FIN flags to sw.
>>>>      [..] Connections that are aged in hardware are expected to be in the
>>>>      established state.
>>>>
>>>> In case it turns out that back-to-sw-path transition can occur for 
>>>> 'dodgy' connections too (e.g., one side disappeared while 
>>>> software-path would have been in RETRANS timeout), we can adjust this later.
>>>>
>>>> Cc: Oz Shlomo <ozsh@nvidia.com>
>>>> Cc: Paul Blakey <paulb@nvidia.com>
>>>> Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
>>>> Signed-off-by: Florian Westphal <fw@strlen.de>
>>>> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
>>>> Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
>>>> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> (backported 
>>>> from commit 4592ee7f525c4683ec9e290381601fdee50ae110)
>>>> Signed-off-by: Bodong Wang <bodong@nvidia.com>
>>>> [bodong: remove doc file]
>>>>
>>>> Conflicts:
>>>> 	Documentation/networking/nf_conntrack-sysctl.rst
>>>> ---
>>>>      include/net/netns/conntrack.h           |  2 --
>>>>      net/netfilter/nf_conntrack_proto_tcp.c  |  1 -
>>>>      net/netfilter/nf_conntrack_proto_udp.c  |  1 -
>>>>      net/netfilter/nf_conntrack_standalone.c | 16 ----------------
>>>>      net/netfilter/nf_flow_table_core.c      | 11 ++++++++---
>>>>      5 files changed, 8 insertions(+), 23 deletions(-)
>>>>
>>>> diff --git a/include/net/netns/conntrack.h 
>>>> b/include/net/netns/conntrack.h index 67bbaa6..9e3963c8 100644
>>>> --- a/include/net/netns/conntrack.h
>>>> +++ b/include/net/netns/conntrack.h
>>>> @@ -29,7 +29,6 @@ struct nf_tcp_net {
>>>>      	int tcp_max_retrans;
>>>>      #if IS_ENABLED(CONFIG_NF_FLOW_TABLE)
>>>>      	unsigned int offload_timeout;
>>>> -	unsigned int offload_pickup;
>>>>      #endif
>>>>      };
>>>>      
>>>> @@ -43,7 +42,6 @@ struct nf_udp_net {
>>>>      	unsigned int timeouts[UDP_CT_MAX];
>>>>      #if IS_ENABLED(CONFIG_NF_FLOW_TABLE)
>>>>      	unsigned int offload_timeout;
>>>> -	unsigned int offload_pickup;
>>>>      #endif
>>>>      };
>>>>      
>>>> diff --git a/net/netfilter/nf_conntrack_proto_tcp.c
>>>> b/net/netfilter/nf_conntrack_proto_tcp.c
>>>> index d8e554c..0e51fb7 100644
>>>> --- a/net/netfilter/nf_conntrack_proto_tcp.c
>>>> +++ b/net/netfilter/nf_conntrack_proto_tcp.c
>>>> @@ -1435,7 +1435,6 @@ void nf_conntrack_tcp_init_net(struct net 
>>>> *net)
>>>>      
>>>>      #if IS_ENABLED(CONFIG_NF_FLOW_TABLE)
>>>>      	tn->offload_timeout = 30 * HZ;
>>>> -	tn->offload_pickup = 120 * HZ;
>>>>      #endif
>>>>      }
>>>>      
>>>> diff --git a/net/netfilter/nf_conntrack_proto_udp.c
>>>> b/net/netfilter/nf_conntrack_proto_udp.c
>>>> index b3e135e..ee336d7 100644
>>>> --- a/net/netfilter/nf_conntrack_proto_udp.c
>>>> +++ b/net/netfilter/nf_conntrack_proto_udp.c
>>>> @@ -282,7 +282,6 @@ void nf_conntrack_udp_init_net(struct net *net)
>>>>      
>>>>      #if IS_ENABLED(CONFIG_NF_FLOW_TABLE)
>>>>      	un->offload_timeout = 30 * HZ;
>>>> -	un->offload_pickup = 30 * HZ;
>>>>      #endif
>>>>      }
>>>>      
>>>> diff --git a/net/netfilter/nf_conntrack_standalone.c
>>>> b/net/netfilter/nf_conntrack_standalone.c
>>>> index ac8f12b..1e78ad8 100644
>>>> --- a/net/netfilter/nf_conntrack_standalone.c
>>>> +++ b/net/netfilter/nf_conntrack_standalone.c
>>>> @@ -569,7 +569,6 @@ enum nf_ct_sysctl_index {
>>>>      	NF_SYSCTL_CT_PROTO_TIMEOUT_TCP_UNACK,
>>>>      #if IS_ENABLED(CONFIG_NF_FLOW_TABLE)
>>>>      	NF_SYSCTL_CT_PROTO_TIMEOUT_TCP_OFFLOAD,
>>>> -	NF_SYSCTL_CT_PROTO_TIMEOUT_TCP_OFFLOAD_PICKUP,
>>>>      #endif
>>>>      	NF_SYSCTL_CT_PROTO_TCP_LOOSE,
>>>>      	NF_SYSCTL_CT_PROTO_TCP_LIBERAL, @@ -578,7 +577,6 @@ enum 
>>>> nf_ct_sysctl_index {
>>>>      	NF_SYSCTL_CT_PROTO_TIMEOUT_UDP_STREAM,
>>>>      #if IS_ENABLED(CONFIG_NF_FLOW_TABLE)
>>>>      	NF_SYSCTL_CT_PROTO_TIMEOUT_UDP_OFFLOAD,
>>>> -	NF_SYSCTL_CT_PROTO_TIMEOUT_UDP_OFFLOAD_PICKUP,
>>>>      #endif
>>>>      	NF_SYSCTL_CT_PROTO_TIMEOUT_ICMP,
>>>>      	NF_SYSCTL_CT_PROTO_TIMEOUT_ICMPV6,
>>>> @@ -773,12 +771,6 @@ enum nf_ct_sysctl_index {
>>>>      		.mode		= 0644,
>>>>      		.proc_handler	= proc_dointvec_jiffies,
>>>>      	},
>>>> -	[NF_SYSCTL_CT_PROTO_TIMEOUT_TCP_OFFLOAD_PICKUP] = {
>>>> -		.procname	= "nf_flowtable_tcp_pickup",
>>>> -		.maxlen		= sizeof(unsigned int),
>>>> -		.mode		= 0644,
>>>> -		.proc_handler	= proc_dointvec_jiffies,
>>>> -	},
>>>>      #endif
>>>>      	[NF_SYSCTL_CT_PROTO_TCP_LOOSE] = {
>>>>      		.procname	= "nf_conntrack_tcp_loose",
>>>> @@ -821,12 +813,6 @@ enum nf_ct_sysctl_index {
>>>>      		.mode		= 0644,
>>>>      		.proc_handler	= proc_dointvec_jiffies,
>>>>      	},
>>>> -	[NF_SYSCTL_CT_PROTO_TIMEOUT_UDP_OFFLOAD_PICKUP] = {
>>>> -		.procname	= "nf_flowtable_udp_pickup",
>>>> -		.maxlen		= sizeof(unsigned int),
>>>> -		.mode		= 0644,
>>>> -		.proc_handler	= proc_dointvec_jiffies,
>>>> -	},
>>>>      #endif
>>>>      	[NF_SYSCTL_CT_PROTO_TIMEOUT_ICMP] = {
>>>>      		.procname	= "nf_conntrack_icmp_timeout",
>>>> @@ -1006,7 +992,6 @@ static void
>>>> nf_conntrack_standalone_init_tcp_sysctl(struct net *net,
>>>>      
>>>>      #if IS_ENABLED(CONFIG_NF_FLOW_TABLE)
>>>>      	table[NF_SYSCTL_CT_PROTO_TIMEOUT_TCP_OFFLOAD].data = &tn->offload_timeout;
>>>> -	table[NF_SYSCTL_CT_PROTO_TIMEOUT_TCP_OFFLOAD_PICKUP].data = &tn->offload_pickup;
>>>>      #endif
>>>>      
>>>>      }
>>>> @@ -1098,7 +1083,6 @@ static int nf_conntrack_standalone_init_sysctl(struct net *net)
>>>>      	table[NF_SYSCTL_CT_PROTO_TIMEOUT_UDP_STREAM].data = &un->timeouts[UDP_CT_REPLIED];
>>>>      #if IS_ENABLED(CONFIG_NF_FLOW_TABLE)
>>>>      	table[NF_SYSCTL_CT_PROTO_TIMEOUT_UDP_OFFLOAD].data = &un->offload_timeout;
>>>> -	table[NF_SYSCTL_CT_PROTO_TIMEOUT_UDP_OFFLOAD_PICKUP].data = &un->offload_pickup;
>>>>      #endif
>>>>      
>>>>      	nf_conntrack_standalone_init_tcp_sysctl(net, table); diff 
>>>> --git a/net/netfilter/nf_flow_table_core.c
>>>> b/net/netfilter/nf_flow_table_core.c
>>>> index 8ed2752..ba86bc0 100644
>>>> --- a/net/netfilter/nf_flow_table_core.c
>>>> +++ b/net/netfilter/nf_flow_table_core.c
>>>> @@ -136,7 +136,7 @@ static void flow_offload_fixup_ct_timeout(struct nf_conn *ct)
>>>>      	const struct nf_conntrack_l4proto *l4proto;
>>>>      	struct net *net = nf_ct_net(ct);
>>>>      	int l4num = nf_ct_protonum(ct);
>>>> -	unsigned int timeout;
>>>> +	s32 timeout;
>>>>      
>>>>      	l4proto = nf_ct_l4proto_find(l4num);
>>>>      	if (!l4proto)
>>>> @@ -145,15 +145,20 @@ static void flow_offload_fixup_ct_timeout(struct nf_conn *ct)
>>>>      	if (l4num == IPPROTO_TCP) {
>>>>      		struct nf_tcp_net *tn = nf_tcp_pernet(net);
>>>>      
>>>> -		timeout = tn->offload_pickup;
>>>> +		timeout = tn->timeouts[TCP_CONNTRACK_ESTABLISHED];
>>>> +		timeout -= tn->offload_timeout;
>>>>      	} else if (l4num == IPPROTO_UDP) {
>>>>      		struct nf_udp_net *tn = nf_udp_pernet(net);
>>>>      
>>>> -		timeout = tn->offload_pickup;
>>>> +		timeout = tn->timeouts[UDP_CT_REPLIED];
>>>> +		timeout -= tn->offload_timeout;
>>>>      	} else {
>>>>      		return;
>>>>      	}
>>>>      
>>>> +	if (timeout < 0)
>>>> +		timeout = 0;
>>>> +
>>>>      	if (nf_flow_timeout_delta(ct->timeout) > (__s32)timeout)
>>>>      		ct->timeout = nfct_time_stamp + timeout;
>>>>      }
>>> Acked-by: Tim Gardner <tim.gardner@canonical.com>
>>>
>>> --
>>> -----------
>>> Tim Gardner
>>> Canonical, Inc
Zachary Tahenakos July 6, 2022, 2:22 p.m. UTC | #10
Acked-by: Zachary Tahenakos <zachary.tahenakos@canonical.com>

On 5/26/22 9:03 AM, Bodong Wang wrote:
> From: Florian Westphal <fw@strlen.de>
>
> BugLink: https://bugs.launchpad.net/bugs/1975820
>
> These two sysctls were added because the hardcoded defaults (2 minutes,
> tcp, 30 seconds, udp) turned out to be too low for some setups.
>
> They appeared in 5.14-rc1 so it should be fine to remove it again.
>
> Marcelo convinced me that there should be no difference between a flow
> that was offloaded vs. a flow that was not wrt. timeout handling.
> Thus the default is changed to those for TCP established and UDP stream,
> 5 days and 120 seconds, respectively.
>
> Marcelo also suggested to account for the timeout value used for the
> offloading, this avoids increase beyond the value in the conntrack-sysctl
> and will also instantly expire the conntrack entry with altered sysctls.
>
> Example:
>     nf_conntrack_udp_timeout_stream=60
>     nf_flowtable_udp_timeout=60
>
> This will remove offloaded udp flows after one minute, rather than two.
>
> An earlier version of this patch also cleared the ASSURED bit to
> allow nf_conntrack to evict the entry via early_drop (i.e., table full).
> However, it looks like we can safely assume that connection timed out
> via HW is still in established state, so this isn't needed.
>
> Quoting Oz:
>   [..] the hardware sends all packets with a set FIN flags to sw.
>   [..] Connections that are aged in hardware are expected to be in the
>   established state.
>
> In case it turns out that back-to-sw-path transition can occur for
> 'dodgy' connections too (e.g., one side disappeared while software-path
> would have been in RETRANS timeout), we can adjust this later.
>
> Cc: Oz Shlomo <ozsh@nvidia.com>
> Cc: Paul Blakey <paulb@nvidia.com>
> Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
> Signed-off-by: Florian Westphal <fw@strlen.de>
> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
> Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
> (backported from commit 4592ee7f525c4683ec9e290381601fdee50ae110)
> Signed-off-by: Bodong Wang <bodong@nvidia.com>
> [bodong: remove doc file]
>
> Conflicts:
> 	Documentation/networking/nf_conntrack-sysctl.rst
> ---
>   include/net/netns/conntrack.h           |  2 --
>   net/netfilter/nf_conntrack_proto_tcp.c  |  1 -
>   net/netfilter/nf_conntrack_proto_udp.c  |  1 -
>   net/netfilter/nf_conntrack_standalone.c | 16 ----------------
>   net/netfilter/nf_flow_table_core.c      | 11 ++++++++---
>   5 files changed, 8 insertions(+), 23 deletions(-)
>
> diff --git a/include/net/netns/conntrack.h b/include/net/netns/conntrack.h
> index 67bbaa6..9e3963c8 100644
> --- a/include/net/netns/conntrack.h
> +++ b/include/net/netns/conntrack.h
> @@ -29,7 +29,6 @@ struct nf_tcp_net {
>   	int tcp_max_retrans;
>   #if IS_ENABLED(CONFIG_NF_FLOW_TABLE)
>   	unsigned int offload_timeout;
> -	unsigned int offload_pickup;
>   #endif
>   };
>   
> @@ -43,7 +42,6 @@ struct nf_udp_net {
>   	unsigned int timeouts[UDP_CT_MAX];
>   #if IS_ENABLED(CONFIG_NF_FLOW_TABLE)
>   	unsigned int offload_timeout;
> -	unsigned int offload_pickup;
>   #endif
>   };
>   
> diff --git a/net/netfilter/nf_conntrack_proto_tcp.c b/net/netfilter/nf_conntrack_proto_tcp.c
> index d8e554c..0e51fb7 100644
> --- a/net/netfilter/nf_conntrack_proto_tcp.c
> +++ b/net/netfilter/nf_conntrack_proto_tcp.c
> @@ -1435,7 +1435,6 @@ void nf_conntrack_tcp_init_net(struct net *net)
>   
>   #if IS_ENABLED(CONFIG_NF_FLOW_TABLE)
>   	tn->offload_timeout = 30 * HZ;
> -	tn->offload_pickup = 120 * HZ;
>   #endif
>   }
>   
> diff --git a/net/netfilter/nf_conntrack_proto_udp.c b/net/netfilter/nf_conntrack_proto_udp.c
> index b3e135e..ee336d7 100644
> --- a/net/netfilter/nf_conntrack_proto_udp.c
> +++ b/net/netfilter/nf_conntrack_proto_udp.c
> @@ -282,7 +282,6 @@ void nf_conntrack_udp_init_net(struct net *net)
>   
>   #if IS_ENABLED(CONFIG_NF_FLOW_TABLE)
>   	un->offload_timeout = 30 * HZ;
> -	un->offload_pickup = 30 * HZ;
>   #endif
>   }
>   
> diff --git a/net/netfilter/nf_conntrack_standalone.c b/net/netfilter/nf_conntrack_standalone.c
> index ac8f12b..1e78ad8 100644
> --- a/net/netfilter/nf_conntrack_standalone.c
> +++ b/net/netfilter/nf_conntrack_standalone.c
> @@ -569,7 +569,6 @@ enum nf_ct_sysctl_index {
>   	NF_SYSCTL_CT_PROTO_TIMEOUT_TCP_UNACK,
>   #if IS_ENABLED(CONFIG_NF_FLOW_TABLE)
>   	NF_SYSCTL_CT_PROTO_TIMEOUT_TCP_OFFLOAD,
> -	NF_SYSCTL_CT_PROTO_TIMEOUT_TCP_OFFLOAD_PICKUP,
>   #endif
>   	NF_SYSCTL_CT_PROTO_TCP_LOOSE,
>   	NF_SYSCTL_CT_PROTO_TCP_LIBERAL,
> @@ -578,7 +577,6 @@ enum nf_ct_sysctl_index {
>   	NF_SYSCTL_CT_PROTO_TIMEOUT_UDP_STREAM,
>   #if IS_ENABLED(CONFIG_NF_FLOW_TABLE)
>   	NF_SYSCTL_CT_PROTO_TIMEOUT_UDP_OFFLOAD,
> -	NF_SYSCTL_CT_PROTO_TIMEOUT_UDP_OFFLOAD_PICKUP,
>   #endif
>   	NF_SYSCTL_CT_PROTO_TIMEOUT_ICMP,
>   	NF_SYSCTL_CT_PROTO_TIMEOUT_ICMPV6,
> @@ -773,12 +771,6 @@ enum nf_ct_sysctl_index {
>   		.mode		= 0644,
>   		.proc_handler	= proc_dointvec_jiffies,
>   	},
> -	[NF_SYSCTL_CT_PROTO_TIMEOUT_TCP_OFFLOAD_PICKUP] = {
> -		.procname	= "nf_flowtable_tcp_pickup",
> -		.maxlen		= sizeof(unsigned int),
> -		.mode		= 0644,
> -		.proc_handler	= proc_dointvec_jiffies,
> -	},
>   #endif
>   	[NF_SYSCTL_CT_PROTO_TCP_LOOSE] = {
>   		.procname	= "nf_conntrack_tcp_loose",
> @@ -821,12 +813,6 @@ enum nf_ct_sysctl_index {
>   		.mode		= 0644,
>   		.proc_handler	= proc_dointvec_jiffies,
>   	},
> -	[NF_SYSCTL_CT_PROTO_TIMEOUT_UDP_OFFLOAD_PICKUP] = {
> -		.procname	= "nf_flowtable_udp_pickup",
> -		.maxlen		= sizeof(unsigned int),
> -		.mode		= 0644,
> -		.proc_handler	= proc_dointvec_jiffies,
> -	},
>   #endif
>   	[NF_SYSCTL_CT_PROTO_TIMEOUT_ICMP] = {
>   		.procname	= "nf_conntrack_icmp_timeout",
> @@ -1006,7 +992,6 @@ static void nf_conntrack_standalone_init_tcp_sysctl(struct net *net,
>   
>   #if IS_ENABLED(CONFIG_NF_FLOW_TABLE)
>   	table[NF_SYSCTL_CT_PROTO_TIMEOUT_TCP_OFFLOAD].data = &tn->offload_timeout;
> -	table[NF_SYSCTL_CT_PROTO_TIMEOUT_TCP_OFFLOAD_PICKUP].data = &tn->offload_pickup;
>   #endif
>   
>   }
> @@ -1098,7 +1083,6 @@ static int nf_conntrack_standalone_init_sysctl(struct net *net)
>   	table[NF_SYSCTL_CT_PROTO_TIMEOUT_UDP_STREAM].data = &un->timeouts[UDP_CT_REPLIED];
>   #if IS_ENABLED(CONFIG_NF_FLOW_TABLE)
>   	table[NF_SYSCTL_CT_PROTO_TIMEOUT_UDP_OFFLOAD].data = &un->offload_timeout;
> -	table[NF_SYSCTL_CT_PROTO_TIMEOUT_UDP_OFFLOAD_PICKUP].data = &un->offload_pickup;
>   #endif
>   
>   	nf_conntrack_standalone_init_tcp_sysctl(net, table);
> diff --git a/net/netfilter/nf_flow_table_core.c b/net/netfilter/nf_flow_table_core.c
> index 8ed2752..ba86bc0 100644
> --- a/net/netfilter/nf_flow_table_core.c
> +++ b/net/netfilter/nf_flow_table_core.c
> @@ -136,7 +136,7 @@ static void flow_offload_fixup_ct_timeout(struct nf_conn *ct)
>   	const struct nf_conntrack_l4proto *l4proto;
>   	struct net *net = nf_ct_net(ct);
>   	int l4num = nf_ct_protonum(ct);
> -	unsigned int timeout;
> +	s32 timeout;
>   
>   	l4proto = nf_ct_l4proto_find(l4num);
>   	if (!l4proto)
> @@ -145,15 +145,20 @@ static void flow_offload_fixup_ct_timeout(struct nf_conn *ct)
>   	if (l4num == IPPROTO_TCP) {
>   		struct nf_tcp_net *tn = nf_tcp_pernet(net);
>   
> -		timeout = tn->offload_pickup;
> +		timeout = tn->timeouts[TCP_CONNTRACK_ESTABLISHED];
> +		timeout -= tn->offload_timeout;
>   	} else if (l4num == IPPROTO_UDP) {
>   		struct nf_udp_net *tn = nf_udp_pernet(net);
>   
> -		timeout = tn->offload_pickup;
> +		timeout = tn->timeouts[UDP_CT_REPLIED];
> +		timeout -= tn->offload_timeout;
>   	} else {
>   		return;
>   	}
>   
> +	if (timeout < 0)
> +		timeout = 0;
> +
>   	if (nf_flow_timeout_delta(ct->timeout) > (__s32)timeout)
>   		ct->timeout = nfct_time_stamp + timeout;
>   }
Zachary Tahenakos July 6, 2022, 4:25 p.m. UTC | #11
Applied to f:bluefield/master-next

Thanks,

Zack

On 5/26/22 9:03 AM, Bodong Wang wrote:
> From: Florian Westphal <fw@strlen.de>
>
> BugLink: https://bugs.launchpad.net/bugs/1975820
>
> These two sysctls were added because the hardcoded defaults (2 minutes,
> tcp, 30 seconds, udp) turned out to be too low for some setups.
>
> They appeared in 5.14-rc1 so it should be fine to remove it again.
>
> Marcelo convinced me that there should be no difference between a flow
> that was offloaded vs. a flow that was not wrt. timeout handling.
> Thus the default is changed to those for TCP established and UDP stream,
> 5 days and 120 seconds, respectively.
>
> Marcelo also suggested to account for the timeout value used for the
> offloading, this avoids increase beyond the value in the conntrack-sysctl
> and will also instantly expire the conntrack entry with altered sysctls.
>
> Example:
>     nf_conntrack_udp_timeout_stream=60
>     nf_flowtable_udp_timeout=60
>
> This will remove offloaded udp flows after one minute, rather than two.
>
> An earlier version of this patch also cleared the ASSURED bit to
> allow nf_conntrack to evict the entry via early_drop (i.e., table full).
> However, it looks like we can safely assume that connection timed out
> via HW is still in established state, so this isn't needed.
>
> Quoting Oz:
>   [..] the hardware sends all packets with a set FIN flags to sw.
>   [..] Connections that are aged in hardware are expected to be in the
>   established state.
>
> In case it turns out that back-to-sw-path transition can occur for
> 'dodgy' connections too (e.g., one side disappeared while software-path
> would have been in RETRANS timeout), we can adjust this later.
>
> Cc: Oz Shlomo <ozsh@nvidia.com>
> Cc: Paul Blakey <paulb@nvidia.com>
> Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
> Signed-off-by: Florian Westphal <fw@strlen.de>
> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
> Reviewed-by: Oz Shlomo <ozsh@nvidia.com>
> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
> (backported from commit 4592ee7f525c4683ec9e290381601fdee50ae110)
> Signed-off-by: Bodong Wang <bodong@nvidia.com>
> [bodong: remove doc file]
>
> Conflicts:
> 	Documentation/networking/nf_conntrack-sysctl.rst
> ---
>   include/net/netns/conntrack.h           |  2 --
>   net/netfilter/nf_conntrack_proto_tcp.c  |  1 -
>   net/netfilter/nf_conntrack_proto_udp.c  |  1 -
>   net/netfilter/nf_conntrack_standalone.c | 16 ----------------
>   net/netfilter/nf_flow_table_core.c      | 11 ++++++++---
>   5 files changed, 8 insertions(+), 23 deletions(-)
>
> diff --git a/include/net/netns/conntrack.h b/include/net/netns/conntrack.h
> index 67bbaa6..9e3963c8 100644
> --- a/include/net/netns/conntrack.h
> +++ b/include/net/netns/conntrack.h
> @@ -29,7 +29,6 @@ struct nf_tcp_net {
>   	int tcp_max_retrans;
>   #if IS_ENABLED(CONFIG_NF_FLOW_TABLE)
>   	unsigned int offload_timeout;
> -	unsigned int offload_pickup;
>   #endif
>   };
>   
> @@ -43,7 +42,6 @@ struct nf_udp_net {
>   	unsigned int timeouts[UDP_CT_MAX];
>   #if IS_ENABLED(CONFIG_NF_FLOW_TABLE)
>   	unsigned int offload_timeout;
> -	unsigned int offload_pickup;
>   #endif
>   };
>   
> diff --git a/net/netfilter/nf_conntrack_proto_tcp.c b/net/netfilter/nf_conntrack_proto_tcp.c
> index d8e554c..0e51fb7 100644
> --- a/net/netfilter/nf_conntrack_proto_tcp.c
> +++ b/net/netfilter/nf_conntrack_proto_tcp.c
> @@ -1435,7 +1435,6 @@ void nf_conntrack_tcp_init_net(struct net *net)
>   
>   #if IS_ENABLED(CONFIG_NF_FLOW_TABLE)
>   	tn->offload_timeout = 30 * HZ;
> -	tn->offload_pickup = 120 * HZ;
>   #endif
>   }
>   
> diff --git a/net/netfilter/nf_conntrack_proto_udp.c b/net/netfilter/nf_conntrack_proto_udp.c
> index b3e135e..ee336d7 100644
> --- a/net/netfilter/nf_conntrack_proto_udp.c
> +++ b/net/netfilter/nf_conntrack_proto_udp.c
> @@ -282,7 +282,6 @@ void nf_conntrack_udp_init_net(struct net *net)
>   
>   #if IS_ENABLED(CONFIG_NF_FLOW_TABLE)
>   	un->offload_timeout = 30 * HZ;
> -	un->offload_pickup = 30 * HZ;
>   #endif
>   }
>   
> diff --git a/net/netfilter/nf_conntrack_standalone.c b/net/netfilter/nf_conntrack_standalone.c
> index ac8f12b..1e78ad8 100644
> --- a/net/netfilter/nf_conntrack_standalone.c
> +++ b/net/netfilter/nf_conntrack_standalone.c
> @@ -569,7 +569,6 @@ enum nf_ct_sysctl_index {
>   	NF_SYSCTL_CT_PROTO_TIMEOUT_TCP_UNACK,
>   #if IS_ENABLED(CONFIG_NF_FLOW_TABLE)
>   	NF_SYSCTL_CT_PROTO_TIMEOUT_TCP_OFFLOAD,
> -	NF_SYSCTL_CT_PROTO_TIMEOUT_TCP_OFFLOAD_PICKUP,
>   #endif
>   	NF_SYSCTL_CT_PROTO_TCP_LOOSE,
>   	NF_SYSCTL_CT_PROTO_TCP_LIBERAL,
> @@ -578,7 +577,6 @@ enum nf_ct_sysctl_index {
>   	NF_SYSCTL_CT_PROTO_TIMEOUT_UDP_STREAM,
>   #if IS_ENABLED(CONFIG_NF_FLOW_TABLE)
>   	NF_SYSCTL_CT_PROTO_TIMEOUT_UDP_OFFLOAD,
> -	NF_SYSCTL_CT_PROTO_TIMEOUT_UDP_OFFLOAD_PICKUP,
>   #endif
>   	NF_SYSCTL_CT_PROTO_TIMEOUT_ICMP,
>   	NF_SYSCTL_CT_PROTO_TIMEOUT_ICMPV6,
> @@ -773,12 +771,6 @@ enum nf_ct_sysctl_index {
>   		.mode		= 0644,
>   		.proc_handler	= proc_dointvec_jiffies,
>   	},
> -	[NF_SYSCTL_CT_PROTO_TIMEOUT_TCP_OFFLOAD_PICKUP] = {
> -		.procname	= "nf_flowtable_tcp_pickup",
> -		.maxlen		= sizeof(unsigned int),
> -		.mode		= 0644,
> -		.proc_handler	= proc_dointvec_jiffies,
> -	},
>   #endif
>   	[NF_SYSCTL_CT_PROTO_TCP_LOOSE] = {
>   		.procname	= "nf_conntrack_tcp_loose",
> @@ -821,12 +813,6 @@ enum nf_ct_sysctl_index {
>   		.mode		= 0644,
>   		.proc_handler	= proc_dointvec_jiffies,
>   	},
> -	[NF_SYSCTL_CT_PROTO_TIMEOUT_UDP_OFFLOAD_PICKUP] = {
> -		.procname	= "nf_flowtable_udp_pickup",
> -		.maxlen		= sizeof(unsigned int),
> -		.mode		= 0644,
> -		.proc_handler	= proc_dointvec_jiffies,
> -	},
>   #endif
>   	[NF_SYSCTL_CT_PROTO_TIMEOUT_ICMP] = {
>   		.procname	= "nf_conntrack_icmp_timeout",
> @@ -1006,7 +992,6 @@ static void nf_conntrack_standalone_init_tcp_sysctl(struct net *net,
>   
>   #if IS_ENABLED(CONFIG_NF_FLOW_TABLE)
>   	table[NF_SYSCTL_CT_PROTO_TIMEOUT_TCP_OFFLOAD].data = &tn->offload_timeout;
> -	table[NF_SYSCTL_CT_PROTO_TIMEOUT_TCP_OFFLOAD_PICKUP].data = &tn->offload_pickup;
>   #endif
>   
>   }
> @@ -1098,7 +1083,6 @@ static int nf_conntrack_standalone_init_sysctl(struct net *net)
>   	table[NF_SYSCTL_CT_PROTO_TIMEOUT_UDP_STREAM].data = &un->timeouts[UDP_CT_REPLIED];
>   #if IS_ENABLED(CONFIG_NF_FLOW_TABLE)
>   	table[NF_SYSCTL_CT_PROTO_TIMEOUT_UDP_OFFLOAD].data = &un->offload_timeout;
> -	table[NF_SYSCTL_CT_PROTO_TIMEOUT_UDP_OFFLOAD_PICKUP].data = &un->offload_pickup;
>   #endif
>   
>   	nf_conntrack_standalone_init_tcp_sysctl(net, table);
> diff --git a/net/netfilter/nf_flow_table_core.c b/net/netfilter/nf_flow_table_core.c
> index 8ed2752..ba86bc0 100644
> --- a/net/netfilter/nf_flow_table_core.c
> +++ b/net/netfilter/nf_flow_table_core.c
> @@ -136,7 +136,7 @@ static void flow_offload_fixup_ct_timeout(struct nf_conn *ct)
>   	const struct nf_conntrack_l4proto *l4proto;
>   	struct net *net = nf_ct_net(ct);
>   	int l4num = nf_ct_protonum(ct);
> -	unsigned int timeout;
> +	s32 timeout;
>   
>   	l4proto = nf_ct_l4proto_find(l4num);
>   	if (!l4proto)
> @@ -145,15 +145,20 @@ static void flow_offload_fixup_ct_timeout(struct nf_conn *ct)
>   	if (l4num == IPPROTO_TCP) {
>   		struct nf_tcp_net *tn = nf_tcp_pernet(net);
>   
> -		timeout = tn->offload_pickup;
> +		timeout = tn->timeouts[TCP_CONNTRACK_ESTABLISHED];
> +		timeout -= tn->offload_timeout;
>   	} else if (l4num == IPPROTO_UDP) {
>   		struct nf_udp_net *tn = nf_udp_pernet(net);
>   
> -		timeout = tn->offload_pickup;
> +		timeout = tn->timeouts[UDP_CT_REPLIED];
> +		timeout -= tn->offload_timeout;
>   	} else {
>   		return;
>   	}
>   
> +	if (timeout < 0)
> +		timeout = 0;
> +
>   	if (nf_flow_timeout_delta(ct->timeout) > (__s32)timeout)
>   		ct->timeout = nfct_time_stamp + timeout;
>   }
diff mbox series

Patch

diff --git a/include/net/netns/conntrack.h b/include/net/netns/conntrack.h
index 67bbaa6..9e3963c8 100644
--- a/include/net/netns/conntrack.h
+++ b/include/net/netns/conntrack.h
@@ -29,7 +29,6 @@  struct nf_tcp_net {
 	int tcp_max_retrans;
 #if IS_ENABLED(CONFIG_NF_FLOW_TABLE)
 	unsigned int offload_timeout;
-	unsigned int offload_pickup;
 #endif
 };
 
@@ -43,7 +42,6 @@  struct nf_udp_net {
 	unsigned int timeouts[UDP_CT_MAX];
 #if IS_ENABLED(CONFIG_NF_FLOW_TABLE)
 	unsigned int offload_timeout;
-	unsigned int offload_pickup;
 #endif
 };
 
diff --git a/net/netfilter/nf_conntrack_proto_tcp.c b/net/netfilter/nf_conntrack_proto_tcp.c
index d8e554c..0e51fb7 100644
--- a/net/netfilter/nf_conntrack_proto_tcp.c
+++ b/net/netfilter/nf_conntrack_proto_tcp.c
@@ -1435,7 +1435,6 @@  void nf_conntrack_tcp_init_net(struct net *net)
 
 #if IS_ENABLED(CONFIG_NF_FLOW_TABLE)
 	tn->offload_timeout = 30 * HZ;
-	tn->offload_pickup = 120 * HZ;
 #endif
 }
 
diff --git a/net/netfilter/nf_conntrack_proto_udp.c b/net/netfilter/nf_conntrack_proto_udp.c
index b3e135e..ee336d7 100644
--- a/net/netfilter/nf_conntrack_proto_udp.c
+++ b/net/netfilter/nf_conntrack_proto_udp.c
@@ -282,7 +282,6 @@  void nf_conntrack_udp_init_net(struct net *net)
 
 #if IS_ENABLED(CONFIG_NF_FLOW_TABLE)
 	un->offload_timeout = 30 * HZ;
-	un->offload_pickup = 30 * HZ;
 #endif
 }
 
diff --git a/net/netfilter/nf_conntrack_standalone.c b/net/netfilter/nf_conntrack_standalone.c
index ac8f12b..1e78ad8 100644
--- a/net/netfilter/nf_conntrack_standalone.c
+++ b/net/netfilter/nf_conntrack_standalone.c
@@ -569,7 +569,6 @@  enum nf_ct_sysctl_index {
 	NF_SYSCTL_CT_PROTO_TIMEOUT_TCP_UNACK,
 #if IS_ENABLED(CONFIG_NF_FLOW_TABLE)
 	NF_SYSCTL_CT_PROTO_TIMEOUT_TCP_OFFLOAD,
-	NF_SYSCTL_CT_PROTO_TIMEOUT_TCP_OFFLOAD_PICKUP,
 #endif
 	NF_SYSCTL_CT_PROTO_TCP_LOOSE,
 	NF_SYSCTL_CT_PROTO_TCP_LIBERAL,
@@ -578,7 +577,6 @@  enum nf_ct_sysctl_index {
 	NF_SYSCTL_CT_PROTO_TIMEOUT_UDP_STREAM,
 #if IS_ENABLED(CONFIG_NF_FLOW_TABLE)
 	NF_SYSCTL_CT_PROTO_TIMEOUT_UDP_OFFLOAD,
-	NF_SYSCTL_CT_PROTO_TIMEOUT_UDP_OFFLOAD_PICKUP,
 #endif
 	NF_SYSCTL_CT_PROTO_TIMEOUT_ICMP,
 	NF_SYSCTL_CT_PROTO_TIMEOUT_ICMPV6,
@@ -773,12 +771,6 @@  enum nf_ct_sysctl_index {
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec_jiffies,
 	},
-	[NF_SYSCTL_CT_PROTO_TIMEOUT_TCP_OFFLOAD_PICKUP] = {
-		.procname	= "nf_flowtable_tcp_pickup",
-		.maxlen		= sizeof(unsigned int),
-		.mode		= 0644,
-		.proc_handler	= proc_dointvec_jiffies,
-	},
 #endif
 	[NF_SYSCTL_CT_PROTO_TCP_LOOSE] = {
 		.procname	= "nf_conntrack_tcp_loose",
@@ -821,12 +813,6 @@  enum nf_ct_sysctl_index {
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec_jiffies,
 	},
-	[NF_SYSCTL_CT_PROTO_TIMEOUT_UDP_OFFLOAD_PICKUP] = {
-		.procname	= "nf_flowtable_udp_pickup",
-		.maxlen		= sizeof(unsigned int),
-		.mode		= 0644,
-		.proc_handler	= proc_dointvec_jiffies,
-	},
 #endif
 	[NF_SYSCTL_CT_PROTO_TIMEOUT_ICMP] = {
 		.procname	= "nf_conntrack_icmp_timeout",
@@ -1006,7 +992,6 @@  static void nf_conntrack_standalone_init_tcp_sysctl(struct net *net,
 
 #if IS_ENABLED(CONFIG_NF_FLOW_TABLE)
 	table[NF_SYSCTL_CT_PROTO_TIMEOUT_TCP_OFFLOAD].data = &tn->offload_timeout;
-	table[NF_SYSCTL_CT_PROTO_TIMEOUT_TCP_OFFLOAD_PICKUP].data = &tn->offload_pickup;
 #endif
 
 }
@@ -1098,7 +1083,6 @@  static int nf_conntrack_standalone_init_sysctl(struct net *net)
 	table[NF_SYSCTL_CT_PROTO_TIMEOUT_UDP_STREAM].data = &un->timeouts[UDP_CT_REPLIED];
 #if IS_ENABLED(CONFIG_NF_FLOW_TABLE)
 	table[NF_SYSCTL_CT_PROTO_TIMEOUT_UDP_OFFLOAD].data = &un->offload_timeout;
-	table[NF_SYSCTL_CT_PROTO_TIMEOUT_UDP_OFFLOAD_PICKUP].data = &un->offload_pickup;
 #endif
 
 	nf_conntrack_standalone_init_tcp_sysctl(net, table);
diff --git a/net/netfilter/nf_flow_table_core.c b/net/netfilter/nf_flow_table_core.c
index 8ed2752..ba86bc0 100644
--- a/net/netfilter/nf_flow_table_core.c
+++ b/net/netfilter/nf_flow_table_core.c
@@ -136,7 +136,7 @@  static void flow_offload_fixup_ct_timeout(struct nf_conn *ct)
 	const struct nf_conntrack_l4proto *l4proto;
 	struct net *net = nf_ct_net(ct);
 	int l4num = nf_ct_protonum(ct);
-	unsigned int timeout;
+	s32 timeout;
 
 	l4proto = nf_ct_l4proto_find(l4num);
 	if (!l4proto)
@@ -145,15 +145,20 @@  static void flow_offload_fixup_ct_timeout(struct nf_conn *ct)
 	if (l4num == IPPROTO_TCP) {
 		struct nf_tcp_net *tn = nf_tcp_pernet(net);
 
-		timeout = tn->offload_pickup;
+		timeout = tn->timeouts[TCP_CONNTRACK_ESTABLISHED];
+		timeout -= tn->offload_timeout;
 	} else if (l4num == IPPROTO_UDP) {
 		struct nf_udp_net *tn = nf_udp_pernet(net);
 
-		timeout = tn->offload_pickup;
+		timeout = tn->timeouts[UDP_CT_REPLIED];
+		timeout -= tn->offload_timeout;
 	} else {
 		return;
 	}
 
+	if (timeout < 0)
+		timeout = 0;
+
 	if (nf_flow_timeout_delta(ct->timeout) > (__s32)timeout)
 		ct->timeout = nfct_time_stamp + timeout;
 }