
[net] netfilter: flowtable: additional checks for outdated flows

Message ID 20231024171718.4080012-1-vladbu@nvidia.com
State Changes Requested
Delegated to: Pablo Neira
Series [net] netfilter: flowtable: additional checks for outdated flows

Commit Message

Vlad Buslov Oct. 24, 2023, 5:17 p.m. UTC
The current nf_flow_is_outdated() implementation considers for teardown any
flow table flow whose state has diverged from the status of its underlying
CT connection. This can be problematic in the following cases:

- The flow has never been offloaded to hardware in the first place, either
because the flow table has hardware offload disabled (the
NF_FLOWTABLE_HW_OFFLOAD flag is not set) or because the flow is still
pending on the 'add' workqueue to be offloaded for the first time. The
former is incorrect; the latter generates excessive deletions and additions
of flows.

- The flow is already pending an update on the workqueue. Tearing down such
flows also generates excessive removals from the flow table, especially on
a heavily loaded system where the latency to re-offload a flow via the
'add' workqueue can be quite high.

When deciding whether a flow is outdated and should be torn down, verify
that it is both offloaded to hardware and has no pending updates.

Fixes: 41f2c7c342d3 ("net/sched: act_ct: Fix promotion of offloaded unreplied tuple")
Reviewed-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
---
 net/netfilter/nf_flow_table_core.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Pablo Neira Ayuso Oct. 24, 2023, 7:40 p.m. UTC | #1
Hi Vlad,

On Tue, Oct 24, 2023 at 07:17:18PM +0200, Vlad Buslov wrote:
> Current nf_flow_is_outdated() implementation considers any flow table flow
> which state diverged from its underlying CT connection status for teardown
> which can be problematic in the following cases:
> 
> - Flow has never been offloaded to hardware in the first place either
> because flow table has hardware offload disabled (flag
> NF_FLOWTABLE_HW_OFFLOAD is not set) or because it is still pending on 'add'
> workqueue to be offloaded for the first time. The former is incorrect, the
> later generates excessive deletions and additions of flows.
> 
> - Flow is already pending to be updated on the workqueue. Tearing down such
> flows will also generate excessive removals from the flow table, especially
> on highly loaded system where the latency to re-offload a flow via 'add'
> workqueue can be quite high.
> 
> When considering a flow for teardown as outdated verify that it is both
> offloaded to hardware and doesn't have any pending updates.

Thanks.

I have posted an alternative patch that moves the handling of
NF_FLOW_HW_ESTABLISHED, which is specific to sched/act_ct:

https://patchwork.ozlabs.org/project/netfilter-devel/patch/20231024193815.1987-1-pablo@netfilter.org/

It is a bit more code, but it makes it clearer to the reader that this
bit is net/sched-specific.

Vlad Buslov Oct. 24, 2023, 7:45 p.m. UTC | #2
On Tue 24 Oct 2023 at 21:40, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> Hi Vlad,
>
> Thanks.
>
> I have posted an alternative patch to move the handling of
> NF_FLOW_HW_ESTABLISHED, which is specific for sched/act_ct:
>
> https://patchwork.ozlabs.org/project/netfilter-devel/patch/20231024193815.1987-1-pablo@netfilter.org/
>
> it is a bit more code, but it makes it easier to understand for the
> code reader that this bit is net/sched specific.
>

Thanks for refactoring this, I agree that separating the act_ct-specific
check makes it more obvious.

How would you prefer to resolve the conflict with my fix? Should I wait
for your patch to be accepted to net, rebase my fix on top of it and
submit a V2? Or could you incorporate the checks from my fix, together
with my Signed-off-by, and submit them as a single change?

Pablo Neira Ayuso Oct. 24, 2023, 8:07 p.m. UTC | #3
On Tue, Oct 24, 2023 at 10:45:31PM +0300, Vlad Buslov wrote:
> Thanks for refactoring this, I agree that separating the act_ct-specific
> check makes it more obvious.
> 
> How would you prefer to solve the conflict with my fix? Should I wait
> for your patch to be accepted to net, rebase my fix on top and submit
> V2? Or you can incorporate the checks from my fix together with my
> signoff and submit it as a single change?

Rebased here as per your request:

https://patchwork.ozlabs.org/project/netfilter-devel/patch/20231024200243.50784-1-pablo@netfilter.org/

I took the liberty of adding your Signed-off-by: and Paul's Reviewed-by:,
which is not the best way to go, but please acknowledge that this is fine
in this exceptional case.

We can handle this via the nf.git tree. There were no plans to send a PR
to netdev, but I think these fixes are worth trying to get there in time
for the 6.6 release.

Thanks.
Vlad Buslov Oct. 24, 2023, 8:16 p.m. UTC | #4
On Tue 24 Oct 2023 at 22:07, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> Rebased here as per your request:
>
> https://patchwork.ozlabs.org/project/netfilter-devel/patch/20231024200243.50784-1-pablo@netfilter.org/
>
> I took the freedom to take your Signed-off-by: and Paul's Reviewed-by:
> which is not the best way to go, but please acknowledge this is fine
> in this exceptional case.

Ack. Replied to the patch with my Signed-off-by. Thanks!

>
> We can handle this via nf.git tree, there were no plans to send a PR
> to netdev, but I think these fixes are worth to (try to) get them
> there in time for the 6.6 release.
>
> Thanks.

Patch

diff --git a/net/netfilter/nf_flow_table_core.c b/net/netfilter/nf_flow_table_core.c
index 1d34d700bd09..db404f89d3d7 100644
--- a/net/netfilter/nf_flow_table_core.c
+++ b/net/netfilter/nf_flow_table_core.c
@@ -319,6 +319,8 @@  EXPORT_SYMBOL_GPL(flow_offload_refresh);
 static bool nf_flow_is_outdated(const struct flow_offload *flow)
 {
 	return test_bit(IPS_SEEN_REPLY_BIT, &flow->ct->status) &&
+		test_bit(IPS_HW_OFFLOAD_BIT, &flow->ct->status) &&
+		!test_bit(NF_FLOW_HW_PENDING, &flow->flags) &&
 		!test_bit(NF_FLOW_HW_ESTABLISHED, &flow->flags);
 }