diff mbox series

[net,2/2] net/sched: act_ct: Set offload timeout when setting the offload bit

Message ID 20200728115759.426667-3-roid@mellanox.com
State Changes Requested
Delegated to: David Miller
Series netfilter: conntrack: Fix CT offload timeout on heavily loaded systems

Commit Message

Roi Dayan July 28, 2020, 11:57 a.m. UTC
On heavily loaded systems the GC can take a long time to go over all
existing conns and reset their timeouts. During that window, other
callers such as nf_conntrack_in() can call nf_ct_is_expired() and see
the conn as expired. To fix this, also reset the timeout when setting
the offload bit, instead of counting on the GC to finish its first
iteration over all conns before the initial timeout expires.

Fixes: 64ff70b80fd4 ("net/sched: act_ct: Offload established connections to flow table")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
---
 net/sched/act_ct.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Marcelo Ricardo Leitner July 28, 2020, 2:42 p.m. UTC | #1
On Tue, Jul 28, 2020 at 02:57:59PM +0300, Roi Dayan wrote:
> On heavily loaded systems the GC can take a long time to go over all
> existing conns and reset their timeouts. During that window, other
> callers such as nf_conntrack_in() can call nf_ct_is_expired() and see
> the conn as expired. To fix this, also reset the timeout when setting
> the offload bit, instead of counting on the GC to finish its first
> iteration over all conns before the initial timeout expires.
> 
> Fixes: 64ff70b80fd4 ("net/sched: act_ct: Offload established connections to flow table")
> Signed-off-by: Roi Dayan <roid@mellanox.com>
> Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
> ---
>  net/sched/act_ct.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c
> index e9f3576cbf71..650c2d78a346 100644
> --- a/net/sched/act_ct.c
> +++ b/net/sched/act_ct.c
> @@ -366,6 +366,8 @@ static void tcf_ct_flow_table_add(struct tcf_ct_flow_table *ct_ft,

Extra context line:
	err = flow_offload_add(&ct_ft->nf_ft, entry);
>  	if (err)
>  		goto err_add;
>  
> +	nf_ct_offload_timeout(ct);
> +

What about adding this to flow_offload_add() instead?
It already adjusts the flow_offload timeout there, and then the fix
is also effective for nft.

>  	return;
>  
>  err_add:
> -- 
> 2.8.4
>
Roi Dayan July 29, 2020, 12:55 p.m. UTC | #2
On 2020-07-28 5:42 PM, Marcelo Ricardo Leitner wrote:
> On Tue, Jul 28, 2020 at 02:57:59PM +0300, Roi Dayan wrote:
>> On heavily loaded systems the GC can take a long time to go over all
>> existing conns and reset their timeouts. During that window, other
>> callers such as nf_conntrack_in() can call nf_ct_is_expired() and see
>> the conn as expired. To fix this, also reset the timeout when setting
>> the offload bit, instead of counting on the GC to finish its first
>> iteration over all conns before the initial timeout expires.
>>
>> Fixes: 64ff70b80fd4 ("net/sched: act_ct: Offload established connections to flow table")
>> Signed-off-by: Roi Dayan <roid@mellanox.com>
>> Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
>> ---
>>  net/sched/act_ct.c | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c
>> index e9f3576cbf71..650c2d78a346 100644
>> --- a/net/sched/act_ct.c
>> +++ b/net/sched/act_ct.c
>> @@ -366,6 +366,8 @@ static void tcf_ct_flow_table_add(struct tcf_ct_flow_table *ct_ft,
> 
> Extra context line:
> 	err = flow_offload_add(&ct_ft->nf_ft, entry);
>>  	if (err)
>>  		goto err_add;
>>  
>> +	nf_ct_offload_timeout(ct);
>> +
> 
> What about adding this to flow_offload_add() instead?
> It already adjusts the flow_offload timeout there, and then the fix
> is also effective for nft.
> 

As you said, in flow_offload_add() we adjust the flow timeout.
Here we adjust the conn timeout.
So it's outside flow_offload_add(), which only touches the flow struct.
I guess it's like the conn offload bit, which is also set outside, both
here and for nft.
What do you think?

>>  	return;
>>  
>>  err_add:
>> -- 
>> 2.8.4
>>
Marcelo Ricardo Leitner July 29, 2020, 5:10 p.m. UTC | #3
On Wed, Jul 29, 2020 at 03:55:53PM +0300, Roi Dayan wrote:
> 
> 
> On 2020-07-28 5:42 PM, Marcelo Ricardo Leitner wrote:
> > On Tue, Jul 28, 2020 at 02:57:59PM +0300, Roi Dayan wrote:
> >> On heavily loaded systems the GC can take a long time to go over all
> >> existing conns and reset their timeouts. During that window, other
> >> callers such as nf_conntrack_in() can call nf_ct_is_expired() and see
> >> the conn as expired. To fix this, also reset the timeout when setting
> >> the offload bit, instead of counting on the GC to finish its first
> >> iteration over all conns before the initial timeout expires.
> >>
> >> Fixes: 64ff70b80fd4 ("net/sched: act_ct: Offload established connections to flow table")
> >> Signed-off-by: Roi Dayan <roid@mellanox.com>
> >> Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
> >> ---
> >>  net/sched/act_ct.c | 2 ++
> >>  1 file changed, 2 insertions(+)
> >>
> >> diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c
> >> index e9f3576cbf71..650c2d78a346 100644
> >> --- a/net/sched/act_ct.c
> >> +++ b/net/sched/act_ct.c
> >> @@ -366,6 +366,8 @@ static void tcf_ct_flow_table_add(struct tcf_ct_flow_table *ct_ft,
> > 
> > Extra context line:
> > 	err = flow_offload_add(&ct_ft->nf_ft, entry);
> >>  	if (err)
> >>  		goto err_add;
> >>  
> >> +	nf_ct_offload_timeout(ct);
> >> +
> > 
> > What about adding this to flow_offload_add() instead?
> > It already adjusts the flow_offload timeout there, and then the fix
> > is also effective for nft.
> > 
> 
> As you said, in flow_offload_add() we adjust the flow timeout.
> Here we adjust the conn timeout.
> So it's outside flow_offload_add(), which only touches the flow struct.
> I guess it's like the conn offload bit, which is also set outside, both
> here and for nft.

Right, but

> What do you think?

I don't see why it can't update both. flow_offload_fixup_ct_timeout(),
called by flow_offload_del(), is updating ct->timeout already. It
looks consistent to me to update it in _add as well then. 

> 
> >>  	return;
> >>  
> >>  err_add:
> >> -- 
> >> 2.8.4
> >>
Roi Dayan Aug. 3, 2020, 7:21 a.m. UTC | #4
On 2020-07-29 8:10 PM, Marcelo Ricardo Leitner wrote:
> On Wed, Jul 29, 2020 at 03:55:53PM +0300, Roi Dayan wrote:
>>
>>
>> On 2020-07-28 5:42 PM, Marcelo Ricardo Leitner wrote:
>>> On Tue, Jul 28, 2020 at 02:57:59PM +0300, Roi Dayan wrote:
>>>> On heavily loaded systems the GC can take a long time to go over all
>>>> existing conns and reset their timeouts. During that window, other
>>>> callers such as nf_conntrack_in() can call nf_ct_is_expired() and see
>>>> the conn as expired. To fix this, also reset the timeout when setting
>>>> the offload bit, instead of counting on the GC to finish its first
>>>> iteration over all conns before the initial timeout expires.
>>>>
>>>> Fixes: 64ff70b80fd4 ("net/sched: act_ct: Offload established connections to flow table")
>>>> Signed-off-by: Roi Dayan <roid@mellanox.com>
>>>> Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
>>>> ---
>>>>  net/sched/act_ct.c | 2 ++
>>>>  1 file changed, 2 insertions(+)
>>>>
>>>> diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c
>>>> index e9f3576cbf71..650c2d78a346 100644
>>>> --- a/net/sched/act_ct.c
>>>> +++ b/net/sched/act_ct.c
>>>> @@ -366,6 +366,8 @@ static void tcf_ct_flow_table_add(struct tcf_ct_flow_table *ct_ft,
>>>
>>> Extra context line:
>>> 	err = flow_offload_add(&ct_ft->nf_ft, entry);
>>>>  	if (err)
>>>>  		goto err_add;
>>>>  
>>>> +	nf_ct_offload_timeout(ct);
>>>> +
>>>
>>> What about adding this to flow_offload_add() instead?
>>> It already adjusts the flow_offload timeout there, and then the fix
>>> is also effective for nft.
>>>
>>
>> As you said, in flow_offload_add() we adjust the flow timeout.
>> Here we adjust the conn timeout.
>> So it's outside flow_offload_add(), which only touches the flow struct.
>> I guess it's like the conn offload bit, which is also set outside, both
>> here and for nft.
> 
> Right, but
> 
>> What do you think?
> 
> I don't see why it can't update both. flow_offload_fixup_ct_timeout(),
> called by flow_offload_del(), is updating ct->timeout already. It
> looks consistent to me to update it in _add as well then. 
> 

I don't mind. It's just that add is not consistent with del:
del also clears the IPS_OFFLOAD_BIT, but add doesn't set it.
I'll send v2 with your suggestion.

>>
>>>>  	return;
>>>>  
>>>>  err_add:
>>>> -- 
>>>> 2.8.4
>>>>
Patch

diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c
index e9f3576cbf71..650c2d78a346 100644
--- a/net/sched/act_ct.c
+++ b/net/sched/act_ct.c
@@ -366,6 +366,8 @@ static void tcf_ct_flow_table_add(struct tcf_ct_flow_table *ct_ft,
 	if (err)
 		goto err_add;
 
+	nf_ct_offload_timeout(ct);
+
 	return;
 
 err_add: