
[net] conntrack: perform a full scan in gc

Message ID 1476779435-30503-1-git-send-email-nicolas.dichtel@6wind.com
State Awaiting Upstream, archived
Delegated to: David Miller

Commit Message

Nicolas Dichtel Oct. 18, 2016, 8:30 a.m. UTC
After commit b87a2f9199ea ("netfilter: conntrack: add gc worker to remove
timed-out entries"), netlink conntrack deletion events may be sent with a
huge delay (5 minutes).

There are two ways to evict a conntrack entry:
 - during a conntrack lookup;
 - during a conntrack dump.
Let's do a full scan of conntrack entries after a period of inactivity
(no conntrack lookup).

CC: Florian Westphal <fw@strlen.de>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---

Here is another proposal to try to fix the problem.
Comments are welcome,
Nicolas

 net/netfilter/nf_conntrack_core.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

Comments

Florian Westphal Oct. 18, 2016, 8:47 a.m. UTC | #1
Nicolas Dichtel <nicolas.dichtel@6wind.com> wrote:
> After commit b87a2f9199ea ("netfilter: conntrack: add gc worker to remove
> timed-out entries"), netlink conntrack deletion events may be sent with a
> huge delay (5 minutes).
> 
> There are two ways to evict a conntrack entry:
>  - during a conntrack lookup;
>  - during a conntrack dump.
> Let's do a full scan of conntrack entries after a period of inactivity
> (no conntrack lookup).
> 
> CC: Florian Westphal <fw@strlen.de>
> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
> ---
> 
> Here is another proposal to try to fix the problem.
> Comments are welcome,
> Nicolas

Hmm, I don't think it's a good idea in practice.
If the goal is to avoid starving an arbitrary 'dead' ct for too long,
then a simple ping will defeat the logic here, because...

>  net/netfilter/nf_conntrack_core.c | 11 +++++++++--
>  1 file changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
> index ba6a1d421222..3dbb27bd9582 100644
> --- a/net/netfilter/nf_conntrack_core.c
> +++ b/net/netfilter/nf_conntrack_core.c
> @@ -87,6 +87,7 @@ static __read_mostly bool nf_conntrack_locks_all;
>  #define GC_MAX_BUCKETS		8192u
>  #define GC_INTERVAL		(5 * HZ)
>  #define GC_MAX_EVICTS		256u
> +static bool gc_full_scan = true;
>  
>  static struct conntrack_gc_work conntrack_gc_work;
>  
> @@ -511,6 +512,7 @@ ____nf_conntrack_find(struct net *net, const struct nf_conntrack_zone *zone,
>  	unsigned int bucket, hsize;
>  
>  begin:
> +	gc_full_scan = false;

... we do periodic lookup (but always in same slot), so no full scan is
triggered.

If you think it's useful, consider sending a patch that reschedules the
worker instantly in case the budget expired; otherwise I will do this
later this week.

[ I am aware doing instant restart might be too late, but at least we
  would then reap more entries once we stumble upon large number of
  expired ones ].
Nicolas Dichtel Oct. 18, 2016, 10:06 a.m. UTC | #2
On 18/10/2016 at 10:47, Florian Westphal wrote:
> Nicolas Dichtel <nicolas.dichtel@6wind.com> wrote:
>> After commit b87a2f9199ea ("netfilter: conntrack: add gc worker to remove
>> timed-out entries"), netlink conntrack deletion events may be sent with a
>> huge delay (5 minutes).
>>
>> There are two ways to evict a conntrack entry:
>>  - during a conntrack lookup;
>>  - during a conntrack dump.
>> Let's do a full scan of conntrack entries after a period of inactivity
>> (no conntrack lookup).
>>
>> CC: Florian Westphal <fw@strlen.de>
>> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
>> ---
>>
>> Here is another proposal to try to fix the problem.
>> Comments are welcome,
>> Nicolas
> 
> Hmm, I don't think it's a good idea in practice.
> If the goal is to avoid starving an arbitrary 'dead' ct for too long,
> then a simple ping will defeat the logic here, because...
> 
>>  net/netfilter/nf_conntrack_core.c | 11 +++++++++--
>>  1 file changed, 9 insertions(+), 2 deletions(-)
>>
>> diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
>> index ba6a1d421222..3dbb27bd9582 100644
>> --- a/net/netfilter/nf_conntrack_core.c
>> +++ b/net/netfilter/nf_conntrack_core.c
>> @@ -87,6 +87,7 @@ static __read_mostly bool nf_conntrack_locks_all;
>>  #define GC_MAX_BUCKETS		8192u
>>  #define GC_INTERVAL		(5 * HZ)
>>  #define GC_MAX_EVICTS		256u
>> +static bool gc_full_scan = true;
>>  
>>  static struct conntrack_gc_work conntrack_gc_work;
>>  
>> @@ -511,6 +512,7 @@ ____nf_conntrack_find(struct net *net, const struct nf_conntrack_zone *zone,
>>  	unsigned int bucket, hsize;
>>  
>>  begin:
>> +	gc_full_scan = false;
> 
> ... we do periodic lookup (but always in same slot), so no full scan is
> triggered.
Yes, I was wondering about that. My first idea was to have that bool per bucket
and force a scan of the bucket instead of the whole table.

> 
> If you think it's useful, consider sending a patch that reschedules the
> worker instantly in case the budget expired; otherwise I will do this
> later this week.
Ok, I will send it, but it does not address the "inactivity" problem.

> 
> [ I am aware doing instant restart might be too late, but at least we
>   would then reap more entries once we stumble upon large number of
>   expired ones ].
>
Nicolas Dichtel Oct. 20, 2016, 8:50 a.m. UTC | #3
On 18/10/2016 at 12:06, Nicolas Dichtel wrote:
> On 18/10/2016 at 10:47, Florian Westphal wrote:
>> Nicolas Dichtel <nicolas.dichtel@6wind.com> wrote:
>>> After commit b87a2f9199ea ("netfilter: conntrack: add gc worker to remove
>>> timed-out entries"), netlink conntrack deletion events may be sent with a
>>> huge delay (5 minutes).
>>>
>>> There are two ways to evict a conntrack entry:
>>>  - during a conntrack lookup;
>>>  - during a conntrack dump.
>>> Let's do a full scan of conntrack entries after a period of inactivity
>>> (no conntrack lookup).
>>>
>>> CC: Florian Westphal <fw@strlen.de>
>>> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
>>> ---
>>>
>>> Here is another proposal to try to fix the problem.
>>> Comments are welcome,
>>> Nicolas
>>
>> Hmm, I don't think it's a good idea in practice.
>> If the goal is to avoid starving an arbitrary 'dead' ct for too long,
>> then a simple ping will defeat the logic here, because...
>>
>>>  net/netfilter/nf_conntrack_core.c | 11 +++++++++--
>>>  1 file changed, 9 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
>>> index ba6a1d421222..3dbb27bd9582 100644
>>> --- a/net/netfilter/nf_conntrack_core.c
>>> +++ b/net/netfilter/nf_conntrack_core.c
>>> @@ -87,6 +87,7 @@ static __read_mostly bool nf_conntrack_locks_all;
>>>  #define GC_MAX_BUCKETS		8192u
>>>  #define GC_INTERVAL		(5 * HZ)
>>>  #define GC_MAX_EVICTS		256u
>>> +static bool gc_full_scan = true;
>>>  
>>>  static struct conntrack_gc_work conntrack_gc_work;
>>>  
>>> @@ -511,6 +512,7 @@ ____nf_conntrack_find(struct net *net, const struct nf_conntrack_zone *zone,
>>>  	unsigned int bucket, hsize;
>>>  
>>>  begin:
>>> +	gc_full_scan = false;
>>
>> ... we do periodic lookup (but always in same slot), so no full scan is
>> triggered.
> Yes, I was wondering about that. My first idea was to have that bool per bucket
> and force a scan of the bucket instead of the whole table.
FYI, I'm off for about two weeks, but we really need to find a way to fix
this before the release goes out.

Patch

diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index ba6a1d421222..3dbb27bd9582 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -87,6 +87,7 @@ static __read_mostly bool nf_conntrack_locks_all;
 #define GC_MAX_BUCKETS		8192u
 #define GC_INTERVAL		(5 * HZ)
 #define GC_MAX_EVICTS		256u
+static bool gc_full_scan = true;
 
 static struct conntrack_gc_work conntrack_gc_work;
 
@@ -511,6 +512,7 @@ ____nf_conntrack_find(struct net *net, const struct nf_conntrack_zone *zone,
 	unsigned int bucket, hsize;
 
 begin:
+	gc_full_scan = false;
 	nf_conntrack_get_ht(&ct_hash, &hsize);
 	bucket = reciprocal_scale(hash, hsize);
 
@@ -942,7 +944,11 @@ static void gc_worker(struct work_struct *work)
 
 	gc_work = container_of(work, struct conntrack_gc_work, dwork.work);
 
-	goal = min(nf_conntrack_htable_size / GC_MAX_BUCKETS_DIV, GC_MAX_BUCKETS);
+	if (gc_full_scan)
+		goal = nf_conntrack_htable_size;
+	else
+		goal = min(nf_conntrack_htable_size / GC_MAX_BUCKETS_DIV,
+			   GC_MAX_BUCKETS);
 	i = gc_work->last_bucket;
 
 	do {
@@ -977,7 +983,8 @@ static void gc_worker(struct work_struct *work)
 		rcu_read_unlock();
 		cond_resched_rcu_qs();
 	} while (++buckets < goal &&
-		 expired_count < GC_MAX_EVICTS);
+		 (gc_full_scan || expired_count < GC_MAX_EVICTS));
+	gc_full_scan = true;
 
 	if (gc_work->exiting)
 		return;