| Field | Value |
|---|---|
| Message ID | 1485448687-6072-10-git-send-email-pablo@netfilter.org |
| State | Accepted, archived |
| Delegated to | David Miller |
On 26/01/2017 17:38, Pablo Neira Ayuso wrote:
> From: Florian Westphal <fw@strlen.de>
>
> This further refines the changes made to the conntrack gc_worker in
> commit e0df8cae6c16 ("netfilter: conntrack: refine gc worker heuristics").
>
> The main idea of that change was to reduce the scan interval when evictions
> take place.
>
> However, on the reporter's setup, there are 1-2 million conntrack entries
> in total and roughly 8k new (and closing) connections per second.
>
> In this case we'll always evict at least one entry per gc cycle, and the scan
> interval is always 1 jiffy because this test:
>
>     } else if (expired_count) {
>             gc_work->next_gc_run /= 2U;
>             next_run = msecs_to_jiffies(1);
>
> is true almost all the time.
>
> Given that we scan ~10k entries per run, it is clearly wrong to reduce the
> interval based on a nonzero eviction count; it only wastes cpu cycles, since
> the vast majority of conntracks are not timed out.
>
> Thus, only look at the ratio (scanned entries vs. evicted entries) to decide
> whether or not to reduce the interval.
>
> Because the evictor is supposed to kick in only when the system turns idle
> after a busy period, pick a high ratio -- this makes it 50%. We thus keep
> the idea of increasing the scan rate when it is likely that the table
> contains many expired entries.
>
> In order not to let timed-out entries hang around for too long
> (important when using event logging, in which case we want to destroy
> events in a timely fashion), we now scan the full table within at most
> GC_MAX_SCAN_JIFFIES (16 seconds), even in the worst-case scenario where all
> timed-out entries sit in the same slot.
>
> I tested this with a vm under synflood (with
> sysctl net.netfilter.nf_conntrack_tcp_timeout_syn_recv=3).
>
> While the flood is ongoing, the interval now stays at its max rate
> (GC_MAX_SCAN_JIFFIES / GC_MAX_BUCKETS_DIV -> 125ms).
>
> With feedback from Nicolas Dichtel.
>
> Reported-by: Denys Fedoryshchenko <nuclearcat@nuclearcat.com>
> Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
> Fixes: b87a2f9199ea82eaadc ("netfilter: conntrack: add gc worker to remove timed-out entries")
> Signed-off-by: Florian Westphal <fw@strlen.de>
> Tested-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
> Tested-by: Denys Fedoryshchenko <nuclearcat@nuclearcat.com>
> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

Pablo, is it possible to queue this patch (and the previous one: 08/14) for the
4.9 stable tree?

Thank you,
Nicolas
On 27/01/2017 17:51, Nicolas Dichtel wrote:
> On 26/01/2017 17:38, Pablo Neira Ayuso wrote:
>> From: Florian Westphal <fw@strlen.de>
>> [...]
> Pablo, is it possible to queue this patch (and the previous one: 08/14) for
> the 4.9 stable?

Pablo, should I submit it formally?

Regards,
Nicolas
On Wed, Mar 01, 2017 at 04:02:53PM +0100, Nicolas Dichtel wrote:
> On 27/01/2017 17:51, Nicolas Dichtel wrote:
> > On 26/01/2017 17:38, Pablo Neira Ayuso wrote:
> > > From: Florian Westphal <fw@strlen.de>
> > > [...]
> > Pablo, is it possible to queue this patch (and the previous one: 08/14)
> > for the 4.9 stable?
>
> Pablo, should I submit it formally?

Just requested this to Greg, sorry this didn't happen so far.

Thanks.
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 6feb5d370319..4e8083c5e01d 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -85,9 +85,11 @@ static __read_mostly DEFINE_SPINLOCK(nf_conntrack_locks_all_lock);
 static __read_mostly bool nf_conntrack_locks_all;
 
 /* every gc cycle scans at most 1/GC_MAX_BUCKETS_DIV part of table */
-#define GC_MAX_BUCKETS_DIV	64u
-/* upper bound of scan intervals */
-#define GC_INTERVAL_MAX	(2 * HZ)
+#define GC_MAX_BUCKETS_DIV	128u
+/* upper bound of full table scan */
+#define GC_MAX_SCAN_JIFFIES	(16u * HZ)
+/* desired ratio of entries found to be expired */
+#define GC_EVICT_RATIO	50u
 
 static struct conntrack_gc_work conntrack_gc_work;
 
@@ -936,6 +938,7 @@ static noinline int early_drop(struct net *net, unsigned int _hash)
 
 static void gc_worker(struct work_struct *work)
 {
+	unsigned int min_interval = max(HZ / GC_MAX_BUCKETS_DIV, 1u);
 	unsigned int i, goal, buckets = 0, expired_count = 0;
 	struct conntrack_gc_work *gc_work;
 	unsigned int ratio, scanned = 0;
@@ -994,27 +997,25 @@ static void gc_worker(struct work_struct *work)
 	 * 1. Minimize time until we notice a stale entry
 	 * 2. Maximize scan intervals to not waste cycles
 	 *
-	 * Normally, expired_count will be 0, this increases the next_run time
-	 * to priorize 2) above.
+	 * Normally, expire ratio will be close to 0.
 	 *
-	 * As soon as a timed-out entry is found, move towards 1) and increase
-	 * the scan frequency.
-	 * In case we have lots of evictions next scan is done immediately.
+	 * As soon as a sizeable fraction of the entries have expired
+	 * increase scan frequency.
 	 */
 	ratio = scanned ? expired_count * 100 / scanned : 0;
-	if (ratio >= 90) {
-		gc_work->next_gc_run = 0;
-		next_run = 0;
-	} else if (expired_count) {
-		gc_work->next_gc_run /= 2U;
-		next_run = msecs_to_jiffies(1);
+	if (ratio > GC_EVICT_RATIO) {
+		gc_work->next_gc_run = min_interval;
 	} else {
-		if (gc_work->next_gc_run < GC_INTERVAL_MAX)
-			gc_work->next_gc_run += msecs_to_jiffies(1);
+		unsigned int max = GC_MAX_SCAN_JIFFIES / GC_MAX_BUCKETS_DIV;
 
-		next_run = gc_work->next_gc_run;
+		BUILD_BUG_ON((GC_MAX_SCAN_JIFFIES / GC_MAX_BUCKETS_DIV) == 0);
+
+		gc_work->next_gc_run += min_interval;
+		if (gc_work->next_gc_run > max)
+			gc_work->next_gc_run = max;
 	}
 
+	next_run = gc_work->next_gc_run;
 	gc_work->last_bucket = i;
 	queue_delayed_work(system_long_wq, &gc_work->dwork, next_run);
 }
@@ -1022,7 +1023,7 @@ static void gc_worker(struct work_struct *work)
 static void conntrack_gc_work_init(struct conntrack_gc_work *gc_work)
 {
 	INIT_DELAYED_WORK(&gc_work->dwork, gc_worker);
-	gc_work->next_gc_run = GC_INTERVAL_MAX;
+	gc_work->next_gc_run = HZ;
 	gc_work->exiting = false;
 }
 
@@ -1914,7 +1915,7 @@ int nf_conntrack_init_start(void)
 	nf_ct_untracked_status_or(IPS_CONFIRMED | IPS_UNTRACKED);
 
 	conntrack_gc_work_init(&conntrack_gc_work);
-	queue_delayed_work(system_long_wq, &conntrack_gc_work.dwork, GC_INTERVAL_MAX);
+	queue_delayed_work(system_long_wq, &conntrack_gc_work.dwork, HZ);
 
 	return 0;