diff mbox series

[net,2/5] netfilter: conntrack: sanitize table size default settings

Message ID 20210903163020.13741-3-pablo@netfilter.org
State Accepted
Delegated to: Pablo Neira
Headers show
Series [net,1/5] netfilter: nft_ct: protect nft_ct_pcpu_template_refcnt with mutex | expand

Commit Message

Pablo Neira Ayuso Sept. 3, 2021, 4:30 p.m. UTC
From: Florian Westphal <fw@strlen.de>

conntrack has two distinct table size settings:
nf_conntrack_max and nf_conntrack_buckets.

The former limits how many conntrack objects are allowed to exist
in each namespace.

The second sets the size of the hashtable.

As all entries are inserted twice (once for original direction, once for
reply), there should be at least twice as many buckets in the table than
the maximum number of conntrack objects that can exist at the same time.

Change the default multiplier to 1 and increase the chosen bucket sizes.
This results in the same nf_conntrack_max settings as before but reduces
the average bucket list length.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 .../networking/nf_conntrack-sysctl.rst        | 13 ++++----
 net/netfilter/nf_conntrack_core.c             | 30 +++++++++----------
 2 files changed, 22 insertions(+), 21 deletions(-)

Comments

Vincent Pelletier March 31, 2022, 2:59 p.m. UTC | #1
Hello,

On Fri,  3 Sep 2021 18:30:17 +0200, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> From: Florian Westphal <fw@strlen.de>
> 
> conntrack has two distinct table size settings:
> nf_conntrack_max and nf_conntrack_buckets.
> 
> The former limits how many conntrack objects are allowed to exist
> in each namespace.
> 
> The second sets the size of the hashtable.
> 
> As all entries are inserted twice (once for original direction, once for
> reply), there should be at least twice as many buckets in the table than
> the maximum number of conntrack objects that can exist at the same time.
> 
> Change the default multiplier to 1 and increase the chosen bucket sizes.
> This results in the same nf_conntrack_max settings as before but reduces
> the average bucket list length.
[...]
>  		nf_conntrack_htable_size
>  			= (((nr_pages << PAGE_SHIFT) / 16384)
>  			   / sizeof(struct hlist_head));
> -		if (nr_pages > (4 * (1024 * 1024 * 1024 / PAGE_SIZE)))
> -			nf_conntrack_htable_size = 65536;
> +		if (BITS_PER_LONG >= 64 &&
> +		    nr_pages > (4 * (1024 * 1024 * 1024 / PAGE_SIZE)))
> +			nf_conntrack_htable_size = 262144;
>  		else if (nr_pages > (1024 * 1024 * 1024 / PAGE_SIZE))
> -			nf_conntrack_htable_size = 16384;
[...]
> +			nf_conntrack_htable_size = 65536;

With this formula, there seems to be a discontinuity between the
proportional and fixed regimes:
64bits: 4GB/16k/8 = 32k, which gets bumped to 256k
32bits: 1GB/16k/4 = 16k, which gets bumped to 64k

Is this intentional ?

The background for my interest in this formula comes from OpenWRT:
low-RAM devices intended to handle a lot of connections, which led
OpenWRT to use sysctl to increase the maximum number of entries in this
hash table compared to what this formula produces.
Unfortunately, the result is that not-so-low-RAM devices running
OpenWRT get the same limit as low-RAM devices, so I am trying to tweak
the divisor in the first expression and getting rid of the sysctl call.
But then I am failing to see how I should adapt the expressions in
these "if"s blocks.

If they were maximum sizes (say, something like
nf_conntrack_htable_size = max(nf_conntrack_htable_size, 256k)), I
would understand, but I find this discontinuity surprising.

Am I missing something ?

For reference, this change is
  commit d532bcd0b2699d84d71a0c71d37157ac6eb3be25
in Linus' tree.

Regards,
Florian Westphal March 31, 2022, 3:21 p.m. UTC | #2
Vincent Pelletier <plr.vincent@gmail.com> wrote:
> On Fri,  3 Sep 2021 18:30:17 +0200, Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > From: Florian Westphal <fw@strlen.de>
> > 
> > conntrack has two distinct table size settings:
> > nf_conntrack_max and nf_conntrack_buckets.
> > 
> > The former limits how many conntrack objects are allowed to exist
> > in each namespace.
> > 
> > The second sets the size of the hashtable.
> > 
> > As all entries are inserted twice (once for original direction, once for
> > reply), there should be at least twice as many buckets in the table than
> > the maximum number of conntrack objects that can exist at the same time.
> > 
> > Change the default multiplier to 1 and increase the chosen bucket sizes.
> > This results in the same nf_conntrack_max settings as before but reduces
> > the average bucket list length.
> [...]
> >  		nf_conntrack_htable_size
> >  			= (((nr_pages << PAGE_SHIFT) / 16384)
> >  			   / sizeof(struct hlist_head));
> > -		if (nr_pages > (4 * (1024 * 1024 * 1024 / PAGE_SIZE)))
> > -			nf_conntrack_htable_size = 65536;
> > +		if (BITS_PER_LONG >= 64 &&
> > +		    nr_pages > (4 * (1024 * 1024 * 1024 / PAGE_SIZE)))
> > +			nf_conntrack_htable_size = 262144;
> >  		else if (nr_pages > (1024 * 1024 * 1024 / PAGE_SIZE))
> > -			nf_conntrack_htable_size = 16384;
> [...]
> > +			nf_conntrack_htable_size = 65536;
> 
> With this formula, there seems to be a discontinuity between the
> proportional and fixed regimes:
> 64bits: 4GB/16k/8 = 32k, which gets bumped to 256k
> 32bits: 1GB/16k/4 = 16k, which gets bumped to 64k
> 
> Is this intentional ?

There is no science here.  This tries to pick a sane default setting,
thats all. Its not possible to pick one that works for everyone and everything.

32bit kernel can't access more than 1GB so I did not want to
increase that too much.

These are default settings, users should be free to pick any value they
like/need.
diff mbox series

Patch

diff --git a/Documentation/networking/nf_conntrack-sysctl.rst b/Documentation/networking/nf_conntrack-sysctl.rst
index 024d784157c8..de3815dd4d49 100644
--- a/Documentation/networking/nf_conntrack-sysctl.rst
+++ b/Documentation/networking/nf_conntrack-sysctl.rst
@@ -17,9 +17,8 @@  nf_conntrack_acct - BOOLEAN
 nf_conntrack_buckets - INTEGER
 	Size of hash table. If not specified as parameter during module
 	loading, the default size is calculated by dividing total memory
-	by 16384 to determine the number of buckets but the hash table will
-	never have fewer than 32 and limited to 16384 buckets. For systems
-	with more than 4GB of memory it will be 65536 buckets.
+	by 16384 to determine the number of buckets. The hash table will
+	never have fewer than 1024 and never more than 262144 buckets.
 	This sysctl is only writeable in the initial net namespace.
 
 nf_conntrack_checksum - BOOLEAN
@@ -100,8 +99,12 @@  nf_conntrack_log_invalid - INTEGER
 	Log invalid packets of a type specified by value.
 
 nf_conntrack_max - INTEGER
-	Size of connection tracking table.  Default value is
-	nf_conntrack_buckets value * 4.
+        Maximum number of allowed connection tracking entries. This value is set
+        to nf_conntrack_buckets by default.
+        Note that connection tracking entries are added to the table twice -- once
+        for the original direction and once for the reply direction (i.e., with
+        the reversed address). This means that with default settings a maxed-out
+        table will have a average hash chain length of 2, not 1.
 
 nf_conntrack_tcp_be_liberal - BOOLEAN
 	- 0 - disabled (default)
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index d31dbccbe7bd..cdd8a1dc2275 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -2594,26 +2594,24 @@  int nf_conntrack_init_start(void)
 		spin_lock_init(&nf_conntrack_locks[i]);
 
 	if (!nf_conntrack_htable_size) {
-		/* Idea from tcp.c: use 1/16384 of memory.
-		 * On i386: 32MB machine has 512 buckets.
-		 * >= 1GB machines have 16384 buckets.
-		 * >= 4GB machines have 65536 buckets.
-		 */
 		nf_conntrack_htable_size
 			= (((nr_pages << PAGE_SHIFT) / 16384)
 			   / sizeof(struct hlist_head));
-		if (nr_pages > (4 * (1024 * 1024 * 1024 / PAGE_SIZE)))
-			nf_conntrack_htable_size = 65536;
+		if (BITS_PER_LONG >= 64 &&
+		    nr_pages > (4 * (1024 * 1024 * 1024 / PAGE_SIZE)))
+			nf_conntrack_htable_size = 262144;
 		else if (nr_pages > (1024 * 1024 * 1024 / PAGE_SIZE))
-			nf_conntrack_htable_size = 16384;
-		if (nf_conntrack_htable_size < 32)
-			nf_conntrack_htable_size = 32;
-
-		/* Use a max. factor of four by default to get the same max as
-		 * with the old struct list_heads. When a table size is given
-		 * we use the old value of 8 to avoid reducing the max.
-		 * entries. */
-		max_factor = 4;
+			nf_conntrack_htable_size = 65536;
+
+		if (nf_conntrack_htable_size < 1024)
+			nf_conntrack_htable_size = 1024;
+		/* Use a max. factor of one by default to keep the average
+		 * hash chain length at 2 entries.  Each entry has to be added
+		 * twice (once for original direction, once for reply).
+		 * When a table size is given we use the old value of 8 to
+		 * avoid implicit reduction of the max entries setting.
+		 */
+		max_factor = 1;
 	}
 
 	nf_conntrack_hash = nf_ct_alloc_hashtable(&nf_conntrack_htable_size, 1);