Message ID | 20210903163020.13741-3-pablo@netfilter.org |
---|---|
State | Accepted |
Delegated to: | Pablo Neira |
Headers | show |
Series | [net,1/5] netfilter: nft_ct: protect nft_ct_pcpu_template_refcnt with mutex | expand |
Hello, On Fri, 3 Sep 2021 18:30:17 +0200, Pablo Neira Ayuso <pablo@netfilter.org> wrote: > From: Florian Westphal <fw@strlen.de> > > conntrack has two distinct table size settings: > nf_conntrack_max and nf_conntrack_buckets. > > The former limits how many conntrack objects are allowed to exist > in each namespace. > > The second sets the size of the hashtable. > > As all entries are inserted twice (once for original direction, once for > reply), there should be at least twice as many buckets in the table than > the maximum number of conntrack objects that can exist at the same time. > > Change the default multiplier to 1 and increase the chosen bucket sizes. > This results in the same nf_conntrack_max settings as before but reduces > the average bucket list length. [...] > nf_conntrack_htable_size > = (((nr_pages << PAGE_SHIFT) / 16384) > / sizeof(struct hlist_head)); > - if (nr_pages > (4 * (1024 * 1024 * 1024 / PAGE_SIZE))) > - nf_conntrack_htable_size = 65536; > + if (BITS_PER_LONG >= 64 && > + nr_pages > (4 * (1024 * 1024 * 1024 / PAGE_SIZE))) > + nf_conntrack_htable_size = 262144; > else if (nr_pages > (1024 * 1024 * 1024 / PAGE_SIZE)) > - nf_conntrack_htable_size = 16384; [...] > + nf_conntrack_htable_size = 65536; With this formula, there seems to be a discontinuity between the proportional and fixed regimes: 64bits: 4GB/16k/8 = 32k, which gets bumped to 256k 32bits: 1GB/16k/4 = 16k, which gets bumped to 64k Is this intentional ? The background for my interest in this formula comes from OpenWRT: low-RAM devices intended to handle a lot of connections, which led OpenWRT to use sysctl to increase the maximum number of entries in this hash table compared to what this formula produces. Unfortunately, the result is that not-so-low-RAM devices running OpenWRT get the same limit as low-RAM devices, so I am trying to tweak the divisor in the first expression and getting rid of the sysctl call. But then I am failing to see how I should adapt the expressions in these "if"s blocks. If they were maximum sizes (say, something like nf_conntrack_htable_size = max(nf_conntrack_htable_size, 256k)), I would understand, but I find this discontinuity surprising. Am I missing something ? For reference, this change is commit d532bcd0b2699d84d71a0c71d37157ac6eb3be25 in Linus' tree. Regards,
Vincent Pelletier <plr.vincent@gmail.com> wrote: > On Fri, 3 Sep 2021 18:30:17 +0200, Pablo Neira Ayuso <pablo@netfilter.org> wrote: > > From: Florian Westphal <fw@strlen.de> > > > > conntrack has two distinct table size settings: > > nf_conntrack_max and nf_conntrack_buckets. > > > > The former limits how many conntrack objects are allowed to exist > > in each namespace. > > > > The second sets the size of the hashtable. > > > > As all entries are inserted twice (once for original direction, once for > > reply), there should be at least twice as many buckets in the table than > > the maximum number of conntrack objects that can exist at the same time. > > > > Change the default multiplier to 1 and increase the chosen bucket sizes. > > This results in the same nf_conntrack_max settings as before but reduces > > the average bucket list length. > [...] > > nf_conntrack_htable_size > > = (((nr_pages << PAGE_SHIFT) / 16384) > > / sizeof(struct hlist_head)); > > - if (nr_pages > (4 * (1024 * 1024 * 1024 / PAGE_SIZE))) > > - nf_conntrack_htable_size = 65536; > > + if (BITS_PER_LONG >= 64 && > > + nr_pages > (4 * (1024 * 1024 * 1024 / PAGE_SIZE))) > > + nf_conntrack_htable_size = 262144; > > else if (nr_pages > (1024 * 1024 * 1024 / PAGE_SIZE)) > > - nf_conntrack_htable_size = 16384; > [...] > > + nf_conntrack_htable_size = 65536; > > With this formula, there seems to be a discontinuity between the > proportional and fixed regimes: > 64bits: 4GB/16k/8 = 32k, which gets bumped to 256k > 32bits: 1GB/16k/4 = 16k, which gets bumped to 64k > > Is this intentional ? There is no science here. This tries to pick a sane default setting, thats all. Its not possible to pick one that works for everyone and everything. 32bit kernel can't access more than 1GB so I did not want to increase that too much. These are default settings, users should be free to pick any value they like/need.
diff --git a/Documentation/networking/nf_conntrack-sysctl.rst b/Documentation/networking/nf_conntrack-sysctl.rst index 024d784157c8..de3815dd4d49 100644 --- a/Documentation/networking/nf_conntrack-sysctl.rst +++ b/Documentation/networking/nf_conntrack-sysctl.rst @@ -17,9 +17,8 @@ nf_conntrack_acct - BOOLEAN nf_conntrack_buckets - INTEGER Size of hash table. If not specified as parameter during module loading, the default size is calculated by dividing total memory - by 16384 to determine the number of buckets but the hash table will - never have fewer than 32 and limited to 16384 buckets. For systems - with more than 4GB of memory it will be 65536 buckets. + by 16384 to determine the number of buckets. The hash table will + never have fewer than 1024 and never more than 262144 buckets. This sysctl is only writeable in the initial net namespace. nf_conntrack_checksum - BOOLEAN @@ -100,8 +99,12 @@ nf_conntrack_log_invalid - INTEGER Log invalid packets of a type specified by value. nf_conntrack_max - INTEGER - Size of connection tracking table. Default value is - nf_conntrack_buckets value * 4. + Maximum number of allowed connection tracking entries. This value is set + to nf_conntrack_buckets by default. + Note that connection tracking entries are added to the table twice -- once + for the original direction and once for the reply direction (i.e., with + the reversed address). This means that with default settings a maxed-out + table will have a average hash chain length of 2, not 1. nf_conntrack_tcp_be_liberal - BOOLEAN - 0 - disabled (default) diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c index d31dbccbe7bd..cdd8a1dc2275 100644 --- a/net/netfilter/nf_conntrack_core.c +++ b/net/netfilter/nf_conntrack_core.c @@ -2594,26 +2594,24 @@ int nf_conntrack_init_start(void) spin_lock_init(&nf_conntrack_locks[i]); if (!nf_conntrack_htable_size) { - /* Idea from tcp.c: use 1/16384 of memory. - * On i386: 32MB machine has 512 buckets. - * >= 1GB machines have 16384 buckets. - * >= 4GB machines have 65536 buckets. - */ nf_conntrack_htable_size = (((nr_pages << PAGE_SHIFT) / 16384) / sizeof(struct hlist_head)); - if (nr_pages > (4 * (1024 * 1024 * 1024 / PAGE_SIZE))) - nf_conntrack_htable_size = 65536; + if (BITS_PER_LONG >= 64 && + nr_pages > (4 * (1024 * 1024 * 1024 / PAGE_SIZE))) + nf_conntrack_htable_size = 262144; else if (nr_pages > (1024 * 1024 * 1024 / PAGE_SIZE)) - nf_conntrack_htable_size = 16384; - if (nf_conntrack_htable_size < 32) - nf_conntrack_htable_size = 32; - - /* Use a max. factor of four by default to get the same max as - * with the old struct list_heads. When a table size is given - * we use the old value of 8 to avoid reducing the max. - * entries. */ - max_factor = 4; + nf_conntrack_htable_size = 65536; + + if (nf_conntrack_htable_size < 1024) + nf_conntrack_htable_size = 1024; + /* Use a max. factor of one by default to keep the average + * hash chain length at 2 entries. Each entry has to be added + * twice (once for original direction, once for reply). + * When a table size is given we use the old value of 8 to + * avoid implicit reduction of the max entries setting. + */ + max_factor = 1; } nf_conntrack_hash = nf_ct_alloc_hashtable(&nf_conntrack_htable_size, 1);