
[net] rhashtable: avoid large lock-array allocations

Message ID 1470968023-14338-1-git-send-email-fw@strlen.de
State Changes Requested, archived
Delegated to: David Miller

Commit Message

Florian Westphal Aug. 12, 2016, 2:13 a.m. UTC
Sander reports the following splat after the netfilter NAT bysrc table got
converted to rhashtable:

swapper/0: page allocation failure: order:3, mode:0x2084020(GFP_ATOMIC|__GFP_COMP)
 CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.8.0-rc1 [..]
 [<ffffffff811633ed>] warn_alloc_failed+0xdd/0x140
 [<ffffffff811638b1>] __alloc_pages_nodemask+0x3e1/0xcf0
 [<ffffffff811a72ed>] alloc_pages_current+0x8d/0x110
 [<ffffffff8117cb7f>] kmalloc_order+0x1f/0x70
 [<ffffffff811aec19>] __kmalloc+0x129/0x140
 [<ffffffff8146d561>] bucket_table_alloc+0xc1/0x1d0
 [<ffffffff8146da1d>] rhashtable_insert_rehash+0x5d/0xe0
 [<ffffffff819fcfff>] nf_nat_setup_info+0x2ef/0x400

The failure happens when allocating the spinlock array.
Even with GFP_KERNEL it is unlikely that such a large allocation
will succeed.

Thomas Graf pointed me at inet_ehash_locks_alloc(), so in addition
to adding __GFP_NOWARN for atomic allocations this also makes the
lock-array sizing more conservative.

In commit 095dc8e0c3686 ("tcp: fix/cleanup inet_ehash_locks_alloc()"),
Eric Dumazet says: "Budget 2 cache lines per cpu worth of 'spinlocks'".
IOW, consider the size needed by a single spinlock when determining
the number of locks per cpu.
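
Expressed as code, the budget amounts to roughly this (a sketch of the
arithmetic only, not the kernel's actual macro, and the helper name is
made up):

        /* "Two L1 cache lines worth of spinlocks" per cpu, clamped to at
         * least one lock per cpu (relevant once LOCKDEP inflates spinlock_t).
         * Assumes spinlock_size is non-zero.
         */
        static unsigned int locks_per_cpu(unsigned int l1_cache_bytes,
                                          unsigned int spinlock_size)
        {
                unsigned int n = 2 * l1_cache_bytes / spinlock_size;

                return n ? n : 1;       /* 64-byte lines, 4-byte locks -> 32 */
        }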

Currently, rhashtable simply allocates 128 locks per cpu, which is a
factor of 4 more than what the inet hashtable uses with the same number
of cpus.

For LOCKDEP, we now allocate far fewer locks than before (1 per cpu on
my test box), so we no longer need to pretend we only have two cpus.

Some sizes (64-byte L1 cache line, 4 bytes per spinlock, numbers in bytes):

cpus:    1   2   4    8   16    32   64
old:    1k  1k  4k   8k  16k   16k  16k
new:   128 256 512   1k   2k    4k   8k

With 72-byte spinlock (LOCKDEP):
cpus:    1   2   4    8   16    32   64
old:    9k 18k 18k  18k  18k   18k  18k
new:    72 144 288  576  ~1k ~2.3k  ~4k
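
As a worked example of where the "new" rows come from (assuming the hash
table has enough buckets that the 0.5-locks-per-bucket cap never kicks in):
with 8 cpus and 4-byte spinlocks, locks_mul = max(2 * 64 / 4, 1) = 32, so
roundup_pow_of_two(8 * 32) = 256 locks, i.e. 256 * 4 = 1k.  With 72-byte
LOCKDEP spinlocks, locks_mul = max(2 * 64 / 72, 1) = 1, giving 8 locks of
72 bytes each, i.e. 576 bytes.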

Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
Suggested-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
 Alternatively we could lower BUCKET_LOCKS_PER_CPU to 32
 and keep the CONFIG_PROVE_LOCKING ifdef around.

 Any preference?

 Thanks!

 lib/rhashtable.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

Comments

kernel test robot Aug. 12, 2016, 2:27 a.m. UTC | #1
Hi Florian,

[auto build test WARNING on net/master]

url:    https://github.com/0day-ci/linux/commits/Florian-Westphal/rhashtable-avoid-large-lock-array-allocations/20160812-101458
config: x86_64-randconfig-x015-201632 (attached as .config)
compiler: gcc-6 (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

All warnings (new ones prefixed by >>):

   In file included from lib/rhashtable.c:18:0:
   lib/rhashtable.c: In function 'rhashtable_init':
>> lib/rhashtable.c:34:30: warning: division by zero [-Wdiv-by-zero]
              2 * L1_CACHE_BYTES / sizeof(spinlock_t), 1)
                                 ^
   include/linux/kernel.h:787:17: note: in definition of macro 'max_t'
     type __max1 = (x);   \
                    ^
>> lib/rhashtable.c:779:21: note: in expansion of macro 'BUCKET_LOCKS_PER_CPU'
      ht->p.locks_mul = BUCKET_LOCKS_PER_CPU;
                        ^~~~~~~~~~~~~~~~~~~~

vim +34 lib/rhashtable.c

    12	 * This program is free software; you can redistribute it and/or modify
    13	 * it under the terms of the GNU General Public License version 2 as
    14	 * published by the Free Software Foundation.
    15	 */
    16	
    17	#include <linux/atomic.h>
  > 18	#include <linux/kernel.h>
    19	#include <linux/init.h>
    20	#include <linux/log2.h>
    21	#include <linux/sched.h>
    22	#include <linux/slab.h>
    23	#include <linux/vmalloc.h>
    24	#include <linux/mm.h>
    25	#include <linux/jhash.h>
    26	#include <linux/random.h>
    27	#include <linux/rhashtable.h>
    28	#include <linux/err.h>
    29	#include <linux/export.h>
    30	
    31	#define HASH_DEFAULT_SIZE	64UL
    32	#define HASH_MIN_SIZE		4U
    33	#define BUCKET_LOCKS_PER_CPU	max_t(unsigned int, \
  > 34					      2 * L1_CACHE_BYTES / sizeof(spinlock_t), 1)
    35	
    36	static u32 head_hashfn(struct rhashtable *ht,
    37			       const struct bucket_table *tbl,

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
Florian Westphal Aug. 12, 2016, 2:31 a.m. UTC | #2
kbuild test robot <lkp@intel.com> wrote:
> Hi Florian,
> 
> [...]
> 
>    In file included from lib/rhashtable.c:18:0:
>    lib/rhashtable.c: In function 'rhashtable_init':
> >> lib/rhashtable.c:34:30: warning: division by zero [-Wdiv-by-zero]
>               2 * L1_CACHE_BYTES / sizeof(spinlock_t), 1)

I will send a v2 tomorrow; perhaps it's better to just go with
32UL instead of this construct after all.
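
The division by zero presumably comes from configs where spinlock_t is an
empty struct (a UP kernel without spinlock debugging), so sizeof(spinlock_t)
is 0 in the constant expression.  A plain constant would sidestep that; a
sketch of what such a simplified define could look like (not the actual v2):

        #define BUCKET_LOCKS_PER_CPU    32UL
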
David Miller Aug. 15, 2016, 4:13 a.m. UTC | #3
From: Florian Westphal <fw@strlen.de>
Date: Fri, 12 Aug 2016 04:13:43 +0200

> Sander reports the following splat after the netfilter NAT bysrc table got
> converted to rhashtable:
> 
> [...]
> 
> Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
> Suggested-by: Thomas Graf <tgraf@suug.ch>
> Signed-off-by: Florian Westphal <fw@strlen.de>

Applied, thanks Florian.

Patch

diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index 5d845ff..92cf5a9 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -30,7 +30,8 @@ 
 
 #define HASH_DEFAULT_SIZE	64UL
 #define HASH_MIN_SIZE		4U
-#define BUCKET_LOCKS_PER_CPU   128UL
+#define BUCKET_LOCKS_PER_CPU	max_t(unsigned int, \
+				      2 * L1_CACHE_BYTES / sizeof(spinlock_t), 1)
 
 static u32 head_hashfn(struct rhashtable *ht,
 		       const struct bucket_table *tbl,
@@ -63,14 +64,10 @@  EXPORT_SYMBOL_GPL(lockdep_rht_bucket_is_held);
 static int alloc_bucket_locks(struct rhashtable *ht, struct bucket_table *tbl,
 			      gfp_t gfp)
 {
-	unsigned int i, size;
-#if defined(CONFIG_PROVE_LOCKING)
-	unsigned int nr_pcpus = 2;
-#else
 	unsigned int nr_pcpus = num_possible_cpus();
-#endif
+	unsigned int i, size;
 
-	nr_pcpus = min_t(unsigned int, nr_pcpus, 32UL);
+	nr_pcpus = min_t(unsigned int, nr_pcpus, 64UL);
 	size = roundup_pow_of_two(nr_pcpus * ht->p.locks_mul);
 
 	/* Never allocate more than 0.5 locks per bucket */
@@ -83,6 +80,9 @@  static int alloc_bucket_locks(struct rhashtable *ht, struct bucket_table *tbl,
 			tbl->locks = vmalloc(size * sizeof(spinlock_t));
 		else
 #endif
+		if (gfp != GFP_KERNEL)
+			gfp |= __GFP_NOWARN | __GFP_NORETRY;
+
 		tbl->locks = kmalloc_array(size, sizeof(spinlock_t),
 					   gfp);
 		if (!tbl->locks)