Patchwork: Fix panic in __d_lookup with high dentry hashtable counts

Submitter Dimitri Sivanich
Date Jan. 17, 2012, 5:13 p.m.
Message ID <20120117171352.GA18738@sgi.com>
Permalink /patch/136505/
State Not Applicable
Delegated to: David Miller

Comments

Dimitri Sivanich - Jan. 17, 2012, 5:13 p.m.
When the number of dentry cache hash table entries gets too high
(2147483648 entries), as happens by default on a 16TB system, use
of a signed integer in the dcache_init() initialization loop prevents
the dentry_hashtable from getting initialized, causing a panic in
__d_lookup().

In addition, the _hash_mask returned from alloc_large_system_hash() does
not support more than a 32 bit hash table size.

Change the _hash_mask size returned from alloc_large_system_hash() to
support larger hash table sizes in the future, and change the loop
counter sizes appropriately.

Signed-off-by: Dimitri Sivanich <sivanich@sgi.com>
---
 fs/dcache.c             |   10 +++++-----
 fs/inode.c              |   10 +++++-----
 include/linux/bootmem.h |    2 +-
 mm/page_alloc.c         |    2 +-
 net/ipv4/route.c        |    8 ++++++--
 net/ipv4/tcp.c          |   13 +++++++++----
 net/ipv4/udp.c          |    5 ++++-
 7 files changed, 31 insertions(+), 19 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller - Jan. 17, 2012, 5:22 p.m.
From: Dimitri Sivanich <sivanich@sgi.com>
Date: Tue, 17 Jan 2012 11:13:52 -0600

> When the number of dentry cache hash table entries gets too high
> (2147483648 entries), as happens by default on a 16TB system, use
> of a signed integer in the dcache_init() initialization loop prevents
> the dentry_hashtable from getting initialized, causing a panic in
> __d_lookup().
> 
> In addition, the _hash_mask returned from alloc_large_system_hash() does
> not support more than a 32 bit hash table size.
> 
> Changing the _hash_mask size returned from alloc_large_system_hash() to
> support larger hash table sizes in the future, and changing loop counter
> sizes appropriately.
> 
> Signed-off-by: Dimitri Sivanich <sivanich@sgi.com>

To be honest I think this is overkill.

Supporting anything larger than a 32-bit hash mask is not even close
to being reasonable.  Nobody needs a 4GB hash table, not for anything.

Instead I would just make sure everything is "unsigned int" or "u32"
and calculations use things like "((u32) 1) << shift", and enforce an
upper bounds of 0x80000000 or similar unconditionally in the hash
allocator itself (rather than conditionally in the networking code).

All of this "long" stuff is madness, what the heck is a long?  It's a
non-fixed type, yet you put constants in your code (0x80000000) which
depend upon that type's size.
Al Viro - Jan. 17, 2012, 5:25 p.m.
On Tue, Jan 17, 2012 at 11:13:52AM -0600, Dimitri Sivanich wrote:
> When the number of dentry cache hash table entries gets too high
> (2147483648 entries), as happens by default on a 16TB system, use
> of a signed integer in the dcache_init() initialization loop prevents
> the dentry_hashtable from getting initialized, causing a panic in
> __d_lookup().
> 
> In addition, the _hash_mask returned from alloc_large_system_hash() does
> not support more than a 32 bit hash table size.
> 
> Changing the _hash_mask size returned from alloc_large_system_hash() to
> support larger hash table sizes in the future, and changing loop counter
> sizes appropriately.

... and I still would like to see somebody familiar with uses of other
hashes to comment on the desirability of such monsters.  For dcache and
icache it's absolutely certain to be worse than useless.  We are talking
about 4Gbuckets here...
David Miller - Jan. 17, 2012, 5:28 p.m.
From: Al Viro <viro@ZenIV.linux.org.uk>
Date: Tue, 17 Jan 2012 17:25:27 +0000

> On Tue, Jan 17, 2012 at 11:13:52AM -0600, Dimitri Sivanich wrote:
>> When the number of dentry cache hash table entries gets too high
>> (2147483648 entries), as happens by default on a 16TB system, use
>> of a signed integer in the dcache_init() initialization loop prevents
>> the dentry_hashtable from getting initialized, causing a panic in
>> __d_lookup().
>> 
>> In addition, the _hash_mask returned from alloc_large_system_hash() does
>> not support more than a 32 bit hash table size.
>> 
>> Changing the _hash_mask size returned from alloc_large_system_hash() to
>> support larger hash table sizes in the future, and changing loop counter
>> sizes appropriately.
> 
> ... and I still would like to see somebody familiar with uses of other
> hashes to comment on the desirability of such monsters.  For dcache and
> icache it's absolutely certain to be worse than useless.  We are talking
> about 4Gbuckets here...

It's wrong in networking too.
Dimitri Sivanich - Jan. 17, 2012, 5:41 p.m.
On Tue, Jan 17, 2012 at 12:22:29PM -0500, David Miller wrote:
> From: Dimitri Sivanich <sivanich@sgi.com>
> Date: Tue, 17 Jan 2012 11:13:52 -0600
> 
> > When the number of dentry cache hash table entries gets too high
> > (2147483648 entries), as happens by default on a 16TB system, use
> > of a signed integer in the dcache_init() initialization loop prevents
> > the dentry_hashtable from getting initialized, causing a panic in
> > __d_lookup().
> > 
> > In addition, the _hash_mask returned from alloc_large_system_hash() does
> > not support more than a 32 bit hash table size.
> > 
> > Changing the _hash_mask size returned from alloc_large_system_hash() to
> > support larger hash table sizes in the future, and changing loop counter
> > sizes appropriately.
> > 
> > Signed-off-by: Dimitri Sivanich <sivanich@sgi.com>
> 
> To be honest I think this is overkill.

I'm not going to flat-out disagree with you.  These would be huge hash
tables.  The thought was to make this __init code as flexible as possible.

> 
> Supporting anything larger than a 32-bit hash mask is not even close
> to being reasonable.  Nobody needs a 4GB hash table, not for anything.

Yes, at this point that is likely true.

> 
> Instead I would just make sure everything is "unsigned int" or "u32"
> and calculations use things like "((u32) 1) << shift", and enforce an
> upper bounds of 0x80000000 or similar unconditionally in the hash
> allocator itself (rather than conditionally in the networking code).

OK.  I had mentioned capping the value in alloc_large_system_hash() to
32 bits, but got no response to that proposal.  I'll create a proper
patch.

> 
> All of this "long" stuff is madness, what the heck is a long?  It's a
> non-fixed type, yet you put constants in your code (0x80000000) which
> depend upon that type's size.

Patch

Index: linux/fs/dcache.c
===================================================================
--- linux.orig/fs/dcache.c
+++ linux/fs/dcache.c
@@ -99,7 +99,7 @@  static struct kmem_cache *dentry_cache _
 #define D_HASHBITS     d_hash_shift
 #define D_HASHMASK     d_hash_mask
 
-static unsigned int d_hash_mask __read_mostly;
+static unsigned long d_hash_mask __read_mostly;
 static unsigned int d_hash_shift __read_mostly;
 
 static struct hlist_bl_head *dentry_hashtable __read_mostly;
@@ -2968,7 +2968,7 @@  __setup("dhash_entries=", set_dhash_entr
 
 static void __init dcache_init_early(void)
 {
-	int loop;
+	unsigned long loop;
 
 	/* If hashes are distributed across NUMA nodes, defer
 	 * hash allocation until vmalloc space is available.
@@ -2986,13 +2986,13 @@  static void __init dcache_init_early(voi
 					&d_hash_mask,
 					0);
 
-	for (loop = 0; loop < (1 << d_hash_shift); loop++)
+	for (loop = 0; loop < (1UL << d_hash_shift); loop++)
 		INIT_HLIST_BL_HEAD(dentry_hashtable + loop);
 }
 
 static void __init dcache_init(void)
 {
-	int loop;
+	unsigned long loop;
 
 	/* 
 	 * A constructor could be added for stable state like the lists,
@@ -3016,7 +3016,7 @@  static void __init dcache_init(void)
 					&d_hash_mask,
 					0);
 
-	for (loop = 0; loop < (1 << d_hash_shift); loop++)
+	for (loop = 0; loop < (1UL << d_hash_shift); loop++)
 		INIT_HLIST_BL_HEAD(dentry_hashtable + loop);
 }
 
Index: linux/fs/inode.c
===================================================================
--- linux.orig/fs/inode.c
+++ linux/fs/inode.c
@@ -60,7 +60,7 @@ 
  *   inode_hash_lock
  */
 
-static unsigned int i_hash_mask __read_mostly;
+static unsigned long i_hash_mask __read_mostly;
 static unsigned int i_hash_shift __read_mostly;
 static struct hlist_head *inode_hashtable __read_mostly;
 static __cacheline_aligned_in_smp DEFINE_SPINLOCK(inode_hash_lock);
@@ -1654,7 +1654,7 @@  __setup("ihash_entries=", set_ihash_entr
  */
 void __init inode_init_early(void)
 {
-	int loop;
+	unsigned long loop;
 
 	/* If hashes are distributed across NUMA nodes, defer
 	 * hash allocation until vmalloc space is available.
@@ -1672,13 +1672,13 @@  void __init inode_init_early(void)
 					&i_hash_mask,
 					0);
 
-	for (loop = 0; loop < (1 << i_hash_shift); loop++)
+	for (loop = 0; loop < (1UL << i_hash_shift); loop++)
 		INIT_HLIST_HEAD(&inode_hashtable[loop]);
 }
 
 void __init inode_init(void)
 {
-	int loop;
+	unsigned long loop;
 
 	/* inode slab cache */
 	inode_cachep = kmem_cache_create("inode_cache",
@@ -1702,7 +1702,7 @@  void __init inode_init(void)
 					&i_hash_mask,
 					0);
 
-	for (loop = 0; loop < (1 << i_hash_shift); loop++)
+	for (loop = 0; loop < (1UL << i_hash_shift); loop++)
 		INIT_HLIST_HEAD(&inode_hashtable[loop]);
 }
 
Index: linux/include/linux/bootmem.h
===================================================================
--- linux.orig/include/linux/bootmem.h
+++ linux/include/linux/bootmem.h
@@ -153,7 +153,7 @@  extern void *alloc_large_system_hash(con
 				     int scale,
 				     int flags,
 				     unsigned int *_hash_shift,
-				     unsigned int *_hash_mask,
+				     unsigned long *_hash_mask,
 				     unsigned long limit);
 
 #define HASH_EARLY	0x00000001	/* Allocating during early boot? */
Index: linux/mm/page_alloc.c
===================================================================
--- linux.orig/mm/page_alloc.c
+++ linux/mm/page_alloc.c
@@ -5219,7 +5219,7 @@  void *__init alloc_large_system_hash(con
 				     int scale,
 				     int flags,
 				     unsigned int *_hash_shift,
-				     unsigned int *_hash_mask,
+				     unsigned long *_hash_mask,
 				     unsigned long limit)
 {
 	unsigned long long max = limit;
Index: linux/net/ipv4/route.c
===================================================================
--- linux.orig/net/ipv4/route.c
+++ linux/net/ipv4/route.c
@@ -3446,6 +3446,7 @@  __setup("rhash_entries=", set_rhash_entr
 
 int __init ip_rt_init(void)
 {
+	unsigned long hash_mask;
 	int rc = 0;
 
 #ifdef CONFIG_IP_ROUTE_CLASSID
@@ -3474,8 +3475,11 @@  int __init ip_rt_init(void)
 					15 : 17,
 					0,
 					&rt_hash_log,
-					&rt_hash_mask,
-					rhash_entries ? 0 : 512 * 1024);
+					&hash_mask,
+					rhash_entries ? 0x80000000 :
+								512 * 1024);
+	/* FIXME: Above limit value (0x80000000) allows the following cast. */
+	rt_hash_mask = (unsigned int) hash_mask;
 	memset(rt_hash_table, 0, (rt_hash_mask + 1) * sizeof(struct rt_hash_bucket));
 	rt_hash_lock_init();
 
Index: linux/net/ipv4/tcp.c
===================================================================
--- linux.orig/net/ipv4/tcp.c
+++ linux/net/ipv4/tcp.c
@@ -3219,8 +3219,9 @@  __setup("thash_entries=", set_thash_entr
 void __init tcp_init(void)
 {
 	struct sk_buff *skb = NULL;
-	unsigned long limit;
-	int i, max_share, cnt;
+	unsigned long limit, hash_mask;
+	unsigned i;
+	int max_share, cnt;
 	unsigned long jiffy = jiffies;
 
 	BUILD_BUG_ON(sizeof(struct tcp_skb_cb) > sizeof(skb->cb));
@@ -3245,8 +3246,12 @@  void __init tcp_init(void)
 					13 : 15,
 					0,
 					NULL,
-					&tcp_hashinfo.ehash_mask,
-					thash_entries ? 0 : 512 * 1024);
+					&hash_mask,
+					thash_entries ? 0x80000000 :
+								512 * 1024);
+	/* FIXME: Above limit value (0x80000000) allows the following cast. */
+	tcp_hashinfo.ehash_mask = (unsigned int)hash_mask;
+
 	for (i = 0; i <= tcp_hashinfo.ehash_mask; i++) {
 		INIT_HLIST_NULLS_HEAD(&tcp_hashinfo.ehash[i].chain, i);
 		INIT_HLIST_NULLS_HEAD(&tcp_hashinfo.ehash[i].twchain, i);
Index: linux/net/ipv4/udp.c
===================================================================
--- linux.orig/net/ipv4/udp.c
+++ linux/net/ipv4/udp.c
@@ -2180,6 +2180,7 @@  __setup("uhash_entries=", set_uhash_entr
 
 void __init udp_table_init(struct udp_table *table, const char *name)
 {
+	unsigned long hash_mask;
 	unsigned int i;
 
 	if (!CONFIG_BASE_SMALL)
@@ -2189,8 +2190,10 @@  void __init udp_table_init(struct udp_ta
 			21, /* one slot per 2 MB */
 			0,
 			&table->log,
-			&table->mask,
+			&hash_mask,
 			64 * 1024);
+	/* FIXME: Above limit value (64 * 1024) allows the following cast. */
+	table->mask = (unsigned int)hash_mask;
 	/*
 	 * Make sure hash table has the minimum size
 	 */