
[v1,9/10] rhashtable: Allow GFP_ATOMIC bucket table allocation

Message ID E1YZahk-0000jW-Ry@gondolin.me.apana.org.au
State Superseded, archived
Delegated to: David Miller

Commit Message

Herbert Xu March 22, 2015, 7:53 a.m. UTC
This patch adds the ability to allocate the bucket table with GFP_ATOMIC
instead of GFP_KERNEL.  This is needed when we perform an immediate
rehash during insertion.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 lib/rhashtable.c |   24 ++++++++++++++----------
 1 file changed, 14 insertions(+), 10 deletions(-)
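
For context, a hedged sketch of how a non-sleeping caller might use the new
gfp parameter; the exact call site (an immediate rehash on the insertion
path) is an assumption based on the commit message, not code from this patch:

	/* Hypothetical caller -- not part of this patch.  An insertion
	 * path that must rehash immediately cannot sleep, so it would
	 * request an atomic allocation and simply give up on failure.
	 */
	new_tbl = bucket_table_alloc(ht, old_tbl->size * 2, GFP_ATOMIC);
	if (new_tbl == NULL)
		return -ENOMEM;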


Comments

Herbert Xu March 22, 2015, 8:02 a.m. UTC | #1
On Sun, Mar 22, 2015 at 06:53:52PM +1100, Herbert Xu wrote:
...
>  	size = sizeof(*tbl) + nbuckets * sizeof(tbl->buckets[0]);
> -	if (size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER))
> -		tbl = kzalloc(size, GFP_KERNEL | __GFP_NOWARN | __GFP_NORETRY);
> +	if (size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER) ||
> +	    gfp != GFP_KERNEL)
> +		tbl = kzalloc(size, gfp | __GFP_NOWARN | __GFP_NORETRY);
>  	if (tbl == NULL)
>  		tbl = vzalloc(size);

Oops, this will still fall through to vzalloc when the kzalloc fails,
even for GFP_ATOMIC.  I will send a v2.
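
A minimal sketch of the fix implied here, assuming v2 simply guards the
vzalloc() fallback so it is only attempted when sleeping is allowed:

	size = sizeof(*tbl) + nbuckets * sizeof(tbl->buckets[0]);
	if (size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER) ||
	    gfp != GFP_KERNEL)
		tbl = kzalloc(size, gfp | __GFP_NOWARN | __GFP_NORETRY);
	if (tbl == NULL && gfp == GFP_KERNEL)	/* vzalloc() may sleep */
		tbl = vzalloc(size);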
David Laight March 23, 2015, 12:53 p.m. UTC | #2
From: Herbert Xu
> This patch adds the ability to allocate the bucket table with GFP_ATOMIC
> instead of GFP_KERNEL.  This is needed when we perform an immediate
> rehash during insertion.
> 
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
> ---
> 
>  lib/rhashtable.c |   24 ++++++++++++++----------
>  1 file changed, 14 insertions(+), 10 deletions(-)
> 
> diff --git a/lib/rhashtable.c b/lib/rhashtable.c
> index c284099..59078ed 100644
> --- a/lib/rhashtable.c
> +++ b/lib/rhashtable.c
> @@ -58,7 +58,8 @@ EXPORT_SYMBOL_GPL(lockdep_rht_bucket_is_held);
>  #endif
> 
> 
> -static int alloc_bucket_locks(struct rhashtable *ht, struct bucket_table *tbl)
> +static int alloc_bucket_locks(struct rhashtable *ht, struct bucket_table *tbl,
> +			      gfp_t gfp)
>  {
>  	unsigned int i, size;
>  #if defined(CONFIG_PROVE_LOCKING)
> @@ -75,12 +76,13 @@ static int alloc_bucket_locks(struct rhashtable *ht, struct bucket_table *tbl)
> 
>  	if (sizeof(spinlock_t) != 0) {
>  #ifdef CONFIG_NUMA
> -		if (size * sizeof(spinlock_t) > PAGE_SIZE)
> +		if (size * sizeof(spinlock_t) > PAGE_SIZE &&
> +		    gfp == GFP_KERNEL)
>  			tbl->locks = vmalloc(size * sizeof(spinlock_t));
>  		else
>  #endif
>  		tbl->locks = kmalloc_array(size, sizeof(spinlock_t),
> -					   GFP_KERNEL);
> +					   gfp);
>  		if (!tbl->locks)
>  			return -ENOMEM;
>  		for (i = 0; i < size; i++)
...

If the lock array can't be allocated, then it is probably best to use
a lock array that is half the size rather than failing the expand.

I'm not sure your current version would allow the old lock array to be used.
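
A rough sketch of that fallback idea, with hypothetical names (this is not
code from this series): keep halving the requested lock-array size until an
allocation succeeds, instead of failing the expand outright:

	/* Hypothetical helper illustrating the suggestion above.
	 * 'size' is assumed to be a power of two, as in alloc_bucket_locks().
	 */
	static int alloc_bucket_locks_fallback(struct bucket_table *tbl,
					       unsigned int size, gfp_t gfp)
	{
		unsigned int i;

		for (; size; size >>= 1) {
			tbl->locks = kmalloc_array(size, sizeof(spinlock_t),
						   gfp | __GFP_NOWARN);
			if (tbl->locks)
				break;
		}
		if (!tbl->locks)
			return -ENOMEM;

		for (i = 0; i < size; i++)
			spin_lock_init(&tbl->locks[i]);
		tbl->locks_mask = size - 1;
		return 0;
	}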

Does Linux have any architectures where someone has decided to make a
spinlock consume an entire cache line rather than just a single word?
If so, then the lock array can be gigantic.

Given the lock is only used for insert and delete, I'm also not at
all clear why you allocate 128 locks per cpu for very large tables.
With the locks in their own array I don't think there can be 'false
sharing'; the worst that can happen is two cpus spinning on locks
in the same cache line.

It isn't as though the locks are likely to be held for any length
of time either.

	David

Herbert Xu March 24, 2015, 3:09 a.m. UTC | #3
On Mon, Mar 23, 2015 at 12:53:11PM +0000, David Laight wrote:
>
> Given the lock is only used for insert and delete, I'm also not at
> all clear why you allocate 128 locks per cpu for very large tables.
> With the locks in their own array I don't think there can be 'false
> sharing'; the worst that can happen is two cpus spinning on locks
> in the same cache line.

Personally I'm totally against bucket locks.  If you have a
scalability problem you really need to solve it at a higher
level, e.g., multiqueue transmission in networking.  Bucket
locks are simply kicking the can down the road; it'll come back
to bite you sooner or later in terms of scalability.

So no I'm not going to waste my time fixing up something that
in my opinion shouldn't even exist :)

Cheers,
Eric Dumazet March 24, 2015, 4:24 a.m. UTC | #4
On Tue, 2015-03-24 at 14:09 +1100, Herbert Xu wrote:
> On Mon, Mar 23, 2015 at 12:53:11PM +0000, David Laight wrote:
> >
> > Given the lock is only used for insert and delete, I'm also not at
> > all clear why you allocate 128 locks per cpu for very large tables.
> > With the locks in their own array I don't think there can be 'false
> > sharing'; the worst that can happen is two cpus spinning on locks
> > in the same cache line.
> 
> Personally I'm totally against bucket locks.  If you have a
> scalability problem you really need to solve it at a higher
> level, e.g., multiqueue transmission in networking.  Bucket
> locks are simply kicking the can down the road; it'll come back
> to bite you sooner or later in terms of scalability.


Well, keep in mind a lock can be very big with LOCKDEP.

One lock per bucket is totally overkill.

A hash array of locks is a good compromise.

128 locks per cpu is also a good compromise, because one cache line can
hold 16 locks.
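
The arithmetic behind those numbers, as a quick standalone check (assuming a
4-byte spinlock_t with lock debugging off and 64-byte cache lines; with
LOCKDEP/PROVE_LOCKING a spinlock is many times larger, which is the point
about LOCKDEP above):

	#include <stdio.h>

	int main(void)
	{
		unsigned int lock_bytes = 4;	/* sizeof(spinlock_t), no debug */
		unsigned int line_bytes = 64;	/* typical cache line */
		unsigned int locks_per_cpu = 128;

		/* 64 / 4 = 16 locks share one cache line */
		printf("locks per cache line: %u\n", line_bytes / lock_bytes);
		/* 128 * 4 / 64 = 8 cache lines of locks per CPU */
		printf("cache lines per cpu:  %u\n",
		       locks_per_cpu * lock_bytes / line_bytes);
		return 0;
	}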

The number of locks has nothing to do with the number of buckets,
unless you have unlimited memory, can afford one lock per bucket,
and don't care about memory thrashing when dumping the whole hash table.

I thought this kind of solution was well understood among network
developers.



Patch

diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index c284099..59078ed 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -58,7 +58,8 @@  EXPORT_SYMBOL_GPL(lockdep_rht_bucket_is_held);
 #endif
 
 
-static int alloc_bucket_locks(struct rhashtable *ht, struct bucket_table *tbl)
+static int alloc_bucket_locks(struct rhashtable *ht, struct bucket_table *tbl,
+			      gfp_t gfp)
 {
 	unsigned int i, size;
 #if defined(CONFIG_PROVE_LOCKING)
@@ -75,12 +76,13 @@  static int alloc_bucket_locks(struct rhashtable *ht, struct bucket_table *tbl)
 
 	if (sizeof(spinlock_t) != 0) {
 #ifdef CONFIG_NUMA
-		if (size * sizeof(spinlock_t) > PAGE_SIZE)
+		if (size * sizeof(spinlock_t) > PAGE_SIZE &&
+		    gfp == GFP_KERNEL)
 			tbl->locks = vmalloc(size * sizeof(spinlock_t));
 		else
 #endif
 		tbl->locks = kmalloc_array(size, sizeof(spinlock_t),
-					   GFP_KERNEL);
+					   gfp);
 		if (!tbl->locks)
 			return -ENOMEM;
 		for (i = 0; i < size; i++)
@@ -105,15 +107,17 @@  static void bucket_table_free_rcu(struct rcu_head *head)
 }
 
 static struct bucket_table *bucket_table_alloc(struct rhashtable *ht,
-					       size_t nbuckets)
+					       size_t nbuckets,
+					       gfp_t gfp)
 {
 	struct bucket_table *tbl = NULL;
 	size_t size;
 	int i;
 
 	size = sizeof(*tbl) + nbuckets * sizeof(tbl->buckets[0]);
-	if (size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER))
-		tbl = kzalloc(size, GFP_KERNEL | __GFP_NOWARN | __GFP_NORETRY);
+	if (size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER) ||
+	    gfp != GFP_KERNEL)
+		tbl = kzalloc(size, gfp | __GFP_NOWARN | __GFP_NORETRY);
 	if (tbl == NULL)
 		tbl = vzalloc(size);
 	if (tbl == NULL)
@@ -121,7 +125,7 @@  static struct bucket_table *bucket_table_alloc(struct rhashtable *ht,
 
 	tbl->size = nbuckets;
 
-	if (alloc_bucket_locks(ht, tbl) < 0) {
+	if (alloc_bucket_locks(ht, tbl, gfp) < 0) {
 		bucket_table_free(tbl);
 		return NULL;
 	}
@@ -273,7 +277,7 @@  static int rhashtable_expand(struct rhashtable *ht)
 
 	old_tbl = rhashtable_last_table(ht, old_tbl);
 
-	new_tbl = bucket_table_alloc(ht, old_tbl->size * 2);
+	new_tbl = bucket_table_alloc(ht, old_tbl->size * 2, GFP_KERNEL);
 	if (new_tbl == NULL)
 		return -ENOMEM;
 
@@ -304,7 +308,7 @@  int rhashtable_shrink(struct rhashtable *ht)
 	if (size < ht->p.min_size)
 		size = ht->p.min_size;
 
-	new_tbl = bucket_table_alloc(ht, size);
+	new_tbl = bucket_table_alloc(ht, size, GFP_KERNEL);
 	if (new_tbl == NULL)
 		return -ENOMEM;
 
@@ -678,7 +682,7 @@  int rhashtable_init(struct rhashtable *ht,
 		}
 	}
 
-	tbl = bucket_table_alloc(ht, size);
+	tbl = bucket_table_alloc(ht, size, GFP_KERNEL);
 	if (tbl == NULL)
 		return -ENOMEM;