
[v3,9/9] rhashtable: Add immediate rehash during insertion

Message ID E1Ya2kO-0004JD-Om@gondolin.me.apana.org.au
State Accepted, archived
Delegated to: David Miller

Commit Message

Herbert Xu March 23, 2015, 1:50 p.m. UTC
This patch reintroduces immediate rehash during insertion.  If
we find during insertion that the table is full or the chain
length exceeds a set limit (currently 16 but may be disabled
with insecure_elasticity) then we will force an immediate rehash.
The rehash will contain an expansion if the table utilisation
exceeds 75%.

If this rehash fails then the insertion will fail.  Otherwise the
insertion will be reattempted in the new hash table.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---

 include/linux/rhashtable.h |   42 +++++++++++++++++++++++++++----
 lib/rhashtable.c           |   60 ++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 96 insertions(+), 6 deletions(-)
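
For context, opting out of the new chain-length check is done through the
insecure_elasticity parameter at table setup.  A minimal sketch of such a
caller (the object layout, table name and setup function are illustrative
only, not taken from the patch):

#include <linux/rhashtable.h>

/* Illustrative object type with an embedded rhash_head. */
struct my_obj {
	u32			key;
	struct rhash_head	node;
};

static struct rhashtable my_table;

static const struct rhashtable_params my_params = {
	.head_offset		= offsetof(struct my_obj, node),
	.key_offset		= offsetof(struct my_obj, key),
	.key_len		= sizeof(u32),
	/* Skip the per-chain length limit, e.g. when several objects
	 * may legitimately share one key (the netfilter case discussed
	 * in the thread below).
	 */
	.insecure_elasticity	= true,
};

static int my_table_setup(void)
{
	return rhashtable_init(&my_table, &my_params);
}

Leaving insecure_elasticity at false keeps the default chain limit of 16
that the patch hard-codes in rhashtable_init().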


Comments

Thomas Graf March 23, 2015, 4:50 p.m. UTC | #1
On 03/24/15 at 12:50am, Herbert Xu wrote:
> @@ -116,6 +117,7 @@ struct rhashtable_params {
>  	unsigned int		max_size;
>  	unsigned int		min_size;
>  	u32			nulls_base;
> +	bool			insecure_elasticity;
>  	size_t			locks_mul;
>  	rht_hashfn_t		hashfn;
>  	rht_obj_hashfn_t	obj_hashfn;

First of all, love the naming of this variable ;-)

> @@ -127,6 +129,7 @@ struct rhashtable_params {
>   * @tbl: Bucket table
>   * @nelems: Number of elements in table
>   * @key_len: Key length for hashfn
> + * @elasticity: Maximum chain length before rehash
>   * @p: Configuration parameters
>   * @run_work: Deferred worker to expand/shrink asynchronously
>   * @mutex: Mutex to protect current/future table swapping

2nd: Seems like you rely on an underflow to allow "disabling"
the elasticity limit. Fair enough, but it would be great to
have the limit configurable as well.

How about making elasticity a signed int, default to 16 if user
specifies 0 and require it to be set to -1 (through a define)
to actually disable the behaviour. That would avoid requiring
two variables to implement this and makes the limit configurable
at the same time.
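
Roughly, that would look like the following (an illustrative sketch only;
RHT_ELASTICITY_DISABLE and chain_len are made-up names, not from the patch):

#define RHT_ELASTICITY_DISABLE	-1

	/* rhashtable_init(): one signed field instead of bool + fixed limit */
	if (params->elasticity == RHT_ELASTICITY_DISABLE)
		ht->elasticity = RHT_ELASTICITY_DISABLE;
	else
		ht->elasticity = params->elasticity ? : 16;

	/* insert path */
	if (ht->elasticity >= 0 && ++chain_len > ht->elasticity)
		goto slow_path;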

Otherwise this looks good to me.
David Laight March 23, 2015, 4:54 p.m. UTC | #2
From: Thomas Graf
> On 03/24/15 at 12:50am, Herbert Xu wrote:
> > @@ -116,6 +117,7 @@ struct rhashtable_params {
> >  	unsigned int		max_size;
> >  	unsigned int		min_size;
> >  	u32			nulls_base;
> > +	bool			insecure_elasticity;
> >  	size_t			locks_mul;
> >  	rht_hashfn_t		hashfn;
> >  	rht_obj_hashfn_t	obj_hashfn;
> 
> First of all, love the naming of this variable ;-)
> 
> > @@ -127,6 +129,7 @@ struct rhashtable_params {
> >   * @tbl: Bucket table
> >   * @nelems: Number of elements in table
> >   * @key_len: Key length for hashfn
> > + * @elasticity: Maximum chain length before rehash
> >   * @p: Configuration parameters
> >   * @run_work: Deferred worker to expand/shrink asynchronously
> >   * @mutex: Mutex to protect current/future table swapping
> 
> 2nd: Seems like you rely on an underflow to allow "disabling"
> the elasticity limit. Fair enough, but it would be great to
> have the limit configurable as well.
> 
> How about making elasticity a signed int, default to 16 if user
> specifies 0 and require it to be set to -1 (through a define)
> to actually disable the behaviour. That would avoid requiring
> two variables to implement this and makes the limit configurable
> at the same time.

Or set the value to MAXINT to disable it.
That way you don't even need two tests in the code.
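
I.e., roughly (sketch only, not the patch code):

	/* rhashtable_init(): callers pass INT_MAX (via a define) to disable */
	ht->elasticity = params->elasticity ? : 16;

	/* insert path keeps the single countdown test */
	if (!--elasticity)
		goto slow_path;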

	David

Herbert Xu March 23, 2015, 9:44 p.m. UTC | #3
On Mon, Mar 23, 2015 at 04:50:33PM +0000, Thomas Graf wrote:
>
> 2nd: Seems like you rely on an underflow to allow "disabling"
> the elasticity limit. Fair enough, but it would be great to
> have the limit configurable as well.
> 
> How about making elasticity a signed int, default to 16 if user
> specifies 0 and require it to be set to -1 (through a define)
> to actually disable the behaviour. That would avoid requiring
> two variables to implement this and makes the limit configurable
> at the same time.

I specifically made it this way because I don't think the users
should be touching the actual limit.  This is something that you
always want to enable unless you're in the situation of netfilter
where you don't care.

If you did care then 16 is sort of intrinsic to the 32-bit hash
that we're using.  Going below doesn't make much sense because
you may run into false warnings due to double rehashes (that's
how I discovered 4 was useless, very quickly :) Remember the
rehash is only there to detect pathological cases where people
are actively attacking us.  Otherwise the 100% utilisation check
will kick in.
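
(For rough intuition: with a sound hash and the table kept near 100%
utilisation, chain lengths are approximately Poisson with mean 1, so the
probability of any single bucket growing a chain of 16 is about
e^-1/16!, on the order of 10^-14; short of an active attack, that limit
should essentially never fire.)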

Going above 16 means that you're hashing multiple objects with
the same key.  Then you'd want to disable this altogether.

The rhashtable already has too many knobs and I don't want to
add anything unnecessary to rhashtable_params.

Cheers,
Thomas Graf March 23, 2015, 9:56 p.m. UTC | #4
On 03/24/15 at 08:44am, Herbert Xu wrote:
> I specifically made it this way because I don't think the users
> should be touching the actual limit.  This is something that you
> always want to enable unless you're in the situation of netfilter
> where you don't care.
> 
> If you did care then 16 is sort of intrinsic to the 32-bit hash
> that we're using.  Going below doesn't make much sense because
> you may run into false warnings due to double rehashes (that's
> how I discovered 4 was useless, very quickly :) Remember the
> rehash is only there to detect pathological cases where people
> are actively attacking us.  Otherwise the 100% utilisation check
> will kick in.
> 
> Going above 16 means that you're hashing multiple objects with
> the same key.  Then you'd want to disable this altogether.
> 
> The rhashtable already has too many knobs and I don't want to
> add anything unnecessary to rhashtable_params.

OK. I can certainly live with this. You are certainly right that
simplicity isn't a bad move here. Thanks for being patient and
answering all the questions.

Acked-by: Thomas Graf <tgraf@suug.ch>
Herbert Xu March 23, 2015, 10:15 p.m. UTC | #5
On Mon, Mar 23, 2015 at 09:56:32PM +0000, Thomas Graf wrote:
>
> OK. I can certainly live with this. You are certainly right that
> simplicity isn't a bad move here. Thanks for being patient and
> answering all the questions.
> 
> Acked-by: Thomas Graf <tgraf@suug.ch>

Thank you for your review and critiques!

Patch

diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h
index 97fa904..57af1e9 100644
--- a/include/linux/rhashtable.h
+++ b/include/linux/rhashtable.h
@@ -103,6 +103,7 @@  struct rhashtable;
  * @max_size: Maximum size while expanding
  * @min_size: Minimum size while shrinking
  * @nulls_base: Base value to generate nulls marker
+ * @insecure_elasticity: Set to true to disable chain length checks
  * @locks_mul: Number of bucket locks to allocate per cpu (default: 128)
  * @hashfn: Hash function (default: jhash2 if !(key_len % 4), or jhash)
  * @obj_hashfn: Function to hash object
@@ -116,6 +117,7 @@  struct rhashtable_params {
 	unsigned int		max_size;
 	unsigned int		min_size;
 	u32			nulls_base;
+	bool			insecure_elasticity;
 	size_t			locks_mul;
 	rht_hashfn_t		hashfn;
 	rht_obj_hashfn_t	obj_hashfn;
@@ -127,6 +129,7 @@  struct rhashtable_params {
  * @tbl: Bucket table
  * @nelems: Number of elements in table
  * @key_len: Key length for hashfn
+ * @elasticity: Maximum chain length before rehash
  * @p: Configuration parameters
  * @run_work: Deferred worker to expand/shrink asynchronously
  * @mutex: Mutex to protect current/future table swapping
@@ -137,6 +140,7 @@  struct rhashtable {
 	atomic_t			nelems;
 	bool                            being_destroyed;
 	unsigned int			key_len;
+	unsigned int			elasticity;
 	struct rhashtable_params	p;
 	struct work_struct		run_work;
 	struct mutex                    mutex;
@@ -266,6 +270,17 @@  static inline bool rht_shrink_below_30(const struct rhashtable *ht,
 	       tbl->size > ht->p.min_size;
 }
 
+/**
+ * rht_grow_above_100 - returns true if nelems > table-size
+ * @ht:		hash table
+ * @tbl:	current table
+ */
+static inline bool rht_grow_above_100(const struct rhashtable *ht,
+				      const struct bucket_table *tbl)
+{
+	return atomic_read(&ht->nelems) > tbl->size;
+}
+
 /* The bucket lock is selected based on the hash and protects mutations
  * on a group of hash buckets.
  *
@@ -307,6 +322,7 @@  int rhashtable_init(struct rhashtable *ht,
 int rhashtable_insert_slow(struct rhashtable *ht, const void *key,
 			   struct rhash_head *obj,
 			   struct bucket_table *old_tbl);
+int rhashtable_insert_rehash(struct rhashtable *ht);
 
 int rhashtable_walk_init(struct rhashtable *ht, struct rhashtable_iter *iter);
 void rhashtable_walk_exit(struct rhashtable_iter *iter);
@@ -529,12 +545,14 @@  static inline int __rhashtable_insert_fast(
 		.ht = ht,
 		.key = key,
 	};
-	int err = -EEXIST;
 	struct bucket_table *tbl, *new_tbl;
 	struct rhash_head *head;
 	spinlock_t *lock;
+	unsigned elasticity;
 	unsigned hash;
+	int err;
 
+restart:
 	rcu_read_lock();
 
 	tbl = rht_dereference_rcu(ht->tbl, ht);
@@ -557,20 +575,34 @@  static inline int __rhashtable_insert_fast(
 	new_tbl = rht_dereference_rcu(tbl->future_tbl, ht);
 	if (unlikely(new_tbl)) {
 		err = rhashtable_insert_slow(ht, key, obj, new_tbl);
+		if (err == -EAGAIN)
+			goto slow_path;
 		goto out;
 	}
 
-	if (!key)
-		goto skip_lookup;
+	if (unlikely(rht_grow_above_100(ht, tbl))) {
+slow_path:
+		spin_unlock_bh(lock);
+		rcu_read_unlock();
+		err = rhashtable_insert_rehash(ht);
+		if (err)
+			return err;
+
+		goto restart;
+	}
 
+	err = -EEXIST;
+	elasticity = ht->elasticity;
 	rht_for_each(head, tbl, hash) {
-		if (unlikely(!(params.obj_cmpfn ?
+		if (key &&
+		    unlikely(!(params.obj_cmpfn ?
 			       params.obj_cmpfn(&arg, rht_obj(ht, head)) :
 			       rhashtable_compare(&arg, rht_obj(ht, head)))))
 			goto out;
+		if (!--elasticity)
+			goto slow_path;
 	}
 
-skip_lookup:
 	err = 0;
 
 	head = rht_dereference_bucket(tbl->buckets[hash], tbl, hash);
diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index 220a11a..7686c1e 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -375,21 +375,76 @@  unlock:
 		schedule_work(&ht->run_work);
 }
 
+static bool rhashtable_check_elasticity(struct rhashtable *ht,
+					struct bucket_table *tbl,
+					unsigned hash)
+{
+	unsigned elasticity = ht->elasticity;
+	struct rhash_head *head;
+
+	rht_for_each(head, tbl, hash)
+		if (!--elasticity)
+			return true;
+
+	return false;
+}
+
+int rhashtable_insert_rehash(struct rhashtable *ht)
+{
+	struct bucket_table *old_tbl;
+	struct bucket_table *new_tbl;
+	struct bucket_table *tbl;
+	unsigned int size;
+	int err;
+
+	old_tbl = rht_dereference_rcu(ht->tbl, ht);
+	tbl = rhashtable_last_table(ht, old_tbl);
+
+	size = tbl->size;
+
+	if (rht_grow_above_75(ht, tbl))
+		size *= 2;
+	/* More than two rehashes (not resizes) detected. */
+	else if (WARN_ON(old_tbl != tbl && old_tbl->size == size))
+		return -EBUSY;
+
+	new_tbl = bucket_table_alloc(ht, size, GFP_ATOMIC);
+	if (new_tbl == NULL)
+		return -ENOMEM;
+
+	err = rhashtable_rehash_attach(ht, tbl, new_tbl);
+	if (err) {
+		bucket_table_free(new_tbl);
+		if (err == -EEXIST)
+			err = 0;
+	} else
+		schedule_work(&ht->run_work);
+
+	return err;
+}
+EXPORT_SYMBOL_GPL(rhashtable_insert_rehash);
+
 int rhashtable_insert_slow(struct rhashtable *ht, const void *key,
 			   struct rhash_head *obj,
 			   struct bucket_table *tbl)
 {
 	struct rhash_head *head;
 	unsigned hash;
-	int err = -EEXIST;
+	int err;
 
 	tbl = rhashtable_last_table(ht, tbl);
 	hash = head_hashfn(ht, tbl, obj);
 	spin_lock_nested(rht_bucket_lock(tbl, hash), SINGLE_DEPTH_NESTING);
 
+	err = -EEXIST;
 	if (key && rhashtable_lookup_fast(ht, key, ht->p))
 		goto exit;
 
+	err = -EAGAIN;
+	if (rhashtable_check_elasticity(ht, tbl, hash) ||
+	    rht_grow_above_100(ht, tbl))
+		goto exit;
+
 	err = 0;
 
 	head = rht_dereference_bucket(tbl->buckets[hash], tbl, hash);
@@ -678,6 +733,9 @@  int rhashtable_init(struct rhashtable *ht,
 
 	ht->p.min_size = max(ht->p.min_size, HASH_MIN_SIZE);
 
+	if (!params->insecure_elasticity)
+		ht->elasticity = 16;
+
 	if (params->locks_mul)
 		ht->p.locks_mul = roundup_pow_of_two(params->locks_mul);
 	else