diff mbox

[v2,net-next,2/3] rhashtable: Add a function for in order insertion and lookup in buckets

Message ID 1436917549-3666965-3-git-send-email-tom@herbertland.com
State Changes Requested, archived
Delegated to: David Miller
Headers show

Commit Message

Tom Herbert July 14, 2015, 11:45 p.m. UTC
The obj_orderfn function may be specified in the parameters for a
rhashtable. When inserting an element, this function is used to order
objects in a bucket list (greatest to least ordering value). This
allows entries to have wildcard fields, where entries with more
specific match information are placed first in the bucket. When a
lookup is done, the first match found will be the most specific match.

In order to maintain ordering guarantees during rehash, the
rhashtable_lookup_ordered_cmpfn was added. This function will check
future tables for matches that would have a greater insertion order
than a match found in an older table.

Signed-off-by: Tom Herbert <tom@herbertland.com>
---
 include/linux/rhashtable.h | 108 +++++++++++++++++++++++++++++++++++++++++++--
 lib/rhashtable.c           |  20 ++++-----
 2 files changed, 115 insertions(+), 13 deletions(-)

Comments

Herbert Xu July 15, 2015, 5:54 a.m. UTC | #1
On Tue, Jul 14, 2015 at 04:45:48PM -0700, Tom Herbert wrote:
> The obj_orderfn function may be specified in the parameters for a
> rhashtable. When inserting an element, this function is used to order
> objects in a bucket list (greatest to least ordering value). This
> allows entries to have wildcard fields, where entries with more
> specific match information are placed first in the bucket. When a
> lookup is done, the first match found will be the most specific match.
> 
> In order to maintain ordering guarantees during rehash, the
> rhashtable_lookup_ordered_cmpfn was added. This function will check
> future tables for matches that would have a greater insertion order
> than a match found in an older table.
> 
> Signed-off-by: Tom Herbert <tom@herbertland.com>

There is another problem with exposing rhashtable directly to these
duplicate entries.  It breaks the logic on when to resize.  In the
worst case on a server with a single port you can end up with a
large hash table where everything is stored in a single chain.

Granted you can work around this by teaching rhashtable to count
identical entries as a single entry.  But I really think it's much
easier to just have this logic sit on top of rhashtable instead of
inside it.

The memory cost is merely 8 bytes per local port, is it really too
much?

Cheers,
Tom Herbert July 15, 2015, 7:46 p.m. UTC | #2
On Tue, Jul 14, 2015 at 10:54 PM, Herbert Xu
<herbert@gondor.apana.org.au> wrote:
> On Tue, Jul 14, 2015 at 04:45:48PM -0700, Tom Herbert wrote:
>> The obj_orderfn function may be specified in the parameters for a
>> rhashtable. When inserting an element, this function is used to order
>> objects in a bucket list (greatest to least ordering value). This
>> allows entries to have wildcard fields, where entries with more
>> specific match information are placed first in the bucket. When a
>> lookup is done, the first match found will be the most specific match.
>>
>> In order to maintain ordering guarantees during rehash, the
>> rhashtable_lookup_ordered_cmpfn was added. This function will check
>> future tables for matches that would have a greater insertion order
>> than a match found in an older table.
>>
>> Signed-off-by: Tom Herbert <tom@herbertland.com>
>
> There is another problem with exposing rhashtable directly to these
> duplicate entries.  It breaks the logic on when to resize.  In the
> worst case on a server with a single port you can end up with a
> large hash table where everything is stored in a single chain.
>
> Granted you can work around this by teaching rhashtable to count
> identical entries as a single entry.  But I really think it's much
> easier to just have this logic sit on top of rhashtable instead of
> inside it.
>
> The memory cost is merely 8 bytes per local port, is it really too
> much?
>
Okay, it looks like there is already an additional hlist_node in
skc_common that can be used for a secondary hash. It's conceivable
this can be generalized and used in the TCP listeners also in
combination with rhashtable.

Tom

> Cheers,
> --
> Email: Herbert Xu <herbert@gondor.apana.org.au>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Thomas Graf July 17, 2015, 12:11 p.m. UTC | #3
On 07/15/15 at 12:46pm, Tom Herbert wrote:
> On Tue, Jul 14, 2015 at 10:54 PM, Herbert Xu
> > The memory cost is merely 8 bytes per local port, is it really too
> > much?
> >
> Okay, it looks like there is already an additional hlist_node in
> skc_common that can be used for a secondary hash. It's conceivable
> this can be generalized and used in the TCP listeners also in
> combination with rhashtable.

Are you dropping this series entirely then?

Patch

diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h
index 78a4e9b..651b5226 100644
--- a/include/linux/rhashtable.h
+++ b/include/linux/rhashtable.h
@@ -92,6 +92,7 @@  typedef u32 (*rht_hashfn_t)(const void *data, u32 len, u32 seed);
 typedef u32 (*rht_obj_hashfn_t)(const void *data, u32 len, u32 seed);
 typedef int (*rht_obj_cmpfn_t)(struct rhashtable_compare_arg *arg,
 			       const void *obj);
+typedef int (*rht_obj_orderfn_t)(const void *obj);
 
 struct rhashtable;
 
@@ -111,6 +112,7 @@  struct rhashtable;
  * @hashfn: Hash function (default: jhash2 if !(key_len % 4), or jhash)
  * @obj_hashfn: Function to hash object
  * @obj_cmpfn: Function to compare key with object
+ * @obj_orderfn: Function to order an object for in-order insertion
  */
 struct rhashtable_params {
 	size_t			nelem_hint;
@@ -127,6 +129,7 @@  struct rhashtable_params {
 	rht_hashfn_t		hashfn;
 	rht_obj_hashfn_t	obj_hashfn;
 	rht_obj_cmpfn_t		obj_cmpfn;
+	rht_obj_orderfn_t	obj_orderfn;
 };
 
 /**
@@ -570,6 +573,104 @@  static inline void *rhashtable_lookup_fast(
 					    params.obj_cmpfn);
 }
 
+/**
+ * rhashtable_lookup_ordered_cmpfn - search table that uses ordered insertion
+ * @ht:		hash table
+ * @key:	the pointer to the key
+ * @params:	hash table parameters
+ * @obj_cmpfn:	compare function
+ *
+ * Computes the hash value for the key and traverses the bucket chain looking
+ * for an entry that matches the key. The bucket chains are assumed to be
+ * ordered. When a match is found, it is recorded as a candidate. The
+ * search then proceeds to future tables (if a rehash is in progress) to
+ * check whether there is a match with greater ordering precedence.
+ *
+ * Returns the first entry on which the compare function returned true,
+ * adhering to the ordering guarantee.
+ */
+static inline void *rhashtable_lookup_ordered_cmpfn(
+	struct rhashtable *ht, const void *key,
+	const struct rhashtable_params params,
+	rht_obj_cmpfn_t obj_cmpfn)
+{
+	struct rhashtable_compare_arg arg = {
+		.ht = ht,
+		.key = key,
+	};
+	const struct bucket_table *tbl;
+	struct rhash_head *he, *result = NULL;
+	unsigned int hash;
+
+	rcu_read_lock();
+
+	tbl = rht_dereference_rcu(ht->tbl, ht);
+restart:
+	hash = rht_key_hashfn(ht, tbl, key, params);
+	rht_for_each_rcu(he, tbl, hash) {
+		if (obj_cmpfn ?
+		    obj_cmpfn(&arg, rht_obj(ht, he)) :
+		    rhashtable_compare(&arg, rht_obj(ht, he)))
+			continue;
+		if (unlikely(result)) {
+			if (params.obj_orderfn(rht_obj(ht, he)) > params.obj_orderfn(rht_obj(ht, result)))
+				result = he;
+		} else {
+			result = he;
+		}
+		break;
+	}
+
+	/* Ensure we see any new tables. */
+	smp_rmb();
+
+	tbl = rht_dereference_rcu(tbl->future_tbl, ht);
+	if (unlikely(tbl))
+		goto restart;
+	rcu_read_unlock();
+
+	return result ? rht_obj(ht, result) : NULL;
+}
+
+static inline void *rhashtable_lookup_ordered(
+	struct rhashtable *ht, const void *key,
+	const struct rhashtable_params params)
+{
+	return rhashtable_lookup_ordered_cmpfn(ht, key, params,
+					      params.obj_cmpfn);
+}
+
+struct rht_insert_pos {
+	struct rhash_head __rcu *head;
+	struct rhash_head __rcu **pos;
+};
+
+static inline void rht_insert_pos(struct rhashtable *ht,
+				  struct rhash_head *obj,
+				  struct bucket_table *tbl,
+				  unsigned int hash,
+				  struct rht_insert_pos *ipos)
+{
+	struct rhash_head __rcu *head, **pos;
+
+	pos = &tbl->buckets[hash];
+
+	if (ht->p.obj_orderfn) {
+		int obj_order = ht->p.obj_orderfn(rht_obj(ht, obj));
+
+		rht_for_each_rcu(head, tbl, hash) {
+			if (ht->p.obj_orderfn(rht_obj(ht, head)) <= obj_order)
+				break;
+			pos = &head->next;
+		}
+	} else {
+		head = rht_dereference_bucket(tbl->buckets[hash], tbl, hash);
+	}
+
+	ipos->head = head;
+	ipos->pos = pos;
+}
+
 /* Internal function, please use rhashtable_insert_fast() instead */
 static inline int __rhashtable_insert_fast(
 	struct rhashtable *ht, const void *key, struct rhash_head *obj,
@@ -581,6 +682,7 @@  static inline int __rhashtable_insert_fast(
 	};
 	struct bucket_table *tbl, *new_tbl;
 	struct rhash_head *head;
+	struct rht_insert_pos ipos;
 	spinlock_t *lock;
 	unsigned int elasticity;
 	unsigned int hash;
@@ -643,11 +745,11 @@  slow_path:
 
 	err = 0;
 
-	head = rht_dereference_bucket(tbl->buckets[hash], tbl, hash);
+	rht_insert_pos(ht, obj, tbl, hash, &ipos);
 
-	RCU_INIT_POINTER(obj->next, head);
+	RCU_INIT_POINTER(obj->next, ipos.head);
 
-	rcu_assign_pointer(tbl->buckets[hash], obj);
+	rcu_assign_pointer(*ipos.pos, obj);
 
 	atomic_inc(&ht->nelems);
 	if (rht_grow_above_75(ht, tbl))
diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index cc0c697..0e09524 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -162,9 +162,10 @@  static int rhashtable_rehash_one(struct rhashtable *ht, unsigned int old_hash)
 		rht_dereference_rcu(old_tbl->future_tbl, ht));
 	struct rhash_head __rcu **pprev = &old_tbl->buckets[old_hash];
 	int err = -ENOENT;
-	struct rhash_head *head, *next, *entry;
+	struct rhash_head *next, *entry;
 	spinlock_t *new_bucket_lock;
 	unsigned int new_hash;
+	struct rht_insert_pos ipos;
 
 	rht_for_each(entry, old_tbl, old_hash) {
 		err = 0;
@@ -184,15 +185,14 @@  static int rhashtable_rehash_one(struct rhashtable *ht, unsigned int old_hash)
 	new_bucket_lock = rht_bucket_lock(new_tbl, new_hash);
 
 	spin_lock_nested(new_bucket_lock, SINGLE_DEPTH_NESTING);
-	head = rht_dereference_bucket(new_tbl->buckets[new_hash],
-				      new_tbl, new_hash);
+	rht_insert_pos(ht, entry, new_tbl, new_hash, &ipos);
 
-	if (rht_is_a_nulls(head))
+	if (rht_is_a_nulls(ipos.head))
 		INIT_RHT_NULLS_HEAD(entry->next, ht, new_hash);
 	else
-		RCU_INIT_POINTER(entry->next, head);
+		RCU_INIT_POINTER(entry->next, ipos.head);
 
-	rcu_assign_pointer(new_tbl->buckets[new_hash], entry);
+	rcu_assign_pointer(*ipos.pos, entry);
 	spin_unlock(new_bucket_lock);
 
 	rcu_assign_pointer(*pprev, next);
@@ -436,7 +436,7 @@  int rhashtable_insert_slow(struct rhashtable *ht, const void *key,
 			   struct rhash_head *obj,
 			   struct bucket_table *tbl)
 {
-	struct rhash_head *head;
+	struct rht_insert_pos ipos;
 	unsigned int hash;
 	int err;
 
@@ -459,11 +459,11 @@  int rhashtable_insert_slow(struct rhashtable *ht, const void *key,
 
 	err = 0;
 
-	head = rht_dereference_bucket(tbl->buckets[hash], tbl, hash);
+	rht_insert_pos(ht, obj, tbl, hash, &ipos);
 
-	RCU_INIT_POINTER(obj->next, head);
+	RCU_INIT_POINTER(obj->next, ipos.head);
 
-	rcu_assign_pointer(tbl->buckets[hash], obj);
+	rcu_assign_pointer(*ipos.pos, obj);
 
 	atomic_inc(&ht->nelems);