diff mbox

[v2,01/16] hashtable: introduce a small and naive hashtable

Message ID 1345337550-24304-2-git-send-email-levinsasha928@gmail.com
State Not Applicable, archived
Delegated to: David Miller
Headers show

Commit Message

Sasha Levin Aug. 19, 2012, 12:52 a.m. UTC
This hashtable implementation is using hlist buckets to provide a simple
hashtable to prevent it from getting reimplemented all over the kernel.

Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
---
 include/linux/hashtable.h |  284 +++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 284 insertions(+), 0 deletions(-)
 create mode 100644 include/linux/hashtable.h

Comments

Mathieu Desnoyers Aug. 19, 2012, 1:16 p.m. UTC | #1
* Sasha Levin (levinsasha928@gmail.com) wrote:
> This hashtable implementation is using hlist buckets to provide a simple
> hashtable to prevent it from getting reimplemented all over the kernel.
> 
> Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
> ---
>  include/linux/hashtable.h |  284 +++++++++++++++++++++++++++++++++++++++++++++
[...]

Hi Sasha,

There are still a few API naming nits that I'd like to discuss:

> +
> +/**
> + * hash_for_each_size - iterate over a hashtable
> + * @name: hashtable to iterate
> + * @bits: bit count of hashing function of the hashtable
> + * @bkt: integer to use as bucket loop cursor
> + * @node: the &struct list_head to use as a loop cursor for each bucket
> + * @obj: the type * to use as a loop cursor for each bucket
> + * @member: the name of the hlist_node within the struct
> + */
> +#define hash_for_each_size(name, bits, bkt, node, obj, member)			\

What is the meaning of "for each size" ?

By looking at the implementation, I see that it takes an extra "bits"
argument to specify the key width.

But in the other patches of this patchset, I cannot find a single user
of the "*_size" API. If you do not typically expect users to specify
this parameter by hand (thanks to use of HASH_BITS(name) in for_each
functions that do not take the bits parameter), I would recommend to
only expose hash_for_each() and similar defines, but not the *_size
variants.

So I recommend merging hash_for_each_size into hash_for_each (and
doing similarly for other *_size variants). On the plus side, it will
cut down the number of for_each macros from 12 down to 6, which is more
reasonable.


> +	for (bkt = 0; bkt < HASH_SIZE(bits); bkt++)				\
> +		hlist_for_each_entry(obj, node, &name[bkt], member)
> +
> +/**
> + * hash_for_each - iterate over a hashtable
> + * @name: hashtable to iterate
> + * @bkt: integer to use as bucket loop cursor
> + * @node: the &struct list_head to use as a loop cursor for each bucket
> + * @obj: the type * to use as a loop cursor for each bucket
> + * @member: the name of the hlist_node within the struct
> + */
> +#define hash_for_each(name, bkt, node, obj, member)				\
> +	hash_for_each_size(name, HASH_BITS(name), bkt, node, obj, member)
> +

[...]

> +/**
> + * hash_for_each_possible - iterate over all possible objects for a given key
> + * @name: hashtable to iterate
> + * @obj: the type * to use as a loop cursor for each bucket
> + * @bits: bit count of hashing function of the hashtable
> + * @node: the &struct list_head to use as a loop cursor for each bucket
> + * @member: the name of the hlist_node within the struct
> + * @key: the key of the objects to iterate over
> + */
> +#define hash_for_each_possible_size(name, obj, bits, node, member, key)		\
> +	hlist_for_each_entry(obj, node,	&name[hash_min(key, bits)], member)

Second point: "for_each_possible" does not express the iteration scope.
Citing WordNet: "possible adj 1: capable of happening or existing;" --
which has nothing to do with iteration on duplicate keys within a hash
table.

I would recommend to rename "possible" to "duplicate", e.g.:

  hash_for_each_duplicate()

which clearly says what is the scope of this iteration: duplicate keys.

Thanks,

Mathieu
Mathieu Desnoyers Aug. 19, 2012, 2:16 p.m. UTC | #2
* Mathieu Desnoyers (mathieu.desnoyers@efficios.com) wrote:
> * Sasha Levin (levinsasha928@gmail.com) wrote:
[...]
> > +/**
> > + * hash_for_each_possible - iterate over all possible objects for a given key
> > + * @name: hashtable to iterate
> > + * @obj: the type * to use as a loop cursor for each bucket
> > + * @bits: bit count of hashing function of the hashtable
> > + * @node: the &struct list_head to use as a loop cursor for each bucket
> > + * @member: the name of the hlist_node within the struct
> > + * @key: the key of the objects to iterate over
> > + */
> > +#define hash_for_each_possible_size(name, obj, bits, node, member, key)		\
> > +	hlist_for_each_entry(obj, node,	&name[hash_min(key, bits)], member)
> 
> Second point: "for_each_possible" does not express the iteration scope.
> Citing WordNet: "possible adj 1: capable of happening or existing;" --
> which has nothing to do with iteration on duplicate keys within a hash
> table.
> 
> I would recommend to rename "possible" to "duplicate", e.g.:
> 
>   hash_for_each_duplicate()
> 
> which clearly says what is the scope of this iteration: duplicate keys.

OK, about this part: I now see that you iterate over all objects within
the same hash chain. I guess the description "iterate over all possible
objects for a given key" is misleading: it's not all objects with a
given key, but rather all objects hashing to the same bucket.

I understand that you don't want to build knowledge of the key
comparison function in the iterator (which makes sense for a simple hash
table).

By the way, the comment "@obj: the type * to use as a loop cursor for
each bucket" is also misleading: it's a loop cursor for each entry,
since you iterate on all nodes within single bucket. Same for "@node:
the &struct list_head to use as a loop cursor for each bucket".

So with these documentation changes applied, hash_for_each_possible
starts to make more sense, because it refers to entries that can
_possibly_ be a match (or not). Other options would be
hash_chain_for_each() or hash_bucket_for_each().

Thanks,

Mathieu
Sasha Levin Aug. 19, 2012, 4:08 p.m. UTC | #3
On 08/19/2012 03:16 PM, Mathieu Desnoyers wrote:
> * Sasha Levin (levinsasha928@gmail.com) wrote:
>> This hashtable implementation is using hlist buckets to provide a simple
>> hashtable to prevent it from getting reimplemented all over the kernel.
>>
>> Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
>> ---
>>  include/linux/hashtable.h |  284 +++++++++++++++++++++++++++++++++++++++++++++
> [...]
> 
> Hi Sasha,
> 
> There are still a few API naming nits that I'd like to discuss:
> 
>> +
>> +/**
>> + * hash_for_each_size - iterate over a hashtable
>> + * @name: hashtable to iterate
>> + * @bits: bit count of hashing function of the hashtable
>> + * @bkt: integer to use as bucket loop cursor
>> + * @node: the &struct list_head to use as a loop cursor for each bucket
>> + * @obj: the type * to use as a loop cursor for each bucket
>> + * @member: the name of the hlist_node within the struct
>> + */
>> +#define hash_for_each_size(name, bits, bkt, node, obj, member)			\
> 
> What is the meaning of "for each size" ?
> 
> By looking at the implementation, I see that it takes an extra "bits"
> argument to specify the key width.
> 
> But in the other patches of this patchset, I cannot find a single user
> of the "*_size" API. If you do not typically expect users to specify
> this parameter by hand (thanks to use of HASH_BITS(name) in for_each
> functions that do not take the bits parameter), I would recommend to
> only expose hash_for_each() and similar defines, but n I'd ot the *_size
> variants.
> 
> So I recommend merging hash_for_each_size into hash_for_each (and
> doing similarly for other *_size variants). On the plus side, it will
> cut down the number of for_each macros from 12 down to 6, whiy och is more
> reasonable.

This is actually how the hashtable API was looking in the first place - without
the _size functions.

The story here is that when I've introduced the original API which used macro
magic to work out the size, one of the first comments was that there are several
dynamically allocated hashtables around the kernels, and all this macro magic
wouldn't work.

I've grepped around the code and indeed saw several places which k*alloc/vmalloc
their hashtable, so I agreed with that and happily went ahead to extend the API
to have _size functions.

When I started converting more kernel code to use this new API, I also converted
two modules which kmalloced the hashtable, but instead of using the _size API I
ended up removing the allocation completely because it was unnecessary and
wasteful. And this is why you don't see _size being used anywhere in any of the
patches.

Looking at the kernel code again, I see several places where removal of dynamic
allocation won't work (see 'new_tl_hash' in drivers/block/drbd/drbd_nl.c for
example).

So I'd rather leave it in at least until we finish converting, as I see several
places which will definitely need it.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sasha Levin Aug. 19, 2012, 4:17 p.m. UTC | #4
On 08/19/2012 04:16 PM, Mathieu Desnoyers wrote:
> * Mathieu Desnoyers (mathieu.desnoyers@efficios.com) wrote:
>> * Sasha Levin (levinsasha928@gmail.com) wrote:
> [...]
>>> +/**
>>> + * hash_for_each_possible - iterate over all possible objects for a given key
>>> + * @name: hashtable to iterate
>>> + * @obj: the type * to use as a loop cursor for each bucket
>>> + * @bits: bit count of hashing function of the hashtable
>>> + * @node: the &struct list_head to use as a loop cursor for each bucket
>>> + * @member: the name of the hlist_node within the struct
>>> + * @key: the key of the objects to iterate over
>>> + */
>>> +#define hash_for_each_possible_size(name, obj, bits, node, member, key)		\
>>> +	hlist_for_each_entry(obj, node,	&name[hash_min(key, bits)], member)
>>
>> Second point: "for_each_possible" does not express the iteration scope.
>> Citing WordNet: "possible adj 1: capable of happening or existing;" --
>> which has nothing to do with iteration on duplicate keys within a hash
>> table.
>>
>> I would recommend to rename "possible" to "duplicate", e.g.:
>>
>>   hash_for_each_duplicate()
>>
>> which clearly says what is the scope of this iteration: duplicate keys.
> 
> OK, about this part: I now see that you iterate over all objects within
> the same hash chain. I guess the description "iterate over all possible
> objects for a given key" is misleading: it's not all objects with a
> given key, but rather all objects hashing to the same bucket.
> 
> I understand that you don't want to build knowledge of the key
> comparison function in the iterator (which makes sense for a simple hash
> table).
> 
> By the way, the comment "@obj: the type * to use as a loop cursor for
> each bucket" is also misleading: it's a loop cursor for each entry,
> since you iterate on all nodes within single bucket. Same for "@node:
> the &struct list_head to use as a loop cursor for each bucket".
> 
> So with these documentation changes applied, hash_for_each_possible
> starts to make more sense, because it refers to entries that can
> _possibly_ be a match (or not). Other options would be
> hash_chain_for_each() or hash_bucket_for_each().

I'd rather avoid starting to use chain/bucket since they're not used anywhere
else (I've tried keeping internal hashing/bucketing opaque to the user).

Otherwise makes sense, I'll improve the documentation as suggested. Thanks!

> Thanks,
> 
> Mathieu
> 

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/include/linux/hashtable.h b/include/linux/hashtable.h
new file mode 100644
index 0000000..b18827a
--- /dev/null
+++ b/include/linux/hashtable.h
@@ -0,0 +1,284 @@ 
+/*
+ * Hash table implementation
+ * (C) 2012  Sasha Levin <levinsasha928@gmail.com>
+ */
+
+#ifndef _LINUX_HASHTABLE_H
+#define _LINUX_HASHTABLE_H
+
+#include <linux/list.h>
+#include <linux/types.h>
+#include <linux/kernel.h>
+#include <linux/hash.h>
+#include <linux/rculist.h>
+
+#define DEFINE_HASHTABLE(name, bits)					\
+	struct hlist_head name[HASH_SIZE(bits)];
+
+#define HASH_SIZE(bits) (1 << (bits))
+#define HASH_BITS(name) (ilog2(ARRAY_SIZE(name)))
+
+/* Use hash_32 when possible to allow for fast 32bit hashing in 64bit kernels. */
+#define hash_min(val, bits) ((sizeof(val)==4) ? hash_32((val), (bits)) : hash_long((val), (bits)))
+
+/**
+ * hash_init_size - initialize a hash table
+ * @hashtable: hashtable to be initialized
+ * @bits: bit count of hashing function
+ *
+ * Initializes a hash table with 2**bits buckets.
+ */
+static inline void hash_init_size(struct hlist_head *hashtable, int bits)
+{
+	int i;
+
+	for (i = 0; i < HASH_SIZE(bits); i++)
+		INIT_HLIST_HEAD(hashtable + i);
+}
+
+/**
+ * hash_init - initialize a hash table
+ * @hashtable: hashtable to be initialized
+ *
+ * Calculates the size of the hashtable from the given parameter, otherwise
+ * same as hash_init_size.
+ */
+#define hash_init(name) hash_init_size(name, HASH_BITS(name))
+
+/**
+ * hash_add_size - add an object to a hashtable
+ * @hashtable: hashtable to add to
+ * @bits: bit count used for hashing
+ * @node: the &struct hlist_node of the object to be added
+ * @key: the key of the object to be added
+ */
+#define hash_add_size(hashtable, bits, node, key)				\
+	hlist_add_head(node, &hashtable[hash_min(key, bits)]);
+
+/**
+ * hash_add - add an object to a hashtable
+ * @hashtable: hashtable to add to
+ * @node: the &struct hlist_node of the object to be added
+ * @key: the key of the object to be added
+ */
+#define hash_add(hashtable, node, key)						\
+	hash_add_size(hashtable, HASH_BITS(hashtable), node, key)
+
+/**
+ * hash_add_rcu_size - add an object to a rcu enabled hashtable
+ * @hashtable: hashtable to add to
+ * @bits: bit count used for hashing
+ * @node: the &struct hlist_node of the object to be added
+ * @key: the key of the object to be added
+ */
+#define hash_add_rcu_size(hashtable, bits, node, key)				\
+	hlist_add_head_rcu(node, &hashtable[hash_min(key, bits)]);
+
+/**
+ * hash_add_rcu - add an object to a rcu enabled hashtable
+ * @hashtable: hashtable to add to
+ * @node: the &struct hlist_node of the object to be added
+ * @key: the key of the object to be added
+ */
+#define hash_add_rcu(hashtable, node, key)					\
+	hash_add_rcu_size(hashtable, HASH_BITS(hashtable), node, key)
+
+/**
+ * hash_hashed - check whether an object is in any hashtable
+ * @node: the &struct hlist_node of the object to be checked
+ */
+#define hash_hashed(node) (!hlist_unhashed(node))
+
+/**
+ * hash_empty_size - check whether a hashtable is empty
+ * @hashtable: hashtable to check
+ * @bits: bit count used for hashing
+ */
+static inline bool hash_empty_size(struct hlist_head *hashtable, int bits)
+{
+	int i;
+
+	for (i = 0; i < HASH_SIZE(bits); i++)
+		if (!hlist_empty(&hashtable[i]))
+			return false;
+
+	return true;
+}
+
+/**
+ * hash_empty - check whether a hashtable is empty
+ * @hashtable: hashtable to check
+ */
+#define hash_empty(name) hash_empty_size(name, HASH_BITS(name))
+
+/**
+ * hash_del - remove an object from a hashtable
+ * @node: &struct hlist_node of the object to remove
+ */
+static inline void hash_del(struct hlist_node *node)
+{
+	hlist_del_init(node);
+}
+
+/**
+ * hash_del_rcu - remove an object from a rcu enabled hashtable
+ * @node: &struct hlist_node of the object to remove
+ */
+static inline void hash_del_rcu(struct hlist_node *node)
+{
+	hlist_del_init_rcu(node);
+}
+
+/**
+ * hash_for_each_size - iterate over a hashtable
+ * @name: hashtable to iterate
+ * @bits: bit count of hashing function of the hashtable
+ * @bkt: integer to use as bucket loop cursor
+ * @node: the &struct list_head to use as a loop cursor for each bucket
+ * @obj: the type * to use as a loop cursor for each bucket
+ * @member: the name of the hlist_node within the struct
+ */
+#define hash_for_each_size(name, bits, bkt, node, obj, member)			\
+	for (bkt = 0; bkt < HASH_SIZE(bits); bkt++)				\
+		hlist_for_each_entry(obj, node, &name[bkt], member)
+
+/**
+ * hash_for_each - iterate over a hashtable
+ * @name: hashtable to iterate
+ * @bkt: integer to use as bucket loop cursor
+ * @node: the &struct list_head to use as a loop cursor for each bucket
+ * @obj: the type * to use as a loop cursor for each bucket
+ * @member: the name of the hlist_node within the struct
+ */
+#define hash_for_each(name, bkt, node, obj, member)				\
+	hash_for_each_size(name, HASH_BITS(name), bkt, node, obj, member)
+
+/**
+ * hash_for_each_rcu_size - iterate over a rcu enabled hashtable
+ * @name: hashtable to iterate
+ * @bits: bit count of hashing function of the hashtable
+ * @bkt: integer to use as bucket loop cursor
+ * @node: the &struct list_head to use as a loop cursor for each bucket
+ * @obj: the type * to use as a loop cursor for each bucket
+ * @member: the name of the hlist_node within the struct
+ */
+#define hash_for_each_rcu_size(name, bits, bkt, node, obj, member)		\
+	for (bkt = 0; bkt < HASH_SIZE(bits); bkt++)				\
+		hlist_for_each_entry_rcu(obj, node, &name[bkt], member)
+
+/**
+ * hash_for_each_rcu - iterate over a rcu enabled hashtable
+ * @name: hashtable to iterate
+ * @bkt: integer to use as bucket loop cursor
+ * @node: the &struct list_head to use as a loop cursor for each bucket
+ * @obj: the type * to use as a loop cursor for each bucket
+ * @member: the name of the hlist_node within the struct
+ */
+#define hash_for_each_rcu(name, bkt, node, obj, member)				\
+	hash_for_each_rcu_size(name, HASH_BITS(name), bkt, node, obj, member)
+
+/**
+ * hash_for_each_safe_size - iterate over a hashtable safe against removal of
+ * hash entry
+ * @name: hashtable to iterate
+ * @bkt: integer to use as bucket loop cursor
+ * @node: the &struct list_head to use as a loop cursor for each bucket
+ * @tmp: another &struct hlist_node to use as temporary storage
+ * @obj: the type * to use as a loop cursor for each bucket
+ * @member: the name of the hlist_node within the struct
+ */
+#define hash_for_each_safe_size(name, bits, bkt, node, tmp, obj, member)	\
+	for (bkt = 0; bkt < HASH_SIZE(bits); bkt++)                     	\
+		hlist_for_each_entry_safe(obj, node, tmp, &name[bkt], member)
+
+/**
+ * hash_for_each_safe_size - iterate over a hashtable safe against removal of
+ * hash entry
+ * @name: hashtable to iterate
+ * @bkt: integer to use as bucket loop cursor
+ * @node: the &struct list_head to use as a loop cursor for each bucket
+ * @obj: the type * to use as a loop cursor for each bucket
+ * @member: the name of the hlist_node within the struct
+ */
+#define hash_for_each_safe(name, bkt, node, tmp, obj, member)			\
+	hash_for_each_safe_size(name, HASH_BITS(name), bkt, node,		\
+				tmp, obj, member)
+
+/**
+ * hash_for_each_possible - iterate over all possible objects for a given key
+ * @name: hashtable to iterate
+ * @obj: the type * to use as a loop cursor for each bucket
+ * @bits: bit count of hashing function of the hashtable
+ * @node: the &struct list_head to use as a loop cursor for each bucket
+ * @member: the name of the hlist_node within the struct
+ * @key: the key of the objects to iterate over
+ */
+#define hash_for_each_possible_size(name, obj, bits, node, member, key)		\
+	hlist_for_each_entry(obj, node,	&name[hash_min(key, bits)], member)
+
+/**
+ * hash_for_each_possible - iterate over all possible objects for a given key
+ * @name: hashtable to iterate
+ * @obj: the type * to use as a loop cursor for each bucket
+ * @bits: bit count of hashing function of the hashtable
+ * @node: the &struct list_head to use as a loop cursor for each bucket
+ * @member: the name of the hlist_node within the struct
+ * @key: the key of the objects to iterate over
+ */
+#define hash_for_each_possible(name, obj, node, member, key)			\
+	hash_for_each_possible_size(name, obj, HASH_BITS(name), node, member, key)
+
+/**
+ * hash_for_each_possible - iterate over all possible objects for a given key
+ * in a rcu enabled hashtable
+ * @name: hashtable to iterate
+ * @obj: the type * to use as a loop cursor for each bucket
+ * @bits: bit count of hashing function of the hashtable
+ * @node: the &struct list_head to use as a loop cursor for each bucket
+ * @member: the name of the hlist_node within the struct
+ * @key: the key of the objects to iterate over
+ */
+#define hash_for_each_possible_rcu_size(name, obj, bits, node, member, key)	\
+	hlist_for_each_entry_rcu(obj, node, &name[hash_min(key, bits)], member)
+
+/**
+ * hash_for_each_possible - iterate over all possible objects for a given key
+ * in a rcu enabled hashtable
+ * @name: hashtable to iterate
+ * @obj: the type * to use as a loop cursor for each bucket
+ * @node: the &struct list_head to use as a loop cursor for each bucket
+ * @member: the name of the hlist_node within the struct
+ * @key: the key of the objects to iterate over
+ */
+#define hash_for_each_possible_rcu(name, obj, node, member, key)		\
+	hash_for_each_possible_rcu_size(name, obj, HASH_BITS(name),		\
+					node, member, key)
+
+/**
+ * hash_for_each_possible - iterate over all possible objects for a given key
+ * safe against removal of hash entry
+ * @name: hashtable to iterate
+ * @obj: the type * to use as a loop cursor for each bucket
+ * @bits: bit count of hashing function of the hashtable
+ * @node: the &struct list_head to use as a loop cursor for each bucket
+ * @member: the name of the hlist_node within the struct
+ * @key: the key of the objects to iterate over
+ */
+#define hash_for_each_possible_safe_size(name, obj, bits, node, tmp, member, key)\
+	hlist_for_each_entry_safe(obj, node, tmp,				\
+		&name[hash_min(key, bits)], member)
+
+/**
+ * hash_for_each_possible - iterate over all possible objects for a given key
+ * safe against removal of hash entry
+ * @name: hashtable to iterate
+ * @obj: the type * to use as a loop cursor for each bucket
+ * @node: the &struct list_head to use as a loop cursor for each bucket
+ * @member: the name of the hlist_node within the struct
+ * @key: the key of the objects to iterate over
+ */
+#define hash_for_each_possible_safe(name, obj, node, tmp, member, key)		\
+	hash_for_each_possible_safe_size(name, obj, HASH_BITS(name),		\
+						node, tmp, member, key)
+
+#endif