diff mbox

Update jhash.h with the new version of Jenkins' hash

Message ID alpine.DEB.2.00.0902121006440.18739@blackhole.kfki.hu
State Superseded, archived
Delegated to: David Miller
Headers show

Commit Message

Jozsef Kadlecsik Feb. 12, 2009, 9:11 a.m. UTC
On Wed, 11 Feb 2009, wli@movementarian.org wrote:

> On Wed, Feb 11, 2009 at 11:19:20AM +0100, Jozsef Kadlecsik wrote:
> >  /* The golden ration: an arbitrary value */
> > -#define JHASH_GOLDEN_RATIO	0x9e3779b9
> > +#define JHASH_GOLDEN_RATIO	0xdeadbeef
> 
> The golden ratio is not an arbitrary value, nor is it a food ration.
> It is (sqrt(5)+1)/2. and the original value of JHASH_GOLDEN_RATIO
> approximated its reciprocal and/or fractional part in fixed point.
> 
> (double)0xdeadbeef/((unsigned long long)1 << 32) is a rather poor
> approximation of (sqrt(5)-1)/2. If this value which is not the
> golden ratio is arbitrary, calling it something different from
> JHASH_GOLDEN_RATIO would surprise the code's readers much less.

OK, here follows the patch with the arbitrary initial parameter renamed.
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
The current jhash.h implements the lookup2() hash function by Bob Jenkins. 
However, lookup2() is outdated as Bob wrote a new hash function called 
lookup3(). The new hash function

- mixes better than lookup2(): it passes the check that every input bit 
  changes every output bit 50% of the time, while lookup2() failed it.
- performs better: compiled with -O2 on Core2 Duo, lookup3() 20-40% faster
  than lookup2() depending on the key length.

The patch replaces the lookup2() implementation of the 'jhash*' 
functions with that of lookup3().

You can read a longer comparison of the two and other hash functions at 
http://burtleburtle.net/bob/hash/doobs.html.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>


Best regards,
Jozsef
-
E-mail  : kadlec@blackhole.kfki.hu, kadlec@mail.kfki.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : KFKI Research Institute for Particle and Nuclear Physics
          H-1525 Budapest 114, POB. 49, Hungary
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Patrick McHardy Feb. 12, 2009, 9:16 a.m. UTC | #1
Jozsef Kadlecsik wrote:
> -#define __jhash_mix(a, b, c) \
> +#define __rot(x,k) (((x)<<(k)) | ((x)>>(32-(k))))
> +
> +/* __jhash_mix - mix 3 32-bit values reversibly. */
> +#define __jhash_mix(a,b,c) \
> +{ \
> +  a -= c;  a ^= __rot(c, 4);  c += b; \
> +  b -= a;  b ^= __rot(a, 6);  a += c; \
> +  c -= b;  c ^= __rot(b, 8);  b += a; \
> +  a -= c;  a ^= __rot(c,16);  c += b; \
> +  b -= a;  b ^= __rot(a,19);  a += c; \
> +  c -= b;  c ^= __rot(b, 4);  b += a; \
> +}

include/linux/bitops.h includes multiple rot variants. rol32()
looks approriate.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Kyle Moffett Feb. 12, 2009, 1:46 p.m. UTC | #2
On Thu, Feb 12, 2009 at 4:11 AM, Jozsef Kadlecsik
<kadlec@blackhole.kfki.hu> wrote:
> The current jhash.h implements the lookup2() hash function by Bob Jenkins.
> However, lookup2() is outdated as Bob wrote a new hash function called
> lookup3(). The new hash function
>
> - mixes better than lookup2(): it passes the check that every input bit
>  changes every output bit 50% of the time, while lookup2() failed it.
> - performs better: compiled with -O2 on Core2 Duo, lookup3() 20-40% faster
>  than lookup2() depending on the key length.

Well, there's another question which is not addressed by Bob Jenkins'
design docs:

Kernel code usually runs cache-cold, whereas Bob Jenkins did most of
his testing cache-hot in tight loops.  If you compile both lookup2 and
lookup3 with -Os and run them in a loop with a cache flush, how well
do they compare then?

Cheers,
Kyle Moffett
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric W. Biederman Feb. 12, 2009, 7:58 p.m. UTC | #3
Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> writes:

> On Wed, 11 Feb 2009, wli@movementarian.org wrote:
>
>> On Wed, Feb 11, 2009 at 11:19:20AM +0100, Jozsef Kadlecsik wrote:
>> >  /* The golden ration: an arbitrary value */
>> > -#define JHASH_GOLDEN_RATIO	0x9e3779b9
>> > +#define JHASH_GOLDEN_RATIO	0xdeadbeef
>> 

0xdeadbeef is a really bad choice of an arbitrary value.  It is
already used in multiple places.  Only a few of which are currently
listed in linux/poison.h and if I happened to see that number in a
debug trace having to sort through all of the possible sources looks
like a major pain.

I don't really care what we call it but somehow 0xdeadbeef strikes me
as wrong and something that will make debugging harder.

Eric
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jozsef Kadlecsik Feb. 17, 2009, 5:13 p.m. UTC | #4
Hi,

On Thu, 12 Feb 2009, Kyle Moffett wrote:

> On Thu, Feb 12, 2009 at 4:11 AM, Jozsef Kadlecsik
> <kadlec@blackhole.kfki.hu> wrote:
> > The current jhash.h implements the lookup2() hash function by Bob Jenkins.
> > However, lookup2() is outdated as Bob wrote a new hash function called
> > lookup3(). The new hash function
> >
> > - mixes better than lookup2(): it passes the check that every input bit
> >  changes every output bit 50% of the time, while lookup2() failed it.
> > - performs better: compiled with -O2 on Core2 Duo, lookup3() 20-40% faster
> >  than lookup2() depending on the key length.
> 
> Well, there's another question which is not addressed by Bob Jenkins'
> design docs:
> 
> Kernel code usually runs cache-cold, whereas Bob Jenkins did most of
> his testing cache-hot in tight loops.  If you compile both lookup2 and
> lookup3 with -Os and run them in a loop with a cache flush, how well
> do they compare then?

In order to get as much data as possible, I dusted off the good old cttest 
program which was used long time ago to test and pick the currently used 
lookup2 function. I added lookup3 to cttest, converted the internals to 
follow the current netfilter/conntrack hash lookup usage and introduced a 
large enough input buffer to force cold cache. I also added the superfast 
hash by Paul Hsieh and murmur2 hash of Austin Appleby, to check those as 
well.

The test program and all the generated html pages and graphs are at
http://www.kfki.hu/~kadlec/sw/netfilter/ct3/. Here follows just a terse 
summary:

- Superfasthash is definitely not good enough as it can too easily 
  produce just too much clashes.
- The murmur2 hash is the fastest one and looks quite good, still it
  had got some trouble at specific hash table sizes.
- No weakness could be spotted with the lookup3 hash.
- When compiling with -Os, the hashes from fastest to slowest are:
  murmur2, lookup3, superfast, lookup2.
- When compiling with -O2, the hashes from fastest to slowest are:
  murmur2, superfast, lookup3 and lookup2.

On Thu, 12 Feb 2009, Rusty Russell wrote:

> My concern was that it's also bigger (and we inline it).  Performance is
> pretty much a wash since we so rarely hash more than a few words.

In netfilter/conntrack (;-) we call the hash function for every packet, so 
even if a small number of cycle can be gained at one lookup, I think it's 
worth. And in the IPv4/IPv6 neutral nf_conntrack we hash 9 words.

> Any stats on code size changes?

The minimal code here

#include <stdint.h>

typedef uint64_t u64;
typedef uint32_t u32;
typedef uint16_t u16;
typedef uint8_t u8;

#ifdef JHASH
#include "jhash.h"
#else
#include "jhash3.h"
#endif

u32 hash(const u32 *key, u32 len, u32 initval)
{
        return jhash2(key, len, initval);
}
/* eof */

compiled with -Os, then stripped, give:

-rw-r--r-- 1 kadlec kadlec 1080 2009-02-17 14:10 lookup2.o
-rw-r--r-- 1 kadlec kadlec 1024 2009-02-17 14:10 lookup3.o

So I think the currently used lookup2 can safely be replaced with lookup3.

Best regards,
Jozsef
-
E-mail  : kadlec@blackhole.kfki.hu, kadlec@mail.kfki.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : KFKI Research Institute for Particle and Nuclear Physics
          H-1525 Budapest 114, POB. 49, Hungary
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rusty Russell Feb. 18, 2009, 5:11 a.m. UTC | #5
On Wednesday 18 February 2009 03:43:04 Jozsef Kadlecsik wrote:
> On Thu, 12 Feb 2009, Rusty Russell wrote:
> > Any stats on code size changes?
> compiled with -Os, then stripped, give:
> 
> -rw-r--r-- 1 kadlec kadlec 1080 2009-02-17 14:10 lookup2.o
> -rw-r--r-- 1 kadlec kadlec 1024 2009-02-17 14:10 lookup3.o

The "size" command is more usual.  This looks like a total win to me.

Thanks for being so thorough!
Rusty.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ilpo Järvinen Feb. 18, 2009, 11:50 a.m. UTC | #6
On Tue, 17 Feb 2009, Jozsef Kadlecsik wrote:

> On Thu, 12 Feb 2009, Rusty Russell wrote:
> 
> > My concern was that it's also bigger (and we inline it).  Performance is
> > pretty much a wash since we so rarely hash more than a few words.
> 
> In netfilter/conntrack (;-) we call the hash function for every packet, so 
> even if a small number of cycle can be gained at one lookup, I think it's 
> worth. And in the IPv4/IPv6 neutral nf_conntrack we hash 9 words.

FYI, I once looked into inlining cost and jhash functions were among the 
most wasteful (kernel-wide). Multiple jhash bodies were 100+ bytes, and 
the overall cost was 10k+. I never got to the final submit of the 
uninlining patch though...
diff mbox

Patch

diff --git a/include/linux/jhash.h b/include/linux/jhash.h
index 2a2f99f..765a9d6 100644
--- a/include/linux/jhash.h
+++ b/include/linux/jhash.h
@@ -3,80 +3,95 @@ 
 
 /* jhash.h: Jenkins hash support.
  *
- * Copyright (C) 1996 Bob Jenkins (bob_jenkins@burtleburtle.net)
+ * Copyright (C) 2006. Bob Jenkins (bob_jenkins@burtleburtle.net)
  *
  * http://burtleburtle.net/bob/hash/
  *
  * These are the credits from Bob's sources:
  *
- * lookup2.c, by Bob Jenkins, December 1996, Public Domain.
- * hash(), hash2(), hash3, and mix() are externally useful functions.
- * Routines to test the hash are included if SELF_TEST is defined.
- * You can use this free for any purpose.  It has no warranty.
+ * lookup3.c, by Bob Jenkins, May 2006, Public Domain.
  *
- * Copyright (C) 2003 David S. Miller (davem@redhat.com)
+ * These are functions for producing 32-bit hashes for hash table lookup.
+ * hashword(), hashlittle(), hashlittle2(), hashbig(), mix(), and final() 
+ * are externally useful functions.  Routines to test the hash are included 
+ * if SELF_TEST is defined.  You can use this free for any purpose.  It's in
+ * the public domain.  It has no warranty.
+ *
+ * Copyright (C) 2009 Jozsef Kadlecsik (kadlec@blackhole.kfki.hu)
  *
  * I've modified Bob's hash to be useful in the Linux kernel, and
- * any bugs present are surely my fault.  -DaveM
+ * any bugs present are my fault.  Jozsef
  */
 
-/* NOTE: Arguments are modified. */
-#define __jhash_mix(a, b, c) \
+#define __rot(x,k) (((x)<<(k)) | ((x)>>(32-(k))))
+
+/* __jhash_mix - mix 3 32-bit values reversibly. */
+#define __jhash_mix(a,b,c) \
+{ \
+  a -= c;  a ^= __rot(c, 4);  c += b; \
+  b -= a;  b ^= __rot(a, 6);  a += c; \
+  c -= b;  c ^= __rot(b, 8);  b += a; \
+  a -= c;  a ^= __rot(c,16);  c += b; \
+  b -= a;  b ^= __rot(a,19);  a += c; \
+  c -= b;  c ^= __rot(b, 4);  b += a; \
+}
+
+/* __jhash_final - final mixing of 3 32-bit values (a,b,c) into c */
+#define __jhash_final(a,b,c) \
 { \
-  a -= b; a -= c; a ^= (c>>13); \
-  b -= c; b -= a; b ^= (a<<8); \
-  c -= a; c -= b; c ^= (b>>13); \
-  a -= b; a -= c; a ^= (c>>12);  \
-  b -= c; b -= a; b ^= (a<<16); \
-  c -= a; c -= b; c ^= (b>>5); \
-  a -= b; a -= c; a ^= (c>>3);  \
-  b -= c; b -= a; b ^= (a<<10); \
-  c -= a; c -= b; c ^= (b>>15); \
+  c ^= b; c -= __rot(b,14); \
+  a ^= c; a -= __rot(c,11); \
+  b ^= a; b -= __rot(a,25); \
+  c ^= b; c -= __rot(b,16); \
+  a ^= c; a -= __rot(c,4);  \
+  b ^= a; b -= __rot(a,14); \
+  c ^= b; c -= __rot(b,24); \
 }
 
-/* The golden ration: an arbitrary value */
-#define JHASH_GOLDEN_RATIO	0x9e3779b9
+/* An arbitrary initial parameter */
+#define JHASH_INIT_PARAM	0xdeadbeef
 
 /* The most generic version, hashes an arbitrary sequence
  * of bytes.  No alignment or length assumptions are made about
- * the input key.
+ * the input key. The result depends on endianness.
  */
 static inline u32 jhash(const void *key, u32 length, u32 initval)
 {
-	u32 a, b, c, len;
+	u32 a,b,c;
 	const u8 *k = key;
 
-	len = length;
-	a = b = JHASH_GOLDEN_RATIO;
-	c = initval;
-
-	while (len >= 12) {
-		a += (k[0] +((u32)k[1]<<8) +((u32)k[2]<<16) +((u32)k[3]<<24));
-		b += (k[4] +((u32)k[5]<<8) +((u32)k[6]<<16) +((u32)k[7]<<24));
-		c += (k[8] +((u32)k[9]<<8) +((u32)k[10]<<16)+((u32)k[11]<<24));
-
-		__jhash_mix(a,b,c);
+	/* Set up the internal state */
+	a = b = c = JHASH_INIT_PARAM + length + initval;
 
+	/* all but the last block: affect some 32 bits of (a,b,c) */
+	while (length > 12) {
+    		a += (k[0] + ((u32)k[1]<<8) + ((u32)k[2]<<16) + ((u32)k[3]<<24));
+		b += (k[4] + ((u32)k[5]<<8) + ((u32)k[6]<<16) + ((u32)k[7]<<24));
+		c += (k[8] + ((u32)k[9]<<8) + ((u32)k[10]<<16) + ((u32)k[11]<<24));
+		__jhash_mix(a, b, c);
+		length -= 12;
 		k += 12;
-		len -= 12;
 	}
 
-	c += length;
-	switch (len) {
-	case 11: c += ((u32)k[10]<<24);
-	case 10: c += ((u32)k[9]<<16);
-	case 9 : c += ((u32)k[8]<<8);
-	case 8 : b += ((u32)k[7]<<24);
-	case 7 : b += ((u32)k[6]<<16);
-	case 6 : b += ((u32)k[5]<<8);
+	/* last block: affect all 32 bits of (c) */
+	/* all the case statements fall through */
+	switch (length) {
+	case 12: c += (u32)k[11]<<24;
+	case 11: c += (u32)k[10]<<16;
+	case 10: c += (u32)k[9]<<8;
+	case 9 : c += k[8];
+	case 8 : b += (u32)k[7]<<24;
+	case 7 : b += (u32)k[6]<<16;
+	case 6 : b += (u32)k[5]<<8;
 	case 5 : b += k[4];
-	case 4 : a += ((u32)k[3]<<24);
-	case 3 : a += ((u32)k[2]<<16);
-	case 2 : a += ((u32)k[1]<<8);
+	case 4 : a += (u32)k[3]<<24;
+	case 3 : a += (u32)k[2]<<16;
+	case 2 : a += (u32)k[1]<<8;
 	case 1 : a += k[0];
-	};
-
-	__jhash_mix(a,b,c);
+		__jhash_final(a, b, c);
+	case 0 :
+		break;
+	}
 
 	return c;
 }
@@ -86,58 +101,57 @@  static inline u32 jhash(const void *key, u32 length, u32 initval)
  */
 static inline u32 jhash2(const u32 *k, u32 length, u32 initval)
 {
-	u32 a, b, c, len;
+	u32 a, b, c;
 
-	a = b = JHASH_GOLDEN_RATIO;
-	c = initval;
-	len = length;
+	/* Set up the internal state */
+	a = b = c = JHASH_INIT_PARAM + (length<<2) + initval;
 
-	while (len >= 3) {
+	/* handle most of the key */
+	while (length > 3) {
 		a += k[0];
 		b += k[1];
 		c += k[2];
 		__jhash_mix(a, b, c);
-		k += 3; len -= 3;
+		length -= 3;
+		k += 3;
 	}
 
-	c += length * 4;
-
-	switch (len) {
-	case 2 : b += k[1];
-	case 1 : a += k[0];
-	};
-
-	__jhash_mix(a,b,c);
+	/* handle the last 3 u32's */
+	/* all the case statements fall through */ 
+	switch (length) {
+	case 3: c += k[2];
+	case 2: b += k[1];
+	case 1: a += k[0];
+		__jhash_final(a, b, c);
+	case 0:     /* case 0: nothing left to add */
+		break;
+	}
 
 	return c;
 }
 
-
 /* A special ultra-optimized versions that knows they are hashing exactly
  * 3, 2 or 1 word(s).
- *
- * NOTE: In partilar the "c += length; __jhash_mix(a,b,c);" normally
- *       done at the end is not done here.
  */
 static inline u32 jhash_3words(u32 a, u32 b, u32 c, u32 initval)
 {
-	a += JHASH_GOLDEN_RATIO;
-	b += JHASH_GOLDEN_RATIO;
-	c += initval;
+	a += JHASH_INIT_PARAM + initval;
+	b += JHASH_INIT_PARAM + initval;
+	c += JHASH_INIT_PARAM + initval;
 
-	__jhash_mix(a, b, c);
+	__jhash_final(a, b, c);
 
 	return c;
 }
 
 static inline u32 jhash_2words(u32 a, u32 b, u32 initval)
 {
-	return jhash_3words(a, b, 0, initval);
+	return jhash_3words(0, a, b, initval);
 }
 
 static inline u32 jhash_1word(u32 a, u32 initval)
 {
-	return jhash_3words(a, 0, 0, initval);
+	return jhash_3words(0, 0, a, initval);
 }
 
 #endif /* _LINUX_JHASH_H */