Patchwork Update jhash.h with the new version of Jenkins' hash

login
register
mail settings
Submitter Jozsef Kadlecsik
Date Feb. 11, 2009, 10:19 a.m.
Message ID <alpine.DEB.2.00.0902111107480.15665@blackhole.kfki.hu>
Download mbox | patch
Permalink /patch/22909/
State Superseded
Delegated to: David Miller
Headers show

Comments

Jozsef Kadlecsik - Feb. 11, 2009, 10:19 a.m.
Hi,

The current jhash.h implements the lookup2() hash function by Bob Jenkins. 
However, lookup2() is outdated as Bob wrote a new hash function called 
lookup3(). The new hash function

- mixes better than lookup2(): it passes the check that every input bit 
  changes every output bit 50% of the time, while lookup2() failed it.
- performs better: compiled with -O2 on Core2 Duo, lookup3() is 20-40% 
  faster than lookup2() depending on the key length.

The patch replaces the lookup2() implementation of the 'jhash*' 
functions with that of lookup3() in linux/jhash.h.

You can read a longer comparison of the Jenkins and other hash functions 
at http://burtleburtle.net/bob/hash/doobs.html.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>



Best regards,
Jozsef
-
E-mail  : kadlec@blackhole.kfki.hu, kadlec@mail.kfki.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : KFKI Research Institute for Particle and Nuclear Physics
          H-1525 Budapest 114, POB. 49, Hungary
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Michał Mirosław - Feb. 11, 2009, 8:19 p.m.
Hi,

On Wed, Feb 11, 2009 at 11:19:20AM +0100, Jozsef Kadlecsik wrote:
>  /* The golden ration: an arbitrary value */
> -#define JHASH_GOLDEN_RATIO	0x9e3779b9
> +#define JHASH_GOLDEN_RATIO	0xdeadbeef

I have a stupid question: if this is arbitrary value, then why not
just get rid of it (IOW use zero as it's used in addition)?

Best Regards,
Michał Mirosław
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jozsef Kadlecsik - Feb. 11, 2009, 10:50 p.m.
On Wed, 11 Feb 2009, Micha? Miros?aw wrote:

> On Wed, Feb 11, 2009 at 11:19:20AM +0100, Jozsef Kadlecsik wrote:
> >  /* The golden ration: an arbitrary value */
> > -#define JHASH_GOLDEN_RATIO	0x9e3779b9
> > +#define JHASH_GOLDEN_RATIO	0xdeadbeef
> 
> I have a stupid question: if this is arbitrary value, then why not
> just get rid of it (IOW use zero as it's used in addition)?

lookup3() is a quite generic hash function and it supports 0-byte strings 
as input keys too. If the input key is a 0-byte string and the arbitrary 
value is zero, then the hash value simply equal to the initval. In order 
to avoid that case, the arbitrary value must be nonzero.

Best regards,
Jozsef
-
E-mail  : kadlec@blackhole.kfki.hu, kadlec@mail.kfki.hu
PGP key : http://www.kfki.hu/~kadlec/pgp_public_key.txt
Address : KFKI Research Institute for Particle and Nuclear Physics
          H-1525 Budapest 114, POB. 49, Hungary
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Michał Mirosław - Feb. 11, 2009, 11:23 p.m.
On Wed, Feb 11, 2009 at 11:50:38PM +0100, Jozsef Kadlecsik wrote:
> On Wed, 11 Feb 2009, Micha? Miros?aw wrote:
> > On Wed, Feb 11, 2009 at 11:19:20AM +0100, Jozsef Kadlecsik wrote:
> > >  /* The golden ration: an arbitrary value */
> > > -#define JHASH_GOLDEN_RATIO	0x9e3779b9
> > > +#define JHASH_GOLDEN_RATIO	0xdeadbeef
> > I have a stupid question: if this is arbitrary value, then why not
> > just get rid of it (IOW use zero as it's used in addition)?
> lookup3() is a quite generic hash function and it supports 0-byte strings 
> as input keys too. If the input key is a 0-byte string and the arbitrary 
> value is zero, then the hash value simply equal to the initval. In order 
> to avoid that case, the arbitrary value must be nonzero.

Double-hashing, of course! I really need more sleep. Sorry for the noise. :>

Best Regards,
Michał Mirosław

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
wli@movementarian.org - Feb. 12, 2009, 12:12 a.m.
On Wed, Feb 11, 2009 at 11:19:20AM +0100, Jozsef Kadlecsik wrote:
>  /* The golden ration: an arbitrary value */
> -#define JHASH_GOLDEN_RATIO	0x9e3779b9
> +#define JHASH_GOLDEN_RATIO	0xdeadbeef

The golden ratio is not an arbitrary value, nor is it a food ration.
It is (sqrt(5)+1)/2. and the original value of JHASH_GOLDEN_RATIO
approximated its reciprocal and/or fractional part in fixed point.

(double)0xdeadbeef/((unsigned long long)1 << 32) is a rather poor
approximation of (sqrt(5)-1)/2. If this value which is not the
golden ratio is arbitrary, calling it something different from
JHASH_GOLDEN_RATIO would surprise the code's readers much less.


-- wli
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller - Feb. 12, 2009, 12:29 a.m.
From: wli@movementarian.org
Date: Wed, 11 Feb 2009 19:12:23 -0500

> (double)0xdeadbeef/((unsigned long long)1 << 32) is a rather poor
> approximation of (sqrt(5)-1)/2. If this value which is not the
> golden ratio is arbitrary, calling it something different from
> JHASH_GOLDEN_RATIO would surprise the code's readers much less.

Then we can change the name, whatever...

But really, anyone truly interested will read the friggin' web site
where all the design documents and hash test results live.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jan Engelhardt - Feb. 12, 2009, 12:41 a.m.
On Thursday 2009-02-12 01:29, David Miller wrote:
>
>> (double)0xdeadbeef/((unsigned long long)1 << 32) is a rather poor
>> approximation of (sqrt(5)-1)/2. If this value which is not the
>> golden ratio is arbitrary, calling it something different from
>> JHASH_GOLDEN_RATIO would surprise the code's readers much less.
>
>Then we can change the name, whatever...
>
>But really, anyone truly interested will read the friggin' web site
>where all the design documents and hash test results live.

Why should we use deadbeef over golden_ratio anyway? It's just a
random constant. And I'd take the gold over beef anytime ;-)
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rusty Russell - Feb. 12, 2009, 2:58 a.m.
On Wednesday 11 February 2009 20:49:20 Jozsef Kadlecsik wrote:
> Hi,
> 
> The current jhash.h implements the lookup2() hash function by Bob Jenkins. 
> However, lookup2() is outdated as Bob wrote a new hash function called 
> lookup3(). The new hash function
> 
> - mixes better than lookup2(): it passes the check that every input bit 
>   changes every output bit 50% of the time, while lookup2() failed it.
> - performs better: compiled with -O2 on Core2 Duo, lookup3() is 20-40% 
>   faster than lookup2() depending on the key length.

Hi Joszef,

My concern was that it's also bigger (and we inline it).  Performance is
pretty much a wash since we so rarely hash more than a few words.

Any stats on code size changes?
Rusty.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jarek Poplawski - Feb. 12, 2009, 9:05 a.m.
On 12-02-2009 01:41, Jan Engelhardt wrote:
...
> And I'd take the gold over beef anytime ;-)

Better be careful! History, like Greek myths, or even quite recent,
could be surprising...

Jarek P.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jarek Poplawski - Feb. 12, 2009, 9:55 a.m.
On Thu, Feb 12, 2009 at 09:05:13AM +0000, Jarek Poplawski wrote:
> On 12-02-2009 01:41, Jan Engelhardt wrote:
> ...
> > And I'd take the gold over beef anytime ;-)
> 
> Better be careful! History, like Greek myths, or even quite recent,
> could be surprising...

...But love over gold looks like safe bet! (love over beef too ;-)

Jarek P.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

diff --git a/include/linux/jhash.h b/include/linux/jhash.h
index 2a2f99f..2000b9f 100644
--- a/include/linux/jhash.h
+++ b/include/linux/jhash.h
@@ -3,80 +3,95 @@ 
 
 /* jhash.h: Jenkins hash support.
  *
- * Copyright (C) 1996 Bob Jenkins (bob_jenkins@burtleburtle.net)
+ * Copyright (C) 2006. Bob Jenkins (bob_jenkins@burtleburtle.net)
  *
  * http://burtleburtle.net/bob/hash/
  *
  * These are the credits from Bob's sources:
  *
- * lookup2.c, by Bob Jenkins, December 1996, Public Domain.
- * hash(), hash2(), hash3, and mix() are externally useful functions.
- * Routines to test the hash are included if SELF_TEST is defined.
- * You can use this free for any purpose.  It has no warranty.
+ * lookup3.c, by Bob Jenkins, May 2006, Public Domain.
  *
- * Copyright (C) 2003 David S. Miller (davem@redhat.com)
+ * These are functions for producing 32-bit hashes for hash table lookup.
+ * hashword(), hashlittle(), hashlittle2(), hashbig(), mix(), and final() 
+ * are externally useful functions.  Routines to test the hash are included 
+ * if SELF_TEST is defined.  You can use this free for any purpose.  It's in
+ * the public domain.  It has no warranty.
+ *
+ * Copyright (C) 2009 Jozsef Kadlecsik (kadlec@blackhole.kfki.hu)
  *
  * I've modified Bob's hash to be useful in the Linux kernel, and
- * any bugs present are surely my fault.  -DaveM
+ * any bugs present are my fault.  Jozsef
  */
 
-/* NOTE: Arguments are modified. */
-#define __jhash_mix(a, b, c) \
+#define __rot(x,k) (((x)<<(k)) | ((x)>>(32-(k))))
+
+/* __jhash_mix - mix 3 32-bit values reversibly. */
+#define __jhash_mix(a,b,c) \
+{ \
+  a -= c;  a ^= __rot(c, 4);  c += b; \
+  b -= a;  b ^= __rot(a, 6);  a += c; \
+  c -= b;  c ^= __rot(b, 8);  b += a; \
+  a -= c;  a ^= __rot(c,16);  c += b; \
+  b -= a;  b ^= __rot(a,19);  a += c; \
+  c -= b;  c ^= __rot(b, 4);  b += a; \
+}
+
+/* __jhash_final - final mixing of 3 32-bit values (a,b,c) into c */
+#define __jhash_final(a,b,c) \
 { \
-  a -= b; a -= c; a ^= (c>>13); \
-  b -= c; b -= a; b ^= (a<<8); \
-  c -= a; c -= b; c ^= (b>>13); \
-  a -= b; a -= c; a ^= (c>>12);  \
-  b -= c; b -= a; b ^= (a<<16); \
-  c -= a; c -= b; c ^= (b>>5); \
-  a -= b; a -= c; a ^= (c>>3);  \
-  b -= c; b -= a; b ^= (a<<10); \
-  c -= a; c -= b; c ^= (b>>15); \
+  c ^= b; c -= __rot(b,14); \
+  a ^= c; a -= __rot(c,11); \
+  b ^= a; b -= __rot(a,25); \
+  c ^= b; c -= __rot(b,16); \
+  a ^= c; a -= __rot(c,4);  \
+  b ^= a; b -= __rot(a,14); \
+  c ^= b; c -= __rot(b,24); \
 }
 
 /* The golden ration: an arbitrary value */
-#define JHASH_GOLDEN_RATIO	0x9e3779b9
+#define JHASH_GOLDEN_RATIO	0xdeadbeef
 
 /* The most generic version, hashes an arbitrary sequence
  * of bytes.  No alignment or length assumptions are made about
- * the input key.
+ * the input key. The result depends on endianness.
  */
 static inline u32 jhash(const void *key, u32 length, u32 initval)
 {
-	u32 a, b, c, len;
+	u32 a,b,c;
 	const u8 *k = key;
 
-	len = length;
-	a = b = JHASH_GOLDEN_RATIO;
-	c = initval;
-
-	while (len >= 12) {
-		a += (k[0] +((u32)k[1]<<8) +((u32)k[2]<<16) +((u32)k[3]<<24));
-		b += (k[4] +((u32)k[5]<<8) +((u32)k[6]<<16) +((u32)k[7]<<24));
-		c += (k[8] +((u32)k[9]<<8) +((u32)k[10]<<16)+((u32)k[11]<<24));
-
-		__jhash_mix(a,b,c);
+	/* Set up the internal state */
+	a = b = c = JHASH_GOLDEN_RATIO + length + initval;
 
+	/* all but the last block: affect some 32 bits of (a,b,c) */
+	while (length > 12) {
+    		a += (k[0] + ((u32)k[1]<<8) + ((u32)k[2]<<16) + ((u32)k[3]<<24));
+		b += (k[4] + ((u32)k[5]<<8) + ((u32)k[6]<<16) + ((u32)k[7]<<24));
+		c += (k[8] + ((u32)k[9]<<8) + ((u32)k[10]<<16) + ((u32)k[11]<<24));
+		__jhash_mix(a, b, c);
+		length -= 12;
 		k += 12;
-		len -= 12;
 	}
 
-	c += length;
-	switch (len) {
-	case 11: c += ((u32)k[10]<<24);
-	case 10: c += ((u32)k[9]<<16);
-	case 9 : c += ((u32)k[8]<<8);
-	case 8 : b += ((u32)k[7]<<24);
-	case 7 : b += ((u32)k[6]<<16);
-	case 6 : b += ((u32)k[5]<<8);
+	/* last block: affect all 32 bits of (c) */
+	/* all the case statements fall through */
+	switch (length) {
+	case 12: c += (u32)k[11]<<24;
+	case 11: c += (u32)k[10]<<16;
+	case 10: c += (u32)k[9]<<8;
+	case 9 : c += k[8];
+	case 8 : b += (u32)k[7]<<24;
+	case 7 : b += (u32)k[6]<<16;
+	case 6 : b += (u32)k[5]<<8;
 	case 5 : b += k[4];
-	case 4 : a += ((u32)k[3]<<24);
-	case 3 : a += ((u32)k[2]<<16);
-	case 2 : a += ((u32)k[1]<<8);
+	case 4 : a += (u32)k[3]<<24;
+	case 3 : a += (u32)k[2]<<16;
+	case 2 : a += (u32)k[1]<<8;
 	case 1 : a += k[0];
-	};
-
-	__jhash_mix(a,b,c);
+		__jhash_final(a, b, c);
+	case 0 :
+		break;
+	}
 
 	return c;
 }
@@ -86,58 +101,57 @@  static inline u32 jhash(const void *key, u32 length, u32 initval)
  */
 static inline u32 jhash2(const u32 *k, u32 length, u32 initval)
 {
-	u32 a, b, c, len;
+	u32 a, b, c;
 
-	a = b = JHASH_GOLDEN_RATIO;
-	c = initval;
-	len = length;
+	/* Set up the internal state */
+	a = b = c = JHASH_GOLDEN_RATIO + (length<<2) + initval;
 
-	while (len >= 3) {
+	/* handle most of the key */
+	while (length > 3) {
 		a += k[0];
 		b += k[1];
 		c += k[2];
 		__jhash_mix(a, b, c);
-		k += 3; len -= 3;
+		length -= 3;
+		k += 3;
 	}
 
-	c += length * 4;
-
-	switch (len) {
-	case 2 : b += k[1];
-	case 1 : a += k[0];
-	};
-
-	__jhash_mix(a,b,c);
+	/* handle the last 3 u32's */
+	/* all the case statements fall through */ 
+	switch (length) {
+	case 3: c += k[2];
+	case 2: b += k[1];
+	case 1: a += k[0];
+		__jhash_final(a, b, c);
+	case 0:     /* case 0: nothing left to add */
+		break;
+	}
 
 	return c;
 }
 
-
 /* A special ultra-optimized versions that knows they are hashing exactly
  * 3, 2 or 1 word(s).
- *
- * NOTE: In partilar the "c += length; __jhash_mix(a,b,c);" normally
- *       done at the end is not done here.
  */
 static inline u32 jhash_3words(u32 a, u32 b, u32 c, u32 initval)
 {
-	a += JHASH_GOLDEN_RATIO;
-	b += JHASH_GOLDEN_RATIO;
-	c += initval;
+	a += JHASH_GOLDEN_RATIO + initval;
+	b += JHASH_GOLDEN_RATIO + initval;
+	c += JHASH_GOLDEN_RATIO + initval;
 
-	__jhash_mix(a, b, c);
+	__jhash_final(a, b, c);
 
 	return c;
 }
 
 static inline u32 jhash_2words(u32 a, u32 b, u32 initval)
 {
-	return jhash_3words(a, b, 0, initval);
+	return jhash_3words(0, a, b, initval);
 }
 
 static inline u32 jhash_1word(u32 a, u32 initval)
 {
-	return jhash_3words(a, 0, 0, initval);
+	return jhash_3words(0, 0, a, initval);
 }
 
 #endif /* _LINUX_JHASH_H */