Patchwork dcache: better name hash function

login
register
mail settings
Submitter stephen hemminger
Date Oct. 26, 2009, 10:36 p.m.
Message ID <20091026153656.25be4369@nehalam>
Download mbox | patch
Permalink /patch/36946/
State Not Applicable
Delegated to: David Miller
Headers show

Comments

stephen hemminger - Oct. 26, 2009, 10:36 p.m.
Some experiments by Octavian with large numbers of network devices identified
that name_hash does not evenly distribute values causing performance
penalties.  The name hashing function is used by dcache et. all
so let's just choose a better one.

Additional standalone tests for 10,000,000 consecutive names
using lots of different algorithms shows fnv as the winner.
It is faster and has almost ideal dispersion. 
string10 is slightly faster, but only works for names like ppp0, ppp1,...

Algorithm             Time       Ratio       Max   StdDev
string10             0.238201       1.00      2444   0.02
fnv32                0.240595       1.00      2576   1.05
fnv64                0.241224       1.00      2556   0.69
SuperFastHash        0.272872       1.00      2871   2.15
string_hash17        0.295160       1.00      2484   0.40
jhash_string         0.300925       1.00      2606   1.00
crc                  1.606741       1.00      2474   0.29
md5_string           2.424771       1.00      2644   0.99
djb2                 0.275424       1.15      3821  19.04
string_hash31        0.264806       1.21      4097  22.78
sdbm                 0.371136       2.87     13016  67.54
elf                  0.371279       3.59      9990  79.50
pjw                  0.401172       3.59      9990  79.50
full_name_hash       0.285851      13.09     35174 171.81
kr_hash              0.245068     124.84    468448 549.89
fletcher             0.267664     124.84    468448 549.89
adler32              0.640668     124.84    468448 549.89
xor                  0.220545     213.82    583189 720.85
lastchar             0.194604     409.57   1000000 998.78

Time is seconds.
Ratio is how many probes required to lookup all values versus
  an ideal hash.
Max is longest chain

Reported-by: Octavian Purdila <opurdila@ixiacom.com>
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet - Oct. 27, 2009, 2:45 a.m.
Stephen Hemminger <shemminger@vyatta.com>, Al Viro a écrit :
> --- a/include/linux/dcache.h	2009-10-26 14:58:45.220347300 -0700
> +++ b/include/linux/dcache.h	2009-10-26 15:12:15.004160122 -0700
> @@ -45,15 +45,28 @@ struct dentry_stat_t {
>  };
>  extern struct dentry_stat_t dentry_stat;
>  
> -/* Name hashing routines. Initial hash value */
> -/* Hash courtesy of the R5 hash in reiserfs modulo sign bits */
> -#define init_name_hash()		0
> +/*
> + * Fowler / Noll / Vo (FNV) Hash
> + * see: http://www.isthe.com/chongo/tech/comp/fnv/
> + */
> +#ifdef CONFIG_64BIT
> +#define FNV_PRIME  1099511628211ull
> +#define FNV1_INIT  14695981039346656037ull
> +#else
> +#define FNV_PRIME  16777619u
> +#define FNV1_INIT  2166136261u
> +#endif
> +
> +#define init_name_hash()	FNV1_INIT
>  
> -/* partial hash update function. Assume roughly 4 bits per character */
> +/* partial hash update function. */
>  static inline unsigned long
> -partial_name_hash(unsigned long c, unsigned long prevhash)
> +partial_name_hash(unsigned char c, unsigned long prevhash)
>  {
> -	return (prevhash + (c << 4) + (c >> 4)) * 11;
> +	prevhash ^= c;
> +	prevhash *= FNV_PRIME;
> +
> +	return prevhash;
>  }
>  
>  /*

OK, but thats strlen(name) X (long,long) multiplies.

I suspect you tested on recent x86_64 cpu.

Some arches might have slow multiplies, no ?

jhash() (and others) are optimized by compiler to use basic and fast operations.
jhash operates on blocs of 12 chars per round, so it might be a pretty good choice once
out-of-line (because its pretty large and full_name_hash() is now used by
a lot of call sites)

Please provide your test program source, so that other can test on various arches.

Thanks
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
stephen hemminger - Oct. 27, 2009, 3:53 a.m.
On Tue, 27 Oct 2009 03:45:50 +0100
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> Stephen Hemminger <shemminger@vyatta.com>, Al Viro a écrit :
> > --- a/include/linux/dcache.h	2009-10-26 14:58:45.220347300 -0700
> > +++ b/include/linux/dcache.h	2009-10-26 15:12:15.004160122 -0700
> > @@ -45,15 +45,28 @@ struct dentry_stat_t {
> >  };
> >  extern struct dentry_stat_t dentry_stat;
> >  
> > -/* Name hashing routines. Initial hash value */
> > -/* Hash courtesy of the R5 hash in reiserfs modulo sign bits */
> > -#define init_name_hash()		0
> > +/*
> > + * Fowler / Noll / Vo (FNV) Hash
> > + * see: http://www.isthe.com/chongo/tech/comp/fnv/
> > + */
> > +#ifdef CONFIG_64BIT
> > +#define FNV_PRIME  1099511628211ull
> > +#define FNV1_INIT  14695981039346656037ull
> > +#else
> > +#define FNV_PRIME  16777619u
> > +#define FNV1_INIT  2166136261u
> > +#endif
> > +
> > +#define init_name_hash()	FNV1_INIT
> >  
> > -/* partial hash update function. Assume roughly 4 bits per character */
> > +/* partial hash update function. */
> >  static inline unsigned long
> > -partial_name_hash(unsigned long c, unsigned long prevhash)
> > +partial_name_hash(unsigned char c, unsigned long prevhash)
> >  {
> > -	return (prevhash + (c << 4) + (c >> 4)) * 11;
> > +	prevhash ^= c;
> > +	prevhash *= FNV_PRIME;
> > +
> > +	return prevhash;
> >  }
> >  
> >  /*
> 
> OK, but thats strlen(name) X (long,long) multiplies.
> 
> I suspect you tested on recent x86_64 cpu.
> 
> Some arches might have slow multiplies, no ?
> 
> jhash() (and others) are optimized by compiler to use basic and fast operations.
> jhash operates on blocs of 12 chars per round, so it might be a pretty good choice once
> out-of-line (because its pretty large and full_name_hash() is now used by
> a lot of call sites)
> 
> Please provide your test program source, so that other can test on various arches.
> 
> Thanks

long on i386 is 32 bits so it is 32 bit multiply.  There is also an optimization
that uses shift and adds.




--
Rick Jones - Oct. 27, 2009, 4:38 p.m.
Previously Stephen kindly sent me the source and instructions, and attached are 
results from 1.0 GHz Itanium "McKinley" processors using an older gcc, both -O2 
and -O3, -O2 first:

 >>>
 >>>
 >>> $ ./hashtest 10000000 14 | sort -n -k 3 -k 2
 >>> Algorithm             Time       Ratio       Max   StdDev
 >>> string10             0.234133       1.00       612   0.03
 >>> fnv32                0.241471       1.00       689   0.93
 >>> fnv64                0.241964       1.00       680   0.85
 >>> string_hash17        0.269656       1.00       645   0.36
 >>> jhash_string         0.295795       1.00       702   1.00
 >>> crc                  1.609449       1.00       634   0.41
 >>> md5_string           2.479467       1.00       720   0.99
 >>> SuperFastHash        0.273793       1.01       900   2.13
 >>> djb2                 0.265877       1.15       964   9.52
 >>> string_hash31        0.259110       1.21      1039  11.39
 >>> sdbm                 0.369414       2.87      3268  33.77
 >>> elf                  0.372251       3.71      2907  40.71
 >>> pjw                  0.401732       3.71      2907  40.71
 >>> full_name_hash       0.283508      13.09      8796  85.91
 >>> kr_hash              0.220033     499.17    468448 551.55
 >>> fletcher             0.267009     499.17    468448 551.55
 >>> adler32              0.635047     499.17    468448 551.55
 >>> xor                  0.220314     854.94    583189 722.12
 >>> lastchar             0.155236    1637.61   1000000 999.69
 >>
 >>
 >> here then are both, from a 1.0 GHz McKinley system, 64-bit, using an older
 >> gcc
 >>
 >> raj@oslowest:~/hashtest$ ./hashtest 10000000 14 | sort -n -k 3 -k 2
 >> Algorithm             Time       Ratio       Max   StdDev
 >> string_hash17        0.901319       1.00       645   0.36
 >> string10             0.986391       1.00       612   0.03
 >> jhash_string         1.422065       1.00       702   1.00
 >> fnv32                1.705116       1.00       689   0.93
 >> fnv64                1.900326       1.00       680   0.85
 >> crc                  3.651519       1.00       634   0.41
 >> md5_string           14.155621       1.00       720   0.99
 >> SuperFastHash        1.185206       1.01       900   2.13
 >> djb2                 0.977166       1.15       964   9.52
 >> string_hash31        0.989804       1.21      1039  11.39
 >> sdbm                 1.188299       2.87      3268  33.77
 >> pjw                  1.185963       3.71      2907  40.71
 >> elf                  1.257023       3.71      2907  40.71
 >> full_name_hash       1.231514      13.09      8796  85.91
 >> kr_hash              0.890761     499.17    468448 551.55
 >> fletcher             1.080981     499.17    468448 551.55
 >> adler32              4.141714     499.17    468448 551.55
 >> xor                  1.061445     854.94    583189 722.12
 >> lastchar             0.676697    1637.61   1000000 999.69
 >>
 >> raj@oslowest:~/hashtest$ ./hashtest 10000000 8 | sort -n -k 3 -k 2
 >> Algorithm             Time       Ratio       Max   StdDev
 >> string_hash17        0.899988       1.00     39497   1.50
 >> string10             0.985100       1.00     39064   0.01
 >> SuperFastHash        1.141748       1.00     40497   2.17
 >> jhash_string         1.376414       1.00     39669   1.04
 >> fnv32                1.656967       1.00     39895   2.25
 >> fnv64                1.855259       1.00     39215   0.35
 >> crc                  3.615341       1.00     39088   0.07
 >> md5_string           14.113307       1.00     39605   0.98
 >> djb2                 0.972180       1.15     60681  76.16
 >> string_hash31        0.982233       1.21     64950  91.12
 >> sdbm                 1.181952       2.38    129900 232.22
 >> pjw                  1.178994       2.45     99990 237.86
 >> elf                  1.250936       2.45     99990 237.86
 >> kr_hash              0.892633       7.80    468451 515.52
 >> fletcher             1.082932       7.80    468451 515.52
 >> adler32              4.142414       7.80    468451 515.52
 >> full_name_hash       1.175324      13.09    562501 687.24
 >> xor                  1.060091      13.36    583189 694.98
 >> lastchar             0.675610      25.60   1000000 980.27
 >>
 >> raj@oslowest:~/hashtest$ gcc -v
 >> Using built-in specs.
 >> Target: ia64-linux-gnu
 >> Configured with: ../src/configure -v 
--enable-languages=c,c++,fortran,objc,obj-c++,treelang --prefix=/usr 
--enable-shared --with-system-zlib --libexecdir=/usr/lib 
--without-included-gettext --enable-threads=posix --enable-nls 
--program-suffix=-4.1 --enable-__cxa_atexit --enable-clocale=gnu 
--enable-libstdcxx-debug --enable-mpfr --disable-libssp --with-system-libunwind 
--enable-checking=release ia64-linux-gnu
 >> Thread model: posix
 >> gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)
 >> raj@oslowest:~/hashtest$
 >>
 >> fnv doesn't seem to do as well there relative to the others as it did in your
 >> tests.
 >
 >
 >
 > You could try -O3 since then gcc may replace the multiply with shift/add
 > or is there something about forcing 32 and 64 bit that makes ia64 suffer.


It seems to speed things up, but the relative ordering remains the same:

oslowest:/home/raj/hashtest# make
cc -O3 -Wall   -c -o hashtest.o hashtest.c
cc -O3 -Wall   -c -o md5.o md5.c
cc -lm  hashtest.o md5.o   -o hashtest
oslowest:/home/raj/hashtest# ./hashtest 10000000 14 | sort -n -k 3 -k 2
Algorithm             Time       Ratio       Max   StdDev
string_hash17        0.893813       1.00       645   0.36
string10             0.965596       1.00       612   0.03
jhash_string         1.387773       1.00       702   1.00
fnv32                1.699041       1.00       689   0.93
fnv64                1.882314       1.00       680   0.85
crc                  3.273676       1.00       634   0.41
md5_string           13.913745       1.00       720   0.99
SuperFastHash        1.135802       1.01       900   2.13
djb2                 0.951571       1.15       964   9.52
string_hash31        0.971081       1.21      1039  11.39
sdbm                 1.168148       2.87      3268  33.77
pjw                  1.159304       3.71      2907  40.71
elf                  1.237662       3.71      2907  40.71
full_name_hash       1.212588      13.09      8796  85.91
kr_hash              0.856584     499.17    468448 551.55
fletcher             1.054516     499.17    468448 551.55
adler32              4.123742     499.17    468448 551.55
xor                  1.031910     854.94    583189 722.12
lastchar             0.648597    1637.61   1000000 999.69
oslowest:/home/raj/hashtest# ./hashtest 10000000 8 | sort -n -k 3 -k 2
Algorithm             Time       Ratio       Max   StdDev
string_hash17        0.884829       1.00     39497   1.50
string10             0.962258       1.00     39064   0.01
SuperFastHash        1.088602       1.00     40497   2.17
jhash_string         1.340878       1.00     39669   1.04
fnv32                1.637096       1.00     39895   2.25
fnv64                1.842330       1.00     39215   0.35
crc                  3.230291       1.00     39088   0.07
md5_string           13.863056       1.00     39605   0.98
djb2                 0.944159       1.15     60681  76.16
string_hash31        0.961978       1.21     64950  91.12
sdbm                 1.159156       2.38    129900 232.22
pjw                  1.154286       2.45     99990 237.86
elf                  1.232842       2.45     99990 237.86
kr_hash              0.856873       7.80    468451 515.52
fletcher             1.055389       7.80    468451 515.52
adler32              4.123254       7.80    468451 515.52
full_name_hash       1.152628      13.09    562501 687.24
xor                  1.033050      13.36    583189 694.98
lastchar             0.647504      25.60   1000000 980.27

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

--- a/include/linux/dcache.h	2009-10-26 14:58:45.220347300 -0700
+++ b/include/linux/dcache.h	2009-10-26 15:12:15.004160122 -0700
@@ -45,15 +45,28 @@  struct dentry_stat_t {
 };
 extern struct dentry_stat_t dentry_stat;
 
-/* Name hashing routines. Initial hash value */
-/* Hash courtesy of the R5 hash in reiserfs modulo sign bits */
-#define init_name_hash()		0
+/*
+ * Fowler / Noll / Vo (FNV) Hash
+ * see: http://www.isthe.com/chongo/tech/comp/fnv/
+ */
+#ifdef CONFIG_64BIT
+#define FNV_PRIME  1099511628211ull
+#define FNV1_INIT  14695981039346656037ull
+#else
+#define FNV_PRIME  16777619u
+#define FNV1_INIT  2166136261u
+#endif
+
+#define init_name_hash()	FNV1_INIT
 
-/* partial hash update function. Assume roughly 4 bits per character */
+/* partial hash update function. */
 static inline unsigned long
-partial_name_hash(unsigned long c, unsigned long prevhash)
+partial_name_hash(unsigned char c, unsigned long prevhash)
 {
-	return (prevhash + (c << 4) + (c >> 4)) * 11;
+	prevhash ^= c;
+	prevhash *= FNV_PRIME;
+
+	return prevhash;
 }
 
 /*