Message ID | 20091026153656.25be4369@nehalam |
---|---|
State | Not Applicable, archived |
Delegated to: | David Miller |
Headers | show |
Stephen Hemminger <shemminger@vyatta.com>, Al Viro a écrit : > --- a/include/linux/dcache.h 2009-10-26 14:58:45.220347300 -0700 > +++ b/include/linux/dcache.h 2009-10-26 15:12:15.004160122 -0700 > @@ -45,15 +45,28 @@ struct dentry_stat_t { > }; > extern struct dentry_stat_t dentry_stat; > > -/* Name hashing routines. Initial hash value */ > -/* Hash courtesy of the R5 hash in reiserfs modulo sign bits */ > -#define init_name_hash() 0 > +/* > + * Fowler / Noll / Vo (FNV) Hash > + * see: http://www.isthe.com/chongo/tech/comp/fnv/ > + */ > +#ifdef CONFIG_64BIT > +#define FNV_PRIME 1099511628211ull > +#define FNV1_INIT 14695981039346656037ull > +#else > +#define FNV_PRIME 16777619u > +#define FNV1_INIT 2166136261u > +#endif > + > +#define init_name_hash() FNV1_INIT > > -/* partial hash update function. Assume roughly 4 bits per character */ > +/* partial hash update function. */ > static inline unsigned long > -partial_name_hash(unsigned long c, unsigned long prevhash) > +partial_name_hash(unsigned char c, unsigned long prevhash) > { > - return (prevhash + (c << 4) + (c >> 4)) * 11; > + prevhash ^= c; > + prevhash *= FNV_PRIME; > + > + return prevhash; > } > > /* OK, but thats strlen(name) X (long,long) multiplies. I suspect you tested on recent x86_64 cpu. Some arches might have slow multiplies, no ? jhash() (and others) are optimized by compiler to use basic and fast operations. jhash operates on blocs of 12 chars per round, so it might be a pretty good choice once out-of-line (because its pretty large and full_name_hash() is now used by a lot of call sites) Please provide your test program source, so that other can test on various arches. Thanks -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, 27 Oct 2009 03:45:50 +0100 Eric Dumazet <eric.dumazet@gmail.com> wrote: > Stephen Hemminger <shemminger@vyatta.com>, Al Viro a écrit : > > --- a/include/linux/dcache.h 2009-10-26 14:58:45.220347300 -0700 > > +++ b/include/linux/dcache.h 2009-10-26 15:12:15.004160122 -0700 > > @@ -45,15 +45,28 @@ struct dentry_stat_t { > > }; > > extern struct dentry_stat_t dentry_stat; > > > > -/* Name hashing routines. Initial hash value */ > > -/* Hash courtesy of the R5 hash in reiserfs modulo sign bits */ > > -#define init_name_hash() 0 > > +/* > > + * Fowler / Noll / Vo (FNV) Hash > > + * see: http://www.isthe.com/chongo/tech/comp/fnv/ > > + */ > > +#ifdef CONFIG_64BIT > > +#define FNV_PRIME 1099511628211ull > > +#define FNV1_INIT 14695981039346656037ull > > +#else > > +#define FNV_PRIME 16777619u > > +#define FNV1_INIT 2166136261u > > +#endif > > + > > +#define init_name_hash() FNV1_INIT > > > > -/* partial hash update function. Assume roughly 4 bits per character */ > > +/* partial hash update function. */ > > static inline unsigned long > > -partial_name_hash(unsigned long c, unsigned long prevhash) > > +partial_name_hash(unsigned char c, unsigned long prevhash) > > { > > - return (prevhash + (c << 4) + (c >> 4)) * 11; > > + prevhash ^= c; > > + prevhash *= FNV_PRIME; > > + > > + return prevhash; > > } > > > > /* > > OK, but thats strlen(name) X (long,long) multiplies. > > I suspect you tested on recent x86_64 cpu. > > Some arches might have slow multiplies, no ? > > jhash() (and others) are optimized by compiler to use basic and fast operations. > jhash operates on blocs of 12 chars per round, so it might be a pretty good choice once > out-of-line (because its pretty large and full_name_hash() is now used by > a lot of call sites) > > Please provide your test program source, so that other can test on various arches. > > Thanks long on i386 is 32 bits so it is 32 bit multiply. There is also an optimization that uses shift and adds. --
Previously Stephen kindly sent me the source and instructions, and attached are results from 1.0 GHz Itanium "McKinley" processors using an older gcc, both -O2 and -O3, -O2 first: >>> >>> >>> $ ./hashtest 10000000 14 | sort -n -k 3 -k 2 >>> Algorithm Time Ratio Max StdDev >>> string10 0.234133 1.00 612 0.03 >>> fnv32 0.241471 1.00 689 0.93 >>> fnv64 0.241964 1.00 680 0.85 >>> string_hash17 0.269656 1.00 645 0.36 >>> jhash_string 0.295795 1.00 702 1.00 >>> crc 1.609449 1.00 634 0.41 >>> md5_string 2.479467 1.00 720 0.99 >>> SuperFastHash 0.273793 1.01 900 2.13 >>> djb2 0.265877 1.15 964 9.52 >>> string_hash31 0.259110 1.21 1039 11.39 >>> sdbm 0.369414 2.87 3268 33.77 >>> elf 0.372251 3.71 2907 40.71 >>> pjw 0.401732 3.71 2907 40.71 >>> full_name_hash 0.283508 13.09 8796 85.91 >>> kr_hash 0.220033 499.17 468448 551.55 >>> fletcher 0.267009 499.17 468448 551.55 >>> adler32 0.635047 499.17 468448 551.55 >>> xor 0.220314 854.94 583189 722.12 >>> lastchar 0.155236 1637.61 1000000 999.69 >> >> >> here then are both, from a 1.0 GHz McKinley system, 64-bit, using an older >> gcc >> >> raj@oslowest:~/hashtest$ ./hashtest 10000000 14 | sort -n -k 3 -k 2 >> Algorithm Time Ratio Max StdDev >> string_hash17 0.901319 1.00 645 0.36 >> string10 0.986391 1.00 612 0.03 >> jhash_string 1.422065 1.00 702 1.00 >> fnv32 1.705116 1.00 689 0.93 >> fnv64 1.900326 1.00 680 0.85 >> crc 3.651519 1.00 634 0.41 >> md5_string 14.155621 1.00 720 0.99 >> SuperFastHash 1.185206 1.01 900 2.13 >> djb2 0.977166 1.15 964 9.52 >> string_hash31 0.989804 1.21 1039 11.39 >> sdbm 1.188299 2.87 3268 33.77 >> pjw 1.185963 3.71 2907 40.71 >> elf 1.257023 3.71 2907 40.71 >> full_name_hash 1.231514 13.09 8796 85.91 >> kr_hash 0.890761 499.17 468448 551.55 >> fletcher 1.080981 499.17 468448 551.55 >> adler32 4.141714 499.17 468448 551.55 >> xor 1.061445 854.94 583189 722.12 >> lastchar 0.676697 1637.61 1000000 999.69 >> >> raj@oslowest:~/hashtest$ ./hashtest 10000000 8 | sort -n -k 3 -k 2 >> Algorithm Time Ratio Max StdDev >> string_hash17 0.899988 1.00 39497 1.50 >> string10 0.985100 1.00 39064 0.01 >> SuperFastHash 1.141748 1.00 40497 2.17 >> jhash_string 1.376414 1.00 39669 1.04 >> fnv32 1.656967 1.00 39895 2.25 >> fnv64 1.855259 1.00 39215 0.35 >> crc 3.615341 1.00 39088 0.07 >> md5_string 14.113307 1.00 39605 0.98 >> djb2 0.972180 1.15 60681 76.16 >> string_hash31 0.982233 1.21 64950 91.12 >> sdbm 1.181952 2.38 129900 232.22 >> pjw 1.178994 2.45 99990 237.86 >> elf 1.250936 2.45 99990 237.86 >> kr_hash 0.892633 7.80 468451 515.52 >> fletcher 1.082932 7.80 468451 515.52 >> adler32 4.142414 7.80 468451 515.52 >> full_name_hash 1.175324 13.09 562501 687.24 >> xor 1.060091 13.36 583189 694.98 >> lastchar 0.675610 25.60 1000000 980.27 >> >> raj@oslowest:~/hashtest$ gcc -v >> Using built-in specs. >> Target: ia64-linux-gnu >> Configured with: ../src/configure -v --enable-languages=c,c++,fortran,objc,obj-c++,treelang --prefix=/usr --enable-shared --with-system-zlib --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --enable-nls --program-suffix=-4.1 --enable-__cxa_atexit --enable-clocale=gnu --enable-libstdcxx-debug --enable-mpfr --disable-libssp --with-system-libunwind --enable-checking=release ia64-linux-gnu >> Thread model: posix >> gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21) >> raj@oslowest:~/hashtest$ >> >> fnv doesn't seem to do as well there relative to the others as it did in your >> tests. > > > > You could try -O3 since then gcc may replace the multiply with shift/add > or is there something about forcing 32 and 64 bit that makes ia64 suffer. It seems to speed things up, but the relative ordering remains the same: oslowest:/home/raj/hashtest# make cc -O3 -Wall -c -o hashtest.o hashtest.c cc -O3 -Wall -c -o md5.o md5.c cc -lm hashtest.o md5.o -o hashtest oslowest:/home/raj/hashtest# ./hashtest 10000000 14 | sort -n -k 3 -k 2 Algorithm Time Ratio Max StdDev string_hash17 0.893813 1.00 645 0.36 string10 0.965596 1.00 612 0.03 jhash_string 1.387773 1.00 702 1.00 fnv32 1.699041 1.00 689 0.93 fnv64 1.882314 1.00 680 0.85 crc 3.273676 1.00 634 0.41 md5_string 13.913745 1.00 720 0.99 SuperFastHash 1.135802 1.01 900 2.13 djb2 0.951571 1.15 964 9.52 string_hash31 0.971081 1.21 1039 11.39 sdbm 1.168148 2.87 3268 33.77 pjw 1.159304 3.71 2907 40.71 elf 1.237662 3.71 2907 40.71 full_name_hash 1.212588 13.09 8796 85.91 kr_hash 0.856584 499.17 468448 551.55 fletcher 1.054516 499.17 468448 551.55 adler32 4.123742 499.17 468448 551.55 xor 1.031910 854.94 583189 722.12 lastchar 0.648597 1637.61 1000000 999.69 oslowest:/home/raj/hashtest# ./hashtest 10000000 8 | sort -n -k 3 -k 2 Algorithm Time Ratio Max StdDev string_hash17 0.884829 1.00 39497 1.50 string10 0.962258 1.00 39064 0.01 SuperFastHash 1.088602 1.00 40497 2.17 jhash_string 1.340878 1.00 39669 1.04 fnv32 1.637096 1.00 39895 2.25 fnv64 1.842330 1.00 39215 0.35 crc 3.230291 1.00 39088 0.07 md5_string 13.863056 1.00 39605 0.98 djb2 0.944159 1.15 60681 76.16 string_hash31 0.961978 1.21 64950 91.12 sdbm 1.159156 2.38 129900 232.22 pjw 1.154286 2.45 99990 237.86 elf 1.232842 2.45 99990 237.86 kr_hash 0.856873 7.80 468451 515.52 fletcher 1.055389 7.80 468451 515.52 adler32 4.123254 7.80 468451 515.52 full_name_hash 1.152628 13.09 562501 687.24 xor 1.033050 13.36 583189 694.98 lastchar 0.647504 25.60 1000000 980.27 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
--- a/include/linux/dcache.h 2009-10-26 14:58:45.220347300 -0700 +++ b/include/linux/dcache.h 2009-10-26 15:12:15.004160122 -0700 @@ -45,15 +45,28 @@ struct dentry_stat_t { }; extern struct dentry_stat_t dentry_stat; -/* Name hashing routines. Initial hash value */ -/* Hash courtesy of the R5 hash in reiserfs modulo sign bits */ -#define init_name_hash() 0 +/* + * Fowler / Noll / Vo (FNV) Hash + * see: http://www.isthe.com/chongo/tech/comp/fnv/ + */ +#ifdef CONFIG_64BIT +#define FNV_PRIME 1099511628211ull +#define FNV1_INIT 14695981039346656037ull +#else +#define FNV_PRIME 16777619u +#define FNV1_INIT 2166136261u +#endif + +#define init_name_hash() FNV1_INIT -/* partial hash update function. Assume roughly 4 bits per character */ +/* partial hash update function. */ static inline unsigned long -partial_name_hash(unsigned long c, unsigned long prevhash) +partial_name_hash(unsigned char c, unsigned long prevhash) { - return (prevhash + (c << 4) + (c >> 4)) * 11; + prevhash ^= c; + prevhash *= FNV_PRIME; + + return prevhash; } /*
Some experiments by Octavian with large numbers of network devices identified that name_hash does not evenly distribute values causing performance penalties. The name hashing function is used by dcache et. all so let's just choose a better one. Additional standalone tests for 10,000,000 consecutive names using lots of different algorithms shows fnv as the winner. It is faster and has almost ideal dispersion. string10 is slightly faster, but only works for names like ppp0, ppp1,... Algorithm Time Ratio Max StdDev string10 0.238201 1.00 2444 0.02 fnv32 0.240595 1.00 2576 1.05 fnv64 0.241224 1.00 2556 0.69 SuperFastHash 0.272872 1.00 2871 2.15 string_hash17 0.295160 1.00 2484 0.40 jhash_string 0.300925 1.00 2606 1.00 crc 1.606741 1.00 2474 0.29 md5_string 2.424771 1.00 2644 0.99 djb2 0.275424 1.15 3821 19.04 string_hash31 0.264806 1.21 4097 22.78 sdbm 0.371136 2.87 13016 67.54 elf 0.371279 3.59 9990 79.50 pjw 0.401172 3.59 9990 79.50 full_name_hash 0.285851 13.09 35174 171.81 kr_hash 0.245068 124.84 468448 549.89 fletcher 0.267664 124.84 468448 549.89 adler32 0.640668 124.84 468448 549.89 xor 0.220545 213.82 583189 720.85 lastchar 0.194604 409.57 1000000 998.78 Time is seconds. Ratio is how many probes required to lookup all values versus an ideal hash. Max is longest chain Reported-by: Octavian Purdila <opurdila@ixiacom.com> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html