Message ID: E1YX61x-0005m7-GV@gondolin.me.apana.org.au
State: Changes Requested, archived
Delegated to: David Miller
From: Herbert Xu
> Sent: 15 March 2015 10:44
> Keeping both size and shift is silly.  We only need one.
...
> --- a/lib/rhashtable.c
> +++ b/lib/rhashtable.c
> @@ -162,7 +162,6 @@ static struct bucket_table *bucket_table_alloc(struct rhashtable *ht,
> 		return NULL;
> 
> 	tbl->size = nbuckets;
> -	tbl->shift = ilog2(nbuckets);
> 
> 	if (alloc_bucket_locks(ht, tbl) < 0) {
> 		bucket_table_free(tbl);
> @@ -189,7 +188,7 @@ static bool rht_grow_above_75(const struct rhashtable *ht,
> {
> 	/* Expand table when exceeding 75% load */
> 	return atomic_read(&ht->nelems) > (tbl->size / 4 * 3) &&
> -	       (!ht->p.max_shift || tbl->shift < ht->p.max_shift);
> +	       (!ht->p.max_shift || tbl->size < (1 << ht->p.max_shift));

Looks like you could pre-calculate the 'grow_at' size.
The test above would then be:
	return atomic_read(&ht->nelems) > tbl->grow_at_size;

Similarly for the shrink.

	David

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
On 03/17/15 at 10:51am, David Laight wrote:
> Looks like you could pre-calculate the 'grow_at' size.
> The test above would then be:
> 	return atomic_read(&ht->nelems) > tbl->grow_at_size;
>
> Similarly for the shrink.

Given the discussions, the grow decision will likely change to
a max bucket length limit anyway.
On Tue, Mar 17, 2015 at 10:56:57AM +0000, tgraf@suug.ch wrote:
>
> Given the discussions, the grow decision will likely change to
> a max bucket length limit anyway.

Actually no.  In my patches the chain length is only used to
force an immediate rehash.  Growing is still based on the number
of elements.

The reason is that the maximum (not average) chain length actually
grows with the hash table size, even at 75% utilisation.

So unless we want ever decreasing utilisation as the table grows,
we have to cope with a few chains with many elements.  The limit
I'm currently using is 16, which shouldn't be hit even if you had
a 2^32 table (which you currently cannot).

Cheers,
On 03/17/15 at 10:00pm, Herbert Xu wrote:
> On Tue, Mar 17, 2015 at 10:56:57AM +0000, tgraf@suug.ch wrote:
> >
> > Given the discussions, the grow decision will likely change to
> > a max bucket length limit anyway.
>
> Actually no.  In my patches the chain length is only used to
> force an immediate rehash.  Growing is still based on the number
> of elements.
>
> The reason is that the maximum (not average) chain length actually
> grows with the hash table size, even at 75% utilisation.
>
> So unless we want ever decreasing utilisation as the table grows,
> we have to cope with a few chains with many elements.  The limit
> I'm currently using is 16, which shouldn't be hit even if you had
> a 2^32 table (which you currently cannot).

Not sure I understand the downside of a bucket length based grow
decision with optional forced rehashing plus synchronous table
realloc if we hit a 2nd watermark as you proposed earlier.  Shouldn't
we consider deterministic lookup and insert behaviour more important
than overall table utilization?  Given the rehashing, the grow
decision should not be attackable.
From: Herbert Xu
> Sent: 17 March 2015 11:01
> On Tue, Mar 17, 2015 at 10:56:57AM +0000, tgraf@suug.ch wrote:
> >
> > Given the discussions, the grow decision will likely change to
> > a max bucket length limit anyway.
>
> Actually no.  In my patches the chain length is only used to
> force an immediate rehash.  Growing is still based on the number
> of elements.
>
> The reason is that the maximum (not average) chain length actually
> grows with the hash table size, even at 75% utilisation.

That doesn't surprise me.

But won't the rehashed table be just as likely to have a long list?
So you are likely to get an immediate rehash?

	David
On Tue, Mar 17, 2015 at 11:22:44AM +0000, David Laight wrote:
>
> But won't the rehashed table be just as likely to have a long list?
> So you are likely to get an immediate rehash?

Not if the threshold is set to a value that should never be hit
at 100% utilisation.

Cheers,
On Tue, Mar 17, 2015 at 11:22:03AM +0000, tgraf@suug.ch wrote:
>
> Not sure I understand the downside of a bucket length based grow
> decision with optional forced rehashing plus synchronous table
> realloc if we hit a 2nd watermark as you proposed earlier.  Shouldn't
> we consider deterministic lookup and insert behaviour more important
> than overall table utilization?  Given the rehashing, the grow
> decision should not be attackable.

Do you really want to double the table size when 0.1% of the buckets
have a chain length > 4 but still < 16?

Cheers,
On 03/17/15 at 10:27pm, Herbert Xu wrote:
> On Tue, Mar 17, 2015 at 11:22:03AM +0000, tgraf@suug.ch wrote:
> >
> > Not sure I understand the downside of a bucket length based grow
> > decision with optional forced rehashing plus synchronous table
> > realloc if we hit a 2nd watermark as you proposed earlier.  Shouldn't
> > we consider deterministic lookup and insert behaviour more important
> > than overall table utilization?  Given the rehashing, the grow
> > decision should not be attackable.
>
> Do you really want to double the table size when 0.1% of the buckets
> have a chain length > 4 but still < 16?

If we constantly hit that bucket because we are handling just a few
TCP flows it would be worth doubling the size & rehashing to avoid
the additional cache misses of the linked list.

Although a limit of 4 would be too high.  Ideally we should resize
and rehash when we add the 2nd entry to a bucket to stay < 100%
utilization.  It seems likely though that we'll always have a bucket
with >= 2 entries, so we would end up constantly doubling and
rehashing.  This is the only thing that speaks for a table-wide
nelems counter, in my opinion.

Another option would be to continue resizing & rehashing as long as
we have at least one bucket with >= 4 entries but allow a table size
dependent limit of (length > 1 && length < 4) buckets.  This may be
overengineered again though ;-)
From: Thomas Graf
> Sent: 17 March 2015 11:58
...
> > Do you really want to double the table size when 0.1% of the buckets
> > have a chain length > 4 but still < 16?
>
> If we constantly hit that bucket because we are handling just a few
> TCP flows it would be worth doubling the size & rehashing to avoid
> the additional cache misses of the linked list.
>
> Although a limit of 4 would be too high.  Ideally we should resize
> and rehash when we add the 2nd entry to a bucket to stay < 100%
> utilization.  It seems likely though that we'll always have a bucket
> with >= 2 entries, so we would end up constantly doubling and
> rehashing.  This is the only thing that speaks for a table-wide
> nelems counter, in my opinion.

I think you are seriously overestimating the 'efficiency' of the
hash function, and not doing the 'birthday paradox' maths at all.

The only way you'll get a 'hash' that good is if you can pick the
input value in order to generate a perfect hash.
However you aren't going to manage that for inbound TCP connections
since none of the inputs to the hash can be chosen by the listening
system (unless IPv6 has something that can help you).

You may have to live with a few % of the items being on long chains.
Maybe count the number of items on chains longer than (say) 4 and
rehash or increase the table size if this exceeds a few % of the
table size.
(Or count the number of items that are further than 4 from the start
of the hash chain.)

	David
On 03/17/15 at 12:13pm, David Laight wrote:
> From: Thomas Graf
> > Sent: 17 March 2015 11:58
> ...
> > > Do you really want to double the table size when 0.1% of the buckets
> > > have a chain length > 4 but still < 16?
> >
> > If we constantly hit that bucket because we are handling just a few
> > TCP flows it would be worth doubling the size & rehashing to avoid
> > the additional cache misses of the linked list.
> >
> > Although a limit of 4 would be too high.  Ideally we should resize
> > and rehash when we add the 2nd entry to a bucket to stay < 100%
> > utilization.  It seems likely though that we'll always have a bucket
> > with >= 2 entries, so we would end up constantly doubling and
> > rehashing.  This is the only thing that speaks for a table-wide
> > nelems counter, in my opinion.
>
> I think you are seriously overestimating the 'efficiency' of the
> hash function, and not doing the 'birthday paradox' maths at all.
>
> The only way you'll get a 'hash' that good is if you can pick the
> input value in order to generate a perfect hash.
> However you aren't going to manage that for inbound TCP connections
> since none of the inputs to the hash can be chosen by the listening
> system (unless IPv6 has something that can help you).
>
> You may have to live with a few % of the items being on long chains.
> Maybe count the number of items on chains longer than (say) 4 and
> rehash or increase the table size if this exceeds a few % of the
> table size.
> (Or count the number of items that are further than 4 from the start
> of the hash chain.)

Did you read my email all the way?  You basically omitted to quote
and then rephrased the 2nd half of my email.
On Tue, Mar 17, 2015 at 12:13:42PM +0000, David Laight wrote:
> From: Thomas Graf
> > Sent: 17 March 2015 11:58
> ...
> > > Do you really want to double the table size when 0.1% of the buckets
> > > have a chain length > 4 but still < 16?
> >
> > If we constantly hit that bucket because we are handling just a few
> > TCP flows it would be worth doubling the size & rehashing to avoid
> > the additional cache misses of the linked list.
> >
> > Although a limit of 4 would be too high.  Ideally we should resize
> > and rehash when we add the 2nd entry to a bucket to stay < 100%
> > utilization.  It seems likely though that we'll always have a bucket
> > with >= 2 entries, so we would end up constantly doubling and
> > rehashing.  This is the only thing that speaks for a table-wide
> > nelems counter, in my opinion.
>
> I think you are seriously overestimating the 'efficiency' of the
> hash function, and not doing the 'birthday paradox' maths at all.

Agreed.  Thomas, an easy test to do is to pump the integers from
0 to 65535 into jhash, then mask it with 65535 and run sort |
uniq -c | sort -n on it, I think you'll see what we're talking
about.

Cheers,
On 03/17/15 at 11:20pm, Herbert Xu wrote:
> On Tue, Mar 17, 2015 at 12:13:42PM +0000, David Laight wrote:
> >
> > I think you are seriously overestimating the 'efficiency' of the
> > hash function, and not doing the 'birthday paradox' maths at all.
>
> Agreed.  Thomas, an easy test to do is to pump the integers from
> 0 to 65535 into jhash, then mask it with 65535 and run sort |
> uniq -c | sort -n on it, I think you'll see what we're talking
> about.

I'm not claiming perfect hash functions and this is exactly why I
think average utilization is not an optimal growth criterion: it
gives a very limited view into the actual chain lengths.

What you describe above is a 100% utilization scenario.  Initially
we talked about 0.1% utilization and whether to resize & rehash if
a single chain has length > 4.  My answer is: yes, we should resize
& rehash, or at least rehash, in that case.

My point here is that a chain length of 4 may be a serious
performance bottleneck already and that it might be worth trying to
detect bad hashing distribution and attempting to fix it at an
earlier stage, while ruling out the possibility of endless rehashes.
From: Thomas Graf
> Sent: 17 March 2015 12:40
...
> I'm not claiming perfect hash functions and this is exactly why I
> think average utilization is not an optimal growth criterion: it
> gives a very limited view into the actual chain lengths.

If you assume that all the entries are being looked up equally often,
the interesting number ought to be the number of failed compares.
So you want to sum 'length * (length - 1)/2' and compare it against
the total number of items (or the table size).

> What you describe above is a 100% utilization scenario.  Initially
> we talked about 0.1% utilization and whether to resize & rehash if
> a single chain has length > 4.  My answer is: yes, we should resize
> & rehash, or at least rehash, in that case.

0.1% utilisation is about as realistic as 100%.
50-75% is probably a reasonable limit.  This will give some short
chains.

> My point here is that a chain length of 4 may be a serious
> performance bottleneck already and that it might be worth trying to
> detect bad hashing distribution and attempting to fix it at an
> earlier stage, while ruling out the possibility of endless rehashes.

If you have 4 items and they are all on one chain then I suspect
that nothing you do will actually split them.
OTOH if you have 1000 items and one chain of length 4 it really
doesn't matter.
Unless, of course, someone arranges to flood the last item with
traffic - in which case you are going to lose anyway.

	David
On Tue, Mar 17, 2015 at 12:40:12PM +0000, 'tgraf@suug.ch' wrote:
>
> > Agreed.  Thomas, an easy test to do is to pump the integers from
> > 0 to 65535 into jhash, then mask it with 65535 and run sort |
> > uniq -c | sort -n on it, I think you'll see what we're talking
> > about.
>
> I'm not claiming perfect hash functions and this is exactly why I
> think average utilization is not an optimal growth criterion: it
> gives a very limited view into the actual chain lengths.
>
> What you describe above is a 100% utilization scenario.  Initially

Actually 75% is just as bad.  To test that just repeat my script
above and add head -n $((65536 / 4 * 3)) before the first sort.

My point is that to get below anything like a chain length limit
of 2 (or even 4), your hash table is going to be mostly empty.
OK, 0.1% might be an exaggeration, but it's certainly nowhere near
50% as your hash table size grows.

Please actually try out my test, here is a C program that will help
you do it:

#include <stdio.h>
#include <stdlib.h>

typedef unsigned char u8;
typedef unsigned int u32;

static inline u32 rol32(u32 word, unsigned int shift)
{
	return (word << shift) | (word >> (32 - shift));
}

/* Best hash sizes are of power of two */
#define jhash_size(n)   ((u32)1<<(n))
/* Mask the hash value, i.e (value & jhash_mask(n)) instead of (value % n) */
#define jhash_mask(n)   (jhash_size(n)-1)

/* __jhash_mix -- mix 3 32-bit values reversibly. */
#define __jhash_mix(a, b, c)			\
{						\
	a -= c;  a ^= rol32(c, 4);  c += b;	\
	b -= a;  b ^= rol32(a, 6);  a += c;	\
	c -= b;  c ^= rol32(b, 8);  b += a;	\
	a -= c;  a ^= rol32(c, 16); c += b;	\
	b -= a;  b ^= rol32(a, 19); a += c;	\
	c -= b;  c ^= rol32(b, 4);  b += a;	\
}

/* __jhash_final - final mixing of 3 32-bit values (a,b,c) into c */
#define __jhash_final(a, b, c)			\
{						\
	c ^= b; c -= rol32(b, 14);		\
	a ^= c; a -= rol32(c, 11);		\
	b ^= a; b -= rol32(a, 25);		\
	c ^= b; c -= rol32(b, 16);		\
	a ^= c; a -= rol32(c, 4);		\
	b ^= a; b -= rol32(a, 14);		\
	c ^= b; c -= rol32(b, 24);		\
}

#define __get_unaligned_cpu32(x) (*(u32 *)(x))

/* An arbitrary initial parameter */
#define JHASH_INITVAL		0xdeadbeef

/* jhash - hash an arbitrary key
 * @k: sequence of bytes as key
 * @length: the length of the key
 * @initval: the previous hash, or an arbitrary value
 *
 * The generic version, hashes an arbitrary sequence of bytes.
 * No alignment or length assumptions are made about the input key.
 *
 * Returns the hash value of the key. The result depends on endianness.
 */
static inline u32 jhash(const void *key, u32 length, u32 initval)
{
	u32 a, b, c;
	const u8 *k = key;

	/* Set up the internal state */
	a = b = c = JHASH_INITVAL + length + initval;

	/* All but the last block: affect some 32 bits of (a,b,c) */
	while (length > 12) {
		a += __get_unaligned_cpu32(k);
		b += __get_unaligned_cpu32(k + 4);
		c += __get_unaligned_cpu32(k + 8);
		__jhash_mix(a, b, c);
		length -= 12;
		k += 12;
	}
	/* Last block: affect all 32 bits of (c) */
	/* All the case statements fall through */
	switch (length) {
	case 12: c += (u32)k[11]<<24;
	case 11: c += (u32)k[10]<<16;
	case 10: c += (u32)k[9]<<8;
	case 9:  c += k[8];
	case 8:  b += (u32)k[7]<<24;
	case 7:  b += (u32)k[6]<<16;
	case 6:  b += (u32)k[5]<<8;
	case 5:  b += k[4];
	case 4:  a += (u32)k[3]<<24;
	case 3:  a += (u32)k[2]<<16;
	case 2:  a += (u32)k[1]<<8;
	case 1:  a += k[0];
		 __jhash_final(a, b, c);
	case 0: /* Nothing left to add */
		break;
	}

	return c;
}

/* jhash2 - hash an array of u32's
 * @k: the key which must be an array of u32's
 * @length: the number of u32's in the key
 * @initval: the previous hash, or an arbitrary value
 *
 * Returns the hash value of the key.
 */
static inline u32 jhash2(const u32 *k, u32 length, u32 initval)
{
	u32 a, b, c;

	/* Set up the internal state */
	a = b = c = JHASH_INITVAL + (length<<2) + initval;

	/* Handle most of the key */
	while (length > 3) {
		a += k[0];
		b += k[1];
		c += k[2];
		__jhash_mix(a, b, c);
		length -= 3;
		k += 3;
	}

	/* Handle the last 3 u32's: all the case statements fall through */
	switch (length) {
	case 3: c += k[2];
	case 2: b += k[1];
	case 1: a += k[0];
		__jhash_final(a, b, c);
	case 0:	/* Nothing left to add */
		break;
	}

	return c;
}

/* jhash_3words - hash exactly 3, 2 or 1 word(s) */
static inline u32 jhash_3words(u32 a, u32 b, u32 c, u32 initval)
{
	a += JHASH_INITVAL;
	b += JHASH_INITVAL;
	c += initval;

	__jhash_final(a, b, c);

	return c;
}

static inline u32 jhash_2words(u32 a, u32 b, u32 initval)
{
	return jhash_3words(a, b, 0, initval);
}

static inline u32 jhash_1word(u32 a, u32 initval)
{
	return jhash_3words(a, 0, 0, initval);
}

int main(int argc, char **argv)
{
	int i;
	struct {
		void *s;
		void *t;
		u32 l;
	} k = {
		.s = 0
	};
	int total;

	total = atoi(argv[1]);

	for (i = 0; i < total; i++) {
		k.l = i;
		printf("0x%x\n", jhash2((u32 *)&k, 3, 12345) & (total - 1));
	}

	return 0;
}

Cheers,
On 03/18/15 at 08:56am, Herbert Xu wrote:
> Actually 75% is just as bad.  To test that just repeat my script
> above and add head -n $((65536 / 4 * 3)) before the first sort.
>
> My point is that to get below anything like a chain length limit
> of 2 (or even 4), your hash table is going to be mostly empty.
> OK, 0.1% might be an exaggeration, but it's certainly nowhere near
> 50% as your hash table size grows.
>
> Please actually try out my test, here is a C program that will help
> you do it:

Thanks for providing that C program.  I played around with it and
noticed a small typo which I fixed up.  I'm attaching the version
I've used at the end of the mail.

Maybe there is a thinko on my part, but the results I'm getting are
very impressive.  I have a very hard time producing more than a
single hash conflict for sequences up to 0..2^27.  I also tried with
a pseudo random value for the u32 part of the key instead of the
sequence number and got the same results.

I'm running it like this:
for i in $(seq 1 27); do \
	echo $i: && ./jhash $((2**$i)) | uniq -D | uniq -c \
done

1:
2:
3:
      3 0x1
4:
      2 0xc
5:
6:
      2 0x2b
7:
8:
9:
10:
11:
      2 0x1c9
12:
13:
      2 0x9c7
      2 0x4e9
14:
15:
      2 0x5c8
16:
17:
18:
      2 0x34cb2
      2 0x21fd0
19:
      2 0x61fd0
      2 0x6e82f
20:
      2 0x61fd0
      2 0x6e82f
      2 0xfd571
21:
      2 0x1a0e02
      2 0x1776ea
22:
      2 0x379099
23:
      2 0x2650b6
      2 0x6dc5b5
      2 0x648fe1
      2 0x543446
24:
      2 0x8d60e0
      2 0x8726e5
25:
      2 0x1225f30
26:
      2 0x3a136f4
27:

--- jhash.c ---

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

typedef unsigned char u8;
typedef unsigned int u32;

static inline u32 rol32(u32 word, unsigned int shift)
{
	return (word << shift) | (word >> (32 - shift));
}

/* Best hash sizes are of power of two */
#define jhash_size(n)   ((u32)1<<(n))
/* Mask the hash value, i.e (value & jhash_mask(n)) instead of (value % n) */
#define jhash_mask(n)   (jhash_size(n)-1)

/* __jhash_mix -- mix 3 32-bit values reversibly. */
#define __jhash_mix(a, b, c)			\
{						\
	a -= c;  a ^= rol32(c, 4);  c += b;	\
	b -= a;  b ^= rol32(a, 6);  a += c;	\
	c -= b;  c ^= rol32(b, 8);  b += a;	\
	a -= c;  a ^= rol32(c, 16); c += b;	\
	b -= a;  b ^= rol32(a, 19); a += c;	\
	c -= b;  c ^= rol32(b, 4);  b += a;	\
}

/* __jhash_final - final mixing of 3 32-bit values (a,b,c) into c */
#define __jhash_final(a, b, c)			\
{						\
	c ^= b; c -= rol32(b, 14);		\
	a ^= c; a -= rol32(c, 11);		\
	b ^= a; b -= rol32(a, 25);		\
	c ^= b; c -= rol32(b, 16);		\
	a ^= c; a -= rol32(c, 4);		\
	b ^= a; b -= rol32(a, 14);		\
	c ^= b; c -= rol32(b, 24);		\
}

#define __get_unaligned_cpu32(x) (*(u32 *)(x))

/* An arbitrary initial parameter */
#define JHASH_INITVAL		0xdeadbeef

/* jhash - hash an arbitrary key
 * @k: sequence of bytes as key
 * @length: the length of the key
 * @initval: the previous hash, or an arbitrary value
 *
 * The generic version, hashes an arbitrary sequence of bytes.
 * No alignment or length assumptions are made about the input key.
 *
 * Returns the hash value of the key. The result depends on endianness.
 */
static inline u32 jhash(const void *key, u32 length, u32 initval)
{
	u32 a, b, c;
	const u8 *k = key;

	/* Set up the internal state */
	a = b = c = JHASH_INITVAL + length + initval;

	/* All but the last block: affect some 32 bits of (a,b,c) */
	while (length > 12) {
		a += __get_unaligned_cpu32(k);
		b += __get_unaligned_cpu32(k + 4);
		c += __get_unaligned_cpu32(k + 8);
		__jhash_mix(a, b, c);
		length -= 12;
		k += 12;
	}
	/* Last block: affect all 32 bits of (c) */
	/* All the case statements fall through */
	switch (length) {
	case 12: c += (u32)k[11]<<24;
	case 11: c += (u32)k[10]<<16;
	case 10: c += (u32)k[9]<<8;
	case 9:  c += k[8];
	case 8:  b += (u32)k[7]<<24;
	case 7:  b += (u32)k[6]<<16;
	case 6:  b += (u32)k[5]<<8;
	case 5:  b += k[4];
	case 4:  a += (u32)k[3]<<24;
	case 3:  a += (u32)k[2]<<16;
	case 2:  a += (u32)k[1]<<8;
	case 1:  a += k[0];
		 __jhash_final(a, b, c);
	case 0: /* Nothing left to add */
		break;
	}

	return c;
}

/* jhash2 - hash an array of u32's
 * @k: the key which must be an array of u32's
 * @length: the number of u32's in the key
 * @initval: the previous hash, or an arbitrary value
 *
 * Returns the hash value of the key.
 */
static inline u32 jhash2(const u32 *k, u32 length, u32 initval)
{
	u32 a, b, c;

	/* Set up the internal state */
	a = b = c = JHASH_INITVAL + (length<<2) + initval;

	/* Handle most of the key */
	while (length > 3) {
		a += k[0];
		b += k[1];
		c += k[2];
		__jhash_mix(a, b, c);
		length -= 3;
		k += 3;
	}

	/* Handle the last 3 u32's: all the case statements fall through */
	switch (length) {
	case 3: c += k[2];
	case 2: b += k[1];
	case 1: a += k[0];
		__jhash_final(a, b, c);
	case 0:	/* Nothing left to add */
		break;
	}

	return c;
}

/* jhash_3words - hash exactly 3, 2 or 1 word(s) */
static inline u32 jhash_3words(u32 a, u32 b, u32 c, u32 initval)
{
	a += JHASH_INITVAL;
	b += JHASH_INITVAL;
	c += initval;

	__jhash_final(a, b, c);

	return c;
}

static inline u32 jhash_2words(u32 a, u32 b, u32 initval)
{
	return jhash_3words(a, b, 0, initval);
}

static inline u32 jhash_1word(u32 a, u32 initval)
{
	return jhash_3words(a, 0, 0, initval);
}

int main(int argc, char **argv)
{
	int i;
	struct {
		void *s;
		void *t;
		u32 l;
	} k = {
		.s = 0
	};
	int total;
	u32 initval;

	total = atoi(argv[1]);

	srandom(time(0));
	initval = random();

	for (i = 0; i < total; i++) {
		k.l = i;
		printf("0x%x\n",
		       jhash2((u32 *)&k, sizeof(k)/4, initval) & (total - 1));
	}

	return 0;
}
On Wed, Mar 18, 2015 at 09:51:02AM +0000, 'tgraf@suug.ch' wrote:
>
> I'm running it like this:
> for i in $(seq 1 27); do \
> 	echo $i: && ./jhash $((2**$i)) | uniq -D | uniq -c \

uniq -D doesn't do what you think it does:

$ { echo a; echo b; echo a; } | uniq -D
$

So you should use sort | uniq -c

Cheers,
On 03/18/15 at 08:55pm, Herbert Xu wrote:
> On Wed, Mar 18, 2015 at 09:51:02AM +0000, 'tgraf@suug.ch' wrote:
> >
> > I'm running it like this:
> > for i in $(seq 1 27); do \
> > 	echo $i: && ./jhash $((2**$i)) | uniq -D | uniq -c \
>
> uniq -D doesn't do what you think it does:
>
> $ { echo a; echo b; echo a; } | uniq -D
> $
>
> So you should use sort | uniq -c

Thanks.  The important keyword is "adjacent" here ;-)

With this fixed up I can see what you mean.  So if we are to only
do a chain length based decision, the limit would have to grow
together with the table size.

for i in $(seq 1 27); do
	echo $i: && ./jhash $((2**$i)) | sort | uniq -D | uniq -c | sort -n -r | head -5
done

1:
      2 0x0
2:
      3 0x0
3:
      3 0x0
      2 0x2
4:
      2 0xb
      2 0x8
      2 0x4
      2 0x2
      2 0x0
5:
      3 0xe
      3 0x13
      2 0xc
      2 0xa
      2 0x8
6:
      3 0x38
      3 0x37
      3 0x32
      3 0x2f
      3 0x2e
7:
      4 0x37
      3 0x7d
      3 0x79
      3 0x6f
      3 0x69
8:
      6 0xb7
      5 0x79
      4 0xf4
      4 0xe4
      4 0xa3
9:
      5 0xb7
      5 0x9e
      5 0x15b
      5 0x13b
      4 0xdd
10:
      5 0xa4
      5 0x79
      5 0x29e
      5 0x22a
      4 0xf4
11:
      5 0x92
      5 0x4a4
      5 0x479
      5 0x47
      5 0x379
12:
      6 0xf1
      6 0x751
      5 0xf00
      5 0xe26
      5 0xc26
13:
      7 0xf1
      6 0x520
      6 0x1e06
      6 0x1c44
      6 0x11ac
14:
      7 0x3174
      7 0x2f12
      6 0x520
      6 0x3763
      6 0x3615
15:
      7 0xf4
      7 0x7d0b
      7 0x179
      6 0x807
      6 0x69c9
16:
      7 0xf516
      7 0xe659
      7 0xdf00
      7 0x5f5a
      7 0x5cc3
17:
      7 0xe659
      7 0xde5a
      7 0xcaf
      7 0x9cce
      7 0x7fe3
18:
      9 0x17f9b
      8 0x3852d
      7 0x992b
      7 0x5e38
      7 0x3d39e
19:
      8 0x5f22b
      8 0x47d66
      8 0x436ba
      8 0x182aa
      7 0xe095
20:
      8 0xf87dd
      8 0xf464d
      8 0xef51d
      8 0xc08d4
      8 0x91a96
21:
      9 0xa7a90
      9 0x1fc5ad
      8 0xfae2f
      8 0xd4123
      8 0xb5686
22:
      9 0x97b11
      9 0x3fcfa3
      8 0xffa9e
      8 0xfa437
      8 0xef7e6
23:
     10 0x3b3115
      9 0x5cb5ba
      9 0x568fbf
      9 0x54d790
      9 0x4c2393
24:
     10 0x86373a
     10 0x35794f
      9 0xf6d21e
      9 0xde5301
      9 0xb36b9d
On Wed, Mar 18, 2015 at 10:08:12AM +0000, 'tgraf@suug.ch' wrote:
>
> With this fixed up I can see what you mean.  So if we are to only
> do a chain length based decision, the limit would have to grow
> together with the table size.

Actually we could just use a flat limit of 16 since our maximum
table size is 2^32 (actually a bit less) and 16 should be plenty
for that.

Remember this limit is only there to detect when somebody who has
stolen our secret is trying to DoS us.  So if we can keep them
under 16 per-bucket it should be sufficient to defeat any attack.

Of course I will also add a patch to limit the number of elements
to the table size (so maximum utilisation is 100%).  This will come
after we allow insertions to fail.

Cheers,
From: Herbert Xu
..
> Of course I will also add a patch to limit the number of elements
> to the table size (so maximum utilisation is 100%).  This will come
> after we allow insertions to fail.

There may be some uses where short hash chains are acceptable.
In which case a utilisation way above 100% may make sense.

If a table is likely to have repeated lookups for the same item
then it can make sense to cache the last looked up item for each
chain.  This cached value can be written without any locking - the
only place care is needed is ensuring it doesn't reference an item
that has just been deleted.
(You'd want a separate array so as not to dirty the cache lines
containing the head pointers.  Although for a large table they are
misses anyway.)

	David
On 03/18/15 at 09:12pm, Herbert Xu wrote:
> On Wed, Mar 18, 2015 at 10:08:12AM +0000, 'tgraf@suug.ch' wrote:
> >
> > With this fixed up I can see what you mean.  So if we are to only
> > do a chain length based decision, the limit would have to grow
> > together with the table size.
>
> Actually we could just use a flat limit of 16 since our maximum
> table size is 2^32 (actually a bit less) and 16 should be plenty
> for that.
>
> Remember this limit is only there to detect when somebody who has
> stolen our secret is trying to DoS us.  So if we can keep them
> under 16 per-bucket it should be sufficient to defeat any attack.

Agreed, that is certainly sufficient to avoid attacks.  I didn't
want to give up on getting rid of the need for a table-wide nelems
counter *inside* rhashtable if we have to maintain the chain length
anyway, but I can see the difficulties.  The only approach that
seems to work is to count the number of buckets with a chain length
above a limit that is relative to the table size, or something
alike, which requires counting at the table level; and if we have
to do that we might as well just count the number of elements in
the table.

> Of course I will also add a patch to limit the number of elements
> to the table size (so maximum utilisation is 100%).  This will come
> after we allow insertions to fail.

I'm also attaching the distribution of chain lengths per table size
for completion's sake.
Percentage of buckets with more than 1 entry:
 1:  0%
 2: 50%
 3: 25%
 4: 25%
 5: 28%
 6: 28%
 7: 31%
 8: 26%
 9: 25%
10: 27%
11: 26%
12: 26%
13: 26%
14: 26%
15: 26%
16: 26%
17: 26%
18: 26%
19: 26%
20: 26%
21: 26%
22: 26%
23: 26%

Distribution of chain lengths (count x length):
 1:  1x2
 2:  2x2
 3:  1x2  1x3
 4:  3x2  1x3
 5:  6x2  4x3
 6:  11x2  4x3  1x4
 7:  27x2  8x3  2x4
 8:  50x2  15x3  3x4  1x5
 9:  101x2  33x3  6x4  1x5
10:  201x2  61x3  17x4  2x5
11:  361x2  134x3  29x4  5x5  1x6
12:  784x2  242x3  49x4  14x5  3x6
13:  1552x2  472x3  118x4  28x5  5x6  1x7
14:  2965x2  996x3  262x4  51x5  9x6  1x7  1x8
15:  6010x2  2029x3  510x4  90x5  21x6  2x7
16:  11993x2  4068x3  1014x4  175x5  40x6  5x7  1x8
17:  24123x2  8100x3  1969x4  403x5  79x6  10x7  3x8
18:  48127x2  16145x3  3930x4  825x5  137x6  25x7  1x8  1x9
19:  96558x2  32250x3  7990x4  1619x5  264x6  40x7  7x8
20:  193267x2  64476x3  16051x4  3140x5  544x6  71x7  9x8  1x9
21:  385293x2  128850x3  32366x4  6344x5  1017x6  128x7  21x8  2x9
22:  769535x2  257779x3  64534x4  13052x5  2177x6  322x7  25x8  1x9
23:  1540606x2  514743x3  130192x4  26207x5  4403x6  635x7  73x8  11x9
24:  3073982x2  1032727x3  262044x4  53475x5  9179x6  1353x7  184x8  14x9  2x10  1x11
25:  6124258x2  2071733x3  534130x4  111862x5  19754x6  3030x7  419x8  56x9  5x10  1x12
diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h
index 1695378..f16e856 100644
--- a/include/linux/rhashtable.h
+++ b/include/linux/rhashtable.h
@@ -51,7 +51,6 @@ struct rhash_head {
  * @size: Number of hash buckets
  * @rehash: Current bucket being rehashed
  * @hash_rnd: Random seed to fold into hash
- * @shift: Current size (1 << shift)
  * @locks_mask: Mask to apply before accessing locks[]
  * @locks: Array of spinlocks protecting individual buckets
  * @walkers: List of active walkers
@@ -63,7 +62,6 @@ struct bucket_table {
 	unsigned int		size;
 	unsigned int		rehash;
 	u32			hash_rnd;
-	u32			shift;
 	unsigned int		locks_mask;
 	spinlock_t		*locks;
 	struct list_head	walkers;
diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index c523d3a..4a99c0c 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -162,7 +162,6 @@ static struct bucket_table *bucket_table_alloc(struct rhashtable *ht,
 		return NULL;
 
 	tbl->size = nbuckets;
-	tbl->shift = ilog2(nbuckets);
 
 	if (alloc_bucket_locks(ht, tbl) < 0) {
 		bucket_table_free(tbl);
@@ -189,7 +188,7 @@ static bool rht_grow_above_75(const struct rhashtable *ht,
 {
 	/* Expand table when exceeding 75% load */
 	return atomic_read(&ht->nelems) > (tbl->size / 4 * 3) &&
-	       (!ht->p.max_shift || tbl->shift < ht->p.max_shift);
+	       (!ht->p.max_shift || tbl->size < (1 << ht->p.max_shift));
 }
 
 /**
@@ -202,7 +201,7 @@ static bool rht_shrink_below_30(const struct rhashtable *ht,
 {
 	/* Shrink table beneath 30% load */
 	return atomic_read(&ht->nelems) < (tbl->size * 3 / 10) &&
-	       tbl->shift > ht->p.min_shift;
+	       tbl->size > (1 << ht->p.min_shift);
 }
 
 static int rhashtable_rehash_one(struct rhashtable *ht, unsigned old_hash)
Keeping both size and shift is silly.  We only need one.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---
 include/linux/rhashtable.h | 2 --
 lib/rhashtable.c           | 5 ++---
 2 files changed, 2 insertions(+), 5 deletions(-)