Message ID:   20150513080640.GA27800@gondor.apana.org.au
State:        Changes Requested, archived
Delegated to: David Miller
From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Wed, 13 May 2015 16:06:40 +0800

> We currently have no limit on the number of elements in a hash table.
> This is a problem because some users (tipc) set a ceiling on the
> maximum table size, and when that is reached the hash table may
> degenerate. Others may encounter OOM when growing, and if we allow
> insertions when that happens the hash table performance may also
> suffer.
>
> This patch adds a new parameter, insecure_max_entries, which becomes
> the cap on the table. If unset it defaults to max_size. If that is
> also zero, there is no cap on the number of elements in the table.
> However, the table will grow whenever the utilisation hits 100%, and
> if that growth fails, you will get ENOMEM on insertion.
>
> As allowing >100% utilisation is potentially dangerous, the name
> contains the word insecure.
>
> Note that the cap is not a hard limit. This is done for performance
> reasons, as enforcing a hard limit would require atomic ops that are
> heavier than the ones we currently use.
>
> The reasoning is that we're only guarding against gross over-
> subscription of the table, rather than a small breach of the limit.
>
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

I'm not so sure I can get behind this change.

For the default case, where we propagate max_size into this new limit:
if we grow the hash table to its maximum size and then go just one past
the limit, we're going to start failing inserts.

In my opinion, up to at least 2 x max_size it is safe to allow the
insert, assuming a well-chosen hash function and a roughly even
distribution. A two-entry-deep hash chain is not a reason to fail a
hash insertion; we would not fail such insertions in pretty much any
other hash implementation in the kernel.

I'd seriously rather have 3-entry-deep hash chains than no new
connections at all.
If an administrator has to analyze a situation where this is happening
and sees -E2BIG propagating into their applications, it is going to be
surprising. And when they do all of the work to figure out what is
causing it, I am pretty sure they will be somewhat upset that we
considered it OK to do this.

So I'd like you to take some time to reconsider hash insert failures,
and only impose them when the alternative would create an unacceptable,
_massive_ hardship for the machine. And I do not think that 2- or
3-entry-deep hash chains qualify.

Thanks.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, May 14, 2015 at 10:22:17PM -0400, David Miller wrote:
>
> In my opinion, up to at least 2 X max_size, it's safe to allow the
> insert. Assuming a well-chosen hash function and a roughly even
> distribution.

OK, I can make it 2 x max_size/table size.

Cheers,
From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Fri, 15 May 2015 11:06:23 +0800

> On Thu, May 14, 2015 at 10:22:17PM -0400, David Miller wrote:
>>
>> In my opinion, up to at least 2 X max_size, it's safe to allow the
>> insert. Assuming a well-chosen hash function and a roughly even
>> distribution.
>
> OK, I can make it 2 x max_size/table size.

The rest of my email after what you quoted was intended to get you to
consider this issue generally. :-)

We wouldn't fail these inserts in any other hash table in the kernel.

Would we stop making new TCP sockets if the TCP ehash chains are 3
entries deep? 4? 5? The answer to all of those is of course no, for
any hash chain length N whatsoever.

This new rhashtable behavior would be the default, and I seriously
doubt it is a behavior that people who use a hash table, generally
speaking, desire or want.

Should there perhaps be hard protections for _extremely_ long hash
chains? Sure, I'm willing to entertain that kind of idea. But I would
do so at the very far end of the spectrum, at the point where the hash
table is degenerating into a linked list.
On Thu, May 14, 2015 at 11:46:15PM -0400, David Miller wrote:
>
> We wouldn't fail these inserts in any other hash table in the kernel.
>
> Would we stop making new TCP sockets if the TCP ehash chains are 3
> entries deep? 4? 5? The answer to all of those is of course no
> for any hash chain length of N whatsoever.

I would agree with you if this were a fixed-size table. If your table
grows with you (which is the whole point of rhashtable), then we
should never hit this limit until you run out of memory.

When you are running out of memory, whether you fail when the table
growth fails or later when you can't even allocate an entry is
immaterial, because failure is inevitable.

In my view everybody should be using rhashtable without a maximum
size. The only place where a maximum size limit makes sense is where
you also have a limit on the number of entries, in which case you
might as well make that the limit on the hash table size.

> Should there perhaps be hard protections for _extremely_ long hash
> chains? Sure, I'm willing to entertain that kind of idea. But I
> would do so at the very far end of the spectrum. To the point where
> the hash table is degenerating into a linked list.

Do you have any suggestions for what such a limit should be?

Cheers,
From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Fri, 15 May 2015 14:30:57 +0800

> On Thu, May 14, 2015 at 11:46:15PM -0400, David Miller wrote:
>>
>> We wouldn't fail these inserts in any other hash table in the kernel.
>>
>> Would we stop making new TCP sockets if the TCP ehash chains are 3
>> entries deep? 4? 5? The answer to all of those is of course no
>> for any hash chain length of N whatsoever.
>
> I would agree with you if this were a fixed-size table. If your
> table grows with you (which is the whole point of rhashtable), then
> we should never hit this until you run out of memory.
>
> When you are running out of memory, whether you fail when the table
> growth fails or later when you can't even allocate an entry is
> immaterial, because failure is inevitable.
>
> In my view everybody should be using rhashtable without a maximum
> size. The only place where it would make sense to have a maximum
> size limit is if you also had a limit on the number of entries, in
> which case you might as well make that the limit on the hash table
> size.

Ok, agreed.

>> Should there perhaps be hard protections for _extremely_ long hash
>> chains? Sure, I'm willing to entertain that kind of idea. But I
>> would do so at the very far end of the spectrum. To the point where
>> the hash table is degenerating into a linked list.
>
> Do you have any suggestions for what such a limit should be?

Good question. Obviously something like 50 or 100 is too much.
Perhaps something between 5 and 10. That's just my gut instinct
speaking.
On Sat, May 16, 2015 at 06:09:46PM -0400, David Miller wrote:
>
> Obviously something like 50 or 100 is too much.
>
> Perhaps something between 5 and 10.

You are even more parsimonious than I am :) Because the maximum chain
length grows at a rate of logN / loglogN, I had chosen the number 16,
which should be sufficient even for a 2^32 table (which you currently
can't have, as we max out somewhere below that).

FWIW, Thomas did some numbers on this, and apparently 10 gets breached
in a 2^24 table at 100% utilisation.

Cheers,
From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Sun, 17 May 2015 09:38:29 +0800

> On Sat, May 16, 2015 at 06:09:46PM -0400, David Miller wrote:
>>
>> Obviously something like 50 or 100 is too much.
>>
>> Perhaps something between 5 and 10.
>
> You are even more parsimonious than I am :) Because the maximum chain
> length grows at a rate of logN / loglogN, I had chosen the number
> 16, which should be sufficient even for a 2^32 table (which you
> currently can't have, as we max out somewhere below that).
>
> FWIW, Thomas did some numbers on this, and apparently 10 gets
> breached in a 2^24 table at 100% utilisation.

Ok, this of course depends upon the distribution of the input data and
the strength/suitability of the hash function.

I'm a little bit disappointed in what Thomas found. I would expect the
distribution to be at least a little bit better.
On Mon, May 18, 2015 at 04:12:01PM -0400, David Miller wrote:
>
> Ok, this of course depends upon the distribution of the input data
> and the strength/suitability of the hash function.
>
> I'm a little bit disappointed in what Thomas found. I would expect
> the distribution to be at least a little bit better.

Just to make it clear, this is not a problem of uneven distribution;
the same thing would happen with any hash function. It's essentially
a generalised birthday paradox. Of course the average chain length is
still what you expect it to be; it's only the maximum chain length
that grows at logN / loglogN.

Cheers,
From: Herbert Xu
> Sent: 18 May 2015 23:35
> On Mon, May 18, 2015 at 04:12:01PM -0400, David Miller wrote:
> >
> > Ok, this of course depends upon the distribution of the input data
> > and the strength/suitability of the hash function.
> >
> > I'm a little bit disappointed in what Thomas found. I would expect
> > the distribution to be at least a little bit better.
>
> Just to make it clear, this is not a problem of uneven distribution;
> the same thing would happen with any hash function. It's essentially
> a generalised birthday paradox.

Which means that with real data and a real hash function it is very
likely to be worse.

Did anyone look at what happens to the longest chains if you put 2 or
4 times as many items into the hash table? Clearly there will be more
short chains, but the longest might not get proportionally longer
(guessing at probability theory).

Personally I don't like the idea of rejecting inserts because chains
might get long, and certainly not because a malloc() fails in the
middle of the resize operation. I'm sure there are some tables where
slightly longer chains don't matter.

	David
diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h
index dbcbcc5..f3deeaa 100644
--- a/include/linux/rhashtable.h
+++ b/include/linux/rhashtable.h
@@ -17,6 +17,7 @@
 #ifndef _LINUX_RHASHTABLE_H
 #define _LINUX_RHASHTABLE_H
 
+#include <linux/atomic.h>
 #include <linux/compiler.h>
 #include <linux/errno.h>
 #include <linux/jhash.h>
@@ -100,6 +101,7 @@ struct rhashtable;
  * @key_len: Length of key
  * @key_offset: Offset of key in struct to be hashed
  * @head_offset: Offset of rhash_head in struct to be hashed
+ * @insecure_max_entries: Maximum number of entries (may be exceeded)
  * @max_size: Maximum size while expanding
  * @min_size: Minimum size while shrinking
  * @nulls_base: Base value to generate nulls marker
@@ -115,6 +117,7 @@ struct rhashtable_params {
 	size_t key_len;
 	size_t key_offset;
 	size_t head_offset;
+	unsigned int insecure_max_entries;
 	unsigned int max_size;
 	unsigned int min_size;
 	u32 nulls_base;
@@ -286,6 +289,18 @@ static inline bool rht_grow_above_100(const struct rhashtable *ht,
 	       (!ht->p.max_size || tbl->size < ht->p.max_size);
 }
 
+/**
+ * rht_grow_above_max - returns true if table is above maximum
+ * @ht: hash table
+ * @tbl: current table
+ */
+static inline bool rht_grow_above_max(const struct rhashtable *ht,
+				      const struct bucket_table *tbl)
+{
+	return ht->p.insecure_max_entries &&
+	       atomic_read(&ht->nelems) >= ht->p.insecure_max_entries;
+}
+
 /* The bucket lock is selected based on the hash and protects mutations
  * on a group of hash buckets.
  *
@@ -612,6 +627,10 @@ slow_path:
 			goto slow_path;
 	}
 
+	err = -E2BIG;
+	if (unlikely(rht_grow_above_max(ht, tbl)))
+		goto out;
+
 	err = 0;
 
 	head = rht_dereference_bucket(tbl->buckets[hash], tbl, hash);
diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index 4936fc4..cfd3408 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -14,6 +14,7 @@
  * published by the Free Software Foundation.
  */
 
+#include <linux/atomic.h>
 #include <linux/kernel.h>
 #include <linux/init.h>
 #include <linux/log2.h>
@@ -451,6 +452,10 @@ int rhashtable_insert_slow(struct rhashtable *ht, const void *key,
 	    rht_grow_above_100(ht, tbl))
 		goto exit;
 
+	err = -E2BIG;
+	if (unlikely(rht_grow_above_max(ht, tbl)))
+		goto exit;
+
 	err = 0;
 
 	head = rht_dereference_bucket(tbl->buckets[hash], tbl, hash);
@@ -734,6 +739,12 @@ int rhashtable_init(struct rhashtable *ht,
 	if (params->max_size)
 		ht->p.max_size = rounddown_pow_of_two(params->max_size);
 
+	if (params->insecure_max_entries)
+		ht->p.insecure_max_entries =
+			rounddown_pow_of_two(params->insecure_max_entries);
+	else
+		ht->p.insecure_max_entries = ht->p.max_size;
+
 	ht->p.min_size = max(ht->p.min_size, HASH_MIN_SIZE);
 
 	/* The maximum (not average) chain length grows with the
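[Editor's note: to pin down the semantics the patch implements, here is
a small userspace model of just the cap logic. This is not kernel code;
the field names mirror the patch, but everything else is simplified.
The cap defaults to max_size when unset, zero means unlimited, and once
the element count reaches the cap an insert fails with -E2BIG.]

```c
#include <assert.h>
#include <errno.h>

/* Userspace model of the insecure_max_entries semantics.  In the
 * kernel, nelems is an atomic_t that is read without a locked
 * test-and-increment, so concurrent inserters can overshoot the cap
 * slightly -- that is why it is a soft limit.  This single-threaded
 * model ignores that and only shows the intended behaviour. */
struct model_ht {
	unsigned int nelems;
	unsigned int max_size;
	unsigned int insecure_max_entries;
};

static void model_init(struct model_ht *ht, unsigned int max_size,
		       unsigned int max_entries)
{
	ht->nelems = 0;
	ht->max_size = max_size;
	/* As in rhashtable_init(): an unset cap defaults to max_size,
	 * and zero for both means no cap at all. */
	ht->insecure_max_entries = max_entries ? max_entries : max_size;
}

static int model_insert(struct model_ht *ht)
{
	/* Mirrors rht_grow_above_max(): reject once at/over the cap. */
	if (ht->insecure_max_entries &&
	    ht->nelems >= ht->insecure_max_entries)
		return -E2BIG;
	ht->nelems++;
	return 0;
}
```

For example, a table initialised with max_size = 4 and no explicit cap
accepts four inserts and fails the fifth with -E2BIG, while a table
with both fields zero never rejects on count.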
We currently have no limit on the number of elements in a hash table.
This is a problem because some users (tipc) set a ceiling on the
maximum table size, and when that is reached the hash table may
degenerate. Others may encounter OOM when growing, and if we allow
insertions when that happens the hash table performance may also
suffer.

This patch adds a new parameter, insecure_max_entries, which becomes
the cap on the table. If unset it defaults to max_size. If that is
also zero, there is no cap on the number of elements in the table.
However, the table will grow whenever the utilisation hits 100%, and
if that growth fails, you will get ENOMEM on insertion.

As allowing >100% utilisation is potentially dangerous, the name
contains the word insecure.

Note that the cap is not a hard limit. This is done for performance
reasons, as enforcing a hard limit would require atomic ops that are
heavier than the ones we currently use.

The reasoning is that we're only guarding against gross over-
subscription of the table, rather than a small breach of the limit.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>