Message ID | 20150424005729.GA27075@gondor.apana.org.au
---|---
State | Deferred, archived
Delegated to: | David Miller
On Fri, 2015-04-24 at 08:57 +0800, Herbert Xu wrote:

> It seems that I lost track somewhere along the line. I meant
> to add an explicit limit on the overall number of entries since
> that was what users like netlink expected but never got around
> to doing it. Instead it seems that we're currently relying on
> the rht_grow_above_100 to protect us.

This isn't really what I wanted though :-) I just wanted to test hash
collisions.

> We currently have no limit on the number of elements in a hash table.
> This is very bad especially considering that some rhashtable users
> had such a limit before the conversion and relied on it for defence
> against DoS attacks.
>
> We already have a maximum hash table size limit but its enforcement
> is only by luck and results in a nasty WARN_ON.

And doesn't actually work, the insertion appears to succeed :-)

> This patch adds a new paramater insecure_max_entries which becomes

typo: parameter

> the cap on the table. If unset it defaults to max_size.

So at least for my (admittedly testing only) use case, I wouldn't want
it to default to max_size, since the two at least *seem* to do
different things (max # of chains vs. max # of entries), no?

Anyway - since it's for testing only I guess I could even set max_size
to 4 and insecure_max_entries to something far bigger :)

> If it is
> also zero it means that there is no cap on the number of elements
> in the table. However, the table will grow whenever the utilisation
> hits 100% and if that growth fails, you will get ENOMEM on insertion.
>
> As allowing >100% utilisation is potentially dangerous, the name
> contains the word insecure.

Not sure I get this. So rhashtable is trying to actually never have
collisions? How could that possibly work?
> @@ -282,7 +285,20 @@ static inline bool rht_shrink_below_30(const struct rhashtable *ht,
>  static inline bool rht_grow_above_100(const struct rhashtable *ht,
>  				      const struct bucket_table *tbl)
>  {
> -	return atomic_read(&ht->nelems) > tbl->size;
> +	return atomic_read(&ht->nelems) > tbl->size &&
> +	       (!ht->p.max_size || tbl->size < ht->p.max_size);
> +}

Since you're also doing what I did here, would it make sense to apply
my patch to net and this one only to net-next? For my use case (which
was testing/debug) I don't actually care that much, but perhaps that'd
be an easier sell towards the end of the merge window :) It seems that
my patch would mostly fix the *issue*, while yours actually adds a new
parameter that's also not actually used yet.

The netlink hash table could potentially hit max_size and thus the
warning and the case I was hitting (on a system with >>64k netlink
sockets.)

johannes

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Apr 24, 2015 at 09:01:10AM +0200, Johannes Berg wrote:
>
> > As allowing >100% utilisation is potentially dangerous, the name
> > contains the word insecure.
>
> Not sure I get this. So rhashtable is trying to actually never have
> collisions? How could that possibly work?

Of course it's not to eliminate collisions, but to limit the chain
length to something reasonable. Using a hashtable at >100% capacity
is usually not reasonable.

> Since you're also doing what I did here, would it make sense to apply
> my patch to net and this one only to net-next?

That would make af_netlink a security hole again because you can now
add unlimited entries to a hash table limited to 64K entries. Prior
to the rhashtable conversion you weren't allowed to have more than
64K entries in the table. This was lost in the conversion and I was
trying to restore it.

> For my use case (which was testing/debug) I don't actually care that
> much, but perhaps that'd be an easier sell towards the end of the
> merge window :) It seems that my patch would mostly fix the *issue*,
> while yours actually adds a new parameter that's also not actually
> used yet.

Well if my patch is too complex then sure we can look at coming up
with a simpler fix but I think we need something that does not let
you add unlimited entries to af_netlink.

Cheers,
On 04/24/15 at 08:57am, Herbert Xu wrote:
> It seems that I lost track somewhere along the line. I meant
> to add an explicit limit on the overall number of entries since
> that was what users like netlink expected but never got around
> to doing it. Instead it seems that we're currently relying on
> the rht_grow_above_100 to protect us.

Can we please just take Johannes's fix as-is first? It fixes the bug
at hand in an isolated manner without introducing any new knobs. Your
patch includes his fix as-is without modification anyway.

> So here is a patch that adds an explicit limit and fixes the
> problem Johannes reported.
>
> ---8<---
> We currently have no limit on the number of elements in a hash table.
> This is very bad especially considering that some rhashtable users
> had such a limit before the conversion and relied on it for defence
> against DoS attacks.

Which users are you talking about? Both Netlink and TIPC still have
an upper limit. nft sets are controlled by privileged users.

> We already have a maximum hash table size limit but its enforcement
> is only by luck and results in a nasty WARN_ON.

As I stated earlier, this is no longer the case and thus this
paragraph only confuses the commit message.

> This patch adds a new paramater insecure_max_entries which becomes
> the cap on the table. If unset it defaults to max_size. If it is
> also zero it means that there is no cap on the number of elements
> in the table. However, the table will grow whenever the utilisation
> hits 100% and if that growth fails, you will get ENOMEM on insertion.

Last time we discussed this it was said that the caller should
enforce the limit like Netlink does. I'm fine with adding an upper
max but I'd like to discuss that in the context of a full series
which converts all existing enforcements and also contains a testing
mechanism to verify this.

Also, unless you can show me where this is currently a real bug, this
is really net-next material.
On Fri, Apr 24, 2015 at 09:06:08AM +0100, Thomas Graf wrote:
>
> Which users are you talking about? Both Netlink and TIPC still
> have an upper limit. nft sets are controlled by privileged users.

There is no limit in netlink apart from UINT_MAX AFAICS. Allowing
UINT_MAX entries into a hash table limited to 64K is not a good
thing.

Cheers,
On 04/24/15 at 04:12pm, Herbert Xu wrote:
> On Fri, Apr 24, 2015 at 09:06:08AM +0100, Thomas Graf wrote:
> >
> > Which users are you talking about? Both Netlink and TIPC still
> > have an upper limit. nft sets are controlled by privileged users.
>
> There is no limit in netlink apart from UINT_MAX AFAICS. Allowing
> UINT_MAX entries into a hash table limited to 64K is not a good
> thing.

OK, so you are saying that the Netlink limit is too low? Then let's
fix that.

You are claiming that the rhashtable conversion removed a cap. I'm
not seeing such a change. Can you point me to where netlink_insert()
enforced a cap pre-rhashtable?
On Fri, Apr 24, 2015 at 09:15:39AM +0100, Thomas Graf wrote:
>
> You are claiming that the rhashtable conversion removed a cap. I'm
> not seeing such a change. Can you point me to where netlink_insert()
> enforced a cap pre-rhashtable?

OK you are right. We never had such a limit. In that case I'm OK
with Johannes's original patch.

Cheers,
From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Fri, 24 Apr 2015 16:22:11 +0800

> On Fri, Apr 24, 2015 at 09:15:39AM +0100, Thomas Graf wrote:
>>
>> You are claiming that the rhashtable conversion removed a cap. I'm
>> not seeing such a change. Can you point me to where netlink_insert()
>> enforced a cap pre-rhashtable?
>
> OK you are right. We never had such a limit. In that case I'm
> OK with Johannes's original patch.

Ok, I'm applying Johannes's patch then, thanks.
diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h
index e23d242..95b5a36 100644
--- a/include/linux/rhashtable.h
+++ b/include/linux/rhashtable.h
@@ -17,6 +17,7 @@
 #ifndef _LINUX_RHASHTABLE_H
 #define _LINUX_RHASHTABLE_H
 
+#include <linux/atomic.h>
 #include <linux/compiler.h>
 #include <linux/errno.h>
 #include <linux/jhash.h>
@@ -100,6 +101,7 @@ struct rhashtable;
  * @key_len: Length of key
  * @key_offset: Offset of key in struct to be hashed
  * @head_offset: Offset of rhash_head in struct to be hashed
+ * @insecure_max_entries: Maximum number of entries (may be exceeded)
  * @max_size: Maximum size while expanding
  * @min_size: Minimum size while shrinking
  * @nulls_base: Base value to generate nulls marker
@@ -115,6 +117,7 @@ struct rhashtable_params {
 	size_t			key_len;
 	size_t			key_offset;
 	size_t			head_offset;
+	unsigned int		insecure_max_entries;
 	unsigned int		max_size;
 	unsigned int		min_size;
 	u32			nulls_base;
@@ -282,7 +285,20 @@ static inline bool rht_shrink_below_30(const struct rhashtable *ht,
 static inline bool rht_grow_above_100(const struct rhashtable *ht,
 				      const struct bucket_table *tbl)
 {
-	return atomic_read(&ht->nelems) > tbl->size;
+	return atomic_read(&ht->nelems) > tbl->size &&
+	       (!ht->p.max_size || tbl->size < ht->p.max_size);
+}
+
+/**
+ * rht_grow_above_max - returns true if table is above maximum
+ * @ht:		hash table
+ * @tbl:	current table
+ */
+static inline bool rht_grow_above_max(const struct rhashtable *ht,
+				      const struct bucket_table *tbl)
+{
+	return ht->p.insecure_max_entries &&
+	       atomic_read(&ht->nelems) >= ht->p.insecure_max_entries;
 }
 
 /* The bucket lock is selected based on the hash and protects mutations
@@ -611,6 +627,10 @@ slow_path:
 			goto slow_path;
 	}
 
+	err = -E2BIG;
+	if (unlikely(rht_grow_above_max(ht, tbl)))
+		goto out;
+
 	err = 0;
 
 	head = rht_dereference_bucket(tbl->buckets[hash], tbl, hash);
diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index 4898442..6716841 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -14,6 +14,7 @@
  * published by the Free Software Foundation.
  */
 
+#include <linux/atomic.h>
 #include <linux/kernel.h>
 #include <linux/init.h>
 #include <linux/log2.h>
@@ -446,6 +447,10 @@ int rhashtable_insert_slow(struct rhashtable *ht, const void *key,
 	    rht_grow_above_100(ht, tbl))
 		goto exit;
 
+	err = -E2BIG;
+	if (unlikely(rht_grow_above_max(ht, tbl)))
+		goto exit;
+
 	err = 0;
 
 	head = rht_dereference_bucket(tbl->buckets[hash], tbl, hash);
@@ -733,6 +738,12 @@ int rhashtable_init(struct rhashtable *ht,
 	if (params->max_size)
 		ht->p.max_size = rounddown_pow_of_two(params->max_size);
 
+	if (params->insecure_max_entries)
+		ht->p.insecure_max_entries =
+			rounddown_pow_of_two(params->insecure_max_entries);
+	else
+		ht->p.insecure_max_entries = ht->p.max_size;
+
 	ht->p.min_size = max(ht->p.min_size, HASH_MIN_SIZE);
 
 	/* The maximum (not average) chain length grows with the