
rhashtable: Add cap on number of elements in hash table

Message ID 20150424005729.GA27075@gondor.apana.org.au
State Deferred, archived
Delegated to: David Miller

Commit Message

Herbert Xu April 24, 2015, 12:57 a.m. UTC
On Thu, Apr 23, 2015 at 11:59:19AM -0400, David Miller wrote:
> From: Johannes Berg <johannes@sipsolutions.net>
> Date: Thu, 23 Apr 2015 16:38:43 +0200
> 
> > From: Johannes Berg <johannes.berg@intel.com>
> > 
> > The conversion of mac80211's station table to rhashtable had a bug
> > that I found by accident in code review, that hadn't been found as
> > rhashtable apparently managed to have a maximum hash chain length
> > of one (!) in all our testing.
> > 
> > In order to test the bug and verify the fix I set my rhashtable's
> > max_size very low (4) in order to force getting hash collisions.
> > 
> > At that point, rhashtable WARNed in rhashtable_insert_rehash() but
> > didn't actually reject the hash table insertion. This caused it to
> > lose insertions - my master list of stations would have 9 entries,
> > but the rhashtable only had 5. This may warrant a deeper look, but
> > that WARN_ON() just shouldn't happen.
> > 
> > Fix this by not returning true from rht_grow_above_100() when the
> > rhashtable's max_size has been reached - in this case the user is
> > explicitly configuring it to be at most that big, so even if it's
> > now above 100% it shouldn't attempt to resize.
> > 
> > This fixes the "lost insertion" issue and consequently allows my
> > code to display its error (and verify my fix for it.)
> > 
> > Signed-off-by: Johannes Berg <johannes.berg@intel.com>
> 
> It looks fine to me, but I'll let Herbert and Thomas review this.

Thanks for the heads up.

It seems that I lost track somewhere along the line.  I meant
to add an explicit limit on the overall number of entries since
that was what users like netlink expected but never got around
to doing it.  Instead it seems that we're currently relying on
the rht_grow_above_100 to protect us.

So here is a patch that adds an explicit limit and fixes the
problem Johannes reported.

---8<---
We currently have no limit on the number of elements in a hash table.
This is very bad especially considering that some rhashtable users
had such a limit before the conversion and relied on it for defence
against DoS attacks.

We already have a maximum hash table size limit but its enforcement
is only by luck and results in a nasty WARN_ON.

This patch adds a new paramater insecure_max_entries which becomes
the cap on the table.  If unset it defaults to max_size.  If it is
also zero it means that there is no cap on the number of elements
in the table.  However, the table will grow whenever the utilisation
hits 100% and if that growth fails, you will get ENOMEM on insertion.

As allowing >100% utilisation is potentially dangerous, the name
contains the word insecure.

Note that the cap is not a hard limit.  This is done for performance
reasons: enforcing a hard limit would require atomic operations
heavier than the ones we currently use.

The reasoning is that we're only guarding against a gross over-
subscription of the table, rather than a small breach of the limit.

Reported-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Comments

Johannes Berg April 24, 2015, 7:01 a.m. UTC | #1
On Fri, 2015-04-24 at 08:57 +0800, Herbert Xu wrote:

> It seems that I lost track somewhere along the line.  I meant
> to add an explicit limit on the overall number of entries since
> that was what users like netlink expected but never got around
> to doing it.  Instead it seems that we're currently relying on
> the rht_grow_above_100 to protect us.

This isn't really what I wanted though :-)

I just wanted to test hash collisions.

> We currently have no limit on the number of elements in a hash table.
> This is very bad especially considering that some rhashtable users
> had such a limit before the conversion and relied on it for defence
> against DoS attacks.
> 
> We already have a maximum hash table size limit but its enforcement
> is only by luck and results in a nasty WARN_ON.

And doesn't actually work, the insertion appears to succeed :-)

> This patch adds a new paramater insecure_max_entries which becomes

typo: parameter

> the cap on the table.  If unset it defaults to max_size.  

So at least for my (admittedly testing only) use case, I wouldn't want
it to default to max_size, since the two at least *seem* to do different
things (max # of chains vs. max # of entries), no?

Anyway - since it's for testing only I guess I could even set max_size
to 4 and insecure_max_entries to something far bigger :)
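Such a testing setup might look roughly like this (a hypothetical sketch: `struct sta_info` and its field names are illustrative, and `insecure_max_entries` is the field proposed by this patch, not an existing parameter):

```c
/* Hypothetical debug configuration: a tiny bucket array (max_size = 4)
 * forces hash collisions, while the proposed insecure_max_entries cap
 * is left far larger so insertions are not rejected during the test. */
static const struct rhashtable_params sta_rht_params = {
	.key_len	= ETH_ALEN,	/* 6-byte MAC address */
	.key_offset	= offsetof(struct sta_info, addr),
	.head_offset	= offsetof(struct sta_info, hash_node),
	.max_size	= 4,		/* at most 4 buckets: guarantees chaining */
	.insecure_max_entries = 65536,	/* far bigger than max_size */
};
```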

> If it is
> also zero it means that there is no cap on the number of elements
> in the table.  However, the table will grow whenever the utilisation
> hits 100% and if that growth fails, you will get ENOMEM on insertion.
> 
> As allowing >100% utilisation is potentially dangerous, the name
> contains the word insecure.

Not sure I get this. So rhashtable is trying to actually never have
collisions? How could that possibly work?

> @@ -282,7 +285,20 @@ static inline bool rht_shrink_below_30(const struct rhashtable *ht,
>  static inline bool rht_grow_above_100(const struct rhashtable *ht,
>  				      const struct bucket_table *tbl)
>  {
> -	return atomic_read(&ht->nelems) > tbl->size;
> +	return atomic_read(&ht->nelems) > tbl->size &&
> +	       (!ht->p.max_size || tbl->size < ht->p.max_size);
> +}

Since you're also doing what I did here, would it make sense to apply my
patch to net and this one only to net-next?

For my use case (which was testing/debug) I don't actually care that
much, but perhaps that'd be an easier sell towards the end of the merge
window :) It seems that my patch would mostly fix the *issue*, while
yours actually adds a new parameter that's also not actually used yet.

The netlink hash table could potentially hit max_size and thus the
warning and the case I was hitting (on a system with >>64k netlink
sockets.)

johannes

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Herbert Xu April 24, 2015, 8:04 a.m. UTC | #2
On Fri, Apr 24, 2015 at 09:01:10AM +0200, Johannes Berg wrote:
>
> > As allowing >100% utilisation is potentially dangerous, the name
> > contains the word insecure.
> 
> Not sure I get this. So rhashtable is trying to actually never have
> collisions? How could that possibly work?

Of course it's not to eliminate collisions, but to limit the chain
length to something reasonable.  Using a hashtable at >100% capacity
is usually not reasonable.

> Since you're also doing what I did here, would it make sense to apply my
> patch to net and this one only to net-next?

That would make af_netlink a security hole again because you can now
add unlimited entries to a hash table limited to 64K entries.  Prior
to the rhashtable conversion you weren't allowed to have more than
64K entries in the table.  This was lost in the conversion and I was
trying to restore it.

> For my use case (which was testing/debug) I don't actually care that
> much, but perhaps that'd be an easier sell towards the end of the merge
> window :) It seems that my patch would mostly fix the *issue*, while
> yours actually adds a new parameter that's also not actually used yet.

Well if my patch is too complex then sure we can look at coming up
with a simpler fix but I think we need something that does not let
you add unlimited entries to af_netlink.

Cheers,
Thomas Graf April 24, 2015, 8:06 a.m. UTC | #3
On 04/24/15 at 08:57am, Herbert Xu wrote:
> It seems that I lost track somewhere along the line.  I meant
> to add an explicit limit on the overall number of entries since
> that was what users like netlink expected but never got around
> to doing it.  Instead it seems that we're currently relying on
> the rht_grow_above_100 to protect us.

Can we please just take Johannes's fix as-is first? It fixes
the bug at hand in an isolated manner without introducing any
new knobs. Your patch includes his fix as-is without modification
anyway.

> So here is a patch that adds an explicit limit and fixes the
> problem Johannes reported.
> 
> ---8<---
> We currently have no limit on the number of elements in a hash table.
> This is very bad especially considering that some rhashtable users
> had such a limit before the conversion and relied on it for defence
> against DoS attacks.

Which users are you talking about? Both Netlink and TIPC still
have an upper limit. nft sets are controlled by privileged users.

> We already have a maximum hash table size limit but its enforcement
> is only by luck and results in a nasty WARN_ON.

As I stated earlier, this is no longer the case and thus this
paragraph only confuses the commit message.

> This patch adds a new paramater insecure_max_entries which becomes
> the cap on the table.  If unset it defaults to max_size.  If it is
> also zero it means that there is no cap on the number of elements
> in the table.  However, the table will grow whenever the utilisation
> hits 100% and if that growth fails, you will get ENOMEM on insertion.

Last time we discussed this it was said that the caller should enforce
the limit like Netlink does.

I'm fine with adding an upper max but I'd like to discuss that in the
context of a full series which converts all existing enforcements and
also contains a testing mechanism to verify this. Also, unless you can
show me where this is currently a real bug, this is really net-next
material.

Herbert Xu April 24, 2015, 8:12 a.m. UTC | #4
On Fri, Apr 24, 2015 at 09:06:08AM +0100, Thomas Graf wrote:
>
> Which users are you talking about? Both Netlink and TIPC still
> have an upper limit. nft sets are controlled by privileged users.

There is no limit in netlink apart from UINT_MAX AFAICS.  Allowing
UINT_MAX entries into a hash table limited to 64K is not a good
thing.

Cheers,
Thomas Graf April 24, 2015, 8:15 a.m. UTC | #5
On 04/24/15 at 04:12pm, Herbert Xu wrote:
> On Fri, Apr 24, 2015 at 09:06:08AM +0100, Thomas Graf wrote:
> >
> > Which users are you talking about? Both Netlink and TIPC still
> > have an upper limit. nft sets are controlled by privileged users.
> 
> There is no limit in netlink apart from UINT_MAX AFAICS.  Allowing
> UINT_MAX entries into a hash table limited to 64K is not a good
> thing.

OK, so you are saying that the Netlink limit is too low? Then let's
fix that.

You are claiming that the rhashtable conversion removed a cap. I'm
not seeing such a change. Can you point me to where netlink_insert()
enforced a cap pre-rhashtable?
Herbert Xu April 24, 2015, 8:22 a.m. UTC | #6
On Fri, Apr 24, 2015 at 09:15:39AM +0100, Thomas Graf wrote:
> 
> You are claiming that the rhashtable conversion removed a cap. I'm
> not seeing such a change. Can you point me to where netlink_insert()
> enforced a cap pre-rhashtable?

OK you are right.  We never had such a limit.  In that case I'm
OK with Johannes's original patch.

Cheers,
David Miller April 24, 2015, 3:38 p.m. UTC | #7
From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Fri, 24 Apr 2015 16:22:11 +0800

> On Fri, Apr 24, 2015 at 09:15:39AM +0100, Thomas Graf wrote:
>> 
>> You are claiming that the rhashtable conversion removed a cap. I'm
>> not seeing such a change. Can you point me to where netlink_insert()
>> enforced a cap pre-rhashtable?
> 
> OK you are right.  We never had such a limit.  In that case I'm
> OK with Johannes's original patch.

Ok I'm applying Johannes's patch then, thanks.

Patch

diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h
index e23d242..95b5a36 100644
--- a/include/linux/rhashtable.h
+++ b/include/linux/rhashtable.h
@@ -17,6 +17,7 @@ 
 #ifndef _LINUX_RHASHTABLE_H
 #define _LINUX_RHASHTABLE_H
 
+#include <linux/atomic.h>
 #include <linux/compiler.h>
 #include <linux/errno.h>
 #include <linux/jhash.h>
@@ -100,6 +101,7 @@  struct rhashtable;
  * @key_len: Length of key
  * @key_offset: Offset of key in struct to be hashed
  * @head_offset: Offset of rhash_head in struct to be hashed
+ * @insecure_max_entries: Maximum number of entries (may be exceeded)
  * @max_size: Maximum size while expanding
  * @min_size: Minimum size while shrinking
  * @nulls_base: Base value to generate nulls marker
@@ -115,6 +117,7 @@  struct rhashtable_params {
 	size_t			key_len;
 	size_t			key_offset;
 	size_t			head_offset;
+	unsigned int		insecure_max_entries;
 	unsigned int		max_size;
 	unsigned int		min_size;
 	u32			nulls_base;
@@ -282,7 +285,20 @@  static inline bool rht_shrink_below_30(const struct rhashtable *ht,
 static inline bool rht_grow_above_100(const struct rhashtable *ht,
 				      const struct bucket_table *tbl)
 {
-	return atomic_read(&ht->nelems) > tbl->size;
+	return atomic_read(&ht->nelems) > tbl->size &&
+	       (!ht->p.max_size || tbl->size < ht->p.max_size);
+}
+
+/**
+ * rht_grow_above_max - returns true if table is above maximum
+ * @ht:		hash table
+ * @tbl:	current table
+ */
+static inline bool rht_grow_above_max(const struct rhashtable *ht,
+				      const struct bucket_table *tbl)
+{
+	return ht->p.insecure_max_entries &&
+	       atomic_read(&ht->nelems) >= ht->p.insecure_max_entries;
 }
 
 /* The bucket lock is selected based on the hash and protects mutations
@@ -611,6 +627,10 @@  slow_path:
 			goto slow_path;
 	}
 
+	err = -E2BIG;
+	if (unlikely(rht_grow_above_max(ht, tbl)))
+		goto out;
+
 	err = 0;
 
 	head = rht_dereference_bucket(tbl->buckets[hash], tbl, hash);
diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index 4898442..6716841 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -14,6 +14,7 @@ 
  * published by the Free Software Foundation.
  */
 
+#include <linux/atomic.h>
 #include <linux/kernel.h>
 #include <linux/init.h>
 #include <linux/log2.h>
@@ -446,6 +447,10 @@  int rhashtable_insert_slow(struct rhashtable *ht, const void *key,
 	    rht_grow_above_100(ht, tbl))
 		goto exit;
 
+	err = -E2BIG;
+	if (unlikely(rht_grow_above_max(ht, tbl)))
+		goto exit;
+
 	err = 0;
 
 	head = rht_dereference_bucket(tbl->buckets[hash], tbl, hash);
@@ -733,6 +738,12 @@  int rhashtable_init(struct rhashtable *ht,
 	if (params->max_size)
 		ht->p.max_size = rounddown_pow_of_two(params->max_size);
 
+	if (params->insecure_max_entries)
+		ht->p.insecure_max_entries =
+			rounddown_pow_of_two(params->insecure_max_entries);
+	else
+		ht->p.insecure_max_entries = ht->p.max_size;
+
 	ht->p.min_size = max(ht->p.min_size, HASH_MIN_SIZE);
 
 	/* The maximum (not average) chain length grows with the