diff mbox

Ottawa and slow hash-table resize

Message ID 54EC46AB.3030302@iogearbox.net
State RFC
Delegated to: Pablo Neira
Headers show

Commit Message

Daniel Borkmann Feb. 24, 2015, 9:38 a.m. UTC
On 02/24/2015 09:59 AM, Thomas Graf wrote:
> On 02/23/15 at 10:49am, Paul E. McKenney wrote:
>> Hello!
>>
>> Alexei mentioned that there was some excitement a couple of weeks ago in
>> Ottawa, something about the resizing taking forever when there were large
>> numbers of concurrent additions.  One approach comes to mind:
>
> @Dave et al,
> Just want to make sure we have the same level of understanding of
> urgency for this. The only practical problem experienced so far is
> loading n*1M entries into an nft_hash set which takes 3s for 4M
> entries upon restore. A regular add is actually fine as it provides
> a hint and sizes the table accordingly.

Btw, there is one remaining bug in nft that Josh Hunt found which
should go into 3.20/4.0, it's not in -net tree, so it could have
affected testing with nft. We're currently missing max_shift
parameter in nft's rhashtable initialization.

This means that there will be _no_ expansions as rht_grow_above_75()
will end up always returning false. It's not that dramatic when you
have a proper hint provided, but when you start out small
(NFT_HASH_ELEMENT_HINT = 3) and try to squeeze 1M entries into it,
it might take a while.

Simplest fix would be, similarly as in other users:


But I presume Josh wanted to resend his code ... or wait for nft
folks to further review?

> I agree that rhashtable should be improved to better handle many
> inserts on small tables but wanted to make sure whether anyone thinks
> this needs urgent fixing in 3.20 or whether we can take some time to
> fully understand all scenarios and experiment with ideas and then
> propose solutions for the next merge window. I also have the TCP hash
> tables on my radar while improving this.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Patrick McHardy Feb. 24, 2015, 10:42 a.m. UTC | #1
On 24.02, Daniel Borkmann wrote:
> On 02/24/2015 09:59 AM, Thomas Graf wrote:
> >On 02/23/15 at 10:49am, Paul E. McKenney wrote:
> >>Hello!
> >>
> >>Alexei mentioned that there was some excitement a couple of weeks ago in
> >>Ottawa, something about the resizing taking forever when there were large
> >>numbers of concurrent additions.  One approach comes to mind:
> >
> >@Dave et al,
> >Just want to make sure we have the same level of understanding of
> >urgency for this. The only practical problem experienced so far is
> >loading n*1M entries into an nft_hash set which takes 3s for 4M
> >entries upon restore. A regular add is actually fine as it provides
> >a hint and sizes the table accordingly.
> 
> Btw, there is one remaining bug in nft that Josh Hunt found which
> should go into 3.20/4.0, it's not in -net tree, so it could have
> affected testing with nft. We're currently missing max_shift
> parameter in nft's rhashtable initialization.
> 
> This means that there will be _no_ expansions as rht_grow_above_75()
> will end up always returning false. It's not that dramatic when you
> have a proper hint provided, but when you start out small
> (NFT_HASH_ELEMENT_HINT = 3) and try to squeeze 1M entries into it,
> it might take a while.
> 
> Simplest fix would be, similarly as in other users:
> 
> diff --git a/net/netfilter/nft_hash.c b/net/netfilter/nft_hash.c
> index 61e6c40..47abdca 100644
> --- a/net/netfilter/nft_hash.c
> +++ b/net/netfilter/nft_hash.c
> @@ -192,6 +192,7 @@ static int nft_hash_init(const struct nft_set *set,
>  		.key_offset = offsetof(struct nft_hash_elem, key),
>  		.key_len = set->klen,
>  		.hashfn = jhash,
> +		.max_shift = 20, /* 1M */
>  		.grow_decision = rht_grow_above_75,
>  		.shrink_decision = rht_shrink_below_30,
>  	};
> 
> But I presume Josh wanted to resend his code ... or wait for nft
> folks to further review?

We're perfectly fine with that patch, although I'd say lets use a
slightly larger value (24) to cover what we know people are doing
using ipset.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Josh Hunt Feb. 24, 2015, 4:14 p.m. UTC | #2
On 02/24/2015 04:42 AM, Patrick McHardy wrote:
> On 24.02, Daniel Borkmann wrote:
>> On 02/24/2015 09:59 AM, Thomas Graf wrote:
>>> On 02/23/15 at 10:49am, Paul E. McKenney wrote:
>>>> Hello!
>>>>
>>>> Alexei mentioned that there was some excitement a couple of weeks ago in
>>>> Ottawa, something about the resizing taking forever when there were large
>>>> numbers of concurrent additions.  One approach comes to mind:
>>>
>>> @Dave et al,
>>> Just want to make sure we have the same level of understanding of
>>> urgency for this. The only practical problem experienced so far is
>>> loading n*1M entries into an nft_hash set which takes 3s for 4M
>>> entries upon restore. A regular add is actually fine as it provides
>>> a hint and sizes the table accordingly.
>>
>> Btw, there is one remaining bug in nft that Josh Hunt found which
>> should go into 3.20/4.0, it's not in -net tree, so it could have
>> affected testing with nft. We're currently missing max_shift
>> parameter in nft's rhashtable initialization.
>>
>> This means that there will be _no_ expansions as rht_grow_above_75()
>> will end up always returning false. It's not that dramatic when you
>> have a proper hint provided, but when you start out small
>> (NFT_HASH_ELEMENT_HINT = 3) and try to squeeze 1M entries into it,
>> it might take a while.
>>
>> Simplest fix would be, similarly as in other users:
>>
>> diff --git a/net/netfilter/nft_hash.c b/net/netfilter/nft_hash.c
>> index 61e6c40..47abdca 100644
>> --- a/net/netfilter/nft_hash.c
>> +++ b/net/netfilter/nft_hash.c
>> @@ -192,6 +192,7 @@ static int nft_hash_init(const struct nft_set *set,
>>   		.key_offset = offsetof(struct nft_hash_elem, key),
>>   		.key_len = set->klen,
>>   		.hashfn = jhash,
>> +		.max_shift = 20, /* 1M */
>>   		.grow_decision = rht_grow_above_75,
>>   		.shrink_decision = rht_shrink_below_30,
>>   	};
>>
>> But I presume Josh wanted to resend his code ... or wait for nft
>> folks to further review?
>
> We're perfectly fine with that patch, although I'd say lets use a
> slightly larger value (24) to cover what we know people are doing
> using ipset.
>

I just sent a patch similar to Daniel's in the set 'nft hash resize 
fixes' using a max_shift value of 24. I still think this value should be 
tunable, but sent the patch to fix the immediate expansion problem for now.

Josh
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Patrick McHardy Feb. 24, 2015, 4:25 p.m. UTC | #3
On 24.02, Josh Hunt wrote:
> On 02/24/2015 04:42 AM, Patrick McHardy wrote:
> >On 24.02, Daniel Borkmann wrote:
> >>
> >>Simplest fix would be, similarly as in other users:
> >>
> >>diff --git a/net/netfilter/nft_hash.c b/net/netfilter/nft_hash.c
> >>index 61e6c40..47abdca 100644
> >>--- a/net/netfilter/nft_hash.c
> >>+++ b/net/netfilter/nft_hash.c
> >>@@ -192,6 +192,7 @@ static int nft_hash_init(const struct nft_set *set,
> >>  		.key_offset = offsetof(struct nft_hash_elem, key),
> >>  		.key_len = set->klen,
> >>  		.hashfn = jhash,
> >>+		.max_shift = 20, /* 1M */
> >>  		.grow_decision = rht_grow_above_75,
> >>  		.shrink_decision = rht_shrink_below_30,
> >>  	};
> >>
> >>But I presume Josh wanted to resend his code ... or wait for nft
> >>folks to further review?
> >
> >We're perfectly fine with that patch, although I'd say lets use a
> >slightly larger value (24) to cover what we know people are doing
> >using ipset.
> 
> I just sent a patch similar to Daniel's in the set 'nft hash resize fixes'
> using a max_shift value of 24. I still think this value should be tunable,
> but sent the patch to fix the immediate expansion problem for now.

Thanks. I actually don't think we should require that parameter at
all, any limits shouldn't be imposed by the data structure but by
the code using it. We're perfectly fine using the available memory
as a limit if the users wants this, provided that hash expansion
succeeds and the lookups stay fast.

But for now let's just fix the immediate problem, I agree.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Feb. 24, 2015, 4:57 p.m. UTC | #4
From: Patrick McHardy <kaber@trash.net>
Date: Tue, 24 Feb 2015 16:25:22 +0000

> Thanks. I actually don't think we should require that parameter at
> all, any limits shouldn't be imposed by the data structure but by
> the code using it. We're perfectly fine using the available memory
> as a limit if the users wants this, provided that hash expansion
> succeeds and the lookups stay fast.
> 
> But for now let's just fix the immediate problem, I agree.

I agree, the behavior when the value is unspecified should be to grow
as needed without any limits.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/netfilter/nft_hash.c b/net/netfilter/nft_hash.c
index 61e6c40..47abdca 100644
--- a/net/netfilter/nft_hash.c
+++ b/net/netfilter/nft_hash.c
@@ -192,6 +192,7 @@  static int nft_hash_init(const struct nft_set *set,
  		.key_offset = offsetof(struct nft_hash_elem, key),
  		.key_len = set->klen,
  		.hashfn = jhash,
+		.max_shift = 20, /* 1M */
  		.grow_decision = rht_grow_above_75,
  		.shrink_decision = rht_shrink_below_30,
  	};