netfilter: Reverse nft_set_lookup_byid list traversal

Message ID 21ed8188-a202-f578-6f8b-303dec37a266@plutex.de
State Deferred
Delegated to: Pablo Neira

Commit Message

Jan-Philipp Litza Jan. 7, 2021, 8:56 a.m. UTC
When loading a large ruleset with many anonymous sets,
nft_set_lookup_global is called once for each added set element, which
in turn calls nft_set_lookup_byid if the set was only added in this
transaction.

The longer this transaction's queue of unapplied netlink messages gets,
the longer it takes to traverse it in search of the set referenced by
ID, which was probably added near the end if it is an anonymous set.
This patch therefore searches the list of unapplied netlink messages in
reverse order, finding the just-added anonymous set faster.
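
To illustrate why the traversal direction matters, here is a minimal
userspace sketch. It is not the kernel code: struct trans, struct
commit_list and the lookup_* helpers are hypothetical stand-ins for the
transaction list, and each lookup reports how many entries it had to
inspect.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for one queued transaction entry. */
struct trans {
	bool creates_set;		/* true if this entry adds a new set */
	unsigned int set_id;		/* only meaningful if creates_set */
	struct trans *prev, *next;
};

/* Hypothetical stand-in for the list of unapplied messages. */
struct commit_list {
	struct trans *head, *tail;
};

/* Queue a new entry at the tail, i.e. as the newest entry. */
static void commit_list_append(struct commit_list *cl, struct trans *t)
{
	assert(cl != NULL && t != NULL);
	t->next = NULL;
	t->prev = cl->tail;
	if (cl->tail)
		cl->tail->next = t;
	else
		cl->head = t;
	cl->tail = t;
}

/* Scan oldest to newest; *visited receives the entries inspected. */
static struct trans *lookup_forward(const struct commit_list *cl,
				    unsigned int id, size_t *visited)
{
	*visited = 0;
	for (struct trans *t = cl->head; t; t = t->next) {
		(*visited)++;
		if (t->creates_set && t->set_id == id)
			return t;
	}
	return NULL;
}

/* Scan newest to oldest; *visited receives the entries inspected. */
static struct trans *lookup_reverse(const struct commit_list *cl,
				    unsigned int id, size_t *visited)
{
	*visited = 0;
	for (struct trans *t = cl->tail; t; t = t->prev) {
		(*visited)++;
		if (t->creates_set && t->set_id == id)
			return t;
	}
	return NULL;
}
```

With 1000 queued entries, looking up the most recently added set
inspects every entry in the forward scan but only one in the reverse
scan. Looking up the oldest set shows the opposite behaviour.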

On a real-life ruleset of ~6000 statements and ~1000 anonymous sets,
this patch roughly halves the system time on loading:

Before: 0,06s user 0,39s system 97% cpu 0,459 total
After:  0,06s user 0,20s system 97% cpu 0,268 total

The downside might be that newly added named sets are probably added
at the beginning of a transaction, so looking for them when adding
elements later on takes longer. However, I reckon that named sets, too,
are more often filled right after their creation. Furthermore,
for named sets, users can optimize their rule structure to add elements
right after set creation, whereas it's impossible to first create all
anonymous sets at the beginning of the transaction to optimize for the
current approach.

Signed-off-by: Jan-Philipp Litza <jpl@plutex.de>
---
 net/netfilter/nf_tables_api.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--
2.27.0

Comments

Pablo Neira Ayuso Jan. 14, 2021, 10:40 p.m. UTC | #1
Hi Jan-Philipp,

On Thu, Jan 07, 2021 at 09:56:42AM +0100, Jan-Philipp Litza wrote:
> When loading a large ruleset with many anonymous sets,
> nft_set_lookup_global is called once for each added set element, which
> in turn calls nft_set_lookup_byid if the set was only added in this
> transaction.
> 
> The longer this transaction's queue of unapplied netlink messages gets,
> the longer it takes to traverse it in search of the set referenced by
> ID, which was probably added near the end if it is an anonymous set.
> This patch therefore searches the list of unapplied netlink messages in
> reverse order, finding the just-added anonymous set faster.
> 
> On a real-life ruleset of ~6000 statements and ~1000 anonymous sets,
> this patch roughly halves the system time on loading:
> 
> Before: 0,06s user 0,39s system 97% cpu 0,459 total
> After:  0,06s user 0,20s system 97% cpu 0,268 total
> 
> The downside might be that newly added named sets are probably added
> at the beginning of a transaction, so looking for them when adding
> elements later on takes longer. However, I reckon that named sets, too,
> are more often filled right after their creation. Furthermore,
> for named sets, users can optimize their rule structure to add elements
> right after set creation, whereas it's impossible to first create all
> anonymous sets at the beginning of the transaction to optimize for the
> current approach.

If the .nft file contains lots of rules in linear syntax:

add rule x y ... { ... }
...
add rule x y ... { ... }

then, this patch is a real gain. In this case, nft currently places
the new anonymous set right before the rule, so your patch makes it
perform nicely.

I am hesitant about the nested syntax, i.e.

table x {
       chain y {
                ... { ... }
                ...
                ... { ... }
       }
}

In this case, nft adds all the anonymous sets at the beginning of the
netlink message, so rules don't find them right at the end.
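
A rough cost model makes the difference between the two layouts
concrete. This is only a sketch under simplifying assumptions: the
list is assumed to hold nothing but one new-set entry and one new-rule
entry per rule, each rule looks up its set just before its own entry is
queued, and all names (scan_cost, total_interleaved, total_grouped) are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum scan_dir { SCAN_FORWARD, SCAN_REVERSE };

/*
 * Entries inspected to reach index pos in a list of len entries,
 * starting from the oldest entry (forward) or the newest (reverse).
 */
static size_t scan_cost(enum scan_dir dir, size_t len, size_t pos)
{
	assert(pos < len);
	return dir == SCAN_FORWARD ? pos + 1 : len - pos;
}

/*
 * Layout "set_0, rule_0, set_1, rule_1, ...": when rule i looks up
 * its set, the list holds 2*i + 1 entries and the set is the newest.
 */
static size_t total_interleaved(enum scan_dir dir, size_t n)
{
	size_t total = 0;

	for (size_t i = 0; i < n; i++)
		total += scan_cost(dir, 2 * i + 1, 2 * i);
	return total;
}

/*
 * Layout "set_0 ... set_{n-1}, rule_0, rule_1, ...": when rule i
 * looks up its set, the list holds n + i entries and the set sits
 * at index i.
 */
static size_t total_grouped(enum scan_dir dir, size_t n)
{
	size_t total = 0;

	for (size_t i = 0; i < n; i++)
		total += scan_cost(dir, n + i, i);
	return total;
}
```

Under this model, with n = 1000 the interleaved layout costs 1,000,000
inspected entries forward versus 1,000 in reverse, while the grouped
layout costs 500,500 forward versus 1,000,000 in reverse.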

Probably it's better to convert this code to use a rhashtable for fast
lookups on the transaction, so we don't have to care about what
userspace does in the future.
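
The idea of an ID-keyed index can be sketched in userspace as follows.
This deliberately does not use the kernel rhashtable API; it is a plain
fixed-size chained hash table, and struct pending_set, struct id_index
and the id_index_* helpers are hypothetical names chosen for
illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ID_BUCKETS 256	/* matches the 8-bit hash value below */

/* Hypothetical record for a set created in the current transaction. */
struct pending_set {
	uint32_t id;
	struct pending_set *next;	/* chain within one bucket */
};

/* Must be zero-initialized before use (all buckets empty). */
struct id_index {
	struct pending_set *bucket[ID_BUCKETS];
};

/* Multiplicative hash; the top 8 bits select one of 256 buckets. */
static unsigned int id_hash(uint32_t id)
{
	return (uint32_t)(id * UINT32_C(2654435761)) >> 24;
}

/* Link a set into its bucket; cost does not depend on list length. */
static void id_index_insert(struct id_index *idx, struct pending_set *s)
{
	unsigned int b;

	assert(idx != NULL && s != NULL);
	b = id_hash(s->id);
	s->next = idx->bucket[b];
	idx->bucket[b] = s;
}

/* Walk only the chain of the matching bucket. */
static struct pending_set *id_index_lookup(const struct id_index *idx,
					   uint32_t id)
{
	struct pending_set *s;

	for (s = idx->bucket[id_hash(id)]; s; s = s->next)
		if (s->id == id)
			return s;
	return NULL;
}
```

With such an index, the lookup cost depends on the bucket chain length
rather than on where in the transaction the set was created, so neither
layout is favoured.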

Thanks.

> Signed-off-by: Jan-Philipp Litza <jpl@plutex.de>
> ---
>  net/netfilter/nf_tables_api.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
> index 8d5aa0ac4..c488b6b95 100644
> --- a/net/netfilter/nf_tables_api.c
> +++ b/net/netfilter/nf_tables_api.c
> @@ -3639,7 +3639,7 @@ static struct nft_set *nft_set_lookup_byid(const struct net *net,
>  	struct nft_trans *trans;
>  	u32 id = ntohl(nla_get_be32(nla));
>  
> -	list_for_each_entry(trans, &net->nft.commit_list, list) {
> +	list_for_each_entry_reverse(trans, &net->nft.commit_list, list) {
>  		if (trans->msg_type == NFT_MSG_NEWSET) {
>  			struct nft_set *set = nft_trans_set(trans);
>  
> --
> 2.27.0
>
Jan-Philipp Litza Jan. 19, 2021, 2:22 p.m. UTC | #2
Hi Pablo,

> If the .nft file contains lots of rules in linear syntax:
> 
> add rule x y ... { ... }
> ...
> add rule x y ... { ... }
> 
> then, this patch is a real gain. In this case, nft currently places
> the new anonymous set right before the rule, so your patch makes it
> perform nicely.
> 
> I am hesitant about the nested syntax, i.e.
> 
> table x {
>        chain y {
>                 ... { ... }
>                 ...
>                 ... { ... }
>        }
> }
> 
> In this case, nft adds all the anonymous sets at the beginning of the
> netlink message, so rules don't find them right at the end.

Maybe I don't quite understand "at the beginning of the netlink message"
the way you meant it, but we are actually using nested syntax - just
with hundreds of (short) chains - and the performance gains I cited were
from this ruleset, which basically looks like

table filter {
	chain if1 {
		tcp dport 22 ip saddr { x, y, z } accept
	}
}
table filter {
	chain if2 {
		ip saddr { a, b, c } accept
		tcp dport 80 accept
	}
}
...

(Yes, the "table filter" is repeated every time, because the ruleset is
generated. Don't know if that matters.)

So I suspect that nft adds the anonymous sets maybe not immediately
before the elements, but maybe at the beginning of the chain (or the
beginning of the table block, which we repeat).

But maybe, if I have one chain with hundreds of rules, then this patch
degrades loading performance.

> Probably it's better to convert this code to use a rhashtable for fast
> lookups on the transaction, so we don't have to care about what
> userspace does in the future.

I totally agree. As a non-kernel-hacker, however, this was out of reach
for me. ;-)

Best regards,
Jan-Philipp Litza
Patch

diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 8d5aa0ac4..c488b6b95 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -3639,7 +3639,7 @@  static struct nft_set *nft_set_lookup_byid(const struct net *net,
 	struct nft_trans *trans;
 	u32 id = ntohl(nla_get_be32(nla));
 
-	list_for_each_entry(trans, &net->nft.commit_list, list) {
+	list_for_each_entry_reverse(trans, &net->nft.commit_list, list) {
 		if (trans->msg_type == NFT_MSG_NEWSET) {
 			struct nft_set *set = nft_trans_set(trans);