Message ID | 20190916165000.18217-9-phil@nwl.cc |
---|---|
State | Changes Requested |
Delegated to: | Pablo Neira |
Headers | show |
Series | Improve iptables-nft performance with large rulesets | expand |
Hi Phil, On Mon, Sep 16, 2019 at 06:49:54PM +0200, Phil Sutter wrote: > When called without --noflush, don't fetch full ruleset from kernel but > merely list of tables and the current genid. Locally, initialize chain > lists and set have_cache to simulate an empty ruleset. > > Doing so reduces execution time significantly if a large ruleset exists > in kernel space. A simple test case consisting of a dump with 100,000 > rules can be restored within 15s on my testing VM. Restoring it a second > time took 21s before this patch and only 17s after it. > > Signed-off-by: Phil Sutter <phil@nwl.cc> > --- > iptables/nft.c | 27 ++++++++++++++++++++++++++- > iptables/nft.h | 1 + > iptables/xtables-restore.c | 7 +++++-- > 3 files changed, 32 insertions(+), 3 deletions(-) > > diff --git a/iptables/nft.c b/iptables/nft.c > index 7f0f9e1234ae4..820f3392dd495 100644 > --- a/iptables/nft.c > +++ b/iptables/nft.c > @@ -882,7 +882,8 @@ static int flush_cache(struct nft_cache *c, const struct builtin_table *tables, > nftnl_chain_list_free(c->table[i].chains); > c->table[i].chains = NULL; > } > - nftnl_table_list_free(c->tables); > + if (c->tables) > + nftnl_table_list_free(c->tables); > c->tables = NULL; > > return 1; > @@ -1617,6 +1618,30 @@ void nft_build_cache(struct nft_handle *h) > __nft_build_cache(h); > } > > +void nft_fake_cache(struct nft_handle *h) > +{ > + int i; > + > + if (h->have_cache) > + return; > + > + /* fetch tables so conditional table delete logic works */ > + if (!h->cache->tables) > + fetch_table_cache(h); > + > + for (i = 0; i < NFT_TABLE_MAX; i++) { > + enum nft_table_type type = h->tables[i].type; > + > + if (!h->tables[i].name || > + h->cache->table[type].chains) > + continue; > + > + h->cache->table[type].chains = nftnl_chain_list_alloc(); > + } > + mnl_genid_get(h, &h->nft_genid); > + h->have_cache = true; > +} Looking at patches from 8/24 onwards, I think it's time to introduce cache flags logic to iptables. In patch 9/14 there is a new field have_chain_cache. In patch 10/14 there is have_rule_cache. Then moving on, the logic is based on checking for this booleans and then checking if the caches are initialized or not. I think if we move towards cache flags logic (the flags tell us what if we need no cache, partial cache (only tables, tables + chains) or full cache. This would make this logic easier to understand and more maintainable.
Hi Pablo, On Fri, Sep 20, 2019 at 01:57:02PM +0200, Pablo Neira Ayuso wrote: [...] > Looking at patches from 8/24 onwards, I think it's time to introduce > cache flags logic to iptables. > > In patch 9/14 there is a new field have_chain_cache. > > In patch 10/14 there is have_rule_cache. > > Then moving on, the logic is based on checking for this booleans and > then checking if the caches are initialized or not. > > I think if we move towards cache flags logic (the flags tell us what > if we need no cache, partial cache (only tables, tables + chains) or > full cache. > > This would make this logic easier to understand and more maintainable. I am not entirely sure this is a feasible approach: On one hand, we certainly can introduce cache "levels" to distinguish between: - no cache - tables - chains - rules simply because all these naturally build upon each others. On the other hand, in my series I pushed the envelope (and watched it bend) a bit further: There are basically two modes of caching: - "traditional" breadth-first, compatible with above: e.g. all tables, if needed all chains as well, etc. - a new depth-first mode which allows to fetch e.g. a certain chain's rules but no other chains/rules. While the first mode is fine, second one really gives us the edge in situations where legacy iptables is faster simply because "number crunching" is more optimized there. This second mode doesn't have explicit completion indicators like the first one (with 'have_cache', etc.). In fact, the code treats it like a fire-and-forget situation: If e.g. the same chain is requested twice, the second time cache is simply fetched again but nftnl_chain_list_cb() discards the fetched chain if one with same name already exists in table's chain list. Actually, the only application which requires more attention is xtables-restore with --noflush. It is able to run multiple commands consecutively and these may need kernel ruleset as context. Still we don't want to fetch the full kernel ruleset upon each invocation because it's how users speed up their ruleset manipulations. In summary, things boil down to the following options: A) Avoid caching, fetch only the most necessary things to do the job. B) Build a full cache if needed anyway or if we can't predict. C) Create a fake cache if we know kernel's ruleset is irrelevant. I'll give it another try, aligning cache update logic to the above and see if things become clean enough to be considered maintainable. Cheers, Phil
diff --git a/iptables/nft.c b/iptables/nft.c index 7f0f9e1234ae4..820f3392dd495 100644 --- a/iptables/nft.c +++ b/iptables/nft.c @@ -882,7 +882,8 @@ static int flush_cache(struct nft_cache *c, const struct builtin_table *tables, nftnl_chain_list_free(c->table[i].chains); c->table[i].chains = NULL; } - nftnl_table_list_free(c->tables); + if (c->tables) + nftnl_table_list_free(c->tables); c->tables = NULL; return 1; @@ -1617,6 +1618,30 @@ void nft_build_cache(struct nft_handle *h) __nft_build_cache(h); } +void nft_fake_cache(struct nft_handle *h) +{ + int i; + + if (h->have_cache) + return; + + /* fetch tables so conditional table delete logic works */ + if (!h->cache->tables) + fetch_table_cache(h); + + for (i = 0; i < NFT_TABLE_MAX; i++) { + enum nft_table_type type = h->tables[i].type; + + if (!h->tables[i].name || + h->cache->table[type].chains) + continue; + + h->cache->table[type].chains = nftnl_chain_list_alloc(); + } + mnl_genid_get(h, &h->nft_genid); + h->have_cache = true; +} + static void __nft_flush_cache(struct nft_handle *h) { if (!h->cache_index) { diff --git a/iptables/nft.h b/iptables/nft.h index 43463d7f262e8..d9eb30a2a2413 100644 --- a/iptables/nft.h +++ b/iptables/nft.h @@ -74,6 +74,7 @@ int mnl_talk(struct nft_handle *h, struct nlmsghdr *nlh, int nft_init(struct nft_handle *h, const struct builtin_table *t); void nft_fini(struct nft_handle *h); void nft_build_cache(struct nft_handle *h); +void nft_fake_cache(struct nft_handle *h); /* * Operations with tables. diff --git a/iptables/xtables-restore.c b/iptables/xtables-restore.c index f930f5ba2d167..d1486afedc480 100644 --- a/iptables/xtables-restore.c +++ b/iptables/xtables-restore.c @@ -96,6 +96,11 @@ void xtables_restore_parse(struct nft_handle *h, line = 0; + if (h->noflush) + nft_build_cache(h); + else + nft_fake_cache(h); + /* Grab standard input. */ while (fgets(buffer, sizeof(buffer), p->in)) { int ret = 0; @@ -146,8 +151,6 @@ void xtables_restore_parse(struct nft_handle *h, if (p->tablename && (strcmp(p->tablename, table) != 0)) continue; - nft_build_cache(h); - if (h->noflush == 0) { DEBUGP("Cleaning all chains of table '%s'\n", table);
When called without --noflush, don't fetch full ruleset from kernel but merely list of tables and the current genid. Locally, initialize chain lists and set have_cache to simulate an empty ruleset. Doing so reduces execution time significantly if a large ruleset exists in kernel space. A simple test case consisting of a dump with 100,000 rules can be restored within 15s on my testing VM. Restoring it a second time took 21s before this patch and only 17s after it. Signed-off-by: Phil Sutter <phil@nwl.cc> --- iptables/nft.c | 27 ++++++++++++++++++++++++++- iptables/nft.h | 1 + iptables/xtables-restore.c | 7 +++++-- 3 files changed, 32 insertions(+), 3 deletions(-)