Message ID | 20180112134111.19132-1-fw@strlen.de |
---|---|
State | RFC |
Delegated to: | Pablo Neira |
Headers | show |
Series | [RFC,nft] src: ct: add connection counting support | expand |
Hi Florian, On Fri, Jan 12, 2018 at 02:41:11PM +0100, Florian Westphal wrote: > This adds support for a native connlimit replacement. > > It re-uses nftables concatenation support and thus allows > using any key combinations supported by nftables. > > Because counting is expensive, it is useful to limit this > to new connections only. Example: > > ct state new ct count { ip saddr . tcp dport } gt 10 drop > > TODO: man page entry. > > NB: This uses {} to separate ct count statement from grouping to > avoid shift/reduce conflicts in the parser, unlike fib we do not > have distinct 'end marker' available. If your concern is this {} curly braces, I think that should be fine from a semantic point of view. From the kernel perspective, I wonder if it would be good to place this rbtree that allows us to count in a nftables set, so we can create maps that people can flush and that can also populate from userspace via API (me thinking of this usecase: userspace software, updating reputation ranks for IP addresses based on more heuristics, using this new set type, if that makes sense to you, of course). I understand this might be more work - I haven't seen your patch to add nf_conncount to nftables, but I suspect you already made a bit of progress - so this turn may trigger some rework. Let me know, thanks! -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Jan 15, 2018 at 11:46:39AM +0100, Pablo Neira Ayuso wrote: > Hi Florian, > > On Fri, Jan 12, 2018 at 02:41:11PM +0100, Florian Westphal wrote: > > This adds support for a native connlimit replacement. > > > > It re-uses nftables concatenation support and thus allows > > using any key combinations supported by nftables. > > > > Because counting is expensive, it is useful to limit this > > to new connections only. Example: > > > > ct state new ct count { ip saddr . tcp dport } gt 10 drop > > > > TODO: man page entry. > > > > NB: This uses {} to separate ct count statement from grouping to > > avoid shift/reduce conflicts in the parser, unlike fib we do not > > have distinct 'end marker' available. > > If your concern is this {} curly braces, I think that should be fine > from a semantic point of view. Extending this sentence: I'm telling this because my understanding is that the behaviour of this feature is basically to add elements to an anonymous set - that is represented through a rb-tree on the backend side. -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Pablo Neira Ayuso <pablo@netfilter.org> wrote: > > NB: This uses {} to separate ct count statement from grouping to > > avoid shift/reduce conflicts in the parser, unlike fib we do not > > have distinct 'end marker' available. > > If your concern is this {} curly braces, I think that should be fine > from a semantic point of view. Ok, good to know. > From the kernel perspective, I wonder if it would be good to place > this rbtree that allows us to count in a nftables set, so we can > create maps that people can flush and that can also populate from > userspace via API (me thinking of this usecase: userspace software, > updating reputation ranks for IP addresses based on more heuristics, > using this new set type, if that makes sense to you, of course). I will need to think about this. Basically the rbtree is a kludge because we can't store it in the conntrack table and on-demand counting of the conntrack table would be way too expensive. > I understand this might be more work - I haven't seen your patch to > add nf_conncount to nftables, but I suspect you already made a bit of > progress - so this turn may trigger some rework. The current patch is here: https://git.breakpoint.cc/cgit/fw/nf-next.git/commit/?id=82c931a0f896abf654c961859b9dc5c485f0a033 -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Jan 15, 2018 at 11:59:47AM +0100, Florian Westphal wrote: > Pablo Neira Ayuso <pablo@netfilter.org> wrote: > > > NB: This uses {} to separate ct count statement from grouping to > > > avoid shift/reduce conflicts in the parser, unlike fib we do not > > > have distinct 'end marker' available. > > > > If your concern is this {} curly braces, I think that should be fine > > from a semantic point of view. > > Ok, good to know. > > > From the kernel perspective, I wonder if it would be good to place > > this rbtree that allows us to count in a nftables set, so we can > > create maps that people can flush and that can also populate from > > userspace via API (me thinking of this usecase: userspace software, > > updating reputation ranks for IP addresses based on more heuristics, > > using this new set type, if that makes sense to you, of course). > > I will need to think about this. This is more than what iptables is doing, of course. But now that we're on this, we have a chance to re-think if it makes sense to leverage this infrastructure in nftables. > Basically the rbtree is a kludge because we can't store it in the > conntrack table and on-demand counting of the conntrack table would > be way too expensive. I see, actually we could even place this in a hashtable instead, right? Not asking you to do this, just thinking aloud here. > > I understand this might be more work - I haven't seen your patch to > > add nf_conncount to nftables, but I suspect you already made a bit of > > progress - so this turn may trigger some rework. > > The current patch is here: > > https://git.breakpoint.cc/cgit/fw/nf-next.git/commit/?id=82c931a0f896abf654c961859b9dc5c485f0a033 That's very much done work already, sorry I didn't rise this any sooner. -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Pablo Neira Ayuso <pablo@netfilter.org> wrote: > > Basically the rbtree is a kludge because we can't store it in the > > conntrack table and on-demand counting of the conntrack table would > > be way too expensive. > > I see, actually we could even place this in a hashtable instead, > right? Not asking you to do this, just thinking aloud here. Yes. rbtree was only chosen due to deterministic behaviour and ability to do gc lookups while traversing nodes. -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/include/expression.h b/include/expression.h index 0a0e178fe468..b80d081b1595 100644 --- a/include/expression.h +++ b/include/expression.h @@ -303,6 +303,7 @@ struct expr { enum proto_bases base; int8_t direction; uint8_t nfproto; + struct expr *expr; } ct; struct { /* EXPR_NUMGEN */ diff --git a/include/linux/netfilter/nf_tables.h b/include/linux/netfilter/nf_tables.h index a3ee277b17a1..695e307e6e8e 100644 --- a/include/linux/netfilter/nf_tables.h +++ b/include/linux/netfilter/nf_tables.h @@ -929,6 +929,7 @@ enum nft_ct_keys { NFT_CT_AVGPKT, NFT_CT_ZONE, NFT_CT_EVENTMASK, + NFT_CT_COUNT, }; /** diff --git a/src/ct.c b/src/ct.c index d5347974bd0d..ffa5fb70154a 100644 --- a/src/ct.c +++ b/src/ct.c @@ -269,6 +269,8 @@ static const struct ct_template ct_templates[] = { BYTEORDER_HOST_ENDIAN, 16), [NFT_CT_EVENTMASK] = CT_TEMPLATE("event", &ct_event_type, BYTEORDER_HOST_ENDIAN, 32), + [NFT_CT_COUNT] = CT_TEMPLATE("count", &integer_type, + BYTEORDER_HOST_ENDIAN, 32), }; static void ct_print(enum nft_ct_keys key, int8_t dir, uint8_t nfproto, @@ -306,6 +308,16 @@ static void ct_print(enum nft_ct_keys key, int8_t dir, uint8_t nfproto, static void ct_expr_print(const struct expr *expr, struct output_ctx *octx) { ct_print(expr->ct.key, expr->ct.direction, expr->ct.nfproto, octx); + + switch (expr->ct.key) { + case NFT_CT_COUNT: + nft_print(octx, " { "); + expr_print(expr->ct.expr, octx); + nft_print(octx, " }"); + break; + default: + break; + } } static bool ct_expr_cmp(const struct expr *e1, const struct expr *e2) diff --git a/src/evaluate.c b/src/evaluate.c index 7fe738d8d590..c8ebbf02a11a 100644 --- a/src/evaluate.c +++ b/src/evaluate.c @@ -767,6 +767,13 @@ static int expr_evaluate_ct(struct eval_ctx *ctx, struct expr **expr) case NFT_CT_DST: ct_gen_nh_dependency(ctx, ct); break; + case NFT_CT_COUNT: + if (ct->ct.expr == NULL) + return expr_error(ctx->msgs, ct, "missing key\n"); + + if (expr_evaluate(ctx, &ct->ct.expr) < 0) + return -1; + break; default: break; } diff --git a/src/netlink_delinearize.c b/src/netlink_delinearize.c index 2637f4baaec4..8ad3bf239019 100644 --- a/src/netlink_delinearize.c +++ b/src/netlink_delinearize.c @@ -718,6 +718,28 @@ static void netlink_parse_ct_expr(struct netlink_parse_ctx *ctx, key = nftnl_expr_get_u32(nle, NFTNL_EXPR_CT_KEY); expr = ct_expr_alloc(loc, key, dir, NFPROTO_UNSPEC); + if (key == NFT_CT_COUNT) { + enum nft_registers sreg; + struct expr *sexpr; + uint32_t len; + + sreg = netlink_parse_register(nle, NFTNL_EXPR_CT_SREG); + sexpr = netlink_get_register(ctx, loc, sreg); + + if (sexpr == NULL) + return netlink_error(ctx, loc, + "ct count has no expression"); + + len = nftnl_expr_get_u32(nle, + NFTNL_EXPR_CT_SREG_LEN) * BITS_PER_BYTE; + if (sexpr->len < len) { + sexpr = netlink_parse_concat_expr(ctx, loc, sreg, len); + if (sexpr == NULL) + return; + } + expr->ct.expr = sexpr; + } + dreg = netlink_parse_register(nle, NFTNL_EXPR_CT_DREG); netlink_set_register(ctx, dreg, expr); } @@ -1848,6 +1870,8 @@ static void expr_postprocess(struct rule_pp_ctx *ctx, struct expr **exprp) expr_postprocess(ctx, &expr->hash.expr); break; case EXPR_CT: + if (expr->ct.expr) + expr_postprocess(ctx, &expr->ct.expr); ct_expr_update_type(&ctx->pctx, expr); break; default: diff --git a/src/netlink_linearize.c b/src/netlink_linearize.c index 99a4dde22adb..76457c6d324a 100644 --- a/src/netlink_linearize.c +++ b/src/netlink_linearize.c @@ -234,6 +234,18 @@ static void netlink_gen_ct(struct netlink_linearize_ctx *ctx, nftnl_expr_set_u8(nle, NFTNL_EXPR_CT_DIR, expr->ct.direction); + if (expr->ct.expr) { + enum nft_registers sreg; + + sreg = get_register(ctx, expr->ct.expr); + netlink_gen_expr(ctx, expr->ct.expr, sreg); + release_register(ctx, expr->ct.expr); + netlink_put_register(nle, NFTNL_EXPR_CT_SREG, sreg); + nftnl_expr_set_u32(nle, NFTNL_EXPR_CT_SREG_LEN, + div_round_up(expr->ct.expr->len, + BITS_PER_BYTE)); + } + nftnl_rule_add_expr(ctx->nlr, nle); } diff --git a/src/parser_bison.y b/src/parser_bison.y index 6e85a62804d4..cd934b9c22dc 100644 --- a/src/parser_bison.y +++ b/src/parser_bison.y @@ -3323,6 +3323,11 @@ ct_expr : CT ct_key { $$ = ct_expr_alloc(&@$, $4, $2, $3); } + | CT COUNT '{' concat_expr '}' + { + $$ = ct_expr_alloc(&@$, NFT_CT_COUNT, -1, NFPROTO_UNSPEC); + $$->ct.expr = $4; + } ; ct_dir : ORIGINAL { $$ = IP_CT_DIR_ORIGINAL; } diff --git a/tests/py/inet/ct.t b/tests/py/inet/ct.t index 1a656aa4375f..25533da51803 100644 --- a/tests/py/inet/ct.t +++ b/tests/py/inet/ct.t @@ -6,6 +6,8 @@ meta nfproto ipv4 ct original saddr 1.2.3.4;ok;ct original ip saddr 1.2.3.4 ct original ip6 saddr ::1;ok +ct state new ct count { ip saddr . tcp dport } gt 2 counter;ok;ct state new ct count { ip saddr . tcp dport } > 2 counter + # missing protocol context ct original saddr ::1;fail diff --git a/tests/py/inet/ct.t.payload b/tests/py/inet/ct.t.payload index 97128eccf540..61e9692670d1 100644 --- a/tests/py/inet/ct.t.payload +++ b/tests/py/inet/ct.t.payload @@ -11,3 +11,19 @@ inet test-inet input [ cmp eq reg 1 0x0000000a ] [ ct load src => reg 1 , dir original ] [ cmp eq reg 1 0x00000000 0x00000000 0x00000000 0x01000000 ] + +# ct state new ct count { ip saddr . tcp dport } gt 2 counter +inet test-inet input + [ ct load state => reg 1 ] + [ bitwise reg 1 = (reg=1 & 0x00000008 ) ^ 0x00000000 ] + [ cmp neq reg 1 0x00000000 ] + [ meta load nfproto => reg 1 ] + [ cmp eq reg 1 0x00000002 ] + [ meta load l4proto => reg 1 ] + [ cmp eq reg 1 0x00000006 ] + [ payload load 4b @ network header + 12 => reg 2 ] + [ payload load 2b @ transport header + 2 => reg 13 ] + [ ct reg 1 = ct count (reg 2, 8) ] + [ byteorder reg 1 = hton(reg 1, 4, 4) ] + [ cmp gt reg 1 0x02000000 ] + [ counter pkts 0 bytes 0 ]
This adds support for a native connlimit replacement. It re-uses nftables concatenation support and thus allows using any key combinations supported by nftables. Because counting is expensive, it is useful to limit this to new connections only. Example: ct state new ct count { ip saddr . tcp dport } gt 10 drop TODO: man page entry. NB: This uses {} to separate ct count statement from grouping to avoid shift/reduce conflicts in the parser, unlike fib we do not have distinct 'end marker' available. Signed-off-by: Florian Westphal <fw@strlen.de> --- include/expression.h | 1 + include/linux/netfilter/nf_tables.h | 1 + src/ct.c | 12 ++++++++++++ src/evaluate.c | 7 +++++++ src/netlink_delinearize.c | 24 ++++++++++++++++++++++++ src/netlink_linearize.c | 12 ++++++++++++ src/parser_bison.y | 5 +++++ tests/py/inet/ct.t | 2 ++ tests/py/inet/ct.t.payload | 16 ++++++++++++++++ 9 files changed, 80 insertions(+)