| Message ID | 20250508153358.8015-1-fw@strlen.de |
|---|---|
| State | Changes Requested, archived |
| Headers | show |
| Series | [nft] src: add conntrack information to trace monitor mode | expand |
Hi Florian, On Thu, May 08, 2025 at 05:33:56PM +0200, Florian Westphal wrote: > Upcoming kernel change provides the packets conntrack state in the > trace message data. > > This allows to see if packet is seen as original or reply, the conntrack > state (new, establieshed, related) and the status bits which show if e.g. > NAT was applied. Alsoi include conntrack ID so users can use conntrack > tool to query the kernel for more information via ctnetlink. > > This improves debugging when e.g. packets do not pick up the expected > NAT mapping, which could e.g. also happen because of expectations > following the NAT binding of the owning conntrack entry. This feature will be present in the next kernel. > Example output ("conntrack: " lines are new): > > trace id 32 t PRE_RAW packet: iif "enp0s3" ether saddr [..] > trace id 32 t PRE_RAW rule tcp flags syn meta nftrace set 1 (verdict continue) > trace id 32 t PRE_RAW policy accept > trace id 32 t PRE_MANGLE conntrack: ct direction original ct state new ct id 2641368242 > trace id 32 t PRE_MANGLE packet: iif "enp0s3" ether saddr [..] > trace id 32 t ct_new_pre rule jump rpfilter (verdict jump rpfilter) > trace id 32 t PRE_MANGLE policy accept > trace id 32 t INPUT conntrack: ct direction original ct state new ct status dnat-done ct id 2641368242 > trace id 32 t INPUT packet: iif "enp0s3" [..] > trace id 32 t public_in rule tcp dport 443 accept (verdict accept) > > Signed-off-by: Florian Westphal <fw@strlen.de> Comment below. > --- > src/ct.c | 4 ++ > src/netlink.c | 110 ++++++++++++++++++++++++++++++++++++++++++++++++++ > 2 files changed, 114 insertions(+) > > diff --git a/src/ct.c b/src/ct.c > index 71ebb9483893..fc43bf63f02c 100644 > --- a/src/ct.c > +++ b/src/ct.c > @@ -98,7 +98,11 @@ static const struct symbol_table ct_status_tbl = { > SYMBOL("confirmed", IPS_CONFIRMED), > SYMBOL("snat", IPS_SRC_NAT), > SYMBOL("dnat", IPS_DST_NAT), > + SYMBOL("seq-adjust", IPS_SEQ_ADJUST), > + SYMBOL("snat-done", IPS_SRC_NAT_DONE), > + SYMBOL("dnat-done", IPS_DST_NAT_DONE), These will be now exposed through 'ct status' forever, not sure we have a usecase to allow users to match on this. I know these are exposed through uapi, but I don't have a usecase for them to allow users to match on them. Maybe it is _not_ worth, those flags have been there since the beginning. If we go for exposing these flags through ct status, do you think it is possible to provide terse description of these "new flags" in the manpage? > SYMBOL("dying", IPS_DYING), > + SYMBOL("fixed-timeout", IPS_FIXED_TIMEOUT), > SYMBOL_LIST_END > }, > }; > diff --git a/src/netlink.c b/src/netlink.c > index 86ca32144f02..b1d1dc7f4bd1 100644 > --- a/src/netlink.c > +++ b/src/netlink.c > @@ -2116,6 +2116,114 @@ next: > } > } > > +static struct expr *trace_alloc_list(const struct datatype *dtype, > + enum byteorder byteorder, > + unsigned int len, const void *data) Suggestion, not deal breaker: It would be good to start a new src/trace.c file to add this and move tracing infrastructure?
Pablo Neira Ayuso <pablo@netfilter.org> wrote: > > @@ -98,7 +98,11 @@ static const struct symbol_table ct_status_tbl = { > > SYMBOL("confirmed", IPS_CONFIRMED), > > SYMBOL("snat", IPS_SRC_NAT), > > SYMBOL("dnat", IPS_DST_NAT), > > + SYMBOL("seq-adjust", IPS_SEQ_ADJUST), > > + SYMBOL("snat-done", IPS_SRC_NAT_DONE), > > + SYMBOL("dnat-done", IPS_DST_NAT_DONE), > > These will be now exposed through 'ct status' forever, not sure we > have a usecase to allow users to match on this. I don't. Its so ct status is pretty printed here: trace id 32 t INPUT conntrack: ct ... ct status dnat-done ... I can see if I can somehow restrict this to the trace display side. OTOH I don't see a reason to artificially limit this. > I know these are exposed through uapi, but I don't have a usecase for > them to allow users to match on them. I don't have that either. But I don't see why we should prevent it. > If we go for exposing these flags through ct status, do you think it > is possible to provide terse description of these "new flags" in the > manpage? Sure, will add. > > --- a/src/netlink.c > > +++ b/src/netlink.c > > @@ -2116,6 +2116,114 @@ next: > > } > > } > > > > +static struct expr *trace_alloc_list(const struct datatype *dtype, > > + enum byteorder byteorder, > > + unsigned int len, const void *data) > > Suggestion, not deal breaker: It would be good to start a new > src/trace.c file to add this and move tracing infrastructure? OK, let me respin this after splitting the existing stuff to trace.c. Thanks for reviewing!
diff --git a/src/ct.c b/src/ct.c index 71ebb9483893..fc43bf63f02c 100644 --- a/src/ct.c +++ b/src/ct.c @@ -98,7 +98,11 @@ static const struct symbol_table ct_status_tbl = { SYMBOL("confirmed", IPS_CONFIRMED), SYMBOL("snat", IPS_SRC_NAT), SYMBOL("dnat", IPS_DST_NAT), + SYMBOL("seq-adjust", IPS_SEQ_ADJUST), + SYMBOL("snat-done", IPS_SRC_NAT_DONE), + SYMBOL("dnat-done", IPS_DST_NAT_DONE), SYMBOL("dying", IPS_DYING), + SYMBOL("fixed-timeout", IPS_FIXED_TIMEOUT), SYMBOL_LIST_END }, }; diff --git a/src/netlink.c b/src/netlink.c index 86ca32144f02..b1d1dc7f4bd1 100644 --- a/src/netlink.c +++ b/src/netlink.c @@ -2116,6 +2116,114 @@ next: } } +static struct expr *trace_alloc_list(const struct datatype *dtype, + enum byteorder byteorder, + unsigned int len, const void *data) +{ + struct expr *list_expr; + unsigned int i; + mpz_t value; + uint32_t v; + + if (len != sizeof(v)) + return constant_expr_alloc(&netlink_location, + dtype, byteorder, + len * BITS_PER_BYTE, data); + + list_expr = list_expr_alloc(&netlink_location); + + mpz_init2(value, 32); + mpz_import_data(value, data, byteorder, len); + v = mpz_get_uint32(value); + if (v == 0) { + mpz_clear(value); + return NULL; + } + + for (i = 0; i < 32; i++) { + uint32_t bitv = v & (1 << i); + + if (bitv == 0) + continue; + + compound_expr_add(list_expr, + constant_expr_alloc(&netlink_location, + dtype, byteorder, + len * BITS_PER_BYTE, + &bitv)); + } + + mpz_clear(value); + return list_expr; +} + +static void trace_print_ct_expr(const struct nftnl_trace *nlt, unsigned int attr, + enum nft_ct_keys key, struct output_ctx *octx) +{ + struct expr *lhs, *rhs, *rel; + const void *data; + uint32_t len; + + data = nftnl_trace_get_data(nlt, attr, &len); + lhs = ct_expr_alloc(&netlink_location, key, -1); + + switch (key) { + case NFT_CT_STATUS: + rhs = trace_alloc_list(lhs->dtype, lhs->byteorder, len, data); + if (!rhs) { + expr_free(lhs); + return; + } + rel = binop_expr_alloc(&netlink_location, OP_IMPLICIT, lhs, rhs); + break; + case NFT_CT_DIRECTION: + case NFT_CT_STATE: + case NFT_CT_ID: + /* fallthrough */ + default: + rhs = constant_expr_alloc(&netlink_location, + lhs->dtype, lhs->byteorder, + len * BITS_PER_BYTE, data); + rel = relational_expr_alloc(&netlink_location, OP_IMPLICIT, lhs, rhs); + break; + } + + expr_print(rel, octx); + nft_print(octx, " "); + expr_free(rel); +} + +static void trace_print_ct(const struct nftnl_trace *nlt, + struct output_ctx *octx) +{ + bool ct = nftnl_trace_is_set(nlt, NFTNL_TRACE_CT_STATE); + + if (!ct) + return; + + trace_print_hdr(nlt, octx); + + nft_print(octx, "conntrack: "); + + if (nftnl_trace_is_set(nlt, NFTNL_TRACE_CT_DIRECTION)) + trace_print_ct_expr(nlt, NFTNL_TRACE_CT_DIRECTION, + NFT_CT_DIRECTION, octx); + + if (nftnl_trace_is_set(nlt, NFTNL_TRACE_CT_STATE)) + trace_print_ct_expr(nlt, NFTNL_TRACE_CT_STATE, + NFT_CT_STATE, octx); + + if (nftnl_trace_is_set(nlt, NFTNL_TRACE_CT_STATUS)) + trace_print_ct_expr(nlt, NFTNL_TRACE_CT_STATUS, + NFT_CT_STATUS, octx); + + if (nftnl_trace_is_set(nlt, NFTNL_TRACE_CT_ID)) + trace_print_ct_expr(nlt, NFTNL_TRACE_CT_ID, + NFT_CT_ID, octx); + + nft_print(octx, "\n"); +} + static void trace_print_packet(const struct nftnl_trace *nlt, struct output_ctx *octx) { @@ -2127,6 +2235,8 @@ static void trace_print_packet(const struct nftnl_trace *nlt, uint32_t nfproto; struct stmt *stmt, *next; + trace_print_ct(nlt, octx); + trace_print_hdr(nlt, octx); nft_print(octx, "packet: ");
Upcoming kernel change provides the packets conntrack state in the trace message data. This allows to see if packet is seen as original or reply, the conntrack state (new, establieshed, related) and the status bits which show if e.g. NAT was applied. Alsoi include conntrack ID so users can use conntrack tool to query the kernel for more information via ctnetlink. This improves debugging when e.g. packets do not pick up the expected NAT mapping, which could e.g. also happen because of expectations following the NAT binding of the owning conntrack entry. Example output ("conntrack: " lines are new): trace id 32 t PRE_RAW packet: iif "enp0s3" ether saddr [..] trace id 32 t PRE_RAW rule tcp flags syn meta nftrace set 1 (verdict continue) trace id 32 t PRE_RAW policy accept trace id 32 t PRE_MANGLE conntrack: ct direction original ct state new ct id 2641368242 trace id 32 t PRE_MANGLE packet: iif "enp0s3" ether saddr [..] trace id 32 t ct_new_pre rule jump rpfilter (verdict jump rpfilter) trace id 32 t PRE_MANGLE policy accept trace id 32 t INPUT conntrack: ct direction original ct state new ct status dnat-done ct id 2641368242 trace id 32 t INPUT packet: iif "enp0s3" [..] trace id 32 t public_in rule tcp dport 443 accept (verdict accept) Signed-off-by: Florian Westphal <fw@strlen.de> --- src/ct.c | 4 ++ src/netlink.c | 110 ++++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 114 insertions(+)