Message ID | 1448359331-12692-7-git-send-email-fw@strlen.de |
---|---|
State | Changes Requested |
Delegated to: | Pablo Neira |
Headers | show |
On 24.11, Florian Westphal wrote: > nft monitor [ trace ] > > ... can now display nftables nftrace debug information. > > $ nft rule bridge raw prerouting add tcp dport 22 limit rate 1/second meta nftrace set 1 > $ nft monitor trace > trace id 834dd100 bridge packet src 5e:95:99:72:ea:c5 dst 52:54:40:a2:3f:a6 src 192.168.7.1 dst 192.168.7.11 len 88 ttl 64 id 2719 protocol 6 sport 3628 dport 22 iif eth0 > trace id 834dd100 bridge raw prerouting rule verdict continue iif eth0 > trace id 834dd100 rule tcp dport ssh limit rate 1/second nftrace set 1 > trace id 834dd100 bridge raw prerouting policy verdict accept iif eth0 > trace id 834dd100 ip filter input rule verdict jump iif br0 > trace id 834dd100 rule ip saddr . tcp dport vmap { } > trace id 834dd100 ip filter test rule verdict accept iif br0 > trace id 834dd100 rule accept I like this *a lot*. No need for external tools and a much more readable output than using any of the other logging mechanisms. Nice work! > +static void trace_print_if(const struct nftnl_trace *nlt, uint16_t attr, const char *str) > +{ > + char __name[IFNAMSIZ]; > + const char *ifname; > + > + if (!nftnl_trace_is_set(nlt, attr)) > + return; > + > + ifname = nft_if_indextoname(nftnl_trace_get_u32(nlt, attr), __name); > + if (ifname) > + printf(" %s %s", str, ifname); > + else > + printf(" %s %d", str, nftnl_trace_get_u32(nlt, attr)); > +} A lot of the other trace attributes are not used so far. I'm wondering if you intend to add special print functions for them as well. An alternative would be to use our internal datatypes, IOW parse the attributes, associate the values with an internal type and use the regular printing functions. The benefit would be fully consistent output, also with respect to output options like numerical output. > +static int do_command_monitor(struct netlink_ctx *ctx, struct cmd *cmd) > +{ > + struct table *t; > + struct set *s; > + struct netlink_mon_handler monhandler; > + > + monhandler.cache_needed = need_cache(cmd); > if (monhandler.cache_needed) { > + struct rule *rule, *nrule; > + struct chain *chain; > + int ret; > + > list_for_each_entry(t, &table_list, list) { > list_for_each_entry(s, &t->sets, list) > s->init = set_expr_alloc(&cmd->location); > + > + if (!(cmd->monitor->flags & (1 << NFT_MSG_TRACE))) > + continue; > + > + /* When tracing we'd like to translate the rule handle > + * we receive in the trace messages to the actual rule > + * struct to print that out. Populate rule cache now. > + */ Tracing might be a long running operation. The cache can go out of sync, might be better to do a lookup on demand. Right now the caching infrastrucure has quite a lot of problems and I'd prefer to get them fixed before we base new things on it. -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Patrick McHardy <kaber@trash.net> wrote: > > +static void trace_print_if(const struct nftnl_trace *nlt, uint16_t attr, const char *str) > > +{ > > + char __name[IFNAMSIZ]; > > + const char *ifname; > > + > > + if (!nftnl_trace_is_set(nlt, attr)) > > + return; > > + > > + ifname = nft_if_indextoname(nftnl_trace_get_u32(nlt, attr), __name); > > + if (ifname) > > + printf(" %s %s", str, ifname); > > + else > > + printf(" %s %d", str, nftnl_trace_get_u32(nlt, attr)); > > +} > > A lot of the other trace attributes are not used so far. I'm wondering if > you intend to add special print functions for them as well. All of the trace attributes are used, most of the use is limited to libnftnl though. Wrt. to iif/oif, I do the printing for it in nftables to take advantage of the iif/oif idx<->name cache. I would have preferred to just stick it into libnftnl like most of the other rest BUT then you either have the additional name translation overhead or just the index number... > An alternative would be to use our internal datatypes, IOW parse the > attributes, associate the values with an internal type and use the regular > printing functions. The benefit would be fully consistent output, also > with respect to output options like numerical output. Yes, right now virtually all of the printing is in libnftnl (called from nftables via nftnl_trace_fprintf() ). > > +static int do_command_monitor(struct netlink_ctx *ctx, struct cmd *cmd) > > +{ > > + struct table *t; > > + struct set *s; > > + struct netlink_mon_handler monhandler; > > + > > + monhandler.cache_needed = need_cache(cmd); > > if (monhandler.cache_needed) { > > + struct rule *rule, *nrule; > > + struct chain *chain; > > + int ret; > > + > > list_for_each_entry(t, &table_list, list) { > > list_for_each_entry(s, &t->sets, list) > > s->init = set_expr_alloc(&cmd->location); > > + > > + if (!(cmd->monitor->flags & (1 << NFT_MSG_TRACE))) > > + continue; > > + > > + /* When tracing we'd like to translate the rule handle > > + * we receive in the trace messages to the actual rule > > + * struct to print that out. Populate rule cache now. > > + */ > > Tracing might be a long running operation. The cache can go out of sync, might > be better to do a lookup on demand. Hmm, okay. -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Nov 24, 2015 at 10:25:54AM +0000, Patrick McHardy wrote: > Tracing might be a long running operation. The cache can go out of sync, might > be better to do a lookup on demand. We'll need to handle generations in that approach. The kernel lookup per trace will be expensive. Why not just keep the cache in userspace and update it only when needed? We can easily detect when we get out of sync via ENOBUFS. > Right now the caching infrastrucure has quite a lot of problems and I'd prefer > to get them fixed before we base new things on it. The caching infrastructure only needs to have a mode to be populated via set information, then infer existing tables from handles as you indicated. What other problems you see with it? -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 24.11, Florian Westphal wrote: > Patrick McHardy <kaber@trash.net> wrote: > > > +static void trace_print_if(const struct nftnl_trace *nlt, uint16_t attr, const char *str) > > > +{ > > > + char __name[IFNAMSIZ]; > > > + const char *ifname; > > > + > > > + if (!nftnl_trace_is_set(nlt, attr)) > > > + return; > > > + > > > + ifname = nft_if_indextoname(nftnl_trace_get_u32(nlt, attr), __name); > > > + if (ifname) > > > + printf(" %s %s", str, ifname); > > > + else > > > + printf(" %s %d", str, nftnl_trace_get_u32(nlt, attr)); > > > +} > > > > A lot of the other trace attributes are not used so far. I'm wondering if > > you intend to add special print functions for them as well. > > All of the trace attributes are used, most of the use is limited to > libnftnl though. > > Wrt. to iif/oif, I do the printing for it in nftables to take advantage > of the iif/oif idx<->name cache. > > I would have preferred to just stick it into libnftnl like most of the > other rest BUT then you either have the additional name translation overhead > or just the index number... I see. In that case this seems fine, no need to make it more complicated just for this single attribute. > > An alternative would be to use our internal datatypes, IOW parse the > > attributes, associate the values with an internal type and use the regular > > printing functions. The benefit would be fully consistent output, also > > with respect to output options like numerical output. > > Yes, right now virtually all of the printing is in libnftnl > (called from nftables via nftnl_trace_fprintf() ). The downside is that we're stuck to fprintf. I'm working on moving nft to printing to buffers for pretty printing, intending set contents, sorting flow tables in different dimensions etc. If we have external printing we'll be missing some flexibilty in handling it, also regarding logging to other outputs than stdout. > > > + if (!(cmd->monitor->flags & (1 << NFT_MSG_TRACE))) > > > + continue; > > > + > > > + /* When tracing we'd like to translate the rule handle > > > + * we receive in the trace messages to the actual rule > > > + * struct to print that out. Populate rule cache now. > > > + */ > > > > Tracing might be a long running operation. The cache can go out of sync, might > > be better to do a lookup on demand. > > Hmm, okay. We can certainly also update it when we receive notifications. Not sure what we do about those currently. -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Nov 24, 2015 at 10:58:16AM +0000, Patrick McHardy wrote: > On 24.11, Florian Westphal wrote: > > > An alternative would be to use our internal datatypes, IOW parse the > > > attributes, associate the values with an internal type and use the regular > > > printing functions. The benefit would be fully consistent output, also > > > with respect to output options like numerical output. > > > > Yes, right now virtually all of the printing is in libnftnl > > (called from nftables via nftnl_trace_fprintf() ). > > The downside is that we're stuck to fprintf. I'm working on moving nft to > printing to buffers for pretty printing, intending set contents, sorting > flow tables in different dimensions etc. If we have external printing we'll > be missing some flexibilty in handling it, also regarding logging to other > outputs than stdout. What we did do far is to have the fprintf variant for quick printing, but this should be based on the printing to buffers. Both approaches would be exported. Does this sound good to you? -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 24.11, Pablo Neira Ayuso wrote: > On Tue, Nov 24, 2015 at 10:25:54AM +0000, Patrick McHardy wrote: > > Tracing might be a long running operation. The cache can go out of sync, might > > be better to do a lookup on demand. > > We'll need to handle generations in that approach. The kernel lookup > per trace will be expensive. > > Why not just keep the cache in userspace and update it only when > needed? We can easily detect when we get out of sync via ENOBUFS. > > > Right now the caching infrastrucure has quite a lot of problems and I'd prefer > > to get them fixed before we base new things on it. > > The caching infrastructure only needs to have a mode to be populated > via set information, then infer existing tables from handles as you > indicated. > > What other problems you see with it? Well I keep running into problems with it. We already discussed a few, we're dumping way to much information that we don't need and we're making nft require root even for unpriviledged operations and just testing ruleset syntax. We're basing errors on a cache that might not be up to date. When I list the bridge table, for some reason it lists *all* tables of all families. We're basically doing full dumps of everything in many cases. This will be absolutely killing performance with a big ruleset. AFAIK (did not test) we're only listing sets for the family of the first command, then set cache_initializer to true and skip further updates. When the ruleset refers to multiple families, the contents will not be present but expected. It basically seems like the big hammer approach + some bugs instead of selectively getting what we need when we need it and making sure its up to date, at least before generating errors based on it. -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 24.11, Pablo Neira Ayuso wrote: > On Tue, Nov 24, 2015 at 10:58:16AM +0000, Patrick McHardy wrote: > > On 24.11, Florian Westphal wrote: > > > > An alternative would be to use our internal datatypes, IOW parse the > > > > attributes, associate the values with an internal type and use the regular > > > > printing functions. The benefit would be fully consistent output, also > > > > with respect to output options like numerical output. > > > > > > Yes, right now virtually all of the printing is in libnftnl > > > (called from nftables via nftnl_trace_fprintf() ). > > > > The downside is that we're stuck to fprintf. I'm working on moving nft to > > printing to buffers for pretty printing, intending set contents, sorting > > flow tables in different dimensions etc. If we have external printing we'll > > be missing some flexibilty in handling it, also regarding logging to other > > outputs than stdout. > > What we did do far is to have the fprintf variant for quick printing, > but this should be based on the printing to buffers. Both approaches > would be exported. > > Does this sound good to you? I'm completely fine with the fprintf printing *for libnfntnl*, but I'm questing whether nft should make use of it for its own output asides from debugging netlink operations. I also wouldn't consider the libnftnl output stable in any form, the fprintf variant it basically something for debugging, at least I've always seen it that way. So yeah, I'm not opposed to having it, just to using it for this particular purpose. -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Nov 24, 2015 at 11:04:29AM +0000, Patrick McHardy wrote: > On 24.11, Pablo Neira Ayuso wrote: > > On Tue, Nov 24, 2015 at 10:25:54AM +0000, Patrick McHardy wrote: > > > Tracing might be a long running operation. The cache can go out of sync, might > > > be better to do a lookup on demand. > > > > We'll need to handle generations in that approach. The kernel lookup > > per trace will be expensive. > > > > Why not just keep the cache in userspace and update it only when > > needed? We can easily detect when we get out of sync via ENOBUFS. > > > > > Right now the caching infrastrucure has quite a lot of problems and I'd prefer > > > to get them fixed before we base new things on it. > > > > The caching infrastructure only needs to have a mode to be populated > > via set information, then infer existing tables from handles as you > > indicated. > > > > What other problems you see with it? > > Well I keep running into problems with it. We already discussed a few, we're > dumping way to much information that we don't need and we're making nft > require root even for unpriviledged operations and just testing ruleset > syntax. > > We're basing errors on a cache that might not be up to date. > > When I list the bridge table, for some reason it lists *all* tables of all > families. We're basically doing full dumps of everything in many cases. > This will be absolutely killing performance with a big ruleset. OK, I'll be refining this to allow selective dumps. > AFAIK (did not test) we're only listing sets for the family of the first > command, then set cache_initializer to true and skip further updates. When > the ruleset refers to multiple families, the contents will not be present > but expected. This may be related to the kernel patches I have to send to use generations from the dump path. > It basically seems like the big hammer approach + some bugs instead of > selectively getting what we need when we need it and making sure its > up to date, at least before generating errors based on it. Selective dumping is good to have indeed, I'm willing to refine this. But regarding event tracing, I think it's good to keep a cache in userspace that we update based on the events that we receive, then resync if we hit ENOBUFS, instead of inquiring the kernel for every trace. So I think Florian can get this in. I'll be resolving the existing caching issues after him as this is not related to his work. Fine with this approach? -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Nov 24, 2015 at 11:07:41AM +0000, Patrick McHardy wrote: > On 24.11, Pablo Neira Ayuso wrote: > > On Tue, Nov 24, 2015 at 10:58:16AM +0000, Patrick McHardy wrote: > > > On 24.11, Florian Westphal wrote: > > > > > An alternative would be to use our internal datatypes, IOW parse the > > > > > attributes, associate the values with an internal type and use the regular > > > > > printing functions. The benefit would be fully consistent output, also > > > > > with respect to output options like numerical output. > > > > > > > > Yes, right now virtually all of the printing is in libnftnl > > > > (called from nftables via nftnl_trace_fprintf() ). > > > > > > The downside is that we're stuck to fprintf. I'm working on moving nft to > > > printing to buffers for pretty printing, intending set contents, sorting > > > flow tables in different dimensions etc. If we have external printing we'll > > > be missing some flexibilty in handling it, also regarding logging to other > > > outputs than stdout. > > > > What we did do far is to have the fprintf variant for quick printing, > > but this should be based on the printing to buffers. Both approaches > > would be exported. > > > > Does this sound good to you? > > I'm completely fine with the fprintf printing *for libnfntnl*, but I'm > questing whether nft should make use of it for its own output asides from > debugging netlink operations. I also wouldn't consider the libnftnl output > stable in any form, the fprintf variant it basically something for debugging, > at least I've always seen it that way. Oh I see, you were refering to nft, that makes sense to me. > So yeah, I'm not opposed to having it, just to using it for this particular > purpose. Thanks for your feedback, sorry if I'm insisting much, I just want to make sure we get all the same picture in our heads :) -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Patrick McHardy <kaber@trash.net> wrote: > > Yes, right now virtually all of the printing is in libnftnl > > (called from nftables via nftnl_trace_fprintf() ). > > The downside is that we're stuck to fprintf. I'm working on moving nft to > printing to buffers for pretty printing, intending set contents, sorting > flow tables in different dimensions etc. If we have external printing we'll > be missing some flexibilty in handling it, also regarding logging to other > outputs than stdout. The _fprint variant is a convenience wrapper for nftnl_trace_snprintf. Would that work for you? -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 24.11, Pablo Neira Ayuso wrote: > On Tue, Nov 24, 2015 at 11:04:29AM +0000, Patrick McHardy wrote: > > On 24.11, Pablo Neira Ayuso wrote: > > Well I keep running into problems with it. We already discussed a few, we're > > dumping way to much information that we don't need and we're making nft > > require root even for unpriviledged operations and just testing ruleset > > syntax. > > > > We're basing errors on a cache that might not be up to date. > > > > When I list the bridge table, for some reason it lists *all* tables of all > > families. We're basically doing full dumps of everything in many cases. > > This will be absolutely killing performance with a big ruleset. > > OK, I'll be refining this to allow selective dumps. > > > AFAIK (did not test) we're only listing sets for the family of the first > > command, then set cache_initializer to true and skip further updates. When > > the ruleset refers to multiple families, the contents will not be present > > but expected. > > This may be related to the kernel patches I have to send to use > generations from the dump path. As mentioned, I did not actually test this so far, but its the caching code, it initializes for the first rule and only for that rule and bases its queries on that handle. OTOH the dumping of all tables indicates that it *does* dump more than that. Not sure, would have to test to be sure, but the code looks like it. > > It basically seems like the big hammer approach + some bugs instead of > > selectively getting what we need when we need it and making sure its > > up to date, at least before generating errors based on it. > > Selective dumping is good to have indeed, I'm willing to refine this. > > But regarding event tracing, I think it's good to keep a cache in > userspace that we update based on the events that we receive, then > resync if we hit ENOBUFS, instead of inquiring the kernel for every > trace. Generally I agree. > So I think Florian can get this in. I'll be resolving the existing > caching issues after him as this is not related to his work. > > Fine with this approach? Sure, it we fix this, no question :) -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 24.11, Florian Westphal wrote: > Patrick McHardy <kaber@trash.net> wrote: > > > Yes, right now virtually all of the printing is in libnftnl > > > (called from nftables via nftnl_trace_fprintf() ). > > > > The downside is that we're stuck to fprintf. I'm working on moving nft to > > printing to buffers for pretty printing, intending set contents, sorting > > flow tables in different dimensions etc. If we have external printing we'll > > be missing some flexibilty in handling it, also regarding logging to other > > outputs than stdout. > > The _fprint variant is a convenience wrapper for nftnl_trace_snprintf. > Would that work for you? Probably, but with downsides. The nft internal buffer printing will encapsulate every printed element so we can reorder them based on an abstract meaning, f.i. for pretty printing, where we want to keep related things together and reindent other parts. We can also do filters, sorting on individual parts both numerically and lexically, structured output etc etc, but all that requires to have the invidual parts of the output available seperately. So yeah it would work, and at least the sorting part doesn't make sense for tracing anyways, but it would be a special case with some limitations. -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/include/linux/netfilter/nf_tables.h b/include/linux/netfilter/nf_tables.h index 5ebe3d8..85d739b 100644 --- a/include/linux/netfilter/nf_tables.h +++ b/include/linux/netfilter/nf_tables.h @@ -101,6 +101,7 @@ enum nf_tables_msg_types { NFT_MSG_DELSETELEM, NFT_MSG_NEWGEN, NFT_MSG_GETGEN, + NFT_MSG_TRACE, NFT_MSG_MAX, }; @@ -961,4 +962,33 @@ enum nft_gen_attributes { }; #define NFTA_GEN_MAX (__NFTA_GEN_MAX - 1) +enum nft_trace_attibutes { + NFTA_TRACE_UNSPEC, + NFTA_TRACE_CHAIN, + NFTA_TRACE_ID, + NFTA_TRACE_IIF, + NFTA_TRACE_OIF, + NFTA_TRACE_LL_HEADER, + NFTA_TRACE_LL_TYPE, + NFTA_TRACE_MARK, + NFTA_TRACE_PAYLOAD, + NFTA_TRACE_TABLE, + NFTA_TRACE_TYPE, + NFTA_TRACE_RULE_HANDLE, + NFTA_TRACE_VERDICT, + NFTA_TRACE_VLAN_TAG, + NFTA_TRACE_DEVTYPE, + __NFTA_TRACE_MAX +}; +#define NFTA_TRACE_MAX (__NFTA_TRACE_MAX - 1) + +enum nft_trace_types { + NFT_TRACETYPE_UNSPEC, + NFT_TRACETYPE_PACKET, + NFT_TRACETYPE_POLICY, + NFT_TRACETYPE_RETURN, + NFT_TRACETYPE_RULE, + __NFT_TRACETYPE_MAX +}; +#define NFT_TRACETYPE_MAX (__NFT_TRACETYPE_MAX - 1) #endif /* _LINUX_NF_TABLES_H */ diff --git a/src/evaluate.c b/src/evaluate.c index 48f071f..b1ab8e9 100644 --- a/src/evaluate.c +++ b/src/evaluate.c @@ -2279,6 +2279,7 @@ enum { CMD_MONITOR_EVENT_ANY, CMD_MONITOR_EVENT_NEW, CMD_MONITOR_EVENT_DEL, + CMD_MONITOR_EVENT_TRACE, CMD_MONITOR_EVENT_MAX }; @@ -2320,6 +2321,21 @@ static uint32_t monitor_flags[CMD_MONITOR_EVENT_MAX][CMD_MONITOR_OBJ_MAX] = { [CMD_MONITOR_OBJ_SETS] = (1 << NFT_MSG_DELSET), [CMD_MONITOR_OBJ_ELEMS] = (1 << NFT_MSG_DELSETELEM), }, + [CMD_MONITOR_EVENT_TRACE] = { + [CMD_MONITOR_OBJ_ANY] = (1 << NFT_MSG_NEWTABLE) | + (1 << NFT_MSG_NEWCHAIN) | + (1 << NFT_MSG_NEWRULE) | + (1 << NFT_MSG_DELTABLE) | + (1 << NFT_MSG_DELCHAIN) | + (1 << NFT_MSG_DELRULE) | + (1 << NFT_MSG_TRACE), + [CMD_MONITOR_OBJ_TABLES] = (1 << NFT_MSG_NEWTABLE) | + (1 << NFT_MSG_DELTABLE), + [CMD_MONITOR_OBJ_CHAINS] = (1 << NFT_MSG_NEWCHAIN) | + (1 << NFT_MSG_DELCHAIN), + [CMD_MONITOR_OBJ_RULES] = (1 << NFT_MSG_NEWRULE) | + (1 << NFT_MSG_DELRULE), + }, }; static int cmd_evaluate_monitor(struct eval_ctx *ctx, struct cmd *cmd) @@ -2332,6 +2348,8 @@ static int cmd_evaluate_monitor(struct eval_ctx *ctx, struct cmd *cmd) event = CMD_MONITOR_EVENT_NEW; else if (strcmp(cmd->monitor->event, "destroy") == 0) event = CMD_MONITOR_EVENT_DEL; + else if (strcmp(cmd->monitor->event, "trace") == 0) + event = CMD_MONITOR_EVENT_TRACE; else { return monitor_error(ctx, cmd->monitor, "invalid event %s", cmd->monitor->event); diff --git a/src/netlink.c b/src/netlink.c index 974afb1..8ab7bbe 100644 --- a/src/netlink.c +++ b/src/netlink.c @@ -18,6 +18,7 @@ #include <stdlib.h> #include <libnftnl/table.h> +#include <libnftnl/trace.h> #include <libnftnl/chain.h> #include <libnftnl/expr.h> #include <libnftnl/set.h> @@ -33,6 +34,7 @@ #include <gmputil.h> #include <utils.h> #include <erec.h> +#include <iface.h> static struct mnl_socket *nf_sock; static struct mnl_socket *nf_mon_sock; @@ -2125,6 +2127,85 @@ static void netlink_events_cache_update(struct netlink_mon_handler *monh, } } +static void trace_print_rule(const struct nftnl_trace *nlt) +{ + const struct table *table; + uint64_t rule_handle; + struct chain *chain; + struct handle h; + + h.table = nftnl_trace_get_str(nlt, NFTNL_TRACE_TABLE); + h.chain = nftnl_trace_get_str(nlt, NFTNL_TRACE_CHAIN); + h.family = nftnl_trace_get_u32(nlt, NFTNL_TRACE_FAMILY); + + if (!h.table) + return; + + table = table_lookup(&h); + if (!table) + return; + + chain = chain_lookup(table, &h); + if (!chain) + return; + + if (nftnl_trace_is_set(nlt, NFTNL_TRACE_RULE_HANDLE)) { + struct rule *rule; + + rule_handle = nftnl_trace_get_u64(nlt, NFTNL_TRACE_RULE_HANDLE); + + list_for_each_entry(rule, &chain->rules, list) { + if (rule->handle.handle != rule_handle) + continue; + + printf("\ntrace id %08x rule ", nftnl_trace_get_u32(nlt, NFTNL_TRACE_ID)); + rule_print(rule); + return; + } + } +} + +static void trace_print_if(const struct nftnl_trace *nlt, uint16_t attr, const char *str) +{ + char __name[IFNAMSIZ]; + const char *ifname; + + if (!nftnl_trace_is_set(nlt, attr)) + return; + + ifname = nft_if_indextoname(nftnl_trace_get_u32(nlt, attr), __name); + if (ifname) + printf(" %s %s", str, ifname); + else + printf(" %s %d", str, nftnl_trace_get_u32(nlt, attr)); +} + +static int netlink_events_trace_cb(const struct nlmsghdr *nlh, int type, + struct netlink_mon_handler *monh) +{ + struct nftnl_trace *nlt; + + assert(type == NFT_MSG_TRACE); + + nlt = nftnl_trace_alloc(); + if (!nlt) + memory_allocation_error(); + + if (nftnl_trace_nlmsg_parse(nlh, nlt) < 0) + netlink_abi_error(); + + printf("trace "); + nftnl_trace_fprintf(stdout, nlt, monh->format); + + trace_print_if(nlt, NFTNL_TRACE_IIF, "iif"); + trace_print_if(nlt, NFTNL_TRACE_OIF, "oif"); + + trace_print_rule(nlt); + printf("\n"); + nftnl_trace_free(nlt); + return MNL_CB_OK; +} + static int netlink_events_cb(const struct nlmsghdr *nlh, void *data) { int ret = MNL_CB_OK; @@ -2157,6 +2238,9 @@ static int netlink_events_cb(const struct nlmsghdr *nlh, void *data) case NFT_MSG_DELRULE: ret = netlink_events_rule_cb(nlh, type, monh); break; + case NFT_MSG_TRACE: + ret = netlink_events_trace_cb(nlh, type, monh); + break; } fflush(stdout); diff --git a/src/rule.c b/src/rule.c index 5d3cd84..553990d 100644 --- a/src/rule.c +++ b/src/rule.c @@ -1192,27 +1192,56 @@ static int do_command_rename(struct netlink_ctx *ctx, struct cmd *cmd) return 0; } -static int do_command_monitor(struct netlink_ctx *ctx, struct cmd *cmd) +static bool need_cache(const struct cmd *cmd) { - struct table *t; - struct set *s; - struct netlink_mon_handler monhandler; - - /* cache only needed if monitoring: + /* * - new rules in default format * - new elements */ if (((cmd->monitor->flags & (1 << NFT_MSG_NEWRULE)) && (cmd->monitor->format == NFTNL_OUTPUT_DEFAULT)) || (cmd->monitor->flags & (1 << NFT_MSG_NEWSETELEM))) - monhandler.cache_needed = true; - else - monhandler.cache_needed = false; + return true; + if (cmd->monitor->flags & (1 << NFT_MSG_TRACE)) + return true; + + return false; +} + +static int do_command_monitor(struct netlink_ctx *ctx, struct cmd *cmd) +{ + struct table *t; + struct set *s; + struct netlink_mon_handler monhandler; + + monhandler.cache_needed = need_cache(cmd); if (monhandler.cache_needed) { + struct rule *rule, *nrule; + struct chain *chain; + int ret; + list_for_each_entry(t, &table_list, list) { list_for_each_entry(s, &t->sets, list) s->init = set_expr_alloc(&cmd->location); + + if (!(cmd->monitor->flags & (1 << NFT_MSG_TRACE))) + continue; + + /* When tracing we'd like to translate the rule handle + * we receive in the trace messages to the actual rule + * struct to print that out. Populate rule cache now. + */ + ret = netlink_list_table(ctx, &t->handle, + &internal_location); + + if (ret != 0) /* shouldn't happen && doesn't break things too badly */ + continue; + + list_for_each_entry_safe(rule, nrule, &ctx->list, list) { + chain = chain_lookup(t, &rule->handle); + list_move_tail(&rule->list, &chain->rules); + } } }
nft monitor [ trace ] ... can now display nftables nftrace debug information. $ nft rule bridge raw prerouting add tcp dport 22 limit rate 1/second meta nftrace set 1 $ nft monitor trace trace id 834dd100 bridge packet src 5e:95:99:72:ea:c5 dst 52:54:40:a2:3f:a6 src 192.168.7.1 dst 192.168.7.11 len 88 ttl 64 id 2719 protocol 6 sport 3628 dport 22 iif eth0 trace id 834dd100 bridge raw prerouting rule verdict continue iif eth0 trace id 834dd100 rule tcp dport ssh limit rate 1/second nftrace set 1 trace id 834dd100 bridge raw prerouting policy verdict accept iif eth0 trace id 834dd100 ip filter input rule verdict jump iif br0 trace id 834dd100 rule ip saddr . tcp dport vmap { } trace id 834dd100 ip filter test rule verdict accept iif br0 trace id 834dd100 rule accept based on an initial patch from Markus Kötter. Signed-off-by: Florian Westphal <fw@strlen.de> --- include/linux/netfilter/nf_tables.h | 30 +++++++++++++ src/evaluate.c | 18 ++++++++ src/netlink.c | 84 +++++++++++++++++++++++++++++++++++++ src/rule.c | 47 +++++++++++++++++---- 4 files changed, 170 insertions(+), 9 deletions(-)