diff mbox

[nftables,6/6] src: add trace support to nft monitor mode

Message ID 1448359331-12692-7-git-send-email-fw@strlen.de
State Changes Requested
Delegated to: Pablo Neira
Headers show

Commit Message

Florian Westphal Nov. 24, 2015, 10:02 a.m. UTC
nft monitor [ trace ]

... can now display nftables nftrace debug information.

$ nft rule bridge raw prerouting add tcp dport 22 limit rate 1/second meta nftrace set 1
$ nft monitor trace
trace id 834dd100 bridge packet src 5e:95:99:72:ea:c5 dst 52:54:40:a2:3f:a6 src 192.168.7.1 dst 192.168.7.11 len 88 ttl 64 id 2719 protocol 6 sport 3628 dport 22 iif eth0
trace id 834dd100 bridge raw prerouting rule verdict continue iif eth0
trace id 834dd100 rule tcp dport ssh limit rate 1/second nftrace set 1
trace id 834dd100 bridge raw prerouting policy verdict accept iif eth0
trace id 834dd100 ip filter input rule verdict jump iif br0
trace id 834dd100 rule ip saddr . tcp dport vmap { }
trace id 834dd100 ip filter test rule verdict accept iif br0
trace id 834dd100 rule accept

based on an initial patch from Markus Kötter.

Signed-off-by: Florian Westphal <fw@strlen.de>
---
 include/linux/netfilter/nf_tables.h | 30 +++++++++++++
 src/evaluate.c                      | 18 ++++++++
 src/netlink.c                       | 84 +++++++++++++++++++++++++++++++++++++
 src/rule.c                          | 47 +++++++++++++++++----
 4 files changed, 170 insertions(+), 9 deletions(-)

Comments

Patrick McHardy Nov. 24, 2015, 10:25 a.m. UTC | #1
On 24.11, Florian Westphal wrote:
> nft monitor [ trace ]
> 
> ... can now display nftables nftrace debug information.
> 
> $ nft rule bridge raw prerouting add tcp dport 22 limit rate 1/second meta nftrace set 1
> $ nft monitor trace
> trace id 834dd100 bridge packet src 5e:95:99:72:ea:c5 dst 52:54:40:a2:3f:a6 src 192.168.7.1 dst 192.168.7.11 len 88 ttl 64 id 2719 protocol 6 sport 3628 dport 22 iif eth0
> trace id 834dd100 bridge raw prerouting rule verdict continue iif eth0
> trace id 834dd100 rule tcp dport ssh limit rate 1/second nftrace set 1
> trace id 834dd100 bridge raw prerouting policy verdict accept iif eth0
> trace id 834dd100 ip filter input rule verdict jump iif br0
> trace id 834dd100 rule ip saddr . tcp dport vmap { }
> trace id 834dd100 ip filter test rule verdict accept iif br0
> trace id 834dd100 rule accept

I like this *a lot*. No need for external tools and a much more readable
output than using any of the other logging mechanisms. Nice work!

> +static void trace_print_if(const struct nftnl_trace *nlt, uint16_t attr, const char *str)
> +{
> +	char __name[IFNAMSIZ];
> +	const char *ifname;
> +
> +        if (!nftnl_trace_is_set(nlt, attr))
> +		return;
> +
> +	ifname = nft_if_indextoname(nftnl_trace_get_u32(nlt, attr), __name);
> +	if (ifname)
> +		printf(" %s %s", str, ifname);
> +	else
> +		printf(" %s %d", str, nftnl_trace_get_u32(nlt, attr));
> +}

A lot of the other trace attributes are not used so far. I'm wondering if
you intend to add special print functions for them as well.

An alternative would be to use our internal datatypes, IOW parse the
attributes, associate the values with an internal type and use the regular
printing functions. The benefit would be fully consistent output, also
with respect to output options like numerical output.

> +static int do_command_monitor(struct netlink_ctx *ctx, struct cmd *cmd)
> +{
> +	struct table *t;
> +	struct set *s;
> +	struct netlink_mon_handler monhandler;
> +
> +	monhandler.cache_needed = need_cache(cmd);
>  	if (monhandler.cache_needed) {
> +		struct rule *rule, *nrule;
> +		struct chain *chain;
> +		int ret;
> +
>  		list_for_each_entry(t, &table_list, list) {
>  			list_for_each_entry(s, &t->sets, list)
>  				s->init = set_expr_alloc(&cmd->location);
> +
> +			if (!(cmd->monitor->flags & (1 << NFT_MSG_TRACE)))
> +				continue;
> +
> +			/* When tracing we'd like to translate the rule handle
> +			 * we receive in the trace messages to the actual rule
> +			 * struct to print that out.  Populate rule cache now.
> +			 */

Tracing might be a long running operation. The cache can go out of sync, might
be better to do a lookup on demand.

Right now the caching infrastrucure has quite a lot of problems and I'd prefer
to get them fixed before we base new things on it.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Florian Westphal Nov. 24, 2015, 10:48 a.m. UTC | #2
Patrick McHardy <kaber@trash.net> wrote:
> > +static void trace_print_if(const struct nftnl_trace *nlt, uint16_t attr, const char *str)
> > +{
> > +	char __name[IFNAMSIZ];
> > +	const char *ifname;
> > +
> > +        if (!nftnl_trace_is_set(nlt, attr))
> > +		return;
> > +
> > +	ifname = nft_if_indextoname(nftnl_trace_get_u32(nlt, attr), __name);
> > +	if (ifname)
> > +		printf(" %s %s", str, ifname);
> > +	else
> > +		printf(" %s %d", str, nftnl_trace_get_u32(nlt, attr));
> > +}
> 
> A lot of the other trace attributes are not used so far. I'm wondering if
> you intend to add special print functions for them as well.

All of the trace attributes are used, most of the use is limited to
libnftnl though.

Wrt. to iif/oif, I do the printing for it in nftables to take advantage
of the iif/oif idx<->name cache.

I would have preferred to just stick it into libnftnl like most of the
other rest BUT then you either have the additional name translation overhead
or just the index number...

> An alternative would be to use our internal datatypes, IOW parse the
> attributes, associate the values with an internal type and use the regular
> printing functions. The benefit would be fully consistent output, also
> with respect to output options like numerical output.

Yes, right now virtually all of the printing is in libnftnl
(called from nftables via nftnl_trace_fprintf() ).

> > +static int do_command_monitor(struct netlink_ctx *ctx, struct cmd *cmd)
> > +{
> > +	struct table *t;
> > +	struct set *s;
> > +	struct netlink_mon_handler monhandler;
> > +
> > +	monhandler.cache_needed = need_cache(cmd);
> >  	if (monhandler.cache_needed) {
> > +		struct rule *rule, *nrule;
> > +		struct chain *chain;
> > +		int ret;
> > +
> >  		list_for_each_entry(t, &table_list, list) {
> >  			list_for_each_entry(s, &t->sets, list)
> >  				s->init = set_expr_alloc(&cmd->location);
> > +
> > +			if (!(cmd->monitor->flags & (1 << NFT_MSG_TRACE)))
> > +				continue;
> > +
> > +			/* When tracing we'd like to translate the rule handle
> > +			 * we receive in the trace messages to the actual rule
> > +			 * struct to print that out.  Populate rule cache now.
> > +			 */
> 
> Tracing might be a long running operation. The cache can go out of sync, might
> be better to do a lookup on demand.

Hmm, okay.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Pablo Neira Ayuso Nov. 24, 2015, 10:53 a.m. UTC | #3
On Tue, Nov 24, 2015 at 10:25:54AM +0000, Patrick McHardy wrote:
> Tracing might be a long running operation. The cache can go out of sync, might
> be better to do a lookup on demand.

We'll need to handle generations in that approach. The kernel lookup
per trace will be expensive.

Why not just keep the cache in userspace and update it only when
needed? We can easily detect when we get out of sync via ENOBUFS.

> Right now the caching infrastrucure has quite a lot of problems and I'd prefer
> to get them fixed before we base new things on it.

The caching infrastructure only needs to have a mode to be populated
via set information, then infer existing tables from handles as you
indicated.

What other problems you see with it?
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Patrick McHardy Nov. 24, 2015, 10:58 a.m. UTC | #4
On 24.11, Florian Westphal wrote:
> Patrick McHardy <kaber@trash.net> wrote:
> > > +static void trace_print_if(const struct nftnl_trace *nlt, uint16_t attr, const char *str)
> > > +{
> > > +	char __name[IFNAMSIZ];
> > > +	const char *ifname;
> > > +
> > > +        if (!nftnl_trace_is_set(nlt, attr))
> > > +		return;
> > > +
> > > +	ifname = nft_if_indextoname(nftnl_trace_get_u32(nlt, attr), __name);
> > > +	if (ifname)
> > > +		printf(" %s %s", str, ifname);
> > > +	else
> > > +		printf(" %s %d", str, nftnl_trace_get_u32(nlt, attr));
> > > +}
> > 
> > A lot of the other trace attributes are not used so far. I'm wondering if
> > you intend to add special print functions for them as well.
> 
> All of the trace attributes are used, most of the use is limited to
> libnftnl though.
> 
> Wrt. to iif/oif, I do the printing for it in nftables to take advantage
> of the iif/oif idx<->name cache.
> 
> I would have preferred to just stick it into libnftnl like most of the
> other rest BUT then you either have the additional name translation overhead
> or just the index number...

I see. In that case this seems fine, no need to make it more complicated just
for this single attribute.

> > An alternative would be to use our internal datatypes, IOW parse the
> > attributes, associate the values with an internal type and use the regular
> > printing functions. The benefit would be fully consistent output, also
> > with respect to output options like numerical output.
> 
> Yes, right now virtually all of the printing is in libnftnl
> (called from nftables via nftnl_trace_fprintf() ).

The downside is that we're stuck to fprintf. I'm working on moving nft to
printing to buffers for pretty printing, intending set contents, sorting
flow tables in different dimensions etc. If we have external printing we'll
be missing some flexibilty in handling it, also regarding logging to other
outputs than stdout.

> > > +			if (!(cmd->monitor->flags & (1 << NFT_MSG_TRACE)))
> > > +				continue;
> > > +
> > > +			/* When tracing we'd like to translate the rule handle
> > > +			 * we receive in the trace messages to the actual rule
> > > +			 * struct to print that out.  Populate rule cache now.
> > > +			 */
> > 
> > Tracing might be a long running operation. The cache can go out of sync, might
> > be better to do a lookup on demand.
> 
> Hmm, okay.

We can certainly also update it when we receive notifications. Not sure what
we do about those currently.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Pablo Neira Ayuso Nov. 24, 2015, 11:01 a.m. UTC | #5
On Tue, Nov 24, 2015 at 10:58:16AM +0000, Patrick McHardy wrote:
> On 24.11, Florian Westphal wrote:
> > > An alternative would be to use our internal datatypes, IOW parse the
> > > attributes, associate the values with an internal type and use the regular
> > > printing functions. The benefit would be fully consistent output, also
> > > with respect to output options like numerical output.
> > 
> > Yes, right now virtually all of the printing is in libnftnl
> > (called from nftables via nftnl_trace_fprintf() ).
> 
> The downside is that we're stuck to fprintf. I'm working on moving nft to
> printing to buffers for pretty printing, intending set contents, sorting
> flow tables in different dimensions etc. If we have external printing we'll
> be missing some flexibilty in handling it, also regarding logging to other
> outputs than stdout.

What we did do far is to have the fprintf variant for quick printing,
but this should be based on the printing to buffers. Both approaches
would be exported.

Does this sound good to you?
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Patrick McHardy Nov. 24, 2015, 11:04 a.m. UTC | #6
On 24.11, Pablo Neira Ayuso wrote:
> On Tue, Nov 24, 2015 at 10:25:54AM +0000, Patrick McHardy wrote:
> > Tracing might be a long running operation. The cache can go out of sync, might
> > be better to do a lookup on demand.
> 
> We'll need to handle generations in that approach. The kernel lookup
> per trace will be expensive.
> 
> Why not just keep the cache in userspace and update it only when
> needed? We can easily detect when we get out of sync via ENOBUFS.
> 
> > Right now the caching infrastrucure has quite a lot of problems and I'd prefer
> > to get them fixed before we base new things on it.
> 
> The caching infrastructure only needs to have a mode to be populated
> via set information, then infer existing tables from handles as you
> indicated.
> 
> What other problems you see with it?

Well I keep running into problems with it. We already discussed a few, we're
dumping way to much information that we don't need and we're making nft
require root even for unpriviledged operations and just testing ruleset
syntax.

We're basing errors on a cache that might not be up to date.

When I list the bridge table, for some reason it lists *all* tables of all
families. We're basically doing full dumps of everything in many cases.
This will be absolutely killing performance with a big ruleset.

AFAIK (did not test) we're only listing sets for the family of the first
command, then set cache_initializer to true and skip further updates. When
the ruleset refers to multiple families, the contents will not be present
but expected.

It basically seems like the big hammer approach + some bugs instead of
selectively getting what we need when we need it and making sure its
up to date, at least before generating errors based on it.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Patrick McHardy Nov. 24, 2015, 11:07 a.m. UTC | #7
On 24.11, Pablo Neira Ayuso wrote:
> On Tue, Nov 24, 2015 at 10:58:16AM +0000, Patrick McHardy wrote:
> > On 24.11, Florian Westphal wrote:
> > > > An alternative would be to use our internal datatypes, IOW parse the
> > > > attributes, associate the values with an internal type and use the regular
> > > > printing functions. The benefit would be fully consistent output, also
> > > > with respect to output options like numerical output.
> > > 
> > > Yes, right now virtually all of the printing is in libnftnl
> > > (called from nftables via nftnl_trace_fprintf() ).
> > 
> > The downside is that we're stuck to fprintf. I'm working on moving nft to
> > printing to buffers for pretty printing, intending set contents, sorting
> > flow tables in different dimensions etc. If we have external printing we'll
> > be missing some flexibilty in handling it, also regarding logging to other
> > outputs than stdout.
> 
> What we did do far is to have the fprintf variant for quick printing,
> but this should be based on the printing to buffers. Both approaches
> would be exported.
> 
> Does this sound good to you?

I'm completely fine with the fprintf printing *for libnfntnl*, but I'm
questing whether nft should make use of it for its own output asides from
debugging netlink operations. I also wouldn't consider the libnftnl output
stable in any form, the fprintf variant it basically something for debugging,
at least I've always seen it that way.

So yeah, I'm not opposed to having it, just to using it for this particular
purpose.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Pablo Neira Ayuso Nov. 24, 2015, 11:12 a.m. UTC | #8
On Tue, Nov 24, 2015 at 11:04:29AM +0000, Patrick McHardy wrote:
> On 24.11, Pablo Neira Ayuso wrote:
> > On Tue, Nov 24, 2015 at 10:25:54AM +0000, Patrick McHardy wrote:
> > > Tracing might be a long running operation. The cache can go out of sync, might
> > > be better to do a lookup on demand.
> > 
> > We'll need to handle generations in that approach. The kernel lookup
> > per trace will be expensive.
> > 
> > Why not just keep the cache in userspace and update it only when
> > needed? We can easily detect when we get out of sync via ENOBUFS.
> > 
> > > Right now the caching infrastrucure has quite a lot of problems and I'd prefer
> > > to get them fixed before we base new things on it.
> > 
> > The caching infrastructure only needs to have a mode to be populated
> > via set information, then infer existing tables from handles as you
> > indicated.
> > 
> > What other problems you see with it?
> 
> Well I keep running into problems with it. We already discussed a few, we're
> dumping way to much information that we don't need and we're making nft
> require root even for unpriviledged operations and just testing ruleset
> syntax.
> 
> We're basing errors on a cache that might not be up to date.
> 
> When I list the bridge table, for some reason it lists *all* tables of all
> families. We're basically doing full dumps of everything in many cases.
> This will be absolutely killing performance with a big ruleset.

OK, I'll be refining this to allow selective dumps.

> AFAIK (did not test) we're only listing sets for the family of the first
> command, then set cache_initializer to true and skip further updates. When
> the ruleset refers to multiple families, the contents will not be present
> but expected.

This may be related to the kernel patches I have to send to use
generations from the dump path.

> It basically seems like the big hammer approach + some bugs instead of
> selectively getting what we need when we need it and making sure its
> up to date, at least before generating errors based on it.

Selective dumping is good to have indeed, I'm willing to refine this.

But regarding event tracing, I think it's good to keep a cache in
userspace that we update based on the events that we receive, then
resync if we hit ENOBUFS, instead of inquiring the kernel for every
trace.

So I think Florian can get this in. I'll be resolving the existing
caching issues after him as this is not related to his work.

Fine with this approach?
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Pablo Neira Ayuso Nov. 24, 2015, 11:14 a.m. UTC | #9
On Tue, Nov 24, 2015 at 11:07:41AM +0000, Patrick McHardy wrote:
> On 24.11, Pablo Neira Ayuso wrote:
> > On Tue, Nov 24, 2015 at 10:58:16AM +0000, Patrick McHardy wrote:
> > > On 24.11, Florian Westphal wrote:
> > > > > An alternative would be to use our internal datatypes, IOW parse the
> > > > > attributes, associate the values with an internal type and use the regular
> > > > > printing functions. The benefit would be fully consistent output, also
> > > > > with respect to output options like numerical output.
> > > > 
> > > > Yes, right now virtually all of the printing is in libnftnl
> > > > (called from nftables via nftnl_trace_fprintf() ).
> > > 
> > > The downside is that we're stuck to fprintf. I'm working on moving nft to
> > > printing to buffers for pretty printing, intending set contents, sorting
> > > flow tables in different dimensions etc. If we have external printing we'll
> > > be missing some flexibilty in handling it, also regarding logging to other
> > > outputs than stdout.
> > 
> > What we did do far is to have the fprintf variant for quick printing,
> > but this should be based on the printing to buffers. Both approaches
> > would be exported.
> > 
> > Does this sound good to you?
> 
> I'm completely fine with the fprintf printing *for libnfntnl*, but I'm
> questing whether nft should make use of it for its own output asides from
> debugging netlink operations. I also wouldn't consider the libnftnl output
> stable in any form, the fprintf variant it basically something for debugging,
> at least I've always seen it that way.

Oh I see, you were refering to nft, that makes sense to me.

> So yeah, I'm not opposed to having it, just to using it for this particular
> purpose.

Thanks for your feedback, sorry if I'm insisting much, I just want to
make sure we get all the same picture in our heads :)
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Florian Westphal Nov. 24, 2015, 11:14 a.m. UTC | #10
Patrick McHardy <kaber@trash.net> wrote:
> > Yes, right now virtually all of the printing is in libnftnl
> > (called from nftables via nftnl_trace_fprintf() ).
> 
> The downside is that we're stuck to fprintf. I'm working on moving nft to
> printing to buffers for pretty printing, intending set contents, sorting
> flow tables in different dimensions etc. If we have external printing we'll
> be missing some flexibilty in handling it, also regarding logging to other
> outputs than stdout.

The _fprint variant is a convenience wrapper for nftnl_trace_snprintf.
Would that work for you?

--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Patrick McHardy Nov. 24, 2015, 11:36 a.m. UTC | #11
On 24.11, Pablo Neira Ayuso wrote:
> On Tue, Nov 24, 2015 at 11:04:29AM +0000, Patrick McHardy wrote:
> > On 24.11, Pablo Neira Ayuso wrote:
> > Well I keep running into problems with it. We already discussed a few, we're
> > dumping way to much information that we don't need and we're making nft
> > require root even for unpriviledged operations and just testing ruleset
> > syntax.
> > 
> > We're basing errors on a cache that might not be up to date.
> > 
> > When I list the bridge table, for some reason it lists *all* tables of all
> > families. We're basically doing full dumps of everything in many cases.
> > This will be absolutely killing performance with a big ruleset.
> 
> OK, I'll be refining this to allow selective dumps.
> 
> > AFAIK (did not test) we're only listing sets for the family of the first
> > command, then set cache_initializer to true and skip further updates. When
> > the ruleset refers to multiple families, the contents will not be present
> > but expected.
> 
> This may be related to the kernel patches I have to send to use
> generations from the dump path.

As mentioned, I did not actually test this so far, but its the caching
code, it initializes for the first rule and only for that rule and bases
its queries on that handle. OTOH the dumping of all tables indicates that
it *does* dump more than that. Not sure, would have to test to be sure,
but the code looks like it.

> > It basically seems like the big hammer approach + some bugs instead of
> > selectively getting what we need when we need it and making sure its
> > up to date, at least before generating errors based on it.
> 
> Selective dumping is good to have indeed, I'm willing to refine this.
> 
> But regarding event tracing, I think it's good to keep a cache in
> userspace that we update based on the events that we receive, then
> resync if we hit ENOBUFS, instead of inquiring the kernel for every
> trace.

Generally I agree.

> So I think Florian can get this in. I'll be resolving the existing
> caching issues after him as this is not related to his work.
> 
> Fine with this approach?

Sure, it we fix this, no question :)
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Patrick McHardy Nov. 24, 2015, 11:41 a.m. UTC | #12
On 24.11, Florian Westphal wrote:
> Patrick McHardy <kaber@trash.net> wrote:
> > > Yes, right now virtually all of the printing is in libnftnl
> > > (called from nftables via nftnl_trace_fprintf() ).
> > 
> > The downside is that we're stuck to fprintf. I'm working on moving nft to
> > printing to buffers for pretty printing, intending set contents, sorting
> > flow tables in different dimensions etc. If we have external printing we'll
> > be missing some flexibilty in handling it, also regarding logging to other
> > outputs than stdout.
> 
> The _fprint variant is a convenience wrapper for nftnl_trace_snprintf.
> Would that work for you?

Probably, but with downsides. The nft internal buffer printing will
encapsulate every printed element so we can reorder them based on an
abstract meaning, f.i. for pretty printing, where we want to keep
related things together and reindent other parts.

We can also do filters, sorting on individual parts both numerically
and lexically, structured output etc etc, but all that requires to
have the invidual parts of the output available seperately.

So yeah it would work, and at least the sorting part doesn't make sense
for tracing anyways, but it would be a special case with some limitations.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/include/linux/netfilter/nf_tables.h b/include/linux/netfilter/nf_tables.h
index 5ebe3d8..85d739b 100644
--- a/include/linux/netfilter/nf_tables.h
+++ b/include/linux/netfilter/nf_tables.h
@@ -101,6 +101,7 @@  enum nf_tables_msg_types {
 	NFT_MSG_DELSETELEM,
 	NFT_MSG_NEWGEN,
 	NFT_MSG_GETGEN,
+	NFT_MSG_TRACE,
 	NFT_MSG_MAX,
 };
 
@@ -961,4 +962,33 @@  enum nft_gen_attributes {
 };
 #define NFTA_GEN_MAX		(__NFTA_GEN_MAX - 1)
 
+enum nft_trace_attibutes {
+	NFTA_TRACE_UNSPEC,
+	NFTA_TRACE_CHAIN,
+	NFTA_TRACE_ID,
+	NFTA_TRACE_IIF,
+	NFTA_TRACE_OIF,
+	NFTA_TRACE_LL_HEADER,
+	NFTA_TRACE_LL_TYPE,
+	NFTA_TRACE_MARK,
+	NFTA_TRACE_PAYLOAD,
+	NFTA_TRACE_TABLE,
+	NFTA_TRACE_TYPE,
+	NFTA_TRACE_RULE_HANDLE,
+	NFTA_TRACE_VERDICT,
+	NFTA_TRACE_VLAN_TAG,
+	NFTA_TRACE_DEVTYPE,
+	__NFTA_TRACE_MAX
+};
+#define NFTA_TRACE_MAX (__NFTA_TRACE_MAX - 1)
+
+enum nft_trace_types {
+	NFT_TRACETYPE_UNSPEC,
+	NFT_TRACETYPE_PACKET,
+	NFT_TRACETYPE_POLICY,
+	NFT_TRACETYPE_RETURN,
+	NFT_TRACETYPE_RULE,
+	__NFT_TRACETYPE_MAX
+};
+#define NFT_TRACETYPE_MAX (__NFT_TRACETYPE_MAX - 1)
 #endif /* _LINUX_NF_TABLES_H */
diff --git a/src/evaluate.c b/src/evaluate.c
index 48f071f..b1ab8e9 100644
--- a/src/evaluate.c
+++ b/src/evaluate.c
@@ -2279,6 +2279,7 @@  enum {
 	CMD_MONITOR_EVENT_ANY,
 	CMD_MONITOR_EVENT_NEW,
 	CMD_MONITOR_EVENT_DEL,
+	CMD_MONITOR_EVENT_TRACE,
 	CMD_MONITOR_EVENT_MAX
 };
 
@@ -2320,6 +2321,21 @@  static uint32_t monitor_flags[CMD_MONITOR_EVENT_MAX][CMD_MONITOR_OBJ_MAX] = {
 		[CMD_MONITOR_OBJ_SETS]		= (1 << NFT_MSG_DELSET),
 		[CMD_MONITOR_OBJ_ELEMS]		= (1 << NFT_MSG_DELSETELEM),
 	},
+	[CMD_MONITOR_EVENT_TRACE] = {
+		[CMD_MONITOR_OBJ_ANY]		= (1 << NFT_MSG_NEWTABLE) |
+						  (1 << NFT_MSG_NEWCHAIN) |
+						  (1 << NFT_MSG_NEWRULE)  |
+						  (1 << NFT_MSG_DELTABLE) |
+						  (1 << NFT_MSG_DELCHAIN) |
+						  (1 << NFT_MSG_DELRULE)  |
+						  (1 << NFT_MSG_TRACE),
+		[CMD_MONITOR_OBJ_TABLES]	= (1 << NFT_MSG_NEWTABLE) |
+						  (1 << NFT_MSG_DELTABLE),
+		[CMD_MONITOR_OBJ_CHAINS]	= (1 << NFT_MSG_NEWCHAIN) |
+						  (1 << NFT_MSG_DELCHAIN),
+		[CMD_MONITOR_OBJ_RULES]		= (1 << NFT_MSG_NEWRULE) |
+						  (1 << NFT_MSG_DELRULE),
+	},
 };
 
 static int cmd_evaluate_monitor(struct eval_ctx *ctx, struct cmd *cmd)
@@ -2332,6 +2348,8 @@  static int cmd_evaluate_monitor(struct eval_ctx *ctx, struct cmd *cmd)
 		event = CMD_MONITOR_EVENT_NEW;
 	else if (strcmp(cmd->monitor->event, "destroy") == 0)
 		event = CMD_MONITOR_EVENT_DEL;
+	else if (strcmp(cmd->monitor->event, "trace") == 0)
+		event = CMD_MONITOR_EVENT_TRACE;
 	else {
 		return monitor_error(ctx, cmd->monitor, "invalid event %s",
 				     cmd->monitor->event);
diff --git a/src/netlink.c b/src/netlink.c
index 974afb1..8ab7bbe 100644
--- a/src/netlink.c
+++ b/src/netlink.c
@@ -18,6 +18,7 @@ 
 #include <stdlib.h>
 
 #include <libnftnl/table.h>
+#include <libnftnl/trace.h>
 #include <libnftnl/chain.h>
 #include <libnftnl/expr.h>
 #include <libnftnl/set.h>
@@ -33,6 +34,7 @@ 
 #include <gmputil.h>
 #include <utils.h>
 #include <erec.h>
+#include <iface.h>
 
 static struct mnl_socket *nf_sock;
 static struct mnl_socket *nf_mon_sock;
@@ -2125,6 +2127,85 @@  static void netlink_events_cache_update(struct netlink_mon_handler *monh,
 	}
 }
 
+static void trace_print_rule(const struct nftnl_trace *nlt)
+{
+	const struct table *table;
+	uint64_t rule_handle;
+	struct chain *chain;
+	struct handle h;
+
+	h.table  = nftnl_trace_get_str(nlt, NFTNL_TRACE_TABLE);
+	h.chain  = nftnl_trace_get_str(nlt, NFTNL_TRACE_CHAIN);
+	h.family = nftnl_trace_get_u32(nlt, NFTNL_TRACE_FAMILY);
+
+	if (!h.table)
+		return;
+
+	table = table_lookup(&h);
+	if (!table)
+		return;
+
+	chain = chain_lookup(table, &h);
+	if (!chain)
+		return;
+
+	if (nftnl_trace_is_set(nlt, NFTNL_TRACE_RULE_HANDLE)) {
+		struct rule *rule;
+
+		rule_handle = nftnl_trace_get_u64(nlt, NFTNL_TRACE_RULE_HANDLE);
+
+		list_for_each_entry(rule, &chain->rules, list) {
+			if (rule->handle.handle != rule_handle)
+				continue;
+
+			printf("\ntrace id %08x rule ", nftnl_trace_get_u32(nlt, NFTNL_TRACE_ID));
+			rule_print(rule);
+			return;
+		}
+	}
+}
+
+static void trace_print_if(const struct nftnl_trace *nlt, uint16_t attr, const char *str)
+{
+	char __name[IFNAMSIZ];
+	const char *ifname;
+
+        if (!nftnl_trace_is_set(nlt, attr))
+		return;
+
+	ifname = nft_if_indextoname(nftnl_trace_get_u32(nlt, attr), __name);
+	if (ifname)
+		printf(" %s %s", str, ifname);
+	else
+		printf(" %s %d", str, nftnl_trace_get_u32(nlt, attr));
+}
+
+static int netlink_events_trace_cb(const struct nlmsghdr *nlh, int type,
+				   struct netlink_mon_handler *monh)
+{
+	struct nftnl_trace *nlt;
+
+	assert(type == NFT_MSG_TRACE);
+
+	nlt = nftnl_trace_alloc();
+	if (!nlt)
+		memory_allocation_error();
+
+	if (nftnl_trace_nlmsg_parse(nlh, nlt) < 0)
+		netlink_abi_error();
+
+	printf("trace ");
+	nftnl_trace_fprintf(stdout, nlt, monh->format);
+
+	trace_print_if(nlt, NFTNL_TRACE_IIF, "iif");
+	trace_print_if(nlt, NFTNL_TRACE_OIF, "oif");
+
+	trace_print_rule(nlt);
+	printf("\n");
+	nftnl_trace_free(nlt);
+	return MNL_CB_OK;
+}
+
 static int netlink_events_cb(const struct nlmsghdr *nlh, void *data)
 {
 	int ret = MNL_CB_OK;
@@ -2157,6 +2238,9 @@  static int netlink_events_cb(const struct nlmsghdr *nlh, void *data)
 	case NFT_MSG_DELRULE:
 		ret = netlink_events_rule_cb(nlh, type, monh);
 		break;
+	case NFT_MSG_TRACE:
+		ret = netlink_events_trace_cb(nlh, type, monh);
+		break;
 	}
 	fflush(stdout);
 
diff --git a/src/rule.c b/src/rule.c
index 5d3cd84..553990d 100644
--- a/src/rule.c
+++ b/src/rule.c
@@ -1192,27 +1192,56 @@  static int do_command_rename(struct netlink_ctx *ctx, struct cmd *cmd)
 	return 0;
 }
 
-static int do_command_monitor(struct netlink_ctx *ctx, struct cmd *cmd)
+static bool need_cache(const struct cmd *cmd)
 {
-	struct table *t;
-	struct set *s;
-	struct netlink_mon_handler monhandler;
-
-	/* cache only needed if monitoring:
+	/*
 	 *  - new rules in default format
 	 *  - new elements
 	 */
 	if (((cmd->monitor->flags & (1 << NFT_MSG_NEWRULE)) &&
 	    (cmd->monitor->format == NFTNL_OUTPUT_DEFAULT)) ||
 	    (cmd->monitor->flags & (1 << NFT_MSG_NEWSETELEM)))
-		monhandler.cache_needed = true;
-	else
-		monhandler.cache_needed = false;
+		return true;
 
+	if (cmd->monitor->flags & (1 << NFT_MSG_TRACE))
+		return true;
+
+	return false;
+}
+
+static int do_command_monitor(struct netlink_ctx *ctx, struct cmd *cmd)
+{
+	struct table *t;
+	struct set *s;
+	struct netlink_mon_handler monhandler;
+
+	monhandler.cache_needed = need_cache(cmd);
 	if (monhandler.cache_needed) {
+		struct rule *rule, *nrule;
+		struct chain *chain;
+		int ret;
+
 		list_for_each_entry(t, &table_list, list) {
 			list_for_each_entry(s, &t->sets, list)
 				s->init = set_expr_alloc(&cmd->location);
+
+			if (!(cmd->monitor->flags & (1 << NFT_MSG_TRACE)))
+				continue;
+
+			/* When tracing we'd like to translate the rule handle
+			 * we receive in the trace messages to the actual rule
+			 * struct to print that out.  Populate rule cache now.
+			 */
+			ret = netlink_list_table(ctx, &t->handle,
+						 &internal_location);
+
+			if (ret != 0) /* shouldn't happen && doesn't break things too badly */
+				continue;
+
+			list_for_each_entry_safe(rule, nrule, &ctx->list, list) {
+				chain = chain_lookup(t, &rule->handle);
+				list_move_tail(&rule->list, &chain->rules);
+			}
 		}
 	}