[net-next] net: add network device notifier trace points

Message ID 20181219022706.10611-1-sthemmin@microsoft.com
State Changes Requested, archived
Delegated to: David Miller

Commit Message

Stephen Hemminger Dec. 19, 2018, 2:27 a.m. UTC
There are already network tracepoints for transmit and receive, but
nothing for state changes.  Add network tracepoints before and after
the netdevice notifier callbacks are run. This is simple (without extack
or other info) but that could be added if useful. The network namespace
id would also be helpful, but it is hard to get a string for it.

This is the result of a conversation about monitoring of link
state changes with BPF. Parsing netlink is hard and unnecessary
because the data exists (unserialized) already in the
callbacks.

  #  cd /sys/kernel/debug/tracing
  #  echo 1 > events/net/net_dev_notifier_entry/enable
  #  echo 1 > events/net/net_dev_notifier/enable
  #  ip li set dev eno1 down
  #  ip li set dev eno1 up
  #  cat trace
	# tracer: nop
	#
	#                              _-----=> irqs-off
	#                             / _----=> need-resched
	#                            | / _---=> hardirq/softirq
	#                            || / _--=> preempt-depth
	#                            ||| /     delay
	#           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
	#              | |       |   ||||       |         |
		      ip-3194  [011] ....    74.926831: net_dev_notifier_entry: dev=eno1 event=GOING_DOWN
		      ip-3194  [011] ....    74.926838: net_dev_notifier: dev=eno1 event=GOING_DOWN ret=1
		      ip-3194  [011] ....    75.029827: net_dev_notifier_entry: dev=eno1 event=DOWN
		      ip-3194  [011] ....    75.031808: net_dev_notifier: dev=eno1 event=DOWN ret=1
		      ip-3195  [011] ....    78.063845: net_dev_notifier_entry: dev=eno1 event=PRE_UP
		      ip-3195  [011] ....    78.063854: net_dev_notifier: dev=eno1 event=PRE_UP ret=1
		      ip-3195  [011] ....    78.279038: net_dev_notifier_entry: dev=eno1 event=UP
		      ip-3195  [011] ....    78.279065: net_dev_notifier: dev=eno1 event=UP ret=1

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 include/trace/events/net.h | 115 +++++++++++++++++++++++++++++++++++++
 net/core/dev.c             |   9 ++-
 2 files changed, 123 insertions(+), 1 deletion(-)

Comments

David Ahern Dec. 19, 2018, 3:38 a.m. UTC | #1
On 12/18/18 7:27 PM, Stephen Hemminger wrote:
> There already are network trace points for transmit and receive but
> nothing for state changes.  Add network tracepoints for before and
> after netlink callback is done. This is simple (without extack or
> other info) but that could be added if useful. Network namespace id
> would also be helpful but hard to get a string for it.

That has been a sore spot for a long time. One option is to allocate a
namespace id relative to init_net as every namespace is created and then
use the nsid in tracepoints.

> 
> This is the result of a conversation about monitoring of link
> state changes with BPF. Parsing netlink is hard and unnecessary
> because the data exists (unserialized) already in the
> callbacks.
> 
>   #  cd /sys/kernel/debug/tracing
>   #  echo 1 > events/net/net_dev_notifier_entry/enable
>   #  echo 1 > events/net/net_dev_notifier/enable
>   #  ip li set dev eno1 down
>   #  ip li set dev eno1 up
>   #  cat trace
> 	# tracer: nop
> 	#
> 	#                              _-----=> irqs-off
> 	#                             / _----=> need-resched
> 	#                            | / _---=> hardirq/softirq
> 	#                            || / _--=> preempt-depth
> 	#                            ||| /     delay
> 	#           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
> 	#              | |       |   ||||       |         |
> 		      ip-3194  [011] ....    74.926831: net_dev_notifier_entry: dev=eno1 event=GOING_DOWN
> 		      ip-3194  [011] ....    74.926838: net_dev_notifier: dev=eno1 event=GOING_DOWN ret=1

naming the second one 'net_dev_notifier__exit' would align the columns,
make this a lot easier to read and better discriminate the entry/exit
difference.
Jesper Dangaard Brouer Dec. 19, 2018, 7:36 a.m. UTC | #2
On Tue, 18 Dec 2018 18:27:06 -0800
Stephen Hemminger <stephen@networkplumber.org> wrote:

> This is the result of a conversation about monitoring of link
> state changes with BPF.

If you want to use this from BPF then you are in for a surprise.
Tracepoint BPF programs cannot read these "__string" constructs, here
the netdev name.  I tried a lot of different tricks that didn't work,
see [1], until Alexei explained that it simply isn't supported.

I instead recommend adding the ifindex to the tracepoint.  The __string
and __assign_str pair is also a performance concern, as it does a strcpy
behind your back.

I have a year-old TODO list item about improving this:
 ** TODO Make perf-script plugin for ifindex to name translation
    SCHEDULED: <2017-11-20 Mon>

Today, the existing network tracepoints that use dev->name are not very
usable from BPF, as BPF cannot identify the interface.  Thus IMHO it would
make sense to convert the existing network tracepoints from dev->name to
dev->ifindex, and then let perf-script convert this to the interface
name.  Either in userspace via if_indextoname(3), or (as ACME pointed
out at the time) we might want to have a lookup table stored together
with perf.data for later inspection (in case ifindexes changed).
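
As an illustration of the userspace translation mentioned above, here is a
minimal sketch built on if_indextoname(3). The helper name ifindex_to_name
and the "if<index>" fallback string are assumptions made for this example,
not part of perf or of the patch. The fallback covers the case where the
index no longer resolves, for instance because the device has gone away
since the trace was recorded.

```c
#include <assert.h>
#include <net/if.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: translate an ifindex taken from a trace record into
 * an interface name. When the lookup fails, a placeholder of the form
 * "if<index>" is written instead. buf must hold IF_NAMESIZE bytes. */
static const char *ifindex_to_name(unsigned int ifindex, char buf[IF_NAMESIZE])
{
	assert(buf != NULL);
	if (if_indextoname(ifindex, buf) == NULL)
		snprintf(buf, IF_NAMESIZE, "if%u", ifindex);
	return buf;
}
```

A post-processing tool could call this once per distinct ifindex and cache
the result. It only reflects the interfaces present when the tool runs,
which is why a lookup table saved alongside the trace data may still be
needed.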


[1] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/samples/bpf/napi_monitor_kern.c#L34-L130

> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> ---
>  include/trace/events/net.h | 115 +++++++++++++++++++++++++++++++++++++
>  net/core/dev.c             |   9 ++-
>  2 files changed, 123 insertions(+), 1 deletion(-)
> 
> diff --git a/include/trace/events/net.h b/include/trace/events/net.h
> index 1efd7d9b25fe..141310d24610 100644
> --- a/include/trace/events/net.h
> +++ b/include/trace/events/net.h
[...]
> +TRACE_EVENT(net_dev_notifier_entry,
> +
> +	TP_PROTO(const struct netdev_notifier_info *info, unsigned long val),
> +
> +	TP_ARGS(info, val),
> +
> +	TP_STRUCT__entry(
> +		__string(	name,		 info->dev->name )
> +		__field(	enum netdev_cmd, event	         )
> +	),
> +
> +	TP_fast_assign(
> +		__assign_str(name, info->dev->name);
> +		__entry->event = val;
> +       ),

These __string and __assign_str calls are costly: behind the scenes they
do a strcpy.

> +
> +	TP_printk("dev=%s event=%s",
> +		  __get_str(name), netdev_event_type(__entry->event))
> +);
> +
> +TRACE_EVENT(net_dev_notifier,
> +
> +	TP_PROTO(const struct netdev_notifier_info *info, int rc, unsigned long val),
> +
> +	TP_ARGS(info, rc, val),
> +
> +	TP_STRUCT__entry(
> +		__string(	name,		 info->dev->name   )
> +		__field(	enum netdev_cmd, event	           )
> +		__field(	int,		 rc		   )
> +	),
> +
> +	TP_fast_assign(
> +		__assign_str(name, info->dev->name);
> +		__entry->event = val;
> +		__entry->rc = rc;
> +       ),
> +
> +	TP_printk("dev=%s event=%s ret=%d",
> +		  __get_str(name), netdev_event_type(__entry->event),
> +		  __entry->rc)
> +);
> +

The bare minimum change is to _also_ add (info->dev->)ifindex to the
tracepoint, as this makes it usable from BPF.
Stephen Hemminger Dec. 19, 2018, 3:43 p.m. UTC | #3
On Wed, 19 Dec 2018 08:36:43 +0100
Jesper Dangaard Brouer <brouer@redhat.com> wrote:

> On Tue, 18 Dec 2018 18:27:06 -0800
> Stephen Hemminger <stephen@networkplumber.org> wrote:
> 
> > This is the result of a conversation about monitoring of link
> > state changes with BPF.  
> 
> If you want to use this from BPF then you are in for a surprise.
> Tracepoint BPF programs cannot read these "__string" constructs, here
> the netdev name.  I tried a lot of different tricks that didn't work,
> see [1], until Alexei explained that it simply isn't supported.
> 
> I instead recommend adding the ifindex to the tracepoint.  The __string
> and __assign_str pair is also a performance concern, as it does a strcpy
> behind your back.

Can we record the ifindex in the event record and do the decode in
the printk?

This is not a critical path, so I don't really care that much.
 
> I have a year-old TODO list item about improving this:
>  ** TODO Make perf-script plugin for ifindex to name translation
>     SCHEDULED: <2017-11-20 Mon>
> 
> Today, the existing network tracepoints that use dev->name are not very
> usable from BPF, as BPF cannot identify the interface.  Thus IMHO it would
> make sense to convert the existing network tracepoints from dev->name to
> dev->ifindex, and then let perf-script convert this to the interface
> name.  Either in userspace via if_indextoname(3), or (as ACME pointed
> out at the time) we might want to have a lookup table stored together
> with perf.data for later inspection (in case ifindexes changed).
> 
> 
> [1] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/samples/bpf/napi_monitor_kern.c#L34-L130

What about the event enum, can BPF take that?

I also want to add net namespace create/destroy events.
Daniel Borkmann Dec. 19, 2018, 3:46 p.m. UTC | #4
On 12/19/2018 08:36 AM, Jesper Dangaard Brouer wrote:
> On Tue, 18 Dec 2018 18:27:06 -0800
> Stephen Hemminger <stephen@networkplumber.org> wrote:
> 
>> This is the result of a conversation about monitoring of link
>> state changes with BPF.
> 
> If you want to use this from BPF then you are in for a surprise.
> Tracepoint BPF programs cannot read these "__string" constructs, here
> the netdev name.  I tried a lot of different tricks that didn't work,
> see [1], until Alexei explained that it simply isn't supported.
> 
> I instead recommend adding the ifindex to the tracepoint.  The __string
> and __assign_str pair is also a performance concern, as it does a strcpy
> behind your back.
> 
> I have a year-old TODO list item about improving this:
>  ** TODO Make perf-script plugin for ifindex to name translation
>     SCHEDULED: <2017-11-20 Mon>
> 
> Today, the existing network tracepoints that use dev->name are not very
> usable from BPF, as BPF cannot identify the interface.  Thus IMHO it would
> make sense to convert the existing network tracepoints from dev->name to
> dev->ifindex, and then let perf-script convert this to the interface
> name.  Either in userspace via if_indextoname(3), or (as ACME pointed
> out at the time) we might want to have a lookup table stored together
> with perf.data for later inspection (in case ifindexes changed).

Hmm, why not just do something as in your example below with napi_poll(),
where you pass in the napi pointer, and then use bpf_probe_read_str() on
ctx->dev for fetching the name? At least there this should work and should
be okay given it's a rather slow-path event.

Thanks,
Daniel

> [1] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/samples/bpf/napi_monitor_kern.c#L34-L130
> 
>> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
>> ---
>>  include/trace/events/net.h | 115 +++++++++++++++++++++++++++++++++++++
>>  net/core/dev.c             |   9 ++-
>>  2 files changed, 123 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/trace/events/net.h b/include/trace/events/net.h
>> index 1efd7d9b25fe..141310d24610 100644
>> --- a/include/trace/events/net.h
>> +++ b/include/trace/events/net.h
> [...]
>> +TRACE_EVENT(net_dev_notifier_entry,
>> +
>> +	TP_PROTO(const struct netdev_notifier_info *info, unsigned long val),
>> +
>> +	TP_ARGS(info, val),
>> +
>> +	TP_STRUCT__entry(
>> +		__string(	name,		 info->dev->name )
>> +		__field(	enum netdev_cmd, event	         )
>> +	),
>> +
>> +	TP_fast_assign(
>> +		__assign_str(name, info->dev->name);
>> +		__entry->event = val;
>> +       ),
> 
> These __string and __assign_str calls are costly: behind the scenes they
> do a strcpy.
> 
>> +
>> +	TP_printk("dev=%s event=%s",
>> +		  __get_str(name), netdev_event_type(__entry->event))
>> +);
>> +
>> +TRACE_EVENT(net_dev_notifier,
>> +
>> +	TP_PROTO(const struct netdev_notifier_info *info, int rc, unsigned long val),
>> +
>> +	TP_ARGS(info, rc, val),
>> +
>> +	TP_STRUCT__entry(
>> +		__string(	name,		 info->dev->name   )
>> +		__field(	enum netdev_cmd, event	           )
>> +		__field(	int,		 rc		   )
>> +	),
>> +
>> +	TP_fast_assign(
>> +		__assign_str(name, info->dev->name);
>> +		__entry->event = val;
>> +		__entry->rc = rc;
>> +       ),
>> +
>> +	TP_printk("dev=%s event=%s ret=%d",
>> +		  __get_str(name), netdev_event_type(__entry->event),
>> +		  __entry->rc)
>> +);
>> +
> 
> The bare minimum change is to _also_ add (info->dev->)ifindex to the
> tracepoint, as this makes it usable from BPF.
>
Steven Rostedt Dec. 19, 2018, 3:51 p.m. UTC | #5
On Wed, 19 Dec 2018 08:36:43 +0100
Jesper Dangaard Brouer <brouer@redhat.com> wrote:

> > +TRACE_EVENT(net_dev_notifier_entry,
> > +
> > +	TP_PROTO(const struct netdev_notifier_info *info, unsigned long val),
> > +
> > +	TP_ARGS(info, val),
> > +
> > +	TP_STRUCT__entry(
> > +		__string(	name,		 info->dev->name )
> > +		__field(	enum netdev_cmd, event	         )
> > +	),
> > +
> > +	TP_fast_assign(
> > +		__assign_str(name, info->dev->name);
> > +		__entry->event = val;
> > +       ),  
> 
> These __string and __assign_str calls are costly: behind the scenes they
> do a strcpy.

True. But you could also make this into a memcpy with:

		__array(	char,	name,	IFNAMSIZ)


And in TP_fast_assign:

		memcpy(__entry->name, info->dev->name, IFNAMSIZ);

And for the TP_printk:

		"dev=%s", __entry->name


Yes, ifindex is still faster, but this does give you the name in a way
that I think even BPF can use. I also believe that a memcpy of a
constant size is faster than a strcpy.
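
To make the fixed-size layout concrete, the following is a userspace sketch
of what a record declared with __array(char, name, IFNAMSIZ) might look
like. The struct notifier_record and the function record_fill are
hypothetical names used only for illustration. The point it shows is that
the name sits at a fixed offset with a fixed length, and that filling it is
a constant-size copy rather than a scan for the terminating NUL.

```c
#include <assert.h>
#include <net/if.h>
#include <string.h>

#ifndef IFNAMSIZ
#define IFNAMSIZ IF_NAMESIZE	/* fallback when only the POSIX name is visible */
#endif

/* Hypothetical userspace mirror of a trace record that stores the device
 * name in a fixed-size array instead of a dynamic string. */
struct notifier_record {
	char name[IFNAMSIZ];
	int event;
};

/* src must point at a full IFNAMSIZ-byte, NUL-terminated buffer, which is
 * how the name field of struct net_device is laid out in the kernel. */
static void record_fill(struct notifier_record *rec,
			const char src[IFNAMSIZ], int event)
{
	assert(rec != NULL && src != NULL);
	memcpy(rec->name, src, IFNAMSIZ);	/* constant-size copy */
	rec->event = event;
}
```

The cost is that every record carries IFNAMSIZ bytes for the name, however
short the actual name is.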

-- Steve
Jesper Dangaard Brouer Dec. 19, 2018, 4:40 p.m. UTC | #6
On Wed, 19 Dec 2018 16:46:05 +0100
Daniel Borkmann <borkmann@iogearbox.net> wrote:

> Hmm, why not just do something as in your example below with napi_poll(),
> where you pass in the napi pointer, and then use bpf_probe_read_str() on
> ctx->dev for fetching the name? At least there this should work and should
> be okay given it's a rather slow-path event.
> 
> > [1] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/samples/bpf/napi_monitor_kern.c#L34-L130

I didn't try to use bpf_probe_read_str() in [1], but that is also not
what I want in my use-case.  I don't want the name, but the ifindex to
filter on, as it will be faster.  My use-case is allowing my
napi_monitor program to filter on a specific net_device, inside the
kernel via BPF.

E.g. this didn't work:
 bpf_probe_read(&ifindex, 4, &ctx->napi->dev->ifindex);

Perhaps you know how I can do this deref correctly?

My napi_monitor use-case is not a slow-path event. Even though in
optimal cases we should handle 64 packets per tracepoint invocation,
I'm using this for 100G NICs with >20Mpps. And I mostly use the
tool when something looks wrong and I don't see 64-packet bulks, which
is also why I detect when this gets invoked from the idle task or from
ksoftirqd.
David Ahern Dec. 19, 2018, 4:48 p.m. UTC | #7
On 12/19/18 12:36 AM, Jesper Dangaard Brouer wrote:
> Today, the existing network tracepoints that use dev->name are not very
> usable from BPF, as BPF cannot identify the interface.  Thus IMHO it would
> make sense to convert the existing network tracepoints from dev->name to
> dev->ifindex, and then let perf-script convert this to the interface
> name.  Either in userspace via if_indextoname(3), or (as ACME pointed
> out at the time) we might want to have a lookup table stored together
> with perf.data for later inspection (in case ifindexes changed).
> 

You need network namespace references as well and the ability to monitor
device notifications.
Steven Rostedt Dec. 19, 2018, 4:55 p.m. UTC | #8
On Wed, 19 Dec 2018 17:40:49 +0100
Jesper Dangaard Brouer <brouer@redhat.com> wrote:

> My napi_monitor use-case is not a slow-path event. Even though in
> optimal cases we should handle 64 packets per tracepoint invocation,
> I'm using this for 100G NICs with >20Mpps. And I mostly use the
> tool when something looks wrong and I don't see 64-packet bulks, which
> is also why I detect when this gets invoked from the idle task or from
> ksoftirqd.
> 

Would it be possible to add an interface that just connects to the
tracepoint without having to use the trace event itself? (The trace
event is what shows up in tracefs/events/*/*.)

That would give you full control of what you want:


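/* Probe attached directly to the tracepoint; the first argument is the
 * private data pointer given at registration time. */
static void my_probe(void *data, const struct netdev_notifier_info *info,
		unsigned long val)
{
	do_something_with(info->dev->ifindex);
}

static int setup(void)
{
	/* The tracepoint in this patch is named net_dev_notifier_entry. */
	return register_trace_net_dev_notifier_entry(my_probe, NULL);
}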


That allows you to attach a function to the tracepoint and not worry
about the overhead of what the trace event gives you.

-- Steve
Daniel Borkmann Dec. 19, 2018, 5:07 p.m. UTC | #9
On 12/19/2018 05:40 PM, Jesper Dangaard Brouer wrote:
> On Wed, 19 Dec 2018 16:46:05 +0100
> Daniel Borkmann <borkmann@iogearbox.net> wrote:
> 
>> Hmm, why not just do something as in your example below with napi_poll(),
>> where you pass in the napi pointer, and then use bpf_probe_read_str() on
>> ctx->dev for fetching the name? At least there this should work and should
>> be okay given it's a rather slow-path event.
>>
>>> [1] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/samples/bpf/napi_monitor_kern.c#L34-L130
> 
> I didn't try to use bpf_probe_read_str() in [1], but that is also not
> what I want in my use-case.  I don't want the name, but the ifindex to
> filter on, as it will be faster.  My use-case is allowing my
> napi_monitor program to filter on a specific net_device, inside the
> kernel via BPF.
> 
> E.g. this didn't work:
>  bpf_probe_read(&ifindex, 4, &ctx->napi->dev->ifindex);

Something along the lines of this you could try:

#define probe_fetch(X) ({typeof(X) val; bpf_probe_read(&val, sizeof(val), &X); val;})

SEC("tracepoint/napi/napi_poll")
int napi_poll(struct napi_poll_ctx *ctx)
{
	struct napi_struct *napi = ctx->napi;
	struct net_device *dev;
	int ifindex;
	[...]

	dev = probe_fetch(napi->dev);
	ifindex = probe_fetch(dev->ifindex);

	[...]
}
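
The two-step dereference above can be tried outside the kernel with a small
mock. In this sketch, bpf_probe_read is a userspace stand-in that simply
calls memcpy (the real helper performs a fault-safe read of kernel memory),
and the two structs are stripped-down mocks holding only the fields used
here. The function napi_ifindex is a hypothetical name. The macro relies on
GNU C statement expressions, as the original does.

```c
#include <assert.h>
#include <string.h>

/* Userspace stand-in for the BPF helper so the macro can be exercised
 * outside the kernel. This is NOT the real helper. */
static int bpf_probe_read(void *dst, unsigned int size, const void *src)
{
	assert(dst != NULL && src != NULL);
	memcpy(dst, src, size);
	return 0;
}

/* Copy one field out through the helper and yield its value. */
#define probe_fetch(X) \
	({ __typeof__(X) val_; bpf_probe_read(&val_, sizeof(val_), &(X)); val_; })

/* Minimal mock types: only the fields this example touches. */
struct net_device { int ifindex; };
struct napi_struct { struct net_device *dev; };

/* Each pointer hop gets its own read: first napi->dev, then dev->ifindex. */
static int napi_ifindex(struct napi_struct *napi)
{
	struct net_device *dev = probe_fetch(napi->dev);

	return probe_fetch(dev->ifindex);
}
```

The point of splitting the chain is that each intermediate pointer is
copied into a local variable before it is followed, instead of taking the
address of a multi-level expression in a single call.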

> Perhaps you know how I can do this deref correctly?
> 
> My napi_monitor use-case is not a slow-path event. Even though in
> optimal cases we should handle 64 packets per tracepoint invocation,
> I'm using this for 100G NICs with >20Mpps. And I mostly use the
> tool when something looks wrong and I don't see 64-packet bulks, which
> is also why I detect when this gets invoked from the idle task or from
> ksoftirqd.
>
Patch

diff --git a/include/trace/events/net.h b/include/trace/events/net.h
index 1efd7d9b25fe..141310d24610 100644
--- a/include/trace/events/net.h
+++ b/include/trace/events/net.h
@@ -11,6 +11,121 @@ 
 #include <linux/ip.h>
 #include <linux/tracepoint.h>
 
+TRACE_DEFINE_ENUM(NETDEV_UP);
+TRACE_DEFINE_ENUM(NETDEV_DOWN);
+TRACE_DEFINE_ENUM(NETDEV_REBOOT);
+TRACE_DEFINE_ENUM(NETDEV_CHANGE);
+TRACE_DEFINE_ENUM(NETDEV_REGISTER);
+TRACE_DEFINE_ENUM(NETDEV_UNREGISTER);
+TRACE_DEFINE_ENUM(NETDEV_CHANGEMTU);
+TRACE_DEFINE_ENUM(NETDEV_CHANGEADDR);
+TRACE_DEFINE_ENUM(NETDEV_PRE_CHANGEADDR);
+TRACE_DEFINE_ENUM(NETDEV_GOING_DOWN);
+TRACE_DEFINE_ENUM(NETDEV_CHANGENAME);
+TRACE_DEFINE_ENUM(NETDEV_FEAT_CHANGE);
+TRACE_DEFINE_ENUM(NETDEV_BONDING_FAILOVER);
+TRACE_DEFINE_ENUM(NETDEV_PRE_UP);
+TRACE_DEFINE_ENUM(NETDEV_PRE_TYPE_CHANGE);
+TRACE_DEFINE_ENUM(NETDEV_POST_TYPE_CHANGE);
+TRACE_DEFINE_ENUM(NETDEV_POST_INIT);
+TRACE_DEFINE_ENUM(NETDEV_RELEASE);
+TRACE_DEFINE_ENUM(NETDEV_NOTIFY_PEERS);
+TRACE_DEFINE_ENUM(NETDEV_JOIN);
+TRACE_DEFINE_ENUM(NETDEV_CHANGEUPPER);
+TRACE_DEFINE_ENUM(NETDEV_RESEND_IGMP);
+TRACE_DEFINE_ENUM(NETDEV_PRECHANGEMTU);
+TRACE_DEFINE_ENUM(NETDEV_CHANGEINFODATA);
+TRACE_DEFINE_ENUM(NETDEV_BONDING_INFO);
+TRACE_DEFINE_ENUM(NETDEV_PRECHANGEUPPER);
+TRACE_DEFINE_ENUM(NETDEV_CHANGELOWERSTATE);
+TRACE_DEFINE_ENUM(NETDEV_UDP_TUNNEL_PUSH_INFO);
+TRACE_DEFINE_ENUM(NETDEV_UDP_TUNNEL_DROP_INFO);
+TRACE_DEFINE_ENUM(NETDEV_CHANGE_TX_QUEUE_LEN);
+TRACE_DEFINE_ENUM(NETDEV_CVLAN_FILTER_PUSH_INFO);
+TRACE_DEFINE_ENUM(NETDEV_CVLAN_FILTER_DROP_INFO);
+TRACE_DEFINE_ENUM(NETDEV_SVLAN_FILTER_PUSH_INFO);
+TRACE_DEFINE_ENUM(NETDEV_SVLAN_FILTER_DROP_INFO);
+
+#define netdev_event_type(type)					\
+	__print_symbolic(type,					\
+		 { NETDEV_UP, "UP" },				\
+		 { NETDEV_DOWN, "DOWN" },			\
+		 { NETDEV_REBOOT, "REBOOT" },			\
+		 { NETDEV_CHANGE, "CHANGE" },			\
+		 { NETDEV_REGISTER, "REGISTER" },		\
+		 { NETDEV_UNREGISTER, "UNREGISTER" },		\
+		 { NETDEV_CHANGEMTU, "CHANGEMTU" },		\
+		 { NETDEV_CHANGEADDR, "CHANGEADDR" },		\
+		 { NETDEV_PRE_CHANGEADDR, "PRE_CHANGEADDR" },	\
+		 { NETDEV_GOING_DOWN, "GOING_DOWN" },		\
+		 { NETDEV_CHANGENAME, "CHANGENAME" },		\
+		 { NETDEV_FEAT_CHANGE, "FEAT_CHANGE" },		\
+		 { NETDEV_BONDING_FAILOVER, "BONDING_FAILOVER" }, \
+		 { NETDEV_PRE_UP, "PRE_UP" },			\
+		 { NETDEV_PRE_TYPE_CHANGE, "PRE_TYPE_CHANGE" }, \
+		 { NETDEV_POST_TYPE_CHANGE, "POST_TYPE_CHANGE" }, \
+		 { NETDEV_POST_INIT, "POST_INIT" },		\
+		 { NETDEV_RELEASE, "RELEASE" },			\
+		 { NETDEV_NOTIFY_PEERS, "NOTIFY_PEERS" },	\
+		 { NETDEV_JOIN, "JOIN" },			\
+		 { NETDEV_CHANGEUPPER, "CHANGEUPPER" },		\
+		 { NETDEV_RESEND_IGMP, "RESEND_IGMP" },		\
+		 { NETDEV_PRECHANGEMTU, "PRECHANGEMTU" },	\
+		 { NETDEV_CHANGEINFODATA, "CHANGEINFODATA" },	\
+		 { NETDEV_BONDING_INFO, "BONDING_INFO" },	\
+		 { NETDEV_PRECHANGEUPPER, "PRECHANGEUPPER" },	\
+		 { NETDEV_CHANGELOWERSTATE, "CHANGELOWERSTATE" }, \
+		 { NETDEV_UDP_TUNNEL_PUSH_INFO, "UDP_TUNNEL_PUSH_INFO" }, \
+		 { NETDEV_UDP_TUNNEL_DROP_INFO, "UDP_TUNNEL_DROP_INFO" }, \
+		 { NETDEV_CHANGE_TX_QUEUE_LEN, "CHANGE_TX_QUEUE_LEN" }, \
+		 { NETDEV_CVLAN_FILTER_PUSH_INFO, "CVLAN_FILTER_PUSH_INFO" }, \
+		 { NETDEV_CVLAN_FILTER_DROP_INFO, "CVLAN_FILTER_DROP_INFO" }, \
+		 { NETDEV_SVLAN_FILTER_PUSH_INFO, "SVLAN_FILTER_PUSH_INFO" }, \
+		 { NETDEV_SVLAN_FILTER_DROP_INFO, "SVLAN_FILTER_DROP_INFO" } )
+
+TRACE_EVENT(net_dev_notifier_entry,
+
+	TP_PROTO(const struct netdev_notifier_info *info, unsigned long val),
+
+	TP_ARGS(info, val),
+
+	TP_STRUCT__entry(
+		__string(	name,		 info->dev->name )
+		__field(	enum netdev_cmd, event	         )
+	),
+
+	TP_fast_assign(
+		__assign_str(name, info->dev->name);
+		__entry->event = val;
+       ),
+
+	TP_printk("dev=%s event=%s",
+		  __get_str(name), netdev_event_type(__entry->event))
+);
+
+TRACE_EVENT(net_dev_notifier,
+
+	TP_PROTO(const struct netdev_notifier_info *info, int rc, unsigned long val),
+
+	TP_ARGS(info, rc, val),
+
+	TP_STRUCT__entry(
+		__string(	name,		 info->dev->name   )
+		__field(	enum netdev_cmd, event	           )
+		__field(	int,		 rc		   )
+	),
+
+	TP_fast_assign(
+		__assign_str(name, info->dev->name);
+		__entry->event = val;
+		__entry->rc = rc;
+       ),
+
+	TP_printk("dev=%s event=%s ret=%d",
+		  __get_str(name), netdev_event_type(__entry->event),
+		  __entry->rc)
+);
+
 TRACE_EVENT(net_dev_start_xmit,
 
 	TP_PROTO(const struct sk_buff *skb, const struct net_device *dev),
diff --git a/net/core/dev.c b/net/core/dev.c
index 1b5a4410be0e..0906f317e5ca 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1735,8 +1735,15 @@  EXPORT_SYMBOL(unregister_netdevice_notifier);
 static int call_netdevice_notifiers_info(unsigned long val,
 					 struct netdev_notifier_info *info)
 {
+	int rc;
+
 	ASSERT_RTNL();
-	return raw_notifier_call_chain(&netdev_chain, val, info);
+
+	trace_net_dev_notifier_entry(info, val);
+	rc = raw_notifier_call_chain(&netdev_chain, val, info);
+	trace_net_dev_notifier(info, rc, val);
+
+	return rc;
 }
 
 static int call_netdevice_notifiers_extack(unsigned long val,