diff mbox series

[v3] net: Add trace events for all receive exit points

Message ID 20181113201326.5627-1-gbastien@versatic.net
State Deferred, archived
Delegated to: David Miller
Headers show
Series [v3] net: Add trace events for all receive exit points | expand

Commit Message

Geneviève Bastien Nov. 13, 2018, 8:13 p.m. UTC
Trace events are already present for the receive entry points, to indicate
how the reception entered the stack.

This patch adds the corresponding exit trace events that will bound the
reception such that all events occurring between the entry and the exit
can be considered as part of the reception context. This greatly helps
for dependency and root cause analyses.

Without this, it is not possible with tracepoint instrumentation to
determine whether a sched_wakeup event following a netif_receive_skb
event is the result of the packet reception or a simple coincidence after
further processing by the thread. It is possible using other mechanisms
like kretprobes, but considering the "entry" points are already present,
it would be good to add the matching exit events.

In addition to linking packets with wakeups, the entry/exit event pair
can also be used to perform network stack latency analyses.

Signed-off-by: Geneviève Bastien <gbastien@versatic.net>
CC: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: Ingo Molnar <mingo@redhat.com>
CC: David S. Miller <davem@davemloft.net>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> (tracing
side)
---
Changes in v3:
  - Update commit message
  - Rename exit template to match the entry one
  - Move the similar tracepoints together in the code
---
Changes in v2:
  - Add the return value to tracepoints where applicable
  - Verify if tracepoint is enabled before walking list in
    netif_receive_skb_list
---
 include/trace/events/net.h | 71 ++++++++++++++++++++++++++++++++++++++
 net/core/dev.c             | 38 ++++++++++++++++----
 2 files changed, 103 insertions(+), 6 deletions(-)

Comments

David Miller Nov. 17, 2018, 3:50 a.m. UTC | #1
From: Geneviève Bastien <gbastien@versatic.net>
Date: Tue, 13 Nov 2018 15:13:26 -0500

> @@ -5222,9 +5228,14 @@ static void netif_receive_skb_list_internal(struct list_head *head)
>   */
>  int netif_receive_skb(struct sk_buff *skb)
>  {
> +	int ret;
> +
>  	trace_netif_receive_skb_entry(skb);
>  
> -	return netif_receive_skb_internal(skb);
> +	ret = netif_receive_skb_internal(skb);
> +	trace_netif_receive_skb_exit(skb, ret);

Every time I read this code from now on I'm going to say to myself
"oh crap, we reference 'skb' after it's potentially freed up"

I really don't like this.

I know only the pointer is used, but that pointer can be reallocated
to another SLAB object, even another SKB, by the time these exit
tracepoints execute.

Sorry, I can't really convince myself to apply this now.
Mathieu Desnoyers Nov. 17, 2018, 6:27 p.m. UTC | #2
----- On Nov 16, 2018, at 10:50 PM, David S. Miller davem@davemloft.net wrote:

> From: Geneviève Bastien <gbastien@versatic.net>
> Date: Tue, 13 Nov 2018 15:13:26 -0500
> 
>> @@ -5222,9 +5228,14 @@ static void netif_receive_skb_list_internal(struct
>> list_head *head)
>>   */
>>  int netif_receive_skb(struct sk_buff *skb)
>>  {
>> +	int ret;
>> +
>>  	trace_netif_receive_skb_entry(skb);
>>  
>> -	return netif_receive_skb_internal(skb);
>> +	ret = netif_receive_skb_internal(skb);
>> +	trace_netif_receive_skb_exit(skb, ret);
> 
> Every time I read this code from now on I'm going to say to myself
> "oh crap, we reference 'skb' after it's potentially freed up"
> 
> I really don't like this.
> 
> I know only the pointer is used, but that pointer can be reallocated
> to another SLAB object, even another SKB, by the time these exit
> tracepoints execute.
> 
> Sorry, I can't really convince myself to apply this now.

Hi David,

Thanks for looking into this patch. You bring a very good point indeed!
Passing a skb pointer that may have been already reallocated to skb_exit
tracepoints is pretty much useless for analysis purposes.

I see two possible solutions:

1) Remove the "skb" argument from the sbk_exit tracepoints completely.
Anyway, I think it's not really needed for analysis purposes because
we can link the "entry" with the associated "exit" using the thread ID
executing those tracepoints.  (Geneviève, would that work for your
analyses ?)

2) Move the skb_exit tracepoints before freeing the skb pointer. My
concern here is that the instrumentation may become much uglier than
the currently proposed patch. (I have not looked at the specifics
though, so I may be wrong.)

Do you have a preference between those two approaches, or perhaps you
have an alternative solution in mind ?

Thanks!

Mathieu
David Miller Nov. 18, 2018, 6:19 a.m. UTC | #3
From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Date: Sat, 17 Nov 2018 13:27:29 -0500 (EST)

> I see two possible solutions:
> 
> 1) Remove the "skb" argument from the sbk_exit tracepoints completely.
> Anyway, I think it's not really needed for analysis purposes because
> we can link the "entry" with the associated "exit" using the thread ID
> executing those tracepoints.  (Geneviève, would that work for your
> analyses ?)
> 
> 2) Move the skb_exit tracepoints before freeing the skb pointer. My
> concern here is that the instrumentation may become much uglier than
> the currently proposed patch. (I have not looked at the specifics
> though, so I may be wrong.)
> 
> Do you have a preference between those two approaches, or perhaps you
> have an alternative solution in mind ?

I wonder how other situations handle this.

About #2, if you put the tracepoint beforehand you can't log the
'ret' value.  So at least in that regard I prefer #1.

If tracepoints generally handle this by matching up the thread
ID, then definitely that's how we should do it here too instead
of trying to use the SKB pointer for this purpose.
Geneviève Bastien Nov. 19, 2018, 3:27 p.m. UTC | #4
On 2018-11-18 1:19 a.m., David Miller wrote:
> From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Date: Sat, 17 Nov 2018 13:27:29 -0500 (EST)
>
>> I see two possible solutions:
>>
>> 1) Remove the "skb" argument from the sbk_exit tracepoints completely.
>> Anyway, I think it's not really needed for analysis purposes because
>> we can link the "entry" with the associated "exit" using the thread ID
>> executing those tracepoints.  (Genevi�ve, would that work for your
>> analyses ?)
>>
>> 2) Move the skb_exit tracepoints before freeing the skb pointer. My
>> concern here is that the instrumentation may become much uglier than
>> the currently proposed patch. (I have not looked at the specifics
>> though, so I may be wrong.)
>>
>> Do you have a preference between those two approaches, or perhaps you
>> have an alternative solution in mind ?
> I wonder how other situations handle this.
>
> About #2, if you put the tracepoint beforehand you can't log the
> 'ret' value.  So at least in that regard I prefer #1.
>
> If tracepoints generally handle this by matching up the thread
> ID, then definitely that's how we should do it here too instead
> of trying to use the SKB pointer for this purpose.

I would go for #1 too, the "skb" is not used to match entry/exit, it is more the context in which they appear (thread, softirq, etc). And I did indeed get seg faults on my first attempt when I tried to use the existing templates.

There's just the list tracepoint that would now log nothing, so there's no point looping through the list.
Geneviève Bastien Nov. 19, 2018, 5:47 p.m. UTC | #5
On 2018-11-19 10:27 a.m., Geneviève Bastien wrote:
> On 2018-11-18 1:19 a.m., David Miller wrote:
>> From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
>> Date: Sat, 17 Nov 2018 13:27:29 -0500 (EST)
>>
>>> I see two possible solutions:
>>>
>>> 1) Remove the "skb" argument from the sbk_exit tracepoints completely.
>>> Anyway, I think it's not really needed for analysis purposes because
>>> we can link the "entry" with the associated "exit" using the thread ID
>>> executing those tracepoints.  (Genevi�ve, would that work for your
>>> analyses ?)
>>>
>>> 2) Move the skb_exit tracepoints before freeing the skb pointer. My
>>> concern here is that the instrumentation may become much uglier than
>>> the currently proposed patch. (I have not looked at the specifics
>>> though, so I may be wrong.)
>>>
>>> Do you have a preference between those two approaches, or perhaps you
>>> have an alternative solution in mind ?
>> I wonder how other situations handle this.
>>
>> About #2, if you put the tracepoint beforehand you can't log the
>> 'ret' value.  So at least in that regard I prefer #1.
>>
>> If tracepoints generally handle this by matching up the thread
>> ID, then definitely that's how we should do it here too instead
>> of trying to use the SKB pointer for this purpose.
> I would go for #1 too, the "skb" is not used to match entry/exit, it is more the context in which they appear (thread, softirq, etc). And I did indeed get seg faults on my first attempt when I tried to use the existing templates.
>
> There's just the list tracepoint that would now log nothing, so there's no point looping through the list.
>
After further investigation, the 'netif_receive_skb_list_exit' tracepoint poses a problem: There is no return value in that function and the TRACE_EVENT infrastructure does not support tracepoints without payload, afaik.

Here's a few possible solutions:

1) Add a unique trace_netif_receive_skb_list_exit tracepoint with a 'fake' return value of 0, so it can use the same template as the others. Easiest, but looks hackish.

2) Add tracepoints in the callees in the various locations where individual skbs get removed from the list. But that's quite a few code paths to cover, and we may miss a few.

3) Simply ignore the exit of the list code path. Dependency analyses will miss this case though.

4) Replace the proposed exit tracepoints by others after the delivery of packets, ie in the 'netif_receive_skb_internal' and 'netif_receive_skb_list_internal' functions, again at the risk of missing code paths.

5) None of the above (and fallback to using kretprobes with the tracers)


Any one of these approaches sound more appealing?

Thanks,

Geneviève
diff mbox series

Patch

diff --git a/include/trace/events/net.h b/include/trace/events/net.h
index 00aa72ce0e7c..aa64169c42cb 100644
--- a/include/trace/events/net.h
+++ b/include/trace/events/net.h
@@ -244,6 +244,77 @@  DEFINE_EVENT(net_dev_rx_verbose_template, netif_rx_ni_entry,
 	TP_ARGS(skb)
 );
 
+DECLARE_EVENT_CLASS(net_dev_rx_exit_template,
+
+	TP_PROTO(struct sk_buff *skb, int ret),
+
+	TP_ARGS(skb, ret),
+
+	TP_STRUCT__entry(
+		__field(void *,	skbaddr)
+		__field(int,	ret)
+	),
+
+	TP_fast_assign(
+		__entry->skbaddr = skb;
+		__entry->ret = ret;
+	),
+
+	TP_printk("skbaddr=%p ret=%d", __entry->skbaddr, __entry->ret)
+);
+
+DEFINE_EVENT(net_dev_rx_exit_template, napi_gro_frags_exit,
+
+	TP_PROTO(struct sk_buff *skb, int ret),
+
+	TP_ARGS(skb, ret)
+);
+
+DEFINE_EVENT(net_dev_rx_exit_template, napi_gro_receive_exit,
+
+	TP_PROTO(struct sk_buff *skb, int ret),
+
+	TP_ARGS(skb, ret)
+);
+
+DEFINE_EVENT(net_dev_rx_exit_template, netif_receive_skb_exit,
+
+	TP_PROTO(struct sk_buff *skb, int ret),
+
+	TP_ARGS(skb, ret)
+);
+
+DEFINE_EVENT(net_dev_rx_exit_template, netif_rx_exit,
+
+	TP_PROTO(struct sk_buff *skb, int ret),
+
+	TP_ARGS(skb, ret)
+);
+
+DEFINE_EVENT(net_dev_rx_exit_template, netif_rx_ni_exit,
+
+	TP_PROTO(struct sk_buff *skb, int ret),
+
+	TP_ARGS(skb, ret)
+);
+
+TRACE_EVENT(netif_receive_skb_list_exit,
+
+	TP_PROTO(struct sk_buff *skb),
+
+	TP_ARGS(skb),
+
+	TP_STRUCT__entry(
+		__field(void *,	skbaddr)
+	),
+
+	TP_fast_assign(
+		__entry->skbaddr = skb;
+	),
+
+	TP_printk("skbaddr=%p", __entry->skbaddr)
+);
+
 #endif /* _TRACE_NET_H */
 
 /* This part must be outside protection */
diff --git a/net/core/dev.c b/net/core/dev.c
index 0ffcbdd55fa9..c4dc5df34abe 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4520,9 +4520,14 @@  static int netif_rx_internal(struct sk_buff *skb)
 
 int netif_rx(struct sk_buff *skb)
 {
+	int ret;
+
 	trace_netif_rx_entry(skb);
 
-	return netif_rx_internal(skb);
+	ret = netif_rx_internal(skb);
+	trace_netif_rx_exit(skb, ret);
+
+	return ret;
 }
 EXPORT_SYMBOL(netif_rx);
 
@@ -4537,6 +4542,7 @@  int netif_rx_ni(struct sk_buff *skb)
 	if (local_softirq_pending())
 		do_softirq();
 	preempt_enable();
+	trace_netif_rx_ni_exit(skb, err);
 
 	return err;
 }
@@ -5222,9 +5228,14 @@  static void netif_receive_skb_list_internal(struct list_head *head)
  */
 int netif_receive_skb(struct sk_buff *skb)
 {
+	int ret;
+
 	trace_netif_receive_skb_entry(skb);
 
-	return netif_receive_skb_internal(skb);
+	ret = netif_receive_skb_internal(skb);
+	trace_netif_receive_skb_exit(skb, ret);
+
+	return ret;
 }
 EXPORT_SYMBOL(netif_receive_skb);
 
@@ -5244,9 +5255,15 @@  void netif_receive_skb_list(struct list_head *head)
 
 	if (list_empty(head))
 		return;
-	list_for_each_entry(skb, head, list)
-		trace_netif_receive_skb_list_entry(skb);
+	if (trace_netif_receive_skb_list_entry_enabled()) {
+		list_for_each_entry(skb, head, list)
+			trace_netif_receive_skb_list_entry(skb);
+	}
 	netif_receive_skb_list_internal(head);
+	if (trace_netif_receive_skb_list_exit_enabled()) {
+		list_for_each_entry(skb, head, list)
+			trace_netif_receive_skb_list_exit(skb);
+	}
 }
 EXPORT_SYMBOL(netif_receive_skb_list);
 
@@ -5634,12 +5651,17 @@  static gro_result_t napi_skb_finish(gro_result_t ret, struct sk_buff *skb)
 
 gro_result_t napi_gro_receive(struct napi_struct *napi, struct sk_buff *skb)
 {
+	gro_result_t ret;
+
 	skb_mark_napi_id(skb, napi);
 	trace_napi_gro_receive_entry(skb);
 
 	skb_gro_reset_offset(skb);
 
-	return napi_skb_finish(dev_gro_receive(napi, skb), skb);
+	ret = napi_skb_finish(dev_gro_receive(napi, skb), skb);
+	trace_napi_gro_receive_exit(skb, ret);
+
+	return ret;
 }
 EXPORT_SYMBOL(napi_gro_receive);
 
@@ -5753,6 +5775,7 @@  static struct sk_buff *napi_frags_skb(struct napi_struct *napi)
 
 gro_result_t napi_gro_frags(struct napi_struct *napi)
 {
+	gro_result_t ret;
 	struct sk_buff *skb = napi_frags_skb(napi);
 
 	if (!skb)
@@ -5760,7 +5783,10 @@  gro_result_t napi_gro_frags(struct napi_struct *napi)
 
 	trace_napi_gro_frags_entry(skb);
 
-	return napi_frags_finish(napi, skb, dev_gro_receive(napi, skb));
+	ret = napi_frags_finish(napi, skb, dev_gro_receive(napi, skb));
+	trace_napi_gro_frags_exit(skb, ret);
+
+	return ret;
 }
 EXPORT_SYMBOL(napi_gro_frags);