[net-next,1/8] perf: optimize perf_fetch_caller_regs

Message ID 1459831974-2891931-2-git-send-email-ast@fb.com
State Changes Requested, archived
Delegated to: David Miller

Commit Message

Alexei Starovoitov April 5, 2016, 4:52 a.m. UTC
Avoid the memset in perf_fetch_caller_regs(), since it's on the critical path
of all tracepoints. It's called from perf_sw_event_sched(),
perf_event_task_sched_in() and all of perf_trace_##call with
this_cpu_ptr(&__perf_regs[..]), which are zero-initialized by percpu_alloc,
and the subsequent call to perf_arch_fetch_caller_regs() initializes the same
fields on all archs, so we can safely drop the memset from all of the above
cases and move it into perf_ftrace_function_call(), which calls it with a
stack-allocated pt_regs.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 include/linux/perf_event.h      | 2 --
 kernel/trace/trace_event_perf.c | 1 +
 2 files changed, 1 insertion(+), 2 deletions(-)
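
For context, a simplified sketch of the call path the commit message
describes: perf_trace_##call hands perf_fetch_caller_regs() a pointer into
percpu memory that starts out zeroed. Abbreviated from the include/trace/perf.h
and kernel/trace/trace_event_perf.c of that era; __entry_size and event_type
stand in for the generated size/type values, so treat it as illustrative only:

	/* one pt_regs per recursion context, per CPU; static percpu data
	 * is .bss-like, i.e. zero-filled, so no memset is needed */
	static DEFINE_PER_CPU(struct pt_regs, __perf_regs[PERF_NR_CONTEXTS]);

	/* roughly what each generated perf_trace_##call() does: */
	struct pt_regs *__regs;
	int rctx;

	entry = perf_trace_buf_prepare(__entry_size, event_type, &__regs, &rctx);
	if (!entry)
		return;
	/* __regs now points at this_cpu_ptr(&__perf_regs[rctx]); the fields
	 * perf_arch_fetch_caller_regs() fills in are the ones consumers read,
	 * everything else stays zero from the percpu area */
	perf_fetch_caller_regs(__regs);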

Comments

Peter Zijlstra April 5, 2016, 12:06 p.m. UTC | #1
On Mon, Apr 04, 2016 at 09:52:47PM -0700, Alexei Starovoitov wrote:
> Avoid the memset in perf_fetch_caller_regs(), since it's on the critical path of all tracepoints.
> It's called from perf_sw_event_sched(), perf_event_task_sched_in() and all of perf_trace_##call
> with this_cpu_ptr(&__perf_regs[..]), which are zero-initialized by percpu_alloc

It's not actually allocated; but because it's a static uninitialized
variable we get .bss-like behaviour and the initial value is copied to
all CPUs when the per-cpu allocator thingy bootstraps SMP, IIRC.

> and the
> subsequent call to perf_arch_fetch_caller_regs() initializes the same fields on all archs,
> so we can safely drop the memset from all of the above cases and

Indeed.

> move it into
> perf_ftrace_function_call(), which calls it with a stack-allocated pt_regs.

Hmm, is there a reason that's still on-stack instead of using the
per-cpu thing, Steve?

> Signed-off-by: Alexei Starovoitov <ast@kernel.org>

In any case,

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Alexei Starovoitov April 5, 2016, 5:41 p.m. UTC | #2
On 4/5/16 5:06 AM, Peter Zijlstra wrote:
> On Mon, Apr 04, 2016 at 09:52:47PM -0700, Alexei Starovoitov wrote:
>> Avoid the memset in perf_fetch_caller_regs(), since it's on the critical path of all tracepoints.
>> It's called from perf_sw_event_sched(), perf_event_task_sched_in() and all of perf_trace_##call
>> with this_cpu_ptr(&__perf_regs[..]), which are zero-initialized by percpu_alloc
>
> It's not actually allocated; but because it's a static uninitialized
> variable we get .bss-like behaviour and the initial value is copied to
> all CPUs when the per-cpu allocator thingy bootstraps SMP, IIRC.

yes, it's .bss-like in a special section. I think static percpu still
goes through some fancy boot-time init similar to dynamic.
What I tried to emphasize is that both static and dynamic percpu areas
are guaranteed to be zero-initialized.
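
A minimal sketch of that guarantee (illustrative only):

	/* static percpu: placed in a zeroed section and copied out to
	 * each CPU's area when the percpu allocator bootstraps */
	static DEFINE_PER_CPU(struct pt_regs, __perf_regs[PERF_NR_CONTEXTS]);

	/* dynamic percpu: alloc_percpu() returns zero-filled memory */
	struct pt_regs __percpu *regs = alloc_percpu(struct pt_regs);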

>> and the
>> subsequent call to perf_arch_fetch_caller_regs() initializes the same fields on all archs,
>> so we can safely drop the memset from all of the above cases and
>
> Indeed.
>
>> move it into
>> perf_ftrace_function_call(), which calls it with a stack-allocated pt_regs.
>
> Hmm, is there a reason that's still on-stack instead of using the
> per-cpu thing, Steve?
>
>> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
>
> In any case,
>
> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Thanks for the quick review.

Steven Rostedt April 8, 2016, 10:12 p.m. UTC | #3
On Tue, 5 Apr 2016 14:06:26 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Mon, Apr 04, 2016 at 09:52:47PM -0700, Alexei Starovoitov wrote:
> > Avoid the memset in perf_fetch_caller_regs(), since it's on the critical path of all tracepoints.
> > It's called from perf_sw_event_sched(), perf_event_task_sched_in() and all of perf_trace_##call
> > with this_cpu_ptr(&__perf_regs[..]), which are zero-initialized by percpu_alloc
> 
> It's not actually allocated; but because it's a static uninitialized
> variable we get .bss-like behaviour and the initial value is copied to
> all CPUs when the per-cpu allocator thingy bootstraps SMP, IIRC.
> 
> > and the
> > subsequent call to perf_arch_fetch_caller_regs() initializes the same fields on all archs,
> > so we can safely drop the memset from all of the above cases and
> 
> Indeed.
> 
> > move it into
> > perf_ftrace_function_call(), which calls it with a stack-allocated pt_regs.
> 
> Hmm, is there a reason that's still on-stack instead of using the
> per-cpu thing, Steve?

Well, what do you do when you are tracing with regs in an interrupt
that has already set the per-cpu regs field? We could create our own
per-cpu one as well, but then that would require checking which level
we are in, as we would need one for normal context, one for softirq
context, one for irq context and one for nmi context.

-- Steve



> 
> > Signed-off-by: Alexei Starovoitov <ast@kernel.org>  
> 
> In any case,
> 
> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
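
A sketch of the per-context bookkeeping Steve describes: perf already
sizes its percpu scratch space by recursion context and indexes it with
rctx, along these lines (simplified; helper names as in the perf code of
that era, so treat details as approximate):

	/* one slot per nesting level: task, softirq, hardirq, NMI */
	static DEFINE_PER_CPU(struct pt_regs, __perf_regs[PERF_NR_CONTEXTS]);

	struct pt_regs *regs;
	int rctx;

	rctx = perf_swevent_get_recursion_context(); /* 0..PERF_NR_CONTEXTS-1, or -1 */
	if (rctx < 0)
		return;
	regs = this_cpu_ptr(&__perf_regs[rctx]);	/* distinct per nesting level */
	/* ... fill and use regs ... */
	perf_swevent_put_recursion_context(rctx);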

Patch

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index f291275ffd71..e89f7199c223 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -882,8 +882,6 @@ static inline void perf_arch_fetch_caller_regs(struct pt_regs *regs, unsigned long ip)
  */
 static inline void perf_fetch_caller_regs(struct pt_regs *regs)
 {
-	memset(regs, 0, sizeof(*regs));
-
 	perf_arch_fetch_caller_regs(regs, CALLER_ADDR0);
 }
 
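For reference, the arch hook that does the initializing: the x86
perf_arch_fetch_caller_regs() of that era looked roughly like the sketch
below (paraphrased from arch/x86/include/asm/perf_event.h; treat the
exact field list as approximate):

	#define perf_arch_fetch_caller_regs(regs, __ip)	{	\
		(regs)->ip = (__ip);				\
		(regs)->bp = caller_frame_pointer();		\
		(regs)->cs = __KERNEL_CS;			\
		(regs)->flags = 0;				\
		asm volatile(_ASM_MOV "%%"_ASM_SP ", %0\n"	\
			     : "=m" ((regs)->sp)		\
			     :: "memory");			\
	}
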
diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c
index 00df25fd86ef..7a68afca8249 100644
--- a/kernel/trace/trace_event_perf.c
+++ b/kernel/trace/trace_event_perf.c
@@ -316,6 +316,7 @@ perf_ftrace_function_call(unsigned long ip, unsigned long parent_ip,
 
 	BUILD_BUG_ON(ENTRY_SIZE > PERF_MAX_TRACE_SIZE);
 
+	memset(&regs, 0, sizeof(regs));
 	perf_fetch_caller_regs(&regs);
 
 	entry = perf_trace_buf_prepare(ENTRY_SIZE, TRACE_FN, NULL, &rctx);