Message ID: 1459831974-2891931-2-git-send-email-ast@fb.com
State: Changes Requested, archived
Delegated to: David Miller
On Mon, Apr 04, 2016 at 09:52:47PM -0700, Alexei Starovoitov wrote:
> avoid memset in perf_fetch_caller_regs, since it's the critical path of all tracepoints.
> It's called from perf_sw_event_sched, perf_event_task_sched_in and all of perf_trace_##call
> with this_cpu_ptr(&__perf_regs[..]) which are zero initialized by percpu_alloc

It's not actually allocated; but because it's a static uninitialized
variable we get .bss-like behaviour and the initial value is copied to
all CPUs when the per-cpu allocator thingy bootstraps SMP IIRC.

> and subsequent call to perf_arch_fetch_caller_regs initializes the same fields on all archs,
> so we can safely drop memset from all of the above cases and

Indeed.

> move it into perf_ftrace_function_call that calls it with stack allocated pt_regs.

Hmm, is there a reason that's still on-stack instead of using the
per-cpu thing, Steve?

> Signed-off-by: Alexei Starovoitov <ast@kernel.org>

In any case,

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
On 4/5/16 5:06 AM, Peter Zijlstra wrote:
> On Mon, Apr 04, 2016 at 09:52:47PM -0700, Alexei Starovoitov wrote:
>> avoid memset in perf_fetch_caller_regs, since it's the critical path of all tracepoints.
>> It's called from perf_sw_event_sched, perf_event_task_sched_in and all of perf_trace_##call
>> with this_cpu_ptr(&__perf_regs[..]) which are zero initialized by percpu_alloc
>
> It's not actually allocated; but because it's a static uninitialized
> variable we get .bss-like behaviour and the initial value is copied to
> all CPUs when the per-cpu allocator thingy bootstraps SMP IIRC.

yes, it's .bss-like in a special section. I think static percpu still
goes through some fancy boot time init similar to dynamic.
What I tried to emphasize is that either static or dynamic percpu areas
are guaranteed to be zero initialized.

>> and subsequent call to perf_arch_fetch_caller_regs initializes the same fields on all archs,
>> so we can safely drop memset from all of the above cases and
>
> Indeed.
>
>> move it into perf_ftrace_function_call that calls it with stack allocated pt_regs.
>
> Hmm, is there a reason that's still on-stack instead of using the
> per-cpu thing, Steve?
>
>> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
>
> In any case,
>
> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>

Thanks for the quick review.
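The zero-initialization guarantee being discussed can be checked with a small user-space analogy (plain C, not kernel code; `fake_pt_regs` and `regs_buf` are made-up names for illustration): an object with static storage duration and no initializer is zero-initialized, landing in .bss. That is the property the patch relies on for the static per-cpu `__perf_regs` buffers.

```c
#include <assert.h>

/* User-space analogy of the static per-cpu buffers: an uninitialized
 * object with static storage duration is guaranteed all-zero by the
 * C standard (.bss).  The kernel's static per-cpu data starts out the
 * same way on every CPU, which is why a memset before the fields are
 * first written is redundant on this path. */
struct fake_pt_regs {
	unsigned long ip;
	unsigned long sp;
	unsigned long flags;
};

/* mimics the static __perf_regs[] buffers: no initializer, so .bss */
static struct fake_pt_regs regs_buf[4];

static int regs_buf_is_zero(void)
{
	for (int i = 0; i < 4; i++)
		if (regs_buf[i].ip || regs_buf[i].sp || regs_buf[i].flags)
			return 0;
	return 1;
}
```

Dynamic per-cpu allocations carry the same guarantee, which is Alexei's point: either way the memory starts zeroed before `perf_arch_fetch_caller_regs` fills in the live fields.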
On Tue, 5 Apr 2016 14:06:26 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Mon, Apr 04, 2016 at 09:52:47PM -0700, Alexei Starovoitov wrote:
> > avoid memset in perf_fetch_caller_regs, since it's the critical path of all tracepoints.
> > It's called from perf_sw_event_sched, perf_event_task_sched_in and all of perf_trace_##call
> > with this_cpu_ptr(&__perf_regs[..]) which are zero initialized by percpu_alloc
>
> It's not actually allocated; but because it's a static uninitialized
> variable we get .bss-like behaviour and the initial value is copied to
> all CPUs when the per-cpu allocator thingy bootstraps SMP IIRC.
>
> > and subsequent call to perf_arch_fetch_caller_regs initializes the same fields on all archs,
> > so we can safely drop memset from all of the above cases and
>
> Indeed.
>
> > move it into perf_ftrace_function_call that calls it with stack allocated pt_regs.
>
> Hmm, is there a reason that's still on-stack instead of using the
> per-cpu thing, Steve?

Well, what do you do when you are tracing with regs in an interrupt
that already set the per cpu regs field? We could create our own
per-cpu one as well, but then that would require checking which level
we are in, as we can have one for normal context, one for softirq
context, one for irq context and one for nmi context.

-- Steve

> > Signed-off-by: Alexei Starovoitov <ast@kernel.org>
>
> In any case,
>
> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
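The per-context scheme Steve describes can be sketched in user-space C (all names here are hypothetical, not the kernel's): one scratch buffer per nesting level, selected by a depth counter, so an interrupt that starts tracing while a lower level is mid-trace gets its own buffer instead of clobbering the one in use.

```c
#include <stddef.h>

/* One scratch buffer per nesting level: normal task context, softirq,
 * irq, and nmi.  In the kernel these buffers would be per-cpu; a
 * per-cpu depth counter picks which one to hand out. */
#define NR_CONTEXTS 4

struct fake_pt_regs {
	unsigned long ip;
	unsigned long sp;
};

static struct fake_pt_regs regs_buf[NR_CONTEXTS];
static int nest_depth;

static struct fake_pt_regs *get_regs_buf(void)
{
	if (nest_depth >= NR_CONTEXTS)
		return NULL;		/* nested too deep: drop the event */
	return &regs_buf[nest_depth++];
}

static void put_regs_buf(void)
{
	if (nest_depth > 0)
		nest_depth--;
}
```

The cost Steve is pointing at is exactly this bookkeeping: every trace site would have to pair a get with a put and handle the too-deep case, which the plain on-stack pt_regs avoids.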
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index f291275ffd71..e89f7199c223 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -882,8 +882,6 @@ static inline void perf_arch_fetch_caller_regs(struct pt_regs *regs, unsigned lo
  */
 static inline void perf_fetch_caller_regs(struct pt_regs *regs)
 {
-	memset(regs, 0, sizeof(*regs));
-
 	perf_arch_fetch_caller_regs(regs, CALLER_ADDR0);
 }

diff --git a/kernel/trace/trace_event_perf.c b/kernel/trace/trace_event_perf.c
index 00df25fd86ef..7a68afca8249 100644
--- a/kernel/trace/trace_event_perf.c
+++ b/kernel/trace/trace_event_perf.c
@@ -316,6 +316,7 @@ perf_ftrace_function_call(unsigned long ip, unsigned long parent_ip,

 	BUILD_BUG_ON(ENTRY_SIZE > PERF_MAX_TRACE_SIZE);

+	memset(&regs, 0, sizeof(regs));
 	perf_fetch_caller_regs(&regs);

 	entry = perf_trace_buf_prepare(ENTRY_SIZE, TRACE_FN, NULL, &rctx);
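The shape of the change — drop the zeroing from the shared fast-path helper and keep a single memset in the one caller that passes uninitialized stack memory — can be sketched in user-space C (all names below are illustrative stand-ins, not the kernel's):

```c
#include <string.h>

struct fake_pt_regs {
	unsigned long ip;
	unsigned long sp;
	unsigned long rest;	/* stands in for fields the arch code never writes */
};

/* Fast-path helper: after the patch it only fills the fields the arch
 * code actually sets, and no longer zeroes the whole struct. */
static void fetch_caller_regs(struct fake_pt_regs *regs)
{
	regs->ip = 0x1234;
	regs->sp = 0x5678;
}

/* Per-cpu-style caller: static storage starts zeroed, so the untouched
 * fields are already 0 and no memset is needed. */
static struct fake_pt_regs percpu_regs;

static unsigned long percpu_path(void)
{
	fetch_caller_regs(&percpu_regs);
	return percpu_regs.rest;	/* 0: static storage */
}

/* Stack caller (the perf_ftrace_function_call case): stack memory is
 * uninitialized, so this is the one place the memset must remain. */
static unsigned long stack_path(void)
{
	struct fake_pt_regs regs;

	memset(&regs, 0, sizeof(regs));
	fetch_caller_regs(&regs);
	return regs.rest;		/* 0 only because of the memset */
}
```

This is why the diff above is net one line: two deleted memset lines on the hot path, one added at the single stack-allocated call site.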
avoid memset in perf_fetch_caller_regs, since it's the critical path of all tracepoints.
It's called from perf_sw_event_sched, perf_event_task_sched_in and all of perf_trace_##call
with this_cpu_ptr(&__perf_regs[..]) which are zero initialized by percpu_alloc and
subsequent call to perf_arch_fetch_caller_regs initializes the same fields on all archs,
so we can safely drop memset from all of the above cases and move it into
perf_ftrace_function_call that calls it with stack allocated pt_regs.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 include/linux/perf_event.h      | 2 --
 kernel/trace/trace_event_perf.c | 1 +
 2 files changed, 1 insertion(+), 2 deletions(-)