[bpf-next,v9,00/10] bpf: add bpf_get_stack helper

Message ID 20180429052816.2882032-1-yhs@fb.com

Message

Yonghong Song April 29, 2018, 5:28 a.m. UTC
Currently, the stackmap and the bpf_get_stackid helper are
provided for bpf programs to get stack traces. This approach
has a limitation, though: if two stack traces have the same
hash, only one gets stored in the stackmap table regardless
of whether BPF_F_REUSE_STACKID is specified, so some stack
traces may be missing from the user's perspective.

This patch set implements a new helper, bpf_get_stack, which
sends stack traces directly to the bpf program. The bpf
program can then see all stack traces, and can do in-kernel
processing or send the stack traces to user space through a
shared map or bpf_perf_event_output.

Patches #1 and #2 implement the core kernel support.
Patch #3 removes two never-hit branches in the verifier.
Patches #4 and #5 are two verifier improvements that make
bpf programming easier. Patch #6 syncs the new helper
to the tools headers. Patch #7 moves the perf_event polling
code and ksym lookup code from samples/bpf to
tools/testing/selftests/bpf. Patch #8 adds a verifier
test in tools/bpf for the new verifier change.
Patches #9 and #10 add tests for a raw tracepoint prog
and a tracepoint prog, respectively.

Changelogs:
  v8 -> v9:
    . make the function perf_event_mmap (in trace_helpers.c) extern
      to decouple perf_event_mmap and perf_event_poller.
    . add jit-enabled handling for kernel stack verification
      in Patch #9. Since we do not have a good way to
      verify a jit-enabled kernel stack, just return true if
      the kernel stack is not empty.
    . in Patch #9, use raw_syscalls/sys_enter instead of
      sched/sched_switch, and remove calling the cmd
      "task 1 dd if=/dev/zero of=/dev/null", which left
      a dangling process after the program exited.

  v7 -> v8:
    . rebase on top of latest bpf-next
    . simplify BPF_ARSH dst_reg->smin_value/smax_value tracking
    . rewrite the description of bpf_get_stack() in uapi bpf.h
      based on new format.
  v6 -> v7:
    . do the perf callchain buffer allocation inside the
      verifier, so that if prog->has_callchain_buf is set,
      the buffer is guaranteed to have been allocated.
    . change the condition "trace_nr <= skip" to "trace_nr < skip"
      so that for a zero-size buffer, 0 is returned instead of -EFAULT
  v5 -> v6:
    . after refining the return register's smax_value and umax_value
      for the helpers bpf_get_stack and bpf_probe_read_str,
      the bounds and var_off of the return register are further refined.
    . added the missing commit message for the tools header sync commit.
    . removed one unnecessary empty line.
  v4 -> v5:
    . relied on dst_reg->var_off to refine umin_value/umax_value
      in the verifier's handling of BPF_ARSH value range tracking,
      as suggested by Edward.
  v3 -> v4:
    . fixed a bug where the meta ptr is set to NULL in check_func_arg.
    . introduced tnum_arshift and added detailed comments on
      the underlying implementation.
    . avoided using a VLA in tools/bpf test_progs.
  v2 -> v3:
    . used meta to track the helper memory size argument
    . implemented range checking for ARSH in the verifier
    . moved the perf event polling and ksym related functions
      from samples/bpf to tools/bpf
    . added a test to compare build ids between bpf_get_stackid
      and bpf_get_stack
  v1 -> v2:
    . fixed compilation error when CONFIG_PERF_EVENTS is not enabled

Yonghong Song (10):
  bpf: change prototype for stack_map_get_build_id_offset
  bpf: add bpf_get_stack helper
  bpf/verifier: refine retval R0 state for bpf_get_stack helper
  bpf: remove never-hit branches in verifier adjust_scalar_min_max_vals
  bpf/verifier: improve register value range tracking with ARSH
  tools/bpf: add bpf_get_stack helper to tools headers
  samples/bpf: move common-purpose trace functions to selftests
  tools/bpf: add a verifier test case for bpf_get_stack helper and ARSH
  tools/bpf: add a test for bpf_get_stack with raw tracepoint prog
  tools/bpf: add a test for bpf_get_stack with tracepoint prog

 include/linux/bpf.h                                |   1 +
 include/linux/filter.h                             |   3 +-
 include/linux/tnum.h                               |   4 +-
 include/uapi/linux/bpf.h                           |  42 +++-
 kernel/bpf/core.c                                  |   5 +
 kernel/bpf/stackmap.c                              |  80 ++++++-
 kernel/bpf/tnum.c                                  |  10 +
 kernel/bpf/verifier.c                              |  80 ++++++-
 kernel/trace/bpf_trace.c                           |  50 ++++-
 samples/bpf/Makefile                               |  11 +-
 samples/bpf/bpf_load.c                             |  63 ------
 samples/bpf/bpf_load.h                             |   7 -
 samples/bpf/offwaketime_user.c                     |   1 +
 samples/bpf/sampleip_user.c                        |   1 +
 samples/bpf/spintest_user.c                        |   1 +
 samples/bpf/trace_event_user.c                     |   1 +
 samples/bpf/trace_output_user.c                    | 110 +---------
 tools/include/uapi/linux/bpf.h                     |  42 +++-
 tools/testing/selftests/bpf/Makefile               |   4 +-
 tools/testing/selftests/bpf/bpf_helpers.h          |   2 +
 tools/testing/selftests/bpf/test_get_stack_rawtp.c | 102 +++++++++
 tools/testing/selftests/bpf/test_progs.c           | 242 +++++++++++++++++++--
 .../selftests/bpf/test_stacktrace_build_id.c       |  20 +-
 tools/testing/selftests/bpf/test_stacktrace_map.c  |  19 +-
 tools/testing/selftests/bpf/test_verifier.c        |  45 ++++
 tools/testing/selftests/bpf/trace_helpers.c        | 180 +++++++++++++++
 tools/testing/selftests/bpf/trace_helpers.h        |  23 ++
 27 files changed, 927 insertions(+), 222 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/test_get_stack_rawtp.c
 create mode 100644 tools/testing/selftests/bpf/trace_helpers.c
 create mode 100644 tools/testing/selftests/bpf/trace_helpers.h

Comments

Alexei Starovoitov April 29, 2018, 3:55 p.m. UTC | #1
On Sat, Apr 28, 2018 at 10:28:06PM -0700, Yonghong Song wrote:
> [...]

Applied, Thanks Yonghong.