[v2,bpf-next,5/8] bpf: introduce BPF_RAW_TRACEPOINT

From: Alexei Starovoitov <ast@kernel.org>

Introduce BPF_PROG_TYPE_RAW_TRACEPOINT bpf program type to access
kernel internal arguments of the tracepoints in their raw form.

From the bpf program's point of view, access to the arguments looks like:
struct bpf_raw_tracepoint_args {
       __u64 args[0];
};

int bpf_prog(struct bpf_raw_tracepoint_args *ctx)
{
  // the program can read args[N], where N depends on the tracepoint
  // and is statically verified at program load+attach time
  return 0;
}

The kprobe+bpf infrastructure allows programs to access function arguments.
This feature allows programs to access raw tracepoint arguments.

Similar to the proposed 'dynamic ftrace events', there are no abi guarantees
as to what the tracepoint arguments are or what they mean.
The program needs to type cast args properly and use the bpf_probe_read()
helper to access struct fields when an argument is a pointer.
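
The cast-then-read pattern described above can be sketched in plain
user-space C. This is a model, not a loadable bpf program: the names
raw_tp_args_model, net_device_model, probe_read_model() and
handle_xdp_exception() are hypothetical, and memcpy() stands in for
bpf_probe_read().

```c
#include <stdint.h>
#include <string.h>

/* User-space model of the context; the kernel declares __u64 args[0]. */
struct raw_tp_args_model {
	uint64_t args[3];	/* xdp_exception takes three arguments */
};

/* Hypothetical stand-in for a kernel struct reachable through args[0]. */
struct net_device_model {
	int ifindex;
};

/* Stand-in for bpf_probe_read(dst, size, src): copy, then report success. */
static int probe_read_model(void *dst, uint32_t size, const void *src)
{
	memcpy(dst, src, size);
	return 0;
}

/* Returns the device ifindex when 'act' matches want_act, otherwise -1. */
static int handle_xdp_exception(struct raw_tp_args_model *ctx,
				uint32_t want_act)
{
	/* args[0] carries a pointer: cast it back to the expected type. */
	const struct net_device_model *dev =
		(const struct net_device_model *)(uintptr_t)ctx->args[0];
	/* args[2] carries a scalar: truncate it back to its 32-bit type. */
	uint32_t act = (uint32_t)ctx->args[2];
	int ifindex = 0;

	if (act != want_act)
		return -1;
	/* Fields behind a pointer are copied out, never dereferenced directly. */
	if (probe_read_model(&ifindex, sizeof(ifindex), &dev->ifindex))
		return -1;
	return ifindex;
}
```

In a real program the struct layout comes from kernel headers, and it can
change between kernel versions since there is no abi guarantee.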

For every tracepoint a __bpf_trace_##call function is generated.
In assembler it looks like:
(gdb) disassemble __bpf_trace_xdp_exception
Dump of assembler code for function __bpf_trace_xdp_exception:
   0xffffffff81132080 <+0>:     mov    %ecx,%ecx
   0xffffffff81132082 <+2>:     jmpq   0xffffffff811231f0 <bpf_trace_run3>

where

TRACE_EVENT(xdp_exception,
        TP_PROTO(const struct net_device *dev,
                 const struct bpf_prog *xdp, u32 act),

The above assembler snippet casts the 32-bit 'act' argument to 'u64'
to pass it into bpf_trace_run3(), while the 'dev' and 'xdp' args are
passed as-is.
Each of the ~500 __bpf_trace_*() functions is only 5-10 bytes long.
In total this approach adds 7k bytes to .text and 8k bytes
to .rodata, since the probe funcs need to appear in kallsyms.
The alternative to making __bpf_trace_##call global in kallsyms
would have been to keep the functions static and add a pointer to them
to 'struct trace_event_class' and 'struct trace_event_call',
but keeping them global simplifies the implementation and keeps it
independent of the tracing side.
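
As a rough illustration of what such a generated function amounts to, the
user-space sketch below widens every argument to u64 and forwards it.
The names trace_run3_model() and trace_xdp_exception_model() are
hypothetical stand-ins for bpf_trace_run3() and
__bpf_trace_xdp_exception(); the real functions are produced by macros
in this patch and run the attached bpf program.

```c
#include <stdint.h>

/* Records what the model run function received, for inspection. */
static uint64_t seen_args[3];

/* Model of bpf_trace_run3(): takes the program plus three u64 arguments. */
static void trace_run3_model(void *prog, uint64_t a0, uint64_t a1,
			     uint64_t a2)
{
	(void)prog;	/* the real function would run this bpf program */
	seen_args[0] = a0;
	seen_args[1] = a1;
	seen_args[2] = a2;
}

/*
 * Model of __bpf_trace_xdp_exception(): pointers pass through unchanged,
 * while the 32-bit 'act' is zero-extended to 64 bits (the 'mov %ecx,%ecx'
 * in the disassembly above).
 */
static void trace_xdp_exception_model(void *prog, const void *dev,
				      const void *xdp, uint32_t act)
{
	trace_run3_model(prog,
			 (uint64_t)(uintptr_t)dev,
			 (uint64_t)(uintptr_t)xdp,
			 (uint64_t)act);
}
```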

This approach also gives the lowest possible overhead
when calling trace_xdp_exception() from kernel C code and
transitioning into bpf land.
Since tracepoint+bpf are used at rates of 1M+ events per second,
this is a very valuable optimization.

Since the ftrace and perf sides are not involved, a new
BPF_RAW_TRACEPOINT_OPEN sys_bpf command is introduced
that returns an anon_inode FD of a 'bpf-raw-tracepoint' object.

The user space side looks like:
// load bpf prog with BPF_PROG_TYPE_RAW_TRACEPOINT type
prog_fd = bpf_prog_load(...);
// receive anon_inode fd for given bpf_raw_tracepoint with prog attached
raw_tp_fd = bpf_raw_tracepoint_open("xdp_exception", prog_fd);

Ctrl-C of a tracing daemon or cmdline tool that uses this feature
will automatically detach the bpf program, unload it and
unregister the tracepoint probe.
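
For readers who want to see the command without a library wrapper, here
is a minimal sketch that issues BPF_RAW_TRACEPOINT_OPEN through the raw
bpf(2) syscall. raw_tp_open() is a hypothetical helper name, and the
attr field names (raw_tracepoint.name, raw_tracepoint.prog_fd) are an
assumption taken from the uapi header of kernels that carry this
feature; they may differ from this revision of the patch.

```c
#include <linux/bpf.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Attach the program behind prog_fd to the tracepoint called 'name'.
 * Returns the raw tracepoint fd, or -1 with errno set on failure.
 */
static int raw_tp_open(const char *name, int prog_fd)
{
	union bpf_attr attr;

	/* Unused attr bytes must be zero. */
	memset(&attr, 0, sizeof(attr));
	attr.raw_tracepoint.name = (uint64_t)(uintptr_t)name;
	attr.raw_tracepoint.prog_fd = (uint32_t)prog_fd;

	return (int)syscall(__NR_bpf, BPF_RAW_TRACEPOINT_OPEN,
			    &attr, sizeof(attr));
}
```

The returned fd is what ties the attachment to the process: when the
last reference to it goes away, for example on process exit, the probe
is expected to be unregistered as described above.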

On the kernel side, for_each_kernel_tracepoint() is used
to find the tracepoint with the "xdp_exception" name
(that would be the __tracepoint_xdp_exception record).

Then kallsyms_lookup_name() is used to find the address
of the __bpf_trace_xdp_exception() probe function.

Finally, tracepoint_probe_register() is used to connect the probe
with the tracepoint.
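
The first two lookup steps above can be sketched in user-space C. This
is only a model: tp_table, for_each_tracepoint_model() and
probe_symbol_for() are hypothetical names, the table stands in for the
kernel's tracepoint section, and the final kallsyms lookup and
tracepoint_probe_register() call are not modeled.

```c
#include <stdio.h>
#include <string.h>

struct tracepoint_model {
	const char *name;
};

/* Stand-in for the kernel's list of built-in tracepoints. */
static const struct tracepoint_model tp_table[] = {
	{ "sched_switch" }, { "xdp_exception" }, { "kfree_skb" },
};

typedef void (*tp_visit_fn)(const struct tracepoint_model *tp, void *priv);

/* Model of for_each_kernel_tracepoint(): call fn for every tracepoint. */
static void for_each_tracepoint_model(tp_visit_fn fn, void *priv)
{
	size_t i;

	for (i = 0; i < sizeof(tp_table) / sizeof(tp_table[0]); i++)
		fn(&tp_table[i], priv);
}

struct tp_search {
	const char *name;
	const struct tracepoint_model *found;
};

/* Visitor: remember the first tracepoint whose name matches. */
static void match_name(const struct tracepoint_model *tp, void *priv)
{
	struct tp_search *s = priv;

	if (!s->found && strcmp(tp->name, s->name) == 0)
		s->found = tp;
}

/*
 * Step 1: find the tracepoint by name.
 * Step 2: build the probe symbol name that would be looked up in kallsyms.
 * Returns 0 on success, -1 if the name is unknown or buf is too small.
 */
static int probe_symbol_for(const char *tp_name, char *buf, size_t len)
{
	struct tp_search s = { tp_name, NULL };
	int n;

	for_each_tracepoint_model(match_name, &s);
	if (!s.found)
		return -1;
	n = snprintf(buf, len, "__bpf_trace_%s", s.found->name);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```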

The addition of bpf_raw_tracepoint doesn't interfere with the ftrace
and perf tracepoint mechanisms. perf_event_open() can be used in
parallel on the same tracepoint.
Multiple bpf_raw_tracepoint_open("xdp_exception", prog_fd) calls are
permitted, each with its own bpf program. The kernel will execute
all tracepoint probes and all attached bpf programs.

In the future bpf_raw_tracepoints can be extended with
query/introspection logic.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
---
 include/linux/bpf_types.h    |   1 +
 include/linux/trace_events.h |  57 +++++++++
 include/trace/bpf_probe.h    |  87 +++++++++++++
 include/trace/define_trace.h |   1 +
 include/uapi/linux/bpf.h     |  11 ++
 kernel/bpf/syscall.c         |  87 +++++++++++++
 kernel/trace/bpf_trace.c     | 283 +++++++++++++++++++++++++++++++++++++++++++
 7 files changed, 527 insertions(+)
 create mode 100644 include/trace/bpf_probe.h

Message ID	20180321185448.2806324-6-ast@fb.com
State	Superseded, archived
Delegated to:	BPF Maintainers
Date	Wed, 21 Mar 2018 11:54:45 -0700
To	davem@davemloft.net
CC	daniel@iogearbox.net, torvalds@linux-foundation.org, peterz@infradead.org, rostedt@goodmis.org, netdev@vger.kernel.org, kernel-team@fb.com, linux-api@vger.kernel.org
Series	bpf, tracing: introduce bpf raw tracepoints
  [v2,bpf-next,0/8] bpf, tracing: introduce bpf raw tracepoints
  [v2,bpf-next,1/8] treewide: remove struct-pass-by-value from tracepoints arguments
  [v2,bpf-next,2/8] net/mediatek: disambiguate mt76 vs mt7601u trace events
  [v2,bpf-next,3/8] net/mac802154: disambiguate mac80215 vs mac802154 trace events
  [v2,bpf-next,4/8] tracepoint: compute num_args at build time
  [v2,bpf-next,5/8] bpf: introduce BPF_RAW_TRACEPOINT
  [v2,bpf-next,6/8] libbpf: add bpf_raw_tracepoint_open helper
  [v2,bpf-next,7/8] samples/bpf: raw tracepoint test
  [v2,bpf-next,8/8] selftests/bpf: test for bpf_get_stackid() from raw tracepoints
