From patchwork Tue May 15 23:45:16 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Yonghong Song
X-Patchwork-Id: 914033
X-Patchwork-Delegate: bpf@iogearbox.net
From: Yonghong Song
Subject: [PATCH bpf-next 2/7] bpf: introduce bpf subcommand BPF_PERF_EVENT_QUERY
Date: Tue, 15 May 2018 16:45:16 -0700
Message-ID: <20180515234521.856763-3-yhs@fb.com>
In-Reply-To: <20180515234521.856763-1-yhs@fb.com>
References: <20180515234521.856763-1-yhs@fb.com>
X-Mailing-List: netdev@vger.kernel.org

Suppose a userspace application has loaded a bpf program and attached
it to a tracepoint/kprobe/uprobe, and a bpf introspection tool, e.g.,
bpftool, wants to show which bpf program is attached to which
tracepoint/kprobe/uprobe. Such attachment information is useful for
understanding the overall bpf deployment in the system.

There is a name field (16 bytes) for each program, which could be used
to encode the attachment point. This approach has some drawbacks.
First, a bpftool user (e.g., an admin) may not understand the
association between the name and the attachment point. Second, if one
program is attached to multiple places, encoding a name that implies
all of these attachments becomes difficult.

This patch introduces a new bpf subcommand BPF_PERF_EVENT_QUERY.
Given a pid and fd, if the (pid, fd) pair is associated with a
tracepoint/kprobe/uprobe perf event, BPF_PERF_EVENT_QUERY will return
   . prog_id
   . tracepoint name, or
   . k[ret]probe funcname + offset or kernel addr, or
   . u[ret]probe filename + offset
to the userspace.
The user can use "bpftool prog" to find more information about
the bpf program itself with prog_id.

Signed-off-by: Yonghong Song
---
 include/linux/trace_events.h |  15 ++++++
 include/uapi/linux/bpf.h     |  25 ++++++++++
 kernel/bpf/syscall.c         | 113 +++++++++++++++++++++++++++++++++++++++++++
 kernel/trace/bpf_trace.c     |  53 ++++++++++++++++++++
 kernel/trace/trace_kprobe.c  |  29 +++++++++++
 kernel/trace/trace_uprobe.c  |  22 +++++++++
 6 files changed, 257 insertions(+)

diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index 2bde3ef..ec1f604 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -473,6 +473,9 @@ int perf_event_query_prog_array(struct perf_event *event, void __user *info);
 int bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_prog *prog);
 int bpf_probe_unregister(struct bpf_raw_event_map *btp, struct bpf_prog *prog);
 struct bpf_raw_event_map *bpf_find_raw_tracepoint(const char *name);
+int bpf_get_perf_event_info(struct file *file, u32 *prog_id, u32 *prog_info,
+			    const char **buf, u64 *probe_offset,
+			    u64 *probe_addr);
 #else
 static inline unsigned int trace_call_bpf(struct trace_event_call *call, void *ctx)
 {
@@ -504,6 +507,12 @@ static inline struct bpf_raw_event_map *bpf_find_raw_tracepoint(const char *name
 {
 	return NULL;
 }
+static inline int bpf_get_perf_event_info(struct file *file, u32 *prog_id,
+					  u32 *prog_info, const char **buf,
+					  u64 *probe_offset, u64 *probe_addr)
+{
+	return -EOPNOTSUPP;
+}
 #endif
 
 enum {
@@ -560,10 +569,16 @@ extern void perf_trace_del(struct perf_event *event, int flags);
 #ifdef CONFIG_KPROBE_EVENTS
 extern int perf_kprobe_init(struct perf_event *event, bool is_retprobe);
 extern void perf_kprobe_destroy(struct perf_event *event);
+extern int bpf_get_kprobe_info(struct perf_event *event, u32 *prog_info,
+			       const char **symbol, u64 *probe_offset,
+			       u64 *probe_addr, bool perf_type_tracepoint);
 #endif
 #ifdef CONFIG_UPROBE_EVENTS
 extern int perf_uprobe_init(struct perf_event *event, bool is_retprobe);
 extern void perf_uprobe_destroy(struct perf_event *event);
+extern int bpf_get_uprobe_info(struct perf_event *event, u32 *prog_info,
+			       const char **filename, u64 *probe_offset,
+			       bool perf_type_tracepoint);
 #endif
 extern int ftrace_profile_set_filter(struct perf_event *event, int event_id,
 				     char *filter_str);
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index d94d333..b78eca1 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -97,6 +97,7 @@ enum bpf_cmd {
 	BPF_RAW_TRACEPOINT_OPEN,
 	BPF_BTF_LOAD,
 	BPF_BTF_GET_FD_BY_ID,
+	BPF_PERF_EVENT_QUERY,
 };
 
 enum bpf_map_type {
@@ -379,6 +380,22 @@ union bpf_attr {
 		__u32		btf_log_size;
 		__u32		btf_log_level;
 	};
+
+	struct {
+		int		pid;		/* input: pid */
+		int		fd;		/* input: fd */
+		__u32		flags;		/* input: flags */
+		__u32		buf_len;	/* input: buf len */
+		__aligned_u64	buf;		/* input/output:
+						 *   tp_name for tracepoint
+						 *   symbol for kprobe
+						 *   filename for uprobe
+						 */
+		__u32		prog_id;	/* output: prog_id */
+		__u32		prog_info;	/* output: BPF_PERF_INFO_* */
+		__u64		probe_offset;	/* output: probe_offset */
+		__u64		probe_addr;	/* output: probe_addr */
+	} perf_event_query;
 } __attribute__((aligned(8)));
 
 /* The description below is an attempt at providing documentation to eBPF
@@ -2450,4 +2467,12 @@ struct bpf_fib_lookup {
 	__u8	dmac[6];     /* ETH_ALEN */
 };
 
+enum {
+	BPF_PERF_INFO_TP_NAME,		/* tp name */
+	BPF_PERF_INFO_KPROBE,		/* (symbol + offset) or addr */
+	BPF_PERF_INFO_KRETPROBE,	/* (symbol + offset) or addr */
+	BPF_PERF_INFO_UPROBE,		/* filename + offset */
+	BPF_PERF_INFO_URETPROBE,	/* filename + offset */
+};
+
 #endif /* _UAPI__LINUX_BPF_H__ */
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index e2aeb5e..347e4d2 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -18,7 +18,9 @@
 #include
 #include
 #include
+#include
 #include
+#include
 #include
 #include
 #include
@@ -2093,6 +2095,114 @@ static int bpf_btf_get_fd_by_id(const union bpf_attr *attr)
 	return btf_get_fd_by_id(attr->btf_id);
 }
 
+static int bpf_perf_event_info_copy(const union bpf_attr *attr,
+				    union bpf_attr __user *uattr,
+				    u32 prog_id, u32 prog_info,
+				    const char *buf, u64 probe_offset,
+				    u64 probe_addr)
+{
+	__u64 __user *ubuf;
+	int len;
+
+	ubuf = u64_to_user_ptr(attr->perf_event_query.buf);
+	if (buf) {
+		len = strlen(buf);
+		if (attr->perf_event_query.buf_len < len + 1)
+			return -ENOSPC;
+		if (copy_to_user(ubuf, buf, len + 1))
+			return -EFAULT;
+	} else if (attr->perf_event_query.buf_len) {
+		/* copy '\0' to ubuf */
+		__u8 zero = 0;
+
+		if (copy_to_user(ubuf, &zero, 1))
+			return -EFAULT;
+	}
+
+	if (copy_to_user(&uattr->perf_event_query.prog_id, &prog_id,
+			 sizeof(prog_id)) ||
+	    copy_to_user(&uattr->perf_event_query.prog_info, &prog_info,
+			 sizeof(prog_info)) ||
+	    copy_to_user(&uattr->perf_event_query.probe_offset, &probe_offset,
+			 sizeof(probe_offset)) ||
+	    copy_to_user(&uattr->perf_event_query.probe_addr, &probe_addr,
+			 sizeof(probe_addr)))
+		return -EFAULT;
+
+	return 0;
+}
+
+#define BPF_PERF_EVENT_QUERY_LAST_FIELD perf_event_query.probe_addr
+
+static int bpf_perf_event_query(const union bpf_attr *attr,
+				union bpf_attr __user *uattr)
+{
+	pid_t pid = attr->perf_event_query.pid;
+	int fd = attr->perf_event_query.fd;
+	struct files_struct *files;
+	struct task_struct *task;
+	struct file *file;
+	int err;
+
+	if (CHECK_ATTR(BPF_PERF_EVENT_QUERY))
+		return -EINVAL;
+
+	if (!capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
+	task = get_pid_task(find_vpid(pid), PIDTYPE_PID);
+	if (!task)
+		return -ENOENT;
+
+	files = get_files_struct(task);
+	put_task_struct(task);
+	if (!files)
+		return -ENOENT;
+
+	err = 0;
+	spin_lock(&files->file_lock);
+	file = fcheck_files(files, fd);
+	if (!file)
+		err = -ENOENT;
+	else
+		get_file(file);
+	spin_unlock(&files->file_lock);
+	put_files_struct(files);
+
+	if (err)
+		goto out;
+
+	if (file->f_op == &bpf_raw_tp_fops) {
+		struct bpf_raw_tracepoint *raw_tp = file->private_data;
+		struct bpf_raw_event_map *btp = raw_tp->btp;
+
+		if (!raw_tp->prog)
+			err = -ENOENT;
+		else
+			err = bpf_perf_event_info_copy(attr, uattr,
+						       raw_tp->prog->aux->id,
+						       BPF_PERF_INFO_TP_NAME,
+						       btp->tp->name, 0, 0);
+	} else {
+		u64 probe_offset, probe_addr;
+		u32 prog_id, prog_info;
+		const char *buf;
+
+		err = bpf_get_perf_event_info(file, &prog_id, &prog_info,
+					      &buf, &probe_offset,
+					      &probe_addr);
+		if (!err)
+			err = bpf_perf_event_info_copy(attr, uattr, prog_id,
+						       prog_info, buf,
+						       probe_offset,
+						       probe_addr);
+	}
+
+	fput(file);
+out:
+	return err;
+}
+
 SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, size)
 {
 	union bpf_attr attr = {};
@@ -2179,6 +2289,9 @@ SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, siz
 	case BPF_BTF_GET_FD_BY_ID:
 		err = bpf_btf_get_fd_by_id(&attr);
 		break;
+	case BPF_PERF_EVENT_QUERY:
+		err = bpf_perf_event_query(&attr, uattr);
+		break;
 	default:
 		err = -EINVAL;
 		break;
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index ce2cbbf..7e8121e 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -14,6 +14,7 @@
 #include
 #include
 #include
+#include
 #include
 
 #include "trace_probe.h"
@@ -1163,3 +1164,55 @@ int bpf_probe_unregister(struct bpf_raw_event_map *btp, struct bpf_prog *prog)
 	mutex_unlock(&bpf_event_mutex);
 	return err;
 }
+
+int bpf_get_perf_event_info(struct file *file, u32 *prog_id, u32 *prog_info,
+			    const char **buf, u64 *probe_offset,
+			    u64 *probe_addr)
+{
+	bool is_tracepoint, is_syscall_tp;
+	struct perf_event *event;
+	struct bpf_prog *prog;
+	int flags, err = 0;
+
+	event = perf_get_event(file);
+	if (IS_ERR(event))
+		return PTR_ERR(event);
+
+	prog = event->prog;
+	if (!prog)
+		return -ENOENT;
+
+	/* not supporting BPF_PROG_TYPE_PERF_EVENT yet */
+	if (prog->type == BPF_PROG_TYPE_PERF_EVENT)
+		return -EOPNOTSUPP;
+
+	*prog_id = prog->aux->id;
+	flags = event->tp_event->flags;
+	is_tracepoint = flags & TRACE_EVENT_FL_TRACEPOINT;
+	is_syscall_tp = is_syscall_trace_event(event->tp_event);
+
+	if (is_tracepoint || is_syscall_tp) {
+		*buf = is_tracepoint ? event->tp_event->tp->name
+				     : event->tp_event->name;
+		*prog_info = BPF_PERF_INFO_TP_NAME;
+		*probe_offset = 0x0;
+		*probe_addr = 0x0;
+	} else {
+		/* kprobe/uprobe */
+		err = -EOPNOTSUPP;
+#ifdef CONFIG_KPROBE_EVENTS
+		if (flags & TRACE_EVENT_FL_KPROBE)
+			err = bpf_get_kprobe_info(event, prog_info, buf,
+						  probe_offset, probe_addr,
+						  event->attr.type == PERF_TYPE_TRACEPOINT);
+#endif
+#ifdef CONFIG_UPROBE_EVENTS
+		if (flags & TRACE_EVENT_FL_UPROBE)
+			err = bpf_get_uprobe_info(event, prog_info, buf,
+						  probe_offset,
+						  event->attr.type == PERF_TYPE_TRACEPOINT);
+#endif
+	}
+
+	return err;
+}
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 02aed76..595d154 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -1287,6 +1287,35 @@ kretprobe_perf_func(struct trace_kprobe *tk, struct kretprobe_instance *ri,
 			      head, NULL);
 }
 NOKPROBE_SYMBOL(kretprobe_perf_func);
+
+int bpf_get_kprobe_info(struct perf_event *event, u32 *prog_info,
+			const char **symbol, u64 *probe_offset,
+			u64 *probe_addr, bool perf_type_tracepoint)
+{
+	const char *pevent = trace_event_name(event->tp_event);
+	const char *group = event->tp_event->class->system;
+	struct trace_kprobe *tk;
+
+	if (perf_type_tracepoint)
+		tk = find_trace_kprobe(pevent, group);
+	else
+		tk = event->tp_event->data;
+	if (!tk)
+		return -EINVAL;
+
+	*prog_info = trace_kprobe_is_return(tk) ? BPF_PERF_INFO_KRETPROBE
+						: BPF_PERF_INFO_KPROBE;
+	if (tk->symbol) {
+		*symbol = tk->symbol;
+		*probe_offset = tk->rp.kp.offset;
+		*probe_addr = 0;
+	} else {
+		*symbol = NULL;
+		*probe_offset = 0;
+		*probe_addr = (u64)tk->rp.kp.addr;
+	}
+	return 0;
+}
 #endif	/* CONFIG_PERF_EVENTS */
 
 /*
diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index ac89287..e781a9f 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -1161,6 +1161,28 @@ static void uretprobe_perf_func(struct trace_uprobe *tu, unsigned long func,
 {
 	__uprobe_perf_func(tu, func, regs, ucb, dsize);
 }
+
+int bpf_get_uprobe_info(struct perf_event *event, u32 *prog_info,
+			const char **filename, u64 *probe_offset,
+			bool perf_type_tracepoint)
+{
+	const char *pevent = trace_event_name(event->tp_event);
+	const char *group = event->tp_event->class->system;
+	struct trace_uprobe *tu;
+
+	if (perf_type_tracepoint)
+		tu = find_probe_event(pevent, group);
+	else
+		tu = event->tp_event->data;
+	if (!tu)
+		return -EINVAL;
+
+	*prog_info = is_ret_probe(tu) ? BPF_PERF_INFO_URETPROBE
+				      : BPF_PERF_INFO_UPROBE;
+	*filename = tu->filename;
+	*probe_offset = tu->offset;
+	return 0;
+}
 #endif	/* CONFIG_PERF_EVENTS */
 
 static int
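
For illustration only, and not part of the patch: an introspection tool
might turn the output fields of perf_event_query into a one-line
description roughly as sketched below. The BPF_PERF_INFO_* values are
copied locally from the uapi hunk in this patch; struct query_result
and format_attach_point() are hypothetical names introduced here, and
the sketch assumes the fields have already been filled in by a
successful query.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Local copy of the BPF_PERF_INFO_* values proposed in this patch. */
enum {
	BPF_PERF_INFO_TP_NAME,
	BPF_PERF_INFO_KPROBE,
	BPF_PERF_INFO_KRETPROBE,
	BPF_PERF_INFO_UPROBE,
	BPF_PERF_INFO_URETPROBE,
};

/* Hypothetical holder for the output fields of perf_event_query. */
struct query_result {
	uint32_t prog_id;
	uint32_t prog_info;
	const char *buf;	/* tp name, kprobe symbol or uprobe filename */
	uint64_t probe_offset;
	uint64_t probe_addr;
};

/* Hypothetical helper: render the attachment point into out[]. */
static int format_attach_point(const struct query_result *r,
			       char *out, size_t len)
{
	const char *name = r->buf ? r->buf : "";
	const char *kind;

	switch (r->prog_info) {
	case BPF_PERF_INFO_TP_NAME:
		return snprintf(out, len, "tracepoint %s", name);
	case BPF_PERF_INFO_KPROBE:
	case BPF_PERF_INFO_KRETPROBE:
		kind = r->prog_info == BPF_PERF_INFO_KPROBE ?
		       "kprobe" : "kretprobe";
		/* Empty symbol: the probe was placed by kernel address. */
		if (!name[0])
			return snprintf(out, len, "%s 0x%" PRIx64,
					kind, r->probe_addr);
		return snprintf(out, len, "%s %s+0x%" PRIx64,
				kind, name, r->probe_offset);
	case BPF_PERF_INFO_UPROBE:
	case BPF_PERF_INFO_URETPROBE:
		kind = r->prog_info == BPF_PERF_INFO_UPROBE ?
		       "uprobe" : "uretprobe";
		return snprintf(out, len, "%s %s+0x%" PRIx64,
				kind, name, r->probe_offset);
	default:
		return -1;	/* unknown prog_info value */
	}
}
```

The empty-symbol check mirrors bpf_perf_event_info_copy(), which writes
a lone '\0' to the user buffer when the kernel has no string to return,
as in the kprobe-by-address case.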