From patchwork Wed Nov 8 18:17:14 2017
X-Patchwork-Submitter: Song Liu
X-Patchwork-Id: 835945
X-Patchwork-Delegate: davem@davemloft.net
From: Song Liu
Subject: [RFC] bcc: Try use new API to create [k, u]probe with perf_event_open
Date: Wed, 8 Nov 2017 10:17:14 -0800
Message-ID: <20171108181721.3354137-2-songliubraving@fb.com>
In-Reply-To: <20171108181721.3354137-1-songliubraving@fb.com>
References: <20171108181721.3354137-1-songliubraving@fb.com>
X-Mailing-List: netdev@vger.kernel.org

A new kernel API allows creating [k,u]probes with perf_event_open. This
patch tries the new API first and falls back to the old API if the new
one doesn't work.

bpf_detach_probe() now looks up the event being removed. If the event is
not found, the cleanup procedure is skipped.

Signed-off-by: Song Liu
---
 src/cc/libbpf.c | 224 +++++++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 155 insertions(+), 69 deletions(-)

diff --git a/src/cc/libbpf.c b/src/cc/libbpf.c
index 77413df..d7be0a9 100644
--- a/src/cc/libbpf.c
+++ b/src/cc/libbpf.c
@@ -520,38 +520,66 @@ int bpf_attach_socket(int sock, int prog) {
   return setsockopt(sock, SOL_SOCKET, SO_ATTACH_BPF, &prog, sizeof(prog));
 }
 
+/*
+ * new kernel API allows creating [k,u]probe with perf_event_open, which
+ * makes it easier to clean up the [k,u]probe. This function tries to
+ * create pfd with the new API.
+ */
+static int bpf_try_perf_event_open_with_probe(struct probe_desc *pd, int pid,
+    int cpu, int group_fd, int is_uprobe, int is_return)
+{
+  struct perf_event_attr attr = {};
+
+  attr.type = PERF_TYPE_PROBE;
+  attr.probe_desc = ptr_to_u64(pd);
+  attr.sample_type = PERF_SAMPLE_RAW | PERF_SAMPLE_CALLCHAIN;
+  attr.sample_period = 1;
+  attr.wakeup_events = 1;
+  attr.is_uprobe = is_uprobe;
+  attr.is_return = is_return;
+  return syscall(__NR_perf_event_open, &attr, pid, cpu, group_fd,
+                 PERF_FLAG_FD_CLOEXEC);
+}
+
 static int bpf_attach_tracing_event(int progfd, const char *event_path,
-    struct perf_reader *reader, int pid, int cpu, int group_fd) {
-  int efd, pfd;
+    struct perf_reader *reader, int pid, int cpu, int group_fd, int pfd) {
+  int efd;
   ssize_t bytes;
   char buf[256];
   struct perf_event_attr attr = {};
 
-  snprintf(buf, sizeof(buf), "%s/id", event_path);
-  efd = open(buf, O_RDONLY, 0);
-  if (efd < 0) {
-    fprintf(stderr, "open(%s): %s\n", buf, strerror(errno));
-    return -1;
-  }
+  /*
+   * Only look up id and call perf_event_open when
+   * bpf_try_perf_event_open_with_probe() didn't returns valid pfd.
+   */
+  if (pfd < 0) {
+    snprintf(buf, sizeof(buf), "%s/id", event_path);
+    efd = open(buf, O_RDONLY, 0);
+    if (efd < 0) {
+      fprintf(stderr, "open(%s): %s\n", buf, strerror(errno));
+      return -1;
+    }
 
-  bytes = read(efd, buf, sizeof(buf));
-  if (bytes <= 0 || bytes >= sizeof(buf)) {
-    fprintf(stderr, "read(%s): %s\n", buf, strerror(errno));
+    bytes = read(efd, buf, sizeof(buf));
+    if (bytes <= 0 || bytes >= sizeof(buf)) {
+      fprintf(stderr, "read(%s): %s\n", buf, strerror(errno));
+      close(efd);
+      return -1;
+    }
     close(efd);
-    return -1;
-  }
-  close(efd);
-  buf[bytes] = '\0';
-  attr.config = strtol(buf, NULL, 0);
-  attr.type = PERF_TYPE_TRACEPOINT;
-  attr.sample_type = PERF_SAMPLE_RAW | PERF_SAMPLE_CALLCHAIN;
-  attr.sample_period = 1;
-  attr.wakeup_events = 1;
-  pfd = syscall(__NR_perf_event_open, &attr, pid, cpu, group_fd, PERF_FLAG_FD_CLOEXEC);
-  if (pfd < 0) {
-    fprintf(stderr, "perf_event_open(%s/id): %s\n", event_path, strerror(errno));
-    return -1;
+    buf[bytes] = '\0';
+    attr.config = strtol(buf, NULL, 0);
+    attr.type = PERF_TYPE_TRACEPOINT;
+    attr.sample_type = PERF_SAMPLE_RAW | PERF_SAMPLE_CALLCHAIN;
+    attr.sample_period = 1;
+    attr.wakeup_events = 1;
+    pfd = syscall(__NR_perf_event_open, &attr, pid, cpu, group_fd, PERF_FLAG_FD_CLOEXEC);
+    if (pfd < 0) {
+      fprintf(stderr, "perf_event_open(%s/id): %s\n", event_path, strerror(errno));
+      return -1;
+    }
   }
+
   perf_reader_set_fd(reader, pfd);
 
   if (perf_reader_mmap(reader, attr.type, attr.sample_type) < 0)
@@ -579,31 +607,41 @@ void * bpf_attach_kprobe(int progfd, enum bpf_probe_attach_type attach_type, con
   char event_alias[128];
   struct perf_reader *reader = NULL;
   static char *event_type = "kprobe";
+  struct probe_desc pd;
+  int pfd;
 
   reader = perf_reader_new(cb, NULL, NULL, cb_cookie, probe_perf_reader_page_cnt);
   if (!reader)
     goto error;
 
-  snprintf(buf, sizeof(buf), "/sys/kernel/debug/tracing/%s_events", event_type);
-  kfd = open(buf, O_WRONLY | O_APPEND, 0);
-  if (kfd < 0) {
-    fprintf(stderr, "open(%s): %s\n", buf, strerror(errno));
-    goto error;
-  }
+  /* try use new API to create kprobe */
+  pd.func = ptr_to_u64((void *)fn_name);
+  pd.offset = 0;
+  pfd = bpf_try_perf_event_open_with_probe(&pd, pid, cpu, group_fd, 0,
+                                           attach_type != BPF_PROBE_ENTRY);
 
-  snprintf(event_alias, sizeof(event_alias), "%s_bcc_%d", ev_name, getpid());
-  snprintf(buf, sizeof(buf), "%c:%ss/%s %s", attach_type==BPF_PROBE_ENTRY ? 'p' : 'r',
-           event_type, event_alias, fn_name);
-  if (write(kfd, buf, strlen(buf)) < 0) {
-    if (errno == EINVAL)
-      fprintf(stderr, "check dmesg output for possible cause\n");
+  if (pfd < 0) {
+    snprintf(buf, sizeof(buf), "/sys/kernel/debug/tracing/%s_events", event_type);
+    kfd = open(buf, O_WRONLY | O_APPEND, 0);
+    if (kfd < 0) {
+      fprintf(stderr, "open(%s): %s\n", buf, strerror(errno));
+      goto error;
+    }
+
+    snprintf(event_alias, sizeof(event_alias), "%s_bcc_%d", ev_name, getpid());
+    snprintf(buf, sizeof(buf), "%c:%ss/%s %s", attach_type==BPF_PROBE_ENTRY ? 'p' : 'r',
+             event_type, event_alias, fn_name);
+    if (write(kfd, buf, strlen(buf)) < 0) {
+      if (errno == EINVAL)
+        fprintf(stderr, "check dmesg output for possible cause\n");
+      close(kfd);
+      goto error;
+    }
     close(kfd);
-    goto error;
+    snprintf(buf, sizeof(buf), "/sys/kernel/debug/tracing/events/%ss/%s", event_type, event_alias);
   }
-  close(kfd);
 
-  snprintf(buf, sizeof(buf), "/sys/kernel/debug/tracing/events/%ss/%s", event_type, event_alias);
-  if (bpf_attach_tracing_event(progfd, buf, reader, pid, cpu, group_fd) < 0)
+  if (bpf_attach_tracing_event(progfd, buf, reader, pid, cpu, group_fd, pfd) < 0)
     goto error;
 
   return reader;
@@ -685,42 +723,53 @@ void * bpf_attach_uprobe(int progfd, enum bpf_probe_attach_type attach_type, con
   struct perf_reader *reader = NULL;
   static char *event_type = "uprobe";
   int res, kfd = -1, ns_fd = -1;
+  struct probe_desc pd;
+  int pfd;
 
   reader = perf_reader_new(cb, NULL, NULL, cb_cookie, probe_perf_reader_page_cnt);
   if (!reader)
     goto error;
 
-  snprintf(buf, sizeof(buf), "/sys/kernel/debug/tracing/%s_events", event_type);
-  kfd = open(buf, O_WRONLY | O_APPEND, 0);
-  if (kfd < 0) {
-    fprintf(stderr, "open(%s): %s\n", buf, strerror(errno));
-    goto error;
-  }
+  /* try use new API to create uprobe */
+  pd.path = ptr_to_u64((void *)binary_path);
+  pd.offset = offset;
+  pfd = bpf_try_perf_event_open_with_probe(&pd, pid, cpu, group_fd, 1,
+                                           attach_type != BPF_PROBE_ENTRY);
 
-  res = snprintf(event_alias, sizeof(event_alias), "%s_bcc_%d", ev_name, getpid());
-  if (res < 0 || res >= sizeof(event_alias)) {
-    fprintf(stderr, "Event name (%s) is too long for buffer\n", ev_name);
-    goto error;
-  }
-  res = snprintf(buf, sizeof(buf), "%c:%ss/%s %s:0x%lx", attach_type==BPF_PROBE_ENTRY ? 'p' : 'r',
-                 event_type, event_alias, binary_path, offset);
-  if (res < 0 || res >= sizeof(buf)) {
-    fprintf(stderr, "Event alias (%s) too long for buffer\n", event_alias);
-    goto error;
-  }
+  if (pfd < 0) {
+    snprintf(buf, sizeof(buf), "/sys/kernel/debug/tracing/%s_events", event_type);
+    kfd = open(buf, O_WRONLY | O_APPEND, 0);
+    if (kfd < 0) {
+      fprintf(stderr, "open(%s): %s\n", buf, strerror(errno));
+      goto error;
+    }
 
-  ns_fd = enter_mount_ns(pid);
-  if (write(kfd, buf, strlen(buf)) < 0) {
-    if (errno == EINVAL)
-      fprintf(stderr, "check dmesg output for possible cause\n");
-    goto error;
+    res = snprintf(event_alias, sizeof(event_alias), "%s_bcc_%d", ev_name, getpid());
+    if (res < 0 || res >= sizeof(event_alias)) {
+      fprintf(stderr, "Event name (%s) is too long for buffer\n", ev_name);
+      goto error;
+    }
+    res = snprintf(buf, sizeof(buf), "%c:%ss/%s %s:0x%lx", attach_type==BPF_PROBE_ENTRY ? 'p' : 'r',
+                   event_type, event_alias, binary_path, offset);
+    if (res < 0 || res >= sizeof(buf)) {
+      fprintf(stderr, "Event alias (%s) too long for buffer\n", event_alias);
+      goto error;
+    }
+
+    ns_fd = enter_mount_ns(pid);
+    if (write(kfd, buf, strlen(buf)) < 0) {
+      if (errno == EINVAL)
+        fprintf(stderr, "check dmesg output for possible cause\n");
+      goto error;
+    }
+    close(kfd);
+    exit_mount_ns(ns_fd);
+    ns_fd = -1;
+
+    snprintf(buf, sizeof(buf), "/sys/kernel/debug/tracing/events/%ss/%s", event_type, event_alias);
   }
-  close(kfd);
-  exit_mount_ns(ns_fd);
-  ns_fd = -1;
 
-  snprintf(buf, sizeof(buf), "/sys/kernel/debug/tracing/events/%ss/%s", event_type, event_alias);
-  if (bpf_attach_tracing_event(progfd, buf, reader, pid, cpu, group_fd) < 0)
+  if (bpf_attach_tracing_event(progfd, buf, reader, pid, cpu, group_fd, pfd) < 0)
     goto error;
 
   return reader;
@@ -735,8 +784,43 @@ error:
 
 static int bpf_detach_probe(const char *ev_name, const char *event_type)
 {
-  int kfd, res;
+  int kfd = -1, res;
   char buf[PATH_MAX];
+  int found_event = 0;
+  size_t bufsize = 0;
+  char *cptr = NULL;
+  FILE *fp;
+
+  /*
+   * For [k,u]probe created with perf_event_open (on newer kernel), it is
+   * not necessary to clean it up in [k,u]probe_events. We first look up
+   * the %s_bcc_%d line in [k,u]probe_events. If the event is not found,
+   * it is safe to skip the cleaning up process (write -:... to the file).
+   */
+  snprintf(buf, sizeof(buf), "/sys/kernel/debug/tracing/%s_events", event_type);
+  fp = fopen(buf, "r");
+  if (!fp) {
+    fprintf(stderr, "open(%s): %s\n", buf, strerror(errno));
+    goto error;
+  }
+
+  res = snprintf(buf, sizeof(buf), "%ss/%s_bcc_%d", event_type, ev_name, getpid());
+  if (res < 0 || res >= sizeof(buf)) {
+    fprintf(stderr, "snprintf(%s): %d\n", ev_name, res);
+    goto error;
+  }
+
+  while (getline(&cptr, &bufsize, fp) != -1)
+    if (strstr(cptr, buf) != NULL) {
+      found_event = 1;
+      break;
+    }
+  fclose(fp);
+  fp = NULL;
+
+  if (!found_event)
+    return 0;
+
   snprintf(buf, sizeof(buf), "/sys/kernel/debug/tracing/%s_events", event_type);
   kfd = open(buf, O_WRONLY | O_APPEND, 0);
   if (kfd < 0) {
@@ -760,6 +844,8 @@ static int bpf_detach_probe(const char *ev_name, const char *event_type)
 error:
   if (kfd >= 0)
     close(kfd);
+  if (fp)
+    fclose(fp);
   return -1;
 }
 
@@ -786,7 +872,7 @@ void * bpf_attach_tracepoint(int progfd, const char *tp_category,
 
   snprintf(buf, sizeof(buf), "/sys/kernel/debug/tracing/events/%s/%s",
            tp_category, tp_name);
-  if (bpf_attach_tracing_event(progfd, buf, reader, pid, cpu, group_fd) < 0)
+  if (bpf_attach_tracing_event(progfd, buf, reader, pid, cpu, group_fd, -1) < 0)
     goto error;
 
   return reader;
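
The lookup that bpf_detach_probe() performs before writing to
[k,u]probe_events can be sketched in isolation. This is a minimal sketch
and not code from the patch: probe_event_registered() is a hypothetical
helper name, it takes an already opened FILE * instead of opening the
tracefs file itself, it takes the pid as a parameter instead of calling
getpid(), and it frees the buffer that getline() allocates.

```c
#define _POSIX_C_SOURCE 200809L  /* for getline() */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/*
 * Hypothetical helper (not part of the patch). Returns 1 if some line of
 * fp contains "<event_type>s/<ev_name>_bcc_<pid>", 0 if no line does, and
 * -1 if the search string does not fit in the local buffer.
 */
static int probe_event_registered(FILE *fp, const char *event_type,
                                  const char *ev_name, int pid)
{
  char needle[256];
  char *line = NULL;
  size_t cap = 0;
  int found = 0;
  int res;

  res = snprintf(needle, sizeof(needle), "%ss/%s_bcc_%d",
                 event_type, ev_name, pid);
  if (res < 0 || (size_t)res >= sizeof(needle))
    return -1;

  /* Same substring scan as in the patch, one line at a time. */
  while (getline(&line, &cap, fp) != -1) {
    if (strstr(line, needle) != NULL) {
      found = 1;
      break;
    }
  }

  free(line);  /* getline() allocated this buffer */
  return found;
}
```

A caller would skip the "-:..." write when this returns 0. Because the
check is a substring match, a search for pid 123 would also match a line
registered by pid 1234; a stricter comparison may be worth considering.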