From patchwork Wed May 2 23:20:29 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Song Liu X-Patchwork-Id: 907775 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=fb.com Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=fb.com header.i=@fb.com header.b="kLdYCe3a"; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 40bvQm38Kkz9s4n for ; Thu, 3 May 2018 09:20:48 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751837AbeEBXUq (ORCPT ); Wed, 2 May 2018 19:20:46 -0400 Received: from mx0b-00082601.pphosted.com ([67.231.153.30]:41826 "EHLO mx0b-00082601.pphosted.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751599AbeEBXUm (ORCPT ); Wed, 2 May 2018 19:20:42 -0400 Received: from pps.filterd (m0148460.ppops.net [127.0.0.1]) by mx0a-00082601.pphosted.com (8.16.0.22/8.16.0.22) with SMTP id w42NJ72B005169 for ; Wed, 2 May 2018 16:20:41 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fb.com; h=from : to : cc : subject : date : message-id : in-reply-to : references : mime-version : content-type; s=facebook; bh=2ZYYRkgwgFrWQk5hMfT5GHaghBTCvm/mj8yzuFtsDSE=; b=kLdYCe3aXNd8tVG7ZVsEzo3PJesVH9fCcn/oJ93yY5lNNkhOQ4+ACRUp2f/tof3QG0Ry yQ7U0+NNGjkIoL8kueyBBTAOOl+CcGqLQI/uCnE7g+tjkE5266hdgY9BaKyphGGbwUTe P2QaiXKM+tOkGvWqXGAYgVdhlY6jOsVsPi8= Received: from mail.thefacebook.com ([199.201.64.23]) by mx0a-00082601.pphosted.com with ESMTP id 2hqj7mrus5-1 (version=TLSv1 cipher=ECDHE-RSA-AES256-SHA bits=256 verify=NOT) for ; Wed, 02 May 2018 16:20:41 -0700 Received: from mx-out.facebook.com (192.168.52.123) by PRN-CHUB01.TheFacebook.com (192.168.16.11) with Microsoft SMTP Server id 14.3.361.1; Wed, 2 May 2018 16:20:39 -0700 Received: by devbig006.ftw2.facebook.com (Postfix, from userid 4523) id 5920262E1423; Wed, 2 May 2018 16:20:39 -0700 (PDT) Smtp-Origin-Hostprefix: devbig From: Song Liu Smtp-Origin-Hostname: devbig006.ftw2.facebook.com To: CC: Song Liu , , , Alexei Starovoitov , Daniel Borkmann , Peter Zijlstra Smtp-Origin-Cluster: ftw2c04 Subject: [PATCH v2 bpf-next 1/2] bpf: enable stackmap with build_id in nmi context Date: Wed, 2 May 2018 16:20:29 -0700 Message-ID: <20180502232030.3788284-2-songliubraving@fb.com> X-Mailer: git-send-email 2.9.5 In-Reply-To: <20180502232030.3788284-1-songliubraving@fb.com> References: <20180502232030.3788284-1-songliubraving@fb.com> X-FB-Internal: Safe MIME-Version: 1.0 X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:, , definitions=2018-05-02_09:, , signatures=0 X-Proofpoint-Spam-Reason: safe X-FB-Internal: Safe Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Currently, we cannot parse build_id in nmi context because of up_read(¤t->mm->mmap_sem), this makes stackmap with build_id less useful. This patch enables parsing build_id in nmi by putting the up_read() call in irq_work. To avoid memory allocation in nmi context, we use per cpu variable for the irq_work. As a result, only one irq_work per cpu is allowed. If the irq_work is in-use, we fallback to only report ips. Cc: Alexei Starovoitov Cc: Daniel Borkmann Cc: Peter Zijlstra Signed-off-by: Song Liu --- init/Kconfig | 1 + kernel/bpf/stackmap.c | 59 +++++++++++++++++++++++++++++++++++++++++++++------ 2 files changed, 54 insertions(+), 6 deletions(-) diff --git a/init/Kconfig b/init/Kconfig index f013afc..480a4f2 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -1391,6 +1391,7 @@ config BPF_SYSCALL bool "Enable bpf() system call" select ANON_INODES select BPF + select IRQ_WORK default n help Enable the bpf() system call that allows to manipulate eBPF diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c index 3ba102b..51d4aea 100644 --- a/kernel/bpf/stackmap.c +++ b/kernel/bpf/stackmap.c @@ -11,6 +11,7 @@ #include #include #include +#include #include "percpu_freelist.h" #define STACK_CREATE_FLAG_MASK \ @@ -32,6 +33,23 @@ struct bpf_stack_map { struct stack_map_bucket *buckets[]; }; +/* irq_work to run up_read() for build_id lookup in nmi context */ +struct stack_map_irq_work { + struct irq_work irq_work; + struct rw_semaphore *sem; +}; + +static void do_up_read(struct irq_work *entry) +{ + struct stack_map_irq_work *work = container_of(entry, + struct stack_map_irq_work, irq_work); + + up_read(work->sem); + work->sem = NULL; +} + +static DEFINE_PER_CPU(struct stack_map_irq_work, up_read_work); + static inline bool stack_map_use_build_id(struct bpf_map *map) { return (map->map_flags & BPF_F_STACK_BUILD_ID); @@ -267,17 +285,27 @@ static void stack_map_get_build_id_offset(struct bpf_stack_build_id *id_offs, { int i; struct vm_area_struct *vma; + bool in_nmi_ctx = in_nmi(); + bool irq_work_busy = false; + struct stack_map_irq_work *work; + + if (in_nmi_ctx) { + work = this_cpu_ptr(&up_read_work); + if (work->irq_work.flags & IRQ_WORK_BUSY) + /* cannot queue more up_read, fallback */ + irq_work_busy = true; + } /* - * We cannot do up_read() in nmi context, so build_id lookup is - * only supported for non-nmi events. If at some point, it is - * possible to run find_vma() without taking the semaphore, we - * would like to allow build_id lookup in nmi context. + * We cannot do up_read() in nmi context. To do build_id lookup + * in nmi context, we need to run up_read() in irq_work. We use + * a percpu variable to do the irq_work. If the irq_work is + * already used by another lookup, we fall back to report ips. * * Same fallback is used for kernel stack (!user) on a stackmap * with build_id. */ - if (!user || !current || !current->mm || in_nmi() || + if (!user || !current || !current->mm || irq_work_busy || down_read_trylock(¤t->mm->mmap_sem) == 0) { /* cannot access current->mm, fall back to ips */ for (i = 0; i < trace_nr; i++) { @@ -299,7 +327,13 @@ static void stack_map_get_build_id_offset(struct bpf_stack_build_id *id_offs, - vma->vm_start; id_offs[i].status = BPF_STACK_BUILD_ID_VALID; } - up_read(¤t->mm->mmap_sem); + + if (!in_nmi_ctx) + up_read(¤t->mm->mmap_sem); + else { + work->sem = ¤t->mm->mmap_sem; + irq_work_queue(&work->irq_work); + } } BPF_CALL_3(bpf_get_stackid, struct pt_regs *, regs, struct bpf_map *, map, @@ -575,3 +609,16 @@ const struct bpf_map_ops stack_map_ops = { .map_update_elem = stack_map_update_elem, .map_delete_elem = stack_map_delete_elem, }; + +static int __init stack_map_init(void) +{ + int cpu; + struct stack_map_irq_work *work; + + for_each_possible_cpu(cpu) { + work = per_cpu_ptr(&up_read_work, cpu); + init_irq_work(&work->irq_work, do_up_read); + } + return 0; +} +subsys_initcall(stack_map_init);