From patchwork Tue Sep 18 04:52:34 2018
X-Patchwork-Submitter: Mauricio Vasquez
X-Patchwork-Id: 970930
X-Patchwork-Delegate: bpf@iogearbox.net
Subject: [RFC PATCH bpf-next v3 1/7] bpf: rename stack trace map
From: Mauricio Vasquez B
To: Alexei Starovoitov, Daniel Borkmann, netdev@vger.kernel.org
Cc: Yonghong Song
Date: Tue, 18 Sep 2018 06:52:34 +0200
Message-ID: <153724635419.7866.6301500459081173494.stgit@kernel>
In-Reply-To: <153724634652.7866.6354309647800281793.stgit@kernel>
References: <153724634652.7866.6354309647800281793.stgit@kernel>

In the following patches, queue and stack maps (FIFO and LIFO data structures) will be implemented.
In order to avoid confusion and a possible name clash rename stackmap.c to stacktracemap.c and stack_map_ops to stack_trace_map_ops Signed-off-by: Mauricio Vasquez B --- include/linux/bpf_types.h | 2 kernel/bpf/Makefile | 2 kernel/bpf/stackmap.c | 624 -------------------------------------------- kernel/bpf/stacktracemap.c | 624 ++++++++++++++++++++++++++++++++++++++++++++ 4 files changed, 626 insertions(+), 626 deletions(-) delete mode 100644 kernel/bpf/stackmap.c create mode 100644 kernel/bpf/stacktracemap.c diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h index cd26c090e7c0..33f7f574b983 100644 --- a/include/linux/bpf_types.h +++ b/include/linux/bpf_types.h @@ -49,7 +49,7 @@ BPF_MAP_TYPE(BPF_MAP_TYPE_LRU_HASH, htab_lru_map_ops) BPF_MAP_TYPE(BPF_MAP_TYPE_LRU_PERCPU_HASH, htab_lru_percpu_map_ops) BPF_MAP_TYPE(BPF_MAP_TYPE_LPM_TRIE, trie_map_ops) #ifdef CONFIG_PERF_EVENTS -BPF_MAP_TYPE(BPF_MAP_TYPE_STACK_TRACE, stack_map_ops) +BPF_MAP_TYPE(BPF_MAP_TYPE_STACK_TRACE, stack_trace_map_ops) #endif BPF_MAP_TYPE(BPF_MAP_TYPE_ARRAY_OF_MAPS, array_of_maps_map_ops) BPF_MAP_TYPE(BPF_MAP_TYPE_HASH_OF_MAPS, htab_of_maps_map_ops) diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile index 0488b8258321..e656bce87c8f 100644 --- a/kernel/bpf/Makefile +++ b/kernel/bpf/Makefile @@ -20,7 +20,7 @@ endif endif endif ifeq ($(CONFIG_PERF_EVENTS),y) -obj-$(CONFIG_BPF_SYSCALL) += stackmap.o +obj-$(CONFIG_BPF_SYSCALL) += stacktracemap.o endif obj-$(CONFIG_CGROUP_BPF) += cgroup.o ifeq ($(CONFIG_INET),y) diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c deleted file mode 100644 index 8061a439ef18..000000000000 --- a/kernel/bpf/stackmap.c +++ /dev/null @@ -1,624 +0,0 @@ -/* Copyright (c) 2016 Facebook - * - * This program is free software; you can redistribute it and/or - * modify it under the terms of version 2 of the GNU General Public - * License as published by the Free Software Foundation. - */ -#include -#include -#include -#include -#include -#include -#include -#include -#include "percpu_freelist.h" - -#define STACK_CREATE_FLAG_MASK \ - (BPF_F_NUMA_NODE | BPF_F_RDONLY | BPF_F_WRONLY | \ - BPF_F_STACK_BUILD_ID) - -struct stack_map_bucket { - struct pcpu_freelist_node fnode; - u32 hash; - u32 nr; - u64 data[]; -}; - -struct bpf_stack_map { - struct bpf_map map; - void *elems; - struct pcpu_freelist freelist; - u32 n_buckets; - struct stack_map_bucket *buckets[]; -}; - -/* irq_work to run up_read() for build_id lookup in nmi context */ -struct stack_map_irq_work { - struct irq_work irq_work; - struct rw_semaphore *sem; -}; - -static void do_up_read(struct irq_work *entry) -{ - struct stack_map_irq_work *work; - - work = container_of(entry, struct stack_map_irq_work, irq_work); - up_read(work->sem); - work->sem = NULL; -} - -static DEFINE_PER_CPU(struct stack_map_irq_work, up_read_work); - -static inline bool stack_map_use_build_id(struct bpf_map *map) -{ - return (map->map_flags & BPF_F_STACK_BUILD_ID); -} - -static inline int stack_map_data_size(struct bpf_map *map) -{ - return stack_map_use_build_id(map) ? 
- sizeof(struct bpf_stack_build_id) : sizeof(u64); -} - -static int prealloc_elems_and_freelist(struct bpf_stack_map *smap) -{ - u32 elem_size = sizeof(struct stack_map_bucket) + smap->map.value_size; - int err; - - smap->elems = bpf_map_area_alloc(elem_size * smap->map.max_entries, - smap->map.numa_node); - if (!smap->elems) - return -ENOMEM; - - err = pcpu_freelist_init(&smap->freelist); - if (err) - goto free_elems; - - pcpu_freelist_populate(&smap->freelist, smap->elems, elem_size, - smap->map.max_entries); - return 0; - -free_elems: - bpf_map_area_free(smap->elems); - return err; -} - -/* Called from syscall */ -static struct bpf_map *stack_map_alloc(union bpf_attr *attr) -{ - u32 value_size = attr->value_size; - struct bpf_stack_map *smap; - u64 cost, n_buckets; - int err; - - if (!capable(CAP_SYS_ADMIN)) - return ERR_PTR(-EPERM); - - if (attr->map_flags & ~STACK_CREATE_FLAG_MASK) - return ERR_PTR(-EINVAL); - - /* check sanity of attributes */ - if (attr->max_entries == 0 || attr->key_size != 4 || - value_size < 8 || value_size % 8) - return ERR_PTR(-EINVAL); - - BUILD_BUG_ON(sizeof(struct bpf_stack_build_id) % sizeof(u64)); - if (attr->map_flags & BPF_F_STACK_BUILD_ID) { - if (value_size % sizeof(struct bpf_stack_build_id) || - value_size / sizeof(struct bpf_stack_build_id) - > sysctl_perf_event_max_stack) - return ERR_PTR(-EINVAL); - } else if (value_size / 8 > sysctl_perf_event_max_stack) - return ERR_PTR(-EINVAL); - - /* hash table size must be power of 2 */ - n_buckets = roundup_pow_of_two(attr->max_entries); - - cost = n_buckets * sizeof(struct stack_map_bucket *) + sizeof(*smap); - if (cost >= U32_MAX - PAGE_SIZE) - return ERR_PTR(-E2BIG); - - smap = bpf_map_area_alloc(cost, bpf_map_attr_numa_node(attr)); - if (!smap) - return ERR_PTR(-ENOMEM); - - err = -E2BIG; - cost += n_buckets * (value_size + sizeof(struct stack_map_bucket)); - if (cost >= U32_MAX - PAGE_SIZE) - goto free_smap; - - bpf_map_init_from_attr(&smap->map, attr); - smap->map.value_size = value_size; - smap->n_buckets = n_buckets; - smap->map.pages = round_up(cost, PAGE_SIZE) >> PAGE_SHIFT; - - err = bpf_map_precharge_memlock(smap->map.pages); - if (err) - goto free_smap; - - err = get_callchain_buffers(sysctl_perf_event_max_stack); - if (err) - goto free_smap; - - err = prealloc_elems_and_freelist(smap); - if (err) - goto put_buffers; - - return &smap->map; - -put_buffers: - put_callchain_buffers(); -free_smap: - bpf_map_area_free(smap); - return ERR_PTR(err); -} - -#define BPF_BUILD_ID 3 -/* - * Parse build id from the note segment. This logic can be shared between - * 32-bit and 64-bit system, because Elf32_Nhdr and Elf64_Nhdr are - * identical. 
- */ -static inline int stack_map_parse_build_id(void *page_addr, - unsigned char *build_id, - void *note_start, - Elf32_Word note_size) -{ - Elf32_Word note_offs = 0, new_offs; - - /* check for overflow */ - if (note_start < page_addr || note_start + note_size < note_start) - return -EINVAL; - - /* only supports note that fits in the first page */ - if (note_start + note_size > page_addr + PAGE_SIZE) - return -EINVAL; - - while (note_offs + sizeof(Elf32_Nhdr) < note_size) { - Elf32_Nhdr *nhdr = (Elf32_Nhdr *)(note_start + note_offs); - - if (nhdr->n_type == BPF_BUILD_ID && - nhdr->n_namesz == sizeof("GNU") && - nhdr->n_descsz == BPF_BUILD_ID_SIZE) { - memcpy(build_id, - note_start + note_offs + - ALIGN(sizeof("GNU"), 4) + sizeof(Elf32_Nhdr), - BPF_BUILD_ID_SIZE); - return 0; - } - new_offs = note_offs + sizeof(Elf32_Nhdr) + - ALIGN(nhdr->n_namesz, 4) + ALIGN(nhdr->n_descsz, 4); - if (new_offs <= note_offs) /* overflow */ - break; - note_offs = new_offs; - } - return -EINVAL; -} - -/* Parse build ID from 32-bit ELF */ -static int stack_map_get_build_id_32(void *page_addr, - unsigned char *build_id) -{ - Elf32_Ehdr *ehdr = (Elf32_Ehdr *)page_addr; - Elf32_Phdr *phdr; - int i; - - /* only supports phdr that fits in one page */ - if (ehdr->e_phnum > - (PAGE_SIZE - sizeof(Elf32_Ehdr)) / sizeof(Elf32_Phdr)) - return -EINVAL; - - phdr = (Elf32_Phdr *)(page_addr + sizeof(Elf32_Ehdr)); - - for (i = 0; i < ehdr->e_phnum; ++i) - if (phdr[i].p_type == PT_NOTE) - return stack_map_parse_build_id(page_addr, build_id, - page_addr + phdr[i].p_offset, - phdr[i].p_filesz); - return -EINVAL; -} - -/* Parse build ID from 64-bit ELF */ -static int stack_map_get_build_id_64(void *page_addr, - unsigned char *build_id) -{ - Elf64_Ehdr *ehdr = (Elf64_Ehdr *)page_addr; - Elf64_Phdr *phdr; - int i; - - /* only supports phdr that fits in one page */ - if (ehdr->e_phnum > - (PAGE_SIZE - sizeof(Elf64_Ehdr)) / sizeof(Elf64_Phdr)) - return -EINVAL; - - phdr = (Elf64_Phdr *)(page_addr + sizeof(Elf64_Ehdr)); - - for (i = 0; i < ehdr->e_phnum; ++i) - if (phdr[i].p_type == PT_NOTE) - return stack_map_parse_build_id(page_addr, build_id, - page_addr + phdr[i].p_offset, - phdr[i].p_filesz); - return -EINVAL; -} - -/* Parse build ID of ELF file mapped to vma */ -static int stack_map_get_build_id(struct vm_area_struct *vma, - unsigned char *build_id) -{ - Elf32_Ehdr *ehdr; - struct page *page; - void *page_addr; - int ret; - - /* only works for page backed storage */ - if (!vma->vm_file) - return -EINVAL; - - page = find_get_page(vma->vm_file->f_mapping, 0); - if (!page) - return -EFAULT; /* page not mapped */ - - ret = -EINVAL; - page_addr = page_address(page); - ehdr = (Elf32_Ehdr *)page_addr; - - /* compare magic x7f "ELF" */ - if (memcmp(ehdr->e_ident, ELFMAG, SELFMAG) != 0) - goto out; - - /* only support executable file and shared object file */ - if (ehdr->e_type != ET_EXEC && ehdr->e_type != ET_DYN) - goto out; - - if (ehdr->e_ident[EI_CLASS] == ELFCLASS32) - ret = stack_map_get_build_id_32(page_addr, build_id); - else if (ehdr->e_ident[EI_CLASS] == ELFCLASS64) - ret = stack_map_get_build_id_64(page_addr, build_id); -out: - put_page(page); - return ret; -} - -static void stack_map_get_build_id_offset(struct bpf_stack_build_id *id_offs, - u64 *ips, u32 trace_nr, bool user) -{ - int i; - struct vm_area_struct *vma; - bool irq_work_busy = false; - struct stack_map_irq_work *work = NULL; - - if (in_nmi()) { - work = this_cpu_ptr(&up_read_work); - if (work->irq_work.flags & IRQ_WORK_BUSY) - /* cannot queue more up_read, 
fallback */ - irq_work_busy = true; - } - - /* - * We cannot do up_read() in nmi context. To do build_id lookup - * in nmi context, we need to run up_read() in irq_work. We use - * a percpu variable to do the irq_work. If the irq_work is - * already used by another lookup, we fall back to report ips. - * - * Same fallback is used for kernel stack (!user) on a stackmap - * with build_id. - */ - if (!user || !current || !current->mm || irq_work_busy || - down_read_trylock(¤t->mm->mmap_sem) == 0) { - /* cannot access current->mm, fall back to ips */ - for (i = 0; i < trace_nr; i++) { - id_offs[i].status = BPF_STACK_BUILD_ID_IP; - id_offs[i].ip = ips[i]; - } - return; - } - - for (i = 0; i < trace_nr; i++) { - vma = find_vma(current->mm, ips[i]); - if (!vma || stack_map_get_build_id(vma, id_offs[i].build_id)) { - /* per entry fall back to ips */ - id_offs[i].status = BPF_STACK_BUILD_ID_IP; - id_offs[i].ip = ips[i]; - continue; - } - id_offs[i].offset = (vma->vm_pgoff << PAGE_SHIFT) + ips[i] - - vma->vm_start; - id_offs[i].status = BPF_STACK_BUILD_ID_VALID; - } - - if (!work) { - up_read(¤t->mm->mmap_sem); - } else { - work->sem = ¤t->mm->mmap_sem; - irq_work_queue(&work->irq_work); - } -} - -BPF_CALL_3(bpf_get_stackid, struct pt_regs *, regs, struct bpf_map *, map, - u64, flags) -{ - struct bpf_stack_map *smap = container_of(map, struct bpf_stack_map, map); - struct perf_callchain_entry *trace; - struct stack_map_bucket *bucket, *new_bucket, *old_bucket; - u32 max_depth = map->value_size / stack_map_data_size(map); - /* stack_map_alloc() checks that max_depth <= sysctl_perf_event_max_stack */ - u32 init_nr = sysctl_perf_event_max_stack - max_depth; - u32 skip = flags & BPF_F_SKIP_FIELD_MASK; - u32 hash, id, trace_nr, trace_len; - bool user = flags & BPF_F_USER_STACK; - bool kernel = !user; - u64 *ips; - bool hash_matches; - - if (unlikely(flags & ~(BPF_F_SKIP_FIELD_MASK | BPF_F_USER_STACK | - BPF_F_FAST_STACK_CMP | BPF_F_REUSE_STACKID))) - return -EINVAL; - - trace = get_perf_callchain(regs, init_nr, kernel, user, - sysctl_perf_event_max_stack, false, false); - - if (unlikely(!trace)) - /* couldn't fetch the stack trace */ - return -EFAULT; - - /* get_perf_callchain() guarantees that trace->nr >= init_nr - * and trace-nr <= sysctl_perf_event_max_stack, so trace_nr <= max_depth - */ - trace_nr = trace->nr - init_nr; - - if (trace_nr <= skip) - /* skipping more than usable stack trace */ - return -EFAULT; - - trace_nr -= skip; - trace_len = trace_nr * sizeof(u64); - ips = trace->ip + skip + init_nr; - hash = jhash2((u32 *)ips, trace_len / sizeof(u32), 0); - id = hash & (smap->n_buckets - 1); - bucket = READ_ONCE(smap->buckets[id]); - - hash_matches = bucket && bucket->hash == hash; - /* fast cmp */ - if (hash_matches && flags & BPF_F_FAST_STACK_CMP) - return id; - - if (stack_map_use_build_id(map)) { - /* for build_id+offset, pop a bucket before slow cmp */ - new_bucket = (struct stack_map_bucket *) - pcpu_freelist_pop(&smap->freelist); - if (unlikely(!new_bucket)) - return -ENOMEM; - new_bucket->nr = trace_nr; - stack_map_get_build_id_offset( - (struct bpf_stack_build_id *)new_bucket->data, - ips, trace_nr, user); - trace_len = trace_nr * sizeof(struct bpf_stack_build_id); - if (hash_matches && bucket->nr == trace_nr && - memcmp(bucket->data, new_bucket->data, trace_len) == 0) { - pcpu_freelist_push(&smap->freelist, &new_bucket->fnode); - return id; - } - if (bucket && !(flags & BPF_F_REUSE_STACKID)) { - pcpu_freelist_push(&smap->freelist, &new_bucket->fnode); - return -EEXIST; - } - } else { - 
if (hash_matches && bucket->nr == trace_nr && - memcmp(bucket->data, ips, trace_len) == 0) - return id; - if (bucket && !(flags & BPF_F_REUSE_STACKID)) - return -EEXIST; - - new_bucket = (struct stack_map_bucket *) - pcpu_freelist_pop(&smap->freelist); - if (unlikely(!new_bucket)) - return -ENOMEM; - memcpy(new_bucket->data, ips, trace_len); - } - - new_bucket->hash = hash; - new_bucket->nr = trace_nr; - - old_bucket = xchg(&smap->buckets[id], new_bucket); - if (old_bucket) - pcpu_freelist_push(&smap->freelist, &old_bucket->fnode); - return id; -} - -const struct bpf_func_proto bpf_get_stackid_proto = { - .func = bpf_get_stackid, - .gpl_only = true, - .ret_type = RET_INTEGER, - .arg1_type = ARG_PTR_TO_CTX, - .arg2_type = ARG_CONST_MAP_PTR, - .arg3_type = ARG_ANYTHING, -}; - -BPF_CALL_4(bpf_get_stack, struct pt_regs *, regs, void *, buf, u32, size, - u64, flags) -{ - u32 init_nr, trace_nr, copy_len, elem_size, num_elem; - bool user_build_id = flags & BPF_F_USER_BUILD_ID; - u32 skip = flags & BPF_F_SKIP_FIELD_MASK; - bool user = flags & BPF_F_USER_STACK; - struct perf_callchain_entry *trace; - bool kernel = !user; - int err = -EINVAL; - u64 *ips; - - if (unlikely(flags & ~(BPF_F_SKIP_FIELD_MASK | BPF_F_USER_STACK | - BPF_F_USER_BUILD_ID))) - goto clear; - if (kernel && user_build_id) - goto clear; - - elem_size = (user && user_build_id) ? sizeof(struct bpf_stack_build_id) - : sizeof(u64); - if (unlikely(size % elem_size)) - goto clear; - - num_elem = size / elem_size; - if (sysctl_perf_event_max_stack < num_elem) - init_nr = 0; - else - init_nr = sysctl_perf_event_max_stack - num_elem; - trace = get_perf_callchain(regs, init_nr, kernel, user, - sysctl_perf_event_max_stack, false, false); - if (unlikely(!trace)) - goto err_fault; - - trace_nr = trace->nr - init_nr; - if (trace_nr < skip) - goto err_fault; - - trace_nr -= skip; - trace_nr = (trace_nr <= num_elem) ? 
trace_nr : num_elem; - copy_len = trace_nr * elem_size; - ips = trace->ip + skip + init_nr; - if (user && user_build_id) - stack_map_get_build_id_offset(buf, ips, trace_nr, user); - else - memcpy(buf, ips, copy_len); - - if (size > copy_len) - memset(buf + copy_len, 0, size - copy_len); - return copy_len; - -err_fault: - err = -EFAULT; -clear: - memset(buf, 0, size); - return err; -} - -const struct bpf_func_proto bpf_get_stack_proto = { - .func = bpf_get_stack, - .gpl_only = true, - .ret_type = RET_INTEGER, - .arg1_type = ARG_PTR_TO_CTX, - .arg2_type = ARG_PTR_TO_UNINIT_MEM, - .arg3_type = ARG_CONST_SIZE_OR_ZERO, - .arg4_type = ARG_ANYTHING, -}; - -/* Called from eBPF program */ -static void *stack_map_lookup_elem(struct bpf_map *map, void *key) -{ - return NULL; -} - -/* Called from syscall */ -int bpf_stackmap_copy(struct bpf_map *map, void *key, void *value) -{ - struct bpf_stack_map *smap = container_of(map, struct bpf_stack_map, map); - struct stack_map_bucket *bucket, *old_bucket; - u32 id = *(u32 *)key, trace_len; - - if (unlikely(id >= smap->n_buckets)) - return -ENOENT; - - bucket = xchg(&smap->buckets[id], NULL); - if (!bucket) - return -ENOENT; - - trace_len = bucket->nr * stack_map_data_size(map); - memcpy(value, bucket->data, trace_len); - memset(value + trace_len, 0, map->value_size - trace_len); - - old_bucket = xchg(&smap->buckets[id], bucket); - if (old_bucket) - pcpu_freelist_push(&smap->freelist, &old_bucket->fnode); - return 0; -} - -static int stack_map_get_next_key(struct bpf_map *map, void *key, - void *next_key) -{ - struct bpf_stack_map *smap = container_of(map, - struct bpf_stack_map, map); - u32 id; - - WARN_ON_ONCE(!rcu_read_lock_held()); - - if (!key) { - id = 0; - } else { - id = *(u32 *)key; - if (id >= smap->n_buckets || !smap->buckets[id]) - id = 0; - else - id++; - } - - while (id < smap->n_buckets && !smap->buckets[id]) - id++; - - if (id >= smap->n_buckets) - return -ENOENT; - - *(u32 *)next_key = id; - return 0; -} - -static int stack_map_update_elem(struct bpf_map *map, void *key, void *value, - u64 map_flags) -{ - return -EINVAL; -} - -/* Called from syscall or from eBPF program */ -static int stack_map_delete_elem(struct bpf_map *map, void *key) -{ - struct bpf_stack_map *smap = container_of(map, struct bpf_stack_map, map); - struct stack_map_bucket *old_bucket; - u32 id = *(u32 *)key; - - if (unlikely(id >= smap->n_buckets)) - return -E2BIG; - - old_bucket = xchg(&smap->buckets[id], NULL); - if (old_bucket) { - pcpu_freelist_push(&smap->freelist, &old_bucket->fnode); - return 0; - } else { - return -ENOENT; - } -} - -/* Called when map->refcnt goes to zero, either from workqueue or from syscall */ -static void stack_map_free(struct bpf_map *map) -{ - struct bpf_stack_map *smap = container_of(map, struct bpf_stack_map, map); - - /* wait for bpf programs to complete before freeing stack map */ - synchronize_rcu(); - - bpf_map_area_free(smap->elems); - pcpu_freelist_destroy(&smap->freelist); - bpf_map_area_free(smap); - put_callchain_buffers(); -} - -const struct bpf_map_ops stack_map_ops = { - .map_alloc = stack_map_alloc, - .map_free = stack_map_free, - .map_get_next_key = stack_map_get_next_key, - .map_lookup_elem = stack_map_lookup_elem, - .map_update_elem = stack_map_update_elem, - .map_delete_elem = stack_map_delete_elem, - .map_check_btf = map_check_no_btf, -}; - -static int __init stack_map_init(void) -{ - int cpu; - struct stack_map_irq_work *work; - - for_each_possible_cpu(cpu) { - work = per_cpu_ptr(&up_read_work, cpu); - 
init_irq_work(&work->irq_work, do_up_read); - } - return 0; -} -subsys_initcall(stack_map_init); diff --git a/kernel/bpf/stacktracemap.c b/kernel/bpf/stacktracemap.c new file mode 100644 index 000000000000..bb41e293418d --- /dev/null +++ b/kernel/bpf/stacktracemap.c @@ -0,0 +1,624 @@ +/* Copyright (c) 2016 Facebook + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of version 2 of the GNU General Public + * License as published by the Free Software Foundation. + */ +#include +#include +#include +#include +#include +#include +#include +#include +#include "percpu_freelist.h" + +#define STACK_CREATE_FLAG_MASK \ + (BPF_F_NUMA_NODE | BPF_F_RDONLY | BPF_F_WRONLY | \ + BPF_F_STACK_BUILD_ID) + +struct stack_map_bucket { + struct pcpu_freelist_node fnode; + u32 hash; + u32 nr; + u64 data[]; +}; + +struct bpf_stack_map { + struct bpf_map map; + void *elems; + struct pcpu_freelist freelist; + u32 n_buckets; + struct stack_map_bucket *buckets[]; +}; + +/* irq_work to run up_read() for build_id lookup in nmi context */ +struct stack_map_irq_work { + struct irq_work irq_work; + struct rw_semaphore *sem; +}; + +static void do_up_read(struct irq_work *entry) +{ + struct stack_map_irq_work *work; + + work = container_of(entry, struct stack_map_irq_work, irq_work); + up_read(work->sem); + work->sem = NULL; +} + +static DEFINE_PER_CPU(struct stack_map_irq_work, up_read_work); + +static inline bool stack_map_use_build_id(struct bpf_map *map) +{ + return (map->map_flags & BPF_F_STACK_BUILD_ID); +} + +static inline int stack_map_data_size(struct bpf_map *map) +{ + return stack_map_use_build_id(map) ? + sizeof(struct bpf_stack_build_id) : sizeof(u64); +} + +static int prealloc_elems_and_freelist(struct bpf_stack_map *smap) +{ + u32 elem_size = sizeof(struct stack_map_bucket) + smap->map.value_size; + int err; + + smap->elems = bpf_map_area_alloc(elem_size * smap->map.max_entries, + smap->map.numa_node); + if (!smap->elems) + return -ENOMEM; + + err = pcpu_freelist_init(&smap->freelist); + if (err) + goto free_elems; + + pcpu_freelist_populate(&smap->freelist, smap->elems, elem_size, + smap->map.max_entries); + return 0; + +free_elems: + bpf_map_area_free(smap->elems); + return err; +} + +/* Called from syscall */ +static struct bpf_map *stack_map_alloc(union bpf_attr *attr) +{ + u32 value_size = attr->value_size; + struct bpf_stack_map *smap; + u64 cost, n_buckets; + int err; + + if (!capable(CAP_SYS_ADMIN)) + return ERR_PTR(-EPERM); + + if (attr->map_flags & ~STACK_CREATE_FLAG_MASK) + return ERR_PTR(-EINVAL); + + /* check sanity of attributes */ + if (attr->max_entries == 0 || attr->key_size != 4 || + value_size < 8 || value_size % 8) + return ERR_PTR(-EINVAL); + + BUILD_BUG_ON(sizeof(struct bpf_stack_build_id) % sizeof(u64)); + if (attr->map_flags & BPF_F_STACK_BUILD_ID) { + if (value_size % sizeof(struct bpf_stack_build_id) || + value_size / sizeof(struct bpf_stack_build_id) + > sysctl_perf_event_max_stack) + return ERR_PTR(-EINVAL); + } else if (value_size / 8 > sysctl_perf_event_max_stack) + return ERR_PTR(-EINVAL); + + /* hash table size must be power of 2 */ + n_buckets = roundup_pow_of_two(attr->max_entries); + + cost = n_buckets * sizeof(struct stack_map_bucket *) + sizeof(*smap); + if (cost >= U32_MAX - PAGE_SIZE) + return ERR_PTR(-E2BIG); + + smap = bpf_map_area_alloc(cost, bpf_map_attr_numa_node(attr)); + if (!smap) + return ERR_PTR(-ENOMEM); + + err = -E2BIG; + cost += n_buckets * (value_size + sizeof(struct stack_map_bucket)); + if (cost >= U32_MAX - 
PAGE_SIZE) + goto free_smap; + + bpf_map_init_from_attr(&smap->map, attr); + smap->map.value_size = value_size; + smap->n_buckets = n_buckets; + smap->map.pages = round_up(cost, PAGE_SIZE) >> PAGE_SHIFT; + + err = bpf_map_precharge_memlock(smap->map.pages); + if (err) + goto free_smap; + + err = get_callchain_buffers(sysctl_perf_event_max_stack); + if (err) + goto free_smap; + + err = prealloc_elems_and_freelist(smap); + if (err) + goto put_buffers; + + return &smap->map; + +put_buffers: + put_callchain_buffers(); +free_smap: + bpf_map_area_free(smap); + return ERR_PTR(err); +} + +#define BPF_BUILD_ID 3 +/* + * Parse build id from the note segment. This logic can be shared between + * 32-bit and 64-bit system, because Elf32_Nhdr and Elf64_Nhdr are + * identical. + */ +static inline int stack_map_parse_build_id(void *page_addr, + unsigned char *build_id, + void *note_start, + Elf32_Word note_size) +{ + Elf32_Word note_offs = 0, new_offs; + + /* check for overflow */ + if (note_start < page_addr || note_start + note_size < note_start) + return -EINVAL; + + /* only supports note that fits in the first page */ + if (note_start + note_size > page_addr + PAGE_SIZE) + return -EINVAL; + + while (note_offs + sizeof(Elf32_Nhdr) < note_size) { + Elf32_Nhdr *nhdr = (Elf32_Nhdr *)(note_start + note_offs); + + if (nhdr->n_type == BPF_BUILD_ID && + nhdr->n_namesz == sizeof("GNU") && + nhdr->n_descsz == BPF_BUILD_ID_SIZE) { + memcpy(build_id, + note_start + note_offs + + ALIGN(sizeof("GNU"), 4) + sizeof(Elf32_Nhdr), + BPF_BUILD_ID_SIZE); + return 0; + } + new_offs = note_offs + sizeof(Elf32_Nhdr) + + ALIGN(nhdr->n_namesz, 4) + ALIGN(nhdr->n_descsz, 4); + if (new_offs <= note_offs) /* overflow */ + break; + note_offs = new_offs; + } + return -EINVAL; +} + +/* Parse build ID from 32-bit ELF */ +static int stack_map_get_build_id_32(void *page_addr, + unsigned char *build_id) +{ + Elf32_Ehdr *ehdr = (Elf32_Ehdr *)page_addr; + Elf32_Phdr *phdr; + int i; + + /* only supports phdr that fits in one page */ + if (ehdr->e_phnum > + (PAGE_SIZE - sizeof(Elf32_Ehdr)) / sizeof(Elf32_Phdr)) + return -EINVAL; + + phdr = (Elf32_Phdr *)(page_addr + sizeof(Elf32_Ehdr)); + + for (i = 0; i < ehdr->e_phnum; ++i) + if (phdr[i].p_type == PT_NOTE) + return stack_map_parse_build_id(page_addr, build_id, + page_addr + phdr[i].p_offset, + phdr[i].p_filesz); + return -EINVAL; +} + +/* Parse build ID from 64-bit ELF */ +static int stack_map_get_build_id_64(void *page_addr, + unsigned char *build_id) +{ + Elf64_Ehdr *ehdr = (Elf64_Ehdr *)page_addr; + Elf64_Phdr *phdr; + int i; + + /* only supports phdr that fits in one page */ + if (ehdr->e_phnum > + (PAGE_SIZE - sizeof(Elf64_Ehdr)) / sizeof(Elf64_Phdr)) + return -EINVAL; + + phdr = (Elf64_Phdr *)(page_addr + sizeof(Elf64_Ehdr)); + + for (i = 0; i < ehdr->e_phnum; ++i) + if (phdr[i].p_type == PT_NOTE) + return stack_map_parse_build_id(page_addr, build_id, + page_addr + phdr[i].p_offset, + phdr[i].p_filesz); + return -EINVAL; +} + +/* Parse build ID of ELF file mapped to vma */ +static int stack_map_get_build_id(struct vm_area_struct *vma, + unsigned char *build_id) +{ + Elf32_Ehdr *ehdr; + struct page *page; + void *page_addr; + int ret; + + /* only works for page backed storage */ + if (!vma->vm_file) + return -EINVAL; + + page = find_get_page(vma->vm_file->f_mapping, 0); + if (!page) + return -EFAULT; /* page not mapped */ + + ret = -EINVAL; + page_addr = page_address(page); + ehdr = (Elf32_Ehdr *)page_addr; + + /* compare magic x7f "ELF" */ + if (memcmp(ehdr->e_ident, ELFMAG, 
SELFMAG) != 0) + goto out; + + /* only support executable file and shared object file */ + if (ehdr->e_type != ET_EXEC && ehdr->e_type != ET_DYN) + goto out; + + if (ehdr->e_ident[EI_CLASS] == ELFCLASS32) + ret = stack_map_get_build_id_32(page_addr, build_id); + else if (ehdr->e_ident[EI_CLASS] == ELFCLASS64) + ret = stack_map_get_build_id_64(page_addr, build_id); +out: + put_page(page); + return ret; +} + +static void stack_map_get_build_id_offset(struct bpf_stack_build_id *id_offs, + u64 *ips, u32 trace_nr, bool user) +{ + int i; + struct vm_area_struct *vma; + bool irq_work_busy = false; + struct stack_map_irq_work *work = NULL; + + if (in_nmi()) { + work = this_cpu_ptr(&up_read_work); + if (work->irq_work.flags & IRQ_WORK_BUSY) + /* cannot queue more up_read, fallback */ + irq_work_busy = true; + } + + /* + * We cannot do up_read() in nmi context. To do build_id lookup + * in nmi context, we need to run up_read() in irq_work. We use + * a percpu variable to do the irq_work. If the irq_work is + * already used by another lookup, we fall back to report ips. + * + * Same fallback is used for kernel stack (!user) on a stackmap + * with build_id. + */ + if (!user || !current || !current->mm || irq_work_busy || + down_read_trylock(¤t->mm->mmap_sem) == 0) { + /* cannot access current->mm, fall back to ips */ + for (i = 0; i < trace_nr; i++) { + id_offs[i].status = BPF_STACK_BUILD_ID_IP; + id_offs[i].ip = ips[i]; + } + return; + } + + for (i = 0; i < trace_nr; i++) { + vma = find_vma(current->mm, ips[i]); + if (!vma || stack_map_get_build_id(vma, id_offs[i].build_id)) { + /* per entry fall back to ips */ + id_offs[i].status = BPF_STACK_BUILD_ID_IP; + id_offs[i].ip = ips[i]; + continue; + } + id_offs[i].offset = (vma->vm_pgoff << PAGE_SHIFT) + ips[i] + - vma->vm_start; + id_offs[i].status = BPF_STACK_BUILD_ID_VALID; + } + + if (!work) { + up_read(¤t->mm->mmap_sem); + } else { + work->sem = ¤t->mm->mmap_sem; + irq_work_queue(&work->irq_work); + } +} + +BPF_CALL_3(bpf_get_stackid, struct pt_regs *, regs, struct bpf_map *, map, + u64, flags) +{ + struct bpf_stack_map *smap = container_of(map, struct bpf_stack_map, map); + struct perf_callchain_entry *trace; + struct stack_map_bucket *bucket, *new_bucket, *old_bucket; + u32 max_depth = map->value_size / stack_map_data_size(map); + /* stack_map_alloc() checks that max_depth <= sysctl_perf_event_max_stack */ + u32 init_nr = sysctl_perf_event_max_stack - max_depth; + u32 skip = flags & BPF_F_SKIP_FIELD_MASK; + u32 hash, id, trace_nr, trace_len; + bool user = flags & BPF_F_USER_STACK; + bool kernel = !user; + u64 *ips; + bool hash_matches; + + if (unlikely(flags & ~(BPF_F_SKIP_FIELD_MASK | BPF_F_USER_STACK | + BPF_F_FAST_STACK_CMP | BPF_F_REUSE_STACKID))) + return -EINVAL; + + trace = get_perf_callchain(regs, init_nr, kernel, user, + sysctl_perf_event_max_stack, false, false); + + if (unlikely(!trace)) + /* couldn't fetch the stack trace */ + return -EFAULT; + + /* get_perf_callchain() guarantees that trace->nr >= init_nr + * and trace-nr <= sysctl_perf_event_max_stack, so trace_nr <= max_depth + */ + trace_nr = trace->nr - init_nr; + + if (trace_nr <= skip) + /* skipping more than usable stack trace */ + return -EFAULT; + + trace_nr -= skip; + trace_len = trace_nr * sizeof(u64); + ips = trace->ip + skip + init_nr; + hash = jhash2((u32 *)ips, trace_len / sizeof(u32), 0); + id = hash & (smap->n_buckets - 1); + bucket = READ_ONCE(smap->buckets[id]); + + hash_matches = bucket && bucket->hash == hash; + /* fast cmp */ + if (hash_matches && flags & 
BPF_F_FAST_STACK_CMP) + return id; + + if (stack_map_use_build_id(map)) { + /* for build_id+offset, pop a bucket before slow cmp */ + new_bucket = (struct stack_map_bucket *) + pcpu_freelist_pop(&smap->freelist); + if (unlikely(!new_bucket)) + return -ENOMEM; + new_bucket->nr = trace_nr; + stack_map_get_build_id_offset( + (struct bpf_stack_build_id *)new_bucket->data, + ips, trace_nr, user); + trace_len = trace_nr * sizeof(struct bpf_stack_build_id); + if (hash_matches && bucket->nr == trace_nr && + memcmp(bucket->data, new_bucket->data, trace_len) == 0) { + pcpu_freelist_push(&smap->freelist, &new_bucket->fnode); + return id; + } + if (bucket && !(flags & BPF_F_REUSE_STACKID)) { + pcpu_freelist_push(&smap->freelist, &new_bucket->fnode); + return -EEXIST; + } + } else { + if (hash_matches && bucket->nr == trace_nr && + memcmp(bucket->data, ips, trace_len) == 0) + return id; + if (bucket && !(flags & BPF_F_REUSE_STACKID)) + return -EEXIST; + + new_bucket = (struct stack_map_bucket *) + pcpu_freelist_pop(&smap->freelist); + if (unlikely(!new_bucket)) + return -ENOMEM; + memcpy(new_bucket->data, ips, trace_len); + } + + new_bucket->hash = hash; + new_bucket->nr = trace_nr; + + old_bucket = xchg(&smap->buckets[id], new_bucket); + if (old_bucket) + pcpu_freelist_push(&smap->freelist, &old_bucket->fnode); + return id; +} + +const struct bpf_func_proto bpf_get_stackid_proto = { + .func = bpf_get_stackid, + .gpl_only = true, + .ret_type = RET_INTEGER, + .arg1_type = ARG_PTR_TO_CTX, + .arg2_type = ARG_CONST_MAP_PTR, + .arg3_type = ARG_ANYTHING, +}; + +BPF_CALL_4(bpf_get_stack, struct pt_regs *, regs, void *, buf, u32, size, + u64, flags) +{ + u32 init_nr, trace_nr, copy_len, elem_size, num_elem; + bool user_build_id = flags & BPF_F_USER_BUILD_ID; + u32 skip = flags & BPF_F_SKIP_FIELD_MASK; + bool user = flags & BPF_F_USER_STACK; + struct perf_callchain_entry *trace; + bool kernel = !user; + int err = -EINVAL; + u64 *ips; + + if (unlikely(flags & ~(BPF_F_SKIP_FIELD_MASK | BPF_F_USER_STACK | + BPF_F_USER_BUILD_ID))) + goto clear; + if (kernel && user_build_id) + goto clear; + + elem_size = (user && user_build_id) ? sizeof(struct bpf_stack_build_id) + : sizeof(u64); + if (unlikely(size % elem_size)) + goto clear; + + num_elem = size / elem_size; + if (sysctl_perf_event_max_stack < num_elem) + init_nr = 0; + else + init_nr = sysctl_perf_event_max_stack - num_elem; + trace = get_perf_callchain(regs, init_nr, kernel, user, + sysctl_perf_event_max_stack, false, false); + if (unlikely(!trace)) + goto err_fault; + + trace_nr = trace->nr - init_nr; + if (trace_nr < skip) + goto err_fault; + + trace_nr -= skip; + trace_nr = (trace_nr <= num_elem) ? 
trace_nr : num_elem; + copy_len = trace_nr * elem_size; + ips = trace->ip + skip + init_nr; + if (user && user_build_id) + stack_map_get_build_id_offset(buf, ips, trace_nr, user); + else + memcpy(buf, ips, copy_len); + + if (size > copy_len) + memset(buf + copy_len, 0, size - copy_len); + return copy_len; + +err_fault: + err = -EFAULT; +clear: + memset(buf, 0, size); + return err; +} + +const struct bpf_func_proto bpf_get_stack_proto = { + .func = bpf_get_stack, + .gpl_only = true, + .ret_type = RET_INTEGER, + .arg1_type = ARG_PTR_TO_CTX, + .arg2_type = ARG_PTR_TO_UNINIT_MEM, + .arg3_type = ARG_CONST_SIZE_OR_ZERO, + .arg4_type = ARG_ANYTHING, +}; + +/* Called from eBPF program */ +static void *stack_map_lookup_elem(struct bpf_map *map, void *key) +{ + return NULL; +} + +/* Called from syscall */ +int bpf_stackmap_copy(struct bpf_map *map, void *key, void *value) +{ + struct bpf_stack_map *smap = container_of(map, struct bpf_stack_map, map); + struct stack_map_bucket *bucket, *old_bucket; + u32 id = *(u32 *)key, trace_len; + + if (unlikely(id >= smap->n_buckets)) + return -ENOENT; + + bucket = xchg(&smap->buckets[id], NULL); + if (!bucket) + return -ENOENT; + + trace_len = bucket->nr * stack_map_data_size(map); + memcpy(value, bucket->data, trace_len); + memset(value + trace_len, 0, map->value_size - trace_len); + + old_bucket = xchg(&smap->buckets[id], bucket); + if (old_bucket) + pcpu_freelist_push(&smap->freelist, &old_bucket->fnode); + return 0; +} + +static int stack_map_get_next_key(struct bpf_map *map, void *key, + void *next_key) +{ + struct bpf_stack_map *smap = container_of(map, + struct bpf_stack_map, map); + u32 id; + + WARN_ON_ONCE(!rcu_read_lock_held()); + + if (!key) { + id = 0; + } else { + id = *(u32 *)key; + if (id >= smap->n_buckets || !smap->buckets[id]) + id = 0; + else + id++; + } + + while (id < smap->n_buckets && !smap->buckets[id]) + id++; + + if (id >= smap->n_buckets) + return -ENOENT; + + *(u32 *)next_key = id; + return 0; +} + +static int stack_map_update_elem(struct bpf_map *map, void *key, void *value, + u64 map_flags) +{ + return -EINVAL; +} + +/* Called from syscall or from eBPF program */ +static int stack_map_delete_elem(struct bpf_map *map, void *key) +{ + struct bpf_stack_map *smap = container_of(map, struct bpf_stack_map, map); + struct stack_map_bucket *old_bucket; + u32 id = *(u32 *)key; + + if (unlikely(id >= smap->n_buckets)) + return -E2BIG; + + old_bucket = xchg(&smap->buckets[id], NULL); + if (old_bucket) { + pcpu_freelist_push(&smap->freelist, &old_bucket->fnode); + return 0; + } else { + return -ENOENT; + } +} + +/* Called when map->refcnt goes to zero, either from workqueue or from syscall */ +static void stack_map_free(struct bpf_map *map) +{ + struct bpf_stack_map *smap = container_of(map, struct bpf_stack_map, map); + + /* wait for bpf programs to complete before freeing stack map */ + synchronize_rcu(); + + bpf_map_area_free(smap->elems); + pcpu_freelist_destroy(&smap->freelist); + bpf_map_area_free(smap); + put_callchain_buffers(); +} + +const struct bpf_map_ops stack_trace_map_ops = { + .map_alloc = stack_map_alloc, + .map_free = stack_map_free, + .map_get_next_key = stack_map_get_next_key, + .map_lookup_elem = stack_map_lookup_elem, + .map_update_elem = stack_map_update_elem, + .map_delete_elem = stack_map_delete_elem, + .map_check_btf = map_check_no_btf, +}; + +static int __init stack_map_init(void) +{ + int cpu; + struct stack_map_irq_work *work; + + for_each_possible_cpu(cpu) { + work = per_cpu_ptr(&up_read_work, cpu); + 
init_irq_work(&work->irq_work, do_up_read);
+	}
+	return 0;
+}
+subsys_initcall(stack_map_init);

From patchwork Tue Sep 18 04:52:40 2018
X-Patchwork-Submitter: Mauricio Vasquez
X-Patchwork-Id: 970929
X-Patchwork-Delegate: bpf@iogearbox.net
Subject: [RFC PATCH bpf-next v3 2/7] bpf/syscall: allow key to be null in map functions
From: Mauricio Vasquez B
To: Alexei Starovoitov, Daniel Borkmann, netdev@vger.kernel.org
Cc: Yonghong Song
Date: Tue, 18 Sep 2018 06:52:40 +0200
Message-ID: <153724635986.7866.4652214103601747174.stgit@kernel>
In-Reply-To: <153724634652.7866.6354309647800281793.stgit@kernel>
References: <153724634652.7866.6354309647800281793.stgit@kernel>

This commit adds the required logic to allow the key to be NULL when the map's key_size is 0. A new helper function, __bpf_copy_key, copies the key from userspace only when key_size != 0; otherwise it enforces that the key must be NULL.
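For illustration, here is a minimal userspace sketch of what this change enables (not part of the patch; the wrapper name is hypothetical, and it assumes a map created with key_size == 0, such as the queue and stack maps added later in this series):

#include <linux/bpf.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical wrapper: look up an element of a keyless map (key_size == 0).
 * With this patch, passing key == NULL is accepted for such maps, while a
 * non-NULL key is rejected with -EINVAL by __bpf_copy_key().
 */
static int bpf_lookup_keyless(int map_fd, void *value)
{
	union bpf_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.map_fd = map_fd;
	attr.key = 0;	/* NULL key: only valid when map->key_size == 0 */
	attr.value = (uint64_t)(uintptr_t)value;

	return syscall(__NR_bpf, BPF_MAP_LOOKUP_ELEM, &attr, sizeof(attr));
}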
Signed-off-by: Mauricio Vasquez B
---
 kernel/bpf/syscall.c | 19 +++++++++++++++----
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 3c9636f03bb2..f2d4e4f280dc 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -651,6 +651,17 @@ int __weak bpf_stackmap_copy(struct bpf_map *map, void *key, void *value)
 	return -ENOTSUPP;
 }
 
+static void *__bpf_copy_key(void __user *ukey, u64 key_size)
+{
+	if (key_size)
+		return memdup_user(ukey, key_size);
+
+	if (ukey)
+		return ERR_PTR(-EINVAL);
+
+	return NULL;
+}
+
 /* last field in 'union bpf_attr' used by this command */
 #define BPF_MAP_LOOKUP_ELEM_LAST_FIELD value
 
@@ -678,7 +689,7 @@ static int map_lookup_elem(union bpf_attr *attr)
 		goto err_put;
 	}
 
-	key = memdup_user(ukey, map->key_size);
+	key = __bpf_copy_key(ukey, map->key_size);
 	if (IS_ERR(key)) {
 		err = PTR_ERR(key);
 		goto err_put;
@@ -766,7 +777,7 @@ static int map_update_elem(union bpf_attr *attr)
 		goto err_put;
 	}
 
-	key = memdup_user(ukey, map->key_size);
+	key = __bpf_copy_key(ukey, map->key_size);
 	if (IS_ERR(key)) {
 		err = PTR_ERR(key);
 		goto err_put;
@@ -864,7 +875,7 @@ static int map_delete_elem(union bpf_attr *attr)
 		goto err_put;
 	}
 
-	key = memdup_user(ukey, map->key_size);
+	key = __bpf_copy_key(ukey, map->key_size);
 	if (IS_ERR(key)) {
 		err = PTR_ERR(key);
 		goto err_put;
@@ -916,7 +927,7 @@ static int map_get_next_key(union bpf_attr *attr)
 	}
 
 	if (ukey) {
-		key = memdup_user(ukey, map->key_size);
+		key = __bpf_copy_key(ukey, map->key_size);
 		if (IS_ERR(key)) {
 			err = PTR_ERR(key);
 			goto err_put;

From patchwork Tue Sep 18 04:52:45 2018
X-Patchwork-Submitter: Mauricio Vasquez
X-Patchwork-Id: 970931
X-Patchwork-Delegate: bpf@iogearbox.net
Subject: [RFC PATCH bpf-next v3 3/7] bpf: add lookup_and_delete map operation
From: Mauricio Vasquez B
To: Alexei Starovoitov, Daniel Borkmann, netdev@vger.kernel.org
Cc: Yonghong Song
Date: Tue, 18 Sep 2018 06:52:45 +0200
Message-ID: <153724636549.7866.10996029558223703966.stgit@kernel>
In-Reply-To: <153724634652.7866.6354309647800281793.stgit@kernel>
References: <153724634652.7866.6354309647800281793.stgit@kernel>
The following patch implements bpf queue/stack maps that provide the peek/pop/push functions. There is no direct relationship between those functions and the operations currently supported by a map, hence a new lookup_and_delete map operation is added; this operation will be used by the pop helper. A pop operation is not added because it would be too specific to stack/queue maps; instead, this new operation could be useful for other maps as well.
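As a usage sketch (not part of the patch), a userspace pop through the new command could look like the following; the wrapper name is illustrative, and BPF_MAP_LOOKUP_AND_DELETE_ELEM is only defined once the uapi header change below is applied:

#include <linux/bpf.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical wrapper around the new BPF_MAP_LOOKUP_AND_DELETE_ELEM command.
 * For the queue/stack maps later in this series, key is NULL (key_size == 0);
 * value must point to map->value_size bytes. Note that this implementation
 * also copies the value buffer in with copy_from_user() before the lookup,
 * so the buffer must be initialized.
 */
static int bpf_map_lookup_and_delete(int map_fd, const void *key, void *value)
{
	union bpf_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.map_fd = map_fd;
	attr.key = (uint64_t)(uintptr_t)key;
	attr.value = (uint64_t)(uintptr_t)value;

	return syscall(__NR_bpf, BPF_MAP_LOOKUP_AND_DELETE_ELEM,
		       &attr, sizeof(attr));
}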
Signed-off-by: Mauricio Vasquez B
---
 include/linux/bpf.h      |  1 +
 include/uapi/linux/bpf.h |  1 +
 kernel/bpf/syscall.c     | 82 ++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 84 insertions(+)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 523481a3471b..c63a44381d3f 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -39,6 +39,7 @@ struct bpf_map_ops {
 	void *(*map_lookup_elem)(struct bpf_map *map, void *key);
 	int (*map_update_elem)(struct bpf_map *map, void *key, void *value, u64 flags);
 	int (*map_delete_elem)(struct bpf_map *map, void *key);
+	void *(*map_lookup_and_delete_elem)(struct bpf_map *map, void *key);
 
 	/* funcs called by prog_array and perf_event_array map */
 	void *(*map_fd_get_ptr)(struct bpf_map *map, struct file *map_file,
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 66917a4eba27..4cda584c6640 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -103,6 +103,7 @@ enum bpf_cmd {
 	BPF_BTF_LOAD,
 	BPF_BTF_GET_FD_BY_ID,
 	BPF_TASK_FD_QUERY,
+	BPF_MAP_LOOKUP_AND_DELETE_ELEM,
 };
 
 enum bpf_map_type {
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index f2d4e4f280dc..7d429123a298 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -968,6 +968,85 @@ static int map_get_next_key(union bpf_attr *attr)
 	return err;
 }
 
+#define BPF_MAP_LOOKUP_AND_DELETE_ELEM_LAST_FIELD value
+
+static int map_lookup_and_delete_elem(union bpf_attr *attr)
+{
+	void __user *ukey = u64_to_user_ptr(attr->key);
+	void __user *uvalue = u64_to_user_ptr(attr->value);
+	int ufd = attr->map_fd;
+	struct bpf_map *map;
+	void *key, *value, *ptr;
+	u32 value_size;
+	struct fd f;
+	int err;
+
+	if (CHECK_ATTR(BPF_MAP_LOOKUP_ELEM))
+		return -EINVAL;
+
+	f = fdget(ufd);
+	map = __bpf_map_get(f);
+	if (IS_ERR(map))
+		return PTR_ERR(map);
+
+	if (!(f.file->f_mode & FMODE_CAN_WRITE)) {
+		err = -EPERM;
+		goto err_put;
+	}
+
+	if (!map->ops->map_lookup_and_delete_elem) {
+		err = -ENOTSUPP;
+		goto err_put;
+	}
+
+	key = __bpf_copy_key(ukey, map->key_size);
+	if (IS_ERR(key)) {
+		err = PTR_ERR(key);
+		goto err_put;
+	}
+
+	value_size = map->value_size;
+
+	err = -ENOMEM;
+	value = kmalloc(value_size, GFP_USER | __GFP_NOWARN);
+	if (!value)
+		goto free_key;
+
+	err = -EFAULT;
+	if (copy_from_user(value, uvalue, value_size) != 0)
+		goto free_value;
+
+	/* must increment bpf_prog_active to avoid kprobe+bpf triggering from
+	 * inside bpf map update or delete otherwise deadlocks are possible
+	 */
+	preempt_disable();
+	__this_cpu_inc(bpf_prog_active);
+	rcu_read_lock();
+	ptr = map->ops->map_lookup_and_delete_elem(map, key);
+	if (ptr)
+		memcpy(value, ptr, value_size);
+	rcu_read_unlock();
+	err = ptr ? 0 : -ENOENT;
+	__this_cpu_dec(bpf_prog_active);
+	preempt_enable();
+
+	if (err)
+		goto free_value;
+
+	if (copy_to_user(uvalue, value, value_size) != 0)
+		goto free_value;
+
+	err = 0;
+
+free_value:
+	kfree(value);
+free_key:
+	kfree(key);
+err_put:
+	fdput(f);
+	return err;
+}
+
 static const struct bpf_prog_ops * const bpf_prog_types[] = {
 #define BPF_PROG_TYPE(_id, _name) \
 	[_id] = & _name ## _prog_ops,
@@ -2428,6 +2507,9 @@ SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, siz
 	case BPF_TASK_FD_QUERY:
 		err = bpf_task_fd_query(&attr, uattr);
 		break;
+	case BPF_MAP_LOOKUP_AND_DELETE_ELEM:
+		err = map_lookup_and_delete_elem(&attr);
+		break;
 	default:
 		err = -EINVAL;
 		break;

From patchwork Tue Sep 18 04:52:51 2018
X-Patchwork-Submitter: Mauricio Vasquez
X-Patchwork-Id: 970932
X-Patchwork-Delegate: bpf@iogearbox.net
Subject: [RFC PATCH bpf-next v3 4/7] bpf: add bpf queue and stack maps
From: Mauricio Vasquez B
To: Alexei Starovoitov, Daniel Borkmann, netdev@vger.kernel.org
Cc: Yonghong Song
Date: Tue, 18 Sep 2018 06:52:51 +0200
Message-ID: <153724637114.7866.8418882266337744689.stgit@kernel>
In-Reply-To: <153724634652.7866.6354309647800281793.stgit@kernel>
References: <153724634652.7866.6354309647800281793.stgit@kernel>

Implement two new kinds of maps that support the peek, push and pop operations. A use case for this is to keep track of a pool of elements, such as network ports in a SNAT.
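As a sketch of that use case (not part of the patch), a BPF program could pop a free port from a BPF_MAP_TYPE_QUEUE and push it back to release it. The map name, section names and helper declarations are assumptions here; with this version of the series the helpers take an explicit size argument:

#include <linux/bpf.h>
#include "bpf_helpers.h"	/* assumed helper header declaring the new helpers */

struct bpf_map_def SEC("maps") free_ports = {
	.type = BPF_MAP_TYPE_QUEUE,
	.key_size = 0,			/* queue/stack maps are keyless */
	.value_size = sizeof(__u16),
	.max_entries = 1024,
};

SEC("xdp")
int snat_alloc_port(struct xdp_md *ctx)
{
	__u16 port;

	/* pop the next free port; fails with -ENOENT when the pool is empty */
	if (bpf_map_pop_elem(&free_ports, &port, sizeof(port)))
		return XDP_DROP;

	/* ... rewrite the packet with "port" here ... */

	/* release: push the port back; with flags == BPF_EXIST a full
	 * queue would evict its oldest element instead of failing
	 */
	bpf_map_push_elem(&free_ports, &port, sizeof(port), 0);

	return XDP_PASS;
}

char _license[] SEC("license") = "GPL";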
Signed-off-by: Mauricio Vasquez B --- include/linux/bpf.h | 3 include/linux/bpf_types.h | 2 include/uapi/linux/bpf.h | 30 ++++ kernel/bpf/Makefile | 2 kernel/bpf/core.c | 3 kernel/bpf/helpers.c | 98 ++++++++++++++ kernel/bpf/queue_stack_maps.c | 291 +++++++++++++++++++++++++++++++++++++++++ kernel/bpf/verifier.c | 5 + net/core/filter.c | 6 + 9 files changed, 437 insertions(+), 3 deletions(-) create mode 100644 kernel/bpf/queue_stack_maps.c diff --git a/include/linux/bpf.h b/include/linux/bpf.h index c63a44381d3f..8e924b5c5a0e 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -807,6 +807,9 @@ static inline int bpf_fd_reuseport_array_update_elem(struct bpf_map *map, extern const struct bpf_func_proto bpf_map_lookup_elem_proto; extern const struct bpf_func_proto bpf_map_update_elem_proto; extern const struct bpf_func_proto bpf_map_delete_elem_proto; +extern const struct bpf_func_proto bpf_map_push_elem_proto; +extern const struct bpf_func_proto bpf_map_pop_elem_proto; +extern const struct bpf_func_proto bpf_map_peek_elem_proto; extern const struct bpf_func_proto bpf_get_prandom_u32_proto; extern const struct bpf_func_proto bpf_get_smp_processor_id_proto; diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h index 33f7f574b983..903a446f14c3 100644 --- a/include/linux/bpf_types.h +++ b/include/linux/bpf_types.h @@ -67,3 +67,5 @@ BPF_MAP_TYPE(BPF_MAP_TYPE_XSKMAP, xsk_map_ops) BPF_MAP_TYPE(BPF_MAP_TYPE_REUSEPORT_SOCKARRAY, reuseport_array_ops) #endif #endif +BPF_MAP_TYPE(BPF_MAP_TYPE_QUEUE, queue_map_ops) +BPF_MAP_TYPE(BPF_MAP_TYPE_STACK, stack_map_ops) diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 4cda584c6640..c899386dcb2b 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -128,6 +128,8 @@ enum bpf_map_type { BPF_MAP_TYPE_SOCKHASH, BPF_MAP_TYPE_CGROUP_STORAGE, BPF_MAP_TYPE_REUSEPORT_SOCKARRAY, + BPF_MAP_TYPE_QUEUE, + BPF_MAP_TYPE_STACK, }; enum bpf_prog_type { @@ -460,6 +462,29 @@ union bpf_attr { * Return * 0 on success, or a negative error in case of failure. * + * int bpf_map_push_elem(struct bpf_map *map, const void *value, u32 len, + * u64 flags) + * Description + * Push an element *value* in *map*. *flags* is one of: + * + * **BPF_EXIST** + * If the queue/stack is full, the oldest element is removed to + * make room for this. + * Return + * 0 on success, or a negative error in case of failure. + * + * int bpf_map_pop_elem(struct bpf_map *map, void *value, u32 len) + * Description + * Pop an element from *map*. + * Return + * 0 on success, or a negative error in case of failure. + * + * int bpf_map_peek_elem(struct bpf_map *map, void *value, u32 len) + * Description + * Get an element from *map* without removing it. + * Return + * 0 on success, or a negative error in case of failure. 
+ * * int bpf_probe_read(void *dst, u32 size, const void *src) * Description * For tracing programs, safely attempt to read *size* bytes from @@ -2227,7 +2252,10 @@ union bpf_attr { FN(get_current_cgroup_id), \ FN(get_local_storage), \ FN(sk_select_reuseport), \ - FN(skb_ancestor_cgroup_id), + FN(skb_ancestor_cgroup_id), \ + FN(map_push_elem), \ + FN(map_pop_elem), \ + FN(map_peek_elem), /* integer value in 'imm' field of BPF_CALL instruction selects which helper * function eBPF program intends to call diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile index e656bce87c8f..2d77bc5b2aca 100644 --- a/kernel/bpf/Makefile +++ b/kernel/bpf/Makefile @@ -3,7 +3,7 @@ obj-y := core.o obj-$(CONFIG_BPF_SYSCALL) += syscall.o verifier.o inode.o helpers.o tnum.o obj-$(CONFIG_BPF_SYSCALL) += hashtab.o arraymap.o percpu_freelist.o bpf_lru_list.o lpm_trie.o map_in_map.o -obj-$(CONFIG_BPF_SYSCALL) += local_storage.o +obj-$(CONFIG_BPF_SYSCALL) += local_storage.o queue_stack_maps.o obj-$(CONFIG_BPF_SYSCALL) += disasm.o obj-$(CONFIG_BPF_SYSCALL) += btf.o ifeq ($(CONFIG_NET),y) diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c index 3f5bf1af0826..8d2db076d123 100644 --- a/kernel/bpf/core.c +++ b/kernel/bpf/core.c @@ -1783,6 +1783,9 @@ BPF_CALL_0(bpf_user_rnd_u32) const struct bpf_func_proto bpf_map_lookup_elem_proto __weak; const struct bpf_func_proto bpf_map_update_elem_proto __weak; const struct bpf_func_proto bpf_map_delete_elem_proto __weak; +const struct bpf_func_proto bpf_map_push_elem_proto __weak; +const struct bpf_func_proto bpf_map_pop_elem_proto __weak; +const struct bpf_func_proto bpf_map_peek_elem_proto __weak; const struct bpf_func_proto bpf_get_prandom_u32_proto __weak; const struct bpf_func_proto bpf_get_smp_processor_id_proto __weak; diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c index 1991466b8327..5f364e6acaf1 100644 --- a/kernel/bpf/helpers.c +++ b/kernel/bpf/helpers.c @@ -76,6 +76,104 @@ const struct bpf_func_proto bpf_map_delete_elem_proto = { .arg2_type = ARG_PTR_TO_MAP_KEY, }; +BPF_CALL_4(bpf_map_push_elem, struct bpf_map *, map, void *, value, u32 size, + u64, flags) +{ + if (map->value_size != size) + return -EINVAL; + + return map->ops->map_update_elem(map, NULL, value, flags); +} + +const struct bpf_func_proto bpf_map_push_elem_proto = { + .func = bpf_map_push_elem, + .gpl_only = false, + .pkt_access = true, + .ret_type = RET_INTEGER, + .arg1_type = ARG_CONST_MAP_PTR, + .arg2_type = ARG_PTR_TO_MEM, + .arg3_type = ARG_CONST_SIZE, + .arg4_type = ARG_ANYTHING, +}; + +BPF_CALL_3(bpf_map_pop_elem, struct bpf_map *, map, void *, value, u32, size) +{ + void *ptr; + + if (map->value_size != size) + return -EINVAL; + + ptr = map->ops->map_lookup_and_delete_elem(map, NULL); + if (!ptr) + return -ENOENT; + + switch (size) { + case 1: + *(u8 *) value = *(u8 *) ptr; + break; + case 2: + *(u16 *) value = *(u16 *) ptr; + break; + case 4: + *(u32 *) value = *(u32 *) ptr; + break; + case 8: + *(u64 *) value = *(u64 *) ptr; + break; + } + + return 0; +} + +const struct bpf_func_proto bpf_map_pop_elem_proto = { + .func = bpf_map_pop_elem, + .gpl_only = false, + .pkt_access = true, + .ret_type = RET_INTEGER, + .arg1_type = ARG_CONST_MAP_PTR, + .arg2_type = ARG_PTR_TO_UNINIT_MEM, + .arg3_type = ARG_CONST_SIZE, +}; + +BPF_CALL_3(bpf_map_peek_elem, struct bpf_map *, map, void *, value, u32, size) +{ + void *ptr; + + if (map->value_size != size) + return -EINVAL; + + ptr = map->ops->map_lookup_elem(map, NULL); + if (!ptr) + return -ENOENT; + + switch (size) { + case 1: + *(u8 *) value = *(u8 
*) ptr; + break; + case 2: + *(u16 *) value = *(u16 *) ptr; + break; + case 4: + *(u32 *) value = *(u32 *) ptr; + break; + case 8: + *(u64 *) value = *(u64 *) ptr; + break; + } + + return 0; +} + +const struct bpf_func_proto bpf_map_peek_elem_proto = { + .func = bpf_map_pop_elem, + .gpl_only = false, + .pkt_access = true, + .ret_type = RET_INTEGER, + .arg1_type = ARG_CONST_MAP_PTR, + .arg2_type = ARG_PTR_TO_UNINIT_MEM, + .arg3_type = ARG_CONST_SIZE, +}; + const struct bpf_func_proto bpf_get_prandom_u32_proto = { .func = bpf_user_rnd_u32, .gpl_only = false, diff --git a/kernel/bpf/queue_stack_maps.c b/kernel/bpf/queue_stack_maps.c new file mode 100644 index 000000000000..10c081f3f02b --- /dev/null +++ b/kernel/bpf/queue_stack_maps.c @@ -0,0 +1,291 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * queue_stack_maps.c: BPF queue and stack maps + * + * Copyright (c) 2018 Politecnico di Torino + */ +#include +#include +#include +#include "percpu_freelist.h" + +#define QUEUE_STACK_CREATE_FLAG_MASK \ + (BPF_F_NUMA_NODE | BPF_F_RDONLY | BPF_F_WRONLY) + + +struct bpf_queue_stack { + struct bpf_map map; + raw_spinlock_t lock; + u32 head, tail; + u32 index_mask; + u32 count; + + char elements[0] __aligned(8); +}; + +static struct bpf_queue_stack *bpf_queue_stack(struct bpf_map *map) +{ + return container_of(map, struct bpf_queue_stack, map); +} + +static bool queue_stack_map_is_empty(struct bpf_queue_stack *qs) +{ + return qs->count == 0; +} + +static bool queue_stack_map_is_full(struct bpf_queue_stack *qs) +{ + return qs->count == qs->map.max_entries; +} + +/* Called from syscall */ +static int queue_stack_map_alloc_check(union bpf_attr *attr) +{ + /* check sanity of attributes */ + if (attr->max_entries == 0 || attr->key_size != 0 || + (attr->value_size != 1 && attr->value_size != 2 && + attr->value_size != 4 && attr->value_size != 8) || + attr->map_flags & ~QUEUE_STACK_CREATE_FLAG_MASK) + return -EINVAL; + + if (attr->value_size > KMALLOC_MAX_SIZE) + /* if value_size is bigger, the user space won't be able to + * access the elements. + */ + return -E2BIG; + + return 0; +} + +static struct bpf_map *queue_stack_map_alloc(union bpf_attr *attr) +{ + int ret, numa_node = bpf_map_attr_numa_node(attr); + u32 max_entries, value_size, index_mask; + u64 queue_size, cost, mask64; + struct bpf_queue_stack *qs; + + max_entries = attr->max_entries; + value_size = attr->value_size; + + /* From arraymap.c: + * On 32 bit archs roundup_pow_of_two() with max_entries that has + * upper most bit set in u32 space is undefined behavior due to + * resulting 1U << 32, so do it manually here in u64 space. + */ + mask64 = fls_long(max_entries - 1); + mask64 = 1ULL << mask64; + mask64 -= 1; + + index_mask = mask64; + + /* Round up queue size to nearest power of 2 */ + max_entries = index_mask + 1; + /* Check for overflows. 
*/ + if (max_entries < attr->max_entries) + return ERR_PTR(-E2BIG); + + queue_size = sizeof(*qs) + (u64) value_size * max_entries; + + cost = queue_size; + if (cost >= U32_MAX - PAGE_SIZE) + return ERR_PTR(-E2BIG); + + cost = round_up(cost, PAGE_SIZE) >> PAGE_SHIFT; + + ret = bpf_map_precharge_memlock(cost); + if (ret < 0) + return ERR_PTR(ret); + + qs = bpf_map_area_alloc(queue_size, numa_node); + if (!qs) + return ERR_PTR(-ENOMEM); + + memset(qs, 0, sizeof(*qs)); + + bpf_map_init_from_attr(&qs->map, attr); + + qs->map.pages = cost; + qs->index_mask = index_mask; + + raw_spin_lock_init(&qs->lock); + + return &qs->map; +} + +/* Called when map->refcnt goes to zero, either from workqueue or from syscall */ +static void queue_stack_map_free(struct bpf_map *map) +{ + struct bpf_queue_stack *qs = bpf_queue_stack(map); + + /* at this point bpf_prog->aux->refcnt == 0 and this map->refcnt == 0, + * so the programs (can be more than one that used this map) were + * disconnected from events. Wait for outstanding critical sections in + * these programs to complete + */ + synchronize_rcu(); + + bpf_map_area_free(qs); +} + +static void *__queue_map_lookup(struct bpf_map *map, bool delete) +{ + struct bpf_queue_stack *qs = bpf_queue_stack(map); + unsigned long flags; + void *ptr = NULL; + + raw_spin_lock_irqsave(&qs->lock, flags); + + if (queue_stack_map_is_empty(qs)) + goto out; + + ptr = &qs->elements[qs->tail * qs->map.value_size]; + + if (delete) { + qs->tail = (qs->tail + 1) & qs->index_mask; + qs->count--; + } + +out: + raw_spin_unlock_irqrestore(&qs->lock, flags); + return ptr; +} + + +static void *__stack_map_lookup(struct bpf_map *map, bool delete) +{ + struct bpf_queue_stack *qs = bpf_queue_stack(map); + unsigned long flags; + void *ptr = NULL; + u32 index; + + raw_spin_lock_irqsave(&qs->lock, flags); + + if (queue_stack_map_is_empty(qs)) + goto out; + + index = (qs->head - 1) & qs->index_mask; + ptr = &qs->elements[index * qs->map.value_size]; + + if (delete) { + qs->head = (qs->head - 1) & qs->index_mask; + qs->count--; + } + +out: + raw_spin_unlock_irqrestore(&qs->lock, flags); + return ptr; +} + +/* Called from syscall or from eBPF program */ +static void *queue_map_lookup_elem(struct bpf_map *map, void *key) +{ + return __queue_map_lookup(map, false); +} + +/* Called from syscall or from eBPF program */ +static void *stack_map_lookup_elem(struct bpf_map *map, void *key) +{ + return __stack_map_lookup(map, false); +} + +/* Called from syscall or from eBPF program */ +static void *queue_map_lookup_and_delete_elem(struct bpf_map *map, void *key) +{ + return __queue_map_lookup(map, true); +} + +/* Called from syscall or from eBPF program */ +static void *stack_map_lookup_and_delete_elem(struct bpf_map *map, void *key) +{ + return __stack_map_lookup(map, true); +} + +/* Called from syscall or from eBPF program */ +static int queue_stack_map_update_elem(struct bpf_map *map, void *key, + void *value, u64 flags) +{ + struct bpf_queue_stack *qs = bpf_queue_stack(map); + unsigned long irq_flags; + int err = 0; + void *dst; + + /* BPF_EXIST is used to force making room for a new element in case the + * map is full + */ + bool replace = (flags & BPF_EXIST); + + /* Check supported flags for queue and stack maps */ + if (flags & BPF_NOEXIST || flags > BPF_EXIST) + return -EINVAL; + + raw_spin_lock_irqsave(&qs->lock, irq_flags); + + if (queue_stack_map_is_full(qs)) { + if (!replace) { + err = -E2BIG; + goto out; + } + /* advance tail pointer to overwrite oldest element */ + 
qs->tail = (qs->tail + 1) & qs->index_mask; + qs->count--; + } + + dst = &qs->elements[qs->head * qs->map.value_size]; + + switch (qs->map.value_size) { + case 1: + *(u8 *) dst = *(u8 *) value; + break; + case 2: + *(u16 *) dst = *(u16 *) value; + break; + case 4: + *(u32 *) dst = *(u32 *) value; + break; + case 8: + *(u64 *) dst = *(u64 *) value; + break; + } + + qs->head = (qs->head + 1) & qs->index_mask; + qs->count++; + +out: + raw_spin_unlock_irqrestore(&qs->lock, irq_flags); + return err; +} + +/* Called from syscall or from eBPF program */ +static int queue_stack_map_delete_elem(struct bpf_map *map, void *key) +{ + return -EINVAL; +} + +/* Called from syscall */ +static int queue_stack_map_get_next_key(struct bpf_map *map, void *key, + void *next_key) +{ + return -EINVAL; +} + +const struct bpf_map_ops queue_map_ops = { + .map_alloc_check = queue_stack_map_alloc_check, + .map_alloc = queue_stack_map_alloc, + .map_free = queue_stack_map_free, + .map_lookup_elem = queue_map_lookup_elem, + .map_lookup_and_delete_elem = queue_map_lookup_and_delete_elem, + .map_update_elem = queue_stack_map_update_elem, + .map_delete_elem = queue_stack_map_delete_elem, + .map_get_next_key = queue_stack_map_get_next_key, +}; + +const struct bpf_map_ops stack_map_ops = { + .map_alloc_check = queue_stack_map_alloc_check, + .map_alloc = queue_stack_map_alloc, + .map_free = queue_stack_map_free, + .map_lookup_elem = stack_map_lookup_elem, + .map_lookup_and_delete_elem = stack_map_lookup_and_delete_elem, + .map_update_elem = queue_stack_map_update_elem, + .map_delete_elem = queue_stack_map_delete_elem, + .map_get_next_key = queue_stack_map_get_next_key, +}; diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index f4ff0c569e54..b9e005188f0e 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -2369,7 +2369,10 @@ record_func_map(struct bpf_verifier_env *env, struct bpf_call_arg_meta *meta, if (func_id != BPF_FUNC_tail_call && func_id != BPF_FUNC_map_lookup_elem && func_id != BPF_FUNC_map_update_elem && - func_id != BPF_FUNC_map_delete_elem) + func_id != BPF_FUNC_map_delete_elem && + func_id != BPF_FUNC_map_push_elem && + func_id != BPF_FUNC_map_pop_elem && + func_id != BPF_FUNC_map_peek_elem) return 0; if (meta->map_ptr == NULL) { diff --git a/net/core/filter.c b/net/core/filter.c index feb578506009..c7b73376c23a 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -4839,6 +4839,12 @@ bpf_base_func_proto(enum bpf_func_id func_id) return &bpf_map_update_elem_proto; case BPF_FUNC_map_delete_elem: return &bpf_map_delete_elem_proto; + case BPF_FUNC_map_push_elem: + return &bpf_map_push_elem_proto; + case BPF_FUNC_map_pop_elem: + return &bpf_map_pop_elem_proto; + case BPF_FUNC_map_peek_elem: + return &bpf_map_peek_elem_proto; case BPF_FUNC_get_prandom_u32: return &bpf_get_prandom_u32_proto; case BPF_FUNC_get_smp_processor_id: From patchwork Tue Sep 18 04:52:56 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mauricio Vasquez X-Patchwork-Id: 970933 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=polito.it Received: from vger.kernel.org (vger.kernel.org 
[209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 42DrGt0yytz9sCR for ; Tue, 18 Sep 2018 14:53:26 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729263AbeIRKYL (ORCPT ); Tue, 18 Sep 2018 06:24:11 -0400 Received: from fm1nodo1.polito.it ([130.192.180.16]:44169 "EHLO antispam.polito.it" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729244AbeIRKYK (ORCPT ); Tue, 18 Sep 2018 06:24:10 -0400 Received: from polito.it (frontmail2.polito.it [130.192.180.42]) by antispam.polito.it with ESMTP id w8I4qv1m026906-w8I4qv1o026906 (version=TLSv1.0 cipher=ECDHE-RSA-AES256-SHA bits=256 verify=CAFAIL); Tue, 18 Sep 2018 06:52:57 +0200 X-ExtScanner: Niversoft's FindAttachments (free) Received: from [130.192.225.241] (account d040768@polito.it HELO [127.0.1.1]) by polito.it (CommuniGate Pro SMTP 6.2.5) with ESMTPSA id 79303277; Tue, 18 Sep 2018 06:52:56 +0200 Subject: [RFC PATCH bpf-next v3 5/7] bpf: restrict use of peek/push/pop From: Mauricio Vasquez B To: Alexei Starovoitov , Daniel Borkmann , netdev@vger.kernel.org Cc: Yonghong Song Date: Tue, 18 Sep 2018 06:52:56 +0200 Message-ID: <153724637653.7866.3309569197587729502.stgit@kernel> In-Reply-To: <153724634652.7866.6354309647800281793.stgit@kernel> References: <153724634652.7866.6354309647800281793.stgit@kernel> User-Agent: StGit/unknown-version MIME-Version: 1.0 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Restrict the use of peek, push and pop helpers only to queue and stack maps. Signed-off-by: Mauricio Vasquez B --- kernel/bpf/verifier.c | 14 ++++++++++++++ 1 file changed, 14 insertions(+) diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index b9e005188f0e..1628ffe48e32 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -2084,6 +2084,13 @@ static int check_map_func_compatibility(struct bpf_verifier_env *env, if (func_id != BPF_FUNC_sk_select_reuseport) goto error; break; + case BPF_MAP_TYPE_QUEUE: + case BPF_MAP_TYPE_STACK: + if (func_id != BPF_FUNC_map_peek_elem && + func_id != BPF_FUNC_map_pop_elem && + func_id != BPF_FUNC_map_push_elem) + goto error; + break; default: break; } @@ -2139,6 +2146,13 @@ static int check_map_func_compatibility(struct bpf_verifier_env *env, if (map->map_type != BPF_MAP_TYPE_REUSEPORT_SOCKARRAY) goto error; break; + case BPF_FUNC_map_peek_elem: + case BPF_FUNC_map_pop_elem: + case BPF_FUNC_map_push_elem: + if (map->map_type != BPF_MAP_TYPE_QUEUE && + map->map_type != BPF_MAP_TYPE_STACK) + goto error; + break; default: break; } From patchwork Tue Sep 18 04:53:02 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mauricio Vasquez X-Patchwork-Id: 970934 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=polito.it Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 42DrGx21ypz9sBv for ; Tue, 18 Sep 2018 14:53:29 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729268AbeIRKYP (ORCPT ); Tue, 18 Sep 2018 06:24:15 -0400 Received: from fm1nodo1.polito.it 
([130.192.180.16]:44171 "EHLO antispam.polito.it" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727233AbeIRKYP (ORCPT ); Tue, 18 Sep 2018 06:24:15 -0400 Received: from polito.it (frontmail2.polito.it [130.192.180.42]) by antispam.polito.it with ESMTP id w8I4r2qD026932-w8I4r2qF026932 (version=TLSv1.0 cipher=ECDHE-RSA-AES256-SHA bits=256 verify=CAFAIL); Tue, 18 Sep 2018 06:53:02 +0200 X-ExtScanner: Niversoft's FindAttachments (free) Received: from [130.192.225.241] (account d040768@polito.it HELO [127.0.1.1]) by polito.it (CommuniGate Pro SMTP 6.2.5) with ESMTPSA id 79303245; Tue, 18 Sep 2018 06:53:02 +0200 Subject: [RFC PATCH bpf-next v3 6/7] Sync uapi/bpf.h to tools/include From: Mauricio Vasquez B To: Alexei Starovoitov , Daniel Borkmann , netdev@vger.kernel.org Cc: Yonghong Song Date: Tue, 18 Sep 2018 06:53:02 +0200 Message-ID: <153724638200.7866.16341864978822421006.stgit@kernel> In-Reply-To: <153724634652.7866.6354309647800281793.stgit@kernel> References: <153724634652.7866.6354309647800281793.stgit@kernel> User-Agent: StGit/unknown-version MIME-Version: 1.0 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Sync both files. Signed-off-by: Mauricio Vasquez B --- tools/include/uapi/linux/bpf.h | 31 ++++++++++++++++++++++++++++++- 1 file changed, 30 insertions(+), 1 deletion(-) diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h index 66917a4eba27..c899386dcb2b 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -103,6 +103,7 @@ enum bpf_cmd { BPF_BTF_LOAD, BPF_BTF_GET_FD_BY_ID, BPF_TASK_FD_QUERY, + BPF_MAP_LOOKUP_AND_DELETE_ELEM, }; enum bpf_map_type { @@ -127,6 +128,8 @@ enum bpf_map_type { BPF_MAP_TYPE_SOCKHASH, BPF_MAP_TYPE_CGROUP_STORAGE, BPF_MAP_TYPE_REUSEPORT_SOCKARRAY, + BPF_MAP_TYPE_QUEUE, + BPF_MAP_TYPE_STACK, }; enum bpf_prog_type { @@ -459,6 +462,29 @@ union bpf_attr { * Return * 0 on success, or a negative error in case of failure. * + * int bpf_map_push_elem(struct bpf_map *map, const void *value, u32 len, + * u64 flags) + * Description + * Push an element *value* in *map*. *flags* is one of: + * + * **BPF_EXIST** + * If the queue/stack is full, the oldest element is removed to + * make room for this. + * Return + * 0 on success, or a negative error in case of failure. + * + * int bpf_map_pop_elem(struct bpf_map *map, void *value, u32 len) + * Description + * Pop an element from *map*. + * Return + * 0 on success, or a negative error in case of failure. + * + * int bpf_map_peek_elem(struct bpf_map *map, void *value, u32 len) + * Description + * Get an element from *map* without removing it. + * Return + * 0 on success, or a negative error in case of failure. 
+ * * int bpf_probe_read(void *dst, u32 size, const void *src) * Description * For tracing programs, safely attempt to read *size* bytes from @@ -2226,7 +2252,10 @@ union bpf_attr { FN(get_current_cgroup_id), \ FN(get_local_storage), \ FN(sk_select_reuseport), \ - FN(skb_ancestor_cgroup_id), + FN(skb_ancestor_cgroup_id), \ + FN(map_push_elem), \ + FN(map_pop_elem), \ + FN(map_peek_elem), /* integer value in 'imm' field of BPF_CALL instruction selects which helper * function eBPF program intends to call From patchwork Tue Sep 18 04:53:07 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mauricio Vasquez X-Patchwork-Id: 970935 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=polito.it Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 42DrH71Tfxz9sBv for ; Tue, 18 Sep 2018 14:53:39 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729271AbeIRKYZ (ORCPT ); Tue, 18 Sep 2018 06:24:25 -0400 Received: from fm1nodo5.polito.it ([130.192.180.18]:40828 "EHLO fm1nodo5.polito.it" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726853AbeIRKYZ (ORCPT ); Tue, 18 Sep 2018 06:24:25 -0400 Received: from polito.it (frontmail2.polito.it [130.192.180.42]) by fm1nodo5.polito.it with ESMTP id w8I4r7as017499-w8I4r7au017499 (version=TLSv1.0 cipher=ECDHE-RSA-AES256-SHA bits=256 verify=CAFAIL); Tue, 18 Sep 2018 06:53:07 +0200 X-ExtScanner: Niversoft's FindAttachments (free) Received: from [130.192.225.241] (account d040768@polito.it HELO [127.0.1.1]) by polito.it (CommuniGate Pro SMTP 6.2.5) with ESMTPSA id 79303282; Tue, 18 Sep 2018 06:53:07 +0200 Subject: [RFC PATCH bpf-next v3 7/7] selftests/bpf: add test cases for queue and stack maps From: Mauricio Vasquez B To: Alexei Starovoitov , Daniel Borkmann , netdev@vger.kernel.org Cc: Yonghong Song Date: Tue, 18 Sep 2018 06:53:07 +0200 Message-ID: <153724638737.7866.4029986291571667976.stgit@kernel> In-Reply-To: <153724634652.7866.6354309647800281793.stgit@kernel> References: <153724634652.7866.6354309647800281793.stgit@kernel> User-Agent: StGit/unknown-version MIME-Version: 1.0 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Two types of tests are done: - test_maps: only userspace api. - test_progs: userspace api and ebpf helpers. 
Signed-off-by: Mauricio Vasquez B --- kernel/bpf/helpers.c | 2 tools/lib/bpf/bpf.c | 12 ++ tools/lib/bpf/bpf.h | 1 tools/testing/selftests/bpf/Makefile | 5 + tools/testing/selftests/bpf/bpf_helpers.h | 7 + tools/testing/selftests/bpf/test_maps.c | 130 ++++++++++++++++++++ tools/testing/selftests/bpf/test_progs.c | 99 +++++++++++++++ tools/testing/selftests/bpf/test_queue_map.c | 4 + tools/testing/selftests/bpf/test_queue_stack_map.h | 59 +++++++++ tools/testing/selftests/bpf/test_stack_map.c | 4 + 10 files changed, 321 insertions(+), 2 deletions(-) create mode 100644 tools/testing/selftests/bpf/test_queue_map.c create mode 100644 tools/testing/selftests/bpf/test_queue_stack_map.h create mode 100644 tools/testing/selftests/bpf/test_stack_map.c diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c index 5f364e6acaf1..1293cd5240e3 100644 --- a/kernel/bpf/helpers.c +++ b/kernel/bpf/helpers.c @@ -76,7 +76,7 @@ const struct bpf_func_proto bpf_map_delete_elem_proto = { .arg2_type = ARG_PTR_TO_MAP_KEY, }; -BPF_CALL_4(bpf_map_push_elem, struct bpf_map *, map, void *, value, u32 size, +BPF_CALL_4(bpf_map_push_elem, struct bpf_map *, map, void *, value, u32, size, u64, flags) { if (map->value_size != size) diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c index 60aa4ca8b2c5..7056b2eb554d 100644 --- a/tools/lib/bpf/bpf.c +++ b/tools/lib/bpf/bpf.c @@ -286,6 +286,18 @@ int bpf_map_lookup_elem(int fd, const void *key, void *value) return sys_bpf(BPF_MAP_LOOKUP_ELEM, &attr, sizeof(attr)); } +int bpf_map_lookup_and_delete_elem(int fd, const void *key, void *value) +{ + union bpf_attr attr; + + bzero(&attr, sizeof(attr)); + attr.map_fd = fd; + attr.key = ptr_to_u64(key); + attr.value = ptr_to_u64(value); + + return sys_bpf(BPF_MAP_LOOKUP_AND_DELETE_ELEM, &attr, sizeof(attr)); +} + int bpf_map_delete_elem(int fd, const void *key) { union bpf_attr attr; diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h index 6f38164b2618..6134ed9517d3 100644 --- a/tools/lib/bpf/bpf.h +++ b/tools/lib/bpf/bpf.h @@ -86,6 +86,7 @@ int bpf_map_update_elem(int fd, const void *key, const void *value, __u64 flags); int bpf_map_lookup_elem(int fd, const void *key, void *value); +int bpf_map_lookup_and_delete_elem(int fd, const void *key, void *value); int bpf_map_delete_elem(int fd, const void *key); int bpf_map_get_next_key(int fd, const void *key, void *next_key); int bpf_obj_pin(int fd, const char *pathname); diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile index fff7fb1285fc..ad8a2b8fb738 100644 --- a/tools/testing/selftests/bpf/Makefile +++ b/tools/testing/selftests/bpf/Makefile @@ -35,7 +35,7 @@ TEST_GEN_FILES = test_pkt_access.o test_xdp.o test_l4lb.o test_tcp_estats.o test test_get_stack_rawtp.o test_sockmap_kern.o test_sockhash_kern.o \ test_lwt_seg6local.o sendmsg4_prog.o sendmsg6_prog.o test_lirc_mode2_kern.o \ get_cgroup_id_kern.o socket_cookie_prog.o test_select_reuseport_kern.o \ - test_skb_cgroup_id_kern.o + test_skb_cgroup_id_kern.o test_queue_map.o test_stack_map.o # Order correspond to 'make run_tests' order TEST_PROGS := test_kmod.sh \ @@ -110,6 +110,9 @@ CLANG_FLAGS = -I. 
-I./include/uapi -I../../../include/uapi \ $(OUTPUT)/test_l4lb_noinline.o: CLANG_FLAGS += -fno-inline $(OUTPUT)/test_xdp_noinline.o: CLANG_FLAGS += -fno-inline +$(OUTPUT)/test_queue_map.o: test_queue_stack_map.h +$(OUTPUT)/test_stack_map.o: test_queue_stack_map.h + BTF_LLC_PROBE := $(shell $(LLC) -march=bpf -mattr=help 2>&1 | grep dwarfris) BTF_PAHOLE_PROBE := $(shell $(BTF_PAHOLE) --help 2>&1 | grep BTF) BTF_OBJCOPY_PROBE := $(shell $(LLVM_OBJCOPY) --help 2>&1 | grep -i 'usage.*llvm') diff --git a/tools/testing/selftests/bpf/bpf_helpers.h b/tools/testing/selftests/bpf/bpf_helpers.h index e4be7730222d..bdbe8f84023e 100644 --- a/tools/testing/selftests/bpf/bpf_helpers.h +++ b/tools/testing/selftests/bpf/bpf_helpers.h @@ -16,6 +16,13 @@ static int (*bpf_map_update_elem)(void *map, void *key, void *value, (void *) BPF_FUNC_map_update_elem; static int (*bpf_map_delete_elem)(void *map, void *key) = (void *) BPF_FUNC_map_delete_elem; +static int (*bpf_map_push_elem)(void *map, const void *value, int len, + unsigned long long flags) = + (void *) BPF_FUNC_map_push_elem; +static int (*bpf_map_pop_elem)(void *map, void *value, int len) = + (void *) BPF_FUNC_map_pop_elem; +static int (*bpf_map_peek_elem)(void *map, void *value, int len) = + (void *) BPF_FUNC_map_peek_elem; static int (*bpf_probe_read)(void *dst, int size, void *unsafe_ptr) = (void *) BPF_FUNC_probe_read; static unsigned long long (*bpf_ktime_get_ns)(void) = diff --git a/tools/testing/selftests/bpf/test_maps.c b/tools/testing/selftests/bpf/test_maps.c index 6f54f84144a0..abb21ea731fa 100644 --- a/tools/testing/selftests/bpf/test_maps.c +++ b/tools/testing/selftests/bpf/test_maps.c @@ -15,6 +15,7 @@ #include #include #include +#include #include #include @@ -471,6 +472,130 @@ static void test_devmap(int task, void *data) close(fd); } +static void test_queuemap(int task, void *data) +{ + const int MAP_SIZE = 32; + __u32 vals[MAP_SIZE + MAP_SIZE/2], val; + int fd, i; + + /* Fill test values to be used */ + for (i = 0; i < MAP_SIZE + MAP_SIZE/2; i++) + vals[i] = rand(); + + /* Invalid key size */ + fd = bpf_create_map(BPF_MAP_TYPE_QUEUE, 4, sizeof(val), MAP_SIZE, + map_flags); + assert(fd < 0 && errno == EINVAL); + + /* Invalid value size */ + fd = bpf_create_map(BPF_MAP_TYPE_QUEUE, 0, 3, MAP_SIZE, map_flags); + assert(fd < 0 && errno == EINVAL); + + fd = bpf_create_map(BPF_MAP_TYPE_QUEUE, 0, sizeof(val), MAP_SIZE, + map_flags); + /* Queue map does not support BPF_F_NO_PREALLOC */ + if (map_flags & BPF_F_NO_PREALLOC) { + assert(fd < 0 && errno == EINVAL); + return; + } + if (fd < 0) { + printf("Failed to create queuemap '%s'!\n", strerror(errno)); + exit(1); + } + + /* Push MAP_SIZE elements */ + for (i = 0; i < MAP_SIZE; i++) + assert(bpf_map_update_elem(fd, NULL, &vals[i], 0) == 0); + + /* Check that an element cannot be pushed due to max_entries limit */ + assert(bpf_map_update_elem(fd, NULL, &val, 0) == -1 && + errno == E2BIG); + + /* Peek element */ + assert(bpf_map_lookup_elem(fd, NULL, &val) == 0 && val == vals[0]); + + /* Replace half of the elements */ + for (i = MAP_SIZE; i < MAP_SIZE + MAP_SIZE/2; i++) + assert(bpf_map_update_elem(fd, NULL, &vals[i], BPF_EXIST) == 0); + + /* Pop all elements */ + for (i = MAP_SIZE/2; i < MAP_SIZE + MAP_SIZE/2; i++) + assert(bpf_map_lookup_and_delete_elem(fd, NULL, &val) == 0 && + val == vals[i]); + + /* Check that no elements are left */ + assert(bpf_map_lookup_and_delete_elem(fd, NULL, &val) == -1 && + errno == ENOENT); + + /* Check that unsupported functions set errno to EINVAL */ + 
assert(bpf_map_delete_elem(fd, NULL) == -1 && errno == EINVAL); + assert(bpf_map_get_next_key(fd, NULL, NULL) == -1 && errno == EINVAL); + + close(fd); +} + +static void test_stackmap(int task, void *data) +{ + const int MAP_SIZE = 32; + __u32 vals[MAP_SIZE + MAP_SIZE/2], val; + int fd, i; + + /* Fill test values to be used */ + for (i = 0; i < MAP_SIZE + MAP_SIZE/2; i++) + vals[i] = rand(); + + /* Invalid key size */ + fd = bpf_create_map(BPF_MAP_TYPE_STACK, 4, sizeof(val), MAP_SIZE, + map_flags); + assert(fd < 0 && errno == EINVAL); + + /* Invalid value size */ + fd = bpf_create_map(BPF_MAP_TYPE_STACK, 0, 3, MAP_SIZE, map_flags); + assert(fd < 0 && errno == EINVAL); + + fd = bpf_create_map(BPF_MAP_TYPE_STACK, 0, sizeof(val), MAP_SIZE, + map_flags); + /* Stack map does not support BPF_F_NO_PREALLOC */ + if (map_flags & BPF_F_NO_PREALLOC) { + assert(fd < 0 && errno == EINVAL); + return; + } + if (fd < 0) { + printf("Failed to create stackmap '%s'!\n", strerror(errno)); + exit(1); + } + + /* Push MAP_SIZE elements */ + for (i = 0; i < MAP_SIZE; i++) + assert(bpf_map_update_elem(fd, NULL, &vals[i], 0) == 0); + + /* Check that an element cannot be pushed due to max_entries limit */ + assert(bpf_map_update_elem(fd, NULL, &val, 0) == -1 && + errno == E2BIG); + + /* Peek element */ + assert(bpf_map_lookup_elem(fd, NULL, &val) == 0 && val == vals[i - 1]); + + /* Replace half of the elements */ + for (i = MAP_SIZE; i < MAP_SIZE + MAP_SIZE/2; i++) + assert(bpf_map_update_elem(fd, NULL, &vals[i], BPF_EXIST) == 0); + + /* Pop all elements */ + for (i = MAP_SIZE + MAP_SIZE/2 - 1; i >= MAP_SIZE/2; i--) + assert(bpf_map_lookup_and_delete_elem(fd, NULL, &val) == 0 && + val == vals[i]); + + /* Check that no elements are left */ + assert(bpf_map_lookup_and_delete_elem(fd, NULL, &val) == -1 && + errno == ENOENT); + + /* Check that unsupported functions set errno to EINVAL */ + assert(bpf_map_delete_elem(fd, NULL) == -1 && errno == EINVAL); + assert(bpf_map_get_next_key(fd, NULL, NULL) == -1 && errno == EINVAL); + + close(fd); +} + #include #include #include @@ -1430,10 +1555,15 @@ static void run_all_tests(void) test_map_wronly(); test_reuseport_array(); + + test_queuemap(0, NULL); + test_stackmap(0, NULL); } int main(void) { + srand(time(NULL)); + map_flags = 0; run_all_tests(); diff --git a/tools/testing/selftests/bpf/test_progs.c b/tools/testing/selftests/bpf/test_progs.c index 63a671803ed6..a69b9e1e668d 100644 --- a/tools/testing/selftests/bpf/test_progs.c +++ b/tools/testing/selftests/bpf/test_progs.c @@ -1698,8 +1698,105 @@ static void test_task_fd_query_tp(void) "sys_enter_read"); } +enum { + QUEUE, + STACK, +}; + +static void test_queue_stack_map(int type) +{ + const int MAP_SIZE = 32; + __u32 vals[MAP_SIZE], duration, retval, size, val; + int i, err, prog_fd, map_in_fd, map_out_fd; + char file[32], buf[128]; + struct bpf_object *obj; + struct iphdr *iph = (void *)buf + sizeof(struct ethhdr); + + /* Fill test values to be used */ + for (i = 0; i < MAP_SIZE; i++) + vals[i] = rand(); + + if (type == QUEUE) + strncpy(file, "./test_queue_map.o", sizeof(file)); + else if (type == STACK) + strncpy(file, "./test_stack_map.o", sizeof(file)); + else + return; + + err = bpf_prog_load(file, BPF_PROG_TYPE_SCHED_CLS, &obj, &prog_fd); + if (err) { + error_cnt++; + return; + } + + map_in_fd = bpf_find_map(__func__, obj, "map_in"); + if (map_in_fd < 0) + goto out; + + map_out_fd = bpf_find_map(__func__, obj, "map_out"); + if (map_out_fd < 0) + goto out; + + /* Push 32 elements to the input map */ + for (i = 0; i < 
MAP_SIZE; i++) { + err = bpf_map_update_elem(map_in_fd, NULL, &vals[i], 0); + if (err) { + error_cnt++; + goto out; + } + } + + /* The eBPF program pops an element from the input map into + * iph.daddr and pushes iph.saddr into the output map + */ + for (i = 0; i < MAP_SIZE; i++) { + if (type == QUEUE) { + val = vals[i]; + pkt_v4.iph.saddr = vals[i] * 5; + } else if (type == STACK) { + val = vals[MAP_SIZE - 1 - i]; + pkt_v4.iph.saddr = vals[MAP_SIZE - 1 - i] * 5; + } + + err = bpf_prog_test_run(prog_fd, 1, &pkt_v4, sizeof(pkt_v4), + buf, &size, &retval, &duration); + if (err || retval || size != sizeof(pkt_v4) || + iph->daddr != val) + break; + } + + CHECK(err || retval || size != sizeof(pkt_v4) || iph->daddr != val, + "bpf_map_pop_elem", + "err %d errno %d retval %d size %d iph->daddr %u\n", + err, errno, retval, size, iph->daddr); + + /* Queue/stack is empty, program should return TC_ACT_SHOT */ + err = bpf_prog_test_run(prog_fd, 1, &pkt_v4, sizeof(pkt_v4), + buf, &size, &retval, &duration); + CHECK(err || retval != 2 /* TC_ACT_SHOT */ || size != sizeof(pkt_v4), + "check-queue-stack-map-empty", + "err %d errno %d retval %d size %d\n", + err, errno, retval, size); + + /* Check that the program pushed elements correctly */ + for (i = 0; i < MAP_SIZE; i++) { + err = bpf_map_lookup_and_delete_elem(map_out_fd, NULL, &val); + if (err || val != vals[i] * 5) + break; + } + + CHECK(i != MAP_SIZE && (err || val != vals[i] * 5), + "bpf_map_push_elem", "err %d value %u\n", err, val); + +out: + pkt_v4.iph.saddr = 0; + bpf_object__close(obj); +} + int main(void) { + srand(time(NULL)); + jit_enabled = is_jit_enabled(); test_pkt_access(); @@ -1719,6 +1816,8 @@ int main(void) test_get_stack_raw_tp(); test_task_fd_query_rawtp(); test_task_fd_query_tp(); + test_queue_stack_map(QUEUE); + test_queue_stack_map(STACK); printf("Summary: %d PASSED, %d FAILED\n", pass_cnt, error_cnt); return error_cnt ? 
EXIT_FAILURE : EXIT_SUCCESS; diff --git a/tools/testing/selftests/bpf/test_queue_map.c b/tools/testing/selftests/bpf/test_queue_map.c new file mode 100644 index 000000000000..3fdb3b9cb038 --- /dev/null +++ b/tools/testing/selftests/bpf/test_queue_map.c @@ -0,0 +1,4 @@ +// SPDX-License-Identifier: GPL-2.0 +// Copyright (c) 2017 Politecnico di Torino +#define MAP_TYPE BPF_MAP_TYPE_QUEUE +#include "test_queue_stack_map.h" diff --git a/tools/testing/selftests/bpf/test_queue_stack_map.h b/tools/testing/selftests/bpf/test_queue_stack_map.h new file mode 100644 index 000000000000..23a5b5d60069 --- /dev/null +++ b/tools/testing/selftests/bpf/test_queue_stack_map.h @@ -0,0 +1,59 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +// Copyright (c) 2017 Politecnico di Torino +#include <stddef.h> +#include <string.h> +#include <linux/bpf.h> +#include <linux/if_ether.h> +#include <linux/ip.h> +#include <linux/pkt_cls.h> +#include "bpf_helpers.h" + +int _version SEC("version") = 1; + +struct bpf_map_def __attribute__ ((section("maps"), used)) map_in = { + .type = MAP_TYPE, + .key_size = 0, + .value_size = sizeof(__u32), + .max_entries = 32, + .map_flags = 0, +}; + +struct bpf_map_def __attribute__ ((section("maps"), used)) map_out = { + .type = MAP_TYPE, + .key_size = 0, + .value_size = sizeof(__u32), + .max_entries = 32, + .map_flags = 0, +}; + +SEC("test") +int _test(struct __sk_buff *skb) +{ + void *data_end = (void *)(long)skb->data_end; + void *data = (void *)(long)skb->data; + struct ethhdr *eth = (struct ethhdr *)(data); + __u32 value; + int err; + + if (eth + 1 > data_end) + return TC_ACT_SHOT; + + struct iphdr *iph = (struct iphdr *)(eth + 1); + + if (iph + 1 > data_end) + return TC_ACT_SHOT; + + err = bpf_map_pop_elem(&map_in, &value, sizeof(value)); + if (err) + return TC_ACT_SHOT; + + iph->daddr = value; + + err = bpf_map_push_elem(&map_out, &iph->saddr, sizeof(iph->saddr), 0); + if (err) + return TC_ACT_SHOT; + + return TC_ACT_OK; +} + +char _license[] SEC("license") = "GPL"; diff --git a/tools/testing/selftests/bpf/test_stack_map.c b/tools/testing/selftests/bpf/test_stack_map.c new file mode 100644 index 000000000000..5be9159b2927 --- /dev/null +++ b/tools/testing/selftests/bpf/test_stack_map.c @@ -0,0 +1,4 @@ +// SPDX-License-Identifier: GPL-2.0 +// Copyright (c) 2017 Politecnico di Torino +#define MAP_TYPE BPF_MAP_TYPE_STACK +#include "test_queue_stack_map.h"
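
[Editor's illustration, not part of the series] For readers who want to try the new map types end to end, here is a minimal user-space sketch distilled from the test_maps cases above. It assumes a kernel carrying these patches, with its uapi headers installed, and libbpf built from tools/lib/bpf (which gains bpf_map_lookup_and_delete_elem() in patch 7) available on the include and link path as <bpf/bpf.h>:

/* Minimal sketch: push/peek/pop on a BPF_MAP_TYPE_QUEUE from user space. */
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <linux/bpf.h>
#include <bpf/bpf.h>

int main(void)
{
	__u32 val;
	int fd, i;

	/* key_size must be 0; value_size is restricted to 1/2/4/8 bytes */
	fd = bpf_create_map(BPF_MAP_TYPE_QUEUE, 0, sizeof(val), 32, 0);
	if (fd < 0) {
		fprintf(stderr, "map create failed: %s\n", strerror(errno));
		return 1;
	}

	/* push: BPF_MAP_UPDATE_ELEM with a NULL key; BPF_EXIST makes a
	 * full queue overwrite its oldest element instead of failing
	 * with E2BIG
	 */
	for (i = 0; i < 32; i++) {
		val = i;
		if (bpf_map_update_elem(fd, NULL, &val, BPF_EXIST))
			return 1;
	}

	/* peek: BPF_MAP_LOOKUP_ELEM with a NULL key reads the head
	 * without consuming it
	 */
	if (!bpf_map_lookup_elem(fd, NULL, &val))
		printf("head of queue: %u\n", val);

	/* pop: the new BPF_MAP_LOOKUP_AND_DELETE_ELEM command drains the
	 * map in FIFO order (LIFO for BPF_MAP_TYPE_STACK)
	 */
	while (!bpf_map_lookup_and_delete_elem(fd, NULL, &val))
		printf("popped %u\n", val);

	return 0;
}

Note that push reuses the existing BPF_MAP_UPDATE_ELEM command and peek reuses BPF_MAP_LOOKUP_ELEM, both with a NULL key; only pop needed the new BPF_MAP_LOOKUP_AND_DELETE_ELEM syscall command, which is why it is the sole addition to tools/lib/bpf in this series.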