From patchwork Fri Aug 31 21:25:54 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mauricio Vasquez X-Patchwork-Id: 964721 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=polito.it Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 422C8s2gYVz9sC2 for ; Sat, 1 Sep 2018 07:26:21 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727555AbeIABfj (ORCPT ); Fri, 31 Aug 2018 21:35:39 -0400 Received: from fm2nodo1.polito.it ([130.192.180.17]:53700 "EHLO fm2nodo1.polito.it" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727234AbeIABfj (ORCPT ); Fri, 31 Aug 2018 21:35:39 -0400 Received: from polito.it (frontmail1.polito.it [130.192.180.41]) by fm2nodo1.polito.it with ESMTP id w7VLPsST011243-w7VLPsSV011243 (version=TLSv1.0 cipher=ECDHE-RSA-AES256-SHA bits=256 verify=CAFAIL); Fri, 31 Aug 2018 23:25:55 +0200 X-ExtScanner: Niversoft's FindAttachments (free) Received: from [130.192.225.241] (account d040768@polito.it HELO [127.0.1.1]) by polito.it (CommuniGate Pro SMTP 6.2.5) with ESMTPSA id 142351981; Fri, 31 Aug 2018 23:25:54 +0200 Subject: [RFC PATCH bpf-next v2 1/4] bpf: add bpf queue and stack maps From: Mauricio Vasquez B To: Alexei Starovoitov , Daniel Borkmann , netdev@vger.kernel.org Date: Fri, 31 Aug 2018 23:25:54 +0200 Message-ID: <153575075452.30050.2166341202758341587.stgit@kernel> In-Reply-To: <153575074884.30050.17670029209466860207.stgit@kernel> References: <153575074884.30050.17670029209466860207.stgit@kernel> User-Agent: StGit/unknown-version MIME-Version: 1.0 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Implement two new kind of maps that support the peek, push and pop operations. A use case for this is to keep track of a pool of elements, like network ports in a SNAT. Signed-off-by: Mauricio Vasquez B --- include/linux/bpf.h | 8 + include/linux/bpf_types.h | 2 include/uapi/linux/bpf.h | 36 ++++ kernel/bpf/Makefile | 2 kernel/bpf/helpers.c | 44 +++++ kernel/bpf/queue_stack_maps.c | 353 +++++++++++++++++++++++++++++++++++++++++ kernel/bpf/syscall.c | 96 +++++++++++ kernel/bpf/verifier.c | 6 + net/core/filter.c | 6 + 9 files changed, 543 insertions(+), 10 deletions(-) create mode 100644 kernel/bpf/queue_stack_maps.c diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 523481a3471b..1d39b9096d9f 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -39,6 +39,11 @@ struct bpf_map_ops { void *(*map_lookup_elem)(struct bpf_map *map, void *key); int (*map_update_elem)(struct bpf_map *map, void *key, void *value, u64 flags); int (*map_delete_elem)(struct bpf_map *map, void *key); + void *(*map_lookup_and_delete_elem)(struct bpf_map *map, void *key); + + /* funcs callable from eBPF programs */ + void *(*map_lookup_or_init_elem)(struct bpf_map *map, void *key, + void *value); /* funcs called by prog_array and perf_event_array map */ void *(*map_fd_get_ptr)(struct bpf_map *map, struct file *map_file, @@ -806,6 +811,9 @@ static inline int bpf_fd_reuseport_array_update_elem(struct bpf_map *map, extern const struct bpf_func_proto bpf_map_lookup_elem_proto; extern const struct bpf_func_proto bpf_map_update_elem_proto; extern const struct bpf_func_proto bpf_map_delete_elem_proto; +extern const struct bpf_func_proto bpf_map_push_elem_proto; +extern const struct bpf_func_proto bpf_map_pop_elem_proto; +extern const struct bpf_func_proto bpf_map_peek_elem_proto; extern const struct bpf_func_proto bpf_get_prandom_u32_proto; extern const struct bpf_func_proto bpf_get_smp_processor_id_proto; diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h index cd26c090e7c0..8d955f11f1cd 100644 --- a/include/linux/bpf_types.h +++ b/include/linux/bpf_types.h @@ -67,3 +67,5 @@ BPF_MAP_TYPE(BPF_MAP_TYPE_XSKMAP, xsk_map_ops) BPF_MAP_TYPE(BPF_MAP_TYPE_REUSEPORT_SOCKARRAY, reuseport_array_ops) #endif #endif +BPF_MAP_TYPE(BPF_MAP_TYPE_QUEUE, queue_map_ops) +BPF_MAP_TYPE(BPF_MAP_TYPE_STACK, queue_map_ops) diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 66917a4eba27..0a5b904ba42f 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -103,6 +103,7 @@ enum bpf_cmd { BPF_BTF_LOAD, BPF_BTF_GET_FD_BY_ID, BPF_TASK_FD_QUERY, + BPF_MAP_LOOKUP_AND_DELETE_ELEM, }; enum bpf_map_type { @@ -127,6 +128,8 @@ enum bpf_map_type { BPF_MAP_TYPE_SOCKHASH, BPF_MAP_TYPE_CGROUP_STORAGE, BPF_MAP_TYPE_REUSEPORT_SOCKARRAY, + BPF_MAP_TYPE_QUEUE, + BPF_MAP_TYPE_STACK, }; enum bpf_prog_type { @@ -459,6 +462,28 @@ union bpf_attr { * Return * 0 on success, or a negative error in case of failure. * + * int bpf_map_push_elem(struct bpf_map *map, const void *value, u64 flags) + * Description + * Push an element *value* in *map*. *flags* is one of: + * + * **BPF_EXIST** + * If the queue/stack is full, the oldest element is removed to + * make room for this. + * Return + * 0 on success, or a negative error in case of failure. + * + * void *bpf_map_pop_elem(struct bpf_map *map) + * Description + * Pop an element from *map*. + * Return + * Pointer to the element of *NULL* if there is not any. + * + * void *bpf_map_peek_elem(struct bpf_map *map) + * Description + * Return an element from *map* without removing it. + * Return + * Pointer to the element of *NULL* if there is not any. + * * int bpf_probe_read(void *dst, u32 size, const void *src) * Description * For tracing programs, safely attempt to read *size* bytes from @@ -786,14 +811,14 @@ union bpf_attr { * * int ret; * struct bpf_tunnel_key key = {}; - * + * * ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key), 0); * if (ret < 0) * return TC_ACT_SHOT; // drop packet - * + * * if (key.remote_ipv4 != 0x0a000001) * return TC_ACT_SHOT; // drop packet - * + * * return TC_ACT_OK; // accept packet * * This interface can also be used with all encapsulation devices @@ -2226,7 +2251,10 @@ union bpf_attr { FN(get_current_cgroup_id), \ FN(get_local_storage), \ FN(sk_select_reuseport), \ - FN(skb_ancestor_cgroup_id), + FN(skb_ancestor_cgroup_id), \ + FN(map_push_elem), \ + FN(map_pop_elem), \ + FN(map_peek_elem), /* integer value in 'imm' field of BPF_CALL instruction selects which helper * function eBPF program intends to call diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile index 0488b8258321..17afae9e65f3 100644 --- a/kernel/bpf/Makefile +++ b/kernel/bpf/Makefile @@ -3,7 +3,7 @@ obj-y := core.o obj-$(CONFIG_BPF_SYSCALL) += syscall.o verifier.o inode.o helpers.o tnum.o obj-$(CONFIG_BPF_SYSCALL) += hashtab.o arraymap.o percpu_freelist.o bpf_lru_list.o lpm_trie.o map_in_map.o -obj-$(CONFIG_BPF_SYSCALL) += local_storage.o +obj-$(CONFIG_BPF_SYSCALL) += local_storage.o queue_stack_maps.o obj-$(CONFIG_BPF_SYSCALL) += disasm.o obj-$(CONFIG_BPF_SYSCALL) += btf.o ifeq ($(CONFIG_NET),y) diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c index 1991466b8327..c1ff567b69f1 100644 --- a/kernel/bpf/helpers.c +++ b/kernel/bpf/helpers.c @@ -76,6 +76,50 @@ const struct bpf_func_proto bpf_map_delete_elem_proto = { .arg2_type = ARG_PTR_TO_MAP_KEY, }; +BPF_CALL_3(bpf_map_push_elem, struct bpf_map *, map, void *, value, u64, flags) +{ + WARN_ON_ONCE(!rcu_read_lock_held()); + return map->ops->map_update_elem(map, NULL, value, flags); +} + +const struct bpf_func_proto bpf_map_push_elem_proto = { + .func = bpf_map_push_elem, + .gpl_only = false, + .pkt_access = true, + .ret_type = RET_INTEGER, + .arg1_type = ARG_CONST_MAP_PTR, + .arg2_type = ARG_PTR_TO_MAP_VALUE, + .arg3_type = ARG_ANYTHING, +}; + +BPF_CALL_1(bpf_map_pop_elem, struct bpf_map *, map) +{ + WARN_ON_ONCE(!rcu_read_lock_held()); + return (unsigned long) map->ops->map_lookup_and_delete_elem(map, NULL); +} + +const struct bpf_func_proto bpf_map_pop_elem_proto = { + .func = bpf_map_pop_elem, + .gpl_only = false, + .pkt_access = true, + .ret_type = RET_PTR_TO_MAP_VALUE_OR_NULL, + .arg1_type = ARG_CONST_MAP_PTR, +}; + +BPF_CALL_1(bpf_map_peek_elem, struct bpf_map *, map) +{ + WARN_ON_ONCE(!rcu_read_lock_held()); + return (unsigned long) map->ops->map_lookup_elem(map, NULL); +} + +const struct bpf_func_proto bpf_map_peek_elem_proto = { + .func = bpf_map_pop_elem, + .gpl_only = false, + .pkt_access = true, + .ret_type = RET_PTR_TO_MAP_VALUE_OR_NULL, + .arg1_type = ARG_CONST_MAP_PTR, +}; + const struct bpf_func_proto bpf_get_prandom_u32_proto = { .func = bpf_user_rnd_u32, .gpl_only = false, diff --git a/kernel/bpf/queue_stack_maps.c b/kernel/bpf/queue_stack_maps.c new file mode 100644 index 000000000000..97e652bf6df5 --- /dev/null +++ b/kernel/bpf/queue_stack_maps.c @@ -0,0 +1,353 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * queue_stack_maps.c: BPF queue and stack maps + * + * Copyright (c) 2018 Politecnico di Torino + */ +#include +#include +#include +#include "percpu_freelist.h" + +#define QUEUE_STACK_CREATE_FLAG_MASK \ + (BPF_F_NO_PREALLOC | BPF_F_NUMA_NODE | BPF_F_RDONLY | BPF_F_WRONLY) + +enum queue_type { + QUEUE, + STACK, +}; + +struct bpf_queue { + struct bpf_map map; + struct list_head head; + struct pcpu_freelist freelist; + void *nodes; + enum queue_type type; + raw_spinlock_t lock; + atomic_t count; + u32 node_size; +}; + +struct queue_node { + struct pcpu_freelist_node fnode; + struct bpf_queue *queue; + struct list_head list; + struct rcu_head rcu; + char element[0] __aligned(8); +}; + +static bool queue_map_is_prealloc(struct bpf_queue *queue) +{ + return !(queue->map.map_flags & BPF_F_NO_PREALLOC); +} + +/* Called from syscall */ +static int queue_map_alloc_check(union bpf_attr *attr) +{ + /* check sanity of attributes */ + if (attr->max_entries == 0 || attr->key_size != 0 || + attr->value_size == 0 || + attr->map_flags & ~QUEUE_STACK_CREATE_FLAG_MASK) + return -EINVAL; + + if (attr->value_size > KMALLOC_MAX_SIZE) + /* if value_size is bigger, the user space won't be able to + * access the elements. + */ + return -E2BIG; + + return 0; +} + +static int prealloc_init(struct bpf_queue *queue) +{ + u32 node_size = sizeof(struct queue_node) + + round_up(queue->map.value_size, 8); + u32 num_entries = queue->map.max_entries; + int err; + + queue->nodes = bpf_map_area_alloc(node_size * num_entries, + queue->map.numa_node); + if (!queue->nodes) + return -ENOMEM; + + err = pcpu_freelist_init(&queue->freelist); + if (err) + goto free_nodes; + + pcpu_freelist_populate(&queue->freelist, + queue->nodes + + offsetof(struct queue_node, fnode), + node_size, num_entries); + + return 0; + +free_nodes: + bpf_map_area_free(queue->nodes); + return err; +} + +static void prealloc_destroy(struct bpf_queue *queue) +{ + bpf_map_area_free(queue->nodes); + pcpu_freelist_destroy(&queue->freelist); +} + +static struct bpf_map *queue_map_alloc(union bpf_attr *attr) +{ + struct bpf_queue *queue; + u64 cost = sizeof(*queue); + int ret; + + queue = kzalloc(sizeof(*queue), GFP_USER); + if (!queue) + return ERR_PTR(-ENOMEM); + + bpf_map_init_from_attr(&queue->map, attr); + + queue->node_size = sizeof(struct queue_node) + + round_up(attr->value_size, 8); + cost += (u64) attr->max_entries * queue->node_size; + if (cost >= U32_MAX - PAGE_SIZE) { + ret = -E2BIG; + goto free_queue; + } + + queue->map.pages = round_up(cost, PAGE_SIZE) >> PAGE_SHIFT; + + ret = bpf_map_precharge_memlock(queue->map.pages); + if (ret) + goto free_queue; + + INIT_LIST_HEAD(&queue->head); + + raw_spin_lock_init(&queue->lock); + + if (queue->map.map_type == BPF_MAP_TYPE_QUEUE) + queue->type = QUEUE; + else if (queue->map.map_type == BPF_MAP_TYPE_STACK) + queue->type = STACK; + + if (queue_map_is_prealloc(queue)) + ret = prealloc_init(queue); + if (ret) + goto free_queue; + + return &queue->map; + +free_queue: + kfree(queue); + return ERR_PTR(ret); +} + +/* Called when map->refcnt goes to zero, either from workqueue or from syscall */ +static void queue_map_free(struct bpf_map *map) +{ + struct bpf_queue *queue = container_of(map, struct bpf_queue, map); + struct queue_node *l; + + /* at this point bpf_prog->aux->refcnt == 0 and this map->refcnt == 0, + * so the programs (can be more than one that used this map) were + * disconnected from events. Wait for outstanding critical sections in + * these programs to complete + */ + synchronize_rcu(); + + /* some of queue_elem_free_rcu() callbacks for elements of this map may + * not have executed. Wait for them. + */ + rcu_barrier(); + if (!queue_map_is_prealloc(queue)) + list_for_each_entry(l, &queue->head, list) { + list_del(&l->list); + kfree(l); + } + else + prealloc_destroy(queue); + kfree(queue); +} + +static void queue_elem_free_rcu(struct rcu_head *head) +{ + struct queue_node *l = container_of(head, struct queue_node, rcu); + struct bpf_queue *queue = l->queue; + + /* must increment bpf_prog_active to avoid kprobe+bpf triggering while + * we're calling kfree, otherwise deadlock is possible if kprobes + * are placed somewhere inside of slub + */ + preempt_disable(); + __this_cpu_inc(bpf_prog_active); + if (queue_map_is_prealloc(queue)) + pcpu_freelist_push(&queue->freelist, &l->fnode); + else + kfree(l); + __this_cpu_dec(bpf_prog_active); + preempt_enable(); +} + +static void *__queue_map_lookup(struct bpf_map *map, bool delete) +{ + struct bpf_queue *queue = container_of(map, struct bpf_queue, map); + unsigned long flags; + struct queue_node *node; + + raw_spin_lock_irqsave(&queue->lock, flags); + + if (list_empty(&queue->head)) { + raw_spin_unlock_irqrestore(&queue->lock, flags); + return NULL; + } + + node = list_first_entry(&queue->head, struct queue_node, list); + + if (delete) { + if (!queue_map_is_prealloc(queue)) + atomic_dec(&queue->count); + + list_del(&node->list); + call_rcu(&node->rcu, queue_elem_free_rcu); + } + + raw_spin_unlock_irqrestore(&queue->lock, flags); + return &node->element; +} + +/* Called from syscall or from eBPF program */ +static void *queue_map_lookup_elem(struct bpf_map *map, void *key) +{ + return __queue_map_lookup(map, false); +} + +/* Called from syscall or from eBPF program */ +static void *queue_map_lookup_and_delete_elem(struct bpf_map *map, void *key) +{ + return __queue_map_lookup(map, true); +} + +static struct queue_node *queue_map_delete_oldest_node(struct bpf_queue *queue) +{ + struct queue_node *node = NULL; + unsigned long irq_flags; + + raw_spin_lock_irqsave(&queue->lock, irq_flags); + + if (list_empty(&queue->head)) + goto out; + + switch (queue->type) { + case QUEUE: + node = list_first_entry(&queue->head, struct queue_node, list); + break; + case STACK: + node = list_last_entry(&queue->head, struct queue_node, list); + break; + default: + goto out; + } + + list_del(&node->list); +out: + raw_spin_unlock_irqrestore(&queue->lock, irq_flags); + return node; +} + +/* Called from syscall or from eBPF program */ +static int queue_map_update_elem(struct bpf_map *map, void *key, void *value, + u64 flags) +{ + struct bpf_queue *queue = container_of(map, struct bpf_queue, map); + unsigned long irq_flags; + struct queue_node *new; + /* BPF_EXIST is used to force making room for a new element in case the + * map is full + */ + bool replace = (flags & BPF_EXIST); + + /* Check supported flags for queue and stack maps */ + if (flags & BPF_NOEXIST || flags > BPF_EXIST) + return -EINVAL; + +again: + if (!queue_map_is_prealloc(queue)) { + if (atomic_inc_return(&queue->count) > queue->map.max_entries) { + atomic_dec(&queue->count); + if (!replace) + return -E2BIG; + new = queue_map_delete_oldest_node(queue); + /* It is possible that in the meanwhile the queue/stack + * became empty and there was not an 'oldest' element + * to delete. In that case, try again + */ + if (!new) + goto again; + } else { + new = kmalloc_node(queue->node_size, + GFP_ATOMIC | __GFP_NOWARN, + queue->map.numa_node); + if (!new) { + atomic_dec(&queue->count); + return -ENOMEM; + } + } + } else { + struct pcpu_freelist_node *l; + + l = pcpu_freelist_pop(&queue->freelist); + if (!l) { + if (!replace) + return -E2BIG; + new = queue_map_delete_oldest_node(queue); + if (!new) + /* TODO: This should goto again, but this causes an + * infinite loop when the elements are not being + * returned to the free list by the + * queue_elem_free_rcu() callback + */ + return -ENOMEM; + } else + new = container_of(l, struct queue_node, fnode); + } + + memcpy(new->element, value, queue->map.value_size); + new->queue = queue; + + raw_spin_lock_irqsave(&queue->lock, irq_flags); + switch (queue->type) { + case QUEUE: + list_add_tail(&new->list, &queue->head); + break; + + case STACK: + list_add(&new->list, &queue->head); + break; + } + raw_spin_unlock_irqrestore(&queue->lock, irq_flags); + + return 0; +} + +/* Called from syscall or from eBPF program */ +static int queue_map_delete_elem(struct bpf_map *map, void *key) +{ + return -EINVAL; +} + +/* Called from syscall */ +static int queue_map_get_next_key(struct bpf_map *map, void *key, + void *next_key) +{ + return -EINVAL; +} + +const struct bpf_map_ops queue_map_ops = { + .map_alloc_check = queue_map_alloc_check, + .map_alloc = queue_map_alloc, + .map_free = queue_map_free, + .map_lookup_elem = queue_map_lookup_elem, + .map_lookup_and_delete_elem = queue_map_lookup_and_delete_elem, + .map_update_elem = queue_map_update_elem, + .map_delete_elem = queue_map_delete_elem, + .map_get_next_key = queue_map_get_next_key, +}; + diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index 8339d81cba1d..7703795cb4e4 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -652,6 +652,17 @@ int __weak bpf_stackmap_copy(struct bpf_map *map, void *key, void *value) return -ENOTSUPP; } +static void *__bpf_copy_key(void *ukey, u64 key_size) +{ + if (key_size) + return memdup_user(ukey, key_size); + + if (ukey) + return ERR_PTR(-EINVAL); + + return NULL; +} + /* last field in 'union bpf_attr' used by this command */ #define BPF_MAP_LOOKUP_ELEM_LAST_FIELD value @@ -679,7 +690,7 @@ static int map_lookup_elem(union bpf_attr *attr) goto err_put; } - key = memdup_user(ukey, map->key_size); + key = __bpf_copy_key(ukey, map->key_size); if (IS_ERR(key)) { err = PTR_ERR(key); goto err_put; @@ -767,7 +778,7 @@ static int map_update_elem(union bpf_attr *attr) goto err_put; } - key = memdup_user(ukey, map->key_size); + key = __bpf_copy_key(ukey, map->key_size); if (IS_ERR(key)) { err = PTR_ERR(key); goto err_put; @@ -865,7 +876,7 @@ static int map_delete_elem(union bpf_attr *attr) goto err_put; } - key = memdup_user(ukey, map->key_size); + key = __bpf_copy_key(ukey, map->key_size); if (IS_ERR(key)) { err = PTR_ERR(key); goto err_put; @@ -917,7 +928,7 @@ static int map_get_next_key(union bpf_attr *attr) } if (ukey) { - key = memdup_user(ukey, map->key_size); + key = __bpf_copy_key(ukey, map->key_size); if (IS_ERR(key)) { err = PTR_ERR(key); goto err_put; @@ -958,6 +969,80 @@ static int map_get_next_key(union bpf_attr *attr) return err; } +#define BPF_MAP_LOOKUP_AND_DELETE_ELEM_LAST_FIELD value + +static int map_lookup_and_delete_elem(union bpf_attr *attr) +{ + void __user *ukey = u64_to_user_ptr(attr->key); + void __user *uvalue = u64_to_user_ptr(attr->value); + int ufd = attr->map_fd; + struct bpf_map *map; + void *key, *value, *ptr; + u32 value_size; + struct fd f; + int err; + + if (CHECK_ATTR(BPF_MAP_LOOKUP_ELEM)) + return -EINVAL; + + f = fdget(ufd); + map = __bpf_map_get(f); + if (IS_ERR(map)) + return PTR_ERR(map); + + if (!(f.file->f_mode & FMODE_CAN_WRITE)) { + err = -EPERM; + goto err_put; + } + + key = __bpf_copy_key(ukey, map->key_size); + if (IS_ERR(key)) { + err = PTR_ERR(key); + goto err_put; + } + + value_size = map->value_size; + + err = -ENOMEM; + value = kmalloc(value_size, GFP_USER | __GFP_NOWARN); + if (!value) + goto free_key; + + err = -EFAULT; + if (copy_from_user(value, uvalue, value_size) != 0) + goto free_value; + + /* must increment bpf_prog_active to avoid kprobe+bpf triggering from + * inside bpf map update or delete otherwise deadlocks are possible + */ + preempt_disable(); + __this_cpu_inc(bpf_prog_active); + rcu_read_lock(); + ptr = map->ops->map_lookup_and_delete_elem(map, key); + if (ptr) + memcpy(value, ptr, value_size); + rcu_read_unlock(); + err = ptr ? 0 : -ENOENT; + __this_cpu_dec(bpf_prog_active); + preempt_enable(); + + if (err) + goto free_value; + + if (copy_to_user(uvalue, value, value_size) != 0) + goto free_value; + + err = 0; + +free_value: + kfree(value); +free_key: + kfree(key); +err_put: + fdput(f); + return err; +} + static const struct bpf_prog_ops * const bpf_prog_types[] = { #define BPF_PROG_TYPE(_id, _name) \ [_id] = & _name ## _prog_ops, @@ -2418,6 +2503,9 @@ SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, siz case BPF_TASK_FD_QUERY: err = bpf_task_fd_query(&attr, uattr); break; + case BPF_MAP_LOOKUP_AND_DELETE_ELEM: + err = map_lookup_and_delete_elem(&attr); + break; default: err = -EINVAL; break; diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index ca90679a7fe5..5bd67feb2f07 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -2035,6 +2035,7 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 regno, verbose(env, "invalid map_ptr to access map->key\n"); return -EACCES; } + err = check_helper_mem_access(env, regno, meta->map_ptr->key_size, false, NULL); @@ -2454,7 +2455,10 @@ record_func_map(struct bpf_verifier_env *env, struct bpf_call_arg_meta *meta, if (func_id != BPF_FUNC_tail_call && func_id != BPF_FUNC_map_lookup_elem && func_id != BPF_FUNC_map_update_elem && - func_id != BPF_FUNC_map_delete_elem) + func_id != BPF_FUNC_map_delete_elem && + func_id != BPF_FUNC_map_push_elem && + func_id != BPF_FUNC_map_pop_elem && + func_id != BPF_FUNC_map_peek_elem) return 0; if (meta->map_ptr == NULL) { diff --git a/net/core/filter.c b/net/core/filter.c index fd423ce3da34..e7860914ffc7 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -4830,6 +4830,12 @@ bpf_base_func_proto(enum bpf_func_id func_id) return &bpf_map_update_elem_proto; case BPF_FUNC_map_delete_elem: return &bpf_map_delete_elem_proto; + case BPF_FUNC_map_push_elem: + return &bpf_map_push_elem_proto; + case BPF_FUNC_map_pop_elem: + return &bpf_map_pop_elem_proto; + case BPF_FUNC_map_peek_elem: + return &bpf_map_peek_elem_proto; case BPF_FUNC_get_prandom_u32: return &bpf_get_prandom_u32_proto; case BPF_FUNC_get_smp_processor_id: From patchwork Fri Aug 31 21:26:00 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mauricio Vasquez X-Patchwork-Id: 964720 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Original-To: patchwork-incoming-netdev@ozlabs.org Delivered-To: patchwork-incoming-netdev@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=polito.it Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 422C8k2cqBz9sBn for ; Sat, 1 Sep 2018 07:26:14 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727536AbeIABfc (ORCPT ); Fri, 31 Aug 2018 21:35:32 -0400 Received: from fm2nodo5.polito.it ([130.192.180.19]:60817 "EHLO fm2nodo5.polito.it" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727234AbeIABfc (ORCPT ); Fri, 31 Aug 2018 21:35:32 -0400 Received: from polito.it (frontmail1.polito.it [130.192.180.41]) by fm2nodo5.polito.it with ESMTP id w7VLQ0s2017955-w7VLQ0s4017955 (version=TLSv1.0 cipher=ECDHE-RSA-AES256-SHA bits=256 verify=CAFAIL); Fri, 31 Aug 2018 23:26:00 +0200 X-ExtScanner: Niversoft's FindAttachments (free) Received: from [130.192.225.241] (account d040768@polito.it HELO [127.0.1.1]) by polito.it (CommuniGate Pro SMTP 6.2.5) with ESMTPSA id 142351986; Fri, 31 Aug 2018 23:26:00 +0200 Subject: [RFC PATCH bpf-next v2 2/4] bpf: restrict use of peek/push/pop From: Mauricio Vasquez B To: Alexei Starovoitov , Daniel Borkmann , netdev@vger.kernel.org Date: Fri, 31 Aug 2018 23:26:00 +0200 Message-ID: <153575075998.30050.14181063558477405003.stgit@kernel> In-Reply-To: <153575074884.30050.17670029209466860207.stgit@kernel> References: <153575074884.30050.17670029209466860207.stgit@kernel> User-Agent: StGit/unknown-version MIME-Version: 1.0 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Restrict the use of peek, push and pop helpers only to queue and stack maps. Signed-off-by: Mauricio Vasquez B --- kernel/bpf/verifier.c | 14 ++++++++++++++ 1 file changed, 14 insertions(+) diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 5bd67feb2f07..9e177ff4a3b9 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -2172,6 +2172,13 @@ static int check_map_func_compatibility(struct bpf_verifier_env *env, if (func_id != BPF_FUNC_sk_select_reuseport) goto error; break; + case BPF_MAP_TYPE_QUEUE: + case BPF_MAP_TYPE_STACK: + if (func_id != BPF_FUNC_map_peek_elem && + func_id != BPF_FUNC_map_pop_elem && + func_id != BPF_FUNC_map_push_elem) + goto error; + break; default: break; } @@ -2227,6 +2234,13 @@ static int check_map_func_compatibility(struct bpf_verifier_env *env, if (map->map_type != BPF_MAP_TYPE_REUSEPORT_SOCKARRAY) goto error; break; + case BPF_FUNC_map_peek_elem: + case BPF_FUNC_map_pop_elem: + case BPF_FUNC_map_push_elem: + if (map->map_type != BPF_MAP_TYPE_QUEUE && + map->map_type != BPF_MAP_TYPE_STACK) + goto error; + break; default: break; }