From patchwork Tue Aug 25 18:29:13 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: KP Singh X-Patchwork-Id: 1351220 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Original-To: incoming-bpf@patchwork.ozlabs.org Delivered-To: patchwork-incoming-bpf@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org (client-ip=23.128.96.18; helo=vger.kernel.org; envelope-from=bpf-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=chromium.org Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=chromium.org header.i=@chromium.org header.a=rsa-sha256 header.s=google header.b=WgNxaxWB; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by ozlabs.org (Postfix) with ESMTP id 4BbcwN2w3Wz9sTN for ; Wed, 26 Aug 2020 04:29:40 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726682AbgHYS3g (ORCPT ); Tue, 25 Aug 2020 14:29:36 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42674 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726541AbgHYS31 (ORCPT ); Tue, 25 Aug 2020 14:29:27 -0400 Received: from mail-ej1-x643.google.com (mail-ej1-x643.google.com [IPv6:2a00:1450:4864:20::643]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 54CA7C061757 for ; Tue, 25 Aug 2020 11:29:26 -0700 (PDT) Received: by mail-ej1-x643.google.com with SMTP id md23so17106855ejb.6 for ; Tue, 25 Aug 2020 11:29:26 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=chromium.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=Y9LyXb20slEI8Y2fUXGX1+GKsYUy3ll/VjJxp8ah7vM=; b=WgNxaxWBiap+g7lNmsjVM6PFbRN4wMpb/dGuziVnJrMjNQRGV7oaBLwUrG0CVbhYtQ 3lan2QADv2VTh1ILqHB138UGYXC0HUn0mNfozyBYlGGeIV2muYeMQU11wWRrseph49I7 Ed7hQyJmvFnltaxBoWZ8USE0HbVyH4JHPBIcA= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=Y9LyXb20slEI8Y2fUXGX1+GKsYUy3ll/VjJxp8ah7vM=; b=subpF38g0xA2ttprv0+XAYC0mdvwGVzdl9vEq446OqvwsAqON8J6r42hIirkGCkZ85 Rt/uvXl4hlz7tr0Zj12Sf7tlXABXscoX4AAqbz0RJWMMq57qPpdw99xJhjOHcAVB2Db2 VXtPk+o6Kgqhy8zlvPnLdrWzXtutd4csjlEzHkBiXUxUvi+ZoWdllPDGBjAEu3Vo3EXZ DKAGckkZF9AuZH5lO/RNJm3BK6zAZ75E9XseZpFIUQHFl++dWAm24maCvHSrFc6/Sd3b JSMLMKD6fiRH/1vxfFkug247wTRWuvzDZ6exemNqp3Wze2Imm5TK8+V+W/z4yIXf2Mht dIIg== X-Gm-Message-State: AOAM5309gBIc0eIJEWo7kpY8lDcnPrdXYoSyND4KbbYWWxU9lI2h4hUd lcwxwFqMyJ6U6veRSZnBbmrEkA== X-Google-Smtp-Source: ABdhPJxnFa0BvioGQNMfQa+q97CDznZPV2IsDtPiY6TbCAa1+CVlfv4KFEX45hxDX7MaxWOkFI+EtQ== X-Received: by 2002:a17:906:4e03:: with SMTP id z3mr11527949eju.503.1598380164608; Tue, 25 Aug 2020 11:29:24 -0700 (PDT) Received: from kpsingh.zrh.corp.google.com ([81.6.44.51]) by smtp.gmail.com with ESMTPSA id dr21sm15323286ejc.112.2020.08.25.11.29.23 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 25 Aug 2020 11:29:24 -0700 (PDT) From: KP Singh To: linux-kernel@vger.kernel.org, bpf@vger.kernel.org, linux-security-module@vger.kernel.org Cc: Martin KaFai Lau , Alexei Starovoitov , Daniel Borkmann , Paul Turner , Jann Horn , Florent Revest Subject: [PATCH bpf-next v10 1/7] bpf: Renames in preparation for bpf_local_storage Date: Tue, 25 Aug 2020 20:29:13 +0200 Message-Id: <20200825182919.1118197-2-kpsingh@chromium.org> X-Mailer: git-send-email 2.28.0.297.g1956fa8f8d-goog In-Reply-To: <20200825182919.1118197-1-kpsingh@chromium.org> References: <20200825182919.1118197-1-kpsingh@chromium.org> MIME-Version: 1.0 Sender: bpf-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org From: KP Singh A purely mechanical change to split the renaming from the actual generalization. Flags/consts: SK_STORAGE_CREATE_FLAG_MASK BPF_LOCAL_STORAGE_CREATE_FLAG_MASK BPF_SK_STORAGE_CACHE_SIZE BPF_LOCAL_STORAGE_CACHE_SIZE MAX_VALUE_SIZE BPF_LOCAL_STORAGE_MAX_VALUE_SIZE Structs: bucket bpf_local_storage_map_bucket bpf_sk_storage_map bpf_local_storage_map bpf_sk_storage_data bpf_local_storage_data bpf_sk_storage_elem bpf_local_storage_elem bpf_sk_storage bpf_local_storage The "sk" member in bpf_local_storage is also updated to "owner" in preparation for changing the type to void * in a subsequent patch. Functions: selem_linked_to_sk selem_linked_to_storage selem_alloc bpf_selem_alloc __selem_unlink_sk bpf_selem_unlink_storage_nolock __selem_link_sk bpf_selem_link_storage_nolock selem_unlink_sk __bpf_selem_unlink_storage sk_storage_update bpf_local_storage_update __sk_storage_lookup bpf_local_storage_lookup bpf_sk_storage_map_free bpf_local_storage_map_free bpf_sk_storage_map_alloc bpf_local_storage_map_alloc bpf_sk_storage_map_alloc_check bpf_local_storage_map_alloc_check bpf_sk_storage_map_check_btf bpf_local_storage_map_check_btf Acked-by: Martin KaFai Lau Signed-off-by: KP Singh --- include/net/sock.h | 4 +- net/core/bpf_sk_storage.c | 488 +++++++++++++++++++------------------- 2 files changed, 252 insertions(+), 240 deletions(-) diff --git a/include/net/sock.h b/include/net/sock.h index 064637d1ddf6..18423cc9cde8 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -246,7 +246,7 @@ struct sock_common { /* public: */ }; -struct bpf_sk_storage; +struct bpf_local_storage; /** * struct sock - network layer representation of sockets @@ -517,7 +517,7 @@ struct sock { void (*sk_destruct)(struct sock *sk); struct sock_reuseport __rcu *sk_reuseport_cb; #ifdef CONFIG_BPF_SYSCALL - struct bpf_sk_storage __rcu *sk_bpf_storage; + struct bpf_local_storage __rcu *sk_bpf_storage; #endif struct rcu_head sk_rcu; }; diff --git a/net/core/bpf_sk_storage.c b/net/core/bpf_sk_storage.c index 281200dc0a01..f975e2d01207 100644 --- a/net/core/bpf_sk_storage.c +++ b/net/core/bpf_sk_storage.c @@ -12,33 +12,32 @@ #include #include -#define SK_STORAGE_CREATE_FLAG_MASK \ - (BPF_F_NO_PREALLOC | BPF_F_CLONE) +#define BPF_LOCAL_STORAGE_CREATE_FLAG_MASK (BPF_F_NO_PREALLOC | BPF_F_CLONE) -struct bucket { +struct bpf_local_storage_map_bucket { struct hlist_head list; raw_spinlock_t lock; }; -/* Thp map is not the primary owner of a bpf_sk_storage_elem. - * Instead, the sk->sk_bpf_storage is. +/* Thp map is not the primary owner of a bpf_local_storage_elem. + * Instead, the container object (eg. sk->sk_bpf_storage) is. * - * The map (bpf_sk_storage_map) is for two purposes - * 1. Define the size of the "sk local storage". It is + * The map (bpf_local_storage_map) is for two purposes + * 1. Define the size of the "local storage". It is * the map's value_size. * * 2. Maintain a list to keep track of all elems such * that they can be cleaned up during the map destruction. * * When a bpf local storage is being looked up for a - * particular sk, the "bpf_map" pointer is actually used + * particular object, the "bpf_map" pointer is actually used * as the "key" to search in the list of elem in - * sk->sk_bpf_storage. + * the respective bpf_local_storage owned by the object. * - * Hence, consider sk->sk_bpf_storage is the mini-map - * with the "bpf_map" pointer as the searching key. + * e.g. sk->sk_bpf_storage is the mini-map with the "bpf_map" pointer + * as the searching key. */ -struct bpf_sk_storage_map { +struct bpf_local_storage_map { struct bpf_map map; /* Lookup elem does not require accessing the map. * @@ -46,55 +45,57 @@ struct bpf_sk_storage_map { * link/unlink the elem from the map. Having * multiple buckets to improve contention. */ - struct bucket *buckets; + struct bpf_local_storage_map_bucket *buckets; u32 bucket_log; u16 elem_size; u16 cache_idx; }; -struct bpf_sk_storage_data { +struct bpf_local_storage_data { /* smap is used as the searching key when looking up - * from sk->sk_bpf_storage. + * from the object's bpf_local_storage. * * Put it in the same cacheline as the data to minimize * the number of cachelines access during the cache hit case. */ - struct bpf_sk_storage_map __rcu *smap; + struct bpf_local_storage_map __rcu *smap; u8 data[] __aligned(8); }; -/* Linked to bpf_sk_storage and bpf_sk_storage_map */ -struct bpf_sk_storage_elem { - struct hlist_node map_node; /* Linked to bpf_sk_storage_map */ - struct hlist_node snode; /* Linked to bpf_sk_storage */ - struct bpf_sk_storage __rcu *sk_storage; +/* Linked to bpf_local_storage and bpf_local_storage_map */ +struct bpf_local_storage_elem { + struct hlist_node map_node; /* Linked to bpf_local_storage_map */ + struct hlist_node snode; /* Linked to bpf_local_storage */ + struct bpf_local_storage __rcu *local_storage; struct rcu_head rcu; /* 8 bytes hole */ /* The data is stored in aother cacheline to minimize * the number of cachelines access during a cache hit. */ - struct bpf_sk_storage_data sdata ____cacheline_aligned; + struct bpf_local_storage_data sdata ____cacheline_aligned; }; -#define SELEM(_SDATA) container_of((_SDATA), struct bpf_sk_storage_elem, sdata) +#define SELEM(_SDATA) \ + container_of((_SDATA), struct bpf_local_storage_elem, sdata) #define SDATA(_SELEM) (&(_SELEM)->sdata) -#define BPF_SK_STORAGE_CACHE_SIZE 16 +#define BPF_LOCAL_STORAGE_CACHE_SIZE 16 static DEFINE_SPINLOCK(cache_idx_lock); -static u64 cache_idx_usage_counts[BPF_SK_STORAGE_CACHE_SIZE]; +static u64 cache_idx_usage_counts[BPF_LOCAL_STORAGE_CACHE_SIZE]; -struct bpf_sk_storage { - struct bpf_sk_storage_data __rcu *cache[BPF_SK_STORAGE_CACHE_SIZE]; - struct hlist_head list; /* List of bpf_sk_storage_elem */ - struct sock *sk; /* The sk that owns the the above "list" of - * bpf_sk_storage_elem. +struct bpf_local_storage { + struct bpf_local_storage_data __rcu *cache[BPF_LOCAL_STORAGE_CACHE_SIZE]; + struct hlist_head list; /* List of bpf_local_storage_elem */ + struct sock *owner; /* The object that owns the above "list" of + * bpf_local_storage_elem. */ struct rcu_head rcu; raw_spinlock_t lock; /* Protect adding/removing from the "list" */ }; -static struct bucket *select_bucket(struct bpf_sk_storage_map *smap, - struct bpf_sk_storage_elem *selem) +static struct bpf_local_storage_map_bucket * +select_bucket(struct bpf_local_storage_map *smap, + struct bpf_local_storage_elem *selem) { return &smap->buckets[hash_ptr(selem, smap->bucket_log)]; } @@ -111,21 +112,21 @@ static int omem_charge(struct sock *sk, unsigned int size) return -ENOMEM; } -static bool selem_linked_to_sk(const struct bpf_sk_storage_elem *selem) +static bool selem_linked_to_storage(const struct bpf_local_storage_elem *selem) { return !hlist_unhashed(&selem->snode); } -static bool selem_linked_to_map(const struct bpf_sk_storage_elem *selem) +static bool selem_linked_to_map(const struct bpf_local_storage_elem *selem) { return !hlist_unhashed(&selem->map_node); } -static struct bpf_sk_storage_elem *selem_alloc(struct bpf_sk_storage_map *smap, - struct sock *sk, void *value, - bool charge_omem) +static struct bpf_local_storage_elem * +bpf_selem_alloc(struct bpf_local_storage_map *smap, struct sock *sk, + void *value, bool charge_omem) { - struct bpf_sk_storage_elem *selem; + struct bpf_local_storage_elem *selem; if (charge_omem && omem_charge(sk, smap->elem_size)) return NULL; @@ -143,89 +144,93 @@ static struct bpf_sk_storage_elem *selem_alloc(struct bpf_sk_storage_map *smap, return NULL; } -/* sk_storage->lock must be held and selem->sk_storage == sk_storage. +/* local_storage->lock must be held and selem->local_storage == local_storage. * The caller must ensure selem->smap is still valid to be * dereferenced for its smap->elem_size and smap->cache_idx. */ -static bool __selem_unlink_sk(struct bpf_sk_storage *sk_storage, - struct bpf_sk_storage_elem *selem, - bool uncharge_omem) +static bool +bpf_selem_unlink_storage_nolock(struct bpf_local_storage *local_storage, + struct bpf_local_storage_elem *selem, + bool uncharge_omem) { - struct bpf_sk_storage_map *smap; - bool free_sk_storage; + struct bpf_local_storage_map *smap; + bool free_local_storage; struct sock *sk; smap = rcu_dereference(SDATA(selem)->smap); - sk = sk_storage->sk; + sk = local_storage->owner; - /* All uncharging on sk->sk_omem_alloc must be done first. - * sk may be freed once the last selem is unlinked from sk_storage. + /* All uncharging on the owner must be done first. + * The owner may be freed once the last selem is unlinked + * from local_storage. */ if (uncharge_omem) atomic_sub(smap->elem_size, &sk->sk_omem_alloc); - free_sk_storage = hlist_is_singular_node(&selem->snode, - &sk_storage->list); - if (free_sk_storage) { - atomic_sub(sizeof(struct bpf_sk_storage), &sk->sk_omem_alloc); - sk_storage->sk = NULL; + free_local_storage = hlist_is_singular_node(&selem->snode, + &local_storage->list); + if (free_local_storage) { + atomic_sub(sizeof(struct bpf_local_storage), &sk->sk_omem_alloc); + local_storage->owner = NULL; /* After this RCU_INIT, sk may be freed and cannot be used */ RCU_INIT_POINTER(sk->sk_bpf_storage, NULL); - /* sk_storage is not freed now. sk_storage->lock is - * still held and raw_spin_unlock_bh(&sk_storage->lock) + /* local_storage is not freed now. local_storage->lock is + * still held and raw_spin_unlock_bh(&local_storage->lock) * will be done by the caller. * * Although the unlock will be done under * rcu_read_lock(), it is more intutivie to - * read if kfree_rcu(sk_storage, rcu) is done - * after the raw_spin_unlock_bh(&sk_storage->lock). + * read if kfree_rcu(local_storage, rcu) is done + * after the raw_spin_unlock_bh(&local_storage->lock). * - * Hence, a "bool free_sk_storage" is returned + * Hence, a "bool free_local_storage" is returned * to the caller which then calls the kfree_rcu() * after unlock. */ } hlist_del_init_rcu(&selem->snode); - if (rcu_access_pointer(sk_storage->cache[smap->cache_idx]) == + if (rcu_access_pointer(local_storage->cache[smap->cache_idx]) == SDATA(selem)) - RCU_INIT_POINTER(sk_storage->cache[smap->cache_idx], NULL); + RCU_INIT_POINTER(local_storage->cache[smap->cache_idx], NULL); kfree_rcu(selem, rcu); - return free_sk_storage; + return free_local_storage; } -static void selem_unlink_sk(struct bpf_sk_storage_elem *selem) +static void __bpf_selem_unlink_storage(struct bpf_local_storage_elem *selem) { - struct bpf_sk_storage *sk_storage; - bool free_sk_storage = false; + struct bpf_local_storage *local_storage; + bool free_local_storage = false; - if (unlikely(!selem_linked_to_sk(selem))) + if (unlikely(!selem_linked_to_storage(selem))) /* selem has already been unlinked from sk */ return; - sk_storage = rcu_dereference(selem->sk_storage); - raw_spin_lock_bh(&sk_storage->lock); - if (likely(selem_linked_to_sk(selem))) - free_sk_storage = __selem_unlink_sk(sk_storage, selem, true); - raw_spin_unlock_bh(&sk_storage->lock); + local_storage = rcu_dereference(selem->local_storage); + raw_spin_lock_bh(&local_storage->lock); + if (likely(selem_linked_to_storage(selem))) + free_local_storage = + bpf_selem_unlink_storage_nolock(local_storage, selem, true); + raw_spin_unlock_bh(&local_storage->lock); - if (free_sk_storage) - kfree_rcu(sk_storage, rcu); + if (free_local_storage) + kfree_rcu(local_storage, rcu); } -static void __selem_link_sk(struct bpf_sk_storage *sk_storage, - struct bpf_sk_storage_elem *selem) +static void +bpf_selem_link_storage_nolock(struct bpf_local_storage *local_storage, + struct bpf_local_storage_elem *selem) { - RCU_INIT_POINTER(selem->sk_storage, sk_storage); - hlist_add_head(&selem->snode, &sk_storage->list); + RCU_INIT_POINTER(selem->local_storage, local_storage); + hlist_add_head(&selem->snode, &local_storage->list); } -static void selem_unlink_map(struct bpf_sk_storage_elem *selem) +static void bpf_selem_unlink_map(struct bpf_local_storage_elem *selem) { - struct bpf_sk_storage_map *smap; - struct bucket *b; + struct bpf_local_storage_map *smap; + struct bpf_local_storage_map_bucket *b; if (unlikely(!selem_linked_to_map(selem))) /* selem has already be unlinked from smap */ @@ -239,10 +244,10 @@ static void selem_unlink_map(struct bpf_sk_storage_elem *selem) raw_spin_unlock_bh(&b->lock); } -static void selem_link_map(struct bpf_sk_storage_map *smap, - struct bpf_sk_storage_elem *selem) +static void bpf_selem_link_map(struct bpf_local_storage_map *smap, + struct bpf_local_storage_elem *selem) { - struct bucket *b = select_bucket(smap, selem); + struct bpf_local_storage_map_bucket *b = select_bucket(smap, selem); raw_spin_lock_bh(&b->lock); RCU_INIT_POINTER(SDATA(selem)->smap, smap); @@ -250,31 +255,31 @@ static void selem_link_map(struct bpf_sk_storage_map *smap, raw_spin_unlock_bh(&b->lock); } -static void selem_unlink(struct bpf_sk_storage_elem *selem) +static void bpf_selem_unlink(struct bpf_local_storage_elem *selem) { - /* Always unlink from map before unlinking from sk_storage + /* Always unlink from map before unlinking from local_storage * because selem will be freed after successfully unlinked from - * the sk_storage. + * the local_storage. */ - selem_unlink_map(selem); - selem_unlink_sk(selem); + bpf_selem_unlink_map(selem); + __bpf_selem_unlink_storage(selem); } -static struct bpf_sk_storage_data * -__sk_storage_lookup(struct bpf_sk_storage *sk_storage, - struct bpf_sk_storage_map *smap, - bool cacheit_lockit) +static struct bpf_local_storage_data * +bpf_local_storage_lookup(struct bpf_local_storage *local_storage, + struct bpf_local_storage_map *smap, + bool cacheit_lockit) { - struct bpf_sk_storage_data *sdata; - struct bpf_sk_storage_elem *selem; + struct bpf_local_storage_data *sdata; + struct bpf_local_storage_elem *selem; /* Fast path (cache hit) */ - sdata = rcu_dereference(sk_storage->cache[smap->cache_idx]); + sdata = rcu_dereference(local_storage->cache[smap->cache_idx]); if (sdata && rcu_access_pointer(sdata->smap) == smap) return sdata; /* Slow path (cache miss) */ - hlist_for_each_entry_rcu(selem, &sk_storage->list, snode) + hlist_for_each_entry_rcu(selem, &local_storage->list, snode) if (rcu_access_pointer(SDATA(selem)->smap) == smap) break; @@ -286,33 +291,33 @@ __sk_storage_lookup(struct bpf_sk_storage *sk_storage, /* spinlock is needed to avoid racing with the * parallel delete. Otherwise, publishing an already * deleted sdata to the cache will become a use-after-free - * problem in the next __sk_storage_lookup(). + * problem in the next bpf_local_storage_lookup(). */ - raw_spin_lock_bh(&sk_storage->lock); - if (selem_linked_to_sk(selem)) - rcu_assign_pointer(sk_storage->cache[smap->cache_idx], + raw_spin_lock_bh(&local_storage->lock); + if (selem_linked_to_storage(selem)) + rcu_assign_pointer(local_storage->cache[smap->cache_idx], sdata); - raw_spin_unlock_bh(&sk_storage->lock); + raw_spin_unlock_bh(&local_storage->lock); } return sdata; } -static struct bpf_sk_storage_data * +static struct bpf_local_storage_data * sk_storage_lookup(struct sock *sk, struct bpf_map *map, bool cacheit_lockit) { - struct bpf_sk_storage *sk_storage; - struct bpf_sk_storage_map *smap; + struct bpf_local_storage *sk_storage; + struct bpf_local_storage_map *smap; sk_storage = rcu_dereference(sk->sk_bpf_storage); if (!sk_storage) return NULL; - smap = (struct bpf_sk_storage_map *)map; - return __sk_storage_lookup(sk_storage, smap, cacheit_lockit); + smap = (struct bpf_local_storage_map *)map; + return bpf_local_storage_lookup(sk_storage, smap, cacheit_lockit); } -static int check_flags(const struct bpf_sk_storage_data *old_sdata, +static int check_flags(const struct bpf_local_storage_data *old_sdata, u64 map_flags) { if (old_sdata && (map_flags & ~BPF_F_LOCK) == BPF_NOEXIST) @@ -327,10 +332,10 @@ static int check_flags(const struct bpf_sk_storage_data *old_sdata, } static int sk_storage_alloc(struct sock *sk, - struct bpf_sk_storage_map *smap, - struct bpf_sk_storage_elem *first_selem) + struct bpf_local_storage_map *smap, + struct bpf_local_storage_elem *first_selem) { - struct bpf_sk_storage *prev_sk_storage, *sk_storage; + struct bpf_local_storage *prev_sk_storage, *sk_storage; int err; err = omem_charge(sk, sizeof(*sk_storage)); @@ -344,10 +349,10 @@ static int sk_storage_alloc(struct sock *sk, } INIT_HLIST_HEAD(&sk_storage->list); raw_spin_lock_init(&sk_storage->lock); - sk_storage->sk = sk; + sk_storage->owner = sk; - __selem_link_sk(sk_storage, first_selem); - selem_link_map(smap, first_selem); + bpf_selem_link_storage_nolock(sk_storage, first_selem); + bpf_selem_link_map(smap, first_selem); /* Publish sk_storage to sk. sk->sk_lock cannot be acquired. * Hence, atomic ops is used to set sk->sk_bpf_storage * from NULL to the newly allocated sk_storage ptr. @@ -357,17 +362,17 @@ static int sk_storage_alloc(struct sock *sk, * the sk->sk_bpf_storage, the sk_storage->lock must * be held before setting sk->sk_bpf_storage to NULL. */ - prev_sk_storage = cmpxchg((struct bpf_sk_storage **)&sk->sk_bpf_storage, + prev_sk_storage = cmpxchg((struct bpf_local_storage **)&sk->sk_bpf_storage, NULL, sk_storage); if (unlikely(prev_sk_storage)) { - selem_unlink_map(first_selem); + bpf_selem_unlink_map(first_selem); err = -EAGAIN; goto uncharge; /* Note that even first_selem was linked to smap's * bucket->list, first_selem can be freed immediately * (instead of kfree_rcu) because - * bpf_sk_storage_map_free() does a + * bpf_local_storage_map_free() does a * synchronize_rcu() before walking the bucket->list. * Hence, no one is accessing selem from the * bucket->list under rcu_read_lock(). @@ -387,15 +392,14 @@ static int sk_storage_alloc(struct sock *sk, * Otherwise, it will become a leak (and other memory issues * during map destruction). */ -static struct bpf_sk_storage_data *sk_storage_update(struct sock *sk, - struct bpf_map *map, - void *value, - u64 map_flags) +static struct bpf_local_storage_data * +bpf_local_storage_update(struct sock *sk, struct bpf_map *map, void *value, + u64 map_flags) { - struct bpf_sk_storage_data *old_sdata = NULL; - struct bpf_sk_storage_elem *selem; - struct bpf_sk_storage *sk_storage; - struct bpf_sk_storage_map *smap; + struct bpf_local_storage_data *old_sdata = NULL; + struct bpf_local_storage_elem *selem; + struct bpf_local_storage *local_storage; + struct bpf_local_storage_map *smap; int err; /* BPF_EXIST and BPF_NOEXIST cannot be both set */ @@ -404,15 +408,15 @@ static struct bpf_sk_storage_data *sk_storage_update(struct sock *sk, unlikely((map_flags & BPF_F_LOCK) && !map_value_has_spin_lock(map))) return ERR_PTR(-EINVAL); - smap = (struct bpf_sk_storage_map *)map; - sk_storage = rcu_dereference(sk->sk_bpf_storage); - if (!sk_storage || hlist_empty(&sk_storage->list)) { - /* Very first elem for this sk */ + smap = (struct bpf_local_storage_map *)map; + local_storage = rcu_dereference(sk->sk_bpf_storage); + if (!local_storage || hlist_empty(&local_storage->list)) { + /* Very first elem for the owner */ err = check_flags(NULL, map_flags); if (err) return ERR_PTR(err); - selem = selem_alloc(smap, sk, value, true); + selem = bpf_selem_alloc(smap, sk, value, true); if (!selem) return ERR_PTR(-ENOMEM); @@ -428,25 +432,26 @@ static struct bpf_sk_storage_data *sk_storage_update(struct sock *sk, if ((map_flags & BPF_F_LOCK) && !(map_flags & BPF_NOEXIST)) { /* Hoping to find an old_sdata to do inline update - * such that it can avoid taking the sk_storage->lock + * such that it can avoid taking the local_storage->lock * and changing the lists. */ - old_sdata = __sk_storage_lookup(sk_storage, smap, false); + old_sdata = + bpf_local_storage_lookup(local_storage, smap, false); err = check_flags(old_sdata, map_flags); if (err) return ERR_PTR(err); - if (old_sdata && selem_linked_to_sk(SELEM(old_sdata))) { + if (old_sdata && selem_linked_to_storage(SELEM(old_sdata))) { copy_map_value_locked(map, old_sdata->data, value, false); return old_sdata; } } - raw_spin_lock_bh(&sk_storage->lock); + raw_spin_lock_bh(&local_storage->lock); - /* Recheck sk_storage->list under sk_storage->lock */ - if (unlikely(hlist_empty(&sk_storage->list))) { - /* A parallel del is happening and sk_storage is going + /* Recheck local_storage->list under local_storage->lock */ + if (unlikely(hlist_empty(&local_storage->list))) { + /* A parallel del is happening and local_storage is going * away. It has just been checked before, so very * unlikely. Return instead of retry to keep things * simple. @@ -455,7 +460,7 @@ static struct bpf_sk_storage_data *sk_storage_update(struct sock *sk, goto unlock_err; } - old_sdata = __sk_storage_lookup(sk_storage, smap, false); + old_sdata = bpf_local_storage_lookup(local_storage, smap, false); err = check_flags(old_sdata, map_flags); if (err) goto unlock_err; @@ -466,50 +471,52 @@ static struct bpf_sk_storage_data *sk_storage_update(struct sock *sk, goto unlock; } - /* sk_storage->lock is held. Hence, we are sure + /* local_storage->lock is held. Hence, we are sure * we can unlink and uncharge the old_sdata successfully * later. Hence, instead of charging the new selem now * and then uncharge the old selem later (which may cause * a potential but unnecessary charge failure), avoid taking * a charge at all here (the "!old_sdata" check) and the - * old_sdata will not be uncharged later during __selem_unlink_sk(). + * old_sdata will not be uncharged later during + * bpf_selem_unlink_storage_nolock(). */ - selem = selem_alloc(smap, sk, value, !old_sdata); + selem = bpf_selem_alloc(smap, sk, value, !old_sdata); if (!selem) { err = -ENOMEM; goto unlock_err; } /* First, link the new selem to the map */ - selem_link_map(smap, selem); + bpf_selem_link_map(smap, selem); - /* Second, link (and publish) the new selem to sk_storage */ - __selem_link_sk(sk_storage, selem); + /* Second, link (and publish) the new selem to local_storage */ + bpf_selem_link_storage_nolock(local_storage, selem); /* Third, remove old selem, SELEM(old_sdata) */ if (old_sdata) { - selem_unlink_map(SELEM(old_sdata)); - __selem_unlink_sk(sk_storage, SELEM(old_sdata), false); + bpf_selem_unlink_map(SELEM(old_sdata)); + bpf_selem_unlink_storage_nolock(local_storage, SELEM(old_sdata), + false); } unlock: - raw_spin_unlock_bh(&sk_storage->lock); + raw_spin_unlock_bh(&local_storage->lock); return SDATA(selem); unlock_err: - raw_spin_unlock_bh(&sk_storage->lock); + raw_spin_unlock_bh(&local_storage->lock); return ERR_PTR(err); } static int sk_storage_delete(struct sock *sk, struct bpf_map *map) { - struct bpf_sk_storage_data *sdata; + struct bpf_local_storage_data *sdata; sdata = sk_storage_lookup(sk, map, false); if (!sdata) return -ENOENT; - selem_unlink(SELEM(sdata)); + bpf_selem_unlink(SELEM(sdata)); return 0; } @@ -521,7 +528,7 @@ static u16 cache_idx_get(void) spin_lock(&cache_idx_lock); - for (i = 0; i < BPF_SK_STORAGE_CACHE_SIZE; i++) { + for (i = 0; i < BPF_LOCAL_STORAGE_CACHE_SIZE; i++) { if (cache_idx_usage_counts[i] < min_usage) { min_usage = cache_idx_usage_counts[i]; res = i; @@ -548,8 +555,8 @@ static void cache_idx_free(u16 idx) /* Called by __sk_destruct() & bpf_sk_storage_clone() */ void bpf_sk_storage_free(struct sock *sk) { - struct bpf_sk_storage_elem *selem; - struct bpf_sk_storage *sk_storage; + struct bpf_local_storage_elem *selem; + struct bpf_local_storage *sk_storage; bool free_sk_storage = false; struct hlist_node *n; @@ -565,7 +572,7 @@ void bpf_sk_storage_free(struct sock *sk) * Thus, no elem can be added-to or deleted-from the * sk_storage->list by the bpf_prog or by the bpf-map's syscall. * - * It is racing with bpf_sk_storage_map_free() alone + * It is racing with bpf_local_storage_map_free() alone * when unlinking elem from the sk_storage->list and * the map's bucket->list. */ @@ -574,8 +581,9 @@ void bpf_sk_storage_free(struct sock *sk) /* Always unlink from map before unlinking from * sk_storage. */ - selem_unlink_map(selem); - free_sk_storage = __selem_unlink_sk(sk_storage, selem, true); + bpf_selem_unlink_map(selem); + free_sk_storage = bpf_selem_unlink_storage_nolock(sk_storage, + selem, true); } raw_spin_unlock_bh(&sk_storage->lock); rcu_read_unlock(); @@ -584,14 +592,14 @@ void bpf_sk_storage_free(struct sock *sk) kfree_rcu(sk_storage, rcu); } -static void bpf_sk_storage_map_free(struct bpf_map *map) +static void bpf_local_storage_map_free(struct bpf_map *map) { - struct bpf_sk_storage_elem *selem; - struct bpf_sk_storage_map *smap; - struct bucket *b; + struct bpf_local_storage_elem *selem; + struct bpf_local_storage_map *smap; + struct bpf_local_storage_map_bucket *b; unsigned int i; - smap = (struct bpf_sk_storage_map *)map; + smap = (struct bpf_local_storage_map *)map; cache_idx_free(smap->cache_idx); @@ -604,10 +612,10 @@ static void bpf_sk_storage_map_free(struct bpf_map *map) /* bpf prog and the userspace can no longer access this map * now. No new selem (of this map) can be added - * to the sk->sk_bpf_storage or to the map bucket's list. + * to the owner->storage or to the map bucket's list. * * The elem of this map can be cleaned up here - * or + * or when the storage is freed e.g. * by bpf_sk_storage_free() during __sk_destruct(). */ for (i = 0; i < (1U << smap->bucket_log); i++) { @@ -615,26 +623,26 @@ static void bpf_sk_storage_map_free(struct bpf_map *map) rcu_read_lock(); /* No one is adding to b->list now */ - while ((selem = hlist_entry_safe(rcu_dereference_raw(hlist_first_rcu(&b->list)), - struct bpf_sk_storage_elem, - map_node))) { - selem_unlink(selem); + while ((selem = hlist_entry_safe( + rcu_dereference_raw(hlist_first_rcu(&b->list)), + struct bpf_local_storage_elem, map_node))) { + bpf_selem_unlink(selem); cond_resched_rcu(); } rcu_read_unlock(); } - /* bpf_sk_storage_free() may still need to access the map. - * e.g. bpf_sk_storage_free() has unlinked selem from the map + /* While freeing the storage we may still need to access the map. + * + * e.g. when bpf_sk_storage_free() has unlinked selem from the map * which then made the above while((selem = ...)) loop - * exited immediately. + * exit immediately. * - * However, the bpf_sk_storage_free() still needs to access - * the smap->elem_size to do the uncharging in - * __selem_unlink_sk(). + * However, while freeing the storage one still needs to access the + * smap->elem_size to do the uncharging in + * bpf_selem_unlink_storage_nolock(). * - * Hence, wait another rcu grace period for the - * bpf_sk_storage_free() to finish. + * Hence, wait another rcu grace period for the storage to be freed. */ synchronize_rcu(); @@ -645,14 +653,15 @@ static void bpf_sk_storage_map_free(struct bpf_map *map) /* U16_MAX is much more than enough for sk local storage * considering a tcp_sock is ~2k. */ -#define MAX_VALUE_SIZE \ +#define BPF_LOCAL_STORAGE_MAX_VALUE_SIZE \ min_t(u32, \ - (KMALLOC_MAX_SIZE - MAX_BPF_STACK - sizeof(struct bpf_sk_storage_elem)), \ - (U16_MAX - sizeof(struct bpf_sk_storage_elem))) + (KMALLOC_MAX_SIZE - MAX_BPF_STACK - \ + sizeof(struct bpf_local_storage_elem)), \ + (U16_MAX - sizeof(struct bpf_local_storage_elem))) -static int bpf_sk_storage_map_alloc_check(union bpf_attr *attr) +static int bpf_local_storage_map_alloc_check(union bpf_attr *attr) { - if (attr->map_flags & ~SK_STORAGE_CREATE_FLAG_MASK || + if (attr->map_flags & ~BPF_LOCAL_STORAGE_CREATE_FLAG_MASK || !(attr->map_flags & BPF_F_NO_PREALLOC) || attr->max_entries || attr->key_size != sizeof(int) || !attr->value_size || @@ -663,15 +672,15 @@ static int bpf_sk_storage_map_alloc_check(union bpf_attr *attr) if (!bpf_capable()) return -EPERM; - if (attr->value_size > MAX_VALUE_SIZE) + if (attr->value_size > BPF_LOCAL_STORAGE_MAX_VALUE_SIZE) return -E2BIG; return 0; } -static struct bpf_map *bpf_sk_storage_map_alloc(union bpf_attr *attr) +static struct bpf_map *bpf_local_storage_map_alloc(union bpf_attr *attr) { - struct bpf_sk_storage_map *smap; + struct bpf_local_storage_map *smap; unsigned int i; u32 nbuckets; u64 cost; @@ -707,7 +716,8 @@ static struct bpf_map *bpf_sk_storage_map_alloc(union bpf_attr *attr) raw_spin_lock_init(&smap->buckets[i].lock); } - smap->elem_size = sizeof(struct bpf_sk_storage_elem) + attr->value_size; + smap->elem_size = + sizeof(struct bpf_local_storage_elem) + attr->value_size; smap->cache_idx = cache_idx_get(); return &smap->map; @@ -719,10 +729,10 @@ static int notsupp_get_next_key(struct bpf_map *map, void *key, return -ENOTSUPP; } -static int bpf_sk_storage_map_check_btf(const struct bpf_map *map, - const struct btf *btf, - const struct btf_type *key_type, - const struct btf_type *value_type) +static int bpf_local_storage_map_check_btf(const struct bpf_map *map, + const struct btf *btf, + const struct btf_type *key_type, + const struct btf_type *value_type) { u32 int_data; @@ -738,7 +748,7 @@ static int bpf_sk_storage_map_check_btf(const struct bpf_map *map, static void *bpf_fd_sk_storage_lookup_elem(struct bpf_map *map, void *key) { - struct bpf_sk_storage_data *sdata; + struct bpf_local_storage_data *sdata; struct socket *sock; int fd, err; @@ -756,14 +766,15 @@ static void *bpf_fd_sk_storage_lookup_elem(struct bpf_map *map, void *key) static int bpf_fd_sk_storage_update_elem(struct bpf_map *map, void *key, void *value, u64 map_flags) { - struct bpf_sk_storage_data *sdata; + struct bpf_local_storage_data *sdata; struct socket *sock; int fd, err; fd = *(int *)key; sock = sockfd_lookup(fd, &err); if (sock) { - sdata = sk_storage_update(sock->sk, map, value, map_flags); + sdata = bpf_local_storage_update(sock->sk, map, value, + map_flags); sockfd_put(sock); return PTR_ERR_OR_ZERO(sdata); } @@ -787,14 +798,14 @@ static int bpf_fd_sk_storage_delete_elem(struct bpf_map *map, void *key) return err; } -static struct bpf_sk_storage_elem * +static struct bpf_local_storage_elem * bpf_sk_storage_clone_elem(struct sock *newsk, - struct bpf_sk_storage_map *smap, - struct bpf_sk_storage_elem *selem) + struct bpf_local_storage_map *smap, + struct bpf_local_storage_elem *selem) { - struct bpf_sk_storage_elem *copy_selem; + struct bpf_local_storage_elem *copy_selem; - copy_selem = selem_alloc(smap, newsk, NULL, true); + copy_selem = bpf_selem_alloc(smap, newsk, NULL, true); if (!copy_selem) return NULL; @@ -810,9 +821,9 @@ bpf_sk_storage_clone_elem(struct sock *newsk, int bpf_sk_storage_clone(const struct sock *sk, struct sock *newsk) { - struct bpf_sk_storage *new_sk_storage = NULL; - struct bpf_sk_storage *sk_storage; - struct bpf_sk_storage_elem *selem; + struct bpf_local_storage *new_sk_storage = NULL; + struct bpf_local_storage *sk_storage; + struct bpf_local_storage_elem *selem; int ret = 0; RCU_INIT_POINTER(newsk->sk_bpf_storage, NULL); @@ -824,8 +835,8 @@ int bpf_sk_storage_clone(const struct sock *sk, struct sock *newsk) goto out; hlist_for_each_entry_rcu(selem, &sk_storage->list, snode) { - struct bpf_sk_storage_elem *copy_selem; - struct bpf_sk_storage_map *smap; + struct bpf_local_storage_elem *copy_selem; + struct bpf_local_storage_map *smap; struct bpf_map *map; smap = rcu_dereference(SDATA(selem)->smap); @@ -833,7 +844,7 @@ int bpf_sk_storage_clone(const struct sock *sk, struct sock *newsk) continue; /* Note that for lockless listeners adding new element - * here can race with cleanup in bpf_sk_storage_map_free. + * here can race with cleanup in bpf_local_storage_map_free. * Try to grab map refcnt to make sure that it's still * alive and prevent concurrent removal. */ @@ -849,8 +860,8 @@ int bpf_sk_storage_clone(const struct sock *sk, struct sock *newsk) } if (new_sk_storage) { - selem_link_map(smap, copy_selem); - __selem_link_sk(new_sk_storage, copy_selem); + bpf_selem_link_map(smap, copy_selem); + bpf_selem_link_storage_nolock(new_sk_storage, copy_selem); } else { ret = sk_storage_alloc(newsk, smap, copy_selem); if (ret) { @@ -861,7 +872,8 @@ int bpf_sk_storage_clone(const struct sock *sk, struct sock *newsk) goto out; } - new_sk_storage = rcu_dereference(copy_selem->sk_storage); + new_sk_storage = + rcu_dereference(copy_selem->local_storage); } bpf_map_put(map); } @@ -879,7 +891,7 @@ int bpf_sk_storage_clone(const struct sock *sk, struct sock *newsk) BPF_CALL_4(bpf_sk_storage_get, struct bpf_map *, map, struct sock *, sk, void *, value, u64, flags) { - struct bpf_sk_storage_data *sdata; + struct bpf_local_storage_data *sdata; if (flags > BPF_SK_STORAGE_GET_F_CREATE) return (unsigned long)NULL; @@ -895,7 +907,7 @@ BPF_CALL_4(bpf_sk_storage_get, struct bpf_map *, map, struct sock *, sk, * destruction). */ refcount_inc_not_zero(&sk->sk_refcnt)) { - sdata = sk_storage_update(sk, map, value, BPF_NOEXIST); + sdata = bpf_local_storage_update(sk, map, value, BPF_NOEXIST); /* sk must be a fullsock (guaranteed by verifier), * so sock_gen_put() is unnecessary. */ @@ -922,15 +934,15 @@ BPF_CALL_2(bpf_sk_storage_delete, struct bpf_map *, map, struct sock *, sk) static int sk_storage_map_btf_id; const struct bpf_map_ops sk_storage_map_ops = { - .map_alloc_check = bpf_sk_storage_map_alloc_check, - .map_alloc = bpf_sk_storage_map_alloc, - .map_free = bpf_sk_storage_map_free, + .map_alloc_check = bpf_local_storage_map_alloc_check, + .map_alloc = bpf_local_storage_map_alloc, + .map_free = bpf_local_storage_map_free, .map_get_next_key = notsupp_get_next_key, .map_lookup_elem = bpf_fd_sk_storage_lookup_elem, .map_update_elem = bpf_fd_sk_storage_update_elem, .map_delete_elem = bpf_fd_sk_storage_delete_elem, - .map_check_btf = bpf_sk_storage_map_check_btf, - .map_btf_name = "bpf_sk_storage_map", + .map_check_btf = bpf_local_storage_map_check_btf, + .map_btf_name = "bpf_local_storage_map", .map_btf_id = &sk_storage_map_btf_id, }; @@ -1022,7 +1034,7 @@ bpf_sk_storage_diag_alloc(const struct nlattr *nla_stgs) u32 nr_maps = 0; int rem, err; - /* bpf_sk_storage_map is currently limited to CAP_SYS_ADMIN as + /* bpf_local_storage_map is currently limited to CAP_SYS_ADMIN as * the map_alloc_check() side also does. */ if (!bpf_capable()) @@ -1072,13 +1084,13 @@ bpf_sk_storage_diag_alloc(const struct nlattr *nla_stgs) } EXPORT_SYMBOL_GPL(bpf_sk_storage_diag_alloc); -static int diag_get(struct bpf_sk_storage_data *sdata, struct sk_buff *skb) +static int diag_get(struct bpf_local_storage_data *sdata, struct sk_buff *skb) { struct nlattr *nla_stg, *nla_value; - struct bpf_sk_storage_map *smap; + struct bpf_local_storage_map *smap; /* It cannot exceed max nlattr's payload */ - BUILD_BUG_ON(U16_MAX - NLA_HDRLEN < MAX_VALUE_SIZE); + BUILD_BUG_ON(U16_MAX - NLA_HDRLEN < BPF_LOCAL_STORAGE_MAX_VALUE_SIZE); nla_stg = nla_nest_start(skb, SK_DIAG_BPF_STORAGE); if (!nla_stg) @@ -1114,9 +1126,9 @@ static int bpf_sk_storage_diag_put_all(struct sock *sk, struct sk_buff *skb, { /* stg_array_type (e.g. INET_DIAG_BPF_SK_STORAGES) */ unsigned int diag_size = nla_total_size(0); - struct bpf_sk_storage *sk_storage; - struct bpf_sk_storage_elem *selem; - struct bpf_sk_storage_map *smap; + struct bpf_local_storage *sk_storage; + struct bpf_local_storage_elem *selem; + struct bpf_local_storage_map *smap; struct nlattr *nla_stgs; unsigned int saved_len; int err = 0; @@ -1169,8 +1181,8 @@ int bpf_sk_storage_diag_put(struct bpf_sk_storage_diag *diag, { /* stg_array_type (e.g. INET_DIAG_BPF_SK_STORAGES) */ unsigned int diag_size = nla_total_size(0); - struct bpf_sk_storage *sk_storage; - struct bpf_sk_storage_data *sdata; + struct bpf_local_storage *sk_storage; + struct bpf_local_storage_data *sdata; struct nlattr *nla_stgs; unsigned int saved_len; int err = 0; @@ -1197,8 +1209,8 @@ int bpf_sk_storage_diag_put(struct bpf_sk_storage_diag *diag, saved_len = skb->len; for (i = 0; i < diag->nr_maps; i++) { - sdata = __sk_storage_lookup(sk_storage, - (struct bpf_sk_storage_map *)diag->maps[i], + sdata = bpf_local_storage_lookup(sk_storage, + (struct bpf_local_storage_map *)diag->maps[i], false); if (!sdata) @@ -1235,19 +1247,19 @@ struct bpf_iter_seq_sk_storage_map_info { unsigned skip_elems; }; -static struct bpf_sk_storage_elem * +static struct bpf_local_storage_elem * bpf_sk_storage_map_seq_find_next(struct bpf_iter_seq_sk_storage_map_info *info, - struct bpf_sk_storage_elem *prev_selem) + struct bpf_local_storage_elem *prev_selem) { - struct bpf_sk_storage *sk_storage; - struct bpf_sk_storage_elem *selem; + struct bpf_local_storage *sk_storage; + struct bpf_local_storage_elem *selem; u32 skip_elems = info->skip_elems; - struct bpf_sk_storage_map *smap; + struct bpf_local_storage_map *smap; u32 bucket_id = info->bucket_id; u32 i, count, n_buckets; - struct bucket *b; + struct bpf_local_storage_map_bucket *b; - smap = (struct bpf_sk_storage_map *)info->map; + smap = (struct bpf_local_storage_map *)info->map; n_buckets = 1U << smap->bucket_log; if (bucket_id >= n_buckets) return NULL; @@ -1257,7 +1269,7 @@ bpf_sk_storage_map_seq_find_next(struct bpf_iter_seq_sk_storage_map_info *info, count = 0; while (selem) { selem = hlist_entry_safe(selem->map_node.next, - struct bpf_sk_storage_elem, map_node); + struct bpf_local_storage_elem, map_node); if (!selem) { /* not found, unlock and go to the next bucket */ b = &smap->buckets[bucket_id++]; @@ -1265,7 +1277,7 @@ bpf_sk_storage_map_seq_find_next(struct bpf_iter_seq_sk_storage_map_info *info, skip_elems = 0; break; } - sk_storage = rcu_dereference_raw(selem->sk_storage); + sk_storage = rcu_dereference_raw(selem->local_storage); if (sk_storage) { info->skip_elems = skip_elems + count; return selem; @@ -1278,7 +1290,7 @@ bpf_sk_storage_map_seq_find_next(struct bpf_iter_seq_sk_storage_map_info *info, raw_spin_lock_bh(&b->lock); count = 0; hlist_for_each_entry(selem, &b->list, map_node) { - sk_storage = rcu_dereference_raw(selem->sk_storage); + sk_storage = rcu_dereference_raw(selem->local_storage); if (sk_storage && count >= skip_elems) { info->bucket_id = i; info->skip_elems = count; @@ -1297,7 +1309,7 @@ bpf_sk_storage_map_seq_find_next(struct bpf_iter_seq_sk_storage_map_info *info, static void *bpf_sk_storage_map_seq_start(struct seq_file *seq, loff_t *pos) { - struct bpf_sk_storage_elem *selem; + struct bpf_local_storage_elem *selem; selem = bpf_sk_storage_map_seq_find_next(seq->private, NULL); if (!selem) @@ -1330,11 +1342,11 @@ DEFINE_BPF_ITER_FUNC(bpf_sk_storage_map, struct bpf_iter_meta *meta, void *value) static int __bpf_sk_storage_map_seq_show(struct seq_file *seq, - struct bpf_sk_storage_elem *selem) + struct bpf_local_storage_elem *selem) { struct bpf_iter_seq_sk_storage_map_info *info = seq->private; struct bpf_iter__bpf_sk_storage_map ctx = {}; - struct bpf_sk_storage *sk_storage; + struct bpf_local_storage *sk_storage; struct bpf_iter_meta meta; struct bpf_prog *prog; int ret = 0; @@ -1345,8 +1357,8 @@ static int __bpf_sk_storage_map_seq_show(struct seq_file *seq, ctx.meta = &meta; ctx.map = info->map; if (selem) { - sk_storage = rcu_dereference_raw(selem->sk_storage); - ctx.sk = sk_storage->sk; + sk_storage = rcu_dereference_raw(selem->local_storage); + ctx.sk = sk_storage->owner; ctx.value = SDATA(selem)->data; } ret = bpf_iter_run_prog(prog, &ctx); @@ -1363,13 +1375,13 @@ static int bpf_sk_storage_map_seq_show(struct seq_file *seq, void *v) static void bpf_sk_storage_map_seq_stop(struct seq_file *seq, void *v) { struct bpf_iter_seq_sk_storage_map_info *info = seq->private; - struct bpf_sk_storage_map *smap; - struct bucket *b; + struct bpf_local_storage_map *smap; + struct bpf_local_storage_map_bucket *b; if (!v) { (void)__bpf_sk_storage_map_seq_show(seq, v); } else { - smap = (struct bpf_sk_storage_map *)info->map; + smap = (struct bpf_local_storage_map *)info->map; b = &smap->buckets[info->bucket_id]; raw_spin_unlock_bh(&b->lock); } From patchwork Tue Aug 25 18:29:14 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: KP Singh X-Patchwork-Id: 1351225 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Original-To: incoming-bpf@patchwork.ozlabs.org Delivered-To: patchwork-incoming-bpf@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org (client-ip=23.128.96.18; helo=vger.kernel.org; envelope-from=bpf-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=chromium.org Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=chromium.org header.i=@chromium.org header.a=rsa-sha256 header.s=google header.b=jDfTkfnu; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by ozlabs.org (Postfix) with ESMTP id 4Bbcwp6cS3z9sTN for ; Wed, 26 Aug 2020 04:30:02 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726751AbgHYSaC (ORCPT ); Tue, 25 Aug 2020 14:30:02 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42686 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726556AbgHYS31 (ORCPT ); Tue, 25 Aug 2020 14:29:27 -0400 Received: from mail-ej1-x642.google.com (mail-ej1-x642.google.com [IPv6:2a00:1450:4864:20::642]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 009A0C061755 for ; Tue, 25 Aug 2020 11:29:26 -0700 (PDT) Received: by mail-ej1-x642.google.com with SMTP id dp2so12570011ejc.4 for ; Tue, 25 Aug 2020 11:29:26 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=chromium.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=7EGajeDiJIxrlPr3Y0NRvAuPtl3AEsFbIm2NTrH6ZEc=; b=jDfTkfnuojbMaFUK6N1gavlgXoz0UOitqddOW0kO7XGsBtyCIdWvVGP4Ali167FpVl +8fZXzYunItxxegZYSeFNDr+lSOHwnPn/FI8sdZL/Jmm+9TIQS005cX+6+E9SOoujGw1 aSz2+ffiizsI2Lg0F9RC26uoeZVLjUSIq012E= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=7EGajeDiJIxrlPr3Y0NRvAuPtl3AEsFbIm2NTrH6ZEc=; b=r7ZaFq/pOfXIFcmpjFfA9BD5F+wwlaChsqKvv1aPppE0lXejv2ra/z75RsdK2wNZUp kX+ba6NbLNvLx9p+xS2DrVg4xnJg2HV9o9LfZJFnnyEQIYMkMhKuoTsb6LV+noK4rkRq z9YFhOuKWzIFZjV5EGlC1JI6YW35gyYWRMjFyEmIy2qGLZ8AalbVqgusbCsI3nXpRq3N bj4X+qVcTEPwBbUfmg289xVSdcEfC90wcodGxMPluXZJWPgeyGKTwT+5tACVzKUpCkdU v1SHfbS6UIz0b4FW6oHMNe6PCo9wyA2yin2hn6n1enVcIyYrgXBGbppW/h8QlhtMHrFs jyNw== X-Gm-Message-State: AOAM532kfC7CW5awQTCewZrXbxXTubyXU0bEtkN5SidXBYy0dsz7TZuP Bo3LcL5N8jgmkYKKkawQKJFB+A== X-Google-Smtp-Source: ABdhPJxwWmTdSKRTZoliBYv1yxoma3Uae22mPp0EOYaM+udhulIxdQLSMBipzg5LcWTHCcC3UvKnbg== X-Received: by 2002:a17:906:4f11:: with SMTP id t17mr8838115eju.371.1598380165443; Tue, 25 Aug 2020 11:29:25 -0700 (PDT) Received: from kpsingh.zrh.corp.google.com ([81.6.44.51]) by smtp.gmail.com with ESMTPSA id dr21sm15323286ejc.112.2020.08.25.11.29.24 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 25 Aug 2020 11:29:24 -0700 (PDT) From: KP Singh To: linux-kernel@vger.kernel.org, bpf@vger.kernel.org, linux-security-module@vger.kernel.org Cc: Martin KaFai Lau , Alexei Starovoitov , Daniel Borkmann , Paul Turner , Jann Horn , Florent Revest Subject: [PATCH bpf-next v10 2/7] bpf: Generalize caching for sk_storage. Date: Tue, 25 Aug 2020 20:29:14 +0200 Message-Id: <20200825182919.1118197-3-kpsingh@chromium.org> X-Mailer: git-send-email 2.28.0.297.g1956fa8f8d-goog In-Reply-To: <20200825182919.1118197-1-kpsingh@chromium.org> References: <20200825182919.1118197-1-kpsingh@chromium.org> MIME-Version: 1.0 Sender: bpf-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org From: KP Singh Provide the a ability to define local storage caches on a per-object type basis. The caches and caching indices for different objects should not be inter-mixed as suggested in: https://lore.kernel.org/bpf/20200630193441.kdwnkestulg5erii@kafai-mbp.dhcp.thefacebook.com/ "Caching a sk-storage at idx=0 of a sk should not stop an inode-storage to be cached at the same idx of a inode." Acked-by: Martin KaFai Lau Signed-off-by: KP Singh --- include/net/bpf_sk_storage.h | 19 +++++++++++++++++++ net/core/bpf_sk_storage.c | 31 +++++++++++++++---------------- 2 files changed, 34 insertions(+), 16 deletions(-) diff --git a/include/net/bpf_sk_storage.h b/include/net/bpf_sk_storage.h index 5036c94c0503..950c5aaba15e 100644 --- a/include/net/bpf_sk_storage.h +++ b/include/net/bpf_sk_storage.h @@ -3,6 +3,9 @@ #ifndef _BPF_SK_STORAGE_H #define _BPF_SK_STORAGE_H +#include +#include + struct sock; void bpf_sk_storage_free(struct sock *sk); @@ -15,6 +18,22 @@ struct sk_buff; struct nlattr; struct sock; +#define BPF_LOCAL_STORAGE_CACHE_SIZE 16 + +struct bpf_local_storage_cache { + spinlock_t idx_lock; + u64 idx_usage_counts[BPF_LOCAL_STORAGE_CACHE_SIZE]; +}; + +#define DEFINE_BPF_STORAGE_CACHE(name) \ +static struct bpf_local_storage_cache name = { \ + .idx_lock = __SPIN_LOCK_UNLOCKED(name.idx_lock), \ +} + +u16 bpf_local_storage_cache_idx_get(struct bpf_local_storage_cache *cache); +void bpf_local_storage_cache_idx_free(struct bpf_local_storage_cache *cache, + u16 idx); + #ifdef CONFIG_BPF_SYSCALL int bpf_sk_storage_clone(const struct sock *sk, struct sock *newsk); struct bpf_sk_storage_diag * diff --git a/net/core/bpf_sk_storage.c b/net/core/bpf_sk_storage.c index f975e2d01207..ec61ee7c7ee4 100644 --- a/net/core/bpf_sk_storage.c +++ b/net/core/bpf_sk_storage.c @@ -14,6 +14,8 @@ #define BPF_LOCAL_STORAGE_CREATE_FLAG_MASK (BPF_F_NO_PREALLOC | BPF_F_CLONE) +DEFINE_BPF_STORAGE_CACHE(sk_cache); + struct bpf_local_storage_map_bucket { struct hlist_head list; raw_spinlock_t lock; @@ -78,10 +80,6 @@ struct bpf_local_storage_elem { #define SELEM(_SDATA) \ container_of((_SDATA), struct bpf_local_storage_elem, sdata) #define SDATA(_SELEM) (&(_SELEM)->sdata) -#define BPF_LOCAL_STORAGE_CACHE_SIZE 16 - -static DEFINE_SPINLOCK(cache_idx_lock); -static u64 cache_idx_usage_counts[BPF_LOCAL_STORAGE_CACHE_SIZE]; struct bpf_local_storage { struct bpf_local_storage_data __rcu *cache[BPF_LOCAL_STORAGE_CACHE_SIZE]; @@ -521,16 +519,16 @@ static int sk_storage_delete(struct sock *sk, struct bpf_map *map) return 0; } -static u16 cache_idx_get(void) +u16 bpf_local_storage_cache_idx_get(struct bpf_local_storage_cache *cache) { u64 min_usage = U64_MAX; u16 i, res = 0; - spin_lock(&cache_idx_lock); + spin_lock(&cache->idx_lock); for (i = 0; i < BPF_LOCAL_STORAGE_CACHE_SIZE; i++) { - if (cache_idx_usage_counts[i] < min_usage) { - min_usage = cache_idx_usage_counts[i]; + if (cache->idx_usage_counts[i] < min_usage) { + min_usage = cache->idx_usage_counts[i]; res = i; /* Found a free cache_idx */ @@ -538,18 +536,19 @@ static u16 cache_idx_get(void) break; } } - cache_idx_usage_counts[res]++; + cache->idx_usage_counts[res]++; - spin_unlock(&cache_idx_lock); + spin_unlock(&cache->idx_lock); return res; } -static void cache_idx_free(u16 idx) +void bpf_local_storage_cache_idx_free(struct bpf_local_storage_cache *cache, + u16 idx) { - spin_lock(&cache_idx_lock); - cache_idx_usage_counts[idx]--; - spin_unlock(&cache_idx_lock); + spin_lock(&cache->idx_lock); + cache->idx_usage_counts[idx]--; + spin_unlock(&cache->idx_lock); } /* Called by __sk_destruct() & bpf_sk_storage_clone() */ @@ -601,7 +600,7 @@ static void bpf_local_storage_map_free(struct bpf_map *map) smap = (struct bpf_local_storage_map *)map; - cache_idx_free(smap->cache_idx); + bpf_local_storage_cache_idx_free(&sk_cache, smap->cache_idx); /* Note that this map might be concurrently cloned from * bpf_sk_storage_clone. Wait for any existing bpf_sk_storage_clone @@ -718,7 +717,7 @@ static struct bpf_map *bpf_local_storage_map_alloc(union bpf_attr *attr) smap->elem_size = sizeof(struct bpf_local_storage_elem) + attr->value_size; - smap->cache_idx = cache_idx_get(); + smap->cache_idx = bpf_local_storage_cache_idx_get(&sk_cache); return &smap->map; } From patchwork Tue Aug 25 18:29:15 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: KP Singh X-Patchwork-Id: 1351223 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Original-To: incoming-bpf@patchwork.ozlabs.org Delivered-To: patchwork-incoming-bpf@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org (client-ip=23.128.96.18; helo=vger.kernel.org; envelope-from=bpf-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=chromium.org Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=chromium.org header.i=@chromium.org header.a=rsa-sha256 header.s=google header.b=eHkmTWht; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by ozlabs.org (Postfix) with ESMTP id 4Bbcwf3dm5z9sTX for ; Wed, 26 Aug 2020 04:29:54 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726664AbgHYS3w (ORCPT ); Tue, 25 Aug 2020 14:29:52 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42692 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726609AbgHYS32 (ORCPT ); Tue, 25 Aug 2020 14:29:28 -0400 Received: from mail-ej1-x643.google.com (mail-ej1-x643.google.com [IPv6:2a00:1450:4864:20::643]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6484EC061756 for ; Tue, 25 Aug 2020 11:29:28 -0700 (PDT) Received: by mail-ej1-x643.google.com with SMTP id j25so8684569ejk.9 for ; Tue, 25 Aug 2020 11:29:28 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=chromium.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=ggFFvKCon8n+cwHrevM6ZkexFFyhcXKRquMmsz50xiQ=; b=eHkmTWhtka7kNTnjkDbbhbFTb5eTVH99cuSk/OzHUNqyu3KsFe6rWQDutG2WxfkARB AiI+2XOQ2WQZUZuSUGaLlul2Ulk1Godf9gBjMf/q2eAb9aH+tZL6eHEreR1V6K+00AE4 OKRcxKgxXUJB1w0N8FV9qnzkJT5ngWVKfBi5c= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=ggFFvKCon8n+cwHrevM6ZkexFFyhcXKRquMmsz50xiQ=; b=fzs489DiV613P9WaNFPVKeydWkR8hGNLRLrNA0qHDOtVcXkFsgVczoBQOQc3lktDd8 8eKhiVjVXw3SG85j0E7LdxpEVu0YYqgCtxR1i/FEWblw8DTLOFJT5vcgmji887azkXO1 jiu6uuWibe6PhZmBKU+v9/Dn62jgAUjGgIrzUjhHIY9D/5POQJK7dG7t2/72c1MnDZO8 4+TNKogVj5P+qaerpecJn31fXGNAveaML2NUwFIgwk9mnJHpcf1OaH6M5XQazYKrIaZg ibgrj8bLJnqE+rFOmhTuvALhMkl8znHYpDHzzc1Yq89VACiuNMABX8qcOGCLl1pgyAQQ fJSQ== X-Gm-Message-State: AOAM530iiN2nt9AALpwp8M0OoTrdK3VnmEHCP3f5oT/LUX/R1ueI9R2P VlkE2QT8PyAo0i6Wspj9zY5dfA== X-Google-Smtp-Source: ABdhPJwMZ3GkhT2AGOMEGQ+KyiGJLOr8WDCCFcmBiC+Wzvc8+XLQ+CD/zQdeamyq2NDLodw7g4TnaA== X-Received: by 2002:a17:906:f1c4:: with SMTP id gx4mr12234990ejb.98.1598380166513; Tue, 25 Aug 2020 11:29:26 -0700 (PDT) Received: from kpsingh.zrh.corp.google.com ([81.6.44.51]) by smtp.gmail.com with ESMTPSA id dr21sm15323286ejc.112.2020.08.25.11.29.25 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 25 Aug 2020 11:29:26 -0700 (PDT) From: KP Singh To: linux-kernel@vger.kernel.org, bpf@vger.kernel.org, linux-security-module@vger.kernel.org Cc: Martin KaFai Lau , Alexei Starovoitov , Daniel Borkmann , Paul Turner , Jann Horn , Florent Revest Subject: [PATCH bpf-next v10 3/7] bpf: Generalize bpf_sk_storage Date: Tue, 25 Aug 2020 20:29:15 +0200 Message-Id: <20200825182919.1118197-4-kpsingh@chromium.org> X-Mailer: git-send-email 2.28.0.297.g1956fa8f8d-goog In-Reply-To: <20200825182919.1118197-1-kpsingh@chromium.org> References: <20200825182919.1118197-1-kpsingh@chromium.org> MIME-Version: 1.0 Sender: bpf-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org From: KP Singh Refactor the functionality in bpf_sk_storage.c so that concept of storage linked to kernel objects can be extended to other objects like inode, task_struct etc. Each new local storage will still be a separate map and provide its own set of helpers. This allows for future object specific extensions and still share a lot of the underlying implementation. This includes the changes suggested by Martin in: https://lore.kernel.org/bpf/20200725013047.4006241-1-kafai@fb.com/ adding new map operations to support bpf_local_storage maps: * storages for different kernel objects to optionally have different memory charging strategy (map_local_storage_charge, map_local_storage_uncharge) * Functionality to extract the storage pointer from a pointer to the owning object (map_owner_storage_ptr) Co-developed-by: Martin KaFai Lau Signed-off-by: Martin KaFai Lau Signed-off-by: KP Singh --- include/linux/bpf.h | 8 ++ include/net/bpf_sk_storage.h | 52 +++++++ include/uapi/linux/bpf.h | 8 +- net/core/bpf_sk_storage.c | 238 +++++++++++++++++++++------------ tools/include/uapi/linux/bpf.h | 8 +- 5 files changed, 228 insertions(+), 86 deletions(-) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 81f38e2fda78..8c443b93ac11 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -34,6 +34,8 @@ struct btf_type; struct exception_table_entry; struct seq_operations; struct bpf_iter_aux_info; +struct bpf_local_storage; +struct bpf_local_storage_map; extern struct idr btf_idr; extern spinlock_t btf_idr_lock; @@ -104,6 +106,12 @@ struct bpf_map_ops { __poll_t (*map_poll)(struct bpf_map *map, struct file *filp, struct poll_table_struct *pts); + /* Functions called by bpf_local_storage maps */ + int (*map_local_storage_charge)(struct bpf_local_storage_map *smap, + void *owner, u32 size); + void (*map_local_storage_uncharge)(struct bpf_local_storage_map *smap, + void *owner, u32 size); + struct bpf_local_storage __rcu ** (*map_owner_storage_ptr)(void *owner); /* BTF name and id of struct allocated by map_alloc */ const char * const map_btf_name; int *map_btf_id; diff --git a/include/net/bpf_sk_storage.h b/include/net/bpf_sk_storage.h index 950c5aaba15e..9e631b5466e3 100644 --- a/include/net/bpf_sk_storage.h +++ b/include/net/bpf_sk_storage.h @@ -3,8 +3,15 @@ #ifndef _BPF_SK_STORAGE_H #define _BPF_SK_STORAGE_H +#include +#include +#include #include #include +#include +#include +#include +#include struct sock; @@ -13,6 +20,7 @@ void bpf_sk_storage_free(struct sock *sk); extern const struct bpf_func_proto bpf_sk_storage_get_proto; extern const struct bpf_func_proto bpf_sk_storage_delete_proto; +struct bpf_local_storage_elem; struct bpf_sk_storage_diag; struct sk_buff; struct nlattr; @@ -34,6 +42,50 @@ u16 bpf_local_storage_cache_idx_get(struct bpf_local_storage_cache *cache); void bpf_local_storage_cache_idx_free(struct bpf_local_storage_cache *cache, u16 idx); +/* Helper functions for bpf_local_storage */ +int bpf_local_storage_map_alloc_check(union bpf_attr *attr); + +struct bpf_local_storage_map *bpf_local_storage_map_alloc(union bpf_attr *attr); + +struct bpf_local_storage_data * +bpf_local_storage_lookup(struct bpf_local_storage *local_storage, + struct bpf_local_storage_map *smap, + bool cacheit_lockit); + +void bpf_local_storage_map_free(struct bpf_local_storage_map *smap); + +int bpf_local_storage_map_check_btf(const struct bpf_map *map, + const struct btf *btf, + const struct btf_type *key_type, + const struct btf_type *value_type); + +void bpf_selem_link_storage_nolock(struct bpf_local_storage *local_storage, + struct bpf_local_storage_elem *selem); + +bool bpf_selem_unlink_storage_nolock(struct bpf_local_storage *local_storage, + struct bpf_local_storage_elem *selem, + bool uncharge_omem); + +void bpf_selem_unlink(struct bpf_local_storage_elem *selem); + +void bpf_selem_link_map(struct bpf_local_storage_map *smap, + struct bpf_local_storage_elem *selem); + +void bpf_selem_unlink_map(struct bpf_local_storage_elem *selem); + +struct bpf_local_storage_elem * +bpf_selem_alloc(struct bpf_local_storage_map *smap, void *owner, void *value, + bool charge_mem); + +int +bpf_local_storage_alloc(void *owner, + struct bpf_local_storage_map *smap, + struct bpf_local_storage_elem *first_selem); + +struct bpf_local_storage_data * +bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap, + void *value, u64 map_flags); + #ifdef CONFIG_BPF_SYSCALL int bpf_sk_storage_clone(const struct sock *sk, struct sock *newsk); struct bpf_sk_storage_diag * diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 544b89a64918..2cbd137eed86 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -3765,9 +3765,13 @@ enum { BPF_F_SYSCTL_BASE_NAME = (1ULL << 0), }; -/* BPF_FUNC_sk_storage_get flags */ +/* BPF_FUNC__storage_get flags */ enum { - BPF_SK_STORAGE_GET_F_CREATE = (1ULL << 0), + BPF_LOCAL_STORAGE_GET_F_CREATE = (1ULL << 0), + /* BPF_SK_STORAGE_GET_F_CREATE is only kept for backward compatibility + * and BPF_LOCAL_STORAGE_GET_F_CREATE must be used instead. + */ + BPF_SK_STORAGE_GET_F_CREATE = BPF_LOCAL_STORAGE_GET_F_CREATE, }; /* BPF_FUNC_read_branch_records flags. */ diff --git a/net/core/bpf_sk_storage.c b/net/core/bpf_sk_storage.c index ec61ee7c7ee4..cd8b7017913b 100644 --- a/net/core/bpf_sk_storage.c +++ b/net/core/bpf_sk_storage.c @@ -84,7 +84,7 @@ struct bpf_local_storage_elem { struct bpf_local_storage { struct bpf_local_storage_data __rcu *cache[BPF_LOCAL_STORAGE_CACHE_SIZE]; struct hlist_head list; /* List of bpf_local_storage_elem */ - struct sock *owner; /* The object that owns the above "list" of + void *owner; /* The object that owns the above "list" of * bpf_local_storage_elem. */ struct rcu_head rcu; @@ -110,6 +110,33 @@ static int omem_charge(struct sock *sk, unsigned int size) return -ENOMEM; } +static int mem_charge(struct bpf_local_storage_map *smap, void *owner, u32 size) +{ + struct bpf_map *map = &smap->map; + + if (!map->ops->map_local_storage_charge) + return 0; + + return map->ops->map_local_storage_charge(smap, owner, size); +} + +static void mem_uncharge(struct bpf_local_storage_map *smap, void *owner, + u32 size) +{ + struct bpf_map *map = &smap->map; + + if (map->ops->map_local_storage_uncharge) + map->ops->map_local_storage_uncharge(smap, owner, size); +} + +static struct bpf_local_storage __rcu ** +owner_storage(struct bpf_local_storage_map *smap, void *owner) +{ + struct bpf_map *map = &smap->map; + + return map->ops->map_owner_storage_ptr(owner); +} + static bool selem_linked_to_storage(const struct bpf_local_storage_elem *selem) { return !hlist_unhashed(&selem->snode); @@ -120,13 +147,13 @@ static bool selem_linked_to_map(const struct bpf_local_storage_elem *selem) return !hlist_unhashed(&selem->map_node); } -static struct bpf_local_storage_elem * -bpf_selem_alloc(struct bpf_local_storage_map *smap, struct sock *sk, - void *value, bool charge_omem) +struct bpf_local_storage_elem * +bpf_selem_alloc(struct bpf_local_storage_map *smap, void *owner, + void *value, bool charge_mem) { struct bpf_local_storage_elem *selem; - if (charge_omem && omem_charge(sk, smap->elem_size)) + if (charge_mem && mem_charge(smap, owner, smap->elem_size)) return NULL; selem = kzalloc(smap->elem_size, GFP_ATOMIC | __GFP_NOWARN); @@ -136,8 +163,8 @@ bpf_selem_alloc(struct bpf_local_storage_map *smap, struct sock *sk, return selem; } - if (charge_omem) - atomic_sub(smap->elem_size, &sk->sk_omem_alloc); + if (charge_mem) + mem_uncharge(smap, owner, smap->elem_size); return NULL; } @@ -146,32 +173,32 @@ bpf_selem_alloc(struct bpf_local_storage_map *smap, struct sock *sk, * The caller must ensure selem->smap is still valid to be * dereferenced for its smap->elem_size and smap->cache_idx. */ -static bool -bpf_selem_unlink_storage_nolock(struct bpf_local_storage *local_storage, - struct bpf_local_storage_elem *selem, - bool uncharge_omem) +bool bpf_selem_unlink_storage_nolock(struct bpf_local_storage *local_storage, + struct bpf_local_storage_elem *selem, + bool uncharge_mem) { struct bpf_local_storage_map *smap; bool free_local_storage; - struct sock *sk; + void *owner; smap = rcu_dereference(SDATA(selem)->smap); - sk = local_storage->owner; + owner = local_storage->owner; /* All uncharging on the owner must be done first. * The owner may be freed once the last selem is unlinked * from local_storage. */ - if (uncharge_omem) - atomic_sub(smap->elem_size, &sk->sk_omem_alloc); + if (uncharge_mem) + mem_uncharge(smap, owner, smap->elem_size); free_local_storage = hlist_is_singular_node(&selem->snode, &local_storage->list); if (free_local_storage) { - atomic_sub(sizeof(struct bpf_local_storage), &sk->sk_omem_alloc); + mem_uncharge(smap, owner, sizeof(struct bpf_local_storage)); local_storage->owner = NULL; - /* After this RCU_INIT, sk may be freed and cannot be used */ - RCU_INIT_POINTER(sk->sk_bpf_storage, NULL); + + /* After this RCU_INIT, owner may be freed and cannot be used */ + RCU_INIT_POINTER(*owner_storage(smap, owner), NULL); /* local_storage is not freed now. local_storage->lock is * still held and raw_spin_unlock_bh(&local_storage->lock) @@ -209,23 +236,22 @@ static void __bpf_selem_unlink_storage(struct bpf_local_storage_elem *selem) local_storage = rcu_dereference(selem->local_storage); raw_spin_lock_bh(&local_storage->lock); if (likely(selem_linked_to_storage(selem))) - free_local_storage = - bpf_selem_unlink_storage_nolock(local_storage, selem, true); + free_local_storage = bpf_selem_unlink_storage_nolock( + local_storage, selem, true); raw_spin_unlock_bh(&local_storage->lock); if (free_local_storage) kfree_rcu(local_storage, rcu); } -static void -bpf_selem_link_storage_nolock(struct bpf_local_storage *local_storage, - struct bpf_local_storage_elem *selem) +void bpf_selem_link_storage_nolock(struct bpf_local_storage *local_storage, + struct bpf_local_storage_elem *selem) { RCU_INIT_POINTER(selem->local_storage, local_storage); hlist_add_head(&selem->snode, &local_storage->list); } -static void bpf_selem_unlink_map(struct bpf_local_storage_elem *selem) +void bpf_selem_unlink_map(struct bpf_local_storage_elem *selem) { struct bpf_local_storage_map *smap; struct bpf_local_storage_map_bucket *b; @@ -242,8 +268,8 @@ static void bpf_selem_unlink_map(struct bpf_local_storage_elem *selem) raw_spin_unlock_bh(&b->lock); } -static void bpf_selem_link_map(struct bpf_local_storage_map *smap, - struct bpf_local_storage_elem *selem) +void bpf_selem_link_map(struct bpf_local_storage_map *smap, + struct bpf_local_storage_elem *selem) { struct bpf_local_storage_map_bucket *b = select_bucket(smap, selem); @@ -253,7 +279,7 @@ static void bpf_selem_link_map(struct bpf_local_storage_map *smap, raw_spin_unlock_bh(&b->lock); } -static void bpf_selem_unlink(struct bpf_local_storage_elem *selem) +void bpf_selem_unlink(struct bpf_local_storage_elem *selem) { /* Always unlink from map before unlinking from local_storage * because selem will be freed after successfully unlinked from @@ -263,7 +289,7 @@ static void bpf_selem_unlink(struct bpf_local_storage_elem *selem) __bpf_selem_unlink_storage(selem); } -static struct bpf_local_storage_data * +struct bpf_local_storage_data * bpf_local_storage_lookup(struct bpf_local_storage *local_storage, struct bpf_local_storage_map *smap, bool cacheit_lockit) @@ -329,40 +355,45 @@ static int check_flags(const struct bpf_local_storage_data *old_sdata, return 0; } -static int sk_storage_alloc(struct sock *sk, +int bpf_local_storage_alloc(void *owner, struct bpf_local_storage_map *smap, struct bpf_local_storage_elem *first_selem) { - struct bpf_local_storage *prev_sk_storage, *sk_storage; + struct bpf_local_storage *prev_storage, *storage; + struct bpf_local_storage **owner_storage_ptr; int err; - err = omem_charge(sk, sizeof(*sk_storage)); + err = mem_charge(smap, owner, sizeof(*storage)); if (err) return err; - sk_storage = kzalloc(sizeof(*sk_storage), GFP_ATOMIC | __GFP_NOWARN); - if (!sk_storage) { + storage = kzalloc(sizeof(*storage), GFP_ATOMIC | __GFP_NOWARN); + if (!storage) { err = -ENOMEM; goto uncharge; } - INIT_HLIST_HEAD(&sk_storage->list); - raw_spin_lock_init(&sk_storage->lock); - sk_storage->owner = sk; - bpf_selem_link_storage_nolock(sk_storage, first_selem); + INIT_HLIST_HEAD(&storage->list); + raw_spin_lock_init(&storage->lock); + storage->owner = owner; + + bpf_selem_link_storage_nolock(storage, first_selem); bpf_selem_link_map(smap, first_selem); - /* Publish sk_storage to sk. sk->sk_lock cannot be acquired. - * Hence, atomic ops is used to set sk->sk_bpf_storage - * from NULL to the newly allocated sk_storage ptr. + + owner_storage_ptr = + (struct bpf_local_storage **)owner_storage(smap, owner); + /* Publish storage to the owner. + * Instead of using any lock of the kernel object (i.e. owner), + * cmpxchg will work with any kernel object regardless what + * the running context is, bh, irq...etc. * - * From now on, the sk->sk_bpf_storage pointer is protected - * by the sk_storage->lock. Hence, when freeing - * the sk->sk_bpf_storage, the sk_storage->lock must - * be held before setting sk->sk_bpf_storage to NULL. + * From now on, the owner->storage pointer (e.g. sk->sk_bpf_storage) + * is protected by the storage->lock. Hence, when freeing + * the owner->storage, the storage->lock must be held before + * setting owner->storage ptr to NULL. */ - prev_sk_storage = cmpxchg((struct bpf_local_storage **)&sk->sk_bpf_storage, - NULL, sk_storage); - if (unlikely(prev_sk_storage)) { + prev_storage = cmpxchg(owner_storage_ptr, NULL, storage); + if (unlikely(prev_storage)) { bpf_selem_unlink_map(first_selem); err = -EAGAIN; goto uncharge; @@ -380,8 +411,8 @@ static int sk_storage_alloc(struct sock *sk, return 0; uncharge: - kfree(sk_storage); - atomic_sub(sizeof(*sk_storage), &sk->sk_omem_alloc); + kfree(storage); + mem_uncharge(smap, owner, sizeof(*storage)); return err; } @@ -390,38 +421,37 @@ static int sk_storage_alloc(struct sock *sk, * Otherwise, it will become a leak (and other memory issues * during map destruction). */ -static struct bpf_local_storage_data * -bpf_local_storage_update(struct sock *sk, struct bpf_map *map, void *value, - u64 map_flags) +struct bpf_local_storage_data * +bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap, + void *value, u64 map_flags) { struct bpf_local_storage_data *old_sdata = NULL; struct bpf_local_storage_elem *selem; struct bpf_local_storage *local_storage; - struct bpf_local_storage_map *smap; int err; /* BPF_EXIST and BPF_NOEXIST cannot be both set */ if (unlikely((map_flags & ~BPF_F_LOCK) > BPF_EXIST) || /* BPF_F_LOCK can only be used in a value with spin_lock */ - unlikely((map_flags & BPF_F_LOCK) && !map_value_has_spin_lock(map))) + unlikely((map_flags & BPF_F_LOCK) && + !map_value_has_spin_lock(&smap->map))) return ERR_PTR(-EINVAL); - smap = (struct bpf_local_storage_map *)map; - local_storage = rcu_dereference(sk->sk_bpf_storage); + local_storage = rcu_dereference(*owner_storage(smap, owner)); if (!local_storage || hlist_empty(&local_storage->list)) { /* Very first elem for the owner */ err = check_flags(NULL, map_flags); if (err) return ERR_PTR(err); - selem = bpf_selem_alloc(smap, sk, value, true); + selem = bpf_selem_alloc(smap, owner, value, true); if (!selem) return ERR_PTR(-ENOMEM); - err = sk_storage_alloc(sk, smap, selem); + err = bpf_local_storage_alloc(owner, smap, selem); if (err) { kfree(selem); - atomic_sub(smap->elem_size, &sk->sk_omem_alloc); + mem_uncharge(smap, owner, smap->elem_size); return ERR_PTR(err); } @@ -439,7 +469,7 @@ bpf_local_storage_update(struct sock *sk, struct bpf_map *map, void *value, if (err) return ERR_PTR(err); if (old_sdata && selem_linked_to_storage(SELEM(old_sdata))) { - copy_map_value_locked(map, old_sdata->data, + copy_map_value_locked(&smap->map, old_sdata->data, value, false); return old_sdata; } @@ -464,7 +494,8 @@ bpf_local_storage_update(struct sock *sk, struct bpf_map *map, void *value, goto unlock_err; if (old_sdata && (map_flags & BPF_F_LOCK)) { - copy_map_value_locked(map, old_sdata->data, value, false); + copy_map_value_locked(&smap->map, old_sdata->data, value, + false); selem = SELEM(old_sdata); goto unlock; } @@ -478,7 +509,7 @@ bpf_local_storage_update(struct sock *sk, struct bpf_map *map, void *value, * old_sdata will not be uncharged later during * bpf_selem_unlink_storage_nolock(). */ - selem = bpf_selem_alloc(smap, sk, value, !old_sdata); + selem = bpf_selem_alloc(smap, owner, value, !old_sdata); if (!selem) { err = -ENOMEM; goto unlock_err; @@ -591,17 +622,12 @@ void bpf_sk_storage_free(struct sock *sk) kfree_rcu(sk_storage, rcu); } -static void bpf_local_storage_map_free(struct bpf_map *map) +void bpf_local_storage_map_free(struct bpf_local_storage_map *smap) { struct bpf_local_storage_elem *selem; - struct bpf_local_storage_map *smap; struct bpf_local_storage_map_bucket *b; unsigned int i; - smap = (struct bpf_local_storage_map *)map; - - bpf_local_storage_cache_idx_free(&sk_cache, smap->cache_idx); - /* Note that this map might be concurrently cloned from * bpf_sk_storage_clone. Wait for any existing bpf_sk_storage_clone * RCU read section to finish before proceeding. New RCU @@ -646,7 +672,16 @@ static void bpf_local_storage_map_free(struct bpf_map *map) synchronize_rcu(); kvfree(smap->buckets); - kfree(map); + kfree(smap); +} + +static void sk_storage_map_free(struct bpf_map *map) +{ + struct bpf_local_storage_map *smap; + + smap = (struct bpf_local_storage_map *)map; + bpf_local_storage_cache_idx_free(&sk_cache, smap->cache_idx); + bpf_local_storage_map_free(smap); } /* U16_MAX is much more than enough for sk local storage @@ -658,7 +693,7 @@ static void bpf_local_storage_map_free(struct bpf_map *map) sizeof(struct bpf_local_storage_elem)), \ (U16_MAX - sizeof(struct bpf_local_storage_elem))) -static int bpf_local_storage_map_alloc_check(union bpf_attr *attr) +int bpf_local_storage_map_alloc_check(union bpf_attr *attr) { if (attr->map_flags & ~BPF_LOCAL_STORAGE_CREATE_FLAG_MASK || !(attr->map_flags & BPF_F_NO_PREALLOC) || @@ -677,7 +712,7 @@ static int bpf_local_storage_map_alloc_check(union bpf_attr *attr) return 0; } -static struct bpf_map *bpf_local_storage_map_alloc(union bpf_attr *attr) +struct bpf_local_storage_map *bpf_local_storage_map_alloc(union bpf_attr *attr) { struct bpf_local_storage_map *smap; unsigned int i; @@ -717,8 +752,19 @@ static struct bpf_map *bpf_local_storage_map_alloc(union bpf_attr *attr) smap->elem_size = sizeof(struct bpf_local_storage_elem) + attr->value_size; - smap->cache_idx = bpf_local_storage_cache_idx_get(&sk_cache); + return smap; +} + +static struct bpf_map *sk_storage_map_alloc(union bpf_attr *attr) +{ + struct bpf_local_storage_map *smap; + + smap = bpf_local_storage_map_alloc(attr); + if (IS_ERR(smap)) + return ERR_CAST(smap); + + smap->cache_idx = bpf_local_storage_cache_idx_get(&sk_cache); return &smap->map; } @@ -728,10 +774,10 @@ static int notsupp_get_next_key(struct bpf_map *map, void *key, return -ENOTSUPP; } -static int bpf_local_storage_map_check_btf(const struct bpf_map *map, - const struct btf *btf, - const struct btf_type *key_type, - const struct btf_type *value_type) +int bpf_local_storage_map_check_btf(const struct bpf_map *map, + const struct btf *btf, + const struct btf_type *key_type, + const struct btf_type *value_type) { u32 int_data; @@ -772,8 +818,9 @@ static int bpf_fd_sk_storage_update_elem(struct bpf_map *map, void *key, fd = *(int *)key; sock = sockfd_lookup(fd, &err); if (sock) { - sdata = bpf_local_storage_update(sock->sk, map, value, - map_flags); + sdata = bpf_local_storage_update( + sock->sk, (struct bpf_local_storage_map *)map, value, + map_flags); sockfd_put(sock); return PTR_ERR_OR_ZERO(sdata); } @@ -862,7 +909,7 @@ int bpf_sk_storage_clone(const struct sock *sk, struct sock *newsk) bpf_selem_link_map(smap, copy_selem); bpf_selem_link_storage_nolock(new_sk_storage, copy_selem); } else { - ret = sk_storage_alloc(newsk, smap, copy_selem); + ret = bpf_local_storage_alloc(newsk, smap, copy_selem); if (ret) { kfree(copy_selem); atomic_sub(smap->elem_size, @@ -906,7 +953,9 @@ BPF_CALL_4(bpf_sk_storage_get, struct bpf_map *, map, struct sock *, sk, * destruction). */ refcount_inc_not_zero(&sk->sk_refcnt)) { - sdata = bpf_local_storage_update(sk, map, value, BPF_NOEXIST); + sdata = bpf_local_storage_update( + sk, (struct bpf_local_storage_map *)map, value, + BPF_NOEXIST); /* sk must be a fullsock (guaranteed by verifier), * so sock_gen_put() is unnecessary. */ @@ -931,11 +980,33 @@ BPF_CALL_2(bpf_sk_storage_delete, struct bpf_map *, map, struct sock *, sk) return -ENOENT; } +static int sk_storage_charge(struct bpf_local_storage_map *smap, + void *owner, u32 size) +{ + return omem_charge(owner, size); +} + +static void sk_storage_uncharge(struct bpf_local_storage_map *smap, + void *owner, u32 size) +{ + struct sock *sk = owner; + + atomic_sub(size, &sk->sk_omem_alloc); +} + +static struct bpf_local_storage __rcu ** +sk_storage_ptr(void *owner) +{ + struct sock *sk = owner; + + return &sk->sk_bpf_storage; +} + static int sk_storage_map_btf_id; const struct bpf_map_ops sk_storage_map_ops = { .map_alloc_check = bpf_local_storage_map_alloc_check, - .map_alloc = bpf_local_storage_map_alloc, - .map_free = bpf_local_storage_map_free, + .map_alloc = sk_storage_map_alloc, + .map_free = sk_storage_map_free, .map_get_next_key = notsupp_get_next_key, .map_lookup_elem = bpf_fd_sk_storage_lookup_elem, .map_update_elem = bpf_fd_sk_storage_update_elem, @@ -943,6 +1014,9 @@ const struct bpf_map_ops sk_storage_map_ops = { .map_check_btf = bpf_local_storage_map_check_btf, .map_btf_name = "bpf_local_storage_map", .map_btf_id = &sk_storage_map_btf_id, + .map_local_storage_charge = sk_storage_charge, + .map_local_storage_uncharge = sk_storage_uncharge, + .map_owner_storage_ptr = sk_storage_ptr, }; const struct bpf_func_proto bpf_sk_storage_get_proto = { diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h index 544b89a64918..2cbd137eed86 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -3765,9 +3765,13 @@ enum { BPF_F_SYSCTL_BASE_NAME = (1ULL << 0), }; -/* BPF_FUNC_sk_storage_get flags */ +/* BPF_FUNC__storage_get flags */ enum { - BPF_SK_STORAGE_GET_F_CREATE = (1ULL << 0), + BPF_LOCAL_STORAGE_GET_F_CREATE = (1ULL << 0), + /* BPF_SK_STORAGE_GET_F_CREATE is only kept for backward compatibility + * and BPF_LOCAL_STORAGE_GET_F_CREATE must be used instead. + */ + BPF_SK_STORAGE_GET_F_CREATE = BPF_LOCAL_STORAGE_GET_F_CREATE, }; /* BPF_FUNC_read_branch_records flags. */ From patchwork Tue Aug 25 18:29:16 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: KP Singh X-Patchwork-Id: 1351222 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Original-To: incoming-bpf@patchwork.ozlabs.org Delivered-To: patchwork-incoming-bpf@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org (client-ip=23.128.96.18; helo=vger.kernel.org; envelope-from=bpf-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=chromium.org Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=chromium.org header.i=@chromium.org header.a=rsa-sha256 header.s=google header.b=Il1ILU9b; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by ozlabs.org (Postfix) with ESMTP id 4Bbcwd2BgFz9sTR for ; Wed, 26 Aug 2020 04:29:53 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726713AbgHYS3n (ORCPT ); Tue, 25 Aug 2020 14:29:43 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42710 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726646AbgHYS3f (ORCPT ); Tue, 25 Aug 2020 14:29:35 -0400 Received: from mail-ej1-x642.google.com (mail-ej1-x642.google.com [IPv6:2a00:1450:4864:20::642]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4D72AC061799 for ; Tue, 25 Aug 2020 11:29:29 -0700 (PDT) Received: by mail-ej1-x642.google.com with SMTP id oz20so12898803ejb.5 for ; Tue, 25 Aug 2020 11:29:29 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=chromium.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=LrABoMiWVsa//mL1S6BhmHlsx9jHzjyCqlDb+o+VdJI=; b=Il1ILU9ba+bcBTkZo+OeEnn46K8aZ096H4tgoVTyLQmNdpm/jcUTNFAKlAW5DO03rY gMgymsksIRzWBdcPvcn4gI2rUUvUSrx9HB+MIUT+2qyj2BAEG9Kdust/vqIDKcKY0vUT U9ffO5skcWMPb0Qp2QsaCnCC/+JfWjc5klxVc= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=LrABoMiWVsa//mL1S6BhmHlsx9jHzjyCqlDb+o+VdJI=; b=IsZ1+o8tMA7+aiQ/A2xMMGnuFLVCQR/bM8bNqUMRm7fvZGTxHYAc3XyMXTZY8R7Yt6 9aaf4siGMRUQBFcGQpA5gCwXTAz5tmqX5B1unltE8cBlxOJhTRDLLmsss2ILmR2c7k+R 5EfOwpm+zb8Q7d6Dt7XnV+Lshn/svfuXrBUJQB8hU/nkewYIjPSQqmpn/2XVKcYk7+J2 xE6I07xQw5Lm2YFW0yRUkfnWRHh7D5dnfbmcW0FNEY43ldR01S3bPizzviq9XgMK9V4j 47IXjRWSaijMzOJXrxCq0w3kun+nqj+cQ0cozpTNo+i9xhr3CwKlra+3DB/Q9bULHRP8 Kz5w== X-Gm-Message-State: AOAM532SuxxuszXr22opieif8e9tgecEWm9wiG8gjK24Evm+oarTeeX1 Zr7aSiLeGlH7vEjqSR2WLjw0Hw== X-Google-Smtp-Source: ABdhPJxAdbFNiFoTuol0d2R+lPhqf7N1ULtrhm7AY/jEg/kGXG+lZDoqMAxXdOnSwOaXCcIj1t4UtA== X-Received: by 2002:a17:907:388:: with SMTP id ss8mr11302290ejb.479.1598380167430; Tue, 25 Aug 2020 11:29:27 -0700 (PDT) Received: from kpsingh.zrh.corp.google.com ([81.6.44.51]) by smtp.gmail.com with ESMTPSA id dr21sm15323286ejc.112.2020.08.25.11.29.26 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 25 Aug 2020 11:29:26 -0700 (PDT) From: KP Singh To: linux-kernel@vger.kernel.org, bpf@vger.kernel.org, linux-security-module@vger.kernel.org Cc: Martin KaFai Lau , Alexei Starovoitov , Daniel Borkmann , Paul Turner , Jann Horn , Florent Revest Subject: [PATCH bpf-next v10 4/7] bpf: Split bpf_local_storage to bpf_sk_storage Date: Tue, 25 Aug 2020 20:29:16 +0200 Message-Id: <20200825182919.1118197-5-kpsingh@chromium.org> X-Mailer: git-send-email 2.28.0.297.g1956fa8f8d-goog In-Reply-To: <20200825182919.1118197-1-kpsingh@chromium.org> References: <20200825182919.1118197-1-kpsingh@chromium.org> MIME-Version: 1.0 Sender: bpf-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org From: KP Singh A purely mechanical change: bpf_sk_storage.c = bpf_sk_storage.c + bpf_local_storage.c bpf_sk_storage.h = bpf_sk_storage.h + bpf_local_storage.h Acked-by: Martin KaFai Lau Signed-off-by: KP Singh --- include/linux/bpf_local_storage.h | 163 ++++++++ include/net/bpf_sk_storage.h | 61 +-- kernel/bpf/Makefile | 1 + kernel/bpf/bpf_local_storage.c | 600 ++++++++++++++++++++++++++ net/core/bpf_sk_storage.c | 672 +----------------------------- 5 files changed, 766 insertions(+), 731 deletions(-) create mode 100644 include/linux/bpf_local_storage.h create mode 100644 kernel/bpf/bpf_local_storage.c diff --git a/include/linux/bpf_local_storage.h b/include/linux/bpf_local_storage.h new file mode 100644 index 000000000000..b2c9463f36a1 --- /dev/null +++ b/include/linux/bpf_local_storage.h @@ -0,0 +1,163 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright (c) 2019 Facebook + * Copyright 2020 Google LLC. + */ + +#ifndef _BPF_LOCAL_STORAGE_H +#define _BPF_LOCAL_STORAGE_H + +#include +#include +#include +#include +#include +#include + +#define BPF_LOCAL_STORAGE_CACHE_SIZE 16 + +struct bpf_local_storage_map_bucket { + struct hlist_head list; + raw_spinlock_t lock; +}; + +/* Thp map is not the primary owner of a bpf_local_storage_elem. + * Instead, the container object (eg. sk->sk_bpf_storage) is. + * + * The map (bpf_local_storage_map) is for two purposes + * 1. Define the size of the "local storage". It is + * the map's value_size. + * + * 2. Maintain a list to keep track of all elems such + * that they can be cleaned up during the map destruction. + * + * When a bpf local storage is being looked up for a + * particular object, the "bpf_map" pointer is actually used + * as the "key" to search in the list of elem in + * the respective bpf_local_storage owned by the object. + * + * e.g. sk->sk_bpf_storage is the mini-map with the "bpf_map" pointer + * as the searching key. + */ +struct bpf_local_storage_map { + struct bpf_map map; + /* Lookup elem does not require accessing the map. + * + * Updating/Deleting requires a bucket lock to + * link/unlink the elem from the map. Having + * multiple buckets to improve contention. + */ + struct bpf_local_storage_map_bucket *buckets; + u32 bucket_log; + u16 elem_size; + u16 cache_idx; +}; + +struct bpf_local_storage_data { + /* smap is used as the searching key when looking up + * from the object's bpf_local_storage. + * + * Put it in the same cacheline as the data to minimize + * the number of cachelines access during the cache hit case. + */ + struct bpf_local_storage_map __rcu *smap; + u8 data[] __aligned(8); +}; + +/* Linked to bpf_local_storage and bpf_local_storage_map */ +struct bpf_local_storage_elem { + struct hlist_node map_node; /* Linked to bpf_local_storage_map */ + struct hlist_node snode; /* Linked to bpf_local_storage */ + struct bpf_local_storage __rcu *local_storage; + struct rcu_head rcu; + /* 8 bytes hole */ + /* The data is stored in aother cacheline to minimize + * the number of cachelines access during a cache hit. + */ + struct bpf_local_storage_data sdata ____cacheline_aligned; +}; + +struct bpf_local_storage { + struct bpf_local_storage_data __rcu *cache[BPF_LOCAL_STORAGE_CACHE_SIZE]; + struct hlist_head list; /* List of bpf_local_storage_elem */ + void *owner; /* The object that owns the above "list" of + * bpf_local_storage_elem. + */ + struct rcu_head rcu; + raw_spinlock_t lock; /* Protect adding/removing from the "list" */ +}; + +/* U16_MAX is much more than enough for sk local storage + * considering a tcp_sock is ~2k. + */ +#define BPF_LOCAL_STORAGE_MAX_VALUE_SIZE \ + min_t(u32, \ + (KMALLOC_MAX_SIZE - MAX_BPF_STACK - \ + sizeof(struct bpf_local_storage_elem)), \ + (U16_MAX - sizeof(struct bpf_local_storage_elem))) + +#define SELEM(_SDATA) \ + container_of((_SDATA), struct bpf_local_storage_elem, sdata) +#define SDATA(_SELEM) (&(_SELEM)->sdata) + +#define BPF_LOCAL_STORAGE_CACHE_SIZE 16 + +struct bpf_local_storage_cache { + spinlock_t idx_lock; + u64 idx_usage_counts[BPF_LOCAL_STORAGE_CACHE_SIZE]; +}; + +#define DEFINE_BPF_STORAGE_CACHE(name) \ +static struct bpf_local_storage_cache name = { \ + .idx_lock = __SPIN_LOCK_UNLOCKED(name.idx_lock), \ +} + +u16 bpf_local_storage_cache_idx_get(struct bpf_local_storage_cache *cache); +void bpf_local_storage_cache_idx_free(struct bpf_local_storage_cache *cache, + u16 idx); + +/* Helper functions for bpf_local_storage */ +int bpf_local_storage_map_alloc_check(union bpf_attr *attr); + +struct bpf_local_storage_map *bpf_local_storage_map_alloc(union bpf_attr *attr); + +struct bpf_local_storage_data * +bpf_local_storage_lookup(struct bpf_local_storage *local_storage, + struct bpf_local_storage_map *smap, + bool cacheit_lockit); + +void bpf_local_storage_map_free(struct bpf_local_storage_map *smap); + +int bpf_local_storage_map_check_btf(const struct bpf_map *map, + const struct btf *btf, + const struct btf_type *key_type, + const struct btf_type *value_type); + +void bpf_selem_link_storage_nolock(struct bpf_local_storage *local_storage, + struct bpf_local_storage_elem *selem); + +bool bpf_selem_unlink_storage_nolock(struct bpf_local_storage *local_storage, + struct bpf_local_storage_elem *selem, + bool uncharge_omem); + +void bpf_selem_unlink(struct bpf_local_storage_elem *selem); + +void bpf_selem_link_map(struct bpf_local_storage_map *smap, + struct bpf_local_storage_elem *selem); + +void bpf_selem_unlink_map(struct bpf_local_storage_elem *selem); + +struct bpf_local_storage_elem * +bpf_selem_alloc(struct bpf_local_storage_map *smap, void *owner, void *value, + bool charge_mem); + +int +bpf_local_storage_alloc(void *owner, + struct bpf_local_storage_map *smap, + struct bpf_local_storage_elem *first_selem); + +struct bpf_local_storage_data * +bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap, + void *value, u64 map_flags); + +#endif /* _BPF_LOCAL_STORAGE_H */ diff --git a/include/net/bpf_sk_storage.h b/include/net/bpf_sk_storage.h index 9e631b5466e3..3c516dd07caf 100644 --- a/include/net/bpf_sk_storage.h +++ b/include/net/bpf_sk_storage.h @@ -12,6 +12,7 @@ #include #include #include +#include struct sock; @@ -26,66 +27,6 @@ struct sk_buff; struct nlattr; struct sock; -#define BPF_LOCAL_STORAGE_CACHE_SIZE 16 - -struct bpf_local_storage_cache { - spinlock_t idx_lock; - u64 idx_usage_counts[BPF_LOCAL_STORAGE_CACHE_SIZE]; -}; - -#define DEFINE_BPF_STORAGE_CACHE(name) \ -static struct bpf_local_storage_cache name = { \ - .idx_lock = __SPIN_LOCK_UNLOCKED(name.idx_lock), \ -} - -u16 bpf_local_storage_cache_idx_get(struct bpf_local_storage_cache *cache); -void bpf_local_storage_cache_idx_free(struct bpf_local_storage_cache *cache, - u16 idx); - -/* Helper functions for bpf_local_storage */ -int bpf_local_storage_map_alloc_check(union bpf_attr *attr); - -struct bpf_local_storage_map *bpf_local_storage_map_alloc(union bpf_attr *attr); - -struct bpf_local_storage_data * -bpf_local_storage_lookup(struct bpf_local_storage *local_storage, - struct bpf_local_storage_map *smap, - bool cacheit_lockit); - -void bpf_local_storage_map_free(struct bpf_local_storage_map *smap); - -int bpf_local_storage_map_check_btf(const struct bpf_map *map, - const struct btf *btf, - const struct btf_type *key_type, - const struct btf_type *value_type); - -void bpf_selem_link_storage_nolock(struct bpf_local_storage *local_storage, - struct bpf_local_storage_elem *selem); - -bool bpf_selem_unlink_storage_nolock(struct bpf_local_storage *local_storage, - struct bpf_local_storage_elem *selem, - bool uncharge_omem); - -void bpf_selem_unlink(struct bpf_local_storage_elem *selem); - -void bpf_selem_link_map(struct bpf_local_storage_map *smap, - struct bpf_local_storage_elem *selem); - -void bpf_selem_unlink_map(struct bpf_local_storage_elem *selem); - -struct bpf_local_storage_elem * -bpf_selem_alloc(struct bpf_local_storage_map *smap, void *owner, void *value, - bool charge_mem); - -int -bpf_local_storage_alloc(void *owner, - struct bpf_local_storage_map *smap, - struct bpf_local_storage_elem *first_selem); - -struct bpf_local_storage_data * -bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap, - void *value, u64 map_flags); - #ifdef CONFIG_BPF_SYSCALL int bpf_sk_storage_clone(const struct sock *sk, struct sock *newsk); struct bpf_sk_storage_diag * diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile index 19e137aae40e..6961ff400cba 100644 --- a/kernel/bpf/Makefile +++ b/kernel/bpf/Makefile @@ -12,6 +12,7 @@ obj-$(CONFIG_BPF_JIT) += dispatcher.o ifeq ($(CONFIG_NET),y) obj-$(CONFIG_BPF_SYSCALL) += devmap.o obj-$(CONFIG_BPF_SYSCALL) += cpumap.o +obj-$(CONFIG_BPF_SYSCALL) += bpf_local_storage.o obj-$(CONFIG_BPF_SYSCALL) += offload.o obj-$(CONFIG_BPF_SYSCALL) += net_namespace.o endif diff --git a/kernel/bpf/bpf_local_storage.c b/kernel/bpf/bpf_local_storage.c new file mode 100644 index 000000000000..ffa7d11fc2bd --- /dev/null +++ b/kernel/bpf/bpf_local_storage.c @@ -0,0 +1,600 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (c) 2019 Facebook */ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#define BPF_LOCAL_STORAGE_CREATE_FLAG_MASK (BPF_F_NO_PREALLOC | BPF_F_CLONE) + +static struct bpf_local_storage_map_bucket * +select_bucket(struct bpf_local_storage_map *smap, + struct bpf_local_storage_elem *selem) +{ + return &smap->buckets[hash_ptr(selem, smap->bucket_log)]; +} + +static int mem_charge(struct bpf_local_storage_map *smap, void *owner, u32 size) +{ + struct bpf_map *map = &smap->map; + + if (!map->ops->map_local_storage_charge) + return 0; + + return map->ops->map_local_storage_charge(smap, owner, size); +} + +static void mem_uncharge(struct bpf_local_storage_map *smap, void *owner, + u32 size) +{ + struct bpf_map *map = &smap->map; + + if (map->ops->map_local_storage_uncharge) + map->ops->map_local_storage_uncharge(smap, owner, size); +} + +static struct bpf_local_storage __rcu ** +owner_storage(struct bpf_local_storage_map *smap, void *owner) +{ + struct bpf_map *map = &smap->map; + + return map->ops->map_owner_storage_ptr(owner); +} + +static bool selem_linked_to_storage(const struct bpf_local_storage_elem *selem) +{ + return !hlist_unhashed(&selem->snode); +} + +static bool selem_linked_to_map(const struct bpf_local_storage_elem *selem) +{ + return !hlist_unhashed(&selem->map_node); +} + +struct bpf_local_storage_elem * +bpf_selem_alloc(struct bpf_local_storage_map *smap, void *owner, + void *value, bool charge_mem) +{ + struct bpf_local_storage_elem *selem; + + if (charge_mem && mem_charge(smap, owner, smap->elem_size)) + return NULL; + + selem = kzalloc(smap->elem_size, GFP_ATOMIC | __GFP_NOWARN); + if (selem) { + if (value) + memcpy(SDATA(selem)->data, value, smap->map.value_size); + return selem; + } + + if (charge_mem) + mem_uncharge(smap, owner, smap->elem_size); + + return NULL; +} + +/* local_storage->lock must be held and selem->local_storage == local_storage. + * The caller must ensure selem->smap is still valid to be + * dereferenced for its smap->elem_size and smap->cache_idx. + */ +bool bpf_selem_unlink_storage_nolock(struct bpf_local_storage *local_storage, + struct bpf_local_storage_elem *selem, + bool uncharge_mem) +{ + struct bpf_local_storage_map *smap; + bool free_local_storage; + void *owner; + + smap = rcu_dereference(SDATA(selem)->smap); + owner = local_storage->owner; + + /* All uncharging on the owner must be done first. + * The owner may be freed once the last selem is unlinked + * from local_storage. + */ + if (uncharge_mem) + mem_uncharge(smap, owner, smap->elem_size); + + free_local_storage = hlist_is_singular_node(&selem->snode, + &local_storage->list); + if (free_local_storage) { + mem_uncharge(smap, owner, sizeof(struct bpf_local_storage)); + local_storage->owner = NULL; + + /* After this RCU_INIT, owner may be freed and cannot be used */ + RCU_INIT_POINTER(*owner_storage(smap, owner), NULL); + + /* local_storage is not freed now. local_storage->lock is + * still held and raw_spin_unlock_bh(&local_storage->lock) + * will be done by the caller. + * + * Although the unlock will be done under + * rcu_read_lock(), it is more intutivie to + * read if kfree_rcu(local_storage, rcu) is done + * after the raw_spin_unlock_bh(&local_storage->lock). + * + * Hence, a "bool free_local_storage" is returned + * to the caller which then calls the kfree_rcu() + * after unlock. + */ + } + hlist_del_init_rcu(&selem->snode); + if (rcu_access_pointer(local_storage->cache[smap->cache_idx]) == + SDATA(selem)) + RCU_INIT_POINTER(local_storage->cache[smap->cache_idx], NULL); + + kfree_rcu(selem, rcu); + + return free_local_storage; +} + +static void __bpf_selem_unlink_storage(struct bpf_local_storage_elem *selem) +{ + struct bpf_local_storage *local_storage; + bool free_local_storage = false; + + if (unlikely(!selem_linked_to_storage(selem))) + /* selem has already been unlinked from sk */ + return; + + local_storage = rcu_dereference(selem->local_storage); + raw_spin_lock_bh(&local_storage->lock); + if (likely(selem_linked_to_storage(selem))) + free_local_storage = bpf_selem_unlink_storage_nolock( + local_storage, selem, true); + raw_spin_unlock_bh(&local_storage->lock); + + if (free_local_storage) + kfree_rcu(local_storage, rcu); +} + +void bpf_selem_link_storage_nolock(struct bpf_local_storage *local_storage, + struct bpf_local_storage_elem *selem) +{ + RCU_INIT_POINTER(selem->local_storage, local_storage); + hlist_add_head(&selem->snode, &local_storage->list); +} + +void bpf_selem_unlink_map(struct bpf_local_storage_elem *selem) +{ + struct bpf_local_storage_map *smap; + struct bpf_local_storage_map_bucket *b; + + if (unlikely(!selem_linked_to_map(selem))) + /* selem has already be unlinked from smap */ + return; + + smap = rcu_dereference(SDATA(selem)->smap); + b = select_bucket(smap, selem); + raw_spin_lock_bh(&b->lock); + if (likely(selem_linked_to_map(selem))) + hlist_del_init_rcu(&selem->map_node); + raw_spin_unlock_bh(&b->lock); +} + +void bpf_selem_link_map(struct bpf_local_storage_map *smap, + struct bpf_local_storage_elem *selem) +{ + struct bpf_local_storage_map_bucket *b = select_bucket(smap, selem); + + raw_spin_lock_bh(&b->lock); + RCU_INIT_POINTER(SDATA(selem)->smap, smap); + hlist_add_head_rcu(&selem->map_node, &b->list); + raw_spin_unlock_bh(&b->lock); +} + +void bpf_selem_unlink(struct bpf_local_storage_elem *selem) +{ + /* Always unlink from map before unlinking from local_storage + * because selem will be freed after successfully unlinked from + * the local_storage. + */ + bpf_selem_unlink_map(selem); + __bpf_selem_unlink_storage(selem); +} + +struct bpf_local_storage_data * +bpf_local_storage_lookup(struct bpf_local_storage *local_storage, + struct bpf_local_storage_map *smap, + bool cacheit_lockit) +{ + struct bpf_local_storage_data *sdata; + struct bpf_local_storage_elem *selem; + + /* Fast path (cache hit) */ + sdata = rcu_dereference(local_storage->cache[smap->cache_idx]); + if (sdata && rcu_access_pointer(sdata->smap) == smap) + return sdata; + + /* Slow path (cache miss) */ + hlist_for_each_entry_rcu(selem, &local_storage->list, snode) + if (rcu_access_pointer(SDATA(selem)->smap) == smap) + break; + + if (!selem) + return NULL; + + sdata = SDATA(selem); + if (cacheit_lockit) { + /* spinlock is needed to avoid racing with the + * parallel delete. Otherwise, publishing an already + * deleted sdata to the cache will become a use-after-free + * problem in the next bpf_local_storage_lookup(). + */ + raw_spin_lock_bh(&local_storage->lock); + if (selem_linked_to_storage(selem)) + rcu_assign_pointer(local_storage->cache[smap->cache_idx], + sdata); + raw_spin_unlock_bh(&local_storage->lock); + } + + return sdata; +} + +static int check_flags(const struct bpf_local_storage_data *old_sdata, + u64 map_flags) +{ + if (old_sdata && (map_flags & ~BPF_F_LOCK) == BPF_NOEXIST) + /* elem already exists */ + return -EEXIST; + + if (!old_sdata && (map_flags & ~BPF_F_LOCK) == BPF_EXIST) + /* elem doesn't exist, cannot update it */ + return -ENOENT; + + return 0; +} + +int bpf_local_storage_alloc(void *owner, + struct bpf_local_storage_map *smap, + struct bpf_local_storage_elem *first_selem) +{ + struct bpf_local_storage *prev_storage, *storage; + struct bpf_local_storage **owner_storage_ptr; + int err; + + err = mem_charge(smap, owner, sizeof(*storage)); + if (err) + return err; + + storage = kzalloc(sizeof(*storage), GFP_ATOMIC | __GFP_NOWARN); + if (!storage) { + err = -ENOMEM; + goto uncharge; + } + + INIT_HLIST_HEAD(&storage->list); + raw_spin_lock_init(&storage->lock); + storage->owner = owner; + + bpf_selem_link_storage_nolock(storage, first_selem); + bpf_selem_link_map(smap, first_selem); + + owner_storage_ptr = + (struct bpf_local_storage **)owner_storage(smap, owner); + /* Publish storage to the owner. + * Instead of using any lock of the kernel object (i.e. owner), + * cmpxchg will work with any kernel object regardless what + * the running context is, bh, irq...etc. + * + * From now on, the owner->storage pointer (e.g. sk->sk_bpf_storage) + * is protected by the storage->lock. Hence, when freeing + * the owner->storage, the storage->lock must be held before + * setting owner->storage ptr to NULL. + */ + prev_storage = cmpxchg(owner_storage_ptr, NULL, storage); + if (unlikely(prev_storage)) { + bpf_selem_unlink_map(first_selem); + err = -EAGAIN; + goto uncharge; + + /* Note that even first_selem was linked to smap's + * bucket->list, first_selem can be freed immediately + * (instead of kfree_rcu) because + * bpf_local_storage_map_free() does a + * synchronize_rcu() before walking the bucket->list. + * Hence, no one is accessing selem from the + * bucket->list under rcu_read_lock(). + */ + } + + return 0; + +uncharge: + kfree(storage); + mem_uncharge(smap, owner, sizeof(*storage)); + return err; +} + +/* sk cannot be going away because it is linking new elem + * to sk->sk_bpf_storage. (i.e. sk->sk_refcnt cannot be 0). + * Otherwise, it will become a leak (and other memory issues + * during map destruction). + */ +struct bpf_local_storage_data * +bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap, + void *value, u64 map_flags) +{ + struct bpf_local_storage_data *old_sdata = NULL; + struct bpf_local_storage_elem *selem; + struct bpf_local_storage *local_storage; + int err; + + /* BPF_EXIST and BPF_NOEXIST cannot be both set */ + if (unlikely((map_flags & ~BPF_F_LOCK) > BPF_EXIST) || + /* BPF_F_LOCK can only be used in a value with spin_lock */ + unlikely((map_flags & BPF_F_LOCK) && + !map_value_has_spin_lock(&smap->map))) + return ERR_PTR(-EINVAL); + + local_storage = rcu_dereference(*owner_storage(smap, owner)); + if (!local_storage || hlist_empty(&local_storage->list)) { + /* Very first elem for the owner */ + err = check_flags(NULL, map_flags); + if (err) + return ERR_PTR(err); + + selem = bpf_selem_alloc(smap, owner, value, true); + if (!selem) + return ERR_PTR(-ENOMEM); + + err = bpf_local_storage_alloc(owner, smap, selem); + if (err) { + kfree(selem); + mem_uncharge(smap, owner, smap->elem_size); + return ERR_PTR(err); + } + + return SDATA(selem); + } + + if ((map_flags & BPF_F_LOCK) && !(map_flags & BPF_NOEXIST)) { + /* Hoping to find an old_sdata to do inline update + * such that it can avoid taking the local_storage->lock + * and changing the lists. + */ + old_sdata = + bpf_local_storage_lookup(local_storage, smap, false); + err = check_flags(old_sdata, map_flags); + if (err) + return ERR_PTR(err); + if (old_sdata && selem_linked_to_storage(SELEM(old_sdata))) { + copy_map_value_locked(&smap->map, old_sdata->data, + value, false); + return old_sdata; + } + } + + raw_spin_lock_bh(&local_storage->lock); + + /* Recheck local_storage->list under local_storage->lock */ + if (unlikely(hlist_empty(&local_storage->list))) { + /* A parallel del is happening and local_storage is going + * away. It has just been checked before, so very + * unlikely. Return instead of retry to keep things + * simple. + */ + err = -EAGAIN; + goto unlock_err; + } + + old_sdata = bpf_local_storage_lookup(local_storage, smap, false); + err = check_flags(old_sdata, map_flags); + if (err) + goto unlock_err; + + if (old_sdata && (map_flags & BPF_F_LOCK)) { + copy_map_value_locked(&smap->map, old_sdata->data, value, + false); + selem = SELEM(old_sdata); + goto unlock; + } + + /* local_storage->lock is held. Hence, we are sure + * we can unlink and uncharge the old_sdata successfully + * later. Hence, instead of charging the new selem now + * and then uncharge the old selem later (which may cause + * a potential but unnecessary charge failure), avoid taking + * a charge at all here (the "!old_sdata" check) and the + * old_sdata will not be uncharged later during + * bpf_selem_unlink_storage_nolock(). + */ + selem = bpf_selem_alloc(smap, owner, value, !old_sdata); + if (!selem) { + err = -ENOMEM; + goto unlock_err; + } + + /* First, link the new selem to the map */ + bpf_selem_link_map(smap, selem); + + /* Second, link (and publish) the new selem to local_storage */ + bpf_selem_link_storage_nolock(local_storage, selem); + + /* Third, remove old selem, SELEM(old_sdata) */ + if (old_sdata) { + bpf_selem_unlink_map(SELEM(old_sdata)); + bpf_selem_unlink_storage_nolock(local_storage, SELEM(old_sdata), + false); + } + +unlock: + raw_spin_unlock_bh(&local_storage->lock); + return SDATA(selem); + +unlock_err: + raw_spin_unlock_bh(&local_storage->lock); + return ERR_PTR(err); +} + +u16 bpf_local_storage_cache_idx_get(struct bpf_local_storage_cache *cache) +{ + u64 min_usage = U64_MAX; + u16 i, res = 0; + + spin_lock(&cache->idx_lock); + + for (i = 0; i < BPF_LOCAL_STORAGE_CACHE_SIZE; i++) { + if (cache->idx_usage_counts[i] < min_usage) { + min_usage = cache->idx_usage_counts[i]; + res = i; + + /* Found a free cache_idx */ + if (!min_usage) + break; + } + } + cache->idx_usage_counts[res]++; + + spin_unlock(&cache->idx_lock); + + return res; +} + +void bpf_local_storage_cache_idx_free(struct bpf_local_storage_cache *cache, + u16 idx) +{ + spin_lock(&cache->idx_lock); + cache->idx_usage_counts[idx]--; + spin_unlock(&cache->idx_lock); +} + +void bpf_local_storage_map_free(struct bpf_local_storage_map *smap) +{ + struct bpf_local_storage_elem *selem; + struct bpf_local_storage_map_bucket *b; + unsigned int i; + + /* Note that this map might be concurrently cloned from + * bpf_sk_storage_clone. Wait for any existing bpf_sk_storage_clone + * RCU read section to finish before proceeding. New RCU + * read sections should be prevented via bpf_map_inc_not_zero. + */ + synchronize_rcu(); + + /* bpf prog and the userspace can no longer access this map + * now. No new selem (of this map) can be added + * to the owner->storage or to the map bucket's list. + * + * The elem of this map can be cleaned up here + * or when the storage is freed e.g. + * by bpf_sk_storage_free() during __sk_destruct(). + */ + for (i = 0; i < (1U << smap->bucket_log); i++) { + b = &smap->buckets[i]; + + rcu_read_lock(); + /* No one is adding to b->list now */ + while ((selem = hlist_entry_safe( + rcu_dereference_raw(hlist_first_rcu(&b->list)), + struct bpf_local_storage_elem, map_node))) { + bpf_selem_unlink(selem); + cond_resched_rcu(); + } + rcu_read_unlock(); + } + + /* While freeing the storage we may still need to access the map. + * + * e.g. when bpf_sk_storage_free() has unlinked selem from the map + * which then made the above while((selem = ...)) loop + * exit immediately. + * + * However, while freeing the storage one still needs to access the + * smap->elem_size to do the uncharging in + * bpf_selem_unlink_storage_nolock(). + * + * Hence, wait another rcu grace period for the storage to be freed. + */ + synchronize_rcu(); + + kvfree(smap->buckets); + kfree(smap); +} + +int bpf_local_storage_map_alloc_check(union bpf_attr *attr) +{ + if (attr->map_flags & ~BPF_LOCAL_STORAGE_CREATE_FLAG_MASK || + !(attr->map_flags & BPF_F_NO_PREALLOC) || + attr->max_entries || + attr->key_size != sizeof(int) || !attr->value_size || + /* Enforce BTF for userspace sk dumping */ + !attr->btf_key_type_id || !attr->btf_value_type_id) + return -EINVAL; + + if (!bpf_capable()) + return -EPERM; + + if (attr->value_size > BPF_LOCAL_STORAGE_MAX_VALUE_SIZE) + return -E2BIG; + + return 0; +} + +struct bpf_local_storage_map *bpf_local_storage_map_alloc(union bpf_attr *attr) +{ + struct bpf_local_storage_map *smap; + unsigned int i; + u32 nbuckets; + u64 cost; + int ret; + + smap = kzalloc(sizeof(*smap), GFP_USER | __GFP_NOWARN); + if (!smap) + return ERR_PTR(-ENOMEM); + bpf_map_init_from_attr(&smap->map, attr); + + nbuckets = roundup_pow_of_two(num_possible_cpus()); + /* Use at least 2 buckets, select_bucket() is undefined behavior with 1 bucket */ + nbuckets = max_t(u32, 2, nbuckets); + smap->bucket_log = ilog2(nbuckets); + cost = sizeof(*smap->buckets) * nbuckets + sizeof(*smap); + + ret = bpf_map_charge_init(&smap->map.memory, cost); + if (ret < 0) { + kfree(smap); + return ERR_PTR(ret); + } + + smap->buckets = kvcalloc(sizeof(*smap->buckets), nbuckets, + GFP_USER | __GFP_NOWARN); + if (!smap->buckets) { + bpf_map_charge_finish(&smap->map.memory); + kfree(smap); + return ERR_PTR(-ENOMEM); + } + + for (i = 0; i < nbuckets; i++) { + INIT_HLIST_HEAD(&smap->buckets[i].list); + raw_spin_lock_init(&smap->buckets[i].lock); + } + + smap->elem_size = + sizeof(struct bpf_local_storage_elem) + attr->value_size; + + return smap; +} + +int bpf_local_storage_map_check_btf(const struct bpf_map *map, + const struct btf *btf, + const struct btf_type *key_type, + const struct btf_type *value_type) +{ + u32 int_data; + + if (BTF_INFO_KIND(key_type->info) != BTF_KIND_INT) + return -EINVAL; + + int_data = *(u32 *)(key_type + 1); + if (BTF_INT_BITS(int_data) != 32 || BTF_INT_OFFSET(int_data)) + return -EINVAL; + + return 0; +} diff --git a/net/core/bpf_sk_storage.c b/net/core/bpf_sk_storage.c index cd8b7017913b..f29d9a9b4ea4 100644 --- a/net/core/bpf_sk_storage.c +++ b/net/core/bpf_sk_storage.c @@ -7,97 +7,14 @@ #include #include #include +#include #include #include #include #include -#define BPF_LOCAL_STORAGE_CREATE_FLAG_MASK (BPF_F_NO_PREALLOC | BPF_F_CLONE) - DEFINE_BPF_STORAGE_CACHE(sk_cache); -struct bpf_local_storage_map_bucket { - struct hlist_head list; - raw_spinlock_t lock; -}; - -/* Thp map is not the primary owner of a bpf_local_storage_elem. - * Instead, the container object (eg. sk->sk_bpf_storage) is. - * - * The map (bpf_local_storage_map) is for two purposes - * 1. Define the size of the "local storage". It is - * the map's value_size. - * - * 2. Maintain a list to keep track of all elems such - * that they can be cleaned up during the map destruction. - * - * When a bpf local storage is being looked up for a - * particular object, the "bpf_map" pointer is actually used - * as the "key" to search in the list of elem in - * the respective bpf_local_storage owned by the object. - * - * e.g. sk->sk_bpf_storage is the mini-map with the "bpf_map" pointer - * as the searching key. - */ -struct bpf_local_storage_map { - struct bpf_map map; - /* Lookup elem does not require accessing the map. - * - * Updating/Deleting requires a bucket lock to - * link/unlink the elem from the map. Having - * multiple buckets to improve contention. - */ - struct bpf_local_storage_map_bucket *buckets; - u32 bucket_log; - u16 elem_size; - u16 cache_idx; -}; - -struct bpf_local_storage_data { - /* smap is used as the searching key when looking up - * from the object's bpf_local_storage. - * - * Put it in the same cacheline as the data to minimize - * the number of cachelines access during the cache hit case. - */ - struct bpf_local_storage_map __rcu *smap; - u8 data[] __aligned(8); -}; - -/* Linked to bpf_local_storage and bpf_local_storage_map */ -struct bpf_local_storage_elem { - struct hlist_node map_node; /* Linked to bpf_local_storage_map */ - struct hlist_node snode; /* Linked to bpf_local_storage */ - struct bpf_local_storage __rcu *local_storage; - struct rcu_head rcu; - /* 8 bytes hole */ - /* The data is stored in aother cacheline to minimize - * the number of cachelines access during a cache hit. - */ - struct bpf_local_storage_data sdata ____cacheline_aligned; -}; - -#define SELEM(_SDATA) \ - container_of((_SDATA), struct bpf_local_storage_elem, sdata) -#define SDATA(_SELEM) (&(_SELEM)->sdata) - -struct bpf_local_storage { - struct bpf_local_storage_data __rcu *cache[BPF_LOCAL_STORAGE_CACHE_SIZE]; - struct hlist_head list; /* List of bpf_local_storage_elem */ - void *owner; /* The object that owns the above "list" of - * bpf_local_storage_elem. - */ - struct rcu_head rcu; - raw_spinlock_t lock; /* Protect adding/removing from the "list" */ -}; - -static struct bpf_local_storage_map_bucket * -select_bucket(struct bpf_local_storage_map *smap, - struct bpf_local_storage_elem *selem) -{ - return &smap->buckets[hash_ptr(selem, smap->bucket_log)]; -} - static int omem_charge(struct sock *sk, unsigned int size) { /* same check as in sock_kmalloc() */ @@ -110,223 +27,6 @@ static int omem_charge(struct sock *sk, unsigned int size) return -ENOMEM; } -static int mem_charge(struct bpf_local_storage_map *smap, void *owner, u32 size) -{ - struct bpf_map *map = &smap->map; - - if (!map->ops->map_local_storage_charge) - return 0; - - return map->ops->map_local_storage_charge(smap, owner, size); -} - -static void mem_uncharge(struct bpf_local_storage_map *smap, void *owner, - u32 size) -{ - struct bpf_map *map = &smap->map; - - if (map->ops->map_local_storage_uncharge) - map->ops->map_local_storage_uncharge(smap, owner, size); -} - -static struct bpf_local_storage __rcu ** -owner_storage(struct bpf_local_storage_map *smap, void *owner) -{ - struct bpf_map *map = &smap->map; - - return map->ops->map_owner_storage_ptr(owner); -} - -static bool selem_linked_to_storage(const struct bpf_local_storage_elem *selem) -{ - return !hlist_unhashed(&selem->snode); -} - -static bool selem_linked_to_map(const struct bpf_local_storage_elem *selem) -{ - return !hlist_unhashed(&selem->map_node); -} - -struct bpf_local_storage_elem * -bpf_selem_alloc(struct bpf_local_storage_map *smap, void *owner, - void *value, bool charge_mem) -{ - struct bpf_local_storage_elem *selem; - - if (charge_mem && mem_charge(smap, owner, smap->elem_size)) - return NULL; - - selem = kzalloc(smap->elem_size, GFP_ATOMIC | __GFP_NOWARN); - if (selem) { - if (value) - memcpy(SDATA(selem)->data, value, smap->map.value_size); - return selem; - } - - if (charge_mem) - mem_uncharge(smap, owner, smap->elem_size); - - return NULL; -} - -/* local_storage->lock must be held and selem->local_storage == local_storage. - * The caller must ensure selem->smap is still valid to be - * dereferenced for its smap->elem_size and smap->cache_idx. - */ -bool bpf_selem_unlink_storage_nolock(struct bpf_local_storage *local_storage, - struct bpf_local_storage_elem *selem, - bool uncharge_mem) -{ - struct bpf_local_storage_map *smap; - bool free_local_storage; - void *owner; - - smap = rcu_dereference(SDATA(selem)->smap); - owner = local_storage->owner; - - /* All uncharging on the owner must be done first. - * The owner may be freed once the last selem is unlinked - * from local_storage. - */ - if (uncharge_mem) - mem_uncharge(smap, owner, smap->elem_size); - - free_local_storage = hlist_is_singular_node(&selem->snode, - &local_storage->list); - if (free_local_storage) { - mem_uncharge(smap, owner, sizeof(struct bpf_local_storage)); - local_storage->owner = NULL; - - /* After this RCU_INIT, owner may be freed and cannot be used */ - RCU_INIT_POINTER(*owner_storage(smap, owner), NULL); - - /* local_storage is not freed now. local_storage->lock is - * still held and raw_spin_unlock_bh(&local_storage->lock) - * will be done by the caller. - * - * Although the unlock will be done under - * rcu_read_lock(), it is more intutivie to - * read if kfree_rcu(local_storage, rcu) is done - * after the raw_spin_unlock_bh(&local_storage->lock). - * - * Hence, a "bool free_local_storage" is returned - * to the caller which then calls the kfree_rcu() - * after unlock. - */ - } - hlist_del_init_rcu(&selem->snode); - if (rcu_access_pointer(local_storage->cache[smap->cache_idx]) == - SDATA(selem)) - RCU_INIT_POINTER(local_storage->cache[smap->cache_idx], NULL); - - kfree_rcu(selem, rcu); - - return free_local_storage; -} - -static void __bpf_selem_unlink_storage(struct bpf_local_storage_elem *selem) -{ - struct bpf_local_storage *local_storage; - bool free_local_storage = false; - - if (unlikely(!selem_linked_to_storage(selem))) - /* selem has already been unlinked from sk */ - return; - - local_storage = rcu_dereference(selem->local_storage); - raw_spin_lock_bh(&local_storage->lock); - if (likely(selem_linked_to_storage(selem))) - free_local_storage = bpf_selem_unlink_storage_nolock( - local_storage, selem, true); - raw_spin_unlock_bh(&local_storage->lock); - - if (free_local_storage) - kfree_rcu(local_storage, rcu); -} - -void bpf_selem_link_storage_nolock(struct bpf_local_storage *local_storage, - struct bpf_local_storage_elem *selem) -{ - RCU_INIT_POINTER(selem->local_storage, local_storage); - hlist_add_head(&selem->snode, &local_storage->list); -} - -void bpf_selem_unlink_map(struct bpf_local_storage_elem *selem) -{ - struct bpf_local_storage_map *smap; - struct bpf_local_storage_map_bucket *b; - - if (unlikely(!selem_linked_to_map(selem))) - /* selem has already be unlinked from smap */ - return; - - smap = rcu_dereference(SDATA(selem)->smap); - b = select_bucket(smap, selem); - raw_spin_lock_bh(&b->lock); - if (likely(selem_linked_to_map(selem))) - hlist_del_init_rcu(&selem->map_node); - raw_spin_unlock_bh(&b->lock); -} - -void bpf_selem_link_map(struct bpf_local_storage_map *smap, - struct bpf_local_storage_elem *selem) -{ - struct bpf_local_storage_map_bucket *b = select_bucket(smap, selem); - - raw_spin_lock_bh(&b->lock); - RCU_INIT_POINTER(SDATA(selem)->smap, smap); - hlist_add_head_rcu(&selem->map_node, &b->list); - raw_spin_unlock_bh(&b->lock); -} - -void bpf_selem_unlink(struct bpf_local_storage_elem *selem) -{ - /* Always unlink from map before unlinking from local_storage - * because selem will be freed after successfully unlinked from - * the local_storage. - */ - bpf_selem_unlink_map(selem); - __bpf_selem_unlink_storage(selem); -} - -struct bpf_local_storage_data * -bpf_local_storage_lookup(struct bpf_local_storage *local_storage, - struct bpf_local_storage_map *smap, - bool cacheit_lockit) -{ - struct bpf_local_storage_data *sdata; - struct bpf_local_storage_elem *selem; - - /* Fast path (cache hit) */ - sdata = rcu_dereference(local_storage->cache[smap->cache_idx]); - if (sdata && rcu_access_pointer(sdata->smap) == smap) - return sdata; - - /* Slow path (cache miss) */ - hlist_for_each_entry_rcu(selem, &local_storage->list, snode) - if (rcu_access_pointer(SDATA(selem)->smap) == smap) - break; - - if (!selem) - return NULL; - - sdata = SDATA(selem); - if (cacheit_lockit) { - /* spinlock is needed to avoid racing with the - * parallel delete. Otherwise, publishing an already - * deleted sdata to the cache will become a use-after-free - * problem in the next bpf_local_storage_lookup(). - */ - raw_spin_lock_bh(&local_storage->lock); - if (selem_linked_to_storage(selem)) - rcu_assign_pointer(local_storage->cache[smap->cache_idx], - sdata); - raw_spin_unlock_bh(&local_storage->lock); - } - - return sdata; -} - static struct bpf_local_storage_data * sk_storage_lookup(struct sock *sk, struct bpf_map *map, bool cacheit_lockit) { @@ -341,202 +41,6 @@ sk_storage_lookup(struct sock *sk, struct bpf_map *map, bool cacheit_lockit) return bpf_local_storage_lookup(sk_storage, smap, cacheit_lockit); } -static int check_flags(const struct bpf_local_storage_data *old_sdata, - u64 map_flags) -{ - if (old_sdata && (map_flags & ~BPF_F_LOCK) == BPF_NOEXIST) - /* elem already exists */ - return -EEXIST; - - if (!old_sdata && (map_flags & ~BPF_F_LOCK) == BPF_EXIST) - /* elem doesn't exist, cannot update it */ - return -ENOENT; - - return 0; -} - -int bpf_local_storage_alloc(void *owner, - struct bpf_local_storage_map *smap, - struct bpf_local_storage_elem *first_selem) -{ - struct bpf_local_storage *prev_storage, *storage; - struct bpf_local_storage **owner_storage_ptr; - int err; - - err = mem_charge(smap, owner, sizeof(*storage)); - if (err) - return err; - - storage = kzalloc(sizeof(*storage), GFP_ATOMIC | __GFP_NOWARN); - if (!storage) { - err = -ENOMEM; - goto uncharge; - } - - INIT_HLIST_HEAD(&storage->list); - raw_spin_lock_init(&storage->lock); - storage->owner = owner; - - bpf_selem_link_storage_nolock(storage, first_selem); - bpf_selem_link_map(smap, first_selem); - - owner_storage_ptr = - (struct bpf_local_storage **)owner_storage(smap, owner); - /* Publish storage to the owner. - * Instead of using any lock of the kernel object (i.e. owner), - * cmpxchg will work with any kernel object regardless what - * the running context is, bh, irq...etc. - * - * From now on, the owner->storage pointer (e.g. sk->sk_bpf_storage) - * is protected by the storage->lock. Hence, when freeing - * the owner->storage, the storage->lock must be held before - * setting owner->storage ptr to NULL. - */ - prev_storage = cmpxchg(owner_storage_ptr, NULL, storage); - if (unlikely(prev_storage)) { - bpf_selem_unlink_map(first_selem); - err = -EAGAIN; - goto uncharge; - - /* Note that even first_selem was linked to smap's - * bucket->list, first_selem can be freed immediately - * (instead of kfree_rcu) because - * bpf_local_storage_map_free() does a - * synchronize_rcu() before walking the bucket->list. - * Hence, no one is accessing selem from the - * bucket->list under rcu_read_lock(). - */ - } - - return 0; - -uncharge: - kfree(storage); - mem_uncharge(smap, owner, sizeof(*storage)); - return err; -} - -/* sk cannot be going away because it is linking new elem - * to sk->sk_bpf_storage. (i.e. sk->sk_refcnt cannot be 0). - * Otherwise, it will become a leak (and other memory issues - * during map destruction). - */ -struct bpf_local_storage_data * -bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap, - void *value, u64 map_flags) -{ - struct bpf_local_storage_data *old_sdata = NULL; - struct bpf_local_storage_elem *selem; - struct bpf_local_storage *local_storage; - int err; - - /* BPF_EXIST and BPF_NOEXIST cannot be both set */ - if (unlikely((map_flags & ~BPF_F_LOCK) > BPF_EXIST) || - /* BPF_F_LOCK can only be used in a value with spin_lock */ - unlikely((map_flags & BPF_F_LOCK) && - !map_value_has_spin_lock(&smap->map))) - return ERR_PTR(-EINVAL); - - local_storage = rcu_dereference(*owner_storage(smap, owner)); - if (!local_storage || hlist_empty(&local_storage->list)) { - /* Very first elem for the owner */ - err = check_flags(NULL, map_flags); - if (err) - return ERR_PTR(err); - - selem = bpf_selem_alloc(smap, owner, value, true); - if (!selem) - return ERR_PTR(-ENOMEM); - - err = bpf_local_storage_alloc(owner, smap, selem); - if (err) { - kfree(selem); - mem_uncharge(smap, owner, smap->elem_size); - return ERR_PTR(err); - } - - return SDATA(selem); - } - - if ((map_flags & BPF_F_LOCK) && !(map_flags & BPF_NOEXIST)) { - /* Hoping to find an old_sdata to do inline update - * such that it can avoid taking the local_storage->lock - * and changing the lists. - */ - old_sdata = - bpf_local_storage_lookup(local_storage, smap, false); - err = check_flags(old_sdata, map_flags); - if (err) - return ERR_PTR(err); - if (old_sdata && selem_linked_to_storage(SELEM(old_sdata))) { - copy_map_value_locked(&smap->map, old_sdata->data, - value, false); - return old_sdata; - } - } - - raw_spin_lock_bh(&local_storage->lock); - - /* Recheck local_storage->list under local_storage->lock */ - if (unlikely(hlist_empty(&local_storage->list))) { - /* A parallel del is happening and local_storage is going - * away. It has just been checked before, so very - * unlikely. Return instead of retry to keep things - * simple. - */ - err = -EAGAIN; - goto unlock_err; - } - - old_sdata = bpf_local_storage_lookup(local_storage, smap, false); - err = check_flags(old_sdata, map_flags); - if (err) - goto unlock_err; - - if (old_sdata && (map_flags & BPF_F_LOCK)) { - copy_map_value_locked(&smap->map, old_sdata->data, value, - false); - selem = SELEM(old_sdata); - goto unlock; - } - - /* local_storage->lock is held. Hence, we are sure - * we can unlink and uncharge the old_sdata successfully - * later. Hence, instead of charging the new selem now - * and then uncharge the old selem later (which may cause - * a potential but unnecessary charge failure), avoid taking - * a charge at all here (the "!old_sdata" check) and the - * old_sdata will not be uncharged later during - * bpf_selem_unlink_storage_nolock(). - */ - selem = bpf_selem_alloc(smap, owner, value, !old_sdata); - if (!selem) { - err = -ENOMEM; - goto unlock_err; - } - - /* First, link the new selem to the map */ - bpf_selem_link_map(smap, selem); - - /* Second, link (and publish) the new selem to local_storage */ - bpf_selem_link_storage_nolock(local_storage, selem); - - /* Third, remove old selem, SELEM(old_sdata) */ - if (old_sdata) { - bpf_selem_unlink_map(SELEM(old_sdata)); - bpf_selem_unlink_storage_nolock(local_storage, SELEM(old_sdata), - false); - } - -unlock: - raw_spin_unlock_bh(&local_storage->lock); - return SDATA(selem); - -unlock_err: - raw_spin_unlock_bh(&local_storage->lock); - return ERR_PTR(err); -} - static int sk_storage_delete(struct sock *sk, struct bpf_map *map) { struct bpf_local_storage_data *sdata; @@ -550,38 +54,6 @@ static int sk_storage_delete(struct sock *sk, struct bpf_map *map) return 0; } -u16 bpf_local_storage_cache_idx_get(struct bpf_local_storage_cache *cache) -{ - u64 min_usage = U64_MAX; - u16 i, res = 0; - - spin_lock(&cache->idx_lock); - - for (i = 0; i < BPF_LOCAL_STORAGE_CACHE_SIZE; i++) { - if (cache->idx_usage_counts[i] < min_usage) { - min_usage = cache->idx_usage_counts[i]; - res = i; - - /* Found a free cache_idx */ - if (!min_usage) - break; - } - } - cache->idx_usage_counts[res]++; - - spin_unlock(&cache->idx_lock); - - return res; -} - -void bpf_local_storage_cache_idx_free(struct bpf_local_storage_cache *cache, - u16 idx) -{ - spin_lock(&cache->idx_lock); - cache->idx_usage_counts[idx]--; - spin_unlock(&cache->idx_lock); -} - /* Called by __sk_destruct() & bpf_sk_storage_clone() */ void bpf_sk_storage_free(struct sock *sk) { @@ -622,59 +94,6 @@ void bpf_sk_storage_free(struct sock *sk) kfree_rcu(sk_storage, rcu); } -void bpf_local_storage_map_free(struct bpf_local_storage_map *smap) -{ - struct bpf_local_storage_elem *selem; - struct bpf_local_storage_map_bucket *b; - unsigned int i; - - /* Note that this map might be concurrently cloned from - * bpf_sk_storage_clone. Wait for any existing bpf_sk_storage_clone - * RCU read section to finish before proceeding. New RCU - * read sections should be prevented via bpf_map_inc_not_zero. - */ - synchronize_rcu(); - - /* bpf prog and the userspace can no longer access this map - * now. No new selem (of this map) can be added - * to the owner->storage or to the map bucket's list. - * - * The elem of this map can be cleaned up here - * or when the storage is freed e.g. - * by bpf_sk_storage_free() during __sk_destruct(). - */ - for (i = 0; i < (1U << smap->bucket_log); i++) { - b = &smap->buckets[i]; - - rcu_read_lock(); - /* No one is adding to b->list now */ - while ((selem = hlist_entry_safe( - rcu_dereference_raw(hlist_first_rcu(&b->list)), - struct bpf_local_storage_elem, map_node))) { - bpf_selem_unlink(selem); - cond_resched_rcu(); - } - rcu_read_unlock(); - } - - /* While freeing the storage we may still need to access the map. - * - * e.g. when bpf_sk_storage_free() has unlinked selem from the map - * which then made the above while((selem = ...)) loop - * exit immediately. - * - * However, while freeing the storage one still needs to access the - * smap->elem_size to do the uncharging in - * bpf_selem_unlink_storage_nolock(). - * - * Hence, wait another rcu grace period for the storage to be freed. - */ - synchronize_rcu(); - - kvfree(smap->buckets); - kfree(smap); -} - static void sk_storage_map_free(struct bpf_map *map) { struct bpf_local_storage_map *smap; @@ -684,78 +103,6 @@ static void sk_storage_map_free(struct bpf_map *map) bpf_local_storage_map_free(smap); } -/* U16_MAX is much more than enough for sk local storage - * considering a tcp_sock is ~2k. - */ -#define BPF_LOCAL_STORAGE_MAX_VALUE_SIZE \ - min_t(u32, \ - (KMALLOC_MAX_SIZE - MAX_BPF_STACK - \ - sizeof(struct bpf_local_storage_elem)), \ - (U16_MAX - sizeof(struct bpf_local_storage_elem))) - -int bpf_local_storage_map_alloc_check(union bpf_attr *attr) -{ - if (attr->map_flags & ~BPF_LOCAL_STORAGE_CREATE_FLAG_MASK || - !(attr->map_flags & BPF_F_NO_PREALLOC) || - attr->max_entries || - attr->key_size != sizeof(int) || !attr->value_size || - /* Enforce BTF for userspace sk dumping */ - !attr->btf_key_type_id || !attr->btf_value_type_id) - return -EINVAL; - - if (!bpf_capable()) - return -EPERM; - - if (attr->value_size > BPF_LOCAL_STORAGE_MAX_VALUE_SIZE) - return -E2BIG; - - return 0; -} - -struct bpf_local_storage_map *bpf_local_storage_map_alloc(union bpf_attr *attr) -{ - struct bpf_local_storage_map *smap; - unsigned int i; - u32 nbuckets; - u64 cost; - int ret; - - smap = kzalloc(sizeof(*smap), GFP_USER | __GFP_NOWARN); - if (!smap) - return ERR_PTR(-ENOMEM); - bpf_map_init_from_attr(&smap->map, attr); - - nbuckets = roundup_pow_of_two(num_possible_cpus()); - /* Use at least 2 buckets, select_bucket() is undefined behavior with 1 bucket */ - nbuckets = max_t(u32, 2, nbuckets); - smap->bucket_log = ilog2(nbuckets); - cost = sizeof(*smap->buckets) * nbuckets + sizeof(*smap); - - ret = bpf_map_charge_init(&smap->map.memory, cost); - if (ret < 0) { - kfree(smap); - return ERR_PTR(ret); - } - - smap->buckets = kvcalloc(sizeof(*smap->buckets), nbuckets, - GFP_USER | __GFP_NOWARN); - if (!smap->buckets) { - bpf_map_charge_finish(&smap->map.memory); - kfree(smap); - return ERR_PTR(-ENOMEM); - } - - for (i = 0; i < nbuckets; i++) { - INIT_HLIST_HEAD(&smap->buckets[i].list); - raw_spin_lock_init(&smap->buckets[i].lock); - } - - smap->elem_size = - sizeof(struct bpf_local_storage_elem) + attr->value_size; - - return smap; -} - static struct bpf_map *sk_storage_map_alloc(union bpf_attr *attr) { struct bpf_local_storage_map *smap; @@ -774,23 +121,6 @@ static int notsupp_get_next_key(struct bpf_map *map, void *key, return -ENOTSUPP; } -int bpf_local_storage_map_check_btf(const struct bpf_map *map, - const struct btf *btf, - const struct btf_type *key_type, - const struct btf_type *value_type) -{ - u32 int_data; - - if (BTF_INFO_KIND(key_type->info) != BTF_KIND_INT) - return -EINVAL; - - int_data = *(u32 *)(key_type + 1); - if (BTF_INT_BITS(int_data) != 32 || BTF_INT_OFFSET(int_data)) - return -EINVAL; - - return 0; -} - static void *bpf_fd_sk_storage_lookup_elem(struct bpf_map *map, void *key) { struct bpf_local_storage_data *sdata; From patchwork Tue Aug 25 18:29:17 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: KP Singh X-Patchwork-Id: 1351221 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Original-To: incoming-bpf@patchwork.ozlabs.org Delivered-To: patchwork-incoming-bpf@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org (client-ip=23.128.96.18; helo=vger.kernel.org; envelope-from=bpf-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=chromium.org Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=chromium.org header.i=@chromium.org header.a=rsa-sha256 header.s=google header.b=AS38meYt; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by ozlabs.org (Postfix) with ESMTP id 4BbcwW2bNXz9sTR for ; Wed, 26 Aug 2020 04:29:47 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726646AbgHYS3p (ORCPT ); Tue, 25 Aug 2020 14:29:45 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42718 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726656AbgHYS3g (ORCPT ); Tue, 25 Aug 2020 14:29:36 -0400 Received: from mail-ej1-x643.google.com (mail-ej1-x643.google.com [IPv6:2a00:1450:4864:20::643]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id BE33CC06179A for ; Tue, 25 Aug 2020 11:29:29 -0700 (PDT) Received: by mail-ej1-x643.google.com with SMTP id si26so17857954ejb.12 for ; Tue, 25 Aug 2020 11:29:29 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=chromium.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=79kQIEcK7yqmRoFZrIuhjzHkb1P77k+b1BhK/XVCBRU=; b=AS38meYtF6jHBflZ2Ny7hGF94mQ+4V3W07jPG8fbvXilGHBHGwkD6ByhbjbmVKmFvH B0g7FU/Jf/Mbo9h/6Xr/z56L+sNFqEm0dqQh7fpgTND2gGvonKW2h5essJFqaJn9AWQ7 c7wQ7LAPYHi+KfFXe3m1D7rO7SKAIUpRZSpSo= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=79kQIEcK7yqmRoFZrIuhjzHkb1P77k+b1BhK/XVCBRU=; b=dPTQmTWKnLR45aw50mlX18Yi4CM67oph4hOvHbHlBgb1VWmGxK/xyrxEqQrrkvq4no WxUFim2y+PHXQ0HLSCJWXWk7tvDF0B4A2V2u+wsqqhbtXBMCX+rmEyxiQYPqc8zyL01U 9LZZHYfElAkV9wZzZ+TzLFVFQv4CCwOzydxe8d5thdMZrMCLkd1Znp4CXT44eFa2Z82y T5lHZYH9NCKOMV0a3iM/4oydCo1492DErGVRHgUXwddkxo06G1s13OrcybWEtDATaDGm y7UizKbRRVPIx6nvhZLYK162WuQMm8hu+cr9n8WPnlx8TdS2SBuGCRlKoccD6fEHuou9 04zw== X-Gm-Message-State: AOAM532yRH1JhbhESy5TEswdf9OrJAWRKFc07W5QM0qX5cwC3sx67FET 7MXDx5kbC0F+dlUQ5RhJuLohJQ== X-Google-Smtp-Source: ABdhPJx4HfH5ol1cB0+55NcforHTJ5Bz0mWB/YRGU83lBeTUimrw8hLDufDozhZ6dzciYFejzrddug== X-Received: by 2002:a17:906:af41:: with SMTP id ly1mr11564742ejb.418.1598380168248; Tue, 25 Aug 2020 11:29:28 -0700 (PDT) Received: from kpsingh.zrh.corp.google.com ([81.6.44.51]) by smtp.gmail.com with ESMTPSA id dr21sm15323286ejc.112.2020.08.25.11.29.27 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 25 Aug 2020 11:29:27 -0700 (PDT) From: KP Singh To: linux-kernel@vger.kernel.org, bpf@vger.kernel.org, linux-security-module@vger.kernel.org Cc: Alexei Starovoitov , Daniel Borkmann , Martin KaFai Lau , Paul Turner , Jann Horn , Florent Revest Subject: [PATCH bpf-next v10 5/7] bpf: Implement bpf_local_storage for inodes Date: Tue, 25 Aug 2020 20:29:17 +0200 Message-Id: <20200825182919.1118197-6-kpsingh@chromium.org> X-Mailer: git-send-email 2.28.0.297.g1956fa8f8d-goog In-Reply-To: <20200825182919.1118197-1-kpsingh@chromium.org> References: <20200825182919.1118197-1-kpsingh@chromium.org> MIME-Version: 1.0 Sender: bpf-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org From: KP Singh Similar to bpf_local_storage for sockets, add local storage for inodes. The life-cycle of storage is managed with the life-cycle of the inode. i.e. the storage is destroyed along with the owning inode. The BPF LSM allocates an __rcu pointer to the bpf_local_storage in the security blob which are now stackable and can co-exist with other LSMs. Signed-off-by: KP Singh --- include/linux/bpf_lsm.h | 29 ++ include/linux/bpf_types.h | 3 + include/uapi/linux/bpf.h | 40 ++- kernel/bpf/Makefile | 1 + kernel/bpf/bpf_inode_storage.c | 273 ++++++++++++++++++ kernel/bpf/syscall.c | 3 +- kernel/bpf/verifier.c | 10 + security/bpf/hooks.c | 6 + .../bpf/bpftool/Documentation/bpftool-map.rst | 2 +- tools/bpf/bpftool/bash-completion/bpftool | 3 +- tools/bpf/bpftool/map.c | 3 +- tools/include/uapi/linux/bpf.h | 40 ++- tools/lib/bpf/libbpf_probes.c | 5 +- 13 files changed, 410 insertions(+), 8 deletions(-) create mode 100644 kernel/bpf/bpf_inode_storage.c diff --git a/include/linux/bpf_lsm.h b/include/linux/bpf_lsm.h index af74712af585..aaacb6aafc87 100644 --- a/include/linux/bpf_lsm.h +++ b/include/linux/bpf_lsm.h @@ -17,9 +17,28 @@ #include #undef LSM_HOOK +struct bpf_storage_blob { + struct bpf_local_storage __rcu *storage; +}; + +extern struct lsm_blob_sizes bpf_lsm_blob_sizes; + int bpf_lsm_verify_prog(struct bpf_verifier_log *vlog, const struct bpf_prog *prog); +static inline struct bpf_storage_blob *bpf_inode( + const struct inode *inode) +{ + if (unlikely(!inode->i_security)) + return NULL; + + return inode->i_security + bpf_lsm_blob_sizes.lbs_inode; +} + +extern const struct bpf_func_proto bpf_inode_storage_get_proto; +extern const struct bpf_func_proto bpf_inode_storage_delete_proto; +void bpf_inode_storage_free(struct inode *inode); + #else /* !CONFIG_BPF_LSM */ static inline int bpf_lsm_verify_prog(struct bpf_verifier_log *vlog, @@ -28,6 +47,16 @@ static inline int bpf_lsm_verify_prog(struct bpf_verifier_log *vlog, return -EOPNOTSUPP; } +static inline struct bpf_storage_blob *bpf_inode( + const struct inode *inode) +{ + return NULL; +} + +static inline void bpf_inode_storage_free(struct inode *inode) +{ +} + #endif /* CONFIG_BPF_LSM */ #endif /* _LINUX_BPF_LSM_H */ diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h index a52a5688418e..2e6f568377f1 100644 --- a/include/linux/bpf_types.h +++ b/include/linux/bpf_types.h @@ -107,6 +107,9 @@ BPF_MAP_TYPE(BPF_MAP_TYPE_SK_STORAGE, sk_storage_map_ops) BPF_MAP_TYPE(BPF_MAP_TYPE_SOCKMAP, sock_map_ops) BPF_MAP_TYPE(BPF_MAP_TYPE_SOCKHASH, sock_hash_ops) #endif +#ifdef CONFIG_BPF_LSM +BPF_MAP_TYPE(BPF_MAP_TYPE_INODE_STORAGE, inode_storage_map_ops) +#endif BPF_MAP_TYPE(BPF_MAP_TYPE_CPUMAP, cpu_map_ops) #if defined(CONFIG_XDP_SOCKETS) BPF_MAP_TYPE(BPF_MAP_TYPE_XSKMAP, xsk_map_ops) diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 2cbd137eed86..b6bfcd085a76 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -155,6 +155,7 @@ enum bpf_map_type { BPF_MAP_TYPE_DEVMAP_HASH, BPF_MAP_TYPE_STRUCT_OPS, BPF_MAP_TYPE_RINGBUF, + BPF_MAP_TYPE_INODE_STORAGE, }; /* Note that tracing related programs such as @@ -3509,6 +3510,41 @@ union bpf_attr { * * **-EPERM** This helper cannot be used under the * current sock_ops->op. + * void *bpf_inode_storage_get(struct bpf_map *map, void *inode, void *value, u64 flags) + * Description + * Get a bpf_local_storage from an *inode*. + * + * Logically, it could be thought of as getting the value from + * a *map* with *inode* as the **key**. From this + * perspective, the usage is not much different from + * **bpf_map_lookup_elem**\ (*map*, **&**\ *inode*) except this + * helper enforces the key must be an inode and the map must also + * be a **BPF_MAP_TYPE_INODE_STORAGE**. + * + * Underneath, the value is stored locally at *inode* instead of + * the *map*. The *map* is used as the bpf-local-storage + * "type". The bpf-local-storage "type" (i.e. the *map*) is + * searched against all bpf_local_storage residing at *inode*. + * + * An optional *flags* (**BPF_LOCAL_STORAGE_GET_F_CREATE**) can be + * used such that a new bpf_local_storage will be + * created if one does not exist. *value* can be used + * together with **BPF_LOCAL_STORAGE_GET_F_CREATE** to specify + * the initial value of a bpf_local_storage. If *value* is + * **NULL**, the new bpf_local_storage will be zero initialized. + * Return + * A bpf_local_storage pointer is returned on success. + * + * **NULL** if not found or there was an error in adding + * a new bpf_local_storage. + * + * int bpf_inode_storage_delete(struct bpf_map *map, void *inode) + * Description + * Delete a bpf_local_storage from an *inode*. + * Return + * 0 on success. + * + * **-ENOENT** if the bpf_local_storage cannot be found. */ #define __BPF_FUNC_MAPPER(FN) \ FN(unspec), \ @@ -3655,7 +3691,9 @@ union bpf_attr { FN(get_task_stack), \ FN(load_hdr_opt), \ FN(store_hdr_opt), \ - FN(reserve_hdr_opt), + FN(reserve_hdr_opt), \ + FN(inode_storage_get), \ + FN(inode_storage_delete), \ /* */ /* integer value in 'imm' field of BPF_CALL instruction selects which helper diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile index 6961ff400cba..bdc8cd1b6767 100644 --- a/kernel/bpf/Makefile +++ b/kernel/bpf/Makefile @@ -5,6 +5,7 @@ CFLAGS_core.o += $(call cc-disable-warning, override-init) obj-$(CONFIG_BPF_SYSCALL) += syscall.o verifier.o inode.o helpers.o tnum.o bpf_iter.o map_iter.o task_iter.o prog_iter.o obj-$(CONFIG_BPF_SYSCALL) += hashtab.o arraymap.o percpu_freelist.o bpf_lru_list.o lpm_trie.o map_in_map.o obj-$(CONFIG_BPF_SYSCALL) += local_storage.o queue_stack_maps.o ringbuf.o +obj-${CONFIG_BPF_LSM} += bpf_inode_storage.o obj-$(CONFIG_BPF_SYSCALL) += disasm.o obj-$(CONFIG_BPF_JIT) += trampoline.o obj-$(CONFIG_BPF_SYSCALL) += btf.o diff --git a/kernel/bpf/bpf_inode_storage.c b/kernel/bpf/bpf_inode_storage.c new file mode 100644 index 000000000000..f3a44e929447 --- /dev/null +++ b/kernel/bpf/bpf_inode_storage.c @@ -0,0 +1,273 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (c) 2019 Facebook + * Copyright 2020 Google LLC. + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +DEFINE_BPF_STORAGE_CACHE(inode_cache); + +static struct bpf_local_storage __rcu ** +inode_storage_ptr(void *owner) +{ + struct inode *inode = owner; + struct bpf_storage_blob *bsb; + + bsb = bpf_inode(inode); + if (!bsb) + return NULL; + return &bsb->storage; +} + +static struct bpf_local_storage_data *inode_storage_lookup(struct inode *inode, + struct bpf_map *map, + bool cacheit_lockit) +{ + struct bpf_local_storage *inode_storage; + struct bpf_local_storage_map *smap; + struct bpf_storage_blob *bsb; + + bsb = bpf_inode(inode); + if (!bsb) + return NULL; + + inode_storage = rcu_dereference(bsb->storage); + if (!inode_storage) + return NULL; + + smap = (struct bpf_local_storage_map *)map; + return bpf_local_storage_lookup(inode_storage, smap, cacheit_lockit); +} + +void bpf_inode_storage_free(struct inode *inode) +{ + struct bpf_local_storage_elem *selem; + struct bpf_local_storage *local_storage; + bool free_inode_storage = false; + struct bpf_storage_blob *bsb; + struct hlist_node *n; + + bsb = bpf_inode(inode); + if (!bsb) + return; + + rcu_read_lock(); + + local_storage = rcu_dereference(bsb->storage); + if (!local_storage) { + rcu_read_unlock(); + return; + } + + /* Netiher the bpf_prog nor the bpf-map's syscall + * could be modifying the local_storage->list now. + * Thus, no elem can be added-to or deleted-from the + * local_storage->list by the bpf_prog or by the bpf-map's syscall. + * + * It is racing with bpf_local_storage_map_free() alone + * when unlinking elem from the local_storage->list and + * the map's bucket->list. + */ + raw_spin_lock_bh(&local_storage->lock); + hlist_for_each_entry_safe(selem, n, &local_storage->list, snode) { + /* Always unlink from map before unlinking from + * local_storage. + */ + bpf_selem_unlink_map(selem); + free_inode_storage = bpf_selem_unlink_storage_nolock( + local_storage, selem, false); + } + raw_spin_unlock_bh(&local_storage->lock); + rcu_read_unlock(); + + /* free_inoode_storage should always be true as long as + * local_storage->list was non-empty. + */ + if (free_inode_storage) + kfree_rcu(local_storage, rcu); +} + +static void *bpf_fd_inode_storage_lookup_elem(struct bpf_map *map, void *key) +{ + struct bpf_local_storage_data *sdata; + struct file *f; + int fd; + + fd = *(int *)key; + f = fget_raw(fd); + if (!f) + return NULL; + + sdata = inode_storage_lookup(f->f_inode, map, true); + fput(f); + return sdata ? sdata->data : NULL; +} + +static int bpf_fd_inode_storage_update_elem(struct bpf_map *map, void *key, + void *value, u64 map_flags) +{ + struct bpf_local_storage_data *sdata; + struct file *f; + int fd; + + fd = *(int *)key; + f = fget_raw(fd); + if (!f || !inode_storage_ptr(f->f_inode)) + return -EBADF; + + sdata = bpf_local_storage_update(f->f_inode, + (struct bpf_local_storage_map *)map, + value, map_flags); + fput(f); + return PTR_ERR_OR_ZERO(sdata); +} + +static int inode_storage_delete(struct inode *inode, struct bpf_map *map) +{ + struct bpf_local_storage_data *sdata; + + sdata = inode_storage_lookup(inode, map, false); + if (!sdata) + return -ENOENT; + + bpf_selem_unlink(SELEM(sdata)); + + return 0; +} + +static int bpf_fd_inode_storage_delete_elem(struct bpf_map *map, void *key) +{ + struct file *f; + int fd, err; + + fd = *(int *)key; + f = fget_raw(fd); + if (!f) + return -EBADF; + + err = inode_storage_delete(f->f_inode, map); + fput(f); + return err; +} + +BPF_CALL_4(bpf_inode_storage_get, struct bpf_map *, map, struct inode *, inode, + void *, value, u64, flags) +{ + struct bpf_local_storage_data *sdata; + + if (flags & ~(BPF_LOCAL_STORAGE_GET_F_CREATE)) + return (unsigned long)NULL; + + /* explicitly check that the inode_storage_ptr is not + * NULL as inode_storage_lookup returns NULL in this case and + * bpf_local_storage_update expects the owner to have a + * valid storage pointer. + */ + if (!inode_storage_ptr(inode)) + return (unsigned long)NULL; + + sdata = inode_storage_lookup(inode, map, true); + if (sdata) + return (unsigned long)sdata->data; + + /* This helper must only called from where the inode is gurranteed + * to have a refcount and cannot be freed. + */ + if (flags & BPF_LOCAL_STORAGE_GET_F_CREATE) { + sdata = bpf_local_storage_update( + inode, (struct bpf_local_storage_map *)map, value, + BPF_NOEXIST); + return IS_ERR(sdata) ? (unsigned long)NULL : + (unsigned long)sdata->data; + } + + return (unsigned long)NULL; +} + +BPF_CALL_2(bpf_inode_storage_delete, + struct bpf_map *, map, struct inode *, inode) +{ + /* This helper must only called from where the inode is gurranteed + * to have a refcount and cannot be freed. + */ + return inode_storage_delete(inode, map); +} + +static int notsupp_get_next_key(struct bpf_map *map, void *key, + void *next_key) +{ + return -ENOTSUPP; +} + +static struct bpf_map *inode_storage_map_alloc(union bpf_attr *attr) +{ + struct bpf_local_storage_map *smap; + + smap = bpf_local_storage_map_alloc(attr); + if (IS_ERR(smap)) + return ERR_CAST(smap); + + smap->cache_idx = bpf_local_storage_cache_idx_get(&inode_cache); + return &smap->map; +} + +static void inode_storage_map_free(struct bpf_map *map) +{ + struct bpf_local_storage_map *smap; + + smap = (struct bpf_local_storage_map *)map; + bpf_local_storage_cache_idx_free(&inode_cache, smap->cache_idx); + bpf_local_storage_map_free(smap); +} + +static int inode_storage_map_btf_id; +const struct bpf_map_ops inode_storage_map_ops = { + .map_alloc_check = bpf_local_storage_map_alloc_check, + .map_alloc = inode_storage_map_alloc, + .map_free = inode_storage_map_free, + .map_get_next_key = notsupp_get_next_key, + .map_lookup_elem = bpf_fd_inode_storage_lookup_elem, + .map_update_elem = bpf_fd_inode_storage_update_elem, + .map_delete_elem = bpf_fd_inode_storage_delete_elem, + .map_check_btf = bpf_local_storage_map_check_btf, + .map_btf_name = "bpf_local_storage_map", + .map_btf_id = &inode_storage_map_btf_id, + .map_owner_storage_ptr = inode_storage_ptr, +}; + +BTF_ID_LIST(bpf_inode_storage_btf_ids) +BTF_ID_UNUSED +BTF_ID(struct, inode) + +const struct bpf_func_proto bpf_inode_storage_get_proto = { + .func = bpf_inode_storage_get, + .gpl_only = false, + .ret_type = RET_PTR_TO_MAP_VALUE_OR_NULL, + .arg1_type = ARG_CONST_MAP_PTR, + .arg2_type = ARG_PTR_TO_BTF_ID, + .arg3_type = ARG_PTR_TO_MAP_VALUE_OR_NULL, + .arg4_type = ARG_ANYTHING, + .btf_id = bpf_inode_storage_btf_ids, +}; + +const struct bpf_func_proto bpf_inode_storage_delete_proto = { + .func = bpf_inode_storage_delete, + .gpl_only = false, + .ret_type = RET_INTEGER, + .arg1_type = ARG_CONST_MAP_PTR, + .arg2_type = ARG_PTR_TO_BTF_ID, + .btf_id = bpf_inode_storage_btf_ids, +}; diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index b46e973faee9..5443cea86cef 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -769,7 +769,8 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf, if (map->map_type != BPF_MAP_TYPE_HASH && map->map_type != BPF_MAP_TYPE_ARRAY && map->map_type != BPF_MAP_TYPE_CGROUP_STORAGE && - map->map_type != BPF_MAP_TYPE_SK_STORAGE) + map->map_type != BPF_MAP_TYPE_SK_STORAGE && + map->map_type != BPF_MAP_TYPE_INODE_STORAGE) return -ENOTSUPP; if (map->spin_lock_off + sizeof(struct bpf_spin_lock) > map->value_size) { diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index dd24503ab3d3..38748794518e 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -4311,6 +4311,11 @@ static int check_map_func_compatibility(struct bpf_verifier_env *env, func_id != BPF_FUNC_sk_storage_delete) goto error; break; + case BPF_MAP_TYPE_INODE_STORAGE: + if (func_id != BPF_FUNC_inode_storage_get && + func_id != BPF_FUNC_inode_storage_delete) + goto error; + break; default: break; } @@ -4384,6 +4389,11 @@ static int check_map_func_compatibility(struct bpf_verifier_env *env, if (map->map_type != BPF_MAP_TYPE_SK_STORAGE) goto error; break; + case BPF_FUNC_inode_storage_get: + case BPF_FUNC_inode_storage_delete: + if (map->map_type != BPF_MAP_TYPE_INODE_STORAGE) + goto error; + break; default: break; } diff --git a/security/bpf/hooks.c b/security/bpf/hooks.c index 32d32d485451..788667d582ae 100644 --- a/security/bpf/hooks.c +++ b/security/bpf/hooks.c @@ -11,6 +11,7 @@ static struct security_hook_list bpf_lsm_hooks[] __lsm_ro_after_init = { LSM_HOOK_INIT(NAME, bpf_lsm_##NAME), #include #undef LSM_HOOK + LSM_HOOK_INIT(inode_free_security, bpf_inode_storage_free), }; static int __init bpf_lsm_init(void) @@ -20,7 +21,12 @@ static int __init bpf_lsm_init(void) return 0; } +struct lsm_blob_sizes bpf_lsm_blob_sizes __lsm_ro_after_init = { + .lbs_inode = sizeof(struct bpf_storage_blob), +}; + DEFINE_LSM(bpf) = { .name = "bpf", .init = bpf_lsm_init, + .blobs = &bpf_lsm_blob_sizes }; diff --git a/tools/bpf/bpftool/Documentation/bpftool-map.rst b/tools/bpf/bpftool/Documentation/bpftool-map.rst index 41e2a74252d0..083db6c2fc67 100644 --- a/tools/bpf/bpftool/Documentation/bpftool-map.rst +++ b/tools/bpf/bpftool/Documentation/bpftool-map.rst @@ -49,7 +49,7 @@ MAP COMMANDS | | **lru_percpu_hash** | **lpm_trie** | **array_of_maps** | **hash_of_maps** | | **devmap** | **devmap_hash** | **sockmap** | **cpumap** | **xskmap** | **sockhash** | | **cgroup_storage** | **reuseport_sockarray** | **percpu_cgroup_storage** -| | **queue** | **stack** | **sk_storage** | **struct_ops** | **ringbuf** } +| | **queue** | **stack** | **sk_storage** | **struct_ops** | **ringbuf** | **inode_storage** } DESCRIPTION =========== diff --git a/tools/bpf/bpftool/bash-completion/bpftool b/tools/bpf/bpftool/bash-completion/bpftool index f53ed2f1a4aa..7b68e3c0a5fb 100644 --- a/tools/bpf/bpftool/bash-completion/bpftool +++ b/tools/bpf/bpftool/bash-completion/bpftool @@ -704,7 +704,8 @@ _bpftool() lru_percpu_hash lpm_trie array_of_maps \ hash_of_maps devmap devmap_hash sockmap cpumap \ xskmap sockhash cgroup_storage reuseport_sockarray \ - percpu_cgroup_storage queue stack' -- \ + percpu_cgroup_storage queue stack sk_storage \ + struct_ops inode_storage' -- \ "$cur" ) ) return 0 ;; diff --git a/tools/bpf/bpftool/map.c b/tools/bpf/bpftool/map.c index 3a27d31a1856..bc0071228f88 100644 --- a/tools/bpf/bpftool/map.c +++ b/tools/bpf/bpftool/map.c @@ -50,6 +50,7 @@ const char * const map_type_name[] = { [BPF_MAP_TYPE_SK_STORAGE] = "sk_storage", [BPF_MAP_TYPE_STRUCT_OPS] = "struct_ops", [BPF_MAP_TYPE_RINGBUF] = "ringbuf", + [BPF_MAP_TYPE_INODE_STORAGE] = "inode_storage", }; const size_t map_type_name_size = ARRAY_SIZE(map_type_name); @@ -1442,7 +1443,7 @@ static int do_help(int argc, char **argv) " lru_percpu_hash | lpm_trie | array_of_maps | hash_of_maps |\n" " devmap | devmap_hash | sockmap | cpumap | xskmap | sockhash |\n" " cgroup_storage | reuseport_sockarray | percpu_cgroup_storage |\n" - " queue | stack | sk_storage | struct_ops | ringbuf }\n" + " queue | stack | sk_storage | struct_ops | ringbuf | inode_storage }\n" " " HELP_SPEC_OPTIONS "\n" "", bin_name, argv[-2]); diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h index 2cbd137eed86..b6bfcd085a76 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -155,6 +155,7 @@ enum bpf_map_type { BPF_MAP_TYPE_DEVMAP_HASH, BPF_MAP_TYPE_STRUCT_OPS, BPF_MAP_TYPE_RINGBUF, + BPF_MAP_TYPE_INODE_STORAGE, }; /* Note that tracing related programs such as @@ -3509,6 +3510,41 @@ union bpf_attr { * * **-EPERM** This helper cannot be used under the * current sock_ops->op. + * void *bpf_inode_storage_get(struct bpf_map *map, void *inode, void *value, u64 flags) + * Description + * Get a bpf_local_storage from an *inode*. + * + * Logically, it could be thought of as getting the value from + * a *map* with *inode* as the **key**. From this + * perspective, the usage is not much different from + * **bpf_map_lookup_elem**\ (*map*, **&**\ *inode*) except this + * helper enforces the key must be an inode and the map must also + * be a **BPF_MAP_TYPE_INODE_STORAGE**. + * + * Underneath, the value is stored locally at *inode* instead of + * the *map*. The *map* is used as the bpf-local-storage + * "type". The bpf-local-storage "type" (i.e. the *map*) is + * searched against all bpf_local_storage residing at *inode*. + * + * An optional *flags* (**BPF_LOCAL_STORAGE_GET_F_CREATE**) can be + * used such that a new bpf_local_storage will be + * created if one does not exist. *value* can be used + * together with **BPF_LOCAL_STORAGE_GET_F_CREATE** to specify + * the initial value of a bpf_local_storage. If *value* is + * **NULL**, the new bpf_local_storage will be zero initialized. + * Return + * A bpf_local_storage pointer is returned on success. + * + * **NULL** if not found or there was an error in adding + * a new bpf_local_storage. + * + * int bpf_inode_storage_delete(struct bpf_map *map, void *inode) + * Description + * Delete a bpf_local_storage from an *inode*. + * Return + * 0 on success. + * + * **-ENOENT** if the bpf_local_storage cannot be found. */ #define __BPF_FUNC_MAPPER(FN) \ FN(unspec), \ @@ -3655,7 +3691,9 @@ union bpf_attr { FN(get_task_stack), \ FN(load_hdr_opt), \ FN(store_hdr_opt), \ - FN(reserve_hdr_opt), + FN(reserve_hdr_opt), \ + FN(inode_storage_get), \ + FN(inode_storage_delete), \ /* */ /* integer value in 'imm' field of BPF_CALL instruction selects which helper diff --git a/tools/lib/bpf/libbpf_probes.c b/tools/lib/bpf/libbpf_probes.c index 010c9a76fd2b..5482a9b7ae2d 100644 --- a/tools/lib/bpf/libbpf_probes.c +++ b/tools/lib/bpf/libbpf_probes.c @@ -170,7 +170,7 @@ int libbpf__load_raw_btf(const char *raw_types, size_t types_len, return btf_fd; } -static int load_sk_storage_btf(void) +static int load_local_storage_btf(void) { const char strs[] = "\0bpf_spin_lock\0val\0cnt\0l"; /* struct bpf_spin_lock { @@ -229,12 +229,13 @@ bool bpf_probe_map_type(enum bpf_map_type map_type, __u32 ifindex) key_size = 0; break; case BPF_MAP_TYPE_SK_STORAGE: + case BPF_MAP_TYPE_INODE_STORAGE: btf_key_type_id = 1; btf_value_type_id = 3; value_size = 8; max_entries = 0; map_flags = BPF_F_NO_PREALLOC; - btf_fd = load_sk_storage_btf(); + btf_fd = load_local_storage_btf(); if (btf_fd < 0) return false; break; From patchwork Tue Aug 25 18:29:18 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: KP Singh X-Patchwork-Id: 1351224 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Original-To: incoming-bpf@patchwork.ozlabs.org Delivered-To: patchwork-incoming-bpf@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org (client-ip=23.128.96.18; helo=vger.kernel.org; envelope-from=bpf-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=chromium.org Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=chromium.org header.i=@chromium.org header.a=rsa-sha256 header.s=google header.b=in178nvC; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by ozlabs.org (Postfix) with ESMTP id 4Bbcwh2Qjnz9sTR for ; Wed, 26 Aug 2020 04:29:56 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726734AbgHYS3w (ORCPT ); Tue, 25 Aug 2020 14:29:52 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42722 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726673AbgHYS3g (ORCPT ); Tue, 25 Aug 2020 14:29:36 -0400 Received: from mail-ej1-x642.google.com (mail-ej1-x642.google.com [IPv6:2a00:1450:4864:20::642]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1DA07C0617A0 for ; Tue, 25 Aug 2020 11:29:31 -0700 (PDT) Received: by mail-ej1-x642.google.com with SMTP id l2so11822911eji.3 for ; Tue, 25 Aug 2020 11:29:31 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=chromium.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=d10PijIPuG0Iy1sT+yUZyIWJTUgD/5/pscV7dWlrwCs=; b=in178nvC/wWOGVHO8LUcePDeuaHSWE0dopDVO/ffY3s8OthG82JXPlnBj8RLjdt1Qt 0ypaEAUjtF1wbOEN9QvUyw96i7VvtuXL5DtdodzcH53cex1Xy+6YKLewuRBh0Br2RAUY yVst+DKxCxEzEiKsnSeVwxWCudoySEz3A6kyU= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=d10PijIPuG0Iy1sT+yUZyIWJTUgD/5/pscV7dWlrwCs=; b=tml+IuzrJsF7H2iQn9X7lN0iqrui5pCcDHjKng4ljTDFRpeOhXKEG4JTf/WwSvcTQZ ktL8c/I2Vg1TXHoOMCnQgrB5Tt2wVFpz2wacnEV3ROmGfxk4/ry97OW/ySWNqMxqo9N9 g3qTFAmQcoCrw9jjWs4BNzLiOrEdgzmOoVZpHn7rO5eByWgiUZ6UD+DQEdA0TjVt9ZkZ Xjv7diaeA/yvDp+YXptTdfKUlzN+OYqZz4o6YgdriSg5KiRPBvp7CZnIzF9Slq6Ko9mO ppyJse/IuUlLyZtSDlDPrgRHG1AtWpU00ZWOuU8/NpOOGrqGUyQ3I1JfafBrO86Txb57 ZqPw== X-Gm-Message-State: AOAM530t9KrXL0mVcERX/wZ7Mk5Mc1wrcYHVxf6TL2OZGVIY+3qqL276 lhiEgVpBXhZ5+yff6/L5m4vC0dp6GALXMw== X-Google-Smtp-Source: ABdhPJzpRkSaE0z8A85wKfv7Sqklv7tJuzfkaazGf1AfXeeuhe2y8k1c1zBOKZU4BiEkWGCdX+DKPw== X-Received: by 2002:a17:906:3a81:: with SMTP id y1mr11363532ejd.464.1598380169682; Tue, 25 Aug 2020 11:29:29 -0700 (PDT) Received: from kpsingh.zrh.corp.google.com ([81.6.44.51]) by smtp.gmail.com with ESMTPSA id dr21sm15323286ejc.112.2020.08.25.11.29.28 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 25 Aug 2020 11:29:28 -0700 (PDT) From: KP Singh To: linux-kernel@vger.kernel.org, bpf@vger.kernel.org, linux-security-module@vger.kernel.org Cc: Martin KaFai Lau , Alexei Starovoitov , Daniel Borkmann , Paul Turner , Jann Horn , Florent Revest Subject: [PATCH bpf-next v10 6/7] bpf: Allow local storage to be used from LSM programs Date: Tue, 25 Aug 2020 20:29:18 +0200 Message-Id: <20200825182919.1118197-7-kpsingh@chromium.org> X-Mailer: git-send-email 2.28.0.297.g1956fa8f8d-goog In-Reply-To: <20200825182919.1118197-1-kpsingh@chromium.org> References: <20200825182919.1118197-1-kpsingh@chromium.org> MIME-Version: 1.0 Sender: bpf-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org From: KP Singh Adds support for both bpf_{sk, inode}_storage_{get, delete} to be used in LSM programs. These helpers are not used for tracing programs (currently) as their usage is tied to the life-cycle of the object and should only be used where the owning object won't be freed (when the owning object is passed as an argument to the LSM hook). Thus, they are safer to use in LSM hooks than tracing. Usage of local storage in tracing programs will probably follow a per function based whitelist approach. Since the UAPI helper signature for bpf_sk_storage expect a bpf_sock, it, leads to a compilation warning for LSM programs, it's also updated to accept a void * pointer instead. Acked-by: Martin KaFai Lau Signed-off-by: KP Singh --- include/net/bpf_sk_storage.h | 2 ++ include/uapi/linux/bpf.h | 7 +++++-- kernel/bpf/bpf_lsm.c | 21 ++++++++++++++++++++- net/core/bpf_sk_storage.c | 25 +++++++++++++++++++++++++ tools/include/uapi/linux/bpf.h | 7 +++++-- 5 files changed, 57 insertions(+), 5 deletions(-) diff --git a/include/net/bpf_sk_storage.h b/include/net/bpf_sk_storage.h index 3c516dd07caf..119f4c9c3a9c 100644 --- a/include/net/bpf_sk_storage.h +++ b/include/net/bpf_sk_storage.h @@ -20,6 +20,8 @@ void bpf_sk_storage_free(struct sock *sk); extern const struct bpf_func_proto bpf_sk_storage_get_proto; extern const struct bpf_func_proto bpf_sk_storage_delete_proto; +extern const struct bpf_func_proto sk_storage_get_btf_proto; +extern const struct bpf_func_proto sk_storage_delete_btf_proto; struct bpf_local_storage_elem; struct bpf_sk_storage_diag; diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index b6bfcd085a76..0e1cdf806fe1 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -2808,7 +2808,7 @@ union bpf_attr { * * **-ERANGE** if resulting value was out of range. * - * void *bpf_sk_storage_get(struct bpf_map *map, struct bpf_sock *sk, void *value, u64 flags) + * void *bpf_sk_storage_get(struct bpf_map *map, void *sk, void *value, u64 flags) * Description * Get a bpf-local-storage from a *sk*. * @@ -2824,6 +2824,9 @@ union bpf_attr { * "type". The bpf-local-storage "type" (i.e. the *map*) is * searched against all bpf-local-storages residing at *sk*. * + * *sk* is a kernel **struct sock** pointer for LSM program. + * *sk* is a **struct bpf_sock** pointer for other program types. + * * An optional *flags* (**BPF_SK_STORAGE_GET_F_CREATE**) can be * used such that a new bpf-local-storage will be * created if one does not exist. *value* can be used @@ -2836,7 +2839,7 @@ union bpf_attr { * **NULL** if not found or there was an error in adding * a new bpf-local-storage. * - * long bpf_sk_storage_delete(struct bpf_map *map, struct bpf_sock *sk) + * long bpf_sk_storage_delete(struct bpf_map *map, void *sk) * Description * Delete a bpf-local-storage from a *sk*. * Return diff --git a/kernel/bpf/bpf_lsm.c b/kernel/bpf/bpf_lsm.c index fb278144e9fd..9cd1428c7199 100644 --- a/kernel/bpf/bpf_lsm.c +++ b/kernel/bpf/bpf_lsm.c @@ -11,6 +11,8 @@ #include #include #include +#include +#include /* For every LSM hook that allows attachment of BPF programs, declare a nop * function where a BPF program can be attached. @@ -45,10 +47,27 @@ int bpf_lsm_verify_prog(struct bpf_verifier_log *vlog, return 0; } +static const struct bpf_func_proto * +bpf_lsm_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog) +{ + switch (func_id) { + case BPF_FUNC_inode_storage_get: + return &bpf_inode_storage_get_proto; + case BPF_FUNC_inode_storage_delete: + return &bpf_inode_storage_delete_proto; + case BPF_FUNC_sk_storage_get: + return &sk_storage_get_btf_proto; + case BPF_FUNC_sk_storage_delete: + return &sk_storage_delete_btf_proto; + default: + return tracing_prog_func_proto(func_id, prog); + } +} + const struct bpf_prog_ops lsm_prog_ops = { }; const struct bpf_verifier_ops lsm_verifier_ops = { - .get_func_proto = tracing_prog_func_proto, + .get_func_proto = bpf_lsm_func_proto, .is_valid_access = btf_ctx_access, }; diff --git a/net/core/bpf_sk_storage.c b/net/core/bpf_sk_storage.c index f29d9a9b4ea4..55fae03b4cc3 100644 --- a/net/core/bpf_sk_storage.c +++ b/net/core/bpf_sk_storage.c @@ -12,6 +12,7 @@ #include #include #include +#include DEFINE_BPF_STORAGE_CACHE(sk_cache); @@ -377,6 +378,30 @@ const struct bpf_func_proto bpf_sk_storage_delete_proto = { .arg2_type = ARG_PTR_TO_SOCKET, }; +BTF_ID_LIST(sk_storage_btf_ids) +BTF_ID_UNUSED +BTF_ID(struct, sock) + +const struct bpf_func_proto sk_storage_get_btf_proto = { + .func = bpf_sk_storage_get, + .gpl_only = false, + .ret_type = RET_PTR_TO_MAP_VALUE_OR_NULL, + .arg1_type = ARG_CONST_MAP_PTR, + .arg2_type = ARG_PTR_TO_BTF_ID, + .arg3_type = ARG_PTR_TO_MAP_VALUE_OR_NULL, + .arg4_type = ARG_ANYTHING, + .btf_id = sk_storage_btf_ids, +}; + +const struct bpf_func_proto sk_storage_delete_btf_proto = { + .func = bpf_sk_storage_delete, + .gpl_only = false, + .ret_type = RET_INTEGER, + .arg1_type = ARG_CONST_MAP_PTR, + .arg2_type = ARG_PTR_TO_BTF_ID, + .btf_id = sk_storage_btf_ids, +}; + struct bpf_sk_storage_diag { u32 nr_maps; struct bpf_map *maps[]; diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h index b6bfcd085a76..0e1cdf806fe1 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -2808,7 +2808,7 @@ union bpf_attr { * * **-ERANGE** if resulting value was out of range. * - * void *bpf_sk_storage_get(struct bpf_map *map, struct bpf_sock *sk, void *value, u64 flags) + * void *bpf_sk_storage_get(struct bpf_map *map, void *sk, void *value, u64 flags) * Description * Get a bpf-local-storage from a *sk*. * @@ -2824,6 +2824,9 @@ union bpf_attr { * "type". The bpf-local-storage "type" (i.e. the *map*) is * searched against all bpf-local-storages residing at *sk*. * + * *sk* is a kernel **struct sock** pointer for LSM program. + * *sk* is a **struct bpf_sock** pointer for other program types. + * * An optional *flags* (**BPF_SK_STORAGE_GET_F_CREATE**) can be * used such that a new bpf-local-storage will be * created if one does not exist. *value* can be used @@ -2836,7 +2839,7 @@ union bpf_attr { * **NULL** if not found or there was an error in adding * a new bpf-local-storage. * - * long bpf_sk_storage_delete(struct bpf_map *map, struct bpf_sock *sk) + * long bpf_sk_storage_delete(struct bpf_map *map, void *sk) * Description * Delete a bpf-local-storage from a *sk*. * Return From patchwork Tue Aug 25 18:29:19 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: KP Singh X-Patchwork-Id: 1351226 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Original-To: incoming-bpf@patchwork.ozlabs.org Delivered-To: patchwork-incoming-bpf@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org (client-ip=23.128.96.18; helo=vger.kernel.org; envelope-from=bpf-owner@vger.kernel.org; receiver=) Authentication-Results: ozlabs.org; dmarc=pass (p=none dis=none) header.from=chromium.org Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=chromium.org header.i=@chromium.org header.a=rsa-sha256 header.s=google header.b=S4kmTq23; dkim-atps=neutral Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by ozlabs.org (Postfix) with ESMTP id 4BbcxN3ScNz9sTN for ; Wed, 26 Aug 2020 04:30:29 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726471AbgHYSa1 (ORCPT ); Tue, 25 Aug 2020 14:30:27 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42694 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726104AbgHYS3j (ORCPT ); Tue, 25 Aug 2020 14:29:39 -0400 Received: from mail-ej1-x641.google.com (mail-ej1-x641.google.com [IPv6:2a00:1450:4864:20::641]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 262E4C0617A5 for ; Tue, 25 Aug 2020 11:29:32 -0700 (PDT) Received: by mail-ej1-x641.google.com with SMTP id d11so17646288ejt.13 for ; Tue, 25 Aug 2020 11:29:32 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=chromium.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=UQsnRZlN0H9BcX2r4S5kq1SuETDS5jf7walwVksTH94=; b=S4kmTq23hn3c5x6dDrPPRADtdxouNbGaglISf+110wP0HCsA2GmLSuM0EkGQAuj24C VsZgLifucYVDpdlRkjKUkXm7FkGl9AiYwBe8YbC78zHDafg3TQpWS6XScocuXLNAdCZh 1PD/7HPrhVfrA8+1Jxx5JIFOYyeq0IxDQPs+0= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=UQsnRZlN0H9BcX2r4S5kq1SuETDS5jf7walwVksTH94=; b=V+8t+cw924nloZCmgfWybUMi+RLTVlxnMN7s9VBJOrP7HVMkfajhqnjIWEZkhlRbNv kJlvXyvhOM2hRn4YX04ODSovst6xUYpaKvdkA/vRMbkpEA+fR+Nr4WBBpOYB0wuU/4Os SrN0FUEhWplGBgCjrjJgzfLPbQ+8icOO9WBxxrR+JtgkMcftQGsfzyZfDJ8YDRs9oTSB hnBcwj4toGZLpqmpvnHDTTASNcBwXxPRwCI5YRTAHqFKSGNfaKFQIPUbCdMmIcouPidl OI/5zob/NXIhOqMCH3BD7KlZSc7daq9BV9PESjv6YmpBzrblmd56qJeq3R/RtAAQ5n49 cDSQ== X-Gm-Message-State: AOAM530eSxAqC6p6iOJ3iQjXNFlJ52GDRQF836o+mjOloKB/3mPe3D88 j7tLjV/YvArSE9hO+gyEEij+6WFCLfSX8A== X-Google-Smtp-Source: ABdhPJz1JfvNo65OJ9r9gQgOVA5SsP2Ub+ZB2uY2L7FHVSKJIso58vh+xWKQkANnkCKKXYRi/cDxMg== X-Received: by 2002:a17:906:4089:: with SMTP id u9mr12450371ejj.235.1598380170754; Tue, 25 Aug 2020 11:29:30 -0700 (PDT) Received: from kpsingh.zrh.corp.google.com ([81.6.44.51]) by smtp.gmail.com with ESMTPSA id dr21sm15323286ejc.112.2020.08.25.11.29.29 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 25 Aug 2020 11:29:30 -0700 (PDT) From: KP Singh To: linux-kernel@vger.kernel.org, bpf@vger.kernel.org, linux-security-module@vger.kernel.org Cc: Andrii Nakryiko , Alexei Starovoitov , Daniel Borkmann , Martin KaFai Lau , Paul Turner , Jann Horn , Florent Revest Subject: [PATCH bpf-next v10 7/7] bpf: Add selftests for local_storage Date: Tue, 25 Aug 2020 20:29:19 +0200 Message-Id: <20200825182919.1118197-8-kpsingh@chromium.org> X-Mailer: git-send-email 2.28.0.297.g1956fa8f8d-goog In-Reply-To: <20200825182919.1118197-1-kpsingh@chromium.org> References: <20200825182919.1118197-1-kpsingh@chromium.org> MIME-Version: 1.0 Sender: bpf-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org From: KP Singh inode_local_storage: * Hook to the file_open and inode_unlink LSM hooks. * Create and unlink a temporary file. * Store some information in the inode's bpf_local_storage during file_open. * Verify that this information exists when the file is unlinked. sk_local_storage: * Hook to the socket_post_create and socket_bind LSM hooks. * Open and bind a socket and set the sk_storage in the socket_post_create hook using the start_server helper. * Verify if the information is set in the socket_bind hook. Acked-by: Andrii Nakryiko Signed-off-by: KP Singh --- .../bpf/prog_tests/test_local_storage.c | 60 ++++++++ .../selftests/bpf/progs/local_storage.c | 140 ++++++++++++++++++ 2 files changed, 200 insertions(+) create mode 100644 tools/testing/selftests/bpf/prog_tests/test_local_storage.c create mode 100644 tools/testing/selftests/bpf/progs/local_storage.c diff --git a/tools/testing/selftests/bpf/prog_tests/test_local_storage.c b/tools/testing/selftests/bpf/prog_tests/test_local_storage.c new file mode 100644 index 000000000000..91cd6f357246 --- /dev/null +++ b/tools/testing/selftests/bpf/prog_tests/test_local_storage.c @@ -0,0 +1,60 @@ +// SPDX-License-Identifier: GPL-2.0 + +/* + * Copyright (C) 2020 Google LLC. + */ + +#include +#include + +#include "local_storage.skel.h" +#include "network_helpers.h" + +int create_and_unlink_file(void) +{ + char fname[PATH_MAX] = "/tmp/fileXXXXXX"; + int fd; + + fd = mkstemp(fname); + if (fd < 0) + return fd; + + close(fd); + unlink(fname); + return 0; +} + +void test_test_local_storage(void) +{ + struct local_storage *skel = NULL; + int err, duration = 0, serv_sk = -1; + + skel = local_storage__open_and_load(); + if (CHECK(!skel, "skel_load", "lsm skeleton failed\n")) + goto close_prog; + + err = local_storage__attach(skel); + if (CHECK(err, "attach", "lsm attach failed: %d\n", err)) + goto close_prog; + + skel->bss->monitored_pid = getpid(); + + err = create_and_unlink_file(); + if (CHECK(err < 0, "exec_cmd", "err %d errno %d\n", err, errno)) + goto close_prog; + + CHECK(skel->data->inode_storage_result != 0, "inode_storage_result", + "inode_local_storage not set\n"); + + serv_sk = start_server(AF_INET6, SOCK_STREAM, NULL, 0, 0); + if (CHECK(serv_sk < 0, "start_server", "failed to start server\n")) + goto close_prog; + + CHECK(skel->data->sk_storage_result != 0, "sk_storage_result", + "sk_local_storage not set\n"); + + close(serv_sk); + +close_prog: + local_storage__destroy(skel); +} diff --git a/tools/testing/selftests/bpf/progs/local_storage.c b/tools/testing/selftests/bpf/progs/local_storage.c new file mode 100644 index 000000000000..0758ba229ae0 --- /dev/null +++ b/tools/testing/selftests/bpf/progs/local_storage.c @@ -0,0 +1,140 @@ +// SPDX-License-Identifier: GPL-2.0 + +/* + * Copyright 2020 Google LLC. + */ + +#include +#include +#include +#include +#include + +char _license[] SEC("license") = "GPL"; + +#define DUMMY_STORAGE_VALUE 0xdeadbeef + +int monitored_pid = 0; +int inode_storage_result = -1; +int sk_storage_result = -1; + +struct dummy_storage { + __u32 value; +}; + +struct { + __uint(type, BPF_MAP_TYPE_INODE_STORAGE); + __uint(map_flags, BPF_F_NO_PREALLOC); + __type(key, int); + __type(value, struct dummy_storage); +} inode_storage_map SEC(".maps"); + +struct { + __uint(type, BPF_MAP_TYPE_SK_STORAGE); + __uint(map_flags, BPF_F_NO_PREALLOC | BPF_F_CLONE); + __type(key, int); + __type(value, struct dummy_storage); +} sk_storage_map SEC(".maps"); + +/* TODO Use vmlinux.h once BTF pruning for embedded types is fixed. + */ +struct sock {} __attribute__((preserve_access_index)); +struct sockaddr {} __attribute__((preserve_access_index)); +struct socket { + struct sock *sk; +} __attribute__((preserve_access_index)); + +struct inode {} __attribute__((preserve_access_index)); +struct dentry { + struct inode *d_inode; +} __attribute__((preserve_access_index)); +struct file { + struct inode *f_inode; +} __attribute__((preserve_access_index)); + + +SEC("lsm/inode_unlink") +int BPF_PROG(unlink_hook, struct inode *dir, struct dentry *victim) +{ + __u32 pid = bpf_get_current_pid_tgid() >> 32; + struct dummy_storage *storage; + + if (pid != monitored_pid) + return 0; + + storage = bpf_inode_storage_get(&inode_storage_map, victim->d_inode, 0, + BPF_SK_STORAGE_GET_F_CREATE); + if (!storage) + return 0; + + if (storage->value == DUMMY_STORAGE_VALUE) + inode_storage_result = -1; + + inode_storage_result = + bpf_inode_storage_delete(&inode_storage_map, victim->d_inode); + + return 0; +} + +SEC("lsm/socket_bind") +int BPF_PROG(socket_bind, struct socket *sock, struct sockaddr *address, + int addrlen) +{ + __u32 pid = bpf_get_current_pid_tgid() >> 32; + struct dummy_storage *storage; + + if (pid != monitored_pid) + return 0; + + storage = bpf_sk_storage_get(&sk_storage_map, sock->sk, 0, + BPF_SK_STORAGE_GET_F_CREATE); + if (!storage) + return 0; + + if (storage->value == DUMMY_STORAGE_VALUE) + sk_storage_result = -1; + + sk_storage_result = bpf_sk_storage_delete(&sk_storage_map, sock->sk); + return 0; +} + +SEC("lsm/socket_post_create") +int BPF_PROG(socket_post_create, struct socket *sock, int family, int type, + int protocol, int kern) +{ + __u32 pid = bpf_get_current_pid_tgid() >> 32; + struct dummy_storage *storage; + + if (pid != monitored_pid) + return 0; + + storage = bpf_sk_storage_get(&sk_storage_map, sock->sk, 0, + BPF_SK_STORAGE_GET_F_CREATE); + if (!storage) + return 0; + + storage->value = DUMMY_STORAGE_VALUE; + + return 0; +} + +SEC("lsm/file_open") +int BPF_PROG(file_open, struct file *file) +{ + __u32 pid = bpf_get_current_pid_tgid() >> 32; + struct dummy_storage *storage; + + if (pid != monitored_pid) + return 0; + + if (!file->f_inode) + return 0; + + storage = bpf_inode_storage_get(&inode_storage_map, file->f_inode, 0, + BPF_LOCAL_STORAGE_GET_F_CREATE); + if (!storage) + return 0; + + storage->value = DUMMY_STORAGE_VALUE; + return 0; +}