
[v4,bpf-next,3/5] bpf: Make cgroup storages shared across attaches on the same cgroup

Message ID f56279652110face35e35f2d4182da371cfe937c.1595274799.git.zhuyifei@google.com
State Changes Requested
Delegated to: BPF Maintainers
Series: Make BPF CGROUP_STORAGE map usable by different programs at once

Commit Message

YiFei Zhu July 20, 2020, 7:54 p.m. UTC
From: YiFei Zhu <zhuyifei@google.com>

This change comes in several parts:

First, the restriction that the CGROUP_STORAGE map can only be used
by one program is removed. This results in the removal of the field
'aux' in struct bpf_cgroup_storage_map, the code associated with
that field, and the now-noop functions bpf_free_cgroup_storage and
bpf_cgroup_storage_release.

Second, because multiple attach types can be attached to the same
cgroup, the attach type is completely ignored when comparing map
keys. Newly added keys have it zeroed. The only field in the key
that still matters is the cgroup inode id. bpftool map dump will
also show an attach type of zero.

Third, because the storages are now shared, the storages cannot
be unconditionally freed on program detach. There could be two
ways to solve this issue:
* A. Reference count the usage of the storages, and free when the
     last program is detached.
* B. Free only when the storage is impossible to be referred to
     again, i.e. when either the cgroup_bpf it is attached to, or
     the map itself, is freed.
Option A has the side effect that, when the user detaches and
reattaches a program, whether the program gets a fresh storage
depends on whether another attached program is still using that
storage. This could trigger races if the user is multi-threaded,
and since nondeterminism in data races is evil, go with option B.

Both the map and the cgroup_bpf now track their associated
storages, and the storage unlink and free are removed from
cgroup_bpf_detach and added to cgroup_bpf_release and
cgroup_storage_map_free. The latter now also holds the cgroup_mutex
to prevent any races with the former.

Fourth, on attach, we reuse the old storage if the key already
exists in the map, via cgroup_storage_lookup. If the storage
does not exist yet, we create a new one and publish it at the
last step of the attach process. This does not create a race
condition because the cgroup_mutex is held for the whole attach.
We keep track of the newly allocated storages in an array, and
if the attach fails, only those new storages are freed.
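
Condensed, the attach-time logic is (a sketch of
bpf_cgroup_storages_alloc(); see the diff below for the full version):

	key.cgroup_inode_id = cgroup_id(cgrp);
	key.attach_type = 0;

	for_each_cgroup_storage_type(stype) {
		map = prog->aux->cgroup_storage[stype];
		if (!map)
			continue;

		/* reuse an existing storage for this (map, cgroup) pair */
		storages[stype] = cgroup_storage_lookup((void *)map, &key, false);
		if (storages[stype])
			continue;

		/* otherwise allocate one, and remember it so that only the
		 * newly allocated storages are freed if the attach fails
		 */
		storages[stype] = bpf_cgroup_storage_alloc(prog, stype);
		if (IS_ERR(storages[stype])) {
			bpf_cgroup_storages_free(new_storages);
			return -ENOMEM;
		}
		new_storages[stype] = storages[stype];
	}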

Signed-off-by: YiFei Zhu <zhuyifei@google.com>
---
 include/linux/bpf-cgroup.h     | 15 ++++---
 include/uapi/linux/bpf.h       |  2 +-
 kernel/bpf/cgroup.c            | 69 ++++++++++++++++++--------------
 kernel/bpf/core.c              | 12 ------
 kernel/bpf/local_storage.c     | 73 ++++++++++++----------------------
 tools/include/uapi/linux/bpf.h |  2 +-
 6 files changed, 76 insertions(+), 97 deletions(-)

Comments

Martin KaFai Lau July 21, 2020, 6:05 p.m. UTC | #1
On Mon, Jul 20, 2020 at 02:54:53PM -0500, YiFei Zhu wrote:
> From: YiFei Zhu <zhuyifei@google.com>
> 
> This change comes in several parts:
> 
> One, the restriction that the CGROUP_STORAGE map can only be used
> by one program is removed. This results in the removal of the field
> 'aux' in struct bpf_cgroup_storage_map, and removal of relevant
> code associated with the field, and removal of now-noop functions
> bpf_free_cgroup_storage and bpf_cgroup_storage_release.
> 
> Second, because there could be multiple attach types to the same
> cgroup, the attach type is completely ignored on comparison in
> the map key. Newly added keys have it zeroed. The only value in
> the key that still matters is the cgroup inode. bpftool map dump
> will also show an attach type of zero.
I quickly checked: zero is actually BPF_CGROUP_INET_INGRESS, not an
UNSPEC value, for attach_type.  It will be confusing on the syscall
side, e.g. a map dump showing key.attach_type == BPF_CGROUP_INET_INGRESS
while the only cgroup program is actually attached to BPF_CGROUP_INET_EGRESS.

I don't have a clean way out for this.  Adding a non-zero UNSPEC
to "enum bpf_attach_type" seems wrong also.
One possible way out is to allow the bpf_cgroup_storage_map
to have a smaller key size (i.e. allow both sizeof(cgroup_inode_id)
and the existing sizeof(struct bpf_cgroup_storage_key)).  That
will completely remove attach_type from the picture if the user
specifies sizeof(cgroup_inode_id) as the key_size.
If sizeof(struct bpf_cgroup_storage_key) is specified, the attach_type
is still used in cmp().

bpf_cgroup_storage_key_cmp() needs to do the cmp accordingly.
The same changes are needed for lookup_elem, delete_elem, check_btf, map_alloc, etc.
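
A minimal sketch of a key_size-aware comparison (hypothetical, not part
of this patch; the point is only the branch on map->map.key_size):

	static int bpf_cgroup_storage_key_cmp(const struct bpf_cgroup_storage_map *map,
					      const void *_key1, const void *_key2)
	{
		if (map->map.key_size == sizeof(struct bpf_cgroup_storage_key)) {
			const struct bpf_cgroup_storage_key *key1 = _key1;
			const struct bpf_cgroup_storage_key *key2 = _key2;

			if (key1->cgroup_inode_id != key2->cgroup_inode_id)
				return key1->cgroup_inode_id < key2->cgroup_inode_id ? -1 : 1;
			if (key1->attach_type != key2->attach_type)
				return key1->attach_type < key2->attach_type ? -1 : 1;
		} else {	/* key_size == sizeof(__u64): cgroup id only */
			const __u64 *id1 = _key1, *id2 = _key2;

			if (*id1 != *id2)
				return *id1 < *id2 ? -1 : 1;
		}
		return 0;
	}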

[ ... ]

> diff --git a/kernel/bpf/local_storage.c b/kernel/bpf/local_storage.c
> index 51bd5a8cb01b..0b94b4ba99ba 100644
> --- a/kernel/bpf/local_storage.c
> +++ b/kernel/bpf/local_storage.c
> @@ -9,6 +9,8 @@
>  #include <linux/slab.h>
>  #include <uapi/linux/btf.h>
>  
> +#include "../cgroup/cgroup-internal.h"
Why?

Others LGTM.
YiFei Zhu July 21, 2020, 9:20 p.m. UTC | #2
On Tue, Jul 21, 2020 at 1:05 PM Martin KaFai Lau <kafai@fb.com> wrote:
> I quickly checked zero is actually BPF_CGROUP_INET_INGRESS instead
> of UNSPEC for attach_type.  It will be confusing on the syscall
> side. e.g. map dumping with key.attach_type == BPF_CGROUP_INET_INGRESS
> while the only cgroup program is actually attaching to BPF_CGROUP_INET_EGRESS.
>
> I don't have a clean way out for this.  Adding a non-zero UNSPEC
> to "enum bpf_attach_type" seems wrong also.
> One possible way out is to allow the bpf_cgroup_storage_map
> to have a smaller key size (i.e. allow both sizeof(cgroup_inode_id)
> and the existing sizeof(struct bpf_cgroup_storage_key).  That
> will completely remove attach_type from the picture if the user
> specified to use sizeof(cgroup_inode_id) as the key_size.
> If sizeof(struct bpf_cgroup_storage_key) is specified, the attach_type
> is still used in cmp().
>
> The bpf_cgroup_storage_key_cmp() need to do cmp accordingly.
> Same changes is needed to lookup_elem, delete_elem, check_btf, map_alloc...etc.

ACK. Considering that the cgroup_inode_id is the first field of struct
bpf_cgroup_storage_key, I can probably just use this fact and directly
use the user-provided map key (after it's copied to kernel space) and
cast the pointer to a __u64 - or are there any architectures where the
first field is not at offset zero? If this is done then we can change
all the kernel internal APIs to use __u64 cgroup_inode_id, ignoring
the existence of the struct aside from the map creation (size checking
and BTF checking).
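
For illustration only (a hypothetical helper, not in this patch): the
cast relies on the first member of a struct sitting at offset zero,
which the C standard guarantees.

	/* Read the user-supplied key as a bare cgroup id; works for both
	 * key layouts since cgroup_inode_id is the first member.
	 */
	static __u64 cgroup_storage_key_to_id(const void *key)
	{
		return *(const __u64 *)key;
	}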


> > +#include "../cgroup/cgroup-internal.h"
> Why?

cgroup_mutex is declared in this header.

YiFei Zhu
Martin KaFai Lau July 21, 2020, 10:41 p.m. UTC | #3
On Tue, Jul 21, 2020 at 04:20:00PM -0500, YiFei Zhu wrote:
> On Tue, Jul 21, 2020 at 1:05 PM Martin KaFai Lau <kafai@fb.com> wrote:
> > I quickly checked zero is actually BPF_CGROUP_INET_INGRESS instead
> > of UNSPEC for attach_type.  It will be confusing on the syscall
> > side. e.g. map dumping with key.attach_type == BPF_CGROUP_INET_INGRESS
> > while the only cgroup program is actually attaching to BPF_CGROUP_INET_EGRESS.
> >
> > I don't have a clean way out for this.  Adding a non-zero UNSPEC
> > to "enum bpf_attach_type" seems wrong also.
> > One possible way out is to allow the bpf_cgroup_storage_map
> > to have a smaller key size (i.e. allow both sizeof(cgroup_inode_id)
> > and the existing sizeof(struct bpf_cgroup_storage_key).  That
> > will completely remove attach_type from the picture if the user
> > specified to use sizeof(cgroup_inode_id) as the key_size.
> > If sizeof(struct bpf_cgroup_storage_key) is specified, the attach_type
> > is still used in cmp().
> >
> > The bpf_cgroup_storage_key_cmp() need to do cmp accordingly.
> > Same changes is needed to lookup_elem, delete_elem, check_btf, map_alloc...etc.
> 
> ACK. Considering that the cgroup_inode_id is the first field of struct
> bpf_cgroup_storage_key, I can probably just use this fact and directly
> use the user-provided map key (after it's copied to kernel space) and
> cast the pointer to a __u64 - or are there any architectures where the
> first field is not at offset zero? If this is done then we can change
> all the kernel internal APIs to use __u64 cgroup_inode_id, ignoring
> the existence of the struct aside from the map creation (size checking
> and BTF checking).
Just to be clear: the problem is that userspace expects
the whole key (cgroup_id, attach_type) to be meaningful and
we cannot stop supporting this existing key type now.

This patch essentially sets all attach_type to 0 to mean unused/UNSPEC/dont_care,
but 0 is actually BPF_CGROUP_INET_INGRESS in bpf's uapi.  "map dump" will be an
issue as mentioned earlier.  Also, as a key of a map,
it is logical for the user to assume that different attach_types have
different storage.  E.g. if the user looks up (cgroup_id, 1), what does it mean
to quietly return the storage created by (cgroup_id, 0/2/3/4/5)?

I think from the map's perspective, it makes more sense to
have a different "key-cmp" operation for each key type.  When the map
is created with key_size == (cgroup_id, attach_type), the "cmp"
will stay as is and also compare the attach_type.  Then the
storage will be sharable among the same cgroup and the
same attach_type.

Then add support for the new key_size == (cgroup_id), where the "cmp" will
only compare the "cgroup_id" as this patch does, so that the storage can be
shared among all attach_types of the same cgroup.

Does it make sense?
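
From the syscall side, the two key flavours would look roughly like
this (a sketch using libbpf's bpf_create_map(); the __u64-keyed form is
the proposed extension and is not accepted by the kernel as of this
patch):

	#include <linux/bpf.h>
	#include <bpf/bpf.h>

	static int create_storage_maps(void)
	{
		/* unique storage per (cgroup_id, attach_type): existing key */
		int per_type = bpf_create_map(BPF_MAP_TYPE_CGROUP_STORAGE,
					      sizeof(struct bpf_cgroup_storage_key),
					      sizeof(__u32), 0 /* max_entries */, 0);

		/* one storage shared by all attach types of a cgroup: proposed key */
		int per_cgroup = bpf_create_map(BPF_MAP_TYPE_CGROUP_STORAGE,
						sizeof(__u64) /* cgroup id only */,
						sizeof(__u32), 0 /* max_entries */, 0);

		return (per_type < 0 || per_cgroup < 0) ? -1 : 0;
	}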
Stanislav Fomichev July 21, 2020, 10:56 p.m. UTC | #4
On 07/21, Martin KaFai Lau wrote:
> On Tue, Jul 21, 2020 at 04:20:00PM -0500, YiFei Zhu wrote:
> > On Tue, Jul 21, 2020 at 1:05 PM Martin KaFai Lau <kafai@fb.com> wrote:
> > > I quickly checked zero is actually BPF_CGROUP_INET_INGRESS instead
> > > of UNSPEC for attach_type.  It will be confusing on the syscall
> > > side. e.g. map dumping with key.attach_type == BPF_CGROUP_INET_INGRESS
> > > while the only cgroup program is actually attaching to  
> BPF_CGROUP_INET_EGRESS.
> > >
> > > I don't have a clean way out for this.  Adding a non-zero UNSPEC
> > > to "enum bpf_attach_type" seems wrong also.
> > > One possible way out is to allow the bpf_cgroup_storage_map
> > > to have a smaller key size (i.e. allow both sizeof(cgroup_inode_id)
> > > and the existing sizeof(struct bpf_cgroup_storage_key).  That
> > > will completely remove attach_type from the picture if the user
> > > specified to use sizeof(cgroup_inode_id) as the key_size.
> > > If sizeof(struct bpf_cgroup_storage_key) is specified, the attach_type
> > > is still used in cmp().
> > >
> > > The bpf_cgroup_storage_key_cmp() need to do cmp accordingly.
> > > Same changes is needed to lookup_elem, delete_elem, check_btf,  
> map_alloc...etc.
> >
> > ACK. Considering that the cgroup_inode_id is the first field of struct
> > bpf_cgroup_storage_key, I can probably just use this fact and directly
> > use the user-provided map key (after it's copied to kernel space) and
> > cast the pointer to a __u64 - or are there any architectures where the
> > first field is not at offset zero? If this is done then we can change
> > all the kernel internal APIs to use __u64 cgroup_inode_id, ignoring
> > the existence of the struct aside from the map creation (size checking
> > and BTF checking).
> Just to be clear.  The problem is the userspace is expecting
> the whole key (cgroup_id, attach_type) to be meaningful and
> we cannot stop supporting this existing key type now.
To step back a bit. I think in the commit message we mentioned that
attach_type is essentially (mostly) meaningless right now.
If I want to share cgroup storage between BPF_CGROUP_INET_INGRESS and
BPF_CGROUP_INET_EGRESS, the verifier will refuse to load those programs.
So doing a lookup with a different attach_type for the same storage
shouldn't really happen.

Except. There is one use-case where it does make sense. If you take
the same loaded program and attach it to both ingress and egress, each
one will get a separate storage copy. And only in this case
attach_type really has any meaning.
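
For reference, that use-case looks like this (a sketch: one loaded
program, two hooks on the same cgroup):

	#include <bpf/bpf.h>

	static int attach_both_directions(int cgroup_fd, int prog_fd)
	{
		if (bpf_prog_attach(prog_fd, cgroup_fd, BPF_CGROUP_INET_INGRESS, 0))
			return -1;
		/* before this series, this second attach gets its own storage copy */
		return bpf_prog_attach(prog_fd, cgroup_fd, BPF_CGROUP_INET_EGRESS, 0);
	}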

But since more and more attach types started to require
expected_attach_type, it seems that the case where the same
prog is attached to two different hooks should be almost
non-existent. That's why we've decided to go with
the idea of completely ignoring it.

So the question is: is it really worth it trying to preserve
that obscure use-case (and have a more complex implementation), or can
we safely assume that nobody is doing that sort of thing in the wild (and
simplify the internal logic a bit)?
Martin KaFai Lau July 22, 2020, 12:09 a.m. UTC | #5
On Tue, Jul 21, 2020 at 03:56:36PM -0700, sdf@google.com wrote:
> On 07/21, Martin KaFai Lau wrote:
> > On Tue, Jul 21, 2020 at 04:20:00PM -0500, YiFei Zhu wrote:
> > > On Tue, Jul 21, 2020 at 1:05 PM Martin KaFai Lau <kafai@fb.com> wrote:
> > > > I quickly checked zero is actually BPF_CGROUP_INET_INGRESS instead
> > > > of UNSPEC for attach_type.  It will be confusing on the syscall
> > > > side. e.g. map dumping with key.attach_type == BPF_CGROUP_INET_INGRESS
> > > > while the only cgroup program is actually attaching to
> > BPF_CGROUP_INET_EGRESS.
> > > >
> > > > I don't have a clean way out for this.  Adding a non-zero UNSPEC
> > > > to "enum bpf_attach_type" seems wrong also.
> > > > One possible way out is to allow the bpf_cgroup_storage_map
> > > > to have a smaller key size (i.e. allow both sizeof(cgroup_inode_id)
> > > > and the existing sizeof(struct bpf_cgroup_storage_key).  That
> > > > will completely remove attach_type from the picture if the user
> > > > specified to use sizeof(cgroup_inode_id) as the key_size.
> > > > If sizeof(struct bpf_cgroup_storage_key) is specified, the attach_type
> > > > is still used in cmp().
> > > >
> > > > The bpf_cgroup_storage_key_cmp() need to do cmp accordingly.
> > > > Same changes is needed to lookup_elem, delete_elem, check_btf,
> > map_alloc...etc.
> > >
> > > ACK. Considering that the cgroup_inode_id is the first field of struct
> > > bpf_cgroup_storage_key, I can probably just use this fact and directly
> > > use the user-provided map key (after it's copied to kernel space) and
> > > cast the pointer to a __u64 - or are there any architectures where the
> > > first field is not at offset zero? If this is done then we can change
> > > all the kernel internal APIs to use __u64 cgroup_inode_id, ignoring
> > > the existence of the struct aside from the map creation (size checking
> > > and BTF checking).
> > Just to be clear.  The problem is the userspace is expecting
> > the whole key (cgroup_id, attach_type) to be meaningful and
> > we cannot stop supporting this existing key type now.
> To step back a bit. I think in the commit message we mentioned that
> attach_type is essentially (mostly) meaningless right now.
> If I want to share cgroup storage between BPF_CGROUP_INET_INGRESS and
> BPF_CGROUP_INET_EGRESS, the verifier will refuse to load those programs.
> So doing lookup with different attach_type for the same storage
> shouldn't really happen.
I think we are talking about the existing map->aux check in
bpf_cgroup_storage_assign()?  Right, the map is dedicated to
only one bpf prog, which usually has only one expected_attach_type.
Thus, the attach_type of the key is not really useful since it will
always be the same.  However,  it does not mean the exposed
attach_type (of the key) is meaningless or the value is invalid.
The exposed value is valid.

I am trying to say attach_type is still part of the key
and exposed to userspace (e.g. in map dump) but it is sort of
invalid after this change because I am not sure what that "0"
means now.

Also, as part of the key, it is intuitive for the user to think the storage is
unique per (cgroup, attach_type).  This uniqueness is always true now because
of the map->aux logic and is guaranteed by the verifier.  With this patch, this
"key" intuition is gone since the kernel quietly ignores the attach_type.

> 
> Except. There is one use-case where it does make sense. If you take
> the same loaded program and attach it to both ingress and egress, each
> one will get a separate storage copy. And only in this case
> attach_type really has any meaning.
>
> But since more and more attach types started to require
> expected_attach_type, it seems that the case where the same
> prog is attached to two different hooks should be almost
> non-existent. That's why we've decided to go with
> the idea of completely ignoring it.
> 
> So the question is: is it really worth it trying to preserve
> that obscure use-case (and have more complex implementation) or we can
> safely assume that nobody is doing that sort of thing in the wild (and
> can simplify the internal logic a bit)?
I think it is a reasonable expectation for map-dump to show a
meaningful attach_type if it is indeed part of the key.
YiFei Zhu July 22, 2020, 12:19 a.m. UTC | #6
On Tue, Jul 21, 2020 at 7:09 PM Martin KaFai Lau <kafai@fb.com> wrote:
> I think we are talking about the existing map->aux check in
> bpf_cgroup_storage_assign()?  Right, the map is dedicated for
> only one bpf prog which usually has only one expected_attach_type.
> Thus, the attach_type of the key is not really useful since it will
> always be the same.  However,  it does not mean the exposed
> attach_type (of the key) is meaningless or the value is invalid.
> The exposed value is valid.
>
> I am trying to say attach_type is still part of the key
> and exposed to userspace (e.g. in map dump) but it is sort of
> invalid after this change because I am not sure what that "0"
> means now.
>
> Also, as part of the key, it is intuitive for user to think the storage is
> unique per (cgroup, attach_type).  This uniqueness is always true now because
> of the map->aux logic and guaranteed by the verifier.  With this patch, this
> "key" intuition is gone where the kernel quietly ignore the attach_type.

Right. It is indeed non-intuitive for part of the "key" to be
completely ignored in cmp. However, what would a sane solution be
instead? I could, at attach time, use the attach type to also perform
the lookup, and store the attach type in the key, if and only if the
size of the key == sizeof(struct bpf_cgroup_storage_key). This storage
will not be shared across different attach types, only across
different programs of the same attach type. The lifetime will still
be bound to the (map, cgroup_bpf) pair, rather than to the program as
in the implementation prior to this patch.

YiFei Zhu
Martin KaFai Lau July 22, 2020, 12:49 a.m. UTC | #7
On Tue, Jul 21, 2020 at 07:19:31PM -0500, YiFei Zhu wrote:
> On Tue, Jul 21, 2020 at 7:09 PM Martin KaFai Lau <kafai@fb.com> wrote:
> > I think we are talking about the existing map->aux check in
> > bpf_cgroup_storage_assign()?  Right, the map is dedicated for
> > only one bpf prog which usually has only one expected_attach_type.
> > Thus, the attach_type of the key is not really useful since it will
> > always be the same.  However,  it does not mean the exposed
> > attach_type (of the key) is meaningless or the value is invalid.
> > The exposed value is valid.
> >
> > I am trying to say attach_type is still part of the key
> > and exposed to userspace (e.g. in map dump) but it is sort of
> > invalid after this change because I am not sure what that "0"
> > means now.
> >
> > Also, as part of the key, it is intuitive for user to think the storage is
> > unique per (cgroup, attach_type).  This uniqueness is always true now because
> > of the map->aux logic and guaranteed by the verifier.  With this patch, this
> > "key" intuition is gone where the kernel quietly ignore the attach_type.
> 
> Right. It is indeed non-intuitive for part of the "key" to be
> completely ignored in cmp. However, what would a sane solution be
> instead? I could at attach time, use the attach type to also perform
> the lookup, and store the attach type as the key, if and only if the
> size of the key == sizeof(struct cgroup_storage_key).
Yes, that makes sense.  cgroup_storage_lookup() has the map, which
has the key_size, so it can decide which part of the key to use
for the lookup.  bpf_cgroup_storages_alloc() does not need to know
the details; it just populates everything in bpf_cgroup_storage_key
and lets cgroup_storage_lookup() decide.

Thus, it should not be a big change to this patch.

> This storage
> will not be shared across different attach types, only across
> different programs of the same attach type. The lifetime will still
> bound to the (map, cgroup_bpf) pair, rather than the program, as the
> implementation prior to this patch did.
Right, the lifetime of the storage should be the same regardless of the key.

The major idea of this patch is to make storage sharable across
different bpf-progs, and this part does not change.
How it is shared is defined by the map's key.

The key of the map should behave like a normal map's key.
sharable among (cgroup_id, attach_type) or
sharable among (cgroup_id)

Then all lookup/update/map-dump will behave like a normal
map without an invalid value "0" (BPF_CGROUP_INET_INGRESS) in the attach_type.

Thus, I don't expect a big change in this patch.
Maybe restore the original bpf_cgroup_storage_key_cmp() behavior
and add a few key_size checks in the map_ops's lookup/update/check_btf, etc.
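
For example, the key_size acceptance could be a small helper called
from map_alloc (hypothetical sketch; the accepted size then steers
check_btf, lookup and the key cmp):

	static bool cgroup_storage_key_size_ok(u32 key_size)
	{
		return key_size == sizeof(struct bpf_cgroup_storage_key) ||
		       key_size == sizeof(__u64);	/* cgroup id only */
	}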

Patch

diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
index 2c6f26670acc..2d5a04c0f4be 100644
--- a/include/linux/bpf-cgroup.h
+++ b/include/linux/bpf-cgroup.h
@@ -46,7 +46,8 @@  struct bpf_cgroup_storage {
 	};
 	struct bpf_cgroup_storage_map *map;
 	struct bpf_cgroup_storage_key key;
-	struct list_head list;
+	struct list_head list_map;
+	struct list_head list_cg;
 	struct rb_node node;
 	struct rcu_head rcu;
 };
@@ -78,6 +79,9 @@  struct cgroup_bpf {
 	struct list_head progs[MAX_BPF_ATTACH_TYPE];
 	u32 flags[MAX_BPF_ATTACH_TYPE];
 
+	/* list of cgroup shared storages */
+	struct list_head storages;
+
 	/* temp storage for effective prog array used by prog_attach/detach */
 	struct bpf_prog_array *inactive;
 
@@ -161,15 +165,16 @@  static inline void bpf_cgroup_storage_set(struct bpf_cgroup_storage
 		this_cpu_write(bpf_cgroup_storage[stype], storage[stype]);
 }
 
+struct bpf_cgroup_storage *
+cgroup_storage_lookup(struct bpf_cgroup_storage_map *map,
+		      struct bpf_cgroup_storage_key *key, bool locked);
 struct bpf_cgroup_storage *bpf_cgroup_storage_alloc(struct bpf_prog *prog,
 					enum bpf_cgroup_storage_type stype);
 void bpf_cgroup_storage_free(struct bpf_cgroup_storage *storage);
 void bpf_cgroup_storage_link(struct bpf_cgroup_storage *storage,
-			     struct cgroup *cgroup,
-			     enum bpf_attach_type type);
+			     struct cgroup *cgroup);
 void bpf_cgroup_storage_unlink(struct bpf_cgroup_storage *storage);
 int bpf_cgroup_storage_assign(struct bpf_prog_aux *aux, struct bpf_map *map);
-void bpf_cgroup_storage_release(struct bpf_prog_aux *aux, struct bpf_map *map);
 
 int bpf_percpu_cgroup_storage_copy(struct bpf_map *map, void *key, void *value);
 int bpf_percpu_cgroup_storage_update(struct bpf_map *map, void *key,
@@ -383,8 +388,6 @@  static inline void bpf_cgroup_storage_set(
 	struct bpf_cgroup_storage *storage[MAX_BPF_CGROUP_STORAGE_TYPE]) {}
 static inline int bpf_cgroup_storage_assign(struct bpf_prog_aux *aux,
 					    struct bpf_map *map) { return 0; }
-static inline void bpf_cgroup_storage_release(struct bpf_prog_aux *aux,
-					      struct bpf_map *map) {}
 static inline struct bpf_cgroup_storage *bpf_cgroup_storage_alloc(
 	struct bpf_prog *prog, enum bpf_cgroup_storage_type stype) { return NULL; }
 static inline void bpf_cgroup_storage_free(
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 54d0c886e3ba..db93a211d2b1 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -78,7 +78,7 @@  struct bpf_lpm_trie_key {
 
 struct bpf_cgroup_storage_key {
 	__u64	cgroup_inode_id;	/* cgroup inode id */
-	__u32	attach_type;		/* program attach type */
+	__u32	attach_type;		/* program attach type, unused */
 };
 
 /* BPF syscall commands, see bpf(2) man-page for details. */
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index ac53102e244a..f91c554d6be6 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -37,17 +37,33 @@  static void bpf_cgroup_storages_free(struct bpf_cgroup_storage *storages[])
 }
 
 static int bpf_cgroup_storages_alloc(struct bpf_cgroup_storage *storages[],
-				     struct bpf_prog *prog)
+				     struct bpf_cgroup_storage *new_storages[],
+				     struct bpf_prog *prog,
+				     struct cgroup *cgrp)
 {
 	enum bpf_cgroup_storage_type stype;
+	struct bpf_cgroup_storage_key key;
+	struct bpf_map *map;
+
+	key.cgroup_inode_id = cgroup_id(cgrp);
+	key.attach_type = 0;
 
 	for_each_cgroup_storage_type(stype) {
+		map = prog->aux->cgroup_storage[stype];
+		if (!map)
+			continue;
+
+		storages[stype] = cgroup_storage_lookup((void *)map, &key, false);
+		if (storages[stype])
+			continue;
+
 		storages[stype] = bpf_cgroup_storage_alloc(prog, stype);
 		if (IS_ERR(storages[stype])) {
-			storages[stype] = NULL;
-			bpf_cgroup_storages_free(storages);
+			bpf_cgroup_storages_free(new_storages);
 			return -ENOMEM;
 		}
+
+		new_storages[stype] = storages[stype];
 	}
 
 	return 0;
@@ -63,21 +79,12 @@  static void bpf_cgroup_storages_assign(struct bpf_cgroup_storage *dst[],
 }
 
 static void bpf_cgroup_storages_link(struct bpf_cgroup_storage *storages[],
-				     struct cgroup* cgrp,
-				     enum bpf_attach_type attach_type)
-{
-	enum bpf_cgroup_storage_type stype;
-
-	for_each_cgroup_storage_type(stype)
-		bpf_cgroup_storage_link(storages[stype], cgrp, attach_type);
-}
-
-static void bpf_cgroup_storages_unlink(struct bpf_cgroup_storage *storages[])
+				     struct cgroup *cgrp)
 {
 	enum bpf_cgroup_storage_type stype;
 
 	for_each_cgroup_storage_type(stype)
-		bpf_cgroup_storage_unlink(storages[stype]);
+		bpf_cgroup_storage_link(storages[stype], cgrp);
 }
 
 /* Called when bpf_cgroup_link is auto-detached from dying cgroup.
@@ -101,22 +108,23 @@  static void cgroup_bpf_release(struct work_struct *work)
 	struct cgroup *p, *cgrp = container_of(work, struct cgroup,
 					       bpf.release_work);
 	struct bpf_prog_array *old_array;
+	struct list_head *storages = &cgrp->bpf.storages;
+	struct bpf_cgroup_storage *storage, *stmp;
+
 	unsigned int type;
 
 	mutex_lock(&cgroup_mutex);
 
 	for (type = 0; type < ARRAY_SIZE(cgrp->bpf.progs); type++) {
 		struct list_head *progs = &cgrp->bpf.progs[type];
-		struct bpf_prog_list *pl, *tmp;
+		struct bpf_prog_list *pl, *pltmp;
 
-		list_for_each_entry_safe(pl, tmp, progs, node) {
+		list_for_each_entry_safe(pl, pltmp, progs, node) {
 			list_del(&pl->node);
 			if (pl->prog)
 				bpf_prog_put(pl->prog);
 			if (pl->link)
 				bpf_cgroup_link_auto_detach(pl->link);
-			bpf_cgroup_storages_unlink(pl->storage);
-			bpf_cgroup_storages_free(pl->storage);
 			kfree(pl);
 			static_branch_dec(&cgroup_bpf_enabled_key);
 		}
@@ -126,6 +134,11 @@  static void cgroup_bpf_release(struct work_struct *work)
 		bpf_prog_array_free(old_array);
 	}
 
+	list_for_each_entry_safe(storage, stmp, storages, list_cg) {
+		bpf_cgroup_storage_unlink(storage);
+		bpf_cgroup_storage_free(storage);
+	}
+
 	mutex_unlock(&cgroup_mutex);
 
 	for (p = cgroup_parent(cgrp); p; p = cgroup_parent(p))
@@ -290,6 +303,8 @@  int cgroup_bpf_inherit(struct cgroup *cgrp)
 	for (i = 0; i < NR; i++)
 		INIT_LIST_HEAD(&cgrp->bpf.progs[i]);
 
+	INIT_LIST_HEAD(&cgrp->bpf.storages);
+
 	for (i = 0; i < NR; i++)
 		if (compute_effective_progs(cgrp, i, &arrays[i]))
 			goto cleanup;
@@ -422,7 +437,7 @@  int __cgroup_bpf_attach(struct cgroup *cgrp,
 	struct list_head *progs = &cgrp->bpf.progs[type];
 	struct bpf_prog *old_prog = NULL;
 	struct bpf_cgroup_storage *storage[MAX_BPF_CGROUP_STORAGE_TYPE] = {};
-	struct bpf_cgroup_storage *old_storage[MAX_BPF_CGROUP_STORAGE_TYPE] = {};
+	struct bpf_cgroup_storage *new_storage[MAX_BPF_CGROUP_STORAGE_TYPE] = {};
 	struct bpf_prog_list *pl;
 	int err;
 
@@ -455,17 +470,16 @@  int __cgroup_bpf_attach(struct cgroup *cgrp,
 	if (IS_ERR(pl))
 		return PTR_ERR(pl);
 
-	if (bpf_cgroup_storages_alloc(storage, prog ? : link->link.prog))
+	if (bpf_cgroup_storages_alloc(storage, new_storage,
+				      prog ? : link->link.prog, cgrp))
 		return -ENOMEM;
 
 	if (pl) {
 		old_prog = pl->prog;
-		bpf_cgroup_storages_unlink(pl->storage);
-		bpf_cgroup_storages_assign(old_storage, pl->storage);
 	} else {
 		pl = kmalloc(sizeof(*pl), GFP_KERNEL);
 		if (!pl) {
-			bpf_cgroup_storages_free(storage);
+			bpf_cgroup_storages_free(new_storage);
 			return -ENOMEM;
 		}
 		list_add_tail(&pl->node, progs);
@@ -480,12 +494,11 @@  int __cgroup_bpf_attach(struct cgroup *cgrp,
 	if (err)
 		goto cleanup;
 
-	bpf_cgroup_storages_free(old_storage);
 	if (old_prog)
 		bpf_prog_put(old_prog);
 	else
 		static_branch_inc(&cgroup_bpf_enabled_key);
-	bpf_cgroup_storages_link(pl->storage, cgrp, type);
+	bpf_cgroup_storages_link(new_storage, cgrp);
 	return 0;
 
 cleanup:
@@ -493,9 +506,7 @@  int __cgroup_bpf_attach(struct cgroup *cgrp,
 		pl->prog = old_prog;
 		pl->link = NULL;
 	}
-	bpf_cgroup_storages_free(pl->storage);
-	bpf_cgroup_storages_assign(pl->storage, old_storage);
-	bpf_cgroup_storages_link(pl->storage, cgrp, type);
+	bpf_cgroup_storages_free(new_storage);
 	if (!old_prog) {
 		list_del(&pl->node);
 		kfree(pl);
@@ -679,8 +690,6 @@  int __cgroup_bpf_detach(struct cgroup *cgrp, struct bpf_prog *prog,
 
 	/* now can actually delete it from this cgroup list */
 	list_del(&pl->node);
-	bpf_cgroup_storages_unlink(pl->storage);
-	bpf_cgroup_storages_free(pl->storage);
 	kfree(pl);
 	if (list_empty(progs))
 		/* last program was detached, reset flags to zero */
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 7be02e555ab9..bde93344164d 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -2097,24 +2097,12 @@  int bpf_prog_array_copy_info(struct bpf_prog_array *array,
 								     : 0;
 }
 
-static void bpf_free_cgroup_storage(struct bpf_prog_aux *aux)
-{
-	enum bpf_cgroup_storage_type stype;
-
-	for_each_cgroup_storage_type(stype) {
-		if (!aux->cgroup_storage[stype])
-			continue;
-		bpf_cgroup_storage_release(aux, aux->cgroup_storage[stype]);
-	}
-}
-
 void __bpf_free_used_maps(struct bpf_prog_aux *aux,
 			  struct bpf_map **used_maps, u32 len)
 {
 	struct bpf_map *map;
 	u32 i;
 
-	bpf_free_cgroup_storage(aux);
 	for (i = 0; i < len; i++) {
 		map = used_maps[i];
 		if (map->ops->map_poke_untrack)
diff --git a/kernel/bpf/local_storage.c b/kernel/bpf/local_storage.c
index 51bd5a8cb01b..0b94b4ba99ba 100644
--- a/kernel/bpf/local_storage.c
+++ b/kernel/bpf/local_storage.c
@@ -9,6 +9,8 @@ 
 #include <linux/slab.h>
 #include <uapi/linux/btf.h>
 
+#include "../cgroup/cgroup-internal.h"
+
 DEFINE_PER_CPU(struct bpf_cgroup_storage*, bpf_cgroup_storage[MAX_BPF_CGROUP_STORAGE_TYPE]);
 
 #ifdef CONFIG_CGROUP_BPF
@@ -20,7 +22,6 @@  struct bpf_cgroup_storage_map {
 	struct bpf_map map;
 
 	spinlock_t lock;
-	struct bpf_prog_aux *aux;
 	struct rb_root root;
 	struct list_head list;
 };
@@ -38,16 +39,12 @@  static int bpf_cgroup_storage_key_cmp(
 		return -1;
 	else if (key1->cgroup_inode_id > key2->cgroup_inode_id)
 		return 1;
-	else if (key1->attach_type < key2->attach_type)
-		return -1;
-	else if (key1->attach_type > key2->attach_type)
-		return 1;
 	return 0;
 }
 
-static struct bpf_cgroup_storage *cgroup_storage_lookup(
-	struct bpf_cgroup_storage_map *map, struct bpf_cgroup_storage_key *key,
-	bool locked)
+struct bpf_cgroup_storage *
+cgroup_storage_lookup(struct bpf_cgroup_storage_map *map,
+		      struct bpf_cgroup_storage_key *key, bool locked)
 {
 	struct rb_root *root = &map->root;
 	struct rb_node *node;
@@ -131,10 +128,7 @@  static int cgroup_storage_update_elem(struct bpf_map *map, void *_key,
 	struct bpf_cgroup_storage *storage;
 	struct bpf_storage_buffer *new;
 
-	if (unlikely(flags & ~(BPF_F_LOCK | BPF_EXIST | BPF_NOEXIST)))
-		return -EINVAL;
-
-	if (unlikely(flags & BPF_NOEXIST))
+	if (unlikely(flags & ~(BPF_F_LOCK | BPF_EXIST)))
 		return -EINVAL;
 
 	if (unlikely((flags & BPF_F_LOCK) &&
@@ -250,16 +244,15 @@  static int cgroup_storage_get_next_key(struct bpf_map *_map, void *_key,
 		if (!storage)
 			goto enoent;
 
-		storage = list_next_entry(storage, list);
+		storage = list_next_entry(storage, list_map);
 		if (!storage)
 			goto enoent;
 	} else {
 		storage = list_first_entry(&map->list,
-					 struct bpf_cgroup_storage, list);
+					 struct bpf_cgroup_storage, list_map);
 	}
 
 	spin_unlock_bh(&map->lock);
-	next->attach_type = storage->key.attach_type;
 	next->cgroup_inode_id = storage->key.cgroup_inode_id;
 	return 0;
 
@@ -318,6 +311,17 @@  static struct bpf_map *cgroup_storage_map_alloc(union bpf_attr *attr)
 static void cgroup_storage_map_free(struct bpf_map *_map)
 {
 	struct bpf_cgroup_storage_map *map = map_to_storage(_map);
+	struct list_head *storages = &map->list;
+	struct bpf_cgroup_storage *storage, *stmp;
+
+	mutex_lock(&cgroup_mutex);
+
+	list_for_each_entry_safe(storage, stmp, storages, list_map) {
+		bpf_cgroup_storage_unlink(storage);
+		bpf_cgroup_storage_free(storage);
+	}
+
+	mutex_unlock(&cgroup_mutex);
 
 	WARN_ON(!RB_EMPTY_ROOT(&map->root));
 	WARN_ON(!list_empty(&map->list));
@@ -426,38 +430,13 @@  const struct bpf_map_ops cgroup_storage_map_ops = {
 int bpf_cgroup_storage_assign(struct bpf_prog_aux *aux, struct bpf_map *_map)
 {
 	enum bpf_cgroup_storage_type stype = cgroup_storage_type(_map);
-	struct bpf_cgroup_storage_map *map = map_to_storage(_map);
-	int ret = -EBUSY;
-
-	spin_lock_bh(&map->lock);
 
-	if (map->aux && map->aux != aux)
-		goto unlock;
 	if (aux->cgroup_storage[stype] &&
 	    aux->cgroup_storage[stype] != _map)
-		goto unlock;
+		return -EBUSY;
 
-	map->aux = aux;
 	aux->cgroup_storage[stype] = _map;
-	ret = 0;
-unlock:
-	spin_unlock_bh(&map->lock);
-
-	return ret;
-}
-
-void bpf_cgroup_storage_release(struct bpf_prog_aux *aux, struct bpf_map *_map)
-{
-	enum bpf_cgroup_storage_type stype = cgroup_storage_type(_map);
-	struct bpf_cgroup_storage_map *map = map_to_storage(_map);
-
-	spin_lock_bh(&map->lock);
-	if (map->aux == aux) {
-		WARN_ON(aux->cgroup_storage[stype] != _map);
-		map->aux = NULL;
-		aux->cgroup_storage[stype] = NULL;
-	}
-	spin_unlock_bh(&map->lock);
+	return 0;
 }
 
 static size_t bpf_cgroup_storage_calculate_size(struct bpf_map *map, u32 *pages)
@@ -563,22 +542,21 @@  void bpf_cgroup_storage_free(struct bpf_cgroup_storage *storage)
 }
 
 void bpf_cgroup_storage_link(struct bpf_cgroup_storage *storage,
-			     struct cgroup *cgroup,
-			     enum bpf_attach_type type)
+			     struct cgroup *cgroup)
 {
 	struct bpf_cgroup_storage_map *map;
 
 	if (!storage)
 		return;
 
-	storage->key.attach_type = type;
 	storage->key.cgroup_inode_id = cgroup_id(cgroup);
 
 	map = storage->map;
 
 	spin_lock_bh(&map->lock);
 	WARN_ON(cgroup_storage_insert(map, storage));
-	list_add(&storage->list, &map->list);
+	list_add(&storage->list_map, &map->list);
+	list_add(&storage->list_cg, &cgroup->bpf.storages);
 	spin_unlock_bh(&map->lock);
 }
 
@@ -596,7 +574,8 @@  void bpf_cgroup_storage_unlink(struct bpf_cgroup_storage *storage)
 	root = &map->root;
 	rb_erase(&storage->node, root);
 
-	list_del(&storage->list);
+	list_del(&storage->list_map);
+	list_del(&storage->list_cg);
 	spin_unlock_bh(&map->lock);
 }
 
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 54d0c886e3ba..db93a211d2b1 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -78,7 +78,7 @@  struct bpf_lpm_trie_key {
 
 struct bpf_cgroup_storage_key {
 	__u64	cgroup_inode_id;	/* cgroup inode id */
-	__u32	attach_type;		/* program attach type */
+	__u32	attach_type;		/* program attach type, unused */
 };
 
 /* BPF syscall commands, see bpf(2) man-page for details. */