Message ID: 20200630003441.42616-2-alexei.starovoitov@gmail.com
State: Changes Requested
Delegated to: BPF Maintainers
Series: bpf: Introduce minimal support for sleepable progs
On Mon, Jun 29, 2020 at 5:35 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> From: Alexei Starovoitov <ast@kernel.org>
>
> bpf_free_used_maps() or close(map_fd) will trigger map_free callback.
> bpf_free_used_maps() is called after bpf prog is no longer executing:
> bpf_prog_put->call_rcu->bpf_prog_free->bpf_free_used_maps.
> Hence there is no need to call synchronize_rcu() to protect map elements.
>
> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> ---

Seems correct. And nice that maps don't have to care about this anymore.

Acked-by: Andrii Nakryiko <andriin@fb.com>

> kernel/bpf/arraymap.c | 9 ---------
> kernel/bpf/hashtab.c | 8 +++-----
> kernel/bpf/lpm_trie.c | 5 -----
> kernel/bpf/queue_stack_maps.c | 7 -------
> kernel/bpf/reuseport_array.c | 2 --
> kernel/bpf/ringbuf.c | 7 -------
> kernel/bpf/stackmap.c | 3 ---
> 7 files changed, 3 insertions(+), 38 deletions(-)
>
On Mon, Jun 29, 2020 at 5:58 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Mon, Jun 29, 2020 at 5:35 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > From: Alexei Starovoitov <ast@kernel.org>
> >
> > bpf_free_used_maps() or close(map_fd) will trigger map_free callback.
> > bpf_free_used_maps() is called after bpf prog is no longer executing:
> > bpf_prog_put->call_rcu->bpf_prog_free->bpf_free_used_maps.
> > Hence there is no need to call synchronize_rcu() to protect map elements.
> >
> > Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> > ---
>
> Seems correct. And nice that maps don't have to care about this anymore.

Actually, what about the map-in-map case?

What if you had an array-of-maps with an inner map element. It is the
last reference to that map. Now you have two BPF prog executions in
parallel. One looked up that inner map and is updating it at the
moment. Another execution at the same time deletes that map. That
deletion will call bpf_map_put(), which without synchronize_rcu() will
free memory. All the while the former BPF program execution is still
working with that map.

Or am I missing something that would prevent that?

> Acked-by: Andrii Nakryiko <andriin@fb.com>
>
> > kernel/bpf/arraymap.c | 9 ---------
> > kernel/bpf/hashtab.c | 8 +++-----
> > kernel/bpf/lpm_trie.c | 5 -----
> > kernel/bpf/queue_stack_maps.c | 7 -------
> > kernel/bpf/reuseport_array.c | 2 --
> > kernel/bpf/ringbuf.c | 7 -------
> > kernel/bpf/stackmap.c | 3 ---
> > 7 files changed, 3 insertions(+), 38 deletions(-)
> >
On Mon, Jun 29, 2020 at 06:08:48PM -0700, Andrii Nakryiko wrote:
> On Mon, Jun 29, 2020 at 5:58 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Mon, Jun 29, 2020 at 5:35 PM Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> > >
> > > From: Alexei Starovoitov <ast@kernel.org>
> > >
> > > bpf_free_used_maps() or close(map_fd) will trigger map_free callback.
> > > bpf_free_used_maps() is called after bpf prog is no longer executing:
> > > bpf_prog_put->call_rcu->bpf_prog_free->bpf_free_used_maps.
> > > Hence there is no need to call synchronize_rcu() to protect map elements.
> > >
> > > Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> > > ---
> >
> > Seems correct. And nice that maps don't have to care about this anymore.
>
> Actually, what about the map-in-map case?
>
> What if you had an array-of-maps with an inner map element. It is the
> last reference to that map. Now you have two BPF prog executions in
> parallel. One looked up that inner map and is updating it at the
> moment. Another execution at the same time deletes that map. That
> deletion will call bpf_map_put(), which without synchronize_rcu() will
> free memory. All the while the former BPF program execution is still
> working with that map.

The delete of that inner map can only be done via sys_bpf(), and there
we do maybe_wait_bpf_programs() exactly to avoid this kind of problem.
It's also necessary for user space. When the user is doing map_update/delete
of an inner map, as soon as the syscall returns the user can process the
old map with the guarantee that no bpf prog is touching the inner map.
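[The ordering Alexei describes can be sketched in userspace. This is a toy model with pthreads, not kernel code: `prog_execution`, `maybe_wait_bpf_programs_toy`, and all other names here are illustrative stand-ins, the atomics stand in for RCU read-side sections, and the busy-wait stands in for synchronize_rcu(). The point it demonstrates is that the syscall-side delete waits for every in-flight "program execution" to leave its critical section before the inner map's memory is freed.]

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static atomic_int readers_in_flight;  /* progs inside their RCU section */
static atomic_int reader_entered;
static atomic_int map_freed;

static void *prog_execution(void *arg)
{
	int *inner_map = arg;

	atomic_fetch_add(&readers_in_flight, 1);  /* rcu_read_lock() */
	atomic_store(&reader_entered, 1);
	*inner_map += 1;                          /* prog updates the inner map */
	usleep(10000);
	if (atomic_load(&map_freed))              /* must never fire */
		printf("use-after-free!\n");
	atomic_fetch_sub(&readers_in_flight, 1);  /* rcu_read_unlock() */
	return NULL;
}

/* stands in for the grace-period wait done on the sys_bpf() side */
static void maybe_wait_bpf_programs_toy(void)
{
	while (atomic_load(&readers_in_flight))
		usleep(1000);
}

int main(void)
{
	int *inner_map = malloc(sizeof(*inner_map));
	pthread_t prog;

	*inner_map = 0;
	pthread_create(&prog, NULL, prog_execution, inner_map);
	while (!atomic_load(&reader_entered))  /* the delete races with the prog */
		usleep(100);
	maybe_wait_bpf_programs_toy();         /* sys_bpf() waits first... */
	atomic_store(&map_freed, 1);           /* ...then the free happens */
	pthread_join(prog, NULL);
	free(inner_map);
	printf("inner map freed after all programs finished\n");
	return 0;
}

[Compile with -pthread. Because the wait happens on the syscall path rather than in map_free, the in-flight reader above can never observe the freed map.]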
On Mon, Jun 29, 2020 at 7:56 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Mon, Jun 29, 2020 at 06:08:48PM -0700, Andrii Nakryiko wrote:
> > On Mon, Jun 29, 2020 at 5:58 PM Andrii Nakryiko
> > <andrii.nakryiko@gmail.com> wrote:
> > >
> > > On Mon, Jun 29, 2020 at 5:35 PM Alexei Starovoitov
> > > <alexei.starovoitov@gmail.com> wrote:
> > > >
> > > > From: Alexei Starovoitov <ast@kernel.org>
> > > >
> > > > bpf_free_used_maps() or close(map_fd) will trigger map_free callback.
> > > > bpf_free_used_maps() is called after bpf prog is no longer executing:
> > > > bpf_prog_put->call_rcu->bpf_prog_free->bpf_free_used_maps.
> > > > Hence there is no need to call synchronize_rcu() to protect map elements.
> > > >
> > > > Signed-off-by: Alexei Starovoitov <ast@kernel.org>
> > > > ---
> > >
> > > Seems correct. And nice that maps don't have to care about this anymore.
> >
> > Actually, what about the map-in-map case?
> >
> > What if you had an array-of-maps with an inner map element. It is the
> > last reference to that map. Now you have two BPF prog executions in
> > parallel. One looked up that inner map and is updating it at the
> > moment. Another execution at the same time deletes that map. That
> > deletion will call bpf_map_put(), which without synchronize_rcu() will
> > free memory. All the while the former BPF program execution is still
> > working with that map.
>
> The delete of that inner map can only be done via sys_bpf(), and there
> we do maybe_wait_bpf_programs() exactly to avoid this kind of problem.
> It's also necessary for user space. When the user is doing map_update/delete
> of an inner map, as soon as the syscall returns the user can process the
> old map with the guarantee that no bpf prog is touching the inner map.

Ah, that's what I missed. I also constantly forget that map-in-map can't
be updated from the BPF side. Thanks!
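[The lifetime argument that settles the thread can also be modeled in userspace. This toy, single-threaded model is not the kernel's implementation: the refcounts and the one-slot "call_rcu" queue are stand-ins. It shows why map_free needs no grace-period wait of its own: it only runs from bpf_free_used_maps(), which the prog's RCU callback invokes after no prog can be executing, or from close(map_fd) dropping the last user reference.]

#include <stdio.h>
#include <stdlib.h>

struct map { int refcnt; const char *name; };
struct prog { int refcnt; struct map *used_map; };

/* one-slot "call_rcu" queue: its callback runs only after the grace
 * period, i.e. once every prog execution has finished */
static struct prog *rcu_pending;

static void map_put(struct map *map)
{
	if (--map->refcnt == 0) {
		/* the patch's point: no synchronize_rcu() needed here */
		printf("map_free(%s)\n", map->name);
		free(map);
	}
}

static void prog_put(struct prog *prog)
{
	if (--prog->refcnt == 0)
		rcu_pending = prog;  /* call_rcu(..., bpf_prog_free) */
}

static void rcu_grace_period_elapsed(void)  /* no prog is executing now */
{
	if (rcu_pending) {
		map_put(rcu_pending->used_map);  /* bpf_free_used_maps() */
		free(rcu_pending);
		rcu_pending = NULL;
	}
}

int main(void)
{
	struct map *m = malloc(sizeof(*m));
	struct prog *p = malloc(sizeof(*p));

	*m = (struct map){ .refcnt = 2, .name = "htab" };  /* fd ref + prog ref */
	*p = (struct prog){ .refcnt = 1, .used_map = m };

	map_put(m);                  /* close(map_fd): prog still holds the map */
	prog_put(p);                 /* last prog ref dropped... */
	rcu_grace_period_elapsed();  /* ...map freed only after the grace period */
	return 0;
}

[Either path to refcnt 0 already implies no program can still be reading the map, which is exactly why the per-map synchronize_rcu() calls below are redundant.]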
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index ec5cd11032aa..c66e8273fccd 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -386,13 +386,6 @@ static void array_map_free(struct bpf_map *map)
 {
 	struct bpf_array *array = container_of(map, struct bpf_array, map);
 
-	/* at this point bpf_prog->aux->refcnt == 0 and this map->refcnt == 0,
-	 * so the programs (can be more than one that used this map) were
-	 * disconnected from events. Wait for outstanding programs to complete
-	 * and free the array
-	 */
-	synchronize_rcu();
-
 	if (array->map.map_type == BPF_MAP_TYPE_PERCPU_ARRAY)
 		bpf_array_free_percpu(array);
@@ -546,8 +539,6 @@ static void fd_array_map_free(struct bpf_map *map)
 	struct bpf_array *array = container_of(map, struct bpf_array, map);
 	int i;
 
-	synchronize_rcu();
-
 	/* make sure it's empty */
 	for (i = 0; i < array->map.max_entries; i++)
 		BUG_ON(array->ptrs[i] != NULL);
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index acd06081d81d..d4378d7d442b 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -1290,12 +1290,10 @@ static void htab_map_free(struct bpf_map *map)
 {
 	struct bpf_htab *htab = container_of(map, struct bpf_htab, map);
 
-	/* at this point bpf_prog->aux->refcnt == 0 and this map->refcnt == 0,
-	 * so the programs (can be more than one that used this map) were
-	 * disconnected from events. Wait for outstanding critical sections in
-	 * these programs to complete
+	/* bpf_free_used_maps() or close(map_fd) will trigger this map_free callback.
+	 * bpf_free_used_maps() is called after bpf prog is no longer executing.
+	 * There is no need to synchronize_rcu() here to protect map elements.
 	 */
-	synchronize_rcu();
 
 	/* some of free_htab_elem() callbacks for elements of this map may
 	 * not have executed. Wait for them.
diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c
index 1abd4e3f906d..44474bf3ab7a 100644
--- a/kernel/bpf/lpm_trie.c
+++ b/kernel/bpf/lpm_trie.c
@@ -589,11 +589,6 @@ static void trie_free(struct bpf_map *map)
 	struct lpm_trie_node __rcu **slot;
 	struct lpm_trie_node *node;
 
-	/* Wait for outstanding programs to complete
-	 * update/lookup/delete/get_next_key and free the trie.
-	 */
-	synchronize_rcu();
-
 	/* Always start at the root and walk down to a node that has no
 	 * children. Then free that node, nullify its reference in the parent
 	 * and start over.
diff --git a/kernel/bpf/queue_stack_maps.c b/kernel/bpf/queue_stack_maps.c
index 80c66a6d7c54..44184f82916a 100644
--- a/kernel/bpf/queue_stack_maps.c
+++ b/kernel/bpf/queue_stack_maps.c
@@ -101,13 +101,6 @@ static void queue_stack_map_free(struct bpf_map *map)
 {
 	struct bpf_queue_stack *qs = bpf_queue_stack(map);
 
-	/* at this point bpf_prog->aux->refcnt == 0 and this map->refcnt == 0,
-	 * so the programs (can be more than one that used this map) were
-	 * disconnected from events. Wait for outstanding critical sections in
-	 * these programs to complete
-	 */
-	synchronize_rcu();
-
 	bpf_map_area_free(qs);
 }
diff --git a/kernel/bpf/reuseport_array.c b/kernel/bpf/reuseport_array.c
index a09922f656e4..3625c4fcc65c 100644
--- a/kernel/bpf/reuseport_array.c
+++ b/kernel/bpf/reuseport_array.c
@@ -96,8 +96,6 @@ static void reuseport_array_free(struct bpf_map *map)
 	struct sock *sk;
 	u32 i;
 
-	synchronize_rcu();
-
 	/*
 	 * ops->map_*_elem() will not be able to access this
 	 * array now. Hence, this function only races with
diff --git a/kernel/bpf/ringbuf.c b/kernel/bpf/ringbuf.c
index dbf37aff4827..13a8d3967e07 100644
--- a/kernel/bpf/ringbuf.c
+++ b/kernel/bpf/ringbuf.c
@@ -215,13 +215,6 @@ static void ringbuf_map_free(struct bpf_map *map)
 {
 	struct bpf_ringbuf_map *rb_map;
 
-	/* at this point bpf_prog->aux->refcnt == 0 and this map->refcnt == 0,
-	 * so the programs (can be more than one that used this map) were
-	 * disconnected from events. Wait for outstanding critical sections in
-	 * these programs to complete
-	 */
-	synchronize_rcu();
-
 	rb_map = container_of(map, struct bpf_ringbuf_map, map);
 	bpf_ringbuf_free(rb_map->rb);
 	kfree(rb_map);
diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index 27dc9b1b08a5..071f98d0f7c6 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -604,9 +604,6 @@ static void stack_map_free(struct bpf_map *map)
 {
 	struct bpf_stack_map *smap = container_of(map, struct bpf_stack_map, map);
 
-	/* wait for bpf programs to complete before freeing stack map */
-	synchronize_rcu();
-
 	bpf_map_area_free(smap->elems);
 	pcpu_freelist_destroy(&smap->freelist);
 	bpf_map_area_free(smap);
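[One distinction the hashtab hunk preserves is worth spelling out: dropping synchronize_rcu() removes the wait for *programs*, but htab_map_free must still wait for the deferred free_htab_elem() callbacks that earlier element deletes queued (the kernel uses rcu_barrier() for that). A toy, single-threaded userspace model, not kernel code; all names with a _toy suffix are illustrative:]

#include <stdio.h>
#include <stdlib.h>

#define MAX_PENDING 8

struct elem { int key; };

static struct elem *pending[MAX_PENDING];  /* queued free_htab_elem() work */
static int npending;

static void htab_delete_elem_toy(struct elem *e)
{
	pending[npending++] = e;  /* call_rcu(&e->rcu, free_htab_elem) */
}

static void rcu_barrier_toy(void)  /* wait for all queued callbacks to run */
{
	for (int i = 0; i < npending; i++)
		free(pending[i]);
	printf("drained %d deferred element frees\n", npending);
	npending = 0;
}

static void htab_map_free_toy(struct elem **elems, int n)
{
	/* no synchronize_rcu() anymore, but the deferred
	 * free_htab_elem() callbacks still have to finish
	 * before the table's own memory can go away */
	rcu_barrier_toy();
	for (int i = 0; i < n; i++)
		free(elems[i]);
	printf("htab freed\n");
}

int main(void)
{
	struct elem *live[2];
	struct elem *deleted = malloc(sizeof(*deleted));

	for (int i = 0; i < 2; i++) {
		live[i] = malloc(sizeof(struct elem));
		live[i]->key = i;
	}
	deleted->key = 99;
	htab_delete_elem_toy(deleted);  /* deleted earlier by a prog */
	htab_map_free_toy(live, 2);     /* map_free drains, then frees */
	return 0;
}

[Waiting for queued callbacks (rcu_barrier) and waiting for readers (synchronize_rcu) are separate obligations; the patch removes only the latter.]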