Message ID | 505e5dfeea6ab7dd3719bb9863fc50e7595e06ed.1557789256.git.daniel@iogearbox.net |
---|---|
State | Accepted |
Delegated to: | BPF Maintainers |
Headers | show |
Series | BPF LRU map fix | expand |
On Mon, May 13, 2019 at 4:20 PM Daniel Borkmann <daniel@iogearbox.net> wrote: > > Add a callback map_lookup_elem_sys_only() that map implementations > could use over map_lookup_elem() from system call side in case the > map implementation needs to handle the latter differently than from > the BPF data path. If map_lookup_elem_sys_only() is set, this will > be preferred pick for map lookups out of user space. This hook is This is kind of surprising behavior w/ preferred vs default lookup code path. Why the desired behavior can't be achieved with an extra flag, similar to BPF_F_LOCK? It seems like it will be more explicit, more extensible and more generic approach, avoiding duplication of lookup semantics. E.g., for LRU map, with flag on lookup, one can decide whether lookup from inside BPF program (not just from syscall side!) should modify LRU ordering or not, simply by specifying extra flag. Am I missing some complication that prevents us from doing it that way? > used in a follow-up fix for LRU map, but once development window > opens, we can convert other map types from map_lookup_elem() (here, > the one called upon BPF_MAP_LOOKUP_ELEM cmd is meant) over to use > the callback to simplify and clean up the latter. > > Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> > Acked-by: Martin KaFai Lau <kafai@fb.com> > --- > include/linux/bpf.h | 1 + > kernel/bpf/syscall.c | 5 ++++- > 2 files changed, 5 insertions(+), 1 deletion(-) > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h > index 59631dd..4fb3aa2 100644 > --- a/include/linux/bpf.h > +++ b/include/linux/bpf.h > @@ -36,6 +36,7 @@ struct bpf_map_ops { > void (*map_free)(struct bpf_map *map); > int (*map_get_next_key)(struct bpf_map *map, void *key, void *next_key); > void (*map_release_uref)(struct bpf_map *map); > + void *(*map_lookup_elem_sys_only)(struct bpf_map *map, void *key); > > /* funcs callable from userspace and from eBPF programs */ > void *(*map_lookup_elem)(struct bpf_map *map, void *key); > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c > index ad3ccf8..cb5440b 100644 > --- a/kernel/bpf/syscall.c > +++ b/kernel/bpf/syscall.c > @@ -808,7 +808,10 @@ static int map_lookup_elem(union bpf_attr *attr) > err = map->ops->map_peek_elem(map, value); > } else { > rcu_read_lock(); > - ptr = map->ops->map_lookup_elem(map, key); > + if (map->ops->map_lookup_elem_sys_only) > + ptr = map->ops->map_lookup_elem_sys_only(map, key); > + else > + ptr = map->ops->map_lookup_elem(map, key); > if (IS_ERR(ptr)) { > err = PTR_ERR(ptr); > } else if (!ptr) { > -- > 2.9.5 >
On 05/14/2019 07:04 AM, Andrii Nakryiko wrote: > On Mon, May 13, 2019 at 4:20 PM Daniel Borkmann <daniel@iogearbox.net> wrote: >> >> Add a callback map_lookup_elem_sys_only() that map implementations >> could use over map_lookup_elem() from system call side in case the >> map implementation needs to handle the latter differently than from >> the BPF data path. If map_lookup_elem_sys_only() is set, this will >> be preferred pick for map lookups out of user space. This hook is > > This is kind of surprising behavior w/ preferred vs default lookup > code path. Why the desired behavior can't be achieved with an extra > flag, similar to BPF_F_LOCK? It seems like it will be more explicit, > more extensible and more generic approach, avoiding duplication of > lookup semantics. For lookup from syscall side, this is possible of course. Given the current situation breaks heuristic with any walks of the LRU map, I presume you are saying something like an opt-in flag such as BPF_F_MARK_USED would be more useful? I was thinking about something like this initially, but then I couldn't come up with a concrete use case where it's needed/useful today for user space. Given that, my preference was to only add such flag wait until there is an actual need for it, and in any case, it is trivial to add it later on. Do you have a concrete need for it today that would justify such flag? > E.g., for LRU map, with flag on lookup, one can decide whether lookup > from inside BPF program (not just from syscall side!) should modify > LRU ordering or not, simply by specifying extra flag. Am I missing > some complication that prevents us from doing it that way? For programs it's a bit tricky. The BPF call interface is ... BPF_CALL_2(bpf_map_lookup_elem, struct bpf_map *, map, void *, key) ... meaning verifier does not care what argument 3 and beyond contains. >From BPF context/pov, it could also be uninitialized register. This would mean, we'd need to add a BPF_CALL_3(bpf_map_lookup_elem2, ...) interface which programs would use instead (and to not break existing ones), or some other new helper call that gets a map value argument to unmark the element from LRU side. While all doable one way or another although bit hacky, we should probably clarify and understand the use case for it first, thus brings me back to the last question from above paragraph. Thanks, Daniel
On Tue, May 14, 2019 at 12:59 AM Daniel Borkmann <daniel@iogearbox.net> wrote: > > On 05/14/2019 07:04 AM, Andrii Nakryiko wrote: > > On Mon, May 13, 2019 at 4:20 PM Daniel Borkmann <daniel@iogearbox.net> wrote: > >> > >> Add a callback map_lookup_elem_sys_only() that map implementations > >> could use over map_lookup_elem() from system call side in case the > >> map implementation needs to handle the latter differently than from > >> the BPF data path. If map_lookup_elem_sys_only() is set, this will > >> be preferred pick for map lookups out of user space. This hook is > > > > This is kind of surprising behavior w/ preferred vs default lookup > > code path. Why the desired behavior can't be achieved with an extra > > flag, similar to BPF_F_LOCK? It seems like it will be more explicit, > > more extensible and more generic approach, avoiding duplication of > > lookup semantics. > > For lookup from syscall side, this is possible of course. Given the > current situation breaks heuristic with any walks of the LRU map, I > presume you are saying something like an opt-in flag such as > BPF_F_MARK_USED would be more useful? I was thinking about something To preserve existing semantics, it would be opt-out BPF_F_DONT_MARK_USED, if you don't want to update LRU, so that existing use cases don't break. > like this initially, but then I couldn't come up with a concrete use > case where it's needed/useful today for user space. Given that, my > preference was to only add such flag wait until there is an actual > need for it, and in any case, it is trivial to add it later on. Do > you have a concrete need for it today that would justify such flag? So my concern was with having two ops for lookup for maps (map_lookup_elem() and map_lookup_elem_sys_only()) which for existing use cases differ only in whether we are reordering LRU on lookup or not, which felt like would be cleaner to solve with extending ops->map_lookup_elem() to accept flags. But now I realize that there are important implementation limitations preventing doing this cleanly and efficiently, so I rescind my proposal. > > > E.g., for LRU map, with flag on lookup, one can decide whether lookup > > from inside BPF program (not just from syscall side!) should modify > > LRU ordering or not, simply by specifying extra flag. Am I missing > > some complication that prevents us from doing it that way? > > For programs it's a bit tricky. The BPF call interface is ... > > BPF_CALL_2(bpf_map_lookup_elem, struct bpf_map *, map, void *, key) > > ... meaning verifier does not care what argument 3 and beyond contains. > From BPF context/pov, it could also be uninitialized register. This would > mean, we'd need to add a BPF_CALL_3(bpf_map_lookup_elem2, ...) interface > which programs would use instead (and to not break existing ones), or > some other new helper call that gets a map value argument to unmark the > element from LRU side. While all doable one way or another although bit > hacky, we should probably clarify and understand the use case for it > first, thus brings me back to the last question from above paragraph. Yeah, if we wanted to expose this functionality from BPF side right now, we'd have to add new helper w/ extra flags arg. As I mentioned above, though, I assumed it wouldn't be too hard to make existing BPF_CALL_2(bpf_map_lookup_elem, struct bpf_map *, map, void *, key) translate to map->ops->map_lookup_elem(key, 0 /* flags */), filling in default flags = 0 value, but apparently that's not that simple (and will hurt performance). > > Thanks, > Daniel
diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 59631dd..4fb3aa2 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -36,6 +36,7 @@ struct bpf_map_ops { void (*map_free)(struct bpf_map *map); int (*map_get_next_key)(struct bpf_map *map, void *key, void *next_key); void (*map_release_uref)(struct bpf_map *map); + void *(*map_lookup_elem_sys_only)(struct bpf_map *map, void *key); /* funcs callable from userspace and from eBPF programs */ void *(*map_lookup_elem)(struct bpf_map *map, void *key); diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index ad3ccf8..cb5440b 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -808,7 +808,10 @@ static int map_lookup_elem(union bpf_attr *attr) err = map->ops->map_peek_elem(map, value); } else { rcu_read_lock(); - ptr = map->ops->map_lookup_elem(map, key); + if (map->ops->map_lookup_elem_sys_only) + ptr = map->ops->map_lookup_elem_sys_only(map, key); + else + ptr = map->ops->map_lookup_elem(map, key); if (IS_ERR(ptr)) { err = PTR_ERR(ptr); } else if (!ptr) {