Message ID | 156125626136.5209.14349225282974871197.stgit@alrua-x1 |
---|---|
State | Changes Requested |
Delegated to: | BPF Maintainers |
Headers | show |
Series | xdp: Allow lookup into devmaps before redirect | expand |
On 22 Jun 2019, at 19:17, Toke Høiland-Jørgensen wrote: > From: Toke Høiland-Jørgensen <toke@redhat.com> > > The bpf_redirect_map() helper used by XDP programs doesn't return any > indication of whether it can successfully redirect to the map index it was > given. Instead, BPF programs have to track this themselves, leading to > programs using duplicate maps to track which entries are populated in the > devmap. > > This patch fixes this by moving the map lookup into the bpf_redirect_map() > helper, which makes it possible to return failure to the eBPF program. The > lower bits of the flags argument is used as the return code, which means > that existing users who pass a '0' flag argument will get XDP_ABORTED. > > With this, a BPF program can check the return code from the helper call and > react by, for instance, substituting a different redirect. This works for > any type of map used for redirect. > > Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
On 06/23/2019 04:17 AM, Toke Høiland-Jørgensen wrote: > From: Toke Høiland-Jørgensen <toke@redhat.com> > > The bpf_redirect_map() helper used by XDP programs doesn't return any > indication of whether it can successfully redirect to the map index it was > given. Instead, BPF programs have to track this themselves, leading to > programs using duplicate maps to track which entries are populated in the > devmap. > > This patch fixes this by moving the map lookup into the bpf_redirect_map() > helper, which makes it possible to return failure to the eBPF program. The > lower bits of the flags argument is used as the return code, which means > that existing users who pass a '0' flag argument will get XDP_ABORTED. > > With this, a BPF program can check the return code from the helper call and > react by, for instance, substituting a different redirect. This works for > any type of map used for redirect. > > Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Overall series looks good to me. Just very small things inline here & in the other two patches: [...] > @@ -3750,9 +3742,16 @@ BPF_CALL_3(bpf_xdp_redirect_map, struct bpf_map *, map, u32, ifindex, > { > struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info); > > - if (unlikely(flags)) > + /* Lower bits of the flags are used as return code on lookup failure */ > + if (unlikely(flags > XDP_TX)) > return XDP_ABORTED; > > + ri->item = __xdp_map_lookup_elem(map, ifindex); > + if (unlikely(!ri->item)) { > + WRITE_ONCE(ri->map, NULL); This WRITE_ONCE() is not needed. We never set it before at this point. > + return flags; > + } > + > ri->ifindex = ifindex; > ri->flags = flags; > WRITE_ONCE(ri->map, map); >
On 06/23/2019 04:17 AM, Toke Høiland-Jørgensen wrote: > From: Toke Høiland-Jørgensen <toke@redhat.com> > > The bpf_redirect_map() helper used by XDP programs doesn't return any > indication of whether it can successfully redirect to the map index it was > given. Instead, BPF programs have to track this themselves, leading to > programs using duplicate maps to track which entries are populated in the > devmap. > > This patch fixes this by moving the map lookup into the bpf_redirect_map() > helper, which makes it possible to return failure to the eBPF program. The > lower bits of the flags argument is used as the return code, which means > that existing users who pass a '0' flag argument will get XDP_ABORTED. > > With this, a BPF program can check the return code from the helper call and > react by, for instance, substituting a different redirect. This works for > any type of map used for redirect. > > Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> [...] > diff --git a/net/core/filter.c b/net/core/filter.c > index 183bf4d8e301..a6779e1cc1b8 100644 > --- a/net/core/filter.c > +++ b/net/core/filter.c > @@ -3605,17 +3605,13 @@ static int xdp_do_redirect_map(struct net_device *dev, struct xdp_buff *xdp, > struct bpf_redirect_info *ri) > { > u32 index = ri->ifindex; > - void *fwd = NULL; > + void *fwd = ri->item; > int err; > > ri->ifindex = 0; > + ri->item = NULL; > WRITE_ONCE(ri->map, NULL); > > - fwd = __xdp_map_lookup_elem(map, index); > - if (unlikely(!fwd)) { > - err = -EINVAL; > - goto err; > - } If you look at the _trace_xdp_redirect{,_err}(), we should also get rid of the extra NULL test in devmap_ifindex() which is not under tracepoint static key.
Daniel Borkmann <daniel@iogearbox.net> writes: > On 06/23/2019 04:17 AM, Toke Høiland-Jørgensen wrote: >> From: Toke Høiland-Jørgensen <toke@redhat.com> >> >> The bpf_redirect_map() helper used by XDP programs doesn't return any >> indication of whether it can successfully redirect to the map index it was >> given. Instead, BPF programs have to track this themselves, leading to >> programs using duplicate maps to track which entries are populated in the >> devmap. >> >> This patch fixes this by moving the map lookup into the bpf_redirect_map() >> helper, which makes it possible to return failure to the eBPF program. The >> lower bits of the flags argument is used as the return code, which means >> that existing users who pass a '0' flag argument will get XDP_ABORTED. >> >> With this, a BPF program can check the return code from the helper call and >> react by, for instance, substituting a different redirect. This works for >> any type of map used for redirect. >> >> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> > > Overall series looks good to me. Just very small things inline here & in the > other two patches: > > [...] >> @@ -3750,9 +3742,16 @@ BPF_CALL_3(bpf_xdp_redirect_map, struct bpf_map *, map, u32, ifindex, >> { >> struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info); >> >> - if (unlikely(flags)) >> + /* Lower bits of the flags are used as return code on lookup failure */ >> + if (unlikely(flags > XDP_TX)) >> return XDP_ABORTED; >> >> + ri->item = __xdp_map_lookup_elem(map, ifindex); >> + if (unlikely(!ri->item)) { >> + WRITE_ONCE(ri->map, NULL); > > This WRITE_ONCE() is not needed. We never set it before at this point. You mean the WRITE_ONCE() wrapper is not needed, or the set-to-NULL is not needed? The reason I added it is in case an eBPF program calls the helper twice before returning, where the first lookup succeeds but the second fails; in that case we want to clear the ->map pointer, no? -Toke
Daniel Borkmann <daniel@iogearbox.net> writes: > On 06/23/2019 04:17 AM, Toke Høiland-Jørgensen wrote: >> From: Toke Høiland-Jørgensen <toke@redhat.com> >> >> The bpf_redirect_map() helper used by XDP programs doesn't return any >> indication of whether it can successfully redirect to the map index it was >> given. Instead, BPF programs have to track this themselves, leading to >> programs using duplicate maps to track which entries are populated in the >> devmap. >> >> This patch fixes this by moving the map lookup into the bpf_redirect_map() >> helper, which makes it possible to return failure to the eBPF program. The >> lower bits of the flags argument is used as the return code, which means >> that existing users who pass a '0' flag argument will get XDP_ABORTED. >> >> With this, a BPF program can check the return code from the helper call and >> react by, for instance, substituting a different redirect. This works for >> any type of map used for redirect. >> >> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> > [...] >> diff --git a/net/core/filter.c b/net/core/filter.c >> index 183bf4d8e301..a6779e1cc1b8 100644 >> --- a/net/core/filter.c >> +++ b/net/core/filter.c >> @@ -3605,17 +3605,13 @@ static int xdp_do_redirect_map(struct net_device *dev, struct xdp_buff *xdp, >> struct bpf_redirect_info *ri) >> { >> u32 index = ri->ifindex; >> - void *fwd = NULL; >> + void *fwd = ri->item; >> int err; >> >> ri->ifindex = 0; >> + ri->item = NULL; >> WRITE_ONCE(ri->map, NULL); >> >> - fwd = __xdp_map_lookup_elem(map, index); >> - if (unlikely(!fwd)) { >> - err = -EINVAL; >> - goto err; >> - } > > If you look at the _trace_xdp_redirect{,_err}(), we should also get rid of the > extra NULL test in devmap_ifindex() which is not under tracepoint static key. ACK, will add. -Toke
On 06/28/2019 09:17 AM, Toke Høiland-Jørgensen wrote: > Daniel Borkmann <daniel@iogearbox.net> writes: > >> On 06/23/2019 04:17 AM, Toke Høiland-Jørgensen wrote: >>> From: Toke Høiland-Jørgensen <toke@redhat.com> >>> >>> The bpf_redirect_map() helper used by XDP programs doesn't return any >>> indication of whether it can successfully redirect to the map index it was >>> given. Instead, BPF programs have to track this themselves, leading to >>> programs using duplicate maps to track which entries are populated in the >>> devmap. >>> >>> This patch fixes this by moving the map lookup into the bpf_redirect_map() >>> helper, which makes it possible to return failure to the eBPF program. The >>> lower bits of the flags argument is used as the return code, which means >>> that existing users who pass a '0' flag argument will get XDP_ABORTED. >>> >>> With this, a BPF program can check the return code from the helper call and >>> react by, for instance, substituting a different redirect. This works for >>> any type of map used for redirect. >>> >>> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> >> >> Overall series looks good to me. Just very small things inline here & in the >> other two patches: >> >> [...] >>> @@ -3750,9 +3742,16 @@ BPF_CALL_3(bpf_xdp_redirect_map, struct bpf_map *, map, u32, ifindex, >>> { >>> struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info); >>> >>> - if (unlikely(flags)) >>> + /* Lower bits of the flags are used as return code on lookup failure */ >>> + if (unlikely(flags > XDP_TX)) >>> return XDP_ABORTED; >>> >>> + ri->item = __xdp_map_lookup_elem(map, ifindex); >>> + if (unlikely(!ri->item)) { >>> + WRITE_ONCE(ri->map, NULL); >> >> This WRITE_ONCE() is not needed. We never set it before at this point. > > You mean the WRITE_ONCE() wrapper is not needed, or the set-to-NULL is > not needed? The reason I added it is in case an eBPF program calls the > helper twice before returning, where the first lookup succeeds but the > second fails; in that case we want to clear the ->map pointer, no? Yeah I meant the set-to-NULL. So if first call succeeds, and the second one fails, then the expected semantics wrt the first call are as if the program would have called bpf_xdp_redirect() only? Looking at the code again, if we set ri->item to NULL, then we /must/ also set ri->map to NULL. I guess there are two options: i) leave as is, ii) keep the __xdp_map_lookup_elem() result in a temp var, if it's NULL return flags, otherwise only /then/ update ri->item, so that semantics are similar to the invalid flags check earlier. I guess fine either way, in case of i) there should probably be a comment since it's less obvious. Thanks, Daniel
Daniel Borkmann <daniel@iogearbox.net> writes: > On 06/28/2019 09:17 AM, Toke Høiland-Jørgensen wrote: >> Daniel Borkmann <daniel@iogearbox.net> writes: >> >>> On 06/23/2019 04:17 AM, Toke Høiland-Jørgensen wrote: >>>> From: Toke Høiland-Jørgensen <toke@redhat.com> >>>> >>>> The bpf_redirect_map() helper used by XDP programs doesn't return any >>>> indication of whether it can successfully redirect to the map index it was >>>> given. Instead, BPF programs have to track this themselves, leading to >>>> programs using duplicate maps to track which entries are populated in the >>>> devmap. >>>> >>>> This patch fixes this by moving the map lookup into the bpf_redirect_map() >>>> helper, which makes it possible to return failure to the eBPF program. The >>>> lower bits of the flags argument is used as the return code, which means >>>> that existing users who pass a '0' flag argument will get XDP_ABORTED. >>>> >>>> With this, a BPF program can check the return code from the helper call and >>>> react by, for instance, substituting a different redirect. This works for >>>> any type of map used for redirect. >>>> >>>> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> >>> >>> Overall series looks good to me. Just very small things inline here & in the >>> other two patches: >>> >>> [...] >>>> @@ -3750,9 +3742,16 @@ BPF_CALL_3(bpf_xdp_redirect_map, struct bpf_map *, map, u32, ifindex, >>>> { >>>> struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info); >>>> >>>> - if (unlikely(flags)) >>>> + /* Lower bits of the flags are used as return code on lookup failure */ >>>> + if (unlikely(flags > XDP_TX)) >>>> return XDP_ABORTED; >>>> >>>> + ri->item = __xdp_map_lookup_elem(map, ifindex); >>>> + if (unlikely(!ri->item)) { >>>> + WRITE_ONCE(ri->map, NULL); >>> >>> This WRITE_ONCE() is not needed. We never set it before at this point. >> >> You mean the WRITE_ONCE() wrapper is not needed, or the set-to-NULL is >> not needed? The reason I added it is in case an eBPF program calls the >> helper twice before returning, where the first lookup succeeds but the >> second fails; in that case we want to clear the ->map pointer, no? > > Yeah I meant the set-to-NULL. So if first call succeeds, and the second one > fails, then the expected semantics wrt the first call are as if the program > would have called bpf_xdp_redirect() only? > > Looking at the code again, if we set ri->item to NULL, then we /must/ > also set ri->map to NULL. I guess there are two options: i) leave as > is, ii) keep the __xdp_map_lookup_elem() result in a temp var, if it's > NULL return flags, otherwise only /then/ update ri->item, so that > semantics are similar to the invalid flags check earlier. I guess fine > either way, in case of i) there should probably be a comment since > it's less obvious. Yeah, I think a temp var is probably clearer, will do that :) -Toke
Toke Høiland-Jørgensen <toke@redhat.com> writes: > Daniel Borkmann <daniel@iogearbox.net> writes: > >> On 06/28/2019 09:17 AM, Toke Høiland-Jørgensen wrote: >>> Daniel Borkmann <daniel@iogearbox.net> writes: >>> >>>> On 06/23/2019 04:17 AM, Toke Høiland-Jørgensen wrote: >>>>> From: Toke Høiland-Jørgensen <toke@redhat.com> >>>>> >>>>> The bpf_redirect_map() helper used by XDP programs doesn't return any >>>>> indication of whether it can successfully redirect to the map index it was >>>>> given. Instead, BPF programs have to track this themselves, leading to >>>>> programs using duplicate maps to track which entries are populated in the >>>>> devmap. >>>>> >>>>> This patch fixes this by moving the map lookup into the bpf_redirect_map() >>>>> helper, which makes it possible to return failure to the eBPF program. The >>>>> lower bits of the flags argument is used as the return code, which means >>>>> that existing users who pass a '0' flag argument will get XDP_ABORTED. >>>>> >>>>> With this, a BPF program can check the return code from the helper call and >>>>> react by, for instance, substituting a different redirect. This works for >>>>> any type of map used for redirect. >>>>> >>>>> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> >>>> >>>> Overall series looks good to me. Just very small things inline here & in the >>>> other two patches: >>>> >>>> [...] >>>>> @@ -3750,9 +3742,16 @@ BPF_CALL_3(bpf_xdp_redirect_map, struct bpf_map *, map, u32, ifindex, >>>>> { >>>>> struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info); >>>>> >>>>> - if (unlikely(flags)) >>>>> + /* Lower bits of the flags are used as return code on lookup failure */ >>>>> + if (unlikely(flags > XDP_TX)) >>>>> return XDP_ABORTED; >>>>> >>>>> + ri->item = __xdp_map_lookup_elem(map, ifindex); >>>>> + if (unlikely(!ri->item)) { >>>>> + WRITE_ONCE(ri->map, NULL); >>>> >>>> This WRITE_ONCE() is not needed. We never set it before at this point. >>> >>> You mean the WRITE_ONCE() wrapper is not needed, or the set-to-NULL is >>> not needed? The reason I added it is in case an eBPF program calls the >>> helper twice before returning, where the first lookup succeeds but the >>> second fails; in that case we want to clear the ->map pointer, no? >> >> Yeah I meant the set-to-NULL. So if first call succeeds, and the second one >> fails, then the expected semantics wrt the first call are as if the program >> would have called bpf_xdp_redirect() only? >> >> Looking at the code again, if we set ri->item to NULL, then we /must/ >> also set ri->map to NULL. I guess there are two options: i) leave as >> is, ii) keep the __xdp_map_lookup_elem() result in a temp var, if it's >> NULL return flags, otherwise only /then/ update ri->item, so that >> semantics are similar to the invalid flags check earlier. I guess fine >> either way, in case of i) there should probably be a comment since >> it's less obvious. > > Yeah, I think a temp var is probably clearer, will do that :) Actually, thinking about this some more, I think it's better to completely clear out the state the second time around. I'll add a comment to explain. -Toke
diff --git a/include/linux/filter.h b/include/linux/filter.h index 43b45d6db36d..f31ae8b9035a 100644 --- a/include/linux/filter.h +++ b/include/linux/filter.h @@ -580,6 +580,7 @@ struct bpf_skb_data_end { struct bpf_redirect_info { u32 ifindex; u32 flags; + void *item; struct bpf_map *map; struct bpf_map *map_to_flush; u32 kern_flags; diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index b077507efa3f..59f3fe61b2b0 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -1568,8 +1568,11 @@ union bpf_attr { * but this is only implemented for native XDP (with driver * support) as of this writing). * - * All values for *flags* are reserved for future usage, and must - * be left at zero. + * The lower two bits of *flags* are used as the return code if + * the map lookup fails. This is so that the return value can be + * one of the XDP program return codes up to XDP_TX, as chosen by + * the caller. Any higher bits in the *flags* argument must be + * unset. * * When used to redirect packets to net devices, this helper * provides a high performance increase over **bpf_redirect**\ (). diff --git a/net/core/filter.c b/net/core/filter.c index 183bf4d8e301..a6779e1cc1b8 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -3605,17 +3605,13 @@ static int xdp_do_redirect_map(struct net_device *dev, struct xdp_buff *xdp, struct bpf_redirect_info *ri) { u32 index = ri->ifindex; - void *fwd = NULL; + void *fwd = ri->item; int err; ri->ifindex = 0; + ri->item = NULL; WRITE_ONCE(ri->map, NULL); - fwd = __xdp_map_lookup_elem(map, index); - if (unlikely(!fwd)) { - err = -EINVAL; - goto err; - } if (ri->map_to_flush && unlikely(ri->map_to_flush != map)) xdp_do_flush_map(); @@ -3652,18 +3648,13 @@ static int xdp_do_generic_redirect_map(struct net_device *dev, { struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info); u32 index = ri->ifindex; - void *fwd = NULL; + void *fwd = ri->item; int err = 0; ri->ifindex = 0; + ri->item = NULL; WRITE_ONCE(ri->map, NULL); - fwd = __xdp_map_lookup_elem(map, index); - if (unlikely(!fwd)) { - err = -EINVAL; - goto err; - } - if (map->map_type == BPF_MAP_TYPE_DEVMAP) { struct bpf_dtab_netdev *dst = fwd; @@ -3732,6 +3723,7 @@ BPF_CALL_2(bpf_xdp_redirect, u32, ifindex, u64, flags) ri->ifindex = ifindex; ri->flags = flags; + ri->item = NULL; WRITE_ONCE(ri->map, NULL); return XDP_REDIRECT; @@ -3750,9 +3742,16 @@ BPF_CALL_3(bpf_xdp_redirect_map, struct bpf_map *, map, u32, ifindex, { struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info); - if (unlikely(flags)) + /* Lower bits of the flags are used as return code on lookup failure */ + if (unlikely(flags > XDP_TX)) return XDP_ABORTED; + ri->item = __xdp_map_lookup_elem(map, ifindex); + if (unlikely(!ri->item)) { + WRITE_ONCE(ri->map, NULL); + return flags; + } + ri->ifindex = ifindex; ri->flags = flags; WRITE_ONCE(ri->map, map);