diff mbox series

[bpf-next,v5,2/3] bpf_xdp_redirect_map: Perform map lookup in eBPF helper

Message ID 156125626136.5209.14349225282974871197.stgit@alrua-x1
State Changes Requested
Delegated to: BPF Maintainers
Headers show
Series xdp: Allow lookup into devmaps before redirect | expand

Commit Message

Toke Høiland-Jørgensen June 23, 2019, 2:17 a.m. UTC
From: Toke Høiland-Jørgensen <toke@redhat.com>

The bpf_redirect_map() helper used by XDP programs doesn't return any
indication of whether it can successfully redirect to the map index it was
given. Instead, BPF programs have to track this themselves, leading to
programs using duplicate maps to track which entries are populated in the
devmap.

This patch fixes this by moving the map lookup into the bpf_redirect_map()
helper, which makes it possible to return failure to the eBPF program. The
lower bits of the flags argument is used as the return code, which means
that existing users who pass a '0' flag argument will get XDP_ABORTED.

With this, a BPF program can check the return code from the helper call and
react by, for instance, substituting a different redirect. This works for
any type of map used for redirect.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
 include/linux/filter.h   |    1 +
 include/uapi/linux/bpf.h |    7 +++++--
 net/core/filter.c        |   27 +++++++++++++--------------
 3 files changed, 19 insertions(+), 16 deletions(-)

Comments

Jonathan Lemon June 24, 2019, 4:41 p.m. UTC | #1
On 22 Jun 2019, at 19:17, Toke Høiland-Jørgensen wrote:

> From: Toke Høiland-Jørgensen <toke@redhat.com>
>
> The bpf_redirect_map() helper used by XDP programs doesn't return any
> indication of whether it can successfully redirect to the map index it was
> given. Instead, BPF programs have to track this themselves, leading to
> programs using duplicate maps to track which entries are populated in the
> devmap.
>
> This patch fixes this by moving the map lookup into the bpf_redirect_map()
> helper, which makes it possible to return failure to the eBPF program. The
> lower bits of the flags argument is used as the return code, which means
> that existing users who pass a '0' flag argument will get XDP_ABORTED.
>
> With this, a BPF program can check the return code from the helper call and
> react by, for instance, substituting a different redirect. This works for
> any type of map used for redirect.
>
> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Daniel Borkmann June 27, 2019, 9:55 p.m. UTC | #2
On 06/23/2019 04:17 AM, Toke Høiland-Jørgensen wrote:
> From: Toke Høiland-Jørgensen <toke@redhat.com>
> 
> The bpf_redirect_map() helper used by XDP programs doesn't return any
> indication of whether it can successfully redirect to the map index it was
> given. Instead, BPF programs have to track this themselves, leading to
> programs using duplicate maps to track which entries are populated in the
> devmap.
> 
> This patch fixes this by moving the map lookup into the bpf_redirect_map()
> helper, which makes it possible to return failure to the eBPF program. The
> lower bits of the flags argument is used as the return code, which means
> that existing users who pass a '0' flag argument will get XDP_ABORTED.
> 
> With this, a BPF program can check the return code from the helper call and
> react by, for instance, substituting a different redirect. This works for
> any type of map used for redirect.
> 
> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>

Overall series looks good to me. Just very small things inline here & in the
other two patches:

[...]
> @@ -3750,9 +3742,16 @@ BPF_CALL_3(bpf_xdp_redirect_map, struct bpf_map *, map, u32, ifindex,
>  {
>  	struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
>  
> -	if (unlikely(flags))
> +	/* Lower bits of the flags are used as return code on lookup failure */
> +	if (unlikely(flags > XDP_TX))
>  		return XDP_ABORTED;
>  
> +	ri->item = __xdp_map_lookup_elem(map, ifindex);
> +	if (unlikely(!ri->item)) {
> +		WRITE_ONCE(ri->map, NULL);

This WRITE_ONCE() is not needed. We never set it before at this point.

> +		return flags;
> +	}
> +
>  	ri->ifindex = ifindex;
>  	ri->flags = flags;
>  	WRITE_ONCE(ri->map, map);
>
Daniel Borkmann June 27, 2019, 10:08 p.m. UTC | #3
On 06/23/2019 04:17 AM, Toke Høiland-Jørgensen wrote:
> From: Toke Høiland-Jørgensen <toke@redhat.com>
> 
> The bpf_redirect_map() helper used by XDP programs doesn't return any
> indication of whether it can successfully redirect to the map index it was
> given. Instead, BPF programs have to track this themselves, leading to
> programs using duplicate maps to track which entries are populated in the
> devmap.
> 
> This patch fixes this by moving the map lookup into the bpf_redirect_map()
> helper, which makes it possible to return failure to the eBPF program. The
> lower bits of the flags argument is used as the return code, which means
> that existing users who pass a '0' flag argument will get XDP_ABORTED.
> 
> With this, a BPF program can check the return code from the helper call and
> react by, for instance, substituting a different redirect. This works for
> any type of map used for redirect.
> 
> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
[...]
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 183bf4d8e301..a6779e1cc1b8 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -3605,17 +3605,13 @@ static int xdp_do_redirect_map(struct net_device *dev, struct xdp_buff *xdp,
>  			       struct bpf_redirect_info *ri)
>  {
>  	u32 index = ri->ifindex;
> -	void *fwd = NULL;
> +	void *fwd = ri->item;
>  	int err;
>  
>  	ri->ifindex = 0;
> +	ri->item = NULL;
>  	WRITE_ONCE(ri->map, NULL);
>  
> -	fwd = __xdp_map_lookup_elem(map, index);
> -	if (unlikely(!fwd)) {
> -		err = -EINVAL;
> -		goto err;
> -	}

If you look at the _trace_xdp_redirect{,_err}(), we should also get rid of the
extra NULL test in devmap_ifindex() which is not under tracepoint static key.
Toke Høiland-Jørgensen June 28, 2019, 7:17 a.m. UTC | #4
Daniel Borkmann <daniel@iogearbox.net> writes:

> On 06/23/2019 04:17 AM, Toke Høiland-Jørgensen wrote:
>> From: Toke Høiland-Jørgensen <toke@redhat.com>
>> 
>> The bpf_redirect_map() helper used by XDP programs doesn't return any
>> indication of whether it can successfully redirect to the map index it was
>> given. Instead, BPF programs have to track this themselves, leading to
>> programs using duplicate maps to track which entries are populated in the
>> devmap.
>> 
>> This patch fixes this by moving the map lookup into the bpf_redirect_map()
>> helper, which makes it possible to return failure to the eBPF program. The
>> lower bits of the flags argument is used as the return code, which means
>> that existing users who pass a '0' flag argument will get XDP_ABORTED.
>> 
>> With this, a BPF program can check the return code from the helper call and
>> react by, for instance, substituting a different redirect. This works for
>> any type of map used for redirect.
>> 
>> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
>
> Overall series looks good to me. Just very small things inline here & in the
> other two patches:
>
> [...]
>> @@ -3750,9 +3742,16 @@ BPF_CALL_3(bpf_xdp_redirect_map, struct bpf_map *, map, u32, ifindex,
>>  {
>>  	struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
>>  
>> -	if (unlikely(flags))
>> +	/* Lower bits of the flags are used as return code on lookup failure */
>> +	if (unlikely(flags > XDP_TX))
>>  		return XDP_ABORTED;
>>  
>> +	ri->item = __xdp_map_lookup_elem(map, ifindex);
>> +	if (unlikely(!ri->item)) {
>> +		WRITE_ONCE(ri->map, NULL);
>
> This WRITE_ONCE() is not needed. We never set it before at this point.

You mean the WRITE_ONCE() wrapper is not needed, or the set-to-NULL is
not needed? The reason I added it is in case an eBPF program calls the
helper twice before returning, where the first lookup succeeds but the
second fails; in that case we want to clear the ->map pointer, no?

-Toke
Toke Høiland-Jørgensen June 28, 2019, 7:23 a.m. UTC | #5
Daniel Borkmann <daniel@iogearbox.net> writes:

> On 06/23/2019 04:17 AM, Toke Høiland-Jørgensen wrote:
>> From: Toke Høiland-Jørgensen <toke@redhat.com>
>> 
>> The bpf_redirect_map() helper used by XDP programs doesn't return any
>> indication of whether it can successfully redirect to the map index it was
>> given. Instead, BPF programs have to track this themselves, leading to
>> programs using duplicate maps to track which entries are populated in the
>> devmap.
>> 
>> This patch fixes this by moving the map lookup into the bpf_redirect_map()
>> helper, which makes it possible to return failure to the eBPF program. The
>> lower bits of the flags argument is used as the return code, which means
>> that existing users who pass a '0' flag argument will get XDP_ABORTED.
>> 
>> With this, a BPF program can check the return code from the helper call and
>> react by, for instance, substituting a different redirect. This works for
>> any type of map used for redirect.
>> 
>> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
> [...]
>> diff --git a/net/core/filter.c b/net/core/filter.c
>> index 183bf4d8e301..a6779e1cc1b8 100644
>> --- a/net/core/filter.c
>> +++ b/net/core/filter.c
>> @@ -3605,17 +3605,13 @@ static int xdp_do_redirect_map(struct net_device *dev, struct xdp_buff *xdp,
>>  			       struct bpf_redirect_info *ri)
>>  {
>>  	u32 index = ri->ifindex;
>> -	void *fwd = NULL;
>> +	void *fwd = ri->item;
>>  	int err;
>>  
>>  	ri->ifindex = 0;
>> +	ri->item = NULL;
>>  	WRITE_ONCE(ri->map, NULL);
>>  
>> -	fwd = __xdp_map_lookup_elem(map, index);
>> -	if (unlikely(!fwd)) {
>> -		err = -EINVAL;
>> -		goto err;
>> -	}
>
> If you look at the _trace_xdp_redirect{,_err}(), we should also get rid of the
> extra NULL test in devmap_ifindex() which is not under tracepoint static key.

ACK, will add.

-Toke
Daniel Borkmann June 28, 2019, 8:12 a.m. UTC | #6
On 06/28/2019 09:17 AM, Toke Høiland-Jørgensen wrote:
> Daniel Borkmann <daniel@iogearbox.net> writes:
> 
>> On 06/23/2019 04:17 AM, Toke Høiland-Jørgensen wrote:
>>> From: Toke Høiland-Jørgensen <toke@redhat.com>
>>>
>>> The bpf_redirect_map() helper used by XDP programs doesn't return any
>>> indication of whether it can successfully redirect to the map index it was
>>> given. Instead, BPF programs have to track this themselves, leading to
>>> programs using duplicate maps to track which entries are populated in the
>>> devmap.
>>>
>>> This patch fixes this by moving the map lookup into the bpf_redirect_map()
>>> helper, which makes it possible to return failure to the eBPF program. The
>>> lower bits of the flags argument is used as the return code, which means
>>> that existing users who pass a '0' flag argument will get XDP_ABORTED.
>>>
>>> With this, a BPF program can check the return code from the helper call and
>>> react by, for instance, substituting a different redirect. This works for
>>> any type of map used for redirect.
>>>
>>> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
>>
>> Overall series looks good to me. Just very small things inline here & in the
>> other two patches:
>>
>> [...]
>>> @@ -3750,9 +3742,16 @@ BPF_CALL_3(bpf_xdp_redirect_map, struct bpf_map *, map, u32, ifindex,
>>>  {
>>>  	struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
>>>  
>>> -	if (unlikely(flags))
>>> +	/* Lower bits of the flags are used as return code on lookup failure */
>>> +	if (unlikely(flags > XDP_TX))
>>>  		return XDP_ABORTED;
>>>  
>>> +	ri->item = __xdp_map_lookup_elem(map, ifindex);
>>> +	if (unlikely(!ri->item)) {
>>> +		WRITE_ONCE(ri->map, NULL);
>>
>> This WRITE_ONCE() is not needed. We never set it before at this point.
> 
> You mean the WRITE_ONCE() wrapper is not needed, or the set-to-NULL is
> not needed? The reason I added it is in case an eBPF program calls the
> helper twice before returning, where the first lookup succeeds but the
> second fails; in that case we want to clear the ->map pointer, no?

Yeah I meant the set-to-NULL. So if first call succeeds, and the second one
fails, then the expected semantics wrt the first call are as if the program
would have called bpf_xdp_redirect() only?

Looking at the code again, if we set ri->item to NULL, then we /must/ also
set ri->map to NULL. I guess there are two options: i) leave as is, ii) keep
the __xdp_map_lookup_elem() result in a temp var, if it's NULL return flags,
otherwise only /then/ update ri->item, so that semantics are similar to the
invalid flags check earlier. I guess fine either way, in case of i) there
should probably be a comment since it's less obvious.

Thanks,
Daniel
Toke Høiland-Jørgensen June 28, 2019, 8:18 a.m. UTC | #7
Daniel Borkmann <daniel@iogearbox.net> writes:

> On 06/28/2019 09:17 AM, Toke Høiland-Jørgensen wrote:
>> Daniel Borkmann <daniel@iogearbox.net> writes:
>> 
>>> On 06/23/2019 04:17 AM, Toke Høiland-Jørgensen wrote:
>>>> From: Toke Høiland-Jørgensen <toke@redhat.com>
>>>>
>>>> The bpf_redirect_map() helper used by XDP programs doesn't return any
>>>> indication of whether it can successfully redirect to the map index it was
>>>> given. Instead, BPF programs have to track this themselves, leading to
>>>> programs using duplicate maps to track which entries are populated in the
>>>> devmap.
>>>>
>>>> This patch fixes this by moving the map lookup into the bpf_redirect_map()
>>>> helper, which makes it possible to return failure to the eBPF program. The
>>>> lower bits of the flags argument is used as the return code, which means
>>>> that existing users who pass a '0' flag argument will get XDP_ABORTED.
>>>>
>>>> With this, a BPF program can check the return code from the helper call and
>>>> react by, for instance, substituting a different redirect. This works for
>>>> any type of map used for redirect.
>>>>
>>>> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
>>>
>>> Overall series looks good to me. Just very small things inline here & in the
>>> other two patches:
>>>
>>> [...]
>>>> @@ -3750,9 +3742,16 @@ BPF_CALL_3(bpf_xdp_redirect_map, struct bpf_map *, map, u32, ifindex,
>>>>  {
>>>>  	struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
>>>>  
>>>> -	if (unlikely(flags))
>>>> +	/* Lower bits of the flags are used as return code on lookup failure */
>>>> +	if (unlikely(flags > XDP_TX))
>>>>  		return XDP_ABORTED;
>>>>  
>>>> +	ri->item = __xdp_map_lookup_elem(map, ifindex);
>>>> +	if (unlikely(!ri->item)) {
>>>> +		WRITE_ONCE(ri->map, NULL);
>>>
>>> This WRITE_ONCE() is not needed. We never set it before at this point.
>> 
>> You mean the WRITE_ONCE() wrapper is not needed, or the set-to-NULL is
>> not needed? The reason I added it is in case an eBPF program calls the
>> helper twice before returning, where the first lookup succeeds but the
>> second fails; in that case we want to clear the ->map pointer, no?
>
> Yeah I meant the set-to-NULL. So if first call succeeds, and the second one
> fails, then the expected semantics wrt the first call are as if the program
> would have called bpf_xdp_redirect() only?
>
> Looking at the code again, if we set ri->item to NULL, then we /must/
> also set ri->map to NULL. I guess there are two options: i) leave as
> is, ii) keep the __xdp_map_lookup_elem() result in a temp var, if it's
> NULL return flags, otherwise only /then/ update ri->item, so that
> semantics are similar to the invalid flags check earlier. I guess fine
> either way, in case of i) there should probably be a comment since
> it's less obvious.

Yeah, I think a temp var is probably clearer, will do that :)

-Toke
Toke Høiland-Jørgensen June 28, 2019, 8:57 a.m. UTC | #8
Toke Høiland-Jørgensen <toke@redhat.com> writes:

> Daniel Borkmann <daniel@iogearbox.net> writes:
>
>> On 06/28/2019 09:17 AM, Toke Høiland-Jørgensen wrote:
>>> Daniel Borkmann <daniel@iogearbox.net> writes:
>>> 
>>>> On 06/23/2019 04:17 AM, Toke Høiland-Jørgensen wrote:
>>>>> From: Toke Høiland-Jørgensen <toke@redhat.com>
>>>>>
>>>>> The bpf_redirect_map() helper used by XDP programs doesn't return any
>>>>> indication of whether it can successfully redirect to the map index it was
>>>>> given. Instead, BPF programs have to track this themselves, leading to
>>>>> programs using duplicate maps to track which entries are populated in the
>>>>> devmap.
>>>>>
>>>>> This patch fixes this by moving the map lookup into the bpf_redirect_map()
>>>>> helper, which makes it possible to return failure to the eBPF program. The
>>>>> lower bits of the flags argument is used as the return code, which means
>>>>> that existing users who pass a '0' flag argument will get XDP_ABORTED.
>>>>>
>>>>> With this, a BPF program can check the return code from the helper call and
>>>>> react by, for instance, substituting a different redirect. This works for
>>>>> any type of map used for redirect.
>>>>>
>>>>> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
>>>>
>>>> Overall series looks good to me. Just very small things inline here & in the
>>>> other two patches:
>>>>
>>>> [...]
>>>>> @@ -3750,9 +3742,16 @@ BPF_CALL_3(bpf_xdp_redirect_map, struct bpf_map *, map, u32, ifindex,
>>>>>  {
>>>>>  	struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
>>>>>  
>>>>> -	if (unlikely(flags))
>>>>> +	/* Lower bits of the flags are used as return code on lookup failure */
>>>>> +	if (unlikely(flags > XDP_TX))
>>>>>  		return XDP_ABORTED;
>>>>>  
>>>>> +	ri->item = __xdp_map_lookup_elem(map, ifindex);
>>>>> +	if (unlikely(!ri->item)) {
>>>>> +		WRITE_ONCE(ri->map, NULL);
>>>>
>>>> This WRITE_ONCE() is not needed. We never set it before at this point.
>>> 
>>> You mean the WRITE_ONCE() wrapper is not needed, or the set-to-NULL is
>>> not needed? The reason I added it is in case an eBPF program calls the
>>> helper twice before returning, where the first lookup succeeds but the
>>> second fails; in that case we want to clear the ->map pointer, no?
>>
>> Yeah I meant the set-to-NULL. So if first call succeeds, and the second one
>> fails, then the expected semantics wrt the first call are as if the program
>> would have called bpf_xdp_redirect() only?
>>
>> Looking at the code again, if we set ri->item to NULL, then we /must/
>> also set ri->map to NULL. I guess there are two options: i) leave as
>> is, ii) keep the __xdp_map_lookup_elem() result in a temp var, if it's
>> NULL return flags, otherwise only /then/ update ri->item, so that
>> semantics are similar to the invalid flags check earlier. I guess fine
>> either way, in case of i) there should probably be a comment since
>> it's less obvious.
>
> Yeah, I think a temp var is probably clearer, will do that :)

Actually, thinking about this some more, I think it's better to
completely clear out the state the second time around. I'll add a
comment to explain.

-Toke
diff mbox series

Patch

diff --git a/include/linux/filter.h b/include/linux/filter.h
index 43b45d6db36d..f31ae8b9035a 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -580,6 +580,7 @@  struct bpf_skb_data_end {
 struct bpf_redirect_info {
 	u32 ifindex;
 	u32 flags;
+	void *item;
 	struct bpf_map *map;
 	struct bpf_map *map_to_flush;
 	u32 kern_flags;
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index b077507efa3f..59f3fe61b2b0 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1568,8 +1568,11 @@  union bpf_attr {
  * 		but this is only implemented for native XDP (with driver
  * 		support) as of this writing).
  *
- * 		All values for *flags* are reserved for future usage, and must
- * 		be left at zero.
+ * 		The lower two bits of *flags* are used as the return code if
+ * 		the map lookup fails. This is so that the return value can be
+ * 		one of the XDP program return codes up to XDP_TX, as chosen by
+ * 		the caller. Any higher bits in the *flags* argument must be
+ * 		unset.
  *
  * 		When used to redirect packets to net devices, this helper
  * 		provides a high performance increase over **bpf_redirect**\ ().
diff --git a/net/core/filter.c b/net/core/filter.c
index 183bf4d8e301..a6779e1cc1b8 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3605,17 +3605,13 @@  static int xdp_do_redirect_map(struct net_device *dev, struct xdp_buff *xdp,
 			       struct bpf_redirect_info *ri)
 {
 	u32 index = ri->ifindex;
-	void *fwd = NULL;
+	void *fwd = ri->item;
 	int err;
 
 	ri->ifindex = 0;
+	ri->item = NULL;
 	WRITE_ONCE(ri->map, NULL);
 
-	fwd = __xdp_map_lookup_elem(map, index);
-	if (unlikely(!fwd)) {
-		err = -EINVAL;
-		goto err;
-	}
 	if (ri->map_to_flush && unlikely(ri->map_to_flush != map))
 		xdp_do_flush_map();
 
@@ -3652,18 +3648,13 @@  static int xdp_do_generic_redirect_map(struct net_device *dev,
 {
 	struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
 	u32 index = ri->ifindex;
-	void *fwd = NULL;
+	void *fwd = ri->item;
 	int err = 0;
 
 	ri->ifindex = 0;
+	ri->item = NULL;
 	WRITE_ONCE(ri->map, NULL);
 
-	fwd = __xdp_map_lookup_elem(map, index);
-	if (unlikely(!fwd)) {
-		err = -EINVAL;
-		goto err;
-	}
-
 	if (map->map_type == BPF_MAP_TYPE_DEVMAP) {
 		struct bpf_dtab_netdev *dst = fwd;
 
@@ -3732,6 +3723,7 @@  BPF_CALL_2(bpf_xdp_redirect, u32, ifindex, u64, flags)
 
 	ri->ifindex = ifindex;
 	ri->flags = flags;
+	ri->item = NULL;
 	WRITE_ONCE(ri->map, NULL);
 
 	return XDP_REDIRECT;
@@ -3750,9 +3742,16 @@  BPF_CALL_3(bpf_xdp_redirect_map, struct bpf_map *, map, u32, ifindex,
 {
 	struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
 
-	if (unlikely(flags))
+	/* Lower bits of the flags are used as return code on lookup failure */
+	if (unlikely(flags > XDP_TX))
 		return XDP_ABORTED;
 
+	ri->item = __xdp_map_lookup_elem(map, ifindex);
+	if (unlikely(!ri->item)) {
+		WRITE_ONCE(ri->map, NULL);
+		return flags;
+	}
+
 	ri->ifindex = ifindex;
 	ri->flags = flags;
 	WRITE_ONCE(ri->map, map);