diff mbox series

[bpf-next,5/8] bpf: Allow selecting reuseport socket from a SOCKMAP

Message ID 20191123110751.6729-6-jakub@cloudflare.com
State Changes Requested
Delegated to: BPF Maintainers
Headers show
Series Extend SOCKMAP to store listening sockets | expand

Commit Message

Jakub Sitnicki Nov. 23, 2019, 11:07 a.m. UTC
SOCKMAP now supports storing references to listening sockets. Nothing keeps
us from using it as an array of sockets to select from in SK_REUSEPORT
programs.

Whitelist the map type with the BPF helper for selecting socket. However,
impose a restriction that the selected socket needs to be a listening TCP
socket or a bound UDP socket (connected or not).

The only other map type that works with the BPF reuseport helper,
REUSEPORT_SOCKARRAY, has a corresponding check in its update operation
handler.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 kernel/bpf/verifier.c | 6 ++++--
 net/core/filter.c     | 2 ++
 2 files changed, 6 insertions(+), 2 deletions(-)

Comments

John Fastabend Nov. 24, 2019, 5:57 a.m. UTC | #1
Jakub Sitnicki wrote:
> SOCKMAP now supports storing references to listening sockets. Nothing keeps
> us from using it as an array of sockets to select from in SK_REUSEPORT
> programs.
> 
> Whitelist the map type with the BPF helper for selecting socket. However,
> impose a restriction that the selected socket needs to be a listening TCP
> socket or a bound UDP socket (connected or not).
> 
> The only other map type that works with the BPF reuseport helper,
> REUSEPORT_SOCKARRAY, has a corresponding check in its update operation
> handler.
> 
> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
> ---

Acked-by: John Fastabend <john.fastabend@gmail.com>
Alexei Starovoitov Nov. 25, 2019, 1:24 a.m. UTC | #2
On Sat, Nov 23, 2019 at 12:07:48PM +0100, Jakub Sitnicki wrote:
> SOCKMAP now supports storing references to listening sockets. Nothing keeps
> us from using it as an array of sockets to select from in SK_REUSEPORT
> programs.
> 
> Whitelist the map type with the BPF helper for selecting socket. However,
> impose a restriction that the selected socket needs to be a listening TCP
> socket or a bound UDP socket (connected or not).
> 
> The only other map type that works with the BPF reuseport helper,
> REUSEPORT_SOCKARRAY, has a corresponding check in its update operation
> handler.
> 
> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
> ---
>  kernel/bpf/verifier.c | 6 ++++--
>  net/core/filter.c     | 2 ++
>  2 files changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index a0482e1c4a77..111a1eb543ab 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -3685,7 +3685,8 @@ static int check_map_func_compatibility(struct bpf_verifier_env *env,
>  		if (func_id != BPF_FUNC_sk_redirect_map &&
>  		    func_id != BPF_FUNC_sock_map_update &&
>  		    func_id != BPF_FUNC_map_delete_elem &&
> -		    func_id != BPF_FUNC_msg_redirect_map)
> +		    func_id != BPF_FUNC_msg_redirect_map &&
> +		    func_id != BPF_FUNC_sk_select_reuseport)
>  			goto error;
>  		break;
>  	case BPF_MAP_TYPE_SOCKHASH:
> @@ -3766,7 +3767,8 @@ static int check_map_func_compatibility(struct bpf_verifier_env *env,
>  			goto error;
>  		break;
>  	case BPF_FUNC_sk_select_reuseport:
> -		if (map->map_type != BPF_MAP_TYPE_REUSEPORT_SOCKARRAY)
> +		if (map->map_type != BPF_MAP_TYPE_REUSEPORT_SOCKARRAY &&
> +		    map->map_type != BPF_MAP_TYPE_SOCKMAP)
>  			goto error;
>  		break;
>  	case BPF_FUNC_map_peek_elem:
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 49ded4a7588a..e3fb77353248 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -8723,6 +8723,8 @@ BPF_CALL_4(sk_select_reuseport, struct sk_reuseport_kern *, reuse_kern,
>  	selected_sk = map->ops->map_lookup_elem(map, key);
>  	if (!selected_sk)
>  		return -ENOENT;
> +	if (!sock_flag(selected_sk, SOCK_RCU_FREE))
> +		return -EINVAL;

hmm. I wonder whether this breaks existing users...
Martin,
what do you think?
Could you also take a look at other patches too?
In particular patch 7?
John Fastabend Nov. 25, 2019, 4:17 a.m. UTC | #3
Alexei Starovoitov wrote:
> On Sat, Nov 23, 2019 at 12:07:48PM +0100, Jakub Sitnicki wrote:
> > SOCKMAP now supports storing references to listening sockets. Nothing keeps
> > us from using it as an array of sockets to select from in SK_REUSEPORT
> > programs.
> > 
> > Whitelist the map type with the BPF helper for selecting socket. However,
> > impose a restriction that the selected socket needs to be a listening TCP
> > socket or a bound UDP socket (connected or not).
> > 
> > The only other map type that works with the BPF reuseport helper,
> > REUSEPORT_SOCKARRAY, has a corresponding check in its update operation
> > handler.
> > 
> > Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
> > ---

[...]

> > diff --git a/net/core/filter.c b/net/core/filter.c
> > index 49ded4a7588a..e3fb77353248 100644
> > --- a/net/core/filter.c
> > +++ b/net/core/filter.c
> > @@ -8723,6 +8723,8 @@ BPF_CALL_4(sk_select_reuseport, struct sk_reuseport_kern *, reuse_kern,
> >  	selected_sk = map->ops->map_lookup_elem(map, key);
> >  	if (!selected_sk)
> >  		return -ENOENT;
> > +	if (!sock_flag(selected_sk, SOCK_RCU_FREE))
> > +		return -EINVAL;
> 
> hmm. I wonder whether this breaks existing users...

There is already this check in reuseport_array_update_check()

	/*
	 * sk must be hashed (i.e. listening in the TCP case or binded
	 * in the UDP case) and
	 * it must also be a SO_REUSEPORT sk (i.e. reuse cannot be NULL).
	 *
	 * Also, sk will be used in bpf helper that is protected by
	 * rcu_read_lock().
	 */
	if (!sock_flag(nsk, SOCK_RCU_FREE) || !sk_hashed(nsk) || !nsk_reuse)
		return -EINVAL;

So I believe it should not cause any problems with existing users. Perhaps
we could consolidate the checks a bit or move it into the update paths if we
wanted. I assume Jakub was just ensuring we don't get here with SOCK_RCU_FREE
set from any of the new paths now. I'll let him answer though.

> Martin,
> what do you think?

More eyes the better.

> Could you also take a look at other patches too?
> In particular patch 7?
> 

Agreed would be good to give 7/8 a look I'm not too familiar with the
selftests there.
Jakub Sitnicki Nov. 25, 2019, 10:40 a.m. UTC | #4
On Mon, Nov 25, 2019 at 05:17 AM CET, John Fastabend wrote:
> Alexei Starovoitov wrote:
>> On Sat, Nov 23, 2019 at 12:07:48PM +0100, Jakub Sitnicki wrote:
>> > SOCKMAP now supports storing references to listening sockets. Nothing keeps
>> > us from using it as an array of sockets to select from in SK_REUSEPORT
>> > programs.
>> >
>> > Whitelist the map type with the BPF helper for selecting socket. However,
>> > impose a restriction that the selected socket needs to be a listening TCP
>> > socket or a bound UDP socket (connected or not).
>> >
>> > The only other map type that works with the BPF reuseport helper,
>> > REUSEPORT_SOCKARRAY, has a corresponding check in its update operation
>> > handler.
>> >
>> > Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
>> > ---
>
> [...]
>
>> > diff --git a/net/core/filter.c b/net/core/filter.c
>> > index 49ded4a7588a..e3fb77353248 100644
>> > --- a/net/core/filter.c
>> > +++ b/net/core/filter.c
>> > @@ -8723,6 +8723,8 @@ BPF_CALL_4(sk_select_reuseport, struct sk_reuseport_kern *, reuse_kern,
>> >  	selected_sk = map->ops->map_lookup_elem(map, key);
>> >  	if (!selected_sk)
>> >  		return -ENOENT;
>> > +	if (!sock_flag(selected_sk, SOCK_RCU_FREE))
>> > +		return -EINVAL;
>>
>> hmm. I wonder whether this breaks existing users...
>
> There is already this check in reuseport_array_update_check()
>
> 	/*
> 	 * sk must be hashed (i.e. listening in the TCP case or binded
> 	 * in the UDP case) and
> 	 * it must also be a SO_REUSEPORT sk (i.e. reuse cannot be NULL).
> 	 *
> 	 * Also, sk will be used in bpf helper that is protected by
> 	 * rcu_read_lock().
> 	 */
> 	if (!sock_flag(nsk, SOCK_RCU_FREE) || !sk_hashed(nsk) || !nsk_reuse)
> 		return -EINVAL;
>
> So I believe it should not cause any problems with existing users. Perhaps
> we could consolidate the checks a bit or move it into the update paths if we
> wanted. I assume Jakub was just ensuring we don't get here with SOCK_RCU_FREE
> set from any of the new paths now. I'll let him answer though.

That was exactly my thinking here.

REUSEPORT_SOCKARRAY can't be populated with sockets that don't have
SOCK_RCU_FREE set. This makes the flag check in sk_select_reuseport BPF
helper redundant for this map type.

SOCKMAP, OTOH, allows storing established TCP sockets, which don't have
SOCK_RCU_FREE flag and shouldn't be used as reuseport targets. The newly
added check protects us against it.

I have a couple tests in the last patch for it -
test_sockmap_reuseport_select_{listening,connected}. Admittedly, UDP is
not covered.

Not sure how we could go about moving the checks to the update path for
SOCKMAP. At update time we don't know if the map will be used with a
reuseport or a sk_{skb,msg} program.

-Jakub

>
>> Martin,
>> what do you think?
>
> More eyes the better.
>
>> Could you also take a look at other patches too?
>> In particular patch 7?
>>
>
> Agreed would be good to give 7/8 a look I'm not too familiar with the
> selftests there.
Martin KaFai Lau Nov. 25, 2019, 10:07 p.m. UTC | #5
On Mon, Nov 25, 2019 at 11:40:41AM +0100, Jakub Sitnicki wrote:
> On Mon, Nov 25, 2019 at 05:17 AM CET, John Fastabend wrote:
> > Alexei Starovoitov wrote:
> >> On Sat, Nov 23, 2019 at 12:07:48PM +0100, Jakub Sitnicki wrote:
> >> > SOCKMAP now supports storing references to listening sockets. Nothing keeps
> >> > us from using it as an array of sockets to select from in SK_REUSEPORT
> >> > programs.
> >> >
> >> > Whitelist the map type with the BPF helper for selecting socket. However,
> >> > impose a restriction that the selected socket needs to be a listening TCP
> >> > socket or a bound UDP socket (connected or not).
> >> >
> >> > The only other map type that works with the BPF reuseport helper,
> >> > REUSEPORT_SOCKARRAY, has a corresponding check in its update operation
> >> > handler.
> >> >
> >> > Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
> >> > ---
> >
> > [...]
> >
> >> > diff --git a/net/core/filter.c b/net/core/filter.c
> >> > index 49ded4a7588a..e3fb77353248 100644
> >> > --- a/net/core/filter.c
> >> > +++ b/net/core/filter.c
> >> > @@ -8723,6 +8723,8 @@ BPF_CALL_4(sk_select_reuseport, struct sk_reuseport_kern *, reuse_kern,
> >> >  	selected_sk = map->ops->map_lookup_elem(map, key);
> >> >  	if (!selected_sk)
> >> >  		return -ENOENT;
> >> > +	if (!sock_flag(selected_sk, SOCK_RCU_FREE))
> >> > +		return -EINVAL;
If I read it correctly,
this is to avoid the following "if (!reuse)" to return -ENOENT,
and instead returns -EINVAL for non TCP_LISTEN tcp_sock.
It should at least only be done under the "if (!reuse)" then.

Checking SOCK_RCU_FREE to imply TCP_LISTEN is not ideal.
It is not immediately obvious.  Why not directly check
TCP_LISTEN?

Note that the SOCK_RCU_FREE check at the 'slow-path'
reuseport_array_update_check() is because reuseport_array does depend on
call_rcu(&sk->sk_rcu,...) to work, e.g. the reuseport_array
does not hold the sk_refcnt.

> >>
> >> hmm. I wonder whether this breaks existing users...
> >
> > There is already this check in reuseport_array_update_check()
> >
> > 	/*
> > 	 * sk must be hashed (i.e. listening in the TCP case or binded
> > 	 * in the UDP case) and
> > 	 * it must also be a SO_REUSEPORT sk (i.e. reuse cannot be NULL).
> > 	 *
> > 	 * Also, sk will be used in bpf helper that is protected by
> > 	 * rcu_read_lock().
> > 	 */
> > 	if (!sock_flag(nsk, SOCK_RCU_FREE) || !sk_hashed(nsk) || !nsk_reuse)
> > 		return -EINVAL;
> >
> > So I believe it should not cause any problems with existing users. Perhaps
> > we could consolidate the checks a bit or move it into the update paths if we
> > wanted. I assume Jakub was just ensuring we don't get here with SOCK_RCU_FREE
> > set from any of the new paths now. I'll let him answer though.
> 
> That was exactly my thinking here.
> 
> REUSEPORT_SOCKARRAY can't be populated with sockets that don't have
> SOCK_RCU_FREE set. This makes the flag check in sk_select_reuseport BPF
> helper redundant for this map type.
> 
> SOCKMAP, OTOH, allows storing established TCP sockets, which don't have
> SOCK_RCU_FREE flag and shouldn't be used as reuseport targets. The newly
> added check protects us against it.
> 
> I have a couple tests in the last patch for it -
> test_sockmap_reuseport_select_{listening,connected}. Admittedly, UDP is
> not covered.
> 
> Not sure how we could go about moving the checks to the update path for
> SOCKMAP. At update time we don't know if the map will be used with a
> reuseport or a sk_{skb,msg} program.
or make these checks specific to the sockmap's lookup path.



digress a little from this patch,
will the upcoming patches/examples show the use case to have both
TCP_LISTEN and TCP_ESTABLISHED sk in the same sock_map?
Jakub Sitnicki Nov. 26, 2019, 2:30 p.m. UTC | #6
On Mon, Nov 25, 2019 at 11:07 PM CET, Martin Lau wrote:
> On Mon, Nov 25, 2019 at 11:40:41AM +0100, Jakub Sitnicki wrote:
>> On Mon, Nov 25, 2019 at 05:17 AM CET, John Fastabend wrote:
>> > Alexei Starovoitov wrote:
>> >> On Sat, Nov 23, 2019 at 12:07:48PM +0100, Jakub Sitnicki wrote:
>> >> > SOCKMAP now supports storing references to listening sockets. Nothing keeps
>> >> > us from using it as an array of sockets to select from in SK_REUSEPORT
>> >> > programs.
>> >> >
>> >> > Whitelist the map type with the BPF helper for selecting socket. However,
>> >> > impose a restriction that the selected socket needs to be a listening TCP
>> >> > socket or a bound UDP socket (connected or not).
>> >> >
>> >> > The only other map type that works with the BPF reuseport helper,
>> >> > REUSEPORT_SOCKARRAY, has a corresponding check in its update operation
>> >> > handler.
>> >> >
>> >> > Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
>> >> > ---
>> >
>> > [...]
>> >
>> >> > diff --git a/net/core/filter.c b/net/core/filter.c
>> >> > index 49ded4a7588a..e3fb77353248 100644
>> >> > --- a/net/core/filter.c
>> >> > +++ b/net/core/filter.c
>> >> > @@ -8723,6 +8723,8 @@ BPF_CALL_4(sk_select_reuseport, struct sk_reuseport_kern *, reuse_kern,
>> >> >  	selected_sk = map->ops->map_lookup_elem(map, key);
>> >> >  	if (!selected_sk)
>> >> >  		return -ENOENT;
>> >> > +	if (!sock_flag(selected_sk, SOCK_RCU_FREE))
>> >> > +		return -EINVAL;
> If I read it correctly,
> this is to avoid the following "if (!reuse)" to return -ENOENT,
> and instead returns -EINVAL for non TCP_LISTEN tcp_sock.
> It should at least only be done under the "if (!reuse)" then.

Yes, exactly. For an established TCP socket in SOCKMAP we would get
-ENOENT because sk_reuseport_cb is not set. Which is a bit confusing
since the map entry exists.

Returning -EINVAL matches the REUSEPORT_SOCKARRAY update operation
semantics for established TCP sockets.

But this is just about returning an informative error so you're
completely right that this should be done under "if (!reuse)" branch to
avoid the extra cost on the happy path.

> Checking SOCK_RCU_FREE to imply TCP_LISTEN is not ideal.
> It is not immediately obvious.  Why not directly check
> TCP_LISTEN?

I agree, it's not obvious. When I first saw this check in
reuseport_array_update_check it got me puzzled too. I should have added
an explanatory comment there.

Thing is we're not matching on just TCP_LISTEN. REUSEPORT_SOCKARRAY
allows selecting a connected UDP socket as a target as well. It takes
some effort to set up but it's possible even if obscure.

> Note that the SOCK_RCU_FREE check at the 'slow-path'
> reuseport_array_update_check() is because reuseport_array does depend on
> call_rcu(&sk->sk_rcu,...) to work, e.g. the reuseport_array
> does not hold the sk_refcnt.

Oh, so it's not only about socket state like I thought.

This raises the question - does REUSEPORT_SOCKARRAY allow storing
connected UDP sockets by design or is it a happy accident? It doesn't
seem particularly useful.

Either way, thanks for the explanation.

>
>> >>
>> >> hmm. I wonder whether this breaks existing users...
>> >
>> > There is already this check in reuseport_array_update_check()
>> >
>> > 	/*
>> > 	 * sk must be hashed (i.e. listening in the TCP case or binded
>> > 	 * in the UDP case) and
>> > 	 * it must also be a SO_REUSEPORT sk (i.e. reuse cannot be NULL).
>> > 	 *
>> > 	 * Also, sk will be used in bpf helper that is protected by
>> > 	 * rcu_read_lock().
>> > 	 */
>> > 	if (!sock_flag(nsk, SOCK_RCU_FREE) || !sk_hashed(nsk) || !nsk_reuse)
>> > 		return -EINVAL;
>> >
>> > So I believe it should not cause any problems with existing users. Perhaps
>> > we could consolidate the checks a bit or move it into the update paths if we
>> > wanted. I assume Jakub was just ensuring we don't get here with SOCK_RCU_FREE
>> > set from any of the new paths now. I'll let him answer though.
>>
>> That was exactly my thinking here.
>>
>> REUSEPORT_SOCKARRAY can't be populated with sockets that don't have
>> SOCK_RCU_FREE set. This makes the flag check in sk_select_reuseport BPF
>> helper redundant for this map type.
>>
>> SOCKMAP, OTOH, allows storing established TCP sockets, which don't have
>> SOCK_RCU_FREE flag and shouldn't be used as reuseport targets. The newly
>> added check protects us against it.
>>
>> I have a couple tests in the last patch for it -
>> test_sockmap_reuseport_select_{listening,connected}. Admittedly, UDP is
>> not covered.
>>
>> Not sure how we could go about moving the checks to the update path for
>> SOCKMAP. At update time we don't know if the map will be used with a
>> reuseport or a sk_{skb,msg} program.
> or make these checks specific to the sockmap's lookup path.
>
> digress a little from this patch,
> will the upcoming patches/examples show the use case to have both
> TCP_LISTEN and TCP_ESTABLISHED sk in the same sock_map?

No, we have no use for a map instance that mixes listening and
established TCP sockets that I know of.

I'm guessing you would like to avoid adding a new check on the fast-path
(at socket selection time) by filering out sockets in invalid state on
map update, like SOCKARRAY does.

I could imagine setting a flag at map creation time to put SOCKMAP in a
a certain mode. Storing just listening or just established sockets.

OTOH why restrict the user? If you are okay with not adding extra checks
on the happy path in sk_select_reuseport, I would opt for that.

Thanks,
Jakub
Martin KaFai Lau Nov. 26, 2019, 7:03 p.m. UTC | #7
On Tue, Nov 26, 2019 at 03:30:57PM +0100, Jakub Sitnicki wrote:
> On Mon, Nov 25, 2019 at 11:07 PM CET, Martin Lau wrote:
> > On Mon, Nov 25, 2019 at 11:40:41AM +0100, Jakub Sitnicki wrote:
> >> On Mon, Nov 25, 2019 at 05:17 AM CET, John Fastabend wrote:
> >> > Alexei Starovoitov wrote:
> >> >> On Sat, Nov 23, 2019 at 12:07:48PM +0100, Jakub Sitnicki wrote:
> >> >> > SOCKMAP now supports storing references to listening sockets. Nothing keeps
> >> >> > us from using it as an array of sockets to select from in SK_REUSEPORT
> >> >> > programs.
> >> >> >
> >> >> > Whitelist the map type with the BPF helper for selecting socket. However,
> >> >> > impose a restriction that the selected socket needs to be a listening TCP
> >> >> > socket or a bound UDP socket (connected or not).
> >> >> >
> >> >> > The only other map type that works with the BPF reuseport helper,
> >> >> > REUSEPORT_SOCKARRAY, has a corresponding check in its update operation
> >> >> > handler.
> >> >> >
> >> >> > Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
> >> >> > ---
> >> >
> >> > [...]
> >> >
> >> >> > diff --git a/net/core/filter.c b/net/core/filter.c
> >> >> > index 49ded4a7588a..e3fb77353248 100644
> >> >> > --- a/net/core/filter.c
> >> >> > +++ b/net/core/filter.c
> >> >> > @@ -8723,6 +8723,8 @@ BPF_CALL_4(sk_select_reuseport, struct sk_reuseport_kern *, reuse_kern,
> >> >> >  	selected_sk = map->ops->map_lookup_elem(map, key);
> >> >> >  	if (!selected_sk)
> >> >> >  		return -ENOENT;
> >> >> > +	if (!sock_flag(selected_sk, SOCK_RCU_FREE))
> >> >> > +		return -EINVAL;
> > If I read it correctly,
> > this is to avoid the following "if (!reuse)" to return -ENOENT,
> > and instead returns -EINVAL for non TCP_LISTEN tcp_sock.
> > It should at least only be done under the "if (!reuse)" then.
> 
> Yes, exactly. For an established TCP socket in SOCKMAP we would get
> -ENOENT because sk_reuseport_cb is not set. Which is a bit confusing
> since the map entry exists.
> 
> Returning -EINVAL matches the REUSEPORT_SOCKARRAY update operation
> semantics for established TCP sockets.
> 
> But this is just about returning an informative error so you're
> completely right that this should be done under "if (!reuse)" branch to
> avoid the extra cost on the happy path.
> 
> > Checking SOCK_RCU_FREE to imply TCP_LISTEN is not ideal.
> > It is not immediately obvious.  Why not directly check
> > TCP_LISTEN?
> 
> I agree, it's not obvious. When I first saw this check in
> reuseport_array_update_check it got me puzzled too. I should have added
> an explanatory comment there.
> 
> Thing is we're not matching on just TCP_LISTEN. REUSEPORT_SOCKARRAY
> allows selecting a connected UDP socket as a target as well. It takes
> some effort to set up but it's possible even if obscure.
How about this instead:
if (!reuse)
 	/* reuseport_array only has sk that has non NULL sk_reuseport_cb.
	 * The only (!reuse) case here is, the sk has already been removed from
	 * reuseport_array, so treat it as -ENOENT.
	 *
	 * Other maps (e.g. sock_map) do not provide this guarantee and the sk may
	 * never be in the reuseport to begin with.
	 */
	return map->map_type == BPF_MAP_TYPE_REUSEPORT_SOCKARRAY ? -ENOENT : -EINVAL;

> 
> > Note that the SOCK_RCU_FREE check at the 'slow-path'
> > reuseport_array_update_check() is because reuseport_array does depend on
> > call_rcu(&sk->sk_rcu,...) to work, e.g. the reuseport_array
> > does not hold the sk_refcnt.
> 
> Oh, so it's not only about socket state like I thought.
> 
> This raises the question - does REUSEPORT_SOCKARRAY allow storing
> connected UDP sockets by design or is it a happy accident? It doesn't
> seem particularly useful.
Not by design/accident on the REUSEPORT_SOCKARRAY side ;)

The intention of REUSEPORT_SOCKARRAY is to allow sk that can be added to
reuse->socks[].
Jakub Sitnicki Nov. 27, 2019, 9:34 p.m. UTC | #8
On Tue, Nov 26, 2019 at 08:03 PM CET, Martin Lau wrote:
> On Tue, Nov 26, 2019 at 03:30:57PM +0100, Jakub Sitnicki wrote:
>> On Mon, Nov 25, 2019 at 11:07 PM CET, Martin Lau wrote:
>> > On Mon, Nov 25, 2019 at 11:40:41AM +0100, Jakub Sitnicki wrote:

[...]

>> I agree, it's not obvious. When I first saw this check in
>> reuseport_array_update_check it got me puzzled too. I should have added
>> an explanatory comment there.
>>
>> Thing is we're not matching on just TCP_LISTEN. REUSEPORT_SOCKARRAY
>> allows selecting a connected UDP socket as a target as well. It takes
>> some effort to set up but it's possible even if obscure.
> How about this instead:
> if (!reuse)
>  	/* reuseport_array only has sk that has non NULL sk_reuseport_cb.
> 	 * The only (!reuse) case here is, the sk has already been removed from
> 	 * reuseport_array, so treat it as -ENOENT.
> 	 *
> 	 * Other maps (e.g. sock_map) do not provide this guarantee and the sk may
> 	 * never be in the reuseport to begin with.
> 	 */
> 	return map->map_type == BPF_MAP_TYPE_REUSEPORT_SOCKARRAY ? -ENOENT : -EINVAL;

Right, apart from established TCP sockets we must not select a listening
socket that's not in a reuseport group either. This covers both
cases. Clever. Thanks for the suggestion.

>
>>
>> > Note that the SOCK_RCU_FREE check at the 'slow-path'
>> > reuseport_array_update_check() is because reuseport_array does depend on
>> > call_rcu(&sk->sk_rcu,...) to work, e.g. the reuseport_array
>> > does not hold the sk_refcnt.
>>
>> Oh, so it's not only about socket state like I thought.
>>
>> This raises the question - does REUSEPORT_SOCKARRAY allow storing
>> connected UDP sockets by design or is it a happy accident? It doesn't
>> seem particularly useful.
> Not by design/accident on the REUSEPORT_SOCKARRAY side ;)
>
> The intention of REUSEPORT_SOCKARRAY is to allow sk that can be added to
> reuse->socks[].

Ah, makes sense. REUSEPORT_SOCKARRAY had to mimic reuseport groups.

-Jakub
diff mbox series

Patch

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index a0482e1c4a77..111a1eb543ab 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -3685,7 +3685,8 @@  static int check_map_func_compatibility(struct bpf_verifier_env *env,
 		if (func_id != BPF_FUNC_sk_redirect_map &&
 		    func_id != BPF_FUNC_sock_map_update &&
 		    func_id != BPF_FUNC_map_delete_elem &&
-		    func_id != BPF_FUNC_msg_redirect_map)
+		    func_id != BPF_FUNC_msg_redirect_map &&
+		    func_id != BPF_FUNC_sk_select_reuseport)
 			goto error;
 		break;
 	case BPF_MAP_TYPE_SOCKHASH:
@@ -3766,7 +3767,8 @@  static int check_map_func_compatibility(struct bpf_verifier_env *env,
 			goto error;
 		break;
 	case BPF_FUNC_sk_select_reuseport:
-		if (map->map_type != BPF_MAP_TYPE_REUSEPORT_SOCKARRAY)
+		if (map->map_type != BPF_MAP_TYPE_REUSEPORT_SOCKARRAY &&
+		    map->map_type != BPF_MAP_TYPE_SOCKMAP)
 			goto error;
 		break;
 	case BPF_FUNC_map_peek_elem:
diff --git a/net/core/filter.c b/net/core/filter.c
index 49ded4a7588a..e3fb77353248 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -8723,6 +8723,8 @@  BPF_CALL_4(sk_select_reuseport, struct sk_reuseport_kern *, reuse_kern,
 	selected_sk = map->ops->map_lookup_elem(map, key);
 	if (!selected_sk)
 		return -ENOENT;
+	if (!sock_flag(selected_sk, SOCK_RCU_FREE))
+		return -EINVAL;
 
 	reuse = rcu_dereference(selected_sk->sk_reuseport_cb);
 	if (!reuse)