diff mbox series

[bpf-next] bpf: Extend the sk_lookup() helper to XDP hookpoint.

Message ID 20181018115029.77ff8e71@ubun
State Changes Requested, archived
Delegated to: BPF Maintainers
Headers show
Series [bpf-next] bpf: Extend the sk_lookup() helper to XDP hookpoint. | expand

Commit Message

Nitin Hande Oct. 18, 2018, 6:50 p.m. UTC
This patch proposes to extend the sk_lookup() BPF API to the
XDP hookpoint. The sk_lookup() helper supports a lookup
on incoming packet to find the corresponding socket that will
receive this packet. Current support for this BPF API is
at the tc hookpoint. This patch will extend this API at XDP
hookpoint. A XDP program can map the incoming packet to the
5-tuple parameter and invoke the API to find the corresponding
socket structure.

Open Issue
* The underlying code relies on presence of an skb to find out the
right sk for the case of REUSEPORT socket option. Since there is
no skb available at XDP hookpoint, the helper function will return
the first available sk based off the 5 tuple hash. If the desire
is to return a particular sk matching reuseport_cb function, please
suggest way to tackle it, which can be addressed in a future commit.

Signed-off-by: Nitin Hande <Nitin.Hande@gmail.com>
---
 net/core/filter.c | 92 +++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 78 insertions(+), 14 deletions(-)

Comments

Joe Stringer Oct. 18, 2018, 9:06 p.m. UTC | #1
On Thu, 18 Oct 2018 at 11:54, Nitin Hande <nitin.hande@gmail.com> wrote:
>
>
> This patch proposes to extend the sk_lookup() BPF API to the
> XDP hookpoint. The sk_lookup() helper supports a lookup
> on incoming packet to find the corresponding socket that will
> receive this packet. Current support for this BPF API is
> at the tc hookpoint. This patch will extend this API at XDP
> hookpoint. A XDP program can map the incoming packet to the
> 5-tuple parameter and invoke the API to find the corresponding
> socket structure.
>
> Open Issue
> * The underlying code relies on presence of an skb to find out the
> right sk for the case of REUSEPORT socket option. Since there is
> no skb available at XDP hookpoint, the helper function will return
> the first available sk based off the 5 tuple hash. If the desire
> is to return a particular sk matching reuseport_cb function, please
> suggest way to tackle it, which can be addressed in a future commit.
>
> Signed-off-by: Nitin Hande <Nitin.Hande@gmail.com>

Thanks Nitin, LGTM overall.

The REUSEPORT thing suggests that the usage of this helper from XDP
layer may lead to a different socket being selected vs. the equivalent
call at TC hook, or other places where the selection may occur. This
could be a bit counter-intuitive.

One thought I had to work around this was to introduce a flag,
something like BPF_F_FIND_REUSEPORT_SK_BY_HASH. This flag would
effectively communicate in the API that the bpf_sk_lookup_xxx()
functions will only select a REUSEPORT socket based on the hash and
not by, for example BPF_PROG_TYPE_SK_REUSEPORT programs. The absence
of the flag would support finding REUSEPORT sockets by other
mechanisms (which would be allowed for now from TC hooks but would be
disallowed from XDP, since there's no specific plan to support this).
Daniel Borkmann Oct. 18, 2018, 9:20 p.m. UTC | #2
On 10/18/2018 11:06 PM, Joe Stringer wrote:
> On Thu, 18 Oct 2018 at 11:54, Nitin Hande <nitin.hande@gmail.com> wrote:
[...]
>> Open Issue
>> * The underlying code relies on presence of an skb to find out the
>> right sk for the case of REUSEPORT socket option. Since there is
>> no skb available at XDP hookpoint, the helper function will return
>> the first available sk based off the 5 tuple hash. If the desire
>> is to return a particular sk matching reuseport_cb function, please
>> suggest way to tackle it, which can be addressed in a future commit.

>> Signed-off-by: Nitin Hande <Nitin.Hande@gmail.com>
> 
> Thanks Nitin, LGTM overall.
> 
> The REUSEPORT thing suggests that the usage of this helper from XDP
> layer may lead to a different socket being selected vs. the equivalent
> call at TC hook, or other places where the selection may occur. This
> could be a bit counter-intuitive.
> 
> One thought I had to work around this was to introduce a flag,
> something like BPF_F_FIND_REUSEPORT_SK_BY_HASH. This flag would
> effectively communicate in the API that the bpf_sk_lookup_xxx()
> functions will only select a REUSEPORT socket based on the hash and
> not by, for example BPF_PROG_TYPE_SK_REUSEPORT programs. The absence
> of the flag would support finding REUSEPORT sockets by other
> mechanisms (which would be allowed for now from TC hooks but would be
> disallowed from XDP, since there's no specific plan to support this).

Hmm, given skb is NULL here the only way to lookup the socket in such
scenario is based on hash, that is, inet_ehashfn() / inet6_ehashfn(),
perhaps alternative is to pass this hash in from XDP itself to the
helper so it could be custom selector. Do you have a specific use case
on this for XDP (just curious)?

Thanks,
Daniel
Nitin Hande Oct. 18, 2018, 11:32 p.m. UTC | #3
On Thu, 18 Oct 2018 23:20:17 +0200
Daniel Borkmann <daniel@iogearbox.net> wrote:

> On 10/18/2018 11:06 PM, Joe Stringer wrote:
> > On Thu, 18 Oct 2018 at 11:54, Nitin Hande <nitin.hande@gmail.com> wrote:  
> [...]
> >> Open Issue
> >> * The underlying code relies on presence of an skb to find out the
> >> right sk for the case of REUSEPORT socket option. Since there is
> >> no skb available at XDP hookpoint, the helper function will return
> >> the first available sk based off the 5 tuple hash. If the desire
> >> is to return a particular sk matching reuseport_cb function, please
> >> suggest way to tackle it, which can be addressed in a future commit.  
> 
> >> Signed-off-by: Nitin Hande <Nitin.Hande@gmail.com>  
> > 
> > Thanks Nitin, LGTM overall.
> > 
> > The REUSEPORT thing suggests that the usage of this helper from XDP
> > layer may lead to a different socket being selected vs. the equivalent
> > call at TC hook, or other places where the selection may occur. This
> > could be a bit counter-intuitive.
> > 
> > One thought I had to work around this was to introduce a flag,
> > something like BPF_F_FIND_REUSEPORT_SK_BY_HASH. This flag would
> > effectively communicate in the API that the bpf_sk_lookup_xxx()
> > functions will only select a REUSEPORT socket based on the hash and
> > not by, for example BPF_PROG_TYPE_SK_REUSEPORT programs. The absence
> > of the flag would support finding REUSEPORT sockets by other
> > mechanisms (which would be allowed for now from TC hooks but would be
> > disallowed from XDP, since there's no specific plan to support this). 

Thanks Joe for the quick response.This certainly looks feasible. With the
flag, both tc and XDP hookpoints will be consistent in their approach.
 
> 
> Hmm, given skb is NULL here the only way to lookup the socket in such
> scenario is based on hash, that is, inet_ehashfn() / inet6_ehashfn(),
> perhaps alternative is to pass this hash in from XDP itself to the
> helper so it could be custom selector.

Interesting, and this will be an additional helper or done within this
sk_lookup() helper ?

 Do you have a specific use case
> on this for XDP (just curious)?

Yes, this is for a dual-stack solution. The XDP program functions as a
demux, if there is a receiver the packet will be ingressed on Linux
networking stack, else it enters the other stack path.

Thanks
Nitin

> 
> Thanks,
> Daniel
Joe Stringer Oct. 18, 2018, 11:52 p.m. UTC | #4
On Thu, 18 Oct 2018 at 14:20, Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> On 10/18/2018 11:06 PM, Joe Stringer wrote:
> > On Thu, 18 Oct 2018 at 11:54, Nitin Hande <nitin.hande@gmail.com> wrote:
> [...]
> >> Open Issue
> >> * The underlying code relies on presence of an skb to find out the
> >> right sk for the case of REUSEPORT socket option. Since there is
> >> no skb available at XDP hookpoint, the helper function will return
> >> the first available sk based off the 5 tuple hash. If the desire
> >> is to return a particular sk matching reuseport_cb function, please
> >> suggest way to tackle it, which can be addressed in a future commit.
>
> >> Signed-off-by: Nitin Hande <Nitin.Hande@gmail.com>
> >
> > Thanks Nitin, LGTM overall.
> >
> > The REUSEPORT thing suggests that the usage of this helper from XDP
> > layer may lead to a different socket being selected vs. the equivalent
> > call at TC hook, or other places where the selection may occur. This
> > could be a bit counter-intuitive.
> >
> > One thought I had to work around this was to introduce a flag,
> > something like BPF_F_FIND_REUSEPORT_SK_BY_HASH. This flag would
> > effectively communicate in the API that the bpf_sk_lookup_xxx()
> > functions will only select a REUSEPORT socket based on the hash and
> > not by, for example BPF_PROG_TYPE_SK_REUSEPORT programs. The absence
> > of the flag would support finding REUSEPORT sockets by other
> > mechanisms (which would be allowed for now from TC hooks but would be
> > disallowed from XDP, since there's no specific plan to support this).
>
> Hmm, given skb is NULL here the only way to lookup the socket in such
> scenario is based on hash, that is, inet_ehashfn() / inet6_ehashfn(),
> perhaps alternative is to pass this hash in from XDP itself to the
> helper so it could be custom selector. Do you have a specific use case
> on this for XDP (just curious)?

I don't have a use case for SO_REUSEPORT introspection from XDP, so
I'm primarily thinking from the perspective of making the behaviour
clear in the API in a way that leaves open the possibility for a
reasonable implementation in future. From that perspective, my main
concern is that it may surprise some BPF writers that the same
"bpf_sk_lookup_tcp()" call (with identical parameters) may have
different behaviour at TC vs. XDP layers, as the BPF selection of
sockets is respected at TC but not at XDP.

FWIW we're already out of parameters for the actual call, so if we
wanted to allow passing a hash in, we'd need to either dedicate half
the 'flags' field for this configurable hash, or consider adding the
new hash parameter to 'struct bpf_sock_tuple'.

+Martin for any thoughts on SO_REUSEPORT and XDP here.
Martin KaFai Lau Oct. 19, 2018, 5:06 a.m. UTC | #5
On Thu, Oct 18, 2018 at 04:52:40PM -0700, Joe Stringer wrote:
> On Thu, 18 Oct 2018 at 14:20, Daniel Borkmann <daniel@iogearbox.net> wrote:
> >
> > On 10/18/2018 11:06 PM, Joe Stringer wrote:
> > > On Thu, 18 Oct 2018 at 11:54, Nitin Hande <nitin.hande@gmail.com> wrote:
> > [...]
> > >> Open Issue
> > >> * The underlying code relies on presence of an skb to find out the
> > >> right sk for the case of REUSEPORT socket option. Since there is
> > >> no skb available at XDP hookpoint, the helper function will return
> > >> the first available sk based off the 5 tuple hash. If the desire
> > >> is to return a particular sk matching reuseport_cb function, please
> > >> suggest way to tackle it, which can be addressed in a future commit.
> >
> > >> Signed-off-by: Nitin Hande <Nitin.Hande@gmail.com>
> > >
> > > Thanks Nitin, LGTM overall.
> > >
> > > The REUSEPORT thing suggests that the usage of this helper from XDP
> > > layer may lead to a different socket being selected vs. the equivalent
> > > call at TC hook, or other places where the selection may occur. This
> > > could be a bit counter-intuitive.
> > >
> > > One thought I had to work around this was to introduce a flag,
> > > something like BPF_F_FIND_REUSEPORT_SK_BY_HASH. This flag would
> > > effectively communicate in the API that the bpf_sk_lookup_xxx()
> > > functions will only select a REUSEPORT socket based on the hash and
> > > not by, for example BPF_PROG_TYPE_SK_REUSEPORT programs. The absence
> > > of the flag would support finding REUSEPORT sockets by other
> > > mechanisms (which would be allowed for now from TC hooks but would be
> > > disallowed from XDP, since there's no specific plan to support this).
> >
> > Hmm, given skb is NULL here the only way to lookup the socket in such
> > scenario is based on hash, that is, inet_ehashfn() / inet6_ehashfn(),
> > perhaps alternative is to pass this hash in from XDP itself to the
> > helper so it could be custom selector. Do you have a specific use case
> > on this for XDP (just curious)?
> 
> I don't have a use case for SO_REUSEPORT introspection from XDP, so
> I'm primarily thinking from the perspective of making the behaviour
> clear in the API in a way that leaves open the possibility for a
> reasonable implementation in future. From that perspective, my main
> concern is that it may surprise some BPF writers that the same
> "bpf_sk_lookup_tcp()" call (with identical parameters) may have
> different behaviour at TC vs. XDP layers, as the BPF selection of
> sockets is respected at TC but not at XDP.
> 
> FWIW we're already out of parameters for the actual call, so if we
> wanted to allow passing a hash in, we'd need to either dedicate half
> the 'flags' field for this configurable hash, or consider adding the
> new hash parameter to 'struct bpf_sock_tuple'.
> 
> +Martin for any thoughts on SO_REUSEPORT and XDP here.
The XDP/TC prog has read access to the sk fields through
'struct bpf_sock'?

A quick thought...
Considering all sk in the same reuse->socks[] share
many things (e.g. family,type,protocol,ip,port..etc are the same),
I wonder returning which particular sk from reuse->socks[] will
matter too much since most of the fields from 'struct bpf_sock' will
be the same.  Some of fields in 'struct bpf_sock' could be different
though, like priority?  Hence, another possibility is to limit the
accessible fields for the XDP prog.  Only allow accessing the fields
that must be the same among the sk in the same reuse->socks[].
Joe Stringer Oct. 19, 2018, 4:47 p.m. UTC | #6
On Thu, 18 Oct 2018 at 22:07, Martin Lau <kafai@fb.com> wrote:
>
> On Thu, Oct 18, 2018 at 04:52:40PM -0700, Joe Stringer wrote:
> > On Thu, 18 Oct 2018 at 14:20, Daniel Borkmann <daniel@iogearbox.net> wrote:
> > >
> > > On 10/18/2018 11:06 PM, Joe Stringer wrote:
> > > > On Thu, 18 Oct 2018 at 11:54, Nitin Hande <nitin.hande@gmail.com> wrote:
> > > [...]
> > > >> Open Issue
> > > >> * The underlying code relies on presence of an skb to find out the
> > > >> right sk for the case of REUSEPORT socket option. Since there is
> > > >> no skb available at XDP hookpoint, the helper function will return
> > > >> the first available sk based off the 5 tuple hash. If the desire
> > > >> is to return a particular sk matching reuseport_cb function, please
> > > >> suggest way to tackle it, which can be addressed in a future commit.
> > >
> > > >> Signed-off-by: Nitin Hande <Nitin.Hande@gmail.com>
> > > >
> > > > Thanks Nitin, LGTM overall.
> > > >
> > > > The REUSEPORT thing suggests that the usage of this helper from XDP
> > > > layer may lead to a different socket being selected vs. the equivalent
> > > > call at TC hook, or other places where the selection may occur. This
> > > > could be a bit counter-intuitive.
> > > >
> > > > One thought I had to work around this was to introduce a flag,
> > > > something like BPF_F_FIND_REUSEPORT_SK_BY_HASH. This flag would
> > > > effectively communicate in the API that the bpf_sk_lookup_xxx()
> > > > functions will only select a REUSEPORT socket based on the hash and
> > > > not by, for example BPF_PROG_TYPE_SK_REUSEPORT programs. The absence
> > > > of the flag would support finding REUSEPORT sockets by other
> > > > mechanisms (which would be allowed for now from TC hooks but would be
> > > > disallowed from XDP, since there's no specific plan to support this).
> > >
> > > Hmm, given skb is NULL here the only way to lookup the socket in such
> > > scenario is based on hash, that is, inet_ehashfn() / inet6_ehashfn(),
> > > perhaps alternative is to pass this hash in from XDP itself to the
> > > helper so it could be custom selector. Do you have a specific use case
> > > on this for XDP (just curious)?
> >
> > I don't have a use case for SO_REUSEPORT introspection from XDP, so
> > I'm primarily thinking from the perspective of making the behaviour
> > clear in the API in a way that leaves open the possibility for a
> > reasonable implementation in future. From that perspective, my main
> > concern is that it may surprise some BPF writers that the same
> > "bpf_sk_lookup_tcp()" call (with identical parameters) may have
> > different behaviour at TC vs. XDP layers, as the BPF selection of
> > sockets is respected at TC but not at XDP.
> >
> > FWIW we're already out of parameters for the actual call, so if we
> > wanted to allow passing a hash in, we'd need to either dedicate half
> > the 'flags' field for this configurable hash, or consider adding the
> > new hash parameter to 'struct bpf_sock_tuple'.
> >
> > +Martin for any thoughts on SO_REUSEPORT and XDP here.
> The XDP/TC prog has read access to the sk fields through
> 'struct bpf_sock'?
>
> A quick thought...
> Considering all sk in the same reuse->socks[] share
> many things (e.g. family,type,protocol,ip,port..etc are the same),
> I wonder returning which particular sk from reuse->socks[] will
> matter too much since most of the fields from 'struct bpf_sock' will
> be the same.  Some of fields in 'struct bpf_sock' could be different
> though, like priority?  Hence, another possibility is to limit the
> accessible fields for the XDP prog.  Only allow accessing the fields
> that must be the same among the sk in the same reuse->socks[].

This sounds pretty reasonable to me.
Daniel Borkmann Oct. 19, 2018, 8:32 p.m. UTC | #7
On 10/19/2018 06:47 PM, Joe Stringer wrote:
> On Thu, 18 Oct 2018 at 22:07, Martin Lau <kafai@fb.com> wrote:
>> On Thu, Oct 18, 2018 at 04:52:40PM -0700, Joe Stringer wrote:
>>> On Thu, 18 Oct 2018 at 14:20, Daniel Borkmann <daniel@iogearbox.net> wrote:
>>>> On 10/18/2018 11:06 PM, Joe Stringer wrote:
>>>>> On Thu, 18 Oct 2018 at 11:54, Nitin Hande <nitin.hande@gmail.com> wrote:
>>>> [...]
>>>>>> Open Issue
>>>>>> * The underlying code relies on presence of an skb to find out the
>>>>>> right sk for the case of REUSEPORT socket option. Since there is
>>>>>> no skb available at XDP hookpoint, the helper function will return
>>>>>> the first available sk based off the 5 tuple hash. If the desire
>>>>>> is to return a particular sk matching reuseport_cb function, please
>>>>>> suggest way to tackle it, which can be addressed in a future commit.
>>>>
>>>>>> Signed-off-by: Nitin Hande <Nitin.Hande@gmail.com>
>>>>>
>>>>> Thanks Nitin, LGTM overall.
>>>>>
>>>>> The REUSEPORT thing suggests that the usage of this helper from XDP
>>>>> layer may lead to a different socket being selected vs. the equivalent
>>>>> call at TC hook, or other places where the selection may occur. This
>>>>> could be a bit counter-intuitive.
>>>>>
>>>>> One thought I had to work around this was to introduce a flag,
>>>>> something like BPF_F_FIND_REUSEPORT_SK_BY_HASH. This flag would
>>>>> effectively communicate in the API that the bpf_sk_lookup_xxx()
>>>>> functions will only select a REUSEPORT socket based on the hash and
>>>>> not by, for example BPF_PROG_TYPE_SK_REUSEPORT programs. The absence
>>>>> of the flag would support finding REUSEPORT sockets by other
>>>>> mechanisms (which would be allowed for now from TC hooks but would be
>>>>> disallowed from XDP, since there's no specific plan to support this).
>>>>
>>>> Hmm, given skb is NULL here the only way to lookup the socket in such
>>>> scenario is based on hash, that is, inet_ehashfn() / inet6_ehashfn(),
>>>> perhaps alternative is to pass this hash in from XDP itself to the
>>>> helper so it could be custom selector. Do you have a specific use case
>>>> on this for XDP (just curious)?
>>>
>>> I don't have a use case for SO_REUSEPORT introspection from XDP, so
>>> I'm primarily thinking from the perspective of making the behaviour
>>> clear in the API in a way that leaves open the possibility for a
>>> reasonable implementation in future. From that perspective, my main
>>> concern is that it may surprise some BPF writers that the same
>>> "bpf_sk_lookup_tcp()" call (with identical parameters) may have
>>> different behaviour at TC vs. XDP layers, as the BPF selection of
>>> sockets is respected at TC but not at XDP.
>>>
>>> FWIW we're already out of parameters for the actual call, so if we
>>> wanted to allow passing a hash in, we'd need to either dedicate half
>>> the 'flags' field for this configurable hash, or consider adding the
>>> new hash parameter to 'struct bpf_sock_tuple'.
>>>
>>> +Martin for any thoughts on SO_REUSEPORT and XDP here.
>> The XDP/TC prog has read access to the sk fields through
>> 'struct bpf_sock'?
>>
>> A quick thought...
>> Considering all sk in the same reuse->socks[] share
>> many things (e.g. family,type,protocol,ip,port..etc are the same),
>> I wonder returning which particular sk from reuse->socks[] will
>> matter too much since most of the fields from 'struct bpf_sock' will
>> be the same.  Some of fields in 'struct bpf_sock' could be different
>> though, like priority?  Hence, another possibility is to limit the
>> accessible fields for the XDP prog.  Only allow accessing the fields
>> that must be the same among the sk in the same reuse->socks[].
> 
> This sounds pretty reasonable to me.

Agree, and in any case this difference in returned sk selection should
probably also be documented in the uapi helper description.
Nitin Hande Oct. 19, 2018, 11:04 p.m. UTC | #8
On Fri, 19 Oct 2018 22:32:28 +0200
Daniel Borkmann <daniel@iogearbox.net> wrote:

> On 10/19/2018 06:47 PM, Joe Stringer wrote:
> > On Thu, 18 Oct 2018 at 22:07, Martin Lau <kafai@fb.com> wrote:  
> >> On Thu, Oct 18, 2018 at 04:52:40PM -0700, Joe Stringer wrote:  
> >>> On Thu, 18 Oct 2018 at 14:20, Daniel Borkmann
> >>> <daniel@iogearbox.net> wrote:  
> >>>> On 10/18/2018 11:06 PM, Joe Stringer wrote:  
> >>>>> On Thu, 18 Oct 2018 at 11:54, Nitin Hande
> >>>>> <nitin.hande@gmail.com> wrote:  
> >>>> [...]  
> >>>>>> Open Issue
> >>>>>> * The underlying code relies on presence of an skb to find out
> >>>>>> the right sk for the case of REUSEPORT socket option. Since
> >>>>>> there is no skb available at XDP hookpoint, the helper
> >>>>>> function will return the first available sk based off the 5
> >>>>>> tuple hash. If the desire is to return a particular sk
> >>>>>> matching reuseport_cb function, please suggest way to tackle
> >>>>>> it, which can be addressed in a future commit.  
> >>>>  
> >>>>>> Signed-off-by: Nitin Hande <Nitin.Hande@gmail.com>  
> >>>>>
> >>>>> Thanks Nitin, LGTM overall.
> >>>>>
> >>>>> The REUSEPORT thing suggests that the usage of this helper from
> >>>>> XDP layer may lead to a different socket being selected vs. the
> >>>>> equivalent call at TC hook, or other places where the selection
> >>>>> may occur. This could be a bit counter-intuitive.
> >>>>>
> >>>>> One thought I had to work around this was to introduce a flag,
> >>>>> something like BPF_F_FIND_REUSEPORT_SK_BY_HASH. This flag would
> >>>>> effectively communicate in the API that the bpf_sk_lookup_xxx()
> >>>>> functions will only select a REUSEPORT socket based on the hash
> >>>>> and not by, for example BPF_PROG_TYPE_SK_REUSEPORT programs.
> >>>>> The absence of the flag would support finding REUSEPORT sockets
> >>>>> by other mechanisms (which would be allowed for now from TC
> >>>>> hooks but would be disallowed from XDP, since there's no
> >>>>> specific plan to support this).  
> >>>>
> >>>> Hmm, given skb is NULL here the only way to lookup the socket in
> >>>> such scenario is based on hash, that is, inet_ehashfn() /
> >>>> inet6_ehashfn(), perhaps alternative is to pass this hash in
> >>>> from XDP itself to the helper so it could be custom selector. Do
> >>>> you have a specific use case on this for XDP (just curious)?  
> >>>
> >>> I don't have a use case for SO_REUSEPORT introspection from XDP,
> >>> so I'm primarily thinking from the perspective of making the
> >>> behaviour clear in the API in a way that leaves open the
> >>> possibility for a reasonable implementation in future. From that
> >>> perspective, my main concern is that it may surprise some BPF
> >>> writers that the same "bpf_sk_lookup_tcp()" call (with identical
> >>> parameters) may have different behaviour at TC vs. XDP layers, as
> >>> the BPF selection of sockets is respected at TC but not at XDP.
> >>>
> >>> FWIW we're already out of parameters for the actual call, so if we
> >>> wanted to allow passing a hash in, we'd need to either dedicate
> >>> half the 'flags' field for this configurable hash, or consider
> >>> adding the new hash parameter to 'struct bpf_sock_tuple'.
> >>>
> >>> +Martin for any thoughts on SO_REUSEPORT and XDP here.  
> >> The XDP/TC prog has read access to the sk fields through
> >> 'struct bpf_sock'?
> >>
> >> A quick thought...
> >> Considering all sk in the same reuse->socks[] share
> >> many things (e.g. family,type,protocol,ip,port..etc are the same),
> >> I wonder returning which particular sk from reuse->socks[] will
> >> matter too much since most of the fields from 'struct bpf_sock'
> >> will be the same.  Some of fields in 'struct bpf_sock' could be
> >> different though, like priority?  Hence, another possibility is to
> >> limit the accessible fields for the XDP prog.  Only allow
> >> accessing the fields that must be the same among the sk in the
> >> same reuse->socks[].  
> > 
> > This sounds pretty reasonable to me.  
> 
> Agree, and in any case this difference in returned sk selection should
> probably also be documented in the uapi helper description.

Okay, will do in a v2.

Thanks
Nitin
diff mbox series

Patch

diff --git a/net/core/filter.c b/net/core/filter.c
index 1a3ac6c46873..85862e41e291 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -4694,14 +4694,11 @@  static const struct bpf_func_proto bpf_lwt_seg6_adjust_srh_proto = {
 
 #ifdef CONFIG_INET
 static struct sock *sk_lookup(struct net *net, struct bpf_sock_tuple *tuple,
-			      struct sk_buff *skb, u8 family, u8 proto)
+			      struct sk_buff *skb, int dif, u8 family,
+			      u8 proto)
 {
 	bool refcounted = false;
 	struct sock *sk = NULL;
-	int dif = 0;
-
-	if (skb->dev)
-		dif = skb->dev->ifindex;
 
 	if (family == AF_INET) {
 		__be32 src4 = tuple->ipv4.saddr;
@@ -4751,10 +4748,10 @@  static struct sock *sk_lookup(struct net *net, struct bpf_sock_tuple *tuple,
  * callers to satisfy BPF_CALL declarations.
  */
 static unsigned long
-bpf_sk_lookup(struct sk_buff *skb, struct bpf_sock_tuple *tuple, u32 len,
-	      u8 proto, u64 netns_id, u64 flags)
+__bpf_sk_lookup(struct sk_buff *skb, struct bpf_sock_tuple *tuple, u32 len,
+		struct net *caller_net, u32 ifindex, u8 proto, u64 netns_id,
+		u64 flags)
 {
-	struct net *caller_net;
 	struct sock *sk = NULL;
 	u8 family = AF_UNSPEC;
 	struct net *net;
@@ -4763,19 +4760,15 @@  bpf_sk_lookup(struct sk_buff *skb, struct bpf_sock_tuple *tuple, u32 len,
 	if (unlikely(family == AF_UNSPEC || netns_id > U32_MAX || flags))
 		goto out;
 
-	if (skb->dev)
-		caller_net = dev_net(skb->dev);
-	else
-		caller_net = sock_net(skb->sk);
 	if (netns_id) {
 		net = get_net_ns_by_id(caller_net, netns_id);
 		if (unlikely(!net))
 			goto out;
-		sk = sk_lookup(net, tuple, skb, family, proto);
+		sk = sk_lookup(net, tuple, skb, ifindex, family, proto);
 		put_net(net);
 	} else {
 		net = caller_net;
-		sk = sk_lookup(net, tuple, skb, family, proto);
+		sk = sk_lookup(net, tuple, skb, ifindex, family, proto);
 	}
 
 	if (sk)
@@ -4784,6 +4777,25 @@  bpf_sk_lookup(struct sk_buff *skb, struct bpf_sock_tuple *tuple, u32 len,
 	return (unsigned long) sk;
 }
 
+static unsigned long
+bpf_sk_lookup(struct sk_buff *skb, struct bpf_sock_tuple *tuple, u32 len,
+	      u8 proto, u64 netns_id, u64 flags)
+{
+	struct net *caller_net;
+	int ifindex;
+
+	if (skb->dev) {
+		caller_net = dev_net(skb->dev);
+		ifindex = skb->dev->ifindex;
+	} else {
+		caller_net = sock_net(skb->sk);
+		ifindex = 0;
+	}
+
+	return __bpf_sk_lookup(skb, tuple, len, caller_net, ifindex,
+			      proto, netns_id, flags);
+}
+
 BPF_CALL_5(bpf_sk_lookup_tcp, struct sk_buff *, skb,
 	   struct bpf_sock_tuple *, tuple, u32, len, u64, netns_id, u64, flags)
 {
@@ -4833,6 +4845,50 @@  static const struct bpf_func_proto bpf_sk_release_proto = {
 	.ret_type	= RET_INTEGER,
 	.arg1_type	= ARG_PTR_TO_SOCKET,
 };
+
+BPF_CALL_5(bpf_xdp_sk_lookup_udp, struct xdp_buff *, ctx,
+	   struct bpf_sock_tuple *, tuple, u32, len, u32, netns_id, u64, flags)
+{
+	struct net *caller_net = dev_net(ctx->rxq->dev);
+	int ifindex = ctx->rxq->dev->ifindex;
+
+	return __bpf_sk_lookup(NULL, tuple, len, caller_net, ifindex,
+			      IPPROTO_UDP, netns_id, flags);
+}
+
+static const struct bpf_func_proto bpf_xdp_sk_lookup_udp_proto = {
+	.func           = bpf_xdp_sk_lookup_udp,
+	.gpl_only       = false,
+	.pkt_access     = true,
+	.ret_type       = RET_PTR_TO_SOCKET_OR_NULL,
+	.arg1_type      = ARG_PTR_TO_CTX,
+	.arg2_type      = ARG_PTR_TO_MEM,
+	.arg3_type      = ARG_CONST_SIZE,
+	.arg4_type      = ARG_ANYTHING,
+	.arg5_type      = ARG_ANYTHING,
+};
+
+BPF_CALL_5(bpf_xdp_sk_lookup_tcp, struct xdp_buff *, ctx,
+	   struct bpf_sock_tuple *, tuple, u32, len, u32, netns_id, u64, flags)
+{
+	struct net *caller_net = dev_net(ctx->rxq->dev);
+	int ifindex = ctx->rxq->dev->ifindex;
+
+	return __bpf_sk_lookup(NULL, tuple, len, caller_net, ifindex,
+			      IPPROTO_TCP, netns_id, flags);
+}
+
+static const struct bpf_func_proto bpf_xdp_sk_lookup_tcp_proto = {
+	.func           = bpf_xdp_sk_lookup_tcp,
+	.gpl_only       = false,
+	.pkt_access     = true,
+	.ret_type       = RET_PTR_TO_SOCKET_OR_NULL,
+	.arg1_type      = ARG_PTR_TO_CTX,
+	.arg2_type      = ARG_PTR_TO_MEM,
+	.arg3_type      = ARG_CONST_SIZE,
+	.arg4_type      = ARG_ANYTHING,
+	.arg5_type      = ARG_ANYTHING,
+};
 #endif /* CONFIG_INET */
 
 bool bpf_helper_changes_pkt_data(void *func)
@@ -5076,6 +5132,14 @@  xdp_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_xdp_adjust_tail_proto;
 	case BPF_FUNC_fib_lookup:
 		return &bpf_xdp_fib_lookup_proto;
+#ifdef CONFIG_INET
+	case BPF_FUNC_sk_lookup_udp:
+		return &bpf_xdp_sk_lookup_udp_proto;
+	case BPF_FUNC_sk_lookup_tcp:
+		return &bpf_xdp_sk_lookup_tcp_proto;
+	case BPF_FUNC_sk_release:
+		return &bpf_sk_release_proto;
+#endif
 	default:
 		return bpf_base_func_proto(func_id);
 	}