mbox series

[RFC,bpf-next,0/5] Extend SOCKMAP to store listening sockets

Message ID 20191022113730.29303-1-jakub@cloudflare.com
Headers show
Series Extend SOCKMAP to store listening sockets | expand

Message

Jakub Sitnicki Oct. 22, 2019, 11:37 a.m. UTC
This patch set is a follow up on a suggestion from LPC '19 discussions to
make SOCKMAP (or a new map type derived from it) a generic type for storing
established as well as listening sockets.

We found ourselves in need of a map type that keeps references to listening
sockets when working on making the socket lookup programmable, aka BPF
inet_lookup [1].  Initially we repurposed REUSEPORT_SOCKARRAY but found it
problematic to extend due to being tightly coupled with reuseport
logic (see slides [2]). So we've turned our attention to SOCKMAP instead.

As it turns out the changes needed to make SOCKMAP suitable for storing
listening sockets are self-contained and have use outside of programming
the socket lookup. Hence this patch set.

With these patches SOCKMAP can be used in SK_REUSEPORT BPF programs as a
drop-in replacement for REUSEPORT_SOCKARRAY for TCP. This can hopefully
lead to code consolidation between the two map types in the future.

Having said that, the main intention here is to lay groundwork for using
SOCKMAP in the next iteration of programmable socket lookup patches.

I'm looking for feedback if there's anything fundamentally wrong with
extending SOCKMAP map type like this that I might have missed.

Thanks,
Jakub

[1] https://lore.kernel.org/bpf/20190828072250.29828-1-jakub@cloudflare.com/
[2] https://linuxplumbersconf.org/event/4/contributions/487/attachments/238/417/Programmable_socket_lookup_LPC_19.pdf


Jakub Sitnicki (5):
  bpf, sockmap: Let BPF helpers use lookup operation on SOCKMAP
  bpf, sockmap: Allow inserting listening TCP sockets into SOCKMAP
  bpf, sockmap: Don't let child socket inherit psock or its ops on copy
  bpf: Allow selecting reuseport socket from a SOCKMAP
  selftests/bpf: Extend SK_REUSEPORT tests to cover SOCKMAP

 kernel/bpf/verifier.c                         |   6 +-
 net/core/sock_map.c                           |  11 +-
 net/ipv4/tcp_bpf.c                            |  30 ++++
 tools/testing/selftests/bpf/Makefile          |   7 +-
 .../selftests/bpf/test_select_reuseport.c     | 141 ++++++++++++++----
 .../selftests/bpf/test_select_reuseport.sh    |  14 ++
 6 files changed, 173 insertions(+), 36 deletions(-)
 create mode 100755 tools/testing/selftests/bpf/test_select_reuseport.sh

Comments

Alexei Starovoitov Oct. 24, 2019, 4:12 p.m. UTC | #1
On Tue, Oct 22, 2019 at 01:37:25PM +0200, Jakub Sitnicki wrote:
> This patch set is a follow up on a suggestion from LPC '19 discussions to
> make SOCKMAP (or a new map type derived from it) a generic type for storing
> established as well as listening sockets.
> 
> We found ourselves in need of a map type that keeps references to listening
> sockets when working on making the socket lookup programmable, aka BPF
> inet_lookup [1].  Initially we repurposed REUSEPORT_SOCKARRAY but found it
> problematic to extend due to being tightly coupled with reuseport
> logic (see slides [2]). So we've turned our attention to SOCKMAP instead.
> 
> As it turns out the changes needed to make SOCKMAP suitable for storing
> listening sockets are self-contained and have use outside of programming
> the socket lookup. Hence this patch set.
> 
> With these patches SOCKMAP can be used in SK_REUSEPORT BPF programs as a
> drop-in replacement for REUSEPORT_SOCKARRAY for TCP. This can hopefully
> lead to code consolidation between the two map types in the future.
> 
> Having said that, the main intention here is to lay groundwork for using
> SOCKMAP in the next iteration of programmable socket lookup patches.
> 
> I'm looking for feedback if there's anything fundamentally wrong with
> extending SOCKMAP map type like this that I might have missed.

John, Martin,
comments?
John Fastabend Oct. 24, 2019, 4:56 p.m. UTC | #2
Jakub Sitnicki wrote:
> This patch set is a follow up on a suggestion from LPC '19 discussions to
> make SOCKMAP (or a new map type derived from it) a generic type for storing
> established as well as listening sockets.
> 
> We found ourselves in need of a map type that keeps references to listening
> sockets when working on making the socket lookup programmable, aka BPF
> inet_lookup [1].  Initially we repurposed REUSEPORT_SOCKARRAY but found it
> problematic to extend due to being tightly coupled with reuseport
> logic (see slides [2]). So we've turned our attention to SOCKMAP instead.
> 
> As it turns out the changes needed to make SOCKMAP suitable for storing
> listening sockets are self-contained and have use outside of programming
> the socket lookup. Hence this patch set.
> 
> With these patches SOCKMAP can be used in SK_REUSEPORT BPF programs as a
> drop-in replacement for REUSEPORT_SOCKARRAY for TCP. This can hopefully
> lead to code consolidation between the two map types in the future.
> 
> Having said that, the main intention here is to lay groundwork for using
> SOCKMAP in the next iteration of programmable socket lookup patches.
> 
> I'm looking for feedback if there's anything fundamentally wrong with
> extending SOCKMAP map type like this that I might have missed.

I think this looks good. The main reason I blocked it off before is mostly
because I had no use-case for it and the complication with what to do with
child sockets. Clearing the psock state seems OK to me if user wants to
add it back to a map they can simply grab it again from a sockops event.

By the way I would eventually like to see the lookup hook return the
correct type (PTR_TO_SOCKET_OR_NULL) so that the verifier "knows" the type
and the socket can be used the same as if it was pulled from a sk_lookup
helper.

> 
> Thanks,
> Jakub
> 
> [1] https://lore.kernel.org/bpf/20190828072250.29828-1-jakub@cloudflare.com/
> [2] https://linuxplumbersconf.org/event/4/contributions/487/attachments/238/417/Programmable_socket_lookup_LPC_19.pdf
>
Jakub Sitnicki Oct. 25, 2019, 9:26 a.m. UTC | #3
On Thu, Oct 24, 2019 at 06:56 PM CEST, John Fastabend wrote:
> Jakub Sitnicki wrote:

[...]

>> I'm looking for feedback if there's anything fundamentally wrong with
>> extending SOCKMAP map type like this that I might have missed.
>
> I think this looks good. The main reason I blocked it off before is mostly
> because I had no use-case for it and the complication with what to do with
> child sockets. Clearing the psock state seems OK to me if user wants to
> add it back to a map they can simply grab it again from a sockops
> event.

Thanks for taking a look at the code.

> By the way I would eventually like to see the lookup hook return the
> correct type (PTR_TO_SOCKET_OR_NULL) so that the verifier "knows" the type
> and the socket can be used the same as if it was pulled from a sk_lookup
> helper.

Wait... you had me scratching my head there for a minute.

I haven't whitelisted bpf_map_lookup_elem for SOCKMAP in
check_map_func_compatibility so verifier won't allow lookups from BPF.

If we wanted to do that, I don't actually have a use-case for it, I
think would have to extend get_func_proto for SK_SKB and SK_REUSEPORT
prog types. At least that's what docs for bpf_map_lookup_elem suggest:

/* If kernel subsystem is allowing eBPF programs to call this function,
 * inside its own verifier_ops->get_func_proto() callback it should return
 * bpf_map_lookup_elem_proto, so that verifier can properly check the arguments
 *
 * Different map implementations will rely on rcu in map methods
 * lookup/update/delete, therefore eBPF programs must run under rcu lock
 * if program is allowed to access maps, so check rcu_read_lock_held in
 * all three functions.
 */
BPF_CALL_2(bpf_map_lookup_elem, struct bpf_map *, map, void *, key)
{
	WARN_ON_ONCE(!rcu_read_lock_held());
	return (unsigned long) map->ops->map_lookup_elem(map, key);
}

-Jakub
John Fastabend Oct. 25, 2019, 2:18 p.m. UTC | #4
Jakub Sitnicki wrote:
> On Thu, Oct 24, 2019 at 06:56 PM CEST, John Fastabend wrote:
> > Jakub Sitnicki wrote:
> 
> [...]
> 
> >> I'm looking for feedback if there's anything fundamentally wrong with
> >> extending SOCKMAP map type like this that I might have missed.
> >
> > I think this looks good. The main reason I blocked it off before is mostly
> > because I had no use-case for it and the complication with what to do with
> > child sockets. Clearing the psock state seems OK to me if user wants to
> > add it back to a map they can simply grab it again from a sockops
> > event.
> 
> Thanks for taking a look at the code.
> 
> > By the way I would eventually like to see the lookup hook return the
> > correct type (PTR_TO_SOCKET_OR_NULL) so that the verifier "knows" the type
> > and the socket can be used the same as if it was pulled from a sk_lookup
> > helper.
> 
> Wait... you had me scratching my head there for a minute.
> 
> I haven't whitelisted bpf_map_lookup_elem for SOCKMAP in
> check_map_func_compatibility so verifier won't allow lookups from BPF.
> 
> If we wanted to do that, I don't actually have a use-case for it, I
> think would have to extend get_func_proto for SK_SKB and SK_REUSEPORT
> prog types. At least that's what docs for bpf_map_lookup_elem suggest:

Right, so its not required for your series just letting you know I will
probably look to do this shortly. It would be useful for some use cases
we have.

> 
> /* If kernel subsystem is allowing eBPF programs to call this function,
>  * inside its own verifier_ops->get_func_proto() callback it should return
>  * bpf_map_lookup_elem_proto, so that verifier can properly check the arguments
>  *
>  * Different map implementations will rely on rcu in map methods
>  * lookup/update/delete, therefore eBPF programs must run under rcu lock
>  * if program is allowed to access maps, so check rcu_read_lock_held in
>  * all three functions.
>  */
> BPF_CALL_2(bpf_map_lookup_elem, struct bpf_map *, map, void *, key)
> {
> 	WARN_ON_ONCE(!rcu_read_lock_held());
> 	return (unsigned long) map->ops->map_lookup_elem(map, key);
> }
> 
> -Jakub
Martin KaFai Lau Oct. 28, 2019, 5:52 a.m. UTC | #5
On Tue, Oct 22, 2019 at 01:37:25PM +0200, Jakub Sitnicki wrote:
> This patch set is a follow up on a suggestion from LPC '19 discussions to
> make SOCKMAP (or a new map type derived from it) a generic type for storing
> established as well as listening sockets.
> 
> We found ourselves in need of a map type that keeps references to listening
> sockets when working on making the socket lookup programmable, aka BPF
> inet_lookup [1].  Initially we repurposed REUSEPORT_SOCKARRAY but found it
> problematic to extend due to being tightly coupled with reuseport
> logic (see slides [2]).
> So we've turned our attention to SOCKMAP instead.
> 
> As it turns out the changes needed to make SOCKMAP suitable for storing
> listening sockets are self-contained and have use outside of programming
> the socket lookup. Hence this patch set.
> 
> With these patches SOCKMAP can be used in SK_REUSEPORT BPF programs as a
> drop-in replacement for REUSEPORT_SOCKARRAY for TCP. This can hopefully
> lead to code consolidation between the two map types in the future.
What is the plan for UDP support in sockmap?

> 
> Having said that, the main intention here is to lay groundwork for using
> SOCKMAP in the next iteration of programmable socket lookup patches.
What may be the minimal to get only lookup work for UDP sockmap?
.close() and .unhash()?

> 
> I'm looking for feedback if there's anything fundamentally wrong with
> extending SOCKMAP map type like this that I might have missed.
> 
> Thanks,
> Jakub
> 
> [1] https://lore.kernel.org/bpf/20190828072250.29828-1-jakub@cloudflare.com/
> [2] https://urldefense.proofpoint.com/v2/url?u=https-3A__linuxplumbersconf.org_event_4_contributions_487_attachments_238_417_Programmable-5Fsocket-5Flookup-5FLPC-5F19.pdf&d=DwIDAg&c=5VD0RTtNlTh3ycd41b3MUw&r=VQnoQ7LvghIj0gVEaiQSUw&m=Y-Ap1QuRBsqsu8gATb1wH3rPT89No2mam2qINt1BGDI&s=_sfXVfJhB2eR7znE7-WBk660dQXIBxuDLRi7jvXVpsg&e= 
> 
> 
> Jakub Sitnicki (5):
>   bpf, sockmap: Let BPF helpers use lookup operation on SOCKMAP
>   bpf, sockmap: Allow inserting listening TCP sockets into SOCKMAP
>   bpf, sockmap: Don't let child socket inherit psock or its ops on copy
>   bpf: Allow selecting reuseport socket from a SOCKMAP
>   selftests/bpf: Extend SK_REUSEPORT tests to cover SOCKMAP
> 
>  kernel/bpf/verifier.c                         |   6 +-
>  net/core/sock_map.c                           |  11 +-
>  net/ipv4/tcp_bpf.c                            |  30 ++++
>  tools/testing/selftests/bpf/Makefile          |   7 +-
>  .../selftests/bpf/test_select_reuseport.c     | 141 ++++++++++++++----
>  .../selftests/bpf/test_select_reuseport.sh    |  14 ++
>  6 files changed, 173 insertions(+), 36 deletions(-)
>  create mode 100755 tools/testing/selftests/bpf/test_select_reuseport.sh
> 
> -- 
> 2.20.1
>
Jakub Sitnicki Oct. 28, 2019, 12:35 p.m. UTC | #6
On Mon, Oct 28, 2019 at 06:52 AM CET, Martin Lau wrote:
> On Tue, Oct 22, 2019 at 01:37:25PM +0200, Jakub Sitnicki wrote:
>> This patch set is a follow up on a suggestion from LPC '19 discussions to
>> make SOCKMAP (or a new map type derived from it) a generic type for storing
>> established as well as listening sockets.
>>
>> We found ourselves in need of a map type that keeps references to listening
>> sockets when working on making the socket lookup programmable, aka BPF
>> inet_lookup [1].  Initially we repurposed REUSEPORT_SOCKARRAY but found it
>> problematic to extend due to being tightly coupled with reuseport
>> logic (see slides [2]).
>> So we've turned our attention to SOCKMAP instead.
>>
>> As it turns out the changes needed to make SOCKMAP suitable for storing
>> listening sockets are self-contained and have use outside of programming
>> the socket lookup. Hence this patch set.
>>
>> With these patches SOCKMAP can be used in SK_REUSEPORT BPF programs as a
>> drop-in replacement for REUSEPORT_SOCKARRAY for TCP. This can hopefully
>> lead to code consolidation between the two map types in the future.
> What is the plan for UDP support in sockmap?

It's on our road-map because without SOCKMAP support for UDP we won't be
able to move away from TPROXY [1] and custom SO_BINDTOPREFIX extension
[2] for steering new UDP flows to receiving sockets. Also we would like
to look into using SOCKMAP for connected UDP socket splicing in the
future [3].

I was planning to split work as follows:

1. SOCKMAP support for listening sockets (this series)
2. programmable socket lookup for TCP (cut-down version of [4])
3. SOCKMAP support for UDP (work not started)
4. programmable socket lookup for UDP (rest of [4])

I'm open to suggestions on how to organize it.

>> Having said that, the main intention here is to lay groundwork for using
>> SOCKMAP in the next iteration of programmable socket lookup patches.
> What may be the minimal to get only lookup work for UDP sockmap?
> .close() and .unhash()?

John would know better. I haven't tried doing it yet.

From just reading the code - override the two proto ops you mentioned,
close and unhash, and adapt the socket checks in SOCKMAP.

-Jakub

[1] https://blog.cloudflare.com/how-we-built-spectrum/
[2] https://lore.kernel.org/netdev/1458699966-3752-1-git-send-email-gilberto.bertin@gmail.com/
[3] https://lore.kernel.org/bpf/20190828072250.29828-1-jakub@cloudflare.com/
[4] https://blog.cloudflare.com/sockmap-tcp-splicing-of-the-future/
John Fastabend Oct. 28, 2019, 7:04 p.m. UTC | #7
Jakub Sitnicki wrote:
> On Mon, Oct 28, 2019 at 06:52 AM CET, Martin Lau wrote:
> > On Tue, Oct 22, 2019 at 01:37:25PM +0200, Jakub Sitnicki wrote:
> >> This patch set is a follow up on a suggestion from LPC '19 discussions to
> >> make SOCKMAP (or a new map type derived from it) a generic type for storing
> >> established as well as listening sockets.
> >>
> >> We found ourselves in need of a map type that keeps references to listening
> >> sockets when working on making the socket lookup programmable, aka BPF
> >> inet_lookup [1].  Initially we repurposed REUSEPORT_SOCKARRAY but found it
> >> problematic to extend due to being tightly coupled with reuseport
> >> logic (see slides [2]).
> >> So we've turned our attention to SOCKMAP instead.
> >>
> >> As it turns out the changes needed to make SOCKMAP suitable for storing
> >> listening sockets are self-contained and have use outside of programming
> >> the socket lookup. Hence this patch set.
> >>
> >> With these patches SOCKMAP can be used in SK_REUSEPORT BPF programs as a
> >> drop-in replacement for REUSEPORT_SOCKARRAY for TCP. This can hopefully
> >> lead to code consolidation between the two map types in the future.
> > What is the plan for UDP support in sockmap?
> 
> It's on our road-map because without SOCKMAP support for UDP we won't be
> able to move away from TPROXY [1] and custom SO_BINDTOPREFIX extension
> [2] for steering new UDP flows to receiving sockets. Also we would like
> to look into using SOCKMAP for connected UDP socket splicing in the
> future [3].
> 
> I was planning to split work as follows:
> 
> 1. SOCKMAP support for listening sockets (this series)
> 2. programmable socket lookup for TCP (cut-down version of [4])
> 3. SOCKMAP support for UDP (work not started)
> 4. programmable socket lookup for UDP (rest of [4])
> 
> I'm open to suggestions on how to organize it.

Looks good to me. I've had UDP support on my todo list for awhile now
but it hasn't got to the top yet so glad to see this.

Also perhaps not necessary for your work but I have some patches on my
stack I'll try to get out soon to get ktls + receive hooks working.

> 
> >> Having said that, the main intention here is to lay groundwork for using
> >> SOCKMAP in the next iteration of programmable socket lookup patches.
> > What may be the minimal to get only lookup work for UDP sockmap?
> > .close() and .unhash()?
> 
> John would know better. I haven't tried doing it yet.

Right, I don't think its too complicated we just need the hooks and then
to be sure the socket state checks are OK. Having listening support should
help with the UDP case.

> 
> From just reading the code - override the two proto ops you mentioned,
> close and unhash, and adapt the socket checks in SOCKMAP.

+1.

> 
> -Jakub
> 
> [1] https://blog.cloudflare.com/how-we-built-spectrum/
> [2] https://lore.kernel.org/netdev/1458699966-3752-1-git-send-email-gilberto.bertin@gmail.com/
> [3] https://lore.kernel.org/bpf/20190828072250.29828-1-jakub@cloudflare.com/
> [4] https://blog.cloudflare.com/sockmap-tcp-splicing-of-the-future/
Martin KaFai Lau Oct. 28, 2019, 8:42 p.m. UTC | #8
On Mon, Oct 28, 2019 at 01:35:26PM +0100, Jakub Sitnicki wrote:
> On Mon, Oct 28, 2019 at 06:52 AM CET, Martin Lau wrote:
> > On Tue, Oct 22, 2019 at 01:37:25PM +0200, Jakub Sitnicki wrote:
> >> This patch set is a follow up on a suggestion from LPC '19 discussions to
> >> make SOCKMAP (or a new map type derived from it) a generic type for storing
> >> established as well as listening sockets.
> >>
> >> We found ourselves in need of a map type that keeps references to listening
> >> sockets when working on making the socket lookup programmable, aka BPF
> >> inet_lookup [1].  Initially we repurposed REUSEPORT_SOCKARRAY but found it
> >> problematic to extend due to being tightly coupled with reuseport
> >> logic (see slides [2]).
> >> So we've turned our attention to SOCKMAP instead.
> >>
> >> As it turns out the changes needed to make SOCKMAP suitable for storing
> >> listening sockets are self-contained and have use outside of programming
> >> the socket lookup. Hence this patch set.
> >>
> >> With these patches SOCKMAP can be used in SK_REUSEPORT BPF programs as a
> >> drop-in replacement for REUSEPORT_SOCKARRAY for TCP. This can hopefully
> >> lead to code consolidation between the two map types in the future.
> > What is the plan for UDP support in sockmap?
> 
> It's on our road-map because without SOCKMAP support for UDP we won't be
> able to move away from TPROXY [1] and custom SO_BINDTOPREFIX extension
> [2] for steering new UDP flows to receiving sockets. Also we would like
> to look into using SOCKMAP for connected UDP socket splicing in the
> future [3].
> 
> I was planning to split work as follows:
> 
> 1. SOCKMAP support for listening sockets (this series)
> 2. programmable socket lookup for TCP (cut-down version of [4])
> 3. SOCKMAP support for UDP (work not started)
hmm...It is hard to comment how the full UDP sockmap may
work out without a code attempt because I am not fluent in
sock_map ;)

From a quick look, it seems there are quite a few things to do.
For example, the TCP_SKB_CB(skb) usage and how that may look
like in UDP.  "struct udp_skb_cb" is 28 bytes while "struct napi_gro_cb"
seems to be 48 bytes already which may need a closer look.

> 4. programmable socket lookup for UDP (rest of [4])
> 
> I'm open to suggestions on how to organize it.
> 
> >> Having said that, the main intention here is to lay groundwork for using
> >> SOCKMAP in the next iteration of programmable socket lookup patches.
> > What may be the minimal to get only lookup work for UDP sockmap?
> > .close() and .unhash()?
> 
> John would know better. I haven't tried doing it yet.
> 
> From just reading the code - override the two proto ops you mentioned,
> close and unhash, and adapt the socket checks in SOCKMAP.
Do your use cases need bpf prog attached to sock_map?

If not, would it be cleaner to delicate another map_type
for lookup-only use case to have both TCP and UDP support.

> 
> -Jakub
> 
> [1] https://urldefense.proofpoint.com/v2/url?u=https-3A__blog.cloudflare.com_how-2Dwe-2Dbuilt-2Dspectrum_&d=DwIBAg&c=5VD0RTtNlTh3ycd41b3MUw&r=VQnoQ7LvghIj0gVEaiQSUw&m=lSo-FsOeNl_8znZZ07H8I6ZYAinPKTR5C3Cn_Ol3QYQ&s=DZgW8-2Xl1P8NU59ji4ieQLzwWpx4t3gGq_tqB0l3Bo&e= 
> [2] https://lore.kernel.org/netdev/1458699966-3752-1-git-send-email-gilberto.bertin@gmail.com/
> [3] https://lore.kernel.org/bpf/20190828072250.29828-1-jakub@cloudflare.com/
> [4] https://urldefense.proofpoint.com/v2/url?u=https-3A__blog.cloudflare.com_sockmap-2Dtcp-2Dsplicing-2Dof-2Dthe-2Dfuture_&d=DwIBAg&c=5VD0RTtNlTh3ycd41b3MUw&r=VQnoQ7LvghIj0gVEaiQSUw&m=lSo-FsOeNl_8znZZ07H8I6ZYAinPKTR5C3Cn_Ol3QYQ&s=NerUqb4j7IsGBTcni6Yxk40wf6kTkckHXn3Nx5i4mCU&e=
John Fastabend Oct. 28, 2019, 9:05 p.m. UTC | #9
Martin Lau wrote:
> On Mon, Oct 28, 2019 at 01:35:26PM +0100, Jakub Sitnicki wrote:
> > On Mon, Oct 28, 2019 at 06:52 AM CET, Martin Lau wrote:
> > > On Tue, Oct 22, 2019 at 01:37:25PM +0200, Jakub Sitnicki wrote:
> > >> This patch set is a follow up on a suggestion from LPC '19 discussions to
> > >> make SOCKMAP (or a new map type derived from it) a generic type for storing
> > >> established as well as listening sockets.
> > >>
> > >> We found ourselves in need of a map type that keeps references to listening
> > >> sockets when working on making the socket lookup programmable, aka BPF
> > >> inet_lookup [1].  Initially we repurposed REUSEPORT_SOCKARRAY but found it
> > >> problematic to extend due to being tightly coupled with reuseport
> > >> logic (see slides [2]).
> > >> So we've turned our attention to SOCKMAP instead.
> > >>
> > >> As it turns out the changes needed to make SOCKMAP suitable for storing
> > >> listening sockets are self-contained and have use outside of programming
> > >> the socket lookup. Hence this patch set.
> > >>
> > >> With these patches SOCKMAP can be used in SK_REUSEPORT BPF programs as a
> > >> drop-in replacement for REUSEPORT_SOCKARRAY for TCP. This can hopefully
> > >> lead to code consolidation between the two map types in the future.
> > > What is the plan for UDP support in sockmap?
> > 
> > It's on our road-map because without SOCKMAP support for UDP we won't be
> > able to move away from TPROXY [1] and custom SO_BINDTOPREFIX extension
> > [2] for steering new UDP flows to receiving sockets. Also we would like
> > to look into using SOCKMAP for connected UDP socket splicing in the
> > future [3].
> > 
> > I was planning to split work as follows:
> > 
> > 1. SOCKMAP support for listening sockets (this series)
> > 2. programmable socket lookup for TCP (cut-down version of [4])
> > 3. SOCKMAP support for UDP (work not started)
> hmm...It is hard to comment how the full UDP sockmap may
> work out without a code attempt because I am not fluent in
> sock_map ;)
> 
> From a quick look, it seems there are quite a few things to do.
> For example, the TCP_SKB_CB(skb) usage and how that may look
> like in UDP.  "struct udp_skb_cb" is 28 bytes while "struct napi_gro_cb"
> seems to be 48 bytes already which may need a closer look.

The extra bits sockmap needs are used for redirecting between
between sockets. These will fit in the udp cb area with some
extra room to spare. If that is paticularly challenging we can
also create a program attach type which would preclude using
those bits in the sk_reuseport bpf program types. We already
have types for rx, tx, nop progs, so one more should be fine.

So at least that paticular concern is not difficult to fix.

> 
> > 4. programmable socket lookup for UDP (rest of [4])
> > 
> > I'm open to suggestions on how to organize it.
> > 
> > >> Having said that, the main intention here is to lay groundwork for using
> > >> SOCKMAP in the next iteration of programmable socket lookup patches.
> > > What may be the minimal to get only lookup work for UDP sockmap?
> > > .close() and .unhash()?
> > 
> > John would know better. I haven't tried doing it yet.
> > 
> > From just reading the code - override the two proto ops you mentioned,
> > close and unhash, and adapt the socket checks in SOCKMAP.
> Do your use cases need bpf prog attached to sock_map?

Perhaps not specifically sock_map but they do need to be consolidated
into a map somewhere IMO this has proven to be the most versatile. We
can add sockets from the various BPF hooks or from user space and have
the ability to use the existing map tools, etc.

> 
> If not, would it be cleaner to delicate another map_type
> for lookup-only use case to have both TCP and UDP support.

But we (Cilium project and above splicing use case is also interested)
will need UDP support so it will be supported regardless of the
SK_REUSEPORT_BPF so I think it makes sense to consolidate all these
use cases on to the existing sockmap.

Also sockmap supports inserting sockets from BPF and from userspace
which actually requires a bit of logic to track state, etc. Its been
in use and been beat on by various automated test tools so I think
at minimum this needs to be reused. Re-implementing this logic seems
a waste of time and it wasn't exactly trivial and took some work.

Being able to insert the sockets from XDP (support coming soon) and
from sock_ops programs turns out to be fairly powerful.

So in short I think it makes most sense to consolidate on sock_map
because

  (a) we need and will add udp support regardless,
  (b) we already handle the tricky parts inerting/removing live sockets
  (c) from this series it looks like its fairly straight forward
  (d) we get lots of shared code

Thanks,
John


> 
> > 
> > -Jakub
> > 
> > [1] https://urldefense.proofpoint.com/v2/url?u=https-3A__blog.cloudflare.com_how-2Dwe-2Dbuilt-2Dspectrum_&d=DwIBAg&c=5VD0RTtNlTh3ycd41b3MUw&r=VQnoQ7LvghIj0gVEaiQSUw&m=lSo-FsOeNl_8znZZ07H8I6ZYAinPKTR5C3Cn_Ol3QYQ&s=DZgW8-2Xl1P8NU59ji4ieQLzwWpx4t3gGq_tqB0l3Bo&e= 
> > [2] https://lore.kernel.org/netdev/1458699966-3752-1-git-send-email-gilberto.bertin@gmail.com/
> > [3] https://lore.kernel.org/bpf/20190828072250.29828-1-jakub@cloudflare.com/
> > [4] https://urldefense.proofpoint.com/v2/url?u=https-3A__blog.cloudflare.com_sockmap-2Dtcp-2Dsplicing-2Dof-2Dthe-2Dfuture_&d=DwIBAg&c=5VD0RTtNlTh3ycd41b3MUw&r=VQnoQ7LvghIj0gVEaiQSUw&m=lSo-FsOeNl_8znZZ07H8I6ZYAinPKTR5C3Cn_Ol3QYQ&s=NerUqb4j7IsGBTcni6Yxk40wf6kTkckHXn3Nx5i4mCU&e=
Martin KaFai Lau Oct. 28, 2019, 9:38 p.m. UTC | #10
On Mon, Oct 28, 2019 at 02:05:24PM -0700, John Fastabend wrote:
> Martin Lau wrote:
> > On Mon, Oct 28, 2019 at 01:35:26PM +0100, Jakub Sitnicki wrote:
> > > On Mon, Oct 28, 2019 at 06:52 AM CET, Martin Lau wrote:
> > > > On Tue, Oct 22, 2019 at 01:37:25PM +0200, Jakub Sitnicki wrote:
> > > >> This patch set is a follow up on a suggestion from LPC '19 discussions to
> > > >> make SOCKMAP (or a new map type derived from it) a generic type for storing
> > > >> established as well as listening sockets.
> > > >>
> > > >> We found ourselves in need of a map type that keeps references to listening
> > > >> sockets when working on making the socket lookup programmable, aka BPF
> > > >> inet_lookup [1].  Initially we repurposed REUSEPORT_SOCKARRAY but found it
> > > >> problematic to extend due to being tightly coupled with reuseport
> > > >> logic (see slides [2]).
> > > >> So we've turned our attention to SOCKMAP instead.
> > > >>
> > > >> As it turns out the changes needed to make SOCKMAP suitable for storing
> > > >> listening sockets are self-contained and have use outside of programming
> > > >> the socket lookup. Hence this patch set.
> > > >>
> > > >> With these patches SOCKMAP can be used in SK_REUSEPORT BPF programs as a
> > > >> drop-in replacement for REUSEPORT_SOCKARRAY for TCP. This can hopefully
> > > >> lead to code consolidation between the two map types in the future.
> > > > What is the plan for UDP support in sockmap?
> > > 
> > > It's on our road-map because without SOCKMAP support for UDP we won't be
> > > able to move away from TPROXY [1] and custom SO_BINDTOPREFIX extension
> > > [2] for steering new UDP flows to receiving sockets. Also we would like
> > > to look into using SOCKMAP for connected UDP socket splicing in the
> > > future [3].
> > > 
> > > I was planning to split work as follows:
> > > 
> > > 1. SOCKMAP support for listening sockets (this series)
> > > 2. programmable socket lookup for TCP (cut-down version of [4])
> > > 3. SOCKMAP support for UDP (work not started)
> > hmm...It is hard to comment how the full UDP sockmap may
> > work out without a code attempt because I am not fluent in
> > sock_map ;)
> > 
> > From a quick look, it seems there are quite a few things to do.
> > For example, the TCP_SKB_CB(skb) usage and how that may look
> > like in UDP.  "struct udp_skb_cb" is 28 bytes while "struct napi_gro_cb"
> > seems to be 48 bytes already which may need a closer look.
> 
> The extra bits sockmap needs are used for redirecting between
> between sockets. These will fit in the udp cb area with some
> extra room to spare. If that is paticularly challenging we can
> also create a program attach type which would preclude using
> those bits in the sk_reuseport bpf program types. We already
> have types for rx, tx, nop progs, so one more should be fine.
> 
> So at least that paticular concern is not difficult to fix.
> 
> > 
> > > 4. programmable socket lookup for UDP (rest of [4])
> > > 
> > > I'm open to suggestions on how to organize it.
> > > 
> > > >> Having said that, the main intention here is to lay groundwork for using
> > > >> SOCKMAP in the next iteration of programmable socket lookup patches.
> > > > What may be the minimal to get only lookup work for UDP sockmap?
> > > > .close() and .unhash()?
> > > 
> > > John would know better. I haven't tried doing it yet.
> > > 
> > > From just reading the code - override the two proto ops you mentioned,
> > > close and unhash, and adapt the socket checks in SOCKMAP.
> > Do your use cases need bpf prog attached to sock_map?
> 
> Perhaps not specifically sock_map but they do need to be consolidated
> into a map somewhere IMO this has proven to be the most versatile. We
> can add sockets from the various BPF hooks or from user space and have
> the ability to use the existing map tools, etc.
> 
> > 
> > If not, would it be cleaner to delicate another map_type
> > for lookup-only use case to have both TCP and UDP support.
> 
> But we (Cilium project and above splicing use case is also interested)
> will need UDP support so it will be supported regardless of the
> SK_REUSEPORT_BPF so I think it makes sense to consolidate all these
> use cases on to the existing sockmap.
> 
> Also sockmap supports inserting sockets from BPF and from userspace
> which actually requires a bit of logic to track state, etc. Its been
> in use and been beat on by various automated test tools so I think
> at minimum this needs to be reused. Re-implementing this logic seems
> a waste of time and it wasn't exactly trivial and took some work.
> 
> Being able to insert the sockets from XDP (support coming soon) and
> from sock_ops programs turns out to be fairly powerful.
> 
> So in short I think it makes most sense to consolidate on sock_map
> because
> 
>   (a) we need and will add udp support regardless,

>   (b) we already handle the tricky parts inerting/removing live sockets
I didn't mean not to reuse the existing sockmap logic on tracking
socks life-time.  I was exploring options if the first step for UDP
could be lookup-only support first.

It is always better to get full UDP support ;)
It seems to be confident also, then there is little reason not to do
so in UDP sockmap support v1.

>   (c) from this series it looks like its fairly straight forward
>   (d) we get lots of shared code
> 
> Thanks,
> John
> 
> 
> > 
> > > 
> > > -Jakub
> > > 
> > > [1] https://urldefense.proofpoint.com/v2/url?u=https-3A__blog.cloudflare.com_how-2Dwe-2Dbuilt-2Dspectrum_&d=DwIBAg&c=5VD0RTtNlTh3ycd41b3MUw&r=VQnoQ7LvghIj0gVEaiQSUw&m=lSo-FsOeNl_8znZZ07H8I6ZYAinPKTR5C3Cn_Ol3QYQ&s=DZgW8-2Xl1P8NU59ji4ieQLzwWpx4t3gGq_tqB0l3Bo&e= 
> > > [2] https://lore.kernel.org/netdev/1458699966-3752-1-git-send-email-gilberto.bertin@gmail.com/
> > > [3] https://lore.kernel.org/bpf/20190828072250.29828-1-jakub@cloudflare.com/
> > > [4] https://urldefense.proofpoint.com/v2/url?u=https-3A__blog.cloudflare.com_sockmap-2Dtcp-2Dsplicing-2Dof-2Dthe-2Dfuture_&d=DwIBAg&c=5VD0RTtNlTh3ycd41b3MUw&r=VQnoQ7LvghIj0gVEaiQSUw&m=lSo-FsOeNl_8znZZ07H8I6ZYAinPKTR5C3Cn_Ol3QYQ&s=NerUqb4j7IsGBTcni6Yxk40wf6kTkckHXn3Nx5i4mCU&e= 
> 
>
Jakub Sitnicki Oct. 29, 2019, 8:52 a.m. UTC | #11
On Mon, Oct 28, 2019 at 10:38 PM CET, Martin Lau wrote:
> It is always better to get full UDP support ;)
> It seems to be confident also, then there is little reason not to do
> so in UDP sockmap support v1.

Let me give it a shot :-)
Jakub Sitnicki Oct. 29, 2019, 8:56 a.m. UTC | #12
On Mon, Oct 28, 2019 at 08:04 PM CET, John Fastabend wrote:
> Jakub Sitnicki wrote:
>> I was planning to split work as follows:
>>
>> 1. SOCKMAP support for listening sockets (this series)
>> 2. programmable socket lookup for TCP (cut-down version of [4])
>> 3. SOCKMAP support for UDP (work not started)
>> 4. programmable socket lookup for UDP (rest of [4])
>>
>> I'm open to suggestions on how to organize it.
>
> Looks good to me. I've had UDP support on my todo list for awhile now
> but it hasn't got to the top yet so glad to see this.
>
> Also perhaps not necessary for your work but I have some patches on my
> stack I'll try to get out soon to get ktls + receive hooks working.

Thanks for the heads-up. If you have a dev branch somewhere, I'm curious
to see what's cooking. Otherwise I'll keep an eye out for your patches.