Message ID | 20191022113730.29303-1-jakub@cloudflare.com |
---|---|
Headers | show |
Series | Extend SOCKMAP to store listening sockets | expand |
On Tue, Oct 22, 2019 at 01:37:25PM +0200, Jakub Sitnicki wrote: > This patch set is a follow up on a suggestion from LPC '19 discussions to > make SOCKMAP (or a new map type derived from it) a generic type for storing > established as well as listening sockets. > > We found ourselves in need of a map type that keeps references to listening > sockets when working on making the socket lookup programmable, aka BPF > inet_lookup [1]. Initially we repurposed REUSEPORT_SOCKARRAY but found it > problematic to extend due to being tightly coupled with reuseport > logic (see slides [2]). So we've turned our attention to SOCKMAP instead. > > As it turns out the changes needed to make SOCKMAP suitable for storing > listening sockets are self-contained and have use outside of programming > the socket lookup. Hence this patch set. > > With these patches SOCKMAP can be used in SK_REUSEPORT BPF programs as a > drop-in replacement for REUSEPORT_SOCKARRAY for TCP. This can hopefully > lead to code consolidation between the two map types in the future. > > Having said that, the main intention here is to lay groundwork for using > SOCKMAP in the next iteration of programmable socket lookup patches. > > I'm looking for feedback if there's anything fundamentally wrong with > extending SOCKMAP map type like this that I might have missed. John, Martin, comments?
Jakub Sitnicki wrote: > This patch set is a follow up on a suggestion from LPC '19 discussions to > make SOCKMAP (or a new map type derived from it) a generic type for storing > established as well as listening sockets. > > We found ourselves in need of a map type that keeps references to listening > sockets when working on making the socket lookup programmable, aka BPF > inet_lookup [1]. Initially we repurposed REUSEPORT_SOCKARRAY but found it > problematic to extend due to being tightly coupled with reuseport > logic (see slides [2]). So we've turned our attention to SOCKMAP instead. > > As it turns out the changes needed to make SOCKMAP suitable for storing > listening sockets are self-contained and have use outside of programming > the socket lookup. Hence this patch set. > > With these patches SOCKMAP can be used in SK_REUSEPORT BPF programs as a > drop-in replacement for REUSEPORT_SOCKARRAY for TCP. This can hopefully > lead to code consolidation between the two map types in the future. > > Having said that, the main intention here is to lay groundwork for using > SOCKMAP in the next iteration of programmable socket lookup patches. > > I'm looking for feedback if there's anything fundamentally wrong with > extending SOCKMAP map type like this that I might have missed. I think this looks good. The main reason I blocked it off before is mostly because I had no use-case for it and the complication with what to do with child sockets. Clearing the psock state seems OK to me if user wants to add it back to a map they can simply grab it again from a sockops event. By the way I would eventually like to see the lookup hook return the correct type (PTR_TO_SOCKET_OR_NULL) so that the verifier "knows" the type and the socket can be used the same as if it was pulled from a sk_lookup helper. > > Thanks, > Jakub > > [1] https://lore.kernel.org/bpf/20190828072250.29828-1-jakub@cloudflare.com/ > [2] https://linuxplumbersconf.org/event/4/contributions/487/attachments/238/417/Programmable_socket_lookup_LPC_19.pdf >
On Thu, Oct 24, 2019 at 06:56 PM CEST, John Fastabend wrote: > Jakub Sitnicki wrote: [...] >> I'm looking for feedback if there's anything fundamentally wrong with >> extending SOCKMAP map type like this that I might have missed. > > I think this looks good. The main reason I blocked it off before is mostly > because I had no use-case for it and the complication with what to do with > child sockets. Clearing the psock state seems OK to me if user wants to > add it back to a map they can simply grab it again from a sockops > event. Thanks for taking a look at the code. > By the way I would eventually like to see the lookup hook return the > correct type (PTR_TO_SOCKET_OR_NULL) so that the verifier "knows" the type > and the socket can be used the same as if it was pulled from a sk_lookup > helper. Wait... you had me scratching my head there for a minute. I haven't whitelisted bpf_map_lookup_elem for SOCKMAP in check_map_func_compatibility so verifier won't allow lookups from BPF. If we wanted to do that, I don't actually have a use-case for it, I think would have to extend get_func_proto for SK_SKB and SK_REUSEPORT prog types. At least that's what docs for bpf_map_lookup_elem suggest: /* If kernel subsystem is allowing eBPF programs to call this function, * inside its own verifier_ops->get_func_proto() callback it should return * bpf_map_lookup_elem_proto, so that verifier can properly check the arguments * * Different map implementations will rely on rcu in map methods * lookup/update/delete, therefore eBPF programs must run under rcu lock * if program is allowed to access maps, so check rcu_read_lock_held in * all three functions. */ BPF_CALL_2(bpf_map_lookup_elem, struct bpf_map *, map, void *, key) { WARN_ON_ONCE(!rcu_read_lock_held()); return (unsigned long) map->ops->map_lookup_elem(map, key); } -Jakub
Jakub Sitnicki wrote: > On Thu, Oct 24, 2019 at 06:56 PM CEST, John Fastabend wrote: > > Jakub Sitnicki wrote: > > [...] > > >> I'm looking for feedback if there's anything fundamentally wrong with > >> extending SOCKMAP map type like this that I might have missed. > > > > I think this looks good. The main reason I blocked it off before is mostly > > because I had no use-case for it and the complication with what to do with > > child sockets. Clearing the psock state seems OK to me if user wants to > > add it back to a map they can simply grab it again from a sockops > > event. > > Thanks for taking a look at the code. > > > By the way I would eventually like to see the lookup hook return the > > correct type (PTR_TO_SOCKET_OR_NULL) so that the verifier "knows" the type > > and the socket can be used the same as if it was pulled from a sk_lookup > > helper. > > Wait... you had me scratching my head there for a minute. > > I haven't whitelisted bpf_map_lookup_elem for SOCKMAP in > check_map_func_compatibility so verifier won't allow lookups from BPF. > > If we wanted to do that, I don't actually have a use-case for it, I > think would have to extend get_func_proto for SK_SKB and SK_REUSEPORT > prog types. At least that's what docs for bpf_map_lookup_elem suggest: Right, so its not required for your series just letting you know I will probably look to do this shortly. It would be useful for some use cases we have. > > /* If kernel subsystem is allowing eBPF programs to call this function, > * inside its own verifier_ops->get_func_proto() callback it should return > * bpf_map_lookup_elem_proto, so that verifier can properly check the arguments > * > * Different map implementations will rely on rcu in map methods > * lookup/update/delete, therefore eBPF programs must run under rcu lock > * if program is allowed to access maps, so check rcu_read_lock_held in > * all three functions. > */ > BPF_CALL_2(bpf_map_lookup_elem, struct bpf_map *, map, void *, key) > { > WARN_ON_ONCE(!rcu_read_lock_held()); > return (unsigned long) map->ops->map_lookup_elem(map, key); > } > > -Jakub
On Tue, Oct 22, 2019 at 01:37:25PM +0200, Jakub Sitnicki wrote: > This patch set is a follow up on a suggestion from LPC '19 discussions to > make SOCKMAP (or a new map type derived from it) a generic type for storing > established as well as listening sockets. > > We found ourselves in need of a map type that keeps references to listening > sockets when working on making the socket lookup programmable, aka BPF > inet_lookup [1]. Initially we repurposed REUSEPORT_SOCKARRAY but found it > problematic to extend due to being tightly coupled with reuseport > logic (see slides [2]). > So we've turned our attention to SOCKMAP instead. > > As it turns out the changes needed to make SOCKMAP suitable for storing > listening sockets are self-contained and have use outside of programming > the socket lookup. Hence this patch set. > > With these patches SOCKMAP can be used in SK_REUSEPORT BPF programs as a > drop-in replacement for REUSEPORT_SOCKARRAY for TCP. This can hopefully > lead to code consolidation between the two map types in the future. What is the plan for UDP support in sockmap? > > Having said that, the main intention here is to lay groundwork for using > SOCKMAP in the next iteration of programmable socket lookup patches. What may be the minimal to get only lookup work for UDP sockmap? .close() and .unhash()? > > I'm looking for feedback if there's anything fundamentally wrong with > extending SOCKMAP map type like this that I might have missed. > > Thanks, > Jakub > > [1] https://lore.kernel.org/bpf/20190828072250.29828-1-jakub@cloudflare.com/ > [2] https://urldefense.proofpoint.com/v2/url?u=https-3A__linuxplumbersconf.org_event_4_contributions_487_attachments_238_417_Programmable-5Fsocket-5Flookup-5FLPC-5F19.pdf&d=DwIDAg&c=5VD0RTtNlTh3ycd41b3MUw&r=VQnoQ7LvghIj0gVEaiQSUw&m=Y-Ap1QuRBsqsu8gATb1wH3rPT89No2mam2qINt1BGDI&s=_sfXVfJhB2eR7znE7-WBk660dQXIBxuDLRi7jvXVpsg&e= > > > Jakub Sitnicki (5): > bpf, sockmap: Let BPF helpers use lookup operation on SOCKMAP > bpf, sockmap: Allow inserting listening TCP sockets into SOCKMAP > bpf, sockmap: Don't let child socket inherit psock or its ops on copy > bpf: Allow selecting reuseport socket from a SOCKMAP > selftests/bpf: Extend SK_REUSEPORT tests to cover SOCKMAP > > kernel/bpf/verifier.c | 6 +- > net/core/sock_map.c | 11 +- > net/ipv4/tcp_bpf.c | 30 ++++ > tools/testing/selftests/bpf/Makefile | 7 +- > .../selftests/bpf/test_select_reuseport.c | 141 ++++++++++++++---- > .../selftests/bpf/test_select_reuseport.sh | 14 ++ > 6 files changed, 173 insertions(+), 36 deletions(-) > create mode 100755 tools/testing/selftests/bpf/test_select_reuseport.sh > > -- > 2.20.1 >
On Mon, Oct 28, 2019 at 06:52 AM CET, Martin Lau wrote: > On Tue, Oct 22, 2019 at 01:37:25PM +0200, Jakub Sitnicki wrote: >> This patch set is a follow up on a suggestion from LPC '19 discussions to >> make SOCKMAP (or a new map type derived from it) a generic type for storing >> established as well as listening sockets. >> >> We found ourselves in need of a map type that keeps references to listening >> sockets when working on making the socket lookup programmable, aka BPF >> inet_lookup [1]. Initially we repurposed REUSEPORT_SOCKARRAY but found it >> problematic to extend due to being tightly coupled with reuseport >> logic (see slides [2]). >> So we've turned our attention to SOCKMAP instead. >> >> As it turns out the changes needed to make SOCKMAP suitable for storing >> listening sockets are self-contained and have use outside of programming >> the socket lookup. Hence this patch set. >> >> With these patches SOCKMAP can be used in SK_REUSEPORT BPF programs as a >> drop-in replacement for REUSEPORT_SOCKARRAY for TCP. This can hopefully >> lead to code consolidation between the two map types in the future. > What is the plan for UDP support in sockmap? It's on our road-map because without SOCKMAP support for UDP we won't be able to move away from TPROXY [1] and custom SO_BINDTOPREFIX extension [2] for steering new UDP flows to receiving sockets. Also we would like to look into using SOCKMAP for connected UDP socket splicing in the future [3]. I was planning to split work as follows: 1. SOCKMAP support for listening sockets (this series) 2. programmable socket lookup for TCP (cut-down version of [4]) 3. SOCKMAP support for UDP (work not started) 4. programmable socket lookup for UDP (rest of [4]) I'm open to suggestions on how to organize it. >> Having said that, the main intention here is to lay groundwork for using >> SOCKMAP in the next iteration of programmable socket lookup patches. > What may be the minimal to get only lookup work for UDP sockmap? > .close() and .unhash()? John would know better. I haven't tried doing it yet. From just reading the code - override the two proto ops you mentioned, close and unhash, and adapt the socket checks in SOCKMAP. -Jakub [1] https://blog.cloudflare.com/how-we-built-spectrum/ [2] https://lore.kernel.org/netdev/1458699966-3752-1-git-send-email-gilberto.bertin@gmail.com/ [3] https://lore.kernel.org/bpf/20190828072250.29828-1-jakub@cloudflare.com/ [4] https://blog.cloudflare.com/sockmap-tcp-splicing-of-the-future/
Jakub Sitnicki wrote: > On Mon, Oct 28, 2019 at 06:52 AM CET, Martin Lau wrote: > > On Tue, Oct 22, 2019 at 01:37:25PM +0200, Jakub Sitnicki wrote: > >> This patch set is a follow up on a suggestion from LPC '19 discussions to > >> make SOCKMAP (or a new map type derived from it) a generic type for storing > >> established as well as listening sockets. > >> > >> We found ourselves in need of a map type that keeps references to listening > >> sockets when working on making the socket lookup programmable, aka BPF > >> inet_lookup [1]. Initially we repurposed REUSEPORT_SOCKARRAY but found it > >> problematic to extend due to being tightly coupled with reuseport > >> logic (see slides [2]). > >> So we've turned our attention to SOCKMAP instead. > >> > >> As it turns out the changes needed to make SOCKMAP suitable for storing > >> listening sockets are self-contained and have use outside of programming > >> the socket lookup. Hence this patch set. > >> > >> With these patches SOCKMAP can be used in SK_REUSEPORT BPF programs as a > >> drop-in replacement for REUSEPORT_SOCKARRAY for TCP. This can hopefully > >> lead to code consolidation between the two map types in the future. > > What is the plan for UDP support in sockmap? > > It's on our road-map because without SOCKMAP support for UDP we won't be > able to move away from TPROXY [1] and custom SO_BINDTOPREFIX extension > [2] for steering new UDP flows to receiving sockets. Also we would like > to look into using SOCKMAP for connected UDP socket splicing in the > future [3]. > > I was planning to split work as follows: > > 1. SOCKMAP support for listening sockets (this series) > 2. programmable socket lookup for TCP (cut-down version of [4]) > 3. SOCKMAP support for UDP (work not started) > 4. programmable socket lookup for UDP (rest of [4]) > > I'm open to suggestions on how to organize it. Looks good to me. I've had UDP support on my todo list for awhile now but it hasn't got to the top yet so glad to see this. Also perhaps not necessary for your work but I have some patches on my stack I'll try to get out soon to get ktls + receive hooks working. > > >> Having said that, the main intention here is to lay groundwork for using > >> SOCKMAP in the next iteration of programmable socket lookup patches. > > What may be the minimal to get only lookup work for UDP sockmap? > > .close() and .unhash()? > > John would know better. I haven't tried doing it yet. Right, I don't think its too complicated we just need the hooks and then to be sure the socket state checks are OK. Having listening support should help with the UDP case. > > From just reading the code - override the two proto ops you mentioned, > close and unhash, and adapt the socket checks in SOCKMAP. +1. > > -Jakub > > [1] https://blog.cloudflare.com/how-we-built-spectrum/ > [2] https://lore.kernel.org/netdev/1458699966-3752-1-git-send-email-gilberto.bertin@gmail.com/ > [3] https://lore.kernel.org/bpf/20190828072250.29828-1-jakub@cloudflare.com/ > [4] https://blog.cloudflare.com/sockmap-tcp-splicing-of-the-future/
On Mon, Oct 28, 2019 at 01:35:26PM +0100, Jakub Sitnicki wrote: > On Mon, Oct 28, 2019 at 06:52 AM CET, Martin Lau wrote: > > On Tue, Oct 22, 2019 at 01:37:25PM +0200, Jakub Sitnicki wrote: > >> This patch set is a follow up on a suggestion from LPC '19 discussions to > >> make SOCKMAP (or a new map type derived from it) a generic type for storing > >> established as well as listening sockets. > >> > >> We found ourselves in need of a map type that keeps references to listening > >> sockets when working on making the socket lookup programmable, aka BPF > >> inet_lookup [1]. Initially we repurposed REUSEPORT_SOCKARRAY but found it > >> problematic to extend due to being tightly coupled with reuseport > >> logic (see slides [2]). > >> So we've turned our attention to SOCKMAP instead. > >> > >> As it turns out the changes needed to make SOCKMAP suitable for storing > >> listening sockets are self-contained and have use outside of programming > >> the socket lookup. Hence this patch set. > >> > >> With these patches SOCKMAP can be used in SK_REUSEPORT BPF programs as a > >> drop-in replacement for REUSEPORT_SOCKARRAY for TCP. This can hopefully > >> lead to code consolidation between the two map types in the future. > > What is the plan for UDP support in sockmap? > > It's on our road-map because without SOCKMAP support for UDP we won't be > able to move away from TPROXY [1] and custom SO_BINDTOPREFIX extension > [2] for steering new UDP flows to receiving sockets. Also we would like > to look into using SOCKMAP for connected UDP socket splicing in the > future [3]. > > I was planning to split work as follows: > > 1. SOCKMAP support for listening sockets (this series) > 2. programmable socket lookup for TCP (cut-down version of [4]) > 3. SOCKMAP support for UDP (work not started) hmm...It is hard to comment how the full UDP sockmap may work out without a code attempt because I am not fluent in sock_map ;) From a quick look, it seems there are quite a few things to do. For example, the TCP_SKB_CB(skb) usage and how that may look like in UDP. "struct udp_skb_cb" is 28 bytes while "struct napi_gro_cb" seems to be 48 bytes already which may need a closer look. > 4. programmable socket lookup for UDP (rest of [4]) > > I'm open to suggestions on how to organize it. > > >> Having said that, the main intention here is to lay groundwork for using > >> SOCKMAP in the next iteration of programmable socket lookup patches. > > What may be the minimal to get only lookup work for UDP sockmap? > > .close() and .unhash()? > > John would know better. I haven't tried doing it yet. > > From just reading the code - override the two proto ops you mentioned, > close and unhash, and adapt the socket checks in SOCKMAP. Do your use cases need bpf prog attached to sock_map? If not, would it be cleaner to delicate another map_type for lookup-only use case to have both TCP and UDP support. > > -Jakub > > [1] https://urldefense.proofpoint.com/v2/url?u=https-3A__blog.cloudflare.com_how-2Dwe-2Dbuilt-2Dspectrum_&d=DwIBAg&c=5VD0RTtNlTh3ycd41b3MUw&r=VQnoQ7LvghIj0gVEaiQSUw&m=lSo-FsOeNl_8znZZ07H8I6ZYAinPKTR5C3Cn_Ol3QYQ&s=DZgW8-2Xl1P8NU59ji4ieQLzwWpx4t3gGq_tqB0l3Bo&e= > [2] https://lore.kernel.org/netdev/1458699966-3752-1-git-send-email-gilberto.bertin@gmail.com/ > [3] https://lore.kernel.org/bpf/20190828072250.29828-1-jakub@cloudflare.com/ > [4] https://urldefense.proofpoint.com/v2/url?u=https-3A__blog.cloudflare.com_sockmap-2Dtcp-2Dsplicing-2Dof-2Dthe-2Dfuture_&d=DwIBAg&c=5VD0RTtNlTh3ycd41b3MUw&r=VQnoQ7LvghIj0gVEaiQSUw&m=lSo-FsOeNl_8znZZ07H8I6ZYAinPKTR5C3Cn_Ol3QYQ&s=NerUqb4j7IsGBTcni6Yxk40wf6kTkckHXn3Nx5i4mCU&e=
Martin Lau wrote: > On Mon, Oct 28, 2019 at 01:35:26PM +0100, Jakub Sitnicki wrote: > > On Mon, Oct 28, 2019 at 06:52 AM CET, Martin Lau wrote: > > > On Tue, Oct 22, 2019 at 01:37:25PM +0200, Jakub Sitnicki wrote: > > >> This patch set is a follow up on a suggestion from LPC '19 discussions to > > >> make SOCKMAP (or a new map type derived from it) a generic type for storing > > >> established as well as listening sockets. > > >> > > >> We found ourselves in need of a map type that keeps references to listening > > >> sockets when working on making the socket lookup programmable, aka BPF > > >> inet_lookup [1]. Initially we repurposed REUSEPORT_SOCKARRAY but found it > > >> problematic to extend due to being tightly coupled with reuseport > > >> logic (see slides [2]). > > >> So we've turned our attention to SOCKMAP instead. > > >> > > >> As it turns out the changes needed to make SOCKMAP suitable for storing > > >> listening sockets are self-contained and have use outside of programming > > >> the socket lookup. Hence this patch set. > > >> > > >> With these patches SOCKMAP can be used in SK_REUSEPORT BPF programs as a > > >> drop-in replacement for REUSEPORT_SOCKARRAY for TCP. This can hopefully > > >> lead to code consolidation between the two map types in the future. > > > What is the plan for UDP support in sockmap? > > > > It's on our road-map because without SOCKMAP support for UDP we won't be > > able to move away from TPROXY [1] and custom SO_BINDTOPREFIX extension > > [2] for steering new UDP flows to receiving sockets. Also we would like > > to look into using SOCKMAP for connected UDP socket splicing in the > > future [3]. > > > > I was planning to split work as follows: > > > > 1. SOCKMAP support for listening sockets (this series) > > 2. programmable socket lookup for TCP (cut-down version of [4]) > > 3. SOCKMAP support for UDP (work not started) > hmm...It is hard to comment how the full UDP sockmap may > work out without a code attempt because I am not fluent in > sock_map ;) > > From a quick look, it seems there are quite a few things to do. > For example, the TCP_SKB_CB(skb) usage and how that may look > like in UDP. "struct udp_skb_cb" is 28 bytes while "struct napi_gro_cb" > seems to be 48 bytes already which may need a closer look. The extra bits sockmap needs are used for redirecting between between sockets. These will fit in the udp cb area with some extra room to spare. If that is paticularly challenging we can also create a program attach type which would preclude using those bits in the sk_reuseport bpf program types. We already have types for rx, tx, nop progs, so one more should be fine. So at least that paticular concern is not difficult to fix. > > > 4. programmable socket lookup for UDP (rest of [4]) > > > > I'm open to suggestions on how to organize it. > > > > >> Having said that, the main intention here is to lay groundwork for using > > >> SOCKMAP in the next iteration of programmable socket lookup patches. > > > What may be the minimal to get only lookup work for UDP sockmap? > > > .close() and .unhash()? > > > > John would know better. I haven't tried doing it yet. > > > > From just reading the code - override the two proto ops you mentioned, > > close and unhash, and adapt the socket checks in SOCKMAP. > Do your use cases need bpf prog attached to sock_map? Perhaps not specifically sock_map but they do need to be consolidated into a map somewhere IMO this has proven to be the most versatile. We can add sockets from the various BPF hooks or from user space and have the ability to use the existing map tools, etc. > > If not, would it be cleaner to delicate another map_type > for lookup-only use case to have both TCP and UDP support. But we (Cilium project and above splicing use case is also interested) will need UDP support so it will be supported regardless of the SK_REUSEPORT_BPF so I think it makes sense to consolidate all these use cases on to the existing sockmap. Also sockmap supports inserting sockets from BPF and from userspace which actually requires a bit of logic to track state, etc. Its been in use and been beat on by various automated test tools so I think at minimum this needs to be reused. Re-implementing this logic seems a waste of time and it wasn't exactly trivial and took some work. Being able to insert the sockets from XDP (support coming soon) and from sock_ops programs turns out to be fairly powerful. So in short I think it makes most sense to consolidate on sock_map because (a) we need and will add udp support regardless, (b) we already handle the tricky parts inerting/removing live sockets (c) from this series it looks like its fairly straight forward (d) we get lots of shared code Thanks, John > > > > > -Jakub > > > > [1] https://urldefense.proofpoint.com/v2/url?u=https-3A__blog.cloudflare.com_how-2Dwe-2Dbuilt-2Dspectrum_&d=DwIBAg&c=5VD0RTtNlTh3ycd41b3MUw&r=VQnoQ7LvghIj0gVEaiQSUw&m=lSo-FsOeNl_8znZZ07H8I6ZYAinPKTR5C3Cn_Ol3QYQ&s=DZgW8-2Xl1P8NU59ji4ieQLzwWpx4t3gGq_tqB0l3Bo&e= > > [2] https://lore.kernel.org/netdev/1458699966-3752-1-git-send-email-gilberto.bertin@gmail.com/ > > [3] https://lore.kernel.org/bpf/20190828072250.29828-1-jakub@cloudflare.com/ > > [4] https://urldefense.proofpoint.com/v2/url?u=https-3A__blog.cloudflare.com_sockmap-2Dtcp-2Dsplicing-2Dof-2Dthe-2Dfuture_&d=DwIBAg&c=5VD0RTtNlTh3ycd41b3MUw&r=VQnoQ7LvghIj0gVEaiQSUw&m=lSo-FsOeNl_8znZZ07H8I6ZYAinPKTR5C3Cn_Ol3QYQ&s=NerUqb4j7IsGBTcni6Yxk40wf6kTkckHXn3Nx5i4mCU&e=
On Mon, Oct 28, 2019 at 02:05:24PM -0700, John Fastabend wrote: > Martin Lau wrote: > > On Mon, Oct 28, 2019 at 01:35:26PM +0100, Jakub Sitnicki wrote: > > > On Mon, Oct 28, 2019 at 06:52 AM CET, Martin Lau wrote: > > > > On Tue, Oct 22, 2019 at 01:37:25PM +0200, Jakub Sitnicki wrote: > > > >> This patch set is a follow up on a suggestion from LPC '19 discussions to > > > >> make SOCKMAP (or a new map type derived from it) a generic type for storing > > > >> established as well as listening sockets. > > > >> > > > >> We found ourselves in need of a map type that keeps references to listening > > > >> sockets when working on making the socket lookup programmable, aka BPF > > > >> inet_lookup [1]. Initially we repurposed REUSEPORT_SOCKARRAY but found it > > > >> problematic to extend due to being tightly coupled with reuseport > > > >> logic (see slides [2]). > > > >> So we've turned our attention to SOCKMAP instead. > > > >> > > > >> As it turns out the changes needed to make SOCKMAP suitable for storing > > > >> listening sockets are self-contained and have use outside of programming > > > >> the socket lookup. Hence this patch set. > > > >> > > > >> With these patches SOCKMAP can be used in SK_REUSEPORT BPF programs as a > > > >> drop-in replacement for REUSEPORT_SOCKARRAY for TCP. This can hopefully > > > >> lead to code consolidation between the two map types in the future. > > > > What is the plan for UDP support in sockmap? > > > > > > It's on our road-map because without SOCKMAP support for UDP we won't be > > > able to move away from TPROXY [1] and custom SO_BINDTOPREFIX extension > > > [2] for steering new UDP flows to receiving sockets. Also we would like > > > to look into using SOCKMAP for connected UDP socket splicing in the > > > future [3]. > > > > > > I was planning to split work as follows: > > > > > > 1. SOCKMAP support for listening sockets (this series) > > > 2. programmable socket lookup for TCP (cut-down version of [4]) > > > 3. SOCKMAP support for UDP (work not started) > > hmm...It is hard to comment how the full UDP sockmap may > > work out without a code attempt because I am not fluent in > > sock_map ;) > > > > From a quick look, it seems there are quite a few things to do. > > For example, the TCP_SKB_CB(skb) usage and how that may look > > like in UDP. "struct udp_skb_cb" is 28 bytes while "struct napi_gro_cb" > > seems to be 48 bytes already which may need a closer look. > > The extra bits sockmap needs are used for redirecting between > between sockets. These will fit in the udp cb area with some > extra room to spare. If that is paticularly challenging we can > also create a program attach type which would preclude using > those bits in the sk_reuseport bpf program types. We already > have types for rx, tx, nop progs, so one more should be fine. > > So at least that paticular concern is not difficult to fix. > > > > > > 4. programmable socket lookup for UDP (rest of [4]) > > > > > > I'm open to suggestions on how to organize it. > > > > > > >> Having said that, the main intention here is to lay groundwork for using > > > >> SOCKMAP in the next iteration of programmable socket lookup patches. > > > > What may be the minimal to get only lookup work for UDP sockmap? > > > > .close() and .unhash()? > > > > > > John would know better. I haven't tried doing it yet. > > > > > > From just reading the code - override the two proto ops you mentioned, > > > close and unhash, and adapt the socket checks in SOCKMAP. > > Do your use cases need bpf prog attached to sock_map? > > Perhaps not specifically sock_map but they do need to be consolidated > into a map somewhere IMO this has proven to be the most versatile. We > can add sockets from the various BPF hooks or from user space and have > the ability to use the existing map tools, etc. > > > > > If not, would it be cleaner to delicate another map_type > > for lookup-only use case to have both TCP and UDP support. > > But we (Cilium project and above splicing use case is also interested) > will need UDP support so it will be supported regardless of the > SK_REUSEPORT_BPF so I think it makes sense to consolidate all these > use cases on to the existing sockmap. > > Also sockmap supports inserting sockets from BPF and from userspace > which actually requires a bit of logic to track state, etc. Its been > in use and been beat on by various automated test tools so I think > at minimum this needs to be reused. Re-implementing this logic seems > a waste of time and it wasn't exactly trivial and took some work. > > Being able to insert the sockets from XDP (support coming soon) and > from sock_ops programs turns out to be fairly powerful. > > So in short I think it makes most sense to consolidate on sock_map > because > > (a) we need and will add udp support regardless, > (b) we already handle the tricky parts inerting/removing live sockets I didn't mean not to reuse the existing sockmap logic on tracking socks life-time. I was exploring options if the first step for UDP could be lookup-only support first. It is always better to get full UDP support ;) It seems to be confident also, then there is little reason not to do so in UDP sockmap support v1. > (c) from this series it looks like its fairly straight forward > (d) we get lots of shared code > > Thanks, > John > > > > > > > > > > -Jakub > > > > > > [1] https://urldefense.proofpoint.com/v2/url?u=https-3A__blog.cloudflare.com_how-2Dwe-2Dbuilt-2Dspectrum_&d=DwIBAg&c=5VD0RTtNlTh3ycd41b3MUw&r=VQnoQ7LvghIj0gVEaiQSUw&m=lSo-FsOeNl_8znZZ07H8I6ZYAinPKTR5C3Cn_Ol3QYQ&s=DZgW8-2Xl1P8NU59ji4ieQLzwWpx4t3gGq_tqB0l3Bo&e= > > > [2] https://lore.kernel.org/netdev/1458699966-3752-1-git-send-email-gilberto.bertin@gmail.com/ > > > [3] https://lore.kernel.org/bpf/20190828072250.29828-1-jakub@cloudflare.com/ > > > [4] https://urldefense.proofpoint.com/v2/url?u=https-3A__blog.cloudflare.com_sockmap-2Dtcp-2Dsplicing-2Dof-2Dthe-2Dfuture_&d=DwIBAg&c=5VD0RTtNlTh3ycd41b3MUw&r=VQnoQ7LvghIj0gVEaiQSUw&m=lSo-FsOeNl_8znZZ07H8I6ZYAinPKTR5C3Cn_Ol3QYQ&s=NerUqb4j7IsGBTcni6Yxk40wf6kTkckHXn3Nx5i4mCU&e= > >
On Mon, Oct 28, 2019 at 10:38 PM CET, Martin Lau wrote: > It is always better to get full UDP support ;) > It seems to be confident also, then there is little reason not to do > so in UDP sockmap support v1. Let me give it a shot :-)
On Mon, Oct 28, 2019 at 08:04 PM CET, John Fastabend wrote: > Jakub Sitnicki wrote: >> I was planning to split work as follows: >> >> 1. SOCKMAP support for listening sockets (this series) >> 2. programmable socket lookup for TCP (cut-down version of [4]) >> 3. SOCKMAP support for UDP (work not started) >> 4. programmable socket lookup for UDP (rest of [4]) >> >> I'm open to suggestions on how to organize it. > > Looks good to me. I've had UDP support on my todo list for awhile now > but it hasn't got to the top yet so glad to see this. > > Also perhaps not necessary for your work but I have some patches on my > stack I'll try to get out soon to get ktls + receive hooks working. Thanks for the heads-up. If you have a dev branch somewhere, I'm curious to see what's cooking. Otherwise I'll keep an eye out for your patches.