[bpf-next,0/9] Introduce BPF_MAP_TYPE_REUSEPORT_SOCKARRAY and BPF_PROG_TYPE_SK_REUSEPORT

Message ID 20180808075917.3009181-1-kafai@fb.com

Message

Martin KaFai Lau Aug. 8, 2018, 7:59 a.m. UTC
This series introduces a new map type "BPF_MAP_TYPE_REUSEPORT_SOCKARRAY"
and a new prog type BPF_PROG_TYPE_SK_REUSEPORT.

Here is a snippet from a commit message:

"To unleash the full potential of a bpf prog, it is essential for the
userspace to be capable of directly setting up a bpf map which can then
be consumed by the bpf prog to make decisions.  In this case, decide which
SO_REUSEPORT sk to serve the incoming request.

By adding BPF_MAP_TYPE_REUSEPORT_SOCKARRAY, the userspace has total control
and visibility on where a SO_REUSEPORT sk should be located in a bpf map.
The later patch will introduce BPF_PROG_TYPE_SK_REUSEPORT such that
the bpf prog can directly select a sk from the bpf map.  That will
raise the programmability of the bpf prog attached to a reuseport
group (a group of sk serving the same IP:PORT).

For example, in UDP, the bpf prog can peek into the payload (e.g.
through the "data" pointer introduced in the later patch) to learn
the application level's connection information and then decide which sk
to pick from a bpf map.  The userspace can tightly couple the sk's location
in a bpf map with the application logic in generating the UDP payload's
connection information.  This connection info contract/API stays within the
userspace.

Also, when used with map-in-map, the userspace can switch the
old-server-process's inner map to a new-server-process's inner map
in one call "bpf_map_update_elem(outer_map, &index, &new_reuseport_array)".
The bpf prog will then direct incoming requests to the new process instead
of the old process.  The old process can finish draining the pending
requests (e.g. by "accept()") before closing the old-fds.  [Note that
deleting a fd from a bpf map does not necessarily mean the fd is closed]"
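
The payload-driven selection described in the quoted commit message can be
modeled in plain userspace C.  This is only a sketch and not kernel or
libbpf code: every model_* name, the one-byte connection ID at the start of
the payload, and the array size of 4 are assumptions made for illustration.
The data/data_end bounds check mirrors the usual bpf prog pattern of
checking a packet pointer before dereferencing it.

```c
/* Userspace model only: this is NOT kernel or libbpf code. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_SOCKS 4        /* hypothetical array size */
#define MODEL_NO_SOCK   (-1)     /* empty slot, or nothing selected */

/* Stand-in for a reuseport sockarray: index -> socket fd. */
struct model_sockarray {
	int fds[MODEL_MAX_SOCKS];
};

/* Mark every slot as empty. */
void model_sockarray_init(struct model_sockarray *arr)
{
	assert(arr != NULL);
	for (size_t i = 0; i < MODEL_MAX_SOCKS; i++)
		arr->fds[i] = MODEL_NO_SOCK;
}

/* Userspace decides which slot each socket occupies. */
int model_sockarray_update(struct model_sockarray *arr, uint32_t index, int fd)
{
	if (index >= MODEL_MAX_SOCKS)
		return -1;
	arr->fds[index] = fd;
	return 0;
}

/*
 * Stand-in for the selection prog: bounds-check the payload, read an
 * (assumed) one-byte connection ID, and use it as the array index.
 */
int model_select_sock(const struct model_sockarray *arr,
		      const uint8_t *data, const uint8_t *data_end)
{
	uint32_t index;

	if (data + 1 > data_end)
		return MODEL_NO_SOCK;    /* payload too short */

	index = data[0];
	if (index >= MODEL_MAX_SOCKS)
		return MODEL_NO_SOCK;    /* ID outside the array */

	return arr->fds[index];
}
```

Because the same userspace code both fills the slots and generates the
connection ID carried in the payload, the meaning of the ID never has to be
known outside the application.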

Please see the individual patches for details.
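
The map-in-map switch mentioned in the commit message can likewise be
modeled in userspace C.  Again this is a sketch under assumptions, not the
kernel implementation: the model_* names and the fixed sizes are
hypothetical, and the plain pointer store below merely stands in for the
single "bpf_map_update_elem(outer_map, &index, &new_reuseport_array)" call.

```c
/* Userspace model only: this is NOT kernel or libbpf code. */
#include <assert.h>
#include <stddef.h>

#define MODEL_OUTER_SLOTS 2      /* hypothetical outer map size */
#define MODEL_INNER_SLOTS 2      /* hypothetical inner map size */

/* Stand-in for one process's reuseport array: index -> socket fd. */
struct model_inner_map {
	int fds[MODEL_INNER_SLOTS];
};

/* Stand-in for the outer map: each slot refers to one inner map. */
struct model_outer_map {
	const struct model_inner_map *slots[MODEL_OUTER_SLOTS];
};

/* Models the one-call switch of an outer slot to another inner map. */
int model_outer_update(struct model_outer_map *outer, size_t index,
		       const struct model_inner_map *inner)
{
	assert(outer != NULL);
	if (index >= MODEL_OUTER_SLOTS || inner == NULL)
		return -1;
	outer->slots[index] = inner;
	return 0;
}

/* Models the prog side: outer lookup first, then inner lookup. */
int model_lookup_sock(const struct model_outer_map *outer,
		      size_t outer_index, size_t inner_index)
{
	const struct model_inner_map *inner;

	if (outer_index >= MODEL_OUTER_SLOTS)
		return -1;
	inner = outer->slots[outer_index];
	if (inner == NULL || inner_index >= MODEL_INNER_SLOTS)
		return -1;
	return inner->fds[inner_index];
}
```

After the update, lookups through the outer slot resolve to the new
process's sockets, while the old inner map and its fds are left untouched so
the old process can keep draining them.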

Martin KaFai Lau (9):
  tcp: Avoid TCP syncookie rejected by SO_REUSEPORT socket
  net: Add ID (if needed) to sock_reuseport and expose reuseport_lock
  bpf: Introduce BPF_MAP_TYPE_REUSEPORT_SOCKARRAY
  bpf: Introduce BPF_PROG_TYPE_SK_REUSEPORT
  bpf: Enable BPF_PROG_TYPE_SK_REUSEPORT bpf prog in reuseport selection
  bpf: Refactor ARRAY_SIZE macro to bpf_util.h
  bpf: Sync bpf.h uapi to tools/
  bpf: test BPF_MAP_TYPE_REUSEPORT_SOCKARRAY
  bpf: Test BPF_PROG_TYPE_SK_REUSEPORT

 include/linux/bpf.h                           |  28 +
 include/linux/bpf_types.h                     |   6 +
 include/linux/filter.h                        |  16 +
 include/net/addrconf.h                        |   1 +
 include/net/sock_reuseport.h                  |  19 +-
 include/net/tcp.h                             |  30 +-
 include/uapi/linux/bpf.h                      |  37 +-
 kernel/bpf/Makefile                           |   3 +
 kernel/bpf/arraymap.c                         |   2 +-
 kernel/bpf/reuseport_array.c                  | 363 +++++++++
 kernel/bpf/syscall.c                          |   6 +
 kernel/bpf/verifier.c                         |   9 +
 net/core/filter.c                             | 354 ++++++++-
 net/core/sock_reuseport.c                     |  92 ++-
 net/ipv4/inet_connection_sock.c               |   9 +
 net/ipv4/inet_hashtables.c                    |  19 +-
 net/ipv4/udp.c                                |   9 +-
 net/ipv6/inet6_hashtables.c                   |  14 +-
 net/ipv6/udp.c                                |   4 +
 tools/include/uapi/linux/bpf.h                |  37 +-
 tools/lib/bpf/bpf.c                           |   1 +
 tools/lib/bpf/bpf.h                           |   1 +
 tools/lib/bpf/libbpf.c                        |   1 +
 tools/testing/selftests/bpf/Makefile          |   4 +-
 tools/testing/selftests/bpf/bpf_helpers.h     |   4 +
 tools/testing/selftests/bpf/bpf_util.h        |   4 +
 tools/testing/selftests/bpf/test_align.c      |   5 +-
 tools/testing/selftests/bpf/test_btf.c        |   5 +-
 tools/testing/selftests/bpf/test_maps.c       | 262 ++++++-
 .../selftests/bpf/test_select_reuseport.c     | 688 ++++++++++++++++++
 .../bpf/test_select_reuseport_common.h        |  36 +
 .../bpf/test_select_reuseport_kern.c          | 180 +++++
 tools/testing/selftests/bpf/test_sock.c       |   5 +-
 tools/testing/selftests/bpf/test_sock_addr.c  |   5 +-
 tools/testing/selftests/bpf/test_verifier.c   |   5 +-
 35 files changed, 2167 insertions(+), 97 deletions(-)
 create mode 100644 kernel/bpf/reuseport_array.c
 create mode 100644 tools/testing/selftests/bpf/test_select_reuseport.c
 create mode 100644 tools/testing/selftests/bpf/test_select_reuseport_common.h
 create mode 100644 tools/testing/selftests/bpf/test_select_reuseport_kern.c

Comments

Daniel Borkmann Aug. 11, 2018, 12:32 a.m. UTC | #1
On 08/08/2018 09:59 AM, Martin KaFai Lau wrote:
> [...]

Applied to bpf-next, thanks Martin!