[RFCv2,bpf-next,00/12] Programming socket lookup with BPF

Message ID 20190828072250.29828-1-jakub@cloudflare.com

Jakub Sitnicki Aug. 28, 2019, 7:22 a.m. UTC
This patch set adds a mechanism for programming mappings between the local
addresses and listening/receiving sockets with BPF.

It introduces a new per-netns BPF program type, called inet_lookup, which
runs during the socket lookup. The program is allowed to select, from a
SOCKARRAY map, the listening/receiving socket that the packet will be
delivered to.

BPF inet_lookup is intended as an alternative to:

* SO_BINDTOPREFIX [1] - a mechanism that provides a way to listen/receive
  on all local addresses that belong to a network prefix. An alternative to
  binding to INADDR_ANY that allows applications bound to disjoint network
  prefixes to share a port. Not generic. Never got upstreamed.

* TPROXY [2] - a powerful mechanism that allows steering packets destined
  to non-local addresses to a local socket. It also works for local
  addresses, which is a less restrictive case. Can be used to implement
  what SO_BINDTOPREFIX does, and more - in particular, all ports can be
  redirected to a single socket. Socket dispatch happens early in ingress
  path (PREROUTING hook). Versatile but comes with complexities.
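
To make the "belongs to a network prefix" test concrete, here is a minimal
user-space sketch of an IPv4 prefix match. It is not taken from either
mechanism above; the names ip4() and ip4_in_prefix() are made up for
illustration, and all values are assumed to be in host byte order.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative helper: build a host-byte-order IPv4 address from octets. */
static uint32_t ip4(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
	return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
	       ((uint32_t)c << 8) | (uint32_t)d;
}

/*
 * Return true if addr falls within net/prefix_len.
 * Hypothetical helper; addr and net are in host byte order.
 */
static bool ip4_in_prefix(uint32_t addr, uint32_t net, unsigned int prefix_len)
{
	uint32_t mask;

	if (prefix_len > 32)
		return false;

	/* Shifting a 32-bit value by 32 is undefined, so treat /0 separately. */
	mask = prefix_len ? (uint32_t)(UINT32_MAX << (32 - prefix_len)) : 0;

	return (addr & mask) == (net & mask);
}
```

With this helper, ip4_in_prefix(ip4(192, 0, 2, 77), ip4(192, 0, 2, 0), 24)
is true, while an address from 198.51.100.0/24 does not match that prefix.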

Compared to the above, inet_lookup aims to be a programmatic way to map
(address, port) pairs to a socket. It runs after a routing decision for
local delivery was made, and hence is limited to local addresses only.

Being part of the socket lookup has the desired effect that redirection is
visible to XDP programs which call bpf_sk_lookup helpers.

When it comes to use cases, we have presented them in the RFCv1 cover
letter [3] and also at the last Netconf [4]. To recap, they are:

1) sharing a port between two services

   Services accept connections on different (disjoint) IP ranges but on
   the same port. Requests going to 192.0.2.0/24 tcp/80 are handled by
   NGINX, while requests going to 198.51.100.0/24 tcp/80 are handled by an
   Apache server. The applications run as different users, in a flat
   single-netns setup.

2) receiving traffic on all ports

   We have a proxy server that accepts connections to _any_ port [5].

A simple demo program that implements (1) could look like

#define NET1 (IP4(192,  0,   2, 0) >> 8)
#define NET2 (IP4(198, 51, 100, 0) >> 8)

#define MAX_SERVERS 2

struct {
	__uint(type, BPF_MAP_TYPE_REUSEPORT_SOCKARRAY);
	__uint(max_entries, MAX_SERVERS);
	__type(key, __u32);
	__type(value, __u64);
} redir_map SEC(".maps");

SEC("inet_lookup/demo_two_servers")
int demo_two_http_servers(struct bpf_inet_lookup *ctx)
{
	__u32 index = 0;
	__u64 flags = 0;

	if (ctx->family != AF_INET)
		return BPF_OK;
	if (ctx->protocol != IPPROTO_TCP)
		return BPF_OK;
	if (ctx->local_port != 80)
		return BPF_OK;

	switch (bpf_ntohl(ctx->local_ip4) >> 8) {
	case NET1:
		index = 0;
		break;
	case NET2:
		index = 1;
		break;
	default:
		return BPF_OK;
	}

	return bpf_redirect_lookup(ctx, &redir_map, &index, flags);
}
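
The dispatch decision in the demo can also be exercised in plain user
space, without the proposed program type. The sketch below mirrors only
the filtering and the /24 match. It rests on several assumptions: the
definition of IP4(), which the demo uses but does not show; struct
lookup_key and demo_select_index(), which are hypothetical stand-ins for
the context and the program; and the byte order of the fields, inferred
from how the demo reads them.

```c
#include <arpa/inet.h>		/* ntohl() */
#include <netinet/in.h>		/* IPPROTO_TCP */
#include <stdint.h>
#include <sys/socket.h>		/* AF_INET */

/* Assumed definition: builds a host-byte-order address from four octets. */
#define IP4(a, b, c, d)						\
	(((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) |	\
	 ((uint32_t)(c) << 8) | (uint32_t)(d))

#define NET1 (IP4(192,  0,   2, 0) >> 8)
#define NET2 (IP4(198, 51, 100, 0) >> 8)

/* Hypothetical stand-in for the context fields the demo reads. */
struct lookup_key {
	uint32_t family;
	uint32_t protocol;
	uint32_t local_ip4;	/* network byte order (demo applies bpf_ntohl) */
	uint32_t local_port;	/* host byte order (demo compares against 80) */
};

/* Return the map index the demo would pick, or -1 for "do not redirect". */
static int demo_select_index(const struct lookup_key *key)
{
	if (key->family != AF_INET)
		return -1;
	if (key->protocol != IPPROTO_TCP)
		return -1;
	if (key->local_port != 80)
		return -1;

	/* Compare the upper 24 bits, i.e. match on a /24 network. */
	switch (ntohl(key->local_ip4) >> 8) {
	case NET1:
		return 0;
	case NET2:
		return 1;
	default:
		return -1;
	}
}
```

A key for 192.0.2.10 tcp/80 yields index 0, one for 198.51.100.7 tcp/80
yields index 1, and any other address, port, or protocol yields -1.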

Since RFCv1, we've changed the approach from rewriting the lookup key to
map-based redirection. This was suggested at Netconf, and is a recurring
pattern in existing BPF program types.

We're posting the second version of the RFC patch set to collect further
feedback and to set context for the presentation and discussions at the
upcoming Network Summit at LPC '19 [6].

Patches are also available on GitHub [7].

Thanks,
Jakub

[1] https://www.spinics.net/lists/netdev/msg370789.html
[2] https://www.kernel.org/doc/Documentation/networking/tproxy.txt
[3] https://lore.kernel.org/netdev/20190618130050.8344-1-jakub@cloudflare.com/
[4] http://vger.kernel.org/netconf2019_files/Programmable%20socket%20lookup.pdf
[5] https://blog.cloudflare.com/how-we-built-spectrum/
[6] https://linuxplumbersconf.org/event/4/contributions/487/
[7] https://github.com/jsitnicki/linux/commits/bpf-inet-lookup

Changes RFCv1 -> RFCv2:

- Make socket lookup redirection map-based. BPF program now uses a
  dedicated helper and a SOCKARRAY map to select the socket to redirect to.
  A consequence of this change is that bpf_inet_lookup context is now
  read-only.

- Look for connected UDP sockets before allowing redirection from BPF.
  This makes connected UDP sockets work as expected in the presence of
  an inet_lookup prog.

- Share the code for BPF_PROG_{ATTACH,DETACH,QUERY} with flow_dissector,
  the only other per-netns BPF prog type.
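
The ordering described in the connected-UDP item can be modelled in user
space roughly as follows. This is only a sketch of the stated order, not
the kernel code: all types and names are hypothetical, the redirect
callback stands in for the BPF program, and the final fallback to an
unconnected socket is an assumption about what happens when nothing
redirects.

```c
#include <stddef.h>
#include <stdint.h>

struct flow {
	uint32_t saddr, daddr;
	uint16_t sport, dport;
};

struct sock_entry {
	int id;
	int connected;		/* non-zero if a remote peer is set */
	struct flow flow;	/* unconnected: only daddr/dport are used */
};

/* Hypothetical hook standing in for the BPF program. NULL = no redirect. */
typedef const struct sock_entry *(*redirect_fn)(const struct flow *pkt);

static const struct sock_entry *
lookup(const struct sock_entry *socks, size_t n, const struct flow *pkt,
       redirect_fn redirect)
{
	const struct sock_entry *sk;
	size_t i;

	/* 1. A connected socket matching the full 4-tuple wins. */
	for (i = 0; i < n; i++) {
		sk = &socks[i];
		if (sk->connected &&
		    sk->flow.saddr == pkt->saddr &&
		    sk->flow.daddr == pkt->daddr &&
		    sk->flow.sport == pkt->sport &&
		    sk->flow.dport == pkt->dport)
			return sk;
	}

	/* 2. Only then is the redirect hook consulted. */
	if (redirect) {
		sk = redirect(pkt);
		if (sk)
			return sk;
	}

	/* 3. Assumed fallback: unconnected socket on the local addr/port. */
	for (i = 0; i < n; i++) {
		sk = &socks[i];
		if (!sk->connected &&
		    sk->flow.daddr == pkt->daddr &&
		    sk->flow.dport == pkt->dport)
			return sk;
	}

	return NULL;
}

/* Example hook: steer every packet to one fixed socket. */
static const struct sock_entry catch_all = { .id = 99 };

static const struct sock_entry *redirect_all(const struct flow *pkt)
{
	(void)pkt;
	return &catch_all;
}
```

In this model a packet from the connected peer still reaches the connected
socket even when redirect_all() is installed, while packets from any other
source go to the socket the hook selects.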


Jakub Sitnicki (12):
  flow_dissector: Extract attach/detach/query helpers
  bpf: Introduce inet_lookup program type for redirecting socket lookup
  bpf: Add verifier tests for inet_lookup context access
  inet: Store layer 4 protocol in inet_hashinfo
  udp: Store layer 4 protocol in udp_table
  inet: Run inet_lookup bpf program on socket lookup
  inet6: Run inet_lookup bpf program on socket lookup
  udp: Run inet_lookup bpf program on socket lookup
  udp6: Run inet_lookup bpf program on socket lookup
  bpf: Sync linux/bpf.h to tools/
  libbpf: Add support for inet_lookup program type
  bpf: Test redirecting listening/receiving socket lookup

 include/linux/bpf.h                           |   8 +
 include/linux/bpf_types.h                     |   1 +
 include/linux/filter.h                        |  18 +
 include/net/inet6_hashtables.h                |  19 +
 include/net/inet_hashtables.h                 |  36 +
 include/net/net_namespace.h                   |   2 +
 include/net/udp.h                             |  10 +-
 include/uapi/linux/bpf.h                      |  58 +-
 kernel/bpf/syscall.c                          |  10 +
 kernel/bpf/verifier.c                         |   7 +-
 net/core/filter.c                             | 304 ++++++++
 net/core/flow_dissector.c                     |  65 +-
 net/dccp/proto.c                              |   2 +-
 net/ipv4/inet_hashtables.c                    |   5 +
 net/ipv4/tcp_ipv4.c                           |   2 +-
 net/ipv4/udp.c                                |  59 +-
 net/ipv4/udp_impl.h                           |   2 +-
 net/ipv4/udplite.c                            |   4 +-
 net/ipv6/inet6_hashtables.c                   |   5 +
 net/ipv6/udp.c                                |  54 +-
 net/ipv6/udp_impl.h                           |   2 +-
 net/ipv6/udplite.c                            |   2 +-
 tools/include/uapi/linux/bpf.h                |  58 +-
 tools/lib/bpf/libbpf.c                        |   4 +
 tools/lib/bpf/libbpf.h                        |   2 +
 tools/lib/bpf/libbpf.map                      |   2 +
 tools/lib/bpf/libbpf_probes.c                 |   1 +
 tools/testing/selftests/bpf/.gitignore        |   1 +
 tools/testing/selftests/bpf/Makefile          |   5 +-
 tools/testing/selftests/bpf/bpf_helpers.h     |   3 +
 .../selftests/bpf/progs/inet_lookup_progs.c   |  78 ++
 .../testing/selftests/bpf/test_inet_lookup.c  | 522 +++++++++++++
 .../testing/selftests/bpf/test_inet_lookup.sh |  35 +
 .../selftests/bpf/verifier/ctx_inet_lookup.c  | 696 ++++++++++++++++++
 34 files changed, 1974 insertions(+), 108 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/inet_lookup_progs.c
 create mode 100644 tools/testing/selftests/bpf/test_inet_lookup.c
 create mode 100755 tools/testing/selftests/bpf/test_inet_lookup.sh
 create mode 100644 tools/testing/selftests/bpf/verifier/ctx_inet_lookup.c