[bpf-next,v2,00/12] Link-based program attachment to network namespaces

Message ID 20200531082846.2117903-1-jakub@cloudflare.com

Jakub Sitnicki May 31, 2020, 8:28 a.m. UTC
One of the pieces of feedback from recent review of BPF hooks for socket
lookup [0] was that new program types should use bpf_link-based
attachment.

This series introduces a new bpf_link type for attaching to a network
namespace. All link operations are supported. Errors returned from ops
follow the cgroup example. The description of patch 4 goes into error
semantics.

The major change in v2 is a switch away from RCU to mutex-only
synchronization. Andrii pointed out that RCU is not needed here, and it
makes sense to keep locking straightforward.
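
To illustrate the mutex-only approach, here is a minimal user-space sketch,
not the kernel code. It assumes a single global mutex guarding both
directions of the link <-> netns relationship, so that link release and the
netns pre_exit path can each safely clear the other's pointer without
relying on an RCU grace period. All sync_* names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Toy model only; every sync_* name is hypothetical, not the kernel's. */
struct sync_link;

struct sync_netns {
	struct sync_link *link;	/* protected by sync_mutex */
};

struct sync_link {
	struct sync_netns *net;	/* protected by sync_mutex; NULL when defunct */
};

/* Assumption: one mutex serializes every access to the two pointers above. */
pthread_mutex_t sync_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Link side: models the last reference to the link going away. */
void sync_link_release(struct sync_link *link)
{
	pthread_mutex_lock(&sync_mutex);
	if (link->net) {
		/* netns is still alive, so detach from it */
		assert(link->net->link == link);
		link->net->link = NULL;
		link->net = NULL;
	}
	pthread_mutex_unlock(&sync_mutex);
}

/* Netns side: models the namespace being torn down. */
void sync_netns_pre_exit(struct sync_netns *net)
{
	pthread_mutex_lock(&sync_mutex);
	if (net->link) {
		/* mark the link defunct (auto-detach) */
		assert(net->link->net == net);
		net->link->net = NULL;
		net->link = NULL;
	}
	pthread_mutex_unlock(&sync_mutex);
}

/* Returns 1 if the link still points at a live netns, 0 if defunct. */
int sync_link_is_attached(struct sync_link *link)
{
	int attached;

	pthread_mutex_lock(&sync_mutex);
	attached = link->net != NULL;
	pthread_mutex_unlock(&sync_mutex);
	return attached;
}
```

Whichever of the two teardown paths runs first clears both pointers under
the lock, so the other path finds NULL and does nothing.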

Also, there were a couple of bugs in the initial update_prog and
fill_info implementations, one picked up by kbuild. Those are now fixed.
Tests have been extended to cover them. Full changelog below.

The series is organized as follows:

Patches 1-3 prepare a space in struct net to keep state for attached BPF
programs, and massage the code in flow_dissector to make it attach type
agnostic, to finally move it under kernel/bpf/.

Patch 4, the most important one, introduces a new bpf_link type for
attaching to a network namespace.

Patch 5 unifies the update error (ENOLINK) between BPF cgroup and netns.

Patches 6-8 make libbpf and bpftool aware of the new link type.

Patches 9-12 add and extend tests to check that the low- and high-level
APIs for operating on links to netns work as intended.

Thanks to Alexei, Andrii, Lorenz, Marek, and Stanislav for feedback.

-jkbs

[0] https://lore.kernel.org/bpf/20200511185218.1422406-1-jakub@cloudflare.com/

Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: Lorenz Bauer <lmb@cloudflare.com>
Cc: Marek Majkowski <marek@cloudflare.com>
Cc: Stanislav Fomichev <sdf@google.com>

v1 -> v2:

- Switch to mutex-only synchronization. Don't rely on RCU grace period
  guarantee when accessing struct net from link release / update /
  fill_info, and when accessing bpf_link from pernet pre_exit
  callback. (Andrii)
- Drop patch 1, no longer needed with mutex-only synchronization.
- Don't leak uninitialized variable contents from fill_info callback
  when link is in defunct state. (kbuild)
- Make fill_info treat the link as defunct (i.e. no attached netns) when
  struct net refcount is 0, but link has not yet been auto-detached.
- Add missing BPF_LINK_TYPE define in bpf_types.h for new link type.
- Fix link update_prog callback to update the prog that will run, and
  not just the link itself.
- Return EEXIST on prog attach when link already exists, and on link
  create when prog is already attached directly. (Andrii)
- Return EINVAL on prog detach when link is attached. (Andrii)
- Fold __netns_bpf_link_attach into its only caller. (Stanislav)
- Get rid of a wrapper around container_of() (Andrii)
- Use rcu_dereference_protected instead of rcu_access_pointer on
  update-side. (Stanislav)
- Make return-on-success from netns_bpf_link_create less
  confusing. (Andrii)
- Adapt bpf_link for cgroup to return ENOLINK when updating a defunct
  link. (Andrii, Alexei)
- Order new exported symbols in libbpf.map alphabetically (Andrii)
- Keep libbpf's "failed to attach link" warning message clear as to what
  we failed to attach to (cgroup vs netns). (Andrii)
- Extract helpers for printing link attach type. (bpftool, Andrii)
- Switch flow_dissector tests to BPF skeleton and extend them to
  exercise link-based flow dissector attachment. (Andrii)
- Harden flow dissector attachment tests with prog query checks after
  prog attach/detach, or link create/update/close.
- Extend flow dissector tests to cover fill_info for defunct links.
- Rebase onto recent bpf-next
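
The error semantics listed in the changelog can be summarized with a toy
user-space model. This is only a sketch, not the code in
kernel/bpf/net_namespace.c: all toy_* names are hypothetical, and the
return values marked "assumption" in the comments are not stated in the
changelog.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy user-space model. All toy_* names are hypothetical, not kernel API. */
struct toy_prog { int id; };
struct toy_link;

struct toy_netns {
	unsigned int ino;	/* stand-in for the netns inode number */
	struct toy_prog *prog;	/* program that actually runs */
	struct toy_link *link;	/* set when prog was attached via a link */
};

struct toy_link {
	struct toy_netns *net;	/* NULL once the link is defunct */
	struct toy_prog *prog;
};

struct toy_link_info { unsigned int netns_ino; };

int toy_prog_attach(struct toy_netns *net, struct toy_prog *prog)
{
	if (net->link)
		return -EEXIST;	/* a link already exists */
	net->prog = prog;	/* assumption: direct re-attach replaces */
	return 0;
}

int toy_prog_detach(struct toy_netns *net)
{
	if (net->link)
		return -EINVAL;	/* link is attached */
	if (!net->prog)
		return -ENOENT;	/* assumption: nothing to detach */
	net->prog = NULL;
	return 0;
}

int toy_link_create(struct toy_link *link, struct toy_netns *net,
		    struct toy_prog *prog)
{
	if (net->link)
		return -E2BIG;	/* assumption: only one link allowed */
	if (net->prog)
		return -EEXIST;	/* prog already attached directly */
	link->net = net;
	link->prog = prog;
	net->link = link;
	net->prog = prog;
	return 0;
}

int toy_link_update(struct toy_link *link, struct toy_prog *new_prog)
{
	if (!link->net)
		return -ENOLINK;	/* defunct (auto-detached) link */
	link->prog = new_prog;
	link->net->prog = new_prog;	/* update the prog that will run */
	return 0;
}

/* Models netns teardown, which auto-detaches the link. */
void toy_netns_exit(struct toy_netns *net)
{
	if (net->link) {
		net->link->net = NULL;
		net->link = NULL;
	}
	net->prog = NULL;
}

void toy_link_fill_info(const struct toy_link *link, struct toy_link_info *info)
{
	/* Always initialize the field, even for a defunct link. */
	info->netns_ino = link->net ? link->net->ino : 0;
}

/* Walks through the cases from the changelog; returns 0 on success. */
int toy_selfcheck(void)
{
	struct toy_prog p1 = { 1 }, p2 = { 2 };
	struct toy_netns net = { 42, NULL, NULL };
	struct toy_link link = { NULL, NULL };
	struct toy_link_info info;

	assert(toy_link_create(&link, &net, &p1) == 0);
	assert(toy_prog_attach(&net, &p2) == -EEXIST);
	assert(toy_prog_detach(&net) == -EINVAL);
	assert(toy_link_update(&link, &p2) == 0 && net.prog == &p2);
	toy_netns_exit(&net);
	assert(toy_link_update(&link, &p1) == -ENOLINK);
	toy_link_fill_info(&link, &info);
	assert(info.netns_ino == 0);
	return 0;
}
```

Calling toy_selfcheck() exercises link create, the rejected direct
attach/detach, link update, and the defunct-link cases in order.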

Jakub Sitnicki (12):
  flow_dissector: Pull locking up from prog attach callback
  net: Introduce netns_bpf for BPF programs attached to netns
  flow_dissector: Move out netns_bpf prog callbacks
  bpf: Add link-based BPF program attachment to network namespace
  bpf, cgroup: Return ENOLINK for auto-detached links on update
  libbpf: Add support for bpf_link-based netns attachment
  bpftool: Extract helpers for showing link attach type
  bpftool: Support link show for netns-attached links
  selftests/bpf: Add tests for attaching bpf_link to netns
  selftests/bpf, flow_dissector: Close TAP device FD after the test
  selftests/bpf: Convert test_flow_dissector to use BPF skeleton
  selftests/bpf: Extend test_flow_dissector to cover link creation

 include/linux/bpf-netns.h                     |  64 ++
 include/linux/bpf_types.h                     |   3 +
 include/linux/skbuff.h                        |  26 -
 include/net/flow_dissector.h                  |   6 +
 include/net/net_namespace.h                   |   4 +-
 include/net/netns/bpf.h                       |  18 +
 include/uapi/linux/bpf.h                      |   5 +
 kernel/bpf/Makefile                           |   1 +
 kernel/bpf/cgroup.c                           |   2 +-
 kernel/bpf/net_namespace.c                    | 373 +++++++++++
 kernel/bpf/syscall.c                          |  10 +-
 net/core/flow_dissector.c                     | 124 +---
 tools/bpf/bpftool/link.c                      |  54 +-
 tools/include/uapi/linux/bpf.h                |   5 +
 tools/lib/bpf/libbpf.c                        |  23 +-
 tools/lib/bpf/libbpf.h                        |   2 +
 tools/lib/bpf/libbpf.map                      |   1 +
 .../selftests/bpf/prog_tests/flow_dissector.c | 166 +++--
 .../bpf/prog_tests/flow_dissector_reattach.c  | 588 ++++++++++++++++--
 tools/testing/selftests/bpf/progs/bpf_flow.c  |  20 +-
 20 files changed, 1248 insertions(+), 247 deletions(-)
 create mode 100644 include/linux/bpf-netns.h
 create mode 100644 include/net/netns/bpf.h
 create mode 100644 kernel/bpf/net_namespace.c

Comments

Alexei Starovoitov June 1, 2020, 10:26 p.m. UTC | #1
I really like the set. Applied. Thanks!