
[bpf-next,v3,0/5] xdp: Support multiple programs on a single interface through chain calls

Message ID 157046883502.2092443.146052429591277809.stgit@alrua-x1

Toke Høiland-Jørgensen Oct. 7, 2019, 5:20 p.m. UTC
This series adds support for executing multiple XDP programs on a single
interface in sequence, through the use of chain calls, as discussed at the Linux
Plumbers Conference last month:

https://linuxplumbersconf.org/event/4/contributions/460/

# HIGH-LEVEL IDEA

Since Alexei pointed out some issues with trying to rewrite the eBPF byte code,
let's try a third approach: We add the ability to chain call programs into the
eBPF execution core itself, but without rewriting the eBPF byte code.

As in the previous version, the bpf() syscall gets a couple of new commands
which take a pair of BPF program fds and a return code. It will then attach the
second program to the first one in a structure keyed by return code. When a
program chain is thus established, the former program will tail call to the
latter at the end of its execution.
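
As a rough sketch of what the userspace side of this could look like (the
command number, struct layout and field names below are illustrative guesses,
not the actual UAPI from the patches):

  /* Hypothetical wrapper: attach next_fd to run after prev_fd whenever
   * prev_fd returns 'retcode'. All names here are made up for the sketch. */
  #include <unistd.h>
  #include <sys/syscall.h>
  #include <linux/bpf.h>

  #define BPF_PROG_CHAIN_ADD 25            /* hypothetical command number */

  struct chain_attr {                      /* hypothetical attribute layout */
          __u32 prev_prog_fd;              /* program to chain from */
          __u32 next_prog_fd;              /* program to run next */
          __u32 retcode;                   /* return code triggering the chain call */
  };

  static int prog_chain_add(int prev_fd, int next_fd, __u32 retcode)
  {
          struct chain_attr attr = {
                  .prev_prog_fd = prev_fd,
                  .next_prog_fd = next_fd,
                  .retcode      = retcode,
          };

          return syscall(__NR_bpf, BPF_PROG_CHAIN_ADD, &attr, sizeof(attr));
  }

With something like this, prog_chain_add(fd_a, fd_b, XDP_PASS) would make
program B run whenever program A returns XDP_PASS.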

The actual tail calling is achieved by adding a new flag to struct bpf_prog and
having BPF_PROG_RUN run the chain call logic if that flag is set. This means
that if the feature is *not* used, the overhead is a single conditional branch
(which means I couldn't measure a performance difference, as can be seen in the
results below).
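
To make the mechanism concrete, here is a simplified, self-contained
reconstruction of that dispatch loop (none of the type, field or function
names below are taken from the actual patches; they are stand-ins for
illustration only):

  /* Stand-in for struct bpf_prog with the new chain-call state. */
  #define MAX_CHAIN_RETCODES 32            /* arbitrary size for this sketch */

  struct chain_prog {
          int chain_calls_enabled;         /* the new flag; checked once per run */
          struct chain_prog *next[MAX_CHAIN_RETCODES]; /* keyed by return code */
          struct chain_prog *wildcard;     /* fallback matching any return code */
          unsigned int (*run)(const struct chain_prog *prog, void *ctx);
  };

  static unsigned int run_with_chain_calls(const struct chain_prog *prog, void *ctx)
  {
          unsigned int ret = prog->run(prog, ctx);

          /* If the flag is unset, this single branch is the only extra cost. */
          while (prog->chain_calls_enabled) {
                  const struct chain_prog *next = NULL;

                  /* Pick the next program keyed by the return code, falling
                   * back to the wildcard slot if no exact match is attached. */
                  if (ret < MAX_CHAIN_RETCODES)
                          next = prog->next[ret];
                  if (!next)
                          next = prog->wildcard;
                  if (!next)
                          break;

                  prog = next;
                  ret = prog->run(prog, ctx);
          }

          return ret;
  }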

For this version I kept the load-time flag from the previous version, to avoid
having to remove the read-only memory protection from the bpf prog. Only
programs loaded with this flag set can have other programs attached to them for
chain calls.
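
For completeness, loading a chain-call-capable program with such a flag might
look roughly like this (prog_flags is an existing field in union bpf_attr, but
the BPF_F_CHAIN_CALLS name and value are invented for this sketch):

  #include <string.h>
  #include <unistd.h>
  #include <sys/syscall.h>
  #include <linux/bpf.h>

  #define BPF_F_CHAIN_CALLS (1U << 3)      /* hypothetical flag value */

  /* Load an XDP program that other programs can later be chained onto. */
  static int load_chainable_prog(const struct bpf_insn *insns, __u32 insn_cnt)
  {
          union bpf_attr attr;

          memset(&attr, 0, sizeof(attr));
          attr.prog_type  = BPF_PROG_TYPE_XDP;
          attr.insns      = (__u64)(unsigned long)insns;
          attr.insn_cnt   = insn_cnt;
          attr.license    = (__u64)(unsigned long)"GPL";
          attr.prog_flags = BPF_F_CHAIN_CALLS;

          return syscall(__NR_bpf, BPF_PROG_LOAD, &attr, sizeof(attr));
  }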

As before, it shouldn't be necessary to set the flag at program load time, but
rather we should enable the feature when a chain call program is first loaded.
We could conceivably just remove the RO property from the first page of struct
bpf_prog and set the flag as needed.

# PERFORMANCE

I performed a simple performance test to get an initial feel for the overhead of
the chain call mechanism. This test consists of running only two programs in
sequence: One that returns XDP_PASS and another that returns XDP_DROP. I then
measure the drop PPS performance and compare it to a baseline of just a single
program that only returns XDP_DROP.
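
The two test programs are trivial; something along these lines (in the style
of the xdp_dummy selftest program, though the exact source used for the test
may differ):

  #include <linux/bpf.h>
  #include <bpf/bpf_helpers.h>

  /* First program in the chain: passes every packet on. */
  SEC("xdp")
  int xdp_pass_prog(struct xdp_md *ctx)
  {
          return XDP_PASS;
  }

  /* Second program in the chain: drops everything it sees. */
  SEC("xdp")
  int xdp_drop_prog(struct xdp_md *ctx)
  {
          return XDP_DROP;
  }

  char _license[] SEC("license") = "GPL";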

For comparison, a test case that uses regular eBPF tail calls to sequence two
programs together is also included.

| Test case                        | Perf      | Overhead |
|----------------------------------+-----------+----------|
| Before patch (XDP DROP program)  | 31.5 Mpps |          |
| After patch (XDP DROP program)   | 32.0 Mpps |          |
| XDP chain call (XDP_PASS return) | 28.5 Mpps | 3.8 ns   |
| XDP chain call (wildcard return) | 28.1 Mpps | 4.3 ns   |

I consider the "Before patch" and "After patch" to be identical; the .5 Mpps
difference is within the regular test variance I see between runs. Likewise,
there is probably no significant difference between hooking the XDP_PASS return
code and using the wildcard slot.
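
For reference, the overhead numbers are the difference in per-packet
processing time relative to the 32.0 Mpps baseline:

  1/28.5 Mpps - 1/32.0 Mpps ≈ 35.1 ns - 31.3 ns ≈ 3.8 ns
  1/28.1 Mpps - 1/32.0 Mpps ≈ 35.6 ns - 31.3 ns ≈ 4.3 ns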

# PATCH SET STRUCTURE
This series is structured as follows:

- Patch 1: Adds the call chain looping logic
- Patch 2: Adds the new commands to the bpf() syscall
- Patch 3-4: Tools/ update and libbpf syscall wrappers
- Patch 5: Selftest with example user space code (a bit hacky still)

The whole series is also available in my git repo on kernel.org:
https://git.kernel.org/pub/scm/linux/kernel/git/toke/linux.git/log/?h=xdp-multiprog-03

Changelog:

v3:
  - Keep the UAPI from v2, but change the implementation to hook into
    BPF_PROG_RUN instead of trying to inject instructions into the eBPF program
    itself (since that had problems as Alexei pointed out).
v2:
  - Completely new approach that integrates chain calls into the core eBPF
    runtime instead of doing the map XDP-specific thing with a new map from v1.

---

Toke Høiland-Jørgensen (5):
      bpf: Support chain calling multiple BPF programs after each other
      bpf: Add support for setting chain call sequence for programs
      tools: Update bpf.h header for program chain calls
      libbpf: Add syscall wrappers for BPF_PROG_CHAIN_* commands
      selftests: Add tests for XDP chain calls


 include/linux/bpf.h                           |    3 
 include/linux/filter.h                        |   34 +++
 include/uapi/linux/bpf.h                      |   16 +
 kernel/bpf/core.c                             |    6 
 kernel/bpf/syscall.c                          |   82 ++++++-
 tools/include/uapi/linux/bpf.h                |   16 +
 tools/lib/bpf/bpf.c                           |   34 +++
 tools/lib/bpf/bpf.h                           |    4 
 tools/lib/bpf/libbpf.map                      |    3 
 tools/testing/selftests/bpf/.gitignore        |    1 
 tools/testing/selftests/bpf/Makefile          |    3 
 tools/testing/selftests/bpf/progs/xdp_dummy.c |    6 
 tools/testing/selftests/bpf/test_xdp_chain.sh |   77 ++++++
 tools/testing/selftests/bpf/xdp_chain.c       |  313 +++++++++++++++++++++++++
 14 files changed, 594 insertions(+), 4 deletions(-)
 create mode 100755 tools/testing/selftests/bpf/test_xdp_chain.sh
 create mode 100644 tools/testing/selftests/bpf/xdp_chain.c

Comments

John Fastabend Oct. 7, 2019, 6:58 p.m. UTC | #1
Toke Høiland-Jørgensen wrote:
> This series adds support for executing multiple XDP programs on a single
> interface in sequence, through the use of chain calls, as discussed at the Linux
> Plumbers Conference last month:
> 
> https://linuxplumbersconf.org/event/4/contributions/460/
> 

Can we add RFC to the title if we are just iterating through idea-space here?

> # HIGH-LEVEL IDEA
> 
> Since Alexei pointed out some issues with trying to rewrite the eBPF byte code,
> let's try a third approach: We add the ability to chain call programs into the
> eBPF execution core itself, but without rewriting the eBPF byte code.
> 
> As in the previous version, the bpf() syscall gets a couple of new commands
> which take a pair of BPF program fds and a return code. It will then attach the
> second program to the first one in a structure keyed by return code. When a
> program chain is thus established, the former program will tail call to the
> latter at the end of its execution.
> 
> The actual tail calling is achieved by adding a new flag to struct bpf_prog and
> having BPF_PROG_RUN run the chain call logic if that flag is set. This means
> that if the feature is *not* used, the overhead is a single conditional branch
> (which means I couldn't measure a performance difference, as can be seen in the
> results below).
> 

I still believe user space should be able to link these multiple programs
together as Ed and I were suggesting in the last series. It seems much cleaner
to handle this with calls and linker steps vs adding something on the side to
handle this. Also, by doing it via linking, your control program can be arbitrarily
complex: for example, not just taking the output of one program and jumping
to another, but doing arbitrarily more complex/interesting things, such as taking
the input from multiple programs to pick the next call.
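
Something along these lines, purely as an illustration (the component programs
here are just stubs, and all names are made up):

  #include <linux/bpf.h>
  #include <bpf/bpf_helpers.h>

  /* Component programs linked in as ordinary BPF-to-BPF calls. */
  static __attribute__((noinline)) int check_acl(struct xdp_md *ctx)
  {
          /* ... first component program ... */
          return XDP_PASS;
  }

  static __attribute__((noinline)) int do_lb(struct xdp_md *ctx)
  {
          /* ... second component program ... */
          return XDP_PASS;
  }

  /* The control program owns the policy: it can combine the verdicts of
   * several component programs in arbitrary ways before deciding. */
  SEC("xdp")
  int xdp_dispatcher(struct xdp_md *ctx)
  {
          int a = check_acl(ctx);
          int b = do_lb(ctx);

          if (a == XDP_DROP || b == XDP_DROP)
                  return XDP_DROP;

          return XDP_PASS;
  }

  char _license[] SEC("license") = "GPL";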

Maybe I missed a point, but it seems like the main complaint is that tail calls
and regular calls don't mix well. We want to fix this regardless, so I don't
think that should be a blocker on using a linking step in user space.

> For this version I kept the load-time flag from the previous version, to avoid
> having to remove the read-only memory protection from the bpf prog. Only
> programs loaded with this flag set can have other programs attached to them for
> chain calls.
> 
>> As before, it shouldn't be necessary to set the flag at program load time, but
> rather we should enable the feature when a chain call program is first loaded.
> We could conceivably just remove the RO property from the first page of struct
> bpf_prog and set the flag as needed.
> 
> # PERFORMANCE
> 
> I performed a simple performance test to get an initial feel for the overhead of
> the chain call mechanism. This test consists of running only two programs in
> sequence: One that returns XDP_PASS and another that returns XDP_DROP. I then
> measure the drop PPS performance and compare it to a baseline of just a single
> program that only returns XDP_DROP.
> 
> For comparison, a test case that uses regular eBPF tail calls to sequence two
> programs together is also included.
> 
> | Test case                        | Perf      | Overhead |
> |----------------------------------+-----------+----------|
> | Before patch (XDP DROP program)  | 31.5 Mpps |          |
> | After patch (XDP DROP program)   | 32.0 Mpps |          |
> | XDP chain call (XDP_PASS return) | 28.5 Mpps | 3.8 ns   |
> | XDP chain call (wildcard return) | 28.1 Mpps | 4.3 ns   |
> 
> I consider the "Before patch" and "After patch" to be identical; the .5 Mpps
> difference is within the regular test variance I see between runs. Likewise,
> there is probably no significant difference between hooking the XDP_PASS return
> code and using the wildcard slot.
> 
> # PATCH SET STRUCTURE
> This series is structured as follows:
> 
> - Patch 1: Adds the call chain looping logic
> - Patch 2: Adds the new commands to the bpf() syscall
> - Patch 3-4: Tools/ update and libbpf syscall wrappers
> - Patch 5: Selftest with example user space code (a bit hacky still)
> 
> The whole series is also available in my git repo on kernel.org:
> https://git.kernel.org/pub/scm/linux/kernel/git/toke/linux.git/log/?h=xdp-multiprog-03
> 
> Changelog:
> 
> v3:
>   - Keep the UAPI from v2, but change the implementation to hook into
>     BPF_PROG_RUN instead of trying to inject instructions into the eBPF program
>     itself (since that had problems as Alexei pointed out).
> v2:
>   - Completely new approach that integrates chain calls into the core eBPF
>     runtime instead of doing the map XDP-specific thing with a new map from v1.
> 
> ---
> 
> Toke Høiland-Jørgensen (5):
>       bpf: Support chain calling multiple BPF programs after each other
>       bpf: Add support for setting chain call sequence for programs
>       tools: Update bpf.h header for program chain calls
>       libbpf: Add syscall wrappers for BPF_PROG_CHAIN_* commands
>       selftests: Add tests for XDP chain calls
> 
> 
>  include/linux/bpf.h                           |    3 
>  include/linux/filter.h                        |   34 +++
>  include/uapi/linux/bpf.h                      |   16 +
>  kernel/bpf/core.c                             |    6 
>  kernel/bpf/syscall.c                          |   82 ++++++-
>  tools/include/uapi/linux/bpf.h                |   16 +
>  tools/lib/bpf/bpf.c                           |   34 +++
>  tools/lib/bpf/bpf.h                           |    4 
>  tools/lib/bpf/libbpf.map                      |    3 
>  tools/testing/selftests/bpf/.gitignore        |    1 
>  tools/testing/selftests/bpf/Makefile          |    3 
>  tools/testing/selftests/bpf/progs/xdp_dummy.c |    6 
>  tools/testing/selftests/bpf/test_xdp_chain.sh |   77 ++++++
>  tools/testing/selftests/bpf/xdp_chain.c       |  313 +++++++++++++++++++++++++
>  14 files changed, 594 insertions(+), 4 deletions(-)
>  create mode 100755 tools/testing/selftests/bpf/test_xdp_chain.sh
>  create mode 100644 tools/testing/selftests/bpf/xdp_chain.c
>
Toke Høiland-Jørgensen Oct. 8, 2019, 8:42 a.m. UTC | #2
John Fastabend <john.fastabend@gmail.com> writes:

> Toke Høiland-Jørgensen wrote:
>> This series adds support for executing multiple XDP programs on a single
>> interface in sequence, through the use of chain calls, as discussed at the Linux
>> Plumbers Conference last month:
>> 
>> https://linuxplumbersconf.org/event/4/contributions/460/
>> 
>
> Can we add RFC to the title if we are just iterating through
> idea-space here?

I don't view this as "just iterating through idea-space". I'll admit
that I may have overestimated the extent to which we were all on the
same page on this after LPC, but I do view these submissions as serious
proposals that are making progress... :)

>> # HIGH-LEVEL IDEA
>> 
>> Since Alexei pointed out some issues with trying to rewrite the eBPF byte code,
>> let's try a third approach: We add the ability to chain call programs into the
>> eBPF execution core itself, but without rewriting the eBPF byte code.
>> 
>> As in the previous version, the bpf() syscall gets a couple of new commands
>> which take a pair of BPF program fds and a return code. It will then attach the
>> second program to the first one in a structure keyed by return code. When a
>> program chain is thus established, the former program will tail call to the
>> latter at the end of its execution.
>> 
>> The actual tail calling is achieved by adding a new flag to struct bpf_prog and
>> having BPF_PROG_RUN run the chain call logic if that flag is set. This means
>> that if the feature is *not* used, the overhead is a single conditional branch
>> (which means I couldn't measure a performance difference, as can be seen in the
>> results below).
>> 
>
> I still believe user space should be able to link these multiple
> programs together as Ed and I were suggesting in the last series.

I expect that userspace probably could (I mean, after all, eBPF is
within spitting distance of a full, almost-Turing-complete execution
environment, so userspace can conceivably do pretty much anything).

However, I don't believe that doing it in userspace is the best
solution. I view it as a tradeoff: how much complexity do we have to add
to the kernel to make things easier for userspace? And given that we can
do this without negatively impacting performance, and with a reasonable
cost in terms of complexity (both of which I think this series
demonstrates), I think this makes sense to put in the kernel.

> Also, by doing it via linking, your control program can be arbitrarily
> complex: for example, not just taking the output of one program and
> jumping to another, but doing arbitrarily more complex/interesting
> things, such as taking the input from multiple programs to pick the
> next call.

I expect there will indeed be more complex use cases where combining
multiple programs in arbitrarily complex ways would make a lot of sense,
and doing that by linking in userspace is probably a good fit for that.
But for the simple use case of running multiple programs after one
another, I think it is overkill, and something that is better to do in
the kernel.

-Toke