
[bpf-next,v6,0/3] libbpf: adding AF_XDP support

Message ID 1550740888-26439-1-git-send-email-magnus.karlsson@intel.com

Message

Magnus Karlsson Feb. 21, 2019, 9:21 a.m. UTC
This patch set adds AF_XDP support to libbpf. The main reason for this
is to make it easier to write applications that use AF_XDP, by
offering higher-level APIs that hide many of the details of the AF_XDP
uapi. This is in the same vein as libbpf facilitates XDP adoption by
offering easy-to-use, higher-level interfaces to XDP functionality.
Hopefully this will facilitate adoption of AF_XDP, make applications
that use it simpler and smaller, and also make it possible for
applications to benefit from future optimizations in the AF_XDP
user-space access code. Previously, people simply copied and pasted
the code from the sample application into their own applications,
which is not desirable.

The proposed interface is composed of two parts:

* A low-level data-plane interface for accessing the four rings and
  the packet data in the umem
* A high-level control-plane interface for creating and setting up
  umems and AF_XDP sockets. This interface also loads a simple XDP
  program that routes all traffic on a queue up to the AF_XDP socket.
  A usage sketch follows the list.
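
To make the control-plane part concrete, here is a minimal sketch of
how it might be used. It is based on the xsk_umem__create() and
xsk_socket__create() entry points added in patch 1; the buffer sizing,
the include path and the error handling are illustrative assumptions,
not part of this series.

#include <stdlib.h>
#include <unistd.h>
#include <bpf/xsk.h>	/* tools/lib/bpf/xsk.h from patch 1;
			 * install path is an assumption */

#define NUM_FRAMES 4096

static struct xsk_umem *umem;
static struct xsk_socket *xsk;
static struct xsk_ring_prod fq, tx;	/* fill and Tx rings (producer) */
static struct xsk_ring_cons cq, rx;	/* completion and Rx rings (consumer) */

static int setup_xsk(const char *ifname, __u32 queue_id)
{
	size_t size = NUM_FRAMES * XSK_UMEM__DEFAULT_FRAME_SIZE;
	void *bufs;
	int err;

	/* Register a memory area as umem; this also creates the fill
	 * and completion rings. */
	if (posix_memalign(&bufs, getpagesize(), size))
		return -1;
	err = xsk_umem__create(&umem, bufs, size, &fq, &cq,
			       NULL /* default config */);
	if (err)
		return err;

	/* Create the AF_XDP socket bound to ifname/queue_id. With the
	 * default config this also loads the simple built-in XDP
	 * program on the interface. */
	return xsk_socket__create(&xsk, ifname, queue_id, umem,
				  &rx, &tx, NULL /* default config */);
}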

The sample program has been updated to use this new interface and in
the process it lost roughly 300 lines of code. I cannot detect any
performance degradation from using this library instead of the
functions that were previously inlined in the sample application, but
note that I measured this on a slower machine and not the Broadwell
that we normally use.

The rings are now called xsk_ring: when a producer operates on a ring
it is an xsk_ring_prod, and for a consumer it is an xsk_ring_cons.
This way we get some compile-time checking that the rings are used
correctly.
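
For example, the accessors only accept the matching ring type, so
handing a consumer ring to a producer function fails at compile time.
A small sketch using the accessor names from the xsk.h in patch 1
(the frame handling and variable names are illustrative):

/* Queue one frame that already lives in the umem at offset 'addr'
 * with length 'len'. 'tx' must be a producer ring; passing a
 * struct xsk_ring_cons here would be rejected by the compiler. */
static int send_one(struct xsk_ring_prod *tx, __u64 addr, __u32 len)
{
	__u32 idx;

	if (xsk_ring_prod__reserve(tx, 1, &idx) != 1)
		return -1;			/* ring full */
	xsk_ring_prod__tx_desc(tx, idx)->addr = addr;
	xsk_ring_prod__tx_desc(tx, idx)->len = len;
	xsk_ring_prod__submit(tx, 1);
	return 0;
}

/* Drain up to 'max' received descriptors from a consumer ring. */
static unsigned int recv_some(struct xsk_ring_cons *rx, unsigned int max)
{
	unsigned int i, rcvd;
	__u32 idx;

	rcvd = xsk_ring_cons__peek(rx, max, &idx);
	for (i = 0; i < rcvd; i++) {
		const struct xdp_desc *d = xsk_ring_cons__rx_desc(rx, idx + i);
		/* d->addr is an offset into the umem, d->len the length */
		(void)d;
	}
	xsk_ring_cons__release(rx, rcvd);
	return rcvd;
}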

Comments and contemplations:

* The current behaviour is that the library loads an XDP program (if
  requested to do so) but the cleanup of this program is left to the
  application. It would be possible to implement this cleanup in the
  library, but it would require keeping state at the netdev level, of
  which there is none at the moment, and synchronizing that state
  between processes, all of which adds complexity. But when we get an
  XDP program per queue id, it becomes trivial to also remove the XDP
  program when the application exits. This proposal from Jesper,
  Björn and others will also improve the performance of libbpf, since
  most of the XDP program code can be removed when that feature is
  supported. A sketch of this application-side cleanup follows right
  after this item.
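
A minimal sketch of that cleanup, assuming the XDP program was
attached without special XDP flags; bpf_set_link_xdp_fd() is the
existing libbpf helper, the surrounding variables are illustrative:

#include <bpf/libbpf.h>		/* bpf_set_link_xdp_fd() */
#include <bpf/xsk.h>

/* Tear down in reverse order of creation and detach the XDP program
 * that xsk_socket__create() loaded, since the library does not. */
static void teardown_xsk(int ifindex, struct xsk_socket *xsk,
			 struct xsk_umem *umem)
{
	xsk_socket__delete(xsk);
	xsk_umem__delete(umem);
	bpf_set_link_xdp_fd(ifindex, -1, 0);	/* -1 removes the program */
}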

* In a future release, I am planning to add a higher-level data-plane
  interface as well. It will be based around recvmsg and sendmsg with
  struct iovec for batching, without the user having to know anything
  about the four underlying rings of an AF_XDP socket. There will be
  one semantic difference from the standard recvmsg, though: the
  kernel will fill in the iovecs instead of the application. But the
  rest should be the same as the libc versions so that application
  writers feel at home. A purely illustrative sketch follows below.
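
Nothing of this exists yet; the sketch below only shows what such a
call site could look like if it mirrored the libc recvmsg() signature,
with the kernel rather than the application filling in the iovecs.
Everything here is hypothetical.

#include <sys/socket.h>
#include <sys/uio.h>

/* Hypothetical batched receive: the application passes empty iovec
 * slots and the kernel would fill them in with pointers into the
 * umem, unlike regular recvmsg() where the application supplies the
 * buffers. 'xsk_fd' would be the AF_XDP socket file descriptor. */
static ssize_t xsk_recv_batch(int xsk_fd)
{
	struct iovec iov[64] = { { 0 } };
	struct msghdr msg = {
		.msg_iov = iov,
		.msg_iovlen = 64,	/* up to 64 frames per call */
	};

	return recvmsg(xsk_fd, &msg, MSG_DONTWAIT);
}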

Patch 1: adds AF_XDP support in libbpf
Patch 2: updates the xdpsock sample application to use the libbpf functions
Patch 3: Documentation update to help first time users

Changes v5 to v6:
  * Fixed prog_fd bug found by Xiaolong Ye. Thanks!
Changes v4 to v5:
  * Added a FAQ to the documentation
  * Removed xsk_umem__get_data and renamed xsk_umem__get_data_raw to
    xsk_umem__get_data
  * Replaced the netlink code with bpf_get_link_xdp_id()
  * Dynamic allocation of the map sizes. They are now sized according
    to the max number of queues on the netdev in question.
Changes v3 to v4:
  * Dropped the pr_*() patch in favor of Yonghong Song's patch set
  * Addressed the review comments of Daniel Borkmann, mainly leaking
    of file descriptors at clean up and making the data plane APIs
    all static inline (with the exception of xsk_umem__get_data that
    uses an internal structure I do not want to expose).
  * Fixed the netlink callback as suggested by Maciej Fijalkowski.
  * Removed an unnecessary include in the sample program as spotted by
    Ilia Fillipov.
Changes v2 to v3:
  * Added automatic loading of a simple XDP program that routes all
    traffic on a queue up to the AF_XDP socket. This program loading
    can be disabled.
  * Updated function names to be consistent with the libbpf naming
    convention
  * Moved all code to xsk.[ch]
  * Removed all the XDP program loading code from the sample since
    this is now done by libbpf
  * The initialization functions now return a handle as suggested by
    Alexei
  * const statements added in the API where applicable.
Changes v1 to v2:
  * Fixed cleanup of library state on error.
  * Moved API to initial version
  * Prefixed all public functions by xsk__ instead of xsk_
  * Added comment about changed default ring sizes, batch size and umem
    size in the sample application commit message
  * The library now only creates an Rx or Tx ring if the respective
    parameter is != NULL

Note that for zero-copy to work on FVL you need the following patch:
https://lore.kernel.org/netdev/1548770597-16141-1-git-send-email-magnus.karlsson@intel.com/.
For ixgbe, you need a similar patch, found here:
https://lore.kernel.org/netdev/CAJ8uoz1GJBmC0GFbURvEzY4kDZZ6C7O9+1F+gV0y=GOMGLobUQ@mail.gmail.com/.

I based this patch set on bpf-next commit 435b3ff5b08a ("bpf, seccomp: fix false positive preemption splat for cbpf->ebpf progs")

Thanks: Magnus

Magnus Karlsson (3):
  libbpf: add support for using AF_XDP sockets
  samples/bpf: convert xdpsock to use libbpf for AF_XDP access
  xsk: add FAQ to facilitate for first time users

 Documentation/networking/af_xdp.rst |  36 +-
 samples/bpf/Makefile                |   1 -
 samples/bpf/xdpsock.h               |  11 -
 samples/bpf/xdpsock_kern.c          |  56 ---
 samples/bpf/xdpsock_user.c          | 841 +++++++++++-------------------------
 tools/include/uapi/linux/ethtool.h  |  51 +++
 tools/include/uapi/linux/if_xdp.h   |  78 ++++
 tools/lib/bpf/Build                 |   2 +-
 tools/lib/bpf/Makefile              |   5 +-
 tools/lib/bpf/README.rst            |  15 +-
 tools/lib/bpf/libbpf.map            |   6 +
 tools/lib/bpf/xsk.c                 | 723 +++++++++++++++++++++++++++++++
 tools/lib/bpf/xsk.h                 | 203 +++++++++
 13 files changed, 1376 insertions(+), 652 deletions(-)
 delete mode 100644 samples/bpf/xdpsock.h
 delete mode 100644 samples/bpf/xdpsock_kern.c
 create mode 100644 tools/include/uapi/linux/ethtool.h
 create mode 100644 tools/include/uapi/linux/if_xdp.h
 create mode 100644 tools/lib/bpf/xsk.c
 create mode 100644 tools/lib/bpf/xsk.h

--
2.7.4

Comments

Daniel Borkmann Feb. 25, 2019, 10:59 p.m. UTC | #1
On 02/21/2019 10:21 AM, Magnus Karlsson wrote:
> [...]

Looks better, I've applied it to bpf-next, thanks!