[ovs-dev,PATCHv4] netdev-afxdp: Enable loading XDP program.
diff mbox series

Message ID 1573223355-104929-1-git-send-email-u9012063@gmail.com
State New
Headers show
Series
  • [ovs-dev,PATCHv4] netdev-afxdp: Enable loading XDP program.
Related show

Commit Message

William Tu Nov. 8, 2019, 2:29 p.m. UTC
Now netdev-afxdp always forwards all packets to userspace because
it is using libbpf's default XDP program, see 'xsk_load_xdp_prog'.
There are some cases when users want to keep packets in kernel instead
of sending to userspace, for example, management traffic such as SSH
should be processed in kernel.

The patch enables loading the user-provide XDP program by doing
  $ovs-vsctl -- set int afxdp-p0 options:xdp-obj=<path/to/xdp/obj>

So users can implement their filtering logic or traffic steering idea
in their XDP program, and rest of the traffic passes to AF_XDP socket
handled by OVS.

Signed-off-by: William Tu <u9012063@gmail.com>
---
v4:
    Feedbacks from Eelco.
    - First load the program, then configure xsk.
      Let API take care of xdp prog and map loading, don't set
      XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD.
    - When loading custom xdp, need to close(prog_fd) and close(map_fd)
      to release the resources
    - make sure prog and map is unloaded by bpftool.
    - update doc, afxdp.rst
    - Tested-at: https://travis-ci.org/williamtu/ovs-travis/builds/608986781

v3:
    Feedbacks from Eelco.
    - keep using xdpobj not xdp-obj (because we alread use xdpmode)
      or we change both to xdp-obj and xdp-mode?
    - log a info message when using external program for better debugging
    - combine some failure messages
    - update doc
    NEW:
    - add options:xdpobj=__default__, to set back to libbpf default prog
    - Tested-at: https://travis-ci.org/williamtu/ovs-travis/builds/606153231

v2:
    A couple fixes and remove RFC
---
 Documentation/intro/install/afxdp.rst |  59 +++++++++++++++++
 lib/netdev-afxdp.c                    | 121 +++++++++++++++++++++++++++++++---
 lib/netdev-linux-private.h            |   4 ++
 3 files changed, 176 insertions(+), 8 deletions(-)

Comments

Eelco Chaudron Nov. 12, 2019, 10:25 a.m. UTC | #1
See one remark below, however when I did a quick test with a program 
that would not load it goes into some re-try loop:

2019-11-12T10:13:21.658Z|01609|netdev_afxdp|INFO|eno1: Removing xdp 
program.
2019-11-12T10:13:21.658Z|01610|netdev_afxdp|INFO|Removed program ID: 0, 
fd: 0
2019-11-12T10:13:21.658Z|01611|netdev_afxdp|INFO|eno1: Setting XDP mode 
to DRV.
2019-11-12T10:13:22.224Z|01612|netdev_afxdp|INFO|eno1: Load custom XDP 
program at /root/xdp_pass_kern.o.
2019-11-12T10:13:22.274Z|01613|netdev_afxdp|ERR|xsk_socket__create 
failed (Resource temporarily unavailable) mode: DRV use-need-wakeup: 
true qid: 0
2019-11-12T10:13:22.300Z|01614|netdev_afxdp|ERR|Failed to create AF_XDP 
socket on queue 0.
2019-11-12T10:13:22.300Z|01615|netdev_afxdp|INFO|eno1: Removing xdp 
program.
2019-11-12T10:13:22.658Z|00047|ovs_rcu(urcu2)|WARN|blocked 1000 ms 
waiting for main to quiesce
2019-11-12T10:13:22.735Z|01616|netdev_afxdp|INFO|Removed program ID: 
320, fd: 0
2019-11-12T10:13:22.735Z|01617|netdev_afxdp|ERR|AF_XDP device eno1 
reconfig failed.
2019-11-12T10:13:22.735Z|01618|dpif_netdev|ERR|Failed to set interface 
eno1 new configuration
2019-11-12T10:13:22.735Z|01619|netdev_afxdp|INFO|FIXME: Device tapVM 
always use numa id 0.
2019-11-12T10:13:22.735Z|01620|dpif_netdev|INFO|Core 1 on numa node 0 
assigned port 'tapVM' rx queue 0 (measured processing cycles 0).
2019-11-12T10:13:22.735Z|01621|dpif|WARN|netdev@ovs-netdev: failed to 
add eno1 as port: Invalid argument
2019-11-12T10:13:22.735Z|01622|netdev_afxdp|INFO|eno1: Removing xdp 
program.
2019-11-12T10:13:22.736Z|01623|netdev_afxdp|INFO|Removed program ID: 0, 
fd: 0
2019-11-12T10:13:22.736Z|01624|timeval|WARN|Unreasonably long 1079ms 
poll interval (7ms user, 80ms system)
2019-11-12T10:13:22.736Z|01625|timeval|WARN|faults: 16387 minor, 0 major
2019-11-12T10:13:22.736Z|01626|timeval|WARN|context switches: 327 
voluntary, 337 involuntary
2019-11-12T10:13:22.738Z|01627|netdev_afxdp|INFO|eno1: Removing xdp 
program.
2019-11-12T10:13:22.738Z|01628|netdev_afxdp|INFO|Removed program ID: 0, 
fd: 0
2019-11-12T10:13:22.739Z|01629|netdev_afxdp|INFO|eno1: Setting XDP mode 
to DRV.
2019-11-12T10:13:23.312Z|01630|netdev_afxdp|INFO|eno1: Load custom XDP 
program at /root/xdp_pass_kern.o.
2019-11-12T10:13:23.363Z|01631|netdev_afxdp|ERR|xsk_socket__create 
failed (Resource temporarily unavailable) mode: DRV use-need-wakeup: 
true qid: 0
2019-11-12T10:13:23.392Z|01632|netdev_afxdp|ERR|Failed to create AF_XDP 
socket on queue 0.
2019-11-12T10:13:23.392Z|01633|netdev_afxdp|INFO|eno1: Removing xdp 
program.
2019-11-12T10:13:23.738Z|00048|ovs_rcu(urcu2)|WARN|blocked 1000 ms 
waiting for main to quiesce
2019-11-12T10:13:23.823Z|01634|netdev_afxdp|INFO|Removed program ID: 
324, fd: 0
2019-11-12T10:13:23.823Z|01635|netdev_afxdp|ERR|AF_XDP device eno1 
reconfig failed.
2019-11-12T10:13:23.823Z|01636|dpif_netdev|ERR|Failed to set interface 
eno1 new configuration
2019-11-12T10:13:23.823Z|01637|netdev_afxdp|INFO|FIXME: Device tapVM 
always use numa id 0.
2019-11-12T10:13:23.823Z|01638|dpif_netdev|INFO|Core 1 on numa node 0 
assigned port 'tapVM' rx queue 0 (measured processing cycles 0).
2019-11-12T10:13:23.823Z|01639|dpif|WARN|netdev@ovs-netdev: failed to 
add eno1 as port: Invalid argument

But in addition during this loop it’s not freeing it’s resources:

$  bpftool prog list && bpftool map
4: xdp  tag 80b55d8a76303785
	loaded_at 2019-11-12T05:11:15-0500  uid 0
	xlated 136B  jited 96B  memlock 4096B  map_ids 4
12: xdp  name xdp_prog_simple  tag 3b185187f1855c4c  gpl
	loaded_at 2019-11-12T05:11:58-0500  uid 0
	xlated 16B  jited 35B  memlock 4096B
16: xdp  name xdp_prog_simple  tag 3b185187f1855c4c  gpl
	loaded_at 2019-11-12T05:11:59-0500  uid 0
	xlated 16B  jited 35B  memlock 4096B
20: xdp  name xdp_prog_simple  tag 3b185187f1855c4c  gpl
	loaded_at 2019-11-12T05:12:00-0500  uid 0
	xlated 16B  jited 35B  memlock 4096B
…
…
120: (null)  name xsks_map  flags 0x0
	key 4B  value 4B  max_entries 32  memlock 4096B
122: (null)  name xsks_map  flags 0x0
	key 4B  value 4B  max_entries 32  memlock 4096B
124: (null)  name xsks_map  flags 0x0
	key 4B  value 4B  max_entries 32  memlock 4096B
126: (null)  name xsks_map  flags 0x0
	key 4B  value 4B  max_entries 32  memlock 4096B
128: (null)  name xsks_map  flags 0x0
	key 4B  value 4B  max_entries 32  memlock 4096B
130: (null)  name xsks_map  flags 0x0
	key 4B  value 4B  max_entries 32  memlock 4096B
132: (null)  name xsks_map  flags 0x0
	key 4B  value 4B  max_entries 32  memlock 4096B


On 8 Nov 2019, at 15:29, William Tu wrote:

> Now netdev-afxdp always forwards all packets to userspace because
> it is using libbpf's default XDP program, see 'xsk_load_xdp_prog'.
> There are some cases when users want to keep packets in kernel instead
> of sending to userspace, for example, management traffic such as SSH
> should be processed in kernel.
>
> The patch enables loading the user-provide XDP program by doing
>   $ovs-vsctl -- set int afxdp-p0 options:xdp-obj=<path/to/xdp/obj>
>
> So users can implement their filtering logic or traffic steering idea
> in their XDP program, and rest of the traffic passes to AF_XDP socket
> handled by OVS.
>
> Signed-off-by: William Tu <u9012063@gmail.com>
> ---
> v4:
>     Feedbacks from Eelco.
>     - First load the program, then configure xsk.
>       Let API take care of xdp prog and map loading, don't set
>       XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD.
>     - When loading custom xdp, need to close(prog_fd) and 
> close(map_fd)
>       to release the resources
>     - make sure prog and map is unloaded by bpftool.
>     - update doc, afxdp.rst
>     - Tested-at: 
> https://travis-ci.org/williamtu/ovs-travis/builds/608986781
>
> v3:
>     Feedbacks from Eelco.
>     - keep using xdpobj not xdp-obj (because we alread use xdpmode)
>       or we change both to xdp-obj and xdp-mode?
>     - log a info message when using external program for better 
> debugging
>     - combine some failure messages
>     - update doc
>     NEW:
>     - add options:xdpobj=__default__, to set back to libbpf default 
> prog
>     - Tested-at: 
> https://travis-ci.org/williamtu/ovs-travis/builds/606153231
>
> v2:
>     A couple fixes and remove RFC
> ---
>  Documentation/intro/install/afxdp.rst |  59 +++++++++++++++++
>  lib/netdev-afxdp.c                    | 121 
> +++++++++++++++++++++++++++++++---
>  lib/netdev-linux-private.h            |   4 ++
>  3 files changed, 176 insertions(+), 8 deletions(-)
>
> diff --git a/Documentation/intro/install/afxdp.rst 
> b/Documentation/intro/install/afxdp.rst
> index a136db0c950a..d95a85f39035 100644
> --- a/Documentation/intro/install/afxdp.rst
> +++ b/Documentation/intro/install/afxdp.rst
> @@ -273,6 +273,65 @@ Or, use OVS pmd tool::
>    ovs-appctl dpif-netdev/pmd-stats-show
>
>
> +Loading Custom XDP Program
> +--------------------------
> +By defailt, netdev-afxdp always forwards all packets to userspace 
> because
> +it is using libbpf's default XDP program. There are some cases when 
> users
> +want to keep packets in kernel instead of sending to userspace, for 
> example,
> +management traffic such as SSH should be processed in kernel. This 
> can be
> +done by loading the user-provided XDP program::
> +
> +  ovs-vsctl -- set int afxdp-p0 options:xdpobj=<path/to/xdp/obj>
> +
> +So users can implement their filtering logic or traffic steering idea
> +in their XDP program, and rest of the traffic passes to AF_XDP socket
> +handled by OVS. To set it back to default, use::
> +
> +  ovs-vsctl -- set int afxdp-p0 options:xdpobj=__default__

Do we really need this __default__ setting, can’t we just not do:

	ovs-vsctl remove int afxdp-p0 options xdpobj

If so the code below needs some updates…

Maybe we should also change this to xdp-obj based on Ilya’s change

> +Below is a sample C program compiled under kernel's samples/bpf/.
> +
> +.. code-block:: c
> +
> +  #include <uapi/linux/bpf.h>
> +  #include "bpf_helpers.h"
> +
> +  #if LINUX_VERSION_CODE < KERNEL_VERSION(5,3,0)
> +  /* Kernel version before 5.3 needed an additional map */
> +  struct bpf_map_def SEC("maps") qidconf_map = {
> +      .type = BPF_MAP_TYPE_ARRAY,
> +      .key_size = sizeof(int),
> +      .value_size = sizeof(int),
> +      .max_entries = 64,
> +  };
> +  #endif
> +
> +  /* OVS will associate map 'xsks_map' to xsk socket. */
> +  struct bpf_map_def SEC("maps") xsks_map = {
> +      .type = BPF_MAP_TYPE_XSKMAP,
> +      .key_size = sizeof(int),
> +      .value_size = sizeof(int),
> +      .max_entries = 32,
> +  };
> +
> +  SEC("xdp_sock")
> +  int xdp_sock_prog(struct xdp_md *ctx)
> +  {
> +      int index = ctx->rx_queue_index;
> +
> +      /* Customized by user.
> +       * For example
> +       * 1) filter out all SSH traffic and return XDP_PASS
> +       *    for kernel to process.
> +       * 2) Drop unwanted packet by returning XDP_DROP.
> +       */
> +
> +      /* Rest of packets goes to AF_XDP. */
> +      return bpf_redirect_map(&xsks_map, index, 0);
> +  }
> +  char _license[] SEC("license") = "GPL";
> +
> +
>  Example Script
>  --------------
>
> diff --git a/lib/netdev-afxdp.c b/lib/netdev-afxdp.c
> index af654d498a88..853eeb8a8dbe 100644
> --- a/lib/netdev-afxdp.c
> +++ b/lib/netdev-afxdp.c
> @@ -21,6 +21,7 @@
>  #include "netdev-afxdp.h"
>  #include "netdev-afxdp-pool.h"
>
> +#include <bpf/bpf.h>
>  #include <errno.h>
>  #include <inttypes.h>
>  #include <linux/rtnetlink.h>
> @@ -30,6 +31,7 @@
>  #include <stdlib.h>
>  #include <sys/resource.h>
>  #include <sys/socket.h>
> +#include <sys/stat.h>
>  #include <sys/types.h>
>  #include <unistd.h>
>
> @@ -88,9 +90,12 @@ BUILD_ASSERT_DECL(PROD_NUM_DESCS == 
> CONS_NUM_DESCS);
>
>  #define UMEM2DESC(elem, base) ((uint64_t)((char *)elem - (char 
> *)base))
>
> +#define LIBBPF_XDP_PROGRAM "__default__"
> +
>  static struct xsk_socket_info *xsk_configure(int ifindex, int 
> xdp_queue_id,
>                                               int mode, bool 
> use_need_wakeup);
> -static void xsk_remove_xdp_program(uint32_t ifindex, int xdpmode);
> +static void xsk_remove_xdp_program(uint32_t ifindex, int xdpmode,
> +                                   int prog_fd, int map_fd);
>  static void xsk_destroy(struct xsk_socket_info *xsk);
>  static int xsk_configure_all(struct netdev *netdev);
>  static void xsk_destroy_all(struct netdev *netdev);
> @@ -213,6 +218,23 @@ netdev_afxdp_sweep_unused_pools(void *aux 
> OVS_UNUSED)
>      ovs_mutex_unlock(&unused_pools_mutex);
>  }
>
> +static int
> +xsk_load_prog(const char *path, struct bpf_object **obj,
> +              int *prog_fd)
> +{
> +    struct bpf_prog_load_attr attr = {
> +        .prog_type = BPF_PROG_TYPE_XDP,
> +        .file = path,
> +    };
> +
> +    if (bpf_prog_load_xattr(&attr, obj, prog_fd)) {
> +        VLOG_ERR("Can't load XDP program at '%s'", path);
> +        return EINVAL;
> +    }
> +
> +    return 0;
> +}
> +
>  static struct xsk_umem_info *
>  xsk_configure_umem(void *buffer, uint64_t size, int xdpmode)
>  {
> @@ -420,6 +442,11 @@ xsk_configure_all(struct netdev *netdev)
>      struct netdev_linux *dev = netdev_linux_cast(netdev);
>      struct xsk_socket_info *xsk_info;
>      int i, ifindex, n_rxq, n_txq;
> +    struct bpf_object *obj;
> +    uint32_t prog_id = 0;
> +    int prog_fd = 0;
> +    int map_fd = 0;
> +    int ret;
>
>      ifindex = linux_get_ifindex(netdev_get_name(netdev));
>
> @@ -431,6 +458,34 @@ xsk_configure_all(struct netdev *netdev)
>
>      /* Configure each queue. */
>      for (i = 0; i < n_rxq; i++) {
> +        if (dev->xdpobj) {
> +            if (prog_fd == 0) {
> +                /* XDP program is per-netdev, so all queues share
> +                   the same XDP program. */
> +                ret = xsk_load_prog(dev->xdpobj, &obj, &prog_fd);
> +                if (ret) {
> +                    goto err;
> +                }
> +            }
> +            bpf_set_link_xdp_fd(ifindex, prog_fd, dev->xdpmode);
> +
> +            ret = bpf_get_link_xdp_id(ifindex, &prog_id, 
> dev->xdpmode);
> +            if (ret < 0) {
> +                VLOG_ERR("%s: Cannot get XDP prog id.",
> +                         netdev_get_name(netdev));
> +                goto err;
> +            }
> +            map_fd = bpf_object__find_map_fd_by_name(obj, 
> "xsks_map");
> +            if (map_fd < 0) {
> +                VLOG_ERR("%s: Cannot find \"xsks_map\".",
> +                         netdev_get_name(netdev));
> +                goto err;
> +            }
> +
> +            VLOG_INFO("%s: Load custom XDP program at %s.",
> +                      netdev_get_name(netdev), dev->xdpobj);
> +        }
> +
>          VLOG_DBG("%s: configure queue %d mode %s use-need-wakeup 
> %s.",
>                   netdev_get_name(netdev), i,
>                   dev->xdpmode == XDP_COPY ? "SKB" : "DRV",
> @@ -442,10 +497,13 @@ xsk_configure_all(struct netdev *netdev)
>              dev->xsks[i] = NULL;
>              goto err;
>          }
> +
>          dev->xsks[i] = xsk_info;
>          atomic_init(&xsk_info->tx_dropped, 0);
>          xsk_info->outstanding_tx = 0;
>          xsk_info->available_rx = PROD_NUM_DESCS;
> +        dev->prog_fd = prog_fd;
> +        dev->map_fd = map_fd;
>      }
>
>      n_txq = netdev_n_txq(netdev);
> @@ -510,7 +568,9 @@ xsk_destroy_all(struct netdev *netdev)
>
>      VLOG_INFO("%s: Removing xdp program.", netdev_get_name(netdev));
>      ifindex = linux_get_ifindex(netdev_get_name(netdev));
> -    xsk_remove_xdp_program(ifindex, dev->xdpmode);
> +    xsk_remove_xdp_program(ifindex, dev->xdpmode, dev->prog_fd, 
> dev->map_fd);
> +    dev->prog_fd = 0;
> +    dev->map_fd = 0;
>
>      if (dev->tx_locks) {
>          for (i = 0; i < netdev_n_txq(netdev); i++) {
> @@ -519,6 +579,8 @@ xsk_destroy_all(struct netdev *netdev)
>          free(dev->tx_locks);
>          dev->tx_locks = NULL;
>      }
> +    free(CONST_CAST(char *, dev->xdpobj));
> +    dev->xdpobj = NULL;
>  }
>
>  int
> @@ -527,8 +589,10 @@ netdev_afxdp_set_config(struct netdev *netdev, 
> const struct smap *args,
>  {
>      struct netdev_linux *dev = netdev_linux_cast(netdev);
>      const char *str_xdpmode;
> +    const char *str_xdpobj;
>      int xdpmode, new_n_rxq;
>      bool need_wakeup;
> +    struct stat s;
>
>      ovs_mutex_lock(&dev->mutex);
>      new_n_rxq = MAX(smap_get_int(args, "n_rxq", NR_QUEUE), 1);
> @@ -545,9 +609,9 @@ netdev_afxdp_set_config(struct netdev *netdev, 
> const struct smap *args,
>      } else if (!strcasecmp(str_xdpmode, "skb")) {
>          xdpmode = XDP_COPY;
>      } else {
> +        ovs_mutex_unlock(&dev->mutex);
>          VLOG_ERR("%s: Incorrect xdpmode (%s).",
>                   netdev_get_name(netdev), str_xdpmode);
> -        ovs_mutex_unlock(&dev->mutex);
>          return EINVAL;
>      }
>
> @@ -559,12 +623,30 @@ netdev_afxdp_set_config(struct netdev *netdev, 
> const struct smap *args,
>      }
>  #endif
>
> +    str_xdpobj = smap_get_def(args, "xdpobj", NULL);
> +    if (str_xdpobj) {
> +        if (!strcmp(str_xdpobj, LIBBPF_XDP_PROGRAM)) {
> +            str_xdpobj = NULL;
> +        } else if (stat(str_xdpobj, &s)) {
> +            ovs_mutex_unlock(&dev->mutex);
> +            VLOG_ERR("Invalid xdpobj '%s': %s.", str_xdpobj,
> +                     ovs_strerror(errno));
> +            return EINVAL;
> +        } else if (!S_ISREG(s.st_mode)) {
> +            ovs_mutex_unlock(&dev->mutex);
> +            VLOG_ERR("xdpobj '%s' is not a regular file.", 
> str_xdpobj);
> +            return EINVAL;
> +        }
> +    }
> +
>      if (dev->requested_n_rxq != new_n_rxq
>          || dev->requested_xdpmode != xdpmode
> -        || dev->requested_need_wakeup != need_wakeup) {
> +        || dev->requested_need_wakeup != need_wakeup
> +        || !nullable_string_is_equal(dev->requested_xdpobj, 
> str_xdpobj)) {
>          dev->requested_n_rxq = new_n_rxq;
>          dev->requested_xdpmode = xdpmode;
>          dev->requested_need_wakeup = need_wakeup;
> +        dev->requested_xdpobj = nullable_xstrdup(str_xdpobj);
>          netdev_request_reconfigure(netdev);
>      }
>      ovs_mutex_unlock(&dev->mutex);
> @@ -582,6 +664,8 @@ netdev_afxdp_get_config(const struct netdev 
> *netdev, struct smap *args)
>                      dev->xdpmode == XDP_ZEROCOPY ? "drv" : "skb");
>      smap_add_format(args, "use-need-wakeup", "%s",
>                      dev->use_need_wakeup ? "true" : "false");
> +    smap_add_format(args, "xdpobj", "%s",
> +                    dev->xdpobj ? dev->xdpobj : LIBBPF_XDP_PROGRAM);
>      ovs_mutex_unlock(&dev->mutex);
>      return 0;
>  }
> @@ -598,7 +682,8 @@ netdev_afxdp_reconfigure(struct netdev *netdev)
>      if (netdev->n_rxq == dev->requested_n_rxq
>          && dev->xdpmode == dev->requested_xdpmode
>          && dev->use_need_wakeup == dev->requested_need_wakeup
> -        && dev->xsks) {
> +        && dev->xsks
> +        && nullable_string_is_equal(dev->xdpobj, 
> dev->requested_xdpobj)) {
>          goto out;
>      }
>
> @@ -616,6 +701,8 @@ netdev_afxdp_reconfigure(struct netdev *netdev)
>      }
>      dev->use_need_wakeup = dev->requested_need_wakeup;
>
> +    dev->xdpobj = dev->requested_xdpobj;
> +
>      err = xsk_configure_all(netdev);
>      if (err) {
>          VLOG_ERR("AF_XDP device %s reconfig failed.", 
> netdev_get_name(netdev));
> @@ -638,9 +725,12 @@ netdev_afxdp_get_numa_id(const struct netdev 
> *netdev)
>  }
>
>  static void
> -xsk_remove_xdp_program(uint32_t ifindex, int xdpmode)
> +xsk_remove_xdp_program(uint32_t ifindex, int xdpmode,
> +                       int prog_fd, int map_fd)
>  {
>      uint32_t flags;
> +    uint32_t prog_id;
> +    int ret;
>
>      flags = XDP_FLAGS_UPDATE_IF_NOEXIST;
>
> @@ -650,7 +740,20 @@ xsk_remove_xdp_program(uint32_t ifindex, int 
> xdpmode)
>          flags |= XDP_FLAGS_DRV_MODE;
>      }
>
> -    bpf_set_link_xdp_fd(ifindex, -1, flags);
> +    if (prog_fd) {
> +        close(prog_fd);
> +    }
> +    if (map_fd) {
> +        close(map_fd);
> +    }
> +
> +    bpf_get_link_xdp_id(ifindex, &prog_id, flags);
> +    ret = bpf_set_link_xdp_fd(ifindex, -1, flags);
> +    if (ret) {
> +        VLOG_ERR("Link set xdp failed: %s\n", ovs_strerror(-ret));
> +    }
> +
> +    VLOG_INFO("Removed program ID: %d, fd: %d", prog_id, prog_fd);
>  }
>
>  void
> @@ -662,7 +765,7 @@ signal_remove_xdp(struct netdev *netdev)
>      ifindex = linux_get_ifindex(netdev_get_name(netdev));
>
>      VLOG_WARN("Force removing xdp program.");
> -    xsk_remove_xdp_program(ifindex, dev->xdpmode);
> +    xsk_remove_xdp_program(ifindex, dev->xdpmode, dev->prog_fd, 
> dev->map_fd);
>  }
>
>  static struct dp_packet_afxdp *
> @@ -1053,10 +1156,12 @@ netdev_afxdp_construct(struct netdev *netdev)
>      netdev->n_rxq = 0;
>      netdev->n_txq = 0;
>      dev->xdpmode = 0;
> +    dev->xdpobj = NULL;
>
>      dev->requested_n_rxq = NR_QUEUE;
>      dev->requested_xdpmode = XDP_COPY;
>      dev->requested_need_wakeup = NEED_WAKEUP_DEFAULT;
> +    dev->requested_xdpobj = NULL;
>
>      dev->xsks = NULL;
>      dev->tx_locks = NULL;
> diff --git a/lib/netdev-linux-private.h b/lib/netdev-linux-private.h
> index c14f2fb81bb0..ce258ca6215c 100644
> --- a/lib/netdev-linux-private.h
> +++ b/lib/netdev-linux-private.h
> @@ -105,6 +105,10 @@ struct netdev_linux {
>      bool use_need_wakeup;
>      bool requested_need_wakeup;
>      struct ovs_spin *tx_locks;  /* spin lock array for TX queues. */
> +    const char *xdpobj;         /* XDP object file path. */
> +    const char *requested_xdpobj;
> +    int prog_fd;
> +    int map_fd;
>  #endif
>  };
>
> -- 
> 2.7.4
William Tu Nov. 19, 2019, 6:51 p.m. UTC | #2
Hi Eelco,

Thanks for your testing.

On Tue, Nov 12, 2019 at 11:25:55AM +0100, Eelco Chaudron wrote:
> See one remark below, however when I did a quick test with a program that
> would not load it goes into some re-try loop:
> 
> 2019-11-12T10:13:21.658Z|01609|netdev_afxdp|INFO|eno1: Removing xdp program.
> 2019-11-12T10:13:21.658Z|01610|netdev_afxdp|INFO|Removed program ID: 0, fd:
> 0
> 2019-11-12T10:13:21.658Z|01611|netdev_afxdp|INFO|eno1: Setting XDP mode to
> DRV.
> 2019-11-12T10:13:22.224Z|01612|netdev_afxdp|INFO|eno1: Load custom XDP
> program at /root/xdp_pass_kern.o.
> 2019-11-12T10:13:22.274Z|01613|netdev_afxdp|ERR|xsk_socket__create failed
> (Resource temporarily unavailable) mode: DRV use-need-wakeup: true qid: 0

I couldn't reproduce this issue.
Also looking at xsk_socket__create, I couln't figure out how it returns
EAGAIN (Resource temporarily unavailable) in your case.

> 2019-11-12T10:13:22.300Z|01614|netdev_afxdp|ERR|Failed to create AF_XDP
> socket on queue 0.
> 2019-11-12T10:13:22.300Z|01615|netdev_afxdp|INFO|eno1: Removing xdp program.
> 2019-11-12T10:13:22.658Z|00047|ovs_rcu(urcu2)|WARN|blocked 1000 ms waiting
> for main to quiesce
> 2019-11-12T10:13:22.735Z|01616|netdev_afxdp|INFO|Removed program ID: 320,
> fd: 0
> 2019-11-12T10:13:22.735Z|01617|netdev_afxdp|ERR|AF_XDP device eno1 reconfig
> failed.
> 2019-11-12T10:13:22.735Z|01618|dpif_netdev|ERR|Failed to set interface eno1
> new configuration
> 2019-11-12T10:13:22.735Z|01619|netdev_afxdp|INFO|FIXME: Device tapVM always
> use numa id 0.
> 2019-11-12T10:13:22.735Z|01620|dpif_netdev|INFO|Core 1 on numa node 0
> assigned port 'tapVM' rx queue 0 (measured processing cycles 0).
> 2019-11-12T10:13:22.735Z|01621|dpif|WARN|netdev@ovs-netdev: failed to add
> eno1 as port: Invalid argument
> 2019-11-12T10:13:22.735Z|01622|netdev_afxdp|INFO|eno1: Removing xdp program.
> 2019-11-12T10:13:22.736Z|01623|netdev_afxdp|INFO|Removed program ID: 0, fd:
> 0
> 2019-11-12T10:13:22.736Z|01624|timeval|WARN|Unreasonably long 1079ms poll
> interval (7ms user, 80ms system)
> 2019-11-12T10:13:22.736Z|01625|timeval|WARN|faults: 16387 minor, 0 major
> 2019-11-12T10:13:22.736Z|01626|timeval|WARN|context switches: 327 voluntary,
> 337 involuntary
> 2019-11-12T10:13:22.738Z|01627|netdev_afxdp|INFO|eno1: Removing xdp program.
> 2019-11-12T10:13:22.738Z|01628|netdev_afxdp|INFO|Removed program ID: 0, fd:
> 0
> 2019-11-12T10:13:22.739Z|01629|netdev_afxdp|INFO|eno1: Setting XDP mode to
> DRV.
> 2019-11-12T10:13:23.312Z|01630|netdev_afxdp|INFO|eno1: Load custom XDP
> program at /root/xdp_pass_kern.o.
> 2019-11-12T10:13:23.363Z|01631|netdev_afxdp|ERR|xsk_socket__create failed
> (Resource temporarily unavailable) mode: DRV use-need-wakeup: true qid: 0
> 2019-11-12T10:13:23.392Z|01632|netdev_afxdp|ERR|Failed to create AF_XDP
> socket on queue 0.
> 2019-11-12T10:13:23.392Z|01633|netdev_afxdp|INFO|eno1: Removing xdp program.
> 2019-11-12T10:13:23.738Z|00048|ovs_rcu(urcu2)|WARN|blocked 1000 ms waiting
> for main to quiesce
> 2019-11-12T10:13:23.823Z|01634|netdev_afxdp|INFO|Removed program ID: 324,
> fd: 0
> 2019-11-12T10:13:23.823Z|01635|netdev_afxdp|ERR|AF_XDP device eno1 reconfig
> failed.
> 2019-11-12T10:13:23.823Z|01636|dpif_netdev|ERR|Failed to set interface eno1
> new configuration
> 2019-11-12T10:13:23.823Z|01637|netdev_afxdp|INFO|FIXME: Device tapVM always
> use numa id 0.
> 2019-11-12T10:13:23.823Z|01638|dpif_netdev|INFO|Core 1 on numa node 0
> assigned port 'tapVM' rx queue 0 (measured processing cycles 0).
> 2019-11-12T10:13:23.823Z|01639|dpif|WARN|netdev@ovs-netdev: failed to add
> eno1 as port: Invalid argument
> 
> But in addition during this loop it’s not freeing it’s resources:
> 
> $  bpftool prog list && bpftool map
> 4: xdp  tag 80b55d8a76303785
> 	loaded_at 2019-11-12T05:11:15-0500  uid 0
> 	xlated 136B  jited 96B  memlock 4096B  map_ids 4
> 12: xdp  name xdp_prog_simple  tag 3b185187f1855c4c  gpl
> 	loaded_at 2019-11-12T05:11:58-0500  uid 0
> 	xlated 16B  jited 35B  memlock 4096B
> 16: xdp  name xdp_prog_simple  tag 3b185187f1855c4c  gpl
> 	loaded_at 2019-11-12T05:11:59-0500  uid 0
> 	xlated 16B  jited 35B  memlock 4096B
> 20: xdp  name xdp_prog_simple  tag 3b185187f1855c4c  gpl
> 	loaded_at 2019-11-12T05:12:00-0500  uid 0
> 	xlated 16B  jited 35B  memlock 4096B
> …
> …
> 120: (null)  name xsks_map  flags 0x0
> 	key 4B  value 4B  max_entries 32  memlock 4096B
> 122: (null)  name xsks_map  flags 0x0
> 	key 4B  value 4B  max_entries 32  memlock 4096B
> 124: (null)  name xsks_map  flags 0x0
> 	key 4B  value 4B  max_entries 32  memlock 4096B
> 126: (null)  name xsks_map  flags 0x0
> 	key 4B  value 4B  max_entries 32  memlock 4096B
> 128: (null)  name xsks_map  flags 0x0
> 	key 4B  value 4B  max_entries 32  memlock 4096B
> 130: (null)  name xsks_map  flags 0x0
> 	key 4B  value 4B  max_entries 32  memlock 4096B
> 132: (null)  name xsks_map  flags 0x0
> 	key 4B  value 4B  max_entries 32  memlock 4096B
> 

And base on the above, the XDP program and map are already
loaded, so in xsk_socket__create(), it already reaches the
end:
708     if (!(xsk->config.libbpf_flags & XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD)) {
709         err = xsk_setup_xdp_prog(xsk);
710         if (err)
711             goto out_mmap_tx;

So I assume something wrong with xsk_setup_xdp_prog()?
I suspect bpf_get_link_xdp_id() returns EAGAIN?

I will keep debugging but if you have time, could you help
finding out why EAGAIN is returned? Thanks!

--William
William Tu Nov. 20, 2019, 8:17 p.m. UTC | #3
On Tue, Nov 19, 2019 at 10:51:01AM -0800, William Tu wrote:
> Hi Eelco,
> 
> Thanks for your testing.
> 
> On Tue, Nov 12, 2019 at 11:25:55AM +0100, Eelco Chaudron wrote:
> > See one remark below, however when I did a quick test with a program that
> > would not load it goes into some re-try loop:
> > 
> > 2019-11-12T10:13:21.658Z|01609|netdev_afxdp|INFO|eno1: Removing xdp program.
> > 2019-11-12T10:13:21.658Z|01610|netdev_afxdp|INFO|Removed program ID: 0, fd:
> > 0
> > 2019-11-12T10:13:21.658Z|01611|netdev_afxdp|INFO|eno1: Setting XDP mode to
> > DRV.
> > 2019-11-12T10:13:22.224Z|01612|netdev_afxdp|INFO|eno1: Load custom XDP
> > program at /root/xdp_pass_kern.o.
> > 2019-11-12T10:13:22.274Z|01613|netdev_afxdp|ERR|xsk_socket__create failed
> > (Resource temporarily unavailable) mode: DRV use-need-wakeup: true qid: 0
> 
> I couldn't reproduce this issue.
> Also looking at xsk_socket__create, I couln't figure out how it returns
> EAGAIN (Resource temporarily unavailable) in your case.
> 
> > 2019-11-12T10:13:22.300Z|01614|netdev_afxdp|ERR|Failed to create AF_XDP
> > socket on queue 0.
> > 2019-11-12T10:13:22.300Z|01615|netdev_afxdp|INFO|eno1: Removing xdp program.
> > 2019-11-12T10:13:22.658Z|00047|ovs_rcu(urcu2)|WARN|blocked 1000 ms waiting
> > for main to quiesce
> > 2019-11-12T10:13:22.735Z|01616|netdev_afxdp|INFO|Removed program ID: 320,
> > fd: 0
> > 2019-11-12T10:13:22.735Z|01617|netdev_afxdp|ERR|AF_XDP device eno1 reconfig
> > failed.
> > 2019-11-12T10:13:22.735Z|01618|dpif_netdev|ERR|Failed to set interface eno1
> > new configuration
> > 2019-11-12T10:13:22.735Z|01619|netdev_afxdp|INFO|FIXME: Device tapVM always
> > use numa id 0.
> > 2019-11-12T10:13:22.735Z|01620|dpif_netdev|INFO|Core 1 on numa node 0
> > assigned port 'tapVM' rx queue 0 (measured processing cycles 0).
> > 2019-11-12T10:13:22.735Z|01621|dpif|WARN|netdev@ovs-netdev: failed to add
> > eno1 as port: Invalid argument
> > 2019-11-12T10:13:22.735Z|01622|netdev_afxdp|INFO|eno1: Removing xdp program.
> > 2019-11-12T10:13:22.736Z|01623|netdev_afxdp|INFO|Removed program ID: 0, fd:
> > 0
> > 2019-11-12T10:13:22.736Z|01624|timeval|WARN|Unreasonably long 1079ms poll
> > interval (7ms user, 80ms system)
> > 2019-11-12T10:13:22.736Z|01625|timeval|WARN|faults: 16387 minor, 0 major
> > 2019-11-12T10:13:22.736Z|01626|timeval|WARN|context switches: 327 voluntary,
> > 337 involuntary
> > 2019-11-12T10:13:22.738Z|01627|netdev_afxdp|INFO|eno1: Removing xdp program.
> > 2019-11-12T10:13:22.738Z|01628|netdev_afxdp|INFO|Removed program ID: 0, fd:
> > 0
> > 2019-11-12T10:13:22.739Z|01629|netdev_afxdp|INFO|eno1: Setting XDP mode to
> > DRV.
> > 2019-11-12T10:13:23.312Z|01630|netdev_afxdp|INFO|eno1: Load custom XDP
> > program at /root/xdp_pass_kern.o.
> > 2019-11-12T10:13:23.363Z|01631|netdev_afxdp|ERR|xsk_socket__create failed
> > (Resource temporarily unavailable) mode: DRV use-need-wakeup: true qid: 0
> > 2019-11-12T10:13:23.392Z|01632|netdev_afxdp|ERR|Failed to create AF_XDP
> > socket on queue 0.
> > 2019-11-12T10:13:23.392Z|01633|netdev_afxdp|INFO|eno1: Removing xdp program.
> > 2019-11-12T10:13:23.738Z|00048|ovs_rcu(urcu2)|WARN|blocked 1000 ms waiting
> > for main to quiesce
> > 2019-11-12T10:13:23.823Z|01634|netdev_afxdp|INFO|Removed program ID: 324,
> > fd: 0
> > 2019-11-12T10:13:23.823Z|01635|netdev_afxdp|ERR|AF_XDP device eno1 reconfig
> > failed.
> > 2019-11-12T10:13:23.823Z|01636|dpif_netdev|ERR|Failed to set interface eno1
> > new configuration
> > 2019-11-12T10:13:23.823Z|01637|netdev_afxdp|INFO|FIXME: Device tapVM always
> > use numa id 0.
> > 2019-11-12T10:13:23.823Z|01638|dpif_netdev|INFO|Core 1 on numa node 0
> > assigned port 'tapVM' rx queue 0 (measured processing cycles 0).
> > 2019-11-12T10:13:23.823Z|01639|dpif|WARN|netdev@ovs-netdev: failed to add
> > eno1 as port: Invalid argument
> > 
> > But in addition during this loop it’s not freeing it’s resources:
> > 
> > $  bpftool prog list && bpftool map
> > 4: xdp  tag 80b55d8a76303785
> > 	loaded_at 2019-11-12T05:11:15-0500  uid 0
> > 	xlated 136B  jited 96B  memlock 4096B  map_ids 4
> > 12: xdp  name xdp_prog_simple  tag 3b185187f1855c4c  gpl
> > 	loaded_at 2019-11-12T05:11:58-0500  uid 0
> > 	xlated 16B  jited 35B  memlock 4096B
> > 16: xdp  name xdp_prog_simple  tag 3b185187f1855c4c  gpl
> > 	loaded_at 2019-11-12T05:11:59-0500  uid 0
> > 	xlated 16B  jited 35B  memlock 4096B
> > 20: xdp  name xdp_prog_simple  tag 3b185187f1855c4c  gpl
> > 	loaded_at 2019-11-12T05:12:00-0500  uid 0
> > 	xlated 16B  jited 35B  memlock 4096B
> > …
> > …
> > 120: (null)  name xsks_map  flags 0x0
> > 	key 4B  value 4B  max_entries 32  memlock 4096B
> > 122: (null)  name xsks_map  flags 0x0
> > 	key 4B  value 4B  max_entries 32  memlock 4096B
> > 124: (null)  name xsks_map  flags 0x0
> > 	key 4B  value 4B  max_entries 32  memlock 4096B
> > 126: (null)  name xsks_map  flags 0x0
> > 	key 4B  value 4B  max_entries 32  memlock 4096B
> > 128: (null)  name xsks_map  flags 0x0
> > 	key 4B  value 4B  max_entries 32  memlock 4096B
> > 130: (null)  name xsks_map  flags 0x0
> > 	key 4B  value 4B  max_entries 32  memlock 4096B
> > 132: (null)  name xsks_map  flags 0x0
> > 	key 4B  value 4B  max_entries 32  memlock 4096B
> > 
> 
> And base on the above, the XDP program and map are already
> loaded, so in xsk_socket__create(), it already reaches the
> end:
> 708     if (!(xsk->config.libbpf_flags & XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD)) {
> 709         err = xsk_setup_xdp_prog(xsk);
> 710         if (err)
> 711             goto out_mmap_tx;
> 
> So I assume something wrong with xsk_setup_xdp_prog()?
> I suspect bpf_get_link_xdp_id() returns EAGAIN?
> 
> I will keep debugging but if you have time, could you help
> finding out why EAGAIN is returned? Thanks!
> 
> --William

Thanks for your help. I've found the root cause.
At the error handling path, the prog_fd and map_fd is not
saved, so xsk_destroy_all() does not clean them properly.

Fixed it by diff below, I will send out new version.
William

diff --git a/lib/netdev-afxdp.c b/lib/netdev-afxdp.c
index e079cd33bc9a..e9b8a43bb7c0 100644
--- a/lib/netdev-afxdp.c
+++ b/lib/netdev-afxdp.c
@@ -484,6 +484,8 @@ xsk_configure_all(struct netdev *netdev)
 
             VLOG_INFO("%s: Load custom XDP program at %s.",
                       netdev_get_name(netdev), dev->xdpobj);
+            dev->prog_fd = prog_fd;
+            dev->map_fd = map_fd;
         }
 
         VLOG_DBG("%s: configure queue %d mode %s use-need-wakeup %s.",
@@ -502,8 +504,6 @@ xsk_configure_all(struct netdev *netdev)
         atomic_init(&xsk_info->tx_dropped, 0);
         xsk_info->outstanding_tx = 0;
         xsk_info->available_rx = PROD_NUM_DESCS;
-        dev->prog_fd = prog_fd;
-        dev->map_fd = map_fd;
     }
 
     n_txq = netdev_n_txq(netdev);
Eelco Chaudron Nov. 21, 2019, 10:40 a.m. UTC | #4
On 20 Nov 2019, at 21:17, William Tu wrote:

> On Tue, Nov 19, 2019 at 10:51:01AM -0800, William Tu wrote:
>> Hi Eelco,
>>
>> Thanks for your testing.
>>
>> On Tue, Nov 12, 2019 at 11:25:55AM +0100, Eelco Chaudron wrote:
>>> See one remark below, however when I did a quick test with a program 
>>> that
>>> would not load it goes into some re-try loop:
>>>
>>> 2019-11-12T10:13:21.658Z|01609|netdev_afxdp|INFO|eno1: Removing xdp 
>>> program.
>>> 2019-11-12T10:13:21.658Z|01610|netdev_afxdp|INFO|Removed program ID: 
>>> 0, fd:
>>> 0
>>> 2019-11-12T10:13:21.658Z|01611|netdev_afxdp|INFO|eno1: Setting XDP 
>>> mode to
>>> DRV.
>>> 2019-11-12T10:13:22.224Z|01612|netdev_afxdp|INFO|eno1: Load custom 
>>> XDP
>>> program at /root/xdp_pass_kern.o.
>>> 2019-11-12T10:13:22.274Z|01613|netdev_afxdp|ERR|xsk_socket__create 
>>> failed
>>> (Resource temporarily unavailable) mode: DRV use-need-wakeup: true 
>>> qid: 0
>>
>> I couldn't reproduce this issue.
>> Also looking at xsk_socket__create, I couln't figure out how it 
>> returns
>> EAGAIN (Resource temporarily unavailable) in your case.
>>
>>> 2019-11-12T10:13:22.300Z|01614|netdev_afxdp|ERR|Failed to create 
>>> AF_XDP
>>> socket on queue 0.
>>> 2019-11-12T10:13:22.300Z|01615|netdev_afxdp|INFO|eno1: Removing xdp 
>>> program.
>>> 2019-11-12T10:13:22.658Z|00047|ovs_rcu(urcu2)|WARN|blocked 1000 ms 
>>> waiting
>>> for main to quiesce
>>> 2019-11-12T10:13:22.735Z|01616|netdev_afxdp|INFO|Removed program ID: 
>>> 320,
>>> fd: 0
>>> 2019-11-12T10:13:22.735Z|01617|netdev_afxdp|ERR|AF_XDP device eno1 
>>> reconfig
>>> failed.
>>> 2019-11-12T10:13:22.735Z|01618|dpif_netdev|ERR|Failed to set 
>>> interface eno1
>>> new configuration
>>> 2019-11-12T10:13:22.735Z|01619|netdev_afxdp|INFO|FIXME: Device tapVM 
>>> always
>>> use numa id 0.
>>> 2019-11-12T10:13:22.735Z|01620|dpif_netdev|INFO|Core 1 on numa node 
>>> 0
>>> assigned port 'tapVM' rx queue 0 (measured processing cycles 0).
>>> 2019-11-12T10:13:22.735Z|01621|dpif|WARN|netdev@ovs-netdev: failed 
>>> to add
>>> eno1 as port: Invalid argument
>>> 2019-11-12T10:13:22.735Z|01622|netdev_afxdp|INFO|eno1: Removing xdp 
>>> program.
>>> 2019-11-12T10:13:22.736Z|01623|netdev_afxdp|INFO|Removed program ID: 
>>> 0, fd:
>>> 0
>>> 2019-11-12T10:13:22.736Z|01624|timeval|WARN|Unreasonably long 1079ms 
>>> poll
>>> interval (7ms user, 80ms system)
>>> 2019-11-12T10:13:22.736Z|01625|timeval|WARN|faults: 16387 minor, 0 
>>> major
>>> 2019-11-12T10:13:22.736Z|01626|timeval|WARN|context switches: 327 
>>> voluntary,
>>> 337 involuntary
>>> 2019-11-12T10:13:22.738Z|01627|netdev_afxdp|INFO|eno1: Removing xdp 
>>> program.
>>> 2019-11-12T10:13:22.738Z|01628|netdev_afxdp|INFO|Removed program ID: 
>>> 0, fd:
>>> 0
>>> 2019-11-12T10:13:22.739Z|01629|netdev_afxdp|INFO|eno1: Setting XDP 
>>> mode to
>>> DRV.
>>> 2019-11-12T10:13:23.312Z|01630|netdev_afxdp|INFO|eno1: Load custom 
>>> XDP
>>> program at /root/xdp_pass_kern.o.
>>> 2019-11-12T10:13:23.363Z|01631|netdev_afxdp|ERR|xsk_socket__create 
>>> failed
>>> (Resource temporarily unavailable) mode: DRV use-need-wakeup: true 
>>> qid: 0
>>> 2019-11-12T10:13:23.392Z|01632|netdev_afxdp|ERR|Failed to create 
>>> AF_XDP
>>> socket on queue 0.
>>> 2019-11-12T10:13:23.392Z|01633|netdev_afxdp|INFO|eno1: Removing xdp 
>>> program.
>>> 2019-11-12T10:13:23.738Z|00048|ovs_rcu(urcu2)|WARN|blocked 1000 ms 
>>> waiting
>>> for main to quiesce
>>> 2019-11-12T10:13:23.823Z|01634|netdev_afxdp|INFO|Removed program ID: 
>>> 324,
>>> fd: 0
>>> 2019-11-12T10:13:23.823Z|01635|netdev_afxdp|ERR|AF_XDP device eno1 
>>> reconfig
>>> failed.
>>> 2019-11-12T10:13:23.823Z|01636|dpif_netdev|ERR|Failed to set 
>>> interface eno1
>>> new configuration
>>> 2019-11-12T10:13:23.823Z|01637|netdev_afxdp|INFO|FIXME: Device tapVM 
>>> always
>>> use numa id 0.
>>> 2019-11-12T10:13:23.823Z|01638|dpif_netdev|INFO|Core 1 on numa node 
>>> 0
>>> assigned port 'tapVM' rx queue 0 (measured processing cycles 0).
>>> 2019-11-12T10:13:23.823Z|01639|dpif|WARN|netdev@ovs-netdev: failed 
>>> to add
>>> eno1 as port: Invalid argument
>>>
>>> But in addition during this loop it’s not freeing it’s 
>>> resources:
>>>
>>> $  bpftool prog list && bpftool map
>>> 4: xdp  tag 80b55d8a76303785
>>> 	loaded_at 2019-11-12T05:11:15-0500  uid 0
>>> 	xlated 136B  jited 96B  memlock 4096B  map_ids 4
>>> 12: xdp  name xdp_prog_simple  tag 3b185187f1855c4c  gpl
>>> 	loaded_at 2019-11-12T05:11:58-0500  uid 0
>>> 	xlated 16B  jited 35B  memlock 4096B
>>> 16: xdp  name xdp_prog_simple  tag 3b185187f1855c4c  gpl
>>> 	loaded_at 2019-11-12T05:11:59-0500  uid 0
>>> 	xlated 16B  jited 35B  memlock 4096B
>>> 20: xdp  name xdp_prog_simple  tag 3b185187f1855c4c  gpl
>>> 	loaded_at 2019-11-12T05:12:00-0500  uid 0
>>> 	xlated 16B  jited 35B  memlock 4096B
>>> …
>>> …
>>> 120: (null)  name xsks_map  flags 0x0
>>> 	key 4B  value 4B  max_entries 32  memlock 4096B
>>> 122: (null)  name xsks_map  flags 0x0
>>> 	key 4B  value 4B  max_entries 32  memlock 4096B
>>> 124: (null)  name xsks_map  flags 0x0
>>> 	key 4B  value 4B  max_entries 32  memlock 4096B
>>> 126: (null)  name xsks_map  flags 0x0
>>> 	key 4B  value 4B  max_entries 32  memlock 4096B
>>> 128: (null)  name xsks_map  flags 0x0
>>> 	key 4B  value 4B  max_entries 32  memlock 4096B
>>> 130: (null)  name xsks_map  flags 0x0
>>> 	key 4B  value 4B  max_entries 32  memlock 4096B
>>> 132: (null)  name xsks_map  flags 0x0
>>> 	key 4B  value 4B  max_entries 32  memlock 4096B
>>>
>>
>> And base on the above, the XDP program and map are already
>> loaded, so in xsk_socket__create(), it already reaches the
>> end:
>> 708     if (!(xsk->config.libbpf_flags & 
>> XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD)) {
>> 709         err = xsk_setup_xdp_prog(xsk);
>> 710         if (err)
>> 711             goto out_mmap_tx;
>>
>> So I assume something wrong with xsk_setup_xdp_prog()?
>> I suspect bpf_get_link_xdp_id() returns EAGAIN?
>>
>> I will keep debugging but if you have time, could you help
>> finding out why EAGAIN is returned? Thanks!
>>
>> --William
>
> Thanks for your help. I've found the root cause.
> At the error handling path, the prog_fd and map_fd is not
> saved, so xsk_destroy_all() does not clean them properly.
>
> Fixed it by diff below, I will send out new version.
> William
>
> diff --git a/lib/netdev-afxdp.c b/lib/netdev-afxdp.c
> index e079cd33bc9a..e9b8a43bb7c0 100644
> --- a/lib/netdev-afxdp.c
> +++ b/lib/netdev-afxdp.c
> @@ -484,6 +484,8 @@ xsk_configure_all(struct netdev *netdev)
>
>              VLOG_INFO("%s: Load custom XDP program at %s.",
>                        netdev_get_name(netdev), dev->xdpobj);
> +            dev->prog_fd = prog_fd;
> +            dev->map_fd = map_fd;
>          }
>
>          VLOG_DBG("%s: configure queue %d mode %s use-need-wakeup 
> %s.",
> @@ -502,8 +504,6 @@ xsk_configure_all(struct netdev *netdev)
>          atomic_init(&xsk_info->tx_dropped, 0);
>          xsk_info->outstanding_tx = 0;
>          xsk_info->available_rx = PROD_NUM_DESCS;
> -        dev->prog_fd = prog_fd;
> -        dev->map_fd = map_fd;
>      }
>
>      n_txq = netdev_n_txq(netdev);

Cool, thanks!

//Eelco

Patch
diff mbox series

diff --git a/Documentation/intro/install/afxdp.rst b/Documentation/intro/install/afxdp.rst
index a136db0c950a..d95a85f39035 100644
--- a/Documentation/intro/install/afxdp.rst
+++ b/Documentation/intro/install/afxdp.rst
@@ -273,6 +273,65 @@  Or, use OVS pmd tool::
   ovs-appctl dpif-netdev/pmd-stats-show
 
 
+Loading Custom XDP Program
+--------------------------
+By defailt, netdev-afxdp always forwards all packets to userspace because
+it is using libbpf's default XDP program. There are some cases when users
+want to keep packets in kernel instead of sending to userspace, for example,
+management traffic such as SSH should be processed in kernel. This can be
+done by loading the user-provided XDP program::
+
+  ovs-vsctl -- set int afxdp-p0 options:xdpobj=<path/to/xdp/obj>
+
+So users can implement their filtering logic or traffic steering idea
+in their XDP program, and rest of the traffic passes to AF_XDP socket
+handled by OVS. To set it back to default, use::
+
+  ovs-vsctl -- set int afxdp-p0 options:xdpobj=__default__
+
+Below is a sample C program compiled under kernel's samples/bpf/.
+
+.. code-block:: c
+
+  #include <uapi/linux/bpf.h>
+  #include "bpf_helpers.h"
+
+  #if LINUX_VERSION_CODE < KERNEL_VERSION(5,3,0)
+  /* Kernel version before 5.3 needed an additional map */
+  struct bpf_map_def SEC("maps") qidconf_map = {
+      .type = BPF_MAP_TYPE_ARRAY,
+      .key_size = sizeof(int),
+      .value_size = sizeof(int),
+      .max_entries = 64,
+  };
+  #endif
+
+  /* OVS will associate map 'xsks_map' to xsk socket. */
+  struct bpf_map_def SEC("maps") xsks_map = {
+      .type = BPF_MAP_TYPE_XSKMAP,
+      .key_size = sizeof(int),
+      .value_size = sizeof(int),
+      .max_entries = 32,
+  };
+
+  SEC("xdp_sock")
+  int xdp_sock_prog(struct xdp_md *ctx)
+  {
+      int index = ctx->rx_queue_index;
+
+      /* Customized by user.
+       * For example
+       * 1) filter out all SSH traffic and return XDP_PASS
+       *    for kernel to process.
+       * 2) Drop unwanted packet by returning XDP_DROP.
+       */
+
+      /* Rest of packets goes to AF_XDP. */
+      return bpf_redirect_map(&xsks_map, index, 0);
+  }
+  char _license[] SEC("license") = "GPL";
+
+
 Example Script
 --------------
 
diff --git a/lib/netdev-afxdp.c b/lib/netdev-afxdp.c
index af654d498a88..853eeb8a8dbe 100644
--- a/lib/netdev-afxdp.c
+++ b/lib/netdev-afxdp.c
@@ -21,6 +21,7 @@ 
 #include "netdev-afxdp.h"
 #include "netdev-afxdp-pool.h"
 
+#include <bpf/bpf.h>
 #include <errno.h>
 #include <inttypes.h>
 #include <linux/rtnetlink.h>
@@ -30,6 +31,7 @@ 
 #include <stdlib.h>
 #include <sys/resource.h>
 #include <sys/socket.h>
+#include <sys/stat.h>
 #include <sys/types.h>
 #include <unistd.h>
 
@@ -88,9 +90,12 @@  BUILD_ASSERT_DECL(PROD_NUM_DESCS == CONS_NUM_DESCS);
 
 #define UMEM2DESC(elem, base) ((uint64_t)((char *)elem - (char *)base))
 
+#define LIBBPF_XDP_PROGRAM "__default__"
+
 static struct xsk_socket_info *xsk_configure(int ifindex, int xdp_queue_id,
                                              int mode, bool use_need_wakeup);
-static void xsk_remove_xdp_program(uint32_t ifindex, int xdpmode);
+static void xsk_remove_xdp_program(uint32_t ifindex, int xdpmode,
+                                   int prog_fd, int map_fd);
 static void xsk_destroy(struct xsk_socket_info *xsk);
 static int xsk_configure_all(struct netdev *netdev);
 static void xsk_destroy_all(struct netdev *netdev);
@@ -213,6 +218,23 @@  netdev_afxdp_sweep_unused_pools(void *aux OVS_UNUSED)
     ovs_mutex_unlock(&unused_pools_mutex);
 }
 
+static int
+xsk_load_prog(const char *path, struct bpf_object **obj,
+              int *prog_fd)
+{
+    struct bpf_prog_load_attr attr = {
+        .prog_type = BPF_PROG_TYPE_XDP,
+        .file = path,
+    };
+
+    if (bpf_prog_load_xattr(&attr, obj, prog_fd)) {
+        VLOG_ERR("Can't load XDP program at '%s'", path);
+        return EINVAL;
+    }
+
+    return 0;
+}
+
 static struct xsk_umem_info *
 xsk_configure_umem(void *buffer, uint64_t size, int xdpmode)
 {
@@ -420,6 +442,11 @@  xsk_configure_all(struct netdev *netdev)
     struct netdev_linux *dev = netdev_linux_cast(netdev);
     struct xsk_socket_info *xsk_info;
     int i, ifindex, n_rxq, n_txq;
+    struct bpf_object *obj;
+    uint32_t prog_id = 0;
+    int prog_fd = 0;
+    int map_fd = 0;
+    int ret;
 
     ifindex = linux_get_ifindex(netdev_get_name(netdev));
 
@@ -431,6 +458,34 @@  xsk_configure_all(struct netdev *netdev)
 
     /* Configure each queue. */
     for (i = 0; i < n_rxq; i++) {
+        if (dev->xdpobj) {
+            if (prog_fd == 0) {
+                /* XDP program is per-netdev, so all queues share
+                   the same XDP program. */
+                ret = xsk_load_prog(dev->xdpobj, &obj, &prog_fd);
+                if (ret) {
+                    goto err;
+                }
+            }
+            bpf_set_link_xdp_fd(ifindex, prog_fd, dev->xdpmode);
+
+            ret = bpf_get_link_xdp_id(ifindex, &prog_id, dev->xdpmode);
+            if (ret < 0) {
+                VLOG_ERR("%s: Cannot get XDP prog id.",
+                         netdev_get_name(netdev));
+                goto err;
+            }
+            map_fd = bpf_object__find_map_fd_by_name(obj, "xsks_map");
+            if (map_fd < 0) {
+                VLOG_ERR("%s: Cannot find \"xsks_map\".",
+                         netdev_get_name(netdev));
+                goto err;
+            }
+
+            VLOG_INFO("%s: Load custom XDP program at %s.",
+                      netdev_get_name(netdev), dev->xdpobj);
+        }
+
         VLOG_DBG("%s: configure queue %d mode %s use-need-wakeup %s.",
                  netdev_get_name(netdev), i,
                  dev->xdpmode == XDP_COPY ? "SKB" : "DRV",
@@ -442,10 +497,13 @@  xsk_configure_all(struct netdev *netdev)
             dev->xsks[i] = NULL;
             goto err;
         }
+
         dev->xsks[i] = xsk_info;
         atomic_init(&xsk_info->tx_dropped, 0);
         xsk_info->outstanding_tx = 0;
         xsk_info->available_rx = PROD_NUM_DESCS;
+        dev->prog_fd = prog_fd;
+        dev->map_fd = map_fd;
     }
 
     n_txq = netdev_n_txq(netdev);
@@ -510,7 +568,9 @@  xsk_destroy_all(struct netdev *netdev)
 
     VLOG_INFO("%s: Removing xdp program.", netdev_get_name(netdev));
     ifindex = linux_get_ifindex(netdev_get_name(netdev));
-    xsk_remove_xdp_program(ifindex, dev->xdpmode);
+    xsk_remove_xdp_program(ifindex, dev->xdpmode, dev->prog_fd, dev->map_fd);
+    dev->prog_fd = 0;
+    dev->map_fd = 0;
 
     if (dev->tx_locks) {
         for (i = 0; i < netdev_n_txq(netdev); i++) {
@@ -519,6 +579,8 @@  xsk_destroy_all(struct netdev *netdev)
         free(dev->tx_locks);
         dev->tx_locks = NULL;
     }
+    free(CONST_CAST(char *, dev->xdpobj));
+    dev->xdpobj = NULL;
 }
 
 int
@@ -527,8 +589,10 @@  netdev_afxdp_set_config(struct netdev *netdev, const struct smap *args,
 {
     struct netdev_linux *dev = netdev_linux_cast(netdev);
     const char *str_xdpmode;
+    const char *str_xdpobj;
     int xdpmode, new_n_rxq;
     bool need_wakeup;
+    struct stat s;
 
     ovs_mutex_lock(&dev->mutex);
     new_n_rxq = MAX(smap_get_int(args, "n_rxq", NR_QUEUE), 1);
@@ -545,9 +609,9 @@  netdev_afxdp_set_config(struct netdev *netdev, const struct smap *args,
     } else if (!strcasecmp(str_xdpmode, "skb")) {
         xdpmode = XDP_COPY;
     } else {
+        ovs_mutex_unlock(&dev->mutex);
         VLOG_ERR("%s: Incorrect xdpmode (%s).",
                  netdev_get_name(netdev), str_xdpmode);
-        ovs_mutex_unlock(&dev->mutex);
         return EINVAL;
     }
 
@@ -559,12 +623,30 @@  netdev_afxdp_set_config(struct netdev *netdev, const struct smap *args,
     }
 #endif
 
+    str_xdpobj = smap_get_def(args, "xdpobj", NULL);
+    if (str_xdpobj) {
+        if (!strcmp(str_xdpobj, LIBBPF_XDP_PROGRAM)) {
+            str_xdpobj = NULL;
+        } else if (stat(str_xdpobj, &s)) {
+            ovs_mutex_unlock(&dev->mutex);
+            VLOG_ERR("Invalid xdpobj '%s': %s.", str_xdpobj,
+                     ovs_strerror(errno));
+            return EINVAL;
+        } else if (!S_ISREG(s.st_mode)) {
+            ovs_mutex_unlock(&dev->mutex);
+            VLOG_ERR("xdpobj '%s' is not a regular file.", str_xdpobj);
+            return EINVAL;
+        }
+    }
+
     if (dev->requested_n_rxq != new_n_rxq
         || dev->requested_xdpmode != xdpmode
-        || dev->requested_need_wakeup != need_wakeup) {
+        || dev->requested_need_wakeup != need_wakeup
+        || !nullable_string_is_equal(dev->requested_xdpobj, str_xdpobj)) {
         dev->requested_n_rxq = new_n_rxq;
         dev->requested_xdpmode = xdpmode;
         dev->requested_need_wakeup = need_wakeup;
+        dev->requested_xdpobj = nullable_xstrdup(str_xdpobj);
         netdev_request_reconfigure(netdev);
     }
     ovs_mutex_unlock(&dev->mutex);
@@ -582,6 +664,8 @@  netdev_afxdp_get_config(const struct netdev *netdev, struct smap *args)
                     dev->xdpmode == XDP_ZEROCOPY ? "drv" : "skb");
     smap_add_format(args, "use-need-wakeup", "%s",
                     dev->use_need_wakeup ? "true" : "false");
+    smap_add_format(args, "xdpobj", "%s",
+                    dev->xdpobj ? dev->xdpobj : LIBBPF_XDP_PROGRAM);
     ovs_mutex_unlock(&dev->mutex);
     return 0;
 }
@@ -598,7 +682,8 @@  netdev_afxdp_reconfigure(struct netdev *netdev)
     if (netdev->n_rxq == dev->requested_n_rxq
         && dev->xdpmode == dev->requested_xdpmode
         && dev->use_need_wakeup == dev->requested_need_wakeup
-        && dev->xsks) {
+        && dev->xsks
+        && nullable_string_is_equal(dev->xdpobj, dev->requested_xdpobj)) {
         goto out;
     }
 
@@ -616,6 +701,8 @@  netdev_afxdp_reconfigure(struct netdev *netdev)
     }
     dev->use_need_wakeup = dev->requested_need_wakeup;
 
+    dev->xdpobj = dev->requested_xdpobj;
+
     err = xsk_configure_all(netdev);
     if (err) {
         VLOG_ERR("AF_XDP device %s reconfig failed.", netdev_get_name(netdev));
@@ -638,9 +725,12 @@  netdev_afxdp_get_numa_id(const struct netdev *netdev)
 }
 
 static void
-xsk_remove_xdp_program(uint32_t ifindex, int xdpmode)
+xsk_remove_xdp_program(uint32_t ifindex, int xdpmode,
+                       int prog_fd, int map_fd)
 {
     uint32_t flags;
+    uint32_t prog_id;
+    int ret;
 
     flags = XDP_FLAGS_UPDATE_IF_NOEXIST;
 
@@ -650,7 +740,20 @@  xsk_remove_xdp_program(uint32_t ifindex, int xdpmode)
         flags |= XDP_FLAGS_DRV_MODE;
     }
 
-    bpf_set_link_xdp_fd(ifindex, -1, flags);
+    if (prog_fd) {
+        close(prog_fd);
+    }
+    if (map_fd) {
+        close(map_fd);
+    }
+
+    bpf_get_link_xdp_id(ifindex, &prog_id, flags);
+    ret = bpf_set_link_xdp_fd(ifindex, -1, flags);
+    if (ret) {
+        VLOG_ERR("Link set xdp failed: %s\n", ovs_strerror(-ret));
+    }
+
+    VLOG_INFO("Removed program ID: %d, fd: %d", prog_id, prog_fd);
 }
 
 void
@@ -662,7 +765,7 @@  signal_remove_xdp(struct netdev *netdev)
     ifindex = linux_get_ifindex(netdev_get_name(netdev));
 
     VLOG_WARN("Force removing xdp program.");
-    xsk_remove_xdp_program(ifindex, dev->xdpmode);
+    xsk_remove_xdp_program(ifindex, dev->xdpmode, dev->prog_fd, dev->map_fd);
 }
 
 static struct dp_packet_afxdp *
@@ -1053,10 +1156,12 @@  netdev_afxdp_construct(struct netdev *netdev)
     netdev->n_rxq = 0;
     netdev->n_txq = 0;
     dev->xdpmode = 0;
+    dev->xdpobj = NULL;
 
     dev->requested_n_rxq = NR_QUEUE;
     dev->requested_xdpmode = XDP_COPY;
     dev->requested_need_wakeup = NEED_WAKEUP_DEFAULT;
+    dev->requested_xdpobj = NULL;
 
     dev->xsks = NULL;
     dev->tx_locks = NULL;
diff --git a/lib/netdev-linux-private.h b/lib/netdev-linux-private.h
index c14f2fb81bb0..ce258ca6215c 100644
--- a/lib/netdev-linux-private.h
+++ b/lib/netdev-linux-private.h
@@ -105,6 +105,10 @@  struct netdev_linux {
     bool use_need_wakeup;
     bool requested_need_wakeup;
     struct ovs_spin *tx_locks;  /* spin lock array for TX queues. */
+    const char *xdpobj;         /* XDP object file path. */
+    const char *requested_xdpobj;
+    int prog_fd;
+    int map_fd;
 #endif
 };