Message ID | 1579211295-97690-1-git-send-email-u9012063@gmail.com |
---|---|
State | Superseded |
Series | [ovs-dev,PATCHv6] netdev-afxdp: Enable loading XDP program. |
Hi Eelco and Ilya,

Do you think this patch is ok?

Thanks
William

On Thu, Jan 16, 2020 at 1:49 PM William Tu <u9012063@gmail.com> wrote:
>
> Now netdev-afxdp always forwards all packets to userspace because
> it is using libbpf's default XDP program, see 'xsk_load_xdp_prog'.
> There are some cases when users want to keep packets in kernel instead
> of sending to userspace, for example, management traffic such as SSH
> should be processed in kernel.
>
> The patch enables loading the user-provided XDP program by
>   $ ovs-vsctl -- set int afxdp-p0 options:xdp-obj=<path/to/xdp/obj>
>
> So users can implement their filtering logic or traffic steering idea
> in their XDP program, and rest of the traffic passes to AF_XDP socket
> handled by OVS.
>
> Signed-off-by: William Tu <u9012063@gmail.com>
> ---
> v6:
>   - rebase to master
>   - mostly remains the same as v5, but make sure there is no
>     leak using bpftool and no repeated loop issue reported from Eelco
>     here:
>     https://patchwork.ozlabs.org/patch/1199734/
>     which has been fixed at
>     netdev-afxdp: Avoid removing of XDP program if not loaded.
>   - travis: https://travis-ci.org/williamtu/ovs-travis/builds/638126505
>
> v5:
>   - rebase to master
>   Feedbacks from Eelco:
>   - Remove xdp-obj="__default__" case, to remove xdp-obj, use
>     ovs-vsctl remove int <dev> options xdp-obj
>   - Fix problem of xdp program not unloading
>     verify by bpftool.
>   - use xdp-obj instead of xdpobj
>   - Limitation: xdp-obj doesn't work when using best-effort-mode
>     because best-effort mode tried to probe mode by setting up queue,
>     and loading xdp-obj requires knowing mode in advance.
>     (to support it, we might need to use the
>     XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD as in v3)
>
>   Testing
>   - I place two xdp binary here
>     https://drive.google.com/open?id=1QCCdNE-5CwlKCFV6Upg9mOPnnbVkUwA5
>     [xdpsock_pass.o] Working one, which forwards packets to dpif-netdev
>     [xdpsock_invalid.o] invalid one, which has no map
>
> v4:
>   Feedbacks from Eelco.
>   - First load the program, then configure xsk.
>     Let API take care of xdp prog and map loading, don't set
>     XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD.
>   - When loading custom xdp, need to close(prog_fd) and close(map_fd)
>     to release the resources
>   - make sure prog and map is unloaded by bpftool.
>   - update doc, afxdp.rst
>   - Tested-at: https://travis-ci.org/williamtu/ovs-travis/builds/608986781
>
> v3:
>   Feedbacks from Eelco.
>   - keep using xdpobj not xdp-obj (because we already use xdpmode)
>     or we change both to xdp-obj and xdp-mode?
>   - log a info message when using external program for better debugging
>   - combine some failure messages
>   - update doc
>   NEW:
>   - add options:xdpobj=__default__, to set back to libbpf default prog
>   - Tested-at: https://travis-ci.org/williamtu/ovs-travis/builds/606153231
>
> v2:
>   A couple fixes and remove RFC
> ---
>  Documentation/intro/install/afxdp.rst |  59 +++++++++++++++
>  NEWS                                  |   2 +
>  lib/netdev-afxdp.c                    | 135 ++++++++++++++++++++++++++++++++--
>  lib/netdev-linux-private.h            |   4 +
>  4 files changed, 193 insertions(+), 7 deletions(-)
>
> diff --git a/Documentation/intro/install/afxdp.rst b/Documentation/intro/install/afxdp.rst
> index 15e3c918f942..e72bb3edabe6 100644
> --- a/Documentation/intro/install/afxdp.rst
> +++ b/Documentation/intro/install/afxdp.rst
> @@ -283,6 +283,65 @@ Or, use OVS pmd tool::
>      ovs-appctl dpif-netdev/pmd-stats-show
>
>
> +Loading Custom XDP Program
> +--------------------------
> +By default, netdev-afxdp always forwards all packets to userspace because
> +it is using libbpf's default XDP program. There are some cases when users
> +want to keep packets in kernel instead of sending to userspace, for example,
> +management traffic such as SSH should be processed in kernel. This can be
> +done by loading the user-provided XDP program::
> +
> +    ovs-vsctl -- set int afxdp-p0 options:xdp-obj=<path/to/xdp/obj>
> +
> +So users can implement their filtering logic or traffic steering idea
> +in their XDP program, and rest of the traffic passes to AF_XDP socket
> +handled by OVS. To set it back to default, use::
> +
> +    ovs-vsctl remove int afxdp-p0 options xdp-obj
> +
> +Below is a sample C program compiled under kernel's samples/bpf/.
> +
> +.. code-block:: c
> +
> +    #include <uapi/linux/bpf.h>
> +    #include "bpf_helpers.h"
> +
> +    #if LINUX_VERSION_CODE < KERNEL_VERSION(5,3,0)
> +    /* Kernel version before 5.3 needed an additional map */
> +    struct bpf_map_def SEC("maps") qidconf_map = {
> +        .type = BPF_MAP_TYPE_ARRAY,
> +        .key_size = sizeof(int),
> +        .value_size = sizeof(int),
> +        .max_entries = 64,
> +    };
> +    #endif
> +
> +    /* OVS will associate map 'xsks_map' to xsk socket. */
> +    struct bpf_map_def SEC("maps") xsks_map = {
> +        .type = BPF_MAP_TYPE_XSKMAP,
> +        .key_size = sizeof(int),
> +        .value_size = sizeof(int),
> +        .max_entries = 32,
> +    };
> +
> +    SEC("xdp_sock")
> +    int xdp_sock_prog(struct xdp_md *ctx)
> +    {
> +        int index = ctx->rx_queue_index;
> +
> +        /* Customized by user.
> +         * For example
> +         * 1) filter out all SSH traffic and return XDP_PASS
> +         *    for kernel to process.
> +         * 2) Drop unwanted packet by returning XDP_DROP.
> +         */
> +
> +        /* Rest of packets goes to AF_XDP. */
> +        return bpf_redirect_map(&xsks_map, index, 0);
> +    }
> +    char _license[] SEC("license") = "GPL";
> +
> +
>  Example Script
>  --------------
>
> diff --git a/NEWS b/NEWS
> index e8d662a0c15f..a939262ce09e 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -19,6 +19,8 @@ Post-v2.12.0
>                generic - former SKB
>                best-effort [default] - new one, chooses the best available from
>                                        3 above modes
> +     * New option 'xdp-obj' for loading custom XDP program. Default uses
> +       the libbpf builtin XDP program.
>     - DPDK:
>       * DPDK pdump packet capture support disabled by default. New configure
>         option '--enable-dpdk-pdump' to enable it.
> diff --git a/lib/netdev-afxdp.c b/lib/netdev-afxdp.c
> index 6ac0bc2dde90..421566e36a40 100644
> --- a/lib/netdev-afxdp.c
> +++ b/lib/netdev-afxdp.c
> @@ -21,6 +21,7 @@
>  #include "netdev-afxdp.h"
>  #include "netdev-afxdp-pool.h"
>
> +#include <bpf/bpf.h>
>  #include <errno.h>
>  #include <inttypes.h>
>  #include <linux/rtnetlink.h>
> @@ -30,6 +31,7 @@
>  #include <stdlib.h>
>  #include <sys/resource.h>
>  #include <sys/socket.h>
> +#include <sys/stat.h>
>  #include <sys/types.h>
>  #include <unistd.h>
>
> @@ -93,7 +95,8 @@ static struct xsk_socket_info *xsk_configure(int ifindex, int xdp_queue_id,
>                                               enum afxdp_mode mode,
>                                               bool use_need_wakeup,
>                                               bool report_socket_failures);
> -static void xsk_remove_xdp_program(uint32_t ifindex, enum afxdp_mode);
> +static void xsk_remove_xdp_program(uint32_t ifindex, enum afxdp_mode,
> +                                   int prog_fd, int map_fd);
>  static void xsk_destroy(struct xsk_socket_info *xsk);
>  static int xsk_configure_all(struct netdev *netdev);
>  static void xsk_destroy_all(struct netdev *netdev);
> @@ -255,6 +258,23 @@ netdev_afxdp_sweep_unused_pools(void *aux OVS_UNUSED)
>      ovs_mutex_unlock(&unused_pools_mutex);
>  }
>
> +static int
> +xsk_load_prog(const char *path, struct bpf_object **obj,
> +              int *prog_fd)
> +{
> +    struct bpf_prog_load_attr attr = {
> +        .prog_type = BPF_PROG_TYPE_XDP,
> +        .file = path,
> +    };
> +
> +    if (bpf_prog_load_xattr(&attr, obj, prog_fd)) {
> +        VLOG_ERR("Can't load XDP program at '%s'", path);
> +        return EINVAL;
> +    }
> +
> +    return 0;
> +}
> +
>  static struct xsk_umem_info *
>  xsk_configure_umem(void *buffer, uint64_t size)
>  {
> @@ -471,6 +491,50 @@ xsk_configure_queue(struct netdev_linux *dev, int ifindex, int queue_id,
>      return 0;
>  }
>
> +static int
> +xsk_configure_prog(struct netdev *netdev, int ifindex)
> +{
> +    struct netdev_linux *dev = netdev_linux_cast(netdev);
> +    struct bpf_object *obj;
> +    uint32_t prog_id = 0;
> +    uint32_t flags;
> +    int prog_fd = 0;
> +    int map_fd = 0;
> +    int mode;
> +    int ret;
> +
> +    mode = dev->xdp_mode_in_use;
> +    flags = xdp_modes[mode].xdp_flags | XDP_FLAGS_UPDATE_IF_NOEXIST;
> +
> +    ret = xsk_load_prog(dev->xdp_obj, &obj, &prog_fd);
> +    if (ret) {
> +        goto err;
> +    }
> +    dev->prog_fd = prog_fd;
> +
> +    bpf_set_link_xdp_fd(ifindex, prog_fd, flags);
> +    ret = bpf_get_link_xdp_id(ifindex, &prog_id, flags);
> +    if (ret < 0) {
> +        VLOG_ERR("%s: Cannot get XDP prog id.",
> +                 netdev_get_name(netdev));
> +        goto err;
> +    }
> +
> +    map_fd = bpf_object__find_map_fd_by_name(obj, "xsks_map");
> +    if (map_fd < 0) {
> +        VLOG_ERR("%s: Cannot find \"xsks_map\".",
> +                 netdev_get_name(netdev));
> +        goto err;
> +    }
> +    dev->map_fd = map_fd;
> +
> +    VLOG_INFO("%s: Loaded custom XDP program at %s prog_id %d.",
> +              netdev_get_name(netdev), dev->xdp_obj, prog_id);
> +    return 0;
> +
> +err:
> +    return ret;
> +}
>
>  static int
>  xsk_configure_all(struct netdev *netdev)
> @@ -507,6 +571,13 @@ xsk_configure_all(struct netdev *netdev)
>          qid++;
>      } else {
>          dev->xdp_mode_in_use = dev->xdp_mode;
> +        if (dev->xdp_obj) {
> +            /* XDP program is per-netdev, so all queues share
> +             * the same XDP program. */
> +            if (xsk_configure_prog(netdev, ifindex)) {
> +                goto err;
> +            }
> +        }
>      }
>
>      /* Configure remaining queues. */
> @@ -581,7 +652,12 @@ xsk_destroy_all(struct netdev *netdev)
>
>      VLOG_INFO("%s: Removing xdp program.", netdev_get_name(netdev));
>      ifindex = linux_get_ifindex(netdev_get_name(netdev));
> -    xsk_remove_xdp_program(ifindex, dev->xdp_mode_in_use);
> +    xsk_remove_xdp_program(ifindex, dev->xdp_mode_in_use, dev->prog_fd,
> +                           dev->map_fd);
> +    dev->prog_fd = 0;
> +    dev->map_fd = 0;
> +    free(CONST_CAST(char *, dev->xdp_obj));
> +    dev->xdp_obj = NULL;
>
>      if (dev->tx_locks) {
>          for (i = 0; i < netdev_n_txq(netdev); i++) {
> @@ -598,9 +674,11 @@ netdev_afxdp_set_config(struct netdev *netdev, const struct smap *args,
>  {
>      struct netdev_linux *dev = netdev_linux_cast(netdev);
>      const char *str_xdp_mode;
> +    const char *str_xdp_obj;
>      enum afxdp_mode xdp_mode;
>      bool need_wakeup;
>      int new_n_rxq;
> +    struct stat s;
>
>      ovs_mutex_lock(&dev->mutex);
>      new_n_rxq = MAX(smap_get_int(args, "n_rxq", NR_QUEUE), 1);
> @@ -634,12 +712,34 @@ netdev_afxdp_set_config(struct netdev *netdev, const struct smap *args,
>      }
>  #endif
>
> +    str_xdp_obj = smap_get_def(args, "xdp-obj", NULL);
> +    if (str_xdp_obj) {
> +        if (stat(str_xdp_obj, &s)) {
> +            ovs_mutex_unlock(&dev->mutex);
> +            VLOG_ERR("Invalid xdp-obj '%s': %s.", str_xdp_obj,
> +                     ovs_strerror(errno));
> +            return EINVAL;
> +        } else if (!S_ISREG(s.st_mode)) {
> +            ovs_mutex_unlock(&dev->mutex);
> +            VLOG_ERR("xdp-obj '%s' is not a regular file.", str_xdp_obj);
> +            return EINVAL;
> +        }
> +    }
> +
> +    if (str_xdp_obj && xdp_mode == OVS_AF_XDP_MODE_BEST_EFFORT) {
> +        ovs_mutex_unlock(&dev->mutex);
> +        VLOG_ERR("best-effort mode and xdp-obj can't be set together");
> +        return EINVAL;
> +    }
> +
>      if (dev->requested_n_rxq != new_n_rxq
>          || dev->requested_xdp_mode != xdp_mode
> -        || dev->requested_need_wakeup != need_wakeup) {
> +        || dev->requested_need_wakeup != need_wakeup
> +        || !nullable_string_is_equal(dev->requested_xdp_obj, str_xdp_obj)) {
>          dev->requested_n_rxq = new_n_rxq;
>          dev->requested_xdp_mode = xdp_mode;
>          dev->requested_need_wakeup = need_wakeup;
> +        dev->requested_xdp_obj = nullable_xstrdup(str_xdp_obj);
>          netdev_request_reconfigure(netdev);
>      }
>      ovs_mutex_unlock(&dev->mutex);
> @@ -658,6 +758,8 @@ netdev_afxdp_get_config(const struct netdev *netdev, struct smap *args)
>                      xdp_modes[dev->xdp_mode_in_use].name);
>      smap_add_format(args, "use-need-wakeup", "%s",
>                      dev->use_need_wakeup ? "true" : "false");
> +    smap_add_format(args, "xdp-obj", "%s",
> +                    dev->xdp_obj ? dev->xdp_obj : "builtin");
>      ovs_mutex_unlock(&dev->mutex);
>      return 0;
>  }
> @@ -674,7 +776,8 @@ netdev_afxdp_reconfigure(struct netdev *netdev)
>      if (netdev->n_rxq == dev->requested_n_rxq
>          && dev->xdp_mode == dev->requested_xdp_mode
>          && dev->use_need_wakeup == dev->requested_need_wakeup
> -        && dev->xsks) {
> +        && dev->xsks
> +        && nullable_string_is_equal(dev->xdp_obj, dev->requested_xdp_obj)) {
>          goto out;
>      }
>
> @@ -692,6 +795,8 @@ netdev_afxdp_reconfigure(struct netdev *netdev)
>      }
>      dev->use_need_wakeup = dev->requested_need_wakeup;
>
> +    dev->xdp_obj = dev->requested_xdp_obj;
> +
>      err = xsk_configure_all(netdev);
>      if (err) {
>          VLOG_ERR("%s: AF_XDP device reconfiguration failed.",
> @@ -715,7 +820,8 @@ netdev_afxdp_get_numa_id(const struct netdev *netdev)
>  }
>
>  static void
> -xsk_remove_xdp_program(uint32_t ifindex, enum afxdp_mode mode)
> +xsk_remove_xdp_program(uint32_t ifindex, enum afxdp_mode mode,
> +                       int prog_fd, int map_fd)
>  {
>      uint32_t flags = xdp_modes[mode].xdp_flags | XDP_FLAGS_UPDATE_IF_NOEXIST;
>      uint32_t ret, prog_id = 0;
> @@ -732,7 +838,19 @@ xsk_remove_xdp_program(uint32_t ifindex, enum afxdp_mode mode)
>          return;
>      }
>
> -    bpf_set_link_xdp_fd(ifindex, -1, flags);
> +    ret = bpf_set_link_xdp_fd(ifindex, -1, flags);
> +    if (ret) {
> +        VLOG_ERR("Failed to unload prog ID: %d", prog_id);
> +    }
> +
> +    if (prog_fd) {
> +        close(prog_fd);
> +    }
> +    if (map_fd) {
> +        close(map_fd);
> +    }
> +
> +    VLOG_INFO("Removed XDP program ID: %d", prog_id);
>  }
>
>  void
> @@ -744,7 +862,8 @@ signal_remove_xdp(struct netdev *netdev)
>      ifindex = linux_get_ifindex(netdev_get_name(netdev));
>
>      VLOG_WARN("Force removing xdp program.");
> -    xsk_remove_xdp_program(ifindex, dev->xdp_mode_in_use);
> +    xsk_remove_xdp_program(ifindex, dev->xdp_mode_in_use,
> +                           dev->prog_fd, dev->map_fd);
>  }
>
>  static struct dp_packet_afxdp *
> @@ -1158,10 +1277,12 @@ netdev_afxdp_construct(struct netdev *netdev)
>      netdev->n_txq = 0;
>      dev->xdp_mode = OVS_AF_XDP_MODE_UNSPEC;
>      dev->xdp_mode_in_use = OVS_AF_XDP_MODE_UNSPEC;
> +    dev->xdp_obj = NULL;
>
>      dev->requested_n_rxq = NR_QUEUE;
>      dev->requested_xdp_mode = OVS_AF_XDP_MODE_BEST_EFFORT;
>      dev->requested_need_wakeup = NEED_WAKEUP_DEFAULT;
> +    dev->requested_xdp_obj = NULL;
>
>      dev->xsks = NULL;
>      dev->tx_locks = NULL;
> diff --git a/lib/netdev-linux-private.h b/lib/netdev-linux-private.h
> index f08159aa7b53..190927cec098 100644
> --- a/lib/netdev-linux-private.h
> +++ b/lib/netdev-linux-private.h
> @@ -109,6 +109,10 @@ struct netdev_linux {
>      bool requested_need_wakeup;
>
>      struct netdev_afxdp_tx_lock *tx_locks; /* Array of locks for TX queues. */
> +    const char *xdp_obj;         /* XDP object file path. */
> +    const char *requested_xdp_obj;
> +    int prog_fd;
> +    int map_fd;
>  #endif
>  };
>
> --
> 2.7.4
>
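The "Customized by user" placeholder in the sample XDP program quoted above leaves the SSH filtering step unspecified. As a rough illustration only (it is not part of the patch), the classification could look like the userspace C sketch below. The names classify_frame, build_tcp_frame, and the verdict enum are hypothetical and exist only for this example. The sketch assumes untagged Ethernet and IPv4 and treats "SSH" as TCP destination port 22. In a real XDP program the same length checks would be written against ctx->data and ctx->data_end, and the two verdicts would map to XDP_PASS and to the bpf_redirect_map() call from the sample.

```c
#include <arpa/inet.h>   /* htons, ntohs */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical verdicts; a real XDP program would return XDP_PASS or the
 * result of bpf_redirect_map() instead. */
enum verdict { VERDICT_PASS_TO_KERNEL, VERDICT_TO_AF_XDP };

#define ETH_HDR_LEN     14
#define IPV4_MIN_HDR    20
#define ETHERTYPE_IPV4  0x0800
#define IP_PROTO_TCP    6
#define SSH_PORT        22

/* Returns VERDICT_PASS_TO_KERNEL for an IPv4 TCP segment whose destination
 * port is 22, and VERDICT_TO_AF_XDP for everything else.  Every read is
 * preceded by a length check. */
static enum verdict
classify_frame(const uint8_t *data, size_t len)
{
    uint16_t ethertype, frag, dport;
    const uint8_t *ip;
    size_t ihl;

    if (len < ETH_HDR_LEN + IPV4_MIN_HDR) {
        return VERDICT_TO_AF_XDP;
    }
    memcpy(&ethertype, data + 12, sizeof ethertype);
    if (ntohs(ethertype) != ETHERTYPE_IPV4) {
        return VERDICT_TO_AF_XDP;
    }

    ip = data + ETH_HDR_LEN;
    ihl = (size_t) (ip[0] & 0x0f) * 4;      /* IPv4 header length in bytes. */
    if ((ip[0] >> 4) != 4 || ihl < IPV4_MIN_HDR || ip[9] != IP_PROTO_TCP) {
        return VERDICT_TO_AF_XDP;
    }

    /* Non-first fragments carry no TCP header, so do not inspect them. */
    memcpy(&frag, ip + 6, sizeof frag);
    if (ntohs(frag) & 0x1fff) {
        return VERDICT_TO_AF_XDP;
    }

    /* Need at least the TCP source and destination ports (4 bytes). */
    if (len < ETH_HDR_LEN + ihl + 4) {
        return VERDICT_TO_AF_XDP;
    }
    memcpy(&dport, ip + ihl + 2, sizeof dport);
    return ntohs(dport) == SSH_PORT ? VERDICT_PASS_TO_KERNEL
                                    : VERDICT_TO_AF_XDP;
}

/* Builds a minimal Ethernet + IPv4 + TCP frame (54 bytes, no options) so
 * that classify_frame() can be exercised without a NIC. */
static size_t
build_tcp_frame(uint8_t *buf, size_t buf_len, uint16_t dst_port)
{
    const size_t frame_len = ETH_HDR_LEN + IPV4_MIN_HDR + 20;
    uint16_t be;

    assert(buf_len >= frame_len);
    memset(buf, 0, frame_len);
    be = htons(ETHERTYPE_IPV4);
    memcpy(buf + 12, &be, sizeof be);
    buf[ETH_HDR_LEN] = 0x45;                /* Version 4, IHL 5. */
    buf[ETH_HDR_LEN + 9] = IP_PROTO_TCP;
    be = htons(dst_port);
    memcpy(buf + ETH_HDR_LEN + IPV4_MIN_HDR + 2, &be, sizeof be);
    return frame_len;
}
```

Calling classify_frame() on a frame built with build_tcp_frame(buf, sizeof buf, 22) yields VERDICT_PASS_TO_KERNEL, while any other destination port, a non-IPv4 frame, or a truncated frame falls through to VERDICT_TO_AF_XDP, matching the "rest of packets goes to AF_XDP" default in the sample.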
On 27 Jan 2020, at 6:04, William Tu wrote:

> Hi Eelco and Ilya,
> Do you think this patch is ok?

I’ve been out last week and was working on other stuff earlier. I will try to review it next week unless Ilya gets to it earlier.

Cheers,

Eelco

> Thanks
> William
Hi William,

Applied your patch to the latest OVS master, and I was able to make it crash :(

2020-02-05T11:06:53.940Z|00159|bridge|WARN|could not open network device tapVM (No such device)
2020-02-05T11:06:53.964Z|00160|netdev_afxdp|INFO|eno1: Removing xdp program.
2020-02-05T11:06:54.281Z|00161|netdev_afxdp|INFO|Removed XDP program ID: 6
2020-02-05T11:06:54.281Z|00162|netdev_afxdp|INFO|eno1: Setting XDP mode to native-with-zerocopy.
2020-02-05T11:06:54.968Z|00163|netdev_afxdp|INFO|eno1: Loaded custom XDP program at /root/xdp_pass_kern.o prog_id 10.
2020-02-05T11:06:55.014Z|00164|netdev_afxdp|ERR|xsk_socket__create failed (Resource temporarily unavailable) mode: native-with-zerocopy, use-need-wakeup: true, qid: 0
2020-02-05T11:06:55.030Z|00165|netdev_afxdp|ERR|eno1: Failed to create AF_XDP socket on queue 0 in native-with-zerocopy mode.
2020-02-05T11:06:55.030Z|00166|netdev_afxdp|ERR|eno1: Failed to create AF_XDP socket on queue 0.
2020-02-05T11:06:55.030Z|00167|netdev_afxdp|INFO|eno1: Removing xdp program.
2020-02-05T11:06:55.600Z|00168|netdev_afxdp|INFO|Removed XDP program ID: 10
2020-02-05T11:06:55.600Z|00169|netdev_afxdp|ERR|eno1: AF_XDP device reconfiguration failed.
2020-02-05T11:06:55.600Z|00170|dpif_netdev|ERR|Failed to set interface eno1 new configuration
2020-02-05T11:06:56.163Z|00001|fatal_signal(pmd-c01/id:7)|WARN|terminating with signal 11 (Segmentation fault)

This happened after changing the config as follows (note that xdp_pass_kern.o is an invalid program for AF_XDP pass):

  $ ovs-vsctl -- set int eno1 options:xdp-mode=native-with-zerocopy
  $ ovs-vsctl -- set int eno1 options:xdp-obj=/root/xdp_pass_kern.o

After the crash (or when it’s not crashing) it goes into a reconfiguration loop:

2020-02-05T11:06:59.555Z|00001|vlog|INFO|opened log file /var/log/openvswitch/ovs-vswitchd.log
2020-02-05T11:06:59.560Z|00002|ovs_numa|INFO|Discovered 28 CPU cores on NUMA node 0
2020-02-05T11:06:59.560Z|00003|ovs_numa|INFO|Discovered 1 NUMA nodes and 28 CPU cores
2020-02-05T11:06:59.560Z|00004|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connecting...
2020-02-05T11:06:59.560Z|00005|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connected
2020-02-05T11:06:59.562Z|00006|dpdk|INFO|Using DPDK 19.11.0
2020-02-05T11:06:59.562Z|00007|dpdk|INFO|DPDK Enabled - initializing...
2020-02-05T11:06:59.562Z|00008|dpdk|INFO|No vhost-sock-dir provided - defaulting to /var/run/openvswitch
2020-02-05T11:06:59.562Z|00009|dpdk|INFO|IOMMU support for vhost-user-client disabled.
2020-02-05T11:06:59.562Z|00010|dpdk|INFO|POSTCOPY support for vhost-user-client disabled.
2020-02-05T11:06:59.562Z|00011|dpdk|INFO|Per port memory for DPDK devices disabled.
2020-02-05T11:06:59.562Z|00012|dpdk|INFO|EAL ARGS: ovs-vswitchd -c 0x00010004 --socket-mem 1024 --socket-limit 1024.
2020-02-05T11:06:59.565Z|00013|dpdk|INFO|EAL: Detected 28 lcore(s)
2020-02-05T11:06:59.565Z|00014|dpdk|INFO|EAL: Detected 1 NUMA nodes
2020-02-05T11:06:59.566Z|00015|dpdk|INFO|EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
2020-02-05T11:06:59.579Z|00016|dpdk|INFO|EAL: Selected IOVA mode 'VA'
2020-02-05T11:06:59.579Z|00017|dpdk|INFO|EAL: Probing VFIO support...
2020-02-05T11:07:09.346Z|00018|dpdk|INFO|EAL: PCI device 0000:01:00.0 on NUMA socket 0 2020-02-05T11:07:09.346Z|00019|dpdk|INFO|EAL: probe driver: 8086:10fb net_ixgbe 2020-02-05T11:07:09.346Z|00020|dpdk|INFO|EAL: PCI device 0000:01:00.1 on NUMA socket 0 2020-02-05T11:07:09.346Z|00021|dpdk|INFO|EAL: probe driver: 8086:10fb net_ixgbe 2020-02-05T11:07:09.346Z|00022|dpdk|INFO|EAL: PCI device 0000:07:00.0 on NUMA socket 0 2020-02-05T11:07:09.346Z|00023|dpdk|INFO|EAL: probe driver: 8086:1521 net_e1000_igb 2020-02-05T11:07:09.346Z|00024|dpdk|INFO|EAL: PCI device 0000:07:00.1 on NUMA socket 0 2020-02-05T11:07:09.346Z|00025|dpdk|INFO|EAL: probe driver: 8086:1521 net_e1000_igb 2020-02-05T11:07:09.346Z|00026|dpdk|INFO|DPDK Enabled - initialized 2020-02-05T11:07:09.350Z|00027|pmd_perf|INFO|DPDK provided TSC frequency: 2600000 KHz 2020-02-05T11:07:09.355Z|00028|ofproto_dpif|INFO|netdev@ovs-netdev: Datapath supports recirculation 2020-02-05T11:07:09.355Z|00029|ofproto_dpif|INFO|netdev@ovs-netdev: VLAN header stack length probed as 1 2020-02-05T11:07:09.355Z|00030|ofproto_dpif|INFO|netdev@ovs-netdev: MPLS label stack length probed as 3 2020-02-05T11:07:09.355Z|00031|ofproto_dpif|INFO|netdev@ovs-netdev: Datapath supports truncate action 2020-02-05T11:07:09.355Z|00032|ofproto_dpif|INFO|netdev@ovs-netdev: Datapath supports unique flow ids 2020-02-05T11:07:09.355Z|00033|ofproto_dpif|INFO|netdev@ovs-netdev: Datapath supports clone action 2020-02-05T11:07:09.355Z|00034|ofproto_dpif|INFO|netdev@ovs-netdev: Max sample nesting level probed as 10 2020-02-05T11:07:09.355Z|00035|ofproto_dpif|INFO|netdev@ovs-netdev: Datapath supports eventmask in conntrack action 2020-02-05T11:07:09.355Z|00036|ofproto_dpif|INFO|netdev@ovs-netdev: Datapath supports ct_clear action 2020-02-05T11:07:09.355Z|00037|ofproto_dpif|INFO|netdev@ovs-netdev: Max dp_hash algorithm probed to be 1 2020-02-05T11:07:09.355Z|00038|ofproto_dpif|INFO|netdev@ovs-netdev: Datapath supports check_pkt_len action 
2020-02-05T11:07:09.355Z|00039|ofproto_dpif|INFO|netdev@ovs-netdev: Datapath supports timeout policy in conntrack action 2020-02-05T11:07:09.355Z|00040|ofproto_dpif|INFO|netdev@ovs-netdev: Datapath supports ct_state 2020-02-05T11:07:09.355Z|00041|ofproto_dpif|INFO|netdev@ovs-netdev: Datapath supports ct_zone 2020-02-05T11:07:09.355Z|00042|ofproto_dpif|INFO|netdev@ovs-netdev: Datapath supports ct_mark 2020-02-05T11:07:09.355Z|00043|ofproto_dpif|INFO|netdev@ovs-netdev: Datapath supports ct_label 2020-02-05T11:07:09.355Z|00044|ofproto_dpif|INFO|netdev@ovs-netdev: Datapath supports ct_state_nat 2020-02-05T11:07:09.355Z|00045|ofproto_dpif|INFO|netdev@ovs-netdev: Datapath supports ct_orig_tuple 2020-02-05T11:07:09.355Z|00046|ofproto_dpif|INFO|netdev@ovs-netdev: Datapath supports ct_orig_tuple6 2020-02-05T11:07:09.355Z|00047|ofproto_dpif|INFO|netdev@ovs-netdev: Datapath supports IPv6 ND Extensions 2020-02-05T11:07:09.360Z|00048|dpif_netdev|INFO|PMD thread on numa_id: 0, core id: 1 created. 2020-02-05T11:07:09.360Z|00049|dpif_netdev|INFO|There are 1 pmd threads on numa node 0 2020-02-05T11:07:09.360Z|00050|netdev_afxdp|INFO|eno1: Removing xdp program. 2020-02-05T11:07:09.360Z|00051|netdev_afxdp|INFO|No XDP program is loaded at ifindex 4 2020-02-05T11:07:09.360Z|00052|netdev_afxdp|INFO|eno1: Setting XDP mode to native-with-zerocopy. 2020-02-05T11:07:09.729Z|00053|netdev_afxdp|INFO|eno1: Loaded custom XDP program at /root/xdp_pass_kern.o prog_id 14. 2020-02-05T11:07:09.781Z|00054|netdev_afxdp|ERR|xsk_socket__create failed (Resource temporarily unavailable) mode: native-with-zerocopy, use-need-wakeup: true, qid: 0 2020-02-05T11:07:09.798Z|00055|netdev_afxdp|ERR|eno1: Failed to create AF_XDP socket on queue 0 in native-with-zerocopy mode. 2020-02-05T11:07:09.798Z|00056|netdev_afxdp|ERR|eno1: Failed to create AF_XDP socket on queue 0. 2020-02-05T11:07:09.798Z|00057|netdev_afxdp|INFO|eno1: Removing xdp program. 
2020-02-05T11:07:10.361Z|00058|netdev_afxdp|INFO|Removed XDP program ID: 14 2020-02-05T11:07:10.361Z|00059|netdev_afxdp|ERR|eno1: AF_XDP device reconfiguration failed. 2020-02-05T11:07:10.361Z|00060|dpif_netdev|ERR|Failed to set interface eno1 new configuration 2020-02-05T11:07:10.361Z|00061|dpif|WARN|netdev@ovs-netdev: failed to add eno1 as port: Invalid argument 2020-02-05T11:07:10.361Z|00062|bridge|WARN|could not add network device eno1 to ofproto (Invalid argument) 2020-02-05T11:07:10.361Z|00063|netdev_afxdp|INFO|eno1: Removing xdp program. 2020-02-05T11:07:10.361Z|00064|netdev_afxdp|INFO|No XDP program is loaded at ifindex 4 2020-02-05T11:07:10.364Z|00065|bridge|WARN|could not open network device tapVM (No such device) 2020-02-05T11:07:10.364Z|00066|dpif_netdev|INFO|PMD thread on numa_id: 0, core id: 1 destroyed. 2020-02-05T11:07:10.365Z|00067|bridge|INFO|bridge ovs_pvp_br0: added interface ovs_pvp_br0 on port 65534 2020-02-05T11:07:10.365Z|00068|bridge|INFO|bridge ovs_pvp_br0: using datapath ID 0000c2bf97187e42 2020-02-05T11:07:10.366Z|00069|connmgr|INFO|ovs_pvp_br0: added service controller "punix:/var/run/openvswitch/ovs_pvp_br0.mgmt" 2020-02-05T11:07:10.368Z|00070|timeval|WARN|Unreasonably long 10806ms poll interval (11ms user, 9891ms system) 2020-02-05T11:07:10.368Z|00071|timeval|WARN|faults: 34637545 minor, 0 major 2020-02-05T11:07:10.368Z|00072|timeval|WARN|disk: 0 reads, 16 writes 2020-02-05T11:07:10.368Z|00073|timeval|WARN|context switches: 290 voluntary, 25 involuntary 2020-02-05T11:07:10.368Z|00074|coverage|INFO|Event coverage, avg rate over last: 5 seconds, last minute, last hour, hash=1a468b82: 2020-02-05T11:07:10.368Z|00075|coverage|INFO|bridge_reconfigure 0.0/sec 0.000/sec 0.0000/sec total: 1 2020-02-05T11:07:10.368Z|00076|coverage|INFO|ofproto_flush 0.0/sec 0.000/sec 0.0000/sec total: 1 2020-02-05T11:07:10.368Z|00077|coverage|INFO|ofproto_update_port 0.0/sec 0.000/sec 0.0000/sec total: 1 
2020-02-05T11:07:10.368Z|00078|coverage|INFO|rev_flow_table 0.0/sec 0.000/sec 0.0000/sec total: 1 2020-02-05T11:07:10.369Z|00079|coverage|INFO|cmap_expand 0.0/sec 0.000/sec 0.0000/sec total: 50 2020-02-05T11:07:10.369Z|00080|coverage|INFO|cmap_shrink 0.0/sec 0.000/sec 0.0000/sec total: 30 2020-02-05T11:07:10.369Z|00081|coverage|INFO|dpif_port_add 0.0/sec 0.000/sec 0.0000/sec total: 2 2020-02-05T11:07:10.369Z|00082|coverage|INFO|dpif_flow_flush 0.0/sec 0.000/sec 0.0000/sec total: 1 2020-02-05T11:07:10.369Z|00083|coverage|INFO|dpif_flow_get 0.0/sec 0.000/sec 0.0000/sec total: 23 2020-02-05T11:07:10.369Z|00084|coverage|INFO|dpif_flow_put 0.0/sec 0.000/sec 0.0000/sec total: 24 2020-02-05T11:07:10.369Z|00085|coverage|INFO|dpif_flow_del 0.0/sec 0.000/sec 0.0000/sec total: 23 2020-02-05T11:07:10.369Z|00086|coverage|INFO|dpif_execute 0.0/sec 0.000/sec 0.0000/sec total: 6 2020-02-05T11:07:10.369Z|00087|coverage|INFO|flow_extract 0.0/sec 0.000/sec 0.0000/sec total: 4 2020-02-05T11:07:10.369Z|00088|coverage|INFO|miniflow_malloc 0.0/sec 0.000/sec 0.0000/sec total: 48 2020-02-05T11:07:10.369Z|00089|coverage|INFO|hmap_expand 0.0/sec 0.000/sec 0.0000/sec total: 472 2020-02-05T11:07:10.369Z|00090|coverage|INFO|netdev_get_stats 0.0/sec 0.000/sec 0.0000/sec total: 2 2020-02-05T11:07:10.369Z|00091|coverage|INFO|txn_unchanged 0.0/sec 0.000/sec 0.0000/sec total: 1 2020-02-05T11:07:10.369Z|00092|coverage|INFO|txn_incomplete 0.0/sec 0.000/sec 0.0000/sec total: 2 2020-02-05T11:07:10.369Z|00093|coverage|INFO|poll_create_node 0.0/sec 0.000/sec 0.0000/sec total: 2128 2020-02-05T11:07:10.369Z|00094|coverage|INFO|poll_zero_timeout 0.0/sec 0.000/sec 0.0000/sec total: 2 2020-02-05T11:07:10.369Z|00095|coverage|INFO|seq_change 0.0/sec 0.000/sec 0.0000/sec total: 706 2020-02-05T11:07:10.369Z|00096|coverage|INFO|pstream_open 0.0/sec 0.000/sec 0.0000/sec total: 3 2020-02-05T11:07:10.369Z|00097|coverage|INFO|stream_open 0.0/sec 0.000/sec 0.0000/sec total: 1 
2020-02-05T11:07:10.369Z|00098|coverage|INFO|util_xalloc 0.0/sec 0.000/sec 0.0000/sec total: 14170 2020-02-05T11:07:10.369Z|00099|coverage|INFO|netdev_set_policing 0.0/sec 0.000/sec 0.0000/sec total: 1 2020-02-05T11:07:10.369Z|00100|coverage|INFO|netdev_get_ifindex 0.0/sec 0.000/sec 0.0000/sec total: 4 2020-02-05T11:07:10.369Z|00101|coverage|INFO|netdev_set_hwaddr 0.0/sec 0.000/sec 0.0000/sec total: 1 2020-02-05T11:07:10.369Z|00102|coverage|INFO|netdev_get_ethtool 0.0/sec 0.000/sec 0.0000/sec total: 2 2020-02-05T11:07:10.369Z|00103|coverage|INFO|netlink_received 0.0/sec 0.000/sec 0.0000/sec total: 128 2020-02-05T11:07:10.369Z|00104|coverage|INFO|netlink_recv_jumbo 0.0/sec 0.000/sec 0.0000/sec total: 26 2020-02-05T11:07:10.369Z|00105|coverage|INFO|netlink_sent 0.0/sec 0.000/sec 0.0000/sec total: 122 2020-02-05T11:07:10.369Z|00106|coverage|INFO|111 events never hit 2020-02-05T11:07:10.369Z|00107|poll_loop|INFO|wakeup due to [POLLIN] on fd 11 (<->/var/run/openvswitch/db.sock) at lib/stream-fd.c:157 (91% CPU usage) 2020-02-05T11:07:10.369Z|00108|poll_loop|INFO|wakeup due to [POLLIN] on fd 10 (NETLINK_ROUTE<->NETLINK_ROUTE) at lib/netlink-socket.c:1401 (91% CPU usage) 2020-02-05T11:07:10.369Z|00109|memory|INFO|149440 kB peak resident set size after 10.8 seconds 2020-02-05T11:07:10.369Z|00110|memory|INFO|handlers:1 ports:1 revalidators:1 rules:5 2020-02-05T11:07:10.370Z|00111|dpif_netdev|INFO|PMD thread on numa_id: 0, core id: 1 created. 2020-02-05T11:07:10.370Z|00112|dpif_netdev|INFO|There are 1 pmd threads on numa node 0 2020-02-05T11:07:10.370Z|00113|netdev_afxdp|INFO|eno1: Removing xdp program. 2020-02-05T11:07:10.370Z|00114|netdev_afxdp|INFO|No XDP program is loaded at ifindex 4 2020-02-05T11:07:10.370Z|00115|netdev_afxdp|INFO|eno1: Setting XDP mode to native-with-zerocopy. 2020-02-05T11:07:11.049Z|00116|netdev_afxdp|INFO|eno1: Loaded custom XDP program at /root/xdp_pass_kern.o prog_id 18. 
2020-02-05T11:07:11.098Z|00117|netdev_afxdp|ERR|xsk_socket__create failed (Resource temporarily unavailable) mode: native-with-zerocopy, use-need-wakeup: true, qid: 0 2020-02-05T11:07:11.123Z|00118|netdev_afxdp|ERR|eno1: Failed to create AF_XDP socket on queue 0 in native-with-zerocopy mode. 2020-02-05T11:07:11.123Z|00119|netdev_afxdp|ERR|eno1: Failed to create AF_XDP socket on queue 0. 2020-02-05T11:07:11.123Z|00120|netdev_afxdp|INFO|eno1: Removing xdp program. 2020-02-05T11:07:11.680Z|00121|netdev_afxdp|INFO|Removed XDP program ID: 18 2020-02-05T11:07:11.680Z|00122|netdev_afxdp|ERR|eno1: AF_XDP device reconfiguration failed. 2020-02-05T11:07:11.680Z|00123|dpif_netdev|ERR|Failed to set interface eno1 new configuration 2020-02-05T11:07:11.680Z|00124|dpif|WARN|netdev@ovs-netdev: failed to add eno1 as port: Invalid argument 2020-02-05T11:07:11.680Z|00125|bridge|WARN|could not add network device eno1 to ofproto (Invalid argument) 2020-02-05T11:07:11.680Z|00126|netdev_afxdp|INFO|eno1: Removing xdp program. 
2020-02-05T11:07:11.680Z|00127|netdev_afxdp|INFO|No XDP program is loaded at ifindex 4
2020-02-05T11:07:11.683Z|00128|bridge|WARN|could not open network device tapVM (No such device)
2020-02-05T11:07:11.684Z|00129|bridge|INFO|ovs-vswitchd (Open vSwitch) 2.13.90
2020-02-05T11:07:11.685Z|00130|timeval|WARN|Unreasonably long 1316ms poll interval (7ms user, 103ms system)
2020-02-05T11:07:11.685Z|00131|timeval|WARN|faults: 20491 minor, 0 major
2020-02-05T11:07:11.685Z|00132|timeval|WARN|disk: 0 reads, 8 writes
2020-02-05T11:07:11.685Z|00133|timeval|WARN|context switches: 439 voluntary, 7 involuntary
…

I can’t replicate the crash on demand, I need to play with the four commands below:

$ ovs-vsctl remove int eno1 options xdp-obj
$ systemctl restart openvswitch
$ ovs-vsctl -- set int eno1 options:xdp-obj=/root/af_xdp_kern.o
$ ovs-vsctl -- set int eno1 options:xdp-obj=/root/xdp_pass_kern.o

This is where it’s crashing:

(gdb) bt
#0  0x0000000000c050fc in netdev_afxdp_get_custom_stats (netdev=0x2a84980, custom_stats=0x7fffcbd5fc50) at lib/netdev-afxdp.c:1377
#1  0x0000000000ab78dc in iface_refresh_stats (iface=0x2a83520) at vswitchd/bridge.c:2676
#2  0x0000000000ac08a8 in run_stats_update () at vswitchd/bridge.c:3103
#3  bridge_run () at vswitchd/bridge.c:3366
#4  0x00000000005291ad in main (argc=<optimized out>, argv=<optimized out>) at vswitchd/ovs-vswitchd.c:127
(gdb) l
1372        custom_stats->counters = xcalloc(netdev_n_rxq(netdev) * N_XDP_CSTATS,
1373                                         sizeof *custom_stats->counters);
1374
1375        /* Account the stats for each xsk. */
1376        for (i = 0; i < netdev_n_rxq(netdev); i++) {
1377            xsk_info = dev->xsks[i];
1378            optlen = sizeof stat;
1379
1380            if (xsk_info && !getsockopt(xsk_socket__fd(xsk_info->xsk), SOL_XDP,
1381                                        XDP_STATISTICS, &stat, &optlen)) {

Cheers,

Eelco

On 27 Jan 2020, at 17:25, Eelco Chaudron wrote:

> On 27 Jan 2020, at 6:04, William Tu wrote:
>
>> Hi Eelco and Ilya,
>> Do you think this patch is ok?
>
> I’ve been out last week and was working on other stuff earlier.
I > will try to review it next week unless Ilya gets to it earlier. > > Cheers, > > Eelco > >> Thanks >> William >> >> On Thu, Jan 16, 2020 at 1:49 PM William Tu <u9012063@gmail.com> >> wrote: >>> >>> Now netdev-afxdp always forwards all packets to userspace because >>> it is using libbpf's default XDP program, see 'xsk_load_xdp_prog'. >>> There are some cases when users want to keep packets in kernel >>> instead >>> of sending to userspace, for example, management traffic such as SSH >>> should be processed in kernel. >>> >>> The patch enables loading the user-provided XDP program by >>> $ovs-vsctl -- set int afxdp-p0 options:xdp-obj=<path/to/xdp/obj> >>> >>> So users can implement their filtering logic or traffic steering >>> idea >>> in their XDP program, and rest of the traffic passes to AF_XDP >>> socket >>> handled by OVS. >>> >>> Signed-off-by: William Tu <u9012063@gmail.com> >>> --- >>> v6: >>> - rebase to master >>> - mostly remains the same as v5, but make sure there is no >>> leak using bpftool and no repeated loop issued reported from >>> Eelco >>> here: >>> https://patchwork.ozlabs.org/patch/1199734/ >>> which has been fixed at >>> netdev-afxdp: Avoid removing of XDP program if not loaded. >>> - travis: >>> https://travis-ci.org/williamtu/ovs-travis/builds/638126505 >>> >>> v5: >>> - rebase to master >>> Feedbacks from Eelco: >>> - Remove xdp-obj="__default__" case, to remove xdp-obj, use >>> ovs-vsctl remove int <dev> options xdp-obj >>> - Fix problem of xdp program not unloading >>> verify by bpftool. >>> - use xdp-obj instead of xdpobj >>> - Limitation: xdp-obj doesn't work when using best-effort-mode >>> because best-effort mode tried to probe mode by setting up >>> queue, >>> and loading xdp-obj requires knwoing mode in advance. 
>>> (to support it, we might need to use the >>> XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD as in v3) >>> >>> Testing >>> - I place two xdp binary here >>> https://drive.google.com/open?id=1QCCdNE-5CwlKCFV6Upg9mOPnnbVkUwA5 >>> [xdpsock_pass.o] Working one, which forwards packets to >>> dpif-netdev >>> [xdpsock_invalid.o] invalid one, which has no map >>> >>> v4: >>> Feedbacks from Eelco. >>> - First load the program, then configure xsk. >>> Let API take care of xdp prog and map loading, don't set >>> XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD. >>> - When loading custom xdp, need to close(prog_fd) and >>> close(map_fd) >>> to release the resources >>> - make sure prog and map is unloaded by bpftool. >>> - update doc, afxdp.rst >>> - Tested-at: >>> https://travis-ci.org/williamtu/ovs-travis/builds/608986781 >>> >>> v3: >>> Feedbacks from Eelco. >>> - keep using xdpobj not xdp-obj (because we alread use xdpmode) >>> or we change both to xdp-obj and xdp-mode? >>> - log a info message when using external program for better >>> debugging >>> - combine some failure messages >>> - update doc >>> NEW: >>> - add options:xdpobj=__default__, to set back to libbpf default >>> prog >>> - Tested-at: >>> https://travis-ci.org/williamtu/ovs-travis/builds/606153231 >>> >>> v2: >>> A couple fixes and remove RFC >>> --- >>> Documentation/intro/install/afxdp.rst | 59 +++++++++++++++ >>> NEWS | 2 + >>> lib/netdev-afxdp.c | 135 >>> ++++++++++++++++++++++++++++++++-- >>> lib/netdev-linux-private.h | 4 + >>> 4 files changed, 193 insertions(+), 7 deletions(-) >>> >>> diff --git a/Documentation/intro/install/afxdp.rst >>> b/Documentation/intro/install/afxdp.rst >>> index 15e3c918f942..e72bb3edabe6 100644 >>> --- a/Documentation/intro/install/afxdp.rst >>> +++ b/Documentation/intro/install/afxdp.rst >>> @@ -283,6 +283,65 @@ Or, use OVS pmd tool:: >>> ovs-appctl dpif-netdev/pmd-stats-show >>> >>> >>> +Loading Custom XDP Program >>> +-------------------------- >>> +By defailt, netdev-afxdp always forwards 
all packets to userspace >>> because >>> +it is using libbpf's default XDP program. There are some cases when >>> users >>> +want to keep packets in kernel instead of sending to userspace, for >>> example, >>> +management traffic such as SSH should be processed in kernel. This >>> can be >>> +done by loading the user-provided XDP program:: >>> + >>> + ovs-vsctl -- set int afxdp-p0 options:xdp-obj=<path/to/xdp/obj> >>> + >>> +So users can implement their filtering logic or traffic steering >>> idea >>> +in their XDP program, and rest of the traffic passes to AF_XDP >>> socket >>> +handled by OVS. To set it back to default, use:: >>> + >>> + ovs-vsctl remove int afxdp-p0 options xdp-obj >>> + >>> +Below is a sample C program compiled under kernel's samples/bpf/. >>> + >>> +.. code-block:: c >>> + >>> + #include <uapi/linux/bpf.h> >>> + #include "bpf_helpers.h" >>> + >>> + #if LINUX_VERSION_CODE < KERNEL_VERSION(5,3,0) >>> + /* Kernel version before 5.3 needed an additional map */ >>> + struct bpf_map_def SEC("maps") qidconf_map = { >>> + .type = BPF_MAP_TYPE_ARRAY, >>> + .key_size = sizeof(int), >>> + .value_size = sizeof(int), >>> + .max_entries = 64, >>> + }; >>> + #endif >>> + >>> + /* OVS will associate map 'xsks_map' to xsk socket. */ >>> + struct bpf_map_def SEC("maps") xsks_map = { >>> + .type = BPF_MAP_TYPE_XSKMAP, >>> + .key_size = sizeof(int), >>> + .value_size = sizeof(int), >>> + .max_entries = 32, >>> + }; >>> + >>> + SEC("xdp_sock") >>> + int xdp_sock_prog(struct xdp_md *ctx) >>> + { >>> + int index = ctx->rx_queue_index; >>> + >>> + /* Customized by user. >>> + * For example >>> + * 1) filter out all SSH traffic and return XDP_PASS >>> + * for kernel to process. >>> + * 2) Drop unwanted packet by returning XDP_DROP. >>> + */ >>> + >>> + /* Rest of packets goes to AF_XDP. 
*/ >>> + return bpf_redirect_map(&xsks_map, index, 0); >>> + } >>> + char _license[] SEC("license") = "GPL"; >>> + >>> + >>> Example Script >>> -------------- >>> >>> diff --git a/NEWS b/NEWS >>> index e8d662a0c15f..a939262ce09e 100644 >>> --- a/NEWS >>> +++ b/NEWS >>> @@ -19,6 +19,8 @@ Post-v2.12.0 >>> generic - former SKB >>> best-effort [default] - new one, chooses the best >>> available from >>> 3 above modes >>> + * New option 'xdp-obj' for loading custom XDP program. >>> Default uses >>> + the libbpf builtin XDP program. >>> - DPDK: >>> * DPDK pdump packet capture support disabled by default. New >>> configure >>> option '--enable-dpdk-pdump' to enable it. >>> diff --git a/lib/netdev-afxdp.c b/lib/netdev-afxdp.c >>> index 6ac0bc2dde90..421566e36a40 100644 >>> --- a/lib/netdev-afxdp.c >>> +++ b/lib/netdev-afxdp.c >>> @@ -21,6 +21,7 @@ >>> #include "netdev-afxdp.h" >>> #include "netdev-afxdp-pool.h" >>> >>> +#include <bpf/bpf.h> >>> #include <errno.h> >>> #include <inttypes.h> >>> #include <linux/rtnetlink.h> >>> @@ -30,6 +31,7 @@ >>> #include <stdlib.h> >>> #include <sys/resource.h> >>> #include <sys/socket.h> >>> +#include <sys/stat.h> >>> #include <sys/types.h> >>> #include <unistd.h> >>> >>> @@ -93,7 +95,8 @@ static struct xsk_socket_info *xsk_configure(int >>> ifindex, int xdp_queue_id, >>> enum afxdp_mode mode, >>> bool use_need_wakeup, >>> bool >>> report_socket_failures); >>> -static void xsk_remove_xdp_program(uint32_t ifindex, enum >>> afxdp_mode); >>> +static void xsk_remove_xdp_program(uint32_t ifindex, enum >>> afxdp_mode, >>> + int prog_fd, int map_fd); >>> static void xsk_destroy(struct xsk_socket_info *xsk); >>> static int xsk_configure_all(struct netdev *netdev); >>> static void xsk_destroy_all(struct netdev *netdev); >>> @@ -255,6 +258,23 @@ netdev_afxdp_sweep_unused_pools(void *aux >>> OVS_UNUSED) >>> ovs_mutex_unlock(&unused_pools_mutex); >>> } >>> >>> +static int >>> +xsk_load_prog(const char *path, struct bpf_object **obj, >>> + int 
*prog_fd) >>> +{ >>> + struct bpf_prog_load_attr attr = { >>> + .prog_type = BPF_PROG_TYPE_XDP, >>> + .file = path, >>> + }; >>> + >>> + if (bpf_prog_load_xattr(&attr, obj, prog_fd)) { >>> + VLOG_ERR("Can't load XDP program at '%s'", path); >>> + return EINVAL; >>> + } >>> + >>> + return 0; >>> +} >>> + >>> static struct xsk_umem_info * >>> xsk_configure_umem(void *buffer, uint64_t size) >>> { >>> @@ -471,6 +491,50 @@ xsk_configure_queue(struct netdev_linux *dev, >>> int ifindex, int queue_id, >>> return 0; >>> } >>> >>> +static int >>> +xsk_configure_prog(struct netdev *netdev, int ifindex) >>> +{ >>> + struct netdev_linux *dev = netdev_linux_cast(netdev); >>> + struct bpf_object *obj; >>> + uint32_t prog_id = 0; >>> + uint32_t flags; >>> + int prog_fd = 0; >>> + int map_fd = 0; >>> + int mode; >>> + int ret; >>> + >>> + mode = dev->xdp_mode_in_use; >>> + flags = xdp_modes[mode].xdp_flags | >>> XDP_FLAGS_UPDATE_IF_NOEXIST; >>> + >>> + ret = xsk_load_prog(dev->xdp_obj, &obj, &prog_fd); >>> + if (ret) { >>> + goto err; >>> + } >>> + dev->prog_fd = prog_fd; >>> + >>> + bpf_set_link_xdp_fd(ifindex, prog_fd, flags); >>> + ret = bpf_get_link_xdp_id(ifindex, &prog_id, flags); >>> + if (ret < 0) { >>> + VLOG_ERR("%s: Cannot get XDP prog id.", >>> + netdev_get_name(netdev)); >>> + goto err; >>> + } >>> + >>> + map_fd = bpf_object__find_map_fd_by_name(obj, "xsks_map"); >>> + if (map_fd < 0) { >>> + VLOG_ERR("%s: Cannot find \"xsks_map\".", >>> + netdev_get_name(netdev)); >>> + goto err; >>> + } >>> + dev->map_fd = map_fd; >>> + >>> + VLOG_INFO("%s: Loaded custom XDP program at %s prog_id %d.", >>> + netdev_get_name(netdev), dev->xdp_obj, prog_id); >>> + return 0; >>> + >>> +err: >>> + return ret; >>> +} >>> >>> static int >>> xsk_configure_all(struct netdev *netdev) >>> @@ -507,6 +571,13 @@ xsk_configure_all(struct netdev *netdev) >>> qid++; >>> } else { >>> dev->xdp_mode_in_use = dev->xdp_mode; >>> + if (dev->xdp_obj) { >>> + /* XDP program is per-netdev, so all queues 
share >>> + * the same XDP program. */ >>> + if (xsk_configure_prog(netdev, ifindex)) { >>> + goto err; >>> + } >>> + } >>> } >>> >>> /* Configure remaining queues. */ >>> @@ -581,7 +652,12 @@ xsk_destroy_all(struct netdev *netdev) >>> >>> VLOG_INFO("%s: Removing xdp program.", >>> netdev_get_name(netdev)); >>> ifindex = linux_get_ifindex(netdev_get_name(netdev)); >>> - xsk_remove_xdp_program(ifindex, dev->xdp_mode_in_use); >>> + xsk_remove_xdp_program(ifindex, dev->xdp_mode_in_use, >>> dev->prog_fd, >>> + dev->map_fd); >>> + dev->prog_fd = 0; >>> + dev->map_fd = 0; >>> + free(CONST_CAST(char *, dev->xdp_obj)); >>> + dev->xdp_obj = NULL; >>> >>> if (dev->tx_locks) { >>> for (i = 0; i < netdev_n_txq(netdev); i++) { >>> @@ -598,9 +674,11 @@ netdev_afxdp_set_config(struct netdev *netdev, >>> const struct smap *args, >>> { >>> struct netdev_linux *dev = netdev_linux_cast(netdev); >>> const char *str_xdp_mode; >>> + const char *str_xdp_obj; >>> enum afxdp_mode xdp_mode; >>> bool need_wakeup; >>> int new_n_rxq; >>> + struct stat s; >>> >>> ovs_mutex_lock(&dev->mutex); >>> new_n_rxq = MAX(smap_get_int(args, "n_rxq", NR_QUEUE), 1); >>> @@ -634,12 +712,34 @@ netdev_afxdp_set_config(struct netdev *netdev, >>> const struct smap *args, >>> } >>> #endif >>> >>> + str_xdp_obj = smap_get_def(args, "xdp-obj", NULL); >>> + if (str_xdp_obj) { >>> + if (stat(str_xdp_obj, &s)) { >>> + ovs_mutex_unlock(&dev->mutex); >>> + VLOG_ERR("Invalid xdp-obj '%s': %s.", str_xdp_obj, >>> + ovs_strerror(errno)); >>> + return EINVAL; >>> + } else if (!S_ISREG(s.st_mode)) { >>> + ovs_mutex_unlock(&dev->mutex); >>> + VLOG_ERR("xdp-obj '%s' is not a regular file.", >>> str_xdp_obj); >>> + return EINVAL; >>> + } >>> + } >>> + >>> + if (str_xdp_obj && xdp_mode == OVS_AF_XDP_MODE_BEST_EFFORT) { >>> + ovs_mutex_unlock(&dev->mutex); >>> + VLOG_ERR("best-effort mode and xdp-obj can't be set >>> together"); >>> + return EINVAL; >>> + } >>> + >>> if (dev->requested_n_rxq != new_n_rxq >>> || 
dev->requested_xdp_mode != xdp_mode >>> - || dev->requested_need_wakeup != need_wakeup) { >>> + || dev->requested_need_wakeup != need_wakeup >>> + || !nullable_string_is_equal(dev->requested_xdp_obj, >>> str_xdp_obj)) { >>> dev->requested_n_rxq = new_n_rxq; >>> dev->requested_xdp_mode = xdp_mode; >>> dev->requested_need_wakeup = need_wakeup; >>> + dev->requested_xdp_obj = nullable_xstrdup(str_xdp_obj); >>> netdev_request_reconfigure(netdev); >>> } >>> ovs_mutex_unlock(&dev->mutex); >>> @@ -658,6 +758,8 @@ netdev_afxdp_get_config(const struct netdev >>> *netdev, struct smap *args) >>> xdp_modes[dev->xdp_mode_in_use].name); >>> smap_add_format(args, "use-need-wakeup", "%s", >>> dev->use_need_wakeup ? "true" : "false"); >>> + smap_add_format(args, "xdp-obj", "%s", >>> + dev->xdp_obj ? dev->xdp_obj : "builtin"); >>> ovs_mutex_unlock(&dev->mutex); >>> return 0; >>> } >>> @@ -674,7 +776,8 @@ netdev_afxdp_reconfigure(struct netdev *netdev) >>> if (netdev->n_rxq == dev->requested_n_rxq >>> && dev->xdp_mode == dev->requested_xdp_mode >>> && dev->use_need_wakeup == dev->requested_need_wakeup >>> - && dev->xsks) { >>> + && dev->xsks >>> + && nullable_string_is_equal(dev->xdp_obj, >>> dev->requested_xdp_obj)) { >>> goto out; >>> } >>> >>> @@ -692,6 +795,8 @@ netdev_afxdp_reconfigure(struct netdev *netdev) >>> } >>> dev->use_need_wakeup = dev->requested_need_wakeup; >>> >>> + dev->xdp_obj = dev->requested_xdp_obj; >>> + >>> err = xsk_configure_all(netdev); >>> if (err) { >>> VLOG_ERR("%s: AF_XDP device reconfiguration failed.", >>> @@ -715,7 +820,8 @@ netdev_afxdp_get_numa_id(const struct netdev >>> *netdev) >>> } >>> >>> static void >>> -xsk_remove_xdp_program(uint32_t ifindex, enum afxdp_mode mode) >>> +xsk_remove_xdp_program(uint32_t ifindex, enum afxdp_mode mode, >>> + int prog_fd, int map_fd) >>> { >>> uint32_t flags = xdp_modes[mode].xdp_flags | >>> XDP_FLAGS_UPDATE_IF_NOEXIST; >>> uint32_t ret, prog_id = 0; >>> @@ -732,7 +838,19 @@ xsk_remove_xdp_program(uint32_t 
ifindex, enum >>> afxdp_mode mode) >>> return; >>> } >>> >>> - bpf_set_link_xdp_fd(ifindex, -1, flags); >>> + ret = bpf_set_link_xdp_fd(ifindex, -1, flags); >>> + if (ret) { >>> + VLOG_ERR("Failed to unload prog ID: %d", prog_id); >>> + } >>> + >>> + if (prog_fd) { >>> + close(prog_fd); >>> + } >>> + if (map_fd) { >>> + close(map_fd); >>> + } >>> + >>> + VLOG_INFO("Removed XDP program ID: %d", prog_id); >>> } >>> >>> void >>> @@ -744,7 +862,8 @@ signal_remove_xdp(struct netdev *netdev) >>> ifindex = linux_get_ifindex(netdev_get_name(netdev)); >>> >>> VLOG_WARN("Force removing xdp program."); >>> - xsk_remove_xdp_program(ifindex, dev->xdp_mode_in_use); >>> + xsk_remove_xdp_program(ifindex, dev->xdp_mode_in_use, >>> + dev->prog_fd, dev->map_fd); >>> } >>> >>> static struct dp_packet_afxdp * >>> @@ -1158,10 +1277,12 @@ netdev_afxdp_construct(struct netdev >>> *netdev) >>> netdev->n_txq = 0; >>> dev->xdp_mode = OVS_AF_XDP_MODE_UNSPEC; >>> dev->xdp_mode_in_use = OVS_AF_XDP_MODE_UNSPEC; >>> + dev->xdp_obj = NULL; >>> >>> dev->requested_n_rxq = NR_QUEUE; >>> dev->requested_xdp_mode = OVS_AF_XDP_MODE_BEST_EFFORT; >>> dev->requested_need_wakeup = NEED_WAKEUP_DEFAULT; >>> + dev->requested_xdp_obj = NULL; >>> >>> dev->xsks = NULL; >>> dev->tx_locks = NULL; >>> diff --git a/lib/netdev-linux-private.h b/lib/netdev-linux-private.h >>> index f08159aa7b53..190927cec098 100644 >>> --- a/lib/netdev-linux-private.h >>> +++ b/lib/netdev-linux-private.h >>> @@ -109,6 +109,10 @@ struct netdev_linux { >>> bool requested_need_wakeup; >>> >>> struct netdev_afxdp_tx_lock *tx_locks; /* Array of locks for >>> TX queues. */ >>> + const char *xdp_obj; /* XDP object file path. */ >>> + const char *requested_xdp_obj; >>> + int prog_fd; >>> + int map_fd; >>> #endif >>> }; >>> >>> -- >>> 2.7.4 >>> > > _______________________________________________ > dev mailing list > dev@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
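
The backtrace in the report above stops at `xsk_info = dev->xsks[i]` in `netdev_afxdp_get_custom_stats`. One possible reading, which is an assumption and not something confirmed in this thread, is that the `xsks` array is NULL (or no longer valid) once reconfiguration has failed, while the stats path still walks `netdev_n_rxq(netdev)` queues. The sketch below only illustrates that kind of guard. The types `fake_netdev` and `xsk_socket_info` and the function `collect_stats` are hypothetical stand-ins, not the real OVS definitions and not the eventual fix.

```c
#include <stddef.h>

/* Hypothetical stand-in for the per-queue socket state. */
struct xsk_socket_info {
    unsigned long rx_dropped;
};

/* Hypothetical stand-in for the device; 'xsks' is assumed to be NULL
 * when the last reconfiguration failed and the sockets were torn down. */
struct fake_netdev {
    int n_rxq;
    struct xsk_socket_info **xsks;
};

/* Sums one counter across all queues.  Tolerates both a missing 'xsks'
 * array and individual NULL entries.  Stores the sum in '*total' and
 * returns the number of sockets actually visited. */
static int
collect_stats(const struct fake_netdev *dev, unsigned long *total)
{
    int visited = 0;

    *total = 0;
    if (!dev->xsks) {
        /* Nothing configured: report empty stats without dereferencing. */
        return 0;
    }

    for (int i = 0; i < dev->n_rxq; i++) {
        const struct xsk_socket_info *xsk_info = dev->xsks[i];

        if (xsk_info) {
            *total += xsk_info->rx_dropped;
            visited++;
        }
    }
    return visited;
}
```

A caller would pass the device and a counter to fill in; with `xsks` set to NULL the function returns 0 and leaves the total at 0. Whether the real code also needs locking against a concurrent reconfiguration is outside what this sketch covers.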
diff --git a/Documentation/intro/install/afxdp.rst b/Documentation/intro/install/afxdp.rst
index 15e3c918f942..e72bb3edabe6 100644
--- a/Documentation/intro/install/afxdp.rst
+++ b/Documentation/intro/install/afxdp.rst
@@ -283,6 +283,65 @@ Or, use OVS pmd tool::
 
   ovs-appctl dpif-netdev/pmd-stats-show
 
+Loading Custom XDP Program
+--------------------------
+By default, netdev-afxdp always forwards all packets to userspace because
+it is using libbpf's default XDP program. There are some cases when users
+want to keep packets in kernel instead of sending to userspace, for example,
+management traffic such as SSH should be processed in kernel. This can be
+done by loading the user-provided XDP program::
+
+  ovs-vsctl -- set int afxdp-p0 options:xdp-obj=<path/to/xdp/obj>
+
+So users can implement their filtering logic or traffic steering idea
+in their XDP program, and rest of the traffic passes to AF_XDP socket
+handled by OVS. To set it back to default, use::
+
+  ovs-vsctl remove int afxdp-p0 options xdp-obj
+
+Below is a sample C program compiled under kernel's samples/bpf/.
+
+.. code-block:: c
+
+  #include <uapi/linux/bpf.h>
+  #include "bpf_helpers.h"
+
+  #if LINUX_VERSION_CODE < KERNEL_VERSION(5,3,0)
+  /* Kernel version before 5.3 needed an additional map */
+  struct bpf_map_def SEC("maps") qidconf_map = {
+      .type = BPF_MAP_TYPE_ARRAY,
+      .key_size = sizeof(int),
+      .value_size = sizeof(int),
+      .max_entries = 64,
+  };
+  #endif
+
+  /* OVS will associate map 'xsks_map' to xsk socket. */
+  struct bpf_map_def SEC("maps") xsks_map = {
+      .type = BPF_MAP_TYPE_XSKMAP,
+      .key_size = sizeof(int),
+      .value_size = sizeof(int),
+      .max_entries = 32,
+  };
+
+  SEC("xdp_sock")
+  int xdp_sock_prog(struct xdp_md *ctx)
+  {
+      int index = ctx->rx_queue_index;
+
+      /* Customized by user.
+       * For example
+       * 1) filter out all SSH traffic and return XDP_PASS
+       *    for kernel to process.
+       * 2) Drop unwanted packet by returning XDP_DROP.
+       */
+
+      /* Rest of packets goes to AF_XDP. */
+      return bpf_redirect_map(&xsks_map, index, 0);
+  }
+  char _license[] SEC("license") = "GPL";
+
+
 Example Script
 --------------
 
diff --git a/NEWS b/NEWS
index e8d662a0c15f..a939262ce09e 100644
--- a/NEWS
+++ b/NEWS
@@ -19,6 +19,8 @@ Post-v2.12.0
          generic               - former SKB
          best-effort [default] - new one, chooses the best available from
                                  3 above modes
+     * New option 'xdp-obj' for loading custom XDP program.  Default uses
+       the libbpf builtin XDP program.
    - DPDK:
      * DPDK pdump packet capture support disabled by default. New configure
        option '--enable-dpdk-pdump' to enable it.
diff --git a/lib/netdev-afxdp.c b/lib/netdev-afxdp.c
index 6ac0bc2dde90..421566e36a40 100644
--- a/lib/netdev-afxdp.c
+++ b/lib/netdev-afxdp.c
@@ -21,6 +21,7 @@
 #include "netdev-afxdp.h"
 #include "netdev-afxdp-pool.h"
 
+#include <bpf/bpf.h>
 #include <errno.h>
 #include <inttypes.h>
 #include <linux/rtnetlink.h>
@@ -30,6 +31,7 @@
 #include <stdlib.h>
 #include <sys/resource.h>
 #include <sys/socket.h>
+#include <sys/stat.h>
 #include <sys/types.h>
 #include <unistd.h>
 
@@ -93,7 +95,8 @@ static struct xsk_socket_info *xsk_configure(int ifindex, int xdp_queue_id,
                                              enum afxdp_mode mode,
                                              bool use_need_wakeup,
                                              bool report_socket_failures);
-static void xsk_remove_xdp_program(uint32_t ifindex, enum afxdp_mode);
+static void xsk_remove_xdp_program(uint32_t ifindex, enum afxdp_mode,
+                                   int prog_fd, int map_fd);
 static void xsk_destroy(struct xsk_socket_info *xsk);
 static int xsk_configure_all(struct netdev *netdev);
 static void xsk_destroy_all(struct netdev *netdev);
@@ -255,6 +258,23 @@ netdev_afxdp_sweep_unused_pools(void *aux OVS_UNUSED)
     ovs_mutex_unlock(&unused_pools_mutex);
 }
 
+static int
+xsk_load_prog(const char *path, struct bpf_object **obj,
+              int *prog_fd)
+{
+    struct bpf_prog_load_attr attr = {
+        .prog_type = BPF_PROG_TYPE_XDP,
+        .file = path,
+    };
+
+    if (bpf_prog_load_xattr(&attr, obj, prog_fd)) {
+        VLOG_ERR("Can't load XDP program at '%s'", path);
+        return EINVAL;
+    }
+
+    return 0;
+}
+
 static struct xsk_umem_info *
 xsk_configure_umem(void *buffer, uint64_t size)
 {
@@ -471,6 +491,50 @@ xsk_configure_queue(struct netdev_linux *dev, int ifindex, int queue_id,
     return 0;
 }
 
+static int
+xsk_configure_prog(struct netdev *netdev, int ifindex)
+{
+    struct netdev_linux *dev = netdev_linux_cast(netdev);
+    struct bpf_object *obj;
+    uint32_t prog_id = 0;
+    uint32_t flags;
+    int prog_fd = 0;
+    int map_fd = 0;
+    int mode;
+    int ret;
+
+    mode = dev->xdp_mode_in_use;
+    flags = xdp_modes[mode].xdp_flags | XDP_FLAGS_UPDATE_IF_NOEXIST;
+
+    ret = xsk_load_prog(dev->xdp_obj, &obj, &prog_fd);
+    if (ret) {
+        goto err;
+    }
+    dev->prog_fd = prog_fd;
+
+    bpf_set_link_xdp_fd(ifindex, prog_fd, flags);
+    ret = bpf_get_link_xdp_id(ifindex, &prog_id, flags);
+    if (ret < 0) {
+        VLOG_ERR("%s: Cannot get XDP prog id.",
+                 netdev_get_name(netdev));
+        goto err;
+    }
+
+    map_fd = bpf_object__find_map_fd_by_name(obj, "xsks_map");
+    if (map_fd < 0) {
+        VLOG_ERR("%s: Cannot find \"xsks_map\".",
+                 netdev_get_name(netdev));
+        goto err;
+    }
+    dev->map_fd = map_fd;
+
+    VLOG_INFO("%s: Loaded custom XDP program at %s prog_id %d.",
+              netdev_get_name(netdev), dev->xdp_obj, prog_id);
+    return 0;
+
+err:
+    return ret;
+}
 
 static int
 xsk_configure_all(struct netdev *netdev)
@@ -507,6 +571,13 @@ xsk_configure_all(struct netdev *netdev)
         qid++;
     } else {
         dev->xdp_mode_in_use = dev->xdp_mode;
+        if (dev->xdp_obj) {
+            /* XDP program is per-netdev, so all queues share
+             * the same XDP program. */
+            if (xsk_configure_prog(netdev, ifindex)) {
+                goto err;
+            }
+        }
     }
 
     /* Configure remaining queues. */
@@ -581,7 +652,12 @@ xsk_destroy_all(struct netdev *netdev)
 
     VLOG_INFO("%s: Removing xdp program.", netdev_get_name(netdev));
     ifindex = linux_get_ifindex(netdev_get_name(netdev));
-    xsk_remove_xdp_program(ifindex, dev->xdp_mode_in_use);
+    xsk_remove_xdp_program(ifindex, dev->xdp_mode_in_use, dev->prog_fd,
+                           dev->map_fd);
+    dev->prog_fd = 0;
+    dev->map_fd = 0;
+    free(CONST_CAST(char *, dev->xdp_obj));
+    dev->xdp_obj = NULL;
 
     if (dev->tx_locks) {
         for (i = 0; i < netdev_n_txq(netdev); i++) {
@@ -598,9 +674,11 @@ netdev_afxdp_set_config(struct netdev *netdev, const struct smap *args,
 {
     struct netdev_linux *dev = netdev_linux_cast(netdev);
     const char *str_xdp_mode;
+    const char *str_xdp_obj;
     enum afxdp_mode xdp_mode;
     bool need_wakeup;
     int new_n_rxq;
+    struct stat s;
 
     ovs_mutex_lock(&dev->mutex);
     new_n_rxq = MAX(smap_get_int(args, "n_rxq", NR_QUEUE), 1);
@@ -634,12 +712,34 @@ netdev_afxdp_set_config(struct netdev *netdev, const struct smap *args,
     }
 #endif
 
+    str_xdp_obj = smap_get_def(args, "xdp-obj", NULL);
+    if (str_xdp_obj) {
+        if (stat(str_xdp_obj, &s)) {
+            ovs_mutex_unlock(&dev->mutex);
+            VLOG_ERR("Invalid xdp-obj '%s': %s.", str_xdp_obj,
+                     ovs_strerror(errno));
+            return EINVAL;
+        } else if (!S_ISREG(s.st_mode)) {
+            ovs_mutex_unlock(&dev->mutex);
+            VLOG_ERR("xdp-obj '%s' is not a regular file.", str_xdp_obj);
+            return EINVAL;
+        }
+    }
+
+    if (str_xdp_obj && xdp_mode == OVS_AF_XDP_MODE_BEST_EFFORT) {
+        ovs_mutex_unlock(&dev->mutex);
+        VLOG_ERR("best-effort mode and xdp-obj can't be set together");
+        return EINVAL;
+    }
+
     if (dev->requested_n_rxq != new_n_rxq
         || dev->requested_xdp_mode != xdp_mode
-        || dev->requested_need_wakeup != need_wakeup) {
+        || dev->requested_need_wakeup != need_wakeup
+        || !nullable_string_is_equal(dev->requested_xdp_obj, str_xdp_obj)) {
         dev->requested_n_rxq = new_n_rxq;
         dev->requested_xdp_mode = xdp_mode;
         dev->requested_need_wakeup = need_wakeup;
+        dev->requested_xdp_obj = nullable_xstrdup(str_xdp_obj);
         netdev_request_reconfigure(netdev);
     }
     ovs_mutex_unlock(&dev->mutex);
@@ -658,6 +758,8 @@ netdev_afxdp_get_config(const struct netdev *netdev, struct smap *args)
                     xdp_modes[dev->xdp_mode_in_use].name);
     smap_add_format(args, "use-need-wakeup", "%s",
                     dev->use_need_wakeup ? "true" : "false");
+    smap_add_format(args, "xdp-obj", "%s",
+                    dev->xdp_obj ? dev->xdp_obj : "builtin");
     ovs_mutex_unlock(&dev->mutex);
     return 0;
 }
@@ -674,7 +776,8 @@ netdev_afxdp_reconfigure(struct netdev *netdev)
     if (netdev->n_rxq == dev->requested_n_rxq
         && dev->xdp_mode == dev->requested_xdp_mode
         && dev->use_need_wakeup == dev->requested_need_wakeup
-        && dev->xsks) {
+        && dev->xsks
+        && nullable_string_is_equal(dev->xdp_obj, dev->requested_xdp_obj)) {
         goto out;
     }
 
@@ -692,6 +795,8 @@ netdev_afxdp_reconfigure(struct netdev *netdev)
     }
     dev->use_need_wakeup = dev->requested_need_wakeup;
 
+    dev->xdp_obj = dev->requested_xdp_obj;
+
     err = xsk_configure_all(netdev);
     if (err) {
         VLOG_ERR("%s: AF_XDP device reconfiguration failed.",
@@ -715,7 +820,8 @@ netdev_afxdp_get_numa_id(const struct netdev *netdev)
 }
 
 static void
-xsk_remove_xdp_program(uint32_t ifindex, enum afxdp_mode mode)
+xsk_remove_xdp_program(uint32_t ifindex, enum afxdp_mode mode,
+                       int prog_fd, int map_fd)
 {
     uint32_t flags = xdp_modes[mode].xdp_flags | XDP_FLAGS_UPDATE_IF_NOEXIST;
     uint32_t ret, prog_id = 0;
@@ -732,7 +838,19 @@ xsk_remove_xdp_program(uint32_t ifindex, enum afxdp_mode mode)
         return;
     }
 
-    bpf_set_link_xdp_fd(ifindex, -1, flags);
+    ret = bpf_set_link_xdp_fd(ifindex, -1, flags);
+    if (ret) {
+        VLOG_ERR("Failed to unload prog ID: %d", prog_id);
+    }
+
+    if (prog_fd) {
+        close(prog_fd);
+    }
+    if (map_fd) {
+        close(map_fd);
+    }
+
+    VLOG_INFO("Removed XDP program ID: %d", prog_id);
 }
 
 void
@@ -744,7 +862,8 @@ signal_remove_xdp(struct netdev *netdev)
     ifindex = linux_get_ifindex(netdev_get_name(netdev));
 
     VLOG_WARN("Force removing xdp program.");
-    xsk_remove_xdp_program(ifindex, dev->xdp_mode_in_use);
+    xsk_remove_xdp_program(ifindex, dev->xdp_mode_in_use,
+                           dev->prog_fd, dev->map_fd);
 }
 
 static struct dp_packet_afxdp *
@@ -1158,10 +1277,12 @@ netdev_afxdp_construct(struct netdev *netdev)
     netdev->n_txq = 0;
     dev->xdp_mode = OVS_AF_XDP_MODE_UNSPEC;
     dev->xdp_mode_in_use = OVS_AF_XDP_MODE_UNSPEC;
+    dev->xdp_obj = NULL;
 
     dev->requested_n_rxq = NR_QUEUE;
     dev->requested_xdp_mode = OVS_AF_XDP_MODE_BEST_EFFORT;
     dev->requested_need_wakeup = NEED_WAKEUP_DEFAULT;
+    dev->requested_xdp_obj = NULL;
 
     dev->xsks = NULL;
     dev->tx_locks = NULL;
diff --git a/lib/netdev-linux-private.h b/lib/netdev-linux-private.h
index f08159aa7b53..190927cec098 100644
--- a/lib/netdev-linux-private.h
+++ b/lib/netdev-linux-private.h
@@ -109,6 +109,10 @@ struct netdev_linux {
     bool requested_need_wakeup;
 
     struct netdev_afxdp_tx_lock *tx_locks;  /* Array of locks for TX queues. */
+    const char *xdp_obj;         /* XDP object file path. */
+    const char *requested_xdp_obj;
+    int prog_fd;
+    int map_fd;
 #endif
 };
 
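The sample program added to afxdp.rst leaves the filtering step as a comment ("filter out all SSH traffic and return XDP_PASS"). As a rough illustration of what that step could look like, here is a minimal sketch of the classification logic written as plain C so it can be compiled and exercised in userspace. The names `classify_packet`, `enum verdict`, and the constants are hypothetical and not part of the patch. The sketch assumes untagged Ethernet carrying IPv4 and treats everything else as traffic for OVS.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical verdicts.  Inside a real XDP program, TO_KERNEL would map
 * to 'return XDP_PASS' and TO_AF_XDP to the bpf_redirect_map() call. */
enum verdict { TO_KERNEL, TO_AF_XDP };

#define ETH_HDR_LEN     14
#define ETH_TYPE_IPV4   0x0800
#define IPV4_MIN_HDR    20
#define IP_PROTO_TCP    6
#define SSH_PORT        22

/* Reads a 16-bit field stored in network (big-endian) byte order. */
static uint16_t
read_be16(const uint8_t *p)
{
    return (uint16_t) ((p[0] << 8) | p[1]);
}

/* Classifies one Ethernet frame of 'len' bytes starting at 'data'.
 * Each header is length-checked before it is read.  In an XDP program
 * 'len' would be ctx->data_end - ctx->data, and the BPF verifier insists
 * on equivalent bounds checks before any packet access. */
static enum verdict
classify_packet(const uint8_t *data, size_t len)
{
    const uint8_t *ip, *tcp;
    size_t ihl;

    if (len < ETH_HDR_LEN + IPV4_MIN_HDR) {
        return TO_AF_XDP;
    }
    if (read_be16(data + 12) != ETH_TYPE_IPV4) {
        return TO_AF_XDP;
    }

    ip = data + ETH_HDR_LEN;
    if ((ip[0] >> 4) != 4 || ip[9] != IP_PROTO_TCP) {
        return TO_AF_XDP;
    }
    /* Non-first fragments carry no TCP header, so do not inspect them. */
    if (read_be16(ip + 6) & 0x1fff) {
        return TO_AF_XDP;
    }

    ihl = (size_t) (ip[0] & 0x0f) * 4;
    if (ihl < IPV4_MIN_HDR || len < ETH_HDR_LEN + ihl + 4) {
        return TO_AF_XDP;       /* Need the TCP source and dest ports. */
    }

    tcp = ip + ihl;
    return read_be16(tcp + 2) == SSH_PORT ? TO_KERNEL : TO_AF_XDP;
}
```

In the documented sample, a check of this kind would sit where the "Customized by user" comment is, ahead of the final `bpf_redirect_map(&xsks_map, index, 0)`. Matching only the destination port is a deliberate simplification; a production filter would likely also need to handle VLAN tags and IPv6.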
Now netdev-afxdp always forwards all packets to userspace because
it is using libbpf's default XDP program, see 'xsk_load_xdp_prog'.
There are some cases when users want to keep packets in kernel instead
of sending to userspace, for example, management traffic such as SSH
should be processed in kernel.

The patch enables loading the user-provided XDP program by
  $ ovs-vsctl -- set int afxdp-p0 options:xdp-obj=<path/to/xdp/obj>

So users can implement their filtering logic or traffic steering idea
in their XDP program, and rest of the traffic passes to AF_XDP socket
handled by OVS.

Signed-off-by: William Tu <u9012063@gmail.com>
---
v6:
- rebase to master
- mostly remains the same as v5, but make sure there is no
  leak using bpftool and no repeated loop issue reported from Eelco
  here:
  https://patchwork.ozlabs.org/patch/1199734/
  which has been fixed at
  netdev-afxdp: Avoid removing of XDP program if not loaded.
- travis: https://travis-ci.org/williamtu/ovs-travis/builds/638126505

v5:
- rebase to master
Feedback from Eelco:
- Remove xdp-obj="__default__" case, to remove xdp-obj, use
  ovs-vsctl remove int <dev> options xdp-obj
- Fix problem of xdp program not unloading
  verify by bpftool.
- use xdp-obj instead of xdpobj
- Limitation: xdp-obj doesn't work when using best-effort-mode
  because best-effort mode tried to probe mode by setting up queue,
  and loading xdp-obj requires knowing mode in advance.
  (to support it, we might need to use the
  XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD as in v3)

Testing
- I place two XDP binaries here
  https://drive.google.com/open?id=1QCCdNE-5CwlKCFV6Upg9mOPnnbVkUwA5
  [xdpsock_pass.o] Working one, which forwards packets to dpif-netdev
  [xdpsock_invalid.o] invalid one, which has no map

v4:
Feedback from Eelco.
- First load the program, then configure xsk.
  Let API take care of xdp prog and map loading, don't set
  XSK_LIBBPF_FLAGS__INHIBIT_PROG_LOAD.
- When loading custom xdp, need to close(prog_fd) and close(map_fd)
  to release the resources
- make sure prog and map is unloaded by bpftool.
- update doc, afxdp.rst
- Tested-at: https://travis-ci.org/williamtu/ovs-travis/builds/608986781

v3:
Feedback from Eelco.
- keep using xdpobj not xdp-obj (because we already use xdpmode)
  or we change both to xdp-obj and xdp-mode?
- log an info message when using external program for better debugging
- combine some failure messages
- update doc
NEW:
- add options:xdpobj=__default__, to set back to libbpf default prog
- Tested-at: https://travis-ci.org/williamtu/ovs-travis/builds/606153231

v2:
A couple fixes and remove RFC
---
 Documentation/intro/install/afxdp.rst |  59 +++++++++++++++
 NEWS                                  |   2 +
 lib/netdev-afxdp.c                    | 135 ++++++++++++++++++++++++++++++++--
 lib/netdev-linux-private.h            |   4 +
 4 files changed, 193 insertions(+), 7 deletions(-)
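The v5 notes describe a limitation: xdp-obj cannot be combined with best-effort mode, because loading the object requires knowing the XDP mode in advance. The following is a minimal standalone sketch of the acceptance rules that `netdev_afxdp_set_config()` applies in the diff, reduced to a pure function so the decision table is easy to see. The helper name `check_xdp_obj` and its boolean parameter are hypothetical and exist only for this illustration; the real code also logs the reason and unlocks the device mutex on each failure path.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/stat.h>

/* Hypothetical helper, not part of the patch.  Returns 0 if 'path' is
 * acceptable as an xdp-obj value, otherwise a positive errno value. */
static int
check_xdp_obj(const char *path, bool best_effort_mode)
{
    struct stat s;

    if (!path) {
        return 0;           /* Option unset: libbpf's built-in program. */
    }
    if (stat(path, &s)) {
        return EINVAL;      /* Missing or inaccessible path. */
    }
    if (!S_ISREG(s.st_mode)) {
        return EINVAL;      /* Directories, devices, etc. are rejected. */
    }
    if (best_effort_mode) {
        return EINVAL;      /* Mode must be fixed before loading. */
    }
    return 0;
}
```

Under these rules a user who wants a custom program would first pick an explicit xdp-mode (for example native or generic) and then set xdp-obj. Note that the checks only cover the path itself; whether the object actually contains a loadable XDP program and an `xsks_map` is only discovered later, at reconfigure time.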