Message ID: 1554158812-44622-1-git-send-email-u9012063@gmail.com
Series: AF_XDP netdev support for OVS
On Mon, Apr 01, 2019 at 03:46:48PM -0700, William Tu wrote:
> The patch series introduces AF_XDP support for OVS netdev.
> AF_XDP is a new address family working together with eBPF.
> In short, a socket with AF_XDP family can receive and send
> packets from an eBPF/XDP program attached to the netdev.
> For more details about AF_XDP, please see linux kernel's
> Documentation/networking/af_xdp.rst

I'm glad to see some more revisions of this series!

AF_XDP is a faster way to access the existing kernel devices.  If we
take that point of view, then it would be ideal if AF_XDP were
automatically used when it was available, instead of adding a new
network device type.  Is there a reason that this point of view is
wrong?  That is, when AF_XDP is available, is there a reason not to
use it?

You said that your goal for the next version is to improve performance
and add optimizations.  Do you think that is important before we merge
the series?  We can continue to improve performance after it is
merged.

If we set performance aside, do you have a reason to want to wait to
merge this?  (I wasn't able to easily apply this series to current
master, so it'll need at least a rebase before we apply it.  And I
have only skimmed it, not fully reviewed it.)

It might make sense to squash all of these into a single patch.  I am
not sure that they are really distinct conceptually.
On 16 Apr 2019, at 21:55, Ben Pfaff wrote:

> On Mon, Apr 01, 2019 at 03:46:48PM -0700, William Tu wrote:
>> The patch series introduces AF_XDP support for OVS netdev.
>> AF_XDP is a new address family working together with eBPF.
>> In short, a socket with AF_XDP family can receive and send
>> packets from an eBPF/XDP program attached to the netdev.
>> For more details about AF_XDP, please see linux kernel's
>> Documentation/networking/af_xdp.rst
>
> I'm glad to see some more revisions of this series!

I'm planning on reviewing and testing this patch. I'll try to start it
this week, or else when I get back from PTO.

> AF_XDP is a faster way to access the existing kernel devices.  If we
> take that point of view, then it would be ideal if AF_XDP were
> automatically used when it was available, instead of adding a new
> network device type.  Is there a reason that this point of view is
> wrong?  That is, when AF_XDP is available, is there a reason not to
> use it?

This needs support by all the ingress and egress ports in the system,
and currently there is no API to check this.

There are also features like traffic shaping that will not work. Maybe
it would be worth adding a table for AF_XDP to
http://docs.openvswitch.org/en/latest/faq/releases/

> You said that your goal for the next version is to improve
> performance and add optimizations.  Do you think that is important
> before we merge the series?  We can continue to improve performance
> after it is merged.

The previous patch was rather unstable and I could not get it running
with the PVP test without crashing. I think this patchset should get
some proper testing and reviews by others, especially for all the
features being marked as supported in the above-mentioned table.

> If we set performance aside, do you have a reason to want to wait to
> merge this?  (I wasn't able to easily apply this series to current
> master, so it'll need at least a rebase before we apply it.  And I
> have only skimmed it, not fully reviewed it.)

Other than the items above, do we really need another datapath? With
this, we use two or more cores for processing packets. If we poll two
physical ports it could be 300%, which is a typical use case with
bonding. What about multiple queue support, does it work? In both
kernel and DPDK mode we use multiple queues to distribute the load;
with this scenario, does it double the number of CPUs used? Can we use
the poll() mode as explained here,
https://linuxplumbersconf.org/event/2/contributions/99/, and how will
it work with multiple queues/pmd threads? What about any latency
tests, is it worse or better than kernel/dpdk? Also, with the AF_XDP
datapath there is no way to leverage hardware offload, like DPDK and
TC can. And then there is the part that it only works on the most
recent kernels.

To me, looking at this, I would say it's far from being ready to be
merged into OVS. However, if others decide to go ahead, I think it
should be disabled, not compiled in by default.

> It might make sense to squash all of these into a single patch.  I
> am not sure that they are really distinct conceptually.
On 17 Apr 2019, at 10:09, Eelco Chaudron wrote:

[snip]

One other thing that popped up in my head is how (will) it work
together with DPDK enabled on the same system?

_______________________________________________
dev mailing list
dev@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Hi William,

I think you applied the following patch to get it to compile? Or did
you copy in the kernel headers?

https://www.spinics.net/lists/netdev/msg563507.html

//Eelco

On 2 Apr 2019, at 0:46, William Tu wrote:

> The patch series introduces AF_XDP support for OVS netdev.
> AF_XDP is a new address family working together with eBPF.
> In short, a socket with AF_XDP family can receive and send
> packets from an eBPF/XDP program attached to the netdev.
> For more details about AF_XDP, please see linux kernel's
> Documentation/networking/af_xdp.rst
>
> OVS has a couple of netdev types, i.e., system, tap, or
> internal. The patch first adds a new netdev type called
> "afxdp", and implements its configuration, packet reception,
> and transmit functions. Since the AF_XDP socket, xsk,
> operates in userspace, once ovs-vswitchd receives packets
> from xsk, the proposed architecture re-uses the existing
> userspace dpif-netdev datapath. As a result, most of
> the packet processing happens in userspace instead of
> the linux kernel.
>
> Architecture
> ===========
>              _
>             |  +-------------------+
>             |  |    ovs-vswitchd   |<-->ovsdb-server
>             |  +-------------------+
>             |  |      ofproto      |<-->OpenFlow controllers
>             |  +--------+-+--------+
>             |  | netdev | |ofproto-|
>   userspace |  +--------+ |  dpif  |
>             |  | netdev | +--------+
>             |  |provider| |  dpif  |
>             |  +---||---+ +--------+
>             |      ||     | dpif-  |
>             |      ||     | netdev |
>             |_     ||     +--------+
>                    ||
>              _ +---||-----+--------+
>             |  | af_xdp prog +     |
>      kernel |  |   xsk_map         |
>             |_ +--------||---------+
>                         ||
>                      physical
>                        NIC
>
> To simply start, create an OVS userspace bridge using dpif-netdev
> by setting the datapath_type to netdev:
>   # ovs-vsctl -- add-br br0 -- set Bridge br0 datapath_type=netdev
>
> And attach a linux netdev with type afxdp:
>   # ovs-vsctl add-port br0 afxdp-p0 -- \
>       set interface afxdp-p0 type="afxdp"
>
> Performance
> ===========
> For this version, v4, I mainly focus on making the features right
> with the libbpf AF_XDP API and use the AF_XDP SKB mode, which is the
> slower set-up.  My next version is to measure the performance and
> add optimizations.
>
> Documentation
> =============
> Most of the design details are described in the paper presented at
> Linux Plumber 2018, "Bringing the Power of eBPF to Open vSwitch"[1],
> section 4, and slides[2].
> This patch uses a not-yet-upstreamed feature called XDP_ATTACH[3],
> described in section 3.1, which is a built-in XDP program for
> AF_XDP.  This greatly simplifies the management of XDP/eBPF
> programs.
>
> [1] http://vger.kernel.org/lpc_net2018_talks/ovs-ebpf-afxdp.pdf
> [2] http://vger.kernel.org/lpc_net2018_talks/ovs-ebpf-lpc18-presentation.pdf
> [3] http://vger.kernel.org/lpc_net2018_talks/lpc18_paper_af_xdp_perf-v2.pdf
>
> For installation and configuration guide, see
>   # Documentation/intro/install/bpf.rst
>
> Test Cases
> ==========
> Test cases are created using namespaces and veth peer, with AF_XDP
> socket attached to the veth (thus the SKB_MODE). By issuing
> "make check-afxdp", the patch shows the following:
>
> AF_XDP netdev datapath-sanity
>
>   1: datapath - ping between two ports                ok
>   2: datapath - ping between two ports on vlan        ok
>   3: datapath - ping6 between two ports               ok
>   4: datapath - ping6 between two ports on vlan       ok
>   5: datapath - ping over vxlan tunnel                ok
>   6: datapath - ping over vxlan6 tunnel               ok
>   7: datapath - ping over gre tunnel                  ok
>   8: datapath - ping over erspan v1 tunnel            ok
>   9: datapath - ping over erspan v2 tunnel            ok
>  10: datapath - ping over ip6erspan v1 tunnel         ok
>  11: datapath - ping over ip6erspan v2 tunnel         ok
>  12: datapath - ping over geneve tunnel               ok
>  13: datapath - ping over geneve6 tunnel              ok
>  14: datapath - clone action                          ok
>  15: datapath - basic truncate action                 ok
>
> conntrack
>
>  16: conntrack - controller                           ok
>  17: conntrack - force commit                         ok
>  18: conntrack - ct flush by 5-tuple                  ok
>  19: conntrack - IPv4 ping                            ok
>  20: conntrack - get_nconns and get/set_maxconns      ok
>  21: conntrack - IPv6 ping                            ok
>
> system-ovn
>
>  22: ovn -- 2 LRs connected via LS, gateway router, SNAT and DNAT  ok
>  23: ovn -- 2 LRs connected via LS, gateway router, easy SNAT      ok
>  24: ovn -- multiple gateway routers, SNAT and DNAT                ok
>  25: ovn -- load-balancing                                         ok
>  26: ovn -- load-balancing - same subnet                           ok
>  27: ovn -- load balancing in gateway router                       ok
>  28: ovn -- multiple gateway routers, load-balancing               ok
>  29: ovn -- load balancing in router with gateway router port      ok
>  30: ovn -- DNAT and SNAT on distributed router - N/S              ok
>  31: ovn -- DNAT and SNAT on distributed router - E/W              ok
>
> ---
> v1->v2:
> - add a list to maintain unused umem elements
> - remove copy from rx umem to ovs internal buffer
> - use hugetlb to reduce misses (not much difference)
> - use pmd mode netdev in OVS (huge performance improvement)
> - remove malloc dp_packet, instead put dp_packet in umem
>
> v2->v3:
> - rebase on the OVS master, 7ab4b0653784
>   ("configure: Check for more specific function to pull in pthread
>   library.")
> - remove the dependency on libbpf and dpif-bpf;
>   instead, use the built-in XDP_ATTACH feature.
> - data structure optimizations for better performance, see [1]
> - more test cases support
> v3: https://mail.openvswitch.org/pipermail/ovs-dev/2018-November/354179.html
>
> v3->v4:
> - Use AF_XDP API provided by libbpf
> - Remove the dependency on XDP_ATTACH kernel patch set
> - Add documentation, bpf.rst
>
> William Tu (4):
>   Add libbpf build support.
>   netdev-afxdp: add new netdev type for AF_XDP
>   tests: add AF_XDP netdev test cases.
>   afxdp netdev: add documentation and configuration.
>
>  Documentation/automake.mk             |   1 +
>  Documentation/index.rst               |   1 +
>  Documentation/intro/install/bpf.rst   | 182 +++++++
>  Documentation/intro/install/index.rst |   1 +
>  acinclude.m4                          |  20 +
>  configure.ac                          |   1 +
>  lib/automake.mk                       |   7 +-
>  lib/dp-packet.c                       |  12 +
>  lib/dp-packet.h                       |  32 +-
>  lib/dpif-netdev.c                     |   2 +-
>  lib/netdev-afxdp.c                    | 491 +++++++++++++++++
>  lib/netdev-afxdp.h                    |  39 ++
>  lib/netdev-linux.c                    |  78 ++-
>  lib/netdev-provider.h                 |   1 +
>  lib/netdev.c                          |   1 +
>  lib/xdpsock.c                         | 179 +++++++
>  lib/xdpsock.h                         | 129 +++++
>  tests/automake.mk                     |  17 +
>  tests/system-afxdp-macros.at          | 153 ++++++
>  tests/system-afxdp-testsuite.at       |  26 +
>  tests/system-afxdp-traffic.at         | 978 ++++++++++++++++++++++++++++++++++
>  21 files changed, 2345 insertions(+), 6 deletions(-)
>  create mode 100644 Documentation/intro/install/bpf.rst
>  create mode 100644 lib/netdev-afxdp.c
>  create mode 100644 lib/netdev-afxdp.h
>  create mode 100644 lib/xdpsock.c
>  create mode 100644 lib/xdpsock.h
>  create mode 100644 tests/system-afxdp-macros.at
>  create mode 100644 tests/system-afxdp-testsuite.at
>  create mode 100644 tests/system-afxdp-traffic.at
>
> --
> 2.7.4
On 17 Apr 2019, at 14:01, Eelco Chaudron wrote:

> Hi William,
>
> I think you applied the following patch to get it to compile? Or did
> you copy in the kernel headers?
>
> https://www.spinics.net/lists/netdev/msg563507.html

I noticed you duplicated the macros, which resulted in all kinds of
compile errors, so I removed them and applied the two patches above,
which got me to the next step.

I'm building it with DPDK enabled, and it was causing all kinds of
duplicate definition errors as the kernel and DPDK re-use some
structure names. To get it all compiled and working I had to make the
following changes:

$ git diff
diff --git a/lib/netdev-afxdp.c b/lib/netdev-afxdp.c
index b3bf2f044..47fb3342a 100644
--- a/lib/netdev-afxdp.c
+++ b/lib/netdev-afxdp.c
@@ -295,7 +295,7 @@ netdev_linux_rxq_xsk(struct xsk_socket_info *xsk,
     uint32_t idx_rx = 0, idx_fq = 0;
     int ret = 0;
-    unsigned int non_afxdp;
+    unsigned int non_afxdp = 0;

     /* See if there is any packet on RX queue,
      * if yes, idx_rx is the index having the packet.
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 47153dc60..77f2150ab 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -24,7 +24,7 @@
 #include <unistd.h>
 #include <linux/virtio_net.h>
 #include <sys/socket.h>
-#include <linux/if.h>
+//#include <linux/if.h>

 #include <rte_bus_pci.h>
 #include <rte_config.h>
diff --git a/lib/xdpsock.h b/lib/xdpsock.h
index 8df8fa451..a2ed1a136 100644
--- a/lib/xdpsock.h
+++ b/lib/xdpsock.h
@@ -28,7 +28,7 @@
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
-#include <net/ethernet.h>
+//#include <net/ethernet.h>
 #include <sys/resource.h>
 #include <sys/socket.h>
 #include <sys/mman.h>
@@ -43,14 +43,6 @@
 #include "ovs-atomic.h"
 #include "openvswitch/thread.h"

-/* bpf/xsk.h uses the following macros not defined in OVS,
- * so re-define them before include.
- */
-#define unlikely OVS_UNLIKELY
-#define likely OVS_LIKELY
-#define barrier() __asm__ __volatile__("": : :"memory")
-#define smp_rmb() barrier()
-#define smp_wmb() barrier()
 #include <bpf/xsk.h>

In addition, you need to do "make install_headers" from the kernel's
libbpf and copy libbpf_util.h manually.

I was able to do a simple physical-port-in to same-physical-port-out
test without crashing, but the numbers seem low:

$ ovs-ofctl dump-flows ovs_pvp_br0
 cookie=0x0, duration=210.344s, table=0, n_packets=1784692,
 n_bytes=2694884920, in_port=eno1 actions=IN_PORT

"Physical loopback test, L3 flows[port redirect]"
,Packet size
Number of flows,64,128,256,512,768,1024,1514
100,77574,77329,76605,76417,75539,75252,74617

The above is using two cores, but with a single DPDK core I get the
following (on the same machine):

"Physical loopback test, L3 flows[port redirect]"
,Packet size
Number of flows,64,128,256,512,768,1024,1514
100,9527075,8445852,4528935,2349597,1586276,1197304,814854

For the kernel datapath the numbers are:

"Physical loopback test, L3 flows[port redirect]"
,Packet size
Number of flows,64,128,256,512,768,1024,1514
100,4862995,5521870,4528872,2349596,1586277,1197305,814854

But keep in mind it uses roughly 550/610/520/380/180/140/110% of the
CPU for the respective packet sizes.

> On 2 Apr 2019, at 0:46, William Tu wrote:
>
>> The patch series introduces AF_XDP support for OVS netdev.
>> AF_XDP is a new address family working together with eBPF.
>> In short, a socket with AF_XDP family can receive and send
>> packets from an eBPF/XDP program attached to the netdev.
>> For more details about AF_XDP, please see linux kernel's
>> Documentation/networking/af_xdp.rst
>>
>> OVS has a couple of netdev types, i.e., system, tap, or
>> internal. The patch first adds a new netdev type called
>> "afxdp", and implements its configuration, packet reception,
>> and transmit functions.
On Wed, Apr 17, 2019 at 10:09:53AM +0200, Eelco Chaudron wrote:
> On 16 Apr 2019, at 21:55, Ben Pfaff wrote:
> > AF_XDP is a faster way to access the existing kernel devices.  If we
> > take that point of view, then it would be ideal if AF_XDP were
> > automatically used when it was available, instead of adding a new
> > network device type.  Is there a reason that this point of view is
> > wrong?  That is, when AF_XDP is available, is there a reason not to
> > use it?
>
> This needs support by all the ingress and egress ports in the system,
> and currently, there is no API to check this.

Do you mean for performance or for some other reason?  I would suspect
that, if AF_XDP was not available, then everything would still work OK
via AF_PACKET, just slower.

> There are also features like traffic shaping that will not work.
> Maybe it will be worth adding the table for AF_XDP in
> http://docs.openvswitch.org/en/latest/faq/releases/

AF_XDP is comparable to DPDK/userspace, not to the Linux kernel
datapath.  The table currently conflates the userspace datapath with
the DPDK network device.  I believe that the only entry there that
depends on the DPDK network device is the one for policing.  It could
be replaced by a [*] with a note like this:

    YES - for DPDK network devices.
    NO - for system or AF_XDP network devices.

> > You said that your goal for the next version is to improve
> > performance and add optimizations.  Do you think that is important
> > before we merge the series?  We can continue to improve performance
> > after it is merged.
>
> The previous patch was rather unstable and I could not get it running
> with the PVP test without crashing. I think this patchset should get
> some proper testing and reviews by others. Especially for all the
> features being marked as supported in the above-mentioned table.

If it's unstable, we should fix that before adding it in.  However,
the bar is lower for new features that don't break existing features,
especially optional ones and ones that can easily be removed if they
don't work out in the end.  DPDK support was considered "experimental"
for a long time; it's possible that AF_XDP would be in the same boat
for a while.

> > If we set performance aside, do you have a reason to want to wait to
> > merge this?  (I wasn't able to easily apply this series to current
> > master, so it'll need at least a rebase before we apply it.  And I
> > have only skimmed it, not fully reviewed it.)
>
> Other than the items above, do we really need another datapath?

It's less than a new datapath.  It's a new network device
implementation.

> With this, we use two or more cores for processing packets. If we
> poll two physical ports it could be 300%, which is a typical use case
> with bonding. What about multiple queue support, does it work? Both
> in kernel and DPDK mode we use multiple queues to distribute the
> load, with this scenario does it double the number of CPUs used? Can
> we use the poll() mode as explained here,
> https://linuxplumbersconf.org/event/2/contributions/99/, and how will
> it work with multiple queues/pmd threads? What about any latency
> tests, is it worse or better than kernel/dpdk? Also with the AF_XDP
> datapath, there is no way to leverage hardware offload, like DPDK and
> TC. And then there is the part that it only works on the most recent
> kernels.

These are good questions.  William will have some of the answers.

> To me looking at this I would say it's far from being ready to be
> merged into OVS. However, if others decide to go ahead I think it
> should be disabled, not compiled in by default.

Yes, that seems reasonable to me.
On Wed, Apr 17, 2019 at 12:16:59PM +0200, Eelco Chaudron wrote:
> One other thing that popped up in my head is how (will) it work
> together with DPDK enabled on the same system?

Why not?
Thanks for the feedback.

On Tue, Apr 16, 2019 at 12:55 PM Ben Pfaff <blp@ovn.org> wrote:
>
> On Mon, Apr 01, 2019 at 03:46:48PM -0700, William Tu wrote:
> > The patch series introduces AF_XDP support for OVS netdev.
> > AF_XDP is a new address family working together with eBPF.
> > In short, a socket with AF_XDP family can receive and send
> > packets from an eBPF/XDP program attached to the netdev.
> > For more details about AF_XDP, please see linux kernel's
> > Documentation/networking/af_xdp.rst
>
> I'm glad to see some more revisions of this series!
>
> AF_XDP is a faster way to access the existing kernel devices.  If we
> take that point of view, then it would be ideal if AF_XDP were
> automatically used when it was available, instead of adding a new
> network device type.  Is there a reason that this point of view is
> wrong?  That is, when AF_XDP is available, is there a reason not to
> use it?

I think we should use it if it is available. However, right now only
the ixgbe/i40e drivers support AF_XDP mode, though I think more
vendors are working on this feature.

> You said that your goal for the next version is to improve performance
> and add optimizations.  Do you think that is important before we merge
> the series?  We can continue to improve performance after it is
> merged.
>
> If we set performance aside, do you have a reason to want to wait to
> merge this?  (I wasn't able to easily apply this series to current
> master, so it'll need at least a rebase before we apply it.  And I
> have only skimmed it, not fully reviewed it.)

OK, thanks. I have been working on measuring the performance and
adding some optimizations. I will consider submitting another version.

> It might make sense to squash all of these into a single patch.  I am
> not sure that they are really distinct conceptually.
On Wed, Apr 17, 2019 at 1:09 AM Eelco Chaudron <echaudro@redhat.com> wrote: > > > > On 16 Apr 2019, at 21:55, Ben Pfaff wrote: > > > On Mon, Apr 01, 2019 at 03:46:48PM -0700, William Tu wrote: > >> The patch series introduces AF_XDP support for OVS netdev. > >> AF_XDP is a new address family working together with eBPF. > >> In short, a socket with AF_XDP family can receive and send > >> packets from an eBPF/XDP program attached to the netdev. > >> For more details about AF_XDP, please see linux kernel's > >> Documentation/networking/af_xdp.rst > > > > I'm glad to see some more revisions of this series! > > I’m planning on reviewing and testing this patch, I’ll try to start > it this week, or else when I get back from PTO. > > > AF_XDP is a faster way to access the existing kernel devices. If we > > take that point of view, then it would be ideal if AF_XDP were > > automatically used when it was available, instead of adding a new > > network device type. Is there a reason that this point of view is > > wrong? That is, when AF_XDP is available, is there a reason not to > > use > > it? > > This needs support by all the ingress and egress ports in the system, > and currently, there is no API to check this. Not necessary all ports. On a OVS switch, you can have some ports supporting AF_XDP, and some ports are other types, ex: DPDK vhost, or tap. > > There are also features like traffic shaping that will not work. Maybe > it will be worth adding the table for AF_XDP in > http://docs.openvswitch.org/en/latest/faq/releases/ Right, when using AF_XDP, we don't have QoS support. If people want to do rate limiting on a AF_XDP port, another way is to use OpenFlow meter actions. > > > You said that your goal for the next version is to improve performance > > and add optimizations. Do you think that is important before we merge > > the series? We can continue to improve performance after it is > > merged. 
> The previous patch was rather unstable and I could not get it running
> with the PVP test without crashing. I think this patchset should get
> some proper testing and reviews by others. Especially for all the
> features being marked as supported in the above-mentioned table.

Yes, Tim has been helping a lot to test this, and I have a couple of
new fixes. I will incorporate them into the next version.

> > If we set performance aside, do you have a reason to want to wait to
> > merge this?  (I wasn't able to easily apply this series to current
> > master, so it'll need at least a rebase before we apply it.  And I
> > have only skimmed it, not fully reviewed it.)
>
> Other than the items above, do we really need another datapath? With

This is using the same datapath as OVS-DPDK, the userspace datapath.
So we don't introduce another datapath; we introduce a new netdev type.

> this, we use two or more cores for processing packets. If we poll two
> physical ports it could be 300%, which is a typical use case with
> bonding. What about multiple queue support, does it work? Both in kernel

Right now this patchset only allows one PMD and one queue. I'm adding
multiqueue support.

> and DPDK mode we use multiple queues to distribute the load, with this
> scenario does it double the number of CPUs used? Can we use the poll()
> mode as explained here,
> https://linuxplumbersconf.org/event/2/contributions/99/, and how will it
> work with multiple queues/pmd threads? What about any latency tests, is
> it worse or better than kernel/dpdk? Also with the AF_XDP datapath,
> there is no way to leverage hardware offload, like DPDK and TC. And then
> there is the part that it only works on the most recent kernels.

You have lots of good points here. My experiments show that it's slower
than DPDK, but much faster than the kernel.

> To me looking at this I would say it’s far from being ready to be
> merged into OVS.
> However, if others decide to go ahead I think it should
> be disabled, not compiled in by default.

I agree. This should be an experimental feature, and we're adding
something like

  ./configure --enable-afxdp

so it is not compiled in by default.

Thanks
William
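As a sketch of what the opt-in build flow above could look like (note that `--enable-afxdp` is only the flag name proposed in this thread, not a merged configure option):

```shell
# Hypothetical opt-in build; AF_XDP support stays off unless requested.
./boot.sh
./configure --enable-afxdp
make -j"$(nproc)"
sudo make install
```

Without the flag, the build would behave exactly as today, so existing users are unaffected.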
Hi Eelco,

Thanks for trying this patchset!

On Wed, Apr 17, 2019 at 7:26 AM Eelco Chaudron <echaudro@redhat.com> wrote:
>
> On 17 Apr 2019, at 14:01, Eelco Chaudron wrote:
>
> > Hi William,
> >
> > I think you applied the following patch to get it to compile? Or did
> > you copy in the kernel headers?
> >
> > https://www.spinics.net/lists/netdev/msg563507.html

Right, I applied the patch to get it to compile. I should document the
installation steps better in the next version.

> I noticed you duplicated the macros, which resulted in all kinds of
> compile errors. So I removed them, applied the two patches above, which
> would get me to the next step.
>
> I’m building it with DPDK enabled and it was causing all kinds of
> duplicate definition errors as the kernel and DPDK re-use some structure
> names.

Sorry about that. I will fix it in the next version.

> To get it all compiled and working I had to make the following changes:
>
> $ git diff
> diff --git a/lib/netdev-afxdp.c b/lib/netdev-afxdp.c
> index b3bf2f044..47fb3342a 100644
> --- a/lib/netdev-afxdp.c
> +++ b/lib/netdev-afxdp.c
> @@ -295,7 +295,7 @@ netdev_linux_rxq_xsk(struct xsk_socket_info *xsk,
>      uint32_t idx_rx = 0, idx_fq = 0;
>      int ret = 0;
>
> -    unsigned int non_afxdp;
> +    unsigned int non_afxdp = 0;
>
>      /* See if there is any packet on RX queue,
>       * if yes, idx_rx is the index having the packet.
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c > index 47153dc60..77f2150ab 100644 > --- a/lib/netdev-dpdk.c > +++ b/lib/netdev-dpdk.c > @@ -24,7 +24,7 @@ > #include <unistd.h> > #include <linux/virtio_net.h> > #include <sys/socket.h> > -#include <linux/if.h> > +//#include <linux/if.h> > > #include <rte_bus_pci.h> > #include <rte_config.h> > diff --git a/lib/xdpsock.h b/lib/xdpsock.h > index 8df8fa451..a2ed1a136 100644 > --- a/lib/xdpsock.h > +++ b/lib/xdpsock.h > @@ -28,7 +28,7 @@ > #include <stdio.h> > #include <stdlib.h> > #include <string.h> > -#include <net/ethernet.h> > +//#include <net/ethernet.h> > #include <sys/resource.h> > #include <sys/socket.h> > #include <sys/mman.h> > @@ -43,14 +43,6 @@ > #include "ovs-atomic.h" > #include "openvswitch/thread.h" > > -/* bpf/xsk.h uses the following macros not defined in OVS, > - * so re-define them before include. > - */ > -#define unlikely OVS_UNLIKELY > -#define likely OVS_LIKELY > -#define barrier() __asm__ __volatile__("": : :"memory") > -#define smp_rmb() barrier() > -#define smp_wmb() barrier() > #include <bpf/xsk.h> > > In addition you need to do “make install_headers” from kernel libbpf > and copy the libbpf_util.h manually. > > I was able to do a simple physical port in same physical port out test > without crashing, but the numbers seem low: This probably is due to some log printing. I have a couple of optimizations, will share it soon. 
Regards, William > > $ ovs-ofctl dump-flows ovs_pvp_br0 > cookie=0x0, duration=210.344s, table=0, n_packets=1784692, > n_bytes=2694884920, in_port=eno1 actions=IN_PORT > > "Physical loopback test, L3 flows[port redirect]" > ,Packet size > Number of flows,64,128,256,512,768,1024,1514 > 100,77574,77329,76605,76417,75539,75252,74617 > > The above is using two cores, but with a single DPDK core I get the > following (on the same machine): > > "Physical loopback test, L3 flows[port redirect]" > ,Packet size > Number of flows,64,128,256,512,768,1024,1514 > 100,9527075,8445852,4528935,2349597,1586276,1197304,814854 > > For the kernel datapath the numbers are: > > "Physical loopback test, L3 flows[port redirect]" > ,Packet size > Number of flows,64,128,256,512,768,1024,1514 > 100,4862995,5521870,4528872,2349596,1586277,1197305,814854 > > But keep in mind it uses roughly 550/610/520/380/180/140/110% of the CPU > for the respective packet size. > > > On 2 Apr 2019, at 0:46, William Tu wrote: > > > >> The patch series introduces AF_XDP support for OVS netdev. > >> AF_XDP is a new address family working together with eBPF. > >> In short, a socket with AF_XDP family can receive and send > >> packets from an eBPF/XDP program attached to the netdev. > >> For more details about AF_XDP, please see linux kernel's > >> Documentation/networking/af_xdp.rst > >> > >> OVS has a couple of netdev types, i.e., system, tap, or > >> internal. The patch first adds a new netdev types called > >> "afxdp", and implement its configuration, packet reception, > >> and transmit functions. Since the AF_XDP socket, xsk, > >> operates in userspace, once ovs-vswitchd receives packets > >> from xsk, the proposed architecture re-uses the existing > >> userspace dpif-netdev datapath. As a result, most of > >> the packet processing happens at the userspace instead of > >> linux kernel. 
> >> > >> Architecure > >> =========== > >> _ > >> | +-------------------+ > >> | | ovs-vswitchd |<-->ovsdb-server > >> | +-------------------+ > >> | | ofproto |<-->OpenFlow controllers > >> | +--------+-+--------+ > >> | | netdev | |ofproto-| > >> userspace | +--------+ | dpif | > >> | | netdev | +--------+ > >> | |provider| | dpif | > >> | +---||---+ +--------+ > >> | || | dpif- | > >> | || | netdev | > >> |_ || +--------+ > >> || > >> _ +---||-----+--------+ > >> | | af_xdp prog + | > >> kernel | | xsk_map | > >> |_ +--------||---------+ > >> || > >> physical > >> NIC > >> > >> To simply start, create a ovs userspace bridge using dpif-netdev > >> by setting the datapath_type to netdev: > >> # ovs-vsctl -- add-br br0 -- set Bridge br0 datapath_type=netdev > >> > >> And attach a linux netdev with type afxdp: > >> # ovs-vsctl add-port br0 afxdp-p0 -- \ > >> set interface afxdp-p0 type="afxdp" > >> > >> Performance > >> =========== > >> For this version, v4, I mainly focus on making the features right > >> with > >> libbpf AF_XDP API and use the AF_XDP SKB mode, which is the slower > >> set-up. > >> My next version is to measure the performance and add optimizations. > >> > >> Documentation > >> ============= > >> Most of the design details are described in the paper presetned at > >> Linux Plumber 2018, "Bringing the Power of eBPF to Open vSwitch"[1], > >> section 4, and slides[2]. > >> This path uses a not-yet upstreamed feature called XDP_ATTACH[3], > >> described in section 3.1, which is a built-in XDP program for the > >> AF_XDP. > >> This greatly simplifies the management of XDP/eBPF programs. 
> >> > >> [1] http://vger.kernel.org/lpc_net2018_talks/ovs-ebpf-afxdp.pdf > >> [2] > >> http://vger.kernel.org/lpc_net2018_talks/ovs-ebpf-lpc18-presentation.pdf > >> [3] > >> http://vger.kernel.org/lpc_net2018_talks/lpc18_paper_af_xdp_perf-v2.pdf > >> > >> For installation and configuration guide, see > >> # Documentation/intro/install/bpf.rst > >> > >> Test Cases > >> ========== > >> Test cases are created using namespaces and veth peer, with AF_XDP > >> socket > >> attached to the veth (thus the SKB_MODE). By issuing "make > >> check-afxdp", > >> the patch shows the following: > >> > >> AF_XDP netdev datapath-sanity > >> > >> 1: datapath - ping between two ports ok > >> 2: datapath - ping between two ports on vlan ok > >> 3: datapath - ping6 between two ports ok > >> 4: datapath - ping6 between two ports on vlan ok > >> 5: datapath - ping over vxlan tunnel ok > >> 6: datapath - ping over vxlan6 tunnel ok > >> 7: datapath - ping over gre tunnel ok > >> 8: datapath - ping over erspan v1 tunnel ok > >> 9: datapath - ping over erspan v2 tunnel ok > >> 10: datapath - ping over ip6erspan v1 tunnel ok > >> 11: datapath - ping over ip6erspan v2 tunnel ok > >> 12: datapath - ping over geneve tunnel ok > >> 13: datapath - ping over geneve6 tunnel ok > >> 14: datapath - clone action ok > >> 15: datapath - basic truncate action ok > >> > >> conntrack > >> > >> 16: conntrack - controller ok > >> 17: conntrack - force commit ok > >> 18: conntrack - ct flush by 5-tuple ok > >> 19: conntrack - IPv4 ping ok > >> 20: conntrack - get_nconns and get/set_maxconns ok > >> 21: conntrack - IPv6 ping ok > >> > >> system-ovn > >> > >> 22: ovn -- 2 LRs connected via LS, gateway router, SNAT and DNAT ok > >> 23: ovn -- 2 LRs connected via LS, gateway router, easy SNAT ok > >> 24: ovn -- multiple gateway routers, SNAT and DNAT ok > >> 25: ovn -- load-balancing ok > >> 26: ovn -- load-balancing - same subnet. 
ok > >> 27: ovn -- load balancing in gateway router ok > >> 28: ovn -- multiple gateway routers, load-balancing ok > >> 29: ovn -- load balancing in router with gateway router port ok > >> 30: ovn -- DNAT and SNAT on distributed router - N/S ok > >> 31: ovn -- DNAT and SNAT on distributed router - E/W ok > >> > >> --- > >> v1->v2: > >> - add a list to maintain unused umem elements > >> - remove copy from rx umem to ovs internal buffer > >> - use hugetlb to reduce misses (not much difference) > >> - use pmd mode netdev in OVS (huge performance improve) > >> - remove malloc dp_packet, instead put dp_packet in umem > >> > >> v2->v3: > >> - rebase on the OVS master, 7ab4b0653784 > >> ("configure: Check for more specific function to pull in pthread > >> library.") > >> - remove the dependency on libbpf and dpif-bpf. > >> instead, use the built-in XDP_ATTACH feature. > >> - data structure optimizations for better performance, see[1] > >> - more test cases support > >> v3: > >> https://mail.openvswitch.org/pipermail/ovs-dev/2018-November/354179.html > >> > >> v3->v4: > >> - Use AF_XDP API provided by libbpf > >> - Remove the dependency on XDP_ATTACH kernel patch set > >> - Add documentation, bpf.rst > >> > >> William Tu (4): > >> Add libbpf build support. > >> netdev-afxdp: add new netdev type for AF_XDP > >> tests: add AF_XDP netdev test cases. > >> afxdp netdev: add documentation and configuration. 
> >> > >> Documentation/automake.mk | 1 + > >> Documentation/index.rst | 1 + > >> Documentation/intro/install/bpf.rst | 182 +++++++ > >> Documentation/intro/install/index.rst | 1 + > >> acinclude.m4 | 20 + > >> configure.ac | 1 + > >> lib/automake.mk | 7 +- > >> lib/dp-packet.c | 12 + > >> lib/dp-packet.h | 32 +- > >> lib/dpif-netdev.c | 2 +- > >> lib/netdev-afxdp.c | 491 +++++++++++++++++ > >> lib/netdev-afxdp.h | 39 ++ > >> lib/netdev-linux.c | 78 ++- > >> lib/netdev-provider.h | 1 + > >> lib/netdev.c | 1 + > >> lib/xdpsock.c | 179 +++++++ > >> lib/xdpsock.h | 129 +++++ > >> tests/automake.mk | 17 + > >> tests/system-afxdp-macros.at | 153 ++++++ > >> tests/system-afxdp-testsuite.at | 26 + > >> tests/system-afxdp-traffic.at | 978 > >> ++++++++++++++++++++++++++++++++++ > >> 21 files changed, 2345 insertions(+), 6 deletions(-) > >> create mode 100644 Documentation/intro/install/bpf.rst > >> create mode 100644 lib/netdev-afxdp.c > >> create mode 100644 lib/netdev-afxdp.h > >> create mode 100644 lib/xdpsock.c > >> create mode 100644 lib/xdpsock.h > >> create mode 100644 tests/system-afxdp-macros.at > >> create mode 100644 tests/system-afxdp-testsuite.at > >> create mode 100644 tests/system-afxdp-traffic.at > >> > >> -- > >> 2.7.4 > > _______________________________________________ > > dev mailing list > > dev@openvswitch.org > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
On Wed, Apr 17, 2019 at 9:47 AM Ben Pfaff <blp@ovn.org> wrote: > > On Wed, Apr 17, 2019 at 10:09:53AM +0200, Eelco Chaudron wrote: > > On 16 Apr 2019, at 21:55, Ben Pfaff wrote: > > > AF_XDP is a faster way to access the existing kernel devices. If we > > > take that point of view, then it would be ideal if AF_XDP were > > > automatically used when it was available, instead of adding a new > > > network device type. Is there a reason that this point of view is > > > wrong? That is, when AF_XDP is available, is there a reason not to use > > > it? > > > > This needs support by all the ingress and egress ports in the system, and > > currently, there is no API to check this. > > Do you mean for performance or for some other reason? I would suspect > that, if AF_XDP was not available, then everything would still work OK > via AF_PACKET, just slower. > > > There are also features like traffic shaping that will not work. Maybe it > > will be worth adding the table for AF_XDP in > > http://docs.openvswitch.org/en/latest/faq/releases/ > > AF_XDP is comparable to DPDK/userspace, not to the Linux kernel > datapath. > > The table currently conflates the userspace datapath with the DPDK > network device. I believe that the only entry there that depends on the > DPDK network device is the one for policing. It could be replaced by a > [*] with a note like this: > > YES - for DPDK network devices. > NO - for system or AF_XDP network devices. > > > > You said that your goal for the next version is to improve performance > > > and add optimizations. Do you think that is important before we merge > > > the series? We can continue to improve performance after it is merged. > > > > The previous patch was rather unstable and I could not get it running with > > the PVP test without crashing. I think this patchset should get some proper > > testing and reviews by others. Especially for all the features being marked > > as supported in the above-mentioned table. 
> If it's unstable, we should fix that before adding it in.

Agreed. My first goal is to make sure people can at least run

  $ make check-afxdp

This uses a virtual device, veth with XDP in skb mode, to run various
OVS test cases. The performance will be bad, but it verifies
correctness.

Regards,
William

> However, the bar is lower for new features that don't break existing
> features, especially optional ones and ones that can easily be
> removed if they don't work out in the end.  DPDK support was considered
> "experimental" for a long time, it's possible that AF_XDP would be in
> the same boat for a while.
>
> > > If we set performance aside, do you have a reason to want to wait to
> > > merge this?  (I wasn't able to easily apply this series to current
> > > master, so it'll need at least a rebase before we apply it.  And I have
> > > only skimmed it, not fully reviewed it.)
> >
> > Other than the items above, do we really need another datapath?
>
> It's less than a new datapath.  It's a new network device
> implementation.
>
> > With this, we use two or more cores for processing packets. If we poll
> > two physical ports it could be 300%, which is a typical use case with
> > bonding. What about multiple queue support, does it work? Both in
> > kernel and DPDK mode we use multiple queues to distribute the load,
> > with this scenario does it double the number of CPUs used? Can we use
> > the poll() mode as explained here,
> > https://linuxplumbersconf.org/event/2/contributions/99/, and how will
> > it work with multiple queues/pmd threads? What about any latency
> > tests, is it worse or better than kernel/dpdk? Also with the AF_XDP
> > datapath, there is no way to leverage hardware offload, like DPDK and
> > TC. And then there is the part that it only works on the most recent
> > kernels.
>
> These are good questions.  William will have some of the answers.
>
> > To me looking at this I would say it’s far from being ready to be
> > merged into OVS.
However, if others decide to go ahead I think it should be > > disabled, not compiled in by default. > > Yes, that seems reasonable to me.
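For anyone wanting to reproduce the veth/skb-mode setup that `make check-afxdp` relies on by hand, a rough sketch follows. The namespace and interface names here are made up for illustration; the actual test macros may differ:

```shell
# Create a namespace and a veth pair; one end goes into the namespace.
ip netns add at_ns0
ip link add p0 type veth peer name afxdp-p0
ip link set p0 netns at_ns0
ip link set dev afxdp-p0 up
ip netns exec at_ns0 ip link set dev p0 up

# Attach the host-side veth end to a userspace (netdev) bridge as AF_XDP.
# veth has no native XDP driver support, so this exercises skb mode.
ovs-vsctl -- add-br br0 -- set Bridge br0 datapath_type=netdev
ovs-vsctl add-port br0 afxdp-p0 -- set interface afxdp-p0 type="afxdp"
```

Traffic injected from inside `at_ns0` then flows through the AF_XDP receive path, which is slow but sufficient for correctness testing.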
On Wed, Apr 17, 2019 at 9:48 AM Ben Pfaff <blp@ovn.org> wrote:
>
> On Wed, Apr 17, 2019 at 12:16:59PM +0200, Eelco Chaudron wrote:
> > One other thing that popped up in my head is how (will) it work together
> > with DPDK enabled on the same system?
>
> Why not?

It works OK with OVS-DPDK. For example, I can create a br0, attach an
AF_XDP port, and also attach a DPDK port to it. (I tested using a DPDK
vhost port, not a physical one.)

The performance is lower than using two DPDK ports due to some packet
copying from one port to another.

Regards,
William
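The mixed setup described above might look roughly like this. Port names and the vhost-user socket naming are illustrative, not taken from the tested configuration:

```shell
# One userspace bridge carrying both an AF_XDP port and a DPDK vhost port.
ovs-vsctl -- add-br br0 -- set Bridge br0 datapath_type=netdev
ovs-vsctl add-port br0 enp2s0 -- set interface enp2s0 type="afxdp"
ovs-vsctl add-port br0 vhost-user0 -- \
    set interface vhost-user0 type="dpdkvhostuser"
```

Both port types share the same dpif-netdev datapath, which is why they can coexist on one bridge; the cross-type forwarding cost is the extra packet copy noted above.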
On 17 Apr 2019, at 19:39, William Tu wrote:

> On Wed, Apr 17, 2019 at 9:48 AM Ben Pfaff <blp@ovn.org> wrote:
>>
>> On Wed, Apr 17, 2019 at 12:16:59PM +0200, Eelco Chaudron wrote:
>>> One other thing that popped up in my head is how (will) it work
>>> together with DPDK enabled on the same system?
>>
>> Why not?

I’m like, if it’s not tested, it’s not working…

> It works OK with OVS-DPDK.
> For example, I can create a br0, attach a af_xdp port and also attach
> a dpdk port to it. (I tested using dpdk vhost port, not physical one).
>
> The performance is lower than using two dpdk ports due to some packet
> copying from one to another.

This is because to send to the DPDK ports you use the shared queue,
which might block on a mutex. Sending from a DPDK port to XDP might be
worse, as the PMD might stall due to the syscall required.

I’ll try to do some more tests on this combination once I return from
PTO.
On 17 Apr 2019, at 19:16, William Tu wrote: > Hi Eelco, > Thanks for trying this patchset! <SNIP> >> In addition you need to do “make install_headers” from kernel >> libbpf >> and copy the libbpf_util.h manually. >> >> I was able to do a simple physical port in same physical port out >> test >> without crashing, but the numbers seem low: > > This probably is due to some log printing. > I have a couple of optimizations, will share it soon. I do not see any additional logging, and even with all logging disabled, I get the same numbers. Will continue my testing once back from PTO. If you want me to test something without sending a new patch set, just let me know what to change. >> >> $ ovs-ofctl dump-flows ovs_pvp_br0 >> cookie=0x0, duration=210.344s, table=0, n_packets=1784692, >> n_bytes=2694884920, in_port=eno1 actions=IN_PORT >> >> "Physical loopback test, L3 flows[port redirect]" >> ,Packet size >> Number of flows,64,128,256,512,768,1024,1514 >> 100,77574,77329,76605,76417,75539,75252,74617 >> >> The above is using two cores, but with a single DPDK core I get the >> following (on the same machine): >> >> "Physical loopback test, L3 flows[port redirect]" >> ,Packet size >> Number of flows,64,128,256,512,768,1024,1514 >> 100,9527075,8445852,4528935,2349597,1586276,1197304,814854 >> >> For the kernel datapath the numbers are: >> >> "Physical loopback test, L3 flows[port redirect]" >> ,Packet size >> Number of flows,64,128,256,512,768,1024,1514 >> 100,4862995,5521870,4528872,2349596,1586277,1197305,814854 >> >> But keep in mind it uses roughly 550/610/520/380/180/140/110% of the >> CPU >> for the respective packet size. <SNIP>
On 17 Apr 2019, at 18:47, Ben Pfaff wrote: > On Wed, Apr 17, 2019 at 10:09:53AM +0200, Eelco Chaudron wrote: >> On 16 Apr 2019, at 21:55, Ben Pfaff wrote: >>> AF_XDP is a faster way to access the existing kernel devices. If we >>> take that point of view, then it would be ideal if AF_XDP were >>> automatically used when it was available, instead of adding a new >>> network device type. Is there a reason that this point of view is >>> wrong? That is, when AF_XDP is available, is there a reason not to >>> use >>> it? >> >> This needs support by all the ingress and egress ports in the system, >> and >> currently, there is no API to check this. > > Do you mean for performance or for some other reason? I would suspect > that, if AF_XDP was not available, then everything would still work OK > via AF_PACKET, just slower. Yes, it will become slower and people do not understand why. For example, it's easy to combine kernel, XDP and DPDK ports. But receiving one and tx on another becomes slow. It could even impact the DPDK/XDP performance as now syscall’s need to be executed stalling the PMD loop causing packets to be dropped. Maybe we should add something about this in the documentation, for example, that the kernel receive loop is done in the main thread, XDP/DPDK in dedicated PMD threads, etc. etc. >> There are also features like traffic shaping that will not work. >> Maybe it >> will be worth adding the table for AF_XDP in >> http://docs.openvswitch.org/en/latest/faq/releases/ > > AF_XDP is comparable to DPDK/userspace, not to the Linux kernel > datapath. > > The table currently conflates the userspace datapath with the DPDK > network device. I believe that the only entry there that depends on > the > DPDK network device is the one for policing. It could be replaced by > a > [*] with a note like this: > > YES - for DPDK network devices. > NO - for system or AF_XDP network devices. 
This would work, just want to make sure it’s tested rather than assume it will work as there might be corner cases. >>> You said that your goal for the next version is to improve >>> performance >>> and add optimizations. Do you think that is important before we >>> merge >>> the series? We can continue to improve performance after it is >>> merged. >> >> The previous patch was rather unstable and I could not get it running >> with >> the PVP test without crashing. I think this patchset should get some >> proper >> testing and reviews by others. Especially for all the features being >> marked >> as supported in the above-mentioned table. > > If it's unstable, we should fix that before adding it in. > > However, the bar is lower for new features that don't break existing > features, especially optional ones and ones that can be easily be > removed if they don't work out in the end. DPDK support was > considered > "experimental" for a long time, it's possible that AF_XDP would be in > the same boat for a while. Thats fine, as long as there are some serious reviews of it. I’ll work on it once I return from PTO, but I guess more would be welcome. >>> If we set performance aside, do you have a reason to want to wait to >>> merge this? (I wasn't able to easily apply this series to current >>> master, so it'll need at least a rebase before we apply it. And I >>> have >>> only skimmed it, not fully reviewed it.) >> >> Other than the items above, do we really need another datapath? > > It's less than a new datapath. It's a new network device > implementation. Sorry yes, from OVS terminology it is… >> With this, we use two or more cores for processing packets. If we >> poll >> two physical ports it could be 300%, which is a typical use case with >> bonding. What about multiple queue support, does it work? Both in >> kernel and DPDK mode we use multiple queues to distribute the load, >> with this scenario does it double the number of CPUs used? 
Can we use >> the poll() mode as explained here, >> https://linuxplumbersconf.org/event/2/contributions/99/, and how will >> it work with multiple queues/pmd threads? What about any latency >> tests, is it worse or better than kernel/dpdk? Also with the AF_XDP >> datapath, there is no to leverage hardware offload, like DPDK and >> TC. And then there is the part that it only works on the most recent >> kernels. > > These are good questions. William will have some of the answers. > >> To me looking at this I would say it’s far from being ready to be >> merged >> into OVS. However, if others decide to go ahead I think it should be >> disabled, not compiled in by default. > > Yes, that seems reasonable to me.
On 17 Apr 2019, at 19:09, William Tu wrote: > On Wed, Apr 17, 2019 at 1:09 AM Eelco Chaudron <echaudro@redhat.com> > wrote: >> >> >> >> On 16 Apr 2019, at 21:55, Ben Pfaff wrote: >> >>> On Mon, Apr 01, 2019 at 03:46:48PM -0700, William Tu wrote: >>>> The patch series introduces AF_XDP support for OVS netdev. >>>> AF_XDP is a new address family working together with eBPF. >>>> In short, a socket with AF_XDP family can receive and send >>>> packets from an eBPF/XDP program attached to the netdev. >>>> For more details about AF_XDP, please see linux kernel's >>>> Documentation/networking/af_xdp.rst >>> >>> I'm glad to see some more revisions of this series! >> >> I’m planning on reviewing and testing this patch, I’ll try to >> start >> it this week, or else when I get back from PTO. >> >>> AF_XDP is a faster way to access the existing kernel devices. If we >>> take that point of view, then it would be ideal if AF_XDP were >>> automatically used when it was available, instead of adding a new >>> network device type. Is there a reason that this point of view is >>> wrong? That is, when AF_XDP is available, is there a reason not to >>> use >>> it? >> >> This needs support by all the ingress and egress ports in the system, >> and currently, there is no API to check this. > > Not necessary all ports. > On a OVS switch, you can have some ports supporting AF_XDP, > and some ports are other types, ex: DPDK vhost, or tap. But I’m wondering how would you deal with ports not supporting this at driver level? Will you fall back to skb style, will you report this (as it’s interesting to know from a performance level). Guess I just need to look at your code :) >> >> There are also features like traffic shaping that will not work. >> Maybe >> it will be worth adding the table for AF_XDP in >> http://docs.openvswitch.org/en/latest/faq/releases/ > > Right, when using AF_XDP, we don't have QoS support. 
> If people want to do rate limiting on a AF_XDP port, another > way is to use OpenFlow meter actions. That for me was the only thing that stood out, but just want to make sure no other things were abstracted in the DPDK APIs… Guess you could use the DPDK meters framework to support the same as DPDK, the only thing is that you need enablement of DPDK also. >> >>> You said that your goal for the next version is to improve >>> performance >>> and add optimizations. Do you think that is important before we >>> merge >>> the series? We can continue to improve performance after it is >>> merged. >> >> The previous patch was rather unstable and I could not get it running >> with the PVP test without crashing. I think this patchset should get >> some proper testing and reviews by others. Especially for all the >> features being marked as supported in the above-mentioned table. >> > > Yes, Tim has been helping a lot to test this and I have a couple of > new fixes. I will incorporate into next version. Cool, I’ll talk to Tim offline, in addition, copy me on the next patch and I’ll check it out. Do you have a time frame, so I can do the review based on that revision? >>> If we set performance aside, do you have a reason to want to wait to >>> merge this? (I wasn't able to easily apply this series to current >>> master, so it'll need at least a rebase before we apply it. And I >>> have >>> only skimmed it, not fully reviewed it.) >> >> Other than the items above, do we really need another datapath? With > > This is using the same datapath, the userspace datapath, as OVS-DPDK. > So we don't introduce another datapath, we introduce a new netdev > type. My fault, I was not referring to the OVS data path definition ;) >> this, we use two or more cores for processing packets. If we poll two >> physical ports it could be 300%, which is a typical use case with >> bonding. What about multiple queue support, does it work? 
Both in
>> kernel
>
> Yes, this patchset only allows 1 pmd and 1 queue.
> I'm adding the multiqueue support.

We need some alignment here on how we add threads for PMDs, XDP vs
DPDK. If there are not enough cores for both, the system will not start
(EMERGENCY exit). And the user also might want to control which cores
run DPDK and which run XDP.

>> and DPDK mode we use multiple queues to distribute the load, with
>> this scenario does it double the number of CPUs used? Can we use the
>> poll() mode as explained here,
>> https://linuxplumbersconf.org/event/2/contributions/99/, and how will
>> it work with multiple queues/pmd threads? What about any latency
>> tests, is it worse or better than kernel/dpdk? Also with the AF_XDP
>> datapath, there is no way to leverage hardware offload, like DPDK and
>> TC. And then there is the part that it only works on the most recent
>> kernels.
>
> You have lots of good points here.
> My experiments show that it's slower than DPDK, but much faster than
> kernel.

Looking forward to your improvement patch, as for me it’s about 10x
slower than the kernel with a single queue (see other email).

>> To me looking at this I would say it’s far from being ready to be
>> merged into OVS. However, if others decide to go ahead I think it
>> should be disabled, not compiled in by default.
>
> I agree. This should be an experimental feature, and we're adding
> something like ./configure --enable-afxdp so it is not compiled in by
> default.
>
> Thanks
> William
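For reference, the OpenFlow-meter rate limiting mentioned earlier in the thread could be sketched like this. Meters need OpenFlow 1.3 or later, and the port name, meter id, and rates here are illustrative only:

```shell
# Drop traffic from an AF_XDP port above ~10 Mbps using an OpenFlow meter.
ovs-ofctl -O OpenFlow13 add-meter br0 \
    'meter=1 kbps burst stats bands=type=drop rate=10000 burst_size=1000'

# Send packets from the port through meter 1 before normal forwarding.
ovs-ofctl -O OpenFlow13 add-flow br0 \
    'in_port=afxdp-p0 actions=meter:1,normal'
```

This sidesteps the missing netdev-level QoS support: the policing happens in the datapath's flow pipeline rather than on the device.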
On 2 Apr 2019, at 0:46, William Tu wrote: > The patch series introduces AF_XDP support for OVS netdev. > AF_XDP is a new address family working together with eBPF. > In short, a socket with AF_XDP family can receive and send > packets from an eBPF/XDP program attached to the netdev. > For more details about AF_XDP, please see linux kernel's > Documentation/networking/af_xdp.rst > > OVS has a couple of netdev types, i.e., system, tap, or > internal. The patch first adds a new netdev types called > "afxdp", and implement its configuration, packet reception, > and transmit functions. Since the AF_XDP socket, xsk, > operates in userspace, once ovs-vswitchd receives packets > from xsk, the proposed architecture re-uses the existing > userspace dpif-netdev datapath. As a result, most of > the packet processing happens at the userspace instead of > linux kernel. One other issue I found it that if a XDP program is already attached, due to crash or previous loaded one, adding the port will hang. <SNIP>
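On the hang with an already-attached XDP program mentioned above: one possible workaround, using standard iproute2 commands (device name `eth0` is just an example), is to check for and detach any leftover program before adding the port:

```shell
# Show whether an XDP program is attached; look for "xdp" or
# "xdpgeneric" in the output.
ip link show dev eth0

# Detach a native-mode and a generic (skb-mode) program, respectively.
ip link set dev eth0 xdp off
ip link set dev eth0 xdpgeneric off
```

Whether OVS itself should detect and replace a stale program at port-add time is a separate design question for the patchset.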
Hi Eelco,

Thanks for your feedback!

> > Not necessary all ports.
> > On a OVS switch, you can have some ports supporting AF_XDP,
> > and some ports are other types, ex: DPDK vhost, or tap.
>
> But I’m wondering how would you deal with ports not supporting this at
> driver level?
> Will you fall back to skb style, will you report this (as it’s
> interesting to know from a performance level).
> Guess I just need to look at your code :)

I'm adding an option for when adding the port, something like
options:xdpmode=drv or options:xdpmode=skb. I put the patch here:
https://github.com/williamtu/ovs-ebpf/commit/ef2bfe15db55ecd629cdb75cbc90c7be613745e3

> >> There are also features like traffic shaping that will not work.
> >> Maybe it will be worth adding the table for AF_XDP in
> >> http://docs.openvswitch.org/en/latest/faq/releases/
> >
> > Right, when using AF_XDP, we don't have QoS support.
> > If people want to do rate limiting on a AF_XDP port, another
> > way is to use OpenFlow meter actions.
>
> That for me was the only thing that stood out, but just want to make
> sure no other things were abstracted in the DPDK APIs…
>
> Guess you could use the DPDK meters framework to support the same as
> DPDK, the only thing is that you need enablement of DPDK also.

Right. We can try ./configure --with-dpdk --with-afxdp

> >>> You said that your goal for the next version is to improve
> >>> performance and add optimizations. Do you think that is important
> >>> before we merge the series? We can continue to improve performance
> >>> after it is merged.
> >>
> >> The previous patch was rather unstable and I could not get it running
> >> with the PVP test without crashing. I think this patchset should get
> >> some proper testing and reviews by others. Especially for all the
> >> features being marked as supported in the above-mentioned table.
> >
> > Yes, Tim has been helping a lot to test this and I have a couple of
> > new fixes.
I will incorporate into next version. > > Cool, I’ll talk to Tim offline, in addition, copy me on the next patch > and I’ll check it out. > Do you have a time frame, so I can do the review based on that revision? > > OK I plan to incorporate your and Tim's feedback, and resubmit next version next Monday (4/22) > >>> If we set performance aside, do you have a reason to want to wait to > >>> merge this? (I wasn't able to easily apply this series to current > >>> master, so it'll need at least a rebase before we apply it. And I > >>> have > >>> only skimmed it, not fully reviewed it.) > >> > > > > Yes, this patchset only allows 1 pmd and 1 queue. > > I'm adding the multiqueue support. > > We need some alignment here on how we add threads for PMDs XDP vs DPDK. > If there are not enough cores for both the system will not start > (EMERGENCY exit). And user also might want to control which cores run > DPDK and which XDP. > Yes, my plan is to use the same commandline interface as OVS-DPDK The pmd-cpu-mask and pmd-rxq-affinity for example 4 pmds: # ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x36 // AF_XDP uses 2 # ovs-vsctl add-port br0 enp2s0 -- set interface enp2s0 type="afxdp" \ options:n_rxq=2 options:xdpmode=drv other_config:pmd-rxq-affinity="0:1,1:2" // another DPDK device can use another 2 pmds ovs-vsctl add-port br0 enp2s0 -- set interface enp2s0 type="dpdkvhost-user" other_config:pmd-rxq-affinity="0:3,1:4" > > >> and DPDK mode we use multiple queues to distribute the load, with > >> this > >> scenario does it double the number of CPUs used? Can we use the > >> poll() > >> mode as explained here, > >> https://linuxplumbersconf.org/event/2/contributions/99/, and how will > >> it > >> work with multiple queues/pmd threads? What about any latency tests, > >> is > >> it worse or better than kernel/dpdk? Also with the AF_XDP datapath, > >> there is no to leverage hardware offload, like DPDK and TC. 
And then > >> there is the part that it only works on the most recent kernels. > > > > You have lots of good points here. > > My experiments show that it's slower than DPDK, but much faster than > > kernel. > > Looking for your improvement patch as for me it’s about 10x slower for > the kernel with a single queue (see other email). > Thanks Regards, William > > >> > >> To me looking at this I would say it’s far from being ready to be > >> merged into OVS. However, if others decide to go ahead I think it > >> should > >> be disabled, not compiled in by default. > >> > > I agree. This should be experimental feature and we're adding s.t like > > #./configure --enable-afxdp > > so not compiled in by default > > > > Thanks > > William >
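As an aside for readers puzzling over the 0x36 value in William's example above: pmd-cpu-mask is simply a hex bit mask of core ids, and it can be derived with a small shell helper. This is a sketch for illustration only, not part of the patch.

```shell
# Sketch: derive an OVS pmd-cpu-mask hex value from a list of core ids.
# Not part of the patch; it only clarifies the 0x36 example above
# (0x36 selects cores 1, 2, 4 and 5).
cores_to_mask() {
    mask=0
    for c in "$@"; do
        # Set the bit corresponding to each requested core id.
        mask=$(( mask | (1 << c) ))
    done
    printf '0x%x\n' "$mask"
}

cores_to_mask 1 2 4 5    # prints 0x36
```

Running `cores_to_mask 1 2 4 5` yields 0x36, matching the mask used in the four-pmd example.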
On 19 Apr 2019, at 0:11, William Tu wrote:

> Hi Eelco,
> Thanks for your feedback!
>
>>> Not necessarily all ports.
>>> On an OVS switch, you can have some ports supporting AF_XDP,
>>> and some ports of other types, e.g., DPDK vhost or tap.
>>
>> But I'm wondering how you would deal with ports not supporting this
>> at the driver level?
>> Will you fall back to skb style, and will you report this (as it's
>> interesting to know from a performance perspective)?
>> Guess I just need to look at your code :)
>>
> I'm adding an option when adding the port,
> something like options:xdpmode=drv or options:xdpmode=skb.
>
> I put the patch here:
> https://github.com/williamtu/ovs-ebpf/commit/ef2bfe15db55ecd629cdb75cbc90c7be613745e3

Nice! Will review your next patch in detail!

>>
>>>> There are also features like traffic shaping that will not work.
>>>> Maybe it will be worth adding the table for AF_XDP in
>>>> http://docs.openvswitch.org/en/latest/faq/releases/
>>>
>>> Right, when using AF_XDP, we don't have QoS support.
>>> If people want to do rate limiting on an AF_XDP port, another
>>> way is to use OpenFlow meter actions.
>>
>> That for me was the only thing that stood out, but I just want to make
>> sure no other things were abstracted in the DPDK APIs…
>>
>> Guess you could use the DPDK meters framework to support the same as
>> DPDK; the only thing is that you need enablement of DPDK also.
>>
> Right. We can try:
> ./configure --with-dpdk --with-afxdp

Yes, this way policing is supported; if compiled without DPDK it's not.
Guess we need to give it some thought to see how to warn for this, etc.

>>>>
>>>>> You said that your goal for the next version is to improve
>>>>> performance and add optimizations. Do you think that is important
>>>>> before we merge the series? We can continue to improve performance
>>>>> after it is merged.
>>>>
>>>> The previous patch was rather unstable and I could not get it
>>>> running with the PVP test without crashing. I think this patchset
>>>> should get some proper testing and reviews by others, especially
>>>> for all the features being marked as supported in the
>>>> above-mentioned table.
>>>
>>> Yes, Tim has been helping a lot to test this and I have a couple of
>>> new fixes. I will incorporate them into the next version.
>>
>> Cool, I'll talk to Tim offline; in addition, copy me on the next
>> patch and I'll check it out.
>> Do you have a time frame, so I can do the review based on that
>> revision?
>>
> OK, I plan to incorporate your and Tim's feedback, and resubmit the
> next version next Monday (4/22).

I'm back from PTO the 30th, so take whatever time you need…

>>>>> If we set performance aside, do you have a reason to want to wait
>>>>> to merge this?  (I wasn't able to easily apply this series to
>>>>> current master, so it'll need at least a rebase before we apply
>>>>> it. And I have only skimmed it, not fully reviewed it.)
>>>>
>>> Yes, this patchset only allows 1 pmd and 1 queue.
>>> I'm adding the multiqueue support.
>>
>> We need some alignment here on how we add threads for XDP vs DPDK
>> PMDs. If there are not enough cores for both, the system will not
>> start (EMERGENCY exit). And the user also might want to control
>> which cores run DPDK and which XDP.
>>
> Yes, my plan is to use the same command-line interface as OVS-DPDK:
> the pmd-cpu-mask and pmd-rxq-affinity options.
>
> For example, with 4 pmds:
>
> # ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x36
>
> // AF_XDP uses 2
> # ovs-vsctl add-port br0 enp2s0 -- set interface enp2s0 type="afxdp" \
>     options:n_rxq=2 options:xdpmode=drv \
>     other_config:pmd-rxq-affinity="0:1,1:2"
>
> // another DPDK device can use another 2 pmds
> # ovs-vsctl add-port br0 dpdkvhost0 -- set interface dpdkvhost0 \
>     type="dpdkvhost-user" other_config:pmd-rxq-affinity="0:3,1:4"

In real life, people might not always use pmd-rxq-affinity, especially
with people now looking into dynamic re-assignment based on traffic
patterns. However, the real problem I was referring to is how to assign
specific cores to DPDK PMDs vs AF_XDP PMDs. First, if you enable DPDK
and AF_XDP and only have a single core, OVS crashes (force exit). I
think it should just warn in the log and continue. Secondly, there is
no control over which core is used by which type. If you have two
hyperthreading pairs, you might want to use one sibling set for AF_XDP
and one for DPDK. We are also not talking about NUMA awareness yet,
which I think needs to be taken care of too.

>>
>>>> and DPDK mode we use multiple queues to distribute the load, with
>>>> this scenario does it double the number of CPUs used? Can we use
>>>> the poll() mode as explained here,
>>>> https://linuxplumbersconf.org/event/2/contributions/99/, and how
>>>> will it work with multiple queues/pmd threads? What about any
>>>> latency tests, is it worse or better than kernel/dpdk? Also with
>>>> the AF_XDP datapath, there is no way to leverage hardware offload,
>>>> like DPDK and TC. And then there is the part that it only works on
>>>> the most recent kernels.
>>>
>>> You have lots of good points here.
>>> My experiments show that it's slower than DPDK, but much faster than
>>> the kernel.
>>
>> Looking for your improvement patch, as for me it's about 10x slower
>> for the kernel with a single queue (see other email).
>>
>
> Thanks
> Regards,
> William
>
>>
>>>>
>>>> To me, looking at this I would say it's far from being ready to be
>>>> merged into OVS. However, if others decide to go ahead, I think it
>>>> should be disabled, not compiled in by default.
>>>>
>>> I agree. This should be an experimental feature and we're adding
>>> something like:
>>> #./configure --enable-afxdp
>>> so it is not compiled in by default.
>>>
>>> Thanks
>>> William
>>
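Eelco's core-assignment concern above boils down to choosing two disjoint sets of cores, one per PMD type, and combining them into the single pmd-cpu-mask that OVS accepts today. A small sanity-check sketch follows; the afxdp/dpdk mask split is hypothetical (OVS has no per-type mask option), only the combined value would be handed to pmd-cpu-mask.

```shell
# Sketch: check that the cores intended for AF_XDP PMDs and the cores
# intended for DPDK PMDs are disjoint before combining them into the
# single pmd-cpu-mask OVS understands. The per-type split is
# hypothetical, not an existing OVS option.
afxdp_mask=0x06   # cores 1 and 2 for AF_XDP PMDs
dpdk_mask=0x30    # cores 4 and 5 for DPDK PMDs

# Bitwise AND is non-zero only if some core appears in both masks.
if [ $(( afxdp_mask & dpdk_mask )) -ne 0 ]; then
    echo "error: AF_XDP and DPDK core masks overlap" >&2
    exit 1
fi

# Bitwise OR yields the combined mask to give to OVS.
printf 'pmd-cpu-mask=0x%x\n' $(( afxdp_mask | dpdk_mask ))   # prints pmd-cpu-mask=0x36
```

With the example masks this reproduces the 0x36 value from the four-pmd configuration discussed earlier in the thread.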