diff mbox series

[ovs-dev,PATCHv6] netdev-afxdp: add new netdev type for AF_XDP.

Message ID 1556149621-5103-1-git-send-email-u9012063@gmail.com
State Superseded
Headers show
Series [ovs-dev,PATCHv6] netdev-afxdp: add new netdev type for AF_XDP. | expand

Commit Message

William Tu April 24, 2019, 11:47 p.m. UTC
The patch introduces experimental AF_XDP support for OVS netdev.
AF_XDP is a new address family working together with eBPF/XDP.
A socket with AF_XDP family can receive and send raw packets
from an eBPF/XDP program attached to the netdev.
For details introduction and configuration, see
Documentation/intro/install/afxdp.rst

Signed-off-by: William Tu <u9012063@gmail.com>
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Co-authored-by: Yi-Hung Wei <yihung.wei@gmail.com>

---
v1->v2:
- add a list to maintain unused umem elements
- remove copy from rx umem to ovs internal buffer
- use hugetlb to reduce misses (not much difference)
- use pmd mode netdev in OVS (huge performance improve)
- remove malloc dp_packet, instead put dp_packet in umem

v2->v3:
- rebase on the OVS master, 7ab4b0653784
  ("configure: Check for more specific function to pull in pthread library.")
- remove the dependency on libbpf and dpif-bpf.
  instead, use the built-in XDP_ATTACH feature.
- data structure optimizations for better performance, see[1]
- more test cases support
v3: https://mail.openvswitch.org/pipermail/ovs-dev/2018-November/354179.html

v3->v4:
- Use AF_XDP API provided by libbpf
- Remove the dependency on XDP_ATTACH kernel patch set
- Add documentation, bpf.rst

v4->v5:
- rebase to master
- remove rfc, squash all into a single patch
- add --enable-afxdp, so by default, AF_XDP is not compiled
- add options: xdpmode=drv,skb
- add multiple queue and multiple PMD support, with options: n_rxq
- improve documentation, rename bpf.rst to af_xdp.rst

v5->v6
- rebase to master, commit 0cdd5b13de91b98
- address errors from sparse and clang
- pass travis-ci test
- address feedback from Ben
- fix issues reported by 0-day robot
- improved documentation
---
 Documentation/automake.mk             |   1 +
 Documentation/index.rst               |   1 +
 Documentation/intro/install/afxdp.rst | 366 +++++++++++++
 Documentation/intro/install/index.rst |   1 +
 acinclude.m4                          |  23 +
 configure.ac                          |   1 +
 lib/automake.mk                       |   7 +-
 lib/dp-packet.c                       |  16 +
 lib/dp-packet.h                       |  35 +-
 lib/dpif-netdev-perf.h                |  13 +
 lib/netdev-afxdp.c                    | 589 ++++++++++++++++++++
 lib/netdev-afxdp.h                    |  47 ++
 lib/netdev-linux.c                    |  89 +++-
 lib/netdev-linux.h                    |   1 +
 lib/netdev-provider.h                 |   1 +
 lib/netdev.c                          |   1 +
 lib/xdpsock.c                         | 210 ++++++++
 lib/xdpsock.h                         | 133 +++++
 tests/automake.mk                     |  17 +
 tests/system-afxdp-macros.at          | 153 ++++++
 tests/system-afxdp-testsuite.at       |  26 +
 tests/system-afxdp-traffic.at         | 978 ++++++++++++++++++++++++++++++++++
 22 files changed, 2703 insertions(+), 6 deletions(-)
 create mode 100644 Documentation/intro/install/afxdp.rst
 create mode 100644 lib/netdev-afxdp.c
 create mode 100644 lib/netdev-afxdp.h
 create mode 100644 lib/xdpsock.c
 create mode 100644 lib/xdpsock.h
 create mode 100644 tests/system-afxdp-macros.at
 create mode 100644 tests/system-afxdp-testsuite.at
 create mode 100644 tests/system-afxdp-traffic.at

Comments

Ilya Maximets April 25, 2019, 3:09 p.m. UTC | #1
Hi.

This is not a full review. Just a bunch of thoughts.

See inline.

Best regards, Ilya Maximets.

On 25.04.2019 2:47, William Tu wrote:
> The patch introduces experimental AF_XDP support for OVS netdev.
> AF_XDP is a new address family working together with eBPF/XDP.
> A socket with AF_XDP family can receive and send raw packets
> from an eBPF/XDP program attached to the netdev.
> For details introduction and configuration, see
> Documentation/intro/install/afxdp.rst
> 
> Signed-off-by: William Tu <u9012063@gmail.com>
> Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
> Co-authored-by: Yi-Hung Wei <yihung.wei@gmail.com>
> ---
> v1->v2:
> - add a list to maintain unused umem elements
> - remove copy from rx umem to ovs internal buffer
> - use hugetlb to reduce misses (not much difference)
> - use pmd mode netdev in OVS (huge performance improve)
> - remove malloc dp_packet, instead put dp_packet in umem
> 
> v2->v3:
> - rebase on the OVS master, 7ab4b0653784
>   ("configure: Check for more specific function to pull in pthread library.")
> - remove the dependency on libbpf and dpif-bpf.
>   instead, use the built-in XDP_ATTACH feature.
> - data structure optimizations for better performance, see[1]
> - more test cases support
> v3: https://mail.openvswitch.org/pipermail/ovs-dev/2018-November/354179.html
> 
> v3->v4:
> - Use AF_XDP API provided by libbpf
> - Remove the dependency on XDP_ATTACH kernel patch set
> - Add documentation, bpf.rst
> 
> v4->v5:
> - rebase to master
> - remove rfc, squash all into a single patch
> - add --enable-afxdp, so by default, AF_XDP is not compiled
> - add options: xdpmode=drv,skb
> - add multiple queue and multiple PMD support, with options: n_rxq
> - improve documentation, rename bpf.rst to af_xdp.rst
> 
> v5->v6
> - rebase to master, commit 0cdd5b13de91b98
> - address errors from sparse and clang
> - pass travis-ci test
> - address feedback from Ben
> - fix issues reported by 0-day robot
> - improved documentation
> ---
>  Documentation/automake.mk             |   1 +
>  Documentation/index.rst               |   1 +
>  Documentation/intro/install/afxdp.rst | 366 +++++++++++++
>  Documentation/intro/install/index.rst |   1 +
>  acinclude.m4                          |  23 +
>  configure.ac                          |   1 +
>  lib/automake.mk                       |   7 +-
>  lib/dp-packet.c                       |  16 +
>  lib/dp-packet.h                       |  35 +-
>  lib/dpif-netdev-perf.h                |  13 +
>  lib/netdev-afxdp.c                    | 589 ++++++++++++++++++++
>  lib/netdev-afxdp.h                    |  47 ++
>  lib/netdev-linux.c                    |  89 +++-
>  lib/netdev-linux.h                    |   1 +
>  lib/netdev-provider.h                 |   1 +
>  lib/netdev.c                          |   1 +
>  lib/xdpsock.c                         | 210 ++++++++
>  lib/xdpsock.h                         | 133 +++++
>  tests/automake.mk                     |  17 +
>  tests/system-afxdp-macros.at          | 153 ++++++
>  tests/system-afxdp-testsuite.at       |  26 +
>  tests/system-afxdp-traffic.at         | 978 ++++++++++++++++++++++++++++++++++
>  22 files changed, 2703 insertions(+), 6 deletions(-)
>  create mode 100644 Documentation/intro/install/afxdp.rst
>  create mode 100644 lib/netdev-afxdp.c
>  create mode 100644 lib/netdev-afxdp.h
>  create mode 100644 lib/xdpsock.c
>  create mode 100644 lib/xdpsock.h
>  create mode 100644 tests/system-afxdp-macros.at
>  create mode 100644 tests/system-afxdp-testsuite.at
>  create mode 100644 tests/system-afxdp-traffic.at
> 
> diff --git a/Documentation/automake.mk b/Documentation/automake.mk
> index 082438e09a33..11cc59efc881 100644
> --- a/Documentation/automake.mk
> +++ b/Documentation/automake.mk
> @@ -10,6 +10,7 @@ DOC_SOURCE = \
>  	Documentation/intro/why-ovs.rst \
>  	Documentation/intro/install/index.rst \
>  	Documentation/intro/install/bash-completion.rst \
> +	Documentation/intro/install/afxdp.rst \
>  	Documentation/intro/install/debian.rst \
>  	Documentation/intro/install/documentation.rst \
>  	Documentation/intro/install/distributions.rst \
> diff --git a/Documentation/index.rst b/Documentation/index.rst
> index 46261235c732..aa9e7c49f179 100644
> --- a/Documentation/index.rst
> +++ b/Documentation/index.rst
> @@ -59,6 +59,7 @@ vSwitch? Start here.
>    :doc:`intro/install/windows` |
>    :doc:`intro/install/xenserver` |
>    :doc:`intro/install/dpdk` |
> +  :doc:`intro/install/afxdp` |
>    :doc:`Installation FAQs <faq/releases>`
>  
>  - **Tutorials:** :doc:`tutorials/faucet` |
> diff --git a/Documentation/intro/install/afxdp.rst b/Documentation/intro/install/afxdp.rst
> new file mode 100644
> index 000000000000..a1e3317bbdb5
> --- /dev/null
> +++ b/Documentation/intro/install/afxdp.rst
> @@ -0,0 +1,366 @@
> +..
> +      Licensed under the Apache License, Version 2.0 (the "License"); you may
> +      not use this file except in compliance with the License. You may obtain
> +      a copy of the License at
> +
> +          http://www.apache.org/licenses/LICENSE-2.0
> +
> +      Unless required by applicable law or agreed to in writing, software
> +      distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
> +      WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
> +      License for the specific language governing permissions and limitations
> +      under the License.
> +
> +      Convention for heading levels in Open vSwitch documentation:
> +
> +      =======  Heading 0 (reserved for the title in a document)
> +      -------  Heading 1
> +      ~~~~~~~  Heading 2
> +      +++++++  Heading 3
> +      '''''''  Heading 4
> +
> +      Avoid deeper levels because they do not render well.
> +
> +
> +========================
> +Open vSwitch with AF_XDP
> +========================
> +
> +This document describes how to build and install Open vSwitch using
> +AF_XDP netdev.
> +
> +.. warning::
> +  The AF_XDP support of Open vSwitch is considered 'experimental',
> +  and it is not compiled in by default.
> +
> +Introduction
> +------------
> +AF_XDP, Address Family of the eXpress Data Path, is a new Linux socket type
> +built upon the eBPF and XDP technology.  It is aims to have comparable
> +performance to DPDK but cooperate better with existing kernel's networking
> +stack.  An AF_XDP socket receives and sends packets from an eBPF/XDP program
> +attached to the netdev, by-passing a couple of Linux kernel's subsystems.
> +As a result, AF_XDP socket shows much better performance than AF_PACKET.
> +For more details about AF_XDP, please see linux kernel's
> +Documentation/networking/af_xdp.rst
> +
> +
> +AF_XDP Netdev
> +-------------
> +OVS has a couple of netdev types, i.e., system, tap, or
> +internal.  The AF_XDP feature adds a new netdev types called
> +"afxdp", and implement its configuration, packet reception,
> +and transmit functions.  Since the AF_XDP socket, xsk,
> +operates in userspace, once ovs-vswitchd receives packets
> +from xsk, the proposed architecture re-uses the existing
> +userspace dpif-netdev datapath.  As a result, most of
> +the packet processing happens at the userspace instead of
> +linux kernel.
> +
> +::
> +
> +              |   +-------------------+
> +              |   |    ovs-vswitchd   |<-->ovsdb-server
> +              |   +-------------------+
> +              |   |      ofproto      |<-->OpenFlow controllers
> +              |   +--------+-+--------+
> +              |   | netdev | |ofproto-|
> +    userspace |   +--------+ |  dpif  |
> +              |   | afxdp  | +--------+
> +              |   | netdev | |  dpif  |
> +              |   +---||---+ +--------+
> +              |       ||     |  dpif- |
> +              |       ||     | netdev |
> +              |_      ||     +--------+
> +                      ||
> +               _  +---||-----+--------+
> +              |   | AF_XDP prog +     |
> +       kernel |   |   xsk_map         |
> +              |_  +--------||---------+
> +                           ||
> +                        physical
> +                           NIC
> +
> +
> +Build requirements
> +------------------
> +
> +In addition to the requirements described in :doc:`general`, building Open
> +vSwitch with AF_XDP will require the following:
> +
> +- libbpf from kernel source tree (kernel 5.0.0 or later)
> +
> +- Linux kernel XDP support, with the following options (required)
> +  ``_CONFIG_BPF=y``
> +
> +  ``_CONFIG_BPF_SYSCALL=y``
> +
> +  ``_CONFIG_XDP_SOCKETS=y``
> +
> +
> +- The following optional Kconfig options are also recommended, but not
> +  required:
> +
> +  ``_CONFIG_BPF_JIT=y`` (Performance)
> +
> +  ``_CONFIG_HAVE_BPF_JIT=y`` (Performance)
> +
> +  ``_CONFIG_XDP_SOCKETS_DIAG=y`` (Debugging)
> +
> +- If possible, run **./xdpsock -r -N -z -i <your device>** under
> +  linux/samples/bpf.  This is the OVS indepedent benchmark tools for AF_XDP.
> +  It makes sure your basic kernel requirements are met for AF_XDP.
> +
> +
> +Installing
> +----------
> +For OVS to use AF_XDP netdev, it has to be configured with LIBBPF support.
> +Frist, clone a recent version of Linux bpf-next tree::
> +
> +  git clone git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git
> +
> +Second, go into the Linux source directory and build libbpf in the tools
> +directory::
> +
> +  cd bpf-next/
> +  cd tools/lib/bpf/
> +  make && make install
> +  make install_headers
> +
> +.. note::
> +   Make sure xsk.h and bpf.h are installed in system's library path,
> +   e.g. /usr/local/include/bpf/ or /usr/include/bpf/
> +
> +Make sure the libbpf.so is installed correctly::
> +
> +  ldconfig
> +  ldconfig -p | grep libbpf
> +
> +
> +Third, ensure the standard OVS requirements are installed and
> +bootstrap/configure the package::
> +
> +  ./boot.sh && ./configure --enable-afxdp
> +
> +Finally, build and install OVS::
> +
> +  make && make install
> +
> +To kick start end-to-end autotesting::
> +
> +  uname -a # make sure having 5.0+ kernel
> +  make check-afxdp
> +
> +if a test case fails, check the log at::
> +
> +  cat tests/system-afxdp-testsuite.dir/<number>/system-afxdp-testsuite.log
> +
> +
> +Setup AF_XDP netdev
> +-------------------
> +Before running OVS with AF_XDP, make sure the libbpf and libelf are
> +set-up right::
> +
> +  ldd vswitchd/ovs-vswitchd
> +
> +Open vSwitch should be started using userspace datapath as described
> +in :doc:`general`::
> +
> +  ovs-vswitchd --disable-system
> +  ovs-vsctl -- add-br br0 -- set Bridge br0 datapath_type=netdev
> +
> +.. note::
> +   OVS AF_XDP netdev is using the userspace datapath, the same datapath
> +   as used by OVS-DPDK.  So it requires --disable-system for ovs-vswitchd
> +   and datapath_type=netdev when adding a new bridge.
> +
> +Make sure your device support AF_XDP, and to use 1 PMD (on core 4)
> +on 1 queue (queue 0) device, configure these options: **pmd-cpu-mask,
> +pmd-rxq-affinity, and n_rxq**. The **xdpmode** can be "drv" or "skb"::
> +
> +  ethtool -L enp2s0 combined 1
> +  ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x10
> +  ovs-vsctl add-port br0 enp2s0 -- set interface enp2s0 type="afxdp" \
> +    options:n_rxq=1 options:xdpmode=drv \
> +    other_config:pmd-rxq-affinity="0:4"
> +
> +Or, use 4 pmds/cores and 4 queues by doing::
> +
> +  ethtool -L enp2s0 combined 4
> +  ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x36
> +  ovs-vsctl add-port br0 enp2s0 -- set interface enp2s0 type="afxdp" \
> +    options:n_rxq=4 options:xdpmode=drv \
> +    other_config:pmd-rxq-affinity="0:1,1:2,2:3,3:4"
> +
> +To validate that the bridge has successfully instantiated, you can use the::
> +
> +  ovs-vsctl show
> +
> +should show something like::
> +
> +  Port "ens802f0"
> +   Interface "ens802f0"
> +      type: afxdp
> +      options: {n_rxq="1", xdpmode=drv}
> +
> +Otherwise, enable debug by::
> +
> +  ovs-appctl vlog/set netdev_afxdp::dbg
> +
> +
> +References
> +----------
> +Most of the design details are described in the paper presented at
> +Linux Plumber 2018, "Bringing the Power of eBPF to Open vSwitch"[1],
> +section 4, and slides[2][4].
> +"The Path to DPDK Speeds for AF XDP"[3] gives a very good introduction
> +about AF_XDP current and future work.
> +
> +
> +[1] http://vger.kernel.org/lpc_net2018_talks/ovs-ebpf-afxdp.pdf
> +
> +[2] http://vger.kernel.org/lpc_net2018_talks/ovs-ebpf-lpc18-presentation.pdf
> +
> +[3] http://vger.kernel.org/lpc_net2018_talks/lpc18_paper_af_xdp_perf-v2.pdf
> +
> +[4] https://ovsfall2018.sched.com/event/IO7p/fast-userspace-ovs-with-afxdp
> +
> +
> +Performance Tuning
> +------------------
> +The name of the game is to keep your CPU running in userspace, allowing PMD
> +to keep polling the AF_XDP queues without any interferences from kernel.
> +
> +#. Make sure everything is in the same NUMA node (memory used by AF_XDP, pmd
> +   running cores, device plug-in slot)
> +
> +#. Isolate your CPU by doing isolcpu at grub configure.
> +
> +#. IRQ should not set to pmd running core.
> +
> +#. The Spectre and Meltdown fixes increase the overhead of system calls.
> +
> +Debugging performance issue
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +While running the traffic, use linux perf tool to see where your cpu
> +spends its cycle::
> +
> +  cd bpf-next/tools/perf
> +  make
> +  ./perf record -p `pidof ovs-vswitchd` sleep 10
> +  ./perf report
> +
> +Measure your system call rate by doing::
> +
> +  pstree -p `pidof ovs-vswitchd`
> +  strace -c -p <your pmd's PID>
> +
> +Or, use OVS pmd tool::
> +
> +  ovs-appctl dpif-netdev/pmd-stats-show
> +
> +
> +Example Script
> +--------------
> +
> +Below is a script using namespaces and veth peer::
> +
> +  #!/bin/bash
> +  ovs-vswitchd --no-chdir --pidfile -vvconn -vofproto_dpif -vunixctl \
> +    --disable-system --detach \
> +  ovs-vsctl -- add-br br0 -- set Bridge br0 \
> +    protocols=OpenFlow10,OpenFlow11,OpenFlow12,OpenFlow13,OpenFlow14 \
> +    fail-mode=secure datapath_type=netdev
> +  ovs-vsctl -- add-br br0 -- set Bridge br0 datapath_type=netdev
> +
> +  ip netns add at_ns0
> +  ovs-appctl vlog/set netdev_afxdp::dbg
> +
> +  ip link add p0 type veth peer name afxdp-p0
> +  ip link set p0 netns at_ns0
> +  ip link set dev afxdp-p0 up
> +  ovs-vsctl add-port br0 afxdp-p0 -- \
> +    set interface afxdp-p0 external-ids:iface-id="p0" type="afxdp"
> +
> +  ip netns exec at_ns0 sh << NS_EXEC_HEREDOC
> +  ip addr add "10.1.1.1/24" dev p0
> +  ip link set dev p0 up
> +  NS_EXEC_HEREDOC
> +
> +  ip netns add at_ns1
> +  ip link add p1 type veth peer name afxdp-p1
> +  ip link set p1 netns at_ns1
> +  ip link set dev afxdp-p1 up
> +
> +  ovs-vsctl add-port br0 afxdp-p1 -- \
> +    set interface afxdp-p1 external-ids:iface-id="p1" type="afxdp"
> +  ip netns exec at_ns1 sh << NS_EXEC_HEREDOC
> +  ip addr add "10.1.1.2/24" dev p1
> +  ip link set dev p1 up
> +  NS_EXEC_HEREDOC
> +
> +  ip netns exec at_ns0 ping -i .2 10.1.1.2
> +
> +
> +Limitations/Known Issues
> +------------------------
> +#. Device's numa ID is always 0, need a way to find numa id from a netdev.
> +#. No QoS support because AF_XDP netdev by-pass the Linux TC layer. A possible
> +   work-around is to use OpenFlow meter action.
> +#. AF_XDP device added to bridge, remove, and added again will fail.
> +#. Most of the tests are done using i40e single port. Multiple ports and
> +   also ixgbe driver also needs to be tested.
> +#. No latency test result (TODO items)
> +
> +
> +make check-afxdp
> +----------------
> +When executing 'make check-afxdp', OVS creates namespaces, sets up AF_XDP on
> +veth devices and kicks start the testing.  So far we have the following test
> +cases::
> +
> + AF_XDP netdev datapath-sanity
> +
> +  1: datapath - ping between two ports               ok
> +  2: datapath - ping between two ports on vlan       ok
> +  3: datapath - ping6 between two ports              ok
> +  4: datapath - ping6 between two ports on vlan      ok
> +  5: datapath - ping over vxlan tunnel               ok
> +  6: datapath - ping over vxlan6 tunnel              ok
> +  7: datapath - ping over gre tunnel                 ok
> +  8: datapath - ping over erspan v1 tunnel           ok
> +  9: datapath - ping over erspan v2 tunnel           ok
> + 10: datapath - ping over ip6erspan v1 tunnel        ok
> + 11: datapath - ping over ip6erspan v2 tunnel        ok
> + 12: datapath - ping over geneve tunnel              ok
> + 13: datapath - ping over geneve6 tunnel             ok
> + 14: datapath - clone action                         ok
> + 15: datapath - basic truncate action                ok
> +
> + conntrack
> +
> + 16: conntrack - controller                          ok
> + 17: conntrack - force commit                        ok
> + 18: conntrack - ct flush by 5-tuple                 ok
> + 19: conntrack - IPv4 ping                           ok
> + 20: conntrack - get_nconns and get/set_maxconns     ok
> + 21: conntrack - IPv6 ping                           ok
> +
> + system-ovn
> +
> + 22: ovn -- 2 LRs connected via LS, gateway router, SNAT and DNAT ok
> + 23: ovn -- 2 LRs connected via LS, gateway router, easy SNAT ok
> + 24: ovn -- multiple gateway routers, SNAT and DNAT  ok
> + 25: ovn -- load-balancing                           ok
> + 26: ovn -- load-balancing - same subnet.            ok
> + 27: ovn -- load balancing in gateway router         ok
> + 28: ovn -- multiple gateway routers, load-balancing ok
> + 29: ovn -- load balancing in router with gateway router port ok
> + 30: ovn -- DNAT and SNAT on distributed router - N/S ok
> + 31: ovn -- DNAT and SNAT on distributed router - E/W ok
> +
> +
> +Bug Reporting
> +-------------
> +
> +Please report problems to dev@openvswitch.org.
> diff --git a/Documentation/intro/install/index.rst b/Documentation/intro/install/index.rst
> index 3193c736cf17..c27a9c9d16ff 100644
> --- a/Documentation/intro/install/index.rst
> +++ b/Documentation/intro/install/index.rst
> @@ -45,6 +45,7 @@ Installation from Source
>     xenserver
>     userspace
>     dpdk
> +   afxdp
>  
>  Installation from Packages
>  --------------------------
> diff --git a/acinclude.m4 b/acinclude.m4
> index 301aeb70d82a..d80f2494d514 100644
> --- a/acinclude.m4
> +++ b/acinclude.m4
> @@ -221,6 +221,29 @@ AC_DEFUN([OVS_FIND_DEPENDENCY], [
>    ])
>  ])
>  
> +dnl OVS_CHECK_LINUX_AF_XDP
> +dnl
> +dnl Check both Linux kernel AF_XDP and libbpf support
> +AC_DEFUN([OVS_CHECK_LINUX_AF_XDP], [
> +  AC_MSG_CHECKING([whether AF_XDP is supported])
> +  AC_ARG_ENABLE([afxdp],
> +                [AC_HELP_STRING([--enable-afxdp], [Enable AF-XDP support])],
> +                [], [enable_afxdp=no])
> +  AC_CHECK_HEADER([bpf/libbpf.h],
> +                  [HAVE_LIBBPF=yes],
> +                  [HAVE_LIBBPF=no])
> +  AC_CHECK_HEADER([linux/if_xdp.h],
> +                  [HAVE_IF_XDP=yes],
> +                  [HAVE_IF_XDP=no])
> +  AM_CONDITIONAL([SUPPORT_AF_XDP],
> +                 [test "$enable_afxdp" = yes && test "$HAVE_LIBBPF" = yes && test "$HAVE_IF_XDP" = yes])
> +  AM_COND_IF([SUPPORT_AF_XDP], [
> +    AC_DEFINE([HAVE_AF_XDP], [1], [Define to 1 if AF-XDP support is available and enabled.])
> +    LIBBPF_LDADD=" -lbpf -lelf"
> +    AC_SUBST([LIBBPF_LDADD])
> +  ])
> +])
> +

I think that configure should fail in case we have no required headers.
It's confusing that I explicitly enabled afxdp, but OVS was built without
its support.
One more thing is that AC_MSG_CHECKING requires subsequent AC_MSG_RESULT,
otherwise it will look not good.

Suggesting following incremental:

diff --git a/acinclude.m4 b/acinclude.m4
index d80f2494d..c919af570 100644
--- a/acinclude.m4
+++ b/acinclude.m4
@@ -225,23 +225,26 @@ dnl OVS_CHECK_LINUX_AF_XDP
 dnl
 dnl Check both Linux kernel AF_XDP and libbpf support
 AC_DEFUN([OVS_CHECK_LINUX_AF_XDP], [
-  AC_MSG_CHECKING([whether AF_XDP is supported])
   AC_ARG_ENABLE([afxdp],
                 [AC_HELP_STRING([--enable-afxdp], [Enable AF-XDP support])],
                 [], [enable_afxdp=no])
-  AC_CHECK_HEADER([bpf/libbpf.h],
-                  [HAVE_LIBBPF=yes],
-                  [HAVE_LIBBPF=no])
-  AC_CHECK_HEADER([linux/if_xdp.h],
-                  [HAVE_IF_XDP=yes],
-                  [HAVE_IF_XDP=no])
-  AM_CONDITIONAL([SUPPORT_AF_XDP],
-                 [test "$enable_afxdp" = yes && test "$HAVE_LIBBPF" = yes && test "$HAVE_IF_XDP" = yes])
-  AM_COND_IF([SUPPORT_AF_XDP], [
-    AC_DEFINE([HAVE_AF_XDP], [1], [Define to 1 if AF-XDP support is available and enabled.])
+  AC_MSG_CHECKING([whether AF_XDP is enabled])
+  if test "$enable_afxdp" != yes; then
+    AC_MSG_RESULT([no])
+  else
+    AC_MSG_RESULT([yes])
+
+    AC_CHECK_HEADER([bpf/libbpf.h], [],
+      [AC_MSG_ERROR([unable to find bpf/libbpf.h for AF_XDP support])])
+
+    AC_CHECK_HEADER([linux/if_xdp.h], [],
+      [AC_MSG_ERROR([unable to find linux/if_xdp.h for AF_XDP support])])
+
+    AC_DEFINE([HAVE_AF_XDP], [1],
+              [Define to 1 if AF-XDP support is available and enabled.])
     LIBBPF_LDADD=" -lbpf -lelf"
     AC_SUBST([LIBBPF_LDADD])
-  ])
+  fi
 ])
 
 dnl OVS_CHECK_DPDK
---


>  dnl OVS_CHECK_DPDK
>  dnl
>  dnl Configure DPDK source tree
> diff --git a/configure.ac b/configure.ac
> index 505e3d041e93..29c90b73f836 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -99,6 +99,7 @@ OVS_CHECK_SPHINX
>  OVS_CHECK_DOT
>  OVS_CHECK_IF_DL
>  OVS_CHECK_STRTOK_R
> +OVS_CHECK_LINUX_AF_XDP
>  AC_CHECK_DECLS([sys_siglist], [], [], [[#include <signal.h>]])
>  AC_CHECK_MEMBERS([struct stat.st_mtim.tv_nsec, struct stat.st_mtimensec],
>    [], [], [[#include <sys/stat.h>]])
> diff --git a/lib/automake.mk b/lib/automake.mk
> index cc5dccf39d6b..8b9df5635bbe 100644
> --- a/lib/automake.mk
> +++ b/lib/automake.mk
> @@ -9,6 +9,7 @@ lib_LTLIBRARIES += lib/libopenvswitch.la
>  
>  lib_libopenvswitch_la_LIBADD = $(SSL_LIBS)
>  lib_libopenvswitch_la_LIBADD += $(CAPNG_LDADD)
> +lib_libopenvswitch_la_LIBADD += $(LIBBPF_LDADD)
>  
>  if WIN32
>  lib_libopenvswitch_la_LIBADD += ${PTHREAD_LIBS}
> @@ -327,7 +328,11 @@ lib_libopenvswitch_la_SOURCES = \
>  	lib/lldp/lldpd.c \
>  	lib/lldp/lldpd.h \
>  	lib/lldp/lldpd-structs.c \
> -	lib/lldp/lldpd-structs.h
> +	lib/lldp/lldpd-structs.h \
> +	lib/xdpsock.c \
> +	lib/xdpsock.h \
> +	lib/netdev-afxdp.c \
> +	lib/netdev-afxdp.h

Maybe it's better to move all these files under #ifdef HAVE_AF_XDP ?

>  
>  if WIN32
>  lib_libopenvswitch_la_SOURCES += \
> diff --git a/lib/dp-packet.c b/lib/dp-packet.c
> index 0976a35e758b..a61552f72988 100644
> --- a/lib/dp-packet.c
> +++ b/lib/dp-packet.c
> @@ -22,6 +22,9 @@
>  #include "netdev-dpdk.h"
>  #include "openvswitch/dynamic-string.h"
>  #include "util.h"
> +#ifdef HAVE_AF_XDP
> +#include "xdpsock.h"
> +#endif
>  
>  static void
>  dp_packet_init__(struct dp_packet *b, size_t allocated, enum dp_packet_source source)
> @@ -122,6 +125,16 @@ dp_packet_uninit(struct dp_packet *b)
>               * created as a dp_packet */
>              free_dpdk_buf((struct dp_packet*) b);
>  #endif
> +        } else if (b->source == DPBUF_AFXDP) {
> +#ifdef HAVE_AF_XDP
> +            struct dp_packet_afxdp *xpacket;
> +
> +            xpacket = dp_packet_cast_afxdp(b);
> +            if (xpacket->mpool) {
> +                umem_elem_push(xpacket->mpool, dp_packet_base(b));
> +            }
> +#endif

Why not making the same trick as we have for DPDK few lines above?
i.e. wrap this part in a function like 'free_afxdp_buf' and move it
to the netdev-afxdp.c ? You will not need to expose so many internals
to generic code. dp_packet_cast_afxdp() will also be moved there along
with 'struct dp_packet_afxdp'.

BTW, I hope, someday, I'll finally implement 'dp-packet-memory-provider'
abstraction for OVS.

> +            return;
>          }
>      }
>  }
> @@ -248,6 +261,8 @@ dp_packet_resize__(struct dp_packet *b, size_t new_headroom, size_t new_tailroom
>      case DPBUF_STACK:
>          OVS_NOT_REACHED();
>  
> +    case DPBUF_AFXDP:
> +        OVS_NOT_REACHED();

Some space required between cases.

>      case DPBUF_STUB:
>          b->source = DPBUF_MALLOC;
>          new_base = xmalloc(new_allocated);
> @@ -433,6 +448,7 @@ dp_packet_steal_data(struct dp_packet *b)
>  {
>      void *p;
>      ovs_assert(b->source != DPBUF_DPDK);
> +    ovs_assert(b->source != DPBUF_AFXDP);
>  
>      if (b->source == DPBUF_MALLOC && dp_packet_data(b) == dp_packet_base(b)) {
>          p = dp_packet_data(b);
> diff --git a/lib/dp-packet.h b/lib/dp-packet.h
> index a5e9ade1244a..774728eef330 100644
> --- a/lib/dp-packet.h
> +++ b/lib/dp-packet.h
> @@ -25,6 +25,10 @@
>  #include <rte_mbuf.h>
>  #endif
>  
> +#ifdef HAVE_AF_XDP
> +#include "lib/xdpsock.h"
> +#endif
> +
>  #include "netdev-dpdk.h"
>  #include "openvswitch/list.h"
>  #include "packets.h"
> @@ -42,6 +46,7 @@ enum OVS_PACKED_ENUM dp_packet_source {
>      DPBUF_DPDK,                /* buffer data is from DPDK allocated memory.
>                                  * ref to dp_packet_init_dpdk() in dp-packet.c.
>                                  */
> +    DPBUF_AFXDP,                /* buffer data from XDP frame */

Please, move the comment one space left.

>  };
>  
>  #define DP_PACKET_CONTEXT_SIZE 64
> @@ -89,6 +94,20 @@ struct dp_packet {
>      };
>  };
>  
> +struct dp_packet_afxdp {
> +    struct umem_pool *mpool;
> +    struct dp_packet packet;
> +};
> +
> +#if HAVE_AF_XDP
> +static struct dp_packet_afxdp *
> +dp_packet_cast_afxdp(const struct dp_packet *d OVS_UNUSED)
> +{
> +    ovs_assert(d->source == DPBUF_AFXDP);
> +    return CONTAINER_OF(d, struct dp_packet_afxdp, packet);
> +}
> +#endif
> +
>  static inline void *dp_packet_data(const struct dp_packet *);
>  static inline void dp_packet_set_data(struct dp_packet *, void *);
>  static inline void *dp_packet_base(const struct dp_packet *);
> @@ -183,7 +202,21 @@ dp_packet_delete(struct dp_packet *b)
>              free_dpdk_buf((struct dp_packet*) b);
>              return;
>          }
> -
> +        if (b->source == DPBUF_AFXDP) {
> +#ifdef HAVE_AF_XDP
> +            struct dp_packet_afxdp *xpacket;
> +
> +            /* if a packet is received from afxdp port,
> +             * and tx to a system port. Then we need to
> +             * push the rx umem back here
> +             */
> +            xpacket = dp_packet_cast_afxdp(b);
> +            if (xpacket->mpool) {
> +                umem_elem_push(xpacket->mpool, dp_packet_base(b));
> +            }
> +#endif
> +            return;
> +        }
>          dp_packet_uninit(b);
>          free(b);
>      }
> diff --git a/lib/dpif-netdev-perf.h b/lib/dpif-netdev-perf.h
> index 859c05613ddf..e47cf73bf3c9 100644
> --- a/lib/dpif-netdev-perf.h
> +++ b/lib/dpif-netdev-perf.h
> @@ -198,6 +198,19 @@ cycles_counter_update(struct pmd_perf_stats *s)
>  {
>  #ifdef DPDK_NETDEV
>      return s->last_tsc = rte_get_tsc_cycles();
> +#elif HAVE_AF_XDP
> +    union {
> +        uint64_t tsc_64;
> +        struct {
> +            uint32_t lo_32;
> +            uint32_t hi_32;
> +        };
> +    } tsc;
> +    asm volatile("rdtsc" :
> +             "=a" (tsc.lo_32),
> +             "=d" (tsc.hi_32));

We need to check that we're on x86 machine.
Build should fail, I think. For now, you may add following code
to the head of netdev-afxdp.c:

#if !defined(__i386__) && !defined(__x86_64__)
#error AF_XDP supported only for Linux on x86 or x86_64
#endif

> +
> +    return s->last_tsc = tsc.tsc_64;
>  #else
>      return s->last_tsc = 0;
>  #endif
> diff --git a/lib/netdev-afxdp.c b/lib/netdev-afxdp.c
> new file mode 100644
> index 000000000000..4c71061fc102
> --- /dev/null
> +++ b/lib/netdev-afxdp.c
> @@ -0,0 +1,589 @@
> +/*
> + * Copyright (c) 2018, 2019 Nicira, Inc.
> + *
> + * Licensed under the Apache License, Version 2.0 (the "License");
> + * you may not use this file except in compliance with the License.
> + * You may obtain a copy of the License at:
> + *
> + *     http://www.apache.org/licenses/LICENSE-2.0
> + *
> + * Unless required by applicable law or agreed to in writing, software
> + * distributed under the License is distributed on an "AS IS" BASIS,
> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> + * See the License for the specific language governing permissions and
> + * limitations under the License.
> + */
> +
> +#include <config.h>
> +#ifdef HAVE_AF_XDP
> +#include "netdev-linux.h"
> +#include <errno.h>
> +#include <fcntl.h>
> +#include <sys/types.h>
> +#include <netinet/in.h>
> +#include <arpa/inet.h>
> +#include <inttypes.h>
> +#include <sys/ioctl.h>
> +#include <sys/socket.h>
> +#include <sys/utsname.h>
> +#include <netpacket/packet.h>
> +#include <net/if.h>
> +#include <net/if_arp.h>
> +#include <net/route.h>
> +#include <poll.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <unistd.h>
> +
> +#include "coverage.h"
> +#include "dp-packet.h"
> +#include "dpif-netlink.h"
> +#include "dpif-netdev.h"
> +#include "openvswitch/dynamic-string.h"
> +#include "fatal-signal.h"
> +#include "hash.h"
> +#include "openvswitch/hmap.h"
> +#include "netdev-provider.h"
> +#include "netdev-tc-offloads.h"
> +#include "netdev-vport.h"
> +#include "netlink-notifier.h"
> +#include "netlink-socket.h"
> +#include "netlink.h"
> +#include "netnsid.h"
> +#include "openvswitch/ofpbuf.h"
> +#include "openflow/openflow.h"
> +#include "ovs-atomic.h"
> +#include "packets.h"
> +#include "openvswitch/poll-loop.h"
> +#include "rtnetlink.h"
> +#include "openvswitch/shash.h"
> +#include "socket-util.h"
> +#include "sset.h"
> +#include "tc.h"
> +#include "timer.h"
> +#include "unaligned.h"
> +#include "openvswitch/vlog.h"
> +#include "util.h"
> +#include "netdev-afxdp.h"
> +
> +#include <linux/if_ether.h>
> +#include <linux/if_tun.h>
> +#include <linux/types.h>
> +#include <linux/ethtool.h>
> +#include <linux/mii.h>
> +#include <linux/rtnetlink.h>
> +#include <linux/sockios.h>
> +#include <linux/if_xdp.h>
> +#include "xdpsock.h"
> +
> +#ifndef SOL_XDP
> +#define SOL_XDP 283
> +#endif
> +#ifndef AF_XDP
> +#define AF_XDP 44
> +#endif
> +#ifndef PF_XDP
> +#define PF_XDP AF_XDP
> +#endif
> +
> +VLOG_DEFINE_THIS_MODULE(netdev_afxdp);
> +static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
> +
> +#define UMEM2DESC(elem, base) ((uint64_t)((char *)elem - (char *)base))
> +#define UMEM2XPKT(base, i) \
> +    ALIGNED_CAST(struct dp_packet_afxdp *, (char *)base + \
> +    i * sizeof(struct dp_packet_afxdp))
> +
> +static uint32_t opt_xdp_bind_flags = XDP_COPY;
> +static uint32_t opt_xdp_flags =
> +                    XDP_FLAGS_UPDATE_IF_NOEXIST | XDP_FLAGS_SKB_MODE;
> +#ifdef USE_DRVMODE_DEFAULT

If I'll define this, build will fail.
Should there be ifdef-else-end ?

> +static uint32_t opt_xdp_bind_flags = XDP_ZEROCOPY;
> +static uint32_t opt_xdp_flags =
> +                    XDP_FLAGS_UPDATE_IF_NOEXIST | XDP_FLAGS_DRV_MODE;
> +#endif
> +static uint32_t prog_id;
> +
> +static struct xsk_umem_info *xsk_configure_umem(void *buffer, uint64_t size)
> +{
> +    struct xsk_umem_info *umem;
> +    int ret;
> +    int i;
> +
> +    umem = xcalloc(1, sizeof(*umem));
> +    if (!umem) {
> +        VLOG_FATAL("xsk config umem failed (%s)", ovs_strerror(errno));

xcalloc can't fail.

> +    }
> +
> +    ret = xsk_umem__create(&umem->umem, buffer, size, &umem->fq, &umem->cq,
> +                           NULL);
> +
> +    if (ret) {
> +        VLOG_FATAL("xsk umem create failed (%s) mode: %s",
> +            ovs_strerror(errno),
> +            opt_xdp_bind_flags == XDP_COPY ? "SKB": "DRV");

Why so FATAL? Can we just return NULL and fail the netdev_linux_rxq_construct?

> +    }
> +
> +    umem->buffer = buffer;
> +
> +    /* set-up umem pool */
> +    umem_pool_init(&umem->mpool, NUM_FRAMES);
> +
> +    for (i = NUM_FRAMES - 1; i >= 0; i--) {
> +        struct umem_elem *elem;
> +
> +        elem = ALIGNED_CAST(struct umem_elem *,
> +                            (char *)umem->buffer + i * FRAME_SIZE);
> +        umem_elem_push(&umem->mpool, elem);
> +    }
> +
> +    /* set-up metadata */
> +    xpacket_pool_init(&umem->xpool, NUM_FRAMES);
> +
> +    VLOG_DBG("%s xpacket pool from %p to %p", __func__,
> +              umem->xpool.array,
> +              (char *)umem->xpool.array +
> +              NUM_FRAMES * sizeof(struct dp_packet_afxdp));
> +
> +    for (i = NUM_FRAMES - 1; i >= 0; i--) {
> +        struct dp_packet_afxdp *xpacket;
> +        struct dp_packet *packet;
> +
> +        xpacket = UMEM2XPKT(umem->xpool.array, i);
> +        xpacket->mpool = &umem->mpool;
> +
> +        packet = &xpacket->packet;
> +        packet->source = DPBUF_AFXDP;
> +    }
> +
> +    return umem;
> +}
> +
> +static struct xsk_socket_info *
> +xsk_configure_socket(struct xsk_umem_info *umem, uint32_t ifindex,
> +                     uint32_t queue_id)
> +{
> +    struct xsk_socket_config cfg;
> +    struct xsk_socket_info *xsk;
> +    char devname[IF_NAMESIZE];
> +    uint32_t idx;
> +    int ret;
> +    int i;
> +
> +    xsk = xcalloc(1, sizeof(*xsk));
> +    if (!xsk) {
> +        VLOG_FATAL("xsk calloc failed (%s)", ovs_strerror(errno));

xcalloc can't fail.

> +    }
> +
> +    xsk->umem = umem;
> +    cfg.rx_size = CONS_NUM_DESCS;
> +    cfg.tx_size = PROD_NUM_DESCS;
> +    cfg.libbpf_flags = 0;
> +    cfg.xdp_flags = opt_xdp_flags;
> +    cfg.bind_flags = opt_xdp_bind_flags;
> +
> +    if (if_indextoname(ifindex, devname) == NULL) {
> +        VLOG_FATAL("ifindex %d devname failed (%s)",
> +                   ifindex, ovs_strerror(errno));

Every little misconfiguration will lead to aborting. It's probably OK
for the experimantal feature, but I don't like this.

> +    }
> +
> +    ret = xsk_socket__create(&xsk->xsk, devname, queue_id, umem->umem,
> +                             &xsk->rx, &xsk->tx, &cfg);
> +    if (ret) {
> +        VLOG_FATAL("xsk_socket_create failed (%s) mode: %s qid: %d",
> +                   ovs_strerror(errno),
> +                   opt_xdp_bind_flags == XDP_COPY ? "SKB": "DRV",
> +                   queue_id);
> +    }
> +
> +    /* make sure the XDP program is there */
> +    ret = bpf_get_link_xdp_id(ifindex, &prog_id, opt_xdp_flags);
> +    if (ret) {
> +        VLOG_FATAL("get XDP prog ID failed (%s)", ovs_strerror(errno));
> +    }
> +
> +    ret = xsk_ring_prod__reserve(&xsk->umem->fq,
> +                                 PROD_NUM_DESCS,
> +                                 &idx);
> +    if (ret != PROD_NUM_DESCS) {
> +        VLOG_FATAL("fq set-up failed (%s)", ovs_strerror(errno));
> +    }
> +
> +    for (i = 0;
> +         i < PROD_NUM_DESCS * FRAME_SIZE;
> +         i += FRAME_SIZE) {
> +        struct umem_elem *elem;
> +        uint64_t addr;
> +
> +        elem = umem_elem_pop(&xsk->umem->mpool);
> +        addr = UMEM2DESC(elem, xsk->umem->buffer);
> +
> +        *xsk_ring_prod__fill_addr(&xsk->umem->fq, idx++) = addr;
> +    }
> +
> +    xsk_ring_prod__submit(&xsk->umem->fq,
> +                          PROD_NUM_DESCS);
> +    return xsk;
> +}
> +
> +struct xsk_socket_info *
> +xsk_configure(int ifindex, int xdp_queue_id)
> +{
> +    struct xsk_socket_info *xsk;
> +    struct xsk_umem_info *umem;
> +    void *bufs;
> +    int ret;
> +
> +    ret = posix_memalign(&bufs, getpagesize(),
> +                         NUM_FRAMES * FRAME_SIZE);

In the future we'll need to use HAVE_POSIX_MEMALIGN, probably.

Do we need to clear the allocated memory?

> +    ovs_assert(!ret);
> +
> +    /* Create sockets... */
> +    umem = xsk_configure_umem(bufs,
> +                              NUM_FRAMES * FRAME_SIZE);
> +    xsk = xsk_configure_socket(umem, ifindex, xdp_queue_id);
> +    return xsk;
> +}
> +
> +static void OVS_UNUSED vlog_hex_dump(const void *buf, size_t count)
> +{
> +    struct ds ds = DS_EMPTY_INITIALIZER;
> +    ds_put_hex_dump(&ds, buf, count, 0, false);
> +    VLOG_DBG_RL(&rl, "%s", ds_cstr(&ds));
> +    ds_destroy(&ds);
> +}
> +
> +void
> +xsk_destroy(struct xsk_socket_info *xsk)
> +{
> +    struct xsk_umem *umem;
> +
> +    if (!xsk) {
> +        return;
> +    }
> +
> +    umem = xsk->umem->umem;
> +    xsk_socket__delete(xsk->xsk);
> +    (void)xsk_umem__delete(umem);
> +
> +    /* cleanup umem pool */
> +    umem_pool_cleanup(&xsk->umem->mpool);
> +
> +    /* cleanup metadata pool */
> +    xpacket_pool_cleanup(&xsk->umem->xpool);
> +}
> +
> +static inline void OVS_UNUSED
> +print_xsk_stat(struct xsk_socket_info *xsk OVS_UNUSED) {
> +    struct xdp_statistics stat;
> +    socklen_t optlen;
> +
> +    optlen = sizeof(stat);

please don't paranthesize the argument of sizeof if it's name of variable.

> +    ovs_assert(getsockopt(xsk_socket__fd(xsk->xsk), SOL_XDP, XDP_STATISTICS,
> +                &stat, &optlen) == 0);
> +
> +    VLOG_DBG_RL(&rl, "rx dropped %llu, rx_invalid %llu, tx_invalid %llu",
> +                     stat.rx_dropped,
> +                     stat.rx_invalid_descs,
> +                     stat.tx_invalid_descs);
> +}
> +
> +int
> +netdev_afxdp_set_config(struct netdev *netdev, const struct smap *args,
> +                        char **errp OVS_UNUSED)
> +{
> +    const char *xdpmode;
> +    int new_n_rxq;
> +
> +    /* TODO: add mutex lock */
> +    new_n_rxq = MAX(smap_get_int(args, "n_rxq", NR_QUEUE), 1);
> +
> +    if (netdev->n_rxq != new_n_rxq) {
> +
> +        if (new_n_rxq > MAX_XSKQ) {
> +            VLOG_WARN("set n_rxq %d too large", new_n_rxq);
> +            goto out;

Just return EINVAL.

> +        }
> +
> +        netdev->n_rxq = new_n_rxq;

This is wrong. You must not update netdev->n_rxq here. This should
be done on reconfiguration.

> +        VLOG_INFO("set AF_XDP device %s to %d n_rxq", netdev->name, new_n_rxq);
> +        netdev_request_reconfigure(netdev);
> +    }
> +
> +    xdpmode = smap_get(args, "xdpmode");
> +    if (xdpmode && strncmp(xdpmode, "drv", 3) == 0) {
> +        if (opt_xdp_bind_flags != XDP_ZEROCOPY) {
> +            opt_xdp_bind_flags = XDP_ZEROCOPY;
> +            opt_xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST | XDP_FLAGS_DRV_MODE;
> +        }
> +        VLOG_INFO("AF_XDP device %s in ZC driver mode", netdev->name);
> +    } else {
> +        opt_xdp_bind_flags = XDP_COPY;
> +        opt_xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST | XDP_FLAGS_SKB_MODE;
> +        VLOG_INFO("AF_XDP device %s in SKB mode", netdev->name);
> +    }

Looks like changing "xdpmode" while port already added will
lead to incorrect work. You, probably, need to forbid this or
prepare the proper reconfiguration process.

> +
> +out:
> +    return 0;
> +}
> +
> +int
> +netdev_afxdp_get_config(const struct netdev *netdev, struct smap *args)
> +{
> +    /* TODO: add mutex lock */
> +    smap_add_format(args, "n_rxq", "%d", netdev->n_rxq);
> +    smap_add_format(args, "xdpmode", "%s",
> +        opt_xdp_bind_flags == XDP_ZEROCOPY ? "drv" : "skb");
> +
> +    return 0;
> +}
> +
> +int
> +netdev_afxdp_get_numa_id(const struct netdev *netdev)
> +{
> +    /* FIXME: Get netdev's PCIe device ID, then find
> +     * its NUMA node id.
> +     */
> +    VLOG_INFO("FIXME: Device %s always use numa id 0", netdev->name);
> +    return 0;
> +}
> +
> +void
> +xsk_remove_xdp_program(uint32_t ifindex)
> +{
> +    uint32_t curr_prog_id = 0;
> +
> +    /* remove_xdp_program() */
> +    if (bpf_get_link_xdp_id(ifindex, &curr_prog_id, opt_xdp_flags)) {
> +        bpf_set_link_xdp_fd(ifindex, -1, opt_xdp_flags);
> +    }
> +    if (prog_id == curr_prog_id) {
> +        bpf_set_link_xdp_fd(ifindex, -1, opt_xdp_flags);
> +    } else if (!curr_prog_id) {
> +        VLOG_WARN("couldn't find a prog id on a given interface");
> +    } else {
> +        VLOG_WARN("program on interface changed, not removing");
> +    }
> +}
> +
> +/* Receive packet from AF_XDP socket */
> +int
> +netdev_linux_rxq_xsk(struct xsk_socket_info *xsk,
> +                     struct dp_packet_batch *batch)
> +{
> +    unsigned int rcvd, i;
> +    uint32_t idx_rx = 0, idx_fq = 0;
> +    int ret = 0;
> +
> +    /* See if there is any packet on RX queue,
> +     * if yes, idx_rx is the index having the packet.
> +     */
> +    rcvd = xsk_ring_cons__peek(&xsk->rx, BATCH_SIZE, &idx_rx);
> +    if (!rcvd) {
> +        return 0;
> +    }
> +
> +    /* Form a dp_packet batch from descriptor in RX queue */
> +    for (i = 0; i < rcvd; i++) {
> +        uint64_t addr = xsk_ring_cons__rx_desc(&xsk->rx, idx_rx)->addr;
> +        uint32_t len = xsk_ring_cons__rx_desc(&xsk->rx, idx_rx)->len;
> +        char *pkt = xsk_umem__get_data(xsk->umem->buffer, addr);
> +        uint64_t index;
> +
> +        struct dp_packet_afxdp *xpacket;
> +        struct dp_packet *packet;
> +
> +        index = addr >> FRAME_SHIFT;
> +        xpacket = UMEM2XPKT(xsk->umem->xpool.array, index);
> +
> +        packet = &xpacket->packet;
> +        xpacket->mpool = &xsk->umem->mpool;
> +
> +        if (packet->source != DPBUF_AFXDP) {
> +            /* FIXME: might be a bug */

Need to log something here. Rate-limited.

> +            continue;
> +        }
> +
> +        /* Initialize the struct dp_packet */
> +        if (opt_xdp_bind_flags == XDP_ZEROCOPY) {
> +            dp_packet_set_base(packet, pkt - FRAME_HEADROOM);
> +        } else {
> +            /* SKB mode */
> +            dp_packet_set_base(packet, pkt);
> +        }
> +        dp_packet_set_data(packet, pkt);
> +        dp_packet_set_size(packet, len);
> +
> +        /* Add packet into batch, increase batch->count */
> +        dp_packet_batch_add(batch, packet);
> +
> +        idx_rx++;
> +    }
> +
> +    /* We've consume rcvd packets in RX, now re-fill the
> +     * same number back to FILL queue.
> +     */
> +    for (i = 0; i < rcvd; i++) {
> +        uint64_t index;
> +        struct umem_elem *elem;
> +
> +        ret = xsk_ring_prod__reserve(&xsk->umem->fq, 1, &idx_fq);
> +        while (ret == 0) {
> +            /* The FILL queue is full, so retry. (or skip)? */
> +            ret = xsk_ring_prod__reserve(&xsk->umem->fq, 1, &idx_fq);
> +        }
> +
> +        /* Get one free umem, program it into FILL queue */
> +        elem = umem_elem_pop(&xsk->umem->mpool);
> +        index = (uint64_t)((char *)elem - (char *)xsk->umem->buffer);
> +        ovs_assert((index & FRAME_SHIFT_MASK) == 0);
> +        *xsk_ring_prod__fill_addr(&xsk->umem->fq, idx_fq) = index;
> +
> +        idx_fq++;
> +    }
> +    xsk_ring_prod__submit(&xsk->umem->fq, rcvd);
> +
> +    /* Release the RX queue */
> +    xsk_ring_cons__release(&xsk->rx, rcvd);
> +    xsk->rx_npkts += rcvd;
> +
> +#ifdef AFXDP_DEBUG
> +    print_xsk_stat(xsk);
> +#endif
> +    return 0;
> +}
> +
> +static void kick_tx(struct xsk_socket_info *xsk)
> +{
> +    int ret;
> +
> +    ret = sendto(xsk_socket__fd(xsk->xsk), NULL, 0, MSG_DONTWAIT, NULL, 0);
> +    if (ret >= 0 || errno == ENOBUFS || errno == EAGAIN || errno == EBUSY) {
> +        return;
> +    }
> +}
> +
> +int
> +netdev_linux_afxdp_batch_send(struct xsk_socket_info *xsk,
> +                              struct dp_packet_batch *batch)
> +{
> +    uint32_t tx_done, idx_cq = 0;
> +    struct dp_packet *packet;
> +    uint32_t idx;
> +    int j;
> +
> +    /* Make sure we have enough TX descs */
> +    if (xsk_ring_prod__reserve(&xsk->tx, batch->count, &idx) == 0) {
> +        return -EAGAIN;
> +    }
> +
> +    DP_PACKET_BATCH_FOR_EACH (i, packet, batch) {
> +        struct dp_packet_afxdp *xpacket;
> +        struct umem_elem *elem;
> +        uint64_t index;
> +
> +        elem = umem_elem_pop(&xsk->umem->mpool);
> +        if (!elem) {
> +            return -EAGAIN;
> +        }
> +
> +        memcpy(elem, dp_packet_data(packet), dp_packet_size(packet));
> +
> +        index = (uint64_t)((char *)elem - (char *)xsk->umem->buffer);
> +        xsk_ring_prod__tx_desc(&xsk->tx, idx + i)->addr = index;
> +        xsk_ring_prod__tx_desc(&xsk->tx, idx + i)->len
> +            = dp_packet_size(packet);
> +
> +        if (packet->source == DPBUF_AFXDP) {
> +            xpacket = dp_packet_cast_afxdp(packet);
> +            umem_elem_push(xpacket->mpool, dp_packet_base(packet));
> +             /* Avoid freeing it twice at dp_packet_uninit */
> +            xpacket->mpool = NULL;

Why you're freeing packets here? 'netdev_linux_send' will do that for you.

> +        }
> +    }
> +    xsk_ring_prod__submit(&xsk->tx, batch->count);
> +    xsk->outstanding_tx += batch->count;
> +
> +retry:
> +    kick_tx(xsk);
> +
> +    /* Process CQ */

Maybe it's better to process CQ on rx ?
It's unknown when we'll be here next time, but we'll definitely
call rx function soon.

> +    tx_done = xsk_ring_cons__peek(&xsk->umem->cq, batch->count, &idx_cq);
> +    if (tx_done > 0) {
> +        xsk->outstanding_tx -= tx_done;
> +        xsk->tx_npkts += tx_done;
> +    }
> +
> +    /* Recycle back to umem pool */
> +    for (j = 0; j < tx_done; j++) {
> +        struct umem_elem *elem;
> +        uint64_t addr;
> +
> +        addr = *xsk_ring_cons__comp_addr(&xsk->umem->cq, idx_cq++);
> +
> +        elem = ALIGNED_CAST(struct umem_elem *,
> +                            (char *)xsk->umem->buffer + addr);
> +        umem_elem_push(&xsk->umem->mpool, elem);
> +    }
> +    xsk_ring_cons__release(&xsk->umem->cq, tx_done);
> +
> +    if (xsk->outstanding_tx > PROD_NUM_DESCS - (PROD_NUM_DESCS >> 2)) {
> +        /* If there are still a lot not transmitted,
> +         * try harder.
> +         */
> +        goto retry;
> +    }
> +
> +    return 0;
> +}
> +
> +#else
> +#include "openvswitch/compiler.h"
> +#include "netdev-afxdp.h"
> +
> +struct xsk_socket_info *
> +xsk_configure(int ifindex OVS_UNUSED, int xdp_queue_id OVS_UNUSED)
> +{
> +    return NULL;
> +}
> +
> +void
> +xsk_destroy(struct xsk_socket_info *xsk OVS_UNUSED)
> +{
> +}
> +
> +int
> +netdev_linux_rxq_xsk(struct xsk_socket_info *xsk OVS_UNUSED,
> +                     struct dp_packet_batch *batch OVS_UNUSED)
> +{
> +    return 0;
> +}
> +
> +int
> +netdev_linux_afxdp_batch_send(struct xsk_socket_info *xsk OVS_UNUSED,
> +                              struct dp_packet_batch *batch OVS_UNUSED)
> +{
> +    return 0;
> +}
> +
> +int
> +netdev_afxdp_set_config(struct netdev *netdev OVS_UNUSED,
> +                        const struct smap *args OVS_UNUSED,
> +                        char **errp OVS_UNUSED)
> +{
> +    return 0;
> +}
> +
> +int
> +netdev_afxdp_get_config(const struct netdev *netdev OVS_UNUSED,
> +                        struct smap *args OVS_UNUSED)
> +{
> +    return 0;
> +}
> +
> +int
> +netdev_afxdp_get_numa_id(const struct netdev *netdev OVS_UNUSED)
> +{
> +    return 0;
> +}
> +#endif
> diff --git a/lib/netdev-afxdp.h b/lib/netdev-afxdp.h
> new file mode 100644
> index 000000000000..ea05612a7c0f
> --- /dev/null
> +++ b/lib/netdev-afxdp.h
> @@ -0,0 +1,47 @@
> +/*
> + * Copyright (c) 2018 Nicira, Inc.
> + *
> + * Licensed under the Apache License, Version 2.0 (the "License");
> + * you may not use this file except in compliance with the License.
> + * You may obtain a copy of the License at:
> + *
> + *     http://www.apache.org/licenses/LICENSE-2.0
> + *
> + * Unless required by applicable law or agreed to in writing, software
> + * distributed under the License is distributed on an "AS IS" BASIS,
> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> + * See the License for the specific language governing permissions and
> + * limitations under the License.
> + */
> +
> +#ifndef NETDEV_AFXDP_H
> +#define NETDEV_AFXDP_H 1
> +
> +#include <stdint.h>
> +#include <stdbool.h>
> +
> +/* These functions are Linux AF_XDP specific, so they should be used directly
> + * only by Linux-specific code. */
> +#define MAX_XSKQ 16
> +struct netdev;
> +struct xsk_socket_info;
> +struct xdp_umem;
> +struct dp_packet_batch;
> +struct smap;
> +
> +struct xsk_socket_info *xsk_configure(int ifindex, int xdp_queue_id);
> +void xsk_destroy(struct xsk_socket_info *xsk);
> +
> +int netdev_linux_rxq_xsk(struct xsk_socket_info *xsk,
> +                         struct dp_packet_batch *batch);
> +
> +int netdev_linux_afxdp_batch_send(struct xsk_socket_info *xsk,
> +                                  struct dp_packet_batch *batch);
> +
> +void xsk_remove_xdp_program(uint32_t ifindex);
> +int netdev_afxdp_set_config(struct netdev *netdev, const struct smap *args,
> +                            char **errp);
> +int netdev_afxdp_get_config(const struct netdev *netdev, struct smap *args);
> +int netdev_afxdp_get_numa_id(const struct netdev *netdev);
> +
> +#endif /* netdev-afxdp.h */
> diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
> index f75d73fd39f8..337760ca3333 100644
> --- a/lib/netdev-linux.c
> +++ b/lib/netdev-linux.c
> @@ -75,6 +75,7 @@
>  #include "unaligned.h"
>  #include "openvswitch/vlog.h"
>  #include "util.h"
> +#include "netdev-afxdp.h"
>  
>  VLOG_DEFINE_THIS_MODULE(netdev_linux);
>  
> @@ -531,6 +532,7 @@ struct netdev_linux {
>  
>      /* LAG information. */
>      bool is_lag_master;         /* True if the netdev is a LAG master. */
> +    struct xsk_socket_info *xsk[MAX_XSKQ]; /* af_xdp socket */
>  };
>  
>  struct netdev_rxq_linux {
> @@ -580,12 +582,18 @@ is_netdev_linux_class(const struct netdev_class *netdev_class)
>  }
>  
>  static bool
> +is_afxdp_netdev(const struct netdev *netdev)
> +{
> +    return netdev_get_class(netdev) == &netdev_afxdp_class;
> +}
> +
> +static bool
>  is_tap_netdev(const struct netdev *netdev)
>  {
>      return netdev_get_class(netdev) == &netdev_tap_class;
>  }
>  
> -static struct netdev_linux *
> +struct netdev_linux *
>  netdev_linux_cast(const struct netdev *netdev)
>  {
>      ovs_assert(is_netdev_linux_class(netdev_get_class(netdev)));
> @@ -1084,6 +1092,25 @@ netdev_linux_destruct(struct netdev *netdev_)
>          atomic_count_dec(&miimon_cnt);
>      }
>  
> +#if HAVE_AF_XDP
> +    if (is_afxdp_netdev(netdev_)) {
> +        int ifindex;
> +        int ret, i;
> +
> +        ret = get_ifindex(netdev_, &ifindex);
> +        if (ret) {
> +            VLOG_ERR("get ifindex error");
> +        } else {
> +            for (i = 0; i < MAX_XSKQ; i++) {
> +                if (netdev->xsk[i]) {
> +                    VLOG_INFO("destroy xsk[%d]", i);
> +                    xsk_destroy(netdev->xsk[i]);
> +                }
> +            }
> +            xsk_remove_xdp_program(ifindex);
> +        }
> +    }
> +#endif
>      ovs_mutex_destroy(&netdev->mutex);
>  }
>  
> @@ -1113,6 +1140,32 @@ netdev_linux_rxq_construct(struct netdev_rxq *rxq_)
>      rx->is_tap = is_tap_netdev(netdev_);
>      if (rx->is_tap) {
>          rx->fd = netdev->tap_fd;
> +    } else if (is_afxdp_netdev(netdev_)) {
> +#if HAVE_AF_XDP
> +        struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
> +        int ifindex;
> +        int xdp_queue_id = rxq_->queue_id;
> +        struct xsk_socket_info *xsk;
> +
> +        if (setrlimit(RLIMIT_MEMLOCK, &r)) {
> +            VLOG_ERR("ERROR: setrlimit(RLIMIT_MEMLOCK) \"%s\"\n",
> +                      ovs_strerror(errno));
> +            ovs_assert(0);
> +        }
> +
> +        VLOG_DBG("%s: %s: queue=%d configuring xdp sock",
> +                  __func__, netdev_->name, xdp_queue_id);
> +
> +        /* Get ethernet device index. */
> +        error = get_ifindex(&netdev->up, &ifindex);
> +        if (error) {
> +            goto error;
> +        }
> +
> +        xsk = xsk_configure(ifindex, xdp_queue_id);
> +        netdev->xsk[xdp_queue_id] = xsk;
> +        rx->fd = xsk_socket__fd(xsk->xsk); /* for netdev layer to poll */
> +#endif
>      } else {
>          struct sockaddr_ll sll;
>          int ifindex, val;
> @@ -1318,9 +1371,16 @@ netdev_linux_rxq_recv(struct netdev_rxq *rxq_, struct dp_packet_batch *batch,
>  {
>      struct netdev_rxq_linux *rx = netdev_rxq_linux_cast(rxq_);
>      struct netdev *netdev = rx->up.netdev;
> -    struct dp_packet *buffer;
> +    struct dp_packet *buffer = NULL;
>      ssize_t retval;
>      int mtu;
> +    struct netdev_linux *netdev_ = netdev_linux_cast(netdev);
> +
> +    if (is_afxdp_netdev(netdev)) {
> +        int qid = rxq_->queue_id;
> +
> +        return netdev_linux_rxq_xsk(netdev_->xsk[qid], batch);
> +    }
>  
>      if (netdev_linux_get_mtu__(netdev_linux_cast(netdev), &mtu)) {
>          mtu = ETH_PAYLOAD_MAX;
> @@ -1329,6 +1389,7 @@ netdev_linux_rxq_recv(struct netdev_rxq *rxq_, struct dp_packet_batch *batch,
>      /* Assume Ethernet port. No need to set packet_type. */
>      buffer = dp_packet_new_with_headroom(VLAN_ETH_HEADER_LEN + mtu,
>                                             DP_NETDEV_HEADROOM);
> +
>      retval = (rx->is_tap
>                ? netdev_linux_rxq_recv_tap(rx->fd, buffer)
>                : netdev_linux_rxq_recv_sock(rx->fd, buffer));
> @@ -1473,14 +1534,15 @@ netdev_linux_tap_batch_send(struct netdev *netdev_,
>   * The kernel maintains a packet transmission queue, so the caller is not
>   * expected to do additional queuing of packets. */
>  static int
> -netdev_linux_send(struct netdev *netdev_, int qid OVS_UNUSED,
> +netdev_linux_send(struct netdev *netdev_, int qid,
>                    struct dp_packet_batch *batch,
>                    bool concurrent_txq OVS_UNUSED)
>  {
>      int error = 0;
>      int sock = 0;
>  
> -    if (!is_tap_netdev(netdev_)) {
> +    if (!is_tap_netdev(netdev_) &&
> +        !is_afxdp_netdev(netdev_)) {
>          if (netdev_linux_netnsid_is_remote(netdev_linux_cast(netdev_))) {
>              error = EOPNOTSUPP;
>              goto free_batch;
> @@ -1499,6 +1561,10 @@ netdev_linux_send(struct netdev *netdev_, int qid OVS_UNUSED,
>          }
>  
>          error = netdev_linux_sock_batch_send(sock, ifindex, batch);
> +    } else if (is_afxdp_netdev(netdev_)) {
> +        struct netdev_linux *netdev = netdev_linux_cast(netdev_);
> +
> +        error = netdev_linux_afxdp_batch_send(netdev->xsk[qid], batch);
>      } else {
>          error = netdev_linux_tap_batch_send(netdev_, batch);
>      }
> @@ -3323,6 +3389,7 @@ const struct netdev_class netdev_linux_class = {
>      NETDEV_LINUX_CLASS_COMMON,
>      LINUX_FLOW_OFFLOAD_API,
>      .type = "system",
> +    .is_pmd = false,
>      .construct = netdev_linux_construct,
>      .get_stats = netdev_linux_get_stats,
>      .get_features = netdev_linux_get_features,
> @@ -3333,6 +3400,7 @@ const struct netdev_class netdev_linux_class = {
>  const struct netdev_class netdev_tap_class = {
>      NETDEV_LINUX_CLASS_COMMON,
>      .type = "tap",
> +    .is_pmd = false,
>      .construct = netdev_linux_construct_tap,
>      .get_stats = netdev_tap_get_stats,
>      .get_features = netdev_linux_get_features,
> @@ -3343,10 +3411,23 @@ const struct netdev_class netdev_internal_class = {
>      NETDEV_LINUX_CLASS_COMMON,
>      LINUX_FLOW_OFFLOAD_API,
>      .type = "internal",
> +    .is_pmd = false,
>      .construct = netdev_linux_construct,
>      .get_stats = netdev_internal_get_stats,
>      .get_status = netdev_internal_get_status,
>  };
> +
> +const struct netdev_class netdev_afxdp_class = {
> +    NETDEV_LINUX_CLASS_COMMON,
> +    .type = "afxdp",
> +    .is_pmd = true,
> +    .construct = netdev_linux_construct,
> +    .get_stats = netdev_linux_get_stats,
> +    .get_status = netdev_linux_get_status,
> +    .set_config = netdev_afxdp_set_config,
> +    .get_config = netdev_afxdp_get_config,
> +    .get_numa_id = netdev_afxdp_get_numa_id,
> +};
>  
>  
>  #define CODEL_N_QUEUES 0x0000
> diff --git a/lib/netdev-linux.h b/lib/netdev-linux.h
> index 17ca9120168a..afcb20ee8d0a 100644
> --- a/lib/netdev-linux.h
> +++ b/lib/netdev-linux.h
> @@ -28,6 +28,7 @@ struct netdev;
>  int netdev_linux_ethtool_set_flag(struct netdev *netdev, uint32_t flag,
>                                    const char *flag_name, bool enable);
>  int linux_get_ifindex(const char *netdev_name);
> +struct netdev_linux *netdev_linux_cast(const struct netdev *netdev);
>  
>  #define LINUX_FLOW_OFFLOAD_API                          \
>     .flow_flush = netdev_tc_flow_flush,                  \
> diff --git a/lib/netdev-provider.h b/lib/netdev-provider.h
> index fb0c27e6e8e8..5bf041316503 100644
> --- a/lib/netdev-provider.h
> +++ b/lib/netdev-provider.h
> @@ -902,6 +902,7 @@ extern const struct netdev_class netdev_linux_class;
>  #endif
>  extern const struct netdev_class netdev_internal_class;
>  extern const struct netdev_class netdev_tap_class;
> +extern const struct netdev_class netdev_afxdp_class;
>  
>  #ifdef  __cplusplus
>  }
> diff --git a/lib/netdev.c b/lib/netdev.c
> index 7d7ecf6f0946..c30016b34033 100644
> --- a/lib/netdev.c
> +++ b/lib/netdev.c
> @@ -145,6 +145,7 @@ netdev_initialize(void)
>          netdev_register_provider(&netdev_linux_class);
>          netdev_register_provider(&netdev_internal_class);
>          netdev_register_provider(&netdev_tap_class);
> +        netdev_register_provider(&netdev_afxdp_class);
>          netdev_vport_tunnel_register();
>  #endif
>  #if defined(__FreeBSD__) || defined(__NetBSD__)
> diff --git a/lib/xdpsock.c b/lib/xdpsock.c
> new file mode 100644
> index 000000000000..f9fe94b9e36a
> --- /dev/null
> +++ b/lib/xdpsock.c
> @@ -0,0 +1,210 @@
> +/*
> + * Copyright (c) 2018, 2019 Nicira, Inc.
> + *
> + * Licensed under the Apache License, Version 2.0 (the "License");
> + * you may not use this file except in compliance with the License.
> + * You may obtain a copy of the License at:
> + *
> + *     http://www.apache.org/licenses/LICENSE-2.0
> + *
> + * Unless required by applicable law or agreed to in writing, software
> + * distributed under the License is distributed on an "AS IS" BASIS,
> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> + * See the License for the specific language governing permissions and
> + * limitations under the License.
> + */
> +#include <config.h>
> +#include <ctype.h>
> +#include <errno.h>
> +#include <fcntl.h>
> +#include <stdarg.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <sys/stat.h>
> +#include <sys/types.h>
> +#include <syslog.h>
> +#include <time.h>
> +#include <unistd.h>
> +#include "openvswitch/vlog.h"
> +#include "async-append.h"
> +#include "coverage.h"
> +#include "dirs.h"
> +#include "ovs-thread.h"
> +#include "sat-math.h"
> +#include "socket-util.h"
> +#include "svec.h"
> +#include "syslog-direct.h"
> +#include "syslog-libc.h"
> +#include "syslog-provider.h"
> +#include "timeval.h"
> +#include "unixctl.h"
> +#include "util.h"
> +#include "ovs-atomic.h"
> +#include "openvswitch/compiler.h"
> +#include "dp-packet.h"
> +
> +#ifdef HAVE_AF_XDP
> +#include "xdpsock.h"
> +
> +static inline void ovs_spinlock_init(ovs_spinlock_t *sl)
> +{
> +    sl->locked = 0;
> +}
> +
> +static inline void ovs_spin_lock(ovs_spinlock_t *sl)
> +{
> +    int exp = 0;
> +
> +    while (!__atomic_compare_exchange_n(&sl->locked, &exp, 1, 0,
> +                __ATOMIC_ACQUIRE, __ATOMIC_RELAXED)) {
> +        while (__atomic_load_n(&sl->locked, __ATOMIC_RELAXED)) {


These atomics are compiler specific. Please use:

    while (!atomic_compare_exchange_strong_explicit(&sl->locked, &exp, 1,
                                                    memory_order_acquire,
                                                    memory_order_relaxed)) {
        locked = 1;
        while (locked) {
            atomic_read_relaxed(&sl->locked, &locked);
        }
        exp = 0;
    }

> +            ;
> +        }
> +        exp = 0;
> +    }
> +}
> +
> +static inline void ovs_spin_unlock(ovs_spinlock_t *sl)
> +{
> +    __atomic_store_n(&sl->locked, 0, __ATOMIC_RELEASE);

    atomic_store_explicit(&sl->locked, 0, memory_order_release);

> +}
> +
> +static inline int OVS_UNUSED ovs_spin_trylock(ovs_spinlock_t *sl)
> +{
> +    int exp = 0;
> +    return __atomic_compare_exchange_n(&sl->locked, &exp, 1,
> +              0, /* disallow spurious failure */
> +               __ATOMIC_ACQUIRE, __ATOMIC_RELAXED);


    return atomic_compare_exchange_strong_explicit(&sl->locked, &exp, 1,
                                                   memory_order_acquire,
                                                   memory_order_relaxed);


> +}
> +
> +void
> +__umem_elem_push_n(struct umem_pool *umemp OVS_UNUSED, void **addrs, int n)
> +{
> +    void *ptr;
> +
> +    if (OVS_UNLIKELY(umemp->index + n > umemp->size)) {
> +        OVS_NOT_REACHED();
> +    }
> +
> +    ptr = &umemp->array[umemp->index];
> +    memcpy(ptr, addrs, n * sizeof(void *));
> +    umemp->index += n;
> +}
> +
> +inline void
> +__umem_elem_push(struct umem_pool *umemp OVS_UNUSED, void *addr)
> +{
> +    umemp->array[umemp->index++] = addr;
> +}
> +
> +void
> +umem_elem_push(struct umem_pool *umemp OVS_UNUSED, void *addr)
> +{
> +
> +    if (OVS_UNLIKELY(umemp->index >= umemp->size)) {
> +        /* stack is full */
> +        /* it's possible that one umem gets pushed twice,
> +         * because actions=1,2,3... multiple ports?
> +        */
> +        OVS_NOT_REACHED();
> +    }
> +
> +    ovs_assert(((uint64_t)addr & FRAME_SHIFT_MASK) == 0);
> +
> +    ovs_spin_lock(&umemp->mutex);
> +    __umem_elem_push(umemp, addr);
> +    ovs_spin_unlock(&umemp->mutex);
> +}
> +
> +void
> +__umem_elem_pop_n(struct umem_pool *umemp OVS_UNUSED, void **addrs, int n)
> +{
> +    void *ptr;
> +
> +    umemp->index -= n;
> +
> +    if (OVS_UNLIKELY(umemp->index < 0)) {
> +        OVS_NOT_REACHED();
> +    }
> +
> +    ptr = &umemp->array[umemp->index];
> +    memcpy(addrs, ptr, n * sizeof(void *));
> +}
> +
> +inline void *
> +__umem_elem_pop(struct umem_pool *umemp OVS_UNUSED)
> +{
> +    return umemp->array[--umemp->index];
> +}
> +
> +void *
> +umem_elem_pop(struct umem_pool *umemp OVS_UNUSED)
> +{
> +    void *ptr;
> +
> +    ovs_spin_lock(&umemp->mutex);
> +    ptr = __umem_elem_pop(umemp);
> +    ovs_spin_unlock(&umemp->mutex);
> +
> +    return ptr;
> +}
> +
> +void **
> +__umem_pool_alloc(unsigned int size)
> +{
> +    void *bufs;
> +
> +    ovs_assert(posix_memalign(&bufs, getpagesize(),
> +                              size * sizeof(void *)) == 0);
> +    memset(bufs, 0, size * sizeof(void *));
> +    return (void **)bufs;
> +}
> +
> +unsigned int
> +umem_elem_count(struct umem_pool *mpool)
> +{
> +    return mpool->index;
> +}
> +
> +int
> +umem_pool_init(struct umem_pool *umemp OVS_UNUSED, unsigned int size)
> +{
> +    umemp->array = __umem_pool_alloc(size);
> +    if (!umemp->array) {
> +        OVS_NOT_REACHED();
> +    }
> +
> +    umemp->size = size;
> +    umemp->index = 0;
> +    ovs_spinlock_init(&umemp->mutex);
> +    return 0;
> +}
> +
> +void
> +umem_pool_cleanup(struct umem_pool *umemp OVS_UNUSED)
> +{
> +    free(umemp->array);
> +}
> +
> +/* AF_XDP metadata init/destroy */
> +int
> +xpacket_pool_init(struct xpacket_pool *xp, unsigned int size)
> +{
> +    void *bufs;
> +
> +    ovs_assert(posix_memalign(&bufs, getpagesize(),
> +                              size * sizeof(struct dp_packet_afxdp)) == 0);
> +    memset(bufs, 0, size * sizeof(struct dp_packet_afxdp));
> +
> +    xp->array = bufs;
> +    xp->size = size;
> +    return 0;
> +}
> +
> +void
> +xpacket_pool_cleanup(struct xpacket_pool *xp)
> +{
> +    free(xp->array);
> +}
> +#else   /* !HAVE_AF_XDP below */
> +#endif
> diff --git a/lib/xdpsock.h b/lib/xdpsock.h
> new file mode 100644
> index 000000000000..cb64befe7dba
> --- /dev/null
> +++ b/lib/xdpsock.h
> @@ -0,0 +1,133 @@
> +/*
> + * Copyright (c) 2018, 2019 Nicira, Inc.
> + *
> + * Licensed under the Apache License, Version 2.0 (the "License");
> + * you may not use this file except in compliance with the License.
> + * You may obtain a copy of the License at:
> + *
> + *     http://www.apache.org/licenses/LICENSE-2.0
> + *
> + * Unless required by applicable law or agreed to in writing, software
> + * distributed under the License is distributed on an "AS IS" BASIS,
> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> + * See the License for the specific language governing permissions and
> + * limitations under the License.
> + */
> +#ifndef XDPSOCK_H
> +#define XDPSOCK_H 1
> +#include <errno.h>
> +#include <getopt.h>
> +#include <libgen.h>
> +#include <linux/bpf.h>
> +#include <linux/if_link.h>
> +#include <linux/if_xdp.h>
> +#include <linux/if_ether.h>
> +#include <net/if.h>
> +#include <signal.h>
> +#include <stdbool.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <net/ethernet.h>
> +#include <sys/resource.h>
> +#include <sys/socket.h>
> +#include <sys/mman.h>
> +#include <time.h>
> +#include <unistd.h>
> +#include <pthread.h>
> +#include <locale.h>
> +#include <sys/types.h>
> +#include <poll.h>
> +#include <bpf/libbpf.h>
> +
> +#include "ovs-atomic.h"
> +#include "openvswitch/thread.h"
> +
> +/* bpf/xsk.h uses the following macros not defined in OVS,
> + * so re-define them before include.
> + */
> +#define unlikely OVS_UNLIKELY
> +#define likely OVS_LIKELY
> +#define barrier() __asm__ __volatile__("": : :"memory")
> +#define smp_rmb() barrier()
> +#define smp_wmb() barrier()

These barriers also x86 specific. We'll need to fix that in
the future before removing build constraints.

> +#include <bpf/xsk.h>
> +
> +#define FRAME_HEADROOM  XDP_PACKET_HEADROOM
> +#define FRAME_SIZE      XSK_UMEM__DEFAULT_FRAME_SIZE
> +#define BATCH_SIZE      NETDEV_MAX_BURST
> +#define FRAME_SHIFT     XSK_UMEM__DEFAULT_FRAME_SHIFT
> +#define FRAME_SHIFT_MASK    ((1<<FRAME_SHIFT)-1)
> +
> +#define NUM_FRAMES  1024
> +#define PROD_NUM_DESCS 128
> +#define CONS_NUM_DESCS 128
> +
> +#ifdef USE_XSK_DEFAULT
> +#define PROD_NUM_DESCS XSK_RING_PROD__DEFAULT_NUM_DESCS
> +#define CONS_NUM_DESCS XSK_RING_CONS__DEFAULT_NUM_DESCS
> +#endif
> +
> +typedef struct {
> +    volatile int locked;

atomic_int locked;

or atomic_bool.

> +} ovs_spinlock_t;
> +
> +/* LIFO ptr_array */
> +struct umem_pool {
> +    int index;      /* point to top */
> +    unsigned int size;
> +    ovs_spinlock_t mutex;
> +    void **array;   /* a pointer array */
> +};
> +
> +/* array-based dp_packet_afxdp */
> +struct xpacket_pool {
> +    unsigned int size;
> +    struct dp_packet_afxdp **array;
> +};
> +
> +struct xsk_umem_info {
> +    struct umem_pool mpool;
> +    struct xpacket_pool xpool;
> +    struct xsk_ring_prod fq;
> +    struct xsk_ring_cons cq;
> +    struct xsk_umem *umem;
> +    void *buffer;
> +};
> +
> +struct xsk_socket_info {
> +    struct xsk_ring_cons rx;
> +    struct xsk_ring_prod tx;
> +    struct xsk_umem_info *umem;
> +    struct xsk_socket *xsk;
> +    unsigned long rx_npkts;
> +    unsigned long tx_npkts;
> +    unsigned long prev_rx_npkts;
> +    unsigned long prev_tx_npkts;
> +    uint32_t outstanding_tx;
> +};
> +
> +struct umem_elem_head {
> +    unsigned int index;
> +    struct ovs_mutex mutex;
> +    uint32_t n;
> +};
> +
> +struct umem_elem {
> +    struct umem_elem *next;
> +};
> +
> +void __umem_elem_push(struct umem_pool *umemp, void *addr);
> +void umem_elem_push(struct umem_pool *umemp, void *addr);
> +void *__umem_elem_pop(struct umem_pool *umemp);
> +void *umem_elem_pop(struct umem_pool *umemp);
> +void **__umem_pool_alloc(unsigned int size);
> +int umem_pool_init(struct umem_pool *umemp, unsigned int size);
> +void umem_pool_cleanup(struct umem_pool *umemp);
> +unsigned int umem_elem_count(struct umem_pool *mpool);
> +void __umem_elem_pop_n(struct umem_pool *umemp, void **addrs, int n);
> +void __umem_elem_push_n(struct umem_pool *umemp, void **addrs, int n);
> +int xpacket_pool_init(struct xpacket_pool *xp, unsigned int size);
> +void xpacket_pool_cleanup(struct xpacket_pool *xp);
> +
> +#endif
> diff --git a/tests/automake.mk b/tests/automake.mk
> index ea16532dd2a0..715cef9a6b3b 100644
> --- a/tests/automake.mk
> +++ b/tests/automake.mk
> @@ -4,12 +4,14 @@ EXTRA_DIST += \
>  	$(SYSTEM_TESTSUITE_AT) \
>  	$(SYSTEM_KMOD_TESTSUITE_AT) \
>  	$(SYSTEM_USERSPACE_TESTSUITE_AT) \
> +	$(SYSTEM_AFXDP_TESTSUITE_AT) \
>  	$(SYSTEM_OFFLOADS_TESTSUITE_AT) \
>  	$(SYSTEM_DPDK_TESTSUITE_AT) \
>  	$(OVSDB_CLUSTER_TESTSUITE_AT) \
>  	$(TESTSUITE) \
>  	$(SYSTEM_KMOD_TESTSUITE) \
>  	$(SYSTEM_USERSPACE_TESTSUITE) \
> +	$(SYSTEM_AFXDP_TESTSUITE) \
>  	$(SYSTEM_OFFLOADS_TESTSUITE) \
>  	$(SYSTEM_DPDK_TESTSUITE) \
>  	$(OVSDB_CLUSTER_TESTSUITE) \
> @@ -158,6 +160,11 @@ SYSTEM_USERSPACE_TESTSUITE_AT = \
>  	tests/system-userspace-macros.at \
>  	tests/system-userspace-packet-type-aware.at
>  
> +SYSTEM_AFXDP_TESTSUITE_AT = \
> +	tests/system-afxdp-testsuite.at \
> +	tests/system-afxdp-traffic.at \
> +	tests/system-afxdp-macros.at
> +
>  SYSTEM_TESTSUITE_AT = \
>  	tests/system-common-macros.at \
>  	tests/system-ovn.at \
> @@ -182,6 +189,7 @@ TESTSUITE = $(srcdir)/tests/testsuite
>  TESTSUITE_PATCH = $(srcdir)/tests/testsuite.patch
>  SYSTEM_KMOD_TESTSUITE = $(srcdir)/tests/system-kmod-testsuite
>  SYSTEM_USERSPACE_TESTSUITE = $(srcdir)/tests/system-userspace-testsuite
> +SYSTEM_AFXDP_TESTSUITE = $(srcdir)/tests/system-afxdp-testsuite
>  SYSTEM_OFFLOADS_TESTSUITE = $(srcdir)/tests/system-offloads-testsuite
>  SYSTEM_DPDK_TESTSUITE = $(srcdir)/tests/system-dpdk-testsuite
>  OVSDB_CLUSTER_TESTSUITE = $(srcdir)/tests/ovsdb-cluster-testsuite
> @@ -315,6 +323,11 @@ check-system-userspace: all
>  	set $(SHELL) '$(SYSTEM_USERSPACE_TESTSUITE)' -C tests  AUTOTEST_PATH='$(AUTOTEST_PATH)'; \
>  	"$$@" $(TESTSUITEFLAGS) -j1 || (test X'$(RECHECK)' = Xyes && "$$@" --recheck)
>  
> +check-afxdp: all
> +	$(MAKE) install
> +	set $(SHELL) '$(SYSTEM_AFXDP_TESTSUITE)' -C tests  AUTOTEST_PATH='$(AUTOTEST_PATH)' $(TESTSUITEFLAGS) -j1; \
> +	"$$@" || (test X'$(RECHECK)' = Xyes && "$$@" --recheck)
> +
>  check-offloads: all
>  	set $(SHELL) '$(SYSTEM_OFFLOADS_TESTSUITE)' -C tests  AUTOTEST_PATH='$(AUTOTEST_PATH)'; \
>  	"$$@" $(TESTSUITEFLAGS) -j1 || (test X'$(RECHECK)' = Xyes && "$$@" --recheck)
> @@ -352,6 +365,10 @@ $(SYSTEM_USERSPACE_TESTSUITE): package.m4 $(SYSTEM_TESTSUITE_AT) $(SYSTEM_USERSP
>  	$(AM_V_GEN)$(AUTOTEST) -I '$(srcdir)' -o $@.tmp $@.at
>  	$(AM_V_at)mv $@.tmp $@
>  
> +$(SYSTEM_AFXDP_TESTSUITE): package.m4 $(SYSTEM_TESTSUITE_AT) $(SYSTEM_AFXDP_TESTSUITE_AT) $(COMMON_MACROS_AT)
> +	$(AM_V_GEN)$(AUTOTEST) -I '$(srcdir)' -o $@.tmp $@.at
> +	$(AM_V_at)mv $@.tmp $@
> +
>  $(SYSTEM_OFFLOADS_TESTSUITE): package.m4 $(SYSTEM_TESTSUITE_AT) $(SYSTEM_OFFLOADS_TESTSUITE_AT) $(COMMON_MACROS_AT)
>  	$(AM_V_GEN)$(AUTOTEST) -I '$(srcdir)' -o $@.tmp $@.at
>  	$(AM_V_at)mv $@.tmp $@
> diff --git a/tests/system-afxdp-macros.at b/tests/system-afxdp-macros.at
> new file mode 100644
> index 000000000000..2c58c2d6554b
> --- /dev/null
> +++ b/tests/system-afxdp-macros.at
> @@ -0,0 +1,153 @@
> +# _ADD_BR([name])
> +#
> +# Expands into the proper ovs-vsctl commands to create a bridge with the
> +# appropriate type and properties
> +m4_define([_ADD_BR], [[add-br $1 -- set Bridge $1 datapath_type=netdev protocols=OpenFlow10,OpenFlow11,OpenFlow12,OpenFlow13,OpenFlow14,OpenFlow15 fail-mode=secure ]])
> +
> +# OVS_TRAFFIC_VSWITCHD_START([vsctl-args], [vsctl-output], [=override])
> +#
> +# Creates a database and starts ovsdb-server, starts ovs-vswitchd
> +# connected to that database, calls ovs-vsctl to create a bridge named
> +# br0 with predictable settings, passing 'vsctl-args' as additional
> +# commands to ovs-vsctl.  If 'vsctl-args' causes ovs-vsctl to provide
> +# output (e.g. because it includes "create" commands) then 'vsctl-output'
> +# specifies the expected output after filtering through uuidfilt.
> +m4_define([OVS_TRAFFIC_VSWITCHD_START],
> +  [
> +   export OVS_PKGDATADIR=$(`pwd`)
> +   _OVS_VSWITCHD_START([--disable-system])
> +   AT_CHECK([ovs-vsctl -- _ADD_BR([br0]) -- $1 m4_if([$2], [], [], [| uuidfilt])], [0], [$2])
> +])
> +
> +# OVS_TRAFFIC_VSWITCHD_STOP([WHITELIST], [extra_cmds])
> +#
> +# Gracefully stops ovs-vswitchd and ovsdb-server, checking their log files
> +# for messages with severity WARN or higher and signaling an error if any
> +# is present.  The optional WHITELIST may contain shell-quoted "sed"
> +# commands to delete any warnings that are actually expected, e.g.:
> +#
> +#   OVS_TRAFFIC_VSWITCHD_STOP(["/expected error/d"])
> +#
> +# 'extra_cmds' are shell commands to be executed afte OVS_VSWITCHD_STOP() is
> +# invoked. They can be used to perform additional cleanups such as name space
> +# removal.
> +m4_define([OVS_TRAFFIC_VSWITCHD_STOP],
> +  [OVS_VSWITCHD_STOP([dnl
> +$1";/netdev_linux.*obtaining netdev stats via vport failed/d
> +/dpif_netlink.*Generic Netlink family 'ovs_datapath' does not exist. The Open vSwitch kernel module is probably not loaded./d
> +/dpif_netdev(revalidator.*)|ERR|internal error parsing flow key/d
> +/dpif(revalidator.*)|WARN|netdev@ovs-netdev: failed to put/d
> +"])
> +   AT_CHECK([:; $2])
> +  ])
> +
> +m4_define([ADD_VETH_AFXDP],
> +    [ AT_CHECK([ip link add $1 type veth peer name ovs-$1 || return 77])
> +      CONFIGURE_AFXDP_VETH_OFFLOADS([$1])
> +      AT_CHECK([ip link set $1 netns $2])
> +      AT_CHECK([ip link set dev ovs-$1 up])
> +      AT_CHECK([ovs-vsctl add-port $3 ovs-$1 -- \
> +                set interface ovs-$1 external-ids:iface-id="$1" type="afxdp"])
> +      NS_CHECK_EXEC([$2], [ip addr add $4 dev $1 $7])
> +      NS_CHECK_EXEC([$2], [ip link set dev $1 up])
> +      if test -n "$5"; then
> +        NS_CHECK_EXEC([$2], [ip link set dev $1 address $5])
> +      fi
> +      if test -n "$6"; then
> +        NS_CHECK_EXEC([$2], [ip route add default via $6])
> +      fi
> +      on_exit 'ip link del ovs-$1'
> +    ]
> +)
> +
> +# CONFIGURE_AFXDP_VETH_OFFLOADS([VETH])
> +#
> +# Disable TX offloads and VLAN offloads for veths used in AF_XDP.
> +m4_define([CONFIGURE_AFXDP_VETH_OFFLOADS],
> +    [AT_CHECK([ethtool -K $1 tx off], [0], [ignore], [ignore])
> +     AT_CHECK([ethtool -K $1 rxvlan off], [0], [ignore], [ignore])
> +     AT_CHECK([ethtool -K $1 txvlan off], [0], [ignore], [ignore])
> +    ]
> +)
> +
> +# CONFIGURE_VETH_OFFLOADS([VETH])
> +#
> +# Disable TX offloads for veths.  The userspace datapath uses the AF_PACKET
> +# socket to receive packets for veths.  Unfortunately, the AF_PACKET socket
> +# doesn't play well with offloads:
> +# 1. GSO packets are received without segmentation and therefore discarded.
> +# 2. Packets with offloaded partial checksum are received with the wrong
> +#    checksum, therefore discarded by the receiver.
> +#
> +# By disabling tx offloads in the non-OVS side of the veth peer we make sure
> +# that the AF_PACKET socket will not receive bad packets.
> +#
> +# This is a workaround, and should be removed when offloads are properly
> +# supported in netdev-linux.
> +m4_define([CONFIGURE_VETH_OFFLOADS],
> +    [AT_CHECK([ethtool -K $1 tx off], [0], [ignore], [ignore])]
> +)
> +
> +# CHECK_CONNTRACK()
> +#
> +# Perform requirements checks for running conntrack tests.
> +#
> +m4_define([CHECK_CONNTRACK],
> +    [AT_SKIP_IF([test $HAVE_PYTHON = no])]
> +)
> +
> +# CHECK_CONNTRACK_ALG()
> +#
> +# Perform requirements checks for running conntrack ALG tests. The userspace
> +# supports FTP and TFTP.
> +#
> +m4_define([CHECK_CONNTRACK_ALG])
> +
> +# CHECK_CONNTRACK_FRAG()
> +#
> +# Perform requirements checks for running conntrack fragmentations tests.
> +# The userspace doesn't support fragmentation yet, so skip the tests.
> +m4_define([CHECK_CONNTRACK_FRAG],
> +[
> +    AT_SKIP_IF([:])
> +])
> +
> +# CHECK_CONNTRACK_LOCAL_STACK()
> +#
> +# Perform requirements checks for running conntrack tests with local stack.
> +# While the kernel connection tracker automatically passes all the connection
> +# tracking state from an internal port to the OpenvSwitch kernel module, there
> +# is simply no way of doing that with the userspace, so skip the tests.
> +m4_define([CHECK_CONNTRACK_LOCAL_STACK],
> +[
> +    AT_SKIP_IF([:])
> +])
> +
> +# CHECK_CONNTRACK_NAT()
> +#
> +# Perform requirements checks for running conntrack NAT tests. The userspace
> +# datapath supports NAT.
> +#
> +m4_define([CHECK_CONNTRACK_NAT])
> +
> +# CHECK_CT_DPIF_FLUSH_BY_CT_TUPLE()
> +#
> +# Perform requirements checks for running ovs-dpctl flush-conntrack by
> +# conntrack 5-tuple test. The userspace datapath does not support
> +# this feature yet.
> +m4_define([CHECK_CT_DPIF_FLUSH_BY_CT_TUPLE],
> +[
> +    AT_SKIP_IF([:])
> +])
> +
> +# CHECK_CT_DPIF_SET_GET_MAXCONNS()
> +#
> +# Perform requirements checks for running ovs-dpctl ct-set-maxconns or
> +# ovs-dpctl ct-get-maxconns. The userspace datapath does support this feature.
> +m4_define([CHECK_CT_DPIF_SET_GET_MAXCONNS])
> +
> +# CHECK_CT_DPIF_GET_NCONNS()
> +#
> +# Perform requirements checks for running ovs-dpctl ct-get-nconns. The
> +# userspace datapath does support this feature.
> +m4_define([CHECK_CT_DPIF_GET_NCONNS])
> diff --git a/tests/system-afxdp-testsuite.at b/tests/system-afxdp-testsuite.at
> new file mode 100644
> index 000000000000..538c0d15d556
> --- /dev/null
> +++ b/tests/system-afxdp-testsuite.at
> @@ -0,0 +1,26 @@
> +AT_INIT
> +
> +AT_COPYRIGHT([Copyright (c) 2018 Nicira, Inc.
> +
> +Licensed under the Apache License, Version 2.0 (the "License");
> +you may not use this file except in compliance with the License.
> +You may obtain a copy of the License at:
> +
> +    http://www.apache.org/licenses/LICENSE-2.0
> +
> +Unless required by applicable law or agreed to in writing, software
> +distributed under the License is distributed on an "AS IS" BASIS,
> +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> +See the License for the specific language governing permissions and
> +limitations under the License.])
> +
> +m4_ifdef([AT_COLOR_TESTS], [AT_COLOR_TESTS])
> +
> +m4_include([tests/ovs-macros.at])
> +m4_include([tests/ovsdb-macros.at])
> +m4_include([tests/ofproto-macros.at])
> +m4_include([tests/system-afxdp-macros.at])
> +m4_include([tests/system-common-macros.at])
> +
> +m4_include([tests/system-afxdp-traffic.at])
> +m4_include([tests/system-ovn.at])
> diff --git a/tests/system-afxdp-traffic.at b/tests/system-afxdp-traffic.at
> new file mode 100644
> index 000000000000..26f72acf48ef
> --- /dev/null
> +++ b/tests/system-afxdp-traffic.at
> @@ -0,0 +1,978 @@
> +AT_BANNER([AF_XDP netdev datapath-sanity])
> +
> +AT_SETUP([datapath - ping between two ports])
> +OVS_TRAFFIC_VSWITCHD_START()
> +
> +ulimit -l unlimited
> +
> +ADD_NAMESPACES(at_ns0, at_ns1)
> +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
> +
> +ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24")
> +ADD_VETH_AFXDP(p1, at_ns1, br0, "10.1.1.2/24")
> +
> +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.2 | FORMAT_PING], [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +OVS_TRAFFIC_VSWITCHD_STOP
> +AT_CLEANUP
> +
> +AT_SETUP([datapath - ping between two ports on vlan])
> +OVS_TRAFFIC_VSWITCHD_START()
> +
> +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
> +
> +ADD_NAMESPACES(at_ns0, at_ns1)
> +
> +ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24")
> +ADD_VETH_AFXDP(p1, at_ns1, br0, "10.1.1.2/24")
> +
> +ADD_VLAN(p0, at_ns0, 100, "10.2.2.1/24")
> +ADD_VLAN(p1, at_ns1, 100, "10.2.2.2/24")
> +
> +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.2.2.2 | FORMAT_PING], [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +
> +OVS_TRAFFIC_VSWITCHD_STOP
> +AT_CLEANUP
> +
> +AT_SETUP([datapath - ping6 between two ports])
> +OVS_TRAFFIC_VSWITCHD_START()
> +
> +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
> +
> +ADD_NAMESPACES(at_ns0, at_ns1)
> +
> +ADD_VETH_AFXDP(p0, at_ns0, br0, "fc00::1/96")
> +ADD_VETH_AFXDP(p1, at_ns1, br0, "fc00::2/96")
> +
> +dnl Linux seems to take a little time to get its IPv6 stack in order. Without
> +dnl waiting, we get occasional failures due to the following error:
> +dnl "connect: Cannot assign requested address"
> +OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 1 fc00::2])
> +
> +NS_CHECK_EXEC([at_ns0], [ping6 -q -c 3 -i 0.3 -w 6 fc00::2 | FORMAT_PING], [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +
> +OVS_TRAFFIC_VSWITCHD_STOP
> +AT_CLEANUP
> +
> +AT_SETUP([datapath - ping6 between two ports on vlan])
> +OVS_TRAFFIC_VSWITCHD_START()
> +
> +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
> +
> +ADD_NAMESPACES(at_ns0, at_ns1)
> +
> +ADD_VETH_AFXDP(p0, at_ns0, br0, "fc00::1/96")
> +ADD_VETH_AFXDP(p1, at_ns1, br0, "fc00::2/96")
> +
> +ADD_VLAN(p0, at_ns0, 100, "fc00:1::1/96")
> +ADD_VLAN(p1, at_ns1, 100, "fc00:1::2/96")
> +
> +dnl Linux seems to take a little time to get its IPv6 stack in order. Without
> +dnl waiting, we get occasional failures due to the following error:
> +dnl "connect: Cannot assign requested address"
> +OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 1 fc00:1::2])
> +
> +NS_CHECK_EXEC([at_ns0], [ping6 -q -c 3 -i 0.3 -w 2 fc00:1::2 | FORMAT_PING], [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +NS_CHECK_EXEC([at_ns0], [ping6 -s 1600 -q -c 3 -i 0.3 -w 2 fc00:1::2 | FORMAT_PING], [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +NS_CHECK_EXEC([at_ns0], [ping6 -s 3200 -q -c 3 -i 0.3 -w 2 fc00:1::2 | FORMAT_PING], [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +
> +OVS_TRAFFIC_VSWITCHD_STOP
> +AT_CLEANUP
> +
> +AT_SETUP([datapath - ping over vxlan tunnel])
> +OVS_CHECK_VXLAN()
> +
> +OVS_TRAFFIC_VSWITCHD_START()
> +ADD_BR([br-underlay])
> +
> +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
> +AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
> +
> +ADD_NAMESPACES(at_ns0)
> +
> +dnl Set up underlay link from host into the namespace using veth pair.
> +ADD_VETH_AFXDP(p0, at_ns0, br-underlay, "172.31.1.1/24")
> +AT_CHECK([ip addr add dev br-underlay "172.31.1.100/24"])
> +AT_CHECK([ip link set dev br-underlay up])
> +
> +
> +dnl Set up tunnel endpoints on OVS outside the namespace and with a native
> +dnl linux device inside the namespace.
> +ADD_OVS_TUNNEL([vxlan], [br0], [at_vxlan0], [172.31.1.1], [10.1.1.100/24])
> +ADD_NATIVE_TUNNEL([vxlan], [at_vxlan1], [at_ns0], [172.31.1.100], [10.1.1.1/24],
> +                  [id 0 dstport 4789])
> +
> +AT_CHECK([ovs-appctl ovs/route/add 10.1.1.100/24 br0], [0], [OK
> +])
> +AT_CHECK([ovs-appctl ovs/route/add 172.31.1.92/24 br-underlay], [0], [OK
> +])
> +
> +dnl First, check the underlay
> +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 172.31.1.100 | FORMAT_PING], [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +
> +dnl Okay, now check the overlay with different packet sizes
> +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +NS_CHECK_EXEC([at_ns0], [ping -s 1600 -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +NS_CHECK_EXEC([at_ns0], [ping -s 3200 -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +
> +OVS_TRAFFIC_VSWITCHD_STOP
> +AT_CLEANUP
> +
> +AT_SETUP([datapath - ping over vxlan6 tunnel])
> +OVS_CHECK_VXLAN_UDP6ZEROCSUM()
> +
> +OVS_TRAFFIC_VSWITCHD_START()
> +ADD_BR([br-underlay])
> +
> +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
> +AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
> +
> +ADD_NAMESPACES(at_ns0)
> +
> +dnl Set up underlay link from host into the namespace using veth pair.
> +ADD_VETH_AFXDP(p0, at_ns0, br-underlay, "fc00::1/64", [], [], "nodad")
> +AT_CHECK([ip addr add dev br-underlay "fc00::100/64" nodad])
> +AT_CHECK([ip link set dev br-underlay up])
> +
> +dnl Set up tunnel endpoints on OVS outside the namespace and with a native
> +dnl linux device inside the namespace.
> +ADD_OVS_TUNNEL6([vxlan], [br0], [at_vxlan0], [fc00::1], [10.1.1.100/24])
> +ADD_NATIVE_TUNNEL6([vxlan], [at_vxlan1], [at_ns0], [fc00::100], [10.1.1.1/24],
> +                   [id 0 dstport 4789 udp6zerocsumtx udp6zerocsumrx])
> +
> +AT_CHECK([ovs-appctl ovs/route/add 10.1.1.100/24 br0], [0], [OK
> +])
> +AT_CHECK([ovs-appctl ovs/route/add fc00::100/64 br-underlay], [0], [OK
> +])
> +
> +OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 1 fc00::100])
> +
> +dnl First, check the underlay
> +NS_CHECK_EXEC([at_ns0], [ping6 -q -c 3 -i 0.3 -w 2 fc00::100 | FORMAT_PING], [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +
> +dnl Okay, now check the overlay with different packet sizes
> +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +NS_CHECK_EXEC([at_ns0], [ping -s 1600 -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +NS_CHECK_EXEC([at_ns0], [ping -s 3200 -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +
> +OVS_TRAFFIC_VSWITCHD_STOP
> +AT_CLEANUP
> +
> +AT_SETUP([datapath - ping over gre tunnel])
> +OVS_CHECK_GRE()
> +
> +OVS_TRAFFIC_VSWITCHD_START()
> +ADD_BR([br-underlay])
> +
> +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
> +AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
> +
> +ADD_NAMESPACES(at_ns0)
> +
> +dnl Set up underlay link from host into the namespace using veth pair.
> +ADD_VETH_AFXDP(p0, at_ns0, br-underlay, "172.31.1.1/24")
> +AT_CHECK([ip addr add dev br-underlay "172.31.1.100/24"])
> +AT_CHECK([ip link set dev br-underlay up])
> +
> +dnl Set up tunnel endpoints on OVS outside the namespace and with a native
> +dnl linux device inside the namespace.
> +ADD_OVS_TUNNEL([gre], [br0], [at_gre0], [172.31.1.1], [10.1.1.100/24])
> +ADD_NATIVE_TUNNEL([gretap], [ns_gre0], [at_ns0], [172.31.1.100], [10.1.1.1/24])
> +
> +AT_CHECK([ovs-appctl ovs/route/add 10.1.1.100/24 br0], [0], [OK
> +])
> +AT_CHECK([ovs-appctl ovs/route/add 172.31.1.92/24 br-underlay], [0], [OK
> +])
> +
> +dnl First, check the underlay
> +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 172.31.1.100 | FORMAT_PING], [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +
> +dnl Okay, now check the overlay with different packet sizes
> +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +NS_CHECK_EXEC([at_ns0], [ping -s 1600 -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +NS_CHECK_EXEC([at_ns0], [ping -s 3200 -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +
> +OVS_TRAFFIC_VSWITCHD_STOP
> +AT_CLEANUP
> +
> +AT_SETUP([datapath - ping over erspan v1 tunnel])
> +OVS_CHECK_GRE()
> +OVS_CHECK_ERSPAN()
> +
> +OVS_TRAFFIC_VSWITCHD_START()
> +ADD_BR([br-underlay])
> +
> +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
> +AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
> +
> +ADD_NAMESPACES(at_ns0)
> +
> +dnl Set up underlay link from host into the namespace using veth pair.
> +ADD_VETH_AFXDP(p0, at_ns0, br-underlay, "172.31.1.1/24")
> +AT_CHECK([ip addr add dev br-underlay "172.31.1.100/24"])
> +AT_CHECK([ip link set dev br-underlay up])
> +
> +dnl Set up tunnel endpoints on OVS outside the namespace and with a native
> +dnl linux device inside the namespace.
> +ADD_OVS_TUNNEL([erspan], [br0], [at_erspan0], [172.31.1.1], [10.1.1.100/24], [options:key=1 options:erspan_ver=1 options:erspan_idx=7])
> +ADD_NATIVE_TUNNEL([erspan], [ns_erspan0], [at_ns0], [172.31.1.100], [10.1.1.1/24], [seq key 1 erspan_ver 1 erspan 7])
> +
> +AT_CHECK([ovs-appctl ovs/route/add 10.1.1.100/24 br0], [0], [OK
> +])
> +AT_CHECK([ovs-appctl ovs/route/add 172.31.1.92/24 br-underlay], [0], [OK
> +])
> +
> +dnl First, check the underlay
> +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 172.31.1.100 | FORMAT_PING], [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +
> +dnl Okay, now check the overlay with different packet sizes
> +dnl NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl
> +NS_CHECK_EXEC([at_ns0], [ping -s 1200 -i 0.3 -c 3 10.1.1.100 | FORMAT_PING], [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +OVS_TRAFFIC_VSWITCHD_STOP
> +AT_CLEANUP
> +
> +AT_SETUP([datapath - ping over erspan v2 tunnel])
> +OVS_CHECK_GRE()
> +OVS_CHECK_ERSPAN()
> +
> +OVS_TRAFFIC_VSWITCHD_START()
> +ADD_BR([br-underlay])
> +
> +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
> +AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
> +
> +ADD_NAMESPACES(at_ns0)
> +
> +dnl Set up underlay link from host into the namespace using veth pair.
> +ADD_VETH_AFXDP(p0, at_ns0, br-underlay, "172.31.1.1/24")
> +AT_CHECK([ip addr add dev br-underlay "172.31.1.100/24"])
> +AT_CHECK([ip link set dev br-underlay up])
> +
> +dnl Set up tunnel endpoints on OVS outside the namespace and with a native
> +dnl linux device inside the namespace.
> +ADD_OVS_TUNNEL([erspan], [br0], [at_erspan0], [172.31.1.1], [10.1.1.100/24], [options:key=1 options:erspan_ver=2 options:erspan_dir=1 options:erspan_hwid=0x7])
> +ADD_NATIVE_TUNNEL([erspan], [ns_erspan0], [at_ns0], [172.31.1.100], [10.1.1.1/24], [seq key 1 erspan_ver 2 erspan_dir egress erspan_hwid 7])
> +
> +AT_CHECK([ovs-appctl ovs/route/add 10.1.1.100/24 br0], [0], [OK
> +])
> +AT_CHECK([ovs-appctl ovs/route/add 172.31.1.92/24 br-underlay], [0], [OK
> +])
> +
> +dnl First, check the underlay
> +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 172.31.1.100 | FORMAT_PING], [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +
> +dnl Okay, now check the overlay with different packet sizes
> +dnl NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl
> +NS_CHECK_EXEC([at_ns0], [ping -s 1200 -i 0.3 -c 3 10.1.1.100 | FORMAT_PING], [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +OVS_TRAFFIC_VSWITCHD_STOP
> +AT_CLEANUP
> +
> +AT_SETUP([datapath - ping over ip6erspan v1 tunnel])
> +OVS_CHECK_GRE()
> +OVS_CHECK_ERSPAN()
> +
> +OVS_TRAFFIC_VSWITCHD_START()
> +ADD_BR([br-underlay])
> +
> +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
> +AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
> +
> +ADD_NAMESPACES(at_ns0)
> +
> +dnl Set up underlay link from host into the namespace using veth pair.
> +ADD_VETH_AFXDP(p0, at_ns0, br-underlay, "fc00:100::1/96", [], [], nodad)
> +AT_CHECK([ip addr add dev br-underlay "fc00:100::100/96" nodad])
> +AT_CHECK([ip link set dev br-underlay up])
> +
> +dnl Set up tunnel endpoints on OVS outside the namespace and with a native
> +dnl linux device inside the namespace.
> +ADD_OVS_TUNNEL6([ip6erspan], [br0], [at_erspan0], [fc00:100::1], [10.1.1.100/24],
> +                [options:key=123 options:erspan_ver=1 options:erspan_idx=0x7])
> +ADD_NATIVE_TUNNEL6([ip6erspan], [ns_erspan0], [at_ns0], [fc00:100::100],
> +                   [10.1.1.1/24], [local fc00:100::1 seq key 123 erspan_ver 1 erspan 7])
> +
> +AT_CHECK([ovs-appctl ovs/route/add 10.1.1.100/24 br0], [0], [OK
> +])
> +AT_CHECK([ovs-appctl ovs/route/add fc00:100::1/96 br-underlay], [0], [OK
> +])
> +
> +OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 2 fc00:100::100])
> +
> +dnl First, check the underlay
> +NS_CHECK_EXEC([at_ns0], [ping6 -q -c 3 -i 0.3 -w 2 fc00:100::100 | FORMAT_PING], [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +
> +dnl Okay, now check the overlay with different packet sizes
> +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +OVS_TRAFFIC_VSWITCHD_STOP
> +AT_CLEANUP
> +
> +AT_SETUP([datapath - ping over ip6erspan v2 tunnel])
> +OVS_CHECK_GRE()
> +OVS_CHECK_ERSPAN()
> +
> +OVS_TRAFFIC_VSWITCHD_START()
> +ADD_BR([br-underlay])
> +
> +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
> +AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
> +
> +ADD_NAMESPACES(at_ns0)
> +
> +dnl Set up underlay link from host into the namespace using veth pair.
> +ADD_VETH_AFXDP(p0, at_ns0, br-underlay, "fc00:100::1/96", [], [], nodad)
> +AT_CHECK([ip addr add dev br-underlay "fc00:100::100/96" nodad])
> +AT_CHECK([ip link set dev br-underlay up])
> +
> +dnl Set up tunnel endpoints on OVS outside the namespace and with a native
> +dnl linux device inside the namespace.
> +ADD_OVS_TUNNEL6([ip6erspan], [br0], [at_erspan0], [fc00:100::1], [10.1.1.100/24],
> +                [options:key=121 options:erspan_ver=2 options:erspan_dir=0 options:erspan_hwid=0x7])
> +ADD_NATIVE_TUNNEL6([ip6erspan], [ns_erspan0], [at_ns0], [fc00:100::100],
> +                   [10.1.1.1/24],
> +                   [local fc00:100::1 seq key 121 erspan_ver 2 erspan_dir ingress erspan_hwid 0x7])
> +
> +AT_CHECK([ovs-appctl ovs/route/add 10.1.1.100/24 br0], [0], [OK
> +])
> +AT_CHECK([ovs-appctl ovs/route/add fc00:100::1/96 br-underlay], [0], [OK
> +])
> +
> +OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 2 fc00:100::100])
> +
> +dnl First, check the underlay
> +NS_CHECK_EXEC([at_ns0], [ping6 -q -c 3 -i 0.3 -w 2 fc00:100::100 | FORMAT_PING], [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +
> +dnl Okay, now check the overlay with different packet sizes
> +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +OVS_TRAFFIC_VSWITCHD_STOP
> +AT_CLEANUP
> +
> +AT_SETUP([datapath - ping over geneve tunnel])
> +OVS_CHECK_GENEVE()
> +
> +OVS_TRAFFIC_VSWITCHD_START()
> +ADD_BR([br-underlay])
> +
> +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
> +AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
> +
> +ADD_NAMESPACES(at_ns0)
> +
> +dnl Set up underlay link from host into the namespace using veth pair.
> +ADD_VETH_AFXDP(p0, at_ns0, br-underlay, "172.31.1.1/24")
> +AT_CHECK([ip addr add dev br-underlay "172.31.1.100/24"])
> +AT_CHECK([ip link set dev br-underlay up])
> +
> +dnl Set up tunnel endpoints on OVS outside the namespace and with a native
> +dnl linux device inside the namespace.
> +ADD_OVS_TUNNEL([geneve], [br0], [at_gnv0], [172.31.1.1], [10.1.1.100/24])
> +ADD_NATIVE_TUNNEL([geneve], [ns_gnv0], [at_ns0], [172.31.1.100], [10.1.1.1/24],
> +                  [vni 0])
> +
> +AT_CHECK([ovs-appctl ovs/route/add 10.1.1.100/24 br0], [0], [OK
> +])
> +AT_CHECK([ovs-appctl ovs/route/add 172.31.1.100/24 br-underlay], [0], [OK
> +])
> +
> +dnl First, check the underlay
> +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 172.31.1.100 | FORMAT_PING], [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +
> +dnl Okay, now check the overlay with different packet sizes
> +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +NS_CHECK_EXEC([at_ns0], [ping -s 1600 -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +NS_CHECK_EXEC([at_ns0], [ping -s 3200 -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +
> +OVS_TRAFFIC_VSWITCHD_STOP
> +AT_CLEANUP
> +
> +AT_SETUP([datapath - ping over geneve6 tunnel])
> +OVS_CHECK_GENEVE_UDP6ZEROCSUM()
> +
> +OVS_TRAFFIC_VSWITCHD_START()
> +ADD_BR([br-underlay])
> +
> +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
> +AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
> +
> +ADD_NAMESPACES(at_ns0)
> +
> +dnl Set up underlay link from host into the namespace using veth pair.
> +ADD_VETH_AFXDP(p0, at_ns0, br-underlay, "fc00::1/64", [], [], "nodad")
> +AT_CHECK([ip addr add dev br-underlay "fc00::100/64" nodad])
> +AT_CHECK([ip link set dev br-underlay up])
> +
> +dnl Set up tunnel endpoints on OVS outside the namespace and with a native
> +dnl linux device inside the namespace.
> +ADD_OVS_TUNNEL6([geneve], [br0], [at_gnv0], [fc00::1], [10.1.1.100/24])
> +ADD_NATIVE_TUNNEL6([geneve], [ns_gnv0], [at_ns0], [fc00::100], [10.1.1.1/24],
> +                   [vni 0 udp6zerocsumtx udp6zerocsumrx])
> +
> +AT_CHECK([ovs-appctl ovs/route/add 10.1.1.100/24 br0], [0], [OK
> +])
> +AT_CHECK([ovs-appctl ovs/route/add fc00::100/64 br-underlay], [0], [OK
> +])
> +
> +OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 1 fc00::100])
> +
> +dnl First, check the underlay
> +NS_CHECK_EXEC([at_ns0], [ping6 -q -c 3 -i 0.3 -w 2 fc00::100 | FORMAT_PING], [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +
> +dnl Okay, now check the overlay with different packet sizes
> +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +NS_CHECK_EXEC([at_ns0], [ping -s 1600 -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +NS_CHECK_EXEC([at_ns0], [ping -s 3200 -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +
> +OVS_TRAFFIC_VSWITCHD_STOP
> +AT_CLEANUP
> +
> +AT_SETUP([datapath - clone action])
> +OVS_TRAFFIC_VSWITCHD_START()
> +
> +ADD_NAMESPACES(at_ns0, at_ns1, at_ns2)
> +
> +ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24")
> +ADD_VETH_AFXDP(p1, at_ns1, br0, "10.1.1.2/24")
> +
> +AT_CHECK([ovs-vsctl -- set interface ovs-p0 ofport_request=1 \
> +                    -- set interface ovs-p1 ofport_request=2])
> +
> +AT_DATA([flows.txt], [dnl
> +priority=1 actions=NORMAL
> +priority=10 in_port=1,ip,actions=clone(mod_dl_dst(50:54:00:00:00:0a),set_field:192.168.3.3->ip_dst), output:2
> +priority=10 in_port=2,ip,actions=clone(mod_dl_src(ae:c6:7e:54:8d:4d),mod_dl_dst(50:54:00:00:00:0b),set_field:192.168.4.4->ip_dst, controller), output:1
> +])
> +AT_CHECK([ovs-ofctl add-flows br0 flows.txt])
> +
> +AT_CHECK([ovs-ofctl monitor br0 65534 invalid_ttl --detach --no-chdir --pidfile 2> ofctl_monitor.log])
> +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.2 | FORMAT_PING], [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +
> +AT_CHECK([cat ofctl_monitor.log | STRIP_MONITOR_CSUM], [0], [dnl
> +icmp,vlan_tci=0x0000,dl_src=ae:c6:7e:54:8d:4d,dl_dst=50:54:00:00:00:0b,nw_src=10.1.1.2,nw_dst=192.168.4.4,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=0,icmp_code=0 icmp_csum: <skip>
> +icmp,vlan_tci=0x0000,dl_src=ae:c6:7e:54:8d:4d,dl_dst=50:54:00:00:00:0b,nw_src=10.1.1.2,nw_dst=192.168.4.4,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=0,icmp_code=0 icmp_csum: <skip>
> +icmp,vlan_tci=0x0000,dl_src=ae:c6:7e:54:8d:4d,dl_dst=50:54:00:00:00:0b,nw_src=10.1.1.2,nw_dst=192.168.4.4,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=0,icmp_code=0 icmp_csum: <skip>
> +])
> +
> +OVS_TRAFFIC_VSWITCHD_STOP
> +AT_CLEANUP
> +
> +AT_SETUP([datapath - basic truncate action])
> +AT_SKIP_IF([test $HAVE_NC = no])
> +OVS_TRAFFIC_VSWITCHD_START()
> +AT_CHECK([ovs-ofctl del-flows br0])
> +
> +dnl Create p0 and ovs-p0(1)
> +ADD_NAMESPACES(at_ns0)
> +ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24")
> +NS_CHECK_EXEC([at_ns0], [ip link set dev p0 address e6:66:c1:11:11:11])
> +NS_CHECK_EXEC([at_ns0], [arp -s 10.1.1.2 e6:66:c1:22:22:22])
> +
> +dnl Create p1(3) and ovs-p1(2), packets received from ovs-p1 will appear in p1
> +AT_CHECK([ip link add p1 type veth peer name ovs-p1])
> +on_exit 'ip link del ovs-p1'
> +AT_CHECK([ip link set dev ovs-p1 up])
> +AT_CHECK([ip link set dev p1 up])
> +AT_CHECK([ovs-vsctl add-port br0 ovs-p1 -- set interface ovs-p1 ofport_request=2])
> +dnl Use p1 to check the truncated packet
> +AT_CHECK([ovs-vsctl add-port br0 p1 -- set interface p1 ofport_request=3])
> +
> +dnl Create p2(5) and ovs-p2(4)
> +AT_CHECK([ip link add p2 type veth peer name ovs-p2])
> +on_exit 'ip link del ovs-p2'
> +AT_CHECK([ip link set dev ovs-p2 up])
> +AT_CHECK([ip link set dev p2 up])
> +AT_CHECK([ovs-vsctl add-port br0 ovs-p2 -- set interface ovs-p2 ofport_request=4])
> +dnl Use p2 to check the truncated packet
> +AT_CHECK([ovs-vsctl add-port br0 p2 -- set interface p2 ofport_request=5])
> +
> +dnl basic test
> +AT_CHECK([ovs-ofctl del-flows br0])
> +AT_DATA([flows.txt], [dnl
> +in_port=3 dl_dst=e6:66:c1:22:22:22 actions=drop
> +in_port=5 dl_dst=e6:66:c1:22:22:22 actions=drop
> +in_port=1 dl_dst=e6:66:c1:22:22:22 actions=output(port=2,max_len=100),output:4
> +])
> +AT_CHECK([ovs-ofctl add-flows br0 flows.txt])
> +
> +dnl use this file as payload file for ncat
> +AT_CHECK([dd if=/dev/urandom of=payload200.bin bs=200 count=1 2> /dev/null])
> +on_exit 'rm -f payload200.bin'
> +NS_CHECK_EXEC([at_ns0], [nc $NC_EOF_OPT -u 10.1.1.2 1234 < payload200.bin])
> +
> +dnl packet with truncated size
> +AT_CHECK([ovs-appctl revalidator/purge], [0])
> +AT_CHECK([ovs-ofctl dump-flows br0 table=0 | grep "in_port=3" |  sed -n 's/.*\(n\_bytes=[[0-9]]*\).*/\1/p'], [0], [dnl
> +n_bytes=100
> +])
> +dnl packet with original size
> +AT_CHECK([ovs-appctl revalidator/purge], [0])
> +AT_CHECK([ovs-ofctl dump-flows br0 table=0 | grep "in_port=5" | sed -n 's/.*\(n\_bytes=[[0-9]]*\).*/\1/p'], [0], [dnl
> +n_bytes=242
> +])
> +
> +dnl more complicated output actions
> +AT_CHECK([ovs-ofctl del-flows br0])
> +AT_DATA([flows.txt], [dnl
> +in_port=3 dl_dst=e6:66:c1:22:22:22 actions=drop
> +in_port=5 dl_dst=e6:66:c1:22:22:22 actions=drop
> +in_port=1 dl_dst=e6:66:c1:22:22:22 actions=output(port=2,max_len=100),output:4,output(port=2,max_len=100),output(port=4,max_len=100),output:2,output(port=4,max_len=200),output(port=2,max_len=65535)
> +])
> +AT_CHECK([ovs-ofctl add-flows br0 flows.txt])
> +
> +NS_CHECK_EXEC([at_ns0], [nc $NC_EOF_OPT -u 10.1.1.2 1234 < payload200.bin])
> +
> +dnl 100 + 100 + 242 + min(65535,242) = 684
> +AT_CHECK([ovs-appctl revalidator/purge], [0])
> +AT_CHECK([ovs-ofctl dump-flows br0 table=0 | grep "in_port=3" | sed -n 's/.*\(n\_bytes=[[0-9]]*\).*/\1/p'], [0], [dnl
> +n_bytes=684
> +])
> +dnl 242 + 100 + min(242,200) = 542
> +AT_CHECK([ovs-ofctl dump-flows br0 table=0 | grep "in_port=5" | sed -n 's/.*\(n\_bytes=[[0-9]]*\).*/\1/p'], [0], [dnl
> +n_bytes=542
> +])
> +
> +dnl SLOW_ACTION: disable kernel datapath truncate support
> +dnl Repeat the test above, but exercise the SLOW_ACTION code path
> +AT_CHECK([ovs-appctl dpif/set-dp-features br0 trunc false], [0])
> +
> +dnl SLOW_ACTION test1: check datapatch actions
> +AT_CHECK([ovs-ofctl del-flows br0])
> +AT_CHECK([ovs-ofctl add-flows br0 flows.txt])
> +
> +AT_CHECK([ovs-appctl ofproto/trace br0 "in_port=1,dl_type=0x800,dl_src=e6:66:c1:11:11:11,dl_dst=e6:66:c1:22:22:22,nw_src=192.168.0.1,nw_dst=192.168.0.2,nw_proto=6,tp_src=8,tp_dst=9"], [0], [stdout])
> +AT_CHECK([tail -3 stdout], [0],
> +[Datapath actions: trunc(100),3,5,trunc(100),3,trunc(100),5,3,trunc(200),5,trunc(65535),3
> +This flow is handled by the userspace slow path because it:
> +  - Uses action(s) not supported by datapath.
> +])
> +
> +dnl SLOW_ACTION test2: check actual packet truncate
> +AT_CHECK([ovs-ofctl del-flows br0])
> +AT_CHECK([ovs-ofctl add-flows br0 flows.txt])
> +NS_CHECK_EXEC([at_ns0], [nc $NC_EOF_OPT -u 10.1.1.2 1234 < payload200.bin])
> +
> +dnl 100 + 100 + 242 + min(65535,242) = 684
> +AT_CHECK([ovs-appctl revalidator/purge], [0])
> +AT_CHECK([ovs-ofctl dump-flows br0 table=0 | grep "in_port=3" | sed -n 's/.*\(n\_bytes=[[0-9]]*\).*/\1/p'], [0], [dnl
> +n_bytes=684
> +])
> +
> +dnl 242 + 100 + min(242,200) = 542
> +AT_CHECK([ovs-ofctl dump-flows br0 table=0 | grep "in_port=5" | sed -n 's/.*\(n\_bytes=[[0-9]]*\).*/\1/p'], [0], [dnl
> +n_bytes=542
> +])
> +
> +OVS_TRAFFIC_VSWITCHD_STOP
> +AT_CLEANUP
> +
> +
> +AT_BANNER([conntrack])
> +
> +AT_SETUP([conntrack - controller])
> +CHECK_CONNTRACK()
> +OVS_TRAFFIC_VSWITCHD_START()
> +AT_CHECK([ovs-appctl vlog/set dpif:dbg dpif_netdev:dbg ofproto_dpif_upcall:dbg])
> +
> +ADD_NAMESPACES(at_ns0, at_ns1)
> +
> +ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24")
> +ADD_VETH_AFXDP(p1, at_ns1, br0, "10.1.1.2/24")
> +
> +dnl Allow any traffic from ns0->ns1. Only allow nd, return traffic from ns1->ns0.
> +AT_DATA([flows.txt], [dnl
> +priority=1,action=drop
> +priority=10,arp,action=normal
> +priority=100,in_port=1,udp,action=ct(commit),controller
> +priority=100,in_port=2,ct_state=-trk,udp,action=ct(table=0)
> +priority=100,in_port=2,ct_state=+trk+est,udp,action=controller
> +])
> +
> +AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
> +
> +AT_CAPTURE_FILE([ofctl_monitor.log])
> +AT_CHECK([ovs-ofctl monitor br0 65534 invalid_ttl --detach --no-chdir --pidfile 2> ofctl_monitor.log])
> +
> +dnl Send an unsolicited reply from port 2. This should be dropped.
> +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 2 ct\(table=0\) '50540000000a50540000000908004500001c000000000011a4cd0a0101020a0101010002000100080000'])
> +
> +dnl OK, now start a new connection from port 1.
> +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 1 ct\(commit\),controller '50540000000a50540000000908004500001c000000000011a4cd0a0101010a0101020001000200080000'])
> +
> +dnl Now try a reply from port 2.
> +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 2 ct\(table=0\) '50540000000a50540000000908004500001c000000000011a4cd0a0101020a0101010002000100080000'])
> +
> +dnl Check this output. We only see the latter two packets, not the first.
> +AT_CHECK([cat ofctl_monitor.log], [0], [dnl
> +NXT_PACKET_IN2 (xid=0x0): total_len=42 in_port=1 (via action) data_len=42 (unbuffered)
> +udp,vlan_tci=0x0000,dl_src=50:54:00:00:00:09,dl_dst=50:54:00:00:00:0a,nw_src=10.1.1.1,nw_dst=10.1.1.2,nw_tos=0,nw_ecn=0,nw_ttl=0,tp_src=1,tp_dst=2 udp_csum:0
> +NXT_PACKET_IN2 (xid=0x0): cookie=0x0 total_len=42 ct_state=est|rpl|trk,ct_nw_src=10.1.1.1,ct_nw_dst=10.1.1.2,ct_nw_proto=17,ct_tp_src=1,ct_tp_dst=2,ip,in_port=2 (via action) data_len=42 (unbuffered)
> +udp,vlan_tci=0x0000,dl_src=50:54:00:00:00:09,dl_dst=50:54:00:00:00:0a,nw_src=10.1.1.2,nw_dst=10.1.1.1,nw_tos=0,nw_ecn=0,nw_ttl=0,tp_src=2,tp_dst=1 udp_csum:0
> +])
> +
> +OVS_TRAFFIC_VSWITCHD_STOP
> +AT_CLEANUP
> +
> +AT_SETUP([conntrack - force commit])
> +CHECK_CONNTRACK()
> +OVS_TRAFFIC_VSWITCHD_START()
> +AT_CHECK([ovs-appctl vlog/set dpif:dbg dpif_netdev:dbg ofproto_dpif_upcall:dbg])
> +
> +ADD_NAMESPACES(at_ns0, at_ns1)
> +
> +ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24")
> +ADD_VETH_AFXDP(p1, at_ns1, br0, "10.1.1.2/24")
> +
> +AT_DATA([flows.txt], [dnl
> +priority=1,action=drop
> +priority=10,arp,action=normal
> +priority=100,in_port=1,udp,action=ct(force,commit),controller
> +priority=100,in_port=2,ct_state=-trk,udp,action=ct(table=0)
> +priority=100,in_port=2,ct_state=+trk+est,udp,action=ct(force,commit,table=1)
> +table=1,in_port=2,ct_state=+trk,udp,action=controller
> +])
> +
> +AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
> +
> +AT_CAPTURE_FILE([ofctl_monitor.log])
> +AT_CHECK([ovs-ofctl monitor br0 65534 invalid_ttl --detach --no-chdir --pidfile 2> ofctl_monitor.log])
> +
> +dnl Send an unsolicited reply from port 2. This should be dropped.
> +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=2 packet=50540000000a50540000000908004500001c000000000011a4cd0a0101020a0101010002000100080000 actions=resubmit(,0)"])
> +
> +dnl OK, now start a new connection from port 1.
> +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=1 packet=50540000000a50540000000908004500001c000000000011a4cd0a0101010a0101020001000200080000 actions=resubmit(,0)"])
> +
> +dnl Now try a reply from port 2.
> +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=2 packet=50540000000a50540000000908004500001c000000000011a4cd0a0101020a0101010002000100080000 actions=resubmit(,0)"])
> +
> +AT_CHECK([ovs-appctl revalidator/purge], [0])
> +
> +dnl Check this output. We only see the latter two packets, not the first.
> +AT_CHECK([cat ofctl_monitor.log], [0], [dnl
> +NXT_PACKET_IN2 (xid=0x0): cookie=0x0 total_len=42 in_port=1 (via action) data_len=42 (unbuffered)
> +udp,vlan_tci=0x0000,dl_src=50:54:00:00:00:09,dl_dst=50:54:00:00:00:0a,nw_src=10.1.1.1,nw_dst=10.1.1.2,nw_tos=0,nw_ecn=0,nw_ttl=0,tp_src=1,tp_dst=2 udp_csum:0
> +NXT_PACKET_IN2 (xid=0x0): table_id=1 cookie=0x0 total_len=42 ct_state=new|trk,ct_nw_src=10.1.1.2,ct_nw_dst=10.1.1.1,ct_nw_proto=17,ct_tp_src=2,ct_tp_dst=1,ip,in_port=2 (via action) data_len=42 (unbuffered)
> +udp,vlan_tci=0x0000,dl_src=50:54:00:00:00:09,dl_dst=50:54:00:00:00:0a,nw_src=10.1.1.2,nw_dst=10.1.1.1,nw_tos=0,nw_ecn=0,nw_ttl=0,tp_src=2,tp_dst=1 udp_csum:0
> +])
> +
> +dnl
> +dnl Check that the directionality has been changed by force commit.
> +dnl
> +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "orig=.src=10\.1\.1\.2,"], [], [dnl
> +udp,orig=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1),reply=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2)
> +])
> +
> +dnl OK, now send another packet from port 1 and see that it switches again
> +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=1 packet=50540000000a50540000000908004500001c000000000011a4cd0a0101010a0101020001000200080000 actions=resubmit(,0)"])
> +AT_CHECK([ovs-appctl revalidator/purge], [0])
> +
> +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "orig=.src=10\.1\.1\.1,"], [], [dnl
> +udp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2),reply=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1)
> +])
> +
> +OVS_TRAFFIC_VSWITCHD_STOP
> +AT_CLEANUP
> +
> +AT_SETUP([conntrack - ct flush by 5-tuple])
> +CHECK_CONNTRACK()
> +OVS_TRAFFIC_VSWITCHD_START()
> +
> +ADD_NAMESPACES(at_ns0, at_ns1)
> +
> +ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24")
> +ADD_VETH_AFXDP(p1, at_ns1, br0, "10.1.1.2/24")
> +
> +AT_DATA([flows.txt], [dnl
> +priority=1,action=drop
> +priority=10,arp,action=normal
> +priority=100,in_port=1,udp,action=ct(commit),2
> +priority=100,in_port=2,udp,action=ct(zone=5,commit),1
> +priority=100,in_port=1,icmp,action=ct(commit),2
> +priority=100,in_port=2,icmp,action=ct(zone=5,commit),1
> +])
> +
> +AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
> +
> +dnl Test UDP from port 1
> +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=1 packet=50540000000a50540000000908004500001c000000000011a4cd0a0101010a0101020001000200080000 actions=resubmit(,0)"])
> +
> +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "orig=.src=10\.1\.1\.1,"], [], [dnl
> +udp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2),reply=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1)
> +])
> +
> +AT_CHECK([ovs-appctl dpctl/flush-conntrack 'ct_nw_src=10.1.1.2,ct_nw_dst=10.1.1.1,ct_nw_proto=17,ct_tp_src=2,ct_tp_dst=1'])
> +
> +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "orig=.src=10\.1\.1\.1,"], [1], [dnl
> +])
> +
> +dnl Test UDP from port 2
> +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=2 packet=50540000000a50540000000908004500001c000000000011a4cd0a0101020a0101010002000100080000 actions=resubmit(,0)"])
> +
> +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "orig=.src=10\.1\.1\.2,"], [0], [dnl
> +udp,orig=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1),reply=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2),zone=5
> +])
> +
> +AT_CHECK([ovs-appctl dpctl/flush-conntrack zone=5 'ct_nw_src=10.1.1.1,ct_nw_dst=10.1.1.2,ct_nw_proto=17,ct_tp_src=1,ct_tp_dst=2'])
> +
> +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(10.1.1.2)], [0], [dnl
> +])
> +
> +dnl Test ICMP traffic
> +NS_CHECK_EXEC([at_ns1], [ping -q -c 3 -i 0.3 -w 2 10.1.1.1 | FORMAT_PING], [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +
> +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "orig=.src=10\.1\.1\.2,"], [0], [stdout])
> +AT_CHECK([cat stdout | FORMAT_CT(10.1.1.1)], [0],[dnl
> +icmp,orig=(src=10.1.1.2,dst=10.1.1.1,id=<cleared>,type=8,code=0),reply=(src=10.1.1.1,dst=10.1.1.2,id=<cleared>,type=0,code=0),zone=5
> +])
> +
> +ICMP_ID=`cat stdout | cut -d ',' -f4 | cut -d '=' -f2`
> +ICMP_TUPLE=ct_nw_src=10.1.1.2,ct_nw_dst=10.1.1.1,ct_nw_proto=1,icmp_id=$ICMP_ID,icmp_type=8,icmp_code=0
> +AT_CHECK([ovs-appctl dpctl/flush-conntrack zone=5 $ICMP_TUPLE])
> +
> +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "orig=.src=10\.1\.1\.2,"], [1], [dnl
> +])
> +
> +OVS_TRAFFIC_VSWITCHD_STOP
> +AT_CLEANUP
> +
> +AT_SETUP([conntrack - IPv4 ping])
> +CHECK_CONNTRACK()
> +OVS_TRAFFIC_VSWITCHD_START()
> +
> +ADD_NAMESPACES(at_ns0, at_ns1)
> +
> +ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24")
> +ADD_VETH_AFXDP(p1, at_ns1, br0, "10.1.1.2/24")
> +
> +dnl Allow any traffic from ns0->ns1. Only allow nd, return traffic from ns1->ns0.
> +AT_DATA([flows.txt], [dnl
> +priority=1,action=drop
> +priority=10,arp,action=normal
> +priority=100,in_port=1,icmp,action=ct(commit),2
> +priority=100,in_port=2,icmp,ct_state=-trk,action=ct(table=0)
> +priority=100,in_port=2,icmp,ct_state=+trk+est,action=1
> +])
> +
> +AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
> +
> +dnl Pings from ns0->ns1 should work fine.
> +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.2 | FORMAT_PING], [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +
> +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(10.1.1.2)], [0], [dnl
> +icmp,orig=(src=10.1.1.1,dst=10.1.1.2,id=<cleared>,type=8,code=0),reply=(src=10.1.1.2,dst=10.1.1.1,id=<cleared>,type=0,code=0)
> +])
> +
> +AT_CHECK([ovs-appctl dpctl/flush-conntrack])
> +
> +dnl Pings from ns1->ns0 should fail.
> +NS_CHECK_EXEC([at_ns1], [ping -q -c 3 -i 0.3 -w 2 10.1.1.1 | FORMAT_PING], [0], [dnl
> +7 packets transmitted, 0 received, 100% packet loss, time 0ms
> +])
> +
> +OVS_TRAFFIC_VSWITCHD_STOP
> +AT_CLEANUP
> +
> +AT_SETUP([conntrack - get_nconns and get/set_maxconns])
> +CHECK_CONNTRACK()
> +CHECK_CT_DPIF_SET_GET_MAXCONNS()
> +CHECK_CT_DPIF_GET_NCONNS()
> +OVS_TRAFFIC_VSWITCHD_START()
> +
> +ADD_NAMESPACES(at_ns0, at_ns1)
> +
> +ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24")
> +ADD_VETH_AFXDP(p1, at_ns1, br0, "10.1.1.2/24")
> +
> +dnl Allow any traffic from ns0->ns1. Only allow nd, return traffic from ns1->ns0.
> +AT_DATA([flows.txt], [dnl
> +priority=1,action=drop
> +priority=10,arp,action=normal
> +priority=100,in_port=1,icmp,action=ct(commit),2
> +priority=100,in_port=2,icmp,ct_state=-trk,action=ct(table=0)
> +priority=100,in_port=2,icmp,ct_state=+trk+est,action=1
> +])
> +
> +AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
> +
> +dnl Pings from ns0->ns1 should work fine.
> +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.2 | FORMAT_PING], [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +
> +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(10.1.1.2)], [0], [dnl
> +icmp,orig=(src=10.1.1.1,dst=10.1.1.2,id=<cleared>,type=8,code=0),reply=(src=10.1.1.2,dst=10.1.1.1,id=<cleared>,type=0,code=0)
> +])
> +
> +AT_CHECK([ovs-appctl dpctl/ct-set-maxconns one-bad-dp], [2], [], [dnl
> +ovs-vswitchd: maxconns missing or malformed (Invalid argument)
> +ovs-appctl: ovs-vswitchd: server returned an error
> +])
> +
> +AT_CHECK([ovs-appctl dpctl/ct-set-maxconns a], [2], [], [dnl
> +ovs-vswitchd: maxconns missing or malformed (Invalid argument)
> +ovs-appctl: ovs-vswitchd: server returned an error
> +])
> +
> +AT_CHECK([ovs-appctl dpctl/ct-set-maxconns one-bad-dp 10], [2], [], [dnl
> +ovs-vswitchd: datapath not found (Invalid argument)
> +ovs-appctl: ovs-vswitchd: server returned an error
> +])
> +
> +AT_CHECK([ovs-appctl dpctl/ct-get-maxconns one-bad-dp], [2], [], [dnl
> +ovs-vswitchd: datapath not found (Invalid argument)
> +ovs-appctl: ovs-vswitchd: server returned an error
> +])
> +
> +AT_CHECK([ovs-appctl dpctl/ct-get-nconns one-bad-dp], [2], [], [dnl
> +ovs-vswitchd: datapath not found (Invalid argument)
> +ovs-appctl: ovs-vswitchd: server returned an error
> +])
> +
> +AT_CHECK([ovs-appctl dpctl/ct-get-nconns], [], [dnl
> +1
> +])
> +
> +AT_CHECK([ovs-appctl dpctl/ct-get-maxconns], [], [dnl
> +3000000
> +])
> +
> +AT_CHECK([ovs-appctl dpctl/ct-set-maxconns 10], [], [dnl
> +setting maxconns successful
> +])
> +
> +AT_CHECK([ovs-appctl dpctl/ct-get-maxconns], [], [dnl
> +10
> +])
> +
> +AT_CHECK([ovs-appctl dpctl/flush-conntrack])
> +
> +AT_CHECK([ovs-appctl dpctl/ct-get-nconns], [], [dnl
> +0
> +])
> +
> +AT_CHECK([ovs-appctl dpctl/ct-get-maxconns], [], [dnl
> +10
> +])
> +
> +OVS_TRAFFIC_VSWITCHD_STOP
> +AT_CLEANUP
> +
> +AT_SETUP([conntrack - IPv6 ping])
> +CHECK_CONNTRACK()
> +OVS_TRAFFIC_VSWITCHD_START()
> +
> +ADD_NAMESPACES(at_ns0, at_ns1)
> +
> +ADD_VETH_AFXDP(p0, at_ns0, br0, "fc00::1/96")
> +ADD_VETH_AFXDP(p1, at_ns1, br0, "fc00::2/96")
> +
> +AT_DATA([flows.txt], [dnl
> +
> +dnl ICMPv6 echo request and reply go to table 1.  The rest of the traffic goes
> +dnl through normal action.
> +table=0,priority=10,icmp6,icmp_type=128,action=goto_table:1
> +table=0,priority=10,icmp6,icmp_type=129,action=goto_table:1
> +table=0,priority=1,action=normal
> +
> +dnl Allow everything from ns0->ns1. Only allow return traffic from ns1->ns0.
> +table=1,priority=100,in_port=1,icmp6,action=ct(commit),2
> +table=1,priority=100,in_port=2,icmp6,ct_state=-trk,action=ct(table=0)
> +table=1,priority=100,in_port=2,icmp6,ct_state=+trk+est,action=1
> +table=1,priority=1,action=drop
> +])
> +
> +AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
> +
> +OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 1 fc00::2])
> +
> +dnl The above ping creates state in the connection tracker.  We're not
> +dnl interested in that state.
> +AT_CHECK([ovs-appctl dpctl/flush-conntrack])
> +
> +dnl Pings from ns1->ns0 should fail.
> +NS_CHECK_EXEC([at_ns1], [ping6 -q -c 3 -i 0.3 -w 2 fc00::1 | FORMAT_PING], [0], [dnl
> +7 packets transmitted, 0 received, 100% packet loss, time 0ms
> +])
> +
> +dnl Pings from ns0->ns1 should work fine.
> +NS_CHECK_EXEC([at_ns0], [ping6 -q -c 3 -i 0.3 -w 2 fc00::2 | FORMAT_PING], [0], [dnl
> +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> +])
> +
> +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(fc00::2)], [0], [dnl
> +icmpv6,orig=(src=fc00::1,dst=fc00::2,id=<cleared>,type=128,code=0),reply=(src=fc00::2,dst=fc00::1,id=<cleared>,type=129,code=0)
> +])
> +
> +OVS_TRAFFIC_VSWITCHD_STOP
> +AT_CLEANUP
>
William Tu April 25, 2019, 11:49 p.m. UTC | #2
Hi Ilya,

Very appreciated for your review!
I will fix/address your comments in my next version.
see replies inline

On Thu, Apr 25, 2019 at 8:09 AM Ilya Maximets <i.maximets@samsung.com>
wrote:

> Hi.
>
> This is not a full review. Just a bunch of thoughts.
>
> See inline.
>
> Best regards, Ilya Maximets.
>
> On 25.04.2019 2:47, William Tu wrote:
> > The patch introduces experimental AF_XDP support for OVS netdev.
> > AF_XDP is a new address family working together with eBPF/XDP.
> > A socket with AF_XDP family can receive and send raw packets
> > from an eBPF/XDP program attached to the netdev.
> > For details introduction and configuration, see
> > Documentation/intro/install/afxdp.rst
> >
> > Signed-off-by: William Tu <u9012063@gmail.com>
> > Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
> > Co-authored-by: Yi-Hung Wei <yihung.wei@gmail.com>
> > ---
> > v1->v2:
> > - add a list to maintain unused umem elements
> > - remove copy from rx umem to ovs internal buffer
> > - use hugetlb to reduce misses (not much difference)
> > - use pmd mode netdev in OVS (huge performance improve)
> > - remove malloc dp_packet, instead put dp_packet in umem
> >
> > v2->v3:
> > - rebase on the OVS master, 7ab4b0653784
> >   ("configure: Check for more specific function to pull in pthread
> library.")
> > - remove the dependency on libbpf and dpif-bpf.
> >   instead, use the built-in XDP_ATTACH feature.
> > - data structure optimizations for better performance, see[1]
> > - more test cases support
> > v3:
> https://mail.openvswitch.org/pipermail/ovs-dev/2018-November/354179.html
> >
> > v3->v4:
> > - Use AF_XDP API provided by libbpf
> > - Remove the dependency on XDP_ATTACH kernel patch set
> > - Add documentation, bpf.rst
> >
> > v4->v5:
> > - rebase to master
> > - remove rfc, squash all into a single patch
> > - add --enable-afxdp, so by default, AF_XDP is not compiled
> > - add options: xdpmode=drv,skb
> > - add multiple queue and multiple PMD support, with options: n_rxq
> > - improve documentation, rename bpf.rst to af_xdp.rst
> >
> > v5->v6
> > - rebase to master, commit 0cdd5b13de91b98
> > - address errors from sparse and clang
> > - pass travis-ci test
> > - address feedback from Ben
> > - fix issues reported by 0-day robot
> > - improved documentation
> > ---
> >  Documentation/automake.mk             |   1 +
> >  Documentation/index.rst               |   1 +
> >  Documentation/intro/install/afxdp.rst | 366 +++++++++++++
> >  Documentation/intro/install/index.rst |   1 +
> >  acinclude.m4                          |  23 +
> >  configure.ac                          |   1 +
> >  lib/automake.mk                       |   7 +-
> >  lib/dp-packet.c                       |  16 +
> >  lib/dp-packet.h                       |  35 +-
> >  lib/dpif-netdev-perf.h                |  13 +
> >  lib/netdev-afxdp.c                    | 589 ++++++++++++++++++++
> >  lib/netdev-afxdp.h                    |  47 ++
> >  lib/netdev-linux.c                    |  89 +++-
> >  lib/netdev-linux.h                    |   1 +
> >  lib/netdev-provider.h                 |   1 +
> >  lib/netdev.c                          |   1 +
> >  lib/xdpsock.c                         | 210 ++++++++
> >  lib/xdpsock.h                         | 133 +++++
> >  tests/automake.mk                     |  17 +
> >  tests/system-afxdp-macros.at          | 153 ++++++
> >  tests/system-afxdp-testsuite.at       |  26 +
> >  tests/system-afxdp-traffic.at         | 978
> ++++++++++++++++++++++++++++++++++
> >  22 files changed, 2703 insertions(+), 6 deletions(-)
> >  create mode 100644 Documentation/intro/install/afxdp.rst
> >  create mode 100644 lib/netdev-afxdp.c
> >  create mode 100644 lib/netdev-afxdp.h
> >  create mode 100644 lib/xdpsock.c
> >  create mode 100644 lib/xdpsock.h
> >  create mode 100644 tests/system-afxdp-macros.at
> >  create mode 100644 tests/system-afxdp-testsuite.at
> >  create mode 100644 tests/system-afxdp-traffic.at
> >
> > diff --git a/Documentation/automake.mk b/Documentation/automake.mk
> > index 082438e09a33..11cc59efc881 100644
> > --- a/Documentation/automake.mk
> > +++ b/Documentation/automake.mk
> > @@ -10,6 +10,7 @@ DOC_SOURCE = \
> >       Documentation/intro/why-ovs.rst \
> >       Documentation/intro/install/index.rst \
> >       Documentation/intro/install/bash-completion.rst \
> > +     Documentation/intro/install/afxdp.rst \
> >       Documentation/intro/install/debian.rst \
> >       Documentation/intro/install/documentation.rst \
> >       Documentation/intro/install/distributions.rst \
> > diff --git a/Documentation/index.rst b/Documentation/index.rst
> > index 46261235c732..aa9e7c49f179 100644
> > --- a/Documentation/index.rst
> > +++ b/Documentation/index.rst
> > @@ -59,6 +59,7 @@ vSwitch? Start here.
> >    :doc:`intro/install/windows` |
> >    :doc:`intro/install/xenserver` |
> >    :doc:`intro/install/dpdk` |
> > +  :doc:`intro/install/afxdp` |
> >    :doc:`Installation FAQs <faq/releases>`
> >
> >  - **Tutorials:** :doc:`tutorials/faucet` |
> > diff --git a/Documentation/intro/install/afxdp.rst
> b/Documentation/intro/install/afxdp.rst
> > new file mode 100644
> > index 000000000000..a1e3317bbdb5
> > --- /dev/null
> > +++ b/Documentation/intro/install/afxdp.rst
> > @@ -0,0 +1,366 @@
> > +..
> > +      Licensed under the Apache License, Version 2.0 (the "License");
> you may
> > +      not use this file except in compliance with the License. You may
> obtain
> > +      a copy of the License at
> > +
> > +          http://www.apache.org/licenses/LICENSE-2.0
> > +
> > +      Unless required by applicable law or agreed to in writing,
> software
> > +      distributed under the License is distributed on an "AS IS" BASIS,
> WITHOUT
> > +      WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> See the
> > +      License for the specific language governing permissions and
> limitations
> > +      under the License.
> > +
> > +      Convention for heading levels in Open vSwitch documentation:
> > +
> > +      =======  Heading 0 (reserved for the title in a document)
> > +      -------  Heading 1
> > +      ~~~~~~~  Heading 2
> > +      +++++++  Heading 3
> > +      '''''''  Heading 4
> > +
> > +      Avoid deeper levels because they do not render well.
> > +
> > +
> > +========================
> > +Open vSwitch with AF_XDP
> > +========================
> > +
> > +This document describes how to build and install Open vSwitch using
> > +AF_XDP netdev.
> > +
> > +.. warning::
> > +  The AF_XDP support of Open vSwitch is considered 'experimental',
> > +  and it is not compiled in by default.
> > +
> > +Introduction
> > +------------
> > +AF_XDP, Address Family of the eXpress Data Path, is a new Linux socket
> type
> > +built upon the eBPF and XDP technology.  It is aims to have comparable
> > +performance to DPDK but cooperate better with existing kernel's
> networking
> > +stack.  An AF_XDP socket receives and sends packets from an eBPF/XDP
> program
> > +attached to the netdev, by-passing a couple of Linux kernel's
> subsystems.
> > +As a result, AF_XDP socket shows much better performance than AF_PACKET.
> > +For more details about AF_XDP, please see linux kernel's
> > +Documentation/networking/af_xdp.rst
> > +
> > +
> > +AF_XDP Netdev
> > +-------------
> > +OVS has a couple of netdev types, i.e., system, tap, or
> > +internal.  The AF_XDP feature adds a new netdev types called
> > +"afxdp", and implement its configuration, packet reception,
> > +and transmit functions.  Since the AF_XDP socket, xsk,
> > +operates in userspace, once ovs-vswitchd receives packets
> > +from xsk, the proposed architecture re-uses the existing
> > +userspace dpif-netdev datapath.  As a result, most of
> > +the packet processing happens at the userspace instead of
> > +linux kernel.
> > +
> > +::
> > +
> > +              |   +-------------------+
> > +              |   |    ovs-vswitchd   |<-->ovsdb-server
> > +              |   +-------------------+
> > +              |   |      ofproto      |<-->OpenFlow controllers
> > +              |   +--------+-+--------+
> > +              |   | netdev | |ofproto-|
> > +    userspace |   +--------+ |  dpif  |
> > +              |   | afxdp  | +--------+
> > +              |   | netdev | |  dpif  |
> > +              |   +---||---+ +--------+
> > +              |       ||     |  dpif- |
> > +              |       ||     | netdev |
> > +              |_      ||     +--------+
> > +                      ||
> > +               _  +---||-----+--------+
> > +              |   | AF_XDP prog +     |
> > +       kernel |   |   xsk_map         |
> > +              |_  +--------||---------+
> > +                           ||
> > +                        physical
> > +                           NIC
> > +
> > +
> > +Build requirements
> > +------------------
> > +
> > +In addition to the requirements described in :doc:`general`, building
> Open
> > +vSwitch with AF_XDP will require the following:
> > +
> > +- libbpf from kernel source tree (kernel 5.0.0 or later)
> > +
> > +- Linux kernel XDP support, with the following options (required)
> > +  ``_CONFIG_BPF=y``
> > +
> > +  ``_CONFIG_BPF_SYSCALL=y``
> > +
> > +  ``_CONFIG_XDP_SOCKETS=y``
> > +
> > +
> > +- The following optional Kconfig options are also recommended, but not
> > +  required:
> > +
> > +  ``_CONFIG_BPF_JIT=y`` (Performance)
> > +
> > +  ``_CONFIG_HAVE_BPF_JIT=y`` (Performance)
> > +
> > +  ``_CONFIG_XDP_SOCKETS_DIAG=y`` (Debugging)
> > +
> > +- If possible, run **./xdpsock -r -N -z -i <your device>** under
> > +  linux/samples/bpf.  This is the OVS indepedent benchmark tools for
> AF_XDP.
> > +  It makes sure your basic kernel requirements are met for AF_XDP.
> > +
> > +
> > +Installing
> > +----------
> > +For OVS to use AF_XDP netdev, it has to be configured with LIBBPF
> support.
> > +Frist, clone a recent version of Linux bpf-next tree::
> > +
> > +  git clone git://
> git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git
> > +
> > +Second, go into the Linux source directory and build libbpf in the tools
> > +directory::
> > +
> > +  cd bpf-next/
> > +  cd tools/lib/bpf/
> > +  make && make install
> > +  make install_headers
> > +
> > +.. note::
> > +   Make sure xsk.h and bpf.h are installed in system's library path,
> > +   e.g. /usr/local/include/bpf/ or /usr/include/bpf/
> > +
> > +Make sure the libbpf.so is installed correctly::
> > +
> > +  ldconfig
> > +  ldconfig -p | grep libbpf
> > +
> > +
> > +Third, ensure the standard OVS requirements are installed and
> > +bootstrap/configure the package::
> > +
> > +  ./boot.sh && ./configure --enable-afxdp
> > +
> > +Finally, build and install OVS::
> > +
> > +  make && make install
> > +
> > +To kick start end-to-end autotesting::
> > +
> > +  uname -a # make sure having 5.0+ kernel
> > +  make check-afxdp
> > +
> > +if a test case fails, check the log at::
> > +
> > +  cat
> tests/system-afxdp-testsuite.dir/<number>/system-afxdp-testsuite.log
> > +
> > +
> > +Setup AF_XDP netdev
> > +-------------------
> > +Before running OVS with AF_XDP, make sure the libbpf and libelf are
> > +set-up right::
> > +
> > +  ldd vswitchd/ovs-vswitchd
> > +
> > +Open vSwitch should be started using userspace datapath as described
> > +in :doc:`general`::
> > +
> > +  ovs-vswitchd --disable-system
> > +  ovs-vsctl -- add-br br0 -- set Bridge br0 datapath_type=netdev
> > +
> > +.. note::
> > +   OVS AF_XDP netdev is using the userspace datapath, the same datapath
> > +   as used by OVS-DPDK.  So it requires --disable-system for
> ovs-vswitchd
> > +   and datapath_type=netdev when adding a new bridge.
> > +
> > +Make sure your device support AF_XDP, and to use 1 PMD (on core 4)
> > +on 1 queue (queue 0) device, configure these options: **pmd-cpu-mask,
> > +pmd-rxq-affinity, and n_rxq**. The **xdpmode** can be "drv" or "skb"::
> > +
> > +  ethtool -L enp2s0 combined 1
> > +  ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x10
> > +  ovs-vsctl add-port br0 enp2s0 -- set interface enp2s0 type="afxdp" \
> > +    options:n_rxq=1 options:xdpmode=drv \
> > +    other_config:pmd-rxq-affinity="0:4"
> > +
> > +Or, use 4 pmds/cores and 4 queues by doing::
> > +
> > +  ethtool -L enp2s0 combined 4
> > +  ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x36
> > +  ovs-vsctl add-port br0 enp2s0 -- set interface enp2s0 type="afxdp" \
> > +    options:n_rxq=4 options:xdpmode=drv \
> > +    other_config:pmd-rxq-affinity="0:1,1:2,2:3,3:4"
> > +
> > +To validate that the bridge has successfully instantiated, you can use
> the::
> > +
> > +  ovs-vsctl show
> > +
> > +should show something like::
> > +
> > +  Port "ens802f0"
> > +   Interface "ens802f0"
> > +      type: afxdp
> > +      options: {n_rxq="1", xdpmode=drv}
> > +
> > +Otherwise, enable debug by::
> > +
> > +  ovs-appctl vlog/set netdev_afxdp::dbg
> > +
> > +
> > +References
> > +----------
> > +Most of the design details are described in the paper presented at
> > +Linux Plumber 2018, "Bringing the Power of eBPF to Open vSwitch"[1],
> > +section 4, and slides[2][4].
> > +"The Path to DPDK Speeds for AF XDP"[3] gives a very good introduction
> > +about AF_XDP current and future work.
> > +
> > +
> > +[1] http://vger.kernel.org/lpc_net2018_talks/ovs-ebpf-afxdp.pdf
> > +
> > +[2]
> http://vger.kernel.org/lpc_net2018_talks/ovs-ebpf-lpc18-presentation.pdf
> > +
> > +[3]
> http://vger.kernel.org/lpc_net2018_talks/lpc18_paper_af_xdp_perf-v2.pdf
> > +
> > +[4]
> https://ovsfall2018.sched.com/event/IO7p/fast-userspace-ovs-with-afxdp
> > +
> > +
> > +Performance Tuning
> > +------------------
> > +The name of the game is to keep your CPU running in userspace, allowing
> PMD
> > +to keep polling the AF_XDP queues without any interferences from kernel.
> > +
> > +#. Make sure everything is in the same NUMA node (memory used by
> AF_XDP, pmd
> > +   running cores, device plug-in slot)
> > +
> > +#. Isolate your CPU by doing isolcpu at grub configure.
> > +
> > +#. IRQ should not set to pmd running core.
> > +
> > +#. The Spectre and Meltdown fixes increase the overhead of system calls.
> > +
> > +Debugging performance issue
> > +~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > +While running the traffic, use linux perf tool to see where your cpu
> > +spends its cycle::
> > +
> > +  cd bpf-next/tools/perf
> > +  make
> > +  ./perf record -p `pidof ovs-vswitchd` sleep 10
> > +  ./perf report
> > +
> > +Measure your system call rate by doing::
> > +
> > +  pstree -p `pidof ovs-vswitchd`
> > +  strace -c -p <your pmd's PID>
> > +
> > +Or, use OVS pmd tool::
> > +
> > +  ovs-appctl dpif-netdev/pmd-stats-show
> > +
> > +
> > +Example Script
> > +--------------
> > +
> > +Below is a script using namespaces and veth peer::
> > +
> > +  #!/bin/bash
> > +  ovs-vswitchd --no-chdir --pidfile -vvconn -vofproto_dpif -vunixctl \
> > +    --disable-system --detach \
> > +  ovs-vsctl -- add-br br0 -- set Bridge br0 \
> > +    protocols=OpenFlow10,OpenFlow11,OpenFlow12,OpenFlow13,OpenFlow14 \
> > +    fail-mode=secure datapath_type=netdev
> > +  ovs-vsctl -- add-br br0 -- set Bridge br0 datapath_type=netdev
> > +
> > +  ip netns add at_ns0
> > +  ovs-appctl vlog/set netdev_afxdp::dbg
> > +
> > +  ip link add p0 type veth peer name afxdp-p0
> > +  ip link set p0 netns at_ns0
> > +  ip link set dev afxdp-p0 up
> > +  ovs-vsctl add-port br0 afxdp-p0 -- \
> > +    set interface afxdp-p0 external-ids:iface-id="p0" type="afxdp"
> > +
> > +  ip netns exec at_ns0 sh << NS_EXEC_HEREDOC
> > +  ip addr add "10.1.1.1/24" dev p0
> > +  ip link set dev p0 up
> > +  NS_EXEC_HEREDOC
> > +
> > +  ip netns add at_ns1
> > +  ip link add p1 type veth peer name afxdp-p1
> > +  ip link set p1 netns at_ns1
> > +  ip link set dev afxdp-p1 up
> > +
> > +  ovs-vsctl add-port br0 afxdp-p1 -- \
> > +    set interface afxdp-p1 external-ids:iface-id="p1" type="afxdp"
> > +  ip netns exec at_ns1 sh << NS_EXEC_HEREDOC
> > +  ip addr add "10.1.1.2/24" dev p1
> > +  ip link set dev p1 up
> > +  NS_EXEC_HEREDOC
> > +
> > +  ip netns exec at_ns0 ping -i .2 10.1.1.2
> > +
> > +
> > +Limitations/Known Issues
> > +------------------------
> > +#. Device's numa ID is always 0, need a way to find numa id from a
> netdev.
> > +#. No QoS support because AF_XDP netdev by-pass the Linux TC layer. A
> possible
> > +   work-around is to use OpenFlow meter action.
> > +#. AF_XDP device added to bridge, remove, and added again will fail.
> > +#. Most of the tests are done using i40e single port. Multiple ports and
> > +   also ixgbe driver also needs to be tested.
> > +#. No latency test result (TODO items)
> > +
> > +
> > +make check-afxdp
> > +----------------
> > +When executing 'make check-afxdp', OVS creates namespaces, sets up
> AF_XDP on
> > +veth devices and kicks start the testing.  So far we have the following
> test
> > +cases::
> > +
> > + AF_XDP netdev datapath-sanity
> > +
> > +  1: datapath - ping between two ports               ok
> > +  2: datapath - ping between two ports on vlan       ok
> > +  3: datapath - ping6 between two ports              ok
> > +  4: datapath - ping6 between two ports on vlan      ok
> > +  5: datapath - ping over vxlan tunnel               ok
> > +  6: datapath - ping over vxlan6 tunnel              ok
> > +  7: datapath - ping over gre tunnel                 ok
> > +  8: datapath - ping over erspan v1 tunnel           ok
> > +  9: datapath - ping over erspan v2 tunnel           ok
> > + 10: datapath - ping over ip6erspan v1 tunnel        ok
> > + 11: datapath - ping over ip6erspan v2 tunnel        ok
> > + 12: datapath - ping over geneve tunnel              ok
> > + 13: datapath - ping over geneve6 tunnel             ok
> > + 14: datapath - clone action                         ok
> > + 15: datapath - basic truncate action                ok
> > +
> > + conntrack
> > +
> > + 16: conntrack - controller                          ok
> > + 17: conntrack - force commit                        ok
> > + 18: conntrack - ct flush by 5-tuple                 ok
> > + 19: conntrack - IPv4 ping                           ok
> > + 20: conntrack - get_nconns and get/set_maxconns     ok
> > + 21: conntrack - IPv6 ping                           ok
> > +
> > + system-ovn
> > +
> > + 22: ovn -- 2 LRs connected via LS, gateway router, SNAT and DNAT ok
> > + 23: ovn -- 2 LRs connected via LS, gateway router, easy SNAT ok
> > + 24: ovn -- multiple gateway routers, SNAT and DNAT  ok
> > + 25: ovn -- load-balancing                           ok
> > + 26: ovn -- load-balancing - same subnet.            ok
> > + 27: ovn -- load balancing in gateway router         ok
> > + 28: ovn -- multiple gateway routers, load-balancing ok
> > + 29: ovn -- load balancing in router with gateway router port ok
> > + 30: ovn -- DNAT and SNAT on distributed router - N/S ok
> > + 31: ovn -- DNAT and SNAT on distributed router - E/W ok
> > +
> > +
> > +Bug Reporting
> > +-------------
> > +
> > +Please report problems to dev@openvswitch.org.
> > diff --git a/Documentation/intro/install/index.rst
> b/Documentation/intro/install/index.rst
> > index 3193c736cf17..c27a9c9d16ff 100644
> > --- a/Documentation/intro/install/index.rst
> > +++ b/Documentation/intro/install/index.rst
> > @@ -45,6 +45,7 @@ Installation from Source
> >     xenserver
> >     userspace
> >     dpdk
> > +   afxdp
> >
> >  Installation from Packages
> >  --------------------------
> > diff --git a/acinclude.m4 b/acinclude.m4
> > index 301aeb70d82a..d80f2494d514 100644
> > --- a/acinclude.m4
> > +++ b/acinclude.m4
> > @@ -221,6 +221,29 @@ AC_DEFUN([OVS_FIND_DEPENDENCY], [
> >    ])
> >  ])
> >
> > +dnl OVS_CHECK_LINUX_AF_XDP
> > +dnl
> > +dnl Check both Linux kernel AF_XDP and libbpf support
> > +AC_DEFUN([OVS_CHECK_LINUX_AF_XDP], [
> > +  AC_MSG_CHECKING([whether AF_XDP is supported])
> > +  AC_ARG_ENABLE([afxdp],
> > +                [AC_HELP_STRING([--enable-afxdp], [Enable AF-XDP
> support])],
> > +                [], [enable_afxdp=no])
> > +  AC_CHECK_HEADER([bpf/libbpf.h],
> > +                  [HAVE_LIBBPF=yes],
> > +                  [HAVE_LIBBPF=no])
> > +  AC_CHECK_HEADER([linux/if_xdp.h],
> > +                  [HAVE_IF_XDP=yes],
> > +                  [HAVE_IF_XDP=no])
> > +  AM_CONDITIONAL([SUPPORT_AF_XDP],
> > +                 [test "$enable_afxdp" = yes && test "$HAVE_LIBBPF" =
> yes && test "$HAVE_IF_XDP" = yes])
> > +  AM_COND_IF([SUPPORT_AF_XDP], [
> > +    AC_DEFINE([HAVE_AF_XDP], [1], [Define to 1 if AF-XDP support is
> available and enabled.])
> > +    LIBBPF_LDADD=" -lbpf -lelf"
> > +    AC_SUBST([LIBBPF_LDADD])
> > +  ])
> > +])
> > +
>
> I think that configure should fail in case we have no required headers.
> It's confusing that I explicitly enabled afxdp, but OVS was built without
> its support.
> One more thing is that AC_MSG_CHECKING requires subsequent AC_MSG_RESULT,
> otherwise it will look not good.
>
> Suggesting following incremental:
>
> diff --git a/acinclude.m4 b/acinclude.m4
> index d80f2494d..c919af570 100644
> --- a/acinclude.m4
> +++ b/acinclude.m4
> @@ -225,23 +225,26 @@ dnl OVS_CHECK_LINUX_AF_XDP
>  dnl
>  dnl Check both Linux kernel AF_XDP and libbpf support
>  AC_DEFUN([OVS_CHECK_LINUX_AF_XDP], [
> -  AC_MSG_CHECKING([whether AF_XDP is supported])
>    AC_ARG_ENABLE([afxdp],
>                  [AC_HELP_STRING([--enable-afxdp], [Enable AF-XDP
> support])],
>                  [], [enable_afxdp=no])
> -  AC_CHECK_HEADER([bpf/libbpf.h],
> -                  [HAVE_LIBBPF=yes],
> -                  [HAVE_LIBBPF=no])
> -  AC_CHECK_HEADER([linux/if_xdp.h],
> -                  [HAVE_IF_XDP=yes],
> -                  [HAVE_IF_XDP=no])
> -  AM_CONDITIONAL([SUPPORT_AF_XDP],
> -                 [test "$enable_afxdp" = yes && test "$HAVE_LIBBPF" = yes
> && test "$HAVE_IF_XDP" = yes])
> -  AM_COND_IF([SUPPORT_AF_XDP], [
> -    AC_DEFINE([HAVE_AF_XDP], [1], [Define to 1 if AF-XDP support is
> available and enabled.])
> +  AC_MSG_CHECKING([whether AF_XDP is enabled])
> +  if test "$enable_afxdp" != yes; then
> +    AC_MSG_RESULT([no])
> +  else
> +    AC_MSG_RESULT([yes])
> +
> +    AC_CHECK_HEADER([bpf/libbpf.h], [],
> +      [AC_MSG_ERROR([unable to find bpf/libbpf.h for AF_XDP support])])
> +
> +    AC_CHECK_HEADER([linux/if_xdp.h], [],
> +      [AC_MSG_ERROR([unable to find linux/if_xdp.h for AF_XDP support])])
> +
> +    AC_DEFINE([HAVE_AF_XDP], [1],
> +              [Define to 1 if AF-XDP support is available and enabled.])
>      LIBBPF_LDADD=" -lbpf -lelf"
>      AC_SUBST([LIBBPF_LDADD])
> -  ])
> +  fi
>  ])
>
>  dnl OVS_CHECK_DPDK
> ---
>
Thanks, will do it.



>
>
> >  dnl OVS_CHECK_DPDK
> >  dnl
> >  dnl Configure DPDK source tree
> > diff --git a/configure.ac b/configure.ac
> > index 505e3d041e93..29c90b73f836 100644
> > --- a/configure.ac
> > +++ b/configure.ac
> > @@ -99,6 +99,7 @@ OVS_CHECK_SPHINX
> >  OVS_CHECK_DOT
> >  OVS_CHECK_IF_DL
> >  OVS_CHECK_STRTOK_R
> > +OVS_CHECK_LINUX_AF_XDP
> >  AC_CHECK_DECLS([sys_siglist], [], [], [[#include <signal.h>]])
> >  AC_CHECK_MEMBERS([struct stat.st_mtim.tv_nsec, struct
> stat.st_mtimensec],
> >    [], [], [[#include <sys/stat.h>]])
> > diff --git a/lib/automake.mk b/lib/automake.mk
> > index cc5dccf39d6b..8b9df5635bbe 100644
> > --- a/lib/automake.mk
> > +++ b/lib/automake.mk
> > @@ -9,6 +9,7 @@ lib_LTLIBRARIES += lib/libopenvswitch.la
> >
> >  lib_libopenvswitch_la_LIBADD = $(SSL_LIBS)
> >  lib_libopenvswitch_la_LIBADD += $(CAPNG_LDADD)
> > +lib_libopenvswitch_la_LIBADD += $(LIBBPF_LDADD)
> >
> >  if WIN32
> >  lib_libopenvswitch_la_LIBADD += ${PTHREAD_LIBS}
> > @@ -327,7 +328,11 @@ lib_libopenvswitch_la_SOURCES = \
> >       lib/lldp/lldpd.c \
> >       lib/lldp/lldpd.h \
> >       lib/lldp/lldpd-structs.c \
> > -     lib/lldp/lldpd-structs.h
> > +     lib/lldp/lldpd-structs.h \
> > +     lib/xdpsock.c \
> > +     lib/xdpsock.h \
> > +     lib/netdev-afxdp.c \
> > +     lib/netdev-afxdp.h
>
> Maybe it's better to move all these files under #ifdef HAVE_AF_XDP ?
>
> >
> >  if WIN32
> >  lib_libopenvswitch_la_SOURCES += \
> > diff --git a/lib/dp-packet.c b/lib/dp-packet.c
> > index 0976a35e758b..a61552f72988 100644
> > --- a/lib/dp-packet.c
> > +++ b/lib/dp-packet.c
> > @@ -22,6 +22,9 @@
> >  #include "netdev-dpdk.h"
> >  #include "openvswitch/dynamic-string.h"
> >  #include "util.h"
> > +#ifdef HAVE_AF_XDP
> > +#include "xdpsock.h"
> > +#endif
> >
> >  static void
> >  dp_packet_init__(struct dp_packet *b, size_t allocated, enum
> dp_packet_source source)
> > @@ -122,6 +125,16 @@ dp_packet_uninit(struct dp_packet *b)
> >               * created as a dp_packet */
> >              free_dpdk_buf((struct dp_packet*) b);
> >  #endif
> > +        } else if (b->source == DPBUF_AFXDP) {
> > +#ifdef HAVE_AF_XDP
> > +            struct dp_packet_afxdp *xpacket;
> > +
> > +            xpacket = dp_packet_cast_afxdp(b);
> > +            if (xpacket->mpool) {
> > +                umem_elem_push(xpacket->mpool, dp_packet_base(b));
> > +            }
> > +#endif
>
> Why not making the same trick as we have for DPDK few lines above?
> i.e. wrap this part in a function like 'free_afxdp_buf' and move it
> to the netdev-afxdp.c ? You will not need to expose so many internals
> to generic code. dp_packet_cast_afxdp() will also be moved there along
> with 'struct dp_packet_afxdp'.
>

Yes, make sense. I will move it to netdev-afxdp.c


>
> BTW, I hope, someday, I'll finally implement 'dp-packet-memory-provider'
> abstraction for OVS.
>
> > +            return;
> >          }
> >      }
> >  }
> > @@ -248,6 +261,8 @@ dp_packet_resize__(struct dp_packet *b, size_t
> new_headroom, size_t new_tailroom
> >      case DPBUF_STACK:
> >          OVS_NOT_REACHED();
> >
> > +    case DPBUF_AFXDP:
> > +        OVS_NOT_REACHED();
>
> Some space required between cases.
>
OK Thanks

>
> >      case DPBUF_STUB:
> >          b->source = DPBUF_MALLOC;
> >          new_base = xmalloc(new_allocated);
> > @@ -433,6 +448,7 @@ dp_packet_steal_data(struct dp_packet *b)
> >  {
> >      void *p;
> >      ovs_assert(b->source != DPBUF_DPDK);
> > +    ovs_assert(b->source != DPBUF_AFXDP);
> >
> >      if (b->source == DPBUF_MALLOC && dp_packet_data(b) ==
> dp_packet_base(b)) {
> >          p = dp_packet_data(b);
> > diff --git a/lib/dp-packet.h b/lib/dp-packet.h
> > index a5e9ade1244a..774728eef330 100644
> > --- a/lib/dp-packet.h
> > +++ b/lib/dp-packet.h
> > @@ -25,6 +25,10 @@
> >  #include <rte_mbuf.h>
> >  #endif
> >
> > +#ifdef HAVE_AF_XDP
> > +#include "lib/xdpsock.h"
> > +#endif
> > +
> >  #include "netdev-dpdk.h"
> >  #include "openvswitch/list.h"
> >  #include "packets.h"
> > @@ -42,6 +46,7 @@ enum OVS_PACKED_ENUM dp_packet_source {
> >      DPBUF_DPDK,                /* buffer data is from DPDK allocated
> memory.
> >                                  * ref to dp_packet_init_dpdk() in
> dp-packet.c.
> >                                  */
> > +    DPBUF_AFXDP,                /* buffer data from XDP frame */
>
> Please, move the comment one space left.
>
OK


>
> >  };
> >
> >  #define DP_PACKET_CONTEXT_SIZE 64
> > @@ -89,6 +94,20 @@ struct dp_packet {
> >      };
> >  };
> >
> > +struct dp_packet_afxdp {
> > +    struct umem_pool *mpool;
> > +    struct dp_packet packet;
> > +};
> > +
> > +#if HAVE_AF_XDP
> > +static struct dp_packet_afxdp *
> > +dp_packet_cast_afxdp(const struct dp_packet *d OVS_UNUSED)
> > +{
> > +    ovs_assert(d->source == DPBUF_AFXDP);
> > +    return CONTAINER_OF(d, struct dp_packet_afxdp, packet);
> > +}
> > +#endif
> > +
> >  static inline void *dp_packet_data(const struct dp_packet *);
> >  static inline void dp_packet_set_data(struct dp_packet *, void *);
> >  static inline void *dp_packet_base(const struct dp_packet *);
> > @@ -183,7 +202,21 @@ dp_packet_delete(struct dp_packet *b)
> >              free_dpdk_buf((struct dp_packet*) b);
> >              return;
> >          }
> > -
> > +        if (b->source == DPBUF_AFXDP) {
> > +#ifdef HAVE_AF_XDP
> > +            struct dp_packet_afxdp *xpacket;
> > +
> > +            /* if a packet is received from afxdp port,
> > +             * and tx to a system port. Then we need to
> > +             * push the rx umem back here
> > +             */
> > +            xpacket = dp_packet_cast_afxdp(b);
> > +            if (xpacket->mpool) {
> > +                umem_elem_push(xpacket->mpool, dp_packet_base(b));
> > +            }
> > +#endif
> > +            return;
> > +        }
> >          dp_packet_uninit(b);
> >          free(b);
> >      }
> > diff --git a/lib/dpif-netdev-perf.h b/lib/dpif-netdev-perf.h
> > index 859c05613ddf..e47cf73bf3c9 100644
> > --- a/lib/dpif-netdev-perf.h
> > +++ b/lib/dpif-netdev-perf.h
> > @@ -198,6 +198,19 @@ cycles_counter_update(struct pmd_perf_stats *s)
> >  {
> >  #ifdef DPDK_NETDEV
> >      return s->last_tsc = rte_get_tsc_cycles();
> > +#elif HAVE_AF_XDP
> > +    union {
> > +        uint64_t tsc_64;
> > +        struct {
> > +            uint32_t lo_32;
> > +            uint32_t hi_32;
> > +        };
> > +    } tsc;
> > +    asm volatile("rdtsc" :
> > +             "=a" (tsc.lo_32),
> > +             "=d" (tsc.hi_32));
>
> We need to check that we're on x86 machine.
> Build should fail, I think. For now, you may add following code
> to the head of netdev-afxdp.c:
>
> #if !defined(__i386__) && !defined(__x86_64__)
> #error AF_XDP supported only for Linux on x86 or x86_64
> #endif
>
> Thanks, yes, this is x86 specific instructions.


> > +
> > +    return s->last_tsc = tsc.tsc_64;
> >  #else
> >      return s->last_tsc = 0;
> >  #endif
> > diff --git a/lib/netdev-afxdp.c b/lib/netdev-afxdp.c
> > new file mode 100644
> > index 000000000000..4c71061fc102
> > --- /dev/null
> > +++ b/lib/netdev-afxdp.c
> > @@ -0,0 +1,589 @@
> > +/*
> > + * Copyright (c) 2018, 2019 Nicira, Inc.
> > + *
> > + * Licensed under the Apache License, Version 2.0 (the "License");
> > + * you may not use this file except in compliance with the License.
> > + * You may obtain a copy of the License at:
> > + *
> > + *     http://www.apache.org/licenses/LICENSE-2.0
> > + *
> > + * Unless required by applicable law or agreed to in writing, software
> > + * distributed under the License is distributed on an "AS IS" BASIS,
> > + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
> implied.
> > + * See the License for the specific language governing permissions and
> > + * limitations under the License.
> > + */
> > +
> > +#include <config.h>
> > +#ifdef HAVE_AF_XDP
> > +#include "netdev-linux.h"
> > +#include <errno.h>
> > +#include <fcntl.h>
> > +#include <sys/types.h>
> > +#include <netinet/in.h>
> > +#include <arpa/inet.h>
> > +#include <inttypes.h>
> > +#include <sys/ioctl.h>
> > +#include <sys/socket.h>
> > +#include <sys/utsname.h>
> > +#include <netpacket/packet.h>
> > +#include <net/if.h>
> > +#include <net/if_arp.h>
> > +#include <net/route.h>
> > +#include <poll.h>
> > +#include <stdlib.h>
> > +#include <string.h>
> > +#include <unistd.h>
> > +
> > +#include "coverage.h"
> > +#include "dp-packet.h"
> > +#include "dpif-netlink.h"
> > +#include "dpif-netdev.h"
> > +#include "openvswitch/dynamic-string.h"
> > +#include "fatal-signal.h"
> > +#include "hash.h"
> > +#include "openvswitch/hmap.h"
> > +#include "netdev-provider.h"
> > +#include "netdev-tc-offloads.h"
> > +#include "netdev-vport.h"
> > +#include "netlink-notifier.h"
> > +#include "netlink-socket.h"
> > +#include "netlink.h"
> > +#include "netnsid.h"
> > +#include "openvswitch/ofpbuf.h"
> > +#include "openflow/openflow.h"
> > +#include "ovs-atomic.h"
> > +#include "packets.h"
> > +#include "openvswitch/poll-loop.h"
> > +#include "rtnetlink.h"
> > +#include "openvswitch/shash.h"
> > +#include "socket-util.h"
> > +#include "sset.h"
> > +#include "tc.h"
> > +#include "timer.h"
> > +#include "unaligned.h"
> > +#include "openvswitch/vlog.h"
> > +#include "util.h"
> > +#include "netdev-afxdp.h"
> > +
> > +#include <linux/if_ether.h>
> > +#include <linux/if_tun.h>
> > +#include <linux/types.h>
> > +#include <linux/ethtool.h>
> > +#include <linux/mii.h>
> > +#include <linux/rtnetlink.h>
> > +#include <linux/sockios.h>
> > +#include <linux/if_xdp.h>
> > +#include "xdpsock.h"
> > +
> > +#ifndef SOL_XDP
> > +#define SOL_XDP 283
> > +#endif
> > +#ifndef AF_XDP
> > +#define AF_XDP 44
> > +#endif
> > +#ifndef PF_XDP
> > +#define PF_XDP AF_XDP
> > +#endif
> > +
> > +VLOG_DEFINE_THIS_MODULE(netdev_afxdp);
> > +static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
> > +
> > +#define UMEM2DESC(elem, base) ((uint64_t)((char *)elem - (char *)base))
> > +#define UMEM2XPKT(base, i) \
> > +    ALIGNED_CAST(struct dp_packet_afxdp *, (char *)base + \
> > +    i * sizeof(struct dp_packet_afxdp))
> > +
> > +static uint32_t opt_xdp_bind_flags = XDP_COPY;
> > +static uint32_t opt_xdp_flags =
> > +                    XDP_FLAGS_UPDATE_IF_NOEXIST | XDP_FLAGS_SKB_MODE;
> > +#ifdef USE_DRVMODE_DEFAULT
>
> If I'll define this, build will fail.
> Should there be ifdef-else-end ?
>

yes, I will put if-else.

>
> > +static uint32_t opt_xdp_bind_flags = XDP_ZEROCOPY;
> > +static uint32_t opt_xdp_flags =
> > +                    XDP_FLAGS_UPDATE_IF_NOEXIST | XDP_FLAGS_DRV_MODE;
> > +#endif
> > +static uint32_t prog_id;
> > +
> > +static struct xsk_umem_info *xsk_configure_umem(void *buffer, uint64_t
> size)
> > +{
> > +    struct xsk_umem_info *umem;
> > +    int ret;
> > +    int i;
> > +
> > +    umem = xcalloc(1, sizeof(*umem));
> > +    if (!umem) {
> > +        VLOG_FATAL("xsk config umem failed (%s)", ovs_strerror(errno));
>
> xcalloc can't fail.
>
> > +    }
> > +
> > +    ret = xsk_umem__create(&umem->umem, buffer, size, &umem->fq,
> &umem->cq,
> > +                           NULL);
> > +
> > +    if (ret) {
> > +        VLOG_FATAL("xsk umem create failed (%s) mode: %s",
> > +            ovs_strerror(errno),
> > +            opt_xdp_bind_flags == XDP_COPY ? "SKB": "DRV");
>
> Why so FATAL? Can we just return NULL and fail the
> netdev_linux_rxq_construct?
>

I wasn't carefully thinking about the error handling cases and
I assumed either all configuration works or any one fails then fails all.
Will fix it in next version.


> > +    }
> > +
> > +    umem->buffer = buffer;
> > +
> > +    /* set-up umem pool */
> > +    umem_pool_init(&umem->mpool, NUM_FRAMES);
> > +
> > +    for (i = NUM_FRAMES - 1; i >= 0; i--) {
> > +        struct umem_elem *elem;
> > +
> > +        elem = ALIGNED_CAST(struct umem_elem *,
> > +                            (char *)umem->buffer + i * FRAME_SIZE);
> > +        umem_elem_push(&umem->mpool, elem);
> > +    }
> > +
> > +    /* set-up metadata */
> > +    xpacket_pool_init(&umem->xpool, NUM_FRAMES);
> > +
> > +    VLOG_DBG("%s xpacket pool from %p to %p", __func__,
> > +              umem->xpool.array,
> > +              (char *)umem->xpool.array +
> > +              NUM_FRAMES * sizeof(struct dp_packet_afxdp));
> > +
> > +    for (i = NUM_FRAMES - 1; i >= 0; i--) {
> > +        struct dp_packet_afxdp *xpacket;
> > +        struct dp_packet *packet;
> > +
> > +        xpacket = UMEM2XPKT(umem->xpool.array, i);
> > +        xpacket->mpool = &umem->mpool;
> > +
> > +        packet = &xpacket->packet;
> > +        packet->source = DPBUF_AFXDP;
> > +    }
> > +
> > +    return umem;
> > +}
> > +
> > +static struct xsk_socket_info *
> > +xsk_configure_socket(struct xsk_umem_info *umem, uint32_t ifindex,
> > +                     uint32_t queue_id)
> > +{
> > +    struct xsk_socket_config cfg;
> > +    struct xsk_socket_info *xsk;
> > +    char devname[IF_NAMESIZE];
> > +    uint32_t idx;
> > +    int ret;
> > +    int i;
> > +
> > +    xsk = xcalloc(1, sizeof(*xsk));
> > +    if (!xsk) {
> > +        VLOG_FATAL("xsk calloc failed (%s)", ovs_strerror(errno));
>
> xcalloc can't fail.
>
> > +    }
> > +
> > +    xsk->umem = umem;
> > +    cfg.rx_size = CONS_NUM_DESCS;
> > +    cfg.tx_size = PROD_NUM_DESCS;
> > +    cfg.libbpf_flags = 0;
> > +    cfg.xdp_flags = opt_xdp_flags;
> > +    cfg.bind_flags = opt_xdp_bind_flags;
> > +
> > +    if (if_indextoname(ifindex, devname) == NULL) {
> > +        VLOG_FATAL("ifindex %d devname failed (%s)",
> > +                   ifindex, ovs_strerror(errno));
>
> Every little misconfiguration will lead to aborting. It's probably OK
> for the experimantal feature, but I don't like this.
>
> > +    }
> > +
> > +    ret = xsk_socket__create(&xsk->xsk, devname, queue_id, umem->umem,
> > +                             &xsk->rx, &xsk->tx, &cfg);
> > +    if (ret) {
> > +        VLOG_FATAL("xsk_socket_create failed (%s) mode: %s qid: %d",
> > +                   ovs_strerror(errno),
> > +                   opt_xdp_bind_flags == XDP_COPY ? "SKB": "DRV",
> > +                   queue_id);
> > +    }
> > +
> > +    /* make sure the XDP program is there */
> > +    ret = bpf_get_link_xdp_id(ifindex, &prog_id, opt_xdp_flags);
> > +    if (ret) {
> > +        VLOG_FATAL("get XDP prog ID failed (%s)", ovs_strerror(errno));
> > +    }
> > +
> > +    ret = xsk_ring_prod__reserve(&xsk->umem->fq,
> > +                                 PROD_NUM_DESCS,
> > +                                 &idx);
> > +    if (ret != PROD_NUM_DESCS) {
> > +        VLOG_FATAL("fq set-up failed (%s)", ovs_strerror(errno));
> > +    }
> > +
> > +    for (i = 0;
> > +         i < PROD_NUM_DESCS * FRAME_SIZE;
> > +         i += FRAME_SIZE) {
> > +        struct umem_elem *elem;
> > +        uint64_t addr;
> > +
> > +        elem = umem_elem_pop(&xsk->umem->mpool);
> > +        addr = UMEM2DESC(elem, xsk->umem->buffer);
> > +
> > +        *xsk_ring_prod__fill_addr(&xsk->umem->fq, idx++) = addr;
> > +    }
> > +
> > +    xsk_ring_prod__submit(&xsk->umem->fq,
> > +                          PROD_NUM_DESCS);
> > +    return xsk;
> > +}
> > +
> > +struct xsk_socket_info *
> > +xsk_configure(int ifindex, int xdp_queue_id)
> > +{
> > +    struct xsk_socket_info *xsk;
> > +    struct xsk_umem_info *umem;
> > +    void *bufs;
> > +    int ret;
> > +
> > +    ret = posix_memalign(&bufs, getpagesize(),
> > +                         NUM_FRAMES * FRAME_SIZE);
>
> In the future we'll need to use HAVE_POSIX_MEMALIGN, probably.
>
> Do we need to clear the allocated memory?
>
> Good point. Need to add free() at xsk_destroy().


> > +    ovs_assert(!ret);
> > +
> > +    /* Create sockets... */
> > +    umem = xsk_configure_umem(bufs,
> > +                              NUM_FRAMES * FRAME_SIZE);
> > +    xsk = xsk_configure_socket(umem, ifindex, xdp_queue_id);
> > +    return xsk;
> > +}
> > +
> > +static void OVS_UNUSED vlog_hex_dump(const void *buf, size_t count)
> > +{
> > +    struct ds ds = DS_EMPTY_INITIALIZER;
> > +    ds_put_hex_dump(&ds, buf, count, 0, false);
> > +    VLOG_DBG_RL(&rl, "%s", ds_cstr(&ds));
> > +    ds_destroy(&ds);
> > +}
> > +
> > +void
> > +xsk_destroy(struct xsk_socket_info *xsk)
> > +{
> > +    struct xsk_umem *umem;
> > +
> > +    if (!xsk) {
> > +        return;
> > +    }
> > +
> > +    umem = xsk->umem->umem;
> > +    xsk_socket__delete(xsk->xsk);
> > +    (void)xsk_umem__delete(umem);
> > +
> > +    /* cleanup umem pool */
> > +    umem_pool_cleanup(&xsk->umem->mpool);
> > +
> > +    /* cleanup metadata pool */
> > +    xpacket_pool_cleanup(&xsk->umem->xpool);
> > +}
> > +
> > +static inline void OVS_UNUSED
> > +print_xsk_stat(struct xsk_socket_info *xsk OVS_UNUSED) {
> > +    struct xdp_statistics stat;
> > +    socklen_t optlen;
> > +
> > +    optlen = sizeof(stat);
>
> please don't paranthesize the argument of sizeof if it's name of variable.
>
> OK Thanks.


> > +    ovs_assert(getsockopt(xsk_socket__fd(xsk->xsk), SOL_XDP,
> XDP_STATISTICS,
> > +                &stat, &optlen) == 0);
> > +
> > +    VLOG_DBG_RL(&rl, "rx dropped %llu, rx_invalid %llu, tx_invalid
> %llu",
> > +                     stat.rx_dropped,
> > +                     stat.rx_invalid_descs,
> > +                     stat.tx_invalid_descs);
> > +}
> > +
> > +int
> > +netdev_afxdp_set_config(struct netdev *netdev, const struct smap *args,
> > +                        char **errp OVS_UNUSED)
> > +{
> > +    const char *xdpmode;
> > +    int new_n_rxq;
> > +
> > +    /* TODO: add mutex lock */
> > +    new_n_rxq = MAX(smap_get_int(args, "n_rxq", NR_QUEUE), 1);
> > +
> > +    if (netdev->n_rxq != new_n_rxq) {
> > +
> > +        if (new_n_rxq > MAX_XSKQ) {
> > +            VLOG_WARN("set n_rxq %d too large", new_n_rxq);
> > +            goto out;
>
> Just return EINVAL.
>

OK


>
> > +        }
> > +
> > +        netdev->n_rxq = new_n_rxq;
>
> This is wrong. You must not update netdev->n_rxq here. This should
> be done on reconfiguration.
>

Good point. Thanks.
Will follow the way netdev-dpdk.c to fix it.


>
> > +        VLOG_INFO("set AF_XDP device %s to %d n_rxq", netdev->name,
> new_n_rxq);
> > +        netdev_request_reconfigure(netdev);
> > +    }
> > +
> > +    xdpmode = smap_get(args, "xdpmode");
> > +    if (xdpmode && strncmp(xdpmode, "drv", 3) == 0) {
> > +        if (opt_xdp_bind_flags != XDP_ZEROCOPY) {
> > +            opt_xdp_bind_flags = XDP_ZEROCOPY;
> > +            opt_xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST |
> XDP_FLAGS_DRV_MODE;
> > +        }
> > +        VLOG_INFO("AF_XDP device %s in ZC driver mode", netdev->name);
> > +    } else {
> > +        opt_xdp_bind_flags = XDP_COPY;
> > +        opt_xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST |
> XDP_FLAGS_SKB_MODE;
> > +        VLOG_INFO("AF_XDP device %s in SKB mode", netdev->name);
> > +    }
>
> Looks like changing "xdpmode" while port already added will
> lead to incorrect work. You, probably, need to forbid this or
> prepare the proper reconfiguration process.
>

Right, to enable mode change at this point, I should unbind the
xdp socket and re-create a new one.
I will handle this later.

>
> > +
> > +out:
> > +    return 0;
> > +}
> > +
> > +int
> > +netdev_afxdp_get_config(const struct netdev *netdev, struct smap *args)
> > +{
> > +    /* TODO: add mutex lock */
> > +    smap_add_format(args, "n_rxq", "%d", netdev->n_rxq);
> > +    smap_add_format(args, "xdpmode", "%s",
> > +        opt_xdp_bind_flags == XDP_ZEROCOPY ? "drv" : "skb");
> > +
> > +    return 0;
> > +}
> > +
> > +int
> > +netdev_afxdp_get_numa_id(const struct netdev *netdev)
> > +{
> > +    /* FIXME: Get netdev's PCIe device ID, then find
> > +     * its NUMA node id.
> > +     */
> > +    VLOG_INFO("FIXME: Device %s always use numa id 0", netdev->name);
> > +    return 0;
> > +}
> > +
> > +void
> > +xsk_remove_xdp_program(uint32_t ifindex)
> > +{
> > +    uint32_t curr_prog_id = 0;
> > +
> > +    /* remove_xdp_program() */
> > +    if (bpf_get_link_xdp_id(ifindex, &curr_prog_id, opt_xdp_flags)) {
> > +        bpf_set_link_xdp_fd(ifindex, -1, opt_xdp_flags);
> > +    }
> > +    if (prog_id == curr_prog_id) {
> > +        bpf_set_link_xdp_fd(ifindex, -1, opt_xdp_flags);
> > +    } else if (!curr_prog_id) {
> > +        VLOG_WARN("couldn't find a prog id on a given interface");
> > +    } else {
> > +        VLOG_WARN("program on interface changed, not removing");
> > +    }
> > +}
> > +
> > +/* Receive packet from AF_XDP socket */
> > +int
> > +netdev_linux_rxq_xsk(struct xsk_socket_info *xsk,
> > +                     struct dp_packet_batch *batch)
> > +{
> > +    unsigned int rcvd, i;
> > +    uint32_t idx_rx = 0, idx_fq = 0;
> > +    int ret = 0;
> > +
> > +    /* See if there is any packet on RX queue,
> > +     * if yes, idx_rx is the index having the packet.
> > +     */
> > +    rcvd = xsk_ring_cons__peek(&xsk->rx, BATCH_SIZE, &idx_rx);
> > +    if (!rcvd) {
> > +        return 0;
> > +    }
> > +
> > +    /* Form a dp_packet batch from descriptor in RX queue */
> > +    for (i = 0; i < rcvd; i++) {
> > +        uint64_t addr = xsk_ring_cons__rx_desc(&xsk->rx, idx_rx)->addr;
> > +        uint32_t len = xsk_ring_cons__rx_desc(&xsk->rx, idx_rx)->len;
> > +        char *pkt = xsk_umem__get_data(xsk->umem->buffer, addr);
> > +        uint64_t index;
> > +
> > +        struct dp_packet_afxdp *xpacket;
> > +        struct dp_packet *packet;
> > +
> > +        index = addr >> FRAME_SHIFT;
> > +        xpacket = UMEM2XPKT(xsk->umem->xpool.array, index);
> > +
> > +        packet = &xpacket->packet;
> > +        xpacket->mpool = &xsk->umem->mpool;
> > +
> > +        if (packet->source != DPBUF_AFXDP) {
> > +            /* FIXME: might be a bug */
>
> Need to log something here. Rate-limited.
>

OK!

>
> > +            continue;
> > +        }
> > +
> > +        /* Initialize the struct dp_packet */
> > +        if (opt_xdp_bind_flags == XDP_ZEROCOPY) {
> > +            dp_packet_set_base(packet, pkt - FRAME_HEADROOM);
> > +        } else {
> > +            /* SKB mode */
> > +            dp_packet_set_base(packet, pkt);
> > +        }
> > +        dp_packet_set_data(packet, pkt);
> > +        dp_packet_set_size(packet, len);
> > +
> > +        /* Add packet into batch, increase batch->count */
> > +        dp_packet_batch_add(batch, packet);
> > +
> > +        idx_rx++;
> > +    }
> > +
> > +    /* We've consume rcvd packets in RX, now re-fill the
> > +     * same number back to FILL queue.
> > +     */
> > +    for (i = 0; i < rcvd; i++) {
> > +        uint64_t index;
> > +        struct umem_elem *elem;
> > +
> > +        ret = xsk_ring_prod__reserve(&xsk->umem->fq, 1, &idx_fq);
> > +        while (ret == 0) {
> > +            /* The FILL queue is full, so retry. (or skip)? */
> > +            ret = xsk_ring_prod__reserve(&xsk->umem->fq, 1, &idx_fq);
> > +        }
> > +
> > +        /* Get one free umem, program it into FILL queue */
> > +        elem = umem_elem_pop(&xsk->umem->mpool);
> > +        index = (uint64_t)((char *)elem - (char *)xsk->umem->buffer);
> > +        ovs_assert((index & FRAME_SHIFT_MASK) == 0);
> > +        *xsk_ring_prod__fill_addr(&xsk->umem->fq, idx_fq) = index;
> > +
> > +        idx_fq++;
> > +    }
> > +    xsk_ring_prod__submit(&xsk->umem->fq, rcvd);
> > +
> > +    /* Release the RX queue */
> > +    xsk_ring_cons__release(&xsk->rx, rcvd);
> > +    xsk->rx_npkts += rcvd;
> > +
> > +#ifdef AFXDP_DEBUG
> > +    print_xsk_stat(xsk);
> > +#endif
> > +    return 0;
> > +}
> > +
> > +static void kick_tx(struct xsk_socket_info *xsk)
> > +{
> > +    int ret;
> > +
> > +    ret = sendto(xsk_socket__fd(xsk->xsk), NULL, 0, MSG_DONTWAIT, NULL,
> 0);
> > +    if (ret >= 0 || errno == ENOBUFS || errno == EAGAIN || errno ==
> EBUSY) {
> > +        return;
> > +    }
> > +}
> > +
> > +int
> > +netdev_linux_afxdp_batch_send(struct xsk_socket_info *xsk,
> > +                              struct dp_packet_batch *batch)
> > +{
> > +    uint32_t tx_done, idx_cq = 0;
> > +    struct dp_packet *packet;
> > +    uint32_t idx;
> > +    int j;
> > +
> > +    /* Make sure we have enough TX descs */
> > +    if (xsk_ring_prod__reserve(&xsk->tx, batch->count, &idx) == 0) {
> > +        return -EAGAIN;
> > +    }
> > +
> > +    DP_PACKET_BATCH_FOR_EACH (i, packet, batch) {
> > +        struct dp_packet_afxdp *xpacket;
> > +        struct umem_elem *elem;
> > +        uint64_t index;
> > +
> > +        elem = umem_elem_pop(&xsk->umem->mpool);
> > +        if (!elem) {
> > +            return -EAGAIN;
> > +        }
> > +
> > +        memcpy(elem, dp_packet_data(packet), dp_packet_size(packet));
> > +
> > +        index = (uint64_t)((char *)elem - (char *)xsk->umem->buffer);
> > +        xsk_ring_prod__tx_desc(&xsk->tx, idx + i)->addr = index;
> > +        xsk_ring_prod__tx_desc(&xsk->tx, idx + i)->len
> > +            = dp_packet_size(packet);
> > +
> > +        if (packet->source == DPBUF_AFXDP) {
> > +            xpacket = dp_packet_cast_afxdp(packet);
> > +            umem_elem_push(xpacket->mpool, dp_packet_base(packet));
> > +             /* Avoid freeing it twice at dp_packet_uninit */
> > +            xpacket->mpool = NULL;
>
> Why you're freeing packets here? 'netdev_linux_send' will do that for you.
>

You're right. Will move out.


>
> > +        }
> > +    }
> > +    xsk_ring_prod__submit(&xsk->tx, batch->count);
> > +    xsk->outstanding_tx += batch->count;
> > +
> > +retry:
> > +    kick_tx(xsk);
> > +
> > +    /* Process CQ */
>
> Maybe it's better to process CQ on rx ?

It's unknown when we'll be here next time, but we'll definitely
> call rx function soon.
>
I think it's OK here.
We will have entries in CQ only when issuing TX.
So processing CQ here make sure when the above TX is done,
there is enough entries in CQ for TX to finish.


>
> > +    tx_done = xsk_ring_cons__peek(&xsk->umem->cq, batch->count,
> &idx_cq);
> > +    if (tx_done > 0) {
> > +        xsk->outstanding_tx -= tx_done;
> > +        xsk->tx_npkts += tx_done;
> > +    }
> > +
> > +    /* Recycle back to umem pool */
> > +    for (j = 0; j < tx_done; j++) {
> > +        struct umem_elem *elem;
> > +        uint64_t addr;
> > +
> > +        addr = *xsk_ring_cons__comp_addr(&xsk->umem->cq, idx_cq++);
> > +
> > +        elem = ALIGNED_CAST(struct umem_elem *,
> > +                            (char *)xsk->umem->buffer + addr);
> > +        umem_elem_push(&xsk->umem->mpool, elem);
> > +    }
> > +    xsk_ring_cons__release(&xsk->umem->cq, tx_done);
> > +
> > +    if (xsk->outstanding_tx > PROD_NUM_DESCS - (PROD_NUM_DESCS >> 2)) {
> > +        /* If there are still a lot not transmitted,
> > +         * try harder.
> > +         */
> > +        goto retry;
> > +    }
> > +
> > +    return 0;
> > +}
> > +
> > +#else
> > +#include "openvswitch/compiler.h"
> > +#include "netdev-afxdp.h"
> > +
> > +struct xsk_socket_info *
> > +xsk_configure(int ifindex OVS_UNUSED, int xdp_queue_id OVS_UNUSED)
> > +{
> > +    return NULL;
> > +}
> > +
> > +void
> > +xsk_destroy(struct xsk_socket_info *xsk OVS_UNUSED)
> > +{
> > +}
> > +
> > +int
> > +netdev_linux_rxq_xsk(struct xsk_socket_info *xsk OVS_UNUSED,
> > +                     struct dp_packet_batch *batch OVS_UNUSED)
> > +{
> > +    return 0;
> > +}
> > +
> > +int
> > +netdev_linux_afxdp_batch_send(struct xsk_socket_info *xsk OVS_UNUSED,
> > +                              struct dp_packet_batch *batch OVS_UNUSED)
> > +{
> > +    return 0;
> > +}
> > +
> > +int
> > +netdev_afxdp_set_config(struct netdev *netdev OVS_UNUSED,
> > +                        const struct smap *args OVS_UNUSED,
> > +                        char **errp OVS_UNUSED)
> > +{
> > +    return 0;
> > +}
> > +
> > +int
> > +netdev_afxdp_get_config(const struct netdev *netdev OVS_UNUSED,
> > +                        struct smap *args OVS_UNUSED)
> > +{
> > +    return 0;
> > +}
> > +
> > +int
> > +netdev_afxdp_get_numa_id(const struct netdev *netdev OVS_UNUSED)
> > +{
> > +    return 0;
> > +}
> > +#endif
> > diff --git a/lib/netdev-afxdp.h b/lib/netdev-afxdp.h
> > new file mode 100644
> > index 000000000000..ea05612a7c0f
> > --- /dev/null
> > +++ b/lib/netdev-afxdp.h
> > @@ -0,0 +1,47 @@
> > +/*
> > + * Copyright (c) 2018 Nicira, Inc.
> > + *
> > + * Licensed under the Apache License, Version 2.0 (the "License");
> > + * you may not use this file except in compliance with the License.
> > + * You may obtain a copy of the License at:
> > + *
> > + *     http://www.apache.org/licenses/LICENSE-2.0
> > + *
> > + * Unless required by applicable law or agreed to in writing, software
> > + * distributed under the License is distributed on an "AS IS" BASIS,
> > + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
> implied.
> > + * See the License for the specific language governing permissions and
> > + * limitations under the License.
> > + */
> > +
> > +#ifndef NETDEV_AFXDP_H
> > +#define NETDEV_AFXDP_H 1
> > +
> > +#include <stdint.h>
> > +#include <stdbool.h>
> > +
> > +/* These functions are Linux AF_XDP specific, so they should be used
> directly
> > + * only by Linux-specific code. */
> > +#define MAX_XSKQ 16
> > +struct netdev;
> > +struct xsk_socket_info;
> > +struct xdp_umem;
> > +struct dp_packet_batch;
> > +struct smap;
> > +
> > +struct xsk_socket_info *xsk_configure(int ifindex, int xdp_queue_id);
> > +void xsk_destroy(struct xsk_socket_info *xsk);
> > +
> > +int netdev_linux_rxq_xsk(struct xsk_socket_info *xsk,
> > +                         struct dp_packet_batch *batch);
> > +
> > +int netdev_linux_afxdp_batch_send(struct xsk_socket_info *xsk,
> > +                                  struct dp_packet_batch *batch);
> > +
> > +void xsk_remove_xdp_program(uint32_t ifindex);
> > +int netdev_afxdp_set_config(struct netdev *netdev, const struct smap
> *args,
> > +                            char **errp);
> > +int netdev_afxdp_get_config(const struct netdev *netdev, struct smap
> *args);
> > +int netdev_afxdp_get_numa_id(const struct netdev *netdev);
> > +
> > +#endif /* netdev-afxdp.h */
> > diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
> > index f75d73fd39f8..337760ca3333 100644
> > --- a/lib/netdev-linux.c
> > +++ b/lib/netdev-linux.c
> > @@ -75,6 +75,7 @@
> >  #include "unaligned.h"
> >  #include "openvswitch/vlog.h"
> >  #include "util.h"
> > +#include "netdev-afxdp.h"
> >
> >  VLOG_DEFINE_THIS_MODULE(netdev_linux);
> >
> > @@ -531,6 +532,7 @@ struct netdev_linux {
> >
> >      /* LAG information. */
> >      bool is_lag_master;         /* True if the netdev is a LAG master.
> */
> > +    struct xsk_socket_info *xsk[MAX_XSKQ]; /* af_xdp socket */
> >  };
> >
> >  struct netdev_rxq_linux {
> > @@ -580,12 +582,18 @@ is_netdev_linux_class(const struct netdev_class
> *netdev_class)
> >  }
> >
> >  static bool
> > +is_afxdp_netdev(const struct netdev *netdev)
> > +{
> > +    return netdev_get_class(netdev) == &netdev_afxdp_class;
> > +}
> > +
> > +static bool
> >  is_tap_netdev(const struct netdev *netdev)
> >  {
> >      return netdev_get_class(netdev) == &netdev_tap_class;
> >  }
> >
> > -static struct netdev_linux *
> > +struct netdev_linux *
> >  netdev_linux_cast(const struct netdev *netdev)
> >  {
> >      ovs_assert(is_netdev_linux_class(netdev_get_class(netdev)));
> > @@ -1084,6 +1092,25 @@ netdev_linux_destruct(struct netdev *netdev_)
> >          atomic_count_dec(&miimon_cnt);
> >      }
> >
> > +#if HAVE_AF_XDP
> > +    if (is_afxdp_netdev(netdev_)) {
> > +        int ifindex;
> > +        int ret, i;
> > +
> > +        ret = get_ifindex(netdev_, &ifindex);
> > +        if (ret) {
> > +            VLOG_ERR("get ifindex error");
> > +        } else {
> > +            for (i = 0; i < MAX_XSKQ; i++) {
> > +                if (netdev->xsk[i]) {
> > +                    VLOG_INFO("destroy xsk[%d]", i);
> > +                    xsk_destroy(netdev->xsk[i]);
> > +                }
> > +            }
> > +            xsk_remove_xdp_program(ifindex);
> > +        }
> > +    }
> > +#endif
> >      ovs_mutex_destroy(&netdev->mutex);
> >  }
> >
> > @@ -1113,6 +1140,32 @@ netdev_linux_rxq_construct(struct netdev_rxq
> *rxq_)
> >      rx->is_tap = is_tap_netdev(netdev_);
> >      if (rx->is_tap) {
> >          rx->fd = netdev->tap_fd;
> > +    } else if (is_afxdp_netdev(netdev_)) {
> > +#if HAVE_AF_XDP
> > +        struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
> > +        int ifindex;
> > +        int xdp_queue_id = rxq_->queue_id;
> > +        struct xsk_socket_info *xsk;
> > +
> > +        if (setrlimit(RLIMIT_MEMLOCK, &r)) {
> > +            VLOG_ERR("ERROR: setrlimit(RLIMIT_MEMLOCK) \"%s\"\n",
> > +                      ovs_strerror(errno));
> > +            ovs_assert(0);
> > +        }
> > +
> > +        VLOG_DBG("%s: %s: queue=%d configuring xdp sock",
> > +                  __func__, netdev_->name, xdp_queue_id);
> > +
> > +        /* Get ethernet device index. */
> > +        error = get_ifindex(&netdev->up, &ifindex);
> > +        if (error) {
> > +            goto error;
> > +        }
> > +
> > +        xsk = xsk_configure(ifindex, xdp_queue_id);
> > +        netdev->xsk[xdp_queue_id] = xsk;
> > +        rx->fd = xsk_socket__fd(xsk->xsk); /* for netdev layer to poll
> */
> > +#endif
> >      } else {
> >          struct sockaddr_ll sll;
> >          int ifindex, val;
> > @@ -1318,9 +1371,16 @@ netdev_linux_rxq_recv(struct netdev_rxq *rxq_,
> struct dp_packet_batch *batch,
> >  {
> >      struct netdev_rxq_linux *rx = netdev_rxq_linux_cast(rxq_);
> >      struct netdev *netdev = rx->up.netdev;
> > -    struct dp_packet *buffer;
> > +    struct dp_packet *buffer = NULL;
> >      ssize_t retval;
> >      int mtu;
> > +    struct netdev_linux *netdev_ = netdev_linux_cast(netdev);
> > +
> > +    if (is_afxdp_netdev(netdev)) {
> > +        int qid = rxq_->queue_id;
> > +
> > +        return netdev_linux_rxq_xsk(netdev_->xsk[qid], batch);
> > +    }
> >
> >      if (netdev_linux_get_mtu__(netdev_linux_cast(netdev), &mtu)) {
> >          mtu = ETH_PAYLOAD_MAX;
> > @@ -1329,6 +1389,7 @@ netdev_linux_rxq_recv(struct netdev_rxq *rxq_,
> struct dp_packet_batch *batch,
> >      /* Assume Ethernet port. No need to set packet_type. */
> >      buffer = dp_packet_new_with_headroom(VLAN_ETH_HEADER_LEN + mtu,
> >                                             DP_NETDEV_HEADROOM);
> > +
> >      retval = (rx->is_tap
> >                ? netdev_linux_rxq_recv_tap(rx->fd, buffer)
> >                : netdev_linux_rxq_recv_sock(rx->fd, buffer));
> > @@ -1473,14 +1534,15 @@ netdev_linux_tap_batch_send(struct netdev
> *netdev_,
> >   * The kernel maintains a packet transmission queue, so the caller is
> not
> >   * expected to do additional queuing of packets. */
> >  static int
> > -netdev_linux_send(struct netdev *netdev_, int qid OVS_UNUSED,
> > +netdev_linux_send(struct netdev *netdev_, int qid,
> >                    struct dp_packet_batch *batch,
> >                    bool concurrent_txq OVS_UNUSED)
> >  {
> >      int error = 0;
> >      int sock = 0;
> >
> > -    if (!is_tap_netdev(netdev_)) {
> > +    if (!is_tap_netdev(netdev_) &&
> > +        !is_afxdp_netdev(netdev_)) {
> >          if (netdev_linux_netnsid_is_remote(netdev_linux_cast(netdev_)))
> {
> >              error = EOPNOTSUPP;
> >              goto free_batch;
> > @@ -1499,6 +1561,10 @@ netdev_linux_send(struct netdev *netdev_, int qid
> OVS_UNUSED,
> >          }
> >
> >          error = netdev_linux_sock_batch_send(sock, ifindex, batch);
> > +    } else if (is_afxdp_netdev(netdev_)) {
> > +        struct netdev_linux *netdev = netdev_linux_cast(netdev_);
> > +
> > +        error = netdev_linux_afxdp_batch_send(netdev->xsk[qid], batch);
> >      } else {
> >          error = netdev_linux_tap_batch_send(netdev_, batch);
> >      }
> > @@ -3323,6 +3389,7 @@ const struct netdev_class netdev_linux_class = {
> >      NETDEV_LINUX_CLASS_COMMON,
> >      LINUX_FLOW_OFFLOAD_API,
> >      .type = "system",
> > +    .is_pmd = false,
> >      .construct = netdev_linux_construct,
> >      .get_stats = netdev_linux_get_stats,
> >      .get_features = netdev_linux_get_features,
> > @@ -3333,6 +3400,7 @@ const struct netdev_class netdev_linux_class = {
> >  const struct netdev_class netdev_tap_class = {
> >      NETDEV_LINUX_CLASS_COMMON,
> >      .type = "tap",
> > +    .is_pmd = false,
> >      .construct = netdev_linux_construct_tap,
> >      .get_stats = netdev_tap_get_stats,
> >      .get_features = netdev_linux_get_features,
> > @@ -3343,10 +3411,23 @@ const struct netdev_class netdev_internal_class
> = {
> >      NETDEV_LINUX_CLASS_COMMON,
> >      LINUX_FLOW_OFFLOAD_API,
> >      .type = "internal",
> > +    .is_pmd = false,
> >      .construct = netdev_linux_construct,
> >      .get_stats = netdev_internal_get_stats,
> >      .get_status = netdev_internal_get_status,
> >  };
> > +
> > +const struct netdev_class netdev_afxdp_class = {
> > +    NETDEV_LINUX_CLASS_COMMON,
> > +    .type = "afxdp",
> > +    .is_pmd = true,
> > +    .construct = netdev_linux_construct,
> > +    .get_stats = netdev_linux_get_stats,
> > +    .get_status = netdev_linux_get_status,
> > +    .set_config = netdev_afxdp_set_config,
> > +    .get_config = netdev_afxdp_get_config,
> > +    .get_numa_id = netdev_afxdp_get_numa_id,
> > +};
> >
> >
> >  #define CODEL_N_QUEUES 0x0000
> > diff --git a/lib/netdev-linux.h b/lib/netdev-linux.h
> > index 17ca9120168a..afcb20ee8d0a 100644
> > --- a/lib/netdev-linux.h
> > +++ b/lib/netdev-linux.h
> > @@ -28,6 +28,7 @@ struct netdev;
> >  int netdev_linux_ethtool_set_flag(struct netdev *netdev, uint32_t flag,
> >                                    const char *flag_name, bool enable);
> >  int linux_get_ifindex(const char *netdev_name);
> > +struct netdev_linux *netdev_linux_cast(const struct netdev *netdev);
> >
> >  #define LINUX_FLOW_OFFLOAD_API                          \
> >     .flow_flush = netdev_tc_flow_flush,                  \
> > diff --git a/lib/netdev-provider.h b/lib/netdev-provider.h
> > index fb0c27e6e8e8..5bf041316503 100644
> > --- a/lib/netdev-provider.h
> > +++ b/lib/netdev-provider.h
> > @@ -902,6 +902,7 @@ extern const struct netdev_class netdev_linux_class;
> >  #endif
> >  extern const struct netdev_class netdev_internal_class;
> >  extern const struct netdev_class netdev_tap_class;
> > +extern const struct netdev_class netdev_afxdp_class;
> >
> >  #ifdef  __cplusplus
> >  }
> > diff --git a/lib/netdev.c b/lib/netdev.c
> > index 7d7ecf6f0946..c30016b34033 100644
> > --- a/lib/netdev.c
> > +++ b/lib/netdev.c
> > @@ -145,6 +145,7 @@ netdev_initialize(void)
> >          netdev_register_provider(&netdev_linux_class);
> >          netdev_register_provider(&netdev_internal_class);
> >          netdev_register_provider(&netdev_tap_class);
> > +        netdev_register_provider(&netdev_afxdp_class);
> >          netdev_vport_tunnel_register();
> >  #endif
> >  #if defined(__FreeBSD__) || defined(__NetBSD__)
> > diff --git a/lib/xdpsock.c b/lib/xdpsock.c
> > new file mode 100644
> > index 000000000000..f9fe94b9e36a
> > --- /dev/null
> > +++ b/lib/xdpsock.c
> > @@ -0,0 +1,210 @@
> > +/*
> > + * Copyright (c) 2018, 2019 Nicira, Inc.
> > + *
> > + * Licensed under the Apache License, Version 2.0 (the "License");
> > + * you may not use this file except in compliance with the License.
> > + * You may obtain a copy of the License at:
> > + *
> > + *     http://www.apache.org/licenses/LICENSE-2.0
> > + *
> > + * Unless required by applicable law or agreed to in writing, software
> > + * distributed under the License is distributed on an "AS IS" BASIS,
> > + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
> implied.
> > + * See the License for the specific language governing permissions and
> > + * limitations under the License.
> > + */
> > +#include <config.h>
> > +#include <ctype.h>
> > +#include <errno.h>
> > +#include <fcntl.h>
> > +#include <stdarg.h>
> > +#include <stdlib.h>
> > +#include <string.h>
> > +#include <sys/stat.h>
> > +#include <sys/types.h>
> > +#include <syslog.h>
> > +#include <time.h>
> > +#include <unistd.h>
> > +#include "openvswitch/vlog.h"
> > +#include "async-append.h"
> > +#include "coverage.h"
> > +#include "dirs.h"
> > +#include "ovs-thread.h"
> > +#include "sat-math.h"
> > +#include "socket-util.h"
> > +#include "svec.h"
> > +#include "syslog-direct.h"
> > +#include "syslog-libc.h"
> > +#include "syslog-provider.h"
> > +#include "timeval.h"
> > +#include "unixctl.h"
> > +#include "util.h"
> > +#include "ovs-atomic.h"
> > +#include "openvswitch/compiler.h"
> > +#include "dp-packet.h"
> > +
> > +#ifdef HAVE_AF_XDP
> > +#include "xdpsock.h"
> > +
> > +static inline void ovs_spinlock_init(ovs_spinlock_t *sl)
> > +{
> > +    sl->locked = 0;
> > +}
> > +
> > +static inline void ovs_spin_lock(ovs_spinlock_t *sl)
> > +{
> > +    int exp = 0;
> > +
> > +    while (!__atomic_compare_exchange_n(&sl->locked, &exp, 1, 0,
> > +                __ATOMIC_ACQUIRE, __ATOMIC_RELAXED)) {
> > +        while (__atomic_load_n(&sl->locked, __ATOMIC_RELAXED)) {
>
>
Thanks, I will fix them in next version.


>
> These atomics are compiler specific. Please use:
>
>     while (!atomic_compare_exchange_strong_explicit(&sl->locked, &exp, 1,
>                                                     memory_order_acquire,
>                                                     memory_order_relaxed))
> {
>         locked = 1;
>         while (locked) {
>             atomic_read_relaxed(&sl->locked, &locked);
>         }
>         exp = 0;
>     }
>
> > +            ;
> > +        }
> > +        exp = 0;
> > +    }
> > +}
> > +
> > +static inline void ovs_spin_unlock(ovs_spinlock_t *sl)
> > +{
> > +    __atomic_store_n(&sl->locked, 0, __ATOMIC_RELEASE);
>
>     atomic_store_explicit(&sl->locked, 0, memory_order_release);
>
> > +}
> > +
> > +static inline int OVS_UNUSED ovs_spin_trylock(ovs_spinlock_t *sl)
> > +{
> > +    int exp = 0;
> > +    return __atomic_compare_exchange_n(&sl->locked, &exp, 1,
> > +              0, /* disallow spurious failure */
> > +               __ATOMIC_ACQUIRE, __ATOMIC_RELAXED);
>
>
>     return atomic_compare_exchange_strong_explicit(&sl->locked, &exp, 1,
>                                                    memory_order_acquire,
>                                                    memory_order_relaxed);
>
>
> > +}
> > +
> > +void
> > +__umem_elem_push_n(struct umem_pool *umemp OVS_UNUSED, void **addrs,
> int n)
> > +{
> > +    void *ptr;
> > +
> > +    if (OVS_UNLIKELY(umemp->index + n > umemp->size)) {
> > +        OVS_NOT_REACHED();
> > +    }
> > +
> > +    ptr = &umemp->array[umemp->index];
> > +    memcpy(ptr, addrs, n * sizeof(void *));
> > +    umemp->index += n;
> > +}
> > +
> > +inline void
> > +__umem_elem_push(struct umem_pool *umemp OVS_UNUSED, void *addr)
> > +{
> > +    umemp->array[umemp->index++] = addr;
> > +}
> > +
> > +void
> > +umem_elem_push(struct umem_pool *umemp OVS_UNUSED, void *addr)
> > +{
> > +
> > +    if (OVS_UNLIKELY(umemp->index >= umemp->size)) {
> > +        /* stack is full */
> > +        /* it's possible that one umem gets pushed twice,
> > +         * because actions=1,2,3... multiple ports?
> > +        */
> > +        OVS_NOT_REACHED();
> > +    }
> > +
> > +    ovs_assert(((uint64_t)addr & FRAME_SHIFT_MASK) == 0);
> > +
> > +    ovs_spin_lock(&umemp->mutex);
> > +    __umem_elem_push(umemp, addr);
> > +    ovs_spin_unlock(&umemp->mutex);
> > +}
> > +
> > +void
> > +__umem_elem_pop_n(struct umem_pool *umemp OVS_UNUSED, void **addrs, int
> n)
> > +{
> > +    void *ptr;
> > +
> > +    umemp->index -= n;
> > +
> > +    if (OVS_UNLIKELY(umemp->index < 0)) {
> > +        OVS_NOT_REACHED();
> > +    }
> > +
> > +    ptr = &umemp->array[umemp->index];
> > +    memcpy(addrs, ptr, n * sizeof(void *));
> > +}
> > +
> > +inline void *
> > +__umem_elem_pop(struct umem_pool *umemp OVS_UNUSED)
> > +{
> > +    return umemp->array[--umemp->index];
> > +}
> > +
> > +void *
> > +umem_elem_pop(struct umem_pool *umemp OVS_UNUSED)
> > +{
> > +    void *ptr;
> > +
> > +    ovs_spin_lock(&umemp->mutex);
> > +    ptr = __umem_elem_pop(umemp);
> > +    ovs_spin_unlock(&umemp->mutex);
> > +
> > +    return ptr;
> > +}
> > +
> > +void **
> > +__umem_pool_alloc(unsigned int size)
> > +{
> > +    void *bufs;
> > +
> > +    ovs_assert(posix_memalign(&bufs, getpagesize(),
> > +                              size * sizeof(void *)) == 0);
> > +    memset(bufs, 0, size * sizeof(void *));
> > +    return (void **)bufs;
> > +}
> > +
> > +unsigned int
> > +umem_elem_count(struct umem_pool *mpool)
> > +{
> > +    return mpool->index;
> > +}
> > +
> > +int
> > +umem_pool_init(struct umem_pool *umemp OVS_UNUSED, unsigned int size)
> > +{
> > +    umemp->array = __umem_pool_alloc(size);
> > +    if (!umemp->array) {
> > +        OVS_NOT_REACHED();
> > +    }
> > +
> > +    umemp->size = size;
> > +    umemp->index = 0;
> > +    ovs_spinlock_init(&umemp->mutex);
> > +    return 0;
> > +}
> > +
> > +void
> > +umem_pool_cleanup(struct umem_pool *umemp OVS_UNUSED)
> > +{
> > +    free(umemp->array);
> > +}
> > +
> > +/* AF_XDP metadata init/destroy */
> > +int
> > +xpacket_pool_init(struct xpacket_pool *xp, unsigned int size)
> > +{
> > +    void *bufs;
> > +
> > +    ovs_assert(posix_memalign(&bufs, getpagesize(),
> > +                              size * sizeof(struct dp_packet_afxdp)) ==
> 0);
> > +    memset(bufs, 0, size * sizeof(struct dp_packet_afxdp));
> > +
> > +    xp->array = bufs;
> > +    xp->size = size;
> > +    return 0;
> > +}
> > +
> > +void
> > +xpacket_pool_cleanup(struct xpacket_pool *xp)
> > +{
> > +    free(xp->array);
> > +}
> > +#else   /* !HAVE_AF_XDP below */
> > +#endif
> > diff --git a/lib/xdpsock.h b/lib/xdpsock.h
> > new file mode 100644
> > index 000000000000..cb64befe7dba
> > --- /dev/null
> > +++ b/lib/xdpsock.h
> > @@ -0,0 +1,133 @@
> > +/*
> > + * Copyright (c) 2018, 2019 Nicira, Inc.
> > + *
> > + * Licensed under the Apache License, Version 2.0 (the "License");
> > + * you may not use this file except in compliance with the License.
> > + * You may obtain a copy of the License at:
> > + *
> > + *     http://www.apache.org/licenses/LICENSE-2.0
> > + *
> > + * Unless required by applicable law or agreed to in writing, software
> > + * distributed under the License is distributed on an "AS IS" BASIS,
> > + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
> implied.
> > + * See the License for the specific language governing permissions and
> > + * limitations under the License.
> > + */
> > +#ifndef XDPSOCK_H
> > +#define XDPSOCK_H 1
> > +#include <errno.h>
> > +#include <getopt.h>
> > +#include <libgen.h>
> > +#include <linux/bpf.h>
> > +#include <linux/if_link.h>
> > +#include <linux/if_xdp.h>
> > +#include <linux/if_ether.h>
> > +#include <net/if.h>
> > +#include <signal.h>
> > +#include <stdbool.h>
> > +#include <stdio.h>
> > +#include <stdlib.h>
> > +#include <string.h>
> > +#include <net/ethernet.h>
> > +#include <sys/resource.h>
> > +#include <sys/socket.h>
> > +#include <sys/mman.h>
> > +#include <time.h>
> > +#include <unistd.h>
> > +#include <pthread.h>
> > +#include <locale.h>
> > +#include <sys/types.h>
> > +#include <poll.h>
> > +#include <bpf/libbpf.h>
> > +
> > +#include "ovs-atomic.h"
> > +#include "openvswitch/thread.h"
> > +
> > +/* bpf/xsk.h uses the following macros not defined in OVS,
> > + * so re-define them before include.
> > + */
> > +#define unlikely OVS_UNLIKELY
> > +#define likely OVS_LIKELY
> > +#define barrier() __asm__ __volatile__("": : :"memory")
> > +#define smp_rmb() barrier()
> > +#define smp_wmb() barrier()
>
> These barriers also x86 specific. We'll need to fix that in
> the future before removing build constraints.
>
> > +#include <bpf/xsk.h>
> > +
> > +#define FRAME_HEADROOM  XDP_PACKET_HEADROOM
> > +#define FRAME_SIZE      XSK_UMEM__DEFAULT_FRAME_SIZE
> > +#define BATCH_SIZE      NETDEV_MAX_BURST
> > +#define FRAME_SHIFT     XSK_UMEM__DEFAULT_FRAME_SHIFT
> > +#define FRAME_SHIFT_MASK    ((1<<FRAME_SHIFT)-1)
> > +
> > +#define NUM_FRAMES  1024
> > +#define PROD_NUM_DESCS 128
> > +#define CONS_NUM_DESCS 128
> > +
> > +#ifdef USE_XSK_DEFAULT
> > +#define PROD_NUM_DESCS XSK_RING_PROD__DEFAULT_NUM_DESCS
> > +#define CONS_NUM_DESCS XSK_RING_CONS__DEFAULT_NUM_DESCS
> > +#endif
> > +
> > +typedef struct {
> > +    volatile int locked;
>
> atomic_int locked;
>
> or atomic_bool.
>
> > +} ovs_spinlock_t;
> > +
> > +/* LIFO ptr_array */
> > +struct umem_pool {
> > +    int index;      /* point to top */
> > +    unsigned int size;
> > +    ovs_spinlock_t mutex;
> > +    void **array;   /* a pointer array */
> > +};
> > +
> > +/* array-based dp_packet_afxdp */
> > +struct xpacket_pool {
> > +    unsigned int size;
> > +    struct dp_packet_afxdp **array;
> > +};
> > +
> > +struct xsk_umem_info {
> > +    struct umem_pool mpool;
> > +    struct xpacket_pool xpool;
> > +    struct xsk_ring_prod fq;
> > +    struct xsk_ring_cons cq;
> > +    struct xsk_umem *umem;
> > +    void *buffer;
> > +};
> > +
> > +struct xsk_socket_info {
> > +    struct xsk_ring_cons rx;
> > +    struct xsk_ring_prod tx;
> > +    struct xsk_umem_info *umem;
> > +    struct xsk_socket *xsk;
> > +    unsigned long rx_npkts;
> > +    unsigned long tx_npkts;
> > +    unsigned long prev_rx_npkts;
> > +    unsigned long prev_tx_npkts;
> > +    uint32_t outstanding_tx;
> > +};
> > +
> > +struct umem_elem_head {
> > +    unsigned int index;
> > +    struct ovs_mutex mutex;
> > +    uint32_t n;
> > +};
> > +
> > +struct umem_elem {
> > +    struct umem_elem *next;
> > +};
> > +
> > +void __umem_elem_push(struct umem_pool *umemp, void *addr);
> > +void umem_elem_push(struct umem_pool *umemp, void *addr);
> > +void *__umem_elem_pop(struct umem_pool *umemp);
> > +void *umem_elem_pop(struct umem_pool *umemp);
> > +void **__umem_pool_alloc(unsigned int size);
> > +int umem_pool_init(struct umem_pool *umemp, unsigned int size);
> > +void umem_pool_cleanup(struct umem_pool *umemp);
> > +unsigned int umem_elem_count(struct umem_pool *mpool);
> > +void __umem_elem_pop_n(struct umem_pool *umemp, void **addrs, int n);
> > +void __umem_elem_push_n(struct umem_pool *umemp, void **addrs, int n);
> > +int xpacket_pool_init(struct xpacket_pool *xp, unsigned int size);
> > +void xpacket_pool_cleanup(struct xpacket_pool *xp);
> > +
> > +#endif
> > diff --git a/tests/automake.mk b/tests/automake.mk
> > index ea16532dd2a0..715cef9a6b3b 100644
> > --- a/tests/automake.mk
> > +++ b/tests/automake.mk
> > @@ -4,12 +4,14 @@ EXTRA_DIST += \
> >       $(SYSTEM_TESTSUITE_AT) \
> >       $(SYSTEM_KMOD_TESTSUITE_AT) \
> >       $(SYSTEM_USERSPACE_TESTSUITE_AT) \
> > +     $(SYSTEM_AFXDP_TESTSUITE_AT) \
> >       $(SYSTEM_OFFLOADS_TESTSUITE_AT) \
> >       $(SYSTEM_DPDK_TESTSUITE_AT) \
> >       $(OVSDB_CLUSTER_TESTSUITE_AT) \
> >       $(TESTSUITE) \
> >       $(SYSTEM_KMOD_TESTSUITE) \
> >       $(SYSTEM_USERSPACE_TESTSUITE) \
> > +     $(SYSTEM_AFXDP_TESTSUITE) \
> >       $(SYSTEM_OFFLOADS_TESTSUITE) \
> >       $(SYSTEM_DPDK_TESTSUITE) \
> >       $(OVSDB_CLUSTER_TESTSUITE) \
> > @@ -158,6 +160,11 @@ SYSTEM_USERSPACE_TESTSUITE_AT = \
> >       tests/system-userspace-macros.at \
> >       tests/system-userspace-packet-type-aware.at
> >
> > +SYSTEM_AFXDP_TESTSUITE_AT = \
> > +     tests/system-afxdp-testsuite.at \
> > +     tests/system-afxdp-traffic.at \
> > +     tests/system-afxdp-macros.at
> > +
> >  SYSTEM_TESTSUITE_AT = \
> >       tests/system-common-macros.at \
> >       tests/system-ovn.at \
> > @@ -182,6 +189,7 @@ TESTSUITE = $(srcdir)/tests/testsuite
> >  TESTSUITE_PATCH = $(srcdir)/tests/testsuite.patch
> >  SYSTEM_KMOD_TESTSUITE = $(srcdir)/tests/system-kmod-testsuite
> >  SYSTEM_USERSPACE_TESTSUITE = $(srcdir)/tests/system-userspace-testsuite
> > +SYSTEM_AFXDP_TESTSUITE = $(srcdir)/tests/system-afxdp-testsuite
> >  SYSTEM_OFFLOADS_TESTSUITE = $(srcdir)/tests/system-offloads-testsuite
> >  SYSTEM_DPDK_TESTSUITE = $(srcdir)/tests/system-dpdk-testsuite
> >  OVSDB_CLUSTER_TESTSUITE = $(srcdir)/tests/ovsdb-cluster-testsuite
> > @@ -315,6 +323,11 @@ check-system-userspace: all
> >       set $(SHELL) '$(SYSTEM_USERSPACE_TESTSUITE)' -C tests
> AUTOTEST_PATH='$(AUTOTEST_PATH)'; \
> >       "$$@" $(TESTSUITEFLAGS) -j1 || (test X'$(RECHECK)' = Xyes && "$$@"
> --recheck)
> >
> > +check-afxdp: all
> > +     $(MAKE) install
> > +     set $(SHELL) '$(SYSTEM_AFXDP_TESTSUITE)' -C tests
> AUTOTEST_PATH='$(AUTOTEST_PATH)' $(TESTSUITEFLAGS) -j1; \
> > +     "$$@" || (test X'$(RECHECK)' = Xyes && "$$@" --recheck)
> > +
> >  check-offloads: all
> >       set $(SHELL) '$(SYSTEM_OFFLOADS_TESTSUITE)' -C tests
> AUTOTEST_PATH='$(AUTOTEST_PATH)'; \
> >       "$$@" $(TESTSUITEFLAGS) -j1 || (test X'$(RECHECK)' = Xyes && "$$@"
> --recheck)
> > @@ -352,6 +365,10 @@ $(SYSTEM_USERSPACE_TESTSUITE): package.m4
> $(SYSTEM_TESTSUITE_AT) $(SYSTEM_USERSP
> >       $(AM_V_GEN)$(AUTOTEST) -I '$(srcdir)' -o $@.tmp $@.at
> >       $(AM_V_at)mv $@.tmp $@
> >
> > +$(SYSTEM_AFXDP_TESTSUITE): package.m4 $(SYSTEM_TESTSUITE_AT)
> $(SYSTEM_AFXDP_TESTSUITE_AT) $(COMMON_MACROS_AT)
> > +     $(AM_V_GEN)$(AUTOTEST) -I '$(srcdir)' -o $@.tmp $@.at
> > +     $(AM_V_at)mv $@.tmp $@
> > +
> >  $(SYSTEM_OFFLOADS_TESTSUITE): package.m4 $(SYSTEM_TESTSUITE_AT)
> $(SYSTEM_OFFLOADS_TESTSUITE_AT) $(COMMON_MACROS_AT)
> >       $(AM_V_GEN)$(AUTOTEST) -I '$(srcdir)' -o $@.tmp $@.at
> >       $(AM_V_at)mv $@.tmp $@
> > diff --git a/tests/system-afxdp-macros.at b/tests/system-afxdp-macros.at
> > new file mode 100644
> > index 000000000000..2c58c2d6554b
> > --- /dev/null
> > +++ b/tests/system-afxdp-macros.at
> > @@ -0,0 +1,153 @@
> > +# _ADD_BR([name])
> > +#
> > +# Expands into the proper ovs-vsctl commands to create a bridge with the
> > +# appropriate type and properties
> > +m4_define([_ADD_BR], [[add-br $1 -- set Bridge $1 datapath_type=netdev
> protocols=OpenFlow10,OpenFlow11,OpenFlow12,OpenFlow13,OpenFlow14,OpenFlow15
> fail-mode=secure ]])
> > +
> > +# OVS_TRAFFIC_VSWITCHD_START([vsctl-args], [vsctl-output], [=override])
> > +#
> > +# Creates a database and starts ovsdb-server, starts ovs-vswitchd
> > +# connected to that database, calls ovs-vsctl to create a bridge named
> > +# br0 with predictable settings, passing 'vsctl-args' as additional
> > +# commands to ovs-vsctl.  If 'vsctl-args' causes ovs-vsctl to provide
> > +# output (e.g. because it includes "create" commands) then
> 'vsctl-output'
> > +# specifies the expected output after filtering through uuidfilt.
> > +m4_define([OVS_TRAFFIC_VSWITCHD_START],
> > +  [
> > +   export OVS_PKGDATADIR=$(`pwd`)
> > +   _OVS_VSWITCHD_START([--disable-system])
> > +   AT_CHECK([ovs-vsctl -- _ADD_BR([br0]) -- $1 m4_if([$2], [], [], [|
> uuidfilt])], [0], [$2])
> > +])
> > +
> > +# OVS_TRAFFIC_VSWITCHD_STOP([WHITELIST], [extra_cmds])
> > +#
> > +# Gracefully stops ovs-vswitchd and ovsdb-server, checking their log
> files
> > +# for messages with severity WARN or higher and signaling an error if
> any
> > +# is present.  The optional WHITELIST may contain shell-quoted "sed"
> > +# commands to delete any warnings that are actually expected, e.g.:
> > +#
> > +#   OVS_TRAFFIC_VSWITCHD_STOP(["/expected error/d"])
> > +#
> > +# 'extra_cmds' are shell commands to be executed afte
> OVS_VSWITCHD_STOP() is
> > +# invoked. They can be used to perform additional cleanups such as name
> space
> > +# removal.
> > +m4_define([OVS_TRAFFIC_VSWITCHD_STOP],
> > +  [OVS_VSWITCHD_STOP([dnl
> > +$1";/netdev_linux.*obtaining netdev stats via vport failed/d
> > +/dpif_netlink.*Generic Netlink family 'ovs_datapath' does not exist.
> The Open vSwitch kernel module is probably not loaded./d
> > +/dpif_netdev(revalidator.*)|ERR|internal error parsing flow key/d
> > +/dpif(revalidator.*)|WARN|netdev@ovs-netdev: failed to put/d
> > +"])
> > +   AT_CHECK([:; $2])
> > +  ])
> > +
> > +m4_define([ADD_VETH_AFXDP],
> > +    [ AT_CHECK([ip link add $1 type veth peer name ovs-$1 || return 77])
> > +      CONFIGURE_AFXDP_VETH_OFFLOADS([$1])
> > +      AT_CHECK([ip link set $1 netns $2])
> > +      AT_CHECK([ip link set dev ovs-$1 up])
> > +      AT_CHECK([ovs-vsctl add-port $3 ovs-$1 -- \
> > +                set interface ovs-$1 external-ids:iface-id="$1"
> type="afxdp"])
> > +      NS_CHECK_EXEC([$2], [ip addr add $4 dev $1 $7])
> > +      NS_CHECK_EXEC([$2], [ip link set dev $1 up])
> > +      if test -n "$5"; then
> > +        NS_CHECK_EXEC([$2], [ip link set dev $1 address $5])
> > +      fi
> > +      if test -n "$6"; then
> > +        NS_CHECK_EXEC([$2], [ip route add default via $6])
> > +      fi
> > +      on_exit 'ip link del ovs-$1'
> > +    ]
> > +)
> > +
> > +# CONFIGURE_AFXDP_VETH_OFFLOADS([VETH])
> > +#
> > +# Disable TX offloads and VLAN offloads for veths used in AF_XDP.
> > +m4_define([CONFIGURE_AFXDP_VETH_OFFLOADS],
> > +    [AT_CHECK([ethtool -K $1 tx off], [0], [ignore], [ignore])
> > +     AT_CHECK([ethtool -K $1 rxvlan off], [0], [ignore], [ignore])
> > +     AT_CHECK([ethtool -K $1 txvlan off], [0], [ignore], [ignore])
> > +    ]
> > +)
> > +
> > +# CONFIGURE_VETH_OFFLOADS([VETH])
> > +#
> > +# Disable TX offloads for veths.  The userspace datapath uses the
> AF_PACKET
> > +# socket to receive packets for veths.  Unfortunately, the AF_PACKET
> socket
> > +# doesn't play well with offloads:
> > +# 1. GSO packets are received without segmentation and therefore
> discarded.
> > +# 2. Packets with offloaded partial checksum are received with the wrong
> > +#    checksum, therefore discarded by the receiver.
> > +#
> > +# By disabling tx offloads in the non-OVS side of the veth peer we make
> sure
> > +# that the AF_PACKET socket will not receive bad packets.
> > +#
> > +# This is a workaround, and should be removed when offloads are properly
> > +# supported in netdev-linux.
> > +m4_define([CONFIGURE_VETH_OFFLOADS],
> > +    [AT_CHECK([ethtool -K $1 tx off], [0], [ignore], [ignore])]
> > +)
> > +
> > +# CHECK_CONNTRACK()
> > +#
> > +# Perform requirements checks for running conntrack tests.
> > +#
> > +m4_define([CHECK_CONNTRACK],
> > +    [AT_SKIP_IF([test $HAVE_PYTHON = no])]
> > +)
> > +
> > +# CHECK_CONNTRACK_ALG()
> > +#
> > +# Perform requirements checks for running conntrack ALG tests. The
> userspace
> > +# supports FTP and TFTP.
> > +#
> > +m4_define([CHECK_CONNTRACK_ALG])
> > +
> > +# CHECK_CONNTRACK_FRAG()
> > +#
> > +# Perform requirements checks for running conntrack fragmentations
> tests.
> > +# The userspace doesn't support fragmentation yet, so skip the tests.
> > +m4_define([CHECK_CONNTRACK_FRAG],
> > +[
> > +    AT_SKIP_IF([:])
> > +])
> > +
> > +# CHECK_CONNTRACK_LOCAL_STACK()
> > +#
> > +# Perform requirements checks for running conntrack tests with local
> stack.
> > +# While the kernel connection tracker automatically passes all the
> connection
> > +# tracking state from an internal port to the OpenvSwitch kernel
> module, there
> > +# is simply no way of doing that with the userspace, so skip the tests.
> > +m4_define([CHECK_CONNTRACK_LOCAL_STACK],
> > +[
> > +    AT_SKIP_IF([:])
> > +])
> > +
> > +# CHECK_CONNTRACK_NAT()
> > +#
> > +# Perform requirements checks for running conntrack NAT tests. The
> userspace
> > +# datapath supports NAT.
> > +#
> > +m4_define([CHECK_CONNTRACK_NAT])
> > +
> > +# CHECK_CT_DPIF_FLUSH_BY_CT_TUPLE()
> > +#
> > +# Perform requirements checks for running ovs-dpctl flush-conntrack by
> > +# conntrack 5-tuple test. The userspace datapath does not support
> > +# this feature yet.
> > +m4_define([CHECK_CT_DPIF_FLUSH_BY_CT_TUPLE],
> > +[
> > +    AT_SKIP_IF([:])
> > +])
> > +
> > +# CHECK_CT_DPIF_SET_GET_MAXCONNS()
> > +#
> > +# Perform requirements checks for running ovs-dpctl ct-set-maxconns or
> > +# ovs-dpctl ct-get-maxconns. The userspace datapath does support this
> feature.
> > +m4_define([CHECK_CT_DPIF_SET_GET_MAXCONNS])
> > +
> > +# CHECK_CT_DPIF_GET_NCONNS()
> > +#
> > +# Perform requirements checks for running ovs-dpctl ct-get-nconns. The
> > +# userspace datapath does support this feature.
> > +m4_define([CHECK_CT_DPIF_GET_NCONNS])
> > diff --git a/tests/system-afxdp-testsuite.at b/tests/
> system-afxdp-testsuite.at
> > new file mode 100644
> > index 000000000000..538c0d15d556
> > --- /dev/null
> > +++ b/tests/system-afxdp-testsuite.at
> > @@ -0,0 +1,26 @@
> > +AT_INIT
> > +
> > +AT_COPYRIGHT([Copyright (c) 2018 Nicira, Inc.
> > +
> > +Licensed under the Apache License, Version 2.0 (the "License");
> > +you may not use this file except in compliance with the License.
> > +You may obtain a copy of the License at:
> > +
> > +    http://www.apache.org/licenses/LICENSE-2.0
> > +
> > +Unless required by applicable law or agreed to in writing, software
> > +distributed under the License is distributed on an "AS IS" BASIS,
> > +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> > +See the License for the specific language governing permissions and
> > +limitations under the License.])
> > +
> > +m4_ifdef([AT_COLOR_TESTS], [AT_COLOR_TESTS])
> > +
> > +m4_include([tests/ovs-macros.at])
> > +m4_include([tests/ovsdb-macros.at])
> > +m4_include([tests/ofproto-macros.at])
> > +m4_include([tests/system-afxdp-macros.at])
> > +m4_include([tests/system-common-macros.at])
> > +
> > +m4_include([tests/system-afxdp-traffic.at])
> > +m4_include([tests/system-ovn.at])
> > diff --git a/tests/system-afxdp-traffic.at b/tests/
> system-afxdp-traffic.at
> > new file mode 100644
> > index 000000000000..26f72acf48ef
> > --- /dev/null
> > +++ b/tests/system-afxdp-traffic.at
> > @@ -0,0 +1,978 @@
> > +AT_BANNER([AF_XDP netdev datapath-sanity])
> > +
> > +AT_SETUP([datapath - ping between two ports])
> > +OVS_TRAFFIC_VSWITCHD_START()
> > +
> > +ulimit -l unlimited
> > +
> > +ADD_NAMESPACES(at_ns0, at_ns1)
> > +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
> > +
> > +ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24")
> > +ADD_VETH_AFXDP(p1, at_ns1, br0, "10.1.1.2/24")
> > +
> > +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.2 |
> FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +OVS_TRAFFIC_VSWITCHD_STOP
> > +AT_CLEANUP
> > +
> > +AT_SETUP([datapath - ping between two ports on vlan])
> > +OVS_TRAFFIC_VSWITCHD_START()
> > +
> > +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
> > +
> > +ADD_NAMESPACES(at_ns0, at_ns1)
> > +
> > +ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24")
> > +ADD_VETH_AFXDP(p1, at_ns1, br0, "10.1.1.2/24")
> > +
> > +ADD_VLAN(p0, at_ns0, 100, "10.2.2.1/24")
> > +ADD_VLAN(p1, at_ns1, 100, "10.2.2.2/24")
> > +
> > +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.2.2.2 |
> FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +
> > +OVS_TRAFFIC_VSWITCHD_STOP
> > +AT_CLEANUP
> > +
> > +AT_SETUP([datapath - ping6 between two ports])
> > +OVS_TRAFFIC_VSWITCHD_START()
> > +
> > +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
> > +
> > +ADD_NAMESPACES(at_ns0, at_ns1)
> > +
> > +ADD_VETH_AFXDP(p0, at_ns0, br0, "fc00::1/96")
> > +ADD_VETH_AFXDP(p1, at_ns1, br0, "fc00::2/96")
> > +
> > +dnl Linux seems to take a little time to get its IPv6 stack in order.
> Without
> > +dnl waiting, we get occasional failures due to the following error:
> > +dnl "connect: Cannot assign requested address"
> > +OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 1 fc00::2])
> > +
> > +NS_CHECK_EXEC([at_ns0], [ping6 -q -c 3 -i 0.3 -w 6 fc00::2 |
> FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +
> > +OVS_TRAFFIC_VSWITCHD_STOP
> > +AT_CLEANUP
> > +
> > +AT_SETUP([datapath - ping6 between two ports on vlan])
> > +OVS_TRAFFIC_VSWITCHD_START()
> > +
> > +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
> > +
> > +ADD_NAMESPACES(at_ns0, at_ns1)
> > +
> > +ADD_VETH_AFXDP(p0, at_ns0, br0, "fc00::1/96")
> > +ADD_VETH_AFXDP(p1, at_ns1, br0, "fc00::2/96")
> > +
> > +ADD_VLAN(p0, at_ns0, 100, "fc00:1::1/96")
> > +ADD_VLAN(p1, at_ns1, 100, "fc00:1::2/96")
> > +
> > +dnl Linux seems to take a little time to get its IPv6 stack in order.
> Without
> > +dnl waiting, we get occasional failures due to the following error:
> > +dnl "connect: Cannot assign requested address"
> > +OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 1 fc00:1::2])
> > +
> > +NS_CHECK_EXEC([at_ns0], [ping6 -q -c 3 -i 0.3 -w 2 fc00:1::2 |
> FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +NS_CHECK_EXEC([at_ns0], [ping6 -s 1600 -q -c 3 -i 0.3 -w 2 fc00:1::2 |
> FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +NS_CHECK_EXEC([at_ns0], [ping6 -s 3200 -q -c 3 -i 0.3 -w 2 fc00:1::2 |
> FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +
> > +OVS_TRAFFIC_VSWITCHD_STOP
> > +AT_CLEANUP
> > +
> > +AT_SETUP([datapath - ping over vxlan tunnel])
> > +OVS_CHECK_VXLAN()
> > +
> > +OVS_TRAFFIC_VSWITCHD_START()
> > +ADD_BR([br-underlay])
> > +
> > +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
> > +AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
> > +
> > +ADD_NAMESPACES(at_ns0)
> > +
> > +dnl Set up underlay link from host into the namespace using veth pair.
> > +ADD_VETH_AFXDP(p0, at_ns0, br-underlay, "172.31.1.1/24")
> > +AT_CHECK([ip addr add dev br-underlay "172.31.1.100/24"])
> > +AT_CHECK([ip link set dev br-underlay up])
> > +
> > +
> > +dnl Set up tunnel endpoints on OVS outside the namespace and with a
> native
> > +dnl linux device inside the namespace.
> > +ADD_OVS_TUNNEL([vxlan], [br0], [at_vxlan0], [172.31.1.1], [
> 10.1.1.100/24])
> > +ADD_NATIVE_TUNNEL([vxlan], [at_vxlan1], [at_ns0], [172.31.1.100], [
> 10.1.1.1/24],
> > +                  [id 0 dstport 4789])
> > +
> > +AT_CHECK([ovs-appctl ovs/route/add 10.1.1.100/24 br0], [0], [OK
> > +])
> > +AT_CHECK([ovs-appctl ovs/route/add 172.31.1.92/24 br-underlay], [0],
> [OK
> > +])
> > +
> > +dnl First, check the underlay
> > +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 172.31.1.100 |
> FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +
> > +dnl Okay, now check the overlay with different packet sizes
> > +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.100 |
> FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +NS_CHECK_EXEC([at_ns0], [ping -s 1600 -q -c 3 -i 0.3 -w 2 10.1.1.100 |
> FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +NS_CHECK_EXEC([at_ns0], [ping -s 3200 -q -c 3 -i 0.3 -w 2 10.1.1.100 |
> FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +
> > +OVS_TRAFFIC_VSWITCHD_STOP
> > +AT_CLEANUP
> > +
> > +AT_SETUP([datapath - ping over vxlan6 tunnel])
> > +OVS_CHECK_VXLAN_UDP6ZEROCSUM()
> > +
> > +OVS_TRAFFIC_VSWITCHD_START()
> > +ADD_BR([br-underlay])
> > +
> > +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
> > +AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
> > +
> > +ADD_NAMESPACES(at_ns0)
> > +
> > +dnl Set up underlay link from host into the namespace using veth pair.
> > +ADD_VETH_AFXDP(p0, at_ns0, br-underlay, "fc00::1/64", [], [], "nodad")
> > +AT_CHECK([ip addr add dev br-underlay "fc00::100/64" nodad])
> > +AT_CHECK([ip link set dev br-underlay up])
> > +
> > +dnl Set up tunnel endpoints on OVS outside the namespace and with a
> native
> > +dnl linux device inside the namespace.
> > +ADD_OVS_TUNNEL6([vxlan], [br0], [at_vxlan0], [fc00::1], [10.1.1.100/24
> ])
> > +ADD_NATIVE_TUNNEL6([vxlan], [at_vxlan1], [at_ns0], [fc00::100], [
> 10.1.1.1/24],
> > +                   [id 0 dstport 4789 udp6zerocsumtx udp6zerocsumrx])
> > +
> > +AT_CHECK([ovs-appctl ovs/route/add 10.1.1.100/24 br0], [0], [OK
> > +])
> > +AT_CHECK([ovs-appctl ovs/route/add fc00::100/64 br-underlay], [0], [OK
> > +])
> > +
> > +OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 1 fc00::100])
> > +
> > +dnl First, check the underlay
> > +NS_CHECK_EXEC([at_ns0], [ping6 -q -c 3 -i 0.3 -w 2 fc00::100 |
> FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +
> > +dnl Okay, now check the overlay with different packet sizes
> > +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.100 |
> FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +NS_CHECK_EXEC([at_ns0], [ping -s 1600 -q -c 3 -i 0.3 -w 2 10.1.1.100 |
> FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +NS_CHECK_EXEC([at_ns0], [ping -s 3200 -q -c 3 -i 0.3 -w 2 10.1.1.100 |
> FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +
> > +OVS_TRAFFIC_VSWITCHD_STOP
> > +AT_CLEANUP
> > +
> > +AT_SETUP([datapath - ping over gre tunnel])
> > +OVS_CHECK_GRE()
> > +
> > +OVS_TRAFFIC_VSWITCHD_START()
> > +ADD_BR([br-underlay])
> > +
> > +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
> > +AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
> > +
> > +ADD_NAMESPACES(at_ns0)
> > +
> > +dnl Set up underlay link from host into the namespace using veth pair.
> > +ADD_VETH_AFXDP(p0, at_ns0, br-underlay, "172.31.1.1/24")
> > +AT_CHECK([ip addr add dev br-underlay "172.31.1.100/24"])
> > +AT_CHECK([ip link set dev br-underlay up])
> > +
> > +dnl Set up tunnel endpoints on OVS outside the namespace and with a
> native
> > +dnl linux device inside the namespace.
> > +ADD_OVS_TUNNEL([gre], [br0], [at_gre0], [172.31.1.1], [10.1.1.100/24])
> > +ADD_NATIVE_TUNNEL([gretap], [ns_gre0], [at_ns0], [172.31.1.100], [
> 10.1.1.1/24])
> > +
> > +AT_CHECK([ovs-appctl ovs/route/add 10.1.1.100/24 br0], [0], [OK
> > +])
> > +AT_CHECK([ovs-appctl ovs/route/add 172.31.1.92/24 br-underlay], [0],
> [OK
> > +])
> > +
> > +dnl First, check the underlay
> > +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 172.31.1.100 |
> FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +
> > +dnl Okay, now check the overlay with different packet sizes
> > +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.100 |
> FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +NS_CHECK_EXEC([at_ns0], [ping -s 1600 -q -c 3 -i 0.3 -w 2 10.1.1.100 |
> FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +NS_CHECK_EXEC([at_ns0], [ping -s 3200 -q -c 3 -i 0.3 -w 2 10.1.1.100 |
> FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +
> > +OVS_TRAFFIC_VSWITCHD_STOP
> > +AT_CLEANUP
> > +
> > +AT_SETUP([datapath - ping over erspan v1 tunnel])
> > +OVS_CHECK_GRE()
> > +OVS_CHECK_ERSPAN()
> > +
> > +OVS_TRAFFIC_VSWITCHD_START()
> > +ADD_BR([br-underlay])
> > +
> > +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
> > +AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
> > +
> > +ADD_NAMESPACES(at_ns0)
> > +
> > +dnl Set up underlay link from host into the namespace using veth pair.
> > +ADD_VETH_AFXDP(p0, at_ns0, br-underlay, "172.31.1.1/24")
> > +AT_CHECK([ip addr add dev br-underlay "172.31.1.100/24"])
> > +AT_CHECK([ip link set dev br-underlay up])
> > +
> > +dnl Set up tunnel endpoints on OVS outside the namespace and with a
> native
> > +dnl linux device inside the namespace.
> > +ADD_OVS_TUNNEL([erspan], [br0], [at_erspan0], [172.31.1.1], [
> 10.1.1.100/24], [options:key=1 options:erspan_ver=1 options:erspan_idx=7])
> > +ADD_NATIVE_TUNNEL([erspan], [ns_erspan0], [at_ns0], [172.31.1.100], [
> 10.1.1.1/24], [seq key 1 erspan_ver 1 erspan 7])
> > +
> > +AT_CHECK([ovs-appctl ovs/route/add 10.1.1.100/24 br0], [0], [OK
> > +])
> > +AT_CHECK([ovs-appctl ovs/route/add 172.31.1.92/24 br-underlay], [0],
> [OK
> > +])
> > +
> > +dnl First, check the underlay
> > +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 172.31.1.100 |
> FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +
> > +dnl Okay, now check the overlay with different packet sizes
> > +dnl NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.100 |
> FORMAT_PING], [0], [dnl
> > +NS_CHECK_EXEC([at_ns0], [ping -s 1200 -i 0.3 -c 3 10.1.1.100 |
> FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +OVS_TRAFFIC_VSWITCHD_STOP
> > +AT_CLEANUP
> > +
> > +AT_SETUP([datapath - ping over erspan v2 tunnel])
> > +OVS_CHECK_GRE()
> > +OVS_CHECK_ERSPAN()
> > +
> > +OVS_TRAFFIC_VSWITCHD_START()
> > +ADD_BR([br-underlay])
> > +
> > +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
> > +AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
> > +
> > +ADD_NAMESPACES(at_ns0)
> > +
> > +dnl Set up underlay link from host into the namespace using veth pair.
> > +ADD_VETH_AFXDP(p0, at_ns0, br-underlay, "172.31.1.1/24")
> > +AT_CHECK([ip addr add dev br-underlay "172.31.1.100/24"])
> > +AT_CHECK([ip link set dev br-underlay up])
> > +
> > +dnl Set up tunnel endpoints on OVS outside the namespace and with a
> native
> > +dnl linux device inside the namespace.
> > +ADD_OVS_TUNNEL([erspan], [br0], [at_erspan0], [172.31.1.1], [
> 10.1.1.100/24], [options:key=1 options:erspan_ver=2 options:erspan_dir=1
> options:erspan_hwid=0x7])
> > +ADD_NATIVE_TUNNEL([erspan], [ns_erspan0], [at_ns0], [172.31.1.100], [
> 10.1.1.1/24], [seq key 1 erspan_ver 2 erspan_dir egress erspan_hwid 7])
> > +
> > +AT_CHECK([ovs-appctl ovs/route/add 10.1.1.100/24 br0], [0], [OK
> > +])
> > +AT_CHECK([ovs-appctl ovs/route/add 172.31.1.92/24 br-underlay], [0],
> [OK
> > +])
> > +
> > +dnl First, check the underlay
> > +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 172.31.1.100 |
> FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +
> > +dnl Okay, now check the overlay with different packet sizes
> > +dnl NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.100 |
> FORMAT_PING], [0], [dnl
> > +NS_CHECK_EXEC([at_ns0], [ping -s 1200 -i 0.3 -c 3 10.1.1.100 |
> FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +OVS_TRAFFIC_VSWITCHD_STOP
> > +AT_CLEANUP
> > +
> > +AT_SETUP([datapath - ping over ip6erspan v1 tunnel])
> > +OVS_CHECK_GRE()
> > +OVS_CHECK_ERSPAN()
> > +
> > +OVS_TRAFFIC_VSWITCHD_START()
> > +ADD_BR([br-underlay])
> > +
> > +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
> > +AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
> > +
> > +ADD_NAMESPACES(at_ns0)
> > +
> > +dnl Set up underlay link from host into the namespace using veth pair.
> > +ADD_VETH_AFXDP(p0, at_ns0, br-underlay, "fc00:100::1/96", [], [], nodad)
> > +AT_CHECK([ip addr add dev br-underlay "fc00:100::100/96" nodad])
> > +AT_CHECK([ip link set dev br-underlay up])
> > +
> > +dnl Set up tunnel endpoints on OVS outside the namespace and with a
> native
> > +dnl linux device inside the namespace.
> > +ADD_OVS_TUNNEL6([ip6erspan], [br0], [at_erspan0], [fc00:100::1], [
> 10.1.1.100/24],
> > +                [options:key=123 options:erspan_ver=1
> options:erspan_idx=0x7])
> > +ADD_NATIVE_TUNNEL6([ip6erspan], [ns_erspan0], [at_ns0], [fc00:100::100],
> > +                   [10.1.1.1/24], [local fc00:100::1 seq key 123
> erspan_ver 1 erspan 7])
> > +
> > +AT_CHECK([ovs-appctl ovs/route/add 10.1.1.100/24 br0], [0], [OK
> > +])
> > +AT_CHECK([ovs-appctl ovs/route/add fc00:100::1/96 br-underlay], [0], [OK
> > +])
> > +
> > +OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 2 fc00:100::100])
> > +
> > +dnl First, check the underlay
> > +NS_CHECK_EXEC([at_ns0], [ping6 -q -c 3 -i 0.3 -w 2 fc00:100::100 |
> FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +
> > +dnl Okay, now check the overlay with different packet sizes
> > +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.100 |
> FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +OVS_TRAFFIC_VSWITCHD_STOP
> > +AT_CLEANUP
> > +
> > +AT_SETUP([datapath - ping over ip6erspan v2 tunnel])
> > +OVS_CHECK_GRE()
> > +OVS_CHECK_ERSPAN()
> > +
> > +OVS_TRAFFIC_VSWITCHD_START()
> > +ADD_BR([br-underlay])
> > +
> > +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
> > +AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
> > +
> > +ADD_NAMESPACES(at_ns0)
> > +
> > +dnl Set up underlay link from host into the namespace using veth pair.
> > +ADD_VETH_AFXDP(p0, at_ns0, br-underlay, "fc00:100::1/96", [], [], nodad)
> > +AT_CHECK([ip addr add dev br-underlay "fc00:100::100/96" nodad])
> > +AT_CHECK([ip link set dev br-underlay up])
> > +
> > +dnl Set up tunnel endpoints on OVS outside the namespace and with a
> native
> > +dnl linux device inside the namespace.
> > +ADD_OVS_TUNNEL6([ip6erspan], [br0], [at_erspan0], [fc00:100::1], [
> 10.1.1.100/24],
> > +                [options:key=121 options:erspan_ver=2
> options:erspan_dir=0 options:erspan_hwid=0x7])
> > +ADD_NATIVE_TUNNEL6([ip6erspan], [ns_erspan0], [at_ns0], [fc00:100::100],
> > +                   [10.1.1.1/24],
> > +                   [local fc00:100::1 seq key 121 erspan_ver 2
> erspan_dir ingress erspan_hwid 0x7])
> > +
> > +AT_CHECK([ovs-appctl ovs/route/add 10.1.1.100/24 br0], [0], [OK
> > +])
> > +AT_CHECK([ovs-appctl ovs/route/add fc00:100::1/96 br-underlay], [0], [OK
> > +])
> > +
> > +OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 2 fc00:100::100])
> > +
> > +dnl First, check the underlay
> > +NS_CHECK_EXEC([at_ns0], [ping6 -q -c 3 -i 0.3 -w 2 fc00:100::100 |
> FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +
> > +dnl Okay, now check the overlay with different packet sizes
> > +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.100 |
> FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +OVS_TRAFFIC_VSWITCHD_STOP
> > +AT_CLEANUP
> > +
> > +AT_SETUP([datapath - ping over geneve tunnel])
> > +OVS_CHECK_GENEVE()
> > +
> > +OVS_TRAFFIC_VSWITCHD_START()
> > +ADD_BR([br-underlay])
> > +
> > +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
> > +AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
> > +
> > +ADD_NAMESPACES(at_ns0)
> > +
> > +dnl Set up underlay link from host into the namespace using veth pair.
> > +ADD_VETH_AFXDP(p0, at_ns0, br-underlay, "172.31.1.1/24")
> > +AT_CHECK([ip addr add dev br-underlay "172.31.1.100/24"])
> > +AT_CHECK([ip link set dev br-underlay up])
> > +
> > +dnl Set up tunnel endpoints on OVS outside the namespace and with a
> native
> > +dnl linux device inside the namespace.
> > +ADD_OVS_TUNNEL([geneve], [br0], [at_gnv0], [172.31.1.1], [10.1.1.100/24
> ])
> > +ADD_NATIVE_TUNNEL([geneve], [ns_gnv0], [at_ns0], [172.31.1.100], [
> 10.1.1.1/24],
> > +                  [vni 0])
> > +
> > +AT_CHECK([ovs-appctl ovs/route/add 10.1.1.100/24 br0], [0], [OK
> > +])
> > +AT_CHECK([ovs-appctl ovs/route/add 172.31.1.100/24 br-underlay], [0],
> [OK
> > +])
> > +
> > +dnl First, check the underlay
> > +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 172.31.1.100 |
> FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +
> > +dnl Okay, now check the overlay with different packet sizes
> > +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.100 |
> FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +NS_CHECK_EXEC([at_ns0], [ping -s 1600 -q -c 3 -i 0.3 -w 2 10.1.1.100 |
> FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +NS_CHECK_EXEC([at_ns0], [ping -s 3200 -q -c 3 -i 0.3 -w 2 10.1.1.100 |
> FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +
> > +OVS_TRAFFIC_VSWITCHD_STOP
> > +AT_CLEANUP
> > +
> > +AT_SETUP([datapath - ping over geneve6 tunnel])
> > +OVS_CHECK_GENEVE_UDP6ZEROCSUM()
> > +
> > +OVS_TRAFFIC_VSWITCHD_START()
> > +ADD_BR([br-underlay])
> > +
> > +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
> > +AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
> > +
> > +ADD_NAMESPACES(at_ns0)
> > +
> > +dnl Set up underlay link from host into the namespace using veth pair.
> > +ADD_VETH_AFXDP(p0, at_ns0, br-underlay, "fc00::1/64", [], [], "nodad")
> > +AT_CHECK([ip addr add dev br-underlay "fc00::100/64" nodad])
> > +AT_CHECK([ip link set dev br-underlay up])
> > +
> > +dnl Set up tunnel endpoints on OVS outside the namespace and with a
> native
> > +dnl linux device inside the namespace.
> > +ADD_OVS_TUNNEL6([geneve], [br0], [at_gnv0], [fc00::1], [10.1.1.100/24])
> > +ADD_NATIVE_TUNNEL6([geneve], [ns_gnv0], [at_ns0], [fc00::100], [
> 10.1.1.1/24],
> > +                   [vni 0 udp6zerocsumtx udp6zerocsumrx])
> > +
> > +AT_CHECK([ovs-appctl ovs/route/add 10.1.1.100/24 br0], [0], [OK
> > +])
> > +AT_CHECK([ovs-appctl ovs/route/add fc00::100/64 br-underlay], [0], [OK
> > +])
> > +
> > +OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 1 fc00::100])
> > +
> > +dnl First, check the underlay
> > +NS_CHECK_EXEC([at_ns0], [ping6 -q -c 3 -i 0.3 -w 2 fc00::100 |
> FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +
> > +dnl Okay, now check the overlay with different packet sizes
> > +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.100 |
> FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +NS_CHECK_EXEC([at_ns0], [ping -s 1600 -q -c 3 -i 0.3 -w 2 10.1.1.100 |
> FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +NS_CHECK_EXEC([at_ns0], [ping -s 3200 -q -c 3 -i 0.3 -w 2 10.1.1.100 |
> FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +
> > +OVS_TRAFFIC_VSWITCHD_STOP
> > +AT_CLEANUP
> > +
> > +AT_SETUP([datapath - clone action])
> > +OVS_TRAFFIC_VSWITCHD_START()
> > +
> > +ADD_NAMESPACES(at_ns0, at_ns1, at_ns2)
> > +
> > +ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24")
> > +ADD_VETH_AFXDP(p1, at_ns1, br0, "10.1.1.2/24")
> > +
> > +AT_CHECK([ovs-vsctl -- set interface ovs-p0 ofport_request=1 \
> > +                    -- set interface ovs-p1 ofport_request=2])
> > +
> > +AT_DATA([flows.txt], [dnl
> > +priority=1 actions=NORMAL
> > +priority=10
> in_port=1,ip,actions=clone(mod_dl_dst(50:54:00:00:00:0a),set_field:192.168.3.3->ip_dst),
> output:2
> > +priority=10
> in_port=2,ip,actions=clone(mod_dl_src(ae:c6:7e:54:8d:4d),mod_dl_dst(50:54:00:00:00:0b),set_field:192.168.4.4->ip_dst,
> controller), output:1
> > +])
> > +AT_CHECK([ovs-ofctl add-flows br0 flows.txt])
> > +
> > +AT_CHECK([ovs-ofctl monitor br0 65534 invalid_ttl --detach --no-chdir
> --pidfile 2> ofctl_monitor.log])
> > +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.2 |
> FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +
> > +AT_CHECK([cat ofctl_monitor.log | STRIP_MONITOR_CSUM], [0], [dnl
> >
> +icmp,vlan_tci=0x0000,dl_src=ae:c6:7e:54:8d:4d,dl_dst=50:54:00:00:00:0b,nw_src=10.1.1.2,nw_dst=192.168.4.4,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=0,icmp_code=0
> icmp_csum: <skip>
> >
> +icmp,vlan_tci=0x0000,dl_src=ae:c6:7e:54:8d:4d,dl_dst=50:54:00:00:00:0b,nw_src=10.1.1.2,nw_dst=192.168.4.4,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=0,icmp_code=0
> icmp_csum: <skip>
> >
> +icmp,vlan_tci=0x0000,dl_src=ae:c6:7e:54:8d:4d,dl_dst=50:54:00:00:00:0b,nw_src=10.1.1.2,nw_dst=192.168.4.4,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=0,icmp_code=0
> icmp_csum: <skip>
> > +])
> > +
> > +OVS_TRAFFIC_VSWITCHD_STOP
> > +AT_CLEANUP
> > +
> > +AT_SETUP([datapath - basic truncate action])
> > +AT_SKIP_IF([test $HAVE_NC = no])
> > +OVS_TRAFFIC_VSWITCHD_START()
> > +AT_CHECK([ovs-ofctl del-flows br0])
> > +
> > +dnl Create p0 and ovs-p0(1)
> > +ADD_NAMESPACES(at_ns0)
> > +ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24")
> > +NS_CHECK_EXEC([at_ns0], [ip link set dev p0 address e6:66:c1:11:11:11])
> > +NS_CHECK_EXEC([at_ns0], [arp -s 10.1.1.2 e6:66:c1:22:22:22])
> > +
> > +dnl Create p1(3) and ovs-p1(2), packets received from ovs-p1 will
> appear in p1
> > +AT_CHECK([ip link add p1 type veth peer name ovs-p1])
> > +on_exit 'ip link del ovs-p1'
> > +AT_CHECK([ip link set dev ovs-p1 up])
> > +AT_CHECK([ip link set dev p1 up])
> > +AT_CHECK([ovs-vsctl add-port br0 ovs-p1 -- set interface ovs-p1
> ofport_request=2])
> > +dnl Use p1 to check the truncated packet
> > +AT_CHECK([ovs-vsctl add-port br0 p1 -- set interface p1
> ofport_request=3])
> > +
> > +dnl Create p2(5) and ovs-p2(4)
> > +AT_CHECK([ip link add p2 type veth peer name ovs-p2])
> > +on_exit 'ip link del ovs-p2'
> > +AT_CHECK([ip link set dev ovs-p2 up])
> > +AT_CHECK([ip link set dev p2 up])
> > +AT_CHECK([ovs-vsctl add-port br0 ovs-p2 -- set interface ovs-p2
> ofport_request=4])
> > +dnl Use p2 to check the truncated packet
> > +AT_CHECK([ovs-vsctl add-port br0 p2 -- set interface p2
> ofport_request=5])
> > +
> > +dnl basic test
> > +AT_CHECK([ovs-ofctl del-flows br0])
> > +AT_DATA([flows.txt], [dnl
> > +in_port=3 dl_dst=e6:66:c1:22:22:22 actions=drop
> > +in_port=5 dl_dst=e6:66:c1:22:22:22 actions=drop
> > +in_port=1 dl_dst=e6:66:c1:22:22:22
> actions=output(port=2,max_len=100),output:4
> > +])
> > +AT_CHECK([ovs-ofctl add-flows br0 flows.txt])
> > +
> > +dnl use this file as payload file for ncat
> > +AT_CHECK([dd if=/dev/urandom of=payload200.bin bs=200 count=1 2>
> /dev/null])
> > +on_exit 'rm -f payload200.bin'
> > +NS_CHECK_EXEC([at_ns0], [nc $NC_EOF_OPT -u 10.1.1.2 1234 <
> payload200.bin])
> > +
> > +dnl packet with truncated size
> > +AT_CHECK([ovs-appctl revalidator/purge], [0])
> > +AT_CHECK([ovs-ofctl dump-flows br0 table=0 | grep "in_port=3" |  sed -n
> 's/.*\(n\_bytes=[[0-9]]*\).*/\1/p'], [0], [dnl
> > +n_bytes=100
> > +])
> > +dnl packet with original size
> > +AT_CHECK([ovs-appctl revalidator/purge], [0])
> > +AT_CHECK([ovs-ofctl dump-flows br0 table=0 | grep "in_port=5" | sed -n
> 's/.*\(n\_bytes=[[0-9]]*\).*/\1/p'], [0], [dnl
> > +n_bytes=242
> > +])
> > +
> > +dnl more complicated output actions
> > +AT_CHECK([ovs-ofctl del-flows br0])
> > +AT_DATA([flows.txt], [dnl
> > +in_port=3 dl_dst=e6:66:c1:22:22:22 actions=drop
> > +in_port=5 dl_dst=e6:66:c1:22:22:22 actions=drop
> > +in_port=1 dl_dst=e6:66:c1:22:22:22
> actions=output(port=2,max_len=100),output:4,output(port=2,max_len=100),output(port=4,max_len=100),output:2,output(port=4,max_len=200),output(port=2,max_len=65535)
> > +])
> > +AT_CHECK([ovs-ofctl add-flows br0 flows.txt])
> > +
> > +NS_CHECK_EXEC([at_ns0], [nc $NC_EOF_OPT -u 10.1.1.2 1234 <
> payload200.bin])
> > +
> > +dnl 100 + 100 + 242 + min(65535,242) = 684
> > +AT_CHECK([ovs-appctl revalidator/purge], [0])
> > +AT_CHECK([ovs-ofctl dump-flows br0 table=0 | grep "in_port=3" | sed -n
> 's/.*\(n\_bytes=[[0-9]]*\).*/\1/p'], [0], [dnl
> > +n_bytes=684
> > +])
> > +dnl 242 + 100 + min(242,200) = 542
> > +AT_CHECK([ovs-ofctl dump-flows br0 table=0 | grep "in_port=5" | sed -n
> 's/.*\(n\_bytes=[[0-9]]*\).*/\1/p'], [0], [dnl
> > +n_bytes=542
> > +])
> > +
> > +dnl SLOW_ACTION: disable kernel datapath truncate support
> > +dnl Repeat the test above, but exercise the SLOW_ACTION code path
> > +AT_CHECK([ovs-appctl dpif/set-dp-features br0 trunc false], [0])
> > +
> > +dnl SLOW_ACTION test1: check datapatch actions
> > +AT_CHECK([ovs-ofctl del-flows br0])
> > +AT_CHECK([ovs-ofctl add-flows br0 flows.txt])
> > +
> > +AT_CHECK([ovs-appctl ofproto/trace br0
> "in_port=1,dl_type=0x800,dl_src=e6:66:c1:11:11:11,dl_dst=e6:66:c1:22:22:22,nw_src=192.168.0.1,nw_dst=192.168.0.2,nw_proto=6,tp_src=8,tp_dst=9"],
> [0], [stdout])
> > +AT_CHECK([tail -3 stdout], [0],
> > +[Datapath actions:
> trunc(100),3,5,trunc(100),3,trunc(100),5,3,trunc(200),5,trunc(65535),3
> > +This flow is handled by the userspace slow path because it:
> > +  - Uses action(s) not supported by datapath.
> > +])
> > +
> > +dnl SLOW_ACTION test2: check actual packet truncate
> > +AT_CHECK([ovs-ofctl del-flows br0])
> > +AT_CHECK([ovs-ofctl add-flows br0 flows.txt])
> > +NS_CHECK_EXEC([at_ns0], [nc $NC_EOF_OPT -u 10.1.1.2 1234 <
> payload200.bin])
> > +
> > +dnl 100 + 100 + 242 + min(65535,242) = 684
> > +AT_CHECK([ovs-appctl revalidator/purge], [0])
> > +AT_CHECK([ovs-ofctl dump-flows br0 table=0 | grep "in_port=3" | sed -n
> 's/.*\(n\_bytes=[[0-9]]*\).*/\1/p'], [0], [dnl
> > +n_bytes=684
> > +])
> > +
> > +dnl 242 + 100 + min(242,200) = 542
> > +AT_CHECK([ovs-ofctl dump-flows br0 table=0 | grep "in_port=5" | sed -n
> 's/.*\(n\_bytes=[[0-9]]*\).*/\1/p'], [0], [dnl
> > +n_bytes=542
> > +])
> > +
> > +OVS_TRAFFIC_VSWITCHD_STOP
> > +AT_CLEANUP
> > +
> > +
> > +AT_BANNER([conntrack])
> > +
> > +AT_SETUP([conntrack - controller])
> > +CHECK_CONNTRACK()
> > +OVS_TRAFFIC_VSWITCHD_START()
> > +AT_CHECK([ovs-appctl vlog/set dpif:dbg dpif_netdev:dbg
> ofproto_dpif_upcall:dbg])
> > +
> > +ADD_NAMESPACES(at_ns0, at_ns1)
> > +
> > +ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24")
> > +ADD_VETH_AFXDP(p1, at_ns1, br0, "10.1.1.2/24")
> > +
> > +dnl Allow any traffic from ns0->ns1. Only allow nd, return traffic from
> ns1->ns0.
> > +AT_DATA([flows.txt], [dnl
> > +priority=1,action=drop
> > +priority=10,arp,action=normal
> > +priority=100,in_port=1,udp,action=ct(commit),controller
> > +priority=100,in_port=2,ct_state=-trk,udp,action=ct(table=0)
> > +priority=100,in_port=2,ct_state=+trk+est,udp,action=controller
> > +])
> > +
> > +AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
> > +
> > +AT_CAPTURE_FILE([ofctl_monitor.log])
> > +AT_CHECK([ovs-ofctl monitor br0 65534 invalid_ttl --detach --no-chdir
> --pidfile 2> ofctl_monitor.log])
> > +
> > +dnl Send an unsolicited reply from port 2. This should be dropped.
> > +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 2 ct\(table=0\)
> '50540000000a50540000000908004500001c000000000011a4cd0a0101020a0101010002000100080000'])
> > +
> > +dnl OK, now start a new connection from port 1.
> > +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 1
> ct\(commit\),controller
> '50540000000a50540000000908004500001c000000000011a4cd0a0101010a0101020001000200080000'])
> > +
> > +dnl Now try a reply from port 2.
> > +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 2 ct\(table=0\)
> '50540000000a50540000000908004500001c000000000011a4cd0a0101020a0101010002000100080000'])
> > +
> > +dnl Check this output. We only see the latter two packets, not the
> first.
> > +AT_CHECK([cat ofctl_monitor.log], [0], [dnl
> > +NXT_PACKET_IN2 (xid=0x0): total_len=42 in_port=1 (via action)
> data_len=42 (unbuffered)
> >
> +udp,vlan_tci=0x0000,dl_src=50:54:00:00:00:09,dl_dst=50:54:00:00:00:0a,nw_src=10.1.1.1,nw_dst=10.1.1.2,nw_tos=0,nw_ecn=0,nw_ttl=0,tp_src=1,tp_dst=2
> udp_csum:0
> > +NXT_PACKET_IN2 (xid=0x0): cookie=0x0 total_len=42
> ct_state=est|rpl|trk,ct_nw_src=10.1.1.1,ct_nw_dst=10.1.1.2,ct_nw_proto=17,ct_tp_src=1,ct_tp_dst=2,ip,in_port=2
> (via action) data_len=42 (unbuffered)
> >
> +udp,vlan_tci=0x0000,dl_src=50:54:00:00:00:09,dl_dst=50:54:00:00:00:0a,nw_src=10.1.1.2,nw_dst=10.1.1.1,nw_tos=0,nw_ecn=0,nw_ttl=0,tp_src=2,tp_dst=1
> udp_csum:0
> > +])
> > +
> > +OVS_TRAFFIC_VSWITCHD_STOP
> > +AT_CLEANUP
> > +
> > +AT_SETUP([conntrack - force commit])
> > +CHECK_CONNTRACK()
> > +OVS_TRAFFIC_VSWITCHD_START()
> > +AT_CHECK([ovs-appctl vlog/set dpif:dbg dpif_netdev:dbg
> ofproto_dpif_upcall:dbg])
> > +
> > +ADD_NAMESPACES(at_ns0, at_ns1)
> > +
> > +ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24")
> > +ADD_VETH_AFXDP(p1, at_ns1, br0, "10.1.1.2/24")
> > +
> > +AT_DATA([flows.txt], [dnl
> > +priority=1,action=drop
> > +priority=10,arp,action=normal
> > +priority=100,in_port=1,udp,action=ct(force,commit),controller
> > +priority=100,in_port=2,ct_state=-trk,udp,action=ct(table=0)
> >
> +priority=100,in_port=2,ct_state=+trk+est,udp,action=ct(force,commit,table=1)
> > +table=1,in_port=2,ct_state=+trk,udp,action=controller
> > +])
> > +
> > +AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
> > +
> > +AT_CAPTURE_FILE([ofctl_monitor.log])
> > +AT_CHECK([ovs-ofctl monitor br0 65534 invalid_ttl --detach --no-chdir
> --pidfile 2> ofctl_monitor.log])
> > +
> > +dnl Send an unsolicited reply from port 2. This should be dropped.
> > +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=2
> packet=50540000000a50540000000908004500001c000000000011a4cd0a0101020a0101010002000100080000
> actions=resubmit(,0)"])
> > +
> > +dnl OK, now start a new connection from port 1.
> > +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=1
> packet=50540000000a50540000000908004500001c000000000011a4cd0a0101010a0101020001000200080000
> actions=resubmit(,0)"])
> > +
> > +dnl Now try a reply from port 2.
> > +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=2
> packet=50540000000a50540000000908004500001c000000000011a4cd0a0101020a0101010002000100080000
> actions=resubmit(,0)"])
> > +
> > +AT_CHECK([ovs-appctl revalidator/purge], [0])
> > +
> > +dnl Check this output. We only see the latter two packets, not the
> first.
> > +AT_CHECK([cat ofctl_monitor.log], [0], [dnl
> > +NXT_PACKET_IN2 (xid=0x0): cookie=0x0 total_len=42 in_port=1 (via
> action) data_len=42 (unbuffered)
> >
> +udp,vlan_tci=0x0000,dl_src=50:54:00:00:00:09,dl_dst=50:54:00:00:00:0a,nw_src=10.1.1.1,nw_dst=10.1.1.2,nw_tos=0,nw_ecn=0,nw_ttl=0,tp_src=1,tp_dst=2
> udp_csum:0
> > +NXT_PACKET_IN2 (xid=0x0): table_id=1 cookie=0x0 total_len=42
> ct_state=new|trk,ct_nw_src=10.1.1.2,ct_nw_dst=10.1.1.1,ct_nw_proto=17,ct_tp_src=2,ct_tp_dst=1,ip,in_port=2
> (via action) data_len=42 (unbuffered)
> >
> +udp,vlan_tci=0x0000,dl_src=50:54:00:00:00:09,dl_dst=50:54:00:00:00:0a,nw_src=10.1.1.2,nw_dst=10.1.1.1,nw_tos=0,nw_ecn=0,nw_ttl=0,tp_src=2,tp_dst=1
> udp_csum:0
> > +])
> > +
> > +dnl
> > +dnl Check that the directionality has been changed by force commit.
> > +dnl
> > +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep
> "orig=.src=10\.1\.1\.2,"], [], [dnl
> >
> +udp,orig=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1),reply=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2)
> > +])
> > +
> > +dnl OK, now send another packet from port 1 and see that it switches
> again
> > +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=1
> packet=50540000000a50540000000908004500001c000000000011a4cd0a0101010a0101020001000200080000
> actions=resubmit(,0)"])
> > +AT_CHECK([ovs-appctl revalidator/purge], [0])
> > +
> > +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep
> "orig=.src=10\.1\.1\.1,"], [], [dnl
> >
> +udp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2),reply=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1)
> > +])
> > +
> > +OVS_TRAFFIC_VSWITCHD_STOP
> > +AT_CLEANUP
> > +
> > +AT_SETUP([conntrack - ct flush by 5-tuple])
> > +CHECK_CONNTRACK()
> > +OVS_TRAFFIC_VSWITCHD_START()
> > +
> > +ADD_NAMESPACES(at_ns0, at_ns1)
> > +
> > +ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24")
> > +ADD_VETH_AFXDP(p1, at_ns1, br0, "10.1.1.2/24")
> > +
> > +AT_DATA([flows.txt], [dnl
> > +priority=1,action=drop
> > +priority=10,arp,action=normal
> > +priority=100,in_port=1,udp,action=ct(commit),2
> > +priority=100,in_port=2,udp,action=ct(zone=5,commit),1
> > +priority=100,in_port=1,icmp,action=ct(commit),2
> > +priority=100,in_port=2,icmp,action=ct(zone=5,commit),1
> > +])
> > +
> > +AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
> > +
> > +dnl Test UDP from port 1
> > +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=1
> packet=50540000000a50540000000908004500001c000000000011a4cd0a0101010a0101020001000200080000
> actions=resubmit(,0)"])
> > +
> > +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep
> "orig=.src=10\.1\.1\.1,"], [], [dnl
> >
> +udp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2),reply=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1)
> > +])
> > +
> > +AT_CHECK([ovs-appctl dpctl/flush-conntrack
> 'ct_nw_src=10.1.1.2,ct_nw_dst=10.1.1.1,ct_nw_proto=17,ct_tp_src=2,ct_tp_dst=1'])
> > +
> > +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep
> "orig=.src=10\.1\.1\.1,"], [1], [dnl
> > +])
> > +
> > +dnl Test UDP from port 2
> > +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=2
> packet=50540000000a50540000000908004500001c000000000011a4cd0a0101020a0101010002000100080000
> actions=resubmit(,0)"])
> > +
> > +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep
> "orig=.src=10\.1\.1\.2,"], [0], [dnl
> >
> +udp,orig=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1),reply=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2),zone=5
> > +])
> > +
> > +AT_CHECK([ovs-appctl dpctl/flush-conntrack zone=5
> 'ct_nw_src=10.1.1.1,ct_nw_dst=10.1.1.2,ct_nw_proto=17,ct_tp_src=1,ct_tp_dst=2'])
> > +
> > +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(10.1.1.2)], [0],
> [dnl
> > +])
> > +
> > +dnl Test ICMP traffic
> > +NS_CHECK_EXEC([at_ns1], [ping -q -c 3 -i 0.3 -w 2 10.1.1.1 |
> FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +
> > +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep
> "orig=.src=10\.1\.1\.2,"], [0], [stdout])
> > +AT_CHECK([cat stdout | FORMAT_CT(10.1.1.1)], [0],[dnl
> >
> +icmp,orig=(src=10.1.1.2,dst=10.1.1.1,id=<cleared>,type=8,code=0),reply=(src=10.1.1.1,dst=10.1.1.2,id=<cleared>,type=0,code=0),zone=5
> > +])
> > +
> > +ICMP_ID=`cat stdout | cut -d ',' -f4 | cut -d '=' -f2`
> >
> +ICMP_TUPLE=ct_nw_src=10.1.1.2,ct_nw_dst=10.1.1.1,ct_nw_proto=1,icmp_id=$ICMP_ID,icmp_type=8,icmp_code=0
> > +AT_CHECK([ovs-appctl dpctl/flush-conntrack zone=5 $ICMP_TUPLE])
> > +
> > +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep
> "orig=.src=10\.1\.1\.2,"], [1], [dnl
> > +])
> > +
> > +OVS_TRAFFIC_VSWITCHD_STOP
> > +AT_CLEANUP
> > +
> > +AT_SETUP([conntrack - IPv4 ping])
> > +CHECK_CONNTRACK()
> > +OVS_TRAFFIC_VSWITCHD_START()
> > +
> > +ADD_NAMESPACES(at_ns0, at_ns1)
> > +
> > +ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24")
> > +ADD_VETH_AFXDP(p1, at_ns1, br0, "10.1.1.2/24")
> > +
> > +dnl Allow any traffic from ns0->ns1. Only allow nd, return traffic from
> ns1->ns0.
> > +AT_DATA([flows.txt], [dnl
> > +priority=1,action=drop
> > +priority=10,arp,action=normal
> > +priority=100,in_port=1,icmp,action=ct(commit),2
> > +priority=100,in_port=2,icmp,ct_state=-trk,action=ct(table=0)
> > +priority=100,in_port=2,icmp,ct_state=+trk+est,action=1
> > +])
> > +
> > +AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
> > +
> > +dnl Pings from ns0->ns1 should work fine.
> > +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.2 |
> FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +
> > +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(10.1.1.2)], [0],
> [dnl
> >
> +icmp,orig=(src=10.1.1.1,dst=10.1.1.2,id=<cleared>,type=8,code=0),reply=(src=10.1.1.2,dst=10.1.1.1,id=<cleared>,type=0,code=0)
> > +])
> > +
> > +AT_CHECK([ovs-appctl dpctl/flush-conntrack])
> > +
> > +dnl Pings from ns1->ns0 should fail.
> > +NS_CHECK_EXEC([at_ns1], [ping -q -c 3 -i 0.3 -w 2 10.1.1.1 |
> FORMAT_PING], [0], [dnl
> > +7 packets transmitted, 0 received, 100% packet loss, time 0ms
> > +])
> > +
> > +OVS_TRAFFIC_VSWITCHD_STOP
> > +AT_CLEANUP
> > +
> > +AT_SETUP([conntrack - get_nconns and get/set_maxconns])
> > +CHECK_CONNTRACK()
> > +CHECK_CT_DPIF_SET_GET_MAXCONNS()
> > +CHECK_CT_DPIF_GET_NCONNS()
> > +OVS_TRAFFIC_VSWITCHD_START()
> > +
> > +ADD_NAMESPACES(at_ns0, at_ns1)
> > +
> > +ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24")
> > +ADD_VETH_AFXDP(p1, at_ns1, br0, "10.1.1.2/24")
> > +
> > +dnl Allow any traffic from ns0->ns1. Only allow nd, return traffic from
> ns1->ns0.
> > +AT_DATA([flows.txt], [dnl
> > +priority=1,action=drop
> > +priority=10,arp,action=normal
> > +priority=100,in_port=1,icmp,action=ct(commit),2
> > +priority=100,in_port=2,icmp,ct_state=-trk,action=ct(table=0)
> > +priority=100,in_port=2,icmp,ct_state=+trk+est,action=1
> > +])
> > +
> > +AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
> > +
> > +dnl Pings from ns0->ns1 should work fine.
> > +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.2 |
> FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +
> > +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(10.1.1.2)], [0],
> [dnl
> >
> +icmp,orig=(src=10.1.1.1,dst=10.1.1.2,id=<cleared>,type=8,code=0),reply=(src=10.1.1.2,dst=10.1.1.1,id=<cleared>,type=0,code=0)
> > +])
> > +
> > +AT_CHECK([ovs-appctl dpctl/ct-set-maxconns one-bad-dp], [2], [], [dnl
> > +ovs-vswitchd: maxconns missing or malformed (Invalid argument)
> > +ovs-appctl: ovs-vswitchd: server returned an error
> > +])
> > +
> > +AT_CHECK([ovs-appctl dpctl/ct-set-maxconns a], [2], [], [dnl
> > +ovs-vswitchd: maxconns missing or malformed (Invalid argument)
> > +ovs-appctl: ovs-vswitchd: server returned an error
> > +])
> > +
> > +AT_CHECK([ovs-appctl dpctl/ct-set-maxconns one-bad-dp 10], [2], [], [dnl
> > +ovs-vswitchd: datapath not found (Invalid argument)
> > +ovs-appctl: ovs-vswitchd: server returned an error
> > +])
> > +
> > +AT_CHECK([ovs-appctl dpctl/ct-get-maxconns one-bad-dp], [2], [], [dnl
> > +ovs-vswitchd: datapath not found (Invalid argument)
> > +ovs-appctl: ovs-vswitchd: server returned an error
> > +])
> > +
> > +AT_CHECK([ovs-appctl dpctl/ct-get-nconns one-bad-dp], [2], [], [dnl
> > +ovs-vswitchd: datapath not found (Invalid argument)
> > +ovs-appctl: ovs-vswitchd: server returned an error
> > +])
> > +
> > +AT_CHECK([ovs-appctl dpctl/ct-get-nconns], [], [dnl
> > +1
> > +])
> > +
> > +AT_CHECK([ovs-appctl dpctl/ct-get-maxconns], [], [dnl
> > +3000000
> > +])
> > +
> > +AT_CHECK([ovs-appctl dpctl/ct-set-maxconns 10], [], [dnl
> > +setting maxconns successful
> > +])
> > +
> > +AT_CHECK([ovs-appctl dpctl/ct-get-maxconns], [], [dnl
> > +10
> > +])
> > +
> > +AT_CHECK([ovs-appctl dpctl/flush-conntrack])
> > +
> > +AT_CHECK([ovs-appctl dpctl/ct-get-nconns], [], [dnl
> > +0
> > +])
> > +
> > +AT_CHECK([ovs-appctl dpctl/ct-get-maxconns], [], [dnl
> > +10
> > +])
> > +
> > +OVS_TRAFFIC_VSWITCHD_STOP
> > +AT_CLEANUP
> > +
> > +AT_SETUP([conntrack - IPv6 ping])
> > +CHECK_CONNTRACK()
> > +OVS_TRAFFIC_VSWITCHD_START()
> > +
> > +ADD_NAMESPACES(at_ns0, at_ns1)
> > +
> > +ADD_VETH_AFXDP(p0, at_ns0, br0, "fc00::1/96")
> > +ADD_VETH_AFXDP(p1, at_ns1, br0, "fc00::2/96")
> > +
> > +AT_DATA([flows.txt], [dnl
> > +
> > +dnl ICMPv6 echo request and reply go to table 1.  The rest of the
> traffic goes
> > +dnl through normal action.
> > +table=0,priority=10,icmp6,icmp_type=128,action=goto_table:1
> > +table=0,priority=10,icmp6,icmp_type=129,action=goto_table:1
> > +table=0,priority=1,action=normal
> > +
> > +dnl Allow everything from ns0->ns1. Only allow return traffic from
> ns1->ns0.
> > +table=1,priority=100,in_port=1,icmp6,action=ct(commit),2
> > +table=1,priority=100,in_port=2,icmp6,ct_state=-trk,action=ct(table=0)
> > +table=1,priority=100,in_port=2,icmp6,ct_state=+trk+est,action=1
> > +table=1,priority=1,action=drop
> > +])
> > +
> > +AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
> > +
> > +OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 1 fc00::2])
> > +
> > +dnl The above ping creates state in the connection tracker.  We're not
> > +dnl interested in that state.
> > +AT_CHECK([ovs-appctl dpctl/flush-conntrack])
> > +
> > +dnl Pings from ns1->ns0 should fail.
> > +NS_CHECK_EXEC([at_ns1], [ping6 -q -c 3 -i 0.3 -w 2 fc00::1 |
> FORMAT_PING], [0], [dnl
> > +7 packets transmitted, 0 received, 100% packet loss, time 0ms
> > +])
> > +
> > +dnl Pings from ns0->ns1 should work fine.
> > +NS_CHECK_EXEC([at_ns0], [ping6 -q -c 3 -i 0.3 -w 2 fc00::2 |
> FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +
> > +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(fc00::2)], [0],
> [dnl
> >
> +icmpv6,orig=(src=fc00::1,dst=fc00::2,id=<cleared>,type=128,code=0),reply=(src=fc00::2,dst=fc00::1,id=<cleared>,type=129,code=0)
> > +])
> > +
> > +OVS_TRAFFIC_VSWITCHD_STOP
> > +AT_CLEANUP
> >
>
Ilya Maximets April 26, 2019, 9:46 a.m. UTC | #3
On 25.04.2019 2:47, William Tu wrote:
> diff --git a/lib/netdev.c b/lib/netdev.c
> index 7d7ecf6f0946..c30016b34033 100644
> --- a/lib/netdev.c
> +++ b/lib/netdev.c
> @@ -145,6 +145,7 @@ netdev_initialize(void)
>          netdev_register_provider(&netdev_linux_class);
>          netdev_register_provider(&netdev_internal_class);
>          netdev_register_provider(&netdev_tap_class);
> +        netdev_register_provider(&netdev_afxdp_class);

It's better to move this under #ifdef HAVE_AF_XDP to not register
netdev class that will not work. Otherwise it'll be confusing for
users because OVS will report afxdp in a list of supported port
types and will allow port creation.

>          netdev_vport_tunnel_register();
>  #endif
>  #if defined(__FreeBSD__) || defined(__NetBSD__)
William Tu April 26, 2019, 3:47 p.m. UTC | #4
On Fri, Apr 26, 2019 at 2:46 AM Ilya Maximets <i.maximets@samsung.com>
wrote:

> On 25.04.2019 2:47, William Tu wrote:
> > diff --git a/lib/netdev.c b/lib/netdev.c
> > index 7d7ecf6f0946..c30016b34033 100644
> > --- a/lib/netdev.c
> > +++ b/lib/netdev.c
> > @@ -145,6 +145,7 @@ netdev_initialize(void)
> >          netdev_register_provider(&netdev_linux_class);
> >          netdev_register_provider(&netdev_internal_class);
> >          netdev_register_provider(&netdev_tap_class);
> > +        netdev_register_provider(&netdev_afxdp_class);
>
> It's better to move this under #ifdef HAVE_AF_XDP to not register
> netdev class that will not work. Otherwise it'll be confusing for
> users because OVS will report afxdp in a list of supported port
> types and will allow port creation.
>

Yes, thanks.
Will do it next version.
William

>
> >          netdev_vport_tunnel_register();
> >  #endif
> >  #if defined(__FreeBSD__) || defined(__NetBSD__)
>
William Tu April 27, 2019, 1:28 p.m. UTC | #5
On Thu, Apr 25, 2019 at 8:09 AM Ilya Maximets <i.maximets@samsung.com>
wrote:

> Hi.
>
> This is not a full review. Just a bunch of thoughts.
>
> See inline.
>
> Best regards, Ilya Maximets.
>
> On 25.04.2019 2:47, William Tu wrote:
> > The patch introduces experimental AF_XDP support for OVS netdev.
> > AF_XDP is a new address family working together with eBPF/XDP.
> > A socket with AF_XDP family can receive and send raw packets
> > from an eBPF/XDP program attached to the netdev.
> > For details introduction and configuration, see
> > Documentation/intro/install/afxdp.rst
> >
> > Signed-off-by: William Tu <u9012063@gmail.com>
> > Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
> > Co-authored-by: Yi-Hung Wei <yihung.wei@gmail.com>
> > ---
> > v1->v2:
> > - add a list to maintain unused umem elements
> > - remove copy from rx umem to ovs internal buffer
> > - use hugetlb to reduce misses (not much difference)
> > - use pmd mode netdev in OVS (huge performance improve)
> > - remove malloc dp_packet, instead put dp_packet in umem
> >
> > v2->v3:
> > - rebase on the OVS master, 7ab4b0653784
> >   ("configure: Check for more specific function to pull in pthread
> library.")
> > - remove the dependency on libbpf and dpif-bpf.
> >   instead, use the built-in XDP_ATTACH feature.
> > - data structure optimizations for better performance, see[1]
> > - more test cases support
> > v3:
> https://mail.openvswitch.org/pipermail/ovs-dev/2018-November/354179.html
> >
> > v3->v4:
> > - Use AF_XDP API provided by libbpf
> > - Remove the dependency on XDP_ATTACH kernel patch set
> > - Add documentation, bpf.rst
> >
> > v4->v5:
> > - rebase to master
> > - remove rfc, squash all into a single patch
> > - add --enable-afxdp, so by default, AF_XDP is not compiled
> > - add options: xdpmode=drv,skb
> > - add multiple queue and multiple PMD support, with options: n_rxq
> > - improve documentation, rename bpf.rst to af_xdp.rst
> >
> > v5->v6
> > - rebase to master, commit 0cdd5b13de91b98
> > - address errors from sparse and clang
> > - pass travis-ci test
> > - address feedback from Ben
> > - fix issues reported by 0-day robot
> > - improved documentation
> > ---
> >  Documentation/automake.mk             |   1 +
> >  Documentation/index.rst               |   1 +
> >  Documentation/intro/install/afxdp.rst | 366 +++++++++++++
> >  Documentation/intro/install/index.rst |   1 +
> >  acinclude.m4                          |  23 +
> >  configure.ac                          |   1 +
> >  lib/automake.mk                       |   7 +-
> >  lib/dp-packet.c                       |  16 +
> >  lib/dp-packet.h                       |  35 +-
> >  lib/dpif-netdev-perf.h                |  13 +
> >  lib/netdev-afxdp.c                    | 589 ++++++++++++++++++++
> >  lib/netdev-afxdp.h                    |  47 ++
> >  lib/netdev-linux.c                    |  89 +++-
> >  lib/netdev-linux.h                    |   1 +
> >  lib/netdev-provider.h                 |   1 +
> >  lib/netdev.c                          |   1 +
> >  lib/xdpsock.c                         | 210 ++++++++
> >  lib/xdpsock.h                         | 133 +++++
> >  tests/automake.mk                     |  17 +
> >  tests/system-afxdp-macros.at          | 153 ++++++
> >  tests/system-afxdp-testsuite.at       |  26 +
> >  tests/system-afxdp-traffic.at         | 978
> ++++++++++++++++++++++++++++++++++
> >  22 files changed, 2703 insertions(+), 6 deletions(-)
> >  create mode 100644 Documentation/intro/install/afxdp.rst
> >  create mode 100644 lib/netdev-afxdp.c
> >  create mode 100644 lib/netdev-afxdp.h
> >  create mode 100644 lib/xdpsock.c
> >  create mode 100644 lib/xdpsock.h
> >  create mode 100644 tests/system-afxdp-macros.at
> >  create mode 100644 tests/system-afxdp-testsuite.at
> >  create mode 100644 tests/system-afxdp-traffic.at
> >
> > diff --git a/Documentation/automake.mk b/Documentation/automake.mk
> > index 082438e09a33..11cc59efc881 100644
> > --- a/Documentation/automake.mk
> > +++ b/Documentation/automake.mk
> > @@ -10,6 +10,7 @@ DOC_SOURCE = \
> >       Documentation/intro/why-ovs.rst \
> >       Documentation/intro/install/index.rst \
> >       Documentation/intro/install/bash-completion.rst \
> > +     Documentation/intro/install/afxdp.rst \
> >       Documentation/intro/install/debian.rst \
> >       Documentation/intro/install/documentation.rst \
> >       Documentation/intro/install/distributions.rst \
> > diff --git a/Documentation/index.rst b/Documentation/index.rst
> > index 46261235c732..aa9e7c49f179 100644
> > --- a/Documentation/index.rst
> > +++ b/Documentation/index.rst
> > @@ -59,6 +59,7 @@ vSwitch? Start here.
> >    :doc:`intro/install/windows` |
> >    :doc:`intro/install/xenserver` |
> >    :doc:`intro/install/dpdk` |
> > +  :doc:`intro/install/afxdp` |
> >    :doc:`Installation FAQs <faq/releases>`
> >
> >  - **Tutorials:** :doc:`tutorials/faucet` |
> > diff --git a/Documentation/intro/install/afxdp.rst
> b/Documentation/intro/install/afxdp.rst
> > new file mode 100644
> > index 000000000000..a1e3317bbdb5
> > --- /dev/null
> > +++ b/Documentation/intro/install/afxdp.rst
> > @@ -0,0 +1,366 @@
> > +..
> > +      Licensed under the Apache License, Version 2.0 (the "License");
> you may
> > +      not use this file except in compliance with the License. You may
> obtain
> > +      a copy of the License at
> > +
> > +          http://www.apache.org/licenses/LICENSE-2.0
> > +
> > +      Unless required by applicable law or agreed to in writing,
> software
> > +      distributed under the License is distributed on an "AS IS" BASIS,
> WITHOUT
> > +      WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> See the
> > +      License for the specific language governing permissions and
> limitations
> > +      under the License.
> > +
> > +      Convention for heading levels in Open vSwitch documentation:
> > +
> > +      =======  Heading 0 (reserved for the title in a document)
> > +      -------  Heading 1
> > +      ~~~~~~~  Heading 2
> > +      +++++++  Heading 3
> > +      '''''''  Heading 4
> > +
> > +      Avoid deeper levels because they do not render well.
> > +
> > +
> > +========================
> > +Open vSwitch with AF_XDP
> > +========================
> > +
> > +This document describes how to build and install Open vSwitch using
> > +AF_XDP netdev.
> > +
> > +.. warning::
> > +  The AF_XDP support of Open vSwitch is considered 'experimental',
> > +  and it is not compiled in by default.
> > +
> > +Introduction
> > +------------
> > +AF_XDP, Address Family of the eXpress Data Path, is a new Linux socket
> type
> > +built upon the eBPF and XDP technology.  It is aims to have comparable
> > +performance to DPDK but cooperate better with existing kernel's
> networking
> > +stack.  An AF_XDP socket receives and sends packets from an eBPF/XDP
> program
> > +attached to the netdev, by-passing a couple of Linux kernel's
> subsystems.
> > +As a result, AF_XDP socket shows much better performance than AF_PACKET.
> > +For more details about AF_XDP, please see linux kernel's
> > +Documentation/networking/af_xdp.rst
> > +
> > +
> > +AF_XDP Netdev
> > +-------------
> > +OVS has a couple of netdev types, i.e., system, tap, or
> > +internal.  The AF_XDP feature adds a new netdev types called
> > +"afxdp", and implement its configuration, packet reception,
> > +and transmit functions.  Since the AF_XDP socket, xsk,
> > +operates in userspace, once ovs-vswitchd receives packets
> > +from xsk, the proposed architecture re-uses the existing
> > +userspace dpif-netdev datapath.  As a result, most of
> > +the packet processing happens at the userspace instead of
> > +linux kernel.
> > +
> > +::
> > +
> > +              |   +-------------------+
> > +              |   |    ovs-vswitchd   |<-->ovsdb-server
> > +              |   +-------------------+
> > +              |   |      ofproto      |<-->OpenFlow controllers
> > +              |   +--------+-+--------+
> > +              |   | netdev | |ofproto-|
> > +    userspace |   +--------+ |  dpif  |
> > +              |   | afxdp  | +--------+
> > +              |   | netdev | |  dpif  |
> > +              |   +---||---+ +--------+
> > +              |       ||     |  dpif- |
> > +              |       ||     | netdev |
> > +              |_      ||     +--------+
> > +                      ||
> > +               _  +---||-----+--------+
> > +              |   | AF_XDP prog +     |
> > +       kernel |   |   xsk_map         |
> > +              |_  +--------||---------+
> > +                           ||
> > +                        physical
> > +                           NIC
> > +
> > +
> > +Build requirements
> > +------------------
> > +
> > +In addition to the requirements described in :doc:`general`, building
> Open
> > +vSwitch with AF_XDP will require the following:
> > +
> > +- libbpf from kernel source tree (kernel 5.0.0 or later)
> > +
> > +- Linux kernel XDP support, with the following options (required)
> > +  ``_CONFIG_BPF=y``
> > +
> > +  ``_CONFIG_BPF_SYSCALL=y``
> > +
> > +  ``_CONFIG_XDP_SOCKETS=y``
> > +
> > +
> > +- The following optional Kconfig options are also recommended, but not
> > +  required:
> > +
> > +  ``_CONFIG_BPF_JIT=y`` (Performance)
> > +
> > +  ``_CONFIG_HAVE_BPF_JIT=y`` (Performance)
> > +
> > +  ``_CONFIG_XDP_SOCKETS_DIAG=y`` (Debugging)
> > +
> > +- If possible, run **./xdpsock -r -N -z -i <your device>** under
> > +  linux/samples/bpf.  This is the OVS indepedent benchmark tools for
> AF_XDP.
> > +  It makes sure your basic kernel requirements are met for AF_XDP.
> > +
> > +
> > +Installing
> > +----------
> > +For OVS to use AF_XDP netdev, it has to be configured with LIBBPF
> support.
> > +Frist, clone a recent version of Linux bpf-next tree::
> > +
> > +  git clone git://
> git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git
> > +
> > +Second, go into the Linux source directory and build libbpf in the tools
> > +directory::
> > +
> > +  cd bpf-next/
> > +  cd tools/lib/bpf/
> > +  make && make install
> > +  make install_headers
> > +
> > +.. note::
> > +   Make sure xsk.h and bpf.h are installed in system's library path,
> > +   e.g. /usr/local/include/bpf/ or /usr/include/bpf/
> > +
> > +Make sure the libbpf.so is installed correctly::
> > +
> > +  ldconfig
> > +  ldconfig -p | grep libbpf
> > +
> > +
> > +Third, ensure the standard OVS requirements are installed and
> > +bootstrap/configure the package::
> > +
> > +  ./boot.sh && ./configure --enable-afxdp
> > +
> > +Finally, build and install OVS::
> > +
> > +  make && make install
> > +
> > +To kick start end-to-end autotesting::
> > +
> > +  uname -a # make sure having 5.0+ kernel
> > +  make check-afxdp
> > +
> > +if a test case fails, check the log at::
> > +
> > +  cat
> tests/system-afxdp-testsuite.dir/<number>/system-afxdp-testsuite.log
> > +
> > +
> > +Setup AF_XDP netdev
> > +-------------------
> > +Before running OVS with AF_XDP, make sure the libbpf and libelf are
> > +set-up right::
> > +
> > +  ldd vswitchd/ovs-vswitchd
> > +
> > +Open vSwitch should be started using userspace datapath as described
> > +in :doc:`general`::
> > +
> > +  ovs-vswitchd --disable-system
> > +  ovs-vsctl -- add-br br0 -- set Bridge br0 datapath_type=netdev
> > +
> > +.. note::
> > +   OVS AF_XDP netdev is using the userspace datapath, the same datapath
> > +   as used by OVS-DPDK.  So it requires --disable-system for
> ovs-vswitchd
> > +   and datapath_type=netdev when adding a new bridge.
> > +
> > +Make sure your device support AF_XDP, and to use 1 PMD (on core 4)
> > +on 1 queue (queue 0) device, configure these options: **pmd-cpu-mask,
> > +pmd-rxq-affinity, and n_rxq**. The **xdpmode** can be "drv" or "skb"::
> > +
> > +  ethtool -L enp2s0 combined 1
> > +  ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x10
> > +  ovs-vsctl add-port br0 enp2s0 -- set interface enp2s0 type="afxdp" \
> > +    options:n_rxq=1 options:xdpmode=drv \
> > +    other_config:pmd-rxq-affinity="0:4"
> > +
> > +Or, use 4 pmds/cores and 4 queues by doing::
> > +
> > +  ethtool -L enp2s0 combined 4
> > +  ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x36
> > +  ovs-vsctl add-port br0 enp2s0 -- set interface enp2s0 type="afxdp" \
> > +    options:n_rxq=4 options:xdpmode=drv \
> > +    other_config:pmd-rxq-affinity="0:1,1:2,2:3,3:4"
> > +
> > +To validate that the bridge has successfully instantiated, you can use
> the::
> > +
> > +  ovs-vsctl show
> > +
> > +should show something like::
> > +
> > +  Port "ens802f0"
> > +   Interface "ens802f0"
> > +      type: afxdp
> > +      options: {n_rxq="1", xdpmode=drv}
> > +
> > +Otherwise, enable debug by::
> > +
> > +  ovs-appctl vlog/set netdev_afxdp::dbg
> > +
> > +
> > +References
> > +----------
> > +Most of the design details are described in the paper presented at
> > +Linux Plumber 2018, "Bringing the Power of eBPF to Open vSwitch"[1],
> > +section 4, and slides[2][4].
> > +"The Path to DPDK Speeds for AF XDP"[3] gives a very good introduction
> > +about AF_XDP current and future work.
> > +
> > +
> > +[1] http://vger.kernel.org/lpc_net2018_talks/ovs-ebpf-afxdp.pdf
> > +
> > +[2]
> http://vger.kernel.org/lpc_net2018_talks/ovs-ebpf-lpc18-presentation.pdf
> > +
> > +[3]
> http://vger.kernel.org/lpc_net2018_talks/lpc18_paper_af_xdp_perf-v2.pdf
> > +
> > +[4]
> https://ovsfall2018.sched.com/event/IO7p/fast-userspace-ovs-with-afxdp
> > +
> > +
> > +Performance Tuning
> > +------------------
> > +The name of the game is to keep your CPU running in userspace, allowing
> PMD
> > +to keep polling the AF_XDP queues without any interferences from kernel.
> > +
> > +#. Make sure everything is in the same NUMA node (memory used by
> AF_XDP, pmd
> > +   running cores, device plug-in slot)
> > +
> > +#. Isolate your CPU by doing isolcpu at grub configure.
> > +
> > +#. IRQ should not set to pmd running core.
> > +
> > +#. The Spectre and Meltdown fixes increase the overhead of system calls.
> > +
> > +Debugging performance issue
> > +~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > +While running the traffic, use linux perf tool to see where your cpu
> > +spends its cycle::
> > +
> > +  cd bpf-next/tools/perf
> > +  make
> > +  ./perf record -p `pidof ovs-vswitchd` sleep 10
> > +  ./perf report
> > +
> > +Measure your system call rate by doing::
> > +
> > +  pstree -p `pidof ovs-vswitchd`
> > +  strace -c -p <your pmd's PID>
> > +
> > +Or, use OVS pmd tool::
> > +
> > +  ovs-appctl dpif-netdev/pmd-stats-show
> > +
> > +
> > +Example Script
> > +--------------
> > +
> > +Below is a script using namespaces and veth peer::
> > +
> > +  #!/bin/bash
> > +  ovs-vswitchd --no-chdir --pidfile -vvconn -vofproto_dpif -vunixctl \
> > +    --disable-system --detach \
> > +  ovs-vsctl -- add-br br0 -- set Bridge br0 \
> > +    protocols=OpenFlow10,OpenFlow11,OpenFlow12,OpenFlow13,OpenFlow14 \
> > +    fail-mode=secure datapath_type=netdev
> > +  ovs-vsctl -- add-br br0 -- set Bridge br0 datapath_type=netdev
> > +
> > +  ip netns add at_ns0
> > +  ovs-appctl vlog/set netdev_afxdp::dbg
> > +
> > +  ip link add p0 type veth peer name afxdp-p0
> > +  ip link set p0 netns at_ns0
> > +  ip link set dev afxdp-p0 up
> > +  ovs-vsctl add-port br0 afxdp-p0 -- \
> > +    set interface afxdp-p0 external-ids:iface-id="p0" type="afxdp"
> > +
> > +  ip netns exec at_ns0 sh << NS_EXEC_HEREDOC
> > +  ip addr add "10.1.1.1/24" dev p0
> > +  ip link set dev p0 up
> > +  NS_EXEC_HEREDOC
> > +
> > +  ip netns add at_ns1
> > +  ip link add p1 type veth peer name afxdp-p1
> > +  ip link set p1 netns at_ns1
> > +  ip link set dev afxdp-p1 up
> > +
> > +  ovs-vsctl add-port br0 afxdp-p1 -- \
> > +    set interface afxdp-p1 external-ids:iface-id="p1" type="afxdp"
> > +  ip netns exec at_ns1 sh << NS_EXEC_HEREDOC
> > +  ip addr add "10.1.1.2/24" dev p1
> > +  ip link set dev p1 up
> > +  NS_EXEC_HEREDOC
> > +
> > +  ip netns exec at_ns0 ping -i .2 10.1.1.2
> > +
> > +
> > +Limitations/Known Issues
> > +------------------------
> > +#. Device's numa ID is always 0, need a way to find numa id from a
> netdev.
> > +#. No QoS support because AF_XDP netdev by-pass the Linux TC layer. A
> possible
> > +   work-around is to use OpenFlow meter action.
> > +#. AF_XDP device added to bridge, remove, and added again will fail.
> > +#. Most of the tests are done using i40e single port. Multiple ports and
> > +   also ixgbe driver also needs to be tested.
> > +#. No latency test result (TODO items)
> > +
> > +
> > +make check-afxdp
> > +----------------
> > +When executing 'make check-afxdp', OVS creates namespaces, sets up
> AF_XDP on
> > +veth devices and kicks start the testing.  So far we have the following
> test
> > +cases::
> > +
> > + AF_XDP netdev datapath-sanity
> > +
> > +  1: datapath - ping between two ports               ok
> > +  2: datapath - ping between two ports on vlan       ok
> > +  3: datapath - ping6 between two ports              ok
> > +  4: datapath - ping6 between two ports on vlan      ok
> > +  5: datapath - ping over vxlan tunnel               ok
> > +  6: datapath - ping over vxlan6 tunnel              ok
> > +  7: datapath - ping over gre tunnel                 ok
> > +  8: datapath - ping over erspan v1 tunnel           ok
> > +  9: datapath - ping over erspan v2 tunnel           ok
> > + 10: datapath - ping over ip6erspan v1 tunnel        ok
> > + 11: datapath - ping over ip6erspan v2 tunnel        ok
> > + 12: datapath - ping over geneve tunnel              ok
> > + 13: datapath - ping over geneve6 tunnel             ok
> > + 14: datapath - clone action                         ok
> > + 15: datapath - basic truncate action                ok
> > +
> > + conntrack
> > +
> > + 16: conntrack - controller                          ok
> > + 17: conntrack - force commit                        ok
> > + 18: conntrack - ct flush by 5-tuple                 ok
> > + 19: conntrack - IPv4 ping                           ok
> > + 20: conntrack - get_nconns and get/set_maxconns     ok
> > + 21: conntrack - IPv6 ping                           ok
> > +
> > + system-ovn
> > +
> > + 22: ovn -- 2 LRs connected via LS, gateway router, SNAT and DNAT ok
> > + 23: ovn -- 2 LRs connected via LS, gateway router, easy SNAT ok
> > + 24: ovn -- multiple gateway routers, SNAT and DNAT  ok
> > + 25: ovn -- load-balancing                           ok
> > + 26: ovn -- load-balancing - same subnet.            ok
> > + 27: ovn -- load balancing in gateway router         ok
> > + 28: ovn -- multiple gateway routers, load-balancing ok
> > + 29: ovn -- load balancing in router with gateway router port ok
> > + 30: ovn -- DNAT and SNAT on distributed router - N/S ok
> > + 31: ovn -- DNAT and SNAT on distributed router - E/W ok
> > +
> > +
> > +Bug Reporting
> > +-------------
> > +
> > +Please report problems to dev@openvswitch.org.
> > diff --git a/Documentation/intro/install/index.rst
> b/Documentation/intro/install/index.rst
> > index 3193c736cf17..c27a9c9d16ff 100644
> > --- a/Documentation/intro/install/index.rst
> > +++ b/Documentation/intro/install/index.rst
> > @@ -45,6 +45,7 @@ Installation from Source
> >     xenserver
> >     userspace
> >     dpdk
> > +   afxdp
> >
> >  Installation from Packages
> >  --------------------------
> > diff --git a/acinclude.m4 b/acinclude.m4
> > index 301aeb70d82a..d80f2494d514 100644
> > --- a/acinclude.m4
> > +++ b/acinclude.m4
> > @@ -221,6 +221,29 @@ AC_DEFUN([OVS_FIND_DEPENDENCY], [
> >    ])
> >  ])
> >
> > +dnl OVS_CHECK_LINUX_AF_XDP
> > +dnl
> > +dnl Check both Linux kernel AF_XDP and libbpf support
> > +AC_DEFUN([OVS_CHECK_LINUX_AF_XDP], [
> > +  AC_MSG_CHECKING([whether AF_XDP is supported])
> > +  AC_ARG_ENABLE([afxdp],
> > +                [AC_HELP_STRING([--enable-afxdp], [Enable AF-XDP
> support])],
> > +                [], [enable_afxdp=no])
> > +  AC_CHECK_HEADER([bpf/libbpf.h],
> > +                  [HAVE_LIBBPF=yes],
> > +                  [HAVE_LIBBPF=no])
> > +  AC_CHECK_HEADER([linux/if_xdp.h],
> > +                  [HAVE_IF_XDP=yes],
> > +                  [HAVE_IF_XDP=no])
> > +  AM_CONDITIONAL([SUPPORT_AF_XDP],
> > +                 [test "$enable_afxdp" = yes && test "$HAVE_LIBBPF" =
> yes && test "$HAVE_IF_XDP" = yes])
> > +  AM_COND_IF([SUPPORT_AF_XDP], [
> > +    AC_DEFINE([HAVE_AF_XDP], [1], [Define to 1 if AF-XDP support is
> available and enabled.])
> > +    LIBBPF_LDADD=" -lbpf -lelf"
> > +    AC_SUBST([LIBBPF_LDADD])
> > +  ])
> > +])
> > +
>
> I think that configure should fail in case we have no required headers.
> It's confusing that I explicitly enabled afxdp, but OVS was built without
> its support.
> One more thing is that AC_MSG_CHECKING requires subsequent AC_MSG_RESULT,
> otherwise it will look not good.
>
> Suggesting following incremental:
>
> diff --git a/acinclude.m4 b/acinclude.m4
> index d80f2494d..c919af570 100644
> --- a/acinclude.m4
> +++ b/acinclude.m4
> @@ -225,23 +225,26 @@ dnl OVS_CHECK_LINUX_AF_XDP
>  dnl
>  dnl Check both Linux kernel AF_XDP and libbpf support
>  AC_DEFUN([OVS_CHECK_LINUX_AF_XDP], [
> -  AC_MSG_CHECKING([whether AF_XDP is supported])
>    AC_ARG_ENABLE([afxdp],
>                  [AC_HELP_STRING([--enable-afxdp], [Enable AF-XDP
> support])],
>                  [], [enable_afxdp=no])
> -  AC_CHECK_HEADER([bpf/libbpf.h],
> -                  [HAVE_LIBBPF=yes],
> -                  [HAVE_LIBBPF=no])
> -  AC_CHECK_HEADER([linux/if_xdp.h],
> -                  [HAVE_IF_XDP=yes],
> -                  [HAVE_IF_XDP=no])
> -  AM_CONDITIONAL([SUPPORT_AF_XDP],
> -                 [test "$enable_afxdp" = yes && test "$HAVE_LIBBPF" = yes
> && test "$HAVE_IF_XDP" = yes])
> -  AM_COND_IF([SUPPORT_AF_XDP], [
> -    AC_DEFINE([HAVE_AF_XDP], [1], [Define to 1 if AF-XDP support is
> available and enabled.])
> +  AC_MSG_CHECKING([whether AF_XDP is enabled])
> +  if test "$enable_afxdp" != yes; then
> +    AC_MSG_RESULT([no])
> +  else
> +    AC_MSG_RESULT([yes])
> +
> +    AC_CHECK_HEADER([bpf/libbpf.h], [],
> +      [AC_MSG_ERROR([unable to find bpf/libbpf.h for AF_XDP support])])
> +
> +    AC_CHECK_HEADER([linux/if_xdp.h], [],
> +      [AC_MSG_ERROR([unable to find linux/if_xdp.h for AF_XDP support])])
> +
> +    AC_DEFINE([HAVE_AF_XDP], [1],
> +              [Define to 1 if AF-XDP support is available and enabled.])
>      LIBBPF_LDADD=" -lbpf -lelf"
>      AC_SUBST([LIBBPF_LDADD])
> -  ])
> +  fi
>  ])
>
>  dnl OVS_CHECK_DPDK
> ---
>
>
> >  dnl OVS_CHECK_DPDK
> >  dnl
> >  dnl Configure DPDK source tree
> > diff --git a/configure.ac b/configure.ac
> > index 505e3d041e93..29c90b73f836 100644
> > --- a/configure.ac
> > +++ b/configure.ac
> > @@ -99,6 +99,7 @@ OVS_CHECK_SPHINX
> >  OVS_CHECK_DOT
> >  OVS_CHECK_IF_DL
> >  OVS_CHECK_STRTOK_R
> > +OVS_CHECK_LINUX_AF_XDP
> >  AC_CHECK_DECLS([sys_siglist], [], [], [[#include <signal.h>]])
> >  AC_CHECK_MEMBERS([struct stat.st_mtim.tv_nsec, struct
> stat.st_mtimensec],
> >    [], [], [[#include <sys/stat.h>]])
> > diff --git a/lib/automake.mk b/lib/automake.mk
> > index cc5dccf39d6b..8b9df5635bbe 100644
> > --- a/lib/automake.mk
> > +++ b/lib/automake.mk
> > @@ -9,6 +9,7 @@ lib_LTLIBRARIES += lib/libopenvswitch.la
> >
> >  lib_libopenvswitch_la_LIBADD = $(SSL_LIBS)
> >  lib_libopenvswitch_la_LIBADD += $(CAPNG_LDADD)
> > +lib_libopenvswitch_la_LIBADD += $(LIBBPF_LDADD)
> >
> >  if WIN32
> >  lib_libopenvswitch_la_LIBADD += ${PTHREAD_LIBS}
> > @@ -327,7 +328,11 @@ lib_libopenvswitch_la_SOURCES = \
> >       lib/lldp/lldpd.c \
> >       lib/lldp/lldpd.h \
> >       lib/lldp/lldpd-structs.c \
> > -     lib/lldp/lldpd-structs.h
> > +     lib/lldp/lldpd-structs.h \
> > +     lib/xdpsock.c \
> > +     lib/xdpsock.h \
> > +     lib/netdev-afxdp.c \
> > +     lib/netdev-afxdp.h
>
> Maybe it's better to move all these files under #ifdef HAVE_AF_XDP ?
>
> >
> >  if WIN32
> >  lib_libopenvswitch_la_SOURCES += \
> > diff --git a/lib/dp-packet.c b/lib/dp-packet.c
> > index 0976a35e758b..a61552f72988 100644
> > --- a/lib/dp-packet.c
> > +++ b/lib/dp-packet.c
> > @@ -22,6 +22,9 @@
> >  #include "netdev-dpdk.h"
> >  #include "openvswitch/dynamic-string.h"
> >  #include "util.h"
> > +#ifdef HAVE_AF_XDP
> > +#include "xdpsock.h"
> > +#endif
> >
> >  static void
> >  dp_packet_init__(struct dp_packet *b, size_t allocated, enum
> dp_packet_source source)
> > @@ -122,6 +125,16 @@ dp_packet_uninit(struct dp_packet *b)
> >               * created as a dp_packet */
> >              free_dpdk_buf((struct dp_packet*) b);
> >  #endif
> > +        } else if (b->source == DPBUF_AFXDP) {
> > +#ifdef HAVE_AF_XDP
> > +            struct dp_packet_afxdp *xpacket;
> > +
> > +            xpacket = dp_packet_cast_afxdp(b);
> > +            if (xpacket->mpool) {
> > +                umem_elem_push(xpacket->mpool, dp_packet_base(b));
> > +            }
> > +#endif
>
> Why not making the same trick as we have for DPDK few lines above?
> i.e. wrap this part in a function like 'free_afxdp_buf' and move it
> to the netdev-afxdp.c ? You will not need to expose so many internals
> to generic code. dp_packet_cast_afxdp() will also be moved there along
> with 'struct dp_packet_afxdp'.
>
> BTW, I hope, someday, I'll finally implement 'dp-packet-memory-provider'
> abstraction for OVS.
>

Hi Ilya,

Can you share more detail about this idea, dp-packet-memory-provider?
Why do we need it?

Thanks
William


>
> > +            return;
> >          }
> >      }
> >  }
> > @@ -248,6 +261,8 @@ dp_packet_resize__(struct dp_packet *b, size_t
> new_headroom, size_t new_tailroom
> >      case DPBUF_STACK:
> >          OVS_NOT_REACHED();
> >
> > +    case DPBUF_AFXDP:
> > +        OVS_NOT_REACHED();
>
> Some space required between cases.
>
> >      case DPBUF_STUB:
> >          b->source = DPBUF_MALLOC;
> >          new_base = xmalloc(new_allocated);
> > @@ -433,6 +448,7 @@ dp_packet_steal_data(struct dp_packet *b)
> >  {
> >      void *p;
> >      ovs_assert(b->source != DPBUF_DPDK);
> > +    ovs_assert(b->source != DPBUF_AFXDP);
> >
> >      if (b->source == DPBUF_MALLOC && dp_packet_data(b) ==
> dp_packet_base(b)) {
> >          p = dp_packet_data(b);
> > diff --git a/lib/dp-packet.h b/lib/dp-packet.h
> > index a5e9ade1244a..774728eef330 100644
> > --- a/lib/dp-packet.h
> > +++ b/lib/dp-packet.h
> > @@ -25,6 +25,10 @@
> >  #include <rte_mbuf.h>
> >  #endif
> >
> > +#ifdef HAVE_AF_XDP
> > +#include "lib/xdpsock.h"
> > +#endif
> > +
> >  #include "netdev-dpdk.h"
> >  #include "openvswitch/list.h"
> >  #include "packets.h"
> > @@ -42,6 +46,7 @@ enum OVS_PACKED_ENUM dp_packet_source {
> >      DPBUF_DPDK,                /* buffer data is from DPDK allocated
> memory.
> >                                  * ref to dp_packet_init_dpdk() in
> dp-packet.c.
> >                                  */
> > +    DPBUF_AFXDP,                /* buffer data from XDP frame */
>
> Please, move the comment one space left.
>
> >  };
> >
> >  #define DP_PACKET_CONTEXT_SIZE 64
> > @@ -89,6 +94,20 @@ struct dp_packet {
> >      };
> >  };
> >
> > +struct dp_packet_afxdp {
> > +    struct umem_pool *mpool;
> > +    struct dp_packet packet;
> > +};
> > +
> > +#if HAVE_AF_XDP
> > +static struct dp_packet_afxdp *
> > +dp_packet_cast_afxdp(const struct dp_packet *d OVS_UNUSED)
> > +{
> > +    ovs_assert(d->source == DPBUF_AFXDP);
> > +    return CONTAINER_OF(d, struct dp_packet_afxdp, packet);
> > +}
> > +#endif
> > +
> >  static inline void *dp_packet_data(const struct dp_packet *);
> >  static inline void dp_packet_set_data(struct dp_packet *, void *);
> >  static inline void *dp_packet_base(const struct dp_packet *);
> > @@ -183,7 +202,21 @@ dp_packet_delete(struct dp_packet *b)
> >              free_dpdk_buf((struct dp_packet*) b);
> >              return;
> >          }
> > -
> > +        if (b->source == DPBUF_AFXDP) {
> > +#ifdef HAVE_AF_XDP
> > +            struct dp_packet_afxdp *xpacket;
> > +
> > +            /* if a packet is received from afxdp port,
> > +             * and tx to a system port. Then we need to
> > +             * push the rx umem back here
> > +             */
> > +            xpacket = dp_packet_cast_afxdp(b);
> > +            if (xpacket->mpool) {
> > +                umem_elem_push(xpacket->mpool, dp_packet_base(b));
> > +            }
> > +#endif
> > +            return;
> > +        }
> >          dp_packet_uninit(b);
> >          free(b);
> >      }
> > diff --git a/lib/dpif-netdev-perf.h b/lib/dpif-netdev-perf.h
> > index 859c05613ddf..e47cf73bf3c9 100644
> > --- a/lib/dpif-netdev-perf.h
> > +++ b/lib/dpif-netdev-perf.h
> > @@ -198,6 +198,19 @@ cycles_counter_update(struct pmd_perf_stats *s)
> >  {
> >  #ifdef DPDK_NETDEV
> >      return s->last_tsc = rte_get_tsc_cycles();
> > +#elif HAVE_AF_XDP
> > +    union {
> > +        uint64_t tsc_64;
> > +        struct {
> > +            uint32_t lo_32;
> > +            uint32_t hi_32;
> > +        };
> > +    } tsc;
> > +    asm volatile("rdtsc" :
> > +             "=a" (tsc.lo_32),
> > +             "=d" (tsc.hi_32));
>
> We need to check that we're on x86 machine.
> Build should fail, I think. For now, you may add following code
> to the head of netdev-afxdp.c:
>
> #if !defined(__i386__) && !defined(__x86_64__)
> #error AF_XDP supported only for Linux on x86 or x86_64
> #endif
>
> > +
> > +    return s->last_tsc = tsc.tsc_64;
> >  #else
> >      return s->last_tsc = 0;
> >  #endif
> > diff --git a/lib/netdev-afxdp.c b/lib/netdev-afxdp.c
> > new file mode 100644
> > index 000000000000..4c71061fc102
> > --- /dev/null
> > +++ b/lib/netdev-afxdp.c
> > @@ -0,0 +1,589 @@
> > +/*
> > + * Copyright (c) 2018, 2019 Nicira, Inc.
> > + *
> > + * Licensed under the Apache License, Version 2.0 (the "License");
> > + * you may not use this file except in compliance with the License.
> > + * You may obtain a copy of the License at:
> > + *
> > + *     http://www.apache.org/licenses/LICENSE-2.0
> > + *
> > + * Unless required by applicable law or agreed to in writing, software
> > + * distributed under the License is distributed on an "AS IS" BASIS,
> > + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
> implied.
> > + * See the License for the specific language governing permissions and
> > + * limitations under the License.
> > + */
> > +
> > +#include <config.h>
> > +#ifdef HAVE_AF_XDP
> > +#include "netdev-linux.h"
> > +#include <errno.h>
> > +#include <fcntl.h>
> > +#include <sys/types.h>
> > +#include <netinet/in.h>
> > +#include <arpa/inet.h>
> > +#include <inttypes.h>
> > +#include <sys/ioctl.h>
> > +#include <sys/socket.h>
> > +#include <sys/utsname.h>
> > +#include <netpacket/packet.h>
> > +#include <net/if.h>
> > +#include <net/if_arp.h>
> > +#include <net/route.h>
> > +#include <poll.h>
> > +#include <stdlib.h>
> > +#include <string.h>
> > +#include <unistd.h>
> > +
> > +#include "coverage.h"
> > +#include "dp-packet.h"
> > +#include "dpif-netlink.h"
> > +#include "dpif-netdev.h"
> > +#include "openvswitch/dynamic-string.h"
> > +#include "fatal-signal.h"
> > +#include "hash.h"
> > +#include "openvswitch/hmap.h"
> > +#include "netdev-provider.h"
> > +#include "netdev-tc-offloads.h"
> > +#include "netdev-vport.h"
> > +#include "netlink-notifier.h"
> > +#include "netlink-socket.h"
> > +#include "netlink.h"
> > +#include "netnsid.h"
> > +#include "openvswitch/ofpbuf.h"
> > +#include "openflow/openflow.h"
> > +#include "ovs-atomic.h"
> > +#include "packets.h"
> > +#include "openvswitch/poll-loop.h"
> > +#include "rtnetlink.h"
> > +#include "openvswitch/shash.h"
> > +#include "socket-util.h"
> > +#include "sset.h"
> > +#include "tc.h"
> > +#include "timer.h"
> > +#include "unaligned.h"
> > +#include "openvswitch/vlog.h"
> > +#include "util.h"
> > +#include "netdev-afxdp.h"
> > +
> > +#include <linux/if_ether.h>
> > +#include <linux/if_tun.h>
> > +#include <linux/types.h>
> > +#include <linux/ethtool.h>
> > +#include <linux/mii.h>
> > +#include <linux/rtnetlink.h>
> > +#include <linux/sockios.h>
> > +#include <linux/if_xdp.h>
> > +#include "xdpsock.h"
> > +
> > +#ifndef SOL_XDP
> > +#define SOL_XDP 283
> > +#endif
> > +#ifndef AF_XDP
> > +#define AF_XDP 44
> > +#endif
> > +#ifndef PF_XDP
> > +#define PF_XDP AF_XDP
> > +#endif
> > +
> > +VLOG_DEFINE_THIS_MODULE(netdev_afxdp);
> > +static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
> > +
> > +#define UMEM2DESC(elem, base) ((uint64_t)((char *)elem - (char *)base))
> > +#define UMEM2XPKT(base, i) \
> > +    ALIGNED_CAST(struct dp_packet_afxdp *, (char *)base + \
> > +    i * sizeof(struct dp_packet_afxdp))
> > +
> > +static uint32_t opt_xdp_bind_flags = XDP_COPY;
> > +static uint32_t opt_xdp_flags =
> > +                    XDP_FLAGS_UPDATE_IF_NOEXIST | XDP_FLAGS_SKB_MODE;
> > +#ifdef USE_DRVMODE_DEFAULT
>
> If I'll define this, build will fail.
> Should there be ifdef-else-end ?
>
> > +static uint32_t opt_xdp_bind_flags = XDP_ZEROCOPY;
> > +static uint32_t opt_xdp_flags =
> > +                    XDP_FLAGS_UPDATE_IF_NOEXIST | XDP_FLAGS_DRV_MODE;
> > +#endif
> > +static uint32_t prog_id;
> > +
> > +static struct xsk_umem_info *xsk_configure_umem(void *buffer, uint64_t
> size)
> > +{
> > +    struct xsk_umem_info *umem;
> > +    int ret;
> > +    int i;
> > +
> > +    umem = xcalloc(1, sizeof(*umem));
> > +    if (!umem) {
> > +        VLOG_FATAL("xsk config umem failed (%s)", ovs_strerror(errno));
>
> xcalloc can't fail.
>
> > +    }
> > +
> > +    ret = xsk_umem__create(&umem->umem, buffer, size, &umem->fq,
> &umem->cq,
> > +                           NULL);
> > +
> > +    if (ret) {
> > +        VLOG_FATAL("xsk umem create failed (%s) mode: %s",
> > +            ovs_strerror(errno),
> > +            opt_xdp_bind_flags == XDP_COPY ? "SKB": "DRV");
>
> Why so FATAL? Can we just return NULL and fail the
> netdev_linux_rxq_construct?
>
> > +    }
> > +
> > +    umem->buffer = buffer;
> > +
> > +    /* set-up umem pool */
> > +    umem_pool_init(&umem->mpool, NUM_FRAMES);
> > +
> > +    for (i = NUM_FRAMES - 1; i >= 0; i--) {
> > +        struct umem_elem *elem;
> > +
> > +        elem = ALIGNED_CAST(struct umem_elem *,
> > +                            (char *)umem->buffer + i * FRAME_SIZE);
> > +        umem_elem_push(&umem->mpool, elem);
> > +    }
> > +
> > +    /* set-up metadata */
> > +    xpacket_pool_init(&umem->xpool, NUM_FRAMES);
> > +
> > +    VLOG_DBG("%s xpacket pool from %p to %p", __func__,
> > +              umem->xpool.array,
> > +              (char *)umem->xpool.array +
> > +              NUM_FRAMES * sizeof(struct dp_packet_afxdp));
> > +
> > +    for (i = NUM_FRAMES - 1; i >= 0; i--) {
> > +        struct dp_packet_afxdp *xpacket;
> > +        struct dp_packet *packet;
> > +
> > +        xpacket = UMEM2XPKT(umem->xpool.array, i);
> > +        xpacket->mpool = &umem->mpool;
> > +
> > +        packet = &xpacket->packet;
> > +        packet->source = DPBUF_AFXDP;
> > +    }
> > +
> > +    return umem;
> > +}
> > +
> > +static struct xsk_socket_info *
> > +xsk_configure_socket(struct xsk_umem_info *umem, uint32_t ifindex,
> > +                     uint32_t queue_id)
> > +{
> > +    struct xsk_socket_config cfg;
> > +    struct xsk_socket_info *xsk;
> > +    char devname[IF_NAMESIZE];
> > +    uint32_t idx;
> > +    int ret;
> > +    int i;
> > +
> > +    xsk = xcalloc(1, sizeof(*xsk));
> > +    if (!xsk) {
> > +        VLOG_FATAL("xsk calloc failed (%s)", ovs_strerror(errno));
>
> xcalloc can't fail.
>
> > +    }
> > +
> > +    xsk->umem = umem;
> > +    cfg.rx_size = CONS_NUM_DESCS;
> > +    cfg.tx_size = PROD_NUM_DESCS;
> > +    cfg.libbpf_flags = 0;
> > +    cfg.xdp_flags = opt_xdp_flags;
> > +    cfg.bind_flags = opt_xdp_bind_flags;
> > +
> > +    if (if_indextoname(ifindex, devname) == NULL) {
> > +        VLOG_FATAL("ifindex %d devname failed (%s)",
> > +                   ifindex, ovs_strerror(errno));
>
> Every little misconfiguration will lead to aborting. It's probably OK
> for the experimantal feature, but I don't like this.
>
> > +    }
> > +
> > +    ret = xsk_socket__create(&xsk->xsk, devname, queue_id, umem->umem,
> > +                             &xsk->rx, &xsk->tx, &cfg);
> > +    if (ret) {
> > +        VLOG_FATAL("xsk_socket_create failed (%s) mode: %s qid: %d",
> > +                   ovs_strerror(errno),
> > +                   opt_xdp_bind_flags == XDP_COPY ? "SKB": "DRV",
> > +                   queue_id);
> > +    }
> > +
> > +    /* make sure the XDP program is there */
> > +    ret = bpf_get_link_xdp_id(ifindex, &prog_id, opt_xdp_flags);
> > +    if (ret) {
> > +        VLOG_FATAL("get XDP prog ID failed (%s)", ovs_strerror(errno));
> > +    }
> > +
> > +    ret = xsk_ring_prod__reserve(&xsk->umem->fq,
> > +                                 PROD_NUM_DESCS,
> > +                                 &idx);
> > +    if (ret != PROD_NUM_DESCS) {
> > +        VLOG_FATAL("fq set-up failed (%s)", ovs_strerror(errno));
> > +    }
> > +
> > +    for (i = 0;
> > +         i < PROD_NUM_DESCS * FRAME_SIZE;
> > +         i += FRAME_SIZE) {
> > +        struct umem_elem *elem;
> > +        uint64_t addr;
> > +
> > +        elem = umem_elem_pop(&xsk->umem->mpool);
> > +        addr = UMEM2DESC(elem, xsk->umem->buffer);
> > +
> > +        *xsk_ring_prod__fill_addr(&xsk->umem->fq, idx++) = addr;
> > +    }
> > +
> > +    xsk_ring_prod__submit(&xsk->umem->fq,
> > +                          PROD_NUM_DESCS);
> > +    return xsk;
> > +}
> > +
> > +struct xsk_socket_info *
> > +xsk_configure(int ifindex, int xdp_queue_id)
> > +{
> > +    struct xsk_socket_info *xsk;
> > +    struct xsk_umem_info *umem;
> > +    void *bufs;
> > +    int ret;
> > +
> > +    ret = posix_memalign(&bufs, getpagesize(),
> > +                         NUM_FRAMES * FRAME_SIZE);
>
> In the future we'll need to use HAVE_POSIX_MEMALIGN, probably.
>
> Do we need to clear the allocated memory?
>
> > +    ovs_assert(!ret);
> > +
> > +    /* Create sockets... */
> > +    umem = xsk_configure_umem(bufs,
> > +                              NUM_FRAMES * FRAME_SIZE);
> > +    xsk = xsk_configure_socket(umem, ifindex, xdp_queue_id);
> > +    return xsk;
> > +}
> > +
> > +static void OVS_UNUSED vlog_hex_dump(const void *buf, size_t count)
> > +{
> > +    struct ds ds = DS_EMPTY_INITIALIZER;
> > +    ds_put_hex_dump(&ds, buf, count, 0, false);
> > +    VLOG_DBG_RL(&rl, "%s", ds_cstr(&ds));
> > +    ds_destroy(&ds);
> > +}
> > +
> > +void
> > +xsk_destroy(struct xsk_socket_info *xsk)
> > +{
> > +    struct xsk_umem *umem;
> > +
> > +    if (!xsk) {
> > +        return;
> > +    }
> > +
> > +    umem = xsk->umem->umem;
> > +    xsk_socket__delete(xsk->xsk);
> > +    (void)xsk_umem__delete(umem);
> > +
> > +    /* cleanup umem pool */
> > +    umem_pool_cleanup(&xsk->umem->mpool);
> > +
> > +    /* cleanup metadata pool */
> > +    xpacket_pool_cleanup(&xsk->umem->xpool);
> > +}
> > +
> > +static inline void OVS_UNUSED
> > +print_xsk_stat(struct xsk_socket_info *xsk OVS_UNUSED) {
> > +    struct xdp_statistics stat;
> > +    socklen_t optlen;
> > +
> > +    optlen = sizeof(stat);
>
> please don't paranthesize the argument of sizeof if it's name of variable.
>
> > +    ovs_assert(getsockopt(xsk_socket__fd(xsk->xsk), SOL_XDP,
> XDP_STATISTICS,
> > +                &stat, &optlen) == 0);
> > +
> > +    VLOG_DBG_RL(&rl, "rx dropped %llu, rx_invalid %llu, tx_invalid
> %llu",
> > +                     stat.rx_dropped,
> > +                     stat.rx_invalid_descs,
> > +                     stat.tx_invalid_descs);
> > +}
> > +
> > +int
> > +netdev_afxdp_set_config(struct netdev *netdev, const struct smap *args,
> > +                        char **errp OVS_UNUSED)
> > +{
> > +    const char *xdpmode;
> > +    int new_n_rxq;
> > +
> > +    /* TODO: add mutex lock */
> > +    new_n_rxq = MAX(smap_get_int(args, "n_rxq", NR_QUEUE), 1);
> > +
> > +    if (netdev->n_rxq != new_n_rxq) {
> > +
> > +        if (new_n_rxq > MAX_XSKQ) {
> > +            VLOG_WARN("set n_rxq %d too large", new_n_rxq);
> > +            goto out;
>
> Just return EINVAL.
>
> > +        }
> > +
> > +        netdev->n_rxq = new_n_rxq;
>
> This is wrong. You must not update netdev->n_rxq here. This should
> be done on reconfiguration.
>
> > +        VLOG_INFO("set AF_XDP device %s to %d n_rxq", netdev->name,
> new_n_rxq);
> > +        netdev_request_reconfigure(netdev);
> > +    }
> > +
> > +    xdpmode = smap_get(args, "xdpmode");
> > +    if (xdpmode && strncmp(xdpmode, "drv", 3) == 0) {
> > +        if (opt_xdp_bind_flags != XDP_ZEROCOPY) {
> > +            opt_xdp_bind_flags = XDP_ZEROCOPY;
> > +            opt_xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST |
> XDP_FLAGS_DRV_MODE;
> > +        }
> > +        VLOG_INFO("AF_XDP device %s in ZC driver mode", netdev->name);
> > +    } else {
> > +        opt_xdp_bind_flags = XDP_COPY;
> > +        opt_xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST |
> XDP_FLAGS_SKB_MODE;
> > +        VLOG_INFO("AF_XDP device %s in SKB mode", netdev->name);
> > +    }
>
> Looks like changing "xdpmode" while port already added will
> lead to incorrect work. You, probably, need to forbid this or
> prepare the proper reconfiguration process.
>
> > +
> > +out:
> > +    return 0;
> > +}
> > +
> > +int
> > +netdev_afxdp_get_config(const struct netdev *netdev, struct smap *args)
> > +{
> > +    /* TODO: add mutex lock */
> > +    smap_add_format(args, "n_rxq", "%d", netdev->n_rxq);
> > +    smap_add_format(args, "xdpmode", "%s",
> > +        opt_xdp_bind_flags == XDP_ZEROCOPY ? "drv" : "skb");
> > +
> > +    return 0;
> > +}
> > +
> > +int
> > +netdev_afxdp_get_numa_id(const struct netdev *netdev)
> > +{
> > +    /* FIXME: Get netdev's PCIe device ID, then find
> > +     * its NUMA node id.
> > +     */
> > +    VLOG_INFO("FIXME: Device %s always use numa id 0", netdev->name);
> > +    return 0;
> > +}
> > +
> > +void
> > +xsk_remove_xdp_program(uint32_t ifindex)
> > +{
> > +    uint32_t curr_prog_id = 0;
> > +
> > +    /* remove_xdp_program() */
> > +    if (bpf_get_link_xdp_id(ifindex, &curr_prog_id, opt_xdp_flags)) {
> > +        bpf_set_link_xdp_fd(ifindex, -1, opt_xdp_flags);
> > +    }
> > +    if (prog_id == curr_prog_id) {
> > +        bpf_set_link_xdp_fd(ifindex, -1, opt_xdp_flags);
> > +    } else if (!curr_prog_id) {
> > +        VLOG_WARN("couldn't find a prog id on a given interface");
> > +    } else {
> > +        VLOG_WARN("program on interface changed, not removing");
> > +    }
> > +}
> > +
> > +/* Receive packet from AF_XDP socket */
> > +int
> > +netdev_linux_rxq_xsk(struct xsk_socket_info *xsk,
> > +                     struct dp_packet_batch *batch)
> > +{
> > +    unsigned int rcvd, i;
> > +    uint32_t idx_rx = 0, idx_fq = 0;
> > +    int ret = 0;
> > +
> > +    /* See if there is any packet on RX queue,
> > +     * if yes, idx_rx is the index having the packet.
> > +     */
> > +    rcvd = xsk_ring_cons__peek(&xsk->rx, BATCH_SIZE, &idx_rx);
> > +    if (!rcvd) {
> > +        return 0;
> > +    }
> > +
> > +    /* Form a dp_packet batch from descriptor in RX queue */
> > +    for (i = 0; i < rcvd; i++) {
> > +        uint64_t addr = xsk_ring_cons__rx_desc(&xsk->rx, idx_rx)->addr;
> > +        uint32_t len = xsk_ring_cons__rx_desc(&xsk->rx, idx_rx)->len;
> > +        char *pkt = xsk_umem__get_data(xsk->umem->buffer, addr);
> > +        uint64_t index;
> > +
> > +        struct dp_packet_afxdp *xpacket;
> > +        struct dp_packet *packet;
> > +
> > +        index = addr >> FRAME_SHIFT;
> > +        xpacket = UMEM2XPKT(xsk->umem->xpool.array, index);
> > +
> > +        packet = &xpacket->packet;
> > +        xpacket->mpool = &xsk->umem->mpool;
> > +
> > +        if (packet->source != DPBUF_AFXDP) {
> > +            /* FIXME: might be a bug */
>
> Need to log something here. Rate-limited.
>
> > +            continue;
> > +        }
> > +
> > +        /* Initialize the struct dp_packet */
> > +        if (opt_xdp_bind_flags == XDP_ZEROCOPY) {
> > +            dp_packet_set_base(packet, pkt - FRAME_HEADROOM);
> > +        } else {
> > +            /* SKB mode */
> > +            dp_packet_set_base(packet, pkt);
> > +        }
> > +        dp_packet_set_data(packet, pkt);
> > +        dp_packet_set_size(packet, len);
> > +
> > +        /* Add packet into batch, increase batch->count */
> > +        dp_packet_batch_add(batch, packet);
> > +
> > +        idx_rx++;
> > +    }
> > +
> > +    /* We've consume rcvd packets in RX, now re-fill the
> > +     * same number back to FILL queue.
> > +     */
> > +    for (i = 0; i < rcvd; i++) {
> > +        uint64_t index;
> > +        struct umem_elem *elem;
> > +
> > +        ret = xsk_ring_prod__reserve(&xsk->umem->fq, 1, &idx_fq);
> > +        while (ret == 0) {
> > +            /* The FILL queue is full, so retry. (or skip)? */
> > +            ret = xsk_ring_prod__reserve(&xsk->umem->fq, 1, &idx_fq);
> > +        }
> > +
> > +        /* Get one free umem, program it into FILL queue */
> > +        elem = umem_elem_pop(&xsk->umem->mpool);
> > +        index = (uint64_t)((char *)elem - (char *)xsk->umem->buffer);
> > +        ovs_assert((index & FRAME_SHIFT_MASK) == 0);
> > +        *xsk_ring_prod__fill_addr(&xsk->umem->fq, idx_fq) = index;
> > +
> > +        idx_fq++;
> > +    }
> > +    xsk_ring_prod__submit(&xsk->umem->fq, rcvd);
> > +
> > +    /* Release the RX queue */
> > +    xsk_ring_cons__release(&xsk->rx, rcvd);
> > +    xsk->rx_npkts += rcvd;
> > +
> > +#ifdef AFXDP_DEBUG
> > +    print_xsk_stat(xsk);
> > +#endif
> > +    return 0;
> > +}
> > +
> > +static void kick_tx(struct xsk_socket_info *xsk)
> > +{
> > +    int ret;
> > +
> > +    ret = sendto(xsk_socket__fd(xsk->xsk), NULL, 0, MSG_DONTWAIT, NULL,
> 0);
> > +    if (ret >= 0 || errno == ENOBUFS || errno == EAGAIN || errno ==
> EBUSY) {
> > +        return;
> > +    }
> > +}
> > +
> > +int
> > +netdev_linux_afxdp_batch_send(struct xsk_socket_info *xsk,
> > +                              struct dp_packet_batch *batch)
> > +{
> > +    uint32_t tx_done, idx_cq = 0;
> > +    struct dp_packet *packet;
> > +    uint32_t idx;
> > +    int j;
> > +
> > +    /* Make sure we have enough TX descs */
> > +    if (xsk_ring_prod__reserve(&xsk->tx, batch->count, &idx) == 0) {
> > +        return -EAGAIN;
> > +    }
> > +
> > +    DP_PACKET_BATCH_FOR_EACH (i, packet, batch) {
> > +        struct dp_packet_afxdp *xpacket;
> > +        struct umem_elem *elem;
> > +        uint64_t index;
> > +
> > +        elem = umem_elem_pop(&xsk->umem->mpool);
> > +        if (!elem) {
> > +            return -EAGAIN;
> > +        }
> > +
> > +        memcpy(elem, dp_packet_data(packet), dp_packet_size(packet));
> > +
> > +        index = (uint64_t)((char *)elem - (char *)xsk->umem->buffer);
> > +        xsk_ring_prod__tx_desc(&xsk->tx, idx + i)->addr = index;
> > +        xsk_ring_prod__tx_desc(&xsk->tx, idx + i)->len
> > +            = dp_packet_size(packet);
> > +
> > +        if (packet->source == DPBUF_AFXDP) {
> > +            xpacket = dp_packet_cast_afxdp(packet);
> > +            umem_elem_push(xpacket->mpool, dp_packet_base(packet));
> > +             /* Avoid freeing it twice at dp_packet_uninit */
> > +            xpacket->mpool = NULL;
>
> Why you're freeing packets here? 'netdev_linux_send' will do that for you.
>
> > +        }
> > +    }
> > +    xsk_ring_prod__submit(&xsk->tx, batch->count);
> > +    xsk->outstanding_tx += batch->count;
> > +
> > +retry:
> > +    kick_tx(xsk);
> > +
> > +    /* Process CQ */
>
> Maybe it's better to process CQ on rx ?
> It's unknown when we'll be here next time, but we'll definitely
> call rx function soon.
>
> > +    tx_done = xsk_ring_cons__peek(&xsk->umem->cq, batch->count,
> &idx_cq);
> > +    if (tx_done > 0) {
> > +        xsk->outstanding_tx -= tx_done;
> > +        xsk->tx_npkts += tx_done;
> > +    }
> > +
> > +    /* Recycle back to umem pool */
> > +    for (j = 0; j < tx_done; j++) {
> > +        struct umem_elem *elem;
> > +        uint64_t addr;
> > +
> > +        addr = *xsk_ring_cons__comp_addr(&xsk->umem->cq, idx_cq++);
> > +
> > +        elem = ALIGNED_CAST(struct umem_elem *,
> > +                            (char *)xsk->umem->buffer + addr);
> > +        umem_elem_push(&xsk->umem->mpool, elem);
> > +    }
> > +    xsk_ring_cons__release(&xsk->umem->cq, tx_done);
> > +
> > +    if (xsk->outstanding_tx > PROD_NUM_DESCS - (PROD_NUM_DESCS >> 2)) {
> > +        /* If there are still a lot not transmitted,
> > +         * try harder.
> > +         */
> > +        goto retry;
> > +    }
> > +
> > +    return 0;
> > +}
> > +
> > +#else
> > +#include "openvswitch/compiler.h"
> > +#include "netdev-afxdp.h"
> > +
> > +struct xsk_socket_info *
> > +xsk_configure(int ifindex OVS_UNUSED, int xdp_queue_id OVS_UNUSED)
> > +{
> > +    return NULL;
> > +}
> > +
> > +void
> > +xsk_destroy(struct xsk_socket_info *xsk OVS_UNUSED)
> > +{
> > +}
> > +
> > +int
> > +netdev_linux_rxq_xsk(struct xsk_socket_info *xsk OVS_UNUSED,
> > +                     struct dp_packet_batch *batch OVS_UNUSED)
> > +{
> > +    return 0;
> > +}
> > +
> > +int
> > +netdev_linux_afxdp_batch_send(struct xsk_socket_info *xsk OVS_UNUSED,
> > +                              struct dp_packet_batch *batch OVS_UNUSED)
> > +{
> > +    return 0;
> > +}
> > +
> > +int
> > +netdev_afxdp_set_config(struct netdev *netdev OVS_UNUSED,
> > +                        const struct smap *args OVS_UNUSED,
> > +                        char **errp OVS_UNUSED)
> > +{
> > +    return 0;
> > +}
> > +
> > +int
> > +netdev_afxdp_get_config(const struct netdev *netdev OVS_UNUSED,
> > +                        struct smap *args OVS_UNUSED)
> > +{
> > +    return 0;
> > +}
> > +
> > +int
> > +netdev_afxdp_get_numa_id(const struct netdev *netdev OVS_UNUSED)
> > +{
> > +    return 0;
> > +}
> > +#endif
> > diff --git a/lib/netdev-afxdp.h b/lib/netdev-afxdp.h
> > new file mode 100644
> > index 000000000000..ea05612a7c0f
> > --- /dev/null
> > +++ b/lib/netdev-afxdp.h
> > @@ -0,0 +1,47 @@
> > +/*
> > + * Copyright (c) 2018 Nicira, Inc.
> > + *
> > + * Licensed under the Apache License, Version 2.0 (the "License");
> > + * you may not use this file except in compliance with the License.
> > + * You may obtain a copy of the License at:
> > + *
> > + *     http://www.apache.org/licenses/LICENSE-2.0
> > + *
> > + * Unless required by applicable law or agreed to in writing, software
> > + * distributed under the License is distributed on an "AS IS" BASIS,
> > + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
> implied.
> > + * See the License for the specific language governing permissions and
> > + * limitations under the License.
> > + */
> > +
> > +#ifndef NETDEV_AFXDP_H
> > +#define NETDEV_AFXDP_H 1
> > +
> > +#include <stdint.h>
> > +#include <stdbool.h>
> > +
> > +/* These functions are Linux AF_XDP specific, so they should be used
> directly
> > + * only by Linux-specific code. */
> > +#define MAX_XSKQ 16
> > +struct netdev;
> > +struct xsk_socket_info;
> > +struct xdp_umem;
> > +struct dp_packet_batch;
> > +struct smap;
> > +
> > +struct xsk_socket_info *xsk_configure(int ifindex, int xdp_queue_id);
> > +void xsk_destroy(struct xsk_socket_info *xsk);
> > +
> > +int netdev_linux_rxq_xsk(struct xsk_socket_info *xsk,
> > +                         struct dp_packet_batch *batch);
> > +
> > +int netdev_linux_afxdp_batch_send(struct xsk_socket_info *xsk,
> > +                                  struct dp_packet_batch *batch);
> > +
> > +void xsk_remove_xdp_program(uint32_t ifindex);
> > +int netdev_afxdp_set_config(struct netdev *netdev, const struct smap
> *args,
> > +                            char **errp);
> > +int netdev_afxdp_get_config(const struct netdev *netdev, struct smap
> *args);
> > +int netdev_afxdp_get_numa_id(const struct netdev *netdev);
> > +
> > +#endif /* netdev-afxdp.h */
> > diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
> > index f75d73fd39f8..337760ca3333 100644
> > --- a/lib/netdev-linux.c
> > +++ b/lib/netdev-linux.c
> > @@ -75,6 +75,7 @@
> >  #include "unaligned.h"
> >  #include "openvswitch/vlog.h"
> >  #include "util.h"
> > +#include "netdev-afxdp.h"
> >
> >  VLOG_DEFINE_THIS_MODULE(netdev_linux);
> >
> > @@ -531,6 +532,7 @@ struct netdev_linux {
> >
> >      /* LAG information. */
> >      bool is_lag_master;         /* True if the netdev is a LAG master.
> */
> > +    struct xsk_socket_info *xsk[MAX_XSKQ]; /* af_xdp socket */
> >  };
> >
> >  struct netdev_rxq_linux {
> > @@ -580,12 +582,18 @@ is_netdev_linux_class(const struct netdev_class
> *netdev_class)
> >  }
> >
> >  static bool
> > +is_afxdp_netdev(const struct netdev *netdev)
> > +{
> > +    return netdev_get_class(netdev) == &netdev_afxdp_class;
> > +}
> > +
> > +static bool
> >  is_tap_netdev(const struct netdev *netdev)
> >  {
> >      return netdev_get_class(netdev) == &netdev_tap_class;
> >  }
> >
> > -static struct netdev_linux *
> > +struct netdev_linux *
> >  netdev_linux_cast(const struct netdev *netdev)
> >  {
> >      ovs_assert(is_netdev_linux_class(netdev_get_class(netdev)));
> > @@ -1084,6 +1092,25 @@ netdev_linux_destruct(struct netdev *netdev_)
> >          atomic_count_dec(&miimon_cnt);
> >      }
> >
> > +#if HAVE_AF_XDP
> > +    if (is_afxdp_netdev(netdev_)) {
> > +        int ifindex;
> > +        int ret, i;
> > +
> > +        ret = get_ifindex(netdev_, &ifindex);
> > +        if (ret) {
> > +            VLOG_ERR("get ifindex error");
> > +        } else {
> > +            for (i = 0; i < MAX_XSKQ; i++) {
> > +                if (netdev->xsk[i]) {
> > +                    VLOG_INFO("destroy xsk[%d]", i);
> > +                    xsk_destroy(netdev->xsk[i]);
> > +                }
> > +            }
> > +            xsk_remove_xdp_program(ifindex);
> > +        }
> > +    }
> > +#endif
> >      ovs_mutex_destroy(&netdev->mutex);
> >  }
> >
> > @@ -1113,6 +1140,32 @@ netdev_linux_rxq_construct(struct netdev_rxq
> *rxq_)
> >      rx->is_tap = is_tap_netdev(netdev_);
> >      if (rx->is_tap) {
> >          rx->fd = netdev->tap_fd;
> > +    } else if (is_afxdp_netdev(netdev_)) {
> > +#if HAVE_AF_XDP
> > +        struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
> > +        int ifindex;
> > +        int xdp_queue_id = rxq_->queue_id;
> > +        struct xsk_socket_info *xsk;
> > +
> > +        if (setrlimit(RLIMIT_MEMLOCK, &r)) {
> > +            VLOG_ERR("ERROR: setrlimit(RLIMIT_MEMLOCK) \"%s\"\n",
> > +                      ovs_strerror(errno));
> > +            ovs_assert(0);
> > +        }
> > +
> > +        VLOG_DBG("%s: %s: queue=%d configuring xdp sock",
> > +                  __func__, netdev_->name, xdp_queue_id);
> > +
> > +        /* Get ethernet device index. */
> > +        error = get_ifindex(&netdev->up, &ifindex);
> > +        if (error) {
> > +            goto error;
> > +        }
> > +
> > +        xsk = xsk_configure(ifindex, xdp_queue_id);
> > +        netdev->xsk[xdp_queue_id] = xsk;
> > +        rx->fd = xsk_socket__fd(xsk->xsk); /* for netdev layer to poll
> */
> > +#endif
> >      } else {
> >          struct sockaddr_ll sll;
> >          int ifindex, val;
> > @@ -1318,9 +1371,16 @@ netdev_linux_rxq_recv(struct netdev_rxq *rxq_,
> struct dp_packet_batch *batch,
> >  {
> >      struct netdev_rxq_linux *rx = netdev_rxq_linux_cast(rxq_);
> >      struct netdev *netdev = rx->up.netdev;
> > -    struct dp_packet *buffer;
> > +    struct dp_packet *buffer = NULL;
> >      ssize_t retval;
> >      int mtu;
> > +    struct netdev_linux *netdev_ = netdev_linux_cast(netdev);
> > +
> > +    if (is_afxdp_netdev(netdev)) {
> > +        int qid = rxq_->queue_id;
> > +
> > +        return netdev_linux_rxq_xsk(netdev_->xsk[qid], batch);
> > +    }
> >
> >      if (netdev_linux_get_mtu__(netdev_linux_cast(netdev), &mtu)) {
> >          mtu = ETH_PAYLOAD_MAX;
> > @@ -1329,6 +1389,7 @@ netdev_linux_rxq_recv(struct netdev_rxq *rxq_,
> struct dp_packet_batch *batch,
> >      /* Assume Ethernet port. No need to set packet_type. */
> >      buffer = dp_packet_new_with_headroom(VLAN_ETH_HEADER_LEN + mtu,
> >                                             DP_NETDEV_HEADROOM);
> > +
> >      retval = (rx->is_tap
> >                ? netdev_linux_rxq_recv_tap(rx->fd, buffer)
> >                : netdev_linux_rxq_recv_sock(rx->fd, buffer));
> > @@ -1473,14 +1534,15 @@ netdev_linux_tap_batch_send(struct netdev
> *netdev_,
> >   * The kernel maintains a packet transmission queue, so the caller is
> not
> >   * expected to do additional queuing of packets. */
> >  static int
> > -netdev_linux_send(struct netdev *netdev_, int qid OVS_UNUSED,
> > +netdev_linux_send(struct netdev *netdev_, int qid,
> >                    struct dp_packet_batch *batch,
> >                    bool concurrent_txq OVS_UNUSED)
> >  {
> >      int error = 0;
> >      int sock = 0;
> >
> > -    if (!is_tap_netdev(netdev_)) {
> > +    if (!is_tap_netdev(netdev_) &&
> > +        !is_afxdp_netdev(netdev_)) {
> >          if (netdev_linux_netnsid_is_remote(netdev_linux_cast(netdev_)))
> {
> >              error = EOPNOTSUPP;
> >              goto free_batch;
> > @@ -1499,6 +1561,10 @@ netdev_linux_send(struct netdev *netdev_, int qid
> OVS_UNUSED,
> >          }
> >
> >          error = netdev_linux_sock_batch_send(sock, ifindex, batch);
> > +    } else if (is_afxdp_netdev(netdev_)) {
> > +        struct netdev_linux *netdev = netdev_linux_cast(netdev_);
> > +
> > +        error = netdev_linux_afxdp_batch_send(netdev->xsk[qid], batch);
> >      } else {
> >          error = netdev_linux_tap_batch_send(netdev_, batch);
> >      }
> > @@ -3323,6 +3389,7 @@ const struct netdev_class netdev_linux_class = {
> >      NETDEV_LINUX_CLASS_COMMON,
> >      LINUX_FLOW_OFFLOAD_API,
> >      .type = "system",
> > +    .is_pmd = false,
> >      .construct = netdev_linux_construct,
> >      .get_stats = netdev_linux_get_stats,
> >      .get_features = netdev_linux_get_features,
> > @@ -3333,6 +3400,7 @@ const struct netdev_class netdev_linux_class = {
> >  const struct netdev_class netdev_tap_class = {
> >      NETDEV_LINUX_CLASS_COMMON,
> >      .type = "tap",
> > +    .is_pmd = false,
> >      .construct = netdev_linux_construct_tap,
> >      .get_stats = netdev_tap_get_stats,
> >      .get_features = netdev_linux_get_features,
> > @@ -3343,10 +3411,23 @@ const struct netdev_class netdev_internal_class
> = {
> >      NETDEV_LINUX_CLASS_COMMON,
> >      LINUX_FLOW_OFFLOAD_API,
> >      .type = "internal",
> > +    .is_pmd = false,
> >      .construct = netdev_linux_construct,
> >      .get_stats = netdev_internal_get_stats,
> >      .get_status = netdev_internal_get_status,
> >  };
> > +
> > +const struct netdev_class netdev_afxdp_class = {
> > +    NETDEV_LINUX_CLASS_COMMON,
> > +    .type = "afxdp",
> > +    .is_pmd = true,
> > +    .construct = netdev_linux_construct,
> > +    .get_stats = netdev_linux_get_stats,
> > +    .get_status = netdev_linux_get_status,
> > +    .set_config = netdev_afxdp_set_config,
> > +    .get_config = netdev_afxdp_get_config,
> > +    .get_numa_id = netdev_afxdp_get_numa_id,
> > +};
> >
> >
> >  #define CODEL_N_QUEUES 0x0000
> > diff --git a/lib/netdev-linux.h b/lib/netdev-linux.h
> > index 17ca9120168a..afcb20ee8d0a 100644
> > --- a/lib/netdev-linux.h
> > +++ b/lib/netdev-linux.h
> > @@ -28,6 +28,7 @@ struct netdev;
> >  int netdev_linux_ethtool_set_flag(struct netdev *netdev, uint32_t flag,
> >                                    const char *flag_name, bool enable);
> >  int linux_get_ifindex(const char *netdev_name);
> > +struct netdev_linux *netdev_linux_cast(const struct netdev *netdev);
> >
> >  #define LINUX_FLOW_OFFLOAD_API                          \
> >     .flow_flush = netdev_tc_flow_flush,                  \
> > diff --git a/lib/netdev-provider.h b/lib/netdev-provider.h
> > index fb0c27e6e8e8..5bf041316503 100644
> > --- a/lib/netdev-provider.h
> > +++ b/lib/netdev-provider.h
> > @@ -902,6 +902,7 @@ extern const struct netdev_class netdev_linux_class;
> >  #endif
> >  extern const struct netdev_class netdev_internal_class;
> >  extern const struct netdev_class netdev_tap_class;
> > +extern const struct netdev_class netdev_afxdp_class;
> >
> >  #ifdef  __cplusplus
> >  }
> > diff --git a/lib/netdev.c b/lib/netdev.c
> > index 7d7ecf6f0946..c30016b34033 100644
> > --- a/lib/netdev.c
> > +++ b/lib/netdev.c
> > @@ -145,6 +145,7 @@ netdev_initialize(void)
> >          netdev_register_provider(&netdev_linux_class);
> >          netdev_register_provider(&netdev_internal_class);
> >          netdev_register_provider(&netdev_tap_class);
> > +        netdev_register_provider(&netdev_afxdp_class);
> >          netdev_vport_tunnel_register();
> >  #endif
> >  #if defined(__FreeBSD__) || defined(__NetBSD__)
> > diff --git a/lib/xdpsock.c b/lib/xdpsock.c
> > new file mode 100644
> > index 000000000000..f9fe94b9e36a
> > --- /dev/null
> > +++ b/lib/xdpsock.c
> > @@ -0,0 +1,210 @@
> > +/*
> > + * Copyright (c) 2018, 2019 Nicira, Inc.
> > + *
> > + * Licensed under the Apache License, Version 2.0 (the "License");
> > + * you may not use this file except in compliance with the License.
> > + * You may obtain a copy of the License at:
> > + *
> > + *     http://www.apache.org/licenses/LICENSE-2.0
> > + *
> > + * Unless required by applicable law or agreed to in writing, software
> > + * distributed under the License is distributed on an "AS IS" BASIS,
> > + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
> implied.
> > + * See the License for the specific language governing permissions and
> > + * limitations under the License.
> > + */
> > +#include <config.h>
> > +#include <ctype.h>
> > +#include <errno.h>
> > +#include <fcntl.h>
> > +#include <stdarg.h>
> > +#include <stdlib.h>
> > +#include <string.h>
> > +#include <sys/stat.h>
> > +#include <sys/types.h>
> > +#include <syslog.h>
> > +#include <time.h>
> > +#include <unistd.h>
> > +#include "openvswitch/vlog.h"
> > +#include "async-append.h"
> > +#include "coverage.h"
> > +#include "dirs.h"
> > +#include "ovs-thread.h"
> > +#include "sat-math.h"
> > +#include "socket-util.h"
> > +#include "svec.h"
> > +#include "syslog-direct.h"
> > +#include "syslog-libc.h"
> > +#include "syslog-provider.h"
> > +#include "timeval.h"
> > +#include "unixctl.h"
> > +#include "util.h"
> > +#include "ovs-atomic.h"
> > +#include "openvswitch/compiler.h"
> > +#include "dp-packet.h"
> > +
> > +#ifdef HAVE_AF_XDP
> > +#include "xdpsock.h"
> > +
> > +static inline void ovs_spinlock_init(ovs_spinlock_t *sl)
> > +{
> > +    sl->locked = 0;
> > +}
> > +
> > +static inline void ovs_spin_lock(ovs_spinlock_t *sl)
> > +{
> > +    int exp = 0;
> > +
> > +    while (!__atomic_compare_exchange_n(&sl->locked, &exp, 1, 0,
> > +                __ATOMIC_ACQUIRE, __ATOMIC_RELAXED)) {
> > +        while (__atomic_load_n(&sl->locked, __ATOMIC_RELAXED)) {
>
>
> These atomics are compiler specific. Please use:
>
>     while (!atomic_compare_exchange_strong_explicit(&sl->locked, &exp, 1,
>                                                     memory_order_acquire,
>                                                     memory_order_relaxed))
> {
>         locked = 1;
>         while (locked) {
>             atomic_read_relaxed(&sl->locked, &locked);
>         }
>         exp = 0;
>     }
>
> > +            ;
> > +        }
> > +        exp = 0;
> > +    }
> > +}
> > +
> > +static inline void ovs_spin_unlock(ovs_spinlock_t *sl)
> > +{
> > +    __atomic_store_n(&sl->locked, 0, __ATOMIC_RELEASE);
>
>     atomic_store_explicit(&sl->locked, 0, memory_order_release);
>
> > +}
> > +
> > +static inline int OVS_UNUSED ovs_spin_trylock(ovs_spinlock_t *sl)
> > +{
> > +    int exp = 0;
> > +    return __atomic_compare_exchange_n(&sl->locked, &exp, 1,
> > +              0, /* disallow spurious failure */
> > +               __ATOMIC_ACQUIRE, __ATOMIC_RELAXED);
>
>
>     return atomic_compare_exchange_strong_explicit(&sl->locked, &exp, 1,
>                                                    memory_order_acquire,
>                                                    memory_order_relaxed);
>
>
> > +}
> > +
> > +void
> > +__umem_elem_push_n(struct umem_pool *umemp OVS_UNUSED, void **addrs,
> int n)
> > +{
> > +    void *ptr;
> > +
> > +    if (OVS_UNLIKELY(umemp->index + n > umemp->size)) {
> > +        OVS_NOT_REACHED();
> > +    }
> > +
> > +    ptr = &umemp->array[umemp->index];
> > +    memcpy(ptr, addrs, n * sizeof(void *));
> > +    umemp->index += n;
> > +}
> > +
> > +inline void
> > +__umem_elem_push(struct umem_pool *umemp OVS_UNUSED, void *addr)
> > +{
> > +    umemp->array[umemp->index++] = addr;
> > +}
> > +
> > +void
> > +umem_elem_push(struct umem_pool *umemp OVS_UNUSED, void *addr)
> > +{
> > +
> > +    if (OVS_UNLIKELY(umemp->index >= umemp->size)) {
> > +        /* stack is full */
> > +        /* it's possible that one umem gets pushed twice,
> > +         * because actions=1,2,3... multiple ports?
> > +        */
> > +        OVS_NOT_REACHED();
> > +    }
> > +
> > +    ovs_assert(((uint64_t)addr & FRAME_SHIFT_MASK) == 0);
> > +
> > +    ovs_spin_lock(&umemp->mutex);
> > +    __umem_elem_push(umemp, addr);
> > +    ovs_spin_unlock(&umemp->mutex);
> > +}
> > +
> > +void
> > +__umem_elem_pop_n(struct umem_pool *umemp OVS_UNUSED, void **addrs, int
> n)
> > +{
> > +    void *ptr;
> > +
> > +    umemp->index -= n;
> > +
> > +    if (OVS_UNLIKELY(umemp->index < 0)) {
> > +        OVS_NOT_REACHED();
> > +    }
> > +
> > +    ptr = &umemp->array[umemp->index];
> > +    memcpy(addrs, ptr, n * sizeof(void *));
> > +}
> > +
> > +inline void *
> > +__umem_elem_pop(struct umem_pool *umemp OVS_UNUSED)
> > +{
> > +    return umemp->array[--umemp->index];
> > +}
> > +
> > +void *
> > +umem_elem_pop(struct umem_pool *umemp OVS_UNUSED)
> > +{
> > +    void *ptr;
> > +
> > +    ovs_spin_lock(&umemp->mutex);
> > +    ptr = __umem_elem_pop(umemp);
> > +    ovs_spin_unlock(&umemp->mutex);
> > +
> > +    return ptr;
> > +}
> > +
> > +void **
> > +__umem_pool_alloc(unsigned int size)
> > +{
> > +    void *bufs;
> > +
> > +    ovs_assert(posix_memalign(&bufs, getpagesize(),
> > +                              size * sizeof(void *)) == 0);
> > +    memset(bufs, 0, size * sizeof(void *));
> > +    return (void **)bufs;
> > +}
> > +
> > +unsigned int
> > +umem_elem_count(struct umem_pool *mpool)
> > +{
> > +    return mpool->index;
> > +}
> > +
> > +int
> > +umem_pool_init(struct umem_pool *umemp OVS_UNUSED, unsigned int size)
> > +{
> > +    umemp->array = __umem_pool_alloc(size);
> > +    if (!umemp->array) {
> > +        OVS_NOT_REACHED();
> > +    }
> > +
> > +    umemp->size = size;
> > +    umemp->index = 0;
> > +    ovs_spinlock_init(&umemp->mutex);
> > +    return 0;
> > +}
> > +
> > +void
> > +umem_pool_cleanup(struct umem_pool *umemp OVS_UNUSED)
> > +{
> > +    free(umemp->array);
> > +}
> > +
> > +/* AF_XDP metadata init/destroy */
> > +int
> > +xpacket_pool_init(struct xpacket_pool *xp, unsigned int size)
> > +{
> > +    void *bufs;
> > +
> > +    ovs_assert(posix_memalign(&bufs, getpagesize(),
> > +                              size * sizeof(struct dp_packet_afxdp)) ==
> 0);
> > +    memset(bufs, 0, size * sizeof(struct dp_packet_afxdp));
> > +
> > +    xp->array = bufs;
> > +    xp->size = size;
> > +    return 0;
> > +}
> > +
> > +void
> > +xpacket_pool_cleanup(struct xpacket_pool *xp)
> > +{
> > +    free(xp->array);
> > +}
> > +#else   /* !HAVE_AF_XDP below */
> > +#endif
> > diff --git a/lib/xdpsock.h b/lib/xdpsock.h
> > new file mode 100644
> > index 000000000000..cb64befe7dba
> > --- /dev/null
> > +++ b/lib/xdpsock.h
> > @@ -0,0 +1,133 @@
> > +/*
> > + * Copyright (c) 2018, 2019 Nicira, Inc.
> > + *
> > + * Licensed under the Apache License, Version 2.0 (the "License");
> > + * you may not use this file except in compliance with the License.
> > + * You may obtain a copy of the License at:
> > + *
> > + *     http://www.apache.org/licenses/LICENSE-2.0
> > + *
> > + * Unless required by applicable law or agreed to in writing, software
> > + * distributed under the License is distributed on an "AS IS" BASIS,
> > + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
> implied.
> > + * See the License for the specific language governing permissions and
> > + * limitations under the License.
> > + */
> > +#ifndef XDPSOCK_H
> > +#define XDPSOCK_H 1
> > +#include <errno.h>
> > +#include <getopt.h>
> > +#include <libgen.h>
> > +#include <linux/bpf.h>
> > +#include <linux/if_link.h>
> > +#include <linux/if_xdp.h>
> > +#include <linux/if_ether.h>
> > +#include <net/if.h>
> > +#include <signal.h>
> > +#include <stdbool.h>
> > +#include <stdio.h>
> > +#include <stdlib.h>
> > +#include <string.h>
> > +#include <net/ethernet.h>
> > +#include <sys/resource.h>
> > +#include <sys/socket.h>
> > +#include <sys/mman.h>
> > +#include <time.h>
> > +#include <unistd.h>
> > +#include <pthread.h>
> > +#include <locale.h>
> > +#include <sys/types.h>
> > +#include <poll.h>
> > +#include <bpf/libbpf.h>
> > +
> > +#include "ovs-atomic.h"
> > +#include "openvswitch/thread.h"
> > +
> > +/* bpf/xsk.h uses the following macros not defined in OVS,
> > + * so re-define them before include.
> > + */
> > +#define unlikely OVS_UNLIKELY
> > +#define likely OVS_LIKELY
> > +#define barrier() __asm__ __volatile__("": : :"memory")
> > +#define smp_rmb() barrier()
> > +#define smp_wmb() barrier()
>
> These barriers also x86 specific. We'll need to fix that in
> the future before removing build constraints.
>
> > +#include <bpf/xsk.h>
> > +
> > +#define FRAME_HEADROOM  XDP_PACKET_HEADROOM
> > +#define FRAME_SIZE      XSK_UMEM__DEFAULT_FRAME_SIZE
> > +#define BATCH_SIZE      NETDEV_MAX_BURST
> > +#define FRAME_SHIFT     XSK_UMEM__DEFAULT_FRAME_SHIFT
> > +#define FRAME_SHIFT_MASK    ((1<<FRAME_SHIFT)-1)
> > +
> > +#define NUM_FRAMES  1024
> > +#define PROD_NUM_DESCS 128
> > +#define CONS_NUM_DESCS 128
> > +
> > +#ifdef USE_XSK_DEFAULT
> > +#define PROD_NUM_DESCS XSK_RING_PROD__DEFAULT_NUM_DESCS
> > +#define CONS_NUM_DESCS XSK_RING_CONS__DEFAULT_NUM_DESCS
> > +#endif
> > +
> > +typedef struct {
> > +    volatile int locked;
>
> atomic_int locked;
>
> or atomic_bool.
>
> > +} ovs_spinlock_t;
> > +
> > +/* LIFO ptr_array */
> > +struct umem_pool {
> > +    int index;      /* point to top */
> > +    unsigned int size;
> > +    ovs_spinlock_t mutex;
> > +    void **array;   /* a pointer array */
> > +};
> > +
> > +/* array-based dp_packet_afxdp */
> > +struct xpacket_pool {
> > +    unsigned int size;
> > +    struct dp_packet_afxdp **array;
> > +};
> > +
> > +struct xsk_umem_info {
> > +    struct umem_pool mpool;
> > +    struct xpacket_pool xpool;
> > +    struct xsk_ring_prod fq;
> > +    struct xsk_ring_cons cq;
> > +    struct xsk_umem *umem;
> > +    void *buffer;
> > +};
> > +
> > +struct xsk_socket_info {
> > +    struct xsk_ring_cons rx;
> > +    struct xsk_ring_prod tx;
> > +    struct xsk_umem_info *umem;
> > +    struct xsk_socket *xsk;
> > +    unsigned long rx_npkts;
> > +    unsigned long tx_npkts;
> > +    unsigned long prev_rx_npkts;
> > +    unsigned long prev_tx_npkts;
> > +    uint32_t outstanding_tx;
> > +};
> > +
> > +struct umem_elem_head {
> > +    unsigned int index;
> > +    struct ovs_mutex mutex;
> > +    uint32_t n;
> > +};
> > +
> > +struct umem_elem {
> > +    struct umem_elem *next;
> > +};
> > +
> > +void __umem_elem_push(struct umem_pool *umemp, void *addr);
> > +void umem_elem_push(struct umem_pool *umemp, void *addr);
> > +void *__umem_elem_pop(struct umem_pool *umemp);
> > +void *umem_elem_pop(struct umem_pool *umemp);
> > +void **__umem_pool_alloc(unsigned int size);
> > +int umem_pool_init(struct umem_pool *umemp, unsigned int size);
> > +void umem_pool_cleanup(struct umem_pool *umemp);
> > +unsigned int umem_elem_count(struct umem_pool *mpool);
> > +void __umem_elem_pop_n(struct umem_pool *umemp, void **addrs, int n);
> > +void __umem_elem_push_n(struct umem_pool *umemp, void **addrs, int n);
> > +int xpacket_pool_init(struct xpacket_pool *xp, unsigned int size);
> > +void xpacket_pool_cleanup(struct xpacket_pool *xp);
> > +
> > +#endif
> > diff --git a/tests/automake.mk b/tests/automake.mk
> > index ea16532dd2a0..715cef9a6b3b 100644
> > --- a/tests/automake.mk
> > +++ b/tests/automake.mk
> > @@ -4,12 +4,14 @@ EXTRA_DIST += \
> >       $(SYSTEM_TESTSUITE_AT) \
> >       $(SYSTEM_KMOD_TESTSUITE_AT) \
> >       $(SYSTEM_USERSPACE_TESTSUITE_AT) \
> > +     $(SYSTEM_AFXDP_TESTSUITE_AT) \
> >       $(SYSTEM_OFFLOADS_TESTSUITE_AT) \
> >       $(SYSTEM_DPDK_TESTSUITE_AT) \
> >       $(OVSDB_CLUSTER_TESTSUITE_AT) \
> >       $(TESTSUITE) \
> >       $(SYSTEM_KMOD_TESTSUITE) \
> >       $(SYSTEM_USERSPACE_TESTSUITE) \
> > +     $(SYSTEM_AFXDP_TESTSUITE) \
> >       $(SYSTEM_OFFLOADS_TESTSUITE) \
> >       $(SYSTEM_DPDK_TESTSUITE) \
> >       $(OVSDB_CLUSTER_TESTSUITE) \
> > @@ -158,6 +160,11 @@ SYSTEM_USERSPACE_TESTSUITE_AT = \
> >       tests/system-userspace-macros.at \
> >       tests/system-userspace-packet-type-aware.at
> >
> > +SYSTEM_AFXDP_TESTSUITE_AT = \
> > +     tests/system-afxdp-testsuite.at \
> > +     tests/system-afxdp-traffic.at \
> > +     tests/system-afxdp-macros.at
> > +
> >  SYSTEM_TESTSUITE_AT = \
> >       tests/system-common-macros.at \
> >       tests/system-ovn.at \
> > @@ -182,6 +189,7 @@ TESTSUITE = $(srcdir)/tests/testsuite
> >  TESTSUITE_PATCH = $(srcdir)/tests/testsuite.patch
> >  SYSTEM_KMOD_TESTSUITE = $(srcdir)/tests/system-kmod-testsuite
> >  SYSTEM_USERSPACE_TESTSUITE = $(srcdir)/tests/system-userspace-testsuite
> > +SYSTEM_AFXDP_TESTSUITE = $(srcdir)/tests/system-afxdp-testsuite
> >  SYSTEM_OFFLOADS_TESTSUITE = $(srcdir)/tests/system-offloads-testsuite
> >  SYSTEM_DPDK_TESTSUITE = $(srcdir)/tests/system-dpdk-testsuite
> >  OVSDB_CLUSTER_TESTSUITE = $(srcdir)/tests/ovsdb-cluster-testsuite
> > @@ -315,6 +323,11 @@ check-system-userspace: all
> >       set $(SHELL) '$(SYSTEM_USERSPACE_TESTSUITE)' -C tests
> AUTOTEST_PATH='$(AUTOTEST_PATH)'; \
> >       "$$@" $(TESTSUITEFLAGS) -j1 || (test X'$(RECHECK)' = Xyes && "$$@"
> --recheck)
> >
> > +check-afxdp: all
> > +     $(MAKE) install
> > +     set $(SHELL) '$(SYSTEM_AFXDP_TESTSUITE)' -C tests
> AUTOTEST_PATH='$(AUTOTEST_PATH)' $(TESTSUITEFLAGS) -j1; \
> > +     "$$@" || (test X'$(RECHECK)' = Xyes && "$$@" --recheck)
> > +
> >  check-offloads: all
> >       set $(SHELL) '$(SYSTEM_OFFLOADS_TESTSUITE)' -C tests
> AUTOTEST_PATH='$(AUTOTEST_PATH)'; \
> >       "$$@" $(TESTSUITEFLAGS) -j1 || (test X'$(RECHECK)' = Xyes && "$$@"
> --recheck)
> > @@ -352,6 +365,10 @@ $(SYSTEM_USERSPACE_TESTSUITE): package.m4
> $(SYSTEM_TESTSUITE_AT) $(SYSTEM_USERSP
> >       $(AM_V_GEN)$(AUTOTEST) -I '$(srcdir)' -o $@.tmp $@.at
> >       $(AM_V_at)mv $@.tmp $@
> >
> > +$(SYSTEM_AFXDP_TESTSUITE): package.m4 $(SYSTEM_TESTSUITE_AT)
> $(SYSTEM_AFXDP_TESTSUITE_AT) $(COMMON_MACROS_AT)
> > +     $(AM_V_GEN)$(AUTOTEST) -I '$(srcdir)' -o $@.tmp $@.at
> > +     $(AM_V_at)mv $@.tmp $@
> > +
> >  $(SYSTEM_OFFLOADS_TESTSUITE): package.m4 $(SYSTEM_TESTSUITE_AT)
> $(SYSTEM_OFFLOADS_TESTSUITE_AT) $(COMMON_MACROS_AT)
> >       $(AM_V_GEN)$(AUTOTEST) -I '$(srcdir)' -o $@.tmp $@.at
> >       $(AM_V_at)mv $@.tmp $@
> > diff --git a/tests/system-afxdp-macros.at b/tests/system-afxdp-macros.at
> > new file mode 100644
> > index 000000000000..2c58c2d6554b
> > --- /dev/null
> > +++ b/tests/system-afxdp-macros.at
> > @@ -0,0 +1,153 @@
> > +# _ADD_BR([name])
> > +#
> > +# Expands into the proper ovs-vsctl commands to create a bridge with the
> > +# appropriate type and properties
> > +m4_define([_ADD_BR], [[add-br $1 -- set Bridge $1 datapath_type=netdev
> protocols=OpenFlow10,OpenFlow11,OpenFlow12,OpenFlow13,OpenFlow14,OpenFlow15
> fail-mode=secure ]])
> > +
> > +# OVS_TRAFFIC_VSWITCHD_START([vsctl-args], [vsctl-output], [=override])
> > +#
> > +# Creates a database and starts ovsdb-server, starts ovs-vswitchd
> > +# connected to that database, calls ovs-vsctl to create a bridge named
> > +# br0 with predictable settings, passing 'vsctl-args' as additional
> > +# commands to ovs-vsctl.  If 'vsctl-args' causes ovs-vsctl to provide
> > +# output (e.g. because it includes "create" commands) then
> 'vsctl-output'
> > +# specifies the expected output after filtering through uuidfilt.
> > +m4_define([OVS_TRAFFIC_VSWITCHD_START],
> > +  [
> > +   export OVS_PKGDATADIR=$(`pwd`)
> > +   _OVS_VSWITCHD_START([--disable-system])
> > +   AT_CHECK([ovs-vsctl -- _ADD_BR([br0]) -- $1 m4_if([$2], [], [], [|
> uuidfilt])], [0], [$2])
> > +])
> > +
> > +# OVS_TRAFFIC_VSWITCHD_STOP([WHITELIST], [extra_cmds])
> > +#
> > +# Gracefully stops ovs-vswitchd and ovsdb-server, checking their log
> files
> > +# for messages with severity WARN or higher and signaling an error if
> any
> > +# is present.  The optional WHITELIST may contain shell-quoted "sed"
> > +# commands to delete any warnings that are actually expected, e.g.:
> > +#
> > +#   OVS_TRAFFIC_VSWITCHD_STOP(["/expected error/d"])
> > +#
> > +# 'extra_cmds' are shell commands to be executed afte
> OVS_VSWITCHD_STOP() is
> > +# invoked. They can be used to perform additional cleanups such as name
> space
> > +# removal.
> > +m4_define([OVS_TRAFFIC_VSWITCHD_STOP],
> > +  [OVS_VSWITCHD_STOP([dnl
> > +$1";/netdev_linux.*obtaining netdev stats via vport failed/d
> > +/dpif_netlink.*Generic Netlink family 'ovs_datapath' does not exist.
> The Open vSwitch kernel module is probably not loaded./d
> > +/dpif_netdev(revalidator.*)|ERR|internal error parsing flow key/d
> > +/dpif(revalidator.*)|WARN|netdev@ovs-netdev: failed to put/d
> > +"])
> > +   AT_CHECK([:; $2])
> > +  ])
> > +
> > +m4_define([ADD_VETH_AFXDP],
> > +    [ AT_CHECK([ip link add $1 type veth peer name ovs-$1 || return 77])
> > +      CONFIGURE_AFXDP_VETH_OFFLOADS([$1])
> > +      AT_CHECK([ip link set $1 netns $2])
> > +      AT_CHECK([ip link set dev ovs-$1 up])
> > +      AT_CHECK([ovs-vsctl add-port $3 ovs-$1 -- \
> > +                set interface ovs-$1 external-ids:iface-id="$1"
> type="afxdp"])
> > +      NS_CHECK_EXEC([$2], [ip addr add $4 dev $1 $7])
> > +      NS_CHECK_EXEC([$2], [ip link set dev $1 up])
> > +      if test -n "$5"; then
> > +        NS_CHECK_EXEC([$2], [ip link set dev $1 address $5])
> > +      fi
> > +      if test -n "$6"; then
> > +        NS_CHECK_EXEC([$2], [ip route add default via $6])
> > +      fi
> > +      on_exit 'ip link del ovs-$1'
> > +    ]
> > +)
> > +
> > +# CONFIGURE_AFXDP_VETH_OFFLOADS([VETH])
> > +#
> > +# Disable TX offloads and VLAN offloads for veths used in AF_XDP.
> > +m4_define([CONFIGURE_AFXDP_VETH_OFFLOADS],
> > +    [AT_CHECK([ethtool -K $1 tx off], [0], [ignore], [ignore])
> > +     AT_CHECK([ethtool -K $1 rxvlan off], [0], [ignore], [ignore])
> > +     AT_CHECK([ethtool -K $1 txvlan off], [0], [ignore], [ignore])
> > +    ]
> > +)
> > +
> > +# CONFIGURE_VETH_OFFLOADS([VETH])
> > +#
> > +# Disable TX offloads for veths.  The userspace datapath uses the
> AF_PACKET
> > +# socket to receive packets for veths.  Unfortunately, the AF_PACKET
> socket
> > +# doesn't play well with offloads:
> > +# 1. GSO packets are received without segmentation and therefore
> discarded.
> > +# 2. Packets with offloaded partial checksum are received with the wrong
> > +#    checksum, therefore discarded by the receiver.
> > +#
> > +# By disabling tx offloads in the non-OVS side of the veth peer we make
> sure
> > +# that the AF_PACKET socket will not receive bad packets.
> > +#
> > +# This is a workaround, and should be removed when offloads are properly
> > +# supported in netdev-linux.
> > +m4_define([CONFIGURE_VETH_OFFLOADS],
> > +    [AT_CHECK([ethtool -K $1 tx off], [0], [ignore], [ignore])]
> > +)
> > +
> > +# CHECK_CONNTRACK()
> > +#
> > +# Perform requirements checks for running conntrack tests.
> > +#
> > +m4_define([CHECK_CONNTRACK],
> > +    [AT_SKIP_IF([test $HAVE_PYTHON = no])]
> > +)
> > +
> > +# CHECK_CONNTRACK_ALG()
> > +#
> > +# Perform requirements checks for running conntrack ALG tests. The
> userspace
> > +# supports FTP and TFTP.
> > +#
> > +m4_define([CHECK_CONNTRACK_ALG])
> > +
> > +# CHECK_CONNTRACK_FRAG()
> > +#
> > +# Perform requirements checks for running conntrack fragmentations
> tests.
> > +# The userspace doesn't support fragmentation yet, so skip the tests.
> > +m4_define([CHECK_CONNTRACK_FRAG],
> > +[
> > +    AT_SKIP_IF([:])
> > +])
> > +
> > +# CHECK_CONNTRACK_LOCAL_STACK()
> > +#
> > +# Perform requirements checks for running conntrack tests with local
> stack.
> > +# While the kernel connection tracker automatically passes all the
> connection
> > +# tracking state from an internal port to the OpenvSwitch kernel
> module, there
> > +# is simply no way of doing that with the userspace, so skip the tests.
> > +m4_define([CHECK_CONNTRACK_LOCAL_STACK],
> > +[
> > +    AT_SKIP_IF([:])
> > +])
> > +
> > +# CHECK_CONNTRACK_NAT()
> > +#
> > +# Perform requirements checks for running conntrack NAT tests. The
> userspace
> > +# datapath supports NAT.
> > +#
> > +m4_define([CHECK_CONNTRACK_NAT])
> > +
> > +# CHECK_CT_DPIF_FLUSH_BY_CT_TUPLE()
> > +#
> > +# Perform requirements checks for running ovs-dpctl flush-conntrack by
> > +# conntrack 5-tuple test. The userspace datapath does not support
> > +# this feature yet.
> > +m4_define([CHECK_CT_DPIF_FLUSH_BY_CT_TUPLE],
> > +[
> > +    AT_SKIP_IF([:])
> > +])
> > +
> > +# CHECK_CT_DPIF_SET_GET_MAXCONNS()
> > +#
> > +# Perform requirements checks for running ovs-dpctl ct-set-maxconns or
> > +# ovs-dpctl ct-get-maxconns. The userspace datapath does support this
> feature.
> > +m4_define([CHECK_CT_DPIF_SET_GET_MAXCONNS])
> > +
> > +# CHECK_CT_DPIF_GET_NCONNS()
> > +#
> > +# Perform requirements checks for running ovs-dpctl ct-get-nconns. The
> > +# userspace datapath does support this feature.
> > +m4_define([CHECK_CT_DPIF_GET_NCONNS])
> > diff --git a/tests/system-afxdp-testsuite.at b/tests/
> system-afxdp-testsuite.at
> > new file mode 100644
> > index 000000000000..538c0d15d556
> > --- /dev/null
> > +++ b/tests/system-afxdp-testsuite.at
> > @@ -0,0 +1,26 @@
> > +AT_INIT
> > +
> > +AT_COPYRIGHT([Copyright (c) 2018 Nicira, Inc.
> > +
> > +Licensed under the Apache License, Version 2.0 (the "License");
> > +you may not use this file except in compliance with the License.
> > +You may obtain a copy of the License at:
> > +
> > +    http://www.apache.org/licenses/LICENSE-2.0
> > +
> > +Unless required by applicable law or agreed to in writing, software
> > +distributed under the License is distributed on an "AS IS" BASIS,
> > +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> > +See the License for the specific language governing permissions and
> > +limitations under the License.])
> > +
> > +m4_ifdef([AT_COLOR_TESTS], [AT_COLOR_TESTS])
> > +
> > +m4_include([tests/ovs-macros.at])
> > +m4_include([tests/ovsdb-macros.at])
> > +m4_include([tests/ofproto-macros.at])
> > +m4_include([tests/system-afxdp-macros.at])
> > +m4_include([tests/system-common-macros.at])
> > +
> > +m4_include([tests/system-afxdp-traffic.at])
> > +m4_include([tests/system-ovn.at])
> > diff --git a/tests/system-afxdp-traffic.at b/tests/
> system-afxdp-traffic.at
> > new file mode 100644
> > index 000000000000..26f72acf48ef
> > --- /dev/null
> > +++ b/tests/system-afxdp-traffic.at
> > @@ -0,0 +1,978 @@
> > +AT_BANNER([AF_XDP netdev datapath-sanity])
> > +
> > +AT_SETUP([datapath - ping between two ports])
> > +OVS_TRAFFIC_VSWITCHD_START()
> > +
> > +ulimit -l unlimited
> > +
> > +ADD_NAMESPACES(at_ns0, at_ns1)
> > +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
> > +
> > +ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24")
> > +ADD_VETH_AFXDP(p1, at_ns1, br0, "10.1.1.2/24")
> > +
> > +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.2 |
> FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +OVS_TRAFFIC_VSWITCHD_STOP
> > +AT_CLEANUP
> > +
> > +AT_SETUP([datapath - ping between two ports on vlan])
> > +OVS_TRAFFIC_VSWITCHD_START()
> > +
> > +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
> > +
> > +ADD_NAMESPACES(at_ns0, at_ns1)
> > +
> > +ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24")
> > +ADD_VETH_AFXDP(p1, at_ns1, br0, "10.1.1.2/24")
> > +
> > +ADD_VLAN(p0, at_ns0, 100, "10.2.2.1/24")
> > +ADD_VLAN(p1, at_ns1, 100, "10.2.2.2/24")
> > +
> > +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.2.2.2 |
> FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +
> > +OVS_TRAFFIC_VSWITCHD_STOP
> > +AT_CLEANUP
> > +
> > +AT_SETUP([datapath - ping6 between two ports])
> > +OVS_TRAFFIC_VSWITCHD_START()
> > +
> > +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
> > +
> > +ADD_NAMESPACES(at_ns0, at_ns1)
> > +
> > +ADD_VETH_AFXDP(p0, at_ns0, br0, "fc00::1/96")
> > +ADD_VETH_AFXDP(p1, at_ns1, br0, "fc00::2/96")
> > +
> > +dnl Linux seems to take a little time to get its IPv6 stack in order.
> Without
> > +dnl waiting, we get occasional failures due to the following error:
> > +dnl "connect: Cannot assign requested address"
> > +OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 1 fc00::2])
> > +
> > +NS_CHECK_EXEC([at_ns0], [ping6 -q -c 3 -i 0.3 -w 6 fc00::2 |
> FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +
> > +OVS_TRAFFIC_VSWITCHD_STOP
> > +AT_CLEANUP
> > +
> > +AT_SETUP([datapath - ping6 between two ports on vlan])
> > +OVS_TRAFFIC_VSWITCHD_START()
> > +
> > +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
> > +
> > +ADD_NAMESPACES(at_ns0, at_ns1)
> > +
> > +ADD_VETH_AFXDP(p0, at_ns0, br0, "fc00::1/96")
> > +ADD_VETH_AFXDP(p1, at_ns1, br0, "fc00::2/96")
> > +
> > +ADD_VLAN(p0, at_ns0, 100, "fc00:1::1/96")
> > +ADD_VLAN(p1, at_ns1, 100, "fc00:1::2/96")
> > +
> > +dnl Linux seems to take a little time to get its IPv6 stack in order.
> Without
> > +dnl waiting, we get occasional failures due to the following error:
> > +dnl "connect: Cannot assign requested address"
> > +OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 1 fc00:1::2])
> > +
> > +NS_CHECK_EXEC([at_ns0], [ping6 -q -c 3 -i 0.3 -w 2 fc00:1::2 |
> FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +NS_CHECK_EXEC([at_ns0], [ping6 -s 1600 -q -c 3 -i 0.3 -w 2 fc00:1::2 |
> FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +NS_CHECK_EXEC([at_ns0], [ping6 -s 3200 -q -c 3 -i 0.3 -w 2 fc00:1::2 |
> FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +
> > +OVS_TRAFFIC_VSWITCHD_STOP
> > +AT_CLEANUP
> > +
> > +AT_SETUP([datapath - ping over vxlan tunnel])
> > +OVS_CHECK_VXLAN()
> > +
> > +OVS_TRAFFIC_VSWITCHD_START()
> > +ADD_BR([br-underlay])
> > +
> > +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
> > +AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
> > +
> > +ADD_NAMESPACES(at_ns0)
> > +
> > +dnl Set up underlay link from host into the namespace using veth pair.
> > +ADD_VETH_AFXDP(p0, at_ns0, br-underlay, "172.31.1.1/24")
> > +AT_CHECK([ip addr add dev br-underlay "172.31.1.100/24"])
> > +AT_CHECK([ip link set dev br-underlay up])
> > +
> > +
> > +dnl Set up tunnel endpoints on OVS outside the namespace and with a
> native
> > +dnl linux device inside the namespace.
> > +ADD_OVS_TUNNEL([vxlan], [br0], [at_vxlan0], [172.31.1.1], [
> 10.1.1.100/24])
> > +ADD_NATIVE_TUNNEL([vxlan], [at_vxlan1], [at_ns0], [172.31.1.100], [
> 10.1.1.1/24],
> > +                  [id 0 dstport 4789])
> > +
> > +AT_CHECK([ovs-appctl ovs/route/add 10.1.1.100/24 br0], [0], [OK
> > +])
> > +AT_CHECK([ovs-appctl ovs/route/add 172.31.1.92/24 br-underlay], [0],
> [OK
> > +])
> > +
> > +dnl First, check the underlay
> > +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 172.31.1.100 |
> FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +
> > +dnl Okay, now check the overlay with different packet sizes
> > +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.100 |
> FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +NS_CHECK_EXEC([at_ns0], [ping -s 1600 -q -c 3 -i 0.3 -w 2 10.1.1.100 |
> FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +NS_CHECK_EXEC([at_ns0], [ping -s 3200 -q -c 3 -i 0.3 -w 2 10.1.1.100 |
> FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +
> > +OVS_TRAFFIC_VSWITCHD_STOP
> > +AT_CLEANUP
> > +
> > +AT_SETUP([datapath - ping over vxlan6 tunnel])
> > +OVS_CHECK_VXLAN_UDP6ZEROCSUM()
> > +
> > +OVS_TRAFFIC_VSWITCHD_START()
> > +ADD_BR([br-underlay])
> > +
> > +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
> > +AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
> > +
> > +ADD_NAMESPACES(at_ns0)
> > +
> > +dnl Set up underlay link from host into the namespace using veth pair.
> > +ADD_VETH_AFXDP(p0, at_ns0, br-underlay, "fc00::1/64", [], [], "nodad")
> > +AT_CHECK([ip addr add dev br-underlay "fc00::100/64" nodad])
> > +AT_CHECK([ip link set dev br-underlay up])
> > +
> > +dnl Set up tunnel endpoints on OVS outside the namespace and with a
> native
> > +dnl linux device inside the namespace.
> > +ADD_OVS_TUNNEL6([vxlan], [br0], [at_vxlan0], [fc00::1], [10.1.1.100/24
> ])
> > +ADD_NATIVE_TUNNEL6([vxlan], [at_vxlan1], [at_ns0], [fc00::100], [
> 10.1.1.1/24],
> > +                   [id 0 dstport 4789 udp6zerocsumtx udp6zerocsumrx])
> > +
> > +AT_CHECK([ovs-appctl ovs/route/add 10.1.1.100/24 br0], [0], [OK
> > +])
> > +AT_CHECK([ovs-appctl ovs/route/add fc00::100/64 br-underlay], [0], [OK
> > +])
> > +
> > +OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 1 fc00::100])
> > +
> > +dnl First, check the underlay
> > +NS_CHECK_EXEC([at_ns0], [ping6 -q -c 3 -i 0.3 -w 2 fc00::100 |
> FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +
> > +dnl Okay, now check the overlay with different packet sizes
> > +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.100 |
> FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +NS_CHECK_EXEC([at_ns0], [ping -s 1600 -q -c 3 -i 0.3 -w 2 10.1.1.100 |
> FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +NS_CHECK_EXEC([at_ns0], [ping -s 3200 -q -c 3 -i 0.3 -w 2 10.1.1.100 |
> FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +
> > +OVS_TRAFFIC_VSWITCHD_STOP
> > +AT_CLEANUP
> > +
> > +AT_SETUP([datapath - ping over gre tunnel])
> > +OVS_CHECK_GRE()
> > +
> > +OVS_TRAFFIC_VSWITCHD_START()
> > +ADD_BR([br-underlay])
> > +
> > +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
> > +AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
> > +
> > +ADD_NAMESPACES(at_ns0)
> > +
> > +dnl Set up underlay link from host into the namespace using veth pair.
> > +ADD_VETH_AFXDP(p0, at_ns0, br-underlay, "172.31.1.1/24")
> > +AT_CHECK([ip addr add dev br-underlay "172.31.1.100/24"])
> > +AT_CHECK([ip link set dev br-underlay up])
> > +
> > +dnl Set up tunnel endpoints on OVS outside the namespace and with a
> native
> > +dnl linux device inside the namespace.
> > +ADD_OVS_TUNNEL([gre], [br0], [at_gre0], [172.31.1.1], [10.1.1.100/24])
> > +ADD_NATIVE_TUNNEL([gretap], [ns_gre0], [at_ns0], [172.31.1.100], [
> 10.1.1.1/24])
> > +
> > +AT_CHECK([ovs-appctl ovs/route/add 10.1.1.100/24 br0], [0], [OK
> > +])
> > +AT_CHECK([ovs-appctl ovs/route/add 172.31.1.92/24 br-underlay], [0],
> [OK
> > +])
> > +
> > +dnl First, check the underlay
> > +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 172.31.1.100 |
> FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +
> > +dnl Okay, now check the overlay with different packet sizes
> > +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.100 |
> FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +NS_CHECK_EXEC([at_ns0], [ping -s 1600 -q -c 3 -i 0.3 -w 2 10.1.1.100 |
> FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +NS_CHECK_EXEC([at_ns0], [ping -s 3200 -q -c 3 -i 0.3 -w 2 10.1.1.100 |
> FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +
> > +OVS_TRAFFIC_VSWITCHD_STOP
> > +AT_CLEANUP
> > +
> > +AT_SETUP([datapath - ping over erspan v1 tunnel])
> > +OVS_CHECK_GRE()
> > +OVS_CHECK_ERSPAN()
> > +
> > +OVS_TRAFFIC_VSWITCHD_START()
> > +ADD_BR([br-underlay])
> > +
> > +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
> > +AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
> > +
> > +ADD_NAMESPACES(at_ns0)
> > +
> > +dnl Set up underlay link from host into the namespace using veth pair.
> > +ADD_VETH_AFXDP(p0, at_ns0, br-underlay, "172.31.1.1/24")
> > +AT_CHECK([ip addr add dev br-underlay "172.31.1.100/24"])
> > +AT_CHECK([ip link set dev br-underlay up])
> > +
> > +dnl Set up tunnel endpoints on OVS outside the namespace and with a
> native
> > +dnl linux device inside the namespace.
> > +ADD_OVS_TUNNEL([erspan], [br0], [at_erspan0], [172.31.1.1], [
> 10.1.1.100/24], [options:key=1 options:erspan_ver=1 options:erspan_idx=7])
> > +ADD_NATIVE_TUNNEL([erspan], [ns_erspan0], [at_ns0], [172.31.1.100], [
> 10.1.1.1/24], [seq key 1 erspan_ver 1 erspan 7])
> > +
> > +AT_CHECK([ovs-appctl ovs/route/add 10.1.1.100/24 br0], [0], [OK
> > +])
> > +AT_CHECK([ovs-appctl ovs/route/add 172.31.1.92/24 br-underlay], [0],
> [OK
> > +])
> > +
> > +dnl First, check the underlay
> > +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 172.31.1.100 |
> FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +
> > +dnl Okay, now check the overlay with different packet sizes
> > +dnl NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.100 |
> FORMAT_PING], [0], [dnl
> > +NS_CHECK_EXEC([at_ns0], [ping -s 1200 -i 0.3 -c 3 10.1.1.100 |
> FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +OVS_TRAFFIC_VSWITCHD_STOP
> > +AT_CLEANUP
> > +
> > +AT_SETUP([datapath - ping over erspan v2 tunnel])
> > +OVS_CHECK_GRE()
> > +OVS_CHECK_ERSPAN()
> > +
> > +OVS_TRAFFIC_VSWITCHD_START()
> > +ADD_BR([br-underlay])
> > +
> > +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
> > +AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
> > +
> > +ADD_NAMESPACES(at_ns0)
> > +
> > +dnl Set up underlay link from host into the namespace using veth pair.
> > +ADD_VETH_AFXDP(p0, at_ns0, br-underlay, "172.31.1.1/24")
> > +AT_CHECK([ip addr add dev br-underlay "172.31.1.100/24"])
> > +AT_CHECK([ip link set dev br-underlay up])
> > +
> > +dnl Set up tunnel endpoints on OVS outside the namespace and with a
> native
> > +dnl linux device inside the namespace.
> > +ADD_OVS_TUNNEL([erspan], [br0], [at_erspan0], [172.31.1.1], [
> 10.1.1.100/24], [options:key=1 options:erspan_ver=2 options:erspan_dir=1
> options:erspan_hwid=0x7])
> > +ADD_NATIVE_TUNNEL([erspan], [ns_erspan0], [at_ns0], [172.31.1.100], [
> 10.1.1.1/24], [seq key 1 erspan_ver 2 erspan_dir egress erspan_hwid 7])
> > +
> > +AT_CHECK([ovs-appctl ovs/route/add 10.1.1.100/24 br0], [0], [OK
> > +])
> > +AT_CHECK([ovs-appctl ovs/route/add 172.31.1.92/24 br-underlay], [0],
> [OK
> > +])
> > +
> > +dnl First, check the underlay
> > +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 172.31.1.100 |
> FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +
> > +dnl Okay, now check the overlay with different packet sizes
> > +dnl NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.100 |
> FORMAT_PING], [0], [dnl
> > +NS_CHECK_EXEC([at_ns0], [ping -s 1200 -i 0.3 -c 3 10.1.1.100 |
> FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +OVS_TRAFFIC_VSWITCHD_STOP
> > +AT_CLEANUP
> > +
> > +AT_SETUP([datapath - ping over ip6erspan v1 tunnel])
> > +OVS_CHECK_GRE()
> > +OVS_CHECK_ERSPAN()
> > +
> > +OVS_TRAFFIC_VSWITCHD_START()
> > +ADD_BR([br-underlay])
> > +
> > +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
> > +AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
> > +
> > +ADD_NAMESPACES(at_ns0)
> > +
> > +dnl Set up underlay link from host into the namespace using veth pair.
> > +ADD_VETH_AFXDP(p0, at_ns0, br-underlay, "fc00:100::1/96", [], [], nodad)
> > +AT_CHECK([ip addr add dev br-underlay "fc00:100::100/96" nodad])
> > +AT_CHECK([ip link set dev br-underlay up])
> > +
> > +dnl Set up tunnel endpoints on OVS outside the namespace and with a
> native
> > +dnl linux device inside the namespace.
> > +ADD_OVS_TUNNEL6([ip6erspan], [br0], [at_erspan0], [fc00:100::1], [
> 10.1.1.100/24],
> > +                [options:key=123 options:erspan_ver=1
> options:erspan_idx=0x7])
> > +ADD_NATIVE_TUNNEL6([ip6erspan], [ns_erspan0], [at_ns0], [fc00:100::100],
> > +                   [10.1.1.1/24], [local fc00:100::1 seq key 123
> erspan_ver 1 erspan 7])
> > +
> > +AT_CHECK([ovs-appctl ovs/route/add 10.1.1.100/24 br0], [0], [OK
> > +])
> > +AT_CHECK([ovs-appctl ovs/route/add fc00:100::1/96 br-underlay], [0], [OK
> > +])
> > +
> > +OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 2 fc00:100::100])
> > +
> > +dnl First, check the underlay
> > +NS_CHECK_EXEC([at_ns0], [ping6 -q -c 3 -i 0.3 -w 2 fc00:100::100 |
> FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +
> > +dnl Okay, now check the overlay with different packet sizes
> > +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.100 |
> FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +OVS_TRAFFIC_VSWITCHD_STOP
> > +AT_CLEANUP
> > +
> > +AT_SETUP([datapath - ping over ip6erspan v2 tunnel])
> > +OVS_CHECK_GRE()
> > +OVS_CHECK_ERSPAN()
> > +
> > +OVS_TRAFFIC_VSWITCHD_START()
> > +ADD_BR([br-underlay])
> > +
> > +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
> > +AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
> > +
> > +ADD_NAMESPACES(at_ns0)
> > +
> > +dnl Set up underlay link from host into the namespace using veth pair.
> > +ADD_VETH_AFXDP(p0, at_ns0, br-underlay, "fc00:100::1/96", [], [], nodad)
> > +AT_CHECK([ip addr add dev br-underlay "fc00:100::100/96" nodad])
> > +AT_CHECK([ip link set dev br-underlay up])
> > +
> > +dnl Set up tunnel endpoints on OVS outside the namespace and with a
> native
> > +dnl linux device inside the namespace.
> > +ADD_OVS_TUNNEL6([ip6erspan], [br0], [at_erspan0], [fc00:100::1], [
> 10.1.1.100/24],
> > +                [options:key=121 options:erspan_ver=2
> options:erspan_dir=0 options:erspan_hwid=0x7])
> > +ADD_NATIVE_TUNNEL6([ip6erspan], [ns_erspan0], [at_ns0], [fc00:100::100],
> > +                   [10.1.1.1/24],
> > +                   [local fc00:100::1 seq key 121 erspan_ver 2
> erspan_dir ingress erspan_hwid 0x7])
> > +
> > +AT_CHECK([ovs-appctl ovs/route/add 10.1.1.100/24 br0], [0], [OK
> > +])
> > +AT_CHECK([ovs-appctl ovs/route/add fc00:100::1/96 br-underlay], [0], [OK
> > +])
> > +
> > +OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 2 fc00:100::100])
> > +
> > +dnl First, check the underlay
> > +NS_CHECK_EXEC([at_ns0], [ping6 -q -c 3 -i 0.3 -w 2 fc00:100::100 |
> FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +
> > +dnl Okay, now check the overlay with different packet sizes
> > +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.100 |
> FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +OVS_TRAFFIC_VSWITCHD_STOP
> > +AT_CLEANUP
> > +
> > +AT_SETUP([datapath - ping over geneve tunnel])
> > +OVS_CHECK_GENEVE()
> > +
> > +OVS_TRAFFIC_VSWITCHD_START()
> > +ADD_BR([br-underlay])
> > +
> > +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
> > +AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
> > +
> > +ADD_NAMESPACES(at_ns0)
> > +
> > +dnl Set up underlay link from host into the namespace using veth pair.
> > +ADD_VETH_AFXDP(p0, at_ns0, br-underlay, "172.31.1.1/24")
> > +AT_CHECK([ip addr add dev br-underlay "172.31.1.100/24"])
> > +AT_CHECK([ip link set dev br-underlay up])
> > +
> > +dnl Set up tunnel endpoints on OVS outside the namespace and with a
> native
> > +dnl linux device inside the namespace.
> > +ADD_OVS_TUNNEL([geneve], [br0], [at_gnv0], [172.31.1.1], [10.1.1.100/24
> ])
> > +ADD_NATIVE_TUNNEL([geneve], [ns_gnv0], [at_ns0], [172.31.1.100], [
> 10.1.1.1/24],
> > +                  [vni 0])
> > +
> > +AT_CHECK([ovs-appctl ovs/route/add 10.1.1.100/24 br0], [0], [OK
> > +])
> > +AT_CHECK([ovs-appctl ovs/route/add 172.31.1.100/24 br-underlay], [0],
> [OK
> > +])
> > +
> > +dnl First, check the underlay
> > +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 172.31.1.100 |
> FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +
> > +dnl Okay, now check the overlay with different packet sizes
> > +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.100 |
> FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +NS_CHECK_EXEC([at_ns0], [ping -s 1600 -q -c 3 -i 0.3 -w 2 10.1.1.100 |
> FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +NS_CHECK_EXEC([at_ns0], [ping -s 3200 -q -c 3 -i 0.3 -w 2 10.1.1.100 |
> FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +
> > +OVS_TRAFFIC_VSWITCHD_STOP
> > +AT_CLEANUP
> > +
> > +AT_SETUP([datapath - ping over geneve6 tunnel])
> > +OVS_CHECK_GENEVE_UDP6ZEROCSUM()
> > +
> > +OVS_TRAFFIC_VSWITCHD_START()
> > +ADD_BR([br-underlay])
> > +
> > +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
> > +AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
> > +
> > +ADD_NAMESPACES(at_ns0)
> > +
> > +dnl Set up underlay link from host into the namespace using veth pair.
> > +ADD_VETH_AFXDP(p0, at_ns0, br-underlay, "fc00::1/64", [], [], "nodad")
> > +AT_CHECK([ip addr add dev br-underlay "fc00::100/64" nodad])
> > +AT_CHECK([ip link set dev br-underlay up])
> > +
> > +dnl Set up tunnel endpoints on OVS outside the namespace and with a
> native
> > +dnl linux device inside the namespace.
> > +ADD_OVS_TUNNEL6([geneve], [br0], [at_gnv0], [fc00::1], [10.1.1.100/24])
> > +ADD_NATIVE_TUNNEL6([geneve], [ns_gnv0], [at_ns0], [fc00::100], [
> 10.1.1.1/24],
> > +                   [vni 0 udp6zerocsumtx udp6zerocsumrx])
> > +
> > +AT_CHECK([ovs-appctl ovs/route/add 10.1.1.100/24 br0], [0], [OK
> > +])
> > +AT_CHECK([ovs-appctl ovs/route/add fc00::100/64 br-underlay], [0], [OK
> > +])
> > +
> > +OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 1 fc00::100])
> > +
> > +dnl First, check the underlay
> > +NS_CHECK_EXEC([at_ns0], [ping6 -q -c 3 -i 0.3 -w 2 fc00::100 |
> FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +
> > +dnl Okay, now check the overlay with different packet sizes
> > +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.100 |
> FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +NS_CHECK_EXEC([at_ns0], [ping -s 1600 -q -c 3 -i 0.3 -w 2 10.1.1.100 |
> FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +NS_CHECK_EXEC([at_ns0], [ping -s 3200 -q -c 3 -i 0.3 -w 2 10.1.1.100 |
> FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +
> > +OVS_TRAFFIC_VSWITCHD_STOP
> > +AT_CLEANUP
> > +
> > +AT_SETUP([datapath - clone action])
> > +OVS_TRAFFIC_VSWITCHD_START()
> > +
> > +ADD_NAMESPACES(at_ns0, at_ns1, at_ns2)
> > +
> > +ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24")
> > +ADD_VETH_AFXDP(p1, at_ns1, br0, "10.1.1.2/24")
> > +
> > +AT_CHECK([ovs-vsctl -- set interface ovs-p0 ofport_request=1 \
> > +                    -- set interface ovs-p1 ofport_request=2])
> > +
> > +AT_DATA([flows.txt], [dnl
> > +priority=1 actions=NORMAL
> > +priority=10
> in_port=1,ip,actions=clone(mod_dl_dst(50:54:00:00:00:0a),set_field:192.168.3.3->ip_dst),
> output:2
> > +priority=10
> in_port=2,ip,actions=clone(mod_dl_src(ae:c6:7e:54:8d:4d),mod_dl_dst(50:54:00:00:00:0b),set_field:192.168.4.4->ip_dst,
> controller), output:1
> > +])
> > +AT_CHECK([ovs-ofctl add-flows br0 flows.txt])
> > +
> > +AT_CHECK([ovs-ofctl monitor br0 65534 invalid_ttl --detach --no-chdir
> --pidfile 2> ofctl_monitor.log])
> > +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.2 |
> FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +
> > +AT_CHECK([cat ofctl_monitor.log | STRIP_MONITOR_CSUM], [0], [dnl
> >
> +icmp,vlan_tci=0x0000,dl_src=ae:c6:7e:54:8d:4d,dl_dst=50:54:00:00:00:0b,nw_src=10.1.1.2,nw_dst=192.168.4.4,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=0,icmp_code=0
> icmp_csum: <skip>
> >
> +icmp,vlan_tci=0x0000,dl_src=ae:c6:7e:54:8d:4d,dl_dst=50:54:00:00:00:0b,nw_src=10.1.1.2,nw_dst=192.168.4.4,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=0,icmp_code=0
> icmp_csum: <skip>
> >
> +icmp,vlan_tci=0x0000,dl_src=ae:c6:7e:54:8d:4d,dl_dst=50:54:00:00:00:0b,nw_src=10.1.1.2,nw_dst=192.168.4.4,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=0,icmp_code=0
> icmp_csum: <skip>
> > +])
> > +
> > +OVS_TRAFFIC_VSWITCHD_STOP
> > +AT_CLEANUP
> > +
> > +AT_SETUP([datapath - basic truncate action])
> > +AT_SKIP_IF([test $HAVE_NC = no])
> > +OVS_TRAFFIC_VSWITCHD_START()
> > +AT_CHECK([ovs-ofctl del-flows br0])
> > +
> > +dnl Create p0 and ovs-p0(1)
> > +ADD_NAMESPACES(at_ns0)
> > +ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24")
> > +NS_CHECK_EXEC([at_ns0], [ip link set dev p0 address e6:66:c1:11:11:11])
> > +NS_CHECK_EXEC([at_ns0], [arp -s 10.1.1.2 e6:66:c1:22:22:22])
> > +
> > +dnl Create p1(3) and ovs-p1(2), packets received from ovs-p1 will
> appear in p1
> > +AT_CHECK([ip link add p1 type veth peer name ovs-p1])
> > +on_exit 'ip link del ovs-p1'
> > +AT_CHECK([ip link set dev ovs-p1 up])
> > +AT_CHECK([ip link set dev p1 up])
> > +AT_CHECK([ovs-vsctl add-port br0 ovs-p1 -- set interface ovs-p1
> ofport_request=2])
> > +dnl Use p1 to check the truncated packet
> > +AT_CHECK([ovs-vsctl add-port br0 p1 -- set interface p1
> ofport_request=3])
> > +
> > +dnl Create p2(5) and ovs-p2(4)
> > +AT_CHECK([ip link add p2 type veth peer name ovs-p2])
> > +on_exit 'ip link del ovs-p2'
> > +AT_CHECK([ip link set dev ovs-p2 up])
> > +AT_CHECK([ip link set dev p2 up])
> > +AT_CHECK([ovs-vsctl add-port br0 ovs-p2 -- set interface ovs-p2
> ofport_request=4])
> > +dnl Use p2 to check the truncated packet
> > +AT_CHECK([ovs-vsctl add-port br0 p2 -- set interface p2
> ofport_request=5])
> > +
> > +dnl basic test
> > +AT_CHECK([ovs-ofctl del-flows br0])
> > +AT_DATA([flows.txt], [dnl
> > +in_port=3 dl_dst=e6:66:c1:22:22:22 actions=drop
> > +in_port=5 dl_dst=e6:66:c1:22:22:22 actions=drop
> > +in_port=1 dl_dst=e6:66:c1:22:22:22
> actions=output(port=2,max_len=100),output:4
> > +])
> > +AT_CHECK([ovs-ofctl add-flows br0 flows.txt])
> > +
> > +dnl use this file as payload file for ncat
> > +AT_CHECK([dd if=/dev/urandom of=payload200.bin bs=200 count=1 2>
> /dev/null])
> > +on_exit 'rm -f payload200.bin'
> > +NS_CHECK_EXEC([at_ns0], [nc $NC_EOF_OPT -u 10.1.1.2 1234 <
> payload200.bin])
> > +
> > +dnl packet with truncated size
> > +AT_CHECK([ovs-appctl revalidator/purge], [0])
> > +AT_CHECK([ovs-ofctl dump-flows br0 table=0 | grep "in_port=3" |  sed -n
> 's/.*\(n\_bytes=[[0-9]]*\).*/\1/p'], [0], [dnl
> > +n_bytes=100
> > +])
> > +dnl packet with original size
> > +AT_CHECK([ovs-appctl revalidator/purge], [0])
> > +AT_CHECK([ovs-ofctl dump-flows br0 table=0 | grep "in_port=5" | sed -n
> 's/.*\(n\_bytes=[[0-9]]*\).*/\1/p'], [0], [dnl
> > +n_bytes=242
> > +])
> > +
> > +dnl more complicated output actions
> > +AT_CHECK([ovs-ofctl del-flows br0])
> > +AT_DATA([flows.txt], [dnl
> > +in_port=3 dl_dst=e6:66:c1:22:22:22 actions=drop
> > +in_port=5 dl_dst=e6:66:c1:22:22:22 actions=drop
> > +in_port=1 dl_dst=e6:66:c1:22:22:22
> actions=output(port=2,max_len=100),output:4,output(port=2,max_len=100),output(port=4,max_len=100),output:2,output(port=4,max_len=200),output(port=2,max_len=65535)
> > +])
> > +AT_CHECK([ovs-ofctl add-flows br0 flows.txt])
> > +
> > +NS_CHECK_EXEC([at_ns0], [nc $NC_EOF_OPT -u 10.1.1.2 1234 <
> payload200.bin])
> > +
> > +dnl 100 + 100 + 242 + min(65535,242) = 684
> > +AT_CHECK([ovs-appctl revalidator/purge], [0])
> > +AT_CHECK([ovs-ofctl dump-flows br0 table=0 | grep "in_port=3" | sed -n
> 's/.*\(n\_bytes=[[0-9]]*\).*/\1/p'], [0], [dnl
> > +n_bytes=684
> > +])
> > +dnl 242 + 100 + min(242,200) = 542
> > +AT_CHECK([ovs-ofctl dump-flows br0 table=0 | grep "in_port=5" | sed -n
> 's/.*\(n\_bytes=[[0-9]]*\).*/\1/p'], [0], [dnl
> > +n_bytes=542
> > +])
> > +
> > +dnl SLOW_ACTION: disable kernel datapath truncate support
> > +dnl Repeat the test above, but exercise the SLOW_ACTION code path
> > +AT_CHECK([ovs-appctl dpif/set-dp-features br0 trunc false], [0])
> > +
> > +dnl SLOW_ACTION test1: check datapatch actions
> > +AT_CHECK([ovs-ofctl del-flows br0])
> > +AT_CHECK([ovs-ofctl add-flows br0 flows.txt])
> > +
> > +AT_CHECK([ovs-appctl ofproto/trace br0
> "in_port=1,dl_type=0x800,dl_src=e6:66:c1:11:11:11,dl_dst=e6:66:c1:22:22:22,nw_src=192.168.0.1,nw_dst=192.168.0.2,nw_proto=6,tp_src=8,tp_dst=9"],
> [0], [stdout])
> > +AT_CHECK([tail -3 stdout], [0],
> > +[Datapath actions:
> trunc(100),3,5,trunc(100),3,trunc(100),5,3,trunc(200),5,trunc(65535),3
> > +This flow is handled by the userspace slow path because it:
> > +  - Uses action(s) not supported by datapath.
> > +])
> > +
> > +dnl SLOW_ACTION test2: check actual packet truncate
> > +AT_CHECK([ovs-ofctl del-flows br0])
> > +AT_CHECK([ovs-ofctl add-flows br0 flows.txt])
> > +NS_CHECK_EXEC([at_ns0], [nc $NC_EOF_OPT -u 10.1.1.2 1234 <
> payload200.bin])
> > +
> > +dnl 100 + 100 + 242 + min(65535,242) = 684
> > +AT_CHECK([ovs-appctl revalidator/purge], [0])
> > +AT_CHECK([ovs-ofctl dump-flows br0 table=0 | grep "in_port=3" | sed -n
> 's/.*\(n\_bytes=[[0-9]]*\).*/\1/p'], [0], [dnl
> > +n_bytes=684
> > +])
> > +
> > +dnl 242 + 100 + min(242,200) = 542
> > +AT_CHECK([ovs-ofctl dump-flows br0 table=0 | grep "in_port=5" | sed -n
> 's/.*\(n\_bytes=[[0-9]]*\).*/\1/p'], [0], [dnl
> > +n_bytes=542
> > +])
> > +
> > +OVS_TRAFFIC_VSWITCHD_STOP
> > +AT_CLEANUP
> > +
> > +
> > +AT_BANNER([conntrack])
> > +
> > +AT_SETUP([conntrack - controller])
> > +CHECK_CONNTRACK()
> > +OVS_TRAFFIC_VSWITCHD_START()
> > +AT_CHECK([ovs-appctl vlog/set dpif:dbg dpif_netdev:dbg
> ofproto_dpif_upcall:dbg])
> > +
> > +ADD_NAMESPACES(at_ns0, at_ns1)
> > +
> > +ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24")
> > +ADD_VETH_AFXDP(p1, at_ns1, br0, "10.1.1.2/24")
> > +
> > +dnl Allow any traffic from ns0->ns1. Only allow nd, return traffic from
> ns1->ns0.
> > +AT_DATA([flows.txt], [dnl
> > +priority=1,action=drop
> > +priority=10,arp,action=normal
> > +priority=100,in_port=1,udp,action=ct(commit),controller
> > +priority=100,in_port=2,ct_state=-trk,udp,action=ct(table=0)
> > +priority=100,in_port=2,ct_state=+trk+est,udp,action=controller
> > +])
> > +
> > +AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
> > +
> > +AT_CAPTURE_FILE([ofctl_monitor.log])
> > +AT_CHECK([ovs-ofctl monitor br0 65534 invalid_ttl --detach --no-chdir
> --pidfile 2> ofctl_monitor.log])
> > +
> > +dnl Send an unsolicited reply from port 2. This should be dropped.
> > +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 2 ct\(table=0\)
> '50540000000a50540000000908004500001c000000000011a4cd0a0101020a0101010002000100080000'])
> > +
> > +dnl OK, now start a new connection from port 1.
> > +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 1
> ct\(commit\),controller
> '50540000000a50540000000908004500001c000000000011a4cd0a0101010a0101020001000200080000'])
> > +
> > +dnl Now try a reply from port 2.
> > +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 2 ct\(table=0\)
> '50540000000a50540000000908004500001c000000000011a4cd0a0101020a0101010002000100080000'])
> > +
> > +dnl Check this output. We only see the latter two packets, not the
> first.
> > +AT_CHECK([cat ofctl_monitor.log], [0], [dnl
> > +NXT_PACKET_IN2 (xid=0x0): total_len=42 in_port=1 (via action)
> data_len=42 (unbuffered)
> >
> +udp,vlan_tci=0x0000,dl_src=50:54:00:00:00:09,dl_dst=50:54:00:00:00:0a,nw_src=10.1.1.1,nw_dst=10.1.1.2,nw_tos=0,nw_ecn=0,nw_ttl=0,tp_src=1,tp_dst=2
> udp_csum:0
> > +NXT_PACKET_IN2 (xid=0x0): cookie=0x0 total_len=42
> ct_state=est|rpl|trk,ct_nw_src=10.1.1.1,ct_nw_dst=10.1.1.2,ct_nw_proto=17,ct_tp_src=1,ct_tp_dst=2,ip,in_port=2
> (via action) data_len=42 (unbuffered)
> >
> +udp,vlan_tci=0x0000,dl_src=50:54:00:00:00:09,dl_dst=50:54:00:00:00:0a,nw_src=10.1.1.2,nw_dst=10.1.1.1,nw_tos=0,nw_ecn=0,nw_ttl=0,tp_src=2,tp_dst=1
> udp_csum:0
> > +])
> > +
> > +OVS_TRAFFIC_VSWITCHD_STOP
> > +AT_CLEANUP
> > +
> > +AT_SETUP([conntrack - force commit])
> > +CHECK_CONNTRACK()
> > +OVS_TRAFFIC_VSWITCHD_START()
> > +AT_CHECK([ovs-appctl vlog/set dpif:dbg dpif_netdev:dbg
> ofproto_dpif_upcall:dbg])
> > +
> > +ADD_NAMESPACES(at_ns0, at_ns1)
> > +
> > +ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24")
> > +ADD_VETH_AFXDP(p1, at_ns1, br0, "10.1.1.2/24")
> > +
> > +AT_DATA([flows.txt], [dnl
> > +priority=1,action=drop
> > +priority=10,arp,action=normal
> > +priority=100,in_port=1,udp,action=ct(force,commit),controller
> > +priority=100,in_port=2,ct_state=-trk,udp,action=ct(table=0)
> >
> +priority=100,in_port=2,ct_state=+trk+est,udp,action=ct(force,commit,table=1)
> > +table=1,in_port=2,ct_state=+trk,udp,action=controller
> > +])
> > +
> > +AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
> > +
> > +AT_CAPTURE_FILE([ofctl_monitor.log])
> > +AT_CHECK([ovs-ofctl monitor br0 65534 invalid_ttl --detach --no-chdir
> --pidfile 2> ofctl_monitor.log])
> > +
> > +dnl Send an unsolicited reply from port 2. This should be dropped.
> > +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=2
> packet=50540000000a50540000000908004500001c000000000011a4cd0a0101020a0101010002000100080000
> actions=resubmit(,0)"])
> > +
> > +dnl OK, now start a new connection from port 1.
> > +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=1
> packet=50540000000a50540000000908004500001c000000000011a4cd0a0101010a0101020001000200080000
> actions=resubmit(,0)"])
> > +
> > +dnl Now try a reply from port 2.
> > +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=2
> packet=50540000000a50540000000908004500001c000000000011a4cd0a0101020a0101010002000100080000
> actions=resubmit(,0)"])
> > +
> > +AT_CHECK([ovs-appctl revalidator/purge], [0])
> > +
> > +dnl Check this output. We only see the latter two packets, not the
> first.
> > +AT_CHECK([cat ofctl_monitor.log], [0], [dnl
> > +NXT_PACKET_IN2 (xid=0x0): cookie=0x0 total_len=42 in_port=1 (via
> action) data_len=42 (unbuffered)
> >
> +udp,vlan_tci=0x0000,dl_src=50:54:00:00:00:09,dl_dst=50:54:00:00:00:0a,nw_src=10.1.1.1,nw_dst=10.1.1.2,nw_tos=0,nw_ecn=0,nw_ttl=0,tp_src=1,tp_dst=2
> udp_csum:0
> > +NXT_PACKET_IN2 (xid=0x0): table_id=1 cookie=0x0 total_len=42
> ct_state=new|trk,ct_nw_src=10.1.1.2,ct_nw_dst=10.1.1.1,ct_nw_proto=17,ct_tp_src=2,ct_tp_dst=1,ip,in_port=2
> (via action) data_len=42 (unbuffered)
> >
> +udp,vlan_tci=0x0000,dl_src=50:54:00:00:00:09,dl_dst=50:54:00:00:00:0a,nw_src=10.1.1.2,nw_dst=10.1.1.1,nw_tos=0,nw_ecn=0,nw_ttl=0,tp_src=2,tp_dst=1
> udp_csum:0
> > +])
> > +
> > +dnl
> > +dnl Check that the directionality has been changed by force commit.
> > +dnl
> > +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep
> "orig=.src=10\.1\.1\.2,"], [], [dnl
> >
> +udp,orig=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1),reply=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2)
> > +])
> > +
> > +dnl OK, now send another packet from port 1 and see that it switches
> again
> > +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=1
> packet=50540000000a50540000000908004500001c000000000011a4cd0a0101010a0101020001000200080000
> actions=resubmit(,0)"])
> > +AT_CHECK([ovs-appctl revalidator/purge], [0])
> > +
> > +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep
> "orig=.src=10\.1\.1\.1,"], [], [dnl
> >
> +udp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2),reply=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1)
> > +])
> > +
> > +OVS_TRAFFIC_VSWITCHD_STOP
> > +AT_CLEANUP
> > +
> > +AT_SETUP([conntrack - ct flush by 5-tuple])
> > +CHECK_CONNTRACK()
> > +OVS_TRAFFIC_VSWITCHD_START()
> > +
> > +ADD_NAMESPACES(at_ns0, at_ns1)
> > +
> > +ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24")
> > +ADD_VETH_AFXDP(p1, at_ns1, br0, "10.1.1.2/24")
> > +
> > +AT_DATA([flows.txt], [dnl
> > +priority=1,action=drop
> > +priority=10,arp,action=normal
> > +priority=100,in_port=1,udp,action=ct(commit),2
> > +priority=100,in_port=2,udp,action=ct(zone=5,commit),1
> > +priority=100,in_port=1,icmp,action=ct(commit),2
> > +priority=100,in_port=2,icmp,action=ct(zone=5,commit),1
> > +])
> > +
> > +AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
> > +
> > +dnl Test UDP from port 1
> > +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=1
> packet=50540000000a50540000000908004500001c000000000011a4cd0a0101010a0101020001000200080000
> actions=resubmit(,0)"])
> > +
> > +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep
> "orig=.src=10\.1\.1\.1,"], [], [dnl
> >
> +udp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2),reply=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1)
> > +])
> > +
> > +AT_CHECK([ovs-appctl dpctl/flush-conntrack
> 'ct_nw_src=10.1.1.2,ct_nw_dst=10.1.1.1,ct_nw_proto=17,ct_tp_src=2,ct_tp_dst=1'])
> > +
> > +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep
> "orig=.src=10\.1\.1\.1,"], [1], [dnl
> > +])
> > +
> > +dnl Test UDP from port 2
> > +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=2
> packet=50540000000a50540000000908004500001c000000000011a4cd0a0101020a0101010002000100080000
> actions=resubmit(,0)"])
> > +
> > +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep
> "orig=.src=10\.1\.1\.2,"], [0], [dnl
> >
> +udp,orig=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1),reply=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2),zone=5
> > +])
> > +
> > +AT_CHECK([ovs-appctl dpctl/flush-conntrack zone=5
> 'ct_nw_src=10.1.1.1,ct_nw_dst=10.1.1.2,ct_nw_proto=17,ct_tp_src=1,ct_tp_dst=2'])
> > +
> > +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(10.1.1.2)], [0],
> [dnl
> > +])
> > +
> > +dnl Test ICMP traffic
> > +NS_CHECK_EXEC([at_ns1], [ping -q -c 3 -i 0.3 -w 2 10.1.1.1 |
> FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +
> > +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep
> "orig=.src=10\.1\.1\.2,"], [0], [stdout])
> > +AT_CHECK([cat stdout | FORMAT_CT(10.1.1.1)], [0],[dnl
> >
> +icmp,orig=(src=10.1.1.2,dst=10.1.1.1,id=<cleared>,type=8,code=0),reply=(src=10.1.1.1,dst=10.1.1.2,id=<cleared>,type=0,code=0),zone=5
> > +])
> > +
> > +ICMP_ID=`cat stdout | cut -d ',' -f4 | cut -d '=' -f2`
> >
> +ICMP_TUPLE=ct_nw_src=10.1.1.2,ct_nw_dst=10.1.1.1,ct_nw_proto=1,icmp_id=$ICMP_ID,icmp_type=8,icmp_code=0
> > +AT_CHECK([ovs-appctl dpctl/flush-conntrack zone=5 $ICMP_TUPLE])
> > +
> > +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep
> "orig=.src=10\.1\.1\.2,"], [1], [dnl
> > +])
> > +
> > +OVS_TRAFFIC_VSWITCHD_STOP
> > +AT_CLEANUP
> > +
> > +AT_SETUP([conntrack - IPv4 ping])
> > +CHECK_CONNTRACK()
> > +OVS_TRAFFIC_VSWITCHD_START()
> > +
> > +ADD_NAMESPACES(at_ns0, at_ns1)
> > +
> > +ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24")
> > +ADD_VETH_AFXDP(p1, at_ns1, br0, "10.1.1.2/24")
> > +
> > +dnl Allow any traffic from ns0->ns1. Only allow nd, return traffic from
> ns1->ns0.
> > +AT_DATA([flows.txt], [dnl
> > +priority=1,action=drop
> > +priority=10,arp,action=normal
> > +priority=100,in_port=1,icmp,action=ct(commit),2
> > +priority=100,in_port=2,icmp,ct_state=-trk,action=ct(table=0)
> > +priority=100,in_port=2,icmp,ct_state=+trk+est,action=1
> > +])
> > +
> > +AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
> > +
> > +dnl Pings from ns0->ns1 should work fine.
> > +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.2 |
> FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +
> > +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(10.1.1.2)], [0],
> [dnl
> >
> +icmp,orig=(src=10.1.1.1,dst=10.1.1.2,id=<cleared>,type=8,code=0),reply=(src=10.1.1.2,dst=10.1.1.1,id=<cleared>,type=0,code=0)
> > +])
> > +
> > +AT_CHECK([ovs-appctl dpctl/flush-conntrack])
> > +
> > +dnl Pings from ns1->ns0 should fail.
> > +NS_CHECK_EXEC([at_ns1], [ping -q -c 3 -i 0.3 -w 2 10.1.1.1 |
> FORMAT_PING], [0], [dnl
> > +7 packets transmitted, 0 received, 100% packet loss, time 0ms
> > +])
> > +
> > +OVS_TRAFFIC_VSWITCHD_STOP
> > +AT_CLEANUP
> > +
> > +AT_SETUP([conntrack - get_nconns and get/set_maxconns])
> > +CHECK_CONNTRACK()
> > +CHECK_CT_DPIF_SET_GET_MAXCONNS()
> > +CHECK_CT_DPIF_GET_NCONNS()
> > +OVS_TRAFFIC_VSWITCHD_START()
> > +
> > +ADD_NAMESPACES(at_ns0, at_ns1)
> > +
> > +ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24")
> > +ADD_VETH_AFXDP(p1, at_ns1, br0, "10.1.1.2/24")
> > +
> > +dnl Allow any traffic from ns0->ns1. Only allow nd, return traffic from
> ns1->ns0.
> > +AT_DATA([flows.txt], [dnl
> > +priority=1,action=drop
> > +priority=10,arp,action=normal
> > +priority=100,in_port=1,icmp,action=ct(commit),2
> > +priority=100,in_port=2,icmp,ct_state=-trk,action=ct(table=0)
> > +priority=100,in_port=2,icmp,ct_state=+trk+est,action=1
> > +])
> > +
> > +AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
> > +
> > +dnl Pings from ns0->ns1 should work fine.
> > +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.2 |
> FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +
> > +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(10.1.1.2)], [0],
> [dnl
> >
> +icmp,orig=(src=10.1.1.1,dst=10.1.1.2,id=<cleared>,type=8,code=0),reply=(src=10.1.1.2,dst=10.1.1.1,id=<cleared>,type=0,code=0)
> > +])
> > +
> > +AT_CHECK([ovs-appctl dpctl/ct-set-maxconns one-bad-dp], [2], [], [dnl
> > +ovs-vswitchd: maxconns missing or malformed (Invalid argument)
> > +ovs-appctl: ovs-vswitchd: server returned an error
> > +])
> > +
> > +AT_CHECK([ovs-appctl dpctl/ct-set-maxconns a], [2], [], [dnl
> > +ovs-vswitchd: maxconns missing or malformed (Invalid argument)
> > +ovs-appctl: ovs-vswitchd: server returned an error
> > +])
> > +
> > +AT_CHECK([ovs-appctl dpctl/ct-set-maxconns one-bad-dp 10], [2], [], [dnl
> > +ovs-vswitchd: datapath not found (Invalid argument)
> > +ovs-appctl: ovs-vswitchd: server returned an error
> > +])
> > +
> > +AT_CHECK([ovs-appctl dpctl/ct-get-maxconns one-bad-dp], [2], [], [dnl
> > +ovs-vswitchd: datapath not found (Invalid argument)
> > +ovs-appctl: ovs-vswitchd: server returned an error
> > +])
> > +
> > +AT_CHECK([ovs-appctl dpctl/ct-get-nconns one-bad-dp], [2], [], [dnl
> > +ovs-vswitchd: datapath not found (Invalid argument)
> > +ovs-appctl: ovs-vswitchd: server returned an error
> > +])
> > +
> > +AT_CHECK([ovs-appctl dpctl/ct-get-nconns], [], [dnl
> > +1
> > +])
> > +
> > +AT_CHECK([ovs-appctl dpctl/ct-get-maxconns], [], [dnl
> > +3000000
> > +])
> > +
> > +AT_CHECK([ovs-appctl dpctl/ct-set-maxconns 10], [], [dnl
> > +setting maxconns successful
> > +])
> > +
> > +AT_CHECK([ovs-appctl dpctl/ct-get-maxconns], [], [dnl
> > +10
> > +])
> > +
> > +AT_CHECK([ovs-appctl dpctl/flush-conntrack])
> > +
> > +AT_CHECK([ovs-appctl dpctl/ct-get-nconns], [], [dnl
> > +0
> > +])
> > +
> > +AT_CHECK([ovs-appctl dpctl/ct-get-maxconns], [], [dnl
> > +10
> > +])
> > +
> > +OVS_TRAFFIC_VSWITCHD_STOP
> > +AT_CLEANUP
> > +
> > +AT_SETUP([conntrack - IPv6 ping])
> > +CHECK_CONNTRACK()
> > +OVS_TRAFFIC_VSWITCHD_START()
> > +
> > +ADD_NAMESPACES(at_ns0, at_ns1)
> > +
> > +ADD_VETH_AFXDP(p0, at_ns0, br0, "fc00::1/96")
> > +ADD_VETH_AFXDP(p1, at_ns1, br0, "fc00::2/96")
> > +
> > +AT_DATA([flows.txt], [dnl
> > +
> > +dnl ICMPv6 echo request and reply go to table 1.  The rest of the
> traffic goes
> > +dnl through normal action.
> > +table=0,priority=10,icmp6,icmp_type=128,action=goto_table:1
> > +table=0,priority=10,icmp6,icmp_type=129,action=goto_table:1
> > +table=0,priority=1,action=normal
> > +
> > +dnl Allow everything from ns0->ns1. Only allow return traffic from
> ns1->ns0.
> > +table=1,priority=100,in_port=1,icmp6,action=ct(commit),2
> > +table=1,priority=100,in_port=2,icmp6,ct_state=-trk,action=ct(table=0)
> > +table=1,priority=100,in_port=2,icmp6,ct_state=+trk+est,action=1
> > +table=1,priority=1,action=drop
> > +])
> > +
> > +AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
> > +
> > +OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 1 fc00::2])
> > +
> > +dnl The above ping creates state in the connection tracker.  We're not
> > +dnl interested in that state.
> > +AT_CHECK([ovs-appctl dpctl/flush-conntrack])
> > +
> > +dnl Pings from ns1->ns0 should fail.
> > +NS_CHECK_EXEC([at_ns1], [ping6 -q -c 3 -i 0.3 -w 2 fc00::1 |
> FORMAT_PING], [0], [dnl
> > +7 packets transmitted, 0 received, 100% packet loss, time 0ms
> > +])
> > +
> > +dnl Pings from ns0->ns1 should work fine.
> > +NS_CHECK_EXEC([at_ns0], [ping6 -q -c 3 -i 0.3 -w 2 fc00::2 |
> FORMAT_PING], [0], [dnl
> > +3 packets transmitted, 3 received, 0% packet loss, time 0ms
> > +])
> > +
> > +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(fc00::2)], [0],
> [dnl
> >
> +icmpv6,orig=(src=fc00::1,dst=fc00::2,id=<cleared>,type=128,code=0),reply=(src=fc00::2,dst=fc00::1,id=<cleared>,type=129,code=0)
> > +])
> > +
> > +OVS_TRAFFIC_VSWITCHD_STOP
> > +AT_CLEANUP
> >
>
Ilya Maximets April 30, 2019, 1:14 p.m. UTC | #6
On 27.04.2019 16:28, William Tu wrote:
>     >  if WIN32
>     >  lib_libopenvswitch_la_SOURCES += \
>     > diff --git a/lib/dp-packet.c b/lib/dp-packet.c
>     > index 0976a35e758b..a61552f72988 100644
>     > --- a/lib/dp-packet.c
>     > +++ b/lib/dp-packet.c
>     > @@ -22,6 +22,9 @@
>     >  #include "netdev-dpdk.h"
>     >  #include "openvswitch/dynamic-string.h"
>     >  #include "util.h"
>     > +#ifdef HAVE_AF_XDP
>     > +#include "xdpsock.h"
>     > +#endif
>     > 
>     >  static void
>     >  dp_packet_init__(struct dp_packet *b, size_t allocated, enum dp_packet_source source)
>     > @@ -122,6 +125,16 @@ dp_packet_uninit(struct dp_packet *b)
>     >               * created as a dp_packet */
>     >              free_dpdk_buf((struct dp_packet*) b);
>     >  #endif
>     > +        } else if (b->source == DPBUF_AFXDP) {
>     > +#ifdef HAVE_AF_XDP
>     > +            struct dp_packet_afxdp *xpacket;
>     > +
>     > +            xpacket = dp_packet_cast_afxdp(b);
>     > +            if (xpacket->mpool) {
>     > +                umem_elem_push(xpacket->mpool, dp_packet_base(b));
>     > +            }
>     > +#endif
> 
>     Why not making the same trick as we have for DPDK few lines above?
>     i.e. wrap this part in a function like 'free_afxdp_buf' and move it
>     to the netdev-afxdp.c ? You will not need to expose so many internals
>     to generic code. dp_packet_cast_afxdp() will also be moved there along
>     with 'struct dp_packet_afxdp'.
> 
>     BTW, I hope, someday, I'll finally implement 'dp-packet-memory-provider'
>     abstraction for OVS.
> 
> 
> Hi Ilya,
> 
> Can you share more detail about this idea, dp-packet-memory-provider?
> Why do we need it?

Hi.

OVS uses way too different memory sources for 'dp-packet's. They are defined
in 'enum dp_packet_source'. Here is some list of issues (in my opinion) we
have:

* Some of dp-packet APIs are independent from the memory source, but all the
  APIs that able to alloc/free/resize depends on the memory source and includes
  all these ugly switches/ifs guarded by ifdefs to do completely different
  things on packets from different sources.

* Some specific management of memory pools resides inside implementations of
  netdevs, which is not good from the point of OVS design. This is moslty
  about managing DPDK memory pools, but the same is applicable to your umem
  memory pools. There was also attempt to implement netmap based netdev and
  there was own memory pool implementation too.

* We even have DPDK specific code in the generic datapath code to prevent
  having packets with different memory sources in a single batch.

* netdevs that requires specific memory sources (DPDK) has to reallocate and
  copy all the packets that has different source. And this is done on netdev
  level while handling send().

* 'dp_packet_new()' is able to allocate new packet only with DPBUF_MALLOC.
  This triggers the previous issue with re-allocating on send. This is really
  bad thing because 'dp_packet_new()' used by 'dp_packet_clone()'. i.e. to
  clone dp-packet with DPDK source we're reallocating it on heap and again
  re-allocating it to send to DPDK netdev.

* netdevs usually new packets with dp_packet_new() on malloced memory even
  if they have no strict requirements about memory sources.
  For example, netdev-linux is not able to receive packets to dp-packet
  with DPBUF_DPDK/AFXDP source.

* Probably, there are more issues that I don't remember right now.


Solution is to have 'dp-packet-memory-provider' abstraction layer:

* Each memory-provider will implement specific alloc/free/etc functions.

* Each netdev might have preferred memory-provider that must be used while
  sending packets to it.

* Datapath will be able to make decision about dp-packet reallocations in
  case of memory-provider mismatch on a higher level decreasing the munber
  of reallocations.

* Packet clone could take into account the memory-provider of the original
  packet with ability to reallocate from the same provider.

* We might have globally preferred memory provider for all cases where
  memory source is not important. For example, we could use dpdk memory
  provider for packets received from netdev-linux to not re-allocate them
  on send().

* Memory pools' management will be encapsulated inside specific providers.

* It might be possible even for AFXDP netdev to use DPDK mempools created
  on top of memory shared with kernel. (just an idea)


Drawbacks:

* There are always performance concerns because this will have significant
  impact on the hot path --> could produce performance issues. Extensive
  testing required.

Best regards, Ilya Maximets.
Ilya Maximets April 30, 2019, 3:56 p.m. UTC | #7
One more thing.

On 25.04.2019 2:47, William Tu wrote:
> +#define unlikely OVS_UNLIKELY
> +#define likely OVS_LIKELY
> +#define barrier() __asm__ __volatile__("": : :"memory")
> +#define smp_rmb() barrier()
> +#define smp_wmb() barrier()

You may probably use something like this:

#include "ovs-atomic.h"

#define barrier() atomic_signal_fence(memory_order_acq_rel)
#define smp_rmb() atomic_thread_fence(memory_order_acq_rel)
#define smp_wmb() atomic_thread_fence(memory_order_acq_rel)


Note:
atomic_thread_fence(memory_order_acq_rel) should be a bit stronger
than smp_{w,r}mb(), but it's a fair replacement and should be equal
to compiler_barrier() on x86 anyway.

Best regards, Ilya Maximets.
Ben Pfaff April 30, 2019, 4:10 p.m. UTC | #8
On 25.04.2019 2:47, William Tu wrote:
> +#define unlikely OVS_UNLIKELY
> +#define likely OVS_LIKELY
> +#define barrier() __asm__ __volatile__("": : :"memory")
> +#define smp_rmb() barrier()
> +#define smp_wmb() barrier()

Does any of this mean we're cutting and pasting Linux kernel code into
OVS userspace?  I hope not.  That is a bad idea for licensing reasons.
Eelco Chaudron April 30, 2019, 4:18 p.m. UTC | #9
On 30 Apr 2019, at 18:10, Ben Pfaff wrote:

> On 25.04.2019 2:47, William Tu wrote:
>> +#define unlikely OVS_UNLIKELY
>> +#define likely OVS_LIKELY
>> +#define barrier() __asm__ __volatile__("": : :"memory")
>> +#define smp_rmb() barrier()
>> +#define smp_wmb() barrier()
>
> Does any of this mean we're cutting and pasting Linux kernel code into
> OVS userspace?  I hope not.  That is a bad idea for licensing reasons.

I think this got fixed with the following kernel patches:

https://www.spinics.net/lists/netdev/msg563507.html
William Tu April 30, 2019, 4:25 p.m. UTC | #10
Thanks Ben and Eelco

On Tue, Apr 30, 2019 at 9:18 AM Eelco Chaudron <echaudro@redhat.com> wrote:

>
>
> On 30 Apr 2019, at 18:10, Ben Pfaff wrote:
>
> > On 25.04.2019 2:47, William Tu wrote:
> >> +#define unlikely OVS_UNLIKELY
> >> +#define likely OVS_LIKELY
> >> +#define barrier() __asm__ __volatile__("": : :"memory")
> >> +#define smp_rmb() barrier()
> >> +#define smp_wmb() barrier()
> >
> > Does any of this mean we're cutting and pasting Linux kernel code into
> > OVS userspace?  I hope not.  That is a bad idea for licensing reasons.
>
> I think this got fixed with the following kernel patches:
>
> https://www.spinics.net/lists/netdev/msg563507.html


I did cut/paste some kernel code in my previous versions.
I think this version is OK, will double check in my next version.

Regards,
William
diff mbox series

Patch

diff --git a/Documentation/automake.mk b/Documentation/automake.mk
index 082438e09a33..11cc59efc881 100644
--- a/Documentation/automake.mk
+++ b/Documentation/automake.mk
@@ -10,6 +10,7 @@  DOC_SOURCE = \
 	Documentation/intro/why-ovs.rst \
 	Documentation/intro/install/index.rst \
 	Documentation/intro/install/bash-completion.rst \
+	Documentation/intro/install/afxdp.rst \
 	Documentation/intro/install/debian.rst \
 	Documentation/intro/install/documentation.rst \
 	Documentation/intro/install/distributions.rst \
diff --git a/Documentation/index.rst b/Documentation/index.rst
index 46261235c732..aa9e7c49f179 100644
--- a/Documentation/index.rst
+++ b/Documentation/index.rst
@@ -59,6 +59,7 @@  vSwitch? Start here.
   :doc:`intro/install/windows` |
   :doc:`intro/install/xenserver` |
   :doc:`intro/install/dpdk` |
+  :doc:`intro/install/afxdp` |
   :doc:`Installation FAQs <faq/releases>`
 
 - **Tutorials:** :doc:`tutorials/faucet` |
diff --git a/Documentation/intro/install/afxdp.rst b/Documentation/intro/install/afxdp.rst
new file mode 100644
index 000000000000..a1e3317bbdb5
--- /dev/null
+++ b/Documentation/intro/install/afxdp.rst
@@ -0,0 +1,366 @@ 
+..
+      Licensed under the Apache License, Version 2.0 (the "License"); you may
+      not use this file except in compliance with the License. You may obtain
+      a copy of the License at
+
+          http://www.apache.org/licenses/LICENSE-2.0
+
+      Unless required by applicable law or agreed to in writing, software
+      distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+      WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+      License for the specific language governing permissions and limitations
+      under the License.
+
+      Convention for heading levels in Open vSwitch documentation:
+
+      =======  Heading 0 (reserved for the title in a document)
+      -------  Heading 1
+      ~~~~~~~  Heading 2
+      +++++++  Heading 3
+      '''''''  Heading 4
+
+      Avoid deeper levels because they do not render well.
+
+
+========================
+Open vSwitch with AF_XDP
+========================
+
+This document describes how to build and install Open vSwitch using
+AF_XDP netdev.
+
+.. warning::
+  The AF_XDP support of Open vSwitch is considered 'experimental',
+  and it is not compiled in by default.
+
+Introduction
+------------
+AF_XDP, Address Family of the eXpress Data Path, is a new Linux socket type
+built upon the eBPF and XDP technology.  It is aims to have comparable
+performance to DPDK but cooperate better with existing kernel's networking
+stack.  An AF_XDP socket receives and sends packets from an eBPF/XDP program
+attached to the netdev, by-passing a couple of Linux kernel's subsystems.
+As a result, AF_XDP socket shows much better performance than AF_PACKET.
+For more details about AF_XDP, please see linux kernel's
+Documentation/networking/af_xdp.rst
+
+
+AF_XDP Netdev
+-------------
+OVS has a couple of netdev types, i.e., system, tap, or
+internal.  The AF_XDP feature adds a new netdev types called
+"afxdp", and implement its configuration, packet reception,
+and transmit functions.  Since the AF_XDP socket, xsk,
+operates in userspace, once ovs-vswitchd receives packets
+from xsk, the proposed architecture re-uses the existing
+userspace dpif-netdev datapath.  As a result, most of
+the packet processing happens at the userspace instead of
+linux kernel.
+
+::
+
+              |   +-------------------+
+              |   |    ovs-vswitchd   |<-->ovsdb-server
+              |   +-------------------+
+              |   |      ofproto      |<-->OpenFlow controllers
+              |   +--------+-+--------+
+              |   | netdev | |ofproto-|
+    userspace |   +--------+ |  dpif  |
+              |   | afxdp  | +--------+
+              |   | netdev | |  dpif  |
+              |   +---||---+ +--------+
+              |       ||     |  dpif- |
+              |       ||     | netdev |
+              |_      ||     +--------+
+                      ||
+               _  +---||-----+--------+
+              |   | AF_XDP prog +     |
+       kernel |   |   xsk_map         |
+              |_  +--------||---------+
+                           ||
+                        physical
+                           NIC
+
+
+Build requirements
+------------------
+
+In addition to the requirements described in :doc:`general`, building Open
+vSwitch with AF_XDP will require the following:
+
+- libbpf from kernel source tree (kernel 5.0.0 or later)
+
+- Linux kernel XDP support, with the following options (required)
+  ``_CONFIG_BPF=y``
+
+  ``_CONFIG_BPF_SYSCALL=y``
+
+  ``_CONFIG_XDP_SOCKETS=y``
+
+
+- The following optional Kconfig options are also recommended, but not
+  required:
+
+  ``_CONFIG_BPF_JIT=y`` (Performance)
+
+  ``_CONFIG_HAVE_BPF_JIT=y`` (Performance)
+
+  ``_CONFIG_XDP_SOCKETS_DIAG=y`` (Debugging)
+
+- If possible, run **./xdpsock -r -N -z -i <your device>** under
+  linux/samples/bpf.  This is the OVS indepedent benchmark tools for AF_XDP.
+  It makes sure your basic kernel requirements are met for AF_XDP.
+
+
+Installing
+----------
+For OVS to use AF_XDP netdev, it has to be configured with LIBBPF support.
+Frist, clone a recent version of Linux bpf-next tree::
+
+  git clone git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git
+
+Second, go into the Linux source directory and build libbpf in the tools
+directory::
+
+  cd bpf-next/
+  cd tools/lib/bpf/
+  make && make install
+  make install_headers
+
+.. note::
+   Make sure xsk.h and bpf.h are installed in system's library path,
+   e.g. /usr/local/include/bpf/ or /usr/include/bpf/
+
+Make sure the libbpf.so is installed correctly::
+
+  ldconfig
+  ldconfig -p | grep libbpf
+
+
+Third, ensure the standard OVS requirements are installed and
+bootstrap/configure the package::
+
+  ./boot.sh && ./configure --enable-afxdp
+
+Finally, build and install OVS::
+
+  make && make install
+
+To kick start end-to-end autotesting::
+
+  uname -a # make sure having 5.0+ kernel
+  make check-afxdp
+
+if a test case fails, check the log at::
+
+  cat tests/system-afxdp-testsuite.dir/<number>/system-afxdp-testsuite.log
+
+
+Setup AF_XDP netdev
+-------------------
+Before running OVS with AF_XDP, make sure the libbpf and libelf are
+set-up right::
+
+  ldd vswitchd/ovs-vswitchd
+
+Open vSwitch should be started using userspace datapath as described
+in :doc:`general`::
+
+  ovs-vswitchd --disable-system
+  ovs-vsctl -- add-br br0 -- set Bridge br0 datapath_type=netdev
+
+.. note::
+   OVS AF_XDP netdev is using the userspace datapath, the same datapath
+   as used by OVS-DPDK.  So it requires --disable-system for ovs-vswitchd
+   and datapath_type=netdev when adding a new bridge.
+
+Make sure your device support AF_XDP, and to use 1 PMD (on core 4)
+on 1 queue (queue 0) device, configure these options: **pmd-cpu-mask,
+pmd-rxq-affinity, and n_rxq**. The **xdpmode** can be "drv" or "skb"::
+
+  ethtool -L enp2s0 combined 1
+  ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x10
+  ovs-vsctl add-port br0 enp2s0 -- set interface enp2s0 type="afxdp" \
+    options:n_rxq=1 options:xdpmode=drv \
+    other_config:pmd-rxq-affinity="0:4"
+
+Or, use 4 pmds/cores and 4 queues by doing::
+
+  ethtool -L enp2s0 combined 4
+  ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x36
+  ovs-vsctl add-port br0 enp2s0 -- set interface enp2s0 type="afxdp" \
+    options:n_rxq=4 options:xdpmode=drv \
+    other_config:pmd-rxq-affinity="0:1,1:2,2:3,3:4"
+
+To validate that the bridge has successfully instantiated, you can use the::
+
+  ovs-vsctl show
+
+should show something like::
+
+  Port "ens802f0"
+   Interface "ens802f0"
+      type: afxdp
+      options: {n_rxq="1", xdpmode=drv}
+
+Otherwise, enable debug by::
+
+  ovs-appctl vlog/set netdev_afxdp::dbg
+
+
+References
+----------
+Most of the design details are described in the paper presented at
+Linux Plumber 2018, "Bringing the Power of eBPF to Open vSwitch"[1],
+section 4, and slides[2][4].
+"The Path to DPDK Speeds for AF XDP"[3] gives a very good introduction
+about AF_XDP current and future work.
+
+
+[1] http://vger.kernel.org/lpc_net2018_talks/ovs-ebpf-afxdp.pdf
+
+[2] http://vger.kernel.org/lpc_net2018_talks/ovs-ebpf-lpc18-presentation.pdf
+
+[3] http://vger.kernel.org/lpc_net2018_talks/lpc18_paper_af_xdp_perf-v2.pdf
+
+[4] https://ovsfall2018.sched.com/event/IO7p/fast-userspace-ovs-with-afxdp
+
+
+Performance Tuning
+------------------
+The name of the game is to keep your CPU running in userspace, allowing PMD
+to keep polling the AF_XDP queues without any interferences from kernel.
+
+#. Make sure everything is in the same NUMA node (memory used by AF_XDP, pmd
+   running cores, device plug-in slot)
+
+#. Isolate your CPU by doing isolcpu at grub configure.
+
+#. IRQ should not set to pmd running core.
+
+#. The Spectre and Meltdown fixes increase the overhead of system calls.
+
+Debugging performance issue
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+While running the traffic, use linux perf tool to see where your cpu
+spends its cycle::
+
+  cd bpf-next/tools/perf
+  make
+  ./perf record -p `pidof ovs-vswitchd` sleep 10
+  ./perf report
+
+Measure your system call rate by doing::
+
+  pstree -p `pidof ovs-vswitchd`
+  strace -c -p <your pmd's PID>
+
+Or, use OVS pmd tool::
+
+  ovs-appctl dpif-netdev/pmd-stats-show
+
+
+Example Script
+--------------
+
+Below is a script using namespaces and veth peer::
+
+  #!/bin/bash
+  ovs-vswitchd --no-chdir --pidfile -vvconn -vofproto_dpif -vunixctl \
+    --disable-system --detach \
+  ovs-vsctl -- add-br br0 -- set Bridge br0 \
+    protocols=OpenFlow10,OpenFlow11,OpenFlow12,OpenFlow13,OpenFlow14 \
+    fail-mode=secure datapath_type=netdev
+  ovs-vsctl -- add-br br0 -- set Bridge br0 datapath_type=netdev
+
+  ip netns add at_ns0
+  ovs-appctl vlog/set netdev_afxdp::dbg
+
+  ip link add p0 type veth peer name afxdp-p0
+  ip link set p0 netns at_ns0
+  ip link set dev afxdp-p0 up
+  ovs-vsctl add-port br0 afxdp-p0 -- \
+    set interface afxdp-p0 external-ids:iface-id="p0" type="afxdp"
+
+  ip netns exec at_ns0 sh << NS_EXEC_HEREDOC
+  ip addr add "10.1.1.1/24" dev p0
+  ip link set dev p0 up
+  NS_EXEC_HEREDOC
+
+  ip netns add at_ns1
+  ip link add p1 type veth peer name afxdp-p1
+  ip link set p1 netns at_ns1
+  ip link set dev afxdp-p1 up
+
+  ovs-vsctl add-port br0 afxdp-p1 -- \
+    set interface afxdp-p1 external-ids:iface-id="p1" type="afxdp"
+  ip netns exec at_ns1 sh << NS_EXEC_HEREDOC
+  ip addr add "10.1.1.2/24" dev p1
+  ip link set dev p1 up
+  NS_EXEC_HEREDOC
+
+  ip netns exec at_ns0 ping -i .2 10.1.1.2
+
+
+Limitations/Known Issues
+------------------------
+#. Device's numa ID is always 0, need a way to find numa id from a netdev.
+#. No QoS support because AF_XDP netdev by-pass the Linux TC layer. A possible
+   work-around is to use OpenFlow meter action.
+#. AF_XDP device added to bridge, remove, and added again will fail.
+#. Most of the tests are done using i40e single port. Multiple ports and
+   also ixgbe driver also needs to be tested.
+#. No latency test result (TODO items)
+
+
+make check-afxdp
+----------------
+When executing 'make check-afxdp', OVS creates namespaces, sets up AF_XDP on
+veth devices and kicks start the testing.  So far we have the following test
+cases::
+
+ AF_XDP netdev datapath-sanity
+
+  1: datapath - ping between two ports               ok
+  2: datapath - ping between two ports on vlan       ok
+  3: datapath - ping6 between two ports              ok
+  4: datapath - ping6 between two ports on vlan      ok
+  5: datapath - ping over vxlan tunnel               ok
+  6: datapath - ping over vxlan6 tunnel              ok
+  7: datapath - ping over gre tunnel                 ok
+  8: datapath - ping over erspan v1 tunnel           ok
+  9: datapath - ping over erspan v2 tunnel           ok
+ 10: datapath - ping over ip6erspan v1 tunnel        ok
+ 11: datapath - ping over ip6erspan v2 tunnel        ok
+ 12: datapath - ping over geneve tunnel              ok
+ 13: datapath - ping over geneve6 tunnel             ok
+ 14: datapath - clone action                         ok
+ 15: datapath - basic truncate action                ok
+
+ conntrack
+
+ 16: conntrack - controller                          ok
+ 17: conntrack - force commit                        ok
+ 18: conntrack - ct flush by 5-tuple                 ok
+ 19: conntrack - IPv4 ping                           ok
+ 20: conntrack - get_nconns and get/set_maxconns     ok
+ 21: conntrack - IPv6 ping                           ok
+
+ system-ovn
+
+ 22: ovn -- 2 LRs connected via LS, gateway router, SNAT and DNAT ok
+ 23: ovn -- 2 LRs connected via LS, gateway router, easy SNAT ok
+ 24: ovn -- multiple gateway routers, SNAT and DNAT  ok
+ 25: ovn -- load-balancing                           ok
+ 26: ovn -- load-balancing - same subnet.            ok
+ 27: ovn -- load balancing in gateway router         ok
+ 28: ovn -- multiple gateway routers, load-balancing ok
+ 29: ovn -- load balancing in router with gateway router port ok
+ 30: ovn -- DNAT and SNAT on distributed router - N/S ok
+ 31: ovn -- DNAT and SNAT on distributed router - E/W ok
+
+
+Bug Reporting
+-------------
+
+Please report problems to dev@openvswitch.org.
diff --git a/Documentation/intro/install/index.rst b/Documentation/intro/install/index.rst
index 3193c736cf17..c27a9c9d16ff 100644
--- a/Documentation/intro/install/index.rst
+++ b/Documentation/intro/install/index.rst
@@ -45,6 +45,7 @@  Installation from Source
    xenserver
    userspace
    dpdk
+   afxdp
 
 Installation from Packages
 --------------------------
diff --git a/acinclude.m4 b/acinclude.m4
index 301aeb70d82a..d80f2494d514 100644
--- a/acinclude.m4
+++ b/acinclude.m4
@@ -221,6 +221,29 @@  AC_DEFUN([OVS_FIND_DEPENDENCY], [
   ])
 ])
 
+dnl OVS_CHECK_LINUX_AF_XDP
+dnl
+dnl Check both Linux kernel AF_XDP and libbpf support
+AC_DEFUN([OVS_CHECK_LINUX_AF_XDP], [
+  AC_MSG_CHECKING([whether AF_XDP is supported])
+  AC_ARG_ENABLE([afxdp],
+                [AC_HELP_STRING([--enable-afxdp], [Enable AF-XDP support])],
+                [], [enable_afxdp=no])
+  AC_CHECK_HEADER([bpf/libbpf.h],
+                  [HAVE_LIBBPF=yes],
+                  [HAVE_LIBBPF=no])
+  AC_CHECK_HEADER([linux/if_xdp.h],
+                  [HAVE_IF_XDP=yes],
+                  [HAVE_IF_XDP=no])
+  AM_CONDITIONAL([SUPPORT_AF_XDP],
+                 [test "$enable_afxdp" = yes && test "$HAVE_LIBBPF" = yes && test "$HAVE_IF_XDP" = yes])
+  AM_COND_IF([SUPPORT_AF_XDP], [
+    AC_DEFINE([HAVE_AF_XDP], [1], [Define to 1 if AF-XDP support is available and enabled.])
+    LIBBPF_LDADD=" -lbpf -lelf"
+    AC_SUBST([LIBBPF_LDADD])
+  ])
+])
+
 dnl OVS_CHECK_DPDK
 dnl
 dnl Configure DPDK source tree
diff --git a/configure.ac b/configure.ac
index 505e3d041e93..29c90b73f836 100644
--- a/configure.ac
+++ b/configure.ac
@@ -99,6 +99,7 @@  OVS_CHECK_SPHINX
 OVS_CHECK_DOT
 OVS_CHECK_IF_DL
 OVS_CHECK_STRTOK_R
+OVS_CHECK_LINUX_AF_XDP
 AC_CHECK_DECLS([sys_siglist], [], [], [[#include <signal.h>]])
 AC_CHECK_MEMBERS([struct stat.st_mtim.tv_nsec, struct stat.st_mtimensec],
   [], [], [[#include <sys/stat.h>]])
diff --git a/lib/automake.mk b/lib/automake.mk
index cc5dccf39d6b..8b9df5635bbe 100644
--- a/lib/automake.mk
+++ b/lib/automake.mk
@@ -9,6 +9,7 @@  lib_LTLIBRARIES += lib/libopenvswitch.la
 
 lib_libopenvswitch_la_LIBADD = $(SSL_LIBS)
 lib_libopenvswitch_la_LIBADD += $(CAPNG_LDADD)
+lib_libopenvswitch_la_LIBADD += $(LIBBPF_LDADD)
 
 if WIN32
 lib_libopenvswitch_la_LIBADD += ${PTHREAD_LIBS}
@@ -327,7 +328,11 @@  lib_libopenvswitch_la_SOURCES = \
 	lib/lldp/lldpd.c \
 	lib/lldp/lldpd.h \
 	lib/lldp/lldpd-structs.c \
-	lib/lldp/lldpd-structs.h
+	lib/lldp/lldpd-structs.h \
+	lib/xdpsock.c \
+	lib/xdpsock.h \
+	lib/netdev-afxdp.c \
+	lib/netdev-afxdp.h
 
 if WIN32
 lib_libopenvswitch_la_SOURCES += \
diff --git a/lib/dp-packet.c b/lib/dp-packet.c
index 0976a35e758b..a61552f72988 100644
--- a/lib/dp-packet.c
+++ b/lib/dp-packet.c
@@ -22,6 +22,9 @@ 
 #include "netdev-dpdk.h"
 #include "openvswitch/dynamic-string.h"
 #include "util.h"
+#ifdef HAVE_AF_XDP
+#include "xdpsock.h"
+#endif
 
 static void
 dp_packet_init__(struct dp_packet *b, size_t allocated, enum dp_packet_source source)
@@ -122,6 +125,16 @@  dp_packet_uninit(struct dp_packet *b)
              * created as a dp_packet */
             free_dpdk_buf((struct dp_packet*) b);
 #endif
+        } else if (b->source == DPBUF_AFXDP) {
+#ifdef HAVE_AF_XDP
+            struct dp_packet_afxdp *xpacket;
+
+            xpacket = dp_packet_cast_afxdp(b);
+            if (xpacket->mpool) {
+                umem_elem_push(xpacket->mpool, dp_packet_base(b));
+            }
+#endif
+            return;
         }
     }
 }
@@ -248,6 +261,8 @@  dp_packet_resize__(struct dp_packet *b, size_t new_headroom, size_t new_tailroom
     case DPBUF_STACK:
         OVS_NOT_REACHED();
 
+    case DPBUF_AFXDP:
+        OVS_NOT_REACHED();
     case DPBUF_STUB:
         b->source = DPBUF_MALLOC;
         new_base = xmalloc(new_allocated);
@@ -433,6 +448,7 @@  dp_packet_steal_data(struct dp_packet *b)
 {
     void *p;
     ovs_assert(b->source != DPBUF_DPDK);
+    ovs_assert(b->source != DPBUF_AFXDP);
 
     if (b->source == DPBUF_MALLOC && dp_packet_data(b) == dp_packet_base(b)) {
         p = dp_packet_data(b);
diff --git a/lib/dp-packet.h b/lib/dp-packet.h
index a5e9ade1244a..774728eef330 100644
--- a/lib/dp-packet.h
+++ b/lib/dp-packet.h
@@ -25,6 +25,10 @@ 
 #include <rte_mbuf.h>
 #endif
 
+#ifdef HAVE_AF_XDP
+#include "lib/xdpsock.h"
+#endif
+
 #include "netdev-dpdk.h"
 #include "openvswitch/list.h"
 #include "packets.h"
@@ -42,6 +46,7 @@  enum OVS_PACKED_ENUM dp_packet_source {
     DPBUF_DPDK,                /* buffer data is from DPDK allocated memory.
                                 * ref to dp_packet_init_dpdk() in dp-packet.c.
                                 */
+    DPBUF_AFXDP,                /* buffer data from XDP frame */
 };
 
 #define DP_PACKET_CONTEXT_SIZE 64
@@ -89,6 +94,20 @@  struct dp_packet {
     };
 };
 
+struct dp_packet_afxdp {
+    struct umem_pool *mpool;
+    struct dp_packet packet;
+};
+
+#if HAVE_AF_XDP
+static struct dp_packet_afxdp *
+dp_packet_cast_afxdp(const struct dp_packet *d OVS_UNUSED)
+{
+    ovs_assert(d->source == DPBUF_AFXDP);
+    return CONTAINER_OF(d, struct dp_packet_afxdp, packet);
+}
+#endif
+
 static inline void *dp_packet_data(const struct dp_packet *);
 static inline void dp_packet_set_data(struct dp_packet *, void *);
 static inline void *dp_packet_base(const struct dp_packet *);
@@ -183,7 +202,21 @@  dp_packet_delete(struct dp_packet *b)
             free_dpdk_buf((struct dp_packet*) b);
             return;
         }
-
+        if (b->source == DPBUF_AFXDP) {
+#ifdef HAVE_AF_XDP
+            struct dp_packet_afxdp *xpacket;
+
+            /* if a packet is received from afxdp port,
+             * and tx to a system port. Then we need to
+             * push the rx umem back here
+             */
+            xpacket = dp_packet_cast_afxdp(b);
+            if (xpacket->mpool) {
+                umem_elem_push(xpacket->mpool, dp_packet_base(b));
+            }
+#endif
+            return;
+        }
         dp_packet_uninit(b);
         free(b);
     }
diff --git a/lib/dpif-netdev-perf.h b/lib/dpif-netdev-perf.h
index 859c05613ddf..e47cf73bf3c9 100644
--- a/lib/dpif-netdev-perf.h
+++ b/lib/dpif-netdev-perf.h
@@ -198,6 +198,19 @@  cycles_counter_update(struct pmd_perf_stats *s)
 {
 #ifdef DPDK_NETDEV
     return s->last_tsc = rte_get_tsc_cycles();
+#elif HAVE_AF_XDP
+    union {
+        uint64_t tsc_64;
+        struct {
+            uint32_t lo_32;
+            uint32_t hi_32;
+        };
+    } tsc;
+    asm volatile("rdtsc" :
+             "=a" (tsc.lo_32),
+             "=d" (tsc.hi_32));
+
+    return s->last_tsc = tsc.tsc_64;
 #else
     return s->last_tsc = 0;
 #endif
diff --git a/lib/netdev-afxdp.c b/lib/netdev-afxdp.c
new file mode 100644
index 000000000000..4c71061fc102
--- /dev/null
+++ b/lib/netdev-afxdp.c
@@ -0,0 +1,589 @@ 
+/*
+ * Copyright (c) 2018, 2019 Nicira, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include <config.h>
+#ifdef HAVE_AF_XDP
+#include "netdev-linux.h"
+#include <errno.h>
+#include <fcntl.h>
+#include <sys/types.h>
+#include <netinet/in.h>
+#include <arpa/inet.h>
+#include <inttypes.h>
+#include <sys/ioctl.h>
+#include <sys/socket.h>
+#include <sys/utsname.h>
+#include <netpacket/packet.h>
+#include <net/if.h>
+#include <net/if_arp.h>
+#include <net/route.h>
+#include <poll.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+
+#include "coverage.h"
+#include "dp-packet.h"
+#include "dpif-netlink.h"
+#include "dpif-netdev.h"
+#include "openvswitch/dynamic-string.h"
+#include "fatal-signal.h"
+#include "hash.h"
+#include "openvswitch/hmap.h"
+#include "netdev-provider.h"
+#include "netdev-tc-offloads.h"
+#include "netdev-vport.h"
+#include "netlink-notifier.h"
+#include "netlink-socket.h"
+#include "netlink.h"
+#include "netnsid.h"
+#include "openvswitch/ofpbuf.h"
+#include "openflow/openflow.h"
+#include "ovs-atomic.h"
+#include "packets.h"
+#include "openvswitch/poll-loop.h"
+#include "rtnetlink.h"
+#include "openvswitch/shash.h"
+#include "socket-util.h"
+#include "sset.h"
+#include "tc.h"
+#include "timer.h"
+#include "unaligned.h"
+#include "openvswitch/vlog.h"
+#include "util.h"
+#include "netdev-afxdp.h"
+
+#include <linux/if_ether.h>
+#include <linux/if_tun.h>
+#include <linux/types.h>
+#include <linux/ethtool.h>
+#include <linux/mii.h>
+#include <linux/rtnetlink.h>
+#include <linux/sockios.h>
+#include <linux/if_xdp.h>
+#include "xdpsock.h"
+
+#ifndef SOL_XDP
+#define SOL_XDP 283
+#endif
+#ifndef AF_XDP
+#define AF_XDP 44
+#endif
+#ifndef PF_XDP
+#define PF_XDP AF_XDP
+#endif
+
+VLOG_DEFINE_THIS_MODULE(netdev_afxdp);
+static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
+
+#define UMEM2DESC(elem, base) ((uint64_t)((char *)elem - (char *)base))
+#define UMEM2XPKT(base, i) \
+    ALIGNED_CAST(struct dp_packet_afxdp *, (char *)base + \
+    i * sizeof(struct dp_packet_afxdp))
+
+static uint32_t opt_xdp_bind_flags = XDP_COPY;
+static uint32_t opt_xdp_flags =
+                    XDP_FLAGS_UPDATE_IF_NOEXIST | XDP_FLAGS_SKB_MODE;
+#ifdef USE_DRVMODE_DEFAULT
+static uint32_t opt_xdp_bind_flags = XDP_ZEROCOPY;
+static uint32_t opt_xdp_flags =
+                    XDP_FLAGS_UPDATE_IF_NOEXIST | XDP_FLAGS_DRV_MODE;
+#endif
+static uint32_t prog_id;
+
+static struct xsk_umem_info *xsk_configure_umem(void *buffer, uint64_t size)
+{
+    struct xsk_umem_info *umem;
+    int ret;
+    int i;
+
+    umem = xcalloc(1, sizeof(*umem));
+    if (!umem) {
+        VLOG_FATAL("xsk config umem failed (%s)", ovs_strerror(errno));
+    }
+
+    ret = xsk_umem__create(&umem->umem, buffer, size, &umem->fq, &umem->cq,
+                           NULL);
+
+    if (ret) {
+        VLOG_FATAL("xsk umem create failed (%s) mode: %s",
+            ovs_strerror(errno),
+            opt_xdp_bind_flags == XDP_COPY ? "SKB": "DRV");
+    }
+
+    umem->buffer = buffer;
+
+    /* set-up umem pool */
+    umem_pool_init(&umem->mpool, NUM_FRAMES);
+
+    for (i = NUM_FRAMES - 1; i >= 0; i--) {
+        struct umem_elem *elem;
+
+        elem = ALIGNED_CAST(struct umem_elem *,
+                            (char *)umem->buffer + i * FRAME_SIZE);
+        umem_elem_push(&umem->mpool, elem);
+    }
+
+    /* set-up metadata */
+    xpacket_pool_init(&umem->xpool, NUM_FRAMES);
+
+    VLOG_DBG("%s xpacket pool from %p to %p", __func__,
+              umem->xpool.array,
+              (char *)umem->xpool.array +
+              NUM_FRAMES * sizeof(struct dp_packet_afxdp));
+
+    for (i = NUM_FRAMES - 1; i >= 0; i--) {
+        struct dp_packet_afxdp *xpacket;
+        struct dp_packet *packet;
+
+        xpacket = UMEM2XPKT(umem->xpool.array, i);
+        xpacket->mpool = &umem->mpool;
+
+        packet = &xpacket->packet;
+        packet->source = DPBUF_AFXDP;
+    }
+
+    return umem;
+}
+
+static struct xsk_socket_info *
+xsk_configure_socket(struct xsk_umem_info *umem, uint32_t ifindex,
+                     uint32_t queue_id)
+{
+    struct xsk_socket_config cfg;
+    struct xsk_socket_info *xsk;
+    char devname[IF_NAMESIZE];
+    uint32_t idx;
+    int ret;
+    int i;
+
+    xsk = xcalloc(1, sizeof(*xsk));
+    if (!xsk) {
+        VLOG_FATAL("xsk calloc failed (%s)", ovs_strerror(errno));
+    }
+
+    xsk->umem = umem;
+    cfg.rx_size = CONS_NUM_DESCS;
+    cfg.tx_size = PROD_NUM_DESCS;
+    cfg.libbpf_flags = 0;
+    cfg.xdp_flags = opt_xdp_flags;
+    cfg.bind_flags = opt_xdp_bind_flags;
+
+    if (if_indextoname(ifindex, devname) == NULL) {
+        VLOG_FATAL("ifindex %d devname failed (%s)",
+                   ifindex, ovs_strerror(errno));
+    }
+
+    ret = xsk_socket__create(&xsk->xsk, devname, queue_id, umem->umem,
+                             &xsk->rx, &xsk->tx, &cfg);
+    if (ret) {
+        VLOG_FATAL("xsk_socket_create failed (%s) mode: %s qid: %d",
+                   ovs_strerror(errno),
+                   opt_xdp_bind_flags == XDP_COPY ? "SKB": "DRV",
+                   queue_id);
+    }
+
+    /* make sure the XDP program is there */
+    ret = bpf_get_link_xdp_id(ifindex, &prog_id, opt_xdp_flags);
+    if (ret) {
+        VLOG_FATAL("get XDP prog ID failed (%s)", ovs_strerror(errno));
+    }
+
+    ret = xsk_ring_prod__reserve(&xsk->umem->fq,
+                                 PROD_NUM_DESCS,
+                                 &idx);
+    if (ret != PROD_NUM_DESCS) {
+        VLOG_FATAL("fq set-up failed (%s)", ovs_strerror(errno));
+    }
+
+    for (i = 0;
+         i < PROD_NUM_DESCS * FRAME_SIZE;
+         i += FRAME_SIZE) {
+        struct umem_elem *elem;
+        uint64_t addr;
+
+        elem = umem_elem_pop(&xsk->umem->mpool);
+        addr = UMEM2DESC(elem, xsk->umem->buffer);
+
+        *xsk_ring_prod__fill_addr(&xsk->umem->fq, idx++) = addr;
+    }
+
+    xsk_ring_prod__submit(&xsk->umem->fq,
+                          PROD_NUM_DESCS);
+    return xsk;
+}
+
+struct xsk_socket_info *
+xsk_configure(int ifindex, int xdp_queue_id)
+{
+    struct xsk_socket_info *xsk;
+    struct xsk_umem_info *umem;
+    void *bufs;
+    int ret;
+
+    ret = posix_memalign(&bufs, getpagesize(),
+                         NUM_FRAMES * FRAME_SIZE);
+    ovs_assert(!ret);
+
+    /* Create sockets... */
+    umem = xsk_configure_umem(bufs,
+                              NUM_FRAMES * FRAME_SIZE);
+    xsk = xsk_configure_socket(umem, ifindex, xdp_queue_id);
+    return xsk;
+}
+
+static void OVS_UNUSED vlog_hex_dump(const void *buf, size_t count)
+{
+    struct ds ds = DS_EMPTY_INITIALIZER;
+    ds_put_hex_dump(&ds, buf, count, 0, false);
+    VLOG_DBG_RL(&rl, "%s", ds_cstr(&ds));
+    ds_destroy(&ds);
+}
+
+void
+xsk_destroy(struct xsk_socket_info *xsk)
+{
+    struct xsk_umem *umem;
+
+    if (!xsk) {
+        return;
+    }
+
+    umem = xsk->umem->umem;
+    xsk_socket__delete(xsk->xsk);
+    (void)xsk_umem__delete(umem);
+
+    /* cleanup umem pool */
+    umem_pool_cleanup(&xsk->umem->mpool);
+
+    /* cleanup metadata pool */
+    xpacket_pool_cleanup(&xsk->umem->xpool);
+}
+
+static inline void OVS_UNUSED
+print_xsk_stat(struct xsk_socket_info *xsk OVS_UNUSED) {
+    struct xdp_statistics stat;
+    socklen_t optlen;
+
+    optlen = sizeof(stat);
+    ovs_assert(getsockopt(xsk_socket__fd(xsk->xsk), SOL_XDP, XDP_STATISTICS,
+                &stat, &optlen) == 0);
+
+    VLOG_DBG_RL(&rl, "rx dropped %llu, rx_invalid %llu, tx_invalid %llu",
+                     stat.rx_dropped,
+                     stat.rx_invalid_descs,
+                     stat.tx_invalid_descs);
+}
+
+int
+netdev_afxdp_set_config(struct netdev *netdev, const struct smap *args,
+                        char **errp OVS_UNUSED)
+{
+    const char *xdpmode;
+    int new_n_rxq;
+
+    /* TODO: add mutex lock */
+    new_n_rxq = MAX(smap_get_int(args, "n_rxq", NR_QUEUE), 1);
+
+    if (netdev->n_rxq != new_n_rxq) {
+
+        if (new_n_rxq > MAX_XSKQ) {
+            VLOG_WARN("set n_rxq %d too large", new_n_rxq);
+            goto out;
+        }
+
+        netdev->n_rxq = new_n_rxq;
+        VLOG_INFO("set AF_XDP device %s to %d n_rxq", netdev->name, new_n_rxq);
+        netdev_request_reconfigure(netdev);
+    }
+
+    xdpmode = smap_get(args, "xdpmode");
+    if (xdpmode && strncmp(xdpmode, "drv", 3) == 0) {
+        if (opt_xdp_bind_flags != XDP_ZEROCOPY) {
+            opt_xdp_bind_flags = XDP_ZEROCOPY;
+            opt_xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST | XDP_FLAGS_DRV_MODE;
+        }
+        VLOG_INFO("AF_XDP device %s in ZC driver mode", netdev->name);
+    } else {
+        opt_xdp_bind_flags = XDP_COPY;
+        opt_xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST | XDP_FLAGS_SKB_MODE;
+        VLOG_INFO("AF_XDP device %s in SKB mode", netdev->name);
+    }
+
+out:
+    return 0;
+}
+
+int
+netdev_afxdp_get_config(const struct netdev *netdev, struct smap *args)
+{
+    /* TODO: add mutex lock */
+    smap_add_format(args, "n_rxq", "%d", netdev->n_rxq);
+    smap_add_format(args, "xdpmode", "%s",
+        opt_xdp_bind_flags == XDP_ZEROCOPY ? "drv" : "skb");
+
+    return 0;
+}
+
+int
+netdev_afxdp_get_numa_id(const struct netdev *netdev)
+{
+    /* FIXME: Get netdev's PCIe device ID, then find
+     * its NUMA node id.
+     */
+    VLOG_INFO("FIXME: Device %s always use numa id 0", netdev->name);
+    return 0;
+}
+
+void
+xsk_remove_xdp_program(uint32_t ifindex)
+{
+    uint32_t curr_prog_id = 0;
+
+    /* remove_xdp_program() */
+    if (bpf_get_link_xdp_id(ifindex, &curr_prog_id, opt_xdp_flags)) {
+        bpf_set_link_xdp_fd(ifindex, -1, opt_xdp_flags);
+    }
+    if (prog_id == curr_prog_id) {
+        bpf_set_link_xdp_fd(ifindex, -1, opt_xdp_flags);
+    } else if (!curr_prog_id) {
+        VLOG_WARN("couldn't find a prog id on a given interface");
+    } else {
+        VLOG_WARN("program on interface changed, not removing");
+    }
+}
+
+/* Receive packet from AF_XDP socket */
+int
+netdev_linux_rxq_xsk(struct xsk_socket_info *xsk,
+                     struct dp_packet_batch *batch)
+{
+    unsigned int rcvd, i;
+    uint32_t idx_rx = 0, idx_fq = 0;
+    int ret = 0;
+
+    /* See if there is any packet on RX queue,
+     * if yes, idx_rx is the index having the packet.
+     */
+    rcvd = xsk_ring_cons__peek(&xsk->rx, BATCH_SIZE, &idx_rx);
+    if (!rcvd) {
+        return 0;
+    }
+
+    /* Form a dp_packet batch from descriptor in RX queue */
+    for (i = 0; i < rcvd; i++) {
+        uint64_t addr = xsk_ring_cons__rx_desc(&xsk->rx, idx_rx)->addr;
+        uint32_t len = xsk_ring_cons__rx_desc(&xsk->rx, idx_rx)->len;
+        char *pkt = xsk_umem__get_data(xsk->umem->buffer, addr);
+        uint64_t index;
+
+        struct dp_packet_afxdp *xpacket;
+        struct dp_packet *packet;
+
+        index = addr >> FRAME_SHIFT;
+        xpacket = UMEM2XPKT(xsk->umem->xpool.array, index);
+
+        packet = &xpacket->packet;
+        xpacket->mpool = &xsk->umem->mpool;
+
+        if (packet->source != DPBUF_AFXDP) {
+            /* FIXME: might be a bug */
+            continue;
+        }
+
+        /* Initialize the struct dp_packet */
+        if (opt_xdp_bind_flags == XDP_ZEROCOPY) {
+            dp_packet_set_base(packet, pkt - FRAME_HEADROOM);
+        } else {
+            /* SKB mode */
+            dp_packet_set_base(packet, pkt);
+        }
+        dp_packet_set_data(packet, pkt);
+        dp_packet_set_size(packet, len);
+
+        /* Add packet into batch, increase batch->count */
+        dp_packet_batch_add(batch, packet);
+
+        idx_rx++;
+    }
+
+    /* We've consume rcvd packets in RX, now re-fill the
+     * same number back to FILL queue.
+     */
+    for (i = 0; i < rcvd; i++) {
+        uint64_t index;
+        struct umem_elem *elem;
+
+        ret = xsk_ring_prod__reserve(&xsk->umem->fq, 1, &idx_fq);
+        while (ret == 0) {
+            /* The FILL queue is full, so retry. (or skip)? */
+            ret = xsk_ring_prod__reserve(&xsk->umem->fq, 1, &idx_fq);
+        }
+
+        /* Get one free umem, program it into FILL queue */
+        elem = umem_elem_pop(&xsk->umem->mpool);
+        index = (uint64_t)((char *)elem - (char *)xsk->umem->buffer);
+        ovs_assert((index & FRAME_SHIFT_MASK) == 0);
+        *xsk_ring_prod__fill_addr(&xsk->umem->fq, idx_fq) = index;
+
+        idx_fq++;
+    }
+    xsk_ring_prod__submit(&xsk->umem->fq, rcvd);
+
+    /* Release the RX queue */
+    xsk_ring_cons__release(&xsk->rx, rcvd);
+    xsk->rx_npkts += rcvd;
+
+#ifdef AFXDP_DEBUG
+    print_xsk_stat(xsk);
+#endif
+    return 0;
+}
+
+static void kick_tx(struct xsk_socket_info *xsk)
+{
+    int ret;
+
+    ret = sendto(xsk_socket__fd(xsk->xsk), NULL, 0, MSG_DONTWAIT, NULL, 0);
+    if (ret >= 0 || errno == ENOBUFS || errno == EAGAIN || errno == EBUSY) {
+        return;
+    }
+}
+
+int
+netdev_linux_afxdp_batch_send(struct xsk_socket_info *xsk,
+                              struct dp_packet_batch *batch)
+{
+    uint32_t tx_done, idx_cq = 0;
+    struct dp_packet *packet;
+    uint32_t idx;
+    int j;
+
+    /* Make sure we have enough TX descs */
+    if (xsk_ring_prod__reserve(&xsk->tx, batch->count, &idx) == 0) {
+        return -EAGAIN;
+    }
+
+    DP_PACKET_BATCH_FOR_EACH (i, packet, batch) {
+        struct dp_packet_afxdp *xpacket;
+        struct umem_elem *elem;
+        uint64_t index;
+
+        elem = umem_elem_pop(&xsk->umem->mpool);
+        if (!elem) {
+            return -EAGAIN;
+        }
+
+        memcpy(elem, dp_packet_data(packet), dp_packet_size(packet));
+
+        index = (uint64_t)((char *)elem - (char *)xsk->umem->buffer);
+        xsk_ring_prod__tx_desc(&xsk->tx, idx + i)->addr = index;
+        xsk_ring_prod__tx_desc(&xsk->tx, idx + i)->len
+            = dp_packet_size(packet);
+
+        if (packet->source == DPBUF_AFXDP) {
+            xpacket = dp_packet_cast_afxdp(packet);
+            umem_elem_push(xpacket->mpool, dp_packet_base(packet));
+             /* Avoid freeing it twice at dp_packet_uninit */
+            xpacket->mpool = NULL;
+        }
+    }
+    xsk_ring_prod__submit(&xsk->tx, batch->count);
+    xsk->outstanding_tx += batch->count;
+
+retry:
+    kick_tx(xsk);
+
+    /* Process CQ */
+    tx_done = xsk_ring_cons__peek(&xsk->umem->cq, batch->count, &idx_cq);
+    if (tx_done > 0) {
+        xsk->outstanding_tx -= tx_done;
+        xsk->tx_npkts += tx_done;
+    }
+
+    /* Recycle back to umem pool */
+    for (j = 0; j < tx_done; j++) {
+        struct umem_elem *elem;
+        uint64_t addr;
+
+        addr = *xsk_ring_cons__comp_addr(&xsk->umem->cq, idx_cq++);
+
+        elem = ALIGNED_CAST(struct umem_elem *,
+                            (char *)xsk->umem->buffer + addr);
+        umem_elem_push(&xsk->umem->mpool, elem);
+    }
+    xsk_ring_cons__release(&xsk->umem->cq, tx_done);
+
+    if (xsk->outstanding_tx > PROD_NUM_DESCS - (PROD_NUM_DESCS >> 2)) {
+        /* If there are still a lot not transmitted,
+         * try harder.
+         */
+        goto retry;
+    }
+
+    return 0;
+}
+
+#else
+#include "openvswitch/compiler.h"
+#include "netdev-afxdp.h"
+
+struct xsk_socket_info *
+xsk_configure(int ifindex OVS_UNUSED, int xdp_queue_id OVS_UNUSED)
+{
+    return NULL;
+}
+
+void
+xsk_destroy(struct xsk_socket_info *xsk OVS_UNUSED)
+{
+}
+
+int
+netdev_linux_rxq_xsk(struct xsk_socket_info *xsk OVS_UNUSED,
+                     struct dp_packet_batch *batch OVS_UNUSED)
+{
+    return 0;
+}
+
+int
+netdev_linux_afxdp_batch_send(struct xsk_socket_info *xsk OVS_UNUSED,
+                              struct dp_packet_batch *batch OVS_UNUSED)
+{
+    return 0;
+}
+
+int
+netdev_afxdp_set_config(struct netdev *netdev OVS_UNUSED,
+                        const struct smap *args OVS_UNUSED,
+                        char **errp OVS_UNUSED)
+{
+    return 0;
+}
+
+int
+netdev_afxdp_get_config(const struct netdev *netdev OVS_UNUSED,
+                        struct smap *args OVS_UNUSED)
+{
+    return 0;
+}
+
+int
+netdev_afxdp_get_numa_id(const struct netdev *netdev OVS_UNUSED)
+{
+    return 0;
+}
+#endif
diff --git a/lib/netdev-afxdp.h b/lib/netdev-afxdp.h
new file mode 100644
index 000000000000..ea05612a7c0f
--- /dev/null
+++ b/lib/netdev-afxdp.h
@@ -0,0 +1,47 @@ 
+/*
+ * Copyright (c) 2018 Nicira, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#ifndef NETDEV_AFXDP_H
+#define NETDEV_AFXDP_H 1
+
+#include <stdint.h>
+#include <stdbool.h>
+
+/* These functions are Linux AF_XDP specific, so they should be used directly
+ * only by Linux-specific code. */
+#define MAX_XSKQ 16
+struct netdev;
+struct xsk_socket_info;
+struct xdp_umem;
+struct dp_packet_batch;
+struct smap;
+
+struct xsk_socket_info *xsk_configure(int ifindex, int xdp_queue_id);
+void xsk_destroy(struct xsk_socket_info *xsk);
+
+int netdev_linux_rxq_xsk(struct xsk_socket_info *xsk,
+                         struct dp_packet_batch *batch);
+
+int netdev_linux_afxdp_batch_send(struct xsk_socket_info *xsk,
+                                  struct dp_packet_batch *batch);
+
+void xsk_remove_xdp_program(uint32_t ifindex);
+int netdev_afxdp_set_config(struct netdev *netdev, const struct smap *args,
+                            char **errp);
+int netdev_afxdp_get_config(const struct netdev *netdev, struct smap *args);
+int netdev_afxdp_get_numa_id(const struct netdev *netdev);
+
+#endif /* netdev-afxdp.h */
diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
index f75d73fd39f8..337760ca3333 100644
--- a/lib/netdev-linux.c
+++ b/lib/netdev-linux.c
@@ -75,6 +75,7 @@ 
 #include "unaligned.h"
 #include "openvswitch/vlog.h"
 #include "util.h"
+#include "netdev-afxdp.h"
 
 VLOG_DEFINE_THIS_MODULE(netdev_linux);
 
@@ -531,6 +532,7 @@  struct netdev_linux {
 
     /* LAG information. */
     bool is_lag_master;         /* True if the netdev is a LAG master. */
+    struct xsk_socket_info *xsk[MAX_XSKQ]; /* af_xdp socket */
 };
 
 struct netdev_rxq_linux {
@@ -580,12 +582,18 @@  is_netdev_linux_class(const struct netdev_class *netdev_class)
 }
 
 static bool
+is_afxdp_netdev(const struct netdev *netdev)
+{
+    return netdev_get_class(netdev) == &netdev_afxdp_class;
+}
+
+static bool
 is_tap_netdev(const struct netdev *netdev)
 {
     return netdev_get_class(netdev) == &netdev_tap_class;
 }
 
-static struct netdev_linux *
+struct netdev_linux *
 netdev_linux_cast(const struct netdev *netdev)
 {
     ovs_assert(is_netdev_linux_class(netdev_get_class(netdev)));
@@ -1084,6 +1092,25 @@  netdev_linux_destruct(struct netdev *netdev_)
         atomic_count_dec(&miimon_cnt);
     }
 
+#if HAVE_AF_XDP
+    if (is_afxdp_netdev(netdev_)) {
+        int ifindex;
+        int ret, i;
+
+        ret = get_ifindex(netdev_, &ifindex);
+        if (ret) {
+            VLOG_ERR("get ifindex error");
+        } else {
+            for (i = 0; i < MAX_XSKQ; i++) {
+                if (netdev->xsk[i]) {
+                    VLOG_INFO("destroy xsk[%d]", i);
+                    xsk_destroy(netdev->xsk[i]);
+                }
+            }
+            xsk_remove_xdp_program(ifindex);
+        }
+    }
+#endif
     ovs_mutex_destroy(&netdev->mutex);
 }
 
@@ -1113,6 +1140,32 @@  netdev_linux_rxq_construct(struct netdev_rxq *rxq_)
     rx->is_tap = is_tap_netdev(netdev_);
     if (rx->is_tap) {
         rx->fd = netdev->tap_fd;
+    } else if (is_afxdp_netdev(netdev_)) {
+#if HAVE_AF_XDP
+        struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
+        int ifindex;
+        int xdp_queue_id = rxq_->queue_id;
+        struct xsk_socket_info *xsk;
+
+        if (setrlimit(RLIMIT_MEMLOCK, &r)) {
+            VLOG_ERR("ERROR: setrlimit(RLIMIT_MEMLOCK) \"%s\"\n",
+                      ovs_strerror(errno));
+            ovs_assert(0);
+        }
+
+        VLOG_DBG("%s: %s: queue=%d configuring xdp sock",
+                  __func__, netdev_->name, xdp_queue_id);
+
+        /* Get ethernet device index. */
+        error = get_ifindex(&netdev->up, &ifindex);
+        if (error) {
+            goto error;
+        }
+
+        xsk = xsk_configure(ifindex, xdp_queue_id);
+        netdev->xsk[xdp_queue_id] = xsk;
+        rx->fd = xsk_socket__fd(xsk->xsk); /* for netdev layer to poll */
+#endif
     } else {
         struct sockaddr_ll sll;
         int ifindex, val;
@@ -1318,9 +1371,16 @@  netdev_linux_rxq_recv(struct netdev_rxq *rxq_, struct dp_packet_batch *batch,
 {
     struct netdev_rxq_linux *rx = netdev_rxq_linux_cast(rxq_);
     struct netdev *netdev = rx->up.netdev;
-    struct dp_packet *buffer;
+    struct dp_packet *buffer = NULL;
     ssize_t retval;
     int mtu;
+    struct netdev_linux *netdev_ = netdev_linux_cast(netdev);
+
+    if (is_afxdp_netdev(netdev)) {
+        int qid = rxq_->queue_id;
+
+        return netdev_linux_rxq_xsk(netdev_->xsk[qid], batch);
+    }
 
     if (netdev_linux_get_mtu__(netdev_linux_cast(netdev), &mtu)) {
         mtu = ETH_PAYLOAD_MAX;
@@ -1329,6 +1389,7 @@  netdev_linux_rxq_recv(struct netdev_rxq *rxq_, struct dp_packet_batch *batch,
     /* Assume Ethernet port. No need to set packet_type. */
     buffer = dp_packet_new_with_headroom(VLAN_ETH_HEADER_LEN + mtu,
                                            DP_NETDEV_HEADROOM);
+
     retval = (rx->is_tap
               ? netdev_linux_rxq_recv_tap(rx->fd, buffer)
               : netdev_linux_rxq_recv_sock(rx->fd, buffer));
@@ -1473,14 +1534,15 @@  netdev_linux_tap_batch_send(struct netdev *netdev_,
  * The kernel maintains a packet transmission queue, so the caller is not
  * expected to do additional queuing of packets. */
 static int
-netdev_linux_send(struct netdev *netdev_, int qid OVS_UNUSED,
+netdev_linux_send(struct netdev *netdev_, int qid,
                   struct dp_packet_batch *batch,
                   bool concurrent_txq OVS_UNUSED)
 {
     int error = 0;
     int sock = 0;
 
-    if (!is_tap_netdev(netdev_)) {
+    if (!is_tap_netdev(netdev_) &&
+        !is_afxdp_netdev(netdev_)) {
         if (netdev_linux_netnsid_is_remote(netdev_linux_cast(netdev_))) {
             error = EOPNOTSUPP;
             goto free_batch;
@@ -1499,6 +1561,10 @@  netdev_linux_send(struct netdev *netdev_, int qid OVS_UNUSED,
         }
 
         error = netdev_linux_sock_batch_send(sock, ifindex, batch);
+    } else if (is_afxdp_netdev(netdev_)) {
+        struct netdev_linux *netdev = netdev_linux_cast(netdev_);
+
+        error = netdev_linux_afxdp_batch_send(netdev->xsk[qid], batch);
     } else {
         error = netdev_linux_tap_batch_send(netdev_, batch);
     }
@@ -3323,6 +3389,7 @@  const struct netdev_class netdev_linux_class = {
     NETDEV_LINUX_CLASS_COMMON,
     LINUX_FLOW_OFFLOAD_API,
     .type = "system",
+    .is_pmd = false,
     .construct = netdev_linux_construct,
     .get_stats = netdev_linux_get_stats,
     .get_features = netdev_linux_get_features,
@@ -3333,6 +3400,7 @@  const struct netdev_class netdev_linux_class = {
 const struct netdev_class netdev_tap_class = {
     NETDEV_LINUX_CLASS_COMMON,
     .type = "tap",
+    .is_pmd = false,
     .construct = netdev_linux_construct_tap,
     .get_stats = netdev_tap_get_stats,
     .get_features = netdev_linux_get_features,
@@ -3343,10 +3411,23 @@  const struct netdev_class netdev_internal_class = {
     NETDEV_LINUX_CLASS_COMMON,
     LINUX_FLOW_OFFLOAD_API,
     .type = "internal",
+    .is_pmd = false,
     .construct = netdev_linux_construct,
     .get_stats = netdev_internal_get_stats,
     .get_status = netdev_internal_get_status,
 };
+
+const struct netdev_class netdev_afxdp_class = {
+    NETDEV_LINUX_CLASS_COMMON,
+    .type = "afxdp",
+    .is_pmd = true,
+    .construct = netdev_linux_construct,
+    .get_stats = netdev_linux_get_stats,
+    .get_status = netdev_linux_get_status,
+    .set_config = netdev_afxdp_set_config,
+    .get_config = netdev_afxdp_get_config,
+    .get_numa_id = netdev_afxdp_get_numa_id,
+};
 
 
 #define CODEL_N_QUEUES 0x0000
diff --git a/lib/netdev-linux.h b/lib/netdev-linux.h
index 17ca9120168a..afcb20ee8d0a 100644
--- a/lib/netdev-linux.h
+++ b/lib/netdev-linux.h
@@ -28,6 +28,7 @@  struct netdev;
 int netdev_linux_ethtool_set_flag(struct netdev *netdev, uint32_t flag,
                                   const char *flag_name, bool enable);
 int linux_get_ifindex(const char *netdev_name);
+struct netdev_linux *netdev_linux_cast(const struct netdev *netdev);
 
 #define LINUX_FLOW_OFFLOAD_API                          \
    .flow_flush = netdev_tc_flow_flush,                  \
diff --git a/lib/netdev-provider.h b/lib/netdev-provider.h
index fb0c27e6e8e8..5bf041316503 100644
--- a/lib/netdev-provider.h
+++ b/lib/netdev-provider.h
@@ -902,6 +902,7 @@  extern const struct netdev_class netdev_linux_class;
 #endif
 extern const struct netdev_class netdev_internal_class;
 extern const struct netdev_class netdev_tap_class;
+extern const struct netdev_class netdev_afxdp_class;
 
 #ifdef  __cplusplus
 }
diff --git a/lib/netdev.c b/lib/netdev.c
index 7d7ecf6f0946..c30016b34033 100644
--- a/lib/netdev.c
+++ b/lib/netdev.c
@@ -145,6 +145,7 @@  netdev_initialize(void)
         netdev_register_provider(&netdev_linux_class);
         netdev_register_provider(&netdev_internal_class);
         netdev_register_provider(&netdev_tap_class);
+        netdev_register_provider(&netdev_afxdp_class);
         netdev_vport_tunnel_register();
 #endif
 #if defined(__FreeBSD__) || defined(__NetBSD__)
diff --git a/lib/xdpsock.c b/lib/xdpsock.c
new file mode 100644
index 000000000000..f9fe94b9e36a
--- /dev/null
+++ b/lib/xdpsock.c
@@ -0,0 +1,210 @@ 
+/*
+ * Copyright (c) 2018, 2019 Nicira, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+#include <config.h>
+#include <ctype.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <stdarg.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/stat.h>
+#include <sys/types.h>
+#include <syslog.h>
+#include <time.h>
+#include <unistd.h>
+#include "openvswitch/vlog.h"
+#include "async-append.h"
+#include "coverage.h"
+#include "dirs.h"
+#include "ovs-thread.h"
+#include "sat-math.h"
+#include "socket-util.h"
+#include "svec.h"
+#include "syslog-direct.h"
+#include "syslog-libc.h"
+#include "syslog-provider.h"
+#include "timeval.h"
+#include "unixctl.h"
+#include "util.h"
+#include "ovs-atomic.h"
+#include "openvswitch/compiler.h"
+#include "dp-packet.h"
+
+#ifdef HAVE_AF_XDP
+#include "xdpsock.h"
+
+static inline void ovs_spinlock_init(ovs_spinlock_t *sl)
+{
+    sl->locked = 0;
+}
+
+static inline void ovs_spin_lock(ovs_spinlock_t *sl)
+{
+    int exp = 0;
+
+    while (!__atomic_compare_exchange_n(&sl->locked, &exp, 1, 0,
+                __ATOMIC_ACQUIRE, __ATOMIC_RELAXED)) {
+        while (__atomic_load_n(&sl->locked, __ATOMIC_RELAXED)) {
+            ;
+        }
+        exp = 0;
+    }
+}
+
+static inline void ovs_spin_unlock(ovs_spinlock_t *sl)
+{
+    __atomic_store_n(&sl->locked, 0, __ATOMIC_RELEASE);
+}
+
+static inline int OVS_UNUSED ovs_spin_trylock(ovs_spinlock_t *sl)
+{
+    int exp = 0;
+    return __atomic_compare_exchange_n(&sl->locked, &exp, 1,
+              0, /* disallow spurious failure */
+               __ATOMIC_ACQUIRE, __ATOMIC_RELAXED);
+}
+
+void
+__umem_elem_push_n(struct umem_pool *umemp OVS_UNUSED, void **addrs, int n)
+{
+    void *ptr;
+
+    if (OVS_UNLIKELY(umemp->index + n > umemp->size)) {
+        OVS_NOT_REACHED();
+    }
+
+    ptr = &umemp->array[umemp->index];
+    memcpy(ptr, addrs, n * sizeof(void *));
+    umemp->index += n;
+}
+
+inline void
+__umem_elem_push(struct umem_pool *umemp OVS_UNUSED, void *addr)
+{
+    umemp->array[umemp->index++] = addr;
+}
+
+void
+umem_elem_push(struct umem_pool *umemp OVS_UNUSED, void *addr)
+{
+
+    if (OVS_UNLIKELY(umemp->index >= umemp->size)) {
+        /* stack is full */
+        /* it's possible that one umem gets pushed twice,
+         * because actions=1,2,3... multiple ports?
+        */
+        OVS_NOT_REACHED();
+    }
+
+    ovs_assert(((uint64_t)addr & FRAME_SHIFT_MASK) == 0);
+
+    ovs_spin_lock(&umemp->mutex);
+    __umem_elem_push(umemp, addr);
+    ovs_spin_unlock(&umemp->mutex);
+}
+
+void
+__umem_elem_pop_n(struct umem_pool *umemp OVS_UNUSED, void **addrs, int n)
+{
+    void *ptr;
+
+    umemp->index -= n;
+
+    if (OVS_UNLIKELY(umemp->index < 0)) {
+        OVS_NOT_REACHED();
+    }
+
+    ptr = &umemp->array[umemp->index];
+    memcpy(addrs, ptr, n * sizeof(void *));
+}
+
+inline void *
+__umem_elem_pop(struct umem_pool *umemp OVS_UNUSED)
+{
+    return umemp->array[--umemp->index];
+}
+
+void *
+umem_elem_pop(struct umem_pool *umemp OVS_UNUSED)
+{
+    void *ptr;
+
+    ovs_spin_lock(&umemp->mutex);
+    ptr = __umem_elem_pop(umemp);
+    ovs_spin_unlock(&umemp->mutex);
+
+    return ptr;
+}
+
+void **
+__umem_pool_alloc(unsigned int size)
+{
+    void *bufs;
+
+    ovs_assert(posix_memalign(&bufs, getpagesize(),
+                              size * sizeof(void *)) == 0);
+    memset(bufs, 0, size * sizeof(void *));
+    return (void **)bufs;
+}
+
+unsigned int
+umem_elem_count(struct umem_pool *mpool)
+{
+    return mpool->index;
+}
+
+int
+umem_pool_init(struct umem_pool *umemp OVS_UNUSED, unsigned int size)
+{
+    umemp->array = __umem_pool_alloc(size);
+    if (!umemp->array) {
+        OVS_NOT_REACHED();
+    }
+
+    umemp->size = size;
+    umemp->index = 0;
+    ovs_spinlock_init(&umemp->mutex);
+    return 0;
+}
+
+void
+umem_pool_cleanup(struct umem_pool *umemp OVS_UNUSED)
+{
+    free(umemp->array);
+}
+
+/* AF_XDP metadata init/destroy */
+int
+xpacket_pool_init(struct xpacket_pool *xp, unsigned int size)
+{
+    void *bufs;
+
+    ovs_assert(posix_memalign(&bufs, getpagesize(),
+                              size * sizeof(struct dp_packet_afxdp)) == 0);
+    memset(bufs, 0, size * sizeof(struct dp_packet_afxdp));
+
+    xp->array = bufs;
+    xp->size = size;
+    return 0;
+}
+
+void
+xpacket_pool_cleanup(struct xpacket_pool *xp)
+{
+    free(xp->array);
+}
+#else   /* !HAVE_AF_XDP below */
+#endif
diff --git a/lib/xdpsock.h b/lib/xdpsock.h
new file mode 100644
index 000000000000..cb64befe7dba
--- /dev/null
+++ b/lib/xdpsock.h
@@ -0,0 +1,133 @@ 
+/*
+ * Copyright (c) 2018, 2019 Nicira, Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+#ifndef XDPSOCK_H
+#define XDPSOCK_H 1
+#include <errno.h>
+#include <getopt.h>
+#include <libgen.h>
+#include <linux/bpf.h>
+#include <linux/if_link.h>
+#include <linux/if_xdp.h>
+#include <linux/if_ether.h>
+#include <net/if.h>
+#include <signal.h>
+#include <stdbool.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <net/ethernet.h>
+#include <sys/resource.h>
+#include <sys/socket.h>
+#include <sys/mman.h>
+#include <time.h>
+#include <unistd.h>
+#include <pthread.h>
+#include <locale.h>
+#include <sys/types.h>
+#include <poll.h>
+#include <bpf/libbpf.h>
+
+#include "ovs-atomic.h"
+#include "openvswitch/thread.h"
+
+/* bpf/xsk.h uses the following macros not defined in OVS,
+ * so re-define them before include.
+ */
+#define unlikely OVS_UNLIKELY
+#define likely OVS_LIKELY
+#define barrier() __asm__ __volatile__("": : :"memory")
+#define smp_rmb() barrier()
+#define smp_wmb() barrier()
+#include <bpf/xsk.h>
+
+#define FRAME_HEADROOM  XDP_PACKET_HEADROOM
+#define FRAME_SIZE      XSK_UMEM__DEFAULT_FRAME_SIZE
+#define BATCH_SIZE      NETDEV_MAX_BURST
+#define FRAME_SHIFT     XSK_UMEM__DEFAULT_FRAME_SHIFT
+#define FRAME_SHIFT_MASK    ((1<<FRAME_SHIFT)-1)
+
+#define NUM_FRAMES  1024
+#define PROD_NUM_DESCS 128
+#define CONS_NUM_DESCS 128
+
+#ifdef USE_XSK_DEFAULT
+#define PROD_NUM_DESCS XSK_RING_PROD__DEFAULT_NUM_DESCS
+#define CONS_NUM_DESCS XSK_RING_CONS__DEFAULT_NUM_DESCS
+#endif
+
+typedef struct {
+    volatile int locked;
+} ovs_spinlock_t;
+
+/* LIFO ptr_array */
+struct umem_pool {
+    int index;      /* point to top */
+    unsigned int size;
+    ovs_spinlock_t mutex;
+    void **array;   /* a pointer array */
+};
+
+/* array-based dp_packet_afxdp */
+struct xpacket_pool {
+    unsigned int size;
+    struct dp_packet_afxdp **array;
+};
+
+struct xsk_umem_info {
+    struct umem_pool mpool;
+    struct xpacket_pool xpool;
+    struct xsk_ring_prod fq;
+    struct xsk_ring_cons cq;
+    struct xsk_umem *umem;
+    void *buffer;
+};
+
+struct xsk_socket_info {
+    struct xsk_ring_cons rx;
+    struct xsk_ring_prod tx;
+    struct xsk_umem_info *umem;
+    struct xsk_socket *xsk;
+    unsigned long rx_npkts;
+    unsigned long tx_npkts;
+    unsigned long prev_rx_npkts;
+    unsigned long prev_tx_npkts;
+    uint32_t outstanding_tx;
+};
+
+struct umem_elem_head {
+    unsigned int index;
+    struct ovs_mutex mutex;
+    uint32_t n;
+};
+
+struct umem_elem {
+    struct umem_elem *next;
+};
+
+void __umem_elem_push(struct umem_pool *umemp, void *addr);
+void umem_elem_push(struct umem_pool *umemp, void *addr);
+void *__umem_elem_pop(struct umem_pool *umemp);
+void *umem_elem_pop(struct umem_pool *umemp);
+void **__umem_pool_alloc(unsigned int size);
+int umem_pool_init(struct umem_pool *umemp, unsigned int size);
+void umem_pool_cleanup(struct umem_pool *umemp);
+unsigned int umem_elem_count(struct umem_pool *mpool);
+void __umem_elem_pop_n(struct umem_pool *umemp, void **addrs, int n);
+void __umem_elem_push_n(struct umem_pool *umemp, void **addrs, int n);
+int xpacket_pool_init(struct xpacket_pool *xp, unsigned int size);
+void xpacket_pool_cleanup(struct xpacket_pool *xp);
+
+#endif
diff --git a/tests/automake.mk b/tests/automake.mk
index ea16532dd2a0..715cef9a6b3b 100644
--- a/tests/automake.mk
+++ b/tests/automake.mk
@@ -4,12 +4,14 @@  EXTRA_DIST += \
 	$(SYSTEM_TESTSUITE_AT) \
 	$(SYSTEM_KMOD_TESTSUITE_AT) \
 	$(SYSTEM_USERSPACE_TESTSUITE_AT) \
+	$(SYSTEM_AFXDP_TESTSUITE_AT) \
 	$(SYSTEM_OFFLOADS_TESTSUITE_AT) \
 	$(SYSTEM_DPDK_TESTSUITE_AT) \
 	$(OVSDB_CLUSTER_TESTSUITE_AT) \
 	$(TESTSUITE) \
 	$(SYSTEM_KMOD_TESTSUITE) \
 	$(SYSTEM_USERSPACE_TESTSUITE) \
+	$(SYSTEM_AFXDP_TESTSUITE) \
 	$(SYSTEM_OFFLOADS_TESTSUITE) \
 	$(SYSTEM_DPDK_TESTSUITE) \
 	$(OVSDB_CLUSTER_TESTSUITE) \
@@ -158,6 +160,11 @@  SYSTEM_USERSPACE_TESTSUITE_AT = \
 	tests/system-userspace-macros.at \
 	tests/system-userspace-packet-type-aware.at
 
+SYSTEM_AFXDP_TESTSUITE_AT = \
+	tests/system-afxdp-testsuite.at \
+	tests/system-afxdp-traffic.at \
+	tests/system-afxdp-macros.at
+
 SYSTEM_TESTSUITE_AT = \
 	tests/system-common-macros.at \
 	tests/system-ovn.at \
@@ -182,6 +189,7 @@  TESTSUITE = $(srcdir)/tests/testsuite
 TESTSUITE_PATCH = $(srcdir)/tests/testsuite.patch
 SYSTEM_KMOD_TESTSUITE = $(srcdir)/tests/system-kmod-testsuite
 SYSTEM_USERSPACE_TESTSUITE = $(srcdir)/tests/system-userspace-testsuite
+SYSTEM_AFXDP_TESTSUITE = $(srcdir)/tests/system-afxdp-testsuite
 SYSTEM_OFFLOADS_TESTSUITE = $(srcdir)/tests/system-offloads-testsuite
 SYSTEM_DPDK_TESTSUITE = $(srcdir)/tests/system-dpdk-testsuite
 OVSDB_CLUSTER_TESTSUITE = $(srcdir)/tests/ovsdb-cluster-testsuite
@@ -315,6 +323,11 @@  check-system-userspace: all
 	set $(SHELL) '$(SYSTEM_USERSPACE_TESTSUITE)' -C tests  AUTOTEST_PATH='$(AUTOTEST_PATH)'; \
 	"$$@" $(TESTSUITEFLAGS) -j1 || (test X'$(RECHECK)' = Xyes && "$$@" --recheck)
 
+check-afxdp: all
+	$(MAKE) install
+	set $(SHELL) '$(SYSTEM_AFXDP_TESTSUITE)' -C tests  AUTOTEST_PATH='$(AUTOTEST_PATH)' $(TESTSUITEFLAGS) -j1; \
+	"$$@" || (test X'$(RECHECK)' = Xyes && "$$@" --recheck)
+
 check-offloads: all
 	set $(SHELL) '$(SYSTEM_OFFLOADS_TESTSUITE)' -C tests  AUTOTEST_PATH='$(AUTOTEST_PATH)'; \
 	"$$@" $(TESTSUITEFLAGS) -j1 || (test X'$(RECHECK)' = Xyes && "$$@" --recheck)
@@ -352,6 +365,10 @@  $(SYSTEM_USERSPACE_TESTSUITE): package.m4 $(SYSTEM_TESTSUITE_AT) $(SYSTEM_USERSP
 	$(AM_V_GEN)$(AUTOTEST) -I '$(srcdir)' -o $@.tmp $@.at
 	$(AM_V_at)mv $@.tmp $@
 
+$(SYSTEM_AFXDP_TESTSUITE): package.m4 $(SYSTEM_TESTSUITE_AT) $(SYSTEM_AFXDP_TESTSUITE_AT) $(COMMON_MACROS_AT)
+	$(AM_V_GEN)$(AUTOTEST) -I '$(srcdir)' -o $@.tmp $@.at
+	$(AM_V_at)mv $@.tmp $@
+
 $(SYSTEM_OFFLOADS_TESTSUITE): package.m4 $(SYSTEM_TESTSUITE_AT) $(SYSTEM_OFFLOADS_TESTSUITE_AT) $(COMMON_MACROS_AT)
 	$(AM_V_GEN)$(AUTOTEST) -I '$(srcdir)' -o $@.tmp $@.at
 	$(AM_V_at)mv $@.tmp $@
diff --git a/tests/system-afxdp-macros.at b/tests/system-afxdp-macros.at
new file mode 100644
index 000000000000..2c58c2d6554b
--- /dev/null
+++ b/tests/system-afxdp-macros.at
@@ -0,0 +1,153 @@ 
+# _ADD_BR([name])
+#
+# Expands into the proper ovs-vsctl commands to create a bridge with the
+# appropriate type and properties
+m4_define([_ADD_BR], [[add-br $1 -- set Bridge $1 datapath_type=netdev protocols=OpenFlow10,OpenFlow11,OpenFlow12,OpenFlow13,OpenFlow14,OpenFlow15 fail-mode=secure ]])
+
+# OVS_TRAFFIC_VSWITCHD_START([vsctl-args], [vsctl-output], [=override])
+#
+# Creates a database and starts ovsdb-server, starts ovs-vswitchd
+# connected to that database, calls ovs-vsctl to create a bridge named
+# br0 with predictable settings, passing 'vsctl-args' as additional
+# commands to ovs-vsctl.  If 'vsctl-args' causes ovs-vsctl to provide
+# output (e.g. because it includes "create" commands) then 'vsctl-output'
+# specifies the expected output after filtering through uuidfilt.
+m4_define([OVS_TRAFFIC_VSWITCHD_START],
+  [
+   export OVS_PKGDATADIR=$(`pwd`)
+   _OVS_VSWITCHD_START([--disable-system])
+   AT_CHECK([ovs-vsctl -- _ADD_BR([br0]) -- $1 m4_if([$2], [], [], [| uuidfilt])], [0], [$2])
+])
+
+# OVS_TRAFFIC_VSWITCHD_STOP([WHITELIST], [extra_cmds])
+#
+# Gracefully stops ovs-vswitchd and ovsdb-server, checking their log files
+# for messages with severity WARN or higher and signaling an error if any
+# is present.  The optional WHITELIST may contain shell-quoted "sed"
+# commands to delete any warnings that are actually expected, e.g.:
+#
+#   OVS_TRAFFIC_VSWITCHD_STOP(["/expected error/d"])
+#
+# 'extra_cmds' are shell commands to be executed afte OVS_VSWITCHD_STOP() is
+# invoked. They can be used to perform additional cleanups such as name space
+# removal.
+m4_define([OVS_TRAFFIC_VSWITCHD_STOP],
+  [OVS_VSWITCHD_STOP([dnl
+$1";/netdev_linux.*obtaining netdev stats via vport failed/d
+/dpif_netlink.*Generic Netlink family 'ovs_datapath' does not exist. The Open vSwitch kernel module is probably not loaded./d
+/dpif_netdev(revalidator.*)|ERR|internal error parsing flow key/d
+/dpif(revalidator.*)|WARN|netdev@ovs-netdev: failed to put/d
+"])
+   AT_CHECK([:; $2])
+  ])
+
+m4_define([ADD_VETH_AFXDP],
+    [ AT_CHECK([ip link add $1 type veth peer name ovs-$1 || return 77])
+      CONFIGURE_AFXDP_VETH_OFFLOADS([$1])
+      AT_CHECK([ip link set $1 netns $2])
+      AT_CHECK([ip link set dev ovs-$1 up])
+      AT_CHECK([ovs-vsctl add-port $3 ovs-$1 -- \
+                set interface ovs-$1 external-ids:iface-id="$1" type="afxdp"])
+      NS_CHECK_EXEC([$2], [ip addr add $4 dev $1 $7])
+      NS_CHECK_EXEC([$2], [ip link set dev $1 up])
+      if test -n "$5"; then
+        NS_CHECK_EXEC([$2], [ip link set dev $1 address $5])
+      fi
+      if test -n "$6"; then
+        NS_CHECK_EXEC([$2], [ip route add default via $6])
+      fi
+      on_exit 'ip link del ovs-$1'
+    ]
+)
+
+# CONFIGURE_AFXDP_VETH_OFFLOADS([VETH])
+#
+# Disable TX offloads and VLAN offloads for veths used in AF_XDP.
+m4_define([CONFIGURE_AFXDP_VETH_OFFLOADS],
+    [AT_CHECK([ethtool -K $1 tx off], [0], [ignore], [ignore])
+     AT_CHECK([ethtool -K $1 rxvlan off], [0], [ignore], [ignore])
+     AT_CHECK([ethtool -K $1 txvlan off], [0], [ignore], [ignore])
+    ]
+)
+
+# CONFIGURE_VETH_OFFLOADS([VETH])
+#
+# Disable TX offloads for veths.  The userspace datapath uses the AF_PACKET
+# socket to receive packets for veths.  Unfortunately, the AF_PACKET socket
+# doesn't play well with offloads:
+# 1. GSO packets are received without segmentation and therefore discarded.
+# 2. Packets with offloaded partial checksum are received with the wrong
+#    checksum, therefore discarded by the receiver.
+#
+# By disabling tx offloads in the non-OVS side of the veth peer we make sure
+# that the AF_PACKET socket will not receive bad packets.
+#
+# This is a workaround, and should be removed when offloads are properly
+# supported in netdev-linux.
+m4_define([CONFIGURE_VETH_OFFLOADS],
+    [AT_CHECK([ethtool -K $1 tx off], [0], [ignore], [ignore])]
+)
+
+# CHECK_CONNTRACK()
+#
+# Perform requirements checks for running conntrack tests.
+#
+m4_define([CHECK_CONNTRACK],
+    [AT_SKIP_IF([test $HAVE_PYTHON = no])]
+)
+
+# CHECK_CONNTRACK_ALG()
+#
+# Perform requirements checks for running conntrack ALG tests. The userspace
+# supports FTP and TFTP.
+#
+m4_define([CHECK_CONNTRACK_ALG])
+
+# CHECK_CONNTRACK_FRAG()
+#
+# Perform requirements checks for running conntrack fragmentations tests.
+# The userspace doesn't support fragmentation yet, so skip the tests.
+m4_define([CHECK_CONNTRACK_FRAG],
+[
+    AT_SKIP_IF([:])
+])
+
+# CHECK_CONNTRACK_LOCAL_STACK()
+#
+# Perform requirements checks for running conntrack tests with local stack.
+# While the kernel connection tracker automatically passes all the connection
+# tracking state from an internal port to the OpenvSwitch kernel module, there
+# is simply no way of doing that with the userspace, so skip the tests.
+m4_define([CHECK_CONNTRACK_LOCAL_STACK],
+[
+    AT_SKIP_IF([:])
+])
+
+# CHECK_CONNTRACK_NAT()
+#
+# Perform requirements checks for running conntrack NAT tests. The userspace
+# datapath supports NAT.
+#
+m4_define([CHECK_CONNTRACK_NAT])
+
+# CHECK_CT_DPIF_FLUSH_BY_CT_TUPLE()
+#
+# Perform requirements checks for running ovs-dpctl flush-conntrack by
+# conntrack 5-tuple test. The userspace datapath does not support
+# this feature yet.
+m4_define([CHECK_CT_DPIF_FLUSH_BY_CT_TUPLE],
+[
+    AT_SKIP_IF([:])
+])
+
+# CHECK_CT_DPIF_SET_GET_MAXCONNS()
+#
+# Perform requirements checks for running ovs-dpctl ct-set-maxconns or
+# ovs-dpctl ct-get-maxconns. The userspace datapath does support this feature.
+m4_define([CHECK_CT_DPIF_SET_GET_MAXCONNS])
+
+# CHECK_CT_DPIF_GET_NCONNS()
+#
+# Perform requirements checks for running ovs-dpctl ct-get-nconns. The
+# userspace datapath does support this feature.
+m4_define([CHECK_CT_DPIF_GET_NCONNS])
diff --git a/tests/system-afxdp-testsuite.at b/tests/system-afxdp-testsuite.at
new file mode 100644
index 000000000000..538c0d15d556
--- /dev/null
+++ b/tests/system-afxdp-testsuite.at
@@ -0,0 +1,26 @@ 
+AT_INIT
+
+AT_COPYRIGHT([Copyright (c) 2018 Nicira, Inc.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at:
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.])
+
+m4_ifdef([AT_COLOR_TESTS], [AT_COLOR_TESTS])
+
+m4_include([tests/ovs-macros.at])
+m4_include([tests/ovsdb-macros.at])
+m4_include([tests/ofproto-macros.at])
+m4_include([tests/system-afxdp-macros.at])
+m4_include([tests/system-common-macros.at])
+
+m4_include([tests/system-afxdp-traffic.at])
+m4_include([tests/system-ovn.at])
diff --git a/tests/system-afxdp-traffic.at b/tests/system-afxdp-traffic.at
new file mode 100644
index 000000000000..26f72acf48ef
--- /dev/null
+++ b/tests/system-afxdp-traffic.at
@@ -0,0 +1,978 @@ 
+AT_BANNER([AF_XDP netdev datapath-sanity])
+
+AT_SETUP([datapath - ping between two ports])
+OVS_TRAFFIC_VSWITCHD_START()
+
+ulimit -l unlimited
+
+ADD_NAMESPACES(at_ns0, at_ns1)
+AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
+
+ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24")
+ADD_VETH_AFXDP(p1, at_ns1, br0, "10.1.1.2/24")
+
+NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.2 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
+AT_SETUP([datapath - ping between two ports on vlan])
+OVS_TRAFFIC_VSWITCHD_START()
+
+AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
+
+ADD_NAMESPACES(at_ns0, at_ns1)
+
+ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24")
+ADD_VETH_AFXDP(p1, at_ns1, br0, "10.1.1.2/24")
+
+ADD_VLAN(p0, at_ns0, 100, "10.2.2.1/24")
+ADD_VLAN(p1, at_ns1, 100, "10.2.2.2/24")
+
+NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.2.2.2 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
+AT_SETUP([datapath - ping6 between two ports])
+OVS_TRAFFIC_VSWITCHD_START()
+
+AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
+
+ADD_NAMESPACES(at_ns0, at_ns1)
+
+ADD_VETH_AFXDP(p0, at_ns0, br0, "fc00::1/96")
+ADD_VETH_AFXDP(p1, at_ns1, br0, "fc00::2/96")
+
+dnl Linux seems to take a little time to get its IPv6 stack in order. Without
+dnl waiting, we get occasional failures due to the following error:
+dnl "connect: Cannot assign requested address"
+OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 1 fc00::2])
+
+NS_CHECK_EXEC([at_ns0], [ping6 -q -c 3 -i 0.3 -w 6 fc00::2 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
+AT_SETUP([datapath - ping6 between two ports on vlan])
+OVS_TRAFFIC_VSWITCHD_START()
+
+AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
+
+ADD_NAMESPACES(at_ns0, at_ns1)
+
+ADD_VETH_AFXDP(p0, at_ns0, br0, "fc00::1/96")
+ADD_VETH_AFXDP(p1, at_ns1, br0, "fc00::2/96")
+
+ADD_VLAN(p0, at_ns0, 100, "fc00:1::1/96")
+ADD_VLAN(p1, at_ns1, 100, "fc00:1::2/96")
+
+dnl Linux seems to take a little time to get its IPv6 stack in order. Without
+dnl waiting, we get occasional failures due to the following error:
+dnl "connect: Cannot assign requested address"
+OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 1 fc00:1::2])
+
+NS_CHECK_EXEC([at_ns0], [ping6 -q -c 3 -i 0.3 -w 2 fc00:1::2 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+NS_CHECK_EXEC([at_ns0], [ping6 -s 1600 -q -c 3 -i 0.3 -w 2 fc00:1::2 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+NS_CHECK_EXEC([at_ns0], [ping6 -s 3200 -q -c 3 -i 0.3 -w 2 fc00:1::2 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
+AT_SETUP([datapath - ping over vxlan tunnel])
+OVS_CHECK_VXLAN()
+
+OVS_TRAFFIC_VSWITCHD_START()
+ADD_BR([br-underlay])
+
+AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
+AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
+
+ADD_NAMESPACES(at_ns0)
+
+dnl Set up underlay link from host into the namespace using veth pair.
+ADD_VETH_AFXDP(p0, at_ns0, br-underlay, "172.31.1.1/24")
+AT_CHECK([ip addr add dev br-underlay "172.31.1.100/24"])
+AT_CHECK([ip link set dev br-underlay up])
+
+
+dnl Set up tunnel endpoints on OVS outside the namespace and with a native
+dnl linux device inside the namespace.
+ADD_OVS_TUNNEL([vxlan], [br0], [at_vxlan0], [172.31.1.1], [10.1.1.100/24])
+ADD_NATIVE_TUNNEL([vxlan], [at_vxlan1], [at_ns0], [172.31.1.100], [10.1.1.1/24],
+                  [id 0 dstport 4789])
+
+AT_CHECK([ovs-appctl ovs/route/add 10.1.1.100/24 br0], [0], [OK
+])
+AT_CHECK([ovs-appctl ovs/route/add 172.31.1.92/24 br-underlay], [0], [OK
+])
+
+dnl First, check the underlay
+NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 172.31.1.100 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+dnl Okay, now check the overlay with different packet sizes
+NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+NS_CHECK_EXEC([at_ns0], [ping -s 1600 -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+NS_CHECK_EXEC([at_ns0], [ping -s 3200 -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
+AT_SETUP([datapath - ping over vxlan6 tunnel])
+OVS_CHECK_VXLAN_UDP6ZEROCSUM()
+
+OVS_TRAFFIC_VSWITCHD_START()
+ADD_BR([br-underlay])
+
+AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
+AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
+
+ADD_NAMESPACES(at_ns0)
+
+dnl Set up underlay link from host into the namespace using veth pair.
+ADD_VETH_AFXDP(p0, at_ns0, br-underlay, "fc00::1/64", [], [], "nodad")
+AT_CHECK([ip addr add dev br-underlay "fc00::100/64" nodad])
+AT_CHECK([ip link set dev br-underlay up])
+
+dnl Set up tunnel endpoints on OVS outside the namespace and with a native
+dnl linux device inside the namespace.
+ADD_OVS_TUNNEL6([vxlan], [br0], [at_vxlan0], [fc00::1], [10.1.1.100/24])
+ADD_NATIVE_TUNNEL6([vxlan], [at_vxlan1], [at_ns0], [fc00::100], [10.1.1.1/24],
+                   [id 0 dstport 4789 udp6zerocsumtx udp6zerocsumrx])
+
+AT_CHECK([ovs-appctl ovs/route/add 10.1.1.100/24 br0], [0], [OK
+])
+AT_CHECK([ovs-appctl ovs/route/add fc00::100/64 br-underlay], [0], [OK
+])
+
+OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 1 fc00::100])
+
+dnl First, check the underlay
+NS_CHECK_EXEC([at_ns0], [ping6 -q -c 3 -i 0.3 -w 2 fc00::100 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+dnl Okay, now check the overlay with different packet sizes
+NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+NS_CHECK_EXEC([at_ns0], [ping -s 1600 -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+NS_CHECK_EXEC([at_ns0], [ping -s 3200 -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
+AT_SETUP([datapath - ping over gre tunnel])
+OVS_CHECK_GRE()
+
+OVS_TRAFFIC_VSWITCHD_START()
+ADD_BR([br-underlay])
+
+AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
+AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
+
+ADD_NAMESPACES(at_ns0)
+
+dnl Set up underlay link from host into the namespace using veth pair.
+ADD_VETH_AFXDP(p0, at_ns0, br-underlay, "172.31.1.1/24")
+AT_CHECK([ip addr add dev br-underlay "172.31.1.100/24"])
+AT_CHECK([ip link set dev br-underlay up])
+
+dnl Set up tunnel endpoints on OVS outside the namespace and with a native
+dnl linux device inside the namespace.
+ADD_OVS_TUNNEL([gre], [br0], [at_gre0], [172.31.1.1], [10.1.1.100/24])
+ADD_NATIVE_TUNNEL([gretap], [ns_gre0], [at_ns0], [172.31.1.100], [10.1.1.1/24])
+
+AT_CHECK([ovs-appctl ovs/route/add 10.1.1.100/24 br0], [0], [OK
+])
+AT_CHECK([ovs-appctl ovs/route/add 172.31.1.92/24 br-underlay], [0], [OK
+])
+
+dnl First, check the underlay
+NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 172.31.1.100 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+dnl Okay, now check the overlay with different packet sizes
+NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+NS_CHECK_EXEC([at_ns0], [ping -s 1600 -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+NS_CHECK_EXEC([at_ns0], [ping -s 3200 -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
+AT_SETUP([datapath - ping over erspan v1 tunnel])
+OVS_CHECK_GRE()
+OVS_CHECK_ERSPAN()
+
+OVS_TRAFFIC_VSWITCHD_START()
+ADD_BR([br-underlay])
+
+AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
+AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
+
+ADD_NAMESPACES(at_ns0)
+
+dnl Set up underlay link from host into the namespace using veth pair.
+ADD_VETH_AFXDP(p0, at_ns0, br-underlay, "172.31.1.1/24")
+AT_CHECK([ip addr add dev br-underlay "172.31.1.100/24"])
+AT_CHECK([ip link set dev br-underlay up])
+
+dnl Set up tunnel endpoints on OVS outside the namespace and with a native
+dnl linux device inside the namespace.
+ADD_OVS_TUNNEL([erspan], [br0], [at_erspan0], [172.31.1.1], [10.1.1.100/24], [options:key=1 options:erspan_ver=1 options:erspan_idx=7])
+ADD_NATIVE_TUNNEL([erspan], [ns_erspan0], [at_ns0], [172.31.1.100], [10.1.1.1/24], [seq key 1 erspan_ver 1 erspan 7])
+
+AT_CHECK([ovs-appctl ovs/route/add 10.1.1.100/24 br0], [0], [OK
+])
+AT_CHECK([ovs-appctl ovs/route/add 172.31.1.92/24 br-underlay], [0], [OK
+])
+
+dnl First, check the underlay
+NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 172.31.1.100 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+dnl Okay, now check the overlay with different packet sizes
+dnl NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl
+NS_CHECK_EXEC([at_ns0], [ping -s 1200 -i 0.3 -c 3 10.1.1.100 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
+AT_SETUP([datapath - ping over erspan v2 tunnel])
+OVS_CHECK_GRE()
+OVS_CHECK_ERSPAN()
+
+OVS_TRAFFIC_VSWITCHD_START()
+ADD_BR([br-underlay])
+
+AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
+AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
+
+ADD_NAMESPACES(at_ns0)
+
+dnl Set up underlay link from host into the namespace using veth pair.
+ADD_VETH_AFXDP(p0, at_ns0, br-underlay, "172.31.1.1/24")
+AT_CHECK([ip addr add dev br-underlay "172.31.1.100/24"])
+AT_CHECK([ip link set dev br-underlay up])
+
+dnl Set up tunnel endpoints on OVS outside the namespace and with a native
+dnl linux device inside the namespace.
+ADD_OVS_TUNNEL([erspan], [br0], [at_erspan0], [172.31.1.1], [10.1.1.100/24], [options:key=1 options:erspan_ver=2 options:erspan_dir=1 options:erspan_hwid=0x7])
+ADD_NATIVE_TUNNEL([erspan], [ns_erspan0], [at_ns0], [172.31.1.100], [10.1.1.1/24], [seq key 1 erspan_ver 2 erspan_dir egress erspan_hwid 7])
+
+AT_CHECK([ovs-appctl ovs/route/add 10.1.1.100/24 br0], [0], [OK
+])
+AT_CHECK([ovs-appctl ovs/route/add 172.31.1.92/24 br-underlay], [0], [OK
+])
+
+dnl First, check the underlay
+NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 172.31.1.100 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+dnl Okay, now check the overlay with different packet sizes
+dnl NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl
+NS_CHECK_EXEC([at_ns0], [ping -s 1200 -i 0.3 -c 3 10.1.1.100 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
+AT_SETUP([datapath - ping over ip6erspan v1 tunnel])
+OVS_CHECK_GRE()
+OVS_CHECK_ERSPAN()
+
+OVS_TRAFFIC_VSWITCHD_START()
+ADD_BR([br-underlay])
+
+AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
+AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
+
+ADD_NAMESPACES(at_ns0)
+
+dnl Set up underlay link from host into the namespace using veth pair.
+ADD_VETH_AFXDP(p0, at_ns0, br-underlay, "fc00:100::1/96", [], [], nodad)
+AT_CHECK([ip addr add dev br-underlay "fc00:100::100/96" nodad])
+AT_CHECK([ip link set dev br-underlay up])
+
+dnl Set up tunnel endpoints on OVS outside the namespace and with a native
+dnl linux device inside the namespace.
+ADD_OVS_TUNNEL6([ip6erspan], [br0], [at_erspan0], [fc00:100::1], [10.1.1.100/24],
+                [options:key=123 options:erspan_ver=1 options:erspan_idx=0x7])
+ADD_NATIVE_TUNNEL6([ip6erspan], [ns_erspan0], [at_ns0], [fc00:100::100],
+                   [10.1.1.1/24], [local fc00:100::1 seq key 123 erspan_ver 1 erspan 7])
+
+AT_CHECK([ovs-appctl ovs/route/add 10.1.1.100/24 br0], [0], [OK
+])
+AT_CHECK([ovs-appctl ovs/route/add fc00:100::1/96 br-underlay], [0], [OK
+])
+
+OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 2 fc00:100::100])
+
+dnl First, check the underlay
+NS_CHECK_EXEC([at_ns0], [ping6 -q -c 3 -i 0.3 -w 2 fc00:100::100 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+dnl Okay, now check the overlay with different packet sizes
+NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
+AT_SETUP([datapath - ping over ip6erspan v2 tunnel])
+OVS_CHECK_GRE()
+OVS_CHECK_ERSPAN()
+
+OVS_TRAFFIC_VSWITCHD_START()
+ADD_BR([br-underlay])
+
+AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
+AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
+
+ADD_NAMESPACES(at_ns0)
+
+dnl Set up underlay link from host into the namespace using veth pair.
+ADD_VETH_AFXDP(p0, at_ns0, br-underlay, "fc00:100::1/96", [], [], nodad)
+AT_CHECK([ip addr add dev br-underlay "fc00:100::100/96" nodad])
+AT_CHECK([ip link set dev br-underlay up])
+
+dnl Set up tunnel endpoints on OVS outside the namespace and with a native
+dnl linux device inside the namespace.
+ADD_OVS_TUNNEL6([ip6erspan], [br0], [at_erspan0], [fc00:100::1], [10.1.1.100/24],
+                [options:key=121 options:erspan_ver=2 options:erspan_dir=0 options:erspan_hwid=0x7])
+ADD_NATIVE_TUNNEL6([ip6erspan], [ns_erspan0], [at_ns0], [fc00:100::100],
+                   [10.1.1.1/24],
+                   [local fc00:100::1 seq key 121 erspan_ver 2 erspan_dir ingress erspan_hwid 0x7])
+
+AT_CHECK([ovs-appctl ovs/route/add 10.1.1.100/24 br0], [0], [OK
+])
+AT_CHECK([ovs-appctl ovs/route/add fc00:100::1/96 br-underlay], [0], [OK
+])
+
+OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 2 fc00:100::100])
+
+dnl First, check the underlay
+NS_CHECK_EXEC([at_ns0], [ping6 -q -c 3 -i 0.3 -w 2 fc00:100::100 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+dnl Okay, now check the overlay with different packet sizes
+NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
+AT_SETUP([datapath - ping over geneve tunnel])
+OVS_CHECK_GENEVE()
+
+OVS_TRAFFIC_VSWITCHD_START()
+ADD_BR([br-underlay])
+
+AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
+AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
+
+ADD_NAMESPACES(at_ns0)
+
+dnl Set up underlay link from host into the namespace using veth pair.
+ADD_VETH_AFXDP(p0, at_ns0, br-underlay, "172.31.1.1/24")
+AT_CHECK([ip addr add dev br-underlay "172.31.1.100/24"])
+AT_CHECK([ip link set dev br-underlay up])
+
+dnl Set up tunnel endpoints on OVS outside the namespace and with a native
+dnl linux device inside the namespace.
+ADD_OVS_TUNNEL([geneve], [br0], [at_gnv0], [172.31.1.1], [10.1.1.100/24])
+ADD_NATIVE_TUNNEL([geneve], [ns_gnv0], [at_ns0], [172.31.1.100], [10.1.1.1/24],
+                  [vni 0])
+
+AT_CHECK([ovs-appctl ovs/route/add 10.1.1.100/24 br0], [0], [OK
+])
+AT_CHECK([ovs-appctl ovs/route/add 172.31.1.100/24 br-underlay], [0], [OK
+])
+
+dnl First, check the underlay
+NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 172.31.1.100 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+dnl Okay, now check the overlay with different packet sizes
+NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+NS_CHECK_EXEC([at_ns0], [ping -s 1600 -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+NS_CHECK_EXEC([at_ns0], [ping -s 3200 -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
+AT_SETUP([datapath - ping over geneve6 tunnel])
+OVS_CHECK_GENEVE_UDP6ZEROCSUM()
+
+OVS_TRAFFIC_VSWITCHD_START()
+ADD_BR([br-underlay])
+
+AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
+AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"])
+
+ADD_NAMESPACES(at_ns0)
+
+dnl Set up underlay link from host into the namespace using veth pair.
+ADD_VETH_AFXDP(p0, at_ns0, br-underlay, "fc00::1/64", [], [], "nodad")
+AT_CHECK([ip addr add dev br-underlay "fc00::100/64" nodad])
+AT_CHECK([ip link set dev br-underlay up])
+
+dnl Set up tunnel endpoints on OVS outside the namespace and with a native
+dnl linux device inside the namespace.
+ADD_OVS_TUNNEL6([geneve], [br0], [at_gnv0], [fc00::1], [10.1.1.100/24])
+ADD_NATIVE_TUNNEL6([geneve], [ns_gnv0], [at_ns0], [fc00::100], [10.1.1.1/24],
+                   [vni 0 udp6zerocsumtx udp6zerocsumrx])
+
+AT_CHECK([ovs-appctl ovs/route/add 10.1.1.100/24 br0], [0], [OK
+])
+AT_CHECK([ovs-appctl ovs/route/add fc00::100/64 br-underlay], [0], [OK
+])
+
+OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 1 fc00::100])
+
+dnl First, check the underlay
+NS_CHECK_EXEC([at_ns0], [ping6 -q -c 3 -i 0.3 -w 2 fc00::100 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+dnl Okay, now check the overlay with different packet sizes
+NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+NS_CHECK_EXEC([at_ns0], [ping -s 1600 -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+NS_CHECK_EXEC([at_ns0], [ping -s 3200 -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
+AT_SETUP([datapath - clone action])
+OVS_TRAFFIC_VSWITCHD_START()
+
+ADD_NAMESPACES(at_ns0, at_ns1, at_ns2)
+
+ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24")
+ADD_VETH_AFXDP(p1, at_ns1, br0, "10.1.1.2/24")
+
+AT_CHECK([ovs-vsctl -- set interface ovs-p0 ofport_request=1 \
+                    -- set interface ovs-p1 ofport_request=2])
+
+AT_DATA([flows.txt], [dnl
+priority=1 actions=NORMAL
+priority=10 in_port=1,ip,actions=clone(mod_dl_dst(50:54:00:00:00:0a),set_field:192.168.3.3->ip_dst), output:2
+priority=10 in_port=2,ip,actions=clone(mod_dl_src(ae:c6:7e:54:8d:4d),mod_dl_dst(50:54:00:00:00:0b),set_field:192.168.4.4->ip_dst, controller), output:1
+])
+AT_CHECK([ovs-ofctl add-flows br0 flows.txt])
+
+AT_CHECK([ovs-ofctl monitor br0 65534 invalid_ttl --detach --no-chdir --pidfile 2> ofctl_monitor.log])
+NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.2 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+AT_CHECK([cat ofctl_monitor.log | STRIP_MONITOR_CSUM], [0], [dnl
+icmp,vlan_tci=0x0000,dl_src=ae:c6:7e:54:8d:4d,dl_dst=50:54:00:00:00:0b,nw_src=10.1.1.2,nw_dst=192.168.4.4,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=0,icmp_code=0 icmp_csum: <skip>
+icmp,vlan_tci=0x0000,dl_src=ae:c6:7e:54:8d:4d,dl_dst=50:54:00:00:00:0b,nw_src=10.1.1.2,nw_dst=192.168.4.4,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=0,icmp_code=0 icmp_csum: <skip>
+icmp,vlan_tci=0x0000,dl_src=ae:c6:7e:54:8d:4d,dl_dst=50:54:00:00:00:0b,nw_src=10.1.1.2,nw_dst=192.168.4.4,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=0,icmp_code=0 icmp_csum: <skip>
+])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
+AT_SETUP([datapath - basic truncate action])
+AT_SKIP_IF([test $HAVE_NC = no])
+OVS_TRAFFIC_VSWITCHD_START()
+AT_CHECK([ovs-ofctl del-flows br0])
+
+dnl Create p0 and ovs-p0(1)
+ADD_NAMESPACES(at_ns0)
+ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24")
+NS_CHECK_EXEC([at_ns0], [ip link set dev p0 address e6:66:c1:11:11:11])
+NS_CHECK_EXEC([at_ns0], [arp -s 10.1.1.2 e6:66:c1:22:22:22])
+
+dnl Create p1(3) and ovs-p1(2), packets received from ovs-p1 will appear in p1
+AT_CHECK([ip link add p1 type veth peer name ovs-p1])
+on_exit 'ip link del ovs-p1'
+AT_CHECK([ip link set dev ovs-p1 up])
+AT_CHECK([ip link set dev p1 up])
+AT_CHECK([ovs-vsctl add-port br0 ovs-p1 -- set interface ovs-p1 ofport_request=2])
+dnl Use p1 to check the truncated packet
+AT_CHECK([ovs-vsctl add-port br0 p1 -- set interface p1 ofport_request=3])
+
+dnl Create p2(5) and ovs-p2(4)
+AT_CHECK([ip link add p2 type veth peer name ovs-p2])
+on_exit 'ip link del ovs-p2'
+AT_CHECK([ip link set dev ovs-p2 up])
+AT_CHECK([ip link set dev p2 up])
+AT_CHECK([ovs-vsctl add-port br0 ovs-p2 -- set interface ovs-p2 ofport_request=4])
+dnl Use p2 to check the truncated packet
+AT_CHECK([ovs-vsctl add-port br0 p2 -- set interface p2 ofport_request=5])
+
+dnl basic test
+AT_CHECK([ovs-ofctl del-flows br0])
+AT_DATA([flows.txt], [dnl
+in_port=3 dl_dst=e6:66:c1:22:22:22 actions=drop
+in_port=5 dl_dst=e6:66:c1:22:22:22 actions=drop
+in_port=1 dl_dst=e6:66:c1:22:22:22 actions=output(port=2,max_len=100),output:4
+])
+AT_CHECK([ovs-ofctl add-flows br0 flows.txt])
+
+dnl use this file as payload file for ncat
+AT_CHECK([dd if=/dev/urandom of=payload200.bin bs=200 count=1 2> /dev/null])
+on_exit 'rm -f payload200.bin'
+NS_CHECK_EXEC([at_ns0], [nc $NC_EOF_OPT -u 10.1.1.2 1234 < payload200.bin])
+
+dnl packet with truncated size
+AT_CHECK([ovs-appctl revalidator/purge], [0])
+AT_CHECK([ovs-ofctl dump-flows br0 table=0 | grep "in_port=3" |  sed -n 's/.*\(n\_bytes=[[0-9]]*\).*/\1/p'], [0], [dnl
+n_bytes=100
+])
+dnl packet with original size
+AT_CHECK([ovs-appctl revalidator/purge], [0])
+AT_CHECK([ovs-ofctl dump-flows br0 table=0 | grep "in_port=5" | sed -n 's/.*\(n\_bytes=[[0-9]]*\).*/\1/p'], [0], [dnl
+n_bytes=242
+])
+
+dnl more complicated output actions
+AT_CHECK([ovs-ofctl del-flows br0])
+AT_DATA([flows.txt], [dnl
+in_port=3 dl_dst=e6:66:c1:22:22:22 actions=drop
+in_port=5 dl_dst=e6:66:c1:22:22:22 actions=drop
+in_port=1 dl_dst=e6:66:c1:22:22:22 actions=output(port=2,max_len=100),output:4,output(port=2,max_len=100),output(port=4,max_len=100),output:2,output(port=4,max_len=200),output(port=2,max_len=65535)
+])
+AT_CHECK([ovs-ofctl add-flows br0 flows.txt])
+
+NS_CHECK_EXEC([at_ns0], [nc $NC_EOF_OPT -u 10.1.1.2 1234 < payload200.bin])
+
+dnl 100 + 100 + 242 + min(65535,242) = 684
+AT_CHECK([ovs-appctl revalidator/purge], [0])
+AT_CHECK([ovs-ofctl dump-flows br0 table=0 | grep "in_port=3" | sed -n 's/.*\(n\_bytes=[[0-9]]*\).*/\1/p'], [0], [dnl
+n_bytes=684
+])
+dnl 242 + 100 + min(242,200) = 542
+AT_CHECK([ovs-ofctl dump-flows br0 table=0 | grep "in_port=5" | sed -n 's/.*\(n\_bytes=[[0-9]]*\).*/\1/p'], [0], [dnl
+n_bytes=542
+])
+
+dnl SLOW_ACTION: disable kernel datapath truncate support
+dnl Repeat the test above, but exercise the SLOW_ACTION code path
+AT_CHECK([ovs-appctl dpif/set-dp-features br0 trunc false], [0])
+
+dnl SLOW_ACTION test1: check datapatch actions
+AT_CHECK([ovs-ofctl del-flows br0])
+AT_CHECK([ovs-ofctl add-flows br0 flows.txt])
+
+AT_CHECK([ovs-appctl ofproto/trace br0 "in_port=1,dl_type=0x800,dl_src=e6:66:c1:11:11:11,dl_dst=e6:66:c1:22:22:22,nw_src=192.168.0.1,nw_dst=192.168.0.2,nw_proto=6,tp_src=8,tp_dst=9"], [0], [stdout])
+AT_CHECK([tail -3 stdout], [0],
+[Datapath actions: trunc(100),3,5,trunc(100),3,trunc(100),5,3,trunc(200),5,trunc(65535),3
+This flow is handled by the userspace slow path because it:
+  - Uses action(s) not supported by datapath.
+])
+
+dnl SLOW_ACTION test2: check actual packet truncate
+AT_CHECK([ovs-ofctl del-flows br0])
+AT_CHECK([ovs-ofctl add-flows br0 flows.txt])
+NS_CHECK_EXEC([at_ns0], [nc $NC_EOF_OPT -u 10.1.1.2 1234 < payload200.bin])
+
+dnl 100 + 100 + 242 + min(65535,242) = 684
+AT_CHECK([ovs-appctl revalidator/purge], [0])
+AT_CHECK([ovs-ofctl dump-flows br0 table=0 | grep "in_port=3" | sed -n 's/.*\(n\_bytes=[[0-9]]*\).*/\1/p'], [0], [dnl
+n_bytes=684
+])
+
+dnl 242 + 100 + min(242,200) = 542
+AT_CHECK([ovs-ofctl dump-flows br0 table=0 | grep "in_port=5" | sed -n 's/.*\(n\_bytes=[[0-9]]*\).*/\1/p'], [0], [dnl
+n_bytes=542
+])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
+
+AT_BANNER([conntrack])
+
+AT_SETUP([conntrack - controller])
+CHECK_CONNTRACK()
+OVS_TRAFFIC_VSWITCHD_START()
+AT_CHECK([ovs-appctl vlog/set dpif:dbg dpif_netdev:dbg ofproto_dpif_upcall:dbg])
+
+ADD_NAMESPACES(at_ns0, at_ns1)
+
+ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24")
+ADD_VETH_AFXDP(p1, at_ns1, br0, "10.1.1.2/24")
+
+dnl Allow any traffic from ns0->ns1. Only allow nd, return traffic from ns1->ns0.
+AT_DATA([flows.txt], [dnl
+priority=1,action=drop
+priority=10,arp,action=normal
+priority=100,in_port=1,udp,action=ct(commit),controller
+priority=100,in_port=2,ct_state=-trk,udp,action=ct(table=0)
+priority=100,in_port=2,ct_state=+trk+est,udp,action=controller
+])
+
+AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
+
+AT_CAPTURE_FILE([ofctl_monitor.log])
+AT_CHECK([ovs-ofctl monitor br0 65534 invalid_ttl --detach --no-chdir --pidfile 2> ofctl_monitor.log])
+
+dnl Send an unsolicited reply from port 2. This should be dropped.
+AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 2 ct\(table=0\) '50540000000a50540000000908004500001c000000000011a4cd0a0101020a0101010002000100080000'])
+
+dnl OK, now start a new connection from port 1.
+AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 1 ct\(commit\),controller '50540000000a50540000000908004500001c000000000011a4cd0a0101010a0101020001000200080000'])
+
+dnl Now try a reply from port 2.
+AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 2 ct\(table=0\) '50540000000a50540000000908004500001c000000000011a4cd0a0101020a0101010002000100080000'])
+
+dnl Check this output. We only see the latter two packets, not the first.
+AT_CHECK([cat ofctl_monitor.log], [0], [dnl
+NXT_PACKET_IN2 (xid=0x0): total_len=42 in_port=1 (via action) data_len=42 (unbuffered)
+udp,vlan_tci=0x0000,dl_src=50:54:00:00:00:09,dl_dst=50:54:00:00:00:0a,nw_src=10.1.1.1,nw_dst=10.1.1.2,nw_tos=0,nw_ecn=0,nw_ttl=0,tp_src=1,tp_dst=2 udp_csum:0
+NXT_PACKET_IN2 (xid=0x0): cookie=0x0 total_len=42 ct_state=est|rpl|trk,ct_nw_src=10.1.1.1,ct_nw_dst=10.1.1.2,ct_nw_proto=17,ct_tp_src=1,ct_tp_dst=2,ip,in_port=2 (via action) data_len=42 (unbuffered)
+udp,vlan_tci=0x0000,dl_src=50:54:00:00:00:09,dl_dst=50:54:00:00:00:0a,nw_src=10.1.1.2,nw_dst=10.1.1.1,nw_tos=0,nw_ecn=0,nw_ttl=0,tp_src=2,tp_dst=1 udp_csum:0
+])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
+AT_SETUP([conntrack - force commit])
+CHECK_CONNTRACK()
+OVS_TRAFFIC_VSWITCHD_START()
+AT_CHECK([ovs-appctl vlog/set dpif:dbg dpif_netdev:dbg ofproto_dpif_upcall:dbg])
+
+ADD_NAMESPACES(at_ns0, at_ns1)
+
+ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24")
+ADD_VETH_AFXDP(p1, at_ns1, br0, "10.1.1.2/24")
+
+AT_DATA([flows.txt], [dnl
+priority=1,action=drop
+priority=10,arp,action=normal
+priority=100,in_port=1,udp,action=ct(force,commit),controller
+priority=100,in_port=2,ct_state=-trk,udp,action=ct(table=0)
+priority=100,in_port=2,ct_state=+trk+est,udp,action=ct(force,commit,table=1)
+table=1,in_port=2,ct_state=+trk,udp,action=controller
+])
+
+AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
+
+AT_CAPTURE_FILE([ofctl_monitor.log])
+AT_CHECK([ovs-ofctl monitor br0 65534 invalid_ttl --detach --no-chdir --pidfile 2> ofctl_monitor.log])
+
+dnl Send an unsolicited reply from port 2. This should be dropped.
+AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=2 packet=50540000000a50540000000908004500001c000000000011a4cd0a0101020a0101010002000100080000 actions=resubmit(,0)"])
+
+dnl OK, now start a new connection from port 1.
+AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=1 packet=50540000000a50540000000908004500001c000000000011a4cd0a0101010a0101020001000200080000 actions=resubmit(,0)"])
+
+dnl Now try a reply from port 2.
+AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=2 packet=50540000000a50540000000908004500001c000000000011a4cd0a0101020a0101010002000100080000 actions=resubmit(,0)"])
+
+AT_CHECK([ovs-appctl revalidator/purge], [0])
+
+dnl Check this output. We only see the latter two packets, not the first.
+AT_CHECK([cat ofctl_monitor.log], [0], [dnl
+NXT_PACKET_IN2 (xid=0x0): cookie=0x0 total_len=42 in_port=1 (via action) data_len=42 (unbuffered)
+udp,vlan_tci=0x0000,dl_src=50:54:00:00:00:09,dl_dst=50:54:00:00:00:0a,nw_src=10.1.1.1,nw_dst=10.1.1.2,nw_tos=0,nw_ecn=0,nw_ttl=0,tp_src=1,tp_dst=2 udp_csum:0
+NXT_PACKET_IN2 (xid=0x0): table_id=1 cookie=0x0 total_len=42 ct_state=new|trk,ct_nw_src=10.1.1.2,ct_nw_dst=10.1.1.1,ct_nw_proto=17,ct_tp_src=2,ct_tp_dst=1,ip,in_port=2 (via action) data_len=42 (unbuffered)
+udp,vlan_tci=0x0000,dl_src=50:54:00:00:00:09,dl_dst=50:54:00:00:00:0a,nw_src=10.1.1.2,nw_dst=10.1.1.1,nw_tos=0,nw_ecn=0,nw_ttl=0,tp_src=2,tp_dst=1 udp_csum:0
+])
+
+dnl
+dnl Check that the directionality has been changed by force commit.
+dnl
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "orig=.src=10\.1\.1\.2,"], [], [dnl
+udp,orig=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1),reply=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2)
+])
+
+dnl OK, now send another packet from port 1 and see that it switches again
+AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=1 packet=50540000000a50540000000908004500001c000000000011a4cd0a0101010a0101020001000200080000 actions=resubmit(,0)"])
+AT_CHECK([ovs-appctl revalidator/purge], [0])
+
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "orig=.src=10\.1\.1\.1,"], [], [dnl
+udp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2),reply=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1)
+])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
+AT_SETUP([conntrack - ct flush by 5-tuple])
+CHECK_CONNTRACK()
+OVS_TRAFFIC_VSWITCHD_START()
+
+ADD_NAMESPACES(at_ns0, at_ns1)
+
+ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24")
+ADD_VETH_AFXDP(p1, at_ns1, br0, "10.1.1.2/24")
+
+AT_DATA([flows.txt], [dnl
+priority=1,action=drop
+priority=10,arp,action=normal
+priority=100,in_port=1,udp,action=ct(commit),2
+priority=100,in_port=2,udp,action=ct(zone=5,commit),1
+priority=100,in_port=1,icmp,action=ct(commit),2
+priority=100,in_port=2,icmp,action=ct(zone=5,commit),1
+])
+
+AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
+
+dnl Test UDP from port 1
+AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=1 packet=50540000000a50540000000908004500001c000000000011a4cd0a0101010a0101020001000200080000 actions=resubmit(,0)"])
+
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "orig=.src=10\.1\.1\.1,"], [], [dnl
+udp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2),reply=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1)
+])
+
+AT_CHECK([ovs-appctl dpctl/flush-conntrack 'ct_nw_src=10.1.1.2,ct_nw_dst=10.1.1.1,ct_nw_proto=17,ct_tp_src=2,ct_tp_dst=1'])
+
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "orig=.src=10\.1\.1\.1,"], [1], [dnl
+])
+
+dnl Test UDP from port 2
+AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=2 packet=50540000000a50540000000908004500001c000000000011a4cd0a0101020a0101010002000100080000 actions=resubmit(,0)"])
+
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "orig=.src=10\.1\.1\.2,"], [0], [dnl
+udp,orig=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1),reply=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2),zone=5
+])
+
+AT_CHECK([ovs-appctl dpctl/flush-conntrack zone=5 'ct_nw_src=10.1.1.1,ct_nw_dst=10.1.1.2,ct_nw_proto=17,ct_tp_src=1,ct_tp_dst=2'])
+
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(10.1.1.2)], [0], [dnl
+])
+
+dnl Test ICMP traffic
+NS_CHECK_EXEC([at_ns1], [ping -q -c 3 -i 0.3 -w 2 10.1.1.1 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "orig=.src=10\.1\.1\.2,"], [0], [stdout])
+AT_CHECK([cat stdout | FORMAT_CT(10.1.1.1)], [0],[dnl
+icmp,orig=(src=10.1.1.2,dst=10.1.1.1,id=<cleared>,type=8,code=0),reply=(src=10.1.1.1,dst=10.1.1.2,id=<cleared>,type=0,code=0),zone=5
+])
+
+ICMP_ID=`cat stdout | cut -d ',' -f4 | cut -d '=' -f2`
+ICMP_TUPLE=ct_nw_src=10.1.1.2,ct_nw_dst=10.1.1.1,ct_nw_proto=1,icmp_id=$ICMP_ID,icmp_type=8,icmp_code=0
+AT_CHECK([ovs-appctl dpctl/flush-conntrack zone=5 $ICMP_TUPLE])
+
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "orig=.src=10\.1\.1\.2,"], [1], [dnl
+])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
+AT_SETUP([conntrack - IPv4 ping])
+CHECK_CONNTRACK()
+OVS_TRAFFIC_VSWITCHD_START()
+
+ADD_NAMESPACES(at_ns0, at_ns1)
+
+ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24")
+ADD_VETH_AFXDP(p1, at_ns1, br0, "10.1.1.2/24")
+
+dnl Allow any traffic from ns0->ns1. Only allow nd, return traffic from ns1->ns0.
+AT_DATA([flows.txt], [dnl
+priority=1,action=drop
+priority=10,arp,action=normal
+priority=100,in_port=1,icmp,action=ct(commit),2
+priority=100,in_port=2,icmp,ct_state=-trk,action=ct(table=0)
+priority=100,in_port=2,icmp,ct_state=+trk+est,action=1
+])
+
+AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
+
+dnl Pings from ns0->ns1 should work fine.
+NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.2 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(10.1.1.2)], [0], [dnl
+icmp,orig=(src=10.1.1.1,dst=10.1.1.2,id=<cleared>,type=8,code=0),reply=(src=10.1.1.2,dst=10.1.1.1,id=<cleared>,type=0,code=0)
+])
+
+AT_CHECK([ovs-appctl dpctl/flush-conntrack])
+
+dnl Pings from ns1->ns0 should fail.
+NS_CHECK_EXEC([at_ns1], [ping -q -c 3 -i 0.3 -w 2 10.1.1.1 | FORMAT_PING], [0], [dnl
+7 packets transmitted, 0 received, 100% packet loss, time 0ms
+])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
+AT_SETUP([conntrack - get_nconns and get/set_maxconns])
+CHECK_CONNTRACK()
+CHECK_CT_DPIF_SET_GET_MAXCONNS()
+CHECK_CT_DPIF_GET_NCONNS()
+OVS_TRAFFIC_VSWITCHD_START()
+
+ADD_NAMESPACES(at_ns0, at_ns1)
+
+ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24")
+ADD_VETH_AFXDP(p1, at_ns1, br0, "10.1.1.2/24")
+
+dnl Allow any traffic from ns0->ns1. Only allow nd, return traffic from ns1->ns0.
+AT_DATA([flows.txt], [dnl
+priority=1,action=drop
+priority=10,arp,action=normal
+priority=100,in_port=1,icmp,action=ct(commit),2
+priority=100,in_port=2,icmp,ct_state=-trk,action=ct(table=0)
+priority=100,in_port=2,icmp,ct_state=+trk+est,action=1
+])
+
+AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
+
+dnl Pings from ns0->ns1 should work fine.
+NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.2 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(10.1.1.2)], [0], [dnl
+icmp,orig=(src=10.1.1.1,dst=10.1.1.2,id=<cleared>,type=8,code=0),reply=(src=10.1.1.2,dst=10.1.1.1,id=<cleared>,type=0,code=0)
+])
+
+AT_CHECK([ovs-appctl dpctl/ct-set-maxconns one-bad-dp], [2], [], [dnl
+ovs-vswitchd: maxconns missing or malformed (Invalid argument)
+ovs-appctl: ovs-vswitchd: server returned an error
+])
+
+AT_CHECK([ovs-appctl dpctl/ct-set-maxconns a], [2], [], [dnl
+ovs-vswitchd: maxconns missing or malformed (Invalid argument)
+ovs-appctl: ovs-vswitchd: server returned an error
+])
+
+AT_CHECK([ovs-appctl dpctl/ct-set-maxconns one-bad-dp 10], [2], [], [dnl
+ovs-vswitchd: datapath not found (Invalid argument)
+ovs-appctl: ovs-vswitchd: server returned an error
+])
+
+AT_CHECK([ovs-appctl dpctl/ct-get-maxconns one-bad-dp], [2], [], [dnl
+ovs-vswitchd: datapath not found (Invalid argument)
+ovs-appctl: ovs-vswitchd: server returned an error
+])
+
+AT_CHECK([ovs-appctl dpctl/ct-get-nconns one-bad-dp], [2], [], [dnl
+ovs-vswitchd: datapath not found (Invalid argument)
+ovs-appctl: ovs-vswitchd: server returned an error
+])
+
+AT_CHECK([ovs-appctl dpctl/ct-get-nconns], [], [dnl
+1
+])
+
+AT_CHECK([ovs-appctl dpctl/ct-get-maxconns], [], [dnl
+3000000
+])
+
+AT_CHECK([ovs-appctl dpctl/ct-set-maxconns 10], [], [dnl
+setting maxconns successful
+])
+
+AT_CHECK([ovs-appctl dpctl/ct-get-maxconns], [], [dnl
+10
+])
+
+AT_CHECK([ovs-appctl dpctl/flush-conntrack])
+
+AT_CHECK([ovs-appctl dpctl/ct-get-nconns], [], [dnl
+0
+])
+
+AT_CHECK([ovs-appctl dpctl/ct-get-maxconns], [], [dnl
+10
+])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
+AT_SETUP([conntrack - IPv6 ping])
+CHECK_CONNTRACK()
+OVS_TRAFFIC_VSWITCHD_START()
+
+ADD_NAMESPACES(at_ns0, at_ns1)
+
+ADD_VETH_AFXDP(p0, at_ns0, br0, "fc00::1/96")
+ADD_VETH_AFXDP(p1, at_ns1, br0, "fc00::2/96")
+
+AT_DATA([flows.txt], [dnl
+
+dnl ICMPv6 echo request and reply go to table 1.  The rest of the traffic goes
+dnl through normal action.
+table=0,priority=10,icmp6,icmp_type=128,action=goto_table:1
+table=0,priority=10,icmp6,icmp_type=129,action=goto_table:1
+table=0,priority=1,action=normal
+
+dnl Allow everything from ns0->ns1. Only allow return traffic from ns1->ns0.
+table=1,priority=100,in_port=1,icmp6,action=ct(commit),2
+table=1,priority=100,in_port=2,icmp6,ct_state=-trk,action=ct(table=0)
+table=1,priority=100,in_port=2,icmp6,ct_state=+trk+est,action=1
+table=1,priority=1,action=drop
+])
+
+AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt])
+
+OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 1 fc00::2])
+
+dnl The above ping creates state in the connection tracker.  We're not
+dnl interested in that state.
+AT_CHECK([ovs-appctl dpctl/flush-conntrack])
+
+dnl Pings from ns1->ns0 should fail.
+NS_CHECK_EXEC([at_ns1], [ping6 -q -c 3 -i 0.3 -w 2 fc00::1 | FORMAT_PING], [0], [dnl
+7 packets transmitted, 0 received, 100% packet loss, time 0ms
+])
+
+dnl Pings from ns0->ns1 should work fine.
+NS_CHECK_EXEC([at_ns0], [ping6 -q -c 3 -i 0.3 -w 2 fc00::2 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(fc00::2)], [0], [dnl
+icmpv6,orig=(src=fc00::1,dst=fc00::2,id=<cleared>,type=128,code=0),reply=(src=fc00::2,dst=fc00::1,id=<cleared>,type=129,code=0)
+])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP