diff mbox series

[ovs-dev,v2] netdev-afxdp: Best-effort configuration of XDP mode.

Message ID 20191107113633.9630-1-i.maximets@ovn.org
State Accepted
Headers show
Series [ovs-dev,v2] netdev-afxdp: Best-effort configuration of XDP mode. | expand

Commit Message

Ilya Maximets Nov. 7, 2019, 11:36 a.m. UTC
Until now there was only two options for XDP mode in OVS: SKB or DRV.
i.e. 'generic XDP' or 'native XDP with zero-copy enabled'.

Devices like 'veth' interfaces in Linux supports native XDP, but
doesn't support zero-copy mode.  This case can not be covered by
existing API and we have to use slower generic XDP for such devices.
There are few more issues, e.g. TCP is not supported in generic XDP
mode for veth interfaces due to kernel limitations, however it is
supported in native mode.

This change introduces ability to use native XDP without zero-copy
along with best-effort configuration option that enabled by default.
In best-effort case OVS will sequentially try different modes starting
from the fastest one and will choose the first acceptable for current
interface.  This will guarantee the best possible performance.

If user will want to choose specific mode, it's still possible by
setting the 'options:xdp-mode'.

This change additionally changes the API by renaming the configuration
knob from 'xdpmode' to 'xdp-mode' and also renaming the modes
themselves to be more user-friendly.

The full list of currently supported modes:
  * native-with-zerocopy - former DRV
  * native               - new one, DRV without zero-copy
  * generic              - former SKB
  * best-effort          - new one, chooses the best available from
                           3 above modes

Since 'best-effort' is a default mode, users will not need to
explicitely set 'xdp-mode' in most cases.

TCP related tests enabled back in system afxdp testsuite, because
'best-effort' will choose 'native' mode for veth interfaces
and this mode has no issues with TCP.

Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
---

With this patch I modified the user-visible API, but I think it's OK
since it's still an experimental netdev.  Comments are welcome.

Version 2:
  * Rebased on current master.

 Documentation/intro/install/afxdp.rst |  54 ++++---
 NEWS                                  |  12 +-
 lib/netdev-afxdp.c                    | 223 ++++++++++++++++----------
 lib/netdev-afxdp.h                    |   9 ++
 lib/netdev-linux-private.h            |   8 +-
 tests/system-afxdp-macros.at          |   7 -
 vswitchd/vswitch.xml                  |  38 +++--
 7 files changed, 227 insertions(+), 124 deletions(-)

Comments

Eelco Chaudron Nov. 7, 2019, 12:39 p.m. UTC | #1
On 7 Nov 2019, at 12:36, Ilya Maximets wrote:

> Until now there was only two options for XDP mode in OVS: SKB or DRV.
> i.e. 'generic XDP' or 'native XDP with zero-copy enabled'.
>
> Devices like 'veth' interfaces in Linux supports native XDP, but
> doesn't support zero-copy mode.  This case can not be covered by
> existing API and we have to use slower generic XDP for such devices.
> There are few more issues, e.g. TCP is not supported in generic XDP
> mode for veth interfaces due to kernel limitations, however it is
> supported in native mode.
>
> This change introduces ability to use native XDP without zero-copy
> along with best-effort configuration option that enabled by default.
> In best-effort case OVS will sequentially try different modes starting
> from the fastest one and will choose the first acceptable for current
> interface.  This will guarantee the best possible performance.
>
> If user will want to choose specific mode, it's still possible by
> setting the 'options:xdp-mode'.
>
> This change additionally changes the API by renaming the configuration
> knob from 'xdpmode' to 'xdp-mode' and also renaming the modes
> themselves to be more user-friendly.
>
> The full list of currently supported modes:
>   * native-with-zerocopy - former DRV
>   * native               - new one, DRV without zero-copy
>   * generic              - former SKB
>   * best-effort          - new one, chooses the best available from
>                            3 above modes
>
> Since 'best-effort' is a default mode, users will not need to
> explicitely set 'xdp-mode' in most cases.
>
> TCP related tests enabled back in system afxdp testsuite, because
> 'best-effort' will choose 'native' mode for veth interfaces
> and this mode has no issues with TCP.
>
> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>

No review, I was running some performance tests for the OVS conference 
presentation so quickly tried this patch (assuming performance would 
increase :)…

So with the native option (default for tap) I see a decrease, :(,  in 
performance over the skb mode (this is with my usual ovs_perf PVP test 
setup).

With the patch applied, and default configuration 
(xdp-mode-in-use=native-with-zerocopy for my tap):

"Physical to Virtual to Physical test, L3 flows[port redirect]"
,                 Packet size
Number of flows,  64
    1,             711581
  100,             673152
1000,             617733

Here you will even see a performance drop with multiple IP flows (this 
is a single queue config).

With SKB mode (xdp-mode-in-use=generic):

"Physical to Virtual to Physical test, L3 flows[port redirect]"
,                 Packet size
Number of flows,  64
    1,             800796
  100,             800912
1000,             800133

Cheers,

Eelco
Ilya Maximets Nov. 7, 2019, 1:21 p.m. UTC | #2
On 07.11.2019 13:39, Eelco Chaudron wrote:
> 
> 
> On 7 Nov 2019, at 12:36, Ilya Maximets wrote:
> 
>> Until now there was only two options for XDP mode in OVS: SKB or DRV.
>> i.e. 'generic XDP' or 'native XDP with zero-copy enabled'.
>>
>> Devices like 'veth' interfaces in Linux supports native XDP, but
>> doesn't support zero-copy mode.  This case can not be covered by
>> existing API and we have to use slower generic XDP for such devices.
>> There are few more issues, e.g. TCP is not supported in generic XDP
>> mode for veth interfaces due to kernel limitations, however it is
>> supported in native mode.
>>
>> This change introduces ability to use native XDP without zero-copy
>> along with best-effort configuration option that enabled by default.
>> In best-effort case OVS will sequentially try different modes starting
>> from the fastest one and will choose the first acceptable for current
>> interface.  This will guarantee the best possible performance.
>>
>> If user will want to choose specific mode, it's still possible by
>> setting the 'options:xdp-mode'.
>>
>> This change additionally changes the API by renaming the configuration
>> knob from 'xdpmode' to 'xdp-mode' and also renaming the modes
>> themselves to be more user-friendly.
>>
>> The full list of currently supported modes:
>>   * native-with-zerocopy - former DRV
>>   * native               - new one, DRV without zero-copy
>>   * generic              - former SKB
>>   * best-effort          - new one, chooses the best available from
>>                            3 above modes
>>
>> Since 'best-effort' is a default mode, users will not need to
>> explicitely set 'xdp-mode' in most cases.
>>
>> TCP related tests enabled back in system afxdp testsuite, because
>> 'best-effort' will choose 'native' mode for veth interfaces
>> and this mode has no issues with TCP.
>>
>> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
> 
> No review, I was running some performance tests for the OVS conference presentation so quickly tried this patch (assuming performance would increase :)…
> 
> So with the native option (default for tap) I see a decrease, :(,  in performance over the skb mode (this is with my usual ovs_perf PVP test setup).
> 
> With the patch applied, and default configuration (xdp-mode-in-use=native-with-zerocopy for my tap):

Hmm. tap supports zero-copy? Interesting.
Have you tried forcing 'native' mode without zero-copy?
Maybe it will make sense to enable/disable need-wakeup feature.

> 
> "Physical to Virtual to Physical test, L3 flows[port redirect]"
> ,                 Packet size
> Number of flows,  64
>     1,             711581
>   100,             673152
> 1000,             617733
> 
> Here you will even see a performance drop with multiple IP flows (this is a single queue config).

This is strange..

> 
> With SKB mode (xdp-mode-in-use=generic):

I'm curious, does TCP work via tap interfaces in generic mode?

> 
> "Physical to Virtual to Physical test, L3 flows[port redirect]"
> ,                 Packet size
> Number of flows,  64
>     1,             800796
>   100,             800912
> 1000,             800133
> 
> Cheers,
> 
> Eelco
>
Eelco Chaudron Nov. 7, 2019, 1:28 p.m. UTC | #3
On 7 Nov 2019, at 14:21, Ilya Maximets wrote:

> On 07.11.2019 13:39, Eelco Chaudron wrote:
>>
>>
>> On 7 Nov 2019, at 12:36, Ilya Maximets wrote:
>>
>>> Until now there was only two options for XDP mode in OVS: SKB or 
>>> DRV.
>>> i.e. 'generic XDP' or 'native XDP with zero-copy enabled'.
>>>
>>> Devices like 'veth' interfaces in Linux supports native XDP, but
>>> doesn't support zero-copy mode.  This case can not be covered by
>>> existing API and we have to use slower generic XDP for such devices.
>>> There are few more issues, e.g. TCP is not supported in generic XDP
>>> mode for veth interfaces due to kernel limitations, however it is
>>> supported in native mode.
>>>
>>> This change introduces ability to use native XDP without zero-copy
>>> along with best-effort configuration option that enabled by default.
>>> In best-effort case OVS will sequentially try different modes 
>>> starting
>>> from the fastest one and will choose the first acceptable for 
>>> current
>>> interface.  This will guarantee the best possible performance.
>>>
>>> If user will want to choose specific mode, it's still possible by
>>> setting the 'options:xdp-mode'.
>>>
>>> This change additionally changes the API by renaming the 
>>> configuration
>>> knob from 'xdpmode' to 'xdp-mode' and also renaming the modes
>>> themselves to be more user-friendly.
>>>
>>> The full list of currently supported modes:
>>>   * native-with-zerocopy - former DRV
>>>   * native               - new one, DRV without 
>>> zero-copy
>>>   * generic              - former SKB
>>>   * best-effort          - new one, chooses the best 
>>> available from
>>>                            3 above modes
>>>
>>> Since 'best-effort' is a default mode, users will not need to
>>> explicitely set 'xdp-mode' in most cases.
>>>
>>> TCP related tests enabled back in system afxdp testsuite, because
>>> 'best-effort' will choose 'native' mode for veth interfaces
>>> and this mode has no issues with TCP.
>>>
>>> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
>>
>> No review, I was running some performance tests for the OVS 
>> conference presentation so quickly tried this patch (assuming 
>> performance would increase :)…
>>
>> So with the native option (default for tap) I see a decrease, :(,  
>> in performance over the skb mode (this is with my usual ovs_perf PVP 
>> test setup).
>>
>> With the patch applied, and default configuration 
>> (xdp-mode-in-use=native-with-zerocopy for my tap):
>
> Hmm. tap supports zero-copy? Interesting.
> Have you tried forcing 'native' mode without zero-copy?
> Maybe it will make sense to enable/disable need-wakeup feature.

Oops, wrong cut/paste: xdp-mode-in-use=native

All my the tests where with use-need-wakeup=true

   port 2: tapVM (afxdp: n_rxq=1, use-need-wakeup=true, 
xdp-mode=best-effort, xdp-mode-in-use=native)

>
>>
>> "Physical to Virtual to Physical test, L3 flows[port redirect]"
>> ,                 Packet size
>> Number of flows,  64
>>     1,             711581
>>   100,             673152
>> 1000,             617733
>>
>> Here you will even see a performance drop with multiple IP flows 
>> (this is a single queue config).
>
> This is strange..
>
>>
>> With SKB mode (xdp-mode-in-use=generic):
>
> I'm curious, does TCP work via tap interfaces in generic mode?
>
>>
>> "Physical to Virtual to Physical test, L3 flows[port redirect]"
>> ,                 Packet size
>> Number of flows,  64
>>     1,             800796
>>   100,             800912
>> 1000,             800133
>>
>> Cheers,
>>
>> Eelco
>>
William Tu Nov. 8, 2019, 1:56 p.m. UTC | #4
On Thu, Nov 07, 2019 at 02:28:18PM +0100, Eelco Chaudron wrote:
> 
> 
> On 7 Nov 2019, at 14:21, Ilya Maximets wrote:
> 
> >On 07.11.2019 13:39, Eelco Chaudron wrote:
> >>
> >>
> >>On 7 Nov 2019, at 12:36, Ilya Maximets wrote:
> >>
> >>>Until now there was only two options for XDP mode in OVS: SKB or DRV.
> >>>i.e. 'generic XDP' or 'native XDP with zero-copy enabled'.
> >>>
> >>>Devices like 'veth' interfaces in Linux supports native XDP, but
> >>>doesn't support zero-copy mode.  This case can not be covered by
> >>>existing API and we have to use slower generic XDP for such devices.
> >>>There are few more issues, e.g. TCP is not supported in generic XDP
> >>>mode for veth interfaces due to kernel limitations, however it is
> >>>supported in native mode.
> >>>
> >>>This change introduces ability to use native XDP without zero-copy
> >>>along with best-effort configuration option that enabled by default.
> >>>In best-effort case OVS will sequentially try different modes starting
> >>>from the fastest one and will choose the first acceptable for current
> >>>interface.  This will guarantee the best possible performance.
> >>>
> >>>If user will want to choose specific mode, it's still possible by
> >>>setting the 'options:xdp-mode'.
> >>>
> >>>This change additionally changes the API by renaming the configuration
> >>>knob from 'xdpmode' to 'xdp-mode' and also renaming the modes
> >>>themselves to be more user-friendly.
> >>>
> >>>The full list of currently supported modes:
> >>>  * native-with-zerocopy - former DRV
> >>>  * native               - new one, DRV without zero-copy
> >>>  * generic              - former SKB
> >>>  * best-effort          - new one, chooses the best available from
> >>>                           3 above modes
> >>>
> >>>Since 'best-effort' is a default mode, users will not need to
> >>>explicitely set 'xdp-mode' in most cases.
> >>>
> >>>TCP related tests enabled back in system afxdp testsuite, because
> >>>'best-effort' will choose 'native' mode for veth interfaces
> >>>and this mode has no issues with TCP.
> >>>
> >>>Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
> >>
> >>No review, I was running some performance tests for the OVS conference
> >>presentation so quickly tried this patch (assuming performance would
> >>increase :)…
> >>
> >>So with the native option (default for tap) I see a decrease, :(,  in
> >>performance over the skb mode (this is with my usual ovs_perf PVP test
> >>setup).
> >>
> >>With the patch applied, and default configuration
> >>(xdp-mode-in-use=native-with-zerocopy for my tap):
> >
> >Hmm. tap supports zero-copy? Interesting.

No, tap doesn't support zero-copy mode.

> >Have you tried forcing 'native' mode without zero-copy?
> >Maybe it will make sense to enable/disable need-wakeup feature.
> 
> Oops, wrong cut/paste: xdp-mode-in-use=native
> 
> All my the tests where with use-need-wakeup=true
> 
>   port 2: tapVM (afxdp: n_rxq=1, use-need-wakeup=true, xdp-mode=best-effort,
> xdp-mode-in-use=native)
> 
> >
> >>
> >>"Physical to Virtual to Physical test, L3 flows[port redirect]"
> >>,                 Packet size
> >>Number of flows,  64
> >>    1,             711581
> >>  100,             673152
> >>1000,             617733
> >>
> >>Here you will even see a performance drop with multiple IP flows (this
> >>is a single queue config).
> >
> >This is strange..

One explanation might be that
For using tap with native mode, OVS is sending XDP frame into the
kernel, but since tap device eventually has to create an skb and
forward to vhost_net then QEMU, so anyway we are having skb overhead.

On the other hand
For using tap with skb mode, there might be more optimization exist
in kernel, some batch allocation and batch rx/tx (See driver/net/tun.c)

--William
William Tu Nov. 8, 2019, 2:22 p.m. UTC | #5
On Thu, Nov 07, 2019 at 12:36:33PM +0100, Ilya Maximets wrote:
> Until now there was only two options for XDP mode in OVS: SKB or DRV.
> i.e. 'generic XDP' or 'native XDP with zero-copy enabled'.
> 
> Devices like 'veth' interfaces in Linux supports native XDP, but
> doesn't support zero-copy mode.  This case can not be covered by
> existing API and we have to use slower generic XDP for such devices.
> There are few more issues, e.g. TCP is not supported in generic XDP
> mode for veth interfaces due to kernel limitations, however it is
> supported in native mode.
> 
> This change introduces ability to use native XDP without zero-copy
> along with best-effort configuration option that enabled by default.
> In best-effort case OVS will sequentially try different modes starting
> from the fastest one and will choose the first acceptable for current
> interface.  This will guarantee the best possible performance.
> 
> If user will want to choose specific mode, it's still possible by
> setting the 'options:xdp-mode'.
> 
> This change additionally changes the API by renaming the configuration
> knob from 'xdpmode' to 'xdp-mode' and also renaming the modes
> themselves to be more user-friendly.
> 
> The full list of currently supported modes:
>   * native-with-zerocopy - former DRV
>   * native               - new one, DRV without zero-copy
>   * generic              - former SKB
>   * best-effort          - new one, chooses the best available from
>                            3 above modes

Since we are renaming the mode, in doc, should we tell user the mapping
of these mode to kernel AF_XDP's mode?
So let user know generic mode in OVS  = generic SKB in kernel,
native mode in OVS = native mode without zc...

> 
> Since 'best-effort' is a default mode, users will not need to
> explicitely set 'xdp-mode' in most cases.
> 
> TCP related tests enabled back in system afxdp testsuite, because
> 'best-effort' will choose 'native' mode for veth interfaces
> and this mode has no issues with TCP.
> 
> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
> ---

LGTM.
Acked-by: William Tu <u9012063@gmail.com>

I look through the patch and everything looks good.
Thanks for fixing the documentation.
'make check-afxdp' also pass the previous failed cases due to TCP.

It would also be good if this 'best-effort' mode is supported in libbpf.
I think other users of af_xdp also have the same configuration headache.

Regards,
William

> 
> With this patch I modified the user-visible API, but I think it's OK
> since it's still an experimental netdev.  Comments are welcome.
> 
> Version 2:
>   * Rebased on current master.
> 
>  Documentation/intro/install/afxdp.rst |  54 ++++---
>  NEWS                                  |  12 +-
>  lib/netdev-afxdp.c                    | 223 ++++++++++++++++----------
>  lib/netdev-afxdp.h                    |   9 ++
>  lib/netdev-linux-private.h            |   8 +-
>  tests/system-afxdp-macros.at          |   7 -
>  vswitchd/vswitch.xml                  |  38 +++--
>  7 files changed, 227 insertions(+), 124 deletions(-)
> 
> diff --git a/Documentation/intro/install/afxdp.rst b/Documentation/intro/install/afxdp.rst
> index a136db0c9..937770ad0 100644
> --- a/Documentation/intro/install/afxdp.rst
> +++ b/Documentation/intro/install/afxdp.rst
> @@ -153,9 +153,8 @@ To kick start end-to-end autotesting::
>    make check-afxdp TESTSUITEFLAGS='1'
>  
>  .. note::
> -   Not all test cases pass at this time. Currenly all TCP related
> -   tests, ex: using wget or http, are skipped due to XDP limitations
> -   on veth. cvlan test is also skipped.
> +   Not all test cases pass at this time. Currenly all cvlan tests are skipped
> +   due to kernel issues.
>  
>  If a test case fails, check the log at::
>  
> @@ -177,33 +176,35 @@ in :doc:`general`::
>    ovs-vsctl -- add-br br0 -- set Bridge br0 datapath_type=netdev
>  
>  Make sure your device driver support AF_XDP, netdev-afxdp supports
> -the following additional options (see man ovs-vswitchd.conf.db for
> +the following additional options (see ``man ovs-vswitchd.conf.db`` for
>  more details):
>  
> - * **xdpmode**: use "drv" for driver mode, or "skb" for skb mode.
> + * ``xdp-mode``: ``best-effort``, ``native-with-zerocopy``,
> +   ``native`` or ``generic``.  Defaults to ``best-effort``, i.e. best of
> +   supported modes, so in most cases you don't need to change it.
>  
> - * **use-need-wakeup**: default "true" if libbpf supports it, otherwise false.
> + * ``use-need-wakeup``: default ``true`` if libbpf supports it,
> +   otherwise ``false``.
>  
>  For example, to use 1 PMD (on core 4) on 1 queue (queue 0) device,
> -configure these options: **pmd-cpu-mask, pmd-rxq-affinity, and n_rxq**.
> -The **xdpmode** can be "drv" or "skb"::
> +configure these options: ``pmd-cpu-mask``, ``pmd-rxq-affinity``, and
> +``n_rxq``::
>  
>    ethtool -L enp2s0 combined 1
>    ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x10
>    ovs-vsctl add-port br0 enp2s0 -- set interface enp2s0 type="afxdp" \
> -    options:n_rxq=1 options:xdpmode=drv \
> -    other_config:pmd-rxq-affinity="0:4"
> +                                   other_config:pmd-rxq-affinity="0:4"
>  
>  Or, use 4 pmds/cores and 4 queues by doing::
>  
>    ethtool -L enp2s0 combined 4
>    ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x36
>    ovs-vsctl add-port br0 enp2s0 -- set interface enp2s0 type="afxdp" \
> -    options:n_rxq=4 options:xdpmode=drv \
> -    other_config:pmd-rxq-affinity="0:1,1:2,2:3,3:4"
> +    options:n_rxq=4 other_config:pmd-rxq-affinity="0:1,1:2,2:3,3:4"
>  
>  .. note::
> -   pmd-rxq-affinity is optional. If not specified, system will auto-assign.
> +   ``pmd-rxq-affinity`` is optional. If not specified, system will auto-assign.
> +   ``n_rxq`` equals ``1`` by default.
>  
>  To validate that the bridge has successfully instantiated, you can use the::
>  
> @@ -214,12 +215,21 @@ Should show something like::
>    Port "ens802f0"
>     Interface "ens802f0"
>        type: afxdp
> -      options: {n_rxq="1", xdpmode=drv}
> +      options: {n_rxq="1"}
>  
>  Otherwise, enable debugging by::
>  
>    ovs-appctl vlog/set netdev_afxdp::dbg
>  
> +To check which XDP mode was chosen by ``best-effort``, you can look for
> +``xdp-mode-in-use`` in the output of ``ovs-appctl dpctl/show``::
> +
> +  # ovs-appctl dpctl/show
> +  netdev@ovs-netdev:
> +    <...>
> +    port 2: ens802f0 (afxdp: n_rxq=1, use-need-wakeup=true,
> +                      xdp-mode=best-effort,
> +                      xdp-mode-in-use=native-with-zerocopy)
>  
>  References
>  ----------
> @@ -323,8 +333,11 @@ Limitations/Known Issues
>  #. Most of the tests are done using i40e single port. Multiple ports and
>     also ixgbe driver also needs to be tested.
>  #. No latency test result (TODO items)
> -#. Due to limitations of current upstream kernel, TCP and various offloading
> +#. Due to limitations of current upstream kernel, various offloading
>     (vlan, cvlan) is not working over virtual interfaces (i.e. veth pair).
> +   Also, TCP is not working over virtual interfaces in generic XDP mode.
> +   Some more information and possible workaround available `here
> +   <https://github.com/cilium/cilium/issues/3077#issuecomment-430801467>`__ .
>  
>  
>  PVP using tap device
> @@ -335,8 +348,7 @@ First, start OVS, then add physical port::
>    ethtool -L enp2s0 combined 1
>    ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x10
>    ovs-vsctl add-port br0 enp2s0 -- set interface enp2s0 type="afxdp" \
> -    options:n_rxq=1 options:xdpmode=drv \
> -    other_config:pmd-rxq-affinity="0:4"
> +    options:n_rxq=1 other_config:pmd-rxq-affinity="0:4"
>  
>  Start a VM with virtio and tap device::
>  
> @@ -414,13 +426,11 @@ Create namespace and veth peer devices::
>  
>  Attach the veth port to br0 (linux kernel mode)::
>  
> -  ovs-vsctl add-port br0 afxdp-p0 -- \
> -    set interface afxdp-p0 options:n_rxq=1
> +  ovs-vsctl add-port br0 afxdp-p0 -- set interface afxdp-p0
>  
> -Or, use AF_XDP with skb mode::
> +Or, use AF_XDP::
>  
> -  ovs-vsctl add-port br0 afxdp-p0 -- \
> -    set interface afxdp-p0 type="afxdp" options:n_rxq=1 options:xdpmode=skb
> +  ovs-vsctl add-port br0 afxdp-p0 -- set interface afxdp-p0 type="afxdp"
>  
>  Setup the OpenFlow rules::
>  
> diff --git a/NEWS b/NEWS
> index 88b818948..d5f476d6e 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -5,11 +5,19 @@ Post-v2.12.0
>         separate project. You can find it at
>         https://github.com/ovn-org/ovn.git
>     - Userspace datapath:
> +     * Add option to enable, disable and query TCP sequence checking in
> +       conntrack.
> +   - AF_XDP:
>       * New option 'use-need-wakeup' for netdev-afxdp to control enabling
>         of corresponding 'need_wakeup' flag in AF_XDP rings.  Enabled by default
>         if supported by libbpf.
> -     * Add option to enable, disable and query TCP sequence checking in
> -       conntrack.
> +     * 'xdpmode' option for netdev-afxdp renamed to 'xdp-mode'.
> +       Modes also updated.  New values:
> +         native-with-zerocopy  - former DRV
> +         native                - new one, DRV without zero-copy
> +         generic               - former SKB
> +         best-effort [default] - new one, chooses the best available from
> +                                 3 above modes
>  
>  v2.12.0 - 03 Sep 2019
>  ---------------------
> diff --git a/lib/netdev-afxdp.c b/lib/netdev-afxdp.c
> index af654d498..74dde219d 100644
> --- a/lib/netdev-afxdp.c
> +++ b/lib/netdev-afxdp.c
> @@ -89,12 +89,42 @@ BUILD_ASSERT_DECL(PROD_NUM_DESCS == CONS_NUM_DESCS);
>  #define UMEM2DESC(elem, base) ((uint64_t)((char *)elem - (char *)base))
>  
>  static struct xsk_socket_info *xsk_configure(int ifindex, int xdp_queue_id,
> -                                             int mode, bool use_need_wakeup);
> -static void xsk_remove_xdp_program(uint32_t ifindex, int xdpmode);
> +                                             enum afxdp_mode mode,
> +                                             bool use_need_wakeup,
> +                                             bool report_socket_failures);
> +static void xsk_remove_xdp_program(uint32_t ifindex, enum afxdp_mode);
>  static void xsk_destroy(struct xsk_socket_info *xsk);
>  static int xsk_configure_all(struct netdev *netdev);
>  static void xsk_destroy_all(struct netdev *netdev);
>  
> +static struct {
> +    const char *name;
> +    uint32_t bind_flags;
> +    uint32_t xdp_flags;
> +} xdp_modes[] = {
> +    [OVS_AF_XDP_MODE_UNSPEC] = {
> +        .name = "unspecified", .bind_flags = 0, .xdp_flags = 0,
> +    },
> +    [OVS_AF_XDP_MODE_BEST_EFFORT] = {
> +        .name = "best-effort", .bind_flags = 0, .xdp_flags = 0,
> +    },
> +    [OVS_AF_XDP_MODE_NATIVE_ZC] = {
> +        .name = "native-with-zerocopy",
> +        .bind_flags = XDP_ZEROCOPY,
> +        .xdp_flags = XDP_FLAGS_DRV_MODE,
> +    },
> +    [OVS_AF_XDP_MODE_NATIVE] = {
> +        .name = "native",
> +        .bind_flags = XDP_COPY,
> +        .xdp_flags = XDP_FLAGS_DRV_MODE,
> +    },
> +    [OVS_AF_XDP_MODE_GENERIC] = {
> +        .name = "generic",
> +        .bind_flags = XDP_COPY,
> +        .xdp_flags = XDP_FLAGS_SKB_MODE,
> +    },
> +};
> +
>  struct unused_pool {
>      struct xsk_umem_info *umem_info;
>      int lost_in_rings; /* Number of packets left in tx, rx, cq and fq. */
> @@ -214,7 +244,7 @@ netdev_afxdp_sweep_unused_pools(void *aux OVS_UNUSED)
>  }
>  
>  static struct xsk_umem_info *
> -xsk_configure_umem(void *buffer, uint64_t size, int xdpmode)
> +xsk_configure_umem(void *buffer, uint64_t size)
>  {
>      struct xsk_umem_config uconfig;
>      struct xsk_umem_info *umem;
> @@ -232,9 +262,7 @@ xsk_configure_umem(void *buffer, uint64_t size, int xdpmode)
>      ret = xsk_umem__create(&umem->umem, buffer, size, &umem->fq, &umem->cq,
>                             &uconfig);
>      if (ret) {
> -        VLOG_ERR("xsk_umem__create failed (%s) mode: %s",
> -                 ovs_strerror(errno),
> -                 xdpmode == XDP_COPY ? "SKB": "DRV");
> +        VLOG_ERR("xsk_umem__create failed: %s.", ovs_strerror(errno));
>          free(umem);
>          return NULL;
>      }
> @@ -290,7 +318,8 @@ xsk_configure_umem(void *buffer, uint64_t size, int xdpmode)
>  
>  static struct xsk_socket_info *
>  xsk_configure_socket(struct xsk_umem_info *umem, uint32_t ifindex,
> -                     uint32_t queue_id, int xdpmode, bool use_need_wakeup)
> +                     uint32_t queue_id, enum afxdp_mode mode,
> +                     bool use_need_wakeup, bool report_socket_failures)
>  {
>      struct xsk_socket_config cfg;
>      struct xsk_socket_info *xsk;
> @@ -304,14 +333,8 @@ xsk_configure_socket(struct xsk_umem_info *umem, uint32_t ifindex,
>      cfg.rx_size = CONS_NUM_DESCS;
>      cfg.tx_size = PROD_NUM_DESCS;
>      cfg.libbpf_flags = 0;
> -
> -    if (xdpmode == XDP_ZEROCOPY) {
> -        cfg.bind_flags = XDP_ZEROCOPY;
> -        cfg.xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST | XDP_FLAGS_DRV_MODE;
> -    } else {
> -        cfg.bind_flags = XDP_COPY;
> -        cfg.xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST | XDP_FLAGS_SKB_MODE;
> -    }
> +    cfg.bind_flags = xdp_modes[mode].bind_flags;
> +    cfg.xdp_flags = xdp_modes[mode].xdp_flags | XDP_FLAGS_UPDATE_IF_NOEXIST;
>  
>  #ifdef HAVE_XDP_NEED_WAKEUP
>      if (use_need_wakeup) {
> @@ -329,12 +352,11 @@ xsk_configure_socket(struct xsk_umem_info *umem, uint32_t ifindex,
>      ret = xsk_socket__create(&xsk->xsk, devname, queue_id, umem->umem,
>                               &xsk->rx, &xsk->tx, &cfg);
>      if (ret) {
> -        VLOG_ERR("xsk_socket__create failed (%s) mode: %s "
> -                 "use-need-wakeup: %s qid: %d",
> -                 ovs_strerror(errno),
> -                 xdpmode == XDP_COPY ? "SKB": "DRV",
> -                 use_need_wakeup ? "true" : "false",
> -                 queue_id);
> +        VLOG(report_socket_failures ? VLL_ERR : VLL_DBG,
> +             "xsk_socket__create failed (%s) mode: %s, "
> +             "use-need-wakeup: %s, qid: %d",
> +             ovs_strerror(errno), xdp_modes[mode].name,
> +             use_need_wakeup ? "true" : "false", queue_id);
>          free(xsk);
>          return NULL;
>      }
> @@ -375,8 +397,8 @@ xsk_configure_socket(struct xsk_umem_info *umem, uint32_t ifindex,
>  }
>  
>  static struct xsk_socket_info *
> -xsk_configure(int ifindex, int xdp_queue_id, int xdpmode,
> -              bool use_need_wakeup)
> +xsk_configure(int ifindex, int xdp_queue_id, enum afxdp_mode mode,
> +              bool use_need_wakeup, bool report_socket_failures)
>  {
>      struct xsk_socket_info *xsk;
>      struct xsk_umem_info *umem;
> @@ -389,9 +411,7 @@ xsk_configure(int ifindex, int xdp_queue_id, int xdpmode,
>      memset(bufs, 0, NUM_FRAMES * FRAME_SIZE);
>  
>      /* Create AF_XDP socket. */
> -    umem = xsk_configure_umem(bufs,
> -                              NUM_FRAMES * FRAME_SIZE,
> -                              xdpmode);
> +    umem = xsk_configure_umem(bufs, NUM_FRAMES * FRAME_SIZE);
>      if (!umem) {
>          free_pagealign(bufs);
>          return NULL;
> @@ -399,8 +419,8 @@ xsk_configure(int ifindex, int xdp_queue_id, int xdpmode,
>  
>      VLOG_DBG("Allocated umem pool at 0x%"PRIxPTR, (uintptr_t) umem);
>  
> -    xsk = xsk_configure_socket(umem, ifindex, xdp_queue_id, xdpmode,
> -                               use_need_wakeup);
> +    xsk = xsk_configure_socket(umem, ifindex, xdp_queue_id, mode,
> +                               use_need_wakeup, report_socket_failures);
>      if (!xsk) {
>          /* Clean up umem and xpacket pool. */
>          if (xsk_umem__delete(umem->umem)) {
> @@ -414,12 +434,38 @@ xsk_configure(int ifindex, int xdp_queue_id, int xdpmode,
>      return xsk;
>  }
>  
> +static int
> +xsk_configure_queue(struct netdev_linux *dev, int ifindex, int queue_id,
> +                    enum afxdp_mode mode, bool report_socket_failures)
> +{
> +    struct xsk_socket_info *xsk_info;
> +
> +    VLOG_DBG("%s: configuring queue: %d, mode: %s, use-need-wakeup: %s.",
> +             netdev_get_name(&dev->up), queue_id, xdp_modes[mode].name,
> +             dev->use_need_wakeup ? "true" : "false");
> +    xsk_info = xsk_configure(ifindex, queue_id, mode, dev->use_need_wakeup,
> +                             report_socket_failures);
> +    if (!xsk_info) {
> +        VLOG(report_socket_failures ? VLL_ERR : VLL_DBG,
> +             "%s: Failed to create AF_XDP socket on queue %d in %s mode.",
> +             netdev_get_name(&dev->up), queue_id, xdp_modes[mode].name);
> +        dev->xsks[queue_id] = NULL;
> +        return -1;
> +    }
> +    dev->xsks[queue_id] = xsk_info;
> +    atomic_init(&xsk_info->tx_dropped, 0);
> +    xsk_info->outstanding_tx = 0;
> +    xsk_info->available_rx = PROD_NUM_DESCS;
> +    return 0;
> +}
> +
> +
>  static int
>  xsk_configure_all(struct netdev *netdev)
>  {
>      struct netdev_linux *dev = netdev_linux_cast(netdev);
> -    struct xsk_socket_info *xsk_info;
>      int i, ifindex, n_rxq, n_txq;
> +    int qid = 0;
>  
>      ifindex = linux_get_ifindex(netdev_get_name(netdev));
>  
> @@ -429,23 +475,36 @@ xsk_configure_all(struct netdev *netdev)
>      n_rxq = netdev_n_rxq(netdev);
>      dev->xsks = xcalloc(n_rxq, sizeof *dev->xsks);
>  
> -    /* Configure each queue. */
> -    for (i = 0; i < n_rxq; i++) {
> -        VLOG_DBG("%s: configure queue %d mode %s use-need-wakeup %s.",
> -                 netdev_get_name(netdev), i,
> -                 dev->xdpmode == XDP_COPY ? "SKB" : "DRV",
> -                 dev->use_need_wakeup ? "true" : "false");
> -        xsk_info = xsk_configure(ifindex, i, dev->xdpmode,
> -                                 dev->use_need_wakeup);
> -        if (!xsk_info) {
> -            VLOG_ERR("Failed to create AF_XDP socket on queue %d.", i);
> -            dev->xsks[i] = NULL;
> +    if (dev->xdp_mode == OVS_AF_XDP_MODE_BEST_EFFORT) {
> +        /* Trying to configure first queue with different modes to
> +         * find the most suitable. */
> +        for (i = OVS_AF_XDP_MODE_NATIVE_ZC; i < OVS_AF_XDP_MODE_MAX; i++) {
> +            if (!xsk_configure_queue(dev, ifindex, qid, i,
> +                                     i == OVS_AF_XDP_MODE_MAX - 1)) {
> +                dev->xdp_mode_in_use = i;
> +                VLOG_INFO("%s: %s XDP mode will be in use.",
> +                          netdev_get_name(netdev), xdp_modes[i].name);
> +                break;
> +            }
> +        }
> +        if (i == OVS_AF_XDP_MODE_MAX) {
> +            VLOG_ERR("%s: Failed to detect suitable XDP mode.",
> +                     netdev_get_name(netdev));
> +            goto err;
> +        }
> +        qid++;
> +    } else {
> +        dev->xdp_mode_in_use = dev->xdp_mode;
> +    }
> +
> +    /* Configure remaining queues. */
> +    for (; qid < n_rxq; qid++) {
> +        if (xsk_configure_queue(dev, ifindex, qid,
> +                                dev->xdp_mode_in_use, true)) {
> +            VLOG_ERR("%s: Failed to create AF_XDP socket on queue %d.",
> +                     netdev_get_name(netdev), qid);
>              goto err;
>          }
> -        dev->xsks[i] = xsk_info;
> -        atomic_init(&xsk_info->tx_dropped, 0);
> -        xsk_info->outstanding_tx = 0;
> -        xsk_info->available_rx = PROD_NUM_DESCS;
>      }
>  
>      n_txq = netdev_n_txq(netdev);
> @@ -500,7 +559,7 @@ xsk_destroy_all(struct netdev *netdev)
>              if (dev->xsks[i]) {
>                  xsk_destroy(dev->xsks[i]);
>                  dev->xsks[i] = NULL;
> -                VLOG_INFO("Destroyed xsk[%d].", i);
> +                VLOG_DBG("%s: Destroyed xsk[%d].", netdev_get_name(netdev), i);
>              }
>          }
>  
> @@ -510,7 +569,7 @@ xsk_destroy_all(struct netdev *netdev)
>  
>      VLOG_INFO("%s: Removing xdp program.", netdev_get_name(netdev));
>      ifindex = linux_get_ifindex(netdev_get_name(netdev));
> -    xsk_remove_xdp_program(ifindex, dev->xdpmode);
> +    xsk_remove_xdp_program(ifindex, dev->xdp_mode_in_use);
>  
>      if (dev->tx_locks) {
>          for (i = 0; i < netdev_n_txq(netdev); i++) {
> @@ -526,9 +585,10 @@ netdev_afxdp_set_config(struct netdev *netdev, const struct smap *args,
>                          char **errp OVS_UNUSED)
>  {
>      struct netdev_linux *dev = netdev_linux_cast(netdev);
> -    const char *str_xdpmode;
> -    int xdpmode, new_n_rxq;
> +    const char *str_xdp_mode;
> +    enum afxdp_mode xdp_mode;
>      bool need_wakeup;
> +    int new_n_rxq;
>  
>      ovs_mutex_lock(&dev->mutex);
>      new_n_rxq = MAX(smap_get_int(args, "n_rxq", NR_QUEUE), 1);
> @@ -539,14 +599,17 @@ netdev_afxdp_set_config(struct netdev *netdev, const struct smap *args,
>          return EINVAL;
>      }
>  
> -    str_xdpmode = smap_get_def(args, "xdpmode", "skb");
> -    if (!strcasecmp(str_xdpmode, "drv")) {
> -        xdpmode = XDP_ZEROCOPY;
> -    } else if (!strcasecmp(str_xdpmode, "skb")) {
> -        xdpmode = XDP_COPY;
> -    } else {
> -        VLOG_ERR("%s: Incorrect xdpmode (%s).",
> -                 netdev_get_name(netdev), str_xdpmode);
> +    str_xdp_mode = smap_get_def(args, "xdp-mode", "best-effort");
> +    for (xdp_mode = OVS_AF_XDP_MODE_BEST_EFFORT;
> +         xdp_mode < OVS_AF_XDP_MODE_MAX;
> +         xdp_mode++) {
> +        if (!strcasecmp(str_xdp_mode, xdp_modes[xdp_mode].name)) {
> +            break;
> +        }
> +    }
> +    if (xdp_mode == OVS_AF_XDP_MODE_MAX) {
> +        VLOG_ERR("%s: Incorrect xdp-mode (%s).",
> +                 netdev_get_name(netdev), str_xdp_mode);
>          ovs_mutex_unlock(&dev->mutex);
>          return EINVAL;
>      }
> @@ -560,10 +623,10 @@ netdev_afxdp_set_config(struct netdev *netdev, const struct smap *args,
>  #endif
>  
>      if (dev->requested_n_rxq != new_n_rxq
> -        || dev->requested_xdpmode != xdpmode
> +        || dev->requested_xdp_mode != xdp_mode
>          || dev->requested_need_wakeup != need_wakeup) {
>          dev->requested_n_rxq = new_n_rxq;
> -        dev->requested_xdpmode = xdpmode;
> +        dev->requested_xdp_mode = xdp_mode;
>          dev->requested_need_wakeup = need_wakeup;
>          netdev_request_reconfigure(netdev);
>      }
> @@ -578,8 +641,9 @@ netdev_afxdp_get_config(const struct netdev *netdev, struct smap *args)
>  
>      ovs_mutex_lock(&dev->mutex);
>      smap_add_format(args, "n_rxq", "%d", netdev->n_rxq);
> -    smap_add_format(args, "xdpmode", "%s",
> -                    dev->xdpmode == XDP_ZEROCOPY ? "drv" : "skb");
> +    smap_add_format(args, "xdp-mode", "%s", xdp_modes[dev->xdp_mode].name);
> +    smap_add_format(args, "xdp-mode-in-use", "%s",
> +                    xdp_modes[dev->xdp_mode_in_use].name);
>      smap_add_format(args, "use-need-wakeup", "%s",
>                      dev->use_need_wakeup ? "true" : "false");
>      ovs_mutex_unlock(&dev->mutex);
> @@ -596,7 +660,7 @@ netdev_afxdp_reconfigure(struct netdev *netdev)
>      ovs_mutex_lock(&dev->mutex);
>  
>      if (netdev->n_rxq == dev->requested_n_rxq
> -        && dev->xdpmode == dev->requested_xdpmode
> +        && dev->xdp_mode == dev->requested_xdp_mode
>          && dev->use_need_wakeup == dev->requested_need_wakeup
>          && dev->xsks) {
>          goto out;
> @@ -607,9 +671,9 @@ netdev_afxdp_reconfigure(struct netdev *netdev)
>      netdev->n_rxq = dev->requested_n_rxq;
>      netdev->n_txq = netdev->n_rxq;
>  
> -    dev->xdpmode = dev->requested_xdpmode;
> +    dev->xdp_mode = dev->requested_xdp_mode;
>      VLOG_INFO("%s: Setting XDP mode to %s.", netdev_get_name(netdev),
> -              dev->xdpmode == XDP_ZEROCOPY ? "DRV" : "SKB");
> +              xdp_modes[dev->xdp_mode].name);
>  
>      if (setrlimit(RLIMIT_MEMLOCK, &r)) {
>          VLOG_ERR("setrlimit(RLIMIT_MEMLOCK) failed: %s", ovs_strerror(errno));
> @@ -618,7 +682,8 @@ netdev_afxdp_reconfigure(struct netdev *netdev)
>  
>      err = xsk_configure_all(netdev);
>      if (err) {
> -        VLOG_ERR("AF_XDP device %s reconfig failed.", netdev_get_name(netdev));
> +        VLOG_ERR("%s: AF_XDP device reconfiguration failed.",
> +                 netdev_get_name(netdev));
>      }
>      netdev_change_seq_changed(netdev);
>  out:
> @@ -638,17 +703,9 @@ netdev_afxdp_get_numa_id(const struct netdev *netdev)
>  }
>  
>  static void
> -xsk_remove_xdp_program(uint32_t ifindex, int xdpmode)
> +xsk_remove_xdp_program(uint32_t ifindex, enum afxdp_mode mode)
>  {
> -    uint32_t flags;
> -
> -    flags = XDP_FLAGS_UPDATE_IF_NOEXIST;
> -
> -    if (xdpmode == XDP_COPY) {
> -        flags |= XDP_FLAGS_SKB_MODE;
> -    } else if (xdpmode == XDP_ZEROCOPY) {
> -        flags |= XDP_FLAGS_DRV_MODE;
> -    }
> +    uint32_t flags = xdp_modes[mode].xdp_flags | XDP_FLAGS_UPDATE_IF_NOEXIST;
>  
>      bpf_set_link_xdp_fd(ifindex, -1, flags);
>  }
> @@ -662,7 +719,7 @@ signal_remove_xdp(struct netdev *netdev)
>      ifindex = linux_get_ifindex(netdev_get_name(netdev));
>  
>      VLOG_WARN("Force removing xdp program.");
> -    xsk_remove_xdp_program(ifindex, dev->xdpmode);
> +    xsk_remove_xdp_program(ifindex, dev->xdp_mode_in_use);
>  }
>  
>  static struct dp_packet_afxdp *
> @@ -782,7 +839,8 @@ netdev_afxdp_rxq_recv(struct netdev_rxq *rxq_, struct dp_packet_batch *batch,
>  }
>  
>  static inline int
> -kick_tx(struct xsk_socket_info *xsk_info, int xdpmode, bool use_need_wakeup)
> +kick_tx(struct xsk_socket_info *xsk_info, enum afxdp_mode mode,
> +        bool use_need_wakeup)
>  {
>      int ret, retries;
>      static const int KERNEL_TX_BATCH_SIZE = 16;
> @@ -791,11 +849,11 @@ kick_tx(struct xsk_socket_info *xsk_info, int xdpmode, bool use_need_wakeup)
>          return 0;
>      }
>  
> -    /* In SKB_MODE packet transmission is synchronous, and the kernel xmits
> +    /* In generic mode packet transmission is synchronous, and the kernel xmits
>       * only TX_BATCH_SIZE(16) packets for a single sendmsg syscall.
>       * So, we have to kick the kernel (n_packets / 16) times to be sure that
>       * all packets are transmitted. */
> -    retries = (xdpmode == XDP_COPY)
> +    retries = (mode == OVS_AF_XDP_MODE_GENERIC)
>                ? xsk_info->outstanding_tx / KERNEL_TX_BATCH_SIZE
>                : 0;
>  kick_retry:
> @@ -962,7 +1020,7 @@ __netdev_afxdp_batch_send(struct netdev *netdev, int qid,
>                             &orig);
>          COVERAGE_INC(afxdp_tx_full);
>          afxdp_complete_tx(xsk_info);
> -        kick_tx(xsk_info, dev->xdpmode, dev->use_need_wakeup);
> +        kick_tx(xsk_info, dev->xdp_mode_in_use, dev->use_need_wakeup);
>          error = ENOMEM;
>          goto out;
>      }
> @@ -986,7 +1044,7 @@ __netdev_afxdp_batch_send(struct netdev *netdev, int qid,
>      xsk_ring_prod__submit(&xsk_info->tx, dp_packet_batch_size(batch));
>      xsk_info->outstanding_tx += dp_packet_batch_size(batch);
>  
> -    ret = kick_tx(xsk_info, dev->xdpmode, dev->use_need_wakeup);
> +    ret = kick_tx(xsk_info, dev->xdp_mode_in_use, dev->use_need_wakeup);
>      if (OVS_UNLIKELY(ret)) {
>          VLOG_WARN_RL(&rl, "%s: error sending AF_XDP packet: %s.",
>                       netdev_get_name(netdev), ovs_strerror(ret));
> @@ -1052,10 +1110,11 @@ netdev_afxdp_construct(struct netdev *netdev)
>      /* Queues should not be used before the first reconfiguration. Clearing. */
>      netdev->n_rxq = 0;
>      netdev->n_txq = 0;
> -    dev->xdpmode = 0;
> +    dev->xdp_mode = OVS_AF_XDP_MODE_UNSPEC;
> +    dev->xdp_mode_in_use = OVS_AF_XDP_MODE_UNSPEC;
>  
>      dev->requested_n_rxq = NR_QUEUE;
> -    dev->requested_xdpmode = XDP_COPY;
> +    dev->requested_xdp_mode = OVS_AF_XDP_MODE_BEST_EFFORT;
>      dev->requested_need_wakeup = NEED_WAKEUP_DEFAULT;
>  
>      dev->xsks = NULL;
> diff --git a/lib/netdev-afxdp.h b/lib/netdev-afxdp.h
> index e2f400b72..4fe861d2d 100644
> --- a/lib/netdev-afxdp.h
> +++ b/lib/netdev-afxdp.h
> @@ -25,6 +25,15 @@
>  /* These functions are Linux AF_XDP specific, so they should be used directly
>   * only by Linux-specific code. */
>  
> +enum afxdp_mode {
> +    OVS_AF_XDP_MODE_UNSPEC,
> +    OVS_AF_XDP_MODE_BEST_EFFORT,
> +    OVS_AF_XDP_MODE_NATIVE_ZC,
> +    OVS_AF_XDP_MODE_NATIVE,
> +    OVS_AF_XDP_MODE_GENERIC,
> +    OVS_AF_XDP_MODE_MAX,
> +};
> +
>  struct netdev;
>  struct xsk_socket_info;
>  struct xdp_umem;
> diff --git a/lib/netdev-linux-private.h b/lib/netdev-linux-private.h
> index c14f2fb81..8873caa9d 100644
> --- a/lib/netdev-linux-private.h
> +++ b/lib/netdev-linux-private.h
> @@ -100,10 +100,14 @@ struct netdev_linux {
>      /* AF_XDP information. */
>      struct xsk_socket_info **xsks;
>      int requested_n_rxq;
> -    int xdpmode;                /* AF_XDP running mode: driver or skb. */
> -    int requested_xdpmode;
> +
> +    enum afxdp_mode xdp_mode;               /* Configured AF_XDP mode. */
> +    enum afxdp_mode requested_xdp_mode;     /* Requested  AF_XDP mode. */
> +    enum afxdp_mode xdp_mode_in_use;        /* Effective  AF_XDP mode. */
> +
>      bool use_need_wakeup;
>      bool requested_need_wakeup;
> +
>      struct ovs_spin *tx_locks;  /* spin lock array for TX queues. */
>  #endif
>  };
> diff --git a/tests/system-afxdp-macros.at b/tests/system-afxdp-macros.at
> index f0683c0a9..5ee2ceb1a 100644
> --- a/tests/system-afxdp-macros.at
> +++ b/tests/system-afxdp-macros.at
> @@ -30,10 +30,3 @@ m4_define([CONFIGURE_VETH_OFFLOADS],
>       AT_CHECK([ethtool -K $1 txvlan off], [0], [ignore], [ignore])
>      ]
>  )
> -
> -# OVS_START_L7([namespace], [protocol])
> -#
> -# AF_XDP doesn't work with TCP over virtual interfaces for now.
> -#
> -m4_define([OVS_START_L7],
> -   [AT_SKIP_IF([:])])
> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
> index efdfb83bb..02a68deb1 100644
> --- a/vswitchd/vswitch.xml
> +++ b/vswitchd/vswitch.xml
> @@ -3107,18 +3107,38 @@ ovs-vsctl add-port br0 p0 -- set Interface p0 type=patch options:peer=p1 \
>          </p>
>        </column>
>  
> -      <column name="options" key="xdpmode"
> +      <column name="options" key="xdp-mode"
>                type='{"type": "string",
> -                     "enum": ["set", ["skb", "drv"]]}'>
> +                     "enum": ["set", ["best-effort", "native-with-zerocopy",
> +                                      "native", "generic"]]}'>
>          <p>
>            Specifies the operational mode of the XDP program.
> -          If "drv", the XDP program is loaded into the device driver with
> -          zero-copy RX and TX enabled. This mode requires device driver with
> -          AF_XDP support and has the best performance.
> -          If "skb", the XDP program is using generic XDP mode in kernel with
> -          extra data copying between userspace and kernel. No device driver
> -          support is needed. Note that this is afxdp netdev type only.
> -          Defaults to "skb" mode.
> +          <p>
> +            In <code>native-with-zerocopy</code> mode the XDP program is loaded
> +            into the device driver with zero-copy RX and TX enabled.  This mode
> +            requires device driver support and has the best performance because
> +            there should be no copying of packets.
> +          </p>
> +          <p>
> +            <code>native</code> is the same as
> +            <code>native-with-zerocopy</code>, but without zero-copy
> +            capability.  This requires at least one copy between kernel and the
> +            userspace. This mode also requires support from device driver.
> +          </p>
> +          <p>
> +            In <code>generic</code> case the XDP program in kernel works after
> +            skb allocation on early stages of packet processing inside the
> +            network stack.  This mode doesn't require driver support, but has
> +            much lower performance.
> +          </p>
> +          <p>
> +            <code>best-effort</code> tries to detect and choose the best
> +            (fastest) from the available modes for current interface.
> +          </p>
> +          <p>
> +            Note that this option is specific to netdev-afxdp.
> +            Defaults to <code>best-effort</code> mode.
> +          </p>
>          </p>
>        </column>
>  
> -- 
> 2.17.1
>
William Tu Nov. 11, 2019, 5:46 p.m. UTC | #6
On Thu, Nov 07, 2019 at 01:39:28PM +0100, Eelco Chaudron wrote:
> 
> 
> On 7 Nov 2019, at 12:36, Ilya Maximets wrote:
> 
> >Until now there was only two options for XDP mode in OVS: SKB or DRV.
> >i.e. 'generic XDP' or 'native XDP with zero-copy enabled'.
> >
> >Devices like 'veth' interfaces in Linux supports native XDP, but
> >doesn't support zero-copy mode.  This case can not be covered by
> >existing API and we have to use slower generic XDP for such devices.
> >There are few more issues, e.g. TCP is not supported in generic XDP
> >mode for veth interfaces due to kernel limitations, however it is
> >supported in native mode.
> >
> >This change introduces ability to use native XDP without zero-copy
> >along with best-effort configuration option that enabled by default.
> >In best-effort case OVS will sequentially try different modes starting
> >from the fastest one and will choose the first acceptable for current
> >interface.  This will guarantee the best possible performance.
> >
> >If user will want to choose specific mode, it's still possible by
> >setting the 'options:xdp-mode'.
> >
> >This change additionally changes the API by renaming the configuration
> >knob from 'xdpmode' to 'xdp-mode' and also renaming the modes
> >themselves to be more user-friendly.
> >
> >The full list of currently supported modes:
> >  * native-with-zerocopy - former DRV
> >  * native               - new one, DRV without zero-copy
> >  * generic              - former SKB
> >  * best-effort          - new one, chooses the best available from
> >                           3 above modes
> >
> >Since 'best-effort' is a default mode, users will not need to
> >explicitely set 'xdp-mode' in most cases.
> >
> >TCP related tests enabled back in system afxdp testsuite, because
> >'best-effort' will choose 'native' mode for veth interfaces
> >and this mode has no issues with TCP.
> >
> >Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
> 
> No review, I was running some performance tests for the OVS conference
> presentation so quickly tried this patch (assuming performance would
> increase :)…
> 
> So with the native option (default for tap) I see a decrease, :(,  in
> performance over the skb mode (this is with my usual ovs_perf PVP test
> setup).
> 
> With the patch applied, and default configuration
> (xdp-mode-in-use=native-with-zerocopy for my tap):
> 
> "Physical to Virtual to Physical test, L3 flows[port redirect]"
> ,                 Packet size
> Number of flows,  64
>    1,             711581
>  100,             673152
> 1000,             617733
> 
> Here you will even see a performance drop with multiple IP flows (this is a
> single queue config).
> 
> With SKB mode (xdp-mode-in-use=generic):
> 
> "Physical to Virtual to Physical test, L3 flows[port redirect]"
> ,                 Packet size
> Number of flows,  64
>    1,             800796
>  100,             800912
> 1000,             800133
> 

Hi Eelco,

I have another question (not directly related to this patch).

When doing PVP testing, what's the total CPU consumption?
I'm trying to think about the efficiency, ex:
  Packet forwarding rate / CPU utilization

For native-with-zerocopy, is it correct that your system uses
  ksoftirqd/x 100% (rx port)
  ksoftirqd/y 100% (tx port)
  qemu-kvm    100%
  vhost_net   100%
  ovs-vswitchd 100% (pmd thead)

Or with need_wakeup, it's lower than this number?

Thanks
William
William Tu Nov. 13, 2019, 12:21 a.m. UTC | #7
On Thu, Nov 7, 2019 at 3:36 AM Ilya Maximets <i.maximets@ovn.org> wrote:
>
> Until now there was only two options for XDP mode in OVS: SKB or DRV.
> i.e. 'generic XDP' or 'native XDP with zero-copy enabled'.
>
> Devices like 'veth' interfaces in Linux supports native XDP, but
> doesn't support zero-copy mode.  This case can not be covered by
> existing API and we have to use slower generic XDP for such devices.
> There are few more issues, e.g. TCP is not supported in generic XDP
> mode for veth interfaces due to kernel limitations, however it is
> supported in native mode.
>
> This change introduces ability to use native XDP without zero-copy
> along with best-effort configuration option that enabled by default.
> In best-effort case OVS will sequentially try different modes starting
> from the fastest one and will choose the first acceptable for current
> interface.  This will guarantee the best possible performance.
>
> If user will want to choose specific mode, it's still possible by
> setting the 'options:xdp-mode'.
>
> This change additionally changes the API by renaming the configuration
> knob from 'xdpmode' to 'xdp-mode' and also renaming the modes
> themselves to be more user-friendly.
>
> The full list of currently supported modes:
>   * native-with-zerocopy - former DRV
>   * native               - new one, DRV without zero-copy
>   * generic              - former SKB
>   * best-effort          - new one, chooses the best available from
>                            3 above modes
>
> Since 'best-effort' is a default mode, users will not need to
> explicitely set 'xdp-mode' in most cases.
>
> TCP related tests enabled back in system afxdp testsuite, because
> 'best-effort' will choose 'native' mode for veth interfaces
> and this mode has no issues with TCP.
>
> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
> ---

I'm thinking about adding some tests case and patches depends on
this API change. If there is no more comment, I will apply to master.

Regards,
William
Ilya Maximets Nov. 13, 2019, 1:06 p.m. UTC | #8
On 08.11.2019 15:22, William Tu wrote:
> On Thu, Nov 07, 2019 at 12:36:33PM +0100, Ilya Maximets wrote:
>> Until now there was only two options for XDP mode in OVS: SKB or DRV.
>> i.e. 'generic XDP' or 'native XDP with zero-copy enabled'.
>>
>> Devices like 'veth' interfaces in Linux supports native XDP, but
>> doesn't support zero-copy mode.  This case can not be covered by
>> existing API and we have to use slower generic XDP for such devices.
>> There are few more issues, e.g. TCP is not supported in generic XDP
>> mode for veth interfaces due to kernel limitations, however it is
>> supported in native mode.
>>
>> This change introduces ability to use native XDP without zero-copy
>> along with best-effort configuration option that enabled by default.
>> In best-effort case OVS will sequentially try different modes starting
>> from the fastest one and will choose the first acceptable for current
>> interface.  This will guarantee the best possible performance.
>>
>> If user will want to choose specific mode, it's still possible by
>> setting the 'options:xdp-mode'.
>>
>> This change additionally changes the API by renaming the configuration
>> knob from 'xdpmode' to 'xdp-mode' and also renaming the modes
>> themselves to be more user-friendly.
>>
>> The full list of currently supported modes:
>>   * native-with-zerocopy - former DRV
>>   * native               - new one, DRV without zero-copy
>>   * generic              - former SKB
>>   * best-effort          - new one, chooses the best available from
>>                            3 above modes
> 
> Since we are renaming the mode, in doc, should we tell user the mapping
> of these mode to kernel AF_XDP's mode?
> So let user know generic mode in OVS  = generic SKB in kernel,
> native mode in OVS = native mode without zc...

It might make sense to document that 'generic' uses 'XDP_SKB'
and 'native' uses 'XDP_DRV', but it seems that 'generic'/'native'
terms are widely used in XDP related community and projects, i.e.
should be well known.

Best regards, Ilya Maximets.
Ilya Maximets Nov. 13, 2019, 1:23 p.m. UTC | #9
On 13.11.2019 1:21, William Tu wrote:
> On Thu, Nov 7, 2019 at 3:36 AM Ilya Maximets <i.maximets@ovn.org> wrote:
>>
>> Until now there was only two options for XDP mode in OVS: SKB or DRV.
>> i.e. 'generic XDP' or 'native XDP with zero-copy enabled'.
>>
>> Devices like 'veth' interfaces in Linux supports native XDP, but
>> doesn't support zero-copy mode.  This case can not be covered by
>> existing API and we have to use slower generic XDP for such devices.
>> There are few more issues, e.g. TCP is not supported in generic XDP
>> mode for veth interfaces due to kernel limitations, however it is
>> supported in native mode.
>>
>> This change introduces ability to use native XDP without zero-copy
>> along with best-effort configuration option that enabled by default.
>> In best-effort case OVS will sequentially try different modes starting
>> from the fastest one and will choose the first acceptable for current
>> interface.  This will guarantee the best possible performance.
>>
>> If user will want to choose specific mode, it's still possible by
>> setting the 'options:xdp-mode'.
>>
>> This change additionally changes the API by renaming the configuration
>> knob from 'xdpmode' to 'xdp-mode' and also renaming the modes
>> themselves to be more user-friendly.
>>
>> The full list of currently supported modes:
>>   * native-with-zerocopy - former DRV
>>   * native               - new one, DRV without zero-copy
>>   * generic              - former SKB
>>   * best-effort          - new one, chooses the best available from
>>                            3 above modes
>>
>> Since 'best-effort' is a default mode, users will not need to
>> explicitely set 'xdp-mode' in most cases.
>>
>> TCP related tests enabled back in system afxdp testsuite, because
>> 'best-effort' will choose 'native' mode for veth interfaces
>> and this mode has no issues with TCP.
>>
>> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
>> ---
> 
> I'm thinking about adding some tests case and patches depends on
> this API change. If there is no more comment, I will apply to master.

I think, it might be good to wait for some comments from Eelco.

@Eelco, do you have a plan to review the code? Do you agree with the
API update?

Best regards, Ilya Maximets.
Eelco Chaudron Nov. 14, 2019, 11:08 a.m. UTC | #10
On 13 Nov 2019, at 14:23, Ilya Maximets wrote:

> On 13.11.2019 1:21, William Tu wrote:
>> On Thu, Nov 7, 2019 at 3:36 AM Ilya Maximets <i.maximets@ovn.org> 
>> wrote:
>>>
>>> Until now there was only two options for XDP mode in OVS: SKB or 
>>> DRV.
>>> i.e. 'generic XDP' or 'native XDP with zero-copy enabled'.
>>>
>>> Devices like 'veth' interfaces in Linux supports native XDP, but
>>> doesn't support zero-copy mode.  This case can not be covered by
>>> existing API and we have to use slower generic XDP for such devices.
>>> There are few more issues, e.g. TCP is not supported in generic XDP
>>> mode for veth interfaces due to kernel limitations, however it is
>>> supported in native mode.
>>>
>>> This change introduces ability to use native XDP without zero-copy
>>> along with best-effort configuration option that enabled by default.
>>> In best-effort case OVS will sequentially try different modes 
>>> starting
>>> from the fastest one and will choose the first acceptable for 
>>> current
>>> interface.  This will guarantee the best possible performance.
>>>
>>> If user will want to choose specific mode, it's still possible by
>>> setting the 'options:xdp-mode'.
>>>
>>> This change additionally changes the API by renaming the 
>>> configuration
>>> knob from 'xdpmode' to 'xdp-mode' and also renaming the modes
>>> themselves to be more user-friendly.
>>>
>>> The full list of currently supported modes:
>>>   * native-with-zerocopy - former DRV
>>>   * native               - new one, DRV without zero-copy
>>>   * generic              - former SKB
>>>   * best-effort          - new one, chooses the best available from
>>>                            3 above modes
>>>
>>> Since 'best-effort' is a default mode, users will not need to
>>> explicitely set 'xdp-mode' in most cases.
>>>
>>> TCP related tests enabled back in system afxdp testsuite, because
>>> 'best-effort' will choose 'native' mode for veth interfaces
>>> and this mode has no issues with TCP.
>>>
>>> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
>>> ---
>>
>> I'm thinking about adding some tests case and patches depends on
>> this API change. If there is no more comment, I will apply to master.
>
> I think, it might be good to wait for some comments from Eelco.
>
> @Eelco, do you have a plan to review the code? Do you agree with the
> API update?

I’m rather busy this week, will try to review it next week, and I’m 
ok with the API change.

The only thing that bothers me is the worse performance of the TAP 
interface with the new default config.
Eelco Chaudron Nov. 19, 2019, 4:16 p.m. UTC | #11
On 7 Nov 2019, at 12:36, Ilya Maximets wrote:

> Until now there was only two options for XDP mode in OVS: SKB or DRV.
> i.e. 'generic XDP' or 'native XDP with zero-copy enabled'.
>
> Devices like 'veth' interfaces in Linux supports native XDP, but
> doesn't support zero-copy mode.  This case can not be covered by
> existing API and we have to use slower generic XDP for such devices.
> There are few more issues, e.g. TCP is not supported in generic XDP
> mode for veth interfaces due to kernel limitations, however it is
> supported in native mode.
>
> This change introduces ability to use native XDP without zero-copy
> along with best-effort configuration option that enabled by default.
> In best-effort case OVS will sequentially try different modes starting
> from the fastest one and will choose the first acceptable for current
> interface.  This will guarantee the best possible performance.
>
> If user will want to choose specific mode, it's still possible by
> setting the 'options:xdp-mode'.
>
> This change additionally changes the API by renaming the configuration
> knob from 'xdpmode' to 'xdp-mode' and also renaming the modes
> themselves to be more user-friendly.
>
> The full list of currently supported modes:
>   * native-with-zerocopy - former DRV
>   * native               - new one, DRV without zero-copy
>   * generic              - former SKB
>   * best-effort          - new one, chooses the best available from
>                            3 above modes
>
> Since 'best-effort' is a default mode, users will not need to
> explicitely set 'xdp-mode' in most cases.
>
> TCP related tests enabled back in system afxdp testsuite, because
> 'best-effort' will choose 'native' mode for veth interfaces
> and this mode has no issues with TCP.
Patch in general looks good, two small comments inline.

The only thing that bothers me is the worse performance of the TAP 
interface with the new default config. Can we somehow keep the old 
behavior for TAP interfaces?

> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
> ---
>
> With this patch I modified the user-visible API, but I think it's OK
> since it's still an experimental netdev.  Comments are welcome.
>
> Version 2:
>   * Rebased on current master.
>
>  Documentation/intro/install/afxdp.rst |  54 ++++---
>  NEWS                                  |  12 +-
>  lib/netdev-afxdp.c                    | 223 
> ++++++++++++++++----------
>  lib/netdev-afxdp.h                    |   9 ++
>  lib/netdev-linux-private.h            |   8 +-
>  tests/system-afxdp-macros.at          |   7 -
>  vswitchd/vswitch.xml                  |  38 +++--
>  7 files changed, 227 insertions(+), 124 deletions(-)
>
> diff --git a/Documentation/intro/install/afxdp.rst 
> b/Documentation/intro/install/afxdp.rst
> index a136db0c9..937770ad0 100644
> --- a/Documentation/intro/install/afxdp.rst
> +++ b/Documentation/intro/install/afxdp.rst
> @@ -153,9 +153,8 @@ To kick start end-to-end autotesting::
>    make check-afxdp TESTSUITEFLAGS='1'
>
>  .. note::
> -   Not all test cases pass at this time. Currenly all TCP related
> -   tests, ex: using wget or http, are skipped due to XDP limitations
> -   on veth. cvlan test is also skipped.
> +   Not all test cases pass at this time. Currenly all cvlan tests are 
> skipped
> +   due to kernel issues.
>
>  If a test case fails, check the log at::
>
> @@ -177,33 +176,35 @@ in :doc:`general`::
>    ovs-vsctl -- add-br br0 -- set Bridge br0 datapath_type=netdev
>
>  Make sure your device driver support AF_XDP, netdev-afxdp supports
> -the following additional options (see man ovs-vswitchd.conf.db for
> +the following additional options (see ``man ovs-vswitchd.conf.db`` 
> for
>  more details):
>
> - * **xdpmode**: use "drv" for driver mode, or "skb" for skb mode.
> + * ``xdp-mode``: ``best-effort``, ``native-with-zerocopy``,
> +   ``native`` or ``generic``.  Defaults to ``best-effort``, i.e. best 
> of
> +   supported modes, so in most cases you don't need to change it.
>
> - * **use-need-wakeup**: default "true" if libbpf supports it, 
> otherwise false.
> + * ``use-need-wakeup``: default ``true`` if libbpf supports it,
> +   otherwise ``false``.
>
>  For example, to use 1 PMD (on core 4) on 1 queue (queue 0) device,
> -configure these options: **pmd-cpu-mask, pmd-rxq-affinity, and 
> n_rxq**.
> -The **xdpmode** can be "drv" or "skb"::
> +configure these options: ``pmd-cpu-mask``, ``pmd-rxq-affinity``, and
> +``n_rxq``::
>
>    ethtool -L enp2s0 combined 1
>    ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x10
>    ovs-vsctl add-port br0 enp2s0 -- set interface enp2s0 type="afxdp" 
> \
> -    options:n_rxq=1 options:xdpmode=drv \
> -    other_config:pmd-rxq-affinity="0:4"
> +                                   
> other_config:pmd-rxq-affinity="0:4"
>
>  Or, use 4 pmds/cores and 4 queues by doing::
>
>    ethtool -L enp2s0 combined 4
>    ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x36
>    ovs-vsctl add-port br0 enp2s0 -- set interface enp2s0 type="afxdp" 
> \
> -    options:n_rxq=4 options:xdpmode=drv \
> -    other_config:pmd-rxq-affinity="0:1,1:2,2:3,3:4"
> +    options:n_rxq=4 other_config:pmd-rxq-affinity="0:1,1:2,2:3,3:4"
>
>  .. note::
> -   pmd-rxq-affinity is optional. If not specified, system will 
> auto-assign.
> +   ``pmd-rxq-affinity`` is optional. If not specified, system will 
> auto-assign.
> +   ``n_rxq`` equals ``1`` by default.
>
>  To validate that the bridge has successfully instantiated, you can 
> use the::
>
> @@ -214,12 +215,21 @@ Should show something like::
>    Port "ens802f0"
>     Interface "ens802f0"
>        type: afxdp
> -      options: {n_rxq="1", xdpmode=drv}
> +      options: {n_rxq="1"}
>
>  Otherwise, enable debugging by::
>
>    ovs-appctl vlog/set netdev_afxdp::dbg
>
> +To check which XDP mode was chosen by ``best-effort``, you can look 
> for
> +``xdp-mode-in-use`` in the output of ``ovs-appctl dpctl/show``::
> +
> +  # ovs-appctl dpctl/show
> +  netdev@ovs-netdev:
> +    <...>
> +    port 2: ens802f0 (afxdp: n_rxq=1, use-need-wakeup=true,
> +                      xdp-mode=best-effort,
> +                      xdp-mode-in-use=native-with-zerocopy)
>
>  References
>  ----------
> @@ -323,8 +333,11 @@ Limitations/Known Issues
>  #. Most of the tests are done using i40e single port. Multiple ports 
> and
>     also ixgbe driver also needs to be tested.
>  #. No latency test result (TODO items)
> -#. Due to limitations of current upstream kernel, TCP and various 
> offloading
> +#. Due to limitations of current upstream kernel, various offloading
>     (vlan, cvlan) is not working over virtual interfaces (i.e. veth 
> pair).
> +   Also, TCP is not working over virtual interfaces in generic XDP 
> mode.
> +   Some more information and possible workaround available `here
> +   
> <https://github.com/cilium/cilium/issues/3077#issuecomment-430801467>`__ 
> .
>
>
>  PVP using tap device
> @@ -335,8 +348,7 @@ First, start OVS, then add physical port::
>    ethtool -L enp2s0 combined 1
>    ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x10
>    ovs-vsctl add-port br0 enp2s0 -- set interface enp2s0 type="afxdp" 
> \
> -    options:n_rxq=1 options:xdpmode=drv \
> -    other_config:pmd-rxq-affinity="0:4"
> +    options:n_rxq=1 other_config:pmd-rxq-affinity="0:4"
>
>  Start a VM with virtio and tap device::
>
> @@ -414,13 +426,11 @@ Create namespace and veth peer devices::
>
>  Attach the veth port to br0 (linux kernel mode)::
>
> -  ovs-vsctl add-port br0 afxdp-p0 -- \
> -    set interface afxdp-p0 options:n_rxq=1
> +  ovs-vsctl add-port br0 afxdp-p0 -- set interface afxdp-p0
>
> -Or, use AF_XDP with skb mode::
> +Or, use AF_XDP::
>
> -  ovs-vsctl add-port br0 afxdp-p0 -- \
> -    set interface afxdp-p0 type="afxdp" options:n_rxq=1 
> options:xdpmode=skb
> +  ovs-vsctl add-port br0 afxdp-p0 -- set interface afxdp-p0 
> type="afxdp"
>
>  Setup the OpenFlow rules::
>
> diff --git a/NEWS b/NEWS
> index 88b818948..d5f476d6e 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -5,11 +5,19 @@ Post-v2.12.0
>         separate project. You can find it at
>         https://github.com/ovn-org/ovn.git
>     - Userspace datapath:
> +     * Add option to enable, disable and query TCP sequence checking 
> in
> +       conntrack.
> +   - AF_XDP:
>       * New option 'use-need-wakeup' for netdev-afxdp to control 
> enabling
>         of corresponding 'need_wakeup' flag in AF_XDP rings.  Enabled 
> by default
>         if supported by libbpf.
> -     * Add option to enable, disable and query TCP sequence checking 
> in
> -       conntrack.
> +     * 'xdpmode' option for netdev-afxdp renamed to 'xdp-mode'.
> +       Modes also updated.  New values:
> +         native-with-zerocopy  - former DRV
> +         native                - new one, DRV without zero-copy
> +         generic               - former SKB
> +         best-effort [default] - new one, chooses the best available 
> from
> +                                 3 above modes
>
>  v2.12.0 - 03 Sep 2019
>  ---------------------
> diff --git a/lib/netdev-afxdp.c b/lib/netdev-afxdp.c
> index af654d498..74dde219d 100644
> --- a/lib/netdev-afxdp.c
> +++ b/lib/netdev-afxdp.c
> @@ -89,12 +89,42 @@ BUILD_ASSERT_DECL(PROD_NUM_DESCS == 
> CONS_NUM_DESCS);
>  #define UMEM2DESC(elem, base) ((uint64_t)((char *)elem - (char 
> *)base))
>
>  static struct xsk_socket_info *xsk_configure(int ifindex, int 
> xdp_queue_id,
> -                                             int mode, bool 
> use_need_wakeup);
> -static void xsk_remove_xdp_program(uint32_t ifindex, int xdpmode);
> +                                             enum afxdp_mode mode,
> +                                             bool use_need_wakeup,
> +                                             bool 
> report_socket_failures);
> +static void xsk_remove_xdp_program(uint32_t ifindex, enum 
> afxdp_mode);
>  static void xsk_destroy(struct xsk_socket_info *xsk);
>  static int xsk_configure_all(struct netdev *netdev);
>  static void xsk_destroy_all(struct netdev *netdev);
>
> +static struct {
> +    const char *name;
> +    uint32_t bind_flags;
> +    uint32_t xdp_flags;
> +} xdp_modes[] = {
> +    [OVS_AF_XDP_MODE_UNSPEC] = {
> +        .name = "unspecified", .bind_flags = 0, .xdp_flags = 0,
> +    },
> +    [OVS_AF_XDP_MODE_BEST_EFFORT] = {
> +        .name = "best-effort", .bind_flags = 0, .xdp_flags = 0,
> +    },

<nitpicking> Maybe format the two entries above the same as below, as 
it’s more pleasing to my OCD brain </nitpicking>

> +    [OVS_AF_XDP_MODE_NATIVE_ZC] = {
> +        .name = "native-with-zerocopy",
> +        .bind_flags = XDP_ZEROCOPY,
> +        .xdp_flags = XDP_FLAGS_DRV_MODE,
> +    },
> +    [OVS_AF_XDP_MODE_NATIVE] = {
> +        .name = "native",
> +        .bind_flags = XDP_COPY,
> +        .xdp_flags = XDP_FLAGS_DRV_MODE,
> +    },
> +    [OVS_AF_XDP_MODE_GENERIC] = {
> +        .name = "generic",
> +        .bind_flags = XDP_COPY,
> +        .xdp_flags = XDP_FLAGS_SKB_MODE,
> +    },
> +};
> +
>  struct unused_pool {
>      struct xsk_umem_info *umem_info;
>      int lost_in_rings; /* Number of packets left in tx, rx, cq and 
> fq. */
> @@ -214,7 +244,7 @@ netdev_afxdp_sweep_unused_pools(void *aux 
> OVS_UNUSED)
>  }
>
>  static struct xsk_umem_info *
> -xsk_configure_umem(void *buffer, uint64_t size, int xdpmode)
> +xsk_configure_umem(void *buffer, uint64_t size)
>  {
>      struct xsk_umem_config uconfig;
>      struct xsk_umem_info *umem;
> @@ -232,9 +262,7 @@ xsk_configure_umem(void *buffer, uint64_t size, 
> int xdpmode)
>      ret = xsk_umem__create(&umem->umem, buffer, size, &umem->fq, 
> &umem->cq,
>                             &uconfig);
>      if (ret) {
> -        VLOG_ERR("xsk_umem__create failed (%s) mode: %s",
> -                 ovs_strerror(errno),
> -                 xdpmode == XDP_COPY ? "SKB": "DRV");
> +        VLOG_ERR("xsk_umem__create failed: %s.", 
> ovs_strerror(errno));
>          free(umem);
>          return NULL;
>      }
> @@ -290,7 +318,8 @@ xsk_configure_umem(void *buffer, uint64_t size, 
> int xdpmode)
>
>  static struct xsk_socket_info *
>  xsk_configure_socket(struct xsk_umem_info *umem, uint32_t ifindex,
> -                     uint32_t queue_id, int xdpmode, bool 
> use_need_wakeup)
> +                     uint32_t queue_id, enum afxdp_mode mode,
> +                     bool use_need_wakeup, bool 
> report_socket_failures)
>  {
>      struct xsk_socket_config cfg;
>      struct xsk_socket_info *xsk;
> @@ -304,14 +333,8 @@ xsk_configure_socket(struct xsk_umem_info *umem, 
> uint32_t ifindex,
>      cfg.rx_size = CONS_NUM_DESCS;
>      cfg.tx_size = PROD_NUM_DESCS;
>      cfg.libbpf_flags = 0;
> -
> -    if (xdpmode == XDP_ZEROCOPY) {
> -        cfg.bind_flags = XDP_ZEROCOPY;
> -        cfg.xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST | 
> XDP_FLAGS_DRV_MODE;
> -    } else {
> -        cfg.bind_flags = XDP_COPY;
> -        cfg.xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST | 
> XDP_FLAGS_SKB_MODE;
> -    }
> +    cfg.bind_flags = xdp_modes[mode].bind_flags;
> +    cfg.xdp_flags = xdp_modes[mode].xdp_flags | 
> XDP_FLAGS_UPDATE_IF_NOEXIST;
>
>  #ifdef HAVE_XDP_NEED_WAKEUP
>      if (use_need_wakeup) {
> @@ -329,12 +352,11 @@ xsk_configure_socket(struct xsk_umem_info *umem, 
> uint32_t ifindex,
>      ret = xsk_socket__create(&xsk->xsk, devname, queue_id, 
> umem->umem,
>                               &xsk->rx, &xsk->tx, &cfg);
>      if (ret) {
> -        VLOG_ERR("xsk_socket__create failed (%s) mode: %s "
> -                 "use-need-wakeup: %s qid: %d",
> -                 ovs_strerror(errno),
> -                 xdpmode == XDP_COPY ? "SKB": "DRV",
> -                 use_need_wakeup ? "true" : "false",
> -                 queue_id);
> +        VLOG(report_socket_failures ? VLL_ERR : VLL_DBG,
> +             "xsk_socket__create failed (%s) mode: %s, "
> +             "use-need-wakeup: %s, qid: %d",
> +             ovs_strerror(errno), xdp_modes[mode].name,
> +             use_need_wakeup ? "true" : "false", queue_id);
>          free(xsk);
>          return NULL;
>      }
> @@ -375,8 +397,8 @@ xsk_configure_socket(struct xsk_umem_info *umem, 
> uint32_t ifindex,
>  }
>
>  static struct xsk_socket_info *
> -xsk_configure(int ifindex, int xdp_queue_id, int xdpmode,
> -              bool use_need_wakeup)
> +xsk_configure(int ifindex, int xdp_queue_id, enum afxdp_mode mode,
> +              bool use_need_wakeup, bool report_socket_failures)
>  {
>      struct xsk_socket_info *xsk;
>      struct xsk_umem_info *umem;
> @@ -389,9 +411,7 @@ xsk_configure(int ifindex, int xdp_queue_id, int 
> xdpmode,
>      memset(bufs, 0, NUM_FRAMES * FRAME_SIZE);
>
>      /* Create AF_XDP socket. */
> -    umem = xsk_configure_umem(bufs,
> -                              NUM_FRAMES * FRAME_SIZE,
> -                              xdpmode);
> +    umem = xsk_configure_umem(bufs, NUM_FRAMES * FRAME_SIZE);
>      if (!umem) {
>          free_pagealign(bufs);
>          return NULL;
> @@ -399,8 +419,8 @@ xsk_configure(int ifindex, int xdp_queue_id, int 
> xdpmode,
>
>      VLOG_DBG("Allocated umem pool at 0x%"PRIxPTR, (uintptr_t) umem);
>
> -    xsk = xsk_configure_socket(umem, ifindex, xdp_queue_id, xdpmode,
> -                               use_need_wakeup);
> +    xsk = xsk_configure_socket(umem, ifindex, xdp_queue_id, mode,
> +                               use_need_wakeup, 
> report_socket_failures);
>      if (!xsk) {
>          /* Clean up umem and xpacket pool. */
>          if (xsk_umem__delete(umem->umem)) {
> @@ -414,12 +434,38 @@ xsk_configure(int ifindex, int xdp_queue_id, int 
> xdpmode,
>      return xsk;
>  }
>
> +static int
> +xsk_configure_queue(struct netdev_linux *dev, int ifindex, int 
> queue_id,
> +                    enum afxdp_mode mode, bool 
> report_socket_failures)
> +{
> +    struct xsk_socket_info *xsk_info;
> +
> +    VLOG_DBG("%s: configuring queue: %d, mode: %s, use-need-wakeup: 
> %s.",
> +             netdev_get_name(&dev->up), queue_id, 
> xdp_modes[mode].name,
> +             dev->use_need_wakeup ? "true" : "false");
> +    xsk_info = xsk_configure(ifindex, queue_id, mode, 
> dev->use_need_wakeup,
> +                             report_socket_failures);
> +    if (!xsk_info) {
> +        VLOG(report_socket_failures ? VLL_ERR : VLL_DBG,
> +             "%s: Failed to create AF_XDP socket on queue %d in %s 
> mode.",
> +             netdev_get_name(&dev->up), queue_id, 
> xdp_modes[mode].name);
> +        dev->xsks[queue_id] = NULL;
> +        return -1;
> +    }
> +    dev->xsks[queue_id] = xsk_info;
> +    atomic_init(&xsk_info->tx_dropped, 0);
> +    xsk_info->outstanding_tx = 0;
> +    xsk_info->available_rx = PROD_NUM_DESCS;
> +    return 0;
> +}
> +
> +
>  static int
>  xsk_configure_all(struct netdev *netdev)
>  {
>      struct netdev_linux *dev = netdev_linux_cast(netdev);
> -    struct xsk_socket_info *xsk_info;
>      int i, ifindex, n_rxq, n_txq;
> +    int qid = 0;
>
>      ifindex = linux_get_ifindex(netdev_get_name(netdev));
>
> @@ -429,23 +475,36 @@ xsk_configure_all(struct netdev *netdev)
>      n_rxq = netdev_n_rxq(netdev);
>      dev->xsks = xcalloc(n_rxq, sizeof *dev->xsks);
>
> -    /* Configure each queue. */
> -    for (i = 0; i < n_rxq; i++) {
> -        VLOG_DBG("%s: configure queue %d mode %s use-need-wakeup 
> %s.",
> -                 netdev_get_name(netdev), i,
> -                 dev->xdpmode == XDP_COPY ? "SKB" : "DRV",
> -                 dev->use_need_wakeup ? "true" : "false");
> -        xsk_info = xsk_configure(ifindex, i, dev->xdpmode,
> -                                 dev->use_need_wakeup);
> -        if (!xsk_info) {
> -            VLOG_ERR("Failed to create AF_XDP socket on queue %d.", 
> i);
> -            dev->xsks[i] = NULL;
> +    if (dev->xdp_mode == OVS_AF_XDP_MODE_BEST_EFFORT) {
> +        /* Trying to configure first queue with different modes to
> +         * find the most suitable. */
> +        for (i = OVS_AF_XDP_MODE_NATIVE_ZC; i < OVS_AF_XDP_MODE_MAX; 
> i++) {
> +            if (!xsk_configure_queue(dev, ifindex, qid, i,
> +                                     i == OVS_AF_XDP_MODE_MAX - 1)) {
> +                dev->xdp_mode_in_use = i;
> +                VLOG_INFO("%s: %s XDP mode will be in use.",
> +                          netdev_get_name(netdev), 
> xdp_modes[i].name);
> +                break;
> +            }
> +        }

What if all modes fail? Guess it will use empty flags in 
xsk_configure_socket(). I would suggest to do a check in 
xsk_configure_socket() and bailout with a specific error?

> +        if (i == OVS_AF_XDP_MODE_MAX) {
> +            VLOG_ERR("%s: Failed to detect suitable XDP mode.",
> +                     netdev_get_name(netdev));
> +            goto err;
> +        }
> +        qid++;
> +    } else {
> +        dev->xdp_mode_in_use = dev->xdp_mode;
> +    }
> +
> +    /* Configure remaining queues. */
> +    for (; qid < n_rxq; qid++) {
> +        if (xsk_configure_queue(dev, ifindex, qid,
> +                                dev->xdp_mode_in_use, true)) {
> +            VLOG_ERR("%s: Failed to create AF_XDP socket on queue 
> %d.",
> +                     netdev_get_name(netdev), qid);
>              goto err;
>          }
> -        dev->xsks[i] = xsk_info;
> -        atomic_init(&xsk_info->tx_dropped, 0);
> -        xsk_info->outstanding_tx = 0;
> -        xsk_info->available_rx = PROD_NUM_DESCS;
>      }
>
>      n_txq = netdev_n_txq(netdev);
> @@ -500,7 +559,7 @@ xsk_destroy_all(struct netdev *netdev)
>              if (dev->xsks[i]) {
>                  xsk_destroy(dev->xsks[i]);
>                  dev->xsks[i] = NULL;
> -                VLOG_INFO("Destroyed xsk[%d].", i);
> +                VLOG_DBG("%s: Destroyed xsk[%d].", 
> netdev_get_name(netdev), i);
>              }
>          }
>
> @@ -510,7 +569,7 @@ xsk_destroy_all(struct netdev *netdev)
>
>      VLOG_INFO("%s: Removing xdp program.", netdev_get_name(netdev));
>      ifindex = linux_get_ifindex(netdev_get_name(netdev));
> -    xsk_remove_xdp_program(ifindex, dev->xdpmode);
> +    xsk_remove_xdp_program(ifindex, dev->xdp_mode_in_use);
>
>      if (dev->tx_locks) {
>          for (i = 0; i < netdev_n_txq(netdev); i++) {
> @@ -526,9 +585,10 @@ netdev_afxdp_set_config(struct netdev *netdev, 
> const struct smap *args,
>                          char **errp OVS_UNUSED)
>  {
>      struct netdev_linux *dev = netdev_linux_cast(netdev);
> -    const char *str_xdpmode;
> -    int xdpmode, new_n_rxq;
> +    const char *str_xdp_mode;
> +    enum afxdp_mode xdp_mode;
>      bool need_wakeup;
> +    int new_n_rxq;
>
>      ovs_mutex_lock(&dev->mutex);
>      new_n_rxq = MAX(smap_get_int(args, "n_rxq", NR_QUEUE), 1);
> @@ -539,14 +599,17 @@ netdev_afxdp_set_config(struct netdev *netdev, 
> const struct smap *args,
>          return EINVAL;
>      }
>
> -    str_xdpmode = smap_get_def(args, "xdpmode", "skb");
> -    if (!strcasecmp(str_xdpmode, "drv")) {
> -        xdpmode = XDP_ZEROCOPY;
> -    } else if (!strcasecmp(str_xdpmode, "skb")) {
> -        xdpmode = XDP_COPY;
> -    } else {
> -        VLOG_ERR("%s: Incorrect xdpmode (%s).",
> -                 netdev_get_name(netdev), str_xdpmode);
> +    str_xdp_mode = smap_get_def(args, "xdp-mode", "best-effort");
> +    for (xdp_mode = OVS_AF_XDP_MODE_BEST_EFFORT;
> +         xdp_mode < OVS_AF_XDP_MODE_MAX;
> +         xdp_mode++) {
> +        if (!strcasecmp(str_xdp_mode, xdp_modes[xdp_mode].name)) {
> +            break;
> +        }
> +    }
> +    if (xdp_mode == OVS_AF_XDP_MODE_MAX) {
> +        VLOG_ERR("%s: Incorrect xdp-mode (%s).",
> +                 netdev_get_name(netdev), str_xdp_mode);
>          ovs_mutex_unlock(&dev->mutex);
>          return EINVAL;
>      }
> @@ -560,10 +623,10 @@ netdev_afxdp_set_config(struct netdev *netdev, 
> const struct smap *args,
>  #endif
>
>      if (dev->requested_n_rxq != new_n_rxq
> -        || dev->requested_xdpmode != xdpmode
> +        || dev->requested_xdp_mode != xdp_mode
>          || dev->requested_need_wakeup != need_wakeup) {
>          dev->requested_n_rxq = new_n_rxq;
> -        dev->requested_xdpmode = xdpmode;
> +        dev->requested_xdp_mode = xdp_mode;
>          dev->requested_need_wakeup = need_wakeup;
>          netdev_request_reconfigure(netdev);
>      }
> @@ -578,8 +641,9 @@ netdev_afxdp_get_config(const struct netdev 
> *netdev, struct smap *args)
>
>      ovs_mutex_lock(&dev->mutex);
>      smap_add_format(args, "n_rxq", "%d", netdev->n_rxq);
> -    smap_add_format(args, "xdpmode", "%s",
> -                    dev->xdpmode == XDP_ZEROCOPY ? "drv" : "skb");
> +    smap_add_format(args, "xdp-mode", "%s", 
> xdp_modes[dev->xdp_mode].name);
> +    smap_add_format(args, "xdp-mode-in-use", "%s",
> +                    xdp_modes[dev->xdp_mode_in_use].name);
>      smap_add_format(args, "use-need-wakeup", "%s",
>                      dev->use_need_wakeup ? "true" : "false");
>      ovs_mutex_unlock(&dev->mutex);
> @@ -596,7 +660,7 @@ netdev_afxdp_reconfigure(struct netdev *netdev)
>      ovs_mutex_lock(&dev->mutex);
>
>      if (netdev->n_rxq == dev->requested_n_rxq
> -        && dev->xdpmode == dev->requested_xdpmode
> +        && dev->xdp_mode == dev->requested_xdp_mode
>          && dev->use_need_wakeup == dev->requested_need_wakeup
>          && dev->xsks) {
>          goto out;
> @@ -607,9 +671,9 @@ netdev_afxdp_reconfigure(struct netdev *netdev)
>      netdev->n_rxq = dev->requested_n_rxq;
>      netdev->n_txq = netdev->n_rxq;
>
> -    dev->xdpmode = dev->requested_xdpmode;
> +    dev->xdp_mode = dev->requested_xdp_mode;
>      VLOG_INFO("%s: Setting XDP mode to %s.", netdev_get_name(netdev),
> -              dev->xdpmode == XDP_ZEROCOPY ? "DRV" : "SKB");
> +              xdp_modes[dev->xdp_mode].name);
>
>      if (setrlimit(RLIMIT_MEMLOCK, &r)) {
>          VLOG_ERR("setrlimit(RLIMIT_MEMLOCK) failed: %s", 
> ovs_strerror(errno));
> @@ -618,7 +682,8 @@ netdev_afxdp_reconfigure(struct netdev *netdev)
>
>      err = xsk_configure_all(netdev);
>      if (err) {
> -        VLOG_ERR("AF_XDP device %s reconfig failed.", 
> netdev_get_name(netdev));
> +        VLOG_ERR("%s: AF_XDP device reconfiguration failed.",
> +                 netdev_get_name(netdev));
>      }
>      netdev_change_seq_changed(netdev);
>  out:
> @@ -638,17 +703,9 @@ netdev_afxdp_get_numa_id(const struct netdev 
> *netdev)
>  }
>
>  static void
> -xsk_remove_xdp_program(uint32_t ifindex, int xdpmode)
> +xsk_remove_xdp_program(uint32_t ifindex, enum afxdp_mode mode)
>  {
> -    uint32_t flags;
> -
> -    flags = XDP_FLAGS_UPDATE_IF_NOEXIST;
> -
> -    if (xdpmode == XDP_COPY) {
> -        flags |= XDP_FLAGS_SKB_MODE;
> -    } else if (xdpmode == XDP_ZEROCOPY) {
> -        flags |= XDP_FLAGS_DRV_MODE;
> -    }
> +    uint32_t flags = xdp_modes[mode].xdp_flags | 
> XDP_FLAGS_UPDATE_IF_NOEXIST;
>
>      bpf_set_link_xdp_fd(ifindex, -1, flags);
>  }
> @@ -662,7 +719,7 @@ signal_remove_xdp(struct netdev *netdev)
>      ifindex = linux_get_ifindex(netdev_get_name(netdev));
>
>      VLOG_WARN("Force removing xdp program.");
> -    xsk_remove_xdp_program(ifindex, dev->xdpmode);
> +    xsk_remove_xdp_program(ifindex, dev->xdp_mode_in_use);
>  }
>
>  static struct dp_packet_afxdp *
> @@ -782,7 +839,8 @@ netdev_afxdp_rxq_recv(struct netdev_rxq *rxq_, 
> struct dp_packet_batch *batch,
>  }
>
>  static inline int
> -kick_tx(struct xsk_socket_info *xsk_info, int xdpmode, bool 
> use_need_wakeup)
> +kick_tx(struct xsk_socket_info *xsk_info, enum afxdp_mode mode,
> +        bool use_need_wakeup)
>  {
>      int ret, retries;
>      static const int KERNEL_TX_BATCH_SIZE = 16;
> @@ -791,11 +849,11 @@ kick_tx(struct xsk_socket_info *xsk_info, int 
> xdpmode, bool use_need_wakeup)
>          return 0;
>      }
>
> -    /* In SKB_MODE packet transmission is synchronous, and the kernel 
> xmits
> +    /* In generic mode packet transmission is synchronous, and the 
> kernel xmits
>       * only TX_BATCH_SIZE(16) packets for a single sendmsg syscall.
>       * So, we have to kick the kernel (n_packets / 16) times to be 
> sure that
>       * all packets are transmitted. */
> -    retries = (xdpmode == XDP_COPY)
> +    retries = (mode == OVS_AF_XDP_MODE_GENERIC)
>                ? xsk_info->outstanding_tx / KERNEL_TX_BATCH_SIZE
>                : 0;
>  kick_retry:
> @@ -962,7 +1020,7 @@ __netdev_afxdp_batch_send(struct netdev *netdev, 
> int qid,
>                             &orig);
>          COVERAGE_INC(afxdp_tx_full);
>          afxdp_complete_tx(xsk_info);
> -        kick_tx(xsk_info, dev->xdpmode, dev->use_need_wakeup);
> +        kick_tx(xsk_info, dev->xdp_mode_in_use, 
> dev->use_need_wakeup);
>          error = ENOMEM;
>          goto out;
>      }
> @@ -986,7 +1044,7 @@ __netdev_afxdp_batch_send(struct netdev *netdev, 
> int qid,
>      xsk_ring_prod__submit(&xsk_info->tx, 
> dp_packet_batch_size(batch));
>      xsk_info->outstanding_tx += dp_packet_batch_size(batch);
>
> -    ret = kick_tx(xsk_info, dev->xdpmode, dev->use_need_wakeup);
> +    ret = kick_tx(xsk_info, dev->xdp_mode_in_use, 
> dev->use_need_wakeup);
>      if (OVS_UNLIKELY(ret)) {
>          VLOG_WARN_RL(&rl, "%s: error sending AF_XDP packet: %s.",
>                       netdev_get_name(netdev), ovs_strerror(ret));
> @@ -1052,10 +1110,11 @@ netdev_afxdp_construct(struct netdev *netdev)
>      /* Queues should not be used before the first reconfiguration. 
> Clearing. */
>      netdev->n_rxq = 0;
>      netdev->n_txq = 0;
> -    dev->xdpmode = 0;
> +    dev->xdp_mode = OVS_AF_XDP_MODE_UNSPEC;
> +    dev->xdp_mode_in_use = OVS_AF_XDP_MODE_UNSPEC;
>
>      dev->requested_n_rxq = NR_QUEUE;
> -    dev->requested_xdpmode = XDP_COPY;
> +    dev->requested_xdp_mode = OVS_AF_XDP_MODE_BEST_EFFORT;
>      dev->requested_need_wakeup = NEED_WAKEUP_DEFAULT;
>
>      dev->xsks = NULL;
> diff --git a/lib/netdev-afxdp.h b/lib/netdev-afxdp.h
> index e2f400b72..4fe861d2d 100644
> --- a/lib/netdev-afxdp.h
> +++ b/lib/netdev-afxdp.h
> @@ -25,6 +25,15 @@
>  /* These functions are Linux AF_XDP specific, so they should be used 
> directly
>   * only by Linux-specific code. */
>
> +enum afxdp_mode {
> +    OVS_AF_XDP_MODE_UNSPEC,
> +    OVS_AF_XDP_MODE_BEST_EFFORT,
> +    OVS_AF_XDP_MODE_NATIVE_ZC,
> +    OVS_AF_XDP_MODE_NATIVE,
> +    OVS_AF_XDP_MODE_GENERIC,
> +    OVS_AF_XDP_MODE_MAX,
> +};
> +
>  struct netdev;
>  struct xsk_socket_info;
>  struct xdp_umem;
> diff --git a/lib/netdev-linux-private.h b/lib/netdev-linux-private.h
> index c14f2fb81..8873caa9d 100644
> --- a/lib/netdev-linux-private.h
> +++ b/lib/netdev-linux-private.h
> @@ -100,10 +100,14 @@ struct netdev_linux {
>      /* AF_XDP information. */
>      struct xsk_socket_info **xsks;
>      int requested_n_rxq;
> -    int xdpmode;                /* AF_XDP running mode: driver or 
> skb. */
> -    int requested_xdpmode;
> +
> +    enum afxdp_mode xdp_mode;               /* Configured AF_XDP 
> mode. */
> +    enum afxdp_mode requested_xdp_mode;     /* Requested  AF_XDP 
> mode. */
> +    enum afxdp_mode xdp_mode_in_use;        /* Effective  AF_XDP 
> mode. */
> +
>      bool use_need_wakeup;
>      bool requested_need_wakeup;
> +
>      struct ovs_spin *tx_locks;  /* spin lock array for TX queues. */
>  #endif
>  };
> diff --git a/tests/system-afxdp-macros.at 
> b/tests/system-afxdp-macros.at
> index f0683c0a9..5ee2ceb1a 100644
> --- a/tests/system-afxdp-macros.at
> +++ b/tests/system-afxdp-macros.at
> @@ -30,10 +30,3 @@ m4_define([CONFIGURE_VETH_OFFLOADS],
>       AT_CHECK([ethtool -K $1 txvlan off], [0], [ignore], [ignore])
>      ]
>  )
> -
> -# OVS_START_L7([namespace], [protocol])
> -#
> -# AF_XDP doesn't work with TCP over virtual interfaces for now.
> -#
> -m4_define([OVS_START_L7],
> -   [AT_SKIP_IF([:])])
> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
> index efdfb83bb..02a68deb1 100644
> --- a/vswitchd/vswitch.xml
> +++ b/vswitchd/vswitch.xml
> @@ -3107,18 +3107,38 @@ ovs-vsctl add-port br0 p0 -- set Interface p0 
> type=patch options:peer=p1 \
>          </p>
>        </column>
>
> -      <column name="options" key="xdpmode"
> +      <column name="options" key="xdp-mode"
>                type='{"type": "string",
> -                     "enum": ["set", ["skb", "drv"]]}'>
> +                     "enum": ["set", ["best-effort", 
> "native-with-zerocopy",
> +                                      "native", "generic"]]}'>
>          <p>
>            Specifies the operational mode of the XDP program.
> -          If "drv", the XDP program is loaded into the device driver 
> with
> -          zero-copy RX and TX enabled. This mode requires device 
> driver with
> -          AF_XDP support and has the best performance.
> -          If "skb", the XDP program is using generic XDP mode in 
> kernel with
> -          extra data copying between userspace and kernel. No device 
> driver
> -          support is needed. Note that this is afxdp netdev type 
> only.
> -          Defaults to "skb" mode.
> +          <p>
> +            In <code>native-with-zerocopy</code> mode the XDP program 
> is loaded
> +            into the device driver with zero-copy RX and TX enabled.  
> This mode
> +            requires device driver support and has the best 
> performance because
> +            there should be no copying of packets.
> +          </p>
> +          <p>
> +            <code>native</code> is the same as
> +            <code>native-with-zerocopy</code>, but without zero-copy
> +            capability.  This requires at least one copy between 
> kernel and the
> +            userspace. This mode also requires support from device 
> driver.
> +          </p>
> +          <p>
> +            In <code>generic</code> case the XDP program in kernel 
> works after
> +            skb allocation on early stages of packet processing 
> inside the
> +            network stack.  This mode doesn't require driver support, 
> but has
> +            much lower performance.
> +          </p>
> +          <p>
> +            <code>best-effort</code> tries to detect and choose the 
> best
> +            (fastest) from the available modes for current interface.
> +          </p>
> +          <p>
> +            Note that this option is specific to netdev-afxdp.
> +            Defaults to <code>best-effort</code> mode.
> +          </p>
>          </p>
>        </column>
>
> -- 
> 2.17.1
Ilya Maximets Nov. 19, 2019, 4:52 p.m. UTC | #12
On 19.11.2019 17:16, Eelco Chaudron wrote:
> 
> 
> On 7 Nov 2019, at 12:36, Ilya Maximets wrote:
> 
>> Until now there was only two options for XDP mode in OVS: SKB or DRV.
>> i.e. 'generic XDP' or 'native XDP with zero-copy enabled'.
>>
>> Devices like 'veth' interfaces in Linux supports native XDP, but
>> doesn't support zero-copy mode.  This case can not be covered by
>> existing API and we have to use slower generic XDP for such devices.
>> There are few more issues, e.g. TCP is not supported in generic XDP
>> mode for veth interfaces due to kernel limitations, however it is
>> supported in native mode.
>>
>> This change introduces ability to use native XDP without zero-copy
>> along with best-effort configuration option that enabled by default.
>> In best-effort case OVS will sequentially try different modes starting
>> from the fastest one and will choose the first acceptable for current
>> interface.  This will guarantee the best possible performance.
>>
>> If user will want to choose specific mode, it's still possible by
>> setting the 'options:xdp-mode'.
>>
>> This change additionally changes the API by renaming the configuration
>> knob from 'xdpmode' to 'xdp-mode' and also renaming the modes
>> themselves to be more user-friendly.
>>
>> The full list of currently supported modes:
>>   * native-with-zerocopy - former DRV
>>   * native               - new one, DRV without zero-copy
>>   * generic              - former SKB
>>   * best-effort          - new one, chooses the best available from
>>                            3 above modes
>>
>> Since 'best-effort' is a default mode, users will not need to
>> explicitely set 'xdp-mode' in most cases.
>>
>> TCP related tests enabled back in system afxdp testsuite, because
>> 'best-effort' will choose 'native' mode for veth interfaces
>> and this mode has no issues with TCP.
> Patch in general looks good, two small comments inline.

Thanks for review.

> 
> The only thing that bothers me is the worse performance of the TAP interface with the new default config. Can we somehow keep the old behavior for TAP interfaces?

Could you check if TCP works over tap interfaces in generic mode?
For me the point is that correctness is better than performance.
I also hope that native implementation for tap will be improved
over time.

> 
>> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
>> ---
>>
>> With this patch I modified the user-visible API, but I think it's OK
>> since it's still an experimental netdev.  Comments are welcome.
>>
>> Version 2:
>>   * Rebased on current master.
>>
>>  Documentation/intro/install/afxdp.rst |  54 ++++---
>>  NEWS                                  |  12 +-
>>  lib/netdev-afxdp.c                    | 223 ++++++++++++++++----------
>>  lib/netdev-afxdp.h                    |   9 ++
>>  lib/netdev-linux-private.h            |   8 +-
>>  tests/system-afxdp-macros.at          |   7 -
>>  vswitchd/vswitch.xml                  |  38 +++--
>>  7 files changed, 227 insertions(+), 124 deletions(-)
>>
>> diff --git a/Documentation/intro/install/afxdp.rst b/Documentation/intro/install/afxdp.rst
>> index a136db0c9..937770ad0 100644
>> --- a/Documentation/intro/install/afxdp.rst
>> +++ b/Documentation/intro/install/afxdp.rst
>> @@ -153,9 +153,8 @@ To kick start end-to-end autotesting::
>>    make check-afxdp TESTSUITEFLAGS='1'
>>
>>  .. note::
>> -   Not all test cases pass at this time. Currenly all TCP related
>> -   tests, ex: using wget or http, are skipped due to XDP limitations
>> -   on veth. cvlan test is also skipped.
>> +   Not all test cases pass at this time. Currenly all cvlan tests are skipped
>> +   due to kernel issues.
>>
>>  If a test case fails, check the log at::
>>
>> @@ -177,33 +176,35 @@ in :doc:`general`::
>>    ovs-vsctl -- add-br br0 -- set Bridge br0 datapath_type=netdev
>>
>>  Make sure your device driver support AF_XDP, netdev-afxdp supports
>> -the following additional options (see man ovs-vswitchd.conf.db for
>> +the following additional options (see ``man ovs-vswitchd.conf.db`` for
>>  more details):
>>
>> - * **xdpmode**: use "drv" for driver mode, or "skb" for skb mode.
>> + * ``xdp-mode``: ``best-effort``, ``native-with-zerocopy``,
>> +   ``native`` or ``generic``.  Defaults to ``best-effort``, i.e. best of
>> +   supported modes, so in most cases you don't need to change it.
>>
>> - * **use-need-wakeup**: default "true" if libbpf supports it, otherwise false.
>> + * ``use-need-wakeup``: default ``true`` if libbpf supports it,
>> +   otherwise ``false``.
>>
>>  For example, to use 1 PMD (on core 4) on 1 queue (queue 0) device,
>> -configure these options: **pmd-cpu-mask, pmd-rxq-affinity, and n_rxq**.
>> -The **xdpmode** can be "drv" or "skb"::
>> +configure these options: ``pmd-cpu-mask``, ``pmd-rxq-affinity``, and
>> +``n_rxq``::
>>
>>    ethtool -L enp2s0 combined 1
>>    ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x10
>>    ovs-vsctl add-port br0 enp2s0 -- set interface enp2s0 type="afxdp" \
>> -    options:n_rxq=1 options:xdpmode=drv \
>> -    other_config:pmd-rxq-affinity="0:4"
>> +                                   other_config:pmd-rxq-affinity="0:4"
>>
>>  Or, use 4 pmds/cores and 4 queues by doing::
>>
>>    ethtool -L enp2s0 combined 4
>>    ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x36
>>    ovs-vsctl add-port br0 enp2s0 -- set interface enp2s0 type="afxdp" \
>> -    options:n_rxq=4 options:xdpmode=drv \
>> -    other_config:pmd-rxq-affinity="0:1,1:2,2:3,3:4"
>> +    options:n_rxq=4 other_config:pmd-rxq-affinity="0:1,1:2,2:3,3:4"
>>
>>  .. note::
>> -   pmd-rxq-affinity is optional. If not specified, system will auto-assign.
>> +   ``pmd-rxq-affinity`` is optional. If not specified, system will auto-assign.
>> +   ``n_rxq`` equals ``1`` by default.
>>
>>  To validate that the bridge has successfully instantiated, you can use the::
>>
>> @@ -214,12 +215,21 @@ Should show something like::
>>    Port "ens802f0"
>>     Interface "ens802f0"
>>        type: afxdp
>> -      options: {n_rxq="1", xdpmode=drv}
>> +      options: {n_rxq="1"}
>>
>>  Otherwise, enable debugging by::
>>
>>    ovs-appctl vlog/set netdev_afxdp::dbg
>>
>> +To check which XDP mode was chosen by ``best-effort``, you can look for
>> +``xdp-mode-in-use`` in the output of ``ovs-appctl dpctl/show``::
>> +
>> +  # ovs-appctl dpctl/show
>> +  netdev@ovs-netdev:
>> +    <...>
>> +    port 2: ens802f0 (afxdp: n_rxq=1, use-need-wakeup=true,
>> +                      xdp-mode=best-effort,
>> +                      xdp-mode-in-use=native-with-zerocopy)
>>
>>  References
>>  ----------
>> @@ -323,8 +333,11 @@ Limitations/Known Issues
>>  #. Most of the tests are done using i40e single port. Multiple ports and
>>     also ixgbe driver also needs to be tested.
>>  #. No latency test result (TODO items)
>> -#. Due to limitations of current upstream kernel, TCP and various offloading
>> +#. Due to limitations of current upstream kernel, various offloading
>>     (vlan, cvlan) is not working over virtual interfaces (i.e. veth pair).
>> +   Also, TCP is not working over virtual interfaces in generic XDP mode.
>> +   Some more information and possible workaround available `here
>> +   <https://github.com/cilium/cilium/issues/3077#issuecomment-430801467>`__ .
>>
>>
>>  PVP using tap device
>> @@ -335,8 +348,7 @@ First, start OVS, then add physical port::
>>    ethtool -L enp2s0 combined 1
>>    ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x10
>>    ovs-vsctl add-port br0 enp2s0 -- set interface enp2s0 type="afxdp" \
>> -    options:n_rxq=1 options:xdpmode=drv \
>> -    other_config:pmd-rxq-affinity="0:4"
>> +    options:n_rxq=1 other_config:pmd-rxq-affinity="0:4"
>>
>>  Start a VM with virtio and tap device::
>>
>> @@ -414,13 +426,11 @@ Create namespace and veth peer devices::
>>
>>  Attach the veth port to br0 (linux kernel mode)::
>>
>> -  ovs-vsctl add-port br0 afxdp-p0 -- \
>> -    set interface afxdp-p0 options:n_rxq=1
>> +  ovs-vsctl add-port br0 afxdp-p0 -- set interface afxdp-p0
>>
>> -Or, use AF_XDP with skb mode::
>> +Or, use AF_XDP::
>>
>> -  ovs-vsctl add-port br0 afxdp-p0 -- \
>> -    set interface afxdp-p0 type="afxdp" options:n_rxq=1 options:xdpmode=skb
>> +  ovs-vsctl add-port br0 afxdp-p0 -- set interface afxdp-p0 type="afxdp"
>>
>>  Setup the OpenFlow rules::
>>
>> diff --git a/NEWS b/NEWS
>> index 88b818948..d5f476d6e 100644
>> --- a/NEWS
>> +++ b/NEWS
>> @@ -5,11 +5,19 @@ Post-v2.12.0
>>         separate project. You can find it at
>>         https://github.com/ovn-org/ovn.git
>>     - Userspace datapath:
>> +     * Add option to enable, disable and query TCP sequence checking in
>> +       conntrack.
>> +   - AF_XDP:
>>       * New option 'use-need-wakeup' for netdev-afxdp to control enabling
>>         of corresponding 'need_wakeup' flag in AF_XDP rings.  Enabled by default
>>         if supported by libbpf.
>> -     * Add option to enable, disable and query TCP sequence checking in
>> -       conntrack.
>> +     * 'xdpmode' option for netdev-afxdp renamed to 'xdp-mode'.
>> +       Modes also updated.  New values:
>> +         native-with-zerocopy  - former DRV
>> +         native                - new one, DRV without zero-copy
>> +         generic               - former SKB
>> +         best-effort [default] - new one, chooses the best available from
>> +                                 3 above modes
>>
>>  v2.12.0 - 03 Sep 2019
>>  ---------------------
>> diff --git a/lib/netdev-afxdp.c b/lib/netdev-afxdp.c
>> index af654d498..74dde219d 100644
>> --- a/lib/netdev-afxdp.c
>> +++ b/lib/netdev-afxdp.c
>> @@ -89,12 +89,42 @@ BUILD_ASSERT_DECL(PROD_NUM_DESCS == CONS_NUM_DESCS);
>>  #define UMEM2DESC(elem, base) ((uint64_t)((char *)elem - (char *)base))
>>
>>  static struct xsk_socket_info *xsk_configure(int ifindex, int xdp_queue_id,
>> -                                             int mode, bool use_need_wakeup);
>> -static void xsk_remove_xdp_program(uint32_t ifindex, int xdpmode);
>> +                                             enum afxdp_mode mode,
>> +                                             bool use_need_wakeup,
>> +                                             bool report_socket_failures);
>> +static void xsk_remove_xdp_program(uint32_t ifindex, enum afxdp_mode);
>>  static void xsk_destroy(struct xsk_socket_info *xsk);
>>  static int xsk_configure_all(struct netdev *netdev);
>>  static void xsk_destroy_all(struct netdev *netdev);
>>
>> +static struct {
>> +    const char *name;
>> +    uint32_t bind_flags;
>> +    uint32_t xdp_flags;
>> +} xdp_modes[] = {
>> +    [OVS_AF_XDP_MODE_UNSPEC] = {
>> +        .name = "unspecified", .bind_flags = 0, .xdp_flags = 0,
>> +    },
>> +    [OVS_AF_XDP_MODE_BEST_EFFORT] = {
>> +        .name = "best-effort", .bind_flags = 0, .xdp_flags = 0,
>> +    },
> 
> <nitpicking> Maybe format the two entries above the same as below, as it’s more pleasing to my OCD brain </nitpicking>

Not a big deal for me.  I just wanted to save some space.
Could be changed before applying.

> 
>> +    [OVS_AF_XDP_MODE_NATIVE_ZC] = {
>> +        .name = "native-with-zerocopy",
>> +        .bind_flags = XDP_ZEROCOPY,
>> +        .xdp_flags = XDP_FLAGS_DRV_MODE,
>> +    },
>> +    [OVS_AF_XDP_MODE_NATIVE] = {
>> +        .name = "native",
>> +        .bind_flags = XDP_COPY,
>> +        .xdp_flags = XDP_FLAGS_DRV_MODE,
>> +    },
>> +    [OVS_AF_XDP_MODE_GENERIC] = {
>> +        .name = "generic",
>> +        .bind_flags = XDP_COPY,
>> +        .xdp_flags = XDP_FLAGS_SKB_MODE,
>> +    },
>> +};
>> +
>>  struct unused_pool {
>>      struct xsk_umem_info *umem_info;
>>      int lost_in_rings; /* Number of packets left in tx, rx, cq and fq. */
>> @@ -214,7 +244,7 @@ netdev_afxdp_sweep_unused_pools(void *aux OVS_UNUSED)
>>  }
>>
>>  static struct xsk_umem_info *
>> -xsk_configure_umem(void *buffer, uint64_t size, int xdpmode)
>> +xsk_configure_umem(void *buffer, uint64_t size)
>>  {
>>      struct xsk_umem_config uconfig;
>>      struct xsk_umem_info *umem;
>> @@ -232,9 +262,7 @@ xsk_configure_umem(void *buffer, uint64_t size, int xdpmode)
>>      ret = xsk_umem__create(&umem->umem, buffer, size, &umem->fq, &umem->cq,
>>                             &uconfig);
>>      if (ret) {
>> -        VLOG_ERR("xsk_umem__create failed (%s) mode: %s",
>> -                 ovs_strerror(errno),
>> -                 xdpmode == XDP_COPY ? "SKB": "DRV");
>> +        VLOG_ERR("xsk_umem__create failed: %s.", ovs_strerror(errno));
>>          free(umem);
>>          return NULL;
>>      }
>> @@ -290,7 +318,8 @@ xsk_configure_umem(void *buffer, uint64_t size, int xdpmode)
>>
>>  static struct xsk_socket_info *
>>  xsk_configure_socket(struct xsk_umem_info *umem, uint32_t ifindex,
>> -                     uint32_t queue_id, int xdpmode, bool use_need_wakeup)
>> +                     uint32_t queue_id, enum afxdp_mode mode,
>> +                     bool use_need_wakeup, bool report_socket_failures)
>>  {
>>      struct xsk_socket_config cfg;
>>      struct xsk_socket_info *xsk;
>> @@ -304,14 +333,8 @@ xsk_configure_socket(struct xsk_umem_info *umem, uint32_t ifindex,
>>      cfg.rx_size = CONS_NUM_DESCS;
>>      cfg.tx_size = PROD_NUM_DESCS;
>>      cfg.libbpf_flags = 0;
>> -
>> -    if (xdpmode == XDP_ZEROCOPY) {
>> -        cfg.bind_flags = XDP_ZEROCOPY;
>> -        cfg.xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST | XDP_FLAGS_DRV_MODE;
>> -    } else {
>> -        cfg.bind_flags = XDP_COPY;
>> -        cfg.xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST | XDP_FLAGS_SKB_MODE;
>> -    }
>> +    cfg.bind_flags = xdp_modes[mode].bind_flags;
>> +    cfg.xdp_flags = xdp_modes[mode].xdp_flags | XDP_FLAGS_UPDATE_IF_NOEXIST;
>>
>>  #ifdef HAVE_XDP_NEED_WAKEUP
>>      if (use_need_wakeup) {
>> @@ -329,12 +352,11 @@ xsk_configure_socket(struct xsk_umem_info *umem, uint32_t ifindex,
>>      ret = xsk_socket__create(&xsk->xsk, devname, queue_id, umem->umem,
>>                               &xsk->rx, &xsk->tx, &cfg);
>>      if (ret) {
>> -        VLOG_ERR("xsk_socket__create failed (%s) mode: %s "
>> -                 "use-need-wakeup: %s qid: %d",
>> -                 ovs_strerror(errno),
>> -                 xdpmode == XDP_COPY ? "SKB": "DRV",
>> -                 use_need_wakeup ? "true" : "false",
>> -                 queue_id);
>> +        VLOG(report_socket_failures ? VLL_ERR : VLL_DBG,
>> +             "xsk_socket__create failed (%s) mode: %s, "
>> +             "use-need-wakeup: %s, qid: %d",
>> +             ovs_strerror(errno), xdp_modes[mode].name,
>> +             use_need_wakeup ? "true" : "false", queue_id);
>>          free(xsk);
>>          return NULL;
>>      }
>> @@ -375,8 +397,8 @@ xsk_configure_socket(struct xsk_umem_info *umem, uint32_t ifindex,
>>  }
>>
>>  static struct xsk_socket_info *
>> -xsk_configure(int ifindex, int xdp_queue_id, int xdpmode,
>> -              bool use_need_wakeup)
>> +xsk_configure(int ifindex, int xdp_queue_id, enum afxdp_mode mode,
>> +              bool use_need_wakeup, bool report_socket_failures)
>>  {
>>      struct xsk_socket_info *xsk;
>>      struct xsk_umem_info *umem;
>> @@ -389,9 +411,7 @@ xsk_configure(int ifindex, int xdp_queue_id, int xdpmode,
>>      memset(bufs, 0, NUM_FRAMES * FRAME_SIZE);
>>
>>      /* Create AF_XDP socket. */
>> -    umem = xsk_configure_umem(bufs,
>> -                              NUM_FRAMES * FRAME_SIZE,
>> -                              xdpmode);
>> +    umem = xsk_configure_umem(bufs, NUM_FRAMES * FRAME_SIZE);
>>      if (!umem) {
>>          free_pagealign(bufs);
>>          return NULL;
>> @@ -399,8 +419,8 @@ xsk_configure(int ifindex, int xdp_queue_id, int xdpmode,
>>
>>      VLOG_DBG("Allocated umem pool at 0x%"PRIxPTR, (uintptr_t) umem);
>>
>> -    xsk = xsk_configure_socket(umem, ifindex, xdp_queue_id, xdpmode,
>> -                               use_need_wakeup);
>> +    xsk = xsk_configure_socket(umem, ifindex, xdp_queue_id, mode,
>> +                               use_need_wakeup, report_socket_failures);
>>      if (!xsk) {
>>          /* Clean up umem and xpacket pool. */
>>          if (xsk_umem__delete(umem->umem)) {
>> @@ -414,12 +434,38 @@ xsk_configure(int ifindex, int xdp_queue_id, int xdpmode,
>>      return xsk;
>>  }
>>
>> +static int
>> +xsk_configure_queue(struct netdev_linux *dev, int ifindex, int queue_id,
>> +                    enum afxdp_mode mode, bool report_socket_failures)
>> +{
>> +    struct xsk_socket_info *xsk_info;
>> +
>> +    VLOG_DBG("%s: configuring queue: %d, mode: %s, use-need-wakeup: %s.",
>> +             netdev_get_name(&dev->up), queue_id, xdp_modes[mode].name,
>> +             dev->use_need_wakeup ? "true" : "false");
>> +    xsk_info = xsk_configure(ifindex, queue_id, mode, dev->use_need_wakeup,
>> +                             report_socket_failures);
>> +    if (!xsk_info) {
>> +        VLOG(report_socket_failures ? VLL_ERR : VLL_DBG,
>> +             "%s: Failed to create AF_XDP socket on queue %d in %s mode.",
>> +             netdev_get_name(&dev->up), queue_id, xdp_modes[mode].name);
>> +        dev->xsks[queue_id] = NULL;
>> +        return -1;
>> +    }
>> +    dev->xsks[queue_id] = xsk_info;
>> +    atomic_init(&xsk_info->tx_dropped, 0);
>> +    xsk_info->outstanding_tx = 0;
>> +    xsk_info->available_rx = PROD_NUM_DESCS;
>> +    return 0;
>> +}
>> +
>> +
>>  static int
>>  xsk_configure_all(struct netdev *netdev)
>>  {
>>      struct netdev_linux *dev = netdev_linux_cast(netdev);
>> -    struct xsk_socket_info *xsk_info;
>>      int i, ifindex, n_rxq, n_txq;
>> +    int qid = 0;
>>
>>      ifindex = linux_get_ifindex(netdev_get_name(netdev));
>>
>> @@ -429,23 +475,36 @@ xsk_configure_all(struct netdev *netdev)
>>      n_rxq = netdev_n_rxq(netdev);
>>      dev->xsks = xcalloc(n_rxq, sizeof *dev->xsks);
>>
>> -    /* Configure each queue. */
>> -    for (i = 0; i < n_rxq; i++) {
>> -        VLOG_DBG("%s: configure queue %d mode %s use-need-wakeup %s.",
>> -                 netdev_get_name(netdev), i,
>> -                 dev->xdpmode == XDP_COPY ? "SKB" : "DRV",
>> -                 dev->use_need_wakeup ? "true" : "false");
>> -        xsk_info = xsk_configure(ifindex, i, dev->xdpmode,
>> -                                 dev->use_need_wakeup);
>> -        if (!xsk_info) {
>> -            VLOG_ERR("Failed to create AF_XDP socket on queue %d.", i);
>> -            dev->xsks[i] = NULL;
>> +    if (dev->xdp_mode == OVS_AF_XDP_MODE_BEST_EFFORT) {
>> +        /* Trying to configure first queue with different modes to
>> +         * find the most suitable. */
>> +        for (i = OVS_AF_XDP_MODE_NATIVE_ZC; i < OVS_AF_XDP_MODE_MAX; i++) {
>> +            if (!xsk_configure_queue(dev, ifindex, qid, i,
>> +                                     i == OVS_AF_XDP_MODE_MAX - 1)) {
>> +                dev->xdp_mode_in_use = i;
>> +                VLOG_INFO("%s: %s XDP mode will be in use.",
>> +                          netdev_get_name(netdev), xdp_modes[i].name);
>> +                break;
>> +            }
>> +        }
> 
> What if all modes fail? Guess it will use empty flags in xsk_configure_socket(). I would suggest to do a check in xsk_configure_socket() and bailout with a specific error?

What about the code on the next line?
It is here exactly for that purpose.  Am I missing something?

> 
>> +        if (i == OVS_AF_XDP_MODE_MAX) {
>> +            VLOG_ERR("%s: Failed to detect suitable XDP mode.",
>> +                     netdev_get_name(netdev));
>> +            goto err;
>> +        }
>> +        qid++;
>> +    } else {
>> +        dev->xdp_mode_in_use = dev->xdp_mode;
>> +    }
>> +
>> +    /* Configure remaining queues. */
>> +    for (; qid < n_rxq; qid++) {
>> +        if (xsk_configure_queue(dev, ifindex, qid,
>> +                                dev->xdp_mode_in_use, true)) {
>> +            VLOG_ERR("%s: Failed to create AF_XDP socket on queue %d.",
>> +                     netdev_get_name(netdev), qid);
>>              goto err;
>>          }
>> -        dev->xsks[i] = xsk_info;
>> -        atomic_init(&xsk_info->tx_dropped, 0);
>> -        xsk_info->outstanding_tx = 0;
>> -        xsk_info->available_rx = PROD_NUM_DESCS;
>>      }

<snip>
William Tu Nov. 19, 2019, 7:45 p.m. UTC | #13
On Tue, Nov 19, 2019 at 05:52:22PM +0100, Ilya Maximets wrote:
> On 19.11.2019 17:16, Eelco Chaudron wrote:
> > 
> > 
> > On 7 Nov 2019, at 12:36, Ilya Maximets wrote:
> > 
> >> Until now there was only two options for XDP mode in OVS: SKB or DRV.
> >> i.e. 'generic XDP' or 'native XDP with zero-copy enabled'.
> >>
> >> Devices like 'veth' interfaces in Linux supports native XDP, but
> >> doesn't support zero-copy mode.  This case can not be covered by
> >> existing API and we have to use slower generic XDP for such devices.
> >> There are few more issues, e.g. TCP is not supported in generic XDP
> >> mode for veth interfaces due to kernel limitations, however it is
> >> supported in native mode.
> >>
> >> This change introduces ability to use native XDP without zero-copy
> >> along with best-effort configuration option that enabled by default.
> >> In best-effort case OVS will sequentially try different modes starting
> >> from the fastest one and will choose the first acceptable for current
> >> interface.  This will guarantee the best possible performance.
> >>
> >> If user will want to choose specific mode, it's still possible by
> >> setting the 'options:xdp-mode'.
> >>
> >> This change additionally changes the API by renaming the configuration
> >> knob from 'xdpmode' to 'xdp-mode' and also renaming the modes
> >> themselves to be more user-friendly.
> >>
> >> The full list of currently supported modes:
> >>   * native-with-zerocopy - former DRV
> >>   * native               - new one, DRV without zero-copy
> >>   * generic              - former SKB
> >>   * best-effort          - new one, chooses the best available from
> >>                            3 above modes
> >>
> >> Since 'best-effort' is a default mode, users will not need to
> >> explicitely set 'xdp-mode' in most cases.
> >>
> >> TCP related tests enabled back in system afxdp testsuite, because
> >> 'best-effort' will choose 'native' mode for veth interfaces
> >> and this mode has no issues with TCP.
> > Patch in general looks good, two small comments inline.
> 
> Thanks for review.
> 
> > 
> > The only thing that bothers me is the worse performance of the TAP interface with the new default config. Can we somehow keep the old behavior for TAP interfaces?
> 
> Could you check if TCP works over tap interfaces in generic mode?
> For me the point is that correctness is better than performance.
> I also hope that native implementation for tap will be improved
> over time.
> 

I agree that we should first make sure correctness.
I created a simple TCP test using br0, since br0 is a tap device.
Unfortunately it does not work...

---
ovs-vsctl -- add-br br0 -- set Bridge br0 datapath_type=netdev

ip netns add at_ns0
ip link add p0 type veth peer name afxdp-p0
ip link set p0 netns at_ns0
ip link set dev afxdp-p0 up

ovs-vsctl add-port br0 afxdp-p0
ovs-vsctl -- set interface afxdp-p0 options:n_rxq=1 type="afxdp" options:xdp-mode=native

ip netns exec at_ns0 sh << NS_EXEC_HEREDOC
ip addr add "10.1.1.1/24" dev p0
ip link set dev p0 up
NS_EXEC_HEREDOC

ovs-vsctl -- set interface br0 options:n_rxq=1 type="afxdp" options:xdp-mode=native
ip addr add "10.1.1.2/24" dev br0
ip link set dev br0 up

# UDP works ok
ip netns exec at_ns0 nc -u -l 1234
nc -u 10.1.1.1 1234
<works ok>

# TCP still fails
ip netns exec at_ns0 nc -l 1234
nc 10.1.1.1 1234
<no reponse>
Ilya Maximets Nov. 19, 2019, 7:51 p.m. UTC | #14
On 19.11.2019 20:45, William Tu wrote:
> On Tue, Nov 19, 2019 at 05:52:22PM +0100, Ilya Maximets wrote:
>> On 19.11.2019 17:16, Eelco Chaudron wrote:
>>>
>>>
>>> On 7 Nov 2019, at 12:36, Ilya Maximets wrote:
>>>
>>>> Until now there was only two options for XDP mode in OVS: SKB or DRV.
>>>> i.e. 'generic XDP' or 'native XDP with zero-copy enabled'.
>>>>
>>>> Devices like 'veth' interfaces in Linux supports native XDP, but
>>>> doesn't support zero-copy mode.  This case can not be covered by
>>>> existing API and we have to use slower generic XDP for such devices.
>>>> There are few more issues, e.g. TCP is not supported in generic XDP
>>>> mode for veth interfaces due to kernel limitations, however it is
>>>> supported in native mode.
>>>>
>>>> This change introduces ability to use native XDP without zero-copy
>>>> along with best-effort configuration option that enabled by default.
>>>> In best-effort case OVS will sequentially try different modes starting
>>>> from the fastest one and will choose the first acceptable for current
>>>> interface.  This will guarantee the best possible performance.
>>>>
>>>> If user will want to choose specific mode, it's still possible by
>>>> setting the 'options:xdp-mode'.
>>>>
>>>> This change additionally changes the API by renaming the configuration
>>>> knob from 'xdpmode' to 'xdp-mode' and also renaming the modes
>>>> themselves to be more user-friendly.
>>>>
>>>> The full list of currently supported modes:
>>>>   * native-with-zerocopy - former DRV
>>>>   * native               - new one, DRV without zero-copy
>>>>   * generic              - former SKB
>>>>   * best-effort          - new one, chooses the best available from
>>>>                            3 above modes
>>>>
>>>> Since 'best-effort' is a default mode, users will not need to
>>>> explicitely set 'xdp-mode' in most cases.
>>>>
>>>> TCP related tests enabled back in system afxdp testsuite, because
>>>> 'best-effort' will choose 'native' mode for veth interfaces
>>>> and this mode has no issues with TCP.
>>> Patch in general looks good, two small comments inline.
>>
>> Thanks for review.
>>
>>>
>>> The only thing that bothers me is the worse performance of the TAP interface with the new default config. Can we somehow keep the old behavior for TAP interfaces?
>>
>> Could you check if TCP works over tap interfaces in generic mode?
>> For me the point is that correctness is better than performance.
>> I also hope that native implementation for tap will be improved
>> over time.
>>
> 
> I agree that we should first make sure correctness.
> I created a simple TCP test using br0, since br0 is a tap device.
> Unfortunately it does not work...
> 
> ---
> ovs-vsctl -- add-br br0 -- set Bridge br0 datapath_type=netdev
> 
> ip netns add at_ns0
> ip link add p0 type veth peer name afxdp-p0
> ip link set p0 netns at_ns0
> ip link set dev afxdp-p0 up
> 
> ovs-vsctl add-port br0 afxdp-p0
> ovs-vsctl -- set interface afxdp-p0 options:n_rxq=1 type="afxdp" options:xdp-mode=native
> 
> ip netns exec at_ns0 sh << NS_EXEC_HEREDOC
> ip addr add "10.1.1.1/24" dev p0
> ip link set dev p0 up
> NS_EXEC_HEREDOC
> 
> ovs-vsctl -- set interface br0 options:n_rxq=1 type="afxdp" options:xdp-mode=native

I'm not sure if this is a valid thing to do.
After changing the netdev type this device is no linger TAP device.
Now it's a permanent tap device that detached from any application
and opened by af_xdp from the other side.

BTW, changing the type of "internal" port doesn't sound safe.

> ip addr add "10.1.1.2/24" dev br0
> ip link set dev br0 up
> 
> # UDP works ok
> ip netns exec at_ns0 nc -u -l 1234
> nc -u 10.1.1.1 1234
> <works ok>
> 
> # TCP still fails
> ip netns exec at_ns0 nc -l 1234
> nc 10.1.1.1 1234
> <no reponse>
>
William Tu Nov. 19, 2019, 8:09 p.m. UTC | #15
On Tue, Nov 19, 2019 at 08:51:19PM +0100, Ilya Maximets wrote:
> On 19.11.2019 20:45, William Tu wrote:
> > On Tue, Nov 19, 2019 at 05:52:22PM +0100, Ilya Maximets wrote:
> >> On 19.11.2019 17:16, Eelco Chaudron wrote:
> >>>
> >>>
> >>> On 7 Nov 2019, at 12:36, Ilya Maximets wrote:
> >>>
> >>>> Until now there was only two options for XDP mode in OVS: SKB or DRV.
> >>>> i.e. 'generic XDP' or 'native XDP with zero-copy enabled'.
> >>>>
> >>>> Devices like 'veth' interfaces in Linux supports native XDP, but
> >>>> doesn't support zero-copy mode.  This case can not be covered by
> >>>> existing API and we have to use slower generic XDP for such devices.
> >>>> There are few more issues, e.g. TCP is not supported in generic XDP
> >>>> mode for veth interfaces due to kernel limitations, however it is
> >>>> supported in native mode.
> >>>>
> >>>> This change introduces ability to use native XDP without zero-copy
> >>>> along with best-effort configuration option that enabled by default.
> >>>> In best-effort case OVS will sequentially try different modes starting
> >>>> from the fastest one and will choose the first acceptable for current
> >>>> interface.  This will guarantee the best possible performance.
> >>>>
> >>>> If user will want to choose specific mode, it's still possible by
> >>>> setting the 'options:xdp-mode'.
> >>>>
> >>>> This change additionally changes the API by renaming the configuration
> >>>> knob from 'xdpmode' to 'xdp-mode' and also renaming the modes
> >>>> themselves to be more user-friendly.
> >>>>
> >>>> The full list of currently supported modes:
> >>>>   * native-with-zerocopy - former DRV
> >>>>   * native               - new one, DRV without zero-copy
> >>>>   * generic              - former SKB
> >>>>   * best-effort          - new one, chooses the best available from
> >>>>                            3 above modes
> >>>>
> >>>> Since 'best-effort' is a default mode, users will not need to
> >>>> explicitely set 'xdp-mode' in most cases.
> >>>>
> >>>> TCP related tests enabled back in system afxdp testsuite, because
> >>>> 'best-effort' will choose 'native' mode for veth interfaces
> >>>> and this mode has no issues with TCP.
> >>> Patch in general looks good, two small comments inline.
> >>
> >> Thanks for review.
> >>
> >>>
> >>> The only thing that bothers me is the worse performance of the TAP interface with the new default config. Can we somehow keep the old behavior for TAP interfaces?
> >>
> >> Could you check if TCP works over tap interfaces in generic mode?
> >> For me the point is that correctness is better than performance.
> >> I also hope that native implementation for tap will be improved
> >> over time.
> >>
> > 
> > I agree that we should first make sure correctness.
> > I created a simple TCP test using br0, since br0 is a tap device.
> > Unfortunately it does not work...
> > 
> > ---
> > ovs-vsctl -- add-br br0 -- set Bridge br0 datapath_type=netdev
> > 
> > ip netns add at_ns0
> > ip link add p0 type veth peer name afxdp-p0
> > ip link set p0 netns at_ns0
> > ip link set dev afxdp-p0 up
> > 
> > ovs-vsctl add-port br0 afxdp-p0
> > ovs-vsctl -- set interface afxdp-p0 options:n_rxq=1 type="afxdp" options:xdp-mode=native
> > 
> > ip netns exec at_ns0 sh << NS_EXEC_HEREDOC
> > ip addr add "10.1.1.1/24" dev p0
> > ip link set dev p0 up
> > NS_EXEC_HEREDOC
> > 
> > ovs-vsctl -- set interface br0 options:n_rxq=1 type="afxdp" options:xdp-mode=native
> 
> I'm not sure if this is a valid thing to do.
> After changing the netdev type this device is no linger TAP device.
> Now it's a permanent tap device that detached from any application
> and opened by af_xdp from the other side.
> 
> BTW, changing the type of "internal" port doesn't sound safe.

I see, thanks.

Then if we don't change br0 to afxdp type, the TCP test
below still fails, which I think should pass because
we are using native mode for veth now.

I think even native mode doesn't work for TCP?

--William
> 
> > ip addr add "10.1.1.2/24" dev br0
> > ip link set dev br0 up
> > 
> > # UDP works ok
> > ip netns exec at_ns0 nc -u -l 1234
> > nc -u 10.1.1.1 1234
> > <works ok>
> > 
> > # TCP still fails
> > ip netns exec at_ns0 nc -l 1234
> > nc 10.1.1.1 1234
> > <no reponse>
> >
Ilya Maximets Nov. 19, 2019, 8:26 p.m. UTC | #16
On 19.11.2019 21:09, William Tu wrote:
> On Tue, Nov 19, 2019 at 08:51:19PM +0100, Ilya Maximets wrote:
>> On 19.11.2019 20:45, William Tu wrote:
>>> On Tue, Nov 19, 2019 at 05:52:22PM +0100, Ilya Maximets wrote:
>>>> On 19.11.2019 17:16, Eelco Chaudron wrote:
>>>>>
>>>>>
>>>>> On 7 Nov 2019, at 12:36, Ilya Maximets wrote:
>>>>>
>>>>>> Until now there was only two options for XDP mode in OVS: SKB or DRV.
>>>>>> i.e. 'generic XDP' or 'native XDP with zero-copy enabled'.
>>>>>>
>>>>>> Devices like 'veth' interfaces in Linux supports native XDP, but
>>>>>> doesn't support zero-copy mode.  This case can not be covered by
>>>>>> existing API and we have to use slower generic XDP for such devices.
>>>>>> There are few more issues, e.g. TCP is not supported in generic XDP
>>>>>> mode for veth interfaces due to kernel limitations, however it is
>>>>>> supported in native mode.
>>>>>>
>>>>>> This change introduces ability to use native XDP without zero-copy
>>>>>> along with best-effort configuration option that enabled by default.
>>>>>> In best-effort case OVS will sequentially try different modes starting
>>>>>> from the fastest one and will choose the first acceptable for current
>>>>>> interface.  This will guarantee the best possible performance.
>>>>>>
>>>>>> If user will want to choose specific mode, it's still possible by
>>>>>> setting the 'options:xdp-mode'.
>>>>>>
>>>>>> This change additionally changes the API by renaming the configuration
>>>>>> knob from 'xdpmode' to 'xdp-mode' and also renaming the modes
>>>>>> themselves to be more user-friendly.
>>>>>>
>>>>>> The full list of currently supported modes:
>>>>>>   * native-with-zerocopy - former DRV
>>>>>>   * native               - new one, DRV without zero-copy
>>>>>>   * generic              - former SKB
>>>>>>   * best-effort          - new one, chooses the best available from
>>>>>>                            3 above modes
>>>>>>
>>>>>> Since 'best-effort' is a default mode, users will not need to
>>>>>> explicitely set 'xdp-mode' in most cases.
>>>>>>
>>>>>> TCP related tests enabled back in system afxdp testsuite, because
>>>>>> 'best-effort' will choose 'native' mode for veth interfaces
>>>>>> and this mode has no issues with TCP.
>>>>> Patch in general looks good, two small comments inline.
>>>>
>>>> Thanks for review.
>>>>
>>>>>
>>>>> The only thing that bothers me is the worse performance of the TAP interface with the new default config. Can we somehow keep the old behavior for TAP interfaces?
>>>>
>>>> Could you check if TCP works over tap interfaces in generic mode?
>>>> For me the point is that correctness is better than performance.
>>>> I also hope that native implementation for tap will be improved
>>>> over time.
>>>>
>>>
>>> I agree that we should first make sure correctness.
>>> I created a simple TCP test using br0, since br0 is a tap device.
>>> Unfortunately it does not work...
>>>
>>> ---
>>> ovs-vsctl -- add-br br0 -- set Bridge br0 datapath_type=netdev
>>>
>>> ip netns add at_ns0
>>> ip link add p0 type veth peer name afxdp-p0
>>> ip link set p0 netns at_ns0
>>> ip link set dev afxdp-p0 up
>>>
>>> ovs-vsctl add-port br0 afxdp-p0
>>> ovs-vsctl -- set interface afxdp-p0 options:n_rxq=1 type="afxdp" options:xdp-mode=native
>>>
>>> ip netns exec at_ns0 sh << NS_EXEC_HEREDOC
>>> ip addr add "10.1.1.1/24" dev p0
>>> ip link set dev p0 up
>>> NS_EXEC_HEREDOC
>>>
>>> ovs-vsctl -- set interface br0 options:n_rxq=1 type="afxdp" options:xdp-mode=native
>>
>> I'm not sure if this is a valid thing to do.
>> After changing the netdev type this device is no linger TAP device.
>> Now it's a permanent tap device that detached from any application
>> and opened by af_xdp from the other side.
>>
>> BTW, changing the type of "internal" port doesn't sound safe.
> 
> I see, thanks.
> 
> Then if we don't change br0 to afxdp type, the TCP test
> below still fails, which I think should pass because
> we are using native mode for veth now.
> 
> I think even native mode doesn't work for TCP?

Disable Tx offloading on both p0 and br0 and it will work.

> 
> --William
>>
>>> ip addr add "10.1.1.2/24" dev br0
>>> ip link set dev br0 up
>>>
>>> # UDP works ok
>>> ip netns exec at_ns0 nc -u -l 1234
>>> nc -u 10.1.1.1 1234
>>> <works ok>
>>>
>>> # TCP still fails
>>> ip netns exec at_ns0 nc -l 1234
>>> nc 10.1.1.1 1234
>>> <no reponse>
>>>
William Tu Nov. 19, 2019, 8:31 p.m. UTC | #17
On Tue, Nov 19, 2019 at 12:09:02PM -0800, William Tu wrote:
> On Tue, Nov 19, 2019 at 08:51:19PM +0100, Ilya Maximets wrote:
> > On 19.11.2019 20:45, William Tu wrote:
> > > On Tue, Nov 19, 2019 at 05:52:22PM +0100, Ilya Maximets wrote:
> > >> On 19.11.2019 17:16, Eelco Chaudron wrote:
> > >>>
> > >>>
> > >>> On 7 Nov 2019, at 12:36, Ilya Maximets wrote:
> > >>>
> > >>>> Until now there was only two options for XDP mode in OVS: SKB or DRV.
> > >>>> i.e. 'generic XDP' or 'native XDP with zero-copy enabled'.
> > >>>>
> > >>>> Devices like 'veth' interfaces in Linux supports native XDP, but
> > >>>> doesn't support zero-copy mode.  This case can not be covered by
> > >>>> existing API and we have to use slower generic XDP for such devices.
> > >>>> There are few more issues, e.g. TCP is not supported in generic XDP
> > >>>> mode for veth interfaces due to kernel limitations, however it is
> > >>>> supported in native mode.
> > >>>>
> > >>>> This change introduces ability to use native XDP without zero-copy
> > >>>> along with best-effort configuration option that enabled by default.
> > >>>> In best-effort case OVS will sequentially try different modes starting
> > >>>> from the fastest one and will choose the first acceptable for current
> > >>>> interface.  This will guarantee the best possible performance.
> > >>>>
> > >>>> If user will want to choose specific mode, it's still possible by
> > >>>> setting the 'options:xdp-mode'.
> > >>>>
> > >>>> This change additionally changes the API by renaming the configuration
> > >>>> knob from 'xdpmode' to 'xdp-mode' and also renaming the modes
> > >>>> themselves to be more user-friendly.
> > >>>>
> > >>>> The full list of currently supported modes:
> > >>>>   * native-with-zerocopy - former DRV
> > >>>>   * native               - new one, DRV without zero-copy
> > >>>>   * generic              - former SKB
> > >>>>   * best-effort          - new one, chooses the best available from
> > >>>>                            3 above modes
> > >>>>
> > >>>> Since 'best-effort' is a default mode, users will not need to
> > >>>> explicitely set 'xdp-mode' in most cases.
> > >>>>
> > >>>> TCP related tests enabled back in system afxdp testsuite, because
> > >>>> 'best-effort' will choose 'native' mode for veth interfaces
> > >>>> and this mode has no issues with TCP.
> > >>> Patch in general looks good, two small comments inline.
> > >>
> > >> Thanks for review.
> > >>
> > >>>
> > >>> The only thing that bothers me is the worse performance of the TAP interface with the new default config. Can we somehow keep the old behavior for TAP interfaces?
> > >>
> > >> Could you check if TCP works over tap interfaces in generic mode?
> > >> For me the point is that correctness is better than performance.
> > >> I also hope that native implementation for tap will be improved
> > >> over time.
> > >>
> > > 
> > > I agree that we should first make sure correctness.
> > > I created a simple TCP test using br0, since br0 is a tap device.
> > > Unfortunately it does not work...
> > > 
> > > ---
> > > ovs-vsctl -- add-br br0 -- set Bridge br0 datapath_type=netdev
> > > 
> > > ip netns add at_ns0
> > > ip link add p0 type veth peer name afxdp-p0
> > > ip link set p0 netns at_ns0
> > > ip link set dev afxdp-p0 up
> > > 
> > > ovs-vsctl add-port br0 afxdp-p0
> > > ovs-vsctl -- set interface afxdp-p0 options:n_rxq=1 type="afxdp" options:xdp-mode=native
> > > 
> > > ip netns exec at_ns0 sh << NS_EXEC_HEREDOC
> > > ip addr add "10.1.1.1/24" dev p0
> > > ip link set dev p0 up
> > > NS_EXEC_HEREDOC
> > > 
> > > ovs-vsctl -- set interface br0 options:n_rxq=1 type="afxdp" options:xdp-mode=native
> > 
> > I'm not sure if this is a valid thing to do.
> > After changing the netdev type this device is no linger TAP device.
> > Now it's a permanent tap device that detached from any application
> > and opened by af_xdp from the other side.
> > 
> > BTW, changing the type of "internal" port doesn't sound safe.
> 
> I see, thanks.
> 
> Then if we don't change br0 to afxdp type, the TCP test
> below still fails, which I think should pass because
> we are using native mode for veth now.
> 
> I think even native mode doesn't work for TCP?

I forgot to turn off tx offload
after doing
ip netns exec at_ns0 ethtool -K p0 tx off

TCP works fine.
William
William Tu Nov. 19, 2019, 8:38 p.m. UTC | #18
On Tue, Nov 19, 2019 at 09:26:40PM +0100, Ilya Maximets wrote:
> On 19.11.2019 21:09, William Tu wrote:
> > On Tue, Nov 19, 2019 at 08:51:19PM +0100, Ilya Maximets wrote:
> >> On 19.11.2019 20:45, William Tu wrote:
> >>> On Tue, Nov 19, 2019 at 05:52:22PM +0100, Ilya Maximets wrote:
> >>>> On 19.11.2019 17:16, Eelco Chaudron wrote:
> >>>>>
> >>>>>
> >>>>> On 7 Nov 2019, at 12:36, Ilya Maximets wrote:
> >>>>>
> >>>>>> Until now there was only two options for XDP mode in OVS: SKB or DRV.
> >>>>>> i.e. 'generic XDP' or 'native XDP with zero-copy enabled'.
> >>>>>>
> >>>>>> Devices like 'veth' interfaces in Linux supports native XDP, but
> >>>>>> doesn't support zero-copy mode.  This case can not be covered by
> >>>>>> existing API and we have to use slower generic XDP for such devices.
> >>>>>> There are few more issues, e.g. TCP is not supported in generic XDP
> >>>>>> mode for veth interfaces due to kernel limitations, however it is
> >>>>>> supported in native mode.
> >>>>>>
> >>>>>> This change introduces ability to use native XDP without zero-copy
> >>>>>> along with best-effort configuration option that enabled by default.
> >>>>>> In best-effort case OVS will sequentially try different modes starting
> >>>>>> from the fastest one and will choose the first acceptable for current
> >>>>>> interface.  This will guarantee the best possible performance.
> >>>>>>
> >>>>>> If user will want to choose specific mode, it's still possible by
> >>>>>> setting the 'options:xdp-mode'.
> >>>>>>
> >>>>>> This change additionally changes the API by renaming the configuration
> >>>>>> knob from 'xdpmode' to 'xdp-mode' and also renaming the modes
> >>>>>> themselves to be more user-friendly.
> >>>>>>
> >>>>>> The full list of currently supported modes:
> >>>>>>   * native-with-zerocopy - former DRV
> >>>>>>   * native               - new one, DRV without zero-copy
> >>>>>>   * generic              - former SKB
> >>>>>>   * best-effort          - new one, chooses the best available from
> >>>>>>                            3 above modes
> >>>>>>
> >>>>>> Since 'best-effort' is a default mode, users will not need to
> >>>>>> explicitely set 'xdp-mode' in most cases.
> >>>>>>
> >>>>>> TCP related tests enabled back in system afxdp testsuite, because
> >>>>>> 'best-effort' will choose 'native' mode for veth interfaces
> >>>>>> and this mode has no issues with TCP.
> >>>>> Patch in general looks good, two small comments inline.
> >>>>
> >>>> Thanks for review.
> >>>>
> >>>>>
> >>>>> The only thing that bothers me is the worse performance of the TAP interface with the new default config. Can we somehow keep the old behavior for TAP interfaces?
> >>>>
> >>>> Could you check if TCP works over tap interfaces in generic mode?
> >>>> For me the point is that correctness is better than performance.
> >>>> I also hope that native implementation for tap will be improved
> >>>> over time.
> >>>>
> >>>
> >>> I agree that we should first make sure correctness.
> >>> I created a simple TCP test using br0, since br0 is a tap device.
> >>> Unfortunately it does not work...
> >>>
> >>> ---
> >>> ovs-vsctl -- add-br br0 -- set Bridge br0 datapath_type=netdev
> >>>
> >>> ip netns add at_ns0
> >>> ip link add p0 type veth peer name afxdp-p0
> >>> ip link set p0 netns at_ns0
> >>> ip link set dev afxdp-p0 up
> >>>
> >>> ovs-vsctl add-port br0 afxdp-p0
> >>> ovs-vsctl -- set interface afxdp-p0 options:n_rxq=1 type="afxdp" options:xdp-mode=native
> >>>
> >>> ip netns exec at_ns0 sh << NS_EXEC_HEREDOC
> >>> ip addr add "10.1.1.1/24" dev p0
> >>> ip link set dev p0 up
> >>> NS_EXEC_HEREDOC
> >>>
> >>> ovs-vsctl -- set interface br0 options:n_rxq=1 type="afxdp" options:xdp-mode=native
> >>
> >> I'm not sure if this is a valid thing to do.
> >> After changing the netdev type this device is no linger TAP device.
> >> Now it's a permanent tap device that detached from any application
> >> and opened by af_xdp from the other side.
> >>
> >> BTW, changing the type of "internal" port doesn't sound safe.
> > 
> > I see, thanks.
> > 
> > Then if we don't change br0 to afxdp type, the TCP test
> > below still fails, which I think should pass because
> > we are using native mode for veth now.
> > 
> > I think even native mode doesn't work for TCP?
> 
> Disable Tx offloading on both p0 and br0 and it will work.

So is it because when doing tx, af_xdp dev does not support
tcp checksum offload, so kernel is dropping tcp packet?

William
Eelco Chaudron Nov. 20, 2019, 7:35 a.m. UTC | #19
On 19 Nov 2019, at 17:52, Ilya Maximets wrote:

> On 19.11.2019 17:16, Eelco Chaudron wrote:
>>
>>
>> On 7 Nov 2019, at 12:36, Ilya Maximets wrote:
>>
>>> Until now there was only two options for XDP mode in OVS: SKB or 
>>> DRV.
>>> i.e. 'generic XDP' or 'native XDP with zero-copy enabled'.
>>>
>>> Devices like 'veth' interfaces in Linux supports native XDP, but
>>> doesn't support zero-copy mode.  This case can not be covered by
>>> existing API and we have to use slower generic XDP for such devices.
>>> There are few more issues, e.g. TCP is not supported in generic XDP
>>> mode for veth interfaces due to kernel limitations, however it is
>>> supported in native mode.
>>>
>>> This change introduces ability to use native XDP without zero-copy
>>> along with best-effort configuration option that enabled by default.
>>> In best-effort case OVS will sequentially try different modes 
>>> starting
>>> from the fastest one and will choose the first acceptable for 
>>> current
>>> interface.  This will guarantee the best possible performance.
>>>
>>> If user will want to choose specific mode, it's still possible by
>>> setting the 'options:xdp-mode'.
>>>
>>> This change additionally changes the API by renaming the 
>>> configuration
>>> knob from 'xdpmode' to 'xdp-mode' and also renaming the modes
>>> themselves to be more user-friendly.
>>>
>>> The full list of currently supported modes:
>>>   * native-with-zerocopy - former DRV
>>>   * native               - new one, DRV without 
>>> zero-copy
>>>   * generic              - former SKB
>>>   * best-effort          - new one, chooses the best 
>>> available from
>>>                            3 above modes
>>>
>>> Since 'best-effort' is a default mode, users will not need to
>>> explicitely set 'xdp-mode' in most cases.
>>>
>>> TCP related tests enabled back in system afxdp testsuite, because
>>> 'best-effort' will choose 'native' mode for veth interfaces
>>> and this mode has no issues with TCP.
>> Patch in general looks good, two small comments inline.
>
> Thanks for review.
>
>>
>> The only thing that bothers me is the worse performance of the TAP 
>> interface with the new default config. Can we somehow keep the old 
>> behavior for TAP interfaces?
>
> Could you check if TCP works over tap interfaces in generic mode?
> For me the point is that correctness is better than performance.
> I also hope that native implementation for tap will be improved
> over time.

So if I understood your email chain with William correctly TCP is not 
working, so I affray correctness is better than performance.

Then this patch looks fine to see (see also comments below):

Acked-by: Eelco Chaudron <echaudro@redhat.com>

>>
>>> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
>>> ---
>>>
>>> With this patch I modified the user-visible API, but I think it's OK
>>> since it's still an experimental netdev.  Comments are welcome.
>>>
>>> Version 2:
>>>   * Rebased on current master.
>>>
>>>  Documentation/intro/install/afxdp.rst |  54 ++++---
>>>  NEWS                                  
>>> |  12 +-
>>>  lib/netdev-afxdp.c                    | 223 
>>> ++++++++++++++++----------
>>>  lib/netdev-afxdp.h                    |   9 
>>> ++
>>>  lib/netdev-linux-private.h            |   8 +-
>>>  tests/system-afxdp-macros.at          |   7 -
>>>  vswitchd/vswitch.xml                  |  38 
>>> +++--
>>>  7 files changed, 227 insertions(+), 124 deletions(-)
>>>
>>> diff --git a/Documentation/intro/install/afxdp.rst 
>>> b/Documentation/intro/install/afxdp.rst
>>> index a136db0c9..937770ad0 100644
>>> --- a/Documentation/intro/install/afxdp.rst
>>> +++ b/Documentation/intro/install/afxdp.rst
>>> @@ -153,9 +153,8 @@ To kick start end-to-end autotesting::
>>>    make check-afxdp TESTSUITEFLAGS='1'
>>>
>>>  .. note::
>>> -   Not all test cases pass at this time. Currenly all TCP related
>>> -   tests, ex: using wget or http, are skipped due to XDP 
>>> limitations
>>> -   on veth. cvlan test is also skipped.
>>> +   Not all test cases pass at this time. Currenly all cvlan tests 
>>> are skipped
>>> +   due to kernel issues.
>>>
>>>  If a test case fails, check the log at::
>>>
>>> @@ -177,33 +176,35 @@ in :doc:`general`::
>>>    ovs-vsctl -- add-br br0 -- set Bridge br0 datapath_type=netdev
>>>
>>>  Make sure your device driver support AF_XDP, netdev-afxdp supports
>>> -the following additional options (see man ovs-vswitchd.conf.db for
>>> +the following additional options (see ``man ovs-vswitchd.conf.db`` 
>>> for
>>>  more details):
>>>
>>> - * **xdpmode**: use "drv" for driver mode, or "skb" for skb mode.
>>> + * ``xdp-mode``: ``best-effort``, ``native-with-zerocopy``,
>>> +   ``native`` or ``generic``.  Defaults to ``best-effort``, i.e. 
>>> best of
>>> +   supported modes, so in most cases you don't need to change it.
>>>
>>> - * **use-need-wakeup**: default "true" if libbpf supports it, 
>>> otherwise false.
>>> + * ``use-need-wakeup``: default ``true`` if libbpf supports it,
>>> +   otherwise ``false``.
>>>
>>>  For example, to use 1 PMD (on core 4) on 1 queue (queue 0) device,
>>> -configure these options: **pmd-cpu-mask, pmd-rxq-affinity, and 
>>> n_rxq**.
>>> -The **xdpmode** can be "drv" or "skb"::
>>> +configure these options: ``pmd-cpu-mask``, ``pmd-rxq-affinity``, 
>>> and
>>> +``n_rxq``::
>>>
>>>    ethtool -L enp2s0 combined 1
>>>    ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x10
>>>    ovs-vsctl add-port br0 enp2s0 -- set interface enp2s0 
>>> type="afxdp" \
>>> -    options:n_rxq=1 options:xdpmode=drv \
>>> -    other_config:pmd-rxq-affinity="0:4"
>>> +                                   
>>> other_config:pmd-rxq-affinity="0:4"
>>>
>>>  Or, use 4 pmds/cores and 4 queues by doing::
>>>
>>>    ethtool -L enp2s0 combined 4
>>>    ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x36
>>>    ovs-vsctl add-port br0 enp2s0 -- set interface enp2s0 
>>> type="afxdp" \
>>> -    options:n_rxq=4 options:xdpmode=drv \
>>> -    other_config:pmd-rxq-affinity="0:1,1:2,2:3,3:4"
>>> +    options:n_rxq=4 
>>> other_config:pmd-rxq-affinity="0:1,1:2,2:3,3:4"
>>>
>>>  .. note::
>>> -   pmd-rxq-affinity is optional. If not specified, system will 
>>> auto-assign.
>>> +   ``pmd-rxq-affinity`` is optional. If not specified, system 
>>> will auto-assign.
>>> +   ``n_rxq`` equals ``1`` by default.
>>>
>>>  To validate that the bridge has successfully instantiated, you can 
>>> use the::
>>>
>>> @@ -214,12 +215,21 @@ Should show something like::
>>>    Port "ens802f0"
>>>     Interface "ens802f0"
>>>        type: afxdp
>>> -      options: {n_rxq="1", xdpmode=drv}
>>> +      options: {n_rxq="1"}
>>>
>>>  Otherwise, enable debugging by::
>>>
>>>    ovs-appctl vlog/set netdev_afxdp::dbg
>>>
>>> +To check which XDP mode was chosen by ``best-effort``, you can look 
>>> for
>>> +``xdp-mode-in-use`` in the output of ``ovs-appctl dpctl/show``::
>>> +
>>> +  # ovs-appctl dpctl/show
>>> +  netdev@ovs-netdev:
>>> +    <...>
>>> +    port 2: ens802f0 (afxdp: n_rxq=1, use-need-wakeup=true,
>>> +                      xdp-mode=best-effort,
>>> +                      
>>> xdp-mode-in-use=native-with-zerocopy)
>>>
>>>  References
>>>  ----------
>>> @@ -323,8 +333,11 @@ Limitations/Known Issues
>>>  #. Most of the tests are done using i40e single port. Multiple 
>>> ports and
>>>     also ixgbe driver also needs to be tested.
>>>  #. No latency test result (TODO items)
>>> -#. Due to limitations of current upstream kernel, TCP and various 
>>> offloading
>>> +#. Due to limitations of current upstream kernel, various 
>>> offloading
>>>     (vlan, cvlan) is not working over virtual interfaces (i.e. 
>>> veth pair).
>>> +   Also, TCP is not working over virtual interfaces in generic 
>>> XDP mode.
>>> +   Some more information and possible workaround available `here
>>> +   
>>> <https://github.com/cilium/cilium/issues/3077#issuecomment-430801467>`__ 
>>> .
>>>
>>>
>>>  PVP using tap device
>>> @@ -335,8 +348,7 @@ First, start OVS, then add physical port::
>>>    ethtool -L enp2s0 combined 1
>>>    ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x10
>>>    ovs-vsctl add-port br0 enp2s0 -- set interface enp2s0 
>>> type="afxdp" \
>>> -    options:n_rxq=1 options:xdpmode=drv \
>>> -    other_config:pmd-rxq-affinity="0:4"
>>> +    options:n_rxq=1 other_config:pmd-rxq-affinity="0:4"
>>>
>>>  Start a VM with virtio and tap device::
>>>
>>> @@ -414,13 +426,11 @@ Create namespace and veth peer devices::
>>>
>>>  Attach the veth port to br0 (linux kernel mode)::
>>>
>>> -  ovs-vsctl add-port br0 afxdp-p0 -- \
>>> -    set interface afxdp-p0 options:n_rxq=1
>>> +  ovs-vsctl add-port br0 afxdp-p0 -- set interface afxdp-p0
>>>
>>> -Or, use AF_XDP with skb mode::
>>> +Or, use AF_XDP::
>>>
>>> -  ovs-vsctl add-port br0 afxdp-p0 -- \
>>> -    set interface afxdp-p0 type="afxdp" options:n_rxq=1 
>>> options:xdpmode=skb
>>> +  ovs-vsctl add-port br0 afxdp-p0 -- set interface afxdp-p0 
>>> type="afxdp"
>>>
>>>  Setup the OpenFlow rules::
>>>
>>> diff --git a/NEWS b/NEWS
>>> index 88b818948..d5f476d6e 100644
>>> --- a/NEWS
>>> +++ b/NEWS
>>> @@ -5,11 +5,19 @@ Post-v2.12.0
>>>         separate project. You can find it at
>>>         https://github.com/ovn-org/ovn.git
>>>     - Userspace datapath:
>>> +     * Add option to enable, disable and query TCP sequence 
>>> checking in
>>> +       conntrack.
>>> +   - AF_XDP:
>>>       * New option 'use-need-wakeup' for netdev-afxdp to 
>>> control enabling
>>>         of corresponding 'need_wakeup' flag in AF_XDP 
>>> rings.  Enabled by default
>>>         if supported by libbpf.
>>> -     * Add option to enable, disable and query TCP sequence 
>>> checking in
>>> -       conntrack.
>>> +     * 'xdpmode' option for netdev-afxdp renamed to 'xdp-mode'.
>>> +       Modes also updated.  New values:
>>> +         native-with-zerocopy  - former DRV
>>> +         native                - new one, 
>>> DRV without zero-copy
>>> +         generic               - former SKB
>>> +         best-effort [default] - new one, chooses the best 
>>> available from
>>> +                                 3 
>>> above modes
>>>
>>>  v2.12.0 - 03 Sep 2019
>>>  ---------------------
>>> diff --git a/lib/netdev-afxdp.c b/lib/netdev-afxdp.c
>>> index af654d498..74dde219d 100644
>>> --- a/lib/netdev-afxdp.c
>>> +++ b/lib/netdev-afxdp.c
>>> @@ -89,12 +89,42 @@ BUILD_ASSERT_DECL(PROD_NUM_DESCS == 
>>> CONS_NUM_DESCS);
>>>  #define UMEM2DESC(elem, base) ((uint64_t)((char *)elem - (char 
>>> *)base))
>>>
>>>  static struct xsk_socket_info *xsk_configure(int ifindex, int 
>>> xdp_queue_id,
>>> -                                             
>>> int mode, bool use_need_wakeup);
>>> -static void xsk_remove_xdp_program(uint32_t ifindex, int xdpmode);
>>> +                                             
>>> enum afxdp_mode mode,
>>> +                                             
>>> bool use_need_wakeup,
>>> +                                             
>>> bool report_socket_failures);
>>> +static void xsk_remove_xdp_program(uint32_t ifindex, enum 
>>> afxdp_mode);
>>>  static void xsk_destroy(struct xsk_socket_info *xsk);
>>>  static int xsk_configure_all(struct netdev *netdev);
>>>  static void xsk_destroy_all(struct netdev *netdev);
>>>
>>> +static struct {
>>> +    const char *name;
>>> +    uint32_t bind_flags;
>>> +    uint32_t xdp_flags;
>>> +} xdp_modes[] = {
>>> +    [OVS_AF_XDP_MODE_UNSPEC] = {
>>> +        .name = "unspecified", .bind_flags = 0, .xdp_flags = 
>>> 0,
>>> +    },
>>> +    [OVS_AF_XDP_MODE_BEST_EFFORT] = {
>>> +        .name = "best-effort", .bind_flags = 0, .xdp_flags = 
>>> 0,
>>> +    },
>>
>> <nitpicking> Maybe format the two entries above the same as below, as 
>> it’s more pleasing to my OCD brain </nitpicking>
>
> Not a big deal for me.  I just wanted to save some space.
> Could be changed before applying.

Thanks
>>
>>> +    [OVS_AF_XDP_MODE_NATIVE_ZC] = {
>>> +        .name = "native-with-zerocopy",
>>> +        .bind_flags = XDP_ZEROCOPY,
>>> +        .xdp_flags = XDP_FLAGS_DRV_MODE,
>>> +    },
>>> +    [OVS_AF_XDP_MODE_NATIVE] = {
>>> +        .name = "native",
>>> +        .bind_flags = XDP_COPY,
>>> +        .xdp_flags = XDP_FLAGS_DRV_MODE,
>>> +    },
>>> +    [OVS_AF_XDP_MODE_GENERIC] = {
>>> +        .name = "generic",
>>> +        .bind_flags = XDP_COPY,
>>> +        .xdp_flags = XDP_FLAGS_SKB_MODE,
>>> +    },
>>> +};
>>> +
>>>  struct unused_pool {
>>>      struct xsk_umem_info *umem_info;
>>>      int lost_in_rings; /* Number of packets left in tx, rx, cq 
>>> and fq. */
>>> @@ -214,7 +244,7 @@ netdev_afxdp_sweep_unused_pools(void *aux 
>>> OVS_UNUSED)
>>>  }
>>>
>>>  static struct xsk_umem_info *
>>> -xsk_configure_umem(void *buffer, uint64_t size, int xdpmode)
>>> +xsk_configure_umem(void *buffer, uint64_t size)
>>>  {
>>>      struct xsk_umem_config uconfig;
>>>      struct xsk_umem_info *umem;
>>> @@ -232,9 +262,7 @@ xsk_configure_umem(void *buffer, uint64_t size, 
>>> int xdpmode)
>>>      ret = xsk_umem__create(&umem->umem, buffer, size, 
>>> &umem->fq, &umem->cq,
>>>                             &uconfig);
>>>      if (ret) {
>>> -        VLOG_ERR("xsk_umem__create failed (%s) mode: %s",
>>> -                 ovs_strerror(errno),
>>> -                 xdpmode == XDP_COPY ? "SKB": 
>>> "DRV");
>>> +        VLOG_ERR("xsk_umem__create failed: %s.", 
>>> ovs_strerror(errno));
>>>          free(umem);
>>>          return NULL;
>>>      }
>>> @@ -290,7 +318,8 @@ xsk_configure_umem(void *buffer, uint64_t size, 
>>> int xdpmode)
>>>
>>>  static struct xsk_socket_info *
>>>  xsk_configure_socket(struct xsk_umem_info *umem, uint32_t ifindex,
>>> -                     uint32_t queue_id, int 
>>> xdpmode, bool use_need_wakeup)
>>> +                     uint32_t queue_id, enum 
>>> afxdp_mode mode,
>>> +                     bool use_need_wakeup, bool 
>>> report_socket_failures)
>>>  {
>>>      struct xsk_socket_config cfg;
>>>      struct xsk_socket_info *xsk;
>>> @@ -304,14 +333,8 @@ xsk_configure_socket(struct xsk_umem_info 
>>> *umem, uint32_t ifindex,
>>>      cfg.rx_size = CONS_NUM_DESCS;
>>>      cfg.tx_size = PROD_NUM_DESCS;
>>>      cfg.libbpf_flags = 0;
>>> -
>>> -    if (xdpmode == XDP_ZEROCOPY) {
>>> -        cfg.bind_flags = XDP_ZEROCOPY;
>>> -        cfg.xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST | 
>>> XDP_FLAGS_DRV_MODE;
>>> -    } else {
>>> -        cfg.bind_flags = XDP_COPY;
>>> -        cfg.xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST | 
>>> XDP_FLAGS_SKB_MODE;
>>> -    }
>>> +    cfg.bind_flags = xdp_modes[mode].bind_flags;
>>> +    cfg.xdp_flags = xdp_modes[mode].xdp_flags | 
>>> XDP_FLAGS_UPDATE_IF_NOEXIST;
>>>
>>>  #ifdef HAVE_XDP_NEED_WAKEUP
>>>      if (use_need_wakeup) {
>>> @@ -329,12 +352,11 @@ xsk_configure_socket(struct xsk_umem_info 
>>> *umem, uint32_t ifindex,
>>>      ret = xsk_socket__create(&xsk->xsk, devname, queue_id, 
>>> umem->umem,
>>>                               &xsk->rx, 
>>> &xsk->tx, &cfg);
>>>      if (ret) {
>>> -        VLOG_ERR("xsk_socket__create failed (%s) mode: %s "
>>> -                 "use-need-wakeup: %s qid: %d",
>>> -                 ovs_strerror(errno),
>>> -                 xdpmode == XDP_COPY ? "SKB": 
>>> "DRV",
>>> -                 use_need_wakeup ? "true" : 
>>> "false",
>>> -                 queue_id);
>>> +        VLOG(report_socket_failures ? VLL_ERR : VLL_DBG,
>>> +             "xsk_socket__create failed (%s) mode: %s, 
>>> "
>>> +             "use-need-wakeup: %s, qid: %d",
>>> +             ovs_strerror(errno), xdp_modes[mode].name,
>>> +             use_need_wakeup ? "true" : "false", 
>>> queue_id);
>>>          free(xsk);
>>>          return NULL;
>>>      }
>>> @@ -375,8 +397,8 @@ xsk_configure_socket(struct xsk_umem_info *umem, 
>>> uint32_t ifindex,
>>>  }
>>>
>>>  static struct xsk_socket_info *
>>> -xsk_configure(int ifindex, int xdp_queue_id, int xdpmode,
>>> -              bool use_need_wakeup)
>>> +xsk_configure(int ifindex, int xdp_queue_id, enum afxdp_mode mode,
>>> +              bool use_need_wakeup, bool 
>>> report_socket_failures)
>>>  {
>>>      struct xsk_socket_info *xsk;
>>>      struct xsk_umem_info *umem;
>>> @@ -389,9 +411,7 @@ xsk_configure(int ifindex, int xdp_queue_id, int 
>>> xdpmode,
>>>      memset(bufs, 0, NUM_FRAMES * FRAME_SIZE);
>>>
>>>      /* Create AF_XDP socket. */
>>> -    umem = xsk_configure_umem(bufs,
>>> -                              
>>> NUM_FRAMES * FRAME_SIZE,
>>> -                              
>>> xdpmode);
>>> +    umem = xsk_configure_umem(bufs, NUM_FRAMES * FRAME_SIZE);
>>>      if (!umem) {
>>>          free_pagealign(bufs);
>>>          return NULL;
>>> @@ -399,8 +419,8 @@ xsk_configure(int ifindex, int xdp_queue_id, int 
>>> xdpmode,
>>>
>>>      VLOG_DBG("Allocated umem pool at 0x%"PRIxPTR, (uintptr_t) 
>>> umem);
>>>
>>> -    xsk = xsk_configure_socket(umem, ifindex, xdp_queue_id, 
>>> xdpmode,
>>> -                               
>>> use_need_wakeup);
>>> +    xsk = xsk_configure_socket(umem, ifindex, xdp_queue_id, 
>>> mode,
>>> +                               
>>> use_need_wakeup, report_socket_failures);
>>>      if (!xsk) {
>>>          /* Clean up umem and xpacket pool. */
>>>          if (xsk_umem__delete(umem->umem)) {
>>> @@ -414,12 +434,38 @@ xsk_configure(int ifindex, int xdp_queue_id, 
>>> int xdpmode,
>>>      return xsk;
>>>  }
>>>
>>> +static int
>>> +xsk_configure_queue(struct netdev_linux *dev, int ifindex, int 
>>> queue_id,
>>> +                    enum afxdp_mode mode, bool 
>>> report_socket_failures)
>>> +{
>>> +    struct xsk_socket_info *xsk_info;
>>> +
>>> +    VLOG_DBG("%s: configuring queue: %d, mode: %s, 
>>> use-need-wakeup: %s.",
>>> +             netdev_get_name(&dev->up), queue_id, 
>>> xdp_modes[mode].name,
>>> +             dev->use_need_wakeup ? "true" : "false");
>>> +    xsk_info = xsk_configure(ifindex, queue_id, mode, 
>>> dev->use_need_wakeup,
>>> +                             
>>> report_socket_failures);
>>> +    if (!xsk_info) {
>>> +        VLOG(report_socket_failures ? VLL_ERR : VLL_DBG,
>>> +             "%s: Failed to create AF_XDP socket on 
>>> queue %d in %s mode.",
>>> +             netdev_get_name(&dev->up), queue_id, 
>>> xdp_modes[mode].name);
>>> +        dev->xsks[queue_id] = NULL;
>>> +        return -1;
>>> +    }
>>> +    dev->xsks[queue_id] = xsk_info;
>>> +    atomic_init(&xsk_info->tx_dropped, 0);
>>> +    xsk_info->outstanding_tx = 0;
>>> +    xsk_info->available_rx = PROD_NUM_DESCS;
>>> +    return 0;
>>> +}
>>> +
>>> +
>>>  static int
>>>  xsk_configure_all(struct netdev *netdev)
>>>  {
>>>      struct netdev_linux *dev = netdev_linux_cast(netdev);
>>> -    struct xsk_socket_info *xsk_info;
>>>      int i, ifindex, n_rxq, n_txq;
>>> +    int qid = 0;
>>>
>>>      ifindex = linux_get_ifindex(netdev_get_name(netdev));
>>>
>>> @@ -429,23 +475,36 @@ xsk_configure_all(struct netdev *netdev)
>>>      n_rxq = netdev_n_rxq(netdev);
>>>      dev->xsks = xcalloc(n_rxq, sizeof *dev->xsks);
>>>
>>> -    /* Configure each queue. */
>>> -    for (i = 0; i < n_rxq; i++) {
>>> -        VLOG_DBG("%s: configure queue %d mode %s 
>>> use-need-wakeup %s.",
>>> -                 netdev_get_name(netdev), i,
>>> -                 dev->xdpmode == XDP_COPY ? "SKB" : 
>>> "DRV",
>>> -                 dev->use_need_wakeup ? "true" : 
>>> "false");
>>> -        xsk_info = xsk_configure(ifindex, i, dev->xdpmode,
>>> -                                 
>>> dev->use_need_wakeup);
>>> -        if (!xsk_info) {
>>> -            VLOG_ERR("Failed to create AF_XDP socket on 
>>> queue %d.", i);
>>> -            dev->xsks[i] = NULL;
>>> +    if (dev->xdp_mode == OVS_AF_XDP_MODE_BEST_EFFORT) {
>>> +        /* Trying to configure first queue with different 
>>> modes to
>>> +         * find the most suitable. */
>>> +        for (i = OVS_AF_XDP_MODE_NATIVE_ZC; i < 
>>> OVS_AF_XDP_MODE_MAX; i++) {
>>> +            if (!xsk_configure_queue(dev, ifindex, qid, 
>>> i,
>>> +                                     
>>> i == OVS_AF_XDP_MODE_MAX - 1)) {
>>> +                dev->xdp_mode_in_use = i;
>>> +                VLOG_INFO("%s: %s XDP mode will be 
>>> in use.",
>>> +                          
>>> netdev_get_name(netdev), xdp_modes[i].name);
>>> +                break;
>>> +            }
>>> +        }
>>
>> What if all modes fail? Guess it will use empty flags in 
>> xsk_configure_socket(). I would suggest to do a check in 
>> xsk_configure_socket() and bailout with a specific error?
>
> What about the code on the next line?
> It is here exactly for that purpose.  Am I missing something?

No, you are not missing anything, it was my code blindness that kicked 
in ;)

>>
>>> +        if (i == OVS_AF_XDP_MODE_MAX) {
>>> +            VLOG_ERR("%s: Failed to detect suitable XDP 
>>> mode.",
>>> +                     netdev_get_name(netdev));
>>> +            goto err;
>>> +        }
>>> +        qid++;
>>> +    } else {
>>> +        dev->xdp_mode_in_use = dev->xdp_mode;
>>> +    }
>>> +
>>> +    /* Configure remaining queues. */
>>> +    for (; qid < n_rxq; qid++) {
>>> +        if (xsk_configure_queue(dev, ifindex, qid,
>>> +                                
>>> dev->xdp_mode_in_use, true)) {
>>> +            VLOG_ERR("%s: Failed to create AF_XDP socket 
>>> on queue %d.",
>>> +                     netdev_get_name(netdev), 
>>> qid);
>>>              goto err;
>>>          }
>>> -        dev->xsks[i] = xsk_info;
>>> -        atomic_init(&xsk_info->tx_dropped, 0);
>>> -        xsk_info->outstanding_tx = 0;
>>> -        xsk_info->available_rx = PROD_NUM_DESCS;
>>>      }
>
> <snip>
Ilya Maximets Nov. 20, 2019, 4:43 p.m. UTC | #20
On 20.11.2019 8:35, Eelco Chaudron wrote:
> 
> 
> On 19 Nov 2019, at 17:52, Ilya Maximets wrote:
> 
>> On 19.11.2019 17:16, Eelco Chaudron wrote:
>>>
>>>
>>> On 7 Nov 2019, at 12:36, Ilya Maximets wrote:
>>>
>>>> Until now there was only two options for XDP mode in OVS: SKB or DRV.
>>>> i.e. 'generic XDP' or 'native XDP with zero-copy enabled'.
>>>>
>>>> Devices like 'veth' interfaces in Linux supports native XDP, but
>>>> doesn't support zero-copy mode.  This case can not be covered by
>>>> existing API and we have to use slower generic XDP for such devices.
>>>> There are few more issues, e.g. TCP is not supported in generic XDP
>>>> mode for veth interfaces due to kernel limitations, however it is
>>>> supported in native mode.
>>>>
>>>> This change introduces ability to use native XDP without zero-copy
>>>> along with best-effort configuration option that enabled by default.
>>>> In best-effort case OVS will sequentially try different modes starting
>>>> from the fastest one and will choose the first acceptable for current
>>>> interface.  This will guarantee the best possible performance.
>>>>
>>>> If user will want to choose specific mode, it's still possible by
>>>> setting the 'options:xdp-mode'.
>>>>
>>>> This change additionally changes the API by renaming the configuration
>>>> knob from 'xdpmode' to 'xdp-mode' and also renaming the modes
>>>> themselves to be more user-friendly.
>>>>
>>>> The full list of currently supported modes:
>>>>   * native-with-zerocopy - former DRV
>>>>   * native               - new one, DRV without zero-copy
>>>>   * generic              - former SKB
>>>>   * best-effort          - new one, chooses the best available from
>>>>                            3 above modes
>>>>
>>>> Since 'best-effort' is a default mode, users will not need to
>>>> explicitely set 'xdp-mode' in most cases.
>>>>
>>>> TCP related tests enabled back in system afxdp testsuite, because
>>>> 'best-effort' will choose 'native' mode for veth interfaces
>>>> and this mode has no issues with TCP.
>>> Patch in general looks good, two small comments inline.
>>
>> Thanks for review.
>>
>>>
>>> The only thing that bothers me is the worse performance of the TAP interface with the new default config. Can we somehow keep the old behavior for TAP interfaces?
>>
>> Could you check if TCP works over tap interfaces in generic mode?
>> For me the point is that correctness is better than performance.
>> I also hope that native implementation for tap will be improved
>> over time.
> 
> So if I understood your email chain with William correctly TCP is not working, so I affray correctness is better than performance.

Not exactly. William didn't test the actual TAP interfaces.

I tested today with kernel vhost backed virtio-user port and it seems to pass
TCP frames in generic mode.

The setup is following:

tap1 <-- ovs-vswitchd --> tap0 <-- testpmd --> tap2

tap1 -- tap port created by OVS (type=tap)
tap0 -- tap port, virtio-user, created by testpmd, opened by OVS with type=afxdp
tap2 -- tap port created by testpmd (net_tap)

tap1 and tap2 are in their own network namespaces and iperf works on them.

TCP works in this scenario regardless of xdp-mode on tap0 interface.

However, let's look at the difference before-after this patch:

                        PHY NIC              TAP              VETH
before (skb default)  slow by default   fast by default       broken TCP
after  (best-effort)  fast by default   slower by default     works

For the best speed before you had to configure xdp-mode for physical NICs,
after you need to configure xdp-mode for TAPs.  No much difference from the
configuration side, but with patch applied veths are working.

So, I think this change makes sense anyway.  We can think about workaround
for TAP devices, but we still need to check more cases.

For now, I updated the 'Limitations' section with information that generic
mode works for TAP devices and might be even faster in some cases.

I also reformatted the code for 'unspecified' and 'best-effort' modes as
you asked and applied to master. 

Best regards, Ilya Maximets.
William Tu Nov. 20, 2019, 5:54 p.m. UTC | #21
On Wed, Nov 20, 2019 at 05:43:48PM +0100, Ilya Maximets wrote:
> On 20.11.2019 8:35, Eelco Chaudron wrote:
> > 
> > 
> > On 19 Nov 2019, at 17:52, Ilya Maximets wrote:
> > 
> >> On 19.11.2019 17:16, Eelco Chaudron wrote:
> >>>
> >>>
> >>> On 7 Nov 2019, at 12:36, Ilya Maximets wrote:
> >>>
> >>>> Until now there was only two options for XDP mode in OVS: SKB or DRV.
> >>>> i.e. 'generic XDP' or 'native XDP with zero-copy enabled'.
> >>>>
> >>>> Devices like 'veth' interfaces in Linux supports native XDP, but
> >>>> doesn't support zero-copy mode.  This case can not be covered by
> >>>> existing API and we have to use slower generic XDP for such devices.
> >>>> There are few more issues, e.g. TCP is not supported in generic XDP
> >>>> mode for veth interfaces due to kernel limitations, however it is
> >>>> supported in native mode.
> >>>>
> >>>> This change introduces ability to use native XDP without zero-copy
> >>>> along with best-effort configuration option that enabled by default.
> >>>> In best-effort case OVS will sequentially try different modes starting
> >>>> from the fastest one and will choose the first acceptable for current
> >>>> interface.  This will guarantee the best possible performance.
> >>>>
> >>>> If user will want to choose specific mode, it's still possible by
> >>>> setting the 'options:xdp-mode'.
> >>>>
> >>>> This change additionally changes the API by renaming the configuration
> >>>> knob from 'xdpmode' to 'xdp-mode' and also renaming the modes
> >>>> themselves to be more user-friendly.
> >>>>
> >>>> The full list of currently supported modes:
> >>>>   * native-with-zerocopy - former DRV
> >>>>   * native               - new one, DRV without zero-copy
> >>>>   * generic              - former SKB
> >>>>   * best-effort          - new one, chooses the best available from
> >>>>                            3 above modes
> >>>>
> >>>> Since 'best-effort' is a default mode, users will not need to
> >>>> explicitely set 'xdp-mode' in most cases.
> >>>>
> >>>> TCP related tests enabled back in system afxdp testsuite, because
> >>>> 'best-effort' will choose 'native' mode for veth interfaces
> >>>> and this mode has no issues with TCP.
> >>> Patch in general looks good, two small comments inline.
> >>
> >> Thanks for review.
> >>
> >>>
> >>> The only thing that bothers me is the worse performance of the TAP interface with the new default config. Can we somehow keep the old behavior for TAP interfaces?
> >>
> >> Could you check if TCP works over tap interfaces in generic mode?
> >> For me the point is that correctness is better than performance.
> >> I also hope that native implementation for tap will be improved
> >> over time.
> > 
> > So if I understood your email chain with William correctly TCP is not working, so I affray correctness is better than performance.
> 
> Not exactly. William didn't test the actual TAP interfaces.
> 
> I tested today with kernel vhost backed virtio-user port and it seems to pass
> TCP frames in generic mode.
> 
> The setup is following:
> 
> tap1 <-- ovs-vswitchd --> tap0 <-- testpmd --> tap2
> 
> tap1 -- tap port created by OVS (type=tap)
> tap0 -- tap port, virtio-user, created by testpmd, opened by OVS with type=afxdp
> tap2 -- tap port created by testpmd (net_tap)
> 
> tap1 and tap2 are in their own network namespaces and iperf works on them.

Hi Ilya,

This is an interesting setup.
Can you share roughly your commands to do this test?

btw, add Yiyang in the loop as he is going to try
this setup on openstack environment.

William

> 
> TCP works in this scenario regardless of xdp-mode on tap0 interface.
> 
> However, let's look at the difference before-after this patch:
> 
>                         PHY NIC              TAP              VETH
> before (skb default)  slow by default   fast by default       broken TCP
> after  (best-effort)  fast by default   slower by default     works
> 
> For the best speed before you had to configure xdp-mode for physical NICs,
> after you need to configure xdp-mode for TAPs.  No much difference from the
> configuration side, but with patch applied veths are working.
> 
> So, I think this change makes sense anyway.  We can think about workaround
> for TAP devices, but we still need to check more cases.
> 
> For now, I updated the 'Limitations' section with information that generic
> mode works for TAP devices and might be even faster in some cases.
> 
> I also reformatted the code for 'unspecified' and 'best-effort' modes as
> you asked and applied to master. 
> 
> Best regards, Ilya Maximets.
Ilya Maximets Nov. 20, 2019, 6:28 p.m. UTC | #22
On 20.11.2019 18:54, William Tu wrote:
> On Wed, Nov 20, 2019 at 05:43:48PM +0100, Ilya Maximets wrote:
>> On 20.11.2019 8:35, Eelco Chaudron wrote:
>>>
>>>
>>> On 19 Nov 2019, at 17:52, Ilya Maximets wrote:
>>>
>>>> On 19.11.2019 17:16, Eelco Chaudron wrote:
>>>>>
>>>>>
>>>>> On 7 Nov 2019, at 12:36, Ilya Maximets wrote:
>>>>>
>>>>>> Until now there was only two options for XDP mode in OVS: SKB or DRV.
>>>>>> i.e. 'generic XDP' or 'native XDP with zero-copy enabled'.
>>>>>>
>>>>>> Devices like 'veth' interfaces in Linux supports native XDP, but
>>>>>> doesn't support zero-copy mode.  This case can not be covered by
>>>>>> existing API and we have to use slower generic XDP for such devices.
>>>>>> There are few more issues, e.g. TCP is not supported in generic XDP
>>>>>> mode for veth interfaces due to kernel limitations, however it is
>>>>>> supported in native mode.
>>>>>>
>>>>>> This change introduces ability to use native XDP without zero-copy
>>>>>> along with best-effort configuration option that enabled by default.
>>>>>> In best-effort case OVS will sequentially try different modes starting
>>>>>> from the fastest one and will choose the first acceptable for current
>>>>>> interface.  This will guarantee the best possible performance.
>>>>>>
>>>>>> If user will want to choose specific mode, it's still possible by
>>>>>> setting the 'options:xdp-mode'.
>>>>>>
>>>>>> This change additionally changes the API by renaming the configuration
>>>>>> knob from 'xdpmode' to 'xdp-mode' and also renaming the modes
>>>>>> themselves to be more user-friendly.
>>>>>>
>>>>>> The full list of currently supported modes:
>>>>>>   * native-with-zerocopy - former DRV
>>>>>>   * native               - new one, DRV without zero-copy
>>>>>>   * generic              - former SKB
>>>>>>   * best-effort          - new one, chooses the best available from
>>>>>>                            3 above modes
>>>>>>
>>>>>> Since 'best-effort' is a default mode, users will not need to
>>>>>> explicitely set 'xdp-mode' in most cases.
>>>>>>
>>>>>> TCP related tests enabled back in system afxdp testsuite, because
>>>>>> 'best-effort' will choose 'native' mode for veth interfaces
>>>>>> and this mode has no issues with TCP.
>>>>> Patch in general looks good, two small comments inline.
>>>>
>>>> Thanks for review.
>>>>
>>>>>
>>>>> The only thing that bothers me is the worse performance of the TAP interface with the new default config. Can we somehow keep the old behavior for TAP interfaces?
>>>>
>>>> Could you check if TCP works over tap interfaces in generic mode?
>>>> For me the point is that correctness is better than performance.
>>>> I also hope that native implementation for tap will be improved
>>>> over time.
>>>
>>> So if I understood your email chain with William correctly TCP is not working, so I affray correctness is better than performance.
>>
>> Not exactly. William didn't test the actual TAP interfaces.
>>
>> I tested today with kernel vhost backed virtio-user port and it seems to pass
>> TCP frames in generic mode.
>>
>> The setup is following:
>>
>> tap1 <-- ovs-vswitchd --> tap0 <-- testpmd --> tap2
>>
>> tap1 -- tap port created by OVS (type=tap)
>> tap0 -- tap port, virtio-user, created by testpmd, opened by OVS with type=afxdp
>> tap2 -- tap port created by testpmd (net_tap)
>>
>> tap1 and tap2 are in their own network namespaces and iperf works on them.
> 
> Hi Ilya,
> 
> This is an interesting setup.
> Can you share roughly your commands to do this test?

I just copied and modified one of the system-dpdk tests like this:

diff --git a/tests/system-dpdk.at b/tests/system-dpdk.at
index a015d52f7..b0d10fcdd 100644
--- a/tests/system-dpdk.at
+++ b/tests/system-dpdk.at
@@ -232,3 +232,83 @@ OVS_VSWITCHD_STOP(["\@does not exist. The Open vSwitch kernel module is probably
 \@EAL: No free hugepages reported in hugepages-1048576kB@d"])
 AT_CLEANUP
 dnl --------------------------------------------------------------------------
+
+dnl --------------------------------------------------------------------------
+dnl Ping afxdp port
+AT_SETUP([OVS-DPDK - ping afxdp ports])
+AT_KEYWORDS([afxdp tap])
+OVS_DPDK_PRE_CHECK()
+AT_SKIP_IF([! which testpmd >/dev/null 2>/dev/null])
+OVS_DPDK_START()
+
+dnl Find number of sockets
+AT_CHECK([lscpu], [], [stdout])
+AT_CHECK([cat stdout | grep "NUMA node(s)" | awk '{c=1; while (c++<$(3)) {printf "512,"}; print "512"}' > NUMA_NODE])
+
+dnl Add userspace bridge and attach it to OVS
+AT_CHECK([ovs-vsctl add-br br10 -- set bridge br10 datapath_type=netdev])
+dnl Set up namespaces
+ADD_NAMESPACES(ns1, ns2)
+
+dnl Add veth device
+ADD_VETH(tap1, ns2, br10, "172.31.110.12/24")
+
+dnl Execute testpmd in background
+on_exit "pkill -f -x -9 'tail -f /dev/null'"
+tail -f /dev/null | testpmd --socket-mem="$(cat NUMA_NODE)" --no-pci\
+           --vdev="net_virtio_user,path=/dev/vhost-net,queue_size=1024" \
+           --vdev="net_tap7,iface=tap7" --file-prefix page0 \
+           --single-file-segments -- -a >$OVS_RUNDIR/testpmd-dpdkvhostuserclient0.log 2>&1 &
+
+dnl Give settling time to the testpmd processes - NOTE: this is bad form.
+sleep 10
+
+ip link set tap0 up
+ethtool -K tap7 tx off
+ip netns exec ns2 ethtool -K tap1 tx off
+ip link show
+
+AT_CHECK([ovs-vsctl add-port br10 tap0 -- set Interface tap0 \
+                                          type=afxdp \
+                                          options:xdp-mode=generic], [],
+         [stdout], [stderr])
+AT_CHECK([ovs-vsctl show], [], [stdout])
+
+
+dnl Move the tap devices to the namespaces
+AT_CHECK([ps aux | grep testpmd], [], [stdout], [stderr])
+AT_CHECK([ip link show], [], [stdout], [stderr])
+AT_CHECK([ip link set tap7 netns ns1], [], [stdout], [stderr])
+
+AT_CHECK([ip netns exec ns1 ip link show], [], [stdout], [stderr])
+AT_CHECK([ip netns exec ns1 ip link show | grep tap7], [], [stdout], [stderr])
+AT_CHECK([ip netns exec ns1 ip link set tap7 up], [], [stdout], [stderr])
+AT_CHECK([ip netns exec ns1 ip addr add 172.31.110.11/24 dev tap7], [],
+         [stdout], [stderr])
+
+AT_CHECK([ip netns exec ns1 ip link show], [], [stdout], [stderr])
+AT_CHECK([ip netns exec ns2 ip link show], [], [stdout], [stderr])
+AT_CHECK([ip netns exec ns1 ping -c 4 -I tap7 172.31.110.12], [], [stdout],
+         [stderr])
+
+dnl NETNS_DAEMONIZE([ns1], [nc -l -k 1234 > /dev/null], [nc1.pid])
+dnl NS_CHECK_EXEC([ns2], [echo "foobar" | nc $NC_EOF_OPT 10.1.1.1 1234])
+sleep 180
+
+dnl Clean up the testpmd now
+pkill -f -x -9 'tail -f /dev/null'
+
+dnl Clean up
+AT_CHECK([ovs-vsctl del-port br10 tap0], [], [stdout], [stderr])
+OVS_VSWITCHD_STOP(["\@does not exist. The Open vSwitch kernel module is probably not loaded.@d
+\@Failed to enable flow control@d
+\@VHOST_CONFIG: recvmsg failed@d
+\@VHOST_CONFIG: failed to connect to $OVS_RUNDIR/dpdkvhostclient0: No such file or directory@d
+\@Global register is changed during@d
+\@dpdkvhostuser ports are considered deprecated;  please migrate to dpdkvhostuserclient ports.@d
+\@failed to enumerate system datapaths: No such file or directory@d
+\@EAL:   Invalid NUMA socket, default to 0@d
+\@EAL: WARNING: cpu flags constant_tsc=yes nonstop_tsc=no -> using unreliable clock cycles !@d
+\@EAL: No free hugepages reported in hugepages-1048576kB@d"])
+AT_CLEANUP
+dnl --------------------------------------------------------------------------
---

tap2 is a tap7 in the script.

The test is not well-shaped and I was too lazy to properly invoke iperf from
the test so I just added 'sleep 180' to run iperf from the separate terminals
by hands:

(term 1)#ip netns exec ns1 iperf3 -s -i 1
(term 2)#ip netns exec ns2 iperf3 -c 172.31.110.11 -i 1

BTW, to run 'make check-dpdk' you need testpmd available in PATH and
hugepages configured.


Best regards, Ilya Maximets.
Eelco Chaudron Nov. 21, 2019, 2:32 p.m. UTC | #23
On 20 Nov 2019, at 17:43, Ilya Maximets wrote:

> On 20.11.2019 8:35, Eelco Chaudron wrote:
>>
>>
>> On 19 Nov 2019, at 17:52, Ilya Maximets wrote:
>>
>>> On 19.11.2019 17:16, Eelco Chaudron wrote:
>>>>
>>>>
>>>> On 7 Nov 2019, at 12:36, Ilya Maximets wrote:
>>>>
>>>>> Until now there was only two options for XDP mode in OVS: SKB or 
>>>>> DRV.
>>>>> i.e. 'generic XDP' or 'native XDP with zero-copy enabled'.
>>>>>
>>>>> Devices like 'veth' interfaces in Linux supports native XDP, but
>>>>> doesn't support zero-copy mode.  This case can not be covered by
>>>>> existing API and we have to use slower generic XDP for such 
>>>>> devices.
>>>>> There are few more issues, e.g. TCP is not supported in generic 
>>>>> XDP
>>>>> mode for veth interfaces due to kernel limitations, however it is
>>>>> supported in native mode.
>>>>>
>>>>> This change introduces ability to use native XDP without zero-copy
>>>>> along with best-effort configuration option that enabled by 
>>>>> default.
>>>>> In best-effort case OVS will sequentially try different modes 
>>>>> starting
>>>>> from the fastest one and will choose the first acceptable for 
>>>>> current
>>>>> interface.  This will guarantee the best possible performance.
>>>>>
>>>>> If user will want to choose specific mode, it's still possible by
>>>>> setting the 'options:xdp-mode'.
>>>>>
>>>>> This change additionally changes the API by renaming the 
>>>>> configuration
>>>>> knob from 'xdpmode' to 'xdp-mode' and also renaming the modes
>>>>> themselves to be more user-friendly.
>>>>>
>>>>> The full list of currently supported modes:
>>>>>   * native-with-zerocopy - former DRV
>>>>>   * native               - new one, DRV without 
>>>>> zero-copy
>>>>>   * generic              - former SKB
>>>>>   * best-effort          - new one, chooses the best 
>>>>> available from
>>>>>                            3 above modes
>>>>>
>>>>> Since 'best-effort' is a default mode, users will not need to
>>>>> explicitely set 'xdp-mode' in most cases.
>>>>>
>>>>> TCP related tests enabled back in system afxdp testsuite, because
>>>>> 'best-effort' will choose 'native' mode for veth interfaces
>>>>> and this mode has no issues with TCP.
>>>> Patch in general looks good, two small comments inline.
>>>
>>> Thanks for review.
>>>
>>>>
>>>> The only thing that bothers me is the worse performance of the TAP 
>>>> interface with the new default config. Can we somehow keep the old 
>>>> behavior for TAP interfaces?
>>>
>>> Could you check if TCP works over tap interfaces in generic mode?
>>> For me the point is that correctness is better than performance.
>>> I also hope that native implementation for tap will be improved
>>> over time.
>>
>> So if I understood your email chain with William correctly TCP is not 
>> working, so I affray correctness is better than performance.
>
> Not exactly. William didn't test the actual TAP interfaces.
>
> I tested today with kernel vhost backed virtio-user port and it seems 
> to pass
> TCP frames in generic mode.
>
> The setup is following:
>
> tap1 <-- ovs-vswitchd --> tap0 <-- testpmd --> tap2
>
> tap1 -- tap port created by OVS (type=tap)
> tap0 -- tap port, virtio-user, created by testpmd, opened by OVS with 
> type=afxdp
> tap2 -- tap port created by testpmd (net_tap)
>
> tap1 and tap2 are in their own network namespaces and iperf works on 
> them.
>
> TCP works in this scenario regardless of xdp-mode on tap0 interface.
>
> However, let's look at the difference before-after this patch:
>
>                         PHY NIC              TAP              VETH
> before (skb default)  slow by default   fast by default       broken 
> TCP
> after  (best-effort)  fast by default   slower by default     works
>
> For the best speed before you had to configure xdp-mode for physical 
> NICs,
> after you need to configure xdp-mode for TAPs.  No much difference 
> from the
> configuration side, but with patch applied veths are working.
>
> So, I think this change makes sense anyway.  We can think about 
> workaround
> for TAP devices, but we still need to check more cases.
>
> For now, I updated the 'Limitations' section with information that 
> generic
> mode works for TAP devices and might be even faster in some cases.
>
> I also reformatted the code for 'unspecified' and 'best-effort' modes 
> as
> you asked and applied to master.
>
> Best regards, Ilya Maximets.

Thanks!

//Eelco
diff mbox series

Patch

diff --git a/Documentation/intro/install/afxdp.rst b/Documentation/intro/install/afxdp.rst
index a136db0c9..937770ad0 100644
--- a/Documentation/intro/install/afxdp.rst
+++ b/Documentation/intro/install/afxdp.rst
@@ -153,9 +153,8 @@  To kick start end-to-end autotesting::
   make check-afxdp TESTSUITEFLAGS='1'
 
 .. note::
-   Not all test cases pass at this time. Currenly all TCP related
-   tests, ex: using wget or http, are skipped due to XDP limitations
-   on veth. cvlan test is also skipped.
+   Not all test cases pass at this time. Currenly all cvlan tests are skipped
+   due to kernel issues.
 
 If a test case fails, check the log at::
 
@@ -177,33 +176,35 @@  in :doc:`general`::
   ovs-vsctl -- add-br br0 -- set Bridge br0 datapath_type=netdev
 
 Make sure your device driver support AF_XDP, netdev-afxdp supports
-the following additional options (see man ovs-vswitchd.conf.db for
+the following additional options (see ``man ovs-vswitchd.conf.db`` for
 more details):
 
- * **xdpmode**: use "drv" for driver mode, or "skb" for skb mode.
+ * ``xdp-mode``: ``best-effort``, ``native-with-zerocopy``,
+   ``native`` or ``generic``.  Defaults to ``best-effort``, i.e. best of
+   supported modes, so in most cases you don't need to change it.
 
- * **use-need-wakeup**: default "true" if libbpf supports it, otherwise false.
+ * ``use-need-wakeup``: default ``true`` if libbpf supports it,
+   otherwise ``false``.
 
 For example, to use 1 PMD (on core 4) on 1 queue (queue 0) device,
-configure these options: **pmd-cpu-mask, pmd-rxq-affinity, and n_rxq**.
-The **xdpmode** can be "drv" or "skb"::
+configure these options: ``pmd-cpu-mask``, ``pmd-rxq-affinity``, and
+``n_rxq``::
 
   ethtool -L enp2s0 combined 1
   ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x10
   ovs-vsctl add-port br0 enp2s0 -- set interface enp2s0 type="afxdp" \
-    options:n_rxq=1 options:xdpmode=drv \
-    other_config:pmd-rxq-affinity="0:4"
+                                   other_config:pmd-rxq-affinity="0:4"
 
 Or, use 4 pmds/cores and 4 queues by doing::
 
   ethtool -L enp2s0 combined 4
   ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x36
   ovs-vsctl add-port br0 enp2s0 -- set interface enp2s0 type="afxdp" \
-    options:n_rxq=4 options:xdpmode=drv \
-    other_config:pmd-rxq-affinity="0:1,1:2,2:3,3:4"
+    options:n_rxq=4 other_config:pmd-rxq-affinity="0:1,1:2,2:3,3:4"
 
 .. note::
-   pmd-rxq-affinity is optional. If not specified, system will auto-assign.
+   ``pmd-rxq-affinity`` is optional. If not specified, system will auto-assign.
+   ``n_rxq`` equals ``1`` by default.
 
 To validate that the bridge has successfully instantiated, you can use the::
 
@@ -214,12 +215,21 @@  Should show something like::
   Port "ens802f0"
    Interface "ens802f0"
       type: afxdp
-      options: {n_rxq="1", xdpmode=drv}
+      options: {n_rxq="1"}
 
 Otherwise, enable debugging by::
 
   ovs-appctl vlog/set netdev_afxdp::dbg
 
+To check which XDP mode was chosen by ``best-effort``, you can look for
+``xdp-mode-in-use`` in the output of ``ovs-appctl dpctl/show``::
+
+  # ovs-appctl dpctl/show
+  netdev@ovs-netdev:
+    <...>
+    port 2: ens802f0 (afxdp: n_rxq=1, use-need-wakeup=true,
+                      xdp-mode=best-effort,
+                      xdp-mode-in-use=native-with-zerocopy)
 
 References
 ----------
@@ -323,8 +333,11 @@  Limitations/Known Issues
 #. Most of the tests are done using i40e single port. Multiple ports and
    also ixgbe driver also needs to be tested.
 #. No latency test result (TODO items)
-#. Due to limitations of current upstream kernel, TCP and various offloading
+#. Due to limitations of current upstream kernel, various offloading
    (vlan, cvlan) is not working over virtual interfaces (i.e. veth pair).
+   Also, TCP is not working over virtual interfaces in generic XDP mode.
+   Some more information and possible workaround available `here
+   <https://github.com/cilium/cilium/issues/3077#issuecomment-430801467>`__ .
 
 
 PVP using tap device
@@ -335,8 +348,7 @@  First, start OVS, then add physical port::
   ethtool -L enp2s0 combined 1
   ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x10
   ovs-vsctl add-port br0 enp2s0 -- set interface enp2s0 type="afxdp" \
-    options:n_rxq=1 options:xdpmode=drv \
-    other_config:pmd-rxq-affinity="0:4"
+    options:n_rxq=1 other_config:pmd-rxq-affinity="0:4"
 
 Start a VM with virtio and tap device::
 
@@ -414,13 +426,11 @@  Create namespace and veth peer devices::
 
 Attach the veth port to br0 (linux kernel mode)::
 
-  ovs-vsctl add-port br0 afxdp-p0 -- \
-    set interface afxdp-p0 options:n_rxq=1
+  ovs-vsctl add-port br0 afxdp-p0 -- set interface afxdp-p0
 
-Or, use AF_XDP with skb mode::
+Or, use AF_XDP::
 
-  ovs-vsctl add-port br0 afxdp-p0 -- \
-    set interface afxdp-p0 type="afxdp" options:n_rxq=1 options:xdpmode=skb
+  ovs-vsctl add-port br0 afxdp-p0 -- set interface afxdp-p0 type="afxdp"
 
 Setup the OpenFlow rules::
 
diff --git a/NEWS b/NEWS
index 88b818948..d5f476d6e 100644
--- a/NEWS
+++ b/NEWS
@@ -5,11 +5,19 @@  Post-v2.12.0
        separate project. You can find it at
        https://github.com/ovn-org/ovn.git
    - Userspace datapath:
+     * Add option to enable, disable and query TCP sequence checking in
+       conntrack.
+   - AF_XDP:
      * New option 'use-need-wakeup' for netdev-afxdp to control enabling
        of corresponding 'need_wakeup' flag in AF_XDP rings.  Enabled by default
        if supported by libbpf.
-     * Add option to enable, disable and query TCP sequence checking in
-       conntrack.
+     * 'xdpmode' option for netdev-afxdp renamed to 'xdp-mode'.
+       Modes also updated.  New values:
+         native-with-zerocopy  - former DRV
+         native                - new one, DRV without zero-copy
+         generic               - former SKB
+         best-effort [default] - new one, chooses the best available from
+                                 3 above modes
 
 v2.12.0 - 03 Sep 2019
 ---------------------
diff --git a/lib/netdev-afxdp.c b/lib/netdev-afxdp.c
index af654d498..74dde219d 100644
--- a/lib/netdev-afxdp.c
+++ b/lib/netdev-afxdp.c
@@ -89,12 +89,42 @@  BUILD_ASSERT_DECL(PROD_NUM_DESCS == CONS_NUM_DESCS);
 #define UMEM2DESC(elem, base) ((uint64_t)((char *)elem - (char *)base))
 
 static struct xsk_socket_info *xsk_configure(int ifindex, int xdp_queue_id,
-                                             int mode, bool use_need_wakeup);
-static void xsk_remove_xdp_program(uint32_t ifindex, int xdpmode);
+                                             enum afxdp_mode mode,
+                                             bool use_need_wakeup,
+                                             bool report_socket_failures);
+static void xsk_remove_xdp_program(uint32_t ifindex, enum afxdp_mode);
 static void xsk_destroy(struct xsk_socket_info *xsk);
 static int xsk_configure_all(struct netdev *netdev);
 static void xsk_destroy_all(struct netdev *netdev);
 
+static struct {
+    const char *name;
+    uint32_t bind_flags;
+    uint32_t xdp_flags;
+} xdp_modes[] = {
+    [OVS_AF_XDP_MODE_UNSPEC] = {
+        .name = "unspecified", .bind_flags = 0, .xdp_flags = 0,
+    },
+    [OVS_AF_XDP_MODE_BEST_EFFORT] = {
+        .name = "best-effort", .bind_flags = 0, .xdp_flags = 0,
+    },
+    [OVS_AF_XDP_MODE_NATIVE_ZC] = {
+        .name = "native-with-zerocopy",
+        .bind_flags = XDP_ZEROCOPY,
+        .xdp_flags = XDP_FLAGS_DRV_MODE,
+    },
+    [OVS_AF_XDP_MODE_NATIVE] = {
+        .name = "native",
+        .bind_flags = XDP_COPY,
+        .xdp_flags = XDP_FLAGS_DRV_MODE,
+    },
+    [OVS_AF_XDP_MODE_GENERIC] = {
+        .name = "generic",
+        .bind_flags = XDP_COPY,
+        .xdp_flags = XDP_FLAGS_SKB_MODE,
+    },
+};
+
 struct unused_pool {
     struct xsk_umem_info *umem_info;
     int lost_in_rings; /* Number of packets left in tx, rx, cq and fq. */
@@ -214,7 +244,7 @@  netdev_afxdp_sweep_unused_pools(void *aux OVS_UNUSED)
 }
 
 static struct xsk_umem_info *
-xsk_configure_umem(void *buffer, uint64_t size, int xdpmode)
+xsk_configure_umem(void *buffer, uint64_t size)
 {
     struct xsk_umem_config uconfig;
     struct xsk_umem_info *umem;
@@ -232,9 +262,7 @@  xsk_configure_umem(void *buffer, uint64_t size, int xdpmode)
     ret = xsk_umem__create(&umem->umem, buffer, size, &umem->fq, &umem->cq,
                            &uconfig);
     if (ret) {
-        VLOG_ERR("xsk_umem__create failed (%s) mode: %s",
-                 ovs_strerror(errno),
-                 xdpmode == XDP_COPY ? "SKB": "DRV");
+        VLOG_ERR("xsk_umem__create failed: %s.", ovs_strerror(errno));
         free(umem);
         return NULL;
     }
@@ -290,7 +318,8 @@  xsk_configure_umem(void *buffer, uint64_t size, int xdpmode)
 
 static struct xsk_socket_info *
 xsk_configure_socket(struct xsk_umem_info *umem, uint32_t ifindex,
-                     uint32_t queue_id, int xdpmode, bool use_need_wakeup)
+                     uint32_t queue_id, enum afxdp_mode mode,
+                     bool use_need_wakeup, bool report_socket_failures)
 {
     struct xsk_socket_config cfg;
     struct xsk_socket_info *xsk;
@@ -304,14 +333,8 @@  xsk_configure_socket(struct xsk_umem_info *umem, uint32_t ifindex,
     cfg.rx_size = CONS_NUM_DESCS;
     cfg.tx_size = PROD_NUM_DESCS;
     cfg.libbpf_flags = 0;
-
-    if (xdpmode == XDP_ZEROCOPY) {
-        cfg.bind_flags = XDP_ZEROCOPY;
-        cfg.xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST | XDP_FLAGS_DRV_MODE;
-    } else {
-        cfg.bind_flags = XDP_COPY;
-        cfg.xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST | XDP_FLAGS_SKB_MODE;
-    }
+    cfg.bind_flags = xdp_modes[mode].bind_flags;
+    cfg.xdp_flags = xdp_modes[mode].xdp_flags | XDP_FLAGS_UPDATE_IF_NOEXIST;
 
 #ifdef HAVE_XDP_NEED_WAKEUP
     if (use_need_wakeup) {
@@ -329,12 +352,11 @@  xsk_configure_socket(struct xsk_umem_info *umem, uint32_t ifindex,
     ret = xsk_socket__create(&xsk->xsk, devname, queue_id, umem->umem,
                              &xsk->rx, &xsk->tx, &cfg);
     if (ret) {
-        VLOG_ERR("xsk_socket__create failed (%s) mode: %s "
-                 "use-need-wakeup: %s qid: %d",
-                 ovs_strerror(errno),
-                 xdpmode == XDP_COPY ? "SKB": "DRV",
-                 use_need_wakeup ? "true" : "false",
-                 queue_id);
+        VLOG(report_socket_failures ? VLL_ERR : VLL_DBG,
+             "xsk_socket__create failed (%s) mode: %s, "
+             "use-need-wakeup: %s, qid: %d",
+             ovs_strerror(errno), xdp_modes[mode].name,
+             use_need_wakeup ? "true" : "false", queue_id);
         free(xsk);
         return NULL;
     }
@@ -375,8 +397,8 @@  xsk_configure_socket(struct xsk_umem_info *umem, uint32_t ifindex,
 }
 
 static struct xsk_socket_info *
-xsk_configure(int ifindex, int xdp_queue_id, int xdpmode,
-              bool use_need_wakeup)
+xsk_configure(int ifindex, int xdp_queue_id, enum afxdp_mode mode,
+              bool use_need_wakeup, bool report_socket_failures)
 {
     struct xsk_socket_info *xsk;
     struct xsk_umem_info *umem;
@@ -389,9 +411,7 @@  xsk_configure(int ifindex, int xdp_queue_id, int xdpmode,
     memset(bufs, 0, NUM_FRAMES * FRAME_SIZE);
 
     /* Create AF_XDP socket. */
-    umem = xsk_configure_umem(bufs,
-                              NUM_FRAMES * FRAME_SIZE,
-                              xdpmode);
+    umem = xsk_configure_umem(bufs, NUM_FRAMES * FRAME_SIZE);
     if (!umem) {
         free_pagealign(bufs);
         return NULL;
@@ -399,8 +419,8 @@  xsk_configure(int ifindex, int xdp_queue_id, int xdpmode,
 
     VLOG_DBG("Allocated umem pool at 0x%"PRIxPTR, (uintptr_t) umem);
 
-    xsk = xsk_configure_socket(umem, ifindex, xdp_queue_id, xdpmode,
-                               use_need_wakeup);
+    xsk = xsk_configure_socket(umem, ifindex, xdp_queue_id, mode,
+                               use_need_wakeup, report_socket_failures);
     if (!xsk) {
         /* Clean up umem and xpacket pool. */
         if (xsk_umem__delete(umem->umem)) {
@@ -414,12 +434,38 @@  xsk_configure(int ifindex, int xdp_queue_id, int xdpmode,
     return xsk;
 }
 
+static int
+xsk_configure_queue(struct netdev_linux *dev, int ifindex, int queue_id,
+                    enum afxdp_mode mode, bool report_socket_failures)
+{
+    struct xsk_socket_info *xsk_info;
+
+    VLOG_DBG("%s: configuring queue: %d, mode: %s, use-need-wakeup: %s.",
+             netdev_get_name(&dev->up), queue_id, xdp_modes[mode].name,
+             dev->use_need_wakeup ? "true" : "false");
+    xsk_info = xsk_configure(ifindex, queue_id, mode, dev->use_need_wakeup,
+                             report_socket_failures);
+    if (!xsk_info) {
+        VLOG(report_socket_failures ? VLL_ERR : VLL_DBG,
+             "%s: Failed to create AF_XDP socket on queue %d in %s mode.",
+             netdev_get_name(&dev->up), queue_id, xdp_modes[mode].name);
+        dev->xsks[queue_id] = NULL;
+        return -1;
+    }
+    dev->xsks[queue_id] = xsk_info;
+    atomic_init(&xsk_info->tx_dropped, 0);
+    xsk_info->outstanding_tx = 0;
+    xsk_info->available_rx = PROD_NUM_DESCS;
+    return 0;
+}
+
+
 static int
 xsk_configure_all(struct netdev *netdev)
 {
     struct netdev_linux *dev = netdev_linux_cast(netdev);
-    struct xsk_socket_info *xsk_info;
     int i, ifindex, n_rxq, n_txq;
+    int qid = 0;
 
     ifindex = linux_get_ifindex(netdev_get_name(netdev));
 
@@ -429,23 +475,36 @@  xsk_configure_all(struct netdev *netdev)
     n_rxq = netdev_n_rxq(netdev);
     dev->xsks = xcalloc(n_rxq, sizeof *dev->xsks);
 
-    /* Configure each queue. */
-    for (i = 0; i < n_rxq; i++) {
-        VLOG_DBG("%s: configure queue %d mode %s use-need-wakeup %s.",
-                 netdev_get_name(netdev), i,
-                 dev->xdpmode == XDP_COPY ? "SKB" : "DRV",
-                 dev->use_need_wakeup ? "true" : "false");
-        xsk_info = xsk_configure(ifindex, i, dev->xdpmode,
-                                 dev->use_need_wakeup);
-        if (!xsk_info) {
-            VLOG_ERR("Failed to create AF_XDP socket on queue %d.", i);
-            dev->xsks[i] = NULL;
+    if (dev->xdp_mode == OVS_AF_XDP_MODE_BEST_EFFORT) {
+        /* Trying to configure first queue with different modes to
+         * find the most suitable. */
+        for (i = OVS_AF_XDP_MODE_NATIVE_ZC; i < OVS_AF_XDP_MODE_MAX; i++) {
+            if (!xsk_configure_queue(dev, ifindex, qid, i,
+                                     i == OVS_AF_XDP_MODE_MAX - 1)) {
+                dev->xdp_mode_in_use = i;
+                VLOG_INFO("%s: %s XDP mode will be in use.",
+                          netdev_get_name(netdev), xdp_modes[i].name);
+                break;
+            }
+        }
+        if (i == OVS_AF_XDP_MODE_MAX) {
+            VLOG_ERR("%s: Failed to detect suitable XDP mode.",
+                     netdev_get_name(netdev));
+            goto err;
+        }
+        qid++;
+    } else {
+        dev->xdp_mode_in_use = dev->xdp_mode;
+    }
+
+    /* Configure remaining queues. */
+    for (; qid < n_rxq; qid++) {
+        if (xsk_configure_queue(dev, ifindex, qid,
+                                dev->xdp_mode_in_use, true)) {
+            VLOG_ERR("%s: Failed to create AF_XDP socket on queue %d.",
+                     netdev_get_name(netdev), qid);
             goto err;
         }
-        dev->xsks[i] = xsk_info;
-        atomic_init(&xsk_info->tx_dropped, 0);
-        xsk_info->outstanding_tx = 0;
-        xsk_info->available_rx = PROD_NUM_DESCS;
     }
 
     n_txq = netdev_n_txq(netdev);
@@ -500,7 +559,7 @@  xsk_destroy_all(struct netdev *netdev)
             if (dev->xsks[i]) {
                 xsk_destroy(dev->xsks[i]);
                 dev->xsks[i] = NULL;
-                VLOG_INFO("Destroyed xsk[%d].", i);
+                VLOG_DBG("%s: Destroyed xsk[%d].", netdev_get_name(netdev), i);
             }
         }
 
@@ -510,7 +569,7 @@  xsk_destroy_all(struct netdev *netdev)
 
     VLOG_INFO("%s: Removing xdp program.", netdev_get_name(netdev));
     ifindex = linux_get_ifindex(netdev_get_name(netdev));
-    xsk_remove_xdp_program(ifindex, dev->xdpmode);
+    xsk_remove_xdp_program(ifindex, dev->xdp_mode_in_use);
 
     if (dev->tx_locks) {
         for (i = 0; i < netdev_n_txq(netdev); i++) {
@@ -526,9 +585,10 @@  netdev_afxdp_set_config(struct netdev *netdev, const struct smap *args,
                         char **errp OVS_UNUSED)
 {
     struct netdev_linux *dev = netdev_linux_cast(netdev);
-    const char *str_xdpmode;
-    int xdpmode, new_n_rxq;
+    const char *str_xdp_mode;
+    enum afxdp_mode xdp_mode;
     bool need_wakeup;
+    int new_n_rxq;
 
     ovs_mutex_lock(&dev->mutex);
     new_n_rxq = MAX(smap_get_int(args, "n_rxq", NR_QUEUE), 1);
@@ -539,14 +599,17 @@  netdev_afxdp_set_config(struct netdev *netdev, const struct smap *args,
         return EINVAL;
     }
 
-    str_xdpmode = smap_get_def(args, "xdpmode", "skb");
-    if (!strcasecmp(str_xdpmode, "drv")) {
-        xdpmode = XDP_ZEROCOPY;
-    } else if (!strcasecmp(str_xdpmode, "skb")) {
-        xdpmode = XDP_COPY;
-    } else {
-        VLOG_ERR("%s: Incorrect xdpmode (%s).",
-                 netdev_get_name(netdev), str_xdpmode);
+    str_xdp_mode = smap_get_def(args, "xdp-mode", "best-effort");
+    for (xdp_mode = OVS_AF_XDP_MODE_BEST_EFFORT;
+         xdp_mode < OVS_AF_XDP_MODE_MAX;
+         xdp_mode++) {
+        if (!strcasecmp(str_xdp_mode, xdp_modes[xdp_mode].name)) {
+            break;
+        }
+    }
+    if (xdp_mode == OVS_AF_XDP_MODE_MAX) {
+        VLOG_ERR("%s: Incorrect xdp-mode (%s).",
+                 netdev_get_name(netdev), str_xdp_mode);
         ovs_mutex_unlock(&dev->mutex);
         return EINVAL;
     }
@@ -560,10 +623,10 @@  netdev_afxdp_set_config(struct netdev *netdev, const struct smap *args,
 #endif
 
     if (dev->requested_n_rxq != new_n_rxq
-        || dev->requested_xdpmode != xdpmode
+        || dev->requested_xdp_mode != xdp_mode
         || dev->requested_need_wakeup != need_wakeup) {
         dev->requested_n_rxq = new_n_rxq;
-        dev->requested_xdpmode = xdpmode;
+        dev->requested_xdp_mode = xdp_mode;
         dev->requested_need_wakeup = need_wakeup;
         netdev_request_reconfigure(netdev);
     }
@@ -578,8 +641,9 @@  netdev_afxdp_get_config(const struct netdev *netdev, struct smap *args)
 
     ovs_mutex_lock(&dev->mutex);
     smap_add_format(args, "n_rxq", "%d", netdev->n_rxq);
-    smap_add_format(args, "xdpmode", "%s",
-                    dev->xdpmode == XDP_ZEROCOPY ? "drv" : "skb");
+    smap_add_format(args, "xdp-mode", "%s", xdp_modes[dev->xdp_mode].name);
+    smap_add_format(args, "xdp-mode-in-use", "%s",
+                    xdp_modes[dev->xdp_mode_in_use].name);
     smap_add_format(args, "use-need-wakeup", "%s",
                     dev->use_need_wakeup ? "true" : "false");
     ovs_mutex_unlock(&dev->mutex);
@@ -596,7 +660,7 @@  netdev_afxdp_reconfigure(struct netdev *netdev)
     ovs_mutex_lock(&dev->mutex);
 
     if (netdev->n_rxq == dev->requested_n_rxq
-        && dev->xdpmode == dev->requested_xdpmode
+        && dev->xdp_mode == dev->requested_xdp_mode
         && dev->use_need_wakeup == dev->requested_need_wakeup
         && dev->xsks) {
         goto out;
@@ -607,9 +671,9 @@  netdev_afxdp_reconfigure(struct netdev *netdev)
     netdev->n_rxq = dev->requested_n_rxq;
     netdev->n_txq = netdev->n_rxq;
 
-    dev->xdpmode = dev->requested_xdpmode;
+    dev->xdp_mode = dev->requested_xdp_mode;
     VLOG_INFO("%s: Setting XDP mode to %s.", netdev_get_name(netdev),
-              dev->xdpmode == XDP_ZEROCOPY ? "DRV" : "SKB");
+              xdp_modes[dev->xdp_mode].name);
 
     if (setrlimit(RLIMIT_MEMLOCK, &r)) {
         VLOG_ERR("setrlimit(RLIMIT_MEMLOCK) failed: %s", ovs_strerror(errno));
@@ -618,7 +682,8 @@  netdev_afxdp_reconfigure(struct netdev *netdev)
 
     err = xsk_configure_all(netdev);
     if (err) {
-        VLOG_ERR("AF_XDP device %s reconfig failed.", netdev_get_name(netdev));
+        VLOG_ERR("%s: AF_XDP device reconfiguration failed.",
+                 netdev_get_name(netdev));
     }
     netdev_change_seq_changed(netdev);
 out:
@@ -638,17 +703,9 @@  netdev_afxdp_get_numa_id(const struct netdev *netdev)
 }
 
 static void
-xsk_remove_xdp_program(uint32_t ifindex, int xdpmode)
+xsk_remove_xdp_program(uint32_t ifindex, enum afxdp_mode mode)
 {
-    uint32_t flags;
-
-    flags = XDP_FLAGS_UPDATE_IF_NOEXIST;
-
-    if (xdpmode == XDP_COPY) {
-        flags |= XDP_FLAGS_SKB_MODE;
-    } else if (xdpmode == XDP_ZEROCOPY) {
-        flags |= XDP_FLAGS_DRV_MODE;
-    }
+    uint32_t flags = xdp_modes[mode].xdp_flags | XDP_FLAGS_UPDATE_IF_NOEXIST;
 
     bpf_set_link_xdp_fd(ifindex, -1, flags);
 }
@@ -662,7 +719,7 @@  signal_remove_xdp(struct netdev *netdev)
     ifindex = linux_get_ifindex(netdev_get_name(netdev));
 
     VLOG_WARN("Force removing xdp program.");
-    xsk_remove_xdp_program(ifindex, dev->xdpmode);
+    xsk_remove_xdp_program(ifindex, dev->xdp_mode_in_use);
 }
 
 static struct dp_packet_afxdp *
@@ -782,7 +839,8 @@  netdev_afxdp_rxq_recv(struct netdev_rxq *rxq_, struct dp_packet_batch *batch,
 }
 
 static inline int
-kick_tx(struct xsk_socket_info *xsk_info, int xdpmode, bool use_need_wakeup)
+kick_tx(struct xsk_socket_info *xsk_info, enum afxdp_mode mode,
+        bool use_need_wakeup)
 {
     int ret, retries;
     static const int KERNEL_TX_BATCH_SIZE = 16;
@@ -791,11 +849,11 @@  kick_tx(struct xsk_socket_info *xsk_info, int xdpmode, bool use_need_wakeup)
         return 0;
     }
 
-    /* In SKB_MODE packet transmission is synchronous, and the kernel xmits
+    /* In generic mode packet transmission is synchronous, and the kernel xmits
      * only TX_BATCH_SIZE(16) packets for a single sendmsg syscall.
      * So, we have to kick the kernel (n_packets / 16) times to be sure that
      * all packets are transmitted. */
-    retries = (xdpmode == XDP_COPY)
+    retries = (mode == OVS_AF_XDP_MODE_GENERIC)
               ? xsk_info->outstanding_tx / KERNEL_TX_BATCH_SIZE
               : 0;
 kick_retry:
@@ -962,7 +1020,7 @@  __netdev_afxdp_batch_send(struct netdev *netdev, int qid,
                            &orig);
         COVERAGE_INC(afxdp_tx_full);
         afxdp_complete_tx(xsk_info);
-        kick_tx(xsk_info, dev->xdpmode, dev->use_need_wakeup);
+        kick_tx(xsk_info, dev->xdp_mode_in_use, dev->use_need_wakeup);
         error = ENOMEM;
         goto out;
     }
@@ -986,7 +1044,7 @@  __netdev_afxdp_batch_send(struct netdev *netdev, int qid,
     xsk_ring_prod__submit(&xsk_info->tx, dp_packet_batch_size(batch));
     xsk_info->outstanding_tx += dp_packet_batch_size(batch);
 
-    ret = kick_tx(xsk_info, dev->xdpmode, dev->use_need_wakeup);
+    ret = kick_tx(xsk_info, dev->xdp_mode_in_use, dev->use_need_wakeup);
     if (OVS_UNLIKELY(ret)) {
         VLOG_WARN_RL(&rl, "%s: error sending AF_XDP packet: %s.",
                      netdev_get_name(netdev), ovs_strerror(ret));
@@ -1052,10 +1110,11 @@  netdev_afxdp_construct(struct netdev *netdev)
     /* Queues should not be used before the first reconfiguration. Clearing. */
     netdev->n_rxq = 0;
     netdev->n_txq = 0;
-    dev->xdpmode = 0;
+    dev->xdp_mode = OVS_AF_XDP_MODE_UNSPEC;
+    dev->xdp_mode_in_use = OVS_AF_XDP_MODE_UNSPEC;
 
     dev->requested_n_rxq = NR_QUEUE;
-    dev->requested_xdpmode = XDP_COPY;
+    dev->requested_xdp_mode = OVS_AF_XDP_MODE_BEST_EFFORT;
     dev->requested_need_wakeup = NEED_WAKEUP_DEFAULT;
 
     dev->xsks = NULL;
diff --git a/lib/netdev-afxdp.h b/lib/netdev-afxdp.h
index e2f400b72..4fe861d2d 100644
--- a/lib/netdev-afxdp.h
+++ b/lib/netdev-afxdp.h
@@ -25,6 +25,15 @@ 
 /* These functions are Linux AF_XDP specific, so they should be used directly
  * only by Linux-specific code. */
 
+enum afxdp_mode {
+    OVS_AF_XDP_MODE_UNSPEC,
+    OVS_AF_XDP_MODE_BEST_EFFORT,
+    OVS_AF_XDP_MODE_NATIVE_ZC,
+    OVS_AF_XDP_MODE_NATIVE,
+    OVS_AF_XDP_MODE_GENERIC,
+    OVS_AF_XDP_MODE_MAX,
+};
+
 struct netdev;
 struct xsk_socket_info;
 struct xdp_umem;
diff --git a/lib/netdev-linux-private.h b/lib/netdev-linux-private.h
index c14f2fb81..8873caa9d 100644
--- a/lib/netdev-linux-private.h
+++ b/lib/netdev-linux-private.h
@@ -100,10 +100,14 @@  struct netdev_linux {
     /* AF_XDP information. */
     struct xsk_socket_info **xsks;
     int requested_n_rxq;
-    int xdpmode;                /* AF_XDP running mode: driver or skb. */
-    int requested_xdpmode;
+
+    enum afxdp_mode xdp_mode;               /* Configured AF_XDP mode. */
+    enum afxdp_mode requested_xdp_mode;     /* Requested  AF_XDP mode. */
+    enum afxdp_mode xdp_mode_in_use;        /* Effective  AF_XDP mode. */
+
     bool use_need_wakeup;
     bool requested_need_wakeup;
+
     struct ovs_spin *tx_locks;  /* spin lock array for TX queues. */
 #endif
 };
diff --git a/tests/system-afxdp-macros.at b/tests/system-afxdp-macros.at
index f0683c0a9..5ee2ceb1a 100644
--- a/tests/system-afxdp-macros.at
+++ b/tests/system-afxdp-macros.at
@@ -30,10 +30,3 @@  m4_define([CONFIGURE_VETH_OFFLOADS],
      AT_CHECK([ethtool -K $1 txvlan off], [0], [ignore], [ignore])
     ]
 )
-
-# OVS_START_L7([namespace], [protocol])
-#
-# AF_XDP doesn't work with TCP over virtual interfaces for now.
-#
-m4_define([OVS_START_L7],
-   [AT_SKIP_IF([:])])
diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
index efdfb83bb..02a68deb1 100644
--- a/vswitchd/vswitch.xml
+++ b/vswitchd/vswitch.xml
@@ -3107,18 +3107,38 @@  ovs-vsctl add-port br0 p0 -- set Interface p0 type=patch options:peer=p1 \
         </p>
       </column>
 
-      <column name="options" key="xdpmode"
+      <column name="options" key="xdp-mode"
               type='{"type": "string",
-                     "enum": ["set", ["skb", "drv"]]}'>
+                     "enum": ["set", ["best-effort", "native-with-zerocopy",
+                                      "native", "generic"]]}'>
         <p>
           Specifies the operational mode of the XDP program.
-          If "drv", the XDP program is loaded into the device driver with
-          zero-copy RX and TX enabled. This mode requires device driver with
-          AF_XDP support and has the best performance.
-          If "skb", the XDP program is using generic XDP mode in kernel with
-          extra data copying between userspace and kernel. No device driver
-          support is needed. Note that this is afxdp netdev type only.
-          Defaults to "skb" mode.
+          <p>
+            In <code>native-with-zerocopy</code> mode the XDP program is loaded
+            into the device driver with zero-copy RX and TX enabled.  This mode
+            requires device driver support and has the best performance because
+            there should be no copying of packets.
+          </p>
+          <p>
+            <code>native</code> is the same as
+            <code>native-with-zerocopy</code>, but without zero-copy
+            capability.  This requires at least one copy between kernel and the
+            userspace. This mode also requires support from device driver.
+          </p>
+          <p>
+            In <code>generic</code> case the XDP program in kernel works after
+            skb allocation on early stages of packet processing inside the
+            network stack.  This mode doesn't require driver support, but has
+            much lower performance.
+          </p>
+          <p>
+            <code>best-effort</code> tries to detect and choose the best
+            (fastest) from the available modes for current interface.
+          </p>
+          <p>
+            Note that this option is specific to netdev-afxdp.
+            Defaults to <code>best-effort</code> mode.
+          </p>
         </p>
       </column>