mbox series

[ovs-dev,V6,00/13] Netdev vxlan-decap offload

Message ID 20210404095456.27190-1-elibr@nvidia.com
Headers show
Series Netdev vxlan-decap offload | expand

Message

Eli Britstein April 4, 2021, 9:54 a.m. UTC
VXLAN decap in OVS-DPDK configuration consists of two flows:
F1: in_port(ens1f0),eth(),ipv4(),udp(), actions:tnl_pop(vxlan_sys_4789)
F2: tunnel(),in_port(vxlan_sys_4789),eth(),ipv4(), actions:ens1f0_0

F1 is a classification flow. It has outer headers matches and it
classifies the packet as a VXLAN packet, and using tnl_pop action the
packet continues processing in F2.
F2 is a flow that has matches on tunnel metadata as well as on the inner
packet headers (as any other flow).

In order to fully offload VXLAN decap path, both F1 and F2 should be
offloaded. As there are more than one flow in HW, it is possible that
F1 is done by HW but F2 is not. Packet is received by SW, and should be
processed starting from F2 as F1 was already done by HW.
Rte_flows are applicable only on physical port IDs. Keeping the original
physical in port on which the packet is received on enables applying
vport flows (e.g. F2) on that physical port.

This patch-set makes use of [1] introduced in DPDK 20.11, that adds API
for tunnel offloads.

Note that MLX5 PMD has a bug that the tnl_pop private actions must be
first. In OVS it is not.
Fixing this issue is scheduled for 21.05 (and stable 20.11.2).
Meanwhile, tests were done with a workaround for it [2].

v2-v1:
- Tracking original in_port, and applying vport on that physical port instead of all PFs.
v3-v2:
- Traversing ports using a new API instead of flow_dump.
- Refactor packet state recover logic, with bug fix for error pop_header.
- One ref count for netdev in non-tunnel case.
- Rename variables, comments, rebase.
v4-v3:
- Extract orig_in_port from physdev for flow modify.
- Miss handling fixes.
v5-v4:
- Drop refactor offload rule creation commit.
- Comment about setting in_port in restore.
- Refactor vports flow offload commit.
v6-v5:
- Fixed duplicate netdev ref bug.

Travis:
v1: https://travis-ci.org/github/elibritstein/OVS/builds/756418552
v2: https://travis-ci.org/github/elibritstein/OVS/builds/758382963
v3: https://travis-ci.org/github/elibritstein/OVS/builds/761089087
v4: https://travis-ci.org/github/elibritstein/OVS/builds/763146966
v5: https://travis-ci.org/github/elibritstein/OVS/builds/765271879
v6: https://travis-ci.org/github/elibritstein/OVS/builds/765816800

GitHub Actions:
v1: https://github.com/elibritstein/OVS/actions/runs/515334647
v2: https://github.com/elibritstein/OVS/actions/runs/554986007
v3: https://github.com/elibritstein/OVS/actions/runs/613226225
v4: https://github.com/elibritstein/OVS/actions/runs/658415274
v5: https://github.com/elibritstein/OVS/actions/runs/704357369
v6: https://github.com/elibritstein/OVS/actions/runs/716304028

[1] https://mails.dpdk.org/archives/dev/2020-October/187314.html
[2] https://github.com/elibritstein/dpdk-stable/pull/1


Eli Britstein (10):
  netdev-offload: Add HW miss packet state recover API
  netdev-dpdk: Introduce DPDK tunnel APIs
  netdev-offload: Introduce an API to traverse ports
  netdev-dpdk: Add flow_api support for netdev vxlan vports
  netdev-offload-dpdk: Implement HW miss packet recover for vport
  dpif-netdev: Add HW miss packet state recover logic
  netdev-offload-dpdk: Change log rate limits
  netdev-offload-dpdk: Support tunnel pop action
  netdev-offload-dpdk: Support vports flows offload
  netdev-dpdk-offload: Add vxlan pattern matching function

Ilya Maximets (2):
  netdev-offload: Allow offloading to netdev without ifindex.
  netdev-offload: Disallow offloading to unrelated tunneling vports.

Sriharsha Basavapatna (1):
  dpif-netdev: Provide orig_in_port in metadata for tunneled packets

 Documentation/howto/dpdk.rst  |   1 +
 NEWS                          |   2 +
 lib/dpif-netdev.c             |  97 +++--
 lib/netdev-dpdk.c             | 118 ++++++
 lib/netdev-dpdk.h             | 106 ++++-
 lib/netdev-offload-dpdk.c     | 704 +++++++++++++++++++++++++++++++---
 lib/netdev-offload-provider.h |   5 +
 lib/netdev-offload-tc.c       |   8 +
 lib/netdev-offload.c          |  47 ++-
 lib/netdev-offload.h          |  10 +
 lib/packets.h                 |   8 +-
 11 files changed, 1022 insertions(+), 84 deletions(-)

Comments

Ivan Malov April 6, 2021, 7:42 a.m. UTC | #1
Hi Eli,

Thank you for improving the patch series gradually. May I clarify one 
extra aspect (and I'm sorry if it has been discussed already): will OvS 
always insert F1 first and then insert F2? I mean, yes, in the model, F2 
should process packets coming from F1, but is it *technically* possible 
for OvS to insert the rules in reverse order?

On 04/04/2021 12:54, Eli Britstein wrote:
> VXLAN decap in OVS-DPDK configuration consists of two flows:
> F1: in_port(ens1f0),eth(),ipv4(),udp(), actions:tnl_pop(vxlan_sys_4789)
> F2: tunnel(),in_port(vxlan_sys_4789),eth(),ipv4(), actions:ens1f0_0
> 
> F1 is a classification flow. It has outer headers matches and it
> classifies the packet as a VXLAN packet, and using tnl_pop action the
> packet continues processing in F2.
> F2 is a flow that has matches on tunnel metadata as well as on the inner
> packet headers (as any other flow).
> 
> In order to fully offload VXLAN decap path, both F1 and F2 should be
> offloaded. As there are more than one flow in HW, it is possible that
> F1 is done by HW but F2 is not. Packet is received by SW, and should be
> processed starting from F2 as F1 was already done by HW.
> Rte_flows are applicable only on physical port IDs. Keeping the original
> physical in port on which the packet is received on enables applying
> vport flows (e.g. F2) on that physical port.
> 
> This patch-set makes use of [1] introduced in DPDK 20.11, that adds API
> for tunnel offloads.
> 
> Note that MLX5 PMD has a bug that the tnl_pop private actions must be
> first. In OVS it is not.
> Fixing this issue is scheduled for 21.05 (and stable 20.11.2).
> Meanwhile, tests were done with a workaround for it [2].
> 
> v2-v1:
> - Tracking original in_port, and applying vport on that physical port instead of all PFs.
> v3-v2:
> - Traversing ports using a new API instead of flow_dump.
> - Refactor packet state recover logic, with bug fix for error pop_header.
> - One ref count for netdev in non-tunnel case.
> - Rename variables, comments, rebase.
> v4-v3:
> - Extract orig_in_port from physdev for flow modify.
> - Miss handling fixes.
> v5-v4:
> - Drop refactor offload rule creation commit.
> - Comment about setting in_port in restore.
> - Refactor vports flow offload commit.
> v6-v5:
> - Fixed duplicate netdev ref bug.
> 
> Travis:
> v1: https://travis-ci.org/github/elibritstein/OVS/builds/756418552
> v2: https://travis-ci.org/github/elibritstein/OVS/builds/758382963
> v3: https://travis-ci.org/github/elibritstein/OVS/builds/761089087
> v4: https://travis-ci.org/github/elibritstein/OVS/builds/763146966
> v5: https://travis-ci.org/github/elibritstein/OVS/builds/765271879
> v6: https://travis-ci.org/github/elibritstein/OVS/builds/765816800
> 
> GitHub Actions:
> v1: https://github.com/elibritstein/OVS/actions/runs/515334647
> v2: https://github.com/elibritstein/OVS/actions/runs/554986007
> v3: https://github.com/elibritstein/OVS/actions/runs/613226225
> v4: https://github.com/elibritstein/OVS/actions/runs/658415274
> v5: https://github.com/elibritstein/OVS/actions/runs/704357369
> v6: https://github.com/elibritstein/OVS/actions/runs/716304028
> 
> [1] https://mails.dpdk.org/archives/dev/2020-October/187314.html
> [2] https://github.com/elibritstein/dpdk-stable/pull/1
> 
> 
> Eli Britstein (10):
>    netdev-offload: Add HW miss packet state recover API
>    netdev-dpdk: Introduce DPDK tunnel APIs
>    netdev-offload: Introduce an API to traverse ports
>    netdev-dpdk: Add flow_api support for netdev vxlan vports
>    netdev-offload-dpdk: Implement HW miss packet recover for vport
>    dpif-netdev: Add HW miss packet state recover logic
>    netdev-offload-dpdk: Change log rate limits
>    netdev-offload-dpdk: Support tunnel pop action
>    netdev-offload-dpdk: Support vports flows offload
>    netdev-dpdk-offload: Add vxlan pattern matching function
> 
> Ilya Maximets (2):
>    netdev-offload: Allow offloading to netdev without ifindex.
>    netdev-offload: Disallow offloading to unrelated tunneling vports.
> 
> Sriharsha Basavapatna (1):
>    dpif-netdev: Provide orig_in_port in metadata for tunneled packets
> 
>   Documentation/howto/dpdk.rst  |   1 +
>   NEWS                          |   2 +
>   lib/dpif-netdev.c             |  97 +++--
>   lib/netdev-dpdk.c             | 118 ++++++
>   lib/netdev-dpdk.h             | 106 ++++-
>   lib/netdev-offload-dpdk.c     | 704 +++++++++++++++++++++++++++++++---
>   lib/netdev-offload-provider.h |   5 +
>   lib/netdev-offload-tc.c       |   8 +
>   lib/netdev-offload.c          |  47 ++-
>   lib/netdev-offload.h          |  10 +
>   lib/packets.h                 |   8 +-
>   11 files changed, 1022 insertions(+), 84 deletions(-)
>
Eli Britstein April 6, 2021, 7:57 a.m. UTC | #2
On 4/6/2021 10:42 AM, Ivan Malov wrote:
> External email: Use caution opening links or attachments
>
>
> Hi Eli,
>
> Thank you for improving the patch series gradually. May I clarify one
> extra aspect (and I'm sorry if it has been discussed already): will OvS
> always insert F1 first and then insert F2? I mean, yes, in the model, F2
> should process packets coming from F1, but is it *technically* possible
> for OvS to insert the rules in reverse order?

Usually the flows are created by the 
handle_packet_upcall->dp_netdev_flow_add.

handle_packet_upcall->dp_netdev_execute_actions is done first. 
Processing the packets is recursive (dp_netdev_recirculate), thus the 
flows are created in a LIFO manner, meaning F2 is first.

However, we cannot rely on such ordering. It is possible that F2 cannot 
be offloaded (unsupported action for example).

Also, in the future offloads will be inserted multi-threaded (see 
https://patchwork.ozlabs.org/project/openvswitch/list/?series=228959), 
so that's another reason we cannot rely on any ordering.

>
> On 04/04/2021 12:54, Eli Britstein wrote:
>> VXLAN decap in OVS-DPDK configuration consists of two flows:
>> F1: in_port(ens1f0),eth(),ipv4(),udp(), actions:tnl_pop(vxlan_sys_4789)
>> F2: tunnel(),in_port(vxlan_sys_4789),eth(),ipv4(), actions:ens1f0_0
>>
>> F1 is a classification flow. It has outer headers matches and it
>> classifies the packet as a VXLAN packet, and using tnl_pop action the
>> packet continues processing in F2.
>> F2 is a flow that has matches on tunnel metadata as well as on the inner
>> packet headers (as any other flow).
>>
>> In order to fully offload VXLAN decap path, both F1 and F2 should be
>> offloaded. As there are more than one flow in HW, it is possible that
>> F1 is done by HW but F2 is not. Packet is received by SW, and should be
>> processed starting from F2 as F1 was already done by HW.
>> Rte_flows are applicable only on physical port IDs. Keeping the original
>> physical in port on which the packet is received on enables applying
>> vport flows (e.g. F2) on that physical port.
>>
>> This patch-set makes use of [1] introduced in DPDK 20.11, that adds API
>> for tunnel offloads.
>>
>> Note that MLX5 PMD has a bug that the tnl_pop private actions must be
>> first. In OVS it is not.
>> Fixing this issue is scheduled for 21.05 (and stable 20.11.2).
>> Meanwhile, tests were done with a workaround for it [2].
>>
>> v2-v1:
>> - Tracking original in_port, and applying vport on that physical port 
>> instead of all PFs.
>> v3-v2:
>> - Traversing ports using a new API instead of flow_dump.
>> - Refactor packet state recover logic, with bug fix for error 
>> pop_header.
>> - One ref count for netdev in non-tunnel case.
>> - Rename variables, comments, rebase.
>> v4-v3:
>> - Extract orig_in_port from physdev for flow modify.
>> - Miss handling fixes.
>> v5-v4:
>> - Drop refactor offload rule creation commit.
>> - Comment about setting in_port in restore.
>> - Refactor vports flow offload commit.
>> v6-v5:
>> - Fixed duplicate netdev ref bug.
>>
>> Travis:
>> v1: 
>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Ftravis-ci.org%2Fgithub%2Felibritstein%2FOVS%2Fbuilds%2F756418552&data=04%7C01%7Celibr%40nvidia.com%7C9604857ef03d4b4ee35508d8f8cf98a6%7C43083d15727340c1b7db39efd9ccc17a%7C0%7C0%7C637532917799220640%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=9lDdE7UXHSPFVHHQhp9OhPCB2eUdt2avc%2F3P39cYXfI%3D&reserved=0
>> v2: 
>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Ftravis-ci.org%2Fgithub%2Felibritstein%2FOVS%2Fbuilds%2F758382963&data=04%7C01%7Celibr%40nvidia.com%7C9604857ef03d4b4ee35508d8f8cf98a6%7C43083d15727340c1b7db39efd9ccc17a%7C0%7C0%7C637532917799230634%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=P9VJvydP8OWBe01zAy%2FhDrugktXlNO8xtX5enKux7mc%3D&reserved=0
>> v3: 
>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Ftravis-ci.org%2Fgithub%2Felibritstein%2FOVS%2Fbuilds%2F761089087&data=04%7C01%7Celibr%40nvidia.com%7C9604857ef03d4b4ee35508d8f8cf98a6%7C43083d15727340c1b7db39efd9ccc17a%7C0%7C0%7C637532917799230634%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=fhSFUtCle9BKnVRKlre3hjAjSuBaEjFvszA1WeDBHOc%3D&reserved=0
>> v4: 
>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Ftravis-ci.org%2Fgithub%2Felibritstein%2FOVS%2Fbuilds%2F763146966&data=04%7C01%7Celibr%40nvidia.com%7C9604857ef03d4b4ee35508d8f8cf98a6%7C43083d15727340c1b7db39efd9ccc17a%7C0%7C0%7C637532917799230634%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=xmGASTbUDxeFmNvntdLaYYiXJ%2Bil%2Bqkh1RuxAN3uTS8%3D&reserved=0
>> v5: 
>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Ftravis-ci.org%2Fgithub%2Felibritstein%2FOVS%2Fbuilds%2F765271879&data=04%7C01%7Celibr%40nvidia.com%7C9604857ef03d4b4ee35508d8f8cf98a6%7C43083d15727340c1b7db39efd9ccc17a%7C0%7C0%7C637532917799230634%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=2SSMrvKczytG7hsXduUKcou7WBJHr3%2BEawbel%2FKfq2o%3D&reserved=0
>> v6: 
>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Ftravis-ci.org%2Fgithub%2Felibritstein%2FOVS%2Fbuilds%2F765816800&data=04%7C01%7Celibr%40nvidia.com%7C9604857ef03d4b4ee35508d8f8cf98a6%7C43083d15727340c1b7db39efd9ccc17a%7C0%7C0%7C637532917799230634%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=FVxfBb8Whc9%2FR4uocr2uJtnqr6ZjAUCi7Bu6%2FDtMkOs%3D&reserved=0
>>
>> GitHub Actions:
>> v1: https://github.com/elibritstein/OVS/actions/runs/515334647
>> v2: https://github.com/elibritstein/OVS/actions/runs/554986007
>> v3: https://github.com/elibritstein/OVS/actions/runs/613226225
>> v4: https://github.com/elibritstein/OVS/actions/runs/658415274
>> v5: https://github.com/elibritstein/OVS/actions/runs/704357369
>> v6: https://github.com/elibritstein/OVS/actions/runs/716304028
>>
>> [1] 
>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fmails.dpdk.org%2Farchives%2Fdev%2F2020-October%2F187314.html&data=04%7C01%7Celibr%40nvidia.com%7C9604857ef03d4b4ee35508d8f8cf98a6%7C43083d15727340c1b7db39efd9ccc17a%7C0%7C0%7C637532917799230634%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=bjKGgknW21J8KEw54qF8J9L88sKWmXpadqzBcJ6kgxI%3D&reserved=0
>> [2] https://github.com/elibritstein/dpdk-stable/pull/1
>>
>>
>> Eli Britstein (10):
>>    netdev-offload: Add HW miss packet state recover API
>>    netdev-dpdk: Introduce DPDK tunnel APIs
>>    netdev-offload: Introduce an API to traverse ports
>>    netdev-dpdk: Add flow_api support for netdev vxlan vports
>>    netdev-offload-dpdk: Implement HW miss packet recover for vport
>>    dpif-netdev: Add HW miss packet state recover logic
>>    netdev-offload-dpdk: Change log rate limits
>>    netdev-offload-dpdk: Support tunnel pop action
>>    netdev-offload-dpdk: Support vports flows offload
>>    netdev-dpdk-offload: Add vxlan pattern matching function
>>
>> Ilya Maximets (2):
>>    netdev-offload: Allow offloading to netdev without ifindex.
>>    netdev-offload: Disallow offloading to unrelated tunneling vports.
>>
>> Sriharsha Basavapatna (1):
>>    dpif-netdev: Provide orig_in_port in metadata for tunneled packets
>>
>>   Documentation/howto/dpdk.rst  |   1 +
>>   NEWS                          |   2 +
>>   lib/dpif-netdev.c             |  97 +++--
>>   lib/netdev-dpdk.c             | 118 ++++++
>>   lib/netdev-dpdk.h             | 106 ++++-
>>   lib/netdev-offload-dpdk.c     | 704 +++++++++++++++++++++++++++++++---
>>   lib/netdev-offload-provider.h |   5 +
>>   lib/netdev-offload-tc.c       |   8 +
>>   lib/netdev-offload.c          |  47 ++-
>>   lib/netdev-offload.h          |  10 +
>>   lib/packets.h                 |   8 +-
>>   11 files changed, 1022 insertions(+), 84 deletions(-)
>>
>
> -- 
> Ivan M
Sriharsha Basavapatna April 7, 2021, 9:10 a.m. UTC | #3
On Sun, Apr 4, 2021 at 3:25 PM Eli Britstein <elibr@nvidia.com> wrote:

> VXLAN decap in OVS-DPDK configuration consists of two flows:
> F1: in_port(ens1f0),eth(),ipv4(),udp(), actions:tnl_pop(vxlan_sys_4789)
> F2: tunnel(),in_port(vxlan_sys_4789),eth(),ipv4(), actions:ens1f0_0
>
> F1 is a classification flow. It has outer headers matches and it
> classifies the packet as a VXLAN packet, and using tnl_pop action the
> packet continues processing in F2.
> F2 is a flow that has matches on tunnel metadata as well as on the inner
> packet headers (as any other flow).
>
> In order to fully offload VXLAN decap path, both F1 and F2 should be
> offloaded. As there are more than one flow in HW, it is possible that
> F1 is done by HW but F2 is not. Packet is received by SW, and should be
> processed starting from F2 as F1 was already done by HW.
> Rte_flows are applicable only on physical port IDs. Keeping the original
> physical in port on which the packet is received on enables applying
> vport flows (e.g. F2) on that physical port.
>
> This patch-set makes use of [1] introduced in DPDK 20.11, that adds API
> for tunnel offloads.
>
> Note that MLX5 PMD has a bug that the tnl_pop private actions must be
> first. In OVS it is not.
> Fixing this issue is scheduled for 21.05 (and stable 20.11.2).
> Meanwhile, tests were done with a workaround for it [2].
>
> v2-v1:
> - Tracking original in_port, and applying vport on that physical port
> instead of all PFs.
> v3-v2:
> - Traversing ports using a new API instead of flow_dump.
> - Refactor packet state recover logic, with bug fix for error pop_header.
> - One ref count for netdev in non-tunnel case.
> - Rename variables, comments, rebase.
> v4-v3:
> - Extract orig_in_port from physdev for flow modify.
> - Miss handling fixes.
> v5-v4:
> - Drop refactor offload rule creation commit.
> - Comment about setting in_port in restore.
> - Refactor vports flow offload commit.
> v6-v5:
> - Fixed duplicate netdev ref bug.
>

Can you provide some info on this bug ?  and what changes were done to fix
this ?
Thanks,
-Harsha

>
> Travis:
> v1: https://travis-ci.org/github/elibritstein/OVS/builds/756418552
> v2 <https://travis-ci.org/github/elibritstein/OVS/builds/756418552v2>:
> https://travis-ci.org/github/elibritstein/OVS/builds/758382963
> v3 <https://travis-ci.org/github/elibritstein/OVS/builds/758382963v3>:
> https://travis-ci.org/github/elibritstein/OVS/builds/761089087
> v4 <https://travis-ci.org/github/elibritstein/OVS/builds/761089087v4>:
> https://travis-ci.org/github/elibritstein/OVS/builds/763146966
> v5 <https://travis-ci.org/github/elibritstein/OVS/builds/763146966v5>:
> https://travis-ci.org/github/elibritstein/OVS/builds/765271879
> v6 <https://travis-ci.org/github/elibritstein/OVS/builds/765271879v6>:
> https://travis-ci.org/github/elibritstein/OVS/builds/765816800
>
> GitHub Actions:
> v1: https://github.com/elibritstein/OVS/actions/runs/515334647
> v2: https://github.com/elibritstein/OVS/actions/runs/554986007
> v3: https://github.com/elibritstein/OVS/actions/runs/613226225
> v4: https://github.com/elibritstein/OVS/actions/runs/658415274
> v5: https://github.com/elibritstein/OVS/actions/runs/704357369
> v6: https://github.com/elibritstein/OVS/actions/runs/716304028
>
> [1] https://mails.dpdk.org/archives/dev/2020-October/187314.html
> [2] https://github.com/elibritstein/dpdk-stable/pull/1
>
>
> Eli Britstein (10):
>   netdev-offload: Add HW miss packet state recover API
>   netdev-dpdk: Introduce DPDK tunnel APIs
>   netdev-offload: Introduce an API to traverse ports
>   netdev-dpdk: Add flow_api support for netdev vxlan vports
>   netdev-offload-dpdk: Implement HW miss packet recover for vport
>   dpif-netdev: Add HW miss packet state recover logic
>   netdev-offload-dpdk: Change log rate limits
>   netdev-offload-dpdk: Support tunnel pop action
>   netdev-offload-dpdk: Support vports flows offload
>   netdev-dpdk-offload: Add vxlan pattern matching function
>
> Ilya Maximets (2):
>   netdev-offload: Allow offloading to netdev without ifindex.
>   netdev-offload: Disallow offloading to unrelated tunneling vports.
>
> Sriharsha Basavapatna (1):
>   dpif-netdev: Provide orig_in_port in metadata for tunneled packets
>
>  Documentation/howto/dpdk.rst  |   1 +
>  NEWS                          |   2 +
>  lib/dpif-netdev.c             |  97 +++--
>  lib/netdev-dpdk.c             | 118 ++++++
>  lib/netdev-dpdk.h             | 106 ++++-
>  lib/netdev-offload-dpdk.c     | 704 +++++++++++++++++++++++++++++++---
>  lib/netdev-offload-provider.h |   5 +
>  lib/netdev-offload-tc.c       |   8 +
>  lib/netdev-offload.c          |  47 ++-
>  lib/netdev-offload.h          |  10 +
>  lib/packets.h                 |   8 +-
>  11 files changed, 1022 insertions(+), 84 deletions(-)
>
> --
> 2.28.0.2311.g225365fb51
>
>
Eli Britstein April 7, 2021, 9:20 a.m. UTC | #4
On 4/7/2021 12:10 PM, Sriharsha Basavapatna wrote:
>
> On Sun, Apr 4, 2021 at 3:25 PM Eli Britstein <elibr@nvidia.com 
> <mailto:elibr@nvidia.com>> wrote:
>
>     VXLAN decap in OVS-DPDK configuration consists of two flows:
>     F1: in_port(ens1f0),eth(),ipv4(),udp(),
>     actions:tnl_pop(vxlan_sys_4789)
>     F2: tunnel(),in_port(vxlan_sys_4789),eth(),ipv4(), actions:ens1f0_0
>
>     F1 is a classification flow. It has outer headers matches and it
>     classifies the packet as a VXLAN packet, and using tnl_pop action the
>     packet continues processing in F2.
>     F2 is a flow that has matches on tunnel metadata as well as on the
>     inner
>     packet headers (as any other flow).
>
>     In order to fully offload VXLAN decap path, both F1 and F2 should be
>     offloaded. As there are more than one flow in HW, it is possible that
>     F1 is done by HW but F2 is not. Packet is received by SW, and
>     should be
>     processed starting from F2 as F1 was already done by HW.
>     Rte_flows are applicable only on physical port IDs. Keeping the
>     original
>     physical in port on which the packet is received on enables applying
>     vport flows (e.g. F2) on that physical port.
>
>     This patch-set makes use of [1] introduced in DPDK 20.11, that
>     adds API
>     for tunnel offloads.
>
>     Note that MLX5 PMD has a bug that the tnl_pop private actions must be
>     first. In OVS it is not.
>     Fixing this issue is scheduled for 21.05 (and stable 20.11.2).
>     Meanwhile, tests were done with a workaround for it [2].
>
>     v2-v1:
>     - Tracking original in_port, and applying vport on that physical
>     port instead of all PFs.
>     v3-v2:
>     - Traversing ports using a new API instead of flow_dump.
>     - Refactor packet state recover logic, with bug fix for error
>     pop_header.
>     - One ref count for netdev in non-tunnel case.
>     - Rename variables, comments, rebase.
>     v4-v3:
>     - Extract orig_in_port from physdev for flow modify.
>     - Miss handling fixes.
>     v5-v4:
>     - Drop refactor offload rule creation commit.
>     - Comment about setting in_port in restore.
>     - Refactor vports flow offload commit.
>     v6-v5:
>     - Fixed duplicate netdev ref bug.
>
>
> Can you provide some info on this bug ?  and what changes were done to 
> fix this ?

With v5, the 2 netdevs sent to ufid_to_rte_flow_associate are always 
non-NULL (was not like this previously), and there was this line:

+    data->netdev = vport ? netdev_ref(vport) : physdev;

As the "vport" was always non-null, even for non-tunnels, it took 
another ref of it, but in disassociate, only one close was done.

With v6 it is like this (changed arguments names a bit)

+    data->physdev = netdev != physdev ? netdev_ref(physdev) : physdev;

Checking the netdevs are different, not non-NULL.

> Thanks,
> -Harsha
>
>
>     Travis:
>     v1: https://travis-ci.org/github/elibritstein/OVS/builds/756418552
>     v2
>     <https://travis-ci.org/github/elibritstein/OVS/builds/756418552v2>:
>     https://travis-ci.org/github/elibritstein/OVS/builds/758382963
>     v3
>     <https://travis-ci.org/github/elibritstein/OVS/builds/758382963v3>:
>     https://travis-ci.org/github/elibritstein/OVS/builds/761089087
>     v4
>     <https://travis-ci.org/github/elibritstein/OVS/builds/761089087v4>:
>     https://travis-ci.org/github/elibritstein/OVS/builds/763146966
>     v5
>     <https://travis-ci.org/github/elibritstein/OVS/builds/763146966v5>:
>     https://travis-ci.org/github/elibritstein/OVS/builds/765271879
>     v6
>     <https://travis-ci.org/github/elibritstein/OVS/builds/765271879v6>:
>     https://travis-ci.org/github/elibritstein/OVS/builds/765816800
>     <https://travis-ci.org/github/elibritstein/OVS/builds/765816800>
>
>     GitHub Actions:
>     v1: https://github.com/elibritstein/OVS/actions/runs/515334647
>     <https://github.com/elibritstein/OVS/actions/runs/515334647>
>     v2: https://github.com/elibritstein/OVS/actions/runs/554986007
>     <https://github.com/elibritstein/OVS/actions/runs/554986007>
>     v3: https://github.com/elibritstein/OVS/actions/runs/613226225
>     <https://github.com/elibritstein/OVS/actions/runs/613226225>
>     v4: https://github.com/elibritstein/OVS/actions/runs/658415274
>     <https://github.com/elibritstein/OVS/actions/runs/658415274>
>     v5: https://github.com/elibritstein/OVS/actions/runs/704357369
>     <https://github.com/elibritstein/OVS/actions/runs/704357369>
>     v6: https://github.com/elibritstein/OVS/actions/runs/716304028
>     <https://github.com/elibritstein/OVS/actions/runs/716304028>
>
>     [1] https://mails.dpdk.org/archives/dev/2020-October/187314.html
>     <https://mails.dpdk.org/archives/dev/2020-October/187314.html>
>     [2] https://github.com/elibritstein/dpdk-stable/pull/1
>     <https://github.com/elibritstein/dpdk-stable/pull/1>
>
>
>     Eli Britstein (10):
>       netdev-offload: Add HW miss packet state recover API
>       netdev-dpdk: Introduce DPDK tunnel APIs
>       netdev-offload: Introduce an API to traverse ports
>       netdev-dpdk: Add flow_api support for netdev vxlan vports
>       netdev-offload-dpdk: Implement HW miss packet recover for vport
>       dpif-netdev: Add HW miss packet state recover logic
>       netdev-offload-dpdk: Change log rate limits
>       netdev-offload-dpdk: Support tunnel pop action
>       netdev-offload-dpdk: Support vports flows offload
>       netdev-dpdk-offload: Add vxlan pattern matching function
>
>     Ilya Maximets (2):
>       netdev-offload: Allow offloading to netdev without ifindex.
>       netdev-offload: Disallow offloading to unrelated tunneling vports.
>
>     Sriharsha Basavapatna (1):
>       dpif-netdev: Provide orig_in_port in metadata for tunneled packets
>
>      Documentation/howto/dpdk.rst  |   1 +
>      NEWS                          |   2 +
>      lib/dpif-netdev.c             |  97 +++--
>      lib/netdev-dpdk.c             | 118 ++++++
>      lib/netdev-dpdk.h             | 106 ++++-
>      lib/netdev-offload-dpdk.c     | 704
>     +++++++++++++++++++++++++++++++---
>      lib/netdev-offload-provider.h |   5 +
>      lib/netdev-offload-tc.c       |   8 +
>      lib/netdev-offload.c          |  47 ++-
>      lib/netdev-offload.h          |  10 +
>      lib/packets.h                 |   8 +-
>      11 files changed, 1022 insertions(+), 84 deletions(-)
>
>     -- 
>     2.28.0.2311.g225365fb51
>
>
> This electronic communication and the information and any files 
> transmitted with it, or attached to it, are confidential and are 
> intended solely for the use of the individual or entity to whom it is 
> addressed and may contain information that is confidential, legally 
> privileged, protected by privacy laws, or otherwise restricted from 
> disclosure to anyone else. If you are not the intended recipient or 
> the person responsible for delivering the e-mail to the intended 
> recipient, you are hereby notified that any use, copying, 
> distributing, dissemination, forwarding, printing, or copying of this 
> e-mail is strictly prohibited. If you received this e-mail in error, 
> please return the e-mail to the sender, delete it from your computer, 
> and destroy any printed copy of it.
Sriharsha Basavapatna April 8, 2021, 5:04 p.m. UTC | #5
On Wed, Apr 7, 2021 at 2:50 PM Eli Britstein <elibr@nvidia.com> wrote:

>
> On 4/7/2021 12:10 PM, Sriharsha Basavapatna wrote:
>
>
> On Sun, Apr 4, 2021 at 3:25 PM Eli Britstein <elibr@nvidia.com> wrote:
>
>> VXLAN decap in OVS-DPDK configuration consists of two flows:
>> F1: in_port(ens1f0),eth(),ipv4(),udp(), actions:tnl_pop(vxlan_sys_4789)
>> F2: tunnel(),in_port(vxlan_sys_4789),eth(),ipv4(), actions:ens1f0_0
>>
>> F1 is a classification flow. It has outer headers matches and it
>> classifies the packet as a VXLAN packet, and using tnl_pop action the
>> packet continues processing in F2.
>> F2 is a flow that has matches on tunnel metadata as well as on the inner
>> packet headers (as any other flow).
>>
>> In order to fully offload VXLAN decap path, both F1 and F2 should be
>> offloaded. As there are more than one flow in HW, it is possible that
>> F1 is done by HW but F2 is not. Packet is received by SW, and should be
>> processed starting from F2 as F1 was already done by HW.
>> Rte_flows are applicable only on physical port IDs. Keeping the original
>> physical in port on which the packet is received on enables applying
>> vport flows (e.g. F2) on that physical port.
>>
>> This patch-set makes use of [1] introduced in DPDK 20.11, that adds API
>> for tunnel offloads.
>>
>> Note that MLX5 PMD has a bug that the tnl_pop private actions must be
>> first. In OVS it is not.
>> Fixing this issue is scheduled for 21.05 (and stable 20.11.2).
>> Meanwhile, tests were done with a workaround for it [2].
>>
>> v2-v1:
>> - Tracking original in_port, and applying vport on that physical port
>> instead of all PFs.
>> v3-v2:
>> - Traversing ports using a new API instead of flow_dump.
>> - Refactor packet state recover logic, with bug fix for error pop_header.
>> - One ref count for netdev in non-tunnel case.
>> - Rename variables, comments, rebase.
>> v4-v3:
>> - Extract orig_in_port from physdev for flow modify.
>> - Miss handling fixes.
>> v5-v4:
>> - Drop refactor offload rule creation commit.
>> - Comment about setting in_port in restore.
>> - Refactor vports flow offload commit.
>> v6-v5:
>> - Fixed duplicate netdev ref bug.
>>
>
> Can you provide some info on this bug ?  and what changes were done to fix
> this ?
>
> With v5, the 2 netdevs sent to ufid_to_rte_flow_associate are always
> non-NULL (was not like this previously), and there was this line:
>
> +    data->netdev = vport ? netdev_ref(vport) : physdev;
>
> As the "vport" was always non-null, even for non-tunnels, it took another
> ref of it, but in disassociate, only one close was done.
>
> With v6 it is like this (changed arguments names a bit)
>
> +    data->physdev = netdev != physdev ? netdev_ref(physdev) : physdev;
>
> Checking the netdevs are different, not non-NULL.
>
> Thanks,
> -Harsha
>
>>
>> Travis:
>> v1: https://travis-ci.org/github/elibritstein/OVS/builds/756418552
>> v2 <https://travis-ci.org/github/elibritstein/OVS/builds/756418552v2>:
>> https://travis-ci.org/github/elibritstein/OVS/builds/758382963
>> v3 <https://travis-ci.org/github/elibritstein/OVS/builds/758382963v3>:
>> https://travis-ci.org/github/elibritstein/OVS/builds/761089087
>> v4 <https://travis-ci.org/github/elibritstein/OVS/builds/761089087v4>:
>> https://travis-ci.org/github/elibritstein/OVS/builds/763146966
>> v5 <https://travis-ci.org/github/elibritstein/OVS/builds/763146966v5>:
>> https://travis-ci.org/github/elibritstein/OVS/builds/765271879
>> v6 <https://travis-ci.org/github/elibritstein/OVS/builds/765271879v6>:
>> https://travis-ci.org/github/elibritstein/OVS/builds/765816800
>>
>> GitHub Actions:
>> v1: https://github.com/elibritstein/OVS/actions/runs/515334647
>> v2: https://github.com/elibritstein/OVS/actions/runs/554986007
>> v3: https://github.com/elibritstein/OVS/actions/runs/613226225
>> v4: https://github.com/elibritstein/OVS/actions/runs/658415274
>> v5: https://github.com/elibritstein/OVS/actions/runs/704357369
>> v6: https://github.com/elibritstein/OVS/actions/runs/716304028
>>
>> [1] https://mails.dpdk.org/archives/dev/2020-October/187314.html
>> [2] https://github.com/elibritstein/dpdk-stable/pull/1
>>
>>
>> Eli Britstein (10):
>>   netdev-offload: Add HW miss packet state recover API
>>   netdev-dpdk: Introduce DPDK tunnel APIs
>>   netdev-offload: Introduce an API to traverse ports
>>   netdev-dpdk: Add flow_api support for netdev vxlan vports
>>   netdev-offload-dpdk: Implement HW miss packet recover for vport
>>   dpif-netdev: Add HW miss packet state recover logic
>>   netdev-offload-dpdk: Change log rate limits
>>   netdev-offload-dpdk: Support tunnel pop action
>>   netdev-offload-dpdk: Support vports flows offload
>>   netdev-dpdk-offload: Add vxlan pattern matching function
>>
>> Ilya Maximets (2):
>>   netdev-offload: Allow offloading to netdev without ifindex.
>>   netdev-offload: Disallow offloading to unrelated tunneling vports.
>>
>> Sriharsha Basavapatna (1):
>>   dpif-netdev: Provide orig_in_port in metadata for tunneled packets
>>
>>  Documentation/howto/dpdk.rst  |   1 +
>>  NEWS                          |   2 +
>>  lib/dpif-netdev.c             |  97 +++--
>>  lib/netdev-dpdk.c             | 118 ++++++
>>  lib/netdev-dpdk.h             | 106 ++++-
>>  lib/netdev-offload-dpdk.c     | 704 +++++++++++++++++++++++++++++++---
>>  lib/netdev-offload-provider.h |   5 +
>>  lib/netdev-offload-tc.c       |   8 +
>>  lib/netdev-offload.c          |  47 ++-
>>  lib/netdev-offload.h          |  10 +
>>  lib/packets.h                 |   8 +-
>>  11 files changed, 1022 insertions(+), 84 deletions(-)
>>
>> --
>> 2.28.0.2311.g225365fb51
>>
> Acked-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>

Thanks,
-Harsha

>
>>
> This electronic communication and the information and any files
> transmitted with it, or attached to it, are confidential and are intended
> solely for the use of the individual or entity to whom it is addressed and
> may contain information that is confidential, legally privileged, protected
> by privacy laws, or otherwise restricted from disclosure to anyone else. If
> you are not the intended recipient or the person responsible for delivering
> the e-mail to the intended recipient, you are hereby notified that any use,
> copying, distributing, dissemination, forwarding, printing, or copying of
> this e-mail is strictly prohibited. If you received this e-mail in error,
> please return the e-mail to the sender, delete it from your computer, and
> destroy any printed copy of it.
>
>
Emma Finn April 20, 2021, 9:49 a.m. UTC | #6
> -----Original Message-----
> From: dev <ovs-dev-bounces@openvswitch.org> On Behalf Of Sriharsha
> Basavapatna via dev
> Sent: Thursday 8 April 2021 18:04
> To: Eli Britstein <elibr@nvidia.com>
> Cc: ovs dev <dev@openvswitch.org>; Ivan.Malov@oktetlabs.ru; Ilya
> Maximets <i.maximets@ovn.org>; Ameer Mahagneh
> <ameerm@nvidia.com>; Majd Dibbiny <majd@nvidia.com>
> Subject: Re: [ovs-dev] [PATCH V6 00/13] Netdev vxlan-decap offload
> 
> On Wed, Apr 7, 2021 at 2:50 PM Eli Britstein <elibr@nvidia.com> wrote:
> 
> >
> > On 4/7/2021 12:10 PM, Sriharsha Basavapatna wrote:
> >
> >
> > On Sun, Apr 4, 2021 at 3:25 PM Eli Britstein <elibr@nvidia.com> wrote:
> >
> >> VXLAN decap in OVS-DPDK configuration consists of two flows:
> >> F1: in_port(ens1f0),eth(),ipv4(),udp(),
> >> actions:tnl_pop(vxlan_sys_4789)
> >> F2: tunnel(),in_port(vxlan_sys_4789),eth(),ipv4(), actions:ens1f0_0
> >>
> >> F1 is a classification flow. It has outer headers matches and it
> >> classifies the packet as a VXLAN packet, and using tnl_pop action the
> >> packet continues processing in F2.
> >> F2 is a flow that has matches on tunnel metadata as well as on the
> >> inner packet headers (as any other flow).
> >>
> >> In order to fully offload VXLAN decap path, both F1 and F2 should be
> >> offloaded. As there are more than one flow in HW, it is possible that
> >> F1 is done by HW but F2 is not. Packet is received by SW, and should
> >> be processed starting from F2 as F1 was already done by HW.
> >> Rte_flows are applicable only on physical port IDs. Keeping the
> >> original physical in port on which the packet is received on enables
> >> applying vport flows (e.g. F2) on that physical port.
> >>
> >> This patch-set makes use of [1] introduced in DPDK 20.11, that adds
> >> API for tunnel offloads.
> >>
> >> Note that MLX5 PMD has a bug that the tnl_pop private actions must be
> >> first. In OVS it is not.
> >> Fixing this issue is scheduled for 21.05 (and stable 20.11.2).
> >> Meanwhile, tests were done with a workaround for it [2].
> >>
> >> v2-v1:
> >> - Tracking original in_port, and applying vport on that physical port
> >> instead of all PFs.
> >> v3-v2:
> >> - Traversing ports using a new API instead of flow_dump.
> >> - Refactor packet state recover logic, with bug fix for error pop_header.
> >> - One ref count for netdev in non-tunnel case.
> >> - Rename variables, comments, rebase.
> >> v4-v3:
> >> - Extract orig_in_port from physdev for flow modify.
> >> - Miss handling fixes.
> >> v5-v4:
> >> - Drop refactor offload rule creation commit.
> >> - Comment about setting in_port in restore.
> >> - Refactor vports flow offload commit.
> >> v6-v5:
> >> - Fixed duplicate netdev ref bug.
> >>
> >
> > Can you provide some info on this bug ?  and what changes were done to
> > fix this ?
> >
> > With v5, the 2 netdevs sent to ufid_to_rte_flow_associate are always
> > non-NULL (was not like this previously), and there was this line:
> >
> > +    data->netdev = vport ? netdev_ref(vport) : physdev;
> >
> > As the "vport" was always non-null, even for non-tunnels, it took
> > another ref of it, but in disassociate, only one close was done.
> >
> > With v6 it is like this (changed arguments names a bit)
> >
> > +    data->physdev = netdev != physdev ? netdev_ref(physdev) :
> > + physdev;
> >
> > Checking the netdevs are different, not non-NULL.
> >
> > Thanks,
> > -Harsha
> >
> >>
> >> Travis:
> >> v1: https://travis-ci.org/github/elibritstein/OVS/builds/756418552
> >> v2 <https://travis-ci.org/github/elibritstein/OVS/builds/756418552v2>:
> >> https://travis-ci.org/github/elibritstein/OVS/builds/758382963
> >> v3 <https://travis-ci.org/github/elibritstein/OVS/builds/758382963v3>:
> >> https://travis-ci.org/github/elibritstein/OVS/builds/761089087
> >> v4 <https://travis-ci.org/github/elibritstein/OVS/builds/761089087v4>:
> >> https://travis-ci.org/github/elibritstein/OVS/builds/763146966
> >> v5 <https://travis-ci.org/github/elibritstein/OVS/builds/763146966v5>:
> >> https://travis-ci.org/github/elibritstein/OVS/builds/765271879
> >> v6 <https://travis-ci.org/github/elibritstein/OVS/builds/765271879v6>:
> >> https://travis-ci.org/github/elibritstein/OVS/builds/765816800
> >>
> >> GitHub Actions:
> >> v1: https://github.com/elibritstein/OVS/actions/runs/515334647
> >> v2: https://github.com/elibritstein/OVS/actions/runs/554986007
> >> v3: https://github.com/elibritstein/OVS/actions/runs/613226225
> >> v4: https://github.com/elibritstein/OVS/actions/runs/658415274
> >> v5: https://github.com/elibritstein/OVS/actions/runs/704357369
> >> v6: https://github.com/elibritstein/OVS/actions/runs/716304028
> >>
> >> [1] https://mails.dpdk.org/archives/dev/2020-October/187314.html
> >> [2] https://github.com/elibritstein/dpdk-stable/pull/1
> >>
> >>
> >> Eli Britstein (10):
> >>   netdev-offload: Add HW miss packet state recover API
> >>   netdev-dpdk: Introduce DPDK tunnel APIs
> >>   netdev-offload: Introduce an API to traverse ports
> >>   netdev-dpdk: Add flow_api support for netdev vxlan vports
> >>   netdev-offload-dpdk: Implement HW miss packet recover for vport
> >>   dpif-netdev: Add HW miss packet state recover logic
> >>   netdev-offload-dpdk: Change log rate limits
> >>   netdev-offload-dpdk: Support tunnel pop action
> >>   netdev-offload-dpdk: Support vports flows offload
> >>   netdev-dpdk-offload: Add vxlan pattern matching function
> >>
> >> Ilya Maximets (2):
> >>   netdev-offload: Allow offloading to netdev without ifindex.
> >>   netdev-offload: Disallow offloading to unrelated tunneling vports.
> >>
> >> Sriharsha Basavapatna (1):
> >>   dpif-netdev: Provide orig_in_port in metadata for tunneled packets
> >>
> >>  Documentation/howto/dpdk.rst  |   1 +
> >>  NEWS                          |   2 +
> >>  lib/dpif-netdev.c             |  97 +++--
> >>  lib/netdev-dpdk.c             | 118 ++++++
> >>  lib/netdev-dpdk.h             | 106 ++++-
> >>  lib/netdev-offload-dpdk.c     | 704
> +++++++++++++++++++++++++++++++---
> >>  lib/netdev-offload-provider.h |   5 +
> >>  lib/netdev-offload-tc.c       |   8 +
> >>  lib/netdev-offload.c          |  47 ++-
> >>  lib/netdev-offload.h          |  10 +
> >>  lib/packets.h                 |   8 +-
> >>  11 files changed, 1022 insertions(+), 84 deletions(-)
> >>
> >> --
> >> 2.28.0.2311.g225365fb51
> >>
> > Acked-by: Sriharsha Basavapatna
> <sriharsha.basavapatna@broadcom.com>
> 
> Thanks,
> -Harsha
> 

Tested-by: Emma Finn <emma.finn@intel.com>



> >
> >>
> > This electronic communication and the information and any files
> > transmitted with it, or attached to it, are confidential and are
> > intended solely for the use of the individual or entity to whom it is
> > addressed and may contain information that is confidential, legally
> > privileged, protected by privacy laws, or otherwise restricted from
> > disclosure to anyone else. If you are not the intended recipient or
> > the person responsible for delivering the e-mail to the intended
> > recipient, you are hereby notified that any use, copying,
> > distributing, dissemination, forwarding, printing, or copying of this
> > e-mail is strictly prohibited. If you received this e-mail in error,
> > please return the e-mail to the sender, delete it from your computer, and
> destroy any printed copy of it.
> >
> >
> 
> --
> This electronic communication and the information and any files transmitted
> with it, or attached to it, are confidential and are intended solely for the use
> of the individual or entity to whom it is addressed and may contain
> information that is confidential, legally privileged, protected by privacy laws,
> or otherwise restricted from disclosure to anyone else. If you are not the
> intended recipient or the person responsible for delivering the e-mail to the
> intended recipient, you are hereby notified that any use, copying,
> distributing, dissemination, forwarding, printing, or copying of this e-mail is
> strictly prohibited. If you received this e-mail in error, please return the e-
> mail to the sender, delete it from your computer, and destroy any printed
> copy of it.
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Kovacevic, Marko April 21, 2021, 8:20 a.m. UTC | #7
> On 4/7/2021 12:10 PM, Sriharsha Basavapatna wrote:
> >
> > On Sun, Apr 4, 2021 at 3:25 PM Eli Britstein <elibr@nvidia.com
> > <mailto:elibr@nvidia.com>> wrote:
> >
> >     VXLAN decap in OVS-DPDK configuration consists of two flows:
> >     F1: in_port(ens1f0),eth(),ipv4(),udp(),
> >     actions:tnl_pop(vxlan_sys_4789)
> >     F2: tunnel(),in_port(vxlan_sys_4789),eth(),ipv4(), actions:ens1f0_0
> >
> >     F1 is a classification flow. It has outer headers matches and it
> >     classifies the packet as a VXLAN packet, and using tnl_pop action the
> >     packet continues processing in F2.
> >     F2 is a flow that has matches on tunnel metadata as well as on the
> >     inner
> >     packet headers (as any other flow).
> >
> >     In order to fully offload VXLAN decap path, both F1 and F2 should be
> >     offloaded. As there are more than one flow in HW, it is possible that
> >     F1 is done by HW but F2 is not. Packet is received by SW, and
> >     should be
> >     processed starting from F2 as F1 was already done by HW.
> >     Rte_flows are applicable only on physical port IDs. Keeping the
> >     original
> >     physical in port on which the packet is received on enables applying
> >     vport flows (e.g. F2) on that physical port.
> >
> >     This patch-set makes use of [1] introduced in DPDK 20.11, that
> >     adds API
> >     for tunnel offloads.
> >
> >     Note that MLX5 PMD has a bug that the tnl_pop private actions must be
> >     first. In OVS it is not.
> >     Fixing this issue is scheduled for 21.05 (and stable 20.11.2).
> >     Meanwhile, tests were done with a workaround for it [2].
> >
> >     v2-v1:
> >     - Tracking original in_port, and applying vport on that physical
> >     port instead of all PFs.
> >     v3-v2:
> >     - Traversing ports using a new API instead of flow_dump.
> >     - Refactor packet state recover logic, with bug fix for error
> >     pop_header.
> >     - One ref count for netdev in non-tunnel case.
> >     - Rename variables, comments, rebase.
> >     v4-v3:
> >     - Extract orig_in_port from physdev for flow modify.
> >     - Miss handling fixes.
> >     v5-v4:
> >     - Drop refactor offload rule creation commit.
> >     - Comment about setting in_port in restore.
> >     - Refactor vports flow offload commit.
> >     v6-v5:
> >     - Fixed duplicate netdev ref bug.
> >
> >
> > Can you provide some info on this bug ?  and what changes were done to
> > fix this ?
> 
> With v5, the 2 netdevs sent to ufid_to_rte_flow_associate are always
> non-NULL (was not like this previously), and there was this line:
> 
> +    data->netdev = vport ? netdev_ref(vport) : physdev;
> 
> As the "vport" was always non-null, even for non-tunnels, it took
> another ref of it, but in disassociate, only one close was done.
> 
> With v6 it is like this (changed arguments names a bit)
> 
> +    data->physdev = netdev != physdev ? netdev_ref(physdev) : physdev;
> 
> Checking the netdevs are different, not non-NULL.
> 
> > Thanks,
> > -Harsha
> >
> >
> >     Travis:
> >     v1: https://travis-ci.org/github/elibritstein/OVS/builds/756418552
> >     v2
> >     <https://travis-ci.org/github/elibritstein/OVS/builds/756418552v2>:
> >     https://travis-ci.org/github/elibritstein/OVS/builds/758382963
> >     v3
> >     <https://travis-ci.org/github/elibritstein/OVS/builds/758382963v3>:
> >     https://travis-ci.org/github/elibritstein/OVS/builds/761089087
> >     v4
> >     <https://travis-ci.org/github/elibritstein/OVS/builds/761089087v4>:
> >     https://travis-ci.org/github/elibritstein/OVS/builds/763146966
> >     v5
> >     <https://travis-ci.org/github/elibritstein/OVS/builds/763146966v5>:
> >     https://travis-ci.org/github/elibritstein/OVS/builds/765271879
> >     v6
> >     <https://travis-ci.org/github/elibritstein/OVS/builds/765271879v6>:
> >     https://travis-ci.org/github/elibritstein/OVS/builds/765816800
> >     <https://travis-ci.org/github/elibritstein/OVS/builds/765816800>
> >
> >     GitHub Actions:
> >     v1: https://github.com/elibritstein/OVS/actions/runs/515334647
> >     <https://github.com/elibritstein/OVS/actions/runs/515334647>
> >     v2: https://github.com/elibritstein/OVS/actions/runs/554986007
> >     <https://github.com/elibritstein/OVS/actions/runs/554986007>
> >     v3: https://github.com/elibritstein/OVS/actions/runs/613226225
> >     <https://github.com/elibritstein/OVS/actions/runs/613226225>
> >     v4: https://github.com/elibritstein/OVS/actions/runs/658415274
> >     <https://github.com/elibritstein/OVS/actions/runs/658415274>
> >     v5: https://github.com/elibritstein/OVS/actions/runs/704357369
> >     <https://github.com/elibritstein/OVS/actions/runs/704357369>
> >     v6: https://github.com/elibritstein/OVS/actions/runs/716304028
> >     <https://github.com/elibritstein/OVS/actions/runs/716304028>
> >
> >     [1] https://mails.dpdk.org/archives/dev/2020-October/187314.html
> >     <https://mails.dpdk.org/archives/dev/2020-October/187314.html>
> >     [2] https://github.com/elibritstein/dpdk-stable/pull/1
> >     <https://github.com/elibritstein/dpdk-stable/pull/1>
> >
> >
> >     Eli Britstein (10):
> >       netdev-offload: Add HW miss packet state recover API
> >       netdev-dpdk: Introduce DPDK tunnel APIs
> >       netdev-offload: Introduce an API to traverse ports
> >       netdev-dpdk: Add flow_api support for netdev vxlan vports
> >       netdev-offload-dpdk: Implement HW miss packet recover for vport
> >       dpif-netdev: Add HW miss packet state recover logic
> >       netdev-offload-dpdk: Change log rate limits
> >       netdev-offload-dpdk: Support tunnel pop action
> >       netdev-offload-dpdk: Support vports flows offload
> >       netdev-dpdk-offload: Add vxlan pattern matching function
> >
> >     Ilya Maximets (2):
> >       netdev-offload: Allow offloading to netdev without ifindex.
> >       netdev-offload: Disallow offloading to unrelated tunneling vports.
> >
> >     Sriharsha Basavapatna (1):
> >       dpif-netdev: Provide orig_in_port in metadata for tunneled packets
> >
> >      Documentation/howto/dpdk.rst  |   1 +
> >      NEWS                          |   2 +
> >      lib/dpif-netdev.c             |  97 +++--
> >      lib/netdev-dpdk.c             | 118 ++++++
> >      lib/netdev-dpdk.h             | 106 ++++-
> >      lib/netdev-offload-dpdk.c     | 704
> >     +++++++++++++++++++++++++++++++---
> >      lib/netdev-offload-provider.h |   5 +
> >      lib/netdev-offload-tc.c       |   8 +
> >      lib/netdev-offload.c          |  47 ++-
> >      lib/netdev-offload.h          |  10 +
> >      lib/packets.h                 |   8 +-
> >      11 files changed, 1022 insertions(+), 84 deletions(-)
> >
> >     --
> >     2.28.0.2311.g225365fb51
> >
> >
> > This electronic communication and the information and any files
> > transmitted with it, or attached to it, are confidential and are
> > intended solely for the use of the individual or entity to whom it is
> > addressed and may contain information that is confidential, legally
> > privileged, protected by privacy laws, or otherwise restricted from
> > disclosure to anyone else. If you are not the intended recipient or
> > the person responsible for delivering the e-mail to the intended
> > recipient, you are hereby notified that any use, copying,
> > distributing, dissemination, forwarding, printing, or copying of this
> > e-mail is strictly prohibited. If you received this e-mail in error,
> > please return the e-mail to the sender, delete it from your computer,
> > and destroy any printed copy of it.
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Tested with burst and scatter traffic profiles on the 32vm setup I have and seen no performance drops.
For burst little bit of a perf gain & with scatter I seen about 1% to 1.6 % in perf gain

Tested-by: Marko Kovacevic <marko.kovacevic@intel.com>
Gaetan Rivet May 5, 2021, 4:42 p.m. UTC | #8
On Wed, Apr 21, 2021, at 10:20, Kovacevic, Marko wrote:
> > On 4/7/2021 12:10 PM, Sriharsha Basavapatna wrote:
> > >
> > > On Sun, Apr 4, 2021 at 3:25 PM Eli Britstein <elibr@nvidia.com
> > > <mailto:elibr@nvidia.com>> wrote:
> > >
> > >     VXLAN decap in OVS-DPDK configuration consists of two flows:
> > >     F1: in_port(ens1f0),eth(),ipv4(),udp(),
> > >     actions:tnl_pop(vxlan_sys_4789)
> > >     F2: tunnel(),in_port(vxlan_sys_4789),eth(),ipv4(), actions:ens1f0_0
> > >
> > >     F1 is a classification flow. It has outer headers matches and it
> > >     classifies the packet as a VXLAN packet, and using tnl_pop action the
> > >     packet continues processing in F2.
> > >     F2 is a flow that has matches on tunnel metadata as well as on the
> > >     inner
> > >     packet headers (as any other flow).
> > >
> > >     In order to fully offload VXLAN decap path, both F1 and F2 should be
> > >     offloaded. As there are more than one flow in HW, it is possible that
> > >     F1 is done by HW but F2 is not. Packet is received by SW, and
> > >     should be
> > >     processed starting from F2 as F1 was already done by HW.
> > >     Rte_flows are applicable only on physical port IDs. Keeping the
> > >     original
> > >     physical in port on which the packet is received on enables applying
> > >     vport flows (e.g. F2) on that physical port.
> > >
> > >     This patch-set makes use of [1] introduced in DPDK 20.11, that
> > >     adds API
> > >     for tunnel offloads.
> > >
> > >     Note that MLX5 PMD has a bug that the tnl_pop private actions must be
> > >     first. In OVS it is not.
> > >     Fixing this issue is scheduled for 21.05 (and stable 20.11.2).
> > >     Meanwhile, tests were done with a workaround for it [2].
> > >
> > >     v2-v1:
> > >     - Tracking original in_port, and applying vport on that physical
> > >     port instead of all PFs.
> > >     v3-v2:
> > >     - Traversing ports using a new API instead of flow_dump.
> > >     - Refactor packet state recover logic, with bug fix for error
> > >     pop_header.
> > >     - One ref count for netdev in non-tunnel case.
> > >     - Rename variables, comments, rebase.
> > >     v4-v3:
> > >     - Extract orig_in_port from physdev for flow modify.
> > >     - Miss handling fixes.
> > >     v5-v4:
> > >     - Drop refactor offload rule creation commit.
> > >     - Comment about setting in_port in restore.
> > >     - Refactor vports flow offload commit.
> > >     v6-v5:
> > >     - Fixed duplicate netdev ref bug.
> > >
> > >
> > > Can you provide some info on this bug ?  and what changes were done to
> > > fix this ?
> > 
> > With v5, the 2 netdevs sent to ufid_to_rte_flow_associate are always
> > non-NULL (was not like this previously), and there was this line:
> > 
> > +    data->netdev = vport ? netdev_ref(vport) : physdev;
> > 
> > As the "vport" was always non-null, even for non-tunnels, it took
> > another ref of it, but in disassociate, only one close was done.
> > 
> > With v6 it is like this (changed arguments names a bit)
> > 
> > +    data->physdev = netdev != physdev ? netdev_ref(physdev) : physdev;
> > 
> > Checking the netdevs are different, not non-NULL.
> > 
> > > Thanks,
> > > -Harsha
> > >
> > >
> > >     Travis:
> > >     v1: https://travis-ci.org/github/elibritstein/OVS/builds/756418552
> > >     v2
> > >     <https://travis-ci.org/github/elibritstein/OVS/builds/756418552v2>:
> > >     https://travis-ci.org/github/elibritstein/OVS/builds/758382963
> > >     v3
> > >     <https://travis-ci.org/github/elibritstein/OVS/builds/758382963v3>:
> > >     https://travis-ci.org/github/elibritstein/OVS/builds/761089087
> > >     v4
> > >     <https://travis-ci.org/github/elibritstein/OVS/builds/761089087v4>:
> > >     https://travis-ci.org/github/elibritstein/OVS/builds/763146966
> > >     v5
> > >     <https://travis-ci.org/github/elibritstein/OVS/builds/763146966v5>:
> > >     https://travis-ci.org/github/elibritstein/OVS/builds/765271879
> > >     v6
> > >     <https://travis-ci.org/github/elibritstein/OVS/builds/765271879v6>:
> > >     https://travis-ci.org/github/elibritstein/OVS/builds/765816800
> > >     <https://travis-ci.org/github/elibritstein/OVS/builds/765816800>
> > >
> > >     GitHub Actions:
> > >     v1: https://github.com/elibritstein/OVS/actions/runs/515334647
> > >     <https://github.com/elibritstein/OVS/actions/runs/515334647>
> > >     v2: https://github.com/elibritstein/OVS/actions/runs/554986007
> > >     <https://github.com/elibritstein/OVS/actions/runs/554986007>
> > >     v3: https://github.com/elibritstein/OVS/actions/runs/613226225
> > >     <https://github.com/elibritstein/OVS/actions/runs/613226225>
> > >     v4: https://github.com/elibritstein/OVS/actions/runs/658415274
> > >     <https://github.com/elibritstein/OVS/actions/runs/658415274>
> > >     v5: https://github.com/elibritstein/OVS/actions/runs/704357369
> > >     <https://github.com/elibritstein/OVS/actions/runs/704357369>
> > >     v6: https://github.com/elibritstein/OVS/actions/runs/716304028
> > >     <https://github.com/elibritstein/OVS/actions/runs/716304028>
> > >
> > >     [1] https://mails.dpdk.org/archives/dev/2020-October/187314.html
> > >     <https://mails.dpdk.org/archives/dev/2020-October/187314.html>
> > >     [2] https://github.com/elibritstein/dpdk-stable/pull/1
> > >     <https://github.com/elibritstein/dpdk-stable/pull/1>
> > >
> > >
> > >     Eli Britstein (10):
> > >       netdev-offload: Add HW miss packet state recover API
> > >       netdev-dpdk: Introduce DPDK tunnel APIs
> > >       netdev-offload: Introduce an API to traverse ports
> > >       netdev-dpdk: Add flow_api support for netdev vxlan vports
> > >       netdev-offload-dpdk: Implement HW miss packet recover for vport
> > >       dpif-netdev: Add HW miss packet state recover logic
> > >       netdev-offload-dpdk: Change log rate limits
> > >       netdev-offload-dpdk: Support tunnel pop action
> > >       netdev-offload-dpdk: Support vports flows offload
> > >       netdev-dpdk-offload: Add vxlan pattern matching function
> > >
> > >     Ilya Maximets (2):
> > >       netdev-offload: Allow offloading to netdev without ifindex.
> > >       netdev-offload: Disallow offloading to unrelated tunneling vports.
> > >
> > >     Sriharsha Basavapatna (1):
> > >       dpif-netdev: Provide orig_in_port in metadata for tunneled packets
> > >
> > >      Documentation/howto/dpdk.rst  |   1 +
> > >      NEWS                          |   2 +
> > >      lib/dpif-netdev.c             |  97 +++--
> > >      lib/netdev-dpdk.c             | 118 ++++++
> > >      lib/netdev-dpdk.h             | 106 ++++-
> > >      lib/netdev-offload-dpdk.c     | 704
> > >     +++++++++++++++++++++++++++++++---
> > >      lib/netdev-offload-provider.h |   5 +
> > >      lib/netdev-offload-tc.c       |   8 +
> > >      lib/netdev-offload.c          |  47 ++-
> > >      lib/netdev-offload.h          |  10 +
> > >      lib/packets.h                 |   8 +-
> > >      11 files changed, 1022 insertions(+), 84 deletions(-)
> > >
> > >     --
> > >     2.28.0.2311.g225365fb51
> > >
> > >
> > > This electronic communication and the information and any files
> > > transmitted with it, or attached to it, are confidential and are
> > > intended solely for the use of the individual or entity to whom it is
> > > addressed and may contain information that is confidential, legally
> > > privileged, protected by privacy laws, or otherwise restricted from
> > > disclosure to anyone else. If you are not the intended recipient or
> > > the person responsible for delivering the e-mail to the intended
> > > recipient, you are hereby notified that any use, copying,
> > > distributing, dissemination, forwarding, printing, or copying of this
> > > e-mail is strictly prohibited. If you received this e-mail in error,
> > > please return the e-mail to the sender, delete it from your computer,
> > > and destroy any printed copy of it.
> > _______________________________________________
> > dev mailing list
> > dev@openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> 
> Tested with burst and scatter traffic profiles on the 32vm setup I have 
> and seen no performance drops.
> For burst little bit of a perf gain & with scatter I seen about 1% to 
> 1.6 % in perf gain
> 
> Tested-by: Marko Kovacevic <marko.kovacevic@intel.com>
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> 

Thanks for the tests and the reviews done!

We are currently trying to develop a feature (port mirroring offload) that would
use the port traversal helper introduced by this series. It would help to know
if it needs further changes or if we can depend on it.

If anyone has remarks or otherwise thinks the change is good, please let us know.

Best regards,
Ilya Maximets June 22, 2021, 3:55 p.m. UTC | #9
On 4/4/21 11:54 AM, Eli Britstein wrote:
> VXLAN decap in OVS-DPDK configuration consists of two flows:
> F1: in_port(ens1f0),eth(),ipv4(),udp(), actions:tnl_pop(vxlan_sys_4789)
> F2: tunnel(),in_port(vxlan_sys_4789),eth(),ipv4(), actions:ens1f0_0
> 
> F1 is a classification flow. It has outer headers matches and it
> classifies the packet as a VXLAN packet, and using tnl_pop action the
> packet continues processing in F2.
> F2 is a flow that has matches on tunnel metadata as well as on the inner
> packet headers (as any other flow).
> 
> In order to fully offload VXLAN decap path, both F1 and F2 should be
> offloaded. As there are more than one flow in HW, it is possible that
> F1 is done by HW but F2 is not. Packet is received by SW, and should be
> processed starting from F2 as F1 was already done by HW.
> Rte_flows are applicable only on physical port IDs. Keeping the original
> physical in port on which the packet is received on enables applying
> vport flows (e.g. F2) on that physical port.
> 
> This patch-set makes use of [1] introduced in DPDK 20.11, that adds API
> for tunnel offloads.
> 
> Note that MLX5 PMD has a bug that the tnl_pop private actions must be
> first. In OVS it is not.
> Fixing this issue is scheduled for 21.05 (and stable 20.11.2).
> Meanwhile, tests were done with a workaround for it [2].
> 
> v2-v1:
> - Tracking original in_port, and applying vport on that physical port instead of all PFs.
> v3-v2:
> - Traversing ports using a new API instead of flow_dump.
> - Refactor packet state recover logic, with bug fix for error pop_header.
> - One ref count for netdev in non-tunnel case.
> - Rename variables, comments, rebase.
> v4-v3:
> - Extract orig_in_port from physdev for flow modify.
> - Miss handling fixes.
> v5-v4:
> - Drop refactor offload rule creation commit.
> - Comment about setting in_port in restore.
> - Refactor vports flow offload commit.
> v6-v5:
> - Fixed duplicate netdev ref bug.
> 
> Travis:
> v1: https://travis-ci.org/github/elibritstein/OVS/builds/756418552
> v2: https://travis-ci.org/github/elibritstein/OVS/builds/758382963
> v3: https://travis-ci.org/github/elibritstein/OVS/builds/761089087
> v4: https://travis-ci.org/github/elibritstein/OVS/builds/763146966
> v5: https://travis-ci.org/github/elibritstein/OVS/builds/765271879
> v6: https://travis-ci.org/github/elibritstein/OVS/builds/765816800
> 
> GitHub Actions:
> v1: https://github.com/elibritstein/OVS/actions/runs/515334647
> v2: https://github.com/elibritstein/OVS/actions/runs/554986007
> v3: https://github.com/elibritstein/OVS/actions/runs/613226225
> v4: https://github.com/elibritstein/OVS/actions/runs/658415274
> v5: https://github.com/elibritstein/OVS/actions/runs/704357369
> v6: https://github.com/elibritstein/OVS/actions/runs/716304028
> 
> [1] https://mails.dpdk.org/archives/dev/2020-October/187314.html
> [2] https://github.com/elibritstein/dpdk-stable/pull/1
> 
> 
> Eli Britstein (10):
>   netdev-offload: Add HW miss packet state recover API
>   netdev-dpdk: Introduce DPDK tunnel APIs
>   netdev-offload: Introduce an API to traverse ports
>   netdev-dpdk: Add flow_api support for netdev vxlan vports
>   netdev-offload-dpdk: Implement HW miss packet recover for vport
>   dpif-netdev: Add HW miss packet state recover logic
>   netdev-offload-dpdk: Change log rate limits
>   netdev-offload-dpdk: Support tunnel pop action
>   netdev-offload-dpdk: Support vports flows offload
>   netdev-dpdk-offload: Add vxlan pattern matching function
> 
> Ilya Maximets (2):
>   netdev-offload: Allow offloading to netdev without ifindex.
>   netdev-offload: Disallow offloading to unrelated tunneling vports.
> 
> Sriharsha Basavapatna (1):
>   dpif-netdev: Provide orig_in_port in metadata for tunneled packets
> 
>  Documentation/howto/dpdk.rst  |   1 +
>  NEWS                          |   2 +
>  lib/dpif-netdev.c             |  97 +++--
>  lib/netdev-dpdk.c             | 118 ++++++
>  lib/netdev-dpdk.h             | 106 ++++-
>  lib/netdev-offload-dpdk.c     | 704 +++++++++++++++++++++++++++++++---
>  lib/netdev-offload-provider.h |   5 +
>  lib/netdev-offload-tc.c       |   8 +
>  lib/netdev-offload.c          |  47 ++-
>  lib/netdev-offload.h          |  10 +
>  lib/packets.h                 |   8 +-
>  11 files changed, 1022 insertions(+), 84 deletions(-)
> 

Hi.  I reviewed the whole series and it looks mostly OK to me.
I made a several changes along the way and below you may find a diff
of what I changed.  In short:

 - Some style fixes: style of function prototypes and a way how long
   function calls wrapped.  Added names to uint32_t arguments to
   function prototypes.

 - Clarified hw_miss_packet_recover() API.  It needs a note that it
   takes ownership of a packet on error.  Fixed 'return -1' which
   doesn't comply with the API definition (should return positive
   errno).

 - Moved NEWS updates to correct place.

 - Added a packet drop counter.

 - Refactored changes in dfc_processing().  Basically, restored it
   to look mostly like it was before, because LIKELY/UNLIKELY markers
   make no sense for non-offloading cases where they both should be
   'unlikely'.  Also, old way of writing this part allows following
   optimization:
     https://patchwork.ozlabs.org/project/openvswitch/patch/c0b99b4b9be6c290a6aa3cb00049fac4ebfac5d8.1618390390.git.bnemeth@redhat.com/

 - And the only functional change that I made is that I guarded
   actual offloading support with 'ifdef DPDK_EXPERIMENTAL_API',
   because they doesn't make sense to me if packet restoration
   API is not available.  I also added this information to the
   NEWS file.  I think that we should not try to offload TUNNEL_POP
   if we can't restore the tunnel metadata.  And if we can't have
   the first flow in HW, there is no point adding the second one.

Complete branch with ready-to-push patches available here:
  https://github.com/igsilya/ovs/tree/tmp-vxlan-decap

Diff below is a diff with this v6 patch-set.  Please, take a look/test.
I'll wait for Ack on changes below or comments, if I broke something,
before pushing this to the main repo.

Best regards, Ilya Maximets.

The diff:

diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst
index 4918d80f3..36314c06a 100644
--- a/Documentation/howto/dpdk.rst
+++ b/Documentation/howto/dpdk.rst
@@ -398,7 +398,7 @@ Supported actions for hardware offload are:
 - VLAN Push/Pop (push_vlan/pop_vlan).
 - Modification of IPv6 (set_field:<ADDR>->ipv6_src/ipv6_dst/mod_nw_ttl).
 - Clone/output (tnl_push and output) for encapsulating over a tunnel.
-- Tunnel pop, for changing from a physical port to a vport.
+- Tunnel pop, for packets received on physical ports.
 
 Further Reading
 ---------------
diff --git a/NEWS b/NEWS
index e49f2955d..10b3ab053 100644
--- a/NEWS
+++ b/NEWS
@@ -17,6 +17,10 @@ Post-v2.15.0
      * OVS validated with DPDK 20.11.1. It is recommended to use this version
        until further releases.
      * New debug appctl command 'dpdk/get-malloc-stats'.
+     * Add hardware offload support for tunnel pop action (experimental).
+       Available only if DPDK experimantal APIs enabled during the build.
+     * Add hardware offload support for VXLAN flows (experimental).
+       Available only if DPDK experimantal APIs enabled during the build.
    - ovsdb-tool:
      * New option '--election-timer' to the 'create-cluster' command to set the
        leader election timer during cluster creation.
@@ -44,8 +48,6 @@ v2.15.0 - 15 Feb 2021
    - DPDK:
      * Removed support for vhost-user dequeue zero-copy.
      * Add support for DPDK 20.11.
-     * Add hardware offload support for tunnel pop action (experimental).
-     * Add hardware offload support for VXLAN flows (experimental).
    - Userspace datapath:
      * Add the 'pmd' option to "ovs-appctl dpctl/dump-flows", which
        restricts a flow dump to a single PMD thread if set.
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 1bd828f82..8766a00ea 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -114,6 +114,7 @@ COVERAGE_DEFINE(datapath_drop_invalid_port);
 COVERAGE_DEFINE(datapath_drop_invalid_bond);
 COVERAGE_DEFINE(datapath_drop_invalid_tnl_port);
 COVERAGE_DEFINE(datapath_drop_rx_invalid_packet);
+COVERAGE_DEFINE(datapath_drop_hw_miss_recover);
 
 /* Protects against changes to 'dp_netdevs'. */
 static struct ovs_mutex dp_netdev_mutex = OVS_MUTEX_INITIALIZER;
@@ -7068,9 +7069,8 @@ smc_lookup_batch(struct dp_netdev_pmd_thread *pmd,
     pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_SMC_HIT, n_smc_hit);
 }
 
-static struct tx_port *
-pmd_send_port_cache_lookup(const struct dp_netdev_pmd_thread *pmd,
-                           odp_port_t port_no);
+static struct tx_port * pmd_send_port_cache_lookup(
+    const struct dp_netdev_pmd_thread *pmd, odp_port_t port_no);
 
 static inline int
 dp_netdev_hw_flow(const struct dp_netdev_pmd_thread *pmd,
@@ -7081,17 +7081,13 @@ dp_netdev_hw_flow(const struct dp_netdev_pmd_thread *pmd,
     struct tx_port *p;
     uint32_t mark;
 
-    if (!netdev_is_flow_api_enabled() || *recirc_depth_get() != 0) {
-        *flow = NULL;
-        return 0;
-    }
-
     /* Restore the packet if HW processing was terminated before completion. */
     p = pmd_send_port_cache_lookup(pmd, port_no);
     if (OVS_LIKELY(p)) {
         int err = netdev_hw_miss_packet_recover(p->port->netdev, packet);
 
-        if (err != 0 && err != EOPNOTSUPP) {
+        if (err && err != EOPNOTSUPP) {
+            COVERAGE_INC(datapath_drop_hw_miss_recover);
             return -1;
         }
     }
@@ -7168,25 +7164,28 @@ dfc_processing(struct dp_netdev_pmd_thread *pmd,
             pkt_metadata_init(&packet->md, port_no);
         }
 
-        if (OVS_UNLIKELY(dp_netdev_hw_flow(pmd, port_no, packet, &flow))) {
-            /* Packet restoration failed. Its mbuf was freed, do not continue
-             * processing.
-             */
-            continue;
-        } else if (OVS_LIKELY(flow)) {
-            tcp_flags = parse_tcp_flags(packet);
-            if (OVS_LIKELY(batch_enable)) {
-                dp_netdev_queue_batches(packet, flow, tcp_flags, batches,
-                                        n_batches);
-            } else {
-                /* Flow batching should be performed only after fast-path
-                 * processing is also completed for packets with emc miss
-                 * or else it will result in reordering of packets with
-                 * same datapath flows. */
-                packet_enqueue_to_flow_map(packet, flow, tcp_flags, flow_map,
-                                           map_cnt++);
+        if (netdev_is_flow_api_enabled() && *recirc_depth_get() == 0) {
+            if (OVS_UNLIKELY(dp_netdev_hw_flow(pmd, port_no, packet, &flow))) {
+                /* Packet restoration failed and it was dropped, do not
+                 * continue processing.
+                 */
+                continue;
+            }
+            if (OVS_LIKELY(flow)) {
+                tcp_flags = parse_tcp_flags(packet);
+                if (OVS_LIKELY(batch_enable)) {
+                    dp_netdev_queue_batches(packet, flow, tcp_flags, batches,
+                                            n_batches);
+                } else {
+                    /* Flow batching should be performed only after fast-path
+                     * processing is also completed for packets with emc miss
+                     * or else it will result in reordering of packets with
+                     * same datapath flows. */
+                    packet_enqueue_to_flow_map(packet, flow, tcp_flags,
+                                               flow_map, map_cnt++);
+                }
+                continue;
             }
-            continue;
         }
 
         miniflow_extract(packet, &key->mf);
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 6e35d0574..bc5485d60 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -5365,11 +5365,11 @@ netdev_dpdk_rte_flow_get_restore_info(struct netdev *netdev,
 }
 
 int
-netdev_dpdk_rte_flow_tunnel_action_decap_release
-    (struct netdev *netdev,
-     struct rte_flow_action *actions,
-     uint32_t num_of_actions,
-     struct rte_flow_error *error)
+netdev_dpdk_rte_flow_tunnel_action_decap_release(
+    struct netdev *netdev,
+    struct rte_flow_action *actions,
+    uint32_t num_of_actions,
+    struct rte_flow_error *error)
 {
     struct netdev_dpdk *dev;
     int ret;
diff --git a/lib/netdev-dpdk.h b/lib/netdev-dpdk.h
index 3b9bf8681..7b77ed8e0 100644
--- a/lib/netdev-dpdk.h
+++ b/lib/netdev-dpdk.h
@@ -53,33 +53,28 @@ netdev_dpdk_get_port_id(struct netdev *netdev);
 
 #ifdef ALLOW_EXPERIMENTAL_API
 
-int
-netdev_dpdk_rte_flow_tunnel_decap_set(struct netdev *,
+int netdev_dpdk_rte_flow_tunnel_decap_set(struct netdev *,
+                                          struct rte_flow_tunnel *,
+                                          struct rte_flow_action **,
+                                          uint32_t *num_of_actions,
+                                          struct rte_flow_error *);
+int netdev_dpdk_rte_flow_tunnel_match(struct netdev *,
                                       struct rte_flow_tunnel *,
-                                      struct rte_flow_action **,
-                                      uint32_t *,
-                                      struct rte_flow_error *);
-int
-netdev_dpdk_rte_flow_tunnel_match(struct netdev *,
-                                  struct rte_flow_tunnel *,
-                                  struct rte_flow_item **,
-                                  uint32_t *,
-                                  struct rte_flow_error *);
-int
-netdev_dpdk_rte_flow_get_restore_info(struct netdev *,
-                                      struct dp_packet *,
-                                      struct rte_flow_restore_info *,
+                                      struct rte_flow_item **,
+                                      uint32_t *num_of_items,
                                       struct rte_flow_error *);
-int
-netdev_dpdk_rte_flow_tunnel_action_decap_release(struct netdev *,
-                                                 struct rte_flow_action *,
-                                                 uint32_t,
-                                                 struct rte_flow_error *);
-int
-netdev_dpdk_rte_flow_tunnel_item_release(struct netdev *,
-                                         struct rte_flow_item *,
-                                         uint32_t,
-                                         struct rte_flow_error *);
+int netdev_dpdk_rte_flow_get_restore_info(struct netdev *,
+                                          struct dp_packet *,
+                                          struct rte_flow_restore_info *,
+                                          struct rte_flow_error *);
+int netdev_dpdk_rte_flow_tunnel_action_decap_release(struct netdev *,
+                                                     struct rte_flow_action *,
+                                                     uint32_t num_of_actions,
+                                                     struct rte_flow_error *);
+int netdev_dpdk_rte_flow_tunnel_item_release(struct netdev *,
+                                             struct rte_flow_item *,
+                                             uint32_t num_of_items,
+                                             struct rte_flow_error *);
 
 #else
 
@@ -92,14 +87,13 @@ set_error(struct rte_flow_error *error, enum rte_flow_error_type type)
 }
 
 static inline int
-netdev_dpdk_rte_flow_tunnel_decap_set(struct netdev *netdev OVS_UNUSED,
-                                      struct rte_flow_tunnel *tunnel,
-                                      struct rte_flow_action **actions,
-                                      uint32_t *num_of_actions OVS_UNUSED,
-                                      struct rte_flow_error *error)
+netdev_dpdk_rte_flow_tunnel_decap_set(
+    struct netdev *netdev OVS_UNUSED,
+    struct rte_flow_tunnel *tunnel OVS_UNUSED,
+    struct rte_flow_action **actions OVS_UNUSED,
+    uint32_t *num_of_actions OVS_UNUSED,
+    struct rte_flow_error *error)
 {
-    (void) tunnel;
-    (void) actions;
     set_error(error, RTE_FLOW_ERROR_TYPE_ACTION);
     return -1;
 }
@@ -116,34 +110,34 @@ netdev_dpdk_rte_flow_tunnel_match(struct netdev *netdev OVS_UNUSED,
 }
 
 static inline int
-netdev_dpdk_rte_flow_get_restore_info(struct netdev *netdev OVS_UNUSED,
-                                      struct dp_packet *p OVS_UNUSED,
-                                      struct rte_flow_restore_info *info,
-                                      struct rte_flow_error *error)
+netdev_dpdk_rte_flow_get_restore_info(
+    struct netdev *netdev OVS_UNUSED,
+    struct dp_packet *p OVS_UNUSED,
+    struct rte_flow_restore_info *info OVS_UNUSED,
+    struct rte_flow_error *error)
 {
-    (void) info;
     set_error(error, RTE_FLOW_ERROR_TYPE_ATTR);
     return -1;
 }
 
 static inline int
-netdev_dpdk_rte_flow_tunnel_action_decap_release
-    (struct netdev *netdev OVS_UNUSED,
-     struct rte_flow_action *actions OVS_UNUSED,
-     uint32_t num_of_actions OVS_UNUSED,
-     struct rte_flow_error *error)
+netdev_dpdk_rte_flow_tunnel_action_decap_release(
+    struct netdev *netdev OVS_UNUSED,
+    struct rte_flow_action *actions OVS_UNUSED,
+    uint32_t num_of_actions OVS_UNUSED,
+    struct rte_flow_error *error)
 {
     set_error(error, RTE_FLOW_ERROR_TYPE_NONE);
     return 0;
 }
 
 static inline int
-netdev_dpdk_rte_flow_tunnel_item_release(struct netdev *netdev OVS_UNUSED,
-                                         struct rte_flow_item *items,
-                                         uint32_t num_of_items OVS_UNUSED,
-                                         struct rte_flow_error *error)
+netdev_dpdk_rte_flow_tunnel_item_release(
+    struct netdev *netdev OVS_UNUSED,
+    struct rte_flow_item *items OVS_UNUSED,
+    uint32_t num_of_items OVS_UNUSED,
+    struct rte_flow_error *error)
 {
-    (void) items;
     set_error(error, RTE_FLOW_ERROR_TYPE_NONE);
     return 0;
 }
diff --git a/lib/netdev-offload-dpdk.c b/lib/netdev-offload-dpdk.c
index 52a74a707..363f32f71 100644
--- a/lib/netdev-offload-dpdk.c
+++ b/lib/netdev-offload-dpdk.c
@@ -459,7 +459,7 @@ dump_flow_action(struct ds *s, struct ds *s_extra,
                act_index >= flow_actions->tnl_pmd_actions_pos &&
                act_index < flow_actions->tnl_pmd_actions_pos +
                            flow_actions->tnl_pmd_actions_cnt) {
-        /* Opaque PMD tunnel actions is skipped. */
+        /* Opaque PMD tunnel actions are skipped. */
         return;
     } else if (actions->type == RTE_FLOW_ACTION_TYPE_MARK) {
         const struct rte_flow_action_mark *mark = actions->conf;
@@ -792,9 +792,9 @@ free_flow_actions(struct flow_actions *actions)
     for (i = 0; i < actions->cnt; i++) {
         if (actions->tnl_pmd_actions_cnt &&
             i == actions->tnl_pmd_actions_pos) {
-            if (netdev_dpdk_rte_flow_tunnel_action_decap_release
-                    (actions->tnl_netdev, actions->tnl_pmd_actions,
-                     actions->tnl_pmd_actions_cnt, &error)) {
+            if (netdev_dpdk_rte_flow_tunnel_action_decap_release(
+                    actions->tnl_netdev, actions->tnl_pmd_actions,
+                    actions->tnl_pmd_actions_cnt, &error)) {
                 VLOG_DBG_RL(&rl, "%s: "
                             "netdev_dpdk_rte_flow_tunnel_action_decap_release "
                             "failed: %d (%s).",
@@ -1009,7 +1009,7 @@ parse_vxlan_match(struct flow_patterns *patterns,
     return 0;
 }
 
-static int
+static int OVS_UNUSED
 parse_flow_tnl_match(struct netdev *tnldev,
                      struct flow_patterns *patterns,
                      odp_port_t orig_in_port,
@@ -1031,7 +1031,7 @@ parse_flow_tnl_match(struct netdev *tnldev,
 
 static int
 parse_flow_match(struct netdev *netdev,
-                 odp_port_t orig_in_port,
+                 odp_port_t orig_in_port OVS_UNUSED,
                  struct flow_patterns *patterns,
                  struct match *match)
 {
@@ -1045,10 +1045,12 @@ parse_flow_match(struct netdev *netdev,
     }
 
     patterns->physdev = netdev;
+#ifdef ALLOW_EXPERIMENTAL_API /* Packet restoration API required. */
     if (netdev_vport_is_vport_class(netdev->netdev_class) &&
         parse_flow_tnl_match(netdev, patterns, orig_in_port, match)) {
         return -1;
     }
+#endif
     memset(&consumed_masks->in_port, 0, sizeof consumed_masks->in_port);
     /* recirc id must be zero. */
     if (match->wc.masks.recirc_id & match->flow.recirc_id) {
@@ -1679,7 +1681,7 @@ add_jump_action(struct flow_actions *actions, uint32_t group)
     add_flow_action(actions, RTE_FLOW_ACTION_TYPE_JUMP, jump);
 }
 
-static int
+static int OVS_UNUSED
 add_tnl_pop_action(struct netdev *netdev,
                    struct flow_actions *actions,
                    const struct nlattr *nla)
@@ -1767,10 +1769,12 @@ parse_flow_actions(struct netdev *netdev,
                                     clone_actions_len)) {
                 return -1;
             }
+#ifdef ALLOW_EXPERIMENTAL_API /* Packet restoration API required. */
         } else if (nl_attr_type(nla) == OVS_ACTION_ATTR_TUNNEL_POP) {
             if (add_tnl_pop_action(netdev, actions, nla)) {
                 return -1;
             }
+#endif
         } else {
             VLOG_DBG_RL(&rl, "Unsupported action type %d", nl_attr_type(nla));
             return -1;
@@ -2067,7 +2071,7 @@ get_vxlan_netdev_cb(struct netdev *netdev,
     struct get_vport_netdev_aux *aux = aux_;
 
     if (strcmp(netdev_get_type(netdev), "vxlan")) {
-       return false;
+        return false;
     }
 
     tnl_cfg = netdev_get_tunnel_config(netdev);
@@ -2121,18 +2125,16 @@ netdev_offload_dpdk_hw_miss_packet_recover(struct netdev *netdev,
     struct rte_flow_restore_info rte_restore_info;
     struct rte_flow_tunnel *rte_tnl;
     struct netdev *vport_netdev;
-    struct rte_flow_error error;
     struct pkt_metadata *md;
     struct flow_tnl *md_tnl;
     odp_port_t vport_odp;
     int ret = 0;
 
     if (netdev_dpdk_rte_flow_get_restore_info(netdev, packet,
-                                              &rte_restore_info, &error)) {
+                                              &rte_restore_info, NULL)) {
         /* This function is called for every packet, and in most cases there
          * will be no restore info from the HW, thus error is expected.
          */
-        (void) error;
         return 0;
     }
 
@@ -2144,25 +2146,25 @@ netdev_offload_dpdk_hw_miss_packet_recover(struct netdev *netdev,
     vport_netdev = get_vport_netdev(netdev->dpif_type, rte_tnl,
                                     &vport_odp);
     if (!vport_netdev) {
-        VLOG_WARN("Could not find vport netdev");
+        VLOG_WARN_RL(&rl, "Could not find vport netdev");
         return EOPNOTSUPP;
     }
 
     md = &packet->md;
     /* For tunnel recovery (RTE_FLOW_RESTORE_INFO_TUNNEL), it is possible
-     * to have the packet to still be encapsulated, or not
-     * (RTE_FLOW_RESTORE_INFO_ENCAPSULATED).
+     * to have the packet to still be encapsulated, or not.  This is reflected
+     * by the RTE_FLOW_RESTORE_INFO_ENCAPSULATED flag.
      * In the case it is on, the packet is still encapsulated, and we do
      * the pop in SW.
      * In the case it is off, the packet is already decapsulated by HW, and
-     * the tunnel info is provided in the tunnel struct. For this case we
+     * the tunnel info is provided in the tunnel struct.  For this case we
      * take it to OVS metadata.
      */
     if (rte_restore_info.flags & RTE_FLOW_RESTORE_INFO_ENCAPSULATED) {
         if (!vport_netdev->netdev_class ||
             !vport_netdev->netdev_class->pop_header) {
-            VLOG_ERR("vport nedtdev=%s with no pop_header method",
-                     netdev_get_name(vport_netdev));
+            VLOG_ERR_RL(&rl, "vport nedtdev=%s with no pop_header method",
+                        netdev_get_name(vport_netdev));
             ret = EOPNOTSUPP;
             goto close_vport_netdev;
         }
@@ -2171,7 +2173,7 @@ netdev_offload_dpdk_hw_miss_packet_recover(struct netdev *netdev,
             /* If there is an error with popping the header, the packet is
              * freed. In this case it should not continue SW processing.
              */
-            ret = -1;
+            ret = EINVAL;
             goto close_vport_netdev;
         }
     } else {
diff --git a/lib/netdev-offload-provider.h b/lib/netdev-offload-provider.h
index f24c7dd19..348ca7081 100644
--- a/lib/netdev-offload-provider.h
+++ b/lib/netdev-offload-provider.h
@@ -89,7 +89,8 @@ struct netdev_flow_api {
 
     /* Recover the packet state (contents and data) for continued processing
      * in software.
-     * Return 0 if successful, otherwise returns a positive errno value. */
+     * Return 0 if successful, otherwise returns a positive errno value and
+     * takes ownership of a packet if errno != EOPNOTSUPP. */
     int (*hw_miss_packet_recover)(struct netdev *, struct dp_packet *);
 
     /* Initializies the netdev flow api.
---
Ferriter, Cian June 23, 2021, 12:37 p.m. UTC | #10
Hi all,

As part of rebasing our AVX512 DPIF on this patchset, I tested this patchset with partial HWOL and I'm seeing strange behaviour.

I'll report back more detailed findings soon, just wanted to mention this here as soon as I found the issue.

Thanks,
Cian

> -----Original Message-----
> From: dev <ovs-dev-bounces@openvswitch.org> On Behalf Of Ilya Maximets
> Sent: Tuesday 22 June 2021 16:55
> To: Eli Britstein <elibr@nvidia.com>; dev@openvswitch.org; Ilya Maximets <i.maximets@ovn.org>
> Cc: Ivan.Malov@oktetlabs.ru; Ameer Mahagneh <ameerm@nvidia.com>; Majd Dibbiny <majd@nvidia.com>
> Subject: Re: [ovs-dev] [PATCH V6 00/13] Netdev vxlan-decap offload
> 
> On 4/4/21 11:54 AM, Eli Britstein wrote:
> > VXLAN decap in OVS-DPDK configuration consists of two flows:
> > F1: in_port(ens1f0),eth(),ipv4(),udp(), actions:tnl_pop(vxlan_sys_4789)
> > F2: tunnel(),in_port(vxlan_sys_4789),eth(),ipv4(), actions:ens1f0_0
> >
> > F1 is a classification flow. It has outer headers matches and it
> > classifies the packet as a VXLAN packet, and using tnl_pop action the
> > packet continues processing in F2.
> > F2 is a flow that has matches on tunnel metadata as well as on the inner
> > packet headers (as any other flow).
> >
> > In order to fully offload VXLAN decap path, both F1 and F2 should be
> > offloaded. As there are more than one flow in HW, it is possible that
> > F1 is done by HW but F2 is not. Packet is received by SW, and should be
> > processed starting from F2 as F1 was already done by HW.
> > Rte_flows are applicable only on physical port IDs. Keeping the original
> > physical in port on which the packet is received on enables applying
> > vport flows (e.g. F2) on that physical port.
> >
> > This patch-set makes use of [1] introduced in DPDK 20.11, that adds API
> > for tunnel offloads.
> >
> > Note that MLX5 PMD has a bug that the tnl_pop private actions must be
> > first. In OVS it is not.
> > Fixing this issue is scheduled for 21.05 (and stable 20.11.2).
> > Meanwhile, tests were done with a workaround for it [2].
> >
> > v2-v1:
> > - Tracking original in_port, and applying vport on that physical port instead of all PFs.
> > v3-v2:
> > - Traversing ports using a new API instead of flow_dump.
> > - Refactor packet state recover logic, with bug fix for error pop_header.
> > - One ref count for netdev in non-tunnel case.
> > - Rename variables, comments, rebase.
> > v4-v3:
> > - Extract orig_in_port from physdev for flow modify.
> > - Miss handling fixes.
> > v5-v4:
> > - Drop refactor offload rule creation commit.
> > - Comment about setting in_port in restore.
> > - Refactor vports flow offload commit.
> > v6-v5:
> > - Fixed duplicate netdev ref bug.
> >
> > Travis:
> > v1: https://travis-ci.org/github/elibritstein/OVS/builds/756418552
> > v2: https://travis-ci.org/github/elibritstein/OVS/builds/758382963
> > v3: https://travis-ci.org/github/elibritstein/OVS/builds/761089087
> > v4: https://travis-ci.org/github/elibritstein/OVS/builds/763146966
> > v5: https://travis-ci.org/github/elibritstein/OVS/builds/765271879
> > v6: https://travis-ci.org/github/elibritstein/OVS/builds/765816800
> >
> > GitHub Actions:
> > v1: https://github.com/elibritstein/OVS/actions/runs/515334647
> > v2: https://github.com/elibritstein/OVS/actions/runs/554986007
> > v3: https://github.com/elibritstein/OVS/actions/runs/613226225
> > v4: https://github.com/elibritstein/OVS/actions/runs/658415274
> > v5: https://github.com/elibritstein/OVS/actions/runs/704357369
> > v6: https://github.com/elibritstein/OVS/actions/runs/716304028
> >
> > [1] https://mails.dpdk.org/archives/dev/2020-October/187314.html
> > [2] https://github.com/elibritstein/dpdk-stable/pull/1
> >
> >
> > Eli Britstein (10):
> >   netdev-offload: Add HW miss packet state recover API
> >   netdev-dpdk: Introduce DPDK tunnel APIs
> >   netdev-offload: Introduce an API to traverse ports
> >   netdev-dpdk: Add flow_api support for netdev vxlan vports
> >   netdev-offload-dpdk: Implement HW miss packet recover for vport
> >   dpif-netdev: Add HW miss packet state recover logic
> >   netdev-offload-dpdk: Change log rate limits
> >   netdev-offload-dpdk: Support tunnel pop action
> >   netdev-offload-dpdk: Support vports flows offload
> >   netdev-dpdk-offload: Add vxlan pattern matching function
> >
> > Ilya Maximets (2):
> >   netdev-offload: Allow offloading to netdev without ifindex.
> >   netdev-offload: Disallow offloading to unrelated tunneling vports.
> >
> > Sriharsha Basavapatna (1):
> >   dpif-netdev: Provide orig_in_port in metadata for tunneled packets
> >
> >  Documentation/howto/dpdk.rst  |   1 +
> >  NEWS                          |   2 +
> >  lib/dpif-netdev.c             |  97 +++--
> >  lib/netdev-dpdk.c             | 118 ++++++
> >  lib/netdev-dpdk.h             | 106 ++++-
> >  lib/netdev-offload-dpdk.c     | 704 +++++++++++++++++++++++++++++++---
> >  lib/netdev-offload-provider.h |   5 +
> >  lib/netdev-offload-tc.c       |   8 +
> >  lib/netdev-offload.c          |  47 ++-
> >  lib/netdev-offload.h          |  10 +
> >  lib/packets.h                 |   8 +-
> >  11 files changed, 1022 insertions(+), 84 deletions(-)
> >
> 
> Hi.  I reviewed the whole series and it looks mostly OK to me.
> I made a several changes along the way and below you may find a diff
> of what I changed.  In short:
> 
>  - Some style fixes: style of function prototypes and a way how long
>    function calls wrapped.  Added names to uint32_t arguments to
>    function prototypes.
> 
>  - Clarified hw_miss_packet_recover() API.  It needs a note that it
>    takes ownership of a packet on error.  Fixed 'return -1' which
>    doesn't comply with the API definition (should return positive
>    errno).
> 
>  - Moved NEWS updates to correct place.
> 
>  - Added a packet drop counter.
> 
>  - Refactored changes in dfc_processing().  Basically, restored it
>    to look mostly like it was before, because LIKELY/UNLIKELY markers
>    make no sense for non-offloading cases where they both should be
>    'unlikely'.  Also, old way of writing this part allows following
>    optimization:
> 
> https://patchwork.ozlabs.org/project/openvswitch/patch/c0b99b4b9be6c290a6aa3cb00049fac4ebfac5d8.161839
> 0390.git.bnemeth@redhat.com/
> 
>  - And the only functional change that I made is that I guarded
>    actual offloading support with 'ifdef DPDK_EXPERIMENTAL_API',
>    because they doesn't make sense to me if packet restoration
>    API is not available.  I also added this information to the
>    NEWS file.  I think that we should not try to offload TUNNEL_POP
>    if we can't restore the tunnel metadata.  And if we can't have
>    the first flow in HW, there is no point adding the second one.
> 
> Complete branch with ready-to-push patches available here:
>   https://github.com/igsilya/ovs/tree/tmp-vxlan-decap
> 
> Diff below is a diff with this v6 patch-set.  Please, take a look/test.
> I'll wait for Ack on changes below or comments, if I broke something,
> before pushing this to the main repo.
> 
> Best regards, Ilya Maximets.
> 
> The diff:
> 
> diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst
> index 4918d80f3..36314c06a 100644
> --- a/Documentation/howto/dpdk.rst
> +++ b/Documentation/howto/dpdk.rst
> @@ -398,7 +398,7 @@ Supported actions for hardware offload are:
>  - VLAN Push/Pop (push_vlan/pop_vlan).
>  - Modification of IPv6 (set_field:<ADDR>->ipv6_src/ipv6_dst/mod_nw_ttl).
>  - Clone/output (tnl_push and output) for encapsulating over a tunnel.
> -- Tunnel pop, for changing from a physical port to a vport.
> +- Tunnel pop, for packets received on physical ports.
> 
>  Further Reading
>  ---------------
> diff --git a/NEWS b/NEWS
> index e49f2955d..10b3ab053 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -17,6 +17,10 @@ Post-v2.15.0
>       * OVS validated with DPDK 20.11.1. It is recommended to use this version
>         until further releases.
>       * New debug appctl command 'dpdk/get-malloc-stats'.
> +     * Add hardware offload support for tunnel pop action (experimental).
> +       Available only if DPDK experimantal APIs enabled during the build.
> +     * Add hardware offload support for VXLAN flows (experimental).
> +       Available only if DPDK experimantal APIs enabled during the build.
>     - ovsdb-tool:
>       * New option '--election-timer' to the 'create-cluster' command to set the
>         leader election timer during cluster creation.
> @@ -44,8 +48,6 @@ v2.15.0 - 15 Feb 2021
>     - DPDK:
>       * Removed support for vhost-user dequeue zero-copy.
>       * Add support for DPDK 20.11.
> -     * Add hardware offload support for tunnel pop action (experimental).
> -     * Add hardware offload support for VXLAN flows (experimental).
>     - Userspace datapath:
>       * Add the 'pmd' option to "ovs-appctl dpctl/dump-flows", which
>         restricts a flow dump to a single PMD thread if set.
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index 1bd828f82..8766a00ea 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -114,6 +114,7 @@ COVERAGE_DEFINE(datapath_drop_invalid_port);
>  COVERAGE_DEFINE(datapath_drop_invalid_bond);
>  COVERAGE_DEFINE(datapath_drop_invalid_tnl_port);
>  COVERAGE_DEFINE(datapath_drop_rx_invalid_packet);
> +COVERAGE_DEFINE(datapath_drop_hw_miss_recover);
> 
>  /* Protects against changes to 'dp_netdevs'. */
>  static struct ovs_mutex dp_netdev_mutex = OVS_MUTEX_INITIALIZER;
> @@ -7068,9 +7069,8 @@ smc_lookup_batch(struct dp_netdev_pmd_thread *pmd,
>      pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_SMC_HIT, n_smc_hit);
>  }
> 
> -static struct tx_port *
> -pmd_send_port_cache_lookup(const struct dp_netdev_pmd_thread *pmd,
> -                           odp_port_t port_no);
> +static struct tx_port * pmd_send_port_cache_lookup(
> +    const struct dp_netdev_pmd_thread *pmd, odp_port_t port_no);
> 
>  static inline int
>  dp_netdev_hw_flow(const struct dp_netdev_pmd_thread *pmd,
> @@ -7081,17 +7081,13 @@ dp_netdev_hw_flow(const struct dp_netdev_pmd_thread *pmd,
>      struct tx_port *p;
>      uint32_t mark;
> 
> -    if (!netdev_is_flow_api_enabled() || *recirc_depth_get() != 0) {
> -        *flow = NULL;
> -        return 0;
> -    }
> -
>      /* Restore the packet if HW processing was terminated before completion. */
>      p = pmd_send_port_cache_lookup(pmd, port_no);
>      if (OVS_LIKELY(p)) {
>          int err = netdev_hw_miss_packet_recover(p->port->netdev, packet);
> 
> -        if (err != 0 && err != EOPNOTSUPP) {
> +        if (err && err != EOPNOTSUPP) {
> +            COVERAGE_INC(datapath_drop_hw_miss_recover);
>              return -1;
>          }
>      }
> @@ -7168,25 +7164,28 @@ dfc_processing(struct dp_netdev_pmd_thread *pmd,
>              pkt_metadata_init(&packet->md, port_no);
>          }
> 
> -        if (OVS_UNLIKELY(dp_netdev_hw_flow(pmd, port_no, packet, &flow))) {
> -            /* Packet restoration failed. Its mbuf was freed, do not continue
> -             * processing.
> -             */
> -            continue;
> -        } else if (OVS_LIKELY(flow)) {
> -            tcp_flags = parse_tcp_flags(packet);
> -            if (OVS_LIKELY(batch_enable)) {
> -                dp_netdev_queue_batches(packet, flow, tcp_flags, batches,
> -                                        n_batches);
> -            } else {
> -                /* Flow batching should be performed only after fast-path
> -                 * processing is also completed for packets with emc miss
> -                 * or else it will result in reordering of packets with
> -                 * same datapath flows. */
> -                packet_enqueue_to_flow_map(packet, flow, tcp_flags, flow_map,
> -                                           map_cnt++);
> +        if (netdev_is_flow_api_enabled() && *recirc_depth_get() == 0) {
> +            if (OVS_UNLIKELY(dp_netdev_hw_flow(pmd, port_no, packet, &flow))) {
> +                /* Packet restoration failed and it was dropped, do not
> +                 * continue processing.
> +                 */
> +                continue;
> +            }
> +            if (OVS_LIKELY(flow)) {
> +                tcp_flags = parse_tcp_flags(packet);
> +                if (OVS_LIKELY(batch_enable)) {
> +                    dp_netdev_queue_batches(packet, flow, tcp_flags, batches,
> +                                            n_batches);
> +                } else {
> +                    /* Flow batching should be performed only after fast-path
> +                     * processing is also completed for packets with emc miss
> +                     * or else it will result in reordering of packets with
> +                     * same datapath flows. */
> +                    packet_enqueue_to_flow_map(packet, flow, tcp_flags,
> +                                               flow_map, map_cnt++);
> +                }
> +                continue;
>              }
> -            continue;
>          }
> 
>          miniflow_extract(packet, &key->mf);
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index 6e35d0574..bc5485d60 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -5365,11 +5365,11 @@ netdev_dpdk_rte_flow_get_restore_info(struct netdev *netdev,
>  }
> 
>  int
> -netdev_dpdk_rte_flow_tunnel_action_decap_release
> -    (struct netdev *netdev,
> -     struct rte_flow_action *actions,
> -     uint32_t num_of_actions,
> -     struct rte_flow_error *error)
> +netdev_dpdk_rte_flow_tunnel_action_decap_release(
> +    struct netdev *netdev,
> +    struct rte_flow_action *actions,
> +    uint32_t num_of_actions,
> +    struct rte_flow_error *error)
>  {
>      struct netdev_dpdk *dev;
>      int ret;
> diff --git a/lib/netdev-dpdk.h b/lib/netdev-dpdk.h
> index 3b9bf8681..7b77ed8e0 100644
> --- a/lib/netdev-dpdk.h
> +++ b/lib/netdev-dpdk.h
> @@ -53,33 +53,28 @@ netdev_dpdk_get_port_id(struct netdev *netdev);
> 
>  #ifdef ALLOW_EXPERIMENTAL_API
> 
> -int
> -netdev_dpdk_rte_flow_tunnel_decap_set(struct netdev *,
> +int netdev_dpdk_rte_flow_tunnel_decap_set(struct netdev *,
> +                                          struct rte_flow_tunnel *,
> +                                          struct rte_flow_action **,
> +                                          uint32_t *num_of_actions,
> +                                          struct rte_flow_error *);
> +int netdev_dpdk_rte_flow_tunnel_match(struct netdev *,
>                                        struct rte_flow_tunnel *,
> -                                      struct rte_flow_action **,
> -                                      uint32_t *,
> -                                      struct rte_flow_error *);
> -int
> -netdev_dpdk_rte_flow_tunnel_match(struct netdev *,
> -                                  struct rte_flow_tunnel *,
> -                                  struct rte_flow_item **,
> -                                  uint32_t *,
> -                                  struct rte_flow_error *);
> -int
> -netdev_dpdk_rte_flow_get_restore_info(struct netdev *,
> -                                      struct dp_packet *,
> -                                      struct rte_flow_restore_info *,
> +                                      struct rte_flow_item **,
> +                                      uint32_t *num_of_items,
>                                        struct rte_flow_error *);
> -int
> -netdev_dpdk_rte_flow_tunnel_action_decap_release(struct netdev *,
> -                                                 struct rte_flow_action *,
> -                                                 uint32_t,
> -                                                 struct rte_flow_error *);
> -int
> -netdev_dpdk_rte_flow_tunnel_item_release(struct netdev *,
> -                                         struct rte_flow_item *,
> -                                         uint32_t,
> -                                         struct rte_flow_error *);
> +int netdev_dpdk_rte_flow_get_restore_info(struct netdev *,
> +                                          struct dp_packet *,
> +                                          struct rte_flow_restore_info *,
> +                                          struct rte_flow_error *);
> +int netdev_dpdk_rte_flow_tunnel_action_decap_release(struct netdev *,
> +                                                     struct rte_flow_action *,
> +                                                     uint32_t num_of_actions,
> +                                                     struct rte_flow_error *);
> +int netdev_dpdk_rte_flow_tunnel_item_release(struct netdev *,
> +                                             struct rte_flow_item *,
> +                                             uint32_t num_of_items,
> +                                             struct rte_flow_error *);
> 
>  #else
> 
> @@ -92,14 +87,13 @@ set_error(struct rte_flow_error *error, enum rte_flow_error_type type)
>  }
> 
>  static inline int
> -netdev_dpdk_rte_flow_tunnel_decap_set(struct netdev *netdev OVS_UNUSED,
> -                                      struct rte_flow_tunnel *tunnel,
> -                                      struct rte_flow_action **actions,
> -                                      uint32_t *num_of_actions OVS_UNUSED,
> -                                      struct rte_flow_error *error)
> +netdev_dpdk_rte_flow_tunnel_decap_set(
> +    struct netdev *netdev OVS_UNUSED,
> +    struct rte_flow_tunnel *tunnel OVS_UNUSED,
> +    struct rte_flow_action **actions OVS_UNUSED,
> +    uint32_t *num_of_actions OVS_UNUSED,
> +    struct rte_flow_error *error)
>  {
> -    (void) tunnel;
> -    (void) actions;
>      set_error(error, RTE_FLOW_ERROR_TYPE_ACTION);
>      return -1;
>  }
> @@ -116,34 +110,34 @@ netdev_dpdk_rte_flow_tunnel_match(struct netdev *netdev OVS_UNUSED,
>  }
> 
>  static inline int
> -netdev_dpdk_rte_flow_get_restore_info(struct netdev *netdev OVS_UNUSED,
> -                                      struct dp_packet *p OVS_UNUSED,
> -                                      struct rte_flow_restore_info *info,
> -                                      struct rte_flow_error *error)
> +netdev_dpdk_rte_flow_get_restore_info(
> +    struct netdev *netdev OVS_UNUSED,
> +    struct dp_packet *p OVS_UNUSED,
> +    struct rte_flow_restore_info *info OVS_UNUSED,
> +    struct rte_flow_error *error)
>  {
> -    (void) info;
>      set_error(error, RTE_FLOW_ERROR_TYPE_ATTR);
>      return -1;
>  }
> 
>  static inline int
> -netdev_dpdk_rte_flow_tunnel_action_decap_release
> -    (struct netdev *netdev OVS_UNUSED,
> -     struct rte_flow_action *actions OVS_UNUSED,
> -     uint32_t num_of_actions OVS_UNUSED,
> -     struct rte_flow_error *error)
> +netdev_dpdk_rte_flow_tunnel_action_decap_release(
> +    struct netdev *netdev OVS_UNUSED,
> +    struct rte_flow_action *actions OVS_UNUSED,
> +    uint32_t num_of_actions OVS_UNUSED,
> +    struct rte_flow_error *error)
>  {
>      set_error(error, RTE_FLOW_ERROR_TYPE_NONE);
>      return 0;
>  }
> 
>  static inline int
> -netdev_dpdk_rte_flow_tunnel_item_release(struct netdev *netdev OVS_UNUSED,
> -                                         struct rte_flow_item *items,
> -                                         uint32_t num_of_items OVS_UNUSED,
> -                                         struct rte_flow_error *error)
> +netdev_dpdk_rte_flow_tunnel_item_release(
> +    struct netdev *netdev OVS_UNUSED,
> +    struct rte_flow_item *items OVS_UNUSED,
> +    uint32_t num_of_items OVS_UNUSED,
> +    struct rte_flow_error *error)
>  {
> -    (void) items;
>      set_error(error, RTE_FLOW_ERROR_TYPE_NONE);
>      return 0;
>  }
> diff --git a/lib/netdev-offload-dpdk.c b/lib/netdev-offload-dpdk.c
> index 52a74a707..363f32f71 100644
> --- a/lib/netdev-offload-dpdk.c
> +++ b/lib/netdev-offload-dpdk.c
> @@ -459,7 +459,7 @@ dump_flow_action(struct ds *s, struct ds *s_extra,
>                 act_index >= flow_actions->tnl_pmd_actions_pos &&
>                 act_index < flow_actions->tnl_pmd_actions_pos +
>                             flow_actions->tnl_pmd_actions_cnt) {
> -        /* Opaque PMD tunnel actions is skipped. */
> +        /* Opaque PMD tunnel actions are skipped. */
>          return;
>      } else if (actions->type == RTE_FLOW_ACTION_TYPE_MARK) {
>          const struct rte_flow_action_mark *mark = actions->conf;
> @@ -792,9 +792,9 @@ free_flow_actions(struct flow_actions *actions)
>      for (i = 0; i < actions->cnt; i++) {
>          if (actions->tnl_pmd_actions_cnt &&
>              i == actions->tnl_pmd_actions_pos) {
> -            if (netdev_dpdk_rte_flow_tunnel_action_decap_release
> -                    (actions->tnl_netdev, actions->tnl_pmd_actions,
> -                     actions->tnl_pmd_actions_cnt, &error)) {
> +            if (netdev_dpdk_rte_flow_tunnel_action_decap_release(
> +                    actions->tnl_netdev, actions->tnl_pmd_actions,
> +                    actions->tnl_pmd_actions_cnt, &error)) {
>                  VLOG_DBG_RL(&rl, "%s: "
>                              "netdev_dpdk_rte_flow_tunnel_action_decap_release "
>                              "failed: %d (%s).",
> @@ -1009,7 +1009,7 @@ parse_vxlan_match(struct flow_patterns *patterns,
>      return 0;
>  }
> 
> -static int
> +static int OVS_UNUSED
>  parse_flow_tnl_match(struct netdev *tnldev,
>                       struct flow_patterns *patterns,
>                       odp_port_t orig_in_port,
> @@ -1031,7 +1031,7 @@ parse_flow_tnl_match(struct netdev *tnldev,
> 
>  static int
>  parse_flow_match(struct netdev *netdev,
> -                 odp_port_t orig_in_port,
> +                 odp_port_t orig_in_port OVS_UNUSED,
>                   struct flow_patterns *patterns,
>                   struct match *match)
>  {
> @@ -1045,10 +1045,12 @@ parse_flow_match(struct netdev *netdev,
>      }
> 
>      patterns->physdev = netdev;
> +#ifdef ALLOW_EXPERIMENTAL_API /* Packet restoration API required. */
>      if (netdev_vport_is_vport_class(netdev->netdev_class) &&
>          parse_flow_tnl_match(netdev, patterns, orig_in_port, match)) {
>          return -1;
>      }
> +#endif
>      memset(&consumed_masks->in_port, 0, sizeof consumed_masks->in_port);
>      /* recirc id must be zero. */
>      if (match->wc.masks.recirc_id & match->flow.recirc_id) {
> @@ -1679,7 +1681,7 @@ add_jump_action(struct flow_actions *actions, uint32_t group)
>      add_flow_action(actions, RTE_FLOW_ACTION_TYPE_JUMP, jump);
>  }
> 
> -static int
> +static int OVS_UNUSED
>  add_tnl_pop_action(struct netdev *netdev,
>                     struct flow_actions *actions,
>                     const struct nlattr *nla)
> @@ -1767,10 +1769,12 @@ parse_flow_actions(struct netdev *netdev,
>                                      clone_actions_len)) {
>                  return -1;
>              }
> +#ifdef ALLOW_EXPERIMENTAL_API /* Packet restoration API required. */
>          } else if (nl_attr_type(nla) == OVS_ACTION_ATTR_TUNNEL_POP) {
>              if (add_tnl_pop_action(netdev, actions, nla)) {
>                  return -1;
>              }
> +#endif
>          } else {
>              VLOG_DBG_RL(&rl, "Unsupported action type %d", nl_attr_type(nla));
>              return -1;
> @@ -2067,7 +2071,7 @@ get_vxlan_netdev_cb(struct netdev *netdev,
>      struct get_vport_netdev_aux *aux = aux_;
> 
>      if (strcmp(netdev_get_type(netdev), "vxlan")) {
> -       return false;
> +        return false;
>      }
> 
>      tnl_cfg = netdev_get_tunnel_config(netdev);
> @@ -2121,18 +2125,16 @@ netdev_offload_dpdk_hw_miss_packet_recover(struct netdev *netdev,
>      struct rte_flow_restore_info rte_restore_info;
>      struct rte_flow_tunnel *rte_tnl;
>      struct netdev *vport_netdev;
> -    struct rte_flow_error error;
>      struct pkt_metadata *md;
>      struct flow_tnl *md_tnl;
>      odp_port_t vport_odp;
>      int ret = 0;
> 
>      if (netdev_dpdk_rte_flow_get_restore_info(netdev, packet,
> -                                              &rte_restore_info, &error)) {
> +                                              &rte_restore_info, NULL)) {
>          /* This function is called for every packet, and in most cases there
>           * will be no restore info from the HW, thus error is expected.
>           */
> -        (void) error;
>          return 0;
>      }
> 
> @@ -2144,25 +2146,25 @@ netdev_offload_dpdk_hw_miss_packet_recover(struct netdev *netdev,
>      vport_netdev = get_vport_netdev(netdev->dpif_type, rte_tnl,
>                                      &vport_odp);
>      if (!vport_netdev) {
> -        VLOG_WARN("Could not find vport netdev");
> +        VLOG_WARN_RL(&rl, "Could not find vport netdev");
>          return EOPNOTSUPP;
>      }
> 
>      md = &packet->md;
>      /* For tunnel recovery (RTE_FLOW_RESTORE_INFO_TUNNEL), it is possible
> -     * to have the packet to still be encapsulated, or not
> -     * (RTE_FLOW_RESTORE_INFO_ENCAPSULATED).
> +     * to have the packet to still be encapsulated, or not.  This is reflected
> +     * by the RTE_FLOW_RESTORE_INFO_ENCAPSULATED flag.
>       * In the case it is on, the packet is still encapsulated, and we do
>       * the pop in SW.
>       * In the case it is off, the packet is already decapsulated by HW, and
> -     * the tunnel info is provided in the tunnel struct. For this case we
> +     * the tunnel info is provided in the tunnel struct.  For this case we
>       * take it to OVS metadata.
>       */
>      if (rte_restore_info.flags & RTE_FLOW_RESTORE_INFO_ENCAPSULATED) {
>          if (!vport_netdev->netdev_class ||
>              !vport_netdev->netdev_class->pop_header) {
> -            VLOG_ERR("vport nedtdev=%s with no pop_header method",
> -                     netdev_get_name(vport_netdev));
> +            VLOG_ERR_RL(&rl, "vport nedtdev=%s with no pop_header method",
> +                        netdev_get_name(vport_netdev));
>              ret = EOPNOTSUPP;
>              goto close_vport_netdev;
>          }
> @@ -2171,7 +2173,7 @@ netdev_offload_dpdk_hw_miss_packet_recover(struct netdev *netdev,
>              /* If there is an error with popping the header, the packet is
>               * freed. In this case it should not continue SW processing.
>               */
> -            ret = -1;
> +            ret = EINVAL;
>              goto close_vport_netdev;
>          }
>      } else {
> diff --git a/lib/netdev-offload-provider.h b/lib/netdev-offload-provider.h
> index f24c7dd19..348ca7081 100644
> --- a/lib/netdev-offload-provider.h
> +++ b/lib/netdev-offload-provider.h
> @@ -89,7 +89,8 @@ struct netdev_flow_api {
> 
>      /* Recover the packet state (contents and data) for continued processing
>       * in software.
> -     * Return 0 if successful, otherwise returns a positive errno value. */
> +     * Return 0 if successful, otherwise returns a positive errno value and
> +     * takes ownership of a packet if errno != EOPNOTSUPP. */
>      int (*hw_miss_packet_recover)(struct netdev *, struct dp_packet *);
> 
>      /* Initializies the netdev flow api.
> ---
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Ferriter, Cian June 23, 2021, 3:18 p.m. UTC | #11
> -----Original Message-----
> From: dev <ovs-dev-bounces@openvswitch.org> On Behalf Of Ferriter, Cian
> Sent: Wednesday 23 June 2021 13:38
> To: Ilya Maximets <i.maximets@ovn.org>; Eli Britstein <elibr@nvidia.com>; dev@openvswitch.org
> Cc: Ivan.Malov@oktetlabs.ru; Ameer Mahagneh <ameerm@nvidia.com>; Majd Dibbiny <majd@nvidia.com>
> Subject: Re: [ovs-dev] [PATCH V6 00/13] Netdev vxlan-decap offload
> 
> Hi all,
> 
> As part of rebasing our AVX512 DPIF on this patchset, I tested this patchset with partial HWOL and I'm
> seeing strange behaviour.
> 
> I'll report back more detailed findings soon, just wanted to mention this here as soon as I found the
> issue.
> 
> Thanks,
> Cian
> 

More details on the issue I'm seeing:
I'm using Ilya's branch from Github:
https://github.com/igsilya/ovs/tree/tmp-vxlan-decap

~/ovs_scripts# $OVS_DIR/utilities/ovs-vsctl list Open_vSwitch
dpdk_version        : "DPDK 20.11.1"
other_config        : {dpdk-hugepage-dir="/mnt/huge", dpdk-init="true", dpdk-lcore-mask="0x1", dpdk-socket-mem="2048,0", emc-insert-inv-prob="0", hw-offload="true", pmd-cpu-mask="0x2"}

~/ovs_scripts# $OVS_DIR/utilities/ovs-vsctl show
31584ce5-09c1-44b3-ab27-1a0308d63fff
    Bridge br0
        datapath_type: netdev
        Port br0
            Interface br0
                type: internal
        Port phy0
            Interface phy0
                type: dpdk
                options: {dpdk-devargs="5e:00.0"}

~/ovs_scripts# $OVS_DIR/utilities/ovs-ofctl dump-flows br0
 cookie=0x0, duration=29.466s, table=0, n_packets=0, n_bytes=0, in_port=phy0 actions=IN_PORT

I'm expecting the flow to be partially offloaded, but I get a segfault when using the above branch. More info on the segfault below:

Thread 13 "pmd-c01/id:8" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f9f72734700 (LWP 19327)]
0x000056163bf0d825 in set_error (error=0x0, type=RTE_FLOW_ERROR_TYPE_ATTR) at lib/netdev-dpdk.h:84
(gdb) bt
#0  0x000056163bf0d825 in set_error (error=0x0, type=RTE_FLOW_ERROR_TYPE_ATTR) at lib/netdev-dpdk.h:84
#1  0x000056163bf0d8d3 in netdev_dpdk_rte_flow_get_restore_info (netdev=0x1bfc65c80, p=0x19033af00, info=0x7f9f72729a20, error=0x0) at lib/netdev-dpdk.h:119
#2  0x000056163bf14da3 in netdev_offload_dpdk_hw_miss_packet_recover (netdev=0x1bfc65c80, packet=0x19033af00) at lib/netdev-offload-dpdk.c:2133
#3  0x000056163bde3662 in netdev_hw_miss_packet_recover (netdev=0x1bfc65c80, packet=0x19033af00) at lib/netdev-offload.c:265
#4  0x000056163bda19a9 in dp_netdev_hw_flow (pmd=0x7f9f72735010, port_no=2, packet=0x19033af00, flow=0x7f9f72729b98) at lib/dpif-netdev.c:7087
#5  0x000056163bda1c5c in dfc_processing (pmd=0x7f9f72735010, packets_=0x7f9f727310d0, keys=0x7f9f7272c480, missed_keys=0x7f9f7272c370, batches=0x7f9f72729f60, n_batches=0x7f9f72730f70, flow_map=0x7f9f72729c50, n_flows=0x7f9f72730f78, index_map=0x7f9f72729c30 "", md_is_valid=false, port_no=2) at lib/dpif-netdev.c:7168
#6  0x000056163bda2f3e in dp_netdev_input__ (pmd=0x7f9f72735010, packets=0x7f9f727310d0, md_is_valid=false, port_no=2) at lib/dpif-netdev.c:7475
#7  0x000056163bda3105 in dp_netdev_input (pmd=0x7f9f72735010, packets=0x7f9f727310d0, port_no=2) at lib/dpif-netdev.c:7519
#8  0x000056163bd9ab04 in dp_netdev_process_rxq_port (pmd=0x7f9f72735010, rxq=0x56163fb3f610, port_no=2) at lib/dpif-netdev.c:4774
#9  0x000056163bd9ee17 in pmd_thread_main (f_=0x7f9f72735010) at lib/dpif-netdev.c:6063
#10 0x000056163be71c88 in ovsthread_wrapper (aux_=0x56163fb3fe70) at lib/ovs-thread.c:383
#11 0x00007f9f884cf6db in start_thread (arg=0x7f9f72734700) at pthread_create.c:463
#12 0x00007f9f862bb71f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

In netdev_offload_dpdk_hw_miss_packet_recover() calls netdev_dpdk_rte_flow_get_restore_info() with a NULL for the struct rte_flow_error *error argument:

    if (netdev_dpdk_rte_flow_get_restore_info(netdev, packet,
                                              &rte_restore_info, NULL)) {
        /* This function is called for every packet, and in most cases there
         * will be no restore info from the HW, thus error is expected.
         */
        return 0;
    }

There are 2 "netdev_dpdk_rte_flow_get_restore_info()" functions. One in lib/netdev-dpdk.h and one in lib/netdev-dpdk.c. 

I don't have the experimental API enabled, so I'm using the function rom lib/netdev-dpdk.h. 

> > -----Original Message-----
> > From: dev <ovs-dev-bounces@openvswitch.org> On Behalf Of Ilya Maximets
> > Sent: Tuesday 22 June 2021 16:55
> > To: Eli Britstein <elibr@nvidia.com>; dev@openvswitch.org; Ilya Maximets <i.maximets@ovn.org>
> > Cc: Ivan.Malov@oktetlabs.ru; Ameer Mahagneh <ameerm@nvidia.com>; Majd Dibbiny <majd@nvidia.com>
> > Subject: Re: [ovs-dev] [PATCH V6 00/13] Netdev vxlan-decap offload
> >
> > On 4/4/21 11:54 AM, Eli Britstein wrote:
> > > VXLAN decap in OVS-DPDK configuration consists of two flows:
> > > F1: in_port(ens1f0),eth(),ipv4(),udp(), actions:tnl_pop(vxlan_sys_4789)
> > > F2: tunnel(),in_port(vxlan_sys_4789),eth(),ipv4(), actions:ens1f0_0
> > >
> > > F1 is a classification flow. It has outer headers matches and it
> > > classifies the packet as a VXLAN packet, and using tnl_pop action the
> > > packet continues processing in F2.
> > > F2 is a flow that has matches on tunnel metadata as well as on the inner
> > > packet headers (as any other flow).
> > >
> > > In order to fully offload VXLAN decap path, both F1 and F2 should be
> > > offloaded. As there are more than one flow in HW, it is possible that
> > > F1 is done by HW but F2 is not. Packet is received by SW, and should be
> > > processed starting from F2 as F1 was already done by HW.
> > > Rte_flows are applicable only on physical port IDs. Keeping the original
> > > physical in port on which the packet is received on enables applying
> > > vport flows (e.g. F2) on that physical port.
> > >
> > > This patch-set makes use of [1] introduced in DPDK 20.11, that adds API
> > > for tunnel offloads.
> > >
> > > Note that MLX5 PMD has a bug that the tnl_pop private actions must be
> > > first. In OVS it is not.
> > > Fixing this issue is scheduled for 21.05 (and stable 20.11.2).
> > > Meanwhile, tests were done with a workaround for it [2].
> > >
> > > v2-v1:
> > > - Tracking original in_port, and applying vport on that physical port instead of all PFs.
> > > v3-v2:
> > > - Traversing ports using a new API instead of flow_dump.
> > > - Refactor packet state recover logic, with bug fix for error pop_header.
> > > - One ref count for netdev in non-tunnel case.
> > > - Rename variables, comments, rebase.
> > > v4-v3:
> > > - Extract orig_in_port from physdev for flow modify.
> > > - Miss handling fixes.
> > > v5-v4:
> > > - Drop refactor offload rule creation commit.
> > > - Comment about setting in_port in restore.
> > > - Refactor vports flow offload commit.
> > > v6-v5:
> > > - Fixed duplicate netdev ref bug.
> > >
> > > Travis:
> > > v1: https://travis-ci.org/github/elibritstein/OVS/builds/756418552
> > > v2: https://travis-ci.org/github/elibritstein/OVS/builds/758382963
> > > v3: https://travis-ci.org/github/elibritstein/OVS/builds/761089087
> > > v4: https://travis-ci.org/github/elibritstein/OVS/builds/763146966
> > > v5: https://travis-ci.org/github/elibritstein/OVS/builds/765271879
> > > v6: https://travis-ci.org/github/elibritstein/OVS/builds/765816800
> > >
> > > GitHub Actions:
> > > v1: https://github.com/elibritstein/OVS/actions/runs/515334647
> > > v2: https://github.com/elibritstein/OVS/actions/runs/554986007
> > > v3: https://github.com/elibritstein/OVS/actions/runs/613226225
> > > v4: https://github.com/elibritstein/OVS/actions/runs/658415274
> > > v5: https://github.com/elibritstein/OVS/actions/runs/704357369
> > > v6: https://github.com/elibritstein/OVS/actions/runs/716304028
> > >
> > > [1] https://mails.dpdk.org/archives/dev/2020-October/187314.html
> > > [2] https://github.com/elibritstein/dpdk-stable/pull/1
> > >
> > >
> > > Eli Britstein (10):
> > >   netdev-offload: Add HW miss packet state recover API
> > >   netdev-dpdk: Introduce DPDK tunnel APIs
> > >   netdev-offload: Introduce an API to traverse ports
> > >   netdev-dpdk: Add flow_api support for netdev vxlan vports
> > >   netdev-offload-dpdk: Implement HW miss packet recover for vport
> > >   dpif-netdev: Add HW miss packet state recover logic
> > >   netdev-offload-dpdk: Change log rate limits
> > >   netdev-offload-dpdk: Support tunnel pop action
> > >   netdev-offload-dpdk: Support vports flows offload
> > >   netdev-dpdk-offload: Add vxlan pattern matching function
> > >
> > > Ilya Maximets (2):
> > >   netdev-offload: Allow offloading to netdev without ifindex.
> > >   netdev-offload: Disallow offloading to unrelated tunneling vports.
> > >
> > > Sriharsha Basavapatna (1):
> > >   dpif-netdev: Provide orig_in_port in metadata for tunneled packets
> > >
> > >  Documentation/howto/dpdk.rst  |   1 +
> > >  NEWS                          |   2 +
> > >  lib/dpif-netdev.c             |  97 +++--
> > >  lib/netdev-dpdk.c             | 118 ++++++
> > >  lib/netdev-dpdk.h             | 106 ++++-
> > >  lib/netdev-offload-dpdk.c     | 704 +++++++++++++++++++++++++++++++---
> > >  lib/netdev-offload-provider.h |   5 +
> > >  lib/netdev-offload-tc.c       |   8 +
> > >  lib/netdev-offload.c          |  47 ++-
> > >  lib/netdev-offload.h          |  10 +
> > >  lib/packets.h                 |   8 +-
> > >  11 files changed, 1022 insertions(+), 84 deletions(-)
> > >
> >
> > Hi.  I reviewed the whole series and it looks mostly OK to me.
> > I made a several changes along the way and below you may find a diff
> > of what I changed.  In short:
> >
> >  - Some style fixes: style of function prototypes and a way how long
> >    function calls wrapped.  Added names to uint32_t arguments to
> >    function prototypes.
> >
> >  - Clarified hw_miss_packet_recover() API.  It needs a note that it
> >    takes ownership of a packet on error.  Fixed 'return -1' which
> >    doesn't comply with the API definition (should return positive
> >    errno).
> >
> >  - Moved NEWS updates to correct place.
> >
> >  - Added a packet drop counter.
> >
> >  - Refactored changes in dfc_processing().  Basically, restored it
> >    to look mostly like it was before, because LIKELY/UNLIKELY markers
> >    make no sense for non-offloading cases where they both should be
> >    'unlikely'.  Also, old way of writing this part allows following
> >    optimization:
> >
> >
> https://patchwork.ozlabs.org/project/openvswitch/patch/c0b99b4b9be6c290a6aa3cb00049fac4ebfac5d8.161839
> > 0390.git.bnemeth@redhat.com/
> >
> >  - And the only functional change that I made is that I guarded
> >    actual offloading support with 'ifdef DPDK_EXPERIMENTAL_API',
> >    because they doesn't make sense to me if packet restoration
> >    API is not available.  I also added this information to the
> >    NEWS file.  I think that we should not try to offload TUNNEL_POP
> >    if we can't restore the tunnel metadata.  And if we can't have
> >    the first flow in HW, there is no point adding the second one.
> >
> > Complete branch with ready-to-push patches available here:
> >   https://github.com/igsilya/ovs/tree/tmp-vxlan-decap
> >
> > Diff below is a diff with this v6 patch-set.  Please, take a look/test.
> > I'll wait for Ack on changes below or comments, if I broke something,
> > before pushing this to the main repo.
> >
> > Best regards, Ilya Maximets.
> >
> > The diff:
> >
> > diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst
> > index 4918d80f3..36314c06a 100644
> > --- a/Documentation/howto/dpdk.rst
> > +++ b/Documentation/howto/dpdk.rst
> > @@ -398,7 +398,7 @@ Supported actions for hardware offload are:
> >  - VLAN Push/Pop (push_vlan/pop_vlan).
> >  - Modification of IPv6 (set_field:<ADDR>->ipv6_src/ipv6_dst/mod_nw_ttl).
> >  - Clone/output (tnl_push and output) for encapsulating over a tunnel.
> > -- Tunnel pop, for changing from a physical port to a vport.
> > +- Tunnel pop, for packets received on physical ports.
> >
> >  Further Reading
> >  ---------------
> > diff --git a/NEWS b/NEWS
> > index e49f2955d..10b3ab053 100644
> > --- a/NEWS
> > +++ b/NEWS
> > @@ -17,6 +17,10 @@ Post-v2.15.0
> >       * OVS validated with DPDK 20.11.1. It is recommended to use this version
> >         until further releases.
> >       * New debug appctl command 'dpdk/get-malloc-stats'.
> > +     * Add hardware offload support for tunnel pop action (experimental).
> > +       Available only if DPDK experimantal APIs enabled during the build.
> > +     * Add hardware offload support for VXLAN flows (experimental).
> > +       Available only if DPDK experimantal APIs enabled during the build.
> >     - ovsdb-tool:
> >       * New option '--election-timer' to the 'create-cluster' command to set the
> >         leader election timer during cluster creation.
> > @@ -44,8 +48,6 @@ v2.15.0 - 15 Feb 2021
> >     - DPDK:
> >       * Removed support for vhost-user dequeue zero-copy.
> >       * Add support for DPDK 20.11.
> > -     * Add hardware offload support for tunnel pop action (experimental).
> > -     * Add hardware offload support for VXLAN flows (experimental).
> >     - Userspace datapath:
> >       * Add the 'pmd' option to "ovs-appctl dpctl/dump-flows", which
> >         restricts a flow dump to a single PMD thread if set.
> > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> > index 1bd828f82..8766a00ea 100644
> > --- a/lib/dpif-netdev.c
> > +++ b/lib/dpif-netdev.c
> > @@ -114,6 +114,7 @@ COVERAGE_DEFINE(datapath_drop_invalid_port);
> >  COVERAGE_DEFINE(datapath_drop_invalid_bond);
> >  COVERAGE_DEFINE(datapath_drop_invalid_tnl_port);
> >  COVERAGE_DEFINE(datapath_drop_rx_invalid_packet);
> > +COVERAGE_DEFINE(datapath_drop_hw_miss_recover);
> >
> >  /* Protects against changes to 'dp_netdevs'. */
> >  static struct ovs_mutex dp_netdev_mutex = OVS_MUTEX_INITIALIZER;
> > @@ -7068,9 +7069,8 @@ smc_lookup_batch(struct dp_netdev_pmd_thread *pmd,
> >      pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_SMC_HIT, n_smc_hit);
> >  }
> >
> > -static struct tx_port *
> > -pmd_send_port_cache_lookup(const struct dp_netdev_pmd_thread *pmd,
> > -                           odp_port_t port_no);
> > +static struct tx_port * pmd_send_port_cache_lookup(
> > +    const struct dp_netdev_pmd_thread *pmd, odp_port_t port_no);
> >
> >  static inline int
> >  dp_netdev_hw_flow(const struct dp_netdev_pmd_thread *pmd,
> > @@ -7081,17 +7081,13 @@ dp_netdev_hw_flow(const struct dp_netdev_pmd_thread *pmd,
> >      struct tx_port *p;
> >      uint32_t mark;
> >
> > -    if (!netdev_is_flow_api_enabled() || *recirc_depth_get() != 0) {
> > -        *flow = NULL;
> > -        return 0;
> > -    }
> > -
> >      /* Restore the packet if HW processing was terminated before completion. */
> >      p = pmd_send_port_cache_lookup(pmd, port_no);
> >      if (OVS_LIKELY(p)) {
> >          int err = netdev_hw_miss_packet_recover(p->port->netdev, packet);
> >
> > -        if (err != 0 && err != EOPNOTSUPP) {
> > +        if (err && err != EOPNOTSUPP) {
> > +            COVERAGE_INC(datapath_drop_hw_miss_recover);
> >              return -1;
> >          }
> >      }
> > @@ -7168,25 +7164,28 @@ dfc_processing(struct dp_netdev_pmd_thread *pmd,
> >              pkt_metadata_init(&packet->md, port_no);
> >          }
> >
> > -        if (OVS_UNLIKELY(dp_netdev_hw_flow(pmd, port_no, packet, &flow))) {
> > -            /* Packet restoration failed. Its mbuf was freed, do not continue
> > -             * processing.
> > -             */
> > -            continue;
> > -        } else if (OVS_LIKELY(flow)) {
> > -            tcp_flags = parse_tcp_flags(packet);
> > -            if (OVS_LIKELY(batch_enable)) {
> > -                dp_netdev_queue_batches(packet, flow, tcp_flags, batches,
> > -                                        n_batches);
> > -            } else {
> > -                /* Flow batching should be performed only after fast-path
> > -                 * processing is also completed for packets with emc miss
> > -                 * or else it will result in reordering of packets with
> > -                 * same datapath flows. */
> > -                packet_enqueue_to_flow_map(packet, flow, tcp_flags, flow_map,
> > -                                           map_cnt++);
> > +        if (netdev_is_flow_api_enabled() && *recirc_depth_get() == 0) {
> > +            if (OVS_UNLIKELY(dp_netdev_hw_flow(pmd, port_no, packet, &flow))) {
> > +                /* Packet restoration failed and it was dropped, do not
> > +                 * continue processing.
> > +                 */
> > +                continue;
> > +            }
> > +            if (OVS_LIKELY(flow)) {
> > +                tcp_flags = parse_tcp_flags(packet);
> > +                if (OVS_LIKELY(batch_enable)) {
> > +                    dp_netdev_queue_batches(packet, flow, tcp_flags, batches,
> > +                                            n_batches);
> > +                } else {
> > +                    /* Flow batching should be performed only after fast-path
> > +                     * processing is also completed for packets with emc miss
> > +                     * or else it will result in reordering of packets with
> > +                     * same datapath flows. */
> > +                    packet_enqueue_to_flow_map(packet, flow, tcp_flags,
> > +                                               flow_map, map_cnt++);
> > +                }
> > +                continue;
> >              }
> > -            continue;
> >          }
> >
> >          miniflow_extract(packet, &key->mf);
> > diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> > index 6e35d0574..bc5485d60 100644
> > --- a/lib/netdev-dpdk.c
> > +++ b/lib/netdev-dpdk.c
> > @@ -5365,11 +5365,11 @@ netdev_dpdk_rte_flow_get_restore_info(struct netdev *netdev,
> >  }
> >
> >  int
> > -netdev_dpdk_rte_flow_tunnel_action_decap_release
> > -    (struct netdev *netdev,
> > -     struct rte_flow_action *actions,
> > -     uint32_t num_of_actions,
> > -     struct rte_flow_error *error)
> > +netdev_dpdk_rte_flow_tunnel_action_decap_release(
> > +    struct netdev *netdev,
> > +    struct rte_flow_action *actions,
> > +    uint32_t num_of_actions,
> > +    struct rte_flow_error *error)
> >  {
> >      struct netdev_dpdk *dev;
> >      int ret;
> > diff --git a/lib/netdev-dpdk.h b/lib/netdev-dpdk.h
> > index 3b9bf8681..7b77ed8e0 100644
> > --- a/lib/netdev-dpdk.h
> > +++ b/lib/netdev-dpdk.h
> > @@ -53,33 +53,28 @@ netdev_dpdk_get_port_id(struct netdev *netdev);
> >
> >  #ifdef ALLOW_EXPERIMENTAL_API
> >
> > -int
> > -netdev_dpdk_rte_flow_tunnel_decap_set(struct netdev *,
> > +int netdev_dpdk_rte_flow_tunnel_decap_set(struct netdev *,
> > +                                          struct rte_flow_tunnel *,
> > +                                          struct rte_flow_action **,
> > +                                          uint32_t *num_of_actions,
> > +                                          struct rte_flow_error *);
> > +int netdev_dpdk_rte_flow_tunnel_match(struct netdev *,
> >                                        struct rte_flow_tunnel *,
> > -                                      struct rte_flow_action **,
> > -                                      uint32_t *,
> > -                                      struct rte_flow_error *);
> > -int
> > -netdev_dpdk_rte_flow_tunnel_match(struct netdev *,
> > -                                  struct rte_flow_tunnel *,
> > -                                  struct rte_flow_item **,
> > -                                  uint32_t *,
> > -                                  struct rte_flow_error *);
> > -int
> > -netdev_dpdk_rte_flow_get_restore_info(struct netdev *,
> > -                                      struct dp_packet *,
> > -                                      struct rte_flow_restore_info *,
> > +                                      struct rte_flow_item **,
> > +                                      uint32_t *num_of_items,
> >                                        struct rte_flow_error *);
> > -int
> > -netdev_dpdk_rte_flow_tunnel_action_decap_release(struct netdev *,
> > -                                                 struct rte_flow_action *,
> > -                                                 uint32_t,
> > -                                                 struct rte_flow_error *);
> > -int
> > -netdev_dpdk_rte_flow_tunnel_item_release(struct netdev *,
> > -                                         struct rte_flow_item *,
> > -                                         uint32_t,
> > -                                         struct rte_flow_error *);
> > +int netdev_dpdk_rte_flow_get_restore_info(struct netdev *,
> > +                                          struct dp_packet *,
> > +                                          struct rte_flow_restore_info *,
> > +                                          struct rte_flow_error *);
> > +int netdev_dpdk_rte_flow_tunnel_action_decap_release(struct netdev *,
> > +                                                     struct rte_flow_action *,
> > +                                                     uint32_t num_of_actions,
> > +                                                     struct rte_flow_error *);
> > +int netdev_dpdk_rte_flow_tunnel_item_release(struct netdev *,
> > +                                             struct rte_flow_item *,
> > +                                             uint32_t num_of_items,
> > +                                             struct rte_flow_error *);
> >
> >  #else
> >
> > @@ -92,14 +87,13 @@ set_error(struct rte_flow_error *error, enum rte_flow_error_type type)
> >  }
> >
> >  static inline int
> > -netdev_dpdk_rte_flow_tunnel_decap_set(struct netdev *netdev OVS_UNUSED,
> > -                                      struct rte_flow_tunnel *tunnel,
> > -                                      struct rte_flow_action **actions,
> > -                                      uint32_t *num_of_actions OVS_UNUSED,
> > -                                      struct rte_flow_error *error)
> > +netdev_dpdk_rte_flow_tunnel_decap_set(
> > +    struct netdev *netdev OVS_UNUSED,
> > +    struct rte_flow_tunnel *tunnel OVS_UNUSED,
> > +    struct rte_flow_action **actions OVS_UNUSED,
> > +    uint32_t *num_of_actions OVS_UNUSED,
> > +    struct rte_flow_error *error)
> >  {
> > -    (void) tunnel;
> > -    (void) actions;
> >      set_error(error, RTE_FLOW_ERROR_TYPE_ACTION);
> >      return -1;
> >  }
> > @@ -116,34 +110,34 @@ netdev_dpdk_rte_flow_tunnel_match(struct netdev *netdev OVS_UNUSED,
> >  }
> >
> >  static inline int
> > -netdev_dpdk_rte_flow_get_restore_info(struct netdev *netdev OVS_UNUSED,
> > -                                      struct dp_packet *p OVS_UNUSED,
> > -                                      struct rte_flow_restore_info *info,
> > -                                      struct rte_flow_error *error)
> > +netdev_dpdk_rte_flow_get_restore_info(
> > +    struct netdev *netdev OVS_UNUSED,
> > +    struct dp_packet *p OVS_UNUSED,
> > +    struct rte_flow_restore_info *info OVS_UNUSED,
> > +    struct rte_flow_error *error)
> >  {
> > -    (void) info;
> >      set_error(error, RTE_FLOW_ERROR_TYPE_ATTR);
> >      return -1;
> >  }
> >
> >  static inline int
> > -netdev_dpdk_rte_flow_tunnel_action_decap_release
> > -    (struct netdev *netdev OVS_UNUSED,
> > -     struct rte_flow_action *actions OVS_UNUSED,
> > -     uint32_t num_of_actions OVS_UNUSED,
> > -     struct rte_flow_error *error)
> > +netdev_dpdk_rte_flow_tunnel_action_decap_release(
> > +    struct netdev *netdev OVS_UNUSED,
> > +    struct rte_flow_action *actions OVS_UNUSED,
> > +    uint32_t num_of_actions OVS_UNUSED,
> > +    struct rte_flow_error *error)
> >  {
> >      set_error(error, RTE_FLOW_ERROR_TYPE_NONE);
> >      return 0;
> >  }
> >
> >  static inline int
> > -netdev_dpdk_rte_flow_tunnel_item_release(struct netdev *netdev OVS_UNUSED,
> > -                                         struct rte_flow_item *items,
> > -                                         uint32_t num_of_items OVS_UNUSED,
> > -                                         struct rte_flow_error *error)
> > +netdev_dpdk_rte_flow_tunnel_item_release(
> > +    struct netdev *netdev OVS_UNUSED,
> > +    struct rte_flow_item *items OVS_UNUSED,
> > +    uint32_t num_of_items OVS_UNUSED,
> > +    struct rte_flow_error *error)
> >  {
> > -    (void) items;
> >      set_error(error, RTE_FLOW_ERROR_TYPE_NONE);
> >      return 0;
> >  }
> > diff --git a/lib/netdev-offload-dpdk.c b/lib/netdev-offload-dpdk.c
> > index 52a74a707..363f32f71 100644
> > --- a/lib/netdev-offload-dpdk.c
> > +++ b/lib/netdev-offload-dpdk.c
> > @@ -459,7 +459,7 @@ dump_flow_action(struct ds *s, struct ds *s_extra,
> >                 act_index >= flow_actions->tnl_pmd_actions_pos &&
> >                 act_index < flow_actions->tnl_pmd_actions_pos +
> >                             flow_actions->tnl_pmd_actions_cnt) {
> > -        /* Opaque PMD tunnel actions is skipped. */
> > +        /* Opaque PMD tunnel actions are skipped. */
> >          return;
> >      } else if (actions->type == RTE_FLOW_ACTION_TYPE_MARK) {
> >          const struct rte_flow_action_mark *mark = actions->conf;
> > @@ -792,9 +792,9 @@ free_flow_actions(struct flow_actions *actions)
> >      for (i = 0; i < actions->cnt; i++) {
> >          if (actions->tnl_pmd_actions_cnt &&
> >              i == actions->tnl_pmd_actions_pos) {
> > -            if (netdev_dpdk_rte_flow_tunnel_action_decap_release
> > -                    (actions->tnl_netdev, actions->tnl_pmd_actions,
> > -                     actions->tnl_pmd_actions_cnt, &error)) {
> > +            if (netdev_dpdk_rte_flow_tunnel_action_decap_release(
> > +                    actions->tnl_netdev, actions->tnl_pmd_actions,
> > +                    actions->tnl_pmd_actions_cnt, &error)) {
> >                  VLOG_DBG_RL(&rl, "%s: "
> >                              "netdev_dpdk_rte_flow_tunnel_action_decap_release "
> >                              "failed: %d (%s).",
> > @@ -1009,7 +1009,7 @@ parse_vxlan_match(struct flow_patterns *patterns,
> >      return 0;
> >  }
> >
> > -static int
> > +static int OVS_UNUSED
> >  parse_flow_tnl_match(struct netdev *tnldev,
> >                       struct flow_patterns *patterns,
> >                       odp_port_t orig_in_port,
> > @@ -1031,7 +1031,7 @@ parse_flow_tnl_match(struct netdev *tnldev,
> >
> >  static int
> >  parse_flow_match(struct netdev *netdev,
> > -                 odp_port_t orig_in_port,
> > +                 odp_port_t orig_in_port OVS_UNUSED,
> >                   struct flow_patterns *patterns,
> >                   struct match *match)
> >  {
> > @@ -1045,10 +1045,12 @@ parse_flow_match(struct netdev *netdev,
> >      }
> >
> >      patterns->physdev = netdev;
> > +#ifdef ALLOW_EXPERIMENTAL_API /* Packet restoration API required. */
> >      if (netdev_vport_is_vport_class(netdev->netdev_class) &&
> >          parse_flow_tnl_match(netdev, patterns, orig_in_port, match)) {
> >          return -1;
> >      }
> > +#endif
> >      memset(&consumed_masks->in_port, 0, sizeof consumed_masks->in_port);
> >      /* recirc id must be zero. */
> >      if (match->wc.masks.recirc_id & match->flow.recirc_id) {
> > @@ -1679,7 +1681,7 @@ add_jump_action(struct flow_actions *actions, uint32_t group)
> >      add_flow_action(actions, RTE_FLOW_ACTION_TYPE_JUMP, jump);
> >  }
> >
> > -static int
> > +static int OVS_UNUSED
> >  add_tnl_pop_action(struct netdev *netdev,
> >                     struct flow_actions *actions,
> >                     const struct nlattr *nla)
> > @@ -1767,10 +1769,12 @@ parse_flow_actions(struct netdev *netdev,
> >                                      clone_actions_len)) {
> >                  return -1;
> >              }
> > +#ifdef ALLOW_EXPERIMENTAL_API /* Packet restoration API required. */
> >          } else if (nl_attr_type(nla) == OVS_ACTION_ATTR_TUNNEL_POP) {
> >              if (add_tnl_pop_action(netdev, actions, nla)) {
> >                  return -1;
> >              }
> > +#endif
> >          } else {
> >              VLOG_DBG_RL(&rl, "Unsupported action type %d", nl_attr_type(nla));
> >              return -1;
> > @@ -2067,7 +2071,7 @@ get_vxlan_netdev_cb(struct netdev *netdev,
> >      struct get_vport_netdev_aux *aux = aux_;
> >
> >      if (strcmp(netdev_get_type(netdev), "vxlan")) {
> > -       return false;
> > +        return false;
> >      }
> >
> >      tnl_cfg = netdev_get_tunnel_config(netdev);
> > @@ -2121,18 +2125,16 @@ netdev_offload_dpdk_hw_miss_packet_recover(struct netdev *netdev,
> >      struct rte_flow_restore_info rte_restore_info;
> >      struct rte_flow_tunnel *rte_tnl;
> >      struct netdev *vport_netdev;
> > -    struct rte_flow_error error;
> >      struct pkt_metadata *md;
> >      struct flow_tnl *md_tnl;
> >      odp_port_t vport_odp;
> >      int ret = 0;
> >
> >      if (netdev_dpdk_rte_flow_get_restore_info(netdev, packet,
> > -                                              &rte_restore_info, &error)) {
> > +                                              &rte_restore_info, NULL)) {
> >          /* This function is called for every packet, and in most cases there
> >           * will be no restore info from the HW, thus error is expected.
> >           */
> > -        (void) error;
> >          return 0;
> >      }
> >
> > @@ -2144,25 +2146,25 @@ netdev_offload_dpdk_hw_miss_packet_recover(struct netdev *netdev,
> >      vport_netdev = get_vport_netdev(netdev->dpif_type, rte_tnl,
> >                                      &vport_odp);
> >      if (!vport_netdev) {
> > -        VLOG_WARN("Could not find vport netdev");
> > +        VLOG_WARN_RL(&rl, "Could not find vport netdev");
> >          return EOPNOTSUPP;
> >      }
> >
> >      md = &packet->md;
> >      /* For tunnel recovery (RTE_FLOW_RESTORE_INFO_TUNNEL), it is possible
> > -     * to have the packet to still be encapsulated, or not
> > -     * (RTE_FLOW_RESTORE_INFO_ENCAPSULATED).
> > +     * to have the packet to still be encapsulated, or not.  This is reflected
> > +     * by the RTE_FLOW_RESTORE_INFO_ENCAPSULATED flag.
> >       * In the case it is on, the packet is still encapsulated, and we do
> >       * the pop in SW.
> >       * In the case it is off, the packet is already decapsulated by HW, and
> > -     * the tunnel info is provided in the tunnel struct. For this case we
> > +     * the tunnel info is provided in the tunnel struct.  For this case we
> >       * take it to OVS metadata.
> >       */
> >      if (rte_restore_info.flags & RTE_FLOW_RESTORE_INFO_ENCAPSULATED) {
> >          if (!vport_netdev->netdev_class ||
> >              !vport_netdev->netdev_class->pop_header) {
> > -            VLOG_ERR("vport nedtdev=%s with no pop_header method",
> > -                     netdev_get_name(vport_netdev));
> > +            VLOG_ERR_RL(&rl, "vport nedtdev=%s with no pop_header method",
> > +                        netdev_get_name(vport_netdev));
> >              ret = EOPNOTSUPP;
> >              goto close_vport_netdev;
> >          }
> > @@ -2171,7 +2173,7 @@ netdev_offload_dpdk_hw_miss_packet_recover(struct netdev *netdev,
> >              /* If there is an error with popping the header, the packet is
> >               * freed. In this case it should not continue SW processing.
> >               */
> > -            ret = -1;
> > +            ret = EINVAL;
> >              goto close_vport_netdev;
> >          }
> >      } else {
> > diff --git a/lib/netdev-offload-provider.h b/lib/netdev-offload-provider.h
> > index f24c7dd19..348ca7081 100644
> > --- a/lib/netdev-offload-provider.h
> > +++ b/lib/netdev-offload-provider.h
> > @@ -89,7 +89,8 @@ struct netdev_flow_api {
> >
> >      /* Recover the packet state (contents and data) for continued processing
> >       * in software.
> > -     * Return 0 if successful, otherwise returns a positive errno value. */
> > +     * Return 0 if successful, otherwise returns a positive errno value and
> > +     * takes ownership of a packet if errno != EOPNOTSUPP. */
> >      int (*hw_miss_packet_recover)(struct netdev *, struct dp_packet *);
> >
> >      /* Initializies the netdev flow api.
> > ---
> > _______________________________________________
> > dev mailing list
> > dev@openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Ilya Maximets June 23, 2021, 3:24 p.m. UTC | #12
On 6/23/21 5:18 PM, Ferriter, Cian wrote:
>> -----Original Message-----
>> From: dev <ovs-dev-bounces@openvswitch.org> On Behalf Of Ferriter, Cian
>> Sent: Wednesday 23 June 2021 13:38
>> To: Ilya Maximets <i.maximets@ovn.org>; Eli Britstein <elibr@nvidia.com>; dev@openvswitch.org
>> Cc: Ivan.Malov@oktetlabs.ru; Ameer Mahagneh <ameerm@nvidia.com>; Majd Dibbiny <majd@nvidia.com>
>> Subject: Re: [ovs-dev] [PATCH V6 00/13] Netdev vxlan-decap offload
>>
>> Hi all,
>>
>> As part of rebasing our AVX512 DPIF on this patchset, I tested this patchset with partial HWOL and I'm
>> seeing strange behaviour.
>>
>> I'll report back more detailed findings soon, just wanted to mention this here as soon as I found the
>> issue.
>>
>> Thanks,
>> Cian
>>
> 
> More details on the issue I'm seeing:
> I'm using Ilya's branch from Github:
> https://github.com/igsilya/ovs/tree/tmp-vxlan-decap
> 
> ~/ovs_scripts# $OVS_DIR/utilities/ovs-vsctl list Open_vSwitch
> dpdk_version        : "DPDK 20.11.1"
> other_config        : {dpdk-hugepage-dir="/mnt/huge", dpdk-init="true", dpdk-lcore-mask="0x1", dpdk-socket-mem="2048,0", emc-insert-inv-prob="0", hw-offload="true", pmd-cpu-mask="0x2"}
> 
> ~/ovs_scripts# $OVS_DIR/utilities/ovs-vsctl show
> 31584ce5-09c1-44b3-ab27-1a0308d63fff
>     Bridge br0
>         datapath_type: netdev
>         Port br0
>             Interface br0
>                 type: internal
>         Port phy0
>             Interface phy0
>                 type: dpdk
>                 options: {dpdk-devargs="5e:00.0"}
> 
> ~/ovs_scripts# $OVS_DIR/utilities/ovs-ofctl dump-flows br0
>  cookie=0x0, duration=29.466s, table=0, n_packets=0, n_bytes=0, in_port=phy0 actions=IN_PORT
> 
> I'm expecting the flow to be partially offloaded, but I get a segfault when using the above branch. More info on the segfault below:
> 
> Thread 13 "pmd-c01/id:8" received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7f9f72734700 (LWP 19327)]
> 0x000056163bf0d825 in set_error (error=0x0, type=RTE_FLOW_ERROR_TYPE_ATTR) at lib/netdev-dpdk.h:84
> (gdb) bt
> #0  0x000056163bf0d825 in set_error (error=0x0, type=RTE_FLOW_ERROR_TYPE_ATTR) at lib/netdev-dpdk.h:84
> #1  0x000056163bf0d8d3 in netdev_dpdk_rte_flow_get_restore_info (netdev=0x1bfc65c80, p=0x19033af00, info=0x7f9f72729a20, error=0x0) at lib/netdev-dpdk.h:119
> #2  0x000056163bf14da3 in netdev_offload_dpdk_hw_miss_packet_recover (netdev=0x1bfc65c80, packet=0x19033af00) at lib/netdev-offload-dpdk.c:2133
> #3  0x000056163bde3662 in netdev_hw_miss_packet_recover (netdev=0x1bfc65c80, packet=0x19033af00) at lib/netdev-offload.c:265
> #4  0x000056163bda19a9 in dp_netdev_hw_flow (pmd=0x7f9f72735010, port_no=2, packet=0x19033af00, flow=0x7f9f72729b98) at lib/dpif-netdev.c:7087
> #5  0x000056163bda1c5c in dfc_processing (pmd=0x7f9f72735010, packets_=0x7f9f727310d0, keys=0x7f9f7272c480, missed_keys=0x7f9f7272c370, batches=0x7f9f72729f60, n_batches=0x7f9f72730f70, flow_map=0x7f9f72729c50, n_flows=0x7f9f72730f78, index_map=0x7f9f72729c30 "", md_is_valid=false, port_no=2) at lib/dpif-netdev.c:7168
> #6  0x000056163bda2f3e in dp_netdev_input__ (pmd=0x7f9f72735010, packets=0x7f9f727310d0, md_is_valid=false, port_no=2) at lib/dpif-netdev.c:7475
> #7  0x000056163bda3105 in dp_netdev_input (pmd=0x7f9f72735010, packets=0x7f9f727310d0, port_no=2) at lib/dpif-netdev.c:7519
> #8  0x000056163bd9ab04 in dp_netdev_process_rxq_port (pmd=0x7f9f72735010, rxq=0x56163fb3f610, port_no=2) at lib/dpif-netdev.c:4774
> #9  0x000056163bd9ee17 in pmd_thread_main (f_=0x7f9f72735010) at lib/dpif-netdev.c:6063
> #10 0x000056163be71c88 in ovsthread_wrapper (aux_=0x56163fb3fe70) at lib/ovs-thread.c:383
> #11 0x00007f9f884cf6db in start_thread (arg=0x7f9f72734700) at pthread_create.c:463
> #12 0x00007f9f862bb71f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
> 
> In netdev_offload_dpdk_hw_miss_packet_recover() calls netdev_dpdk_rte_flow_get_restore_info() with a NULL for the struct rte_flow_error *error argument:
> 
>     if (netdev_dpdk_rte_flow_get_restore_info(netdev, packet,
>                                               &rte_restore_info, NULL)) {
>         /* This function is called for every packet, and in most cases there
>          * will be no restore info from the HW, thus error is expected.
>          */
>         return 0;
>     }
> 
> There are 2 "netdev_dpdk_rte_flow_get_restore_info()" functions. One in lib/netdev-dpdk.h and one in lib/netdev-dpdk.c. 
> 
> I don't have the experimental API enabled, so I'm using the function rom lib/netdev-dpdk.h. 

Yes, that's my fault.  I replaced 'error' with NULL, because actual DPDK
implementation supports that and we're not using this error anyway.
But I missed the fact that dummy implementation doesn't support NULL as
argument.  Following change should fix your issue:

diff --git a/lib/netdev-dpdk.h b/lib/netdev-dpdk.h
index 7b77ed8e0..699be3fb4 100644
--- a/lib/netdev-dpdk.h
+++ b/lib/netdev-dpdk.h
@@ -81,6 +81,9 @@ int netdev_dpdk_rte_flow_tunnel_item_release(struct netdev *,
 static inline void
 set_error(struct rte_flow_error *error, enum rte_flow_error_type type)
 {
+    if (!error) {
+        return;
+    }
     error->type = type;
     error->cause = NULL;
     error->message = NULL;
---

Please, try it out.

Best regrds, Ilya Maximets.
Ferriter, Cian June 23, 2021, 3:31 p.m. UTC | #13
> -----Original Message-----
> From: Ilya Maximets <i.maximets@ovn.org>
> Sent: Wednesday 23 June 2021 16:25
> To: Ferriter, Cian <cian.ferriter@intel.com>; Ilya Maximets <i.maximets@ovn.org>; Eli Britstein
> <elibr@nvidia.com>; dev@openvswitch.org
> Cc: Ivan.Malov@oktetlabs.ru; Ameer Mahagneh <ameerm@nvidia.com>; Majd Dibbiny <majd@nvidia.com>
> Subject: Re: [ovs-dev] [PATCH V6 00/13] Netdev vxlan-decap offload
> 
> On 6/23/21 5:18 PM, Ferriter, Cian wrote:
> >> -----Original Message-----
> >> From: dev <ovs-dev-bounces@openvswitch.org> On Behalf Of Ferriter, Cian
> >> Sent: Wednesday 23 June 2021 13:38
> >> To: Ilya Maximets <i.maximets@ovn.org>; Eli Britstein <elibr@nvidia.com>; dev@openvswitch.org
> >> Cc: Ivan.Malov@oktetlabs.ru; Ameer Mahagneh <ameerm@nvidia.com>; Majd Dibbiny <majd@nvidia.com>
> >> Subject: Re: [ovs-dev] [PATCH V6 00/13] Netdev vxlan-decap offload
> >>
> >> Hi all,
> >>
> >> As part of rebasing our AVX512 DPIF on this patchset, I tested this patchset with partial HWOL and
> I'm
> >> seeing strange behaviour.
> >>
> >> I'll report back more detailed findings soon, just wanted to mention this here as soon as I found
> the
> >> issue.
> >>
> >> Thanks,
> >> Cian
> >>
> >
> > More details on the issue I'm seeing:
> > I'm using Ilya's branch from Github:
> > https://github.com/igsilya/ovs/tree/tmp-vxlan-decap
> >
> > ~/ovs_scripts# $OVS_DIR/utilities/ovs-vsctl list Open_vSwitch
> > dpdk_version        : "DPDK 20.11.1"
> > other_config        : {dpdk-hugepage-dir="/mnt/huge", dpdk-init="true", dpdk-lcore-mask="0x1", dpdk-
> socket-mem="2048,0", emc-insert-inv-prob="0", hw-offload="true", pmd-cpu-mask="0x2"}
> >
> > ~/ovs_scripts# $OVS_DIR/utilities/ovs-vsctl show
> > 31584ce5-09c1-44b3-ab27-1a0308d63fff
> >     Bridge br0
> >         datapath_type: netdev
> >         Port br0
> >             Interface br0
> >                 type: internal
> >         Port phy0
> >             Interface phy0
> >                 type: dpdk
> >                 options: {dpdk-devargs="5e:00.0"}
> >
> > ~/ovs_scripts# $OVS_DIR/utilities/ovs-ofctl dump-flows br0
> >  cookie=0x0, duration=29.466s, table=0, n_packets=0, n_bytes=0, in_port=phy0 actions=IN_PORT
> >
> > I'm expecting the flow to be partially offloaded, but I get a segfault when using the above branch.
> More info on the segfault below:
> >
> > Thread 13 "pmd-c01/id:8" received signal SIGSEGV, Segmentation fault.
> > [Switching to Thread 0x7f9f72734700 (LWP 19327)]
> > 0x000056163bf0d825 in set_error (error=0x0, type=RTE_FLOW_ERROR_TYPE_ATTR) at lib/netdev-dpdk.h:84
> > (gdb) bt
> > #0  0x000056163bf0d825 in set_error (error=0x0, type=RTE_FLOW_ERROR_TYPE_ATTR) at lib/netdev-
> dpdk.h:84
> > #1  0x000056163bf0d8d3 in netdev_dpdk_rte_flow_get_restore_info (netdev=0x1bfc65c80, p=0x19033af00,
> info=0x7f9f72729a20, error=0x0) at lib/netdev-dpdk.h:119
> > #2  0x000056163bf14da3 in netdev_offload_dpdk_hw_miss_packet_recover (netdev=0x1bfc65c80,
> packet=0x19033af00) at lib/netdev-offload-dpdk.c:2133
> > #3  0x000056163bde3662 in netdev_hw_miss_packet_recover (netdev=0x1bfc65c80, packet=0x19033af00) at
> lib/netdev-offload.c:265
> > #4  0x000056163bda19a9 in dp_netdev_hw_flow (pmd=0x7f9f72735010, port_no=2, packet=0x19033af00,
> flow=0x7f9f72729b98) at lib/dpif-netdev.c:7087
> > #5  0x000056163bda1c5c in dfc_processing (pmd=0x7f9f72735010, packets_=0x7f9f727310d0,
> keys=0x7f9f7272c480, missed_keys=0x7f9f7272c370, batches=0x7f9f72729f60, n_batches=0x7f9f72730f70,
> flow_map=0x7f9f72729c50, n_flows=0x7f9f72730f78, index_map=0x7f9f72729c30 "", md_is_valid=false,
> port_no=2) at lib/dpif-netdev.c:7168
> > #6  0x000056163bda2f3e in dp_netdev_input__ (pmd=0x7f9f72735010, packets=0x7f9f727310d0,
> md_is_valid=false, port_no=2) at lib/dpif-netdev.c:7475
> > #7  0x000056163bda3105 in dp_netdev_input (pmd=0x7f9f72735010, packets=0x7f9f727310d0, port_no=2) at
> lib/dpif-netdev.c:7519
> > #8  0x000056163bd9ab04 in dp_netdev_process_rxq_port (pmd=0x7f9f72735010, rxq=0x56163fb3f610,
> port_no=2) at lib/dpif-netdev.c:4774
> > #9  0x000056163bd9ee17 in pmd_thread_main (f_=0x7f9f72735010) at lib/dpif-netdev.c:6063
> > #10 0x000056163be71c88 in ovsthread_wrapper (aux_=0x56163fb3fe70) at lib/ovs-thread.c:383
> > #11 0x00007f9f884cf6db in start_thread (arg=0x7f9f72734700) at pthread_create.c:463
> > #12 0x00007f9f862bb71f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
> >
> > In netdev_offload_dpdk_hw_miss_packet_recover() calls netdev_dpdk_rte_flow_get_restore_info() with a
> NULL for the struct rte_flow_error *error argument:
> >
> >     if (netdev_dpdk_rte_flow_get_restore_info(netdev, packet,
> >                                               &rte_restore_info, NULL)) {
> >         /* This function is called for every packet, and in most cases there
> >          * will be no restore info from the HW, thus error is expected.
> >          */
> >         return 0;
> >     }
> >
> > There are 2 "netdev_dpdk_rte_flow_get_restore_info()" functions. One in lib/netdev-dpdk.h and one in
> lib/netdev-dpdk.c.
> >
> > I don't have the experimental API enabled, so I'm using the function rom lib/netdev-dpdk.h.
> 
> Yes, that's my fault.  I replaced 'error' with NULL, because actual DPDK
> implementation supports that and we're not using this error anyway.
> But I missed the fact that dummy implementation doesn't support NULL as
> argument.  Following change should fix your issue:
> 
> diff --git a/lib/netdev-dpdk.h b/lib/netdev-dpdk.h
> index 7b77ed8e0..699be3fb4 100644
> --- a/lib/netdev-dpdk.h
> +++ b/lib/netdev-dpdk.h
> @@ -81,6 +81,9 @@ int netdev_dpdk_rte_flow_tunnel_item_release(struct netdev *,
>  static inline void
>  set_error(struct rte_flow_error *error, enum rte_flow_error_type type)
>  {
> +    if (!error) {
> +        return;
> +    }
>      error->type = type;
>      error->cause = NULL;
>      error->message = NULL;
> ---
> 
> Please, try it out.
> 
> Best regrds, Ilya Maximets.

That fix works for me. Thanks Ilya!
Eli Britstein June 23, 2021, 3:48 p.m. UTC | #14
On 6/23/2021 6:18 PM, Ferriter, Cian wrote:
> External email: Use caution opening links or attachments
>
>
>> -----Original Message-----
>> From: dev <ovs-dev-bounces@openvswitch.org> On Behalf Of Ferriter, Cian
>> Sent: Wednesday 23 June 2021 13:38
>> To: Ilya Maximets <i.maximets@ovn.org>; Eli Britstein <elibr@nvidia.com>; dev@openvswitch.org
>> Cc: Ivan.Malov@oktetlabs.ru; Ameer Mahagneh <ameerm@nvidia.com>; Majd Dibbiny <majd@nvidia.com>
>> Subject: Re: [ovs-dev] [PATCH V6 00/13] Netdev vxlan-decap offload
>>
>> Hi all,
>>
>> As part of rebasing our AVX512 DPIF on this patchset, I tested this patchset with partial HWOL and I'm
>> seeing strange behaviour.
>>
>> I'll report back more detailed findings soon, just wanted to mention this here as soon as I found the
>> issue.
>>
>> Thanks,
>> Cian
>>
> More details on the issue I'm seeing:
> I'm using Ilya's branch from Github:
> https://github.com/igsilya/ovs/tree/tmp-vxlan-decap
>
> ~/ovs_scripts# $OVS_DIR/utilities/ovs-vsctl list Open_vSwitch
> dpdk_version        : "DPDK 20.11.1"
> other_config        : {dpdk-hugepage-dir="/mnt/huge", dpdk-init="true", dpdk-lcore-mask="0x1", dpdk-socket-mem="2048,0", emc-insert-inv-prob="0", hw-offload="true", pmd-cpu-mask="0x2"}
>
> ~/ovs_scripts# $OVS_DIR/utilities/ovs-vsctl show
> 31584ce5-09c1-44b3-ab27-1a0308d63fff
>      Bridge br0
>          datapath_type: netdev
>          Port br0
>              Interface br0
>                  type: internal
>          Port phy0
>              Interface phy0
>                  type: dpdk
>                  options: {dpdk-devargs="5e:00.0"}
>
> ~/ovs_scripts# $OVS_DIR/utilities/ovs-ofctl dump-flows br0
>   cookie=0x0, duration=29.466s, table=0, n_packets=0, n_bytes=0, in_port=phy0 actions=IN_PORT
>
> I'm expecting the flow to be partially offloaded, but I get a segfault when using the above branch. More info on the segfault below:
>
> Thread 13 "pmd-c01/id:8" received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7f9f72734700 (LWP 19327)]
> 0x000056163bf0d825 in set_error (error=0x0, type=RTE_FLOW_ERROR_TYPE_ATTR) at lib/netdev-dpdk.h:84
> (gdb) bt
> #0  0x000056163bf0d825 in set_error (error=0x0, type=RTE_FLOW_ERROR_TYPE_ATTR) at lib/netdev-dpdk.h:84
Yes, it is caused by passing NULL instead of valid struct rte_error, by 
Ilya's comments. I will fix it in v7.
> #1  0x000056163bf0d8d3 in netdev_dpdk_rte_flow_get_restore_info (netdev=0x1bfc65c80, p=0x19033af00, info=0x7f9f72729a20, error=0x0) at lib/netdev-dpdk.h:119
> #2  0x000056163bf14da3 in netdev_offload_dpdk_hw_miss_packet_recover (netdev=0x1bfc65c80, packet=0x19033af00) at lib/netdev-offload-dpdk.c:2133
> #3  0x000056163bde3662 in netdev_hw_miss_packet_recover (netdev=0x1bfc65c80, packet=0x19033af00) at lib/netdev-offload.c:265
> #4  0x000056163bda19a9 in dp_netdev_hw_flow (pmd=0x7f9f72735010, port_no=2, packet=0x19033af00, flow=0x7f9f72729b98) at lib/dpif-netdev.c:7087
> #5  0x000056163bda1c5c in dfc_processing (pmd=0x7f9f72735010, packets_=0x7f9f727310d0, keys=0x7f9f7272c480, missed_keys=0x7f9f7272c370, batches=0x7f9f72729f60, n_batches=0x7f9f72730f70, flow_map=0x7f9f72729c50, n_flows=0x7f9f72730f78, index_map=0x7f9f72729c30 "", md_is_valid=false, port_no=2) at lib/dpif-netdev.c:7168
> #6  0x000056163bda2f3e in dp_netdev_input__ (pmd=0x7f9f72735010, packets=0x7f9f727310d0, md_is_valid=false, port_no=2) at lib/dpif-netdev.c:7475
> #7  0x000056163bda3105 in dp_netdev_input (pmd=0x7f9f72735010, packets=0x7f9f727310d0, port_no=2) at lib/dpif-netdev.c:7519
> #8  0x000056163bd9ab04 in dp_netdev_process_rxq_port (pmd=0x7f9f72735010, rxq=0x56163fb3f610, port_no=2) at lib/dpif-netdev.c:4774
> #9  0x000056163bd9ee17 in pmd_thread_main (f_=0x7f9f72735010) at lib/dpif-netdev.c:6063
> #10 0x000056163be71c88 in ovsthread_wrapper (aux_=0x56163fb3fe70) at lib/ovs-thread.c:383
> #11 0x00007f9f884cf6db in start_thread (arg=0x7f9f72734700) at pthread_create.c:463
> #12 0x00007f9f862bb71f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
>
> In netdev_offload_dpdk_hw_miss_packet_recover() calls netdev_dpdk_rte_flow_get_restore_info() with a NULL for the struct rte_flow_error *error argument:
>
>      if (netdev_dpdk_rte_flow_get_restore_info(netdev, packet,
>                                                &rte_restore_info, NULL)) {
>          /* This function is called for every packet, and in most cases there
>           * will be no restore info from the HW, thus error is expected.
>           */
>          return 0;
>      }
>
> There are 2 "netdev_dpdk_rte_flow_get_restore_info()" functions. One in lib/netdev-dpdk.h and one in lib/netdev-dpdk.c.
>
> I don't have the experimental API enabled, so I'm using the function rom lib/netdev-dpdk.h.
>
>>> -----Original Message-----
>>> From: dev <ovs-dev-bounces@openvswitch.org> On Behalf Of Ilya Maximets
>>> Sent: Tuesday 22 June 2021 16:55
>>> To: Eli Britstein <elibr@nvidia.com>; dev@openvswitch.org; Ilya Maximets <i.maximets@ovn.org>
>>> Cc: Ivan.Malov@oktetlabs.ru; Ameer Mahagneh <ameerm@nvidia.com>; Majd Dibbiny <majd@nvidia.com>
>>> Subject: Re: [ovs-dev] [PATCH V6 00/13] Netdev vxlan-decap offload
>>>
>>> On 4/4/21 11:54 AM, Eli Britstein wrote:
>>>> VXLAN decap in OVS-DPDK configuration consists of two flows:
>>>> F1: in_port(ens1f0),eth(),ipv4(),udp(), actions:tnl_pop(vxlan_sys_4789)
>>>> F2: tunnel(),in_port(vxlan_sys_4789),eth(),ipv4(), actions:ens1f0_0
>>>>
>>>> F1 is a classification flow. It has outer headers matches and it
>>>> classifies the packet as a VXLAN packet, and using tnl_pop action the
>>>> packet continues processing in F2.
>>>> F2 is a flow that has matches on tunnel metadata as well as on the inner
>>>> packet headers (as any other flow).
>>>>
>>>> In order to fully offload VXLAN decap path, both F1 and F2 should be
>>>> offloaded. As there are more than one flow in HW, it is possible that
>>>> F1 is done by HW but F2 is not. Packet is received by SW, and should be
>>>> processed starting from F2 as F1 was already done by HW.
>>>> Rte_flows are applicable only on physical port IDs. Keeping the original
>>>> physical in port on which the packet is received on enables applying
>>>> vport flows (e.g. F2) on that physical port.
>>>>
>>>> This patch-set makes use of [1] introduced in DPDK 20.11, that adds API
>>>> for tunnel offloads.
>>>>
>>>> Note that MLX5 PMD has a bug that the tnl_pop private actions must be
>>>> first. In OVS it is not.
>>>> Fixing this issue is scheduled for 21.05 (and stable 20.11.2).
>>>> Meanwhile, tests were done with a workaround for it [2].
>>>>
>>>> v2-v1:
>>>> - Tracking original in_port, and applying vport on that physical port instead of all PFs.
>>>> v3-v2:
>>>> - Traversing ports using a new API instead of flow_dump.
>>>> - Refactor packet state recover logic, with bug fix for error pop_header.
>>>> - One ref count for netdev in non-tunnel case.
>>>> - Rename variables, comments, rebase.
>>>> v4-v3:
>>>> - Extract orig_in_port from physdev for flow modify.
>>>> - Miss handling fixes.
>>>> v5-v4:
>>>> - Drop refactor offload rule creation commit.
>>>> - Comment about setting in_port in restore.
>>>> - Refactor vports flow offload commit.
>>>> v6-v5:
>>>> - Fixed duplicate netdev ref bug.
>>>>
>>>> Travis:
>>>> v1: https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Ftravis-ci.org%2Fgithub%2Felibritstein%2FOVS%2Fbuilds%2F756418552&amp;data=04%7C01%7Celibr%40nvidia.com%7Ca74658db54f34dda383208d9365a2a66%7C43083d15727340c1b7db39efd9ccc17a%7C0%7C0%7C637600583912503391%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C2000&amp;sdata=kU0ZeAX7jjJVnqdfmCSjZPG3kC9nZ9iotokpSkBnFCI%3D&amp;reserved=0
>>>> v2: https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Ftravis-ci.org%2Fgithub%2Felibritstein%2FOVS%2Fbuilds%2F758382963&amp;data=04%7C01%7Celibr%40nvidia.com%7Ca74658db54f34dda383208d9365a2a66%7C43083d15727340c1b7db39efd9ccc17a%7C0%7C0%7C637600583912503391%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C2000&amp;sdata=kJg%2FQpcTKsc8YYOa6IINCc0s8zYCdIvOp38r4VMNVWU%3D&amp;reserved=0
>>>> v3: https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Ftravis-ci.org%2Fgithub%2Felibritstein%2FOVS%2Fbuilds%2F761089087&amp;data=04%7C01%7Celibr%40nvidia.com%7Ca74658db54f34dda383208d9365a2a66%7C43083d15727340c1b7db39efd9ccc17a%7C0%7C0%7C637600583912503391%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C2000&amp;sdata=aez%2FMC%2BNEyWXqhV%2BHMYvgdwsKoAp1aOAJRzAy8bmISQ%3D&amp;reserved=0
>>>> v4: https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Ftravis-ci.org%2Fgithub%2Felibritstein%2FOVS%2Fbuilds%2F763146966&amp;data=04%7C01%7Celibr%40nvidia.com%7Ca74658db54f34dda383208d9365a2a66%7C43083d15727340c1b7db39efd9ccc17a%7C0%7C0%7C637600583912503391%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C2000&amp;sdata=rCG1WrZ5SDqnKvH6tLWP9wbHgBFNJvqQUI5TLhV%2BD6w%3D&amp;reserved=0
>>>> v5: https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Ftravis-ci.org%2Fgithub%2Felibritstein%2FOVS%2Fbuilds%2F765271879&amp;data=04%7C01%7Celibr%40nvidia.com%7Ca74658db54f34dda383208d9365a2a66%7C43083d15727340c1b7db39efd9ccc17a%7C0%7C0%7C637600583912503391%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C2000&amp;sdata=l8NnWqOmLxdgpvBzMxp9tze4U4uf4uewv4KUnJEz9Nk%3D&amp;reserved=0
>>>> v6: https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Ftravis-ci.org%2Fgithub%2Felibritstein%2FOVS%2Fbuilds%2F765816800&amp;data=04%7C01%7Celibr%40nvidia.com%7Ca74658db54f34dda383208d9365a2a66%7C43083d15727340c1b7db39efd9ccc17a%7C0%7C0%7C637600583912503391%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C2000&amp;sdata=paN242Nu36lzVrNpsKr2cyxXh8W7CJwv4h3kFshgKHs%3D&amp;reserved=0
>>>>
>>>> GitHub Actions:
>>>> v1: https://github.com/elibritstein/OVS/actions/runs/515334647
>>>> v2: https://github.com/elibritstein/OVS/actions/runs/554986007
>>>> v3: https://github.com/elibritstein/OVS/actions/runs/613226225
>>>> v4: https://github.com/elibritstein/OVS/actions/runs/658415274
>>>> v5: https://github.com/elibritstein/OVS/actions/runs/704357369
>>>> v6: https://github.com/elibritstein/OVS/actions/runs/716304028
>>>>
>>>> [1] https://mails.dpdk.org/archives/dev/2020-October/187314.html
>>>> [2] https://github.com/elibritstein/dpdk-stable/pull/1
>>>>
>>>>
>>>> Eli Britstein (10):
>>>>    netdev-offload: Add HW miss packet state recover API
>>>>    netdev-dpdk: Introduce DPDK tunnel APIs
>>>>    netdev-offload: Introduce an API to traverse ports
>>>>    netdev-dpdk: Add flow_api support for netdev vxlan vports
>>>>    netdev-offload-dpdk: Implement HW miss packet recover for vport
>>>>    dpif-netdev: Add HW miss packet state recover logic
>>>>    netdev-offload-dpdk: Change log rate limits
>>>>    netdev-offload-dpdk: Support tunnel pop action
>>>>    netdev-offload-dpdk: Support vports flows offload
>>>>    netdev-dpdk-offload: Add vxlan pattern matching function
>>>>
>>>> Ilya Maximets (2):
>>>>    netdev-offload: Allow offloading to netdev without ifindex.
>>>>    netdev-offload: Disallow offloading to unrelated tunneling vports.
>>>>
>>>> Sriharsha Basavapatna (1):
>>>>    dpif-netdev: Provide orig_in_port in metadata for tunneled packets
>>>>
>>>>   Documentation/howto/dpdk.rst  |   1 +
>>>>   NEWS                          |   2 +
>>>>   lib/dpif-netdev.c             |  97 +++--
>>>>   lib/netdev-dpdk.c             | 118 ++++++
>>>>   lib/netdev-dpdk.h             | 106 ++++-
>>>>   lib/netdev-offload-dpdk.c     | 704 +++++++++++++++++++++++++++++++---
>>>>   lib/netdev-offload-provider.h |   5 +
>>>>   lib/netdev-offload-tc.c       |   8 +
>>>>   lib/netdev-offload.c          |  47 ++-
>>>>   lib/netdev-offload.h          |  10 +
>>>>   lib/packets.h                 |   8 +-
>>>>   11 files changed, 1022 insertions(+), 84 deletions(-)
>>>>
>>> Hi.  I reviewed the whole series and it looks mostly OK to me.
>>> I made a several changes along the way and below you may find a diff
>>> of what I changed.  In short:
>>>
>>>   - Some style fixes: style of function prototypes and a way how long
>>>     function calls wrapped.  Added names to uint32_t arguments to
>>>     function prototypes.
>>>
>>>   - Clarified hw_miss_packet_recover() API.  It needs a note that it
>>>     takes ownership of a packet on error.  Fixed 'return -1' which
>>>     doesn't comply with the API definition (should return positive
>>>     errno).
>>>
>>>   - Moved NEWS updates to correct place.
>>>
>>>   - Added a packet drop counter.
>>>
>>>   - Refactored changes in dfc_processing().  Basically, restored it
>>>     to look mostly like it was before, because LIKELY/UNLIKELY markers
>>>     make no sense for non-offloading cases where they both should be
>>>     'unlikely'.  Also, old way of writing this part allows following
>>>     optimization:
>>>
>>>
>> https://patchwork.ozlabs.org/project/openvswitch/patch/c0b99b4b9be6c290a6aa3cb00049fac4ebfac5d8.161839
>>> 0390.git.bnemeth@redhat.com/
>>>
>>>   - And the only functional change that I made is that I guarded
>>>     actual offloading support with 'ifdef DPDK_EXPERIMENTAL_API',
>>>     because they doesn't make sense to me if packet restoration
>>>     API is not available.  I also added this information to the
>>>     NEWS file.  I think that we should not try to offload TUNNEL_POP
>>>     if we can't restore the tunnel metadata.  And if we can't have
>>>     the first flow in HW, there is no point adding the second one.
>>>
>>> Complete branch with ready-to-push patches available here:
>>>    https://github.com/igsilya/ovs/tree/tmp-vxlan-decap
>>>
>>> Diff below is a diff with this v6 patch-set.  Please, take a look/test.
>>> I'll wait for Ack on changes below or comments, if I broke something,
>>> before pushing this to the main repo.
>>>
>>> Best regards, Ilya Maximets.
>>>
>>> The diff:
>>>
>>> diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst
>>> index 4918d80f3..36314c06a 100644
>>> --- a/Documentation/howto/dpdk.rst
>>> +++ b/Documentation/howto/dpdk.rst
>>> @@ -398,7 +398,7 @@ Supported actions for hardware offload are:
>>>   - VLAN Push/Pop (push_vlan/pop_vlan).
>>>   - Modification of IPv6 (set_field:<ADDR>->ipv6_src/ipv6_dst/mod_nw_ttl).
>>>   - Clone/output (tnl_push and output) for encapsulating over a tunnel.
>>> -- Tunnel pop, for changing from a physical port to a vport.
>>> +- Tunnel pop, for packets received on physical ports.
>>>
>>>   Further Reading
>>>   ---------------
>>> diff --git a/NEWS b/NEWS
>>> index e49f2955d..10b3ab053 100644
>>> --- a/NEWS
>>> +++ b/NEWS
>>> @@ -17,6 +17,10 @@ Post-v2.15.0
>>>        * OVS validated with DPDK 20.11.1. It is recommended to use this version
>>>          until further releases.
>>>        * New debug appctl command 'dpdk/get-malloc-stats'.
>>> +     * Add hardware offload support for tunnel pop action (experimental).
>>> +       Available only if DPDK experimantal APIs enabled during the build.
>>> +     * Add hardware offload support for VXLAN flows (experimental).
>>> +       Available only if DPDK experimantal APIs enabled during the build.
>>>      - ovsdb-tool:
>>>        * New option '--election-timer' to the 'create-cluster' command to set the
>>>          leader election timer during cluster creation.
>>> @@ -44,8 +48,6 @@ v2.15.0 - 15 Feb 2021
>>>      - DPDK:
>>>        * Removed support for vhost-user dequeue zero-copy.
>>>        * Add support for DPDK 20.11.
>>> -     * Add hardware offload support for tunnel pop action (experimental).
>>> -     * Add hardware offload support for VXLAN flows (experimental).
>>>      - Userspace datapath:
>>>        * Add the 'pmd' option to "ovs-appctl dpctl/dump-flows", which
>>>          restricts a flow dump to a single PMD thread if set.
>>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>>> index 1bd828f82..8766a00ea 100644
>>> --- a/lib/dpif-netdev.c
>>> +++ b/lib/dpif-netdev.c
>>> @@ -114,6 +114,7 @@ COVERAGE_DEFINE(datapath_drop_invalid_port);
>>>   COVERAGE_DEFINE(datapath_drop_invalid_bond);
>>>   COVERAGE_DEFINE(datapath_drop_invalid_tnl_port);
>>>   COVERAGE_DEFINE(datapath_drop_rx_invalid_packet);
>>> +COVERAGE_DEFINE(datapath_drop_hw_miss_recover);
>>>
>>>   /* Protects against changes to 'dp_netdevs'. */
>>>   static struct ovs_mutex dp_netdev_mutex = OVS_MUTEX_INITIALIZER;
>>> @@ -7068,9 +7069,8 @@ smc_lookup_batch(struct dp_netdev_pmd_thread *pmd,
>>>       pmd_perf_update_counter(&pmd->perf_stats, PMD_STAT_SMC_HIT, n_smc_hit);
>>>   }
>>>
>>> -static struct tx_port *
>>> -pmd_send_port_cache_lookup(const struct dp_netdev_pmd_thread *pmd,
>>> -                           odp_port_t port_no);
>>> +static struct tx_port * pmd_send_port_cache_lookup(
>>> +    const struct dp_netdev_pmd_thread *pmd, odp_port_t port_no);
>>>
>>>   static inline int
>>>   dp_netdev_hw_flow(const struct dp_netdev_pmd_thread *pmd,
>>> @@ -7081,17 +7081,13 @@ dp_netdev_hw_flow(const struct dp_netdev_pmd_thread *pmd,
>>>       struct tx_port *p;
>>>       uint32_t mark;
>>>
>>> -    if (!netdev_is_flow_api_enabled() || *recirc_depth_get() != 0) {
>>> -        *flow = NULL;
>>> -        return 0;
>>> -    }
>>> -
>>>       /* Restore the packet if HW processing was terminated before completion. */
>>>       p = pmd_send_port_cache_lookup(pmd, port_no);
>>>       if (OVS_LIKELY(p)) {
>>>           int err = netdev_hw_miss_packet_recover(p->port->netdev, packet);
>>>
>>> -        if (err != 0 && err != EOPNOTSUPP) {
>>> +        if (err && err != EOPNOTSUPP) {
>>> +            COVERAGE_INC(datapath_drop_hw_miss_recover);
>>>               return -1;
>>>           }
>>>       }
>>> @@ -7168,25 +7164,28 @@ dfc_processing(struct dp_netdev_pmd_thread *pmd,
>>>               pkt_metadata_init(&packet->md, port_no);
>>>           }
>>>
>>> -        if (OVS_UNLIKELY(dp_netdev_hw_flow(pmd, port_no, packet, &flow))) {
>>> -            /* Packet restoration failed. Its mbuf was freed, do not continue
>>> -             * processing.
>>> -             */
>>> -            continue;
>>> -        } else if (OVS_LIKELY(flow)) {
>>> -            tcp_flags = parse_tcp_flags(packet);
>>> -            if (OVS_LIKELY(batch_enable)) {
>>> -                dp_netdev_queue_batches(packet, flow, tcp_flags, batches,
>>> -                                        n_batches);
>>> -            } else {
>>> -                /* Flow batching should be performed only after fast-path
>>> -                 * processing is also completed for packets with emc miss
>>> -                 * or else it will result in reordering of packets with
>>> -                 * same datapath flows. */
>>> -                packet_enqueue_to_flow_map(packet, flow, tcp_flags, flow_map,
>>> -                                           map_cnt++);
>>> +        if (netdev_is_flow_api_enabled() && *recirc_depth_get() == 0) {
>>> +            if (OVS_UNLIKELY(dp_netdev_hw_flow(pmd, port_no, packet, &flow))) {
>>> +                /* Packet restoration failed and it was dropped, do not
>>> +                 * continue processing.
>>> +                 */
>>> +                continue;
>>> +            }
>>> +            if (OVS_LIKELY(flow)) {
>>> +                tcp_flags = parse_tcp_flags(packet);
>>> +                if (OVS_LIKELY(batch_enable)) {
>>> +                    dp_netdev_queue_batches(packet, flow, tcp_flags, batches,
>>> +                                            n_batches);
>>> +                } else {
>>> +                    /* Flow batching should be performed only after fast-path
>>> +                     * processing is also completed for packets with emc miss
>>> +                     * or else it will result in reordering of packets with
>>> +                     * same datapath flows. */
>>> +                    packet_enqueue_to_flow_map(packet, flow, tcp_flags,
>>> +                                               flow_map, map_cnt++);
>>> +                }
>>> +                continue;
>>>               }
>>> -            continue;
>>>           }
>>>
>>>           miniflow_extract(packet, &key->mf);
>>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
>>> index 6e35d0574..bc5485d60 100644
>>> --- a/lib/netdev-dpdk.c
>>> +++ b/lib/netdev-dpdk.c
>>> @@ -5365,11 +5365,11 @@ netdev_dpdk_rte_flow_get_restore_info(struct netdev *netdev,
>>>   }
>>>
>>>   int
>>> -netdev_dpdk_rte_flow_tunnel_action_decap_release
>>> -    (struct netdev *netdev,
>>> -     struct rte_flow_action *actions,
>>> -     uint32_t num_of_actions,
>>> -     struct rte_flow_error *error)
>>> +netdev_dpdk_rte_flow_tunnel_action_decap_release(
>>> +    struct netdev *netdev,
>>> +    struct rte_flow_action *actions,
>>> +    uint32_t num_of_actions,
>>> +    struct rte_flow_error *error)
>>>   {
>>>       struct netdev_dpdk *dev;
>>>       int ret;
>>> diff --git a/lib/netdev-dpdk.h b/lib/netdev-dpdk.h
>>> index 3b9bf8681..7b77ed8e0 100644
>>> --- a/lib/netdev-dpdk.h
>>> +++ b/lib/netdev-dpdk.h
>>> @@ -53,33 +53,28 @@ netdev_dpdk_get_port_id(struct netdev *netdev);
>>>
>>>   #ifdef ALLOW_EXPERIMENTAL_API
>>>
>>> -int
>>> -netdev_dpdk_rte_flow_tunnel_decap_set(struct netdev *,
>>> +int netdev_dpdk_rte_flow_tunnel_decap_set(struct netdev *,
>>> +                                          struct rte_flow_tunnel *,
>>> +                                          struct rte_flow_action **,
>>> +                                          uint32_t *num_of_actions,
>>> +                                          struct rte_flow_error *);
>>> +int netdev_dpdk_rte_flow_tunnel_match(struct netdev *,
>>>                                         struct rte_flow_tunnel *,
>>> -                                      struct rte_flow_action **,
>>> -                                      uint32_t *,
>>> -                                      struct rte_flow_error *);
>>> -int
>>> -netdev_dpdk_rte_flow_tunnel_match(struct netdev *,
>>> -                                  struct rte_flow_tunnel *,
>>> -                                  struct rte_flow_item **,
>>> -                                  uint32_t *,
>>> -                                  struct rte_flow_error *);
>>> -int
>>> -netdev_dpdk_rte_flow_get_restore_info(struct netdev *,
>>> -                                      struct dp_packet *,
>>> -                                      struct rte_flow_restore_info *,
>>> +                                      struct rte_flow_item **,
>>> +                                      uint32_t *num_of_items,
>>>                                         struct rte_flow_error *);
>>> -int
>>> -netdev_dpdk_rte_flow_tunnel_action_decap_release(struct netdev *,
>>> -                                                 struct rte_flow_action *,
>>> -                                                 uint32_t,
>>> -                                                 struct rte_flow_error *);
>>> -int
>>> -netdev_dpdk_rte_flow_tunnel_item_release(struct netdev *,
>>> -                                         struct rte_flow_item *,
>>> -                                         uint32_t,
>>> -                                         struct rte_flow_error *);
>>> +int netdev_dpdk_rte_flow_get_restore_info(struct netdev *,
>>> +                                          struct dp_packet *,
>>> +                                          struct rte_flow_restore_info *,
>>> +                                          struct rte_flow_error *);
>>> +int netdev_dpdk_rte_flow_tunnel_action_decap_release(struct netdev *,
>>> +                                                     struct rte_flow_action *,
>>> +                                                     uint32_t num_of_actions,
>>> +                                                     struct rte_flow_error *);
>>> +int netdev_dpdk_rte_flow_tunnel_item_release(struct netdev *,
>>> +                                             struct rte_flow_item *,
>>> +                                             uint32_t num_of_items,
>>> +                                             struct rte_flow_error *);
>>>
>>>   #else
>>>
>>> @@ -92,14 +87,13 @@ set_error(struct rte_flow_error *error, enum rte_flow_error_type type)
>>>   }
>>>
>>>   static inline int
>>> -netdev_dpdk_rte_flow_tunnel_decap_set(struct netdev *netdev OVS_UNUSED,
>>> -                                      struct rte_flow_tunnel *tunnel,
>>> -                                      struct rte_flow_action **actions,
>>> -                                      uint32_t *num_of_actions OVS_UNUSED,
>>> -                                      struct rte_flow_error *error)
>>> +netdev_dpdk_rte_flow_tunnel_decap_set(
>>> +    struct netdev *netdev OVS_UNUSED,
>>> +    struct rte_flow_tunnel *tunnel OVS_UNUSED,
>>> +    struct rte_flow_action **actions OVS_UNUSED,
>>> +    uint32_t *num_of_actions OVS_UNUSED,
>>> +    struct rte_flow_error *error)
>>>   {
>>> -    (void) tunnel;
>>> -    (void) actions;
>>>       set_error(error, RTE_FLOW_ERROR_TYPE_ACTION);
>>>       return -1;
>>>   }
>>> @@ -116,34 +110,34 @@ netdev_dpdk_rte_flow_tunnel_match(struct netdev *netdev OVS_UNUSED,
>>>   }
>>>
>>>   static inline int
>>> -netdev_dpdk_rte_flow_get_restore_info(struct netdev *netdev OVS_UNUSED,
>>> -                                      struct dp_packet *p OVS_UNUSED,
>>> -                                      struct rte_flow_restore_info *info,
>>> -                                      struct rte_flow_error *error)
>>> +netdev_dpdk_rte_flow_get_restore_info(
>>> +    struct netdev *netdev OVS_UNUSED,
>>> +    struct dp_packet *p OVS_UNUSED,
>>> +    struct rte_flow_restore_info *info OVS_UNUSED,
>>> +    struct rte_flow_error *error)
>>>   {
>>> -    (void) info;
>>>       set_error(error, RTE_FLOW_ERROR_TYPE_ATTR);
>>>       return -1;
>>>   }
>>>
>>>   static inline int
>>> -netdev_dpdk_rte_flow_tunnel_action_decap_release
>>> -    (struct netdev *netdev OVS_UNUSED,
>>> -     struct rte_flow_action *actions OVS_UNUSED,
>>> -     uint32_t num_of_actions OVS_UNUSED,
>>> -     struct rte_flow_error *error)
>>> +netdev_dpdk_rte_flow_tunnel_action_decap_release(
>>> +    struct netdev *netdev OVS_UNUSED,
>>> +    struct rte_flow_action *actions OVS_UNUSED,
>>> +    uint32_t num_of_actions OVS_UNUSED,
>>> +    struct rte_flow_error *error)
>>>   {
>>>       set_error(error, RTE_FLOW_ERROR_TYPE_NONE);
>>>       return 0;
>>>   }
>>>
>>>   static inline int
>>> -netdev_dpdk_rte_flow_tunnel_item_release(struct netdev *netdev OVS_UNUSED,
>>> -                                         struct rte_flow_item *items,
>>> -                                         uint32_t num_of_items OVS_UNUSED,
>>> -                                         struct rte_flow_error *error)
>>> +netdev_dpdk_rte_flow_tunnel_item_release(
>>> +    struct netdev *netdev OVS_UNUSED,
>>> +    struct rte_flow_item *items OVS_UNUSED,
>>> +    uint32_t num_of_items OVS_UNUSED,
>>> +    struct rte_flow_error *error)
>>>   {
>>> -    (void) items;
>>>       set_error(error, RTE_FLOW_ERROR_TYPE_NONE);
>>>       return 0;
>>>   }
>>> diff --git a/lib/netdev-offload-dpdk.c b/lib/netdev-offload-dpdk.c
>>> index 52a74a707..363f32f71 100644
>>> --- a/lib/netdev-offload-dpdk.c
>>> +++ b/lib/netdev-offload-dpdk.c
>>> @@ -459,7 +459,7 @@ dump_flow_action(struct ds *s, struct ds *s_extra,
>>>                  act_index >= flow_actions->tnl_pmd_actions_pos &&
>>>                  act_index < flow_actions->tnl_pmd_actions_pos +
>>>                              flow_actions->tnl_pmd_actions_cnt) {
>>> -        /* Opaque PMD tunnel actions is skipped. */
>>> +        /* Opaque PMD tunnel actions are skipped. */
>>>           return;
>>>       } else if (actions->type == RTE_FLOW_ACTION_TYPE_MARK) {
>>>           const struct rte_flow_action_mark *mark = actions->conf;
>>> @@ -792,9 +792,9 @@ free_flow_actions(struct flow_actions *actions)
>>>       for (i = 0; i < actions->cnt; i++) {
>>>           if (actions->tnl_pmd_actions_cnt &&
>>>               i == actions->tnl_pmd_actions_pos) {
>>> -            if (netdev_dpdk_rte_flow_tunnel_action_decap_release
>>> -                    (actions->tnl_netdev, actions->tnl_pmd_actions,
>>> -                     actions->tnl_pmd_actions_cnt, &error)) {
>>> +            if (netdev_dpdk_rte_flow_tunnel_action_decap_release(
>>> +                    actions->tnl_netdev, actions->tnl_pmd_actions,
>>> +                    actions->tnl_pmd_actions_cnt, &error)) {
>>>                   VLOG_DBG_RL(&rl, "%s: "
>>>                               "netdev_dpdk_rte_flow_tunnel_action_decap_release "
>>>                               "failed: %d (%s).",
>>> @@ -1009,7 +1009,7 @@ parse_vxlan_match(struct flow_patterns *patterns,
>>>       return 0;
>>>   }
>>>
>>> -static int
>>> +static int OVS_UNUSED

Note that if experimental is allowed, the OVS_UNUSED attribute is 
misleading.

Also see below.

>>>   parse_flow_tnl_match(struct netdev *tnldev,
>>>                        struct flow_patterns *patterns,
>>>                        odp_port_t orig_in_port,
>>> @@ -1031,7 +1031,7 @@ parse_flow_tnl_match(struct netdev *tnldev,
>>>
>>>   static int
>>>   parse_flow_match(struct netdev *netdev,
>>> -                 odp_port_t orig_in_port,
>>> +                 odp_port_t orig_in_port OVS_UNUSED,
>>>                    struct flow_patterns *patterns,
>>>                    struct match *match)
>>>   {
>>> @@ -1045,10 +1045,12 @@ parse_flow_match(struct netdev *netdev,
>>>       }
>>>
>>>       patterns->physdev = netdev;
>>> +#ifdef ALLOW_EXPERIMENTAL_API /* Packet restoration API required. */

In my opinion those should be removed in netdev-offload-dpdk.c, and keep 
such #ifdef only in netdev-dpdk (with stubs), so later, when dpdk 
removes the experimental attribute, there will be a single place to change.

This applies both to parse_flow_tnl_match and add_tnl_pop_action.

However, this is not critical and I would not hold the merge because of 
this.
>>>       if (netdev_vport_is_vport_class(netdev->netdev_class) &&
>>>           parse_flow_tnl_match(netdev, patterns, orig_in_port, match)) {
>>>           return -1;
>>>       }
>>> +#endif
>>>       memset(&consumed_masks->in_port, 0, sizeof consumed_masks->in_port);
>>>       /* recirc id must be zero. */
>>>       if (match->wc.masks.recirc_id & match->flow.recirc_id) {
>>> @@ -1679,7 +1681,7 @@ add_jump_action(struct flow_actions *actions, uint32_t group)
>>>       add_flow_action(actions, RTE_FLOW_ACTION_TYPE_JUMP, jump);
>>>   }
>>>
>>> -static int
>>> +static int OVS_UNUSED
>>>   add_tnl_pop_action(struct netdev *netdev,
>>>                      struct flow_actions *actions,
>>>                      const struct nlattr *nla)
>>> @@ -1767,10 +1769,12 @@ parse_flow_actions(struct netdev *netdev,
>>>                                       clone_actions_len)) {
>>>                   return -1;
>>>               }
>>> +#ifdef ALLOW_EXPERIMENTAL_API /* Packet restoration API required. */
>>>           } else if (nl_attr_type(nla) == OVS_ACTION_ATTR_TUNNEL_POP) {
>>>               if (add_tnl_pop_action(netdev, actions, nla)) {
>>>                   return -1;
>>>               }
>>> +#endif
>>>           } else {
>>>               VLOG_DBG_RL(&rl, "Unsupported action type %d", nl_attr_type(nla));
>>>               return -1;
>>> @@ -2067,7 +2071,7 @@ get_vxlan_netdev_cb(struct netdev *netdev,
>>>       struct get_vport_netdev_aux *aux = aux_;
>>>
>>>       if (strcmp(netdev_get_type(netdev), "vxlan")) {
>>> -       return false;
>>> +        return false;
>>>       }
>>>
>>>       tnl_cfg = netdev_get_tunnel_config(netdev);
>>> @@ -2121,18 +2125,16 @@ netdev_offload_dpdk_hw_miss_packet_recover(struct netdev *netdev,
>>>       struct rte_flow_restore_info rte_restore_info;
>>>       struct rte_flow_tunnel *rte_tnl;
>>>       struct netdev *vport_netdev;
>>> -    struct rte_flow_error error;
>>>       struct pkt_metadata *md;
>>>       struct flow_tnl *md_tnl;
>>>       odp_port_t vport_odp;
>>>       int ret = 0;
>>>
>>>       if (netdev_dpdk_rte_flow_get_restore_info(netdev, packet,
>>> -                                              &rte_restore_info, &error)) {
>>> +                                              &rte_restore_info, NULL)) {
>>>           /* This function is called for every packet, and in most cases there
>>>            * will be no restore info from the HW, thus error is expected.
>>>            */
>>> -        (void) error;
>>>           return 0;
>>>       }
>>>
>>> @@ -2144,25 +2146,25 @@ netdev_offload_dpdk_hw_miss_packet_recover(struct netdev *netdev,
>>>       vport_netdev = get_vport_netdev(netdev->dpif_type, rte_tnl,
>>>                                       &vport_odp);
>>>       if (!vport_netdev) {
>>> -        VLOG_WARN("Could not find vport netdev");
>>> +        VLOG_WARN_RL(&rl, "Could not find vport netdev");
>>>           return EOPNOTSUPP;
>>>       }
>>>
>>>       md = &packet->md;
>>>       /* For tunnel recovery (RTE_FLOW_RESTORE_INFO_TUNNEL), it is possible
>>> -     * to have the packet to still be encapsulated, or not
>>> -     * (RTE_FLOW_RESTORE_INFO_ENCAPSULATED).
>>> +     * to have the packet to still be encapsulated, or not.  This is reflected
>>> +     * by the RTE_FLOW_RESTORE_INFO_ENCAPSULATED flag.
>>>        * In the case it is on, the packet is still encapsulated, and we do
>>>        * the pop in SW.
>>>        * In the case it is off, the packet is already decapsulated by HW, and
>>> -     * the tunnel info is provided in the tunnel struct. For this case we
>>> +     * the tunnel info is provided in the tunnel struct.  For this case we
>>>        * take it to OVS metadata.
>>>        */
>>>       if (rte_restore_info.flags & RTE_FLOW_RESTORE_INFO_ENCAPSULATED) {
>>>           if (!vport_netdev->netdev_class ||
>>>               !vport_netdev->netdev_class->pop_header) {
>>> -            VLOG_ERR("vport nedtdev=%s with no pop_header method",
>>> -                     netdev_get_name(vport_netdev));
>>> +            VLOG_ERR_RL(&rl, "vport nedtdev=%s with no pop_header method",
>>> +                        netdev_get_name(vport_netdev));
>>>               ret = EOPNOTSUPP;
>>>               goto close_vport_netdev;
>>>           }
>>> @@ -2171,7 +2173,7 @@ netdev_offload_dpdk_hw_miss_packet_recover(struct netdev *netdev,
>>>               /* If there is an error with popping the header, the packet is
>>>                * freed. In this case it should not continue SW processing.
>>>                */
>>> -            ret = -1;
>>> +            ret = EINVAL;
>>>               goto close_vport_netdev;
>>>           }
>>>       } else {
>>> diff --git a/lib/netdev-offload-provider.h b/lib/netdev-offload-provider.h
>>> index f24c7dd19..348ca7081 100644
>>> --- a/lib/netdev-offload-provider.h
>>> +++ b/lib/netdev-offload-provider.h
>>> @@ -89,7 +89,8 @@ struct netdev_flow_api {
>>>
>>>       /* Recover the packet state (contents and data) for continued processing
>>>        * in software.
>>> -     * Return 0 if successful, otherwise returns a positive errno value. */
>>> +     * Return 0 if successful, otherwise returns a positive errno value and
>>> +     * takes ownership of a packet if errno != EOPNOTSUPP. */
>>>       int (*hw_miss_packet_recover)(struct netdev *, struct dp_packet *);
>>>
>>>       /* Initializies the netdev flow api.
>>> ---
>>> _______________________________________________
>>> dev mailing list
>>> dev@openvswitch.org
>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fmail.openvswitch.org%2Fmailman%2Flistinfo%2Fovs-dev&amp;data=04%7C01%7Celibr%40nvidia.com%7Ca74658db54f34dda383208d9365a2a66%7C43083d15727340c1b7db39efd9ccc17a%7C0%7C0%7C637600583912503391%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C2000&amp;sdata=%2B0FlHTBOVLb1A5dT3IddrxmRAWFtdVtHXQ0K9YH6M0M%3D&amp;reserved=0
>> _______________________________________________
>> dev mailing list
>> dev@openvswitch.org
>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fmail.openvswitch.org%2Fmailman%2Flistinfo%2Fovs-dev&amp;data=04%7C01%7Celibr%40nvidia.com%7Ca74658db54f34dda383208d9365a2a66%7C43083d15727340c1b7db39efd9ccc17a%7C0%7C0%7C637600583912503391%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C2000&amp;sdata=%2B0FlHTBOVLb1A5dT3IddrxmRAWFtdVtHXQ0K9YH6M0M%3D&amp;reserved=0
Ilya Maximets June 23, 2021, 6:49 p.m. UTC | #15
On 6/23/21 5:48 PM, Eli Britstein wrote:
>>>> @@ -1009,7 +1009,7 @@ parse_vxlan_match(struct flow_patterns *patterns,
>>>>       return 0;
>>>>   }
>>>>
>>>> -static int
>>>> +static int OVS_UNUSED
> 
> Note that if experimental is allowed, the OVS_UNUSED attribute is misleading.

Yeah, I know.  We might introduce OVS_MAY_BE_UNUSED macro someday, but that is
really minor.

> 
> Also see below.
> 
>>>>   parse_flow_tnl_match(struct netdev *tnldev,
>>>>                        struct flow_patterns *patterns,
>>>>                        odp_port_t orig_in_port,
>>>> @@ -1031,7 +1031,7 @@ parse_flow_tnl_match(struct netdev *tnldev,
>>>>
>>>>   static int
>>>>   parse_flow_match(struct netdev *netdev,
>>>> -                 odp_port_t orig_in_port,
>>>> +                 odp_port_t orig_in_port OVS_UNUSED,
>>>>                    struct flow_patterns *patterns,
>>>>                    struct match *match)
>>>>   {
>>>> @@ -1045,10 +1045,12 @@ parse_flow_match(struct netdev *netdev,
>>>>       }
>>>>
>>>>       patterns->physdev = netdev;
>>>> +#ifdef ALLOW_EXPERIMENTAL_API /* Packet restoration API required. */
> 
> In my opinion those should be removed in netdev-offload-dpdk.c, and keep such #ifdef only in netdev-dpdk (with stubs), so later, when dpdk removes the experimental attribute, there will be a single place to change.
> 
> This applies both to parse_flow_tnl_match and add_tnl_pop_action.
> 
> However, this is not critical and I would not hold the merge because of this.

I agree that it's a bit of an overthinking from my side, but we will
need to introduce this kind of guarding here if DPDK APIs will become
non-experimental not all at once.  I'm not sure if that is a possible
scenario, but just in case.  Looking more at the code, I agree that
they are unnecessary for current version, but let them be, as they will
remind us to re-check things once some of APIs will become stable.

Best regards, Ilya Maximets.