[ovs-dev,v2,0/8] OVS-DPDK flow offload with rte_flow

Message ID 1504603381-30071-1-git-send-email-yliu@fridaylinux.org
Series OVS-DPDK flow offload with rte_flow

Message

Yuanhan Liu Sept. 5, 2017, 9:22 a.m. UTC
Hi,

Here is a joint work from Mellanox and Napatech, to enable the flow hw
offload with the DPDK generic flow interface (rte_flow).

The basic idea is to associate the flow with a mark id (a uint32_t number).
Later, we can then get the flow directly from the mark id, bypassing the
heavy EMC processing, including miniflow_extract.
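
As an aside for reviewers, below is a minimal, illustrative sketch of how a
mark id could be tied to a datapath flow with OVS's concurrent hash map
(lib/cmap.h). It is not the code from patch 1; the struct and helper names
are invented for illustration, and the cmap is assumed to have been set up
with cmap_init() and to be written under the usual writer locking.

    /* Illustrative only: map a NIC mark id to a datapath flow. */
    #include <stdint.h>
    #include "cmap.h"
    #include "hash.h"
    #include "util.h"

    struct dp_netdev_flow;              /* Defined in lib/dpif-netdev.c. */

    struct mark_to_flow {
        struct cmap_node node;          /* Node within the mark map. */
        uint32_t mark;                  /* Mark id programmed into the NIC. */
        struct dp_netdev_flow *flow;
    };

    /* Writer side: called when a flow is offloaded with mark 'mark'. */
    static void
    mark_to_flow_associate(struct cmap *map, uint32_t mark,
                           struct dp_netdev_flow *flow)
    {
        struct mark_to_flow *m2f = xmalloc(sizeof *m2f);

        m2f->mark = mark;
        m2f->flow = flow;
        cmap_insert(map, &m2f->node, hash_int(mark, 0));
    }

    /* Reader side: called from the datapath with the mark from the mbuf. */
    static struct dp_netdev_flow *
    mark_to_flow_find(const struct cmap *map, uint32_t mark)
    {
        struct mark_to_flow *m2f;

        CMAP_FOR_EACH_WITH_HASH (m2f, node, hash_int(mark, 0), map) {
            if (m2f->mark == mark) {
                return m2f->flow;
            }
        }
        return NULL;
    }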

The association is done with CMAP in patch 1. It also reuses the flow
APIs introduced while adding the tc offloads. The emc bypassing is done
in patch 2. The flow offload is done in patch 4, which mainly does two
things:

- translate the ovs match to DPDK rte flow patterns
- bind those patterns with a MARK action.

Afterwards, the NIC will set the mark id in every pkt's mbuf when it
matches the flow. That's basically how we could get the flow directly
from the received mbuf.
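
For readers less familiar with rte_flow, here is the general shape of the two
halves, as a simplified sketch rather than the code in patch 4: installing a
MARK rule, and reading the mark back from the mbuf on receive. The pattern
below carries no spec/mask (the real code translates the OVS match into
concrete item specs and masks), and exact types such as the port id width
vary between DPDK releases.

    #include <stdbool.h>
    #include <stdint.h>
    #include <rte_flow.h>
    #include <rte_mbuf.h>

    /* Install an ingress rule that tags matching packets with 'mark_id'. */
    static struct rte_flow *
    install_mark_rule(uint16_t port_id, uint32_t mark_id,
                      struct rte_flow_error *error)
    {
        struct rte_flow_attr attr = { .ingress = 1 };
        struct rte_flow_item pattern[] = {
            { .type = RTE_FLOW_ITEM_TYPE_ETH },   /* spec/mask omitted */
            { .type = RTE_FLOW_ITEM_TYPE_IPV4 },  /* for brevity.      */
            { .type = RTE_FLOW_ITEM_TYPE_END },
        };
        struct rte_flow_action_mark mark = { .id = mark_id };
        struct rte_flow_action actions[] = {
            { .type = RTE_FLOW_ACTION_TYPE_MARK, .conf = &mark },
            { .type = RTE_FLOW_ACTION_TYPE_END },
        };

        return rte_flow_create(port_id, &attr, pattern, actions, error);
    }

    /* On receive, a packet that matched the rule carries the mark. */
    static bool
    mbuf_get_flow_mark(const struct rte_mbuf *m, uint32_t *mark)
    {
        if (m->ol_flags & PKT_RX_FDIR_ID) {
            *mark = m->hash.fdir.hi;
            return true;
        }
        return false;
    }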

While testing PHY-PHY forwarding with one core and one queue, I got about
a 54% performance boost. For PHY-vhost forwarding, I got about a 41%
performance boost. The reason it's lower than v1 is that I added the logic
to get the correct tcp_flags, which examines all packets received.

The major issue mentioned in the last version has also been worked around:
the queue index is never blindly set to 0 anymore, but is set to the rxq
that first receives the upcall packet.

Note that the offload is disabled by default; it can be enabled with:

    $ ovs-vsctl set Open_vSwitch . other_config:hw-offload=true


v2: - worked around the queue action issue
    - fixed the tcp_flags being skipped issue, which also fixed the
      build warnings
    - fixed L2 patterns for Intel NICs
    - converted some macros to functions
    - stopped hardcoding the max number of flows/actions
    - rebased on top of the latest code

Thanks.

    --yliu


---
Finn Christensen (3):
  netdev-dpdk: implement flow put with rte flow
  netdev-dpdk: retry with queue action
  netdev-dpdk: set FDIR config

Shachar Beiser (1):
  dpif-netdev: record rx queue id for the upcall

Yuanhan Liu (4):
  dpif-netdev: associate flow with a mark id
  dpif-netdev: retrieve flow directly from the flow mark
  netdev-dpdk: convert ufid to dpdk flow
  netdev-dpdk: remove offloaded flow on deletion

 lib/dp-packet.h   |  14 ++
 lib/dpif-netdev.c | 132 +++++++++++--
 lib/flow.c        |  78 ++++++++
 lib/flow.h        |   1 +
 lib/netdev-dpdk.c | 574 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
 lib/netdev.c      |   1 +
 lib/netdev.h      |   7 +
 7 files changed, 795 insertions(+), 12 deletions(-)

Comments

Patil, Harish Sept. 6, 2017, 6:33 a.m. UTC | #1
-----Original Message-----
From: <ovs-dev-bounces@openvswitch.org> on behalf of Yuanhan Liu
<yliu@fridaylinux.org>
Date: Tuesday, September 5, 2017 at 2:22 AM
To: "dev@openvswitch.org" <dev@openvswitch.org>
Subject: [ovs-dev] [PATCH v2 0/8] OVS-DPDK flow offload with rte_flow

>Hi,
>
>Here is a joint work from Mellanox and Napatech, to enable the flow hw
>offload with the DPDK generic flow interface (rte_flow).
>
>The basic idea is to associate the flow with a mark id (a uint32_t
>number).
>Later, we then get the flow directly from the mark id, bypassing the heavy
>emc processing, including miniflow_extract.
>
>The association is done with CMAP in patch 1. It also reuses the flow
>APIs introduced while adding the tc offloads. The emc bypassing is done
>in patch 2. The flow offload is done in patch 4, which mainly does two
>things:
>
>- translate the ovs match to DPDK rte flow patterns
>- bind those patterns with a MARK action.
>
>Afterwards, the NIC will set the mark id in every pkt's mbuf when it
>matches the flow. That's basically how we could get the flow directly
>from the received mbuf.
>
>While testing with PHY-PHY forwarding with one core and one queue, I got
>about 54% performance boost. For PHY-vhost forwarding, I got about 41%
>performance boost. The reason it's lower than v1 is I added the logic
>to get the correct tcp_flags, which examines all packets received.
>
>The major issue mentioned in the last version is also worked around: the
>queue index is never set to 0 blindly anymore, but set to the rxq that
>first receives the upcall pkt.
>
>Note that it's disabled by default, which can be enabled by:
>
>    $ ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
>
>
>v2: - workaround the queue action issue
>    - fixed the tcp_flags being skipped issue, which also fixed the
>      build warnings
>    - fixed l2 patterns for Intel nic
>    - Converted some macros to functions
>    - did not hardcode the max number of flow/action
>    - rebased on top of the latest code
>
>Thanks.
>
>    --yliu
>
>
>---
>Finn Christensen (3):
>  netdev-dpdk: implement flow put with rte flow
>  netdev-dpdk: retry with queue action
>  netdev-dpdk: set FDIR config
>
>Shachar Beiser (1):
>  dpif-netdev: record rx queue id for the upcall
>
>Yuanhan Liu (4):
>  dpif-netdev: associate flow with a mark id
>  dpif-netdev: retrieve flow directly from the flow mark
>  netdev-dpdk: convert ufid to dpdk flow
>  netdev-dpdk: remove offloaded flow on deletion
>
> lib/dp-packet.h   |  14 ++
> lib/dpif-netdev.c | 132 +++++++++++--
> lib/flow.c        |  78 ++++++++
> lib/flow.h        |   1 +
> lib/netdev-dpdk.c | 574
>+++++++++++++++++++++++++++++++++++++++++++++++++++++-
> lib/netdev.c      |   1 +
> lib/netdev.h      |   7 +
> 7 files changed, 795 insertions(+), 12 deletions(-)
>
>-- 
>2.7.4
>

Hi all,

Can you please confirm that you are supporting offloading of both the EMC
flows and DPCLs (megaflows) here, i.e. OVS would skip hash table lookups
in both the cases if UFID is provided in the MBUF. Assuming that is
correct, when a match is found in dpcls, does OVS insert that new flow
back into the EMC cache?

Thanks,
Harish


>
Finn Christensen Sept. 6, 2017, 6:53 a.m. UTC | #2
-----Original Message-----
From: Patil, Harish [mailto:Harish.Patil@cavium.com] 
Sent: 6. september 2017 08:34
To: Yuanhan Liu <yliu@fridaylinux.org>; dev@openvswitch.org
Cc: Finn Christensen <fc@napatech.com>; dball@vmware.com
Subject: Re: [ovs-dev] [PATCH v2 0/8] OVS-DPDK flow offload with rte_flow



-----Original Message-----
From: <ovs-dev-bounces@openvswitch.org> on behalf of Yuanhan Liu <yliu@fridaylinux.org>
Date: Tuesday, September 5, 2017 at 2:22 AM
To: "dev@openvswitch.org" <dev@openvswitch.org>
Subject: [ovs-dev] [PATCH v2 0/8] OVS-DPDK flow offload with rte_flow

>Hi,
>
>Here is a joint work from Mellanox and Napatech, to enable the flow hw 
>offload with the DPDK generic flow interface (rte_flow).
>
>The basic idea is to associate the flow with a mark id (a uint32_t
>number).
>Later, we then get the flow directly from the mark id, bypassing the 
>heavy emc processing, including miniflow_extract.
>
>The association is done with CMAP in patch 1. It also reuses the flow 
>APIs introduced while adding the tc offloads. The emc bypassing is done 
>in patch 2. The flow offload is done in patch 4, which mainly does two
>things:
>
>- translate the ovs match to DPDK rte flow patterns
>- bind those patterns with a MARK action.
>
>Afterwards, the NIC will set the mark id in every pkt's mbuf when it 
>matches the flow. That's basically how we could get the flow directly 
>from the received mbuf.
>
>While testing with PHY-PHY forwarding with one core and one queue, I 
>got about 54% performance boost. For PHY-vhost forwarding, I got about 
>41% performance boost. The reason it's lower than v1 is I added the 
>logic to get the correct tcp_flags, which examines all packets received.
>
>The major issue mentioned in the last version is also worked around: the
>queue index is never set to 0 blindly anymore, but set to the rxq that 
>first receives the upcall pkt.
>
>Note that it's disabled by default, which can be enabled by:
>
>    $ ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
>
>
>v2: - workaround the queue action issue
>    - fixed the tcp_flags being skipped issue, which also fixed the
>      build warnings
>    - fixed l2 patterns for Intel nic
>    - Converted some macros to functions
>    - did not hardcode the max number of flow/action
>    - rebased on top of the latest code
>
>Thanks.
>
>    --yliu
>
>
>---
>Finn Christensen (3):
>  netdev-dpdk: implement flow put with rte flow
>  netdev-dpdk: retry with queue action
>  netdev-dpdk: set FDIR config
>
>Shachar Beiser (1):
>  dpif-netdev: record rx queue id for the upcall
>
>Yuanhan Liu (4):
>  dpif-netdev: associate flow with a mark id
>  dpif-netdev: retrieve flow directly from the flow mark
>  netdev-dpdk: convert ufid to dpdk flow
>  netdev-dpdk: remove offloaded flow on deletion
>
> lib/dp-packet.h   |  14 ++
> lib/dpif-netdev.c | 132 +++++++++++--
> lib/flow.c        |  78 ++++++++
> lib/flow.h        |   1 +
> lib/netdev-dpdk.c | 574
>+++++++++++++++++++++++++++++++++++++++++++++++++++++-
> lib/netdev.c      |   1 +
> lib/netdev.h      |   7 +
> 7 files changed, 795 insertions(+), 12 deletions(-)
>
>--
>2.7.4
>

Hi all,

Can you please confirm that you are supporting offloading of both the EMC
flows and DPCLs (megaflows) here, i.e. OVS would skip hash table lookups
in both the cases if UFID is provided in the MBUF. Assuming that is
correct, when a match is found in dpcls, does OVS insert that new flow
back into the EMC cache?

Thanks,
Harish

[Finn]
Yes, you are correct. Once the megaflow is offloaded into the NIC using the flow UFID,
both the EMC and the megaflow cache (dpcls) are skipped when a UFID is received in the
mbuf. When receiving these pre-classified packets the EMC is not needed. However, the
initial packet that creates the megaflow (and thereby also creates the NIC rte flow)
will still be inserted into the EMC. But new flows that would hit the same megaflow,
yet would create a different EMC entry, will not have EMC entries created for them
once the flow is offloaded to the NIC.
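
To restate the receive-path decision above as a rough pseudo-C sketch: the
datapath consults the flow mark first, and only falls back to miniflow
extraction plus the EMC/dpcls lookups (and, on a miss, the upcall that may
install the rte_flow rule) when no mark is present. The helper names here
are hypothetical stand-ins, not functions from the series.

    #include <stdbool.h>
    #include <stdint.h>

    struct dp_packet;
    struct dp_netdev_flow;

    /* Hypothetical helpers, declared only so the sketch reads cleanly. */
    bool packet_has_flow_mark(const struct dp_packet *, uint32_t *mark);
    struct dp_netdev_flow *mark_to_flow_find(uint32_t mark);
    struct dp_netdev_flow *classify_in_software(struct dp_packet *);

    static struct dp_netdev_flow *
    lookup_flow(struct dp_packet *packet)
    {
        uint32_t mark;

        if (packet_has_flow_mark(packet, &mark)) {
            /* Pre-classified by the NIC: mark -> flow, no
             * miniflow_extract(), no EMC or dpcls lookup. */
            return mark_to_flow_find(mark);
        }

        /* No mark: miniflow_extract() + EMC + dpcls, possibly followed
         * by an upcall that installs the megaflow (and its rte_flow). */
        return classify_in_software(packet);
    }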



>
Patil, Harish Sept. 7, 2017, 6:52 p.m. UTC | #3
-----Original Message-----
From: Finn Christensen <fc@napatech.com>

Date: Tuesday, September 5, 2017 at 11:53 PM
To: Harish Patil <Harish.Patil@cavium.com>, Yuanhan Liu
<yliu@fridaylinux.org>, "dev@openvswitch.org" <dev@openvswitch.org>
Cc: "dball@vmware.com" <dball@vmware.com>
Subject: RE: [ovs-dev] [PATCH v2 0/8] OVS-DPDK flow offload with rte_flow

>

>-----Original Message-----

>From: Patil, Harish [mailto:Harish.Patil@cavium.com]

>Sent: 6. september 2017 08:34

>To: Yuanhan Liu <yliu@fridaylinux.org>; dev@openvswitch.org

>Cc: Finn Christensen <fc@napatech.com>; dball@vmware.com

>Subject: Re: [ovs-dev] [PATCH v2 0/8] OVS-DPDK flow offload with rte_flow

>

>

>

>-----Original Message-----

>From: <ovs-dev-bounces@openvswitch.org> on behalf of Yuanhan Liu

><yliu@fridaylinux.org>

>Date: Tuesday, September 5, 2017 at 2:22 AM

>To: "dev@openvswitch.org" <dev@openvswitch.org>

>Subject: [ovs-dev] [PATCH v2 0/8] OVS-DPDK flow offload with rte_flow

>

>>Hi,

>>

>>Here is a joint work from Mellanox and Napatech, to enable the flow hw

>>offload with the DPDK generic flow interface (rte_flow).

>>

>>The basic idea is to associate the flow with a mark id (a uint32_t

>>number).

>>Later, we then get the flow directly from the mark id, bypassing the

>>heavy emc processing, including miniflow_extract.

>>

>>The association is done with CMAP in patch 1. It also reuses the flow

>>APIs introduced while adding the tc offloads. The emc bypassing is done

>>in patch 2. The flow offload is done in patch 4, which mainly does two

>>things:

>>

>>- translate the ovs match to DPDK rte flow patterns

>>- bind those patterns with a MARK action.

>>

>>Afterwards, the NIC will set the mark id in every pkt's mbuf when it

>>matches the flow. That's basically how we could get the flow directly

>>from the received mbuf.

>>

>>While testing with PHY-PHY forwarding with one core and one queue, I

>>got about 54% performance boost. For PHY-vhost forwarding, I got about

>>41% performance boost. The reason it's lower than v1 is I added the

>>logic to get the correct tcp_flags, which examines all packets received.

>>

>>The major issue mentioned in the last version is also worked around: the

>>queue index is never set to 0 blindly anymore, but set to the rxq that

>>first receives the upcall pkt.

>>

>>Note that it's disabled by default, which can be enabled by:

>>

>>    $ ovs-vsctl set Open_vSwitch . other_config:hw-offload=true

>>

>>

>>v2: - workaround the queue action issue

>>    - fixed the tcp_flags being skipped issue, which also fixed the

>>      build warnings

>>    - fixed l2 patterns for Intel nic

>>    - Converted some macros to functions

>>    - did not hardcode the max number of flow/action

>>    - rebased on top of the latest code

>>

>>Thanks.

>>

>>    --yliu

>>

>>

>>---

>>Finn Christensen (3):

>>  netdev-dpdk: implement flow put with rte flow

>>  netdev-dpdk: retry with queue action

>>  netdev-dpdk: set FDIR config

>>

>>Shachar Beiser (1):

>>  dpif-netdev: record rx queue id for the upcall

>>

>>Yuanhan Liu (4):

>>  dpif-netdev: associate flow with a mark id

>>  dpif-netdev: retrieve flow directly from the flow mark

>>  netdev-dpdk: convert ufid to dpdk flow

>>  netdev-dpdk: remove offloaded flow on deletion

>>

>> lib/dp-packet.h   |  14 ++

>> lib/dpif-netdev.c | 132 +++++++++++--

>> lib/flow.c        |  78 ++++++++

>> lib/flow.h        |   1 +

>> lib/netdev-dpdk.c | 574

>>+++++++++++++++++++++++++++++++++++++++++++++++++++++-

>> lib/netdev.c      |   1 +

>> lib/netdev.h      |   7 +

>> 7 files changed, 795 insertions(+), 12 deletions(-)

>>

>>--

>>2.7.4

>>

>

>Hi all,

>

>Can you please confirm that you are supporting offloading of both the EMC

>flows and DPCLs (megaflows) here, i.e. OVS would skip hash table lookups

>in both the cases if UFID is provided in the MBUF. Assuming that is

>correct, when a match is found in dpcls, does OVS insert that new flow

>back into the EMC cache?

>

>Thanks,

>Harish

>

>[Finn]

>Yes, you are correct. Once the megaflow is offloaded into NIC, using the

>flow UFID,

>the EMC and megaflow cache (dpcls) is skipped when a UFID is received in

>mbuf. When

>receiving these pre-classified packets the EMC is not needed. However,

>the initial packet

>creating the megaflow (and then also creates the NIC rte flow), will be

>inserted into EMC.


[Harish] Thanks Finn for confirming the behavior.


>But, new flows that would use the same megaflow, but would create a

>different EMC entry,

>will not be inserted/created in EMC when offloaded by NIC.


[Harish] I did not fully understand this part. Can you please elaborate,
possibly with an example?

[Harish] I have another question:
There was a patch series (11/11) submitted regarding offloading dpcls from
Shachar Beiser.

[ovs-dev] [PATCH 00/11] Data Path Classifier Offloading
..
..
[ovs-dev] [PATCH 11/11] ovs/dp-cls: inserting rule to HW from offloading
thread context.


This does not use the RTE_FLOW filtering framework. I don’t know the status
of this patch series.
But it is very similar to what is being achieved with your current patch
series using RTE_FLOW.
Which one will be accepted in the end into the mainline OVS branch?

Thanks,
Harish
>
Yuanhan Liu Sept. 8, 2017, 1:58 a.m. UTC | #4
On Thu, Sep 07, 2017 at 06:52:49PM +0000, Patil, Harish wrote:
> >Hi all,
> >
> >Can you please confirm that you are supporting offloading of both the EMC
> >flows and DPCLs (megaflows) here, i.e. OVS would skip hash table lookups
> >in both the cases if UFID is provided in the MBUF. Assuming that is
> >correct, when a match is found in dpcls, does OVS insert that new flow
> >back into the EMC cache?
> >
> >Thanks,
> >Harish
> >
> >[Finn]
> >Yes, you are correct. Once the megaflow is offloaded into NIC, using the
> >flow UFID,
> >the EMC and megaflow cache (dpcls) is skipped when a UFID is received in
> >mbuf. When
> >receiving these pre-classified packets the EMC is not needed. However,
> >the initial packet
> >creating the megaflow (and then also creates the NIC rte flow), will be
> >inserted into EMC.
> 
> [Harish] Thanks Finn for confirming the behavior.
> 
> 
> >But, new flows that would use the same megaflow, but would create a
> >different EMC entry,
> >will not be inserted/created in EMC when offloaded by NIC.
> 
> [Harish I did not fully understand this part. Can you pls elaborate and
> possibly with an example?

It's the megaflow cache that is being offloaded. And the EMC is skipped
completely once there is a flow mark in the received packets. Thus, it
doesn't matter whether the flow is inserted back into the EMC.

Talking about that, I probably need to move the "mark -> flow" code to
dpcls rather than emc_processing. Darrell, just let me know which one
you prefer.

> [Harish] I have another question:
> There was a patch series (11/11) submitted regarding offloading dpcls from
> Shachar Beiser.
> 
> [ovs-dev] [PATCH 00/11] Data Path Classifier Offloading
> ..
> ..
> [ovs-dev] [PATCH 11/11] ovs/dp-cls: inserting rule to HW from	offloading
> thread context.
> 
> 
> This does not use RTE_FLOW filtering framework. I don’t know status of
> this patch series.
> But this is very similar to what is being achieved with your current patch
> series using RTE_FLOW.
> Which one will be accepted in the end in the mainline OVS branch?

As stated in the first sentence of my cover letter, this is joint work from
Mellanox and Napatech. Shachar (who is from Mellanox) doesn't have time to
work on this project anymore, leaving me (who is also from Mellanox) to
continue the work. My choice was then to continue it based on what Napatech
already had, for a simple reason: it's simpler.

OTOH, do you see anything missing in this patchset compared with the one
from Shachar?

	--yliu
Darrell Ball Sept. 8, 2017, 7:29 p.m. UTC | #5
Hi Yuanhan

In the DPDK public meeting, we requested Simon to also review this series to check
that it is in sync with the HWOL for the kernel with respect to one or two high level aspects.
I had some comments related to coding standards for V2, but I’ll wait for you to respond
to Simon’s comments before adding my own other comments, in order to avoid confusion.

Thanks Darrell
 

On 9/5/17, 2:22 AM, "Yuanhan Liu" <yliu@fridaylinux.org> wrote:

    Hi,
    
    Here is a joint work from Mellanox and Napatech, to enable the flow hw
    offload with the DPDK generic flow interface (rte_flow).
    
    The basic idea is to associate the flow with a mark id (a uint32_t number).
    Later, we then get the flow directly from the mark id, bypassing the heavy
    emc processing, including miniflow_extract.
    
    The association is done with CMAP in patch 1. It also reuses the flow
    APIs introduced while adding the tc offloads. The emc bypassing is done
    in patch 2. The flow offload is done in patch 4, which mainly does two
    things:
    
    - translate the ovs match to DPDK rte flow patterns
    - bind those patterns with a MARK action.
    
    Afterwards, the NIC will set the mark id in every pkt's mbuf when it
    matches the flow. That's basically how we could get the flow directly
    from the received mbuf.
    
    While testing with PHY-PHY forwarding with one core and one queue, I got
    about 54% performance boost. For PHY-vhost forwarding, I got about 41%
    performance boost. The reason it's lower than v1 is I added the logic
    to get the correct tcp_flags, which examines all packets received.
    
    The major issue mentioned in the last version is also worked around: the
    queue index is never set to 0 blindly anymore, but set to the rxq that
    first receives the upcall pkt.
    
    Note that it's disabled by default, which can be enabled by:
    
        $ ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
    
    
    v2: - workaround the queue action issue
        - fixed the tcp_flags being skipped issue, which also fixed the
          build warnings
        - fixed l2 patterns for Intel nic
        - Converted some macros to functions
        - did not hardcode the max number of flow/action
        - rebased on top of the latest code
    
    Thanks.
    
        --yliu
    
    
    ---
    Finn Christensen (3):
      netdev-dpdk: implement flow put with rte flow
      netdev-dpdk: retry with queue action
      netdev-dpdk: set FDIR config
    
    Shachar Beiser (1):
      dpif-netdev: record rx queue id for the upcall
    
    Yuanhan Liu (4):
      dpif-netdev: associate flow with a mark id
      dpif-netdev: retrieve flow directly from the flow mark
      netdev-dpdk: convert ufid to dpdk flow
      netdev-dpdk: remove offloaded flow on deletion
    
     lib/dp-packet.h   |  14 ++
     lib/dpif-netdev.c | 132 +++++++++++--
     lib/flow.c        |  78 ++++++++
     lib/flow.h        |   1 +
     lib/netdev-dpdk.c | 574 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
     lib/netdev.c      |   1 +
     lib/netdev.h      |   7 +
     7 files changed, 795 insertions(+), 12 deletions(-)
    
    -- 
    2.7.4
Chandran, Sugesh Sept. 10, 2017, 4:12 p.m. UTC | #6
Hi Yuanhan,

Thank you for sending out the patch series. 

We are also looking into something similar to enable full offload in OVS-DPDK.
It is based on 'http://dpdk.org/ml/archives/dev/2017-September/074746.html' and some other rte_flow extensions in DPDK.

We have noted that the patch series doesn't work very well for some of our requirements.
Please find the high level comments below. I have also provided specific comments on the individual patches.

1) It looks to me like the patch series enables/uses just one NIC functionality (the MARK action). In a multi-vendor hardware environment it is necessary to have a feature discovery mechanism to define what needs to be installed in the hardware based on its capabilities, e.g. MARK+QUEUE, MARK only, the number of supported flow entries, the supported flow fields, etc. This is very important for supporting different hardware NICs and making flow install easy.
In our implementation we have feature discovery at OVS init. It also populates the OVSDB to expose the device capabilities to higher management layers. The new table introduced in OVSDB looks like this:

  <table name="hw_offload">
    <p>
      Hardware switching configuration and capabilities.
    </p>
    <column name="name">
      The name of hardware acceleration device.
    </column>
    <column name="dev_id" type='{"type": "integer", "minInteger": 0, "maxInteger": 7}'>
      The integer device id of hardware accelerated NIC.
    </column>
     <column name="pci_id" type='{"type": "string"}'>
      The PCI ID of the hardware acceleration device. The broker id/PF id.
     </column>
     <column name="features" key="n_vhost_ports" type='{"type": "integer"}'>
      The number of supported vhost ports in the hardware switch.
     </column>
  </table>

The features column can be extended with more fields as necessary.
IMO the proposed partial offload doesn't need to populate the OVSDB; however, it's necessary to have some kind of feature discovery at init.

2) I feel it's better to keep the hardware offload functionality in netdev as much as possible, similar to the kernel implementation. I see changes in upcall and dpif.

3) The cost of flow install. PMDs are blocked while a hardware flow install is happening. This is an issue when a lot of short-lived flows are getting installed in the DP. One option to handle this would be to move the flow install into revalidation. The advantage of this approach is that hardware offload would happen only when a flow has been in use for at least some time, similar to how the revalidator thread handles the flow modify operation.

4) AFAIK, this hardware programmability is per NIC, not per port; i.e. the FDIR/RSS hash configuration is device specific. Will this be an issue if a NIC is shared between kernel and DPDK drivers?


Regards
_Sugesh


> -----Original Message-----
> From: ovs-dev-bounces@openvswitch.org [mailto:ovs-dev-
> bounces@openvswitch.org] On Behalf Of Yuanhan Liu
> Sent: Tuesday, September 5, 2017 10:23 AM
> To: dev@openvswitch.org
> Subject: [ovs-dev] [PATCH v2 0/8] OVS-DPDK flow offload with rte_flow
> 
> Hi,
> 
> Here is a joint work from Mellanox and Napatech, to enable the flow hw
> offload with the DPDK generic flow interface (rte_flow).
> 
> The basic idea is to associate the flow with a mark id (a uint32_t number).
> Later, we then get the flow directly from the mark id, bypassing the heavy
> emc processing, including miniflow_extract.
> 
> The association is done with CMAP in patch 1. It also reuses the flow APIs
> introduced while adding the tc offloads. The emc bypassing is done in patch
> 2. The flow offload is done in patch 4, which mainly does two
> things:
> 
> - translate the ovs match to DPDK rte flow patterns
> - bind those patterns with a MARK action.
> 
> Afterwards, the NIC will set the mark id in every pkt's mbuf when it matches
> the flow. That's basically how we could get the flow directly from the
> received mbuf.
> 
> While testing with PHY-PHY forwarding with one core and one queue, I got
> about 54% performance boost. For PHY-vhost forwarding, I got about 41%
> performance boost. The reason it's lower than v1 is I added the logic to get
> the correct tcp_flags, which examines all packets received.
> 
> The major issue mentioned in the last version is also worked around: the queue
> index is never set to 0 blindly anymore, but set to the rxq that first receives
> the upcall pkt.
> 
> Note that it's disabled by default, which can be enabled by:
> 
>     $ ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
> 
> 
> v2: - workaround the queue action issue
>     - fixed the tcp_flags being skipped issue, which also fixed the
>       build warnings
>     - fixed l2 patterns for Intel nic
>     - Converted some macros to functions
>     - did not hardcode the max number of flow/action
>     - rebased on top of the latest code
> 
> Thanks.
> 
>     --yliu
> 
> 
> ---
> Finn Christensen (3):
>   netdev-dpdk: implement flow put with rte flow
>   netdev-dpdk: retry with queue action
>   netdev-dpdk: set FDIR config
> 
> Shachar Beiser (1):
>   dpif-netdev: record rx queue id for the upcall
> 
> Yuanhan Liu (4):
>   dpif-netdev: associate flow with a mark id
>   dpif-netdev: retrieve flow directly from the flow mark
>   netdev-dpdk: convert ufid to dpdk flow
>   netdev-dpdk: remove offloaded flow on deletion
> 
>  lib/dp-packet.h   |  14 ++
>  lib/dpif-netdev.c | 132 +++++++++++--
>  lib/flow.c        |  78 ++++++++
>  lib/flow.h        |   1 +
>  lib/netdev-dpdk.c | 574
> +++++++++++++++++++++++++++++++++++++++++++++++++++++-
>  lib/netdev.c      |   1 +
>  lib/netdev.h      |   7 +
>  7 files changed, 795 insertions(+), 12 deletions(-)
> 
> --
> 2.7.4
> 
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Yuanhan Liu Sept. 11, 2017, 6:21 a.m. UTC | #7
I see. Thanks!

	--yliu

On Fri, Sep 08, 2017 at 07:29:48PM +0000, Darrell Ball wrote:
> Hi Yuanhan
> 
> In the dpdk public meeting, we requested Simon to also review this series to check
> that it is sync with the HWOL for kernel with respect to one or two high level aspects.
> I had some comments related to coding standards for V2, but I’ll wait for you to respond
> to Simon’s comments before adding my own other comments, in order to avoid confusion.
> 
> Thanks Darrell
>  
> 
> On 9/5/17, 2:22 AM, "Yuanhan Liu" <yliu@fridaylinux.org> wrote:
> 
>     Hi,
>     
>     Here is a joint work from Mellanox and Napatech, to enable the flow hw
>     offload with the DPDK generic flow interface (rte_flow).
>     
>     The basic idea is to associate the flow with a mark id (a uint32_t number).
>     Later, we then get the flow directly from the mark id, bypassing the heavy
>     emc processing, including miniflow_extract.
>     
>     The association is done with CMAP in patch 1. It also reuses the flow
>     APIs introduced while adding the tc offloads. The emc bypassing is done
>     in patch 2. The flow offload is done in patch 4, which mainly does two
>     things:
>     
>     - translate the ovs match to DPDK rte flow patterns
>     - bind those patterns with a MARK action.
>     
>     Afterwards, the NIC will set the mark id in every pkt's mbuf when it
>     matches the flow. That's basically how we could get the flow directly
>     from the received mbuf.
>     
>     While testing with PHY-PHY forwarding with one core and one queue, I got
>     about 54% performance boost. For PHY-vhost forwarding, I got about 41%
>     performance boost. The reason it's lower than v1 is I added the logic
>     to get the correct tcp_flags, which examines all packets received.
>     
>     The major issue mentioned in the last version is also worked around: the
>     queue index is never set to 0 blindly anymore, but set to the rxq that
>     first receives the upcall pkt.
>     
>     Note that it's disabled by default, which can be enabled by:
>     
>         $ ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
>     
>     
>     v2: - workaround the queue action issue
>         - fixed the tcp_flags being skipped issue, which also fixed the
>           build warnings
>         - fixed l2 patterns for Intel nic
>         - Converted some macros to functions
>         - did not hardcode the max number of flow/action
>         - rebased on top of the latest code
>     
>     Thanks.
>     
>         --yliu
>     
>     
>     ---
>     Finn Christensen (3):
>       netdev-dpdk: implement flow put with rte flow
>       netdev-dpdk: retry with queue action
>       netdev-dpdk: set FDIR config
>     
>     Shachar Beiser (1):
>       dpif-netdev: record rx queue id for the upcall
>     
>     Yuanhan Liu (4):
>       dpif-netdev: associate flow with a mark id
>       dpif-netdev: retrieve flow directly from the flow mark
>       netdev-dpdk: convert ufid to dpdk flow
>       netdev-dpdk: remove offloaded flow on deletion
>     
>      lib/dp-packet.h   |  14 ++
>      lib/dpif-netdev.c | 132 +++++++++++--
>      lib/flow.c        |  78 ++++++++
>      lib/flow.h        |   1 +
>      lib/netdev-dpdk.c | 574 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
>      lib/netdev.c      |   1 +
>      lib/netdev.h      |   7 +
>      7 files changed, 795 insertions(+), 12 deletions(-)
>     
>     -- 
>     2.7.4
>     
>     
>
Yuanhan Liu Sept. 11, 2017, 9:11 a.m. UTC | #8
On Sun, Sep 10, 2017 at 04:12:47PM +0000, Chandran, Sugesh wrote:
> Hi Yuanhan,
> 
> Thank you for sending out the patch series. 

Hi Sugesh,

Thank you for taking your time to review it!


> We are also looking into something similar to enable full offload in OVS-DPDK.

Good to know!

> It is based on ' http://dpdk.org/ml/archives/dev/2017-September/074746.html' and some other rte_flow extension in DPDK.

I saw the patches, I will take some time to read it.

> 
> It is noted that the patch series doesn't work very well for some of our requirements.
> Please find below for the high level comments. I have also provided specific comments on the following patches.
> 
> 1) Looks to me the patch series is to enable/use just one functionality of NIC(mark action). In a multiple hardware environment it is necessary to have a feature discovery mechanism to define what needs to be installed in the hardware based on its capabilities, for eg: MARK+QUEUE, MARK only, number of supported flow entries, supported flow fields and etc. This is very important to support different hardware NICs and make flow install easy.

Yes, you are right. I have also observed this issue while coding this
patch.

> In our implementation we have a feature discovery at the OVS init. It will also populate the OVSDB to expose the device capability to higher management layers. The new table introduced in OVSDB is like below.

The solution I want to go with, however, is different. I was thinking of
introducing a few DPDK rte_flow APIs and structs to define the NIC flow
capabilities.

I think this would help in the long run, as the capability will be updated
as new features are added (when new versions are released). The solution
you proposed won't allow OVS to work with multiple DPDK versions
(assuming they provide different rte_flow capabilities).
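
For illustration only, the kind of capability query being discussed might look
roughly like the sketch below. No such API exists in DPDK at the time of this
thread, and every name in it is invented.

    /* Purely hypothetical sketch of an rte_flow capability query. */
    #include <stdint.h>

    struct rte_flow_capabilities {
        uint64_t item_type_mask;    /* Bitmap of supported RTE_FLOW_ITEM_TYPE_*.   */
        uint64_t action_type_mask;  /* Bitmap of supported RTE_FLOW_ACTION_TYPE_*. */
        uint32_t max_flow_rules;    /* 0 if unknown or unlimited.                  */
    };

    /* A caller (e.g. netdev-dpdk at init time) could use this to decide
     * whether to attempt MARK-only, MARK+QUEUE, or no offload at all. */
    int rte_flow_capabilities_get(uint16_t port_id,
                                  struct rte_flow_capabilities *caps);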

>   <table name="hw_offload">
>     <p>
>       Hardware switching configuration and capabilities.
>     </p>
>     <column name="name">
>       The name of hardware acceleration device.
>     </column>
>     <column name="dev_id" type='{"type": "integer", "minInteger": 0, "maxInteger": 7}'>
>       The integer device id of hardware accelerated NIC.
>     </column>
>      <column name="pci_id" type='{"type": "string"}'>
>       The PCI ID of the hardware acceleration device. The broker id/PF id.
>      </column>
>      <column name="features" key="n_vhost_ports" type='{"type": "integer"}'>
>       The number of supported vhost ports in the hardware switch.
>      </column>
>   </table>
> 
> The features column can be extended with more fields as necessary.
> IMO the proposed partial offload doesn't need populating the OVSDB, however its necessary to have some kind of feature discovery at init.
> 
> 2) I feel its better to keep the hardware offload functionalities in netdev as much as possible similar to kernel implementation. I see changes in upcall and dpif. 

I agree with you. But unfortunately, due to some driver or hardware
limitations, that's the best I can do.

> 3) The cost of flow install . PMDs are blocked when a hardware flow install is happening. Its an issue when there are lot of short lived flows are getting installed in the DP.

I wasn't aware of it. Thank you for letting me know that!

> One option to handle this would be move the flow install into revalidate. The advantage of this approach would be hardware offload would happen only when a flow is being used at least for sometime. Something like how revalidator thread handle the flow modify operation.

Yes, it sounds workable. However, the MARK and QUEUE workaround won't
work then: we need to record the rxq first. And again, I know the workaround
is far from perfect.

> 4) AFAIK, these hardware programmability for a NIC/not for a specific port. i.e the FDIR/RSS hash configuration are device specific. This will be an issue if a NIC shared between kernel and DPDK drivers? 

That might be NIC specific. What do you mean by sharing between kernel
and DPDK? For most NICs I'm aware of, it's required to unbind the kernel
driver first. Thus, it won't be shared. For Mellanox, the control unit
is based on queues, so it can be shared correctly.

	--yliu
Chandran, Sugesh Sept. 11, 2017, 10 a.m. UTC | #9
Regards
_Sugesh


> -----Original Message-----
> From: Yuanhan Liu [mailto:yliu@fridaylinux.org]
> Sent: Monday, September 11, 2017 10:12 AM
> To: Chandran, Sugesh <sugesh.chandran@intel.com>
> Cc: dev@openvswitch.org
> Subject: Re: [ovs-dev] [PATCH v2 0/8] OVS-DPDK flow offload with rte_flow
> 
> On Sun, Sep 10, 2017 at 04:12:47PM +0000, Chandran, Sugesh wrote:
> > Hi Yuanhan,
> >
> > Thank you for sending out the patch series.
> 
> Hi Sugesh,
> 
> Thank you for taking your time to review it!
> 
> 
> > We are also looking into something similar to enable full offload in OVS-
> DPDK.
> 
> Good to know!
> 
> > It is based on ' http://dpdk.org/ml/archives/dev/2017-
> September/074746.html' and some other rte_flow extension in DPDK.
> 
> I saw the patches, I will take some time to read it.
[Sugesh]Sure.
> 
> >
> > It is noted that the patch series doesn't work very well for some of our
> requirements.
> > Please find below for the high level comments. I have also provided specific
> comments on the following patches.
> >
> > 1) Looks to me the patch series is to enable/use just one functionality of
> NIC(mark action). In a multiple hardware environment it is necessary to have
> a feature discovery mechanism to define what needs to be installed in the
> hardware based on its capabilities, for eg: MARK+QUEUE, MARK only,
> number of supported flow entries, supported flow fields and etc. This is very
> important to support different hardware NICs and make flow install easy.
> 
> Yes, you are right. I have also observed this issue while coding this patch.
[Sugesh] Ok.
> 
> > In our implementation we have a feature discovery at the OVS init. It will
> also populate the OVSDB to expose the device capability to higher
> management layers. The new table introduced in OVSDB is like below.
> 
> The solution I want to go, however, is different though. I was thinking to
> introduce few DPDK rte_flow APIs and structs to define the NIC flow
> capabilities.
[Sugesh] Technically rte_flow is for flow programming and not for device capabilities.
Again, if DPDK can have such an API in rte_flow, I think it should be fine.
> 
> I think this would help long run, as the capabilitiy will be updated as the new
> features are added (when new versions are released). For the solution you
> proposed, it won't allow DPDK work with multiple DPDK versions (assume
> they provides different rte flow capabilities).
> 
> >   <table name="hw_offload">
> >     <p>
> >       Hardware switching configuration and capabilities.
> >     </p>
> >     <column name="name">
> >       The name of hardware acceleration device.
> >     </column>
> >     <column name="dev_id" type='{"type": "integer", "minInteger": 0,
> "maxInteger": 7}'>
> >       The integer device id of hardware accelerated NIC.
> >     </column>
> >      <column name="pci_id" type='{"type": "string"}'>
> >       The PCI ID of the hardware acceleration device. The broker id/PF id.
> >      </column>
> >      <column name="features" key="n_vhost_ports" type='{"type":
> "integer"}'>
> >       The number of supported vhost ports in the hardware switch.
> >      </column>
> >   </table>
> >
> > The features column can be extended with more fields as necessary.
> > IMO the proposed partial offload doesn't need populating the OVSDB,
> however its necessary to have some kind of feature discovery at init.
> >
> > 2) I feel its better to keep the hardware offload functionalities in netdev as
> much as possible similar to kernel implementation. I see changes in upcall
> and dpif.
> 
> I agree with you. But unfortunately, due to some dirver or hardware
> limitation, that's what I can get best.
[Sugesh] Ok. 
> 
> > 3) The cost of flow install . PMDs are blocked when a hardware flow install
> is happening. Its an issue when there are lot of short lived flows are getting
> installed in the DP.
> 
> I wasn't aware of it. Thank you for letting me know that!
> 
[Sugesh] Ok
> > One option to handle this would be move the flow install into revalidate.
> The advantage of this approach would be hardware offload would happen
> only when a flow is being used at least for sometime. Something like how
> revalidator thread handle the flow modify operation.
> 
> Yes, it sounds workable. However, the MARK and QUEUE workaround won't
> work then: we need record the rxq first. And again, I know the workaround is
> far from being perfect.
> 
> > 4) AFAIK, these hardware programmability for a NIC/not for a specific port.
> i.e the FDIR/RSS hash configuration are device specific. This will be an issue if
> a NIC shared between kernel and DPDK drivers?
> 
> That might be NIC specific. What do you mean by sharing between kernel
> and DPDK? In most NICs I'm aware of, it's requried to unbind the kernel
> driver first. Thus, it won't be shared. For Mellanox, the control unit is based
> on queues, thus it could be shared correctly.
[Sugesh] What I meant by that is, consider the case of a NIC with 4x10G ports,
with 2 ports bound to DPDK and 2 ports to the kernel.
If I remember correctly, the XL710 NIC can support a total of 8k exact-match flow entries
in its flow director. Similarly, some other resources are also shared across all the ports
in the NIC.
How are these resources properly managed between the kernel and DPDK?
I agree that for Mellanox NICs this should be fine, but I'm not sure it works on all
the NICs out there. Changes to any global configuration will adversely affect the other side.
> 
> 	--yliu
Yuanhan Liu Sept. 11, 2017, 10:22 a.m. UTC | #10
On Mon, Sep 11, 2017 at 10:00:06AM +0000, Chandran, Sugesh wrote:
> > > In our implementation we have a feature discovery at the OVS init. It will
> > also populate the OVSDB to expose the device capability to higher
> > management layers. The new table introduced in OVSDB is like below.
> > 
> > The solution I want to go, however, is different though. I was thinking to
> > introduce few DPDK rte_flow APIs and structs to define the NIC flow
> > capabilities.
> [Sugesh] technically rte_flow is for flow programming and not for device capabilities.

Not really. rte_flow is just a framework; it needs the underlying NIC
to do the real thing. Each NIC has different limitations (we have seen
quite a few of them). Thus, we need something like this.

If you think of it this way: device capabilities regarding flows, it may
make more sense to you :)

> Again if DPDK can have such API in rte_flow. I think it should be fine.

Good! I will make a proposal to DPDK for v18.02 then.

> > > 4) AFAIK, these hardware programmability for a NIC/not for a specific port.
> > i.e the FDIR/RSS hash configuration are device specific. This will be an issue if
> > a NIC shared between kernel and DPDK drivers?
> > 
> > That might be NIC specific. What do you mean by sharing between kernel
> > and DPDK? In most NICs I'm aware of, it's requried to unbind the kernel
> > driver first. Thus, it won't be shared. For Mellanox, the control unit is based
> > on queues, thus it could be shared correctly.
> [Sugesh] What  I meant by that is, consider a case of NIC with 4*10G ports.
> 2 ports bound to DPDK and 2 ports to kernel.

I see.

> If I remember correctly XL710 NIC can support total 8k exact match flow entries in its
> Flow director.  Similarly some other resources are also shared across all the ports in the NIC. 
> Now how these resources are properly managed between
> Kernel and DPDK.  

Honestly, I don't know. We may need testing/investigation.

>  I agree that Mellanox NICs this should be fine, but not sure if it work on all
> the NICs out there. This will make adverse effect on each other when making changes to any global configuration.

For sure I agree we should make it work on as many NICs as possible.
And I think that's what this patchset is trying to do.

	--yliu
Chandran, Sugesh Sept. 11, 2017, 11:04 a.m. UTC | #11
Regards
_Sugesh


> -----Original Message-----
> From: Yuanhan Liu [mailto:yliu@fridaylinux.org]
> Sent: Monday, September 11, 2017 11:23 AM
> To: Chandran, Sugesh <sugesh.chandran@intel.com>
> Cc: dev@openvswitch.org
> Subject: Re: [ovs-dev] [PATCH v2 0/8] OVS-DPDK flow offload with rte_flow
> 
> On Mon, Sep 11, 2017 at 10:00:06AM +0000, Chandran, Sugesh wrote:
> > > > In our implementation we have a feature discovery at the OVS init.
> > > > It will
> > > also populate the OVSDB to expose the device capability to higher
> > > management layers. The new table introduced in OVSDB is like below.
> > >
> > > The solution I want to go, however, is different though. I was
> > > thinking to introduce few DPDK rte_flow APIs and structs to define
> > > the NIC flow capabilities.
> > [Sugesh] technically rte_flow is for flow programming and not for device
> capabilities.
> 
> Not really. rte_flow is just a framework, it needs the underlaying NIC to do
> the real thing. Each NIC has different limitations (we have seen quite few of
> them). Thus, we need something like this.
> 
> If you think this way: device capabilities regarding to flow, it may make more
> sense to you :)
[Sugesh] For our use case, we have more features that we want to consider/expose.
Those are not really 'flow' attributes. Instead they are the device properties
that are necessary to decide whether a flow is offloadable or not.

> 
> > Again if DPDK can have such API in rte_flow. I think it should be fine.
> 
> Good! I will make a proposal to DPDK for v18.02 then.
[Sugesh] Also, it's worth looking at the new port representor
model in DPDK. The reason being, we are planning to propose an extension to it
for exposing device capabilities for our use case :)

> 
> > > > 4) AFAIK, these hardware programmability for a NIC/not for a specific
> port.
> > > i.e the FDIR/RSS hash configuration are device specific. This will
> > > be an issue if a NIC shared between kernel and DPDK drivers?
> > >
> > > That might be NIC specific. What do you mean by sharing between
> > > kernel and DPDK? In most NICs I'm aware of, it's requried to unbind
> > > the kernel driver first. Thus, it won't be shared. For Mellanox, the
> > > control unit is based on queues, thus it could be shared correctly.
> > [Sugesh] What  I meant by that is, consider a case of NIC with 4*10G ports.
> > 2 ports bound to DPDK and 2 ports to kernel.
> 
> I see.
> 
> > If I remember correctly XL710 NIC can support total 8k exact match
> > flow entries in its Flow director.  Similarly some other resources are also
> shared across all the ports in the NIC.
> > Now how these resources are properly managed between Kernel and
> DPDK.
> 
> Honestly, I don't know. We may need testings/investigations.
[Sugesh] yes.
> 
> >  I agree that Mellanox NICs this should be fine, but not sure if it
> > work on all the NICs out there. This will make adverse effect on each other
> when making changes to any global configuration.
> 
> For sure I agree we should make it work on as many nics as possible.
> And I think that's what this patchset trying to do.
[Sugesh] Sure, Thank you!
> 
> 	--yliu
Patil, Harish Sept. 13, 2017, 1:48 a.m. UTC | #12
-----Original Message-----
From: Yuanhan Liu <yliu@fridaylinux.org>

Date: Thursday, September 7, 2017 at 6:58 PM
To: Harish Patil <Harish.Patil@cavium.com>
Cc: Finn Christensen <fc@napatech.com>, "dev@openvswitch.org"
<dev@openvswitch.org>, "dball@vmware.com" <dball@vmware.com>
Subject: Re: [ovs-dev] [PATCH v2 0/8] OVS-DPDK flow offload with rte_flow

>On Thu, Sep 07, 2017 at 06:52:49PM +0000, Patil, Harish wrote:

>> >Hi all,

>> >

>> >Can you please confirm that you are supporting offloading of both the

>>EMC

>> >flows and DPCLs (megaflows) here, i.e. OVS would skip hash table

>>lookups

>> >in both the cases if UFID is provided in the MBUF. Assuming that is

>> >correct, when a match is found in dpcls, does OVS insert that new flow

>> >back into the EMC cache?

>> >

>> >Thanks,

>> >Harish

>> >

>> >[Finn]

>> >Yes, you are correct. Once the megaflow is offloaded into NIC, using

>>the

>> >flow UFID,

>> >the EMC and megaflow cache (dpcls) is skipped when a UFID is received

>>in

>> >mbuf. When

>> >receiving these pre-classified packets the EMC is not needed. However,

>> >the initial packet

>> >creating the megaflow (and then also creates the NIC rte flow), will be

>> >inserted into EMC.

>> 

>> [Harish] Thanks Finn for confirming the behavior.

>> 

>> 

>> >But, new flows that would use the same megaflow, but would create a

>> >different EMC entry,

>> >will not be inserted/created in EMC when offloaded by NIC.

>> 

>> [Harish I did not fully understand this part. Can you pls elaborate and

>> possibly with an example?

>

>It's megaflow cache being offloaded. And the EMC is skipped completely

>once there are flow mark in the recved pkts. Thus, it doesn't matter

>whether EMC will be inserted back.

>

>Talking about that, I probably need move the "mark -> flow" code to

>dpcls but not in emc_processing. Darrel, just let me know which one

>you prefer.

>

>> [Harish] I have another question:

>> There was a patch series (11/11) submitted regarding offloading dpcls

>>from

>> Shachar Beiser.

>> 

>> [ovs-dev] [PATCH 00/11] Data Path Classifier Offloading

>> ..

>> ..

>> [ovs-dev] [PATCH 11/11] ovs/dp-cls: inserting rule to HW from	offloading

>> thread context.

>> 

>> 

>> This does not use RTE_FLOW filtering framework. I don’t know status of

>> this patch series.

>> But this is very similar to what is being achieved with your current

>>patch

>> series using RTE_FLOW.

>> Which one will be accepted in the end in the mainline OVS branch?

>

>As it's been stated in the first sentence of my cover letter, it's a

>joint work from Mellanox and Napatech. Shachar (who is from Mellanox)

>does't have time to work on this project anymore, leaving me (who is

>also from Mellanox) to continue the work. And then my choice was to

>continue it based on what Napatech already had, for a simple reason:

>it's simpler.

>

>OTOH, do you see anything missing in this patchset, comparing the one

>from Shachar?

>

>	--yliu


[Harish] Thanks for clarifying this. So that’s fine.
Since I saw two independent patch series, I was not sure what happened to
the one Shachar posted.
The only difference I saw was that the current patch series uses rte_flow and the
earlier one used the legacy filtering framework.
Anyway, it's clear to me now, thanks.



>
Darrell Ball Sept. 13, 2017, 4:14 a.m. UTC | #13
On 9/7/17, 6:59 PM, "Yuanhan Liu" <yliu@fridaylinux.org> wrote:

    On Thu, Sep 07, 2017 at 06:52:49PM +0000, Patil, Harish wrote:
    > >Hi all,

    > >

    > >Can you please confirm that you are supporting offloading of both the EMC

    > >flows and DPCLs (megaflows) here, i.e. OVS would skip hash table lookups

    > >in both the cases if UFID is provided in the MBUF. Assuming that is

    > >correct, when a match is found in dpcls, does OVS insert that new flow

    > >back into the EMC cache?

    > >

    > >Thanks,

    > >Harish

    > >

    > >[Finn]

    > >Yes, you are correct. Once the megaflow is offloaded into NIC, using the

    > >flow UFID,

    > >the EMC and megaflow cache (dpcls) is skipped when a UFID is received in

    > >mbuf. When

    > >receiving these pre-classified packets the EMC is not needed. However,

    > >the initial packet

    > >creating the megaflow (and then also creates the NIC rte flow), will be

    > >inserted into EMC.

    > 

    > [Harish] Thanks Finn for confirming the behavior.

    > 

    > 

    > >But, new flows that would use the same megaflow, but would create a

    > >different EMC entry,

    > >will not be inserted/created in EMC when offloaded by NIC.

    > 

    > [Harish I did not fully understand this part. Can you pls elaborate and

    > possibly with an example?

    
    It's megaflow cache being offloaded. And the EMC is skipped completely
    once there are flow mark in the recved pkts. Thus, it doesn't matter
    whether EMC will be inserted back.
    
    Talking about that, I probably need move the "mark -> flow" code to
    dpcls but not in emc_processing. Darrel, just let me know which one
    you prefer.

[Darrell] sorry, too many e-mails.
                 I had some related comments to share, but yes.
                Probably fast_path_processing().


    
    > [Harish] I have another question:

    > There was a patch series (11/11) submitted regarding offloading dpcls from

    > Shachar Beiser.

    > 

    > [ovs-dev] [PATCH 00/11] Data Path Classifier Offloading

    > ..

    > ..

    > [ovs-dev] [PATCH 11/11] ovs/dp-cls: inserting rule to HW from	offloading

    > thread context.

    > 

    > 

    > This does not use RTE_FLOW filtering framework. I don’t know status of

    > this patch series.

    > But this is very similar to what is being achieved with your current patch

    > series using RTE_FLOW.

    > Which one will be accepted in the end in the mainline OVS branch?

    
    As it's been stated in the first sentence of my cover letter, it's a
    joint work from Mellanox and Napatech. Shachar (who is from Mellanox)
    does't have time to work on this project anymore, leaving me (who is
    also from Mellanox) to continue the work. And then my choice was to
    continue it based on what Napatech already had, for a simple reason:
    it's simpler.
    
    OTOH, do you see anything missing in this patchset, comparing the one
    from Shachar?
    
    	--yliu