diff mbox series

[ovs-dev,RFC,1/3] exact-match-offload: Introducing Exact-Match HW Offload Support for OVS

Message ID CO6PR22MB285237AA2A1128AB207655B8AD642@CO6PR22MB2852.namprd22.prod.outlook.com
State Changes Requested
Delegated to: Simon Horman
Headers show
Series [ovs-dev,RFC,1/3] exact-match-offload: Introducing Exact-Match HW Offload Support for OVS | expand

Commit Message

Farhan Tariq Sept. 12, 2024, 12:51 p.m. UTC
This patch introduces exact-match offload support in OVS to enhance
optimization for hardware datapath. It's being submitted as a RFC for early
feedback.

Problem Statement:
Currently, OVS offloads megaflows using tc-flower. However, hardware
support for megaflows has certain limitations:
a. TCAMs are used, which aren't scalable and support fewer rules.
b. Using hash tables to emulate TCAMs results in complex and slow
performance with many tables.

Solution:
This patch aims to add exact-match offload support to OVS, addressing
these issues. This feature is an optional enhancement and is disabled by
default.

Implementation Details:
1. Added a new flag, "exact-match-offload" to "ovs-vsctl...other_config",
   which is disabled by default but can be enabled at runtime. It works only
   if megaflows are disabled.
2. Introduced the "enable_exact_match_offload()" API in
   "/lib/netdev-offload-tc.c" to zero-out unused masks.
3. Moved "enable_megaflows" from "ofproto/ofproto-dpif-upcall.c" to "lib/-
   netdev-offload-tc.c" to make it visible in the "netdev-offload-tc.c"
   file. An alternate implementation could be to use a getter api.

Testing results:
Verified the basic functionality of the patch with different flows.

Future Work:
1. Add support for connection tracking.
2. Add support for OVS-DPDK rte_flow.

Note: This feature does not affect user OpenFlow tables.
---
Potential Issue:
This patch might not handle short-lived large numbers of new connections
effectively. The last proposal (i.e. proposal #4 listed below) can address
this issue. For now this feature is beneficial for many use cases, with
further optimization for handling large numbers of short-lived connections
planned for future work.

Other Possible Solutions:
Several approaches to implementing exact-match offload are being
considered. Comments are welcome on the best approach:
1. Offload all fields except mutable and unpredictable ones (e.g., ttl,
   tos, ip_frag, checksum). Megaflows are disabled.
2. Offload only essential fields (e.g., 5-tuple + VLAN IDs + tunnel info
   etc). Megaflows are disabled.
3. Provide configuration options for users to select which fields to
   offload. Megaflows are disabled.
4. Introduce an offloading path from the OVS datapath (kernel) module to
   integrate megaflows and exact-match offload, addressing short-lived new
   connection performance. Suggestions on this approach are welcome.
--
Co-authored-by: Farhat Ullah <farhat@dreambigsemi.com<mailto:farhat@dreambigsemi.com>>
Signed-off-by: Farhat Ullah <farhat@dreambigsemi.com<mailto:farhat@dreambigsemi.com>>
Signed-off-by: Farhan Tariq <ftariq@dreambigsemi.com<mailto:ftariq@dreambigsemi.com>>

---
 Documentation/howto/tc-offload.rst |  56 +++++++
 NEWS                               |   6 +
 lib/netdev-offload-tc.c            | 236 +++++++++++++++++++----------
 lib/netdev-offload.c               |   5 +
 lib/netdev-offload.h               |   2 +
 ofproto/ofproto-dpif-upcall.c      |   2 +-
 tests/system-offloads-traffic.at<http://system-offloads-traffic.at>   |  24 +++
 vswitchd/vswitch.xml               |  19 +++
 8 files changed, 266 insertions(+), 84 deletions(-)

--
2.27.0

Comments

Simon Horman Sept. 16, 2024, 1:45 p.m. UTC | #1
On Thu, Sep 12, 2024 at 12:51:27PM +0000, Farhan Tariq wrote:
> This patch introduces exact-match offload support in OVS to enhance
> optimization for hardware datapath. It's being submitted as a RFC for early
> feedback.
> 
> Problem Statement:
> Currently, OVS offloads megaflows using tc-flower. However, hardware
> support for megaflows has certain limitations:
> a. TCAMs are used, which aren't scalable and support fewer rules.
> b. Using hash tables to emulate TCAMs results in complex and slow
> performance with many tables.
> 
> Solution:
> This patch aims to add exact-match offload support to OVS, addressing
> these issues. This feature is an optional enhancement and is disabled by
> default.
> 
> Implementation Details:
> 1. Added a new flag, "exact-match-offload" to "ovs-vsctl...other_config",
>    which is disabled by default but can be enabled at runtime. It works only
>    if megaflows are disabled.
> 2. Introduced the "enable_exact_match_offload()" API in
>    "/lib/netdev-offload-tc.c" to zero-out unused masks.
> 3. Moved "enable_megaflows" from "ofproto/ofproto-dpif-upcall.c" to "lib/-
>    netdev-offload-tc.c" to make it visible in the "netdev-offload-tc.c"
>    file. An alternate implementation could be to use a getter api.
> 
> Testing results:
> Verified the basic functionality of the patch with different flows.
> 
> Future Work:
> 1. Add support for connection tracking.
> 2. Add support for OVS-DPDK rte_flow.
> 
> Note: This feature does not affect user OpenFlow tables.
> ---
> Potential Issue:
> This patch might not handle short-lived large numbers of new connections
> effectively. The last proposal (i.e. proposal #4 listed below) can address
> this issue. For now this feature is beneficial for many use cases, with
> further optimization for handling large numbers of short-lived connections
> planned for future work.
> 
> Other Possible Solutions:
> Several approaches to implementing exact-match offload are being
> considered. Comments are welcome on the best approach:
> 1. Offload all fields except mutable and unpredictable ones (e.g., ttl,
>    tos, ip_frag, checksum). Megaflows are disabled.
> 2. Offload only essential fields (e.g., 5-tuple + VLAN IDs + tunnel info
>    etc). Megaflows are disabled.
> 3. Provide configuration options for users to select which fields to
>    offload. Megaflows are disabled.
> 4. Introduce an offloading path from the OVS datapath (kernel) module to
>    integrate megaflows and exact-match offload, addressing short-lived new
>    connection performance. Suggestions on this approach are welcome.
> --
> Co-authored-by: Farhat Ullah <farhat@dreambigsemi.com<mailto:farhat@dreambigsemi.com>>
> Signed-off-by: Farhat Ullah <farhat@dreambigsemi.com<mailto:farhat@dreambigsemi.com>>
> Signed-off-by: Farhan Tariq <ftariq@dreambigsemi.com<mailto:ftariq@dreambigsemi.com>>

Thanks Farhat,

A high level, am I correct in thinking that the motivation for this is
hardware where rules are stored in TCAMs. Which, for the reasons you
describe above, achieve better results with exact-match rules than
mega-flows.

Not strictly related, but, it seems that for some reason the CI checks
didn't run.  Hopefully this will result in them running.

Recheck-request: github-robot
Farhan Tariq Sept. 16, 2024, 5:11 p.m. UTC | #2
Hello Simon,

Yes, you’re right. We are indeed targeting hardware offload. Since there are limitations when using TCAMs, bypassing them can help optimize both lookups and the forwarding process.

Best,
Farhan
Farhan Tariq Sept. 16, 2024, 5:13 p.m. UTC | #3
Hello Simon,

Yes, you’re right. We are indeed targeting hardware offload. Since there
are limitations when using TCAMs, bypassing them can help optimize both
lookups and the forwarding process.

Best,
Farhan
Simon Horman Sept. 20, 2024, 11:45 a.m. UTC | #4
On Mon, Sep 16, 2024 at 05:13:48PM +0000, Farhan Tariq wrote:
> Hello Simon,
> 
> Yes, you’re right. We are indeed targeting hardware offload. Since there
> are limitations when using TCAMs, bypassing them can help optimize both
> lookups and the forwarding process.

Hi Farhan,

First, please don't top-post on OVS mailing lists.

Second, it would be really nice to understand some more information about
the hardware you are working with.

Lastly, and most importantly, regarding the implementation.

Having spoken off-line to others, it is my understanding that the option to
disable megaflows is, at this time, only intended to be used for debugging
purposes.  So if that is a requirement of the design then I think work will
need to be done to make it suitable for non-debugging users.  But perhaps
we can consider some other options.

As it stands a transformation is made from OpenFlow rules to megaflow
datapath rules. But, as you point out, there are use-cases where
exact-match flows make sense. Indeed we see this in the see this in the
user-space datapath. But in that case the approach taken is to implement an
exact-match cache.

I also believe that is it perfectly ok for a datapath implementation to
take accept a megaflow rule and install a finer-grained rule, including an
exact-match rule.  I think this would lead to a similar outcome to your
patch-set, but likely with significantly less modification to OVS.
Eelco Chaudron Sept. 20, 2024, 11:46 a.m. UTC | #5
On 12 Sep 2024, at 14:51, Farhan Tariq wrote:

> This patch introduces exact-match offload support in OVS to enhance
> optimization for hardware datapath. It's being submitted as a RFC for early
> feedback.
>
> Problem Statement:
> Currently, OVS offloads megaflows using tc-flower. However, hardware
> support for megaflows has certain limitations:
> a. TCAMs are used, which aren't scalable and support fewer rules.
> b. Using hash tables to emulate TCAMs results in complex and slow
> performance with many tables.
>
> Solution:
> This patch aims to add exact-match offload support to OVS, addressing
> these issues. This feature is an optional enhancement and is disabled by
> default.
>
> Implementation Details:
> 1. Added a new flag, "exact-match-offload" to "ovs-vsctl...other_config",
>    which is disabled by default but can be enabled at runtime. It works only
>    if megaflows are disabled.
> 2. Introduced the "enable_exact_match_offload()" API in
>    "/lib/netdev-offload-tc.c" to zero-out unused masks.
> 3. Moved "enable_megaflows" from "ofproto/ofproto-dpif-upcall.c" to "lib/-
>    netdev-offload-tc.c" to make it visible in the "netdev-offload-tc.c"
>    file. An alternate implementation could be to use a getter api.
>
> Testing results:
> Verified the basic functionality of the patch with different flows.
>
> Future Work:
> 1. Add support for connection tracking.
> 2. Add support for OVS-DPDK rte_flow.
>
> Note: This feature does not affect user OpenFlow tables.
> ---
> Potential Issue:
> This patch might not handle short-lived large numbers of new connections
> effectively. The last proposal (i.e. proposal #4 listed below) can address
> this issue. For now this feature is beneficial for many use cases, with
> further optimization for handling large numbers of short-lived connections
> planned for future work.
>
> Other Possible Solutions:
> Several approaches to implementing exact-match offload are being
> considered. Comments are welcome on the best approach:
> 1. Offload all fields except mutable and unpredictable ones (e.g., ttl,
>    tos, ip_frag, checksum). Megaflows are disabled.
> 2. Offload only essential fields (e.g., 5-tuple + VLAN IDs + tunnel info
>    etc). Megaflows are disabled.
> 3. Provide configuration options for users to select which fields to
>    offload. Megaflows are disabled.
> 4. Introduce an offloading path from the OVS datapath (kernel) module to
>    integrate megaflows and exact-match offload, addressing short-lived new
>    connection performance. Suggestions on this approach are welcome.
> --

Thanks, Farhat, for submitting the RFC series. I noticed something that might be a mistake in the subject line—it says "patch 1/3,” but only patch 1 was sent.

Regarding your patch, I see that it relies on disabling mega-flow support. However, this isn’t something typically done in a production environment. The ‘ovs-appctl upcall/disable-megaflows’ command was originally introduced for debugging purposes when megaflows were first implemented. Since no one runs or tests with this disabled, it’s generally considered a debug-only command.

While I understand the value of your patch in simplifying hardware offloading of datapath flows, disabling megaflows doesn’t seem like a practical solution for production use. Perhaps it would be beneficial to collaborate with other hardware vendors within the OVS community to find a more viable approach. It might be worth reaching out to other vendors like, Broadcom, Marvell, NVIDIA, etc. on this mailing list.

Cheers,

Eelco

> Co-authored-by: Farhat Ullah <farhat@dreambigsemi.com<mailto:farhat@dreambigsemi.com>>
> Signed-off-by: Farhat Ullah <farhat@dreambigsemi.com<mailto:farhat@dreambigsemi.com>>
> Signed-off-by: Farhan Tariq <ftariq@dreambigsemi.com<mailto:ftariq@dreambigsemi.com>>
>
> ---
>  Documentation/howto/tc-offload.rst |  56 +++++++
>  NEWS                               |   6 +
>  lib/netdev-offload-tc.c            | 236 +++++++++++++++++++----------
>  lib/netdev-offload.c               |   5 +
>  lib/netdev-offload.h               |   2 +
>  ofproto/ofproto-dpif-upcall.c      |   2 +-
>  tests/system-offloads-traffic.at<http://system-offloads-traffic.at>   |  24 +++
>  vswitchd/vswitch.xml               |  19 +++
>  8 files changed, 266 insertions(+), 84 deletions(-)
>
> diff --git a/Documentation/howto/tc-offload.rst b/Documentation/howto/tc-offload.rst
> index ee7f73f8a..7e6e0e9d1 100644
> --- a/Documentation/howto/tc-offload.rst
> +++ b/Documentation/howto/tc-offload.rst
> @@ -37,6 +37,50 @@ The flow hardware offload is disabled by default and can be enabled by::
>  TC flower has one additional configuration option caled ``tc-policy``.
>  For more details see ``man ovs-vswitchd.conf.db``.
>
> +
> +Exact Match Flow Hardware Offload
> +---------------------------------
> +
> +The exact match flow hardware offload is disabled by default and can be
> +enabled by::
> +
> +    $ ovs-appctl upcall/disable-megaflows
> +    $ ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
> +    $ ovs-vsctl set Open_vSwitch . other_config:exact-match-offload=true
> +
> +For more details see ``man ovs-vswitchd.conf.db``.
> +
> +
> +Supported Fields
> +~~~~~~~~~~~~~~~~
> +
> +Not all fields are offloaded in the exact match flow. The list of supported
> +fields is:
> +
> +    ETH src_mac
> +    ETH dst_mac
> +    ETH eth_type
> +    ARP arp_sha
> +    ARP arp_tha
> +    ARP arp_sip
> +    ARP arp_tip
> +    ARP arp_op
> +    MPLS mpls_label
> +    VLAN eth_type
> +    VLAN id
> +    IPv4/v6 src_ip
> +    IPv4/v6 dst_ip
> +    IPv4/v6 ip_proto
> +    ICMPv4/v6 type
> +    ICMPv4/v6 code
> +    TCP/UDP/SCTP dst_port
> +    TCP/UDP/SCTP src_port
> +    TUNNEL enc_key_id
> +    TUNNEL enc_src_ip
> +    TUNNEL enc_dst_ip
> +    TUNNEL enc_dst_port
> +
> +
>  TC Meter Offload
>  ----------------
>
> @@ -123,3 +167,15 @@ might run through a TC rule, which internally will not call the conntrack
>  helper required.
>
>  So if ALG support is required, tc offload must be disabled.
> +
> +Exact Match Flow Offload
> +~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +This section describes the limitations related to the implementation of exact
> +match flow offload.
> +
> +Conntrack 'ct_state' field not offloaded
> +++++++++++++++++++++++++++++++++++++++++
> +
> +Currently, the exact match offloaded flow does not contain the ``ct_state``,
> +field. This will be addressed in a future patch.
> \ No newline at end of file
> diff --git a/NEWS b/NEWS
> index 944c95a8d..37d22630f 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -1,3 +1,9 @@
> +Patch: v3.4.0 - 12 Sep 2024
> +---------------------------
> +  - Linux TC offload:
> +    * Add support for offloading exact-match flows.
> +
> +
>  v3.4.0 - 15 Aug 2024
>  --------------------
>     - Option '--mlockall' now only locks memory pages on fault, if possible.
> diff --git a/lib/netdev-offload-tc.c b/lib/netdev-offload-tc.c
> index 3be1c08d2..fe1334d2c 100644
> --- a/lib/netdev-offload-tc.c
> +++ b/lib/netdev-offload-tc.c
> @@ -32,6 +32,7 @@
>  #include "netdev-linux.h"
>  #include "netdev-offload-provider.h"
>  #include "netdev-provider.h"
> +#include "netdev-offload.h"
>  #include "netdev-vport.h"
>  #include "netlink.h"
>  #include "netlink-socket.h"
> @@ -44,6 +45,8 @@
>
>  VLOG_DEFINE_THIS_MODULE(netdev_offload_tc);
>
> +atomic_bool enable_megaflows = true;
> +
>  static struct vlog_rate_limit error_rl = VLOG_RATE_LIMIT_INIT(60, 5);
>  static struct vlog_rate_limit warn_rl = VLOG_RATE_LIMIT_INIT(10, 2);
>
> @@ -1682,6 +1685,41 @@ is_ipv6_fragment_and_masked(const struct flow *key, const struct flow *mask)
>      return false;
>  }
>
> +/* Check and clear unused masks for exact-match offloads */
> +static int
> +test_key_and_clear_mask(struct match *match)
> +{
> +    const struct flow *key = &match->flow;
> +    struct flow *mask = &match->wc.masks;
> +
> +    for (int i = 1; i < FLOW_MAX_MPLS_LABELS; i++) {
> +        if (key->mpls_lse[i]) {
> +            return EOPNOTSUPP;
> +        }
> +    }
> +
> +    for (int i = 1; i < FLOW_N_REGS; i++) {
> +        if (key->regs[i]) {
> +            return EOPNOTSUPP;
> +        }
> +    }
> +
> +    if (is_ipv6_fragment_and_masked(key, mask)) {
> +        return EOPNOTSUPP;
> +    }
> +
> +    if (key->dp_hash || key->conj_id ||
> +        key->skb_priority || key->actset_output ||
> +        key->packet_type || key->metadata ||
> +        (key->dl_type == htons(ETH_TYPE_IP) &&
> +            key->nw_proto == IPPROTO_IGMP)) {
> +        return EOPNOTSUPP;
> +    }
> +
> +    memset(mask, 0, sizeof *mask);
> +    return 0;
> +}
> +
>  static int
>  test_key_and_mask(struct match *match)
>  {
> @@ -2220,6 +2258,9 @@ netdev_tc_flow_put(struct netdev *netdev, struct match *match,
>      const struct flow_tnl *tnl = &match->flow.tunnel;
>      struct flow_tnl *tnl_mask = &mask->tunnel;
>      struct dpif_flow_stats adjust_stats;
> +    bool exact_match_offload = false;
> +    bool microflow = false;
> +    bool megaflow = false;
>      bool recirc_act = false;
>      uint32_t block_id = 0;
>      struct tcf_id id;
> @@ -2235,6 +2276,11 @@ netdev_tc_flow_put(struct netdev *netdev, struct match *match,
>          return -ifindex;
>      }
>
> +    atomic_read_relaxed(&enable_megaflows, &megaflow);
> +    atomic_read_relaxed(&netdev_exact_match_offload, &microflow);
> +
> +    exact_match_offload = !megaflow & microflow;
> +
>      memset(&flower, 0, sizeof flower);
>
>      chain = key->recirc_id;
> @@ -2253,57 +2299,60 @@ netdev_tc_flow_put(struct netdev *netdev, struct match *match,
>          flower.key.tunnel.ipv4.ipv4_dst = tnl->ip_dst;
>          flower.key.tunnel.ipv6.ipv6_src = tnl->ipv6_src;
>          flower.key.tunnel.ipv6.ipv6_dst = tnl->ipv6_dst;
> -        flower.key.tunnel.tos = tnl->ip_tos;
> -        flower.key.tunnel.ttl = tnl->ip_ttl;
> -        flower.key.tunnel.tp_src = tnl->tp_src;
>          flower.key.tunnel.tp_dst = tnl->tp_dst;
> -        flower.key.tunnel.gbp.id<http://flower.key.tunnel.gbp.id> = tnl->gbp_id;
> -        flower.key.tunnel.gbp.flags = tnl->gbp_flags;
> -        flower.key.tunnel.gbp.id_present = !!tnl_mask->gbp_id;
> -
> +        flower.mask.tunnel.id<http://flower.mask.tunnel.id> = (tnl->flags & FLOW_TNL_F_KEY) ? tnl_mask->tun_id : 0;
>          flower.mask.tunnel.ipv4.ipv4_src = tnl_mask->ip_src;
>          flower.mask.tunnel.ipv4.ipv4_dst = tnl_mask->ip_dst;
>          flower.mask.tunnel.ipv6.ipv6_src = tnl_mask->ipv6_src;
>          flower.mask.tunnel.ipv6.ipv6_dst = tnl_mask->ipv6_dst;
> -        flower.mask.tunnel.tos = tnl_mask->ip_tos;
> -        flower.mask.tunnel.ttl = tnl_mask->ip_ttl;
> -        flower.mask.tunnel.tp_src = tnl_mask->tp_src;
>          /* XXX: We should be setting the mask from 'tnl_mask->tp_dst' here, but
>           * some hardware drivers (mlx5) doesn't support masked matches and will
>           * refuse to offload such flows keeping them in software path.
>           * Degrading the flow down to exact match for now as a workaround. */
>          flower.mask.tunnel.tp_dst = OVS_BE16_MAX;
> -        flower.mask.tunnel.id<http://flower.mask.tunnel.id> = (tnl->flags & FLOW_TNL_F_KEY) ? tnl_mask->tun_id : 0;
> -        flower.mask.tunnel.gbp.id<http://flower.mask.tunnel.gbp.id> = tnl_mask->gbp_id;
> -        flower.mask.tunnel.gbp.flags = tnl_mask->gbp_flags;
> -        flower.mask.tunnel.gbp.id_present = !!tnl_mask->gbp_id;
> -
> -        memset(&tnl_mask->ip_src, 0, sizeof tnl_mask->ip_src);
> -        memset(&tnl_mask->ip_dst, 0, sizeof tnl_mask->ip_dst);
> -        memset(&tnl_mask->ipv6_src, 0, sizeof tnl_mask->ipv6_src);
> -        memset(&tnl_mask->ipv6_dst, 0, sizeof tnl_mask->ipv6_dst);
> -        memset(&tnl_mask->ip_tos, 0, sizeof tnl_mask->ip_tos);
> -        memset(&tnl_mask->ip_ttl, 0, sizeof tnl_mask->ip_ttl);
> -        memset(&tnl_mask->tp_src, 0, sizeof tnl_mask->tp_src);
> -        memset(&tnl_mask->tp_dst, 0, sizeof tnl_mask->tp_dst);
> -
> -        memset(&tnl_mask->tun_id, 0, sizeof tnl_mask->tun_id);
> -        memset(&tnl_mask->gbp_id, 0, sizeof tnl_mask->gbp_id);
> -        memset(&tnl_mask->gbp_flags, 0, sizeof tnl_mask->gbp_flags);
> -        tnl_mask->flags &= ~FLOW_TNL_F_KEY;
> -
> -        /* XXX: This is wrong!  We're ignoring DF and CSUM flags configuration
> -         * requested by the user.  However, TC for now has no way to pass
> -         * these flags in a flower key and their masks are set by default,
> -         * meaning tunnel offloading will not work at all if not cleared.
> -         * Keeping incorrect behavior for now. */
> -        tnl_mask->flags &= ~(FLOW_TNL_F_DONT_FRAGMENT | FLOW_TNL_F_CSUM);
>
> -        if (!strcmp(netdev_get_type(netdev), "geneve")) {
> -            err = flower_match_to_tun_opt(&flower, tnl, tnl_mask);
> -            if (err) {
> -                VLOG_WARN_RL(&warn_rl, "Unable to parse geneve options");
> -                return err;
> +        if (!exact_match_offload) {
> +            flower.key.tunnel.tos = tnl->ip_tos;
> +            flower.key.tunnel.ttl = tnl->ip_ttl;
> +            flower.key.tunnel.tp_src = tnl->tp_src;
> +            flower.key.tunnel.gbp.id<http://flower.key.tunnel.gbp.id> = tnl->gbp_id;
> +            flower.key.tunnel.gbp.flags = tnl->gbp_flags;
> +            flower.key.tunnel.gbp.id_present = !!tnl_mask->gbp_id;
> +
> +            flower.mask.tunnel.tos = tnl_mask->ip_tos;
> +            flower.mask.tunnel.ttl = tnl_mask->ip_ttl;
> +            flower.mask.tunnel.tp_src = tnl_mask->tp_src;
> +            flower.mask.tunnel.gbp.id<http://flower.mask.tunnel.gbp.id> = tnl_mask->gbp_id;
> +            flower.mask.tunnel.gbp.flags = tnl_mask->gbp_flags;
> +            flower.mask.tunnel.gbp.id_present = !!tnl_mask->gbp_id;
> +
> +            memset(&tnl_mask->ip_src, 0, sizeof tnl_mask->ip_src);
> +            memset(&tnl_mask->ip_dst, 0, sizeof tnl_mask->ip_dst);
> +            memset(&tnl_mask->ipv6_src, 0, sizeof tnl_mask->ipv6_src);
> +            memset(&tnl_mask->ipv6_dst, 0, sizeof tnl_mask->ipv6_dst);
> +            memset(&tnl_mask->ip_tos, 0, sizeof tnl_mask->ip_tos);
> +            memset(&tnl_mask->ip_ttl, 0, sizeof tnl_mask->ip_ttl);
> +            memset(&tnl_mask->tp_src, 0, sizeof tnl_mask->tp_src);
> +            memset(&tnl_mask->tp_dst, 0, sizeof tnl_mask->tp_dst);
> +
> +            memset(&tnl_mask->tun_id, 0, sizeof tnl_mask->tun_id);
> +            memset(&tnl_mask->gbp_id, 0, sizeof tnl_mask->gbp_id);
> +            memset(&tnl_mask->gbp_flags, 0, sizeof tnl_mask->gbp_flags);
> +            tnl_mask->flags &= ~FLOW_TNL_F_KEY;
> +
> +            /* XXX: This is wrong!  We're ignoring DF and CSUM flags configuration
> +            * requested by the user.  However, TC for now has no way to pass
> +            * these flags in a flower key and their masks are set by default,
> +            * meaning tunnel offloading will not work at all if not cleared.
> +            * Keeping incorrect behavior for now. */
> +            tnl_mask->flags &= ~(FLOW_TNL_F_DONT_FRAGMENT | FLOW_TNL_F_CSUM);
> +
> +            if (!strcmp(netdev_get_type(netdev), "geneve")) {
> +                err = flower_match_to_tun_opt(&flower, tnl, tnl_mask);
> +                if (err) {
> +                    VLOG_WARN_RL(&warn_rl, "Unable to parse geneve options");
> +                    return err;
> +                }
>              }
>          }
>          flower.tunnel = true;
> @@ -2319,6 +2368,12 @@ netdev_tc_flow_put(struct netdev *netdev, struct match *match,
>          flower.key.mpls_lse = key->mpls_lse[0];
>          flower.mask.mpls_lse = mask->mpls_lse[0];
>          flower.key.encap_eth_type[0] = flower.key.eth_type;
> +
> +        if (exact_match_offload) {
> +            /* Enable mpls_label only */
> +            flower.key.mpls_lse &= ntohl(MPLS_LABEL_MASK);
> +            flower.mask.mpls_lse &= ntohl(MPLS_LABEL_MASK);
> +        }
>      }
>      mask->mpls_lse[0] = 0;
>
> @@ -2342,7 +2397,8 @@ netdev_tc_flow_put(struct netdev *netdev, struct match *match,
>                  flower.mask.vlan_id[0] = vlan_tci_to_vid(mask->vlans[0].tci);
>                  VLOG_DBG_RL(&rl, "vlan_id[0]: %d\n", flower.key.vlan_id[0]);
>              }
> -            if (pcp_mask) {
> +            /* Skip vlan_prio in case of exact-match offload */
> +            if (pcp_mask && !exact_match_offload) {
>                  flower.key.vlan_prio[0] = vlan_tci_to_pcp(key->vlans[0].tci);
>                  flower.mask.vlan_prio[0] = vlan_tci_to_pcp(mask->vlans[0].tci);
>                  VLOG_DBG_RL(&rl, "vlan_prio[0]: %d\n",
> @@ -2377,7 +2433,7 @@ netdev_tc_flow_put(struct netdev *netdev, struct match *match,
>                  flower.mask.vlan_id[1] = vlan_tci_to_vid(mask->vlans[1].tci);
>                  VLOG_DBG_RL(&rl, "vlan_id[1]: %d", flower.key.vlan_id[1]);
>              }
> -            if (pcp_mask) {
> +            if (pcp_mask && !exact_match_offload) {
>                  flower.key.vlan_prio[1] = vlan_tci_to_pcp(key->vlans[1].tci);
>                  flower.mask.vlan_prio[1] = vlan_tci_to_pcp(mask->vlans[1].tci);
>                  VLOG_DBG_RL(&rl, "vlan_prio[1]: %d", flower.key.vlan_prio[1]);
> @@ -2421,44 +2477,34 @@ netdev_tc_flow_put(struct netdev *netdev, struct match *match,
>      }
>
>      if (is_ip_any(key) && !is_ipv6_fragment_and_masked(key, mask)) {
> +
> +        if (key->dl_type == htons(ETH_P_IP)) {
> +            flower.key.ipv4.ipv4_src = key->nw_src;
> +            flower.mask.ipv4.ipv4_src = mask->nw_src;
> +            flower.key.ipv4.ipv4_dst = key->nw_dst;
> +            flower.mask.ipv4.ipv4_dst = mask->nw_dst;
> +            mask->nw_src = 0;
> +            mask->nw_dst = 0;
> +        } else if (key->dl_type == htons(ETH_P_IPV6)) {
> +            flower.key.ipv6.ipv6_src = key->ipv6_src;
> +            flower.mask.ipv6.ipv6_src = mask->ipv6_src;
> +            flower.key.ipv6.ipv6_dst = key->ipv6_dst;
> +            flower.mask.ipv6.ipv6_dst = mask->ipv6_dst;
> +            memset(&mask->ipv6_src, 0, sizeof mask->ipv6_src);
> +            memset(&mask->ipv6_dst, 0, sizeof mask->ipv6_dst);
> +        }
> +
>          flower.key.ip_proto = key->nw_proto;
>          flower.mask.ip_proto = mask->nw_proto;
>          mask->nw_proto = 0;
> -        flower.key.ip_tos = key->nw_tos;
> -        flower.mask.ip_tos = mask->nw_tos;
> -        mask->nw_tos = 0;
> -        flower.key.ip_ttl = key->nw_ttl;
> -        flower.mask.ip_ttl = mask->nw_ttl;
> -        mask->nw_ttl = 0;
> -
> -        if (mask->nw_frag & FLOW_NW_FRAG_ANY) {
> -            flower.mask.flags |= TCA_FLOWER_KEY_FLAGS_IS_FRAGMENT;
> -
> -            if (key->nw_frag & FLOW_NW_FRAG_ANY) {
> -                flower.key.flags |= TCA_FLOWER_KEY_FLAGS_IS_FRAGMENT;
> -
> -                if (mask->nw_frag & FLOW_NW_FRAG_LATER) {
> -                    flower.mask.flags |= TCA_FLOWER_KEY_FLAGS_FRAG_IS_FIRST;
> -
> -                    if (!(key->nw_frag & FLOW_NW_FRAG_LATER)) {
> -                        flower.key.flags |= TCA_FLOWER_KEY_FLAGS_FRAG_IS_FIRST;
> -                    }
> -                }
> -            }
> -
> -            mask->nw_frag = 0;
> -        }
>
>          if (key->nw_proto == IPPROTO_TCP) {
>              flower.key.tcp_dst = key->tp_dst;
>              flower.mask.tcp_dst = mask->tp_dst;
>              flower.key.tcp_src = key->tp_src;
>              flower.mask.tcp_src = mask->tp_src;
> -            flower.key.tcp_flags = key->tcp_flags;
> -            flower.mask.tcp_flags = mask->tcp_flags;
>              mask->tp_src = 0;
>              mask->tp_dst = 0;
> -            mask->tcp_flags = 0;
>          } else if (key->nw_proto == IPPROTO_UDP) {
>              flower.key.udp_dst = key->tp_dst;
>              flower.mask.udp_dst = mask->tp_dst;
> @@ -2483,20 +2529,37 @@ netdev_tc_flow_put(struct netdev *netdev, struct match *match,
>              mask->tp_dst = 0;
>          }
>
> -        if (key->dl_type == htons(ETH_P_IP)) {
> -            flower.key.ipv4.ipv4_src = key->nw_src;
> -            flower.mask.ipv4.ipv4_src = mask->nw_src;
> -            flower.key.ipv4.ipv4_dst = key->nw_dst;
> -            flower.mask.ipv4.ipv4_dst = mask->nw_dst;
> -            mask->nw_src = 0;
> -            mask->nw_dst = 0;
> -        } else if (key->dl_type == htons(ETH_P_IPV6)) {
> -            flower.key.ipv6.ipv6_src = key->ipv6_src;
> -            flower.mask.ipv6.ipv6_src = mask->ipv6_src;
> -            flower.key.ipv6.ipv6_dst = key->ipv6_dst;
> -            flower.mask.ipv6.ipv6_dst = mask->ipv6_dst;
> -            memset(&mask->ipv6_src, 0, sizeof mask->ipv6_src);
> -            memset(&mask->ipv6_dst, 0, sizeof mask->ipv6_dst);
> +        if (!exact_match_offload) {
> +            flower.key.ip_tos = key->nw_tos;
> +            flower.mask.ip_tos = mask->nw_tos;
> +            flower.key.ip_ttl = key->nw_ttl;
> +            flower.mask.ip_ttl = mask->nw_ttl;
> +            mask->nw_tos = 0;
> +            mask->nw_ttl = 0;
> +
> +            if (mask->nw_frag & FLOW_NW_FRAG_ANY) {
> +                flower.mask.flags |= TCA_FLOWER_KEY_FLAGS_IS_FRAGMENT;
> +
> +                if (key->nw_frag & FLOW_NW_FRAG_ANY) {
> +                    flower.key.flags |= TCA_FLOWER_KEY_FLAGS_IS_FRAGMENT;
> +
> +                    if (mask->nw_frag & FLOW_NW_FRAG_LATER) {
> +                        flower.mask.flags |= TCA_FLOWER_KEY_FLAGS_FRAG_IS_FIRST;
> +
> +                        if (!(key->nw_frag & FLOW_NW_FRAG_LATER)) {
> +                            flower.key.flags |= TCA_FLOWER_KEY_FLAGS_FRAG_IS_FIRST;
> +                        }
> +                    }
> +                }
> +
> +                mask->nw_frag = 0;
> +            }
> +
> +            if (key->nw_proto == IPPROTO_TCP) {
> +                flower.key.tcp_flags = key->tcp_flags;
> +                flower.mask.tcp_flags = mask->tcp_flags;
> +                mask->tcp_flags = 0;
> +            }
>          }
>      }
>
> @@ -2507,6 +2570,13 @@ netdev_tc_flow_put(struct netdev *netdev, struct match *match,
>          mask->pkt_mark = 0;
>      }
>
> +    if (exact_match_offload) {
> +        err = test_key_and_clear_mask(match);
> +        if (err) {
> +            return err;
> +        }
> +    }
> +
>      err = test_key_and_mask(match);
>      if (err) {
>          return err;
> diff --git a/lib/netdev-offload.c b/lib/netdev-offload.c
> index 8a9d36555..349abcf9e 100644
> --- a/lib/netdev-offload.c
> +++ b/lib/netdev-offload.c
> @@ -60,6 +60,7 @@ VLOG_DEFINE_THIS_MODULE(netdev_offload);
>
>  static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
>  static bool netdev_flow_api_enabled = false;
> +atomic_bool netdev_exact_match_offload = false;
>
>  #define DEFAULT_OFFLOAD_THREAD_NB 1
>  #define MAX_OFFLOAD_THREAD_NB 10
> @@ -902,6 +903,10 @@ netdev_set_flow_api_enabled(const struct smap *ovs_other_config)
>                  netdev_offload_rebalance_policy = true;
>              }
>
> +            if (smap_get_bool(ovs_other_config, "exact-match-offload", false)) {
> +                atomic_store_relaxed(&netdev_exact_match_offload, true);
> +            }
> +
>              netdev_ports_flow_init();
>
>              ovsthread_once_done(&once);
> diff --git a/lib/netdev-offload.h b/lib/netdev-offload.h
> index 47f8e6f48..004d6feaa 100644
> --- a/lib/netdev-offload.h
> +++ b/lib/netdev-offload.h
> @@ -43,6 +43,8 @@ struct smap;
>  struct sset;
>  struct ovs_action_push_tnl;
>
> +extern atomic_bool enable_megaflows;
> +extern atomic_bool netdev_exact_match_offload;
>
>  /* Offload-capable (HW) netdev information */
>  struct netdev_hw_info {
> diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
> index 4d39bc5a7..72a6a303c 100644
> --- a/ofproto/ofproto-dpif-upcall.c
> +++ b/ofproto/ofproto-dpif-upcall.c
> @@ -23,6 +23,7 @@
>  #include "coverage.h"
>  #include "cmap.h"
>  #include "lib/dpif-provider.h"
> +#include "lib/netdev-offload.h"
>  #include "dpif.h"
>  #include "openvswitch/dynamic-string.h"
>  #include "fail-open.h"
> @@ -446,7 +447,6 @@ static int udpif_flow_unprogram(struct udpif *udpif, struct udpif_key *ukey,
>  static upcall_callback upcall_cb;
>  static dp_purge_callback dp_purge_cb;
>
> -static atomic_bool enable_megaflows = true;
>  static atomic_bool enable_ufid = true;
>
>  void
> diff --git a/tests/system-offloads-traffic.at<http://system-offloads-traffic.at> b/tests/system-offloads-traffic.at<http://system-offloads-traffic.at>
> index d1da33d96..345161f82 100644
> --- a/tests/system-offloads-traffic.at<http://system-offloads-traffic.at>
> +++ b/tests/system-offloads-traffic.at<http://system-offloads-traffic.at>
> @@ -93,6 +93,30 @@ AT_CHECK([ovs-appctl upcall/show | grep -E "offloaded flows : [[1-9]]"], [0], [i
>  OVS_TRAFFIC_VSWITCHD_STOP
>  AT_CLEANUP
>
> +AT_SETUP([offloads - ping between two ports - offloads enabled (exact-match)])
> +OVS_TRAFFIC_VSWITCHD_START([], [], [-- set Open_vSwitch . other_config:hw-offload=true other_config:exact-match-offload=true])
> +   AT_CHECK([ovs-appctl upcall/disable-megaflows], [0], [megaflows disabled
> +], [])
> +
> +AT_CHECK([ovs-appctl vlog/set dpif:dbg dpif_netdev:dbg netdev_offload_tc:file:dbg ofproto_dpif_upcall:DBG dpif_netlink:DBG])
> +
> +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
> +
> +ADD_NAMESPACES(at_ns0, at_ns1)
> +
> +ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24<http://10.1.1.1/24> ")
> +ADD_VETH(p1, at_ns1, br0, "10.1.1.2/24<http://10.1.1.2/24> ")
> +
> +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.1 -w 2 10.1.1.2 | FORMAT_PING], [0], [ignore])
> +
> +AT_CHECK([ovs-appctl dpctl/dump-flows type=tc | grep "eth_type(0x0800)" | strip_stats | DUMP_CLEAN_SORTED], [0], [dnl
> +in_port(2),ct_zone(0),ct_mark(0),ct_label(0),eth(macs),eth_type(0x0800),ipv4(src=10.1.1.1,dst=10.1.1.2,proto=1),icmp(type=8,code=0), packets:0, bytes:0, used:0.001s, actions:output
> +in_port(3),ct_zone(0),ct_mark(0),ct_label(0),eth(macs),eth_type(0x0800),ipv4(src=10.1.1.2,dst=10.1.1.1,proto=1),icmp(type=0,code=0), packets:0, bytes:0, used:0.001s, actions:output
> +])
> +
> +OVS_TRAFFIC_VSWITCHD_STOP
> +AT_CLEANUP
> +
>  AT_SETUP([offloads - set ingress_policing_rate and ingress_policing_burst - offloads disabled])
>  AT_KEYWORDS([ingress_policing])
>  OVS_CHECK_TC_QDISC()
> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
> index 36cb4e495..90844b428 100644
> --- a/vswitchd/vswitch.xml
> +++ b/vswitchd/vswitch.xml
> @@ -254,6 +254,25 @@
>          </p>
>        </column>
>
> +      <column name="other_config" key="exact-match-offload"
> +              type='{"type": "boolean"}'>
> +        <p>
> +          Set this value to <code>true</code> to enable exact match netdev flow
> +          offload.
> +        </p>
> +        <p>
> +          The default value is <code>false</code>.
> +        </p>
> +        <p>
> +          To offload exact match flows, the user needs to enable this
> +          feature along with disabling the megaflows.
> +        </p>
> +        <p>
> +          This is only relevant if
> +          <ref column="other_config" key="hw-offload"/> is enabled.
> +        </p>
> +      </column>
> +
>        <column name="other_config" key="n-offload-threads"
>                type='{"type": "integer", "minInteger": 1, "maxInteger": 10}'>
>          <p>
> --
> 2.27.0
>
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Eelco Chaudron Sept. 27, 2024, 11:28 a.m. UTC | #6
On 25 Sep 2024, at 11:46, Farhat Ullah wrote:

> ________________________________
> From: dev <ovs-dev-bounces@openvswitch.org> on behalf of Eelco Chaudron <echaudro@redhat.com>
> Sent: Friday, September 20, 2024 4:46 PM
> To: Farhan Tariq <FTariq@dreambigsemi.com>
> Cc: ovs-dev@openvswitch.org <ovs-dev@openvswitch.org>
> Subject: Re: [ovs-dev] [RFC PATCH 1/3] exact-match-offload: Introducing Exact-Match HW Offload Support for OVS
>
>
>
> On 12 Sep 2024, at 14:51, Farhan Tariq wrote:
>
>> This patch introduces exact-match offload support in OVS to enhance
>> optimization for hardware datapath. It's being submitted as a RFC for early
>> feedback.
>>
>> Problem Statement:
>> Currently, OVS offloads megaflows using tc-flower. However, hardware
>> support for megaflows has certain limitations:
>> a. TCAMs are used, which aren't scalable and support fewer rules.
>> b. Using hash tables to emulate TCAMs results in complex and slow
>> performance with many tables.
>>
>> Solution:
>> This patch aims to add exact-match offload support to OVS, addressing
>> these issues. This feature is an optional enhancement and is disabled by
>> default.
>>
>> Implementation Details:
>> 1. Added a new flag, "exact-match-offload" to "ovs-vsctl...other_config",
>>    which is disabled by default but can be enabled at runtime. It works only
>>    if megaflows are disabled.
>> 2. Introduced the "enable_exact_match_offload()" API in
>>    "/lib/netdev-offload-tc.c" to zero-out unused masks.
>> 3. Moved "enable_megaflows" from "ofproto/ofproto-dpif-upcall.c" to "lib/-
>>    netdev-offload-tc.c" to make it visible in the "netdev-offload-tc.c"
>>    file. An alternate implementation could be to use a getter api.
>>
>> Testing results:
>> Verified the basic functionality of the patch with different flows.
>>
>> Future Work:
>> 1. Add support for connection tracking.
>> 2. Add support for OVS-DPDK rte_flow.
>>
>> Note: This feature does not affect user OpenFlow tables.
>> ---
>> Potential Issue:
>> This patch might not handle short-lived large numbers of new connections
>> effectively. The last proposal (i.e. proposal #4 listed below) can address
>> this issue. For now this feature is beneficial for many use cases, with
>> further optimization for handling large numbers of short-lived connections
>> planned for future work.
>>
>> Other Possible Solutions:
>> Several approaches to implementing exact-match offload are being
>> considered. Comments are welcome on the best approach:
>> 1. Offload all fields except mutable and unpredictable ones (e.g., ttl,
>>    tos, ip_frag, checksum). Megaflows are disabled.
>> 2. Offload only essential fields (e.g., 5-tuple + VLAN IDs + tunnel info
>>    etc). Megaflows are disabled.
>> 3. Provide configuration options for users to select which fields to
>>    offload. Megaflows are disabled.
>> 4. Introduce an offloading path from the OVS datapath (kernel) module to
>>    integrate megaflows and exact-match offload, addressing short-lived new
>>    connection performance. Suggestions on this approach are welcome.
>> --
>
> Thanks, Eelco,
>
> Thanks, Farhat, for submitting the RFC series. I noticed something that might be a mistake in the subject line—it says "patch 1/3,” but only patch 1 was sent.
> We are planning to submit 2 more patches in the future. I think I misunderstood the use of series of patches.
>
> Regarding your patch, I see that it relies on disabling mega-flow support. However, this isn’t something typically done in a production environment. The ‘ovs-appctl upcall/disable-megaflows’ command was originally introduced for debugging purposes when megaflows were first implemented. Since no one runs or tests with this disabled, it’s generally considered a debug-only command.
>
> While I understand the value of your patch in simplifying hardware offloading of datapath flows, disabling megaflows doesn’t seem like a practical solution for production use. Perhaps it would be beneficial to collaborate with other hardware vendors within the OVS community to find a more viable approach. It might be worth reaching out to other vendors like, Broadcom, Marvell, NVIDIA, etc. on this mailing list.

> What's your take on introducing offload path from megaflows to microflow and microflow to tc-flower in the datapath (kernel) module.

I’m not entirely sure what you're proposing from an OVS architecture perspective, but it seems like it might require a new dpif implementation. Alternatively, we could consider adding this as a type of offload. That said, adding this to the existing netdev-offload-tc.c implementation might lead to a hacky solution, so you need a new type.

However, to support multiple offload types per netdev, a significant architectural change is needed to separate the offload framework. You might find it helpful to refer to the long discussion on the SFlow patch series between Ilya and Chris. I've CCed Ilya in case he wants to provide additional input.

> Regards,
> Farhat
>
> Cheers,
>
> Eelco
>
>> Co-authored-by: Farhat Ullah <farhat@dreambigsemi.com<mailto:farhat@dreambigsemi.com>>
>> Signed-off-by: Farhat Ullah <farhat@dreambigsemi.com<mailto:farhat@dreambigsemi.com>>
>> Signed-off-by: Farhan Tariq <ftariq@dreambigsemi.com<mailto:ftariq@dreambigsemi.com>>
>>
>> ---
>>  Documentation/howto/tc-offload.rst |  56 +++++++
>>  NEWS                               |   6 +
>>  lib/netdev-offload-tc.c            | 236 +++++++++++++++++++----------
>>  lib/netdev-offload.c               |   5 +
>>  lib/netdev-offload.h               |   2 +
>>  ofproto/ofproto-dpif-upcall.c      |   2 +-
>>  tests/system-offloads-traffic.at<http://system-offloads-traffic.at>   |  24 +++
>>  vswitchd/vswitch.xml               |  19 +++
>>  8 files changed, 266 insertions(+), 84 deletions(-)
>>
>> diff --git a/Documentation/howto/tc-offload.rst b/Documentation/howto/tc-offload.rst
>> index ee7f73f8a..7e6e0e9d1 100644
>> --- a/Documentation/howto/tc-offload.rst
>> +++ b/Documentation/howto/tc-offload.rst
>> @@ -37,6 +37,50 @@ The flow hardware offload is disabled by default and can be enabled by::
>>  TC flower has one additional configuration option caled ``tc-policy``.
>>  For more details see ``man ovs-vswitchd.conf.db``.
>>
>> +
>> +Exact Match Flow Hardware Offload
>> +---------------------------------
>> +
>> +The exact match flow hardware offload is disabled by default and can be
>> +enabled by::
>> +
>> +    $ ovs-appctl upcall/disable-megaflows
>> +    $ ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
>> +    $ ovs-vsctl set Open_vSwitch . other_config:exact-match-offload=true
>> +
>> +For more details see ``man ovs-vswitchd.conf.db``.
>> +
>> +
>> +Supported Fields
>> +~~~~~~~~~~~~~~~~
>> +
>> +Not all fields are offloaded in the exact match flow. The list of supported
>> +fields is:
>> +
>> +    ETH src_mac
>> +    ETH dst_mac
>> +    ETH eth_type
>> +    ARP arp_sha
>> +    ARP arp_tha
>> +    ARP arp_sip
>> +    ARP arp_tip
>> +    ARP arp_op
>> +    MPLS mpls_label
>> +    VLAN eth_type
>> +    VLAN id
>> +    IPv4/v6 src_ip
>> +    IPv4/v6 dst_ip
>> +    IPv4/v6 ip_proto
>> +    ICMPv4/v6 type
>> +    ICMPv4/v6 code
>> +    TCP/UDP/SCTP dst_port
>> +    TCP/UDP/SCTP src_port
>> +    TUNNEL enc_key_id
>> +    TUNNEL enc_src_ip
>> +    TUNNEL enc_dst_ip
>> +    TUNNEL enc_dst_port
>> +
>> +
>>  TC Meter Offload
>>  ----------------
>>
>> @@ -123,3 +167,15 @@ might run through a TC rule, which internally will not call the conntrack
>>  helper required.
>>
>>  So if ALG support is required, tc offload must be disabled.
>> +
>> +Exact Match Flow Offload
>> +~~~~~~~~~~~~~~~~~~~~~~~~
>> +
>> +This section describes the limitations related to the implementation of exact
>> +match flow offload.
>> +
>> +Conntrack 'ct_state' field not offloaded
>> +++++++++++++++++++++++++++++++++++++++++
>> +
>> +Currently, the exact match offloaded flow does not contain the ``ct_state``,
>> +field. This will be addressed in a future patch.
>> \ No newline at end of file
>> diff --git a/NEWS b/NEWS
>> index 944c95a8d..37d22630f 100644
>> --- a/NEWS
>> +++ b/NEWS
>> @@ -1,3 +1,9 @@
>> +Patch: v3.4.0 - 12 Sep 2024
>> +---------------------------
>> +  - Linux TC offload:
>> +    * Add support for offloading exact-match flows.
>> +
>> +
>>  v3.4.0 - 15 Aug 2024
>>  --------------------
>>     - Option '--mlockall' now only locks memory pages on fault, if possible.
>> diff --git a/lib/netdev-offload-tc.c b/lib/netdev-offload-tc.c
>> index 3be1c08d2..fe1334d2c 100644
>> --- a/lib/netdev-offload-tc.c
>> +++ b/lib/netdev-offload-tc.c
>> @@ -32,6 +32,7 @@
>>  #include "netdev-linux.h"
>>  #include "netdev-offload-provider.h"
>>  #include "netdev-provider.h"
>> +#include "netdev-offload.h"
>>  #include "netdev-vport.h"
>>  #include "netlink.h"
>>  #include "netlink-socket.h"
>> @@ -44,6 +45,8 @@
>>
>>  VLOG_DEFINE_THIS_MODULE(netdev_offload_tc);
>>
>> +atomic_bool enable_megaflows = true;
>> +
>>  static struct vlog_rate_limit error_rl = VLOG_RATE_LIMIT_INIT(60, 5);
>>  static struct vlog_rate_limit warn_rl = VLOG_RATE_LIMIT_INIT(10, 2);
>>
>> @@ -1682,6 +1685,41 @@ is_ipv6_fragment_and_masked(const struct flow *key, const struct flow *mask)
>>      return false;
>>  }
>>
>> +/* Check and clear unused masks for exact-match offloads */
>> +static int
>> +test_key_and_clear_mask(struct match *match)
>> +{
>> +    const struct flow *key = &match->flow;
>> +    struct flow *mask = &match->wc.masks;
>> +
>> +    for (int i = 1; i < FLOW_MAX_MPLS_LABELS; i++) {
>> +        if (key->mpls_lse[i]) {
>> +            return EOPNOTSUPP;
>> +        }
>> +    }
>> +
>> +    for (int i = 1; i < FLOW_N_REGS; i++) {
>> +        if (key->regs[i]) {
>> +            return EOPNOTSUPP;
>> +        }
>> +    }
>> +
>> +    if (is_ipv6_fragment_and_masked(key, mask)) {
>> +        return EOPNOTSUPP;
>> +    }
>> +
>> +    if (key->dp_hash || key->conj_id ||
>> +        key->skb_priority || key->actset_output ||
>> +        key->packet_type || key->metadata ||
>> +        (key->dl_type == htons(ETH_TYPE_IP) &&
>> +            key->nw_proto == IPPROTO_IGMP)) {
>> +        return EOPNOTSUPP;
>> +    }
>> +
>> +    memset(mask, 0, sizeof *mask);
>> +    return 0;
>> +}
>> +
>>  static int
>>  test_key_and_mask(struct match *match)
>>  {
>> @@ -2220,6 +2258,9 @@ netdev_tc_flow_put(struct netdev *netdev, struct match *match,
>>      const struct flow_tnl *tnl = &match->flow.tunnel;
>>      struct flow_tnl *tnl_mask = &mask->tunnel;
>>      struct dpif_flow_stats adjust_stats;
>> +    bool exact_match_offload = false;
>> +    bool microflow = false;
>> +    bool megaflow = false;
>>      bool recirc_act = false;
>>      uint32_t block_id = 0;
>>      struct tcf_id id;
>> @@ -2235,6 +2276,11 @@ netdev_tc_flow_put(struct netdev *netdev, struct match *match,
>>          return -ifindex;
>>      }
>>
>> +    atomic_read_relaxed(&enable_megaflows, &megaflow);
>> +    atomic_read_relaxed(&netdev_exact_match_offload, &microflow);
>> +
>> +    exact_match_offload = !megaflow & microflow;
>> +
>>      memset(&flower, 0, sizeof flower);
>>
>>      chain = key->recirc_id;
>> @@ -2253,57 +2299,60 @@ netdev_tc_flow_put(struct netdev *netdev, struct match *match,
>>          flower.key.tunnel.ipv4.ipv4_dst = tnl->ip_dst;
>>          flower.key.tunnel.ipv6.ipv6_src = tnl->ipv6_src;
>>          flower.key.tunnel.ipv6.ipv6_dst = tnl->ipv6_dst;
>> -        flower.key.tunnel.tos = tnl->ip_tos;
>> -        flower.key.tunnel.ttl = tnl->ip_ttl;
>> -        flower.key.tunnel.tp_src = tnl->tp_src;
>>          flower.key.tunnel.tp_dst = tnl->tp_dst;
>> -        flower.key.tunnel.gbp.id<http://flower.key.tunnel.gbp.id> = tnl->gbp_id;
>> -        flower.key.tunnel.gbp.flags = tnl->gbp_flags;
>> -        flower.key.tunnel.gbp.id_present = !!tnl_mask->gbp_id;
>> -
>> +        flower.mask.tunnel.id<http://flower.mask.tunnel.id> = (tnl->flags & FLOW_TNL_F_KEY) ? tnl_mask->tun_id : 0;
>>          flower.mask.tunnel.ipv4.ipv4_src = tnl_mask->ip_src;
>>          flower.mask.tunnel.ipv4.ipv4_dst = tnl_mask->ip_dst;
>>          flower.mask.tunnel.ipv6.ipv6_src = tnl_mask->ipv6_src;
>>          flower.mask.tunnel.ipv6.ipv6_dst = tnl_mask->ipv6_dst;
>> -        flower.mask.tunnel.tos = tnl_mask->ip_tos;
>> -        flower.mask.tunnel.ttl = tnl_mask->ip_ttl;
>> -        flower.mask.tunnel.tp_src = tnl_mask->tp_src;
>>          /* XXX: We should be setting the mask from 'tnl_mask->tp_dst' here, but
>>           * some hardware drivers (mlx5) doesn't support masked matches and will
>>           * refuse to offload such flows keeping them in software path.
>>           * Degrading the flow down to exact match for now as a workaround. */
>>          flower.mask.tunnel.tp_dst = OVS_BE16_MAX;
>> -        flower.mask.tunnel.id<http://flower.mask.tunnel.id> = (tnl->flags & FLOW_TNL_F_KEY) ? tnl_mask->tun_id : 0;
>> -        flower.mask.tunnel.gbp.id<http://flower.mask.tunnel.gbp.id> = tnl_mask->gbp_id;
>> -        flower.mask.tunnel.gbp.flags = tnl_mask->gbp_flags;
>> -        flower.mask.tunnel.gbp.id_present = !!tnl_mask->gbp_id;
>> -
>> -        memset(&tnl_mask->ip_src, 0, sizeof tnl_mask->ip_src);
>> -        memset(&tnl_mask->ip_dst, 0, sizeof tnl_mask->ip_dst);
>> -        memset(&tnl_mask->ipv6_src, 0, sizeof tnl_mask->ipv6_src);
>> -        memset(&tnl_mask->ipv6_dst, 0, sizeof tnl_mask->ipv6_dst);
>> -        memset(&tnl_mask->ip_tos, 0, sizeof tnl_mask->ip_tos);
>> -        memset(&tnl_mask->ip_ttl, 0, sizeof tnl_mask->ip_ttl);
>> -        memset(&tnl_mask->tp_src, 0, sizeof tnl_mask->tp_src);
>> -        memset(&tnl_mask->tp_dst, 0, sizeof tnl_mask->tp_dst);
>> -
>> -        memset(&tnl_mask->tun_id, 0, sizeof tnl_mask->tun_id);
>> -        memset(&tnl_mask->gbp_id, 0, sizeof tnl_mask->gbp_id);
>> -        memset(&tnl_mask->gbp_flags, 0, sizeof tnl_mask->gbp_flags);
>> -        tnl_mask->flags &= ~FLOW_TNL_F_KEY;
>> -
>> -        /* XXX: This is wrong!  We're ignoring DF and CSUM flags configuration
>> -         * requested by the user.  However, TC for now has no way to pass
>> -         * these flags in a flower key and their masks are set by default,
>> -         * meaning tunnel offloading will not work at all if not cleared.
>> -         * Keeping incorrect behavior for now. */
>> -        tnl_mask->flags &= ~(FLOW_TNL_F_DONT_FRAGMENT | FLOW_TNL_F_CSUM);
>>
>> -        if (!strcmp(netdev_get_type(netdev), "geneve")) {
>> -            err = flower_match_to_tun_opt(&flower, tnl, tnl_mask);
>> -            if (err) {
>> -                VLOG_WARN_RL(&warn_rl, "Unable to parse geneve options");
>> -                return err;
>> +        if (!exact_match_offload) {
>> +            flower.key.tunnel.tos = tnl->ip_tos;
>> +            flower.key.tunnel.ttl = tnl->ip_ttl;
>> +            flower.key.tunnel.tp_src = tnl->tp_src;
>> +            flower.key.tunnel.gbp.id<http://flower.key.tunnel.gbp.id> = tnl->gbp_id;
>> +            flower.key.tunnel.gbp.flags = tnl->gbp_flags;
>> +            flower.key.tunnel.gbp.id_present = !!tnl_mask->gbp_id;
>> +
>> +            flower.mask.tunnel.tos = tnl_mask->ip_tos;
>> +            flower.mask.tunnel.ttl = tnl_mask->ip_ttl;
>> +            flower.mask.tunnel.tp_src = tnl_mask->tp_src;
>> +            flower.mask.tunnel.gbp.id<http://flower.mask.tunnel.gbp.id> = tnl_mask->gbp_id;
>> +            flower.mask.tunnel.gbp.flags = tnl_mask->gbp_flags;
>> +            flower.mask.tunnel.gbp.id_present = !!tnl_mask->gbp_id;
>> +
>> +            memset(&tnl_mask->ip_src, 0, sizeof tnl_mask->ip_src);
>> +            memset(&tnl_mask->ip_dst, 0, sizeof tnl_mask->ip_dst);
>> +            memset(&tnl_mask->ipv6_src, 0, sizeof tnl_mask->ipv6_src);
>> +            memset(&tnl_mask->ipv6_dst, 0, sizeof tnl_mask->ipv6_dst);
>> +            memset(&tnl_mask->ip_tos, 0, sizeof tnl_mask->ip_tos);
>> +            memset(&tnl_mask->ip_ttl, 0, sizeof tnl_mask->ip_ttl);
>> +            memset(&tnl_mask->tp_src, 0, sizeof tnl_mask->tp_src);
>> +            memset(&tnl_mask->tp_dst, 0, sizeof tnl_mask->tp_dst);
>> +
>> +            memset(&tnl_mask->tun_id, 0, sizeof tnl_mask->tun_id);
>> +            memset(&tnl_mask->gbp_id, 0, sizeof tnl_mask->gbp_id);
>> +            memset(&tnl_mask->gbp_flags, 0, sizeof tnl_mask->gbp_flags);
>> +            tnl_mask->flags &= ~FLOW_TNL_F_KEY;
>> +
>> +            /* XXX: This is wrong!  We're ignoring DF and CSUM flags configuration
>> +            * requested by the user.  However, TC for now has no way to pass
>> +            * these flags in a flower key and their masks are set by default,
>> +            * meaning tunnel offloading will not work at all if not cleared.
>> +            * Keeping incorrect behavior for now. */
>> +            tnl_mask->flags &= ~(FLOW_TNL_F_DONT_FRAGMENT | FLOW_TNL_F_CSUM);
>> +
>> +            if (!strcmp(netdev_get_type(netdev), "geneve")) {
>> +                err = flower_match_to_tun_opt(&flower, tnl, tnl_mask);
>> +                if (err) {
>> +                    VLOG_WARN_RL(&warn_rl, "Unable to parse geneve options");
>> +                    return err;
>> +                }
>>              }
>>          }
>>          flower.tunnel = true;
>> @@ -2319,6 +2368,12 @@ netdev_tc_flow_put(struct netdev *netdev, struct match *match,
>>          flower.key.mpls_lse = key->mpls_lse[0];
>>          flower.mask.mpls_lse = mask->mpls_lse[0];
>>          flower.key.encap_eth_type[0] = flower.key.eth_type;
>> +
>> +        if (exact_match_offload) {
>> +            /* Enable mpls_label only */
>> +            flower.key.mpls_lse &= ntohl(MPLS_LABEL_MASK);
>> +            flower.mask.mpls_lse &= ntohl(MPLS_LABEL_MASK);
>> +        }
>>      }
>>      mask->mpls_lse[0] = 0;
>>
>> @@ -2342,7 +2397,8 @@ netdev_tc_flow_put(struct netdev *netdev, struct match *match,
>>                  flower.mask.vlan_id[0] = vlan_tci_to_vid(mask->vlans[0].tci);
>>                  VLOG_DBG_RL(&rl, "vlan_id[0]: %d\n", flower.key.vlan_id[0]);
>>              }
>> -            if (pcp_mask) {
>> +            /* Skip vlan_prio in case of exact-match offload */
>> +            if (pcp_mask && !exact_match_offload) {
>>                  flower.key.vlan_prio[0] = vlan_tci_to_pcp(key->vlans[0].tci);
>>                  flower.mask.vlan_prio[0] = vlan_tci_to_pcp(mask->vlans[0].tci);
>>                  VLOG_DBG_RL(&rl, "vlan_prio[0]: %d\n",
>> @@ -2377,7 +2433,7 @@ netdev_tc_flow_put(struct netdev *netdev, struct match *match,
>>                  flower.mask.vlan_id[1] = vlan_tci_to_vid(mask->vlans[1].tci);
>>                  VLOG_DBG_RL(&rl, "vlan_id[1]: %d", flower.key.vlan_id[1]);
>>              }
>> -            if (pcp_mask) {
>> +            if (pcp_mask && !exact_match_offload) {
>>                  flower.key.vlan_prio[1] = vlan_tci_to_pcp(key->vlans[1].tci);
>>                  flower.mask.vlan_prio[1] = vlan_tci_to_pcp(mask->vlans[1].tci);
>>                  VLOG_DBG_RL(&rl, "vlan_prio[1]: %d", flower.key.vlan_prio[1]);
>> @@ -2421,44 +2477,34 @@ netdev_tc_flow_put(struct netdev *netdev, struct match *match,
>>      }
>>
>>      if (is_ip_any(key) && !is_ipv6_fragment_and_masked(key, mask)) {
>> +
>> +        if (key->dl_type == htons(ETH_P_IP)) {
>> +            flower.key.ipv4.ipv4_src = key->nw_src;
>> +            flower.mask.ipv4.ipv4_src = mask->nw_src;
>> +            flower.key.ipv4.ipv4_dst = key->nw_dst;
>> +            flower.mask.ipv4.ipv4_dst = mask->nw_dst;
>> +            mask->nw_src = 0;
>> +            mask->nw_dst = 0;
>> +        } else if (key->dl_type == htons(ETH_P_IPV6)) {
>> +            flower.key.ipv6.ipv6_src = key->ipv6_src;
>> +            flower.mask.ipv6.ipv6_src = mask->ipv6_src;
>> +            flower.key.ipv6.ipv6_dst = key->ipv6_dst;
>> +            flower.mask.ipv6.ipv6_dst = mask->ipv6_dst;
>> +            memset(&mask->ipv6_src, 0, sizeof mask->ipv6_src);
>> +            memset(&mask->ipv6_dst, 0, sizeof mask->ipv6_dst);
>> +        }
>> +
>>          flower.key.ip_proto = key->nw_proto;
>>          flower.mask.ip_proto = mask->nw_proto;
>>          mask->nw_proto = 0;
>> -        flower.key.ip_tos = key->nw_tos;
>> -        flower.mask.ip_tos = mask->nw_tos;
>> -        mask->nw_tos = 0;
>> -        flower.key.ip_ttl = key->nw_ttl;
>> -        flower.mask.ip_ttl = mask->nw_ttl;
>> -        mask->nw_ttl = 0;
>> -
>> -        if (mask->nw_frag & FLOW_NW_FRAG_ANY) {
>> -            flower.mask.flags |= TCA_FLOWER_KEY_FLAGS_IS_FRAGMENT;
>> -
>> -            if (key->nw_frag & FLOW_NW_FRAG_ANY) {
>> -                flower.key.flags |= TCA_FLOWER_KEY_FLAGS_IS_FRAGMENT;
>> -
>> -                if (mask->nw_frag & FLOW_NW_FRAG_LATER) {
>> -                    flower.mask.flags |= TCA_FLOWER_KEY_FLAGS_FRAG_IS_FIRST;
>> -
>> -                    if (!(key->nw_frag & FLOW_NW_FRAG_LATER)) {
>> -                        flower.key.flags |= TCA_FLOWER_KEY_FLAGS_FRAG_IS_FIRST;
>> -                    }
>> -                }
>> -            }
>> -
>> -            mask->nw_frag = 0;
>> -        }
>>
>>          if (key->nw_proto == IPPROTO_TCP) {
>>              flower.key.tcp_dst = key->tp_dst;
>>              flower.mask.tcp_dst = mask->tp_dst;
>>              flower.key.tcp_src = key->tp_src;
>>              flower.mask.tcp_src = mask->tp_src;
>> -            flower.key.tcp_flags = key->tcp_flags;
>> -            flower.mask.tcp_flags = mask->tcp_flags;
>>              mask->tp_src = 0;
>>              mask->tp_dst = 0;
>> -            mask->tcp_flags = 0;
>>          } else if (key->nw_proto == IPPROTO_UDP) {
>>              flower.key.udp_dst = key->tp_dst;
>>              flower.mask.udp_dst = mask->tp_dst;
>> @@ -2483,20 +2529,37 @@ netdev_tc_flow_put(struct netdev *netdev, struct match *match,
>>              mask->tp_dst = 0;
>>          }
>>
>> -        if (key->dl_type == htons(ETH_P_IP)) {
>> -            flower.key.ipv4.ipv4_src = key->nw_src;
>> -            flower.mask.ipv4.ipv4_src = mask->nw_src;
>> -            flower.key.ipv4.ipv4_dst = key->nw_dst;
>> -            flower.mask.ipv4.ipv4_dst = mask->nw_dst;
>> -            mask->nw_src = 0;
>> -            mask->nw_dst = 0;
>> -        } else if (key->dl_type == htons(ETH_P_IPV6)) {
>> -            flower.key.ipv6.ipv6_src = key->ipv6_src;
>> -            flower.mask.ipv6.ipv6_src = mask->ipv6_src;
>> -            flower.key.ipv6.ipv6_dst = key->ipv6_dst;
>> -            flower.mask.ipv6.ipv6_dst = mask->ipv6_dst;
>> -            memset(&mask->ipv6_src, 0, sizeof mask->ipv6_src);
>> -            memset(&mask->ipv6_dst, 0, sizeof mask->ipv6_dst);
>> +        if (!exact_match_offload) {
>> +            flower.key.ip_tos = key->nw_tos;
>> +            flower.mask.ip_tos = mask->nw_tos;
>> +            flower.key.ip_ttl = key->nw_ttl;
>> +            flower.mask.ip_ttl = mask->nw_ttl;
>> +            mask->nw_tos = 0;
>> +            mask->nw_ttl = 0;
>> +
>> +            if (mask->nw_frag & FLOW_NW_FRAG_ANY) {
>> +                flower.mask.flags |= TCA_FLOWER_KEY_FLAGS_IS_FRAGMENT;
>> +
>> +                if (key->nw_frag & FLOW_NW_FRAG_ANY) {
>> +                    flower.key.flags |= TCA_FLOWER_KEY_FLAGS_IS_FRAGMENT;
>> +
>> +                    if (mask->nw_frag & FLOW_NW_FRAG_LATER) {
>> +                        flower.mask.flags |= TCA_FLOWER_KEY_FLAGS_FRAG_IS_FIRST;
>> +
>> +                        if (!(key->nw_frag & FLOW_NW_FRAG_LATER)) {
>> +                            flower.key.flags |= TCA_FLOWER_KEY_FLAGS_FRAG_IS_FIRST;
>> +                        }
>> +                    }
>> +                }
>> +
>> +                mask->nw_frag = 0;
>> +            }
>> +
>> +            if (key->nw_proto == IPPROTO_TCP) {
>> +                flower.key.tcp_flags = key->tcp_flags;
>> +                flower.mask.tcp_flags = mask->tcp_flags;
>> +                mask->tcp_flags = 0;
>> +            }
>>          }
>>      }
>>
>> @@ -2507,6 +2570,13 @@ netdev_tc_flow_put(struct netdev *netdev, struct match *match,
>>          mask->pkt_mark = 0;
>>      }
>>
>> +    if (exact_match_offload) {
>> +        err = test_key_and_clear_mask(match);
>> +        if (err) {
>> +            return err;
>> +        }
>> +    }
>> +
>>      err = test_key_and_mask(match);
>>      if (err) {
>>          return err;
>> diff --git a/lib/netdev-offload.c b/lib/netdev-offload.c
>> index 8a9d36555..349abcf9e 100644
>> --- a/lib/netdev-offload.c
>> +++ b/lib/netdev-offload.c
>> @@ -60,6 +60,7 @@ VLOG_DEFINE_THIS_MODULE(netdev_offload);
>>
>>  static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
>>  static bool netdev_flow_api_enabled = false;
>> +atomic_bool netdev_exact_match_offload = false;
>>
>>  #define DEFAULT_OFFLOAD_THREAD_NB 1
>>  #define MAX_OFFLOAD_THREAD_NB 10
>> @@ -902,6 +903,10 @@ netdev_set_flow_api_enabled(const struct smap *ovs_other_config)
>>                  netdev_offload_rebalance_policy = true;
>>              }
>>
>> +            if (smap_get_bool(ovs_other_config, "exact-match-offload", false)) {
>> +                atomic_store_relaxed(&netdev_exact_match_offload, true);
>> +            }
>> +
>>              netdev_ports_flow_init();
>>
>>              ovsthread_once_done(&once);
>> diff --git a/lib/netdev-offload.h b/lib/netdev-offload.h
>> index 47f8e6f48..004d6feaa 100644
>> --- a/lib/netdev-offload.h
>> +++ b/lib/netdev-offload.h
>> @@ -43,6 +43,8 @@ struct smap;
>>  struct sset;
>>  struct ovs_action_push_tnl;
>>
>> +extern atomic_bool enable_megaflows;
>> +extern atomic_bool netdev_exact_match_offload;
>>
>>  /* Offload-capable (HW) netdev information */
>>  struct netdev_hw_info {
>> diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
>> index 4d39bc5a7..72a6a303c 100644
>> --- a/ofproto/ofproto-dpif-upcall.c
>> +++ b/ofproto/ofproto-dpif-upcall.c
>> @@ -23,6 +23,7 @@
>>  #include "coverage.h"
>>  #include "cmap.h"
>>  #include "lib/dpif-provider.h"
>> +#include "lib/netdev-offload.h"
>>  #include "dpif.h"
>>  #include "openvswitch/dynamic-string.h"
>>  #include "fail-open.h"
>> @@ -446,7 +447,6 @@ static int udpif_flow_unprogram(struct udpif *udpif, struct udpif_key *ukey,
>>  static upcall_callback upcall_cb;
>>  static dp_purge_callback dp_purge_cb;
>>
>> -static atomic_bool enable_megaflows = true;
>>  static atomic_bool enable_ufid = true;
>>
>>  void
>> diff --git a/tests/system-offloads-traffic.at<http://system-offloads-traffic.at> b/tests/system-offloads-traffic.at<http://system-offloads-traffic.at>
>> index d1da33d96..345161f82 100644
>> --- a/tests/system-offloads-traffic.at<http://system-offloads-traffic.at>
>> +++ b/tests/system-offloads-traffic.at<http://system-offloads-traffic.at>
>> @@ -93,6 +93,30 @@ AT_CHECK([ovs-appctl upcall/show | grep -E "offloaded flows : [[1-9]]"], [0], [i
>>  OVS_TRAFFIC_VSWITCHD_STOP
>>  AT_CLEANUP
>>
>> +AT_SETUP([offloads - ping between two ports - offloads enabled (exact-match)])
>> +OVS_TRAFFIC_VSWITCHD_START([], [], [-- set Open_vSwitch . other_config:hw-offload=true other_config:exact-match-offload=true])
>> +   AT_CHECK([ovs-appctl upcall/disable-megaflows], [0], [megaflows disabled
>> +], [])
>> +
>> +AT_CHECK([ovs-appctl vlog/set dpif:dbg dpif_netdev:dbg netdev_offload_tc:file:dbg ofproto_dpif_upcall:DBG dpif_netlink:DBG])
>> +
>> +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
>> +
>> +ADD_NAMESPACES(at_ns0, at_ns1)
>> +
>> +ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24<http://10.1.1.1/24> ")
>> +ADD_VETH(p1, at_ns1, br0, "10.1.1.2/24<http://10.1.1.2/24> ")
>> +
>> +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.1 -w 2 10.1.1.2 | FORMAT_PING], [0], [ignore])
>> +
>> +AT_CHECK([ovs-appctl dpctl/dump-flows type=tc | grep "eth_type(0x0800)" | strip_stats | DUMP_CLEAN_SORTED], [0], [dnl
>> +in_port(2),ct_zone(0),ct_mark(0),ct_label(0),eth(macs),eth_type(0x0800),ipv4(src=10.1.1.1,dst=10.1.1.2,proto=1),icmp(type=8,code=0), packets:0, bytes:0, used:0.001s, actions:output
>> +in_port(3),ct_zone(0),ct_mark(0),ct_label(0),eth(macs),eth_type(0x0800),ipv4(src=10.1.1.2,dst=10.1.1.1,proto=1),icmp(type=0,code=0), packets:0, bytes:0, used:0.001s, actions:output
>> +])
>> +
>> +OVS_TRAFFIC_VSWITCHD_STOP
>> +AT_CLEANUP
>> +
>>  AT_SETUP([offloads - set ingress_policing_rate and ingress_policing_burst - offloads disabled])
>>  AT_KEYWORDS([ingress_policing])
>>  OVS_CHECK_TC_QDISC()
>> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
>> index 36cb4e495..90844b428 100644
>> --- a/vswitchd/vswitch.xml
>> +++ b/vswitchd/vswitch.xml
>> @@ -254,6 +254,25 @@
>>          </p>
>>        </column>
>>
>> +      <column name="other_config" key="exact-match-offload"
>> +              type='{"type": "boolean"}'>
>> +        <p>
>> +          Set this value to <code>true</code> to enable exact match netdev flow
>> +          offload.
>> +        </p>
>> +        <p>
>> +          The default value is <code>false</code>.
>> +        </p>
>> +        <p>
>> +          To offload exact match flows, the user needs to enable this
>> +          feature along with disabling the megaflows.
>> +        </p>
>> +        <p>
>> +          This is only relevant if
>> +          <ref column="other_config" key="hw-offload"/> is enabled.
>> +        </p>
>> +      </column>
>> +
>>        <column name="other_config" key="n-offload-threads"
>>                type='{"type": "integer", "minInteger": 1, "maxInteger": 10}'>
>>          <p>
>> --
>> 2.27.0
>>
>> _______________________________________________
>> dev mailing list
>> dev@openvswitch.org
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
diff mbox series

Patch

diff --git a/Documentation/howto/tc-offload.rst b/Documentation/howto/tc-offload.rst
index ee7f73f8a..7e6e0e9d1 100644
--- a/Documentation/howto/tc-offload.rst
+++ b/Documentation/howto/tc-offload.rst
@@ -37,6 +37,50 @@  The flow hardware offload is disabled by default and can be enabled by::
 TC flower has one additional configuration option caled ``tc-policy``.
 For more details see ``man ovs-vswitchd.conf.db``.

+
+Exact Match Flow Hardware Offload
+---------------------------------
+
+The exact match flow hardware offload is disabled by default and can be
+enabled by::
+
+    $ ovs-appctl upcall/disable-megaflows
+    $ ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
+    $ ovs-vsctl set Open_vSwitch . other_config:exact-match-offload=true
+
+For more details see ``man ovs-vswitchd.conf.db``.
+
+
+Supported Fields
+~~~~~~~~~~~~~~~~
+
+Not all fields are offloaded in the exact match flow. The list of supported
+fields is:
+
+    ETH src_mac
+    ETH dst_mac
+    ETH eth_type
+    ARP arp_sha
+    ARP arp_tha
+    ARP arp_sip
+    ARP arp_tip
+    ARP arp_op
+    MPLS mpls_label
+    VLAN eth_type
+    VLAN id
+    IPv4/v6 src_ip
+    IPv4/v6 dst_ip
+    IPv4/v6 ip_proto
+    ICMPv4/v6 type
+    ICMPv4/v6 code
+    TCP/UDP/SCTP dst_port
+    TCP/UDP/SCTP src_port
+    TUNNEL enc_key_id
+    TUNNEL enc_src_ip
+    TUNNEL enc_dst_ip
+    TUNNEL enc_dst_port
+
+
 TC Meter Offload
 ----------------

@@ -123,3 +167,15 @@  might run through a TC rule, which internally will not call the conntrack
 helper required.

 So if ALG support is required, tc offload must be disabled.
+
+Exact Match Flow Offload
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+This section describes the limitations related to the implementation of exact
+match flow offload.
+
+Conntrack 'ct_state' field not offloaded
+++++++++++++++++++++++++++++++++++++++++
+
+Currently, the exact match offloaded flow does not contain the ``ct_state``,
+field. This will be addressed in a future patch.
\ No newline at end of file
diff --git a/NEWS b/NEWS
index 944c95a8d..37d22630f 100644
--- a/NEWS
+++ b/NEWS
@@ -1,3 +1,9 @@ 
+Patch: v3.4.0 - 12 Sep 2024
+---------------------------
+  - Linux TC offload:
+    * Add support for offloading exact-match flows.
+
+
 v3.4.0 - 15 Aug 2024
 --------------------
    - Option '--mlockall' now only locks memory pages on fault, if possible.
diff --git a/lib/netdev-offload-tc.c b/lib/netdev-offload-tc.c
index 3be1c08d2..fe1334d2c 100644
--- a/lib/netdev-offload-tc.c
+++ b/lib/netdev-offload-tc.c
@@ -32,6 +32,7 @@ 
 #include "netdev-linux.h"
 #include "netdev-offload-provider.h"
 #include "netdev-provider.h"
+#include "netdev-offload.h"
 #include "netdev-vport.h"
 #include "netlink.h"
 #include "netlink-socket.h"
@@ -44,6 +45,8 @@ 

 VLOG_DEFINE_THIS_MODULE(netdev_offload_tc);

+atomic_bool enable_megaflows = true;
+
 static struct vlog_rate_limit error_rl = VLOG_RATE_LIMIT_INIT(60, 5);
 static struct vlog_rate_limit warn_rl = VLOG_RATE_LIMIT_INIT(10, 2);

@@ -1682,6 +1685,41 @@  is_ipv6_fragment_and_masked(const struct flow *key, const struct flow *mask)
     return false;
 }

+/* Check and clear unused masks for exact-match offloads */
+static int
+test_key_and_clear_mask(struct match *match)
+{
+    const struct flow *key = &match->flow;
+    struct flow *mask = &match->wc.masks;
+
+    for (int i = 1; i < FLOW_MAX_MPLS_LABELS; i++) {
+        if (key->mpls_lse[i]) {
+            return EOPNOTSUPP;
+        }
+    }
+
+    for (int i = 1; i < FLOW_N_REGS; i++) {
+        if (key->regs[i]) {
+            return EOPNOTSUPP;
+        }
+    }
+
+    if (is_ipv6_fragment_and_masked(key, mask)) {
+        return EOPNOTSUPP;
+    }
+
+    if (key->dp_hash || key->conj_id ||
+        key->skb_priority || key->actset_output ||
+        key->packet_type || key->metadata ||
+        (key->dl_type == htons(ETH_TYPE_IP) &&
+            key->nw_proto == IPPROTO_IGMP)) {
+        return EOPNOTSUPP;
+    }
+
+    memset(mask, 0, sizeof *mask);
+    return 0;
+}
+
 static int
 test_key_and_mask(struct match *match)
 {
@@ -2220,6 +2258,9 @@  netdev_tc_flow_put(struct netdev *netdev, struct match *match,
     const struct flow_tnl *tnl = &match->flow.tunnel;
     struct flow_tnl *tnl_mask = &mask->tunnel;
     struct dpif_flow_stats adjust_stats;
+    bool exact_match_offload = false;
+    bool microflow = false;
+    bool megaflow = false;
     bool recirc_act = false;
     uint32_t block_id = 0;
     struct tcf_id id;
@@ -2235,6 +2276,11 @@  netdev_tc_flow_put(struct netdev *netdev, struct match *match,
         return -ifindex;
     }

+    atomic_read_relaxed(&enable_megaflows, &megaflow);
+    atomic_read_relaxed(&netdev_exact_match_offload, &microflow);
+
+    exact_match_offload = !megaflow & microflow;
+
     memset(&flower, 0, sizeof flower);

     chain = key->recirc_id;
@@ -2253,57 +2299,60 @@  netdev_tc_flow_put(struct netdev *netdev, struct match *match,
         flower.key.tunnel.ipv4.ipv4_dst = tnl->ip_dst;
         flower.key.tunnel.ipv6.ipv6_src = tnl->ipv6_src;
         flower.key.tunnel.ipv6.ipv6_dst = tnl->ipv6_dst;
-        flower.key.tunnel.tos = tnl->ip_tos;
-        flower.key.tunnel.ttl = tnl->ip_ttl;
-        flower.key.tunnel.tp_src = tnl->tp_src;
         flower.key.tunnel.tp_dst = tnl->tp_dst;
-        flower.key.tunnel.gbp.id<http://flower.key.tunnel.gbp.id> = tnl->gbp_id;
-        flower.key.tunnel.gbp.flags = tnl->gbp_flags;
-        flower.key.tunnel.gbp.id_present = !!tnl_mask->gbp_id;
-
+        flower.mask.tunnel.id<http://flower.mask.tunnel.id> = (tnl->flags & FLOW_TNL_F_KEY) ? tnl_mask->tun_id : 0;
         flower.mask.tunnel.ipv4.ipv4_src = tnl_mask->ip_src;
         flower.mask.tunnel.ipv4.ipv4_dst = tnl_mask->ip_dst;
         flower.mask.tunnel.ipv6.ipv6_src = tnl_mask->ipv6_src;
         flower.mask.tunnel.ipv6.ipv6_dst = tnl_mask->ipv6_dst;
-        flower.mask.tunnel.tos = tnl_mask->ip_tos;
-        flower.mask.tunnel.ttl = tnl_mask->ip_ttl;
-        flower.mask.tunnel.tp_src = tnl_mask->tp_src;
         /* XXX: We should be setting the mask from 'tnl_mask->tp_dst' here, but
          * some hardware drivers (mlx5) doesn't support masked matches and will
          * refuse to offload such flows keeping them in software path.
          * Degrading the flow down to exact match for now as a workaround. */
         flower.mask.tunnel.tp_dst = OVS_BE16_MAX;
-        flower.mask.tunnel.id<http://flower.mask.tunnel.id> = (tnl->flags & FLOW_TNL_F_KEY) ? tnl_mask->tun_id : 0;
-        flower.mask.tunnel.gbp.id<http://flower.mask.tunnel.gbp.id> = tnl_mask->gbp_id;
-        flower.mask.tunnel.gbp.flags = tnl_mask->gbp_flags;
-        flower.mask.tunnel.gbp.id_present = !!tnl_mask->gbp_id;
-
-        memset(&tnl_mask->ip_src, 0, sizeof tnl_mask->ip_src);
-        memset(&tnl_mask->ip_dst, 0, sizeof tnl_mask->ip_dst);
-        memset(&tnl_mask->ipv6_src, 0, sizeof tnl_mask->ipv6_src);
-        memset(&tnl_mask->ipv6_dst, 0, sizeof tnl_mask->ipv6_dst);
-        memset(&tnl_mask->ip_tos, 0, sizeof tnl_mask->ip_tos);
-        memset(&tnl_mask->ip_ttl, 0, sizeof tnl_mask->ip_ttl);
-        memset(&tnl_mask->tp_src, 0, sizeof tnl_mask->tp_src);
-        memset(&tnl_mask->tp_dst, 0, sizeof tnl_mask->tp_dst);
-
-        memset(&tnl_mask->tun_id, 0, sizeof tnl_mask->tun_id);
-        memset(&tnl_mask->gbp_id, 0, sizeof tnl_mask->gbp_id);
-        memset(&tnl_mask->gbp_flags, 0, sizeof tnl_mask->gbp_flags);
-        tnl_mask->flags &= ~FLOW_TNL_F_KEY;
-
-        /* XXX: This is wrong!  We're ignoring DF and CSUM flags configuration
-         * requested by the user.  However, TC for now has no way to pass
-         * these flags in a flower key and their masks are set by default,
-         * meaning tunnel offloading will not work at all if not cleared.
-         * Keeping incorrect behavior for now. */
-        tnl_mask->flags &= ~(FLOW_TNL_F_DONT_FRAGMENT | FLOW_TNL_F_CSUM);

-        if (!strcmp(netdev_get_type(netdev), "geneve")) {
-            err = flower_match_to_tun_opt(&flower, tnl, tnl_mask);
-            if (err) {
-                VLOG_WARN_RL(&warn_rl, "Unable to parse geneve options");
-                return err;
+        if (!exact_match_offload) {
+            flower.key.tunnel.tos = tnl->ip_tos;
+            flower.key.tunnel.ttl = tnl->ip_ttl;
+            flower.key.tunnel.tp_src = tnl->tp_src;
+            flower.key.tunnel.gbp.id<http://flower.key.tunnel.gbp.id> = tnl->gbp_id;
+            flower.key.tunnel.gbp.flags = tnl->gbp_flags;
+            flower.key.tunnel.gbp.id_present = !!tnl_mask->gbp_id;
+
+            flower.mask.tunnel.tos = tnl_mask->ip_tos;
+            flower.mask.tunnel.ttl = tnl_mask->ip_ttl;
+            flower.mask.tunnel.tp_src = tnl_mask->tp_src;
+            flower.mask.tunnel.gbp.id<http://flower.mask.tunnel.gbp.id> = tnl_mask->gbp_id;
+            flower.mask.tunnel.gbp.flags = tnl_mask->gbp_flags;
+            flower.mask.tunnel.gbp.id_present = !!tnl_mask->gbp_id;
+
+            memset(&tnl_mask->ip_src, 0, sizeof tnl_mask->ip_src);
+            memset(&tnl_mask->ip_dst, 0, sizeof tnl_mask->ip_dst);
+            memset(&tnl_mask->ipv6_src, 0, sizeof tnl_mask->ipv6_src);
+            memset(&tnl_mask->ipv6_dst, 0, sizeof tnl_mask->ipv6_dst);
+            memset(&tnl_mask->ip_tos, 0, sizeof tnl_mask->ip_tos);
+            memset(&tnl_mask->ip_ttl, 0, sizeof tnl_mask->ip_ttl);
+            memset(&tnl_mask->tp_src, 0, sizeof tnl_mask->tp_src);
+            memset(&tnl_mask->tp_dst, 0, sizeof tnl_mask->tp_dst);
+
+            memset(&tnl_mask->tun_id, 0, sizeof tnl_mask->tun_id);
+            memset(&tnl_mask->gbp_id, 0, sizeof tnl_mask->gbp_id);
+            memset(&tnl_mask->gbp_flags, 0, sizeof tnl_mask->gbp_flags);
+            tnl_mask->flags &= ~FLOW_TNL_F_KEY;
+
+            /* XXX: This is wrong!  We're ignoring DF and CSUM flags configuration
+            * requested by the user.  However, TC for now has no way to pass
+            * these flags in a flower key and their masks are set by default,
+            * meaning tunnel offloading will not work at all if not cleared.
+            * Keeping incorrect behavior for now. */
+            tnl_mask->flags &= ~(FLOW_TNL_F_DONT_FRAGMENT | FLOW_TNL_F_CSUM);
+
+            if (!strcmp(netdev_get_type(netdev), "geneve")) {
+                err = flower_match_to_tun_opt(&flower, tnl, tnl_mask);
+                if (err) {
+                    VLOG_WARN_RL(&warn_rl, "Unable to parse geneve options");
+                    return err;
+                }
             }
         }
         flower.tunnel = true;
@@ -2319,6 +2368,12 @@  netdev_tc_flow_put(struct netdev *netdev, struct match *match,
         flower.key.mpls_lse = key->mpls_lse[0];
         flower.mask.mpls_lse = mask->mpls_lse[0];
         flower.key.encap_eth_type[0] = flower.key.eth_type;
+
+        if (exact_match_offload) {
+            /* Enable mpls_label only */
+            flower.key.mpls_lse &= ntohl(MPLS_LABEL_MASK);
+            flower.mask.mpls_lse &= ntohl(MPLS_LABEL_MASK);
+        }
     }
     mask->mpls_lse[0] = 0;

@@ -2342,7 +2397,8 @@  netdev_tc_flow_put(struct netdev *netdev, struct match *match,
                 flower.mask.vlan_id[0] = vlan_tci_to_vid(mask->vlans[0].tci);
                 VLOG_DBG_RL(&rl, "vlan_id[0]: %d\n", flower.key.vlan_id[0]);
             }
-            if (pcp_mask) {
+            /* Skip vlan_prio in case of exact-match offload */
+            if (pcp_mask && !exact_match_offload) {
                 flower.key.vlan_prio[0] = vlan_tci_to_pcp(key->vlans[0].tci);
                 flower.mask.vlan_prio[0] = vlan_tci_to_pcp(mask->vlans[0].tci);
                 VLOG_DBG_RL(&rl, "vlan_prio[0]: %d\n",
@@ -2377,7 +2433,7 @@  netdev_tc_flow_put(struct netdev *netdev, struct match *match,
                 flower.mask.vlan_id[1] = vlan_tci_to_vid(mask->vlans[1].tci);
                 VLOG_DBG_RL(&rl, "vlan_id[1]: %d", flower.key.vlan_id[1]);
             }
-            if (pcp_mask) {
+            if (pcp_mask && !exact_match_offload) {
                 flower.key.vlan_prio[1] = vlan_tci_to_pcp(key->vlans[1].tci);
                 flower.mask.vlan_prio[1] = vlan_tci_to_pcp(mask->vlans[1].tci);
                 VLOG_DBG_RL(&rl, "vlan_prio[1]: %d", flower.key.vlan_prio[1]);
@@ -2421,44 +2477,34 @@  netdev_tc_flow_put(struct netdev *netdev, struct match *match,
     }

     if (is_ip_any(key) && !is_ipv6_fragment_and_masked(key, mask)) {
+
+        if (key->dl_type == htons(ETH_P_IP)) {
+            flower.key.ipv4.ipv4_src = key->nw_src;
+            flower.mask.ipv4.ipv4_src = mask->nw_src;
+            flower.key.ipv4.ipv4_dst = key->nw_dst;
+            flower.mask.ipv4.ipv4_dst = mask->nw_dst;
+            mask->nw_src = 0;
+            mask->nw_dst = 0;
+        } else if (key->dl_type == htons(ETH_P_IPV6)) {
+            flower.key.ipv6.ipv6_src = key->ipv6_src;
+            flower.mask.ipv6.ipv6_src = mask->ipv6_src;
+            flower.key.ipv6.ipv6_dst = key->ipv6_dst;
+            flower.mask.ipv6.ipv6_dst = mask->ipv6_dst;
+            memset(&mask->ipv6_src, 0, sizeof mask->ipv6_src);
+            memset(&mask->ipv6_dst, 0, sizeof mask->ipv6_dst);
+        }
+
         flower.key.ip_proto = key->nw_proto;
         flower.mask.ip_proto = mask->nw_proto;
         mask->nw_proto = 0;
-        flower.key.ip_tos = key->nw_tos;
-        flower.mask.ip_tos = mask->nw_tos;
-        mask->nw_tos = 0;
-        flower.key.ip_ttl = key->nw_ttl;
-        flower.mask.ip_ttl = mask->nw_ttl;
-        mask->nw_ttl = 0;
-
-        if (mask->nw_frag & FLOW_NW_FRAG_ANY) {
-            flower.mask.flags |= TCA_FLOWER_KEY_FLAGS_IS_FRAGMENT;
-
-            if (key->nw_frag & FLOW_NW_FRAG_ANY) {
-                flower.key.flags |= TCA_FLOWER_KEY_FLAGS_IS_FRAGMENT;
-
-                if (mask->nw_frag & FLOW_NW_FRAG_LATER) {
-                    flower.mask.flags |= TCA_FLOWER_KEY_FLAGS_FRAG_IS_FIRST;
-
-                    if (!(key->nw_frag & FLOW_NW_FRAG_LATER)) {
-                        flower.key.flags |= TCA_FLOWER_KEY_FLAGS_FRAG_IS_FIRST;
-                    }
-                }
-            }
-
-            mask->nw_frag = 0;
-        }

         if (key->nw_proto == IPPROTO_TCP) {
             flower.key.tcp_dst = key->tp_dst;
             flower.mask.tcp_dst = mask->tp_dst;
             flower.key.tcp_src = key->tp_src;
             flower.mask.tcp_src = mask->tp_src;
-            flower.key.tcp_flags = key->tcp_flags;
-            flower.mask.tcp_flags = mask->tcp_flags;
             mask->tp_src = 0;
             mask->tp_dst = 0;
-            mask->tcp_flags = 0;
         } else if (key->nw_proto == IPPROTO_UDP) {
             flower.key.udp_dst = key->tp_dst;
             flower.mask.udp_dst = mask->tp_dst;
@@ -2483,20 +2529,37 @@  netdev_tc_flow_put(struct netdev *netdev, struct match *match,
             mask->tp_dst = 0;
         }

-        if (key->dl_type == htons(ETH_P_IP)) {
-            flower.key.ipv4.ipv4_src = key->nw_src;
-            flower.mask.ipv4.ipv4_src = mask->nw_src;
-            flower.key.ipv4.ipv4_dst = key->nw_dst;
-            flower.mask.ipv4.ipv4_dst = mask->nw_dst;
-            mask->nw_src = 0;
-            mask->nw_dst = 0;
-        } else if (key->dl_type == htons(ETH_P_IPV6)) {
-            flower.key.ipv6.ipv6_src = key->ipv6_src;
-            flower.mask.ipv6.ipv6_src = mask->ipv6_src;
-            flower.key.ipv6.ipv6_dst = key->ipv6_dst;
-            flower.mask.ipv6.ipv6_dst = mask->ipv6_dst;
-            memset(&mask->ipv6_src, 0, sizeof mask->ipv6_src);
-            memset(&mask->ipv6_dst, 0, sizeof mask->ipv6_dst);
+        if (!exact_match_offload) {
+            flower.key.ip_tos = key->nw_tos;
+            flower.mask.ip_tos = mask->nw_tos;
+            flower.key.ip_ttl = key->nw_ttl;
+            flower.mask.ip_ttl = mask->nw_ttl;
+            mask->nw_tos = 0;
+            mask->nw_ttl = 0;
+
+            if (mask->nw_frag & FLOW_NW_FRAG_ANY) {
+                flower.mask.flags |= TCA_FLOWER_KEY_FLAGS_IS_FRAGMENT;
+
+                if (key->nw_frag & FLOW_NW_FRAG_ANY) {
+                    flower.key.flags |= TCA_FLOWER_KEY_FLAGS_IS_FRAGMENT;
+
+                    if (mask->nw_frag & FLOW_NW_FRAG_LATER) {
+                        flower.mask.flags |= TCA_FLOWER_KEY_FLAGS_FRAG_IS_FIRST;
+
+                        if (!(key->nw_frag & FLOW_NW_FRAG_LATER)) {
+                            flower.key.flags |= TCA_FLOWER_KEY_FLAGS_FRAG_IS_FIRST;
+                        }
+                    }
+                }
+
+                mask->nw_frag = 0;
+            }
+
+            if (key->nw_proto == IPPROTO_TCP) {
+                flower.key.tcp_flags = key->tcp_flags;
+                flower.mask.tcp_flags = mask->tcp_flags;
+                mask->tcp_flags = 0;
+            }
         }
     }

@@ -2507,6 +2570,13 @@  netdev_tc_flow_put(struct netdev *netdev, struct match *match,
         mask->pkt_mark = 0;
     }

+    if (exact_match_offload) {
+        err = test_key_and_clear_mask(match);
+        if (err) {
+            return err;
+        }
+    }
+
     err = test_key_and_mask(match);
     if (err) {
         return err;
diff --git a/lib/netdev-offload.c b/lib/netdev-offload.c
index 8a9d36555..349abcf9e 100644
--- a/lib/netdev-offload.c
+++ b/lib/netdev-offload.c
@@ -60,6 +60,7 @@  VLOG_DEFINE_THIS_MODULE(netdev_offload);

 static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
 static bool netdev_flow_api_enabled = false;
+atomic_bool netdev_exact_match_offload = false;

 #define DEFAULT_OFFLOAD_THREAD_NB 1
 #define MAX_OFFLOAD_THREAD_NB 10
@@ -902,6 +903,10 @@  netdev_set_flow_api_enabled(const struct smap *ovs_other_config)
                 netdev_offload_rebalance_policy = true;
             }

+            if (smap_get_bool(ovs_other_config, "exact-match-offload", false)) {
+                atomic_store_relaxed(&netdev_exact_match_offload, true);
+            }
+
             netdev_ports_flow_init();

             ovsthread_once_done(&once);
diff --git a/lib/netdev-offload.h b/lib/netdev-offload.h
index 47f8e6f48..004d6feaa 100644
--- a/lib/netdev-offload.h
+++ b/lib/netdev-offload.h
@@ -43,6 +43,8 @@  struct smap;
 struct sset;
 struct ovs_action_push_tnl;

+extern atomic_bool enable_megaflows;
+extern atomic_bool netdev_exact_match_offload;

 /* Offload-capable (HW) netdev information */
 struct netdev_hw_info {
diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
index 4d39bc5a7..72a6a303c 100644
--- a/ofproto/ofproto-dpif-upcall.c
+++ b/ofproto/ofproto-dpif-upcall.c
@@ -23,6 +23,7 @@ 
 #include "coverage.h"
 #include "cmap.h"
 #include "lib/dpif-provider.h"
+#include "lib/netdev-offload.h"
 #include "dpif.h"
 #include "openvswitch/dynamic-string.h"
 #include "fail-open.h"
@@ -446,7 +447,6 @@  static int udpif_flow_unprogram(struct udpif *udpif, struct udpif_key *ukey,
 static upcall_callback upcall_cb;
 static dp_purge_callback dp_purge_cb;

-static atomic_bool enable_megaflows = true;
 static atomic_bool enable_ufid = true;

 void
diff --git a/tests/system-offloads-traffic.at<http://system-offloads-traffic.at> b/tests/system-offloads-traffic.at<http://system-offloads-traffic.at>
index d1da33d96..345161f82 100644
--- a/tests/system-offloads-traffic.at<http://system-offloads-traffic.at>
+++ b/tests/system-offloads-traffic.at<http://system-offloads-traffic.at>
@@ -93,6 +93,30 @@  AT_CHECK([ovs-appctl upcall/show | grep -E "offloaded flows : [[1-9]]"], [0], [i
 OVS_TRAFFIC_VSWITCHD_STOP
 AT_CLEANUP

+AT_SETUP([offloads - ping between two ports - offloads enabled (exact-match)])
+OVS_TRAFFIC_VSWITCHD_START([], [], [-- set Open_vSwitch . other_config:hw-offload=true other_config:exact-match-offload=true])
+   AT_CHECK([ovs-appctl upcall/disable-megaflows], [0], [megaflows disabled
+], [])
+
+AT_CHECK([ovs-appctl vlog/set dpif:dbg dpif_netdev:dbg netdev_offload_tc:file:dbg ofproto_dpif_upcall:DBG dpif_netlink:DBG])
+
+AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
+
+ADD_NAMESPACES(at_ns0, at_ns1)
+
+ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24<http://10.1.1.1/24>")
+ADD_VETH(p1, at_ns1, br0, "10.1.1.2/24<http://10.1.1.2/24>")
+
+NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.1 -w 2 10.1.1.2 | FORMAT_PING], [0], [ignore])
+
+AT_CHECK([ovs-appctl dpctl/dump-flows type=tc | grep "eth_type(0x0800)" | strip_stats | DUMP_CLEAN_SORTED], [0], [dnl
+in_port(2),ct_zone(0),ct_mark(0),ct_label(0),eth(macs),eth_type(0x0800),ipv4(src=10.1.1.1,dst=10.1.1.2,proto=1),icmp(type=8,code=0), packets:0, bytes:0, used:0.001s, actions:output
+in_port(3),ct_zone(0),ct_mark(0),ct_label(0),eth(macs),eth_type(0x0800),ipv4(src=10.1.1.2,dst=10.1.1.1,proto=1),icmp(type=0,code=0), packets:0, bytes:0, used:0.001s, actions:output
+])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
 AT_SETUP([offloads - set ingress_policing_rate and ingress_policing_burst - offloads disabled])
 AT_KEYWORDS([ingress_policing])
 OVS_CHECK_TC_QDISC()
diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
index 36cb4e495..90844b428 100644
--- a/vswitchd/vswitch.xml
+++ b/vswitchd/vswitch.xml
@@ -254,6 +254,25 @@ 
         </p>
       </column>

+      <column name="other_config" key="exact-match-offload"
+              type='{"type": "boolean"}'>
+        <p>
+          Set this value to <code>true</code> to enable exact match netdev flow
+          offload.
+        </p>
+        <p>
+          The default value is <code>false</code>.
+        </p>
+        <p>
+          To offload exact match flows, the user needs to enable this
+          feature along with disabling the megaflows.
+        </p>
+        <p>
+          This is only relevant if
+          <ref column="other_config" key="hw-offload"/> is enabled.
+        </p>
+      </column>
+
       <column name="other_config" key="n-offload-threads"
               type='{"type": "integer", "minInteger": 1, "maxInteger": 10}'>
         <p>