diff mbox series

[ovs-dev,v5,1/5] dpif-netdev: associate flow with a mark id

Message ID 1513781077-21984-2-git-send-email-yliu@fridaylinux.org
State Superseded
Delegated to: Ian Stokes
Headers show
Series OVS-DPDK flow offload with rte_flow | expand

Commit Message

Yuanhan Liu Dec. 20, 2017, 2:44 p.m. UTC
Most modern NICs have the ability to bind a flow with a mark, so that
every pkt matches such flow will have that mark present in its desc.

The basic idea of doing that is, when we receives pkts later, we could
directly get the flow from the mark. That could avoid some very costly
CPU operations, including (but not limiting to) miniflow_extract, emc
lookup, dpcls lookup, etc. Thus, performance could be greatly improved.

Thus, the mojor work of this patch is to associate a flow with a mark
id (an uint32_t number). The association in netdev datapatch is done
by CMAP, while in hardware it's done by the rte_flow MARK action.

One tricky thing in OVS-DPDK is, the flow tables is per-PMD. For the
case there is only one phys port but with 2 queues, there could be 2
PMDs. In another word, even for a single mega flow (i.e. udp,tp_src=1000),
there could be 2 different dp_netdev flows, one for each PMD. That could
results to the same mega flow being offloaded twice in the hardware,
worse, we may get 2 different marks and only the last one will work.

To avoid that, a megaflow_to_mark CMAP is created. An entry will be
added for the first PMD wants to offload a flow. For later PMDs, it
will see such megaflow is already offloaded, then the flow will not
be offloaded to HW twice.

Meanwhile, the mark to flow mapping becomes to 1:N mapping. That is
what the mark_to_flow CMAP for. For the first PMD wants to offload a
flow, it allocates a new mark and do the flow offload by reusing the
->flow_put method. When it succeeds, a "mark to flow" entry will be
added. For later PMDs, it will get the corresponding mark by above
megaflow_to_mark CMAP. Then, another "mark to flow" entry will be
added.

Another thing might worth mentioning is that hte megaflow is created
by masking all the bytes from match->flow with match->wc. It works
well so far, but I have a feeling that is not the best way.

Co-authored-by: Finn Christensen <fc@napatech.com>
Signed-off-by: Yuanhan Liu <yliu@fridaylinux.org>
Signed-off-by: Finn Christensen <fc@napatech.com>
---

v5: - fixed check of flow_mark_has_no_ref (renamed from
      is_last_flow_mark_reference).
      This fixed an issue that it took too long to finish
      flow add/removal if we do that repeatdly.

    - do mark_to_flow disassociation if flow modification failed
---
 lib/dpif-netdev.c | 263 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 lib/netdev.h      |   6 ++
 2 files changed, 269 insertions(+)

Comments

Stokes, Ian Jan. 24, 2018, 5:29 p.m. UTC | #1
> -----Original Message-----
> From: Yuanhan Liu [mailto:yliu@fridaylinux.org]
> Sent: Wednesday, December 20, 2017 2:45 PM
> To: dev@openvswitch.org
> Cc: Finn Christensen <fc@napatech.com>; Darrell Ball <dball@vmware.com>;
> Chandran, Sugesh <sugesh.chandran@intel.com>; Simon Horman
> <simon.horman@netronome.com>; Stokes, Ian <ian.stokes@intel.com>; Yuanhan
> Liu <yliu@fridaylinux.org>
> Subject: [PATCH v5 1/5] dpif-netdev: associate flow with a mark id
> 

Hi Yuanhan, thanks for working on this, a few comments on the commit message and the code below to be addressed.

> Most modern NICs have the ability to bind a flow with a mark, so that
> every pkt matches such flow will have that mark present in its desc.

Can you use 'packet' and 'descriptor' rather than abbreviations pkt & desc above for clarity in the commit message. This can applies to other instances throughput the patch.

> 
> The basic idea of doing that is, when we receives pkts later, we could
> directly get the flow from the mark. That could avoid some very costly CPU
> operations, including (but not limiting to) miniflow_extract, emc lookup,
> dpcls lookup, etc. Thus, performance could be greatly improved.
> 
> Thus, the mojor work of this patch is to associate a flow with a mark id
Typo 'major'

> (an uint32_t number). The association in netdev datapatch is done by CMAP,
Typo 'datapath'

> while in hardware it's done by the rte_flow MARK action.
> 
> One tricky thing in OVS-DPDK is, the flow tables is per-PMD. For the case
> there is only one phys port but with 2 queues, there could be 2 PMDs. In
> another word, even for a single mega flow (i.e. udp,tp_src=1000), there
I think above should be 'In otherwords' rather than 'another word'.

> could be 2 different dp_netdev flows, one for each PMD. That could results
> to the same mega flow being offloaded twice in the hardware, worse, we may
> get 2 different marks and only the last one will work.
> 
> To avoid that, a megaflow_to_mark CMAP is created. An entry will be added
> for the first PMD wants to offload a flow. For later PMDs, it will see

Pmd 'that' wants to offload.

> such megaflow is already offloaded, then the flow will not be offloaded to
> HW twice.
> 
> Meanwhile, the mark to flow mapping becomes to 1:N mapping. That is what

Is this 1:N or 1:1? I thought I spotted below that it's 1:1.

> the mark_to_flow CMAP for. For the first PMD wants to offload a flow, it

mark_to_flow CMAP 'is' for. 'If' the first PMD wants to offload a flow it...

> allocates a new mark and do the flow offload by reusing the

And 'performs' the flow offload by....

> ->flow_put method. When it succeeds, a "mark to flow" entry will be
> added. For later PMDs, it will get the corresponding mark by above

corresponding mark by 'the'...

> megaflow_to_mark CMAP. Then, another "mark to flow" entry will be added.
> 
> Another thing might worth mentioning is that hte megaflow is created by

Another point worth mentioning is that 'the'...

> masking all the bytes from match->flow with match->wc. It works well so
> far, but I have a feeling that is not the best way.
> 
> Co-authored-by: Finn Christensen <fc@napatech.com>
> Signed-off-by: Yuanhan Liu <yliu@fridaylinux.org>
> Signed-off-by: Finn Christensen <fc@napatech.com>
> ---
> 
> v5: - fixed check of flow_mark_has_no_ref (renamed from
>       is_last_flow_mark_reference).
>       This fixed an issue that it took too long to finish
>       flow add/removal if we do that repeatdly.
> 
>     - do mark_to_flow disassociation if flow modification failed
> ---
>  lib/dpif-netdev.c | 263
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  lib/netdev.h      |   6 ++
>  2 files changed, 269 insertions(+)
> 
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 55be632..2fdc8dd
> 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -77,6 +77,7 @@
>  #include "tnl-ports.h"
>  #include "unixctl.h"
>  #include "util.h"
> +#include "uuid.h"
> 

Just a general comment for this patch,

There's a lot of work ongoing in the dpif-netdev layer these days, did you think about moving some of the HWOL functionality here to a separate HWOL specific file? As HWOL grows over time I'm just thinking about the code maintainability.


>  VLOG_DEFINE_THIS_MODULE(dpif_netdev);
> 
> @@ -442,7 +443,9 @@ struct dp_netdev_flow {
>      /* Hash table index by unmasked flow. */
>      const struct cmap_node node; /* In owning dp_netdev_pmd_thread's */

Can you clarify the comment above, at first glance it's not clear /* In owning dp_netdev_pmd_thread's */ ?

>                                   /* 'flow_table'. */
> +    const struct cmap_node mark_node; /* In owning flow_mark's
> + mark_to_flow */

Same here, it's not clear what is meant by 'In owning...'

>      const ovs_u128 ufid;         /* Unique flow identifier. */
> +    const ovs_u128 mega_ufid;

As per OVS Coding style can you add a comment for mega_ufid.

>      const unsigned pmd_id;       /* The 'core_id' of pmd thread owning
> this */
>                                   /* flow. */
> 
> @@ -453,6 +456,7 @@ struct dp_netdev_flow {
>      struct ovs_refcount ref_cnt;
> 
>      bool dead;
> +    uint32_t mark;               /* Unique flow mark assiged to a flow */

Typo, 'assigned'.

> 
>      /* Statistics. */
>      struct dp_netdev_flow_stats stats;
> @@ -1832,6 +1836,172 @@ dp_netdev_pmd_find_dpcls(struct
> dp_netdev_pmd_thread *pmd,
>      return cls;
>  }
> 
> +#define MAX_FLOW_MARK       (UINT32_MAX - 1)
> +#define INVALID_FLOW_MARK   (UINT32_MAX)
> +
> +struct megaflow_to_mark_data {
> +    const struct cmap_node node;
> +    ovs_u128 mega_ufid;
> +    uint32_t mark;
> +};
> +
> +struct flow_mark {
> +    struct cmap megaflow_to_mark;
> +    struct cmap mark_to_flow;
> +    struct id_pool *pool;
> +    struct ovs_mutex mutex;
> +};
> +

Breaks compilation with Sparse. Sparse complains with:
'lib/dpif-netdev.c:1860:18: error: symbol 'flow_mark' was not declared. Should it be static?'

> +struct flow_mark flow_mark = {
> +    .megaflow_to_mark = CMAP_INITIALIZER,
> +    .mark_to_flow = CMAP_INITIALIZER,
> +    .mutex = OVS_MUTEX_INITIALIZER,
> +};
> +
> +static uint32_t
> +flow_mark_alloc(void)
> +{
> +    uint32_t mark;
> +

I would expect the id pool to be allocated only once before it's used to allocate ids for the marks.

Would it make sense to move the !flow_mark.pool check and allocation out of here to simplify the function to a single purpose and avoid the check for every mark id allocation.


> +    if (!flow_mark.pool) {
> +        /* Haven't initiated yet, do it here */
> +        flow_mark.pool = id_pool_create(0, MAX_FLOW_MARK);
> +    }
> +
> +    if (id_pool_alloc_id(flow_mark.pool, &mark)) {
> +        return mark;
> +    }
> +
> +    return INVALID_FLOW_MARK;
> +}
> +
> +static void
> +flow_mark_free(uint32_t mark)
> +{
> +    id_pool_free_id(flow_mark.pool, mark); }
> +
> +/* associate flow with a mark, which is a 1:1 mapping */ static void

Can you specify megaflow in the comment above.

> +megaflow_to_mark_associate(const ovs_u128 *mega_ufid, uint32_t mark) {
> +    size_t hash = dp_netdev_flow_hash(mega_ufid);
> +    struct megaflow_to_mark_data *data = xzalloc(sizeof(*data));
> +
> +    data->mega_ufid = *mega_ufid;
> +    data->mark = mark;
> +
> +    cmap_insert(&flow_mark.megaflow_to_mark,
> +                CONST_CAST(struct cmap_node *, &data->node), hash); }
> +
> +/* disassociate flow with a mark */
> +static void
> +megaflow_to_mark_disassociate(const ovs_u128 *mega_ufid) {
> +    size_t hash = dp_netdev_flow_hash(mega_ufid);
> +    struct megaflow_to_mark_data *data;
> +
> +    CMAP_FOR_EACH_WITH_HASH (data, node, hash,
> &flow_mark.megaflow_to_mark) {
> +        if (ovs_u128_equals(*mega_ufid, data->mega_ufid)) {
> +            cmap_remove(&flow_mark.megaflow_to_mark,
> +                        CONST_CAST(struct cmap_node *, &data->node),
> hash);
> +            free(data);
> +            return;
> +        }
> +    }
> +
> +    VLOG_WARN("masked ufid "UUID_FMT" is not associated with a mark?\n",
> +              UUID_ARGS((struct uuid *)mega_ufid)); }
> +
> +static inline uint32_t
> +megaflow_to_mark_find(const ovs_u128 *mega_ufid) {
> +    size_t hash = dp_netdev_flow_hash(mega_ufid);
> +    struct megaflow_to_mark_data *data;
> +
> +    CMAP_FOR_EACH_WITH_HASH (data, node, hash,
> &flow_mark.megaflow_to_mark) {
> +        if (ovs_u128_equals(*mega_ufid, data->mega_ufid)) {
> +            return data->mark;
> +        }
> +    }
> +
> +    return INVALID_FLOW_MARK;
> +}
> +
> +/* associate mark with a flow, which is 1:N mapping */ static void
> +mark_to_flow_associate(const uint32_t mark, struct dp_netdev_flow
> +*flow) {
> +    dp_netdev_flow_ref(flow);
> +
> +    cmap_insert(&flow_mark.mark_to_flow,
> +                CONST_CAST(struct cmap_node *, &flow->mark_node),
> +                mark);
> +    flow->mark = mark;
> +
> +    VLOG_DBG("associated dp_netdev flow %p with mark %u\n", flow,
> +mark); }
> +

I'd like to see a comment explain the purpose of the function below. Which reference is the flow_mark missing?

> +static bool
> +flow_mark_has_no_ref(uint32_t mark)
> +{
> +    struct dp_netdev_flow *flow;
> +

Maybe I'm missing something below, but I expected a hash to be computed for mark before being called with CMAP_FOR_EACH_WITH_HASH?

> +    CMAP_FOR_EACH_WITH_HASH (flow, mark_node, mark,
> +                             &flow_mark.mark_to_flow) {
> +        if (flow->mark == mark) {
> +            return false;
> +        }
> +    }
> +
> +    return true;
> +}
> +
> +static int
> +mark_to_flow_disassociate(struct dp_netdev_pmd_thread *pmd,
> +                          struct dp_netdev_flow *flow) {
> +    int ret = 0;
> +    uint32_t mark = flow->mark;
> +    struct cmap_node *mark_node = CONST_CAST(struct cmap_node *,
> +                                             &flow->mark_node);
> +
> +    cmap_remove(&flow_mark.mark_to_flow, mark_node, mark);
> +    flow->mark = INVALID_FLOW_MARK;
> +

A comment on this code block could be helpful. Mainly just to explain how the reference could be missing and the required steps because of this.

> +    if (flow_mark_has_no_ref(mark)) {
> +        struct dp_netdev_port *port;
> +        odp_port_t in_port = flow->flow.in_port.odp_port;
> +

Need port Mutex 'pmd->dp->port_mutex' before calling dp_netdev_lookup_port().

> +        port = dp_netdev_lookup_port(pmd->dp, in_port);
> +        if (port) {
> +            ret = netdev_flow_del(port->netdev, &flow->mega_ufid, NULL);
> +        }
> +
> +        flow_mark_free(mark);
> +        VLOG_DBG("freed flow mark %u\n", mark);
> +
> +        megaflow_to_mark_disassociate(&flow->mega_ufid);
> +    }
> +    dp_netdev_flow_unref(flow);
> +
> +    return ret;
> +}
> +
> +static void
> +flow_mark_flush(struct dp_netdev_pmd_thread *pmd) {
> +    struct dp_netdev_flow *flow;
> +
> +    CMAP_FOR_EACH (flow, mark_node, &flow_mark.mark_to_flow) {
> +        if (flow->pmd_id == pmd->core_id) {
> +            mark_to_flow_disassociate(pmd, flow);
> +        }
> +    }
> +}
> +
>  static void
>  dp_netdev_pmd_remove_flow(struct dp_netdev_pmd_thread *pmd,
>                            struct dp_netdev_flow *flow) @@ -1845,6 +2015,9
> @@ dp_netdev_pmd_remove_flow(struct dp_netdev_pmd_thread *pmd,
>      ovs_assert(cls != NULL);
>      dpcls_remove(cls, &flow->cr);
>      cmap_remove(&pmd->flow_table, node, dp_netdev_flow_hash(&flow-
> >ufid));
> +    if (flow->mark != INVALID_FLOW_MARK) {
> +        mark_to_flow_disassociate(pmd, flow);
> +    }
>      flow->dead = true;
> 
>      dp_netdev_flow_unref(flow);
> @@ -2424,6 +2597,87 @@ out:
>      return error;
>  }
> 
> +static void
> +try_netdev_flow_put(struct dp_netdev_pmd_thread *pmd, odp_port_t in_port,
> +                    struct dp_netdev_flow *flow, struct match *match,
> +                    const struct nlattr *actions, size_t actions_len) {
> +    struct offload_info info;
> +    struct dp_netdev_port *port;
> +    bool modification = flow->mark != INVALID_FLOW_MARK;
> +    const char *op = modification ? "modify" : "add";
> +    uint32_t mark;
> +    int ret;
> +

Again port Mutex 'pmd->dp->port_mutex' required in this function for calling dp_netdev_lookup_port().

> +    port = dp_netdev_lookup_port(pmd->dp, in_port);
> +    if (!port) {
> +        return;
> +    }
> +
> +    ovs_mutex_lock(&flow_mark.mutex);
> +
> +    if (modification) {
> +        mark = flow->mark;
> +    } else {
> +        if (!netdev_is_flow_api_enabled()) {
> +            goto out;
> +        }
> +
> +        /*
> +         * If a mega flow has already been offloaded (from other PMD
> +         * instances), do not offload it again.
> +         */
> +        mark = megaflow_to_mark_find(&flow->mega_ufid);
> +        if (mark != INVALID_FLOW_MARK) {
> +            VLOG_DBG("flow has already been offloaded with mark %u\n",
> mark);
Just to understand this, Am I right in thinking the expeted behavior is a mark esists so it has been offloaded by another PMD, don't offload it in the HW again BUT do mark it as associated. Is this associated call only relevant to the current PMD? (i.e. each PMD must discover this association themselves?)

> +            mark_to_flow_associate(mark, flow);
> +            goto out;
> +        }
> +
> +        mark = flow_mark_alloc();
> +        if (mark == INVALID_FLOW_MARK) {
> +            VLOG_ERR("failed to allocate flow mark!\n");
> +            goto out;
> +        }
> +    }
> +
> +    info.flow_mark = mark;
> +    ret = netdev_flow_put(port->netdev, match,
> +                          CONST_CAST(struct nlattr *, actions),
> +                          actions_len, &flow->mega_ufid, &info, NULL);
> +    if (ret) {
> +        VLOG_ERR("failed to %s netdev flow with mark %u\n", op, mark);
> +        if (!modification) {
> +            flow_mark_free(mark);
> +        } else {
> +            mark_to_flow_disassociate(pmd, flow);
> +        }
> +        goto out;
> +    }
> +
> +    if (!modification) {
> +        megaflow_to_mark_associate(&flow->mega_ufid, mark);
> +        mark_to_flow_associate(mark, flow);
> +    }
> +    VLOG_DBG("succeed to %s netdev flow with mark %u\n", op, mark);

The whole section above seems a bit complicated, again comments explaining the behavior that depends on modification would help a lot here.

> +
> +out:
> +    ovs_mutex_unlock(&flow_mark.mutex);
> +}
> +
> +static void
> +dp_netdev_get_mega_ufid(const struct match *match, ovs_u128 *mega_ufid)
> +{
> +    struct flow masked_flow;
> +    size_t i;
> +
> +    for (i = 0; i < sizeof(struct flow); i++) {
> +        ((uint8_t *)&masked_flow)[i] = ((uint8_t *)&match->flow)[i] &
> +                                       ((uint8_t *)&match->wc)[i];
> +    }
> +    dpif_flow_hash(NULL, &masked_flow, sizeof(struct flow), mega_ufid);
> +}
> +
>  static struct dp_netdev_flow *
>  dp_netdev_flow_add(struct dp_netdev_pmd_thread *pmd,
>                     struct match *match, const ovs_u128 *ufid, @@ -2459,12
> +2713,14 @@ dp_netdev_flow_add(struct dp_netdev_pmd_thread *pmd,
>      memset(&flow->stats, 0, sizeof flow->stats);
>      flow->dead = false;
>      flow->batch = NULL;
> +    flow->mark = INVALID_FLOW_MARK;
>      *CONST_CAST(unsigned *, &flow->pmd_id) = pmd->core_id;
>      *CONST_CAST(struct flow *, &flow->flow) = match->flow;
>      *CONST_CAST(ovs_u128 *, &flow->ufid) = *ufid;
>      ovs_refcount_init(&flow->ref_cnt);
>      ovsrcu_set(&flow->actions, dp_netdev_actions_create(actions,
> actions_len));
> 
> +    dp_netdev_get_mega_ufid(match, CONST_CAST(ovs_u128 *,
> + &flow->mega_ufid));
>      netdev_flow_key_init_masked(&flow->cr.flow, &match->flow, &mask);
> 
>      /* Select dpcls for in_port. Relies on in_port to be exact match. */
> @@ -2474,6 +2730,8 @@ dp_netdev_flow_add(struct dp_netdev_pmd_thread *pmd,
>      cmap_insert(&pmd->flow_table, CONST_CAST(struct cmap_node *, &flow-
> >node),
>                  dp_netdev_flow_hash(&flow->ufid));
> 
> +    try_netdev_flow_put(pmd, in_port, flow, match, actions,
> + actions_len);
> +
>      if (OVS_UNLIKELY(!VLOG_DROP_DBG((&upcall_rl)))) {
>          struct ds ds = DS_EMPTY_INITIALIZER;
>          struct ofpbuf key_buf, mask_buf; @@ -2554,6 +2812,7 @@
> flow_put_on_pmd(struct dp_netdev_pmd_thread *pmd,
>          if (put->flags & DPIF_FP_MODIFY) {
>              struct dp_netdev_actions *new_actions;
>              struct dp_netdev_actions *old_actions;
> +            odp_port_t in_port = netdev_flow->flow.in_port.odp_port;
> 
>              new_actions = dp_netdev_actions_create(put->actions,
>                                                     put->actions_len); @@
> -2561,6 +2820,9 @@ flow_put_on_pmd(struct dp_netdev_pmd_thread *pmd,
>              old_actions = dp_netdev_flow_get_actions(netdev_flow);
>              ovsrcu_set(&netdev_flow->actions, new_actions);
> 
> +            try_netdev_flow_put(pmd, in_port, netdev_flow, match,
> +                                put->actions, put->actions_len);
> +
>              if (stats) {
>                  get_dpif_flow_stats(netdev_flow, stats);
>              }
> @@ -3570,6 +3832,7 @@ reload_affected_pmds(struct dp_netdev *dp)
> 
>      CMAP_FOR_EACH (pmd, node, &dp->poll_threads) {
>          if (pmd->need_reload) {
> +            flow_mark_flush(pmd);
>              dp_netdev_reload_pmd__(pmd);
>              pmd->need_reload = false;
>          }
> diff --git a/lib/netdev.h b/lib/netdev.h index 3a545fe..0c1946a 100644
> --- a/lib/netdev.h
> +++ b/lib/netdev.h
> @@ -188,6 +188,12 @@ void netdev_send_wait(struct netdev *, int qid);
> struct offload_info {
>      const struct dpif_class *dpif_class;
>      ovs_be16 tp_dst_port; /* Destination port for tunnel in SET action */
> +
> +    /*
> +     * The flow mark id assigened to the flow. If any pkts hit the flow,
> +     * it will be in the pkt meta data.
> +     */
> +    uint32_t flow_mark;
>  };
>  struct dpif_class;
>  struct netdev_flow_dump;
> --
> 2.7.4
Yuanhan Liu Jan. 25, 2018, 2:39 p.m. UTC | #2
On Wed, Jan 24, 2018 at 05:29:45PM +0000, Stokes, Ian wrote:
> > -----Original Message-----
> > From: Yuanhan Liu [mailto:yliu@fridaylinux.org]
> > Sent: Wednesday, December 20, 2017 2:45 PM
> > To: dev@openvswitch.org
> > Cc: Finn Christensen <fc@napatech.com>; Darrell Ball <dball@vmware.com>;
> > Chandran, Sugesh <sugesh.chandran@intel.com>; Simon Horman
> > <simon.horman@netronome.com>; Stokes, Ian <ian.stokes@intel.com>; Yuanhan
> > Liu <yliu@fridaylinux.org>
> > Subject: [PATCH v5 1/5] dpif-netdev: associate flow with a mark id
> > 
> 
> Hi Yuanhan, thanks for working on this, a few comments on the commit message and the code below to be addressed.

Thanks for the review, I will address all the comments tomorrow, including
rebase and re-test.

	--yliu
> 
> > Most modern NICs have the ability to bind a flow with a mark, so that
> > every pkt matches such flow will have that mark present in its desc.
> 
> Can you use 'packet' and 'descriptor' rather than abbreviations pkt & desc above for clarity in the commit message. This can applies to other instances throughput the patch.
> 
> > 
> > The basic idea of doing that is, when we receives pkts later, we could
> > directly get the flow from the mark. That could avoid some very costly CPU
> > operations, including (but not limiting to) miniflow_extract, emc lookup,
> > dpcls lookup, etc. Thus, performance could be greatly improved.
> > 
> > Thus, the mojor work of this patch is to associate a flow with a mark id
> Typo 'major'
> 
> > (an uint32_t number). The association in netdev datapatch is done by CMAP,
> Typo 'datapath'
> 
> > while in hardware it's done by the rte_flow MARK action.
> > 
> > One tricky thing in OVS-DPDK is, the flow tables is per-PMD. For the case
> > there is only one phys port but with 2 queues, there could be 2 PMDs. In
> > another word, even for a single mega flow (i.e. udp,tp_src=1000), there
> I think above should be 'In otherwords' rather than 'another word'.
> 
> > could be 2 different dp_netdev flows, one for each PMD. That could results
> > to the same mega flow being offloaded twice in the hardware, worse, we may
> > get 2 different marks and only the last one will work.
> > 
> > To avoid that, a megaflow_to_mark CMAP is created. An entry will be added
> > for the first PMD wants to offload a flow. For later PMDs, it will see
> 
> Pmd 'that' wants to offload.
> 
> > such megaflow is already offloaded, then the flow will not be offloaded to
> > HW twice.
> > 
> > Meanwhile, the mark to flow mapping becomes to 1:N mapping. That is what
> 
> Is this 1:N or 1:1? I thought I spotted below that it's 1:1.
> 
> > the mark_to_flow CMAP for. For the first PMD wants to offload a flow, it
> 
> mark_to_flow CMAP 'is' for. 'If' the first PMD wants to offload a flow it...
> 
> > allocates a new mark and do the flow offload by reusing the
> 
> And 'performs' the flow offload by....
> 
> > ->flow_put method. When it succeeds, a "mark to flow" entry will be
> > added. For later PMDs, it will get the corresponding mark by above
> 
> corresponding mark by 'the'...
> 
> > megaflow_to_mark CMAP. Then, another "mark to flow" entry will be added.
> > 
> > Another thing might worth mentioning is that hte megaflow is created by
> 
> Another point worth mentioning is that 'the'...
> 
> > masking all the bytes from match->flow with match->wc. It works well so
> > far, but I have a feeling that is not the best way.
> > 
> > Co-authored-by: Finn Christensen <fc@napatech.com>
> > Signed-off-by: Yuanhan Liu <yliu@fridaylinux.org>
> > Signed-off-by: Finn Christensen <fc@napatech.com>
> > ---
> > 
> > v5: - fixed check of flow_mark_has_no_ref (renamed from
> >       is_last_flow_mark_reference).
> >       This fixed an issue that it took too long to finish
> >       flow add/removal if we do that repeatdly.
> > 
> >     - do mark_to_flow disassociation if flow modification failed
> > ---
> >  lib/dpif-netdev.c | 263
> > ++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  lib/netdev.h      |   6 ++
> >  2 files changed, 269 insertions(+)
> > 
> > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 55be632..2fdc8dd
> > 100644
> > --- a/lib/dpif-netdev.c
> > +++ b/lib/dpif-netdev.c
> > @@ -77,6 +77,7 @@
> >  #include "tnl-ports.h"
> >  #include "unixctl.h"
> >  #include "util.h"
> > +#include "uuid.h"
> > 
> 
> Just a general comment for this patch,
> 
> There's a lot of work ongoing in the dpif-netdev layer these days, did you think about moving some of the HWOL functionality here to a separate HWOL specific file? As HWOL grows over time I'm just thinking about the code maintainability.
> 
> 
> >  VLOG_DEFINE_THIS_MODULE(dpif_netdev);
> > 
> > @@ -442,7 +443,9 @@ struct dp_netdev_flow {
> >      /* Hash table index by unmasked flow. */
> >      const struct cmap_node node; /* In owning dp_netdev_pmd_thread's */
> 
> Can you clarify the comment above, at first glance it's not clear /* In owning dp_netdev_pmd_thread's */ ?
> 
> >                                   /* 'flow_table'. */
> > +    const struct cmap_node mark_node; /* In owning flow_mark's
> > + mark_to_flow */
> 
> Same here, it's not clear what is meant by 'In owning...'
> 
> >      const ovs_u128 ufid;         /* Unique flow identifier. */
> > +    const ovs_u128 mega_ufid;
> 
> As per OVS Coding style can you add a comment for mega_ufid.
> 
> >      const unsigned pmd_id;       /* The 'core_id' of pmd thread owning
> > this */
> >                                   /* flow. */
> > 
> > @@ -453,6 +456,7 @@ struct dp_netdev_flow {
> >      struct ovs_refcount ref_cnt;
> > 
> >      bool dead;
> > +    uint32_t mark;               /* Unique flow mark assiged to a flow */
> 
> Typo, 'assigned'.
> 
> > 
> >      /* Statistics. */
> >      struct dp_netdev_flow_stats stats;
> > @@ -1832,6 +1836,172 @@ dp_netdev_pmd_find_dpcls(struct
> > dp_netdev_pmd_thread *pmd,
> >      return cls;
> >  }
> > 
> > +#define MAX_FLOW_MARK       (UINT32_MAX - 1)
> > +#define INVALID_FLOW_MARK   (UINT32_MAX)
> > +
> > +struct megaflow_to_mark_data {
> > +    const struct cmap_node node;
> > +    ovs_u128 mega_ufid;
> > +    uint32_t mark;
> > +};
> > +
> > +struct flow_mark {
> > +    struct cmap megaflow_to_mark;
> > +    struct cmap mark_to_flow;
> > +    struct id_pool *pool;
> > +    struct ovs_mutex mutex;
> > +};
> > +
> 
> Breaks compilation with Sparse. Sparse complains with:
> 'lib/dpif-netdev.c:1860:18: error: symbol 'flow_mark' was not declared. Should it be static?'
> 
> > +struct flow_mark flow_mark = {
> > +    .megaflow_to_mark = CMAP_INITIALIZER,
> > +    .mark_to_flow = CMAP_INITIALIZER,
> > +    .mutex = OVS_MUTEX_INITIALIZER,
> > +};
> > +
> > +static uint32_t
> > +flow_mark_alloc(void)
> > +{
> > +    uint32_t mark;
> > +
> 
> I would expect the id pool to be allocated only once before it's used to allocate ids for the marks.
> 
> Would it make sense to move the !flow_mark.pool check and allocation out of here to simplify the function to a single purpose and avoid the check for every mark id allocation.
> 
> 
> > +    if (!flow_mark.pool) {
> > +        /* Haven't initiated yet, do it here */
> > +        flow_mark.pool = id_pool_create(0, MAX_FLOW_MARK);
> > +    }
> > +
> > +    if (id_pool_alloc_id(flow_mark.pool, &mark)) {
> > +        return mark;
> > +    }
> > +
> > +    return INVALID_FLOW_MARK;
> > +}
> > +
> > +static void
> > +flow_mark_free(uint32_t mark)
> > +{
> > +    id_pool_free_id(flow_mark.pool, mark); }
> > +
> > +/* associate flow with a mark, which is a 1:1 mapping */ static void
> 
> Can you specify megaflow in the comment above.
> 
> > +megaflow_to_mark_associate(const ovs_u128 *mega_ufid, uint32_t mark) {
> > +    size_t hash = dp_netdev_flow_hash(mega_ufid);
> > +    struct megaflow_to_mark_data *data = xzalloc(sizeof(*data));
> > +
> > +    data->mega_ufid = *mega_ufid;
> > +    data->mark = mark;
> > +
> > +    cmap_insert(&flow_mark.megaflow_to_mark,
> > +                CONST_CAST(struct cmap_node *, &data->node), hash); }
> > +
> > +/* disassociate flow with a mark */
> > +static void
> > +megaflow_to_mark_disassociate(const ovs_u128 *mega_ufid) {
> > +    size_t hash = dp_netdev_flow_hash(mega_ufid);
> > +    struct megaflow_to_mark_data *data;
> > +
> > +    CMAP_FOR_EACH_WITH_HASH (data, node, hash,
> > &flow_mark.megaflow_to_mark) {
> > +        if (ovs_u128_equals(*mega_ufid, data->mega_ufid)) {
> > +            cmap_remove(&flow_mark.megaflow_to_mark,
> > +                        CONST_CAST(struct cmap_node *, &data->node),
> > hash);
> > +            free(data);
> > +            return;
> > +        }
> > +    }
> > +
> > +    VLOG_WARN("masked ufid "UUID_FMT" is not associated with a mark?\n",
> > +              UUID_ARGS((struct uuid *)mega_ufid)); }
> > +
> > +static inline uint32_t
> > +megaflow_to_mark_find(const ovs_u128 *mega_ufid) {
> > +    size_t hash = dp_netdev_flow_hash(mega_ufid);
> > +    struct megaflow_to_mark_data *data;
> > +
> > +    CMAP_FOR_EACH_WITH_HASH (data, node, hash,
> > &flow_mark.megaflow_to_mark) {
> > +        if (ovs_u128_equals(*mega_ufid, data->mega_ufid)) {
> > +            return data->mark;
> > +        }
> > +    }
> > +
> > +    return INVALID_FLOW_MARK;
> > +}
> > +
> > +/* associate mark with a flow, which is 1:N mapping */ static void
> > +mark_to_flow_associate(const uint32_t mark, struct dp_netdev_flow
> > +*flow) {
> > +    dp_netdev_flow_ref(flow);
> > +
> > +    cmap_insert(&flow_mark.mark_to_flow,
> > +                CONST_CAST(struct cmap_node *, &flow->mark_node),
> > +                mark);
> > +    flow->mark = mark;
> > +
> > +    VLOG_DBG("associated dp_netdev flow %p with mark %u\n", flow,
> > +mark); }
> > +
> 
> I'd like to see a comment explain the purpose of the function below. Which reference is the flow_mark missing?
> 
> > +static bool
> > +flow_mark_has_no_ref(uint32_t mark)
> > +{
> > +    struct dp_netdev_flow *flow;
> > +
> 
> Maybe I'm missing something below, but I expected a hash to be computed for mark before being called with CMAP_FOR_EACH_WITH_HASH?
> 
> > +    CMAP_FOR_EACH_WITH_HASH (flow, mark_node, mark,
> > +                             &flow_mark.mark_to_flow) {
> > +        if (flow->mark == mark) {
> > +            return false;
> > +        }
> > +    }
> > +
> > +    return true;
> > +}
> > +
> > +static int
> > +mark_to_flow_disassociate(struct dp_netdev_pmd_thread *pmd,
> > +                          struct dp_netdev_flow *flow) {
> > +    int ret = 0;
> > +    uint32_t mark = flow->mark;
> > +    struct cmap_node *mark_node = CONST_CAST(struct cmap_node *,
> > +                                             &flow->mark_node);
> > +
> > +    cmap_remove(&flow_mark.mark_to_flow, mark_node, mark);
> > +    flow->mark = INVALID_FLOW_MARK;
> > +
> 
> A comment on this code block could be helpful. Mainly just to explain how the reference could be missing and the required steps because of this.
> 
> > +    if (flow_mark_has_no_ref(mark)) {
> > +        struct dp_netdev_port *port;
> > +        odp_port_t in_port = flow->flow.in_port.odp_port;
> > +
> 
> Need port Mutex 'pmd->dp->port_mutex' before calling dp_netdev_lookup_port().
> 
> > +        port = dp_netdev_lookup_port(pmd->dp, in_port);
> > +        if (port) {
> > +            ret = netdev_flow_del(port->netdev, &flow->mega_ufid, NULL);
> > +        }
> > +
> > +        flow_mark_free(mark);
> > +        VLOG_DBG("freed flow mark %u\n", mark);
> > +
> > +        megaflow_to_mark_disassociate(&flow->mega_ufid);
> > +    }
> > +    dp_netdev_flow_unref(flow);
> > +
> > +    return ret;
> > +}
> > +
> > +static void
> > +flow_mark_flush(struct dp_netdev_pmd_thread *pmd) {
> > +    struct dp_netdev_flow *flow;
> > +
> > +    CMAP_FOR_EACH (flow, mark_node, &flow_mark.mark_to_flow) {
> > +        if (flow->pmd_id == pmd->core_id) {
> > +            mark_to_flow_disassociate(pmd, flow);
> > +        }
> > +    }
> > +}
> > +
> >  static void
> >  dp_netdev_pmd_remove_flow(struct dp_netdev_pmd_thread *pmd,
> >                            struct dp_netdev_flow *flow) @@ -1845,6 +2015,9
> > @@ dp_netdev_pmd_remove_flow(struct dp_netdev_pmd_thread *pmd,
> >      ovs_assert(cls != NULL);
> >      dpcls_remove(cls, &flow->cr);
> >      cmap_remove(&pmd->flow_table, node, dp_netdev_flow_hash(&flow-
> > >ufid));
> > +    if (flow->mark != INVALID_FLOW_MARK) {
> > +        mark_to_flow_disassociate(pmd, flow);
> > +    }
> >      flow->dead = true;
> > 
> >      dp_netdev_flow_unref(flow);
> > @@ -2424,6 +2597,87 @@ out:
> >      return error;
> >  }
> > 
> > +static void
> > +try_netdev_flow_put(struct dp_netdev_pmd_thread *pmd, odp_port_t in_port,
> > +                    struct dp_netdev_flow *flow, struct match *match,
> > +                    const struct nlattr *actions, size_t actions_len) {
> > +    struct offload_info info;
> > +    struct dp_netdev_port *port;
> > +    bool modification = flow->mark != INVALID_FLOW_MARK;
> > +    const char *op = modification ? "modify" : "add";
> > +    uint32_t mark;
> > +    int ret;
> > +
> 
> Again port Mutex 'pmd->dp->port_mutex' required in this function for calling dp_netdev_lookup_port().
> 
> > +    port = dp_netdev_lookup_port(pmd->dp, in_port);
> > +    if (!port) {
> > +        return;
> > +    }
> > +
> > +    ovs_mutex_lock(&flow_mark.mutex);
> > +
> > +    if (modification) {
> > +        mark = flow->mark;
> > +    } else {
> > +        if (!netdev_is_flow_api_enabled()) {
> > +            goto out;
> > +        }
> > +
> > +        /*
> > +         * If a mega flow has already been offloaded (from other PMD
> > +         * instances), do not offload it again.
> > +         */
> > +        mark = megaflow_to_mark_find(&flow->mega_ufid);
> > +        if (mark != INVALID_FLOW_MARK) {
> > +            VLOG_DBG("flow has already been offloaded with mark %u\n",
> > mark);
> Just to understand this, Am I right in thinking the expeted behavior is a mark esists so it has been offloaded by another PMD, don't offload it in the HW again BUT do mark it as associated. Is this associated call only relevant to the current PMD? (i.e. each PMD must discover this association themselves?)
> 
> > +            mark_to_flow_associate(mark, flow);
> > +            goto out;
> > +        }
> > +
> > +        mark = flow_mark_alloc();
> > +        if (mark == INVALID_FLOW_MARK) {
> > +            VLOG_ERR("failed to allocate flow mark!\n");
> > +            goto out;
> > +        }
> > +    }
> > +
> > +    info.flow_mark = mark;
> > +    ret = netdev_flow_put(port->netdev, match,
> > +                          CONST_CAST(struct nlattr *, actions),
> > +                          actions_len, &flow->mega_ufid, &info, NULL);
> > +    if (ret) {
> > +        VLOG_ERR("failed to %s netdev flow with mark %u\n", op, mark);
> > +        if (!modification) {
> > +            flow_mark_free(mark);
> > +        } else {
> > +            mark_to_flow_disassociate(pmd, flow);
> > +        }
> > +        goto out;
> > +    }
> > +
> > +    if (!modification) {
> > +        megaflow_to_mark_associate(&flow->mega_ufid, mark);
> > +        mark_to_flow_associate(mark, flow);
> > +    }
> > +    VLOG_DBG("succeed to %s netdev flow with mark %u\n", op, mark);
> 
> The whole section above seems a bit complicated, again comments explaining the behavior that depends on modification would help a lot here.
> 
> > +
> > +out:
> > +    ovs_mutex_unlock(&flow_mark.mutex);
> > +}
> > +
> > +static void
> > +dp_netdev_get_mega_ufid(const struct match *match, ovs_u128 *mega_ufid)
> > +{
> > +    struct flow masked_flow;
> > +    size_t i;
> > +
> > +    for (i = 0; i < sizeof(struct flow); i++) {
> > +        ((uint8_t *)&masked_flow)[i] = ((uint8_t *)&match->flow)[i] &
> > +                                       ((uint8_t *)&match->wc)[i];
> > +    }
> > +    dpif_flow_hash(NULL, &masked_flow, sizeof(struct flow), mega_ufid);
> > +}
> > +
> >  static struct dp_netdev_flow *
> >  dp_netdev_flow_add(struct dp_netdev_pmd_thread *pmd,
> >                     struct match *match, const ovs_u128 *ufid, @@ -2459,12
> > +2713,14 @@ dp_netdev_flow_add(struct dp_netdev_pmd_thread *pmd,
> >      memset(&flow->stats, 0, sizeof flow->stats);
> >      flow->dead = false;
> >      flow->batch = NULL;
> > +    flow->mark = INVALID_FLOW_MARK;
> >      *CONST_CAST(unsigned *, &flow->pmd_id) = pmd->core_id;
> >      *CONST_CAST(struct flow *, &flow->flow) = match->flow;
> >      *CONST_CAST(ovs_u128 *, &flow->ufid) = *ufid;
> >      ovs_refcount_init(&flow->ref_cnt);
> >      ovsrcu_set(&flow->actions, dp_netdev_actions_create(actions,
> > actions_len));
> > 
> > +    dp_netdev_get_mega_ufid(match, CONST_CAST(ovs_u128 *,
> > + &flow->mega_ufid));
> >      netdev_flow_key_init_masked(&flow->cr.flow, &match->flow, &mask);
> > 
> >      /* Select dpcls for in_port. Relies on in_port to be exact match. */
> > @@ -2474,6 +2730,8 @@ dp_netdev_flow_add(struct dp_netdev_pmd_thread *pmd,
> >      cmap_insert(&pmd->flow_table, CONST_CAST(struct cmap_node *, &flow-
> > >node),
> >                  dp_netdev_flow_hash(&flow->ufid));
> > 
> > +    try_netdev_flow_put(pmd, in_port, flow, match, actions,
> > + actions_len);
> > +
> >      if (OVS_UNLIKELY(!VLOG_DROP_DBG((&upcall_rl)))) {
> >          struct ds ds = DS_EMPTY_INITIALIZER;
> >          struct ofpbuf key_buf, mask_buf; @@ -2554,6 +2812,7 @@
> > flow_put_on_pmd(struct dp_netdev_pmd_thread *pmd,
> >          if (put->flags & DPIF_FP_MODIFY) {
> >              struct dp_netdev_actions *new_actions;
> >              struct dp_netdev_actions *old_actions;
> > +            odp_port_t in_port = netdev_flow->flow.in_port.odp_port;
> > 
> >              new_actions = dp_netdev_actions_create(put->actions,
> >                                                     put->actions_len); @@
> > -2561,6 +2820,9 @@ flow_put_on_pmd(struct dp_netdev_pmd_thread *pmd,
> >              old_actions = dp_netdev_flow_get_actions(netdev_flow);
> >              ovsrcu_set(&netdev_flow->actions, new_actions);
> > 
> > +            try_netdev_flow_put(pmd, in_port, netdev_flow, match,
> > +                                put->actions, put->actions_len);
> > +
> >              if (stats) {
> >                  get_dpif_flow_stats(netdev_flow, stats);
> >              }
> > @@ -3570,6 +3832,7 @@ reload_affected_pmds(struct dp_netdev *dp)
> > 
> >      CMAP_FOR_EACH (pmd, node, &dp->poll_threads) {
> >          if (pmd->need_reload) {
> > +            flow_mark_flush(pmd);
> >              dp_netdev_reload_pmd__(pmd);
> >              pmd->need_reload = false;
> >          }
> > diff --git a/lib/netdev.h b/lib/netdev.h index 3a545fe..0c1946a 100644
> > --- a/lib/netdev.h
> > +++ b/lib/netdev.h
> > @@ -188,6 +188,12 @@ void netdev_send_wait(struct netdev *, int qid);
> > struct offload_info {
> >      const struct dpif_class *dpif_class;
> >      ovs_be16 tp_dst_port; /* Destination port for tunnel in SET action */
> > +
> > +    /*
> > +     * The flow mark id assigened to the flow. If any pkts hit the flow,
> > +     * it will be in the pkt meta data.
> > +     */
> > +    uint32_t flow_mark;
> >  };
> >  struct dpif_class;
> >  struct netdev_flow_dump;
> > --
> > 2.7.4
Yuanhan Liu Jan. 26, 2018, 8:19 a.m. UTC | #3
On Wed, Jan 24, 2018 at 05:29:45PM +0000, Stokes, Ian wrote:
> > Meanwhile, the mark to flow mapping becomes to 1:N mapping. That is what
> 
> Is this 1:N or 1:1? I thought I spotted below that it's 1:1.

If you look further, you will also spot "1:N".
mark to mega flow is 1:1
mark to flow is 1:N
For why is that, I think the commit log has explained it well.

> > @@ -442,7 +443,9 @@ struct dp_netdev_flow {
> >      /* Hash table index by unmasked flow. */
> >      const struct cmap_node node; /* In owning dp_netdev_pmd_thread's */
> 
> Can you clarify the comment above, at first glance it's not clear /* In owning dp_netdev_pmd_thread's */ ?

Such comment doesn't even come from my patch. It's already there.

> >                                   /* 'flow_table'. */
> > +    const struct cmap_node mark_node; /* In owning flow_mark's
> > + mark_to_flow */
> 
> Same here, it's not clear what is meant by 'In owning...'

I was following the old comment style. Do you have better suggestions?

> 
> Breaks compilation with Sparse. Sparse complains with:
> 'lib/dpif-netdev.c:1860:18: error: symbol 'flow_mark' was not declared. Should it be static?'

Thanks. Will be fixed.

> > +struct flow_mark flow_mark = {
> > +    .megaflow_to_mark = CMAP_INITIALIZER,
> > +    .mark_to_flow = CMAP_INITIALIZER,
> > +    .mutex = OVS_MUTEX_INITIALIZER,
> > +};
> > +
> > +static uint32_t
> > +flow_mark_alloc(void)
> > +{
> > +    uint32_t mark;
> > +
> 
> I would expect the id pool to be allocated only once before it's used to allocate ids for the marks.
> 
> Would it make sense to move the !flow_mark.pool check and allocation out of here to simplify the function to a single purpose and avoid the check for every mark id allocation.

I don't find a good place to put it. Or you have better idea?

> > +    if (!flow_mark.pool) {
> > +        /* Haven't initiated yet, do it here */
> > +        flow_mark.pool = id_pool_create(0, MAX_FLOW_MARK);
> > +    }
> > +
> > +    if (id_pool_alloc_id(flow_mark.pool, &mark)) {
> > +        return mark;
> > +    }
> > +
> > +    return INVALID_FLOW_MARK;
> > +}
> > +
> > +static void
> > +flow_mark_free(uint32_t mark)
> > +{
> > +    id_pool_free_id(flow_mark.pool, mark); }
> > +
> > +/* associate flow with a mark, which is a 1:1 mapping */ static void
> 
> Can you specify megaflow in the comment above.

ok.

> > +megaflow_to_mark_associate(const ovs_u128 *mega_ufid, uint32_t mark) {
> > +    size_t hash = dp_netdev_flow_hash(mega_ufid);
> > +    struct megaflow_to_mark_data *data = xzalloc(sizeof(*data));
> > +
> > +    data->mega_ufid = *mega_ufid;
> > +    data->mark = mark;
> > +
> > +    cmap_insert(&flow_mark.megaflow_to_mark,
> > +                CONST_CAST(struct cmap_node *, &data->node), hash); }
> > +
> > +/* disassociate flow with a mark */
> > +static void
> > +megaflow_to_mark_disassociate(const ovs_u128 *mega_ufid) {
> > +    size_t hash = dp_netdev_flow_hash(mega_ufid);
> > +    struct megaflow_to_mark_data *data;
> > +
> > +    CMAP_FOR_EACH_WITH_HASH (data, node, hash,
> > &flow_mark.megaflow_to_mark) {
> > +        if (ovs_u128_equals(*mega_ufid, data->mega_ufid)) {
> > +            cmap_remove(&flow_mark.megaflow_to_mark,
> > +                        CONST_CAST(struct cmap_node *, &data->node),
> > hash);
> > +            free(data);
> > +            return;
> > +        }
> > +    }
> > +
> > +    VLOG_WARN("masked ufid "UUID_FMT" is not associated with a mark?\n",
> > +              UUID_ARGS((struct uuid *)mega_ufid)); }
> > +
> > +static inline uint32_t
> > +megaflow_to_mark_find(const ovs_u128 *mega_ufid) {
> > +    size_t hash = dp_netdev_flow_hash(mega_ufid);
> > +    struct megaflow_to_mark_data *data;
> > +
> > +    CMAP_FOR_EACH_WITH_HASH (data, node, hash,
> > &flow_mark.megaflow_to_mark) {
> > +        if (ovs_u128_equals(*mega_ufid, data->mega_ufid)) {
> > +            return data->mark;
> > +        }
> > +    }
> > +
> > +    return INVALID_FLOW_MARK;
> > +}
> > +
> > +/* associate mark with a flow, which is 1:N mapping */ static void
> > +mark_to_flow_associate(const uint32_t mark, struct dp_netdev_flow
> > +*flow) {
> > +    dp_netdev_flow_ref(flow);
> > +
> > +    cmap_insert(&flow_mark.mark_to_flow,
> > +                CONST_CAST(struct cmap_node *, &flow->mark_node),
> > +                mark);
> > +    flow->mark = mark;
> > +
> > +    VLOG_DBG("associated dp_netdev flow %p with mark %u\n", flow,
> > +mark); }
> > +
> 
> I'd like to see a comment explain the purpose of the function below. Which reference is the flow_mark missing?

What do you mean by "which reference is the flow_mark missing"?

--- forgot about this question: I think I understood you after
    looked you comments below. Also, you will find the answer
    below.

> > +static bool
> > +flow_mark_has_no_ref(uint32_t mark)
> > +{
> > +    struct dp_netdev_flow *flow;
> > +
> 
> Maybe I'm missing something below, but I expected a hash to be computed for mark before being called with CMAP_FOR_EACH_WITH_HASH?

I treat "mark" as the hash, as it's with the same type of "hash" and it's
uniq. But you are probably right, it might be better to get the hash by
"hash_int". Will fix it.

> 
> > +    CMAP_FOR_EACH_WITH_HASH (flow, mark_node, mark,
> > +                             &flow_mark.mark_to_flow) {
> > +        if (flow->mark == mark) {
> > +            return false;
> > +        }
> > +    }
> > +
> > +    return true;
> > +}
> > +
> > +static int
> > +mark_to_flow_disassociate(struct dp_netdev_pmd_thread *pmd,
> > +                          struct dp_netdev_flow *flow) {
> > +    int ret = 0;
> > +    uint32_t mark = flow->mark;
> > +    struct cmap_node *mark_node = CONST_CAST(struct cmap_node *,
> > +                                             &flow->mark_node);
> > +
> > +    cmap_remove(&flow_mark.mark_to_flow, mark_node, mark);
> > +    flow->mark = INVALID_FLOW_MARK;
> > +
> 
> A comment on this code block could be helpful. Mainly just to explain how the reference could be missing and the required steps because of this.

It's not about "missing". It's that we are the last reference to the mark,
so that we could remove the mark and the related flow from hardware.

> > +    if (flow_mark_has_no_ref(mark)) {
> > +        struct dp_netdev_port *port;
> > +        odp_port_t in_port = flow->flow.in_port.odp_port;
> > +
> 
> Need port Mutex 'pmd->dp->port_mutex' before calling dp_netdev_lookup_port().

Right, will fix it.

> 
> > +        port = dp_netdev_lookup_port(pmd->dp, in_port);
> > +        if (port) {
> > +            ret = netdev_flow_del(port->netdev, &flow->mega_ufid, NULL);
> > +        }
> > +
> > +        flow_mark_free(mark);
> > +        VLOG_DBG("freed flow mark %u\n", mark);
> > +
> > +        megaflow_to_mark_disassociate(&flow->mega_ufid);
> > +    }
> > +    dp_netdev_flow_unref(flow);
> > +
> > +    return ret;
> > +}
> > +
> > +static void
> > +flow_mark_flush(struct dp_netdev_pmd_thread *pmd) {
> > +    struct dp_netdev_flow *flow;
> > +
> > +    CMAP_FOR_EACH (flow, mark_node, &flow_mark.mark_to_flow) {
> > +        if (flow->pmd_id == pmd->core_id) {
> > +            mark_to_flow_disassociate(pmd, flow);
> > +        }
> > +    }
> > +}
> > +
> >  static void
> >  dp_netdev_pmd_remove_flow(struct dp_netdev_pmd_thread *pmd,
> >                            struct dp_netdev_flow *flow) @@ -1845,6 +2015,9
> > @@ dp_netdev_pmd_remove_flow(struct dp_netdev_pmd_thread *pmd,
> >      ovs_assert(cls != NULL);
> >      dpcls_remove(cls, &flow->cr);
> >      cmap_remove(&pmd->flow_table, node, dp_netdev_flow_hash(&flow-
> > >ufid));
> > +    if (flow->mark != INVALID_FLOW_MARK) {
> > +        mark_to_flow_disassociate(pmd, flow);
> > +    }
> >      flow->dead = true;
> > 
> >      dp_netdev_flow_unref(flow);
> > @@ -2424,6 +2597,87 @@ out:
> >      return error;
> >  }
> > 
> > +static void
> > +try_netdev_flow_put(struct dp_netdev_pmd_thread *pmd, odp_port_t in_port,
> > +                    struct dp_netdev_flow *flow, struct match *match,
> > +                    const struct nlattr *actions, size_t actions_len) {
> > +    struct offload_info info;
> > +    struct dp_netdev_port *port;
> > +    bool modification = flow->mark != INVALID_FLOW_MARK;
> > +    const char *op = modification ? "modify" : "add";
> > +    uint32_t mark;
> > +    int ret;
> > +
> 
> Again port Mutex 'pmd->dp->port_mutex' required in this function for calling dp_netdev_lookup_port().
> 
> > +    port = dp_netdev_lookup_port(pmd->dp, in_port);
> > +    if (!port) {
> > +        return;
> > +    }
> > +
> > +    ovs_mutex_lock(&flow_mark.mutex);
> > +
> > +    if (modification) {
> > +        mark = flow->mark;
> > +    } else {
> > +        if (!netdev_is_flow_api_enabled()) {
> > +            goto out;
> > +        }
> > +
> > +        /*
> > +         * If a mega flow has already been offloaded (from other PMD
> > +         * instances), do not offload it again.
> > +         */
> > +        mark = megaflow_to_mark_find(&flow->mega_ufid);
> > +        if (mark != INVALID_FLOW_MARK) {
> > +            VLOG_DBG("flow has already been offloaded with mark %u\n",
> > mark);
> Just to understand this, Am I right in thinking the expeted behavior is a mark esists so it has been offloaded by another PMD, don't offload it in the HW again BUT do mark it as associated. Is this associated call only relevant to the current PMD? (i.e. each PMD must discover this association themselves?)

Yes.

> > +            mark_to_flow_associate(mark, flow);
> > +            goto out;
> > +        }
> > +
> > +        mark = flow_mark_alloc();
> > +        if (mark == INVALID_FLOW_MARK) {
> > +            VLOG_ERR("failed to allocate flow mark!\n");
> > +            goto out;
> > +        }
> > +    }
> > +
> > +    info.flow_mark = mark;
> > +    ret = netdev_flow_put(port->netdev, match,
> > +                          CONST_CAST(struct nlattr *, actions),
> > +                          actions_len, &flow->mega_ufid, &info, NULL);
> > +    if (ret) {
> > +        VLOG_ERR("failed to %s netdev flow with mark %u\n", op, mark);
> > +        if (!modification) {
> > +            flow_mark_free(mark);
> > +        } else {
> > +            mark_to_flow_disassociate(pmd, flow);
> > +        }
> > +        goto out;
> > +    }
> > +
> > +    if (!modification) {
> > +        megaflow_to_mark_associate(&flow->mega_ufid, mark);
> > +        mark_to_flow_associate(mark, flow);
> > +    }
> > +    VLOG_DBG("succeed to %s netdev flow with mark %u\n", op, mark);
> 
> The whole section above seems a bit complicated, again comments explaining the behavior that depends on modification would help a lot here.

sure, I will add some explanations.

	--yliu

> > +
> > +out:
> > +    ovs_mutex_unlock(&flow_mark.mutex);
> > +}
> > +
> > +static void
> > +dp_netdev_get_mega_ufid(const struct match *match, ovs_u128 *mega_ufid)
> > +{
> > +    struct flow masked_flow;
> > +    size_t i;
> > +
> > +    for (i = 0; i < sizeof(struct flow); i++) {
> > +        ((uint8_t *)&masked_flow)[i] = ((uint8_t *)&match->flow)[i] &
> > +                                       ((uint8_t *)&match->wc)[i];
> > +    }
> > +    dpif_flow_hash(NULL, &masked_flow, sizeof(struct flow), mega_ufid);
> > +}
> > +
> >  static struct dp_netdev_flow *
> >  dp_netdev_flow_add(struct dp_netdev_pmd_thread *pmd,
> >                     struct match *match, const ovs_u128 *ufid, @@ -2459,12
> > +2713,14 @@ dp_netdev_flow_add(struct dp_netdev_pmd_thread *pmd,
> >      memset(&flow->stats, 0, sizeof flow->stats);
> >      flow->dead = false;
> >      flow->batch = NULL;
> > +    flow->mark = INVALID_FLOW_MARK;
> >      *CONST_CAST(unsigned *, &flow->pmd_id) = pmd->core_id;
> >      *CONST_CAST(struct flow *, &flow->flow) = match->flow;
> >      *CONST_CAST(ovs_u128 *, &flow->ufid) = *ufid;
> >      ovs_refcount_init(&flow->ref_cnt);
> >      ovsrcu_set(&flow->actions, dp_netdev_actions_create(actions,
> > actions_len));
> > 
> > +    dp_netdev_get_mega_ufid(match, CONST_CAST(ovs_u128 *,
> > + &flow->mega_ufid));
> >      netdev_flow_key_init_masked(&flow->cr.flow, &match->flow, &mask);
> > 
> >      /* Select dpcls for in_port. Relies on in_port to be exact match. */
> > @@ -2474,6 +2730,8 @@ dp_netdev_flow_add(struct dp_netdev_pmd_thread *pmd,
> >      cmap_insert(&pmd->flow_table, CONST_CAST(struct cmap_node *, &flow-
> > >node),
> >                  dp_netdev_flow_hash(&flow->ufid));
> > 
> > +    try_netdev_flow_put(pmd, in_port, flow, match, actions,
> > + actions_len);
> > +
> >      if (OVS_UNLIKELY(!VLOG_DROP_DBG((&upcall_rl)))) {
> >          struct ds ds = DS_EMPTY_INITIALIZER;
> >          struct ofpbuf key_buf, mask_buf; @@ -2554,6 +2812,7 @@
> > flow_put_on_pmd(struct dp_netdev_pmd_thread *pmd,
> >          if (put->flags & DPIF_FP_MODIFY) {
> >              struct dp_netdev_actions *new_actions;
> >              struct dp_netdev_actions *old_actions;
> > +            odp_port_t in_port = netdev_flow->flow.in_port.odp_port;
> > 
> >              new_actions = dp_netdev_actions_create(put->actions,
> >                                                     put->actions_len); @@
> > -2561,6 +2820,9 @@ flow_put_on_pmd(struct dp_netdev_pmd_thread *pmd,
> >              old_actions = dp_netdev_flow_get_actions(netdev_flow);
> >              ovsrcu_set(&netdev_flow->actions, new_actions);
> > 
> > +            try_netdev_flow_put(pmd, in_port, netdev_flow, match,
> > +                                put->actions, put->actions_len);
> > +
> >              if (stats) {
> >                  get_dpif_flow_stats(netdev_flow, stats);
> >              }
> > @@ -3570,6 +3832,7 @@ reload_affected_pmds(struct dp_netdev *dp)
> > 
> >      CMAP_FOR_EACH (pmd, node, &dp->poll_threads) {
> >          if (pmd->need_reload) {
> > +            flow_mark_flush(pmd);
> >              dp_netdev_reload_pmd__(pmd);
> >              pmd->need_reload = false;
> >          }
> > diff --git a/lib/netdev.h b/lib/netdev.h index 3a545fe..0c1946a 100644
> > --- a/lib/netdev.h
> > +++ b/lib/netdev.h
> > @@ -188,6 +188,12 @@ void netdev_send_wait(struct netdev *, int qid);
> > struct offload_info {
> >      const struct dpif_class *dpif_class;
> >      ovs_be16 tp_dst_port; /* Destination port for tunnel in SET action */
> > +
> > +    /*
> > +     * The flow mark id assigened to the flow. If any pkts hit the flow,
> > +     * it will be in the pkt meta data.
> > +     */
> > +    uint32_t flow_mark;
> >  };
> >  struct dpif_class;
> >  struct netdev_flow_dump;
> > --
> > 2.7.4
Stokes, Ian Jan. 26, 2018, 10:49 a.m. UTC | #4
> -----Original Message-----
> From: Yuanhan Liu [mailto:yliu@fridaylinux.org]
> Sent: Friday, January 26, 2018 8:20 AM
> To: Stokes, Ian <ian.stokes@intel.com>
> Cc: dev@openvswitch.org; Finn Christensen <fc@napatech.com>; Darrell Ball
> <dball@vmware.com>; Chandran, Sugesh <sugesh.chandran@intel.com>; Simon
> Horman <simon.horman@netronome.com>
> Subject: Re: [PATCH v5 1/5] dpif-netdev: associate flow with a mark id
> 
> On Wed, Jan 24, 2018 at 05:29:45PM +0000, Stokes, Ian wrote:
> > > Meanwhile, the mark to flow mapping becomes to 1:N mapping. That is
> > > what
> >
> > Is this 1:N or 1:1? I thought I spotted below that it's 1:1.
> 
> If you look further, you will also spot "1:N".
> mark to mega flow is 1:1
> mark to flow is 1:N
> For why is that, I think the commit log has explained it well.
> 
> > > @@ -442,7 +443,9 @@ struct dp_netdev_flow {
> > >      /* Hash table index by unmasked flow. */
> > >      const struct cmap_node node; /* In owning
> > > dp_netdev_pmd_thread's */
> >
> > Can you clarify the comment above, at first glance it's not clear /* In
> owning dp_netdev_pmd_thread's */ ?
> 
> Such comment doesn't even come from my patch. It's already there.
> 
> > >                                   /* 'flow_table'. */
> > > +    const struct cmap_node mark_node; /* In owning flow_mark's
> > > + mark_to_flow */
> >
> > Same here, it's not clear what is meant by 'In owning...'
> 
> I was following the old comment style. Do you have better suggestions?
Its ok, I hadn't spotted you were following the existing comment.

> 
> >
> > Breaks compilation with Sparse. Sparse complains with:
> > 'lib/dpif-netdev.c:1860:18: error: symbol 'flow_mark' was not declared.
> Should it be static?'
> 
> Thanks. Will be fixed.
> 
> > > +struct flow_mark flow_mark = {
> > > +    .megaflow_to_mark = CMAP_INITIALIZER,
> > > +    .mark_to_flow = CMAP_INITIALIZER,
> > > +    .mutex = OVS_MUTEX_INITIALIZER, };
> > > +
> > > +static uint32_t
> > > +flow_mark_alloc(void)
> > > +{
> > > +    uint32_t mark;
> > > +
> >
> > I would expect the id pool to be allocated only once before it's used to
> allocate ids for the marks.
> >
> > Would it make sense to move the !flow_mark.pool check and allocation out
> of here to simplify the function to a single purpose and avoid the check
> for every mark id allocation.
> 
> I don't find a good place to put it. Or you have better idea?
It's not a big deal, I was just thinking was it required to be here or was there a way to avoid the calls every time.

> 
> > > +    if (!flow_mark.pool) {
> > > +        /* Haven't initiated yet, do it here */
> > > +        flow_mark.pool = id_pool_create(0, MAX_FLOW_MARK);
> > > +    }
> > > +
> > > +    if (id_pool_alloc_id(flow_mark.pool, &mark)) {
> > > +        return mark;
> > > +    }
> > > +
> > > +    return INVALID_FLOW_MARK;
> > > +}
> > > +
> > > +static void
> > > +flow_mark_free(uint32_t mark)
> > > +{
> > > +    id_pool_free_id(flow_mark.pool, mark); }
> > > +
> > > +/* associate flow with a mark, which is a 1:1 mapping */ static
> > > +void
> >
> > Can you specify megaflow in the comment above.
> 
> ok.
> 
> > > +megaflow_to_mark_associate(const ovs_u128 *mega_ufid, uint32_t mark)
> {
> > > +    size_t hash = dp_netdev_flow_hash(mega_ufid);
> > > +    struct megaflow_to_mark_data *data = xzalloc(sizeof(*data));
> > > +
> > > +    data->mega_ufid = *mega_ufid;
> > > +    data->mark = mark;
> > > +
> > > +    cmap_insert(&flow_mark.megaflow_to_mark,
> > > +                CONST_CAST(struct cmap_node *, &data->node), hash);
> > > + }
> > > +
> > > +/* disassociate flow with a mark */ static void
> > > +megaflow_to_mark_disassociate(const ovs_u128 *mega_ufid) {
> > > +    size_t hash = dp_netdev_flow_hash(mega_ufid);
> > > +    struct megaflow_to_mark_data *data;
> > > +
> > > +    CMAP_FOR_EACH_WITH_HASH (data, node, hash,
> > > &flow_mark.megaflow_to_mark) {
> > > +        if (ovs_u128_equals(*mega_ufid, data->mega_ufid)) {
> > > +            cmap_remove(&flow_mark.megaflow_to_mark,
> > > +                        CONST_CAST(struct cmap_node *,
> > > + &data->node),
> > > hash);
> > > +            free(data);
> > > +            return;
> > > +        }
> > > +    }
> > > +
> > > +    VLOG_WARN("masked ufid "UUID_FMT" is not associated with a
> mark?\n",
> > > +              UUID_ARGS((struct uuid *)mega_ufid)); }
> > > +
> > > +static inline uint32_t
> > > +megaflow_to_mark_find(const ovs_u128 *mega_ufid) {
> > > +    size_t hash = dp_netdev_flow_hash(mega_ufid);
> > > +    struct megaflow_to_mark_data *data;
> > > +
> > > +    CMAP_FOR_EACH_WITH_HASH (data, node, hash,
> > > &flow_mark.megaflow_to_mark) {
> > > +        if (ovs_u128_equals(*mega_ufid, data->mega_ufid)) {
> > > +            return data->mark;
> > > +        }
> > > +    }
> > > +
> > > +    return INVALID_FLOW_MARK;
> > > +}
> > > +
> > > +/* associate mark with a flow, which is 1:N mapping */ static void
> > > +mark_to_flow_associate(const uint32_t mark, struct dp_netdev_flow
> > > +*flow) {
> > > +    dp_netdev_flow_ref(flow);
> > > +
> > > +    cmap_insert(&flow_mark.mark_to_flow,
> > > +                CONST_CAST(struct cmap_node *, &flow->mark_node),
> > > +                mark);
> > > +    flow->mark = mark;
> > > +
> > > +    VLOG_DBG("associated dp_netdev flow %p with mark %u\n", flow,
> > > +mark); }
> > > +
> >
> > I'd like to see a comment explain the purpose of the function below.
> Which reference is the flow_mark missing?

Apologies, missing is probably the wrong term, I meant when a flow mark has no ref? (reference I assume this stands for).

> 
> What do you mean by "which reference is the flow_mark missing"?
> 
> --- forgot about this question: I think I understood you after
>     looked you comments below. Also, you will find the answer
>     below.
> 
> > > +static bool
> > > +flow_mark_has_no_ref(uint32_t mark) {
> > > +    struct dp_netdev_flow *flow;
> > > +
> >
> > Maybe I'm missing something below, but I expected a hash to be computed
> for mark before being called with CMAP_FOR_EACH_WITH_HASH?
> 
> I treat "mark" as the hash, as it's with the same type of "hash" and it's
> uniq. But you are probably right, it might be better to get the hash by
> "hash_int". Will fix it.
> 
> >
> > > +    CMAP_FOR_EACH_WITH_HASH (flow, mark_node, mark,
> > > +                             &flow_mark.mark_to_flow) {
> > > +        if (flow->mark == mark) {
> > > +            return false;
> > > +        }
> > > +    }
> > > +
> > > +    return true;
> > > +}
> > > +
> > > +static int
> > > +mark_to_flow_disassociate(struct dp_netdev_pmd_thread *pmd,
> > > +                          struct dp_netdev_flow *flow) {
> > > +    int ret = 0;
> > > +    uint32_t mark = flow->mark;
> > > +    struct cmap_node *mark_node = CONST_CAST(struct cmap_node *,
> > > +                                             &flow->mark_node);
> > > +
> > > +    cmap_remove(&flow_mark.mark_to_flow, mark_node, mark);
> > > +    flow->mark = INVALID_FLOW_MARK;
> > > +
> >
> > A comment on this code block could be helpful. Mainly just to explain
> how the reference could be missing and the required steps because of this.
> 
> It's not about "missing". It's that we are the last reference to the mark,
> so that we could remove the mark and the related flow from hardware.

Ah ok, that clears it up. Again 'missing' was the wrong term on my part.

So would this removal of the flow from  HW be carried out by the PMD that first installed the flow to the HW? Or can any PMD remove the flow from HW? I'd assume then every other PMD that had previously associated the flow with a Mark must then dissociate it.

> 
> > > +    if (flow_mark_has_no_ref(mark)) {
> > > +        struct dp_netdev_port *port;
> > > +        odp_port_t in_port = flow->flow.in_port.odp_port;
> > > +
> >
> > Need port Mutex 'pmd->dp->port_mutex' before calling
> dp_netdev_lookup_port().
> 
> Right, will fix it.
> 
> >
> > > +        port = dp_netdev_lookup_port(pmd->dp, in_port);
> > > +        if (port) {
> > > +            ret = netdev_flow_del(port->netdev, &flow->mega_ufid,
> NULL);
> > > +        }
> > > +
> > > +        flow_mark_free(mark);
> > > +        VLOG_DBG("freed flow mark %u\n", mark);
> > > +
> > > +        megaflow_to_mark_disassociate(&flow->mega_ufid);
> > > +    }
> > > +    dp_netdev_flow_unref(flow);
> > > +
> > > +    return ret;
> > > +}
> > > +
> > > +static void
> > > +flow_mark_flush(struct dp_netdev_pmd_thread *pmd) {
> > > +    struct dp_netdev_flow *flow;
> > > +
> > > +    CMAP_FOR_EACH (flow, mark_node, &flow_mark.mark_to_flow) {
> > > +        if (flow->pmd_id == pmd->core_id) {
> > > +            mark_to_flow_disassociate(pmd, flow);
> > > +        }
> > > +    }
> > > +}
> > > +
> > >  static void
> > >  dp_netdev_pmd_remove_flow(struct dp_netdev_pmd_thread *pmd,
> > >                            struct dp_netdev_flow *flow) @@ -1845,6
> > > +2015,9 @@ dp_netdev_pmd_remove_flow(struct dp_netdev_pmd_thread *pmd,
> > >      ovs_assert(cls != NULL);
> > >      dpcls_remove(cls, &flow->cr);
> > >      cmap_remove(&pmd->flow_table, node, dp_netdev_flow_hash(&flow-
> > > >ufid));
> > > +    if (flow->mark != INVALID_FLOW_MARK) {
> > > +        mark_to_flow_disassociate(pmd, flow);
> > > +    }
> > >      flow->dead = true;
> > >
> > >      dp_netdev_flow_unref(flow);
> > > @@ -2424,6 +2597,87 @@ out:
> > >      return error;
> > >  }
> > >
> > > +static void
> > > +try_netdev_flow_put(struct dp_netdev_pmd_thread *pmd, odp_port_t
> in_port,
> > > +                    struct dp_netdev_flow *flow, struct match *match,
> > > +                    const struct nlattr *actions, size_t actions_len)
> {
> > > +    struct offload_info info;
> > > +    struct dp_netdev_port *port;
> > > +    bool modification = flow->mark != INVALID_FLOW_MARK;
> > > +    const char *op = modification ? "modify" : "add";
> > > +    uint32_t mark;
> > > +    int ret;
> > > +
> >
> > Again port Mutex 'pmd->dp->port_mutex' required in this function for
> calling dp_netdev_lookup_port().
> >
> > > +    port = dp_netdev_lookup_port(pmd->dp, in_port);
> > > +    if (!port) {
> > > +        return;
> > > +    }
> > > +
> > > +    ovs_mutex_lock(&flow_mark.mutex);
> > > +
> > > +    if (modification) {
> > > +        mark = flow->mark;
> > > +    } else {
> > > +        if (!netdev_is_flow_api_enabled()) {
> > > +            goto out;
> > > +        }
> > > +
> > > +        /*
> > > +         * If a mega flow has already been offloaded (from other PMD
> > > +         * instances), do not offload it again.
> > > +         */
> > > +        mark = megaflow_to_mark_find(&flow->mega_ufid);
> > > +        if (mark != INVALID_FLOW_MARK) {
> > > +            VLOG_DBG("flow has already been offloaded with mark
> > > + %u\n",
> > > mark);
> > Just to understand this, Am I right in thinking the expeted behavior
> > is a mark esists so it has been offloaded by another PMD, don't
> > offload it in the HW again BUT do mark it as associated. Is this
> > associated call only relevant to the current PMD? (i.e. each PMD must
> > discover this association themselves?)
> 
> Yes.
> 
> > > +            mark_to_flow_associate(mark, flow);
> > > +            goto out;
> > > +        }
> > > +
> > > +        mark = flow_mark_alloc();
> > > +        if (mark == INVALID_FLOW_MARK) {
> > > +            VLOG_ERR("failed to allocate flow mark!\n");
> > > +            goto out;
> > > +        }
> > > +    }
> > > +
> > > +    info.flow_mark = mark;
> > > +    ret = netdev_flow_put(port->netdev, match,
> > > +                          CONST_CAST(struct nlattr *, actions),
> > > +                          actions_len, &flow->mega_ufid, &info,
> NULL);
> > > +    if (ret) {
> > > +        VLOG_ERR("failed to %s netdev flow with mark %u\n", op,
> mark);
> > > +        if (!modification) {
> > > +            flow_mark_free(mark);
> > > +        } else {
> > > +            mark_to_flow_disassociate(pmd, flow);
> > > +        }
> > > +        goto out;
> > > +    }
> > > +
> > > +    if (!modification) {
> > > +        megaflow_to_mark_associate(&flow->mega_ufid, mark);
> > > +        mark_to_flow_associate(mark, flow);
> > > +    }
> > > +    VLOG_DBG("succeed to %s netdev flow with mark %u\n", op, mark);
> >
> > The whole section above seems a bit complicated, again comments
> explaining the behavior that depends on modification would help a lot
> here.
> 
> sure, I will add some explanations.
> 
> 	--yliu
> 
> > > +
> > > +out:
> > > +    ovs_mutex_unlock(&flow_mark.mutex);
> > > +}
> > > +
> > > +static void
> > > +dp_netdev_get_mega_ufid(const struct match *match, ovs_u128
> > > +*mega_ufid) {
> > > +    struct flow masked_flow;
> > > +    size_t i;
> > > +
> > > +    for (i = 0; i < sizeof(struct flow); i++) {
> > > +        ((uint8_t *)&masked_flow)[i] = ((uint8_t *)&match->flow)[i] &
> > > +                                       ((uint8_t *)&match->wc)[i];
> > > +    }
> > > +    dpif_flow_hash(NULL, &masked_flow, sizeof(struct flow),
> > > +mega_ufid); }
> > > +
> > >  static struct dp_netdev_flow *
> > >  dp_netdev_flow_add(struct dp_netdev_pmd_thread *pmd,
> > >                     struct match *match, const ovs_u128 *ufid, @@
> > > -2459,12
> > > +2713,14 @@ dp_netdev_flow_add(struct dp_netdev_pmd_thread *pmd,
> > >      memset(&flow->stats, 0, sizeof flow->stats);
> > >      flow->dead = false;
> > >      flow->batch = NULL;
> > > +    flow->mark = INVALID_FLOW_MARK;
> > >      *CONST_CAST(unsigned *, &flow->pmd_id) = pmd->core_id;
> > >      *CONST_CAST(struct flow *, &flow->flow) = match->flow;
> > >      *CONST_CAST(ovs_u128 *, &flow->ufid) = *ufid;
> > >      ovs_refcount_init(&flow->ref_cnt);
> > >      ovsrcu_set(&flow->actions, dp_netdev_actions_create(actions,
> > > actions_len));
> > >
> > > +    dp_netdev_get_mega_ufid(match, CONST_CAST(ovs_u128 *,
> > > + &flow->mega_ufid));
> > >      netdev_flow_key_init_masked(&flow->cr.flow, &match->flow,
> > > &mask);
> > >
> > >      /* Select dpcls for in_port. Relies on in_port to be exact
> > > match. */ @@ -2474,6 +2730,8 @@ dp_netdev_flow_add(struct
> dp_netdev_pmd_thread *pmd,
> > >      cmap_insert(&pmd->flow_table, CONST_CAST(struct cmap_node *,
> > > &flow-
> > > >node),
> > >                  dp_netdev_flow_hash(&flow->ufid));
> > >
> > > +    try_netdev_flow_put(pmd, in_port, flow, match, actions,
> > > + actions_len);
> > > +
> > >      if (OVS_UNLIKELY(!VLOG_DROP_DBG((&upcall_rl)))) {
> > >          struct ds ds = DS_EMPTY_INITIALIZER;
> > >          struct ofpbuf key_buf, mask_buf; @@ -2554,6 +2812,7 @@
> > > flow_put_on_pmd(struct dp_netdev_pmd_thread *pmd,
> > >          if (put->flags & DPIF_FP_MODIFY) {
> > >              struct dp_netdev_actions *new_actions;
> > >              struct dp_netdev_actions *old_actions;
> > > +            odp_port_t in_port =
> > > + netdev_flow->flow.in_port.odp_port;
> > >
> > >              new_actions = dp_netdev_actions_create(put->actions,
> > >
> > > put->actions_len); @@
> > > -2561,6 +2820,9 @@ flow_put_on_pmd(struct dp_netdev_pmd_thread *pmd,
> > >              old_actions = dp_netdev_flow_get_actions(netdev_flow);
> > >              ovsrcu_set(&netdev_flow->actions, new_actions);
> > >
> > > +            try_netdev_flow_put(pmd, in_port, netdev_flow, match,
> > > +                                put->actions, put->actions_len);
> > > +
> > >              if (stats) {
> > >                  get_dpif_flow_stats(netdev_flow, stats);
> > >              }
> > > @@ -3570,6 +3832,7 @@ reload_affected_pmds(struct dp_netdev *dp)
> > >
> > >      CMAP_FOR_EACH (pmd, node, &dp->poll_threads) {
> > >          if (pmd->need_reload) {
> > > +            flow_mark_flush(pmd);
> > >              dp_netdev_reload_pmd__(pmd);
> > >              pmd->need_reload = false;
> > >          }
> > > diff --git a/lib/netdev.h b/lib/netdev.h index 3a545fe..0c1946a
> > > 100644
> > > --- a/lib/netdev.h
> > > +++ b/lib/netdev.h
> > > @@ -188,6 +188,12 @@ void netdev_send_wait(struct netdev *, int
> > > qid); struct offload_info {
> > >      const struct dpif_class *dpif_class;
> > >      ovs_be16 tp_dst_port; /* Destination port for tunnel in SET
> > > action */
> > > +
> > > +    /*
> > > +     * The flow mark id assigened to the flow. If any pkts hit the
> flow,
> > > +     * it will be in the pkt meta data.
> > > +     */
> > > +    uint32_t flow_mark;
> > >  };
> > >  struct dpif_class;
> > >  struct netdev_flow_dump;
> > > --
> > > 2.7.4
Yuanhan Liu Jan. 26, 2018, 11:03 a.m. UTC | #5
On Fri, Jan 26, 2018 at 10:49:46AM +0000, Stokes, Ian wrote:
> > > > +static int
> > > > +mark_to_flow_disassociate(struct dp_netdev_pmd_thread *pmd,
> > > > +                          struct dp_netdev_flow *flow) {
> > > > +    int ret = 0;
> > > > +    uint32_t mark = flow->mark;
> > > > +    struct cmap_node *mark_node = CONST_CAST(struct cmap_node *,
> > > > +                                             &flow->mark_node);
> > > > +
> > > > +    cmap_remove(&flow_mark.mark_to_flow, mark_node, mark);
> > > > +    flow->mark = INVALID_FLOW_MARK;
> > > > +
> > >
> > > A comment on this code block could be helpful. Mainly just to explain
> > how the reference could be missing and the required steps because of this.
> > 
> > It's not about "missing". It's that we are the last reference to the mark,
> > so that we could remove the mark and the related flow from hardware.
> 
> Ah ok, that clears it up. Again 'missing' was the wrong term on my part.
> 
> So would this removal of the flow from  HW be carried out by the PMD that first installed the flow to the HW? Or can any PMD remove the flow from HW?

Yes, any PMD can remove the flow from HW. And of course, only the last
one will trigger the removal.

> I'd assume then every other PMD that had previously associated the flow with a Mark must then dissociate it.

Yes.

	--yliu
Yuanhan Liu Jan. 29, 2018, 6:58 a.m. UTC | #6
On Fri, Jan 26, 2018 at 04:19:30PM +0800, Yuanhan Liu wrote:
> > > +static bool
> > > +flow_mark_has_no_ref(uint32_t mark)
> > > +{
> > > +    struct dp_netdev_flow *flow;
> > > +
> > 
> > Maybe I'm missing something below, but I expected a hash to be computed for mark before being called with CMAP_FOR_EACH_WITH_HASH?
> 
> I treat "mark" as the hash, as it's with the same type of "hash" and it's
> uniq. But you are probably right, it might be better to get the hash by
> "hash_int". Will fix it.

Oops, I forgot to do the coressponding change for mark_to_flow find. Thus,
the partial offload is completely skiped, as retrieving the flow from mark
would always fail (returing NULL).

The reason I missed it, while I was testing v6, is I found the flow creation
is failed. I don't really change anything between v5 and v6 and when I was
back to v5, I also met same issue. I then thought it might be introduced by
the OFED or firmware change I have done (for switching to other projects).
I then thought it's something I could figure out laterly. Thus, v6 was sent
out basically just with a build test.

I then figured out today that the flow creation failure is introduced by
a MLX5 PMD driver change in DPDK v17.11. It looks like a bug to me. And
there are 2 solutions for that:

- fix it in MLX5 PMD (if it's a bug); I was talking to the author made
  such change.
- set rss_conf to NULL, which will let DPDK to follow the one OVS-DPDK
  has set in the beginning.

I chose 2, sicne option 1 won't change the fact that it won't work with
DPDK v17.11.

And Finn, I probably need your help to verify that option 2 won't break
Napa PMD drivers.

I will send v7 soon. Please help review.

Thanks.

	--yliu
Finn Christensen Jan. 29, 2018, 8:48 a.m. UTC | #7
Hi Yuanhan,

This will not break our PMD. I think PMDs should be able to handle rss_conf == NULL and then failover to default or initially set rss_conf.

Finn

>-----Original Message-----
>From: Yuanhan Liu [mailto:yliu@fridaylinux.org]
>Sent: 29. januar 2018 07:59
>To: Stokes, Ian <ian.stokes@intel.com>
>Cc: dev@openvswitch.org; Finn Christensen <fc@napatech.com>; Darrell Ball
><dball@vmware.com>; Chandran, Sugesh <sugesh.chandran@intel.com>;
>Simon Horman <simon.horman@netronome.com>
>Subject: Re: [PATCH v5 1/5] dpif-netdev: associate flow with a mark id
>
>On Fri, Jan 26, 2018 at 04:19:30PM +0800, Yuanhan Liu wrote:
>> > > +static bool
>> > > +flow_mark_has_no_ref(uint32_t mark) {
>> > > +    struct dp_netdev_flow *flow;
>> > > +
>> >
>> > Maybe I'm missing something below, but I expected a hash to be
>computed for mark before being called with CMAP_FOR_EACH_WITH_HASH?
>>
>> I treat "mark" as the hash, as it's with the same type of "hash" and
>> it's uniq. But you are probably right, it might be better to get the
>> hash by "hash_int". Will fix it.
>
>Oops, I forgot to do the coressponding change for mark_to_flow find. Thus,
>the partial offload is completely skiped, as retrieving the flow from mark
>would always fail (returing NULL).
>
>The reason I missed it, while I was testing v6, is I found the flow creation is
>failed. I don't really change anything between v5 and v6 and when I was back
>to v5, I also met same issue. I then thought it might be introduced by the
>OFED or firmware change I have done (for switching to other projects).
>I then thought it's something I could figure out laterly. Thus, v6 was sent out
>basically just with a build test.
>
>I then figured out today that the flow creation failure is introduced by a MLX5
>PMD driver change in DPDK v17.11. It looks like a bug to me. And there are 2
>solutions for that:
>
>- fix it in MLX5 PMD (if it's a bug); I was talking to the author made
>  such change.
>- set rss_conf to NULL, which will let DPDK to follow the one OVS-DPDK
>  has set in the beginning.
>
>I chose 2, sicne option 1 won't change the fact that it won't work with DPDK
>v17.11.
>
>And Finn, I probably need your help to verify that option 2 won't break Napa
>PMD drivers.
>
>I will send v7 soon. Please help review.
>
>Thanks.
>
>	--yliu
Yuanhan Liu Jan. 29, 2018, 9:35 a.m. UTC | #8
On Mon, Jan 29, 2018 at 08:48:42AM +0000, Finn Christensen wrote:
> Hi Yuanhan,
> 
> This will not break our PMD.

Great and thank you for the confirmation!

> I think PMDs should be able to handle rss_conf == NULL and then failover to default or initially set rss_conf.

That's also what I think of.

	--yliu
> 
> Finn
> 
> >-----Original Message-----
> >From: Yuanhan Liu [mailto:yliu@fridaylinux.org]
> >Sent: 29. januar 2018 07:59
> >To: Stokes, Ian <ian.stokes@intel.com>
> >Cc: dev@openvswitch.org; Finn Christensen <fc@napatech.com>; Darrell Ball
> ><dball@vmware.com>; Chandran, Sugesh <sugesh.chandran@intel.com>;
> >Simon Horman <simon.horman@netronome.com>
> >Subject: Re: [PATCH v5 1/5] dpif-netdev: associate flow with a mark id
> >
> >On Fri, Jan 26, 2018 at 04:19:30PM +0800, Yuanhan Liu wrote:
> >> > > +static bool
> >> > > +flow_mark_has_no_ref(uint32_t mark) {
> >> > > +    struct dp_netdev_flow *flow;
> >> > > +
> >> >
> >> > Maybe I'm missing something below, but I expected a hash to be
> >computed for mark before being called with CMAP_FOR_EACH_WITH_HASH?
> >>
> >> I treat "mark" as the hash, as it's with the same type of "hash" and
> >> it's uniq. But you are probably right, it might be better to get the
> >> hash by "hash_int". Will fix it.
> >
> >Oops, I forgot to do the coressponding change for mark_to_flow find. Thus,
> >the partial offload is completely skiped, as retrieving the flow from mark
> >would always fail (returing NULL).
> >
> >The reason I missed it, while I was testing v6, is I found the flow creation is
> >failed. I don't really change anything between v5 and v6 and when I was back
> >to v5, I also met same issue. I then thought it might be introduced by the
> >OFED or firmware change I have done (for switching to other projects).
> >I then thought it's something I could figure out laterly. Thus, v6 was sent out
> >basically just with a build test.
> >
> >I then figured out today that the flow creation failure is introduced by a MLX5
> >PMD driver change in DPDK v17.11. It looks like a bug to me. And there are 2
> >solutions for that:
> >
> >- fix it in MLX5 PMD (if it's a bug); I was talking to the author made
> >  such change.
> >- set rss_conf to NULL, which will let DPDK to follow the one OVS-DPDK
> >  has set in the beginning.
> >
> >I chose 2, sicne option 1 won't change the fact that it won't work with DPDK
> >v17.11.
> >
> >And Finn, I probably need your help to verify that option 2 won't break Napa
> >PMD drivers.
> >
> >I will send v7 soon. Please help review.
> >
> >Thanks.
> >
> >--yliu
> Disclaimer: This email and any files transmitted with it may contain confidential information intended for the addressee(s) only. The information is not to be surrendered or copied to unauthorized persons. If you have received this communication in error, please notify the sender immediately and delete this e-mail from your system.
diff mbox series

Patch

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 55be632..2fdc8dd 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -77,6 +77,7 @@ 
 #include "tnl-ports.h"
 #include "unixctl.h"
 #include "util.h"
+#include "uuid.h"
 
 VLOG_DEFINE_THIS_MODULE(dpif_netdev);
 
@@ -442,7 +443,9 @@  struct dp_netdev_flow {
     /* Hash table index by unmasked flow. */
     const struct cmap_node node; /* In owning dp_netdev_pmd_thread's */
                                  /* 'flow_table'. */
+    const struct cmap_node mark_node; /* In owning flow_mark's mark_to_flow */
     const ovs_u128 ufid;         /* Unique flow identifier. */
+    const ovs_u128 mega_ufid;
     const unsigned pmd_id;       /* The 'core_id' of pmd thread owning this */
                                  /* flow. */
 
@@ -453,6 +456,7 @@  struct dp_netdev_flow {
     struct ovs_refcount ref_cnt;
 
     bool dead;
+    uint32_t mark;               /* Unique flow mark assiged to a flow */
 
     /* Statistics. */
     struct dp_netdev_flow_stats stats;
@@ -1832,6 +1836,172 @@  dp_netdev_pmd_find_dpcls(struct dp_netdev_pmd_thread *pmd,
     return cls;
 }
 
+#define MAX_FLOW_MARK       (UINT32_MAX - 1)
+#define INVALID_FLOW_MARK   (UINT32_MAX)
+
+struct megaflow_to_mark_data {
+    const struct cmap_node node;
+    ovs_u128 mega_ufid;
+    uint32_t mark;
+};
+
+struct flow_mark {
+    struct cmap megaflow_to_mark;
+    struct cmap mark_to_flow;
+    struct id_pool *pool;
+    struct ovs_mutex mutex;
+};
+
+struct flow_mark flow_mark = {
+    .megaflow_to_mark = CMAP_INITIALIZER,
+    .mark_to_flow = CMAP_INITIALIZER,
+    .mutex = OVS_MUTEX_INITIALIZER,
+};
+
+static uint32_t
+flow_mark_alloc(void)
+{
+    uint32_t mark;
+
+    if (!flow_mark.pool) {
+        /* Haven't initiated yet, do it here */
+        flow_mark.pool = id_pool_create(0, MAX_FLOW_MARK);
+    }
+
+    if (id_pool_alloc_id(flow_mark.pool, &mark)) {
+        return mark;
+    }
+
+    return INVALID_FLOW_MARK;
+}
+
+static void
+flow_mark_free(uint32_t mark)
+{
+    id_pool_free_id(flow_mark.pool, mark);
+}
+
+/* associate flow with a mark, which is a 1:1 mapping */
+static void
+megaflow_to_mark_associate(const ovs_u128 *mega_ufid, uint32_t mark)
+{
+    size_t hash = dp_netdev_flow_hash(mega_ufid);
+    struct megaflow_to_mark_data *data = xzalloc(sizeof(*data));
+
+    data->mega_ufid = *mega_ufid;
+    data->mark = mark;
+
+    cmap_insert(&flow_mark.megaflow_to_mark,
+                CONST_CAST(struct cmap_node *, &data->node), hash);
+}
+
+/* disassociate flow with a mark */
+static void
+megaflow_to_mark_disassociate(const ovs_u128 *mega_ufid)
+{
+    size_t hash = dp_netdev_flow_hash(mega_ufid);
+    struct megaflow_to_mark_data *data;
+
+    CMAP_FOR_EACH_WITH_HASH (data, node, hash, &flow_mark.megaflow_to_mark) {
+        if (ovs_u128_equals(*mega_ufid, data->mega_ufid)) {
+            cmap_remove(&flow_mark.megaflow_to_mark,
+                        CONST_CAST(struct cmap_node *, &data->node), hash);
+            free(data);
+            return;
+        }
+    }
+
+    VLOG_WARN("masked ufid "UUID_FMT" is not associated with a mark?\n",
+              UUID_ARGS((struct uuid *)mega_ufid));
+}
+
+static inline uint32_t
+megaflow_to_mark_find(const ovs_u128 *mega_ufid)
+{
+    size_t hash = dp_netdev_flow_hash(mega_ufid);
+    struct megaflow_to_mark_data *data;
+
+    CMAP_FOR_EACH_WITH_HASH (data, node, hash, &flow_mark.megaflow_to_mark) {
+        if (ovs_u128_equals(*mega_ufid, data->mega_ufid)) {
+            return data->mark;
+        }
+    }
+
+    return INVALID_FLOW_MARK;
+}
+
+/* associate mark with a flow, which is 1:N mapping */
+static void
+mark_to_flow_associate(const uint32_t mark, struct dp_netdev_flow *flow)
+{
+    dp_netdev_flow_ref(flow);
+
+    cmap_insert(&flow_mark.mark_to_flow,
+                CONST_CAST(struct cmap_node *, &flow->mark_node),
+                mark);
+    flow->mark = mark;
+
+    VLOG_DBG("associated dp_netdev flow %p with mark %u\n", flow, mark);
+}
+
+static bool
+flow_mark_has_no_ref(uint32_t mark)
+{
+    struct dp_netdev_flow *flow;
+
+    CMAP_FOR_EACH_WITH_HASH (flow, mark_node, mark,
+                             &flow_mark.mark_to_flow) {
+        if (flow->mark == mark) {
+            return false;
+        }
+    }
+
+    return true;
+}
+
+static int
+mark_to_flow_disassociate(struct dp_netdev_pmd_thread *pmd,
+                          struct dp_netdev_flow *flow)
+{
+    int ret = 0;
+    uint32_t mark = flow->mark;
+    struct cmap_node *mark_node = CONST_CAST(struct cmap_node *,
+                                             &flow->mark_node);
+
+    cmap_remove(&flow_mark.mark_to_flow, mark_node, mark);
+    flow->mark = INVALID_FLOW_MARK;
+
+    if (flow_mark_has_no_ref(mark)) {
+        struct dp_netdev_port *port;
+        odp_port_t in_port = flow->flow.in_port.odp_port;
+
+        port = dp_netdev_lookup_port(pmd->dp, in_port);
+        if (port) {
+            ret = netdev_flow_del(port->netdev, &flow->mega_ufid, NULL);
+        }
+
+        flow_mark_free(mark);
+        VLOG_DBG("freed flow mark %u\n", mark);
+
+        megaflow_to_mark_disassociate(&flow->mega_ufid);
+    }
+    dp_netdev_flow_unref(flow);
+
+    return ret;
+}
+
+static void
+flow_mark_flush(struct dp_netdev_pmd_thread *pmd)
+{
+    struct dp_netdev_flow *flow;
+
+    CMAP_FOR_EACH (flow, mark_node, &flow_mark.mark_to_flow) {
+        if (flow->pmd_id == pmd->core_id) {
+            mark_to_flow_disassociate(pmd, flow);
+        }
+    }
+}
+
 static void
 dp_netdev_pmd_remove_flow(struct dp_netdev_pmd_thread *pmd,
                           struct dp_netdev_flow *flow)
@@ -1845,6 +2015,9 @@  dp_netdev_pmd_remove_flow(struct dp_netdev_pmd_thread *pmd,
     ovs_assert(cls != NULL);
     dpcls_remove(cls, &flow->cr);
     cmap_remove(&pmd->flow_table, node, dp_netdev_flow_hash(&flow->ufid));
+    if (flow->mark != INVALID_FLOW_MARK) {
+        mark_to_flow_disassociate(pmd, flow);
+    }
     flow->dead = true;
 
     dp_netdev_flow_unref(flow);
@@ -2424,6 +2597,87 @@  out:
     return error;
 }
 
+static void
+try_netdev_flow_put(struct dp_netdev_pmd_thread *pmd, odp_port_t in_port,
+                    struct dp_netdev_flow *flow, struct match *match,
+                    const struct nlattr *actions, size_t actions_len)
+{
+    struct offload_info info;
+    struct dp_netdev_port *port;
+    bool modification = flow->mark != INVALID_FLOW_MARK;
+    const char *op = modification ? "modify" : "add";
+    uint32_t mark;
+    int ret;
+
+    port = dp_netdev_lookup_port(pmd->dp, in_port);
+    if (!port) {
+        return;
+    }
+
+    ovs_mutex_lock(&flow_mark.mutex);
+
+    if (modification) {
+        mark = flow->mark;
+    } else {
+        if (!netdev_is_flow_api_enabled()) {
+            goto out;
+        }
+
+        /*
+         * If a mega flow has already been offloaded (from other PMD
+         * instances), do not offload it again.
+         */
+        mark = megaflow_to_mark_find(&flow->mega_ufid);
+        if (mark != INVALID_FLOW_MARK) {
+            VLOG_DBG("flow has already been offloaded with mark %u\n", mark);
+            mark_to_flow_associate(mark, flow);
+            goto out;
+        }
+
+        mark = flow_mark_alloc();
+        if (mark == INVALID_FLOW_MARK) {
+            VLOG_ERR("failed to allocate flow mark!\n");
+            goto out;
+        }
+    }
+
+    info.flow_mark = mark;
+    ret = netdev_flow_put(port->netdev, match,
+                          CONST_CAST(struct nlattr *, actions),
+                          actions_len, &flow->mega_ufid, &info, NULL);
+    if (ret) {
+        VLOG_ERR("failed to %s netdev flow with mark %u\n", op, mark);
+        if (!modification) {
+            flow_mark_free(mark);
+        } else {
+            mark_to_flow_disassociate(pmd, flow);
+        }
+        goto out;
+    }
+
+    if (!modification) {
+        megaflow_to_mark_associate(&flow->mega_ufid, mark);
+        mark_to_flow_associate(mark, flow);
+    }
+    VLOG_DBG("succeed to %s netdev flow with mark %u\n", op, mark);
+
+out:
+    ovs_mutex_unlock(&flow_mark.mutex);
+}
+
+static void
+dp_netdev_get_mega_ufid(const struct match *match, ovs_u128 *mega_ufid)
+{
+    struct flow masked_flow;
+    size_t i;
+
+    for (i = 0; i < sizeof(struct flow); i++) {
+        ((uint8_t *)&masked_flow)[i] = ((uint8_t *)&match->flow)[i] &
+                                       ((uint8_t *)&match->wc)[i];
+    }
+    dpif_flow_hash(NULL, &masked_flow, sizeof(struct flow), mega_ufid);
+}
+
 static struct dp_netdev_flow *
 dp_netdev_flow_add(struct dp_netdev_pmd_thread *pmd,
                    struct match *match, const ovs_u128 *ufid,
@@ -2459,12 +2713,14 @@  dp_netdev_flow_add(struct dp_netdev_pmd_thread *pmd,
     memset(&flow->stats, 0, sizeof flow->stats);
     flow->dead = false;
     flow->batch = NULL;
+    flow->mark = INVALID_FLOW_MARK;
     *CONST_CAST(unsigned *, &flow->pmd_id) = pmd->core_id;
     *CONST_CAST(struct flow *, &flow->flow) = match->flow;
     *CONST_CAST(ovs_u128 *, &flow->ufid) = *ufid;
     ovs_refcount_init(&flow->ref_cnt);
     ovsrcu_set(&flow->actions, dp_netdev_actions_create(actions, actions_len));
 
+    dp_netdev_get_mega_ufid(match, CONST_CAST(ovs_u128 *, &flow->mega_ufid));
     netdev_flow_key_init_masked(&flow->cr.flow, &match->flow, &mask);
 
     /* Select dpcls for in_port. Relies on in_port to be exact match. */
@@ -2474,6 +2730,8 @@  dp_netdev_flow_add(struct dp_netdev_pmd_thread *pmd,
     cmap_insert(&pmd->flow_table, CONST_CAST(struct cmap_node *, &flow->node),
                 dp_netdev_flow_hash(&flow->ufid));
 
+    try_netdev_flow_put(pmd, in_port, flow, match, actions, actions_len);
+
     if (OVS_UNLIKELY(!VLOG_DROP_DBG((&upcall_rl)))) {
         struct ds ds = DS_EMPTY_INITIALIZER;
         struct ofpbuf key_buf, mask_buf;
@@ -2554,6 +2812,7 @@  flow_put_on_pmd(struct dp_netdev_pmd_thread *pmd,
         if (put->flags & DPIF_FP_MODIFY) {
             struct dp_netdev_actions *new_actions;
             struct dp_netdev_actions *old_actions;
+            odp_port_t in_port = netdev_flow->flow.in_port.odp_port;
 
             new_actions = dp_netdev_actions_create(put->actions,
                                                    put->actions_len);
@@ -2561,6 +2820,9 @@  flow_put_on_pmd(struct dp_netdev_pmd_thread *pmd,
             old_actions = dp_netdev_flow_get_actions(netdev_flow);
             ovsrcu_set(&netdev_flow->actions, new_actions);
 
+            try_netdev_flow_put(pmd, in_port, netdev_flow, match,
+                                put->actions, put->actions_len);
+
             if (stats) {
                 get_dpif_flow_stats(netdev_flow, stats);
             }
@@ -3570,6 +3832,7 @@  reload_affected_pmds(struct dp_netdev *dp)
 
     CMAP_FOR_EACH (pmd, node, &dp->poll_threads) {
         if (pmd->need_reload) {
+            flow_mark_flush(pmd);
             dp_netdev_reload_pmd__(pmd);
             pmd->need_reload = false;
         }
diff --git a/lib/netdev.h b/lib/netdev.h
index 3a545fe..0c1946a 100644
--- a/lib/netdev.h
+++ b/lib/netdev.h
@@ -188,6 +188,12 @@  void netdev_send_wait(struct netdev *, int qid);
 struct offload_info {
     const struct dpif_class *dpif_class;
     ovs_be16 tp_dst_port; /* Destination port for tunnel in SET action */
+
+    /*
+     * The flow mark id assigened to the flow. If any pkts hit the flow,
+     * it will be in the pkt meta data.
+     */
+    uint32_t flow_mark;
 };
 struct dpif_class;
 struct netdev_flow_dump;