diff mbox

[ovs-dev,v2,4/4] tunneling: Avoid datapath-recirc by combining recirc actions at xlate.

Message ID 1499161601-3230-5-git-send-email-sugesh.chandran@intel.com
State Changes Requested
Delegated to: Joe Stringer
Headers show

Commit Message

Chandran, Sugesh July 4, 2017, 9:46 a.m. UTC
This patch set removes the recirculation of encapsulated tunnel packets
if possible. It is done by computing the post tunnel actions at the time of
translation. The combined nested action set are programmed in the datapath
using CLONE action.

Signed-off-by: Sugesh Chandran <sugesh.chandran@intel.com>
Signed-off-by: Zoltán Balogh <zoltan.balogh@ericsson.com>
Co-authored-by: Zoltán Balogh <zoltan.balogh@ericsson.com>
---
 lib/dpif-netdev.c                  |  18 +--
 ofproto/ofproto-dpif-xlate-cache.c |  14 +-
 ofproto/ofproto-dpif-xlate-cache.h |  12 +-
 ofproto/ofproto-dpif-xlate.c       | 295 ++++++++++++++++++++++++++++++++++++-
 ofproto/ofproto-dpif-xlate.h       |   1 +
 ofproto/ofproto-dpif.c             |   3 +-
 tests/packet-type-aware.at         |  24 +--
 7 files changed, 324 insertions(+), 43 deletions(-)

Comments

Joe Stringer July 14, 2017, 11:46 p.m. UTC | #1
On 4 July 2017 at 02:46, Sugesh Chandran <sugesh.chandran@intel.com> wrote:
> This patch set removes the recirculation of encapsulated tunnel packets
> if possible. It is done by computing the post tunnel actions at the time of
> translation. The combined nested action set are programmed in the datapath
> using CLONE action.
>
> Signed-off-by: Sugesh Chandran <sugesh.chandran@intel.com>
> Signed-off-by: Zoltán Balogh <zoltan.balogh@ericsson.com>
> Co-authored-by: Zoltán Balogh <zoltan.balogh@ericsson.com>
> ---

Hi Sugesh, Zoltán,

Thanks for working on this, it's subtle code but it seems like you've
found a useful optimization here. Hopefully we can move this forward
soon.

Would you be able to put your performance numbers into the commit
message here? They could be useful to refer back to in the future.

This patch needs rebasing.

More feedback below.

>  lib/dpif-netdev.c                  |  18 +--
>  ofproto/ofproto-dpif-xlate-cache.c |  14 +-
>  ofproto/ofproto-dpif-xlate-cache.h |  12 +-
>  ofproto/ofproto-dpif-xlate.c       | 295 ++++++++++++++++++++++++++++++++++++-
>  ofproto/ofproto-dpif-xlate.h       |   1 +
>  ofproto/ofproto-dpif.c             |   3 +-
>  tests/packet-type-aware.at         |  24 +--
>  7 files changed, 324 insertions(+), 43 deletions(-)
>
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index 4e29085..4d996c1 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -5028,24 +5028,8 @@ dp_execute_cb(void *aux_, struct dp_packet_batch *packets_,
>
>      case OVS_ACTION_ATTR_TUNNEL_PUSH:
>          if (*depth < MAX_RECIRC_DEPTH) {
> -            struct dp_packet_batch tnl_pkt;
> -            struct dp_packet_batch *orig_packets_ = packets_;
> -            int err;
> -
> -            if (!may_steal) {
> -                dp_packet_batch_clone(&tnl_pkt, packets_);
> -                packets_ = &tnl_pkt;
> -                dp_packet_batch_reset_cutlen(orig_packets_);
> -            }
> -
>              dp_packet_batch_apply_cutlen(packets_);
> -
> -            err = push_tnl_action(pmd, a, packets_);
> -            if (!err) {
> -                (*depth)++;
> -                dp_netdev_recirculate(pmd, packets_);
> -                (*depth)--;
> -            }
> +            push_tnl_action(pmd, a, packets_);

If I follow, this was the previous datapath-level code to ensure that
when packets are output to a tunnel, only the copy of the packet that
goes to the tunnel gets the cutlen applied. With the new version of
the code, we replace this by making sure that the ofproto layer
generates a clone to wrap the push_tunnel and all subsequent bridge
translation, so then it results in the equivalent behaviour: The
cutlen is only ever applied to a clone of the packets. Is that right?

<snip>

> +/* Validate if the transalated combined actions are OK to proceed.
> + * If actions consist of TRUNC action, it is not allowed to do the
> + * tunnel_push combine as it cannot update stats correctly.
> + */
> +static bool
> +is_tunnel_actions_clone_ready(struct ofpbuf *actions)
> +{
> +    struct nlattr *tnl_actions;
> +    const struct nlattr *a;
> +    unsigned int left;
> +    size_t actions_len;
> +
> +    if (!actions) {
> +        /* No actions, no harm in doing combine. */
> +        return true;
> +    }
> +    actions_len = actions->size;
> +
> +    tnl_actions =(struct nlattr *)(actions->data);
> +    NL_ATTR_FOR_EACH_UNSAFE (a, left, tnl_actions, actions_len) {
> +        int type = nl_attr_type(a);
> +        if (type == OVS_ACTION_ATTR_TRUNC) {
> +            VLOG_DBG("Cannot do tunnel action-combine on trunc action");
> +            return false;
> +            break;
> +        }
> +    }
> +    return true;
> +}
> +
> +/*
> + * Copy the xc entry and update the references accordingly.
> + */
> +static void
> +copy_xc_entry(struct xc_entry *xc_dst, struct xc_entry *xc_src)

I think that this whole function should probably reside in
ofproto/ofproto-dpif-xlate-cache.c. However, I have an alternative
proposal detailed below that would not require us to also take
expensive refcounts due to this approach.

> +{
> +    memcpy(xc_dst, xc_src, sizeof *xc_dst);
> +    switch (xc_src->type) {
> +    case XC_RULE:
> +        ofproto_rule_ref(&xc_dst->rule->up);
> +        break;
> +    case XC_BOND:
> +        bond_ref(xc_dst->bond.bond);
> +        xc_dst->bond.flow = xmemdup(xc_src->bond.flow,
> +                            sizeof *xc_src->bond.flow);
> +        break;
> +    case XC_NETDEV:
> +        if (xc_src->dev.tx) {
> +            netdev_ref(xc_dst->dev.tx);
> +        }
> +        if (xc_src->dev.rx) {
> +            netdev_ref(xc_dst->dev.rx);
> +        }
> +        if (xc_src->dev.bfd) {
> +            bfd_ref(xc_dst->dev.bfd);
> +        }
> +        break;
> +    case XC_NETFLOW:
> +        netflow_ref(xc_dst->nf.netflow);
> +        xc_dst->nf.flow = xmemdup(xc_src->nf.flow, sizeof *xc_src->nf.flow);
> +        break;
> +    case XC_MIRROR:
> +        mbridge_ref(xc_dst->mirror.mbridge);
> +        break;
> +    case XC_LEARN:
> +    case XC_TABLE:
> +    case XC_NORMAL:
> +    case XC_FIN_TIMEOUT:
> +    case XC_GROUP:
> +    case XC_TNL_NEIGH:
> +    case XC_CONTROLLER:
> +    case XC_TUNNEL_HEADER:
> +    break;
> +    default:
> +        OVS_NOT_REACHED();
> +    }
> +}
> +
> +static bool
> +validate_and_combine_post_tnl_actions(struct xlate_ctx *ctx,
> +                                      const struct xport *xport,
> +                                      struct xport *out_dev,
> +                                      struct ovs_action_push_tnl tnl_push_data)
> +{
> +    const struct dpif_flow_stats *backup_resubmit_stats;
> +    struct xlate_cache *backup_xcache;
> +    bool nested_act_flag = false;
> +    struct flow_wildcards tmp_flow_wc;
> +    struct flow_wildcards *backup_flow_wc_ptr;
> +    bool backup_side_effects;
> +    const struct dp_packet *backup_pkt;
> +
> +    memset(&tmp_flow_wc, 0 , sizeof tmp_flow_wc);
> +    backup_flow_wc_ptr = ctx->wc;
> +    ctx->wc = &tmp_flow_wc;
> +    ctx->xin->wc = NULL;
> +    backup_resubmit_stats = ctx->xin->resubmit_stats;
> +    backup_xcache = ctx->xin->xcache;
> +    backup_side_effects = ctx->xin->allow_side_effects;
> +    backup_pkt = ctx->xin->packet;
> +
> +    size_t push_action_size = 0;
> +    size_t clone_ofs = nl_msg_start_nested(ctx->odp_actions,
> +                                           OVS_ACTION_ATTR_CLONE);
> +    odp_put_tnl_push_action(ctx->odp_actions, &tnl_push_data);
> +    push_action_size = ctx->odp_actions->size;
> +
> +    ctx->xin->resubmit_stats =  NULL;
> +    ctx->xin->xcache = xlate_cache_new(); /* Use new temporary cache. */
> +    ctx->xin->allow_side_effects = false;
> +    ctx->xin->packet = NULL;

I realised that you asked about this after the previous patch, but I
didn't understand the implications of it. I believe that this packet
is how, for instance, the second bridge can forward packets to the
controller. If you reset it to NULL here, then I don't think that the
controller action is executed correctly during the second bridge
translation. (That's a test we could add).

I suspect that some multicast snooping stuff and some other pieces
that would be affected by this.

> +
> +    if (ctx->xin->xcache) {

A few lines above, this is unconditionally set. Could be removed?

> +        /* Push the cache entry for the tunnel first. */
> +        struct xc_entry *entry;
> +        entry = xlate_cache_add_entry(ctx->xin->xcache, XC_TUNNEL_HEADER);
> +        entry->tunnel_hdr.hdr_size = tnl_push_data.header_len;
> +        entry->tunnel_hdr.operation = ADD;
> +    }
> +
> +    apply_nested_clone_actions(ctx, xport, out_dev);
> +    nested_act_flag = is_tunnel_actions_clone_ready(ctx->odp_actions);
> +
> +    if (nested_act_flag) {
> +        struct dpif_flow_stats tmp_resubmit_stats;
> +        struct xc_entry *entry;
> +        struct ofpbuf entries;
> +
> +        memset(&tmp_resubmit_stats, 0, sizeof tmp_resubmit_stats);

I think that tmp_resubmit_stats is only ever used in the if condition
below, which will initialize the entire structure anyway so the
declaration can be moved below, and this memset line can be removed.

> +         /* Similar to the stats update in revalidation, the x_cache entries
> +          * are populated by the previous translation are used to update the
> +          * stats correctly.
> +         .*/
> +        ovs_mutex_lock(&ctx->xin->resubmit_stats_mutex);

Which piece of shared memory is this mutex protecting?

> +        if (backup_resubmit_stats) {
> +            memcpy(&tmp_resubmit_stats, backup_resubmit_stats,
> +                   sizeof tmp_resubmit_stats);
> +            xlate_push_stats(ctx->xin->xcache, &tmp_resubmit_stats);
> +        }
> +        entries = ctx->xin->xcache->entries;
> +        XC_ENTRY_FOR_EACH (entry, &entries) {
> +            struct xc_entry *xc_copy_entry;
> +            if (!backup_xcache) {
> +                break;

Do we need to do this for every single XC entry?

> +            }
> +            xc_copy_entry = xlate_cache_add_entry(backup_xcache, entry->type);
> +            copy_xc_entry(xc_copy_entry, entry);

I wonder if instead of open-coding this stuff here, instead there was
a function defined in ofproto-dpif-xlate-cache.c which would append a
second xlate_cache onto a first. A function prototype like this,
perhaps?

/* Iterates through 'src' and removes the xc_entry elements from it,
and places them inside 'dst' */
void xlate_cache_steal_entries(struct xlate_cache *dst, struct
xlate_cache *src);

The current execution should own both xlate_caches so I think that it
should be safe to more or less just append the latter xlate_cache onto
the original one.

> +        }
> +        ovs_mutex_unlock(&ctx->xin->resubmit_stats_mutex);
> +    }
> +    else {

Please place the else on the same line as the close bracket.

> +        /* Combine is not valid. */
> +        nl_msg_cancel_nested(ctx->odp_actions, clone_ofs);
> +        goto out;
> +    }
> +    if (ctx->odp_actions->size > push_action_size) {
> +        /* Update the CLONE action only when combined. */
> +        nl_msg_end_nested(ctx->odp_actions, clone_ofs);
> +    }
> +    else {

Same style request.

> +        /* No actions after the tunnel, no need of clone. */
> +        nl_msg_cancel_nested(ctx->odp_actions, clone_ofs);
> +        odp_put_tnl_push_action(ctx->odp_actions, &tnl_push_data);

If there are no actions after the tunnel, then do we need the tunnel
header at all?

Do we have a test case which outputs to a tunnel which goes to another
bridge that drops the packet, then on the first bridge continues to
execute some more stuff to make sure this path works in the way we
expect?
Zoltan Balogh July 17, 2017, 2:37 p.m. UTC | #2
Hi Joe,

I used the setup below to measure Tx performance using a single core and 2 x 10G Ethernet links. I generated traffic of 64 byte packets to utilize the 10G bandwidth and measured the Tx processing cost per transmitted packet in nsec.

            +-------------+                  
      dpdk0 |             |                  
         -->o    br-in    |                  
            |             o--> gre0          
            +-------------+                  
                                             
                   --> LOCAL                 
            +-----------o-+                  
            |             | dpdk1            
            |    br-p1    o-->             
            |             |                  
            +-------------+  

This is the result of OVS master with DPDK 16.11.2:

 # dpdk0

 RX packets         : 7037641.60  / sec
 RX packet errors   : 0  / sec
 RX packets dropped : 7730632.90  / sec
 RX rate            : 402.69 MB/sec

 # dpdk1

 TX packets         : 7037641.60  / sec
 TX packet errors   : 0  / sec
 TX packets dropped : 0  / sec
 TX rate            : 657.73 MB/sec
 TX processing cost per TX packets in nsec : 142.09


This is the result of OVS master + patch with DPDK 16.11.2:

 # dpdk0

 RX packets         : 9386809.60  / sec
 RX packet errors   : 0  / sec
 RX packets dropped : 5381496.40  / sec
 RX rate            : 537.11 MB/sec

 # dpdk1

 TX packets         : 9386809.60  / sec
 TX packet errors   : 0  / sec
 TX packets dropped : 0  / sec
 TX rate            : 877.29 MB/sec
 TX processing cost per TX packets in nsec : 106.53

As you can see the number of transmitted packets per second increased from 7M to 9.3M. The gain is above 30%

Best regards,
Zoltan

> -----Original Message-----

> From: Joe Stringer [mailto:joe@ovn.org]

> Sent: Saturday, July 15, 2017 1:46 AM

> To: Sugesh Chandran <sugesh.chandran@intel.com>

> Cc: ovs dev <dev@openvswitch.org>; Andy Zhou <azhou@ovn.org>; Zoltán Balogh <zoltan.balogh@ericsson.com>

> Subject: Re: [PATCH v2 4/4] tunneling: Avoid datapath-recirc by combining recirc actions at xlate.

> 

> On 4 July 2017 at 02:46, Sugesh Chandran <sugesh.chandran@intel.com> wrote:

> > This patch set removes the recirculation of encapsulated tunnel packets

> > if possible. It is done by computing the post tunnel actions at the time of

> > translation. The combined nested action set are programmed in the datapath

> > using CLONE action.

> >

> > Signed-off-by: Sugesh Chandran <sugesh.chandran@intel.com>

> > Signed-off-by: Zoltán Balogh <zoltan.balogh@ericsson.com>

> > Co-authored-by: Zoltán Balogh <zoltan.balogh@ericsson.com>

> > ---

> 

> Hi Sugesh, Zoltán,

> 

> Thanks for working on this, it's subtle code but it seems like you've

> found a useful optimization here. Hopefully we can move this forward

> soon.

> 

> Would you be able to put your performance numbers into the commit

> message here? They could be useful to refer back to in the future.

> 

> This patch needs rebasing.

> 

> More feedback below.

> 

> >  lib/dpif-netdev.c                  |  18 +--

> >  ofproto/ofproto-dpif-xlate-cache.c |  14 +-

> >  ofproto/ofproto-dpif-xlate-cache.h |  12 +-

> >  ofproto/ofproto-dpif-xlate.c       | 295 ++++++++++++++++++++++++++++++++++++-

> >  ofproto/ofproto-dpif-xlate.h       |   1 +

> >  ofproto/ofproto-dpif.c             |   3 +-

> >  tests/packet-type-aware.at         |  24 +--

> >  7 files changed, 324 insertions(+), 43 deletions(-)

> >

> > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c

> > index 4e29085..4d996c1 100644

> > --- a/lib/dpif-netdev.c

> > +++ b/lib/dpif-netdev.c

> > @@ -5028,24 +5028,8 @@ dp_execute_cb(void *aux_, struct dp_packet_batch *packets_,

> >

> >      case OVS_ACTION_ATTR_TUNNEL_PUSH:

> >          if (*depth < MAX_RECIRC_DEPTH) {

> > -            struct dp_packet_batch tnl_pkt;

> > -            struct dp_packet_batch *orig_packets_ = packets_;

> > -            int err;

> > -

> > -            if (!may_steal) {

> > -                dp_packet_batch_clone(&tnl_pkt, packets_);

> > -                packets_ = &tnl_pkt;

> > -                dp_packet_batch_reset_cutlen(orig_packets_);

> > -            }

> > -

> >              dp_packet_batch_apply_cutlen(packets_);

> > -

> > -            err = push_tnl_action(pmd, a, packets_);

> > -            if (!err) {

> > -                (*depth)++;

> > -                dp_netdev_recirculate(pmd, packets_);

> > -                (*depth)--;

> > -            }

> > +            push_tnl_action(pmd, a, packets_);

> 

> If I follow, this was the previous datapath-level code to ensure that

> when packets are output to a tunnel, only the copy of the packet that

> goes to the tunnel gets the cutlen applied. With the new version of

> the code, we replace this by making sure that the ofproto layer

> generates a clone to wrap the push_tunnel and all subsequent bridge

> translation, so then it results in the equivalent behaviour: The

> cutlen is only ever applied to a clone of the packets. Is that right?

> 

> <snip>

> 

> > +/* Validate if the transalated combined actions are OK to proceed.

> > + * If actions consist of TRUNC action, it is not allowed to do the

> > + * tunnel_push combine as it cannot update stats correctly.

> > + */

> > +static bool

> > +is_tunnel_actions_clone_ready(struct ofpbuf *actions)

> > +{

> > +    struct nlattr *tnl_actions;

> > +    const struct nlattr *a;

> > +    unsigned int left;

> > +    size_t actions_len;

> > +

> > +    if (!actions) {

> > +        /* No actions, no harm in doing combine. */

> > +        return true;

> > +    }

> > +    actions_len = actions->size;

> > +

> > +    tnl_actions =(struct nlattr *)(actions->data);

> > +    NL_ATTR_FOR_EACH_UNSAFE (a, left, tnl_actions, actions_len) {

> > +        int type = nl_attr_type(a);

> > +        if (type == OVS_ACTION_ATTR_TRUNC) {

> > +            VLOG_DBG("Cannot do tunnel action-combine on trunc action");

> > +            return false;

> > +            break;

> > +        }

> > +    }

> > +    return true;

> > +}

> > +

> > +/*

> > + * Copy the xc entry and update the references accordingly.

> > + */

> > +static void

> > +copy_xc_entry(struct xc_entry *xc_dst, struct xc_entry *xc_src)

> 

> I think that this whole function should probably reside in

> ofproto/ofproto-dpif-xlate-cache.c. However, I have an alternative

> proposal detailed below that would not require us to also take

> expensive refcounts due to this approach.

> 

> > +{

> > +    memcpy(xc_dst, xc_src, sizeof *xc_dst);

> > +    switch (xc_src->type) {

> > +    case XC_RULE:

> > +        ofproto_rule_ref(&xc_dst->rule->up);

> > +        break;

> > +    case XC_BOND:

> > +        bond_ref(xc_dst->bond.bond);

> > +        xc_dst->bond.flow = xmemdup(xc_src->bond.flow,

> > +                            sizeof *xc_src->bond.flow);

> > +        break;

> > +    case XC_NETDEV:

> > +        if (xc_src->dev.tx) {

> > +            netdev_ref(xc_dst->dev.tx);

> > +        }

> > +        if (xc_src->dev.rx) {

> > +            netdev_ref(xc_dst->dev.rx);

> > +        }

> > +        if (xc_src->dev.bfd) {

> > +            bfd_ref(xc_dst->dev.bfd);

> > +        }

> > +        break;

> > +    case XC_NETFLOW:

> > +        netflow_ref(xc_dst->nf.netflow);

> > +        xc_dst->nf.flow = xmemdup(xc_src->nf.flow, sizeof *xc_src->nf.flow);

> > +        break;

> > +    case XC_MIRROR:

> > +        mbridge_ref(xc_dst->mirror.mbridge);

> > +        break;

> > +    case XC_LEARN:

> > +    case XC_TABLE:

> > +    case XC_NORMAL:

> > +    case XC_FIN_TIMEOUT:

> > +    case XC_GROUP:

> > +    case XC_TNL_NEIGH:

> > +    case XC_CONTROLLER:

> > +    case XC_TUNNEL_HEADER:

> > +    break;

> > +    default:

> > +        OVS_NOT_REACHED();

> > +    }

> > +}

> > +

> > +static bool

> > +validate_and_combine_post_tnl_actions(struct xlate_ctx *ctx,

> > +                                      const struct xport *xport,

> > +                                      struct xport *out_dev,

> > +                                      struct ovs_action_push_tnl tnl_push_data)

> > +{

> > +    const struct dpif_flow_stats *backup_resubmit_stats;

> > +    struct xlate_cache *backup_xcache;

> > +    bool nested_act_flag = false;

> > +    struct flow_wildcards tmp_flow_wc;

> > +    struct flow_wildcards *backup_flow_wc_ptr;

> > +    bool backup_side_effects;

> > +    const struct dp_packet *backup_pkt;

> > +

> > +    memset(&tmp_flow_wc, 0 , sizeof tmp_flow_wc);

> > +    backup_flow_wc_ptr = ctx->wc;

> > +    ctx->wc = &tmp_flow_wc;

> > +    ctx->xin->wc = NULL;

> > +    backup_resubmit_stats = ctx->xin->resubmit_stats;

> > +    backup_xcache = ctx->xin->xcache;

> > +    backup_side_effects = ctx->xin->allow_side_effects;

> > +    backup_pkt = ctx->xin->packet;

> > +

> > +    size_t push_action_size = 0;

> > +    size_t clone_ofs = nl_msg_start_nested(ctx->odp_actions,

> > +                                           OVS_ACTION_ATTR_CLONE);

> > +    odp_put_tnl_push_action(ctx->odp_actions, &tnl_push_data);

> > +    push_action_size = ctx->odp_actions->size;

> > +

> > +    ctx->xin->resubmit_stats =  NULL;

> > +    ctx->xin->xcache = xlate_cache_new(); /* Use new temporary cache. */

> > +    ctx->xin->allow_side_effects = false;

> > +    ctx->xin->packet = NULL;

> 

> I realised that you asked about this after the previous patch, but I

> didn't understand the implications of it. I believe that this packet

> is how, for instance, the second bridge can forward packets to the

> controller. If you reset it to NULL here, then I don't think that the

> controller action is executed correctly during the second bridge

> translation. (That's a test we could add).

> 

> I suspect that some multicast snooping stuff and some other pieces

> that would be affected by this.

> 

> > +

> > +    if (ctx->xin->xcache) {

> 

> A few lines above, this is unconditionally set. Could be removed?

> 

> > +        /* Push the cache entry for the tunnel first. */

> > +        struct xc_entry *entry;

> > +        entry = xlate_cache_add_entry(ctx->xin->xcache, XC_TUNNEL_HEADER);

> > +        entry->tunnel_hdr.hdr_size = tnl_push_data.header_len;

> > +        entry->tunnel_hdr.operation = ADD;

> > +    }

> > +

> > +    apply_nested_clone_actions(ctx, xport, out_dev);

> > +    nested_act_flag = is_tunnel_actions_clone_ready(ctx->odp_actions);

> > +

> > +    if (nested_act_flag) {

> > +        struct dpif_flow_stats tmp_resubmit_stats;

> > +        struct xc_entry *entry;

> > +        struct ofpbuf entries;

> > +

> > +        memset(&tmp_resubmit_stats, 0, sizeof tmp_resubmit_stats);

> 

> I think that tmp_resubmit_stats is only ever used in the if condition

> below, which will initialize the entire structure anyway so the

> declaration can be moved below, and this memset line can be removed.

> 

> > +         /* Similar to the stats update in revalidation, the x_cache entries

> > +          * are populated by the previous translation are used to update the

> > +          * stats correctly.

> > +         .*/

> > +        ovs_mutex_lock(&ctx->xin->resubmit_stats_mutex);

> 

> Which piece of shared memory is this mutex protecting?

> 

> > +        if (backup_resubmit_stats) {

> > +            memcpy(&tmp_resubmit_stats, backup_resubmit_stats,

> > +                   sizeof tmp_resubmit_stats);

> > +            xlate_push_stats(ctx->xin->xcache, &tmp_resubmit_stats);

> > +        }

> > +        entries = ctx->xin->xcache->entries;

> > +        XC_ENTRY_FOR_EACH (entry, &entries) {

> > +            struct xc_entry *xc_copy_entry;

> > +            if (!backup_xcache) {

> > +                break;

> 

> Do we need to do this for every single XC entry?

> 

> > +            }

> > +            xc_copy_entry = xlate_cache_add_entry(backup_xcache, entry->type);

> > +            copy_xc_entry(xc_copy_entry, entry);

> 

> I wonder if instead of open-coding this stuff here, instead there was

> a function defined in ofproto-dpif-xlate-cache.c which would append a

> second xlate_cache onto a first. A function prototype like this,

> perhaps?

> 

> /* Iterates through 'src' and removes the xc_entry elements from it,

> and places them inside 'dst' */

> void xlate_cache_steal_entries(struct xlate_cache *dst, struct

> xlate_cache *src);

> 

> The current execution should own both xlate_caches so I think that it

> should be safe to more or less just append the latter xlate_cache onto

> the original one.

> 

> > +        }

> > +        ovs_mutex_unlock(&ctx->xin->resubmit_stats_mutex);

> > +    }

> > +    else {

> 

> Please place the else on the same line as the close bracket.

> 

> > +        /* Combine is not valid. */

> > +        nl_msg_cancel_nested(ctx->odp_actions, clone_ofs);

> > +        goto out;

> > +    }

> > +    if (ctx->odp_actions->size > push_action_size) {

> > +        /* Update the CLONE action only when combined. */

> > +        nl_msg_end_nested(ctx->odp_actions, clone_ofs);

> > +    }

> > +    else {

> 

> Same style request.

> 

> > +        /* No actions after the tunnel, no need of clone. */

> > +        nl_msg_cancel_nested(ctx->odp_actions, clone_ofs);

> > +        odp_put_tnl_push_action(ctx->odp_actions, &tnl_push_data);

> 

> If there are no actions after the tunnel, then do we need the tunnel

> header at all?

> 

> Do we have a test case which outputs to a tunnel which goes to another

> bridge that drops the packet, then on the first bridge continues to

> execute some more stuff to make sure this path works in the way we

> expect?
Chandran, Sugesh July 17, 2017, 5:27 p.m. UTC | #3
Hi Joe,
Thank you for the comments., 
Please find my answers below

Regards
_Sugesh


> -----Original Message-----

> From: Zoltán Balogh [mailto:zoltan.balogh@ericsson.com]

> Sent: Monday, July 17, 2017 3:38 PM

> To: Joe Stringer <joe@ovn.org>; Chandran, Sugesh

> <sugesh.chandran@intel.com>

> Cc: ovs dev <dev@openvswitch.org>; Andy Zhou <azhou@ovn.org>

> Subject: RE: [PATCH v2 4/4] tunneling: Avoid datapath-recirc by combining

> recirc actions at xlate.

> 

> 

> Hi Joe,

> 

> I used the setup below to measure Tx performance using a single core and 2

> x 10G Ethernet links. I generated traffic of 64 byte packets to utilize the 10G

> bandwidth and measured the Tx processing cost per transmitted packet in

> nsec.

> 

>             +-------------+

>       dpdk0 |             |

>          -->o    br-in    |

>             |             o--> gre0

>             +-------------+

> 

>                    --> LOCAL

>             +-----------o-+

>             |             | dpdk1

>             |    br-p1    o-->

>             |             |

>             +-------------+

> 

> This is the result of OVS master with DPDK 16.11.2:

> 

>  # dpdk0

> 

>  RX packets         : 7037641.60  / sec

>  RX packet errors   : 0  / sec

>  RX packets dropped : 7730632.90  / sec

>  RX rate            : 402.69 MB/sec

> 

>  # dpdk1

> 

>  TX packets         : 7037641.60  / sec

>  TX packet errors   : 0  / sec

>  TX packets dropped : 0  / sec

>  TX rate            : 657.73 MB/sec

>  TX processing cost per TX packets in nsec : 142.09

> 

> 

> This is the result of OVS master + patch with DPDK 16.11.2:

> 

>  # dpdk0

> 

>  RX packets         : 9386809.60  / sec

>  RX packet errors   : 0  / sec

>  RX packets dropped : 5381496.40  / sec

>  RX rate            : 537.11 MB/sec

> 

>  # dpdk1

> 

>  TX packets         : 9386809.60  / sec

>  TX packet errors   : 0  / sec

>  TX packets dropped : 0  / sec

>  TX rate            : 877.29 MB/sec

>  TX processing cost per TX packets in nsec : 106.53

> 

> As you can see the number of transmitted packets per second increased

> from 7M to 9.3M. The gain is above 30%

> 

> Best regards,

> Zoltan

> 

> > -----Original Message-----

> > From: Joe Stringer [mailto:joe@ovn.org]

> > Sent: Saturday, July 15, 2017 1:46 AM

> > To: Sugesh Chandran <sugesh.chandran@intel.com>

> > Cc: ovs dev <dev@openvswitch.org>; Andy Zhou <azhou@ovn.org>; Zoltán

> > Balogh <zoltan.balogh@ericsson.com>

> > Subject: Re: [PATCH v2 4/4] tunneling: Avoid datapath-recirc by combining

> recirc actions at xlate.

> >

> > On 4 July 2017 at 02:46, Sugesh Chandran <sugesh.chandran@intel.com>

> wrote:

> > > This patch set removes the recirculation of encapsulated tunnel

> > > packets if possible. It is done by computing the post tunnel actions

> > > at the time of translation. The combined nested action set are

> > > programmed in the datapath using CLONE action.

> > >

> > > Signed-off-by: Sugesh Chandran <sugesh.chandran@intel.com>

> > > Signed-off-by: Zoltán Balogh <zoltan.balogh@ericsson.com>

> > > Co-authored-by: Zoltán Balogh <zoltan.balogh@ericsson.com>

> > > ---

> >

> > Hi Sugesh, Zoltán,

> >

> > Thanks for working on this, it's subtle code but it seems like you've

> > found a useful optimization here. Hopefully we can move this forward

> > soon.

> >

[Sugesh] Thank you Joe!. Will try to do the best to cover most of the corner cases.
> > Would you be able to put your performance numbers into the commit

> > message here? They could be useful to refer back to in the future.

[Sugesh] Sure, Zoltan already ran performance benchmarking and shared the results.
Will add that in the next version of patch.
> >

> > This patch needs rebasing.

[Sugesh] Will do that in the next version.
> >

> > More feedback below.

> >

> > >  lib/dpif-netdev.c                  |  18 +--

> > >  ofproto/ofproto-dpif-xlate-cache.c |  14 +-

> > > ofproto/ofproto-dpif-xlate-cache.h |  12 +-

> > >  ofproto/ofproto-dpif-xlate.c       | 295

> ++++++++++++++++++++++++++++++++++++-

> > >  ofproto/ofproto-dpif-xlate.h       |   1 +

> > >  ofproto/ofproto-dpif.c             |   3 +-

> > >  tests/packet-type-aware.at         |  24 +--

> > >  7 files changed, 324 insertions(+), 43 deletions(-)

> > >

> > > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index

> > > 4e29085..4d996c1 100644

> > > --- a/lib/dpif-netdev.c

> > > +++ b/lib/dpif-netdev.c

> > > @@ -5028,24 +5028,8 @@ dp_execute_cb(void *aux_, struct

> > > dp_packet_batch *packets_,

> > >

> > >      case OVS_ACTION_ATTR_TUNNEL_PUSH:

> > >          if (*depth < MAX_RECIRC_DEPTH) {

> > > -            struct dp_packet_batch tnl_pkt;

> > > -            struct dp_packet_batch *orig_packets_ = packets_;

> > > -            int err;

> > > -

> > > -            if (!may_steal) {

> > > -                dp_packet_batch_clone(&tnl_pkt, packets_);

> > > -                packets_ = &tnl_pkt;

> > > -                dp_packet_batch_reset_cutlen(orig_packets_);

> > > -            }

> > > -

> > >              dp_packet_batch_apply_cutlen(packets_);

> > > -

> > > -            err = push_tnl_action(pmd, a, packets_);

> > > -            if (!err) {

> > > -                (*depth)++;

> > > -                dp_netdev_recirculate(pmd, packets_);

> > > -                (*depth)--;

> > > -            }

> > > +            push_tnl_action(pmd, a, packets_);

> >

> > If I follow, this was the previous datapath-level code to ensure that

> > when packets are output to a tunnel, only the copy of the packet that

> > goes to the tunnel gets the cutlen applied. With the new version of

> > the code, we replace this by making sure that the ofproto layer

> > generates a clone to wrap the push_tunnel and all subsequent bridge

> > translation, so then it results in the equivalent behaviour: The

> > cutlen is only ever applied to a clone of the packets. Is that right?

[Sugesh] Hmm. That’s the good catch. Looks like the packets are not
cloned when combine_tunnel_action is found impossible in ofproto layer.

As a fix we should keep the code block in 'may_steal' to handle the non tunnel-combine
case.
Any other suggestion to apply cut_len in ofproto?

> >

> > <snip>

> >

> > > +/* Validate if the transalated combined actions are OK to proceed.

> > > + * If actions consist of TRUNC action, it is not allowed to do the

> > > + * tunnel_push combine as it cannot update stats correctly.

> > > + */

> > > +static bool

> > > +is_tunnel_actions_clone_ready(struct ofpbuf *actions) {

> > > +    struct nlattr *tnl_actions;

> > > +    const struct nlattr *a;

> > > +    unsigned int left;

> > > +    size_t actions_len;

> > > +

> > > +    if (!actions) {

> > > +        /* No actions, no harm in doing combine. */

> > > +        return true;

> > > +    }

> > > +    actions_len = actions->size;

> > > +

> > > +    tnl_actions =(struct nlattr *)(actions->data);

> > > +    NL_ATTR_FOR_EACH_UNSAFE (a, left, tnl_actions, actions_len) {

> > > +        int type = nl_attr_type(a);

> > > +        if (type == OVS_ACTION_ATTR_TRUNC) {

> > > +            VLOG_DBG("Cannot do tunnel action-combine on trunc action");

> > > +            return false;

> > > +            break;

> > > +        }

> > > +    }

> > > +    return true;

> > > +}

> > > +

> > > +/*

> > > + * Copy the xc entry and update the references accordingly.

> > > + */

> > > +static void

> > > +copy_xc_entry(struct xc_entry *xc_dst, struct xc_entry *xc_src)

> >

> > I think that this whole function should probably reside in

> > ofproto/ofproto-dpif-xlate-cache.c. However, I have an alternative

> > proposal detailed below that would not require us to also take

> > expensive refcounts due to this approach.

[Sugesh] Will introduce a new function in xlate-cache to handle it.
> >

> > > +{

> > > +    memcpy(xc_dst, xc_src, sizeof *xc_dst);

> > > +    switch (xc_src->type) {

> > > +    case XC_RULE:

> > > +        ofproto_rule_ref(&xc_dst->rule->up);

> > > +        break;

> > > +    case XC_BOND:

> > > +        bond_ref(xc_dst->bond.bond);

> > > +        xc_dst->bond.flow = xmemdup(xc_src->bond.flow,

> > > +                            sizeof *xc_src->bond.flow);

> > > +        break;

> > > +    case XC_NETDEV:

> > > +        if (xc_src->dev.tx) {

> > > +            netdev_ref(xc_dst->dev.tx);

> > > +        }

> > > +        if (xc_src->dev.rx) {

> > > +            netdev_ref(xc_dst->dev.rx);

> > > +        }

> > > +        if (xc_src->dev.bfd) {

> > > +            bfd_ref(xc_dst->dev.bfd);

> > > +        }

> > > +        break;

> > > +    case XC_NETFLOW:

> > > +        netflow_ref(xc_dst->nf.netflow);

> > > +        xc_dst->nf.flow = xmemdup(xc_src->nf.flow, sizeof *xc_src-

> >nf.flow);

> > > +        break;

> > > +    case XC_MIRROR:

> > > +        mbridge_ref(xc_dst->mirror.mbridge);

> > > +        break;

> > > +    case XC_LEARN:

> > > +    case XC_TABLE:

> > > +    case XC_NORMAL:

> > > +    case XC_FIN_TIMEOUT:

> > > +    case XC_GROUP:

> > > +    case XC_TNL_NEIGH:

> > > +    case XC_CONTROLLER:

> > > +    case XC_TUNNEL_HEADER:

> > > +    break;

> > > +    default:

> > > +        OVS_NOT_REACHED();

> > > +    }

> > > +}

> > > +

> > > +static bool

> > > +validate_and_combine_post_tnl_actions(struct xlate_ctx *ctx,

> > > +                                      const struct xport *xport,

> > > +                                      struct xport *out_dev,

> > > +                                      struct ovs_action_push_tnl

> > > +tnl_push_data) {

> > > +    const struct dpif_flow_stats *backup_resubmit_stats;

> > > +    struct xlate_cache *backup_xcache;

> > > +    bool nested_act_flag = false;

> > > +    struct flow_wildcards tmp_flow_wc;

> > > +    struct flow_wildcards *backup_flow_wc_ptr;

> > > +    bool backup_side_effects;

> > > +    const struct dp_packet *backup_pkt;

> > > +

> > > +    memset(&tmp_flow_wc, 0 , sizeof tmp_flow_wc);

> > > +    backup_flow_wc_ptr = ctx->wc;

> > > +    ctx->wc = &tmp_flow_wc;

> > > +    ctx->xin->wc = NULL;

> > > +    backup_resubmit_stats = ctx->xin->resubmit_stats;

> > > +    backup_xcache = ctx->xin->xcache;

> > > +    backup_side_effects = ctx->xin->allow_side_effects;

> > > +    backup_pkt = ctx->xin->packet;

> > > +

> > > +    size_t push_action_size = 0;

> > > +    size_t clone_ofs = nl_msg_start_nested(ctx->odp_actions,

> > > +                                           OVS_ACTION_ATTR_CLONE);

> > > +    odp_put_tnl_push_action(ctx->odp_actions, &tnl_push_data);

> > > +    push_action_size = ctx->odp_actions->size;

> > > +

> > > +    ctx->xin->resubmit_stats =  NULL;

> > > +    ctx->xin->xcache = xlate_cache_new(); /* Use new temporary cache.

> */

> > > +    ctx->xin->allow_side_effects = false;

> > > +    ctx->xin->packet = NULL;

> >

> > I realised that you asked about this after the previous patch, but I

> > didn't understand the implications of it. I believe that this packet

> > is how, for instance, the second bridge can forward packets to the

> > controller. If you reset it to NULL here, then I don't think that the

> > controller action is executed correctly during the second bridge

> > translation. (That's a test we could add).

[Sugesh] Yes, that’s a good test case for it.
> >

> > I suspect that some multicast snooping stuff and some other pieces

> > that would be affected by this.

[Sugesh] This is interesting. So does the controller expects a encaped
packets in that case? Do you think, the controller action
also be an exception case like the TRUNC?

Do we have a case where send the encap traffic to controller?
	
> >

> > > +

> > > +    if (ctx->xin->xcache) {

> >

> > A few lines above, this is unconditionally set. Could be removed?

[Sugesh] I think it is newly created and its possible to have a NULL incase
of malloc failure. That’s why added the case.
> >

> > > +        /* Push the cache entry for the tunnel first. */

> > > +        struct xc_entry *entry;

> > > +        entry = xlate_cache_add_entry(ctx->xin->xcache,

> XC_TUNNEL_HEADER);

> > > +        entry->tunnel_hdr.hdr_size = tnl_push_data.header_len;

> > > +        entry->tunnel_hdr.operation = ADD;

> > > +    }

> > > +

> > > +    apply_nested_clone_actions(ctx, xport, out_dev);

> > > +    nested_act_flag =

> > > + is_tunnel_actions_clone_ready(ctx->odp_actions);

> > > +

> > > +    if (nested_act_flag) {

> > > +        struct dpif_flow_stats tmp_resubmit_stats;

> > > +        struct xc_entry *entry;

> > > +        struct ofpbuf entries;

> > > +

> > > +        memset(&tmp_resubmit_stats, 0, sizeof tmp_resubmit_stats);

> >

> > I think that tmp_resubmit_stats is only ever used in the if condition

> > below, which will initialize the entire structure anyway so the

> > declaration can be moved below, and this memset line can be removed.

[Sugesh] Sure. Will move it down.
> >

> > > +         /* Similar to the stats update in revalidation, the x_cache entries

> > > +          * are populated by the previous translation are used to update the

> > > +          * stats correctly.

> > > +         .*/

> > > +        ovs_mutex_lock(&ctx->xin->resubmit_stats_mutex);

> >

> > Which piece of shared memory is this mutex protecting?

[Sugesh] Will remove it.
> >

> > > +        if (backup_resubmit_stats) {

> > > +            memcpy(&tmp_resubmit_stats, backup_resubmit_stats,

> > > +                   sizeof tmp_resubmit_stats);

> > > +            xlate_push_stats(ctx->xin->xcache, &tmp_resubmit_stats);

> > > +        }

> > > +        entries = ctx->xin->xcache->entries;

> > > +        XC_ENTRY_FOR_EACH (entry, &entries) {

> > > +            struct xc_entry *xc_copy_entry;

> > > +            if (!backup_xcache) {

> > > +                break;

> >

> > Do we need to do this for every single XC entry?

[Sugesh] Will remove in next release .
> >

> > > +            }

> > > +            xc_copy_entry = xlate_cache_add_entry(backup_xcache, entry-

> >type);

> > > +            copy_xc_entry(xc_copy_entry, entry);

> >

> > I wonder if instead of open-coding this stuff here, instead there was

> > a function defined in ofproto-dpif-xlate-cache.c which would append a

> > second xlate_cache onto a first. A function prototype like this,

> > perhaps?

> >

> > /* Iterates through 'src' and removes the xc_entry elements from it,

> > and places them inside 'dst' */ void xlate_cache_steal_entries(struct

> > xlate_cache *dst, struct xlate_cache *src);

[Sugesh] Make sense, will create a function to append cache entries and clear
off the src once the copying completed.
> >

> > The current execution should own both xlate_caches so I think that it

> > should be safe to more or less just append the latter xlate_cache onto

> > the original one.

> >

> > > +        }

> > > +        ovs_mutex_unlock(&ctx->xin->resubmit_stats_mutex);

> > > +    }

> > > +    else {

> >

> > Please place the else on the same line as the close bracket.

[Sugesh] ok
> >

> > > +        /* Combine is not valid. */

> > > +        nl_msg_cancel_nested(ctx->odp_actions, clone_ofs);

> > > +        goto out;

> > > +    }

> > > +    if (ctx->odp_actions->size > push_action_size) {

> > > +        /* Update the CLONE action only when combined. */

> > > +        nl_msg_end_nested(ctx->odp_actions, clone_ofs);

> > > +    }

> > > +    else {

> >

> > Same style request.

> >

> > > +        /* No actions after the tunnel, no need of clone. */

> > > +        nl_msg_cancel_nested(ctx->odp_actions, clone_ofs);

> > > +        odp_put_tnl_push_action(ctx->odp_actions, &tnl_push_data);

> >

> > If there are no actions after the tunnel, then do we need the tunnel

> > header at all?

[Sugesh] There are test cases that does it. I kept it for cover those.
> >

> > Do we have a test case which outputs to a tunnel which goes to another

> > bridge that drops the packet, then on the first bridge continues to

> > execute some more stuff to make sure this path works in the way we

> > expect?
Joe Stringer July 17, 2017, 5:43 p.m. UTC | #4
On 17 July 2017 at 07:37, Zoltán Balogh <zoltan.balogh@ericsson.com> wrote:
>
> Hi Joe,
>
> I used the setup below to measure Tx performance using a single core and 2 x 10G Ethernet links. I generated traffic of 64 byte packets to utilize the 10G bandwidth and measured the Tx processing cost per transmitted packet in nsec.
>
>             +-------------+
>       dpdk0 |             |
>          -->o    br-in    |
>             |             o--> gre0
>             +-------------+
>
>                    --> LOCAL
>             +-----------o-+
>             |             | dpdk1
>             |    br-p1    o-->
>             |             |
>             +-------------+
>
> This is the result of OVS master with DPDK 16.11.2:
>
>  # dpdk0
>
>  RX packets         : 7037641.60  / sec
>  RX packet errors   : 0  / sec
>  RX packets dropped : 7730632.90  / sec
>  RX rate            : 402.69 MB/sec
>
>  # dpdk1
>
>  TX packets         : 7037641.60  / sec
>  TX packet errors   : 0  / sec
>  TX packets dropped : 0  / sec
>  TX rate            : 657.73 MB/sec
>  TX processing cost per TX packets in nsec : 142.09
>
>
> This is the result of OVS master + patch with DPDK 16.11.2:
>
>  # dpdk0
>
>  RX packets         : 9386809.60  / sec
>  RX packet errors   : 0  / sec
>  RX packets dropped : 5381496.40  / sec
>  RX rate            : 537.11 MB/sec
>
>  # dpdk1
>
>  TX packets         : 9386809.60  / sec
>  TX packet errors   : 0  / sec
>  TX packets dropped : 0  / sec
>  TX rate            : 877.29 MB/sec
>  TX processing cost per TX packets in nsec : 106.53
>
> As you can see the number of transmitted packets per second increased from 7M to 9.3M. The gain is above 30%

Nice! Thanks for running these numbers.
Joe Stringer July 17, 2017, 6:32 p.m. UTC | #5
On 17 July 2017 at 10:27, Chandran, Sugesh <sugesh.chandran@intel.com> wrote:
>> > -----Original Message-----
>> > From: Joe Stringer [mailto:joe@ovn.org]
>> > Sent: Saturday, July 15, 2017 1:46 AM
>> > To: Sugesh Chandran <sugesh.chandran@intel.com>
>> > Cc: ovs dev <dev@openvswitch.org>; Andy Zhou <azhou@ovn.org>; Zoltán
>> > Balogh <zoltan.balogh@ericsson.com>
>> > Subject: Re: [PATCH v2 4/4] tunneling: Avoid datapath-recirc by combining
>> recirc actions at xlate.
>> >
>> > On 4 July 2017 at 02:46, Sugesh Chandran <sugesh.chandran@intel.com>
>> wrote:

<snip>

>> > > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index
>> > > 4e29085..4d996c1 100644
>> > > --- a/lib/dpif-netdev.c
>> > > +++ b/lib/dpif-netdev.c
>> > > @@ -5028,24 +5028,8 @@ dp_execute_cb(void *aux_, struct
>> > > dp_packet_batch *packets_,
>> > >
>> > >      case OVS_ACTION_ATTR_TUNNEL_PUSH:
>> > >          if (*depth < MAX_RECIRC_DEPTH) {
>> > > -            struct dp_packet_batch tnl_pkt;
>> > > -            struct dp_packet_batch *orig_packets_ = packets_;
>> > > -            int err;
>> > > -
>> > > -            if (!may_steal) {
>> > > -                dp_packet_batch_clone(&tnl_pkt, packets_);
>> > > -                packets_ = &tnl_pkt;
>> > > -                dp_packet_batch_reset_cutlen(orig_packets_);
>> > > -            }
>> > > -
>> > >              dp_packet_batch_apply_cutlen(packets_);
>> > > -
>> > > -            err = push_tnl_action(pmd, a, packets_);
>> > > -            if (!err) {
>> > > -                (*depth)++;
>> > > -                dp_netdev_recirculate(pmd, packets_);
>> > > -                (*depth)--;
>> > > -            }
>> > > +            push_tnl_action(pmd, a, packets_);
>> >
>> > If I follow, this was the previous datapath-level code to ensure that
>> > when packets are output to a tunnel, only the copy of the packet that
>> > goes to the tunnel gets the cutlen applied. With the new version of
>> > the code, we replace this by making sure that the ofproto layer
>> > generates a clone to wrap the push_tunnel and all subsequent bridge
>> > translation, so then it results in the equivalent behaviour: The
>> > cutlen is only ever applied to a clone of the packets. Is that right?
> [Sugesh] Hmm. That’s the good catch. Looks like the packets are not
> cloned when combine_tunnel_action is found impossible in ofproto layer.
>
> As a fix we should keep the code block in 'may_steal' to handle the non tunnel-combine
> case.

Oh, really? I thought that we always end up with a clone wrapping the
tunnel_push now, which means we shouldn't need this bit any more. My
comment was intended just to ask if I understood correctly to make
sure we don't miss anything.

> Any other suggestion to apply cut_len in ofproto?

Unfortunately the cut_len is set up through a previous truncate
action, and it is supposed to apply to the "next output action" (and
reset afterwards), so it is part of the datapath actions interface
definition; I don't think there's a way to push it up to ofproto.

>> >
>> > > +{
>> > > +    memcpy(xc_dst, xc_src, sizeof *xc_dst);
>> > > +    switch (xc_src->type) {
>> > > +    case XC_RULE:
>> > > +        ofproto_rule_ref(&xc_dst->rule->up);
>> > > +        break;
>> > > +    case XC_BOND:
>> > > +        bond_ref(xc_dst->bond.bond);
>> > > +        xc_dst->bond.flow = xmemdup(xc_src->bond.flow,
>> > > +                            sizeof *xc_src->bond.flow);
>> > > +        break;
>> > > +    case XC_NETDEV:
>> > > +        if (xc_src->dev.tx) {
>> > > +            netdev_ref(xc_dst->dev.tx);
>> > > +        }
>> > > +        if (xc_src->dev.rx) {
>> > > +            netdev_ref(xc_dst->dev.rx);
>> > > +        }
>> > > +        if (xc_src->dev.bfd) {
>> > > +            bfd_ref(xc_dst->dev.bfd);
>> > > +        }
>> > > +        break;
>> > > +    case XC_NETFLOW:
>> > > +        netflow_ref(xc_dst->nf.netflow);
>> > > +        xc_dst->nf.flow = xmemdup(xc_src->nf.flow, sizeof *xc_src-
>> >nf.flow);
>> > > +        break;
>> > > +    case XC_MIRROR:
>> > > +        mbridge_ref(xc_dst->mirror.mbridge);
>> > > +        break;
>> > > +    case XC_LEARN:
>> > > +    case XC_TABLE:
>> > > +    case XC_NORMAL:
>> > > +    case XC_FIN_TIMEOUT:
>> > > +    case XC_GROUP:
>> > > +    case XC_TNL_NEIGH:
>> > > +    case XC_CONTROLLER:
>> > > +    case XC_TUNNEL_HEADER:
>> > > +    break;
>> > > +    default:
>> > > +        OVS_NOT_REACHED();
>> > > +    }
>> > > +}
>> > > +
>> > > +static bool
>> > > +validate_and_combine_post_tnl_actions(struct xlate_ctx *ctx,
>> > > +                                      const struct xport *xport,
>> > > +                                      struct xport *out_dev,
>> > > +                                      struct ovs_action_push_tnl
>> > > +tnl_push_data) {
>> > > +    const struct dpif_flow_stats *backup_resubmit_stats;
>> > > +    struct xlate_cache *backup_xcache;
>> > > +    bool nested_act_flag = false;
>> > > +    struct flow_wildcards tmp_flow_wc;
>> > > +    struct flow_wildcards *backup_flow_wc_ptr;
>> > > +    bool backup_side_effects;
>> > > +    const struct dp_packet *backup_pkt;
>> > > +
>> > > +    memset(&tmp_flow_wc, 0 , sizeof tmp_flow_wc);
>> > > +    backup_flow_wc_ptr = ctx->wc;
>> > > +    ctx->wc = &tmp_flow_wc;
>> > > +    ctx->xin->wc = NULL;
>> > > +    backup_resubmit_stats = ctx->xin->resubmit_stats;
>> > > +    backup_xcache = ctx->xin->xcache;
>> > > +    backup_side_effects = ctx->xin->allow_side_effects;
>> > > +    backup_pkt = ctx->xin->packet;
>> > > +
>> > > +    size_t push_action_size = 0;
>> > > +    size_t clone_ofs = nl_msg_start_nested(ctx->odp_actions,
>> > > +                                           OVS_ACTION_ATTR_CLONE);
>> > > +    odp_put_tnl_push_action(ctx->odp_actions, &tnl_push_data);
>> > > +    push_action_size = ctx->odp_actions->size;
>> > > +
>> > > +    ctx->xin->resubmit_stats =  NULL;
>> > > +    ctx->xin->xcache = xlate_cache_new(); /* Use new temporary cache.
>> */
>> > > +    ctx->xin->allow_side_effects = false;
>> > > +    ctx->xin->packet = NULL;
>> >
>> > I realised that you asked about this after the previous patch, but I
>> > didn't understand the implications of it. I believe that this packet
>> > is how, for instance, the second bridge can forward packets to the
>> > controller. If you reset it to NULL here, then I don't think that the
>> > controller action is executed correctly during the second bridge
>> > translation. (That's a test we could add).
> [Sugesh] Yes, that’s a good test case for it.
>> >
>> > I suspect that some multicast snooping stuff and some other pieces
>> > that would be affected by this.
> [Sugesh] This is interesting. So does the controller expects a encaped
> packets in that case? Do you think, the controller action
> also be an exception case like the TRUNC?

Yes, I think the controller expects encapped packets in this case.
Yes, I think that the easiest way to handle this is to put controller
in that exception case with truncate.

> Do we have a case where send the encap traffic to controller?

Logically, the second bridge should only see the encapped traffic so
if there's any way to send to controller, or sample for things like
IPFIX and so on, then those external entities should observe encap
traffic. I can't think of many cases on transmit - debugging exactly
what is happening to packets would be the main one.

>> >
>> > > +
>> > > +    if (ctx->xin->xcache) {
>> >
>> > A few lines above, this is unconditionally set. Could be removed?
> [Sugesh] I think it is newly created and its possible to have a NULL incase
> of malloc failure. That’s why added the case.

OK. For what it's worth, OVS xmalloc() guarantees a valid object so
this should be unnecessary.
Chandran, Sugesh July 17, 2017, 9:44 p.m. UTC | #6
Hi Joe,
Thank you for the comments., 
Please find my answers below

Regards
_Sugesh


> -----Original Message-----

> From: Zoltán Balogh [mailto:zoltan.balogh@ericsson.com]

> Sent: Monday, July 17, 2017 3:38 PM

> To: Joe Stringer <joe@ovn.org>; Chandran, Sugesh

> <sugesh.chandran@intel.com>

> Cc: ovs dev <dev@openvswitch.org>; Andy Zhou <azhou@ovn.org>

> Subject: RE: [PATCH v2 4/4] tunneling: Avoid datapath-recirc by combining

> recirc actions at xlate.

> 

> 

> Hi Joe,

> 

> I used the setup below to measure Tx performance using a single core and 2

> x 10G Ethernet links. I generated traffic of 64 byte packets to utilize the 10G

> bandwidth and measured the Tx processing cost per transmitted packet in

> nsec.

> 

>             +-------------+

>       dpdk0 |             |

>          -->o    br-in    |

>             |             o--> gre0

>             +-------------+

> 

>                    --> LOCAL

>             +-----------o-+

>             |             | dpdk1

>             |    br-p1    o-->

>             |             |

>             +-------------+

> 

> This is the result of OVS master with DPDK 16.11.2:

> 

>  # dpdk0

> 

>  RX packets         : 7037641.60  / sec

>  RX packet errors   : 0  / sec

>  RX packets dropped : 7730632.90  / sec

>  RX rate            : 402.69 MB/sec

> 

>  # dpdk1

> 

>  TX packets         : 7037641.60  / sec

>  TX packet errors   : 0  / sec

>  TX packets dropped : 0  / sec

>  TX rate            : 657.73 MB/sec

>  TX processing cost per TX packets in nsec : 142.09

> 

> 

> This is the result of OVS master + patch with DPDK 16.11.2:

> 

>  # dpdk0

> 

>  RX packets         : 9386809.60  / sec

>  RX packet errors   : 0  / sec

>  RX packets dropped : 5381496.40  / sec

>  RX rate            : 537.11 MB/sec

> 

>  # dpdk1

> 

>  TX packets         : 9386809.60  / sec

>  TX packet errors   : 0  / sec

>  TX packets dropped : 0  / sec

>  TX rate            : 877.29 MB/sec

>  TX processing cost per TX packets in nsec : 106.53

> 

> As you can see the number of transmitted packets per second increased

> from 7M to 9.3M. The gain is above 30%

> 

> Best regards,

> Zoltan

> 

> > -----Original Message-----

> > From: Joe Stringer [mailto:joe@ovn.org]

> > Sent: Saturday, July 15, 2017 1:46 AM

> > To: Sugesh Chandran <sugesh.chandran@intel.com>

> > Cc: ovs dev <dev@openvswitch.org>; Andy Zhou <azhou@ovn.org>; Zoltán

> > Balogh <zoltan.balogh@ericsson.com>

> > Subject: Re: [PATCH v2 4/4] tunneling: Avoid datapath-recirc by combining

> recirc actions at xlate.

> >

> > On 4 July 2017 at 02:46, Sugesh Chandran <sugesh.chandran@intel.com>

> wrote:

> > > This patch set removes the recirculation of encapsulated tunnel

> > > packets if possible. It is done by computing the post tunnel actions

> > > at the time of translation. The combined nested action set are

> > > programmed in the datapath using CLONE action.

> > >

> > > Signed-off-by: Sugesh Chandran <sugesh.chandran@intel.com>

> > > Signed-off-by: Zoltán Balogh <zoltan.balogh@ericsson.com>

> > > Co-authored-by: Zoltán Balogh <zoltan.balogh@ericsson.com>

> > > ---

> >

> > Hi Sugesh, Zoltán,

> >

> > Thanks for working on this, it's subtle code but it seems like you've

> > found a useful optimization here. Hopefully we can move this forward

> > soon.

> >

[Sugesh] Thank you Joe!. Will try to do the best to cover most of the corner cases.
> > Would you be able to put your performance numbers into the commit

> > message here? They could be useful to refer back to in the future.

[Sugesh] Sure, Zoltan already ran performance benchmarking and shared the results.
Will add that in the next version of patch.
> >

> > This patch needs rebasing.

[Sugesh] Will do that in the next version.
> >

> > More feedback below.

> >

> > >  lib/dpif-netdev.c                  |  18 +--

> > >  ofproto/ofproto-dpif-xlate-cache.c |  14 +-

> > > ofproto/ofproto-dpif-xlate-cache.h |  12 +-

> > >  ofproto/ofproto-dpif-xlate.c       | 295

> ++++++++++++++++++++++++++++++++++++-

> > >  ofproto/ofproto-dpif-xlate.h       |   1 +

> > >  ofproto/ofproto-dpif.c             |   3 +-

> > >  tests/packet-type-aware.at         |  24 +--

> > >  7 files changed, 324 insertions(+), 43 deletions(-)

> > >

> > > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index

> > > 4e29085..4d996c1 100644

> > > --- a/lib/dpif-netdev.c

> > > +++ b/lib/dpif-netdev.c

> > > @@ -5028,24 +5028,8 @@ dp_execute_cb(void *aux_, struct

> > > dp_packet_batch *packets_,

> > >

> > >      case OVS_ACTION_ATTR_TUNNEL_PUSH:

> > >          if (*depth < MAX_RECIRC_DEPTH) {

> > > -            struct dp_packet_batch tnl_pkt;

> > > -            struct dp_packet_batch *orig_packets_ = packets_;

> > > -            int err;

> > > -

> > > -            if (!may_steal) {

> > > -                dp_packet_batch_clone(&tnl_pkt, packets_);

> > > -                packets_ = &tnl_pkt;

> > > -                dp_packet_batch_reset_cutlen(orig_packets_);

> > > -            }

> > > -

> > >              dp_packet_batch_apply_cutlen(packets_);

> > > -

> > > -            err = push_tnl_action(pmd, a, packets_);

> > > -            if (!err) {

> > > -                (*depth)++;

> > > -                dp_netdev_recirculate(pmd, packets_);

> > > -                (*depth)--;

> > > -            }

> > > +            push_tnl_action(pmd, a, packets_);

> >

> > If I follow, this was the previous datapath-level code to ensure that

> > when packets are output to a tunnel, only the copy of the packet that

> > goes to the tunnel gets the cutlen applied. With the new version of

> > the code, we replace this by making sure that the ofproto layer

> > generates a clone to wrap the push_tunnel and all subsequent bridge

> > translation, so then it results in the equivalent behaviour: The

> > cutlen is only ever applied to a clone of the packets. Is that right?

[Sugesh] Hmm. That’s the good catch. Looks like the packets are not
cloned when combine_tunnel_action is found impossible in ofproto layer.

As a fix we should keep the code block in 'may_steal' to handle the non tunnel-combine
case.
Any other suggestion to apply cut_len in ofproto?

> >

> > <snip>

> >

> > > +/* Validate if the transalated combined actions are OK to proceed.

> > > + * If actions consist of TRUNC action, it is not allowed to do the

> > > + * tunnel_push combine as it cannot update stats correctly.

> > > + */

> > > +static bool

> > > +is_tunnel_actions_clone_ready(struct ofpbuf *actions) {

> > > +    struct nlattr *tnl_actions;

> > > +    const struct nlattr *a;

> > > +    unsigned int left;

> > > +    size_t actions_len;

> > > +

> > > +    if (!actions) {

> > > +        /* No actions, no harm in doing combine. */

> > > +        return true;

> > > +    }

> > > +    actions_len = actions->size;

> > > +

> > > +    tnl_actions =(struct nlattr *)(actions->data);

> > > +    NL_ATTR_FOR_EACH_UNSAFE (a, left, tnl_actions, actions_len) {

> > > +        int type = nl_attr_type(a);

> > > +        if (type == OVS_ACTION_ATTR_TRUNC) {

> > > +            VLOG_DBG("Cannot do tunnel action-combine on trunc action");

> > > +            return false;

> > > +            break;

> > > +        }

> > > +    }

> > > +    return true;

> > > +}

> > > +

> > > +/*

> > > + * Copy the xc entry and update the references accordingly.

> > > + */

> > > +static void

> > > +copy_xc_entry(struct xc_entry *xc_dst, struct xc_entry *xc_src)

> >

> > I think that this whole function should probably reside in

> > ofproto/ofproto-dpif-xlate-cache.c. However, I have an alternative

> > proposal detailed below that would not require us to also take

> > expensive refcounts due to this approach.

[Sugesh] Will introduce a new function in xlate-cache to handle it.
> >

> > > +{

> > > +    memcpy(xc_dst, xc_src, sizeof *xc_dst);

> > > +    switch (xc_src->type) {

> > > +    case XC_RULE:

> > > +        ofproto_rule_ref(&xc_dst->rule->up);

> > > +        break;

> > > +    case XC_BOND:

> > > +        bond_ref(xc_dst->bond.bond);

> > > +        xc_dst->bond.flow = xmemdup(xc_src->bond.flow,

> > > +                            sizeof *xc_src->bond.flow);

> > > +        break;

> > > +    case XC_NETDEV:

> > > +        if (xc_src->dev.tx) {

> > > +            netdev_ref(xc_dst->dev.tx);

> > > +        }

> > > +        if (xc_src->dev.rx) {

> > > +            netdev_ref(xc_dst->dev.rx);

> > > +        }

> > > +        if (xc_src->dev.bfd) {

> > > +            bfd_ref(xc_dst->dev.bfd);

> > > +        }

> > > +        break;

> > > +    case XC_NETFLOW:

> > > +        netflow_ref(xc_dst->nf.netflow);

> > > +        xc_dst->nf.flow = xmemdup(xc_src->nf.flow, sizeof *xc_src-

> >nf.flow);

> > > +        break;

> > > +    case XC_MIRROR:

> > > +        mbridge_ref(xc_dst->mirror.mbridge);

> > > +        break;

> > > +    case XC_LEARN:

> > > +    case XC_TABLE:

> > > +    case XC_NORMAL:

> > > +    case XC_FIN_TIMEOUT:

> > > +    case XC_GROUP:

> > > +    case XC_TNL_NEIGH:

> > > +    case XC_CONTROLLER:

> > > +    case XC_TUNNEL_HEADER:

> > > +    break;

> > > +    default:

> > > +        OVS_NOT_REACHED();

> > > +    }

> > > +}

> > > +

> > > +static bool

> > > +validate_and_combine_post_tnl_actions(struct xlate_ctx *ctx,

> > > +                                      const struct xport *xport,

> > > +                                      struct xport *out_dev,

> > > +                                      struct ovs_action_push_tnl

> > > +tnl_push_data) {

> > > +    const struct dpif_flow_stats *backup_resubmit_stats;

> > > +    struct xlate_cache *backup_xcache;

> > > +    bool nested_act_flag = false;

> > > +    struct flow_wildcards tmp_flow_wc;

> > > +    struct flow_wildcards *backup_flow_wc_ptr;

> > > +    bool backup_side_effects;

> > > +    const struct dp_packet *backup_pkt;

> > > +

> > > +    memset(&tmp_flow_wc, 0 , sizeof tmp_flow_wc);

> > > +    backup_flow_wc_ptr = ctx->wc;

> > > +    ctx->wc = &tmp_flow_wc;

> > > +    ctx->xin->wc = NULL;

> > > +    backup_resubmit_stats = ctx->xin->resubmit_stats;

> > > +    backup_xcache = ctx->xin->xcache;

> > > +    backup_side_effects = ctx->xin->allow_side_effects;

> > > +    backup_pkt = ctx->xin->packet;

> > > +

> > > +    size_t push_action_size = 0;

> > > +    size_t clone_ofs = nl_msg_start_nested(ctx->odp_actions,

> > > +                                           OVS_ACTION_ATTR_CLONE);

> > > +    odp_put_tnl_push_action(ctx->odp_actions, &tnl_push_data);

> > > +    push_action_size = ctx->odp_actions->size;

> > > +

> > > +    ctx->xin->resubmit_stats =  NULL;

> > > +    ctx->xin->xcache = xlate_cache_new(); /* Use new temporary cache.

> */

> > > +    ctx->xin->allow_side_effects = false;

> > > +    ctx->xin->packet = NULL;

> >

> > I realised that you asked about this after the previous patch, but I

> > didn't understand the implications of it. I believe that this packet

> > is how, for instance, the second bridge can forward packets to the

> > controller. If you reset it to NULL here, then I don't think that the

> > controller action is executed correctly during the second bridge

> > translation. (That's a test we could add).

[Sugesh] Yes, that’s a good test case for it.
> >

> > I suspect that some multicast snooping stuff and some other pieces

> > that would be affected by this.

[Sugesh] This is interesting. So does the controller expects a encaped
packets in that case? Do you think, the controller action
also be an exception case like the TRUNC?

Do we have a case where send the encap traffic to controller?
	
> >

> > > +

> > > +    if (ctx->xin->xcache) {

> >

> > A few lines above, this is unconditionally set. Could be removed?

[Sugesh] I think it is newly created and its possible to have a NULL incase
of malloc failure. That’s why added the case.
> >

> > > +        /* Push the cache entry for the tunnel first. */

> > > +        struct xc_entry *entry;

> > > +        entry = xlate_cache_add_entry(ctx->xin->xcache,

> XC_TUNNEL_HEADER);

> > > +        entry->tunnel_hdr.hdr_size = tnl_push_data.header_len;

> > > +        entry->tunnel_hdr.operation = ADD;

> > > +    }

> > > +

> > > +    apply_nested_clone_actions(ctx, xport, out_dev);

> > > +    nested_act_flag =

> > > + is_tunnel_actions_clone_ready(ctx->odp_actions);

> > > +

> > > +    if (nested_act_flag) {

> > > +        struct dpif_flow_stats tmp_resubmit_stats;

> > > +        struct xc_entry *entry;

> > > +        struct ofpbuf entries;

> > > +

> > > +        memset(&tmp_resubmit_stats, 0, sizeof tmp_resubmit_stats);

> >

> > I think that tmp_resubmit_stats is only ever used in the if condition

> > below, which will initialize the entire structure anyway so the

> > declaration can be moved below, and this memset line can be removed.

[Sugesh] Sure. Will move it down.
> >

> > > +         /* Similar to the stats update in revalidation, the x_cache entries

> > > +          * are populated by the previous translation are used to update the

> > > +          * stats correctly.

> > > +         .*/

> > > +        ovs_mutex_lock(&ctx->xin->resubmit_stats_mutex);

> >

> > Which piece of shared memory is this mutex protecting?

[Sugesh] Will remove it.
> >

> > > +        if (backup_resubmit_stats) {

> > > +            memcpy(&tmp_resubmit_stats, backup_resubmit_stats,

> > > +                   sizeof tmp_resubmit_stats);

> > > +            xlate_push_stats(ctx->xin->xcache, &tmp_resubmit_stats);

> > > +        }

> > > +        entries = ctx->xin->xcache->entries;

> > > +        XC_ENTRY_FOR_EACH (entry, &entries) {

> > > +            struct xc_entry *xc_copy_entry;

> > > +            if (!backup_xcache) {

> > > +                break;

> >

> > Do we need to do this for every single XC entry?

[Sugesh] Will remove in next release .
> >

> > > +            }

> > > +            xc_copy_entry = xlate_cache_add_entry(backup_xcache, entry-

> >type);

> > > +            copy_xc_entry(xc_copy_entry, entry);

> >

> > I wonder if instead of open-coding this stuff here, instead there was

> > a function defined in ofproto-dpif-xlate-cache.c which would append a

> > second xlate_cache onto a first. A function prototype like this,

> > perhaps?

> >

> > /* Iterates through 'src' and removes the xc_entry elements from it,

> > and places them inside 'dst' */ void xlate_cache_steal_entries(struct

> > xlate_cache *dst, struct xlate_cache *src);

[Sugesh] Make sense, will create a function to append cache entries and clear
off the src once the copying completed.
> >

> > The current execution should own both xlate_caches so I think that it

> > should be safe to more or less just append the latter xlate_cache onto

> > the original one.

> >

> > > +        }

> > > +        ovs_mutex_unlock(&ctx->xin->resubmit_stats_mutex);

> > > +    }

> > > +    else {

> >

> > Please place the else on the same line as the close bracket.

[Sugesh] ok
> >

> > > +        /* Combine is not valid. */

> > > +        nl_msg_cancel_nested(ctx->odp_actions, clone_ofs);

> > > +        goto out;

> > > +    }

> > > +    if (ctx->odp_actions->size > push_action_size) {

> > > +        /* Update the CLONE action only when combined. */

> > > +        nl_msg_end_nested(ctx->odp_actions, clone_ofs);

> > > +    }

> > > +    else {

> >

> > Same style request.

> >

> > > +        /* No actions after the tunnel, no need of clone. */

> > > +        nl_msg_cancel_nested(ctx->odp_actions, clone_ofs);

> > > +        odp_put_tnl_push_action(ctx->odp_actions, &tnl_push_data);

> >

> > If there are no actions after the tunnel, then do we need the tunnel

> > header at all?

[Sugesh] There are test cases that does it. I kept it for cover those.
> >

> > Do we have a test case which outputs to a tunnel which goes to another

> > bridge that drops the packet, then on the first bridge continues to

> > execute some more stuff to make sure this path works in the way we

> > expect?
Chandran, Sugesh July 17, 2017, 10:17 p.m. UTC | #7
Comments inline

Regards
_Sugesh

> -----Original Message-----

> From: Joe Stringer [mailto:joe@ovn.org]

> Sent: Monday, July 17, 2017 7:33 PM

> To: Chandran, Sugesh <sugesh.chandran@intel.com>

> Cc: Zoltán Balogh <zoltan.balogh@ericsson.com>; ovs dev

> <dev@openvswitch.org>; Andy Zhou <azhou@ovn.org>

> Subject: Re: [PATCH v2 4/4] tunneling: Avoid datapath-recirc by combining

> recirc actions at xlate.

> 

> On 17 July 2017 at 10:27, Chandran, Sugesh <sugesh.chandran@intel.com>

> wrote:

> >> > -----Original Message-----

> >> > From: Joe Stringer [mailto:joe@ovn.org]

> >> > Sent: Saturday, July 15, 2017 1:46 AM

> >> > To: Sugesh Chandran <sugesh.chandran@intel.com>

> >> > Cc: ovs dev <dev@openvswitch.org>; Andy Zhou <azhou@ovn.org>;

> >> > Zoltán Balogh <zoltan.balogh@ericsson.com>

> >> > Subject: Re: [PATCH v2 4/4] tunneling: Avoid datapath-recirc by

> >> > combining

> >> recirc actions at xlate.

> >> >

> >> > On 4 July 2017 at 02:46, Sugesh Chandran

> >> > <sugesh.chandran@intel.com>

> >> wrote:

> 

> <snip>

> 

> >> > > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index

> >> > > 4e29085..4d996c1 100644

> >> > > --- a/lib/dpif-netdev.c

> >> > > +++ b/lib/dpif-netdev.c

> >> > > @@ -5028,24 +5028,8 @@ dp_execute_cb(void *aux_, struct

> >> > > dp_packet_batch *packets_,

> >> > >

> >> > >      case OVS_ACTION_ATTR_TUNNEL_PUSH:

> >> > >          if (*depth < MAX_RECIRC_DEPTH) {

> >> > > -            struct dp_packet_batch tnl_pkt;

> >> > > -            struct dp_packet_batch *orig_packets_ = packets_;

> >> > > -            int err;

> >> > > -

> >> > > -            if (!may_steal) {

> >> > > -                dp_packet_batch_clone(&tnl_pkt, packets_);

> >> > > -                packets_ = &tnl_pkt;

> >> > > -                dp_packet_batch_reset_cutlen(orig_packets_);

> >> > > -            }

> >> > > -

> >> > >              dp_packet_batch_apply_cutlen(packets_);

> >> > > -

> >> > > -            err = push_tnl_action(pmd, a, packets_);

> >> > > -            if (!err) {

> >> > > -                (*depth)++;

> >> > > -                dp_netdev_recirculate(pmd, packets_);

> >> > > -                (*depth)--;

> >> > > -            }

> >> > > +            push_tnl_action(pmd, a, packets_);

> >> >

> >> > If I follow, this was the previous datapath-level code to ensure

> >> > that when packets are output to a tunnel, only the copy of the

> >> > packet that goes to the tunnel gets the cutlen applied. With the

> >> > new version of the code, we replace this by making sure that the

> >> > ofproto layer generates a clone to wrap the push_tunnel and all

> >> > subsequent bridge translation, so then it results in the equivalent

> >> > behaviour: The cutlen is only ever applied to a clone of the packets. Is

> that right?

> > [Sugesh] Hmm. That’s the good catch. Looks like the packets are not

> > cloned when combine_tunnel_action is found impossible in ofproto layer.

> >

> > As a fix we should keep the code block in 'may_steal' to handle the

> > non tunnel-combine case.

> 

> Oh, really? I thought that we always end up with a clone wrapping the

> tunnel_push now, which means we shouldn't need this bit any more. My

> comment was intended just to ask if I understood correctly to make sure we

> don't miss anything.

> 

> > Any other suggestion to apply cut_len in ofproto?

> 

> Unfortunately the cut_len is set up through a previous truncate action, and it

> is supposed to apply to the "next output action" (and reset afterwards), so it

> is part of the datapath actions interface definition; I don't think there's a way

> to push it up to ofproto.

[Sugesh] Or at ofproto layer,  the non-combine tunnel push also placed in a clone, similar to the
combine case? That will help to keep the datapath as simple as just a push.
> 

> >> >

> >> > > +{

> >> > > +    memcpy(xc_dst, xc_src, sizeof *xc_dst);

> >> > > +    switch (xc_src->type) {

> >> > > +    case XC_RULE:

> >> > > +        ofproto_rule_ref(&xc_dst->rule->up);

> >> > > +        break;

> >> > > +    case XC_BOND:

> >> > > +        bond_ref(xc_dst->bond.bond);

> >> > > +        xc_dst->bond.flow = xmemdup(xc_src->bond.flow,

> >> > > +                            sizeof *xc_src->bond.flow);

> >> > > +        break;

> >> > > +    case XC_NETDEV:

> >> > > +        if (xc_src->dev.tx) {

> >> > > +            netdev_ref(xc_dst->dev.tx);

> >> > > +        }

> >> > > +        if (xc_src->dev.rx) {

> >> > > +            netdev_ref(xc_dst->dev.rx);

> >> > > +        }

> >> > > +        if (xc_src->dev.bfd) {

> >> > > +            bfd_ref(xc_dst->dev.bfd);

> >> > > +        }

> >> > > +        break;

> >> > > +    case XC_NETFLOW:

> >> > > +        netflow_ref(xc_dst->nf.netflow);

> >> > > +        xc_dst->nf.flow = xmemdup(xc_src->nf.flow, sizeof

> >> > > +*xc_src-

> >> >nf.flow);

> >> > > +        break;

> >> > > +    case XC_MIRROR:

> >> > > +        mbridge_ref(xc_dst->mirror.mbridge);

> >> > > +        break;

> >> > > +    case XC_LEARN:

> >> > > +    case XC_TABLE:

> >> > > +    case XC_NORMAL:

> >> > > +    case XC_FIN_TIMEOUT:

> >> > > +    case XC_GROUP:

> >> > > +    case XC_TNL_NEIGH:

> >> > > +    case XC_CONTROLLER:

> >> > > +    case XC_TUNNEL_HEADER:

> >> > > +    break;

> >> > > +    default:

> >> > > +        OVS_NOT_REACHED();

> >> > > +    }

> >> > > +}

> >> > > +

> >> > > +static bool

> >> > > +validate_and_combine_post_tnl_actions(struct xlate_ctx *ctx,

> >> > > +                                      const struct xport *xport,

> >> > > +                                      struct xport *out_dev,

> >> > > +                                      struct ovs_action_push_tnl

> >> > > +tnl_push_data) {

> >> > > +    const struct dpif_flow_stats *backup_resubmit_stats;

> >> > > +    struct xlate_cache *backup_xcache;

> >> > > +    bool nested_act_flag = false;

> >> > > +    struct flow_wildcards tmp_flow_wc;

> >> > > +    struct flow_wildcards *backup_flow_wc_ptr;

> >> > > +    bool backup_side_effects;

> >> > > +    const struct dp_packet *backup_pkt;

> >> > > +

> >> > > +    memset(&tmp_flow_wc, 0 , sizeof tmp_flow_wc);

> >> > > +    backup_flow_wc_ptr = ctx->wc;

> >> > > +    ctx->wc = &tmp_flow_wc;

> >> > > +    ctx->xin->wc = NULL;

> >> > > +    backup_resubmit_stats = ctx->xin->resubmit_stats;

> >> > > +    backup_xcache = ctx->xin->xcache;

> >> > > +    backup_side_effects = ctx->xin->allow_side_effects;

> >> > > +    backup_pkt = ctx->xin->packet;

> >> > > +

> >> > > +    size_t push_action_size = 0;

> >> > > +    size_t clone_ofs = nl_msg_start_nested(ctx->odp_actions,

> >> > > +                                           OVS_ACTION_ATTR_CLONE);

> >> > > +    odp_put_tnl_push_action(ctx->odp_actions, &tnl_push_data);

> >> > > +    push_action_size = ctx->odp_actions->size;

> >> > > +

> >> > > +    ctx->xin->resubmit_stats =  NULL;

> >> > > +    ctx->xin->xcache = xlate_cache_new(); /* Use new temporary

> cache.

> >> */

> >> > > +    ctx->xin->allow_side_effects = false;

> >> > > +    ctx->xin->packet = NULL;

> >> >

> >> > I realised that you asked about this after the previous patch, but

> >> > I didn't understand the implications of it. I believe that this

> >> > packet is how, for instance, the second bridge can forward packets

> >> > to the controller. If you reset it to NULL here, then I don't think

> >> > that the controller action is executed correctly during the second

> >> > bridge translation. (That's a test we could add).

> > [Sugesh] Yes, that’s a good test case for it.

> >> >

> >> > I suspect that some multicast snooping stuff and some other pieces

> >> > that would be affected by this.

> > [Sugesh] This is interesting. So does the controller expects a encaped

> > packets in that case? Do you think, the controller action also be an

> > exception case like the TRUNC?

> 

> Yes, I think the controller expects encapped packets in this case.

> Yes, I think that the easiest way to handle this is to put controller in that

> exception case with truncate.

[Sugesh] Ok.  Will add that case as an exception.

Before that, Would like to clarify the controller output action bit more
How does the following case are currently handled that involves controller action
(We haven’t run any test case that involves a controller and very limited knowledge on that side)

MOD_VLAN(10) --> (PATCH-PORT) --> (SECOND-BR) 
What if there is no rules to handle the packet in 'second-br' and wanted to report to controller?

Will the packet report with VLAN 10 to controller?

> 

> > Do we have a case where send the encap traffic to controller?

> 

> Logically, the second bridge should only see the encapped traffic so if there's

> any way to send to controller, or sample for things like IPFIX and so on, then

> those external entities should observe encap traffic. I can't think of many

> cases on transmit - debugging exactly what is happening to packets would be

> the main one.

[Sugesh] ok. Thank you for this input.

> 

> >> >

> >> > > +

> >> > > +    if (ctx->xin->xcache) {

> >> >

> >> > A few lines above, this is unconditionally set. Could be removed?

> > [Sugesh] I think it is newly created and its possible to have a NULL

> > incase of malloc failure. That’s why added the case.

> 

> OK. For what it's worth, OVS xmalloc() guarantees a valid object so this

> should be unnecessary.

[Sugesh] Ok, will remove it.
Joe Stringer July 18, 2017, 5:42 a.m. UTC | #8
On 17 July 2017 at 15:17, Chandran, Sugesh <sugesh.chandran@intel.com> wrote:
> Comments inline
>
> Regards
> _Sugesh
>
>> -----Original Message-----
>> From: Joe Stringer [mailto:joe@ovn.org]
>> Sent: Monday, July 17, 2017 7:33 PM
>> To: Chandran, Sugesh <sugesh.chandran@intel.com>
>> Cc: Zoltán Balogh <zoltan.balogh@ericsson.com>; ovs dev
>> <dev@openvswitch.org>; Andy Zhou <azhou@ovn.org>
>> Subject: Re: [PATCH v2 4/4] tunneling: Avoid datapath-recirc by combining
>> recirc actions at xlate.
>>
>> On 17 July 2017 at 10:27, Chandran, Sugesh <sugesh.chandran@intel.com>
>> wrote:
>> >> > -----Original Message-----
>> >> > From: Joe Stringer [mailto:joe@ovn.org]
>> >> > Sent: Saturday, July 15, 2017 1:46 AM
>> >> > To: Sugesh Chandran <sugesh.chandran@intel.com>
>> >> > Cc: ovs dev <dev@openvswitch.org>; Andy Zhou <azhou@ovn.org>;
>> >> > Zoltán Balogh <zoltan.balogh@ericsson.com>
>> >> > Subject: Re: [PATCH v2 4/4] tunneling: Avoid datapath-recirc by
>> >> > combining
>> >> recirc actions at xlate.
>> >> >
>> >> > On 4 July 2017 at 02:46, Sugesh Chandran
>> >> > <sugesh.chandran@intel.com>
>> >> wrote:
>>
>> <snip>
>>
>> >> > > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index
>> >> > > 4e29085..4d996c1 100644
>> >> > > --- a/lib/dpif-netdev.c
>> >> > > +++ b/lib/dpif-netdev.c
>> >> > > @@ -5028,24 +5028,8 @@ dp_execute_cb(void *aux_, struct
>> >> > > dp_packet_batch *packets_,
>> >> > >
>> >> > >      case OVS_ACTION_ATTR_TUNNEL_PUSH:
>> >> > >          if (*depth < MAX_RECIRC_DEPTH) {
>> >> > > -            struct dp_packet_batch tnl_pkt;
>> >> > > -            struct dp_packet_batch *orig_packets_ = packets_;
>> >> > > -            int err;
>> >> > > -
>> >> > > -            if (!may_steal) {
>> >> > > -                dp_packet_batch_clone(&tnl_pkt, packets_);
>> >> > > -                packets_ = &tnl_pkt;
>> >> > > -                dp_packet_batch_reset_cutlen(orig_packets_);
>> >> > > -            }
>> >> > > -
>> >> > >              dp_packet_batch_apply_cutlen(packets_);
>> >> > > -
>> >> > > -            err = push_tnl_action(pmd, a, packets_);
>> >> > > -            if (!err) {
>> >> > > -                (*depth)++;
>> >> > > -                dp_netdev_recirculate(pmd, packets_);
>> >> > > -                (*depth)--;
>> >> > > -            }
>> >> > > +            push_tnl_action(pmd, a, packets_);
>> >> >
>> >> > If I follow, this was the previous datapath-level code to ensure
>> >> > that when packets are output to a tunnel, only the copy of the
>> >> > packet that goes to the tunnel gets the cutlen applied. With the
>> >> > new version of the code, we replace this by making sure that the
>> >> > ofproto layer generates a clone to wrap the push_tunnel and all
>> >> > subsequent bridge translation, so then it results in the equivalent
>> >> > behaviour: The cutlen is only ever applied to a clone of the packets. Is
>> that right?
>> > [Sugesh] Hmm. That’s the good catch. Looks like the packets are not
>> > cloned when combine_tunnel_action is found impossible in ofproto layer.
>> >
>> > As a fix we should keep the code block in 'may_steal' to handle the
>> > non tunnel-combine case.
>>
>> Oh, really? I thought that we always end up with a clone wrapping the
>> tunnel_push now, which means we shouldn't need this bit any more. My
>> comment was intended just to ask if I understood correctly to make sure we
>> don't miss anything.
>>
>> > Any other suggestion to apply cut_len in ofproto?
>>
>> Unfortunately the cut_len is set up through a previous truncate action, and it
>> is supposed to apply to the "next output action" (and reset afterwards), so it
>> is part of the datapath actions interface definition; I don't think there's a way
>> to push it up to ofproto.
> [Sugesh] Or at ofproto layer,  the non-combine tunnel push also placed in a clone, similar to the
> combine case? That will help to keep the datapath as simple as just a push.

That sounds simpler, but I don't think we can get rid of applying the
cut_len just yet. This is related to some other discussions going on
around the interaction between truncate() and clone(), so it's
probably a bit safer to just leave this code to apply cut len in place
for now.

>> >> > > +static bool
>> >> > > +validate_and_combine_post_tnl_actions(struct xlate_ctx *ctx,
>> >> > > +                                      const struct xport *xport,
>> >> > > +                                      struct xport *out_dev,
>> >> > > +                                      struct ovs_action_push_tnl
>> >> > > +tnl_push_data) {
>> >> > > +    const struct dpif_flow_stats *backup_resubmit_stats;
>> >> > > +    struct xlate_cache *backup_xcache;
>> >> > > +    bool nested_act_flag = false;
>> >> > > +    struct flow_wildcards tmp_flow_wc;
>> >> > > +    struct flow_wildcards *backup_flow_wc_ptr;
>> >> > > +    bool backup_side_effects;
>> >> > > +    const struct dp_packet *backup_pkt;
>> >> > > +
>> >> > > +    memset(&tmp_flow_wc, 0 , sizeof tmp_flow_wc);
>> >> > > +    backup_flow_wc_ptr = ctx->wc;
>> >> > > +    ctx->wc = &tmp_flow_wc;
>> >> > > +    ctx->xin->wc = NULL;
>> >> > > +    backup_resubmit_stats = ctx->xin->resubmit_stats;
>> >> > > +    backup_xcache = ctx->xin->xcache;
>> >> > > +    backup_side_effects = ctx->xin->allow_side_effects;
>> >> > > +    backup_pkt = ctx->xin->packet;
>> >> > > +
>> >> > > +    size_t push_action_size = 0;
>> >> > > +    size_t clone_ofs = nl_msg_start_nested(ctx->odp_actions,
>> >> > > +                                           OVS_ACTION_ATTR_CLONE);
>> >> > > +    odp_put_tnl_push_action(ctx->odp_actions, &tnl_push_data);
>> >> > > +    push_action_size = ctx->odp_actions->size;
>> >> > > +
>> >> > > +    ctx->xin->resubmit_stats =  NULL;
>> >> > > +    ctx->xin->xcache = xlate_cache_new(); /* Use new temporary
>> cache.
>> >> */
>> >> > > +    ctx->xin->allow_side_effects = false;
>> >> > > +    ctx->xin->packet = NULL;
>> >> >
>> >> > I realised that you asked about this after the previous patch, but
>> >> > I didn't understand the implications of it. I believe that this
>> >> > packet is how, for instance, the second bridge can forward packets
>> >> > to the controller. If you reset it to NULL here, then I don't think
>> >> > that the controller action is executed correctly during the second
>> >> > bridge translation. (That's a test we could add).
>> > [Sugesh] Yes, that’s a good test case for it.
>> >> >
>> >> > I suspect that some multicast snooping stuff and some other pieces
>> >> > that would be affected by this.
>> > [Sugesh] This is interesting. So does the controller expects a encaped
>> > packets in that case? Do you think, the controller action also be an
>> > exception case like the TRUNC?
>>
>> Yes, I think the controller expects encapped packets in this case.
>> Yes, I think that the easiest way to handle this is to put controller in that
>> exception case with truncate.
> [Sugesh] Ok.  Will add that case as an exception.
>
> Before that, Would like to clarify the controller output action bit more
> How does the following case are currently handled that involves controller action
> (We haven’t run any test case that involves a controller and very limited knowledge on that side)
>
> MOD_VLAN(10) --> (PATCH-PORT) --> (SECOND-BR)
> What if there is no rules to handle the packet in 'second-br' and wanted to report to controller?
>
> Will the packet report with VLAN 10 to controller?

Currently, this will work because patch port doesn't NULLify this
ctx->xin->packet pointer, then when it translates through the second
bridge and gets to execute_controller_action(), that function will
make a clone, apply the current actions to that packet, then send the
modified packet to the controller.

So, longer term I think that this NULLification should be removed and
we keep the same packet attached to the context. However at the
moment, from execute_controller_action() ->
xlate_execute_odp_actions() -> odp_execute_actions() there is no
datapath callback provided for implementing the tunnel_push
functionality. Maybe it's as simple as plugging
dpif_execute_helper_cb() into the odp_execute_actions() call there,
but that sounds like a separate change that should get some thorough
review and testing as well. I don't think it's strictly necessary for
getting this series in, this series can function correctly using the
fallback path as we discussed above. It'd be nice to see this at some
point in future though to consolidate the paths through this code.
Chandran, Sugesh July 18, 2017, 8:59 a.m. UTC | #9
Regards
_Sugesh


> -----Original Message-----

> From: Joe Stringer [mailto:joe@ovn.org]

> Sent: Tuesday, July 18, 2017 6:42 AM

> To: Chandran, Sugesh <sugesh.chandran@intel.com>

> Cc: Zoltán Balogh <zoltan.balogh@ericsson.com>; ovs dev

> <dev@openvswitch.org>; Andy Zhou <azhou@ovn.org>

> Subject: Re: [PATCH v2 4/4] tunneling: Avoid datapath-recirc by combining

> recirc actions at xlate.

> 

> On 17 July 2017 at 15:17, Chandran, Sugesh <sugesh.chandran@intel.com>

> wrote:

> > Comments inline

> >

> > Regards

> > _Sugesh

> >

> >> -----Original Message-----

> >> From: Joe Stringer [mailto:joe@ovn.org]

> >> Sent: Monday, July 17, 2017 7:33 PM

> >> To: Chandran, Sugesh <sugesh.chandran@intel.com>

> >> Cc: Zoltán Balogh <zoltan.balogh@ericsson.com>; ovs dev

> >> <dev@openvswitch.org>; Andy Zhou <azhou@ovn.org>

> >> Subject: Re: [PATCH v2 4/4] tunneling: Avoid datapath-recirc by

> >> combining recirc actions at xlate.

> >>

> >> On 17 July 2017 at 10:27, Chandran, Sugesh

> >> <sugesh.chandran@intel.com>

> >> wrote:

> >> >> > -----Original Message-----

> >> >> > From: Joe Stringer [mailto:joe@ovn.org]

> >> >> > Sent: Saturday, July 15, 2017 1:46 AM

> >> >> > To: Sugesh Chandran <sugesh.chandran@intel.com>

> >> >> > Cc: ovs dev <dev@openvswitch.org>; Andy Zhou <azhou@ovn.org>;

> >> >> > Zoltán Balogh <zoltan.balogh@ericsson.com>

> >> >> > Subject: Re: [PATCH v2 4/4] tunneling: Avoid datapath-recirc by

> >> >> > combining

> >> >> recirc actions at xlate.

> >> >> >

> >> >> > On 4 July 2017 at 02:46, Sugesh Chandran

> >> >> > <sugesh.chandran@intel.com>

> >> >> wrote:

> >>

> >> <snip>

> >>

> >> >> > > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index

> >> >> > > 4e29085..4d996c1 100644

> >> >> > > --- a/lib/dpif-netdev.c

> >> >> > > +++ b/lib/dpif-netdev.c

> >> >> > > @@ -5028,24 +5028,8 @@ dp_execute_cb(void *aux_, struct

> >> >> > > dp_packet_batch *packets_,

> >> >> > >

> >> >> > >      case OVS_ACTION_ATTR_TUNNEL_PUSH:

> >> >> > >          if (*depth < MAX_RECIRC_DEPTH) {

> >> >> > > -            struct dp_packet_batch tnl_pkt;

> >> >> > > -            struct dp_packet_batch *orig_packets_ = packets_;

> >> >> > > -            int err;

> >> >> > > -

> >> >> > > -            if (!may_steal) {

> >> >> > > -                dp_packet_batch_clone(&tnl_pkt, packets_);

> >> >> > > -                packets_ = &tnl_pkt;

> >> >> > > -                dp_packet_batch_reset_cutlen(orig_packets_);

> >> >> > > -            }

> >> >> > > -

> >> >> > >              dp_packet_batch_apply_cutlen(packets_);

> >> >> > > -

> >> >> > > -            err = push_tnl_action(pmd, a, packets_);

> >> >> > > -            if (!err) {

> >> >> > > -                (*depth)++;

> >> >> > > -                dp_netdev_recirculate(pmd, packets_);

> >> >> > > -                (*depth)--;

> >> >> > > -            }

> >> >> > > +            push_tnl_action(pmd, a, packets_);

> >> >> >

> >> >> > If I follow, this was the previous datapath-level code to ensure

> >> >> > that when packets are output to a tunnel, only the copy of the

> >> >> > packet that goes to the tunnel gets the cutlen applied. With the

> >> >> > new version of the code, we replace this by making sure that the

> >> >> > ofproto layer generates a clone to wrap the push_tunnel and all

> >> >> > subsequent bridge translation, so then it results in the

> >> >> > equivalent

> >> >> > behaviour: The cutlen is only ever applied to a clone of the

> >> >> > packets. Is

> >> that right?

> >> > [Sugesh] Hmm. That’s the good catch. Looks like the packets are not

> >> > cloned when combine_tunnel_action is found impossible in ofproto

> layer.

> >> >

> >> > As a fix we should keep the code block in 'may_steal' to handle the

> >> > non tunnel-combine case.

> >>

> >> Oh, really? I thought that we always end up with a clone wrapping the

> >> tunnel_push now, which means we shouldn't need this bit any more. My

> >> comment was intended just to ask if I understood correctly to make

> >> sure we don't miss anything.

> >>

> >> > Any other suggestion to apply cut_len in ofproto?

> >>

> >> Unfortunately the cut_len is set up through a previous truncate

> >> action, and it is supposed to apply to the "next output action" (and

> >> reset afterwards), so it is part of the datapath actions interface

> >> definition; I don't think there's a way to push it up to ofproto.

> > [Sugesh] Or at ofproto layer,  the non-combine tunnel push also placed

> > in a clone, similar to the combine case? That will help to keep the datapath

> as simple as just a push.

> 

> That sounds simpler, but I don't think we can get rid of applying the cut_len

> just yet. This is related to some other discussions going on around the

> interaction between truncate() and clone(), so it's probably a bit safer to just

> leave this code to apply cut len in place for now.

[Sugesh] I don’t meant to get rid of cut_len. Instead we make sure any tunnel-push
always happen on a clone. We keep the push +cut_len as is in the patch.
So the ofproto actions would become
clone(tnl_push), trunc(200) -->Non-combine case. (instead of just [tnl_push, trunc(200)]
clone(tnl_push, output(1)) ---> Combine case
Do you think it has any other implication that we may missed?
> 

> >> >> > > +static bool

> >> >> > > +validate_and_combine_post_tnl_actions(struct xlate_ctx *ctx,

> >> >> > > +                                      const struct xport *xport,

> >> >> > > +                                      struct xport *out_dev,

> >> >> > > +                                      struct

> >> >> > > +ovs_action_push_tnl

> >> >> > > +tnl_push_data) {

> >> >> > > +    const struct dpif_flow_stats *backup_resubmit_stats;

> >> >> > > +    struct xlate_cache *backup_xcache;

> >> >> > > +    bool nested_act_flag = false;

> >> >> > > +    struct flow_wildcards tmp_flow_wc;

> >> >> > > +    struct flow_wildcards *backup_flow_wc_ptr;

> >> >> > > +    bool backup_side_effects;

> >> >> > > +    const struct dp_packet *backup_pkt;

> >> >> > > +

> >> >> > > +    memset(&tmp_flow_wc, 0 , sizeof tmp_flow_wc);

> >> >> > > +    backup_flow_wc_ptr = ctx->wc;

> >> >> > > +    ctx->wc = &tmp_flow_wc;

> >> >> > > +    ctx->xin->wc = NULL;

> >> >> > > +    backup_resubmit_stats = ctx->xin->resubmit_stats;

> >> >> > > +    backup_xcache = ctx->xin->xcache;

> >> >> > > +    backup_side_effects = ctx->xin->allow_side_effects;

> >> >> > > +    backup_pkt = ctx->xin->packet;

> >> >> > > +

> >> >> > > +    size_t push_action_size = 0;

> >> >> > > +    size_t clone_ofs = nl_msg_start_nested(ctx->odp_actions,

> >> >> > > +                                           OVS_ACTION_ATTR_CLONE);

> >> >> > > +    odp_put_tnl_push_action(ctx->odp_actions, &tnl_push_data);

> >> >> > > +    push_action_size = ctx->odp_actions->size;

> >> >> > > +

> >> >> > > +    ctx->xin->resubmit_stats =  NULL;

> >> >> > > +    ctx->xin->xcache = xlate_cache_new(); /* Use new

> >> >> > > + temporary

> >> cache.

> >> >> */

> >> >> > > +    ctx->xin->allow_side_effects = false;

> >> >> > > +    ctx->xin->packet = NULL;

> >> >> >

> >> >> > I realised that you asked about this after the previous patch,

> >> >> > but I didn't understand the implications of it. I believe that

> >> >> > this packet is how, for instance, the second bridge can forward

> >> >> > packets to the controller. If you reset it to NULL here, then I

> >> >> > don't think that the controller action is executed correctly

> >> >> > during the second bridge translation. (That's a test we could add).

> >> > [Sugesh] Yes, that’s a good test case for it.

> >> >> >

> >> >> > I suspect that some multicast snooping stuff and some other

> >> >> > pieces that would be affected by this.

> >> > [Sugesh] This is interesting. So does the controller expects a

> >> > encaped packets in that case? Do you think, the controller action

> >> > also be an exception case like the TRUNC?

> >>

> >> Yes, I think the controller expects encapped packets in this case.

> >> Yes, I think that the easiest way to handle this is to put controller

> >> in that exception case with truncate.

> > [Sugesh] Ok.  Will add that case as an exception.

> >

> > Before that, Would like to clarify the controller output action bit

> > more How does the following case are currently handled that involves

> > controller action (We haven’t run any test case that involves a

> > controller and very limited knowledge on that side)

> >

> > MOD_VLAN(10) --> (PATCH-PORT) --> (SECOND-BR) What if there is no

> > rules to handle the packet in 'second-br' and wanted to report to

> controller?

> >

> > Will the packet report with VLAN 10 to controller?

> 

> Currently, this will work because patch port doesn't NULLify this

> ctx->xin->packet pointer, then when it translates through the second

> bridge and gets to execute_controller_action(), that function will make a

> clone, apply the current actions to that packet, then send the modified

> packet to the controller.

[Sugesh] Ok. So all the pre-actions will be applied before 
cloning the packets(applying mod_vlan) using the callback.
> 

> So, longer term I think that this NULLification should be removed and we

> keep the same packet attached to the context. However at the moment,

> from execute_controller_action() ->

> xlate_execute_odp_actions() -> odp_execute_actions() there is no datapath

> callback provided for implementing the tunnel_push functionality. Maybe it's

> as simple as plugging

> dpif_execute_helper_cb() into the odp_execute_actions() call there, but

[Sugesh] I think we should offer the callback to do the modification.
Will work on when this patch series is accepted.
> that sounds like a separate change that should get some thorough review

> and testing as well. I don't think it's strictly necessary for getting this series in,

> this series can function correctly using the fallback path as we discussed

> above. It'd be nice to see this at some point in future though to consolidate

> the paths through this code.

[Sugesh] Will try to do as a separate patch as you suggested.
Joe Stringer July 18, 2017, 5:29 p.m. UTC | #10
On 18 July 2017 at 01:59, Chandran, Sugesh <sugesh.chandran@intel.com> wrote:
>
>
> Regards
> _Sugesh
>
>
>> -----Original Message-----
>> From: Joe Stringer [mailto:joe@ovn.org]
>> Sent: Tuesday, July 18, 2017 6:42 AM
>> To: Chandran, Sugesh <sugesh.chandran@intel.com>
>> Cc: Zoltán Balogh <zoltan.balogh@ericsson.com>; ovs dev
>> <dev@openvswitch.org>; Andy Zhou <azhou@ovn.org>
>> Subject: Re: [PATCH v2 4/4] tunneling: Avoid datapath-recirc by combining
>> recirc actions at xlate.
>>
>> On 17 July 2017 at 15:17, Chandran, Sugesh <sugesh.chandran@intel.com>
>> wrote:
>> > Comments inline
>> >
>> > Regards
>> > _Sugesh
>> >
>> >> -----Original Message-----
>> >> From: Joe Stringer [mailto:joe@ovn.org]
>> >> Sent: Monday, July 17, 2017 7:33 PM
>> >> To: Chandran, Sugesh <sugesh.chandran@intel.com>
>> >> Cc: Zoltán Balogh <zoltan.balogh@ericsson.com>; ovs dev
>> >> <dev@openvswitch.org>; Andy Zhou <azhou@ovn.org>
>> >> Subject: Re: [PATCH v2 4/4] tunneling: Avoid datapath-recirc by
>> >> combining recirc actions at xlate.
>> >>
>> >> On 17 July 2017 at 10:27, Chandran, Sugesh
>> >> <sugesh.chandran@intel.com>
>> >> wrote:
>> >> >> > -----Original Message-----
>> >> >> > From: Joe Stringer [mailto:joe@ovn.org]
>> >> >> > Sent: Saturday, July 15, 2017 1:46 AM
>> >> >> > To: Sugesh Chandran <sugesh.chandran@intel.com>
>> >> >> > Cc: ovs dev <dev@openvswitch.org>; Andy Zhou <azhou@ovn.org>;
>> >> >> > Zoltán Balogh <zoltan.balogh@ericsson.com>
>> >> >> > Subject: Re: [PATCH v2 4/4] tunneling: Avoid datapath-recirc by
>> >> >> > combining
>> >> >> recirc actions at xlate.
>> >> >> >
>> >> >> > On 4 July 2017 at 02:46, Sugesh Chandran
>> >> >> > <sugesh.chandran@intel.com>
>> >> >> wrote:
>> >>
>> >> <snip>
>> >>
>> >> >> > > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index
>> >> >> > > 4e29085..4d996c1 100644
>> >> >> > > --- a/lib/dpif-netdev.c
>> >> >> > > +++ b/lib/dpif-netdev.c
>> >> >> > > @@ -5028,24 +5028,8 @@ dp_execute_cb(void *aux_, struct
>> >> >> > > dp_packet_batch *packets_,
>> >> >> > >
>> >> >> > >      case OVS_ACTION_ATTR_TUNNEL_PUSH:
>> >> >> > >          if (*depth < MAX_RECIRC_DEPTH) {
>> >> >> > > -            struct dp_packet_batch tnl_pkt;
>> >> >> > > -            struct dp_packet_batch *orig_packets_ = packets_;
>> >> >> > > -            int err;
>> >> >> > > -
>> >> >> > > -            if (!may_steal) {
>> >> >> > > -                dp_packet_batch_clone(&tnl_pkt, packets_);
>> >> >> > > -                packets_ = &tnl_pkt;
>> >> >> > > -                dp_packet_batch_reset_cutlen(orig_packets_);
>> >> >> > > -            }
>> >> >> > > -
>> >> >> > >              dp_packet_batch_apply_cutlen(packets_);
>> >> >> > > -
>> >> >> > > -            err = push_tnl_action(pmd, a, packets_);
>> >> >> > > -            if (!err) {
>> >> >> > > -                (*depth)++;
>> >> >> > > -                dp_netdev_recirculate(pmd, packets_);
>> >> >> > > -                (*depth)--;
>> >> >> > > -            }
>> >> >> > > +            push_tnl_action(pmd, a, packets_);
>> >> >> >
>> >> >> > If I follow, this was the previous datapath-level code to ensure
>> >> >> > that when packets are output to a tunnel, only the copy of the
>> >> >> > packet that goes to the tunnel gets the cutlen applied. With the
>> >> >> > new version of the code, we replace this by making sure that the
>> >> >> > ofproto layer generates a clone to wrap the push_tunnel and all
>> >> >> > subsequent bridge translation, so then it results in the
>> >> >> > equivalent
>> >> >> > behaviour: The cutlen is only ever applied to a clone of the
>> >> >> > packets. Is
>> >> that right?
>> >> > [Sugesh] Hmm. That’s the good catch. Looks like the packets are not
>> >> > cloned when combine_tunnel_action is found impossible in ofproto
>> layer.
>> >> >
>> >> > As a fix we should keep the code block in 'may_steal' to handle the
>> >> > non tunnel-combine case.
>> >>
>> >> Oh, really? I thought that we always end up with a clone wrapping the
>> >> tunnel_push now, which means we shouldn't need this bit any more. My
>> >> comment was intended just to ask if I understood correctly to make
>> >> sure we don't miss anything.
>> >>
>> >> > Any other suggestion to apply cut_len in ofproto?
>> >>
>> >> Unfortunately the cut_len is set up through a previous truncate
>> >> action, and it is supposed to apply to the "next output action" (and
>> >> reset afterwards), so it is part of the datapath actions interface
>> >> definition; I don't think there's a way to push it up to ofproto.
>> > [Sugesh] Or at ofproto layer,  the non-combine tunnel push also placed
>> > in a clone, similar to the combine case? That will help to keep the datapath
>> as simple as just a push.
>>
>> That sounds simpler, but I don't think we can get rid of applying the cut_len
>> just yet. This is related to some other discussions going on around the
>> interaction between truncate() and clone(), so it's probably a bit safer to just
>> leave this code to apply cut len in place for now.
> [Sugesh] I don’t meant to get rid of cut_len. Instead we make sure any tunnel-push
> always happen on a clone. We keep the push +cut_len as is in the patch.
> So the ofproto actions would become
> clone(tnl_push), trunc(200) -->Non-combine case. (instead of just [tnl_push, trunc(200)]
> clone(tnl_push, output(1)) ---> Combine case
> Do you think it has any other implication that we may missed?

OK, that sounds about right, except that the truncate action is
serialized first - see xlate_output_trunc_action(). Ie, from a
translation perspective it generates trunc(),clone(tnl_push), then
from datapath perspective when executing, the trunc() sets the
cut_len, then the clone(tnl_push) is responsible for cutting the
packet before pushing the header.

I'm not aware of any other implications of this.

>> >> >> > > +static bool
>> >> >> > > +validate_and_combine_post_tnl_actions(struct xlate_ctx *ctx,
>> >> >> > > +                                      const struct xport *xport,
>> >> >> > > +                                      struct xport *out_dev,
>> >> >> > > +                                      struct
>> >> >> > > +ovs_action_push_tnl
>> >> >> > > +tnl_push_data) {
>> >> >> > > +    const struct dpif_flow_stats *backup_resubmit_stats;
>> >> >> > > +    struct xlate_cache *backup_xcache;
>> >> >> > > +    bool nested_act_flag = false;
>> >> >> > > +    struct flow_wildcards tmp_flow_wc;
>> >> >> > > +    struct flow_wildcards *backup_flow_wc_ptr;
>> >> >> > > +    bool backup_side_effects;
>> >> >> > > +    const struct dp_packet *backup_pkt;
>> >> >> > > +
>> >> >> > > +    memset(&tmp_flow_wc, 0 , sizeof tmp_flow_wc);
>> >> >> > > +    backup_flow_wc_ptr = ctx->wc;
>> >> >> > > +    ctx->wc = &tmp_flow_wc;
>> >> >> > > +    ctx->xin->wc = NULL;
>> >> >> > > +    backup_resubmit_stats = ctx->xin->resubmit_stats;
>> >> >> > > +    backup_xcache = ctx->xin->xcache;
>> >> >> > > +    backup_side_effects = ctx->xin->allow_side_effects;
>> >> >> > > +    backup_pkt = ctx->xin->packet;
>> >> >> > > +
>> >> >> > > +    size_t push_action_size = 0;
>> >> >> > > +    size_t clone_ofs = nl_msg_start_nested(ctx->odp_actions,
>> >> >> > > +                                           OVS_ACTION_ATTR_CLONE);
>> >> >> > > +    odp_put_tnl_push_action(ctx->odp_actions, &tnl_push_data);
>> >> >> > > +    push_action_size = ctx->odp_actions->size;
>> >> >> > > +
>> >> >> > > +    ctx->xin->resubmit_stats =  NULL;
>> >> >> > > +    ctx->xin->xcache = xlate_cache_new(); /* Use new
>> >> >> > > + temporary
>> >> cache.
>> >> >> */
>> >> >> > > +    ctx->xin->allow_side_effects = false;
>> >> >> > > +    ctx->xin->packet = NULL;
>> >> >> >
>> >> >> > I realised that you asked about this after the previous patch,
>> >> >> > but I didn't understand the implications of it. I believe that
>> >> >> > this packet is how, for instance, the second bridge can forward
>> >> >> > packets to the controller. If you reset it to NULL here, then I
>> >> >> > don't think that the controller action is executed correctly
>> >> >> > during the second bridge translation. (That's a test we could add).
>> >> > [Sugesh] Yes, that’s a good test case for it.
>> >> >> >
>> >> >> > I suspect that some multicast snooping stuff and some other
>> >> >> > pieces that would be affected by this.
>> >> > [Sugesh] This is interesting. So does the controller expects a
>> >> > encaped packets in that case? Do you think, the controller action
>> >> > also be an exception case like the TRUNC?
>> >>
>> >> Yes, I think the controller expects encapped packets in this case.
>> >> Yes, I think that the easiest way to handle this is to put controller
>> >> in that exception case with truncate.
>> > [Sugesh] Ok.  Will add that case as an exception.
>> >
>> > Before that, Would like to clarify the controller output action bit
>> > more How does the following case are currently handled that involves
>> > controller action (We haven’t run any test case that involves a
>> > controller and very limited knowledge on that side)
>> >
>> > MOD_VLAN(10) --> (PATCH-PORT) --> (SECOND-BR) What if there is no
>> > rules to handle the packet in 'second-br' and wanted to report to
>> controller?
>> >
>> > Will the packet report with VLAN 10 to controller?
>>
>> Currently, this will work because patch port doesn't NULLify this
>> ctx->xin->packet pointer, then when it translates through the second
>> bridge and gets to execute_controller_action(), that function will make a
>> clone, apply the current actions to that packet, then send the modified
>> packet to the controller.
> [Sugesh] Ok. So all the pre-actions will be applied before
> cloning the packets(applying mod_vlan) using the callback.

Yup, that's right.

>> So, longer term I think that this NULLification should be removed and we
>> keep the same packet attached to the context. However at the moment,
>> from execute_controller_action() ->
>> xlate_execute_odp_actions() -> odp_execute_actions() there is no datapath
>> callback provided for implementing the tunnel_push functionality. Maybe it's
>> as simple as plugging
>> dpif_execute_helper_cb() into the odp_execute_actions() call there, but
> [Sugesh] I think we should offer the callback to do the modification.
> Will work on when this patch series is accepted.
>> that sounds like a separate change that should get some thorough review
>> and testing as well. I don't think it's strictly necessary for getting this series in,
>> this series can function correctly using the fallback path as we discussed
>> above. It'd be nice to see this at some point in future though to consolidate
>> the paths through this code.
> [Sugesh] Will try to do as a separate patch as you suggested.

OK, thanks!
Chandran, Sugesh July 18, 2017, 5:50 p.m. UTC | #11
Thank you Joe,
I sent out the v3 patch . Please have a look


Regards
_Sugesh


> -----Original Message-----

> From: Joe Stringer [mailto:joe@ovn.org]

> Sent: Tuesday, July 18, 2017 6:29 PM

> To: Chandran, Sugesh <sugesh.chandran@intel.com>

> Cc: Zoltán Balogh <zoltan.balogh@ericsson.com>; ovs dev

> <dev@openvswitch.org>; Andy Zhou <azhou@ovn.org>

> Subject: Re: [PATCH v2 4/4] tunneling: Avoid datapath-recirc by combining

> recirc actions at xlate.

> 

> On 18 July 2017 at 01:59, Chandran, Sugesh <sugesh.chandran@intel.com>

> wrote:

> >

> >

> > Regards

> > _Sugesh

> >

> >

> >> -----Original Message-----

> >> From: Joe Stringer [mailto:joe@ovn.org]

> >> Sent: Tuesday, July 18, 2017 6:42 AM

> >> To: Chandran, Sugesh <sugesh.chandran@intel.com>

> >> Cc: Zoltán Balogh <zoltan.balogh@ericsson.com>; ovs dev

> >> <dev@openvswitch.org>; Andy Zhou <azhou@ovn.org>

> >> Subject: Re: [PATCH v2 4/4] tunneling: Avoid datapath-recirc by

> >> combining recirc actions at xlate.

> >>

> >> On 17 July 2017 at 15:17, Chandran, Sugesh

> >> <sugesh.chandran@intel.com>

> >> wrote:

> >> > Comments inline

> >> >

> >> > Regards

> >> > _Sugesh

> >> >

> >> >> -----Original Message-----

> >> >> From: Joe Stringer [mailto:joe@ovn.org]

> >> >> Sent: Monday, July 17, 2017 7:33 PM

> >> >> To: Chandran, Sugesh <sugesh.chandran@intel.com>

> >> >> Cc: Zoltán Balogh <zoltan.balogh@ericsson.com>; ovs dev

> >> >> <dev@openvswitch.org>; Andy Zhou <azhou@ovn.org>

> >> >> Subject: Re: [PATCH v2 4/4] tunneling: Avoid datapath-recirc by

> >> >> combining recirc actions at xlate.

> >> >>

> >> >> On 17 July 2017 at 10:27, Chandran, Sugesh

> >> >> <sugesh.chandran@intel.com>

> >> >> wrote:

> >> >> >> > -----Original Message-----

> >> >> >> > From: Joe Stringer [mailto:joe@ovn.org]

> >> >> >> > Sent: Saturday, July 15, 2017 1:46 AM

> >> >> >> > To: Sugesh Chandran <sugesh.chandran@intel.com>

> >> >> >> > Cc: ovs dev <dev@openvswitch.org>; Andy Zhou

> <azhou@ovn.org>;

> >> >> >> > Zoltán Balogh <zoltan.balogh@ericsson.com>

> >> >> >> > Subject: Re: [PATCH v2 4/4] tunneling: Avoid datapath-recirc

> >> >> >> > by combining

> >> >> >> recirc actions at xlate.

> >> >> >> >

> >> >> >> > On 4 July 2017 at 02:46, Sugesh Chandran

> >> >> >> > <sugesh.chandran@intel.com>

> >> >> >> wrote:

> >> >>

> >> >> <snip>

> >> >>

> >> >> >> > > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index

> >> >> >> > > 4e29085..4d996c1 100644

> >> >> >> > > --- a/lib/dpif-netdev.c

> >> >> >> > > +++ b/lib/dpif-netdev.c

> >> >> >> > > @@ -5028,24 +5028,8 @@ dp_execute_cb(void *aux_, struct

> >> >> >> > > dp_packet_batch *packets_,

> >> >> >> > >

> >> >> >> > >      case OVS_ACTION_ATTR_TUNNEL_PUSH:

> >> >> >> > >          if (*depth < MAX_RECIRC_DEPTH) {

> >> >> >> > > -            struct dp_packet_batch tnl_pkt;

> >> >> >> > > -            struct dp_packet_batch *orig_packets_ = packets_;

> >> >> >> > > -            int err;

> >> >> >> > > -

> >> >> >> > > -            if (!may_steal) {

> >> >> >> > > -                dp_packet_batch_clone(&tnl_pkt, packets_);

> >> >> >> > > -                packets_ = &tnl_pkt;

> >> >> >> > > -                dp_packet_batch_reset_cutlen(orig_packets_);

> >> >> >> > > -            }

> >> >> >> > > -

> >> >> >> > >              dp_packet_batch_apply_cutlen(packets_);

> >> >> >> > > -

> >> >> >> > > -            err = push_tnl_action(pmd, a, packets_);

> >> >> >> > > -            if (!err) {

> >> >> >> > > -                (*depth)++;

> >> >> >> > > -                dp_netdev_recirculate(pmd, packets_);

> >> >> >> > > -                (*depth)--;

> >> >> >> > > -            }

> >> >> >> > > +            push_tnl_action(pmd, a, packets_);

> >> >> >> >

> >> >> >> > If I follow, this was the previous datapath-level code to

> >> >> >> > ensure that when packets are output to a tunnel, only the

> >> >> >> > copy of the packet that goes to the tunnel gets the cutlen

> >> >> >> > applied. With the new version of the code, we replace this by

> >> >> >> > making sure that the ofproto layer generates a clone to wrap

> >> >> >> > the push_tunnel and all subsequent bridge translation, so

> >> >> >> > then it results in the equivalent

> >> >> >> > behaviour: The cutlen is only ever applied to a clone of the

> >> >> >> > packets. Is

> >> >> that right?

> >> >> > [Sugesh] Hmm. That’s the good catch. Looks like the packets are

> >> >> > not cloned when combine_tunnel_action is found impossible in

> >> >> > ofproto

> >> layer.

> >> >> >

> >> >> > As a fix we should keep the code block in 'may_steal' to handle

> >> >> > the non tunnel-combine case.

> >> >>

> >> >> Oh, really? I thought that we always end up with a clone wrapping

> >> >> the tunnel_push now, which means we shouldn't need this bit any

> >> >> more. My comment was intended just to ask if I understood

> >> >> correctly to make sure we don't miss anything.

> >> >>

> >> >> > Any other suggestion to apply cut_len in ofproto?

> >> >>

> >> >> Unfortunately the cut_len is set up through a previous truncate

> >> >> action, and it is supposed to apply to the "next output action"

> >> >> (and reset afterwards), so it is part of the datapath actions

> >> >> interface definition; I don't think there's a way to push it up to ofproto.

> >> > [Sugesh] Or at ofproto layer,  the non-combine tunnel push also

> >> > placed in a clone, similar to the combine case? That will help to

> >> > keep the datapath

> >> as simple as just a push.

> >>

> >> That sounds simpler, but I don't think we can get rid of applying the

> >> cut_len just yet. This is related to some other discussions going on

> >> around the interaction between truncate() and clone(), so it's

> >> probably a bit safer to just leave this code to apply cut len in place for

> now.

> > [Sugesh] I don’t meant to get rid of cut_len. Instead we make sure any

> > tunnel-push always happen on a clone. We keep the push +cut_len as is in

> the patch.

> > So the ofproto actions would become

> > clone(tnl_push), trunc(200) -->Non-combine case. (instead of just

> > [tnl_push, trunc(200)] clone(tnl_push, output(1)) ---> Combine case Do

> > you think it has any other implication that we may missed?

> 

> OK, that sounds about right, except that the truncate action is serialized first -

> see xlate_output_trunc_action(). Ie, from a translation perspective it

> generates trunc(),clone(tnl_push), then from datapath perspective when

> executing, the trunc() sets the cut_len, then the clone(tnl_push) is

> responsible for cutting the packet before pushing the header.

> 

> I'm not aware of any other implications of this.

> 

> >> >> >> > > +static bool

> >> >> >> > > +validate_and_combine_post_tnl_actions(struct xlate_ctx *ctx,

> >> >> >> > > +                                      const struct xport *xport,

> >> >> >> > > +                                      struct xport *out_dev,

> >> >> >> > > +                                      struct

> >> >> >> > > +ovs_action_push_tnl

> >> >> >> > > +tnl_push_data) {

> >> >> >> > > +    const struct dpif_flow_stats *backup_resubmit_stats;

> >> >> >> > > +    struct xlate_cache *backup_xcache;

> >> >> >> > > +    bool nested_act_flag = false;

> >> >> >> > > +    struct flow_wildcards tmp_flow_wc;

> >> >> >> > > +    struct flow_wildcards *backup_flow_wc_ptr;

> >> >> >> > > +    bool backup_side_effects;

> >> >> >> > > +    const struct dp_packet *backup_pkt;

> >> >> >> > > +

> >> >> >> > > +    memset(&tmp_flow_wc, 0 , sizeof tmp_flow_wc);

> >> >> >> > > +    backup_flow_wc_ptr = ctx->wc;

> >> >> >> > > +    ctx->wc = &tmp_flow_wc;

> >> >> >> > > +    ctx->xin->wc = NULL;

> >> >> >> > > +    backup_resubmit_stats = ctx->xin->resubmit_stats;

> >> >> >> > > +    backup_xcache = ctx->xin->xcache;

> >> >> >> > > +    backup_side_effects = ctx->xin->allow_side_effects;

> >> >> >> > > +    backup_pkt = ctx->xin->packet;

> >> >> >> > > +

> >> >> >> > > +    size_t push_action_size = 0;

> >> >> >> > > +    size_t clone_ofs = nl_msg_start_nested(ctx->odp_actions,

> >> >> >> > > +                                           OVS_ACTION_ATTR_CLONE);

> >> >> >> > > +    odp_put_tnl_push_action(ctx->odp_actions,

> &tnl_push_data);

> >> >> >> > > +    push_action_size = ctx->odp_actions->size;

> >> >> >> > > +

> >> >> >> > > +    ctx->xin->resubmit_stats =  NULL;

> >> >> >> > > +    ctx->xin->xcache = xlate_cache_new(); /* Use new

> >> >> >> > > + temporary

> >> >> cache.

> >> >> >> */

> >> >> >> > > +    ctx->xin->allow_side_effects = false;

> >> >> >> > > +    ctx->xin->packet = NULL;

> >> >> >> >

> >> >> >> > I realised that you asked about this after the previous

> >> >> >> > patch, but I didn't understand the implications of it. I

> >> >> >> > believe that this packet is how, for instance, the second

> >> >> >> > bridge can forward packets to the controller. If you reset it

> >> >> >> > to NULL here, then I don't think that the controller action

> >> >> >> > is executed correctly during the second bridge translation. (That's

> a test we could add).

> >> >> > [Sugesh] Yes, that’s a good test case for it.

> >> >> >> >

> >> >> >> > I suspect that some multicast snooping stuff and some other

> >> >> >> > pieces that would be affected by this.

> >> >> > [Sugesh] This is interesting. So does the controller expects a

> >> >> > encaped packets in that case? Do you think, the controller

> >> >> > action also be an exception case like the TRUNC?

> >> >>

> >> >> Yes, I think the controller expects encapped packets in this case.

> >> >> Yes, I think that the easiest way to handle this is to put

> >> >> controller in that exception case with truncate.

> >> > [Sugesh] Ok.  Will add that case as an exception.

> >> >

> >> > Before that, Would like to clarify the controller output action bit

> >> > more How does the following case are currently handled that

> >> > involves controller action (We haven’t run any test case that

> >> > involves a controller and very limited knowledge on that side)

> >> >

> >> > MOD_VLAN(10) --> (PATCH-PORT) --> (SECOND-BR) What if there is no

> >> > rules to handle the packet in 'second-br' and wanted to report to

> >> controller?

> >> >

> >> > Will the packet report with VLAN 10 to controller?

> >>

> >> Currently, this will work because patch port doesn't NULLify this

> >> ctx->xin->packet pointer, then when it translates through the second

> >> bridge and gets to execute_controller_action(), that function will

> >> make a clone, apply the current actions to that packet, then send the

> >> modified packet to the controller.

> > [Sugesh] Ok. So all the pre-actions will be applied before cloning the

> > packets(applying mod_vlan) using the callback.

> 

> Yup, that's right.

> 

> >> So, longer term I think that this NULLification should be removed and

> >> we keep the same packet attached to the context. However at the

> >> moment, from execute_controller_action() ->

> >> xlate_execute_odp_actions() -> odp_execute_actions() there is no

> >> datapath callback provided for implementing the tunnel_push

> >> functionality. Maybe it's as simple as plugging

> >> dpif_execute_helper_cb() into the odp_execute_actions() call there,

> >> but

> > [Sugesh] I think we should offer the callback to do the modification.

> > Will work on when this patch series is accepted.

> >> that sounds like a separate change that should get some thorough

> >> review and testing as well. I don't think it's strictly necessary for

> >> getting this series in, this series can function correctly using the

> >> fallback path as we discussed above. It'd be nice to see this at some

> >> point in future though to consolidate the paths through this code.

> > [Sugesh] Will try to do as a separate patch as you suggested.

> 

> OK, thanks!
diff mbox

Patch

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 4e29085..4d996c1 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -5028,24 +5028,8 @@  dp_execute_cb(void *aux_, struct dp_packet_batch *packets_,
 
     case OVS_ACTION_ATTR_TUNNEL_PUSH:
         if (*depth < MAX_RECIRC_DEPTH) {
-            struct dp_packet_batch tnl_pkt;
-            struct dp_packet_batch *orig_packets_ = packets_;
-            int err;
-
-            if (!may_steal) {
-                dp_packet_batch_clone(&tnl_pkt, packets_);
-                packets_ = &tnl_pkt;
-                dp_packet_batch_reset_cutlen(orig_packets_);
-            }
-
             dp_packet_batch_apply_cutlen(packets_);
-
-            err = push_tnl_action(pmd, a, packets_);
-            if (!err) {
-                (*depth)++;
-                dp_netdev_recirculate(pmd, packets_);
-                (*depth)--;
-            }
+            push_tnl_action(pmd, a, packets_);
             return;
         }
         break;
diff --git a/ofproto/ofproto-dpif-xlate-cache.c b/ofproto/ofproto-dpif-xlate-cache.c
index 9161701..945f169 100644
--- a/ofproto/ofproto-dpif-xlate-cache.c
+++ b/ofproto/ofproto-dpif-xlate-cache.c
@@ -89,7 +89,7 @@  xlate_cache_netdev(struct xc_entry *entry, const struct dpif_flow_stats *stats)
 /* Push stats and perform side effects of flow translation. */
 void
 xlate_push_stats_entry(struct xc_entry *entry,
-                       const struct dpif_flow_stats *stats)
+                       struct dpif_flow_stats *stats)
 {
     struct eth_addr dmac;
 
@@ -160,6 +160,14 @@  xlate_push_stats_entry(struct xc_entry *entry,
             entry->controller.am = NULL; /* One time only. */
         }
         break;
+    case XC_TUNNEL_HEADER:
+        if (entry->tunnel_hdr.operation == ADD) {
+            stats->n_bytes += stats->n_packets * entry->tunnel_hdr.hdr_size;
+        } else {
+            stats->n_bytes -= stats->n_packets * entry->tunnel_hdr.hdr_size;
+        }
+
+        break;
     default:
         OVS_NOT_REACHED();
     }
@@ -167,7 +175,7 @@  xlate_push_stats_entry(struct xc_entry *entry,
 
 void
 xlate_push_stats(struct xlate_cache *xcache,
-                 const struct dpif_flow_stats *stats)
+                 struct dpif_flow_stats *stats)
 {
     if (!stats->n_packets) {
         return;
@@ -245,6 +253,8 @@  xlate_cache_clear_entry(struct xc_entry *entry)
             entry->controller.am = NULL;
         }
         break;
+    case XC_TUNNEL_HEADER:
+        break;
     default:
         OVS_NOT_REACHED();
     }
diff --git a/ofproto/ofproto-dpif-xlate-cache.h b/ofproto/ofproto-dpif-xlate-cache.h
index 13f7cbc..1efc648 100644
--- a/ofproto/ofproto-dpif-xlate-cache.h
+++ b/ofproto/ofproto-dpif-xlate-cache.h
@@ -52,6 +52,7 @@  enum xc_type {
     XC_GROUP,
     XC_TNL_NEIGH,
     XC_CONTROLLER,
+    XC_TUNNEL_HEADER,
 };
 
 /* xlate_cache entries hold enough information to perform the side effects of
@@ -119,6 +120,13 @@  struct xc_entry {
             struct ofproto_dpif *ofproto;
             struct ofproto_async_msg *am;
         } controller;
+        struct {
+            enum {
+                ADD,
+                REMOVE,
+            } operation;
+            uint16_t hdr_size;
+        } tunnel_hdr;
     };
 };
 
@@ -134,8 +142,8 @@  struct xlate_cache {
 void xlate_cache_init(struct xlate_cache *);
 struct xlate_cache *xlate_cache_new(void);
 struct xc_entry *xlate_cache_add_entry(struct xlate_cache *, enum xc_type);
-void xlate_push_stats_entry(struct xc_entry *, const struct dpif_flow_stats *);
-void xlate_push_stats(struct xlate_cache *, const struct dpif_flow_stats *);
+void xlate_push_stats_entry(struct xc_entry *, struct dpif_flow_stats *);
+void xlate_push_stats(struct xlate_cache *, struct dpif_flow_stats *);
 void xlate_cache_clear_entry(struct xc_entry *);
 void xlate_cache_clear(struct xlate_cache *);
 void xlate_cache_uninit(struct xlate_cache *);
diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
index a335d14..808a87b 100644
--- a/ofproto/ofproto-dpif-xlate.c
+++ b/ofproto/ofproto-dpif-xlate.c
@@ -3146,6 +3146,262 @@  tnl_send_arp_request(struct xlate_ctx *ctx, const struct xport *out_dev,
     dp_packet_uninit(&packet);
 }
 
+static void
+propagate_tunnel_data_to_flow__(struct flow *dst_flow,
+                                const struct flow *src_flow,
+                                struct eth_addr dmac, struct eth_addr smac,
+                                struct in6_addr s_ip6, ovs_be32 s_ip,
+                                bool is_tnl_ipv6, uint8_t nw_proto)
+{
+    dst_flow->dl_dst = dmac;
+    dst_flow->dl_src = smac;
+
+    dst_flow->packet_type = htonl(PT_ETH);
+    dst_flow->nw_dst = src_flow->tunnel.ip_dst;
+    dst_flow->nw_src = src_flow->tunnel.ip_src;
+    dst_flow->ipv6_dst = src_flow->tunnel.ipv6_dst;
+    dst_flow->ipv6_src = src_flow->tunnel.ipv6_src;
+
+    dst_flow->nw_tos = src_flow->tunnel.ip_tos;
+    dst_flow->nw_ttl = src_flow->tunnel.ip_ttl;
+    dst_flow->tp_dst = src_flow->tunnel.tp_dst;
+    dst_flow->tp_src = src_flow->tunnel.tp_src;
+
+    if (is_tnl_ipv6) {
+        dst_flow->dl_type = htons(ETH_TYPE_IPV6);
+        if (ipv6_mask_is_any(&dst_flow->ipv6_src)
+            && !ipv6_mask_is_any(&s_ip6)) {
+            dst_flow->ipv6_src = s_ip6;
+        }
+    } else {
+        dst_flow->dl_type = htons(ETH_TYPE_IP);
+        if (dst_flow->nw_src == 0 && s_ip) {
+            dst_flow->nw_src = s_ip;
+        }
+    }
+    dst_flow->nw_proto = nw_proto;
+}
+
+/*
+ * Populate the 'flow' and 'base_flow' L3 fields to do the post tunnel push
+ * translations.
+ */
+static void
+propagate_tunnel_data_to_flow(struct xlate_ctx *ctx, struct eth_addr dmac,
+                              struct eth_addr smac,   struct in6_addr s_ip6,
+                              ovs_be32 s_ip, bool is_tnl_ipv6,
+                              enum ovs_vport_type tnl_type)
+{
+    struct flow *base_flow, *flow;
+    flow = &ctx->xin->flow;
+    base_flow = &ctx->base_flow;
+    uint8_t nw_proto = 0;
+
+    switch (tnl_type) {
+    case OVS_VPORT_TYPE_GRE:
+        nw_proto = IPPROTO_GRE;
+        break;
+    case OVS_VPORT_TYPE_VXLAN:
+    case OVS_VPORT_TYPE_GENEVE:
+        nw_proto = IPPROTO_UDP;
+        break;
+    case OVS_VPORT_TYPE_LISP:
+    case OVS_VPORT_TYPE_STT:
+    case OVS_VPORT_TYPE_UNSPEC:
+    case OVS_VPORT_TYPE_NETDEV:
+    case OVS_VPORT_TYPE_INTERNAL:
+    case __OVS_VPORT_TYPE_MAX:
+    default:
+        OVS_NOT_REACHED();
+        break;
+    }
+    /*
+     * Update base_flow first followed by flow as the dst_flow gets modified
+     * in the function.
+     */
+    propagate_tunnel_data_to_flow__(base_flow, flow, dmac, smac, s_ip6, s_ip,
+                                    is_tnl_ipv6, nw_proto);
+    propagate_tunnel_data_to_flow__(flow, flow, dmac, smac, s_ip6, s_ip,
+                                    is_tnl_ipv6, nw_proto);
+}
+
+/* Validate if the transalated combined actions are OK to proceed.
+ * If actions consist of TRUNC action, it is not allowed to do the
+ * tunnel_push combine as it cannot update stats correctly.
+ */
+static bool
+is_tunnel_actions_clone_ready(struct ofpbuf *actions)
+{
+    struct nlattr *tnl_actions;
+    const struct nlattr *a;
+    unsigned int left;
+    size_t actions_len;
+
+    if (!actions) {
+        /* No actions, no harm in doing combine. */
+        return true;
+    }
+    actions_len = actions->size;
+
+    tnl_actions =(struct nlattr *)(actions->data);
+    NL_ATTR_FOR_EACH_UNSAFE (a, left, tnl_actions, actions_len) {
+        int type = nl_attr_type(a);
+        if (type == OVS_ACTION_ATTR_TRUNC) {
+            VLOG_DBG("Cannot do tunnel action-combine on trunc action");
+            return false;
+            break;
+        }
+    }
+    return true;
+}
+
+/*
+ * Copy the xc entry and update the references accordingly.
+ */
+static void
+copy_xc_entry(struct xc_entry *xc_dst, struct xc_entry *xc_src)
+{
+    memcpy(xc_dst, xc_src, sizeof *xc_dst);
+    switch (xc_src->type) {
+    case XC_RULE:
+        ofproto_rule_ref(&xc_dst->rule->up);
+        break;
+    case XC_BOND:
+        bond_ref(xc_dst->bond.bond);
+        xc_dst->bond.flow = xmemdup(xc_src->bond.flow,
+                            sizeof *xc_src->bond.flow);
+        break;
+    case XC_NETDEV:
+        if (xc_src->dev.tx) {
+            netdev_ref(xc_dst->dev.tx);
+        }
+        if (xc_src->dev.rx) {
+            netdev_ref(xc_dst->dev.rx);
+        }
+        if (xc_src->dev.bfd) {
+            bfd_ref(xc_dst->dev.bfd);
+        }
+        break;
+    case XC_NETFLOW:
+        netflow_ref(xc_dst->nf.netflow);
+        xc_dst->nf.flow = xmemdup(xc_src->nf.flow, sizeof *xc_src->nf.flow);
+        break;
+    case XC_MIRROR:
+        mbridge_ref(xc_dst->mirror.mbridge);
+        break;
+    case XC_LEARN:
+    case XC_TABLE:
+    case XC_NORMAL:
+    case XC_FIN_TIMEOUT:
+    case XC_GROUP:
+    case XC_TNL_NEIGH:
+    case XC_CONTROLLER:
+    case XC_TUNNEL_HEADER:
+    break;
+    default:
+        OVS_NOT_REACHED();
+    }
+}
+
+static bool
+validate_and_combine_post_tnl_actions(struct xlate_ctx *ctx,
+                                      const struct xport *xport,
+                                      struct xport *out_dev,
+                                      struct ovs_action_push_tnl tnl_push_data)
+{
+    const struct dpif_flow_stats *backup_resubmit_stats;
+    struct xlate_cache *backup_xcache;
+    bool nested_act_flag = false;
+    struct flow_wildcards tmp_flow_wc;
+    struct flow_wildcards *backup_flow_wc_ptr;
+    bool backup_side_effects;
+    const struct dp_packet *backup_pkt;
+
+    memset(&tmp_flow_wc, 0 , sizeof tmp_flow_wc);
+    backup_flow_wc_ptr = ctx->wc;
+    ctx->wc = &tmp_flow_wc;
+    ctx->xin->wc = NULL;
+    backup_resubmit_stats = ctx->xin->resubmit_stats;
+    backup_xcache = ctx->xin->xcache;
+    backup_side_effects = ctx->xin->allow_side_effects;
+    backup_pkt = ctx->xin->packet;
+
+    size_t push_action_size = 0;
+    size_t clone_ofs = nl_msg_start_nested(ctx->odp_actions,
+                                           OVS_ACTION_ATTR_CLONE);
+    odp_put_tnl_push_action(ctx->odp_actions, &tnl_push_data);
+    push_action_size = ctx->odp_actions->size;
+
+    ctx->xin->resubmit_stats =  NULL;
+    ctx->xin->xcache = xlate_cache_new(); /* Use new temporary cache. */
+    ctx->xin->allow_side_effects = false;
+    ctx->xin->packet = NULL;
+
+    if (ctx->xin->xcache) {
+        /* Push the cache entry for the tunnel first. */
+        struct xc_entry *entry;
+        entry = xlate_cache_add_entry(ctx->xin->xcache, XC_TUNNEL_HEADER);
+        entry->tunnel_hdr.hdr_size = tnl_push_data.header_len;
+        entry->tunnel_hdr.operation = ADD;
+    }
+
+    apply_nested_clone_actions(ctx, xport, out_dev);
+    nested_act_flag = is_tunnel_actions_clone_ready(ctx->odp_actions);
+
+    if (nested_act_flag) {
+        struct dpif_flow_stats tmp_resubmit_stats;
+        struct xc_entry *entry;
+        struct ofpbuf entries;
+
+        memset(&tmp_resubmit_stats, 0, sizeof tmp_resubmit_stats);
+         /* Similar to the stats update in revalidation, the x_cache entries
+          * are populated by the previous translation are used to update the
+          * stats correctly.
+         .*/
+        ovs_mutex_lock(&ctx->xin->resubmit_stats_mutex);
+        if (backup_resubmit_stats) {
+            memcpy(&tmp_resubmit_stats, backup_resubmit_stats,
+                   sizeof tmp_resubmit_stats);
+            xlate_push_stats(ctx->xin->xcache, &tmp_resubmit_stats);
+        }
+        entries = ctx->xin->xcache->entries;
+        XC_ENTRY_FOR_EACH (entry, &entries) {
+            struct xc_entry *xc_copy_entry;
+            if (!backup_xcache) {
+                break;
+            }
+            xc_copy_entry = xlate_cache_add_entry(backup_xcache, entry->type);
+            copy_xc_entry(xc_copy_entry, entry);
+        }
+        ovs_mutex_unlock(&ctx->xin->resubmit_stats_mutex);
+    }
+    else {
+        /* Combine is not valid. */
+        nl_msg_cancel_nested(ctx->odp_actions, clone_ofs);
+        goto out;
+    }
+    if (ctx->odp_actions->size > push_action_size) {
+        /* Update the CLONE action only when combined. */
+        nl_msg_end_nested(ctx->odp_actions, clone_ofs);
+    }
+    else {
+        /* No actions after the tunnel, no need of clone. */
+        nl_msg_cancel_nested(ctx->odp_actions, clone_ofs);
+        odp_put_tnl_push_action(ctx->odp_actions, &tnl_push_data);
+    }
+
+out:
+    /* Restore context status. */
+    ctx->xin->resubmit_stats = backup_resubmit_stats;
+    xlate_cache_delete(ctx->xin->xcache);
+    ctx->xin->xcache = backup_xcache;
+    ctx->xin->allow_side_effects = backup_side_effects;
+    ctx->xin->packet = backup_pkt;
+    ctx->wc = backup_flow_wc_ptr;
+    return nested_act_flag;
+
+}
+
 static int
 build_tunnel_send(struct xlate_ctx *ctx, const struct xport *xport,
                   const struct flow *flow, odp_port_t tunnel_odp_port)
@@ -3162,6 +3418,14 @@  build_tunnel_send(struct xlate_ctx *ctx, const struct xport *xport,
     char buf_sip6[INET6_ADDRSTRLEN];
     char buf_dip6[INET6_ADDRSTRLEN];
 
+    /* Structures to backup Ethernet and IP of base_flow. */
+    struct flow old_base_flow;
+    struct flow old_flow;
+
+    /* Backup flow & base_flow data. */
+    memcpy(&old_base_flow, &ctx->base_flow, sizeof old_base_flow);
+    memcpy(&old_flow, &ctx->xin->flow, sizeof old_flow);
+
     err = tnl_route_lookup_flow(flow, &d_ip6, &s_ip6, &out_dev);
     if (err) {
         xlate_report(ctx, OFT_WARN, "native tunnel routing failed");
@@ -3221,12 +3485,31 @@  build_tunnel_send(struct xlate_ctx *ctx, const struct xport *xport,
     tnl_push_data.tnl_port = tunnel_odp_port;
     tnl_push_data.out_port = out_dev->odp_port;
 
-    /* After tunnel header has been added, packet_type of flow and base_flow
-     * need to be set to PT_ETH. */
-    ctx->xin->flow.packet_type = htonl(PT_ETH);
-    ctx->base_flow.packet_type = htonl(PT_ETH);
+    /* After tunnel header has been added, MAC and IP data of flow and
+     * base_flow need to be set properly, since there is not recirculation
+     * any more when sending packet to tunnel. */
 
-    odp_put_tnl_push_action(ctx->odp_actions, &tnl_push_data);
+    propagate_tunnel_data_to_flow(ctx, dmac, smac, s_ip6, s_ip,
+                                  tnl_params.is_ipv6, tnl_push_data.tnl_type);
+
+
+    /* Try to see if its possible to apply nested clone actions on tunnel.
+     * Revert the combined actions on tunnel if its not valid.
+     */
+    if (!validate_and_combine_post_tnl_actions(ctx, xport, out_dev,
+                                      tnl_push_data)) {
+        /* Datapath is not doing the recirculation now, so lets make it
+         * happen explicitly.
+         */
+        size_t clone_ofs = nl_msg_start_nested(ctx->odp_actions,
+                                        OVS_ACTION_ATTR_CLONE);
+        odp_put_tnl_push_action(ctx->odp_actions, &tnl_push_data);
+        nl_msg_put_u32(ctx->odp_actions, OVS_ACTION_ATTR_RECIRC, 0);
+        nl_msg_end_nested(ctx->odp_actions, clone_ofs);
+    }
+    /* Restore the flows after the translation. */
+    memcpy(&ctx->xin->flow, &old_flow, sizeof ctx->xin->flow);
+    memcpy(&ctx->base_flow, &old_base_flow, sizeof ctx->base_flow);
     return 0;
 }
 
@@ -5978,6 +6261,8 @@  xlate_in_init(struct xlate_in *xin, struct ofproto_dpif *ofproto,
     xin->odp_actions = odp_actions;
     xin->in_packet_out = false;
 
+    ovs_mutex_init_adaptive(&xin->resubmit_stats_mutex);
+
     /* Do recirc lookup. */
     xin->frozen_state = NULL;
     if (flow->recirc_id) {
diff --git a/ofproto/ofproto-dpif-xlate.h b/ofproto/ofproto-dpif-xlate.h
index 68e114a..8e48665 100644
--- a/ofproto/ofproto-dpif-xlate.h
+++ b/ofproto/ofproto-dpif-xlate.h
@@ -112,6 +112,7 @@  struct xlate_in {
      * This is normally null so the client has to set it manually after
      * calling xlate_in_init(). */
     const struct dpif_flow_stats *resubmit_stats;
+    struct ovs_mutex resubmit_stats_mutex;
 
     /* Counters carried over from a pre-existing translation of a related flow.
      * This can occur due to, e.g., the translation of an ARP packet that was
diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
index d19d486..edf1eef 100644
--- a/ofproto/ofproto-dpif.c
+++ b/ofproto/ofproto-dpif.c
@@ -4591,7 +4591,7 @@  packet_xlate_revert(struct ofproto *ofproto OVS_UNUSED,
 static void
 ofproto_dpif_xcache_execute(struct ofproto_dpif *ofproto,
                             struct xlate_cache *xcache,
-                            const struct dpif_flow_stats *stats)
+                            struct dpif_flow_stats *stats)
     OVS_REQUIRES(ofproto_mutex)
 {
     struct xc_entry *entry;
@@ -4624,6 +4624,7 @@  ofproto_dpif_xcache_execute(struct ofproto_dpif *ofproto,
         case XC_GROUP:
         case XC_TNL_NEIGH:
         case XC_CONTROLLER:
+        case XC_TUNNEL_HEADER:
             xlate_push_stats_entry(entry, stats);
             break;
         default:
diff --git a/tests/packet-type-aware.at b/tests/packet-type-aware.at
index 1104078..7eda747 100644
--- a/tests/packet-type-aware.at
+++ b/tests/packet-type-aware.at
@@ -326,8 +326,7 @@  ovs-appctl time/warp 1000
 AT_CHECK([
     ovs-appctl dpctl/dump-flows --names dummy@ovs-dummy | strip_used | grep -v ipv6 | sort
 ], [0], [flow-dump from non-dpdk interfaces:
-recirc_id(0),in_port(br-p1),packet_type(ns=0,id=0),eth(dst=aa:55:00:00:00:03),eth_type(0x0800),ipv4(src=10.0.0.1,dst=10.0.0.3,proto=47,frag=no), packets:1, bytes:136, used:0.0s, actions:set(ipv4(src=30.0.0.1,dst=30.0.0.3)),tnl_pop(gre_sys)
-recirc_id(0),in_port(n1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(dst=192.168.10.30,tos=0/0x3,frag=no), packets:1, bytes:98, used:0.0s, actions:tnl_push(tnl_port(gre_sys),header(size=38,type=3,eth(dst=aa:55:00:00:00:03,src=aa:55:00:00:00:01,dl_type=0x0800),ipv4(src=10.0.0.1,dst=10.0.0.3,proto=47,tos=0,ttl=64,frag=0x4000),gre((flags=0x0,proto=0x6558))),out_port(br-p1))
+recirc_id(0),in_port(n1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(dst=192.168.10.30,tos=0/0x3,frag=no), packets:1, bytes:98, used:0.0s, actions:clone(tnl_push(tnl_port(gre_sys),header(size=38,type=3,eth(dst=aa:55:00:00:00:03,src=aa:55:00:00:00:01,dl_type=0x0800),ipv4(src=10.0.0.1,dst=10.0.0.3,proto=47,tos=0,ttl=64,frag=0x4000),gre((flags=0x0,proto=0x6558))),out_port(br-p1)),set(ipv4(src=30.0.0.1,dst=30.0.0.3)),tnl_pop(gre_sys))
 tunnel(src=30.0.0.1,dst=30.0.0.3,flags(-df-csum)),recirc_id(0),in_port(gre_sys),packet_type(ns=0,id=0),eth(dst=1e:2c:e9:2a:66:9e),eth_type(0x0800),ipv4(dst=192.168.10.30,frag=no), packets:1, bytes:98, used:0.0s, actions:set(eth(dst=aa:55:aa:55:00:03)),n3
 ])
 
@@ -345,8 +344,7 @@  ovs-appctl time/warp 1000
 AT_CHECK([
     ovs-appctl dpctl/dump-flows --names dummy@ovs-dummy | strip_used | grep -v ipv6 | sort
 ], [0], [flow-dump from non-dpdk interfaces:
-recirc_id(0),in_port(br-p1),packet_type(ns=0,id=0),eth(dst=aa:55:00:00:00:02),eth_type(0x0800),ipv4(src=10.0.0.1,dst=10.0.0.2,proto=47,frag=no), packets:1, bytes:136, used:0.0s, actions:set(ipv4(src=20.0.0.1,dst=20.0.0.2)),tnl_pop(gre_sys)
-recirc_id(0),in_port(n1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(dst=192.168.10.20,tos=0/0x3,frag=no), packets:1, bytes:98, used:0.0s, actions:tnl_push(tnl_port(gre_sys),header(size=38,type=3,eth(dst=aa:55:00:00:00:02,src=aa:55:00:00:00:01,dl_type=0x0800),ipv4(src=10.0.0.1,dst=10.0.0.2,proto=47,tos=0,ttl=64,frag=0x4000),gre((flags=0x0,proto=0x6558))),out_port(br-p1))
+recirc_id(0),in_port(n1),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(dst=192.168.10.20,tos=0/0x3,frag=no), packets:1, bytes:98, used:0.0s, actions:clone(tnl_push(tnl_port(gre_sys),header(size=38,type=3,eth(dst=aa:55:00:00:00:02,src=aa:55:00:00:00:01,dl_type=0x0800),ipv4(src=10.0.0.1,dst=10.0.0.2,proto=47,tos=0,ttl=64,frag=0x4000),gre((flags=0x0,proto=0x6558))),out_port(br-p1)),set(ipv4(src=20.0.0.1,dst=20.0.0.2)),tnl_pop(gre_sys))
 tunnel(src=20.0.0.1,dst=20.0.0.2,flags(-df-csum)),recirc_id(0),in_port(gre_sys),packet_type(ns=0,id=0),eth(dst=46:1e:7d:1a:95:a1),eth_type(0x0800),ipv4(dst=192.168.10.20,frag=no), packets:1, bytes:98, used:0.0s, actions:set(eth(dst=aa:55:aa:55:00:02)),n2
 ])
 
@@ -364,8 +362,7 @@  ovs-appctl time/warp 1000
 AT_CHECK([
     ovs-appctl dpctl/dump-flows --names dummy@ovs-dummy | strip_used | grep -v ipv6 | sort
 ], [0], [flow-dump from non-dpdk interfaces:
-recirc_id(0),in_port(br-p2),packet_type(ns=0,id=0),eth(dst=aa:55:00:00:00:01),eth_type(0x0800),ipv4(src=20.0.0.2,dst=20.0.0.1,proto=47,frag=no), packets:1, bytes:136, used:0.0s, actions:set(ipv4(src=10.0.0.2,dst=10.0.0.1)),tnl_pop(gre_sys)
-recirc_id(0),in_port(n2),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(dst=192.168.10.10,tos=0/0x3,frag=no), packets:1, bytes:98, used:0.0s, actions:tnl_push(tnl_port(gre_sys),header(size=38,type=3,eth(dst=aa:55:00:00:00:01,src=aa:55:00:00:00:02,dl_type=0x0800),ipv4(src=20.0.0.2,dst=20.0.0.1,proto=47,tos=0,ttl=64,frag=0x4000),gre((flags=0x0,proto=0x6558))),out_port(br-p2))
+recirc_id(0),in_port(n2),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(dst=192.168.10.10,tos=0/0x3,frag=no), packets:1, bytes:98, used:0.0s, actions:clone(tnl_push(tnl_port(gre_sys),header(size=38,type=3,eth(dst=aa:55:00:00:00:01,src=aa:55:00:00:00:02,dl_type=0x0800),ipv4(src=20.0.0.2,dst=20.0.0.1,proto=47,tos=0,ttl=64,frag=0x4000),gre((flags=0x0,proto=0x6558))),out_port(br-p2)),set(ipv4(src=10.0.0.2,dst=10.0.0.1)),tnl_pop(gre_sys))
 tunnel(src=10.0.0.2,dst=10.0.0.1,flags(-df-csum)),recirc_id(0),in_port(gre_sys),packet_type(ns=0,id=0),eth(dst=3a:6d:d2:09:9c:ab),eth_type(0x0800),ipv4(dst=192.168.10.10,frag=no), packets:1, bytes:98, used:0.0s, actions:set(eth(dst=aa:55:aa:55:00:01)),n1
 ])
 
@@ -383,10 +380,8 @@  ovs-appctl time/warp 1000
 AT_CHECK([
     ovs-appctl dpctl/dump-flows --names dummy@ovs-dummy | strip_used | grep -v ipv6 | sort
 ], [0], [flow-dump from non-dpdk interfaces:
-recirc_id(0),in_port(br-p1),packet_type(ns=0,id=0),eth(dst=aa:55:00:00:00:03),eth_type(0x0800),ipv4(src=10.0.0.1,dst=10.0.0.3,proto=47,frag=no), packets:1, bytes:136, used:0.0s, actions:set(ipv4(src=30.0.0.1,dst=30.0.0.3)),tnl_pop(gre_sys)
-recirc_id(0),in_port(br-p2),packet_type(ns=0,id=0),eth(dst=aa:55:00:00:00:01),eth_type(0x0800),ipv4(src=20.0.0.2,dst=20.0.0.1,proto=47,frag=no), packets:1, bytes:136, used:0.0s, actions:set(ipv4(src=10.0.0.2,dst=10.0.0.1)),tnl_pop(gre_sys)
-recirc_id(0),in_port(n2),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(dst=192.168.10.30,tos=0/0x3,frag=no), packets:1, bytes:98, used:0.0s, actions:tnl_push(tnl_port(gre_sys),header(size=38,type=3,eth(dst=aa:55:00:00:00:01,src=aa:55:00:00:00:02,dl_type=0x0800),ipv4(src=20.0.0.2,dst=20.0.0.1,proto=47,tos=0,ttl=64,frag=0x4000),gre((flags=0x0,proto=0x6558))),out_port(br-p2))
-tunnel(src=10.0.0.2,dst=10.0.0.1,flags(-df-csum)),recirc_id(0),in_port(gre_sys),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(dst=192.168.10.30,tos=0/0x3,frag=no), packets:1, bytes:98, used:0.0s, actions:tnl_push(tnl_port(gre_sys),header(size=38,type=3,eth(dst=aa:55:00:00:00:03,src=aa:55:00:00:00:01,dl_type=0x0800),ipv4(src=10.0.0.1,dst=10.0.0.3,proto=47,tos=0,ttl=64,frag=0x4000),gre((flags=0x0,proto=0x6558))),out_port(br-p1))
+recirc_id(0),in_port(n2),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(dst=192.168.10.30,tos=0/0x3,frag=no), packets:1, bytes:98, used:0.0s, actions:clone(tnl_push(tnl_port(gre_sys),header(size=38,type=3,eth(dst=aa:55:00:00:00:01,src=aa:55:00:00:00:02,dl_type=0x0800),ipv4(src=20.0.0.2,dst=20.0.0.1,proto=47,tos=0,ttl=64,frag=0x4000),gre((flags=0x0,proto=0x6558))),out_port(br-p2)),set(ipv4(src=10.0.0.2,dst=10.0.0.1)),tnl_pop(gre_sys))
+tunnel(src=10.0.0.2,dst=10.0.0.1,flags(-df-csum)),recirc_id(0),in_port(gre_sys),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(dst=192.168.10.30,tos=0/0x3,frag=no), packets:1, bytes:98, used:0.0s, actions:clone(tnl_push(tnl_port(gre_sys),header(size=38,type=3,eth(dst=aa:55:00:00:00:03,src=aa:55:00:00:00:01,dl_type=0x0800),ipv4(src=10.0.0.1,dst=10.0.0.3,proto=47,tos=0,ttl=64,frag=0x4000),gre((flags=0x0,proto=0x6558))),out_port(br-p1)),set(ipv4(src=30.0.0.1,dst=30.0.0.3)),tnl_pop(gre_sys))
 tunnel(src=30.0.0.1,dst=30.0.0.3,flags(-df-csum)),recirc_id(0),in_port(gre_sys),packet_type(ns=0,id=0),eth(dst=1e:2c:e9:2a:66:9e),eth_type(0x0800),ipv4(dst=192.168.10.30,frag=no), packets:1, bytes:98, used:0.0s, actions:set(eth(dst=aa:55:aa:55:00:03)),n3
 ])
 
@@ -404,11 +399,9 @@  ovs-appctl time/warp 1000
 AT_CHECK([
     ovs-appctl dpctl/dump-flows --names dummy@ovs-dummy | strip_used | grep -v ipv6 | sort
 ], [0], [flow-dump from non-dpdk interfaces:
-recirc_id(0),in_port(br-p2),packet_type(ns=0,id=0),eth(dst=aa:55:00:00:00:01),eth_type(0x0800),ipv4(src=20.0.0.2,dst=20.0.0.1,proto=47,frag=no), packets:1, bytes:122, used:0.0s, actions:set(ipv4(src=10.0.0.2,dst=10.0.0.1)),tnl_pop(gre_sys)
-recirc_id(0),in_port(br-p3),packet_type(ns=0,id=0),eth(dst=aa:55:00:00:00:02),eth_type(0x0800),ipv4(src=30.0.0.3,dst=30.0.0.2,proto=47,frag=no), packets:1, bytes:122, used:0.0s, actions:set(ipv4(src=20.0.0.3,dst=20.0.0.2)),tnl_pop(gre_sys)
-recirc_id(0),in_port(n3),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(dst=192.168.10.10,tos=0/0x3,frag=no), packets:1, bytes:98, used:0.0s, actions:pop_eth,tnl_push(tnl_port(gre_sys),header(size=38,type=3,eth(dst=aa:55:00:00:00:02,src=aa:55:00:00:00:03,dl_type=0x0800),ipv4(src=30.0.0.3,dst=30.0.0.2,proto=47,tos=0,ttl=64,frag=0x4000),gre((flags=0x0,proto=0x800))),out_port(br-p3))
+recirc_id(0),in_port(n3),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(dst=192.168.10.10,tos=0/0x3,frag=no), packets:1, bytes:98, used:0.0s, actions:pop_eth,clone(tnl_push(tnl_port(gre_sys),header(size=38,type=3,eth(dst=aa:55:00:00:00:02,src=aa:55:00:00:00:03,dl_type=0x0800),ipv4(src=30.0.0.3,dst=30.0.0.2,proto=47,tos=0,ttl=64,frag=0x4000),gre((flags=0x0,proto=0x800))),out_port(br-p3)),set(ipv4(src=20.0.0.3,dst=20.0.0.2)),tnl_pop(gre_sys))
 tunnel(src=10.0.0.2,dst=10.0.0.1,flags(-df-csum)),recirc_id(0),in_port(gre_sys),packet_type(ns=1,id=0x800),ipv4(dst=192.168.10.10,frag=no), packets:1, bytes:84, used:0.0s, actions:push_eth(src=00:00:00:00:00:00,dst=aa:55:aa:55:00:01),n1
-tunnel(src=20.0.0.3,dst=20.0.0.2,flags(-df-csum)),recirc_id(0),in_port(gre_sys),packet_type(ns=1,id=0x800),ipv4(dst=192.168.10.10,tos=0/0x3,frag=no), packets:1, bytes:84, used:0.0s, actions:tnl_push(tnl_port(gre_sys),header(size=38,type=3,eth(dst=aa:55:00:00:00:01,src=aa:55:00:00:00:02,dl_type=0x0800),ipv4(src=20.0.0.2,dst=20.0.0.1,proto=47,tos=0,ttl=64,frag=0x4000),gre((flags=0x0,proto=0x800))),out_port(br-p2))
+tunnel(src=20.0.0.3,dst=20.0.0.2,flags(-df-csum)),recirc_id(0),in_port(gre_sys),packet_type(ns=1,id=0x800),ipv4(dst=192.168.10.10,tos=0/0x3,frag=no), packets:1, bytes:84, used:0.0s, actions:clone(tnl_push(tnl_port(gre_sys),header(size=38,type=3,eth(dst=aa:55:00:00:00:01,src=aa:55:00:00:00:02,dl_type=0x0800),ipv4(src=20.0.0.2,dst=20.0.0.1,proto=47,tos=0,ttl=64,frag=0x4000),gre((flags=0x0,proto=0x800))),out_port(br-p2)),set(ipv4(src=10.0.0.2,dst=10.0.0.1)),tnl_pop(gre_sys))
 ])
 
 # Clear up megaflow cache
@@ -425,8 +418,7 @@  ovs-appctl time/warp 1000
 AT_CHECK([
     ovs-appctl dpctl/dump-flows --names dummy@ovs-dummy | strip_used | grep -v ipv6 | sort
 ], [0], [flow-dump from non-dpdk interfaces:
-recirc_id(0),in_port(br-p3),packet_type(ns=0,id=0),eth(dst=aa:55:00:00:00:02),eth_type(0x0800),ipv4(src=30.0.0.3,dst=30.0.0.2,proto=47,frag=no), packets:1, bytes:136, used:0.0s, actions:set(ipv4(src=20.0.0.3,dst=20.0.0.2)),tnl_pop(gre_sys)
-recirc_id(0),in_port(n3),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(dst=192.168.10.20,tos=0/0x3,frag=no), packets:1, bytes:98, used:0.0s, actions:tnl_push(tnl_port(gre_sys),header(size=38,type=3,eth(dst=aa:55:00:00:00:02,src=aa:55:00:00:00:03,dl_type=0x0800),ipv4(src=30.0.0.3,dst=30.0.0.2,proto=47,tos=0,ttl=64,frag=0x4000),gre((flags=0x0,proto=0x6558))),out_port(br-p3))
+recirc_id(0),in_port(n3),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(dst=192.168.10.20,tos=0/0x3,frag=no), packets:1, bytes:98, used:0.0s, actions:clone(tnl_push(tnl_port(gre_sys),header(size=38,type=3,eth(dst=aa:55:00:00:00:02,src=aa:55:00:00:00:03,dl_type=0x0800),ipv4(src=30.0.0.3,dst=30.0.0.2,proto=47,tos=0,ttl=64,frag=0x4000),gre((flags=0x0,proto=0x6558))),out_port(br-p3)),set(ipv4(src=20.0.0.3,dst=20.0.0.2)),tnl_pop(gre_sys))
 tunnel(src=20.0.0.3,dst=20.0.0.2,flags(-df-csum)),recirc_id(0),in_port(gre_sys),packet_type(ns=0,id=0),eth(dst=46:1e:7d:1a:95:a1),eth_type(0x0800),ipv4(dst=192.168.10.20,frag=no), packets:1, bytes:98, used:0.0s, actions:set(eth(dst=aa:55:aa:55:00:02)),n2
 ])