[ovs-dev,v3] Avoid tunneling for VLAN packets redirected to a gateway chassis

Message ID 20180525113347.6259-1-vkommadi@redhat.com
State Changes Requested

Commit Message

Venkata Anil May 25, 2018, 11:33 a.m.
From: venkata anil <vkommadi@redhat.com>

When a vm on a vlan tenant network sends traffic to an external network,
it is tunneled from the host chassis to the gateway chassis. In the
earlier discussion [1], Russell (also in his doc [2]) suggested finding
a way for OVN to do this redirect to the gateway host over a VLAN
network. This patch implements his suggestion, i.e. it redirects to the
gateway chassis using the incoming tenant vlan network. Gateway chassis
are expected to be configured with tenant vlan networks. In this
approach, new logical and physical flows are introduced for packet
processing in both host and gateway chassis.

Packet processing in the host chassis:
1) A new ovs flow is added in physical table 65, which sets the
   MLF_RCV_FROM_VLAN flag for packets from a vlan network entering the
   router pipeline.
2) A new flow is added in lr_in_ip_routing which, for packets output
   through the distributed gateway port and matching the
   MLF_RCV_FROM_VLAN flag, sets REGBIT_NAT_REDIRECT, i.e.
   table=7 (lr_in_ip_routing   ), priority=2    , match=(
   ip4.dst == 0.0.0.0/0 && flags.rcv_from_vlan == 1 &&
   !is_chassis_resident("cr-alice")), action=(reg9[0] = 1; next;)
   This flow will be set only on chassis not hosting the chassisredirect
   port, i.e. compute nodes.
   When REGBIT_NAT_REDIRECT is set,
   a) lr_in_arp_resolve will set the packet's eth.dst to the distributed
      gateway port MAC
   b) lr_in_gw_redirect will set the chassisredirect port as outport
3) A new ovs flow is added in physical table 32, which uses the source
   vlan tenant network tag as the vlan ID for sending the packet to the
   gateway chassis. As the destination MAC of this vlan packet is the
   distributed gateway port MAC, the packet will only reach the gateway
   chassis.
   table=32,priority=150,reg14=0x3,reg15=0x6,metadata=0x4
   actions=mod_vlan_vid:2010,output:25,strip_vlan
   This flow will be set only on chassis not hosting the chassisredirect
   port, i.e. compute nodes.
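
The host-chassis steps above can be sketched in C roughly as follows.
This is only an illustration: the bit position comes from the patch
(MLF_RCV_FROM_VLAN_BIT = 5), while struct pkt_meta and the helper names
are hypothetical and are not OVN code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit position as defined by the patch (MLF_RCV_FROM_VLAN_BIT = 5). */
#define RCV_FROM_VLAN_BIT 5
#define RCV_FROM_VLAN (UINT32_C(1) << RCV_FROM_VLAN_BIT) /* 0x20 */

/* Hypothetical per-packet state, standing in for the logical flags
 * register and REGBIT_NAT_REDIRECT (reg9[0]).  Not an OVN structure. */
struct pkt_meta {
    uint32_t flags;
    bool nat_redirect;
};

/* Step 1: the packet leaves a vlan switch and enters the router
 * pipeline, so the "received from vlan" flag is set. */
static void
mark_rcv_from_vlan(struct pkt_meta *m)
{
    assert(m != NULL);
    m->flags |= RCV_FROM_VLAN;
}

/* Step 2: request the redirect only when the packet came from a vlan
 * network, leaves through the distributed gateway port, and this
 * chassis does not host the chassisredirect port. */
static bool
maybe_set_nat_redirect(struct pkt_meta *m, bool out_is_dgw_port,
                       bool cr_port_is_local)
{
    assert(m != NULL);
    if ((m->flags & RCV_FROM_VLAN) && out_is_dgw_port && !cr_port_is_local) {
        m->nat_redirect = true;
    }
    return m->nat_redirect;
}
```

With bit 5 set, the flags value is 0x20, which is the value a masked
match such as reg10=0x20/0x20 tests for.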

Packet processing in the gateway chassis:
1) A new ovs flow is added in physical table 0 to pass vlan traffic
   coming from the localnet port to the connected router pipeline (i.e.
   the router attached to the vlan tenant network).
   This flow sets the router metadata, sets reg14 to the router's patch
   port (lrp) (i.e. the patch port connecting the router and the vlan
   tenant network) and sets the new MLF_RCV_FROM_VLAN flag.
   table=0,priority=150,in_port=67,dl_vlan=2010 actions=strip_vlan,
   load:0x4->OXM_OF_METADATA[],load:0x3->NXM_NX_REG14[],
   load:0x1->NXM_NX_REG10[5],resubmit(,8)
   This flow will be set only on chassis hosting the chassisredirect
   port, i.e. the gateway node.
2) A new flow is added in lr_in_admission which checks MLF_RCV_FROM_VLAN
   and allows the packet. This flow will be set only on chassis hosting
   the chassisredirect port, i.e. the gateway node.
   table=0 (lr_in_admission    ), priority=100  , match=(
   flags.rcv_from_vlan == 1 && inport == "lrp-44383893-613a-4bfe-b483-
   e7d0dc3055cd" && is_chassis_resident("cr-lrp-a6e3d2ab-313a-4ea3-
   8ec4-c3c774a11f49")), action=(next;)
   The packet will then pass through the router ingress and egress
   pipelines and then to the external switch pipeline.
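
As a rough illustration of the gateway-side table 0 flow above, the
following C sketch models its match and register loads. The structures
and function name are hypothetical (not OVN or OVS code), and the
strip_vlan and resubmit actions are not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VLAN_VID_MASK 0x0fff              /* low 12 bits of the 802.1Q TCI */
#define RCV_FROM_VLAN (UINT32_C(1) << 5)  /* MLF_RCV_FROM_VLAN_BIT = 5 */

/* Hypothetical model of the table 0 flow; field names are illustrative. */
struct gw_vlan_rule {
    uint16_t in_port;       /* ingress OpenFlow port to match */
    uint16_t vlan_id;       /* tenant network vlan tag to match */
    uint64_t rtr_metadata;  /* value loaded into metadata (router datapath) */
    uint32_t lrp_key;       /* value loaded into reg14 (router patch port) */
};

struct pipeline_regs {
    uint64_t metadata;
    uint32_t reg14;         /* logical input port */
    uint32_t reg10;         /* logical flags */
};

/* Returns true and fills in 'regs' when the packet matches the rule;
 * leaves 'regs' untouched otherwise. */
static bool
gw_vlan_rule_apply(const struct gw_vlan_rule *rule, uint16_t in_port,
                   uint16_t tci, struct pipeline_regs *regs)
{
    assert(rule != NULL && regs != NULL);
    if (in_port != rule->in_port
        || (tci & VLAN_VID_MASK) != rule->vlan_id) {
        return false;
    }
    regs->metadata = rule->rtr_metadata;
    regs->reg14 = rule->lrp_key;
    regs->reg10 |= RCV_FROM_VLAN;
    return true;
}
```

Using the values from the example flow (in_port=67, dl_vlan=2010,
metadata 0x4, reg14 0x3), a matching packet ends up with metadata 0x4,
reg14 0x3 and bit 5 of reg10 set.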

Consider traffic between two vms in the same tenant vlan network on
different chassis, e.g. "vm1" on tenant vlan network "net1" is on host
chassis "ch1" and "vm2" on the same tenant vlan network "net1" is on
gateway chassis "gw1". When the packet arrives on the "gw1" chassis from
the localnet port, we still send it to the router pipeline, and the
router pipeline sends it to the destination switch ("net1") pipeline.
In this case, when the packet arrives at "vm2", it has the router MAC as
its source MAC because the packet is routed in the gateway chassis. This
behaviour is seen only for destination vms hosted on the gateway node.

[1] https://mail.openvswitch.org/pipermail/ovs-discuss/2018-April/046557.html
[2] Point 3 in section 3.3.1 - Future Enhancements
https://docs.google.com/document/d/1JecGIXPH0RAqfGvD0nmtBdEU1zflHACp8WSRnKCFSgg/edit#

Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2018-April/046543.html

Signed-off-by: Venkata Anil <vkommadi@redhat.com>
---
 ovn/controller/bfd.c            |   3 +-
 ovn/controller/binding.c        |  10 +-
 ovn/controller/ovn-controller.c |   3 +
 ovn/controller/ovn-controller.h |  16 ++-
 ovn/controller/physical.c       |  94 ++++++++++++++++-
 ovn/lib/logical-fields.c        |   4 +
 ovn/lib/logical-fields.h        |   2 +
 ovn/northd/ovn-northd.c         |  35 +++++++
 tests/ovn.at                    | 227 ++++++++++++++++++++++++++++++++++++++++
 9 files changed, 390 insertions(+), 4 deletions(-)

Comments

Russell Bryant May 30, 2018, 7:59 p.m. | #1
On Fri, May 25, 2018 at 7:33 AM,  <vkommadi@redhat.com> wrote:
> From: venkata anil <vkommadi@redhat.com>
>
> When a vm on a vlan tenant network sends traffic to an external network,
> it is tunneled from host chassis to gateway chassis. In the earlier
> discussion [1], Russell (also in his doc [2]) suggested if we can figure
> out a way for OVN to do this redirect to the gateway host over a VLAN
> network. This patch implements his suggestion i.e will redirect to
> gateway chassis using incoming tenant vlan network. Gateway chassis are
> expected to be configured with tenant vlan networks. In this approach,
> new logical and physical flows introduced for packet processing in both
> host and gateway chassis.

I don't think we can impose the expectation that the gateway is on the
same vlan network as the original compute node.  The previous behavior
of using the tunnel does not require that.

Have you thought of whether we could use the new behavior
automatically if we know both chassis are on the same network, or fall
back to a tunnel if necessary?
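
One possible reading of this suggestion, sketched in C under stated
assumptions: each chassis is described by the physical network names it
is attached to (in the spirit of ovn-bridge-mappings), and the VLAN
redirect is chosen only when both chassis list the tenant network. The
enum, function names and the way the network lists are obtained are all
hypothetical; nothing here is existing OVN code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum redirect_path {
    REDIRECT_TUNNEL,   /* previous behaviour, always available */
    REDIRECT_VLAN      /* new behaviour, needs a shared vlan network */
};

/* True if 'name' appears in the list of physical network names. */
static bool
has_network(const char *const *nets, size_t n_nets, const char *name)
{
    for (size_t i = 0; i < n_nets; i++) {
        if (!strcmp(nets[i], name)) {
            return true;
        }
    }
    return false;
}

/* Use the vlan path only if both the host chassis and the gateway
 * chassis are attached to the tenant's physical network; otherwise
 * fall back to the tunnel. */
static enum redirect_path
choose_redirect_path(const char *const *host_nets, size_t n_host,
                     const char *const *gw_nets, size_t n_gw,
                     const char *tenant_net)
{
    assert(tenant_net != NULL);
    if (has_network(host_nets, n_host, tenant_net)
        && has_network(gw_nets, n_gw, tenant_net)) {
        return REDIRECT_VLAN;
    }
    return REDIRECT_TUNNEL;
}
```

For example, a host attached to "public" and a gateway attached to
"public" and "phys" would select the vlan path for tenant network
"public", while a gateway attached only to "phys" would fall back to
the tunnel.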

>
> Packet processing in the host chassis:
> 1) A new ovs flow added in physical table 65, which sets MLF_RCV_FROM_VLAN
>    flag for packets from vlan network entering into router pipeline
> 2) A new flow added in lr_in_ip_routing, for packets output through
>    distributed gateway port and matching MLF_RCV_FROM_VLAN flag,
>    set REGBIT_NAT_REDIRECT i.e
>    table=7 (lr_in_ip_routing   ), priority=2    , match=(
>    ip4.dst == 0.0.0.0/0 && flags.rcv_from_vlan == 1 &&
>    !is_chassis_resident("cr-alice")), action=(reg9[0] = 1; next;)
>    This flow will be set only on chassis not hosting chassisredirect
>    port i.e compute node.
>    When REGBIT_NAT_REDIRECT set,
>    a) lr_in_arp_resolve, will set packet eth.dst to distributed gateway
>       port MAC
>    b) lr_in_gw_redirect, will set chassisredirect port as outport
> 3) A new ovs flow added in physical table 32 will use source vlan tenant
>    network tag as vlan ID for sending the packet to gateway chassis.
>    As this vlan packet destination MAC is distributed gateway port MAC,
>    packet will only reach the gateway chassis.
>    table=32,priority=150,reg14=0x3,reg15=0x6,metadata=0x4
>    actions=mod_vlan_vid:2010,output:25,strip_vlan
>    This flow will be set only on chassis not hosting chassisredirect
>    port i.e compute node.
>
> Packet processing in the gateway chassis:
> 1) A new ovs flow added in physical table 0 to pass vlan traffic coming
>    from localnet port to the connected router pipeline(i.e router
>    attached to vlan tenant network).
>    This flow will set router metadata, reg14 to router's patch port(lrp)
>    (i.e patch port connecting router and vlan tenant network) and a new
>    MLF_RCV_FROM_VLAN flag.
>    table=0,priority=150,in_port=67,dl_vlan=2010 actions=strip_vlan,
>    load:0x4->OXM_OF_METADATA[],load:0x3->NXM_NX_REG14[],
>    load:0x1->NXM_NX_REG10[5],resubmit(,8)
>    This flow will be set only on chassis hosting chassisredirect
>    port i.e gateway node.
> 2) A new flow added in lr_in_admission which checks MLF_RCV_FROM_VLAN
>    and allows the packet. This flow will be set only on chassis hosting
>    chassisredirect port i.e gateway node.
>    table=0 (lr_in_admission    ), priority=100  , match=(
>    flags.rcv_from_vlan == 1 && inport == "lrp-44383893-613a-4bfe-b483-
>    e7d0dc3055cd" && is_chassis_resident("cr-lrp-a6e3d2ab-313a-4ea3-
>    8ec4-c3c774a11f49")), action=(next;)
>    Then packet will pass through router ingress and egress pipelines and
>    then to external switch pipeline.
>
> In a scenario where the traffic between two vms in the same tenant vlan
> network across different chassis i.e if "vm1" on tenant vlan network
> "net1" is on host chassis "ch1" and "vm2" on same tenant vlan network
> "net1" is on gateway chassis "gw1". When the packet arrived on "gw1"
> chassis from localnet port, we still send it to router pipeline and router
> pipeline will send it to destination switch ("net1") pipeline.

Why is this?  Wouldn't the packet just have a destination MAC for "vm2"?

> But in this case when packet arrives at "vm2", it will have router MAC as
> source MAC as the packet is routed in gateway chassis. This behaviour can
> be seen only for destination vms hosted on gateway node.
>
> [1] https://mail.openvswitch.org/pipermail/ovs-discuss/2018-April/046557.html
> [2] Point 3 in section 3.3.1 - Future Enhancements
> https://docs.google.com/document/d/1JecGIXPH0RAqfGvD0nmtBdEU1zflHACp8WSRnKCFSgg/edit#
>
> Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2018-April/046543.html
>
> Signed-off-by: Venkata Anil <vkommadi@redhat.com>
> ---
>  ovn/controller/bfd.c            |   3 +-
>  ovn/controller/binding.c        |  10 +-
>  ovn/controller/ovn-controller.c |   3 +
>  ovn/controller/ovn-controller.h |  16 ++-
>  ovn/controller/physical.c       |  94 ++++++++++++++++-
>  ovn/lib/logical-fields.c        |   4 +
>  ovn/lib/logical-fields.h        |   2 +
>  ovn/northd/ovn-northd.c         |  35 +++++++
>  tests/ovn.at                    | 227 ++++++++++++++++++++++++++++++++++++++++
>  9 files changed, 390 insertions(+), 4 deletions(-)
>
> diff --git a/ovn/controller/bfd.c b/ovn/controller/bfd.c
> index 8f020d5..cbbd3ba 100644
> --- a/ovn/controller/bfd.c
> +++ b/ovn/controller/bfd.c
> @@ -139,8 +139,9 @@ bfd_travel_gw_related_chassis(struct local_datapath *dp,
>                                    struct local_datapath_node, node);
>          dp = dp_binding->dp;
>          free(dp_binding);
> +        const struct sbrec_datapath_binding *pdp;
>          for (size_t i = 0; i < dp->n_peer_dps; i++) {
> -            const struct sbrec_datapath_binding *pdp = dp->peer_dps[i];
> +            pdp = dp->peer_dps[i]->peer_dp;
>              if (!pdp) {
>                  continue;
>              }
> diff --git a/ovn/controller/binding.c b/ovn/controller/binding.c
> index 0785a94..f02bde5 100644
> --- a/ovn/controller/binding.c
> +++ b/ovn/controller/binding.c
> @@ -148,10 +148,14 @@ add_local_datapath__(struct controller_ctx *ctx,
>                                  "lport-by-datapath", &cursor);
>
>      SBREC_PORT_BINDING_FOR_EACH_EQUAL (pb, &cursor, lpval) {
> +        if (!strcmp(pb->type, "chassisredirect")) {
> +            ld->chassisredirect_port = pb;
> +        }
>          if (!strcmp(pb->type, "patch")) {
>              const char *peer_name = smap_get(&pb->options, "peer");
>              if (peer_name) {
>                  const struct sbrec_port_binding *peer;
> +                struct peer_datapath *pdp;
>
>                  peer = lport_lookup_by_name( ctx->ovnsb_idl, peer_name);
>
> @@ -162,8 +166,12 @@ add_local_datapath__(struct controller_ctx *ctx,
>                      ld->peer_dps = xrealloc(
>                              ld->peer_dps,
>                              ld->n_peer_dps * sizeof *ld->peer_dps);
> -                    ld->peer_dps[ld->n_peer_dps - 1] = datapath_lookup_by_key(
> +                    pdp = xcalloc(1, sizeof(struct peer_datapath));
> +                    pdp->peer_dp = datapath_lookup_by_key(
>                          ctx->ovnsb_idl, peer->datapath->tunnel_key);
> +                    pdp->patch = pb;
> +                    pdp->peer = peer;
> +                    ld->peer_dps[ld->n_peer_dps - 1] = pdp;
>                  }
>              }
>          }
> diff --git a/ovn/controller/ovn-controller.c b/ovn/controller/ovn-controller.c
> index 86e1836..55573fd 100644
> --- a/ovn/controller/ovn-controller.c
> +++ b/ovn/controller/ovn-controller.c
> @@ -803,6 +803,9 @@ main(int argc, char *argv[])
>
>          struct local_datapath *cur_node, *next_node;
>          HMAP_FOR_EACH_SAFE (cur_node, next_node, hmap_node, &local_datapaths) {
> +            for (int i = 0; i < cur_node->n_peer_dps; i++) {
> +                free(cur_node->peer_dps[i]);
> +            }
>              free(cur_node->peer_dps);
>              hmap_remove(&local_datapaths, &cur_node->hmap_node);
>              free(cur_node);
> diff --git a/ovn/controller/ovn-controller.h b/ovn/controller/ovn-controller.h
> index 6617b0c..8023de2 100644
> --- a/ovn/controller/ovn-controller.h
> +++ b/ovn/controller/ovn-controller.h
> @@ -46,6 +46,17 @@ struct ct_zone_pending_entry {
>      enum ct_zone_pending_state state;
>  };
>
> +/* Represents a peer datapath connected to a given datapath */
> +struct peer_datapath {
> +    const struct sbrec_datapath_binding *peer_dp;
> +
> +    /* Patch port connected to local datapath */
> +    const struct sbrec_port_binding *patch;
> +
> +    /* Peer patch port connected to peer datapath */
> +    const struct sbrec_port_binding *peer;
> +};
> +
>  /* A logical datapath that has some relevance to this hypervisor.  A logical
>   * datapath D is relevant to hypervisor H if:
>   *
> @@ -63,10 +74,13 @@ struct local_datapath {
>      /* The localnet port in this datapath, if any (at most one is allowed). */
>      const struct sbrec_port_binding *localnet_port;
>
> +    /* The chassisredirect port in this datapath */
> +    const struct sbrec_port_binding *chassisredirect_port;
> +
>      /* True if this datapath contains an l3gateway port located on this
>       * hypervisor. */
>      bool has_local_l3gateway;
> -    const struct sbrec_datapath_binding **peer_dps;
> +    struct peer_datapath **peer_dps;
>      size_t n_peer_dps;
>  };
>
> diff --git a/ovn/controller/physical.c b/ovn/controller/physical.c
> index fc8adcf..ad57f69 100644
> --- a/ovn/controller/physical.c
> +++ b/ovn/controller/physical.c
> @@ -304,7 +304,8 @@ consider_port_binding(struct controller_ctx *ctx,
>  {
>      uint32_t dp_key = binding->datapath->tunnel_key;
>      uint32_t port_key = binding->tunnel_key;
> -    if (!get_local_datapath(local_datapaths, dp_key)) {
> +    struct local_datapath *ld = get_local_datapath(local_datapaths, dp_key);
> +    if (!ld) {
>          return;
>      }
>
> @@ -350,6 +351,14 @@ consider_port_binding(struct controller_ctx *ctx,
>              put_load(0, MFF_LOG_REG0 + i, 0, 32, ofpacts_p);
>          }
>          put_load(0, MFF_IN_PORT, 0, 16, ofpacts_p);
> +        if (ld->localnet_port) {
> +            int vlan_tag = (ld->localnet_port->n_tag ?
> +                            *ld->localnet_port->tag : 0);
> +            if (vlan_tag) {
> +                put_load(1, MFF_LOG_FLAGS, MLF_RCV_FROM_VLAN_BIT, 1,
> +                         ofpacts_p);
> +            }
> +        }
>          put_resubmit(OFTABLE_LOG_INGRESS_PIPELINE, ofpacts_p);
>          clone = ofpbuf_at_assert(ofpacts_p, clone_ofs, sizeof *clone);
>          ofpacts_p->header = clone;
> @@ -539,6 +548,47 @@ consider_port_binding(struct controller_ctx *ctx,
>           * input port, MFF_LOG_DATAPATH to the logical datapath, and
>           * resubmit into the logical ingress pipeline starting at table
>           * 16. */
> +
> +        /* Match a VLAN tag and strip it. If the vlan network is connected
> +         * to a router which has a gateway port on redirect-chassis,
> +         * set MLF_RCV_FROM_VLAN flag, router metadata and input port to
> +         * connecting patch port */
> +        int vlan_tag = binding->n_tag ? *binding->tag : 0;
> +        if (!strcmp(binding->type, "localnet") && vlan_tag) {
> +            struct local_datapath *ldp = get_local_datapath(
> +                local_datapaths, binding->datapath->tunnel_key);
> +            for (int i = 0; i < ldp->n_peer_dps; i++) {
> +                struct local_datapath *peer_ldp = get_local_datapath(
> +                    local_datapaths, ldp->peer_dps[i]->peer_dp->tunnel_key);
> +                const struct sbrec_port_binding *crp;
> +                crp = peer_ldp->chassisredirect_port;
> +                if (crp && crp->chassis &&
> +                   !strcmp(crp->chassis->name, chassis->name)) {
> +                    const char *gwp = smap_get(&crp->options,
> +                                               "distributed-port");
> +                    if (strcmp(gwp, ldp->peer_dps[i]->peer->logical_port)) {
> +                        ofpbuf_clear(ofpacts_p);
> +                        match_init_catchall(&match);
> +
> +                        match_set_in_port(&match, ofport);
> +                        match_set_dl_vlan(&match, htons(vlan_tag));
> +
> +                        ofpact_put_STRIP_VLAN(ofpacts_p);
> +                        put_load(peer_ldp->datapath->tunnel_key,
> +                                 MFF_LOG_DATAPATH, 0, 64, ofpacts_p);
> +                        put_load(ldp->peer_dps[i]->peer->tunnel_key,
> +                                 MFF_LOG_INPORT, 0, 32, ofpacts_p);
> +                        put_load(1, MFF_LOG_FLAGS,
> +                                 MLF_RCV_FROM_VLAN_BIT, 1, ofpacts_p);
> +                        put_resubmit(OFTABLE_LOG_INGRESS_PIPELINE, ofpacts_p);
> +
> +                        ofctrl_add_flow(flow_table, OFTABLE_PHY_TO_LOG,
> +                                        150, 0, &match, ofpacts_p);
> +                    }
> +                }
> +            }
> +        }
> +
>          ofpbuf_clear(ofpacts_p);
>          match_init_catchall(&match);
>          match_set_in_port(&match, ofport);
> @@ -639,6 +689,48 @@ consider_port_binding(struct controller_ctx *ctx,
>           * flow matches an output port that includes a logical port on a remote
>           * hypervisor, and tunnels the packet to that hypervisor.
>           */
> +
> +        /* For each vlan network connected to the router, add that network's
> +         * vlan tag to the packet and output it through localnet port */
> +        struct local_datapath *ldp = get_local_datapath(local_datapaths,
> +                                                        dp_key);
> +        for (int i = 0; i < ldp->n_peer_dps; i++) {
> +            struct ofpact_vlan_vid *vlan_vid;
> +            ofp_port_t port_ofport = 0;
> +            struct peer_datapath *pdp = ldp->peer_dps[i];
> +            struct local_datapath *peer_ldp = get_local_datapath(
> +                local_datapaths, pdp->peer_dp->tunnel_key);
> +            if (peer_ldp->localnet_port && pdp->patch->tunnel_key) {
> +                int64_t vlan_tag = (peer_ldp->localnet_port->n_tag ?
> +                                    *peer_ldp->localnet_port->tag : 0);
> +                if (!vlan_tag) {
> +                    continue;
> +                }
> +                port_ofport = u16_to_ofp(simap_get(&localvif_to_ofport,
> +                    peer_ldp->localnet_port->logical_port));
> +                if (!port_ofport) {
> +                    continue;
> +                }
> +
> +                match_init_catchall(&match);
> +                ofpbuf_clear(ofpacts_p);
> +
> +                match_set_metadata(&match, htonll(dp_key));
> +                match_set_reg(&match, MFF_LOG_INPORT - MFF_REG0,
> +                              pdp->patch->tunnel_key);
> +                match_set_reg(&match, MFF_LOG_OUTPORT - MFF_REG0, port_key);
> +
> +                vlan_vid = ofpact_put_SET_VLAN_VID(ofpacts_p);
> +                vlan_vid->vlan_vid = vlan_tag;
> +                vlan_vid->push_vlan_if_needed = true;
> +                ofpact_put_OUTPUT(ofpacts_p)->port = port_ofport;
> +                ofpact_put_STRIP_VLAN(ofpacts_p);
> +
> +                ofctrl_add_flow(flow_table, OFTABLE_REMOTE_OUTPUT, 150, 0,
> +                                &match, ofpacts_p);
> +            }
> +        }
> +
>          match_init_catchall(&match);
>          ofpbuf_clear(ofpacts_p);
>
> diff --git a/ovn/lib/logical-fields.c b/ovn/lib/logical-fields.c
> index a8b5e3c..b9efa02 100644
> --- a/ovn/lib/logical-fields.c
> +++ b/ovn/lib/logical-fields.c
> @@ -105,6 +105,10 @@ ovn_init_symtab(struct shash *symtab)
>               MLF_FORCE_SNAT_FOR_LB_BIT);
>      expr_symtab_add_subfield(symtab, "flags.force_snat_for_lb", NULL,
>                               flags_str);
> +    snprintf(flags_str, sizeof flags_str, "flags[%d]",
> +             MLF_RCV_FROM_VLAN_BIT);
> +    expr_symtab_add_subfield(symtab, "flags.rcv_from_vlan", NULL,
> +                             flags_str);
>
>      /* Connection tracking state. */
>      expr_symtab_add_field(symtab, "ct_mark", MFF_CT_MARK, NULL, false);
> diff --git a/ovn/lib/logical-fields.h b/ovn/lib/logical-fields.h
> index b1dbb03..96250fd 100644
> --- a/ovn/lib/logical-fields.h
> +++ b/ovn/lib/logical-fields.h
> @@ -50,6 +50,7 @@ enum mff_log_flags_bits {
>      MLF_FORCE_SNAT_FOR_DNAT_BIT = 2,
>      MLF_FORCE_SNAT_FOR_LB_BIT = 3,
>      MLF_LOCAL_ONLY_BIT = 4,
> +    MLF_RCV_FROM_VLAN_BIT = 5,
>  };
>
>  /* MFF_LOG_FLAGS_REG flag assignments */
> @@ -75,6 +76,7 @@ enum mff_log_flags {
>       * hypervisors should instead only be output to local targets
>       */
>      MLF_LOCAL_ONLY = (1 << MLF_LOCAL_ONLY_BIT),
> +    MLF_RCV_FROM_VLAN = (1 << MLF_RCV_FROM_VLAN_BIT),
>  };
>
>  #endif /* ovn/lib/logical-fields.h */
> diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
> index 0e06776..f68da2b 100644
> --- a/ovn/northd/ovn-northd.c
> +++ b/ovn/northd/ovn-northd.c
> @@ -4411,6 +4411,28 @@ add_route(struct hmap *lflows, const struct ovn_port *op,
>       * routing. */
>      ovn_lflow_add(lflows, op->od, S_ROUTER_IN_IP_ROUTING, priority,
>                    ds_cstr(&match), ds_cstr(&actions));
> +
> +    /* When output port is distributed gateway port, check if the router
> +     * input port is a patch port connected to vlan network.
> +     * Traffic from VLAN network to external network should be redirected
> +     * to "redirect-chassis" by setting REGBIT_NAT_REDIRECT flag.
> +     * Later physical table 32 will output this traffic to gateway
> +     * chassis using input network vlan tag */
> +    if (op == op->od->l3dgw_port) {
> +        ds_clear(&match);
> +        ds_clear(&actions);
> +
> +        ds_put_format(&match, "ip%s.%s == %s/%d", is_ipv4 ? "4" : "6",
> +                      dir, network_s, plen);
> +        ds_put_format(&match, " && flags.rcv_from_vlan == 1");
> +        ds_put_format(&match, " && !is_chassis_resident(%s)",
> +                      op->od->l3redirect_port->json_key);
> +
> +        ovn_lflow_add(lflows, op->od, S_ROUTER_IN_IP_ROUTING,
> +                      priority + 1, ds_cstr(&match),
> +                      REGBIT_NAT_REDIRECT" = 1; next;");
> +    }
> +
>      ds_destroy(&match);
>      ds_destroy(&actions);
>  }
> @@ -4822,6 +4844,19 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
>          }
>          ovn_lflow_add(lflows, op->od, S_ROUTER_IN_ADMISSION, 50,
>                        ds_cstr(&match), "next;");
> +
> +        /* VLAN traffic from localnet port should be allowed for
> +         * router processing on the "redirect-chassis". */
> +        if (op->od->l3dgw_port && op->od->l3redirect_port && op->peer &&
> +            op->peer->od->localnet_port && (op != op->od->l3dgw_port)) {
> +            ds_clear(&match);
> +            ds_put_format(&match, "flags.rcv_from_vlan == 1");
> +            ds_put_format(&match, " && inport == %s", op->json_key);
> +            ds_put_format(&match, " && is_chassis_resident(%s)",
> +                          op->od->l3redirect_port->json_key);
> +            ovn_lflow_add(lflows, op->od, S_ROUTER_IN_ADMISSION, 100,
> +                          ds_cstr(&match), "next;");
> +        }
>      }
>
>      /* Logical router ingress table 1: IP Input. */
> diff --git a/tests/ovn.at b/tests/ovn.at
> index f12c24c..6916bd0 100644
> --- a/tests/ovn.at
> +++ b/tests/ovn.at
> @@ -7713,6 +7713,233 @@ test_ip_packet gw2 gw1
>  OVN_CLEANUP([hv1],[gw1],[gw2],[ext1])
>  AT_CLEANUP
>
> +# VLAN traffic for external network redirected through distributed router gateway port
> +# should use vlans(i.e input network vlan tag) across hypervisors instead of tunneling.
> +AT_SETUP([ovn -- vlan traffic for external network with distributed router gateway port])
> +AT_SKIP_IF([test $HAVE_PYTHON = no])
> +ovn_start
> +
> +# Logical network:
> +# # One LR R1 that has switches foo (192.168.1.0/24) and
> +# # alice (172.16.1.0/24) connected to it.  The logical port
> +# # between R1 and alice has a "redirect-chassis" specified,
> +# # i.e. it is the distributed router gateway port(172.16.1.6).
> +# # Switch alice also has a localnet port defined.
> +# # An additional switch outside has the same subnet as alice
> +# # (172.16.1.0/24), a localnet port and nexthop port(172.16.1.1)
> +# # which will receive the packet destined for external network
> +# # (i.e 8.8.8.8 as destination ip).
> +
> +# Physical network:
> +# # Three hypervisors hv[123].
> +# # hv1 hosts vif foo1.
> +# # hv2 is the "redirect-chassis" that hosts the distributed router gateway port.
> +# # hv3 hosts nexthop port vif outside1.
> +# # All other tests connect hypervisors to network n1 through br-phys for tunneling.
> +# # But in this test, hv1 won't connect to n1(and no br-phys in hv1), and
> +# # in order to show vlans(instead of tunneling) used between hv1 and hv2,
> +# # a new network n2 created and hv1 and hv2 connected to this network through br-ex.
> +# # hv2 and hv3 are still connected to n1 network through br-phys.
> +net_add n1
> +
> +# We are not calling ovn_attach for hv1, to avoid adding br-phys.
> +# Tunneling won't work in hv1 as ovn-encap-ip is not added to any bridge in hv1
> +sim_add hv1
> +as hv1
> +ovs-vsctl \
> +    -- set Open_vSwitch . external-ids:system-id=hv1 \
> +    -- set Open_vSwitch . external-ids:ovn-remote=unix:$ovs_base/ovn-sb/ovn-sb.sock \
> +    -- set Open_vSwitch . external-ids:ovn-encap-type=geneve,vxlan \
> +    -- set Open_vSwitch . external-ids:ovn-encap-ip=192.168.0.1 \
> +    -- add-br br-int \
> +    -- set bridge br-int fail-mode=secure other-config:disable-in-band=true
> +start_daemon ovn-controller
> +ovs-vsctl -- add-port br-int hv1-vif1 -- \
> +    set interface hv1-vif1 external-ids:iface-id=foo1 \
> +    ofport-request=1
> +
> +sim_add hv2
> +as hv2
> +ovs-vsctl add-br br-phys
> +ovn_attach n1 br-phys 192.168.0.2
> +
> +sim_add hv3
> +as hv3
> +ovs-vsctl add-br br-phys
> +ovn_attach n1 br-phys 192.168.0.3
> +ovs-vsctl -- add-port br-int hv3-vif1 -- \
> +    set interface hv3-vif1 external-ids:iface-id=outside1 \
> +    options:tx_pcap=hv3/vif1-tx.pcap \
> +    options:rxq_pcap=hv3/vif1-rx.pcap \
> +    ofport-request=1
> +
> +# Create network n2 for vlan connectivity between hv1 and hv2
> +net_add n2
> +
> +as hv1
> +ovs-vsctl add-br br-ex
> +net_attach n2 br-ex
> +
> +as hv2
> +ovs-vsctl add-br br-ex
> +net_attach n2 br-ex
> +
> +as hv1 ovs-vsctl set open . external-ids:ovn-bridge-mappings="public:br-ex"
> +as hv2 ovs-vsctl set open . external-ids:ovn-bridge-mappings="public:br-ex,phys:br-phys"
> +as hv3 ovs-vsctl set open . external-ids:ovn-bridge-mappings=phys:br-phys
> +OVN_POPULATE_ARP
> +
> +ovn-nbctl create Logical_Router name=R1
> +
> +ovn-nbctl ls-add foo
> +ovn-nbctl ls-add alice
> +ovn-nbctl ls-add outside
> +
> +# Connect foo to R1
> +ovn-nbctl lrp-add R1 foo 00:00:01:01:02:03 192.168.1.1/24
> +ovn-nbctl lsp-add foo rp-foo -- set Logical_Switch_Port rp-foo \
> +    type=router options:router-port=foo \
> +    -- lsp-set-addresses rp-foo router
> +
> +# Connect alice to R1 as distributed router gateway port (172.16.1.6) on hv2
> +ovn-nbctl lrp-add R1 alice 00:00:02:01:02:03 172.16.1.6/24 \
> +    -- set Logical_Router_Port alice options:redirect-chassis="hv2"
> +ovn-nbctl lsp-add alice rp-alice -- set Logical_Switch_Port rp-alice \
> +    type=router options:router-port=alice \
> +    -- lsp-set-addresses rp-alice router
> +
> +# Create logical port foo1 in foo
> +ovn-nbctl lsp-add foo foo1 \
> +-- lsp-set-addresses foo1 "f0:00:00:01:02:03 192.168.1.2"
> +
> +# Create logical port outside1 in outside, which is a nexthop address
> +# for 172.16.1.0/24
> +ovn-nbctl lsp-add outside outside1 \
> +-- lsp-set-addresses outside1 "f0:00:00:01:02:04 172.16.1.1"
> +
> +# Set default gateway (nexthop) to 172.16.1.1
> +ovn-nbctl lr-route-add R1 "0.0.0.0/0" 172.16.1.1 alice
> +AT_CHECK([ovn-nbctl lr-nat-add R1 snat 172.16.1.6 192.168.1.1/24])
> +
> +ovn-nbctl lsp-add foo ln-foo
> +ovn-nbctl lsp-set-addresses ln-foo unknown
> +ovn-nbctl lsp-set-options ln-foo network_name=public
> +ovn-nbctl lsp-set-type ln-foo localnet
> +AT_CHECK([ovn-nbctl set Logical_Switch_Port ln-foo tag=2])
> +
> +# Create localnet port in alice
> +ovn-nbctl lsp-add alice ln-alice
> +ovn-nbctl lsp-set-addresses ln-alice unknown
> +ovn-nbctl lsp-set-type ln-alice localnet
> +ovn-nbctl lsp-set-options ln-alice network_name=phys
> +
> +# Create localnet port in outside
> +ovn-nbctl lsp-add outside ln-outside
> +ovn-nbctl lsp-set-addresses ln-outside unknown
> +ovn-nbctl lsp-set-type ln-outside localnet
> +ovn-nbctl lsp-set-options ln-outside network_name=phys
> +ovn-nbctl --wait=hv sync
> +
> +ip_to_hex() {
> +    printf "%02x%02x%02x%02x" "$@"
> +}
> +gw_ip=$(ip_to_hex 172 16 1 6)
> +src_ip=$(ip_to_hex 192 168 1 2)
> +dst_ip=$(ip_to_hex 8 8 8 8)
> +nexthop_ip=$(ip_to_hex 172 16 1 1)
> +
> +# Send ip packet from foo1 to 8.8.8.8
> +src_mac="f00000010203"
> +dst_mac="000001010203"
> +packet=${dst_mac}${src_mac}08004500001c0000000040110000${src_ip}${dst_ip}0035111100080000
> +
> +as hv1 ovs-appctl netdev-dummy/receive hv1-vif1 $packet
> +sleep 2
> +
> +# ARP request packet to expect at outside1
> +src_mac="000002010203"
> +arp_request=ffffffffffff${src_mac}08060001080006040001${src_mac}${gw_ip}000000000000${nexthop_ip}
> +echo $arp_request >> hv3-vif1.expected
> +OVN_CHECK_PACKETS([hv3/vif1-tx.pcap], [hv3-vif1.expected])
> +
> +# Send ARP reply from outside1 back to the router
> +reply_mac="f00000010204"
> +arp_reply=${src_mac}${reply_mac}08060001080006040002${reply_mac}${nexthop_ip}${src_mac}${gw_ip}
> +
> +as hv3 ovs-appctl netdev-dummy/receive hv3-vif1 $arp_reply
> +
> +# Allow some time for ovn-northd and ovn-controller to catch up.
> +# XXX This should be more systematic.
> +sleep 1
> +
> +# VLAN tagged packet with distributed gateway port(172.16.1.6) MAC as destination MAC
> +# is expected on bridge connecting hv1 and hv2
> +src_mac="f00000010203"
> +dst_mac="000002010203"
> +expected=${dst_mac}${src_mac}8100000208004500001c0000000040110000${src_ip}${dst_ip}0035111100080000
> +echo $expected > hv1-br-ex_n2.expected
> +
> +# Packet to Expect at outside1 i.e nexthop(172.16.1.1) port.
> +# As connection tracking not enabled for this test, snat can't be done on the packet.
> +# We still see foo1 as the source ip address. But source mac(172.16.1.6 MAC) and
> +# dest mac(172.16.1.1 mac) are properly configured.
> +src_mac="000002010203"
> +dst_mac="f00000010204"
> +expected=${dst_mac}${src_mac}08004500001c000000003f110100${src_ip}${dst_ip}0035111100080000
> +echo $expected > hv3-vif1.expected
> +
> +reset_pcap_file() {
> +    local iface=$1
> +    local pcap_file=$2
> +    ovs-vsctl -- set Interface $iface options:tx_pcap=dummy-tx.pcap \
> +options:rxq_pcap=dummy-rx.pcap
> +    rm -f ${pcap_file}*.pcap
> +    ovs-vsctl -- set Interface $iface options:tx_pcap=${pcap_file}-tx.pcap \
> +options:rxq_pcap=${pcap_file}-rx.pcap
> +}
> +
> +as hv1 reset_pcap_file br-ex_n2 hv1/br-ex_n2
> +as hv3 reset_pcap_file hv3-vif1 hv3/vif1
> +sleep 1
> +as hv1 ovs-appctl netdev-dummy/receive hv1-vif1 $packet
> +sleep 2
> +
> +# On hv1, table 65 sets the MLF_RCV_FROM_VLAN flag for packets going from the
> +# vlan switch pipeline to the router pipeline
> +AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=65 | grep "priority=100,reg15=0x1,metadata=0x2" \
> +| grep actions=clone | grep "load:0x1->NXM_NX_REG10" | wc -l], [0], [[1
> +]])
> +# On hv1, because of snat rule in table 15, a higher priority (i.e. 2) flow
> +# is added for packets with the MLF_RCV_FROM_VLAN flag and output as the
> +# distributed gateway port, which sets the REGBIT_NAT_REDIRECT flag
> +AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=15 | grep "priority=2,ip,reg10=0x20/0x20,metadata=0x1" \
> +| grep "actions=load:0x1->OXM_OF_PKT_REG4" | wc -l], [0], [[1
> +]])
> +
> +# On hv1, table 32 flow which tags packet with source network vlan tag and sends it to hv2
> +# through br-ex
> +AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=32 | grep "priority=150,reg14=0x1,reg15=0x3,metadata=0x1" \
> +| grep "actions=mod_vlan_vid:2" | grep "n_packets=2," | wc -l], [0], [[1
> +]])
> +
> +# On hv2 table 0, vlan tagged packet is sent through router pipeline
> +# by setting MLF_RCV_FROM_VLAN flag (REG10)
> +AT_CHECK([as hv2 ovs-ofctl dump-flows br-int | grep "table=0," | grep "priority=150" | grep "dl_vlan=2" | \
> +grep "actions=strip_vlan,load:0x1->OXM_OF_METADATA" | grep "load:0x1->NXM_NX_REG14" | \
> +grep "load:0x1->NXM_NX_REG10" | wc -l], [0], [[1
> +]])
> +# on hv2 table 8, allow packets with router metadata and with MLF_RCV_FROM_VLAN flag
> +AT_CHECK([as hv2 ovs-ofctl dump-flows br-int table=8 | grep "priority=100,reg10=0x20/0x20,reg14=0x1,metadata=0x1" | wc -l], [0], [[1
> +]])
> +
> +# Check vlan tagged packet on the bridge connecting hv1 and hv2
> +OVN_CHECK_PACKETS([hv1/br-ex_n2-tx.pcap], [hv1-br-ex_n2.expected])
> +# Check expected packet on nexthop interface
> +OVN_CHECK_PACKETS([hv3/vif1-tx.pcap], [hv3-vif1.expected])
> +
> +OVN_CLEANUP([hv1],[hv2],[hv3])
> +AT_CLEANUP
> +
>  AT_SETUP([ovn -- 1 LR with distributed router gateway port])
>  AT_SKIP_IF([test $HAVE_PYTHON = no])
>  ovn_start
> --
> 1.8.3.1
>
Russell Bryant May 30, 2018, 8:11 p.m. | #2
One more general question:

A major difference when doing the redirect to the gateway via a VLAN
vs. a Geneve tunnel is the lack of metadata.  You've demonstrated how
it's easy enough to identify the network (the VLAN ID + the port it
arrived on).  How about the logical input / output IDs?  What values
are included when the packet is sent over the tunnel?  Are we
confident those values are not needed, or can be inferred another way
in this scenario?
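
For reference, here is a minimal sketch of what the Geneve encapsulation
carries, assuming the layout documented in ovn-architecture(7): the logical
datapath goes in the 24-bit VNI, and a TLV (class 0x0102, type 0x80) holds a
32-bit value made of one reserved bit, a 15-bit logical ingress port, and a
16-bit logical egress port.  The struct, macro, and function names below are
hypothetical and not taken from the OVN sources; the point is only to show
which fields a plain VLAN frame has no room for.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed from ovn-architecture(7); names here are illustrative only. */
#define OVN_GENEVE_CLASS 0x0102
#define OVN_GENEVE_TYPE  0x80

/* Logical metadata carried by a Geneve-encapsulated OVN packet. */
struct ovn_tunnel_md {
    uint32_t datapath;  /* 24-bit logical datapath key (Geneve VNI). */
    uint16_t inport;    /* 15-bit logical ingress port key. */
    uint16_t outport;   /* 16-bit logical egress port key. */
};

/* Builds the 32-bit option value: rsv(1) | inport(15) | outport(16). */
static uint32_t
ovn_geneve_pack_ports(uint16_t inport, uint16_t outport)
{
    assert(inport <= 0x7fff);   /* The top bit is reserved and must be 0. */
    return ((uint32_t) inport << 16) | outport;
}

/* Recovers the logical metadata from a VNI and option value. */
static struct ovn_tunnel_md
ovn_geneve_unpack(uint32_t vni, uint32_t opt)
{
    struct ovn_tunnel_md md;

    md.datapath = vni & 0xffffff;
    md.inport = (opt >> 16) & 0x7fff;
    md.outport = opt & 0xffff;
    return md;
}
```

With the values from the table 32 flow example in the commit message
(metadata=0x4, reg14=0x3, reg15=0x6), ovn_geneve_pack_ports(0x3, 0x6) yields
0x00030006, sent alongside VNI 0x4.  On the VLAN path none of these three
values travel with the packet, so the receiver has to infer them.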

On Wed, May 30, 2018 at 3:59 PM, Russell Bryant <russell@ovn.org> wrote:
> On Fri, May 25, 2018 at 7:33 AM,  <vkommadi@redhat.com> wrote:
>> From: venkata anil <vkommadi@redhat.com>
>>
>> When a vm on a vlan tenant network sends traffic to an external network,
>> it is tunneled from host chassis to gateway chassis. In the earlier
>> discussion [1], Russel (also in his doc [2]) suggested if we can figure
>> out a way for OVN to do this redirect to the gateway host over a VLAN
>> network. This patch implements his suggestion i.e will redirect to
>> gateway chassis using incoming tenant vlan network. Gateway chassis are
>> expected to be configured with tenant vlan networks. In this approach,
>> new logical and physical flows introduced for packet processing in both
>> host and gateway chassis.
>
> I don't think we can impose the expectation that the gateway is on the
> same vlan network as the original compute node.  The previous behavior
> of using the tunnel does not require that.
>
> Have you thought of whether we could use the new behavior
> automatically if we know both chassis are on the same network, or fall
> back to a tunnel if necessary?
>
>>
>> Packet processing in the host chassis:
>> 1) A new ovs flow added in physical table 65, which sets MLF_RCV_FROM_VLAN
>>    flag for packets from vlan network entering into router pipeline
>> 2) A new flow added in lr_in_ip_routing, for packets output through
>>    distributed gateway port and matching MLF_RCV_FROM_VLAN flag,
>>    set REGBIT_NAT_REDIRECT i.e
>>    table=7 (lr_in_ip_routing   ), priority=2    , match=(
>>    ip4.dst == 0.0.0.0/0 && flags.rcv_from_vlan == 1 &&
>>    !is_chassis_resident("cr-alice")), action=(reg9[0] = 1; next;)
>>    This flow will be set only on chassis not hosting chassisredirect
>>    port i.e compute node.
>>    When REGBIT_NAT_REDIRECT set,
>>    a) lr_in_arp_resolve, will set packet eth.dst to distibuted gateway
>>       port MAC
>>    b) lr_in_gw_redirect, will set chassisredirect port as outport
>> 3) A new ovs flow added in physical table 32 will use source vlan tenant
>>    network tag as vlan ID for sending the packet to gateway chassis.
>>    As this vlan packet destination MAC is distibuted gateway port MAC,
>>    packet will only reach the gateway chassis.
>>    table=32,priority=150,reg14=0x3,reg15=0x6,metadata=0x4
>>    actions=mod_vlan_vid:2010,output:25,strip_vlan
>>    This flow will be set only on chassis not hosting chassisredirect
>>    port i.e compute node.
>>
>> Packet processing in the gateway chassis:
>> 1) A new ovs flow added in physical table 0 to pass vlan traffic coming
>>    from localnet port to the connected router pipeline(i.e router
>>    attached to vlan tenant network).
>>    This flow will set router metadata, reg14 to router's patch port(lrp)
>>    (i.e patch port connecting router and vlan tenant network) and a new
>>    MLF_RCV_FROM_VLAN flag.
>>    table=0,priority=150,in_port=67,dl_vlan=2010 actions=strip_vlan,
>>    load:0x4->OXM_OF_METADATA[],load:0x3->NXM_NX_REG14[],
>>    load:0x1->NXM_NX_REG10[5],resubmit(,8)
>>    This flow will be set only on chassis hosting chassisredirect
>>    port i.e gateway node.
>> 2) A new flow added in lr_in_admission which checks MLF_RCV_FROM_VLAN
>>    and allows the packet. This flow will be set only on chassis hosting
>>    chassisredirect port i.e gateway node.
>>    table=0 (lr_in_admission    ), priority=100  , match=(
>>    flags.rcv_from_vlan == 1 && inport == "lrp-44383893-613a-4bfe-b483-
>>    e7d0dc3055cd" && is_chassis_resident("cr-lrp-a6e3d2ab-313a-4ea3-
>>    8ec4-c3c774a11f49")), action=(next;)
>>    Then packet will pass through router ingress and egress pipelines and
>>    then to external switch pipeline.
>>
>> In a scenario where the traffic between two vms in the same tenant vlan
>> network across different chassis i.e if "vm1" on tenant vlan network
>> "net1" is on host chassis "ch1" and "vm2" on same tenant vlan network
>> "net1" is on gateway chassis "gw1". When the packet arrived on "gw1"
>> chassis from localnet port, we still send it to router pipeline and router
>> pipeline will send it to destination switch ("net1") pipeline.
>
> Why is this?  Wouldn't the packet just have a destination MAC for "vm2"?
>
>> But in this case when packet arrives at "vm2", it will have router MAC as
>> source MAC as the packet is routed in gateway chassis. This bevaviour can
>> be seen only for destination vms hosted on gateway node.
>>
>> [1] https://mail.openvswitch.org/pipermail/ovs-discuss/2018-April/046557.html
>> [2] Point 3 in section 3.3.1 - Future Enhancements
>> https://docs.google.com/document/d/1JecGIXPH0RAqfGvD0nmtBdEU1zflHACp8WSRnKCFSgg/edit#
>>
>> Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2018-April/046543.html
>>
>> Signed-off-by: Venkata Anil <vkommadi@redhat.com>
>> ---
>>  ovn/controller/bfd.c            |   3 +-
>>  ovn/controller/binding.c        |  10 +-
>>  ovn/controller/ovn-controller.c |   3 +
>>  ovn/controller/ovn-controller.h |  16 ++-
>>  ovn/controller/physical.c       |  94 ++++++++++++++++-
>>  ovn/lib/logical-fields.c        |   4 +
>>  ovn/lib/logical-fields.h        |   2 +
>>  ovn/northd/ovn-northd.c         |  35 +++++++
>>  tests/ovn.at                    | 227 ++++++++++++++++++++++++++++++++++++++++
>>  9 files changed, 390 insertions(+), 4 deletions(-)
>>
>> diff --git a/ovn/controller/bfd.c b/ovn/controller/bfd.c
>> index 8f020d5..cbbd3ba 100644
>> --- a/ovn/controller/bfd.c
>> +++ b/ovn/controller/bfd.c
>> @@ -139,8 +139,9 @@ bfd_travel_gw_related_chassis(struct local_datapath *dp,
>>                                    struct local_datapath_node, node);
>>          dp = dp_binding->dp;
>>          free(dp_binding);
>> +        const struct sbrec_datapath_binding *pdp;
>>          for (size_t i = 0; i < dp->n_peer_dps; i++) {
>> -            const struct sbrec_datapath_binding *pdp = dp->peer_dps[i];
>> +            pdp = dp->peer_dps[i]->peer_dp;
>>              if (!pdp) {
>>                  continue;
>>              }
>> diff --git a/ovn/controller/binding.c b/ovn/controller/binding.c
>> index 0785a94..f02bde5 100644
>> --- a/ovn/controller/binding.c
>> +++ b/ovn/controller/binding.c
>> @@ -148,10 +148,14 @@ add_local_datapath__(struct controller_ctx *ctx,
>>                                  "lport-by-datapath", &cursor);
>>
>>      SBREC_PORT_BINDING_FOR_EACH_EQUAL (pb, &cursor, lpval) {
>> +        if (!strcmp(pb->type, "chassisredirect")) {
>> +            ld->chassisredirect_port = pb;
>> +        }
>>          if (!strcmp(pb->type, "patch")) {
>>              const char *peer_name = smap_get(&pb->options, "peer");
>>              if (peer_name) {
>>                  const struct sbrec_port_binding *peer;
>> +                struct peer_datapath *pdp;
>>
>>                  peer = lport_lookup_by_name( ctx->ovnsb_idl, peer_name);
>>
>> @@ -162,8 +166,12 @@ add_local_datapath__(struct controller_ctx *ctx,
>>                      ld->peer_dps = xrealloc(
>>                              ld->peer_dps,
>>                              ld->n_peer_dps * sizeof *ld->peer_dps);
>> -                    ld->peer_dps[ld->n_peer_dps - 1] = datapath_lookup_by_key(
>> +                    pdp = xcalloc(1, sizeof(struct peer_datapath));
>> +                    pdp->peer_dp = datapath_lookup_by_key(
>>                          ctx->ovnsb_idl, peer->datapath->tunnel_key);
>> +                    pdp->patch = pb;
>> +                    pdp->peer = peer;
>> +                    ld->peer_dps[ld->n_peer_dps - 1] = pdp;
>>                  }
>>              }
>>          }
>> diff --git a/ovn/controller/ovn-controller.c b/ovn/controller/ovn-controller.c
>> index 86e1836..55573fd 100644
>> --- a/ovn/controller/ovn-controller.c
>> +++ b/ovn/controller/ovn-controller.c
>> @@ -803,6 +803,9 @@ main(int argc, char *argv[])
>>
>>          struct local_datapath *cur_node, *next_node;
>>          HMAP_FOR_EACH_SAFE (cur_node, next_node, hmap_node, &local_datapaths) {
>> +            for (int i = 0; i < cur_node->n_peer_dps; i++) {
>> +                free(cur_node->peer_dps[i]);
>> +            }
>>              free(cur_node->peer_dps);
>>              hmap_remove(&local_datapaths, &cur_node->hmap_node);
>>              free(cur_node);
>> diff --git a/ovn/controller/ovn-controller.h b/ovn/controller/ovn-controller.h
>> index 6617b0c..8023de2 100644
>> --- a/ovn/controller/ovn-controller.h
>> +++ b/ovn/controller/ovn-controller.h
>> @@ -46,6 +46,17 @@ struct ct_zone_pending_entry {
>>      enum ct_zone_pending_state state;
>>  };
>>
>> +/* Represents a peer datapath connected to a given datapath */
>> +struct peer_datapath {
>> +    const struct sbrec_datapath_binding *peer_dp;
>> +
>> +    /* Patch port connected to local datapath */
>> +    const struct sbrec_port_binding *patch;
>> +
>> +    /* Peer patch port connected to peer datapath */
>> +    const struct sbrec_port_binding *peer;
>> +};
>> +
>>  /* A logical datapath that has some relevance to this hypervisor.  A logical
>>   * datapath D is relevant to hypervisor H if:
>>   *
>> @@ -63,10 +74,13 @@ struct local_datapath {
>>      /* The localnet port in this datapath, if any (at most one is allowed). */
>>      const struct sbrec_port_binding *localnet_port;
>>
>> +    /* The chassisredirect port in this datapath */
>> +    const struct sbrec_port_binding *chassisredirect_port;
>> +
>>      /* True if this datapath contains an l3gateway port located on this
>>       * hypervisor. */
>>      bool has_local_l3gateway;
>> -    const struct sbrec_datapath_binding **peer_dps;
>> +    struct peer_datapath **peer_dps;
>>      size_t n_peer_dps;
>>  };
>>
>> diff --git a/ovn/controller/physical.c b/ovn/controller/physical.c
>> index fc8adcf..ad57f69 100644
>> --- a/ovn/controller/physical.c
>> +++ b/ovn/controller/physical.c
>> @@ -304,7 +304,8 @@ consider_port_binding(struct controller_ctx *ctx,
>>  {
>>      uint32_t dp_key = binding->datapath->tunnel_key;
>>      uint32_t port_key = binding->tunnel_key;
>> -    if (!get_local_datapath(local_datapaths, dp_key)) {
>> +    struct local_datapath *ld = get_local_datapath(local_datapaths, dp_key);
>> +    if (!ld) {
>>          return;
>>      }
>>
>> @@ -350,6 +351,14 @@ consider_port_binding(struct controller_ctx *ctx,
>>              put_load(0, MFF_LOG_REG0 + i, 0, 32, ofpacts_p);
>>          }
>>          put_load(0, MFF_IN_PORT, 0, 16, ofpacts_p);
>> +        if (ld->localnet_port) {
>> +            int vlan_tag = (ld->localnet_port->n_tag ?
>> +                            *ld->localnet_port->tag : 0);
>> +            if (vlan_tag) {
>> +                put_load(1, MFF_LOG_FLAGS, MLF_RCV_FROM_VLAN_BIT, 1,
>> +                         ofpacts_p);
>> +            }
>> +        }
>>          put_resubmit(OFTABLE_LOG_INGRESS_PIPELINE, ofpacts_p);
>>          clone = ofpbuf_at_assert(ofpacts_p, clone_ofs, sizeof *clone);
>>          ofpacts_p->header = clone;
>> @@ -539,6 +548,47 @@ consider_port_binding(struct controller_ctx *ctx,
>>           * input port, MFF_LOG_DATAPATH to the logical datapath, and
>>           * resubmit into the logical ingress pipeline starting at table
>>           * 16. */
>> +
>> +        /* Match a VLAN tag and strip it. If the vlan network is connected
>> +         * to a router which has a gateway port on redirect-chassis,
>> +         * set MLF_RCV_FROM_VLAN flag, router metadata and input port to
>> +         * connecting patch port */
>> +        int vlan_tag = binding->n_tag ? *binding->tag : 0;
>> +        if (!strcmp(binding->type, "localnet") && vlan_tag) {
>> +            struct local_datapath *ldp = get_local_datapath(
>> +                local_datapaths, binding->datapath->tunnel_key);
>> +            for (int i = 0; i < ldp->n_peer_dps; i++) {
>> +                struct local_datapath *peer_ldp = get_local_datapath(
>> +                    local_datapaths, ldp->peer_dps[i]->peer_dp->tunnel_key);
>> +                const struct sbrec_port_binding *crp;
>> +                crp = peer_ldp->chassisredirect_port;
>> +                if (crp && crp->chassis &&
>> +                   !strcmp(crp->chassis->name, chassis->name)) {
>> +                    const char *gwp = smap_get(&crp->options,
>> +                                               "distributed-port");
>> +                    if (strcmp(gwp, ldp->peer_dps[i]->peer->logical_port)) {
>> +                        ofpbuf_clear(ofpacts_p);
>> +                        match_init_catchall(&match);
>> +
>> +                        match_set_in_port(&match, ofport);
>> +                        match_set_dl_vlan(&match, htons(vlan_tag));
>> +
>> +                        ofpact_put_STRIP_VLAN(ofpacts_p);
>> +                        put_load(peer_ldp->datapath->tunnel_key,
>> +                                 MFF_LOG_DATAPATH, 0, 64, ofpacts_p);
>> +                        put_load(ldp->peer_dps[i]->peer->tunnel_key,
>> +                                 MFF_LOG_INPORT, 0, 32, ofpacts_p);
>> +                        put_load(1, MFF_LOG_FLAGS,
>> +                                 MLF_RCV_FROM_VLAN_BIT, 1, ofpacts_p);
>> +                        put_resubmit(OFTABLE_LOG_INGRESS_PIPELINE, ofpacts_p);
>> +
>> +                        ofctrl_add_flow(flow_table, OFTABLE_PHY_TO_LOG,
>> +                                        150, 0, &match, ofpacts_p);
>> +                    }
>> +                }
>> +            }
>> +        }
>> +
>>          ofpbuf_clear(ofpacts_p);
>>          match_init_catchall(&match);
>>          match_set_in_port(&match, ofport);
>> @@ -639,6 +689,48 @@ consider_port_binding(struct controller_ctx *ctx,
>>           * flow matches an output port that includes a logical port on a remote
>>           * hypervisor, and tunnels the packet to that hypervisor.
>>           */
>> +
>> +        /* For each vlan network connected to the router, add that network's
>> +         * vlan tag to the packet and output it through localnet port */
>> +        struct local_datapath *ldp = get_local_datapath(local_datapaths,
>> +                                                        dp_key);
>> +        for (int i = 0; i < ldp->n_peer_dps; i++) {
>> +            struct ofpact_vlan_vid *vlan_vid;
>> +            ofp_port_t port_ofport = 0;
>> +            struct peer_datapath *pdp = ldp->peer_dps[i];
>> +            struct local_datapath *peer_ldp = get_local_datapath(
>> +                local_datapaths, pdp->peer_dp->tunnel_key);
>> +            if (peer_ldp->localnet_port && pdp->patch->tunnel_key) {
>> +                int64_t vlan_tag = (peer_ldp->localnet_port->n_tag ?
>> +                                    *peer_ldp->localnet_port->tag : 0);
>> +                if (!vlan_tag) {
>> +                    continue;
>> +                }
>> +                port_ofport = u16_to_ofp(simap_get(&localvif_to_ofport,
>> +                    peer_ldp->localnet_port->logical_port));
>> +                if (!port_ofport) {
>> +                    continue;
>> +                }
>> +
>> +                match_init_catchall(&match);
>> +                ofpbuf_clear(ofpacts_p);
>> +
>> +                match_set_metadata(&match, htonll(dp_key));
>> +                match_set_reg(&match, MFF_LOG_INPORT - MFF_REG0,
>> +                              pdp->patch->tunnel_key);
>> +                match_set_reg(&match, MFF_LOG_OUTPORT - MFF_REG0, port_key);
>> +
>> +                vlan_vid = ofpact_put_SET_VLAN_VID(ofpacts_p);
>> +                vlan_vid->vlan_vid = vlan_tag;
>> +                vlan_vid->push_vlan_if_needed = true;
>> +                ofpact_put_OUTPUT(ofpacts_p)->port = port_ofport;
>> +                ofpact_put_STRIP_VLAN(ofpacts_p);
>> +
>> +                ofctrl_add_flow(flow_table, OFTABLE_REMOTE_OUTPUT, 150, 0,
>> +                                &match, ofpacts_p);
>> +            }
>> +        }
>> +
>>          match_init_catchall(&match);
>>          ofpbuf_clear(ofpacts_p);
>>
>> diff --git a/ovn/lib/logical-fields.c b/ovn/lib/logical-fields.c
>> index a8b5e3c..b9efa02 100644
>> --- a/ovn/lib/logical-fields.c
>> +++ b/ovn/lib/logical-fields.c
>> @@ -105,6 +105,10 @@ ovn_init_symtab(struct shash *symtab)
>>               MLF_FORCE_SNAT_FOR_LB_BIT);
>>      expr_symtab_add_subfield(symtab, "flags.force_snat_for_lb", NULL,
>>                               flags_str);
>> +    snprintf(flags_str, sizeof flags_str, "flags[%d]",
>> +             MLF_RCV_FROM_VLAN_BIT);
>> +    expr_symtab_add_subfield(symtab, "flags.rcv_from_vlan", NULL,
>> +                             flags_str);
>>
>>      /* Connection tracking state. */
>>      expr_symtab_add_field(symtab, "ct_mark", MFF_CT_MARK, NULL, false);
>> diff --git a/ovn/lib/logical-fields.h b/ovn/lib/logical-fields.h
>> index b1dbb03..96250fd 100644
>> --- a/ovn/lib/logical-fields.h
>> +++ b/ovn/lib/logical-fields.h
>> @@ -50,6 +50,7 @@ enum mff_log_flags_bits {
>>      MLF_FORCE_SNAT_FOR_DNAT_BIT = 2,
>>      MLF_FORCE_SNAT_FOR_LB_BIT = 3,
>>      MLF_LOCAL_ONLY_BIT = 4,
>> +    MLF_RCV_FROM_VLAN_BIT = 5,
>>  };
>>
>>  /* MFF_LOG_FLAGS_REG flag assignments */
>> @@ -75,6 +76,7 @@ enum mff_log_flags {
>>       * hypervisors should instead only be output to local targets
>>       */
>>      MLF_LOCAL_ONLY = (1 << MLF_LOCAL_ONLY_BIT),
>> +    MLF_RCV_FROM_VLAN = (1 << MLF_RCV_FROM_VLAN_BIT),
>>  };
>>
>>  #endif /* ovn/lib/logical-fields.h */
>> diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
>> index 0e06776..f68da2b 100644
>> --- a/ovn/northd/ovn-northd.c
>> +++ b/ovn/northd/ovn-northd.c
>> @@ -4411,6 +4411,28 @@ add_route(struct hmap *lflows, const struct ovn_port *op,
>>       * routing. */
>>      ovn_lflow_add(lflows, op->od, S_ROUTER_IN_IP_ROUTING, priority,
>>                    ds_cstr(&match), ds_cstr(&actions));
>> +
>> +    /* When output port is distributed gateway port, check if the router
>> +     * input port is a patch port connected to vlan network.
>> +     * Traffic from VLAN network to external network should be redirected
>> +     * to "redirect-chassis" by setting REGBIT_NAT_REDIRECT flag.
>> +     * Later physical table 32 will output this traffic to gateway
>> +     * chassis using input network vlan tag */
>> +    if (op == op->od->l3dgw_port) {
>> +        ds_clear(&match);
>> +        ds_clear(&actions);
>> +
>> +        ds_put_format(&match, "ip%s.%s == %s/%d", is_ipv4 ? "4" : "6",
>> +                      dir, network_s, plen);
>> +        ds_put_format(&match, " && flags.rcv_from_vlan == 1");
>> +        ds_put_format(&match, " && !is_chassis_resident(%s)",
>> +                      op->od->l3redirect_port->json_key);
>> +
>> +        ovn_lflow_add(lflows, op->od, S_ROUTER_IN_IP_ROUTING,
>> +                      priority + 1, ds_cstr(&match),
>> +                      REGBIT_NAT_REDIRECT" = 1; next;");
>> +    }
>> +
>>      ds_destroy(&match);
>>      ds_destroy(&actions);
>>  }
>> @@ -4822,6 +4844,19 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
>>          }
>>          ovn_lflow_add(lflows, op->od, S_ROUTER_IN_ADMISSION, 50,
>>                        ds_cstr(&match), "next;");
>> +
>> +        /* VLAN traffic from localnet port should be allowed for
>> +         * router processing on the "redirect-chassis". */
>> +        if (op->od->l3dgw_port && op->od->l3redirect_port && op->peer &&
>> +            op->peer->od->localnet_port && (op != op->od->l3dgw_port)) {
>> +            ds_clear(&match);
>> +            ds_put_format(&match, "flags.rcv_from_vlan == 1");
>> +            ds_put_format(&match, " && inport == %s", op->json_key);
>> +            ds_put_format(&match, " && is_chassis_resident(%s)",
>> +                          op->od->l3redirect_port->json_key);
>> +            ovn_lflow_add(lflows, op->od, S_ROUTER_IN_ADMISSION, 100,
>> +                          ds_cstr(&match), "next;");
>> +        }
>>      }
>>
>>      /* Logical router ingress table 1: IP Input. */
>> diff --git a/tests/ovn.at b/tests/ovn.at
>> index f12c24c..6916bd0 100644
>> --- a/tests/ovn.at
>> +++ b/tests/ovn.at
>> @@ -7713,6 +7713,233 @@ test_ip_packet gw2 gw1
>>  OVN_CLEANUP([hv1],[gw1],[gw2],[ext1])
>>  AT_CLEANUP
>>
>> +# VLAN traffic for external network redirected through distributed router gateway port
>> +# should use vlans(i.e input network vlan tag) across hypervisors instead of tunneling.
>> +AT_SETUP([ovn -- vlan traffic for external network with distributed router gateway port])
>> +AT_SKIP_IF([test $HAVE_PYTHON = no])
>> +ovn_start
>> +
>> +# Logical network:
>> +# # One LR R1 that has switches foo (192.168.1.0/24) and
>> +# # alice (172.16.1.0/24) connected to it.  The logical port
>> +# # between R1 and alice has a "redirect-chassis" specified,
>> +# # i.e. it is the distributed router gateway port(172.16.1.6).
>> +# # Switch alice also has a localnet port defined.
>> +# # An additional switch outside has the same subnet as alice
>> +# # (172.16.1.0/24), a localnet port and nexthop port(172.16.1.1)
>> +# # which will receive the packet destined for external network
>> +# # (i.e 8.8.8.8 as destination ip).
>> +
>> +# Physical network:
>> +# # Three hypervisors hv[123].
>> +# # hv1 hosts vif foo1.
>> +# # hv2 is the "redirect-chassis" that hosts the distributed router gateway port.
>> +# # hv3 hosts nexthop port vif outside1.
>> +# # All other tests connect hypervisors to network n1 through br-phys for tunneling.
>> +# # But in this test, hv1 won't connect to n1(and no br-phys in hv1), and
>> +# # in order to show vlans(instead of tunneling) used between hv1 and hv2,
>> +# # a new network n2 created and hv1 and hv2 connected to this network through br-ex.
>> +# # hv2 and hv3 are still connected to n1 network through br-phys.
>> +net_add n1
>> +
>> +# We are not calling ovn_attach for hv1, to avoid adding br-phys.
>> +# Tunneling won't work in hv1 as ovn-encap-ip is not added to any bridge in hv1
>> +sim_add hv1
>> +as hv1
>> +ovs-vsctl \
>> +    -- set Open_vSwitch . external-ids:system-id=hv1 \
>> +    -- set Open_vSwitch . external-ids:ovn-remote=unix:$ovs_base/ovn-sb/ovn-sb.sock \
>> +    -- set Open_vSwitch . external-ids:ovn-encap-type=geneve,vxlan \
>> +    -- set Open_vSwitch . external-ids:ovn-encap-ip=192.168.0.1 \
>> +    -- add-br br-int \
>> +    -- set bridge br-int fail-mode=secure other-config:disable-in-band=true
>> +start_daemon ovn-controller
>> +ovs-vsctl -- add-port br-int hv1-vif1 -- \
>> +    set interface hv1-vif1 external-ids:iface-id=foo1 \
>> +    ofport-request=1
>> +
>> +sim_add hv2
>> +as hv2
>> +ovs-vsctl add-br br-phys
>> +ovn_attach n1 br-phys 192.168.0.2
>> +
>> +sim_add hv3
>> +as hv3
>> +ovs-vsctl add-br br-phys
>> +ovn_attach n1 br-phys 192.168.0.3
>> +ovs-vsctl -- add-port br-int hv3-vif1 -- \
>> +    set interface hv3-vif1 external-ids:iface-id=outside1 \
>> +    options:tx_pcap=hv3/vif1-tx.pcap \
>> +    options:rxq_pcap=hv3/vif1-rx.pcap \
>> +    ofport-request=1
>> +
>> +# Create network n2 for vlan connectivity between hv1 and hv2
>> +net_add n2
>> +
>> +as hv1
>> +ovs-vsctl add-br br-ex
>> +net_attach n2 br-ex
>> +
>> +as hv2
>> +ovs-vsctl add-br br-ex
>> +net_attach n2 br-ex
>> +
>> +as hv1 ovs-vsctl set open . external-ids:ovn-bridge-mappings="public:br-ex"
>> +as hv2 ovs-vsctl set open . external-ids:ovn-bridge-mappings="public:br-ex,phys:br-phys"
>> +as hv3 ovs-vsctl set open . external-ids:ovn-bridge-mappings=phys:br-phys
>> +OVN_POPULATE_ARP
>> +
>> +ovn-nbctl create Logical_Router name=R1
>> +
>> +ovn-nbctl ls-add foo
>> +ovn-nbctl ls-add alice
>> +ovn-nbctl ls-add outside
>> +
>> +# Connect foo to R1
>> +ovn-nbctl lrp-add R1 foo 00:00:01:01:02:03 192.168.1.1/24
>> +ovn-nbctl lsp-add foo rp-foo -- set Logical_Switch_Port rp-foo \
>> +    type=router options:router-port=foo \
>> +    -- lsp-set-addresses rp-foo router
>> +
>> +# Connect alice to R1 as distributed router gateway port (172.16.1.6) on hv2
>> +ovn-nbctl lrp-add R1 alice 00:00:02:01:02:03 172.16.1.6/24 \
>> +    -- set Logical_Router_Port alice options:redirect-chassis="hv2"
>> +ovn-nbctl lsp-add alice rp-alice -- set Logical_Switch_Port rp-alice \
>> +    type=router options:router-port=alice \
>> +    -- lsp-set-addresses rp-alice router
>> +
>> +# Create logical port foo1 in foo
>> +ovn-nbctl lsp-add foo foo1 \
>> +-- lsp-set-addresses foo1 "f0:00:00:01:02:03 192.168.1.2"
>> +
>> +# Create logical port outside1 in outside, which is a nexthop address
>> +# for 172.16.1.0/24
>> +ovn-nbctl lsp-add outside outside1 \
>> +-- lsp-set-addresses outside1 "f0:00:00:01:02:04 172.16.1.1"
>> +
>> +# Set default gateway (nexthop) to 172.16.1.1
>> +ovn-nbctl lr-route-add R1 "0.0.0.0/0" 172.16.1.1 alice
>> +AT_CHECK([ovn-nbctl lr-nat-add R1 snat 172.16.1.6 192.168.1.1/24])
>> +
>> +ovn-nbctl lsp-add foo ln-foo
>> +ovn-nbctl lsp-set-addresses ln-foo unknown
>> +ovn-nbctl lsp-set-options ln-foo network_name=public
>> +ovn-nbctl lsp-set-type ln-foo localnet
>> +AT_CHECK([ovn-nbctl set Logical_Switch_Port ln-foo tag=2])
>> +
>> +# Create localnet port in alice
>> +ovn-nbctl lsp-add alice ln-alice
>> +ovn-nbctl lsp-set-addresses ln-alice unknown
>> +ovn-nbctl lsp-set-type ln-alice localnet
>> +ovn-nbctl lsp-set-options ln-alice network_name=phys
>> +
>> +# Create localnet port in outside
>> +ovn-nbctl lsp-add outside ln-outside
>> +ovn-nbctl lsp-set-addresses ln-outside unknown
>> +ovn-nbctl lsp-set-type ln-outside localnet
>> +ovn-nbctl lsp-set-options ln-outside network_name=phys
>> +ovn-nbctl --wait=hv sync
>> +
>> +ip_to_hex() {
>> +    printf "%02x%02x%02x%02x" "$@"
>> +}
>> +gw_ip=$(ip_to_hex 172 16 1 6)
>> +src_ip=$(ip_to_hex 192 168 1 2)
>> +dst_ip=$(ip_to_hex 8 8 8 8)
>> +nexthop_ip=$(ip_to_hex 172 16 1 1)
>> +
>> +# Send ip packet from foo1 to 8.8.8.8
>> +src_mac="f00000010203"
>> +dst_mac="000001010203"
>> +packet=${dst_mac}${src_mac}08004500001c0000000040110000${src_ip}${dst_ip}0035111100080000
>> +
>> +as hv1 ovs-appctl netdev-dummy/receive hv1-vif1 $packet
>> +sleep 2
>> +
>> +# ARP request packet to expect at outside1
>> +src_mac="000002010203"
>> +arp_request=ffffffffffff${src_mac}08060001080006040001${src_mac}${gw_ip}000000000000${nexthop_ip}
>> +echo $arp_request >> hv3-vif1.expected
>> +OVN_CHECK_PACKETS([hv3/vif1-tx.pcap], [hv3-vif1.expected])
>> +
>> +# Send ARP reply from outside1 back to the router
>> +reply_mac="f00000010204"
>> +arp_reply=${src_mac}${reply_mac}08060001080006040002${reply_mac}${nexthop_ip}${src_mac}${gw_ip}
>> +
>> +as hv3 ovs-appctl netdev-dummy/receive hv3-vif1 $arp_reply
>> +
>> +# Allow some time for ovn-northd and ovn-controller to catch up.
>> +# XXX This should be more systematic.
>> +sleep 1
>> +
>> +# VLAN tagged packet with distributed gateway port(172.16.1.6) MAC as destination MAC
>> +# is expected on bridge connecting hv1 and hv2
>> +src_mac="f00000010203"
>> +dst_mac="000002010203"
>> +expected=${dst_mac}${src_mac}8100000208004500001c0000000040110000${src_ip}${dst_ip}0035111100080000
>> +echo $expected > hv1-br-ex_n2.expected
>> +
>> +# Packet to Expect at outside1 i.e nexthop(172.16.1.1) port.
>> +# As connection tracking not enabled for this test, snat can't be done on the packet.
>> +# We still see foo1 as the source ip address. But source mac(172.16.1.6 MAC) and
>> +# dest mac(172.16.1.1 mac) are properly configured.
>> +src_mac="000002010203"
>> +dst_mac="f00000010204"
>> +expected=${dst_mac}${src_mac}08004500001c000000003f110100${src_ip}${dst_ip}0035111100080000
>> +echo $expected > hv3-vif1.expected
>> +
>> +reset_pcap_file() {
>> +    local iface=$1
>> +    local pcap_file=$2
>> +    ovs-vsctl -- set Interface $iface options:tx_pcap=dummy-tx.pcap \
>> +options:rxq_pcap=dummy-rx.pcap
>> +    rm -f ${pcap_file}*.pcap
>> +    ovs-vsctl -- set Interface $iface options:tx_pcap=${pcap_file}-tx.pcap \
>> +options:rxq_pcap=${pcap_file}-rx.pcap
>> +}
>> +
>> +as hv1 reset_pcap_file br-ex_n2 hv1/br-ex_n2
>> +as hv3 reset_pcap_file hv3-vif1 hv3/vif1
>> +sleep 1
>> +as hv1 ovs-appctl netdev-dummy/receive hv1-vif1 $packet
>> +sleep 2
>> +
>> +# On hv1, table 65 for packets going from vlan switch pipleline to router pipleine
>> +# set MLF_RCV_FROM_VLAN flag
>> +AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=65 | grep "priority=100,reg15=0x1,metadata=0x2" \
>> +| grep actions=clone | grep "load:0x1->NXM_NX_REG10" | wc -l], [0], [[1
>> +]])
>> +# On hv1, because of snat rule in table 15, a higher priority(i.e 2) flow
>> +# added for packets with MLF_RCV_FROM_VLAN flag with output as distributed
>> +# gateway port, which sets REGBIT_NAT_REDIRECT flag
>> +AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=15 | grep "priority=2,ip,reg10=0x20/0x20,metadata=0x1" \
>> +| grep "actions=load:0x1->OXM_OF_PKT_REG4" | wc -l], [0], [[1
>> +]])
>> +
>> +# On hv1, table 32 flow which tags packet with source network vlan tag and sends it to hv2
>> +# through br-ex
>> +AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=32 | grep "priority=150,reg14=0x1,reg15=0x3,metadata=0x1" \
>> +| grep "actions=mod_vlan_vid:2" | grep "n_packets=2," | wc -l], [0], [[1
>> +]])
>> +
>> +# On hv2 table 0, vlan tagged packet is sent through router pipeline
>> +# by setting MLF_RCV_FROM_VLAN flag (REG10)
>> +AT_CHECK([as hv2 ovs-ofctl dump-flows br-int | grep "table=0," | grep "priority=150" | grep "dl_vlan=2" | \
>> +grep "actions=strip_vlan,load:0x1->OXM_OF_METADATA" | grep "load:0x1->NXM_NX_REG14" | \
>> +grep "load:0x1->NXM_NX_REG10" | wc -l], [0], [[1
>> +]])
>> +# on hv2 table 8, allow packets with router metadata and with MLF_RCV_FROM_VLAN flag
>> +AT_CHECK([as hv2 ovs-ofctl dump-flows br-int table=8 | grep "priority=100,reg10=0x20/0x20,reg14=0x1,metadata=0x1" | wc -l], [0], [[1
>> +]])
>> +
>> +# Check vlan tagged packet on the bridge connecting hv1 and hv2
>> +OVN_CHECK_PACKETS([hv1/br-ex_n2-tx.pcap], [hv1-br-ex_n2.expected])
>> +# Check expected packet on nexthop interface
>> +OVN_CHECK_PACKETS([hv3/vif1-tx.pcap], [hv3-vif1.expected])
>> +
>> +OVN_CLEANUP([hv1],[hv2],[hv3])
>> +AT_CLEANUP
>> +
>>  AT_SETUP([ovn -- 1 LR with distributed router gateway port])
>>  AT_SKIP_IF([test $HAVE_PYTHON = no])
>>  ovn_start
>> --
>> 1.8.3.1
> --
> Russell Bryant
Anil Venkata May 31, 2018, 6:43 p.m. | #3
Thanks Russell.

On Thu, May 31, 2018 at 1:41 AM, Russell Bryant <russell@ovn.org> wrote:

> One more general question:
>
> a major difference when doing the redirect to the gateway via a VLAN
> vs a geneve tunnel is the lack of metadata.  You've demonstrated how
> it's easy enough to identify the network (the VLAN ID + the port it
> arrived on).  How about the logical input / output IDs?  What values
> are included when the packet is sent over the tunnel?  Are we
> confident those values are not needed, or can be inferred another way
> in this scenario?
>

When the packet is sent over the geneve tunnel, the tunnel key consists of
the following values:
1) router patch port connected to the tenant vlan network, as the logical
   input port
2) chassisredirect port, as the logical output port
3) router datapath, as the logical datapath
When the gateway chassis receives the packet from the tunnel, it ignores the
logical input port and sends the packet directly to table 33 (which sets the
output to the distributed gateway port) for egress processing.
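
As a rough sketch (not code from this patch), the three values above can be
pictured with the Geneve layout described in ovn-architecture(7): the 24-bit
logical datapath key travels in the VNI, and a 32-bit option value carries
1 reserved bit, the 15-bit logical input port and the 16-bit logical output
port. The struct and function names below are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative only: these names are not taken from the OVN source. */
struct tunnel_md {
    uint32_t datapath;   /* 24-bit logical datapath key (Geneve VNI). */
    uint16_t inport;     /* 15-bit logical input port key. */
    uint16_t outport;    /* 16-bit logical output port key. */
};

/* Pack the port keys into the 32-bit option value:
 * 1 reserved bit (0), 15 bits inport, 16 bits outport. */
static uint32_t
pack_port_option(const struct tunnel_md *md)
{
    assert(md->inport < (1u << 15));
    return ((uint32_t) md->inport << 16) | md->outport;
}

/* Recover the metadata on the receiving chassis. */
static struct tunnel_md
unpack_tunnel_md(uint32_t vni, uint32_t option)
{
    struct tunnel_md md;

    md.datapath = vni & 0xffffff;
    md.inport = (option >> 16) & 0x7fff;
    md.outport = option & 0xffff;
    return md;
}
```

A plain 802.1Q tag has no room for these fields, which is why the VLAN path
has to reconstruct them on the gateway chassis.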

When the gateway chassis receives a vlan tagged packet, we don't have the
logical input and output details, so this patch sets the
"router patch port connected to tenant vlan network" as the logical input
port and sends the packet through the router ingress pipeline to determine
the logical output port.

If we check the destination MAC along with the VLAN ID in table 0 on the
gateway chassis, as suggested by Miguel and Jakub, and the destination MAC
belongs to the distributed gateway port, then we can set the output port to
the distributed gateway port in table 0 and pass the packet directly to the
router egress pipeline (skipping the router ingress pipeline).


> On Wed, May 30, 2018 at 3:59 PM, Russell Bryant <russell@ovn.org> wrote:
> > On Fri, May 25, 2018 at 7:33 AM,  <vkommadi@redhat.com> wrote:
> >> From: venkata anil <vkommadi@redhat.com>
> >>
> >> When a vm on a vlan tenant network sends traffic to an external network,
> >> it is tunneled from host chassis to gateway chassis. In the earlier
> >> discussion [1], Russell (also in his doc [2]) suggested if we can figure
> >> out a way for OVN to do this redirect to the gateway host over a VLAN
> >> network. This patch implements his suggestion i.e will redirect to
> >> gateway chassis using incoming tenant vlan network. Gateway chassis are
> >> expected to be configured with tenant vlan networks. In this approach,
> >> new logical and physical flows introduced for packet processing in both
> >> host and gateway chassis.
> >
> > I don't think we can impose the expectation that the gateway is on the
> > same vlan network as the original compute node.  The previous behavior
> > of using the tunnel does not require that.
> >
>


I thought we could impose this expectation because, in an OpenStack DVR
environment, to use centralized SNAT for DVR routers, the centralized router
(on the network node) has connectivity to the same tenant vlan network that
the compute node is connected to. Indeed, a new port is created on each
tenant network and these ports are added to the centralized SNAT router
namespace. The VM sends packets to these ports on the centralized SNAT
router through this tenant vlan network.


> > Have you thought of whether we could use the new behavior
> > automatically if we know both chassis are on the same network, or fall
> > back to a tunnel if necessary?
> >
>


No, I didn't think about that. Thanks for the suggestion.
If we can't impose the above expectation, then we can implement your
suggestion of falling back to tunnelling.
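
One possible shape for such a fallback, sketched under assumptions: treat
two chassis as "on the same network" when both list the tenant switch's
localnet network name in their ovn-bridge-mappings, and tunnel otherwise.
The types and helper names are hypothetical and not part of this patch; a
matching network name also does not by itself prove L2 reachability, so
this is only a first approximation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum redirect_type { REDIRECT_TUNNEL, REDIRECT_VLAN };

/* Hypothetical view of a chassis: the physical network names found in
 * its ovn-bridge-mappings. */
struct chassis_nets {
    const char *const *networks;
    size_t n_networks;
};

static bool
chassis_has_network(const struct chassis_nets *ch, const char *name)
{
    for (size_t i = 0; i < ch->n_networks; i++) {
        if (!strcmp(ch->networks[i], name)) {
            return true;
        }
    }
    return false;
}

/* Use the vlan redirect only when the tenant network is tagged and both
 * chassis map it; otherwise keep the existing tunnel behaviour. */
static enum redirect_type
choose_redirect(const struct chassis_nets *compute,
                const struct chassis_nets *gateway,
                const char *tenant_network, int vlan_tag)
{
    assert(tenant_network != NULL);
    if (vlan_tag
        && chassis_has_network(compute, tenant_network)
        && chassis_has_network(gateway, tenant_network)) {
        return REDIRECT_VLAN;
    }
    return REDIRECT_TUNNEL;
}
```

With the bridge mappings used in the test (hv1: public; hv2: public and
phys; hv3: phys), hv1 to hv2 would pick the vlan path for network "public",
while hv1 to hv3 would fall back to the tunnel.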


> >>
> >> Packet processing in the host chassis:
> >> 1) A new ovs flow added in physical table 65, which sets MLF_RCV_FROM_VLAN
> >>    flag for packets from vlan network entering into router pipeline
> >> 2) A new flow added in lr_in_ip_routing, for packets output through
> >>    distributed gateway port and matching MLF_RCV_FROM_VLAN flag,
> >>    set REGBIT_NAT_REDIRECT i.e
> >>    table=7 (lr_in_ip_routing   ), priority=2    , match=(
> >>    ip4.dst == 0.0.0.0/0 && flags.rcv_from_vlan == 1 &&
> >>    !is_chassis_resident("cr-alice")), action=(reg9[0] = 1; next;)
> >>    This flow will be set only on chassis not hosting chassisredirect
> >>    port i.e compute node.
> >>    When REGBIT_NAT_REDIRECT set,
> >>    a) lr_in_arp_resolve, will set packet eth.dst to distributed gateway
> >>       port MAC
> >>    b) lr_in_gw_redirect, will set chassisredirect port as outport
> >> 3) A new ovs flow added in physical table 32 will use source vlan tenant
> >>    network tag as vlan ID for sending the packet to gateway chassis.
> >>    As this vlan packet destination MAC is distributed gateway port MAC,
> >>    packet will only reach the gateway chassis.
> >>    table=32,priority=150,reg14=0x3,reg15=0x6,metadata=0x4
> >>    actions=mod_vlan_vid:2010,output:25,strip_vlan
> >>    This flow will be set only on chassis not hosting chassisredirect
> >>    port i.e compute node.
> >>
> >> Packet processing in the gateway chassis:
> >> 1) A new ovs flow added in physical table 0 to pass vlan traffic coming
> >>    from localnet port to the connected router pipeline(i.e router
> >>    attached to vlan tenant network).
> >>    This flow will set router metadata, reg14 to router's patch port(lrp)
> >>    (i.e patch port connecting router and vlan tenant network) and a new
> >>    MLF_RCV_FROM_VLAN flag.
> >>    table=0,priority=150,in_port=67,dl_vlan=2010 actions=strip_vlan,
> >>    load:0x4->OXM_OF_METADATA[],load:0x3->NXM_NX_REG14[],
> >>    load:0x1->NXM_NX_REG10[5],resubmit(,8)
> >>    This flow will be set only on chassis hosting chassisredirect
> >>    port i.e gateway node.
> >> 2) A new flow added in lr_in_admission which checks MLF_RCV_FROM_VLAN
> >>    and allows the packet. This flow will be set only on chassis hosting
> >>    chassisredirect port i.e gateway node.
> >>    table=0 (lr_in_admission    ), priority=100  , match=(
> >>    flags.rcv_from_vlan == 1 && inport == "lrp-44383893-613a-4bfe-b483-
> >>    e7d0dc3055cd" && is_chassis_resident("cr-lrp-a6e3d2ab-313a-4ea3-
> >>    8ec4-c3c774a11f49")), action=(next;)
> >>    Then packet will pass through router ingress and egress pipelines and
> >>    then to external switch pipeline.
> >>
> >> In a scenario where the traffic between two vms in the same tenant vlan
> >> network across different chassis i.e if "vm1" on tenant vlan network
> >> "net1" is on host chassis "ch1" and "vm2" on same tenant vlan network
> >> "net1" is on gateway chassis "gw1". When the packet arrived on "gw1"
> >> chassis from localnet port, we still send it to router pipeline and router
> >> pipeline will send it to destination switch ("net1") pipeline.
> >
> > Why is this?  Wouldn't the packet just have a destination MAC for "vm2"?
> >
>

When the packet arrives at the gateway chassis, in physical table 0 we do
not check the destination MAC address of vlan packets and instead force the
packet through the router pipeline.
Ajo and Jakub suggested checking the destination MAC address along with the
vlan ID in physical table 0: if the destination MAC address belongs to the
distributed gateway port, send the packet through the router pipeline;
otherwise send it through the switch pipeline. Without this check,
performance suffers for vms in the same tenant vlan network on different
hypervisors (i.e. compute and gateway nodes), because we force the packet
through the router pipeline on the gateway chassis.
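
The suggested check amounts to a single comparison on the destination MAC.
A minimal sketch, with hypothetical names that do not appear in the patch
(the real change would be an extra dl_dst match on the table 0 flow rather
than a C function):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum vlan_rx_pipeline { RX_SWITCH_PIPELINE, RX_ROUTER_PIPELINE };

/* Decide where a vlan tagged packet arriving on the localnet port of the
 * gateway chassis should go.  'dgp_mac' is the distributed gateway port
 * MAC; anything else stays in the switch pipeline. */
static enum vlan_rx_pipeline
classify_vlan_rx(const uint8_t dl_dst[6], const uint8_t dgp_mac[6])
{
    assert(dl_dst && dgp_mac);
    return memcmp(dl_dst, dgp_mac, 6) ? RX_SWITCH_PIPELINE
                                      : RX_ROUTER_PIPELINE;
}
```

Traffic between two vms on the same tenant vlan network would then keep its
original source MAC, since it never enters the router pipeline.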



> >> But in this case when packet arrives at "vm2", it will have router MAC as
> >> source MAC as the packet is routed in gateway chassis. This behaviour can
> >> be seen only for destination vms hosted on gateway node.
> >>
> >> [1] https://mail.openvswitch.org/pipermail/ovs-discuss/2018-April/046557.html
> >> [2] Point 3 in section 3.3.1 - Future Enhancements
> >> https://docs.google.com/document/d/1JecGIXPH0RAqfGvD0nmtBdEU1zflHACp8WSRnKCFSgg/edit#
> >>
> >> Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2018-April/046543.html
> >>
> >> Signed-off-by: Venkata Anil <vkommadi@redhat.com>
> >> ---
> >>  ovn/controller/bfd.c            |   3 +-
> >>  ovn/controller/binding.c        |  10 +-
> >>  ovn/controller/ovn-controller.c |   3 +
> >>  ovn/controller/ovn-controller.h |  16 ++-
> >>  ovn/controller/physical.c       |  94 ++++++++++++++++-
> >>  ovn/lib/logical-fields.c        |   4 +
> >>  ovn/lib/logical-fields.h        |   2 +
> >>  ovn/northd/ovn-northd.c         |  35 +++++++
> >>  tests/ovn.at                    | 227 ++++++++++++++++++++++++++++++++++++++++
> >>  9 files changed, 390 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/ovn/controller/bfd.c b/ovn/controller/bfd.c
> >> index 8f020d5..cbbd3ba 100644
> >> --- a/ovn/controller/bfd.c
> >> +++ b/ovn/controller/bfd.c
> >> @@ -139,8 +139,9 @@ bfd_travel_gw_related_chassis(struct local_datapath *dp,
> >>                                    struct local_datapath_node, node);
> >>          dp = dp_binding->dp;
> >>          free(dp_binding);
> >> +        const struct sbrec_datapath_binding *pdp;
> >>          for (size_t i = 0; i < dp->n_peer_dps; i++) {
> >> -            const struct sbrec_datapath_binding *pdp = dp->peer_dps[i];
> >> +            pdp = dp->peer_dps[i]->peer_dp;
> >>              if (!pdp) {
> >>                  continue;
> >>              }
> >> diff --git a/ovn/controller/binding.c b/ovn/controller/binding.c
> >> index 0785a94..f02bde5 100644
> >> --- a/ovn/controller/binding.c
> >> +++ b/ovn/controller/binding.c
> >> @@ -148,10 +148,14 @@ add_local_datapath__(struct controller_ctx *ctx,
> >>                                  "lport-by-datapath", &cursor);
> >>
> >>      SBREC_PORT_BINDING_FOR_EACH_EQUAL (pb, &cursor, lpval) {
> >> +        if (!strcmp(pb->type, "chassisredirect")) {
> >> +            ld->chassisredirect_port = pb;
> >> +        }
> >>          if (!strcmp(pb->type, "patch")) {
> >>              const char *peer_name = smap_get(&pb->options, "peer");
> >>              if (peer_name) {
> >>                  const struct sbrec_port_binding *peer;
> >> +                struct peer_datapath *pdp;
> >>
> >>                  peer = lport_lookup_by_name( ctx->ovnsb_idl, peer_name);
> >>
> >> @@ -162,8 +166,12 @@ add_local_datapath__(struct controller_ctx *ctx,
> >>                      ld->peer_dps = xrealloc(
> >>                              ld->peer_dps,
> >>                              ld->n_peer_dps * sizeof *ld->peer_dps);
> >> -                    ld->peer_dps[ld->n_peer_dps - 1] = datapath_lookup_by_key(
> >> +                    pdp = xcalloc(1, sizeof(struct peer_datapath));
> >> +                    pdp->peer_dp = datapath_lookup_by_key(
> >>                          ctx->ovnsb_idl, peer->datapath->tunnel_key);
> >> +                    pdp->patch = pb;
> >> +                    pdp->peer = peer;
> >> +                    ld->peer_dps[ld->n_peer_dps - 1] = pdp;
> >>                  }
> >>              }
> >>          }
> >> diff --git a/ovn/controller/ovn-controller.c b/ovn/controller/ovn-controller.c
> >> index 86e1836..55573fd 100644
> >> --- a/ovn/controller/ovn-controller.c
> >> +++ b/ovn/controller/ovn-controller.c
> >> @@ -803,6 +803,9 @@ main(int argc, char *argv[])
> >>
> >>          struct local_datapath *cur_node, *next_node;
> >>          HMAP_FOR_EACH_SAFE (cur_node, next_node, hmap_node, &local_datapaths) {
> >> +            for (int i = 0; i < cur_node->n_peer_dps; i++) {
> >> +                free(cur_node->peer_dps[i]);
> >> +            }
> >>              free(cur_node->peer_dps);
> >>              hmap_remove(&local_datapaths, &cur_node->hmap_node);
> >>              free(cur_node);
> >> diff --git a/ovn/controller/ovn-controller.h b/ovn/controller/ovn-controller.h
> >> index 6617b0c..8023de2 100644
> >> --- a/ovn/controller/ovn-controller.h
> >> +++ b/ovn/controller/ovn-controller.h
> >> @@ -46,6 +46,17 @@ struct ct_zone_pending_entry {
> >>      enum ct_zone_pending_state state;
> >>  };
> >>
> >> +/* Represents a peer datapath connected to a given datapath */
> >> +struct peer_datapath {
> >> +    const struct sbrec_datapath_binding *peer_dp;
> >> +
> >> +    /* Patch port connected to local datapath */
> >> +    const struct sbrec_port_binding *patch;
> >> +
> >> +    /* Peer patch port connected to peer datapath */
> >> +    const struct sbrec_port_binding *peer;
> >> +};
> >> +
> >>  /* A logical datapath that has some relevance to this hypervisor.  A logical
> >>   * datapath D is relevant to hypervisor H if:
> >>   *
> >> @@ -63,10 +74,13 @@ struct local_datapath {
> >>      /* The localnet port in this datapath, if any (at most one is allowed). */
> >>      const struct sbrec_port_binding *localnet_port;
> >>
> >> +    /* The chassisredirect port in this datapath */
> >> +    const struct sbrec_port_binding *chassisredirect_port;
> >> +
> >>      /* True if this datapath contains an l3gateway port located on this
> >>       * hypervisor. */
> >>      bool has_local_l3gateway;
> >> -    const struct sbrec_datapath_binding **peer_dps;
> >> +    struct peer_datapath **peer_dps;
> >>      size_t n_peer_dps;
> >>  };
> >>
> >> diff --git a/ovn/controller/physical.c b/ovn/controller/physical.c
> >> index fc8adcf..ad57f69 100644
> >> --- a/ovn/controller/physical.c
> >> +++ b/ovn/controller/physical.c
> >> @@ -304,7 +304,8 @@ consider_port_binding(struct controller_ctx *ctx,
> >>  {
> >>      uint32_t dp_key = binding->datapath->tunnel_key;
> >>      uint32_t port_key = binding->tunnel_key;
> >> -    if (!get_local_datapath(local_datapaths, dp_key)) {
> >> +    struct local_datapath *ld = get_local_datapath(local_datapaths, dp_key);
> >> +    if (!ld) {
> >>          return;
> >>      }
> >>
> >> @@ -350,6 +351,14 @@ consider_port_binding(struct controller_ctx *ctx,
> >>              put_load(0, MFF_LOG_REG0 + i, 0, 32, ofpacts_p);
> >>          }
> >>          put_load(0, MFF_IN_PORT, 0, 16, ofpacts_p);
> >> +        if (ld->localnet_port) {
> >> +            int vlan_tag = (ld->localnet_port->n_tag ?
> >> +                            *ld->localnet_port->tag : 0);
> >> +            if (vlan_tag) {
> >> +                put_load(1, MFF_LOG_FLAGS, MLF_RCV_FROM_VLAN_BIT, 1,
> >> +                         ofpacts_p);
> >> +            }
> >> +        }
> >>          put_resubmit(OFTABLE_LOG_INGRESS_PIPELINE, ofpacts_p);
> >>          clone = ofpbuf_at_assert(ofpacts_p, clone_ofs, sizeof *clone);
> >>          ofpacts_p->header = clone;
> >> @@ -539,6 +548,47 @@ consider_port_binding(struct controller_ctx *ctx,
> >>           * input port, MFF_LOG_DATAPATH to the logical datapath, and
> >>           * resubmit into the logical ingress pipeline starting at table
> >>           * 16. */
> >> +
> >> +        /* Match a VLAN tag and strip it. If the vlan network is connected
> >> +         * to a router which has a gateway port on redirect-chassis,
> >> +         * set MLF_RCV_FROM_VLAN flag, router metadata and input port to
> >> +         * connecting patch port */
> >> +        int vlan_tag = binding->n_tag ? *binding->tag : 0;
> >> +        if (!strcmp(binding->type, "localnet") && vlan_tag) {
> >> +            struct local_datapath *ldp = get_local_datapath(
> >> +                local_datapaths, binding->datapath->tunnel_key);
> >> +            for (int i = 0; i < ldp->n_peer_dps; i++) {
> >> +                struct local_datapath *peer_ldp = get_local_datapath(
> >> +                    local_datapaths, ldp->peer_dps[i]->peer_dp->tunnel_key);
> >> +                const struct sbrec_port_binding *crp;
> >> +                crp = peer_ldp->chassisredirect_port;
> >> +                if (crp && crp->chassis &&
> >> +                   !strcmp(crp->chassis->name, chassis->name)) {
> >> +                    const char *gwp = smap_get(&crp->options,
> >> +                                               "distributed-port");
> >> +                    if (strcmp(gwp, ldp->peer_dps[i]->peer->logical_port)) {
> >> +                        ofpbuf_clear(ofpacts_p);
> >> +                        match_init_catchall(&match);
> >> +
> >> +                        match_set_in_port(&match, ofport);
> >> +                        match_set_dl_vlan(&match, htons(vlan_tag));
> >> +
> >> +                        ofpact_put_STRIP_VLAN(ofpacts_p);
> >> +                        put_load(peer_ldp->datapath->tunnel_key,
> >> +                                 MFF_LOG_DATAPATH, 0, 64, ofpacts_p);
> >> +                        put_load(ldp->peer_dps[i]->peer->tunnel_key,
> >> +                                 MFF_LOG_INPORT, 0, 32, ofpacts_p);
> >> +                        put_load(1, MFF_LOG_FLAGS,
> >> +                                 MLF_RCV_FROM_VLAN_BIT, 1, ofpacts_p);
> >> +                        put_resubmit(OFTABLE_LOG_INGRESS_PIPELINE, ofpacts_p);
> >> +
> >> +                        ofctrl_add_flow(flow_table, OFTABLE_PHY_TO_LOG,
> >> +                                        150, 0, &match, ofpacts_p);
> >> +                    }
> >> +                }
> >> +            }
> >> +        }
> >> +
> >>          ofpbuf_clear(ofpacts_p);
> >>          match_init_catchall(&match);
> >>          match_set_in_port(&match, ofport);
> >> @@ -639,6 +689,48 @@ consider_port_binding(struct controller_ctx *ctx,
> >>           * flow matches an output port that includes a logical port on a remote
> >>           * hypervisor, and tunnels the packet to that hypervisor.
> >>           */
> >> +
> >> +        /* For each vlan network connected to the router, add that network's
> >> +         * vlan tag to the packet and output it through localnet port */
> >> +        struct local_datapath *ldp = get_local_datapath(local_datapaths,
> >> +                                                        dp_key);
> >> +        for (int i = 0; i < ldp->n_peer_dps; i++) {
> >> +            struct ofpact_vlan_vid *vlan_vid;
> >> +            ofp_port_t port_ofport = 0;
> >> +            struct peer_datapath *pdp = ldp->peer_dps[i];
> >> +            struct local_datapath *peer_ldp = get_local_datapath(
> >> +                local_datapaths, pdp->peer_dp->tunnel_key);
> >> +            if (peer_ldp->localnet_port && pdp->patch->tunnel_key) {
> >> +                int64_t vlan_tag = (peer_ldp->localnet_port->n_tag ?
> >> +                                    *peer_ldp->localnet_port->tag : 0);
> >> +                if (!vlan_tag) {
> >> +                    continue;
> >> +                }
> >> +                port_ofport = u16_to_ofp(simap_get(&localvif_to_ofport,
> >> +                    peer_ldp->localnet_port->logical_port));
> >> +                if (!port_ofport) {
> >> +                    continue;
> >> +                }
> >> +
> >> +                match_init_catchall(&match);
> >> +                ofpbuf_clear(ofpacts_p);
> >> +
> >> +                match_set_metadata(&match, htonll(dp_key));
> >> +                match_set_reg(&match, MFF_LOG_INPORT - MFF_REG0,
> >> +                              pdp->patch->tunnel_key);
> >> +                match_set_reg(&match, MFF_LOG_OUTPORT - MFF_REG0, port_key);
> >> +
> >> +                vlan_vid = ofpact_put_SET_VLAN_VID(ofpacts_p);
> >> +                vlan_vid->vlan_vid = vlan_tag;
> >> +                vlan_vid->push_vlan_if_needed = true;
> >> +                ofpact_put_OUTPUT(ofpacts_p)->port = port_ofport;
> >> +                ofpact_put_STRIP_VLAN(ofpacts_p);
> >> +
> >> +                ofctrl_add_flow(flow_table, OFTABLE_REMOTE_OUTPUT, 150, 0,
> >> +                                &match, ofpacts_p);
> >> +            }
> >> +        }
> >> +
> >>          match_init_catchall(&match);
> >>          ofpbuf_clear(ofpacts_p);
> >>
> >> diff --git a/ovn/lib/logical-fields.c b/ovn/lib/logical-fields.c
> >> index a8b5e3c..b9efa02 100644
> >> --- a/ovn/lib/logical-fields.c
> >> +++ b/ovn/lib/logical-fields.c
> >> @@ -105,6 +105,10 @@ ovn_init_symtab(struct shash *symtab)
> >>               MLF_FORCE_SNAT_FOR_LB_BIT);
> >>      expr_symtab_add_subfield(symtab, "flags.force_snat_for_lb", NULL,
> >>                               flags_str);
> >> +    snprintf(flags_str, sizeof flags_str, "flags[%d]",
> >> +             MLF_RCV_FROM_VLAN_BIT);
> >> +    expr_symtab_add_subfield(symtab, "flags.rcv_from_vlan", NULL,
> >> +                             flags_str);
> >>
> >>      /* Connection tracking state. */
> >>      expr_symtab_add_field(symtab, "ct_mark", MFF_CT_MARK, NULL, false);
> >> diff --git a/ovn/lib/logical-fields.h b/ovn/lib/logical-fields.h
> >> index b1dbb03..96250fd 100644
> >> --- a/ovn/lib/logical-fields.h
> >> +++ b/ovn/lib/logical-fields.h
> >> @@ -50,6 +50,7 @@ enum mff_log_flags_bits {
> >>      MLF_FORCE_SNAT_FOR_DNAT_BIT = 2,
> >>      MLF_FORCE_SNAT_FOR_LB_BIT = 3,
> >>      MLF_LOCAL_ONLY_BIT = 4,
> >> +    MLF_RCV_FROM_VLAN_BIT = 5,
> >>  };
> >>
> >>  /* MFF_LOG_FLAGS_REG flag assignments */
> >> @@ -75,6 +76,7 @@ enum mff_log_flags {
> >>       * hypervisors should instead only be output to local targets
> >>       */
> >>      MLF_LOCAL_ONLY = (1 << MLF_LOCAL_ONLY_BIT),
> >> +    MLF_RCV_FROM_VLAN = (1 << MLF_RCV_FROM_VLAN_BIT),
> >>  };
> >>
> >>  #endif /* ovn/lib/logical-fields.h */
> >> diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
> >> index 0e06776..f68da2b 100644
> >> --- a/ovn/northd/ovn-northd.c
> >> +++ b/ovn/northd/ovn-northd.c
> >> @@ -4411,6 +4411,28 @@ add_route(struct hmap *lflows, const struct ovn_port *op,
> >>       * routing. */
> >>      ovn_lflow_add(lflows, op->od, S_ROUTER_IN_IP_ROUTING, priority,
> >>                    ds_cstr(&match), ds_cstr(&actions));
> >> +
> >> +    /* When output port is distributed gateway port, check if the router
> >> +     * input port is a patch port connected to vlan network.
> >> +     * Traffic from VLAN network to external network should be redirected
> >> +     * to "redirect-chassis" by setting REGBIT_NAT_REDIRECT flag.
> >> +     * Later physical table 32 will output this traffic to gateway
> >> +     * chassis using input network vlan tag */
> >> +    if (op == op->od->l3dgw_port) {
> >> +        ds_clear(&match);
> >> +        ds_clear(&actions);
> >> +
> >> +        ds_put_format(&match, "ip%s.%s == %s/%d", is_ipv4 ? "4" : "6",
> >> +                      dir, network_s, plen);
> >> +        ds_put_format(&match, " && flags.rcv_from_vlan == 1");
> >> +        ds_put_format(&match, " && !is_chassis_resident(%s)",
> >> +                      op->od->l3redirect_port->json_key);
> >> +
> >> +        ovn_lflow_add(lflows, op->od, S_ROUTER_IN_IP_ROUTING,
> >> +                      priority + 1, ds_cstr(&match),
> >> +                      REGBIT_NAT_REDIRECT" = 1; next;");
> >> +    }
> >> +
> >>      ds_destroy(&match);
> >>      ds_destroy(&actions);
> >>  }
> >> @@ -4822,6 +4844,19 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
> >>          }
> >>          ovn_lflow_add(lflows, op->od, S_ROUTER_IN_ADMISSION, 50,
> >>                        ds_cstr(&match), "next;");
> >> +
> >> +        /* VLAN traffic from localnet port should be allowed for
> >> +         * router processing on the "redirect-chassis". */
> >> +        if (op->od->l3dgw_port && op->od->l3redirect_port && op->peer &&
> >> +            op->peer->od->localnet_port && (op != op->od->l3dgw_port)) {
> >> +            ds_clear(&match);
> >> +            ds_put_format(&match, "flags.rcv_from_vlan == 1");
> >> +            ds_put_format(&match, " && inport == %s", op->json_key);
> >> +            ds_put_format(&match, " && is_chassis_resident(%s)",
> >> +                          op->od->l3redirect_port->json_key);
> >> +            ovn_lflow_add(lflows, op->od, S_ROUTER_IN_ADMISSION, 100,
> >> +                          ds_cstr(&match), "next;");
> >> +        }
> >>      }
> >>
> >>      /* Logical router ingress table 1: IP Input. */
> >> diff --git a/tests/ovn.at b/tests/ovn.at
> >> index f12c24c..6916bd0 100644
> >> --- a/tests/ovn.at
> >> +++ b/tests/ovn.at
> >> @@ -7713,6 +7713,233 @@ test_ip_packet gw2 gw1
> >>  OVN_CLEANUP([hv1],[gw1],[gw2],[ext1])
> >>  AT_CLEANUP
> >>
> >> +# VLAN traffic for external network redirected through distributed router gateway port
> >> +# should use vlans(i.e input network vlan tag) across hypervisors instead of tunneling.
> >> +AT_SETUP([ovn -- vlan traffic for external network with distributed router gateway port])
> >> +AT_SKIP_IF([test $HAVE_PYTHON = no])
> >> +ovn_start
> >> +
> >> +# Logical network:
> >> +# # One LR R1 that has switches foo (192.168.1.0/24) and
> >> +# # alice (172.16.1.0/24) connected to it.  The logical port
> >> +# # between R1 and alice has a "redirect-chassis" specified,
> >> +# # i.e. it is the distributed router gateway port(172.16.1.6).
> >> +# # Switch alice also has a localnet port defined.
> >> +# # An additional switch outside has the same subnet as alice
> >> +# # (172.16.1.0/24), a localnet port and nexthop port(172.16.1.1)
> >> +# # which will receive the packet destined for external network
> >> +# # (i.e 8.8.8.8 as destination ip).
> >> +
> >> +# Physical network:
> >> +# # Three hypervisors hv[123].
> >> +# # hv1 hosts vif foo1.
> >> +# # hv2 is the "redirect-chassis" that hosts the distributed router gateway port.
> >> +# # hv3 hosts nexthop port vif outside1.
> >> +# # All other tests connect hypervisors to network n1 through br-phys for tunneling.
> >> +# # But in this test, hv1 won't connect to n1(and no br-phys in hv1), and
> >> +# # in order to show vlans(instead of tunneling) used between hv1 and hv2,
> >> +# # a new network n2 created and hv1 and hv2 connected to this network through br-ex.
> >> +# # hv2 and hv3 are still connected to n1 network through br-phys.
> >> +net_add n1
> >> +
> >> +# We are not calling ovn_attach for hv1, to avoid adding br-phys.
> >> +# Tunneling won't work in hv1 as ovn-encap-ip is not added to any bridge in hv1
> >> +sim_add hv1
> >> +as hv1
> >> +ovs-vsctl \
> >> +    -- set Open_vSwitch . external-ids:system-id=hv1 \
> >> +    -- set Open_vSwitch . external-ids:ovn-remote=unix:$ovs_base/ovn-sb/ovn-sb.sock \
> >> +    -- set Open_vSwitch . external-ids:ovn-encap-type=geneve,vxlan \
> >> +    -- set Open_vSwitch . external-ids:ovn-encap-ip=192.168.0.1 \
> >> +    -- add-br br-int \
> >> +    -- set bridge br-int fail-mode=secure other-config:disable-in-band=true
> >> +start_daemon ovn-controller
> >> +ovs-vsctl -- add-port br-int hv1-vif1 -- \
> >> +    set interface hv1-vif1 external-ids:iface-id=foo1 \
> >> +    ofport-request=1
> >> +
> >> +sim_add hv2
> >> +as hv2
> >> +ovs-vsctl add-br br-phys
> >> +ovn_attach n1 br-phys 192.168.0.2
> >> +
> >> +sim_add hv3
> >> +as hv3
> >> +ovs-vsctl add-br br-phys
> >> +ovn_attach n1 br-phys 192.168.0.3
> >> +ovs-vsctl -- add-port br-int hv3-vif1 -- \
> >> +    set interface hv3-vif1 external-ids:iface-id=outside1 \
> >> +    options:tx_pcap=hv3/vif1-tx.pcap \
> >> +    options:rxq_pcap=hv3/vif1-rx.pcap \
> >> +    ofport-request=1
> >> +
> >> +# Create network n2 for vlan connectivity between hv1 and hv2
> >> +net_add n2
> >> +
> >> +as hv1
> >> +ovs-vsctl add-br br-ex
> >> +net_attach n2 br-ex
> >> +
> >> +as hv2
> >> +ovs-vsctl add-br br-ex
> >> +net_attach n2 br-ex
> >> +
> >> +as hv1 ovs-vsctl set open . external-ids:ovn-bridge-mappings="public:br-ex"
> >> +as hv2 ovs-vsctl set open . external-ids:ovn-bridge-mappings="public:br-ex,phys:br-phys"
> >> +as hv3 ovs-vsctl set open . external-ids:ovn-bridge-mappings=phys:br-phys
> >> +OVN_POPULATE_ARP
> >> +
> >> +ovn-nbctl create Logical_Router name=R1
> >> +
> >> +ovn-nbctl ls-add foo
> >> +ovn-nbctl ls-add alice
> >> +ovn-nbctl ls-add outside
> >> +
> >> +# Connect foo to R1
> >> +ovn-nbctl lrp-add R1 foo 00:00:01:01:02:03 192.168.1.1/24
> >> +ovn-nbctl lsp-add foo rp-foo -- set Logical_Switch_Port rp-foo \
> >> +    type=router options:router-port=foo \
> >> +    -- lsp-set-addresses rp-foo router
> >> +
> >> +# Connect alice to R1 as distributed router gateway port (172.16.1.6) on hv2
> >> +ovn-nbctl lrp-add R1 alice 00:00:02:01:02:03 172.16.1.6/24 \
> >> +    -- set Logical_Router_Port alice options:redirect-chassis="hv2"
> >> +ovn-nbctl lsp-add alice rp-alice -- set Logical_Switch_Port rp-alice \
> >> +    type=router options:router-port=alice \
> >> +    -- lsp-set-addresses rp-alice router
> >> +
> >> +# Create logical port foo1 in foo
> >> +ovn-nbctl lsp-add foo foo1 \
> >> +-- lsp-set-addresses foo1 "f0:00:00:01:02:03 192.168.1.2"
> >> +
> >> +# Create logical port outside1 in outside, which is a nexthop address
> >> +# for 172.16.1.0/24
> >> +ovn-nbctl lsp-add outside outside1 \
> >> +-- lsp-set-addresses outside1 "f0:00:00:01:02:04 172.16.1.1"
> >> +
> >> +# Set default gateway (nexthop) to 172.16.1.1
> >> +ovn-nbctl lr-route-add R1 "0.0.0.0/0" 172.16.1.1 alice
> >> +AT_CHECK([ovn-nbctl lr-nat-add R1 snat 172.16.1.6 192.168.1.1/24])
> >> +
> >> +ovn-nbctl lsp-add foo ln-foo
> >> +ovn-nbctl lsp-set-addresses ln-foo unknown
> >> +ovn-nbctl lsp-set-options ln-foo network_name=public
> >> +ovn-nbctl lsp-set-type ln-foo localnet
> >> +AT_CHECK([ovn-nbctl set Logical_Switch_Port ln-foo tag=2])
> >> +
> >> +# Create localnet port in alice
> >> +ovn-nbctl lsp-add alice ln-alice
> >> +ovn-nbctl lsp-set-addresses ln-alice unknown
> >> +ovn-nbctl lsp-set-type ln-alice localnet
> >> +ovn-nbctl lsp-set-options ln-alice network_name=phys
> >> +
> >> +# Create localnet port in outside
> >> +ovn-nbctl lsp-add outside ln-outside
> >> +ovn-nbctl lsp-set-addresses ln-outside unknown
> >> +ovn-nbctl lsp-set-type ln-outside localnet
> >> +ovn-nbctl lsp-set-options ln-outside network_name=phys
> >> +ovn-nbctl --wait=hv sync
> >> +
> >> +ip_to_hex() {
> >> +    printf "%02x%02x%02x%02x" "$@"
> >> +}
> >> +gw_ip=$(ip_to_hex 172 16 1 6)
> >> +src_ip=$(ip_to_hex 192 168 1 2)
> >> +dst_ip=$(ip_to_hex 8 8 8 8)
> >> +nexthop_ip=$(ip_to_hex 172 16 1 1)
> >> +
> >> +# Send ip packet from foo1 to 8.8.8.8
> >> +src_mac="f00000010203"
> >> +dst_mac="000001010203"
> >> +packet=${dst_mac}${src_mac}08004500001c0000000040110000${src_ip}${dst_ip}0035111100080000
> >> +
> >> +as hv1 ovs-appctl netdev-dummy/receive hv1-vif1 $packet
> >> +sleep 2
> >> +
> >> +# ARP request packet to expect at outside1
> >> +src_mac="000002010203"
> >> +arp_request=ffffffffffff${src_mac}08060001080006040001${src_mac}${gw_ip}000000000000${nexthop_ip}
> >> +echo $arp_request >> hv3-vif1.expected
> >> +OVN_CHECK_PACKETS([hv3/vif1-tx.pcap], [hv3-vif1.expected])
> >> +
> >> +# Send ARP reply from outside1 back to the router
> >> +reply_mac="f00000010204"
> >> +arp_reply=${src_mac}${reply_mac}08060001080006040002${reply_mac}${nexthop_ip}${src_mac}${gw_ip}
> >> +
> >> +as hv3 ovs-appctl netdev-dummy/receive hv3-vif1 $arp_reply
> >> +
> >> +# Allow some time for ovn-northd and ovn-controller to catch up.
> >> +# XXX This should be more systematic.
> >> +sleep 1
> >> +
> >> +# VLAN tagged packet with distributed gateway port(172.16.1.6) MAC as destination MAC
> >> +# is expected on bridge connecting hv1 and hv2
> >> +src_mac="f00000010203"
> >> +dst_mac="000002010203"
> >> +expected=${dst_mac}${src_mac}8100000208004500001c0000000040110000${src_ip}${dst_ip}0035111100080000
> >> +echo $expected > hv1-br-ex_n2.expected
> >> +
> >> +# Packet to Expect at outside1 i.e nexthop(172.16.1.1) port.
> >> +# As connection tracking not enabled for this test, snat can't be done on the packet.
> >> +# We still see foo1 as the source ip address. But source mac(172.16.1.6 MAC) and
> >> +# dest mac(172.16.1.1 mac) are properly configured.
> >> +src_mac="000002010203"
> >> +dst_mac="f00000010204"
> >> +expected=${dst_mac}${src_mac}08004500001c000000003f110100${src_ip}${dst_ip}0035111100080000
> >> +echo $expected > hv3-vif1.expected
> >> +
> >> +reset_pcap_file() {
> >> +    local iface=$1
> >> +    local pcap_file=$2
> >> +    ovs-vsctl -- set Interface $iface options:tx_pcap=dummy-tx.pcap \
> >> +options:rxq_pcap=dummy-rx.pcap
> >> +    rm -f ${pcap_file}*.pcap
> >> +    ovs-vsctl -- set Interface $iface options:tx_pcap=${pcap_file}-tx.pcap \
> >> +options:rxq_pcap=${pcap_file}-rx.pcap
> >> +}
> >> +
> >> +as hv1 reset_pcap_file br-ex_n2 hv1/br-ex_n2
> >> +as hv3 reset_pcap_file hv3-vif1 hv3/vif1
> >> +sleep 1
> >> +as hv1 ovs-appctl netdev-dummy/receive hv1-vif1 $packet
> >> +sleep 2
> >> +
> >> +# On hv1, table 65 for packets going from vlan switch pipeline to router pipeline
> >> +# set MLF_RCV_FROM_VLAN flag
> >> +AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=65 | grep "priority=100,reg15=0x1,metadata=0x2" \
> >> +| grep actions=clone | grep "load:0x1->NXM_NX_REG10" | wc -l], [0], [[1
> >> +]])
> >> +# On hv1, because of snat rule in table 15, a higher priority(i.e 2) flow
> >> +# added for packets with MLF_RCV_FROM_VLAN flag with output as distributed
> >> +# gateway port, which sets REGBIT_NAT_REDIRECT flag
> >> +AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=15 | grep "priority=2,ip,reg10=0x20/0x20,metadata=0x1" \
> >> +| grep "actions=load:0x1->OXM_OF_PKT_REG4" | wc -l], [0], [[1
> >> +]])
> >> +
> >> +# On hv1, table 32 flow which tags packet with source network vlan tag and sends it to hv2
> >> +# through br-ex
> >> +AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=32 | grep "priority=150,reg14=0x1,reg15=0x3,metadata=0x1" \
> >> +| grep "actions=mod_vlan_vid:2" | grep "n_packets=2," | wc -l], [0], [[1
> >> +]])
> >> +
> >> +# On hv2 table 0, vlan tagged packet is sent through router pipeline
> >> +# by setting MLF_RCV_FROM_VLAN flag (REG10)
> >> +AT_CHECK([as hv2 ovs-ofctl dump-flows br-int | grep "table=0," | grep "priority=150" | grep "dl_vlan=2" | \
> >> +grep "actions=strip_vlan,load:0x1->OXM_OF_METADATA" | grep "load:0x1->NXM_NX_REG14" | \
> >> +grep "load:0x1->NXM_NX_REG10" | wc -l], [0], [[1
> >> +]])
> >> +# on hv2 table 8, allow packets with router metadata and with MLF_RCV_FROM_VLAN flag
> >> +AT_CHECK([as hv2 ovs-ofctl dump-flows br-int table=8 | grep "priority=100,reg10=0x20/0x20,reg14=0x1,metadata=0x1" | wc -l], [0], [[1
> >> +]])
> >> +
> >> +# Check vlan tagged packet on the bridge connecting hv1 and hv2
> >> +OVN_CHECK_PACKETS([hv1/br-ex_n2-tx.pcap], [hv1-br-ex_n2.expected])
> >> +# Check expected packet on nexthop interface
> >> +OVN_CHECK_PACKETS([hv3/vif1-tx.pcap], [hv3-vif1.expected])
> >> +
> >> +OVN_CLEANUP([hv1],[hv2],[hv3])
> >> +AT_CLEANUP
> >> +
> >>  AT_SETUP([ovn -- 1 LR with distributed router gateway port])
> >>  AT_SKIP_IF([test $HAVE_PYTHON = no])
> >>  ovn_start
> >> --
> >> 1.8.3.1
> >>
> >
> >
> >
> > --
> > Russell Bryant
>
>
>
> --
> Russell Bryant
>
Anil Venkata June 1, 2018, 12:57 p.m. | #4
On Fri, Jun 1, 2018 at 12:13 AM, Anil Venkata <anilvenkata@redhat.com>
wrote:

> Thanks Russell.
>
> On Thu, May 31, 2018 at 1:41 AM, Russell Bryant <russell@ovn.org> wrote:
>
>> One more general question:
>>
>> a major difference when doing the redirect to the gateway via a VLAN
>> vs a geneve tunnel is the lack of metadata.  You've demonstrated how
>> it's easy enough to identify the network (the VLAN ID + the port it
>> arrived on).  How about the logical input / output IDs?  What values
>> are included when the packet is sent over the tunnel?  Are we
>> confident those values are not needed, or can be inferred another way
>> in this scenario?
>>
>
> When the packet is sent over the geneve tunnel, the tunnel key carries the
> following values:
> 1) the router patch port connected to the tenant vlan network, as the
>    logical input port
> 2) the chassisredirect port, as the logical output port
> 3) the router datapath, as the logical datapath
> When the gateway chassis receives the packet from the tunnel, it ignores the
> logical input port and sends the packet directly to table 33 (which sets the
> output to the distributed gateway port) for egress processing.
>
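To make the metadata question concrete, here is a rough sketch (not code from the patch) of what the tunnel carries. It assumes the Geneve encoding described in ovn-architecture(7): the logical datapath in the 24-bit VNI, and a 32-bit option value holding, from MSB to LSB, 1 reserved bit, a 15-bit logical ingress port and a 16-bit logical egress port. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the metadata carried over a Geneve tunnel. */
struct tnl_metadata {
    uint32_t datapath;   /* Logical datapath key, from the 24-bit VNI. */
    uint16_t inport;     /* Logical ingress port key (15 bits). */
    uint16_t outport;    /* Logical egress port key (16 bits). */
};

/* Packs the port keys into the 32-bit option value.
 * MSB to LSB: 1 reserved bit (0), 15-bit ingress, 16-bit egress. */
static uint32_t
pack_port_keys(uint16_t inport, uint16_t outport)
{
    assert(inport < (1u << 15));
    return ((uint32_t) inport << 16) | outport;
}

/* Recovers the three logical identifiers from the VNI and option value. */
static struct tnl_metadata
unpack_metadata(uint32_t vni, uint32_t opt)
{
    struct tnl_metadata md;

    md.datapath = vni & 0xffffff;
    md.inport = (opt >> 16) & 0x7fff;
    md.outport = opt & 0xffff;
    return md;
}
```

A VLAN header has room for none of these fields, which is why the VLAN path has to infer them from the tag and the arrival port.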
> When the gateway chassis receives a vlan tagged packet, we don't have the
> logical input and output details, so this patch sets the "router patch port
> connected to tenant vlan network" as the logical input port and sends the
> packet through the router ingress pipeline to determine the logical output
> port.
>
> If we check the destination MAC along with the VLAN ID in table 0 on the
> gateway chassis, as suggested by Miguel and Jakub, and the destination MAC
> belongs to the distributed gateway port, then we can set the output port to
> the distributed gateway port in table 0 and pass the packet directly to the
> router egress pipeline (skipping the router ingress pipeline).
>
>
>> On Wed, May 30, 2018 at 3:59 PM, Russell Bryant <russell@ovn.org> wrote:
>> > On Fri, May 25, 2018 at 7:33 AM,  <vkommadi@redhat.com> wrote:
>> >> From: venkata anil <vkommadi@redhat.com>
>> >>
>> >> When a vm on a vlan tenant network sends traffic to an external network,
>> >> it is tunneled from host chassis to gateway chassis. In the earlier
>> >> discussion [1], Russell (also in his doc [2]) suggested if we can figure
>> >> out a way for OVN to do this redirect to the gateway host over a VLAN
>> >> network. This patch implements his suggestion i.e will redirect to
>> >> gateway chassis using incoming tenant vlan network. Gateway chassis are
>> >> expected to be configured with tenant vlan networks. In this approach,
>> >> new logical and physical flows introduced for packet processing in both
>> >> host and gateway chassis.
>> >
>> > I don't think we can impose the expectation that the gateway is on the
>> > same vlan network as the original compute node.  The previous behavior
>> > of using the tunnel does not require that.
>> >
>>
>
>
> I thought we could impose this expectation because, in an OpenStack DVR
> environment, using centralized SNAT for DVR routers requires the centralized
> router (on the network node) to have connectivity to the same tenant vlan
> network that the compute node is connected to. Indeed, a new port is created
> on each tenant network, and these ports are added to the centralized SNAT
> router namespace. The VM sends packets to these ports on the centralized
> SNAT router through this tenant vlan network.
>
>

Without this patch, the existing code tunnels the packet to the gateway
chassis. But the return path, i.e. gateway chassis to compute chassis, always
goes through the localnet port. So bridge mappings on the gateway chassis are
a must for the existing implementation as well.

We can still fall back to tunnelling (for compute to gateway chassis
communication) when the gateway chassis does not have bridge mappings.
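
As a rough illustration of that fallback (a sketch only, not code from the patch), the choice could be reduced to a check like the one below. It assumes the bridge mappings are available as the usual comma-separated "network:bridge" string from external-ids:ovn-bridge-mappings; the enum, the function names, and the idea of passing both chassis' mappings to one function are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum redirect_mode { REDIRECT_VLAN, REDIRECT_TUNNEL };

/* Returns true if 'network' is a key in a comma-separated
 * "network:bridge" mapping string. */
static bool
has_bridge_mapping(const char *mappings, const char *network)
{
    assert(network);
    if (!mappings) {
        return false;
    }

    size_t len = strlen(network);
    const char *p = mappings;
    while (*p) {
        if (!strncmp(p, network, len) && p[len] == ':') {
            return true;
        }
        p = strchr(p, ',');
        if (!p) {
            break;
        }
        p++;    /* Skip the comma and test the next entry. */
    }
    return false;
}

/* Uses the VLAN redirect only when the tenant network is tagged and both
 * chassis map it to a bridge; otherwise keeps the existing tunnel path. */
static enum redirect_mode
choose_redirect(const char *compute_mappings, const char *gateway_mappings,
                const char *network, int vlan_tag)
{
    if (vlan_tag > 0
        && has_bridge_mapping(compute_mappings, network)
        && has_bridge_mapping(gateway_mappings, network)) {
        return REDIRECT_VLAN;
    }
    return REDIRECT_TUNNEL;
}
```

With the mappings used in the test from this patch (hv1 "public:br-ex", hv2 "public:br-ex,phys:br-phys", network "public", tag 2) this picks the VLAN path; if hv2 lacked the "public" mapping it would pick the tunnel.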


>> > Have you thought of whether we could use the new behavior
>> > automatically if we know both chassis are on the same network, or fall
>> > back to a tunnel if necessary?
>> >
>>
>
>
> No, I didn't think about that. Thanks for the suggestion.
> If we can't impose the above expectation, then we can implement your
> suggestion about falling back to tunnelling.
>
>
>> >>
>> >> Packet processing in the host chassis:
>> >> 1) A new ovs flow added in physical table 65, which sets MLF_RCV_FROM_VLAN
>> >>    flag for packets from vlan network entering into router pipeline
>> >> 2) A new flow added in lr_in_ip_routing, for packets output through
>> >>    distributed gateway port and matching MLF_RCV_FROM_VLAN flag,
>> >>    set REGBIT_NAT_REDIRECT i.e
>> >>    table=7 (lr_in_ip_routing   ), priority=2    , match=(
>> >>    ip4.dst == 0.0.0.0/0 && flags.rcv_from_vlan == 1 &&
>> >>    !is_chassis_resident("cr-alice")), action=(reg9[0] = 1; next;)
>> >>    This flow will be set only on chassis not hosting chassisredirect
>> >>    port i.e compute node.
>> >>    When REGBIT_NAT_REDIRECT set,
>> >>    a) lr_in_arp_resolve, will set packet eth.dst to distributed gateway
>> >>       port MAC
>> >>    b) lr_in_gw_redirect, will set chassisredirect port as outport
>> >> 3) A new ovs flow added in physical table 32 will use source vlan tenant
>> >>    network tag as vlan ID for sending the packet to gateway chassis.
>> >>    As this vlan packet destination MAC is distributed gateway port MAC,
>> >>    packet will only reach the gateway chassis.
>> >>    table=32,priority=150,reg14=0x3,reg15=0x6,metadata=0x4
>> >>    actions=mod_vlan_vid:2010,output:25,strip_vlan
>> >>    This flow will be set only on chassis not hosting chassisredirect
>> >>    port i.e compute node.
>> >>
>> >> Packet processing in the gateway chassis:
>> >> 1) A new ovs flow added in physical table 0 to pass vlan traffic coming
>> >>    from localnet port to the connected router pipeline(i.e router
>> >>    attached to vlan tenant network).
>> >>    This flow will set router metadata, reg14 to router's patch port(lrp)
>> >>    (i.e patch port connecting router and vlan tenant network) and a new
>> >>    MLF_RCV_FROM_VLAN flag.
>> >>    table=0,priority=150,in_port=67,dl_vlan=2010 actions=strip_vlan,
>> >>    load:0x4->OXM_OF_METADATA[],load:0x3->NXM_NX_REG14[],
>> >>    load:0x1->NXM_NX_REG10[5],resubmit(,8)
>> >>    This flow will be set only on chassis hosting chassisredirect
>> >>    port i.e gateway node.
>> >> 2) A new flow added in lr_in_admission which checks MLF_RCV_FROM_VLAN
>> >>    and allows the packet. This flow will be set only on chassis hosting
>> >>    chassisredirect port i.e gateway node.
>> >>    table=0 (lr_in_admission    ), priority=100  , match=(
>> >>    flags.rcv_from_vlan == 1 && inport == "lrp-44383893-613a-4bfe-b483-
>> >>    e7d0dc3055cd" && is_chassis_resident("cr-lrp-a6e3d2ab-313a-4ea3-
>> >>    8ec4-c3c774a11f49")), action=(next;)
>> >>    Then packet will pass through router ingress and egress pipelines and
>> >>    then to external switch pipeline.
>> >>
>> >> In a scenario where the traffic between two vms in the same tenant vlan
>> >> network across different chassis i.e if "vm1" on tenant vlan network
>> >> "net1" is on host chassis "ch1" and "vm2" on same tenant vlan network
>> >> "net1" is on gateway chassis "gw1". When the packet arrived on "gw1"
>> >> chassis from localnet port, we still send it to router pipeline and router
>> >> pipeline will send it to destination switch ("net1") pipeline.
>> >
>> > Why is this?  Wouldn't the packet just have a destination MAC for "vm2"?
>> >
>>
>
> When the packet arrives at the gateway chassis, in physical table 0 we do
> not check the destination MAC address of vlan packets and instead force the
> packet through the router pipeline.
> Ajo and Jakub suggested checking the destination MAC address along with the
> vlan ID in physical table 0. Their suggestion is: if the destination MAC
> address belongs to the distributed gateway port, send the packet through the
> router pipeline; otherwise send it through the switch pipeline. Without this
> check, performance suffers for vms in the same tenant vlan network that are
> on different hypervisors (i.e. compute and gateway nodes), because we force
> the packet through the router pipeline in the gateway chassis.
>
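A minimal sketch of the check Ajo and Jakub suggested, written as plain C rather than as the OpenFlow match it would become in physical table 0. This is an illustration only; the enum and function names are hypothetical and nothing here is taken from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum pipeline { PIPELINE_SWITCH, PIPELINE_ROUTER };

/* Decides where a packet arriving on the localnet port should go.
 * Only packets on the tenant VLAN whose destination MAC is the
 * distributed gateway port MAC are sent to the router pipeline;
 * everything else stays in the switch pipeline. */
static enum pipeline
classify_vlan_packet(const uint8_t dst_mac[6], uint16_t vlan_id,
                     const uint8_t gw_port_mac[6], uint16_t tenant_vlan)
{
    assert(dst_mac && gw_port_mac);
    if (vlan_id == tenant_vlan && !memcmp(dst_mac, gw_port_mac, 6)) {
        return PIPELINE_ROUTER;
    }
    return PIPELINE_SWITCH;
}
```

With this check, traffic between two vms on the same tenant vlan network keeps its original source MAC, since it never enters the router pipeline on the gateway chassis.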
>
>
>> >> But in this case when packet arrives at "vm2", it will have router MAC as
>> >> source MAC as the packet is routed in gateway chassis. This behaviour can
>> >> be seen only for destination vms hosted on gateway node.
>> >>
>> >> [1] https://mail.openvswitch.org/pipermail/ovs-discuss/2018-April/046557.html
>> >> [2] Point 3 in section 3.3.1 - Future Enhancements
>> >> https://docs.google.com/document/d/1JecGIXPH0RAqfGvD0nmtBdEU1zflHACp8WSRnKCFSgg/edit#
>> >>
>> >> Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2018-April/046543.html
>> >>
>> >> Signed-off-by: Venkata Anil <vkommadi@redhat.com>
>> >> ---
>> >>  ovn/controller/bfd.c            |   3 +-
>> >>  ovn/controller/binding.c        |  10 +-
>> >>  ovn/controller/ovn-controller.c |   3 +
>> >>  ovn/controller/ovn-controller.h |  16 ++-
>> >>  ovn/controller/physical.c       |  94 ++++++++++++++++-
>> >>  ovn/lib/logical-fields.c        |   4 +
>> >>  ovn/lib/logical-fields.h        |   2 +
>> >>  ovn/northd/ovn-northd.c         |  35 +++++++
>> >>  tests/ovn.at                    | 227 ++++++++++++++++++++++++++++++++++++++++
>> >>  9 files changed, 390 insertions(+), 4 deletions(-)
>> >>
>> >> diff --git a/ovn/controller/bfd.c b/ovn/controller/bfd.c
>> >> index 8f020d5..cbbd3ba 100644
>> >> --- a/ovn/controller/bfd.c
>> >> +++ b/ovn/controller/bfd.c
>> >> @@ -139,8 +139,9 @@ bfd_travel_gw_related_chassis(struct local_datapath *dp,
>> >>                                    struct local_datapath_node, node);
>> >>          dp = dp_binding->dp;
>> >>          free(dp_binding);
>> >> +        const struct sbrec_datapath_binding *pdp;
>> >>          for (size_t i = 0; i < dp->n_peer_dps; i++) {
>> >> -            const struct sbrec_datapath_binding *pdp = dp->peer_dps[i];
>> >> +            pdp = dp->peer_dps[i]->peer_dp;
>> >>              if (!pdp) {
>> >>                  continue;
>> >>              }
>> >> diff --git a/ovn/controller/binding.c b/ovn/controller/binding.c
>> >> index 0785a94..f02bde5 100644
>> >> --- a/ovn/controller/binding.c
>> >> +++ b/ovn/controller/binding.c
>> >> @@ -148,10 +148,14 @@ add_local_datapath__(struct controller_ctx *ctx,
>> >>                                  "lport-by-datapath", &cursor);
>> >>
>> >>      SBREC_PORT_BINDING_FOR_EACH_EQUAL (pb, &cursor, lpval) {
>> >> +        if (!strcmp(pb->type, "chassisredirect")) {
>> >> +            ld->chassisredirect_port = pb;
>> >> +        }
>> >>          if (!strcmp(pb->type, "patch")) {
>> >>              const char *peer_name = smap_get(&pb->options, "peer");
>> >>              if (peer_name) {
>> >>                  const struct sbrec_port_binding *peer;
>> >> +                struct peer_datapath *pdp;
>> >>
>> >>                  peer = lport_lookup_by_name( ctx->ovnsb_idl, peer_name);
>> >>
>> >> @@ -162,8 +166,12 @@ add_local_datapath__(struct controller_ctx *ctx,
>> >>                      ld->peer_dps = xrealloc(
>> >>                              ld->peer_dps,
>> >>                              ld->n_peer_dps * sizeof *ld->peer_dps);
>> >> -                    ld->peer_dps[ld->n_peer_dps - 1] = datapath_lookup_by_key(
>> >> +                    pdp = xcalloc(1, sizeof(struct peer_datapath));
>> >> +                    pdp->peer_dp = datapath_lookup_by_key(
>> >>                          ctx->ovnsb_idl, peer->datapath->tunnel_key);
>> >> +                    pdp->patch = pb;
>> >> +                    pdp->peer = peer;
>> >> +                    ld->peer_dps[ld->n_peer_dps - 1] = pdp;
>> >>                  }
>> >>              }
>> >>          }
>> >> diff --git a/ovn/controller/ovn-controller.c b/ovn/controller/ovn-controller.c
>> >> index 86e1836..55573fd 100644
>> >> --- a/ovn/controller/ovn-controller.c
>> >> +++ b/ovn/controller/ovn-controller.c
>> >> @@ -803,6 +803,9 @@ main(int argc, char *argv[])
>> >>
>> >>          struct local_datapath *cur_node, *next_node;
>> >>          HMAP_FOR_EACH_SAFE (cur_node, next_node, hmap_node, &local_datapaths) {
>> >> +            for (int i = 0; i < cur_node->n_peer_dps; i++) {
>> >> +                free(cur_node->peer_dps[i]);
>> >> +            }
>> >>              free(cur_node->peer_dps);
>> >>              hmap_remove(&local_datapaths, &cur_node->hmap_node);
>> >>              free(cur_node);
>> >> diff --git a/ovn/controller/ovn-controller.h b/ovn/controller/ovn-controller.h
>> >> index 6617b0c..8023de2 100644
>> >> --- a/ovn/controller/ovn-controller.h
>> >> +++ b/ovn/controller/ovn-controller.h
>> >> @@ -46,6 +46,17 @@ struct ct_zone_pending_entry {
>> >>      enum ct_zone_pending_state state;
>> >>  };
>> >>
>> >> +/* Represents a peer datapath connected to a given datapath */
>> >> +struct peer_datapath {
>> >> +    const struct sbrec_datapath_binding *peer_dp;
>> >> +
>> >> +    /* Patch port connected to local datapath */
>> >> +    const struct sbrec_port_binding *patch;
>> >> +
>> >> +    /* Peer patch port connected to peer datapath */
>> >> +    const struct sbrec_port_binding *peer;
>> >> +};
>> >> +
>> >>  /* A logical datapath that has some relevance to this hypervisor.  A logical
>> >>   * datapath D is relevant to hypervisor H if:
>> >>   *
>> >> @@ -63,10 +74,13 @@ struct local_datapath {
>> >>      /* The localnet port in this datapath, if any (at most one is allowed). */
>> >>      const struct sbrec_port_binding *localnet_port;
>> >>
>> >> +    /* The chassisredirect port in this datapath */
>> >> +    const struct sbrec_port_binding *chassisredirect_port;
>> >> +
>> >>      /* True if this datapath contains an l3gateway port located on this
>> >>       * hypervisor. */
>> >>      bool has_local_l3gateway;
>> >> -    const struct sbrec_datapath_binding **peer_dps;
>> >> +    struct peer_datapath **peer_dps;
>> >>      size_t n_peer_dps;
>> >>  };
>> >>
>> >> diff --git a/ovn/controller/physical.c b/ovn/controller/physical.c
>> >> index fc8adcf..ad57f69 100644
>> >> --- a/ovn/controller/physical.c
>> >> +++ b/ovn/controller/physical.c
>> >> @@ -304,7 +304,8 @@ consider_port_binding(struct controller_ctx *ctx,
>> >>  {
>> >>      uint32_t dp_key = binding->datapath->tunnel_key;
>> >>      uint32_t port_key = binding->tunnel_key;
>> >> -    if (!get_local_datapath(local_datapaths, dp_key)) {
>> >> +    struct local_datapath *ld = get_local_datapath(local_datapaths, dp_key);
>> >> +    if (!ld) {
>> >>          return;
>> >>      }
>> >>
>> >> @@ -350,6 +351,14 @@ consider_port_binding(struct controller_ctx *ctx,
>> >>              put_load(0, MFF_LOG_REG0 + i, 0, 32, ofpacts_p);
>> >>          }
>> >>          put_load(0, MFF_IN_PORT, 0, 16, ofpacts_p);
>> >> +        if (ld->localnet_port) {
>> >> +            int vlan_tag = (ld->localnet_port->n_tag ?
>> >> +                            *ld->localnet_port->tag : 0);
>> >> +            if (vlan_tag) {
>> >> +                put_load(1, MFF_LOG_FLAGS, MLF_RCV_FROM_VLAN_BIT, 1,
>> >> +                         ofpacts_p);
>> >> +            }
>> >> +        }
>> >>          put_resubmit(OFTABLE_LOG_INGRESS_PIPELINE, ofpacts_p);
>> >>          clone = ofpbuf_at_assert(ofpacts_p, clone_ofs, sizeof *clone);
>> >>          ofpacts_p->header = clone;
>> >> @@ -539,6 +548,47 @@ consider_port_binding(struct controller_ctx *ctx,
>> >>           * input port, MFF_LOG_DATAPATH to the logical datapath, and
>> >>           * resubmit into the logical ingress pipeline starting at table
>> >>           * 16. */
>> >> +
>> >> +        /* Match a VLAN tag and strip it. If the vlan network is connected
>> >> +         * to a router which has a gateway port on redirect-chassis,
>> >> +         * set MLF_RCV_FROM_VLAN flag, router metadata and input port to
>> >> +         * connecting patch port */
>> >> +        int vlan_tag = binding->n_tag ? *binding->tag : 0;
>> >> +        if (!strcmp(binding->type, "localnet") && vlan_tag) {
>> >> +            struct local_datapath *ldp = get_local_datapath(
>> >> +                local_datapaths, binding->datapath->tunnel_key);
>> >> +            for (int i = 0; i < ldp->n_peer_dps; i++) {
>> >> +                struct local_datapath *peer_ldp = get_local_datapath(
>> >> +                    local_datapaths, ldp->peer_dps[i]->peer_dp->tunnel_key);
>> >> +                const struct sbrec_port_binding *crp;
>> >> +                crp = peer_ldp->chassisredirect_port;
>> >> +                if (crp && crp->chassis &&
>> >> +                   !strcmp(crp->chassis->name, chassis->name)) {
>> >> +                    const char *gwp = smap_get(&crp->options,
>> >> +                                               "distributed-port");
>> >> +                    if (strcmp(gwp, ldp->peer_dps[i]->peer->logical_port)) {
>> >> +                        ofpbuf_clear(ofpacts_p);
>> >> +                        match_init_catchall(&match);
>> >> +
>> >> +                        match_set_in_port(&match, ofport);
>> >> +                        match_set_dl_vlan(&match, htons(vlan_tag));
>> >> +
>> >> +                        ofpact_put_STRIP_VLAN(ofpacts_p);
>> >> +                        put_load(peer_ldp->datapath->tunnel_key,
>> >> +                                 MFF_LOG_DATAPATH, 0, 64, ofpacts_p);
>> >> +                        put_load(ldp->peer_dps[i]->peer->tunnel_key,
>> >> +                                 MFF_LOG_INPORT, 0, 32, ofpacts_p);
>> >> +                        put_load(1, MFF_LOG_FLAGS,
>> >> +                                 MLF_RCV_FROM_VLAN_BIT, 1, ofpacts_p);
>> >> +                        put_resubmit(OFTABLE_LOG_INGRESS_PIPELINE, ofpacts_p);
>> >> +
>> >> +                        ofctrl_add_flow(flow_table, OFTABLE_PHY_TO_LOG,
>> >> +                                        150, 0, &match, ofpacts_p);
>> >> +                    }
>> >> +                }
>> >> +            }
>> >> +        }
>> >> +
>> >>          ofpbuf_clear(ofpacts_p);
>> >>          match_init_catchall(&match);
>> >>          match_set_in_port(&match, ofport);
>> >> @@ -639,6 +689,48 @@ consider_port_binding(struct controller_ctx *ctx,
>> >>           * flow matches an output port that includes a logical port on a remote
>> >>           * hypervisor, and tunnels the packet to that hypervisor.
>> >>           */
>> >> +
>> >> +        /* For each vlan network connected to the router, add that network's
>> >> +         * vlan tag to the packet and output it through localnet port */
>> >> +        struct local_datapath *ldp = get_local_datapath(local_datapaths,
>> >> +                                                        dp_key);
>> >> +        for (int i = 0; i < ldp->n_peer_dps; i++) {
>> >> +            struct ofpact_vlan_vid *vlan_vid;
>> >> +            ofp_port_t port_ofport = 0;
>> >> +            struct peer_datapath *pdp = ldp->peer_dps[i];
>> >> +            struct local_datapath *peer_ldp = get_local_datapath(
>> >> +                local_datapaths, pdp->peer_dp->tunnel_key);
>> >> +            if (peer_ldp->localnet_port && pdp->patch->tunnel_key) {
>> >> +                int64_t vlan_tag = (peer_ldp->localnet_port->n_tag ?
>> >> +                                    *peer_ldp->localnet_port->tag : 0);
>> >> +                if (!vlan_tag) {
>> >> +                    continue;
>> >> +                }
>> >> +                port_ofport = u16_to_ofp(simap_get(&localvif_to_ofport,
>> >> +                    peer_ldp->localnet_port->logical_port));
>> >> +                if (!port_ofport) {
>> >> +                    continue;
>> >> +                }
>> >> +
>> >> +                match_init_catchall(&match);
>> >> +                ofpbuf_clear(ofpacts_p);
>> >> +
>> >> +                match_set_metadata(&match, htonll(dp_key));
>> >> +                match_set_reg(&match, MFF_LOG_INPORT - MFF_REG0,
>> >> +                              pdp->patch->tunnel_key);
>> >> +                match_set_reg(&match, MFF_LOG_OUTPORT - MFF_REG0, port_key);
>> >> +
>> >> +                vlan_vid = ofpact_put_SET_VLAN_VID(ofpacts_p);
>> >> +                vlan_vid->vlan_vid = vlan_tag;
>> >> +                vlan_vid->push_vlan_if_needed = true;
>> >> +                ofpact_put_OUTPUT(ofpacts_p)->port = port_ofport;
>> >> +                ofpact_put_STRIP_VLAN(ofpacts_p);
>> >> +
>> >> +                ofctrl_add_flow(flow_table, OFTABLE_REMOTE_OUTPUT, 150, 0,
>> >> +                                &match, ofpacts_p);
>> >> +            }
>> >> +        }
>> >> +
>> >>          match_init_catchall(&match);
>> >>          ofpbuf_clear(ofpacts_p);
>> >>
>> >> diff --git a/ovn/lib/logical-fields.c b/ovn/lib/logical-fields.c
>> >> index a8b5e3c..b9efa02 100644
>> >> --- a/ovn/lib/logical-fields.c
>> >> +++ b/ovn/lib/logical-fields.c
>> >> @@ -105,6 +105,10 @@ ovn_init_symtab(struct shash *symtab)
>> >>               MLF_FORCE_SNAT_FOR_LB_BIT);
>> >>      expr_symtab_add_subfield(symtab, "flags.force_snat_for_lb", NULL,
>> >>                               flags_str);
>> >> +    snprintf(flags_str, sizeof flags_str, "flags[%d]",
>> >> +             MLF_RCV_FROM_VLAN_BIT);
>> >> +    expr_symtab_add_subfield(symtab, "flags.rcv_from_vlan", NULL,
>> >> +                             flags_str);
>> >>
>> >>      /* Connection tracking state. */
>> >>      expr_symtab_add_field(symtab, "ct_mark", MFF_CT_MARK, NULL, false);
>> >> diff --git a/ovn/lib/logical-fields.h b/ovn/lib/logical-fields.h
>> >> index b1dbb03..96250fd 100644
>> >> --- a/ovn/lib/logical-fields.h
>> >> +++ b/ovn/lib/logical-fields.h
>> >> @@ -50,6 +50,7 @@ enum mff_log_flags_bits {
>> >>      MLF_FORCE_SNAT_FOR_DNAT_BIT = 2,
>> >>      MLF_FORCE_SNAT_FOR_LB_BIT = 3,
>> >>      MLF_LOCAL_ONLY_BIT = 4,
>> >> +    MLF_RCV_FROM_VLAN_BIT = 5,
>> >>  };
>> >>
>> >>  /* MFF_LOG_FLAGS_REG flag assignments */
>> >> @@ -75,6 +76,7 @@ enum mff_log_flags {
>> >>       * hypervisors should instead only be output to local targets
>> >>       */
>> >>      MLF_LOCAL_ONLY = (1 << MLF_LOCAL_ONLY_BIT),
>> >> +    MLF_RCV_FROM_VLAN = (1 << MLF_RCV_FROM_VLAN_BIT),
>> >>  };
>> >>
>> >>  #endif /* ovn/lib/logical-fields.h */
>> >> diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
>> >> index 0e06776..f68da2b 100644
>> >> --- a/ovn/northd/ovn-northd.c
>> >> +++ b/ovn/northd/ovn-northd.c
>> >> @@ -4411,6 +4411,28 @@ add_route(struct hmap *lflows, const struct ovn_port *op,
>> >>       * routing. */
>> >>      ovn_lflow_add(lflows, op->od, S_ROUTER_IN_IP_ROUTING, priority,
>> >>                    ds_cstr(&match), ds_cstr(&actions));
>> >> +
>> >> +    /* When output port is distributed gateway port, check if the router
>> >> +     * input port is a patch port connected to vlan network.
>> >> +     * Traffic from VLAN network to external network should be redirected
>> >> +     * to "redirect-chassis" by setting REGBIT_NAT_REDIRECT flag.
>> >> +     * Later physical table 32 will output this traffic to gateway
>> >> +     * chassis using input network vlan tag */
>> >> +    if (op == op->od->l3dgw_port) {
>> >> +        ds_clear(&match);
>> >> +        ds_clear(&actions);
>> >> +
>> >> +        ds_put_format(&match, "ip%s.%s == %s/%d", is_ipv4 ? "4" : "6",
>> >> +                      dir, network_s, plen);
>> >> +        ds_put_format(&match, " && flags.rcv_from_vlan == 1");
>> >> +        ds_put_format(&match, " && !is_chassis_resident(%s)",
>> >> +                      op->od->l3redirect_port->json_key);
>> >> +
>> >> +        ovn_lflow_add(lflows, op->od, S_ROUTER_IN_IP_ROUTING,
>> >> +                      priority + 1, ds_cstr(&match),
>> >> +                      REGBIT_NAT_REDIRECT" = 1; next;");
>> >> +    }
>> >> +
>> >>      ds_destroy(&match);
>> >>      ds_destroy(&actions);
>> >>  }
>> >> @@ -4822,6 +4844,19 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
>> >>          }
>> >>          ovn_lflow_add(lflows, op->od, S_ROUTER_IN_ADMISSION, 50,
>> >>                        ds_cstr(&match), "next;");
>> >> +
>> >> +        /* VLAN traffic from localnet port should be allowed for
>> >> +         * router processing on the "redirect-chassis". */
>> >> +        if (op->od->l3dgw_port && op->od->l3redirect_port && op->peer &&
>> >> +            op->peer->od->localnet_port && (op != op->od->l3dgw_port)) {
>> >> +            ds_clear(&match);
>> >> +            ds_put_format(&match, "flags.rcv_from_vlan == 1");
>> >> +            ds_put_format(&match, " && inport == %s", op->json_key);
>> >> +            ds_put_format(&match, " && is_chassis_resident(%s)",
>> >> +                          op->od->l3redirect_port->json_key);
>> >> +            ovn_lflow_add(lflows, op->od, S_ROUTER_IN_ADMISSION, 100,
>> >> +                          ds_cstr(&match), "next;");
>> >> +        }
>> >>      }
>> >>
>> >>      /* Logical router ingress table 1: IP Input. */
>> >> diff --git a/tests/ovn.at b/tests/ovn.at
>> >> index f12c24c..6916bd0 100644
>> >> --- a/tests/ovn.at
>> >> +++ b/tests/ovn.at
>> >> @@ -7713,6 +7713,233 @@ test_ip_packet gw2 gw1
>> >>  OVN_CLEANUP([hv1],[gw1],[gw2],[ext1])
>> >>  AT_CLEANUP
>> >>
>> >> +# VLAN traffic for external network redirected through distributed router gateway port
>> >> +# should use vlans(i.e input network vlan tag) across hypervisors instead of tunneling.
>> >> +AT_SETUP([ovn -- vlan traffic for external network with distributed router gateway port])
>> >> +AT_SKIP_IF([test $HAVE_PYTHON = no])
>> >> +ovn_start
>> >> +
>> >> +# Logical network:
>> >> +# # One LR R1 that has switches foo (192.168.1.0/24) and
>> >> +# # alice (172.16.1.0/24) connected to it.  The logical port
>> >> +# # between R1 and alice has a "redirect-chassis" specified,
>> >> +# # i.e. it is the distributed router gateway port(172.16.1.6).
>> >> +# # Switch alice also has a localnet port defined.
>> >> +# # An additional switch outside has the same subnet as alice
>> >> +# # (172.16.1.0/24), a localnet port and nexthop port(172.16.1.1)
>> >> +# # which will receive the packet destined for external network
>> >> +# # (i.e 8.8.8.8 as destination ip).
>> >> +
>> >> +# Physical network:
>> >> +# # Three hypervisors hv[123].
>> >> +# # hv1 hosts vif foo1.
>> >> +# # hv2 is the "redirect-chassis" that hosts the distributed router gateway port.
>> >> +# # hv3 hosts nexthop port vif outside1.
>> >> +# # All other tests connect hypervisors to network n1 through br-phys for tunneling.
>> >> +# # But in this test, hv1 won't connect to n1(and no br-phys in hv1), and
>> >> +# # in order to show vlans(instead of tunneling) used between hv1 and hv2,
>> >> +# # a new network n2 created and hv1 and hv2 connected to this network through br-ex.
>> >> +# # hv2 and hv3 are still connected to n1 network through br-phys.
>> >> +net_add n1
>> >> +
>> >> +# We are not calling ovn_attach for hv1, to avoid adding br-phys.
>> >> +# Tunneling won't work in hv1 as ovn-encap-ip is not added to any bridge in hv1
>> >> +sim_add hv1
>> >> +as hv1
>> >> +ovs-vsctl \
>> >> +    -- set Open_vSwitch . external-ids:system-id=hv1 \
>> >> +    -- set Open_vSwitch . external-ids:ovn-remote=unix:$ovs_base/ovn-sb/ovn-sb.sock \
>> >> +    -- set Open_vSwitch . external-ids:ovn-encap-type=geneve,vxlan \
>> >> +    -- set Open_vSwitch . external-ids:ovn-encap-ip=192.168.0.1 \
>> >> +    -- add-br br-int \
>> >> +    -- set bridge br-int fail-mode=secure other-config:disable-in-band=true
>> >> +start_daemon ovn-controller
>> >> +ovs-vsctl -- add-port br-int hv1-vif1 -- \
>> >> +    set interface hv1-vif1 external-ids:iface-id=foo1 \
>> >> +    ofport-request=1
>> >> +
>> >> +sim_add hv2
>> >> +as hv2
>> >> +ovs-vsctl add-br br-phys
>> >> +ovn_attach n1 br-phys 192.168.0.2
>> >> +
>> >> +sim_add hv3
>> >> +as hv3
>> >> +ovs-vsctl add-br br-phys
>> >> +ovn_attach n1 br-phys 192.168.0.3
>> >> +ovs-vsctl -- add-port br-int hv3-vif1 -- \
>> >> +    set interface hv3-vif1 external-ids:iface-id=outside1 \
>> >> +    options:tx_pcap=hv3/vif1-tx.pcap \
>> >> +    options:rxq_pcap=hv3/vif1-rx.pcap \
>> >> +    ofport-request=1
>> >> +
>> >> +# Create network n2 for vlan connectivity between hv1 and hv2
>> >> +net_add n2
>> >> +
>> >> +as hv1
>> >> +ovs-vsctl add-br br-ex
>> >> +net_attach n2 br-ex
>> >> +
>> >> +as hv2
>> >> +ovs-vsctl add-br br-ex
>> >> +net_attach n2 br-ex
>> >> +
>> >> +as hv1 ovs-vsctl set open . external-ids:ovn-bridge-mappings="public:br-ex"
>> >> +as hv2 ovs-vsctl set open . external-ids:ovn-bridge-mappings="public:br-ex,phys:br-phys"
>> >> +as hv3 ovs-vsctl set open . external-ids:ovn-bridge-mappings=phys:br-phys
>> >> +OVN_POPULATE_ARP
>> >> +
>> >> +ovn-nbctl create Logical_Router name=R1
>> >> +
>> >> +ovn-nbctl ls-add foo
>> >> +ovn-nbctl ls-add alice
>> >> +ovn-nbctl ls-add outside
>> >> +
>> >> +# Connect foo to R1
>> >> +ovn-nbctl lrp-add R1 foo 00:00:01:01:02:03 192.168.1.1/24
>> >> +ovn-nbctl lsp-add foo rp-foo -- set Logical_Switch_Port rp-foo \
>> >> +    type=router options:router-port=foo \
>> >> +    -- lsp-set-addresses rp-foo router
>> >> +
>> >> +# Connect alice to R1 as distributed router gateway port (172.16.1.6) on hv2
>> >> +ovn-nbctl lrp-add R1 alice 00:00:02:01:02:03 172.16.1.6/24 \
>> >> +    -- set Logical_Router_Port alice options:redirect-chassis="hv2"
>> >> +ovn-nbctl lsp-add alice rp-alice -- set Logical_Switch_Port rp-alice \
>> >> +    type=router options:router-port=alice \
>> >> +    -- lsp-set-addresses rp-alice router
>> >> +
>> >> +# Create logical port foo1 in foo
>> >> +ovn-nbctl lsp-add foo foo1 \
>> >> +-- lsp-set-addresses foo1 "f0:00:00:01:02:03 192.168.1.2"
>> >> +
>> >> +# Create logical port outside1 in outside, which is a nexthop address
>> >> +# for 172.16.1.0/24
>> >> +ovn-nbctl lsp-add outside outside1 \
>> >> +-- lsp-set-addresses outside1 "f0:00:00:01:02:04 172.16.1.1"
>> >> +
>> >> +# Set default gateway (nexthop) to 172.16.1.1
>> >> +ovn-nbctl lr-route-add R1 "0.0.0.0/0" 172.16.1.1 alice
>> >> +AT_CHECK([ovn-nbctl lr-nat-add R1 snat 172.16.1.6 192.168.1.1/24])
>> >> +
>> >> +ovn-nbctl lsp-add foo ln-foo
>> >> +ovn-nbctl lsp-set-addresses ln-foo unknown
>> >> +ovn-nbctl lsp-set-options ln-foo network_name=public
>> >> +ovn-nbctl lsp-set-type ln-foo localnet
>> >> +AT_CHECK([ovn-nbctl set Logical_Switch_Port ln-foo tag=2])
>> >> +
>> >> +# Create localnet port in alice
>> >> +ovn-nbctl lsp-add alice ln-alice
>> >> +ovn-nbctl lsp-set-addresses ln-alice unknown
>> >> +ovn-nbctl lsp-set-type ln-alice localnet
>> >> +ovn-nbctl lsp-set-options ln-alice network_name=phys
>> >> +
>> >> +# Create localnet port in outside
>> >> +ovn-nbctl lsp-add outside ln-outside
>> >> +ovn-nbctl lsp-set-addresses ln-outside unknown
>> >> +ovn-nbctl lsp-set-type ln-outside localnet
>> >> +ovn-nbctl lsp-set-options ln-outside network_name=phys
>> >> +ovn-nbctl --wait=hv sync
>> >> +
>> >> +ip_to_hex() {
>> >> +    printf "%02x%02x%02x%02x" "$@"
>> >> +}
>> >> +gw_ip=$(ip_to_hex 172 16 1 6)
>> >> +src_ip=$(ip_to_hex 192 168 1 2)
>> >> +dst_ip=$(ip_to_hex 8 8 8 8)
>> >> +nexthop_ip=$(ip_to_hex 172 16 1 1)
>> >> +
>> >> +# Send ip packet from foo1 to 8.8.8.8
>> >> +src_mac="f00000010203"
>> >> +dst_mac="000001010203"
>> >> +packet=${dst_mac}${src_mac}08004500001c0000000040110000${src_ip}${dst_ip}0035111100080000
>> >> +
>> >> +as hv1 ovs-appctl netdev-dummy/receive hv1-vif1 $packet
>> >> +sleep 2
>> >> +
>> >> +# ARP request packet to expect at outside1
>> >> +src_mac="000002010203"
>> >> +arp_request=ffffffffffff${src_mac}08060001080006040001${src_mac}${gw_ip}000000000000${nexthop_ip}
>> >> +echo $arp_request >> hv3-vif1.expected
>> >> +OVN_CHECK_PACKETS([hv3/vif1-tx.pcap], [hv3-vif1.expected])
>> >> +
>> >> +# Send ARP reply from outside1 back to the router
>> >> +reply_mac="f00000010204"
>> >> +arp_reply=${src_mac}${reply_mac}08060001080006040002${reply_mac}${nexthop_ip}${src_mac}${gw_ip}
>> >> +
>> >> +as hv3 ovs-appctl netdev-dummy/receive hv3-vif1 $arp_reply
>> >> +
>> >> +# Allow some time for ovn-northd and ovn-controller to catch up.
>> >> +# XXX This should be more systematic.
>> >> +sleep 1
>> >> +
>> >> +# VLAN tagged packet with distributed gateway port(172.16.1.6) MAC as destination MAC
>> >> +# is expected on bridge connecting hv1 and hv2
>> >> +src_mac="f00000010203"
>> >> +dst_mac="000002010203"
>> >> +expected=${dst_mac}${src_mac}8100000208004500001c0000000040110000${src_ip}${dst_ip}0035111100080000
>> >> +echo $expected > hv1-br-ex_n2.expected
>> >> +
>> >> +# Packet to expect at outside1, i.e. the nexthop (172.16.1.1) port.
>> >> +# As connection tracking is not enabled for this test, snat can't be done on the packet.
>> >> +# We still see foo1 as the source ip address. But source mac(172.16.1.6 MAC) and
>> >> +# dest mac(172.16.1.1 mac) are properly configured.
>> >> +src_mac="000002010203"
>> >> +dst_mac="f00000010204"
>> >> +expected=${dst_mac}${src_mac}08004500001c000000003f110100${src_ip}${dst_ip}0035111100080000
>> >> +echo $expected > hv3-vif1.expected
>> >> +
>> >> +reset_pcap_file() {
>> >> +    local iface=$1
>> >> +    local pcap_file=$2
>> >> +    ovs-vsctl -- set Interface $iface options:tx_pcap=dummy-tx.pcap \
>> >> +options:rxq_pcap=dummy-rx.pcap
>> >> +    rm -f ${pcap_file}*.pcap
>> >> +    ovs-vsctl -- set Interface $iface options:tx_pcap=${pcap_file}-tx.pcap \
>> >> +options:rxq_pcap=${pcap_file}-rx.pcap
>> >> +}
>> >> +
>> >> +as hv1 reset_pcap_file br-ex_n2 hv1/br-ex_n2
>> >> +as hv3 reset_pcap_file hv3-vif1 hv3/vif1
>> >> +sleep 1
>> >> +as hv1 ovs-appctl netdev-dummy/receive hv1-vif1 $packet
>> >> +sleep 2
>> >> +
>> >> +# On hv1, table 65 for packets going from vlan switch pipeline to router pipeline
>> >> +# set MLF_RCV_FROM_VLAN flag
>> >> +AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=65 | grep "priority=100,reg15=0x1,metadata=0x2" \
>> >> +| grep actions=clone | grep "load:0x1->NXM_NX_REG10" | wc -l], [0], [[1
>> >> +]])
>> >> +# On hv1, because of the snat rule in table 15, a higher priority (i.e. 2) flow
>> >> +# is added for packets with MLF_RCV_FROM_VLAN flag with output as distributed
>> >> +# gateway port, which sets REGBIT_NAT_REDIRECT flag
>> >> +AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=15 | grep "priority=2,ip,reg10=0x20/0x20,metadata=0x1" \
>> >> +| grep "actions=load:0x1->OXM_OF_PKT_REG4" | wc -l], [0], [[1
>> >> +]])
>> >> +
>> >> +# On hv1, table 32 flow which tags packet with source network vlan tag and sends it to hv2
>> >> +# through br-ex
>> >> +AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=32 | grep "priority=150,reg14=0x1,reg15=0x3,metadata=0x1" \
>> >> +| grep "actions=mod_vlan_vid:2" | grep "n_packets=2," | wc -l], [0], [[1
>> >> +]])
>> >> +
>> >> +# On hv2 table 0, vlan tagged packet is sent through router pipeline
>> >> +# by setting MLF_RCV_FROM_VLAN flag (REG10)
>> >> +AT_CHECK([as hv2 ovs-ofctl dump-flows br-int | grep "table=0," | grep "priority=150" | grep "dl_vlan=2" | \
>> >> +grep "actions=strip_vlan,load:0x1->OXM_OF_METADATA" | grep "load:0x1->NXM_NX_REG14" | \
>> >> +grep "load:0x1->NXM_NX_REG10" | wc -l], [0], [[1
>> >> +]])
>> >> +# On hv2 table 8, allow packets with router metadata and with MLF_RCV_FROM_VLAN flag
>> >> +AT_CHECK([as hv2 ovs-ofctl dump-flows br-int table=8 | grep "priority=100,reg10=0x20/0x20,reg14=0x1,metadata=0x1" | wc -l], [0], [[1
>> >> +]])
>> >> +
>> >> +# Check vlan tagged packet on the bridge connecting hv1 and hv2
>> >> +OVN_CHECK_PACKETS([hv1/br-ex_n2-tx.pcap], [hv1-br-ex_n2.expected])
>> >> +# Check expected packet on nexthop interface
>> >> +OVN_CHECK_PACKETS([hv3/vif1-tx.pcap], [hv3-vif1.expected])
>> >> +
>> >> +OVN_CLEANUP([hv1],[hv2],[hv3])
>> >> +AT_CLEANUP
>> >> +
>> >>  AT_SETUP([ovn -- 1 LR with distributed router gateway port])
>> >>  AT_SKIP_IF([test $HAVE_PYTHON = no])
>> >>  ovn_start
>> >> --
>> >> 1.8.3.1
>> >>
>> >
>> >
>> >
>> > --
>> > Russell Bryant
>>
>>
>>
>> --
>> Russell Bryant
>>
>
>

Patch

diff --git a/ovn/controller/bfd.c b/ovn/controller/bfd.c
index 8f020d5..cbbd3ba 100644
--- a/ovn/controller/bfd.c
+++ b/ovn/controller/bfd.c
@@ -139,8 +139,9 @@  bfd_travel_gw_related_chassis(struct local_datapath *dp,
                                   struct local_datapath_node, node);
         dp = dp_binding->dp;
         free(dp_binding);
+        const struct sbrec_datapath_binding *pdp;
         for (size_t i = 0; i < dp->n_peer_dps; i++) {
-            const struct sbrec_datapath_binding *pdp = dp->peer_dps[i];
+            pdp = dp->peer_dps[i]->peer_dp;
             if (!pdp) {
                 continue;
             }
diff --git a/ovn/controller/binding.c b/ovn/controller/binding.c
index 0785a94..f02bde5 100644
--- a/ovn/controller/binding.c
+++ b/ovn/controller/binding.c
@@ -148,10 +148,14 @@  add_local_datapath__(struct controller_ctx *ctx,
                                 "lport-by-datapath", &cursor);
 
     SBREC_PORT_BINDING_FOR_EACH_EQUAL (pb, &cursor, lpval) {
+        if (!strcmp(pb->type, "chassisredirect")) {
+            ld->chassisredirect_port = pb;
+        }
         if (!strcmp(pb->type, "patch")) {
             const char *peer_name = smap_get(&pb->options, "peer");
             if (peer_name) {
                 const struct sbrec_port_binding *peer;
+                struct peer_datapath *pdp;
 
                 peer = lport_lookup_by_name( ctx->ovnsb_idl, peer_name);
 
@@ -162,8 +166,12 @@  add_local_datapath__(struct controller_ctx *ctx,
                     ld->peer_dps = xrealloc(
                             ld->peer_dps,
                             ld->n_peer_dps * sizeof *ld->peer_dps);
-                    ld->peer_dps[ld->n_peer_dps - 1] = datapath_lookup_by_key(
+                    pdp = xcalloc(1, sizeof(struct peer_datapath));
+                    pdp->peer_dp = datapath_lookup_by_key(
                         ctx->ovnsb_idl, peer->datapath->tunnel_key);
+                    pdp->patch = pb;
+                    pdp->peer = peer;
+                    ld->peer_dps[ld->n_peer_dps - 1] = pdp;
                 }
             }
         }
diff --git a/ovn/controller/ovn-controller.c b/ovn/controller/ovn-controller.c
index 86e1836..55573fd 100644
--- a/ovn/controller/ovn-controller.c
+++ b/ovn/controller/ovn-controller.c
@@ -803,6 +803,9 @@  main(int argc, char *argv[])
 
         struct local_datapath *cur_node, *next_node;
         HMAP_FOR_EACH_SAFE (cur_node, next_node, hmap_node, &local_datapaths) {
+            for (size_t i = 0; i < cur_node->n_peer_dps; i++) {
+                free(cur_node->peer_dps[i]);
+            }
             free(cur_node->peer_dps);
             hmap_remove(&local_datapaths, &cur_node->hmap_node);
             free(cur_node);
diff --git a/ovn/controller/ovn-controller.h b/ovn/controller/ovn-controller.h
index 6617b0c..8023de2 100644
--- a/ovn/controller/ovn-controller.h
+++ b/ovn/controller/ovn-controller.h
@@ -46,6 +46,17 @@  struct ct_zone_pending_entry {
     enum ct_zone_pending_state state;
 };
 
+/* Represents a peer datapath connected to a given datapath */
+struct peer_datapath {
+    const struct sbrec_datapath_binding *peer_dp;
+
+    /* Patch port connected to local datapath */
+    const struct sbrec_port_binding *patch;
+
+    /* Peer patch port connected to peer datapath */
+    const struct sbrec_port_binding *peer;
+};
+
 /* A logical datapath that has some relevance to this hypervisor.  A logical
  * datapath D is relevant to hypervisor H if:
  *
@@ -63,10 +74,13 @@  struct local_datapath {
     /* The localnet port in this datapath, if any (at most one is allowed). */
     const struct sbrec_port_binding *localnet_port;
 
+    /* The chassisredirect port in this datapath */
+    const struct sbrec_port_binding *chassisredirect_port;
+
     /* True if this datapath contains an l3gateway port located on this
      * hypervisor. */
     bool has_local_l3gateway;
-    const struct sbrec_datapath_binding **peer_dps;
+    struct peer_datapath **peer_dps;
     size_t n_peer_dps;
 };
 
diff --git a/ovn/controller/physical.c b/ovn/controller/physical.c
index fc8adcf..ad57f69 100644
--- a/ovn/controller/physical.c
+++ b/ovn/controller/physical.c
@@ -304,7 +304,8 @@  consider_port_binding(struct controller_ctx *ctx,
 {
     uint32_t dp_key = binding->datapath->tunnel_key;
     uint32_t port_key = binding->tunnel_key;
-    if (!get_local_datapath(local_datapaths, dp_key)) {
+    struct local_datapath *ld = get_local_datapath(local_datapaths, dp_key);
+    if (!ld) {
         return;
     }
 
@@ -350,6 +351,14 @@  consider_port_binding(struct controller_ctx *ctx,
             put_load(0, MFF_LOG_REG0 + i, 0, 32, ofpacts_p);
         }
         put_load(0, MFF_IN_PORT, 0, 16, ofpacts_p);
+        if (ld->localnet_port) {
+            int vlan_tag = (ld->localnet_port->n_tag ?
+                            *ld->localnet_port->tag : 0);
+            if (vlan_tag) {
+                put_load(1, MFF_LOG_FLAGS, MLF_RCV_FROM_VLAN_BIT, 1,
+                         ofpacts_p);
+            }
+        }
         put_resubmit(OFTABLE_LOG_INGRESS_PIPELINE, ofpacts_p);
         clone = ofpbuf_at_assert(ofpacts_p, clone_ofs, sizeof *clone);
         ofpacts_p->header = clone;
@@ -539,6 +548,47 @@  consider_port_binding(struct controller_ctx *ctx,
          * input port, MFF_LOG_DATAPATH to the logical datapath, and
          * resubmit into the logical ingress pipeline starting at table
          * 16. */
+
+        /* Match a VLAN tag and strip it. If the vlan network is connected
+         * to a router which has a gateway port on redirect-chassis,
+         * set MLF_RCV_FROM_VLAN flag, router metadata and input port to
+         * connecting patch port */
+        int vlan_tag = binding->n_tag ? *binding->tag : 0;
+        if (!strcmp(binding->type, "localnet") && vlan_tag) {
+            struct local_datapath *ldp = get_local_datapath(
+                local_datapaths, binding->datapath->tunnel_key);
+            for (size_t i = 0; i < ldp->n_peer_dps; i++) {
+                struct local_datapath *peer_ldp = get_local_datapath(
+                    local_datapaths, ldp->peer_dps[i]->peer_dp->tunnel_key);
+                const struct sbrec_port_binding *crp;
+                crp = peer_ldp->chassisredirect_port;
+                if (crp && crp->chassis &&
+                   !strcmp(crp->chassis->name, chassis->name)) {
+                    const char *gwp = smap_get(&crp->options,
+                                               "distributed-port");
+                    if (strcmp(gwp, ldp->peer_dps[i]->peer->logical_port)) {
+                        ofpbuf_clear(ofpacts_p);
+                        match_init_catchall(&match);
+
+                        match_set_in_port(&match, ofport);
+                        match_set_dl_vlan(&match, htons(vlan_tag));
+
+                        ofpact_put_STRIP_VLAN(ofpacts_p);
+                        put_load(peer_ldp->datapath->tunnel_key,
+                                 MFF_LOG_DATAPATH, 0, 64, ofpacts_p);
+                        put_load(ldp->peer_dps[i]->peer->tunnel_key,
+                                 MFF_LOG_INPORT, 0, 32, ofpacts_p);
+                        put_load(1, MFF_LOG_FLAGS,
+                                 MLF_RCV_FROM_VLAN_BIT, 1, ofpacts_p);
+                        put_resubmit(OFTABLE_LOG_INGRESS_PIPELINE, ofpacts_p);
+
+                        ofctrl_add_flow(flow_table, OFTABLE_PHY_TO_LOG,
+                                        150, 0, &match, ofpacts_p);
+                    }
+                }
+            }
+        }
+
         ofpbuf_clear(ofpacts_p);
         match_init_catchall(&match);
         match_set_in_port(&match, ofport);
@@ -639,6 +689,48 @@  consider_port_binding(struct controller_ctx *ctx,
          * flow matches an output port that includes a logical port on a remote
          * hypervisor, and tunnels the packet to that hypervisor.
          */
+
+        /* For each vlan network connected to the router, add that network's
+         * vlan tag to the packet and output it through localnet port */
+        struct local_datapath *ldp = get_local_datapath(local_datapaths,
+                                                        dp_key);
+        for (size_t i = 0; i < ldp->n_peer_dps; i++) {
+            struct ofpact_vlan_vid *vlan_vid;
+            ofp_port_t port_ofport = 0;
+            struct peer_datapath *pdp = ldp->peer_dps[i];
+            struct local_datapath *peer_ldp = get_local_datapath(
+                local_datapaths, pdp->peer_dp->tunnel_key);
+            if (peer_ldp->localnet_port && pdp->patch->tunnel_key) {
+                int64_t vlan_tag = (peer_ldp->localnet_port->n_tag ?
+                                    *peer_ldp->localnet_port->tag : 0);
+                if (!vlan_tag) {
+                    continue;
+                }
+                port_ofport = u16_to_ofp(simap_get(&localvif_to_ofport,
+                    peer_ldp->localnet_port->logical_port));
+                if (!port_ofport) {
+                    continue;
+                }
+
+                match_init_catchall(&match);
+                ofpbuf_clear(ofpacts_p);
+
+                match_set_metadata(&match, htonll(dp_key));
+                match_set_reg(&match, MFF_LOG_INPORT - MFF_REG0,
+                              pdp->patch->tunnel_key);
+                match_set_reg(&match, MFF_LOG_OUTPORT - MFF_REG0, port_key);
+
+                vlan_vid = ofpact_put_SET_VLAN_VID(ofpacts_p);
+                vlan_vid->vlan_vid = vlan_tag;
+                vlan_vid->push_vlan_if_needed = true;
+                ofpact_put_OUTPUT(ofpacts_p)->port = port_ofport;
+                ofpact_put_STRIP_VLAN(ofpacts_p);
+
+                ofctrl_add_flow(flow_table, OFTABLE_REMOTE_OUTPUT, 150, 0,
+                                &match, ofpacts_p);
+            }
+        }
+
         match_init_catchall(&match);
         ofpbuf_clear(ofpacts_p);
 
diff --git a/ovn/lib/logical-fields.c b/ovn/lib/logical-fields.c
index a8b5e3c..b9efa02 100644
--- a/ovn/lib/logical-fields.c
+++ b/ovn/lib/logical-fields.c
@@ -105,6 +105,10 @@  ovn_init_symtab(struct shash *symtab)
              MLF_FORCE_SNAT_FOR_LB_BIT);
     expr_symtab_add_subfield(symtab, "flags.force_snat_for_lb", NULL,
                              flags_str);
+    snprintf(flags_str, sizeof flags_str, "flags[%d]",
+             MLF_RCV_FROM_VLAN_BIT);
+    expr_symtab_add_subfield(symtab, "flags.rcv_from_vlan", NULL,
+                             flags_str);
 
     /* Connection tracking state. */
     expr_symtab_add_field(symtab, "ct_mark", MFF_CT_MARK, NULL, false);
diff --git a/ovn/lib/logical-fields.h b/ovn/lib/logical-fields.h
index b1dbb03..96250fd 100644
--- a/ovn/lib/logical-fields.h
+++ b/ovn/lib/logical-fields.h
@@ -50,6 +50,7 @@  enum mff_log_flags_bits {
     MLF_FORCE_SNAT_FOR_DNAT_BIT = 2,
     MLF_FORCE_SNAT_FOR_LB_BIT = 3,
     MLF_LOCAL_ONLY_BIT = 4,
+    MLF_RCV_FROM_VLAN_BIT = 5,
 };
 
 /* MFF_LOG_FLAGS_REG flag assignments */
@@ -75,6 +76,7 @@  enum mff_log_flags {
      * hypervisors should instead only be output to local targets
      */
     MLF_LOCAL_ONLY = (1 << MLF_LOCAL_ONLY_BIT),
+    MLF_RCV_FROM_VLAN = (1 << MLF_RCV_FROM_VLAN_BIT),
 };
 
 #endif /* ovn/lib/logical-fields.h */
diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
index 0e06776..f68da2b 100644
--- a/ovn/northd/ovn-northd.c
+++ b/ovn/northd/ovn-northd.c
@@ -4411,6 +4411,28 @@  add_route(struct hmap *lflows, const struct ovn_port *op,
      * routing. */
     ovn_lflow_add(lflows, op->od, S_ROUTER_IN_IP_ROUTING, priority,
                   ds_cstr(&match), ds_cstr(&actions));
+
+    /* When output port is distributed gateway port, check if the router
+     * input port is a patch port connected to vlan network.
+     * Traffic from VLAN network to external network should be redirected
+     * to "redirect-chassis" by setting REGBIT_NAT_REDIRECT flag.
+     * Later physical table 32 will output this traffic to gateway
+     * chassis using input network vlan tag */
+    if (op == op->od->l3dgw_port) {
+        ds_clear(&match);
+        ds_clear(&actions);
+
+        ds_put_format(&match, "ip%s.%s == %s/%d", is_ipv4 ? "4" : "6",
+                      dir, network_s, plen);
+        ds_put_format(&match, " && flags.rcv_from_vlan == 1");
+        ds_put_format(&match, " && !is_chassis_resident(%s)",
+                      op->od->l3redirect_port->json_key);
+
+        ovn_lflow_add(lflows, op->od, S_ROUTER_IN_IP_ROUTING,
+                      priority + 1, ds_cstr(&match),
+                      REGBIT_NAT_REDIRECT" = 1; next;");
+    }
+
     ds_destroy(&match);
     ds_destroy(&actions);
 }
@@ -4822,6 +4844,19 @@  build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
         }
         ovn_lflow_add(lflows, op->od, S_ROUTER_IN_ADMISSION, 50,
                       ds_cstr(&match), "next;");
+
+        /* VLAN traffic from localnet port should be allowed for
+         * router processing on the "redirect-chassis". */
+        if (op->od->l3dgw_port && op->od->l3redirect_port && op->peer &&
+            op->peer->od->localnet_port && (op != op->od->l3dgw_port)) {
+            ds_clear(&match);
+            ds_put_format(&match, "flags.rcv_from_vlan == 1");
+            ds_put_format(&match, " && inport == %s", op->json_key);
+            ds_put_format(&match, " && is_chassis_resident(%s)",
+                          op->od->l3redirect_port->json_key);
+            ovn_lflow_add(lflows, op->od, S_ROUTER_IN_ADMISSION, 100,
+                          ds_cstr(&match), "next;");
+        }
     }
 
     /* Logical router ingress table 1: IP Input. */
diff --git a/tests/ovn.at b/tests/ovn.at
index f12c24c..6916bd0 100644
--- a/tests/ovn.at
+++ b/tests/ovn.at
@@ -7713,6 +7713,233 @@  test_ip_packet gw2 gw1
 OVN_CLEANUP([hv1],[gw1],[gw2],[ext1])
 AT_CLEANUP
 
+# VLAN traffic for external network redirected through distributed router gateway port
+# should use vlans(i.e input network vlan tag) across hypervisors instead of tunneling.
+AT_SETUP([ovn -- vlan traffic for external network with distributed router gateway port])
+AT_SKIP_IF([test $HAVE_PYTHON = no])
+ovn_start
+
+# Logical network:
+# # One LR R1 that has switches foo (192.168.1.0/24) and
+# # alice (172.16.1.0/24) connected to it.  The logical port
+# # between R1 and alice has a "redirect-chassis" specified,
+# # i.e. it is the distributed router gateway port(172.16.1.6).
+# # Switch alice also has a localnet port defined.
+# # An additional switch outside has the same subnet as alice
+# # (172.16.1.0/24), a localnet port and nexthop port(172.16.1.1)
+# # which will receive the packet destined for external network
+# # (i.e 8.8.8.8 as destination ip).
+
+# Physical network:
+# # Three hypervisors hv[123].
+# # hv1 hosts vif foo1.
+# # hv2 is the "redirect-chassis" that hosts the distributed router gateway port.
+# # hv3 hosts nexthop port vif outside1.
+# # All other tests connect hypervisors to network n1 through br-phys for tunneling.
+# # But in this test, hv1 won't connect to n1(and no br-phys in hv1), and
+# # in order to show vlans(instead of tunneling) used between hv1 and hv2,
+# # a new network n2 is created, and hv1 and hv2 are connected to it through br-ex.
+# # hv2 and hv3 are still connected to n1 network through br-phys.
+net_add n1
+
+# We are not calling ovn_attach for hv1, to avoid adding br-phys.
+# Tunneling won't work in hv1 as ovn-encap-ip is not added to any bridge in hv1
+sim_add hv1
+as hv1
+ovs-vsctl \
+    -- set Open_vSwitch . external-ids:system-id=hv1 \
+    -- set Open_vSwitch . external-ids:ovn-remote=unix:$ovs_base/ovn-sb/ovn-sb.sock \
+    -- set Open_vSwitch . external-ids:ovn-encap-type=geneve,vxlan \
+    -- set Open_vSwitch . external-ids:ovn-encap-ip=192.168.0.1 \
+    -- add-br br-int \
+    -- set bridge br-int fail-mode=secure other-config:disable-in-band=true
+start_daemon ovn-controller
+ovs-vsctl -- add-port br-int hv1-vif1 -- \
+    set interface hv1-vif1 external-ids:iface-id=foo1 \
+    ofport-request=1
+
+sim_add hv2
+as hv2
+ovs-vsctl add-br br-phys
+ovn_attach n1 br-phys 192.168.0.2
+
+sim_add hv3
+as hv3
+ovs-vsctl add-br br-phys
+ovn_attach n1 br-phys 192.168.0.3
+ovs-vsctl -- add-port br-int hv3-vif1 -- \
+    set interface hv3-vif1 external-ids:iface-id=outside1 \
+    options:tx_pcap=hv3/vif1-tx.pcap \
+    options:rxq_pcap=hv3/vif1-rx.pcap \
+    ofport-request=1
+
+# Create network n2 for vlan connectivity between hv1 and hv2
+net_add n2
+
+as hv1
+ovs-vsctl add-br br-ex
+net_attach n2 br-ex
+
+as hv2
+ovs-vsctl add-br br-ex
+net_attach n2 br-ex
+
+as hv1 ovs-vsctl set open . external-ids:ovn-bridge-mappings="public:br-ex"
+as hv2 ovs-vsctl set open . external-ids:ovn-bridge-mappings="public:br-ex,phys:br-phys"
+as hv3 ovs-vsctl set open . external-ids:ovn-bridge-mappings=phys:br-phys
+OVN_POPULATE_ARP
+
+ovn-nbctl create Logical_Router name=R1
+
+ovn-nbctl ls-add foo
+ovn-nbctl ls-add alice
+ovn-nbctl ls-add outside
+
+# Connect foo to R1
+ovn-nbctl lrp-add R1 foo 00:00:01:01:02:03 192.168.1.1/24
+ovn-nbctl lsp-add foo rp-foo -- set Logical_Switch_Port rp-foo \
+    type=router options:router-port=foo \
+    -- lsp-set-addresses rp-foo router
+
+# Connect alice to R1 as distributed router gateway port (172.16.1.6) on hv2
+ovn-nbctl lrp-add R1 alice 00:00:02:01:02:03 172.16.1.6/24 \
+    -- set Logical_Router_Port alice options:redirect-chassis="hv2"
+ovn-nbctl lsp-add alice rp-alice -- set Logical_Switch_Port rp-alice \
+    type=router options:router-port=alice \
+    -- lsp-set-addresses rp-alice router
+
+# Create logical port foo1 in foo
+ovn-nbctl lsp-add foo foo1 \
+-- lsp-set-addresses foo1 "f0:00:00:01:02:03 192.168.1.2"
+
+# Create logical port outside1 in outside, which is a nexthop address
+# for 172.16.1.0/24
+ovn-nbctl lsp-add outside outside1 \
+-- lsp-set-addresses outside1 "f0:00:00:01:02:04 172.16.1.1"
+
+# Set default gateway (nexthop) to 172.16.1.1
+ovn-nbctl lr-route-add R1 "0.0.0.0/0" 172.16.1.1 alice
+AT_CHECK([ovn-nbctl lr-nat-add R1 snat 172.16.1.6 192.168.1.1/24])
+
+ovn-nbctl lsp-add foo ln-foo
+ovn-nbctl lsp-set-addresses ln-foo unknown
+ovn-nbctl lsp-set-options ln-foo network_name=public
+ovn-nbctl lsp-set-type ln-foo localnet
+AT_CHECK([ovn-nbctl set Logical_Switch_Port ln-foo tag=2])
+
+# Create localnet port in alice
+ovn-nbctl lsp-add alice ln-alice
+ovn-nbctl lsp-set-addresses ln-alice unknown
+ovn-nbctl lsp-set-type ln-alice localnet
+ovn-nbctl lsp-set-options ln-alice network_name=phys
+
+# Create localnet port in outside
+ovn-nbctl lsp-add outside ln-outside
+ovn-nbctl lsp-set-addresses ln-outside unknown
+ovn-nbctl lsp-set-type ln-outside localnet
+ovn-nbctl lsp-set-options ln-outside network_name=phys
+ovn-nbctl --wait=hv sync
+
+ip_to_hex() {
+    printf "%02x%02x%02x%02x" "$@"
+}
+gw_ip=$(ip_to_hex 172 16 1 6)
+src_ip=$(ip_to_hex 192 168 1 2)
+dst_ip=$(ip_to_hex 8 8 8 8)
+nexthop_ip=$(ip_to_hex 172 16 1 1)
+
+# Send ip packet from foo1 to 8.8.8.8
+src_mac="f00000010203"
+dst_mac="000001010203"
+packet=${dst_mac}${src_mac}08004500001c0000000040110000${src_ip}${dst_ip}0035111100080000
+
+as hv1 ovs-appctl netdev-dummy/receive hv1-vif1 $packet
+sleep 2
+
+# ARP request packet to expect at outside1
+src_mac="000002010203"
+arp_request=ffffffffffff${src_mac}08060001080006040001${src_mac}${gw_ip}000000000000${nexthop_ip}
+echo $arp_request >> hv3-vif1.expected
+OVN_CHECK_PACKETS([hv3/vif1-tx.pcap], [hv3-vif1.expected])
+
+# Send ARP reply from outside1 back to the router
+reply_mac="f00000010204"
+arp_reply=${src_mac}${reply_mac}08060001080006040002${reply_mac}${nexthop_ip}${src_mac}${gw_ip}
+
+as hv3 ovs-appctl netdev-dummy/receive hv3-vif1 $arp_reply
+
+# Allow some time for ovn-northd and ovn-controller to catch up.
+# XXX This should be more systematic.
+sleep 1
+
+# VLAN tagged packet with distributed gateway port(172.16.1.6) MAC as destination MAC
+# is expected on bridge connecting hv1 and hv2
+src_mac="f00000010203"
+dst_mac="000002010203"
+expected=${dst_mac}${src_mac}8100000208004500001c0000000040110000${src_ip}${dst_ip}0035111100080000
+echo $expected > hv1-br-ex_n2.expected
+
+# Packet to expect at outside1, i.e. the nexthop (172.16.1.1) port.
+# As connection tracking is not enabled for this test, snat can't be done on the packet.
+# We still see foo1 as the source ip address. But source mac(172.16.1.6 MAC) and
+# dest mac(172.16.1.1 mac) are properly configured.
+src_mac="000002010203"
+dst_mac="f00000010204"
+expected=${dst_mac}${src_mac}08004500001c000000003f110100${src_ip}${dst_ip}0035111100080000
+echo $expected > hv3-vif1.expected
+
+reset_pcap_file() {
+    local iface=$1
+    local pcap_file=$2
+    ovs-vsctl -- set Interface $iface options:tx_pcap=dummy-tx.pcap \
+options:rxq_pcap=dummy-rx.pcap
+    rm -f ${pcap_file}*.pcap
+    ovs-vsctl -- set Interface $iface options:tx_pcap=${pcap_file}-tx.pcap \
+options:rxq_pcap=${pcap_file}-rx.pcap
+}
+
+as hv1 reset_pcap_file br-ex_n2 hv1/br-ex_n2
+as hv3 reset_pcap_file hv3-vif1 hv3/vif1
+sleep 1
+as hv1 ovs-appctl netdev-dummy/receive hv1-vif1 $packet
+sleep 2
+
+# On hv1, table 65 for packets going from vlan switch pipeline to router pipeline
+# set MLF_RCV_FROM_VLAN flag
+AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=65 | grep "priority=100,reg15=0x1,metadata=0x2" \
+| grep actions=clone | grep "load:0x1->NXM_NX_REG10" | wc -l], [0], [[1
+]])
+# On hv1, because of the snat rule in table 15, a higher priority (i.e. 2) flow
+# is added for packets with MLF_RCV_FROM_VLAN flag with output as distributed
+# gateway port, which sets REGBIT_NAT_REDIRECT flag
+AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=15 | grep "priority=2,ip,reg10=0x20/0x20,metadata=0x1" \
+| grep "actions=load:0x1->OXM_OF_PKT_REG4" | wc -l], [0], [[1
+]])
+
+# On hv1, table 32 flow which tags packet with source network vlan tag and sends it to hv2
+# through br-ex
+AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=32 | grep "priority=150,reg14=0x1,reg15=0x3,metadata=0x1" \
+| grep "actions=mod_vlan_vid:2" | grep "n_packets=2," | wc -l], [0], [[1
+]])
+
+# On hv2 table 0, vlan tagged packet is sent through router pipeline
+# by setting MLF_RCV_FROM_VLAN flag (REG10)
+AT_CHECK([as hv2 ovs-ofctl dump-flows br-int | grep "table=0," | grep "priority=150" | grep "dl_vlan=2" | \
+grep "actions=strip_vlan,load:0x1->OXM_OF_METADATA" | grep "load:0x1->NXM_NX_REG14" | \
+grep "load:0x1->NXM_NX_REG10" | wc -l], [0], [[1
+]])
+# On hv2 table 8, allow packets with router metadata and with MLF_RCV_FROM_VLAN flag
+AT_CHECK([as hv2 ovs-ofctl dump-flows br-int table=8 | grep "priority=100,reg10=0x20/0x20,reg14=0x1,metadata=0x1" | wc -l], [0], [[1
+]])
+
+# Check vlan tagged packet on the bridge connecting hv1 and hv2
+OVN_CHECK_PACKETS([hv1/br-ex_n2-tx.pcap], [hv1-br-ex_n2.expected])
+# Check expected packet on nexthop interface
+OVN_CHECK_PACKETS([hv3/vif1-tx.pcap], [hv3-vif1.expected])
+
+OVN_CLEANUP([hv1],[hv2],[hv3])
+AT_CLEANUP
+
 AT_SETUP([ovn -- 1 LR with distributed router gateway port])
 AT_SKIP_IF([test $HAVE_PYTHON = no])
 ovn_start