[ovs-dev,v2] ovn: Avoid tunneling for VLAN packets redirected to a gateway chassis

Message ID 20181119161738.9468-1-nusiddiq@redhat.com
State Accepted
Series [ovs-dev,v2] ovn: Avoid tunneling for VLAN packets redirected to a gateway chassis

Commit Message

Numan Siddique Nov. 19, 2018, 4:17 p.m. UTC
From: Numan Siddique <nusiddiq@redhat.com>

An OVN deployment can have multiple logical switches, each with a
localnet port, connected to a distributed logical router, where one
logical switch may provide external connectivity and the rest of
the localnet logical switches use VLAN tagging in the physical
network.
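
For example, such a topology can be sketched with ovn-nbctl as below.
(The names 'sw0', 'ln-sw0' and 'phys' are illustrative only, following
the style of the test added by this patch.)

# A VLAN tagged logical switch: a localnet port carrying VLAN tag 10.
ovn-nbctl ls-add sw0
ovn-nbctl lsp-add sw0 ln-sw0
ovn-nbctl lsp-set-type ln-sw0 localnet
ovn-nbctl lsp-set-addresses ln-sw0 unknown
ovn-nbctl lsp-set-options ln-sw0 network_name=phys
ovn-nbctl set Logical_Switch_Port ln-sw0 tag=10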

As reported in [1], external traffic from these localnet VLAN tagged
logical switches is tunnelled to the gateway chassis (the chassis
hosting a distributed gateway port, which applies NAT rules). As part
of the discussion in [1], a few possible solutions were proposed by
Russell [2]. This patch implements the first option in [2].

With this patch, a new option 'reside-on-redirect-chassis' is added to
the 'options' column of the Logical_Router_Port table. If the value of
this option is set to 'true' and the logical router also has a
distributed gateway port, then routing for this logical router port
is centralized in the chassis hosting the distributed gateway port.
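
For example, assuming a router port 'lr0-sw0' (from the example below)
connects such a VLAN tagged logical switch, the option can be enabled
with:

ovn-nbctl set Logical_Router_Port lr0-sw0 \
    options:reside-on-redirect-chassis=true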

If a logical switch 'sw0' is connected to a router 'lr0' via the
router port 'lr0-sw0' with the address "00:00:00:00:af:12 192.168.1.1",
and 'lr0' has a distributed gateway port 'lr0-public', then the
below logical flow is added in the logical switch pipeline
of 'sw0' if the 'reside-on-redirect-chassis' option is set on 'lr0-sw0':

table=16(ls_in_l2_lkup), priority=50,
match=(eth.dst == 00:00:00:00:af:12 && is_chassis_resident("cr-lr0-public")),
action=(outport = "sw0-lr0"; output;)

"cr-lr0-public" is an internal port binding created by ovn-northd of type
'chassisredirect' for lr0-public in SB DB. Please see "man ovn-sb" for more details.
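
The binding can be inspected in the SB DB with, e.g. (a hypothetical
query using the example names above):

ovn-sbctl find Port_Binding logical_port=cr-lr0-public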

With the above flow, the packet doesn't enter the router pipeline in
the source chassis. Instead, the packet is sent out via the localnet
port of 'sw0'. The gateway chassis, upon receiving this packet, runs
the logical router pipeline, applies NAT rules, and sends the traffic
out via the localnet port of the logical switch providing external
connectivity. The gateway chassis also replies to ARP requests for
the router port IPs.

With this approach, we avoid redirecting the external traffic to the
gateway chassis via the tunnel port. There are a couple of drawbacks
with this approach:

  - East-West routing is no longer distributed for the VLAN tagged
    localnet logical switches if the 'reside-on-redirect-chassis'
    option is defined.

  - 'dnat_and_snat' NAT rules with 'logical_mac' and 'logical_port'
    columns defined will not work for these logical switches.

This approach is taken for now as it is simple. If there is a requirement
to support distributed routing for these VLAN tenant networks, we
can explore other possible solutions.

[1] -  https://mail.openvswitch.org/pipermail/ovs-discuss/2018-April/046543.html
[2] - https://mail.openvswitch.org/pipermail/ovs-discuss/2018-April/046557.html

Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2018-April/046543.html
Reported-by: venkata anil <vkommadi@redhat.com>
Acked-by: Gurucharan Shetty <guru@ovn.org>
Co-authored-by: venkata anil <vkommadi@redhat.com>
Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
Signed-off-by: venkata anil <vkommadi@redhat.com>
---

v1 -> v2
--------
 * Addressed the review comments from Guru.
 * Removed patch 2 ('ovn: Support a new Logical_Switch_Port.type -
   external') from this series, as it is an independent patch.

 ovn/northd/ovn-northd.8.xml |  30 ++++
 ovn/northd/ovn-northd.c     |  71 +++++++---
 ovn/ovn-architecture.7.xml  | 211 ++++++++++++++++++++++++++++
 ovn/ovn-nb.xml              |  43 ++++++
 tests/ovn.at                | 273 ++++++++++++++++++++++++++++++++++++
 5 files changed, 612 insertions(+), 16 deletions(-)

Comments

Gurucharan Shetty Nov. 26, 2018, 7:17 p.m. UTC | #1
On Mon, 19 Nov 2018 at 08:18, <nusiddiq@redhat.com> wrote:

> From: Numan Siddique <nusiddiq@redhat.com>
>
> [...]
>
> Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
> Signed-off-by: venkata anil <vkommadi@redhat.com>
>

Since no one else seems to have any further comments, I applied this to
master.


> ---
>
> v1 -> v2
> --------
>  * Addressed the review comments from Guru.
>  * Removed the patch 2 'ovn: Support a new Logical_Switch_Port.type -
>    'external' from this series as it is an independent patch.
>
>  ovn/northd/ovn-northd.8.xml |  30 ++++
>  ovn/northd/ovn-northd.c     |  71 +++++++---
>  ovn/ovn-architecture.7.xml  | 211 ++++++++++++++++++++++++++++
>  ovn/ovn-nb.xml              |  43 ++++++
>  tests/ovn.at                | 273 ++++++++++++++++++++++++++++++++++++
>  5 files changed, 612 insertions(+), 16 deletions(-)
>
> diff --git a/ovn/northd/ovn-northd.8.xml b/ovn/northd/ovn-northd.8.xml
> index 7352c6764..f52699bd3 100644
> --- a/ovn/northd/ovn-northd.8.xml
> +++ b/ovn/northd/ovn-northd.8.xml
> @@ -874,6 +874,25 @@ output;
>              resident.
>            </li>
>          </ul>
> +
> +        <p>
> +          For the Ethernet address on a logical switch port of type
> +          <code>router</code>, when that logical switch port's
> +          <ref column="addresses" table="Logical_Switch_Port"
> +          db="OVN_Northbound"/> column is set to <code>router</code> and
> +          the connected logical router port specifies a
> +          <code>reside-on-redirect-chassis</code> and the logical router
> +          to which the connected logical router port belongs has a
> +          <code>redirect-chassis</code> distributed gateway logical router
> +          port:
> +        </p>
> +
> +        <ul>
> +          <li>
> +            The flow for the connected logical router port's Ethernet
> +            address is only programmed on the <code>redirect-chassis</code>.
> +          </li>
> +        </ul>
>        </li>
>
>        <li>
> @@ -1179,6 +1198,17 @@ output;
>            upstream MAC learning to point to the
>            <code>redirect-chassis</code>.
>          </p>
> +
> +        <p>
> +          For the logical router port with the option
> +          <code>reside-on-redirect-chassis</code> set (which is centralized),
> +          the above flows are only programmed on the gateway port instance on
> +          the <code>redirect-chassis</code> (if the logical router has a
> +          distributed gateway port). This behavior avoids generation
> +          of multiple ARP responses from different chassis, and allows
> +          upstream MAC learning to point to the
> +          <code>redirect-chassis</code>.
> +        </p>
>        </li>
>
>        <li>
> diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
> index 58bef7de5..2de9fb38d 100644
> --- a/ovn/northd/ovn-northd.c
> +++ b/ovn/northd/ovn-northd.c
> @@ -4461,13 +4461,32 @@ build_lswitch_flows(struct hmap *datapaths, struct hmap *ports,
>                  ds_put_format(&match, "eth.dst == "ETH_ADDR_FMT,
>                                ETH_ADDR_ARGS(mac));
>                  if (op->peer->od->l3dgw_port
> -                    && op->peer == op->peer->od->l3dgw_port
> -                    && op->peer->od->l3redirect_port) {
> -                    /* The destination lookup flow for the router's
> -                     * distributed gateway port MAC address should only be
> -                     * programmed on the "redirect-chassis". */
> -                    ds_put_format(&match, " && is_chassis_resident(%s)",
> -                                  op->peer->od->l3redirect_port->json_key);
> +                    && op->peer->od->l3redirect_port
> +                    && op->od->localnet_port) {
> +                    bool add_chassis_resident_check = false;
> +                    if (op->peer == op->peer->od->l3dgw_port) {
> +                        /* The peer of this port represents a distributed
> +                         * gateway port. The destination lookup flow for the
> +                         * router's distributed gateway port MAC address should
> +                         * only be programmed on the "redirect-chassis". */
> +                        add_chassis_resident_check = true;
> +                    } else {
> +                        /* Check if the option 'reside-on-redirect-chassis'
> +                         * is set to true on the peer port. If set to true
> +                         * and if the logical switch has a localnet port, it
> +                         * means the router pipeline for the packets from
> +                         * this logical switch should be run on the chassis
> +                         * hosting the gateway port.
> +                         */
> +                        add_chassis_resident_check = smap_get_bool(
> +                            &op->peer->nbrp->options,
> +                            "reside-on-redirect-chassis", false);
> +                    }
> +
> +                    if (add_chassis_resident_check) {
> +                        ds_put_format(&match, " && is_chassis_resident(%s)",
> +                                      op->peer->od->l3redirect_port->json_key);
> +                    }
>                  }
>
>                  ds_clear(&actions);
> @@ -5232,15 +5251,35 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
>                            op->lrp_networks.ipv4_addrs[i].network_s,
>                            op->lrp_networks.ipv4_addrs[i].plen,
>                            op->lrp_networks.ipv4_addrs[i].addr_s);
> -            if (op->od->l3dgw_port && op == op->od->l3dgw_port
> -                && op->od->l3redirect_port) {
> -                /* Traffic with eth.src = l3dgw_port->lrp_networks.ea_s
> -                 * should only be sent from the "redirect-chassis", so that
> -                 * upstream MAC learning points to the "redirect-chassis".
> -                 * Also need to avoid generation of multiple ARP responses
> -                 * from different chassis. */
> -                ds_put_format(&match, " && is_chassis_resident(%s)",
> -                              op->od->l3redirect_port->json_key);
> +
> +            if (op->od->l3dgw_port && op->od->l3redirect_port && op->peer
> +                && op->peer->od->localnet_port) {
> +                bool add_chassis_resident_check = false;
> +                if (op == op->od->l3dgw_port) {
> +                    /* Traffic with eth.src = l3dgw_port->lrp_networks.ea_s
> +                     * should only be sent from the "redirect-chassis", so that
> +                     * upstream MAC learning points to the "redirect-chassis".
> +                     * Also need to avoid generation of multiple ARP responses
> +                     * from different chassis. */
> +                    add_chassis_resident_check = true;
> +                } else {
> +                    /* Check if the option 'reside-on-redirect-chassis'
> +                     * is set to true on the router port. If set to true
> +                     * and if peer's logical switch has a localnet port, it
> +                     * means the router pipeline for the packets from
> +                     * peer's logical switch is run on the chassis
> +                     * hosting the gateway port and it should reply to the
> +                     * ARP requests for the router port IPs.
> +                     */
> +                    add_chassis_resident_check = smap_get_bool(
> +                        &op->nbrp->options,
> +                        "reside-on-redirect-chassis", false);
> +                }
> +
> +                if (add_chassis_resident_check) {
> +                    ds_put_format(&match, " && is_chassis_resident(%s)",
> +                                  op->od->l3redirect_port->json_key);
> +                }
>              }
>
>              ds_clear(&actions);
> diff --git a/ovn/ovn-architecture.7.xml b/ovn/ovn-architecture.7.xml
> index 64e7d89e6..3936e6016 100644
> --- a/ovn/ovn-architecture.7.xml
> +++ b/ovn/ovn-architecture.7.xml
> @@ -1372,6 +1372,217 @@
>      http://docs.openvswitch.org/en/latest/topics/high-availability.
>    </p>
>
> +  <h2>Multiple localnet logical switches connected to a Logical Router</h2>
> +
> +  <p>
> +    It is possible to have multiple logical switches each with a localnet port
> +    (representing physical networks) connected to a logical router, in which
> +    one localnet logical switch may provide the external connectivity via a
> +    distributed gateway port and the rest of the localnet logical switches use
> +    VLAN tagging in the physical network. It is expected that
> +    <code>ovn-bridge-mappings</code> is configured appropriately on the
> +    chassis for all these localnet networks.
> +  </p>
> +
> +  <h3>East West routing</h3>
> +  <p>
> +    East-West routing between these localnet VLAN tagged logical switches
> +    works almost the same way as normal logical switches. When the VM sends
> +    such a packet, then:
> +  </p>
> +  <ol>
> +    <li>
> +      It first enters the ingress pipeline, and then egress pipeline of the
> +      source localnet logical switch datapath. It then enters the ingress
> +      pipeline of the logical router datapath via the logical router port in
> +      the source chassis.
> +    </li>
> +
> +    <li>
> +      Routing decision is taken.
> +    </li>
> +
> +    <li>
> +      From the router datapath, packet enters the ingress pipeline and then
> +      egress pipeline of the destination localnet logical switch datapath
> +      and goes out of the integration bridge to the provider bridge (
> +      belonging to the destination logical switch) via the localnet port.
> +    </li>
> +
> +    <li>
> +      The destination chassis receives the packet via the localnet port and
> +      sends it to the integration bridge. The packet enters the
> +      ingress pipeline and then egress pipeline of the destination localnet
> +      logical switch and finally gets delivered to the destination VM port.
> +    </li>
> +  </ol>
> +
> +  <h3>External traffic</h3>
> +
> +  <p>
> +    The following happens when a VM sends external traffic (which requires
> +    NATting) and the chassis hosting the VM doesn't have a distributed gateway
> +    port.
> +  </p>
> +
> +  <ol>
> +    <li>
> +      The packet first enters the ingress pipeline, and then egress pipeline of
> +      the source localnet logical switch datapath. It then enters the ingress
> +      pipeline of the logical router datapath via the logical router port in
> +      the source chassis.
> +    </li>
> +
> +    <li>
> +      Routing decision is taken. Since the gateway router or the distributed
> +      gateway port doesn't reside in the source chassis, the traffic is
> +      redirected to the gateway chassis via the tunnel port.
> +    </li>
> +
> +    <li>
> +      The gateway chassis receives the packet via the tunnel port and the
> +      packet enters the egress pipeline of the logical router datapath. NAT
> +      rules are applied here. The packet then enters the ingress pipeline and
> +      then egress pipeline of the localnet logical switch datapath which
> +      provides external connectivity and finally goes out via the localnet
> +      port of the logical switch which provides external connectivity.
> +    </li>
> +  </ol>
> +
> +  <p>
> +    Although this works, the VM traffic is tunnelled when sent from the compute
> +    chassis to the gateway chassis. In order for it to work properly, the MTU
> +    of the localnet logical switches must be lowered to account for the tunnel
> +    encapsulation.
> +  </p>
> +
> +  <h2>
> +    Centralized routing for localnet VLAN tagged logical switches connected
> +    to a Logical Router
> +  </h2>
> +
> +  <p>
> +    To overcome the tunnel encapsulation problem described in the previous
> +    section, <code>OVN</code> supports the option of enabling centralized
> +    routing for localnet VLAN tagged logical switches. CMS can configure the
> +    option <ref column="options:reside-on-redirect-chassis"
> +    table="Logical_Router_Port" db="OVN_NB"/> to <code>true</code> for each
> +    <ref table="Logical_Router_Port" db="OVN_NB"/> which connects to the
> +    localnet VLAN tagged logical switches. This causes the gateway
> +    chassis (hosting the distributed gateway port) to handle all the
> +    routing for these networks, making it centralized. It will reply to
> +    the ARP requests for the logical router port IPs.
> +  </p>
> +
> +  <p>
> +    If the logical router doesn't have a distributed gateway port connecting
> +    to the localnet logical switch which provides external connectivity,
> +    then this option is ignored by <code>OVN</code>.
> +  </p>
> +
> +  <p>
> +    The following happens when a VM sends east-west traffic which needs to
> +    be routed:
> +  </p>
> +
> +  <ol>
> +    <li>
> +      The packet first enters the ingress pipeline, and then egress pipeline of
> +      the source localnet logical switch datapath and is sent out via the
> +      localnet port of the source localnet logical switch (instead of sending
> +      it to router pipeline).
> +    </li>
> +
> +    <li>
> +      The gateway chassis receives the packet via the localnet port of the
> +      source localnet logical switch and sends it to the integration bridge.
> +      The packet then enters the ingress pipeline, and then egress pipeline of
> +      the source localnet logical switch datapath and enters the ingress
> +      pipeline of the logical router datapath.
> +    </li>
> +
> +    <li>
> +      Routing decision is taken.
> +    </li>
> +
> +    <li>
> +      From the router datapath, packet enters the ingress pipeline and then
> +      egress pipeline of the destination localnet logical switch datapath.
> +      It then goes out of the integration bridge to the provider bridge (
> +      belonging to the destination logical switch) via the localnet port.
> +    </li>
> +
> +    <li>
> +      The destination chassis receives the packet via the localnet port and
> +      sends it to the integration bridge. The packet enters the
> +      ingress pipeline and then egress pipeline of the destination localnet
> +      logical switch and finally gets delivered to the destination VM port.
> +    </li>
> +  </ol>
> +
> +  <p>
> +    The following happens when a VM sends external traffic which requires
> +    NATting:
> +  </p>
> +
> +  <ol>
> +    <li>
> +      The packet first enters the ingress pipeline, and then egress pipeline of
> +      the source localnet logical switch datapath and is sent out via the
> +      localnet port of the source localnet logical switch (instead of sending
> +      it to router pipeline).
> +    </li>
> +
> +    <li>
> +      The gateway chassis receives the packet via the localnet port of the
> +      source localnet logical switch and sends it to the integration bridge.
> +      The packet then enters the ingress pipeline, and then egress pipeline of
> +      the source localnet logical switch datapath and enters the ingress
> +      pipeline of the logical router datapath.
> +    </li>
> +
> +    <li>
> +      Routing decision is taken and NAT rules are applied.
> +    </li>
> +
> +    <li>
> +      From the router datapath, packet enters the ingress pipeline and then
> +      egress pipeline of the localnet logical switch datapath which provides
> +      external connectivity. It then goes out of the integration bridge to the
> +      provider bridge (belonging to the logical switch which provides external
> +      connectivity) via the localnet port.
> +    </li>
> +  </ol>
> +
> +  <p>
> +    The following happens for the reverse external traffic.
> +  </p>
> +
> +  <ol>
> +    <li>
> +      The gateway chassis receives the packet from the localnet port of
> +      the logical switch which provides external connectivity. The packet then
> +      enters the ingress pipeline and then egress pipeline of the localnet
> +      logical switch (which provides external connectivity). The packet then
> +      enters the ingress pipeline of the logical router datapath.
> +    </li>
> +
> +    <li>
> +      The ingress pipeline of the logical router datapath applies the unNATting
> +      rules. The packet then enters the ingress pipeline and then egress
> +      pipeline of the source localnet logical switch. Since the source VM
> +      doesn't reside in the gateway chassis, the packet is sent out via the
> +      localnet port of the source logical switch.
> +    </li>
> +
> +    <li>
> +      The source chassis receives the packet via the localnet port and
> +      sends it to the integration bridge. The packet enters the
> +      ingress pipeline and then egress pipeline of the source localnet
> +      logical switch and finally gets delivered to the source VM port.
> +    </li>
> +  </ol>
> +
>    <h2>Life Cycle of a VTEP gateway</h2>
>
>    <p>
> diff --git a/ovn/ovn-nb.xml b/ovn/ovn-nb.xml
> index 474b4f9a7..4141751f8 100644
> --- a/ovn/ovn-nb.xml
> +++ b/ovn/ovn-nb.xml
> @@ -1681,6 +1681,49 @@
>            chassis to enable high availability.
>          </p>
>        </column>
> +
> +      <column name="options" key="reside-on-redirect-chassis">
> +        <p>
> +          Generally routing is distributed in <code>OVN</code>. The packet
> +          from a logical port which needs to be routed hits the router pipeline
> +          in the source chassis. For the East-West traffic, the packet is
> +          sent directly to the destination chassis. For the outside traffic
> +          the packet is sent to the gateway chassis.
> +        </p>
> +
> +        <p>
> +          When this option is set, <code>OVN</code> considers this only if
> +        </p>
> +
> +        <ul>
> +          <li>
> +            The logical router to which this logical router port belongs
> +            has a distributed gateway port.
> +          </li>
> +
> +          <li>
> +            The peer's logical switch has a localnet port (representing
> +            a VLAN tagged network)
> +          </li>
> +        </ul>
> +
> +        <p>
> +          When this option is set to <code>true</code>, then the packet
> +          which needs to be routed hits the router pipeline in the chassis
> +          hosting the distributed gateway router port. The source chassis
> +          pushes out this traffic via the localnet port. With this the
> +          East-West traffic is no longer distributed and will always go through
> +          the gateway chassis.
> +        </p>
> +
> +        <p>
> +          Without this option set, for any traffic destined to outside from a
> +          logical port which belongs to a logical switch with a localnet port,
> +          the source chassis will send the traffic to the gateway chassis via
> +          the tunnel port instead of the localnet port and this could cause MTU
> +          issues.
> +        </p>
> +      </column>
>      </group>
>
>      <group title="Attachment">
> diff --git a/tests/ovn.at b/tests/ovn.at
> index ab32faa6b..2db3f675a 100644
> --- a/tests/ovn.at
> +++ b/tests/ovn.at
> @@ -8567,6 +8567,279 @@ OVN_CLEANUP([hv1],[hv2],[hv3])
>
>  AT_CLEANUP
>
> +# VLAN traffic for external network redirected through distributed router
> +# gateway port should use vlans (i.e. input network vlan tag) across hypervisors
> +# instead of tunneling.
> +AT_SETUP([ovn -- vlan traffic for external network with distributed router gateway port])
> +AT_SKIP_IF([test $HAVE_PYTHON = no])
> +ovn_start
> +
> +# Logical network:
> +# # One LR R1 that has switches foo (192.168.1.0/24) and
> +# # alice (172.16.1.0/24) connected to it.  The logical port
> +# # between R1 and alice has a "redirect-chassis" specified,
> +# # i.e. it is the distributed router gateway port(172.16.1.6).
> +# # Switch alice also has a localnet port defined.
> +# # An additional switch outside has the same subnet as alice
> +# # (172.16.1.0/24), a localnet port and nexthop port(172.16.1.1)
> +# # which will receive the packet destined for external network
> +# # (i.e 8.8.8.8 as destination ip).
> +
> +# Physical network:
> +# # Three hypervisors hv[123].
> +# # hv1 hosts vif foo1.
> +# # hv2 is the "redirect-chassis" that hosts the distributed router gateway port.
> +# # hv3 hosts nexthop port vif outside1.
> +# # All other tests connect hypervisors to network n1 through br-phys for tunneling.
> +# # But in this test, hv1 won't connect to n1 (and no br-phys in hv1), and
> +# # in order to show vlans (instead of tunneling) used between hv1 and hv2,
> +# # a new network n2 is created and hv1 and hv2 are connected to this network through br-ex.
> +# # hv2 and hv3 are still connected to n1 network through br-phys.
> +net_add n1
> +
> +# We are not calling ovn_attach for hv1, to avoid adding br-phys.
> +# Tunneling won't work in hv1 as ovn-encap-ip is not added to any bridge in hv1
> +sim_add hv1
> +as hv1
> +ovs-vsctl \
> +    -- set Open_vSwitch . external-ids:system-id=hv1 \
> +    -- set Open_vSwitch . external-ids:ovn-remote=unix:$ovs_base/ovn-sb/ovn-sb.sock \
> +    -- set Open_vSwitch . external-ids:ovn-encap-type=geneve,vxlan \
> +    -- set Open_vSwitch . external-ids:ovn-encap-ip=192.168.0.1 \
> +    -- add-br br-int \
> +    -- set bridge br-int fail-mode=secure other-config:disable-in-band=true \
> +    -- set Open_vSwitch . external-ids:ovn-bridge-mappings=public:br-ex
> +
> +start_daemon ovn-controller
> +ovs-vsctl -- add-port br-int hv1-vif1 -- \
> +    set interface hv1-vif1 external-ids:iface-id=foo1 \
> +    ofport-request=1
> +
> +sim_add hv2
> +as hv2
> +ovs-vsctl add-br br-phys
> +ovn_attach n1 br-phys 192.168.0.2
> +ovs-vsctl set Open_vSwitch . external-ids:ovn-bridge-mappings="public:br-ex,phys:br-phys"
> +
> +sim_add hv3
> +as hv3
> +ovs-vsctl add-br br-phys
> +ovn_attach n1 br-phys 192.168.0.3
> +ovs-vsctl -- add-port br-int hv3-vif1 -- \
> +    set interface hv3-vif1 external-ids:iface-id=outside1 \
> +    options:tx_pcap=hv3/vif1-tx.pcap \
> +    options:rxq_pcap=hv3/vif1-rx.pcap \
> +    ofport-request=1
> +ovs-vsctl set Open_vSwitch . external-ids:ovn-bridge-mappings="phys:br-phys"
> +
> +# Create network n2 for vlan connectivity between hv1 and hv2
> +net_add n2
> +
> +as hv1
> +ovs-vsctl add-br br-ex
> +net_attach n2 br-ex
> +
> +as hv2
> +ovs-vsctl add-br br-ex
> +net_attach n2 br-ex
> +
> +OVN_POPULATE_ARP
> +
> +ovn-nbctl create Logical_Router name=R1
> +
> +ovn-nbctl ls-add foo
> +ovn-nbctl ls-add alice
> +ovn-nbctl ls-add outside
> +
> +# Connect foo to R1
> +ovn-nbctl lrp-add R1 foo 00:00:01:01:02:03 192.168.1.1/24
> +ovn-nbctl lsp-add foo rp-foo -- set Logical_Switch_Port rp-foo \
> +    type=router options:router-port=foo \
> +    -- lsp-set-addresses rp-foo router
> +
> +# Connect alice to R1 as distributed router gateway port (172.16.1.6) on hv2
> +ovn-nbctl lrp-add R1 alice 00:00:02:01:02:03 172.16.1.6/24 \
> +    -- set Logical_Router_Port alice options:redirect-chassis="hv2"
> +ovn-nbctl lsp-add alice rp-alice -- set Logical_Switch_Port rp-alice \
> +    type=router options:router-port=alice \
> +    -- lsp-set-addresses rp-alice router \
> +
> +# Create logical port foo1 in foo
> +ovn-nbctl lsp-add foo foo1 \
> +-- lsp-set-addresses foo1 "f0:00:00:01:02:03 192.168.1.2"
> +
> +# Create logical port outside1 in outside, which is a nexthop address
> +# for 172.16.1.0/24
> +ovn-nbctl lsp-add outside outside1 \
> +-- lsp-set-addresses outside1 "f0:00:00:01:02:04 172.16.1.1"
> +
> +# Set default gateway (nexthop) to 172.16.1.1
> +ovn-nbctl lr-route-add R1 "0.0.0.0/0" 172.16.1.1 alice
> +AT_CHECK([ovn-nbctl lr-nat-add R1 snat 172.16.1.6 192.168.1.1/24])
> +ovn-nbctl set Logical_Switch_Port rp-alice options:nat-addresses=router
> +
> +ovn-nbctl lsp-add foo ln-foo
> +ovn-nbctl lsp-set-addresses ln-foo unknown
> +ovn-nbctl lsp-set-options ln-foo network_name=public
> +ovn-nbctl lsp-set-type ln-foo localnet
> +AT_CHECK([ovn-nbctl set Logical_Switch_Port ln-foo tag=2])
> +
> +# Create localnet port in alice
> +ovn-nbctl lsp-add alice ln-alice
> +ovn-nbctl lsp-set-addresses ln-alice unknown
> +ovn-nbctl lsp-set-type ln-alice localnet
> +ovn-nbctl lsp-set-options ln-alice network_name=phys
> +
> +# Create localnet port in outside
> +ovn-nbctl lsp-add outside ln-outside
> +ovn-nbctl lsp-set-addresses ln-outside unknown
> +ovn-nbctl lsp-set-type ln-outside localnet
> +ovn-nbctl lsp-set-options ln-outside network_name=phys
> +
> +# Allow some time for ovn-northd and ovn-controller to catch up.
> +# XXX This should be more systematic.
> +ovn-nbctl --wait=hv --timeout=3 sync
> +
> +# Check that there is a logical flow in logical switch foo's pipeline
> +# to set the outport to rp-foo (which is expected).
> +OVS_WAIT_UNTIL([test 1 = `ovn-sbctl dump-flows foo | grep ls_in_l2_lkup | \
> +grep rp-foo | grep -v is_chassis_resident | wc -l`])
> +
> +# Set the option 'reside-on-redirect-chassis' for foo
> +ovn-nbctl set logical_router_port foo options:reside-on-redirect-chassis=true
> +# Check that there is a logical flow in logical switch foo's pipeline
> +# to set the outport to rp-foo with the condition is_chassis_resident.
> +ovn-sbctl dump-flows foo
> +OVS_WAIT_UNTIL([test 1 = `ovn-sbctl dump-flows foo | grep ls_in_l2_lkup | \
> +grep rp-foo | grep is_chassis_resident | wc -l`])
> +
> +echo "---------NB dump-----"
> +ovn-nbctl show
> +echo "---------------------"
> +ovn-nbctl list logical_router
> +echo "---------------------"
> +ovn-nbctl list nat
> +echo "---------------------"
> +ovn-nbctl list logical_router_port
> +echo "---------------------"
> +
> +echo "---------SB dump-----"
> +ovn-sbctl list datapath_binding
> +echo "---------------------"
> +ovn-sbctl list port_binding
> +echo "---------------------"
> +ovn-sbctl dump-flows
> +echo "---------------------"
> +ovn-sbctl list chassis
> +echo "---------------------"
> +
> +for chassis in hv1 hv2 hv3; do
> +    as $chassis
> +    echo "------ $chassis dump ----------"
> +    ovs-vsctl show br-int
> +    ovs-ofctl show br-int
> +    ovs-ofctl dump-flows br-int
> +    echo "--------------------------"
> +done
> +
> +ip_to_hex() {
> +    printf "%02x%02x%02x%02x" "$@"
> +}
> +
> +foo1_ip=$(ip_to_hex 192 168 1 2)
> +gw_ip=$(ip_to_hex 172 16 1 6)
> +dst_ip=$(ip_to_hex 8 8 8 8)
> +nexthop_ip=$(ip_to_hex 172 16 1 1)
> +
> +foo1_mac="f00000010203"
> +foo_mac="000001010203"
> +gw_mac="000002010203"
> +nexthop_mac="f00000010204"
> +
> +# Send ip packet from foo1 to 8.8.8.8
> +src_mac="f00000010203"
> +dst_mac="000001010203"
> +packet=${foo_mac}${foo1_mac}08004500001c0000000040110000${foo1_ip}${dst_ip}0035111100080000
> +
> +as hv1 ovs-appctl netdev-dummy/receive hv1-vif1 $packet
> +sleep 2
> +
> +# ARP request packet for nexthop_ip to expect at outside1
> +arp_request=ffffffffffff${gw_mac}08060001080006040001${gw_mac}${gw_ip}000000000000${nexthop_ip}
> +echo $arp_request >> hv3-vif1.expected
> +cat hv3-vif1.expected > expout
> +$PYTHON "$top_srcdir/utilities/ovs-pcap.in" hv3/vif1-tx.pcap | grep ${nexthop_ip} | uniq > hv3-vif1
> +AT_CHECK([sort hv3-vif1], [0], [expout])
> +
> +# Send ARP reply from outside1 back to the router
> +reply_mac="f00000010204"
> +arp_reply=${gw_mac}${nexthop_mac}08060001080006040002${nexthop_mac}${nexthop_ip}${gw_mac}${gw_ip}
> +
> +as hv3 ovs-appctl netdev-dummy/receive hv3-vif1 $arp_reply
> +OVS_WAIT_UNTIL([
> +    test `as hv2 ovs-ofctl dump-flows br-int | grep table=66 | \
> +grep actions=mod_dl_dst:f0:00:00:01:02:04 | wc -l` -eq 1
> +    ])
> +
> +# VLAN tagged packet with router port(192.168.1.1) MAC as destination MAC
> +# is expected on bridge connecting hv1 and hv2
> +expected=${foo_mac}${foo1_mac}8100000208004500001c0000000040110000${foo1_ip}${dst_ip}0035111100080000
> +echo $expected > hv1-br-ex_n2.expected
> +
> +# Packet to expect at outside1, i.e. nexthop (172.16.1.1) port.
> +# As connection tracking is not enabled for this test, snat can't be done on the packet.
> +# We still see foo1 as the source ip address. But source mac (gateway MAC) and
> +# dest mac (nexthop mac) are properly configured.
> +expected=${nexthop_mac}${gw_mac}08004500001c000000003f110100${foo1_ip}${dst_ip}0035111100080000
> +echo $expected > hv3-vif1.expected
> +
> +reset_pcap_file() {
> +    local iface=$1
> +    local pcap_file=$2
> +    ovs-vsctl -- set Interface $iface options:tx_pcap=dummy-tx.pcap \
> +options:rxq_pcap=dummy-rx.pcap
> +    rm -f ${pcap_file}*.pcap
> +    ovs-vsctl -- set Interface $iface options:tx_pcap=${pcap_file}-tx.pcap \
> +options:rxq_pcap=${pcap_file}-rx.pcap
> +}
> +
> +as hv1 reset_pcap_file br-ex_n2 hv1/br-ex_n2
> +as hv3 reset_pcap_file hv3-vif1 hv3/vif1
> +sleep 2
> +as hv1 ovs-appctl netdev-dummy/receive hv1-vif1 $packet
> +sleep 2
> +
> +# On hv1, the packet should not go from the vlan switch pipeline to the
> +# router pipeline.
> +as hv1 ovs-ofctl dump-flows br-int
> +
> +AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=65 | grep "priority=100,reg15=0x1,metadata=0x2" \
> +| grep actions=clone | grep -v n_packets=0 | wc -l], [0], [[0
> +]])
> +
> +# On hv1, table 32 check that no packet goes via the tunnel port
> +AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=32 \
> +| grep "NXM_NX_TUN_ID" | grep -v n_packets=0 | wc -l], [0], [[0
> +]])
> +
> +ip_packet() {
> +    grep "1010203f00000010203"
> +}
> +
> +# Check vlan tagged packet on the bridge connecting hv1 and hv2 with the
> +# foo1's mac.
> +$PYTHON "$top_srcdir/utilities/ovs-pcap.in" hv1/br-ex_n2-tx.pcap | ip_packet | uniq > hv1-br-ex_n2
> +cat hv1-br-ex_n2.expected > expout
> +AT_CHECK([sort hv1-br-ex_n2], [0], [expout])
> +
> +# Check expected packet on nexthop interface
> +$PYTHON "$top_srcdir/utilities/ovs-pcap.in" hv3/vif1-tx.pcap | grep ${foo1_ip}${dst_ip} | uniq > hv3-vif1
> +cat hv3-vif1.expected > expout
> +AT_CHECK([sort hv3-vif1], [0], [expout])
> +
> +OVN_CLEANUP([hv1],[hv2],[hv3])
> +AT_CLEANUP
> +
>  AT_SETUP([ovn -- IPv6 ND Router Solicitation responder])
>  AT_KEYWORDS([ovn-nd_ra])
>  AT_SKIP_IF([test $HAVE_PYTHON = no])
> --
> 2.19.1
>
Numan Siddique Nov. 27, 2018, 7:21 a.m. UTC | #2
On Tue, Nov 27, 2018 at 12:48 AM Guru Shetty <guru@ovn.org> wrote:

>
> On Mon, 19 Nov 2018 at 08:18, <nusiddiq@redhat.com> wrote:
>
>> [...]
>
> Since no one else seems to have any further comments, I applied this to
> master.
>
>


Thanks Guru for the review and applying the patch.

Numan


>> from a
>> +          logical port which belongs to a logical switch with localnet
>> port,
>> +          the source chassis will send the traffic to the gateway
>> chassis via
>> +          the tunnel port instead of the localnet port and this could
>> cause MTU
>> +          issues.
>> +        </p>
>> +      </column>
>>      </group>
>>
>>      <group title="Attachment">
>> diff --git a/tests/ovn.at b/tests/ovn.at
>> index ab32faa6b..2db3f675a 100644
>> --- a/tests/ovn.at
>> +++ b/tests/ovn.at
>> @@ -8567,6 +8567,279 @@ OVN_CLEANUP([hv1],[hv2],[hv3])
>>
>>  AT_CLEANUP
>>
>> +# VLAN traffic for external network redirected through distributed router
>> +# gateway port should use vlans(i.e input network vlan tag) across
>> hypervisors
>> +# instead of tunneling.
>> +AT_SETUP([ovn -- vlan traffic for external network with distributed
>> router gateway port])
>> +AT_SKIP_IF([test $HAVE_PYTHON = no])
>> +ovn_start
>> +
>> +# Logical network:
>> +# # One LR R1 that has switches foo (192.168.1.0/24) and
>> +# # alice (172.16.1.0/24) connected to it.  The logical port
>> +# # between R1 and alice has a "redirect-chassis" specified,
>> +# # i.e. it is the distributed router gateway port(172.16.1.6).
>> +# # Switch alice also has a localnet port defined.
>> +# # An additional switch outside has the same subnet as alice
>> +# # (172.16.1.0/24), a localnet port and nexthop port(172.16.1.1)
>> +# # which will receive the packet destined for external network
>> +# # (i.e 8.8.8.8 as destination ip).
>> +
>> +# Physical network:
>> +# # Three hypervisors hv[123].
>> +# # hv1 hosts vif foo1.
>> +# # hv2 is the "redirect-chassis" that hosts the distributed router
>> gateway port.
>> +# # hv3 hosts nexthop port vif outside1.
>> +# # All other tests connect hypervisors to network n1 through br-phys
>> for tunneling.
>> +# # But in this test, hv1 won't connect to n1(and no br-phys in hv1), and
>> +# # in order to show vlans(instead of tunneling) used between hv1 and
>> hv2,
>> +# # a new network n2 created and hv1 and hv2 connected to this network
>> through br-ex.
>> +# # hv2 and hv3 are still connected to n1 network through br-phys.
>> +net_add n1
>> +
>> +# We are not calling ovn_attach for hv1, to avoid adding br-phys.
>> +# Tunneling won't work in hv1 as ovn-encap-ip is not added to any bridge
>> in hv1
>> +sim_add hv1
>> +as hv1
>> +ovs-vsctl \
>> +    -- set Open_vSwitch . external-ids:system-id=hv1 \
>> +    -- set Open_vSwitch .
>> external-ids:ovn-remote=unix:$ovs_base/ovn-sb/ovn-sb.sock \
>> +    -- set Open_vSwitch . external-ids:ovn-encap-type=geneve,vxlan \
>> +    -- set Open_vSwitch . external-ids:ovn-encap-ip=192.168.0.1 \
>> +    -- add-br br-int \
>> +    -- set bridge br-int fail-mode=secure
>> other-config:disable-in-band=true \
>> +    -- set Open_vSwitch . external-ids:ovn-bridge-mappings=public:br-ex
>> +
>> +start_daemon ovn-controller
>> +ovs-vsctl -- add-port br-int hv1-vif1 -- \
>> +    set interface hv1-vif1 external-ids:iface-id=foo1 \
>> +    ofport-request=1
>> +
>> +sim_add hv2
>> +as hv2
>> +ovs-vsctl add-br br-phys
>> +ovn_attach n1 br-phys 192.168.0.2
>> +ovs-vsctl set Open_vSwitch .
>> external-ids:ovn-bridge-mappings="public:br-ex,phys:br-phys"
>> +
>> +sim_add hv3
>> +as hv3
>> +ovs-vsctl add-br br-phys
>> +ovn_attach n1 br-phys 192.168.0.3
>> +ovs-vsctl -- add-port br-int hv3-vif1 -- \
>> +    set interface hv3-vif1 external-ids:iface-id=outside1 \
>> +    options:tx_pcap=hv3/vif1-tx.pcap \
>> +    options:rxq_pcap=hv3/vif1-rx.pcap \
>> +    ofport-request=1
>> +ovs-vsctl set Open_vSwitch .
>> external-ids:ovn-bridge-mappings="phys:br-phys"
>> +
>> +# Create network n2 for vlan connectivity between hv1 and hv2
>> +net_add n2
>> +
>> +as hv1
>> +ovs-vsctl add-br br-ex
>> +net_attach n2 br-ex
>> +
>> +as hv2
>> +ovs-vsctl add-br br-ex
>> +net_attach n2 br-ex
>> +
>> +OVN_POPULATE_ARP
>> +
>> +ovn-nbctl create Logical_Router name=R1
>> +
>> +ovn-nbctl ls-add foo
>> +ovn-nbctl ls-add alice
>> +ovn-nbctl ls-add outside
>> +
>> +# Connect foo to R1
>> +ovn-nbctl lrp-add R1 foo 00:00:01:01:02:03 192.168.1.1/24
>> +ovn-nbctl lsp-add foo rp-foo -- set Logical_Switch_Port rp-foo \
>> +    type=router options:router-port=foo \
>> +    -- lsp-set-addresses rp-foo router
>> +
>> +# Connect alice to R1 as distributed router gateway port (172.16.1.6) on
>> hv2
>> +ovn-nbctl lrp-add R1 alice 00:00:02:01:02:03 172.16.1.6/24 \
>> +    -- set Logical_Router_Port alice options:redirect-chassis="hv2"
>> +ovn-nbctl lsp-add alice rp-alice -- set Logical_Switch_Port rp-alice \
>> +    type=router options:router-port=alice \
>> +    -- lsp-set-addresses rp-alice router \
>> +
>> +# Create logical port foo1 in foo
>> +ovn-nbctl lsp-add foo foo1 \
>> +-- lsp-set-addresses foo1 "f0:00:00:01:02:03 192.168.1.2"
>> +
>> +# Create logical port outside1 in outside, which is a nexthop address
>> +# for 172.16.1.0/24
>> +ovn-nbctl lsp-add outside outside1 \
>> +-- lsp-set-addresses outside1 "f0:00:00:01:02:04 172.16.1.1"
>> +
>> +# Set default gateway (nexthop) to 172.16.1.1
>> +ovn-nbctl lr-route-add R1 "0.0.0.0/0" 172.16.1.1 alice
>> +AT_CHECK([ovn-nbctl lr-nat-add R1 snat 172.16.1.6 192.168.1.1/24])
>> +ovn-nbctl set Logical_Switch_Port rp-alice options:nat-addresses=router
>> +
>> +ovn-nbctl lsp-add foo ln-foo
>> +ovn-nbctl lsp-set-addresses ln-foo unknown
>> +ovn-nbctl lsp-set-options ln-foo network_name=public
>> +ovn-nbctl lsp-set-type ln-foo localnet
>> +AT_CHECK([ovn-nbctl set Logical_Switch_Port ln-foo tag=2])
>> +
>> +# Create localnet port in alice
>> +ovn-nbctl lsp-add alice ln-alice
>> +ovn-nbctl lsp-set-addresses ln-alice unknown
>> +ovn-nbctl lsp-set-type ln-alice localnet
>> +ovn-nbctl lsp-set-options ln-alice network_name=phys
>> +
>> +# Create localnet port in outside
>> +ovn-nbctl lsp-add outside ln-outside
>> +ovn-nbctl lsp-set-addresses ln-outside unknown
>> +ovn-nbctl lsp-set-type ln-outside localnet
>> +ovn-nbctl lsp-set-options ln-outside network_name=phys
>> +
>> +# Allow some time for ovn-northd and ovn-controller to catch up.
>> +# XXX This should be more systematic.
>> +ovn-nbctl --wait=hv --timeout=3 sync
>> +
>> +# Check that there is a logical flow in logical switch foo's pipeline
>> +# to set the outport to rp-foo (which is expected).
>> +OVS_WAIT_UNTIL([test 1 = `ovn-sbctl dump-flows foo | grep ls_in_l2_lkup
>> | \
>> +grep rp-foo | grep -v is_chassis_resident | wc -l`])
>> +
>> +# Set the option 'reside-on-redirect-chassis' for foo
>> +ovn-nbctl set logical_router_port foo
>> options:reside-on-redirect-chassis=true
>> +# Check that there is a logical flow in logical switch foo's pipeline
>> +# to set the outport to rp-foo with the condition is_chassis_redirect.
>> +ovn-sbctl dump-flows foo
>> +OVS_WAIT_UNTIL([test 1 = `ovn-sbctl dump-flows foo | grep ls_in_l2_lkup
>> | \
>> +grep rp-foo | grep is_chassis_resident | wc -l`])
>> +
>> +echo "---------NB dump-----"
>> +ovn-nbctl show
>> +echo "---------------------"
>> +ovn-nbctl list logical_router
>> +echo "---------------------"
>> +ovn-nbctl list nat
>> +echo "---------------------"
>> +ovn-nbctl list logical_router_port
>> +echo "---------------------"
>> +
>> +echo "---------SB dump-----"
>> +ovn-sbctl list datapath_binding
>> +echo "---------------------"
>> +ovn-sbctl list port_binding
>> +echo "---------------------"
>> +ovn-sbctl dump-flows
>> +echo "---------------------"
>> +ovn-sbctl list chassis
>> +echo "---------------------"
>> +
>> +for chassis in hv1 hv2 hv3; do
>> +    as $chassis
>> +    echo "------ $chassis dump ----------"
>> +    ovs-vsctl show br-int
>> +    ovs-ofctl show br-int
>> +    ovs-ofctl dump-flows br-int
>> +    echo "--------------------------"
>> +done
>> +
>> +ip_to_hex() {
>> +    printf "%02x%02x%02x%02x" "$@"
>> +}
>> +
>> +foo1_ip=$(ip_to_hex 192 168 1 2)
>> +gw_ip=$(ip_to_hex 172 16 1 6)
>> +dst_ip=$(ip_to_hex 8 8 8 8)
>> +nexthop_ip=$(ip_to_hex 172 16 1 1)
>> +
>> +foo1_mac="f00000010203"
>> +foo_mac="000001010203"
>> +gw_mac="000002010203"
>> +nexthop_mac="f00000010204"
>> +
>> +# Send ip packet from foo1 to 8.8.8.8
>> +src_mac="f00000010203"
>> +dst_mac="000001010203"
>>
>> +packet=${foo_mac}${foo1_mac}08004500001c0000000040110000${foo1_ip}${dst_ip}0035111100080000
>> +
>> +as hv1 ovs-appctl netdev-dummy/receive hv1-vif1 $packet
>> +sleep 2
>> +
>> +# ARP request packet for nexthop_ip to expect at outside1
>>
>> +arp_request=ffffffffffff${gw_mac}08060001080006040001${gw_mac}${gw_ip}000000000000${nexthop_ip}
>> +echo $arp_request >> hv3-vif1.expected
>> +cat hv3-vif1.expected > expout
>> +$PYTHON "$top_srcdir/utilities/ovs-pcap.in" hv3/vif1-tx.pcap | grep
>> ${nexthop_ip} | uniq > hv3-vif1
>> +AT_CHECK([sort hv3-vif1], [0], [expout])
>> +
>> +# Send ARP reply from outside1 back to the router
>> +reply_mac="f00000010204"
>>
>> +arp_reply=${gw_mac}${nexthop_mac}08060001080006040002${nexthop_mac}${nexthop_ip}${gw_mac}${gw_ip}
>> +
>> +as hv3 ovs-appctl netdev-dummy/receive hv3-vif1 $arp_reply
>> +OVS_WAIT_UNTIL([
>> +    test `as hv2 ovs-ofctl dump-flows br-int | grep table=66 | \
>> +grep actions=mod_dl_dst:f0:00:00:01:02:04 | wc -l` -eq 1
>> +    ])
>> +
>> +# VLAN tagged packet with router port(192.168.1.1) MAC as destination MAC
>> +# is expected on bridge connecting hv1 and hv2
>>
>> +expected=${foo_mac}${foo1_mac}8100000208004500001c0000000040110000${foo1_ip}${dst_ip}0035111100080000
>> +echo $expected > hv1-br-ex_n2.expected
>> +
>> +# Packet to Expect at outside1 i.e nexthop(172.16.1.1) port.
>> +# As connection tracking not enabled for this test, snat can't be done
>> on the packet.
>> +# We still see foo1 as the source ip address. But source mac(gateway
>> MAC) and
>> +# dest mac(nexthop mac) are properly configured.
>>
>> +expected=${nexthop_mac}${gw_mac}08004500001c000000003f110100${foo1_ip}${dst_ip}0035111100080000
>> +echo $expected > hv3-vif1.expected
>> +
>> +reset_pcap_file() {
>> +    local iface=$1
>> +    local pcap_file=$2
>> +    ovs-vsctl -- set Interface $iface options:tx_pcap=dummy-tx.pcap \
>> +options:rxq_pcap=dummy-rx.pcap
>> +    rm -f ${pcap_file}*.pcap
>> +    ovs-vsctl -- set Interface $iface
>> options:tx_pcap=${pcap_file}-tx.pcap \
>> +options:rxq_pcap=${pcap_file}-rx.pcap
>> +}
>> +
>> +as hv1 reset_pcap_file br-ex_n2 hv1/br-ex_n2
>> +as hv3 reset_pcap_file hv3-vif1 hv3/vif1
>> +sleep 2
>> +as hv1 ovs-appctl netdev-dummy/receive hv1-vif1 $packet
>> +sleep 2
>> +
>> +# On hv1, the packet should not go from vlan switch pipleline to router
>> +# pipleine
>> +as hv1 ovs-ofctl dump-flows br-int
>> +
>> +AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=65 | grep
>> "priority=100,reg15=0x1,metadata=0x2" \
>> +| grep actions=clone | grep -v n_packets=0 | wc -l], [0], [[0
>> +]])
>> +
>> +# On hv1, table 32 check that no packet goes via the tunnel port
>> +AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=32 \
>> +| grep "NXM_NX_TUN_ID" | grep -v n_packets=0 | wc -l], [0], [[0
>> +]])
>> +
>> +ip_packet() {
>> +    grep "1010203f00000010203"
>> +}
>> +
>> +# Check vlan tagged packet on the bridge connecting hv1 and hv2 with the
>> +# foo1's mac.
>> +$PYTHON "$top_srcdir/utilities/ovs-pcap.in" hv1/br-ex_n2-tx.pcap |
>> ip_packet | uniq > hv1-br-ex_n2
>> +cat hv1-br-ex_n2.expected > expout
>> +AT_CHECK([sort hv1-br-ex_n2], [0], [expout])
>> +
>> +# Check expected packet on nexthop interface
>> +$PYTHON "$top_srcdir/utilities/ovs-pcap.in" hv3/vif1-tx.pcap | grep
>> ${foo1_ip}${dst_ip} | uniq > hv3-vif1
>> +cat hv3-vif1.expected > expout
>> +AT_CHECK([sort hv3-vif1], [0], [expout])
>> +
>> +OVN_CLEANUP([hv1],[hv2],[hv3])
>> +AT_CLEANUP
>> +
>>  AT_SETUP([ovn -- IPv6 ND Router Solicitation responder])
>>  AT_KEYWORDS([ovn-nd_ra])
>>  AT_SKIP_IF([test $HAVE_PYTHON = no])
>> --
>> 2.19.1
>>
>> _______________________________________________
>> dev mailing list
>> dev@openvswitch.org
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>
>
Patch

diff --git a/ovn/northd/ovn-northd.8.xml b/ovn/northd/ovn-northd.8.xml
index 7352c6764..f52699bd3 100644
--- a/ovn/northd/ovn-northd.8.xml
+++ b/ovn/northd/ovn-northd.8.xml
@@ -874,6 +874,25 @@  output;
             resident.
           </li>
         </ul>
+
+        <p>
+          For the Ethernet address on a logical switch port of type
+          <code>router</code>, when that logical switch port's
+          <ref column="addresses" table="Logical_Switch_Port"
+          db="OVN_Northbound"/> column is set to <code>router</code> and
+          the connected logical router port specifies a
+          <code>reside-on-redirect-chassis</code> and the logical router
+          to which the connected logical router port belongs has a
+          <code>redirect-chassis</code> distributed gateway logical router
+          port:
+        </p>
+
+        <ul>
+          <li>
+            The flow for the connected logical router port's Ethernet
+            address is only programmed on the <code>redirect-chassis</code>.
+          </li>
+        </ul>
       </li>
 
       <li>
@@ -1179,6 +1198,17 @@  output;
           upstream MAC learning to point to the
           <code>redirect-chassis</code>.
         </p>
+
+        <p>
+          For the logical router port with the option
+          <code>reside-on-redirect-chassis</code> set (which is centralized),
+          the above flows are only programmed on the gateway port instance on
+          the <code>redirect-chassis</code> (if the logical router has a
+          distributed gateway port). This behavior avoids generation
+          of multiple ARP responses from different chassis, and allows
+          upstream MAC learning to point to the
+          <code>redirect-chassis</code>.
+        </p>
       </li>
 
       <li>
diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
index 58bef7de5..2de9fb38d 100644
--- a/ovn/northd/ovn-northd.c
+++ b/ovn/northd/ovn-northd.c
@@ -4461,13 +4461,32 @@  build_lswitch_flows(struct hmap *datapaths, struct hmap *ports,
                 ds_put_format(&match, "eth.dst == "ETH_ADDR_FMT,
                               ETH_ADDR_ARGS(mac));
                 if (op->peer->od->l3dgw_port
-                    && op->peer == op->peer->od->l3dgw_port
-                    && op->peer->od->l3redirect_port) {
-                    /* The destination lookup flow for the router's
-                     * distributed gateway port MAC address should only be
-                     * programmed on the "redirect-chassis". */
-                    ds_put_format(&match, " && is_chassis_resident(%s)",
-                                  op->peer->od->l3redirect_port->json_key);
+                    && op->peer->od->l3redirect_port
+                    && op->od->localnet_port) {
+                    bool add_chassis_resident_check = false;
+                    if (op->peer == op->peer->od->l3dgw_port) {
+                        /* The peer of this port represents a distributed
+                         * gateway port. The destination lookup flow for the
+                         * router's distributed gateway port MAC address should
+                         * only be programmed on the "redirect-chassis". */
+                        add_chassis_resident_check = true;
+                    } else {
+                        /* Check if the option 'reside-on-redirect-chassis'
+                         * is set to true on the peer port. If set to true
+                         * and if the logical switch has a localnet port, it
+                         * means the router pipeline for the packets from
+                         * this logical switch should be run on the chassis
+                         * hosting the gateway port.
+                         */
+                        add_chassis_resident_check = smap_get_bool(
+                            &op->peer->nbrp->options,
+                            "reside-on-redirect-chassis", false);
+                    }
+
+                    if (add_chassis_resident_check) {
+                        ds_put_format(&match, " && is_chassis_resident(%s)",
+                                      op->peer->od->l3redirect_port->json_key);
+                    }
                 }
 
                 ds_clear(&actions);
@@ -5232,15 +5251,35 @@  build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
                           op->lrp_networks.ipv4_addrs[i].network_s,
                           op->lrp_networks.ipv4_addrs[i].plen,
                           op->lrp_networks.ipv4_addrs[i].addr_s);
-            if (op->od->l3dgw_port && op == op->od->l3dgw_port
-                && op->od->l3redirect_port) {
-                /* Traffic with eth.src = l3dgw_port->lrp_networks.ea_s
-                 * should only be sent from the "redirect-chassis", so that
-                 * upstream MAC learning points to the "redirect-chassis".
-                 * Also need to avoid generation of multiple ARP responses
-                 * from different chassis. */
-                ds_put_format(&match, " && is_chassis_resident(%s)",
-                              op->od->l3redirect_port->json_key);
+
+            if (op->od->l3dgw_port && op->od->l3redirect_port && op->peer
+                && op->peer->od->localnet_port) {
+                bool add_chassis_resident_check = false;
+                if (op == op->od->l3dgw_port) {
+                    /* Traffic with eth.src = l3dgw_port->lrp_networks.ea_s
+                     * should only be sent from the "redirect-chassis", so that
+                     * upstream MAC learning points to the "redirect-chassis".
+                     * Also need to avoid generation of multiple ARP responses
+                     * from different chassis. */
+                    add_chassis_resident_check = true;
+                } else {
+                    /* Check if the option 'reside-on-redirect-chassis'
+                     * is set to true on the router port. If set to true
+                     * and if the peer's logical switch has a localnet port,
+                     * it means the router pipeline for the packets from the
+                     * peer's logical switch is run on the chassis hosting
+                     * the gateway port, and it should reply to the ARP
+                     * requests for the router port IPs.
+                     */
+                    add_chassis_resident_check = smap_get_bool(
+                        &op->nbrp->options,
+                        "reside-on-redirect-chassis", false);
+                }
+
+                if (add_chassis_resident_check) {
+                    ds_put_format(&match, " && is_chassis_resident(%s)",
+                                  op->od->l3redirect_port->json_key);
+                }
             }
 
             ds_clear(&actions);
diff --git a/ovn/ovn-architecture.7.xml b/ovn/ovn-architecture.7.xml
index 64e7d89e6..3936e6016 100644
--- a/ovn/ovn-architecture.7.xml
+++ b/ovn/ovn-architecture.7.xml
@@ -1372,6 +1372,217 @@ 
     http://docs.openvswitch.org/en/latest/topics/high-availability.
   </p>
 
+  <h2>Multiple localnet logical switches connected to a Logical Router</h2>
+
+  <p>
+    It is possible to have multiple logical switches, each with a localnet
+    port (representing a physical network), connected to a logical router, in
+    which one localnet logical switch may provide the external connectivity
+    via a distributed gateway port and the rest of the localnet logical
+    switches use VLAN tagging in the physical network. It is expected that
+    <code>ovn-bridge-mappings</code> is configured appropriately on the
+    chassis for all these localnet networks.
+  </p>
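+
+  <p>
+    As a brief sketch (the bridge, network, and switch names below are
+    illustrative, not mandated by <code>OVN</code>), a VLAN tagged localnet
+    logical switch could be created as follows:
+  </p>
+
+  <pre>
+# Map the physical network name to a provider bridge on this chassis.
+# (Illustrative names; adapt to the deployment.)
+ovs-vsctl set Open_vSwitch . \
+    external-ids:ovn-bridge-mappings=physnet1:br-provider
+
+# Create a logical switch with a localnet port tagged with VLAN 100.
+ovn-nbctl ls-add vlan100
+ovn-nbctl lsp-add vlan100 ln-vlan100
+ovn-nbctl lsp-set-type ln-vlan100 localnet
+ovn-nbctl lsp-set-addresses ln-vlan100 unknown
+ovn-nbctl lsp-set-options ln-vlan100 network_name=physnet1
+ovn-nbctl set Logical_Switch_Port ln-vlan100 tag=100
+</pre>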
+
+  <h3>East-West routing</h3>
+  <p>
+    East-West routing between these localnet VLAN tagged logical switches
+    works almost the same way as between normal logical switches. When a VM
+    sends such a packet:
+  </p>
+  <ol>
+    <li>
+      It first enters the ingress pipeline and then the egress pipeline of
+      the source localnet logical switch datapath. It then enters the
+      ingress pipeline of the logical router datapath via the logical router
+      port in the source chassis.
+    </li>
+
+    <li>
+      A routing decision is taken.
+    </li>
+
+    <li>
+      From the router datapath, the packet enters the ingress pipeline and
+      then the egress pipeline of the destination localnet logical switch
+      datapath and goes out of the integration bridge to the provider bridge
+      (belonging to the destination logical switch) via the localnet port.
+    </li>
+
+    <li>
+      The destination chassis receives the packet via the localnet port and
+      sends it to the integration bridge. The packet enters the ingress
+      pipeline and then the egress pipeline of the destination localnet
+      logical switch and finally gets delivered to the destination VM port.
+    </li>
+  </ol>
+
+  <h3>External traffic</h3>
+
+  <p>
+    The following happens when a VM sends external traffic (which requires
+    NATting) and the chassis hosting the VM doesn't have a distributed
+    gateway port.
+  </p>
+
+  <ol>
+    <li>
+      The packet first enters the ingress pipeline and then the egress
+      pipeline of the source localnet logical switch datapath. It then enters
+      the ingress pipeline of the logical router datapath via the logical
+      router port in the source chassis.
+    </li>
+
+    <li>
+      A routing decision is taken. Since neither the gateway router nor the
+      distributed gateway port resides in the source chassis, the traffic is
+      redirected to the gateway chassis via the tunnel port.
+    </li>
+
+    <li>
+      The gateway chassis receives the packet via the tunnel port and the
+      packet enters the egress pipeline of the logical router datapath. NAT
+      rules are applied here. The packet then enters the ingress pipeline and
+      then the egress pipeline of the localnet logical switch datapath which
+      provides external connectivity, and finally goes out via that logical
+      switch's localnet port.
+    </li>
+  </ol>
+
+  <p>
+    Although this works, the VM traffic is tunnelled when sent from the compute
+    chassis to the gateway chassis. In order for it to work properly, the MTU
+    of the localnet logical switches must be lowered to account for the tunnel
+    encapsulation.
+  </p>
+
+  <h2>
+    Centralized routing for localnet VLAN tagged logical switches connected
+    to a Logical Router
+  </h2>
+
+  <p>
+    To overcome the tunnel encapsulation problem described in the previous
+    section, <code>OVN</code> supports the option of enabling centralized
+    routing for localnet VLAN tagged logical switches. The CMS can configure
+    option <ref column="options:reside-on-redirect-chassis"
+    table="Logical_Router_Port" db="OVN_NB"/> to <code>true</code> for each
+    <ref table="Logical_Router_Port" db="OVN_NB"/> which connects to the
+    localnet VLAN tagged logical switches. This causes the gateway
+    chassis (hosting the distributed gateway port) to handle all the
+    routing for these networks, making it centralized. It will reply to
+    the ARP requests for the logical router port IPs.
+  </p>
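+
+  <p>
+    For example (the router port name below is illustrative), the CMS could
+    request centralized routing for the VLAN tagged logical switch attached
+    to the logical router port <code>lrp-vlan100</code> with:
+  </p>
+
+  <pre>
+# Centralize routing for the network behind this router port.
+ovn-nbctl set Logical_Router_Port lrp-vlan100 \
+    options:reside-on-redirect-chassis=true
+</pre>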
+
+  <p>
+    If the logical router doesn't have a distributed gateway port connecting
+    to the localnet logical switch which provides external connectivity,
+    then this option is ignored by <code>OVN</code>.
+  </p>
+
+  <p>
+    The following happens when a VM sends east-west traffic which needs to
+    be routed:
+  </p>
+
+  <ol>
+    <li>
+      The packet first enters the ingress pipeline and then the egress
+      pipeline of the source localnet logical switch datapath and is sent out
+      via the localnet port of the source localnet logical switch (instead of
+      being sent to the router pipeline).
+    </li>
+
+    <li>
+      The gateway chassis receives the packet via the localnet port of the
+      source localnet logical switch and sends it to the integration bridge.
+      The packet then enters the ingress pipeline and then the egress
+      pipeline of the source localnet logical switch datapath and enters the
+      ingress pipeline of the logical router datapath.
+    </li>
+
+    <li>
+      A routing decision is taken.
+    </li>
+
+    <li>
+      From the router datapath, the packet enters the ingress pipeline and
+      then the egress pipeline of the destination localnet logical switch
+      datapath. It then goes out of the integration bridge to the provider
+      bridge (belonging to the destination logical switch) via the localnet
+      port.
+    </li>
+
+    <li>
+      The destination chassis receives the packet via the localnet port and
+      sends it to the integration bridge. The packet enters the ingress
+      pipeline and then the egress pipeline of the destination localnet
+      logical switch and is finally delivered to the destination VM port.
+    </li>
+  </ol>
+
+  <p>
+    The following happens when a VM sends external traffic which requires
+    NATting:
+  </p>
+
+  <ol>
+    <li>
+      The packet first enters the ingress pipeline and then the egress
+      pipeline of the source localnet logical switch datapath and is sent out
+      via the localnet port of the source localnet logical switch (instead of
+      being sent to the router pipeline); a way to verify this is sketched
+      after this list.
+    </li>
+
+    <li>
+      The gateway chassis receives the packet via the localnet port of the
+      source localnet logical switch and sends it to the integration bridge.
+      The packet then enters the ingress pipeline and then the egress
+      pipeline of the source localnet logical switch datapath and enters the
+      ingress pipeline of the logical router datapath.
+    </li>
+
+    <li>
+      A routing decision is taken and NAT rules are applied.
+    </li>
+
+    <li>
+      From the router datapath, the packet enters the ingress pipeline and
+      then the egress pipeline of the localnet logical switch datapath which
+      provides external connectivity. It then goes out of the integration
+      bridge to the provider bridge (belonging to the logical switch which
+      provides external connectivity) via the localnet port.
+    </li>
+  </ol>
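+
+  <p>
+    As a debugging sketch (the OpenFlow table number below is internal to
+    <code>ovn-controller</code> and may change between versions), one way to
+    verify on the source chassis that this traffic leaves via the localnet
+    port rather than a tunnel is to check that no tunnel output flow has
+    matched any packets; the command below should print nothing:
+  </p>
+
+  <pre>
+# Tunnel output flows live in table 32; n_packets=0 means nothing was
+# tunnelled.  (Illustrative check, assuming the integration bridge br-int.)
+ovs-ofctl dump-flows br-int table=32 | grep NXM_NX_TUN_ID | \
+    grep -v n_packets=0
+</pre>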
+
+  <p>
+    The following happens for the reverse external traffic:
+  </p>
+
+  <ol>
+    <li>
+      The gateway chassis receives the packet from the localnet port of
+      the logical switch which provides external connectivity. The packet then
+      enters the ingress pipeline and then the egress pipeline of the localnet
+      logical switch (which provides external connectivity). The packet then
+      enters the ingress pipeline of the logical router datapath.
+    </li>
+
+    <li>
+      The ingress pipeline of the logical router datapath applies the
+      unNATting rules. The packet then enters the ingress pipeline and then
+      the egress pipeline of the source localnet logical switch. Since the
+      source VM
+      doesn't reside in the gateway chassis, the packet is sent out via the
+      localnet port of the source logical switch.
+    </li>
+
+    <li>
+      The source chassis receives the packet via the localnet port and
+      sends it to the integration bridge. The packet enters the ingress
+      pipeline and then the egress pipeline of the source localnet logical
+      switch and finally gets delivered to the source VM port.
+    </li>
+  </ol>
+
   <h2>Life Cycle of a VTEP gateway</h2>
 
   <p>
diff --git a/ovn/ovn-nb.xml b/ovn/ovn-nb.xml
index 474b4f9a7..4141751f8 100644
--- a/ovn/ovn-nb.xml
+++ b/ovn/ovn-nb.xml
@@ -1681,6 +1681,49 @@ 
           chassis to enable high availability.
         </p>
       </column>
+
+      <column name="options" key="reside-on-redirect-chassis">
+        <p>
+          Generally routing is distributed in <code>OVN</code>. A packet
+          from a logical port which needs to be routed hits the router
+          pipeline in the source chassis. For East-West traffic, the packet
+          is sent directly to the destination chassis. For traffic destined
+          outside, the packet is sent to the gateway chassis.
+        </p>
+
+        <p>
+          When this option is set, <code>OVN</code> honors it only if
+        </p>
+
+        <ul>
+          <li>
+            The logical router to which this logical router port belongs
+            has a distributed gateway port.
+          </li>
+
+          <li>
+            The peer's logical switch has a localnet port (representing
+            a VLAN tagged network).
+          </li>
+        </ul>
+
+        <p>
+          When this option is set to <code>true</code>, a packet which needs
+          to be routed hits the router pipeline in the chassis hosting the
+          distributed gateway router port. The source chassis pushes out this
+          traffic via the localnet port. With this, East-West traffic is no
+          longer distributed and will always go through the gateway chassis.
+        </p>
+
+        <p>
+          Without this option set, for any traffic destined outside from a
+          logical port which belongs to a logical switch with a localnet
+          port, the source chassis will send the traffic to the gateway
+          chassis via the tunnel port instead of the localnet port, and this
+          could cause MTU issues.
+        </p>
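+
+        <p>
+          As an illustrative sketch (the names below are not part of this
+          option's contract), setting the option and observing the resulting
+          centralized flow could look like:
+        </p>
+
+        <pre>
+# Enable centralized routing for the router port lrp-vlan100.
+ovn-nbctl set Logical_Router_Port lrp-vlan100 \
+    options:reside-on-redirect-chassis=true
+# The l2 lookup flow towards the router port now carries an
+# is_chassis_resident() match:
+ovn-sbctl dump-flows vlan100 | grep ls_in_l2_lkup | grep is_chassis_resident
+</pre>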
+      </column>
     </group>
 
     <group title="Attachment">
diff --git a/tests/ovn.at b/tests/ovn.at
index ab32faa6b..2db3f675a 100644
--- a/tests/ovn.at
+++ b/tests/ovn.at
@@ -8567,6 +8567,279 @@  OVN_CLEANUP([hv1],[hv2],[hv3])
 
 AT_CLEANUP
 
+# VLAN traffic for external network redirected through distributed router
+# gateway port should use VLANs (i.e. the input network VLAN tag) across hypervisors
+# instead of tunneling.
+AT_SETUP([ovn -- vlan traffic for external network with distributed router gateway port])
+AT_SKIP_IF([test $HAVE_PYTHON = no])
+ovn_start
+
+# Logical network:
+# # One LR R1 that has switches foo (192.168.1.0/24) and
+# # alice (172.16.1.0/24) connected to it.  The logical port
+# # between R1 and alice has a "redirect-chassis" specified,
+# # i.e. it is the distributed router gateway port(172.16.1.6).
+# # Switch alice also has a localnet port defined.
+# # An additional switch outside has the same subnet as alice
+# # (172.16.1.0/24), a localnet port and a nexthop port (172.16.1.1)
+# # which will receive the packet destined for external network
+# # (i.e 8.8.8.8 as destination ip).
+
+# Physical network:
+# # Three hypervisors hv[123].
+# # hv1 hosts vif foo1.
+# # hv2 is the "redirect-chassis" that hosts the distributed router gateway port.
+# # hv3 hosts nexthop port vif outside1.
+# # All other tests connect hypervisors to network n1 through br-phys for tunneling.
+# # But in this test, hv1 won't connect to n1 (and has no br-phys), and
+# # in order to show VLANs (instead of tunneling) used between hv1 and hv2,
+# # a new network n2 is created and hv1 and hv2 are connected to this
+# # network through br-ex.
+# # hv2 and hv3 are still connected to n1 network through br-phys.
+net_add n1
+
+# We are not calling ovn_attach for hv1, to avoid adding br-phys.
+# Tunneling won't work in hv1 as ovn-encap-ip is not added to any bridge in hv1
+sim_add hv1
+as hv1
+ovs-vsctl \
+    -- set Open_vSwitch . external-ids:system-id=hv1 \
+    -- set Open_vSwitch . external-ids:ovn-remote=unix:$ovs_base/ovn-sb/ovn-sb.sock \
+    -- set Open_vSwitch . external-ids:ovn-encap-type=geneve,vxlan \
+    -- set Open_vSwitch . external-ids:ovn-encap-ip=192.168.0.1 \
+    -- add-br br-int \
+    -- set bridge br-int fail-mode=secure other-config:disable-in-band=true \
+    -- set Open_vSwitch . external-ids:ovn-bridge-mappings=public:br-ex
+
+start_daemon ovn-controller
+ovs-vsctl -- add-port br-int hv1-vif1 -- \
+    set interface hv1-vif1 external-ids:iface-id=foo1 \
+    ofport-request=1
+
+sim_add hv2
+as hv2
+ovs-vsctl add-br br-phys
+ovn_attach n1 br-phys 192.168.0.2
+ovs-vsctl set Open_vSwitch . external-ids:ovn-bridge-mappings="public:br-ex,phys:br-phys"
+
+sim_add hv3
+as hv3
+ovs-vsctl add-br br-phys
+ovn_attach n1 br-phys 192.168.0.3
+ovs-vsctl -- add-port br-int hv3-vif1 -- \
+    set interface hv3-vif1 external-ids:iface-id=outside1 \
+    options:tx_pcap=hv3/vif1-tx.pcap \
+    options:rxq_pcap=hv3/vif1-rx.pcap \
+    ofport-request=1
+ovs-vsctl set Open_vSwitch . external-ids:ovn-bridge-mappings="phys:br-phys"
+
+# Create network n2 for vlan connectivity between hv1 and hv2
+net_add n2
+
+as hv1
+ovs-vsctl add-br br-ex
+net_attach n2 br-ex
+
+as hv2
+ovs-vsctl add-br br-ex
+net_attach n2 br-ex
+
+OVN_POPULATE_ARP
+
+ovn-nbctl create Logical_Router name=R1
+
+ovn-nbctl ls-add foo
+ovn-nbctl ls-add alice
+ovn-nbctl ls-add outside
+
+# Connect foo to R1
+ovn-nbctl lrp-add R1 foo 00:00:01:01:02:03 192.168.1.1/24
+ovn-nbctl lsp-add foo rp-foo -- set Logical_Switch_Port rp-foo \
+    type=router options:router-port=foo \
+    -- lsp-set-addresses rp-foo router
+
+# Connect alice to R1 as distributed router gateway port (172.16.1.6) on hv2
+ovn-nbctl lrp-add R1 alice 00:00:02:01:02:03 172.16.1.6/24 \
+    -- set Logical_Router_Port alice options:redirect-chassis="hv2"
+ovn-nbctl lsp-add alice rp-alice -- set Logical_Switch_Port rp-alice \
+    type=router options:router-port=alice \
+    -- lsp-set-addresses rp-alice router
+
+# Create logical port foo1 in foo
+ovn-nbctl lsp-add foo foo1 \
+-- lsp-set-addresses foo1 "f0:00:00:01:02:03 192.168.1.2"
+
+# Create logical port outside1 in outside, which is a nexthop address
+# for 172.16.1.0/24
+ovn-nbctl lsp-add outside outside1 \
+-- lsp-set-addresses outside1 "f0:00:00:01:02:04 172.16.1.1"
+
+# Set default gateway (nexthop) to 172.16.1.1
+ovn-nbctl lr-route-add R1 "0.0.0.0/0" 172.16.1.1 alice
+AT_CHECK([ovn-nbctl lr-nat-add R1 snat 172.16.1.6 192.168.1.1/24])
+ovn-nbctl set Logical_Switch_Port rp-alice options:nat-addresses=router
+
+ovn-nbctl lsp-add foo ln-foo
+ovn-nbctl lsp-set-addresses ln-foo unknown
+ovn-nbctl lsp-set-options ln-foo network_name=public
+ovn-nbctl lsp-set-type ln-foo localnet
+AT_CHECK([ovn-nbctl set Logical_Switch_Port ln-foo tag=2])
+
+# Create localnet port in alice
+ovn-nbctl lsp-add alice ln-alice
+ovn-nbctl lsp-set-addresses ln-alice unknown
+ovn-nbctl lsp-set-type ln-alice localnet
+ovn-nbctl lsp-set-options ln-alice network_name=phys
+
+# Create localnet port in outside
+ovn-nbctl lsp-add outside ln-outside
+ovn-nbctl lsp-set-addresses ln-outside unknown
+ovn-nbctl lsp-set-type ln-outside localnet
+ovn-nbctl lsp-set-options ln-outside network_name=phys
+
+# Allow some time for ovn-northd and ovn-controller to catch up.
+# XXX This should be more systematic.
+ovn-nbctl --wait=hv --timeout=3 sync
+
+# Check that there is a logical flow in logical switch foo's pipeline
+# to set the outport to rp-foo (which is expected).
+OVS_WAIT_UNTIL([test 1 = `ovn-sbctl dump-flows foo | grep ls_in_l2_lkup | \
+grep rp-foo | grep -v is_chassis_resident | wc -l`])
+
+# Set the option 'reside-on-redirect-chassis' for foo
+ovn-nbctl set logical_router_port foo options:reside-on-redirect-chassis=true
+# Check that there is a logical flow in logical switch foo's pipeline
+# to set the outport to rp-foo with the condition is_chassis_resident.
+ovn-sbctl dump-flows foo
+OVS_WAIT_UNTIL([test 1 = `ovn-sbctl dump-flows foo | grep ls_in_l2_lkup | \
+grep rp-foo | grep is_chassis_resident | wc -l`])
+
+echo "---------NB dump-----"
+ovn-nbctl show
+echo "---------------------"
+ovn-nbctl list logical_router
+echo "---------------------"
+ovn-nbctl list nat
+echo "---------------------"
+ovn-nbctl list logical_router_port
+echo "---------------------"
+
+echo "---------SB dump-----"
+ovn-sbctl list datapath_binding
+echo "---------------------"
+ovn-sbctl list port_binding
+echo "---------------------"
+ovn-sbctl dump-flows
+echo "---------------------"
+ovn-sbctl list chassis
+echo "---------------------"
+
+for chassis in hv1 hv2 hv3; do
+    as $chassis
+    echo "------ $chassis dump ----------"
+    ovs-vsctl show br-int
+    ovs-ofctl show br-int
+    ovs-ofctl dump-flows br-int
+    echo "--------------------------"
+done
+
+ip_to_hex() {
+    printf "%02x%02x%02x%02x" "$@"
+}
+
+foo1_ip=$(ip_to_hex 192 168 1 2)
+gw_ip=$(ip_to_hex 172 16 1 6)
+dst_ip=$(ip_to_hex 8 8 8 8)
+nexthop_ip=$(ip_to_hex 172 16 1 1)
+
+foo1_mac="f00000010203"
+foo_mac="000001010203"
+gw_mac="000002010203"
+nexthop_mac="f00000010204"
+
+# Send ip packet from foo1 to 8.8.8.8
+src_mac="f00000010203"
+dst_mac="000001010203"
+packet=${foo_mac}${foo1_mac}08004500001c0000000040110000${foo1_ip}${dst_ip}0035111100080000
+
+as hv1 ovs-appctl netdev-dummy/receive hv1-vif1 $packet
+sleep 2
+
+# ARP request packet for nexthop_ip to expect at outside1
+arp_request=ffffffffffff${gw_mac}08060001080006040001${gw_mac}${gw_ip}000000000000${nexthop_ip}
+echo $arp_request >> hv3-vif1.expected
+cat hv3-vif1.expected > expout
+$PYTHON "$top_srcdir/utilities/ovs-pcap.in" hv3/vif1-tx.pcap | grep ${nexthop_ip} | uniq > hv3-vif1
+AT_CHECK([sort hv3-vif1], [0], [expout])
+
+# Send ARP reply from outside1 back to the router
+reply_mac="f00000010204"
+arp_reply=${gw_mac}${nexthop_mac}08060001080006040002${nexthop_mac}${nexthop_ip}${gw_mac}${gw_ip}
+
+as hv3 ovs-appctl netdev-dummy/receive hv3-vif1 $arp_reply
+OVS_WAIT_UNTIL([
+    test `as hv2 ovs-ofctl dump-flows br-int | grep table=66 | \
+grep actions=mod_dl_dst:f0:00:00:01:02:04 | wc -l` -eq 1
+    ])
+
+# A VLAN tagged packet with the router port (192.168.1.1) MAC as the
+# destination MAC is expected on the bridge connecting hv1 and hv2.
+expected=${foo_mac}${foo1_mac}8100000208004500001c0000000040110000${foo1_ip}${dst_ip}0035111100080000
+echo $expected > hv1-br-ex_n2.expected
+
+# Packet to expect at outside1, i.e. the nexthop (172.16.1.1) port.
+# As connection tracking is not enabled for this test, SNAT can't be done on
+# the packet. We still see foo1's address as the source IP, but the source
+# MAC (gateway MAC) and dest MAC (nexthop MAC) are properly configured.
+expected=${nexthop_mac}${gw_mac}08004500001c000000003f110100${foo1_ip}${dst_ip}0035111100080000
+echo $expected > hv3-vif1.expected
+
+reset_pcap_file() {
+    local iface=$1
+    local pcap_file=$2
+    ovs-vsctl -- set Interface $iface options:tx_pcap=dummy-tx.pcap \
+options:rxq_pcap=dummy-rx.pcap
+    rm -f ${pcap_file}*.pcap
+    ovs-vsctl -- set Interface $iface options:tx_pcap=${pcap_file}-tx.pcap \
+options:rxq_pcap=${pcap_file}-rx.pcap
+}
+
+as hv1 reset_pcap_file br-ex_n2 hv1/br-ex_n2
+as hv3 reset_pcap_file hv3-vif1 hv3/vif1
+sleep 2
+as hv1 ovs-appctl netdev-dummy/receive hv1-vif1 $packet
+sleep 2
+
+# On hv1, the packet should not go from the VLAN switch pipeline to the
+# router pipeline
+as hv1 ovs-ofctl dump-flows br-int
+
+AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=65 | grep "priority=100,reg15=0x1,metadata=0x2" \
+| grep actions=clone | grep -v n_packets=0 | wc -l], [0], [[0
+]])
+
+# On hv1, check in table 32 that no packet goes via the tunnel port
+AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=32 \
+| grep "NXM_NX_TUN_ID" | grep -v n_packets=0 | wc -l], [0], [[0
+]])
+
+ip_packet() {
+    grep "1010203f00000010203"
+}
+
+# Check for the VLAN tagged packet with foo1's MAC on the bridge connecting
+# hv1 and hv2.
+$PYTHON "$top_srcdir/utilities/ovs-pcap.in" hv1/br-ex_n2-tx.pcap | ip_packet | uniq > hv1-br-ex_n2
+cat hv1-br-ex_n2.expected > expout
+AT_CHECK([sort hv1-br-ex_n2], [0], [expout])
+
+# Check expected packet on nexthop interface
+$PYTHON "$top_srcdir/utilities/ovs-pcap.in" hv3/vif1-tx.pcap | grep ${foo1_ip}${dst_ip} | uniq > hv3-vif1
+cat hv3-vif1.expected > expout
+AT_CHECK([sort hv3-vif1], [0], [expout])
+
+OVN_CLEANUP([hv1],[hv2],[hv3])
+AT_CLEANUP
+
 AT_SETUP([ovn -- IPv6 ND Router Solicitation responder])
 AT_KEYWORDS([ovn-nd_ra])
 AT_SKIP_IF([test $HAVE_PYTHON = no])