[ovs-dev,1/2] ovn: Avoid tunneling for VLAN packets redirected to a gateway chassis

Message ID 20181005171452.31582-1-nusiddiq@redhat.com
State Superseded
Series Addressing VLAN tenant network issues and add

Commit Message

Numan Siddique Oct. 5, 2018, 5:14 p.m. UTC
From: Numan Siddique <nusiddiq@redhat.com>

An OVN deployment can have multiple logical switches each with a
localnet port connected to a distributed logical router with one
logical router port providing external connectivity (provider network)
and others used as tenant networks with VLAN tagging.

As reported in [1], external traffic from these VLAN tenant networks
is tunnelled to the gateway chassis (the chassis hosting the distributed
gateway port, which applies the NAT rules). In the discussion in [1],
Russell proposed a few possible solutions [2]. This patch implements
the first option in [2].

This patch adds a new option, 'reside-on-redirect-chassis', to the
'options' column of the Logical_Router_Port table. If this option is
set to 'true' and the logical router also has a distributed gateway
port, then routing for this logical router port is centralized on the
chassis hosting the distributed gateway port.

If a logical switch 'sw0' is connected to a router 'lr0' through the
router port 'lr0-sw0' with the address "00:00:00:00:af:12 192.168.1.1",
and 'lr0' has a distributed gateway port 'lr0-public', then the
logical flow below is added to the logical switch pipeline of 'sw0'
when the 'reside-on-redirect-chassis' option is set on 'lr0-sw0':

table=16(ls_in_l2_lkup), priority=50,
match=(eth.dst == 00:00:00:00:af:12 && is_chassis_resident("cr-lr0-public")),
action=(outport = "sw0-lr0"; output;)

With the above flow, the packet doesn't enter the router pipeline on
the source chassis. Instead, the packet is sent out via the localnet
port of 'sw0'. The gateway chassis, upon receiving this packet, runs
the logical router pipeline, applies the NAT rules, and sends the
traffic out via the localnet port of the provider network. The gateway
chassis also replies to ARP requests for the router port IPs.
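
As a rough illustration of the above (not part of the patch itself), the
option could be enabled on the example port and the resulting flow
inspected with the same generic database commands that the test added by
this patch uses. The names 'lr0-sw0' and 'sw0' are the example names from
this message, and the commands assume a running OVN deployment in which
'sw0' has a localnet port and 'lr0' has a distributed gateway port:

```shell
# Centralize routing for the router port 'lr0-sw0' on the gateway chassis.
ovn-nbctl set Logical_Router_Port lr0-sw0 options:reside-on-redirect-chassis=true

# The L2 lookup flow for the router port MAC in 'sw0' should now carry
# the is_chassis_resident() condition.
ovn-sbctl dump-flows sw0 | grep ls_in_l2_lkup | grep is_chassis_resident
```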

With this approach, we avoid redirecting the external traffic to the
gateway chassis via the tunnel port. There are a couple of drawbacks
with this approach:

  - East-West routing is no longer distributed for the VLAN tenant
    networks if the 'reside-on-redirect-chassis' option is set.

  - 'dnat_and_snat' NAT rules with the 'logical_mac' and 'logical_port'
    columns defined will not work for the VLAN tenant networks.

This approach is taken for now as it is simple. If there is a requirement
to support distributed routing for these VLAN tenant networks, we
can explore other possible solutions.

[1] - https://mail.openvswitch.org/pipermail/ovs-discuss/2018-April/046543.html
[2] - https://mail.openvswitch.org/pipermail/ovs-discuss/2018-April/046557.html

Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2018-April/046543.html
Reported-by: venkata anil <vkommadi@redhat.com>
Co-authored-by: venkata anil <vkommadi@redhat.com>
Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
Signed-off-by: venkata anil <vkommadi@redhat.com>
---
 ovn/northd/ovn-northd.8.xml |  30 ++++
 ovn/northd/ovn-northd.c     |  71 +++++++---
 ovn/ovn-architecture.7.xml  | 160 +++++++++++++++++++++
 ovn/ovn-nb.xml              |  43 ++++++
 tests/ovn.at                | 273 ++++++++++++++++++++++++++++++++++++
 5 files changed, 561 insertions(+), 16 deletions(-)

Comments

Mark Michelson Oct. 12, 2018, 7:56 p.m. UTC | #1
Hi Numan,

The patch does a good job of explaining the routing behavior and the 
tunneling problem solved within.

Prior to the patch, you can have a distributed gateway router with a 
redirect-chassis port set on it. This allows for east-west traffic to 
have an optimal direct path between hypervisors, but for the north-south 
use case, when the traffic is redirected to the redirect-chassis, the 
traffic is encapsulated.

With this patch, you add the reside-on-redirect-chassis option to router 
ports. This essentially makes all traffic destined for the router port 
get redirected to the gateway chassis prior to running the router 
pipeline. This removes the encapsulation issue, but it also means that 
east-west traffic is now also centralized.

I'm curious what the current behavior is when you specify a gateway 
router by setting options:chassis. Specifically, I'm curious about how 
it compares if you define a router where the "external" port has 
options:redirect-chassis set on it and all other ports have 
options:reside-on-redirect-chassis set on them. Have you essentially 
just created the same thing? Or is there some subtle difference?
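
For concreteness, the two setups being compared could be configured
roughly as follows. This is only a sketch: the router, port, and chassis
names ('lr0', 'lr0-public', 'lr0-sw0', 'hv2') are assumed for
illustration, and the two setups are alternatives, not meant to be
applied to the same router.

```shell
# Setup A: gateway router, the whole router is bound to one chassis.
ovn-nbctl set Logical_Router lr0 options:chassis=hv2

# Setup B: distributed router with a distributed gateway port on the
# "external" port and every other port centralized on the same chassis.
ovn-nbctl set Logical_Router_Port lr0-public options:redirect-chassis=hv2
ovn-nbctl set Logical_Router_Port lr0-sw0 options:reside-on-redirect-chassis=true
```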

On 10/05/2018 01:14 PM, nusiddiq@redhat.com wrote:
> diff --git a/ovn/northd/ovn-northd.8.xml b/ovn/northd/ovn-northd.8.xml
> index 7352c6764..f52699bd3 100644
> --- a/ovn/northd/ovn-northd.8.xml
> +++ b/ovn/northd/ovn-northd.8.xml
> @@ -874,6 +874,25 @@ output;
>               resident.
>             </li>
>           </ul>
> +
> +        <p>
> +          For the Ethernet address on a logical switch port of type
> +          <code>router</code>, when that logical switch port's
> +          <ref column="addresses" table="Logical_Switch_Port"
> +          db="OVN_Northbound"/> column is set to <code>router</code> and
> +          the connected logical router port specifies a
> +          <code>reside-on-redirect-chassis</code> and the logical router
> +          to which the connected logical router port belongs has a
> +          <code>redirect-chassis</code> distributed gateway logical router
> +          port:
> +        </p>
> +
> +        <ul>
> +          <li>
> +            The flow for the connected logical router port's Ethernet
> +            address is only programmed on the <code>redirect-chassis</code>.
> +          </li>
> +        </ul>
>         </li>
>   
>         <li>
> @@ -1179,6 +1198,17 @@ output;
>             upstream MAC learning to point to the
>             <code>redirect-chassis</code>.
>           </p>
> +
> +        <p>
> +          For the logical router port with the option
> +          <code>reside-on-redirect-chassis</code> set (which is centralized),
> +          the above flows are only programmed on the gateway port instance on
> +          the <code>redirect-chassis</code> (if the logical router has a
> +          distributed gateway port). This behavior avoids generation
> +          of multiple ARP responses from different chassis, and allows
> +          upstream MAC learning to point to the
> +          <code>redirect-chassis</code>.
> +        </p>
>         </li>
>   
>         <li>
> diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
> index 31ea5f410..3998a898c 100644
> --- a/ovn/northd/ovn-northd.c
> +++ b/ovn/northd/ovn-northd.c
> @@ -4426,13 +4426,32 @@ build_lswitch_flows(struct hmap *datapaths, struct hmap *ports,
>                   ds_put_format(&match, "eth.dst == "ETH_ADDR_FMT,
>                                 ETH_ADDR_ARGS(mac));
>                   if (op->peer->od->l3dgw_port
> -                    && op->peer == op->peer->od->l3dgw_port
> -                    && op->peer->od->l3redirect_port) {
> -                    /* The destination lookup flow for the router's
> -                     * distributed gateway port MAC address should only be
> -                     * programmed on the "redirect-chassis". */
> -                    ds_put_format(&match, " && is_chassis_resident(%s)",
> -                                  op->peer->od->l3redirect_port->json_key);
> +                    && op->peer->od->l3redirect_port
> +                    && op->od->localnet_port) {
> +                    bool add_chassis_resident_check = false;
> +                    if (op->peer == op->peer->od->l3dgw_port) {
> +                        /* The peer of this port represents a distributed
> +                         * gateway port. The destination lookup flow for the
> +                         * router's distributed gateway port MAC address should
> +                         * only be programmed on the "redirect-chassis". */
> +                        add_chassis_resident_check = true;
> +                    } else {
> +                        /* Check if the option 'reside-on-redirect-chassis'
> +                         * is set to true on the peer port. If set to true
> +                         * and if the logical switch has a localnet port, it
> +                         * means the router pipeline for the packets from
> +                         * this logical switch should be run on the chassis
> +                         * hosting the gateway port.
> +                         */
> +                        add_chassis_resident_check = smap_get_bool(
> +                            &op->peer->nbrp->options,
> +                            "reside-on-redirect-chassis", false);
> +                    }
> +
> +                    if (add_chassis_resident_check) {
> +                        ds_put_format(&match, " && is_chassis_resident(%s)",
> +                                      op->peer->od->l3redirect_port->json_key);
> +                    }
>                   }
>   
>                   ds_clear(&actions);
> @@ -5197,15 +5216,35 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
>                             op->lrp_networks.ipv4_addrs[i].network_s,
>                             op->lrp_networks.ipv4_addrs[i].plen,
>                             op->lrp_networks.ipv4_addrs[i].addr_s);
> -            if (op->od->l3dgw_port && op == op->od->l3dgw_port
> -                && op->od->l3redirect_port) {
> -                /* Traffic with eth.src = l3dgw_port->lrp_networks.ea_s
> -                 * should only be sent from the "redirect-chassis", so that
> -                 * upstream MAC learning points to the "redirect-chassis".
> -                 * Also need to avoid generation of multiple ARP responses
> -                 * from different chassis. */
> -                ds_put_format(&match, " && is_chassis_resident(%s)",
> -                              op->od->l3redirect_port->json_key);
> +
> +            if (op->od->l3dgw_port && op->od->l3redirect_port && op->peer
> +                && op->peer->od->localnet_port) {
> +                bool add_chassis_resident_check = false;
> +                if (op == op->od->l3dgw_port) {
> +                    /* Traffic with eth.src = l3dgw_port->lrp_networks.ea_s
> +                     * should only be sent from the "redirect-chassis", so that
> +                     * upstream MAC learning points to the "redirect-chassis".
> +                     * Also need to avoid generation of multiple ARP responses
> +                     * from different chassis. */
> +                    add_chassis_resident_check = true;
> +                } else {
> +                    /* Check if the option 'reside-on-redirect-chassis'
> +                     * is set to true on the router port. If set to true
> +                     * and if peer's logical switch has a localnet port, it
> +                     * means the router pipeline for the packets from
> +                     * peer's logical switch should be run on the chassis
> +                     * hosting the gateway port and it should reply to the
> +                     * ARP requests for the router port IPs.
> +                     */
> +                    add_chassis_resident_check = smap_get_bool(
> +                        &op->nbrp->options,
> +                        "reside-on-redirect-chassis", false);
> +                }
> +
> +                if (add_chassis_resident_check) {
> +                    ds_put_format(&match, " && is_chassis_resident(%s)",
> +                                  op->od->l3redirect_port->json_key);
> +                }
>               }
>   
>               ds_clear(&actions);
> diff --git a/ovn/ovn-architecture.7.xml b/ovn/ovn-architecture.7.xml
> index 6ed2cf132..998470c34 100644
> --- a/ovn/ovn-architecture.7.xml
> +++ b/ovn/ovn-architecture.7.xml
> @@ -1372,6 +1372,166 @@
>       http://docs.openvswitch.org/en/latest/topics/high-availability.
>     </p>
>   
> +  <h2>Tenant VLAN networks connected to a Logical Router</h2>
> +
> +  <p>
> +    It is possible to have multiple logical switches each with a localnet port
> +    (representing physical networks) connected to a logical router in which one
> +    may provide the external connectivity via a distributed gateway port and
> +    the rest of them are used internally (with VLAN tagged). It is expected
> +    that <code>ovn-bridge-mappings</code> is configured appropriately on the
> +    chassis.
> +  </p>
> +
> +  <h3>East West routing</h3>
> +  <p>
> +    East-West routing between these tenant VLAN logical switches works almost
> +    the same way as normal logical switches. When the VM sends such a packet,
> +    then:
> +  </p>
> +  <ol>
> +    <li>
> +      The packet enters the ingress pipeline of the logical router datapath
> +      via the logical router port in the source chassis.
> +    </li>
> +
> +    <li>
> +      Routing decision is taken.
> +    </li>
> +
> +    <li>
> +      The packet goes out of the integration bridge to the provider bridge (
> +      belonging to the destination logical switch) via the localnet port.
> +    </li>
> +
> +    <li>
> +      The destination chassis receives the packet via the localnet port
> +      and delivers to the destination VM.
> +    </li>
> +  </ol>
> +
> +  <h3>External traffic</h3>
> +
> +  <p>
> +    The following happens when a VM sends external traffic (which requires
> +    NATting) and the chassis hosting the VM doesn't have a distributed gateway
> +    port.
> +  </p>
> +
> +  <ol>
> +    <li>
> +      The packet enters the ingress pipeline of the logical router datapath
> +      via the logical router port in the source chassis.
> +    </li>
> +
> +    <li>
> +      Routing decision is taken. Since the gateway router or the distributed
> +      gateway port doesn't reside in the source chassis, the traffic is
> +      redirected to the gateway chassis via the tunnel port.
> +    </li>
> +
> +    <li>
> +      The gateway chassis receives the packet, applies the NAT rules and
> +      forwards it via the localnet port.
> +    </li>
> +  </ol>
> +
> +  <p>
> +    Although this works, the VM traffic is tunnelled. In order for it to
> +    work properly, the MTU of the VLAN tenant networks must be lowered to
> +    account for the tunnel encapsulation.
> +  </p>
> +
> +  <h2>Centralized routing for VLAN tenant networks</h2>
> +
> +  <p>
> +    To overcome the tunnel encapsulation problem described in the previous
> +    section, <code>OVN</code> supports the option of enabling centralized
> +    routing for VLAN tenant networks. CMS can configure the option
> +    <ref column="options:reside-on-redirect-chassis"
> +    table="Logical_Router_Port" db="OVN_NB"/> to <code>true</code> for each
> +    <ref table="Logical_Router_Port" db="OVN_NB"/> which connects to the
> +    logical switch of the VLAN tenant network. This causes the gateway
> +    chassis (hosting the distributed gateway port) to handle all the
> +    routing for these networks, making it centralized. It will reply to
> +    the ARP requests for the logical router port IPs.
> +  </p>
> +
> +  <p>
> +    If the logical router doesn't have a distributed gateway port connecting
> +    to the provider network, then this option is ignored by <code>OVN</code>.
> +  </p>
> +
> +  <p>
> +    The following happens when a VM sends east-west traffic which needs to
> +    be routed:
> +  </p>
> +
> +  <ol>
> +    <li>
> +      The packet from the VM enters the logical datapath pipeline of the source
> +      VLAN network in the source chassis and is sent out via the localnet port
> +      (instead of sending it to router pipeline).
> +    </li>
> +
> +    <li>
> +      The packet enters the logical datapath pipeline of the source VLAN
> +      network in the gateway chassis and is sent to the logical datapath
> +      pipeline belonging to the logical router.
> +    </li>
> +
> +    <li>
> +      Routing decision is taken.
> +    </li>
> +
> +    <li>
> +      The packet enters the logical datapath pipeline of the destination
> +      VLAN network. The packet is delivered to the destination VM if it resides
> +      in the same chassis. Otherwise the packet is sent out via the localnet
> +      port of the destination VLAN network.
> +    </li>
> +
> +    <li>
> +      The destination chassis receives the packet via the localnet port
> +      and delivers to the destination VM.
> +    </li>
> +  </ol>
> +
> +  <p>
> +    The following happens when a VM sends external traffic which requires
> +    NATting:
> +  </p>
> +
> +  <ol>
> +    <li>
> +      The packet from the VM enters the logical datapath pipeline of the source
> +      VLAN network in the source chassis and is sent out via the localnet port
> +      (instead of sending it to router pipeline).
> +    </li>
> +
> +    <li>
> +      The packet enters the logical datapath pipeline of the source VLAN
> +      network in the gateway chassis and is sent to the logical datapath
> +      pipeline belonging to the logical router.
> +    </li>
> +
> +    <li>
> +      Routing decision is taken and NAT rules are applied.
> +    </li>
> +
> +    <li>
> +      The packet enters the logical datapath pipeline of the provider network
> +      and is sent out via the localnet port of the provider network.
> +    </li>
> +  </ol>
> +
> +  <p>
> +    For the reverse external traffic, the gateway chassis applies the unNATting
> +    rules and sends the packet via the localnet port of the VLAN tenant
> +    network and the destination chassis receives the packet and delivers to
> +    the VM.
> +  </p>
> +
>     <h2>Life Cycle of a VTEP gateway</h2>
>   
>     <p>
> diff --git a/ovn/ovn-nb.xml b/ovn/ovn-nb.xml
> index 8564ed39c..13ae56e13 100644
> --- a/ovn/ovn-nb.xml
> +++ b/ovn/ovn-nb.xml
> @@ -1635,6 +1635,49 @@
>             chassis to enable high availability.
>           </p>
>         </column>
> +
> +      <column name="options" key="reside-on-redirect-chassis">
> +        <p>
> +          Generally routing is distributed in <code>OVN</code>. The packet
> +          from a logical port which needs to be routed hits the router pipeline
> +          in the source chassis. For the East-West traffic, the packet is
> +          sent directly to the destination chassis. For the outside traffic
> +          the packet is sent to the gateway chassis.
> +        </p>
> +
> +        <p>
> +          When this option is set, <code>OVN</code> considers this only if
> +        </p>
> +
> +        <ul>
> +          <li>
> +            The logical router to which this logical router port belongs
> +            has a distributed gateway port.
> +          </li>
> +
> +          <li>
> +            The peer's logical switch has a localnet port (representing
> +            a tenant VLAN network)
> +          </li>
> +        </ul>
> +
> +        <p>
> +          When this option is set to <code>true</code>, then the packet
> +          which needs to be routed hits the router pipeline in the chassis
> +          hosting the distributed gateway router port. The source chassis
> +          pushes out this traffic via the localnet port. With this the
> +          East-West traffic is no longer distributed and will always go through
> +          the gateway chassis.
> +        </p>
> +
> +        <p>
> +          Without this option set, for any traffic destined to outside from a
> +          logical port which belongs to a logical switch with localnet port,
> +          the source chassis will send the traffic to the gateway chassis via
> +          the tunnel port instead of the localnet port and this could cause MTU
> +          issues.
> +        </p>
> +      </column>
>       </group>
>   
>       <group title="Attachment">
> diff --git a/tests/ovn.at b/tests/ovn.at
> index 769e09f81..504ba228d 100644
> --- a/tests/ovn.at
> +++ b/tests/ovn.at
> @@ -8537,6 +8537,279 @@ OVN_CLEANUP([hv1],[hv2],[hv3])
>   
>   AT_CLEANUP
>   
> +# VLAN traffic for external network redirected through distributed router
> +# gateway port should use vlans(i.e input network vlan tag) across hypervisors
> +# instead of tunneling.
> +AT_SETUP([ovn -- vlan traffic for external network with distributed router gateway port])
> +AT_SKIP_IF([test $HAVE_PYTHON = no])
> +ovn_start
> +
> +# Logical network:
> +# # One LR R1 that has switches foo (192.168.1.0/24) and
> +# # alice (172.16.1.0/24) connected to it.  The logical port
> +# # between R1 and alice has a "redirect-chassis" specified,
> +# # i.e. it is the distributed router gateway port(172.16.1.6).
> +# # Switch alice also has a localnet port defined.
> +# # An additional switch outside has the same subnet as alice
> +# # (172.16.1.0/24), a localnet port and nexthop port(172.16.1.1)
> +# # which will receive the packet destined for external network
> +# # (i.e 8.8.8.8 as destination ip).
> +
> +# Physical network:
> +# # Three hypervisors hv[123].
> +# # hv1 hosts vif foo1.
> +# # hv2 is the "redirect-chassis" that hosts the distributed router gateway port.
> +# # hv3 hosts nexthop port vif outside1.
> +# # All other tests connect hypervisors to network n1 through br-phys for tunneling.
> +# # But in this test, hv1 won't connect to n1(and no br-phys in hv1), and
> +# # in order to show vlans(instead of tunneling) used between hv1 and hv2,
> +# # a new network n2 created and hv1 and hv2 connected to this network through br-ex.
> +# # hv2 and hv3 are still connected to n1 network through br-phys.
> +net_add n1
> +
> +# We are not calling ovn_attach for hv1, to avoid adding br-phys.
> +# Tunneling won't work in hv1 as ovn-encap-ip is not added to any bridge in hv1
> +sim_add hv1
> +as hv1
> +ovs-vsctl \
> +    -- set Open_vSwitch . external-ids:system-id=hv1 \
> +    -- set Open_vSwitch . external-ids:ovn-remote=unix:$ovs_base/ovn-sb/ovn-sb.sock \
> +    -- set Open_vSwitch . external-ids:ovn-encap-type=geneve,vxlan \
> +    -- set Open_vSwitch . external-ids:ovn-encap-ip=192.168.0.1 \
> +    -- add-br br-int \
> +    -- set bridge br-int fail-mode=secure other-config:disable-in-band=true \
> +    -- set Open_vSwitch . external-ids:ovn-bridge-mappings=public:br-ex
> +
> +start_daemon ovn-controller
> +ovs-vsctl -- add-port br-int hv1-vif1 -- \
> +    set interface hv1-vif1 external-ids:iface-id=foo1 \
> +    ofport-request=1
> +
> +sim_add hv2
> +as hv2
> +ovs-vsctl add-br br-phys
> +ovn_attach n1 br-phys 192.168.0.2
> +ovs-vsctl set Open_vSwitch . external-ids:ovn-bridge-mappings="public:br-ex,phys:br-phys"
> +
> +sim_add hv3
> +as hv3
> +ovs-vsctl add-br br-phys
> +ovn_attach n1 br-phys 192.168.0.3
> +ovs-vsctl -- add-port br-int hv3-vif1 -- \
> +    set interface hv3-vif1 external-ids:iface-id=outside1 \
> +    options:tx_pcap=hv3/vif1-tx.pcap \
> +    options:rxq_pcap=hv3/vif1-rx.pcap \
> +    ofport-request=1
> +ovs-vsctl set Open_vSwitch . external-ids:ovn-bridge-mappings="phys:br-phys"
> +
> +# Create network n2 for vlan connectivity between hv1 and hv2
> +net_add n2
> +
> +as hv1
> +ovs-vsctl add-br br-ex
> +net_attach n2 br-ex
> +
> +as hv2
> +ovs-vsctl add-br br-ex
> +net_attach n2 br-ex
> +
> +OVN_POPULATE_ARP
> +
> +ovn-nbctl create Logical_Router name=R1
> +
> +ovn-nbctl ls-add foo
> +ovn-nbctl ls-add alice
> +ovn-nbctl ls-add outside
> +
> +# Connect foo to R1
> +ovn-nbctl lrp-add R1 foo 00:00:01:01:02:03 192.168.1.1/24
> +ovn-nbctl lsp-add foo rp-foo -- set Logical_Switch_Port rp-foo \
> +    type=router options:router-port=foo \
> +    -- lsp-set-addresses rp-foo router
> +
> +# Connect alice to R1 as distributed router gateway port (172.16.1.6) on hv2
> +ovn-nbctl lrp-add R1 alice 00:00:02:01:02:03 172.16.1.6/24 \
> +    -- set Logical_Router_Port alice options:redirect-chassis="hv2"
> +ovn-nbctl lsp-add alice rp-alice -- set Logical_Switch_Port rp-alice \
> +    type=router options:router-port=alice \
> +    -- lsp-set-addresses rp-alice router
> +
> +# Create logical port foo1 in foo
> +ovn-nbctl lsp-add foo foo1 \
> +-- lsp-set-addresses foo1 "f0:00:00:01:02:03 192.168.1.2"
> +
> +# Create logical port outside1 in outside, which is a nexthop address
> +# for 172.16.1.0/24
> +ovn-nbctl lsp-add outside outside1 \
> +-- lsp-set-addresses outside1 "f0:00:00:01:02:04 172.16.1.1"
> +
> +# Set default gateway (nexthop) to 172.16.1.1
> +ovn-nbctl lr-route-add R1 "0.0.0.0/0" 172.16.1.1 alice
> +AT_CHECK([ovn-nbctl lr-nat-add R1 snat 172.16.1.6 192.168.1.1/24])
> +ovn-nbctl set Logical_Switch_Port rp-alice options:nat-addresses=router
> +
> +ovn-nbctl lsp-add foo ln-foo
> +ovn-nbctl lsp-set-addresses ln-foo unknown
> +ovn-nbctl lsp-set-options ln-foo network_name=public
> +ovn-nbctl lsp-set-type ln-foo localnet
> +AT_CHECK([ovn-nbctl set Logical_Switch_Port ln-foo tag=2])
> +
> +# Create localnet port in alice
> +ovn-nbctl lsp-add alice ln-alice
> +ovn-nbctl lsp-set-addresses ln-alice unknown
> +ovn-nbctl lsp-set-type ln-alice localnet
> +ovn-nbctl lsp-set-options ln-alice network_name=phys
> +
> +# Create localnet port in outside
> +ovn-nbctl lsp-add outside ln-outside
> +ovn-nbctl lsp-set-addresses ln-outside unknown
> +ovn-nbctl lsp-set-type ln-outside localnet
> +ovn-nbctl lsp-set-options ln-outside network_name=phys
> +
> +# Allow some time for ovn-northd and ovn-controller to catch up.
> +# XXX This should be more systematic.
> +ovn-nbctl --wait=hv --timeout=3 sync
> +
> +# Check that there is a logical flow in logical switch foo's pipeline
> +# to set the outport to rp-foo (which is expected).
> +OVS_WAIT_UNTIL([test 1 = `ovn-sbctl dump-flows foo | grep ls_in_l2_lkup | \
> +grep rp-foo | grep -v is_chassis_resident | wc -l`])
> +
> +# Set the option 'reside-on-redirect-chassis' for foo
> +ovn-nbctl set logical_router_port foo options:reside-on-redirect-chassis=true
> +# Check that there is a logical flow in logical switch foo's pipeline
> +# to set the outport to rp-foo with the condition is_chassis_resident.
> +ovn-sbctl dump-flows foo
> +OVS_WAIT_UNTIL([test 1 = `ovn-sbctl dump-flows foo | grep ls_in_l2_lkup | \
> +grep rp-foo | grep is_chassis_resident | wc -l`])
> +
> +echo "---------NB dump-----"
> +ovn-nbctl show
> +echo "---------------------"
> +ovn-nbctl list logical_router
> +echo "---------------------"
> +ovn-nbctl list nat
> +echo "---------------------"
> +ovn-nbctl list logical_router_port
> +echo "---------------------"
> +
> +echo "---------SB dump-----"
> +ovn-sbctl list datapath_binding
> +echo "---------------------"
> +ovn-sbctl list port_binding
> +echo "---------------------"
> +ovn-sbctl dump-flows
> +echo "---------------------"
> +ovn-sbctl list chassis
> +echo "---------------------"
> +
> +for chassis in hv1 hv2 hv3; do
> +    as $chassis
> +    echo "------ $chassis dump ----------"
> +    ovs-vsctl show br-int
> +    ovs-ofctl show br-int
> +    ovs-ofctl dump-flows br-int
> +    echo "--------------------------"
> +done
> +
> +ip_to_hex() {
> +    printf "%02x%02x%02x%02x" "$@"
> +}
> +
> +foo1_ip=$(ip_to_hex 192 168 1 2)
> +gw_ip=$(ip_to_hex 172 16 1 6)
> +dst_ip=$(ip_to_hex 8 8 8 8)
> +nexthop_ip=$(ip_to_hex 172 16 1 1)
> +
> +foo1_mac="f00000010203"
> +foo_mac="000001010203"
> +gw_mac="000002010203"
> +nexthop_mac="f00000010204"
> +
> +# Send ip packet from foo1 to 8.8.8.8
> +src_mac="f00000010203"
> +dst_mac="000001010203"
> +packet=${foo_mac}${foo1_mac}08004500001c0000000040110000${foo1_ip}${dst_ip}0035111100080000
> +
> +as hv1 ovs-appctl netdev-dummy/receive hv1-vif1 $packet
> +sleep 2
> +
> +# ARP request packet for nexthop_ip to expect at outside1
> +arp_request=ffffffffffff${gw_mac}08060001080006040001${gw_mac}${gw_ip}000000000000${nexthop_ip}
> +echo $arp_request >> hv3-vif1.expected
> +cat hv3-vif1.expected > expout
> +$PYTHON "$top_srcdir/utilities/ovs-pcap.in" hv3/vif1-tx.pcap | grep ${nexthop_ip} | uniq > hv3-vif1
> +AT_CHECK([sort hv3-vif1], [0], [expout])
> +
> +# Send ARP reply from outside1 back to the router
> +reply_mac="f00000010204"
> +arp_reply=${gw_mac}${nexthop_mac}08060001080006040002${nexthop_mac}${nexthop_ip}${gw_mac}${gw_ip}
> +
> +as hv3 ovs-appctl netdev-dummy/receive hv3-vif1 $arp_reply
> +OVS_WAIT_UNTIL([
> +    test `as hv2 ovs-ofctl dump-flows br-int | grep table=66 | \
> +grep actions=mod_dl_dst:f0:00:00:01:02:04 | wc -l` -eq 1
> +    ])
> +
> +# VLAN tagged packet with router port(192.168.1.1) MAC as destination MAC
> +# is expected on bridge connecting hv1 and hv2
> +expected=${foo_mac}${foo1_mac}8100000208004500001c0000000040110000${foo1_ip}${dst_ip}0035111100080000
> +echo $expected > hv1-br-ex_n2.expected
> +
> +# Packet to Expect at outside1 i.e nexthop(172.16.1.1) port.
> +# As connection tracking not enabled for this test, snat can't be done on the packet.
> +# We still see foo1 as the source ip address. But source mac(gateway MAC) and
> +# dest mac(nexthop mac) are properly configured.
> +expected=${nexthop_mac}${gw_mac}08004500001c000000003f110100${foo1_ip}${dst_ip}0035111100080000
> +echo $expected > hv3-vif1.expected
> +
> +reset_pcap_file() {
> +    local iface=$1
> +    local pcap_file=$2
> +    ovs-vsctl -- set Interface $iface options:tx_pcap=dummy-tx.pcap \
> +options:rxq_pcap=dummy-rx.pcap
> +    rm -f ${pcap_file}*.pcap
> +    ovs-vsctl -- set Interface $iface options:tx_pcap=${pcap_file}-tx.pcap \
> +options:rxq_pcap=${pcap_file}-rx.pcap
> +}
> +
> +as hv1 reset_pcap_file br-ex_n2 hv1/br-ex_n2
> +as hv3 reset_pcap_file hv3-vif1 hv3/vif1
> +sleep 2
> +as hv1 ovs-appctl netdev-dummy/receive hv1-vif1 $packet
> +sleep 2
> +
> +# On hv1, the packet should not go from vlan switch pipeline to router
> +# pipeline
> +as hv1 ovs-ofctl dump-flows br-int
> +
> +AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=65 | grep "priority=100,reg15=0x1,metadata=0x2" \
> +| grep actions=clone | grep -v n_packets=0 | wc -l], [0], [[0
> +]])
> +
> +# On hv1, table 32 check that no packet goes via the tunnel port
> +AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=32 \
> +| grep "NXM_NX_TUN_ID" | grep -v n_packets=0 | wc -l], [0], [[0
> +]])
> +
> +ip_packet() {
> +    grep "1010203f00000010203"
> +}
> +
> +# Check vlan tagged packet on the bridge connecting hv1 and hv2 with the
> +# foo1's mac.
> +$PYTHON "$top_srcdir/utilities/ovs-pcap.in" hv1/br-ex_n2-tx.pcap | ip_packet | uniq > hv1-br-ex_n2
> +cat hv1-br-ex_n2.expected > expout
> +AT_CHECK([sort hv1-br-ex_n2], [0], [expout])
> +
> +# Check expected packet on nexthop interface
> +$PYTHON "$top_srcdir/utilities/ovs-pcap.in" hv3/vif1-tx.pcap | grep ${foo1_ip}${dst_ip} | uniq > hv3-vif1
> +cat hv3-vif1.expected > expout
> +AT_CHECK([sort hv3-vif1], [0], [expout])
> +
> +OVN_CLEANUP([hv1],[hv2],[hv3])
> +AT_CLEANUP
> +
>   AT_SETUP([ovn -- IPv6 ND Router Solicitation responder])
>   AT_KEYWORDS([ovn-nd_ra])
>   AT_SKIP_IF([test $HAVE_PYTHON = no])
>
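
For readers following the test above, the hex strings it compares against can be rebuilt by hand. The sketch below reuses the test's own ip_to_hex helper and its MAC/IP values to assemble the 802.1Q-tagged frame expected on br-ex between hv1 and hv2. The helper variable names (vlan_tag, ethertype, ip_hdr, udp) are illustrative only and do not appear in the patch.

```shell
#!/bin/sh
# Rebuild the VLAN-tagged frame the test expects on br-ex (hv1 -> hv2).
# Values are copied from the test; the split into named pieces is illustrative.

ip_to_hex() {
    printf "%02x%02x%02x%02x" "$@"
}

foo1_ip=$(ip_to_hex 192 168 1 2)    # source VM address
dst_ip=$(ip_to_hex 8 8 8 8)         # external destination

foo1_mac="f00000010203"             # VM MAC (Ethernet source)
foo_mac="000001010203"              # router port MAC (Ethernet destination)

vlan_tag="81000002"                 # TPID 0x8100 followed by VLAN ID 2 (ln-foo tag=2)
ethertype="0800"                    # IPv4
ip_hdr="4500001c0000000040110000"   # IPv4 header bytes before the addresses, as in the test
udp="0035111100080000"              # 8-byte UDP header, as in the test

expected=${foo_mac}${foo1_mac}${vlan_tag}${ethertype}${ip_hdr}${foo1_ip}${dst_ip}${udp}
echo "$expected"
```

The only difference from the untagged packet injected at hv1-vif1 is the four-byte 802.1Q tag inserted between the source MAC and the IPv4 ethertype.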
Numan Siddique Oct. 15, 2018, 9:38 a.m. UTC | #2
On Sat, Oct 13, 2018 at 1:26 AM Mark Michelson <mmichels@redhat.com> wrote:

> Hi Numan,
>
> The patch does a good job of explaining the routing behavior and the
> tunneling problem solved within.
>
> Prior to the patch, you can have a distributed gateway router with a
> redirect-chassis port set on it. This allows for east-west traffic to
> have an optimal direct path between hypervisors, but for the north-south
> use case, when the traffic is redirected to the redirect-chassis, the
> traffic is encapsulated.
>
> With this patch, you add the reside-on-redirect-chassis option to router
> ports. This essentially makes all traffic destined for the router port
> get redirected to the gateway chassis prior to running the router
> pipeline. This removes the encapsulation issue, but it also means that
> east-west traffic is now also centralized.
>

That's right. That's the trade-off to solve this issue for VLAN tenant
networks.


>
> I'm curious what the current behavior is when you specify a gateway
> router by setting options:chassis. Specifically, I'm curious about how
> it compares if you define a router where the "external" port has
> options:redirect-chassis set on it and all other ports have
> options:reside-on-redirect-chassis set on them. Have you essentially
> just created the same thing? Or is there some subtle difference?
>

There is a difference. In the gateway router scenario (options:chassis set),
the tenant VLAN logical switches are expected to be connected to a normal
router, and this normal router is connected to the gateway router via a
transit switch. So the East-West traffic stays distributed. For the
North/South traffic, the packet on the source chassis enters the logical
switch pipeline -> normal router pipeline -> transit switch pipeline, and is
then sent to the chassis hosting the gateway router via the tunnel port. On
the gateway chassis, the packet enters the transit switch pipeline ->
gateway router pipeline -> provider network pipeline.

Thanks
Numan
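
As a rough illustration of the two configurations being compared, the sketch below prints the ovn-nbctl commands for each. The names R1, foo, alice and hv2 are taken from the patch's test; the gateway router name GW and the DRY_RUN wrapper are assumptions made for this example, and the transit switch and normal router of the gateway router case are left out for brevity.

```shell
#!/bin/sh
# DRY_RUN=1 (the default) only prints the commands; set DRY_RUN=0 on a host
# where ovn-nbctl can reach a northbound database that already has R1,
# foo and alice configured as in the patch's test.
DRY_RUN=${DRY_RUN:-1}

nbctl() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "ovn-nbctl $*"
    else
        ovn-nbctl "$@"
    fi
}

# Case 1 (this patch): distributed router R1 with a distributed gateway port
# on hv2, and the tenant VLAN router port 'foo' centralized on that chassis.
centralized_tenant_port() {
    nbctl set Logical_Router_Port alice options:redirect-chassis=hv2
    nbctl set Logical_Router_Port foo options:reside-on-redirect-chassis=true
}

# Case 2: a gateway router (hypothetical name GW) bound to a single chassis.
# The tenant switches would attach to a separate normal router instead.
gateway_router() {
    nbctl create Logical_Router name=GW options:chassis=hv2
}

centralized_tenant_port
gateway_router
```

In case 1 the option sits on each tenant-facing router port, so only those networks lose distributed East-West routing; in case 2 the binding is on the router itself.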






> On 10/05/2018 01:14 PM, nusiddiq@redhat.com wrote:
> > From: Numan Siddique <nusiddiq@redhat.com>
> >
> > An OVN deployment can have multiple logical switches each with a
> > localnet port connected to a distributed logical router with one
> > logical router port providing external connectivity (provider network)
> > and others used as tenant networks with VLAN tagging.
> >
> > As reported in [1], external traffic from these VLAN tenant networks
> > are tunnelled to the gateway chassis (chassis hosting a distributed
> > gateway port which applies NAT rules). As part of the discussion in
> > [1], there were few possible solutions proposed by Russell [2]. This
> > patch implements the first option in [2].
> >
> > With this patch, a new option 'reside-on-redirect-chassis' in 'options'
> > column of Logical_Router_Port table is added. If the value of this
> > option is set to 'true' and if the logical router also have a
> > distributed gateway port, then routing for this logical router port
> > is centralized in the chassis hosting the distributed gateway port.
> >
> > If a logical switch 'sw0' is connected to a router 'lr0' with the
> > router port - 'lr0-sw0' with the address - "00:00:00:00:af:12 192.168.1.1",
> > and it has a distributed logical port - 'lr0-public', then the
> > below logical flow is added in the logical switch pipeline
> > of 'sw0' if the 'reside-on-redirect-chassis' option is set on 'lr0-sw0' -
> >
> > table=16(ls_in_l2_lkup), priority=50,
> > match=(eth.dst == 00:00:00:00:af:12 && is_chassis_resident("cr-lr0-public")),
> > action=(outport = "sw0-lr0"; output;)
> >
> > With the above flow, the packet doesn't enter the router pipeline in
> > the source chassis. Instead the packet is sent out via the localnet
> > port of 'sw0'. The gateway chassis upon receiving this packet, runs
> > the logical router pipeline applying NAT rules and sends the traffic
> > out via the localnet port of the provider network. The gateway
> > chassis will also reply to the ARP requests for the router port IPs.
> >
> > With this approach, we avoid redirecting the external traffic to the
> > gateway chassis via the tunnel port. There are a couple of drawbacks
> > with this approach:
> >
> >    - East - West routing is no more distributed for the VLAN tenant
> >      networks if 'reside-on-redirect-chassis' option is defined
> >
> >    - 'dnat_and_snat' NAT rules with 'logical_mac' and 'logical_port'
> >      columns defined will not work for the VLAN tenant networks.
> >
> > This approach is taken for now as it is simple. If there is a requirement
> > to support distributed routing for these VLAN tenant networks, we
> > can explore other possible solutions.
> >
> > [1] - https://mail.openvswitch.org/pipermail/ovs-discuss/2018-April/046543.html
> > [2] - https://mail.openvswitch.org/pipermail/ovs-discuss/2018-April/046557.html
> >
> > Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2018-April/046543.html
> > Reported-by: venkata anil <vkommadi@redhat.com>
> > Co-authored-by: venkata anil <vkommadi@redhat.com>
> > Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
> > Signed-off-by: venkata anil <vkommadi@redhat.com>
> > ---
> >   ovn/northd/ovn-northd.8.xml |  30 ++++
> >   ovn/northd/ovn-northd.c     |  71 +++++++---
> >   ovn/ovn-architecture.7.xml  | 160 +++++++++++++++++++++
> >   ovn/ovn-nb.xml              |  43 ++++++
> >   tests/ovn.at                | 273 ++++++++++++++++++++++++++++++++++++
> >   5 files changed, 561 insertions(+), 16 deletions(-)
> >
> > diff --git a/ovn/northd/ovn-northd.8.xml b/ovn/northd/ovn-northd.8.xml
> > index 7352c6764..f52699bd3 100644
> > --- a/ovn/northd/ovn-northd.8.xml
> > +++ b/ovn/northd/ovn-northd.8.xml
> > @@ -874,6 +874,25 @@ output;
> >               resident.
> >             </li>
> >           </ul>
> > +
> > +        <p>
> > +          For the Ethernet address on a logical switch port of type
> > +          <code>router</code>, when that logical switch port's
> > +          <ref column="addresses" table="Logical_Switch_Port"
> > +          db="OVN_Northbound"/> column is set to <code>router</code> and
> > +          the connected logical router port specifies a
> > +          <code>reside-on-redirect-chassis</code> and the logical router
> > +          to which the connected logical router port belongs to has a
> > +          <code>redirect-chassis</code> distributed gateway logical router
> > +          port:
> > +        </p>
> > +
> > +        <ul>
> > +          <li>
> > +            The flow for the connected logical router port's Ethernet
> > +            address is only programmed on the <code>redirect-chassis</code>.
> > +          </li>
> > +        </ul>
> >         </li>
> >
> >         <li>
> > @@ -1179,6 +1198,17 @@ output;
> >             upstream MAC learning to point to the
> >             <code>redirect-chassis</code>.
> >           </p>
> > +
> > +        <p>
> > +          For the logical router port with the option
> > +          <code>reside-on-redirect-chassis</code> set (which is centralized),
> > +          the above flows are only programmed on the gateway port instance on
> > +          the <code>redirect-chassis</code> (if the logical router has a
> > +          distributed gateway port). This behavior avoids generation
> > +          of multiple ARP responses from different chassis, and allows
> > +          upstream MAC learning to point to the
> > +          <code>redirect-chassis</code>.
> > +        </p>
> >         </li>
> >
> >         <li>
> > diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
> > index 31ea5f410..3998a898c 100644
> > --- a/ovn/northd/ovn-northd.c
> > +++ b/ovn/northd/ovn-northd.c
> > @@ -4426,13 +4426,32 @@ build_lswitch_flows(struct hmap *datapaths, struct hmap *ports,
> >                   ds_put_format(&match, "eth.dst == "ETH_ADDR_FMT,
> >                                 ETH_ADDR_ARGS(mac));
> >                   if (op->peer->od->l3dgw_port
> > -                    && op->peer == op->peer->od->l3dgw_port
> > -                    && op->peer->od->l3redirect_port) {
> > -                    /* The destination lookup flow for the router's
> > -                     * distributed gateway port MAC address should only be
> > -                     * programmed on the "redirect-chassis". */
> > -                    ds_put_format(&match, " && is_chassis_resident(%s)",
> > -                                  op->peer->od->l3redirect_port->json_key);
> > +                    && op->peer->od->l3redirect_port
> > +                    && op->od->localnet_port) {
> > +                    bool add_chassis_resident_check = false;
> > +                    if (op->peer == op->peer->od->l3dgw_port) {
> > +                        /* The peer of this port represents a distributed
> > +                         * gateway port. The destination lookup flow for the
> > +                         * router's distributed gateway port MAC address should
> > +                         * only be programmed on the "redirect-chassis". */
> > +                        add_chassis_resident_check = true;
> > +                    } else {
> > +                        /* Check if the option 'reside-on-redirect-chassis'
> > +                         * is set to true on the peer port. If set to true
> > +                         * and if the logical switch has a localnet port, it
> > +                         * means the router pipeline for the packets from
> > +                         * this logical switch should be run on the chassis
> > +                         * hosting the gateway port.
> > +                         */
> > +                        add_chassis_resident_check = smap_get_bool(
> > +                            &op->peer->nbrp->options,
> > +                            "reside-on-redirect-chassis", false);
> > +                    }
> > +
> > +                    if (add_chassis_resident_check) {
> > +                        ds_put_format(&match, " && is_chassis_resident(%s)",
> > +                                      op->peer->od->l3redirect_port->json_key);
> > +                    }
> >                   }
> >
> >                   ds_clear(&actions);
> > @@ -5197,15 +5216,35 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
> >                             op->lrp_networks.ipv4_addrs[i].network_s,
> >                             op->lrp_networks.ipv4_addrs[i].plen,
> >                             op->lrp_networks.ipv4_addrs[i].addr_s);
> > -            if (op->od->l3dgw_port && op == op->od->l3dgw_port
> > -                && op->od->l3redirect_port) {
> > -                /* Traffic with eth.src = l3dgw_port->lrp_networks.ea_s
> > -                 * should only be sent from the "redirect-chassis", so that
> > -                 * upstream MAC learning points to the "redirect-chassis".
> > -                 * Also need to avoid generation of multiple ARP responses
> > -                 * from different chassis. */
> > -                ds_put_format(&match, " && is_chassis_resident(%s)",
> > -                              op->od->l3redirect_port->json_key);
> > +
> > +            if (op->od->l3dgw_port && op->od->l3redirect_port && op->peer
> > +                && op->peer->od->localnet_port) {
> > +                bool add_chassis_resident_check = false;
> > +                if (op == op->od->l3dgw_port) {
> > +                    /* Traffic with eth.src = l3dgw_port->lrp_networks.ea_s
> > +                     * should only be sent from the "redirect-chassis", so that
> > +                     * upstream MAC learning points to the "redirect-chassis".
> > +                     * Also need to avoid generation of multiple ARP responses
> > +                     * from different chassis. */
> > +                    add_chassis_resident_check = true;
> > +                } else {
> > +                    /* Check if the option 'reside-on-redirect-chassis'
> > +                     * is set to true on the router port. If set to true
> > +                     * and if peer's logical switch has a localnet port, it
> > +                     * means the router pipeline for the packets from
> > +                     * peer's logical switch should be run on the chassis
> > +                     * hosting the gateway port and it should reply to the
> > +                     * ARP requests for the router port IPs.
> > +                     */
> > +                    add_chassis_resident_check = smap_get_bool(
> > +                        &op->nbrp->options,
> > +                        "reside-on-redirect-chassis", false);
> > +                }
> > +
> > +                if (add_chassis_resident_check) {
> > +                    ds_put_format(&match, " && is_chassis_resident(%s)",
> > +                                  op->od->l3redirect_port->json_key);
> > +                }
> >               }
> >
> >               ds_clear(&actions);
> > diff --git a/ovn/ovn-architecture.7.xml b/ovn/ovn-architecture.7.xml
> > index 6ed2cf132..998470c34 100644
> > --- a/ovn/ovn-architecture.7.xml
> > +++ b/ovn/ovn-architecture.7.xml
> > @@ -1372,6 +1372,166 @@
> >       http://docs.openvswitch.org/en/latest/topics/high-availability.
> >     </p>
> >
> > +  <h2>Tenant VLAN networks connected to a Logical Router</h2>
> > +
> > +  <p>
> > +    It is possible to have multiple logical switches each with a localnet port
> > +    (representing physical networks) connected to a logical router in which one
> > +    may provide the external connectivity via a distributed gateway port and
> > +    the rest of them are used internally (with VLAN tagged). It is expected
> > +    that <code>ovn-bridge-mappings</code> is configured appropriately on the
> > +    chassis.
> > +  </p>
> > +
> > +  <h3>East West routing</h3>
> > +  <p>
> > +    East-West routing between these tenant VLAN logical switches works
> almost
> > +    the same way as normal logical switches. When the VM sends such a
> packet,
> > +    then:
> > +  </p>
> > +  <ol>
> > +    <li>
> > +      The packet enters the ingress pipeline of the logical router
> datapath
> > +      via the logical router port in the source chassis.
> > +    </li>
> > +
> > +    <li>
> > +      Routing decision is taken.
> > +    </li>
> > +
> > +    <li>
> > +      The packet goes out of the integration bridge to the provider
> bridge (
> > +      belonging to the destination logical switch) via the localnet
> port.
> > +    </li>
> > +
> > +    <li>
> > +      The destination chassis receives the packet via the localnet port
> > +      and delivers to the destination VM.
> > +    </li>
> > +  </ol>
> > +
> > +  <h3>External traffic</h3>
> > +
> > +  <p>
> > +    The following happens when a VM sends an external traffic (which
> requires
> > +    NATting) and the chassis hosting the VM doesn't have a distributed
> gateway
> > +    port.
> > +  </p>
> > +
> > +  <ol>
> > +    <li>
> > +      The packet enters the ingress pipeline of the logical router
> datapath
> > +      via the logical router port in the source chassis.
> > +    </li>
> > +
> > +    <li>
> > +      Routing decision is taken. Since the gateway router or the
> distributed
> > +      gateway port doesn't reside in the source chassis, the traffic is
> > +      redirected to the gateway chassis via the tunnel port.
> > +    </li>
> > +
> > +    <li>
> > +      The gateway chassis receives the packet, applies the NAT rules and
> > +      forwards it via the localnet port.
> > +    </li>
> > +  </ol>
> > +
> > +  <p>
> > +    Although this works, the VM traffic is tunnelled. In order for it to
> > +    work properly, the MTU of the VLAN tenant networks must be lowered
> to
> > +    account for the tunnel encapsulation.
> > +  </p>
> > +
> > +  <h2>Centralized routing for VLAN tenant networks</h2>
> > +
> > +  <p>
> > +    To overcome the tunnel encapsulation problem described in the
> previous
> > +    section, <code>OVN</code> supports the option of enabling
> centralized
> > +    routing for VLAN tenant networks. CMS can configure the option
> > +    <ref column="options:reside-on-redirect-chassis"
> > +    table="Logical_Router_Port" db="OVN_NB"/> to <code>true</code> for
> each
> > +    <ref table="Logical_Router_Port" db="OVN_NB"/> which connects to the
> > +    logical switch of the VLAN tenant network. This causes the gateway
> > +    chassis (hosting the distributed gateway port) to handle all the
> > +    routing for these networks, making it centralized. It will reply to
> > +    the ARP requests for the logical router port IPs.
> > +  </p>
> > +
> > +  <p>
> > +    If the logical router doesn't have a distributed gateway port
> connecting
> > +    to the provider network, then this option is ignored by
> <code>OVN</code>.
> > +  </p>
> > +
> > +  <p>
> > +    The following happens when a VM sends an east-west traffic which
> needs to
> > +    be routed:
> > +  </p>
> > +
> > +  <ol>
> > +    <li>
> > +      The packet from the VM enters the logical datapath pipeline of
> the source
> > +      VLAN network in the source chassis and is sent out via the
> localnet port
> > +      (instead of sending it to router pipeline).
> > +    </li>
> > +
> > +    <li>
> > +      The packet enters the logical datapath pipeline of the source VLAN
> > +      network in the gateway chassis and is sent to the logical datapath
> > +      pipeline belonging to the logical router.
> > +    </li>
> > +
> > +    <li>
> > +      Routing decision is taken.
> > +    </li>
> > +
> > +    <li>
> > +      The packet enters the logical datapath pipeline of the destination
> > +      VLAN network. The packet is delivered to the destination VM if it
> resides
> > +      in the same chassis. Otherwise the packet is sent out via the
> localnet
> > +      port of the destination VLAN network.
> > +    </li>
> > +
> > +    <li>
> > +      The destination chassis receives the packet via the localnet port
> > +      and delivers to the destination VM.
> > +    </li>
> > +  </ol>
> > +
> > +  <p>
> > +    The following happens when a VM sends an external traffic which
> requires
> > +    NATting:
> > +  </p>
> > +
> > +  <ol>
> > +    <li>
> > +      The packet from the VM enters the logical datapath pipeline of
> the source
> > +      VLAN network in the source chassis and is sent out via the
> localnet port
> > +      (instead of sending it to router pipeline).
> > +    </li>
> > +
> > +    <li>
> > +      The packet enters the logical datapath pipeline of the source VLAN
> > +      network in the gateway chassis and is sent to the logical datapath
> > +      pipeline belonging to the logical router.
> > +    </li>
> > +
> > +    <li>
> > +      Routing decision is taken and NAT rules are applied.
> > +    </li>
> > +
> > +    <li>
> > +      The packet enters the logical datapath pipeline of the provider
> network
> > +      and is sent out via the localnet port of the provider network.
> > +    </li>
> > +  </ol>
> > +
> > +  <p>
> > +    For the reverse external traffic, the gateway chassis applies the
> unNATting
> > +    rules and sends the packet via the localnet port of the VLAN tenant
> > +    network and the destination chassis receives the packet and
> delivers to
> > +    the VM.
> > +  </p>
> > +
> >     <h2>Life Cycle of a VTEP gateway</h2>
> >
> >     <p>
> > diff --git a/ovn/ovn-nb.xml b/ovn/ovn-nb.xml
> > index 8564ed39c..13ae56e13 100644
> > --- a/ovn/ovn-nb.xml
> > +++ b/ovn/ovn-nb.xml
> > @@ -1635,6 +1635,49 @@
> >             chassis to enable high availability.
> >           </p>
> >         </column>
> > +
> > +      <column name="options" key="reside-on-redirect-chassis">
> > +        <p>
> > +          Generally routing is distributed in <code>OVN</code>. The
> packet
> > +          from a logical port which needs to be routed hits the router
> pipeline
> > +          in the source chassis. For the East-West traffic, the packet
> is
> > +          sent directly to the destination chassis. For the outside
> traffic
> > +          the packet is sent to the gateway chassis.
> > +        </p>
> > +
> > +        <p>
> > +          When this option is set, <code>OVN</code> considers this only
> if
> > +        </p>
> > +
> > +        <ul>
> > +          <li>
> > +            The logical router to which this logical router port
> belongs to
> > +            has a distributed gateway port.
> > +          </li>
> > +
> > +          <li>
> > +            The peer's logical switch has a localnet port (representing
> > +            a tenant VLAN network)
> > +          </li>
> > +        </ul>
> > +
> > +        <p>
> > +          When this option is set to <code>true</code>, then the packet
> > +          which needs to be routed hits the router pipeline in the
> chassis
> > +          hosting the distributed gateway router port. The source
> chassis
> > +          pushes out this traffic via the localnet port. With this the
> > +          East-West traffic is no more distributed and will always go
> through
> > +          the gateway chassis.
> > +        </p>
> > +
> > +        <p>
> > +          Without this option set, for any traffic destined to outside
> from a
> > +          logical port which belongs to a logical switch with localnet
> port,
> > +          the source chassis will send the traffic to the gateway
> chassis via
> > +          the tunnel port instead of the localnet port and this could
> cause MTU
> > +          issues.
> > +        </p>
> > +      </column>
> >       </group>
> >
> >       <group title="Attachment">
> > diff --git a/tests/ovn.at b/tests/ovn.at
> > index 769e09f81..504ba228d 100644
> > --- a/tests/ovn.at
> > +++ b/tests/ovn.at
> > @@ -8537,6 +8537,279 @@ OVN_CLEANUP([hv1],[hv2],[hv3])
> >
> >   AT_CLEANUP
> >
> > +# VLAN traffic for external network redirected through distributed router
> > +# gateway port should use vlans(i.e input network vlan tag) across hypervisors
> > +# instead of tunneling.
> > +AT_SETUP([ovn -- vlan traffic for external network with distributed router gateway port])
> > +AT_SKIP_IF([test $HAVE_PYTHON = no])
> > +ovn_start
> > +
> > +# Logical network:
> > +# # One LR R1 that has switches foo (192.168.1.0/24) and
> > +# # alice (172.16.1.0/24) connected to it.  The logical port
> > +# # between R1 and alice has a "redirect-chassis" specified,
> > +# # i.e. it is the distributed router gateway port(172.16.1.6).
> > +# # Switch alice also has a localnet port defined.
> > +# # An additional switch outside has the same subnet as alice
> > +# # (172.16.1.0/24), a localnet port and nexthop port(172.16.1.1)
> > +# # which will receive the packet destined for external network
> > +# # (i.e 8.8.8.8 as destination ip).
> > +
> > +# Physical network:
> > +# # Three hypervisors hv[123].
> > +# # hv1 hosts vif foo1.
> > +# # hv2 is the "redirect-chassis" that hosts the distributed router gateway port.
> > +# # hv3 hosts nexthop port vif outside1.
> > +# # All other tests connect hypervisors to network n1 through br-phys for tunneling.
> > +# # But in this test, hv1 won't connect to n1(and no br-phys in hv1), and
> > +# # in order to show vlans(instead of tunneling) used between hv1 and hv2,
> > +# # a new network n2 created and hv1 and hv2 connected to this network through br-ex.
> > +# # hv2 and hv3 are still connected to n1 network through br-phys.
> > +net_add n1
> > +
> > +# We are not calling ovn_attach for hv1, to avoid adding br-phys.
> > +# Tunneling won't work in hv1 as ovn-encap-ip is not added to any bridge in hv1
> > +sim_add hv1
> > +as hv1
> > +ovs-vsctl \
> > +    -- set Open_vSwitch . external-ids:system-id=hv1 \
> > +    -- set Open_vSwitch . external-ids:ovn-remote=unix:$ovs_base/ovn-sb/ovn-sb.sock \
> > +    -- set Open_vSwitch . external-ids:ovn-encap-type=geneve,vxlan \
> > +    -- set Open_vSwitch . external-ids:ovn-encap-ip=192.168.0.1 \
> > +    -- add-br br-int \
> > +    -- set bridge br-int fail-mode=secure other-config:disable-in-band=true \
> > +    -- set Open_vSwitch . external-ids:ovn-bridge-mappings=public:br-ex
> > +
> > +start_daemon ovn-controller
> > +ovs-vsctl -- add-port br-int hv1-vif1 -- \
> > +    set interface hv1-vif1 external-ids:iface-id=foo1 \
> > +    ofport-request=1
> > +
> > +sim_add hv2
> > +as hv2
> > +ovs-vsctl add-br br-phys
> > +ovn_attach n1 br-phys 192.168.0.2
> > +ovs-vsctl set Open_vSwitch . external-ids:ovn-bridge-mappings="public:br-ex,phys:br-phys"
> > +
> > +sim_add hv3
> > +as hv3
> > +ovs-vsctl add-br br-phys
> > +ovn_attach n1 br-phys 192.168.0.3
> > +ovs-vsctl -- add-port br-int hv3-vif1 -- \
> > +    set interface hv3-vif1 external-ids:iface-id=outside1 \
> > +    options:tx_pcap=hv3/vif1-tx.pcap \
> > +    options:rxq_pcap=hv3/vif1-rx.pcap \
> > +    ofport-request=1
> > +ovs-vsctl set Open_vSwitch . external-ids:ovn-bridge-mappings="phys:br-phys"
> > +
> > +# Create network n2 for vlan connectivity between hv1 and hv2
> > +net_add n2
> > +
> > +as hv1
> > +ovs-vsctl add-br br-ex
> > +net_attach n2 br-ex
> > +
> > +as hv2
> > +ovs-vsctl add-br br-ex
> > +net_attach n2 br-ex
> > +
> > +OVN_POPULATE_ARP
> > +
> > +ovn-nbctl create Logical_Router name=R1
> > +
> > +ovn-nbctl ls-add foo
> > +ovn-nbctl ls-add alice
> > +ovn-nbctl ls-add outside
> > +
> > +# Connect foo to R1
> > +ovn-nbctl lrp-add R1 foo 00:00:01:01:02:03 192.168.1.1/24
> > +ovn-nbctl lsp-add foo rp-foo -- set Logical_Switch_Port rp-foo \
> > +    type=router options:router-port=foo \
> > +    -- lsp-set-addresses rp-foo router
> > +
> > +# Connect alice to R1 as distributed router gateway port (172.16.1.6) on hv2
> > +ovn-nbctl lrp-add R1 alice 00:00:02:01:02:03 172.16.1.6/24 \
> > +    -- set Logical_Router_Port alice options:redirect-chassis="hv2"
> > +ovn-nbctl lsp-add alice rp-alice -- set Logical_Switch_Port rp-alice \
> > +    type=router options:router-port=alice \
> > +    -- lsp-set-addresses rp-alice router \
> > +
> > +# Create logical port foo1 in foo
> > +ovn-nbctl lsp-add foo foo1 \
> > +-- lsp-set-addresses foo1 "f0:00:00:01:02:03 192.168.1.2"
> > +
> > +# Create logical port outside1 in outside, which is a nexthop address
> > +# for 172.16.1.0/24
> > +ovn-nbctl lsp-add outside outside1 \
> > +-- lsp-set-addresses outside1 "f0:00:00:01:02:04 172.16.1.1"
> > +
> > +# Set default gateway (nexthop) to 172.16.1.1
> > +ovn-nbctl lr-route-add R1 "0.0.0.0/0" 172.16.1.1 alice
> > +AT_CHECK([ovn-nbctl lr-nat-add R1 snat 172.16.1.6 192.168.1.1/24])
> > +ovn-nbctl set Logical_Switch_Port rp-alice options:nat-addresses=router
> > +
> > +ovn-nbctl lsp-add foo ln-foo
> > +ovn-nbctl lsp-set-addresses ln-foo unknown
> > +ovn-nbctl lsp-set-options ln-foo network_name=public
> > +ovn-nbctl lsp-set-type ln-foo localnet
> > +AT_CHECK([ovn-nbctl set Logical_Switch_Port ln-foo tag=2])
> > +
> > +# Create localnet port in alice
> > +ovn-nbctl lsp-add alice ln-alice
> > +ovn-nbctl lsp-set-addresses ln-alice unknown
> > +ovn-nbctl lsp-set-type ln-alice localnet
> > +ovn-nbctl lsp-set-options ln-alice network_name=phys
> > +
> > +# Create localnet port in outside
> > +ovn-nbctl lsp-add outside ln-outside
> > +ovn-nbctl lsp-set-addresses ln-outside unknown
> > +ovn-nbctl lsp-set-type ln-outside localnet
> > +ovn-nbctl lsp-set-options ln-outside network_name=phys
> > +
> > +# Allow some time for ovn-northd and ovn-controller to catch up.
> > +# XXX This should be more systematic.
> > +ovn-nbctl --wait=hv --timeout=3 sync
> > +
> > +# Check that there is a logical flow in logical switch foo's pipeline
> > +# to set the outport to rp-foo (which is expected).
> > +OVS_WAIT_UNTIL([test 1 = `ovn-sbctl dump-flows foo | grep ls_in_l2_lkup | \
> > +grep rp-foo | grep -v is_chassis_resident | wc -l`])
> > +
> > +# Set the option 'reside-on-redirect-chassis' for foo
> > +ovn-nbctl set logical_router_port foo options:reside-on-redirect-chassis=true
> > +# Check that there is a logical flow in logical switch foo's pipeline
> > +# to set the outport to rp-foo with the condition is_chassis_resident.
> > +ovn-sbctl dump-flows foo
> > +OVS_WAIT_UNTIL([test 1 = `ovn-sbctl dump-flows foo | grep ls_in_l2_lkup | \
> > +grep rp-foo | grep is_chassis_resident | wc -l`])
> > +
> > +echo "---------NB dump-----"
> > +ovn-nbctl show
> > +echo "---------------------"
> > +ovn-nbctl list logical_router
> > +echo "---------------------"
> > +ovn-nbctl list nat
> > +echo "---------------------"
> > +ovn-nbctl list logical_router_port
> > +echo "---------------------"
> > +
> > +echo "---------SB dump-----"
> > +ovn-sbctl list datapath_binding
> > +echo "---------------------"
> > +ovn-sbctl list port_binding
> > +echo "---------------------"
> > +ovn-sbctl dump-flows
> > +echo "---------------------"
> > +ovn-sbctl list chassis
> > +echo "---------------------"
> > +
> > +for chassis in hv1 hv2 hv3; do
> > +    as $chassis
> > +    echo "------ $chassis dump ----------"
> > +    ovs-vsctl show br-int
> > +    ovs-ofctl show br-int
> > +    ovs-ofctl dump-flows br-int
> > +    echo "--------------------------"
> > +done
> > +
> > +ip_to_hex() {
> > +    printf "%02x%02x%02x%02x" "$@"
> > +}
> > +
> > +foo1_ip=$(ip_to_hex 192 168 1 2)
> > +gw_ip=$(ip_to_hex 172 16 1 6)
> > +dst_ip=$(ip_to_hex 8 8 8 8)
> > +nexthop_ip=$(ip_to_hex 172 16 1 1)
> > +
> > +foo1_mac="f00000010203"
> > +foo_mac="000001010203"
> > +gw_mac="000002010203"
> > +nexthop_mac="f00000010204"
> > +
> > +# Send ip packet from foo1 to 8.8.8.8
> > +src_mac="f00000010203"
> > +dst_mac="000001010203"
> > +packet=${foo_mac}${foo1_mac}08004500001c0000000040110000${foo1_ip}${dst_ip}0035111100080000
> > +
> > +as hv1 ovs-appctl netdev-dummy/receive hv1-vif1 $packet
> > +sleep 2
> > +
> > +# ARP request packet for nexthop_ip to expect at outside1
> > +arp_request=ffffffffffff${gw_mac}08060001080006040001${gw_mac}${gw_ip}000000000000${nexthop_ip}
> > +echo $arp_request >> hv3-vif1.expected
> > +cat hv3-vif1.expected > expout
> > +$PYTHON "$top_srcdir/utilities/ovs-pcap.in" hv3/vif1-tx.pcap | grep ${nexthop_ip} | uniq > hv3-vif1
> > +AT_CHECK([sort hv3-vif1], [0], [expout])
> > +
> > +# Send ARP reply from outside1 back to the router
> > +reply_mac="f00000010204"
> > +arp_reply=${gw_mac}${nexthop_mac}08060001080006040002${nexthop_mac}${nexthop_ip}${gw_mac}${gw_ip}
> > +
> > +as hv3 ovs-appctl netdev-dummy/receive hv3-vif1 $arp_reply
> > +OVS_WAIT_UNTIL([
> > +    test `as hv2 ovs-ofctl dump-flows br-int | grep table=66 | \
> > +grep actions=mod_dl_dst:f0:00:00:01:02:04 | wc -l` -eq 1
> > +    ])
> > +
> > +# VLAN tagged packet with router port(192.168.1.1) MAC as destination MAC
> > +# is expected on bridge connecting hv1 and hv2
> > +expected=${foo_mac}${foo1_mac}8100000208004500001c0000000040110000${foo1_ip}${dst_ip}0035111100080000
> > +echo $expected > hv1-br-ex_n2.expected
> > +
> > +# Packet to Expect at outside1 i.e nexthop(172.16.1.1) port.
> > +# As connection tracking not enabled for this test, snat can't be done on the packet.
> > +# We still see foo1 as the source ip address. But source mac(gateway MAC) and
> > +# dest mac(nexthop mac) are properly configured.
> > +expected=${nexthop_mac}${gw_mac}08004500001c000000003f110100${foo1_ip}${dst_ip}0035111100080000
> > +echo $expected > hv3-vif1.expected
> > +
> > +reset_pcap_file() {
> > +    local iface=$1
> > +    local pcap_file=$2
> > +    ovs-vsctl -- set Interface $iface options:tx_pcap=dummy-tx.pcap \
> > +options:rxq_pcap=dummy-rx.pcap
> > +    rm -f ${pcap_file}*.pcap
> > +    ovs-vsctl -- set Interface $iface options:tx_pcap=${pcap_file}-tx.pcap \
> > +options:rxq_pcap=${pcap_file}-rx.pcap
> > +}
> > +
> > +as hv1 reset_pcap_file br-ex_n2 hv1/br-ex_n2
> > +as hv3 reset_pcap_file hv3-vif1 hv3/vif1
> > +sleep 2
> > +as hv1 ovs-appctl netdev-dummy/receive hv1-vif1 $packet
> > +sleep 2
> > +
> > +# On hv1, the packet should not go from vlan switch pipeline to router
> > +# pipeline
> > +as hv1 ovs-ofctl dump-flows br-int
> > +
> > +AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=65 | grep "priority=100,reg15=0x1,metadata=0x2" \
> > +| grep actions=clone | grep -v n_packets=0 | wc -l], [0], [[0
> > +]])
> > +
> > +# On hv1, table 32 check that no packet goes via the tunnel port
> > +AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=32 \
> > +| grep "NXM_NX_TUN_ID" | grep -v n_packets=0 | wc -l], [0], [[0
> > +]])
> > +
> > +ip_packet() {
> > +    grep "1010203f00000010203"
> > +}
> > +
> > +# Check vlan tagged packet on the bridge connecting hv1 and hv2 with the
> > +# foo1's mac.
> > +$PYTHON "$top_srcdir/utilities/ovs-pcap.in" hv1/br-ex_n2-tx.pcap | ip_packet | uniq > hv1-br-ex_n2
> > +cat hv1-br-ex_n2.expected > expout
> > +AT_CHECK([sort hv1-br-ex_n2], [0], [expout])
> > +
> > +# Check expected packet on nexthop interface
> > +$PYTHON "$top_srcdir/utilities/ovs-pcap.in" hv3/vif1-tx.pcap | grep ${foo1_ip}${dst_ip} | uniq > hv3-vif1
> > +cat hv3-vif1.expected > expout
> > +AT_CHECK([sort hv3-vif1], [0], [expout])
> > +
> > +OVN_CLEANUP([hv1],[hv2],[hv3])
> > +AT_CLEANUP
> > +
> >   AT_SETUP([ovn -- IPv6 ND Router Solicitation responder])
> >   AT_KEYWORDS([ovn-nd_ra])
> >   AT_SKIP_IF([test $HAVE_PYTHON = no])
> >
>
>
Ankur Sharma Nov. 2, 2018, 12:50 a.m. UTC | #3
Hi Numan, Mark,

Thanks for the patch.
Description explains the problem statement and solution really well.

I have the following comments:

a. Regarding the solution:
   i. I think referring to a non-connected port in a logical datapath pipeline is probably not the right way,
      i.e. as per the description, in the sw0 pipeline we are referring to lr0-public.
      lr0-public is not a peer interface of the sw0-lr0 patch port pair, hence the sw0 pipeline should be totally agnostic to it (just my 2 cents).

      table=16(ls_in_l2_lkup), priority=50, match=(eth.dst == 00:00:00:00:af:12 && is_chassis_resident("cr-lr0-public")),  action=(outport = "sw0-lr0"; output;)

b. Nit: Patch 2 in this series does not look related to patch 1 (unless I am missing something);
            if so, we should have a separate patch for it.

c. Nit: For any new config that we add, we should also add a corresponding ovn-nbctl CLI command.
           This way non-OpenStack deployments can also play with the feature 😊
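
Regarding (c): until a dedicated command exists, the option can be set with the generic database syntax, as the patch's own test does for port foo. A minimal sketch, assuming the port name lr0-sw0 from the patch description; the NBCTL variable and the helper name are only illustrative, and by default the helper prints the command instead of running it:

```shell
# Dry run by default: the command is printed, not executed.
# Export NBCTL=ovn-nbctl to run it against a real northbound database.
NBCTL=${NBCTL:-echo ovn-nbctl}

# Hypothetical helper: set reside-on-redirect-chassis on a router port.
#   $1 - Logical_Router_Port name
#   $2 - true or false
set_reside_on_redirect_chassis() {
    $NBCTL set Logical_Router_Port "$1" \
        options:reside-on-redirect-chassis="$2"
}

set_reside_on_redirect_chassis lr0-sw0 true
```

A dedicated ovn-nbctl subcommand would only be a thin wrapper around this, so the generic form may be enough for trying the feature out.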

===================================================

Regarding the solution:
------------------------------

As I understand, we are trying to achieve the following here:
a. For a distributed virtual router, we want some router interfaces to be fully centralized, rather than distributed,
    i.e. instead of creating 2 versions of a router port (lrp-* and cr-lrp*), one distributed and one centralized,
    we want the lrp-* itself to be centralized.
    In your case, from the VLAN-backed logical switch, you want to enter the router pipeline only on the gateway chassis and NOT on the source chassis.

b. Just wanted to propose an alternate approach here, which we implemented for a slightly different use case.
     It looks like this alternate approach would help in your scenario as well.

Alternate Approach:
--------------------------
a. Convert the port pair between the logical switch and the logical router to be of type "l3gateway", rather than "patch".
b. I.e. in your configuration, lr0-sw0 and sw0-lr0 should be implemented as type l3gateway rather than patch.
c. In other words, we are simulating a centralized router (for specific peer logical switches) in a distributed router.
We will be discussing this approach at OVS Conf as well:
https://ovsfall2018.sched.com/event/IO9w/connectivity-for-external-networks-on-the-overlay?iframe=no&w=100%&sidebar=yes&bg=no

===================================================

Please feel free to point out anything I missed here,
and to comment on the proposed alternative.

Thanks

Regards,
Ankur


-----Original Message-----
From: ovs-dev-bounces@openvswitch.org <ovs-dev-bounces@openvswitch.org> On Behalf Of Numan Siddique
Sent: Monday, October 15, 2018 2:38 AM
To: Mark Michelson <mmichels@redhat.com>
Cc: ovs dev <dev@openvswitch.org>
Subject: Re: [ovs-dev] [PATCH 1/2] ovn: Avoid tunneling for VLAN packets redirected to a gateway chassis

On Sat, Oct 13, 2018 at 1:26 AM Mark Michelson <mmichels@redhat.com> wrote:

> Hi Numan,
>
> The patch does a good job of explaining the routing behavior and the 
> tunneling problem solved within.
>
> Prior to the patch, you can have a distributed gateway router with a 
> redirect-chassis port set on it. This allows for east-west traffic to 
> have an optimal direct path between hypervisors, but for the 
> north-south use case, when the traffic is redirected to the 
> redirect-chassis, the traffic is encapsulated.
>
> With this patch, you add the reside-on-redirect-chassis option to 
> router ports. This essentially makes all traffic destined for the 
> router port get redirected to the gateway chassis prior to running the 
> router pipeline. This removes the encapsulation issue, but it also 
> means that east-west traffic is now also centralized.
>

That's right. That's the trade-off for solving this issue for VLAN tenant networks.
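
The mechanism behind this trade-off is small: for a router-type switch port whose peer has the option set, ovn-northd appends an is_chassis_resident() condition to the ls_in_l2_lkup match, so only the gateway chassis forwards into the router pipeline. A simplified shell sketch of that decision follows. The real logic is the C code in build_lswitch_flows() in the patch; the function name below is made up, and the case where the peer is itself the distributed gateway port (which always gets the check) is left out.

```shell
# Hypothetical helper mirroring the match built for a router-type switch port.
#   $1 - router port MAC
#   $2 - name of the chassis-redirect port (e.g. cr-lr0-public)
#   $3 - value of reside-on-redirect-chassis on the peer router port
#   $4 - whether the logical switch has a localnet port
build_l2_lkup_match() {
    mac=$1
    cr_port=$2
    reside=$3
    has_localnet=$4

    match="eth.dst == $mac"
    # The residency check is only added for VLAN (localnet) switches
    # whose peer router port has the option set to true.
    if [ "$reside" = "true" ] && [ "$has_localnet" = "true" ]; then
        match="$match && is_chassis_resident(\"$cr_port\")"
    fi
    printf '%s\n' "$match"
}

build_l2_lkup_match 00:00:00:00:af:12 cr-lr0-public true true
```

With the option unset, the match stays at the plain eth.dst comparison and every chassis keeps routing locally, which is the distributed behaviour Mark describes.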


>
> I'm curious what the current behavior is when you specify a gateway 
> router by setting options:chassis. Specifically, I'm curious about how 
> it compares if you define a router where the "external" port has 
> options:redirect-chassis set on it and all other ports have 
> options:reside-on-redirect-chassis set on them. Have you essentially 
> just created the same thing? Or is there some subtle difference?
>

There is a difference. In the gateway router scenario (options:chassis set), the tenant VLAN logical switches are expected to connect to a normal router, and this normal router connects to the gateway router via a transit switch.
So east-west traffic stays distributed, but for north-south traffic the packet on the source chassis traverses logical switch pipeline -> normal router pipeline -> transit switch pipeline. The packet is then sent to the chassis hosting the gateway router via the tunnel port. On the gateway chassis, the packet traverses transit switch pipeline -> gateway router pipeline -> provider network pipeline.
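
For comparison, the gateway router topology described above could be set up roughly as follows. This is only a sketch: all names, MACs and subnets (lr0, gw0, join, hv2, 10.0.0.0/24 and so on) are invented for illustration, static routes and the provider switch are omitted, and the commands are printed rather than executed unless NBCTL is set to ovn-nbctl.

```shell
# Dry run by default: commands are printed, not executed.
NBCTL=${NBCTL:-echo ovn-nbctl}

build_gateway_router_topology() {
    # Distributed ("normal") router and a gateway router pinned to chassis hv2.
    $NBCTL lr-add lr0
    $NBCTL create Logical_Router name=gw0 options:chassis=hv2

    # Tenant VLAN switch attached to the normal router.
    $NBCTL ls-add sw0
    $NBCTL lrp-add lr0 lr0-sw0 00:00:00:00:af:12 192.168.1.1/24
    $NBCTL lsp-add sw0 sw0-lr0 -- set Logical_Switch_Port sw0-lr0 \
        type=router options:router-port=lr0-sw0 \
        -- lsp-set-addresses sw0-lr0 router

    # Transit switch joining the normal router and the gateway router.
    $NBCTL ls-add join
    $NBCTL lrp-add lr0 lr0-join 00:00:00:00:ff:01 10.0.0.1/24
    $NBCTL lsp-add join join-lr0 -- set Logical_Switch_Port join-lr0 \
        type=router options:router-port=lr0-join \
        -- lsp-set-addresses join-lr0 router
    $NBCTL lrp-add gw0 gw0-join 00:00:00:00:ff:02 10.0.0.2/24
    $NBCTL lsp-add join join-gw0 -- set Logical_Switch_Port join-gw0 \
        type=router options:router-port=gw0-join \
        -- lsp-set-addresses join-gw0 router
}

build_gateway_router_topology
```

In this layout only gw0 is bound to a chassis, which is why north-south traffic crosses the tunnel after the transit switch while east-west traffic through lr0 stays local.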

Thanks
Numan






> On 10/05/2018 01:14 PM, nusiddiq@redhat.com wrote:
> > From: Numan Siddique <nusiddiq@redhat.com>
> >
> > An OVN deployment can have multiple logical switches each with a 
> > localnet port connected to a distributed logical router with one 
> > logical router port providing external connectivity (provider 
> > network) and others used as tenant networks with VLAN tagging.
> >
> > As reported in [1], external traffic from these VLAN tenant networks 
> > are tunnelled to the gateway chassis (chassis hosting a distributed 
> > gateway port which applies NAT rules). As part of the discussion in 
> > [1], there were few possible solutions proposed by Russell [2]. This 
> > patch implements the first option in [2].
> >
> > With this patch, a new option 'reside-on-redirect-chassis' in 'options'
> > column of Logical_Router_Port table is added. If the value of this 
> > option is set to 'true' and if the logical router also have a 
> > distributed gateway port, then routing for this logical router port 
> > is centralized in the chassis hosting the distributed gateway port.
> >
> > If a logical switch 'sw0' is connected to a router 'lr0' with the
> > router port - 'lr0-sw0' with the address - "00:00:00:00:af:12 192.168.1.1",
> > and it has a distributed logical port - 'lr0-public', then the
> > below logical flow is added in the logical switch pipeline of 'sw0'
> > if the 'reside-on-redirect-chassis' option is set on 'lr0-sw0' -
> >
> > table=16(ls_in_l2_lkup), priority=50,
> > match=(eth.dst == 00:00:00:00:af:12 && is_chassis_resident("cr-lr0-public")),
> > action=(outport = "sw0-lr0"; output;)
> >
> > With the above flow, the packet doesn't enter the router pipeline in 
> > the source chassis. Instead the packet is sent out via the localnet 
> > port of 'sw0'. The gateway chassis upon receiving this packet, runs 
> > the logical router pipeline applying NAT rules and sends the traffic 
> > out via the localnet port of the provider network. The gateway 
> > chassis will also reply to the ARP requests for the router port IPs.
> >
> > With this approach, we avoid redirecting the external traffic to the 
> > gateway chassis via the tunnel port. There are a couple of drawbacks 
> > with this approach:
> >
> >    - East - West routing is no more distributed for the VLAN tenant
> >      networks if 'reside-on-redirect-chassis' option is defined
> >
> >    - 'dnat_and_snat' NAT rules with 'logical_mac' and 'logical_port'
> >      columns defined will not work for the VLAN tenant networks.
> >
> > This approach is taken for now as it is simple. If there is a 
> > requirement to support distributed routing for these VLAN tenant 
> > networks, we can explore other possible solutions.
> >
> > [1] - https://mail.openvswitch.org/pipermail/ovs-discuss/2018-April/046543.html
> > [2] - https://mail.openvswitch.org/pipermail/ovs-discuss/2018-April/046557.html
> >
> > Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2018-April/046543.html
> > Reported-by: venkata anil <vkommadi@redhat.com>
> > Co-authored-by: venkata anil <vkommadi@redhat.com>
> > Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
> > Signed-off-by: venkata anil <vkommadi@redhat.com>
> > ---
> >   ovn/northd/ovn-northd.8.xml |  30 ++++
> >   ovn/northd/ovn-northd.c     |  71 +++++++---
> >   ovn/ovn-architecture.7.xml  | 160 +++++++++++++++++++++
> >   ovn/ovn-nb.xml              |  43 ++++++
> >   tests/ovn.at                | 273 ++++++++++++++++++++++++++++++++++++
> >   5 files changed, 561 insertions(+), 16 deletions(-)
> >
> > diff --git a/ovn/northd/ovn-northd.8.xml b/ovn/northd/ovn-northd.8.xml
> > index 7352c6764..f52699bd3 100644
> > --- a/ovn/northd/ovn-northd.8.xml
> > +++ b/ovn/northd/ovn-northd.8.xml
> > @@ -874,6 +874,25 @@ output;
> >               resident.
> >             </li>
> >           </ul>
> > +
> > +        <p>
> > +          For the Ethernet address on a logical switch port of type
> > +          <code>router</code>, when that logical switch port's
> > +          <ref column="addresses" table="Logical_Switch_Port"
> > +          db="OVN_Northbound"/> column is set to <code>router</code> and
> > +          the connected logical router port specifies a
> > +          <code>reside-on-redirect-chassis</code> and the logical router
> > +          to which the connected logical router port belongs to has a
> > +          <code>redirect-chassis</code> distributed gateway logical router
> > +          port:
> > +        </p>
> > +
> > +        <ul>
> > +          <li>
> > +            The flow for the connected logical router port's Ethernet
> > +            address is only programmed on the <code>redirect-chassis</code>.
> > +          </li>
> > +        </ul>
> >         </li>
> >
> >         <li>
> > @@ -1179,6 +1198,17 @@ output;
> >             upstream MAC learning to point to the
> >             <code>redirect-chassis</code>.
> >           </p>
> > +
> > +        <p>
> > +          For the logical router port with the option
> > +          <code>reside-on-redirect-chassis</code> set (which is centralized),
> > +          the above flows are only programmed on the gateway port instance on
> > +          the <code>redirect-chassis</code> (if the logical router has a
> > +          distributed gateway port). This behavior avoids generation
> > +          of multiple ARP responses from different chassis, and allows
> > +          upstream MAC learning to point to the
> > +          <code>redirect-chassis</code>.
> > +        </p>
> >         </li>
> >
> >         <li>
> > diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
> > index 31ea5f410..3998a898c 100644
> > --- a/ovn/northd/ovn-northd.c
> > +++ b/ovn/northd/ovn-northd.c
> > @@ -4426,13 +4426,32 @@ build_lswitch_flows(struct hmap *datapaths, struct hmap *ports,
> >                   ds_put_format(&match, "eth.dst == "ETH_ADDR_FMT,
> >                                 ETH_ADDR_ARGS(mac));
> >                   if (op->peer->od->l3dgw_port
> > -                    && op->peer == op->peer->od->l3dgw_port
> > -                    && op->peer->od->l3redirect_port) {
> > -                    /* The destination lookup flow for the router's
> > -                     * distributed gateway port MAC address should only be
> > -                     * programmed on the "redirect-chassis". */
> > -                    ds_put_format(&match, " && is_chassis_resident(%s)",
> > -                                  op->peer->od->l3redirect_port->json_key);
> > +                    && op->peer->od->l3redirect_port
> > +                    && op->od->localnet_port) {
> > +                    bool add_chassis_resident_check = false;
> > +                    if (op->peer == op->peer->od->l3dgw_port) {
> > +                        /* The peer of this port represents a distributed
> > +                         * gateway port. The destination lookup flow for the
> > +                         * router's distributed gateway port MAC address should
> > +                         * only be programmed on the "redirect-chassis". */
> > +                        add_chassis_resident_check = true;
> > +                    } else {
> > +                        /* Check if the option 'reside-on-redirect-chassis'
> > +                         * is set to true on the peer port. If set to true
> > +                         * and if the logical switch has a localnet port, it
> > +                         * means the router pipeline for the packets from
> > +                         * this logical switch should be run on the chassis
> > +                         * hosting the gateway port.
> > +                         */
> > +                        add_chassis_resident_check = smap_get_bool(
> > +                            &op->peer->nbrp->options,
> > +                            "reside-on-redirect-chassis", false);
> > +                    }
> > +
> > +                    if (add_chassis_resident_check) {
> > +                        ds_put_format(&match, " && is_chassis_resident(%s)",
> > +                                      op->peer->od->l3redirect_port->json_key);
> > +                    }
> >                   }
> >
> >                   ds_clear(&actions);
> > @@ -5197,15 +5216,35 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
> >                             op->lrp_networks.ipv4_addrs[i].network_s,
> >                             op->lrp_networks.ipv4_addrs[i].plen,
> >                             op->lrp_networks.ipv4_addrs[i].addr_s);
> > -            if (op->od->l3dgw_port && op == op->od->l3dgw_port
> > -                && op->od->l3redirect_port) {
> > -                /* Traffic with eth.src = l3dgw_port->lrp_networks.ea_s
> > -                 * should only be sent from the "redirect-chassis", so that
> > -                 * upstream MAC learning points to the "redirect-chassis".
> > -                 * Also need to avoid generation of multiple ARP responses
> > -                 * from different chassis. */
> > -                ds_put_format(&match, " && is_chassis_resident(%s)",
> > -                              op->od->l3redirect_port->json_key);
> > +
> > +            if (op->od->l3dgw_port && op->od->l3redirect_port && op->peer
> > +                && op->peer->od->localnet_port) {
> > +                bool add_chassis_resident_check = false;
> > +                if (op == op->od->l3dgw_port) {
> > +                    /* Traffic with eth.src = l3dgw_port->lrp_networks.ea_s
> > +                     * should only be sent from the "redirect-chassis", so that
> > +                     * upstream MAC learning points to the "redirect-chassis".
> > +                     * Also need to avoid generation of multiple ARP responses
> > +                     * from different chassis. */
> > +                    add_chassis_resident_check = true;
> > +                } else {
> > +                    /* Check if the option 'reside-on-redirect-chassis'
> > +                     * is set to true on the router port. If set to true
> > +                     * and if peer's logical switch has a localnet port, it
> > +                     * means the router pipeline for the packets from
> > +                     * peer's logical switch is to be run on the chassis
> > +                     * hosting the gateway port and it should reply to the
> > +                     * ARP requests for the router port IPs.
> > +                     */
> > +                    add_chassis_resident_check = smap_get_bool(
> > +                        &op->nbrp->options,
> > +                        "reside-on-redirect-chassis", false);
> > +                }
> > +
> > +                if (add_chassis_resident_check) {
> > +                    ds_put_format(&match, " && is_chassis_resident(%s)",
> > +                                  op->od->l3redirect_port->json_key);
> > +                }
> >               }
> >
> >               ds_clear(&actions);
> > diff --git a/ovn/ovn-architecture.7.xml b/ovn/ovn-architecture.7.xml
> > index 6ed2cf132..998470c34 100644
> > --- a/ovn/ovn-architecture.7.xml
> > +++ b/ovn/ovn-architecture.7.xml
> > @@ -1372,6 +1372,166 @@
> >       http://docs.openvswitch.org/en/latest/topics/high-availability.
> >     </p>
> >
> > +  <h2>Tenant VLAN networks connected to a Logical Router</h2>
> > +
> > +  <p>
> > +    It is possible to have multiple logical switches each with a localnet port
> > +    (representing physical networks) connected to a logical router in which one
> > +    may provide the external connectivity via a distributed gateway port and
> > +    the rest of them are used internally (with VLAN tagged). It is expected
> > +    that <code>ovn-bridge-mappings</code> is configured appropriately on the
> > +    chassis.
> > +  </p>
> > +
> > +  <h3>East West routing</h3>
> > +  <p>
> > +    East-West routing between these tenant VLAN logical switches works almost
> > +    the same way as normal logical switches. When the VM sends such a packet,
> > +    then:
> > +  </p>
> > +  <ol>
> > +    <li>
> > +      The packet enters the ingress pipeline of the logical router datapath
> > +      via the logical router port in the source chassis.
> > +    </li>
> > +
> > +    <li>
> > +      Routing decision is taken.
> > +    </li>
> > +
> > +    <li>
> > +      The packet goes out of the integration bridge to the provider bridge (
> > +      belonging to the destination logical switch) via the localnet port.
> > +    </li>
> > +
> > +    <li>
> > +      The destination chassis receives the packet via the localnet port
> > +      and delivers to the destination VM.
> > +    </li>
> > +  </ol>
> > +
> > +  <h3>External traffic</h3>
> > +
> > +  <p>
> > +    The following happens when a VM sends an external traffic (which requires
> > +    NATting) and the chassis hosting the VM doesn't have a distributed gateway
> > +    port.
> > +  </p>
> > +
> > +  <ol>
> > +    <li>
> > +      The packet enters the ingress pipeline of the logical router datapath
> > +      via the logical router port in the source chassis.
> > +    </li>
> > +
> > +    <li>
> > +      Routing decision is taken. Since the gateway router or the distributed
> > +      gateway port doesn't reside in the source chassis, the traffic is
> > +      redirected to the gateway chassis via the tunnel port.
> > +    </li>
> > +
> > +    <li>
> > +      The gateway chassis receives the packet, applies the NAT rules and
> > +      forwards it via the localnet port.
> > +    </li>
> > +  </ol>
> > +
> > +  <p>
> > +    Although this works, the VM traffic is tunnelled. In order for it to
> > +    work properly, the MTU of the VLAN tenant networks must be lowered to
> > +    account for the tunnel encapsulation.
> > +  </p>
> > +
> > +  <h2>Centralized routing for VLAN tenant networks</h2>
> > +
> > +  <p>
> > +    To overcome the tunnel encapsulation problem described in the previous
> > +    section, <code>OVN</code> supports the option of enabling centralized
> > +    routing for VLAN tenant networks. CMS can configure the option
> > +    <ref column="options:reside-on-redirect-chassis"
> > +    table="Logical_Router_Port" db="OVN_NB"/> to <code>true</code> for each
> > +    <ref table="Logical_Router_Port" db="OVN_NB"/> which connects to the
> > +    logical switch of the VLAN tenant network. This causes the gateway
> > +    chassis (hosting the distributed gateway port) to handle all the
> > +    routing for these networks, making it centralized. It will reply to
> > +    the ARP requests for the logical router port IPs.
> > +  </p>
> > +
> > +  <p>
> > +    If the logical router doesn't have a distributed gateway port connecting
> > +    to the provider network, then this option is ignored by <code>OVN</code>.
> > +  </p>
> > +
> > +  <p>
> > +    The following happens when a VM sends an east-west traffic which needs to
> > +    be routed:
> > +  </p>
> > +
> > +  <ol>
> > +    <li>
> > +      The packet from the VM enters the logical datapath pipeline of the source
> > +      VLAN network in the source chassis and is sent out via the localnet port
> > +      (instead of sending it to router pipeline).
> > +    </li>
> > +
> > +    <li>
> > +      The packet enters the logical datapath pipeline of the source VLAN
> > +      network in the gateway chassis and is sent to the logical datapath
> > +      pipeline belonging to the logical router.
> > +    </li>
> > +
> > +    <li>
> > +      Routing decision is taken.
> > +    </li>
> > +
> > +    <li>
> > +      The packet enters the logical datapath pipeline of the destination
> > +      VLAN network. The packet is delivered to the destination VM if it resides
> > +      in the same chassis. Otherwise the packet is sent out via the localnet
> > +      port of the destination VLAN network.
> > +    </li>
> > +
> > +    <li>
> > +      The destination chassis receives the packet via the localnet port
> > +      and delivers to the destination VM.
> > +    </li>
> > +  </ol>
> > +
> > +  <p>
> > +    The following happens when a VM sends an external traffic which requires
> > +    NATting:
> > +  </p>
> > +
> > +  <ol>
> > +    <li>
> > +      The packet from the VM enters the logical datapath pipeline of the source
> > +      VLAN network in the source chassis and is sent out via the localnet port
> > +      (instead of sending it to router pipeline).
> > +    </li>
> > +
> > +    <li>
> > +      The packet enters the logical datapath pipeline of the source VLAN
> > +      network in the gateway chassis and is sent to the logical datapath
> > +      pipeline belonging to the logical router.
> > +    </li>
> > +
> > +    <li>
> > +      Routing decision is taken and NAT rules are applied.
> > +    </li>
> > +
> > +    <li>
> > +      The packet enters the logical datapath pipeline of the provider network
> > +      and is sent out via the localnet port of the provider network.
> > +    </li>
> > +  </ol>
> > +
> > +  <p>
> > +    For the reverse external traffic, the gateway chassis applies the unNATting
> > +    rules and sends the packet via the localnet port of the VLAN tenant
> > +    network and the destination chassis receives the packet and delivers to
> > +    the VM.
> > +  </p>
> > +
> >     <h2>Life Cycle of a VTEP gateway</h2>
> >
> >     <p>
> > diff --git a/ovn/ovn-nb.xml b/ovn/ovn-nb.xml
> > index 8564ed39c..13ae56e13 100644
> > --- a/ovn/ovn-nb.xml
> > +++ b/ovn/ovn-nb.xml
> > @@ -1635,6 +1635,49 @@
> >             chassis to enable high availability.
> >           </p>
> >         </column>
> > +
> > +      <column name="options" key="reside-on-redirect-chassis">
> > +        <p>
> > +          Generally routing is distributed in <code>OVN</code>. The packet
> > +          from a logical port which needs to be routed hits the router pipeline
> > +          in the source chassis. For the East-West traffic, the packet is
> > +          sent directly to the destination chassis. For the outside traffic
> > +          the packet is sent to the gateway chassis.
> > +        </p>
> > +
> > +        <p>
> > +          When this option is set, <code>OVN</code> considers this only if
> > +        </p>
> > +
> > +        <ul>
> > +          <li>
> > +            The logical router to which this logical router port belongs to
> > +            has a distributed gateway port.
> > +          </li>
> > +
> > +          <li>
> > +            The peer's logical switch has a localnet port (representing
> > +            a tenant VLAN network)
> > +          </li>
> > +        </ul>
> > +
> > +        <p>
> > +          When this option is set to <code>true</code>, then the packet
> > +          which needs to be routed hits the router pipeline in the chassis
> > +          hosting the distributed gateway router port. The source chassis
> > +          pushes out this traffic via the localnet port. With this the
> > +          East-West traffic is no more distributed and will always go through
> > +          the gateway chassis.
> > +        </p>
> > +
> > +        <p>
> > +          Without this option set, for any traffic destined to outside from a
> > +          logical port which belongs to a logical switch with localnet port,
> > +          the source chassis will send the traffic to the gateway chassis via
> > +          the tunnel port instead of the localnet port and this could cause MTU
> > +          issues.
> > +        </p>
> > +      </column>
> >       </group>
> >
> >       <group title="Attachment">
> > diff --git a/tests/ovn.at b/tests/ovn.at
> > index 769e09f81..504ba228d 100644
> > --- a/tests/ovn.at
> > +++ b/tests/ovn.at
> > @@ -8537,6 +8537,279 @@ OVN_CLEANUP([hv1],[hv2],[hv3])
> >
> >   AT_CLEANUP
> >
> > +# VLAN traffic for external network redirected through distributed router
> > +# gateway port should use vlans(i.e input network vlan tag) across hypervisors
> > +# instead of tunneling.
> > +AT_SETUP([ovn -- vlan traffic for external network with distributed router gateway port])
> > +AT_SKIP_IF([test $HAVE_PYTHON = no])
> > +ovn_start
> > +
> > +# Logical network:
> > +# # One LR R1 that has switches foo (192.168.1.0/24) and
> > +# # alice (172.16.1.0/24) connected to it.  The logical port
> > +# # between R1 and alice has a "redirect-chassis" specified,
> > +# # i.e. it is the distributed router gateway port(172.16.1.6).
> > +# # Switch alice also has a localnet port defined.
> > +# # An additional switch outside has the same subnet as alice
> > +# # (172.16.1.0/24), a localnet port and nexthop port(172.16.1.1)
> > +# # which will receive the packet destined for external network
> > +# # (i.e 8.8.8.8 as destination ip).
> > +
> > +# Physical network:
> > +# # Three hypervisors hv[123].
> > +# # hv1 hosts vif foo1.
> > +# # hv2 is the "redirect-chassis" that hosts the distributed router gateway port.
> > +# # hv3 hosts nexthop port vif outside1.
> > +# # All other tests connect hypervisors to network n1 through br-phys for tunneling.
> > +# # But in this test, hv1 won't connect to n1(and no br-phys in hv1), and
> > +# # in order to show vlans(instead of tunneling) used between hv1 and hv2,
> > +# # a new network n2 created and hv1 and hv2 connected to this network through br-ex.
> > +# # hv2 and hv3 are still connected to n1 network through br-phys.
> > +net_add n1
> > +
> > +# We are not calling ovn_attach for hv1, to avoid adding br-phys.
> > +# Tunneling won't work in hv1 as ovn-encap-ip is not added to any bridge in hv1
> > +sim_add hv1
> > +as hv1
> > +ovs-vsctl \
> > +    -- set Open_vSwitch . external-ids:system-id=hv1 \
> > +    -- set Open_vSwitch . external-ids:ovn-remote=unix:$ovs_base/ovn-sb/ovn-sb.sock \
> > +    -- set Open_vSwitch . external-ids:ovn-encap-type=geneve,vxlan \
> > +    -- set Open_vSwitch . external-ids:ovn-encap-ip=192.168.0.1 \
> > +    -- add-br br-int \
> > +    -- set bridge br-int fail-mode=secure other-config:disable-in-band=true \
> > +    -- set Open_vSwitch . external-ids:ovn-bridge-mappings=public:br-ex
> > +
> > +start_daemon ovn-controller
> > +ovs-vsctl -- add-port br-int hv1-vif1 -- \
> > +    set interface hv1-vif1 external-ids:iface-id=foo1 \
> > +    ofport-request=1
> > +
> > +sim_add hv2
> > +as hv2
> > +ovs-vsctl add-br br-phys
> > +ovn_attach n1 br-phys 192.168.0.2
> > +ovs-vsctl set Open_vSwitch . external-ids:ovn-bridge-mappings="public:br-ex,phys:br-phys"
> > +
> > +sim_add hv3
> > +as hv3
> > +ovs-vsctl add-br br-phys
> > +ovn_attach n1 br-phys 192.168.0.3
> > +ovs-vsctl -- add-port br-int hv3-vif1 -- \
> > +    set interface hv3-vif1 external-ids:iface-id=outside1 \
> > +    options:tx_pcap=hv3/vif1-tx.pcap \
> > +    options:rxq_pcap=hv3/vif1-rx.pcap \
> > +    ofport-request=1
> > +ovs-vsctl set Open_vSwitch . external-ids:ovn-bridge-mappings="phys:br-phys"
> > +
> > +# Create network n2 for vlan connectivity between hv1 and hv2 
> > +net_add n2
> > +
> > +as hv1
> > +ovs-vsctl add-br br-ex
> > +net_attach n2 br-ex
> > +
> > +as hv2
> > +ovs-vsctl add-br br-ex
> > +net_attach n2 br-ex
> > +
> > +OVN_POPULATE_ARP
> > +
> > +ovn-nbctl create Logical_Router name=R1
> > +
> > +ovn-nbctl ls-add foo
> > +ovn-nbctl ls-add alice
> > +ovn-nbctl ls-add outside
> > +
> > +# Connect foo to R1
> > +ovn-nbctl lrp-add R1 foo 00:00:01:01:02:03 192.168.1.1/24
> > +ovn-nbctl lsp-add foo rp-foo -- set Logical_Switch_Port rp-foo \
> > +    type=router options:router-port=foo \
> > +    -- lsp-set-addresses rp-foo router
> > +
> > +# Connect alice to R1 as distributed router gateway port (172.16.1.6) on hv2
> > +ovn-nbctl lrp-add R1 alice 00:00:02:01:02:03 172.16.1.6/24 \
> > +    -- set Logical_Router_Port alice options:redirect-chassis="hv2"
> > +ovn-nbctl lsp-add alice rp-alice -- set Logical_Switch_Port rp-alice \
> > +    type=router options:router-port=alice \
> > +    -- lsp-set-addresses rp-alice router \
> > +
> > +# Create logical port foo1 in foo
> > +ovn-nbctl lsp-add foo foo1 \
> > +-- lsp-set-addresses foo1 "f0:00:00:01:02:03 192.168.1.2"
> > +
> > +# Create logical port outside1 in outside, which is a nexthop address
> > +# for 172.16.1.0/24
> > +ovn-nbctl lsp-add outside outside1 \
> > +-- lsp-set-addresses outside1 "f0:00:00:01:02:04 172.16.1.1"
> > +
> > +# Set default gateway (nexthop) to 172.16.1.1
> > +ovn-nbctl lr-route-add R1 "0.0.0.0/0" 172.16.1.1 alice
> > +AT_CHECK([ovn-nbctl lr-nat-add R1 snat 172.16.1.6 192.168.1.1/24])
> > +ovn-nbctl set Logical_Switch_Port rp-alice options:nat-addresses=router
> > +
> > +ovn-nbctl lsp-add foo ln-foo
> > +ovn-nbctl lsp-set-addresses ln-foo unknown
> > +ovn-nbctl lsp-set-options ln-foo network_name=public
> > +ovn-nbctl lsp-set-type ln-foo localnet
> > +AT_CHECK([ovn-nbctl set Logical_Switch_Port ln-foo tag=2])
> > +
> > +# Create localnet port in alice
> > +ovn-nbctl lsp-add alice ln-alice
> > +ovn-nbctl lsp-set-addresses ln-alice unknown
> > +ovn-nbctl lsp-set-type ln-alice localnet
> > +ovn-nbctl lsp-set-options ln-alice network_name=phys
> > +
> > +# Create localnet port in outside
> > +ovn-nbctl lsp-add outside ln-outside
> > +ovn-nbctl lsp-set-addresses ln-outside unknown
> > +ovn-nbctl lsp-set-type ln-outside localnet
> > +ovn-nbctl lsp-set-options ln-outside network_name=phys
> > +
> > +# Allow some time for ovn-northd and ovn-controller to catch up.
> > +# XXX This should be more systematic.
> > +ovn-nbctl --wait=hv --timeout=3 sync
> > +
> > +# Check that there is a logical flow in logical switch foo's pipeline
> > +# to set the outport to rp-foo (which is expected).
> > +OVS_WAIT_UNTIL([test 1 = `ovn-sbctl dump-flows foo | grep ls_in_l2_lkup | \
> > +grep rp-foo | grep -v is_chassis_resident | wc -l`])
> > +
> > +# Set the option 'reside-on-redirect-chassis' for foo
> > +ovn-nbctl set logical_router_port foo options:reside-on-redirect-chassis=true
> > +# Check that there is a logical flow in logical switch foo's pipeline
> > +# to set the outport to rp-foo with the condition is_chassis_resident.
> > +ovn-sbctl dump-flows foo
> > +OVS_WAIT_UNTIL([test 1 = `ovn-sbctl dump-flows foo | grep ls_in_l2_lkup | \
> > +grep rp-foo | grep is_chassis_resident | wc -l`])
> > +
> > +echo "---------NB dump-----"
> > +ovn-nbctl show
> > +echo "---------------------"
> > +ovn-nbctl list logical_router
> > +echo "---------------------"
> > +ovn-nbctl list nat
> > +echo "---------------------"
> > +ovn-nbctl list logical_router_port
> > +echo "---------------------"
> > +
> > +echo "---------SB dump-----"
> > +ovn-sbctl list datapath_binding
> > +echo "---------------------"
> > +ovn-sbctl list port_binding
> > +echo "---------------------"
> > +ovn-sbctl dump-flows
> > +echo "---------------------"
> > +ovn-sbctl list chassis
> > +echo "---------------------"
> > +
> > +for chassis in hv1 hv2 hv3; do
> > +    as $chassis
> > +    echo "------ $chassis dump ----------"
> > +    ovs-vsctl show br-int
> > +    ovs-ofctl show br-int
> > +    ovs-ofctl dump-flows br-int
> > +    echo "--------------------------"
> > +done
> > +
> > +ip_to_hex() {
> > +    printf "%02x%02x%02x%02x" "$@"
> > +}
> > +
> > +foo1_ip=$(ip_to_hex 192 168 1 2)
> > +gw_ip=$(ip_to_hex 172 16 1 6)
> > +dst_ip=$(ip_to_hex 8 8 8 8)
> > +nexthop_ip=$(ip_to_hex 172 16 1 1)
> > +
> > +foo1_mac="f00000010203"
> > +foo_mac="000001010203"
> > +gw_mac="000002010203"
> > +nexthop_mac="f00000010204"
> > +
> > +# Send ip packet from foo1 to 8.8.8.8
> > +src_mac="f00000010203"
> > +dst_mac="000001010203"
> > +packet=${foo_mac}${foo1_mac}08004500001c0000000040110000${foo1_ip}${dst_ip}0035111100080000
> > +
> > +as hv1 ovs-appctl netdev-dummy/receive hv1-vif1 $packet
> > +sleep 2
> > +
> > +# ARP request packet for nexthop_ip to expect at outside1
> > +arp_request=ffffffffffff${gw_mac}08060001080006040001${gw_mac}${gw_ip}000000000000${nexthop_ip}
> > +echo $arp_request >> hv3-vif1.expected
> > +cat hv3-vif1.expected > expout
> > +$PYTHON "$top_srcdir/utilities/ovs-pcap.in" hv3/vif1-tx.pcap | grep ${nexthop_ip} | uniq > hv3-vif1
> > +AT_CHECK([sort hv3-vif1], [0], [expout])
> > +
> > +# Send ARP reply from outside1 back to the router
> > +reply_mac="f00000010204"
> > +arp_reply=${gw_mac}${nexthop_mac}08060001080006040002${nexthop_mac}${nexthop_ip}${gw_mac}${gw_ip}
> > +
> > +as hv3 ovs-appctl netdev-dummy/receive hv3-vif1 $arp_reply
> > +OVS_WAIT_UNTIL([
> > +    test `as hv2 ovs-ofctl dump-flows br-int | grep table=66 | \
> > +grep actions=mod_dl_dst:f0:00:00:01:02:04 | wc -l` -eq 1
> > +    ])
> > +
> > +# VLAN tagged packet with router port(192.168.1.1) MAC as destination MAC
> > +# is expected on bridge connecting hv1 and hv2
> > +expected=${foo_mac}${foo1_mac}8100000208004500001c0000000040110000${foo1_ip}${dst_ip}0035111100080000
> > +echo $expected > hv1-br-ex_n2.expected
> > +
> > +# Packet to Expect at outside1 i.e nexthop(172.16.1.1) port.
> > +# As connection tracking not enabled for this test, snat can't be done on the packet.
> > +# We still see foo1 as the source ip address. But source mac(gateway MAC) and
> > +# dest mac(nexthop mac) are properly configured.
> > +expected=${nexthop_mac}${gw_mac}08004500001c000000003f110100${foo1_ip}${dst_ip}0035111100080000
> > +echo $expected > hv3-vif1.expected
> > +
> > +reset_pcap_file() {
> > +    local iface=$1
> > +    local pcap_file=$2
> > +    ovs-vsctl -- set Interface $iface options:tx_pcap=dummy-tx.pcap \
> > +options:rxq_pcap=dummy-rx.pcap
> > +    rm -f ${pcap_file}*.pcap
> > +    ovs-vsctl -- set Interface $iface options:tx_pcap=${pcap_file}-tx.pcap \
> > +options:rxq_pcap=${pcap_file}-rx.pcap
> > +}
> > +
> > +as hv1 reset_pcap_file br-ex_n2 hv1/br-ex_n2
> > +as hv3 reset_pcap_file hv3-vif1 hv3/vif1
> > +sleep 2
> > +as hv1 ovs-appctl netdev-dummy/receive hv1-vif1 $packet
> > +sleep 2
> > +
> > +# On hv1, the packet should not go from vlan switch pipeline to router
> > +# pipeline
> > +as hv1 ovs-ofctl dump-flows br-int
> > +
> > +AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=65 | grep "priority=100,reg15=0x1,metadata=0x2" \
> > +| grep actions=clone | grep -v n_packets=0 | wc -l], [0], [[0
> > +]])
> > +
> > +# On hv1, table 32 check that no packet goes via the tunnel port
> > +AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=32 \
> > +| grep "NXM_NX_TUN_ID" | grep -v n_packets=0 | wc -l], [0], [[0
> > +]])
> > +
> > +ip_packet() {
> > +    grep "1010203f00000010203"
> > +}
> > +
> > +# Check vlan tagged packet on the bridge connecting hv1 and hv2 with the
> > +# foo1's mac.
> > +$PYTHON "$top_srcdir/utilities/ovs-pcap.in" hv1/br-ex_n2-tx.pcap | ip_packet | uniq > hv1-br-ex_n2
> > +cat hv1-br-ex_n2.expected > expout
> > +AT_CHECK([sort hv1-br-ex_n2], [0], [expout])
> > +
> > +# Check expected packet on nexthop interface
> > +$PYTHON "$top_srcdir/utilities/ovs-pcap.in" hv3/vif1-tx.pcap | grep ${foo1_ip}${dst_ip} | uniq > hv3-vif1
> > +cat hv3-vif1.expected > expout
> > +AT_CHECK([sort hv3-vif1], [0], [expout])
> > +
> > +OVN_CLEANUP([hv1],[hv2],[hv3])
> > +AT_CLEANUP
> > +
> >   AT_SETUP([ovn -- IPv6 ND Router Solicitation responder])
> >   AT_KEYWORDS([ovn-nd_ra])
> >   AT_SKIP_IF([test $HAVE_PYTHON = no])
> >
>
>
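
For readers following the quoted test, the hex frames it sends and expects can be reproduced outside the testsuite. The sketch below reuses ip_to_hex and the MAC/IP values from the test; add_vlan_tag is a hypothetical helper (not part of the patch) that inserts an 802.1Q tag with TPID 0x8100 and VLAN ID 2, matching tag=2 on ln-foo, and it assumes priority bits of zero.

```shell
#!/bin/sh
# Convert four decimal octets into 8 hex digits (same helper as in the test).
ip_to_hex() {
    printf "%02x%02x%02x%02x" "$@"
}

# Hypothetical helper: insert an 802.1Q tag after the destination and
# source MAC addresses.  $1 = frame as a hex string, $2 = VLAN ID.
add_vlan_tag() {
    macs=$(printf '%s' "$1" | cut -c1-24)   # dst MAC + src MAC (12 bytes)
    rest=$(printf '%s' "$1" | cut -c25-)    # EtherType and payload
    printf '%s8100%04x%s\n' "$macs" "$2" "$rest"
}

foo1_ip=$(ip_to_hex 192 168 1 2)
dst_ip=$(ip_to_hex 8 8 8 8)
foo_mac="000001010203"     # router port MAC (destination)
foo1_mac="f00000010203"    # VM MAC (source)

# Untagged frame injected on hv1-vif1, then the tagged frame expected on br-ex.
packet=${foo_mac}${foo1_mac}08004500001c0000000040110000${foo1_ip}${dst_ip}0035111100080000
tagged=$(add_vlan_tag "$packet" 2)
echo "$tagged"
```

The tagged frame is the same string the test writes to hv1-br-ex_n2.expected, which is what shows the packet leaving hv1 on the VLAN rather than through a tunnel.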
Numan Siddique Nov. 3, 2018, 12:25 p.m. UTC | #4
Hi Ankur,
Thanks for the review and for the comments. Please see below.

Thanks
Numan


On Fri, Nov 2, 2018 at 6:21 AM Ankur Sharma <ankur.sharma@nutanix.com>
wrote:

> Hi Numan, Mark,
>
> Thanks for the patch.
> Description explains the problem statement and solution really well.
>
> I have following comments:
>
> a. Regarding the solution:
>    i. I think referring to a non connected port in a logical datapath
> pipeline is probably not the right way.
>       i.e as per the description, in ls0 pipeline we are referring to
> lr0-public.
>       lr0-public is not a peer interface of ls0-lr0 patch port pair, hence
> ls0 pipeline should be totally agnostic to it (just my 2 cents).
>
>       table=16(ls_in_l2_lkup), priority=50, match=(eth.dst ==
> 00:00:00:00:af:12 && is_chassis_resident("cr-lr0-public")),
> action=(outport = "sw0-lr0"; output;)
>

If we want the packet to enter the router pipeline on the chassis which is
hosting the gateway port, then I couldn't find any other way.
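
As a rough illustration only (this is a shell model, not the ovn-northd code), the condition under which the is_chassis_resident() check is appended to the ls_in_l2_lkup match can be sketched as below. The function name l2_lkup_match and its argument layout are assumptions made for this example; the conditions mirror the build_lswitch_flows() hunk in the patch.

```shell
#!/bin/sh
# Hypothetical model of the ls_in_l2_lkup match for a router-type switch port.
#   $1  router port MAC
#   $2  chassis-redirect port name (e.g. cr-lr0-public); empty if the router
#       has no distributed gateway port
#   $3  "yes" if the logical switch has a localnet port
#   $4  "yes" if the peer router port is itself the distributed gateway port
#   $5  value of options:reside-on-redirect-chassis ("true" or "false")
l2_lkup_match() {
    match="eth.dst == $1"
    if [ -n "$2" ] && [ "$3" = yes ]; then
        if [ "$4" = yes ] || [ "$5" = true ]; then
            # Only the chassis hosting the gateway port accepts this MAC.
            match="$match && is_chassis_resident(\"$2\")"
        fi
    fi
    printf '%s\n' "$match"
}

# Option set on lr0-sw0: the flow is restricted to the gateway chassis.
l2_lkup_match 00:00:00:00:af:12 cr-lr0-public yes no true
# Option not set: the plain match is programmed on every chassis.
l2_lkup_match 00:00:00:00:af:12 cr-lr0-public yes no false
```

On chassis where the check evaluates to false the flow does not match, so the packet falls through to the localnet port instead of entering the router pipeline locally.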


>
> b. Nit: Patch 2 in this series does not look related to patch 1 (unless I
> am missing something),
>             if correct then we should have a separate patch for it.
>

Agreed. But since both patches are related to VLAN, I thought I would
club them together.

>
> c. Nit: Any new config that we are adding, we should add corresponding
> ovn-nbctl CLI.
>            This way non openstack deployments can also play with the
> feature 😊
>

We could add a new command in ovn-nbctl - "ovn-nbctl lrp-set-options".
Generally I tend to use the generic DB commands -
set/add/remove/clear/destroy. In this case we can use
"ovn-nbctl set logical_router_port <LRP_NAME>
options:reside-on-redirect-chassis=true".
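
A minimal sketch of how that could be wrapped with the generic DB commands, until a dedicated command exists. The function names (run, lrp_centralize, lrp_decentralize) and the DRY_RUN switch are made up for this example; only "ovn-nbctl set" and "ovn-nbctl remove" are real commands. With DRY_RUN=1 the commands are printed instead of executed, so the snippet can be tried without a running northbound database.

```shell
#!/bin/sh
# run: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical wrapper: centralize routing for the router port named $1.
lrp_centralize() {
    run ovn-nbctl set logical_router_port "$1" \
        options:reside-on-redirect-chassis=true
}

# Hypothetical wrapper: drop the option again (back to distributed routing).
lrp_decentralize() {
    run ovn-nbctl remove logical_router_port "$1" \
        options reside-on-redirect-chassis
}

DRY_RUN=1
lrp_centralize foo
lrp_decentralize foo
```

After setting the option for real, "ovn-sbctl dump-flows foo | grep ls_in_l2_lkup | grep is_chassis_resident" (as used in the test) is one way to confirm that the flow changed.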

>
> ===================================================
>
> Regarding the solution:
> ------------------------------
>
> As I understand, we are trying to achieve following here:
> a. For a distributed virtual router, we want some router interface to be
> fully centralized, rather than distributed.
>     i.e instead of creating 2 versions of a router port (lrp-* and
> cr-lrp*), one distributed and one centralized,
>     We want the lrp-*  itself to be centralized.
>     i.e in your case, from vlan backed logical switch, you want to enter
> router pipeline only on the gateway chassis and NOT on the source chassis.
>
> b. Just wanted to propose an alternate approach here, which we implemented
> for a slightly different use case.
>      Looks like this alternate approach would help in your scenario as
> well.
>
> Alternate Approach:
> --------------------------
> a. Convert the port pair between logical switch and logical router to be
> of type "l3gateway", rather than "patch".
> b. i.e in your configuration, lr0-sw0 and sw0-lr0 should be implemented as
> type l3gateway rather than patch.
> c. In other words, we are simulating a centralized router (for specific
> peer logical switches) in a distributed router.
> We will be discussing this approach in OVS Conf as well:
>

I actually thought about this approach. I remember Goushai Li had
submitted a patch to support multiple gateway ports:
https://patchwork.ozlabs.org/patch/884351/. Unfortunately I didn't review
it when it was submitted (although I had promised to look into it); I
somehow missed it. When I looked at it again, I thought that maybe it's
not the right approach, since the VLAN tenant networks are internal tenant
networks that are not externally reachable, and it doesn't seem
semantically right to set the type to "l3gateway" when these VLAN networks
don't provide external connectivity. If multiple VLAN networks are added
to a logical router with one providing external connectivity, I am not sure
how NATting would be handled there.




> https://ovsfall2018.sched.com/event/IO9w/connectivity-for-external-networks-on-the-overlay?iframe=no&w=100%&sidebar=yes&bg=no
>
> ===================================================
>
> Please feel free to point, if I missed something here.
> Please feel free to comment on the proposed alternative.
>
> Thanks
>
> Regards,
> Ankur
>
>
> -----Original Message-----
> From: ovs-dev-bounces@openvswitch.org <ovs-dev-bounces@openvswitch.org>
> On Behalf Of Numan Siddique
> Sent: Monday, October 15, 2018 2:38 AM
> To: Mark Michelson <mmichels@redhat.com>
> Cc: ovs dev <dev@openvswitch.org>
> Subject: Re: [ovs-dev] [PATCH 1/2] ovn: Avoid tunneling for VLAN packets
> redirected to a gateway chassis
>
> On Sat, Oct 13, 2018 at 1:26 AM Mark Michelson <mmichels@redhat.com>
> wrote:
>
> > Hi Numan,
> >
> > The patch does a good job of explaining the routing behavior and the
> > tunneling problem solved within.
> >
> > Prior to the patch, you can have a distributed gateway router with a
> > redirect-chassis port set on it. This allows for east-west traffic to
> > have an optimal direct path between hypervisors, but for the
> > north-south use case, when the traffic is redirected to the
> > redirect-chassis, the traffic is encapsulated.
> >
> > With this patch, you add the reside-on-redirect-chassis option to
> > router ports. This essentially makes all traffic destined for the
> > router port get redirected to the gateway chassis prior to running the
> > router pipeline. This removes the encapsulation issue, but it also
> > means that east-west traffic is now also centralized.
> >
>
> That's right. That's the trade off to solve this issue for VLAN tenant
> networks.
>
>
> >
> > I'm curious what the current behavior is when you specify a gateway
> > router by setting options:chassis. Specifically, I'm curious about how
> > it compares if you define a router where the "external" port has
> > options:redirect-chassis set on it and all other ports have
> > options:reside-on-redirect-chassis set on them. Have you essentially
> > just created the same thing? Or is there some subtle difference?
> >
>
> There is a difference. In the case of gateway router (options:chassis set)
> scenario, it is expected that the tenant VLAN logical switches will be
> connected to a normal router and this normal router will be connected to
> the gateway router via a transit switch.
> So the east west traffic will be distributed, but for the North/South
> traffic, the packet on the source chassis enters logical switch pipeline ->
> normal router pipeline -> transit switch pipeline. And then the packet is
> sent to the chassis hosting the gateway router via the tunnel port. On the
> gateway chassis, packet enters the transit switch pipeline ->  gateway
> router pipeline -> provider network pipeline.
>
> Thanks
> Numan
>
>
>
>
>
>
> > On 10/05/2018 01:14 PM, nusiddiq@redhat.com wrote:
> > > From: Numan Siddique <nusiddiq@redhat.com>
> > >
> > > An OVN deployment can have multiple logical switches each with a
> > > localnet port connected to a distributed logical router with one
> > > logical router port providing external connectivity (provider
> > > network) and others used as tenant networks with VLAN tagging.
> > >
> > > As reported in [1], external traffic from these VLAN tenant networks
> > > is tunnelled to the gateway chassis (chassis hosting a distributed
> > > gateway port which applies NAT rules). As part of the discussion in
> > > [1], a few possible solutions were proposed by Russell [2]. This
> > > patch implements the first option in [2].
> > >
> > > With this patch, a new option 'reside-on-redirect-chassis' in 'options'
> > > column of Logical_Router_Port table is added. If the value of this
> > > option is set to 'true' and if the logical router also has a
> > > distributed gateway port, then routing for this logical router port
> > > is centralized in the chassis hosting the distributed gateway port.
> > >
> > > If a logical switch 'sw0' is connected to a router 'lr0' with the
> > > router port - 'lr0-sw0' with the address - "00:00:00:00:af:12 192.168.1.1",
> > > and it has a distributed logical port - 'lr0-public', then the
> > > below logical flow is added in the logical switch pipeline of 'sw0'
> > > if the 'reside-on-redirect-chassis' option is set on 'lr0-sw0' -
> > >
> > > table=16(ls_in_l2_lkup), priority=50,
> > > match=(eth.dst == 00:00:00:00:af:12 && is_chassis_resident("cr-lr0-public")),
> > > action=(outport = "sw0-lr0"; output;)
> > >
> > > With the above flow, the packet doesn't enter the router pipeline in
> > > the source chassis. Instead the packet is sent out via the localnet
> > > port of 'sw0'. The gateway chassis upon receiving this packet, runs
> > > the logical router pipeline applying NAT rules and sends the traffic
> > > out via the localnet port of the provider network. The gateway
> > > chassis will also reply to the ARP requests for the router port IPs.
> > >
> > > With this approach, we avoid redirecting the external traffic to the
> > > gateway chassis via the tunnel port. There are a couple of drawbacks
> > > with this approach:
> > >
> > >    - East-West routing is no longer distributed for the VLAN tenant
> > >      networks if the 'reside-on-redirect-chassis' option is defined
> > >
> > >    - 'dnat_and_snat' NAT rules with 'logical_mac' and 'logical_port'
> > >      columns defined will not work for the VLAN tenant networks.
> > >
> > > This approach is taken for now as it is simple. If there is a
> > > requirement to support distributed routing for these VLAN tenant
> > > networks, we can explore other possible solutions.
> > >
> > > [1] - https://mail.openvswitch.org/pipermail/ovs-discuss/2018-April/046543.html
> > > [2] - https://mail.openvswitch.org/pipermail/ovs-discuss/2018-April/046557.html
> > >
> > > Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2018-April/046543.html
> > > Reported-by: venkata anil <vkommadi@redhat.com>
> > > Co-authored-by: venkata anil <vkommadi@redhat.com>
> > > Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
> > > Signed-off-by: venkata anil <vkommadi@redhat.com>
> > > ---
> > >   ovn/northd/ovn-northd.8.xml |  30 ++++
> > >   ovn/northd/ovn-northd.c     |  71 +++++++---
> > >   ovn/ovn-architecture.7.xml  | 160 +++++++++++++++++++++
> > >   ovn/ovn-nb.xml              |  43 ++++++
> > >   tests/ovn.at                | 273 ++++++++++++++++++++++++++++++++++++
> > >   5 files changed, 561 insertions(+), 16 deletions(-)
> > >
> > > diff --git a/ovn/northd/ovn-northd.8.xml b/ovn/northd/ovn-northd.8.xml
> > > index 7352c6764..f52699bd3 100644
> > > --- a/ovn/northd/ovn-northd.8.xml
> > > +++ b/ovn/northd/ovn-northd.8.xml
> > > @@ -874,6 +874,25 @@ output;
> > >               resident.
> > >             </li>
> > >           </ul>
> > > +
> > > +        <p>
> > > +          For the Ethernet address on a logical switch port of type
> > > +          <code>router</code>, when that logical switch port's
> > > +          <ref column="addresses" table="Logical_Switch_Port"
> > > +          db="OVN_Northbound"/> column is set to <code>router</code> and
> > > +          the connected logical router port specifies a
> > > +          <code>reside-on-redirect-chassis</code> and the logical router
> > > +          to which the connected logical router port belongs to has a
> > > +          <code>redirect-chassis</code> distributed gateway logical router
> > > +          port:
> > > +        </p>
> > > +
> > > +        <ul>
> > > +          <li>
> > > +            The flow for the connected logical router port's Ethernet
> > > +            address is only programmed on the <code>redirect-chassis</code>.
> > > +          </li>
> > > +        </ul>
> > >         </li>
> > >
> > >         <li>
> > > @@ -1179,6 +1198,17 @@ output;
> > >             upstream MAC learning to point to the
> > >             <code>redirect-chassis</code>.
> > >           </p>
> > > +
> > > +        <p>
> > > +          For the logical router port with the option
> > > +          <code>reside-on-redirect-chassis</code> set (which is centralized),
> > > +          the above flows are only programmed on the gateway port instance on
> > > +          the <code>redirect-chassis</code> (if the logical router has a
> > > +          distributed gateway port). This behavior avoids generation
> > > +          of multiple ARP responses from different chassis, and allows
> > > +          upstream MAC learning to point to the
> > > +          <code>redirect-chassis</code>.
> > > +        </p>
> > >         </li>
> > >
> > >         <li>
> > > diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
> > > index 31ea5f410..3998a898c 100644
> > > --- a/ovn/northd/ovn-northd.c
> > > +++ b/ovn/northd/ovn-northd.c
> > > @@ -4426,13 +4426,32 @@ build_lswitch_flows(struct hmap *datapaths, struct hmap *ports,
> > >                   ds_put_format(&match, "eth.dst == "ETH_ADDR_FMT,
> > >                                 ETH_ADDR_ARGS(mac));
> > >                   if (op->peer->od->l3dgw_port
> > > -                    && op->peer == op->peer->od->l3dgw_port
> > > -                    && op->peer->od->l3redirect_port) {
> > > -                    /* The destination lookup flow for the router's
> > > -                     * distributed gateway port MAC address should only be
> > > -                     * programmed on the "redirect-chassis". */
> > > -                    ds_put_format(&match, " && is_chassis_resident(%s)",
> > > -                                  op->peer->od->l3redirect_port->json_key);
> > > +                    && op->peer->od->l3redirect_port
> > > +                    && op->od->localnet_port) {
> > > +                    bool add_chassis_resident_check = false;
> > > +                    if (op->peer == op->peer->od->l3dgw_port) {
> > > +                        /* The peer of this port represents a distributed
> > > +                         * gateway port. The destination lookup flow for the
> > > +                         * router's distributed gateway port MAC address should
> > > +                         * only be programmed on the "redirect-chassis". */
> > > +                        add_chassis_resident_check = true;
> > > +                    } else {
> > > +                        /* Check if the option 'reside-on-redirect-chassis'
> > > +                         * is set to true on the peer port. If set to true
> > > +                         * and if the logical switch has a localnet port, it
> > > +                         * means the router pipeline for the packets from
> > > +                         * this logical switch should be run on the chassis
> > > +                         * hosting the gateway port.
> > > +                         */
> > > +                        add_chassis_resident_check = smap_get_bool(
> > > +                            &op->peer->nbrp->options,
> > > +                            "reside-on-redirect-chassis", false);
> > > +                    }
> > > +
> > > +                    if (add_chassis_resident_check) {
> > > +                        ds_put_format(&match, " && is_chassis_resident(%s)",
> > > +                            op->peer->od->l3redirect_port->json_key);
> > > +                    }
> > >                   }
> > >
> > >                   ds_clear(&actions);
> > > @@ -5197,15 +5216,35 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
> > >                             op->lrp_networks.ipv4_addrs[i].network_s,
> > >                             op->lrp_networks.ipv4_addrs[i].plen,
> > >                             op->lrp_networks.ipv4_addrs[i].addr_s);
> > > -            if (op->od->l3dgw_port && op == op->od->l3dgw_port
> > > -                && op->od->l3redirect_port) {
> > > -                /* Traffic with eth.src = l3dgw_port->lrp_networks.ea_s
> > > -                 * should only be sent from the "redirect-chassis", so that
> > > -                 * upstream MAC learning points to the "redirect-chassis".
> > > -                 * Also need to avoid generation of multiple ARP responses
> > > -                 * from different chassis. */
> > > -                ds_put_format(&match, " && is_chassis_resident(%s)",
> > > -                              op->od->l3redirect_port->json_key);
> > > +
> > > +            if (op->od->l3dgw_port && op->od->l3redirect_port && op->peer
> > > +                && op->peer->od->localnet_port) {
> > > +                bool add_chassis_resident_check = false;
> > > +                if (op == op->od->l3dgw_port) {
> > > +                    /* Traffic with eth.src = l3dgw_port->lrp_networks.ea_s
> > > +                     * should only be sent from the "redirect-chassis", so that
> > > +                     * upstream MAC learning points to the "redirect-chassis".
> > > +                     * Also need to avoid generation of multiple ARP responses
> > > +                     * from different chassis. */
> > > +                    add_chassis_resident_check = true;
> > > +                } else {
> > > +                    /* Check if the option 'reside-on-redirect-chassis'
> > > +                     * is set to true on the router port. If set to true
> > > +                     * and if peer's logical switch has a localnet port, it
> > > +                     * means the router pipeline for the packets from
> > > +                     * peer's logical switch is to be run on the chassis
> > > +                     * hosting the gateway port and it should reply to the
> > > +                     * ARP requests for the router port IPs.
> > > +                     */
> > > +                    add_chassis_resident_check = smap_get_bool(
> > > +                        &op->nbrp->options,
> > > +                        "reside-on-redirect-chassis", false);
> > > +                }
> > > +
> > > +                if (add_chassis_resident_check) {
> > > +                    ds_put_format(&match, " && is_chassis_resident(%s)",
> > > +                                  op->od->l3redirect_port->json_key);
> > > +                }
> > >               }
> > >
> > >               ds_clear(&actions);
> > > diff --git a/ovn/ovn-architecture.7.xml b/ovn/ovn-architecture.7.xml
> > > index 6ed2cf132..998470c34 100644
> > > --- a/ovn/ovn-architecture.7.xml
> > > +++ b/ovn/ovn-architecture.7.xml
> > > @@ -1372,6 +1372,166 @@
> > >       http://docs.openvswitch.org/en/latest/topics/high-availability.
> > >     </p>
> > >
> > > +  <h2>Tenant VLAN networks connected to a Logical Router</h2>
> > > +
> > > +  <p>
> > > +    It is possible to have multiple logical switches each with a localnet port
> > > +    (representing physical networks) connected to a logical router in which one
> > > +    may provide the external connectivity via a distributed gateway port and
> > > +    the rest of them are used internally (with VLAN tagged). It is expected
> > > +    that <code>ovn-bridge-mappings</code> is configured appropriately on the
> > > +    chassis.
> > > +  </p>
> > > +
> > > +  <h3>East West routing</h3>
> > > +  <p>
> > > +    East-West routing between these tenant VLAN logical switches works almost
> > > +    the same way as normal logical switches. When the VM sends such a packet,
> > > +    then:
> > > +  </p>
> > > +  <ol>
> > > +    <li>
> > > +      The packet enters the ingress pipeline of the logical router datapath
> > > +      via the logical router port in the source chassis.
> > > +    </li>
> > > +
> > > +    <li>
> > > +      Routing decision is taken.
> > > +    </li>
> > > +
> > > +    <li>
> > > +      The packet goes out of the integration bridge to the provider bridge (
> > > +      belonging to the destination logical switch) via the localnet port.
> > > +    </li>
> > > +
> > > +    <li>
> > > +      The destination chassis receives the packet via the localnet port
> > > +      and delivers to the destination VM.
> > > +    </li>
> > > +  </ol>
> > > +
> > > +  <h3>External traffic</h3>
> > > +
> > > +  <p>
> > > +    The following happens when a VM sends an external traffic (which requires
> > > +    NATting) and the chassis hosting the VM doesn't have a distributed gateway
> > > +    port.
> > > +  </p>
> > > +
> > > +  <ol>
> > > +    <li>
> > > +      The packet enters the ingress pipeline of the logical router datapath
> > > +      via the logical router port in the source chassis.
> > > +    </li>
> > > +
> > > +    <li>
> > > +      Routing decision is taken. Since the gateway router or the distributed
> > > +      gateway port doesn't reside in the source chassis, the traffic is
> > > +      redirected to the gateway chassis via the tunnel port.
> > > +    </li>
> > > +
> > > +    <li>
> > > +      The gateway chassis receives the packet, applies the NAT rules and
> > > +      forwards it via the localnet port.
> > > +    </li>
> > > +  </ol>
> > > +
> > > +  <p>
> > > +    Although this works, the VM traffic is tunnelled. In order for it to
> > > +    work properly, the MTU of the VLAN tenant networks must be lowered to
> > > +    account for the tunnel encapsulation.
> > > +  </p>
> > > +
> > > +  <h2>Centralized routing for VLAN tenant networks</h2>
> > > +
> > > +  <p>
> > > +    To overcome the tunnel encapsulation problem described in the previous
> > > +    section, <code>OVN</code> supports the option of enabling centralized
> > > +    routing for VLAN tenant networks. CMS can configure the option
> > > +    <ref column="options:reside-on-redirect-chassis"
> > > +    table="Logical_Router_Port" db="OVN_NB"/> to <code>true</code> for each
> > > +    <ref table="Logical_Router_Port" db="OVN_NB"/> which connects to the
> > > +    logical switch of the VLAN tenant network. This causes the gateway
> > > +    chassis (hosting the distributed gateway port) to handle all the
> > > +    routing for these networks, making it centralized. It will reply to
> > > +    the ARP requests for the logical router port IPs.
> > > +  </p>
> > > +
> > > +  <p>
> > > +    If the logical router doesn't have a distributed gateway port connecting
> > > +    to the provider network, then this option is ignored by <code>OVN</code>.
> > > +  </p>
> > > +
> > > +  <p>
> > > +    The following happens when a VM sends an east-west traffic which needs to
> > > +    be routed:
> > > +  </p>
> > > +
> > > +  <ol>
> > > +    <li>
> > > +      The packet from the VM enters the logical datapath pipeline of the source
> > > +      VLAN network in the source chassis and is sent out via the localnet port
> > > +      (instead of sending it to router pipeline).
> > > +    </li>
> > > +
> > > +    <li>
> > > +      The packet enters the logical datapath pipeline of the source VLAN
> > > +      network in the gateway chassis and is sent to the logical datapath
> > > +      pipeline belonging to the logical router.
> > > +    </li>
> > > +
> > > +    <li>
> > > +      Routing decision is taken.
> > > +    </li>
> > > +
> > > +    <li>
> > > +      The packet enters the logical datapath pipeline of the destination
> > > +      VLAN network. The packet is delivered to the destination VM if it resides
> > > +      in the same chassis. Otherwise the packet is sent out via the localnet
> > > +      port of the destination VLAN network.
> > > +    </li>
> > > +
> > > +    <li>
> > > +      The destination chassis receives the packet via the localnet port
> > > +      and delivers to the destination VM.
> > > +    </li>
> > > +  </ol>
> > > +
> > > +  <p>
> > > +    The following happens when a VM sends an external traffic which requires
> > > +    NATting:
> > > +  </p>
> > > +
> > > +  <ol>
> > > +    <li>
> > > +      The packet from the VM enters the logical datapath pipeline of the source
> > > +      VLAN network in the source chassis and is sent out via the localnet port
> > > +      (instead of sending it to router pipeline).
> > > +    </li>
> > > +
> > > +    <li>
> > > +      The packet enters the logical datapath pipeline of the source VLAN
> > > +      network in the gateway chassis and is sent to the logical datapath
> > > +      pipeline belonging to the logical router.
> > > +    </li>
> > > +
> > > +    <li>
> > > +      Routing decision is taken and NAT rules are applied.
> > > +    </li>
> > > +
> > > +    <li>
> > > +      The packet enters the logical datapath pipeline of the provider network
> > > +      and is sent out via the localnet port of the provider network.
> > > +    </li>
> > > +  </ol>
> > > +
> > > +  <p>
> > > +    For the reverse external traffic, the gateway chassis applies the unNATting
> > > +    rules and sends the packet via the localnet port of the VLAN tenant
> > > +    network and the destination chassis receives the packet and delivers to
> > > +    the VM.
> > > +  </p>
> > > +
> > >     <h2>Life Cycle of a VTEP gateway</h2>
> > >
> > >     <p>
> > > diff --git a/ovn/ovn-nb.xml b/ovn/ovn-nb.xml
> > > index 8564ed39c..13ae56e13 100644
> > > --- a/ovn/ovn-nb.xml
> > > +++ b/ovn/ovn-nb.xml
> > > @@ -1635,6 +1635,49 @@
> > >             chassis to enable high availability.
> > >           </p>
> > >         </column>
> > > +
> > > +      <column name="options" key="reside-on-redirect-chassis">
> > > +        <p>
> > > +          Generally routing is distributed in <code>OVN</code>. The packet
> > > +          from a logical port which needs to be routed hits the router pipeline
> > > +          in the source chassis. For the East-West traffic, the packet is
> > > +          sent directly to the destination chassis. For the outside traffic
> > > +          the packet is sent to the gateway chassis.
> > > +        </p>
> > > +
> > > +        <p>
> > > +          When this option is set, <code>OVN</code> considers this only if
> > > +        </p>
> > > +
> > > +        <ul>
> > > +          <li>
> > > +            The logical router to which this logical router port belongs to
> > > +            has a distributed gateway port.
> > > +          </li>
> > > +
> > > +          <li>
> > > +            The peer's logical switch has a localnet port (representing
> > > +            a tenant VLAN network)
> > > +          </li>
> > > +        </ul>
> > > +
> > > +        <p>
> > > +          When this option is set to <code>true</code>, then the packet
> > > +          which needs to be routed hits the router pipeline in the chassis
> > > +          hosting the distributed gateway router port. The source chassis
> > > +          pushes out this traffic via the localnet port. With this the
> > > +          East-West traffic is no more distributed and will always go through
> > > +          the gateway chassis.
> > > +        </p>
> > > +
> > > +        <p>
> > > +          Without this option set, for any traffic destined to outside from a
> > > +          logical port which belongs to a logical switch with localnet port,
> > > +          the source chassis will send the traffic to the gateway chassis via
> > > +          the tunnel port instead of the localnet port and this could cause MTU
> > > +          issues.
> > > +        </p>
> > > +      </column>
> > >       </group>
> > >
> > >       <group title="Attachment">
> > > diff --git a/tests/ovn.at b/tests/ovn.at
> > > index 769e09f81..504ba228d 100644
> > > --- a/tests/ovn.at
> > > +++ b/tests/ovn.at
> > > @@ -8537,6 +8537,279 @@ OVN_CLEANUP([hv1],[hv2],[hv3])
> > >
> > >   AT_CLEANUP
> > >
> > > +# VLAN traffic for external network redirected through distributed router
> > > +# gateway port should use vlans(i.e input network vlan tag) across hypervisors
> > > +# instead of tunneling.
> > > +AT_SETUP([ovn -- vlan traffic for external network with distributed router gateway port])
> > > +AT_SKIP_IF([test $HAVE_PYTHON = no])
> > > +ovn_start
> > > +
> > > +# Logical network:
> > > +# # One LR R1 that has switches foo (192.168.1.0/24) and
> > > +# # alice (172.16.1.0/24) connected to it.  The logical port
> > > +# # between R1 and alice has a "redirect-chassis" specified,
> > > +# # i.e. it is the distributed router gateway port(172.16.1.6).
> > > +# # Switch alice also has a localnet port defined.
> > > +# # An additional switch outside has the same subnet as alice
> > > +# # (172.16.1.0/24), a localnet port and nexthop port(172.16.1.1)
> > > +# # which will receive the packet destined for external network
> > > +# # (i.e 8.8.8.8 as destination ip).
> > > +
> > > +# Physical network:
> > > +# # Three hypervisors hv[123].
> > > +# # hv1 hosts vif foo1.
> > > +# # hv2 is the "redirect-chassis" that hosts the distributed router gateway port.
> > > +# # hv3 hosts nexthop port vif outside1.
> > > +# # All other tests connect hypervisors to network n1 through br-phys for tunneling.
> > > +# # But in this test, hv1 won't connect to n1(and no br-phys in hv1), and
> > > +# # in order to show vlans(instead of tunneling) used between hv1 and hv2,
> > > +# # a new network n2 created and hv1 and hv2 connected to this network through br-ex.
> > > +# # hv2 and hv3 are still connected to n1 network through br-phys.
> > > +net_add n1
> > > +
> > > +# We are not calling ovn_attach for hv1, to avoid adding br-phys.
> > > +# Tunneling won't work in hv1 as ovn-encap-ip is not added to any bridge in hv1
> > > +sim_add hv1
> > > +as hv1
> > > +ovs-vsctl \
> > > +    -- set Open_vSwitch . external-ids:system-id=hv1 \
> > > +    -- set Open_vSwitch . external-ids:ovn-remote=unix:$ovs_base/ovn-sb/ovn-sb.sock \
> > > +    -- set Open_vSwitch . external-ids:ovn-encap-type=geneve,vxlan \
> > > +    -- set Open_vSwitch . external-ids:ovn-encap-ip=192.168.0.1 \
> > > +    -- add-br br-int \
> > > +    -- set bridge br-int fail-mode=secure other-config:disable-in-band=true \
> > > +    -- set Open_vSwitch . external-ids:ovn-bridge-mappings=public:br-ex
> > > +
> > > +start_daemon ovn-controller
> > > +ovs-vsctl -- add-port br-int hv1-vif1 -- \
> > > +    set interface hv1-vif1 external-ids:iface-id=foo1 \
> > > +    ofport-request=1
> > > +
> > > +sim_add hv2
> > > +as hv2
> > > +ovs-vsctl add-br br-phys
> > > +ovn_attach n1 br-phys 192.168.0.2
> > > +ovs-vsctl set Open_vSwitch . external-ids:ovn-bridge-mappings="public:br-ex,phys:br-phys"
> > > +
> > > +sim_add hv3
> > > +as hv3
> > > +ovs-vsctl add-br br-phys
> > > +ovn_attach n1 br-phys 192.168.0.3
> > > +ovs-vsctl -- add-port br-int hv3-vif1 -- \
> > > +    set interface hv3-vif1 external-ids:iface-id=outside1 \
> > > +    options:tx_pcap=hv3/vif1-tx.pcap \
> > > +    options:rxq_pcap=hv3/vif1-rx.pcap \
> > > +    ofport-request=1
> > > +ovs-vsctl set Open_vSwitch . external-ids:ovn-bridge-mappings="phys:br-phys"
> > > +
> > > +# Create network n2 for vlan connectivity between hv1 and hv2
> > > +net_add n2
> > > +
> > > +as hv1
> > > +ovs-vsctl add-br br-ex
> > > +net_attach n2 br-ex
> > > +
> > > +as hv2
> > > +ovs-vsctl add-br br-ex
> > > +net_attach n2 br-ex
> > > +
> > > +OVN_POPULATE_ARP
> > > +
> > > +ovn-nbctl create Logical_Router name=R1
> > > +
> > > +ovn-nbctl ls-add foo
> > > +ovn-nbctl ls-add alice
> > > +ovn-nbctl ls-add outside
> > > +
> > > +# Connect foo to R1
> > > +ovn-nbctl lrp-add R1 foo 00:00:01:01:02:03 192.168.1.1/24
> > > +ovn-nbctl lsp-add foo rp-foo -- set Logical_Switch_Port rp-foo \
> > > +    type=router options:router-port=foo \
> > > +    -- lsp-set-addresses rp-foo router
> > > +
> > > +# Connect alice to R1 as distributed router gateway port (172.16.1.6) on hv2
> > > +ovn-nbctl lrp-add R1 alice 00:00:02:01:02:03 172.16.1.6/24 \
> > > +    -- set Logical_Router_Port alice options:redirect-chassis="hv2"
> > > +ovn-nbctl lsp-add alice rp-alice -- set Logical_Switch_Port rp-alice \
> > > +    type=router options:router-port=alice \
> > > +    -- lsp-set-addresses rp-alice router \
> > > +
> > > +# Create logical port foo1 in foo
> > > +ovn-nbctl lsp-add foo foo1 \
> > > +-- lsp-set-addresses foo1 "f0:00:00:01:02:03 192.168.1.2"
> > > +
> > > +# Create logical port outside1 in outside, which is a nexthop address
> > > +# for 172.16.1.0/24
> > > +ovn-nbctl lsp-add outside outside1 \
> > > +-- lsp-set-addresses outside1 "f0:00:00:01:02:04 172.16.1.1"
> > > +
> > > +# Set default gateway (nexthop) to 172.16.1.1
> > > +ovn-nbctl lr-route-add R1 "0.0.0.0/0" 172.16.1.1 alice
> > > +AT_CHECK([ovn-nbctl lr-nat-add R1 snat 172.16.1.6 192.168.1.1/24])
> > > +ovn-nbctl set Logical_Switch_Port rp-alice options:nat-addresses=router
> > > +
> > > +ovn-nbctl lsp-add foo ln-foo
> > > +ovn-nbctl lsp-set-addresses ln-foo unknown
> > > +ovn-nbctl lsp-set-options ln-foo network_name=public
> > > +ovn-nbctl lsp-set-type ln-foo localnet
> > > +AT_CHECK([ovn-nbctl set Logical_Switch_Port ln-foo tag=2])
> > > +
> > > +# Create localnet port in alice
> > > +ovn-nbctl lsp-add alice ln-alice
> > > +ovn-nbctl lsp-set-addresses ln-alice unknown
> > > +ovn-nbctl lsp-set-type ln-alice localnet
> > > +ovn-nbctl lsp-set-options ln-alice network_name=phys
> > > +
> > > +# Create localnet port in outside
> > > +ovn-nbctl lsp-add outside ln-outside
> > > +ovn-nbctl lsp-set-addresses ln-outside unknown
> > > +ovn-nbctl lsp-set-type ln-outside localnet
> > > +ovn-nbctl lsp-set-options ln-outside network_name=phys
> > > +
> > > +# Allow some time for ovn-northd and ovn-controller to catch up.
> > > +# XXX This should be more systematic.
> > > +ovn-nbctl --wait=hv --timeout=3 sync
> > > +
> > > +# Check that there is a logical flow in logical switch foo's pipeline
> > > +# to set the outport to rp-foo (which is expected).
> > > +OVS_WAIT_UNTIL([test 1 = `ovn-sbctl dump-flows foo | grep ls_in_l2_lkup | \
> > > +grep rp-foo | grep -v is_chassis_resident | wc -l`])
> > > +
> > > +# Set the option 'reside-on-redirect-chassis' for foo
> > > +ovn-nbctl set logical_router_port foo options:reside-on-redirect-chassis=true
> > > +# Check that there is a logical flow in logical switch foo's pipeline
> > > +# to set the outport to rp-foo with the condition is_chassis_resident.
> > > +ovn-sbctl dump-flows foo
> > > +OVS_WAIT_UNTIL([test 1 = `ovn-sbctl dump-flows foo | grep ls_in_l2_lkup | \
> > > +grep rp-foo | grep is_chassis_resident | wc -l`])
> > > +
> > > +echo "---------NB dump-----"
> > > +ovn-nbctl show
> > > +echo "---------------------"
> > > +ovn-nbctl list logical_router
> > > +echo "---------------------"
> > > +ovn-nbctl list nat
> > > +echo "---------------------"
> > > +ovn-nbctl list logical_router_port
> > > +echo "---------------------"
> > > +
> > > +echo "---------SB dump-----"
> > > +ovn-sbctl list datapath_binding
> > > +echo "---------------------"
> > > +ovn-sbctl list port_binding
> > > +echo "---------------------"
> > > +ovn-sbctl dump-flows
> > > +echo "---------------------"
> > > +ovn-sbctl list chassis
> > > +echo "---------------------"
> > > +
> > > +for chassis in hv1 hv2 hv3; do
> > > +    as $chassis
> > > +    echo "------ $chassis dump ----------"
> > > +    ovs-vsctl show br-int
> > > +    ovs-ofctl show br-int
> > > +    ovs-ofctl dump-flows br-int
> > > +    echo "--------------------------"
> > > +done
> > > +
> > > +ip_to_hex() {
> > > +    printf "%02x%02x%02x%02x" "$@"
> > > +}
> > > +
> > > +foo1_ip=$(ip_to_hex 192 168 1 2)
> > > +gw_ip=$(ip_to_hex 172 16 1 6)
> > > +dst_ip=$(ip_to_hex 8 8 8 8)
> > > +nexthop_ip=$(ip_to_hex 172 16 1 1)
> > > +
> > > +foo1_mac="f00000010203"
> > > +foo_mac="000001010203"
> > > +gw_mac="000002010203"
> > > +nexthop_mac="f00000010204"
> > > +
> > > +# Send ip packet from foo1 to 8.8.8.8
> > > +src_mac="f00000010203"
> > > +dst_mac="000001010203"
> > > +packet=${foo_mac}${foo1_mac}08004500001c0000000040110000${foo1_ip}${dst_ip}0035111100080000
> > > +
> > > +as hv1 ovs-appctl netdev-dummy/receive hv1-vif1 $packet
> > > +sleep 2
> > > +
> > > +# ARP request packet for nexthop_ip to expect at outside1
> > > +arp_request=ffffffffffff${gw_mac}08060001080006040001${gw_mac}${gw_ip}000000000000${nexthop_ip}
> > > +echo $arp_request >> hv3-vif1.expected
> > > +cat hv3-vif1.expected > expout
> > > +$PYTHON "$top_srcdir/utilities/ovs-pcap.in" hv3/vif1-tx.pcap | grep ${nexthop_ip} | uniq > hv3-vif1
> > > +AT_CHECK([sort hv3-vif1], [0], [expout])
> > > +
> > > +# Send ARP reply from outside1 back to the router
> > > +reply_mac="f00000010204"
> > > +arp_reply=${gw_mac}${nexthop_mac}08060001080006040002${nexthop_mac}${nexthop_ip}${gw_mac}${gw_ip}
> > > +
> > > +as hv3 ovs-appctl netdev-dummy/receive hv3-vif1 $arp_reply
> > > +OVS_WAIT_UNTIL([
> > > +    test `as hv2 ovs-ofctl dump-flows br-int | grep table=66 | \
> > > +grep actions=mod_dl_dst:f0:00:00:01:02:04 | wc -l` -eq 1
> > > +    ])
> > > +
> > > +# VLAN tagged packet with router port(192.168.1.1) MAC as destination MAC
> > > +# is expected on bridge connecting hv1 and hv2
> > > +expected=${foo_mac}${foo1_mac}8100000208004500001c0000000040110000${foo1_ip}${dst_ip}0035111100080000
> > > +echo $expected > hv1-br-ex_n2.expected
> > > +
> > > +# Packet to Expect at outside1 i.e nexthop(172.16.1.1) port.
> > > +# As connection tracking not enabled for this test, snat can't be done on the packet.
> > > +# We still see foo1 as the source ip address. But source mac(gateway MAC) and
> > > +# dest mac(nexthop mac) are properly configured.
> > > +expected=${nexthop_mac}${gw_mac}08004500001c000000003f110100${foo1_ip}${dst_ip}0035111100080000
> > > +echo $expected > hv3-vif1.expected
> > > +
> > > +reset_pcap_file() {
> > > +    local iface=$1
> > > +    local pcap_file=$2
> > > +    ovs-vsctl -- set Interface $iface options:tx_pcap=dummy-tx.pcap \
> > > +options:rxq_pcap=dummy-rx.pcap
> > > +    rm -f ${pcap_file}*.pcap
> > > +    ovs-vsctl -- set Interface $iface options:tx_pcap=${pcap_file}-tx.pcap \
> > > +options:rxq_pcap=${pcap_file}-rx.pcap
> > > +}
> > > +
> > > +as hv1 reset_pcap_file br-ex_n2 hv1/br-ex_n2
> > > +as hv3 reset_pcap_file hv3-vif1 hv3/vif1
> > > +sleep 2
> > > +as hv1 ovs-appctl netdev-dummy/receive hv1-vif1 $packet
> > > +sleep 2
> > > +
> > > +# On hv1, the packet should not go from vlan switch pipeline to router
> > > +# pipeline
> > > +as hv1 ovs-ofctl dump-flows br-int
> > > +
> > > +AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=65 | grep "priority=100,reg15=0x1,metadata=0x2" \
> > > +| grep actions=clone | grep -v n_packets=0 | wc -l], [0], [[0
> > > +]])
> > > +
> > > +# On hv1, table 32 check that no packet goes via the tunnel port
> > > +AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=32 \
> > > +| grep "NXM_NX_TUN_ID" | grep -v n_packets=0 | wc -l], [0], [[0
> > > +]])
> > > +
> > > +ip_packet() {
> > > +    grep "1010203f00000010203"
> > > +}
> > > +
> > > +# Check vlan tagged packet on the bridge connecting hv1 and hv2 with the
> > > +# foo1's mac.
> > > +$PYTHON "$top_srcdir/utilities/ovs-pcap.in" hv1/br-ex_n2-tx.pcap | ip_packet | uniq > hv1-br-ex_n2
> > > +cat hv1-br-ex_n2.expected > expout
> > > +AT_CHECK([sort hv1-br-ex_n2], [0], [expout])
> > > +
> > > +# Check expected packet on nexthop interface
> > > +$PYTHON "$top_srcdir/utilities/ovs-pcap.in" hv3/vif1-tx.pcap | grep ${foo1_ip}${dst_ip} | uniq > hv3-vif1
> > > +cat hv3-vif1.expected > expout
> > > +AT_CHECK([sort hv3-vif1], [0], [expout])
> > > +
> > > +OVN_CLEANUP([hv1],[hv2],[hv3])
> > > +AT_CLEANUP
> > > +
> > >   AT_SETUP([ovn -- IPv6 ND Router Solicitation responder])
> > >   AT_KEYWORDS([ovn-nd_ra])
> > >   AT_SKIP_IF([test $HAVE_PYTHON = no])
> > >
> >
> >
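For readers following the quoted test, here is a minimal standalone sketch (not part of the patch) of how the VLAN-tagged frame expected on br-ex differs from the untagged frame injected on hv1-vif1. The ip_to_hex helper, MAC values and payload bytes are copied from the test; the variable names l3_payload, untagged, vlan_tci and tagged are introduced here purely for illustration. The only difference is a 4-byte 802.1Q header (TPID 8100 followed by VLAN ID 2, matching tag=2 on ln-foo) inserted after the source MAC.

```shell
#!/bin/sh
# Same helper as in the test: four decimal octets -> eight hex digits.
ip_to_hex() {
    printf "%02x%02x%02x%02x" "$@"
}

foo1_ip=$(ip_to_hex 192 168 1 2)
dst_ip=$(ip_to_hex 8 8 8 8)
foo1_mac="f00000010203"   # source: VIF foo1
foo_mac="000001010203"    # destination: router port foo

# Everything after the Ethernet addresses: EtherType 0800 (IPv4) followed
# by the IP/UDP bytes used in the test.
l3_payload="08004500001c0000000040110000${foo1_ip}${dst_ip}0035111100080000"

# Frame as injected on hv1-vif1 (no VLAN header).
untagged="${foo_mac}${foo1_mac}${l3_payload}"

# Frame expected on br-ex: TPID 8100 plus a 16-bit TCI carrying VLAN ID 2,
# inserted between the source MAC and the original EtherType.
vlan_tci=$(printf "%04x" 2)
tagged="${foo_mac}${foo1_mac}8100${vlan_tci}${l3_payload}"

echo "untagged: $untagged"
echo "tagged:   $tagged"
```

The tagged string is the same value the test writes to hv1-br-ex_n2.expected, so seeing it on br-ex shows the packet left hv1 on the VLAN rather than through a tunnel.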
Numan Siddique Nov. 3, 2018, 12:26 p.m. UTC | #5
On Sat, Nov 3, 2018 at 5:55 PM Numan Siddique <nusiddiq@redhat.com> wrote:

> Hi Ankur,
> Thanks for the review and for the comments. Please see below.
>
> Thanks
> Numan
>
>
> On Fri, Nov 2, 2018 at 6:21 AM Ankur Sharma <ankur.sharma@nutanix.com>
> wrote:
>
>> Hi Numan, Mark,
>>
>> Thanks for the patch.
>> Description explains the problem statement and solution really well.
>>
>> I have following comments:
>>
>> a. Regarding the solution:
>>    i. I think referring to a non connected port in a logical datapath
>> pipeline is probably not the right way.
>>       i.e as per the description, in ls0 pipeline we are referring to
>> lr0-public.
>>       lr0-public is not a peer interface of ls0-lr0 patch port pair,
>> hence ls0 pipeline should be totally agnostic to it (just my 2 cents).
>>
>>       table=16(ls_in_l2_lkup), priority=50, match=(eth.dst ==
>> 00:00:00:00:af:12 && is_chassis_resident("cr-lr0-public")),
>> action=(outport = "sw0-lr0"; output;)
>>
>
> If we want the packet to enter the router pipeline on the chassis which is
> hosting the gateway port, then I couldn't find any other way.
>
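To make the flow under discussion concrete, here is a small illustrative sketch (not part of the patch) of how the match for the ls_in_l2_lkup flow changes once the chassis-resident condition is added. The function name l2_lkup_match and its arguments are hypothetical; the two output formats mirror the match strings quoted in this thread.

```shell
#!/bin/sh
# Hypothetical helper (name and arguments are illustrative, not OVN's):
# prints the match expression of the ls_in_l2_lkup flow for a router-port MAC.
#   $1 = router port MAC address
#   $2 = chassisredirect port name (for example cr-lr0-public); leave empty
#        for the ordinary distributed case
l2_lkup_match() {
    mac=$1
    redirect_port=$2
    if [ -n "$redirect_port" ]; then
        # Centralized: only the chassis hosting the redirect port matches.
        printf 'eth.dst == %s && is_chassis_resident("%s")\n' "$mac" "$redirect_port"
    else
        # Distributed: every chassis matches on the destination MAC alone.
        printf 'eth.dst == %s\n' "$mac"
    fi
}

l2_lkup_match 00:00:00:00:af:12
l2_lkup_match 00:00:00:00:af:12 cr-lr0-public
```

The second form is what ties the sw0 pipeline to lr0-public: the switch flow has to name the chassisredirect port to decide where the packet may enter the router pipeline.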
>
>>
>> b. Nit: Patch 2 in this series does not look related to patch 1 (unless
>> I am missing something);
>>             if that is correct, then we should have a separate patch for it.
>>
>
> Agreed. But since both patches are related to VLAN, I thought I would
> club them together.
>
>>
>> c. Nit: Any new config that we are adding, we should add corresponding
>> ovn-nbctl CLI.
>>            This way non openstack deployments can also play with the
>> feature 😊
>>
>
> We could add a new command in ovn-nbctl -  "ovn-nbctl lrp-set-options".
> Generally I tend to use the
> generic DB commands - set/add/remove/clear/destroy. In this case we can use
> "ovn-nbctl set logical_router_port <LRP_NAME>
> options:reside-on-redirect-chassis=true".
>
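As a rough sketch of the generic-command workflow described above (an illustration, not something from the patch), the helper below only prints the two commands one would run: the `set` command quoted in the reply, and the flow check used by the test in this patch. The function name centralize_lrp_cmds and its arguments are hypothetical. Nothing is executed against a database, so it can be read or run without an OVN deployment.

```shell
#!/bin/sh
# Dry-run sketch: prints commands instead of running them.
#   $1 = logical router port name (for example foo)
#   $2 = peer logical switch name (for example foo)
centralize_lrp_cmds() {
    lrp=$1
    ls=$2
    # Generic DB command that enables centralized routing for this port.
    printf 'ovn-nbctl set logical_router_port %s options:reside-on-redirect-chassis=true\n' "$lrp"
    # Check taken from the test: the l2 lookup flow should now carry an
    # is_chassis_resident condition.
    printf 'ovn-sbctl dump-flows %s | grep ls_in_l2_lkup | grep is_chassis_resident\n' "$ls"
}

centralize_lrp_cmds foo foo
```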
>>
>> ===================================================
>>
>> Regarding the solution:
>> ------------------------------
>>
>> As I understand, we are trying to achieve following here:
>> a. For a distributed virtual router, we want some router interface to be
>> fully centralized, rather than distributed.
>>     i.e instead of creating 2 versions of a router port (lrp-* and
>> cr-lrp*), one distributed and one centralized,
>>     We want the lrp-*  itself to be centralized.
>>     i.e in your case, from vlan backed logical switch, you want to enter
>> router pipeline only on the gateway chassis and NOT on the source chassis.
>>
>> b. Just wanted to propose an alternate approach here, which we
>> implemented for a slightly different use case.
>>      Looks like this alternate approach would help in your scenario as
>> well.
>>
>> Alternate Approach:
>> --------------------------
>> a. Convert the port pair between logical switch and logical router to be
>> of type "l3gateway", rather than "patch".
>> b. i.e in your configuration, lr0-sw0 and sw0-lr0 should be implemented
>> as type l3gateway rather than patch.
>> c. In other words, we are simulating a centralized router (for specific
>> peer logical switches) in a distributed router.
>> We will be discussing this approach in OVS Conf as well:
>>
>
> I actually thought about this approach. I remember Goushai Li had
> submitted a patch to support multiple gateway ports:
> https://patchwork.ozlabs.org/patch/884351/. Unfortunately I didn't review
> it when it was submitted (although I had promised to
> look into it). I somehow missed it. When I looked at it again, I thought that
> maybe it's not the right approach, since the VLAN tenant
> networks are internal tenant networks, not externally
> reachable networks, and semantically it doesn't seem right
> to set the type to "l3gateway" when these VLAN networks don't provide external
> connectivity. If multiple VLAN networks are added
> to a logical router with one providing external connectivity, I am not
> sure how NATting would be handled there.
>
>
>
>
>
>> https://ovsfall2018.sched.com/event/IO9w/connectivity-for-external-networks-on-the-overlay?iframe=no&w=100%&sidebar=yes&bg=no
>>
>> ===================================================
>>
>> Please feel free to point, if I missed something here.
>> Please feel free to comment on the proposed alternative.
>>
>> Thanks
>>
>> Regards,
>> Ankur
>>
>>
>> -----Original Message-----
>> From: ovs-dev-bounces@openvswitch.org <ovs-dev-bounces@openvswitch.org>
>> On Behalf Of Numan Siddique
>> Sent: Monday, October 15, 2018 2:38 AM
>> To: Mark Michelson <mmichels@redhat.com>
>> Cc: ovs dev <dev@openvswitch.org>
>> Subject: Re: [ovs-dev] [PATCH 1/2] ovn: Avoid tunneling for VLAN packets
>> redirected to a gateway chassis
>>
>> On Sat, Oct 13, 2018 at 1:26 AM Mark Michelson <mmichels@redhat.com> wrote:
>>
>> > Hi Numan,
>> >
>> > The patch does a good job of explaining the routing behavior and the
>> > tunneling problem solved within.
>> >
>> > Prior to the patch, you can have a distributed gateway router with a
>> > redirect-chassis port set on it. This allows for east-west traffic to
>> > have an optimal direct path between hypervisors, but for the
>> > north-south use case, when the traffic is redirected to the
>> > redirect-chassis, the traffic is encapsulated.
>> >
>> > With this patch, you add the reside-on-redirect-chassis option to
>> > router ports. This essentially makes all traffic destined for the
>> > router port get redirected to the gateway chassis prior to running the
>> > router pipeline. This removes the encapsulation issue, but it also
>> > means that east-west traffic is now also centralized.
>> >
>>
>> That's right. That's the trade off to solve this issue for VLAN tenant
>> networks.
>>
>>
>> >
>> > I'm curious what the current behavior is when you specify a gateway
>> > router by setting options:chassis. Specifically, I'm curious about how
>> > it compares if you define a router where the "external" port has
>> > options:redirect-chassis set on it and all other ports have
>> > options:reside-on-redirect-chassis set on them. Have you essentially
>> > just created the same thing? Or is there some subtle difference?
>> >
>>
>> There is a difference. In the case of gateway router (options:chassis
>> set) scenario, it is expected that the tenant VLAN logical switches will be
>> connected to a normal router and this normal router will be connected to
>> the gateway router via a transit switch.
>> So the east west traffic will be distributed, but for the North/South
>> traffic, the packet on the source chassis enters logical switch pipeline ->
>> normal router pipeline -> transit switch pipeline. And then the packet is
>> sent to the chassis hosting the gateway router via the tunnel port. On the
>> gateway chassis, packet enters the transit switch pipeline ->  gateway
>> router pipeline -> provider network pipeline.
>>
>> Thanks
>> Numan
>>
>>
>>
>>
>>
>>
>> > On 10/05/2018 01:14 PM, nusiddiq@redhat.com wrote:
>> > > From: Numan Siddique <nusiddiq@redhat.com>
>> > >
>> > > An OVN deployment can have multiple logical switches each with a
>> > > localnet port connected to a distributed logical router with one
>> > > logical router port providing external connectivity (provider
>> > > network) and others used as tenant networks with VLAN tagging.
>> > >
>> > > As reported in [1], external traffic from these VLAN tenant networks
>> > > are tunnelled to the gateway chassis (chassis hosting a distributed
>> > > gateway port which applies NAT rules). As part of the discussion in
>> > > [1], there were few possible solutions proposed by Russell [2]. This
>> > > patch implements the first option in [2].
>> > >
>> > > With this patch, a new option 'reside-on-redirect-chassis' in
>> 'options'
>> > > column of Logical_Router_Port table is added. If the value of this
>> > > option is set to 'true' and if the logical router also have a
>> > > distributed gateway port, then routing for this logical router port
>> > > is centralized in the chassis hosting the distributed gateway port.
>> > >
>> > > If a logical switch 'sw0' is connected to a router 'lr0' with the
>> > > router port - 'lr0-sw0' with the address - "00:00:00:00:af:12
>> > 192.168.1.1"
>> > > , and it has a distributed logical port - 'lr0-public', then the
>> > > below logical flow is added in the logical switch pipeline of 'sw0'
>> > > if the 'reside-on-redirect-chassis' option is set on 'lr-sw0' -
>> > >
>> > > table=16(ls_in_l2_lkup), priority=50, match=(eth.dst ==
>> > > 00:00:00:00:af:12 &&
>> > is_chassis_resident("cr-lr0-public")),
>> > > action=(outport = "sw0-lr0"; output;)
>> > >
>> > > With the above flow, the packet doesn't enter the router pipeline in
>> > > the source chassis. Instead the packet is sent out via the localnet
>> > > port of 'sw0'. The gateway chassis upon receiving this packet, runs
>> > > the logical router pipeline applying NAT rules and sends the traffic
>> > > out via the localnet port of the provider network. The gateway
>> > > chassis will also reply to the ARP requests for the router port IPs.
>> > >
>> > > With this approach, we avoid redirecting the external traffic to the
>> > > gateway chassis via the tunnel port. There are a couple of drawbacks
>> > > with this approach:
>> > >
>> > >    - East - West routing is no more distributed for the VLAN tenant
>> > >      networks if 'reside-on-redirect-chassis' option is defined
>> > >
>> > >    - 'dnat_and_snat' NAT rules with 'logical_mac' and 'logical_port'
>> > >      columns defined will not work for the VLAN tenant networks.
>> > >
>> > > This approach is taken for now as it is simple. If there is a
>> > > requirement to support distributed routing for these VLAN tenant
>> > > networks, we can explore other possible solutions.
>> > >
>> > > [1] - https://mail.openvswitch.org/pipermail/ovs-discuss/2018-April/046543.html
>> > > [2] - https://mail.openvswitch.org/pipermail/ovs-discuss/2018-April/046557.html
>> > >
>> > > Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2018-April/046543.html
>> > > Reported-by: venkata anil <vkommadi@redhat.com>
>> > > Co-authored-by: venkata anil <vkommadi@redhat.com>
>> > > Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
>> > > Signed-off-by: venkata anil <vkommadi@redhat.com>
>> > > ---
>> > >   ovn/northd/ovn-northd.8.xml |  30 ++++
>> > >   ovn/northd/ovn-northd.c     |  71 +++++++---
>> > >   ovn/ovn-architecture.7.xml  | 160 +++++++++++++++++++++
>> > >   ovn/ovn-nb.xml              |  43 ++++++
>> > >   tests/ovn.at                | 273
>> ++++++++++++++++++++++++++++++++++++
>> > >   5 files changed, 561 insertions(+), 16 deletions(-)
>> > >
>> > > diff --git a/ovn/northd/ovn-northd.8.xml
>> > > b/ovn/northd/ovn-northd.8.xml index 7352c6764..f52699bd3 100644
>> > > --- a/ovn/northd/ovn-northd.8.xml
>> > > +++ b/ovn/northd/ovn-northd.8.xml
>> > > @@ -874,6 +874,25 @@ output;
>> > >               resident.
>> > >             </li>
>> > >           </ul>
>> > > +
>> > > +        <p>
>> > > +          For the Ethernet address on a logical switch port of type
>> > > +          <code>router</code>, when that logical switch port's
>> > > +          <ref column="addresses" table="Logical_Switch_Port"
>> > > +          db="OVN_Northbound"/> column is set to <code>router</code>
>> and
>> > > +          the connected logical router port specifies a
>> > > +          <code>reside-on-redirect-chassis</code> and the logical
>> router
>> > > +          to which the connected logical router port belongs to has a
>> > > +          <code>redirect-chassis</code> distributed gateway logical
>> > router
>> > > +          port:
>> > > +        </p>
>> > > +
>> > > +        <ul>
>> > > +          <li>
>> > > +            The flow for the connected logical router port's Ethernet
>> > > +            address is only programmed on the
>> > <code>redirect-chassis</code>.
>> > > +          </li>
>> > > +        </ul>
>> > >         </li>
>> > >
>> > >         <li>
>> > > @@ -1179,6 +1198,17 @@ output;
>> > >             upstream MAC learning to point to the
>> > >             <code>redirect-chassis</code>.
>> > >           </p>
>> > > +
>> > > +        <p>
>> > > +          For the logical router port with the option
>> > > +          <code>reside-on-redirect-chassis</code> set (which is
>> > centralized),
>> > > +          the above flows are only programmed on the gateway port
>> > instance on
>> > > +          the <code>redirect-chassis</code> (if the logical router
>> has a
>> > > +          distributed gateway port). This behavior avoids generation
>> > > +          of multiple ARP responses from different chassis, and
>> allows
>> > > +          upstream MAC learning to point to the
>> > > +          <code>redirect-chassis</code>.
>> > > +        </p>
>> > >         </li>
>> > >
>> > >         <li>
>> > > diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
>> > > index 31ea5f410..3998a898c 100644
>> > > --- a/ovn/northd/ovn-northd.c
>> > > +++ b/ovn/northd/ovn-northd.c
>> > > @@ -4426,13 +4426,32 @@ build_lswitch_flows(struct hmap *datapaths,
>> > struct hmap *ports,
>> > >                   ds_put_format(&match, "eth.dst == "ETH_ADDR_FMT,
>> > >                                 ETH_ADDR_ARGS(mac));
>> > >                   if (op->peer->od->l3dgw_port
>> > > -                    && op->peer == op->peer->od->l3dgw_port
>> > > -                    && op->peer->od->l3redirect_port) {
>> > > -                    /* The destination lookup flow for the router's
>> > > -                     * distributed gateway port MAC address should
>> only
>> > be
>> > > -                     * programmed on the "redirect-chassis". */
>> > > -                    ds_put_format(&match, " &&
>> is_chassis_resident(%s)",
>> > > -
>> > op->peer->od->l3redirect_port->json_key);
>> > > +                    && op->peer->od->l3redirect_port
>> > > +                    && op->od->localnet_port) {
>> > > +                    bool add_chassis_resident_check = false;
>> > > +                    if (op->peer == op->peer->od->l3dgw_port) {
>> > > +                        /* The peer of this port represents a
>> > distributed
>> > > +                         * gateway port. The destination lookup
>> > > + flow
>> > for the
>> > > +                         * router's distributed gateway port MAC
>> > address should
>> > > +                         * only be programmed on the
>> > "redirect-chassis". */
>> > > +                        add_chassis_resident_check = true;
>> > > +                    } else {
>> > > +                        /* Check if the option
>> > 'reside-on-redirect-chassis'
>> > > +                         * is set to true on the peer port. If set
>> > > + to
>> > true
>> > > +                         * and if the logical switch has a localnet
>> > port, it
>> > > +                         * means the router pipeline for the
>> > > + packets
>> > from
>> > > +                         * this logical switch should be run on the
>> > chassis
>> > > +                         * hosting the gateway port.
>> > > +                         */
>> > > +                        add_chassis_resident_check = smap_get_bool(
>> > > +                            &op->peer->nbrp->options,
>> > > +                            "reside-on-redirect-chassis", false);
>> > > +                    }
>> > > +
>> > > +                    if (add_chassis_resident_check) {
>> > > +                        ds_put_format(&match, " &&
>> > is_chassis_resident(%s)",
>> > > +
>> > op->peer->od->l3redirect_port->json_key);
>> > > +                    }
>> > >                   }
>> > >
>> > >                   ds_clear(&actions); @@ -5197,15 +5216,35 @@
>> > > build_lrouter_flows(struct hmap *datapaths,
>> > struct hmap *ports,
>> > >                             op->lrp_networks.ipv4_addrs[i].network_s,
>> > >                             op->lrp_networks.ipv4_addrs[i].plen,
>> > >                             op->lrp_networks.ipv4_addrs[i].addr_s);
>> > > -            if (op->od->l3dgw_port && op == op->od->l3dgw_port
>> > > -                && op->od->l3redirect_port) {
>> > > -                /* Traffic with eth.src =
>> l3dgw_port->lrp_networks.ea_s
>> > > -                 * should only be sent from the "redirect-chassis",
>> so
>> > that
>> > > -                 * upstream MAC learning points to the
>> > "redirect-chassis".
>> > > -                 * Also need to avoid generation of multiple ARP
>> > responses
>> > > -                 * from different chassis. */
>> > > -                ds_put_format(&match, " && is_chassis_resident(%s)",
>> > > -                              op->od->l3redirect_port->json_key);
>> > > +
>> > > +            if (op->od->l3dgw_port && op->od->l3redirect_port &&
>> > op->peer
>> > > +                && op->peer->od->localnet_port) {
>> > > +                bool add_chassis_resident_check = false;
>> > > +                if (op == op->od->l3dgw_port) {
>> > > +                    /* Traffic with eth.src =
>> > l3dgw_port->lrp_networks.ea_s
>> > > +                     * should only be sent from the
>> > > + "redirect-chassis",
>> > so that
>> > > +                     * upstream MAC learning points to the
>> > "redirect-chassis".
>> > > +                     * Also need to avoid generation of multiple
>> > > + ARP
>> > responses
>> > > +                     * from different chassis. */
>> > > +                    add_chassis_resident_check = true;
>> > > +                } else {
>> > > +                    /* Check if the option
>> 'reside-on-redirect-chassis'
>> > > +                     * is set to true on the router port. If set to
>> true
>> > > +                     * and if peer's logical switch has a localnet
>> > port, it
>> > > +                     * means the router pipeline for the packets from
>> > > +                     * peer's logical switch is to be run on the chassis
>> > > +                     * hosting the gateway port and it should reply
>> > > + to
>> > the
>> > > +                     * ARP requests for the router port IPs.
>> > > +                     */
>> > > +                    add_chassis_resident_check = smap_get_bool(
>> > > +                        &op->nbrp->options,
>> > > +                        "reside-on-redirect-chassis", false);
>> > > +                }
>> > > +
>> > > +                if (add_chassis_resident_check) {
>> > > +                    ds_put_format(&match, " &&
>> is_chassis_resident(%s)",
>> > > +                                  op->od->l3redirect_port->json_key);
>> > > +                }
>> > >               }
>> > >
>> > >               ds_clear(&actions);
>> > > diff --git a/ovn/ovn-architecture.7.xml b/ovn/ovn-architecture.7.xml
>> > > index 6ed2cf132..998470c34 100644
>> > > --- a/ovn/ovn-architecture.7.xml
>> > > +++ b/ovn/ovn-architecture.7.xml
>> > > @@ -1372,6 +1372,166 @@
>> > >
>> > >     http://docs.openvswitch.org/en/latest/topics/high-availability.
>> > >     </p>
>> > >
>> > > +  <h2>Tenant VLAN networks connected to a Logical Router</h2>
>> > > +
>> > > +  <p>
>> > > +    It is possible to have multiple logical switches each with a localnet port
>> > > +    (representing physical networks) connected to a logical router in which one
>> > > +    may provide the external connectivity via a distributed gateway port and
>> > > +    the rest of them are used internally (with VLAN tagged). It is expected
>> > > +    that <code>ovn-bridge-mappings</code> is configured appropriately on the
>> > > +    chassis.
>> > > +  </p>
>> > > +
>> > > +  <h3>East West routing</h3>
>> > > +  <p>
>> > > +    East-West routing between these tenant VLAN logical switches
>> > > + works
>> > almost
>> > > +    the same way as normal logical switches. When the VM sends such
>> > > + a
>> > packet,
>> > > +    then:
>> > > +  </p>
>> > > +  <ol>
>> > > +    <li>
>> > > +      The packet enters the ingress pipeline of the logical router
>> > datapath
>> > > +      via the logical router port in the source chassis.
>> > > +    </li>
>> > > +
>> > > +    <li>
>> > > +      Routing decision is taken.
>> > > +    </li>
>> > > +
>> > > +    <li>
>> > > +      The packet goes out of the integration bridge to the provider
>> > bridge (
>> > > +      belonging to the destination logical switch) via the localnet
>> > port.
>> > > +    </li>
>> > > +
>> > > +    <li>
>> > > +      The destination chassis receives the packet via the localnet
>> port
>> > > +      and delivers to the destination VM.
>> > > +    </li>
>> > > +  </ol>
>> > > +
>> > > +  <h3>External traffic</h3>
>> > > +
>> > > +  <p>
>> > > +    The following happens when a VM sends an external traffic
>> > > + (which
>> > requires
>> > > +    NATting) and the chassis hosting the VM doesn't have a
>> > > + distributed
>> > gateway
>> > > +    port.
>> > > +  </p>
>> > > +
>> > > +  <ol>
>> > > +    <li>
>> > > +      The packet enters the ingress pipeline of the logical router
>> > datapath
>> > > +      via the logical router port in the source chassis.
>> > > +    </li>
>> > > +
>> > > +    <li>
>> > > +      Routing decision is taken. Since the gateway router or the
>> > distributed
>> > > +      gateway port doesn't reside in the source chassis, the traffic
>> is
>> > > +      redirected to the gateway chassis via the tunnel port.
>> > > +    </li>
>> > > +
>> > > +    <li>
>> > > +      The gateway chassis receives the packet, applies the NAT rules
>> and
>> > > +      forwards it via the localnet port.
>> > > +    </li>
>> > > +  </ol>
>> > > +
>> > > +  <p>
>> > > +    Although this works, the VM traffic is tunnelled. In order for
>> it to
>> > > +    work properly, the MTU of the VLAN tenant networks must be
>> > > + lowered
>> > to
>> > > +    account for the tunnel encapsulation.
>> > > +  </p>
>> > > +
>> > > +  <h2>Centralized routing for VLAN tenant networks</h2>
>> > > +
>> > > +  <p>
>> > > +    To overcome the tunnel encapsulation problem described in the
>> > previous
>> > > +    section, <code>OVN</code> supports the option of enabling
>> > centralized
>> > > +    routing for VLAN tenant networks. CMS can configure the option
>> > > +    <ref column="options:reside-on-redirect-chassis"
>> > > +    table="Logical_Router_Port" db="OVN_NB"/> to <code>true</code>
>> > > + for
>> > each
>> > > +    <ref table="Logical_Router_Port" db="OVN_NB"/> which connects to
>> the
>> > > +    logical switch of the VLAN tenant network. This causes the
>> gateway
>> > > +    chassis (hosting the distributed gateway port) to handle all the
>> > > +    routing for these networks, making it centralized. It will reply
>> to
>> > > +    the ARP requests for the logical router port IPs.
>> > > +  </p>
>> > > +
>> > > +  <p>
>> > > +    If the logical router doesn't have a distributed gateway port
>> > connecting
>> > > +    to the provider network, then this option is ignored by
>> > <code>OVN</code>.
>> > > +  </p>
>> > > +
>> > > +  <p>
>> > > +    The following happens when a VM sends an east-west traffic
>> > > + which
>> > needs to
>> > > +    be routed:
>> > > +  </p>
>> > > +
>> > > +  <ol>
>> > > +    <li>
>> > > +      The packet from the VM enters the logical datapath pipeline
>> > > + of
>> > the source
>> > > +      VLAN network in the source chassis and is sent out via the
>> > localnet port
>> > > +      (instead of sending it to router pipeline).
>> > > +    </li>
>> > > +
>> > > +    <li>
>> > > +      The packet enters the logical datapath pipeline of the source
>> VLAN
>> > > +      network in the gateway chassis and is sent to the logical
>> datapath
>> > > +      pipeline belonging to the logical router.
>> > > +    </li>
>> > > +
>> > > +    <li>
>> > > +      Routing decision is taken.
>> > > +    </li>
>> > > +
>> > > +    <li>
>> > > +      The packet enters the logical datapath pipeline of the
>> destination
>> > > +      VLAN network. The packet is delivered to the destination VM
>> > > + if it
>> > resides
>> > > +      in the same chassis. Otherwise the packet is sent out via the
>> > localnet
>> > > +      port of the destination VLAN network.
>> > > +    </li>
>> > > +
>> > > +    <li>
>> > > +      The destination chassis receives the packet via the localnet
>> port
>> > > +      and delivers to the destination VM.
>> > > +    </li>
>> > > +  </ol>
>> > > +
>> > > +  <p>
>> > > +    The following happens when a VM sends an external traffic which
>> > requires
>> > > +    NATting:
>> > > +  </p>
>> > > +
>> > > +  <ol>
>> > > +    <li>
>> > > +      The packet from the VM enters the logical datapath pipeline
>> > > + of
>> > the source
>> > > +      VLAN network in the source chassis and is sent out via the
>> > localnet port
>> > > +      (instead of sending it to router pipeline).
>> > > +    </li>
>> > > +
>> > > +    <li>
>> > > +      The packet enters the logical datapath pipeline of the source
>> VLAN
>> > > +      network in the gateway chassis and is sent to the logical
>> datapath
>> > > +      pipeline belonging to the logical router.
>> > > +    </li>
>> > > +
>> > > +    <li>
>> > > +      Routing decision is taken and NAT rules are applied.
>> > > +    </li>
>> > > +
>> > > +    <li>
>> > > +      The packet enters the logical datapath pipeline of the
>> > > + provider
>> > network
>> > > +      and is sent out via the localnet port of the provider network.
>> > > +    </li>
>> > > +  </ol>
>> > > +
>> > > +  <p>
>> > > +    For the reverse external traffic, the gateway chassis applies
>> > > + the
>> > unNATting
>> > > +    rules and sends the packet via the localnet port of the VLAN
>> tenant
>> > > +    network and the destination chassis receives the packet and
>> > delivers to
>> > > +    the VM.
>> > > +  </p>
>> > > +
>> > >     <h2>Life Cycle of a VTEP gateway</h2>
>> > >
>> > >     <p>
>> > > diff --git a/ovn/ovn-nb.xml b/ovn/ovn-nb.xml
>> > > index 8564ed39c..13ae56e13 100644
>> > > --- a/ovn/ovn-nb.xml
>> > > +++ b/ovn/ovn-nb.xml
>> > > @@ -1635,6 +1635,49 @@
>> > >             chassis to enable high availability.
>> > >           </p>
>> > >         </column>
>> > > +
>> > > +      <column name="options" key="reside-on-redirect-chassis">
>> > > +        <p>
>> > > +          Generally routing is distributed in <code>OVN</code>. The
>> > packet
>> > > +          from a logical port which needs to be routed hits the
>> > > + router
>> > pipeline
>> > > +          in the source chassis. For the East-West traffic, the
>> > > + packet
>> > is
>> > > +          sent directly to the destination chassis. For the outside
>> > traffic
>> > > +          the packet is sent to the gateway chassis.
>> > > +        </p>
>> > > +
>> > > +        <p>
>> > > +          When this option is set, <code>OVN</code> considers this
>> > > + only
>> > if
>> > > +        </p>
>> > > +
>> > > +        <ul>
>> > > +          <li>
>> > > +            The logical router to which this logical router port
>> > belongs to
>> > > +            has a distributed gateway port.
>> > > +          </li>
>> > > +
>> > > +          <li>
>> > > +            The peer's logical switch has a localnet port
>> (representing
>> > > +            a tenant VLAN network)
>> > > +          </li>
>> > > +        </ul>
>> > > +
>> > > +        <p>
>> > > +          When this option is set to <code>true</code>, then the
>> packet
>> > > +          which needs to be routed hits the router pipeline in the
>> > chassis
>> > > +          hosting the distributed gateway router port. The source
>> > chassis
>> > > +          pushes out this traffic via the localnet port. With this
>> the
>> > > +          East-West traffic is no more distributed and will always
>> > > + go
>> > through
>> > > +          the gateway chassis.
>> > > +        </p>
>> > > +
>> > > +        <p>
>> > > +          Without this option set, for any traffic destined to
>> > > + outside
>> > from a
>> > > +          logical port which belongs to a logical switch with
>> > > + localnet
>> > port,
>> > > +          the source chassis will send the traffic to the gateway
>> > chassis via
>> > > +          the tunnel port instead of the localnet port and this
>> > > + could
>> > cause MTU
>> > > +          issues.
>> > > +        </p>
>> > > +      </column>
>> > >       </group>
>> > >
>> > >       <group title="Attachment">
>> > > diff --git a/tests/ovn.at b/tests/ovn.at
>> > > index 769e09f81..504ba228d 100644
>> > > --- a/tests/ovn.at
>> > > +++ b/tests/ovn.at
>> > > @@ -8537,6 +8537,279 @@ OVN_CLEANUP([hv1],[hv2],[hv3])
>> > >
>> > >   AT_CLEANUP
>> > >
>> > > +# VLAN traffic for external network redirected through distributed
>> > router
>> > > +# gateway port should use vlans(i.e input network vlan tag) across
>> > hypervisors
>> > > +# instead of tunneling.
>> > > +AT_SETUP([ovn -- vlan traffic for external network with distributed
>> > router gateway port])
>> > > +AT_SKIP_IF([test $HAVE_PYTHON = no])
>> > > +ovn_start
>> > > +
>> > > +# Logical network:
>> > > +# # One LR R1 that has switches foo (192.168.1.0/24) and
>> > > +# # alice (172.16.1.0/24) connected to it.  The logical port
>> > > +# # between R1 and alice has a "redirect-chassis" specified,
>> > > +# # i.e. it is the distributed router gateway port(172.16.1.6).
>> > > +# # Switch alice also has a localnet port defined.
>> > > +# # An additional switch outside has the same subnet as alice
>> > > +# # (172.16.1.0/24), a localnet port and nexthop port(172.16.1.1)
>> > > +# # which will receive the packet destined for external network
>> > > +# # (i.e 8.8.8.8 as destination ip).
>> > > +
>> > > +# Physical network:
>> > > +# # Three hypervisors hv[123].
>> > > +# # hv1 hosts vif foo1.
>> > > +# # hv2 is the "redirect-chassis" that hosts the distributed router gateway port.
>> > > +# # hv3 hosts nexthop port vif outside1.
>> > > +# # All other tests connect hypervisors to network n1 through br-phys for tunneling.
>> > > +# # But in this test, hv1 won't connect to n1(and no br-phys in hv1), and
>> > > +# # in order to show vlans(instead of tunneling) used between hv1 and hv2,
>> > > +# # a new network n2 created and hv1 and hv2 connected to this network through br-ex.
>> > > +# # hv2 and hv3 are still connected to n1 network through br-phys.
>> > > +net_add n1
>> > > +
>> > > +# We are not calling ovn_attach for hv1, to avoid adding br-phys.
>> > > +# Tunneling won't work in hv1 as ovn-encap-ip is not added to any bridge in hv1
>> > > +sim_add hv1
>> > > +as hv1
>> > > +ovs-vsctl \
>> > > +    -- set Open_vSwitch . external-ids:system-id=hv1 \
>> > > +    -- set Open_vSwitch . external-ids:ovn-remote=unix:$ovs_base/ovn-sb/ovn-sb.sock \
>> > > +    -- set Open_vSwitch . external-ids:ovn-encap-type=geneve,vxlan \
>> > > +    -- set Open_vSwitch . external-ids:ovn-encap-ip=192.168.0.1 \
>> > > +    -- add-br br-int \
>> > > +    -- set bridge br-int fail-mode=secure other-config:disable-in-band=true \
>> > > +    -- set Open_vSwitch . external-ids:ovn-bridge-mappings=public:br-ex
>> > > +
>> > > +start_daemon ovn-controller
>> > > +ovs-vsctl -- add-port br-int hv1-vif1 -- \
>> > > +    set interface hv1-vif1 external-ids:iface-id=foo1 \
>> > > +    ofport-request=1
>> > > +
>> > > +sim_add hv2
>> > > +as hv2
>> > > +ovs-vsctl add-br br-phys
>> > > +ovn_attach n1 br-phys 192.168.0.2
>> > > +ovs-vsctl set Open_vSwitch . external-ids:ovn-bridge-mappings="public:br-ex,phys:br-phys"
>> > > +
>> > > +sim_add hv3
>> > > +as hv3
>> > > +ovs-vsctl add-br br-phys
>> > > +ovn_attach n1 br-phys 192.168.0.3
>> > > +ovs-vsctl -- add-port br-int hv3-vif1 -- \
>> > > +    set interface hv3-vif1 external-ids:iface-id=outside1 \
>> > > +    options:tx_pcap=hv3/vif1-tx.pcap \
>> > > +    options:rxq_pcap=hv3/vif1-rx.pcap \
>> > > +    ofport-request=1
>> > > +ovs-vsctl set Open_vSwitch . external-ids:ovn-bridge-mappings="phys:br-phys"
>> > > +
>> > > +# Create network n2 for vlan connectivity between hv1 and hv2
>> > > +net_add n2
>> > > +
>> > > +as hv1
>> > > +ovs-vsctl add-br br-ex
>> > > +net_attach n2 br-ex
>> > > +
>> > > +as hv2
>> > > +ovs-vsctl add-br br-ex
>> > > +net_attach n2 br-ex
>> > > +
>> > > +OVN_POPULATE_ARP
>> > > +
>> > > +ovn-nbctl create Logical_Router name=R1
>> > > +
>> > > +ovn-nbctl ls-add foo
>> > > +ovn-nbctl ls-add alice
>> > > +ovn-nbctl ls-add outside
>> > > +
>> > > +# Connect foo to R1
>> > > +ovn-nbctl lrp-add R1 foo 00:00:01:01:02:03 192.168.1.1/24
>> > > +ovn-nbctl lsp-add foo rp-foo -- set Logical_Switch_Port rp-foo \
>> > > +    type=router options:router-port=foo \
>> > > +    -- lsp-set-addresses rp-foo router
>> > > +
>> > > +# Connect alice to R1 as distributed router gateway port (172.16.1.6) on hv2
>> > > +ovn-nbctl lrp-add R1 alice 00:00:02:01:02:03 172.16.1.6/24 \
>> > > +    -- set Logical_Router_Port alice options:redirect-chassis="hv2"
>> > > +ovn-nbctl lsp-add alice rp-alice -- set Logical_Switch_Port rp-alice \
>> > > +    type=router options:router-port=alice \
>> > > +    -- lsp-set-addresses rp-alice router \
>> > > +
>> > > +# Create logical port foo1 in foo
>> > > +ovn-nbctl lsp-add foo foo1 \
>> > > +-- lsp-set-addresses foo1 "f0:00:00:01:02:03 192.168.1.2"
>> > > +
>> > > +# Create logical port outside1 in outside, which is a nexthop address
>> > > +# for 172.16.1.0/24
>> > > +ovn-nbctl lsp-add outside outside1 \
>> > > +-- lsp-set-addresses outside1 "f0:00:00:01:02:04 172.16.1.1"
>> > > +
>> > > +# Set default gateway (nexthop) to 172.16.1.1
>> > > +ovn-nbctl lr-route-add R1 "0.0.0.0/0" 172.16.1.1 alice
>> > > +AT_CHECK([ovn-nbctl lr-nat-add R1 snat 172.16.1.6 192.168.1.1/24])
>> > > +ovn-nbctl set Logical_Switch_Port rp-alice options:nat-addresses=router
>> > > +
>> > > +ovn-nbctl lsp-add foo ln-foo
>> > > +ovn-nbctl lsp-set-addresses ln-foo unknown
>> > > +ovn-nbctl lsp-set-options ln-foo network_name=public
>> > > +ovn-nbctl lsp-set-type ln-foo localnet
>> > > +AT_CHECK([ovn-nbctl set Logical_Switch_Port ln-foo tag=2])
>> > > +
>> > > +# Create localnet port in alice
>> > > +ovn-nbctl lsp-add alice ln-alice
>> > > +ovn-nbctl lsp-set-addresses ln-alice unknown
>> > > +ovn-nbctl lsp-set-type ln-alice localnet
>> > > +ovn-nbctl lsp-set-options ln-alice network_name=phys
>> > > +
>> > > +# Create localnet port in outside
>> > > +ovn-nbctl lsp-add outside ln-outside
>> > > +ovn-nbctl lsp-set-addresses ln-outside unknown
>> > > +ovn-nbctl lsp-set-type ln-outside localnet
>> > > +ovn-nbctl lsp-set-options ln-outside network_name=phys
>> > > +
>> > > +# Allow some time for ovn-northd and ovn-controller to catch up.
>> > > +# XXX This should be more systematic.
>> > > +ovn-nbctl --wait=hv --timeout=3 sync
>> > > +
>> > > +# Check that there is a logical flow in logical switch foo's pipeline
>> > > +# to set the outport to rp-foo (which is expected).
>> > > +OVS_WAIT_UNTIL([test 1 = `ovn-sbctl dump-flows foo | grep ls_in_l2_lkup | \
>> > > +grep rp-foo | grep -v is_chassis_resident | wc -l`])
>> > > +
>> > > +# Set the option 'reside-on-redirect-chassis' for foo
>> > > +ovn-nbctl set logical_router_port foo options:reside-on-redirect-chassis=true
>> > > +# Check that there is a logical flow in logical switch foo's pipeline
>> > > +# to set the outport to rp-foo with the condition is_chassis_resident.
>> > > +ovn-sbctl dump-flows foo
>> > > +OVS_WAIT_UNTIL([test 1 = `ovn-sbctl dump-flows foo | grep ls_in_l2_lkup | \
>> > > +grep rp-foo | grep is_chassis_resident | wc -l`])
>> > > +
>> > > +echo "---------NB dump-----"
>> > > +ovn-nbctl show
>> > > +echo "---------------------"
>> > > +ovn-nbctl list logical_router
>> > > +echo "---------------------"
>> > > +ovn-nbctl list nat
>> > > +echo "---------------------"
>> > > +ovn-nbctl list logical_router_port
>> > > +echo "---------------------"
>> > > +
>> > > +echo "---------SB dump-----"
>> > > +ovn-sbctl list datapath_binding
>> > > +echo "---------------------"
>> > > +ovn-sbctl list port_binding
>> > > +echo "---------------------"
>> > > +ovn-sbctl dump-flows
>> > > +echo "---------------------"
>> > > +ovn-sbctl list chassis
>> > > +echo "---------------------"
>> > > +
>> > > +for chassis in hv1 hv2 hv3; do
>> > > +    as $chassis
>> > > +    echo "------ $chassis dump ----------"
>> > > +    ovs-vsctl show br-int
>> > > +    ovs-ofctl show br-int
>> > > +    ovs-ofctl dump-flows br-int
>> > > +    echo "--------------------------"
>> > > +done
>> > > +
>> > > +ip_to_hex() {
>> > > +    printf "%02x%02x%02x%02x" "$@"
>> > > +}
>> > > +
>> > > +foo1_ip=$(ip_to_hex 192 168 1 2)
>> > > +gw_ip=$(ip_to_hex 172 16 1 6)
>> > > +dst_ip=$(ip_to_hex 8 8 8 8)
>> > > +nexthop_ip=$(ip_to_hex 172 16 1 1)
>> > > +
>> > > +foo1_mac="f00000010203"
>> > > +foo_mac="000001010203"
>> > > +gw_mac="000002010203"
>> > > +nexthop_mac="f00000010204"
>> > > +
>> > > +# Send ip packet from foo1 to 8.8.8.8
>> > > +src_mac="f00000010203"
>> > > +dst_mac="000001010203"
>> > > +packet=${foo_mac}${foo1_mac}08004500001c0000000040110000${foo1_ip}${dst_ip}0035111100080000
>> > > +
>> > > +as hv1 ovs-appctl netdev-dummy/receive hv1-vif1 $packet
>> > > +sleep 2
>> > > +
>> > > +# ARP request packet for nexthop_ip to expect at outside1
>> > > +arp_request=ffffffffffff${gw_mac}08060001080006040001${gw_mac}${gw_ip}000000000000${nexthop_ip}
>> > > +echo $arp_request >> hv3-vif1.expected
>> > > +cat hv3-vif1.expected > expout
>> > > +$PYTHON "$top_srcdir/utilities/ovs-pcap.in" hv3/vif1-tx.pcap | grep ${nexthop_ip} | uniq > hv3-vif1
>> > > +AT_CHECK([sort hv3-vif1], [0], [expout])
>> > > +
>> > > +# Send ARP reply from outside1 back to the router
>> > > +reply_mac="f00000010204"
>> > > +arp_reply=${gw_mac}${nexthop_mac}08060001080006040002${nexthop_mac}${nexthop_ip}${gw_mac}${gw_ip}
>> > > +
>> > > +as hv3 ovs-appctl netdev-dummy/receive hv3-vif1 $arp_reply
>> > > +OVS_WAIT_UNTIL([
>> > > +    test `as hv2 ovs-ofctl dump-flows br-int | grep table=66 | \
>> > > +grep actions=mod_dl_dst:f0:00:00:01:02:04 | wc -l` -eq 1
>> > > +    ])
>> > > +
>> > > +# VLAN tagged packet with router port(192.168.1.1) MAC as destination MAC
>> > > +# is expected on bridge connecting hv1 and hv2
>> > > +expected=${foo_mac}${foo1_mac}8100000208004500001c0000000040110000${foo1_ip}${dst_ip}0035111100080000
>> > > +echo $expected > hv1-br-ex_n2.expected
>> > > +
>> > > +# Packet to Expect at outside1 i.e nexthop(172.16.1.1) port.
>> > > +# As connection tracking not enabled for this test, snat can't be done on the packet.
>> > > +# We still see foo1 as the source ip address. But source mac(gateway MAC) and
>> > > +# dest mac(nexthop mac) are properly configured.
>> > > +expected=${nexthop_mac}${gw_mac}08004500001c000000003f110100${foo1_ip}${dst_ip}0035111100080000
>> > > +echo $expected > hv3-vif1.expected
>> > > +
>> > > +reset_pcap_file() {
>> > > +    local iface=$1
>> > > +    local pcap_file=$2
>> > > +    ovs-vsctl -- set Interface $iface options:tx_pcap=dummy-tx.pcap \
>> > > +options:rxq_pcap=dummy-rx.pcap
>> > > +    rm -f ${pcap_file}*.pcap
>> > > +    ovs-vsctl -- set Interface $iface options:tx_pcap=${pcap_file}-tx.pcap \
>> > > +options:rxq_pcap=${pcap_file}-rx.pcap
>> > > +}
>> > > +
>> > > +as hv1 reset_pcap_file br-ex_n2 hv1/br-ex_n2
>> > > +as hv3 reset_pcap_file hv3-vif1 hv3/vif1
>> > > +sleep 2
>> > > +as hv1 ovs-appctl netdev-dummy/receive hv1-vif1 $packet
>> > > +sleep 2
>> > > +
>> > > +# On hv1, the packet should not go from vlan switch pipeline to router
>> > > +# pipeline
>> > > +as hv1 ovs-ofctl dump-flows br-int
>> > > +
>> > > +AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=65 | grep "priority=100,reg15=0x1,metadata=0x2" \
>> > > +| grep actions=clone | grep -v n_packets=0 | wc -l], [0], [[0
>> > > +]])
>> > > +
>> > > +# On hv1, table 32 check that no packet goes via the tunnel port
>> > > +AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=32 \
>> > > +| grep "NXM_NX_TUN_ID" | grep -v n_packets=0 | wc -l], [0], [[0
>> > > +]])
>> > > +
>> > > +ip_packet() {
>> > > +    grep "1010203f00000010203"
>> > > +}
>> > > +
>> > > +# Check vlan tagged packet on the bridge connecting hv1 and hv2 with the
>> > > +# foo1's mac.
>> > > +$PYTHON "$top_srcdir/utilities/ovs-pcap.in" hv1/br-ex_n2-tx.pcap | ip_packet | uniq > hv1-br-ex_n2
>> > > +cat hv1-br-ex_n2.expected > expout
>> > > +AT_CHECK([sort hv1-br-ex_n2], [0], [expout])
>> > > +
>> > > +# Check expected packet on nexthop interface
>> > > +$PYTHON "$top_srcdir/utilities/ovs-pcap.in" hv3/vif1-tx.pcap | grep ${foo1_ip}${dst_ip} | uniq > hv3-vif1
>> > > +cat hv3-vif1.expected > expout
>> > > +AT_CHECK([sort hv3-vif1], [0], [expout])
>> > > +
>> > > +OVN_CLEANUP([hv1],[hv2],[hv3])
>> > > +AT_CLEANUP
>> > > +
>> > >   AT_SETUP([ovn -- IPv6 ND Router Solicitation responder])
>> > >   AT_KEYWORDS([ovn-nd_ra])
>> > >   AT_SKIP_IF([test $HAVE_PYTHON = no])
>> > >
>> >
>> >
>>
>
Ankur Sharma Nov. 8, 2018, 10:52 a.m. UTC | #6
Hi Numan,

Thanks for getting back on my comments.
Please find my reply inline.

Just summarizing the main points that I have:

CONCERN:
a. I am not comfortable with referring to a NOT CONNECTED logical router port in the logical switch pipeline.


SUGGESTED ALTERNATIVE APPROACH:
a. The ideal way is to use a centralized router.

b. If a. is not possible from the OpenStack (or other CMS) side, then we can just convert the patch port pair between the VLAN ls and lr to type l3gateway.
    That way, packets originating from the VLAN logical switch will exercise the L3 pipeline only on the redirect chassis.


The intention is not to block the patch :). If the rest of the community is fine with the approach in this series, then please feel free to go ahead.
I just wanted to raise the concern.


Thanks

Regards,
Ankur
Gurucharan Shetty Nov. 15, 2018, 1:10 p.m. UTC | #7
On Fri, 5 Oct 2018 at 10:15, <nusiddiq@redhat.com> wrote:

> From: Numan Siddique <nusiddiq@redhat.com>
>
> An OVN deployment can have multiple logical switches each with a
> localnet port connected to a distributed logical router with one
> logical router port providing external connectivity (provider network)
> and others used as tenant networks with VLAN tagging.
>
> As reported in [1], external traffic from these VLAN tenant networks
> is tunnelled to the gateway chassis (chassis hosting a distributed
> gateway port which applies NAT rules). As part of the discussion in
> [1], there were a few possible solutions proposed by Russell [2]. This
> patch implements the first option in [2].
>
> With this patch, a new option 'reside-on-redirect-chassis' in 'options'
> column of Logical_Router_Port table is added. If the value of this
> option is set to 'true' and if the logical router also has a
> distributed gateway port, then routing for this logical router port
> is centralized in the chassis hosting the distributed gateway port.
>
> If a logical switch 'sw0' is connected to a router 'lr0' with the
> router port - 'lr0-sw0' with the address - "00:00:00:00:af:12 192.168.1.1",
> and it has a distributed gateway port - 'lr0-public', then the
> below logical flow is added in the logical switch pipeline
> of 'sw0' if the 'reside-on-redirect-chassis' option is set on 'lr0-sw0' -
>
> table=16(ls_in_l2_lkup), priority=50,
> match=(eth.dst == 00:00:00:00:af:12 &&
> is_chassis_resident("cr-lr0-public")),
> action=(outport = "sw0-lr0"; output;)
>

Where does the "cr" come from above?
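
For reference while reading the flow quoted above, here is a minimal C sketch of how the match string seems to be assembled, following the conditions in the ovn-northd.c hunk of this patch. It is not the ovn-northd code itself: `struct lrp_view`, `needs_chassis_resident_check()` and `format_l2_lkup_match()` are hypothetical names made up for illustration, and the "cr-" prefix is an assumption read off the example, where the distributed gateway port 'lr0-public' appears as "cr-lr0-public" (the chassis-redirect port derived from it).

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, flattened view of what the patch consults on the peer
 * logical router port of a switch port of type "router". */
struct lrp_view {
    bool router_has_dgw_port;      /* od->l3dgw_port is set */
    bool router_has_redirect_port; /* od->l3redirect_port is set */
    bool switch_has_localnet;      /* connected switch has a localnet port */
    bool is_dgw_port;              /* this port is the distributed gateway port */
    bool reside_on_redirect;       /* options:reside-on-redirect-chassis=true */
};

/* Mirrors the nested condition in the patch: the outer test must hold,
 * then either the port is the distributed gateway port itself or the
 * new option is set on it. */
static bool
needs_chassis_resident_check(const struct lrp_view *p)
{
    if (!p->router_has_dgw_port || !p->router_has_redirect_port
        || !p->switch_has_localnet) {
        return false;
    }
    return p->is_dgw_port || p->reside_on_redirect;
}

/* Builds the ls_in_l2_lkup match. The "cr-" prefix on the gateway port
 * name is an assumption taken from the example flow in the commit message. */
static int
format_l2_lkup_match(char *buf, size_t len, const char *mac,
                     const char *dgw_port_name, const struct lrp_view *p)
{
    if (needs_chassis_resident_check(p)) {
        return snprintf(buf, len,
                        "eth.dst == %s && is_chassis_resident(\"cr-%s\")",
                        mac, dgw_port_name);
    }
    return snprintf(buf, len, "eth.dst == %s", mac);
}
```

With the values from the commit message (MAC 00:00:00:00:af:12, gateway port lr0-public, option set on a router with a gateway port and a switch with a localnet port), the sketch produces the same match as the quoted flow. Without the option, the match is the plain eth.dst comparison and the flow is installed on every chassis.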

>
> With the above flow, the packet doesn't enter the router pipeline in
> the source chassis. Instead the packet is sent out via the localnet
> port of 'sw0'. The gateway chassis upon receiving this packet, runs
> the logical router pipeline applying NAT rules and sends the traffic
> out via the localnet port of the provider network. The gateway
> chassis will also reply to the ARP requests for the router port IPs.
>
> With this approach, we avoid redirecting the external traffic to the
> gateway chassis via the tunnel port. There are a couple of drawbacks
> with this approach:
>
>   - East-West routing is no longer distributed for the VLAN tenant
>     networks if 'reside-on-redirect-chassis' option is defined
>
>   - 'dnat_and_snat' NAT rules with 'logical_mac' and 'logical_port'
>     columns defined will not work for the VLAN tenant networks.
>
> This approach is taken for now as it is simple. If there is a requirement
> to support distributed routing for these VLAN tenant networks, we
> can explore other possible solutions.
>
> [1] -
> https://mail.openvswitch.org/pipermail/ovs-discuss/2018-April/046543.html
> [2] -
> https://mail.openvswitch.org/pipermail/ovs-discuss/2018-April/046557.html
>
> Reported-at:
> https://mail.openvswitch.org/pipermail/ovs-discuss/2018-April/046543.html
> Reported-by: venkata anil <vkommadi@redhat.com>
> Co-authored-by: venkata anil <vkommadi@redhat.com>
> Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
> Signed-off-by: venkata anil <vkommadi@redhat.com>
>

Acked-by: Gurucharan Shetty <guru@ovn.org>

Thank you for the extensive documentation. Much of it feels obvious now
that I have read it, but for non-users of localnet the memory had faded
enough that the context was hard to follow beforehand.

A couple of comments on the documentation below.


> ---
>  ovn/northd/ovn-northd.8.xml |  30 ++++
>  ovn/northd/ovn-northd.c     |  71 +++++++---
>  ovn/ovn-architecture.7.xml  | 160 +++++++++++++++++++++
>  ovn/ovn-nb.xml              |  43 ++++++
>  tests/ovn.at                | 273 ++++++++++++++++++++++++++++++++++++
>  5 files changed, 561 insertions(+), 16 deletions(-)
>
> diff --git a/ovn/northd/ovn-northd.8.xml b/ovn/northd/ovn-northd.8.xml
> index 7352c6764..f52699bd3 100644
> --- a/ovn/northd/ovn-northd.8.xml
> +++ b/ovn/northd/ovn-northd.8.xml
> @@ -874,6 +874,25 @@ output;
>              resident.
>            </li>
>          </ul>
> +
> +        <p>
> +          For the Ethernet address on a logical switch port of type
> +          <code>router</code>, when that logical switch port's
> +          <ref column="addresses" table="Logical_Switch_Port"
> +          db="OVN_Northbound"/> column is set to <code>router</code> and
> +          the connected logical router port specifies a
> +          <code>reside-on-redirect-chassis</code> and the logical router
> +          to which the connected logical router port belongs to has a
> +          <code>redirect-chassis</code> distributed gateway logical router
> +          port:
> +        </p>
> +
> +        <ul>
> +          <li>
> +            The flow for the connected logical router port's Ethernet
> +            address is only programmed on the
> <code>redirect-chassis</code>.
> +          </li>
> +        </ul>
>        </li>
>
>        <li>
> @@ -1179,6 +1198,17 @@ output;
>            upstream MAC learning to point to the
>            <code>redirect-chassis</code>.
>          </p>
> +
> +        <p>
> +          For the logical router port with the option
> +          <code>reside-on-redirect-chassis</code> set (which is
> centralized),
> +          the above flows are only programmed on the gateway port
> instance on
> +          the <code>redirect-chassis</code> (if the logical router has a
> +          distributed gateway port). This behavior avoids generation
> +          of multiple ARP responses from different chassis, and allows
> +          upstream MAC learning to point to the
> +          <code>redirect-chassis</code>.
> +        </p>
>        </li>
>
>        <li>
> diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
> index 31ea5f410..3998a898c 100644
> --- a/ovn/northd/ovn-northd.c
> +++ b/ovn/northd/ovn-northd.c
> @@ -4426,13 +4426,32 @@ build_lswitch_flows(struct hmap *datapaths, struct
> hmap *ports,
>                  ds_put_format(&match, "eth.dst == "ETH_ADDR_FMT,
>                                ETH_ADDR_ARGS(mac));
>                  if (op->peer->od->l3dgw_port
> -                    && op->peer == op->peer->od->l3dgw_port
> -                    && op->peer->od->l3redirect_port) {
> -                    /* The destination lookup flow for the router's
> -                     * distributed gateway port MAC address should only be
> -                     * programmed on the "redirect-chassis". */
> -                    ds_put_format(&match, " && is_chassis_resident(%s)",
> -
> op->peer->od->l3redirect_port->json_key);
> +                    && op->peer->od->l3redirect_port
> +                    && op->od->localnet_port) {
> +                    bool add_chassis_resident_check = false;
> +                    if (op->peer == op->peer->od->l3dgw_port) {
> +                        /* The peer of this port represents a distributed
> +                         * gateway port. The destination lookup flow for
> the
> +                         * router's distributed gateway port MAC address
> should
> +                         * only be programmed on the "redirect-chassis".
> */
> +                        add_chassis_resident_check = true;
> +                    } else {
> +                        /* Check if the option
> 'reside-on-redirect-chassis'
> +                         * is set to true on the peer port. If set to true
> +                         * and if the logical switch has a localnet port,
> it
> +                         * means the router pipeline for the packets from
> +                         * this logical switch should be run on the
> chassis
> +                         * hosting the gateway port.
> +                         */
> +                        add_chassis_resident_check = smap_get_bool(
> +                            &op->peer->nbrp->options,
> +                            "reside-on-redirect-chassis", false);
> +                    }
> +
> +                    if (add_chassis_resident_check) {
> +                        ds_put_format(&match, " &&
> is_chassis_resident(%s)",
> +
> op->peer->od->l3redirect_port->json_key);
> +                    }
>                  }
>
>                  ds_clear(&actions);
> @@ -5197,15 +5216,35 @@ build_lrouter_flows(struct hmap *datapaths, struct
> hmap *ports,
>                            op->lrp_networks.ipv4_addrs[i].network_s,
>                            op->lrp_networks.ipv4_addrs[i].plen,
>                            op->lrp_networks.ipv4_addrs[i].addr_s);
> -            if (op->od->l3dgw_port && op == op->od->l3dgw_port
> -                && op->od->l3redirect_port) {
> -                /* Traffic with eth.src = l3dgw_port->lrp_networks.ea_s
> -                 * should only be sent from the "redirect-chassis", so
> that
> -                 * upstream MAC learning points to the "redirect-chassis".
> -                 * Also need to avoid generation of multiple ARP responses
> -                 * from different chassis. */
> -                ds_put_format(&match, " && is_chassis_resident(%s)",
> -                              op->od->l3redirect_port->json_key);
> +
> +            if (op->od->l3dgw_port && op->od->l3redirect_port && op->peer
> +                && op->peer->od->localnet_port) {
> +                bool add_chassis_resident_check = false;
> +                if (op == op->od->l3dgw_port) {
> +                    /* Traffic with eth.src =
> l3dgw_port->lrp_networks.ea_s
> +                     * should only be sent from the "redirect-chassis",
> so that
> +                     * upstream MAC learning points to the
> "redirect-chassis".
> +                     * Also need to avoid generation of multiple ARP
> responses
> +                     * from different chassis. */
> +                    add_chassis_resident_check = true;
> +                } else {
> +                    /* Check if the option 'reside-on-redirect-chassis'
> +                     * is set to true on the router port. If set to true
> +                     * and if peer's logical switch has a localnet port,
> it
> +                     * means the router pipeline for the packets from
> +                     * peer's logical switch is be run on the chassis
> +                     * hosting the gateway port and it should reply to the
> +                     * ARP requests for the router port IPs.
> +                     */
> +                    add_chassis_resident_check = smap_get_bool(
> +                        &op->nbrp->options,
> +                        "reside-on-redirect-chassis", false);
> +                }
> +
> +                if (add_chassis_resident_check) {
> +                    ds_put_format(&match, " && is_chassis_resident(%s)",
> +                                  op->od->l3redirect_port->json_key);
> +                }
>              }
>
>              ds_clear(&actions);
> diff --git a/ovn/ovn-architecture.7.xml b/ovn/ovn-architecture.7.xml
> index 6ed2cf132..998470c34 100644
> --- a/ovn/ovn-architecture.7.xml
> +++ b/ovn/ovn-architecture.7.xml
> @@ -1372,6 +1372,166 @@
>      http://docs.openvswitch.org/en/latest/topics/high-availability.
>    </p>
>
> +  <h2>Tenant VLAN networks connected to a Logical Router</h2>
>
I think we should say "Localnet VLAN networks" above. The word "tenant"
feels a little confusing. The use of the word "tenant" elsewhere should also
change.
Alternatively, we can explain here what a "VLAN tenant network" and a
"provider network" are and not worry about the later uses.


> +
> +  <p>
> +    It is possible to have multiple logical switches each with a localnet
> port
> +    (representing physical networks) connected to a logical router in
> which one
> +    may provide the external connectivity via a distributed gatewat port
> and
>
s/gatewat/gateway/

> +    the rest of them are used internally (with VLAN tagged). It is expected
>
s/"the rest of them are used internally (with VLAN tagged)"/"the rest of
the connectivity is achieved via VLAN tagging in the physical network"/

> +    that <code>ovn-bridge-mappings</code> is configured appropriately on
> the
> +    chassis.
> +  </p>
> +
> +  <h3>East West routing</h3>
> +  <p>
> +    East-West routing between these tenant VLAN logical switches works
> almost
> +    the same way as normal logical switches. When the VM sends such a
> packet,
> +    then:
> +  </p>
> +  <ol>
> +    <li>
> +      The packet enters the ingress pipeline of the logical router
> datapath
> +      via the logical router port in the source chassis.
> +    </li>
> +
> +    <li>
> +      Routing decision is taken.
> +    </li>
> +
> +    <li>
> +      The packet goes out of the integration bridge to the provider
> bridge (
> +      belonging to the destination logical switch) via the localnet port.
>
Doesn't this happen after going through the destination logical switch (and
not immediately after the router)?



> +    </li>
> +
> +    <li>
> +      The destination chassis receives the packet via the localnet port
> +      and delivers to the destination VM.
>
It may be worth mentioning that it goes through the ingress and egress
pipelines of the destination logical switch again.


> +    </li>
> +  </ol>
> +
> +  <h3>External traffic</h3>
> +
> +  <p>
> +    The following happens when a VM sends an external traffic (which
> requires
> +    NATting) and the chassis hosting the VM doesn't have a distributed
> gateway
> +    port.
> +  </p>
> +
> +  <ol>
> +    <li>
> +      The packet enters the ingress pipeline of the logical router
> datapath
> +      via the logical router port in the source chassis.
> +    </li>
> +
> +    <li>
> +      Routing decision is taken. Since the gateway router or the
> distributed
> +      gateway port doesn't reside in the source chassis, the traffic is
> +      redirected to the gateway chassis via the tunnel port.
> +    </li>
> +
> +    <li>
> +      The gateway chassis receives the packet, applies the NAT rules and
> +      forwards it via the localnet port.
> +    </li>
> +  </ol>
> +
> +  <p>
> +    Although this works, the VM traffic is tunnelled. In order for it to
> +    work properly, the MTU of the VLAN tenant networks must be lowered to
> +    account for the tunnel encapsulation.
> +  </p>
> +
> +  <h2>Centralized routing for VLAN tenant networks</h2>
> +
> +  <p>
> +    To overcome the tunnel encapsulation problem described in the previous
> +    section, <code>OVN</code> supports the option of enabling centralized
> +    routing for VLAN tenant networks. CMS can configure the option
> +    <ref column="options:reside-on-redirect-chassis"
> +    table="Logical_Router_Port" db="OVN_NB"/> to <code>true</code> for
> each
> +    <ref table="Logical_Router_Port" db="OVN_NB"/> which connects to the
> +    logical switch of the VLAN tenant network. This causes the gateway
> +    chassis (hosting the distributed gateway port) to handle all the
> +    routing for these networks, making it centralized. It will reply to
> +    the ARP requests for the logical router port IPs.
> +  </p>
> +
> +  <p>
> +    If the logical router doesn't have a distributed gateway port
> connecting
> +    to the provider network, then this option is ignored by
> <code>OVN</code>.
> +  </p>
> +
> +  <p>
> +    The following happens when a VM sends an east-west traffic which
> needs to
> +    be routed:
> +  </p>
> +
> +  <ol>
> +    <li>
> +      The packet from the VM enters the logical datapath pipeline of the
> source
> +      VLAN network in the source chassis and is sent out via the localnet
> port
> +      (instead of sending it to router pipeline).
> +    </li>
> +
> +    <li>
> +      The packet enters the logical datapath pipeline of the source VLAN
> +      network in the gateway chassis and is sent to the logical datapath
> +      pipeline belonging to the logical router.
> +    </li>
> +
> +    <li>
> +      Routing decision is taken.
> +    </li>
> +
> +    <li>
> +      The packet enters the logical datapath pipeline of the destination
> +      VLAN network. The packet is delivered to the destination VM if it
> resides
> +      in the same chassis. Otherwise the packet is sent out via the
> localnet
> +      port of the destination VLAN network.
> +    </li>
> +
> +    <li>
> +      The destination chassis receives the packet via the localnet port
> +      and delivers to the destination VM.
> +    </li>
> +  </ol>
> +
> +  <p>
> +    The following happens when a VM sends an external traffic which
> requires
> +    NATting:
> +  </p>
> +
> +  <ol>
> +    <li>
> +      The packet from the VM enters the logical datapath pipeline of the
> source
> +      VLAN network in the source chassis and is sent out via the localnet
> port
> +      (instead of sending it to router pipeline).
> +    </li>
> +
> +    <li>
> +      The packet enters the logical datapath pipeline of the source VLAN
> +      network in the gateway chassis and is sent to the logical datapath
> +      pipeline belonging to the logical router.
> +    </li>
> +
> +    <li>
> +      Routing decision is taken and NAT rules are applied.
> +    </li>
> +
> +    <li>
> +      The packet enters the logical datapath pipeline of the provider
> network
> +      and is sent out via the localnet port of the provider network.
> +    </li>
> +  </ol>
> +
> +  <p>
> +    For the reverse external traffic, the gateway chassis applies the
> unNATting
> +    rules and sends the packet via the localnet port of the VLAN tenant
> +    network and the destination chassis receives the packet and delivers
> to
> +    the VM.
> +  </p>
> +
>    <h2>Life Cycle of a VTEP gateway</h2>
>
>    <p>
> diff --git a/ovn/ovn-nb.xml b/ovn/ovn-nb.xml
> index 8564ed39c..13ae56e13 100644
> --- a/ovn/ovn-nb.xml
> +++ b/ovn/ovn-nb.xml
> @@ -1635,6 +1635,49 @@
>            chassis to enable high availability.
>          </p>
>        </column>
> +
> +      <column name="options" key="reside-on-redirect-chassis">
> +        <p>
> +          Generally routing is distributed in <code>OVN</code>. The packet
> +          from a logical port which needs to be routed hits the router
> pipeline
> +          in the source chassis. For the East-West traffic, the packet is
> +          sent directly to the destination chassis. For the outside
> traffic
> +          the packet is sent to the gateway chassis.
> +        </p>
> +
> +        <p>
> +          When this option is set, <code>OVN</code> considers this only if
> +        </p>
> +
> +        <ul>
> +          <li>
> +            The logical router to which this logical router port belongs
> to
> +            has a distributed gateway port.
> +          </li>
> +
> +          <li>
> +            The peer's logical switch has a localnet port (representing
> +            a tenant VLAN network)
> +          </li>
> +        </ul>
> +
> +        <p>
> +          When this option is set to <code>true</code>, then the packet
> +          which needs to be routed hits the router pipeline in the chassis
> +          hosting the distributed gateway router port. The source chassis
> +          pushes out this traffic via the localnet port. With this the
> +          East-West traffic is no more distributed and will always go
> through
> +          the gateway chassis.
> +        </p>
> +
> +        <p>
> +          Without this option set, for any traffic destined to outside
> from a
> +          logical port which belongs to a logical switch with localnet
> port,
> +          the source chassis will send the traffic to the gateway chassis
> via
> +          the tunnel port instead of the localnet port and this could
> cause MTU
> +          issues.
> +        </p>
> +      </column>
>      </group>
>
>      <group title="Attachment">
> diff --git a/tests/ovn.at b/tests/ovn.at
> index 769e09f81..504ba228d 100644
> --- a/tests/ovn.at
> +++ b/tests/ovn.at
> @@ -8537,6 +8537,279 @@ OVN_CLEANUP([hv1],[hv2],[hv3])
>
>  AT_CLEANUP
>
> +# VLAN traffic for external network redirected through distributed router
> +# gateway port should use vlans(i.e input network vlan tag) across
> hypervisors
> +# instead of tunneling.
> +AT_SETUP([ovn -- vlan traffic for external network with distributed
> router gateway port])
> +AT_SKIP_IF([test $HAVE_PYTHON = no])
> +ovn_start
> +
> +# Logical network:
> +# # One LR R1 that has switches foo (192.168.1.0/24) and
> +# # alice (172.16.1.0/24) connected to it.  The logical port
> +# # between R1 and alice has a "redirect-chassis" specified,
> +# # i.e. it is the distributed router gateway port(172.16.1.6).
> +# # Switch alice also has a localnet port defined.
> +# # An additional switch outside has the same subnet as alice
> +# # (172.16.1.0/24), a localnet port and nexthop port(172.16.1.1)
> +# # which will receive the packet destined for external network
> +# # (i.e 8.8.8.8 as destination ip).
> +
> +# Physical network:
> +# # Three hypervisors hv[123].
> +# # hv1 hosts vif foo1.
> +# # hv2 is the "redirect-chassis" that hosts the distributed router
> gateway port.
> +# # hv3 hosts nexthop port vif outside1.
> +# # All other tests connect hypervisors to network n1 through br-phys for
> tunneling.
> +# # But in this test, hv1 won't connect to n1(and no br-phys in hv1), and
> +# # in order to show vlans(instead of tunneling) used between hv1 and hv2,
> +# # a new network n2 created and hv1 and hv2 connected to this network
> through br-ex.
> +# # hv2 and hv3 are still connected to n1 network through br-phys.
> +net_add n1
> +
> +# We are not calling ovn_attach for hv1, to avoid adding br-phys.
> +# Tunneling won't work in hv1 as ovn-encap-ip is not added to any bridge
> in hv1
> +sim_add hv1
> +as hv1
> +ovs-vsctl \
> +    -- set Open_vSwitch . external-ids:system-id=hv1 \
> +    -- set Open_vSwitch .
> external-ids:ovn-remote=unix:$ovs_base/ovn-sb/ovn-sb.sock \
> +    -- set Open_vSwitch . external-ids:ovn-encap-type=geneve,vxlan \
> +    -- set Open_vSwitch . external-ids:ovn-encap-ip=192.168.0.1 \
> +    -- add-br br-int \
> +    -- set bridge br-int fail-mode=secure
> other-config:disable-in-band=true \
> +    -- set Open_vSwitch . external-ids:ovn-bridge-mappings=public:br-ex
> +
> +start_daemon ovn-controller
> +ovs-vsctl -- add-port br-int hv1-vif1 -- \
> +    set interface hv1-vif1 external-ids:iface-id=foo1 \
> +    ofport-request=1
> +
> +sim_add hv2
> +as hv2
> +ovs-vsctl add-br br-phys
> +ovn_attach n1 br-phys 192.168.0.2
> +ovs-vsctl set Open_vSwitch .
> external-ids:ovn-bridge-mappings="public:br-ex,phys:br-phys"
> +
> +sim_add hv3
> +as hv3
> +ovs-vsctl add-br br-phys
> +ovn_attach n1 br-phys 192.168.0.3
> +ovs-vsctl -- add-port br-int hv3-vif1 -- \
> +    set interface hv3-vif1 external-ids:iface-id=outside1 \
> +    options:tx_pcap=hv3/vif1-tx.pcap \
> +    options:rxq_pcap=hv3/vif1-rx.pcap \
> +    ofport-request=1
> +ovs-vsctl set Open_vSwitch .
> external-ids:ovn-bridge-mappings="phys:br-phys"
> +
> +# Create network n2 for vlan connectivity between hv1 and hv2
> +net_add n2
> +
> +as hv1
> +ovs-vsctl add-br br-ex
> +net_attach n2 br-ex
> +
> +as hv2
> +ovs-vsctl add-br br-ex
> +net_attach n2 br-ex
> +
> +OVN_POPULATE_ARP
> +
> +ovn-nbctl create Logical_Router name=R1
> +
> +ovn-nbctl ls-add foo
> +ovn-nbctl ls-add alice
> +ovn-nbctl ls-add outside
> +
> +# Connect foo to R1
> +ovn-nbctl lrp-add R1 foo 00:00:01:01:02:03 192.168.1.1/24
> +ovn-nbctl lsp-add foo rp-foo -- set Logical_Switch_Port rp-foo \
> +    type=router options:router-port=foo \
> +    -- lsp-set-addresses rp-foo router
> +
> +# Connect alice to R1 as distributed router gateway port (172.16.1.6) on
> hv2
> +ovn-nbctl lrp-add R1 alice 00:00:02:01:02:03 172.16.1.6/24 \
> +    -- set Logical_Router_Port alice options:redirect-chassis="hv2"
> +ovn-nbctl lsp-add alice rp-alice -- set Logical_Switch_Port rp-alice \
> +    type=router options:router-port=alice \
> +    -- lsp-set-addresses rp-alice router \
> +
> +# Create logical port foo1 in foo
> +ovn-nbctl lsp-add foo foo1 \
> +-- lsp-set-addresses foo1 "f0:00:00:01:02:03 192.168.1.2"
> +
> +# Create logical port outside1 in outside, which is a nexthop address
> +# for 172.16.1.0/24
> +ovn-nbctl lsp-add outside outside1 \
> +-- lsp-set-addresses outside1 "f0:00:00:01:02:04 172.16.1.1"
> +
> +# Set default gateway (nexthop) to 172.16.1.1
> +ovn-nbctl lr-route-add R1 "0.0.0.0/0" 172.16.1.1 alice
> +AT_CHECK([ovn-nbctl lr-nat-add R1 snat 172.16.1.6 192.168.1.1/24])
> +ovn-nbctl set Logical_Switch_Port rp-alice options:nat-addresses=router
> +
> +ovn-nbctl lsp-add foo ln-foo
> +ovn-nbctl lsp-set-addresses ln-foo unknown
> +ovn-nbctl lsp-set-options ln-foo network_name=public
> +ovn-nbctl lsp-set-type ln-foo localnet
> +AT_CHECK([ovn-nbctl set Logical_Switch_Port ln-foo tag=2])
> +
> +# Create localnet port in alice
> +ovn-nbctl lsp-add alice ln-alice
> +ovn-nbctl lsp-set-addresses ln-alice unknown
> +ovn-nbctl lsp-set-type ln-alice localnet
> +ovn-nbctl lsp-set-options ln-alice network_name=phys
> +
> +# Create localnet port in outside
> +ovn-nbctl lsp-add outside ln-outside
> +ovn-nbctl lsp-set-addresses ln-outside unknown
> +ovn-nbctl lsp-set-type ln-outside localnet
> +ovn-nbctl lsp-set-options ln-outside network_name=phys
> +
> +# Allow some time for ovn-northd and ovn-controller to catch up.
> +# XXX This should be more systematic.
> +ovn-nbctl --wait=hv --timeout=3 sync
> +
> +# Check that there is a logical flow in logical switch foo's pipeline
> +# to set the outport to rp-foo (which is expected).
> +OVS_WAIT_UNTIL([test 1 = `ovn-sbctl dump-flows foo | grep ls_in_l2_lkup |
> \
> +grep rp-foo | grep -v is_chassis_resident | wc -l`])
> +
> +# Set the option 'reside-on-redirect-chassis' for foo
> +ovn-nbctl set logical_router_port foo
> options:reside-on-redirect-chassis=true
> +# Check that there is a logical flow in logical switch foo's pipeline
> +# to set the outport to rp-foo with the condition is_chassis_redirect.
> +ovn-sbctl dump-flows foo
> +OVS_WAIT_UNTIL([test 1 = `ovn-sbctl dump-flows foo | grep ls_in_l2_lkup |
> \
> +grep rp-foo | grep is_chassis_resident | wc -l`])
> +
> +echo "---------NB dump-----"
> +ovn-nbctl show
> +echo "---------------------"
> +ovn-nbctl list logical_router
> +echo "---------------------"
> +ovn-nbctl list nat
> +echo "---------------------"
> +ovn-nbctl list logical_router_port
> +echo "---------------------"
> +
> +echo "---------SB dump-----"
> +ovn-sbctl list datapath_binding
> +echo "---------------------"
> +ovn-sbctl list port_binding
> +echo "---------------------"
> +ovn-sbctl dump-flows
> +echo "---------------------"
> +ovn-sbctl list chassis
> +echo "---------------------"
> +
> +for chassis in hv1 hv2 hv3; do
> +    as $chassis
> +    echo "------ $chassis dump ----------"
> +    ovs-vsctl show br-int
> +    ovs-ofctl show br-int
> +    ovs-ofctl dump-flows br-int
> +    echo "--------------------------"
> +done
> +
> +ip_to_hex() {
> +    printf "%02x%02x%02x%02x" "$@"
> +}
> +
> +foo1_ip=$(ip_to_hex 192 168 1 2)
> +gw_ip=$(ip_to_hex 172 16 1 6)
> +dst_ip=$(ip_to_hex 8 8 8 8)
> +nexthop_ip=$(ip_to_hex 172 16 1 1)
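
A note for readers following the packet construction: `ip_to_hex` formats each decimal octet as two hex digits, so the values above become plain 8-character hex strings that can be spliced into the packet. A minimal standalone sketch, reusing the helper exactly as defined in the patch:

```shell
# Same helper as in the test: one "%02x" per octet, no separators.
ip_to_hex() {
    printf "%02x%02x%02x%02x" "$@"
}

ip_to_hex 192 168 1 2; echo   # prints c0a80102
ip_to_hex 172 16 1 6; echo    # prints ac100106
ip_to_hex 8 8 8 8; echo       # prints 08080808
```
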
> +
> +foo1_mac="f00000010203"
> +foo_mac="000001010203"
> +gw_mac="000002010203"
> +nexthop_mac="f00000010204"
> +
> +# Send ip packet from foo1 to 8.8.8.8
> +src_mac="f00000010203"
> +dst_mac="000001010203"
>
> +packet=${foo_mac}${foo1_mac}08004500001c0000000040110000${foo1_ip}${dst_ip}0035111100080000
> +
> +as hv1 ovs-appctl netdev-dummy/receive hv1-vif1 $packet
> +sleep 2
> +
> +# ARP request packet for nexthop_ip to expect at outside1
>
> +arp_request=ffffffffffff${gw_mac}08060001080006040001${gw_mac}${gw_ip}000000000000${nexthop_ip}
> +echo $arp_request >> hv3-vif1.expected
> +cat hv3-vif1.expected > expout
> +$PYTHON "$top_srcdir/utilities/ovs-pcap.in" hv3/vif1-tx.pcap | grep
> ${nexthop_ip} | uniq > hv3-vif1
> +AT_CHECK([sort hv3-vif1], [0], [expout])
> +
> +# Send ARP reply from outside1 back to the router
> +reply_mac="f00000010204"
>
> +arp_reply=${gw_mac}${nexthop_mac}08060001080006040002${nexthop_mac}${nexthop_ip}${gw_mac}${gw_ip}
> +
> +as hv3 ovs-appctl netdev-dummy/receive hv3-vif1 $arp_reply
> +OVS_WAIT_UNTIL([
> +    test `as hv2 ovs-ofctl dump-flows br-int | grep table=66 | \
> +grep actions=mod_dl_dst:f0:00:00:01:02:04 | wc -l` -eq 1
> +    ])
> +
> +# VLAN tagged packet with router port(192.168.1.1) MAC as destination MAC
> +# is expected on bridge connecting hv1 and hv2
>
> +expected=${foo_mac}${foo1_mac}8100000208004500001c0000000040110000${foo1_ip}${dst_ip}0035111100080000
> +echo $expected > hv1-br-ex_n2.expected
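
For anyone checking the expected frame by hand: it is the original packet with an 802.1Q tag (TPID 0x8100 followed by a 16-bit TCI carrying VLAN ID 2) inserted right after the two MAC addresses. The sketch below illustrates that layout; `add_vlan_tag` is a hypothetical helper that is not part of the patch, and it assumes the priority and DEI bits are zero.

```shell
# Hypothetical helper: insert an 802.1Q tag into a hex-encoded Ethernet frame.
# $1 = frame as a hex string (dst MAC, src MAC, EtherType, payload)
# $2 = VLAN ID in decimal (priority and DEI bits assumed to be 0)
add_vlan_tag() {
    frame=$1
    vid=$2
    macs=$(printf '%s' "$frame" | cut -c1-24)   # 12 bytes: dst + src MAC
    rest=$(printf '%s' "$frame" | cut -c25-)    # original EtherType onward
    printf '%s8100%04x%s\n' "$macs" "$vid" "$rest"
}

# foo_mac + foo1_mac + IPv4 EtherType, tagged with VLAN 2
add_vlan_tag 000001010203f000000102030800 2
# prints: 000001010203f00000010203810000020800
```
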
> +
> +# Packet to Expect at outside1 i.e nexthop(172.16.1.1) port.
> +# As connection tracking not enabled for this test, snat can't be done on
> the packet.
> +# We still see foo1 as the source ip address. But source mac(gateway MAC)
> and
> +# dest mac(nexthop mac) are properly configured.
>
> +expected=${nexthop_mac}${gw_mac}08004500001c000000003f110100${foo1_ip}${dst_ip}0035111100080000
> +echo $expected > hv3-vif1.expected
> +
> +reset_pcap_file() {
> +    local iface=$1
> +    local pcap_file=$2
> +    ovs-vsctl -- set Interface $iface options:tx_pcap=dummy-tx.pcap \
> +options:rxq_pcap=dummy-rx.pcap
> +    rm -f ${pcap_file}*.pcap
> +    ovs-vsctl -- set Interface $iface
> options:tx_pcap=${pcap_file}-tx.pcap \
> +options:rxq_pcap=${pcap_file}-rx.pcap
> +}
> +
> +as hv1 reset_pcap_file br-ex_n2 hv1/br-ex_n2
> +as hv3 reset_pcap_file hv3-vif1 hv3/vif1
> +sleep 2
> +as hv1 ovs-appctl netdev-dummy/receive hv1-vif1 $packet
> +sleep 2
> +
> +# On hv1, the packet should not go from vlan switch pipleline to router
> +# pipleine
> +as hv1 ovs-ofctl dump-flows br-int
> +
> +AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=65 | grep
> "priority=100,reg15=0x1,metadata=0x2" \
> +| grep actions=clone | grep -v n_packets=0 | wc -l], [0], [[0
> +]])
> +
> +# On hv1, table 32 check that no packet goes via the tunnel port
> +AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=32 \
> +| grep "NXM_NX_TUN_ID" | grep -v n_packets=0 | wc -l], [0], [[0
> +]])
> +
> +ip_packet() {
> +    grep "1010203f00000010203"
> +}
> +
> +# Check vlan tagged packet on the bridge connecting hv1 and hv2 with the
> +# foo1's mac.
> +$PYTHON "$top_srcdir/utilities/ovs-pcap.in" hv1/br-ex_n2-tx.pcap |
> ip_packet | uniq > hv1-br-ex_n2
> +cat hv1-br-ex_n2.expected > expout
> +AT_CHECK([sort hv1-br-ex_n2], [0], [expout])
> +
> +# Check expected packet on nexthop interface
> +$PYTHON "$top_srcdir/utilities/ovs-pcap.in" hv3/vif1-tx.pcap | grep
> ${foo1_ip}${dst_ip} | uniq > hv3-vif1
> +cat hv3-vif1.expected > expout
> +AT_CHECK([sort hv3-vif1], [0], [expout])
> +
> +OVN_CLEANUP([hv1],[hv2],[hv3])
> +AT_CLEANUP
> +
>  AT_SETUP([ovn -- IPv6 ND Router Solicitation responder])
>  AT_KEYWORDS([ovn-nd_ra])
>  AT_SKIP_IF([test $HAVE_PYTHON = no])
> --
> 2.17.1
>
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
Gurucharan Shetty Nov. 15, 2018, 6:03 p.m. UTC | #8
On Fri, 5 Oct 2018 at 10:15, <nusiddiq@redhat.com> wrote:

> From: Numan Siddique <nusiddiq@redhat.com>
>
> An OVN deployment can have multiple logical switches each with a
> localnet port connected to a distributed logical router with one
> logical router port providing external connectivity (provider network)
> and others used as tenant networks with VLAN tagging.
>
> As reported in [1], external traffic from these VLAN tenant networks
> are tunnelled to the gateway chassis (chassis hosting a distributed
> gateway port which applies NAT rules). As part of the discussion in
> [1], there were few possible solutions proposed by Russell [2]. This
> patch implements the first option in [2].
>
> With this patch, a new option 'reside-on-redirect-chassis' in 'options'
> column of Logical_Router_Port table is added. If the value of this
> option is set to 'true' and if the logical router also have a
> distributed gateway port, then routing for this logical router port
> is centralized in the chassis hosting the distributed gateway port.
>
> If a logical switch 'sw0' is connected to a router 'lr0' with the
> router port - 'lr0-sw0' with the address - "00:00:00:00:af:12 192.168.1.1"
> , and it has a distributed logical port - 'lr0-public', then the
> below logical flow is added in the logical switch pipeline
> of 'sw0' if the 'reside-on-redirect-chassis' option is set on 'lr-sw0' -
>
> table=16(ls_in_l2_lkup), priority=50,
> match=(eth.dst == 00:00:00:00:af:12 &&
> is_chassis_resident("cr-lr0-public")),
> action=(outport = "sw0-lr0"; output;)
>

Where does "cr" come from?

>
> With the above flow, the packet doesn't enter the router pipeline in
> the source chassis. Instead the packet is sent out via the localnet
> port of 'sw0'. The gateway chassis upon receiving this packet, runs
> the logical router pipeline applying NAT rules and sends the traffic
> out via the localnet port of the provider network. The gateway
> chassis will also reply to the ARP requests for the router port IPs.
>
> With this approach, we avoid redirecting the external traffic to the
> gateway chassis via the tunnel port. There are a couple of drawbacks
> with this approach:
>
>   - East - West routing is no more distributed for the VLAN tenant
>     networks if 'reside-on-redirect-chassis' option is defined
>
>   - 'dnat_and_snat' NAT rules with 'logical_mac' and 'logical_port'
>     columns defined will not work for the VLAN tenant networks.
>
> This approach is taken for now as it is simple. If there is a requirement
> to support distributed routing for these VLAN tenant networks, we
> can explore other possible solutions.
>
> [1] -
> https://mail.openvswitch.org/pipermail/ovs-discuss/2018-April/046543.html
> [2] -
> https://mail.openvswitch.org/pipermail/ovs-discuss/2018-April/046557.html
>
> Reported-at:
> https://mail.openvswitch.org/pipermail/ovs-discuss/2018-April/046543.html
> Reported-by: venkata anil <vkommadi@redhat.com>
> Co-authored-by: venkata anil <vkommadi@redhat.com>
> Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
> Signed-off-by: venkata anil <vkommadi@redhat.com>
>

Acked-by: Gurucharan Shetty <guru@ovn.org>

A few comments below.

> ---
>  ovn/northd/ovn-northd.8.xml |  30 ++++
>  ovn/northd/ovn-northd.c     |  71 +++++++---
>  ovn/ovn-architecture.7.xml  | 160 +++++++++++++++++++++
>  ovn/ovn-nb.xml              |  43 ++++++
>  tests/ovn.at                | 273 ++++++++++++++++++++++++++++++++++++
>  5 files changed, 561 insertions(+), 16 deletions(-)
>
> diff --git a/ovn/northd/ovn-northd.8.xml b/ovn/northd/ovn-northd.8.xml
> index 7352c6764..f52699bd3 100644
> --- a/ovn/northd/ovn-northd.8.xml
> +++ b/ovn/northd/ovn-northd.8.xml
> @@ -874,6 +874,25 @@ output;
>              resident.
>            </li>
>          </ul>
> +
> +        <p>
> +          For the Ethernet address on a logical switch port of type
> +          <code>router</code>, when that logical switch port's
> +          <ref column="addresses" table="Logical_Switch_Port"
> +          db="OVN_Northbound"/> column is set to <code>router</code> and
> +          the connected logical router port specifies a
> +          <code>reside-on-redirect-chassis</code> and the logical router
> +          to which the connected logical router port belongs to has a
> +          <code>redirect-chassis</code> distributed gateway logical router
> +          port:
> +        </p>
> +
> +        <ul>
> +          <li>
> +            The flow for the connected logical router port's Ethernet
> +            address is only programmed on the
> <code>redirect-chassis</code>.
> +          </li>
> +        </ul>
>        </li>
>
>        <li>
> @@ -1179,6 +1198,17 @@ output;
>            upstream MAC learning to point to the
>            <code>redirect-chassis</code>.
>          </p>
> +
> +        <p>
> +          For the logical router port with the option
> +          <code>reside-on-redirect-chassis</code> set (which is
> centralized),
> +          the above flows are only programmed on the gateway port
> instance on
> +          the <code>redirect-chassis</code> (if the logical router has a
> +          distributed gateway port). This behavior avoids generation
> +          of multiple ARP responses from different chassis, and allows
> +          upstream MAC learning to point to the
> +          <code>redirect-chassis</code>.
> +        </p>
>        </li>
>
>        <li>
> diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
> index 31ea5f410..3998a898c 100644
> --- a/ovn/northd/ovn-northd.c
> +++ b/ovn/northd/ovn-northd.c
> @@ -4426,13 +4426,32 @@ build_lswitch_flows(struct hmap *datapaths, struct
> hmap *ports,
>                  ds_put_format(&match, "eth.dst == "ETH_ADDR_FMT,
>                                ETH_ADDR_ARGS(mac));
>                  if (op->peer->od->l3dgw_port
> -                    && op->peer == op->peer->od->l3dgw_port
> -                    && op->peer->od->l3redirect_port) {
> -                    /* The destination lookup flow for the router's
> -                     * distributed gateway port MAC address should only be
> -                     * programmed on the "redirect-chassis". */
> -                    ds_put_format(&match, " && is_chassis_resident(%s)",
> -
> op->peer->od->l3redirect_port->json_key);
> +                    && op->peer->od->l3redirect_port
> +                    && op->od->localnet_port) {
> +                    bool add_chassis_resident_check = false;
> +                    if (op->peer == op->peer->od->l3dgw_port) {
> +                        /* The peer of this port represents a distributed
> +                         * gateway port. The destination lookup flow for
> the
> +                         * router's distributed gateway port MAC address
> should
> +                         * only be programmed on the "redirect-chassis".
> */
> +                        add_chassis_resident_check = true;
> +                    } else {
> +                        /* Check if the option
> 'reside-on-redirect-chassis'
> +                         * is set to true on the peer port. If set to true
> +                         * and if the logical switch has a localnet port,
> it
> +                         * means the router pipeline for the packets from
> +                         * this logical switch should be run on the
> chassis
> +                         * hosting the gateway port.
> +                         */
> +                        add_chassis_resident_check = smap_get_bool(
> +                            &op->peer->nbrp->options,
> +                            "reside-on-redirect-chassis", false);
> +                    }
> +
> +                    if (add_chassis_resident_check) {
> +                        ds_put_format(&match, " &&
> is_chassis_resident(%s)",
> +
> op->peer->od->l3redirect_port->json_key);
> +                    }
>                  }
>
>                  ds_clear(&actions);
> @@ -5197,15 +5216,35 @@ build_lrouter_flows(struct hmap *datapaths, struct
> hmap *ports,
>                            op->lrp_networks.ipv4_addrs[i].network_s,
>                            op->lrp_networks.ipv4_addrs[i].plen,
>                            op->lrp_networks.ipv4_addrs[i].addr_s);
> -            if (op->od->l3dgw_port && op == op->od->l3dgw_port
> -                && op->od->l3redirect_port) {
> -                /* Traffic with eth.src = l3dgw_port->lrp_networks.ea_s
> -                 * should only be sent from the "redirect-chassis", so
> that
> -                 * upstream MAC learning points to the "redirect-chassis".
> -                 * Also need to avoid generation of multiple ARP responses
> -                 * from different chassis. */
> -                ds_put_format(&match, " && is_chassis_resident(%s)",
> -                              op->od->l3redirect_port->json_key);
> +
> +            if (op->od->l3dgw_port && op->od->l3redirect_port && op->peer
> +                && op->peer->od->localnet_port) {
> +                bool add_chassis_resident_check = false;
> +                if (op == op->od->l3dgw_port) {
> +                    /* Traffic with eth.src =
> l3dgw_port->lrp_networks.ea_s
> +                     * should only be sent from the "redirect-chassis",
> so that
> +                     * upstream MAC learning points to the
> "redirect-chassis".
> +                     * Also need to avoid generation of multiple ARP
> responses
> +                     * from different chassis. */
> +                    add_chassis_resident_check = true;
> +                } else {
> +                    /* Check if the option 'reside-on-redirect-chassis'
> +                     * is set to true on the router port. If set to true
> +                     * and if peer's logical switch has a localnet port,
> it
> +                     * means the router pipeline for the packets from
> +                     * peer's logical switch is be run on the chassis
> +                     * hosting the gateway port and it should reply to the
> +                     * ARP requests for the router port IPs.
> +                     */
> +                    add_chassis_resident_check = smap_get_bool(
> +                        &op->nbrp->options,
> +                        "reside-on-redirect-chassis", false);
> +                }
> +
> +                if (add_chassis_resident_check) {
> +                    ds_put_format(&match, " && is_chassis_resident(%s)",
> +                                  op->od->l3redirect_port->json_key);
> +                }
>              }
>
>              ds_clear(&actions);
> diff --git a/ovn/ovn-architecture.7.xml b/ovn/ovn-architecture.7.xml
> index 6ed2cf132..998470c34 100644
> --- a/ovn/ovn-architecture.7.xml
> +++ b/ovn/ovn-architecture.7.xml
> @@ -1372,6 +1372,166 @@
>      http://docs.openvswitch.org/en/latest/topics/high-availability.
>    </p>
>
> +  <h2>Tenant VLAN networks connected to a Logical Router</h2>
>
I think we should explain here what a "tenant VLAN network" and a "provider
network" are in the OVN context.
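
One way to picture the distinction, going only by the test case later in
this patch: the tenant VLAN network is a logical switch (foo) whose
localnet port carries a VLAN tag, while the provider network is the switch
(alice) behind the distributed gateway port, with an untagged localnet
port. The sketch below is a dry run that only prints the ovn-nbctl
commands used in that test; the nbctl and add_localnet wrappers are
hypothetical.

```shell
#!/bin/sh
# Dry-run wrapper (hypothetical): print the command instead of running it.
nbctl() {
    echo "ovn-nbctl $*"
}

# Hypothetical helper: add a localnet port to a logical switch.
# Args: switch, port, network_name, optional VLAN tag.
add_localnet() {
    nbctl lsp-add "$1" "$2"
    nbctl lsp-set-addresses "$2" unknown
    nbctl lsp-set-type "$2" localnet
    nbctl lsp-set-options "$2" network_name="$3"
    if [ -n "$4" ]; then
        nbctl set Logical_Switch_Port "$2" tag="$4"
    fi
}

add_localnet foo ln-foo public 2    # tenant VLAN network, VLAN 2
add_localnet alice ln-alice phys    # provider network, untagged
```

Replacing the body of nbctl with a real call to ovn-nbctl would apply the
same configuration to a live northbound database.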


> +
> +  <p>
> +    It is possible to have multiple logical switches each with a localnet
> port
> +    (representing physical networks) connected to a logical router in
> which one
> +    may provide the external connectivity via a distributed gatewat port
> and
>
s/gatewat/gateway


> +    the rest of them are used internally (with VLAN tagged). It is
> expected
>
I think it would be better if you expand what "used internally" actually
means.



> +    that <code>ovn-bridge-mappings</code> is configured appropriately on
> the
> +    chassis.
> +  </p>
> +
> +  <h3>East West routing</h3>
> +  <p>
> +    East-West routing between these tenant VLAN logical switches works
> almost
> +    the same way as normal logical switches. When the VM sends such a
> packet,
> +    then:
> +  </p>
> +  <ol>
> +    <li>
>
It first enters the ingress pipeline of the first localnet logical switch
and ...

> +      The packet enters the ingress pipeline of the logical router
> datapath
> +      via the logical router port in the source chassis.
> +    </li>
> +
> +    <li>
> +      Routing decision is taken.
> +    </li>
> +
> +    <li>
>
I guess it goes to the pipeline of the second logical switch and then to the
localnet port below.

> +      The packet goes out of the integration bridge to the provider
> bridge (
> +      belonging to the destination logical switch) via the localnet port.
> +    </li>
> +
> +    <li>
> +      The destination chassis receives the packet via the localnet port
> +      and delivers to the destination VM.
>
It would be nice to point out that it goes through the ingress pipeline again
and then the egress pipeline of the second switch (if that is correct).

> +    </li>
> +  </ol>
> +
> +  <h3>External traffic</h3>
> +
> +  <p>
> +    The following happens when a VM sends an external traffic (which
> requires
> +    NATting) and the chassis hosting the VM doesn't have a distributed
> gateway
> +    port.
> +  </p>
> +
> +  <ol>
> +    <li>
> +      The packet enters the ingress pipeline of the logical router
> datapath
> +      via the logical router port in the source chassis.
> +    </li>
> +
> +    <li>
> +      Routing decision is taken. Since the gateway router or the
> distributed
> +      gateway port doesn't reside in the source chassis, the traffic is
> +      redirected to the gateway chassis via the tunnel port.
> +    </li>
> +
> +    <li>
> +      The gateway chassis receives the packet, applies the NAT rules and
> +      forwards it via the localnet port.
> +    </li>
> +  </ol>
> +
> +  <p>
> +    Although this works, the VM traffic is tunnelled. In order for it to
> +    work properly, the MTU of the VLAN tenant networks must be lowered to
> +    account for the tunnel encapsulation.
> +  </p>
> +
> +  <h2>Centralized routing for VLAN tenant networks</h2>
> +
> +  <p>
> +    To overcome the tunnel encapsulation problem described in the previous
> +    section, <code>OVN</code> supports the option of enabling centralized
> +    routing for VLAN tenant networks. CMS can configure the option
> +    <ref column="options:reside-on-redirect-chassis"
> +    table="Logical_Router_Port" db="OVN_NB"/> to <code>true</code> for
> each
> +    <ref table="Logical_Router_Port" db="OVN_NB"/> which connects to the
> +    logical switch of the VLAN tenant network. This causes the gateway
> +    chassis (hosting the distributed gateway port) to handle all the
> +    routing for these networks, making it centralized. It will reply to
> +    the ARP requests for the logical router port IPs.
> +  </p>
> +
> +  <p>
> +    If the logical router doesn't have a distributed gateway port
> connecting
> +    to the provider network, then this option is ignored by
> <code>OVN</code>.
> +  </p>
> +
> +  <p>
> +    The following happens when a VM sends an east-west traffic which
> needs to
> +    be routed:
> +  </p>
> +
> +  <ol>
> +    <li>
> +      The packet from the VM enters the logical datapath pipeline of the
> source
> +      VLAN network in the source chassis and is sent out via the localnet
> port
> +      (instead of sending it to router pipeline).
> +    </li>
> +
> +    <li>
> +      The packet enters the logical datapath pipeline of the source VLAN
> +      network in the gateway chassis and is sent to the logical datapath
> +      pipeline belonging to the logical router.
> +    </li>
> +
> +    <li>
> +      Routing decision is taken.
> +    </li>
> +
> +    <li>
> +      The packet enters the logical datapath pipeline of the destination
> +      VLAN network. The packet is delivered to the destination VM if it
> resides
> +      in the same chassis. Otherwise the packet is sent out via the
> localnet
> +      port of the destination VLAN network.
> +    </li>
> +
> +    <li>
> +      The destination chassis receives the packet via the localnet port
> +      and delivers to the destination VM.
>
Point out that it goes through the second switch's pipeline again.

The same points apply to NAT traffic.


> +    </li>
> +  </ol>
> +
> +  <p>
> +    The following happens when a VM sends an external traffic which
> requires
> +    NATting:
> +  </p>
> +
> +  <ol>
> +    <li>
> +      The packet from the VM enters the logical datapath pipeline of the
> source
> +      VLAN network in the source chassis and is sent out via the localnet
> port
> +      (instead of sending it to router pipeline).
> +    </li>
> +
> +    <li>
> +      The packet enters the logical datapath pipeline of the source VLAN
> +      network in the gateway chassis and is sent to the logical datapath
> +      pipeline belonging to the logical router.
> +    </li>
> +
> +    <li>
> +      Routing decision is taken and NAT rules are applied.
> +    </li>
> +
> +    <li>
> +      The packet enters the logical datapath pipeline of the provider
> network
> +      and is sent out via the localnet port of the provider network.
> +    </li>
> +  </ol>
> +
> +  <p>
> +    For the reverse external traffic, the gateway chassis applies the
> unNATting
> +    rules and sends the packet via the localnet port of the VLAN tenant
> +    network and the destination chassis receives the packet and delivers
> to
> +    the VM.
> +  </p>
> +
>    <h2>Life Cycle of a VTEP gateway</h2>
>
>    <p>
> diff --git a/ovn/ovn-nb.xml b/ovn/ovn-nb.xml
> index 8564ed39c..13ae56e13 100644
> --- a/ovn/ovn-nb.xml
> +++ b/ovn/ovn-nb.xml
> @@ -1635,6 +1635,49 @@
>            chassis to enable high availability.
>          </p>
>        </column>
> +
> +      <column name="options" key="reside-on-redirect-chassis">
> +        <p>
> +          Generally routing is distributed in <code>OVN</code>. The packet
> +          from a logical port which needs to be routed hits the router
> pipeline
> +          in the source chassis. For the East-West traffic, the packet is
> +          sent directly to the destination chassis. For the outside
> traffic
> +          the packet is sent to the gateway chassis.
> +        </p>
> +
> +        <p>
> +          When this option is set, <code>OVN</code> considers this only if
> +        </p>
> +
> +        <ul>
> +          <li>
> +            The logical router to which this logical router port belongs
> to
> +            has a distributed gateway port.
> +          </li>
> +
> +          <li>
> +            The peer's logical switch has a localnet port (representing
> +            a tenant VLAN network)
> +          </li>
> +        </ul>
> +
> +        <p>
> +          When this option is set to <code>true</code>, then the packet
> +          which needs to be routed hits the router pipeline in the chassis
> +          hosting the distributed gateway router port. The source chassis
> +          pushes out this traffic via the localnet port. With this the
> +          East-West traffic is no more distributed and will always go
> through
> +          the gateway chassis.
> +        </p>
> +
> +        <p>
> +          Without this option set, for any traffic destined to outside
> from a
> +          logical port which belongs to a logical switch with localnet
> port,
> +          the source chassis will send the traffic to the gateway chassis
> via
> +          the tunnel port instead of the localnet port and this could
> cause MTU
> +          issues.
> +        </p>
> +      </column>
>      </group>
>
>      <group title="Attachment">
> diff --git a/tests/ovn.at b/tests/ovn.at
> index 769e09f81..504ba228d 100644
> --- a/tests/ovn.at
> +++ b/tests/ovn.at
> @@ -8537,6 +8537,279 @@ OVN_CLEANUP([hv1],[hv2],[hv3])
>
>  AT_CLEANUP
>
> +# VLAN traffic for external network redirected through distributed router
> +# gateway port should use vlans(i.e input network vlan tag) across
> hypervisors
> +# instead of tunneling.
> +AT_SETUP([ovn -- vlan traffic for external network with distributed
> router gateway port])
> +AT_SKIP_IF([test $HAVE_PYTHON = no])
> +ovn_start
> +
> +# Logical network:
> +# # One LR R1 that has switches foo (192.168.1.0/24) and
> +# # alice (172.16.1.0/24) connected to it.  The logical port
> +# # between R1 and alice has a "redirect-chassis" specified,
> +# # i.e. it is the distributed router gateway port(172.16.1.6).
> +# # Switch alice also has a localnet port defined.
> +# # An additional switch outside has the same subnet as alice
> +# # (172.16.1.0/24), a localnet port and nexthop port(172.16.1.1)
> +# # which will receive the packet destined for external network
> +# # (i.e 8.8.8.8 as destination ip).
> +
> +# Physical network:
> +# # Three hypervisors hv[123].
> +# # hv1 hosts vif foo1.
> +# # hv2 is the "redirect-chassis" that hosts the distributed router
> gateway port.
> +# # hv3 hosts nexthop port vif outside1.
> +# # All other tests connect hypervisors to network n1 through br-phys for
> tunneling.
> +# # But in this test, hv1 won't connect to n1(and no br-phys in hv1), and
> +# # in order to show vlans(instead of tunneling) used between hv1 and hv2,
> +# # a new network n2 created and hv1 and hv2 connected to this network
> through br-ex.
> +# # hv2 and hv3 are still connected to n1 network through br-phys.
> +net_add n1
> +
> +# We are not calling ovn_attach for hv1, to avoid adding br-phys.
> +# Tunneling won't work in hv1 as ovn-encap-ip is not added to any bridge
> in hv1
> +sim_add hv1
> +as hv1
> +ovs-vsctl \
> +    -- set Open_vSwitch . external-ids:system-id=hv1 \
> +    -- set Open_vSwitch .
> external-ids:ovn-remote=unix:$ovs_base/ovn-sb/ovn-sb.sock \
> +    -- set Open_vSwitch . external-ids:ovn-encap-type=geneve,vxlan \
> +    -- set Open_vSwitch . external-ids:ovn-encap-ip=192.168.0.1 \
> +    -- add-br br-int \
> +    -- set bridge br-int fail-mode=secure
> other-config:disable-in-band=true \
> +    -- set Open_vSwitch . external-ids:ovn-bridge-mappings=public:br-ex
> +
> +start_daemon ovn-controller
> +ovs-vsctl -- add-port br-int hv1-vif1 -- \
> +    set interface hv1-vif1 external-ids:iface-id=foo1 \
> +    ofport-request=1
> +
> +sim_add hv2
> +as hv2
> +ovs-vsctl add-br br-phys
> +ovn_attach n1 br-phys 192.168.0.2
> +ovs-vsctl set Open_vSwitch .
> external-ids:ovn-bridge-mappings="public:br-ex,phys:br-phys"
> +
> +sim_add hv3
> +as hv3
> +ovs-vsctl add-br br-phys
> +ovn_attach n1 br-phys 192.168.0.3
> +ovs-vsctl -- add-port br-int hv3-vif1 -- \
> +    set interface hv3-vif1 external-ids:iface-id=outside1 \
> +    options:tx_pcap=hv3/vif1-tx.pcap \
> +    options:rxq_pcap=hv3/vif1-rx.pcap \
> +    ofport-request=1
> +ovs-vsctl set Open_vSwitch .
> external-ids:ovn-bridge-mappings="phys:br-phys"
> +
> +# Create network n2 for vlan connectivity between hv1 and hv2
> +net_add n2
> +
> +as hv1
> +ovs-vsctl add-br br-ex
> +net_attach n2 br-ex
> +
> +as hv2
> +ovs-vsctl add-br br-ex
> +net_attach n2 br-ex
> +
> +OVN_POPULATE_ARP
> +
> +ovn-nbctl create Logical_Router name=R1
> +
> +ovn-nbctl ls-add foo
> +ovn-nbctl ls-add alice
> +ovn-nbctl ls-add outside
> +
> +# Connect foo to R1
> +ovn-nbctl lrp-add R1 foo 00:00:01:01:02:03 192.168.1.1/24
> +ovn-nbctl lsp-add foo rp-foo -- set Logical_Switch_Port rp-foo \
> +    type=router options:router-port=foo \
> +    -- lsp-set-addresses rp-foo router
> +
> +# Connect alice to R1 as distributed router gateway port (172.16.1.6) on
> hv2
> +ovn-nbctl lrp-add R1 alice 00:00:02:01:02:03 172.16.1.6/24 \
> +    -- set Logical_Router_Port alice options:redirect-chassis="hv2"
> +ovn-nbctl lsp-add alice rp-alice -- set Logical_Switch_Port rp-alice \
> +    type=router options:router-port=alice \
> +    -- lsp-set-addresses rp-alice router \
> +
> +# Create logical port foo1 in foo
> +ovn-nbctl lsp-add foo foo1 \
> +-- lsp-set-addresses foo1 "f0:00:00:01:02:03 192.168.1.2"
> +
> +# Create logical port outside1 in outside, which is a nexthop address
> +# for 172.16.1.0/24
> +ovn-nbctl lsp-add outside outside1 \
> +-- lsp-set-addresses outside1 "f0:00:00:01:02:04 172.16.1.1"
> +
> +# Set default gateway (nexthop) to 172.16.1.1
> +ovn-nbctl lr-route-add R1 "0.0.0.0/0" 172.16.1.1 alice
> +AT_CHECK([ovn-nbctl lr-nat-add R1 snat 172.16.1.6 192.168.1.1/24])
> +ovn-nbctl set Logical_Switch_Port rp-alice options:nat-addresses=router
> +
> +ovn-nbctl lsp-add foo ln-foo
> +ovn-nbctl lsp-set-addresses ln-foo unknown
> +ovn-nbctl lsp-set-options ln-foo network_name=public
> +ovn-nbctl lsp-set-type ln-foo localnet
> +AT_CHECK([ovn-nbctl set Logical_Switch_Port ln-foo tag=2])
> +
> +# Create localnet port in alice
> +ovn-nbctl lsp-add alice ln-alice
> +ovn-nbctl lsp-set-addresses ln-alice unknown
> +ovn-nbctl lsp-set-type ln-alice localnet
> +ovn-nbctl lsp-set-options ln-alice network_name=phys
> +
> +# Create localnet port in outside
> +ovn-nbctl lsp-add outside ln-outside
> +ovn-nbctl lsp-set-addresses ln-outside unknown
> +ovn-nbctl lsp-set-type ln-outside localnet
> +ovn-nbctl lsp-set-options ln-outside network_name=phys
> +
> +# Allow some time for ovn-northd and ovn-controller to catch up.
> +# XXX This should be more systematic.
> +ovn-nbctl --wait=hv --timeout=3 sync
> +
> +# Check that there is a logical flow in logical switch foo's pipeline
> +# to set the outport to rp-foo (which is expected).
> +OVS_WAIT_UNTIL([test 1 = `ovn-sbctl dump-flows foo | grep ls_in_l2_lkup |
> \
> +grep rp-foo | grep -v is_chassis_resident | wc -l`])
> +
> +# Set the option 'reside-on-redirect-chassis' for foo
> +ovn-nbctl set logical_router_port foo
> options:reside-on-redirect-chassis=true
> +# Check that there is a logical flow in logical switch foo's pipeline
> +# to set the outport to rp-foo with the condition is_chassis_redirect.
> +ovn-sbctl dump-flows foo
> +OVS_WAIT_UNTIL([test 1 = `ovn-sbctl dump-flows foo | grep ls_in_l2_lkup |
> \
> +grep rp-foo | grep is_chassis_resident | wc -l`])
> +
> +echo "---------NB dump-----"
> +ovn-nbctl show
> +echo "---------------------"
> +ovn-nbctl list logical_router
> +echo "---------------------"
> +ovn-nbctl list nat
> +echo "---------------------"
> +ovn-nbctl list logical_router_port
> +echo "---------------------"
> +
> +echo "---------SB dump-----"
> +ovn-sbctl list datapath_binding
> +echo "---------------------"
> +ovn-sbctl list port_binding
> +echo "---------------------"
> +ovn-sbctl dump-flows
> +echo "---------------------"
> +ovn-sbctl list chassis
> +echo "---------------------"
> +
> +for chassis in hv1 hv2 hv3; do
> +    as $chassis
> +    echo "------ $chassis dump ----------"
> +    ovs-vsctl show br-int
> +    ovs-ofctl show br-int
> +    ovs-ofctl dump-flows br-int
> +    echo "--------------------------"
> +done
> +
> +ip_to_hex() {
> +    printf "%02x%02x%02x%02x" "$@"
> +}
> +
> +foo1_ip=$(ip_to_hex 192 168 1 2)
> +gw_ip=$(ip_to_hex 172 16 1 6)
> +dst_ip=$(ip_to_hex 8 8 8 8)
> +nexthop_ip=$(ip_to_hex 172 16 1 1)
> +
> +foo1_mac="f00000010203"
> +foo_mac="000001010203"
> +gw_mac="000002010203"
> +nexthop_mac="f00000010204"
> +
> +# Send ip packet from foo1 to 8.8.8.8
> +src_mac="f00000010203"
> +dst_mac="000001010203"
>
> +packet=${foo_mac}${foo1_mac}08004500001c0000000040110000${foo1_ip}${dst_ip}0035111100080000
> +
> +as hv1 ovs-appctl netdev-dummy/receive hv1-vif1 $packet
> +sleep 2
> +
> +# ARP request packet for nexthop_ip to expect at outside1
>
> +arp_request=ffffffffffff${gw_mac}08060001080006040001${gw_mac}${gw_ip}000000000000${nexthop_ip}
> +echo $arp_request >> hv3-vif1.expected
> +cat hv3-vif1.expected > expout
> +$PYTHON "$top_srcdir/utilities/ovs-pcap.in" hv3/vif1-tx.pcap | grep
> ${nexthop_ip} | uniq > hv3-vif1
> +AT_CHECK([sort hv3-vif1], [0], [expout])
> +
> +# Send ARP reply from outside1 back to the router
> +reply_mac="f00000010204"
>
> +arp_reply=${gw_mac}${nexthop_mac}08060001080006040002${nexthop_mac}${nexthop_ip}${gw_mac}${gw_ip}
> +
> +as hv3 ovs-appctl netdev-dummy/receive hv3-vif1 $arp_reply
> +OVS_WAIT_UNTIL([
> +    test `as hv2 ovs-ofctl dump-flows br-int | grep table=66 | \
> +grep actions=mod_dl_dst:f0:00:00:01:02:04 | wc -l` -eq 1
> +    ])
> +
> +# VLAN tagged packet with router port(192.168.1.1) MAC as destination MAC
> +# is expected on bridge connecting hv1 and hv2
>
> +expected=${foo_mac}${foo1_mac}8100000208004500001c0000000040110000${foo1_ip}${dst_ip}0035111100080000
> +echo $expected > hv1-br-ex_n2.expected
> +
> +# Packet to Expect at outside1 i.e nexthop(172.16.1.1) port.
> +# As connection tracking not enabled for this test, snat can't be done on
> the packet.
> +# We still see foo1 as the source ip address. But source mac(gateway MAC)
> and
> +# dest mac(nexthop mac) are properly configured.
>
> +expected=${nexthop_mac}${gw_mac}08004500001c000000003f110100${foo1_ip}${dst_ip}0035111100080000
> +echo $expected > hv3-vif1.expected
> +
> +reset_pcap_file() {
> +    local iface=$1
> +    local pcap_file=$2
> +    ovs-vsctl -- set Interface $iface options:tx_pcap=dummy-tx.pcap \
> +options:rxq_pcap=dummy-rx.pcap
> +    rm -f ${pcap_file}*.pcap
> +    ovs-vsctl -- set Interface $iface
> options:tx_pcap=${pcap_file}-tx.pcap \
> +options:rxq_pcap=${pcap_file}-rx.pcap
> +}
> +
> +as hv1 reset_pcap_file br-ex_n2 hv1/br-ex_n2
> +as hv3 reset_pcap_file hv3-vif1 hv3/vif1
> +sleep 2
> +as hv1 ovs-appctl netdev-dummy/receive hv1-vif1 $packet
> +sleep 2
> +
> +# On hv1, the packet should not go from vlan switch pipleline to router
> +# pipleine
> +as hv1 ovs-ofctl dump-flows br-int
> +
> +AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=65 | grep
> "priority=100,reg15=0x1,metadata=0x2" \
> +| grep actions=clone | grep -v n_packets=0 | wc -l], [0], [[0
> +]])
> +
> +# On hv1, table 32 check that no packet goes via the tunnel port
> +AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=32 \
> +| grep "NXM_NX_TUN_ID" | grep -v n_packets=0 | wc -l], [0], [[0
> +]])
> +
> +ip_packet() {
> +    grep "1010203f00000010203"
> +}
> +
> +# Check vlan tagged packet on the bridge connecting hv1 and hv2 with the
> +# foo1's mac.
> +$PYTHON "$top_srcdir/utilities/ovs-pcap.in" hv1/br-ex_n2-tx.pcap |
> ip_packet | uniq > hv1-br-ex_n2
> +cat hv1-br-ex_n2.expected > expout
> +AT_CHECK([sort hv1-br-ex_n2], [0], [expout])
> +
> +# Check expected packet on nexthop interface
> +$PYTHON "$top_srcdir/utilities/ovs-pcap.in" hv3/vif1-tx.pcap | grep
> ${foo1_ip}${dst_ip} | uniq > hv3-vif1
> +cat hv3-vif1.expected > expout
> +AT_CHECK([sort hv3-vif1], [0], [expout])
> +
> +OVN_CLEANUP([hv1],[hv2],[hv3])
> +AT_CLEANUP
> +
>  AT_SETUP([ovn -- IPv6 ND Router Solicitation responder])
>  AT_KEYWORDS([ovn-nd_ra])
>  AT_SKIP_IF([test $HAVE_PYTHON = no])
> --
> 2.17.1
>
Numan Siddique Nov. 19, 2018, 4:19 p.m. UTC | #9
Thanks Guru for the review and comments. I have addressed the comments and
submitted v2 here - https://patchwork.ozlabs.org/patch/999893/

Thanks
Numan


On Thu, Nov 15, 2018 at 11:33 PM Guru Shetty <guru@ovn.org> wrote:

>
>
> On Fri, 5 Oct 2018 at 10:15, <nusiddiq@redhat.com> wrote:
>
>> From: Numan Siddique <nusiddiq@redhat.com>
>>
>> An OVN deployment can have multiple logical switches each with a
>> localnet port connected to a distributed logical router with one
>> logical router port providing external connectivity (provider network)
>> and others used as tenant networks with VLAN tagging.
>>
>> As reported in [1], external traffic from these VLAN tenant networks
>> are tunnelled to the gateway chassis (chassis hosting a distributed
>> gateway port which applies NAT rules). As part of the discussion in
>> [1], there were few possible solutions proposed by Russell [2]. This
>> patch implements the first option in [2].
>>
>> With this patch, a new option 'reside-on-redirect-chassis' in 'options'
>> column of Logical_Router_Port table is added. If the value of this
>> option is set to 'true' and if the logical router also have a
>> distributed gateway port, then routing for this logical router port
>> is centralized in the chassis hosting the distributed gateway port.
>>
>> If a logical switch 'sw0' is connected to a router 'lr0' with the
>> router port - 'lr0-sw0' with the address - "00:00:00:00:af:12 192.168.1.1"
>> , and it has a distributed logical port - 'lr0-public', then the
>> below logical flow is added in the logical switch pipeline
>> of 'sw0' if the 'reside-on-redirect-chassis' option is set on 'lr-sw0' -
>>
>> table=16(ls_in_l2_lkup), priority=50,
>> match=(eth.dst == 00:00:00:00:af:12 &&
>> is_chassis_resident("cr-lr0-public")),
>> action=(outport = "sw0-lr0"; output;)
>>
>
> Where does "cr" come from.
>
>>
>> With the above flow, the packet doesn't enter the router pipeline in
>> the source chassis. Instead the packet is sent out via the localnet
>> port of 'sw0'. The gateway chassis upon receiving this packet, runs
>> the logical router pipeline applying NAT rules and sends the traffic
>> out via the localnet port of the provider network. The gateway
>> chassis will also reply to the ARP requests for the router port IPs.
>>
>> With this approach, we avoid redirecting the external traffic to the
>> gateway chassis via the tunnel port. There are a couple of drawbacks
>> with this approach:
>>
>>   - East - West routing is no more distributed for the VLAN tenant
>>     networks if 'reside-on-redirect-chassis' option is defined
>>
>>   - 'dnat_and_snat' NAT rules with 'logical_mac' and 'logical_port'
>>     columns defined will not work for the VLAN tenant networks.
>>
>> This approach is taken for now as it is simple. If there is a requirement
>> to support distributed routing for these VLAN tenant networks, we
>> can explore other possible solutions.
>>
>> [1] -
>> https://mail.openvswitch.org/pipermail/ovs-discuss/2018-April/046543.html
>> [2] -
>> https://mail.openvswitch.org/pipermail/ovs-discuss/2018-April/046557.html
>>
>> Reported-at:
>> https://mail.openvswitch.org/pipermail/ovs-discuss/2018-April/046543.html
>> Reported-by: venkata anil <vkommadi@redhat.com>
>> Co-authored-by: venkata anil <vkommadi@redhat.com>
>> Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
>> Signed-off-by: venkata anil <vkommadi@redhat.com>
>>
>
> Acked-by: Gurucharan Shetty <guru@ovn.org>
>
> A few comments below.
>
>> ---
>>  ovn/northd/ovn-northd.8.xml |  30 ++++
>>  ovn/northd/ovn-northd.c     |  71 +++++++---
>>  ovn/ovn-architecture.7.xml  | 160 +++++++++++++++++++++
>>  ovn/ovn-nb.xml              |  43 ++++++
>>  tests/ovn.at                | 273 ++++++++++++++++++++++++++++++++++++
>>  5 files changed, 561 insertions(+), 16 deletions(-)
>>
>> diff --git a/ovn/northd/ovn-northd.8.xml b/ovn/northd/ovn-northd.8.xml
>> index 7352c6764..f52699bd3 100644
>> --- a/ovn/northd/ovn-northd.8.xml
>> +++ b/ovn/northd/ovn-northd.8.xml
>> @@ -874,6 +874,25 @@ output;
>>              resident.
>>            </li>
>>          </ul>
>> +
>> +        <p>
>> +          For the Ethernet address on a logical switch port of type
>> +          <code>router</code>, when that logical switch port's
>> +          <ref column="addresses" table="Logical_Switch_Port"
>> +          db="OVN_Northbound"/> column is set to <code>router</code> and
>> +          the connected logical router port specifies a
>> +          <code>reside-on-redirect-chassis</code> and the logical router
>> +          to which the connected logical router port belongs to has a
>> +          <code>redirect-chassis</code> distributed gateway logical
>> router
>> +          port:
>> +        </p>
>> +
>> +        <ul>
>> +          <li>
>> +            The flow for the connected logical router port's Ethernet
>> +            address is only programmed on the
>> <code>redirect-chassis</code>.
>> +          </li>
>> +        </ul>
>>        </li>
>>
>>        <li>
>> @@ -1179,6 +1198,17 @@ output;
>>            upstream MAC learning to point to the
>>            <code>redirect-chassis</code>.
>>          </p>
>> +
>> +        <p>
>> +          For the logical router port with the option
>> +          <code>reside-on-redirect-chassis</code> set (which is centralized),
>> +          the above flows are only programmed on the gateway port instance on
>> +          the <code>redirect-chassis</code> (if the logical router has a
>> +          distributed gateway port). This behavior avoids generation
>> +          of multiple ARP responses from different chassis, and allows
>> +          upstream MAC learning to point to the
>> +          <code>redirect-chassis</code>.
>> +        </p>
>>        </li>
>>
>>        <li>
>> diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
>> index 31ea5f410..3998a898c 100644
>> --- a/ovn/northd/ovn-northd.c
>> +++ b/ovn/northd/ovn-northd.c
>> @@ -4426,13 +4426,32 @@ build_lswitch_flows(struct hmap *datapaths, struct hmap *ports,
>>                  ds_put_format(&match, "eth.dst == "ETH_ADDR_FMT,
>>                                ETH_ADDR_ARGS(mac));
>>                  if (op->peer->od->l3dgw_port
>> -                    && op->peer == op->peer->od->l3dgw_port
>> -                    && op->peer->od->l3redirect_port) {
>> -                    /* The destination lookup flow for the router's
>> -                     * distributed gateway port MAC address should only be
>> -                     * programmed on the "redirect-chassis". */
>> -                    ds_put_format(&match, " && is_chassis_resident(%s)",
>> -                                  op->peer->od->l3redirect_port->json_key);
>> +                    && op->peer->od->l3redirect_port
>> +                    && op->od->localnet_port) {
>> +                    bool add_chassis_resident_check = false;
>> +                    if (op->peer == op->peer->od->l3dgw_port) {
>> +                        /* The peer of this port represents a distributed
>> +                         * gateway port. The destination lookup flow for the
>> +                         * router's distributed gateway port MAC address should
>> +                         * only be programmed on the "redirect-chassis". */
>> +                        add_chassis_resident_check = true;
>> +                    } else {
>> +                        /* Check if the option 'reside-on-redirect-chassis'
>> +                         * is set to true on the peer port. If set to true
>> +                         * and if the logical switch has a localnet port, it
>> +                         * means the router pipeline for the packets from
>> +                         * this logical switch should be run on the chassis
>> +                         * hosting the gateway port.
>> +                         */
>> +                        add_chassis_resident_check = smap_get_bool(
>> +                            &op->peer->nbrp->options,
>> +                            "reside-on-redirect-chassis", false);
>> +                    }
>> +
>> +                    if (add_chassis_resident_check) {
>> +                        ds_put_format(&match, " && is_chassis_resident(%s)",
>> +                                      op->peer->od->l3redirect_port->json_key);
>> +                    }
>>                  }
>>
>>                  ds_clear(&actions);
>> @@ -5197,15 +5216,35 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
>>                            op->lrp_networks.ipv4_addrs[i].network_s,
>>                            op->lrp_networks.ipv4_addrs[i].plen,
>>                            op->lrp_networks.ipv4_addrs[i].addr_s);
>> -            if (op->od->l3dgw_port && op == op->od->l3dgw_port
>> -                && op->od->l3redirect_port) {
>> -                /* Traffic with eth.src = l3dgw_port->lrp_networks.ea_s
>> -                 * should only be sent from the "redirect-chassis", so that
>> -                 * upstream MAC learning points to the "redirect-chassis".
>> -                 * Also need to avoid generation of multiple ARP responses
>> -                 * from different chassis. */
>> -                ds_put_format(&match, " && is_chassis_resident(%s)",
>> -                              op->od->l3redirect_port->json_key);
>> +
>> +            if (op->od->l3dgw_port && op->od->l3redirect_port && op->peer
>> +                && op->peer->od->localnet_port) {
>> +                bool add_chassis_resident_check = false;
>> +                if (op == op->od->l3dgw_port) {
>> +                    /* Traffic with eth.src = l3dgw_port->lrp_networks.ea_s
>> +                     * should only be sent from the "redirect-chassis", so that
>> +                     * upstream MAC learning points to the "redirect-chassis".
>> +                     * Also need to avoid generation of multiple ARP responses
>> +                     * from different chassis. */
>> +                    add_chassis_resident_check = true;
>> +                } else {
>> +                    /* Check if the option 'reside-on-redirect-chassis'
>> +                     * is set to true on the router port. If set to true
>> +                     * and if peer's logical switch has a localnet port, it
>> +                     * means the router pipeline for the packets from
>> +                     * peer's logical switch is to be run on the chassis
>> +                     * hosting the gateway port and it should reply to the
>> +                     * ARP requests for the router port IPs.
>> +                     */
>> +                    add_chassis_resident_check = smap_get_bool(
>> +                        &op->nbrp->options,
>> +                        "reside-on-redirect-chassis", false);
>> +                }
>> +
>> +                if (add_chassis_resident_check) {
>> +                    ds_put_format(&match, " && is_chassis_resident(%s)",
>> +                                  op->od->l3redirect_port->json_key);
>> +                }
>>              }
>>
>>              ds_clear(&actions);
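
One way to check the effect of the two hunks above from the command line is to count the logical flows that carry the extra match. This is only a minimal sketch: the helper name count_resident_lkup_flows is hypothetical (it is not part of the patch), and the sample flow is the one from the commit message, joined onto a single line so the snippet runs without a southbound database.

```shell
# Hypothetical helper: reads logical flows on stdin (for example the output
# of `ovn-sbctl dump-flows foo`) and prints how many ls_in_l2_lkup flows
# are gated by is_chassis_resident().
count_resident_lkup_flows() {
    grep ls_in_l2_lkup | grep -c is_chassis_resident
}

# Sample flow adapted from the commit message, used in place of live output.
sample_flow='table=16(ls_in_l2_lkup), priority=50, match=(eth.dst == 00:00:00:00:af:12 && is_chassis_resident("cr-lr0-public")), action=(outport = "sw0-lr0"; output;)'

printf '%s\n' "$sample_flow" | count_resident_lkup_flows
```

On a live setup the same filter would be fed from `ovn-sbctl dump-flows foo`, which is essentially what the new test does with `grep ... | wc -l`.
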
>> diff --git a/ovn/ovn-architecture.7.xml b/ovn/ovn-architecture.7.xml
>> index 6ed2cf132..998470c34 100644
>> --- a/ovn/ovn-architecture.7.xml
>> +++ b/ovn/ovn-architecture.7.xml
>> @@ -1372,6 +1372,166 @@
>>      http://docs.openvswitch.org/en/latest/topics/high-availability.
>>    </p>
>>
>> +  <h2>Tenant VLAN networks connected to a Logical Router</h2>
>>
> I think we should explain here what a "tenant VLAN network" and a
> "provider network" are in the OVN context.
>
>
>> +
>> +  <p>
>> +    It is possible to have multiple logical switches each with a localnet port
>> +    (representing physical networks) connected to a logical router in which one
>> +    may provide the external connectivity via a distributed gatewat port and
>>
> s/gatewat/gateway
>
>
>> +    the rest of them are used internally (with VLAN tagged). It is expected
>>
> I think it would be better if you expanded on what "used internally"
> actually means.
>
>
>
>> +    that <code>ovn-bridge-mappings</code> is configured appropriately on the
>> +    chassis.
>> +  </p>
>> +
>> +  <h3>East West routing</h3>
>> +  <p>
>> +    East-West routing between these tenant VLAN logical switches works almost
>> +    the same way as normal logical switches. When the VM sends such a packet,
>> +    then:
>> +  </p>
>> +  <ol>
>> +    <li>
>>
> It first enters the ingress pipeline of the first localnet logical switch
> and ...
>
>> +      The packet enters the ingress pipeline of the logical router datapath
>> +      via the logical router port in the source chassis.
>> +    </li>
>> +
>> +    <li>
>> +      Routing decision is taken.
>> +    </li>
>> +
>> +    <li>
>>
> I guess it goes to the pipeline of the second logical switch and then to
> the localnet port below.
>
>> +      The packet goes out of the integration bridge to the provider bridge (
>> +      belonging to the destination logical switch) via the localnet port.
>> +    </li>
>> +
>> +    <li>
>> +      The destination chassis receives the packet via the localnet port
>> +      and delivers to the destination VM.
>>
> It would be nice to point out that it goes through the ingress pipeline
> again and then the egress pipeline of the second switch (if that is correct).
>
>> +    </li>
>> +  </ol>
>> +
>> +  <h3>External traffic</h3>
>> +
>> +  <p>
>> +    The following happens when a VM sends an external traffic (which requires
>> +    NATting) and the chassis hosting the VM doesn't have a distributed gateway
>> +    port.
>> +  </p>
>> +
>> +  <ol>
>> +    <li>
>> +      The packet enters the ingress pipeline of the logical router datapath
>> +      via the logical router port in the source chassis.
>> +    </li>
>> +
>> +    <li>
>> +      Routing decision is taken. Since the gateway router or the distributed
>> +      gateway port doesn't reside in the source chassis, the traffic is
>> +      redirected to the gateway chassis via the tunnel port.
>> +    </li>
>> +
>> +    <li>
>> +      The gateway chassis receives the packet, applies the NAT rules and
>> +      forwards it via the localnet port.
>> +    </li>
>> +  </ol>
>> +
>> +  <p>
>> +    Although this works, the VM traffic is tunnelled. In order for it to
>> +    work properly, the MTU of the VLAN tenant networks must be lowered to
>> +    account for the tunnel encapsulation.
>> +  </p>
>> +
>> +  <h2>Centralized routing for VLAN tenant networks</h2>
>> +
>> +  <p>
>> +    To overcome the tunnel encapsulation problem described in the previous
>> +    section, <code>OVN</code> supports the option of enabling centralized
>> +    routing for VLAN tenant networks. CMS can configure the option
>> +    <ref column="options:reside-on-redirect-chassis"
>> +    table="Logical_Router_Port" db="OVN_NB"/> to <code>true</code> for each
>> +    <ref table="Logical_Router_Port" db="OVN_NB"/> which connects to the
>> +    logical switch of the VLAN tenant network. This causes the gateway
>> +    chassis (hosting the distributed gateway port) to handle all the
>> +    routing for these networks, making it centralized. It will reply to
>> +    the ARP requests for the logical router port IPs.
>> +  </p>
>> +
>> +  <p>
>> +    If the logical router doesn't have a distributed gateway port connecting
>> +    to the provider network, then this option is ignored by <code>OVN</code>.
>> +  </p>
>> +
>> +  <p>
>> +    The following happens when a VM sends an east-west traffic which needs to
>> +    be routed:
>> +  </p>
>> +
>> +  <ol>
>> +    <li>
>> +      The packet from the VM enters the logical datapath pipeline of the source
>> +      VLAN network in the source chassis and is sent out via the localnet port
>> +      (instead of sending it to router pipeline).
>> +    </li>
>> +
>> +    <li>
>> +      The packet enters the logical datapath pipeline of the source VLAN
>> +      network in the gateway chassis and is sent to the logical datapath
>> +      pipeline belonging to the logical router.
>> +    </li>
>> +
>> +    <li>
>> +      Routing decision is taken.
>> +    </li>
>> +
>> +    <li>
>> +      The packet enters the logical datapath pipeline of the destination
>> +      VLAN network. The packet is delivered to the destination VM if it resides
>> +      in the same chassis. Otherwise the packet is sent out via the localnet
>> +      port of the destination VLAN network.
>> +    </li>
>> +
>> +    <li>
>> +      The destination chassis receives the packet via the localnet port
>> +      and delivers to the destination VM.
>>
> Point out that it goes through the second switch's pipeline again.
>
> The same points for NAT traffic.
>
>
>> +    </li>
>> +  </ol>
>> +
>> +  <p>
>> +    The following happens when a VM sends an external traffic which requires
>> +    NATting:
>> +  </p>
>> +
>> +  <ol>
>> +    <li>
>> +      The packet from the VM enters the logical datapath pipeline of the source
>> +      VLAN network in the source chassis and is sent out via the localnet port
>> +      (instead of sending it to router pipeline).
>> +    </li>
>> +
>> +    <li>
>> +      The packet enters the logical datapath pipeline of the source VLAN
>> +      network in the gateway chassis and is sent to the logical datapath
>> +      pipeline belonging to the logical router.
>> +    </li>
>> +
>> +    <li>
>> +      Routing decision is taken and NAT rules are applied.
>> +    </li>
>> +
>> +    <li>
>> +      The packet enters the logical datapath pipeline of the provider network
>> +      and is sent out via the localnet port of the provider network.
>> +    </li>
>> +  </ol>
>> +
>> +  <p>
>> +    For the reverse external traffic, the gateway chassis applies the unNATting
>> +    rules and sends the packet via the localnet port of the VLAN tenant
>> +    network and the destination chassis receives the packet and delivers to
>> +    the VM.
>> +  </p>
>> +
>>    <h2>Life Cycle of a VTEP gateway</h2>
>>
>>    <p>
>> diff --git a/ovn/ovn-nb.xml b/ovn/ovn-nb.xml
>> index 8564ed39c..13ae56e13 100644
>> --- a/ovn/ovn-nb.xml
>> +++ b/ovn/ovn-nb.xml
>> @@ -1635,6 +1635,49 @@
>>            chassis to enable high availability.
>>          </p>
>>        </column>
>> +
>> +      <column name="options" key="reside-on-redirect-chassis">
>> +        <p>
>> +          Generally routing is distributed in <code>OVN</code>. The packet
>> +          from a logical port which needs to be routed hits the router pipeline
>> +          in the source chassis. For the East-West traffic, the packet is
>> +          sent directly to the destination chassis. For the outside traffic
>> +          the packet is sent to the gateway chassis.
>> +        </p>
>> +
>> +        <p>
>> +          When this option is set, <code>OVN</code> considers this only if
>> +        </p>
>> +
>> +        <ul>
>> +          <li>
>> +            The logical router to which this logical router port belongs to
>> +            has a distributed gateway port.
>> +          </li>
>> +
>> +          <li>
>> +            The peer's logical switch has a localnet port (representing
>> +            a tenant VLAN network)
>> +          </li>
>> +        </ul>
>> +
>> +        <p>
>> +          When this option is set to <code>true</code>, then the packet
>> +          which needs to be routed hits the router pipeline in the chassis
>> +          hosting the distributed gateway router port. The source chassis
>> +          pushes out this traffic via the localnet port. With this the
>> +          East-West traffic is no more distributed and will always go through
>> +          the gateway chassis.
>> +        </p>
>> +
>> +        <p>
>> +          Without this option set, for any traffic destined to outside from a
>> +          logical port which belongs to a logical switch with localnet port,
>> +          the source chassis will send the traffic to the gateway chassis via
>> +          the tunnel port instead of the localnet port and this could cause MTU
>> +          issues.
>> +        </p>
>> +      </column>
>>      </group>
>>
>>      <group title="Attachment">
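
For reference, the configuration this new column describes can be sketched with the same commands the test further down uses. The names R1, foo, alice, ln-foo and hv2 are taken from that test; this is an illustration of the two preconditions plus the option itself, not a recommended deployment, and it assumes the router, switches and router ports already exist.

```shell
# Precondition 1: the router has a distributed gateway port. In the test,
# 'alice' is that port and it is bound to chassis hv2.
ovn-nbctl set Logical_Router_Port alice options:redirect-chassis="hv2"

# Precondition 2: the tenant switch 'foo' has a VLAN-tagged localnet port.
ovn-nbctl lsp-add foo ln-foo
ovn-nbctl lsp-set-addresses ln-foo unknown
ovn-nbctl lsp-set-type ln-foo localnet
ovn-nbctl lsp-set-options ln-foo network_name=public
ovn-nbctl set Logical_Switch_Port ln-foo tag=2

# Centralize routing for 'foo' on the chassis hosting the gateway port.
ovn-nbctl set logical_router_port foo options:reside-on-redirect-chassis=true
```

If either precondition is missing, the text above says the option is not considered, so the last command alone would have no effect.
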
>> diff --git a/tests/ovn.at b/tests/ovn.at
>> index 769e09f81..504ba228d 100644
>> --- a/tests/ovn.at
>> +++ b/tests/ovn.at
>> @@ -8537,6 +8537,279 @@ OVN_CLEANUP([hv1],[hv2],[hv3])
>>
>>  AT_CLEANUP
>>
>> +# VLAN traffic for external network redirected through distributed router
>> +# gateway port should use vlans(i.e input network vlan tag) across hypervisors
>> +# instead of tunneling.
>> +AT_SETUP([ovn -- vlan traffic for external network with distributed router gateway port])
>> +AT_SKIP_IF([test $HAVE_PYTHON = no])
>> +ovn_start
>> +
>> +# Logical network:
>> +# # One LR R1 that has switches foo (192.168.1.0/24) and
>> +# # alice (172.16.1.0/24) connected to it.  The logical port
>> +# # between R1 and alice has a "redirect-chassis" specified,
>> +# # i.e. it is the distributed router gateway port(172.16.1.6).
>> +# # Switch alice also has a localnet port defined.
>> +# # An additional switch outside has the same subnet as alice
>> +# # (172.16.1.0/24), a localnet port and nexthop port(172.16.1.1)
>> +# # which will receive the packet destined for external network
>> +# # (i.e 8.8.8.8 as destination ip).
>> +
>> +# Physical network:
>> +# # Three hypervisors hv[123].
>> +# # hv1 hosts vif foo1.
>> +# # hv2 is the "redirect-chassis" that hosts the distributed router gateway port.
>> +# # hv3 hosts nexthop port vif outside1.
>> +# # All other tests connect hypervisors to network n1 through br-phys for tunneling.
>> +# # But in this test, hv1 won't connect to n1(and no br-phys in hv1), and
>> +# # in order to show vlans(instead of tunneling) used between hv1 and hv2,
>> +# # a new network n2 created and hv1 and hv2 connected to this network through br-ex.
>> +# # hv2 and hv3 are still connected to n1 network through br-phys.
>> +net_add n1
>> +
>> +# We are not calling ovn_attach for hv1, to avoid adding br-phys.
>> +# Tunneling won't work in hv1 as ovn-encap-ip is not added to any bridge in hv1
>> +sim_add hv1
>> +as hv1
>> +ovs-vsctl \
>> +    -- set Open_vSwitch . external-ids:system-id=hv1 \
>> +    -- set Open_vSwitch . external-ids:ovn-remote=unix:$ovs_base/ovn-sb/ovn-sb.sock \
>> +    -- set Open_vSwitch . external-ids:ovn-encap-type=geneve,vxlan \
>> +    -- set Open_vSwitch . external-ids:ovn-encap-ip=192.168.0.1 \
>> +    -- add-br br-int \
>> +    -- set bridge br-int fail-mode=secure other-config:disable-in-band=true \
>> +    -- set Open_vSwitch . external-ids:ovn-bridge-mappings=public:br-ex
>> +
>> +start_daemon ovn-controller
>> +ovs-vsctl -- add-port br-int hv1-vif1 -- \
>> +    set interface hv1-vif1 external-ids:iface-id=foo1 \
>> +    ofport-request=1
>> +
>> +sim_add hv2
>> +as hv2
>> +ovs-vsctl add-br br-phys
>> +ovn_attach n1 br-phys 192.168.0.2
>> +ovs-vsctl set Open_vSwitch . external-ids:ovn-bridge-mappings="public:br-ex,phys:br-phys"
>> +
>> +sim_add hv3
>> +as hv3
>> +ovs-vsctl add-br br-phys
>> +ovn_attach n1 br-phys 192.168.0.3
>> +ovs-vsctl -- add-port br-int hv3-vif1 -- \
>> +    set interface hv3-vif1 external-ids:iface-id=outside1 \
>> +    options:tx_pcap=hv3/vif1-tx.pcap \
>> +    options:rxq_pcap=hv3/vif1-rx.pcap \
>> +    ofport-request=1
>> +ovs-vsctl set Open_vSwitch . external-ids:ovn-bridge-mappings="phys:br-phys"
>> +
>> +# Create network n2 for vlan connectivity between hv1 and hv2
>> +net_add n2
>> +
>> +as hv1
>> +ovs-vsctl add-br br-ex
>> +net_attach n2 br-ex
>> +
>> +as hv2
>> +ovs-vsctl add-br br-ex
>> +net_attach n2 br-ex
>> +
>> +OVN_POPULATE_ARP
>> +
>> +ovn-nbctl create Logical_Router name=R1
>> +
>> +ovn-nbctl ls-add foo
>> +ovn-nbctl ls-add alice
>> +ovn-nbctl ls-add outside
>> +
>> +# Connect foo to R1
>> +ovn-nbctl lrp-add R1 foo 00:00:01:01:02:03 192.168.1.1/24
>> +ovn-nbctl lsp-add foo rp-foo -- set Logical_Switch_Port rp-foo \
>> +    type=router options:router-port=foo \
>> +    -- lsp-set-addresses rp-foo router
>> +
>> +# Connect alice to R1 as distributed router gateway port (172.16.1.6) on hv2
>> +ovn-nbctl lrp-add R1 alice 00:00:02:01:02:03 172.16.1.6/24 \
>> +    -- set Logical_Router_Port alice options:redirect-chassis="hv2"
>> +ovn-nbctl lsp-add alice rp-alice -- set Logical_Switch_Port rp-alice \
>> +    type=router options:router-port=alice \
>> +    -- lsp-set-addresses rp-alice router \
>> +
>> +# Create logical port foo1 in foo
>> +ovn-nbctl lsp-add foo foo1 \
>> +-- lsp-set-addresses foo1 "f0:00:00:01:02:03 192.168.1.2"
>> +
>> +# Create logical port outside1 in outside, which is a nexthop address
>> +# for 172.16.1.0/24
>> +ovn-nbctl lsp-add outside outside1 \
>> +-- lsp-set-addresses outside1 "f0:00:00:01:02:04 172.16.1.1"
>> +
>> +# Set default gateway (nexthop) to 172.16.1.1
>> +ovn-nbctl lr-route-add R1 "0.0.0.0/0" 172.16.1.1 alice
>> +AT_CHECK([ovn-nbctl lr-nat-add R1 snat 172.16.1.6 192.168.1.1/24])
>> +ovn-nbctl set Logical_Switch_Port rp-alice options:nat-addresses=router
>> +
>> +ovn-nbctl lsp-add foo ln-foo
>> +ovn-nbctl lsp-set-addresses ln-foo unknown
>> +ovn-nbctl lsp-set-options ln-foo network_name=public
>> +ovn-nbctl lsp-set-type ln-foo localnet
>> +AT_CHECK([ovn-nbctl set Logical_Switch_Port ln-foo tag=2])
>> +
>> +# Create localnet port in alice
>> +ovn-nbctl lsp-add alice ln-alice
>> +ovn-nbctl lsp-set-addresses ln-alice unknown
>> +ovn-nbctl lsp-set-type ln-alice localnet
>> +ovn-nbctl lsp-set-options ln-alice network_name=phys
>> +
>> +# Create localnet port in outside
>> +ovn-nbctl lsp-add outside ln-outside
>> +ovn-nbctl lsp-set-addresses ln-outside unknown
>> +ovn-nbctl lsp-set-type ln-outside localnet
>> +ovn-nbctl lsp-set-options ln-outside network_name=phys
>> +
>> +# Allow some time for ovn-northd and ovn-controller to catch up.
>> +# XXX This should be more systematic.
>> +ovn-nbctl --wait=hv --timeout=3 sync
>> +
>> +# Check that there is a logical flow in logical switch foo's pipeline
>> +# to set the outport to rp-foo (which is expected).
>> +OVS_WAIT_UNTIL([test 1 = `ovn-sbctl dump-flows foo | grep ls_in_l2_lkup | \
>> +grep rp-foo | grep -v is_chassis_resident | wc -l`])
>> +
>> +# Set the option 'reside-on-redirect-chassis' for foo
>> +ovn-nbctl set logical_router_port foo options:reside-on-redirect-chassis=true
>> +# Check that there is a logical flow in logical switch foo's pipeline
>> +# to set the outport to rp-foo with the condition is_chassis_resident.
>> +ovn-sbctl dump-flows foo
>> +OVS_WAIT_UNTIL([test 1 = `ovn-sbctl dump-flows foo | grep ls_in_l2_lkup | \
>> +grep rp-foo | grep is_chassis_resident | wc -l`])
>> +
>> +echo "---------NB dump-----"
>> +ovn-nbctl show
>> +echo "---------------------"
>> +ovn-nbctl list logical_router
>> +echo "---------------------"
>> +ovn-nbctl list nat
>> +echo "---------------------"
>> +ovn-nbctl list logical_router_port
>> +echo "---------------------"
>> +
>> +echo "---------SB dump-----"
>> +ovn-sbctl list datapath_binding
>> +echo "---------------------"
>> +ovn-sbctl list port_binding
>> +echo "---------------------"
>> +ovn-sbctl dump-flows
>> +echo "---------------------"
>> +ovn-sbctl list chassis
>> +echo "---------------------"
>> +
>> +for chassis in hv1 hv2 hv3; do
>> +    as $chassis
>> +    echo "------ $chassis dump ----------"
>> +    ovs-vsctl show br-int
>> +    ovs-ofctl show br-int
>> +    ovs-ofctl dump-flows br-int
>> +    echo "--------------------------"
>> +done
>> +
>> +ip_to_hex() {
>> +    printf "%02x%02x%02x%02x" "$@"
>> +}
>> +
>> +foo1_ip=$(ip_to_hex 192 168 1 2)
>> +gw_ip=$(ip_to_hex 172 16 1 6)
>> +dst_ip=$(ip_to_hex 8 8 8 8)
>> +nexthop_ip=$(ip_to_hex 172 16 1 1)
>> +
>> +foo1_mac="f00000010203"
>> +foo_mac="000001010203"
>> +gw_mac="000002010203"
>> +nexthop_mac="f00000010204"
>> +
>> +# Send ip packet from foo1 to 8.8.8.8
>> +src_mac="f00000010203"
>> +dst_mac="000001010203"
>> +packet=${foo_mac}${foo1_mac}08004500001c0000000040110000${foo1_ip}${dst_ip}0035111100080000
>> +
>> +as hv1 ovs-appctl netdev-dummy/receive hv1-vif1 $packet
>> +sleep 2
>> +
>> +# ARP request packet for nexthop_ip to expect at outside1
>> +arp_request=ffffffffffff${gw_mac}08060001080006040001${gw_mac}${gw_ip}000000000000${nexthop_ip}
>> +echo $arp_request >> hv3-vif1.expected
>> +cat hv3-vif1.expected > expout
>> +$PYTHON "$top_srcdir/utilities/ovs-pcap.in" hv3/vif1-tx.pcap | grep ${nexthop_ip} | uniq > hv3-vif1
>> +AT_CHECK([sort hv3-vif1], [0], [expout])
>> +
>> +# Send ARP reply from outside1 back to the router
>> +reply_mac="f00000010204"
>> +arp_reply=${gw_mac}${nexthop_mac}08060001080006040002${nexthop_mac}${nexthop_ip}${gw_mac}${gw_ip}
>> +
>> +as hv3 ovs-appctl netdev-dummy/receive hv3-vif1 $arp_reply
>> +OVS_WAIT_UNTIL([
>> +    test `as hv2 ovs-ofctl dump-flows br-int | grep table=66 | \
>> +grep actions=mod_dl_dst:f0:00:00:01:02:04 | wc -l` -eq 1
>> +    ])
>> +
>> +# VLAN tagged packet with router port(192.168.1.1) MAC as destination MAC
>> +# is expected on bridge connecting hv1 and hv2
>> +expected=${foo_mac}${foo1_mac}8100000208004500001c0000000040110000${foo1_ip}${dst_ip}0035111100080000
>> +echo $expected > hv1-br-ex_n2.expected
>> +
>> +# Packet to Expect at outside1 i.e nexthop(172.16.1.1) port.
>> +# As connection tracking not enabled for this test, snat can't be done on the packet.
>> +# We still see foo1 as the source ip address. But source mac(gateway MAC) and
>> +# dest mac(nexthop mac) are properly configured.
>> +expected=${nexthop_mac}${gw_mac}08004500001c000000003f110100${foo1_ip}${dst_ip}0035111100080000
>> +echo $expected > hv3-vif1.expected
>> +
>> +reset_pcap_file() {
>> +    local iface=$1
>> +    local pcap_file=$2
>> +    ovs-vsctl -- set Interface $iface options:tx_pcap=dummy-tx.pcap \
>> +options:rxq_pcap=dummy-rx.pcap
>> +    rm -f ${pcap_file}*.pcap
>> +    ovs-vsctl -- set Interface $iface options:tx_pcap=${pcap_file}-tx.pcap \
>> +options:rxq_pcap=${pcap_file}-rx.pcap
>> +}
>> +
>> +as hv1 reset_pcap_file br-ex_n2 hv1/br-ex_n2
>> +as hv3 reset_pcap_file hv3-vif1 hv3/vif1
>> +sleep 2
>> +as hv1 ovs-appctl netdev-dummy/receive hv1-vif1 $packet
>> +sleep 2
>> +
>> +# On hv1, the packet should not go from vlan switch pipeline to router
>> +# pipeline
>> +as hv1 ovs-ofctl dump-flows br-int
>> +
>> +AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=65 | grep "priority=100,reg15=0x1,metadata=0x2" \
>> +| grep actions=clone | grep -v n_packets=0 | wc -l], [0], [[0
>> +]])
>> +
>> +# On hv1, table 32 check that no packet goes via the tunnel port
>> +AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=32 \
>> +| grep "NXM_NX_TUN_ID" | grep -v n_packets=0 | wc -l], [0], [[0
>> +]])
>> +
>> +ip_packet() {
>> +    grep "1010203f00000010203"
>> +}
>> +
>> +# Check vlan tagged packet on the bridge connecting hv1 and hv2 with the
>> +# foo1's mac.
>> +$PYTHON "$top_srcdir/utilities/ovs-pcap.in" hv1/br-ex_n2-tx.pcap | ip_packet | uniq > hv1-br-ex_n2
>> +cat hv1-br-ex_n2.expected > expout
>> +AT_CHECK([sort hv1-br-ex_n2], [0], [expout])
>> +
>> +# Check expected packet on nexthop interface
>> +$PYTHON "$top_srcdir/utilities/ovs-pcap.in" hv3/vif1-tx.pcap | grep ${foo1_ip}${dst_ip} | uniq > hv3-vif1
>> +cat hv3-vif1.expected > expout
>> +AT_CHECK([sort hv3-vif1], [0], [expout])
>> +
>> +OVN_CLEANUP([hv1],[hv2],[hv3])
>> +AT_CLEANUP
>> +
>>  AT_SETUP([ovn -- IPv6 ND Router Solicitation responder])
>>  AT_KEYWORDS([ovn-nd_ra])
>>  AT_SKIP_IF([test $HAVE_PYTHON = no])
>> --
>> 2.17.1
>>
>
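
As a side note on the test's expected frames: the VLAN-tagged packet expected on br-ex is the original packet with an 802.1Q header (TPID 0x8100 plus the tag value) inserted after the two MAC addresses. The sketch below reproduces that by hand. ip_to_hex and the MAC/IP values are copied from the test; add_vlan_tag is a hypothetical helper written only for this illustration and is not part of the patch.

```shell
# Same helper as in the test: four decimal octets -> 8 hex digits.
ip_to_hex() {
    printf "%02x%02x%02x%02x" "$@"
}

# Hypothetical helper (not in the patch): insert an 802.1Q tag after the
# destination and source MACs (the first 24 hex digits of the frame).
# $1 = untagged frame as a hex string, $2 = VLAN ID in decimal
# (priority bits are left at 0).
add_vlan_tag() {
    macs=$(printf '%s' "$1" | cut -c1-24)
    rest=$(printf '%s' "$1" | cut -c25-)
    printf '%s8100%04x%s' "$macs" "$2" "$rest"
}

foo1_ip=$(ip_to_hex 192 168 1 2)
dst_ip=$(ip_to_hex 8 8 8 8)
foo1_mac="f00000010203"
foo_mac="000001010203"

# Untagged packet sent from foo1, as built in the test.
packet=${foo_mac}${foo1_mac}08004500001c0000000040110000${foo1_ip}${dst_ip}0035111100080000

# Tag 2 matches the tag set on ln-foo in the test.
tagged=$(add_vlan_tag "$packet" 2)
echo "$tagged"
```

The resulting string has the same layout as the `expected` value the test writes to hv1-br-ex_n2.expected, which makes it easier to see that only the four-byte tag differs from the packet injected on hv1-vif1.
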

Patch

diff --git a/ovn/northd/ovn-northd.8.xml b/ovn/northd/ovn-northd.8.xml
index 7352c6764..f52699bd3 100644
--- a/ovn/northd/ovn-northd.8.xml
+++ b/ovn/northd/ovn-northd.8.xml
@@ -874,6 +874,25 @@  output;
             resident.
           </li>
         </ul>
+
+        <p>
+          For the Ethernet address on a logical switch port of type
+          <code>router</code>, when that logical switch port's
+          <ref column="addresses" table="Logical_Switch_Port"
+          db="OVN_Northbound"/> column is set to <code>router</code> and
+          the connected logical router port specifies a
+          <code>reside-on-redirect-chassis</code> and the logical router
+          to which the connected logical router port belongs has a
+          <code>redirect-chassis</code> distributed gateway logical router
+          port:
+        </p>
+
+        <ul>
+          <li>
+            The flow for the connected logical router port's Ethernet
+            address is only programmed on the <code>redirect-chassis</code>.
+          </li>
+        </ul>
       </li>
 
       <li>
@@ -1179,6 +1198,17 @@  output;
           upstream MAC learning to point to the
           <code>redirect-chassis</code>.
         </p>
+
+        <p>
+          For the logical router port with the option
+          <code>reside-on-redirect-chassis</code> set (which is centralized),
+          the above flows are only programmed on the gateway port instance on
+          the <code>redirect-chassis</code> (if the logical router has a
+          distributed gateway port). This behavior avoids generation
+          of multiple ARP responses from different chassis, and allows
+          upstream MAC learning to point to the
+          <code>redirect-chassis</code>.
+        </p>
       </li>
 
       <li>
diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
index 31ea5f410..3998a898c 100644
--- a/ovn/northd/ovn-northd.c
+++ b/ovn/northd/ovn-northd.c
@@ -4426,13 +4426,32 @@  build_lswitch_flows(struct hmap *datapaths, struct hmap *ports,
                 ds_put_format(&match, "eth.dst == "ETH_ADDR_FMT,
                               ETH_ADDR_ARGS(mac));
                 if (op->peer->od->l3dgw_port
-                    && op->peer == op->peer->od->l3dgw_port
-                    && op->peer->od->l3redirect_port) {
-                    /* The destination lookup flow for the router's
-                     * distributed gateway port MAC address should only be
-                     * programmed on the "redirect-chassis". */
-                    ds_put_format(&match, " && is_chassis_resident(%s)",
-                                  op->peer->od->l3redirect_port->json_key);
+                    && op->peer->od->l3redirect_port
+                    && op->od->localnet_port) {
+                    bool add_chassis_resident_check = false;
+                    if (op->peer == op->peer->od->l3dgw_port) {
+                        /* The peer of this port represents a distributed
+                         * gateway port. The destination lookup flow for the
+                         * router's distributed gateway port MAC address should
+                         * only be programmed on the "redirect-chassis". */
+                        add_chassis_resident_check = true;
+                    } else {
+                        /* Check if the option 'reside-on-redirect-chassis'
+                         * is set to true on the peer port. If set to true
+                         * and if the logical switch has a localnet port, it
+                         * means the router pipeline for the packets from
+                         * this logical switch should be run on the chassis
+                         * hosting the gateway port.
+                         */
+                        add_chassis_resident_check = smap_get_bool(
+                            &op->peer->nbrp->options,
+                            "reside-on-redirect-chassis", false);
+                    }
+
+                    if (add_chassis_resident_check) {
+                        ds_put_format(&match, " && is_chassis_resident(%s)",
+                                      op->peer->od->l3redirect_port->json_key);
+                    }
                 }
 
                 ds_clear(&actions);
@@ -5197,15 +5216,35 @@  build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
                           op->lrp_networks.ipv4_addrs[i].network_s,
                           op->lrp_networks.ipv4_addrs[i].plen,
                           op->lrp_networks.ipv4_addrs[i].addr_s);
-            if (op->od->l3dgw_port && op == op->od->l3dgw_port
-                && op->od->l3redirect_port) {
-                /* Traffic with eth.src = l3dgw_port->lrp_networks.ea_s
-                 * should only be sent from the "redirect-chassis", so that
-                 * upstream MAC learning points to the "redirect-chassis".
-                 * Also need to avoid generation of multiple ARP responses
-                 * from different chassis. */
-                ds_put_format(&match, " && is_chassis_resident(%s)",
-                              op->od->l3redirect_port->json_key);
+
+            if (op->od->l3dgw_port && op->od->l3redirect_port && op->peer
+                && op->peer->od->localnet_port) {
+                bool add_chassis_resident_check = false;
+                if (op == op->od->l3dgw_port) {
+                    /* Traffic with eth.src = l3dgw_port->lrp_networks.ea_s
+                     * should only be sent from the "redirect-chassis", so that
+                     * upstream MAC learning points to the "redirect-chassis".
+                     * Also need to avoid generation of multiple ARP responses
+                     * from different chassis. */
+                    add_chassis_resident_check = true;
+                } else {
+                    /* Check if the option 'reside-on-redirect-chassis'
+                     * is set to true on the router port. If set to true
+                     * and if peer's logical switch has a localnet port, it
+                     * means the router pipeline for the packets from
+                     * peer's logical switch should be run on the chassis
+                     * hosting the gateway port and it should reply to the
+                     * ARP requests for the router port IPs.
+                     */
+                    add_chassis_resident_check = smap_get_bool(
+                        &op->nbrp->options,
+                        "reside-on-redirect-chassis", false);
+                }
+
+                if (add_chassis_resident_check) {
+                    ds_put_format(&match, " && is_chassis_resident(%s)",
+                                  op->od->l3redirect_port->json_key);
+                }
             }
 
             ds_clear(&actions);
diff --git a/ovn/ovn-architecture.7.xml b/ovn/ovn-architecture.7.xml
index 6ed2cf132..998470c34 100644
--- a/ovn/ovn-architecture.7.xml
+++ b/ovn/ovn-architecture.7.xml
@@ -1372,6 +1372,166 @@ 
     http://docs.openvswitch.org/en/latest/topics/high-availability.
   </p>
 
+  <h2>Tenant VLAN networks connected to a Logical Router</h2>
+
+  <p>
+    It is possible to have multiple logical switches each with a localnet port
+    (representing physical networks) connected to a logical router in which one
+    may provide the external connectivity via a distributed gateway port and
+    the rest of them are used internally (with VLAN tagging). It is expected
+    that <code>ovn-bridge-mappings</code> is configured appropriately on the
+    chassis.
+  </p>
+
+  <h3>East West routing</h3>
+  <p>
+    East-West routing between these tenant VLAN logical switches works almost
+    the same way as normal logical switches. When the VM sends such a packet,
+    then:
+  </p>
+  <ol>
+    <li>
+      The packet enters the ingress pipeline of the logical router datapath
+      via the logical router port in the source chassis.
+    </li>
+
+    <li>
+      Routing decision is taken.
+    </li>
+
+    <li>
+      The packet goes out of the integration bridge to the provider bridge
+      (belonging to the destination logical switch) via the localnet port.
+    </li>
+
+    <li>
+      The destination chassis receives the packet via the localnet port
+      and delivers it to the destination VM.
+    </li>
+  </ol>
+
+  <h3>External traffic</h3>
+
+  <p>
+    The following happens when a VM sends external traffic (which requires
+    NATting) and the chassis hosting the VM doesn't have a distributed gateway
+    port.
+  </p>
+
+  <ol>
+    <li>
+      The packet enters the ingress pipeline of the logical router datapath
+      via the logical router port in the source chassis.
+    </li>
+
+    <li>
+      Routing decision is taken. Since the gateway router or the distributed
+      gateway port doesn't reside in the source chassis, the traffic is
+      redirected to the gateway chassis via the tunnel port.
+    </li>
+
+    <li>
+      The gateway chassis receives the packet, applies the NAT rules and
+      forwards it via the localnet port.
+    </li>
+  </ol>
+
+  <p>
+    Although this works, the VM traffic is tunnelled. In order for it to
+    work properly, the MTU of the VLAN tenant networks must be lowered to
+    account for the tunnel encapsulation.
+  </p>
+
+  <h2>Centralized routing for VLAN tenant networks</h2>
+
+  <p>
+    To overcome the tunnel encapsulation problem described in the previous
+    section, <code>OVN</code> supports the option of enabling centralized
+    routing for VLAN tenant networks. CMS can configure the option
+    <ref column="options:reside-on-redirect-chassis"
+    table="Logical_Router_Port" db="OVN_NB"/> to <code>true</code> for each
+    <ref table="Logical_Router_Port" db="OVN_NB"/> which connects to the
+    logical switch of the VLAN tenant network. This causes the gateway
+    chassis (hosting the distributed gateway port) to handle all the
+    routing for these networks, making it centralized. It will reply to
+    the ARP requests for the logical router port IPs.
+  </p>
+
+  <p>
+    If the logical router doesn't have a distributed gateway port connecting
+    to the provider network, then this option is ignored by <code>OVN</code>.
+  </p>
+
+  <p>
+    The following happens when a VM sends east-west traffic which needs to
+    be routed:
+  </p>
+
+  <ol>
+    <li>
+      The packet from the VM enters the logical datapath pipeline of the source
+      VLAN network in the source chassis and is sent out via the localnet port
+      (instead of sending it to the router pipeline).
+    </li>
+
+    <li>
+      The packet enters the logical datapath pipeline of the source VLAN
+      network in the gateway chassis and is sent to the logical datapath
+      pipeline belonging to the logical router.
+    </li>
+
+    <li>
+      Routing decision is taken.
+    </li>
+
+    <li>
+      The packet enters the logical datapath pipeline of the destination
+      VLAN network. The packet is delivered to the destination VM if it resides
+      in the same chassis. Otherwise the packet is sent out via the localnet
+      port of the destination VLAN network.
+    </li>
+
+    <li>
+      The destination chassis receives the packet via the localnet port
+      and delivers it to the destination VM.
+    </li>
+  </ol>
+
+  <p>
+    The following happens when a VM sends external traffic which requires
+    NATting:
+  </p>
+
+  <ol>
+    <li>
+      The packet from the VM enters the logical datapath pipeline of the source
+      VLAN network in the source chassis and is sent out via the localnet port
+      (instead of sending it to the router pipeline).
+    </li>
+
+    <li>
+      The packet enters the logical datapath pipeline of the source VLAN
+      network in the gateway chassis and is sent to the logical datapath
+      pipeline belonging to the logical router.
+    </li>
+
+    <li>
+      Routing decision is taken and NAT rules are applied.
+    </li>
+
+    <li>
+      The packet enters the logical datapath pipeline of the provider network
+      and is sent out via the localnet port of the provider network.
+    </li>
+  </ol>
+
+  <p>
+    For the reverse external traffic, the gateway chassis applies the unNATting
+    rules and sends the packet via the localnet port of the VLAN tenant
+    network. The destination chassis receives the packet and delivers it to
+    the VM.
+  </p>
+
   <h2>Life Cycle of a VTEP gateway</h2>
 
   <p>
diff --git a/ovn/ovn-nb.xml b/ovn/ovn-nb.xml
index 8564ed39c..13ae56e13 100644
--- a/ovn/ovn-nb.xml
+++ b/ovn/ovn-nb.xml
@@ -1635,6 +1635,49 @@ 
           chassis to enable high availability.
         </p>
       </column>
+
+      <column name="options" key="reside-on-redirect-chassis">
+        <p>
+          Generally routing is distributed in <code>OVN</code>. The packet
+          from a logical port which needs to be routed hits the router pipeline
+          in the source chassis. For the East-West traffic, the packet is
+          sent directly to the destination chassis. For the outside traffic
+          the packet is sent to the gateway chassis.
+        </p>
+
+        <p>
+          When this option is set, <code>OVN</code> considers it only if:
+        </p>
+
+        <ul>
+          <li>
+            The logical router to which this logical router port belongs
+            has a distributed gateway port.
+          </li>
+
+          <li>
+            The peer's logical switch has a localnet port (representing
+            a tenant VLAN network).
+          </li>
+        </ul>
+
+        <p>
+          When this option is set to <code>true</code>, then the packet
+          which needs to be routed hits the router pipeline in the chassis
+          hosting the distributed gateway router port. The source chassis
+          pushes out this traffic via the localnet port. With this the
+          East-West traffic is no longer distributed and will always go through
+          the gateway chassis.
+        </p>
+
+        <p>
+          Without this option set, for any traffic destined to outside from a
+          logical port which belongs to a logical switch with localnet port,
+          the source chassis will send the traffic to the gateway chassis via
+          the tunnel port instead of the localnet port and this could cause MTU
+          issues.
+        </p>
+      </column>
     </group>
 
     <group title="Attachment">
diff --git a/tests/ovn.at b/tests/ovn.at
index 769e09f81..504ba228d 100644
--- a/tests/ovn.at
+++ b/tests/ovn.at
@@ -8537,6 +8537,279 @@  OVN_CLEANUP([hv1],[hv2],[hv3])
 
 AT_CLEANUP
 
+# VLAN traffic for external network redirected through distributed router
+# gateway port should use vlans (i.e. input network vlan tag) across hypervisors
+# instead of tunneling.
+AT_SETUP([ovn -- vlan traffic for external network with distributed router gateway port])
+AT_SKIP_IF([test $HAVE_PYTHON = no])
+ovn_start
+
+# Logical network:
+# # One LR R1 that has switches foo (192.168.1.0/24) and
+# # alice (172.16.1.0/24) connected to it.  The logical port
+# # between R1 and alice has a "redirect-chassis" specified,
+# # i.e. it is the distributed router gateway port(172.16.1.6).
+# # Switch alice also has a localnet port defined.
+# # An additional switch outside has the same subnet as alice
+# # (172.16.1.0/24), a localnet port and nexthop port(172.16.1.1)
+# # which will receive the packet destined for external network
+# # (i.e 8.8.8.8 as destination ip).
+
+# Physical network:
+# # Three hypervisors hv[123].
+# # hv1 hosts vif foo1.
+# # hv2 is the "redirect-chassis" that hosts the distributed router gateway port.
+# # hv3 hosts nexthop port vif outside1.
+# # All other tests connect hypervisors to network n1 through br-phys for tunneling.
+# # But in this test, hv1 won't connect to n1(and no br-phys in hv1), and
+# # in order to show vlans(instead of tunneling) used between hv1 and hv2,
+# # a new network n2 is created and hv1 and hv2 are connected to it through br-ex.
+# # hv2 and hv3 are still connected to n1 network through br-phys.
+net_add n1
+
+# We are not calling ovn_attach for hv1, to avoid adding br-phys.
+# Tunneling won't work in hv1 as ovn-encap-ip is not added to any bridge in hv1
+sim_add hv1
+as hv1
+ovs-vsctl \
+    -- set Open_vSwitch . external-ids:system-id=hv1 \
+    -- set Open_vSwitch . external-ids:ovn-remote=unix:$ovs_base/ovn-sb/ovn-sb.sock \
+    -- set Open_vSwitch . external-ids:ovn-encap-type=geneve,vxlan \
+    -- set Open_vSwitch . external-ids:ovn-encap-ip=192.168.0.1 \
+    -- add-br br-int \
+    -- set bridge br-int fail-mode=secure other-config:disable-in-band=true \
+    -- set Open_vSwitch . external-ids:ovn-bridge-mappings=public:br-ex
+
+start_daemon ovn-controller
+ovs-vsctl -- add-port br-int hv1-vif1 -- \
+    set interface hv1-vif1 external-ids:iface-id=foo1 \
+    ofport-request=1
+
+sim_add hv2
+as hv2
+ovs-vsctl add-br br-phys
+ovn_attach n1 br-phys 192.168.0.2
+ovs-vsctl set Open_vSwitch . external-ids:ovn-bridge-mappings="public:br-ex,phys:br-phys"
+
+sim_add hv3
+as hv3
+ovs-vsctl add-br br-phys
+ovn_attach n1 br-phys 192.168.0.3
+ovs-vsctl -- add-port br-int hv3-vif1 -- \
+    set interface hv3-vif1 external-ids:iface-id=outside1 \
+    options:tx_pcap=hv3/vif1-tx.pcap \
+    options:rxq_pcap=hv3/vif1-rx.pcap \
+    ofport-request=1
+ovs-vsctl set Open_vSwitch . external-ids:ovn-bridge-mappings="phys:br-phys"
+
+# Create network n2 for vlan connectivity between hv1 and hv2
+net_add n2
+
+as hv1
+ovs-vsctl add-br br-ex
+net_attach n2 br-ex
+
+as hv2
+ovs-vsctl add-br br-ex
+net_attach n2 br-ex
+
+OVN_POPULATE_ARP
+
+ovn-nbctl create Logical_Router name=R1
+
+ovn-nbctl ls-add foo
+ovn-nbctl ls-add alice
+ovn-nbctl ls-add outside
+
+# Connect foo to R1
+ovn-nbctl lrp-add R1 foo 00:00:01:01:02:03 192.168.1.1/24
+ovn-nbctl lsp-add foo rp-foo -- set Logical_Switch_Port rp-foo \
+    type=router options:router-port=foo \
+    -- lsp-set-addresses rp-foo router
+
+# Connect alice to R1 as distributed router gateway port (172.16.1.6) on hv2
+ovn-nbctl lrp-add R1 alice 00:00:02:01:02:03 172.16.1.6/24 \
+    -- set Logical_Router_Port alice options:redirect-chassis="hv2"
+ovn-nbctl lsp-add alice rp-alice -- set Logical_Switch_Port rp-alice \
+    type=router options:router-port=alice \
+    -- lsp-set-addresses rp-alice router
+
+# Create logical port foo1 in foo
+ovn-nbctl lsp-add foo foo1 \
+-- lsp-set-addresses foo1 "f0:00:00:01:02:03 192.168.1.2"
+
+# Create logical port outside1 in outside, which is a nexthop address
+# for 172.16.1.0/24
+ovn-nbctl lsp-add outside outside1 \
+-- lsp-set-addresses outside1 "f0:00:00:01:02:04 172.16.1.1"
+
+# Set default gateway (nexthop) to 172.16.1.1
+ovn-nbctl lr-route-add R1 "0.0.0.0/0" 172.16.1.1 alice
+AT_CHECK([ovn-nbctl lr-nat-add R1 snat 172.16.1.6 192.168.1.1/24])
+ovn-nbctl set Logical_Switch_Port rp-alice options:nat-addresses=router
+
+ovn-nbctl lsp-add foo ln-foo
+ovn-nbctl lsp-set-addresses ln-foo unknown
+ovn-nbctl lsp-set-options ln-foo network_name=public
+ovn-nbctl lsp-set-type ln-foo localnet
+AT_CHECK([ovn-nbctl set Logical_Switch_Port ln-foo tag=2])
+
+# Create localnet port in alice
+ovn-nbctl lsp-add alice ln-alice
+ovn-nbctl lsp-set-addresses ln-alice unknown
+ovn-nbctl lsp-set-type ln-alice localnet
+ovn-nbctl lsp-set-options ln-alice network_name=phys
+
+# Create localnet port in outside
+ovn-nbctl lsp-add outside ln-outside
+ovn-nbctl lsp-set-addresses ln-outside unknown
+ovn-nbctl lsp-set-type ln-outside localnet
+ovn-nbctl lsp-set-options ln-outside network_name=phys
+
+# Allow some time for ovn-northd and ovn-controller to catch up.
+# XXX This should be more systematic.
+ovn-nbctl --wait=hv --timeout=3 sync
+
+# Check that there is a logical flow in logical switch foo's pipeline
+# to set the outport to rp-foo (which is expected).
+OVS_WAIT_UNTIL([test 1 = `ovn-sbctl dump-flows foo | grep ls_in_l2_lkup | \
+grep rp-foo | grep -v is_chassis_resident | wc -l`])
+
+# Set the option 'reside-on-redirect-chassis' for foo
+ovn-nbctl set logical_router_port foo options:reside-on-redirect-chassis=true
+# Check that there is a logical flow in logical switch foo's pipeline
+# to set the outport to rp-foo with the condition is_chassis_resident.
+ovn-sbctl dump-flows foo
+OVS_WAIT_UNTIL([test 1 = `ovn-sbctl dump-flows foo | grep ls_in_l2_lkup | \
+grep rp-foo | grep is_chassis_resident | wc -l`])
+
+echo "---------NB dump-----"
+ovn-nbctl show
+echo "---------------------"
+ovn-nbctl list logical_router
+echo "---------------------"
+ovn-nbctl list nat
+echo "---------------------"
+ovn-nbctl list logical_router_port
+echo "---------------------"
+
+echo "---------SB dump-----"
+ovn-sbctl list datapath_binding
+echo "---------------------"
+ovn-sbctl list port_binding
+echo "---------------------"
+ovn-sbctl dump-flows
+echo "---------------------"
+ovn-sbctl list chassis
+echo "---------------------"
+
+for chassis in hv1 hv2 hv3; do
+    as $chassis
+    echo "------ $chassis dump ----------"
+    ovs-vsctl show br-int
+    ovs-ofctl show br-int
+    ovs-ofctl dump-flows br-int
+    echo "--------------------------"
+done
+
+ip_to_hex() {
+    printf "%02x%02x%02x%02x" "$@"
+}
+
+foo1_ip=$(ip_to_hex 192 168 1 2)
+gw_ip=$(ip_to_hex 172 16 1 6)
+dst_ip=$(ip_to_hex 8 8 8 8)
+nexthop_ip=$(ip_to_hex 172 16 1 1)
+
+foo1_mac="f00000010203"
+foo_mac="000001010203"
+gw_mac="000002010203"
+nexthop_mac="f00000010204"
+
+# Send ip packet from foo1 to 8.8.8.8
+src_mac="f00000010203"
+dst_mac="000001010203"
+packet=${foo_mac}${foo1_mac}08004500001c0000000040110000${foo1_ip}${dst_ip}0035111100080000
+
+as hv1 ovs-appctl netdev-dummy/receive hv1-vif1 $packet
+sleep 2
+
+# ARP request packet for nexthop_ip to expect at outside1
+arp_request=ffffffffffff${gw_mac}08060001080006040001${gw_mac}${gw_ip}000000000000${nexthop_ip}
+echo $arp_request >> hv3-vif1.expected
+cat hv3-vif1.expected > expout
+$PYTHON "$top_srcdir/utilities/ovs-pcap.in" hv3/vif1-tx.pcap | grep ${nexthop_ip} | uniq > hv3-vif1
+AT_CHECK([sort hv3-vif1], [0], [expout])
+
+# Send ARP reply from outside1 back to the router
+reply_mac="f00000010204"
+arp_reply=${gw_mac}${nexthop_mac}08060001080006040002${nexthop_mac}${nexthop_ip}${gw_mac}${gw_ip}
+
+as hv3 ovs-appctl netdev-dummy/receive hv3-vif1 $arp_reply
+OVS_WAIT_UNTIL([
+    test `as hv2 ovs-ofctl dump-flows br-int | grep table=66 | \
+grep actions=mod_dl_dst:f0:00:00:01:02:04 | wc -l` -eq 1
+    ])
+
+# VLAN tagged packet with router port(192.168.1.1) MAC as destination MAC
+# is expected on bridge connecting hv1 and hv2
+expected=${foo_mac}${foo1_mac}8100000208004500001c0000000040110000${foo1_ip}${dst_ip}0035111100080000
+echo $expected > hv1-br-ex_n2.expected
+
+# Packet to Expect at outside1 i.e nexthop(172.16.1.1) port.
+# As connection tracking is not enabled for this test, snat can't be done on the packet.
+# We still see foo1 as the source ip address. But source mac(gateway MAC) and
+# dest mac(nexthop mac) are properly configured.
+expected=${nexthop_mac}${gw_mac}08004500001c000000003f110100${foo1_ip}${dst_ip}0035111100080000
+echo $expected > hv3-vif1.expected
+
+reset_pcap_file() {
+    local iface=$1
+    local pcap_file=$2
+    ovs-vsctl -- set Interface $iface options:tx_pcap=dummy-tx.pcap \
+options:rxq_pcap=dummy-rx.pcap
+    rm -f ${pcap_file}*.pcap
+    ovs-vsctl -- set Interface $iface options:tx_pcap=${pcap_file}-tx.pcap \
+options:rxq_pcap=${pcap_file}-rx.pcap
+}
+
+as hv1 reset_pcap_file br-ex_n2 hv1/br-ex_n2
+as hv3 reset_pcap_file hv3-vif1 hv3/vif1
+sleep 2
+as hv1 ovs-appctl netdev-dummy/receive hv1-vif1 $packet
+sleep 2
+
+# On hv1, the packet should not go from vlan switch pipeline to router
+# pipeline
+as hv1 ovs-ofctl dump-flows br-int
+
+AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=65 | grep "priority=100,reg15=0x1,metadata=0x2" \
+| grep actions=clone | grep -v n_packets=0 | wc -l], [0], [[0
+]])
+
+# On hv1, check in table 32 that no packet goes via the tunnel port
+AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=32 \
+| grep "NXM_NX_TUN_ID" | grep -v n_packets=0 | wc -l], [0], [[0
+]])
+
+ip_packet() {
+    grep "1010203f00000010203"
+}
+
+# Check vlan tagged packet on the bridge connecting hv1 and hv2 with the
+# foo1's mac.
+$PYTHON "$top_srcdir/utilities/ovs-pcap.in" hv1/br-ex_n2-tx.pcap | ip_packet | uniq > hv1-br-ex_n2
+cat hv1-br-ex_n2.expected > expout
+AT_CHECK([sort hv1-br-ex_n2], [0], [expout])
+
+# Check expected packet on nexthop interface
+$PYTHON "$top_srcdir/utilities/ovs-pcap.in" hv3/vif1-tx.pcap | grep ${foo1_ip}${dst_ip} | uniq > hv3-vif1
+cat hv3-vif1.expected > expout
+AT_CHECK([sort hv3-vif1], [0], [expout])
+
+OVN_CLEANUP([hv1],[hv2],[hv3])
+AT_CLEANUP
+
 AT_SETUP([ovn -- IPv6 ND Router Solicitation responder])
 AT_KEYWORDS([ovn-nd_ra])
 AT_SKIP_IF([test $HAVE_PYTHON = no])