Message ID | 20181119161738.9468-1-nusiddiq@redhat.com
---|---
State | Accepted
Series | [ovs-dev,v2] ovn: Avoid tunneling for VLAN packets redirected to a gateway chassis
On Mon, 19 Nov 2018 at 08:18, <nusiddiq@redhat.com> wrote:

> From: Numan Siddique <nusiddiq@redhat.com>
>
> An OVN deployment can have multiple logical switches, each with a
> localnet port, connected to a distributed logical router, in which one
> logical switch may provide external connectivity and the rest of
> the localnet logical switches use VLAN tagging in the physical
> network.
>
> As reported in [1], external traffic from these localnet VLAN tagged
> logical switches is tunnelled to the gateway chassis (the chassis
> hosting a distributed gateway port, which applies NAT rules). As part
> of the discussion in [1], a few possible solutions were proposed by
> Russell [2]. This patch implements the first option in [2].
>
> With this patch, a new option 'reside-on-redirect-chassis' is added to
> the 'options' column of the Logical_Router_Port table. If the value of
> this option is set to 'true' and the logical router also has a
> distributed gateway port, then routing for this logical router port
> is centralized in the chassis hosting the distributed gateway port.
>
> If a logical switch 'sw0' is connected to a router 'lr0' via the
> router port 'lr0-sw0' (address "00:00:00:00:af:12 192.168.1.1"), and
> 'lr0' has a distributed gateway port 'lr0-public', then the below
> logical flow is added in the logical switch pipeline of 'sw0' if the
> 'reside-on-redirect-chassis' option is set on 'lr0-sw0':
>
> table=16(ls_in_l2_lkup), priority=50,
>   match=(eth.dst == 00:00:00:00:af:12 &&
>          is_chassis_resident("cr-lr0-public")),
>   action=(outport = "sw0-lr0"; output;)
>
> "cr-lr0-public" is an internal port binding of type 'chassisredirect'
> created by ovn-northd for lr0-public in the SB DB. Please see
> "man ovn-sb" for more details.
>
> With the above flow, the packet doesn't enter the router pipeline in
> the source chassis. Instead the packet is sent out via the localnet
> port of 'sw0'.
> The gateway chassis, upon receiving this packet, runs
> the logical router pipeline, applying NAT rules, and sends the traffic
> out via the localnet port of the logical switch providing external
> connectivity.
> The gateway chassis will also reply to the ARP requests for the router
> port IPs.
>
> With this approach, we avoid redirecting the external traffic to the
> gateway chassis via the tunnel port. There are a couple of drawbacks
> with this approach:
>
> - East-West routing is no longer distributed for the VLAN tagged
>   localnet logical switches if the 'reside-on-redirect-chassis' option
>   is defined.
>
> - 'dnat_and_snat' NAT rules with the 'logical_mac' and 'logical_port'
>   columns defined will not work for these logical switches.
>
> This approach is taken for now as it is simple. If there is a
> requirement to support distributed routing for these VLAN tenant
> networks, we can explore other possible solutions.
>
> [1] - https://mail.openvswitch.org/pipermail/ovs-discuss/2018-April/046543.html
> [2] - https://mail.openvswitch.org/pipermail/ovs-discuss/2018-April/046557.html
>
> Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2018-April/046543.html
> Reported-by: venkata anil <vkommadi@redhat.com>
> Acked-by: Gurucharan Shetty <guru@ovn.org>
> Co-authored-by: venkata anil <vkommadi@redhat.com>
> Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
> Signed-off-by: venkata anil <vkommadi@redhat.com>

Since no one else looks to have any further comments, I applied this to
master.

> ---
>
> v1 -> v2
> --------
> * Addressed the review comments from Guru.
> * Removed the patch 2 'ovn: Support a new Logical_Switch_Port.type -
>   'external' from this series as it is an independent patch.
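[For reference, the 'sw0'/'lr0' example from the commit message could be wired up with ovn-nbctl along these lines. This is a hedged sketch, not part of the patch: the port names and the lr0-sw0 address come from the commit message, while the lr0-public MAC/IP and the gateway chassis name "gw1" are invented for illustration; a real deployment also needs the switches, their router-type ports, localnet ports, and ovn-bridge-mappings configured.]

```shell
# Router lr0 with a distributed gateway port (lr0-public) pinned to a
# gateway chassis, plus the VLAN-backed switch's router port lr0-sw0.
ovn-nbctl lr-add lr0
ovn-nbctl lrp-add lr0 lr0-sw0 00:00:00:00:af:12 192.168.1.1/24

# lr0-public's MAC/IP and the chassis name "gw1" are placeholders.
ovn-nbctl lrp-add lr0 lr0-public 00:00:20:20:12:13 172.16.0.100/24
ovn-nbctl set Logical_Router_Port lr0-public options:redirect-chassis=gw1

# Centralize routing for the VLAN network: with this option set, the
# ls_in_l2_lkup flow for lr0-sw0's MAC is programmed only on the chassis
# hosting cr-lr0-public, so traffic leaves via the localnet port instead
# of a tunnel.
ovn-nbctl set Logical_Router_Port lr0-sw0 \
    options:reside-on-redirect-chassis=true
```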
> > ovn/northd/ovn-northd.8.xml | 30 ++++ > ovn/northd/ovn-northd.c | 71 +++++++--- > ovn/ovn-architecture.7.xml | 211 ++++++++++++++++++++++++++++ > ovn/ovn-nb.xml | 43 ++++++ > tests/ovn.at | 273 ++++++++++++++++++++++++++++++++++++ > 5 files changed, 612 insertions(+), 16 deletions(-) > > diff --git a/ovn/northd/ovn-northd.8.xml b/ovn/northd/ovn-northd.8.xml > index 7352c6764..f52699bd3 100644 > --- a/ovn/northd/ovn-northd.8.xml > +++ b/ovn/northd/ovn-northd.8.xml > @@ -874,6 +874,25 @@ output; > resident. > </li> > </ul> > + > + <p> > + For the Ethernet address on a logical switch port of type > + <code>router</code>, when that logical switch port's > + <ref column="addresses" table="Logical_Switch_Port" > + db="OVN_Northbound"/> column is set to <code>router</code> and > + the connected logical router port specifies a > + <code>reside-on-redirect-chassis</code> and the logical router > + to which the connected logical router port belongs to has a > + <code>redirect-chassis</code> distributed gateway logical router > + port: > + </p> > + > + <ul> > + <li> > + The flow for the connected logical router port's Ethernet > + address is only programmed on the > <code>redirect-chassis</code>. > + </li> > + </ul> > </li> > > <li> > @@ -1179,6 +1198,17 @@ output; > upstream MAC learning to point to the > <code>redirect-chassis</code>. > </p> > + > + <p> > + For the logical router port with the option > + <code>reside-on-redirect-chassis</code> set (which is > centralized), > + the above flows are only programmed on the gateway port > instance on > + the <code>redirect-chassis</code> (if the logical router has a > + distributed gateway port). This behavior avoids generation > + of multiple ARP responses from different chassis, and allows > + upstream MAC learning to point to the > + <code>redirect-chassis</code>. 
> + </p> > </li> > > <li> > diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c > index 58bef7de5..2de9fb38d 100644 > --- a/ovn/northd/ovn-northd.c > +++ b/ovn/northd/ovn-northd.c > @@ -4461,13 +4461,32 @@ build_lswitch_flows(struct hmap *datapaths, struct > hmap *ports, > ds_put_format(&match, "eth.dst == "ETH_ADDR_FMT, > ETH_ADDR_ARGS(mac)); > if (op->peer->od->l3dgw_port > - && op->peer == op->peer->od->l3dgw_port > - && op->peer->od->l3redirect_port) { > - /* The destination lookup flow for the router's > - * distributed gateway port MAC address should only be > - * programmed on the "redirect-chassis". */ > - ds_put_format(&match, " && is_chassis_resident(%s)", > - > op->peer->od->l3redirect_port->json_key); > + && op->peer->od->l3redirect_port > + && op->od->localnet_port) { > + bool add_chassis_resident_check = false; > + if (op->peer == op->peer->od->l3dgw_port) { > + /* The peer of this port represents a distributed > + * gateway port. The destination lookup flow for > the > + * router's distributed gateway port MAC address > should > + * only be programmed on the "redirect-chassis". > */ > + add_chassis_resident_check = true; > + } else { > + /* Check if the option > 'reside-on-redirect-chassis' > + * is set to true on the peer port. If set to true > + * and if the logical switch has a localnet port, > it > + * means the router pipeline for the packets from > + * this logical switch should be run on the > chassis > + * hosting the gateway port. 
> +                 */
> +                add_chassis_resident_check = smap_get_bool(
> +                    &op->peer->nbrp->options,
> +                    "reside-on-redirect-chassis", false);
> +            }
> +
> +            if (add_chassis_resident_check) {
> +                ds_put_format(&match, " && is_chassis_resident(%s)",
> +                              op->peer->od->l3redirect_port->json_key);
> +            }
>          }
>
>          ds_clear(&actions);
> @@ -5232,15 +5251,35 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
>                            op->lrp_networks.ipv4_addrs[i].network_s,
>                            op->lrp_networks.ipv4_addrs[i].plen,
>                            op->lrp_networks.ipv4_addrs[i].addr_s);
> -        if (op->od->l3dgw_port && op == op->od->l3dgw_port
> -            && op->od->l3redirect_port) {
> -            /* Traffic with eth.src = l3dgw_port->lrp_networks.ea_s
> -             * should only be sent from the "redirect-chassis", so that
> -             * upstream MAC learning points to the "redirect-chassis".
> -             * Also need to avoid generation of multiple ARP responses
> -             * from different chassis. */
> -            ds_put_format(&match, " && is_chassis_resident(%s)",
> -                          op->od->l3redirect_port->json_key);
> +
> +        if (op->od->l3dgw_port && op->od->l3redirect_port && op->peer
> +            && op->peer->od->localnet_port) {
> +            bool add_chassis_resident_check = false;
> +            if (op == op->od->l3dgw_port) {
> +                /* Traffic with eth.src = l3dgw_port->lrp_networks.ea_s
> +                 * should only be sent from the "redirect-chassis", so that
> +                 * upstream MAC learning points to the "redirect-chassis".
> +                 * Also need to avoid generation of multiple ARP responses
> +                 * from different chassis. */
> +                add_chassis_resident_check = true;
> +            } else {
> +                /* Check if the option 'reside-on-redirect-chassis'
> +                 * is set to true on the router port. If set to true
> +                 * and if the peer's logical switch has a localnet port,
> +                 * it means the router pipeline for the packets from
> +                 * the peer's logical switch is run on the chassis
> +                 * hosting the gateway port and it should reply to the
> +                 * ARP requests for the router port IPs.
> +                 */
> +                add_chassis_resident_check = smap_get_bool(
> +                    &op->nbrp->options,
> +                    "reside-on-redirect-chassis", false);
> +            }
> +
> +            if (add_chassis_resident_check) {
> +                ds_put_format(&match, " && is_chassis_resident(%s)",
> +                              op->od->l3redirect_port->json_key);
> +            }
>          }
>
>          ds_clear(&actions);
> diff --git a/ovn/ovn-architecture.7.xml b/ovn/ovn-architecture.7.xml
> index 64e7d89e6..3936e6016 100644
> --- a/ovn/ovn-architecture.7.xml
> +++ b/ovn/ovn-architecture.7.xml
> @@ -1372,6 +1372,217 @@
>    http://docs.openvswitch.org/en/latest/topics/high-availability.
>    </p>
>
> +  <h2>Multiple localnet logical switches connected to a Logical Router</h2>
> +
> +  <p>
> +    It is possible to have multiple logical switches, each with a localnet
> +    port (representing physical networks), connected to a logical router,
> +    in which one localnet logical switch may provide the external
> +    connectivity via a distributed gateway port and the rest of the
> +    localnet logical switches use VLAN tagging in the physical network.
> +    It is expected that <code>ovn-bridge-mappings</code> is configured
> +    appropriately on the chassis for all these localnet networks.
> +  </p>
> +
> +  <h3>East West routing</h3>
> +  <p>
> +    East-West routing between these localnet VLAN tagged logical switches
> +    works almost the same way as for normal logical switches. When the VM
> +    sends such a packet:
> +  </p>
> +  <ol>
> +    <li>
> +      It first enters the ingress pipeline, and then the egress pipeline,
> +      of the source localnet logical switch datapath. It then enters the
> +      ingress pipeline of the logical router datapath via the logical
> +      router port in the source chassis.
> +    </li>
> +
> +    <li>
> +      A routing decision is taken.
> +    </li>
> +
> +    <li>
> +      From the router datapath, the packet enters the ingress pipeline and
> +      then the egress pipeline of the destination localnet logical switch
> +      datapath and goes out of the integration bridge to the provider
> +      bridge (belonging to the destination logical switch) via the
> +      localnet port.
> +    </li>
> +
> +    <li>
> +      The destination chassis receives the packet via the localnet port
> +      and sends it to the integration bridge. The packet enters the
> +      ingress pipeline and then the egress pipeline of the destination
> +      localnet logical switch and finally gets delivered to the
> +      destination VM port.
> +    </li>
> +  </ol>
> +
> +  <h3>External traffic</h3>
> +
> +  <p>
> +    The following happens when a VM sends external traffic (which
> +    requires NATting) and the chassis hosting the VM doesn't have a
> +    distributed gateway port.
> +  </p>
> +
> +  <ol>
> +    <li>
> +      The packet first enters the ingress pipeline, and then the egress
> +      pipeline, of the source localnet logical switch datapath. It then
> +      enters the ingress pipeline of the logical router datapath via the
> +      logical router port in the source chassis.
> +    </li>
> +
> +    <li>
> +      A routing decision is taken. Since the gateway router or the
> +      distributed gateway port doesn't reside in the source chassis, the
> +      traffic is redirected to the gateway chassis via the tunnel port.
> +    </li>
> +
> +    <li>
> +      The gateway chassis receives the packet via the tunnel port and the
> +      packet enters the egress pipeline of the logical router datapath.
> +      NAT rules are applied here. The packet then enters the ingress
> +      pipeline and then the egress pipeline of the localnet logical
> +      switch datapath which provides external connectivity and finally
> +      goes out via the localnet port of the logical switch which provides
> +      external connectivity.
> +    </li>
> +  </ol>
> +
> +  <p>
> +    Although this works, the VM traffic is tunnelled when sent from the
> +    compute chassis to the gateway chassis.
> +    In order for it to work properly, the MTU of the localnet logical
> +    switches must be lowered to account for the tunnel encapsulation.
> +  </p>
> +
> +  <h2>
> +    Centralized routing for localnet VLAN tagged logical switches
> +    connected to a Logical Router
> +  </h2>
> +
> +  <p>
> +    To overcome the tunnel encapsulation problem described in the
> +    previous section, <code>OVN</code> supports the option of enabling
> +    centralized routing for localnet VLAN tagged logical switches. The
> +    CMS can configure the option
> +    <ref column="options:reside-on-redirect-chassis"
> +    table="Logical_Router_Port" db="OVN_NB"/> to <code>true</code> for
> +    each <ref table="Logical_Router_Port" db="OVN_NB"/> which connects
> +    to the localnet VLAN tagged logical switches. This causes the
> +    gateway chassis (hosting the distributed gateway port) to handle
> +    all the routing for these networks, making it centralized. It will
> +    reply to the ARP requests for the logical router port IPs.
> +  </p>
> +
> +  <p>
> +    If the logical router doesn't have a distributed gateway port
> +    connecting to the localnet logical switch which provides external
> +    connectivity, then this option is ignored by <code>OVN</code>.
> +  </p>
> +
> +  <p>
> +    The following happens when a VM sends east-west traffic that needs
> +    to be routed:
> +  </p>
> +
> +  <ol>
> +    <li>
> +      The packet first enters the ingress pipeline, and then the egress
> +      pipeline, of the source localnet logical switch datapath and is
> +      sent out via the localnet port of the source localnet logical
> +      switch (instead of being sent to the router pipeline).
> +    </li>
> +
> +    <li>
> +      The gateway chassis receives the packet via the localnet port of
> +      the source localnet logical switch and sends it to the integration
> +      bridge. The packet then enters the ingress pipeline, and then the
> +      egress pipeline, of the source localnet logical switch datapath
> +      and enters the ingress pipeline of the logical router datapath.
> +    </li>
> +
> +    <li>
> +      A routing decision is taken.
> +    </li>
> +
> +    <li>
> +      From the router datapath, the packet enters the ingress pipeline
> +      and then the egress pipeline of the destination localnet logical
> +      switch datapath. It then goes out of the integration bridge to the
> +      provider bridge (belonging to the destination logical switch) via
> +      the localnet port.
> +    </li>
> +
> +    <li>
> +      The destination chassis receives the packet via the localnet port
> +      and sends it to the integration bridge. The packet enters the
> +      ingress pipeline and then the egress pipeline of the destination
> +      localnet logical switch and finally gets delivered to the
> +      destination VM port.
> +    </li>
> +  </ol>
> +
> +  <p>
> +    The following happens when a VM sends external traffic that
> +    requires NATting:
> +  </p>
> +
> +  <ol>
> +    <li>
> +      The packet first enters the ingress pipeline, and then the egress
> +      pipeline, of the source localnet logical switch datapath and is
> +      sent out via the localnet port of the source localnet logical
> +      switch (instead of being sent to the router pipeline).
> +    </li>
> +
> +    <li>
> +      The gateway chassis receives the packet via the localnet port of
> +      the source localnet logical switch and sends it to the integration
> +      bridge. The packet then enters the ingress pipeline, and then the
> +      egress pipeline, of the source localnet logical switch datapath
> +      and enters the ingress pipeline of the logical router datapath.
> +    </li>
> +
> +    <li>
> +      A routing decision is taken and NAT rules are applied.
> +    </li>
> +
> +    <li>
> +      From the router datapath, the packet enters the ingress pipeline
> +      and then the egress pipeline of the localnet logical switch
> +      datapath which provides external connectivity. It then goes out of
> +      the integration bridge to the provider bridge (belonging to the
> +      logical switch which provides external connectivity) via the
> +      localnet port.
> + </li> > + </ol> > + > + <p> > + The following happens for the reverse external traffic. > + </p> > + > + <ol> > + <li> > + The gateway chassis receives the packet from the localnet port of > + the logical switch which provides external connectivity. The packet > then > + enters the ingress pipeline and then egress pipeline of the localnet > + logical switch (which provides external connectivity). The packet > then > + enters the ingress pipeline of the logical router datapath. > + </li> > + > + <li> > + The ingress pipeline of the logical router datapath applies the > unNATting > + rules. The packet then enters the ingress pipeline and then egress > + pipeline of the source localnet logical switch. Since the source VM > + doesn't reside in the gateway chassis, the packet is sent out via > the > + localnet port of the source logical switch. > + </li> > + > + <li> > + The source chassis receives the packet via the localnet port and > + sends it to the integration bridge. The packet enters the > + ingress pipeline and then egress pipeline of the source localnet > + logical switch and finally gets delivered to the source VM port. > + </li> > + </ol> > + > <h2>Life Cycle of a VTEP gateway</h2> > > <p> > diff --git a/ovn/ovn-nb.xml b/ovn/ovn-nb.xml > index 474b4f9a7..4141751f8 100644 > --- a/ovn/ovn-nb.xml > +++ b/ovn/ovn-nb.xml > @@ -1681,6 +1681,49 @@ > chassis to enable high availability. > </p> > </column> > + > + <column name="options" key="reside-on-redirect-chassis"> > + <p> > + Generally routing is distributed in <code>OVN</code>. The packet > + from a logical port which needs to be routed hits the router > pipeline > + in the source chassis. For the East-West traffic, the packet is > + sent directly to the destination chassis. For the outside > traffic > + the packet is sent to the gateway chassis. 
> +      </p>
> +
> +      <p>
> +        When this option is set, <code>OVN</code> honors it only if:
> +      </p>
> +
> +      <ul>
> +        <li>
> +          The logical router to which this logical router port belongs
> +          has a distributed gateway port.
> +        </li>
> +
> +        <li>
> +          The peer's logical switch has a localnet port (representing
> +          a VLAN tagged network).
> +        </li>
> +      </ul>
> +
> +      <p>
> +        When this option is set to <code>true</code>, then the packet
> +        which needs to be routed hits the router pipeline in the chassis
> +        hosting the distributed gateway router port. The source chassis
> +        pushes out this traffic via the localnet port. With this,
> +        East-West traffic is no longer distributed and will always go
> +        through the gateway chassis.
> +      </p>
> +
> +      <p>
> +        Without this option set, for any traffic destined to the outside
> +        from a logical port which belongs to a logical switch with a
> +        localnet port, the source chassis will send the traffic to the
> +        gateway chassis via the tunnel port instead of the localnet
> +        port, and this could cause MTU issues.
> +      </p>
> +    </column>
>   </group>
>
>   <group title="Attachment">
> diff --git a/tests/ovn.at b/tests/ovn.at
> index ab32faa6b..2db3f675a 100644
> --- a/tests/ovn.at
> +++ b/tests/ovn.at
> @@ -8567,6 +8567,279 @@ OVN_CLEANUP([hv1],[hv2],[hv3])
>
> AT_CLEANUP
>
> +# VLAN traffic for the external network redirected through the distributed
> +# router gateway port should use VLANs (i.e. the input network VLAN tag)
> +# across hypervisors instead of tunneling.
> +AT_SETUP([ovn -- vlan traffic for external network with distributed router gateway port])
> +AT_SKIP_IF([test $HAVE_PYTHON = no])
> +ovn_start
> +
> +# Logical network:
> +# # One LR R1 that has switches foo (192.168.1.0/24) and
> +# # alice (172.16.1.0/24) connected to it. The logical port
> +# # between R1 and alice has a "redirect-chassis" specified,
> +# # i.e. it is the distributed router gateway port (172.16.1.6).
> +# # Switch alice also has a localnet port defined.
> +# # An additional switch outside has the same subnet as alice > +# # (172.16.1.0/24), a localnet port and nexthop port(172.16.1.1) > +# # which will receive the packet destined for external network > +# # (i.e 8.8.8.8 as destination ip). > + > +# Physical network: > +# # Three hypervisors hv[123]. > +# # hv1 hosts vif foo1. > +# # hv2 is the "redirect-chassis" that hosts the distributed router > gateway port. > +# # hv3 hosts nexthop port vif outside1. > +# # All other tests connect hypervisors to network n1 through br-phys for > tunneling. > +# # But in this test, hv1 won't connect to n1(and no br-phys in hv1), and > +# # in order to show vlans(instead of tunneling) used between hv1 and hv2, > +# # a new network n2 created and hv1 and hv2 connected to this network > through br-ex. > +# # hv2 and hv3 are still connected to n1 network through br-phys. > +net_add n1 > + > +# We are not calling ovn_attach for hv1, to avoid adding br-phys. > +# Tunneling won't work in hv1 as ovn-encap-ip is not added to any bridge > in hv1 > +sim_add hv1 > +as hv1 > +ovs-vsctl \ > + -- set Open_vSwitch . external-ids:system-id=hv1 \ > + -- set Open_vSwitch . > external-ids:ovn-remote=unix:$ovs_base/ovn-sb/ovn-sb.sock \ > + -- set Open_vSwitch . external-ids:ovn-encap-type=geneve,vxlan \ > + -- set Open_vSwitch . external-ids:ovn-encap-ip=192.168.0.1 \ > + -- add-br br-int \ > + -- set bridge br-int fail-mode=secure > other-config:disable-in-band=true \ > + -- set Open_vSwitch . external-ids:ovn-bridge-mappings=public:br-ex > + > +start_daemon ovn-controller > +ovs-vsctl -- add-port br-int hv1-vif1 -- \ > + set interface hv1-vif1 external-ids:iface-id=foo1 \ > + ofport-request=1 > + > +sim_add hv2 > +as hv2 > +ovs-vsctl add-br br-phys > +ovn_attach n1 br-phys 192.168.0.2 > +ovs-vsctl set Open_vSwitch . 
> external-ids:ovn-bridge-mappings="public:br-ex,phys:br-phys" > + > +sim_add hv3 > +as hv3 > +ovs-vsctl add-br br-phys > +ovn_attach n1 br-phys 192.168.0.3 > +ovs-vsctl -- add-port br-int hv3-vif1 -- \ > + set interface hv3-vif1 external-ids:iface-id=outside1 \ > + options:tx_pcap=hv3/vif1-tx.pcap \ > + options:rxq_pcap=hv3/vif1-rx.pcap \ > + ofport-request=1 > +ovs-vsctl set Open_vSwitch . > external-ids:ovn-bridge-mappings="phys:br-phys" > + > +# Create network n2 for vlan connectivity between hv1 and hv2 > +net_add n2 > + > +as hv1 > +ovs-vsctl add-br br-ex > +net_attach n2 br-ex > + > +as hv2 > +ovs-vsctl add-br br-ex > +net_attach n2 br-ex > + > +OVN_POPULATE_ARP > + > +ovn-nbctl create Logical_Router name=R1 > + > +ovn-nbctl ls-add foo > +ovn-nbctl ls-add alice > +ovn-nbctl ls-add outside > + > +# Connect foo to R1 > +ovn-nbctl lrp-add R1 foo 00:00:01:01:02:03 192.168.1.1/24 > +ovn-nbctl lsp-add foo rp-foo -- set Logical_Switch_Port rp-foo \ > + type=router options:router-port=foo \ > + -- lsp-set-addresses rp-foo router > + > +# Connect alice to R1 as distributed router gateway port (172.16.1.6) on > hv2 > +ovn-nbctl lrp-add R1 alice 00:00:02:01:02:03 172.16.1.6/24 \ > + -- set Logical_Router_Port alice options:redirect-chassis="hv2" > +ovn-nbctl lsp-add alice rp-alice -- set Logical_Switch_Port rp-alice \ > + type=router options:router-port=alice \ > + -- lsp-set-addresses rp-alice router \ > + > +# Create logical port foo1 in foo > +ovn-nbctl lsp-add foo foo1 \ > +-- lsp-set-addresses foo1 "f0:00:00:01:02:03 192.168.1.2" > + > +# Create logical port outside1 in outside, which is a nexthop address > +# for 172.16.1.0/24 > +ovn-nbctl lsp-add outside outside1 \ > +-- lsp-set-addresses outside1 "f0:00:00:01:02:04 172.16.1.1" > + > +# Set default gateway (nexthop) to 172.16.1.1 > +ovn-nbctl lr-route-add R1 "0.0.0.0/0" 172.16.1.1 alice > +AT_CHECK([ovn-nbctl lr-nat-add R1 snat 172.16.1.6 192.168.1.1/24]) > +ovn-nbctl set Logical_Switch_Port rp-alice 
options:nat-addresses=router > + > +ovn-nbctl lsp-add foo ln-foo > +ovn-nbctl lsp-set-addresses ln-foo unknown > +ovn-nbctl lsp-set-options ln-foo network_name=public > +ovn-nbctl lsp-set-type ln-foo localnet > +AT_CHECK([ovn-nbctl set Logical_Switch_Port ln-foo tag=2]) > + > +# Create localnet port in alice > +ovn-nbctl lsp-add alice ln-alice > +ovn-nbctl lsp-set-addresses ln-alice unknown > +ovn-nbctl lsp-set-type ln-alice localnet > +ovn-nbctl lsp-set-options ln-alice network_name=phys > + > +# Create localnet port in outside > +ovn-nbctl lsp-add outside ln-outside > +ovn-nbctl lsp-set-addresses ln-outside unknown > +ovn-nbctl lsp-set-type ln-outside localnet > +ovn-nbctl lsp-set-options ln-outside network_name=phys > + > +# Allow some time for ovn-northd and ovn-controller to catch up. > +# XXX This should be more systematic. > +ovn-nbctl --wait=hv --timeout=3 sync > + > +# Check that there is a logical flow in logical switch foo's pipeline > +# to set the outport to rp-foo (which is expected). > +OVS_WAIT_UNTIL([test 1 = `ovn-sbctl dump-flows foo | grep ls_in_l2_lkup | > \ > +grep rp-foo | grep -v is_chassis_resident | wc -l`]) > + > +# Set the option 'reside-on-redirect-chassis' for foo > +ovn-nbctl set logical_router_port foo > options:reside-on-redirect-chassis=true > +# Check that there is a logical flow in logical switch foo's pipeline > +# to set the outport to rp-foo with the condition is_chassis_redirect. 
> +ovn-sbctl dump-flows foo > +OVS_WAIT_UNTIL([test 1 = `ovn-sbctl dump-flows foo | grep ls_in_l2_lkup | > \ > +grep rp-foo | grep is_chassis_resident | wc -l`]) > + > +echo "---------NB dump-----" > +ovn-nbctl show > +echo "---------------------" > +ovn-nbctl list logical_router > +echo "---------------------" > +ovn-nbctl list nat > +echo "---------------------" > +ovn-nbctl list logical_router_port > +echo "---------------------" > + > +echo "---------SB dump-----" > +ovn-sbctl list datapath_binding > +echo "---------------------" > +ovn-sbctl list port_binding > +echo "---------------------" > +ovn-sbctl dump-flows > +echo "---------------------" > +ovn-sbctl list chassis > +echo "---------------------" > + > +for chassis in hv1 hv2 hv3; do > + as $chassis > + echo "------ $chassis dump ----------" > + ovs-vsctl show br-int > + ovs-ofctl show br-int > + ovs-ofctl dump-flows br-int > + echo "--------------------------" > +done > + > +ip_to_hex() { > + printf "%02x%02x%02x%02x" "$@" > +} > + > +foo1_ip=$(ip_to_hex 192 168 1 2) > +gw_ip=$(ip_to_hex 172 16 1 6) > +dst_ip=$(ip_to_hex 8 8 8 8) > +nexthop_ip=$(ip_to_hex 172 16 1 1) > + > +foo1_mac="f00000010203" > +foo_mac="000001010203" > +gw_mac="000002010203" > +nexthop_mac="f00000010204" > + > +# Send ip packet from foo1 to 8.8.8.8 > +src_mac="f00000010203" > +dst_mac="000001010203" > > +packet=${foo_mac}${foo1_mac}08004500001c0000000040110000${foo1_ip}${dst_ip}0035111100080000 > + > +as hv1 ovs-appctl netdev-dummy/receive hv1-vif1 $packet > +sleep 2 > + > +# ARP request packet for nexthop_ip to expect at outside1 > > +arp_request=ffffffffffff${gw_mac}08060001080006040001${gw_mac}${gw_ip}000000000000${nexthop_ip} > +echo $arp_request >> hv3-vif1.expected > +cat hv3-vif1.expected > expout > +$PYTHON "$top_srcdir/utilities/ovs-pcap.in" hv3/vif1-tx.pcap | grep > ${nexthop_ip} | uniq > hv3-vif1 > +AT_CHECK([sort hv3-vif1], [0], [expout]) > + > +# Send ARP reply from outside1 back to the router > 
+reply_mac="f00000010204" > > +arp_reply=${gw_mac}${nexthop_mac}08060001080006040002${nexthop_mac}${nexthop_ip}${gw_mac}${gw_ip} > + > +as hv3 ovs-appctl netdev-dummy/receive hv3-vif1 $arp_reply > +OVS_WAIT_UNTIL([ > + test `as hv2 ovs-ofctl dump-flows br-int | grep table=66 | \ > +grep actions=mod_dl_dst:f0:00:00:01:02:04 | wc -l` -eq 1 > + ]) > + > +# VLAN tagged packet with router port(192.168.1.1) MAC as destination MAC > +# is expected on bridge connecting hv1 and hv2 > > +expected=${foo_mac}${foo1_mac}8100000208004500001c0000000040110000${foo1_ip}${dst_ip}0035111100080000 > +echo $expected > hv1-br-ex_n2.expected > + > +# Packet to Expect at outside1 i.e nexthop(172.16.1.1) port. > +# As connection tracking not enabled for this test, snat can't be done on > the packet. > +# We still see foo1 as the source ip address. But source mac(gateway MAC) > and > +# dest mac(nexthop mac) are properly configured. > > +expected=${nexthop_mac}${gw_mac}08004500001c000000003f110100${foo1_ip}${dst_ip}0035111100080000 > +echo $expected > hv3-vif1.expected > + > +reset_pcap_file() { > + local iface=$1 > + local pcap_file=$2 > + ovs-vsctl -- set Interface $iface options:tx_pcap=dummy-tx.pcap \ > +options:rxq_pcap=dummy-rx.pcap > + rm -f ${pcap_file}*.pcap > + ovs-vsctl -- set Interface $iface > options:tx_pcap=${pcap_file}-tx.pcap \ > +options:rxq_pcap=${pcap_file}-rx.pcap > +} > + > +as hv1 reset_pcap_file br-ex_n2 hv1/br-ex_n2 > +as hv3 reset_pcap_file hv3-vif1 hv3/vif1 > +sleep 2 > +as hv1 ovs-appctl netdev-dummy/receive hv1-vif1 $packet > +sleep 2 > + > +# On hv1, the packet should not go from vlan switch pipleline to router > +# pipleine > +as hv1 ovs-ofctl dump-flows br-int > + > +AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=65 | grep > "priority=100,reg15=0x1,metadata=0x2" \ > +| grep actions=clone | grep -v n_packets=0 | wc -l], [0], [[0 > +]]) > + > +# On hv1, table 32 check that no packet goes via the tunnel port > +AT_CHECK([as hv1 ovs-ofctl dump-flows br-int 
table=32 \ > +| grep "NXM_NX_TUN_ID" | grep -v n_packets=0 | wc -l], [0], [[0 > +]]) > + > +ip_packet() { > + grep "1010203f00000010203" > +} > + > +# Check vlan tagged packet on the bridge connecting hv1 and hv2 with the > +# foo1's mac. > +$PYTHON "$top_srcdir/utilities/ovs-pcap.in" hv1/br-ex_n2-tx.pcap | > ip_packet | uniq > hv1-br-ex_n2 > +cat hv1-br-ex_n2.expected > expout > +AT_CHECK([sort hv1-br-ex_n2], [0], [expout]) > + > +# Check expected packet on nexthop interface > +$PYTHON "$top_srcdir/utilities/ovs-pcap.in" hv3/vif1-tx.pcap | grep > ${foo1_ip}${dst_ip} | uniq > hv3-vif1 > +cat hv3-vif1.expected > expout > +AT_CHECK([sort hv3-vif1], [0], [expout]) > + > +OVN_CLEANUP([hv1],[hv2],[hv3]) > +AT_CLEANUP > + > AT_SETUP([ovn -- IPv6 ND Router Solicitation responder]) > AT_KEYWORDS([ovn-nd_ra]) > AT_SKIP_IF([test $HAVE_PYTHON = no]) > -- > 2.19.1 > > _______________________________________________ > dev mailing list > dev@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev >
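[The hex-packet construction used by the test above can be exercised on its own. Below is a minimal standalone sketch: the ip_to_hex helper, MACs, and IPs are taken from the test itself, reproduced outside the autotest harness for illustration.]

```shell
# Same helper as in the test: four decimal octets -> 8 hex digits.
ip_to_hex() {
    printf "%02x%02x%02x%02x" "$@"
}

foo1_ip=$(ip_to_hex 192 168 1 2)
dst_ip=$(ip_to_hex 8 8 8 8)

foo1_mac="f00000010203"   # VM foo1
foo_mac="000001010203"    # router port mac for foo (the packet's eth.dst)

# VLAN-tagged variant of the IP packet: "81000002" (802.1Q TPID 0x8100 +
# VLAN tag 2, matching ln-foo's tag) is inserted between the MACs and the
# "0800" IPv4 ethertype, as in the hv1/br-ex_n2 expected packet.
expected=${foo_mac}${foo1_mac}8100000208004500001c0000000040110000${foo1_ip}${dst_ip}0035111100080000
echo "$expected"
```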
On Tue, Nov 27, 2018 at 12:48 AM Guru Shetty <guru@ovn.org> wrote: > > > On Mon, 19 Nov 2018 at 08:18, <nusiddiq@redhat.com> wrote: > >> From: Numan Siddique <nusiddiq@redhat.com> >> >> An OVN deployment can have multiple logical switches each with a >> localnet port connected to a distributed logical router in which one >> logical switch may provide external connectivity and the rest of >> the localnet logical switches use VLAN tagging in the physical >> network. >> >> As reported in [1], external traffic from these localnet VLAN tagged >> logical switches are tunnelled to the gateway chassis (chassis hosting >> a distributed gateway port which applies NAT rules). As part of the >> discussion in [1], there are few possible solutions proposed by >> Russell [2]. This patch implements the first option in [2]. >> >> With this patch, a new option 'reside-on-redirect-chassis' in 'options' >> column of Logical_Router_Port table is added. If the value of this >> option is set to 'true' and if the logical router also have a >> distributed gateway port, then routing for this logical router port >> is centralized in the chassis hosting the distributed gateway port. >> >> If a logical switch 'sw0' is connected to a router 'lr0' with the >> router port - 'lr0-sw0' with the address - "00:00:00:00:af:12 192.168.1.1" >> , and it has a distributed logical port - 'lr0-public', then the >> below logical flow is added in the logical switch pipeline >> of 'sw0' if the 'reside-on-redirect-chassis' option is set on 'lr-sw0' - >> >> table=16(ls_in_l2_lkup), priority=50, >> match=(eth.dst == 00:00:00:00:af:12 && >> is_chassis_resident("cr-lr0-public")), >> action=(outport = "sw0-lr0"; output;) >> >> "cr-lr0-public" is an internal port binding created by ovn-northd of type >> 'chassisredirect' for lr0-public in SB DB. Please see "man ovn-sb" for >> more details. >> >> With the above flow, the packet doesn't enter the router pipeline in >> the source chassis. 
Instead the packet is sent out via the localnet >> port of 'sw0'. The gateway chassis, upon receiving this packet, runs >> the logical router pipeline applying NAT rules and sends the traffic >> out via the localnet port of the logical switch providing external >> connectivity. >> The gateway chassis will also reply to the ARP requests for the router >> port IPs. >> >> With this approach, we avoid redirecting the external traffic to the >> gateway chassis via the tunnel port. There are a couple of drawbacks >> with this approach: >> >> - East-West routing is no longer distributed for the VLAN tagged >> localnet logical switches if the 'reside-on-redirect-chassis' option is >> defined >> >> - 'dnat_and_snat' NAT rules with 'logical_mac' and 'logical_port' >> columns defined will not work for these logical switches. >> >> This approach is taken for now as it is simple. If there is a requirement >> to support distributed routing for these VLAN tenant networks, we >> can explore other possible solutions. >> >> [1] - >> https://mail.openvswitch.org/pipermail/ovs-discuss/2018-April/046543.html >> [2] - >> https://mail.openvswitch.org/pipermail/ovs-discuss/2018-April/046557.html >> >> Reported-at: >> https://mail.openvswitch.org/pipermail/ovs-discuss/2018-April/046543.html >> Reported-by: venkata anil <vkommadi@redhat.com> >> Acked-by: Gurucharan Shetty <guru@ovn.org> >> Co-authored-by: venkata anil <vkommadi@redhat.com> >> Signed-off-by: Numan Siddique <nusiddiq@redhat.com> >> Signed-off-by: venkata anil <vkommadi@redhat.com> >> > > Since no one else seems to have any further comments, I applied this to > master. > > Thanks Guru for the review and applying the patch. Numan > --- >> >> v1 -> v2 >> -------- >> * Addressed the review comments from Guru. >> * Removed the patch 2 'ovn: Support a new Logical_Switch_Port.type - >> 'external' from this series as it is an independent patch.
>> >> ovn/northd/ovn-northd.8.xml | 30 ++++ >> ovn/northd/ovn-northd.c | 71 +++++++--- >> ovn/ovn-architecture.7.xml | 211 ++++++++++++++++++++++++++++ >> ovn/ovn-nb.xml | 43 ++++++ >> tests/ovn.at | 273 ++++++++++++++++++++++++++++++++++++ >> 5 files changed, 612 insertions(+), 16 deletions(-) >> >> diff --git a/ovn/northd/ovn-northd.8.xml b/ovn/northd/ovn-northd.8.xml >> index 7352c6764..f52699bd3 100644 >> --- a/ovn/northd/ovn-northd.8.xml >> +++ b/ovn/northd/ovn-northd.8.xml >> @@ -874,6 +874,25 @@ output; >> resident. >> </li> >> </ul> >> + >> + <p> >> + For the Ethernet address on a logical switch port of type >> + <code>router</code>, when that logical switch port's >> + <ref column="addresses" table="Logical_Switch_Port" >> + db="OVN_Northbound"/> column is set to <code>router</code> and >> + the connected logical router port specifies a >> + <code>reside-on-redirect-chassis</code> and the logical router >> + to which the connected logical router port belongs to has a >> + <code>redirect-chassis</code> distributed gateway logical >> router >> + port: >> + </p> >> + >> + <ul> >> + <li> >> + The flow for the connected logical router port's Ethernet >> + address is only programmed on the >> <code>redirect-chassis</code>. >> + </li> >> + </ul> >> </li> >> >> <li> >> @@ -1179,6 +1198,17 @@ output; >> upstream MAC learning to point to the >> <code>redirect-chassis</code>. >> </p> >> + >> + <p> >> + For the logical router port with the option >> + <code>reside-on-redirect-chassis</code> set (which is >> centralized), >> + the above flows are only programmed on the gateway port >> instance on >> + the <code>redirect-chassis</code> (if the logical router has a >> + distributed gateway port). This behavior avoids generation >> + of multiple ARP responses from different chassis, and allows >> + upstream MAC learning to point to the >> + <code>redirect-chassis</code>. 
>> + </p> >> </li> >> >> <li> >> diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c >> index 58bef7de5..2de9fb38d 100644 >> --- a/ovn/northd/ovn-northd.c >> +++ b/ovn/northd/ovn-northd.c >> @@ -4461,13 +4461,32 @@ build_lswitch_flows(struct hmap *datapaths, >> struct hmap *ports, >> ds_put_format(&match, "eth.dst == "ETH_ADDR_FMT, >> ETH_ADDR_ARGS(mac)); >> if (op->peer->od->l3dgw_port >> - && op->peer == op->peer->od->l3dgw_port >> - && op->peer->od->l3redirect_port) { >> - /* The destination lookup flow for the router's >> - * distributed gateway port MAC address should only >> be >> - * programmed on the "redirect-chassis". */ >> - ds_put_format(&match, " && is_chassis_resident(%s)", >> - >> op->peer->od->l3redirect_port->json_key); >> + && op->peer->od->l3redirect_port >> + && op->od->localnet_port) { >> + bool add_chassis_resident_check = false; >> + if (op->peer == op->peer->od->l3dgw_port) { >> + /* The peer of this port represents a distributed >> + * gateway port. The destination lookup flow for >> the >> + * router's distributed gateway port MAC address >> should >> + * only be programmed on the "redirect-chassis". >> */ >> + add_chassis_resident_check = true; >> + } else { >> + /* Check if the option >> 'reside-on-redirect-chassis' >> + * is set to true on the peer port. If set to >> true >> + * and if the logical switch has a localnet >> port, it >> + * means the router pipeline for the packets from >> + * this logical switch should be run on the >> chassis >> + * hosting the gateway port. 
>> + */ >> + add_chassis_resident_check = smap_get_bool( >> + &op->peer->nbrp->options, >> + "reside-on-redirect-chassis", false); >> + } >> + >> + if (add_chassis_resident_check) { >> + ds_put_format(&match, " && >> is_chassis_resident(%s)", >> + >> op->peer->od->l3redirect_port->json_key); >> + } >> } >> >> ds_clear(&actions); >> @@ -5232,15 +5251,35 @@ build_lrouter_flows(struct hmap *datapaths, >> struct hmap *ports, >> op->lrp_networks.ipv4_addrs[i].network_s, >> op->lrp_networks.ipv4_addrs[i].plen, >> op->lrp_networks.ipv4_addrs[i].addr_s); >> - if (op->od->l3dgw_port && op == op->od->l3dgw_port >> - && op->od->l3redirect_port) { >> - /* Traffic with eth.src = l3dgw_port->lrp_networks.ea_s >> - * should only be sent from the "redirect-chassis", so >> that >> - * upstream MAC learning points to the >> "redirect-chassis". >> - * Also need to avoid generation of multiple ARP >> responses >> - * from different chassis. */ >> - ds_put_format(&match, " && is_chassis_resident(%s)", >> - op->od->l3redirect_port->json_key); >> + >> + if (op->od->l3dgw_port && op->od->l3redirect_port && op->peer >> + && op->peer->od->localnet_port) { >> + bool add_chassis_resident_check = false; >> + if (op == op->od->l3dgw_port) { >> + /* Traffic with eth.src = >> l3dgw_port->lrp_networks.ea_s >> + * should only be sent from the "redirect-chassis", >> so that >> + * upstream MAC learning points to the >> "redirect-chassis". >> + * Also need to avoid generation of multiple ARP >> responses >> + * from different chassis. */ >> + add_chassis_resident_check = true; >> + } else { >> + /* Check if the option 'reside-on-redirect-chassis' >> + * is set to true on the router port. If set to true >> + * and if peer's logical switch has a localnet port, >> it >> + * means the router pipeline for the packets from >> + * peer's logical switch is be run on the chassis >> + * hosting the gateway port and it should reply to >> the >> + * ARP requests for the router port IPs. 
>> + */ >> + add_chassis_resident_check = smap_get_bool( >> + &op->nbrp->options, >> + "reside-on-redirect-chassis", false); >> + } >> + >> + if (add_chassis_resident_check) { >> + ds_put_format(&match, " && is_chassis_resident(%s)", >> + op->od->l3redirect_port->json_key); >> + } >> } >> >> ds_clear(&actions); >> diff --git a/ovn/ovn-architecture.7.xml b/ovn/ovn-architecture.7.xml >> index 64e7d89e6..3936e6016 100644 >> --- a/ovn/ovn-architecture.7.xml >> +++ b/ovn/ovn-architecture.7.xml >> @@ -1372,6 +1372,217 @@ >> http://docs.openvswitch.org/en/latest/topics/high-availability. >> </p> >> >> + <h2>Multiple localnet logical switches connected to a Logical >> Router</h2> >> + >> + <p> >> + It is possible to have multiple logical switches each with a >> localnet port >> + (representing physical networks) connected to a logical router, in >> which >> + one localnet logical switch may provide the external connectivity >> via a >> + distributed gateway port and rest of the localnet logical switches >> use >> + VLAN tagging in the physical network. It is expected that >> + <code>ovn-bridge-mappings</code> is configured appropriately on the >> + chassis for all these localnet networks. >> + </p> >> + >> + <h3>East West routing</h3> >> + <p> >> + East-West routing between these localnet VLAN tagged logical switches >> + work almost the same way as normal logical switches. When the VM >> sends >> + such a packet, then: >> + </p> >> + <ol> >> + <li> >> + It first enters the ingress pipeline, and then egress pipeline of >> the >> + source localnet logical switch datapath. It then enters the ingress >> + pipeline of the logical router datapath via the logical router >> port in >> + the source chassis. >> + </li> >> + >> + <li> >> + Routing decision is taken. 
>> + </li> >> + >> + <li> >> + From the router datapath, packet enters the ingress pipeline and >> then >> + egress pipeline of the destination localnet logical switch datapath >> + and goes out of the integration bridge to the provider bridge ( >> + belonging to the destination logical switch) via the localnet port. >> + </li> >> + >> + <li> >> + The destination chassis receives the packet via the localnet port >> and >> + sends it to the integration bridge. The packet enters the >> + ingress pipeline and then egress pipeline of the destination >> localnet >> + logical switch and finally gets delivered to the destination VM >> port. >> + </li> >> + </ol> >> + >> + <h3>External traffic</h3> >> + >> + <p> >> + The following happens when a VM sends an external traffic (which >> requires >> + NATting) and the chassis hosting the VM doesn't have a distributed >> gateway >> + port. >> + </p> >> + >> + <ol> >> + <li> >> + The packet first enters the ingress pipeline, and then egress >> pipeline of >> + the source localnet logical switch datapath. It then enters the >> ingress >> + pipeline of the logical router datapath via the logical router >> port in >> + the source chassis. >> + </li> >> + >> + <li> >> + Routing decision is taken. Since the gateway router or the >> distributed >> + gateway port doesn't reside in the source chassis, the traffic is >> + redirected to the gateway chassis via the tunnel port. >> + </li> >> + >> + <li> >> + The gateway chassis receives the packet via the tunnel port and the >> + packet enters the egress pipeline of the logical router datapath. >> NAT >> + rules are applied here. The packet then enters the ingress >> pipeline and >> + then egress pipeline of the localnet logical switch datapath which >> + provides external connectivity and finally goes out via the >> localnet >> + port of the logical switch which provides external connectivity. 
>> + </li> >> + </ol> >> + >> + <p> >> + Although this works, the VM traffic is tunnelled when sent from the >> compute >> + chassis to the gateway chassis. In order for it to work properly, >> the MTU >> + of the localnet logical switches must be lowered to account for the >> tunnel >> + encapsulation. >> + </p> >> + >> + <h2> >> + Centralized routing for localnet VLAN tagged logical switches >> connected >> + to a Logical Router >> + </h2> >> + >> + <p> >> + To overcome the tunnel encapsulation problem described in the >> previous >> + section, <code>OVN</code> supports the option of enabling centralized >> + routing for localnet VLAN tagged logical switches. CMS can configure >> the >> + option <ref column="options:reside-on-redirect-chassis" >> + table="Logical_Router_Port" db="OVN_NB"/> to <code>true</code> for >> each >> + <ref table="Logical_Router_Port" db="OVN_NB"/> which connects to the >> + localnet VLAN tagged logical switches. This causes the gateway >> + chassis (hosting the distributed gateway port) to handle all the >> + routing for these networks, making it centralized. It will reply to >> + the ARP requests for the logical router port IPs. >> + </p> >> + >> + <p> >> + If the logical router doesn't have a distributed gateway port >> connecting >> + to the localnet logical switch which provides external connectivity, >> + then this option is ignored by <code>OVN</code>. >> + </p> >> + >> + <p> >> + The following happens when a VM sends an east-west traffic which >> needs to >> + be routed: >> + </p> >> + >> + <ol> >> + <li> >> + The packet first enters the ingress pipeline, and then egress >> pipeline of >> + the source localnet logical switch datapath and is sent out via the >> + localnet port of the source localnet logical switch (instead of >> sending >> + it to router pipeline). 
>> + </li> >> + >> + <li> >> + The gateway chassis receives the packet via the localnet port of >> the >> + source localnet logical switch and sends it to the integration >> bridge. >> + The packet then enters the ingress pipeline, and then egress >> pipeline of >> + the source localnet logical switch datapath and enters the ingress >> + pipeline of the logical router datapath. >> + </li> >> + >> + <li> >> + Routing decision is taken. >> + </li> >> + >> + <li> >> + From the router datapath, packet enters the ingress pipeline and >> then >> + egress pipeline of the destination localnet logical switch >> datapath. >> + It then goes out of the integration bridge to the provider bridge ( >> + belonging to the destination logical switch) via the localnet port. >> + </li> >> + >> + <li> >> + The destination chassis receives the packet via the localnet port >> and >> + sends it to the integration bridge. The packet enters the >> + ingress pipeline and then egress pipeline of the destination >> localnet >> + logical switch and finally delivered to the destination VM port. >> + </li> >> + </ol> >> + >> + <p> >> + The following happens when a VM sends an external traffic which >> requires >> + NATting: >> + </p> >> + >> + <ol> >> + <li> >> + The packet first enters the ingress pipeline, and then egress >> pipeline of >> + the source localnet logical switch datapath and is sent out via the >> + localnet port of the source localnet logical switch (instead of >> sending >> + it to router pipeline). >> + </li> >> + >> + <li> >> + The gateway chassis receives the packet via the localnet port of >> the >> + source localnet logical switch and sends it to the integration >> bridge. >> + The packet then enters the ingress pipeline, and then egress >> pipeline of >> + the source localnet logical switch datapath and enters the ingress >> + pipeline of the logical router datapath. >> + </li> >> + >> + <li> >> + Routing decision is taken and NAT rules are applied. 
>> + </li> >> + >> + <li> >> + From the router datapath, packet enters the ingress pipeline and >> then >> + egress pipeline of the localnet logical switch datapath which >> provides >> + external connectivity. It then goes out of the integration bridge >> to the >> + provider bridge (belonging to the logical switch which provides >> external >> + connectivity) via the localnet port. >> + </li> >> + </ol> >> + >> + <p> >> + The following happens for the reverse external traffic. >> + </p> >> + >> + <ol> >> + <li> >> + The gateway chassis receives the packet from the localnet port of >> + the logical switch which provides external connectivity. The >> packet then >> + enters the ingress pipeline and then egress pipeline of the >> localnet >> + logical switch (which provides external connectivity). The packet >> then >> + enters the ingress pipeline of the logical router datapath. >> + </li> >> + >> + <li> >> + The ingress pipeline of the logical router datapath applies the >> unNATting >> + rules. The packet then enters the ingress pipeline and then egress >> + pipeline of the source localnet logical switch. Since the source VM >> + doesn't reside in the gateway chassis, the packet is sent out via >> the >> + localnet port of the source logical switch. >> + </li> >> + >> + <li> >> + The source chassis receives the packet via the localnet port and >> + sends it to the integration bridge. The packet enters the >> + ingress pipeline and then egress pipeline of the source localnet >> + logical switch and finally gets delivered to the source VM port. >> + </li> >> + </ol> >> + >> <h2>Life Cycle of a VTEP gateway</h2> >> >> <p> >> diff --git a/ovn/ovn-nb.xml b/ovn/ovn-nb.xml >> index 474b4f9a7..4141751f8 100644 >> --- a/ovn/ovn-nb.xml >> +++ b/ovn/ovn-nb.xml >> @@ -1681,6 +1681,49 @@ >> chassis to enable high availability. 
>> </p> >> </column> >> + >> + <column name="options" key="reside-on-redirect-chassis"> >> + <p> >> + Generally routing is distributed in <code>OVN</code>. The >> packet >> + from a logical port which needs to be routed hits the router >> pipeline >> + in the source chassis. For the East-West traffic, the packet is >> + sent directly to the destination chassis. For the outside >> traffic >> + the packet is sent to the gateway chassis. >> + </p> >> + >> + <p> >> + When this option is set, <code>OVN</code> considers this only >> if >> + </p> >> + >> + <ul> >> + <li> >> + The logical router to which this logical router port belongs >> to >> + has a distributed gateway port. >> + </li> >> + >> + <li> >> + The peer's logical switch has a localnet port (representing >> + a VLAN tagged network) >> + </li> >> + </ul> >> + >> + <p> >> + When this option is set to <code>true</code>, then the packet >> + which needs to be routed hits the router pipeline in the >> chassis >> + hosting the distributed gateway router port. The source chassis >> + pushes out this traffic via the localnet port. With this the >> + East-West traffic is no more distributed and will always go >> through >> + the gateway chassis. >> + </p> >> + >> + <p> >> + Without this option set, for any traffic destined to outside >> from a >> + logical port which belongs to a logical switch with localnet >> port, >> + the source chassis will send the traffic to the gateway >> chassis via >> + the tunnel port instead of the localnet port and this could >> cause MTU >> + issues. 
>> + </p> >> + </column> >> </group> >> >> <group title="Attachment"> >> diff --git a/tests/ovn.at b/tests/ovn.at >> index ab32faa6b..2db3f675a 100644 >> --- a/tests/ovn.at >> +++ b/tests/ovn.at >> @@ -8567,6 +8567,279 @@ OVN_CLEANUP([hv1],[hv2],[hv3]) >> >> AT_CLEANUP >> >> +# VLAN traffic for external network redirected through distributed router >> +# gateway port should use vlans(i.e input network vlan tag) across >> hypervisors >> +# instead of tunneling. >> +AT_SETUP([ovn -- vlan traffic for external network with distributed >> router gateway port]) >> +AT_SKIP_IF([test $HAVE_PYTHON = no]) >> +ovn_start >> + >> +# Logical network: >> +# # One LR R1 that has switches foo (192.168.1.0/24) and >> +# # alice (172.16.1.0/24) connected to it. The logical port >> +# # between R1 and alice has a "redirect-chassis" specified, >> +# # i.e. it is the distributed router gateway port(172.16.1.6). >> +# # Switch alice also has a localnet port defined. >> +# # An additional switch outside has the same subnet as alice >> +# # (172.16.1.0/24), a localnet port and nexthop port(172.16.1.1) >> +# # which will receive the packet destined for external network >> +# # (i.e 8.8.8.8 as destination ip). >> + >> +# Physical network: >> +# # Three hypervisors hv[123]. >> +# # hv1 hosts vif foo1. >> +# # hv2 is the "redirect-chassis" that hosts the distributed router >> gateway port. >> +# # hv3 hosts nexthop port vif outside1. >> +# # All other tests connect hypervisors to network n1 through br-phys >> for tunneling. >> +# # But in this test, hv1 won't connect to n1(and no br-phys in hv1), and >> +# # in order to show vlans(instead of tunneling) used between hv1 and >> hv2, >> +# # a new network n2 created and hv1 and hv2 connected to this network >> through br-ex. >> +# # hv2 and hv3 are still connected to n1 network through br-phys. >> +net_add n1 >> + >> +# We are not calling ovn_attach for hv1, to avoid adding br-phys. 
>> +# Tunneling won't work in hv1 as ovn-encap-ip is not added to any bridge >> in hv1 >> +sim_add hv1 >> +as hv1 >> +ovs-vsctl \ >> + -- set Open_vSwitch . external-ids:system-id=hv1 \ >> + -- set Open_vSwitch . >> external-ids:ovn-remote=unix:$ovs_base/ovn-sb/ovn-sb.sock \ >> + -- set Open_vSwitch . external-ids:ovn-encap-type=geneve,vxlan \ >> + -- set Open_vSwitch . external-ids:ovn-encap-ip=192.168.0.1 \ >> + -- add-br br-int \ >> + -- set bridge br-int fail-mode=secure >> other-config:disable-in-band=true \ >> + -- set Open_vSwitch . external-ids:ovn-bridge-mappings=public:br-ex >> + >> +start_daemon ovn-controller >> +ovs-vsctl -- add-port br-int hv1-vif1 -- \ >> + set interface hv1-vif1 external-ids:iface-id=foo1 \ >> + ofport-request=1 >> + >> +sim_add hv2 >> +as hv2 >> +ovs-vsctl add-br br-phys >> +ovn_attach n1 br-phys 192.168.0.2 >> +ovs-vsctl set Open_vSwitch . >> external-ids:ovn-bridge-mappings="public:br-ex,phys:br-phys" >> + >> +sim_add hv3 >> +as hv3 >> +ovs-vsctl add-br br-phys >> +ovn_attach n1 br-phys 192.168.0.3 >> +ovs-vsctl -- add-port br-int hv3-vif1 -- \ >> + set interface hv3-vif1 external-ids:iface-id=outside1 \ >> + options:tx_pcap=hv3/vif1-tx.pcap \ >> + options:rxq_pcap=hv3/vif1-rx.pcap \ >> + ofport-request=1 >> +ovs-vsctl set Open_vSwitch . 
>> external-ids:ovn-bridge-mappings="phys:br-phys" >> + >> +# Create network n2 for vlan connectivity between hv1 and hv2 >> +net_add n2 >> + >> +as hv1 >> +ovs-vsctl add-br br-ex >> +net_attach n2 br-ex >> + >> +as hv2 >> +ovs-vsctl add-br br-ex >> +net_attach n2 br-ex >> + >> +OVN_POPULATE_ARP >> + >> +ovn-nbctl create Logical_Router name=R1 >> + >> +ovn-nbctl ls-add foo >> +ovn-nbctl ls-add alice >> +ovn-nbctl ls-add outside >> + >> +# Connect foo to R1 >> +ovn-nbctl lrp-add R1 foo 00:00:01:01:02:03 192.168.1.1/24 >> +ovn-nbctl lsp-add foo rp-foo -- set Logical_Switch_Port rp-foo \ >> + type=router options:router-port=foo \ >> + -- lsp-set-addresses rp-foo router >> + >> +# Connect alice to R1 as distributed router gateway port (172.16.1.6) on >> hv2 >> +ovn-nbctl lrp-add R1 alice 00:00:02:01:02:03 172.16.1.6/24 \ >> + -- set Logical_Router_Port alice options:redirect-chassis="hv2" >> +ovn-nbctl lsp-add alice rp-alice -- set Logical_Switch_Port rp-alice \ >> + type=router options:router-port=alice \ >> + -- lsp-set-addresses rp-alice router \ >> + >> +# Create logical port foo1 in foo >> +ovn-nbctl lsp-add foo foo1 \ >> +-- lsp-set-addresses foo1 "f0:00:00:01:02:03 192.168.1.2" >> + >> +# Create logical port outside1 in outside, which is a nexthop address >> +# for 172.16.1.0/24 >> +ovn-nbctl lsp-add outside outside1 \ >> +-- lsp-set-addresses outside1 "f0:00:00:01:02:04 172.16.1.1" >> + >> +# Set default gateway (nexthop) to 172.16.1.1 >> +ovn-nbctl lr-route-add R1 "0.0.0.0/0" 172.16.1.1 alice >> +AT_CHECK([ovn-nbctl lr-nat-add R1 snat 172.16.1.6 192.168.1.1/24]) >> +ovn-nbctl set Logical_Switch_Port rp-alice options:nat-addresses=router >> + >> +ovn-nbctl lsp-add foo ln-foo >> +ovn-nbctl lsp-set-addresses ln-foo unknown >> +ovn-nbctl lsp-set-options ln-foo network_name=public >> +ovn-nbctl lsp-set-type ln-foo localnet >> +AT_CHECK([ovn-nbctl set Logical_Switch_Port ln-foo tag=2]) >> + >> +# Create localnet port in alice >> +ovn-nbctl lsp-add alice ln-alice >> 
+ovn-nbctl lsp-set-addresses ln-alice unknown >> +ovn-nbctl lsp-set-type ln-alice localnet >> +ovn-nbctl lsp-set-options ln-alice network_name=phys >> + >> +# Create localnet port in outside >> +ovn-nbctl lsp-add outside ln-outside >> +ovn-nbctl lsp-set-addresses ln-outside unknown >> +ovn-nbctl lsp-set-type ln-outside localnet >> +ovn-nbctl lsp-set-options ln-outside network_name=phys >> + >> +# Allow some time for ovn-northd and ovn-controller to catch up. >> +# XXX This should be more systematic. >> +ovn-nbctl --wait=hv --timeout=3 sync >> + >> +# Check that there is a logical flow in logical switch foo's pipeline >> +# to set the outport to rp-foo (which is expected). >> +OVS_WAIT_UNTIL([test 1 = `ovn-sbctl dump-flows foo | grep ls_in_l2_lkup >> | \ >> +grep rp-foo | grep -v is_chassis_resident | wc -l`]) >> + >> +# Set the option 'reside-on-redirect-chassis' for foo >> +ovn-nbctl set logical_router_port foo >> options:reside-on-redirect-chassis=true >> +# Check that there is a logical flow in logical switch foo's pipeline >> +# to set the outport to rp-foo with the condition is_chassis_resident.
>> +ovn-sbctl dump-flows foo >> +OVS_WAIT_UNTIL([test 1 = `ovn-sbctl dump-flows foo | grep ls_in_l2_lkup >> | \ >> +grep rp-foo | grep is_chassis_resident | wc -l`]) >> + >> +echo "---------NB dump-----" >> +ovn-nbctl show >> +echo "---------------------" >> +ovn-nbctl list logical_router >> +echo "---------------------" >> +ovn-nbctl list nat >> +echo "---------------------" >> +ovn-nbctl list logical_router_port >> +echo "---------------------" >> + >> +echo "---------SB dump-----" >> +ovn-sbctl list datapath_binding >> +echo "---------------------" >> +ovn-sbctl list port_binding >> +echo "---------------------" >> +ovn-sbctl dump-flows >> +echo "---------------------" >> +ovn-sbctl list chassis >> +echo "---------------------" >> + >> +for chassis in hv1 hv2 hv3; do >> + as $chassis >> + echo "------ $chassis dump ----------" >> + ovs-vsctl show br-int >> + ovs-ofctl show br-int >> + ovs-ofctl dump-flows br-int >> + echo "--------------------------" >> +done >> + >> +ip_to_hex() { >> + printf "%02x%02x%02x%02x" "$@" >> +} >> + >> +foo1_ip=$(ip_to_hex 192 168 1 2) >> +gw_ip=$(ip_to_hex 172 16 1 6) >> +dst_ip=$(ip_to_hex 8 8 8 8) >> +nexthop_ip=$(ip_to_hex 172 16 1 1) >> + >> +foo1_mac="f00000010203" >> +foo_mac="000001010203" >> +gw_mac="000002010203" >> +nexthop_mac="f00000010204" >> + >> +# Send ip packet from foo1 to 8.8.8.8 >> +src_mac="f00000010203" >> +dst_mac="000001010203" >> >> +packet=${foo_mac}${foo1_mac}08004500001c0000000040110000${foo1_ip}${dst_ip}0035111100080000 >> + >> +as hv1 ovs-appctl netdev-dummy/receive hv1-vif1 $packet >> +sleep 2 >> + >> +# ARP request packet for nexthop_ip to expect at outside1 >> >> +arp_request=ffffffffffff${gw_mac}08060001080006040001${gw_mac}${gw_ip}000000000000${nexthop_ip} >> +echo $arp_request >> hv3-vif1.expected >> +cat hv3-vif1.expected > expout >> +$PYTHON "$top_srcdir/utilities/ovs-pcap.in" hv3/vif1-tx.pcap | grep >> ${nexthop_ip} | uniq > hv3-vif1 >> +AT_CHECK([sort hv3-vif1], [0], [expout]) >> + >> +# Send 
ARP reply from outside1 back to the router >> +reply_mac="f00000010204" >> >> +arp_reply=${gw_mac}${nexthop_mac}08060001080006040002${nexthop_mac}${nexthop_ip}${gw_mac}${gw_ip} >> + >> +as hv3 ovs-appctl netdev-dummy/receive hv3-vif1 $arp_reply >> +OVS_WAIT_UNTIL([ >> + test `as hv2 ovs-ofctl dump-flows br-int | grep table=66 | \ >> +grep actions=mod_dl_dst:f0:00:00:01:02:04 | wc -l` -eq 1 >> + ]) >> + >> +# VLAN tagged packet with router port(192.168.1.1) MAC as destination MAC >> +# is expected on bridge connecting hv1 and hv2 >> >> +expected=${foo_mac}${foo1_mac}8100000208004500001c0000000040110000${foo1_ip}${dst_ip}0035111100080000 >> +echo $expected > hv1-br-ex_n2.expected >> + >> +# Packet to expect at outside1 i.e. nexthop(172.16.1.1) port. >> +# As connection tracking is not enabled for this test, snat can't be done >> on the packet. >> +# We still see foo1 as the source ip address. But source mac(gateway >> MAC) and >> +# dest mac(nexthop mac) are properly configured. >> >> +expected=${nexthop_mac}${gw_mac}08004500001c000000003f110100${foo1_ip}${dst_ip}0035111100080000 >> +echo $expected > hv3-vif1.expected >> + >> +reset_pcap_file() { >> + local iface=$1 >> + local pcap_file=$2 >> + ovs-vsctl -- set Interface $iface options:tx_pcap=dummy-tx.pcap \ >> +options:rxq_pcap=dummy-rx.pcap >> + rm -f ${pcap_file}*.pcap >> + ovs-vsctl -- set Interface $iface >> options:tx_pcap=${pcap_file}-tx.pcap \ >> +options:rxq_pcap=${pcap_file}-rx.pcap >> +} >> + >> +as hv1 reset_pcap_file br-ex_n2 hv1/br-ex_n2 >> +as hv3 reset_pcap_file hv3-vif1 hv3/vif1 >> +sleep 2 >> +as hv1 ovs-appctl netdev-dummy/receive hv1-vif1 $packet >> +sleep 2 >> + >> +# On hv1, the packet should not go from the vlan switch pipeline to the router >> +# pipeline >> +as hv1 ovs-ofctl dump-flows br-int >> + >> +AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=65 | grep >> "priority=100,reg15=0x1,metadata=0x2" \ >> +| grep actions=clone | grep -v n_packets=0 | wc -l], [0], [[0 >> +]]) >> + >> +# On hv1, table
32 check that no packet goes via the tunnel port >> +AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=32 \ >> +| grep "NXM_NX_TUN_ID" | grep -v n_packets=0 | wc -l], [0], [[0 >> +]]) >> + >> +ip_packet() { >> + grep "1010203f00000010203" >> +} >> + >> +# Check vlan tagged packet on the bridge connecting hv1 and hv2 with the >> +# foo1's mac. >> +$PYTHON "$top_srcdir/utilities/ovs-pcap.in" hv1/br-ex_n2-tx.pcap | >> ip_packet | uniq > hv1-br-ex_n2 >> +cat hv1-br-ex_n2.expected > expout >> +AT_CHECK([sort hv1-br-ex_n2], [0], [expout]) >> + >> +# Check expected packet on nexthop interface >> +$PYTHON "$top_srcdir/utilities/ovs-pcap.in" hv3/vif1-tx.pcap | grep >> ${foo1_ip}${dst_ip} | uniq > hv3-vif1 >> +cat hv3-vif1.expected > expout >> +AT_CHECK([sort hv3-vif1], [0], [expout]) >> + >> +OVN_CLEANUP([hv1],[hv2],[hv3]) >> +AT_CLEANUP >> + >> AT_SETUP([ovn -- IPv6 ND Router Solicitation responder]) >> AT_KEYWORDS([ovn-nd_ra]) >> AT_SKIP_IF([test $HAVE_PYTHON = no]) >> -- >> 2.19.1 >> >> _______________________________________________ >> dev mailing list >> dev@openvswitch.org >> https://mail.openvswitch.org/mailman/listinfo/ovs-dev >> >
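The expected-packet strings in the quoted test above are raw hex concatenations. The sketch below shows how the VLAN-tagged frame expected on the hv1-hv2 bridge is assembled: the `ip_to_hex` helper and the MAC/IP values are taken verbatim from the test, and the 802.1Q framing (TPID `8100`, VID 2 from ln-foo's `tag=2`, then EtherType `0800`) is standard, not something this patch invents.

```shell
# ip_to_hex: four dotted-quad octets to 8 hex digits, exactly as the
# test's helper does it.
ip_to_hex() {
    printf "%02x%02x%02x%02x" "$@"
}

foo1_ip=$(ip_to_hex 192 168 1 2)   # VM foo1's IP, 192.168.1.2
dst_ip=$(ip_to_hex 8 8 8 8)        # external destination, 8.8.8.8

foo_mac=000001010203    # router port MAC on switch foo (frame's dst MAC)
foo1_mac=f00000010203   # VM foo1's MAC (frame's src MAC)

# Expected frame on the bridge connecting hv1 and hv2: dst MAC, src MAC,
# 802.1Q TPID 0x8100 with VID 2 (ln-foo's tag), EtherType 0x0800, then
# the unchanged IPv4/UDP payload -- VLAN tagged rather than tunnelled.
vlan_tag=81000002
ip_udp_payload=4500001c0000000040110000${foo1_ip}${dst_ip}0035111100080000
expected=${foo_mac}${foo1_mac}${vlan_tag}0800${ip_udp_payload}
echo "$expected"
```

The resulting string is byte-for-byte the `expected=...8100000208004500...` line the test writes to `hv1-br-ex_n2.expected`, which makes the one difference from the tunnelled case easy to spot: the inserted 4-byte 802.1Q tag in place of a Geneve/VXLAN encapsulation.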
diff --git a/ovn/northd/ovn-northd.8.xml b/ovn/northd/ovn-northd.8.xml index 7352c6764..f52699bd3 100644 --- a/ovn/northd/ovn-northd.8.xml +++ b/ovn/northd/ovn-northd.8.xml @@ -874,6 +874,25 @@ output; resident. </li> </ul> + + <p> + For the Ethernet address on a logical switch port of type + <code>router</code>, when that logical switch port's + <ref column="addresses" table="Logical_Switch_Port" + db="OVN_Northbound"/> column is set to <code>router</code> and + the connected logical router port specifies a + <code>reside-on-redirect-chassis</code> and the logical router + to which the connected logical router port belongs to has a + <code>redirect-chassis</code> distributed gateway logical router + port: + </p> + + <ul> + <li> + The flow for the connected logical router port's Ethernet + address is only programmed on the <code>redirect-chassis</code>. + </li> + </ul> </li> <li> @@ -1179,6 +1198,17 @@ output; upstream MAC learning to point to the <code>redirect-chassis</code>. </p> + + <p> + For the logical router port with the option + <code>reside-on-redirect-chassis</code> set (which is centralized), + the above flows are only programmed on the gateway port instance on + the <code>redirect-chassis</code> (if the logical router has a + distributed gateway port). This behavior avoids generation + of multiple ARP responses from different chassis, and allows + upstream MAC learning to point to the + <code>redirect-chassis</code>. 
+ </p> </li> <li> diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c index 58bef7de5..2de9fb38d 100644 --- a/ovn/northd/ovn-northd.c +++ b/ovn/northd/ovn-northd.c @@ -4461,13 +4461,32 @@ build_lswitch_flows(struct hmap *datapaths, struct hmap *ports, ds_put_format(&match, "eth.dst == "ETH_ADDR_FMT, ETH_ADDR_ARGS(mac)); if (op->peer->od->l3dgw_port - && op->peer == op->peer->od->l3dgw_port - && op->peer->od->l3redirect_port) { - /* The destination lookup flow for the router's - * distributed gateway port MAC address should only be - * programmed on the "redirect-chassis". */ - ds_put_format(&match, " && is_chassis_resident(%s)", - op->peer->od->l3redirect_port->json_key); + && op->peer->od->l3redirect_port + && op->od->localnet_port) { + bool add_chassis_resident_check = false; + if (op->peer == op->peer->od->l3dgw_port) { + /* The peer of this port represents a distributed + * gateway port. The destination lookup flow for the + * router's distributed gateway port MAC address should + * only be programmed on the "redirect-chassis". */ + add_chassis_resident_check = true; + } else { + /* Check if the option 'reside-on-redirect-chassis' + * is set to true on the peer port. If set to true + * and if the logical switch has a localnet port, it + * means the router pipeline for the packets from + * this logical switch should be run on the chassis + * hosting the gateway port. 
+                 */ + add_chassis_resident_check = smap_get_bool( + &op->peer->nbrp->options, + "reside-on-redirect-chassis", false); + } + + if (add_chassis_resident_check) { + ds_put_format(&match, " && is_chassis_resident(%s)", + op->peer->od->l3redirect_port->json_key); + } } ds_clear(&actions); @@ -5232,15 +5251,35 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports, op->lrp_networks.ipv4_addrs[i].network_s, op->lrp_networks.ipv4_addrs[i].plen, op->lrp_networks.ipv4_addrs[i].addr_s); - if (op->od->l3dgw_port && op == op->od->l3dgw_port - && op->od->l3redirect_port) { - /* Traffic with eth.src = l3dgw_port->lrp_networks.ea_s - * should only be sent from the "redirect-chassis", so that - * upstream MAC learning points to the "redirect-chassis". - * Also need to avoid generation of multiple ARP responses - * from different chassis. */ - ds_put_format(&match, " && is_chassis_resident(%s)", - op->od->l3redirect_port->json_key); + + if (op->od->l3dgw_port && op->od->l3redirect_port && op->peer + && op->peer->od->localnet_port) { + bool add_chassis_resident_check = false; + if (op == op->od->l3dgw_port) { + /* Traffic with eth.src = l3dgw_port->lrp_networks.ea_s + * should only be sent from the "redirect-chassis", so that + * upstream MAC learning points to the "redirect-chassis". + * Also need to avoid generation of multiple ARP responses + * from different chassis. */ + add_chassis_resident_check = true; + } else { + /* Check if the option 'reside-on-redirect-chassis' + * is set to true on the router port. If set to true + * and if the peer's logical switch has a localnet port, it + * means the router pipeline for the packets from + * the peer's logical switch is run on the chassis + * hosting the gateway port and it should reply to the + * ARP requests for the router port IPs.
+                 */ + add_chassis_resident_check = smap_get_bool( + &op->nbrp->options, + "reside-on-redirect-chassis", false); + } + + if (add_chassis_resident_check) { + ds_put_format(&match, " && is_chassis_resident(%s)", + op->od->l3redirect_port->json_key); + } } ds_clear(&actions); diff --git a/ovn/ovn-architecture.7.xml b/ovn/ovn-architecture.7.xml index 64e7d89e6..3936e6016 100644 --- a/ovn/ovn-architecture.7.xml +++ b/ovn/ovn-architecture.7.xml @@ -1372,6 +1372,217 @@ http://docs.openvswitch.org/en/latest/topics/high-availability. </p> + <h2>Multiple localnet logical switches connected to a Logical Router</h2> + + <p> + It is possible to have multiple logical switches each with a localnet port + (representing physical networks) connected to a logical router, in which + one localnet logical switch may provide the external connectivity via a + distributed gateway port and the rest of the localnet logical switches use + VLAN tagging in the physical network. It is expected that + <code>ovn-bridge-mappings</code> is configured appropriately on the + chassis for all these localnet networks. + </p> + + <h3>East-West routing</h3> + <p> + East-West routing between these localnet VLAN tagged logical switches + works almost the same way as between normal logical switches. When a VM + sends such a packet: + </p> + <ol> + <li> + It first enters the ingress pipeline, and then the egress pipeline of the + source localnet logical switch datapath. It then enters the ingress + pipeline of the logical router datapath via the logical router port in + the source chassis. + </li> + + <li> + A routing decision is made. + </li> + + <li> + From the router datapath, the packet enters the ingress pipeline and then + the egress pipeline of the destination localnet logical switch datapath + and goes out of the integration bridge to the provider bridge + (belonging to the destination logical switch) via the localnet port.
+   </li> + + <li> + The destination chassis receives the packet via the localnet port and + sends it to the integration bridge. The packet enters the + ingress pipeline and then the egress pipeline of the destination localnet + logical switch and finally gets delivered to the destination VM port. + </li> + </ol> + + <h3>External traffic</h3> + + <p> + The following happens when a VM sends external traffic (which requires + NATting) and the chassis hosting the VM doesn't host the distributed + gateway port. + </p> + + <ol> + <li> + The packet first enters the ingress pipeline, and then the egress pipeline + of the source localnet logical switch datapath. It then enters the ingress + pipeline of the logical router datapath via the logical router port in + the source chassis. + </li> + + <li> + A routing decision is made. Since the gateway router or the distributed + gateway port doesn't reside in the source chassis, the traffic is + redirected to the gateway chassis via the tunnel port. + </li> + + <li> + The gateway chassis receives the packet via the tunnel port and the + packet enters the egress pipeline of the logical router datapath. NAT + rules are applied here. The packet then enters the ingress pipeline and + then the egress pipeline of the localnet logical switch datapath which + provides external connectivity and finally goes out via the localnet + port of that logical switch. + </li> + </ol> + + <p> + Although this works, the VM traffic is tunnelled when sent from the compute + chassis to the gateway chassis. In order for it to work properly, the MTU + of the localnet logical switches must be lowered to account for the tunnel + encapsulation.
+   </p> + + <h2> + Centralized routing for localnet VLAN tagged logical switches connected + to a Logical Router + </h2> + + <p> + To overcome the tunnel encapsulation problem described in the previous + section, <code>OVN</code> supports the option of enabling centralized + routing for localnet VLAN tagged logical switches. The CMS can configure + the option <ref column="options:reside-on-redirect-chassis" + table="Logical_Router_Port" db="OVN_NB"/> to <code>true</code> for each + <ref table="Logical_Router_Port" db="OVN_NB"/> which connects to the + localnet VLAN tagged logical switches. This causes the gateway + chassis (hosting the distributed gateway port) to handle all the + routing for these networks, making it centralized. It will reply to + the ARP requests for the logical router port IPs. + </p> + + <p> + If the logical router doesn't have a distributed gateway port connecting + to the localnet logical switch which provides external connectivity, + then this option is ignored by <code>OVN</code>. + </p> + + <p> + The following happens when a VM sends east-west traffic which needs to + be routed: + </p> + + <ol> + <li> + The packet first enters the ingress pipeline, and then the egress pipeline + of the source localnet logical switch datapath and is sent out via the + localnet port of the source localnet logical switch (instead of being + sent to the router pipeline). + </li> + + <li> + The gateway chassis receives the packet via the localnet port of the + source localnet logical switch and sends it to the integration bridge. + The packet then enters the ingress pipeline, and then the egress pipeline + of the source localnet logical switch datapath and enters the ingress + pipeline of the logical router datapath. + </li> + + <li> + A routing decision is made. + </li> + + <li> + From the router datapath, the packet enters the ingress pipeline and then + the egress pipeline of the destination localnet logical switch datapath.
+     It then goes out of the integration bridge to the provider bridge + (belonging to the destination logical switch) via the localnet port. + </li> + + <li> + The destination chassis receives the packet via the localnet port and + sends it to the integration bridge. The packet enters the + ingress pipeline and then the egress pipeline of the destination localnet + logical switch and finally gets delivered to the destination VM port. + </li> + </ol> + + <p> + The following happens when a VM sends external traffic which requires + NATting: + </p> + + <ol> + <li> + The packet first enters the ingress pipeline, and then the egress pipeline + of the source localnet logical switch datapath and is sent out via the + localnet port of the source localnet logical switch (instead of being + sent to the router pipeline). + </li> + + <li> + The gateway chassis receives the packet via the localnet port of the + source localnet logical switch and sends it to the integration bridge. + The packet then enters the ingress pipeline, and then the egress pipeline + of the source localnet logical switch datapath and enters the ingress + pipeline of the logical router datapath. + </li> + + <li> + A routing decision is made and NAT rules are applied. + </li> + + <li> + From the router datapath, the packet enters the ingress pipeline and then + the egress pipeline of the localnet logical switch datapath which provides + external connectivity. It then goes out of the integration bridge to the + provider bridge (belonging to the logical switch which provides external + connectivity) via the localnet port. + </li> + </ol> + + <p> + The following happens for the reverse external traffic. + </p> + + <ol> + <li> + The gateway chassis receives the packet from the localnet port of + the logical switch which provides external connectivity. The packet then + enters the ingress pipeline and then the egress pipeline of the localnet + logical switch (which provides external connectivity).
The packet then + enters the ingress pipeline of the logical router datapath. + </li> + + <li> + The ingress pipeline of the logical router datapath applies the unNATting + rules. The packet then enters the ingress pipeline and then the egress + pipeline of the source localnet logical switch. Since the source VM + doesn't reside in the gateway chassis, the packet is sent out via the + localnet port of the source logical switch. + </li> + + <li> + The source chassis receives the packet via the localnet port and + sends it to the integration bridge. The packet enters the + ingress pipeline and then the egress pipeline of the source localnet + logical switch and finally gets delivered to the source VM port. + </li> + </ol> + <h2>Life Cycle of a VTEP gateway</h2> <p> diff --git a/ovn/ovn-nb.xml b/ovn/ovn-nb.xml index 474b4f9a7..4141751f8 100644 --- a/ovn/ovn-nb.xml +++ b/ovn/ovn-nb.xml @@ -1681,6 +1681,49 @@ chassis to enable high availability. </p> </column> + + <column name="options" key="reside-on-redirect-chassis"> + <p> + Generally routing is distributed in <code>OVN</code>. The packet + from a logical port which needs to be routed hits the router pipeline + in the source chassis. For East-West traffic, the packet is + sent directly to the destination chassis. For the outside traffic, + the packet is sent to the gateway chassis. + </p> + + <p> + When this option is set, <code>OVN</code> considers it only if + </p> + + <ul> + <li> + The logical router to which this logical router port belongs + has a distributed gateway port. + </li> + + <li> + The peer's logical switch has a localnet port (representing + a VLAN tagged network). + </li> + </ul> + + <p> + When this option is set to <code>true</code>, then the packet + which needs to be routed hits the router pipeline in the chassis + hosting the distributed gateway router port. The source chassis + pushes out this traffic via the localnet port.
With this, the + East-West traffic is no longer distributed and will always go through + the gateway chassis. + </p> + + <p> + Without this option set, for any traffic destined to the outside from a + logical port which belongs to a logical switch with a localnet port, + the source chassis will send the traffic to the gateway chassis via + the tunnel port instead of the localnet port and this could cause MTU + issues. + </p> + </column> </group> <group title="Attachment"> diff --git a/tests/ovn.at b/tests/ovn.at index ab32faa6b..2db3f675a 100644 --- a/tests/ovn.at +++ b/tests/ovn.at @@ -8567,6 +8567,279 @@ OVN_CLEANUP([hv1],[hv2],[hv3]) AT_CLEANUP +# VLAN traffic for external network redirected through distributed router +# gateway port should use VLANs (i.e. the input network VLAN tag) across +# hypervisors instead of tunneling. +AT_SETUP([ovn -- vlan traffic for external network with distributed router gateway port]) +AT_SKIP_IF([test $HAVE_PYTHON = no]) +ovn_start + +# Logical network: +# # One LR R1 that has switches foo (192.168.1.0/24) and +# # alice (172.16.1.0/24) connected to it. The logical port +# # between R1 and alice has a "redirect-chassis" specified, +# # i.e. it is the distributed router gateway port (172.16.1.6). +# # Switch alice also has a localnet port defined. +# # An additional switch outside has the same subnet as alice +# # (172.16.1.0/24), a localnet port and a nexthop port (172.16.1.1) +# # which will receive the packet destined for the external network +# # (i.e. 8.8.8.8 as destination IP). + +# Physical network: +# # Three hypervisors hv[123]. +# # hv1 hosts vif foo1. +# # hv2 is the "redirect-chassis" that hosts the distributed router gateway port. +# # hv3 hosts nexthop port vif outside1. +# # All other tests connect hypervisors to network n1 through br-phys for tunneling.
+# # But in this test, hv1 won't connect to n1 (and there is no br-phys in hv1), and +# # in order to show VLANs (instead of tunneling) being used between hv1 and hv2, +# # a new network n2 is created and hv1 and hv2 are connected to this network through br-ex. +# # hv2 and hv3 are still connected to the n1 network through br-phys. +net_add n1 + +# We are not calling ovn_attach for hv1, to avoid adding br-phys. +# Tunneling won't work in hv1 as ovn-encap-ip is not added to any bridge in hv1. +sim_add hv1 +as hv1 +ovs-vsctl \ + -- set Open_vSwitch . external-ids:system-id=hv1 \ + -- set Open_vSwitch . external-ids:ovn-remote=unix:$ovs_base/ovn-sb/ovn-sb.sock \ + -- set Open_vSwitch . external-ids:ovn-encap-type=geneve,vxlan \ + -- set Open_vSwitch . external-ids:ovn-encap-ip=192.168.0.1 \ + -- add-br br-int \ + -- set bridge br-int fail-mode=secure other-config:disable-in-band=true \ + -- set Open_vSwitch . external-ids:ovn-bridge-mappings=public:br-ex + +start_daemon ovn-controller +ovs-vsctl -- add-port br-int hv1-vif1 -- \ + set interface hv1-vif1 external-ids:iface-id=foo1 \ + ofport-request=1 + +sim_add hv2 +as hv2 +ovs-vsctl add-br br-phys +ovn_attach n1 br-phys 192.168.0.2 +ovs-vsctl set Open_vSwitch . external-ids:ovn-bridge-mappings="public:br-ex,phys:br-phys" + +sim_add hv3 +as hv3 +ovs-vsctl add-br br-phys +ovn_attach n1 br-phys 192.168.0.3 +ovs-vsctl -- add-port br-int hv3-vif1 -- \ + set interface hv3-vif1 external-ids:iface-id=outside1 \ + options:tx_pcap=hv3/vif1-tx.pcap \ + options:rxq_pcap=hv3/vif1-rx.pcap \ + ofport-request=1 +ovs-vsctl set Open_vSwitch .
external-ids:ovn-bridge-mappings="phys:br-phys" + +# Create network n2 for vlan connectivity between hv1 and hv2 +net_add n2 + +as hv1 +ovs-vsctl add-br br-ex +net_attach n2 br-ex + +as hv2 +ovs-vsctl add-br br-ex +net_attach n2 br-ex + +OVN_POPULATE_ARP + +ovn-nbctl create Logical_Router name=R1 + +ovn-nbctl ls-add foo +ovn-nbctl ls-add alice +ovn-nbctl ls-add outside + +# Connect foo to R1 +ovn-nbctl lrp-add R1 foo 00:00:01:01:02:03 192.168.1.1/24 +ovn-nbctl lsp-add foo rp-foo -- set Logical_Switch_Port rp-foo \ + type=router options:router-port=foo \ + -- lsp-set-addresses rp-foo router + +# Connect alice to R1 as distributed router gateway port (172.16.1.6) on hv2 +ovn-nbctl lrp-add R1 alice 00:00:02:01:02:03 172.16.1.6/24 \ + -- set Logical_Router_Port alice options:redirect-chassis="hv2" +ovn-nbctl lsp-add alice rp-alice -- set Logical_Switch_Port rp-alice \ + type=router options:router-port=alice \ + -- lsp-set-addresses rp-alice router \ + +# Create logical port foo1 in foo +ovn-nbctl lsp-add foo foo1 \ +-- lsp-set-addresses foo1 "f0:00:00:01:02:03 192.168.1.2" + +# Create logical port outside1 in outside, which is a nexthop address +# for 172.16.1.0/24 +ovn-nbctl lsp-add outside outside1 \ +-- lsp-set-addresses outside1 "f0:00:00:01:02:04 172.16.1.1" + +# Set default gateway (nexthop) to 172.16.1.1 +ovn-nbctl lr-route-add R1 "0.0.0.0/0" 172.16.1.1 alice +AT_CHECK([ovn-nbctl lr-nat-add R1 snat 172.16.1.6 192.168.1.1/24]) +ovn-nbctl set Logical_Switch_Port rp-alice options:nat-addresses=router + +ovn-nbctl lsp-add foo ln-foo +ovn-nbctl lsp-set-addresses ln-foo unknown +ovn-nbctl lsp-set-options ln-foo network_name=public +ovn-nbctl lsp-set-type ln-foo localnet +AT_CHECK([ovn-nbctl set Logical_Switch_Port ln-foo tag=2]) + +# Create localnet port in alice +ovn-nbctl lsp-add alice ln-alice +ovn-nbctl lsp-set-addresses ln-alice unknown +ovn-nbctl lsp-set-type ln-alice localnet +ovn-nbctl lsp-set-options ln-alice network_name=phys + +# Create localnet port in 
outside +ovn-nbctl lsp-add outside ln-outside +ovn-nbctl lsp-set-addresses ln-outside unknown +ovn-nbctl lsp-set-type ln-outside localnet +ovn-nbctl lsp-set-options ln-outside network_name=phys + +# Allow some time for ovn-northd and ovn-controller to catch up. +# XXX This should be more systematic. +ovn-nbctl --wait=hv --timeout=3 sync + +# Check that there is a logical flow in logical switch foo's pipeline +# to set the outport to rp-foo (which is expected). +OVS_WAIT_UNTIL([test 1 = `ovn-sbctl dump-flows foo | grep ls_in_l2_lkup | \ +grep rp-foo | grep -v is_chassis_resident | wc -l`]) + +# Set the option 'reside-on-redirect-chassis' for foo +ovn-nbctl set logical_router_port foo options:reside-on-redirect-chassis=true +# Check that there is a logical flow in logical switch foo's pipeline +# to set the outport to rp-foo with the condition is_chassis_redirect. +ovn-sbctl dump-flows foo +OVS_WAIT_UNTIL([test 1 = `ovn-sbctl dump-flows foo | grep ls_in_l2_lkup | \ +grep rp-foo | grep is_chassis_resident | wc -l`]) + +echo "---------NB dump-----" +ovn-nbctl show +echo "---------------------" +ovn-nbctl list logical_router +echo "---------------------" +ovn-nbctl list nat +echo "---------------------" +ovn-nbctl list logical_router_port +echo "---------------------" + +echo "---------SB dump-----" +ovn-sbctl list datapath_binding +echo "---------------------" +ovn-sbctl list port_binding +echo "---------------------" +ovn-sbctl dump-flows +echo "---------------------" +ovn-sbctl list chassis +echo "---------------------" + +for chassis in hv1 hv2 hv3; do + as $chassis + echo "------ $chassis dump ----------" + ovs-vsctl show br-int + ovs-ofctl show br-int + ovs-ofctl dump-flows br-int + echo "--------------------------" +done + +ip_to_hex() { + printf "%02x%02x%02x%02x" "$@" +} + +foo1_ip=$(ip_to_hex 192 168 1 2) +gw_ip=$(ip_to_hex 172 16 1 6) +dst_ip=$(ip_to_hex 8 8 8 8) +nexthop_ip=$(ip_to_hex 172 16 1 1) + +foo1_mac="f00000010203" +foo_mac="000001010203" 
+gw_mac="000002010203" +nexthop_mac="f00000010204" + +# Send IP packet from foo1 to 8.8.8.8 +src_mac="f00000010203" +dst_mac="000001010203" +packet=${foo_mac}${foo1_mac}08004500001c0000000040110000${foo1_ip}${dst_ip}0035111100080000 + +as hv1 ovs-appctl netdev-dummy/receive hv1-vif1 $packet +sleep 2 + +# ARP request packet for nexthop_ip to expect at outside1 +arp_request=ffffffffffff${gw_mac}08060001080006040001${gw_mac}${gw_ip}000000000000${nexthop_ip} +echo $arp_request >> hv3-vif1.expected +cat hv3-vif1.expected > expout +$PYTHON "$top_srcdir/utilities/ovs-pcap.in" hv3/vif1-tx.pcap | grep ${nexthop_ip} | uniq > hv3-vif1 +AT_CHECK([sort hv3-vif1], [0], [expout]) + +# Send ARP reply from outside1 back to the router +reply_mac="f00000010204" +arp_reply=${gw_mac}${nexthop_mac}08060001080006040002${nexthop_mac}${nexthop_ip}${gw_mac}${gw_ip} + +as hv3 ovs-appctl netdev-dummy/receive hv3-vif1 $arp_reply +OVS_WAIT_UNTIL([ + test `as hv2 ovs-ofctl dump-flows br-int | grep table=66 | \ +grep actions=mod_dl_dst:f0:00:00:01:02:04 | wc -l` -eq 1 + ]) + +# VLAN tagged packet with router port (192.168.1.1) MAC as destination MAC +# is expected on the bridge connecting hv1 and hv2 +expected=${foo_mac}${foo1_mac}8100000208004500001c0000000040110000${foo1_ip}${dst_ip}0035111100080000 +echo $expected > hv1-br-ex_n2.expected + +# Packet to expect at outside1, i.e. the nexthop (172.16.1.1) port. +# As connection tracking is not enabled for this test, snat can't be done on the packet. +# We still see foo1 as the source IP address. But the source MAC (gateway MAC) and +# dest MAC (nexthop MAC) are properly configured.
+expected=${nexthop_mac}${gw_mac}08004500001c000000003f110100${foo1_ip}${dst_ip}0035111100080000 +echo $expected > hv3-vif1.expected + +reset_pcap_file() { + local iface=$1 + local pcap_file=$2 + ovs-vsctl -- set Interface $iface options:tx_pcap=dummy-tx.pcap \ +options:rxq_pcap=dummy-rx.pcap + rm -f ${pcap_file}*.pcap + ovs-vsctl -- set Interface $iface options:tx_pcap=${pcap_file}-tx.pcap \ +options:rxq_pcap=${pcap_file}-rx.pcap +} + +as hv1 reset_pcap_file br-ex_n2 hv1/br-ex_n2 +as hv3 reset_pcap_file hv3-vif1 hv3/vif1 +sleep 2 +as hv1 ovs-appctl netdev-dummy/receive hv1-vif1 $packet +sleep 2 + +# On hv1, the packet should not go from the VLAN switch pipeline to the +# router pipeline +as hv1 ovs-ofctl dump-flows br-int + +AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=65 | grep "priority=100,reg15=0x1,metadata=0x2" \ +| grep actions=clone | grep -v n_packets=0 | wc -l], [0], [[0 +]]) + +# On hv1, check in table 32 that no packet goes via the tunnel port +AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=32 \ +| grep "NXM_NX_TUN_ID" | grep -v n_packets=0 | wc -l], [0], [[0 +]]) + +ip_packet() { + grep "1010203f00000010203" +} + +# Check the VLAN tagged packet on the bridge connecting hv1 and hv2 with +# foo1's MAC. +$PYTHON "$top_srcdir/utilities/ovs-pcap.in" hv1/br-ex_n2-tx.pcap | ip_packet | uniq > hv1-br-ex_n2 +cat hv1-br-ex_n2.expected > expout +AT_CHECK([sort hv1-br-ex_n2], [0], [expout]) + +# Check the expected packet on the nexthop interface +$PYTHON "$top_srcdir/utilities/ovs-pcap.in" hv3/vif1-tx.pcap | grep ${foo1_ip}${dst_ip} | uniq > hv3-vif1 +cat hv3-vif1.expected > expout +AT_CHECK([sort hv3-vif1], [0], [expout]) + +OVN_CLEANUP([hv1],[hv2],[hv3]) +AT_CLEANUP + AT_SETUP([ovn -- IPv6 ND Router Solicitation responder]) AT_KEYWORDS([ovn-nd_ra]) AT_SKIP_IF([test $HAVE_PYTHON = no])
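For reviewers, the condition under which ovn-northd adds the is_chassis_resident() match in the hunks above (and thereby centralizes routing on the gateway chassis) can be summarized with a small Python sketch. This is an illustrative model only, not code from the patch; the function and parameter names are invented, and the string-valued option lookup stands in for smap_get_bool():

```python
def adds_chassis_resident_check(is_distributed_gateway_port,
                                router_has_redirect_port,
                                switch_has_localnet_port,
                                lrp_options):
    """Model of the check in build_lswitch_flows()/build_lrouter_flows():
    the is_chassis_resident() match is added only when the logical router
    has a chassisredirect port, the attached logical switch has a localnet
    port, and either this router port *is* the distributed gateway port or
    it opts in via options:reside-on-redirect-chassis=true."""
    if not (router_has_redirect_port and switch_has_localnet_port):
        return False
    if is_distributed_gateway_port:
        return True
    return lrp_options.get("reside-on-redirect-chassis") == "true"


# The distributed gateway port itself is always centralized.
print(adds_chassis_resident_check(True, True, True, {}))
# A VLAN-backed router port is centralized only when it opts in.
print(adds_chassis_resident_check(False, True, True,
                                  {"reside-on-redirect-chassis": "true"}))
# Without the option, or without a localnet port, routing stays distributed.
print(adds_chassis_resident_check(False, True, True, {}))
print(adds_chassis_resident_check(False, True, False,
                                  {"reside-on-redirect-chassis": "true"}))
```

The last case is why the ovn-nb.xml text above lists both preconditions: setting the option on a router port whose peer switch has no localnet port has no effect.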