Message ID | 20180625195348.8917-2-vkommadi@redhat.com
---|---
State | Superseded |
Series | Use VLANs for VLAN packets redirected to a gateway chassis
I've had a look through and have some notes in-line. I know some of them will be a bit nit-picky, but...sometimes that's just how I am :)

On 06/25/2018 03:53 PM, vkommadi@redhat.com wrote:
> From: venkata anil <vkommadi@redhat.com>
>
> When a VM on a VLAN tenant network sends traffic to an external network,
> it is tunneled from the host chassis to the gateway chassis. In the earlier
> discussion [1], Russell (also in his doc [2]) suggested figuring
> out a way for OVN to do this redirect to the gateway host over a VLAN
> network. This patch implements his suggestion, i.e. it redirects to the
> gateway chassis using the incoming tenant VLAN network. Gateway chassis are
> expected to be configured with the tenant VLAN networks. In this approach,
> new logical and physical flows are introduced for packet processing on both
> the host and gateway chassis.
>
> Packet processing in the host chassis:
> 1) A new OVS flow added in physical table 65 sets the MLF_RCV_FROM_VLAN
>    flag for packets from a VLAN network entering the router pipeline.
> 2) A new flow added in lr_in_ip_routing, for packets output through the
>    distributed gateway port and matching the MLF_RCV_FROM_VLAN flag,
>    sets REGBIT_NAT_REDIRECT, i.e.
>    table=7 (lr_in_ip_routing ), priority=2 , match=(
>    ip4.dst == 0.0.0.0/0 && flags.rcv_from_vlan == 1 &&
>    !is_chassis_resident("cr-alice")), action=(reg9[0] = 1; next;)
>    This flow is set only on chassis not hosting the chassisredirect
>    port, i.e. compute nodes.
>    When REGBIT_NAT_REDIRECT is set,
>    a) lr_in_arp_resolve sets the packet's eth.dst to the distributed
>       gateway port MAC
>    b) lr_in_gw_redirect sets the chassisredirect port as outport
> 3) A new OVS flow added in physical table 32 uses the source VLAN tenant
>    network's tag as the VLAN ID for sending the packet to the gateway
>    chassis. As this VLAN packet's destination MAC is the distributed
>    gateway port MAC, the packet will only reach the gateway chassis.
>    table=32,priority=150,reg10=0x20/0x20,reg14=0x3,reg15=0x6,metadata=0x4
>    actions=mod_vlan_vid:2010,output:25,strip_vlan
>    This flow is set only on chassis not hosting the chassisredirect
>    port, i.e. compute nodes.
>
> Packet processing in the gateway chassis:
> 1) A new OVS flow added in physical table 0, for VLAN traffic coming
>    from the localnet port with the router distributed gateway port MAC as
>    destination MAC address, resubmits to the connected router ingress
>    pipeline (i.e. the router attached to the VLAN tenant network).
>    table=0,priority=150,in_port=67,dl_vlan=2010,dl_dst=00:00:02:01:02:03
>    actions=strip_vlan,load:0x4->OXM_OF_METADATA[],load:0x3->NXM_NX_REG14[],
>    load:0x1->NXM_NX_REG10[5],resubmit(,8)
>    This flow is set only on the chassis hosting the chassisredirect
>    port, i.e. the gateway node.
> 2) A new flow added in lr_in_admission checks MLF_RCV_FROM_VLAN
>    and allows the packet. This flow is set only on the chassis hosting
>    the chassisredirect port, i.e. the gateway node.
>    table=0 (lr_in_admission ), priority=100 , match=(
>    flags.rcv_from_vlan == 1 && inport == "lrp-44383893-613a-4bfe-b483-
>    e7d0dc3055cd" && is_chassis_resident("cr-lrp-a6e3d2ab-313a-4ea3-
>    8ec4-c3c774a11f49")), action=(next;)
>    The packet then passes through the router ingress and egress pipelines
>    and on to the external switch pipeline.
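[Editor's note: as a sanity check on the flow dumps quoted above (not part of the patch), the `reg10=0x20/0x20` matches follow directly from the new flag bit the patch adds to `ovn/lib/logical-fields.h`: `MLF_RCV_FROM_VLAN_BIT` is 5, so the flag value is `1 << 5 == 0x20`. A minimal sketch, with `rcv_from_vlan_set()` as a hypothetical helper:]

```c
#include <stdint.h>

/* Mirrors the patch's logical-fields.h additions: bit 5 of the logical
 * flags register (reg10 in the physical pipeline) marks packets received
 * from a VLAN network. */
enum { MLF_RCV_FROM_VLAN_BIT = 5 };
enum { MLF_RCV_FROM_VLAN = 1 << MLF_RCV_FROM_VLAN_BIT };

/* True if a reg10 value would match the table 32 flow's
 * reg10=0x20/0x20 criterion (flag bit set under mask 0x20). */
static inline int
rcv_from_vlan_set(uint32_t reg10)
{
    return (reg10 & MLF_RCV_FROM_VLAN) == MLF_RCV_FROM_VLAN;
}
```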
> > [1] https://mail.openvswitch.org/pipermail/ovs-discuss/2018-April/046557.html > [2] Point 3 in section 3.3.1 - Future Enhancements > https://docs.google.com/document/d/1JecGIXPH0RAqfGvD0nmtBdEU1zflHACp8WSRnKCFSgg/edit# > > Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2018-April/046543.html > > Signed-off-by: Venkata Anil <vkommadi@redhat.com> > --- > > v5->v6: > * Rebased > > v4->v5: > * No changes in this patch > > v3->v4: > * Previous v3 patch became this patch of v4 > * Updated the newly added flow in physical table 0 on gateway chassis > to check for distributed gateway port MAC and then resubmit to > router ingress pipeline > * Improved the test > * Added more comments > > ovn/controller/bfd.c | 3 +- > ovn/controller/binding.c | 10 +- > ovn/controller/ovn-controller.c | 3 + > ovn/controller/ovn-controller.h | 17 ++- > ovn/controller/physical.c | 121 ++++++++++++++++- > ovn/lib/logical-fields.c | 4 + > ovn/lib/logical-fields.h | 2 + > ovn/northd/ovn-northd.c | 35 +++++ > tests/ovn.at | 278 ++++++++++++++++++++++++++++++++++++++++ > 9 files changed, 465 insertions(+), 8 deletions(-) > > diff --git a/ovn/controller/bfd.c b/ovn/controller/bfd.c > index 051781f..c696741 100644 > --- a/ovn/controller/bfd.c > +++ b/ovn/controller/bfd.c > @@ -139,8 +139,9 @@ bfd_travel_gw_related_chassis( > LIST_FOR_EACH_POP (dp_binding, node, &dp_list) { > dp = dp_binding->dp; > free(dp_binding); > + const struct sbrec_datapath_binding *pdp; > for (size_t i = 0; i < dp->n_peer_dps; i++) { > - const struct sbrec_datapath_binding *pdp = dp->peer_dps[i]; > + pdp = dp->peer_dps[i]->peer_dp; > if (!pdp) { > continue; > } > diff --git a/ovn/controller/binding.c b/ovn/controller/binding.c > index 021ecdd..de4385f 100644 > --- a/ovn/controller/binding.c > +++ b/ovn/controller/binding.c > @@ -145,10 +145,14 @@ add_local_datapath__(struct ovsdb_idl_index *sbrec_datapath_binding_by_key, > const struct sbrec_port_binding *pb; > SBREC_PORT_BINDING_FOR_EACH_EQUAL (pb, 
target, > sbrec_port_binding_by_datapath) { > + if (!strcmp(pb->type, "chassisredirect")) { > + ld->chassisredirect_port = pb; > + } > if (!strcmp(pb->type, "patch")) { > const char *peer_name = smap_get(&pb->options, "peer"); > if (peer_name) { > const struct sbrec_port_binding *peer; > + struct peer_datapath *pdp; > > peer = lport_lookup_by_name(sbrec_port_binding_by_name, > peer_name); > @@ -163,9 +167,13 @@ add_local_datapath__(struct ovsdb_idl_index *sbrec_datapath_binding_by_key, > ld->peer_dps = xrealloc( > ld->peer_dps, > ld->n_peer_dps * sizeof *ld->peer_dps); > - ld->peer_dps[ld->n_peer_dps - 1] = datapath_lookup_by_key( > + pdp = xcalloc(1, sizeof(struct peer_datapath)); Use xmalloc here instead of xcalloc. > + pdp->peer_dp = datapath_lookup_by_key( > sbrec_datapath_binding_by_key, > peer->datapath->tunnel_key); > + pdp->patch = pb; > + pdp->peer = peer; > + ld->peer_dps[ld->n_peer_dps - 1] = pdp; > } > } > } > diff --git a/ovn/controller/ovn-controller.c b/ovn/controller/ovn-controller.c > index 6ee72a9..8e04780 100644 > --- a/ovn/controller/ovn-controller.c > +++ b/ovn/controller/ovn-controller.c > @@ -812,6 +812,9 @@ main(int argc, char *argv[]) > > struct local_datapath *cur_node, *next_node; > HMAP_FOR_EACH_SAFE (cur_node, next_node, hmap_node, &local_datapaths) { > + for (int i = 0; i < cur_node->n_peer_dps; i++) { > + free(cur_node->peer_dps[i]); > + } > free(cur_node->peer_dps); > hmap_remove(&local_datapaths, &cur_node->hmap_node); > free(cur_node); > diff --git a/ovn/controller/ovn-controller.h b/ovn/controller/ovn-controller.h > index 3b15620..6c1afd9 100644 > --- a/ovn/controller/ovn-controller.h > +++ b/ovn/controller/ovn-controller.h > @@ -40,6 +40,17 @@ struct ct_zone_pending_entry { > enum ct_zone_pending_state state; > }; > > +/* Represents a peer datapath connected to a given datapath */ > +struct peer_datapath { > + const struct sbrec_datapath_binding *peer_dp; > + > + /* Patch port connected to local datapath */ > + const struct 
sbrec_port_binding *patch; > + > + /* Peer patch port connected to peer datapath */ > + const struct sbrec_port_binding *peer; > +}; > + > /* A logical datapath that has some relevance to this hypervisor. A logical > * datapath D is relevant to hypervisor H if: > * > @@ -57,10 +68,14 @@ struct local_datapath { > /* The localnet port in this datapath, if any (at most one is allowed). */ > const struct sbrec_port_binding *localnet_port; > > + /* The chassisredirect port in this datapath, if any > + * (at most one is allowed). */ > + const struct sbrec_port_binding *chassisredirect_port; > + > /* True if this datapath contains an l3gateway port located on this > * hypervisor. */ > bool has_local_l3gateway; > - const struct sbrec_datapath_binding **peer_dps; > + struct peer_datapath **peer_dps; > size_t n_peer_dps; > }; > > diff --git a/ovn/controller/physical.c b/ovn/controller/physical.c > index dcf2183..c4cf419 100644 > --- a/ovn/controller/physical.c > +++ b/ovn/controller/physical.c > @@ -304,7 +304,8 @@ consider_port_binding(struct ovsdb_idl_index *sbrec_chassis_by_name, > { > uint32_t dp_key = binding->datapath->tunnel_key; > uint32_t port_key = binding->tunnel_key; > - if (!get_local_datapath(local_datapaths, dp_key)) { > + struct local_datapath *ld = get_local_datapath(local_datapaths, dp_key); > + if (!ld) { > return; > } > > @@ -350,6 +351,16 @@ consider_port_binding(struct ovsdb_idl_index *sbrec_chassis_by_name, > put_load(0, MFF_LOG_REG0 + i, 0, 32, ofpacts_p); > } > put_load(0, MFF_IN_PORT, 0, 16, ofpacts_p); > + > + /* Set MLF_RCV_FROM_VLAN flag for vlan network */ > + if (ld->localnet_port) { > + int vlan_tag = (ld->localnet_port->n_tag ? 
> + *ld->localnet_port->tag : 0); > + if (vlan_tag) { > + put_load(1, MFF_LOG_FLAGS, MLF_RCV_FROM_VLAN_BIT, 1, > + ofpacts_p); > + } Since this block doesn't actually use the vlan_tag for anything, it could be simplified to: if (ld->localnet_port->n_tag) { put_load(...); } I think I'm right here since the "tag" field of the database is constrained to being between 1 and 4095. > + } > put_resubmit(OFTABLE_LOG_INGRESS_PIPELINE, ofpacts_p); > clone = ofpbuf_at_assert(ofpacts_p, clone_ofs, sizeof *clone); > ofpacts_p->header = clone; > @@ -526,9 +537,15 @@ consider_port_binding(struct ovsdb_idl_index *sbrec_chassis_by_name, > put_local_common_flows(dp_key, port_key, nested_container, &zone_ids, > ofpacts_p, flow_table); > > - /* Table 0, Priority 150 and 100. > + /* Table 0, Priority 200, 150 and 100. > * ============================== > * > + * Priority 200 is for vlan traffic with distributed gateway port MAC > + * as destination MAC address. For such traffic, set MLF_RCV_FROM_VLAN > + * flag, MFF_LOG_DATAPATH to the router metadata and MFF_LOG_INPORT to > + * the patch port connecting router and vlan network and resubmit into > + * the logical router ingress pipeline. > + * > * Priority 150 is for tagged traffic. This may be containers in a > * VM or a VLAN on a local network. For such traffic, match on the > * tags and then strip the tag. > @@ -540,6 +557,55 @@ consider_port_binding(struct ovsdb_idl_index *sbrec_chassis_by_name, > * input port, MFF_LOG_DATAPATH to the logical datapath, and > * resubmit into the logical ingress pipeline starting at table > * 16. */ > + > + /* For packet from vlan network with distributed gateway port MAC as > + * destination MAC address, submit it to router ingress pipeline */ > + int vlan_tag = binding->n_tag ? 
*binding->tag : 0; > + if (!strcmp(binding->type, "localnet") && vlan_tag) { > + struct local_datapath *ldp = get_local_datapath( > + local_datapaths, binding->datapath->tunnel_key); If I'm reading this correctly, ldp is the same as ld. So I think you can skip looking up ldp and just ld throughout the rest of this block. > + for (int i = 0; i < ldp->n_peer_dps; i++) { > + struct local_datapath *peer_ldp = get_local_datapath( > + local_datapaths, ldp->peer_dps[i]->peer_dp->tunnel_key); Is it possible for peer_ldp to be NULL? If so, you should check for it. If not, then you should probably add an assertion. > + const struct sbrec_port_binding *crp; > + crp = peer_ldp->chassisredirect_port; > + if (crp && crp->chassis && > + !strcmp(crp->chassis->name, chassis->name)) { > + const char *gwp = smap_get(&crp->options, > + "distributed-port"); Check for gwp to be NULL here. If there is no options:distributed-port set, then we're gonna have a bad time. > + if (strcmp(gwp, ldp->peer_dps[i]->peer->logical_port)) { > + ofpbuf_clear(ofpacts_p); > + match_init_catchall(&match); > + > + match_set_in_port(&match, ofport); > + match_set_dl_vlan(&match, htons(vlan_tag)); > + for (int j = 0; j < crp->n_mac; j++) { > + struct lport_addresses laddrs; > + if (!extract_lsp_addresses(crp->mac[j], &laddrs) > + || !laddrs.n_ipv4_addrs) { > + continue; A couple of notes here: * Is there a reason why IPv4 addresses are required? Is this not intended to work in an IPv6-only scenario? * In the case where extract_lsp_addresses() succeeds, but there are no IPv4 addresses, this will leak laddrs. > + } > + match_set_dl_dst(&match, laddrs.ea); > + destroy_lport_addresses(&laddrs); > + break; > + } If execution reaches here and we never added destination ethernet address to the match, should we continue creating the logical flow? Should we bail? 
> + > + ofpact_put_STRIP_VLAN(ofpacts_p); > + put_load(peer_ldp->datapath->tunnel_key, > + MFF_LOG_DATAPATH, 0, 64, ofpacts_p); > + put_load(ldp->peer_dps[i]->peer->tunnel_key, > + MFF_LOG_INPORT, 0, 32, ofpacts_p); > + put_load(1, MFF_LOG_FLAGS, > + MLF_RCV_FROM_VLAN_BIT, 1, ofpacts_p); > + put_resubmit(OFTABLE_LOG_INGRESS_PIPELINE, ofpacts_p); > + > + ofctrl_add_flow(flow_table, OFTABLE_PHY_TO_LOG, > + 200, 0, &match, ofpacts_p); > + } > + } > + } > + } > + > ofpbuf_clear(ofpacts_p); > match_init_catchall(&match); > match_set_in_port(&match, ofport); > @@ -633,13 +699,58 @@ consider_port_binding(struct ovsdb_idl_index *sbrec_chassis_by_name, > } else { > /* Remote port connected by tunnel */ > > - /* Table 32, priority 100. > - * ======================= > + /* Table 32, priority 150 and 100. > + * ============================== > * > * Handles traffic that needs to be sent to a remote hypervisor. Each > * flow matches an output port that includes a logical port on a remote > - * hypervisor, and tunnels the packet to that hypervisor. > + * hypervisor, and tunnels the packet or send through vlan network to > + * that hypervisor. > */ > + > + /* For each vlan network connected to the router, add that network's > + * vlan tag to the packet and output it through localnet port */ > + struct local_datapath *ldp = get_local_datapath(local_datapaths, > + dp_key); > + for (int i = 0; i < ldp->n_peer_dps; i++) { > + struct ofpact_vlan_vid *vlan_vid; > + ofp_port_t port_ofport = 0; > + struct peer_datapath *pdp = ldp->peer_dps[i]; > + struct local_datapath *peer_ldp = get_local_datapath( > + local_datapaths, pdp->peer_dp->tunnel_key); I have the same question about peer_ldp here that I did about the peer_ldp lookup previously. > + if (peer_ldp->localnet_port && pdp->patch->tunnel_key) { > + int64_t vlan_tag = (peer_ldp->localnet_port->n_tag ? 
> + *peer_ldp->localnet_port->tag : 0); > + if (!vlan_tag) { > + continue; > + } > + port_ofport = u16_to_ofp(simap_get(&localvif_to_ofport, > + peer_ldp->localnet_port->logical_port)); > + if (!port_ofport) { > + continue; > + } > + > + match_init_catchall(&match); > + ofpbuf_clear(ofpacts_p); > + > + match_set_metadata(&match, htonll(dp_key)); > + match_set_reg_masked(&match, MFF_LOG_FLAGS - MFF_REG0, > + MLF_RCV_FROM_VLAN, MLF_RCV_FROM_VLAN); > + match_set_reg(&match, MFF_LOG_INPORT - MFF_REG0, > + pdp->patch->tunnel_key); > + match_set_reg(&match, MFF_LOG_OUTPORT - MFF_REG0, port_key); > + > + vlan_vid = ofpact_put_SET_VLAN_VID(ofpacts_p); > + vlan_vid->vlan_vid = vlan_tag; > + vlan_vid->push_vlan_if_needed = true; > + ofpact_put_OUTPUT(ofpacts_p)->port = port_ofport; > + ofpact_put_STRIP_VLAN(ofpacts_p); > + > + ofctrl_add_flow(flow_table, OFTABLE_REMOTE_OUTPUT, 150, 0, > + &match, ofpacts_p); > + } > + } > + > match_init_catchall(&match); > ofpbuf_clear(ofpacts_p); > > diff --git a/ovn/lib/logical-fields.c b/ovn/lib/logical-fields.c > index a8b5e3c..b9efa02 100644 > --- a/ovn/lib/logical-fields.c > +++ b/ovn/lib/logical-fields.c > @@ -105,6 +105,10 @@ ovn_init_symtab(struct shash *symtab) > MLF_FORCE_SNAT_FOR_LB_BIT); > expr_symtab_add_subfield(symtab, "flags.force_snat_for_lb", NULL, > flags_str); > + snprintf(flags_str, sizeof flags_str, "flags[%d]", > + MLF_RCV_FROM_VLAN_BIT); > + expr_symtab_add_subfield(symtab, "flags.rcv_from_vlan", NULL, > + flags_str); > > /* Connection tracking state. 
*/ > expr_symtab_add_field(symtab, "ct_mark", MFF_CT_MARK, NULL, false); > diff --git a/ovn/lib/logical-fields.h b/ovn/lib/logical-fields.h > index b1dbb03..96250fd 100644 > --- a/ovn/lib/logical-fields.h > +++ b/ovn/lib/logical-fields.h > @@ -50,6 +50,7 @@ enum mff_log_flags_bits { > MLF_FORCE_SNAT_FOR_DNAT_BIT = 2, > MLF_FORCE_SNAT_FOR_LB_BIT = 3, > MLF_LOCAL_ONLY_BIT = 4, > + MLF_RCV_FROM_VLAN_BIT = 5, > }; > > /* MFF_LOG_FLAGS_REG flag assignments */ > @@ -75,6 +76,7 @@ enum mff_log_flags { > * hypervisors should instead only be output to local targets > */ > MLF_LOCAL_ONLY = (1 << MLF_LOCAL_ONLY_BIT), > + MLF_RCV_FROM_VLAN = (1 << MLF_RCV_FROM_VLAN_BIT), > }; > > #endif /* ovn/lib/logical-fields.h */ > diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c > index 72fe4e7..350f015 100644 > --- a/ovn/northd/ovn-northd.c > +++ b/ovn/northd/ovn-northd.c > @@ -4411,6 +4411,28 @@ add_route(struct hmap *lflows, const struct ovn_port *op, > * routing. */ > ovn_lflow_add(lflows, op->od, S_ROUTER_IN_IP_ROUTING, priority, > ds_cstr(&match), ds_cstr(&actions)); > + > + /* When output port is distributed gateway port, check if the router > + * input port is a patch port connected to vlan network. > + * Traffic from VLAN network to external network should be redirected > + * to "redirect-chassis" by setting REGBIT_NAT_REDIRECT flag. > + * Later physical table 32 will output this traffic to gateway > + * chassis using input network vlan tag */ > + if (op == op->od->l3dgw_port) { > + ds_clear(&match); > + ds_clear(&actions); > + > + ds_put_format(&match, "ip%s.%s == %s/%d", is_ipv4 ? 
"4" : "6", > + dir, network_s, plen); > + ds_put_format(&match, " && flags.rcv_from_vlan == 1"); > + ds_put_format(&match, " && !is_chassis_resident(%s)", > + op->od->l3redirect_port->json_key); > + > + ovn_lflow_add(lflows, op->od, S_ROUTER_IN_IP_ROUTING, > + priority + 1, ds_cstr(&match), > + REGBIT_NAT_REDIRECT" = 1; next;"); > + } > + > ds_destroy(&match); > ds_destroy(&actions); > } > @@ -4822,6 +4844,19 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports, > } > ovn_lflow_add(lflows, op->od, S_ROUTER_IN_ADMISSION, 50, > ds_cstr(&match), "next;"); > + > + /* VLAN traffic from localnet port should be allowed for > + * router processing on the "redirect-chassis". */ > + if (op->od->l3dgw_port && op->od->l3redirect_port && op->peer && > + op->peer->od->localnet_port && (op != op->od->l3dgw_port)) { > + ds_clear(&match); > + ds_put_format(&match, "flags.rcv_from_vlan == 1"); > + ds_put_format(&match, " && inport == %s", op->json_key); > + ds_put_format(&match, " && is_chassis_resident(%s)", > + op->od->l3redirect_port->json_key); > + ovn_lflow_add(lflows, op->od, S_ROUTER_IN_ADMISSION, 100, > + ds_cstr(&match), "next;"); > + } > } > > /* Logical router ingress table 1: IP Input. */ > diff --git a/tests/ovn.at b/tests/ovn.at > index 6553d17..5ae767d 100644 > --- a/tests/ovn.at > +++ b/tests/ovn.at > @@ -7713,6 +7713,284 @@ test_ip_packet gw2 gw1 > OVN_CLEANUP([hv1],[gw1],[gw2],[ext1]) > AT_CLEANUP > > +# VLAN traffic for external network redirected through distributed router gateway port > +# should use vlans(i.e input network vlan tag) across hypervisors instead of tunneling. > +AT_SETUP([ovn -- vlan traffic for external network with distributed router gateway port]) > +AT_SKIP_IF([test $HAVE_PYTHON = no]) > +ovn_start > + > +# Logical network: > +# # One LR R1 that has switches foo (192.168.1.0/24) and > +# # alice (172.16.1.0/24) connected to it. The logical port > +# # between R1 and alice has a "redirect-chassis" specified, > +# # i.e. 
it is the distributed router gateway port(172.16.1.6). > +# # Switch alice also has a localnet port defined. > +# # An additional switch outside has the same subnet as alice > +# # (172.16.1.0/24), a localnet port and nexthop port(172.16.1.1) > +# # which will receive the packet destined for external network > +# # (i.e 8.8.8.8 as destination ip). > + > +# Physical network: > +# # Three hypervisors hv[123]. > +# # hv1 hosts vif foo1. > +# # hv2 is the "redirect-chassis" that hosts the distributed router gateway port. > +# # hv3 hosts nexthop port vif outside1. > +# # All other tests connect hypervisors to network n1 through br-phys for tunneling. > +# # But in this test, hv1 won't connect to n1(and no br-phys in hv1), and > +# # in order to show vlans(instead of tunneling) used between hv1 and hv2, > +# # a new network n2 created and hv1 and hv2 connected to this network through br-ex. > +# # hv2 and hv3 are still connected to n1 network through br-phys. > +net_add n1 > + > +# We are not calling ovn_attach for hv1, to avoid adding br-phys. > +# Tunneling won't work in hv1 as ovn-encap-ip is not added to any bridge in hv1 > +sim_add hv1 > +as hv1 > +ovs-vsctl \ > + -- set Open_vSwitch . external-ids:system-id=hv1 \ > + -- set Open_vSwitch . external-ids:ovn-remote=unix:$ovs_base/ovn-sb/ovn-sb.sock \ > + -- set Open_vSwitch . external-ids:ovn-encap-type=geneve,vxlan \ > + -- set Open_vSwitch . external-ids:ovn-encap-ip=192.168.0.1 \ > + -- add-br br-int \ > + -- set bridge br-int fail-mode=secure other-config:disable-in-band=true \ > + -- set Open_vSwitch . external-ids:ovn-bridge-mappings=public:br-ex > + > +start_daemon ovn-controller > +ovs-vsctl -- add-port br-int hv1-vif1 -- \ > + set interface hv1-vif1 external-ids:iface-id=foo1 \ > + ofport-request=1 > + > +sim_add hv2 > +as hv2 > +ovs-vsctl add-br br-phys > +ovn_attach n1 br-phys 192.168.0.2 > +ovs-vsctl set Open_vSwitch . 
external-ids:ovn-bridge-mappings="public:br-ex,phys:br-phys" > + > +sim_add hv3 > +as hv3 > +ovs-vsctl add-br br-phys > +ovn_attach n1 br-phys 192.168.0.3 > +ovs-vsctl -- add-port br-int hv3-vif1 -- \ > + set interface hv3-vif1 external-ids:iface-id=outside1 \ > + options:tx_pcap=hv3/vif1-tx.pcap \ > + options:rxq_pcap=hv3/vif1-rx.pcap \ > + ofport-request=1 > +ovs-vsctl set Open_vSwitch . external-ids:ovn-bridge-mappings="phys:br-phys" > + > +# Create network n2 for vlan connectivity between hv1 and hv2 > +net_add n2 > + > +as hv1 > +ovs-vsctl add-br br-ex > +net_attach n2 br-ex > + > +as hv2 > +ovs-vsctl add-br br-ex > +net_attach n2 br-ex > + > +OVN_POPULATE_ARP > + > +ovn-nbctl create Logical_Router name=R1 > + > +ovn-nbctl ls-add foo > +ovn-nbctl ls-add alice > +ovn-nbctl ls-add outside > + > +# Connect foo to R1 > +ovn-nbctl lrp-add R1 foo 00:00:01:01:02:03 192.168.1.1/24 > +ovn-nbctl lsp-add foo rp-foo -- set Logical_Switch_Port rp-foo \ > + type=router options:router-port=foo \ > + -- lsp-set-addresses rp-foo router > + > +# Connect alice to R1 as distributed router gateway port (172.16.1.6) on hv2 > +ovn-nbctl lrp-add R1 alice 00:00:02:01:02:03 172.16.1.6/24 \ > + -- set Logical_Router_Port alice options:redirect-chassis="hv2" > +ovn-nbctl lsp-add alice rp-alice -- set Logical_Switch_Port rp-alice \ > + type=router options:router-port=alice \ > + -- lsp-set-addresses rp-alice router \ > + > + > +# Create logical port foo1 in foo > +ovn-nbctl lsp-add foo foo1 \ > +-- lsp-set-addresses foo1 "f0:00:00:01:02:03 192.168.1.2" > + > +# Create logical port outside1 in outside, which is a nexthop address > +# for 172.16.1.0/24 > +ovn-nbctl lsp-add outside outside1 \ > +-- lsp-set-addresses outside1 "f0:00:00:01:02:04 172.16.1.1" > + > +# Set default gateway (nexthop) to 172.16.1.1 > +ovn-nbctl lr-route-add R1 "0.0.0.0/0" 172.16.1.1 alice > +AT_CHECK([ovn-nbctl lr-nat-add R1 snat 172.16.1.6 192.168.1.1/24]) > +ovn-nbctl set Logical_Switch_Port rp-alice 
options:nat-addresses=router > + > +ovn-nbctl lsp-add foo ln-foo > +ovn-nbctl lsp-set-addresses ln-foo unknown > +ovn-nbctl lsp-set-options ln-foo network_name=public > +ovn-nbctl lsp-set-type ln-foo localnet > +AT_CHECK([ovn-nbctl set Logical_Switch_Port ln-foo tag=2]) > + > +# Create localnet port in alice > +ovn-nbctl lsp-add alice ln-alice > +ovn-nbctl lsp-set-addresses ln-alice unknown > +ovn-nbctl lsp-set-type ln-alice localnet > +ovn-nbctl lsp-set-options ln-alice network_name=phys > + > +# Create localnet port in outside > +ovn-nbctl lsp-add outside ln-outside > +ovn-nbctl lsp-set-addresses ln-outside unknown > +ovn-nbctl lsp-set-type ln-outside localnet > +ovn-nbctl lsp-set-options ln-outside network_name=phys > + > +# Allow some time for ovn-northd and ovn-controller to catch up. > +# XXX This should be more systematic. > +ovn-nbctl --wait=hv --timeout=3 sync > + > +echo "---------NB dump-----" > +ovn-nbctl show > +echo "---------------------" > +ovn-nbctl list logical_router > +echo "---------------------" > +ovn-nbctl list nat > +echo "---------------------" > +ovn-nbctl list logical_router_port > +echo "---------------------" > + > +echo "---------SB dump-----" > +ovn-sbctl list datapath_binding > +echo "---------------------" > +ovn-sbctl list port_binding > +echo "---------------------" > +ovn-sbctl dump-flows > +echo "---------------------" > +ovn-sbctl list chassis > +echo "---------------------" > + > +for chassis in hv1 hv2 hv3; do > + as $chassis > + echo "------ $chassis dump ----------" > + ovs-vsctl show br-int > + ovs-ofctl show br-int > + ovs-ofctl dump-flows br-int > + echo "--------------------------" > +done > + > +ip_to_hex() { > + printf "%02x%02x%02x%02x" "$@" > +} > + > +foo1_ip=$(ip_to_hex 192 168 1 2) > +gw_ip=$(ip_to_hex 172 16 1 6) > +dst_ip=$(ip_to_hex 8 8 8 8) > +nexthop_ip=$(ip_to_hex 172 16 1 1) > + > +foo1_mac="f00000010203" > +foo_mac="000001010203" > +gw_mac="000002010203" > +nexthop_mac="f00000010204" > + > +# Send ip 
packet from foo1 to 8.8.8.8 > +src_mac="f00000010203" > +dst_mac="000001010203" > +packet=${foo_mac}${foo1_mac}08004500001c0000000040110000${foo1_ip}${dst_ip}0035111100080000 > + > +as hv1 ovs-appctl netdev-dummy/receive hv1-vif1 $packet > +sleep 2 > + > +# ARP request packet for nexthop_ip to expect at outside1 > +arp_request=ffffffffffff${gw_mac}08060001080006040001${gw_mac}${gw_ip}000000000000${nexthop_ip} > +echo $arp_request >> hv3-vif1.expected > +cat hv3-vif1.expected > expout > +$PYTHON "$top_srcdir/utilities/ovs-pcap.in" hv3/vif1-tx.pcap | grep ${nexthop_ip} | uniq > hv3-vif1 > +AT_CHECK([sort hv3-vif1], [0], [expout]) > + > +# Send ARP reply from outside1 back to the router > +reply_mac="f00000010204" > +arp_reply=${gw_mac}${nexthop_mac}08060001080006040002${nexthop_mac}${nexthop_ip}${gw_mac}${gw_ip} > + > +as hv3 ovs-appctl netdev-dummy/receive hv3-vif1 $arp_reply > +OVS_WAIT_UNTIL([ > + test `as hv2 ovs-ofctl dump-flows br-int | grep table=66 | \ > +grep actions=mod_dl_dst:f0:00:00:01:02:04 | wc -l` -eq 1 > + ]) > + > +echo "ovn-sbctl list MAC_Binding" > +ovn-sbctl list MAC_Binding > +echo "============================" > + > +# VLAN tagged packet with distributed gateway port(172.16.1.6) MAC as destination MAC > +# is expected on bridge connecting hv1 and hv2 > +expected=${gw_mac}${foo1_mac}8100000208004500001c0000000040110000${foo1_ip}${dst_ip}0035111100080000 > +echo $expected > hv1-br-ex_n2.expected > + > +# Packet to Expect at outside1 i.e nexthop(172.16.1.1) port. > +# As connection tracking not enabled for this test, snat can't be done on the packet. > +# We still see foo1 as the source ip address. But source mac(gateway MAC) and > +# dest mac(nexthop mac) are properly configured. 
> +expected=${nexthop_mac}${gw_mac}08004500001c000000003f110100${foo1_ip}${dst_ip}0035111100080000 > +echo $expected > hv3-vif1.expected > + > +reset_pcap_file() { > + local iface=$1 > + local pcap_file=$2 > + ovs-vsctl -- set Interface $iface options:tx_pcap=dummy-tx.pcap \ > +options:rxq_pcap=dummy-rx.pcap > + rm -f ${pcap_file}*.pcap > + ovs-vsctl -- set Interface $iface options:tx_pcap=${pcap_file}-tx.pcap \ > +options:rxq_pcap=${pcap_file}-rx.pcap > +} > + > +as hv1 reset_pcap_file br-ex_n2 hv1/br-ex_n2 > +as hv3 reset_pcap_file hv3-vif1 hv3/vif1 > +sleep 2 > +as hv1 ovs-appctl netdev-dummy/receive hv1-vif1 $packet > +sleep 2 > + > +# On hv1, table 65 for packets going from vlan switch pipleline to router pipleine > +# set MLF_RCV_FROM_VLAN flag > +AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=65 | grep "priority=100,reg15=0x1,metadata=0x2" \ > +| grep actions=clone | grep "load:0x1->NXM_NX_REG10" | wc -l], [0], [[1 > +]]) > +# On hv1, because of snat rule in table 15, a higher priority(i.e 2) flow > +# added for packets with MLF_RCV_FROM_VLAN flag with output as distributed > +# gateway port, which sets REGBIT_NAT_REDIRECT flag > +AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=15 | grep "priority=2,ip,reg10=0x20/0x20,metadata=0x1" \ > +| grep "actions=load:0x1->OXM_OF_PKT_REG4" | wc -l], [0], [[1 > +]]) > + > +# On hv1, table 32 flow which tags packet with source network vlan tag and sends it to hv2 > +# through br-ex > +AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=32 | grep "priority=150,reg10=0x20/0x20,reg14=0x1,reg15=0x3,metadata=0x1" \ > +| grep "actions=mod_vlan_vid:2" | grep "n_packets=2," | wc -l], [0], [[1 > +]]) > + > +# On hv2 table 0, vlan tagged packet is sent through router pipeline > +# by setting MLF_RCV_FROM_VLAN flag (REG10) > +AT_CHECK([as hv2 ovs-ofctl dump-flows br-int | grep "table=0," | grep "priority=200" | grep "dl_vlan=2" | \ > +grep "dl_dst=00:00:02:01:02:03" | grep "actions=strip_vlan,load:0x1->OXM_OF_METADATA" | grep 
"load:0x1->NXM_NX_REG14" | \ > +grep "load:0x1->NXM_NX_REG10" | wc -l], [0], [[1 > +]]) > +# on hv2 table 8, allow packets with router metadata and with MLF_RCV_FROM_VLAN flag > +AT_CHECK([as hv2 ovs-ofctl dump-flows br-int table=8 | grep "priority=100,reg10=0x20/0x20,reg14=0x1,metadata=0x1" | wc -l], [0], [[1 > +]]) > + > +ip_packet() { > + grep "2010203f00000010203" > +} > + > +# Check vlan tagged packet on the bridge connecting hv1 and hv2 > +$PYTHON "$top_srcdir/utilities/ovs-pcap.in" hv1/br-ex_n2-tx.pcap | ip_packet | uniq > hv1-br-ex_n2 > +cat hv1-br-ex_n2.expected > expout > +AT_CHECK([sort hv1-br-ex_n2], [0], [expout]) > + > +# Check expected packet on nexthop interface > +$PYTHON "$top_srcdir/utilities/ovs-pcap.in" hv3/vif1-tx.pcap | grep ${foo1_ip}${dst_ip} | uniq > hv3-vif1 > +cat hv3-vif1.expected > expout > +AT_CHECK([sort hv3-vif1], [0], [expout]) > + > +OVN_CLEANUP([hv1],[hv2],[hv3]) > +AT_CLEANUP > + > AT_SETUP([ovn -- 1 LR with distributed router gateway port]) > AT_SKIP_IF([test $HAVE_PYTHON = no]) > ovn_start >
Thanks Mark for the review. I will address the review comments.

On Thu, Jul 19, 2018 at 6:54 PM, Mark Michelson <mmichels@redhat.com> wrote:
> I've had a look through and have some notes in-line. I know some of them
> will be a bit nit-picky, but...sometimes that's just how I am :)
> [...]
struct sbrec_port_binding *pb; >> SBREC_PORT_BINDING_FOR_EACH_EQUAL (pb, target, >> sbrec_port_binding_by_datapath) >> { >> + if (!strcmp(pb->type, "chassisredirect")) { >> + ld->chassisredirect_port = pb; >> + } >> if (!strcmp(pb->type, "patch")) { >> const char *peer_name = smap_get(&pb->options, "peer"); >> if (peer_name) { >> const struct sbrec_port_binding *peer; >> + struct peer_datapath *pdp; >> peer = lport_lookup_by_name(sbrec_por >> t_binding_by_name, >> peer_name); >> @@ -163,9 +167,13 @@ add_local_datapath__(struct ovsdb_idl_index >> *sbrec_datapath_binding_by_key, >> ld->peer_dps = xrealloc( >> ld->peer_dps, >> ld->n_peer_dps * sizeof *ld->peer_dps); >> - ld->peer_dps[ld->n_peer_dps - 1] = >> datapath_lookup_by_key( >> + pdp = xcalloc(1, sizeof(struct peer_datapath)); >> > > Use xmalloc here instead of xcalloc. > > sure > > + pdp->peer_dp = datapath_lookup_by_key( >> sbrec_datapath_binding_by_key, >> peer->datapath->tunnel_key); >> + pdp->patch = pb; >> + pdp->peer = peer; >> + ld->peer_dps[ld->n_peer_dps - 1] = pdp; >> } >> } >> } >> diff --git a/ovn/controller/ovn-controller.c >> b/ovn/controller/ovn-controller.c >> index 6ee72a9..8e04780 100644 >> --- a/ovn/controller/ovn-controller.c >> +++ b/ovn/controller/ovn-controller.c >> @@ -812,6 +812,9 @@ main(int argc, char *argv[]) >> struct local_datapath *cur_node, *next_node; >> HMAP_FOR_EACH_SAFE (cur_node, next_node, hmap_node, >> &local_datapaths) { >> + for (int i = 0; i < cur_node->n_peer_dps; i++) { >> + free(cur_node->peer_dps[i]); >> + } >> free(cur_node->peer_dps); >> hmap_remove(&local_datapaths, &cur_node->hmap_node); >> free(cur_node); >> diff --git a/ovn/controller/ovn-controller.h >> b/ovn/controller/ovn-controller.h >> index 3b15620..6c1afd9 100644 >> --- a/ovn/controller/ovn-controller.h >> +++ b/ovn/controller/ovn-controller.h >> @@ -40,6 +40,17 @@ struct ct_zone_pending_entry { >> enum ct_zone_pending_state state; >> }; >> +/* Represents a peer datapath connected to a given datapath 
*/ >> +struct peer_datapath { >> + const struct sbrec_datapath_binding *peer_dp; >> + >> + /* Patch port connected to local datapath */ >> + const struct sbrec_port_binding *patch; >> + >> + /* Peer patch port connected to peer datapath */ >> + const struct sbrec_port_binding *peer; >> +}; >> + >> /* A logical datapath that has some relevance to this hypervisor. A >> logical >> * datapath D is relevant to hypervisor H if: >> * >> @@ -57,10 +68,14 @@ struct local_datapath { >> /* The localnet port in this datapath, if any (at most one is >> allowed). */ >> const struct sbrec_port_binding *localnet_port; >> + /* The chassisredirect port in this datapath, if any >> + * (at most one is allowed). */ >> + const struct sbrec_port_binding *chassisredirect_port; >> + >> /* True if this datapath contains an l3gateway port located on this >> * hypervisor. */ >> bool has_local_l3gateway; >> - const struct sbrec_datapath_binding **peer_dps; >> + struct peer_datapath **peer_dps; >> size_t n_peer_dps; >> }; >> diff --git a/ovn/controller/physical.c b/ovn/controller/physical.c >> index dcf2183..c4cf419 100644 >> --- a/ovn/controller/physical.c >> +++ b/ovn/controller/physical.c >> @@ -304,7 +304,8 @@ consider_port_binding(struct ovsdb_idl_index >> *sbrec_chassis_by_name, >> { >> uint32_t dp_key = binding->datapath->tunnel_key; >> uint32_t port_key = binding->tunnel_key; >> - if (!get_local_datapath(local_datapaths, dp_key)) { >> + struct local_datapath *ld = get_local_datapath(local_datapaths, >> dp_key); >> + if (!ld) { >> return; >> } >> @@ -350,6 +351,16 @@ consider_port_binding(struct ovsdb_idl_index >> *sbrec_chassis_by_name, >> put_load(0, MFF_LOG_REG0 + i, 0, 32, ofpacts_p); >> } >> put_load(0, MFF_IN_PORT, 0, 16, ofpacts_p); >> + >> + /* Set MLF_RCV_FROM_VLAN flag for vlan network */ >> + if (ld->localnet_port) { >> + int vlan_tag = (ld->localnet_port->n_tag ? 
>> + *ld->localnet_port->tag : 0); >> + if (vlan_tag) { >> + put_load(1, MFF_LOG_FLAGS, MLF_RCV_FROM_VLAN_BIT, 1, >> + ofpacts_p); >> + } >> > > Since this block doesn't actually use the vlan_tag for anything, it could > be simplified to: > > if (ld->localnet_port->n_tag) { > put_load(...); > } > > I think I'm right here since the "tag" field of the database is > constrained to being between 1 and 4095. > > Sure > > + } >> put_resubmit(OFTABLE_LOG_INGRESS_PIPELINE, ofpacts_p); >> clone = ofpbuf_at_assert(ofpacts_p, clone_ofs, sizeof *clone); >> ofpacts_p->header = clone; >> @@ -526,9 +537,15 @@ consider_port_binding(struct ovsdb_idl_index >> *sbrec_chassis_by_name, >> put_local_common_flows(dp_key, port_key, nested_container, >> &zone_ids, >> ofpacts_p, flow_table); >> - /* Table 0, Priority 150 and 100. >> + /* Table 0, Priority 200, 150 and 100. >> * ============================== >> * >> + * Priority 200 is for vlan traffic with distributed gateway >> port MAC >> + * as destination MAC address. For such traffic, set >> MLF_RCV_FROM_VLAN >> + * flag, MFF_LOG_DATAPATH to the router metadata and >> MFF_LOG_INPORT to >> + * the patch port connecting router and vlan network and >> resubmit into >> + * the logical router ingress pipeline. >> + * >> * Priority 150 is for tagged traffic. This may be containers >> in a >> * VM or a VLAN on a local network. For such traffic, match on >> the >> * tags and then strip the tag. >> @@ -540,6 +557,55 @@ consider_port_binding(struct ovsdb_idl_index >> *sbrec_chassis_by_name, >> * input port, MFF_LOG_DATAPATH to the logical datapath, and >> * resubmit into the logical ingress pipeline starting at table >> * 16. */ >> + >> + /* For packet from vlan network with distributed gateway port >> MAC as >> + * destination MAC address, submit it to router ingress pipeline >> */ >> + int vlan_tag = binding->n_tag ? 
*binding->tag : 0; >> + if (!strcmp(binding->type, "localnet") && vlan_tag) { >> + struct local_datapath *ldp = get_local_datapath( >> + local_datapaths, binding->datapath->tunnel_key); >> > > If I'm reading this correctly, ldp is the same as ld. So I think you can > skip looking up ldp and just ld throughout the rest of this block. Sure > > > + for (int i = 0; i < ldp->n_peer_dps; i++) { >> + struct local_datapath *peer_ldp = get_local_datapath( >> + local_datapaths, ldp->peer_dps[i]->peer_dp->tunnel_key); >> > > Is it possible for peer_ldp to be NULL? If so, you should check for it. If > not, then you should probably add an assertion. peer_ldp shouldn't be NULL. How do I add an assertion to check that it is not NULL? > > > + const struct sbrec_port_binding *crp; >> + crp = peer_ldp->chassisredirect_port; >> + if (crp && crp->chassis && >> + !strcmp(crp->chassis->name, chassis->name)) { >> + const char *gwp = smap_get(&crp->options, >> + "distributed-port"); >> > > Check for gwp to be NULL here. If there is no options:distributed-port > set, then we're gonna have a bad time. Sure. But gwp can never be NULL; the chassisredirect port is always created from a distributed gateway port. > > > + if (strcmp(gwp, ldp->peer_dps[i]->peer->logical_port)) { >> + ofpbuf_clear(ofpacts_p); >> + match_init_catchall(&match); >> + >> + match_set_in_port(&match, ofport); >> + match_set_dl_vlan(&match, htons(vlan_tag)); >> + for (int j = 0; j < crp->n_mac; j++) { >> + struct lport_addresses laddrs; >> + if (!extract_lsp_addresses(crp->mac[j], &laddrs) >> + || !laddrs.n_ipv4_addrs) { >> + continue; >> > > A couple of notes here: > > * Is there a reason why IPv4 addresses are required? Is this not intended > to work in an IPv6-only scenario? > Thanks Mark. It's my mistake; I shouldn't have checked for IPv4 addresses (I blindly copied this from pinctrl.c). * In the case where extract_lsp_addresses() succeeds, but there are no IPv4 > addresses, this will leak laddrs.
> > > + } >> + match_set_dl_dst(&match, laddrs.ea); >> + destroy_lport_addresses(&laddrs); >> + break; >> + } >> > > If execution reaches here and we never added destination ethernet address > to the match, should we continue creating the logical flow? Should we bail? Chassisredirect port will have MAC. But I will add a check and bail if no MAC present. > > > + >> + ofpact_put_STRIP_VLAN(ofpacts_p); >> + put_load(peer_ldp->datapath->tunnel_key, >> + MFF_LOG_DATAPATH, 0, 64, ofpacts_p); >> + put_load(ldp->peer_dps[i]->peer->tunnel_key, >> + MFF_LOG_INPORT, 0, 32, ofpacts_p); >> + put_load(1, MFF_LOG_FLAGS, >> + MLF_RCV_FROM_VLAN_BIT, 1, ofpacts_p); >> + put_resubmit(OFTABLE_LOG_INGRESS_PIPELINE, >> ofpacts_p); >> + >> + ofctrl_add_flow(flow_table, OFTABLE_PHY_TO_LOG, >> + 200, 0, &match, ofpacts_p); >> + } >> + } >> + } >> + } >> + >> ofpbuf_clear(ofpacts_p); >> match_init_catchall(&match); >> match_set_in_port(&match, ofport); >> @@ -633,13 +699,58 @@ consider_port_binding(struct ovsdb_idl_index >> *sbrec_chassis_by_name, >> } else { >> /* Remote port connected by tunnel */ >> - /* Table 32, priority 100. >> - * ======================= >> + /* Table 32, priority 150 and 100. >> + * ============================== >> * >> * Handles traffic that needs to be sent to a remote >> hypervisor. Each >> * flow matches an output port that includes a logical port on >> a remote >> - * hypervisor, and tunnels the packet to that hypervisor. >> + * hypervisor, and tunnels the packet or send through vlan >> network to >> + * that hypervisor. 
>> */ >> + >> + /* For each vlan network connected to the router, add that >> network's >> + * vlan tag to the packet and output it through localnet port */ >> + struct local_datapath *ldp = get_local_datapath(local_datapaths, >> + dp_key); >> + for (int i = 0; i < ldp->n_peer_dps; i++) { >> + struct ofpact_vlan_vid *vlan_vid; >> + ofp_port_t port_ofport = 0; >> + struct peer_datapath *pdp = ldp->peer_dps[i]; >> + struct local_datapath *peer_ldp = get_local_datapath( >> + local_datapaths, pdp->peer_dp->tunnel_key); >> > > I have the same question about peer_ldp here that I did about the peer_ldp > lookup previously. > > > + if (peer_ldp->localnet_port && pdp->patch->tunnel_key) { >> + int64_t vlan_tag = (peer_ldp->localnet_port->n_tag ? >> + *peer_ldp->localnet_port->tag : 0); >> + if (!vlan_tag) { >> + continue; >> + } >> + port_ofport = u16_to_ofp(simap_get(&localvif_to_ofport, >> + peer_ldp->localnet_port->logical_port)); >> + if (!port_ofport) { >> + continue; >> + } >> + >> + match_init_catchall(&match); >> + ofpbuf_clear(ofpacts_p); >> + >> + match_set_metadata(&match, htonll(dp_key)); >> + match_set_reg_masked(&match, MFF_LOG_FLAGS - MFF_REG0, >> + MLF_RCV_FROM_VLAN, >> MLF_RCV_FROM_VLAN); >> + match_set_reg(&match, MFF_LOG_INPORT - MFF_REG0, >> + pdp->patch->tunnel_key); >> + match_set_reg(&match, MFF_LOG_OUTPORT - MFF_REG0, >> port_key); >> + >> + vlan_vid = ofpact_put_SET_VLAN_VID(ofpacts_p); >> + vlan_vid->vlan_vid = vlan_tag; >> + vlan_vid->push_vlan_if_needed = true; >> + ofpact_put_OUTPUT(ofpacts_p)->port = port_ofport; >> + ofpact_put_STRIP_VLAN(ofpacts_p); >> + >> + ofctrl_add_flow(flow_table, OFTABLE_REMOTE_OUTPUT, 150, >> 0, >> + &match, ofpacts_p); >> + } >> + } >> + >> match_init_catchall(&match); >> ofpbuf_clear(ofpacts_p); >> diff --git a/ovn/lib/logical-fields.c b/ovn/lib/logical-fields.c >> index a8b5e3c..b9efa02 100644 >> --- a/ovn/lib/logical-fields.c >> +++ b/ovn/lib/logical-fields.c >> @@ -105,6 +105,10 @@ ovn_init_symtab(struct shash 
*symtab) >> MLF_FORCE_SNAT_FOR_LB_BIT); >> expr_symtab_add_subfield(symtab, "flags.force_snat_for_lb", NULL, >> flags_str); >> + snprintf(flags_str, sizeof flags_str, "flags[%d]", >> + MLF_RCV_FROM_VLAN_BIT); >> + expr_symtab_add_subfield(symtab, "flags.rcv_from_vlan", NULL, >> + flags_str); >> /* Connection tracking state. */ >> expr_symtab_add_field(symtab, "ct_mark", MFF_CT_MARK, NULL, false); >> diff --git a/ovn/lib/logical-fields.h b/ovn/lib/logical-fields.h >> index b1dbb03..96250fd 100644 >> --- a/ovn/lib/logical-fields.h >> +++ b/ovn/lib/logical-fields.h >> @@ -50,6 +50,7 @@ enum mff_log_flags_bits { >> MLF_FORCE_SNAT_FOR_DNAT_BIT = 2, >> MLF_FORCE_SNAT_FOR_LB_BIT = 3, >> MLF_LOCAL_ONLY_BIT = 4, >> + MLF_RCV_FROM_VLAN_BIT = 5, >> }; >> /* MFF_LOG_FLAGS_REG flag assignments */ >> @@ -75,6 +76,7 @@ enum mff_log_flags { >> * hypervisors should instead only be output to local targets >> */ >> MLF_LOCAL_ONLY = (1 << MLF_LOCAL_ONLY_BIT), >> + MLF_RCV_FROM_VLAN = (1 << MLF_RCV_FROM_VLAN_BIT), >> }; >> #endif /* ovn/lib/logical-fields.h */ >> diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c >> index 72fe4e7..350f015 100644 >> --- a/ovn/northd/ovn-northd.c >> +++ b/ovn/northd/ovn-northd.c >> @@ -4411,6 +4411,28 @@ add_route(struct hmap *lflows, const struct >> ovn_port *op, >> * routing. */ >> ovn_lflow_add(lflows, op->od, S_ROUTER_IN_IP_ROUTING, priority, >> ds_cstr(&match), ds_cstr(&actions)); >> + >> + /* When output port is distributed gateway port, check if the router >> + * input port is a patch port connected to vlan network. >> + * Traffic from VLAN network to external network should be redirected >> + * to "redirect-chassis" by setting REGBIT_NAT_REDIRECT flag. >> + * Later physical table 32 will output this traffic to gateway >> + * chassis using input network vlan tag */ >> + if (op == op->od->l3dgw_port) { >> + ds_clear(&match); >> + ds_clear(&actions); >> + >> + ds_put_format(&match, "ip%s.%s == %s/%d", is_ipv4 ? 
"4" : "6", >> + dir, network_s, plen); >> + ds_put_format(&match, " && flags.rcv_from_vlan == 1"); >> + ds_put_format(&match, " && !is_chassis_resident(%s)", >> + op->od->l3redirect_port->json_key); >> + >> + ovn_lflow_add(lflows, op->od, S_ROUTER_IN_IP_ROUTING, >> + priority + 1, ds_cstr(&match), >> + REGBIT_NAT_REDIRECT" = 1; next;"); >> + } >> + >> ds_destroy(&match); >> ds_destroy(&actions); >> } >> @@ -4822,6 +4844,19 @@ build_lrouter_flows(struct hmap *datapaths, struct >> hmap *ports, >> } >> ovn_lflow_add(lflows, op->od, S_ROUTER_IN_ADMISSION, 50, >> ds_cstr(&match), "next;"); >> + >> + /* VLAN traffic from localnet port should be allowed for >> + * router processing on the "redirect-chassis". */ >> + if (op->od->l3dgw_port && op->od->l3redirect_port && op->peer && >> + op->peer->od->localnet_port && (op != op->od->l3dgw_port)) { >> + ds_clear(&match); >> + ds_put_format(&match, "flags.rcv_from_vlan == 1"); >> + ds_put_format(&match, " && inport == %s", op->json_key); >> + ds_put_format(&match, " && is_chassis_resident(%s)", >> + op->od->l3redirect_port->json_key); >> + ovn_lflow_add(lflows, op->od, S_ROUTER_IN_ADMISSION, 100, >> + ds_cstr(&match), "next;"); >> + } >> } >> /* Logical router ingress table 1: IP Input. */ >> diff --git a/tests/ovn.at b/tests/ovn.at >> index 6553d17..5ae767d 100644 >> --- a/tests/ovn.at >> +++ b/tests/ovn.at >> @@ -7713,6 +7713,284 @@ test_ip_packet gw2 gw1 >> OVN_CLEANUP([hv1],[gw1],[gw2],[ext1]) >> AT_CLEANUP >> +# VLAN traffic for external network redirected through distributed >> router gateway port >> +# should use vlans(i.e input network vlan tag) across hypervisors >> instead of tunneling. >> +AT_SETUP([ovn -- vlan traffic for external network with distributed >> router gateway port]) >> +AT_SKIP_IF([test $HAVE_PYTHON = no]) >> +ovn_start >> + >> +# Logical network: >> +# # One LR R1 that has switches foo (192.168.1.0/24) and >> +# # alice (172.16.1.0/24) connected to it. 
The logical port >> +# # between R1 and alice has a "redirect-chassis" specified, >> +# # i.e. it is the distributed router gateway port(172.16.1.6). >> +# # Switch alice also has a localnet port defined. >> +# # An additional switch outside has the same subnet as alice >> +# # (172.16.1.0/24), a localnet port and nexthop port(172.16.1.1) >> +# # which will receive the packet destined for external network >> +# # (i.e 8.8.8.8 as destination ip). >> + >> +# Physical network: >> +# # Three hypervisors hv[123]. >> +# # hv1 hosts vif foo1. >> +# # hv2 is the "redirect-chassis" that hosts the distributed router >> gateway port. >> +# # hv3 hosts nexthop port vif outside1. >> +# # All other tests connect hypervisors to network n1 through br-phys >> for tunneling. >> +# # But in this test, hv1 won't connect to n1(and no br-phys in hv1), and >> +# # in order to show vlans(instead of tunneling) used between hv1 and >> hv2, >> +# # a new network n2 created and hv1 and hv2 connected to this network >> through br-ex. >> +# # hv2 and hv3 are still connected to n1 network through br-phys. >> +net_add n1 >> + >> +# We are not calling ovn_attach for hv1, to avoid adding br-phys. >> +# Tunneling won't work in hv1 as ovn-encap-ip is not added to any bridge >> in hv1 >> +sim_add hv1 >> +as hv1 >> +ovs-vsctl \ >> + -- set Open_vSwitch . external-ids:system-id=hv1 \ >> + -- set Open_vSwitch . external-ids:ovn-remote=unix:$ovs_base/ovn-sb/ovn-sb.sock >> \ >> + -- set Open_vSwitch . external-ids:ovn-encap-type=geneve,vxlan \ >> + -- set Open_vSwitch . external-ids:ovn-encap-ip=192.168.0.1 \ >> + -- add-br br-int \ >> + -- set bridge br-int fail-mode=secure other-config:disable-in-band=true >> \ >> + -- set Open_vSwitch . 
external-ids:ovn-bridge-mappings=public:br-ex >> + >> +start_daemon ovn-controller >> +ovs-vsctl -- add-port br-int hv1-vif1 -- \ >> + set interface hv1-vif1 external-ids:iface-id=foo1 \ >> + ofport-request=1 >> + >> +sim_add hv2 >> +as hv2 >> +ovs-vsctl add-br br-phys >> +ovn_attach n1 br-phys 192.168.0.2 >> +ovs-vsctl set Open_vSwitch . external-ids:ovn-bridge-mappin >> gs="public:br-ex,phys:br-phys" >> + >> +sim_add hv3 >> +as hv3 >> +ovs-vsctl add-br br-phys >> +ovn_attach n1 br-phys 192.168.0.3 >> +ovs-vsctl -- add-port br-int hv3-vif1 -- \ >> + set interface hv3-vif1 external-ids:iface-id=outside1 \ >> + options:tx_pcap=hv3/vif1-tx.pcap \ >> + options:rxq_pcap=hv3/vif1-rx.pcap \ >> + ofport-request=1 >> +ovs-vsctl set Open_vSwitch . external-ids:ovn-bridge-mappin >> gs="phys:br-phys" >> + >> +# Create network n2 for vlan connectivity between hv1 and hv2 >> +net_add n2 >> + >> +as hv1 >> +ovs-vsctl add-br br-ex >> +net_attach n2 br-ex >> + >> +as hv2 >> +ovs-vsctl add-br br-ex >> +net_attach n2 br-ex >> + >> +OVN_POPULATE_ARP >> + >> +ovn-nbctl create Logical_Router name=R1 >> + >> +ovn-nbctl ls-add foo >> +ovn-nbctl ls-add alice >> +ovn-nbctl ls-add outside >> + >> +# Connect foo to R1 >> +ovn-nbctl lrp-add R1 foo 00:00:01:01:02:03 192.168.1.1/24 >> +ovn-nbctl lsp-add foo rp-foo -- set Logical_Switch_Port rp-foo \ >> + type=router options:router-port=foo \ >> + -- lsp-set-addresses rp-foo router >> + >> +# Connect alice to R1 as distributed router gateway port (172.16.1.6) on >> hv2 >> +ovn-nbctl lrp-add R1 alice 00:00:02:01:02:03 172.16.1.6/24 \ >> + -- set Logical_Router_Port alice options:redirect-chassis="hv2" >> +ovn-nbctl lsp-add alice rp-alice -- set Logical_Switch_Port rp-alice \ >> + type=router options:router-port=alice \ >> + -- lsp-set-addresses rp-alice router \ >> + >> + >> +# Create logical port foo1 in foo >> +ovn-nbctl lsp-add foo foo1 \ >> +-- lsp-set-addresses foo1 "f0:00:00:01:02:03 192.168.1.2" >> + >> +# Create logical port outside1 in 
outside, which is a nexthop address >> +# for 172.16.1.0/24 >> +ovn-nbctl lsp-add outside outside1 \ >> +-- lsp-set-addresses outside1 "f0:00:00:01:02:04 172.16.1.1" >> + >> +# Set default gateway (nexthop) to 172.16.1.1 >> +ovn-nbctl lr-route-add R1 "0.0.0.0/0" 172.16.1.1 alice >> +AT_CHECK([ovn-nbctl lr-nat-add R1 snat 172.16.1.6 192.168.1.1/24]) >> +ovn-nbctl set Logical_Switch_Port rp-alice options:nat-addresses=router >> + >> +ovn-nbctl lsp-add foo ln-foo >> +ovn-nbctl lsp-set-addresses ln-foo unknown >> +ovn-nbctl lsp-set-options ln-foo network_name=public >> +ovn-nbctl lsp-set-type ln-foo localnet >> +AT_CHECK([ovn-nbctl set Logical_Switch_Port ln-foo tag=2]) >> + >> +# Create localnet port in alice >> +ovn-nbctl lsp-add alice ln-alice >> +ovn-nbctl lsp-set-addresses ln-alice unknown >> +ovn-nbctl lsp-set-type ln-alice localnet >> +ovn-nbctl lsp-set-options ln-alice network_name=phys >> + >> +# Create localnet port in outside >> +ovn-nbctl lsp-add outside ln-outside >> +ovn-nbctl lsp-set-addresses ln-outside unknown >> +ovn-nbctl lsp-set-type ln-outside localnet >> +ovn-nbctl lsp-set-options ln-outside network_name=phys >> + >> +# Allow some time for ovn-northd and ovn-controller to catch up. >> +# XXX This should be more systematic. 
>> +ovn-nbctl --wait=hv --timeout=3 sync >> + >> +echo "---------NB dump-----" >> +ovn-nbctl show >> +echo "---------------------" >> +ovn-nbctl list logical_router >> +echo "---------------------" >> +ovn-nbctl list nat >> +echo "---------------------" >> +ovn-nbctl list logical_router_port >> +echo "---------------------" >> + >> +echo "---------SB dump-----" >> +ovn-sbctl list datapath_binding >> +echo "---------------------" >> +ovn-sbctl list port_binding >> +echo "---------------------" >> +ovn-sbctl dump-flows >> +echo "---------------------" >> +ovn-sbctl list chassis >> +echo "---------------------" >> + >> +for chassis in hv1 hv2 hv3; do >> + as $chassis >> + echo "------ $chassis dump ----------" >> + ovs-vsctl show br-int >> + ovs-ofctl show br-int >> + ovs-ofctl dump-flows br-int >> + echo "--------------------------" >> +done >> + >> +ip_to_hex() { >> + printf "%02x%02x%02x%02x" "$@" >> +} >> + >> +foo1_ip=$(ip_to_hex 192 168 1 2) >> +gw_ip=$(ip_to_hex 172 16 1 6) >> +dst_ip=$(ip_to_hex 8 8 8 8) >> +nexthop_ip=$(ip_to_hex 172 16 1 1) >> + >> +foo1_mac="f00000010203" >> +foo_mac="000001010203" >> +gw_mac="000002010203" >> +nexthop_mac="f00000010204" >> + >> +# Send ip packet from foo1 to 8.8.8.8 >> +src_mac="f00000010203" >> +dst_mac="000001010203" >> +packet=${foo_mac}${foo1_mac}08004500001c0000000040110000${f >> oo1_ip}${dst_ip}0035111100080000 >> + >> +as hv1 ovs-appctl netdev-dummy/receive hv1-vif1 $packet >> +sleep 2 >> + >> +# ARP request packet for nexthop_ip to expect at outside1 >> +arp_request=ffffffffffff${gw_mac}08060001080006040001${gw_m >> ac}${gw_ip}000000000000${nexthop_ip} >> +echo $arp_request >> hv3-vif1.expected >> +cat hv3-vif1.expected > expout >> +$PYTHON "$top_srcdir/utilities/ovs-pcap.in" hv3/vif1-tx.pcap | grep >> ${nexthop_ip} | uniq > hv3-vif1 >> +AT_CHECK([sort hv3-vif1], [0], [expout]) >> + >> +# Send ARP reply from outside1 back to the router >> +reply_mac="f00000010204" >> 
+arp_reply=${gw_mac}${nexthop_mac}08060001080006040002${next >> hop_mac}${nexthop_ip}${gw_mac}${gw_ip} >> + >> +as hv3 ovs-appctl netdev-dummy/receive hv3-vif1 $arp_reply >> +OVS_WAIT_UNTIL([ >> + test `as hv2 ovs-ofctl dump-flows br-int | grep table=66 | \ >> +grep actions=mod_dl_dst:f0:00:00:01:02:04 | wc -l` -eq 1 >> + ]) >> + >> +echo "ovn-sbctl list MAC_Binding" >> +ovn-sbctl list MAC_Binding >> +echo "============================" >> + >> +# VLAN tagged packet with distributed gateway port(172.16.1.6) MAC as >> destination MAC >> +# is expected on bridge connecting hv1 and hv2 >> +expected=${gw_mac}${foo1_mac}8100000208004500001c0000000040 >> 110000${foo1_ip}${dst_ip}0035111100080000 >> +echo $expected > hv1-br-ex_n2.expected >> + >> +# Packet to Expect at outside1 i.e nexthop(172.16.1.1) port. >> +# As connection tracking not enabled for this test, snat can't be done >> on the packet. >> +# We still see foo1 as the source ip address. But source mac(gateway >> MAC) and >> +# dest mac(nexthop mac) are properly configured. 
>> +expected=${nexthop_mac}${gw_mac}08004500001c000000003f11010 >> 0${foo1_ip}${dst_ip}0035111100080000 >> +echo $expected > hv3-vif1.expected >> + >> +reset_pcap_file() { >> + local iface=$1 >> + local pcap_file=$2 >> + ovs-vsctl -- set Interface $iface options:tx_pcap=dummy-tx.pcap \ >> +options:rxq_pcap=dummy-rx.pcap >> + rm -f ${pcap_file}*.pcap >> + ovs-vsctl -- set Interface $iface options:tx_pcap=${pcap_file}-tx.pcap >> \ >> +options:rxq_pcap=${pcap_file}-rx.pcap >> +} >> + >> +as hv1 reset_pcap_file br-ex_n2 hv1/br-ex_n2 >> +as hv3 reset_pcap_file hv3-vif1 hv3/vif1 >> +sleep 2 >> +as hv1 ovs-appctl netdev-dummy/receive hv1-vif1 $packet >> +sleep 2 >> + >> +# On hv1, table 65 for packets going from vlan switch pipleline to >> router pipleine >> +# set MLF_RCV_FROM_VLAN flag >> +AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=65 | grep >> "priority=100,reg15=0x1,metadata=0x2" \ >> +| grep actions=clone | grep "load:0x1->NXM_NX_REG10" | wc -l], [0], [[1 >> +]]) >> +# On hv1, because of snat rule in table 15, a higher priority(i.e 2) flow >> +# added for packets with MLF_RCV_FROM_VLAN flag with output as >> distributed >> +# gateway port, which sets REGBIT_NAT_REDIRECT flag >> +AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=15 | grep >> "priority=2,ip,reg10=0x20/0x20,metadata=0x1" \ >> +| grep "actions=load:0x1->OXM_OF_PKT_REG4" | wc -l], [0], [[1 >> +]]) >> + >> +# On hv1, table 32 flow which tags packet with source network vlan tag >> and sends it to hv2 >> +# through br-ex >> +AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=32 | grep >> "priority=150,reg10=0x20/0x20,reg14=0x1,reg15=0x3,metadata=0x1" \ >> +| grep "actions=mod_vlan_vid:2" | grep "n_packets=2," | wc -l], [0], [[1 >> +]]) >> + >> +# On hv2 table 0, vlan tagged packet is sent through router pipeline >> +# by setting MLF_RCV_FROM_VLAN flag (REG10) >> +AT_CHECK([as hv2 ovs-ofctl dump-flows br-int | grep "table=0," | grep >> "priority=200" | grep "dl_vlan=2" | \ >> +grep 
"dl_dst=00:00:02:01:02:03" | grep "actions=strip_vlan,load:0x1->OXM_OF_METADATA" >> | grep "load:0x1->NXM_NX_REG14" | \ >> +grep "load:0x1->NXM_NX_REG10" | wc -l], [0], [[1 >> +]]) >> +# on hv2 table 8, allow packets with router metadata and with >> MLF_RCV_FROM_VLAN flag >> +AT_CHECK([as hv2 ovs-ofctl dump-flows br-int table=8 | grep >> "priority=100,reg10=0x20/0x20,reg14=0x1,metadata=0x1" | wc -l], [0], [[1 >> +]]) >> + >> +ip_packet() { >> + grep "2010203f00000010203" >> +} >> + >> +# Check vlan tagged packet on the bridge connecting hv1 and hv2 >> +$PYTHON "$top_srcdir/utilities/ovs-pcap.in" hv1/br-ex_n2-tx.pcap | >> ip_packet | uniq > hv1-br-ex_n2 >> +cat hv1-br-ex_n2.expected > expout >> +AT_CHECK([sort hv1-br-ex_n2], [0], [expout]) >> + >> +# Check expected packet on nexthop interface >> +$PYTHON "$top_srcdir/utilities/ovs-pcap.in" hv3/vif1-tx.pcap | grep >> ${foo1_ip}${dst_ip} | uniq > hv3-vif1 >> +cat hv3-vif1.expected > expout >> +AT_CHECK([sort hv3-vif1], [0], [expout]) >> + >> +OVN_CLEANUP([hv1],[hv2],[hv3]) >> +AT_CLEANUP >> + >> AT_SETUP([ovn -- 1 LR with distributed router gateway port]) >> AT_SKIP_IF([test $HAVE_PYTHON = no]) >> ovn_start >> >> >
ofport); @@ -633,13 +699,58 @@ consider_port_binding(struct ovsdb_idl_index *sbrec_chassis_by_name, } else { /* Remote port connected by tunnel */ - /* Table 32, priority 100. - * ======================= + /* Table 32, priority 150 and 100. + * ============================== * * Handles traffic that needs to be sent to a remote hypervisor. Each * flow matches an output port that includes a logical port on a remote - * hypervisor, and tunnels the packet to that hypervisor. + * hypervisor, and tunnels the packet or send through vlan network to + * that hypervisor. */ + + /* For each vlan network connected to the router, add that network's + * vlan tag to the packet and output it through localnet port */ + struct local_datapath *ldp = get_local_datapath(local_datapaths, + dp_key); + for (int i = 0; i < ldp->n_peer_dps; i++) { + struct ofpact_vlan_vid *vlan_vid; + ofp_port_t port_ofport = 0; + struct peer_datapath *pdp = ldp->peer_dps[i]; + struct local_datapath *peer_ldp = get_local_datapath( + local_datapaths, pdp->peer_dp->tunnel_key); + if (peer_ldp->localnet_port && pdp->patch->tunnel_key) { + int64_t vlan_tag = (peer_ldp->localnet_port->n_tag ? 
+ *peer_ldp->localnet_port->tag : 0); + if (!vlan_tag) { + continue; + } + port_ofport = u16_to_ofp(simap_get(&localvif_to_ofport, + peer_ldp->localnet_port->logical_port)); + if (!port_ofport) { + continue; + } + + match_init_catchall(&match); + ofpbuf_clear(ofpacts_p); + + match_set_metadata(&match, htonll(dp_key)); + match_set_reg_masked(&match, MFF_LOG_FLAGS - MFF_REG0, + MLF_RCV_FROM_VLAN, MLF_RCV_FROM_VLAN); + match_set_reg(&match, MFF_LOG_INPORT - MFF_REG0, + pdp->patch->tunnel_key); + match_set_reg(&match, MFF_LOG_OUTPORT - MFF_REG0, port_key); + + vlan_vid = ofpact_put_SET_VLAN_VID(ofpacts_p); + vlan_vid->vlan_vid = vlan_tag; + vlan_vid->push_vlan_if_needed = true; + ofpact_put_OUTPUT(ofpacts_p)->port = port_ofport; + ofpact_put_STRIP_VLAN(ofpacts_p); + + ofctrl_add_flow(flow_table, OFTABLE_REMOTE_OUTPUT, 150, 0, + &match, ofpacts_p); + } + } + match_init_catchall(&match); ofpbuf_clear(ofpacts_p); diff --git a/ovn/lib/logical-fields.c b/ovn/lib/logical-fields.c index a8b5e3c..b9efa02 100644 --- a/ovn/lib/logical-fields.c +++ b/ovn/lib/logical-fields.c @@ -105,6 +105,10 @@ ovn_init_symtab(struct shash *symtab) MLF_FORCE_SNAT_FOR_LB_BIT); expr_symtab_add_subfield(symtab, "flags.force_snat_for_lb", NULL, flags_str); + snprintf(flags_str, sizeof flags_str, "flags[%d]", + MLF_RCV_FROM_VLAN_BIT); + expr_symtab_add_subfield(symtab, "flags.rcv_from_vlan", NULL, + flags_str); /* Connection tracking state. 
*/ expr_symtab_add_field(symtab, "ct_mark", MFF_CT_MARK, NULL, false); diff --git a/ovn/lib/logical-fields.h b/ovn/lib/logical-fields.h index b1dbb03..96250fd 100644 --- a/ovn/lib/logical-fields.h +++ b/ovn/lib/logical-fields.h @@ -50,6 +50,7 @@ enum mff_log_flags_bits { MLF_FORCE_SNAT_FOR_DNAT_BIT = 2, MLF_FORCE_SNAT_FOR_LB_BIT = 3, MLF_LOCAL_ONLY_BIT = 4, + MLF_RCV_FROM_VLAN_BIT = 5, }; /* MFF_LOG_FLAGS_REG flag assignments */ @@ -75,6 +76,7 @@ enum mff_log_flags { * hypervisors should instead only be output to local targets */ MLF_LOCAL_ONLY = (1 << MLF_LOCAL_ONLY_BIT), + MLF_RCV_FROM_VLAN = (1 << MLF_RCV_FROM_VLAN_BIT), }; #endif /* ovn/lib/logical-fields.h */ diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c index 72fe4e7..350f015 100644 --- a/ovn/northd/ovn-northd.c +++ b/ovn/northd/ovn-northd.c @@ -4411,6 +4411,28 @@ add_route(struct hmap *lflows, const struct ovn_port *op, * routing. */ ovn_lflow_add(lflows, op->od, S_ROUTER_IN_IP_ROUTING, priority, ds_cstr(&match), ds_cstr(&actions)); + + /* When output port is distributed gateway port, check if the router + * input port is a patch port connected to vlan network. + * Traffic from VLAN network to external network should be redirected + * to "redirect-chassis" by setting REGBIT_NAT_REDIRECT flag. + * Later physical table 32 will output this traffic to gateway + * chassis using input network vlan tag */ + if (op == op->od->l3dgw_port) { + ds_clear(&match); + ds_clear(&actions); + + ds_put_format(&match, "ip%s.%s == %s/%d", is_ipv4 ? 
"4" : "6", + dir, network_s, plen); + ds_put_format(&match, " && flags.rcv_from_vlan == 1"); + ds_put_format(&match, " && !is_chassis_resident(%s)", + op->od->l3redirect_port->json_key); + + ovn_lflow_add(lflows, op->od, S_ROUTER_IN_IP_ROUTING, + priority + 1, ds_cstr(&match), + REGBIT_NAT_REDIRECT" = 1; next;"); + } + ds_destroy(&match); ds_destroy(&actions); } @@ -4822,6 +4844,19 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports, } ovn_lflow_add(lflows, op->od, S_ROUTER_IN_ADMISSION, 50, ds_cstr(&match), "next;"); + + /* VLAN traffic from localnet port should be allowed for + * router processing on the "redirect-chassis". */ + if (op->od->l3dgw_port && op->od->l3redirect_port && op->peer && + op->peer->od->localnet_port && (op != op->od->l3dgw_port)) { + ds_clear(&match); + ds_put_format(&match, "flags.rcv_from_vlan == 1"); + ds_put_format(&match, " && inport == %s", op->json_key); + ds_put_format(&match, " && is_chassis_resident(%s)", + op->od->l3redirect_port->json_key); + ovn_lflow_add(lflows, op->od, S_ROUTER_IN_ADMISSION, 100, + ds_cstr(&match), "next;"); + } } /* Logical router ingress table 1: IP Input. */ diff --git a/tests/ovn.at b/tests/ovn.at index 6553d17..5ae767d 100644 --- a/tests/ovn.at +++ b/tests/ovn.at @@ -7713,6 +7713,284 @@ test_ip_packet gw2 gw1 OVN_CLEANUP([hv1],[gw1],[gw2],[ext1]) AT_CLEANUP +# VLAN traffic for external network redirected through distributed router gateway port +# should use vlans(i.e input network vlan tag) across hypervisors instead of tunneling. +AT_SETUP([ovn -- vlan traffic for external network with distributed router gateway port]) +AT_SKIP_IF([test $HAVE_PYTHON = no]) +ovn_start + +# Logical network: +# # One LR R1 that has switches foo (192.168.1.0/24) and +# # alice (172.16.1.0/24) connected to it. The logical port +# # between R1 and alice has a "redirect-chassis" specified, +# # i.e. it is the distributed router gateway port(172.16.1.6). +# # Switch alice also has a localnet port defined. 
+# # An additional switch outside has the same subnet as alice +# # (172.16.1.0/24), a localnet port and nexthop port(172.16.1.1) +# # which will receive the packet destined for external network +# # (i.e 8.8.8.8 as destination ip). + +# Physical network: +# # Three hypervisors hv[123]. +# # hv1 hosts vif foo1. +# # hv2 is the "redirect-chassis" that hosts the distributed router gateway port. +# # hv3 hosts nexthop port vif outside1. +# # All other tests connect hypervisors to network n1 through br-phys for tunneling. +# # But in this test, hv1 won't connect to n1(and no br-phys in hv1), and +# # in order to show vlans(instead of tunneling) used between hv1 and hv2, +# # a new network n2 created and hv1 and hv2 connected to this network through br-ex. +# # hv2 and hv3 are still connected to n1 network through br-phys. +net_add n1 + +# We are not calling ovn_attach for hv1, to avoid adding br-phys. +# Tunneling won't work in hv1 as ovn-encap-ip is not added to any bridge in hv1 +sim_add hv1 +as hv1 +ovs-vsctl \ + -- set Open_vSwitch . external-ids:system-id=hv1 \ + -- set Open_vSwitch . external-ids:ovn-remote=unix:$ovs_base/ovn-sb/ovn-sb.sock \ + -- set Open_vSwitch . external-ids:ovn-encap-type=geneve,vxlan \ + -- set Open_vSwitch . external-ids:ovn-encap-ip=192.168.0.1 \ + -- add-br br-int \ + -- set bridge br-int fail-mode=secure other-config:disable-in-band=true \ + -- set Open_vSwitch . external-ids:ovn-bridge-mappings=public:br-ex + +start_daemon ovn-controller +ovs-vsctl -- add-port br-int hv1-vif1 -- \ + set interface hv1-vif1 external-ids:iface-id=foo1 \ + ofport-request=1 + +sim_add hv2 +as hv2 +ovs-vsctl add-br br-phys +ovn_attach n1 br-phys 192.168.0.2 +ovs-vsctl set Open_vSwitch . 
external-ids:ovn-bridge-mappings="public:br-ex,phys:br-phys" + +sim_add hv3 +as hv3 +ovs-vsctl add-br br-phys +ovn_attach n1 br-phys 192.168.0.3 +ovs-vsctl -- add-port br-int hv3-vif1 -- \ + set interface hv3-vif1 external-ids:iface-id=outside1 \ + options:tx_pcap=hv3/vif1-tx.pcap \ + options:rxq_pcap=hv3/vif1-rx.pcap \ + ofport-request=1 +ovs-vsctl set Open_vSwitch . external-ids:ovn-bridge-mappings="phys:br-phys" + +# Create network n2 for vlan connectivity between hv1 and hv2 +net_add n2 + +as hv1 +ovs-vsctl add-br br-ex +net_attach n2 br-ex + +as hv2 +ovs-vsctl add-br br-ex +net_attach n2 br-ex + +OVN_POPULATE_ARP + +ovn-nbctl create Logical_Router name=R1 + +ovn-nbctl ls-add foo +ovn-nbctl ls-add alice +ovn-nbctl ls-add outside + +# Connect foo to R1 +ovn-nbctl lrp-add R1 foo 00:00:01:01:02:03 192.168.1.1/24 +ovn-nbctl lsp-add foo rp-foo -- set Logical_Switch_Port rp-foo \ + type=router options:router-port=foo \ + -- lsp-set-addresses rp-foo router + +# Connect alice to R1 as distributed router gateway port (172.16.1.6) on hv2 +ovn-nbctl lrp-add R1 alice 00:00:02:01:02:03 172.16.1.6/24 \ + -- set Logical_Router_Port alice options:redirect-chassis="hv2" +ovn-nbctl lsp-add alice rp-alice -- set Logical_Switch_Port rp-alice \ + type=router options:router-port=alice \ + -- lsp-set-addresses rp-alice router \ + + +# Create logical port foo1 in foo +ovn-nbctl lsp-add foo foo1 \ +-- lsp-set-addresses foo1 "f0:00:00:01:02:03 192.168.1.2" + +# Create logical port outside1 in outside, which is a nexthop address +# for 172.16.1.0/24 +ovn-nbctl lsp-add outside outside1 \ +-- lsp-set-addresses outside1 "f0:00:00:01:02:04 172.16.1.1" + +# Set default gateway (nexthop) to 172.16.1.1 +ovn-nbctl lr-route-add R1 "0.0.0.0/0" 172.16.1.1 alice +AT_CHECK([ovn-nbctl lr-nat-add R1 snat 172.16.1.6 192.168.1.1/24]) +ovn-nbctl set Logical_Switch_Port rp-alice options:nat-addresses=router + +ovn-nbctl lsp-add foo ln-foo +ovn-nbctl lsp-set-addresses ln-foo unknown +ovn-nbctl 
lsp-set-options ln-foo network_name=public +ovn-nbctl lsp-set-type ln-foo localnet +AT_CHECK([ovn-nbctl set Logical_Switch_Port ln-foo tag=2]) + +# Create localnet port in alice +ovn-nbctl lsp-add alice ln-alice +ovn-nbctl lsp-set-addresses ln-alice unknown +ovn-nbctl lsp-set-type ln-alice localnet +ovn-nbctl lsp-set-options ln-alice network_name=phys + +# Create localnet port in outside +ovn-nbctl lsp-add outside ln-outside +ovn-nbctl lsp-set-addresses ln-outside unknown +ovn-nbctl lsp-set-type ln-outside localnet +ovn-nbctl lsp-set-options ln-outside network_name=phys + +# Allow some time for ovn-northd and ovn-controller to catch up. +# XXX This should be more systematic. +ovn-nbctl --wait=hv --timeout=3 sync + +echo "---------NB dump-----" +ovn-nbctl show +echo "---------------------" +ovn-nbctl list logical_router +echo "---------------------" +ovn-nbctl list nat +echo "---------------------" +ovn-nbctl list logical_router_port +echo "---------------------" + +echo "---------SB dump-----" +ovn-sbctl list datapath_binding +echo "---------------------" +ovn-sbctl list port_binding +echo "---------------------" +ovn-sbctl dump-flows +echo "---------------------" +ovn-sbctl list chassis +echo "---------------------" + +for chassis in hv1 hv2 hv3; do + as $chassis + echo "------ $chassis dump ----------" + ovs-vsctl show br-int + ovs-ofctl show br-int + ovs-ofctl dump-flows br-int + echo "--------------------------" +done + +ip_to_hex() { + printf "%02x%02x%02x%02x" "$@" +} + +foo1_ip=$(ip_to_hex 192 168 1 2) +gw_ip=$(ip_to_hex 172 16 1 6) +dst_ip=$(ip_to_hex 8 8 8 8) +nexthop_ip=$(ip_to_hex 172 16 1 1) + +foo1_mac="f00000010203" +foo_mac="000001010203" +gw_mac="000002010203" +nexthop_mac="f00000010204" + +# Send ip packet from foo1 to 8.8.8.8 +src_mac="f00000010203" +dst_mac="000001010203" +packet=${foo_mac}${foo1_mac}08004500001c0000000040110000${foo1_ip}${dst_ip}0035111100080000 + +as hv1 ovs-appctl netdev-dummy/receive hv1-vif1 $packet +sleep 2 + +# ARP request 
packet for nexthop_ip to expect at outside1 +arp_request=ffffffffffff${gw_mac}08060001080006040001${gw_mac}${gw_ip}000000000000${nexthop_ip} +echo $arp_request >> hv3-vif1.expected +cat hv3-vif1.expected > expout +$PYTHON "$top_srcdir/utilities/ovs-pcap.in" hv3/vif1-tx.pcap | grep ${nexthop_ip} | uniq > hv3-vif1 +AT_CHECK([sort hv3-vif1], [0], [expout]) + +# Send ARP reply from outside1 back to the router +reply_mac="f00000010204" +arp_reply=${gw_mac}${nexthop_mac}08060001080006040002${nexthop_mac}${nexthop_ip}${gw_mac}${gw_ip} + +as hv3 ovs-appctl netdev-dummy/receive hv3-vif1 $arp_reply +OVS_WAIT_UNTIL([ + test `as hv2 ovs-ofctl dump-flows br-int | grep table=66 | \ +grep actions=mod_dl_dst:f0:00:00:01:02:04 | wc -l` -eq 1 + ]) + +echo "ovn-sbctl list MAC_Binding" +ovn-sbctl list MAC_Binding +echo "============================" + +# VLAN tagged packet with distributed gateway port(172.16.1.6) MAC as destination MAC +# is expected on bridge connecting hv1 and hv2 +expected=${gw_mac}${foo1_mac}8100000208004500001c0000000040110000${foo1_ip}${dst_ip}0035111100080000 +echo $expected > hv1-br-ex_n2.expected + +# Packet to Expect at outside1 i.e nexthop(172.16.1.1) port. +# As connection tracking not enabled for this test, snat can't be done on the packet. +# We still see foo1 as the source ip address. But source mac(gateway MAC) and +# dest mac(nexthop mac) are properly configured. 
+expected=${nexthop_mac}${gw_mac}08004500001c000000003f110100${foo1_ip}${dst_ip}0035111100080000 +echo $expected > hv3-vif1.expected + +reset_pcap_file() { + local iface=$1 + local pcap_file=$2 + ovs-vsctl -- set Interface $iface options:tx_pcap=dummy-tx.pcap \ +options:rxq_pcap=dummy-rx.pcap + rm -f ${pcap_file}*.pcap + ovs-vsctl -- set Interface $iface options:tx_pcap=${pcap_file}-tx.pcap \ +options:rxq_pcap=${pcap_file}-rx.pcap +} + +as hv1 reset_pcap_file br-ex_n2 hv1/br-ex_n2 +as hv3 reset_pcap_file hv3-vif1 hv3/vif1 +sleep 2 +as hv1 ovs-appctl netdev-dummy/receive hv1-vif1 $packet +sleep 2 + +# On hv1, table 65 for packets going from vlan switch pipleline to router pipleine +# set MLF_RCV_FROM_VLAN flag +AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=65 | grep "priority=100,reg15=0x1,metadata=0x2" \ +| grep actions=clone | grep "load:0x1->NXM_NX_REG10" | wc -l], [0], [[1 +]]) +# On hv1, because of snat rule in table 15, a higher priority(i.e 2) flow +# added for packets with MLF_RCV_FROM_VLAN flag with output as distributed +# gateway port, which sets REGBIT_NAT_REDIRECT flag +AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=15 | grep "priority=2,ip,reg10=0x20/0x20,metadata=0x1" \ +| grep "actions=load:0x1->OXM_OF_PKT_REG4" | wc -l], [0], [[1 +]]) + +# On hv1, table 32 flow which tags packet with source network vlan tag and sends it to hv2 +# through br-ex +AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=32 | grep "priority=150,reg10=0x20/0x20,reg14=0x1,reg15=0x3,metadata=0x1" \ +| grep "actions=mod_vlan_vid:2" | grep "n_packets=2," | wc -l], [0], [[1 +]]) + +# On hv2 table 0, vlan tagged packet is sent through router pipeline +# by setting MLF_RCV_FROM_VLAN flag (REG10) +AT_CHECK([as hv2 ovs-ofctl dump-flows br-int | grep "table=0," | grep "priority=200" | grep "dl_vlan=2" | \ +grep "dl_dst=00:00:02:01:02:03" | grep "actions=strip_vlan,load:0x1->OXM_OF_METADATA" | grep "load:0x1->NXM_NX_REG14" | \ +grep "load:0x1->NXM_NX_REG10" | wc -l], [0], [[1 
+]]) +# on hv2 table 8, allow packets with router metadata and with MLF_RCV_FROM_VLAN flag +AT_CHECK([as hv2 ovs-ofctl dump-flows br-int table=8 | grep "priority=100,reg10=0x20/0x20,reg14=0x1,metadata=0x1" | wc -l], [0], [[1 +]]) + +ip_packet() { + grep "2010203f00000010203" +} + +# Check vlan tagged packet on the bridge connecting hv1 and hv2 +$PYTHON "$top_srcdir/utilities/ovs-pcap.in" hv1/br-ex_n2-tx.pcap | ip_packet | uniq > hv1-br-ex_n2 +cat hv1-br-ex_n2.expected > expout +AT_CHECK([sort hv1-br-ex_n2], [0], [expout]) + +# Check expected packet on nexthop interface +$PYTHON "$top_srcdir/utilities/ovs-pcap.in" hv3/vif1-tx.pcap | grep ${foo1_ip}${dst_ip} | uniq > hv3-vif1 +cat hv3-vif1.expected > expout +AT_CHECK([sort hv3-vif1], [0], [expout]) + +OVN_CLEANUP([hv1],[hv2],[hv3]) +AT_CLEANUP + AT_SETUP([ovn -- 1 LR with distributed router gateway port]) AT_SKIP_IF([test $HAVE_PYTHON = no]) ovn_start
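One reviewer's note on readability, not part of the patch itself: the magic `reg10=0x20/0x20` strings in the new AT_CHECKs follow directly from the `mff_log_flags_bits` hunk above. MFF_LOG_FLAGS maps to NXM_NX_REG10 in the dumped flows, and the patch adds `MLF_RCV_FROM_VLAN_BIT = 5`, so the flag's value/mask pair is `1 << 5 = 0x20`. A minimal Python sketch of that arithmetic (the names for bits 0 and 1 are assumptions; only bits 2-5 appear in the quoted hunk):

```python
# Bit positions from enum mff_log_flags_bits (logical-fields.h).
# Bits 0 and 1 are assumed names; bits 2-5 are taken from the patch hunk.
MLF_ALLOW_LOOPBACK_BIT = 0       # assumed
MLF_RCV_FROM_VXLAN_BIT = 1       # assumed
MLF_FORCE_SNAT_FOR_DNAT_BIT = 2
MLF_FORCE_SNAT_FOR_LB_BIT = 3
MLF_LOCAL_ONLY_BIT = 4
MLF_RCV_FROM_VLAN_BIT = 5        # new bit introduced by this patch

def mlf_mask(bit):
    """Single-bit value/mask OVN uses to match one logical flag."""
    return 1 << bit

MLF_RCV_FROM_VLAN = mlf_mask(MLF_RCV_FROM_VLAN_BIT)

# MFF_LOG_FLAGS is carried in NXM_NX_REG10, so the test greps for the
# OpenFlow match string below.
match = "reg10=%#x/%#x" % (MLF_RCV_FROM_VLAN, MLF_RCV_FROM_VLAN)
print(match)  # reg10=0x20/0x20
```

Deriving the constant this way also shows why the checks would silently break if another flag bit were ever inserted before MLF_RCV_FROM_VLAN_BIT: the grep patterns hard-code the mask rather than the bit name.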