From patchwork Thu Jan 5 10:46:23 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Mickey Spiegel router
and
the connected logical router port specifies a
- redirect-chassis
, the flow is only programmed on the
- redirect-chassis
.
+ redirect-chassis
:
redirect-chassis
.
+ redirect-chassis
.
+ For each dnat_and_snat
NAT rule on a distributed
+ router that specifies an external Ethernet address E,
+ a priority-50 flow that matches inport == GW
+ && eth.dst == E
, where GW
+ is the logical router gateway port, with action
+ next;
.
+
+ This flow is only programmed on the gateway port instance on
+ the chassis where the logical_port
specified in
+ the NAT rule resides.
+
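As a rough illustration of the admission flow described above, the following standalone C sketch formats its match expression. The helper name `format_nat_admission_match` and the sample port names are hypothetical and are not part of this patch; ovn-northd itself builds the string with `ds_put_format()` and `ETH_ADDR_FMT`.

```c
#include <stdio.h>

/* Hypothetical helper, not part of ovn-northd: writes the match expression
 * for the priority-50 admission flow into 'buf'.  'gw_port_key' is assumed
 * to be already quoted, the way a port's json_key is used in the patch.
 * Returns the snprintf() result, so a value >= 'size' means truncation. */
int
format_nat_admission_match(char *buf, size_t size, const char *external_mac,
                           const char *gw_port_key, const char *logical_port)
{
    return snprintf(buf, size,
                    "eth.dst == %s && inport == %s"
                    " && is_chassis_resident(\"%s\")",
                    external_mac, gw_port_key, logical_port);
}
```

The `is_chassis_resident()` term is what restricts the flow to the chassis where the NAT rule's `logical_port` resides.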
@@ -1042,6 +1075,50 @@
 outport = P;
 flags.loopback = 1;
 output;
+
+
+ For the gateway port on a distributed logical router with NAT
+ (where one of the logical router ports specifies a
+ redirect-chassis
):
+
redirect-chassis
. This behavior avoids
+ generation of multiple ARP responses from different chassis,
+ and allows upstream MAC learning to point to the
+ redirect-chassis
.
+
+ If the corresponding NAT rule can be handled in a distributed
+ manner, then this flow is only programmed on the gateway port
+ instance where the logical_port
specified in the
+ NAT rule resides.
+
+ Some of the actions are different for this case, using the
+ external_mac
specified in the NAT rule rather
+ than the gateway port's Ethernet address E:
+
+eth.src = external_mac;
+arp.sha = external_mac;
+
+
+
+ This behavior avoids generation of multiple ARP responses
+ from different chassis, and allows upstream MAC learning to
+ point to the correct chassis.
+
+Ingress Table 3: UNSNAT on Gateway Routers
+@@ -1275,6 +1354,45 @@ icmp4 {
Ingress Table 3: UNSNAT on Distributed Routers
+ +
+ For each configuration in the OVN Northbound database that asks
+ to change the source IP address of a packet from A to
+ B, a priority-100 flow matches ip &&
+ ip4.dst == B && inport == GW
,
+ where GW is the logical router gateway port, with an
+ action ct_snat; next;
.
+
+ If the NAT rule cannot be handled in a distributed manner, then
+ the priority-100 flow above is only programmed on the
+ redirect-chassis
.
+
+ For each configuration in the OVN Northbound database that asks
+ to change the source IP address of a packet from A to
+ B, a priority-50 flow matches ip &&
+ ip4.dst == B
with an action
+ REGBIT_NAT_REDIRECT = 1; next;
. This flow is for
+ east/west traffic to a NAT destination IPv4 address. Setting
+ the REGBIT_NAT_REDIRECT
flag triggers, in the
+ ingress table Gateway Redirect
, a redirect to the
+ instance of the gateway port on the
+ redirect-chassis
, with egress loopback.
+
+ A priority-0 logical flow with match 1
has actions
+ next;
.
+
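The two UNSNAT cases above differ only in whether the priority-100 match is pinned to the redirect-chassis. A minimal sketch of that match construction follows; the helper `format_unsnat_match` and the sample values are hypothetical, and the port keys are assumed to be passed in already quoted.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper, not part of ovn-northd: builds the priority-100
 * UNSNAT match for a distributed router.  When the NAT rule is not handled
 * in a distributed manner and a chassisredirect port key is supplied, the
 * match is restricted to the redirect-chassis. */
int
format_unsnat_match(char *buf, size_t size, const char *external_ip,
                    const char *gw_port_key, const char *redirect_port_key,
                    bool distributed)
{
    int n = snprintf(buf, size, "ip && ip4.dst == %s && inport == %s",
                     external_ip, gw_port_key);
    if (n < 0 || (size_t) n >= size) {
        return n;               /* Error or truncation: stop here. */
    }
    if (!distributed && redirect_port_key) {
        int m = snprintf(buf + n, size - (size_t) n,
                         " && is_chassis_resident(%s)", redirect_port_key);
        if (m < 0) {
            return m;
        }
        n += m;
    }
    return n;
}
```

The priority-50 flow uses the same `ip && ip4.dst == B` prefix without the `inport` term, so it catches traffic arriving on the other router ports.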
@@ -1282,6 +1400,9 @@ icmp4 {
 be DNATted from a virtual IP address to a real IP address. Packets
 in the reverse direction needs to be unDNATed.
+ +Ingress Table 4: DNAT on Gateway Routers
+Ingress Table 4: DNAT on Distributed Routers
+
+
+ On distributed routers, the DNAT table only handles packets
+ with destination IP address that needs to be DNATted from a
+ virtual IP address to a real IP address. The unDNAT processing
+ in the reverse direction is handled in a separate table in the
+ egress pipeline.
+
+ +
+ For each configuration in the OVN Northbound database that asks
+ to change the destination IP address of a packet from A to
+ B, a priority-100 flow matches ip &&
+ ip4.dst == A && inport == GW
,
+ where GW is the logical router gateway port, with an
+ action ct_dnat(B);
.
+
+ If the NAT rule cannot be handled in a distributed manner, then
+ the priority-100 flow above is only programmed on the
+ redirect-chassis
.
+
+ For each configuration in the OVN Northbound database that asks
+ to change the destination IP address of a packet from A to
+ B, a priority-50 flow matches ip &&
+ ip4.dst == A
with an action
+ REGBIT_NAT_REDIRECT = 1; next;
. This flow is for
+ east/west traffic to a NAT destination IPv4 address. Setting
+ the REGBIT_NAT_REDIRECT
flag triggers, in the
+ ingress table Gateway Redirect
, a redirect to the
+ instance of the gateway port on the
+ redirect-chassis
, with egress loopback.
+
+ A priority-0 logical flow with match 1
has actions
+ next;
.
+
@@ -1369,9 +1537,9 @@ icmp4 {
packet's final destination, unchanged) and advances to the next
table for ARP resolution. It also sets reg1
(or
xxreg1
) to the IP address owned by the selected router
- port (Table 7 will generate ARP request, if needed, with
- reg0
as the target protocol address and reg1
- as the source protocol address).
+ port (ingress table ARP Request
will generate an ARP
+ request, if needed, with reg0
as the target protocol
+ address and reg1
as the source protocol address).
@@ -1381,6 +1549,16 @@ icmp4 {
+ For distributed logical routers where one of the logical router
+ ports specifies a redirect-chassis
, a priority-300
+ logical flow with match REGBIT_NAT_REDIRECT == 1
has
+ actions ip.ttl--; next;
. The outport
+ will be set later in the Gateway Redirect table.
+
 IPv4 routing table. For each route to IPv4 network N with
 netmask M, on router port P with IP address A and Ethernet
@@ -1466,6 +1644,17 @@ next;
+ For distributed logical routers where one of the logical router
+ ports specifies a redirect-chassis
, a priority-200
+ logical flow with match REGBIT_NAT_REDIRECT == 1
has
+ actions eth.dst = E; next;
, where
+ E is the ethernet address of the router's distributed
+ gateway port.
+
Static MAC bindings. MAC bindings can be known statically based on
data in the OVN_Northbound
database. For router ports
connected to logical switches, MAC bindings can be known statically
@@ -1515,9 +1704,9 @@ next;
Dynamic MAC bindings. These flows resolve MAC-to-IP bindings
that have become known dynamically through ARP or neighbor
- discovery. (The next table will issue an ARP or neighbor
- solicitation request for cases where the binding is not yet
- known.)
+ discovery. (The ingress table ARP Request
will
+ issue an ARP or neighbor solicitation request for cases where
+ the binding is not yet known.)
@@ -1543,6 +1732,15 @@ next;
REGBIT_NAT_REDIRECT == 1
has actions
+ outport = CR; next;
, where CR
+ is the chassisredirect
port representing the instance
+ of the logical router distributed gateway port on the
+ redirect-chassis
.
+ outport == GW &&
eth.dst == 00:00:00:00:00:00
has actions
@@ -1555,6 +1753,15 @@ next;
ip4.src == B &&
+ outport == GW
, where GW is
+ the logical router distributed gateway port, with actions
+ next;
.
+ outport == GW
has actions
outport = CR; next;
, where
@@ -1597,9 +1804,9 @@ arp {
- (Ingress table 4 initialized reg1
with the IP address
- owned by outport
and reg0
with the next-hop
- IP address)
+ (Ingress table IP Routing
initialized reg1
+ with the IP address owned by outport
and
+ reg0
with the next-hop IP address)
@@ -1613,12 +1820,60 @@ arp {
+ This is for already established connections' reverse traffic.
+ i.e., DNAT has already been done in ingress pipeline and now the
+ packet has entered the egress pipeline as part of a reply. For
+ NAT on a distributed router, it is unDNATted here. For Gateway
+ routers, the unDNAT processing is carried out in the ingress DNAT
+ table.
+
+ +
+ For each configuration in the OVN Northbound database that asks
+ to change the destination IP address of a packet from an IP
+ address of A to B, a priority-100 flow
+ matches ip && ip4.src == B
+ && outport == GW
, where GW
+ is the logical router gateway port, with an action
+ ct_dnat;
.
+
+ If the NAT rule cannot be handled in a distributed manner, then
+ the priority-100 flow above is only programmed on the
+ redirect-chassis
.
+
+ If the NAT rule can be handled in a distributed manner, then
+ there is an additional action
+ eth.src = EA;
, where EA
+ is the ethernet address associated with the IP address
+ A in the NAT rule. This allows upstream MAC
+ learning to point to the correct chassis.
+
1
has actions
+ next;
.
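To make the UNDNAT action difference concrete, here is a small sketch of how the action string changes between the centralized and distributed cases. The helper `format_undnat_actions` is hypothetical; passing `NULL` for the MAC stands in for "the NAT rule cannot be handled in a distributed manner".

```c
#include <stdio.h>

/* Hypothetical helper, not part of ovn-northd: formats the actions of the
 * priority-100 UNDNAT flow.  A non-NULL 'external_mac' models a NAT rule
 * handled in a distributed manner, which first rewrites eth.src so that
 * upstream MAC learning points to the correct chassis. */
int
format_undnat_actions(char *buf, size_t size, const char *external_mac)
{
    if (external_mac) {
        return snprintf(buf, size, "eth.src = %s; ct_dnat;", external_mac);
    }
    return snprintf(buf, size, "ct_dnat;");
}
```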
+ Packets that are configured to be SNATed get their source IP address changed based on the configuration in the OVN Northbound database.
+ +Egress Table 1: SNAT on Gateway Routers
+@@ -1652,7 +1907,96 @@ arp {
Egress Table 1: SNAT on Distributed Routers
+ +
+ For each configuration in the OVN Northbound database that asks
+ to change the source IP address of a packet from an IP address of
+ A or to change the source IP address of a packet that
+ belongs to network A to B, a flow matches
+ ip && ip4.src == A &&
+ outport == GW
, where GW is the
+ logical router gateway port, with an action
+ ct_snat(B);
. The priority of the flow
+ is calculated based on the mask of A, with matches
+ having larger masks getting higher priorities.
+
+ If the NAT rule cannot be handled in a distributed manner, then
+ the flow above is only programmed on the
+ redirect-chassis
.
+
+ If the NAT rule can be handled in a distributed manner, then
+ there is an additional action
+ eth.src = EA;
, where EA
+ is the ethernet address associated with the IP address
+ A in the NAT rule. This allows upstream MAC
+ learning to point to the correct chassis.
+
1
has actions
+ next;
.
+
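The SNAT priority rule above can be worked through numerically. The patch computes it as `count_1bits(ntohl(mask)) + 1`; the sketch below uses a local bit counter as a stand-in for OVS's `count_1bits()` and assumes the mask is already in host byte order. Both function names here are hypothetical.

```c
#include <stdint.h>

/* Stand-in for OVS's count_1bits(): counts the 1-bits in 'x'. */
static unsigned int
popcount32(uint32_t x)
{
    unsigned int n = 0;
    while (x) {
        x &= x - 1;             /* Clear the lowest set bit. */
        n++;
    }
    return n;
}

/* Hypothetical helper: priority of the egress SNAT flow for a logical IP
 * with the given netmask (host byte order).  Longer masks give higher
 * priorities, so the most specific NAT rule wins. */
unsigned int
snat_flow_priority(uint32_t mask_host_order)
{
    return popcount32(mask_host_order) + 1;
}
```

Under these assumptions a /24 rule (mask 0xffffff00) gets priority 25 and a host rule (mask 0xffffffff) gets priority 33, both above the priority-0 default flow.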
+ This table applies to distributed logical routers where one of
+ the logical router ports specifies a redirect-chassis
.
+
+ Earlier in the ingress pipeline, some east-west traffic was
+ redirected to the chassisredirect
port, based on
+ flows in the UNSNAT
and DNAT
ingress
+ tables setting the REGBIT_NAT_REDIRECT
flag, which
+ then triggered a match to a flow in the
+ Gateway Redirect
ingress table. The intention was
+ not to actually send traffic out the distributed gateway port
+ instance on the redirect-chassis
. This traffic was
+ sent to the distributed gateway port instance in order for DNAT
+ and/or SNAT processing to be applied.
+
+ While UNDNAT and SNAT processing have already occurred by this
+ point, this traffic needs to be forced through egress loopback on
+ this distributed gateway port instance, in order for UNSNAT and
+ DNAT processing to be applied, and also for IP routing and ARP
+ resolution after all of the NAT processing, so that the packet can
+ be forwarded to the destination.
+
+
+
+ This table has the following flows:
+
+ +ip4.dst == E &&
+ outport == GW
, where E is the
+ external IP address specified in the NAT rule, and GW
+ is the logical router distributed gateway port, with actions
+ flags.force_egress_loopback = 1; next;
.
+ 1
has actions
+ next;
.
+ Packets that reach this table are ready for delivery. It contains
diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
index 59fd02e..dbf5af7 100644
--- a/ovn/northd/ovn-northd.c
+++ b/ovn/northd/ovn-northd.c
@@ -136,8 +136,10 @@ enum ovn_stage {
     PIPELINE_STAGE(ROUTER, IN, ARP_REQUEST, 8, "lr_in_arp_request") \
     \
     /* Logical router egress stages. */ \
-    PIPELINE_STAGE(ROUTER, OUT, SNAT, 0, "lr_out_snat") \
-    PIPELINE_STAGE(ROUTER, OUT, DELIVERY, 1, "lr_out_delivery")
+    PIPELINE_STAGE(ROUTER, OUT, UNDNAT, 0, "lr_out_undnat") \
+    PIPELINE_STAGE(ROUTER, OUT, SNAT, 1, "lr_out_snat") \
+    PIPELINE_STAGE(ROUTER, OUT, EGR_LOOP, 2, "lr_out_egr_loop") \
+    PIPELINE_STAGE(ROUTER, OUT, DELIVERY, 3, "lr_out_delivery")
 
 #define PIPELINE_STAGE(DP_TYPE, PIPELINE, STAGE, TABLE, NAME) \
     S_##DP_TYPE##_##PIPELINE##_##STAGE \
@@ -152,11 +154,15 @@ enum ovn_stage {
  * priority to determine the ACL's logical flow priority. */
 #define OVN_ACL_PRI_OFFSET 1000
 
+/* Register definitions specific to switches. */
 #define REGBIT_CONNTRACK_DEFRAG "reg0[0]"
 #define REGBIT_CONNTRACK_COMMIT "reg0[1]"
 #define REGBIT_CONNTRACK_NAT "reg0[2]"
 #define REGBIT_DHCP_OPTS_RESULT "reg0[3]"
 
+/* Register definitions for switches and routers. */
+#define REGBIT_NAT_REDIRECT "reg9[0]"
+
 /* Returns an "enum ovn_stage" built from the arguments. */
 static enum ovn_stage
 ovn_stage_build(enum ovn_datapath_type dp_type, enum ovn_pipeline pipeline,
@@ -3236,6 +3242,33 @@ build_lswitch_flows(struct hmap *datapaths, struct hmap *ports,
                 ds_put_format(&actions, "outport = %s; output;", op->json_key);
                 ovn_lflow_add(lflows, op->od, S_SWITCH_IN_L2_LKUP, 50,
                               ds_cstr(&match), ds_cstr(&actions));
+
+                /* Add ethernet addresses specified in NAT rules on
+                 * distributed logical routers. */
+                if (op->peer->od->l3dgw_port
+                    && op->peer == op->peer->od->l3dgw_port) {
+                    for (int i = 0; i < op->peer->od->nbr->n_nat; i++) {
+                        const struct nbrec_nat *nat
+                            = op->peer->od->nbr->nat[i];
+                        if (!strcmp(nat->type, "dnat_and_snat")
+                            && nat->logical_port && nat->external_mac
+                            && eth_addr_from_string(nat->external_mac, &mac)) {
+
+                            ds_clear(&match);
+                            ds_put_format(&match, "eth.dst == "ETH_ADDR_FMT
+                                          " && is_chassis_resident(\"%s\")",
+                                          ETH_ADDR_ARGS(mac),
+                                          nat->logical_port);
+
+                            ds_clear(&actions);
+                            ds_put_format(&actions, "outport = %s; output;",
+                                          op->json_key);
+                            ovn_lflow_add(lflows, op->od, S_SWITCH_IN_L2_LKUP,
+                                          50, ds_cstr(&match),
+                                          ds_cstr(&actions));
+                        }
+                    }
+                }
             } else {
                 static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 1);
 
@@ -3938,17 +3971,56 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
             ds_clear(&actions);
             ds_put_format(&actions,
                 "eth.dst = eth.src; "
-                "eth.src = %s; "
                 "arp.op = 2; /* ARP reply */ "
-                "arp.tha = arp.sha; "
-                "arp.sha = %s; "
+                "arp.tha = arp.sha; ");
+
+            if (op->od->l3dgw_port && op == op->od->l3dgw_port) {
+                struct eth_addr mac;
+                if (nat->external_mac &&
+                    eth_addr_from_string(nat->external_mac, &mac)
+                    && nat->logical_port) {
+                    /* distributed NAT case, use nat->external_mac */
+                    ds_put_format(&actions,
+                                  "eth.src = "ETH_ADDR_FMT"; "
+                                  "arp.sha = "ETH_ADDR_FMT"; ",
+                                  ETH_ADDR_ARGS(mac),
+                                  ETH_ADDR_ARGS(mac));
+                    /* Traffic with eth.src = nat->external_mac should only be
+                     * sent from the chassis where nat->logical_port is
+                     * resident, so that upstream MAC learning points to the
+                     * correct chassis. Also need to avoid generation of
+                     * multiple ARP responses from different chassis. */
+                    ds_put_format(&match, " && is_chassis_resident(\"%s\")",
+                                  nat->logical_port);
+                } else {
+                    ds_put_format(&actions,
+                                  "eth.src = %s; "
+                                  "arp.sha = %s; ",
+                                  op->lrp_networks.ea_s,
+                                  op->lrp_networks.ea_s);
+                    /* Traffic with eth.src = l3dgw_port->lrp_networks.ea_s
+                     * should only be sent from the "redirect-chassis", so that
+                     * upstream MAC learning points to the "redirect-chassis".
+                     * Also need to avoid generation of multiple ARP responses
+                     * from different chassis. */
+                    if (op->od->l3redirect_port) {
+                        ds_put_format(&match, " && is_chassis_resident(%s)",
+                                      op->od->l3redirect_port->json_key);
+                    }
+                }
+            } else {
+                ds_put_format(&actions,
+                              "eth.src = %s; "
+                              "arp.sha = %s; ",
+                              op->lrp_networks.ea_s,
+                              op->lrp_networks.ea_s);
+            }
+            ds_put_format(&actions,
                 "arp.tpa = arp.spa; "
                 "arp.spa = "IP_FMT"; "
                 "outport = %s; "
                 "flags.loopback = 1; "
                 "output;",
-                op->lrp_networks.ea_s,
-                op->lrp_networks.ea_s,
                 IP_ARGS(ip),
                 op->json_key);
             ovn_lflow_add(lflows, op->od, S_ROUTER_IN_IP_INPUT, 90,
@@ -4077,7 +4149,7 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
         }
     }
 
-    /* NAT, Defrag and load balancing in Gateway routers. */
+    /* NAT, Defrag and load balancing. */
     HMAP_FOR_EACH (od, key_node, datapaths) {
         if (!od->nbr) {
             continue;
@@ -4088,10 +4160,13 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
         ovn_lflow_add(lflows, od, S_ROUTER_IN_UNSNAT, 0, "1", "next;");
         ovn_lflow_add(lflows, od, S_ROUTER_OUT_SNAT, 0, "1", "next;");
         ovn_lflow_add(lflows, od, S_ROUTER_IN_DNAT, 0, "1", "next;");
+        ovn_lflow_add(lflows, od, S_ROUTER_OUT_UNDNAT, 0, "1", "next;");
+        ovn_lflow_add(lflows, od, S_ROUTER_OUT_EGR_LOOP, 0, "1", "next;");
 
-        /* NAT rules, packet defrag and load balancing are only valid on
-         * Gateway routers. */
-        if (!smap_get(&od->nbr->options, "chassis")) {
+        /* NAT rules are only valid on Gateway routers and routers with
+         * l3dgw_port (router has a port with "redirect-chassis"
+         * specified). */
+        if (!smap_get(&od->nbr->options, "chassis") && !od->l3dgw_port) {
             continue;
         }
 
@@ -4141,6 +4216,22 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
                 }
             }
 
+            /* For distributed router NAT, determine whether this NAT rule
+             * satisfies the conditions for distributed NAT processing. */
+            bool distributed = false;
+            struct eth_addr mac;
+            if (od->l3dgw_port && !strcmp(nat->type, "dnat_and_snat") &&
+                nat->logical_port && nat->external_mac) {
+                if (eth_addr_from_string(nat->external_mac, &mac)) {
+                    distributed = true;
+                } else {
+                    static struct vlog_rate_limit rl =
+                        VLOG_RATE_LIMIT_INIT(5, 1);
+                    VLOG_WARN_RL(&rl, "bad mac %s for dnat in router "
+                        ""UUID_FMT"", nat->external_mac, UUID_ARGS(&od->key));
+                }
+            }
+
             /* Ingress UNSNAT table: It is for already established connections'
              * reverse traffic. i.e., SNAT has already been done in egress
              * pipeline and now the packet has entered the ingress pipeline as
@@ -4152,10 +4243,41 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
              * egress pipeline. */
             if (!strcmp(nat->type, "snat")
                 || !strcmp(nat->type, "dnat_and_snat")) {
-                ds_clear(&match);
-                ds_put_format(&match, "ip && ip4.dst == %s", nat->external_ip);
-                ovn_lflow_add(lflows, od, S_ROUTER_IN_UNSNAT, 90,
-                              ds_cstr(&match), "ct_snat; next;");
+                if (!od->l3dgw_port) {
+                    /* Gateway router. */
+                    ds_clear(&match);
+                    ds_put_format(&match, "ip && ip4.dst == %s",
+                                  nat->external_ip);
+                    ovn_lflow_add(lflows, od, S_ROUTER_IN_UNSNAT, 90,
+                                  ds_cstr(&match), "ct_snat; next;");
+                } else {
+                    /* Distributed router. */
+
+                    /* Traffic received on l3dgw_port is subject to NAT. */
+                    ds_clear(&match);
+                    ds_put_format(&match, "ip && ip4.dst == %s"
+                                  " && inport == %s",
+                                  nat->external_ip,
+                                  od->l3dgw_port->json_key);
+                    if (!distributed && od->l3redirect_port) {
+                        /* Flows for NAT rules that are centralized are only
+                         * programmed on the "redirect-chassis". */
+                        ds_put_format(&match, " && is_chassis_resident(%s)",
+                                      od->l3redirect_port->json_key);
+                    }
+                    ovn_lflow_add(lflows, od, S_ROUTER_IN_UNSNAT, 100,
+                                  ds_cstr(&match), "ct_snat;");
+
+                    /* Traffic received on other router ports must be
+                     * redirected to the central instance of the l3dgw_port
+                     * with egress loopback for NAT processing. */
+                    ds_clear(&match);
+                    ds_put_format(&match, "ip && ip4.dst == %s",
+                                  nat->external_ip);
+                    ovn_lflow_add(lflows, od, S_ROUTER_IN_UNSNAT, 50,
+                                  ds_cstr(&match),
+                                  REGBIT_NAT_REDIRECT" = 1; next;");
+                }
             }
 
             /* Ingress DNAT table: Packets enter the pipeline with destination
@@ -4163,21 +4285,87 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
              * to a logical IP address. */
             if (!strcmp(nat->type, "dnat")
                 || !strcmp(nat->type, "dnat_and_snat")) {
-                /* Packet when it goes from the initiator to destination.
-                 * We need to zero the inport because the router can
-                 * send the packet back through the same interface. */
+                if (!od->l3dgw_port) {
+                    /* Gateway router. */
+                    /* Packet when it goes from the initiator to destination.
+                     * We need to set flags.loopback because the router can
+                     * send the packet back through the same interface. */
+                    ds_clear(&match);
+                    ds_put_format(&match, "ip && ip4.dst == %s",
+                                  nat->external_ip);
+                    ds_clear(&actions);
+                    if (dnat_force_snat_ip) {
+                        /* Indicate to the future tables that a DNAT has taken
+                         * place and a force SNAT needs to be done in the
+                         * Egress SNAT table. */
+                        ds_put_format(&actions,
+                                      "flags.force_snat_for_dnat = 1; ");
+                    }
+                    ds_put_format(&actions, "flags.loopback = 1; ct_dnat(%s);",
+                                  nat->logical_ip);
+                    ovn_lflow_add(lflows, od, S_ROUTER_IN_DNAT, 100,
+                                  ds_cstr(&match), ds_cstr(&actions));
+                } else {
+                    /* Distributed router. */
+
+                    /* Traffic received on l3dgw_port is subject to NAT. */
+                    ds_clear(&match);
+                    ds_put_format(&match, "ip && ip4.dst == %s"
+                                  " && inport == %s",
+                                  nat->external_ip,
+                                  od->l3dgw_port->json_key);
+                    if (!distributed && od->l3redirect_port) {
+                        /* Flows for NAT rules that are centralized are only
+                         * programmed on the "redirect-chassis". */
+                        ds_put_format(&match, " && is_chassis_resident(%s)",
+                                      od->l3redirect_port->json_key);
+                    }
+                    ds_clear(&actions);
+                    ds_put_format(&actions,"ct_dnat(%s);",
+                                  nat->logical_ip);
+                    ovn_lflow_add(lflows, od, S_ROUTER_IN_DNAT, 100,
+                                  ds_cstr(&match), ds_cstr(&actions));
+
+                    /* Traffic received on other router ports must be
+                     * redirected to the central instance of the l3dgw_port
+                     * with egress loopback for NAT processing. */
+                    ds_clear(&match);
+                    ds_put_format(&match, "ip && ip4.dst == %s",
+                                  nat->external_ip);
+                    ovn_lflow_add(lflows, od, S_ROUTER_IN_DNAT, 50,
+                                  ds_cstr(&match),
+                                  REGBIT_NAT_REDIRECT" = 1; next;");
+                }
+            }
+
+            /* Egress UNDNAT table: It is for already established connections'
+             * reverse traffic. i.e., DNAT has already been done in ingress
+             * pipeline and now the packet has entered the egress pipeline as
+             * part of a reply. We undo the DNAT here.
+             *
+             * Note that this only applies for NAT on a distributed router.
+             * Undo DNAT on a gateway router is done in the ingress DNAT
+             * pipeline stage. */
+            if (od->l3dgw_port && (!strcmp(nat->type, "dnat")
+                || !strcmp(nat->type, "dnat_and_snat"))) {
                 ds_clear(&match);
-                ds_put_format(&match, "ip && ip4.dst == %s", nat->external_ip);
+                ds_put_format(&match, "ip && ip4.src == %s"
+                              " && outport == %s",
+                              nat->logical_ip,
+                              od->l3dgw_port->json_key);
+                if (!distributed && od->l3redirect_port) {
+                    /* Flows for NAT rules that are centralized are only
+                     * programmed on the "redirect-chassis". */
+                    ds_put_format(&match, " && is_chassis_resident(%s)",
+                                  od->l3redirect_port->json_key);
+                }
                 ds_clear(&actions);
-                if (dnat_force_snat_ip) {
-                    /* Indicate to the future tables that a DNAT has taken
-                     * place and a force SNAT needs to be done in the Egress
-                     * SNAT table. */
-                    ds_put_format(&actions, "flags.force_snat_for_dnat = 1; ");
+                if (distributed) {
+                    ds_put_format(&actions, "eth.src = "ETH_ADDR_FMT"; ",
+                                  ETH_ADDR_ARGS(mac));
                 }
-                ds_put_format(&actions, "flags.loopback = 1; ct_dnat(%s);",
-                              nat->logical_ip);
-                ovn_lflow_add(lflows, od, S_ROUTER_IN_DNAT, 100,
+                ds_put_format(&actions, "ct_dnat;");
+                ovn_lflow_add(lflows, od, S_ROUTER_OUT_UNDNAT, 100,
                               ds_cstr(&match), ds_cstr(&actions));
             }
 
@@ -4186,22 +4374,97 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
              * address. */
             if (!strcmp(nat->type, "snat")
                 || !strcmp(nat->type, "dnat_and_snat")) {
+                if (!od->l3dgw_port) {
+                    /* Gateway router. */
+                    ds_clear(&match);
+                    ds_put_format(&match, "ip && ip4.src == %s",
+                                  nat->logical_ip);
+                    ds_clear(&actions);
+                    ds_put_format(&actions, "ct_snat(%s);", nat->external_ip);
+
+                    /* The priority here is calculated such that the
+                     * nat->logical_ip with the longest mask gets a higher
+                     * priority. */
+                    ovn_lflow_add(lflows, od, S_ROUTER_OUT_SNAT,
+                                  count_1bits(ntohl(mask)) + 1,
+                                  ds_cstr(&match), ds_cstr(&actions));
+                } else {
+                    /* Distributed router. */
+                    ds_clear(&match);
+                    ds_put_format(&match, "ip && ip4.src == %s"
+                                  " && outport == %s",
+                                  nat->logical_ip,
+                                  od->l3dgw_port->json_key);
+                    if (!distributed && od->l3redirect_port) {
+                        /* Flows for NAT rules that are centralized are only
+                         * programmed on the "redirect-chassis". */
+                        ds_put_format(&match, " && is_chassis_resident(%s)",
+                                      od->l3redirect_port->json_key);
+                    }
+                    ds_clear(&actions);
+                    if (distributed) {
+                        ds_put_format(&actions, "eth.src = "ETH_ADDR_FMT"; ",
+                                      ETH_ADDR_ARGS(mac));
+                    }
+                    ds_put_format(&actions, "ct_snat(%s);", nat->external_ip);
+
+                    /* The priority here is calculated such that the
+                     * nat->logical_ip with the longest mask gets a higher
+                     * priority. */
+                    ovn_lflow_add(lflows, od, S_ROUTER_OUT_SNAT,
+                                  count_1bits(ntohl(mask)) + 1,
+                                  ds_cstr(&match), ds_cstr(&actions));
+                }
+            }
+
+            /* Logical router ingress table 0:
+             * For NAT on a distributed router, add rules allowing
+             * ingress traffic with eth.dst matching nat->external_mac
+             * on the l3dgw_port instance where nat->logical_port is
+             * resident. */
+            if (distributed) {
                 ds_clear(&match);
-                ds_put_format(&match, "ip && ip4.src == %s", nat->logical_ip);
-                ds_clear(&actions);
-                ds_put_format(&actions, "ct_snat(%s);", nat->external_ip);
+                ds_put_format(&match,
+                              "eth.dst == "ETH_ADDR_FMT" && inport == %s"
+                              " && is_chassis_resident(\"%s\")",
+                              ETH_ADDR_ARGS(mac),
+                              od->l3dgw_port->json_key,
+                              nat->logical_port);
+                ovn_lflow_add(lflows, od, S_ROUTER_IN_ADMISSION, 50,
+                              ds_cstr(&match), "next;");
+            }
 
-                /* The priority here is calculated such that the
-                 * nat->logical_ip with the longest mask gets a higher
-                 * priority. */
-                ovn_lflow_add(lflows, od, S_ROUTER_OUT_SNAT,
-                              count_1bits(ntohl(mask)) + 1,
-                              ds_cstr(&match), ds_cstr(&actions));
+            /* Ingress Gateway Redirect Table: For NAT on a distributed
+             * router, add flows that are specific to a NAT rule. These
+             * flows indicate the presence of an applicable NAT rule that
+             * can be applied in a distributed manner. */
+            if (distributed) {
+                ds_clear(&match);
+                ds_put_format(&match, "ip4.src == %s && outport == %s",
+                              nat->logical_ip,
+                              od->l3dgw_port->json_key);
+                ovn_lflow_add(lflows, od, S_ROUTER_IN_GW_REDIRECT, 100,
+                              ds_cstr(&match), "next;");
+            }
+
+            /* Egress Loopback table: For NAT on a distributed router.
+             * If packets in the egress pipeline on the distributed
+             * gateway port have ip.dst matching a NAT external IP, then
+             * set the flag to force egress loopback. */
+            if (od->l3dgw_port) {
+                /* Distributed router. */
+                ds_clear(&match);
+                ds_put_format(&match, "ip4.dst == %s && outport == %s",
+                              nat->external_ip,
+                              od->l3dgw_port->json_key);
+                ovn_lflow_add(lflows, od, S_ROUTER_OUT_EGR_LOOP, 100,
+                              ds_cstr(&match),
+                              "flags.force_egress_loopback = 1; next;");
             }
         }
 
         /* Handle force SNAT options set in the gateway router. */
-        if (dnat_force_snat_ip) {
+        if (dnat_force_snat_ip && !od->l3dgw_port) {
             /* If a packet with destination IP address as that of the
              * gateway router (as set in options:dnat_force_snat_ip) is seen,
              * UNSNAT it. */
@@ -4220,7 +4483,7 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
             ovn_lflow_add(lflows, od, S_ROUTER_OUT_SNAT, 100,
                           ds_cstr(&match), ds_cstr(&actions));
         }
-        if (lb_force_snat_ip) {
+        if (lb_force_snat_ip && !od->l3dgw_port) {
             /* If a packet with destination IP address as that of the
              * gateway router (as set in options:lb_force_snat_ip) is seen,
              * UNSNAT it. */
@@ -4239,22 +4502,61 @@ build_lrouter_flows(struct hmap *datapaths, struct hmap *ports,
                           ds_cstr(&match), ds_cstr(&actions));
         }
 
-        /* Re-circulate every packet through the DNAT zone.
-         * This helps with two things.
-         *
-         * 1. Any packet that needs to be unDNATed in the reverse
-         * direction gets unDNATed. Ideally this could be done in
-         * the egress pipeline. But since the gateway router
-         * does not have any feature that depends on the source
-         * ip address being external IP address for IP routing,
-         * we can do it here, saving a future re-circulation.
-         *
-         * 2. Any packet that was sent through SNAT zone in the
-         * previous table automatically gets re-circulated to get
-         * back the new destination IP address that is needed for
-         * routing in the openflow pipeline. */
-        ovn_lflow_add(lflows, od, S_ROUTER_IN_DNAT, 50,
-                      "ip", "flags.loopback = 1; ct_dnat;");
+        if (!od->l3dgw_port) {
+            /* For gateway router, re-circulate every packet through
+             * the DNAT zone. This helps with two things.
+             *
+             * 1. Any packet that needs to be unDNATed in the reverse
+             * direction gets unDNATed. Ideally this could be done in
+             * the egress pipeline. But since the gateway router
+             * does not have any feature that depends on the source
+             * ip address being external IP address for IP routing,
+             * we can do it here, saving a future re-circulation.
+             *
+             * 2. Any packet that was sent through SNAT zone in the
+             * previous table automatically gets re-circulated to get
+             * back the new destination IP address that is needed for
+             * routing in the openflow pipeline. */
+            ovn_lflow_add(lflows, od, S_ROUTER_IN_DNAT, 50,
+                          "ip", "flags.loopback = 1; ct_dnat;");
+        } else {
+            /* For NAT on a distributed router, add flows to Ingress
+             * IP Routing table, Ingress ARP Resolution table, and
+             * Ingress Gateway Redirect Table that are not specific to a
+             * NAT rule. */
+
+            /* The highest priority IN_IP_ROUTING rule matches packets
+             * with REGBIT_NAT_REDIRECT (set in DNAT or UNSNAT stages),
+             * with action "ip.ttl--; next;". The IN_GW_REDIRECT table
+             * will take care of setting the outport. */
+            ovn_lflow_add(lflows, od, S_ROUTER_IN_IP_ROUTING, 300,
+                          REGBIT_NAT_REDIRECT" == 1", "ip.ttl--; next;");
+
+            /* The highest priority IN_ARP_RESOLVE rule matches packets
+             * with REGBIT_NAT_REDIRECT (set in DNAT or UNSNAT stages),
+             * then sets eth.dst to the distributed gateway port's
+             * ethernet address. */
+            ds_clear(&actions);
+            ds_put_format(&actions, "eth.dst = %s; next;",
+                          od->l3dgw_port->lrp_networks.ea_s);
+            ovn_lflow_add(lflows, od, S_ROUTER_IN_ARP_RESOLVE, 200,
+                          REGBIT_NAT_REDIRECT" == 1", ds_cstr(&actions));
+
+            /* The highest priority IN_GW_REDIRECT rule redirects packets
+             * with REGBIT_NAT_REDIRECT (set in DNAT or UNSNAT stages) to
+             * the central instance of the l3dgw_port for NAT processing. */
+            ds_clear(&actions);
+            ds_put_format(&actions, "outport = %s; next;",
+                          od->l3redirect_port->json_key);
+            ovn_lflow_add(lflows, od, S_ROUTER_IN_GW_REDIRECT, 200,
+                          REGBIT_NAT_REDIRECT" == 1", ds_cstr(&actions));
+        }
+
+        /* Load balancing and packet defrag are only valid on
+         * Gateway routers. */
+        if (!smap_get(&od->nbr->options, "chassis")) {
+            continue;
+        }
 
         /* A set to hold all ips that need defragmentation and tracking. */
         struct sset all_ips = SSET_INITIALIZER(&all_ips);
diff --git a/ovn/ovn-nb.ovsschema b/ovn/ovn-nb.ovsschema
index 1c8319f..dd0ac3d 100644
--- a/ovn/ovn-nb.ovsschema
+++ b/ovn/ovn-nb.ovsschema
@@ -1,7 +1,7 @@
 {
     "name": "OVN_Northbound",
     "version": "5.5.0",
-    "cksum": "379266191 13990",
+    "cksum": "2099428463 14236",
     "tables": {
         "NB_Global": {
             "columns": {
@@ -220,7 +220,11 @@
         "NAT": {
             "columns": {
                 "external_ip": {"type": "string"},
+                "external_mac": {"type": {"key": "string",
+                                          "min": 0, "max": 1}},
                 "logical_ip": {"type": "string"},
+                "logical_port": {"type": {"key": "string",
+                                          "min": 0, "max": 1}},
                 "type": {"type": {"key": {"type": "string",
                                            "enum": ["set", ["dnat",
                                                              "snat",
diff --git a/ovn/ovn-nb.xml b/ovn/ovn-nb.xml
index 0e64ba8..f43fda5 100644
--- a/ovn/ovn-nb.xml
+++ b/ovn/ovn-nb.xml
@@ -533,6 +533,15 @@
 switch's destination lookup, and also for the logical switch
 to generate ARP and ND replies.
+ +
+ If the connected logical router port has a
+ redirect-chassis
specified and the logical router
+ has rules specified in
+ with , then those
+ addresses are also used to populate the switch's destination
+ lookup.
+
redirect-chassis
specified.