Message ID | 20240304105542.349690-2-martin.kalcok@canonical.com |
---|---|
State | Superseded |
Headers | show |
Series | [ovs-dev,1/2] actions.c/h: Enable specifying zone for ct_commit. | expand |
Context | Check | Description |
---|---|---|
ovsrobot/apply-robot | success | apply and check: success |
ovsrobot/github-robot-_Build_and_Test | success | github build: passed |
ovsrobot/github-robot-_ovn-kubernetes | success | github build: passed |
On Mon, Mar 4, 2024 at 11:56 AM Martin Kalcok <martin.kalcok@canonical.com> wrote: > This change fixes bug that breaks ability of machines from external > networks to communicate with machines in SNATed networks (specifically > when using a Distributed router). > > Currently when a machine (S1) on an external network tries to talk > over TCP with a machine (A1) in a network that has enabled SNAT, the > connection is established successfully. However after the three-way > handshake, any packets that come from the A1 machine will have their > source address translated by the Distributed router, breaking the > communication. > > Existing rule in `build_lrouter_out_snat_flow` that decides which > packets should be SNATed already tries to avoid SNATing packets in > reply direction with `(!ct.trk || !ct.rpl)`. However, previous stages > in the distributed LR egress pipeline do not initiate the CT state. > > Additionally we need to commit new connections that originate from > external networks into CT, so that the packets in the reply direction > can be properly identified. > > Rationale: > > In my original RFC [0], there were questions about the motivation for > fixing this issue. I'll try to summarize why I think this is a bug > that should be fixed. > > 1. Current implementation for Distributed router already tries to > avoid SNATing packets in the reply direction, it's just missing > initialized CT states to make proper decisions. > > 2. This same scenario works with Gateway Router. I tested with > following setup: > > foo -- R1 -- join - R3 -- alice > | > bar ----------R2 > > R1 is a Distributed router with SNAT for foo. R2 is a Gateway > router with SNAT for bar. R3 is a Gateway router with no SNAT. > Using 'alice1' as a client I was able to talk over TCP with > 'bar1' but connection with 'foo1' failed. > > 3. Regarding security and "leaking" of internal IPs. Reading through > RFC 4787 [1], 5382 [2] and their update in 7857 [3], the > specifications do not seem to mandate that SNAT implementations > must filter incoming traffic destined directly to the internal > network. Sections like "5. Filtering Behavior" in 4787 and > "4.3 Externally Initiated Connections" in 5382 describe only > behavior for traffic destined to external IP/Port associated > with NAT on the device that performs NAT. > > Besides, with the current implementation, it's already possible > to scan the internal network with pings and TCP syn scanning. > > If an additional security is needed, ACLs can be used to filter > out incomming traffic from external networks. > > 4. We do have customers/clouds that depend on this functionality. > This is a scenario that used to work in Openstack with ML2/OVS > and migrating those clouds to ML2/OVN would break it. > > [0] > https://mail.openvswitch.org/pipermail/ovs-dev/2024-February/411670.html > [1]https://datatracker.ietf.org/doc/html/rfc4787 > [2]https://datatracker.ietf.org/doc/html/rfc5382 > [3]https://datatracker.ietf.org/doc/html/rfc7857 > > Signed-off-by: Martin Kalcok <martin.kalcok@canonical.com> > --- > Hi Martin, I have a few comments down below. > northd/northd.c | 82 ++++++++++++++++++++++++++++++++++++----- > northd/ovn-northd.8.xml | 29 +++++++++++++++ > tests/ovn-northd.at | 15 +++++++- > tests/system-ovn.at | 69 ++++++++++++++++++++++++++++++++++ > 4 files changed, 185 insertions(+), 10 deletions(-) > > diff --git a/northd/northd.c b/northd/northd.c > index 2c3560ce2..4b79b357c 100644 > --- a/northd/northd.c > +++ b/northd/northd.c > @@ -14437,21 +14437,28 @@ build_lrouter_out_is_dnat_local(struct > lflow_table *lflows, > } > > static void > -build_lrouter_out_snat_match(struct lflow_table *lflows, > - const struct ovn_datapath *od, > - const struct nbrec_nat *nat, struct ds > *match, > - bool distributed_nat, int cidr_bits, bool > is_v6, > - struct ovn_port *l3dgw_port, > - struct lflow_ref *lflow_ref) > +build_lrouter_out_snat_direction_match__(struct lflow_table *lflows, > + const struct ovn_datapath *od, > + const struct nbrec_nat *nat, > + struct ds *match, > + bool distributed_nat, int > cidr_bits, > + bool is_v6, > + struct ovn_port *l3dgw_port, > + struct lflow_ref *lflow_ref, > + bool is_reverse) > { > ds_clear(match); > > - ds_put_format(match, "ip && ip%c.src == %s", is_v6 ? '6' : '4', > + ds_put_format(match, "ip && ip%c.%s == %s", > + is_v6 ? '6' : '4', > + is_reverse ? "dst" : "src", > nat->logical_ip); > > if (!od->is_gw_router) { > /* Distributed router. */ > - ds_put_format(match, " && outport == %s", l3dgw_port->json_key); > + ds_put_format(match, " && %s == %s", > + is_reverse ? "inport" : "outport", > + l3dgw_port->json_key); > if (od->n_l3dgw_ports) { > ds_put_format(match, " && is_chassis_resident(\"%s\")", > distributed_nat > @@ -14462,11 +14469,37 @@ build_lrouter_out_snat_match(struct lflow_table > *lflows, > > if (nat->allowed_ext_ips || nat->exempted_ext_ips) { > lrouter_nat_add_ext_ip_match(od, lflows, match, nat, > - is_v6, false, cidr_bits, > + is_v6, is_reverse, cidr_bits, > lflow_ref); > } > } > > +static void > +build_lrouter_out_snat_match(struct lflow_table *lflows, > + const struct ovn_datapath *od, > + const struct nbrec_nat *nat, struct ds > *match, > + bool distributed_nat, int cidr_bits, bool > is_v6, > + struct ovn_port *l3dgw_port, > + struct lflow_ref *lflow_ref) > +{ > + build_lrouter_out_snat_direction_match__(lflows, od, nat, match, > + distributed_nat, cidr_bits, > is_v6, > + l3dgw_port, lflow_ref, > false); > +} > + > +static void > +build_lrouter_out_snat_reverse_match(struct lflow_table *lflows, > + const struct ovn_datapath *od, > + const struct nbrec_nat *nat, struct ds > *match, > + bool distributed_nat, int cidr_bits, bool > is_v6, > + struct ovn_port *l3dgw_port, > + struct lflow_ref *lflow_ref) > +{ > + build_lrouter_out_snat_direction_match__(lflows, od, nat, match, > + distributed_nat, cidr_bits, > is_v6, > + l3dgw_port, lflow_ref, true); > +} > + > nit: I don't think we need two new functions when only one argument was added. Considering the old function was used at 3 places only anyway. > static void > build_lrouter_out_snat_stateless_flow(struct lflow_table *lflows, > const struct ovn_datapath *od, > @@ -14591,6 +14624,7 @@ build_lrouter_out_snat_flow(struct lflow_table > *lflows, > return; > } > > + struct ds match_all_from_snat = DS_EMPTY_INITIALIZER; > ds_clear(actions); > > /* The priority here is calculated such that the > @@ -14600,6 +14634,7 @@ build_lrouter_out_snat_flow(struct lflow_table > *lflows, > > build_lrouter_out_snat_match(lflows, od, nat, match, distributed_nat, > cidr_bits, is_v6, l3dgw_port, lflow_ref); > + ds_clone(&match_all_from_snat, match); > > if (!od->is_gw_router) { > /* Distributed router. */ > @@ -14624,6 +14659,35 @@ build_lrouter_out_snat_flow(struct lflow_table > *lflows, > priority, ds_cstr(match), > ds_cstr(actions), &nat->header_, > lflow_ref); > + > + /* For the SNAT networks, we need to make sure that connections are > + * properly tracked so we can decide whether to perform SNAT on > traffic > + * exiting the network. */ > + if (!strcmp(nat->type, "snat") && !od->is_gw_router) { > + /* For traffic that comes from SNAT network, initiate CT state > before > + * entering S_ROUTER_OUT_SNAT to allow matching on various CT > states. > + */ > + ovn_lflow_add(lflows, od, S_ROUTER_OUT_POST_UNDNAT, 70, > + ds_cstr(&match_all_from_snat), "ct_snat; ", > + lflow_ref); > + > + build_lrouter_out_snat_reverse_match(lflows, od, nat, match, > + distributed_nat, cidr_bits, > is_v6, > + l3dgw_port, lflow_ref); > + > + /* New traffic that goes into SNAT network is committed to CT to > avoid > + * SNAT-ing replies.*/ > + ovn_lflow_add(lflows, od, S_ROUTER_OUT_SNAT, priority, > + ds_cstr(match), "ct_snat;", > + lflow_ref); > + > + ds_put_cstr(match, "&& ct.new"); > + ovn_lflow_add(lflows, od, S_ROUTER_OUT_POST_SNAT, priority, > + ds_cstr(match), "ct_commit(snat); next; ", > + lflow_ref); > + } > + > + ds_destroy(&match_all_from_snat); > I suppose that the gw router works because we have ct_commit for new traffic [0]. I wonder if we could extend that with a check for the DGP case. > } > > static void > diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml > index 17b414144..ce6fd1270 100644 > --- a/northd/ovn-northd.8.xml > +++ b/northd/ovn-northd.8.xml > @@ -4869,6 +4869,13 @@ nd_ns { > > <p> > <ul> > + <li> > + A priority-70 logical flow is added that initiates CT state for > + traffic that is configured to be SNATed on Distributed routers. > + This allows the next table, <code>lr_out_snat</code>, to > + effectively match on various CT states. > + </li> > + > <li> > A priority-50 logical flow is added that commits any untracked > flows > from the previous table <code>lr_out_undnat</code> for Gateway > @@ -5061,6 +5068,18 @@ nd_ns { > > </li> > > + <li> > + An additional flow is added for traffic that goes in opposite > + direction (i.e. it enters a network with configured SNAT). Where > the > + flow above matched on <code>ip4.src == <var>A</var> && > outport > + == <var>GW</var></code>, this flow matches on <code> ip4.dst == > + <var>A</var> && inport == <var>GW</var></code>. A CT > state is > + initiated for this traffic so that the following table, <code> > + lr_out_post_snat</code>, can identify whether the traffic flow was > + initiated from the internal or external network. > + > + </li> > + > <li> > A priority-0 logical flow with match <code>1</code> has actions > <code>next;</code>. > @@ -5072,6 +5091,16 @@ nd_ns { > <p> > Packets reaching this table are processed according to the flows > below: > <ul> > + <li> > + Traffic that goes directly into a network configured with SNAT > on > + Distributed routers, and was initiated from an external network > (i.e. > + it matches <code>ct.new</code>), is committed to the SNAT CT > zone. > + This ensures that replies returning from the SNATed network do > not > + have their source address translated. For details about match > rules > + and priority see section "Egress Table 3: SNAT on Distributed > + Routers". > + </li> > + > <li> > A priority-0 logical flow that matches all packets not already > handled (match <code>1</code>) and action <code>next;</code>. > diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at > index 0732486f3..a7597a975 100644 > --- a/tests/ovn-northd.at > +++ b/tests/ovn-northd.at > @@ -1155,6 +1155,7 @@ AT_CAPTURE_FILE([crflows]) > AT_CHECK([grep -e "lr_out_snat" drflows | ovn_strip_lflows], [0], [dnl > table=??(lr_out_snat ), priority=0 , match=(1), action=(next;) > table=??(lr_out_snat ), priority=120 , match=(nd_ns), > action=(next;) > + table=??(lr_out_snat ), priority=161 , match=(ip && ip4.dst == > 50.0.0.11 && inport == "DR-S1" && is_chassis_resident("cr-DR-S1") && > ip4.src == $allowed_range), action=(ct_snat;) > table=??(lr_out_snat ), priority=161 , match=(ip && ip4.src == > 50.0.0.11 && outport == "DR-S1" && is_chassis_resident("cr-DR-S1") && > ip4.dst == $allowed_range && (!ct.trk || !ct.rpl)), > action=(ct_snat(172.16.1.1);) > ]) > > @@ -1185,6 +1186,7 @@ AT_CAPTURE_FILE([crflows2]) > AT_CHECK([grep -e "lr_out_snat" drflows2 | ovn_strip_lflows], [0], [dnl > table=??(lr_out_snat ), priority=0 , match=(1), action=(next;) > table=??(lr_out_snat ), priority=120 , match=(nd_ns), > action=(next;) > + table=??(lr_out_snat ), priority=161 , match=(ip && ip4.dst == > 50.0.0.11 && inport == "DR-S1" && is_chassis_resident("cr-DR-S1")), > action=(ct_snat;) > table=??(lr_out_snat ), priority=161 , match=(ip && ip4.src == > 50.0.0.11 && outport == "DR-S1" && is_chassis_resident("cr-DR-S1") && > (!ct.trk || !ct.rpl)), action=(ct_snat(172.16.1.1);) > table=??(lr_out_snat ), priority=163 , match=(ip && ip4.src == > 50.0.0.11 && outport == "DR-S1" && is_chassis_resident("cr-DR-S1") && > ip4.dst == $disallowed_range), action=(next;) > ]) > @@ -1279,7 +1281,7 @@ AT_CHECK([grep -e "lr_out_snat" crflows5 | > ovn_strip_lflows], [0], [dnl > table=??(lr_out_snat ), priority=33 , match=(ip && ip4.src == > 50.0.0.11 && ip4.dst == $allowed_range), action=(ip4.src=172.16.1.2; next;) > ]) > > -# Stateful FIP with DISALLOWED_IPs > +# Stateless FIP with DISALLOWED_IPs > ovn-nbctl lr-nat-del DR dnat_and_snat 172.16.1.2 > ovn-nbctl lr-nat-del CR dnat_and_snat 172.16.1.2 > > @@ -5590,12 +5592,16 @@ AT_CHECK([grep "lr_out_undnat" lr0flows | > ovn_strip_lflows], [0], [dnl > > AT_CHECK([grep "lr_out_post_undnat" lr0flows | ovn_strip_lflows], [0], > [dnl > table=??(lr_out_post_undnat ), priority=0 , match=(1), action=(next;) > + table=??(lr_out_post_undnat ), priority=70 , match=(ip && ip4.src == > 10.0.0.0/24 && outport == "lr0-public" && > is_chassis_resident("cr-lr0-public")), action=(ct_snat; ) > + table=??(lr_out_post_undnat ), priority=70 , match=(ip && ip4.src == > 10.0.0.10 && outport == "lr0-public" && > is_chassis_resident("cr-lr0-public")), action=(ct_snat; ) > ]) > > AT_CHECK([grep "lr_out_snat" lr0flows | ovn_strip_lflows], [0], [dnl > table=??(lr_out_snat ), priority=0 , match=(1), action=(next;) > table=??(lr_out_snat ), priority=120 , match=(nd_ns), > action=(next;) > + table=??(lr_out_snat ), priority=153 , match=(ip && ip4.dst == > 10.0.0.0/24 && inport == "lr0-public" && > is_chassis_resident("cr-lr0-public")), action=(ct_snat;) > table=??(lr_out_snat ), priority=153 , match=(ip && ip4.src == > 10.0.0.0/24 && outport == "lr0-public" && > is_chassis_resident("cr-lr0-public") && (!ct.trk || !ct.rpl)), > action=(ct_snat(172.168.0.10);) > + table=??(lr_out_snat ), priority=161 , match=(ip && ip4.dst == > 10.0.0.10 && inport == "lr0-public" && > is_chassis_resident("cr-lr0-public")), action=(ct_snat;) > table=??(lr_out_snat ), priority=161 , match=(ip && ip4.src == > 10.0.0.10 && outport == "lr0-public" && > is_chassis_resident("cr-lr0-public") && (!ct.trk || !ct.rpl)), > action=(ct_snat(172.168.0.30);) > table=??(lr_out_snat ), priority=161 , match=(ip && ip4.src == > 10.0.0.3 && outport == "lr0-public" && is_chassis_resident("cr-lr0-public") > && (!ct.trk || !ct.rpl)), action=(ct_snat(172.168.0.20);) > ]) > @@ -5738,12 +5744,16 @@ AT_CHECK([grep "lr_out_undnat" lr0flows | > ovn_strip_lflows], [0], [dnl > > AT_CHECK([grep "lr_out_post_undnat" lr0flows | ovn_strip_lflows], [0], > [dnl > table=??(lr_out_post_undnat ), priority=0 , match=(1), action=(next;) > + table=??(lr_out_post_undnat ), priority=70 , match=(ip && ip4.src == > 10.0.0.0/24 && outport == "lr0-public" && > is_chassis_resident("cr-lr0-public")), action=(ct_snat; ) > + table=??(lr_out_post_undnat ), priority=70 , match=(ip && ip4.src == > 10.0.0.10 && outport == "lr0-public" && > is_chassis_resident("cr-lr0-public")), action=(ct_snat; ) > ]) > > AT_CHECK([grep "lr_out_snat" lr0flows | ovn_strip_lflows], [0], [dnl > table=??(lr_out_snat ), priority=0 , match=(1), action=(next;) > table=??(lr_out_snat ), priority=120 , match=(nd_ns), > action=(next;) > + table=??(lr_out_snat ), priority=153 , match=(ip && ip4.dst == > 10.0.0.0/24 && inport == "lr0-public" && > is_chassis_resident("cr-lr0-public")), action=(ct_snat;) > table=??(lr_out_snat ), priority=153 , match=(ip && ip4.src == > 10.0.0.0/24 && outport == "lr0-public" && > is_chassis_resident("cr-lr0-public") && (!ct.trk || !ct.rpl)), > action=(ct_snat(172.168.0.10);) > + table=??(lr_out_snat ), priority=161 , match=(ip && ip4.dst == > 10.0.0.10 && inport == "lr0-public" && > is_chassis_resident("cr-lr0-public")), action=(ct_snat;) > table=??(lr_out_snat ), priority=161 , match=(ip && ip4.src == > 10.0.0.10 && outport == "lr0-public" && > is_chassis_resident("cr-lr0-public") && (!ct.trk || !ct.rpl)), > action=(ct_snat(172.168.0.30);) > table=??(lr_out_snat ), priority=161 , match=(ip && ip4.src == > 10.0.0.3 && outport == "lr0-public" && is_chassis_resident("cr-lr0-public") > && (!ct.trk || !ct.rpl)), action=(ct_snat(172.168.0.20);) > ]) > @@ -7643,6 +7653,9 @@ AT_CHECK([grep lr_in_unsnat lrflows | grep ct_snat | > ovn_strip_lflows], [0], [dn > ]) > > AT_CHECK([grep lr_out_snat lrflows | grep ct_snat | ovn_strip_lflows], > [0], [dnl > + table=??(lr_out_snat ), priority=161 , match=(ip && ip4.dst == > 20.0.0.10 && inport == "DR-S1" && is_chassis_resident("cr-DR-S1")), > action=(ct_snat;) > + table=??(lr_out_snat ), priority=161 , match=(ip && ip4.dst == > 20.0.0.10 && inport == "DR-S2" && is_chassis_resident("cr-DR-S2")), > action=(ct_snat;) > + table=??(lr_out_snat ), priority=161 , match=(ip && ip4.dst == > 20.0.0.10 && inport == "DR-S3" && is_chassis_resident("cr-DR-S3")), > action=(ct_snat;) > table=??(lr_out_snat ), priority=161 , match=(ip && ip4.src == > 20.0.0.10 && outport == "DR-S1" && is_chassis_resident("cr-DR-S1") && > (!ct.trk || !ct.rpl)), action=(ct_snat(172.16.1.10);) > table=??(lr_out_snat ), priority=161 , match=(ip && ip4.src == > 20.0.0.10 && outport == "DR-S2" && is_chassis_resident("cr-DR-S2") && > (!ct.trk || !ct.rpl)), action=(ct_snat(10.0.0.10);) > table=??(lr_out_snat ), priority=161 , match=(ip && ip4.src == > 20.0.0.10 && outport == "DR-S3" && is_chassis_resident("cr-DR-S3") && > (!ct.trk || !ct.rpl)), action=(ct_snat(192.168.0.10);) > diff --git a/tests/system-ovn.at b/tests/system-ovn.at > index c22c7882f..80e9b7d0b 100644 > --- a/tests/system-ovn.at > +++ b/tests/system-ovn.at > @@ -3572,6 +3572,40 @@ NS_CHECK_EXEC([foo1], [ping -q -c 3 -i 0.3 -w 2 > 10.0.0.1 | FORMAT_PING], \ > 3 packets transmitted, 3 received, 0% packet loss, time 0ms > ]) > > + > +# test_connectivity_from_ext takes parameters 'vm' and 'ip'. It tests > +# icmp, udp and tcp connectivity from external network to the 'vm' on > +# the specified 'ip'. > +test_connectivity_from_ext() { > + local vm=$1; shift > + local ip=$1; shift > + > + # Start listening daemons for UDP and TCP connections > + NETNS_DAEMONIZE($vm, [nc -l -u 1234], [nc-$vm-$ip-udp.pid]) > + NETNS_DAEMONIZE($vm, [nc -l -k 1235], [nc-$vm-$ip-tcp.pid]) > + > + # Ensure that vm can be pinged on the specified IP > + NS_CHECK_EXEC([alice1], [ping -q -c 3 -i 0.3 -w 2 $ip | FORMAT_PING], > \ > + [0], [dnl > +3 packets transmitted, 3 received, 0% packet loss, time 0ms > +]) > + > + # Perform two consecutive UDP connections to the specified IP > + NS_CHECK_EXEC([alice1], [nc -u $ip 1234 -p 2000 -z]) > + NS_CHECK_EXEC([alice1], [nc -u $ip 1234 -p 2000 -z]) > + > + # Send data over TCP connection to the specified IP > + NS_CHECK_EXEC([alice1], [echo "TCP test" | nc --send-only $ip 1235]) > +} > + > +# Test access from external network to the internal IP of a VM that > +# has also configured DNAT > +test_connectivity_from_ext foo1 192.168.1.2 > + > +# Test access from external network to the internal IP of a VM that > +# does not have DNAT > +test_connectivity_from_ext bar1 192.168.2.2 > + > OVS_WAIT_UNTIL([ > total_pkts=$(cat ext-net.pcap | wc -l) > test "${total_pkts}" = "3" > @@ -3738,6 +3772,39 @@ sed -e 's/zone=[[0-9]]*/zone=<cleared>/'], [0], [dnl > > icmpv6,orig=(src=fd12::2,dst=fd20::2,id=<cleared>,type=128,code=0),reply=(src=fd20::2,dst=fd20::1,id=<cleared>,type=129,code=0),zone=<cleared> > ]) > > +# test_connectivity_from_ext takes parameters 'vm' and 'ip'. It tests > +# icmp, udp and tcp connectivity from external network to the 'vm' on > +# the specified 'ip'. > +test_connectivity_from_ext() { > + local vm=$1; shift > + local ip=$1; shift > + > + # Start listening daemons for UDP and TCP connections > + NETNS_DAEMONIZE($vm, [nc -l -u 1234], [nc-$vm-$ip-udp.pid]) > + NETNS_DAEMONIZE($vm, [nc -l -k 1235], [nc-$vm-$ip-tcp.pid]) > + > + # Ensure that vm can be pinged on the specified IP > + NS_CHECK_EXEC([alice1], [ping -q -c 3 -i 0.3 -w 2 $ip | FORMAT_PING], > \ > + [0], [dnl > +3 packets transmitted, 3 received, 0% packet loss, time 0ms > +]) > + > + # Perform two consecutive UDP connections to the specified IP > + NS_CHECK_EXEC([alice1], [nc -u $ip 1234 -p 2000 -z]) > + NS_CHECK_EXEC([alice1], [nc -u $ip 1234 -p 2000 -z]) > + > + # Send data over TCP connection to the specified IP > + NS_CHECK_EXEC([alice1], [echo "TCP test" | nc --send-only $ip 1235]) > +} > + > +# Test access from external network to the internal IP of a VM that > +# has also configured DNAT > +test_connectivity_from_ext foo1 fd11::2 > + > +# Test access from external network to the internal IP of a VM that > +# does not have DNAT > +test_connectivity_from_ext bar1 fd12::2 > + > OVS_APP_EXIT_AND_WAIT([ovn-controller]) > > as ovn-sb > @@ -3918,6 +3985,7 @@ NS_CHECK_EXEC([foo2], [ping -q -c 3 -i 0.3 -w 2 > 172.16.1.4 | FORMAT_PING], \ > AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep icmp | > FORMAT_CT(172.16.1.1) | \ > sed -e 's/zone=[[0-9]]*/zone=<cleared>/'], [0], [dnl > > icmp,orig=(src=172.16.1.1,dst=172.16.1.4,id=<cleared>,type=8,code=0),reply=(src=192.168.2.2,dst=172.16.1.1,id=<cleared>,type=0,code=0),zone=<cleared> > > +icmp,orig=(src=172.16.1.1,dst=192.168.2.2,id=<cleared>,type=8,code=0),reply=(src=192.168.2.2,dst=172.16.1.1,id=<cleared>,type=0,code=0),zone=<cleared> > > icmp,orig=(src=192.168.1.3,dst=172.16.1.4,id=<cleared>,type=8,code=0),reply=(src=172.16.1.4,dst=172.16.1.1,id=<cleared>,type=0,code=0),zone=<cleared> > ]) > > @@ -4086,6 +4154,7 @@ NS_CHECK_EXEC([foo2], [ping -q -c 3 -i 0.3 -w 2 > fd20::4 | FORMAT_PING], \ > AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(fd20::1) | \ > sed -e 's/zone=[[0-9]]*/zone=<cleared>/'], [0], [dnl > > icmpv6,orig=(src=fd11::3,dst=fd20::4,id=<cleared>,type=128,code=0),reply=(src=fd20::4,dst=fd20::1,id=<cleared>,type=129,code=0),zone=<cleared> > > +icmpv6,orig=(src=fd20::1,dst=fd12::2,id=<cleared>,type=128,code=0),reply=(src=fd12::2,dst=fd20::1,id=<cleared>,type=129,code=0),zone=<cleared> > > icmpv6,orig=(src=fd20::1,dst=fd20::4,id=<cleared>,type=128,code=0),reply=(src=fd12::2,dst=fd20::1,id=<cleared>,type=129,code=0),zone=<cleared> > ]) > > -- > 2.40.1 > > _______________________________________________ > dev mailing list > dev@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > > One thing that slightly worries me is the potential impact on traffic performance. Basically anything going towards the SNAT would go through CT at least once. Thanks, Ales
On Wed, Mar 6, 2024 at 8:07 AM Ales Musil <amusil@redhat.com> wrote: > > > On Mon, Mar 4, 2024 at 11:56 AM Martin Kalcok <martin.kalcok@canonical.com> > wrote: > >> This change fixes bug that breaks ability of machines from external >> networks to communicate with machines in SNATed networks (specifically >> when using a Distributed router). >> >> Currently when a machine (S1) on an external network tries to talk >> over TCP with a machine (A1) in a network that has enabled SNAT, the >> connection is established successfully. However after the three-way >> handshake, any packets that come from the A1 machine will have their >> source address translated by the Distributed router, breaking the >> communication. >> >> Existing rule in `build_lrouter_out_snat_flow` that decides which >> packets should be SNATed already tries to avoid SNATing packets in >> reply direction with `(!ct.trk || !ct.rpl)`. However, previous stages >> in the distributed LR egress pipeline do not initiate the CT state. >> >> Additionally we need to commit new connections that originate from >> external networks into CT, so that the packets in the reply direction >> can be properly identified. >> >> Rationale: >> >> In my original RFC [0], there were questions about the motivation for >> fixing this issue. I'll try to summarize why I think this is a bug >> that should be fixed. >> >> 1. Current implementation for Distributed router already tries to >> avoid SNATing packets in the reply direction, it's just missing >> initialized CT states to make proper decisions. >> >> 2. This same scenario works with Gateway Router. I tested with >> following setup: >> >> foo -- R1 -- join - R3 -- alice >> | >> bar ----------R2 >> >> R1 is a Distributed router with SNAT for foo. R2 is a Gateway >> router with SNAT for bar. R3 is a Gateway router with no SNAT. >> Using 'alice1' as a client I was able to talk over TCP with >> 'bar1' but connection with 'foo1' failed. >> >> 3. Regarding security and "leaking" of internal IPs. Reading through >> RFC 4787 [1], 5382 [2] and their update in 7857 [3], the >> specifications do not seem to mandate that SNAT implementations >> must filter incoming traffic destined directly to the internal >> network. Sections like "5. Filtering Behavior" in 4787 and >> "4.3 Externally Initiated Connections" in 5382 describe only >> behavior for traffic destined to external IP/Port associated >> with NAT on the device that performs NAT. >> >> Besides, with the current implementation, it's already possible >> to scan the internal network with pings and TCP syn scanning. >> >> If an additional security is needed, ACLs can be used to filter >> out incomming traffic from external networks. >> >> 4. We do have customers/clouds that depend on this functionality. >> This is a scenario that used to work in Openstack with ML2/OVS >> and migrating those clouds to ML2/OVN would break it. >> >> [0] >> https://mail.openvswitch.org/pipermail/ovs-dev/2024-February/411670.html >> [1]https://datatracker.ietf.org/doc/html/rfc4787 >> [2]https://datatracker.ietf.org/doc/html/rfc5382 >> [3]https://datatracker.ietf.org/doc/html/rfc7857 >> >> Signed-off-by: Martin Kalcok <martin.kalcok@canonical.com> >> --- >> > > Hi Martin, > > I have a few comments down below. > > >> northd/northd.c | 82 ++++++++++++++++++++++++++++++++++++----- >> northd/ovn-northd.8.xml | 29 +++++++++++++++ >> tests/ovn-northd.at | 15 +++++++- >> tests/system-ovn.at | 69 ++++++++++++++++++++++++++++++++++ >> 4 files changed, 185 insertions(+), 10 deletions(-) >> >> diff --git a/northd/northd.c b/northd/northd.c >> index 2c3560ce2..4b79b357c 100644 >> --- a/northd/northd.c >> +++ b/northd/northd.c >> @@ -14437,21 +14437,28 @@ build_lrouter_out_is_dnat_local(struct >> lflow_table *lflows, >> } >> >> static void >> -build_lrouter_out_snat_match(struct lflow_table *lflows, >> - const struct ovn_datapath *od, >> - const struct nbrec_nat *nat, struct ds >> *match, >> - bool distributed_nat, int cidr_bits, bool >> is_v6, >> - struct ovn_port *l3dgw_port, >> - struct lflow_ref *lflow_ref) >> +build_lrouter_out_snat_direction_match__(struct lflow_table *lflows, >> + const struct ovn_datapath *od, >> + const struct nbrec_nat *nat, >> + struct ds *match, >> + bool distributed_nat, int >> cidr_bits, >> + bool is_v6, >> + struct ovn_port *l3dgw_port, >> + struct lflow_ref *lflow_ref, >> + bool is_reverse) >> { >> ds_clear(match); >> >> - ds_put_format(match, "ip && ip%c.src == %s", is_v6 ? '6' : '4', >> + ds_put_format(match, "ip && ip%c.%s == %s", >> + is_v6 ? '6' : '4', >> + is_reverse ? "dst" : "src", >> nat->logical_ip); >> >> if (!od->is_gw_router) { >> /* Distributed router. */ >> - ds_put_format(match, " && outport == %s", l3dgw_port->json_key); >> + ds_put_format(match, " && %s == %s", >> + is_reverse ? "inport" : "outport", >> + l3dgw_port->json_key); >> if (od->n_l3dgw_ports) { >> ds_put_format(match, " && is_chassis_resident(\"%s\")", >> distributed_nat >> @@ -14462,11 +14469,37 @@ build_lrouter_out_snat_match(struct lflow_table >> *lflows, >> >> if (nat->allowed_ext_ips || nat->exempted_ext_ips) { >> lrouter_nat_add_ext_ip_match(od, lflows, match, nat, >> - is_v6, false, cidr_bits, >> + is_v6, is_reverse, cidr_bits, >> lflow_ref); >> } >> } >> >> +static void >> +build_lrouter_out_snat_match(struct lflow_table *lflows, >> + const struct ovn_datapath *od, >> + const struct nbrec_nat *nat, struct ds >> *match, >> + bool distributed_nat, int cidr_bits, bool >> is_v6, >> + struct ovn_port *l3dgw_port, >> + struct lflow_ref *lflow_ref) >> +{ >> + build_lrouter_out_snat_direction_match__(lflows, od, nat, match, >> + distributed_nat, cidr_bits, >> is_v6, >> + l3dgw_port, lflow_ref, >> false); >> +} >> + >> +static void >> +build_lrouter_out_snat_reverse_match(struct lflow_table *lflows, >> + const struct ovn_datapath *od, >> + const struct nbrec_nat *nat, struct ds >> *match, >> + bool distributed_nat, int cidr_bits, bool >> is_v6, >> + struct ovn_port *l3dgw_port, >> + struct lflow_ref *lflow_ref) >> +{ >> + build_lrouter_out_snat_direction_match__(lflows, od, nat, match, >> + distributed_nat, cidr_bits, >> is_v6, >> + l3dgw_port, lflow_ref, >> true); >> +} >> + >> > > nit: I don't think we need two new functions when only one argument was > added. Considering the old function was used at 3 places only anyway. > > >> static void >> build_lrouter_out_snat_stateless_flow(struct lflow_table *lflows, >> const struct ovn_datapath *od, >> @@ -14591,6 +14624,7 @@ build_lrouter_out_snat_flow(struct lflow_table >> *lflows, >> return; >> } >> >> + struct ds match_all_from_snat = DS_EMPTY_INITIALIZER; >> ds_clear(actions); >> >> /* The priority here is calculated such that the >> @@ -14600,6 +14634,7 @@ build_lrouter_out_snat_flow(struct lflow_table >> *lflows, >> >> build_lrouter_out_snat_match(lflows, od, nat, match, distributed_nat, >> cidr_bits, is_v6, l3dgw_port, >> lflow_ref); >> + ds_clone(&match_all_from_snat, match); >> >> if (!od->is_gw_router) { >> /* Distributed router. */ >> @@ -14624,6 +14659,35 @@ build_lrouter_out_snat_flow(struct lflow_table >> *lflows, >> priority, ds_cstr(match), >> ds_cstr(actions), &nat->header_, >> lflow_ref); >> + >> + /* For the SNAT networks, we need to make sure that connections are >> + * properly tracked so we can decide whether to perform SNAT on >> traffic >> + * exiting the network. */ >> + if (!strcmp(nat->type, "snat") && !od->is_gw_router) { >> > + /* For traffic that comes from SNAT network, initiate CT state >> before >> + * entering S_ROUTER_OUT_SNAT to allow matching on various CT >> states. >> + */ >> + ovn_lflow_add(lflows, od, S_ROUTER_OUT_POST_UNDNAT, 70, >> + ds_cstr(&match_all_from_snat), "ct_snat; ", >> + lflow_ref); >> > + >> + build_lrouter_out_snat_reverse_match(lflows, od, nat, match, >> + distributed_nat, cidr_bits, >> is_v6, >> + l3dgw_port, lflow_ref); >> + >> + /* New traffic that goes into SNAT network is committed to CT to >> avoid >> + * SNAT-ing replies.*/ >> + ovn_lflow_add(lflows, od, S_ROUTER_OUT_SNAT, priority, >> + ds_cstr(match), "ct_snat;", >> + lflow_ref); >> + >> + ds_put_cstr(match, "&& ct.new"); >> + ovn_lflow_add(lflows, od, S_ROUTER_OUT_POST_SNAT, priority, >> + ds_cstr(match), "ct_commit(snat); next; ", >> + lflow_ref); >> + } >> + >> + ds_destroy(&match_all_from_snat); >> > > I suppose that the gw router works because we have ct_commit for new > traffic [0]. I wonder if we could extend that with a check for the DGP case. > > >> } >> >> static void >> diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml >> index 17b414144..ce6fd1270 100644 >> --- a/northd/ovn-northd.8.xml >> +++ b/northd/ovn-northd.8.xml >> @@ -4869,6 +4869,13 @@ nd_ns { >> >> <p> >> <ul> >> + <li> >> + A priority-70 logical flow is added that initiates CT state for >> + traffic that is configured to be SNATed on Distributed routers. >> + This allows the next table, <code>lr_out_snat</code>, to >> + effectively match on various CT states. >> + </li> >> + >> <li> >> A priority-50 logical flow is added that commits any untracked >> flows >> from the previous table <code>lr_out_undnat</code> for Gateway >> @@ -5061,6 +5068,18 @@ nd_ns { >> >> </li> >> >> + <li> >> + An additional flow is added for traffic that goes in opposite >> + direction (i.e. it enters a network with configured SNAT). Where >> the >> + flow above matched on <code>ip4.src == <var>A</var> && >> outport >> + == <var>GW</var></code>, this flow matches on <code> ip4.dst == >> + <var>A</var> && inport == <var>GW</var></code>. A CT >> state is >> + initiated for this traffic so that the following table, <code> >> + lr_out_post_snat</code>, can identify whether the traffic flow >> was >> + initiated from the internal or external network. >> + >> + </li> >> + >> <li> >> A priority-0 logical flow with match <code>1</code> has actions >> <code>next;</code>. >> @@ -5072,6 +5091,16 @@ nd_ns { >> <p> >> Packets reaching this table are processed according to the flows >> below: >> <ul> >> + <li> >> + Traffic that goes directly into a network configured with SNAT >> on >> + Distributed routers, and was initiated from an external >> network (i.e. >> + it matches <code>ct.new</code>), is committed to the SNAT CT >> zone. >> + This ensures that replies returning from the SNATed network do >> not >> + have their source address translated. For details about match >> rules >> + and priority see section "Egress Table 3: SNAT on Distributed >> + Routers". >> + </li> >> + >> <li> >> A priority-0 logical flow that matches all packets not already >> handled (match <code>1</code>) and action <code>next;</code>. >> diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at >> index 0732486f3..a7597a975 100644 >> --- a/tests/ovn-northd.at >> +++ b/tests/ovn-northd.at >> @@ -1155,6 +1155,7 @@ AT_CAPTURE_FILE([crflows]) >> AT_CHECK([grep -e "lr_out_snat" drflows | ovn_strip_lflows], [0], [dnl >> table=??(lr_out_snat ), priority=0 , match=(1), >> action=(next;) >> table=??(lr_out_snat ), priority=120 , match=(nd_ns), >> action=(next;) >> + table=??(lr_out_snat ), priority=161 , match=(ip && ip4.dst == >> 50.0.0.11 && inport == "DR-S1" && is_chassis_resident("cr-DR-S1") && >> ip4.src == $allowed_range), action=(ct_snat;) >> table=??(lr_out_snat ), priority=161 , match=(ip && ip4.src == >> 50.0.0.11 && outport == "DR-S1" && is_chassis_resident("cr-DR-S1") && >> ip4.dst == $allowed_range && (!ct.trk || !ct.rpl)), >> action=(ct_snat(172.16.1.1);) >> ]) >> >> @@ -1185,6 +1186,7 @@ AT_CAPTURE_FILE([crflows2]) >> AT_CHECK([grep -e "lr_out_snat" drflows2 | ovn_strip_lflows], [0], [dnl >> table=??(lr_out_snat ), priority=0 , match=(1), >> action=(next;) >> table=??(lr_out_snat ), priority=120 , match=(nd_ns), >> action=(next;) >> + table=??(lr_out_snat ), priority=161 , match=(ip && ip4.dst == >> 50.0.0.11 && inport == "DR-S1" && is_chassis_resident("cr-DR-S1")), >> action=(ct_snat;) >> table=??(lr_out_snat ), priority=161 , match=(ip && ip4.src == >> 50.0.0.11 && outport == "DR-S1" && is_chassis_resident("cr-DR-S1") && >> (!ct.trk || !ct.rpl)), action=(ct_snat(172.16.1.1);) >> table=??(lr_out_snat ), priority=163 , match=(ip && ip4.src == >> 50.0.0.11 && outport == "DR-S1" && is_chassis_resident("cr-DR-S1") && >> ip4.dst == $disallowed_range), action=(next;) >> ]) >> @@ -1279,7 +1281,7 @@ AT_CHECK([grep -e "lr_out_snat" crflows5 | >> ovn_strip_lflows], [0], [dnl >> table=??(lr_out_snat ), priority=33 , match=(ip && ip4.src == >> 50.0.0.11 && ip4.dst == $allowed_range), action=(ip4.src=172.16.1.2; next;) >> ]) >> >> -# Stateful FIP with DISALLOWED_IPs >> +# Stateless FIP with DISALLOWED_IPs >> ovn-nbctl lr-nat-del DR dnat_and_snat 172.16.1.2 >> ovn-nbctl lr-nat-del CR dnat_and_snat 172.16.1.2 >> >> @@ -5590,12 +5592,16 @@ AT_CHECK([grep "lr_out_undnat" lr0flows | >> ovn_strip_lflows], [0], [dnl >> >> AT_CHECK([grep "lr_out_post_undnat" lr0flows | ovn_strip_lflows], [0], >> [dnl >> table=??(lr_out_post_undnat ), priority=0 , match=(1), >> action=(next;) >> + table=??(lr_out_post_undnat ), priority=70 , match=(ip && ip4.src == >> 10.0.0.0/24 && outport == "lr0-public" && >> is_chassis_resident("cr-lr0-public")), action=(ct_snat; ) >> + table=??(lr_out_post_undnat ), priority=70 , match=(ip && ip4.src == >> 10.0.0.10 && outport == "lr0-public" && >> is_chassis_resident("cr-lr0-public")), action=(ct_snat; ) >> ]) >> >> AT_CHECK([grep "lr_out_snat" lr0flows | ovn_strip_lflows], [0], [dnl >> table=??(lr_out_snat ), priority=0 , match=(1), >> action=(next;) >> table=??(lr_out_snat ), priority=120 , match=(nd_ns), >> action=(next;) >> + table=??(lr_out_snat ), priority=153 , match=(ip && ip4.dst == >> 10.0.0.0/24 && inport == "lr0-public" && >> is_chassis_resident("cr-lr0-public")), action=(ct_snat;) >> table=??(lr_out_snat ), priority=153 , match=(ip && ip4.src == >> 10.0.0.0/24 && outport == "lr0-public" && >> is_chassis_resident("cr-lr0-public") && (!ct.trk || !ct.rpl)), >> action=(ct_snat(172.168.0.10);) >> + table=??(lr_out_snat ), priority=161 , match=(ip && ip4.dst == >> 10.0.0.10 && inport == "lr0-public" && >> is_chassis_resident("cr-lr0-public")), action=(ct_snat;) >> table=??(lr_out_snat ), priority=161 , match=(ip && ip4.src == >> 10.0.0.10 && outport == "lr0-public" && >> is_chassis_resident("cr-lr0-public") && (!ct.trk || !ct.rpl)), >> action=(ct_snat(172.168.0.30);) >> table=??(lr_out_snat ), priority=161 , match=(ip && ip4.src == >> 10.0.0.3 && outport == "lr0-public" && is_chassis_resident("cr-lr0-public") >> && (!ct.trk || !ct.rpl)), action=(ct_snat(172.168.0.20);) >> ]) >> @@ -5738,12 +5744,16 @@ AT_CHECK([grep "lr_out_undnat" lr0flows | >> ovn_strip_lflows], [0], [dnl >> >> AT_CHECK([grep "lr_out_post_undnat" lr0flows | ovn_strip_lflows], [0], >> [dnl >> table=??(lr_out_post_undnat ), priority=0 , match=(1), >> action=(next;) >> + table=??(lr_out_post_undnat ), priority=70 , match=(ip && ip4.src == >> 10.0.0.0/24 && outport == "lr0-public" && >> is_chassis_resident("cr-lr0-public")), action=(ct_snat; ) >> + table=??(lr_out_post_undnat ), priority=70 , match=(ip && ip4.src == >> 10.0.0.10 && outport == "lr0-public" && >> is_chassis_resident("cr-lr0-public")), action=(ct_snat; ) >> ]) >> >> AT_CHECK([grep "lr_out_snat" lr0flows | ovn_strip_lflows], [0], [dnl >> table=??(lr_out_snat ), priority=0 , match=(1), >> action=(next;) >> table=??(lr_out_snat ), priority=120 , match=(nd_ns), >> action=(next;) >> + table=??(lr_out_snat ), priority=153 , match=(ip && ip4.dst == >> 10.0.0.0/24 && inport == "lr0-public" && >> is_chassis_resident("cr-lr0-public")), action=(ct_snat;) >> table=??(lr_out_snat ), priority=153 , match=(ip && ip4.src == >> 10.0.0.0/24 && outport == "lr0-public" && >> is_chassis_resident("cr-lr0-public") && (!ct.trk || !ct.rpl)), >> action=(ct_snat(172.168.0.10);) >> + table=??(lr_out_snat ), priority=161 , match=(ip && ip4.dst == >> 10.0.0.10 && inport == "lr0-public" && >> is_chassis_resident("cr-lr0-public")), action=(ct_snat;) >> table=??(lr_out_snat ), priority=161 , match=(ip && ip4.src == >> 10.0.0.10 && outport == "lr0-public" && >> is_chassis_resident("cr-lr0-public") && (!ct.trk || !ct.rpl)), >> action=(ct_snat(172.168.0.30);) >> table=??(lr_out_snat ), priority=161 , match=(ip && ip4.src == >> 10.0.0.3 && outport == "lr0-public" && is_chassis_resident("cr-lr0-public") >> && (!ct.trk || !ct.rpl)), action=(ct_snat(172.168.0.20);) >> ]) >> @@ -7643,6 +7653,9 @@ AT_CHECK([grep lr_in_unsnat lrflows | grep ct_snat >> | ovn_strip_lflows], [0], [dn >> ]) >> >> AT_CHECK([grep lr_out_snat lrflows | grep ct_snat | ovn_strip_lflows], >> [0], [dnl >> + table=??(lr_out_snat ), priority=161 , match=(ip && ip4.dst == >> 20.0.0.10 && inport == "DR-S1" && is_chassis_resident("cr-DR-S1")), >> action=(ct_snat;) >> + table=??(lr_out_snat ), priority=161 , match=(ip && ip4.dst == >> 20.0.0.10 && inport == "DR-S2" && is_chassis_resident("cr-DR-S2")), >> action=(ct_snat;) >> + table=??(lr_out_snat ), priority=161 , match=(ip && ip4.dst == >> 20.0.0.10 && inport == "DR-S3" && is_chassis_resident("cr-DR-S3")), >> action=(ct_snat;) >> table=??(lr_out_snat ), priority=161 , match=(ip && ip4.src == >> 20.0.0.10 && outport == "DR-S1" && is_chassis_resident("cr-DR-S1") && >> (!ct.trk || !ct.rpl)), action=(ct_snat(172.16.1.10);) >> table=??(lr_out_snat ), priority=161 , match=(ip && ip4.src == >> 20.0.0.10 && outport == "DR-S2" && is_chassis_resident("cr-DR-S2") && >> (!ct.trk || !ct.rpl)), action=(ct_snat(10.0.0.10);) >> table=??(lr_out_snat ), priority=161 , match=(ip && ip4.src == >> 20.0.0.10 && outport == "DR-S3" && is_chassis_resident("cr-DR-S3") && >> (!ct.trk || !ct.rpl)), action=(ct_snat(192.168.0.10);) >> diff --git a/tests/system-ovn.at b/tests/system-ovn.at >> index c22c7882f..80e9b7d0b 100644 >> --- a/tests/system-ovn.at >> +++ b/tests/system-ovn.at >> @@ -3572,6 +3572,40 @@ NS_CHECK_EXEC([foo1], [ping -q -c 3 -i 0.3 -w 2 >> 10.0.0.1 | FORMAT_PING], \ >> 3 packets transmitted, 3 received, 0% packet loss, time 0ms >> ]) >> >> + >> +# test_connectivity_from_ext takes parameters 'vm' and 'ip'. It tests >> +# icmp, udp and tcp connectivity from external network to the 'vm' on >> +# the specified 'ip'. >> +test_connectivity_from_ext() { >> + local vm=$1; shift >> + local ip=$1; shift >> + >> + # Start listening daemons for UDP and TCP connections >> + NETNS_DAEMONIZE($vm, [nc -l -u 1234], [nc-$vm-$ip-udp.pid]) >> + NETNS_DAEMONIZE($vm, [nc -l -k 1235], [nc-$vm-$ip-tcp.pid]) >> + >> + # Ensure that vm can be pinged on the specified IP >> + NS_CHECK_EXEC([alice1], [ping -q -c 3 -i 0.3 -w 2 $ip | >> FORMAT_PING], \ >> + [0], [dnl >> +3 packets transmitted, 3 received, 0% packet loss, time 0ms >> +]) >> + >> + # Perform two consecutive UDP connections to the specified IP >> + NS_CHECK_EXEC([alice1], [nc -u $ip 1234 -p 2000 -z]) >> + NS_CHECK_EXEC([alice1], [nc -u $ip 1234 -p 2000 -z]) >> + >> + # Send data over TCP connection to the specified IP >> + NS_CHECK_EXEC([alice1], [echo "TCP test" | nc --send-only $ip 1235]) >> +} >> + >> +# Test access from external network to the internal IP of a VM that >> +# has also configured DNAT >> +test_connectivity_from_ext foo1 192.168.1.2 >> + >> +# Test access from external network to the internal IP of a VM that >> +# does not have DNAT >> +test_connectivity_from_ext bar1 192.168.2.2 >> + >> OVS_WAIT_UNTIL([ >> total_pkts=$(cat ext-net.pcap | wc -l) >> test "${total_pkts}" = "3" >> @@ -3738,6 +3772,39 @@ sed -e 's/zone=[[0-9]]*/zone=<cleared>/'], [0], >> [dnl >> >> icmpv6,orig=(src=fd12::2,dst=fd20::2,id=<cleared>,type=128,code=0),reply=(src=fd20::2,dst=fd20::1,id=<cleared>,type=129,code=0),zone=<cleared> >> ]) >> >> +# test_connectivity_from_ext takes parameters 'vm' and 'ip'. It tests >> +# icmp, udp and tcp connectivity from external network to the 'vm' on >> +# the specified 'ip'. >> +test_connectivity_from_ext() { >> + local vm=$1; shift >> + local ip=$1; shift >> + >> + # Start listening daemons for UDP and TCP connections >> + NETNS_DAEMONIZE($vm, [nc -l -u 1234], [nc-$vm-$ip-udp.pid]) >> + NETNS_DAEMONIZE($vm, [nc -l -k 1235], [nc-$vm-$ip-tcp.pid]) >> + >> + # Ensure that vm can be pinged on the specified IP >> + NS_CHECK_EXEC([alice1], [ping -q -c 3 -i 0.3 -w 2 $ip | >> FORMAT_PING], \ >> + [0], [dnl >> +3 packets transmitted, 3 received, 0% packet loss, time 0ms >> +]) >> + >> + # Perform two consecutive UDP connections to the specified IP >> + NS_CHECK_EXEC([alice1], [nc -u $ip 1234 -p 2000 -z]) >> + NS_CHECK_EXEC([alice1], [nc -u $ip 1234 -p 2000 -z]) >> + >> + # Send data over TCP connection to the specified IP >> + NS_CHECK_EXEC([alice1], [echo "TCP test" | nc --send-only $ip 1235]) >> +} >> + >> +# Test access from external network to the internal IP of a VM that >> +# has also configured DNAT >> +test_connectivity_from_ext foo1 fd11::2 >> + >> +# Test access from external network to the internal IP of a VM that >> +# does not have DNAT >> +test_connectivity_from_ext bar1 fd12::2 >> + >> OVS_APP_EXIT_AND_WAIT([ovn-controller]) >> >> as ovn-sb >> @@ -3918,6 +3985,7 @@ NS_CHECK_EXEC([foo2], [ping -q -c 3 -i 0.3 -w 2 >> 172.16.1.4 | FORMAT_PING], \ >> AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep icmp | >> FORMAT_CT(172.16.1.1) | \ >> sed -e 's/zone=[[0-9]]*/zone=<cleared>/'], [0], [dnl >> >> icmp,orig=(src=172.16.1.1,dst=172.16.1.4,id=<cleared>,type=8,code=0),reply=(src=192.168.2.2,dst=172.16.1.1,id=<cleared>,type=0,code=0),zone=<cleared> >> >> +icmp,orig=(src=172.16.1.1,dst=192.168.2.2,id=<cleared>,type=8,code=0),reply=(src=192.168.2.2,dst=172.16.1.1,id=<cleared>,type=0,code=0),zone=<cleared> >> >> icmp,orig=(src=192.168.1.3,dst=172.16.1.4,id=<cleared>,type=8,code=0),reply=(src=172.16.1.4,dst=172.16.1.1,id=<cleared>,type=0,code=0),zone=<cleared> >> ]) >> >> @@ -4086,6 +4154,7 @@ NS_CHECK_EXEC([foo2], [ping -q -c 3 -i 0.3 -w 2 >> fd20::4 | FORMAT_PING], \ >> AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(fd20::1) | \ >> sed -e 's/zone=[[0-9]]*/zone=<cleared>/'], [0], [dnl >> >> icmpv6,orig=(src=fd11::3,dst=fd20::4,id=<cleared>,type=128,code=0),reply=(src=fd20::4,dst=fd20::1,id=<cleared>,type=129,code=0),zone=<cleared> >> >> +icmpv6,orig=(src=fd20::1,dst=fd12::2,id=<cleared>,type=128,code=0),reply=(src=fd12::2,dst=fd20::1,id=<cleared>,type=129,code=0),zone=<cleared> >> >> icmpv6,orig=(src=fd20::1,dst=fd20::4,id=<cleared>,type=128,code=0),reply=(src=fd12::2,dst=fd20::1,id=<cleared>,type=129,code=0),zone=<cleared> >> ]) >> >> -- >> 2.40.1 >> >> _______________________________________________ >> dev mailing list >> dev@openvswitch.org >> https://mail.openvswitch.org/mailman/listinfo/ovs-dev >> >> > One thing that slightly worries me is the potential impact on traffic > performance. Basically anything going towards the SNAT would go through CT > at least once. > > Thanks, > Ales > > -- > > Ales Musil > > Senior Software Engineer - OVN Core > > Red Hat EMEA <https://www.redhat.com> > > amusil@redhat.com > <https://red.ht/sig> > Forgot to add the link. [0] https://github.com/ovn-org/ovn/blob/b92ad9e0b408a202273d69ba573f2538e53c6e48/northd/northd.c#L14998-L15009
Thanks for reviewing this as well, Ales. On Wed, Mar 6, 2024 at 8:08 AM Ales Musil <amusil@redhat.com> wrote: > > > On Wed, Mar 6, 2024 at 8:07 AM Ales Musil <amusil@redhat.com> wrote: > >> >> >> On Mon, Mar 4, 2024 at 11:56 AM Martin Kalcok < >> martin.kalcok@canonical.com> wrote: >> >>> This change fixes bug that breaks ability of machines from external >>> networks to communicate with machines in SNATed networks (specifically >>> when using a Distributed router). >>> >>> Currently when a machine (S1) on an external network tries to talk >>> over TCP with a machine (A1) in a network that has enabled SNAT, the >>> connection is established successfully. However after the three-way >>> handshake, any packets that come from the A1 machine will have their >>> source address translated by the Distributed router, breaking the >>> communication. >>> >>> Existing rule in `build_lrouter_out_snat_flow` that decides which >>> packets should be SNATed already tries to avoid SNATing packets in >>> reply direction with `(!ct.trk || !ct.rpl)`. However, previous stages >>> in the distributed LR egress pipeline do not initiate the CT state. >>> >>> Additionally we need to commit new connections that originate from >>> external networks into CT, so that the packets in the reply direction >>> can be properly identified. >>> >>> Rationale: >>> >>> In my original RFC [0], there were questions about the motivation for >>> fixing this issue. I'll try to summarize why I think this is a bug >>> that should be fixed. >>> >>> 1. Current implementation for Distributed router already tries to >>> avoid SNATing packets in the reply direction, it's just missing >>> initialized CT states to make proper decisions. >>> >>> 2. This same scenario works with Gateway Router. I tested with >>> following setup: >>> >>> foo -- R1 -- join - R3 -- alice >>> | >>> bar ----------R2 >>> >>> R1 is a Distributed router with SNAT for foo. R2 is a Gateway >>> router with SNAT for bar. R3 is a Gateway router with no SNAT. >>> Using 'alice1' as a client I was able to talk over TCP with >>> 'bar1' but connection with 'foo1' failed. >>> >>> 3. Regarding security and "leaking" of internal IPs. Reading through >>> RFC 4787 [1], 5382 [2] and their update in 7857 [3], the >>> specifications do not seem to mandate that SNAT implementations >>> must filter incoming traffic destined directly to the internal >>> network. Sections like "5. Filtering Behavior" in 4787 and >>> "4.3 Externally Initiated Connections" in 5382 describe only >>> behavior for traffic destined to external IP/Port associated >>> with NAT on the device that performs NAT. >>> >>> Besides, with the current implementation, it's already possible >>> to scan the internal network with pings and TCP syn scanning. >>> >>> If an additional security is needed, ACLs can be used to filter >>> out incomming traffic from external networks. >>> >>> 4. We do have customers/clouds that depend on this functionality. >>> This is a scenario that used to work in Openstack with ML2/OVS >>> and migrating those clouds to ML2/OVN would break it. >>> >>> [0] >>> https://mail.openvswitch.org/pipermail/ovs-dev/2024-February/411670.html >>> [1]https://datatracker.ietf.org/doc/html/rfc4787 >>> [2]https://datatracker.ietf.org/doc/html/rfc5382 >>> [3]https://datatracker.ietf.org/doc/html/rfc7857 >>> >>> Signed-off-by: Martin Kalcok <martin.kalcok@canonical.com> >>> --- >>> >> >> Hi Martin, >> >> I have a few comments down below. >> >> >>> northd/northd.c | 82 ++++++++++++++++++++++++++++++++++++----- >>> northd/ovn-northd.8.xml | 29 +++++++++++++++ >>> tests/ovn-northd.at | 15 +++++++- >>> tests/system-ovn.at | 69 ++++++++++++++++++++++++++++++++++ >>> 4 files changed, 185 insertions(+), 10 deletions(-) >>> >>> diff --git a/northd/northd.c b/northd/northd.c >>> index 2c3560ce2..4b79b357c 100644 >>> --- a/northd/northd.c >>> +++ b/northd/northd.c >>> @@ -14437,21 +14437,28 @@ build_lrouter_out_is_dnat_local(struct >>> lflow_table *lflows, >>> } >>> >>> static void >>> -build_lrouter_out_snat_match(struct lflow_table *lflows, >>> - const struct ovn_datapath *od, >>> - const struct nbrec_nat *nat, struct ds >>> *match, >>> - bool distributed_nat, int cidr_bits, bool >>> is_v6, >>> - struct ovn_port *l3dgw_port, >>> - struct lflow_ref *lflow_ref) >>> +build_lrouter_out_snat_direction_match__(struct lflow_table *lflows, >>> + const struct ovn_datapath *od, >>> + const struct nbrec_nat *nat, >>> + struct ds *match, >>> + bool distributed_nat, int >>> cidr_bits, >>> + bool is_v6, >>> + struct ovn_port *l3dgw_port, >>> + struct lflow_ref *lflow_ref, >>> + bool is_reverse) >>> { >>> ds_clear(match); >>> >>> - ds_put_format(match, "ip && ip%c.src == %s", is_v6 ? '6' : '4', >>> + ds_put_format(match, "ip && ip%c.%s == %s", >>> + is_v6 ? '6' : '4', >>> + is_reverse ? "dst" : "src", >>> nat->logical_ip); >>> >>> if (!od->is_gw_router) { >>> /* Distributed router. */ >>> - ds_put_format(match, " && outport == %s", l3dgw_port->json_key); >>> + ds_put_format(match, " && %s == %s", >>> + is_reverse ? "inport" : "outport", >>> + l3dgw_port->json_key); >>> if (od->n_l3dgw_ports) { >>> ds_put_format(match, " && is_chassis_resident(\"%s\")", >>> distributed_nat >>> @@ -14462,11 +14469,37 @@ build_lrouter_out_snat_match(struct >>> lflow_table *lflows, >>> >>> if (nat->allowed_ext_ips || nat->exempted_ext_ips) { >>> lrouter_nat_add_ext_ip_match(od, lflows, match, nat, >>> - is_v6, false, cidr_bits, >>> + is_v6, is_reverse, cidr_bits, >>> lflow_ref); >>> } >>> } >>> >>> +static void >>> +build_lrouter_out_snat_match(struct lflow_table *lflows, >>> + const struct ovn_datapath *od, >>> + const struct nbrec_nat *nat, struct ds >>> *match, >>> + bool distributed_nat, int cidr_bits, bool >>> is_v6, >>> + struct ovn_port *l3dgw_port, >>> + struct lflow_ref *lflow_ref) >>> +{ >>> + build_lrouter_out_snat_direction_match__(lflows, od, nat, match, >>> + distributed_nat, >>> cidr_bits, is_v6, >>> + l3dgw_port, lflow_ref, >>> false); >>> +} >>> + >>> +static void >>> +build_lrouter_out_snat_reverse_match(struct lflow_table *lflows, >>> + const struct ovn_datapath *od, >>> + const struct nbrec_nat *nat, struct ds >>> *match, >>> + bool distributed_nat, int cidr_bits, bool >>> is_v6, >>> + struct ovn_port *l3dgw_port, >>> + struct lflow_ref *lflow_ref) >>> +{ >>> + build_lrouter_out_snat_direction_match__(lflows, od, nat, match, >>> + distributed_nat, >>> cidr_bits, is_v6, >>> + l3dgw_port, lflow_ref, >>> true); >>> +} >>> + >>> >> >> nit: I don't think we need two new functions when only one argument was >> added. Considering the old function was used at 3 places only anyway. >> > Ack. I went with this approach because I deemed it more cautious/less intrusive, although bit more verbose. However If you are OK with changing the function's signature, I can go with that as well. > >> >>> static void >>> build_lrouter_out_snat_stateless_flow(struct lflow_table *lflows, >>> const struct ovn_datapath *od, >>> @@ -14591,6 +14624,7 @@ build_lrouter_out_snat_flow(struct lflow_table >>> *lflows, >>> return; >>> } >>> >>> + struct ds match_all_from_snat = DS_EMPTY_INITIALIZER; >>> ds_clear(actions); >>> >>> /* The priority here is calculated such that the >>> @@ -14600,6 +14634,7 @@ build_lrouter_out_snat_flow(struct lflow_table >>> *lflows, >>> >>> build_lrouter_out_snat_match(lflows, od, nat, match, >>> distributed_nat, >>> cidr_bits, is_v6, l3dgw_port, >>> lflow_ref); >>> + ds_clone(&match_all_from_snat, match); >>> >>> if (!od->is_gw_router) { >>> /* Distributed router. */ >>> @@ -14624,6 +14659,35 @@ build_lrouter_out_snat_flow(struct lflow_table >>> *lflows, >>> priority, ds_cstr(match), >>> ds_cstr(actions), &nat->header_, >>> lflow_ref); >>> + >>> + /* For the SNAT networks, we need to make sure that connections are >>> + * properly tracked so we can decide whether to perform SNAT on >>> traffic >>> + * exiting the network. */ >>> + if (!strcmp(nat->type, "snat") && !od->is_gw_router) { >>> >> + /* For traffic that comes from SNAT network, initiate CT state >>> before >>> + * entering S_ROUTER_OUT_SNAT to allow matching on various CT >>> states. >>> + */ >>> + ovn_lflow_add(lflows, od, S_ROUTER_OUT_POST_UNDNAT, 70, >>> + ds_cstr(&match_all_from_snat), "ct_snat; ", >>> + lflow_ref); >>> >> + >>> + build_lrouter_out_snat_reverse_match(lflows, od, nat, match, >>> + distributed_nat, >>> cidr_bits, is_v6, >>> + l3dgw_port, lflow_ref); >>> + >>> + /* New traffic that goes into SNAT network is committed to CT >>> to avoid >>> + * SNAT-ing replies.*/ >>> + ovn_lflow_add(lflows, od, S_ROUTER_OUT_SNAT, priority, >>> + ds_cstr(match), "ct_snat;", >>> + lflow_ref); >>> + >>> + ds_put_cstr(match, "&& ct.new"); >>> + ovn_lflow_add(lflows, od, S_ROUTER_OUT_POST_SNAT, priority, >>> + ds_cstr(match), "ct_commit(snat); next; ", >>> + lflow_ref); >>> + } >>> + >>> + ds_destroy(&match_all_from_snat); >>> >> >> I suppose that the gw router works because we have ct_commit for new >> traffic [0]. I wonder if we could extend that with a check for the DGP case. >> > Yes. I think so too. I tested removing the `is_gw_router` condition and it fixed our use-case on the Distributed router. However it also broke tests fro "stateless" NAT because there were calls to "ct_*" in the flows even if stateless NAT was requested. I ultimately decided to go with this, more targeted, approach rather than sending everything to conntrack. > >> >>> } >>> >>> static void >>> diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml >>> index 17b414144..ce6fd1270 100644 >>> --- a/northd/ovn-northd.8.xml >>> +++ b/northd/ovn-northd.8.xml >>> @@ -4869,6 +4869,13 @@ nd_ns { >>> >>> <p> >>> <ul> >>> + <li> >>> + A priority-70 logical flow is added that initiates CT state >>> for >>> + traffic that is configured to be SNATed on Distributed >>> routers. >>> + This allows the next table, <code>lr_out_snat</code>, to >>> + effectively match on various CT states. >>> + </li> >>> + >>> <li> >>> A priority-50 logical flow is added that commits any >>> untracked flows >>> from the previous table <code>lr_out_undnat</code> for Gateway >>> @@ -5061,6 +5068,18 @@ nd_ns { >>> >>> </li> >>> >>> + <li> >>> + An additional flow is added for traffic that goes in opposite >>> + direction (i.e. it enters a network with configured SNAT). >>> Where the >>> + flow above matched on <code>ip4.src == <var>A</var> && >>> outport >>> + == <var>GW</var></code>, this flow matches on <code> ip4.dst == >>> + <var>A</var> && inport == <var>GW</var></code>. A CT >>> state is >>> + initiated for this traffic so that the following table, <code> >>> + lr_out_post_snat</code>, can identify whether the traffic flow >>> was >>> + initiated from the internal or external network. >>> + >>> + </li> >>> + >>> <li> >>> A priority-0 logical flow with match <code>1</code> has actions >>> <code>next;</code>. >>> @@ -5072,6 +5091,16 @@ nd_ns { >>> <p> >>> Packets reaching this table are processed according to the flows >>> below: >>> <ul> >>> + <li> >>> + Traffic that goes directly into a network configured with >>> SNAT on >>> + Distributed routers, and was initiated from an external >>> network (i.e. >>> + it matches <code>ct.new</code>), is committed to the SNAT CT >>> zone. >>> + This ensures that replies returning from the SNATed network >>> do not >>> + have their source address translated. For details about match >>> rules >>> + and priority see section "Egress Table 3: SNAT on Distributed >>> + Routers". >>> + </li> >>> + >>> <li> >>> A priority-0 logical flow that matches all packets not already >>> handled (match <code>1</code>) and action <code>next;</code>. >>> diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at >>> index 0732486f3..a7597a975 100644 >>> --- a/tests/ovn-northd.at >>> +++ b/tests/ovn-northd.at >>> @@ -1155,6 +1155,7 @@ AT_CAPTURE_FILE([crflows]) >>> AT_CHECK([grep -e "lr_out_snat" drflows | ovn_strip_lflows], [0], [dnl >>> table=??(lr_out_snat ), priority=0 , match=(1), >>> action=(next;) >>> table=??(lr_out_snat ), priority=120 , match=(nd_ns), >>> action=(next;) >>> + table=??(lr_out_snat ), priority=161 , match=(ip && ip4.dst >>> == 50.0.0.11 && inport == "DR-S1" && is_chassis_resident("cr-DR-S1") && >>> ip4.src == $allowed_range), action=(ct_snat;) >>> table=??(lr_out_snat ), priority=161 , match=(ip && ip4.src >>> == 50.0.0.11 && outport == "DR-S1" && is_chassis_resident("cr-DR-S1") && >>> ip4.dst == $allowed_range && (!ct.trk || !ct.rpl)), >>> action=(ct_snat(172.16.1.1);) >>> ]) >>> >>> @@ -1185,6 +1186,7 @@ AT_CAPTURE_FILE([crflows2]) >>> AT_CHECK([grep -e "lr_out_snat" drflows2 | ovn_strip_lflows], [0], [dnl >>> table=??(lr_out_snat ), priority=0 , match=(1), >>> action=(next;) >>> table=??(lr_out_snat ), priority=120 , match=(nd_ns), >>> action=(next;) >>> + table=??(lr_out_snat ), priority=161 , match=(ip && ip4.dst >>> == 50.0.0.11 && inport == "DR-S1" && is_chassis_resident("cr-DR-S1")), >>> action=(ct_snat;) >>> table=??(lr_out_snat ), priority=161 , match=(ip && ip4.src >>> == 50.0.0.11 && outport == "DR-S1" && is_chassis_resident("cr-DR-S1") && >>> (!ct.trk || !ct.rpl)), action=(ct_snat(172.16.1.1);) >>> table=??(lr_out_snat ), priority=163 , match=(ip && ip4.src >>> == 50.0.0.11 && outport == "DR-S1" && is_chassis_resident("cr-DR-S1") && >>> ip4.dst == $disallowed_range), action=(next;) >>> ]) >>> @@ -1279,7 +1281,7 @@ AT_CHECK([grep -e "lr_out_snat" crflows5 | >>> ovn_strip_lflows], [0], [dnl >>> table=??(lr_out_snat ), priority=33 , match=(ip && ip4.src >>> == 50.0.0.11 && ip4.dst == $allowed_range), action=(ip4.src=172.16.1.2; >>> next;) >>> ]) >>> >>> -# Stateful FIP with DISALLOWED_IPs >>> +# Stateless FIP with DISALLOWED_IPs >>> ovn-nbctl lr-nat-del DR dnat_and_snat 172.16.1.2 >>> ovn-nbctl lr-nat-del CR dnat_and_snat 172.16.1.2 >>> >>> @@ -5590,12 +5592,16 @@ AT_CHECK([grep "lr_out_undnat" lr0flows | >>> ovn_strip_lflows], [0], [dnl >>> >>> AT_CHECK([grep "lr_out_post_undnat" lr0flows | ovn_strip_lflows], [0], >>> [dnl >>> table=??(lr_out_post_undnat ), priority=0 , match=(1), >>> action=(next;) >>> + table=??(lr_out_post_undnat ), priority=70 , match=(ip && ip4.src >>> == 10.0.0.0/24 && outport == "lr0-public" && >>> is_chassis_resident("cr-lr0-public")), action=(ct_snat; ) >>> + table=??(lr_out_post_undnat ), priority=70 , match=(ip && ip4.src >>> == 10.0.0.10 && outport == "lr0-public" && >>> is_chassis_resident("cr-lr0-public")), action=(ct_snat; ) >>> ]) >>> >>> AT_CHECK([grep "lr_out_snat" lr0flows | ovn_strip_lflows], [0], [dnl >>> table=??(lr_out_snat ), priority=0 , match=(1), >>> action=(next;) >>> table=??(lr_out_snat ), priority=120 , match=(nd_ns), >>> action=(next;) >>> + table=??(lr_out_snat ), priority=153 , match=(ip && ip4.dst >>> == 10.0.0.0/24 && inport == "lr0-public" && >>> is_chassis_resident("cr-lr0-public")), action=(ct_snat;) >>> table=??(lr_out_snat ), priority=153 , match=(ip && ip4.src >>> == 10.0.0.0/24 && outport == "lr0-public" && >>> is_chassis_resident("cr-lr0-public") && (!ct.trk || !ct.rpl)), >>> action=(ct_snat(172.168.0.10);) >>> + table=??(lr_out_snat ), priority=161 , match=(ip && ip4.dst >>> == 10.0.0.10 && inport == "lr0-public" && >>> is_chassis_resident("cr-lr0-public")), action=(ct_snat;) >>> table=??(lr_out_snat ), priority=161 , match=(ip && ip4.src >>> == 10.0.0.10 && outport == "lr0-public" && >>> is_chassis_resident("cr-lr0-public") && (!ct.trk || !ct.rpl)), >>> action=(ct_snat(172.168.0.30);) >>> table=??(lr_out_snat ), priority=161 , match=(ip && ip4.src >>> == 10.0.0.3 && outport == "lr0-public" && >>> is_chassis_resident("cr-lr0-public") && (!ct.trk || !ct.rpl)), >>> action=(ct_snat(172.168.0.20);) >>> ]) >>> @@ -5738,12 +5744,16 @@ AT_CHECK([grep "lr_out_undnat" lr0flows | >>> ovn_strip_lflows], [0], [dnl >>> >>> AT_CHECK([grep "lr_out_post_undnat" lr0flows | ovn_strip_lflows], [0], >>> [dnl >>> table=??(lr_out_post_undnat ), priority=0 , match=(1), >>> action=(next;) >>> + table=??(lr_out_post_undnat ), priority=70 , match=(ip && ip4.src >>> == 10.0.0.0/24 && outport == "lr0-public" && >>> is_chassis_resident("cr-lr0-public")), action=(ct_snat; ) >>> + table=??(lr_out_post_undnat ), priority=70 , match=(ip && ip4.src >>> == 10.0.0.10 && outport == "lr0-public" && >>> is_chassis_resident("cr-lr0-public")), action=(ct_snat; ) >>> ]) >>> >>> AT_CHECK([grep "lr_out_snat" lr0flows | ovn_strip_lflows], [0], [dnl >>> table=??(lr_out_snat ), priority=0 , match=(1), >>> action=(next;) >>> table=??(lr_out_snat ), priority=120 , match=(nd_ns), >>> action=(next;) >>> + table=??(lr_out_snat ), priority=153 , match=(ip && ip4.dst >>> == 10.0.0.0/24 && inport == "lr0-public" && >>> is_chassis_resident("cr-lr0-public")), action=(ct_snat;) >>> table=??(lr_out_snat ), priority=153 , match=(ip && ip4.src >>> == 10.0.0.0/24 && outport == "lr0-public" && >>> is_chassis_resident("cr-lr0-public") && (!ct.trk || !ct.rpl)), >>> action=(ct_snat(172.168.0.10);) >>> + table=??(lr_out_snat ), priority=161 , match=(ip && ip4.dst >>> == 10.0.0.10 && inport == "lr0-public" && >>> is_chassis_resident("cr-lr0-public")), action=(ct_snat;) >>> table=??(lr_out_snat ), priority=161 , match=(ip && ip4.src >>> == 10.0.0.10 && outport == "lr0-public" && >>> is_chassis_resident("cr-lr0-public") && (!ct.trk || !ct.rpl)), >>> action=(ct_snat(172.168.0.30);) >>> table=??(lr_out_snat ), priority=161 , match=(ip && ip4.src >>> == 10.0.0.3 && outport == "lr0-public" && >>> is_chassis_resident("cr-lr0-public") && (!ct.trk || !ct.rpl)), >>> action=(ct_snat(172.168.0.20);) >>> ]) >>> @@ -7643,6 +7653,9 @@ AT_CHECK([grep lr_in_unsnat lrflows | grep ct_snat >>> | ovn_strip_lflows], [0], [dn >>> ]) >>> >>> AT_CHECK([grep lr_out_snat lrflows | grep ct_snat | ovn_strip_lflows], >>> [0], [dnl >>> + table=??(lr_out_snat ), priority=161 , match=(ip && ip4.dst >>> == 20.0.0.10 && inport == "DR-S1" && is_chassis_resident("cr-DR-S1")), >>> action=(ct_snat;) >>> + table=??(lr_out_snat ), priority=161 , match=(ip && ip4.dst >>> == 20.0.0.10 && inport == "DR-S2" && is_chassis_resident("cr-DR-S2")), >>> action=(ct_snat;) >>> + table=??(lr_out_snat ), priority=161 , match=(ip && ip4.dst >>> == 20.0.0.10 && inport == "DR-S3" && is_chassis_resident("cr-DR-S3")), >>> action=(ct_snat;) >>> table=??(lr_out_snat ), priority=161 , match=(ip && ip4.src >>> == 20.0.0.10 && outport == "DR-S1" && is_chassis_resident("cr-DR-S1") && >>> (!ct.trk || !ct.rpl)), action=(ct_snat(172.16.1.10);) >>> table=??(lr_out_snat ), priority=161 , match=(ip && ip4.src >>> == 20.0.0.10 && outport == "DR-S2" && is_chassis_resident("cr-DR-S2") && >>> (!ct.trk || !ct.rpl)), action=(ct_snat(10.0.0.10);) >>> table=??(lr_out_snat ), priority=161 , match=(ip && ip4.src >>> == 20.0.0.10 && outport == "DR-S3" && is_chassis_resident("cr-DR-S3") && >>> (!ct.trk || !ct.rpl)), action=(ct_snat(192.168.0.10);) >>> diff --git a/tests/system-ovn.at b/tests/system-ovn.at >>> index c22c7882f..80e9b7d0b 100644 >>> --- a/tests/system-ovn.at >>> +++ b/tests/system-ovn.at >>> @@ -3572,6 +3572,40 @@ NS_CHECK_EXEC([foo1], [ping -q -c 3 -i 0.3 -w 2 >>> 10.0.0.1 | FORMAT_PING], \ >>> 3 packets transmitted, 3 received, 0% packet loss, time 0ms >>> ]) >>> >>> + >>> +# test_connectivity_from_ext takes parameters 'vm' and 'ip'. It tests >>> +# icmp, udp and tcp connectivity from external network to the 'vm' on >>> +# the specified 'ip'. >>> +test_connectivity_from_ext() { >>> + local vm=$1; shift >>> + local ip=$1; shift >>> + >>> + # Start listening daemons for UDP and TCP connections >>> + NETNS_DAEMONIZE($vm, [nc -l -u 1234], [nc-$vm-$ip-udp.pid]) >>> + NETNS_DAEMONIZE($vm, [nc -l -k 1235], [nc-$vm-$ip-tcp.pid]) >>> + >>> + # Ensure that vm can be pinged on the specified IP >>> + NS_CHECK_EXEC([alice1], [ping -q -c 3 -i 0.3 -w 2 $ip | >>> FORMAT_PING], \ >>> + [0], [dnl >>> +3 packets transmitted, 3 received, 0% packet loss, time 0ms >>> +]) >>> + >>> + # Perform two consecutive UDP connections to the specified IP >>> + NS_CHECK_EXEC([alice1], [nc -u $ip 1234 -p 2000 -z]) >>> + NS_CHECK_EXEC([alice1], [nc -u $ip 1234 -p 2000 -z]) >>> + >>> + # Send data over TCP connection to the specified IP >>> + NS_CHECK_EXEC([alice1], [echo "TCP test" | nc --send-only $ip 1235]) >>> +} >>> + >>> +# Test access from external network to the internal IP of a VM that >>> +# has also configured DNAT >>> +test_connectivity_from_ext foo1 192.168.1.2 >>> + >>> +# Test access from external network to the internal IP of a VM that >>> +# does not have DNAT >>> +test_connectivity_from_ext bar1 192.168.2.2 >>> + >>> OVS_WAIT_UNTIL([ >>> total_pkts=$(cat ext-net.pcap | wc -l) >>> test "${total_pkts}" = "3" >>> @@ -3738,6 +3772,39 @@ sed -e 's/zone=[[0-9]]*/zone=<cleared>/'], [0], >>> [dnl >>> >>> icmpv6,orig=(src=fd12::2,dst=fd20::2,id=<cleared>,type=128,code=0),reply=(src=fd20::2,dst=fd20::1,id=<cleared>,type=129,code=0),zone=<cleared> >>> ]) >>> >>> +# test_connectivity_from_ext takes parameters 'vm' and 'ip'. It tests >>> +# icmp, udp and tcp connectivity from external network to the 'vm' on >>> +# the specified 'ip'. >>> +test_connectivity_from_ext() { >>> + local vm=$1; shift >>> + local ip=$1; shift >>> + >>> + # Start listening daemons for UDP and TCP connections >>> + NETNS_DAEMONIZE($vm, [nc -l -u 1234], [nc-$vm-$ip-udp.pid]) >>> + NETNS_DAEMONIZE($vm, [nc -l -k 1235], [nc-$vm-$ip-tcp.pid]) >>> + >>> + # Ensure that vm can be pinged on the specified IP >>> + NS_CHECK_EXEC([alice1], [ping -q -c 3 -i 0.3 -w 2 $ip | >>> FORMAT_PING], \ >>> + [0], [dnl >>> +3 packets transmitted, 3 received, 0% packet loss, time 0ms >>> +]) >>> + >>> + # Perform two consecutive UDP connections to the specified IP >>> + NS_CHECK_EXEC([alice1], [nc -u $ip 1234 -p 2000 -z]) >>> + NS_CHECK_EXEC([alice1], [nc -u $ip 1234 -p 2000 -z]) >>> + >>> + # Send data over TCP connection to the specified IP >>> + NS_CHECK_EXEC([alice1], [echo "TCP test" | nc --send-only $ip 1235]) >>> +} >>> + >>> +# Test access from external network to the internal IP of a VM that >>> +# has also configured DNAT >>> +test_connectivity_from_ext foo1 fd11::2 >>> + >>> +# Test access from external network to the internal IP of a VM that >>> +# does not have DNAT >>> +test_connectivity_from_ext bar1 fd12::2 >>> + >>> OVS_APP_EXIT_AND_WAIT([ovn-controller]) >>> >>> as ovn-sb >>> @@ -3918,6 +3985,7 @@ NS_CHECK_EXEC([foo2], [ping -q -c 3 -i 0.3 -w 2 >>> 172.16.1.4 | FORMAT_PING], \ >>> AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep icmp | >>> FORMAT_CT(172.16.1.1) | \ >>> sed -e 's/zone=[[0-9]]*/zone=<cleared>/'], [0], [dnl >>> >>> icmp,orig=(src=172.16.1.1,dst=172.16.1.4,id=<cleared>,type=8,code=0),reply=(src=192.168.2.2,dst=172.16.1.1,id=<cleared>,type=0,code=0),zone=<cleared> >>> >>> +icmp,orig=(src=172.16.1.1,dst=192.168.2.2,id=<cleared>,type=8,code=0),reply=(src=192.168.2.2,dst=172.16.1.1,id=<cleared>,type=0,code=0),zone=<cleared> >>> >>> icmp,orig=(src=192.168.1.3,dst=172.16.1.4,id=<cleared>,type=8,code=0),reply=(src=172.16.1.4,dst=172.16.1.1,id=<cleared>,type=0,code=0),zone=<cleared> >>> ]) >>> >>> @@ -4086,6 +4154,7 @@ NS_CHECK_EXEC([foo2], [ping -q -c 3 -i 0.3 -w 2 >>> fd20::4 | FORMAT_PING], \ >>> AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(fd20::1) | \ >>> sed -e 's/zone=[[0-9]]*/zone=<cleared>/'], [0], [dnl >>> >>> icmpv6,orig=(src=fd11::3,dst=fd20::4,id=<cleared>,type=128,code=0),reply=(src=fd20::4,dst=fd20::1,id=<cleared>,type=129,code=0),zone=<cleared> >>> >>> +icmpv6,orig=(src=fd20::1,dst=fd12::2,id=<cleared>,type=128,code=0),reply=(src=fd12::2,dst=fd20::1,id=<cleared>,type=129,code=0),zone=<cleared> >>> >>> icmpv6,orig=(src=fd20::1,dst=fd20::4,id=<cleared>,type=128,code=0),reply=(src=fd12::2,dst=fd20::1,id=<cleared>,type=129,code=0),zone=<cleared> >>> ]) >>> >>> -- >>> 2.40.1 >>> >>> _______________________________________________ >>> dev mailing list >>> dev@openvswitch.org >>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev >>> >>> >> One thing that slightly worries me is the potential impact on traffic >> performance. Basically anything going towards the SNAT would go through CT >> at least once. >> > Am I correct to assume that the "expensive" operation here is only the `ct_commit`? If so, then only "new" traffic towards SNAT would be something "additional" that goes through CT and "reply" traffic should be unaffected (performance-wise), correct? > >> Thanks, >> Ales >> >> -- >> >> Ales Musil >> >> Senior Software Engineer - OVN Core >> >> Red Hat EMEA <https://www.redhat.com> >> >> amusil@redhat.com >> <https://red.ht/sig> >> > > Forgot to add the link. > > [0] > https://github.com/ovn-org/ovn/blob/b92ad9e0b408a202273d69ba573f2538e53c6e48/northd/northd.c#L14998-L15009 > > -- > > Ales Musil > > Senior Software Engineer - OVN Core > > Red Hat EMEA <https://www.redhat.com> > > amusil@redhat.com > <https://red.ht/sig> >
On Wed, Mar 6, 2024 at 1:50 PM Martin Kalcok <martin.kalcok@canonical.com> wrote: > > Thanks for reviewing this as well, Ales. > > On Wed, Mar 6, 2024 at 8:08 AM Ales Musil <amusil@redhat.com> wrote: > >> >> >> On Wed, Mar 6, 2024 at 8:07 AM Ales Musil <amusil@redhat.com> wrote: >> >>> >>> >>> On Mon, Mar 4, 2024 at 11:56 AM Martin Kalcok < >>> martin.kalcok@canonical.com> wrote: >>> >>>> This change fixes bug that breaks ability of machines from external >>>> networks to communicate with machines in SNATed networks (specifically >>>> when using a Distributed router). >>>> >>>> Currently when a machine (S1) on an external network tries to talk >>>> over TCP with a machine (A1) in a network that has enabled SNAT, the >>>> connection is established successfully. However after the three-way >>>> handshake, any packets that come from the A1 machine will have their >>>> source address translated by the Distributed router, breaking the >>>> communication. >>>> >>>> Existing rule in `build_lrouter_out_snat_flow` that decides which >>>> packets should be SNATed already tries to avoid SNATing packets in >>>> reply direction with `(!ct.trk || !ct.rpl)`. However, previous stages >>>> in the distributed LR egress pipeline do not initiate the CT state. >>>> >>>> Additionally we need to commit new connections that originate from >>>> external networks into CT, so that the packets in the reply direction >>>> can be properly identified. >>>> >>>> Rationale: >>>> >>>> In my original RFC [0], there were questions about the motivation for >>>> fixing this issue. I'll try to summarize why I think this is a bug >>>> that should be fixed. >>>> >>>> 1. Current implementation for Distributed router already tries to >>>> avoid SNATing packets in the reply direction, it's just missing >>>> initialized CT states to make proper decisions. >>>> >>>> 2. This same scenario works with Gateway Router. I tested with >>>> following setup: >>>> >>>> foo -- R1 -- join - R3 -- alice >>>> | >>>> bar ----------R2 >>>> >>>> R1 is a Distributed router with SNAT for foo. R2 is a Gateway >>>> router with SNAT for bar. R3 is a Gateway router with no SNAT. >>>> Using 'alice1' as a client I was able to talk over TCP with >>>> 'bar1' but connection with 'foo1' failed. >>>> >>>> 3. Regarding security and "leaking" of internal IPs. Reading through >>>> RFC 4787 [1], 5382 [2] and their update in 7857 [3], the >>>> specifications do not seem to mandate that SNAT implementations >>>> must filter incoming traffic destined directly to the internal >>>> network. Sections like "5. Filtering Behavior" in 4787 and >>>> "4.3 Externally Initiated Connections" in 5382 describe only >>>> behavior for traffic destined to external IP/Port associated >>>> with NAT on the device that performs NAT. >>>> >>>> Besides, with the current implementation, it's already possible >>>> to scan the internal network with pings and TCP syn scanning. >>>> >>>> If an additional security is needed, ACLs can be used to filter >>>> out incomming traffic from external networks. >>>> >>>> 4. We do have customers/clouds that depend on this functionality. >>>> This is a scenario that used to work in Openstack with ML2/OVS >>>> and migrating those clouds to ML2/OVN would break it. >>>> >>>> [0] >>>> https://mail.openvswitch.org/pipermail/ovs-dev/2024-February/411670.html >>>> [1]https://datatracker.ietf.org/doc/html/rfc4787 >>>> [2]https://datatracker.ietf.org/doc/html/rfc5382 >>>> [3]https://datatracker.ietf.org/doc/html/rfc7857 >>>> >>>> Signed-off-by: Martin Kalcok <martin.kalcok@canonical.com> >>>> --- >>>> >>> >>> Hi Martin, >>> >>> I have a few comments down below. >>> >>> >>>> northd/northd.c | 82 ++++++++++++++++++++++++++++++++++++----- >>>> northd/ovn-northd.8.xml | 29 +++++++++++++++ >>>> tests/ovn-northd.at | 15 +++++++- >>>> tests/system-ovn.at | 69 ++++++++++++++++++++++++++++++++++ >>>> 4 files changed, 185 insertions(+), 10 deletions(-) >>>> >>>> diff --git a/northd/northd.c b/northd/northd.c >>>> index 2c3560ce2..4b79b357c 100644 >>>> --- a/northd/northd.c >>>> +++ b/northd/northd.c >>>> @@ -14437,21 +14437,28 @@ build_lrouter_out_is_dnat_local(struct >>>> lflow_table *lflows, >>>> } >>>> >>>> static void >>>> -build_lrouter_out_snat_match(struct lflow_table *lflows, >>>> - const struct ovn_datapath *od, >>>> - const struct nbrec_nat *nat, struct ds >>>> *match, >>>> - bool distributed_nat, int cidr_bits, bool >>>> is_v6, >>>> - struct ovn_port *l3dgw_port, >>>> - struct lflow_ref *lflow_ref) >>>> +build_lrouter_out_snat_direction_match__(struct lflow_table *lflows, >>>> + const struct ovn_datapath *od, >>>> + const struct nbrec_nat *nat, >>>> + struct ds *match, >>>> + bool distributed_nat, int >>>> cidr_bits, >>>> + bool is_v6, >>>> + struct ovn_port *l3dgw_port, >>>> + struct lflow_ref *lflow_ref, >>>> + bool is_reverse) >>>> { >>>> ds_clear(match); >>>> >>>> - ds_put_format(match, "ip && ip%c.src == %s", is_v6 ? '6' : '4', >>>> + ds_put_format(match, "ip && ip%c.%s == %s", >>>> + is_v6 ? '6' : '4', >>>> + is_reverse ? "dst" : "src", >>>> nat->logical_ip); >>>> >>>> if (!od->is_gw_router) { >>>> /* Distributed router. */ >>>> - ds_put_format(match, " && outport == %s", >>>> l3dgw_port->json_key); >>>> + ds_put_format(match, " && %s == %s", >>>> + is_reverse ? "inport" : "outport", >>>> + l3dgw_port->json_key); >>>> if (od->n_l3dgw_ports) { >>>> ds_put_format(match, " && is_chassis_resident(\"%s\")", >>>> distributed_nat >>>> @@ -14462,11 +14469,37 @@ build_lrouter_out_snat_match(struct >>>> lflow_table *lflows, >>>> >>>> if (nat->allowed_ext_ips || nat->exempted_ext_ips) { >>>> lrouter_nat_add_ext_ip_match(od, lflows, match, nat, >>>> - is_v6, false, cidr_bits, >>>> + is_v6, is_reverse, cidr_bits, >>>> lflow_ref); >>>> } >>>> } >>>> >>>> +static void >>>> +build_lrouter_out_snat_match(struct lflow_table *lflows, >>>> + const struct ovn_datapath *od, >>>> + const struct nbrec_nat *nat, struct ds >>>> *match, >>>> + bool distributed_nat, int cidr_bits, bool >>>> is_v6, >>>> + struct ovn_port *l3dgw_port, >>>> + struct lflow_ref *lflow_ref) >>>> +{ >>>> + build_lrouter_out_snat_direction_match__(lflows, od, nat, match, >>>> + distributed_nat, >>>> cidr_bits, is_v6, >>>> + l3dgw_port, lflow_ref, >>>> false); >>>> +} >>>> + >>>> +static void >>>> +build_lrouter_out_snat_reverse_match(struct lflow_table *lflows, >>>> + const struct ovn_datapath *od, >>>> + const struct nbrec_nat *nat, struct ds >>>> *match, >>>> + bool distributed_nat, int cidr_bits, bool >>>> is_v6, >>>> + struct ovn_port *l3dgw_port, >>>> + struct lflow_ref *lflow_ref) >>>> +{ >>>> + build_lrouter_out_snat_direction_match__(lflows, od, nat, match, >>>> + distributed_nat, >>>> cidr_bits, is_v6, >>>> + l3dgw_port, lflow_ref, >>>> true); >>>> +} >>>> + >>>> >>> >>> nit: I don't think we need two new functions when only one argument was >>> added. Considering the old function was used at 3 places only anyway. >>> >> > Ack. I went with this approach because I deemed it more cautious/less > intrusive, although bit more verbose. However If you are OK with changing > the function's signature, I can go with that as well. > > >> >>> >>>> static void >>>> build_lrouter_out_snat_stateless_flow(struct lflow_table *lflows, >>>> const struct ovn_datapath *od, >>>> @@ -14591,6 +14624,7 @@ build_lrouter_out_snat_flow(struct lflow_table >>>> *lflows, >>>> return; >>>> } >>>> >>>> + struct ds match_all_from_snat = DS_EMPTY_INITIALIZER; >>>> ds_clear(actions); >>>> >>>> /* The priority here is calculated such that the >>>> @@ -14600,6 +14634,7 @@ build_lrouter_out_snat_flow(struct lflow_table >>>> *lflows, >>>> >>>> build_lrouter_out_snat_match(lflows, od, nat, match, >>>> distributed_nat, >>>> cidr_bits, is_v6, l3dgw_port, >>>> lflow_ref); >>>> + ds_clone(&match_all_from_snat, match); >>>> >>>> if (!od->is_gw_router) { >>>> /* Distributed router. */ >>>> @@ -14624,6 +14659,35 @@ build_lrouter_out_snat_flow(struct lflow_table >>>> *lflows, >>>> priority, ds_cstr(match), >>>> ds_cstr(actions), &nat->header_, >>>> lflow_ref); >>>> + >>>> + /* For the SNAT networks, we need to make sure that connections are >>>> + * properly tracked so we can decide whether to perform SNAT on >>>> traffic >>>> + * exiting the network. */ >>>> + if (!strcmp(nat->type, "snat") && !od->is_gw_router) { >>>> >>> + /* For traffic that comes from SNAT network, initiate CT state >>>> before >>>> + * entering S_ROUTER_OUT_SNAT to allow matching on various CT >>>> states. >>>> + */ >>>> + ovn_lflow_add(lflows, od, S_ROUTER_OUT_POST_UNDNAT, 70, >>>> + ds_cstr(&match_all_from_snat), "ct_snat; ", >>>> + lflow_ref); >>>> >>> + >>>> + build_lrouter_out_snat_reverse_match(lflows, od, nat, match, >>>> + distributed_nat, >>>> cidr_bits, is_v6, >>>> + l3dgw_port, lflow_ref); >>>> + >>>> + /* New traffic that goes into SNAT network is committed to CT >>>> to avoid >>>> + * SNAT-ing replies.*/ >>>> + ovn_lflow_add(lflows, od, S_ROUTER_OUT_SNAT, priority, >>>> + ds_cstr(match), "ct_snat;", >>>> + lflow_ref); >>>> + >>>> + ds_put_cstr(match, "&& ct.new"); >>>> + ovn_lflow_add(lflows, od, S_ROUTER_OUT_POST_SNAT, priority, >>>> + ds_cstr(match), "ct_commit(snat); next; ", >>>> + lflow_ref); >>>> + } >>>> + >>>> + ds_destroy(&match_all_from_snat); >>>> >>> >>> I suppose that the gw router works because we have ct_commit for new >>> traffic [0]. I wonder if we could extend that with a check for the DGP case. >>> >> > Yes. I think so too. I tested removing the `is_gw_router` condition and it > fixed our use-case on the Distributed router. However it also broke tests > fro "stateless" NAT because there were calls to "ct_*" in the flows even if > stateless NAT was requested. I ultimately decided to go with this, more > targeted, approach rather than sending everything to conntrack. > > >> >>> >>>> } >>>> >>>> static void >>>> diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml >>>> index 17b414144..ce6fd1270 100644 >>>> --- a/northd/ovn-northd.8.xml >>>> +++ b/northd/ovn-northd.8.xml >>>> @@ -4869,6 +4869,13 @@ nd_ns { >>>> >>>> <p> >>>> <ul> >>>> + <li> >>>> + A priority-70 logical flow is added that initiates CT state >>>> for >>>> + traffic that is configured to be SNATed on Distributed >>>> routers. >>>> + This allows the next table, <code>lr_out_snat</code>, to >>>> + effectively match on various CT states. >>>> + </li> >>>> + >>>> <li> >>>> A priority-50 logical flow is added that commits any >>>> untracked flows >>>> from the previous table <code>lr_out_undnat</code> for >>>> Gateway >>>> @@ -5061,6 +5068,18 @@ nd_ns { >>>> >>>> </li> >>>> >>>> + <li> >>>> + An additional flow is added for traffic that goes in opposite >>>> + direction (i.e. it enters a network with configured SNAT). >>>> Where the >>>> + flow above matched on <code>ip4.src == <var>A</var> && >>>> outport >>>> + == <var>GW</var></code>, this flow matches on <code> ip4.dst == >>>> + <var>A</var> && inport == <var>GW</var></code>. A CT >>>> state is >>>> + initiated for this traffic so that the following table, <code> >>>> + lr_out_post_snat</code>, can identify whether the traffic flow >>>> was >>>> + initiated from the internal or external network. >>>> + >>>> + </li> >>>> + >>>> <li> >>>> A priority-0 logical flow with match <code>1</code> has actions >>>> <code>next;</code>. >>>> @@ -5072,6 +5091,16 @@ nd_ns { >>>> <p> >>>> Packets reaching this table are processed according to the flows >>>> below: >>>> <ul> >>>> + <li> >>>> + Traffic that goes directly into a network configured with >>>> SNAT on >>>> + Distributed routers, and was initiated from an external >>>> network (i.e. >>>> + it matches <code>ct.new</code>), is committed to the SNAT CT >>>> zone. >>>> + This ensures that replies returning from the SNATed network >>>> do not >>>> + have their source address translated. For details about >>>> match rules >>>> + and priority see section "Egress Table 3: SNAT on Distributed >>>> + Routers". >>>> + </li> >>>> + >>>> <li> >>>> A priority-0 logical flow that matches all packets not >>>> already >>>> handled (match <code>1</code>) and action <code>next;</code>. >>>> diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at >>>> index 0732486f3..a7597a975 100644 >>>> --- a/tests/ovn-northd.at >>>> +++ b/tests/ovn-northd.at >>>> @@ -1155,6 +1155,7 @@ AT_CAPTURE_FILE([crflows]) >>>> AT_CHECK([grep -e "lr_out_snat" drflows | ovn_strip_lflows], [0], [dnl >>>> table=??(lr_out_snat ), priority=0 , match=(1), >>>> action=(next;) >>>> table=??(lr_out_snat ), priority=120 , match=(nd_ns), >>>> action=(next;) >>>> + table=??(lr_out_snat ), priority=161 , match=(ip && ip4.dst >>>> == 50.0.0.11 && inport == "DR-S1" && is_chassis_resident("cr-DR-S1") && >>>> ip4.src == $allowed_range), action=(ct_snat;) >>>> table=??(lr_out_snat ), priority=161 , match=(ip && ip4.src >>>> == 50.0.0.11 && outport == "DR-S1" && is_chassis_resident("cr-DR-S1") && >>>> ip4.dst == $allowed_range && (!ct.trk || !ct.rpl)), >>>> action=(ct_snat(172.16.1.1);) >>>> ]) >>>> >>>> @@ -1185,6 +1186,7 @@ AT_CAPTURE_FILE([crflows2]) >>>> AT_CHECK([grep -e "lr_out_snat" drflows2 | ovn_strip_lflows], [0], [dnl >>>> table=??(lr_out_snat ), priority=0 , match=(1), >>>> action=(next;) >>>> table=??(lr_out_snat ), priority=120 , match=(nd_ns), >>>> action=(next;) >>>> + table=??(lr_out_snat ), priority=161 , match=(ip && ip4.dst >>>> == 50.0.0.11 && inport == "DR-S1" && is_chassis_resident("cr-DR-S1")), >>>> action=(ct_snat;) >>>> table=??(lr_out_snat ), priority=161 , match=(ip && ip4.src >>>> == 50.0.0.11 && outport == "DR-S1" && is_chassis_resident("cr-DR-S1") && >>>> (!ct.trk || !ct.rpl)), action=(ct_snat(172.16.1.1);) >>>> table=??(lr_out_snat ), priority=163 , match=(ip && ip4.src >>>> == 50.0.0.11 && outport == "DR-S1" && is_chassis_resident("cr-DR-S1") && >>>> ip4.dst == $disallowed_range), action=(next;) >>>> ]) >>>> @@ -1279,7 +1281,7 @@ AT_CHECK([grep -e "lr_out_snat" crflows5 | >>>> ovn_strip_lflows], [0], [dnl >>>> table=??(lr_out_snat ), priority=33 , match=(ip && ip4.src >>>> == 50.0.0.11 && ip4.dst == $allowed_range), action=(ip4.src=172.16.1.2; >>>> next;) >>>> ]) >>>> >>>> -# Stateful FIP with DISALLOWED_IPs >>>> +# Stateless FIP with DISALLOWED_IPs >>>> ovn-nbctl lr-nat-del DR dnat_and_snat 172.16.1.2 >>>> ovn-nbctl lr-nat-del CR dnat_and_snat 172.16.1.2 >>>> >>>> @@ -5590,12 +5592,16 @@ AT_CHECK([grep "lr_out_undnat" lr0flows | >>>> ovn_strip_lflows], [0], [dnl >>>> >>>> AT_CHECK([grep "lr_out_post_undnat" lr0flows | ovn_strip_lflows], [0], >>>> [dnl >>>> table=??(lr_out_post_undnat ), priority=0 , match=(1), >>>> action=(next;) >>>> + table=??(lr_out_post_undnat ), priority=70 , match=(ip && ip4.src >>>> == 10.0.0.0/24 && outport == "lr0-public" && >>>> is_chassis_resident("cr-lr0-public")), action=(ct_snat; ) >>>> + table=??(lr_out_post_undnat ), priority=70 , match=(ip && ip4.src >>>> == 10.0.0.10 && outport == "lr0-public" && >>>> is_chassis_resident("cr-lr0-public")), action=(ct_snat; ) >>>> ]) >>>> >>>> AT_CHECK([grep "lr_out_snat" lr0flows | ovn_strip_lflows], [0], [dnl >>>> table=??(lr_out_snat ), priority=0 , match=(1), >>>> action=(next;) >>>> table=??(lr_out_snat ), priority=120 , match=(nd_ns), >>>> action=(next;) >>>> + table=??(lr_out_snat ), priority=153 , match=(ip && ip4.dst >>>> == 10.0.0.0/24 && inport == "lr0-public" && >>>> is_chassis_resident("cr-lr0-public")), action=(ct_snat;) >>>> table=??(lr_out_snat ), priority=153 , match=(ip && ip4.src >>>> == 10.0.0.0/24 && outport == "lr0-public" && >>>> is_chassis_resident("cr-lr0-public") && (!ct.trk || !ct.rpl)), >>>> action=(ct_snat(172.168.0.10);) >>>> + table=??(lr_out_snat ), priority=161 , match=(ip && ip4.dst >>>> == 10.0.0.10 && inport == "lr0-public" && >>>> is_chassis_resident("cr-lr0-public")), action=(ct_snat;) >>>> table=??(lr_out_snat ), priority=161 , match=(ip && ip4.src >>>> == 10.0.0.10 && outport == "lr0-public" && >>>> is_chassis_resident("cr-lr0-public") && (!ct.trk || !ct.rpl)), >>>> action=(ct_snat(172.168.0.30);) >>>> table=??(lr_out_snat ), priority=161 , match=(ip && ip4.src >>>> == 10.0.0.3 && outport == "lr0-public" && >>>> is_chassis_resident("cr-lr0-public") && (!ct.trk || !ct.rpl)), >>>> action=(ct_snat(172.168.0.20);) >>>> ]) >>>> @@ -5738,12 +5744,16 @@ AT_CHECK([grep "lr_out_undnat" lr0flows | >>>> ovn_strip_lflows], [0], [dnl >>>> >>>> AT_CHECK([grep "lr_out_post_undnat" lr0flows | ovn_strip_lflows], [0], >>>> [dnl >>>> table=??(lr_out_post_undnat ), priority=0 , match=(1), >>>> action=(next;) >>>> + table=??(lr_out_post_undnat ), priority=70 , match=(ip && ip4.src >>>> == 10.0.0.0/24 && outport == "lr0-public" && >>>> is_chassis_resident("cr-lr0-public")), action=(ct_snat; ) >>>> + table=??(lr_out_post_undnat ), priority=70 , match=(ip && ip4.src >>>> == 10.0.0.10 && outport == "lr0-public" && >>>> is_chassis_resident("cr-lr0-public")), action=(ct_snat; ) >>>> ]) >>>> >>>> AT_CHECK([grep "lr_out_snat" lr0flows | ovn_strip_lflows], [0], [dnl >>>> table=??(lr_out_snat ), priority=0 , match=(1), >>>> action=(next;) >>>> table=??(lr_out_snat ), priority=120 , match=(nd_ns), >>>> action=(next;) >>>> + table=??(lr_out_snat ), priority=153 , match=(ip && ip4.dst >>>> == 10.0.0.0/24 && inport == "lr0-public" && >>>> is_chassis_resident("cr-lr0-public")), action=(ct_snat;) >>>> table=??(lr_out_snat ), priority=153 , match=(ip && ip4.src >>>> == 10.0.0.0/24 && outport == "lr0-public" && >>>> is_chassis_resident("cr-lr0-public") && (!ct.trk || !ct.rpl)), >>>> action=(ct_snat(172.168.0.10);) >>>> + table=??(lr_out_snat ), priority=161 , match=(ip && ip4.dst >>>> == 10.0.0.10 && inport == "lr0-public" && >>>> is_chassis_resident("cr-lr0-public")), action=(ct_snat;) >>>> table=??(lr_out_snat ), priority=161 , match=(ip && ip4.src >>>> == 10.0.0.10 && outport == "lr0-public" && >>>> is_chassis_resident("cr-lr0-public") && (!ct.trk || !ct.rpl)), >>>> action=(ct_snat(172.168.0.30);) >>>> table=??(lr_out_snat ), priority=161 , match=(ip && ip4.src >>>> == 10.0.0.3 && outport == "lr0-public" && >>>> is_chassis_resident("cr-lr0-public") && (!ct.trk || !ct.rpl)), >>>> action=(ct_snat(172.168.0.20);) >>>> ]) >>>> @@ -7643,6 +7653,9 @@ AT_CHECK([grep lr_in_unsnat lrflows | grep >>>> ct_snat | ovn_strip_lflows], [0], [dn >>>> ]) >>>> >>>> AT_CHECK([grep lr_out_snat lrflows | grep ct_snat | ovn_strip_lflows], >>>> [0], [dnl >>>> + table=??(lr_out_snat ), priority=161 , match=(ip && ip4.dst >>>> == 20.0.0.10 && inport == "DR-S1" && is_chassis_resident("cr-DR-S1")), >>>> action=(ct_snat;) >>>> + table=??(lr_out_snat ), priority=161 , match=(ip && ip4.dst >>>> == 20.0.0.10 && inport == "DR-S2" && is_chassis_resident("cr-DR-S2")), >>>> action=(ct_snat;) >>>> + table=??(lr_out_snat ), priority=161 , match=(ip && ip4.dst >>>> == 20.0.0.10 && inport == "DR-S3" && is_chassis_resident("cr-DR-S3")), >>>> action=(ct_snat;) >>>> table=??(lr_out_snat ), priority=161 , match=(ip && ip4.src >>>> == 20.0.0.10 && outport == "DR-S1" && is_chassis_resident("cr-DR-S1") && >>>> (!ct.trk || !ct.rpl)), action=(ct_snat(172.16.1.10);) >>>> table=??(lr_out_snat ), priority=161 , match=(ip && ip4.src >>>> == 20.0.0.10 && outport == "DR-S2" && is_chassis_resident("cr-DR-S2") && >>>> (!ct.trk || !ct.rpl)), action=(ct_snat(10.0.0.10);) >>>> table=??(lr_out_snat ), priority=161 , match=(ip && ip4.src >>>> == 20.0.0.10 && outport == "DR-S3" && is_chassis_resident("cr-DR-S3") && >>>> (!ct.trk || !ct.rpl)), action=(ct_snat(192.168.0.10);) >>>> diff --git a/tests/system-ovn.at b/tests/system-ovn.at >>>> index c22c7882f..80e9b7d0b 100644 >>>> --- a/tests/system-ovn.at >>>> +++ b/tests/system-ovn.at >>>> @@ -3572,6 +3572,40 @@ NS_CHECK_EXEC([foo1], [ping -q -c 3 -i 0.3 -w 2 >>>> 10.0.0.1 | FORMAT_PING], \ >>>> 3 packets transmitted, 3 received, 0% packet loss, time 0ms >>>> ]) >>>> >>>> + >>>> +# test_connectivity_from_ext takes parameters 'vm' and 'ip'. It tests >>>> +# icmp, udp and tcp connectivity from external network to the 'vm' on >>>> +# the specified 'ip'. >>>> +test_connectivity_from_ext() { >>>> + local vm=$1; shift >>>> + local ip=$1; shift >>>> + >>>> + # Start listening daemons for UDP and TCP connections >>>> + NETNS_DAEMONIZE($vm, [nc -l -u 1234], [nc-$vm-$ip-udp.pid]) >>>> + NETNS_DAEMONIZE($vm, [nc -l -k 1235], [nc-$vm-$ip-tcp.pid]) >>>> + >>>> + # Ensure that vm can be pinged on the specified IP >>>> + NS_CHECK_EXEC([alice1], [ping -q -c 3 -i 0.3 -w 2 $ip | >>>> FORMAT_PING], \ >>>> + [0], [dnl >>>> +3 packets transmitted, 3 received, 0% packet loss, time 0ms >>>> +]) >>>> + >>>> + # Perform two consecutive UDP connections to the specified IP >>>> + NS_CHECK_EXEC([alice1], [nc -u $ip 1234 -p 2000 -z]) >>>> + NS_CHECK_EXEC([alice1], [nc -u $ip 1234 -p 2000 -z]) >>>> + >>>> + # Send data over TCP connection to the specified IP >>>> + NS_CHECK_EXEC([alice1], [echo "TCP test" | nc --send-only $ip >>>> 1235]) >>>> +} >>>> + >>>> +# Test access from external network to the internal IP of a VM that >>>> +# has also configured DNAT >>>> +test_connectivity_from_ext foo1 192.168.1.2 >>>> + >>>> +# Test access from external network to the internal IP of a VM that >>>> +# does not have DNAT >>>> +test_connectivity_from_ext bar1 192.168.2.2 >>>> + >>>> OVS_WAIT_UNTIL([ >>>> total_pkts=$(cat ext-net.pcap | wc -l) >>>> test "${total_pkts}" = "3" >>>> @@ -3738,6 +3772,39 @@ sed -e 's/zone=[[0-9]]*/zone=<cleared>/'], [0], >>>> [dnl >>>> >>>> icmpv6,orig=(src=fd12::2,dst=fd20::2,id=<cleared>,type=128,code=0),reply=(src=fd20::2,dst=fd20::1,id=<cleared>,type=129,code=0),zone=<cleared> >>>> ]) >>>> >>>> +# test_connectivity_from_ext takes parameters 'vm' and 'ip'. It tests >>>> +# icmp, udp and tcp connectivity from external network to the 'vm' on >>>> +# the specified 'ip'. >>>> +test_connectivity_from_ext() { >>>> + local vm=$1; shift >>>> + local ip=$1; shift >>>> + >>>> + # Start listening daemons for UDP and TCP connections >>>> + NETNS_DAEMONIZE($vm, [nc -l -u 1234], [nc-$vm-$ip-udp.pid]) >>>> + NETNS_DAEMONIZE($vm, [nc -l -k 1235], [nc-$vm-$ip-tcp.pid]) >>>> + >>>> + # Ensure that vm can be pinged on the specified IP >>>> + NS_CHECK_EXEC([alice1], [ping -q -c 3 -i 0.3 -w 2 $ip | >>>> FORMAT_PING], \ >>>> + [0], [dnl >>>> +3 packets transmitted, 3 received, 0% packet loss, time 0ms >>>> +]) >>>> + >>>> + # Perform two consecutive UDP connections to the specified IP >>>> + NS_CHECK_EXEC([alice1], [nc -u $ip 1234 -p 2000 -z]) >>>> + NS_CHECK_EXEC([alice1], [nc -u $ip 1234 -p 2000 -z]) >>>> + >>>> + # Send data over TCP connection to the specified IP >>>> + NS_CHECK_EXEC([alice1], [echo "TCP test" | nc --send-only $ip >>>> 1235]) >>>> +} >>>> + >>>> +# Test access from external network to the internal IP of a VM that >>>> +# has also configured DNAT >>>> +test_connectivity_from_ext foo1 fd11::2 >>>> + >>>> +# Test access from external network to the internal IP of a VM that >>>> +# does not have DNAT >>>> +test_connectivity_from_ext bar1 fd12::2 >>>> + >>>> OVS_APP_EXIT_AND_WAIT([ovn-controller]) >>>> >>>> as ovn-sb >>>> @@ -3918,6 +3985,7 @@ NS_CHECK_EXEC([foo2], [ping -q -c 3 -i 0.3 -w 2 >>>> 172.16.1.4 | FORMAT_PING], \ >>>> AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep icmp | >>>> FORMAT_CT(172.16.1.1) | \ >>>> sed -e 's/zone=[[0-9]]*/zone=<cleared>/'], [0], [dnl >>>> >>>> icmp,orig=(src=172.16.1.1,dst=172.16.1.4,id=<cleared>,type=8,code=0),reply=(src=192.168.2.2,dst=172.16.1.1,id=<cleared>,type=0,code=0),zone=<cleared> >>>> >>>> +icmp,orig=(src=172.16.1.1,dst=192.168.2.2,id=<cleared>,type=8,code=0),reply=(src=192.168.2.2,dst=172.16.1.1,id=<cleared>,type=0,code=0),zone=<cleared> >>>> >>>> icmp,orig=(src=192.168.1.3,dst=172.16.1.4,id=<cleared>,type=8,code=0),reply=(src=172.16.1.4,dst=172.16.1.1,id=<cleared>,type=0,code=0),zone=<cleared> >>>> ]) >>>> >>>> @@ -4086,6 +4154,7 @@ NS_CHECK_EXEC([foo2], [ping -q -c 3 -i 0.3 -w 2 >>>> fd20::4 | FORMAT_PING], \ >>>> AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(fd20::1) | \ >>>> sed -e 's/zone=[[0-9]]*/zone=<cleared>/'], [0], [dnl >>>> >>>> icmpv6,orig=(src=fd11::3,dst=fd20::4,id=<cleared>,type=128,code=0),reply=(src=fd20::4,dst=fd20::1,id=<cleared>,type=129,code=0),zone=<cleared> >>>> >>>> +icmpv6,orig=(src=fd20::1,dst=fd12::2,id=<cleared>,type=128,code=0),reply=(src=fd12::2,dst=fd20::1,id=<cleared>,type=129,code=0),zone=<cleared> >>>> >>>> icmpv6,orig=(src=fd20::1,dst=fd20::4,id=<cleared>,type=128,code=0),reply=(src=fd12::2,dst=fd20::1,id=<cleared>,type=129,code=0),zone=<cleared> >>>> ]) >>>> >>>> -- >>>> 2.40.1 >>>> >>>> _______________________________________________ >>>> dev mailing list >>>> dev@openvswitch.org >>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev >>>> >>>> >>> One thing that slightly worries me is the potential impact on traffic >>> performance. Basically anything going towards the SNAT would go through CT >>> at least once. >>> >> > Am I correct to assume that the "expensive" operation here is only the > `ct_commit`? If so, then only "new" traffic towards SNAT would be something > "additional" that goes through CT and "reply" traffic should be unaffected > (performance-wise), correct? > Actually not only that, any CT operation has its own cost, it also splits megaflows into smaller pieces because with every CT we have to recirculate. > > >> >>> Thanks, >>> Ales >>> >>> -- >>> >>> Ales Musil >>> >>> Senior Software Engineer - OVN Core >>> >>> Red Hat EMEA <https://www.redhat.com> >>> >>> amusil@redhat.com >>> <https://red.ht/sig> >>> >> >> Forgot to add the link. >> >> [0] >> https://github.com/ovn-org/ovn/blob/b92ad9e0b408a202273d69ba573f2538e53c6e48/northd/northd.c#L14998-L15009 >> >> -- >> >> Ales Musil >> >> Senior Software Engineer - OVN Core >> >> Red Hat EMEA <https://www.redhat.com> >> >> amusil@redhat.com >> <https://red.ht/sig> >> > > > -- > Best Regards, > Martin Kalcok. >
diff --git a/northd/northd.c b/northd/northd.c index 2c3560ce2..4b79b357c 100644 --- a/northd/northd.c +++ b/northd/northd.c @@ -14437,21 +14437,28 @@ build_lrouter_out_is_dnat_local(struct lflow_table *lflows, } static void -build_lrouter_out_snat_match(struct lflow_table *lflows, - const struct ovn_datapath *od, - const struct nbrec_nat *nat, struct ds *match, - bool distributed_nat, int cidr_bits, bool is_v6, - struct ovn_port *l3dgw_port, - struct lflow_ref *lflow_ref) +build_lrouter_out_snat_direction_match__(struct lflow_table *lflows, + const struct ovn_datapath *od, + const struct nbrec_nat *nat, + struct ds *match, + bool distributed_nat, int cidr_bits, + bool is_v6, + struct ovn_port *l3dgw_port, + struct lflow_ref *lflow_ref, + bool is_reverse) { ds_clear(match); - ds_put_format(match, "ip && ip%c.src == %s", is_v6 ? '6' : '4', + ds_put_format(match, "ip && ip%c.%s == %s", + is_v6 ? '6' : '4', + is_reverse ? "dst" : "src", nat->logical_ip); if (!od->is_gw_router) { /* Distributed router. */ - ds_put_format(match, " && outport == %s", l3dgw_port->json_key); + ds_put_format(match, " && %s == %s", + is_reverse ? "inport" : "outport", + l3dgw_port->json_key); if (od->n_l3dgw_ports) { ds_put_format(match, " && is_chassis_resident(\"%s\")", distributed_nat @@ -14462,11 +14469,37 @@ build_lrouter_out_snat_match(struct lflow_table *lflows, if (nat->allowed_ext_ips || nat->exempted_ext_ips) { lrouter_nat_add_ext_ip_match(od, lflows, match, nat, - is_v6, false, cidr_bits, + is_v6, is_reverse, cidr_bits, lflow_ref); } } +static void +build_lrouter_out_snat_match(struct lflow_table *lflows, + const struct ovn_datapath *od, + const struct nbrec_nat *nat, struct ds *match, + bool distributed_nat, int cidr_bits, bool is_v6, + struct ovn_port *l3dgw_port, + struct lflow_ref *lflow_ref) +{ + build_lrouter_out_snat_direction_match__(lflows, od, nat, match, + distributed_nat, cidr_bits, is_v6, + l3dgw_port, lflow_ref, false); +} + +static void +build_lrouter_out_snat_reverse_match(struct lflow_table *lflows, + const struct ovn_datapath *od, + const struct nbrec_nat *nat, struct ds *match, + bool distributed_nat, int cidr_bits, bool is_v6, + struct ovn_port *l3dgw_port, + struct lflow_ref *lflow_ref) +{ + build_lrouter_out_snat_direction_match__(lflows, od, nat, match, + distributed_nat, cidr_bits, is_v6, + l3dgw_port, lflow_ref, true); +} + static void build_lrouter_out_snat_stateless_flow(struct lflow_table *lflows, const struct ovn_datapath *od, @@ -14591,6 +14624,7 @@ build_lrouter_out_snat_flow(struct lflow_table *lflows, return; } + struct ds match_all_from_snat = DS_EMPTY_INITIALIZER; ds_clear(actions); /* The priority here is calculated such that the @@ -14600,6 +14634,7 @@ build_lrouter_out_snat_flow(struct lflow_table *lflows, build_lrouter_out_snat_match(lflows, od, nat, match, distributed_nat, cidr_bits, is_v6, l3dgw_port, lflow_ref); + ds_clone(&match_all_from_snat, match); if (!od->is_gw_router) { /* Distributed router. */ @@ -14624,6 +14659,35 @@ build_lrouter_out_snat_flow(struct lflow_table *lflows, priority, ds_cstr(match), ds_cstr(actions), &nat->header_, lflow_ref); + + /* For the SNAT networks, we need to make sure that connections are + * properly tracked so we can decide whether to perform SNAT on traffic + * exiting the network. */ + if (!strcmp(nat->type, "snat") && !od->is_gw_router) { + /* For traffic that comes from SNAT network, initiate CT state before + * entering S_ROUTER_OUT_SNAT to allow matching on various CT states. + */ + ovn_lflow_add(lflows, od, S_ROUTER_OUT_POST_UNDNAT, 70, + ds_cstr(&match_all_from_snat), "ct_snat; ", + lflow_ref); + + build_lrouter_out_snat_reverse_match(lflows, od, nat, match, + distributed_nat, cidr_bits, is_v6, + l3dgw_port, lflow_ref); + + /* New traffic that goes into SNAT network is committed to CT to avoid + * SNAT-ing replies.*/ + ovn_lflow_add(lflows, od, S_ROUTER_OUT_SNAT, priority, + ds_cstr(match), "ct_snat;", + lflow_ref); + + ds_put_cstr(match, "&& ct.new"); + ovn_lflow_add(lflows, od, S_ROUTER_OUT_POST_SNAT, priority, + ds_cstr(match), "ct_commit(snat); next; ", + lflow_ref); + } + + ds_destroy(&match_all_from_snat); } static void diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml index 17b414144..ce6fd1270 100644 --- a/northd/ovn-northd.8.xml +++ b/northd/ovn-northd.8.xml @@ -4869,6 +4869,13 @@ nd_ns { <p> <ul> + <li> + A priority-70 logical flow is added that initiates CT state for + traffic that is configured to be SNATed on Distributed routers. + This allows the next table, <code>lr_out_snat</code>, to + effectively match on various CT states. + </li> + <li> A priority-50 logical flow is added that commits any untracked flows from the previous table <code>lr_out_undnat</code> for Gateway @@ -5061,6 +5068,18 @@ nd_ns { </li> + <li> + An additional flow is added for traffic that goes in opposite + direction (i.e. it enters a network with configured SNAT). Where the + flow above matched on <code>ip4.src == <var>A</var> && outport + == <var>GW</var></code>, this flow matches on <code> ip4.dst == + <var>A</var> && inport == <var>GW</var></code>. A CT state is + initiated for this traffic so that the following table, <code> + lr_out_post_snat</code>, can identify whether the traffic flow was + initiated from the internal or external network. + + </li> + <li> A priority-0 logical flow with match <code>1</code> has actions <code>next;</code>. @@ -5072,6 +5091,16 @@ nd_ns { <p> Packets reaching this table are processed according to the flows below: <ul> + <li> + Traffic that goes directly into a network configured with SNAT on + Distributed routers, and was initiated from an external network (i.e. + it matches <code>ct.new</code>), is committed to the SNAT CT zone. + This ensures that replies returning from the SNATed network do not + have their source address translated. For details about match rules + and priority see section "Egress Table 3: SNAT on Distributed + Routers". + </li> + <li> A priority-0 logical flow that matches all packets not already handled (match <code>1</code>) and action <code>next;</code>. diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at index 0732486f3..a7597a975 100644 --- a/tests/ovn-northd.at +++ b/tests/ovn-northd.at @@ -1155,6 +1155,7 @@ AT_CAPTURE_FILE([crflows]) AT_CHECK([grep -e "lr_out_snat" drflows | ovn_strip_lflows], [0], [dnl table=??(lr_out_snat ), priority=0 , match=(1), action=(next;) table=??(lr_out_snat ), priority=120 , match=(nd_ns), action=(next;) + table=??(lr_out_snat ), priority=161 , match=(ip && ip4.dst == 50.0.0.11 && inport == "DR-S1" && is_chassis_resident("cr-DR-S1") && ip4.src == $allowed_range), action=(ct_snat;) table=??(lr_out_snat ), priority=161 , match=(ip && ip4.src == 50.0.0.11 && outport == "DR-S1" && is_chassis_resident("cr-DR-S1") && ip4.dst == $allowed_range && (!ct.trk || !ct.rpl)), action=(ct_snat(172.16.1.1);) ]) @@ -1185,6 +1186,7 @@ AT_CAPTURE_FILE([crflows2]) AT_CHECK([grep -e "lr_out_snat" drflows2 | ovn_strip_lflows], [0], [dnl table=??(lr_out_snat ), priority=0 , match=(1), action=(next;) table=??(lr_out_snat ), priority=120 , match=(nd_ns), action=(next;) + table=??(lr_out_snat ), priority=161 , match=(ip && ip4.dst == 50.0.0.11 && inport == "DR-S1" && is_chassis_resident("cr-DR-S1")), action=(ct_snat;) table=??(lr_out_snat ), priority=161 , match=(ip && ip4.src == 50.0.0.11 && outport == "DR-S1" && is_chassis_resident("cr-DR-S1") && (!ct.trk || !ct.rpl)), action=(ct_snat(172.16.1.1);) table=??(lr_out_snat ), priority=163 , match=(ip && ip4.src == 50.0.0.11 && outport == "DR-S1" && is_chassis_resident("cr-DR-S1") && ip4.dst == $disallowed_range), action=(next;) ]) @@ -1279,7 +1281,7 @@ AT_CHECK([grep -e "lr_out_snat" crflows5 | ovn_strip_lflows], [0], [dnl table=??(lr_out_snat ), priority=33 , match=(ip && ip4.src == 50.0.0.11 && ip4.dst == $allowed_range), action=(ip4.src=172.16.1.2; next;) ]) -# Stateful FIP with DISALLOWED_IPs +# Stateless FIP with DISALLOWED_IPs ovn-nbctl lr-nat-del DR dnat_and_snat 172.16.1.2 ovn-nbctl lr-nat-del CR dnat_and_snat 172.16.1.2 @@ -5590,12 +5592,16 @@ AT_CHECK([grep "lr_out_undnat" lr0flows | ovn_strip_lflows], [0], [dnl AT_CHECK([grep "lr_out_post_undnat" lr0flows | ovn_strip_lflows], [0], [dnl table=??(lr_out_post_undnat ), priority=0 , match=(1), action=(next;) + table=??(lr_out_post_undnat ), priority=70 , match=(ip && ip4.src == 10.0.0.0/24 && outport == "lr0-public" && is_chassis_resident("cr-lr0-public")), action=(ct_snat; ) + table=??(lr_out_post_undnat ), priority=70 , match=(ip && ip4.src == 10.0.0.10 && outport == "lr0-public" && is_chassis_resident("cr-lr0-public")), action=(ct_snat; ) ]) AT_CHECK([grep "lr_out_snat" lr0flows | ovn_strip_lflows], [0], [dnl table=??(lr_out_snat ), priority=0 , match=(1), action=(next;) table=??(lr_out_snat ), priority=120 , match=(nd_ns), action=(next;) + table=??(lr_out_snat ), priority=153 , match=(ip && ip4.dst == 10.0.0.0/24 && inport == "lr0-public" && is_chassis_resident("cr-lr0-public")), action=(ct_snat;) table=??(lr_out_snat ), priority=153 , match=(ip && ip4.src == 10.0.0.0/24 && outport == "lr0-public" && is_chassis_resident("cr-lr0-public") && (!ct.trk || !ct.rpl)), action=(ct_snat(172.168.0.10);) + table=??(lr_out_snat ), priority=161 , match=(ip && ip4.dst == 10.0.0.10 && inport == "lr0-public" && is_chassis_resident("cr-lr0-public")), action=(ct_snat;) table=??(lr_out_snat ), priority=161 , match=(ip && ip4.src == 10.0.0.10 && outport == "lr0-public" && is_chassis_resident("cr-lr0-public") && (!ct.trk || !ct.rpl)), action=(ct_snat(172.168.0.30);) table=??(lr_out_snat ), priority=161 , match=(ip && ip4.src == 10.0.0.3 && outport == "lr0-public" && is_chassis_resident("cr-lr0-public") && (!ct.trk || !ct.rpl)), action=(ct_snat(172.168.0.20);) ]) @@ -5738,12 +5744,16 @@ AT_CHECK([grep "lr_out_undnat" lr0flows | ovn_strip_lflows], [0], [dnl AT_CHECK([grep "lr_out_post_undnat" lr0flows | ovn_strip_lflows], [0], [dnl table=??(lr_out_post_undnat ), priority=0 , match=(1), action=(next;) + table=??(lr_out_post_undnat ), priority=70 , match=(ip && ip4.src == 10.0.0.0/24 && outport == "lr0-public" && is_chassis_resident("cr-lr0-public")), action=(ct_snat; ) + table=??(lr_out_post_undnat ), priority=70 , match=(ip && ip4.src == 10.0.0.10 && outport == "lr0-public" && is_chassis_resident("cr-lr0-public")), action=(ct_snat; ) ]) AT_CHECK([grep "lr_out_snat" lr0flows | ovn_strip_lflows], [0], [dnl table=??(lr_out_snat ), priority=0 , match=(1), action=(next;) table=??(lr_out_snat ), priority=120 , match=(nd_ns), action=(next;) + table=??(lr_out_snat ), priority=153 , match=(ip && ip4.dst == 10.0.0.0/24 && inport == "lr0-public" && is_chassis_resident("cr-lr0-public")), action=(ct_snat;) table=??(lr_out_snat ), priority=153 , match=(ip && ip4.src == 10.0.0.0/24 && outport == "lr0-public" && is_chassis_resident("cr-lr0-public") && (!ct.trk || !ct.rpl)), action=(ct_snat(172.168.0.10);) + table=??(lr_out_snat ), priority=161 , match=(ip && ip4.dst == 10.0.0.10 && inport == "lr0-public" && is_chassis_resident("cr-lr0-public")), action=(ct_snat;) table=??(lr_out_snat ), priority=161 , match=(ip && ip4.src == 10.0.0.10 && outport == "lr0-public" && is_chassis_resident("cr-lr0-public") && (!ct.trk || !ct.rpl)), action=(ct_snat(172.168.0.30);) table=??(lr_out_snat ), priority=161 , match=(ip && ip4.src == 10.0.0.3 && outport == "lr0-public" && is_chassis_resident("cr-lr0-public") && (!ct.trk || !ct.rpl)), action=(ct_snat(172.168.0.20);) ]) @@ -7643,6 +7653,9 @@ AT_CHECK([grep lr_in_unsnat lrflows | grep ct_snat | ovn_strip_lflows], [0], [dn ]) AT_CHECK([grep lr_out_snat lrflows | grep ct_snat | ovn_strip_lflows], [0], [dnl + table=??(lr_out_snat ), priority=161 , match=(ip && ip4.dst == 20.0.0.10 && inport == "DR-S1" && is_chassis_resident("cr-DR-S1")), action=(ct_snat;) + table=??(lr_out_snat ), priority=161 , match=(ip && ip4.dst == 20.0.0.10 && inport == "DR-S2" && is_chassis_resident("cr-DR-S2")), action=(ct_snat;) + table=??(lr_out_snat ), priority=161 , match=(ip && ip4.dst == 20.0.0.10 && inport == "DR-S3" && is_chassis_resident("cr-DR-S3")), action=(ct_snat;) table=??(lr_out_snat ), priority=161 , match=(ip && ip4.src == 20.0.0.10 && outport == "DR-S1" && is_chassis_resident("cr-DR-S1") && (!ct.trk || !ct.rpl)), action=(ct_snat(172.16.1.10);) table=??(lr_out_snat ), priority=161 , match=(ip && ip4.src == 20.0.0.10 && outport == "DR-S2" && is_chassis_resident("cr-DR-S2") && (!ct.trk || !ct.rpl)), action=(ct_snat(10.0.0.10);) table=??(lr_out_snat ), priority=161 , match=(ip && ip4.src == 20.0.0.10 && outport == "DR-S3" && is_chassis_resident("cr-DR-S3") && (!ct.trk || !ct.rpl)), action=(ct_snat(192.168.0.10);) diff --git a/tests/system-ovn.at b/tests/system-ovn.at index c22c7882f..80e9b7d0b 100644 --- a/tests/system-ovn.at +++ b/tests/system-ovn.at @@ -3572,6 +3572,40 @@ NS_CHECK_EXEC([foo1], [ping -q -c 3 -i 0.3 -w 2 10.0.0.1 | FORMAT_PING], \ 3 packets transmitted, 3 received, 0% packet loss, time 0ms ]) + +# test_connectivity_from_ext takes parameters 'vm' and 'ip'. It tests +# icmp, udp and tcp connectivity from external network to the 'vm' on +# the specified 'ip'. +test_connectivity_from_ext() { + local vm=$1; shift + local ip=$1; shift + + # Start listening daemons for UDP and TCP connections + NETNS_DAEMONIZE($vm, [nc -l -u 1234], [nc-$vm-$ip-udp.pid]) + NETNS_DAEMONIZE($vm, [nc -l -k 1235], [nc-$vm-$ip-tcp.pid]) + + # Ensure that vm can be pinged on the specified IP + NS_CHECK_EXEC([alice1], [ping -q -c 3 -i 0.3 -w 2 $ip | FORMAT_PING], \ + [0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) + + # Perform two consecutive UDP connections to the specified IP + NS_CHECK_EXEC([alice1], [nc -u $ip 1234 -p 2000 -z]) + NS_CHECK_EXEC([alice1], [nc -u $ip 1234 -p 2000 -z]) + + # Send data over TCP connection to the specified IP + NS_CHECK_EXEC([alice1], [echo "TCP test" | nc --send-only $ip 1235]) +} + +# Test access from external network to the internal IP of a VM that +# has also configured DNAT +test_connectivity_from_ext foo1 192.168.1.2 + +# Test access from external network to the internal IP of a VM that +# does not have DNAT +test_connectivity_from_ext bar1 192.168.2.2 + OVS_WAIT_UNTIL([ total_pkts=$(cat ext-net.pcap | wc -l) test "${total_pkts}" = "3" @@ -3738,6 +3772,39 @@ sed -e 's/zone=[[0-9]]*/zone=<cleared>/'], [0], [dnl icmpv6,orig=(src=fd12::2,dst=fd20::2,id=<cleared>,type=128,code=0),reply=(src=fd20::2,dst=fd20::1,id=<cleared>,type=129,code=0),zone=<cleared> ]) +# test_connectivity_from_ext takes parameters 'vm' and 'ip'. It tests +# icmp, udp and tcp connectivity from external network to the 'vm' on +# the specified 'ip'. +test_connectivity_from_ext() { + local vm=$1; shift + local ip=$1; shift + + # Start listening daemons for UDP and TCP connections + NETNS_DAEMONIZE($vm, [nc -l -u 1234], [nc-$vm-$ip-udp.pid]) + NETNS_DAEMONIZE($vm, [nc -l -k 1235], [nc-$vm-$ip-tcp.pid]) + + # Ensure that vm can be pinged on the specified IP + NS_CHECK_EXEC([alice1], [ping -q -c 3 -i 0.3 -w 2 $ip | FORMAT_PING], \ + [0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) + + # Perform two consecutive UDP connections to the specified IP + NS_CHECK_EXEC([alice1], [nc -u $ip 1234 -p 2000 -z]) + NS_CHECK_EXEC([alice1], [nc -u $ip 1234 -p 2000 -z]) + + # Send data over TCP connection to the specified IP + NS_CHECK_EXEC([alice1], [echo "TCP test" | nc --send-only $ip 1235]) +} + +# Test access from external network to the internal IP of a VM that +# has also configured DNAT +test_connectivity_from_ext foo1 fd11::2 + +# Test access from external network to the internal IP of a VM that +# does not have DNAT +test_connectivity_from_ext bar1 fd12::2 + OVS_APP_EXIT_AND_WAIT([ovn-controller]) as ovn-sb @@ -3918,6 +3985,7 @@ NS_CHECK_EXEC([foo2], [ping -q -c 3 -i 0.3 -w 2 172.16.1.4 | FORMAT_PING], \ AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep icmp | FORMAT_CT(172.16.1.1) | \ sed -e 's/zone=[[0-9]]*/zone=<cleared>/'], [0], [dnl icmp,orig=(src=172.16.1.1,dst=172.16.1.4,id=<cleared>,type=8,code=0),reply=(src=192.168.2.2,dst=172.16.1.1,id=<cleared>,type=0,code=0),zone=<cleared> +icmp,orig=(src=172.16.1.1,dst=192.168.2.2,id=<cleared>,type=8,code=0),reply=(src=192.168.2.2,dst=172.16.1.1,id=<cleared>,type=0,code=0),zone=<cleared> icmp,orig=(src=192.168.1.3,dst=172.16.1.4,id=<cleared>,type=8,code=0),reply=(src=172.16.1.4,dst=172.16.1.1,id=<cleared>,type=0,code=0),zone=<cleared> ]) @@ -4086,6 +4154,7 @@ NS_CHECK_EXEC([foo2], [ping -q -c 3 -i 0.3 -w 2 fd20::4 | FORMAT_PING], \ AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(fd20::1) | \ sed -e 's/zone=[[0-9]]*/zone=<cleared>/'], [0], [dnl icmpv6,orig=(src=fd11::3,dst=fd20::4,id=<cleared>,type=128,code=0),reply=(src=fd20::4,dst=fd20::1,id=<cleared>,type=129,code=0),zone=<cleared> +icmpv6,orig=(src=fd20::1,dst=fd12::2,id=<cleared>,type=128,code=0),reply=(src=fd12::2,dst=fd20::1,id=<cleared>,type=129,code=0),zone=<cleared> icmpv6,orig=(src=fd20::1,dst=fd20::4,id=<cleared>,type=128,code=0),reply=(src=fd12::2,dst=fd20::1,id=<cleared>,type=129,code=0),zone=<cleared> ])
This change fixes bug that breaks ability of machines from external networks to communicate with machines in SNATed networks (specifically when using a Distributed router). Currently when a machine (S1) on an external network tries to talk over TCP with a machine (A1) in a network that has enabled SNAT, the connection is established successfully. However after the three-way handshake, any packets that come from the A1 machine will have their source address translated by the Distributed router, breaking the communication. Existing rule in `build_lrouter_out_snat_flow` that decides which packets should be SNATed already tries to avoid SNATing packets in reply direction with `(!ct.trk || !ct.rpl)`. However, previous stages in the distributed LR egress pipeline do not initiate the CT state. Additionally we need to commit new connections that originate from external networks into CT, so that the packets in the reply direction can be properly identified. Rationale: In my original RFC [0], there were questions about the motivation for fixing this issue. I'll try to summarize why I think this is a bug that should be fixed. 1. Current implementation for Distributed router already tries to avoid SNATing packets in the reply direction, it's just missing initialized CT states to make proper decisions. 2. This same scenario works with Gateway Router. I tested with following setup: foo -- R1 -- join - R3 -- alice | bar ----------R2 R1 is a Distributed router with SNAT for foo. R2 is a Gateway router with SNAT for bar. R3 is a Gateway router with no SNAT. Using 'alice1' as a client I was able to talk over TCP with 'bar1' but connection with 'foo1' failed. 3. Regarding security and "leaking" of internal IPs. Reading through RFC 4787 [1], 5382 [2] and their update in 7857 [3], the specifications do not seem to mandate that SNAT implementations must filter incoming traffic destined directly to the internal network. Sections like "5. Filtering Behavior" in 4787 and "4.3 Externally Initiated Connections" in 5382 describe only behavior for traffic destined to external IP/Port associated with NAT on the device that performs NAT. Besides, with the current implementation, it's already possible to scan the internal network with pings and TCP syn scanning. If an additional security is needed, ACLs can be used to filter out incomming traffic from external networks. 4. We do have customers/clouds that depend on this functionality. This is a scenario that used to work in Openstack with ML2/OVS and migrating those clouds to ML2/OVN would break it. [0]https://mail.openvswitch.org/pipermail/ovs-dev/2024-February/411670.html [1]https://datatracker.ietf.org/doc/html/rfc4787 [2]https://datatracker.ietf.org/doc/html/rfc5382 [3]https://datatracker.ietf.org/doc/html/rfc7857 Signed-off-by: Martin Kalcok <martin.kalcok@canonical.com> --- northd/northd.c | 82 ++++++++++++++++++++++++++++++++++++----- northd/ovn-northd.8.xml | 29 +++++++++++++++ tests/ovn-northd.at | 15 +++++++- tests/system-ovn.at | 69 ++++++++++++++++++++++++++++++++++ 4 files changed, 185 insertions(+), 10 deletions(-)