diff mbox series

[ovs-dev,2/2] ovn-northd: Fix IPv6 ECMP symmetric reply flows

Message ID 20210520163033.885819-3-mark.d.gray@redhat.com
State Superseded, archived
Headers show
Series Fix ECMP symmetric replies for IPv6 | expand

Commit Message

Mark Gray May 20, 2021, 4:30 p.m. UTC
When adding ECMP routes with symmetric replies, ovn-northd
adds a priority 100 flow in the table "lr_in_ip_routing" which bypasses
the ECMP routes for replies by using the ct.rpl field. Lower priority
flows are also added in this table for each route to IPv4/6 network. These
flows have a priority that is calculated by build_route_match() as a function
of the length of the prefix of the route. For long prefixes the priority of
these flows exceeds that of the symmetric reply flow causing unexpected
behaviour. This is particularly seen with IPv6 due to the long length of IPv6
prefixes.

This patch changes the priority of the symmetric reply flow to 300
which will ensure it will have a greater priority than the route
flows even at the maximum prefix length of 128.

This patch also adds additional tests for this condition.

Reported-at: https://bugzilla.redhat.com/1959008
Fixes: 4fdca656857d ("Add ECMP symmetric replies.")
Signed-off-by: Mark Gray <mark.d.gray@redhat.com>
---
 northd/ovn-northd.8.xml |   2 +-
 northd/ovn-northd.c     |   2 +-
 tests/ovn.at            | 127 +++++++++++++++++++++++++++++++
 tests/system-ovn.at     | 162 ++++++++++++++++++++++++++++++++++++++++
 4 files changed, 291 insertions(+), 2 deletions(-)

Comments

Ilya Maximets May 20, 2021, 6:53 p.m. UTC | #1
On 5/20/21 6:30 PM, Mark Gray wrote:
> When adding ECMP routes with symmetric replies, ovn-northd
> adds a priority 100 flow in the table "lr_in_ip_routing" which bypasses
> the ECMP routes for replies by using the ct.rpl field. Lower priority
> flows are also added in this table for each route to IPv4/6 network. These
> flows have a priority that is calculated by build_route_match() as a function
> of the length of the prefix of the route. For long prefixes the priority of
> these flows exceeds that of the symmetric reply flow causing unexpected
> behaviour. This is particularly seen with IPv6 due to the long length of IPv6
> prefixes.
> 
> This patch changes the priority of the symmetric reply flow to 300
> which will ensure it will have a greater priority than the route
> flows even at the maximum prefix length of 128.
> 
> This patch also adds additional tests for this condition.
> 
> Reported-at: https://bugzilla.redhat.com/1959008
> Fixes: 4fdca656857d ("Add ECMP symmetric replies.")
> Signed-off-by: Mark Gray <mark.d.gray@redhat.com>
> ---
>  northd/ovn-northd.8.xml |   2 +-
>  northd/ovn-northd.c     |   2 +-
>  tests/ovn.at            | 127 +++++++++++++++++++++++++++++++
>  tests/system-ovn.at     | 162 ++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 291 insertions(+), 2 deletions(-)

Shouldn't there be a corresponding ddlog part?

Best regards, Ilya Maximets.
Mark Gray May 21, 2021, 1:11 p.m. UTC | #2
On 20/05/2021 19:53, Ilya Maximets wrote:
> On 5/20/21 6:30 PM, Mark Gray wrote:
>> When adding ECMP routes with symmetric replies, ovn-northd
>> adds a priority 100 flow in the table "lr_in_ip_routing" which bypasses
>> the ECMP routes for replies by using the ct.rpl field. Lower priority
>> flows are also added in this table for each route to IPv4/6 network. These
>> flows have a priority that is calculated by build_route_match() as a function
>> of the length of the prefix of the route. For long prefixes the priority of
>> these flows exceeds that of the symmetric reply flow causing unexpected
>> behaviour. This is particularly seen with IPv6 due to the long length of IPv6
>> prefixes.
>>
>> This patch changes the priority of the symmetric reply flow to 300
>> which will ensure it will have a greater priority than the route
>> flows even at the maximum prefix length of 128.
>>
>> This patch also adds additional tests for this condition.
>>
>> Reported-at: https://bugzilla.redhat.com/1959008
>> Fixes: 4fdca656857d ("Add ECMP symmetric replies.")
>> Signed-off-by: Mark Gray <mark.d.gray@redhat.com>
>> ---
>>  northd/ovn-northd.8.xml |   2 +-
>>  northd/ovn-northd.c     |   2 +-
>>  tests/ovn.at            | 127 +++++++++++++++++++++++++++++++
>>  tests/system-ovn.at     | 162 ++++++++++++++++++++++++++++++++++++++++
>>  4 files changed, 291 insertions(+), 2 deletions(-)
> 
> Shouldn't there be a corresponding ddlog part?

Oops. I left it out by mistake.

I have sent a v2. I also slightly improved the system test by increasing
the prefix length.

https://patchwork.ozlabs.org/project/ovn/list/?series=245131

> 
> Best regards, Ilya Maximets.
>
diff mbox series

Patch

diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml
index bca9cb4f4e79..bb77689de516 100644
--- a/northd/ovn-northd.8.xml
+++ b/northd/ovn-northd.8.xml
@@ -2621,7 +2621,7 @@  icmp6 {
 
     <p>
       If ECMP routes with symmetric reply are configured in the
-      <code>OVN_Northbound</code> database for a gateway router, a priority-100
+      <code>OVN_Northbound</code> database for a gateway router, a priority-300
       flow is added for each router port on which symmetric replies are
       configured. The matching logic for these ports essentially reverses the
       configured logic of the ECMP route. So for instance, a route with a
diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
index 7464b7fba153..08eac1208dcb 100644
--- a/northd/ovn-northd.c
+++ b/northd/ovn-northd.c
@@ -8426,7 +8426,7 @@  add_ecmp_symmetric_reply_flows(struct hmap *lflows,
                   out_port->lrp_networks.ea_s,
                   IN6_IS_ADDR_V4MAPPED(&route->prefix) ? "" : "xx",
                   port_ip, out_port->json_key);
-    ovn_lflow_add_with_hint(lflows, od, S_ROUTER_IN_IP_ROUTING, 100,
+    ovn_lflow_add_with_hint(lflows, od, S_ROUTER_IN_IP_ROUTING, 300,
                            ds_cstr(&match), ds_cstr(&actions),
                            &st_route->header_);
 
diff --git a/tests/ovn.at b/tests/ovn.at
index dcf3e0e09164..4220eb0f2c6f 100644
--- a/tests/ovn.at
+++ b/tests/ovn.at
@@ -22982,6 +22982,133 @@  OVN_CLEANUP([hv1], [hv2])
 AT_CLEANUP
 ])
 
+OVN_FOR_EACH_NORTHD([
+AT_SETUP([ovn -- Symmetric IPv6 ECMP reply flows])
+ovn_start
+
+net_add n1
+sim_add hv1
+as hv1
+check ovs-vsctl add-br br-phys
+ovn_attach n1 br-phys 192.168.0.1
+
+sim_add hv2
+as hv2
+check ovs-vsctl add-br br-phys
+ovn_attach n1 br-phys 192.168.0.2
+
+# Logical network
+#
+#   ls1 \
+#        \
+#         DR -- join -- GW -- ext
+#        /
+#   ls2 /
+#
+#  ls1 and ls2 are internal switches connected to distributed router
+#  DR. DR is then connected via a join switch to gateway router GW.
+#  GW is then connected to external switch ext. In real life, this
+#  would likely have a localnet port, but for the purposes of this test
+#  it is unnecessary.
+
+ovn-nbctl create Logical_Router name=DR
+gw_uuid=$(ovn-nbctl create Logical_Router name=GW)
+
+check ovn-nbctl ls-add ls1
+check ovn-nbctl ls-add ls2
+check ovn-nbctl ls-add join
+check ovn-nbctl ls-add ext
+
+# Connect ls1 to DR
+check ovn-nbctl lrp-add DR dr-ls1 00:00:01:01:02:03 1001::1/64
+check ovn-nbctl lsp-add ls1 ls1-dr -- set Logical_Switch_Port ls1-dr \
+    type=router options:router-port=dr-ls1 addresses='"00:00:01:01:02:03"'
+
+# Connect ls2 to DR
+check ovn-nbctl lrp-add DR dr-ls2 00:00:01:01:02:04 1001::2/64
+check ovn-nbctl lsp-add ls2 ls2-dr -- set Logical_Switch_Port ls2-dr \
+    type=router options:router-port=dr-ls2 addresses='"00:00:01:01:02:04"'
+
+# Connect join to DR
+check ovn-nbctl lrp-add DR dr-join 00:00:02:01:02:03 2001::1/64
+check ovn-nbctl lsp-add join join-dr -- set Logical_Switch_Port join-dr \
+    type=router options:router-port=dr-join addresses='"00:00:02:01:02:03"'
+
+# Connect join to GW
+check ovn-nbctl lrp-add GW gw-join 00:00:02:01:02:04 2001::2/64
+check ovn-nbctl lsp-add join join-gw -- set Logical_Switch_Port join-gw \
+    type=router options:router-port=gw-join addresses='"00:00:02:01:02:04"'
+
+# Connect ext to GW
+check ovn-nbctl lrp-add GW gw-ext 00:00:03:01:02:03 7001::1/64
+check ovn-nbctl lsp-add ext ext-gw -- set Logical_Switch_Port ext-gw \
+    type=router options:router-port=gw-ext addresses='"00:00:03:01:02:03"'
+
+check ovn-nbctl lr-route-add GW 1001::/64 2001::1
+check ovn-nbctl --policy="src-ip" lr-route-add DR 1001::0/64 2001::2
+
+# Now add some ECMP routes to the GW router.
+check ovn-nbctl --ecmp-symmetric-reply --policy="src-ip" lr-route-add GW 1001::/64 7001::2
+check ovn-nbctl --ecmp-symmetric-reply --policy="src-ip" lr-route-add GW 1001::/64 7001::3
+
+wait_for_ports_up
+check ovn-nbctl --wait=hv sync
+
+# Ensure ECMP symmetric reply flows are not present on any hypervisor.
+AT_CHECK([
+    test 0 -eq $(as hv1 ovs-ofctl dump-flows br-int table=15 | \
+    grep "priority=100" | \
+    grep "ct(commit,zone=NXM_NX_REG11\\[[0..15\\]],exec(move:NXM_OF_ETH_SRC\\[[\\]]->NXM_NX_CT_LABEL\\[[32..79\\]],load:0x[[0-9]]->NXM_NX_CT_LABEL\\[[80..95\\]]))" -c)
+])
+AT_CHECK([
+    test 0 -eq $(as hv1 ovs-ofctl dump-flows br-int table=21 | \
+    grep "priority=200" | \
+    grep "actions=move:NXM_NX_CT_LABEL\\[[32..79\\]]->NXM_OF_ETH_DST\\[[\\]]" -c)
+])
+
+AT_CHECK([
+    test 0 -eq $(as hv2 ovs-ofctl dump-flows br-int table=15 | \
+    grep "priority=100" | \
+    grep "ct(commit,zone=NXM_NX_REG11\\[[0..15\\]],exec(move:NXM_OF_ETH_SRC\\[[\\]]->NXM_NX_CT_LABEL\\[[32..79\\]],load:0x[[0-9]]->NXM_NX_CT_LABEL\\[[80..95\\]]))" -c)
+])
+AT_CHECK([
+    test 0 -eq $(as hv2 ovs-ofctl dump-flows br-int table=21 | \
+    grep "priority=200" | \
+    grep "actions=move:NXM_NX_CT_LABEL\\[[32..79\\]]->NXM_OF_ETH_DST\\[[\\]]" -c)
+])
+
+# Now make GW a gateway router on hv1
+ovn-nbctl set Logical_Router $gw_uuid options:chassis=hv1
+ovn-nbctl --wait=hv sync
+
+# And ensure that ECMP symmetric reply flows are present only on hv1
+as hv1 ovs-ofctl dump-flows br-int > hv1flows
+AT_CAPTURE_FILE([hv1flows])
+as hv2 ovs-ofctl dump-flows br-int > hv2flows
+AT_CAPTURE_FILE([hv2flows])
+
+AT_CHECK([
+    for hv in 1 2; do
+        grep table=15 hv${hv}flows | \
+        grep "priority=100" | \
+        grep -c "ct(commit,zone=NXM_NX_REG11\\[[0..15\\]],exec(move:NXM_OF_ETH_SRC\\[[\\]]->NXM_NX_CT_LABEL\\[[32..79\\]],load:0x[[0-9]]->NXM_NX_CT_LABEL\\[[80..95\\]]))"
+
+        grep table=22 hv${hv}flows | \
+        grep "priority=200" | \
+        grep -c "actions=move:NXM_NX_CT_LABEL\\[[32..79\\]]->NXM_OF_ETH_DST\\[[\\]]"
+    done; :], [0], [dnl
+1
+1
+0
+0
+])
+
+OVN_CLEANUP([hv1], [hv2])
+AT_CLEANUP
+])
+
+
+
 OVN_FOR_EACH_NORTHD([
 # Test option:dynamic_neigh_routers. No static neighbor flows when enabled, and
 # traffic should still work, with the help of dynamic mac_bindings.
diff --git a/tests/system-ovn.at b/tests/system-ovn.at
index adc087b8d4b1..fbfd22821e48 100644
--- a/tests/system-ovn.at
+++ b/tests/system-ovn.at
@@ -5537,6 +5537,168 @@  OVS_TRAFFIC_VSWITCHD_STOP(["/failed to query port patch-.*/d
 AT_CLEANUP
 ])
 
+OVN_FOR_EACH_NORTHD([
+AT_SETUP([ovn -- ECMP IPv6 symmetric reply])
+AT_KEYWORDS([ecmp])
+
+CHECK_CONNTRACK()
+ovn_start
+
+OVS_TRAFFIC_VSWITCHD_START()
+ADD_BR([br-int])
+
+# Set external-ids in br-int needed for ovn-controller
+ovs-vsctl \
+        -- set Open_vSwitch . external-ids:system-id=hv1 \
+        -- set Open_vSwitch . external-ids:ovn-remote=unix:$ovs_base/ovn-sb/ovn-sb.sock \
+        -- set Open_vSwitch . external-ids:ovn-encap-type=geneve \
+        -- set Open_vSwitch . external-ids:ovn-encap-ip=169.0.0.1 \
+        -- set bridge br-int fail-mode=secure other-config:disable-in-band=true
+
+# Start ovn-controller
+start_daemon ovn-controller
+
+# Logical network:
+# Alice is connected to gateway router R1. R1 is connected to two "external"
+# routers, R2 and R3 via an "ext" switch.
+# Bob is connected to both R2 and R3. R1 contains two ECMP routes, one through R2
+# and one through R3, to Bob.
+#
+#     alice -- R1 -- ext ---- R2
+#                     |         \
+#                     |           bob
+#                     |         /
+#                     + ----- R3
+#
+# For this test, Bob sends request traffic through R2 to Alice. We want to ensure that
+# all response traffic from Alice is routed through R2 as well.
+
+ovn-nbctl create Logical_Router name=R1 options:chassis=hv1
+ovn-nbctl create Logical_Router name=R2
+ovn-nbctl create Logical_Router name=R3
+
+ovn-nbctl ls-add alice
+ovn-nbctl ls-add bob
+ovn-nbctl ls-add ext
+
+# connect alice to R1
+ovn-nbctl lrp-add R1 alice 00:00:01:01:02:03 fd01::1/64
+ovn-nbctl lsp-add alice rp-alice -- set Logical_Switch_Port rp-alice \
+    type=router options:router-port=alice addresses='"00:00:01:01:02:03"'
+
+# connect bob to R2
+ovn-nbctl lrp-add R2 R2_bob 00:00:02:01:02:03 fd07::2/64
+ovn-nbctl lsp-add bob rp2-bob -- set Logical_Switch_Port rp2-bob \
+    type=router options:router-port=R2_bob addresses='"00:00:02:01:02:03"'
+
+# connect bob to R3
+ovn-nbctl lrp-add R3 R3_bob 00:00:02:01:02:04 fd07::3/64
+ovn-nbctl lsp-add bob rp3-bob -- set Logical_Switch_Port rp3-bob \
+    type=router options:router-port=R3_bob addresses='"00:00:02:01:02:04"'
+
+# Connect R1 to ext
+ovn-nbctl lrp-add R1 R1_ext 00:00:04:01:02:03 fd02::1/64
+ovn-nbctl lsp-add ext r1-ext -- set Logical_Switch_Port r1-ext \
+    type=router options:router-port=R1_ext addresses='"00:00:04:01:02:03"'
+
+# Connect R2 to ext
+ovn-nbctl lrp-add R2 R2_ext 00:00:04:01:02:04 fd02::2/64
+ovn-nbctl lsp-add ext r2-ext -- set Logical_Switch_Port r2-ext \
+    type=router options:router-port=R2_ext addresses='"00:00:04:01:02:04"'
+
+# Connect R3 to ext
+ovn-nbctl lrp-add R3 R3_ext 00:00:04:01:02:05 fd02::3/64
+ovn-nbctl lsp-add ext r3-ext -- set Logical_Switch_Port r3-ext \
+    type=router options:router-port=R3_ext addresses='"00:00:04:01:02:05"'
+
+# Install ECMP routes for alice.
+ovn-nbctl --ecmp-symmetric-reply --policy="src-ip" lr-route-add R1 fd01::/64 fd02::2
+ovn-nbctl --ecmp-symmetric-reply --policy="src-ip" lr-route-add R1 fd01::/64 fd02::3
+
+# Static Routes
+ovn-nbctl lr-route-add R2 fd01::/64 fd02::1
+ovn-nbctl lr-route-add R3 fd01::/64 fd02::1
+
+# Logical port 'alice1' in switch 'alice'.
+ADD_NAMESPACES(alice1)
+ADD_VETH(alice1, alice1, br-int, "fd01::2/64", "f0:00:00:01:02:04", \
+         "fd01::1")
+OVS_WAIT_UNTIL([test "$(ip netns exec alice1 ip a | grep fd01::2 | grep tentative)" = ""])
+ovn-nbctl lsp-add alice alice1 \
+-- lsp-set-addresses alice1 "f0:00:00:01:02:04 fd01::2"
+
+# Logical port 'bob1' in switch 'bob'.
+ADD_NAMESPACES(bob1)
+ADD_VETH(bob1, bob1, br-int, "fd07::1/64", "f0:00:00:01:02:06", \
+         "fd07::2")
+OVS_WAIT_UNTIL([test "$(ip netns exec bob1 ip a | grep fd07::1 | grep tentative)" = ""])
+ovn-nbctl lsp-add bob bob1 \
+-- lsp-set-addresses bob1 "f0:00:00:01:02:06 fd07::1"
+
+# Ensure ovn-controller is caught up
+ovn-nbctl --wait=hv sync
+
+on_exit 'ovs-ofctl dump-flows br-int'
+
+# Later in this test we will check for a datapath flow that matches:
+# "ct_state(+new-est-rpl+trk).*ct(.*label=0x200000401020400000000/.*)". Due
+# to the way OVS generates datapath flows with wildcards, ICMPv6 NS flows will
+# evict this datapath flow. In order to ensure that the flow does not
+# get evicted, we send one ping packet in order to carry out neighbor
+# discovery. We then flush the datpath to remove the NS flows so that the flow
+# "ct_state(+new-est-rpl+trk).*ct(.*label=0x200000401020400000000/.*)" will
+# be present when we check for it.
+NS_CHECK_EXEC([bob1], [ping -q -c 2 -i 0.3 -w 15 fd01::2 | FORMAT_PING], \
+[0], [dnl
+2 packets transmitted, 2 received, 0% packet loss, time 0ms
+])
+ovs-appctl dpctl/del-flows
+
+# 'bob1' should be able to ping 'alice1' directly.
+NS_CHECK_EXEC([bob1], [ping -q -c 20 -i 0.3 -w 15 fd01::2 | FORMAT_PING], \
+[0], [dnl
+20 packets transmitted, 20 received, 0% packet loss, time 0ms
+])
+
+# Ensure conntrack entry is present. We should not try to predict
+# the tunnel key for the output port, so we strip it from the labels
+# and just ensure that the known ethernet address is present.
+AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(fd01::2) | \
+sed -e 's/zone=[[0-9]]*/zone=<cleared>/' |
+sed -e 's/labels=0x[[0-9a-f]]*00000401020400000000/labels=0x00000401020400000000/'], [0], [dnl
+icmpv6,orig=(src=fd07::1,dst=fd01::2,id=<cleared>,type=128,code=0),reply=(src=fd01::2,dst=fd07::1,id=<cleared>,type=129,code=0),zone=<cleared>,labels=0x00000401020400000000
+])
+
+# Ensure datapaths show conntrack states as expected
+# Like with conntrack entries, we shouldn't try to predict
+# port binding tunnel keys. So omit them from expected labels.
+AT_CHECK([ovs-appctl dpctl/dump-flows | grep 'ct_state(+new-est-rpl+trk).*ct(.*label=0x200000401020400000000/.*)' -c], [0], [dnl
+1
+])
+AT_CHECK([ovs-appctl dpctl/dump-flows | grep 'ct_state(-new+est+rpl+trk).*ct_label(0x.*00000401020400000000/.*)' -c], [0], [dnl
+1
+])
+
+ovs-ofctl dump-flows br-int
+
+OVS_APP_EXIT_AND_WAIT([ovn-controller])
+
+as ovn-sb
+OVS_APP_EXIT_AND_WAIT([ovsdb-server])
+
+as ovn-nb
+OVS_APP_EXIT_AND_WAIT([ovsdb-server])
+
+as northd
+OVS_APP_EXIT_AND_WAIT([ovn-northd])
+
+as
+OVS_TRAFFIC_VSWITCHD_STOP(["/failed to query port patch-.*/d
+/connection dropped.*/d"])
+
+AT_CLEANUP
+])
+
 OVN_FOR_EACH_NORTHD([
 AT_SETUP([ovn -- controller I-P handling when ovs iface ofport is -1])