Message ID | 20200212014107.110066-1-bpoirier@cumulusnetworks.com |
---|---|
State | Accepted |
Delegated to: | David Miller |
Headers | show |
Series | [net,1/2] ipv6: Fix route replacement with dev-only route | expand |
On Wed, Feb 12, 2020 at 10:41:06AM +0900, Benjamin Poirier wrote: > After commit 27596472473a ("ipv6: fix ECMP route replacement") it is no > longer possible to replace an ECMP-able route by a non ECMP-able route. > For example, > ip route add 2001:db8::1/128 via fe80::1 dev dummy0 > ip route replace 2001:db8::1/128 dev dummy0 > does not work as expected. > > Tweak the replacement logic so that point 3 in the log of the above commit > becomes: > 3. If the new route is not ECMP-able, and no matching non-ECMP-able route > exists, replace matching ECMP-able route (if any) or add the new route. > > We can now summarize the entire replace semantics to: > When doing a replace, prefer replacing a matching route of the same > "ECMP-able-ness" as the replace argument. If there is no such candidate, > fallback to the first route found. > > Fixes: 27596472473a ("ipv6: fix ECMP route replacement") > Signed-off-by: Benjamin Poirier <bpoirier@cumulusnetworks.com> Reviewed-by: Michal Kubecek <mkubecek@suse.cz> > --- > net/ipv6/ip6_fib.c | 7 ++++--- > tools/testing/selftests/net/fib_tests.sh | 6 ++++++ > 2 files changed, 10 insertions(+), 3 deletions(-) > > diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c > index 58fbde244381..72abf892302f 100644 > --- a/net/ipv6/ip6_fib.c > +++ b/net/ipv6/ip6_fib.c > @@ -1102,8 +1102,7 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct fib6_info *rt, > found++; > break; > } > - if (rt_can_ecmp) > - fallback_ins = fallback_ins ?: ins; > + fallback_ins = fallback_ins ?: ins; > goto next_iter; > } > > @@ -1146,7 +1145,9 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct fib6_info *rt, > } > > if (fallback_ins && !found) { > - /* No ECMP-able route found, replace first non-ECMP one */ > + /* No matching route with same ecmp-able-ness found, replace > + * first matching route > + */ > ins = fallback_ins; > iter = rcu_dereference_protected(*ins, > lockdep_is_held(&rt->fib6_table->tb6_lock)); > diff --git a/tools/testing/selftests/net/fib_tests.sh b/tools/testing/selftests/net/fib_tests.sh > index 6dd403103800..60273f1bc7d9 100755 > --- a/tools/testing/selftests/net/fib_tests.sh > +++ b/tools/testing/selftests/net/fib_tests.sh > @@ -910,6 +910,12 @@ ipv6_rt_replace_mpath() > check_route6 "2001:db8:104::/64 via 2001:db8:101::3 dev veth1 metric 1024" > log_test $? 0 "Multipath with single path via multipath attribute" > > + # multipath with dev-only > + add_initial_route6 "nexthop via 2001:db8:101::2 nexthop via 2001:db8:103::2" > + run_cmd "$IP -6 ro replace 2001:db8:104::/64 dev veth1" > + check_route6 "2001:db8:104::/64 dev veth1 metric 1024" > + log_test $? 0 "Multipath with dev-only" > + > # route replace fails - invalid nexthop 1 > add_initial_route6 "nexthop via 2001:db8:101::2 nexthop via 2001:db8:103::2" > run_cmd "$IP -6 ro replace 2001:db8:104::/64 nexthop via 2001:db8:111::3 nexthop via 2001:db8:103::3" > -- > 2.25.0 >
On 2/11/20 6:41 PM, Benjamin Poirier wrote: > After commit 27596472473a ("ipv6: fix ECMP route replacement") it is no > longer possible to replace an ECMP-able route by a non ECMP-able route. > For example, > ip route add 2001:db8::1/128 via fe80::1 dev dummy0 > ip route replace 2001:db8::1/128 dev dummy0 > does not work as expected. > > Tweak the replacement logic so that point 3 in the log of the above commit > becomes: > 3. If the new route is not ECMP-able, and no matching non-ECMP-able route > exists, replace matching ECMP-able route (if any) or add the new route. > > We can now summarize the entire replace semantics to: > When doing a replace, prefer replacing a matching route of the same > "ECMP-able-ness" as the replace argument. If there is no such candidate, > fallback to the first route found. > > Fixes: 27596472473a ("ipv6: fix ECMP route replacement") > Signed-off-by: Benjamin Poirier <bpoirier@cumulusnetworks.com> > --- > net/ipv6/ip6_fib.c | 7 ++++--- > tools/testing/selftests/net/fib_tests.sh | 6 ++++++ > 2 files changed, 10 insertions(+), 3 deletions(-) > > diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c > index 58fbde244381..72abf892302f 100644 > --- a/net/ipv6/ip6_fib.c > +++ b/net/ipv6/ip6_fib.c > @@ -1102,8 +1102,7 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct fib6_info *rt, > found++; > break; > } > - if (rt_can_ecmp) > - fallback_ins = fallback_ins ?: ins; > + fallback_ins = fallback_ins ?: ins; > goto next_iter; > } > > @@ -1146,7 +1145,9 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct fib6_info *rt, > } > > if (fallback_ins && !found) { > - /* No ECMP-able route found, replace first non-ECMP one */ > + /* No matching route with same ecmp-able-ness found, replace > + * first matching route > + */ > ins = fallback_ins; > iter = rcu_dereference_protected(*ins, > lockdep_is_held(&rt->fib6_table->tb6_lock)); > diff --git a/tools/testing/selftests/net/fib_tests.sh b/tools/testing/selftests/net/fib_tests.sh > index 6dd403103800..60273f1bc7d9 100755 > --- a/tools/testing/selftests/net/fib_tests.sh > +++ b/tools/testing/selftests/net/fib_tests.sh > @@ -910,6 +910,12 @@ ipv6_rt_replace_mpath() > check_route6 "2001:db8:104::/64 via 2001:db8:101::3 dev veth1 metric 1024" > log_test $? 0 "Multipath with single path via multipath attribute" > > + # multipath with dev-only > + add_initial_route6 "nexthop via 2001:db8:101::2 nexthop via 2001:db8:103::2" > + run_cmd "$IP -6 ro replace 2001:db8:104::/64 dev veth1" > + check_route6 "2001:db8:104::/64 dev veth1 metric 1024" > + log_test $? 0 "Multipath with dev-only" > + > # route replace fails - invalid nexthop 1 > add_initial_route6 "nexthop via 2001:db8:101::2 nexthop via 2001:db8:103::2" > run_cmd "$IP -6 ro replace 2001:db8:104::/64 nexthop via 2001:db8:111::3 nexthop via 2001:db8:103::3" > Thanks for adding a test case. I take this to mean that all existing tests pass with this change. We have found this code to be extremely sensitive to seemingly obvious changes. Added Ido.
From: Benjamin Poirier <bpoirier@cumulusnetworks.com> Date: Wed, 12 Feb 2020 10:41:06 +0900 > After commit 27596472473a ("ipv6: fix ECMP route replacement") it is no > longer possible to replace an ECMP-able route by a non ECMP-able route. > For example, > ip route add 2001:db8::1/128 via fe80::1 dev dummy0 > ip route replace 2001:db8::1/128 dev dummy0 > does not work as expected. > > Tweak the replacement logic so that point 3 in the log of the above commit > becomes: > 3. If the new route is not ECMP-able, and no matching non-ECMP-able route > exists, replace matching ECMP-able route (if any) or add the new route. > > We can now summarize the entire replace semantics to: > When doing a replace, prefer replacing a matching route of the same > "ECMP-able-ness" as the replace argument. If there is no such candidate, > fallback to the first route found. > > Fixes: 27596472473a ("ipv6: fix ECMP route replacement") > Signed-off-by: Benjamin Poirier <bpoirier@cumulusnetworks.com> Applied and queued up for -stable.
On 2020/02/15 11:00 -0700, David Ahern wrote: [...] > > Thanks for adding a test case. I take this to mean that all existing > tests pass with this change. We have found this code to be extremely > sensitive to seemingly obvious changes. I saw two failures from that test suite, regardless of this change: IPv4 rp_filter tests TEST: rp_filter passes local packets [FAIL] TEST: rp_filter passes loopback packets [FAIL] The other tests, including the ipv6_rt group of tests, are OK. The rp_filter tests fail for me even if I build a kernel from commit adb701d6cfa4 ("selftests: add a test case for rp_filter") Running the first ping manually, tcpdump shows: root@vsid:~# tcpdump -nepi lo tcpdump: verbose output suppressed, use -v or -vv for full protocol decode listening on lo, link-type EN10MB (Ethernet), capture size 262144 bytes 12:01:12.618623 52:54:00:6a:c7:5e > 52:54:00:6a:c7:5e, ethertype IPv4 (0x0800), length 98: 198.51.100.1 > 198.51.100.1: ICMP echo request, id 616, seq 435, length 64 12:01:12.618650 52:54:00:6a:c7:5e > 52:54:00:6a:c7:5e, ethertype IPv4 (0x0800), length 98: 198.51.100.1 > 198.51.100.1: ICMP echo reply, id 616, seq 435, length 64 `ping` doesn't show any replies (since it's bound to dummy1...?). Cong?
On Sun, Feb 16, 2020 at 7:12 PM Benjamin Poirier <bpoirier@cumulusnetworks.com> wrote: > > On 2020/02/15 11:00 -0700, David Ahern wrote: > [...] > > > > Thanks for adding a test case. I take this to mean that all existing > > tests pass with this change. We have found this code to be extremely > > sensitive to seemingly obvious changes. > > I saw two failures from that test suite, regardless of this change: > > IPv4 rp_filter tests > TEST: rp_filter passes local packets [FAIL] > TEST: rp_filter passes loopback packets [FAIL] > > The other tests, including the ipv6_rt group of tests, are OK. > > > The rp_filter tests fail for me even if I build a kernel from commit > adb701d6cfa4 ("selftests: add a test case for rp_filter") > > Running the first ping manually, tcpdump shows: > root@vsid:~# tcpdump -nepi lo > tcpdump: verbose output suppressed, use -v or -vv for full protocol decode > listening on lo, link-type EN10MB (Ethernet), capture size 262144 bytes > 12:01:12.618623 52:54:00:6a:c7:5e > 52:54:00:6a:c7:5e, ethertype IPv4 (0x0800), length 98: 198.51.100.1 > 198.51.100.1: ICMP echo request, id 616, seq 435, length 64 > 12:01:12.618650 52:54:00:6a:c7:5e > 52:54:00:6a:c7:5e, ethertype IPv4 (0x0800), length 98: 198.51.100.1 > 198.51.100.1: ICMP echo reply, id 616, seq 435, length 64 > > `ping` doesn't show any replies (since it's bound to dummy1...?). I suspect this is related to the version of ping, as it works perfectly for me on an old Fedora: IPv4 rp_filter tests ping: Warning: source address might be selected on device other than dummy1. PING 198.51.100.1 (198.51.100.1) from 198.51.100.1 dummy1: 56(84) bytes of data. 64 bytes from 198.51.100.1: icmp_seq=1 ttl=64 time=0.452 ms 64 bytes from 198.51.100.1: icmp_seq=2 ttl=64 time=0.418 ms 64 bytes from 198.51.100.1: icmp_seq=3 ttl=64 time=0.485 ms 64 bytes from 198.51.100.1: icmp_seq=4 ttl=64 time=0.446 ms 64 bytes from 198.51.100.1: icmp_seq=5 ttl=64 time=0.388 ms 64 bytes from 198.51.100.1: icmp_seq=6 ttl=64 time=0.362 ms 64 bytes from 198.51.100.1: icmp_seq=7 ttl=64 time=0.373 ms 64 bytes from 198.51.100.1: icmp_seq=8 ttl=64 time=0.515 ms 64 bytes from 198.51.100.1: icmp_seq=9 ttl=64 time=0.641 ms --- 198.51.100.1 ping statistics --- 9 packets transmitted, 9 received, 0% packet loss, time 8012ms rtt min/avg/max/mdev = 0.362/0.453/0.641/0.083 ms TEST: rp_filter passes local packets [ OK ] I do see the same failure on a new Fedora. So, I will take a look at this tomorrow. Thanks!
diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c index 58fbde244381..72abf892302f 100644 --- a/net/ipv6/ip6_fib.c +++ b/net/ipv6/ip6_fib.c @@ -1102,8 +1102,7 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct fib6_info *rt, found++; break; } - if (rt_can_ecmp) - fallback_ins = fallback_ins ?: ins; + fallback_ins = fallback_ins ?: ins; goto next_iter; } @@ -1146,7 +1145,9 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct fib6_info *rt, } if (fallback_ins && !found) { - /* No ECMP-able route found, replace first non-ECMP one */ + /* No matching route with same ecmp-able-ness found, replace + * first matching route + */ ins = fallback_ins; iter = rcu_dereference_protected(*ins, lockdep_is_held(&rt->fib6_table->tb6_lock)); diff --git a/tools/testing/selftests/net/fib_tests.sh b/tools/testing/selftests/net/fib_tests.sh index 6dd403103800..60273f1bc7d9 100755 --- a/tools/testing/selftests/net/fib_tests.sh +++ b/tools/testing/selftests/net/fib_tests.sh @@ -910,6 +910,12 @@ ipv6_rt_replace_mpath() check_route6 "2001:db8:104::/64 via 2001:db8:101::3 dev veth1 metric 1024" log_test $? 0 "Multipath with single path via multipath attribute" + # multipath with dev-only + add_initial_route6 "nexthop via 2001:db8:101::2 nexthop via 2001:db8:103::2" + run_cmd "$IP -6 ro replace 2001:db8:104::/64 dev veth1" + check_route6 "2001:db8:104::/64 dev veth1 metric 1024" + log_test $? 0 "Multipath with dev-only" + # route replace fails - invalid nexthop 1 add_initial_route6 "nexthop via 2001:db8:101::2 nexthop via 2001:db8:103::2" run_cmd "$IP -6 ro replace 2001:db8:104::/64 nexthop via 2001:db8:111::3 nexthop via 2001:db8:103::3"
After commit 27596472473a ("ipv6: fix ECMP route replacement") it is no longer possible to replace an ECMP-able route by a non ECMP-able route. For example, ip route add 2001:db8::1/128 via fe80::1 dev dummy0 ip route replace 2001:db8::1/128 dev dummy0 does not work as expected. Tweak the replacement logic so that point 3 in the log of the above commit becomes: 3. If the new route is not ECMP-able, and no matching non-ECMP-able route exists, replace matching ECMP-able route (if any) or add the new route. We can now summarize the entire replace semantics to: When doing a replace, prefer replacing a matching route of the same "ECMP-able-ness" as the replace argument. If there is no such candidate, fallback to the first route found. Fixes: 27596472473a ("ipv6: fix ECMP route replacement") Signed-off-by: Benjamin Poirier <bpoirier@cumulusnetworks.com> --- net/ipv6/ip6_fib.c | 7 ++++--- tools/testing/selftests/net/fib_tests.sh | 6 ++++++ 2 files changed, 10 insertions(+), 3 deletions(-)