Message ID | 71e7331f-d528-430e-f880-e995ff53d362@lists.m7n.se |
---|---|
State | Not Applicable |
Delegated to: | David Miller |
Headers | show |
Series | IPv6 PMTU discovery fails with source-specific routing | expand |
On 5/13/19 1:22 PM, Mikael Magnusson wrote: > Hello list, > > I think I have found a regression in 4.15+ kernels. IPv6 PMTU discovery > doesn't seem to work with source-specific routing (AKA source-address > dependent routing, SADR). > > I made a test script (see attachment). It sets up a test environment > with three network namespaces (a, b and c) using SADR. The link between > b and c is configured with MTU 1280. It then runs a ping test with large > packets. > > I have tested a couple of kernels on Ubuntu 19.04 with the following > results. > > mainline 4.14.117-0414117-generic SUCCESS > ubuntu 4.15.0-1036-oem FAIL > mainline 5.1.0-050100-generic FAIL > git bisect shows good: 38fbeeeeccdb38d0635398e8e344d245f6d8dc52 bad: 2b760fcf5cfb34e8610df56d83745b2b74ae1379 Those are back to back commits so 2b760fcf5cfb ipv6: hook up exception table to store dst cache has to be the bad commit. Your patch may work, but does not seem logical relative to code at the time of 4.15 and the commit that caused the failure. cc'ing authors of the changes referenced above.
Thanks Mikael for reporting this issue. And thanks David for the bisection. Let me spend some time to reproduce it and see what is going on. From: David Ahern <dsahern@gmail.com> Date: Mon, May 13, 2019 at 8:35 PM To: Mikael Magnusson, <netdev@vger.kernel.org>, Martin KaFai Lau, Wei Wang > On 5/13/19 1:22 PM, Mikael Magnusson wrote: > > Hello list, > > > > I think I have found a regression in 4.15+ kernels. IPv6 PMTU discovery > > doesn't seem to work with source-specific routing (AKA source-address > > dependent routing, SADR). > > > > I made a test script (see attachment). It sets up a test environment > > with three network namespaces (a, b and c) using SADR. The link between > > b and c is configured with MTU 1280. It then runs a ping test with large > > packets. > > > > I have tested a couple of kernels on Ubuntu 19.04 with the following > > results. > > > > mainline 4.14.117-0414117-generic SUCCESS > > ubuntu 4.15.0-1036-oem FAIL > > mainline 5.1.0-050100-generic FAIL > > > > git bisect shows > > good: 38fbeeeeccdb38d0635398e8e344d245f6d8dc52 > bad: 2b760fcf5cfb34e8610df56d83745b2b74ae1379 > > Those are back to back commits so > 2b760fcf5cfb ipv6: hook up exception table to store dst cache > > has to be the bad commit. > > Your patch may work, but does not seem logical relative to code at the > time of 4.15 and the commit that caused the failure. cc'ing authors of > the changes referenced above.
On Mon, 13 May 2019 23:12:31 -0700 Wei Wang <weiwan@google.com> wrote: > Thanks Mikael for reporting this issue. And thanks David for the bisection. > Let me spend some time to reproduce it and see what is going on. Mikael, by the way, once this is sorted out, it would be nice if you could add your test as a case in tools/testing/selftests/net/pmtu.sh -- you could probably reuse all the setup parts that are already implemented there.
I think the bug is because when creating exceptions, src_addr is not always set even though fib6_info is in the subtree. (because of rt6_is_gw_or_nonexthop() check) However, when looking up for exceptions, we always set src_addr to the passed in flow->src_addr if fib6_info is in the subtree. That causes the exception lookup to fail. I will make it consistent. However, I don't quite understand the following logic in ip6_rt_cache_alloc(): if (!rt6_is_gw_or_nonexthop(ort)) { if (ort->fib6_dst.plen != 128 && ipv6_addr_equal(&ort->fib6_dst.addr, daddr)) rt->rt6i_flags |= RTF_ANYCAST; #ifdef CONFIG_IPV6_SUBTREES if (rt->rt6i_src.plen && saddr) { rt->rt6i_src.addr = *saddr; rt->rt6i_src.plen = 128; } #endif } Why do we need to check that the route is not gateway and has next hop for updating rt6i_src? I checked the git history and it seems this part was there from very early on (with some refactor in between)... From: Stefano Brivio <sbrivio@redhat.com> Date: Tue, May 14, 2019 at 7:33 AM To: Mikael Magnusson Cc: Wei Wang, David Ahern, Linux Kernel Network Developers, Martin KaFai Lau > On Mon, 13 May 2019 23:12:31 -0700 > Wei Wang <weiwan@google.com> wrote: > > > Thanks Mikael for reporting this issue. And thanks David for the bisection. > > Let me spend some time to reproduce it and see what is going on. > > Mikael, by the way, once this is sorted out, it would be nice if you > could add your test as a case in tools/testing/selftests/net/pmtu.sh -- > you could probably reuse all the setup parts that are already > implemented there. > > -- > Stefano
On 5/14/19 1:33 PM, Wei Wang wrote: > I think the bug is because when creating exceptions, src_addr is not > always set even though fib6_info is in the subtree. (because of > rt6_is_gw_or_nonexthop() check) > However, when looking up for exceptions, we always set src_addr to the > passed in flow->src_addr if fib6_info is in the subtree. That causes > the exception lookup to fail. > I will make it consistent. > However, I don't quite understand the following logic in ip6_rt_cache_alloc(): > if (!rt6_is_gw_or_nonexthop(ort)) { > if (ort->fib6_dst.plen != 128 && > ipv6_addr_equal(&ort->fib6_dst.addr, daddr)) > rt->rt6i_flags |= RTF_ANYCAST; > #ifdef CONFIG_IPV6_SUBTREES > if (rt->rt6i_src.plen && saddr) { > rt->rt6i_src.addr = *saddr; > rt->rt6i_src.plen = 128; > } > #endif > } > Why do we need to check that the route is not gateway and has next hop > for updating rt6i_src? I checked the git history and it seems this > part was there from very early on (with some refactor in between)... I can not make sense of that check either.
On Tue, May 14, 2019 at 12:33:25PM -0700, Wei Wang wrote: > I think the bug is because when creating exceptions, src_addr is not > always set even though fib6_info is in the subtree. (because of > rt6_is_gw_or_nonexthop() check) > However, when looking up for exceptions, we always set src_addr to the > passed in flow->src_addr if fib6_info is in the subtree. That causes > the exception lookup to fail. > I will make it consistent. > However, I don't quite understand the following logic in ip6_rt_cache_alloc(): > if (!rt6_is_gw_or_nonexthop(ort)) { > if (ort->fib6_dst.plen != 128 && > ipv6_addr_equal(&ort->fib6_dst.addr, daddr)) > rt->rt6i_flags |= RTF_ANYCAST; > #ifdef CONFIG_IPV6_SUBTREES > if (rt->rt6i_src.plen && saddr) { > rt->rt6i_src.addr = *saddr; > rt->rt6i_src.plen = 128; > } > #endif > } > Why do we need to check that the route is not gateway and has next hop > for updating rt6i_src? I checked the git history and it seems this > part was there from very early on (with some refactor in between)... I also failed to understand the RTF_GATEWAY check. The earliest related commit seems to be c440f1609b65 ("ipv6: Do not depend on rt->n in ip6_pol_route().") How was it working when the exception route was in the tree? > > > From: Stefano Brivio <sbrivio@redhat.com> > Date: Tue, May 14, 2019 at 7:33 AM > To: Mikael Magnusson > Cc: Wei Wang, David Ahern, Linux Kernel Network Developers, Martin KaFai Lau > > > On Mon, 13 May 2019 23:12:31 -0700 > > Wei Wang <weiwan@google.com> wrote: > > > > > Thanks Mikael for reporting this issue. And thanks David for the bisection. > > > Let me spend some time to reproduce it and see what is going on. > > > > Mikael, by the way, once this is sorted out, it would be nice if you > > could add your test as a case in tools/testing/selftests/net/pmtu.sh -- > > you could probably reuse all the setup parts that are already > > implemented there. > > > > -- > > Stefano
On Wed, May 15, 2019 at 11:06 AM Martin Lau <kafai@fb.com> wrote: > > On Tue, May 14, 2019 at 12:33:25PM -0700, Wei Wang wrote: > > I think the bug is because when creating exceptions, src_addr is not > > always set even though fib6_info is in the subtree. (because of > > rt6_is_gw_or_nonexthop() check) > > However, when looking up for exceptions, we always set src_addr to the > > passed in flow->src_addr if fib6_info is in the subtree. That causes > > the exception lookup to fail. > > I will make it consistent. > > However, I don't quite understand the following logic in ip6_rt_cache_alloc(): > > if (!rt6_is_gw_or_nonexthop(ort)) { > > if (ort->fib6_dst.plen != 128 && > > ipv6_addr_equal(&ort->fib6_dst.addr, daddr)) > > rt->rt6i_flags |= RTF_ANYCAST; > > #ifdef CONFIG_IPV6_SUBTREES > > if (rt->rt6i_src.plen && saddr) { > > rt->rt6i_src.addr = *saddr; > > rt->rt6i_src.plen = 128; > > } > > #endif > > } > > Why do we need to check that the route is not gateway and has next hop > > for updating rt6i_src? I checked the git history and it seems this > > part was there from very early on (with some refactor in between)... > I also failed to understand the RTF_GATEWAY check. The earliest related > commit seems to be c440f1609b65 ("ipv6: Do not depend on rt->n in ip6_pol_route().") > > How was it working when the exception route was in the tree? > When adding all exception route to the main routing tree, because route cache has dest_addr as /128, the longest prefix match will always match the /128 route entry. > > > > > > From: Stefano Brivio <sbrivio@redhat.com> > > Date: Tue, May 14, 2019 at 7:33 AM > > To: Mikael Magnusson > > Cc: Wei Wang, David Ahern, Linux Kernel Network Developers, Martin KaFai Lau > > > > > On Mon, 13 May 2019 23:12:31 -0700 > > > Wei Wang <weiwan@google.com> wrote: > > > > > > > Thanks Mikael for reporting this issue. And thanks David for the bisection. > > > > Let me spend some time to reproduce it and see what is going on. > > > > > > Mikael, by the way, once this is sorted out, it would be nice if you > > > could add your test as a case in tools/testing/selftests/net/pmtu.sh -- > > > you could probably reuse all the setup parts that are already > > > implemented there. > > > > > > -- > > > Stefano
On Wed, May 15, 2019 at 11:31:44AM -0700, Wei Wang wrote: > On Wed, May 15, 2019 at 11:06 AM Martin Lau <kafai@fb.com> wrote: > > > > On Tue, May 14, 2019 at 12:33:25PM -0700, Wei Wang wrote: > > > I think the bug is because when creating exceptions, src_addr is not > > > always set even though fib6_info is in the subtree. (because of > > > rt6_is_gw_or_nonexthop() check) > > > However, when looking up for exceptions, we always set src_addr to the > > > passed in flow->src_addr if fib6_info is in the subtree. That causes > > > the exception lookup to fail. > > > I will make it consistent. > > > However, I don't quite understand the following logic in ip6_rt_cache_alloc(): > > > if (!rt6_is_gw_or_nonexthop(ort)) { > > > if (ort->fib6_dst.plen != 128 && > > > ipv6_addr_equal(&ort->fib6_dst.addr, daddr)) > > > rt->rt6i_flags |= RTF_ANYCAST; > > > #ifdef CONFIG_IPV6_SUBTREES > > > if (rt->rt6i_src.plen && saddr) { > > > rt->rt6i_src.addr = *saddr; > > > rt->rt6i_src.plen = 128; > > > } > > > #endif > > > } > > > Why do we need to check that the route is not gateway and has next hop > > > for updating rt6i_src? I checked the git history and it seems this > > > part was there from very early on (with some refactor in between)... > > I also failed to understand the RTF_GATEWAY check. The earliest related > > commit seems to be c440f1609b65 ("ipv6: Do not depend on rt->n in ip6_pol_route().") > > > > How was it working when the exception route was in the tree? > > > When adding all exception route to the main routing tree, because > route cache has dest_addr as /128, the longest prefix match will > always match the /128 route entry. Got it. Thanks for the explanation. > > > > > > > > > > From: Stefano Brivio <sbrivio@redhat.com> > > > Date: Tue, May 14, 2019 at 7:33 AM > > > To: Mikael Magnusson > > > Cc: Wei Wang, David Ahern, Linux Kernel Network Developers, Martin KaFai Lau > > > > > > > On Mon, 13 May 2019 23:12:31 -0700 > > > > Wei Wang <weiwan@google.com> wrote: > > > > > > > > > Thanks Mikael for reporting this issue. And thanks David for the bisection. > > > > > Let me spend some time to reproduce it and see what is going on. > > > > > > > > Mikael, by the way, once this is sorted out, it would be nice if you > > > > could add your test as a case in tools/testing/selftests/net/pmtu.sh -- > > > > you could probably reuse all the setup parts that are already > > > > implemented there. > > > > > > > > -- > > > > Stefano
From 4dd6d3e00663ec1749d8c6cf8139a2e255c7a797 Mon Sep 17 00:00:00 2001 From: Mikael Magnusson <mikma@users.sourceforge.net> Date: Wed, 8 May 2019 22:06:44 +0000 Subject: [PATCH] net/ipv6/route: Fix PMTU for source-specific routes --- net/ipv6/route.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/net/ipv6/route.c b/net/ipv6/route.c index 0520aca3354b..63d678588ec9 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -1197,13 +1197,13 @@ static struct rt6_info *ip6_rt_cache_alloc(struct fib6_info *ort, if (ort->fib6_dst.plen != 128 && ipv6_addr_equal(&ort->fib6_dst.addr, daddr)) rt->rt6i_flags |= RTF_ANYCAST; + } #ifdef CONFIG_IPV6_SUBTREES - if (rt->rt6i_src.plen && saddr) { - rt->rt6i_src.addr = *saddr; - rt->rt6i_src.plen = 128; - } -#endif + if (rt->rt6i_src.plen && saddr) { + rt->rt6i_src.addr = *saddr; + rt->rt6i_src.plen = 128; } +#endif return rt; } -- 2.17.1