[net,v2,RESEND] ipv6: Reflect MTU changes on PMTU of exceptions for MTU-less routes

Message ID: 20180306101019.74424-1-sbrivio@redhat.com
State: Accepted
Delegated to: David Miller

Commit Message

Stefano Brivio March 6, 2018, 10:10 a.m.
Currently, administrative MTU changes on a given netdevice are
not reflected on route exceptions for MTU-less routes, with a
set PMTU value, for that device:

 # ip -6 route get 2001:db8::b
 2001:db8::b from :: dev vti_a proto kernel src 2001:db8::a metric 256 pref medium
 # ping6 -c 1 -q -s10000 2001:db8::b > /dev/null
 # ip netns exec a ip -6 route get 2001:db8::b
 2001:db8::b from :: dev vti_a src 2001:db8::a metric 0
     cache expires 571sec mtu 4926 pref medium
 # ip link set dev vti_a mtu 3000
 # ip -6 route get 2001:db8::b
 2001:db8::b from :: dev vti_a src 2001:db8::a metric 0
     cache expires 571sec mtu 4926 pref medium
 # ip link set dev vti_a mtu 9000
 # ip -6 route get 2001:db8::b
 2001:db8::b from :: dev vti_a src 2001:db8::a metric 0
     cache expires 571sec mtu 4926 pref medium

The first issue is that since commit fb56be83e43d ("net-ipv6: on
device mtu change do not add mtu to mtu-less routes") we don't
call rt6_exceptions_update_pmtu() from rt6_mtu_change_route(),
which handles administrative MTU changes, if the regular route
is MTU-less.

However, PMTU exceptions should always be updated, as long as
RTAX_MTU is not locked. Keep the check for an MTU-less main route,
as introduced by that commit, but call
rt6_exceptions_update_pmtu() for exceptions regardless of that check.

Once that is fixed, one problem remains: MTU changes are not
reflected if the new MTU is higher than the previous one,
because rt6_exceptions_update_pmtu() doesn't allow that. We
should instead allow PMTU increase if the old PMTU matches the
local MTU, as that implies that the old MTU was the lowest in the
path, and PMTU discovery might lead to different results.

The existing check in rt6_mtu_change_route() correctly took that
case into account (for regular routes only), so factor it out
and re-use it also in rt6_exceptions_update_pmtu().
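
The rule described in the two paragraphs above can be modelled in
userspace roughly as follows. This is a sketch, not the kernel code:
mtu_change_allowed() and its plain int parameters are hypothetical
stand-ins for the values the patch reads through dst_mtu() and
idev->cnf.mtu6, and it assumes the local MTU still holds the value from
before the administrative change.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model of the decision described above; not kernel code.
 * route_pmtu: PMTU currently stored on the route or exception
 * local_mtu:  IPv6 MTU of the device before the change (assumption)
 * new_mtu:    MTU being set administratively
 */
static bool mtu_change_allowed(int route_pmtu, int local_mtu, int new_mtu)
{
	assert(new_mtu > 0);

	/* Decrease (or no change): the new MTU becomes the lowest in the path. */
	if (route_pmtu >= new_mtu)
		return true;

	/* Increase: only if the old local MTU was the path bottleneck. */
	return route_pmtu == local_mtu;
}
```

An increase is refused when the stored PMTU is lower than the local MTU,
since in that case some other hop, not the local device, limits the path.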

While at it, fix comment style and grammar, and try to be a bit
more descriptive.

Reported-by: Xiumei Mu <xmu@redhat.com>
Fixes: fb56be83e43d ("net-ipv6: on device mtu change do not add mtu to mtu-less routes")
Fixes: f5bbe7ee79c2 ("ipv6: prepare rt6_mtu_change() for exception table")
Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
---
RESEND: I mistakenly kept the same Message-Id as v1, and patchwork
    doesn't like that

v2: Use 2001:db8::/32 addresses in commit message as assigned by
    RFC 3849 for documentation purposes [David Ahern], rephrase
    paragraph about check on MTU-less route

This patch introduces some visual code churn as I'm factoring out
the existing MTU checks from rt6_mtu_change_route() and
updating comment style and grammar. Real code changes are rather
small.

Let me know if I should rather submit an "ugly" fix for net, and
a separate, small refactoring for net-next.

 net/ipv6/route.c | 71 +++++++++++++++++++++++++++++++++-----------------------
 1 file changed, 42 insertions(+), 29 deletions(-)

Comments

David Ahern March 6, 2018, 8:30 p.m. | #1
On 3/6/18 3:10 AM, Stefano Brivio wrote:
> Currently, administrative MTU changes on a given netdevice are
> not reflected on route exceptions for MTU-less routes, with a
> set PMTU value, for that device:
> 
>  # ip -6 route get 2001:db8::b
>  2001:db8::b from :: dev vti_a proto kernel src 2001:db8::a metric 256 pref medium
>  # ping6 -c 1 -q -s10000 2001:db8::b > /dev/null
>  # ip netns exec a ip -6 route get 2001:db8::b
>  2001:db8::b from :: dev vti_a src 2001:db8::a metric 0
>      cache expires 571sec mtu 4926 pref medium
>  # ip link set dev vti_a mtu 3000
>  # ip -6 route get 2001:db8::b
>  2001:db8::b from :: dev vti_a src 2001:db8::a metric 0
>      cache expires 571sec mtu 4926 pref medium
>  # ip link set dev vti_a mtu 9000
>  # ip -6 route get 2001:db8::b
>  2001:db8::b from :: dev vti_a src 2001:db8::a metric 0
>      cache expires 571sec mtu 4926 pref medium

Using your test script, I never see the route get an updated MTU -- it
is always 1426.

++ exception='fd00:2::b from :: dev vti_a src fd00:2::a metric 0 expires 598sec mtu 1426 pref medium'
Stefano Brivio March 6, 2018, 9:07 p.m. | #2
On Tue, 6 Mar 2018 13:30:05 -0700
David Ahern <dsahern@gmail.com> wrote:

> On 3/6/18 3:10 AM, Stefano Brivio wrote:
> > Currently, administrative MTU changes on a given netdevice are
> > not reflected on route exceptions for MTU-less routes, with a
> > set PMTU value, for that device:
> > 
> >  # ip -6 route get 2001:db8::b
> >  2001:db8::b from :: dev vti_a proto kernel src 2001:db8::a metric 256 pref medium
> >  # ping6 -c 1 -q -s10000 2001:db8::b > /dev/null
> >  # ip netns exec a ip -6 route get 2001:db8::b
> >  2001:db8::b from :: dev vti_a src 2001:db8::a metric 0
> >      cache expires 571sec mtu 4926 pref medium
> >  # ip link set dev vti_a mtu 3000
> >  # ip -6 route get 2001:db8::b
> >  2001:db8::b from :: dev vti_a src 2001:db8::a metric 0
> >      cache expires 571sec mtu 4926 pref medium
> >  # ip link set dev vti_a mtu 9000
> >  # ip -6 route get 2001:db8::b
> >  2001:db8::b from :: dev vti_a src 2001:db8::a metric 0
> >      cache expires 571sec mtu 4926 pref medium  
> 
> Using your test script, I never see the route get an updated MTU -- it
> is always 1426.
> 
> ++ exception='fd00:2::b from :: dev vti_a src fd00:2::a metric 0 expires
> 598sec mtu 1426 pref medium'

Thanks for reporting this.

There's another issue in the test script: the initial PMTU of the
exception depends on the veth MTU value, which I'm not explicitly
setting. It happened to be 5000 on my host; it's a more reasonable 1500
on yours.

If you have 1426 as the initial PMTU, setting the MTU to 3000 as the
second step in the script clearly doesn't decrease it.
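
The two outcomes can be traced with a small userspace model of the
script's steps. This is only a sketch: exception_pmtu_after() and
pmtu_after_script() are hypothetical helpers, not kernel functions, and
the starting device MTU passed to them is an assumed value, since it is
not stated in this thread.

```c
#include <assert.h>

/* Hypothetical helper, userspace only: PMTU of an exception after the
 * device MTU changes from old_dev_mtu to new_dev_mtu, following the rule
 * in the patch (decrease always; increase only if PMTU == old device MTU).
 */
static int exception_pmtu_after(int pmtu, int old_dev_mtu, int new_dev_mtu)
{
	assert(pmtu > 0 && new_dev_mtu > 0);

	if (pmtu >= new_dev_mtu || pmtu == old_dev_mtu)
		return new_dev_mtu;

	return pmtu;	/* unchanged: the bottleneck is elsewhere in the path */
}

/* Runs the script's two steps (MTU 3000, then 9000) from a given initial
 * PMTU. start_dev_mtu is an assumption; the thread does not give it.
 */
static int pmtu_after_script(int initial_pmtu, int start_dev_mtu)
{
	int pmtu = exception_pmtu_after(initial_pmtu, start_dev_mtu, 3000);

	return exception_pmtu_after(pmtu, 3000, 9000);
}
```

In this model, an initial PMTU of 1426 stays at 1426 through both steps
for any starting device MTU other than 1426, while an initial PMTU of
4926 goes to 3000 and then to 9000.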

I'll send a v3 of the test script.
David Ahern March 7, 2018, 4:46 a.m. | #3
On 3/6/18 3:10 AM, Stefano Brivio wrote:
> Currently, administrative MTU changes on a given netdevice are
> not reflected on route exceptions for MTU-less routes, with a
> set PMTU value, for that device:
> 
>  # ip -6 route get 2001:db8::b
>  2001:db8::b from :: dev vti_a proto kernel src 2001:db8::a metric 256 pref medium
>  # ping6 -c 1 -q -s10000 2001:db8::b > /dev/null
>  # ip netns exec a ip -6 route get 2001:db8::b
>  2001:db8::b from :: dev vti_a src 2001:db8::a metric 0
>      cache expires 571sec mtu 4926 pref medium
>  # ip link set dev vti_a mtu 3000
>  # ip -6 route get 2001:db8::b
>  2001:db8::b from :: dev vti_a src 2001:db8::a metric 0
>      cache expires 571sec mtu 4926 pref medium
>  # ip link set dev vti_a mtu 9000
>  # ip -6 route get 2001:db8::b
>  2001:db8::b from :: dev vti_a src 2001:db8::a metric 0
>      cache expires 571sec mtu 4926 pref medium
> 
> The first issue is that since commit fb56be83e43d ("net-ipv6: on
> device mtu change do not add mtu to mtu-less routes") we don't
> call rt6_exceptions_update_pmtu() from rt6_mtu_change_route(),
> which handles administrative MTU changes, if the regular route
> is MTU-less.
> 
> However, PMTU exceptions should be always updated, as long as
> RTAX_MTU is not locked. Keep the check for MTU-less main route,
> as introduced by that commit, but, for exceptions,
> call rt6_exceptions_update_pmtu() regardless of that check.
> 
> Once that is fixed, one problem remains: MTU changes are not
> reflected if the new MTU is higher than the previous one,
> because rt6_exceptions_update_pmtu() doesn't allow that. We
> should instead allow PMTU increase if the old PMTU matches the
> local MTU, as that implies that the old MTU was the lowest in the
> path, and PMTU discovery might lead to different results.
> 
> The existing check in rt6_mtu_change_route() correctly took that
> case into account (for regular routes only), so factor it out
> and re-use it also in rt6_exceptions_update_pmtu().
> 
> While at it, fix comments style and grammar, and try to be a bit
> more descriptive.
> 
> Reported-by: Xiumei Mu <xmu@redhat.com>
> Fixes: fb56be83e43d ("net-ipv6: on device mtu change do not add mtu to mtu-less routes")
> Fixes: f5bbe7ee79c2 ("ipv6: prepare rt6_mtu_change() for exception table")
> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
> ---

Acked-by: David Ahern <dsahern@gmail.com>
David Miller March 7, 2018, 6:18 p.m. | #4
From: Stefano Brivio <sbrivio@redhat.com>
Date: Tue,  6 Mar 2018 11:10:19 +0100

> Currently, administrative MTU changes on a given netdevice are
> not reflected on route exceptions for MTU-less routes, with a
> set PMTU value, for that device:
> 
>  # ip -6 route get 2001:db8::b
>  2001:db8::b from :: dev vti_a proto kernel src 2001:db8::a metric 256 pref medium
>  # ping6 -c 1 -q -s10000 2001:db8::b > /dev/null
>  # ip netns exec a ip -6 route get 2001:db8::b
>  2001:db8::b from :: dev vti_a src 2001:db8::a metric 0
>      cache expires 571sec mtu 4926 pref medium
>  # ip link set dev vti_a mtu 3000
>  # ip -6 route get 2001:db8::b
>  2001:db8::b from :: dev vti_a src 2001:db8::a metric 0
>      cache expires 571sec mtu 4926 pref medium
>  # ip link set dev vti_a mtu 9000
>  # ip -6 route get 2001:db8::b
>  2001:db8::b from :: dev vti_a src 2001:db8::a metric 0
>      cache expires 571sec mtu 4926 pref medium
> 
> The first issue is that since commit fb56be83e43d ("net-ipv6: on
> device mtu change do not add mtu to mtu-less routes") we don't
> call rt6_exceptions_update_pmtu() from rt6_mtu_change_route(),
> which handles administrative MTU changes, if the regular route
> is MTU-less.
> 
> However, PMTU exceptions should be always updated, as long as
> RTAX_MTU is not locked. Keep the check for MTU-less main route,
> as introduced by that commit, but, for exceptions,
> call rt6_exceptions_update_pmtu() regardless of that check.
> 
> Once that is fixed, one problem remains: MTU changes are not
> reflected if the new MTU is higher than the previous one,
> because rt6_exceptions_update_pmtu() doesn't allow that. We
> should instead allow PMTU increase if the old PMTU matches the
> local MTU, as that implies that the old MTU was the lowest in the
> path, and PMTU discovery might lead to different results.
> 
> The existing check in rt6_mtu_change_route() correctly took that
> case into account (for regular routes only), so factor it out
> and re-use it also in rt6_exceptions_update_pmtu().
> 
> While at it, fix comments style and grammar, and try to be a bit
> more descriptive.
> 
> Reported-by: Xiumei Mu <xmu@redhat.com>
> Fixes: fb56be83e43d ("net-ipv6: on device mtu change do not add mtu to mtu-less routes")
> Fixes: f5bbe7ee79c2 ("ipv6: prepare rt6_mtu_change() for exception table")
> Signed-off-by: Stefano Brivio <sbrivio@redhat.com>

Applied and queued up for -stable, thanks.

Patch

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 9dcfadddd800..0db4218c9186 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1509,7 +1509,30 @@  static void rt6_exceptions_remove_prefsrc(struct rt6_info *rt)
 	}
 }
 
-static void rt6_exceptions_update_pmtu(struct rt6_info *rt, int mtu)
+static bool rt6_mtu_change_route_allowed(struct inet6_dev *idev,
+					 struct rt6_info *rt, int mtu)
+{
+	/* If the new MTU is lower than the route PMTU, this new MTU will be the
+	 * lowest MTU in the path: always allow updating the route PMTU to
+	 * reflect PMTU decreases.
+	 *
+	 * If the new MTU is higher, and the route PMTU is equal to the local
+	 * MTU, this means the old MTU is the lowest in the path, so allow
+	 * updating it: if other nodes now have lower MTUs, PMTU discovery will
+	 * handle this.
+	 */
+
+	if (dst_mtu(&rt->dst) >= mtu)
+		return true;
+
+	if (dst_mtu(&rt->dst) == idev->cnf.mtu6)
+		return true;
+
+	return false;
+}
+
+static void rt6_exceptions_update_pmtu(struct inet6_dev *idev,
+				       struct rt6_info *rt, int mtu)
 {
 	struct rt6_exception_bucket *bucket;
 	struct rt6_exception *rt6_ex;
@@ -1518,20 +1541,22 @@  static void rt6_exceptions_update_pmtu(struct rt6_info *rt, int mtu)
 	bucket = rcu_dereference_protected(rt->rt6i_exception_bucket,
 					lockdep_is_held(&rt6_exception_lock));
 
-	if (bucket) {
-		for (i = 0; i < FIB6_EXCEPTION_BUCKET_SIZE; i++) {
-			hlist_for_each_entry(rt6_ex, &bucket->chain, hlist) {
-				struct rt6_info *entry = rt6_ex->rt6i;
-				/* For RTF_CACHE with rt6i_pmtu == 0
-				 * (i.e. a redirected route),
-				 * the metrics of its rt->dst.from has already
-				 * been updated.
-				 */
-				if (entry->rt6i_pmtu && entry->rt6i_pmtu > mtu)
-					entry->rt6i_pmtu = mtu;
-			}
-			bucket++;
+	if (!bucket)
+		return;
+
+	for (i = 0; i < FIB6_EXCEPTION_BUCKET_SIZE; i++) {
+		hlist_for_each_entry(rt6_ex, &bucket->chain, hlist) {
+			struct rt6_info *entry = rt6_ex->rt6i;
+
+			/* For RTF_CACHE with rt6i_pmtu == 0 (i.e. a redirected
+			 * route), the metrics of its rt->dst.from have already
+			 * been updated.
+			 */
+			if (entry->rt6i_pmtu &&
+			    rt6_mtu_change_route_allowed(idev, entry, mtu))
+				entry->rt6i_pmtu = mtu;
 		}
+		bucket++;
 	}
 }
 
@@ -3809,25 +3834,13 @@  static int rt6_mtu_change_route(struct rt6_info *rt, void *p_arg)
 	   Since RFC 1981 doesn't include administrative MTU increase
 	   update PMTU increase is a MUST. (i.e. jumbo frame)
 	 */
-	/*
-	   If new MTU is less than route PMTU, this new MTU will be the
-	   lowest MTU in the path, update the route PMTU to reflect PMTU
-	   decreases; if new MTU is greater than route PMTU, and the
-	   old MTU is the lowest MTU in the path, update the route PMTU
-	   to reflect the increase. In this case if the other nodes' MTU
-	   also have the lowest MTU, TOO BIG MESSAGE will be lead to
-	   PMTU discovery.
-	 */
 	if (rt->dst.dev == arg->dev &&
-	    dst_metric_raw(&rt->dst, RTAX_MTU) &&
 	    !dst_metric_locked(&rt->dst, RTAX_MTU)) {
 		spin_lock_bh(&rt6_exception_lock);
-		if (dst_mtu(&rt->dst) >= arg->mtu ||
-		    (dst_mtu(&rt->dst) < arg->mtu &&
-		     dst_mtu(&rt->dst) == idev->cnf.mtu6)) {
+		if (dst_metric_raw(&rt->dst, RTAX_MTU) &&
+		    rt6_mtu_change_route_allowed(idev, rt, arg->mtu))
 			dst_metric_set(&rt->dst, RTAX_MTU, arg->mtu);
-		}
-		rt6_exceptions_update_pmtu(rt, arg->mtu);
+		rt6_exceptions_update_pmtu(idev, rt, arg->mtu);
 		spin_unlock_bh(&rt6_exception_lock);
 	}
 	return 0;
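
For readers who want to follow the control flow outside the kernel, the
patched logic of rt6_mtu_change_route() can be approximated by the
userspace sketch below. All names in it (struct route_model,
change_allowed(), mtu_change_route_model()) are hypothetical, and it
leaves out locking, the device match and the hash buckets of the
exception table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_EXCEPTIONS 4	/* arbitrary size for this sketch */

/* Simplified stand-in for a route and its exception table. */
struct route_model {
	int mtu_metric;			/* 0 means the route is MTU-less */
	bool mtu_locked;		/* models a locked RTAX_MTU */
	int exc_pmtu[MAX_EXCEPTIONS];	/* 0 means no PMTU set (e.g. redirect) */
	size_t n_exc;
};

static bool change_allowed(int cur, int local_mtu, int new_mtu)
{
	return cur >= new_mtu || cur == local_mtu;
}

/* Models the patched flow: the MTU-less check guards only the main route,
 * while exceptions are walked whenever RTAX_MTU is not locked.
 */
static void mtu_change_route_model(struct route_model *rt, int local_mtu,
				   int new_mtu)
{
	size_t i;

	assert(rt->n_exc <= MAX_EXCEPTIONS);

	if (rt->mtu_locked)
		return;

	if (rt->mtu_metric &&
	    change_allowed(rt->mtu_metric, local_mtu, new_mtu))
		rt->mtu_metric = new_mtu;

	for (i = 0; i < rt->n_exc; i++) {
		if (rt->exc_pmtu[i] &&
		    change_allowed(rt->exc_pmtu[i], local_mtu, new_mtu))
			rt->exc_pmtu[i] = new_mtu;
	}
}
```

With an MTU-less main route and one exception holding PMTU 4926, lowering
the device MTU to 3000 leaves the main route MTU-less and sets the
exception to 3000; an exception without a PMTU is left alone.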