diff mbox series

Revert "ipv6: add mtu lock check in __ip6_rt_update_pmtu"

Message ID 20200505185723.191944-1-zenczykowski@gmail.com
State Accepted
Delegated to: David Miller
Headers show
Series Revert "ipv6: add mtu lock check in __ip6_rt_update_pmtu" | expand

Commit Message

Maciej Żenczykowski May 5, 2020, 6:57 p.m. UTC
From: Maciej Żenczykowski <maze@google.com>

This reverts commit 19bda36c4299ce3d7e5bce10bebe01764a655a6d:

| ipv6: add mtu lock check in __ip6_rt_update_pmtu
|
| Prior to this patch, ipv6 didn't do mtu lock check in ip6_update_pmtu.
| It leaded to that mtu lock doesn't really work when receiving the pkt
| of ICMPV6_PKT_TOOBIG.
|
| This patch is to add mtu lock check in __ip6_rt_update_pmtu just as ipv4
| did in __ip_rt_update_pmtu.

The above reasoning is incorrect.  IPv6 *requires* icmp based pmtu to work.
There's already a comment to this effect elsewhere in the kernel:

  $ git grep -p -B1 -A3 'RTAX_MTU lock'
  net/ipv6/route.c=4813=

  static int rt6_mtu_change_route(struct fib6_info *f6i, void *p_arg)
  ...
    /* In IPv6 pmtu discovery is not optional,
       so that RTAX_MTU lock cannot disable it.
       We still use this lock to block changes
       caused by addrconf/ndisc.
    */

This reverts to the pre-4.9 behaviour.

Cc: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Xin Long <lucien.xin@gmail.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Maciej Żenczykowski <maze@google.com>
Fixes: 19bda36c4299 ("ipv6: add mtu lock check in __ip6_rt_update_pmtu")
---
 net/ipv6/route.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

Comments

David Miller May 5, 2020, 9:23 p.m. UTC | #1
From: Maciej Żenczykowski <zenczykowski@gmail.com>
Date: Tue,  5 May 2020 11:57:23 -0700

> The above reasoning is incorrect.  IPv6 *requires* icmp based pmtu to work.
> There's already a comment to this effect elsewhere in the kernel:

How can an internet standard specify a system local policy decision
on this level?

If I want to lock the MTU value on my ipv6 routes, I have a reason
and I should be able to do it.
Maciej Żenczykowski May 5, 2020, 9:56 p.m. UTC | #2
I don't buy your argument at all.
There's *lots* of places where internet standards prevent Linux from
doing various things.
Trying to prevent users from shooting themselves in the foot, or
trying to be a good netizen.

If users require their computers to be broken, they can patch and
build their own kernels.

Indeed, the entire point of internet standards is interoperability and
specifying things that must or must not be done.

To quote from https://tools.ietf.org/html/rfc8201

Nodes not implementing Path MTU Discovery must use the IPv6 minimum
   link MTU defined in [RFC8200] as the maximum packet size.

(my comment: ie. 1280)

...

Note that Path MTU Discovery must be performed even in cases where a
   node "thinks" a destination is attached to the same link as itself,
   as it might have a PMTU lower than the link MTU.  In a situation such
   as when a neighboring router acts as proxy [ND] for some destination,
   the destination can appear to be directly connected, but it is in
   fact more than one hop away.

...

When a node receives a Packet Too Big message, it must reduce its
   estimate of the PMTU for the relevant path, based on the value of the
   MTU field in the message.

...

After receiving a Packet Too Big message, a node must attempt to
   avoid eliciting more such messages in the near future.  The node must
   reduce the size of the packets it is sending along the path

...

Because each of these messages (and
   the dropped packets they respond to) consume network resources, nodes
   using Path MTU Discovery must detect decreases in PMTU as fast as
   possible.

--

Furthermore, as we're finally upgrading to 4.9+ kernels, we now have
customers complaining about broken ipv6 pmtud.
This is a userspace visible regression in previously correct behaviour.

And we do have a reason for locking the mtu with the old pre-4.9 behaviour:
So we can change the mtu of the interfaces without it affecting the
mtu of the routes through those interfaces.
David Miller May 5, 2020, 10:09 p.m. UTC | #3
From: Maciej Żenczykowski <zenczykowski@gmail.com>
Date: Tue, 5 May 2020 14:56:03 -0700

> There's *lots* of places where internet standards prevent Linux from
> doing various things.

"Linux" generally speaking?

That's true only if "rm -rf net/netfilter/ net/ipv4/netfilter net/ipv6/netfilter"

Also, insert an XDP program... --> "non compliant"

And so on and so forth. :-)

It's local system policy, how do I react to packets.  If it doesn't
violate the min/max limits for ipv6 packets it emits onto the internet
I don't see this as something that can be seen as mandatory.
Maciej Żenczykowski May 5, 2020, 11:19 p.m. UTC | #4
> It's local system policy, how do I react to packets.  If it doesn't
> violate the min/max limits for ipv6 packets it emits onto the internet
> I don't see this as something that can be seen as mandatory.

And if you *truly* do want to violate internet standards you can
indeed already achieve this behaviour by dropping incoming icmpv6
packet too big errors (and there's lots of reasons why that is a bad
idea...).

I'll repeat what I said previously: this is a userspace visible
regression in behaviour, of none or very questionable benefit.

It results in TCP over IPv6 simply not working to destinations to
which your locked mtu is higher then the real path mtu.  This is why
'locked mtu' on IPv4 turns of the Don't Fragment bit - to allow
fragmentation at intermediate routers.  There is no such thing in
IPv6.
There is no DF bit, and there is no router fragmentation - all ipv6
fragmentation is supposed to happen at the source host.
This is why hosts must either use 1280 min guaranteed mtu or be
responsive to pmtu errors.  Otherwise things simply don't work.
Maciej Żenczykowski May 5, 2020, 11:26 p.m. UTC | #5
> > It's local system policy, how do I react to packets.  If it doesn't
> > violate the min/max limits for ipv6 packets it emits onto the internet
> > I don't see this as something that can be seen as mandatory.

It does violate the max limit for ipv6 packets it emits onto the internet.

You're not allowed to emit > 1280 mtu packets without also supporting pmtu.

>
> And if you *truly* do want to violate internet standards you can
> indeed already achieve this behaviour by dropping incoming icmpv6
> packet too big errors (and there's lots of reasons why that is a bad
> idea...).
>
> I'll repeat what I said previously: this is a userspace visible
> regression in behaviour, of none or very questionable benefit.
>
> It results in TCP over IPv6 simply not working to destinations to
> which your locked mtu is higher then the real path mtu.  This is why
> 'locked mtu' on IPv4 turns of the Don't Fragment bit - to allow
> fragmentation at intermediate routers.  There is no such thing in
> IPv6.
> There is no DF bit, and there is no router fragmentation - all ipv6
> fragmentation is supposed to happen at the source host.
> This is why hosts must either use 1280 min guaranteed mtu or be
> responsive to pmtu errors.  Otherwise things simply don't work.
David Miller May 8, 2020, 12:30 a.m. UTC | #6
From: Maciej Żenczykowski <zenczykowski@gmail.com>
Date: Tue,  5 May 2020 11:57:23 -0700

> From: Maciej Żenczykowski <maze@google.com>
> 
> This reverts commit 19bda36c4299ce3d7e5bce10bebe01764a655a6d:
> 
> | ipv6: add mtu lock check in __ip6_rt_update_pmtu
> |
> | Prior to this patch, ipv6 didn't do mtu lock check in ip6_update_pmtu.
> | It leaded to that mtu lock doesn't really work when receiving the pkt
> | of ICMPV6_PKT_TOOBIG.
> |
> | This patch is to add mtu lock check in __ip6_rt_update_pmtu just as ipv4
> | did in __ip_rt_update_pmtu.
> 
> The above reasoning is incorrect.  IPv6 *requires* icmp based pmtu to work.
> There's already a comment to this effect elsewhere in the kernel:
> 
>   $ git grep -p -B1 -A3 'RTAX_MTU lock'
>   net/ipv6/route.c=4813=
> 
>   static int rt6_mtu_change_route(struct fib6_info *f6i, void *p_arg)
>   ...
>     /* In IPv6 pmtu discovery is not optional,
>        so that RTAX_MTU lock cannot disable it.
>        We still use this lock to block changes
>        caused by addrconf/ndisc.
>     */
> 
> This reverts to the pre-4.9 behaviour.
> 
> Signed-off-by: Maciej Żenczykowski <maze@google.com>
> Fixes: 19bda36c4299 ("ipv6: add mtu lock check in __ip6_rt_update_pmtu")

I've thought about this some more and decided to apply this and
queue it up for -stable, thank you.
Maciej Żenczykowski May 8, 2020, 12:33 a.m. UTC | #7
> I've thought about this some more and decided to apply this and
> queue it up for -stable, thank you.

Thank you!
diff mbox series

Patch

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 8d418038fe32..ff847a324220 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -2722,8 +2722,10 @@  static void __ip6_rt_update_pmtu(struct dst_entry *dst, const struct sock *sk,
 	const struct in6_addr *daddr, *saddr;
 	struct rt6_info *rt6 = (struct rt6_info *)dst;
 
-	if (dst_metric_locked(dst, RTAX_MTU))
-		return;
+	/* Note: do *NOT* check dst_metric_locked(dst, RTAX_MTU)
+	 * IPv6 pmtu discovery isn't optional, so 'mtu lock' cannot disable it.
+	 * [see also comment in rt6_mtu_change_route()]
+	 */
 
 	if (iph) {
 		daddr = &iph->daddr;