diff mbox series

[v2] net/ipv4: always honour route mtu during forwarding

Message ID 20200923045143.3755128-1-zenczykowski@gmail.com
State Changes Requested
Delegated to: David Miller
Headers show
Series [v2] net/ipv4: always honour route mtu during forwarding | expand

Commit Message

Maciej Żenczykowski Sept. 23, 2020, 4:51 a.m. UTC
From: Maciej Żenczykowski <maze@google.com>

Documentation/networking/ip-sysctl.txt:46 says:
  ip_forward_use_pmtu - BOOLEAN
    By default we don't trust protocol path MTUs while forwarding
    because they could be easily forged and can lead to unwanted
    fragmentation by the router.
    You only need to enable this if you have user-space software
    which tries to discover path mtus by itself and depends on the
    kernel honoring this information. This is normally not the case.
    Default: 0 (disabled)
    Possible values:
    0 - disabled
    1 - enabled

Which makes it pretty clear that setting it to 1 is a potential
security/safety/DoS issue, and yet it is entirely reasonable to want
forwarded traffic to honour explicitly administrator configured
route mtus (instead of defaulting to device mtu).

Indeed, I can't think of a single reason why you wouldn't want to.
Since you configured a route mtu you probably know better...

It is pretty common to have a higher device mtu to allow receiving
large (jumbo) frames, while having some routes via that interface
(potentially including the default route to the internet) specify
a lower mtu.

Note that ipv6 forwarding uses device mtu unless the route is locked
(in which case it will use the route mtu).

This approach is not usable for IPv4 where an 'mtu lock' on a route
also has the side effect of disabling TCP path mtu discovery via
disabling the IPv4 DF (don't frag) bit on all outgoing frames.

I'm not aware of a way to lock a route from an IPv6 RA, so that also
potentially seems wrong.

Signed-off-by: Maciej Żenczykowski <maze@google.com>
Cc: Eric Dumazet <maze@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Cc: Sunmeet Gill (Sunny) <sgill@quicinc.com>
Cc: Vinay Paradkar <vparadka@qti.qualcomm.com>
Cc: Tyler Wear <twear@quicinc.com>
Cc: David Ahern <dsahern@kernel.org>
---
 include/net/ip.h | 5 +++++
 1 file changed, 5 insertions(+)

Comments

Eric Dumazet Sept. 23, 2020, 8:46 a.m. UTC | #1
On 9/23/20 6:51 AM, Maciej Żenczykowski wrote:
> From: Maciej Żenczykowski <maze@google.com>
> 
> Documentation/networking/ip-sysctl.txt:46 says:
>   ip_forward_use_pmtu - BOOLEAN
>     By default we don't trust protocol path MTUs while forwarding
>     because they could be easily forged and can lead to unwanted
>     fragmentation by the router.
>     You only need to enable this if you have user-space software
>     which tries to discover path mtus by itself and depends on the
>     kernel honoring this information. This is normally not the case.
>     Default: 0 (disabled)
>     Possible values:
>     0 - disabled
>     1 - enabled
> 
> Which makes it pretty clear that setting it to 1 is a potential
> security/safety/DoS issue, and yet it is entirely reasonable to want
> forwarded traffic to honour explicitly administrator configured
> route mtus (instead of defaulting to device mtu).
> 
> Indeed, I can't think of a single reason why you wouldn't want to.
> Since you configured a route mtu you probably know better...
> 
> It is pretty common to have a higher device mtu to allow receiving
> large (jumbo) frames, while having some routes via that interface
> (potentially including the default route to the internet) specify
> a lower mtu.
> 
> Note that ipv6 forwarding uses device mtu unless the route is locked
> (in which case it will use the route mtu).
> 
> This approach is not usable for IPv4 where an 'mtu lock' on a route
> also has the side effect of disabling TCP path mtu discovery via
> disabling the IPv4 DF (don't frag) bit on all outgoing frames.
> 
> I'm not aware of a way to lock a route from an IPv6 RA, so that also
> potentially seems wrong.
> 
> Signed-off-by: Maciej Żenczykowski <maze@google.com>
> Cc: Eric Dumazet <maze@google.com>

Note that my email address is more like : <edumazet@google.com>

> Cc: Willem de Bruijn <willemb@google.com>
> Cc: Lorenzo Colitti <lorenzo@google.com>
> Cc: Sunmeet Gill (Sunny) <sgill@quicinc.com>
> Cc: Vinay Paradkar <vparadka@qti.qualcomm.com>
> Cc: Tyler Wear <twear@quicinc.com>
> Cc: David Ahern <dsahern@kernel.org>
> ---
>  include/net/ip.h | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/include/net/ip.h b/include/net/ip.h
> index b09c48d862cc..c2188bebbc54 100644
> --- a/include/net/ip.h
> +++ b/include/net/ip.h
> @@ -436,12 +436,17 @@ static inline unsigned int ip_dst_mtu_maybe_forward(const struct dst_entry *dst,
>  						    bool forwarding)
>  {
>  	struct net *net = dev_net(dst->dev);
> +	unsigned int mtu;
>  
>  	if (net->ipv4.sysctl_ip_fwd_use_pmtu ||
>  	    ip_mtu_locked(dst) ||
>  	    !forwarding)
>  		return dst_mtu(dst);
>  
> +	/* 'forwarding = true' case should always honour route mtu */
> +	mtu = dst_metric_raw(dst, RTAX_MTU);
> +	if (mtu) return mtu;


        if (mtu)
                return mtu;

Apparently route mtu are capped to 65520, not sure where it is done exactly IP_MAX_MTU being 65535)

# ip ro add 1.1.1.4 dev wlp2s0 mtu 100000
# ip ro get 1.1.1.4
1.1.1.4 dev wlp2s0 src 192.168.8.147 uid 0 
    cache mtu 65520 




> +
>  	return min(READ_ONCE(dst->dev->mtu), IP_MAX_MTU);
>  }
>  
>
David Ahern Sept. 23, 2020, 2:36 p.m. UTC | #2
On 9/23/20 2:46 AM, Eric Dumazet wrote:
>> diff --git a/include/net/ip.h b/include/net/ip.h
>> index b09c48d862cc..c2188bebbc54 100644
>> --- a/include/net/ip.h
>> +++ b/include/net/ip.h
>> @@ -436,12 +436,17 @@ static inline unsigned int ip_dst_mtu_maybe_forward(const struct dst_entry *dst,
>>  						    bool forwarding)
>>  {
>>  	struct net *net = dev_net(dst->dev);
>> +	unsigned int mtu;
>>  
>>  	if (net->ipv4.sysctl_ip_fwd_use_pmtu ||
>>  	    ip_mtu_locked(dst) ||
>>  	    !forwarding)
>>  		return dst_mtu(dst);
>>  
>> +	/* 'forwarding = true' case should always honour route mtu */
>> +	mtu = dst_metric_raw(dst, RTAX_MTU);
>> +	if (mtu) return mtu;
> 
> 
>         if (mtu)
>                 return mtu;
> 
> Apparently route mtu are capped to 65520, not sure where it is done exactly IP_MAX_MTU being 65535)

ip_metrics_convert seems to be the place"
                if (type == RTAX_MTU && val > 65535 - 15)
                        val = 65535 - 15;

going back through the code moves, and it was added by Dave with
710ab6c03122c

> 
> # ip ro add 1.1.1.4 dev wlp2s0 mtu 100000
> # ip ro get 1.1.1.4
> 1.1.1.4 dev wlp2s0 src 192.168.8.147 uid 0 
>     cache mtu 65520 
> 
> 
> 
> 
>> +
>>  	return min(READ_ONCE(dst->dev->mtu), IP_MAX_MTU);
>>  }
>>  
>>
David Miller Sept. 23, 2020, 7:02 p.m. UTC | #3
From: David Ahern <dsahern@gmail.com>
Date: Wed, 23 Sep 2020 08:36:50 -0600

> On 9/23/20 2:46 AM, Eric Dumazet wrote:
>> Apparently route mtu are capped to 65520, not sure where it is done exactly IP_MAX_MTU being 65535)
> 
> ip_metrics_convert seems to be the place"
>                 if (type == RTAX_MTU && val > 65535 - 15)
>                         val = 65535 - 15;
> 
> going back through the code moves, and it was added by Dave with
> 710ab6c03122c

At the time of that commit IP_MAX_MTU only existed in net/ipv4/route.c
and it's value was 0xFFF0.

So I didn't add anything :-)  Just moving stuff around.
diff mbox series

Patch

diff --git a/include/net/ip.h b/include/net/ip.h
index b09c48d862cc..c2188bebbc54 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -436,12 +436,17 @@  static inline unsigned int ip_dst_mtu_maybe_forward(const struct dst_entry *dst,
 						    bool forwarding)
 {
 	struct net *net = dev_net(dst->dev);
+	unsigned int mtu;
 
 	if (net->ipv4.sysctl_ip_fwd_use_pmtu ||
 	    ip_mtu_locked(dst) ||
 	    !forwarding)
 		return dst_mtu(dst);
 
+	/* 'forwarding = true' case should always honour route mtu */
+	mtu = dst_metric_raw(dst, RTAX_MTU);
+	if (mtu) return mtu;
+
 	return min(READ_ONCE(dst->dev->mtu), IP_MAX_MTU);
 }