[net-next] ip6_vti: adjust vti mtu according to mtu of output device

Message ID 1512578299-7573-1-git-send-email-alexey.kodanev@oracle.com
State Changes Requested
Delegated to: David Miller
Headers show
Series
  • [net-next] ip6_vti: adjust vti mtu according to mtu of output device
Related show

Commit Message

Alexey Kodanev Dec. 6, 2017, 4:38 p.m.
LTP/udp6_ipsec_vti tests fail when sending large UDP datagrams
that require fragmentation and underlying device MTU <= 1500.
This happens because ip6_vti sets mtu to ETH_DATA_LEN and not
updating it depending on a destiantion address.

Futhure attempts to send UDP packets may succeed because pmtu
get updated on ICMPV6_PKT_TOOBIG in vti6_err().

Here is the example when output device MTU set to 9000:

  # ip a sh ltp_ns_veth2
    ltp_ns_veth2@if7: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 ...
      inet 10.0.0.2/24 scope global ltp_ns_veth2
      inet6 fd00::2/64 scope global
      ...
  # ip li add vti6 type vti6 local fd00::2 remote fd00::1
  # ip li show vti6
    vti6@NONE: <POINTOPOINT,NOARP> mtu 1500 ...
      link/tunnel6 fd00::2 peer fd00::1

After the patch:

  # ip li add vti6 type vti6 local fd00::2 remote fd00::1
  # ip li show vti6
    vti6@NONE: <POINTOPOINT,NOARP> mtu 8832 ...
      link/tunnel6 fd00::2 peer fd00::1

Regarding ip_vti, it already tunes mtu with ip_tunnel_bind_dev():

  # ip li add vti4 type vti local 10.0.0.2 remote 10.0.0.1
  # ip li sh vti4
    vti4@NONE: <POINTOPOINT,NOARP> mtu 8832 ...
      link/ipip 10.0.0.2 peer 10.0.0.1

Reported-by: Petr Vorel <pvorel@suse.cz>
Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
---

ip6_vti mtu offset is the same (168) as in ip_vti because ip_vti
offset includes two sizes of struct iphdr: in dev->hard_header_len
and in t_hlen in ip_tunnel_bind_dev(). I'm not sure if it's correct.

 net/ipv6/ip6_vti.c |   18 ++++++++++++++++++
 1 files changed, 18 insertions(+), 0 deletions(-)

Comments

Steffen Klassert Dec. 8, 2017, 7:02 a.m. | #1
On Wed, Dec 06, 2017 at 07:38:19PM +0300, Alexey Kodanev wrote:
> LTP/udp6_ipsec_vti tests fail when sending large UDP datagrams
> that require fragmentation and underlying device MTU <= 1500.
> This happens because ip6_vti sets mtu to ETH_DATA_LEN and not
> updating it depending on a destiantion address.
> 
> Futhure attempts to send UDP packets may succeed because pmtu
> get updated on ICMPV6_PKT_TOOBIG in vti6_err().
> 
> Here is the example when output device MTU set to 9000:
> 
>   # ip a sh ltp_ns_veth2
>     ltp_ns_veth2@if7: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 ...
>       inet 10.0.0.2/24 scope global ltp_ns_veth2
>       inet6 fd00::2/64 scope global
>       ...
>   # ip li add vti6 type vti6 local fd00::2 remote fd00::1
>   # ip li show vti6
>     vti6@NONE: <POINTOPOINT,NOARP> mtu 1500 ...
>       link/tunnel6 fd00::2 peer fd00::1
> 
> After the patch:
> 
>   # ip li add vti6 type vti6 local fd00::2 remote fd00::1
>   # ip li show vti6
>     vti6@NONE: <POINTOPOINT,NOARP> mtu 8832 ...
>       link/tunnel6 fd00::2 peer fd00::1
> 
> Regarding ip_vti, it already tunes mtu with ip_tunnel_bind_dev():
> 
>   # ip li add vti4 type vti local 10.0.0.2 remote 10.0.0.1
>   # ip li sh vti4
>     vti4@NONE: <POINTOPOINT,NOARP> mtu 8832 ...
>       link/ipip 10.0.0.2 peer 10.0.0.1
> 
> Reported-by: Petr Vorel <pvorel@suse.cz>
> Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
> ---
> 
> ip6_vti mtu offset is the same (168) as in ip_vti because ip_vti
> offset includes two sizes of struct iphdr: in dev->hard_header_len
> and in t_hlen in ip_tunnel_bind_dev(). I'm not sure if it's correct.
> 
>  net/ipv6/ip6_vti.c |   18 ++++++++++++++++++
>  1 files changed, 18 insertions(+), 0 deletions(-)
> 
> diff --git a/net/ipv6/ip6_vti.c b/net/ipv6/ip6_vti.c
> index dbb74f3..47e6464 100644
> --- a/net/ipv6/ip6_vti.c
> +++ b/net/ipv6/ip6_vti.c
> @@ -638,6 +638,24 @@ static void vti6_link_config(struct ip6_tnl *t)
>  		dev->flags |= IFF_POINTOPOINT;
>  	else
>  		dev->flags &= ~IFF_POINTOPOINT;
> +
> +	if (p->flags & IP6_TNL_F_CAP_XMIT) {
> +		int strict = (ipv6_addr_type(&p->raddr) &
> +			      (IPV6_ADDR_MULTICAST | IPV6_ADDR_LINKLOCAL));
> +
> +		struct rt6_info *rt = rt6_lookup(t->net,
> +						 &p->raddr, &p->laddr,
> +						 p->link, strict);
> +
> +		if (!rt)
> +			return;
> +
> +		if (rt->dst.dev) {
> +			dev->mtu = max(rt->dst.dev->mtu - dev->hard_header_len,
> +				       IPV6_MIN_MTU);

Hm, I'm gettting this when compiling with your patch:

In file included from /home/klassert/git/ipsec-next/include/linux/list.h:9:0,
from /home/klassert/git/ipsec-next/include/linux/module.h:9,
from /home/klassert/git/ipsec-next/net/ipv6/ip6_vti.c:18:
/home/klassert/git/ipsec-next/net/ipv6/ip6_vti.c: In function ‘vti6_link_config’:
/home/klassert/git/ipsec-next/include/linux/kernel.h:808:16: warning: comparison of distinct pointer types lacks a cast
 (void) (&max1 == &max2);   \
               ^
/home/klassert/git/ipsec-next/include/linux/kernel.h:817:2: note: in expansion of macro ‘__max’
 __max(typeof(x), typeof(y),   \
 ^~~~~
/home/klassert/git/ipsec-next/net/ipv6/ip6_vti.c:654:15: note: in expansion of macro ‘max’
  dev->mtu = max(rt->dst.dev->mtu - dev->hard_header_len,
Alexey Kodanev Dec. 8, 2017, 11:54 a.m. | #2
On 12/08/2017 10:02 AM, Steffen Klassert wrote:
> On Wed, Dec 06, 2017 at 07:38:19PM +0300, Alexey Kodanev wrote:
>> LTP/udp6_ipsec_vti tests fail when sending large UDP datagrams
>> that require fragmentation and underlying device MTU <= 1500.
>> This happens because ip6_vti sets mtu to ETH_DATA_LEN and not
>> updating it depending on a destiantion address.
>>
>> Futhure attempts to send UDP packets may succeed because pmtu
>> get updated on ICMPV6_PKT_TOOBIG in vti6_err().
>>
>> Here is the example when output device MTU set to 9000:
>>
>>   # ip a sh ltp_ns_veth2
>>     ltp_ns_veth2@if7: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 ...
>>       inet 10.0.0.2/24 scope global ltp_ns_veth2
>>       inet6 fd00::2/64 scope global
>>       ...
>>   # ip li add vti6 type vti6 local fd00::2 remote fd00::1
>>   # ip li show vti6
>>     vti6@NONE: <POINTOPOINT,NOARP> mtu 1500 ...
>>       link/tunnel6 fd00::2 peer fd00::1
>>
>> After the patch:
>>
>>   # ip li add vti6 type vti6 local fd00::2 remote fd00::1
>>   # ip li show vti6
>>     vti6@NONE: <POINTOPOINT,NOARP> mtu 8832 ...
>>       link/tunnel6 fd00::2 peer fd00::1
>>
>> Regarding ip_vti, it already tunes mtu with ip_tunnel_bind_dev():
>>
>>   # ip li add vti4 type vti local 10.0.0.2 remote 10.0.0.1
>>   # ip li sh vti4
>>     vti4@NONE: <POINTOPOINT,NOARP> mtu 8832 ...
>>       link/ipip 10.0.0.2 peer 10.0.0.1
>>
>> Reported-by: Petr Vorel <pvorel@suse.cz>
>> Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
>> ---
>>
>> ip6_vti mtu offset is the same (168) as in ip_vti because ip_vti
>> offset includes two sizes of struct iphdr: in dev->hard_header_len
>> and in t_hlen in ip_tunnel_bind_dev(). I'm not sure if it's correct.
>>
>>  net/ipv6/ip6_vti.c |   18 ++++++++++++++++++
>>  1 files changed, 18 insertions(+), 0 deletions(-)
>>
>> diff --git a/net/ipv6/ip6_vti.c b/net/ipv6/ip6_vti.c
>> index dbb74f3..47e6464 100644
>> --- a/net/ipv6/ip6_vti.c
>> +++ b/net/ipv6/ip6_vti.c
>> @@ -638,6 +638,24 @@ static void vti6_link_config(struct ip6_tnl *t)
>>  		dev->flags |= IFF_POINTOPOINT;
>>  	else
>>  		dev->flags &= ~IFF_POINTOPOINT;
>> +
>> +	if (p->flags & IP6_TNL_F_CAP_XMIT) {
>> +		int strict = (ipv6_addr_type(&p->raddr) &
>> +			      (IPV6_ADDR_MULTICAST | IPV6_ADDR_LINKLOCAL));
>> +
>> +		struct rt6_info *rt = rt6_lookup(t->net,
>> +						 &p->raddr, &p->laddr,
>> +						 p->link, strict);
>> +
>> +		if (!rt)
>> +			return;
>> +
>> +		if (rt->dst.dev) {
>> +			dev->mtu = max(rt->dst.dev->mtu - dev->hard_header_len,
>> +				       IPV6_MIN_MTU);
> 
> Hm, I'm gettting this when compiling with your patch:
> 
> In file included from /home/klassert/git/ipsec-next/include/linux/list.h:9:0,
> from /home/klassert/git/ipsec-next/include/linux/module.h:9,
> from /home/klassert/git/ipsec-next/net/ipv6/ip6_vti.c:18:
> /home/klassert/git/ipsec-next/net/ipv6/ip6_vti.c: In function ‘vti6_link_config’:
> /home/klassert/git/ipsec-next/include/linux/kernel.h:808:16: warning: comparison of distinct pointer types lacks a cast
>  (void) (&max1 == &max2);   \
>                ^
> /home/klassert/git/ipsec-next/include/linux/kernel.h:817:2: note: in expansion of macro ‘__max’
>  __max(typeof(x), typeof(y),   \
>  ^~~~~
> /home/klassert/git/ipsec-next/net/ipv6/ip6_vti.c:654:15: note: in expansion of macro ‘max’
>   dev->mtu = max(rt->dst.dev->mtu - dev->hard_header_len,
> 

rt->dst.dev->mtu and dev->hard_header_len are both unsigned and
IPV6_MIN_MTU considered as int, I guess IPV6_MIN_MTU can be changed
to dev->min_mtu as it is set to the same value in setup, but checking
in the way it is done in ip6_tnl_link_config() looks better.

I'll send 2nd version.

Thanks,
Alexey
Shannon Nelson Dec. 8, 2017, 11:25 p.m. | #3
On 12/8/2017 3:54 AM, Alexey Kodanev wrote:
> On 12/08/2017 10:02 AM, Steffen Klassert wrote:
>> On Wed, Dec 06, 2017 at 07:38:19PM +0300, Alexey Kodanev wrote:

Since you're planning to do a 2nd version anyway, can we get a couple of 
the commit message issues cleaned up?

>>> LTP/udp6_ipsec_vti tests fail when sending large UDP datagrams
>>> that require fragmentation and underlying device MTU <= 1500.

s/underlying device/the underlying device has/

>>> This happens because ip6_vti sets mtu to ETH_DATA_LEN and not
>>> updating it depending on a destiantion address.

s/destiantion/destination/

>>>
>>> Futhure attempts to send UDP packets may succeed because pmtu

s/Futhure/Further/

>>> get updated on ICMPV6_PKT_TOOBIG in vti6_err().

s/get/gets/

>>>
>>> Here is the example when output device MTU set to 9000:

s/output device MTU/the output device MTU is/

Thanks,
sln

Patch

diff --git a/net/ipv6/ip6_vti.c b/net/ipv6/ip6_vti.c
index dbb74f3..47e6464 100644
--- a/net/ipv6/ip6_vti.c
+++ b/net/ipv6/ip6_vti.c
@@ -638,6 +638,24 @@  static void vti6_link_config(struct ip6_tnl *t)
 		dev->flags |= IFF_POINTOPOINT;
 	else
 		dev->flags &= ~IFF_POINTOPOINT;
+
+	if (p->flags & IP6_TNL_F_CAP_XMIT) {
+		int strict = (ipv6_addr_type(&p->raddr) &
+			      (IPV6_ADDR_MULTICAST | IPV6_ADDR_LINKLOCAL));
+
+		struct rt6_info *rt = rt6_lookup(t->net,
+						 &p->raddr, &p->laddr,
+						 p->link, strict);
+
+		if (!rt)
+			return;
+
+		if (rt->dst.dev) {
+			dev->mtu = max(rt->dst.dev->mtu - dev->hard_header_len,
+				       IPV6_MIN_MTU);
+		}
+		ip6_rt_put(rt);
+	}
 }
 
 /**