diff mbox

Question: should local address be expired when updating PMTU?

Message ID 20150203120140.GU13046@secunet.com
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

Steffen Klassert Feb. 3, 2015, 12:01 p.m. UTC
On Tue, Feb 03, 2015 at 06:54:19PM +0800, shengyong wrote:
> 
> 
> 在 2015/2/3 17:28, Steffen Klassert 写道:
> > On Mon, Feb 02, 2015 at 04:20:24PM +0800, shengyong wrote:
> > 
> > We first need to find out why you receive this Packet Too Big message,
> The packet is sent by a commercial-off-the-shelf testcase, and I can reproduce the
> situation by using scapy and creating a packet as the following:
> 
> 	$ cat packet-too-big.py
> 	#!/usr/bin/python
> 	
> 	from scapy.all import *
> 
> 	# fe80::800:27ff:fe00:0 is linklocal addr of PC
> 	# fe80::a00:27ff:fe1a:e2a0 is linklocal addr of VM
> 	base=IPv6(src='fe80::800:27ff:fe00:0',dst='fe80::a00:27ff:fe1a:e2a0')
> 	pkt_too_big=ICMPv6PacketTooBig(mtu=1024)
> 	ext_base=IPv6(src='fe80::a00:27ff:fe1a:e2a0',dst='fe80::a00:27ff:fe1a:e2a0',plen=24)
> 	ext_nd_na=ICMPv6ND_NA()
> 	
> 	packet=base/pkt_too_big/ext_base/ext_nd_na
> 	send(packet)

So it is not a valid pmtu update, this make life easier.

Can you please test the patch below (compile tested only)?

This should fix your problem, and in combination with the two patches I sent
out last week, it should cure the whole 'expiring of uncached routes' problem.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

shengyong Feb. 4, 2015, 1:59 a.m. UTC | #1
在 2015/2/3 20:01, Steffen Klassert 写道:
> On Tue, Feb 03, 2015 at 06:54:19PM +0800, shengyong wrote:
>>
>>
>> 在 2015/2/3 17:28, Steffen Klassert 写道:
>>> On Mon, Feb 02, 2015 at 04:20:24PM +0800, shengyong wrote:
>>>
>>> We first need to find out why you receive this Packet Too Big message,
>> The packet is sent by a commercial-off-the-shelf testcase, and I can reproduce the
>> situation by using scapy and creating a packet as the following:
>>
>> 	$ cat packet-too-big.py
>> 	#!/usr/bin/python
>> 	
>> 	from scapy.all import *
>>
>> 	# fe80::800:27ff:fe00:0 is linklocal addr of PC
>> 	# fe80::a00:27ff:fe1a:e2a0 is linklocal addr of VM
>> 	base=IPv6(src='fe80::800:27ff:fe00:0',dst='fe80::a00:27ff:fe1a:e2a0')
>> 	pkt_too_big=ICMPv6PacketTooBig(mtu=1024)
>> 	ext_base=IPv6(src='fe80::a00:27ff:fe1a:e2a0',dst='fe80::a00:27ff:fe1a:e2a0',plen=24)
>> 	ext_nd_na=ICMPv6ND_NA()
>> 	
>> 	packet=base/pkt_too_big/ext_base/ext_nd_na
>> 	send(packet)
> 
> So it is not a valid pmtu update, this make life easier.
> 
> Can you please test the patch below (compile tested only)?
Sorry, the later. I test it on 3.10-stable. It can fix this problem. So maybe this is a bug?
And the 3 approaches (different flags are used: RTF_LOCAL, IFF_LOOPBACK and RTF_CACHE) in
these mails can fix the expire of local address. I'm confused about these flags:
* RTF_LOCAL: the entries of local address, like address binded to the native NIC
* RTF_CACHE: all cached entries
* IFF_LOOPBACK: this is a device-related flag, which has the same meaning as RTF_LOCAL

Am I right? If so, I think RTF_LOCAL is appropriate, because we just want entries of local
addresses to be not expired and we don't care other entries (I think if they get expired,
a neigh discovery could find them back).

thx,
Sheng
> 
> This should fix your problem, and in combination with the two patches I sent
> out last week, it should cure the whole 'expiring of uncached routes' problem.
> 
> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> index 49596535..4ccfb9c 100644
> --- a/net/ipv6/route.c
> +++ b/net/ipv6/route.c
> @@ -1156,7 +1156,8 @@ static void ip6_rt_update_pmtu(struct dst_entry *dst, struct sock *sk,
>  	struct rt6_info *rt6 = (struct rt6_info *)dst;
>  
>  	dst_confirm(dst);
> -	if (mtu < dst_mtu(dst) && rt6->rt6i_dst.plen == 128) {
> +	if (mtu < dst_mtu(dst) && rt6->rt6i_dst.plen == 128 &&
> +	    (rt6->rt6i_flags & RTF_CACHE)) {
>  		struct net *net = dev_net(dst->dev);
>  
>  		rt6->rt6i_flags |= RTF_MODIFIED;
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> .
> 

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Steffen Klassert Feb. 5, 2015, 7:21 a.m. UTC | #2
On Wed, Feb 04, 2015 at 09:59:54AM +0800, shengyong wrote:
> 
> Sorry, the later. I test it on 3.10-stable. It can fix this problem. So maybe this is a bug?

Yes, it's a bug.

> And the 3 approaches (different flags are used: RTF_LOCAL, IFF_LOOPBACK and RTF_CACHE) in
> these mails can fix the expire of local address. I'm confused about these flags:
> * RTF_LOCAL: the entries of local address, like address binded to the native NIC
> * RTF_CACHE: all cached entries
> * IFF_LOOPBACK: this is a device-related flag, which has the same meaning as RTF_LOCAL
> 
> Am I right? If so, I think RTF_LOCAL is appropriate, because we just want entries of local
> addresses to be not expired and we don't care other entries (I think if they get expired,
> a neigh discovery could find them back).

It is not the address that expires, it is the learned PMTU value that
expires. If we delete an uncached route, nothing will bring it back
unless you readd it manually.

We need to ensure that all routes that can learn something what
expires are cached. This means that we clone the inserted route
when it is used for the first time. The learned values are stored
at this cloned route. If the learned value expires, the clone is
deleted. The original route remains in the fib tree and can be
still looked up.

The problem is, that we currently don't cache/clone host routes.
So if a host route learns something that expires, the original
route is removed from the fib tree and we loose connectivity
to that host. We don't cache host routes because some of them
(the local ones) are automatically added with metric 0.
If we try to cache such a route, the clone will be identical
to the original route and we fail to insert it to the fib tree.

So we need to adjust the caching to all routes that actually can
learn something and leave out only those that can not.

I'll send a patchset that should fix this at the beginning of the
next week.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
shengyong Feb. 27, 2015, 2:37 a.m. UTC | #3
在 2015/2/5 15:21, Steffen Klassert 写道:
> On Wed, Feb 04, 2015 at 09:59:54AM +0800, shengyong wrote:
>>
>> Sorry, the later. I test it on 3.10-stable. It can fix this problem. So maybe this is a bug?
> 
> Yes, it's a bug.
> 
>> And the 3 approaches (different flags are used: RTF_LOCAL, IFF_LOOPBACK and RTF_CACHE) in
>> these mails can fix the expire of local address. I'm confused about these flags:
>> * RTF_LOCAL: the entries of local address, like address binded to the native NIC
>> * RTF_CACHE: all cached entries
>> * IFF_LOOPBACK: this is a device-related flag, which has the same meaning as RTF_LOCAL
>>
>> Am I right? If so, I think RTF_LOCAL is appropriate, because we just want entries of local
>> addresses to be not expired and we don't care other entries (I think if they get expired,
>> a neigh discovery could find them back).
> 
> It is not the address that expires, it is the learned PMTU value that
> expires. If we delete an uncached route, nothing will bring it back
> unless you readd it manually.
> 
> We need to ensure that all routes that can learn something what
> expires are cached. This means that we clone the inserted route
> when it is used for the first time. The learned values are stored
> at this cloned route. If the learned value expires, the clone is
> deleted. The original route remains in the fib tree and can be
> still looked up.
> 
> The problem is, that we currently don't cache/clone host routes.
> So if a host route learns something that expires, the original
> route is removed from the fib tree and we loose connectivity
> to that host. We don't cache host routes because some of them
> (the local ones) are automatically added with metric 0.
> If we try to cache such a route, the clone will be identical
> to the original route and we fail to insert it to the fib tree.
> 
> So we need to adjust the caching to all routes that actually can
> learn something and leave out only those that can not.
> 
> I'll send a patchset that should fix this at the beginning of the
> next week.
Hi, Steffen
Is this patchset ready? It seems that I didn't find it in the mainling
list. If it is ready, I can test it to see if it solves the problem I
met :)

many thanks,
Sheng
> 
> .
> 

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Steffen Klassert Feb. 27, 2015, 10:32 a.m. UTC | #4
On Fri, Feb 27, 2015 at 10:37:46AM +0800, shengyong wrote:
> Hi, Steffen
> Is this patchset ready? It seems that I didn't find it in the mainling
> list. If it is ready, I can test it to see if it solves the problem I
> met :)

Martin Lau pointed me to a problem when cached host routes are deleted.
We may delete the clone but not the original route. This has to be
resolved first. Unfortunately I had no time to look into this so far.
I hope to get to it next week.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Steffen Klassert March 30, 2015, 10:32 a.m. UTC | #5
On Fri, Feb 27, 2015 at 10:37:46AM +0800, shengyong wrote:
> 
> 
> 在 2015/2/5 15:21, Steffen Klassert 写道:
> > On Wed, Feb 04, 2015 at 09:59:54AM +0800, shengyong wrote:
> >>
> >> Sorry, the later. I test it on 3.10-stable. It can fix this problem. So maybe this is a bug?
> > 
> > Yes, it's a bug.
> > 
> >> And the 3 approaches (different flags are used: RTF_LOCAL, IFF_LOOPBACK and RTF_CACHE) in
> >> these mails can fix the expire of local address. I'm confused about these flags:
> >> * RTF_LOCAL: the entries of local address, like address binded to the native NIC
> >> * RTF_CACHE: all cached entries
> >> * IFF_LOOPBACK: this is a device-related flag, which has the same meaning as RTF_LOCAL
> >>
> >> Am I right? If so, I think RTF_LOCAL is appropriate, because we just want entries of local
> >> addresses to be not expired and we don't care other entries (I think if they get expired,
> >> a neigh discovery could find them back).
> > 
> > It is not the address that expires, it is the learned PMTU value that
> > expires. If we delete an uncached route, nothing will bring it back
> > unless you readd it manually.
> > 
> > We need to ensure that all routes that can learn something what
> > expires are cached. This means that we clone the inserted route
> > when it is used for the first time. The learned values are stored
> > at this cloned route. If the learned value expires, the clone is
> > deleted. The original route remains in the fib tree and can be
> > still looked up.
> > 
> > The problem is, that we currently don't cache/clone host routes.
> > So if a host route learns something that expires, the original
> > route is removed from the fib tree and we loose connectivity
> > to that host. We don't cache host routes because some of them
> > (the local ones) are automatically added with metric 0.
> > If we try to cache such a route, the clone will be identical
> > to the original route and we fail to insert it to the fib tree.
> > 
> > So we need to adjust the caching to all routes that actually can
> > learn something and leave out only those that can not.
> > 
> > I'll send a patchset that should fix this at the beginning of the
> > next week.
> Hi, Steffen
> Is this patchset ready? It seems that I didn't find it in the mainling
> list. If it is ready, I can test it to see if it solves the problem I
> met :)

I finally end up with a patchset that passed all your testcases.
I'll send it in reply to this mail, please test it.

Thanks!
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
shengyong March 30, 2015, 11:13 a.m. UTC | #6
On 3/30/2015 6:32 PM, Steffen Klassert wrote:
> On Fri, Feb 27, 2015 at 10:37:46AM +0800, shengyong wrote:
>>
>>
>> 在 2015/2/5 15:21, Steffen Klassert 写道:
>>> On Wed, Feb 04, 2015 at 09:59:54AM +0800, shengyong wrote:
>>>>
>>>> Sorry, the later. I test it on 3.10-stable. It can fix this problem. So maybe this is a bug?
>>>
>>> Yes, it's a bug.
>>>
>>>> And the 3 approaches (different flags are used: RTF_LOCAL, IFF_LOOPBACK and RTF_CACHE) in
>>>> these mails can fix the expire of local address. I'm confused about these flags:
>>>> * RTF_LOCAL: the entries of local address, like address binded to the native NIC
>>>> * RTF_CACHE: all cached entries
>>>> * IFF_LOOPBACK: this is a device-related flag, which has the same meaning as RTF_LOCAL
>>>>
>>>> Am I right? If so, I think RTF_LOCAL is appropriate, because we just want entries of local
>>>> addresses to be not expired and we don't care other entries (I think if they get expired,
>>>> a neigh discovery could find them back).
>>>
>>> It is not the address that expires, it is the learned PMTU value that
>>> expires. If we delete an uncached route, nothing will bring it back
>>> unless you readd it manually.
>>>
>>> We need to ensure that all routes that can learn something what
>>> expires are cached. This means that we clone the inserted route
>>> when it is used for the first time. The learned values are stored
>>> at this cloned route. If the learned value expires, the clone is
>>> deleted. The original route remains in the fib tree and can be
>>> still looked up.
>>>
>>> The problem is, that we currently don't cache/clone host routes.
>>> So if a host route learns something that expires, the original
>>> route is removed from the fib tree and we loose connectivity
>>> to that host. We don't cache host routes because some of them
>>> (the local ones) are automatically added with metric 0.
>>> If we try to cache such a route, the clone will be identical
>>> to the original route and we fail to insert it to the fib tree.
>>>
>>> So we need to adjust the caching to all routes that actually can
>>> learn something and leave out only those that can not.
>>>
>>> I'll send a patchset that should fix this at the beginning of the
>>> next week.
>> Hi, Steffen
>> Is this patchset ready? It seems that I didn't find it in the mainling
>> list. If it is ready, I can test it to see if it solves the problem I
>> met :)
> 
> I finally end up with a patchset that passed all your testcases.
> I'll send it in reply to this mail, please test it.
> 
I just test the patchset, and it can solve the problem I met.
Thanks very much,
Sheng

> Thanks!
> 
> .
> 

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 49596535..4ccfb9c 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1156,7 +1156,8 @@  static void ip6_rt_update_pmtu(struct dst_entry *dst, struct sock *sk,
 	struct rt6_info *rt6 = (struct rt6_info *)dst;
 
 	dst_confirm(dst);
-	if (mtu < dst_mtu(dst) && rt6->rt6i_dst.plen == 128) {
+	if (mtu < dst_mtu(dst) && rt6->rt6i_dst.plen == 128 &&
+	    (rt6->rt6i_flags & RTF_CACHE)) {
 		struct net *net = dev_net(dst->dev);
 
 		rt6->rt6i_flags |= RTF_MODIFIED;