diff mbox

[net] ipv6: refresh rt6i_genid in ip6_pol_route()

Message ID 1410116703.11872.55.camel@edumazet-glaptop2.roam.corp.google.com
State Changes Requested, archived
Delegated to: David Miller
Headers show

Commit Message

Eric Dumazet Sept. 7, 2014, 7:05 p.m. UTC
From: Eric Dumazet <edumazet@google.com>

While tracking IPv6 poor performance, I found that IPv6 early demux
was broken in recent kernels. perf profiles show inet6_sk_rx_dst_set()
being called for every incoming TCP segment :

  20.95%      netserver  [kernel.kallsyms]    [k] dst_release     
  19.33%      netserver  [kernel.kallsyms]    [k] ip6_pol_route
  11.75%      netserver  [kernel.kallsyms]    [k] inet6_sk_rx_dst_set
   3.72%      netserver  [kernel.kallsyms]    [k] ip6_input_finish

Regression came in linux-3.6 with commit 6f3118b571b8 ("ipv6: use
net->rt_genid to check dst validity")

When a route found in ip6_pol_route() is cloned (either using
rt6_alloc_cow() or rt6_alloc_clone()), copy gets an updated rt6i_genid.

But when original route is selected, we need to refresh its rt6i_genid
that could be obsolete. If we do not refresh rt6i_genid, ip6_dst_check()
will fail.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Fixes: 6f3118b571b8 ("ipv6: use net->rt_genid to check dst validity")
Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
 net/ipv6/route.c |   15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

David Miller Sept. 7, 2014, 10:54 p.m. UTC | #1
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Sun, 07 Sep 2014 12:05:03 -0700

> From: Eric Dumazet <edumazet@google.com>
> 
> While tracking IPv6 poor performance, I found that IPv6 early demux
> was broken in recent kernels. perf profiles show inet6_sk_rx_dst_set()
> being called for every incoming TCP segment :
> 
>   20.95%      netserver  [kernel.kallsyms]    [k] dst_release     
>   19.33%      netserver  [kernel.kallsyms]    [k] ip6_pol_route
>   11.75%      netserver  [kernel.kallsyms]    [k] inet6_sk_rx_dst_set
>    3.72%      netserver  [kernel.kallsyms]    [k] ip6_input_finish
> 
> Regression came in linux-3.6 with commit 6f3118b571b8 ("ipv6: use
> net->rt_genid to check dst validity")
> 
> When a route found in ip6_pol_route() is cloned (either using
> rt6_alloc_cow() or rt6_alloc_clone()), copy gets an updated rt6i_genid.
> 
> But when original route is selected, we need to refresh its rt6i_genid
> that could be obsolete. If we do not refresh rt6i_genid, ip6_dst_check()
> will fail.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Fixes: 6f3118b571b8 ("ipv6: use net->rt_genid to check dst validity")

This might be broken.

We are dealing here with persistent entries in the ipv6 routine trie.

If you just bump the genid on the next person to look it up, other
sockets and cached entities might not have validated the route yet,
and now will falsely see the route as valid.  We have to ensure that
they too drop this route and perform a relookup.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet Sept. 8, 2014, 4:18 a.m. UTC | #2
On Sun, 2014-09-07 at 15:54 -0700, David Miller wrote:

> This might be broken.
> 
> We are dealing here with persistent entries in the ipv6 routine trie.
> 
> If you just bump the genid on the next person to look it up, other
> sockets and cached entities might not have validated the route yet,
> and now will falsely see the route as valid.  We have to ensure that
> they too drop this route and perform a relookup.

I am confused, I thought it was the role of the cookie.

(Ie socket has to store its own cookie to be able to validate a route)

Before 6f3118b571b8 patch, how was this done anyway ?

If persistent routes cannot refresh the genid, then they are useless ?



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Sept. 8, 2014, 4:27 a.m. UTC | #3
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Sun, 07 Sep 2014 21:18:25 -0700

> On Sun, 2014-09-07 at 15:54 -0700, David Miller wrote:
> 
>> This might be broken.
>> 
>> We are dealing here with persistent entries in the ipv6 routine trie.
>> 
>> If you just bump the genid on the next person to look it up, other
>> sockets and cached entities might not have validated the route yet,
>> and now will falsely see the route as valid.  We have to ensure that
>> they too drop this route and perform a relookup.
> 
> I am confused, I thought it was the role of the cookie.
> 
> (Ie socket has to store its own cookie to be able to validate a route)
> 
> Before 6f3118b571b8 patch, how was this done anyway ?
> 
> If persistent routes cannot refresh the genid, then they are useless ?

I just speak about the genid aspect.

I understand that cookie (via node->fn_sernum) invalidates the path
in the fib_trie, but the genid protects against other circumstances
(matching IPSEC rule, f.e.)

You have to make sure all other sockets did a full route lookup
(including IPSEC) before you can safely adjust the genid.

I could be wrong, recheck my analysis please :-)
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet Sept. 8, 2014, 4:43 a.m. UTC | #4
On Sun, 2014-09-07 at 21:27 -0700, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Sun, 07 Sep 2014 21:18:25 -0700
> 
> > On Sun, 2014-09-07 at 15:54 -0700, David Miller wrote:
> > 
> >> This might be broken.
> >> 
> >> We are dealing here with persistent entries in the ipv6 routine trie.
> >> 
> >> If you just bump the genid on the next person to look it up, other
> >> sockets and cached entities might not have validated the route yet,
> >> and now will falsely see the route as valid.  We have to ensure that
> >> they too drop this route and perform a relookup.
> > 
> > I am confused, I thought it was the role of the cookie.
> > 
> > (Ie socket has to store its own cookie to be able to validate a route)
> > 
> > Before 6f3118b571b8 patch, how was this done anyway ?
> > 
> > If persistent routes cannot refresh the genid, then they are useless ?
> 
> I just speak about the genid aspect.
> 
> I understand that cookie (via node->fn_sernum) invalidates the path
> in the fib_trie, but the genid protects against other circumstances
> (matching IPSEC rule, f.e.)
> 
> You have to make sure all other sockets did a full route lookup
> (including IPSEC) before you can safely adjust the genid.
> 
> I could be wrong, recheck my analysis please :-)

So this whole genid protection can not work, unless we make sure a
socket cannot share a route with another socket.

This means we have to clone all routes.



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Sept. 8, 2014, 4:59 a.m. UTC | #5
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Sun, 07 Sep 2014 21:43:54 -0700

> On Sun, 2014-09-07 at 21:27 -0700, David Miller wrote:
>> From: Eric Dumazet <eric.dumazet@gmail.com>
>> Date: Sun, 07 Sep 2014 21:18:25 -0700
>> 
>> > On Sun, 2014-09-07 at 15:54 -0700, David Miller wrote:
>> > 
>> >> This might be broken.
>> >> 
>> >> We are dealing here with persistent entries in the ipv6 routine trie.
>> >> 
>> >> If you just bump the genid on the next person to look it up, other
>> >> sockets and cached entities might not have validated the route yet,
>> >> and now will falsely see the route as valid.  We have to ensure that
>> >> they too drop this route and perform a relookup.
>> > 
>> > I am confused, I thought it was the role of the cookie.
>> > 
>> > (Ie socket has to store its own cookie to be able to validate a route)
>> > 
>> > Before 6f3118b571b8 patch, how was this done anyway ?
>> > 
>> > If persistent routes cannot refresh the genid, then they are useless ?
>> 
>> I just speak about the genid aspect.
>> 
>> I understand that cookie (via node->fn_sernum) invalidates the path
>> in the fib_trie, but the genid protects against other circumstances
>> (matching IPSEC rule, f.e.)
>> 
>> You have to make sure all other sockets did a full route lookup
>> (including IPSEC) before you can safely adjust the genid.
>> 
>> I could be wrong, recheck my analysis please :-)
> 
> So this whole genid protection can not work, unless we make sure a
> socket cannot share a route with another socket.
> 
> This means we have to clone all routes.

I'm willing to revert the change in question if you think that's the
sanest way forward.

The bug fix for more obscure use cases (IPSEC) if pointless if it
breaks more common things (TCP input route caching).
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet Sept. 8, 2014, 5:07 a.m. UTC | #6
On Sun, 2014-09-07 at 21:59 -0700, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Sun, 07 Sep 2014 21:43:54 -0700
> 
> > On Sun, 2014-09-07 at 21:27 -0700, David Miller wrote:
> >> From: Eric Dumazet <eric.dumazet@gmail.com>
> >> Date: Sun, 07 Sep 2014 21:18:25 -0700
> >> 
> >> > On Sun, 2014-09-07 at 15:54 -0700, David Miller wrote:
> >> > 
> >> >> This might be broken.
> >> >> 
> >> >> We are dealing here with persistent entries in the ipv6 routine trie.
> >> >> 
> >> >> If you just bump the genid on the next person to look it up, other
> >> >> sockets and cached entities might not have validated the route yet,
> >> >> and now will falsely see the route as valid.  We have to ensure that
> >> >> they too drop this route and perform a relookup.
> >> > 
> >> > I am confused, I thought it was the role of the cookie.
> >> > 
> >> > (Ie socket has to store its own cookie to be able to validate a route)
> >> > 
> >> > Before 6f3118b571b8 patch, how was this done anyway ?
> >> > 
> >> > If persistent routes cannot refresh the genid, then they are useless ?
> >> 
> >> I just speak about the genid aspect.
> >> 
> >> I understand that cookie (via node->fn_sernum) invalidates the path
> >> in the fib_trie, but the genid protects against other circumstances
> >> (matching IPSEC rule, f.e.)
> >> 
> >> You have to make sure all other sockets did a full route lookup
> >> (including IPSEC) before you can safely adjust the genid.
> >> 
> >> I could be wrong, recheck my analysis please :-)
> > 
> > So this whole genid protection can not work, unless we make sure a
> > socket cannot share a route with another socket.
> > 
> > This means we have to clone all routes.
> 
> I'm willing to revert the change in question if you think that's the
> sanest way forward.
> 
> The bug fix for more obscure use cases (IPSEC) if pointless if it
> breaks more common things (TCP input route caching).

Lets wait for Nicolas and/or Hannes input, they might have some ideas...



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Nicolas Dichtel Sept. 8, 2014, 8:11 a.m. UTC | #7
Le 08/09/2014 07:07, Eric Dumazet a écrit :
> On Sun, 2014-09-07 at 21:59 -0700, David Miller wrote:
>> From: Eric Dumazet <eric.dumazet@gmail.com>
>> Date: Sun, 07 Sep 2014 21:43:54 -0700
>>
>>> On Sun, 2014-09-07 at 21:27 -0700, David Miller wrote:
>>>> From: Eric Dumazet <eric.dumazet@gmail.com>
>>>> Date: Sun, 07 Sep 2014 21:18:25 -0700
>>>>
>>>>> On Sun, 2014-09-07 at 15:54 -0700, David Miller wrote:
>>>>>
>>>>>> This might be broken.
>>>>>>
>>>>>> We are dealing here with persistent entries in the ipv6 routine trie.
>>>>>>
>>>>>> If you just bump the genid on the next person to look it up, other
>>>>>> sockets and cached entities might not have validated the route yet,
>>>>>> and now will falsely see the route as valid.  We have to ensure that
>>>>>> they too drop this route and perform a relookup.
>>>>>
>>>>> I am confused, I thought it was the role of the cookie.
>>>>>
>>>>> (Ie socket has to store its own cookie to be able to validate a route)
>>>>>
>>>>> Before 6f3118b571b8 patch, how was this done anyway ?
>>>>>
>>>>> If persistent routes cannot refresh the genid, then they are useless ?
>>>>
>>>> I just speak about the genid aspect.
>>>>
>>>> I understand that cookie (via node->fn_sernum) invalidates the path
>>>> in the fib_trie, but the genid protects against other circumstances
>>>> (matching IPSEC rule, f.e.)
>>>>
>>>> You have to make sure all other sockets did a full route lookup
>>>> (including IPSEC) before you can safely adjust the genid.
>>>>
>>>> I could be wrong, recheck my analysis please :-)
>>>
>>> So this whole genid protection can not work, unless we make sure a
>>> socket cannot share a route with another socket.
>>>
>>> This means we have to clone all routes.
>>
>> I'm willing to revert the change in question if you think that's the
>> sanest way forward.
>>
>> The bug fix for more obscure use cases (IPSEC) if pointless if it
>> breaks more common things (TCP input route caching).
>
> Lets wait for Nicolas and/or Hannes input, they might have some ideas...

The initial problem was in SCTP. Here is the thread after the v1 patch:
http://patchwork.ozlabs.org/patch/182235/

Before the patch, SCTP stored the IPv6 route in its cache and if an IPsec
rules was inserted after that operation, SCTP never invalidated the cached
route to use a new xfrm route.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet Sept. 8, 2014, 10:28 a.m. UTC | #8
On Mon, 2014-09-08 at 10:11 +0200, Nicolas Dichtel wrote:
> Le 08/09/2014 07:07, Eric Dumazet a écrit :
> > On Sun, 2014-09-07 at 21:59 -0700, David Miller wrote:
> >> From: Eric Dumazet <eric.dumazet@gmail.com>
> >> Date: Sun, 07 Sep 2014 21:43:54 -0700
> >>
> >>> On Sun, 2014-09-07 at 21:27 -0700, David Miller wrote:
> >>>> From: Eric Dumazet <eric.dumazet@gmail.com>
> >>>> Date: Sun, 07 Sep 2014 21:18:25 -0700
> >>>>
> >>>>> On Sun, 2014-09-07 at 15:54 -0700, David Miller wrote:
> >>>>>
> >>>>>> This might be broken.
> >>>>>>
> >>>>>> We are dealing here with persistent entries in the ipv6 routine trie.
> >>>>>>
> >>>>>> If you just bump the genid on the next person to look it up, other
> >>>>>> sockets and cached entities might not have validated the route yet,
> >>>>>> and now will falsely see the route as valid.  We have to ensure that
> >>>>>> they too drop this route and perform a relookup.
> >>>>>
> >>>>> I am confused, I thought it was the role of the cookie.
> >>>>>
> >>>>> (Ie socket has to store its own cookie to be able to validate a route)
> >>>>>
> >>>>> Before 6f3118b571b8 patch, how was this done anyway ?
> >>>>>
> >>>>> If persistent routes cannot refresh the genid, then they are useless ?
> >>>>
> >>>> I just speak about the genid aspect.
> >>>>
> >>>> I understand that cookie (via node->fn_sernum) invalidates the path
> >>>> in the fib_trie, but the genid protects against other circumstances
> >>>> (matching IPSEC rule, f.e.)
> >>>>
> >>>> You have to make sure all other sockets did a full route lookup
> >>>> (including IPSEC) before you can safely adjust the genid.
> >>>>
> >>>> I could be wrong, recheck my analysis please :-)
> >>>
> >>> So this whole genid protection can not work, unless we make sure a
> >>> socket cannot share a route with another socket.
> >>>
> >>> This means we have to clone all routes.
> >>
> >> I'm willing to revert the change in question if you think that's the
> >> sanest way forward.
> >>
> >> The bug fix for more obscure use cases (IPSEC) if pointless if it
> >> breaks more common things (TCP input route caching).
> >
> > Lets wait for Nicolas and/or Hannes input, they might have some ideas...
> 
> The initial problem was in SCTP. Here is the thread after the v1 patch:
> http://patchwork.ozlabs.org/patch/182235/
> 
> Before the patch, SCTP stored the IPv6 route in its cache and if an IPsec
> rules was inserted after that operation, SCTP never invalidated the cached
> route to use a new xfrm route.

This thread mentions output route.

The problem I currently have with IPv6 early demux is for input routes.



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Nicolas Dichtel Sept. 8, 2014, 12:16 p.m. UTC | #9
Le 08/09/2014 12:28, Eric Dumazet a écrit :
> On Mon, 2014-09-08 at 10:11 +0200, Nicolas Dichtel wrote:
>> Le 08/09/2014 07:07, Eric Dumazet a écrit :
>>> On Sun, 2014-09-07 at 21:59 -0700, David Miller wrote:
>>>> From: Eric Dumazet <eric.dumazet@gmail.com>
>>>> Date: Sun, 07 Sep 2014 21:43:54 -0700
>>>>
>>>>> On Sun, 2014-09-07 at 21:27 -0700, David Miller wrote:
>>>>>> From: Eric Dumazet <eric.dumazet@gmail.com>
>>>>>> Date: Sun, 07 Sep 2014 21:18:25 -0700
>>>>>>
>>>>>>> On Sun, 2014-09-07 at 15:54 -0700, David Miller wrote:
>>>>>>>
>>>>>>>> This might be broken.
>>>>>>>>
>>>>>>>> We are dealing here with persistent entries in the ipv6 routine trie.
>>>>>>>>
>>>>>>>> If you just bump the genid on the next person to look it up, other
>>>>>>>> sockets and cached entities might not have validated the route yet,
>>>>>>>> and now will falsely see the route as valid.  We have to ensure that
>>>>>>>> they too drop this route and perform a relookup.
>>>>>>>
>>>>>>> I am confused, I thought it was the role of the cookie.
>>>>>>>
>>>>>>> (Ie socket has to store its own cookie to be able to validate a route)
>>>>>>>
>>>>>>> Before 6f3118b571b8 patch, how was this done anyway ?
>>>>>>>
>>>>>>> If persistent routes cannot refresh the genid, then they are useless ?
>>>>>>
>>>>>> I just speak about the genid aspect.
>>>>>>
>>>>>> I understand that cookie (via node->fn_sernum) invalidates the path
>>>>>> in the fib_trie, but the genid protects against other circumstances
>>>>>> (matching IPSEC rule, f.e.)
>>>>>>
>>>>>> You have to make sure all other sockets did a full route lookup
>>>>>> (including IPSEC) before you can safely adjust the genid.
>>>>>>
>>>>>> I could be wrong, recheck my analysis please :-)
>>>>>
>>>>> So this whole genid protection can not work, unless we make sure a
>>>>> socket cannot share a route with another socket.
>>>>>
>>>>> This means we have to clone all routes.
>>>>
>>>> I'm willing to revert the change in question if you think that's the
>>>> sanest way forward.
>>>>
>>>> The bug fix for more obscure use cases (IPSEC) if pointless if it
>>>> breaks more common things (TCP input route caching).
>>>
>>> Lets wait for Nicolas and/or Hannes input, they might have some ideas...
>>
>> The initial problem was in SCTP. Here is the thread after the v1 patch:
>> http://patchwork.ozlabs.org/patch/182235/
>>
>> Before the patch, SCTP stored the IPv6 route in its cache and if an IPsec
>> rules was inserted after that operation, SCTP never invalidated the cached
>> route to use a new xfrm route.
>
> This thread mentions output route.
Yes, it was the target.

>
> The problem I currently have with IPv6 early demux is for input routes.
It's clearly a regression.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Vlad Yasevich Sept. 8, 2014, 6:48 p.m. UTC | #10
On 09/08/2014 01:07 AM, Eric Dumazet wrote:
> On Sun, 2014-09-07 at 21:59 -0700, David Miller wrote:
>> From: Eric Dumazet <eric.dumazet@gmail.com>
>> Date: Sun, 07 Sep 2014 21:43:54 -0700
>>
>>> On Sun, 2014-09-07 at 21:27 -0700, David Miller wrote:
>>>> From: Eric Dumazet <eric.dumazet@gmail.com>
>>>> Date: Sun, 07 Sep 2014 21:18:25 -0700
>>>>
>>>>> On Sun, 2014-09-07 at 15:54 -0700, David Miller wrote:
>>>>>
>>>>>> This might be broken.
>>>>>>
>>>>>> We are dealing here with persistent entries in the ipv6 routine trie.
>>>>>>
>>>>>> If you just bump the genid on the next person to look it up, other
>>>>>> sockets and cached entities might not have validated the route yet,
>>>>>> and now will falsely see the route as valid.  We have to ensure that
>>>>>> they too drop this route and perform a relookup.
>>>>>
>>>>> I am confused, I thought it was the role of the cookie.
>>>>>
>>>>> (Ie socket has to store its own cookie to be able to validate a route)
>>>>>
>>>>> Before 6f3118b571b8 patch, how was this done anyway ?
>>>>>
>>>>> If persistent routes cannot refresh the genid, then they are useless ?
>>>>
>>>> I just speak about the genid aspect.
>>>>
>>>> I understand that cookie (via node->fn_sernum) invalidates the path
>>>> in the fib_trie, but the genid protects against other circumstances
>>>> (matching IPSEC rule, f.e.)
>>>>
>>>> You have to make sure all other sockets did a full route lookup
>>>> (including IPSEC) before you can safely adjust the genid.
>>>>
>>>> I could be wrong, recheck my analysis please :-)
>>>
>>> So this whole genid protection can not work, unless we make sure a
>>> socket cannot share a route with another socket.
>>>
>>> This means we have to clone all routes.
>>
>> I'm willing to revert the change in question if you think that's the
>> sanest way forward.
>>
>> The bug fix for more obscure use cases (IPSEC) if pointless if it
>> breaks more common things (TCP input route caching).
> 
> Lets wait for Nicolas and/or Hannes input, they might have some ideas...
> 
> 

So, looking at the differences between ipv4 and ipv6, it looks like ipv4
genid is only bumped by xfrm, while ipv6 one is bumped every time
__ipv6_ifa_notify() is called which happens in a lot of palaces...

So, my guess is that ipv4 genid doesn't really grow on non-xfrm configured
systems, while ipv6 one can grow very fast.  Probably why we don't see
it with ipv4 and do with ipv6.

-vlad


> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Hannes Frederic Sowa Sept. 9, 2014, 12:58 p.m. UTC | #11
On So, 2014-09-07 at 22:07 -0700, Eric Dumazet wrote:
> On Sun, 2014-09-07 at 21:59 -0700, David Miller wrote:
> > From: Eric Dumazet <eric.dumazet@gmail.com>
> > Date: Sun, 07 Sep 2014 21:43:54 -0700
> > 
> > > On Sun, 2014-09-07 at 21:27 -0700, David Miller wrote:
> > >> From: Eric Dumazet <eric.dumazet@gmail.com>
> > >> Date: Sun, 07 Sep 2014 21:18:25 -0700
> > >> 
> > >> > On Sun, 2014-09-07 at 15:54 -0700, David Miller wrote:
> > >> > 
> > >> >> This might be broken.
> > >> >> 
> > >> >> We are dealing here with persistent entries in the ipv6 routine trie.
> > >> >> 
> > >> >> If you just bump the genid on the next person to look it up, other
> > >> >> sockets and cached entities might not have validated the route yet,
> > >> >> and now will falsely see the route as valid.  We have to ensure that
> > >> >> they too drop this route and perform a relookup.
> > >> > 
> > >> > I am confused, I thought it was the role of the cookie.
> > >> > 
> > >> > (Ie socket has to store its own cookie to be able to validate a route)
> > >> > 
> > >> > Before 6f3118b571b8 patch, how was this done anyway ?
> > >> > 
> > >> > If persistent routes cannot refresh the genid, then they are useless ?
> > >> 
> > >> I just speak about the genid aspect.
> > >> 
> > >> I understand that cookie (via node->fn_sernum) invalidates the path
> > >> in the fib_trie, but the genid protects against other circumstances
> > >> (matching IPSEC rule, f.e.)
> > >> 
> > >> You have to make sure all other sockets did a full route lookup
> > >> (including IPSEC) before you can safely adjust the genid.
> > >> 
> > >> I could be wrong, recheck my analysis please :-)
> > > 
> > > So this whole genid protection can not work, unless we make sure a
> > > socket cannot share a route with another socket.
> > > 
> > > This means we have to clone all routes.
> > 
> > I'm willing to revert the change in question if you think that's the
> > sanest way forward.
> > 
> > The bug fix for more obscure use cases (IPSEC) if pointless if it
> > breaks more common things (TCP input route caching).
> 
> Lets wait for Nicolas and/or Hannes input, they might have some ideas...

My first idea was to remove rt_genid check in ip6_dst_check completely
and rt_genid_bump_ipv6() should walk the trie to increase fib6_sernum in
rt6i_nodes. I'll try this.



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index f23fbd28a501..1e76c3c5b87b 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -944,13 +944,20 @@  restart:
 	dst_hold(&rt->dst);
 	read_unlock_bh(&table->tb6_lock);
 
-	if (!(rt->rt6i_flags & (RTF_NONEXTHOP | RTF_GATEWAY)))
+	if (!(rt->rt6i_flags & (RTF_NONEXTHOP | RTF_GATEWAY))) {
 		nrt = rt6_alloc_cow(rt, &fl6->daddr, &fl6->saddr);
-	else if (!(rt->dst.flags & DST_HOST))
+	} else if (!(rt->dst.flags & DST_HOST)) {
 		nrt = rt6_alloc_clone(rt, &fl6->daddr);
-	else
-		goto out2;
+	} else {
+		u32 genid = rt_genid_ipv6(net);
 
+		/* We must refresh rt6i_genid, but only if needed
+		 * to avoid false sharing.
+		 */
+		if (rt->rt6i_genid != genid)
+			rt->rt6i_genid = genid;
+		goto out2;
+	}
 	ip6_rt_put(rt);
 	rt = nrt ? : net->ipv6.ip6_null_entry;