Message ID | 1410116703.11872.55.camel@edumazet-glaptop2.roam.corp.google.com |
---|---|
State | Changes Requested, archived |
Delegated to: | David Miller |
Headers | show |
From: Eric Dumazet <eric.dumazet@gmail.com> Date: Sun, 07 Sep 2014 12:05:03 -0700 > From: Eric Dumazet <edumazet@google.com> > > While tracking IPv6 poor performance, I found that IPv6 early demux > was broken in recent kernels. perf profiles show inet6_sk_rx_dst_set() > being called for every incoming TCP segment : > > 20.95% netserver [kernel.kallsyms] [k] dst_release > 19.33% netserver [kernel.kallsyms] [k] ip6_pol_route > 11.75% netserver [kernel.kallsyms] [k] inet6_sk_rx_dst_set > 3.72% netserver [kernel.kallsyms] [k] ip6_input_finish > > Regression came in linux-3.6 with commit 6f3118b571b8 ("ipv6: use > net->rt_genid to check dst validity") > > When a route found in ip6_pol_route() is cloned (either using > rt6_alloc_cow() or rt6_alloc_clone()), copy gets an updated rt6i_genid. > > But when original route is selected, we need to refresh its rt6i_genid > that could be obsolete. If we do not refresh rt6i_genid, ip6_dst_check() > will fail. > > Signed-off-by: Eric Dumazet <edumazet@google.com> > Fixes: 6f3118b571b8 ("ipv6: use net->rt_genid to check dst validity") This might be broken. We are dealing here with persistent entries in the ipv6 routine trie. If you just bump the genid on the next person to look it up, other sockets and cached entities might not have validated the route yet, and now will falsely see the route as valid. We have to ensure that they too drop this route and perform a relookup. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sun, 2014-09-07 at 15:54 -0700, David Miller wrote: > This might be broken. > > We are dealing here with persistent entries in the ipv6 routine trie. > > If you just bump the genid on the next person to look it up, other > sockets and cached entities might not have validated the route yet, > and now will falsely see the route as valid. We have to ensure that > they too drop this route and perform a relookup. I am confused, I thought it was the role of the cookie. (Ie socket has to store its own cookie to be able to validate a route) Before 6f3118b571b8 patch, how was this done anyway ? If persistent routes cannot refresh the genid, then they are useless ? -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Eric Dumazet <eric.dumazet@gmail.com> Date: Sun, 07 Sep 2014 21:18:25 -0700 > On Sun, 2014-09-07 at 15:54 -0700, David Miller wrote: > >> This might be broken. >> >> We are dealing here with persistent entries in the ipv6 routine trie. >> >> If you just bump the genid on the next person to look it up, other >> sockets and cached entities might not have validated the route yet, >> and now will falsely see the route as valid. We have to ensure that >> they too drop this route and perform a relookup. > > I am confused, I thought it was the role of the cookie. > > (Ie socket has to store its own cookie to be able to validate a route) > > Before 6f3118b571b8 patch, how was this done anyway ? > > If persistent routes cannot refresh the genid, then they are useless ? I just speak about the genid aspect. I understand that cookie (via node->fn_sernum) invalidates the path in the fib_trie, but the genid protects against other circumstances (matching IPSEC rule, f.e.) You have to make sure all other sockets did a full route lookup (including IPSEC) before you can safely adjust the genid. I could be wrong, recheck my analysis please :-) -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sun, 2014-09-07 at 21:27 -0700, David Miller wrote: > From: Eric Dumazet <eric.dumazet@gmail.com> > Date: Sun, 07 Sep 2014 21:18:25 -0700 > > > On Sun, 2014-09-07 at 15:54 -0700, David Miller wrote: > > > >> This might be broken. > >> > >> We are dealing here with persistent entries in the ipv6 routine trie. > >> > >> If you just bump the genid on the next person to look it up, other > >> sockets and cached entities might not have validated the route yet, > >> and now will falsely see the route as valid. We have to ensure that > >> they too drop this route and perform a relookup. > > > > I am confused, I thought it was the role of the cookie. > > > > (Ie socket has to store its own cookie to be able to validate a route) > > > > Before 6f3118b571b8 patch, how was this done anyway ? > > > > If persistent routes cannot refresh the genid, then they are useless ? > > I just speak about the genid aspect. > > I understand that cookie (via node->fn_sernum) invalidates the path > in the fib_trie, but the genid protects against other circumstances > (matching IPSEC rule, f.e.) > > You have to make sure all other sockets did a full route lookup > (including IPSEC) before you can safely adjust the genid. > > I could be wrong, recheck my analysis please :-) So this whole genid protection can not work, unless we make sure a socket cannot share a route with another socket. This means we have to clone all routes. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Eric Dumazet <eric.dumazet@gmail.com> Date: Sun, 07 Sep 2014 21:43:54 -0700 > On Sun, 2014-09-07 at 21:27 -0700, David Miller wrote: >> From: Eric Dumazet <eric.dumazet@gmail.com> >> Date: Sun, 07 Sep 2014 21:18:25 -0700 >> >> > On Sun, 2014-09-07 at 15:54 -0700, David Miller wrote: >> > >> >> This might be broken. >> >> >> >> We are dealing here with persistent entries in the ipv6 routine trie. >> >> >> >> If you just bump the genid on the next person to look it up, other >> >> sockets and cached entities might not have validated the route yet, >> >> and now will falsely see the route as valid. We have to ensure that >> >> they too drop this route and perform a relookup. >> > >> > I am confused, I thought it was the role of the cookie. >> > >> > (Ie socket has to store its own cookie to be able to validate a route) >> > >> > Before 6f3118b571b8 patch, how was this done anyway ? >> > >> > If persistent routes cannot refresh the genid, then they are useless ? >> >> I just speak about the genid aspect. >> >> I understand that cookie (via node->fn_sernum) invalidates the path >> in the fib_trie, but the genid protects against other circumstances >> (matching IPSEC rule, f.e.) >> >> You have to make sure all other sockets did a full route lookup >> (including IPSEC) before you can safely adjust the genid. >> >> I could be wrong, recheck my analysis please :-) > > So this whole genid protection can not work, unless we make sure a > socket cannot share a route with another socket. > > This means we have to clone all routes. I'm willing to revert the change in question if you think that's the sanest way forward. The bug fix for more obscure use cases (IPSEC) if pointless if it breaks more common things (TCP input route caching). -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sun, 2014-09-07 at 21:59 -0700, David Miller wrote: > From: Eric Dumazet <eric.dumazet@gmail.com> > Date: Sun, 07 Sep 2014 21:43:54 -0700 > > > On Sun, 2014-09-07 at 21:27 -0700, David Miller wrote: > >> From: Eric Dumazet <eric.dumazet@gmail.com> > >> Date: Sun, 07 Sep 2014 21:18:25 -0700 > >> > >> > On Sun, 2014-09-07 at 15:54 -0700, David Miller wrote: > >> > > >> >> This might be broken. > >> >> > >> >> We are dealing here with persistent entries in the ipv6 routine trie. > >> >> > >> >> If you just bump the genid on the next person to look it up, other > >> >> sockets and cached entities might not have validated the route yet, > >> >> and now will falsely see the route as valid. We have to ensure that > >> >> they too drop this route and perform a relookup. > >> > > >> > I am confused, I thought it was the role of the cookie. > >> > > >> > (Ie socket has to store its own cookie to be able to validate a route) > >> > > >> > Before 6f3118b571b8 patch, how was this done anyway ? > >> > > >> > If persistent routes cannot refresh the genid, then they are useless ? > >> > >> I just speak about the genid aspect. > >> > >> I understand that cookie (via node->fn_sernum) invalidates the path > >> in the fib_trie, but the genid protects against other circumstances > >> (matching IPSEC rule, f.e.) > >> > >> You have to make sure all other sockets did a full route lookup > >> (including IPSEC) before you can safely adjust the genid. > >> > >> I could be wrong, recheck my analysis please :-) > > > > So this whole genid protection can not work, unless we make sure a > > socket cannot share a route with another socket. > > > > This means we have to clone all routes. > > I'm willing to revert the change in question if you think that's the > sanest way forward. > > The bug fix for more obscure use cases (IPSEC) if pointless if it > breaks more common things (TCP input route caching). Lets wait for Nicolas and/or Hannes input, they might have some ideas... -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Le 08/09/2014 07:07, Eric Dumazet a écrit : > On Sun, 2014-09-07 at 21:59 -0700, David Miller wrote: >> From: Eric Dumazet <eric.dumazet@gmail.com> >> Date: Sun, 07 Sep 2014 21:43:54 -0700 >> >>> On Sun, 2014-09-07 at 21:27 -0700, David Miller wrote: >>>> From: Eric Dumazet <eric.dumazet@gmail.com> >>>> Date: Sun, 07 Sep 2014 21:18:25 -0700 >>>> >>>>> On Sun, 2014-09-07 at 15:54 -0700, David Miller wrote: >>>>> >>>>>> This might be broken. >>>>>> >>>>>> We are dealing here with persistent entries in the ipv6 routine trie. >>>>>> >>>>>> If you just bump the genid on the next person to look it up, other >>>>>> sockets and cached entities might not have validated the route yet, >>>>>> and now will falsely see the route as valid. We have to ensure that >>>>>> they too drop this route and perform a relookup. >>>>> >>>>> I am confused, I thought it was the role of the cookie. >>>>> >>>>> (Ie socket has to store its own cookie to be able to validate a route) >>>>> >>>>> Before 6f3118b571b8 patch, how was this done anyway ? >>>>> >>>>> If persistent routes cannot refresh the genid, then they are useless ? >>>> >>>> I just speak about the genid aspect. >>>> >>>> I understand that cookie (via node->fn_sernum) invalidates the path >>>> in the fib_trie, but the genid protects against other circumstances >>>> (matching IPSEC rule, f.e.) >>>> >>>> You have to make sure all other sockets did a full route lookup >>>> (including IPSEC) before you can safely adjust the genid. >>>> >>>> I could be wrong, recheck my analysis please :-) >>> >>> So this whole genid protection can not work, unless we make sure a >>> socket cannot share a route with another socket. >>> >>> This means we have to clone all routes. >> >> I'm willing to revert the change in question if you think that's the >> sanest way forward. >> >> The bug fix for more obscure use cases (IPSEC) if pointless if it >> breaks more common things (TCP input route caching). > > Lets wait for Nicolas and/or Hannes input, they might have some ideas... The initial problem was in SCTP. Here is the thread after the v1 patch: http://patchwork.ozlabs.org/patch/182235/ Before the patch, SCTP stored the IPv6 route in its cache and if an IPsec rules was inserted after that operation, SCTP never invalidated the cached route to use a new xfrm route. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, 2014-09-08 at 10:11 +0200, Nicolas Dichtel wrote: > Le 08/09/2014 07:07, Eric Dumazet a écrit : > > On Sun, 2014-09-07 at 21:59 -0700, David Miller wrote: > >> From: Eric Dumazet <eric.dumazet@gmail.com> > >> Date: Sun, 07 Sep 2014 21:43:54 -0700 > >> > >>> On Sun, 2014-09-07 at 21:27 -0700, David Miller wrote: > >>>> From: Eric Dumazet <eric.dumazet@gmail.com> > >>>> Date: Sun, 07 Sep 2014 21:18:25 -0700 > >>>> > >>>>> On Sun, 2014-09-07 at 15:54 -0700, David Miller wrote: > >>>>> > >>>>>> This might be broken. > >>>>>> > >>>>>> We are dealing here with persistent entries in the ipv6 routine trie. > >>>>>> > >>>>>> If you just bump the genid on the next person to look it up, other > >>>>>> sockets and cached entities might not have validated the route yet, > >>>>>> and now will falsely see the route as valid. We have to ensure that > >>>>>> they too drop this route and perform a relookup. > >>>>> > >>>>> I am confused, I thought it was the role of the cookie. > >>>>> > >>>>> (Ie socket has to store its own cookie to be able to validate a route) > >>>>> > >>>>> Before 6f3118b571b8 patch, how was this done anyway ? > >>>>> > >>>>> If persistent routes cannot refresh the genid, then they are useless ? > >>>> > >>>> I just speak about the genid aspect. > >>>> > >>>> I understand that cookie (via node->fn_sernum) invalidates the path > >>>> in the fib_trie, but the genid protects against other circumstances > >>>> (matching IPSEC rule, f.e.) > >>>> > >>>> You have to make sure all other sockets did a full route lookup > >>>> (including IPSEC) before you can safely adjust the genid. > >>>> > >>>> I could be wrong, recheck my analysis please :-) > >>> > >>> So this whole genid protection can not work, unless we make sure a > >>> socket cannot share a route with another socket. > >>> > >>> This means we have to clone all routes. > >> > >> I'm willing to revert the change in question if you think that's the > >> sanest way forward. > >> > >> The bug fix for more obscure use cases (IPSEC) if pointless if it > >> breaks more common things (TCP input route caching). > > > > Lets wait for Nicolas and/or Hannes input, they might have some ideas... > > The initial problem was in SCTP. Here is the thread after the v1 patch: > http://patchwork.ozlabs.org/patch/182235/ > > Before the patch, SCTP stored the IPv6 route in its cache and if an IPsec > rules was inserted after that operation, SCTP never invalidated the cached > route to use a new xfrm route. This thread mentions output route. The problem I currently have with IPv6 early demux is for input routes. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Le 08/09/2014 12:28, Eric Dumazet a écrit : > On Mon, 2014-09-08 at 10:11 +0200, Nicolas Dichtel wrote: >> Le 08/09/2014 07:07, Eric Dumazet a écrit : >>> On Sun, 2014-09-07 at 21:59 -0700, David Miller wrote: >>>> From: Eric Dumazet <eric.dumazet@gmail.com> >>>> Date: Sun, 07 Sep 2014 21:43:54 -0700 >>>> >>>>> On Sun, 2014-09-07 at 21:27 -0700, David Miller wrote: >>>>>> From: Eric Dumazet <eric.dumazet@gmail.com> >>>>>> Date: Sun, 07 Sep 2014 21:18:25 -0700 >>>>>> >>>>>>> On Sun, 2014-09-07 at 15:54 -0700, David Miller wrote: >>>>>>> >>>>>>>> This might be broken. >>>>>>>> >>>>>>>> We are dealing here with persistent entries in the ipv6 routine trie. >>>>>>>> >>>>>>>> If you just bump the genid on the next person to look it up, other >>>>>>>> sockets and cached entities might not have validated the route yet, >>>>>>>> and now will falsely see the route as valid. We have to ensure that >>>>>>>> they too drop this route and perform a relookup. >>>>>>> >>>>>>> I am confused, I thought it was the role of the cookie. >>>>>>> >>>>>>> (Ie socket has to store its own cookie to be able to validate a route) >>>>>>> >>>>>>> Before 6f3118b571b8 patch, how was this done anyway ? >>>>>>> >>>>>>> If persistent routes cannot refresh the genid, then they are useless ? >>>>>> >>>>>> I just speak about the genid aspect. >>>>>> >>>>>> I understand that cookie (via node->fn_sernum) invalidates the path >>>>>> in the fib_trie, but the genid protects against other circumstances >>>>>> (matching IPSEC rule, f.e.) >>>>>> >>>>>> You have to make sure all other sockets did a full route lookup >>>>>> (including IPSEC) before you can safely adjust the genid. >>>>>> >>>>>> I could be wrong, recheck my analysis please :-) >>>>> >>>>> So this whole genid protection can not work, unless we make sure a >>>>> socket cannot share a route with another socket. >>>>> >>>>> This means we have to clone all routes. >>>> >>>> I'm willing to revert the change in question if you think that's the >>>> sanest way forward. >>>> >>>> The bug fix for more obscure use cases (IPSEC) if pointless if it >>>> breaks more common things (TCP input route caching). >>> >>> Lets wait for Nicolas and/or Hannes input, they might have some ideas... >> >> The initial problem was in SCTP. Here is the thread after the v1 patch: >> http://patchwork.ozlabs.org/patch/182235/ >> >> Before the patch, SCTP stored the IPv6 route in its cache and if an IPsec >> rules was inserted after that operation, SCTP never invalidated the cached >> route to use a new xfrm route. > > This thread mentions output route. Yes, it was the target. > > The problem I currently have with IPv6 early demux is for input routes. It's clearly a regression. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 09/08/2014 01:07 AM, Eric Dumazet wrote: > On Sun, 2014-09-07 at 21:59 -0700, David Miller wrote: >> From: Eric Dumazet <eric.dumazet@gmail.com> >> Date: Sun, 07 Sep 2014 21:43:54 -0700 >> >>> On Sun, 2014-09-07 at 21:27 -0700, David Miller wrote: >>>> From: Eric Dumazet <eric.dumazet@gmail.com> >>>> Date: Sun, 07 Sep 2014 21:18:25 -0700 >>>> >>>>> On Sun, 2014-09-07 at 15:54 -0700, David Miller wrote: >>>>> >>>>>> This might be broken. >>>>>> >>>>>> We are dealing here with persistent entries in the ipv6 routine trie. >>>>>> >>>>>> If you just bump the genid on the next person to look it up, other >>>>>> sockets and cached entities might not have validated the route yet, >>>>>> and now will falsely see the route as valid. We have to ensure that >>>>>> they too drop this route and perform a relookup. >>>>> >>>>> I am confused, I thought it was the role of the cookie. >>>>> >>>>> (Ie socket has to store its own cookie to be able to validate a route) >>>>> >>>>> Before 6f3118b571b8 patch, how was this done anyway ? >>>>> >>>>> If persistent routes cannot refresh the genid, then they are useless ? >>>> >>>> I just speak about the genid aspect. >>>> >>>> I understand that cookie (via node->fn_sernum) invalidates the path >>>> in the fib_trie, but the genid protects against other circumstances >>>> (matching IPSEC rule, f.e.) >>>> >>>> You have to make sure all other sockets did a full route lookup >>>> (including IPSEC) before you can safely adjust the genid. >>>> >>>> I could be wrong, recheck my analysis please :-) >>> >>> So this whole genid protection can not work, unless we make sure a >>> socket cannot share a route with another socket. >>> >>> This means we have to clone all routes. >> >> I'm willing to revert the change in question if you think that's the >> sanest way forward. >> >> The bug fix for more obscure use cases (IPSEC) if pointless if it >> breaks more common things (TCP input route caching). > > Lets wait for Nicolas and/or Hannes input, they might have some ideas... > > So, looking at the differences between ipv4 and ipv6, it looks like ipv4 genid is only bumped by xfrm, while ipv6 one is bumped every time __ipv6_ifa_notify() is called which happens in a lot of palaces... So, my guess is that ipv4 genid doesn't really grow on non-xfrm configured systems, while ipv6 one can grow very fast. Probably why we don't see it with ipv4 and do with ipv6. -vlad > > -- > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On So, 2014-09-07 at 22:07 -0700, Eric Dumazet wrote: > On Sun, 2014-09-07 at 21:59 -0700, David Miller wrote: > > From: Eric Dumazet <eric.dumazet@gmail.com> > > Date: Sun, 07 Sep 2014 21:43:54 -0700 > > > > > On Sun, 2014-09-07 at 21:27 -0700, David Miller wrote: > > >> From: Eric Dumazet <eric.dumazet@gmail.com> > > >> Date: Sun, 07 Sep 2014 21:18:25 -0700 > > >> > > >> > On Sun, 2014-09-07 at 15:54 -0700, David Miller wrote: > > >> > > > >> >> This might be broken. > > >> >> > > >> >> We are dealing here with persistent entries in the ipv6 routine trie. > > >> >> > > >> >> If you just bump the genid on the next person to look it up, other > > >> >> sockets and cached entities might not have validated the route yet, > > >> >> and now will falsely see the route as valid. We have to ensure that > > >> >> they too drop this route and perform a relookup. > > >> > > > >> > I am confused, I thought it was the role of the cookie. > > >> > > > >> > (Ie socket has to store its own cookie to be able to validate a route) > > >> > > > >> > Before 6f3118b571b8 patch, how was this done anyway ? > > >> > > > >> > If persistent routes cannot refresh the genid, then they are useless ? > > >> > > >> I just speak about the genid aspect. > > >> > > >> I understand that cookie (via node->fn_sernum) invalidates the path > > >> in the fib_trie, but the genid protects against other circumstances > > >> (matching IPSEC rule, f.e.) > > >> > > >> You have to make sure all other sockets did a full route lookup > > >> (including IPSEC) before you can safely adjust the genid. > > >> > > >> I could be wrong, recheck my analysis please :-) > > > > > > So this whole genid protection can not work, unless we make sure a > > > socket cannot share a route with another socket. > > > > > > This means we have to clone all routes. > > > > I'm willing to revert the change in question if you think that's the > > sanest way forward. > > > > The bug fix for more obscure use cases (IPSEC) if pointless if it > > breaks more common things (TCP input route caching). > > Lets wait for Nicolas and/or Hannes input, they might have some ideas... My first idea was to remove rt_genid check in ip6_dst_check completely and rt_genid_bump_ipv6() should walk the trie to increase fib6_sernum in rt6i_nodes. I'll try this. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/net/ipv6/route.c b/net/ipv6/route.c index f23fbd28a501..1e76c3c5b87b 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -944,13 +944,20 @@ restart: dst_hold(&rt->dst); read_unlock_bh(&table->tb6_lock); - if (!(rt->rt6i_flags & (RTF_NONEXTHOP | RTF_GATEWAY))) + if (!(rt->rt6i_flags & (RTF_NONEXTHOP | RTF_GATEWAY))) { nrt = rt6_alloc_cow(rt, &fl6->daddr, &fl6->saddr); - else if (!(rt->dst.flags & DST_HOST)) + } else if (!(rt->dst.flags & DST_HOST)) { nrt = rt6_alloc_clone(rt, &fl6->daddr); - else - goto out2; + } else { + u32 genid = rt_genid_ipv6(net); + /* We must refresh rt6i_genid, but only if needed + * to avoid false sharing. + */ + if (rt->rt6i_genid != genid) + rt->rt6i_genid = genid; + goto out2; + } ip6_rt_put(rt); rt = nrt ? : net->ipv6.ip6_null_entry;