Message ID | 59304478d82e52c817c92d183ef2105f35c17266.1508261949.git.pabeni@redhat.com |
---|---|
State | Changes Requested, archived |
Delegated to: | David Miller |
Headers | show |
Series | ipv6: fixes for RTF_CACHE entries | expand |
On Tue, Oct 17, 2017 at 10:40 AM, Paolo Abeni <pabeni@redhat.com> wrote: > The commit 2b760fcf5cfb ("ipv6: hook up exception table to store > dst cache") partially reverted 1e2ea8ad37be ("ipv6: set > dst.obsolete when a cached route has expired"). > > This change brings back the dst obsoleting and push it a step > farther: cached dst are always obsoleted when removed from the > fib tree, and removal by time expiration is now performed > regardless of dst->__refcnt, to be consistent with what we > already do for RTF_GATEWAY dst. > > Fixes: 2b760fcf5cfb ("ipv6: hook up exception table to store dst cache") > Signed-off-by: Paolo Abeni <pabeni@redhat.com> > --- > net/ipv6/route.c | 13 +++++++++++-- > 1 file changed, 11 insertions(+), 2 deletions(-) > > diff --git a/net/ipv6/route.c b/net/ipv6/route.c > index 8b25a31b6b03..fce740049e3e 100644 > --- a/net/ipv6/route.c > +++ b/net/ipv6/route.c > @@ -1147,6 +1147,12 @@ static void rt6_remove_exception(struct rt6_exception_bucket *bucket, > if (!bucket || !rt6_ex) > return; > > + /* sockets, flow cache, etc. can hold a refence to this dst, be sure > + * they will drop it. > + */ > + if (rt6_ex->rt6i) > + rt6_ex->rt6i->dst.obsolete = DST_OBSOLETE_FORCE_CHK; > + Hmm... I don't really think it is needed. rt6 is created with rt6->dst.obsolete set to DST_OBSOLETE_FORCE_CHK. And by the time the above function is called, it should still be that value. Furthermore, the later call rt6_release() calls dst_dev_put() which sets rt6->dst.obsolete to DST_OBSOLETE_DEAD to indicate this route has been removed from the tree. > net = dev_net(rt6_ex->rt6i->dst.dev); > rt6_ex->rt6i->rt6i_node = NULL; > hlist_del_rcu(&rt6_ex->hlist); > @@ -1575,8 +1581,11 @@ static void rt6_age_examine_exception(struct rt6_exception_bucket *bucket, > { > struct rt6_info *rt = rt6_ex->rt6i; > > - if (atomic_read(&rt->dst.__refcnt) == 1 && > - time_after_eq(now, rt->dst.lastuse + gc_args->timeout)) { > + /* we are pruning and obsoleting the exception route even if others > + * have still reference to it, so that on next dst_check() such > + * reference can be dropped > + */ > + if (time_after_eq(now, rt->dst.lastuse + gc_args->timeout)) { Why do we want to change this behavior? Before my patch series, cached routes were only deleted from the tree in fib6_age() when rt->dst.__refcnt == 1, isn't it? > RT6_TRACE("aging clone %p\n", rt); > rt6_remove_exception(bucket, rt6_ex); > return; > -- > 2.13.6 >
On Tue, 2017-10-17 at 11:58 -0700, Wei Wang wrote: > On Tue, Oct 17, 2017 at 10:40 AM, Paolo Abeni <pabeni@redhat.com> wrote: > > The commit 2b760fcf5cfb ("ipv6: hook up exception table to store > > dst cache") partially reverted 1e2ea8ad37be ("ipv6: set > > dst.obsolete when a cached route has expired"). > > > > This change brings back the dst obsoleting and push it a step > > farther: cached dst are always obsoleted when removed from the > > fib tree, and removal by time expiration is now performed > > regardless of dst->__refcnt, to be consistent with what we > > already do for RTF_GATEWAY dst. > > > > Fixes: 2b760fcf5cfb ("ipv6: hook up exception table to store dst cache") > > Signed-off-by: Paolo Abeni <pabeni@redhat.com> > > --- > > net/ipv6/route.c | 13 +++++++++++-- > > 1 file changed, 11 insertions(+), 2 deletions(-) > > > > diff --git a/net/ipv6/route.c b/net/ipv6/route.c > > index 8b25a31b6b03..fce740049e3e 100644 > > --- a/net/ipv6/route.c > > +++ b/net/ipv6/route.c > > @@ -1147,6 +1147,12 @@ static void rt6_remove_exception(struct rt6_exception_bucket *bucket, > > if (!bucket || !rt6_ex) > > return; > > > > + /* sockets, flow cache, etc. can hold a refence to this dst, be sure > > + * they will drop it. > > + */ > > + if (rt6_ex->rt6i) > > + rt6_ex->rt6i->dst.obsolete = DST_OBSOLETE_FORCE_CHK; > > + > > Hmm... I don't really think it is needed. rt6 is created with > rt6->dst.obsolete set to DST_OBSOLETE_FORCE_CHK. And by the time the > above function is called, it should still be that value. > Furthermore, the later call rt6_release() calls dst_dev_put() which > sets rt6->dst.obsolete to DST_OBSOLETE_DEAD to indicate this route has > been removed from the tree. You are right, this looks as not needed, if we keep the chunck below. > > net = dev_net(rt6_ex->rt6i->dst.dev); > > rt6_ex->rt6i->rt6i_node = NULL; > > hlist_del_rcu(&rt6_ex->hlist); > > @@ -1575,8 +1581,11 @@ static void rt6_age_examine_exception(struct rt6_exception_bucket *bucket, > > { > > struct rt6_info *rt = rt6_ex->rt6i; > > > > - if (atomic_read(&rt->dst.__refcnt) == 1 && > > - time_after_eq(now, rt->dst.lastuse + gc_args->timeout)) { > > + /* we are pruning and obsoleting the exception route even if others > > + * have still reference to it, so that on next dst_check() such > > + * reference can be dropped > > + */ > > + if (time_after_eq(now, rt->dst.lastuse + gc_args->timeout)) { > > Why do we want to change this behavior? Before my patch series, cached > routes were only deleted from the tree in fib6_age() when > rt->dst.__refcnt == 1, isn't it? yes, but that really looks like a relic from ancient past more than something really needed. We already remove from the dst from fib tree regardless of the refcnt if the gateway validation fails - a few lines below in the same function. Waiting for __refcnt going down will let the kernel keep the exception entry around for much longer - potentially forever, if e.g. we have a reference in a socket dst cache and the application stops processing packets. Meanwhile others sockets may grab more references to (and use) the same aged-out dst. The commit 1e2ea8ad37be ("ipv6: set dst.obsolete when a cached route has expired") was the solution to the above issue prior to the recent refactor. Cheers, Paolo
On Tue, Oct 17, 2017 at 1:02 PM, Paolo Abeni <pabeni@redhat.com> wrote: > On Tue, 2017-10-17 at 11:58 -0700, Wei Wang wrote: >> On Tue, Oct 17, 2017 at 10:40 AM, Paolo Abeni <pabeni@redhat.com> wrote: >> > The commit 2b760fcf5cfb ("ipv6: hook up exception table to store >> > dst cache") partially reverted 1e2ea8ad37be ("ipv6: set >> > dst.obsolete when a cached route has expired"). >> > >> > This change brings back the dst obsoleting and push it a step >> > farther: cached dst are always obsoleted when removed from the >> > fib tree, and removal by time expiration is now performed >> > regardless of dst->__refcnt, to be consistent with what we >> > already do for RTF_GATEWAY dst. >> > >> > Fixes: 2b760fcf5cfb ("ipv6: hook up exception table to store dst cache") >> > Signed-off-by: Paolo Abeni <pabeni@redhat.com> >> > --- >> > net/ipv6/route.c | 13 +++++++++++-- >> > 1 file changed, 11 insertions(+), 2 deletions(-) >> > >> > diff --git a/net/ipv6/route.c b/net/ipv6/route.c >> > index 8b25a31b6b03..fce740049e3e 100644 >> > --- a/net/ipv6/route.c >> > +++ b/net/ipv6/route.c >> > @@ -1147,6 +1147,12 @@ static void rt6_remove_exception(struct rt6_exception_bucket *bucket, >> > if (!bucket || !rt6_ex) >> > return; >> > >> > + /* sockets, flow cache, etc. can hold a refence to this dst, be sure >> > + * they will drop it. >> > + */ >> > + if (rt6_ex->rt6i) >> > + rt6_ex->rt6i->dst.obsolete = DST_OBSOLETE_FORCE_CHK; >> > + >> >> Hmm... I don't really think it is needed. rt6 is created with >> rt6->dst.obsolete set to DST_OBSOLETE_FORCE_CHK. And by the time the >> above function is called, it should still be that value. >> Furthermore, the later call rt6_release() calls dst_dev_put() which >> sets rt6->dst.obsolete to DST_OBSOLETE_DEAD to indicate this route has >> been removed from the tree. > > You are right, this looks as not needed, if we keep the chunck below. > >> > net = dev_net(rt6_ex->rt6i->dst.dev); >> > rt6_ex->rt6i->rt6i_node = NULL; >> > hlist_del_rcu(&rt6_ex->hlist); >> > @@ -1575,8 +1581,11 @@ static void rt6_age_examine_exception(struct rt6_exception_bucket *bucket, >> > { >> > struct rt6_info *rt = rt6_ex->rt6i; >> > >> > - if (atomic_read(&rt->dst.__refcnt) == 1 && >> > - time_after_eq(now, rt->dst.lastuse + gc_args->timeout)) { >> > + /* we are pruning and obsoleting the exception route even if others >> > + * have still reference to it, so that on next dst_check() such >> > + * reference can be dropped >> > + */ >> > + if (time_after_eq(now, rt->dst.lastuse + gc_args->timeout)) { >> >> Why do we want to change this behavior? Before my patch series, cached >> routes were only deleted from the tree in fib6_age() when >> rt->dst.__refcnt == 1, isn't it? > > yes, but that really looks like a relic from ancient past more than > something really needed. We already remove from the dst from fib tree > regardless of the refcnt if the gateway validation fails - a few lines > below in the same function. > > Waiting for __refcnt going down will let the kernel keep the exception > entry around for much longer - potentially forever, if e.g. we have a > reference in a socket dst cache and the application stops processing > packets. > True. If the socket is idle and doesn't send/receive packets, dst_check() won't get triggered and the socket will keep holding refcnt on the obsolete dst. > Meanwhile others sockets may grab more references to (and use) the same > aged-out dst. > I don't think other sockets could grab more reference to this dst because this dst should already be removed from the fib6 tree. > The commit 1e2ea8ad37be ("ipv6: set dst.obsolete when a cached route > has expired") was the solution to the above issue prior to the recent > refactor. > I don't really understand how this commit is solving the above issue. This commit still only ages out cached route if &rt->dst.__refcnt == 1. So if socket is holding refcnt to this dst and dst_check() is not getting called, this cached route still won't get deleted. > Cheers, > > Paolo
On Tue, Oct 17, 2017 at 06:58:23PM +0000, Wei Wang wrote: > On Tue, Oct 17, 2017 at 10:40 AM, Paolo Abeni <pabeni@redhat.com> wrote: > > The commit 2b760fcf5cfb ("ipv6: hook up exception table to store > > dst cache") partially reverted 1e2ea8ad37be ("ipv6: set > > dst.obsolete when a cached route has expired"). > > > > This change brings back the dst obsoleting and push it a step > > farther: cached dst are always obsoleted when removed from the > > fib tree, and removal by time expiration is now performed > > regardless of dst->__refcnt, to be consistent with what we > > already do for RTF_GATEWAY dst. > > > > Fixes: 2b760fcf5cfb ("ipv6: hook up exception table to store dst cache") > > Signed-off-by: Paolo Abeni <pabeni@redhat.com> > > --- > > net/ipv6/route.c | 13 +++++++++++-- > > 1 file changed, 11 insertions(+), 2 deletions(-) > > > > diff --git a/net/ipv6/route.c b/net/ipv6/route.c > > index 8b25a31b6b03..fce740049e3e 100644 > > --- a/net/ipv6/route.c > > +++ b/net/ipv6/route.c > > @@ -1147,6 +1147,12 @@ static void rt6_remove_exception(struct rt6_exception_bucket *bucket, > > if (!bucket || !rt6_ex) > > return; > > > > + /* sockets, flow cache, etc. can hold a refence to this dst, be sure > > + * they will drop it. > > + */ > > + if (rt6_ex->rt6i) > > + rt6_ex->rt6i->dst.obsolete = DST_OBSOLETE_FORCE_CHK; > > + > > Hmm... I don't really think it is needed. rt6 is created with > rt6->dst.obsolete set to DST_OBSOLETE_FORCE_CHK. And by the time the > above function is called, it should still be that value. > Furthermore, the later call rt6_release() calls dst_dev_put() which > sets rt6->dst.obsolete to DST_OBSOLETE_DEAD to indicate this route has > been removed from the tree. > > > net = dev_net(rt6_ex->rt6i->dst.dev); > > rt6_ex->rt6i->rt6i_node = NULL; > > hlist_del_rcu(&rt6_ex->hlist); > > @@ -1575,8 +1581,11 @@ static void rt6_age_examine_exception(struct rt6_exception_bucket *bucket, > > { > > struct rt6_info *rt = rt6_ex->rt6i; > > > > - if (atomic_read(&rt->dst.__refcnt) == 1 && > > - time_after_eq(now, rt->dst.lastuse + gc_args->timeout)) { > > + /* we are pruning and obsoleting the exception route even if others > > + * have still reference to it, so that on next dst_check() such > > + * reference can be dropped > > + */ > > + if (time_after_eq(now, rt->dst.lastuse + gc_args->timeout)) { > > Why do we want to change this behavior? Before my patch series, cached > routes were only deleted from the tree in fib6_age() when > rt->dst.__refcnt == 1, isn't it? In the commit 1e2ea8ad37be ("ipv6: set dst.obsolete when a cached route has expired"), if obsolete is set to DST_OBSOLETE_KILL, why it is not removed from the tree together? > > > RT6_TRACE("aging clone %p\n", rt); > > rt6_remove_exception(bucket, rt6_ex); > > return; > > -- > > 2.13.6 > >
On Tue, 2017-10-17 at 13:48 -0700, Wei Wang wrote: > On Tue, Oct 17, 2017 at 1:02 PM, Paolo Abeni <pabeni@redhat.com> wrote: > > Meanwhile others sockets may grab more references to (and use) the same > > aged-out dst. > > > > I don't think other sockets could grab more reference to this dst > because this dst should already be removed from the fib6 tree. With the current net-next code, the dst is not removed from the fib tree while someone else is holding it and dst_check() does not fail after that the cached dst is aged out. If a socket cache grab a reference to the CACHE dst, it will not release it untill the next sernum change, regardless of the dst aging. > > The commit 1e2ea8ad37be ("ipv6: set dst.obsolete when a cached route > > has expired") was the solution to the above issue prior to the recent > > refactor. > > > > I don't really understand how this commit is solving the above issue. > This commit still only ages out cached route if &rt->dst.__refcnt == > 1. So if socket is holding refcnt to this dst and dst_check() is not > getting called, this cached route still won't get deleted. Setting obsolete to DST_OBSOLETE_KILL forced whoever was holding the dst reference to drop it on the next dst_check(), so that refcnt could go down. Cheers, Paolo
On Wed, Oct 18, 2017 at 6:03 AM, Paolo Abeni <pabeni@redhat.com> wrote: > On Tue, 2017-10-17 at 13:48 -0700, Wei Wang wrote: >> On Tue, Oct 17, 2017 at 1:02 PM, Paolo Abeni <pabeni@redhat.com> wrote: >> > Meanwhile others sockets may grab more references to (and use) the same >> > aged-out dst. >> > >> >> I don't think other sockets could grab more reference to this dst >> because this dst should already be removed from the fib6 tree. > > With the current net-next code, the dst is not removed from the fib > tree while someone else is holding it and dst_check() does not fail > after that the cached dst is aged out. If a socket cache grab a > reference to the CACHE dst, it will not release it untill the next > sernum change, regardless of the dst aging. > >> > The commit 1e2ea8ad37be ("ipv6: set dst.obsolete when a cached route >> > has expired") was the solution to the above issue prior to the recent >> > refactor. >> > >> >> I don't really understand how this commit is solving the above issue. >> This commit still only ages out cached route if &rt->dst.__refcnt == >> 1. So if socket is holding refcnt to this dst and dst_check() is not >> getting called, this cached route still won't get deleted. > > Setting obsolete to DST_OBSOLETE_KILL forced whoever was holding the > dst reference to drop it on the next dst_check(), so that refcnt could > go down. > Yes. Understood. Martin and I had a discussion yesterday. We both think it is not a good idea to set obolete to DST_OBSOLETE_KILL but not to remove it from the fib6 tree. It is because others who do a route lookup later might potentially find this route and tries to use this route. However, dst_check() will show this route is invalid. So the user will redo the route lookup. But as this route is not yet deleted from the tree, it will find this route again. This seems like a bad situation. And again, setting obsolete to DST_OBSOLETE_KILL does not prevent some idle socket holding on to this dst for a long time... With the above said, I am now convinced what you have in your patch is the correct thing to do. Just remove the cached route without checking the refcnt when it is aged. > Cheers, > > Paolo
On Wed, Oct 18, 2017 at 05:56:39PM +0000, Wei Wang wrote: > On Wed, Oct 18, 2017 at 6:03 AM, Paolo Abeni <pabeni@redhat.com> wrote: > > On Tue, 2017-10-17 at 13:48 -0700, Wei Wang wrote: > >> On Tue, Oct 17, 2017 at 1:02 PM, Paolo Abeni <pabeni@redhat.com> wrote: > >> > Meanwhile others sockets may grab more references to (and use) the same > >> > aged-out dst. > >> > > >> > >> I don't think other sockets could grab more reference to this dst > >> because this dst should already be removed from the fib6 tree. > > > > With the current net-next code, the dst is not removed from the fib > > tree while someone else is holding it and dst_check() does not fail > > after that the cached dst is aged out. If a socket cache grab a > > reference to the CACHE dst, it will not release it untill the next > > sernum change, regardless of the dst aging. > > > >> > The commit 1e2ea8ad37be ("ipv6: set dst.obsolete when a cached route > >> > has expired") was the solution to the above issue prior to the recent > >> > refactor. > >> > > >> > >> I don't really understand how this commit is solving the above issue. > >> This commit still only ages out cached route if &rt->dst.__refcnt == > >> 1. So if socket is holding refcnt to this dst and dst_check() is not > >> getting called, this cached route still won't get deleted. > > > > Setting obsolete to DST_OBSOLETE_KILL forced whoever was holding the > > dst reference to drop it on the next dst_check(), so that refcnt could > > go down. > > > > Yes. Understood. > Martin and I had a discussion yesterday. We both think it is not a > good idea to set obolete to DST_OBSOLETE_KILL but not to remove it > from the fib6 tree. > It is because others who do a route lookup later might potentially > find this route and tries to use this route. However, dst_check() will > show this route is invalid. So the user will redo the route lookup. > But as this route is not yet deleted from the tree, it will find this > route again. This seems like a bad situation. > And again, setting obsolete to DST_OBSOLETE_KILL does not prevent some > idle socket holding on to this dst for a long time... > > With the above said, I am now convinced what you have in your patch is > the correct thing to do. Just remove the cached route without checking > the refcnt when it is aged. Another thing (not limited to this case), Considering we have a limited size in the exception table now and the oldest one will get removed when the table is full, do we still need to purge this periodically in gc? I would like to see the IPv6's gc eventually goes away. If we need to expire the pmtu value of a RTF_CACHE rt, can dst.expires be checked during the lookup (like what ipv4 is doing) and then remove it from the exception table?
On Wed, 2017-10-18 at 12:05 -0700, Martin KaFai Lau wrote: > Another thing (not limited to this case), > > Considering we have a limited size in the exception table now and > the oldest one will get removed when the table is full, > do we still need to purge this periodically in gc? At least in some scenarios we have only a few entries in the exception table. > I would like to see the IPv6's gc eventually goes away. > > If we need to expire the pmtu value of a RTF_CACHE rt, > can dst.expires be checked during the lookup (like what > ipv4 is doing) and then remove it from the exception table? Currently the gc also performs validation vs the related neigh for GATEWAY dst. That looks potentially quite expensive, as we currently have a per neighbour atomic refcount (e.g. if the same dst is cached on different CPUs) Cheers, Paolo
diff --git a/net/ipv6/route.c b/net/ipv6/route.c index 8b25a31b6b03..fce740049e3e 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -1147,6 +1147,12 @@ static void rt6_remove_exception(struct rt6_exception_bucket *bucket, if (!bucket || !rt6_ex) return; + /* sockets, flow cache, etc. can hold a refence to this dst, be sure + * they will drop it. + */ + if (rt6_ex->rt6i) + rt6_ex->rt6i->dst.obsolete = DST_OBSOLETE_FORCE_CHK; + net = dev_net(rt6_ex->rt6i->dst.dev); rt6_ex->rt6i->rt6i_node = NULL; hlist_del_rcu(&rt6_ex->hlist); @@ -1575,8 +1581,11 @@ static void rt6_age_examine_exception(struct rt6_exception_bucket *bucket, { struct rt6_info *rt = rt6_ex->rt6i; - if (atomic_read(&rt->dst.__refcnt) == 1 && - time_after_eq(now, rt->dst.lastuse + gc_args->timeout)) { + /* we are pruning and obsoleting the exception route even if others + * have still reference to it, so that on next dst_check() such + * reference can be dropped + */ + if (time_after_eq(now, rt->dst.lastuse + gc_args->timeout)) { RT6_TRACE("aging clone %p\n", rt); rt6_remove_exception(bucket, rt6_ex); return;
The commit 2b760fcf5cfb ("ipv6: hook up exception table to store dst cache") partially reverted 1e2ea8ad37be ("ipv6: set dst.obsolete when a cached route has expired"). This change brings back the dst obsoleting and push it a step farther: cached dst are always obsoleted when removed from the fib tree, and removal by time expiration is now performed regardless of dst->__refcnt, to be consistent with what we already do for RTF_GATEWAY dst. Fixes: 2b760fcf5cfb ("ipv6: hook up exception table to store dst cache") Signed-off-by: Paolo Abeni <pabeni@redhat.com> --- net/ipv6/route.c | 13 +++++++++++-- 1 file changed, 11 insertions(+), 2 deletions(-)