diff mbox series

[net-next,3/3] ipv6: obsolete cached dst when removing them from fib tree

Message ID 59304478d82e52c817c92d183ef2105f35c17266.1508261949.git.pabeni@redhat.com
State Changes Requested, archived
Delegated to: David Miller
Headers show
Series ipv6: fixes for RTF_CACHE entries | expand

Commit Message

Paolo Abeni Oct. 17, 2017, 5:40 p.m. UTC
The commit 2b760fcf5cfb ("ipv6: hook up exception table to store
dst cache") partially reverted 1e2ea8ad37be ("ipv6: set
dst.obsolete when a cached route has expired").

This change brings back the dst obsoleting and push it a step
farther: cached dst are always obsoleted when removed from the
fib tree, and removal by time expiration is now performed
regardless of dst->__refcnt, to be consistent with what we
already do for RTF_GATEWAY dst.

Fixes: 2b760fcf5cfb ("ipv6: hook up exception table to store dst cache")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/ipv6/route.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

Comments

Wei Wang Oct. 17, 2017, 6:58 p.m. UTC | #1
On Tue, Oct 17, 2017 at 10:40 AM, Paolo Abeni <pabeni@redhat.com> wrote:
> The commit 2b760fcf5cfb ("ipv6: hook up exception table to store
> dst cache") partially reverted 1e2ea8ad37be ("ipv6: set
> dst.obsolete when a cached route has expired").
>
> This change brings back the dst obsoleting and push it a step
> farther: cached dst are always obsoleted when removed from the
> fib tree, and removal by time expiration is now performed
> regardless of dst->__refcnt, to be consistent with what we
> already do for RTF_GATEWAY dst.
>
> Fixes: 2b760fcf5cfb ("ipv6: hook up exception table to store dst cache")
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
>  net/ipv6/route.c | 13 +++++++++++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> index 8b25a31b6b03..fce740049e3e 100644
> --- a/net/ipv6/route.c
> +++ b/net/ipv6/route.c
> @@ -1147,6 +1147,12 @@ static void rt6_remove_exception(struct rt6_exception_bucket *bucket,
>         if (!bucket || !rt6_ex)
>                 return;
>
> +       /* sockets, flow cache, etc. can hold a refence to this dst, be sure
> +        * they will drop it.
> +        */
> +       if (rt6_ex->rt6i)
> +               rt6_ex->rt6i->dst.obsolete = DST_OBSOLETE_FORCE_CHK;
> +

Hmm... I don't really think it is needed. rt6 is created with
rt6->dst.obsolete set to DST_OBSOLETE_FORCE_CHK. And by the time the
above function is called, it should still be that value.
Furthermore, the later call rt6_release() calls dst_dev_put() which
sets rt6->dst.obsolete to DST_OBSOLETE_DEAD to indicate this route has
been removed from the tree.

>         net = dev_net(rt6_ex->rt6i->dst.dev);
>         rt6_ex->rt6i->rt6i_node = NULL;
>         hlist_del_rcu(&rt6_ex->hlist);
> @@ -1575,8 +1581,11 @@ static void rt6_age_examine_exception(struct rt6_exception_bucket *bucket,
>  {
>         struct rt6_info *rt = rt6_ex->rt6i;
>
> -       if (atomic_read(&rt->dst.__refcnt) == 1 &&
> -           time_after_eq(now, rt->dst.lastuse + gc_args->timeout)) {
> +       /* we are pruning and obsoleting the exception route even if others
> +        * have still reference to it, so that on next dst_check() such
> +        * reference can be dropped
> +        */
> +       if (time_after_eq(now, rt->dst.lastuse + gc_args->timeout)) {

Why do we want to change this behavior? Before my patch series, cached
routes were only deleted from the tree in fib6_age() when
rt->dst.__refcnt == 1, isn't it?

>                 RT6_TRACE("aging clone %p\n", rt);
>                 rt6_remove_exception(bucket, rt6_ex);
>                 return;
> --
> 2.13.6
>
Paolo Abeni Oct. 17, 2017, 8:02 p.m. UTC | #2
On Tue, 2017-10-17 at 11:58 -0700, Wei Wang wrote:
> On Tue, Oct 17, 2017 at 10:40 AM, Paolo Abeni <pabeni@redhat.com> wrote:
> > The commit 2b760fcf5cfb ("ipv6: hook up exception table to store
> > dst cache") partially reverted 1e2ea8ad37be ("ipv6: set
> > dst.obsolete when a cached route has expired").
> > 
> > This change brings back the dst obsoleting and push it a step
> > farther: cached dst are always obsoleted when removed from the
> > fib tree, and removal by time expiration is now performed
> > regardless of dst->__refcnt, to be consistent with what we
> > already do for RTF_GATEWAY dst.
> > 
> > Fixes: 2b760fcf5cfb ("ipv6: hook up exception table to store dst cache")
> > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > ---
> >  net/ipv6/route.c | 13 +++++++++++--
> >  1 file changed, 11 insertions(+), 2 deletions(-)
> > 
> > diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> > index 8b25a31b6b03..fce740049e3e 100644
> > --- a/net/ipv6/route.c
> > +++ b/net/ipv6/route.c
> > @@ -1147,6 +1147,12 @@ static void rt6_remove_exception(struct rt6_exception_bucket *bucket,
> >         if (!bucket || !rt6_ex)
> >                 return;
> > 
> > +       /* sockets, flow cache, etc. can hold a refence to this dst, be sure
> > +        * they will drop it.
> > +        */
> > +       if (rt6_ex->rt6i)
> > +               rt6_ex->rt6i->dst.obsolete = DST_OBSOLETE_FORCE_CHK;
> > +
> 
> Hmm... I don't really think it is needed. rt6 is created with
> rt6->dst.obsolete set to DST_OBSOLETE_FORCE_CHK. And by the time the
> above function is called, it should still be that value.
> Furthermore, the later call rt6_release() calls dst_dev_put() which
> sets rt6->dst.obsolete to DST_OBSOLETE_DEAD to indicate this route has
> been removed from the tree.

You are right, this looks as not needed, if we keep the chunck below.

> >         net = dev_net(rt6_ex->rt6i->dst.dev);
> >         rt6_ex->rt6i->rt6i_node = NULL;
> >         hlist_del_rcu(&rt6_ex->hlist);
> > @@ -1575,8 +1581,11 @@ static void rt6_age_examine_exception(struct rt6_exception_bucket *bucket,
> >  {
> >         struct rt6_info *rt = rt6_ex->rt6i;
> > 
> > -       if (atomic_read(&rt->dst.__refcnt) == 1 &&
> > -           time_after_eq(now, rt->dst.lastuse + gc_args->timeout)) {
> > +       /* we are pruning and obsoleting the exception route even if others
> > +        * have still reference to it, so that on next dst_check() such
> > +        * reference can be dropped
> > +        */
> > +       if (time_after_eq(now, rt->dst.lastuse + gc_args->timeout)) {
> 
> Why do we want to change this behavior? Before my patch series, cached
> routes were only deleted from the tree in fib6_age() when
> rt->dst.__refcnt == 1, isn't it?

yes, but that really looks like a relic from ancient past more than
something really needed. We already remove from the dst from fib tree
regardless of the refcnt if the gateway validation fails - a few lines
below in the same function.

Waiting for __refcnt going down will let the kernel keep the exception
entry around for much longer - potentially forever, if e.g. we have a
reference in a socket dst cache and the application stops processing
packets. 

Meanwhile others sockets may grab more references to (and use) the same
aged-out dst.

The commit 1e2ea8ad37be ("ipv6: set dst.obsolete when a cached route
has expired") was the solution to the above issue prior to the recent
refactor.

Cheers,

Paolo
Wei Wang Oct. 17, 2017, 8:48 p.m. UTC | #3
On Tue, Oct 17, 2017 at 1:02 PM, Paolo Abeni <pabeni@redhat.com> wrote:
> On Tue, 2017-10-17 at 11:58 -0700, Wei Wang wrote:
>> On Tue, Oct 17, 2017 at 10:40 AM, Paolo Abeni <pabeni@redhat.com> wrote:
>> > The commit 2b760fcf5cfb ("ipv6: hook up exception table to store
>> > dst cache") partially reverted 1e2ea8ad37be ("ipv6: set
>> > dst.obsolete when a cached route has expired").
>> >
>> > This change brings back the dst obsoleting and push it a step
>> > farther: cached dst are always obsoleted when removed from the
>> > fib tree, and removal by time expiration is now performed
>> > regardless of dst->__refcnt, to be consistent with what we
>> > already do for RTF_GATEWAY dst.
>> >
>> > Fixes: 2b760fcf5cfb ("ipv6: hook up exception table to store dst cache")
>> > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
>> > ---
>> >  net/ipv6/route.c | 13 +++++++++++--
>> >  1 file changed, 11 insertions(+), 2 deletions(-)
>> >
>> > diff --git a/net/ipv6/route.c b/net/ipv6/route.c
>> > index 8b25a31b6b03..fce740049e3e 100644
>> > --- a/net/ipv6/route.c
>> > +++ b/net/ipv6/route.c
>> > @@ -1147,6 +1147,12 @@ static void rt6_remove_exception(struct rt6_exception_bucket *bucket,
>> >         if (!bucket || !rt6_ex)
>> >                 return;
>> >
>> > +       /* sockets, flow cache, etc. can hold a refence to this dst, be sure
>> > +        * they will drop it.
>> > +        */
>> > +       if (rt6_ex->rt6i)
>> > +               rt6_ex->rt6i->dst.obsolete = DST_OBSOLETE_FORCE_CHK;
>> > +
>>
>> Hmm... I don't really think it is needed. rt6 is created with
>> rt6->dst.obsolete set to DST_OBSOLETE_FORCE_CHK. And by the time the
>> above function is called, it should still be that value.
>> Furthermore, the later call rt6_release() calls dst_dev_put() which
>> sets rt6->dst.obsolete to DST_OBSOLETE_DEAD to indicate this route has
>> been removed from the tree.
>
> You are right, this looks as not needed, if we keep the chunck below.
>
>> >         net = dev_net(rt6_ex->rt6i->dst.dev);
>> >         rt6_ex->rt6i->rt6i_node = NULL;
>> >         hlist_del_rcu(&rt6_ex->hlist);
>> > @@ -1575,8 +1581,11 @@ static void rt6_age_examine_exception(struct rt6_exception_bucket *bucket,
>> >  {
>> >         struct rt6_info *rt = rt6_ex->rt6i;
>> >
>> > -       if (atomic_read(&rt->dst.__refcnt) == 1 &&
>> > -           time_after_eq(now, rt->dst.lastuse + gc_args->timeout)) {
>> > +       /* we are pruning and obsoleting the exception route even if others
>> > +        * have still reference to it, so that on next dst_check() such
>> > +        * reference can be dropped
>> > +        */
>> > +       if (time_after_eq(now, rt->dst.lastuse + gc_args->timeout)) {
>>
>> Why do we want to change this behavior? Before my patch series, cached
>> routes were only deleted from the tree in fib6_age() when
>> rt->dst.__refcnt == 1, isn't it?
>
> yes, but that really looks like a relic from ancient past more than
> something really needed. We already remove from the dst from fib tree
> regardless of the refcnt if the gateway validation fails - a few lines
> below in the same function.
>
> Waiting for __refcnt going down will let the kernel keep the exception
> entry around for much longer - potentially forever, if e.g. we have a
> reference in a socket dst cache and the application stops processing
> packets.
>

True. If the socket is idle and doesn't send/receive packets,
dst_check() won't get triggered and the socket will keep holding
refcnt on the obsolete dst.

> Meanwhile others sockets may grab more references to (and use) the same
> aged-out dst.
>
I don't think other sockets could grab more reference to this dst
because this dst should already be removed from the fib6 tree.

> The commit 1e2ea8ad37be ("ipv6: set dst.obsolete when a cached route
> has expired") was the solution to the above issue prior to the recent
> refactor.
>

I don't really understand how this commit is solving the above issue.
This commit still only ages out cached route if &rt->dst.__refcnt ==
1. So if socket is holding refcnt to this dst and dst_check() is not
getting called,  this cached route still won't get deleted.

> Cheers,
>
> Paolo
Martin KaFai Lau Oct. 17, 2017, 9:52 p.m. UTC | #4
On Tue, Oct 17, 2017 at 06:58:23PM +0000, Wei Wang wrote:
> On Tue, Oct 17, 2017 at 10:40 AM, Paolo Abeni <pabeni@redhat.com> wrote:
> > The commit 2b760fcf5cfb ("ipv6: hook up exception table to store
> > dst cache") partially reverted 1e2ea8ad37be ("ipv6: set
> > dst.obsolete when a cached route has expired").
> >
> > This change brings back the dst obsoleting and push it a step
> > farther: cached dst are always obsoleted when removed from the
> > fib tree, and removal by time expiration is now performed
> > regardless of dst->__refcnt, to be consistent with what we
> > already do for RTF_GATEWAY dst.
> >
> > Fixes: 2b760fcf5cfb ("ipv6: hook up exception table to store dst cache")
> > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > ---
> >  net/ipv6/route.c | 13 +++++++++++--
> >  1 file changed, 11 insertions(+), 2 deletions(-)
> >
> > diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> > index 8b25a31b6b03..fce740049e3e 100644
> > --- a/net/ipv6/route.c
> > +++ b/net/ipv6/route.c
> > @@ -1147,6 +1147,12 @@ static void rt6_remove_exception(struct rt6_exception_bucket *bucket,
> >         if (!bucket || !rt6_ex)
> >                 return;
> >
> > +       /* sockets, flow cache, etc. can hold a refence to this dst, be sure
> > +        * they will drop it.
> > +        */
> > +       if (rt6_ex->rt6i)
> > +               rt6_ex->rt6i->dst.obsolete = DST_OBSOLETE_FORCE_CHK;
> > +
> 
> Hmm... I don't really think it is needed. rt6 is created with
> rt6->dst.obsolete set to DST_OBSOLETE_FORCE_CHK. And by the time the
> above function is called, it should still be that value.
> Furthermore, the later call rt6_release() calls dst_dev_put() which
> sets rt6->dst.obsolete to DST_OBSOLETE_DEAD to indicate this route has
> been removed from the tree.
> 
> >         net = dev_net(rt6_ex->rt6i->dst.dev);
> >         rt6_ex->rt6i->rt6i_node = NULL;
> >         hlist_del_rcu(&rt6_ex->hlist);
> > @@ -1575,8 +1581,11 @@ static void rt6_age_examine_exception(struct rt6_exception_bucket *bucket,
> >  {
> >         struct rt6_info *rt = rt6_ex->rt6i;
> >
> > -       if (atomic_read(&rt->dst.__refcnt) == 1 &&
> > -           time_after_eq(now, rt->dst.lastuse + gc_args->timeout)) {
> > +       /* we are pruning and obsoleting the exception route even if others
> > +        * have still reference to it, so that on next dst_check() such
> > +        * reference can be dropped
> > +        */
> > +       if (time_after_eq(now, rt->dst.lastuse + gc_args->timeout)) {
> 
> Why do we want to change this behavior? Before my patch series, cached
> routes were only deleted from the tree in fib6_age() when
> rt->dst.__refcnt == 1, isn't it?
In the commit 1e2ea8ad37be ("ipv6: set dst.obsolete when a cached route has expired"),
if obsolete is set to DST_OBSOLETE_KILL, why it is not removed from
the tree together?

> 
> >                 RT6_TRACE("aging clone %p\n", rt);
> >                 rt6_remove_exception(bucket, rt6_ex);
> >                 return;
> > --
> > 2.13.6
> >
Paolo Abeni Oct. 18, 2017, 1:03 p.m. UTC | #5
On Tue, 2017-10-17 at 13:48 -0700, Wei Wang wrote:
> On Tue, Oct 17, 2017 at 1:02 PM, Paolo Abeni <pabeni@redhat.com> wrote:
> > Meanwhile others sockets may grab more references to (and use) the same
> > aged-out dst.
> > 
> 
> I don't think other sockets could grab more reference to this dst
> because this dst should already be removed from the fib6 tree.

With the current net-next code, the dst is not removed from the fib
tree while someone else is holding it and dst_check() does not fail
after that the cached dst is aged out. If a socket cache grab a
reference to the CACHE dst, it will not release it untill the next
sernum change, regardless of the dst aging.

> > The commit 1e2ea8ad37be ("ipv6: set dst.obsolete when a cached route
> > has expired") was the solution to the above issue prior to the recent
> > refactor.
> > 
> 
> I don't really understand how this commit is solving the above issue.
> This commit still only ages out cached route if &rt->dst.__refcnt ==
> 1. So if socket is holding refcnt to this dst and dst_check() is not
> getting called,  this cached route still won't get deleted.

Setting obsolete to DST_OBSOLETE_KILL forced whoever was holding the
dst reference to drop it on the next dst_check(), so that refcnt could
go down. 

Cheers,

Paolo
Wei Wang Oct. 18, 2017, 5:56 p.m. UTC | #6
On Wed, Oct 18, 2017 at 6:03 AM, Paolo Abeni <pabeni@redhat.com> wrote:
> On Tue, 2017-10-17 at 13:48 -0700, Wei Wang wrote:
>> On Tue, Oct 17, 2017 at 1:02 PM, Paolo Abeni <pabeni@redhat.com> wrote:
>> > Meanwhile others sockets may grab more references to (and use) the same
>> > aged-out dst.
>> >
>>
>> I don't think other sockets could grab more reference to this dst
>> because this dst should already be removed from the fib6 tree.
>
> With the current net-next code, the dst is not removed from the fib
> tree while someone else is holding it and dst_check() does not fail
> after that the cached dst is aged out. If a socket cache grab a
> reference to the CACHE dst, it will not release it untill the next
> sernum change, regardless of the dst aging.
>
>> > The commit 1e2ea8ad37be ("ipv6: set dst.obsolete when a cached route
>> > has expired") was the solution to the above issue prior to the recent
>> > refactor.
>> >
>>
>> I don't really understand how this commit is solving the above issue.
>> This commit still only ages out cached route if &rt->dst.__refcnt ==
>> 1. So if socket is holding refcnt to this dst and dst_check() is not
>> getting called,  this cached route still won't get deleted.
>
> Setting obsolete to DST_OBSOLETE_KILL forced whoever was holding the
> dst reference to drop it on the next dst_check(), so that refcnt could
> go down.
>

Yes. Understood.
Martin and I had a discussion yesterday. We both think it is not a
good idea to set obolete to DST_OBSOLETE_KILL but not to remove it
from the fib6 tree.
It is because others who do a route lookup later might potentially
find this route and tries to use this route. However, dst_check() will
show this route is invalid. So the user will redo the route lookup.
But as this route is not yet deleted from the tree, it will find this
route again. This seems like a bad situation.
And again, setting obsolete to DST_OBSOLETE_KILL does not prevent some
idle socket holding on to this dst for a long time...

With the above said, I am now convinced what you have in your patch is
the correct thing to do. Just remove the cached route without checking
the refcnt when it is aged.

> Cheers,
>
> Paolo
Martin KaFai Lau Oct. 18, 2017, 7:05 p.m. UTC | #7
On Wed, Oct 18, 2017 at 05:56:39PM +0000, Wei Wang wrote:
> On Wed, Oct 18, 2017 at 6:03 AM, Paolo Abeni <pabeni@redhat.com> wrote:
> > On Tue, 2017-10-17 at 13:48 -0700, Wei Wang wrote:
> >> On Tue, Oct 17, 2017 at 1:02 PM, Paolo Abeni <pabeni@redhat.com> wrote:
> >> > Meanwhile others sockets may grab more references to (and use) the same
> >> > aged-out dst.
> >> >
> >>
> >> I don't think other sockets could grab more reference to this dst
> >> because this dst should already be removed from the fib6 tree.
> >
> > With the current net-next code, the dst is not removed from the fib
> > tree while someone else is holding it and dst_check() does not fail
> > after that the cached dst is aged out. If a socket cache grab a
> > reference to the CACHE dst, it will not release it untill the next
> > sernum change, regardless of the dst aging.
> >
> >> > The commit 1e2ea8ad37be ("ipv6: set dst.obsolete when a cached route
> >> > has expired") was the solution to the above issue prior to the recent
> >> > refactor.
> >> >
> >>
> >> I don't really understand how this commit is solving the above issue.
> >> This commit still only ages out cached route if &rt->dst.__refcnt ==
> >> 1. So if socket is holding refcnt to this dst and dst_check() is not
> >> getting called,  this cached route still won't get deleted.
> >
> > Setting obsolete to DST_OBSOLETE_KILL forced whoever was holding the
> > dst reference to drop it on the next dst_check(), so that refcnt could
> > go down.
> >
> 
> Yes. Understood.
> Martin and I had a discussion yesterday. We both think it is not a
> good idea to set obolete to DST_OBSOLETE_KILL but not to remove it
> from the fib6 tree.
> It is because others who do a route lookup later might potentially
> find this route and tries to use this route. However, dst_check() will
> show this route is invalid. So the user will redo the route lookup.
> But as this route is not yet deleted from the tree, it will find this
> route again. This seems like a bad situation.
> And again, setting obsolete to DST_OBSOLETE_KILL does not prevent some
> idle socket holding on to this dst for a long time...
> 
> With the above said, I am now convinced what you have in your patch is
> the correct thing to do. Just remove the cached route without checking
> the refcnt when it is aged.
Another thing (not limited to this case),

Considering we have a limited size in the exception table now and
the oldest one will get removed when the table is full,
do we still need to purge this periodically in gc?

I would like to see the IPv6's gc eventually goes away.

If we need to expire the pmtu value of a RTF_CACHE rt,
can dst.expires be checked during the lookup (like what
ipv4 is doing) and then remove it from the exception table?
Paolo Abeni Oct. 18, 2017, 8:30 p.m. UTC | #8
On Wed, 2017-10-18 at 12:05 -0700, Martin KaFai Lau wrote:
> Another thing (not limited to this case),
> 
> Considering we have a limited size in the exception table now and
> the oldest one will get removed when the table is full,
> do we still need to purge this periodically in gc?

At least in some scenarios we have only a few entries in the exception
table.

> I would like to see the IPv6's gc eventually goes away.
> 
> If we need to expire the pmtu value of a RTF_CACHE rt,
> can dst.expires be checked during the lookup (like what
> ipv4 is doing) and then remove it from the exception table?

Currently the gc also performs validation vs the related neigh for
GATEWAY dst. That looks potentially quite expensive, as we currently
have a per neighbour atomic refcount (e.g. if the same dst is cached on
different CPUs)

Cheers,

Paolo
diff mbox series

Patch

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 8b25a31b6b03..fce740049e3e 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1147,6 +1147,12 @@  static void rt6_remove_exception(struct rt6_exception_bucket *bucket,
 	if (!bucket || !rt6_ex)
 		return;
 
+	/* sockets, flow cache, etc. can hold a refence to this dst, be sure
+	 * they will drop it.
+	 */
+	if (rt6_ex->rt6i)
+		rt6_ex->rt6i->dst.obsolete = DST_OBSOLETE_FORCE_CHK;
+
 	net = dev_net(rt6_ex->rt6i->dst.dev);
 	rt6_ex->rt6i->rt6i_node = NULL;
 	hlist_del_rcu(&rt6_ex->hlist);
@@ -1575,8 +1581,11 @@  static void rt6_age_examine_exception(struct rt6_exception_bucket *bucket,
 {
 	struct rt6_info *rt = rt6_ex->rt6i;
 
-	if (atomic_read(&rt->dst.__refcnt) == 1 &&
-	    time_after_eq(now, rt->dst.lastuse + gc_args->timeout)) {
+	/* we are pruning and obsoleting the exception route even if others
+	 * have still reference to it, so that on next dst_check() such
+	 * reference can be dropped
+	 */
+	if (time_after_eq(now, rt->dst.lastuse + gc_args->timeout)) {
 		RT6_TRACE("aging clone %p\n", rt);
 		rt6_remove_exception(bucket, rt6_ex);
 		return;