diff mbox

linux-3.0.x regression with ipv4 routes having mtu

Message ID 20111216122147.GJ6348@secunet.com
State Rejected, archived
Delegated to: David Miller
Headers show

Commit Message

Steffen Klassert Dec. 16, 2011, 12:21 p.m. UTC
On Thu, Dec 15, 2011 at 02:49:57PM +0100, Steffen Klassert wrote:
> On Wed, Dec 14, 2011 at 12:50:10PM -0500, David Miller wrote:
> > From: Timo Teräs <timo.teras@iki.fi>
> > Date: Wed, 14 Dec 2011 17:54:16 +0200
> > 
> > > So something is does not get updated here. This used to work though.
> > > The current production boxes where I know this work is 2.6.38.8.
> > 
> > We store the PMTU externally in inetpeer entries, Eric Dumazet
> > fixed recently but that fix hasn't been submitted to -stable
> > yet.
> > 
> 
> I think we still have at least two problems with pmtu handling.
> One is that "ip route flush cache" flushes the routing cache
> but not the cached metrics on the inetpeer.

"ip route flush cache" does not even flush the routing cache,
it just markes the routing cache as invalid by changing the
rt_genid which make the old routes invisible. And since we don't
have rt_check_expire() anymore, we have to wait until we have
collected gc_thresh (1024) useless routes before rt_garbage_collect()
starts to remove some of them (btw. is this intentional).

I think we need a trigger in rt_cache_invalidate() that expires
all cached pmtu values similar to the routing cache entries.
For a moment I thought we could just reset the __rt_peer_genid
value back to zero to mark all cached pmtu values as expired,
but that's apparently not save to do. So I came to no conclusion
how to fix this today. Any ideas?

> 
> Another problem might be that we ignore the user configured
> fib_metrics in rt_init_metrics() if we have a cached metric
> on the inetpeer.
> 

I think this could be fixed with something like the patch
below. With this it should be possible to overwrite cached
values from userspace. It has not much testing, so it's
just for review at the monent.

---------
Subject: [PATCH] route: Initialize with the fib_metrics in the non default case

We initialize the routing metrics with the cached values in
rt_init_metrics(). So if we have the metrics cached on the
inetpeer, we ignore the user configured fib_metrics. So
initialize the routing metrics with the fib_metrics if they
are different from dst_default_metrics.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/ipv4/route.c |   28 ++++++++++++++++++++--------
 1 files changed, 20 insertions(+), 8 deletions(-)

Comments

Timo Teras Dec. 16, 2011, 2:30 p.m. UTC | #1
On 12/16/2011 02:21 PM, Steffen Klassert wrote:
> On Thu, Dec 15, 2011 at 02:49:57PM +0100, Steffen Klassert wrote:
>> On Wed, Dec 14, 2011 at 12:50:10PM -0500, David Miller wrote:
>>> From: Timo Teräs <timo.teras@iki.fi>
>>> Date: Wed, 14 Dec 2011 17:54:16 +0200
>>>
>>>> So something is does not get updated here. This used to work though.
>>>> The current production boxes where I know this work is 2.6.38.8.
>>>
>>> We store the PMTU externally in inetpeer entries, Eric Dumazet
>>> fixed recently but that fix hasn't been submitted to -stable
>>> yet.
>>>
>>
>> I think we still have at least two problems with pmtu handling.
>> One is that "ip route flush cache" flushes the routing cache
>> but not the cached metrics on the inetpeer.
> 
> "ip route flush cache" does not even flush the routing cache,
> it just markes the routing cache as invalid by changing the
> rt_genid which make the old routes invisible. And since we don't
> have rt_check_expire() anymore, we have to wait until we have
> collected gc_thresh (1024) useless routes before rt_garbage_collect()
> starts to remove some of them (btw. is this intentional).
> 
> I think we need a trigger in rt_cache_invalidate() that expires
> all cached pmtu values similar to the routing cache entries.
> For a moment I thought we could just reset the __rt_peer_genid
> value back to zero to mark all cached pmtu values as expired,
> but that's apparently not save to do. So I came to no conclusion
> how to fix this today. Any ideas?

Oh, right. This problem is secondary. As long as the mtu value is
reflected properly, I'll be happy :)

>> Another problem might be that we ignore the user configured
>> fib_metrics in rt_init_metrics() if we have a cached metric
>> on the inetpeer.
>>
> 
> I think this could be fixed with something like the patch
> below. With this it should be possible to overwrite cached
> values from userspace. It has not much testing, so it's
> just for review at the monent.
> 
> ---------
> Subject: [PATCH] route: Initialize with the fib_metrics in the non default case
>[snip]

Thanks! After a very quick test spin, it would appear to fix the bug.
Will continue testing.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Steffen Klassert Dec. 19, 2011, 1:52 p.m. UTC | #2
On Fri, Dec 16, 2011 at 04:30:26PM +0200, Timo Teräs wrote:
> On 12/16/2011 02:21 PM, Steffen Klassert wrote:
> > 
> > "ip route flush cache" does not even flush the routing cache,
> > it just markes the routing cache as invalid by changing the
> > rt_genid which make the old routes invisible. And since we don't
> > have rt_check_expire() anymore, we have to wait until we have
> > collected gc_thresh (1024) useless routes before rt_garbage_collect()
> > starts to remove some of them (btw. is this intentional).
> > 
> > I think we need a trigger in rt_cache_invalidate() that expires
> > all cached pmtu values similar to the routing cache entries.
> > For a moment I thought we could just reset the __rt_peer_genid
> > value back to zero to mark all cached pmtu values as expired,
> > but that's apparently not save to do. So I came to no conclusion
> > how to fix this today. Any ideas?
> 
> Oh, right. This problem is secondary. As long as the mtu value is
> reflected properly, I'll be happy :)
> 

Well, "ip route flush cache" flushed the cached pmtu informations
in 2.6.38 and before, now it does not do it anymore. I guess
some users rely on the old behaviour, so it would be nice if we
could restore it. I think we can fix this by adding a pmtu_genid,
exactly in the same way as we have now the redirect_genid.
I'll give this a try.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Dec. 19, 2011, 8:09 p.m. UTC | #3
From: Steffen Klassert <steffen.klassert@secunet.com>
Date: Mon, 19 Dec 2011 14:52:59 +0100

> On Fri, Dec 16, 2011 at 04:30:26PM +0200, Timo Teräs wrote:
>> On 12/16/2011 02:21 PM, Steffen Klassert wrote:
>> > 
>> > "ip route flush cache" does not even flush the routing cache,
>> > it just markes the routing cache as invalid by changing the
>> > rt_genid which make the old routes invisible. And since we don't
>> > have rt_check_expire() anymore, we have to wait until we have
>> > collected gc_thresh (1024) useless routes before rt_garbage_collect()
>> > starts to remove some of them (btw. is this intentional).
>> > 
>> > I think we need a trigger in rt_cache_invalidate() that expires
>> > all cached pmtu values similar to the routing cache entries.
>> > For a moment I thought we could just reset the __rt_peer_genid
>> > value back to zero to mark all cached pmtu values as expired,
>> > but that's apparently not save to do. So I came to no conclusion
>> > how to fix this today. Any ideas?
>> 
>> Oh, right. This problem is secondary. As long as the mtu value is
>> reflected properly, I'll be happy :)
>> 
> 
> Well, "ip route flush cache" flushed the cached pmtu informations
> in 2.6.38 and before, now it does not do it anymore. I guess
> some users rely on the old behaviour, so it would be nice if we
> could restore it. I think we can fix this by adding a pmtu_genid,
> exactly in the same way as we have now the redirect_genid.
> I'll give this a try.

Actually, adding yet another ID is pointless.

Just get rid of redirect_genid, and use net->ipv4.rt_genid for all of
the necessary purposes.  Let there be one ID in the inetpeer and use
the net->ipv4.rt_genid to drive it.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Dec. 19, 2011, 9:10 p.m. UTC | #4
From: Steffen Klassert <steffen.klassert@secunet.com>
Date: Fri, 16 Dec 2011 13:21:47 +0100

> Subject: [PATCH] route: Initialize with the fib_metrics in the non default case
> 
> We initialize the routing metrics with the cached values in
> rt_init_metrics(). So if we have the metrics cached on the
> inetpeer, we ignore the user configured fib_metrics. So
> initialize the routing metrics with the fib_metrics if they
> are different from dst_default_metrics.
> 
> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

The current behavior is intentional.

Learned metrics should be used on all routes for which a inetpeer
peer exists and the destination matches.

There is no sane way to allow overrides.

I'm pretty sure all of Timo's bugs will be fixed when you add the
generation count for PMTU stuff.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Timo Teras Dec. 20, 2011, 6:53 a.m. UTC | #5
On 12/19/2011 11:10 PM, David Miller wrote:
> From: Steffen Klassert <steffen.klassert@secunet.com>
> Date: Fri, 16 Dec 2011 13:21:47 +0100
> 
>> Subject: [PATCH] route: Initialize with the fib_metrics in the non default case
>>
>> We initialize the routing metrics with the cached values in
>> rt_init_metrics(). So if we have the metrics cached on the
>> inetpeer, we ignore the user configured fib_metrics. So
>> initialize the routing metrics with the fib_metrics if they
>> are different from dst_default_metrics.
>>
>> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
> 
> The current behavior is intentional.
> 
> Learned metrics should be used on all routes for which a inetpeer
> peer exists and the destination matches.
> 
> There is no sane way to allow overrides.
> 
> I'm pretty sure all of Timo's bugs will be fixed when you add the
> generation count for PMTU stuff.

I tried to look at the code to see how the fib MTU is handled, but I
don't think just generation count for PMTU would solve it.

My problem is that after inetpeer is created, the fib mtu is never
looked again at. The code that updates it, is in rt_init_metrics():
                if (inet_metrics_new(peer))
                        memcpy(peer->metrics, fi->fib_metrics,
                               sizeof(u32) * RTAX_MAX);

Since the inetpeer there never gets recycled (peer lookup does not look
at generation count), the metrics are initialised from the fib exactly
once: when the inetpeer is initially created.

Now, if I have running system, there's traffic to specific inetpeer, and
later I add a system wide override route with mtu to that destination,
the updated mtu is never honoured. Because it comes from fib, and not
via the pmtu mechanism.

Or maybe I missed the place where that updated would happen?

It seems that the inetpeer.c comment that " The (inetpeer) nodes
contains long-living information about the peer which doesn't depend on
routes." does not hold true any more. Since mtu is (or at least used to
be) a route dependant value.

Perhaps we could then at least check the fib MTU and update inetpeer if
it's lower than what inetpeer used to be. This means of course that if
there's various routes to same destination (e.g. due to policy routing)
with different MTUs, only the smallest one would get used system wide.
But at least the route specific MTU would work then.

This is basically a problem for me, as I have userland code adding
dynamically per-destination mtu routes to workaround black hole ISP routers.

- Timo
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Dec. 20, 2011, 7:03 a.m. UTC | #6
From: Timo Teräs <timo.teras@iki.fi>
Date: Tue, 20 Dec 2011 08:53:09 +0200

> Or maybe I missed the place where that updated would happen?

It should happen on every routing cache lookup hit just like we
validate the peer for redirect information.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Steffen Klassert Dec. 20, 2011, 7:18 a.m. UTC | #7
On Tue, Dec 20, 2011 at 02:03:55AM -0500, David Miller wrote:
> From: Timo Teräs <timo.teras@iki.fi>
> Date: Tue, 20 Dec 2011 08:53:09 +0200
> 
> > Or maybe I missed the place where that updated would happen?
> 
> It should happen on every routing cache lookup hit just like we
> validate the peer for redirect information.

The problem is that we need to do a route cache lookup to
validate the peer informations. This does not happen if
somebody adds a new route. I tried already to add a pmtu specific
generation id and it appears to not solve the problem. We would
still need to overwrite the cached value if we add a route with mtu.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Steffen Klassert Dec. 20, 2011, 8:03 a.m. UTC | #8
On Mon, Dec 19, 2011 at 03:09:03PM -0500, David Miller wrote:
> 
> Actually, adding yet another ID is pointless.
> 
> Just get rid of redirect_genid, and use net->ipv4.rt_genid for all of
> the necessary purposes.  

I already thought about such a solution, but is this really save?

What if somebody adds/deletes/flushes routes? rt_cache_invalidate()
is invoked and we increment net->ipv4.rt_genid. Then an icmp redirect
may appear and we call ip_rt_redirect() which sets the id on the
inetpeer to the new value of net->ipv4.rt_genid. After that,
ipv4_validate_peer() will find the generation id in proper state
and does not invalidate the learned pmtu value.

I think we could either have two separate generation ids for
pmtu and redirect handling, or we could handle both cases together
by accumulating check_peer_redir() and check_peer_pmtu() into a
check_peer() function. But the latter seems to be a bit too extensive
for a simple bug fix.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Dec. 20, 2011, 6:35 p.m. UTC | #9
From: Steffen Klassert <steffen.klassert@secunet.com>
Date: Tue, 20 Dec 2011 08:18:43 +0100

> On Tue, Dec 20, 2011 at 02:03:55AM -0500, David Miller wrote:
>> From: Timo Teräs <timo.teras@iki.fi>
>> Date: Tue, 20 Dec 2011 08:53:09 +0200
>> 
>> > Or maybe I missed the place where that updated would happen?
>> 
>> It should happen on every routing cache lookup hit just like we
>> validate the peer for redirect information.
> 
> The problem is that we need to do a route cache lookup to
> validate the peer informations. This does not happen if
> somebody adds a new route. I tried already to add a pmtu specific
> generation id and it appears to not solve the problem. We would
> still need to overwrite the cached value if we add a route with mtu.

Why?

Every use of a routing cache entry does a dst_check() or a routing
cache lookup.

You can check the generation ID in both locations, and if the
comparison fails then you force the routing cache entry to be killed
and the caller performs a lookup.  At that time the refreshing of the
new FIB entry will be realized.

The critical bit is invalidating the routing cache entry, I can only
guess that you're not doing that.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Steffen Klassert Dec. 21, 2011, 8:56 a.m. UTC | #10
On Tue, Dec 20, 2011 at 01:35:42PM -0500, David Miller wrote:
> From: Steffen Klassert <steffen.klassert@secunet.com>
> Date: Tue, 20 Dec 2011 08:18:43 +0100
> 
> > On Tue, Dec 20, 2011 at 02:03:55AM -0500, David Miller wrote:
> >> From: Timo Teräs <timo.teras@iki.fi>
> >> Date: Tue, 20 Dec 2011 08:53:09 +0200
> >> 
> >> > Or maybe I missed the place where that updated would happen?
> >> 
> >> It should happen on every routing cache lookup hit just like we
> >> validate the peer for redirect information.
> > 
> > The problem is that we need to do a route cache lookup to
> > validate the peer informations. This does not happen if
> > somebody adds a new route. I tried already to add a pmtu specific
> > generation id and it appears to not solve the problem. We would
> > still need to overwrite the cached value if we add a route with mtu.
> 
> Why?
> 
> Every use of a routing cache entry does a dst_check() or a routing
> cache lookup.
> 
> You can check the generation ID in both locations, and if the
> comparison fails then you force the routing cache entry to be killed
> and the caller performs a lookup.  At that time the refreshing of the
> new FIB entry will be realized.

My decscription was a bit missleading, sorry. This seems to be not
even related to pmtu handling. It's just that we don't use the
user configured metrics on the fib entry if we find an inetpeer
with metrics that are not in INETPEER_METRICS_NEW state when we
initialize the routing metrics in rt_init_metrics(). I tried to add
a route with a hoplimit, this has also no effect if the metrics on
the inetpeer are not new.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Dec. 21, 2011, 8:56 p.m. UTC | #11
From: Steffen Klassert <steffen.klassert@secunet.com>
Date: Wed, 21 Dec 2011 09:56:16 +0100

> On Tue, Dec 20, 2011 at 01:35:42PM -0500, David Miller wrote:
>> From: Steffen Klassert <steffen.klassert@secunet.com>
>> Date: Tue, 20 Dec 2011 08:18:43 +0100
>> 
>> > On Tue, Dec 20, 2011 at 02:03:55AM -0500, David Miller wrote:
>> >> From: Timo Teräs <timo.teras@iki.fi>
>> >> Date: Tue, 20 Dec 2011 08:53:09 +0200
>> >> 
>> >> > Or maybe I missed the place where that updated would happen?
>> >> 
>> >> It should happen on every routing cache lookup hit just like we
>> >> validate the peer for redirect information.
>> > 
>> > The problem is that we need to do a route cache lookup to
>> > validate the peer informations. This does not happen if
>> > somebody adds a new route. I tried already to add a pmtu specific
>> > generation id and it appears to not solve the problem. We would
>> > still need to overwrite the cached value if we add a route with mtu.
>> 
>> Why?
>> 
>> Every use of a routing cache entry does a dst_check() or a routing
>> cache lookup.
>> 
>> You can check the generation ID in both locations, and if the
>> comparison fails then you force the routing cache entry to be killed
>> and the caller performs a lookup.  At that time the refreshing of the
>> new FIB entry will be realized.
> 
> My decscription was a bit missleading, sorry. This seems to be not
> even related to pmtu handling. It's just that we don't use the
> user configured metrics on the fib entry if we find an inetpeer
> with metrics that are not in INETPEER_METRICS_NEW state when we
> initialize the routing metrics in rt_init_metrics(). I tried to add
> a route with a hoplimit, this has also no effect if the metrics on
> the inetpeer are not new.

Ok, so what you're saying is that we need a way to invalidate inetpeer
entries, or at least invalidate their cached metrics and set
INETPEER_METRICS_NEW once more.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Steffen Klassert Dec. 22, 2011, 10:25 a.m. UTC | #12
On Wed, Dec 21, 2011 at 03:56:15PM -0500, David Miller wrote:
> From: Steffen Klassert <steffen.klassert@secunet.com>
> Date: Wed, 21 Dec 2011 09:56:16 +0100
> 
> > My decscription was a bit missleading, sorry. This seems to be not
> > even related to pmtu handling. It's just that we don't use the
> > user configured metrics on the fib entry if we find an inetpeer
> > with metrics that are not in INETPEER_METRICS_NEW state when we
> > initialize the routing metrics in rt_init_metrics(). I tried to add
> > a route with a hoplimit, this has also no effect if the metrics on
> > the inetpeer are not new.
> 
> Ok, so what you're saying is that we need a way to invalidate inetpeer
> entries, or at least invalidate their cached metrics and set
> INETPEER_METRICS_NEW once more.

Yes, we probaply need to invalidate whenever the fib changes.
We would have to invalidate at least the cached metrics and
all the pmtu related stuff we have on the inetpeer now.
Not sure if it is better to just invalidate some pieces
or the whole inetpeer entries.

Actually, I'm getting some doubts on the concept of caching
the metrics on the inetpeer. How should we handle the case when
we have multiple routes with different metrics to the same
destination? As it is, we can cache just one of them on the
inetpeer.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Dec. 22, 2011, 6:51 p.m. UTC | #13
From: Steffen Klassert <steffen.klassert@secunet.com>
Date: Thu, 22 Dec 2011 11:25:26 +0100

> Actually, I'm getting some doubts on the concept of caching
> the metrics on the inetpeer. How should we handle the case when
> we have multiple routes with different metrics to the same
> destination? As it is, we can cache just one of them on the
> inetpeer.

In such cases the metrics won't be perfect.

Actually, we have two kinds of metrics and it might be best in the
long term to seperate them out explicitly.

We have "loose" metrics (such as all the TCP measurements) which,
if incorrect, will lead to suboptimal performance but will not
dramatically effect connectivity.

Then we have "strict" metrics like PMTU information and redirects,
which could have more fatal consequences if wrong.

The core issue is I want to keep these things tied only to destination
address, it is the only sane way to get the routing cache removed.

Sockets and the packet forwarding path in the future will be using
routes that cover whole subnets, or even the default route.  There
will be no "rt->rt_dst" and there will also be not attached inetpeer.

The inetpeer, or something like it, will be managed seperately.
Therefore metrics will need to be calculated using both objects (rt
and inetpeer).

This is one of the longer term design issues we face in the routing
cache removal and for which I don't have exact plans for yet.  I'm
still concentrating on "ref-less" neighbour entries, so once I finish
that work I can think more seriously about metrics in a world without
the routing cache.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Steffen Klassert Dec. 23, 2011, 8:47 a.m. UTC | #14
On Thu, Dec 22, 2011 at 01:51:27PM -0500, David Miller wrote:
> From: Steffen Klassert <steffen.klassert@secunet.com>
> Date: Thu, 22 Dec 2011 11:25:26 +0100
> 
> > Actually, I'm getting some doubts on the concept of caching
> > the metrics on the inetpeer. How should we handle the case when
> > we have multiple routes with different metrics to the same
> > destination? As it is, we can cache just one of them on the
> > inetpeer.
> 
> In such cases the metrics won't be perfect.
> 
> Actually, we have two kinds of metrics and it might be best in the
> long term to seperate them out explicitly.
> 
> We have "loose" metrics (such as all the TCP measurements) which,
> if incorrect, will lead to suboptimal performance but will not
> dramatically effect connectivity.
> 
> Then we have "strict" metrics like PMTU information and redirects,
> which could have more fatal consequences if wrong.
> 
> The core issue is I want to keep these things tied only to destination
> address, it is the only sane way to get the routing cache removed.
> 
> Sockets and the packet forwarding path in the future will be using
> routes that cover whole subnets, or even the default route.  There
> will be no "rt->rt_dst" and there will also be not attached inetpeer.
> 
> The inetpeer, or something like it, will be managed seperately.
> Therefore metrics will need to be calculated using both objects (rt
> and inetpeer).
> 
> This is one of the longer term design issues we face in the routing
> cache removal and for which I don't have exact plans for yet.  I'm
> still concentrating on "ref-less" neighbour entries, so once I finish
> that work I can think more seriously about metrics in a world without
> the routing cache.

Ok, so let's try to keep the things working as good as possible at this
intermediate stage and see how it evolves. I know that you want to remove
the routing cache, but I did not care too much in the beginning. So I
missed the reason why you want to do that. Could you please give a short
explaination or point me to where I can find this informations?
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Steffen Klassert Dec. 23, 2011, 8:58 a.m. UTC | #15
On Thu, Dec 22, 2011 at 11:25:26AM +0100, Steffen Klassert wrote:
> > 
> > Ok, so what you're saying is that we need a way to invalidate inetpeer
> > entries, or at least invalidate their cached metrics and set
> > INETPEER_METRICS_NEW once more.
> 
> Yes, we probaply need to invalidate whenever the fib changes.
> We would have to invalidate at least the cached metrics and
> all the pmtu related stuff we have on the inetpeer now.
> Not sure if it is better to just invalidate some pieces
> or the whole inetpeer entries.
> 

I think I would favour to invalidate the inetpeer entries, we could
do this similar to the invalidation of the routing cache. Do you have
any preferences on this regarding the routing cache removal?

I'll continue to work at this in the new year, as I'll leave for
holidays in some hours.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Dec. 23, 2011, 9 a.m. UTC | #16
From: Steffen Klassert <steffen.klassert@secunet.com>
Date: Fri, 23 Dec 2011 09:47:37 +0100

> Could you please give a short explaination or point me to where I
> can find this informations?

The routing cache can be easily DoS'd because it fundamentally
has performance which is a product of the characteristics of
the packets we receive, which is controllable by remote entities.

If we get rid of the routing cache, then our lookup performance
will be a product of the contents of the routing table which we
and the administrator have full control over.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Steffen Klassert Feb. 2, 2012, 10:01 a.m. UTC | #17
On Wed, Dec 21, 2011 at 03:56:15PM -0500, David Miller wrote:
> 
> Ok, so what you're saying is that we need a way to invalidate inetpeer
> entries, or at least invalidate their cached metrics and set
> INETPEER_METRICS_NEW once more.

In between I tried the approach to invalidate the cached metrics.
Unfortunately, we we can not just set INETPEER_METRICS_NEW once
again and overwrite the existing metrics once the inetpeer
is public. This would require some kind of locking, so I tried
a rcu based approach.

I'll send the patches for review in a new thread.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index f30112f..26a6249 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1841,6 +1841,22 @@  static unsigned int ipv4_mtu(const struct dst_entry *dst)
 	return mtu;
 }
 
+static void __rt_init_metrics(struct rtable *rt, struct fib_info *fi,
+			      struct inet_peer *peer)
+{
+	if (peer && fi->fib_metrics == (u32 *) dst_default_metrics) {
+		dst_init_metrics(&rt->dst, peer->metrics, false);
+		return;
+	}
+
+	if (fi->fib_metrics != (u32 *) dst_default_metrics) {
+		rt->fi = fi;
+		atomic_inc(&fi->fib_clntref);
+	}
+
+	dst_init_metrics(&rt->dst, fi->fib_metrics, true);
+}
+
 static void rt_init_metrics(struct rtable *rt, const struct flowi4 *fl4,
 			    struct fib_info *fi)
 {
@@ -1859,7 +1875,8 @@  static void rt_init_metrics(struct rtable *rt, const struct flowi4 *fl4,
 		if (inet_metrics_new(peer))
 			memcpy(peer->metrics, fi->fib_metrics,
 			       sizeof(u32) * RTAX_MAX);
-		dst_init_metrics(&rt->dst, peer->metrics, false);
+
+		__rt_init_metrics(rt, fi, peer);
 
 		check_peer_pmtu(&rt->dst, peer);
 		if (peer->redirect_genid != redirect_genid)
@@ -1869,13 +1886,8 @@  static void rt_init_metrics(struct rtable *rt, const struct flowi4 *fl4,
 			rt->rt_gateway = peer->redirect_learned.a4;
 			rt->rt_flags |= RTCF_REDIRECTED;
 		}
-	} else {
-		if (fi->fib_metrics != (u32 *) dst_default_metrics) {
-			rt->fi = fi;
-			atomic_inc(&fi->fib_clntref);
-		}
-		dst_init_metrics(&rt->dst, fi->fib_metrics, true);
-	}
+	} else
+		__rt_init_metrics(rt, fi, NULL);
 }
 
 static void rt_set_nexthop(struct rtable *rt, const struct flowi4 *fl4,