
[5/5] ipv4: Don't use the cached pmtu information for input routes

Message ID 20111123121450.GE6348@secunet.com
State Accepted, archived
Delegated to: David Miller

Commit Message

Steffen Klassert Nov. 23, 2011, 12:14 p.m. UTC
The pmtu information on the inetpeer is visible to both output
and input routes. On packet forwarding, we might propagate a learned
pmtu to the sender. Since we update the pmtu information of the
inetpeer on demand, the original sender of the forwarded packets
might never notice when the pmtu to that inetpeer increases.
So use the mtu of the outgoing device on packet forwarding instead
of the pmtu to the final destination.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/ipv4/route.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
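
In short, the patched ipv4_mtu() behaves as follows. This is a condensed
sketch assembled from the hunk at the bottom of this page; the function's
tail past the end of the hunk is elided:

	static unsigned int ipv4_mtu(const struct dst_entry *dst)
	{
		const struct rtable *rt = (const struct rtable *) dst;
		unsigned int mtu = dst_metric_raw(dst, RTAX_MTU);

		/* Only output routes may use the cached (possibly
		 * peer-learned) PMTU; input routes now fall through
		 * to the device MTU below.
		 */
		if (mtu && rt_is_output_route(rt))
			return mtu;

		mtu = dst->dev->mtu;

		/* A locked MTU metric still caps traffic sent via a
		 * gateway at 576, as in the original function.
		 */
		if (unlikely(dst_metric_locked(dst, RTAX_MTU))) {
			if (rt->rt_gateway != rt->rt_dst && mtu > 576)
				mtu = 576;
		}

		return mtu;	/* (real function continues past the hunk) */
	}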

Comments

Herbert Xu Nov. 23, 2011, 3:52 p.m. UTC | #1
Steffen Klassert <steffen.klassert@secunet.com> wrote:
> The pmtu information on the inetpeer is visible to both output
> and input routes. On packet forwarding, we might propagate a learned
> pmtu to the sender. Since we update the pmtu information of the
> inetpeer on demand, the original sender of the forwarded packets
> might never notice when the pmtu to that inetpeer increases.
> So use the mtu of the outgoing device on packet forwarding instead
> of the pmtu to the final destination.

I disagree.  MTU increases are detected by the source retrying,
so they most certainly should notice if our cached value expires.

Cheers,
David Miller Nov. 23, 2011, 9:49 p.m. UTC | #2
From: Herbert Xu <herbert@gondor.hengli.com.au>
Date: Wed, 23 Nov 2011 23:52:45 +0800

> Steffen Klassert <steffen.klassert@secunet.com> wrote:
>> The pmtu information on the inetpeer is visible to both output
>> and input routes. On packet forwarding, we might propagate a learned
>> pmtu to the sender. Since we update the pmtu information of the
>> inetpeer on demand, the original sender of the forwarded packets
>> might never notice when the pmtu to that inetpeer increases.
>> So use the mtu of the outgoing device on packet forwarding instead
>> of the pmtu to the final destination.
> 
> I disagree.  MTU increases are detected by the source retrying,
> so they most certainly should notice if our cached value expires.

Herbert, consider the issue one level higher :-)

The issue is that "we", the forwarding entity, end up with a cached
PMTU on the input route and report this cached PMTU to the sender.

This takes a while to time out.

So even if the sender "probes", he will be kept from seeing new MTU
space becoming available due to our cached value.

What Steffen is doing is simply reinstating the behavior we had
before the inetpeer conversion occurred.  We never used learned PMTU
information on input routes; only output routes stored it.

But now that input and output routes share learned information from
the same set of inetpeer entries, we have to accommodate cases like
this one.
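
For context, the cached value escapes to the sender via the forwarding
path's ICMP report: ip_forward() answers an oversized DF packet with
FRAG_NEEDED quoting dst_mtu() of the *input* route. Roughly (a simplified
paraphrase of that era's net/ipv4/ip_forward.c, not an exact quote):

	/* rt is the input route the packet arrived on */
	if (skb->len > dst_mtu(&rt->dst) &&
	    (ip_hdr(skb)->frag_off & htons(IP_DF))) {
		/* Before this patch, dst_mtu() -> ipv4_mtu() could hand
		 * back the inetpeer's cached PMTU even for input routes,
		 * so a stale learned value was advertised here.
		 */
		icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED,
			  htonl(dst_mtu(&rt->dst)));
		goto drop;
	}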

Herbert Xu Nov. 24, 2011, 12:10 a.m. UTC | #3
On Wed, Nov 23, 2011 at 04:49:41PM -0500, David Miller wrote:
>
> The issue is that "we", the forwarding entity, end up with a cached
> PMTU on the input route and report this cached PMTU to the sender.
> 
> This takes a while to time out.

I understand.  However, this could be seen as an optimisation: if
we as a router somehow discovered the smaller MTU, that saves
everybody behind us from having to discover it by probing, at least
until our discovery expires.

Is there a particular scenario where this breaks? I could only think
of policy routing cases but that's not unique to forwarding as it
also affects us as a host.

Thanks,
David Miller Nov. 24, 2011, 12:17 a.m. UTC | #4
From: Herbert Xu <herbert@gondor.hengli.com.au>
Date: Thu, 24 Nov 2011 08:10:35 +0800

> Is there a particular scenario where this breaks? I could only think
> of policy routing cases but that's not unique to forwarding as it
> also affects us as a host.

Steffen has a specific situation.

But regardless I think the current behavior is terrible.

It means there is a propagation delay for PMTU increases which is on
the order of the number of hops between the sender and the link whose
PMTU increased. So if the PMTU timeout is X, the propagation time is
X * N_HOPS.
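
Illustrative numbers, assuming every router uses the stock
ip_rt_mtu_expires default of 10 minutes:

	X      = 10 min      (per-router PMTU cache lifetime)
	N_HOPS = 5           (routers between sender and the grown link)

	worst-case propagation = X * N_HOPS = 50 min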
Herbert Xu Nov. 24, 2011, 1:01 a.m. UTC | #5
On Wed, Nov 23, 2011 at 07:17:58PM -0500, David Miller wrote:
>
> It means there is a propagation delay for PMTU increases which is on
> the order of the number of hops between the sender and the link whose
> PMTU increased. So if the PMTU timeout is X, the propagation time is
> X * N_HOPS.

That assumes that each router in the path would constantly probe
to refresh its cached value, as otherwise they should all expire
after X.

However, as this was the original behaviour, I think going back to
it is fine.

I am curious to know exactly what scenario Steffen ran into that
prompted this change, though.

Cheers,
Steffen Klassert Nov. 24, 2011, 8:49 a.m. UTC | #6
On Thu, Nov 24, 2011 at 09:01:47AM +0800, Herbert Xu wrote:
> On Wed, Nov 23, 2011 at 07:17:58PM -0500, David Miller wrote:
> >
> > It means there is a propagation delay for PMTU increases which is on
> > the order of the number of hops between the sender and the link whose
> > PMTU increased. So if the PMTU timeout is X, the propagation time is
> > X * N_HOPS.
> 
> That assumes that each router in the path would constantly probe
> to refresh its cached value, as otherwise they should all expire
> after X.

Well, it has simply become unpredictable how long pmtu propagation takes.
It now depends on the network activity of the routers and on the
individual configuration of the expiration time on each router. With
the original behaviour, we could expect to notice a pmtu increase
if we probed after our timeout X. Now we don't know when this is going
to happen.

Btw. we lack a trigger to check whether the learned pmtu information
has expired; in our current tree, the learned pmtu information never
expires. So it is just as well that we don't propagate our learned pmtu
to other hosts :)

Herbert Xu Nov. 24, 2011, 12:06 p.m. UTC | #7
On Thu, Nov 24, 2011 at 09:49:52AM +0100, Steffen Klassert wrote:
>
> Well, it has simply become unpredictable how long pmtu propagation takes.
> It now depends on the network activity of the routers and on the
> individual configuration of the expiration time on each router. With
> the original behaviour, we could expect to notice a pmtu increase
> if we probed after our timeout X. Now we don't know when this is going
> to happen.

With IPsec tunnel mode this already happens to an extent anyway.
Any PMTU event outside the tunnel is invisible to hosts within it.
Only the router can see those events.

However, as I said, I'm not too bothered by this, so feel free
to go with your patch.

Cheers,

Patch

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 11d1b20..fb47c8f 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1816,15 +1816,15 @@  static unsigned int ipv4_default_advmss(const struct dst_entry *dst)
 
 static unsigned int ipv4_mtu(const struct dst_entry *dst)
 {
+	const struct rtable *rt = (const struct rtable *) dst;
 	unsigned int mtu = dst_metric_raw(dst, RTAX_MTU);
 
-	if (mtu)
+	if (mtu && rt_is_output_route(rt))
 		return mtu;
 
 	mtu = dst->dev->mtu;
 
 	if (unlikely(dst_metric_locked(dst, RTAX_MTU))) {
-		const struct rtable *rt = (const struct rtable *) dst;
 
 		if (rt->rt_gateway != rt->rt_dst && mtu > 576)
 			mtu = 576;
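
For reference, rt_is_output_route() distinguishes the two cases by
whether the route has an input interface recorded; in kernels of this
vintage it looks roughly like the following (quoted from memory from
include/net/route.h, so treat the field name as an assumption):

	static inline bool rt_is_output_route(const struct rtable *rt)
	{
		/* output routes carry no input interface */
		return rt->rt_route_iif == 0;
	}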