Message ID:   20111123121450.GE6348@secunet.com
State:        Accepted, archived
Delegated to: David Miller
Steffen Klassert <steffen.klassert@secunet.com> wrote:
> The pmtu information on the inetpeer is visible for output and
> input routes. On packet forwarding, we might propagate a learned
> pmtu to the sender. As we update the pmtu information of the
> inetpeer on demand, the original sender of the forwarded packets
> might never notice when the pmtu to that inetpeer increases.
> So use the mtu of the outgoing device on packet forwarding instead
> of the pmtu to the final destination.

I disagree. MTU increases are detected by the source retrying, so they
most certainly should notice if our cached value expires.

Cheers,
From: Herbert Xu <herbert@gondor.hengli.com.au>
Date: Wed, 23 Nov 2011 23:52:45 +0800

> Steffen Klassert <steffen.klassert@secunet.com> wrote:
>> The pmtu information on the inetpeer is visible for output and
>> input routes. On packet forwarding, we might propagate a learned
>> pmtu to the sender. As we update the pmtu information of the
>> inetpeer on demand, the original sender of the forwarded packets
>> might never notice when the pmtu to that inetpeer increases.
>> So use the mtu of the outgoing device on packet forwarding instead
>> of the pmtu to the final destination.
>
> I disagree. MTU increases are detected by the source retrying,
> so they most certainly should notice if our cached value expires.

Herbert, consider the issue one level higher :-)

The issue is that "we", the forwarding entity, end up with a cached
PMTU on the input route and report this cached PMTU to the sender.

This takes a while to time out. So even if the sender "probes", he
will be kept from seeing new MTU space becoming available due to our
cached value.

What Steffen is doing is simply re-instating the behavior we had
before the inetpeer conversion occurred. We never used learned PMTU
information on input routes; only output routes stored it.

But now that input and output routes share learned information from
the same set of inetpeer entries, we have to accommodate cases like
this one.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Nov 23, 2011 at 04:49:41PM -0500, David Miller wrote:
>
> The issue is that "we", the forwarding entity, end up with a cached
> PMTU on the input route and report this cached PMTU to the sender.
>
> This takes a while to time out.

I understand. However, this could be seen as an optimisation: if we as
a router somehow discovered the smaller MTU, this would save everybody
behind us from having to discover it by probing until our discovery
expires.

Is there a particular scenario where this breaks? I could only think
of policy routing cases, but that's not unique to forwarding as it
also affects us as a host.

Thanks,
From: Herbert Xu <herbert@gondor.hengli.com.au>
Date: Thu, 24 Nov 2011 08:10:35 +0800

> Is there a particular scenario where this breaks? I could only think
> of policy routing cases, but that's not unique to forwarding as it
> also affects us as a host.

Steffen has a specific situation. But regardless, I think the current
behavior is terrible.

It means there is a propagation delay for a PMTU increase which is on
the order of the number of hops between the sender and the link whose
PMTU increased. So if the PMTU timeout is X, the propagation time is
X * N_HOPS.
On Wed, Nov 23, 2011 at 07:17:58PM -0500, David Miller wrote:
>
> It means there is a propagation delay for a PMTU increase which is on
> the order of the number of hops between the sender and the link whose
> PMTU increased. So if the PMTU timeout is X, the propagation time is
> X * N_HOPS.

That assumes that each router in the path would constantly probe to
refresh its cached value; otherwise they should all expire after X.

However, as this was the original behaviour, I think going back to it
is fine. I am curious to know exactly what scenario Steffen ran into
that prompted this change, though.

Cheers,
On Thu, Nov 24, 2011 at 09:01:47AM +0800, Herbert Xu wrote:
> On Wed, Nov 23, 2011 at 07:17:58PM -0500, David Miller wrote:
> >
> > It means there is a propagation delay for a PMTU increase which is
> > on the order of the number of hops between the sender and the link
> > whose PMTU increased. So if the PMTU timeout is X, the propagation
> > time is X * N_HOPS.
>
> That assumes that each router in the path would constantly probe
> to refresh its cached value; otherwise they should all expire
> after X.

Well, it simply got unpredictable how long pmtu propagation takes.
This now depends on the network activity of the routers and on the
individual configuration of the expiration time on each router. With
the original behaviour, we could expect to notice a pmtu increase if
we probe after our timeout X. Now we don't know when this is going to
happen.

Btw., we lack a trigger to check whether the learned pmtu information
has expired in our current tree, so the learned pmtu information never
expires. So we are doing well not to propagate our learned pmtu to
other hosts :)
On Thu, Nov 24, 2011 at 09:49:52AM +0100, Steffen Klassert wrote:
>
> Well, it simply got unpredictable how long pmtu propagation takes.
> This now depends on the network activity of the routers and on the
> individual configuration of the expiration time on each router. With
> the original behaviour, we could expect to notice a pmtu increase
> if we probe after our timeout X. Now we don't know when this is going
> to happen.

With IPsec tunnel mode this already happens to an extent anyway. Any
PMTU event outside the tunnel is invisible to hosts within it. Only
the router can see those events.

However, as I said, I'm not too bothered with this, so feel free to go
with your patch.

Cheers,
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 11d1b20..fb47c8f 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1816,15 +1816,15 @@ static unsigned int ipv4_default_advmss(const struct dst_entry *dst)
 
 static unsigned int ipv4_mtu(const struct dst_entry *dst)
 {
+	const struct rtable *rt = (const struct rtable *) dst;
 	unsigned int mtu = dst_metric_raw(dst, RTAX_MTU);
 
-	if (mtu)
+	if (mtu && rt_is_output_route(rt))
 		return mtu;
 
 	mtu = dst->dev->mtu;
 
 	if (unlikely(dst_metric_locked(dst, RTAX_MTU))) {
-		const struct rtable *rt = (const struct rtable *) dst;
 
 		if (rt->rt_gateway != rt->rt_dst && mtu > 576)
 			mtu = 576;
The pmtu information on the inetpeer is visible for output and input
routes. On packet forwarding, we might propagate a learned pmtu to the
sender. As we update the pmtu information of the inetpeer on demand,
the original sender of the forwarded packets might never notice when
the pmtu to that inetpeer increases. So use the mtu of the outgoing
device on packet forwarding instead of the pmtu to the final
destination.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/ipv4/route.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)