Message ID | 20111011110922.GD1830@secunet.com |
---|---|
State | Rejected, archived |
Delegated to: | David Miller |
Headers | show |
From: Steffen Klassert <steffen.klassert@secunet.com> Date: Tue, 11 Oct 2011 13:09:22 +0200 > Since commit 2c8cec5c (ipv4: Cache learned PMTU information in inetpeer) > we cache the learned pmtu informations in inetpeer and propagate these > informations with the dst_ops->check() functions. However, these functions > might not be called for udp and raw packets. As a consequence, we don't > use the learned pmtu informations in these cases. With this patch we > call dst_check() from ip_setup_cork() to propagate the pmtu informations. > > Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> This dst_check() call will only do something if dst->obsolete is non-zero. If dst->obsolete can be set in these circumstances, that's a bug. The caller is responsible for providing either a freshly looked up route or a cached route which has had dst_check() or sk_dst_check() invoked upon it. I am pretty sure these rules are followed by the current code. Again, there are only two scenerios: 1) 'rt' is just looked up by caller (f.e. udp_sendmsg() in rt == NULL case), here dst->obsolete is very unlikely to be non-zero. 2) Connected case, and we use cached route from the socket, but here we'll use sk_dst_check() to validate the route. sk_dst_check() makes the necessary dst->ops->check() call if dst->obsolete is non-zero, and in fact that is it's one and only job. So I cannot see a case where your new check can be necessary. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Oct 12, 2011 at 05:02:18PM -0400, David Miller wrote: > > This dst_check() call will only do something if dst->obsolete is non-zero. > > If dst->obsolete can be set in these circumstances, that's a bug. The > caller is responsible for providing either a freshly looked up route > or a cached route which has had dst_check() or sk_dst_check() invoked > upon it. dst->obsolete was -1 in all cases I observed, regardless if connected or not. > > I am pretty sure these rules are followed by the current code. > > Again, there are only two scenerios: > > 1) 'rt' is just looked up by caller (f.e. udp_sendmsg() in rt == NULL case), > here dst->obsolete is very unlikely to be non-zero. > > 2) Connected case, and we use cached route from the socket, but here > we'll use sk_dst_check() to validate the route. sk_dst_check() > makes the necessary dst->ops->check() call if dst->obsolete is > non-zero, and in fact that is it's one and only job. > At least it seems that raw_sendmsg() and ping_sendmsg() don't use a cached route, they do the route lookup in any case. I don't see where we check if we learned a new pmtu in this cases. I added some debugging output and saw that peer->pmtu_learned has the correct pmtu value, but it never ends up in the metric of the dst_entry. With a ping -s 1400 to a destination where the pmtu is 1000 I get for each packet 'Frag needed and DF set (mtu = 1000)'. That's how I noticed that something changed. With the dst_check() in ip_setup_cork() I get 'Frag needed and DF set (mtu = 1000)' for the first packet and all further packets reach the destination, as it was before commmit 2c8cec5c. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Steffen Klassert <steffen.klassert@secunet.com> Date: Thu, 13 Oct 2011 12:09:50 +0200 > At least it seems that raw_sendmsg() and ping_sendmsg() don't use > a cached route, they do the route lookup in any case. I don't see > where we check if we learned a new pmtu in this cases. A freshly looked up route should not have ->obsolete set. That's why we don't do dst_check() in that part of the ip_output.c helper code you're modifying. Please find out exactly why dst->obsolete is non-zero on a freshly looked up route. It's unexpected. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Oct 13, 2011 at 01:58:08PM -0400, David Miller wrote: > From: Steffen Klassert <steffen.klassert@secunet.com> > Date: Thu, 13 Oct 2011 12:09:50 +0200 > > > At least it seems that raw_sendmsg() and ping_sendmsg() don't use > > a cached route, they do the route lookup in any case. I don't see > > where we check if we learned a new pmtu in this cases. > > A freshly looked up route should not have ->obsolete set. > > That's why we don't do dst_check() in that part of the ip_output.c > helper code you're modifying. > > Please find out exactly why dst->obsolete is non-zero on a freshly > looked up route. It's unexpected. Hm, on a slow path route lookup e.g. __mkroute_output() calls rt_dst_alloc() which initializes dst->obsolete to -1. It seems that ___dst_free() is the only function that ever changes the initial obsolete value. After calling ___dst_free() dst->obsolete is 2. Btw. on a slow path route lookup, __mkroute_output() and friends initialize the pmtu informations via rt_set_nexthop(). How do we check if these informations are still valid if we get the route via the routing hash cache? Do we need to check in this case? The raw protocol uses ip4_datagram_connect() as it's connect function. ip4_datagram_connect() uses sk_dst_set() to cache the dst_entry on the socket, why we don't use this cached dst_entry on raw_sendmsg() in the connected case? -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Oct 14, 2011 at 07:54:06AM +0200, Steffen Klassert wrote: > On Thu, Oct 13, 2011 at 01:58:08PM -0400, David Miller wrote: > > > > Please find out exactly why dst->obsolete is non-zero on a freshly > > looked up route. It's unexpected. > > Hm, on a slow path route lookup e.g. __mkroute_output() calls > rt_dst_alloc() which initializes dst->obsolete to -1. Just a follow up: git commit d11a4dc18 (ipv4: check rt_genid in dst_check) changed the initialization value of dst->obsolete from 0 to -1. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
2011.10.17 20:18, Steffen Klassert wrote: > On Fri, Oct 14, 2011 at 07:54:06AM +0200, Steffen Klassert wrote: >> On Thu, Oct 13, 2011 at 01:58:08PM -0400, David Miller wrote: >>> >>> Please find out exactly why dst->obsolete is non-zero on a freshly >>> looked up route. It's unexpected. >> >> Hm, on a slow path route lookup e.g. __mkroute_output() calls >> rt_dst_alloc() which initializes dst->obsolete to -1. > > Just a follow up: > git commit d11a4dc18 (ipv4: check rt_genid in dst_check) > changed the initialization value of dst->obsolete from > 0 to -1. > -- Anybody give any suggestion? Actually,we can't suppose the dst->obsolete is non-zero. maybe we should use both dst->obsolete and rt->rt_peer_genid to decide whether to do dst_check or not? -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Gao feng <gaofeng@cn.fujitsu.com> Date: Wed, 19 Oct 2011 17:07:50 +0800 > 2011.10.17 20:18, Steffen Klassert wrote: >> On Fri, Oct 14, 2011 at 07:54:06AM +0200, Steffen Klassert wrote: >>> On Thu, Oct 13, 2011 at 01:58:08PM -0400, David Miller wrote: >>>> >>>> Please find out exactly why dst->obsolete is non-zero on a freshly >>>> looked up route. It's unexpected. >>> >>> Hm, on a slow path route lookup e.g. __mkroute_output() calls >>> rt_dst_alloc() which initializes dst->obsolete to -1. >> >> Just a follow up: >> git commit d11a4dc18 (ipv4: check rt_genid in dst_check) >> changed the initialization value of dst->obsolete from >> 0 to -1. >> -- > > Anybody give any suggestion? I'm really busy but will look at this soon. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Steffen I look at this specific patch again. The peer should be found and loaded up, and the PMTU propagated, when the route cache entry is created. Specifically rt_init_metrics() will find any existing peer, and do the whole check_peer_pmtu() sequence that ipv4_dst_check() does. If we have some issue with UDP or RAW caching a route past the first use after the route lookup, yes we have to introduce a dst_check() call somewhere. But unilaterally doing this on every CORK setup seems at least very excessive. Because CORK setup happens even for single sends. One such example is ip_make_skb(). ip_make_skb() is used by, f.e., udp_sendmsg() when corkreq is false. And in this case 'rt' is used immediately after being looked up in udp_sendmsg(). And, as far as I can tell, routines like udp_sendmsg() in fact already handle the "cached route across multiple sendmsg() calls" case too. Specifically, udp_sendmsg() does this: if (connected) rt = (struct rtable *)sk_dst_check(sk, 0); otherwise it makes a completely fresh route lookup. RAW sendmsg unconditionally makes a fresh route lookup on every sendmsg call. So it should be OK too. So I really fail to see the problematic case. Therefore, if it exists you'll have to give me an exact sequence of events that leads to the problem. I suspect that your real problem has nothing to do with UDP or RAW, but rather the issue is that entries already in the routing cache with a NULL peer need to be refreshed with peer information created in another context. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c index 8c65633..f682437 100644 --- a/net/ipv4/ip_output.c +++ b/net/ipv4/ip_output.c @@ -1063,6 +1063,11 @@ static int ip_setup_cork(struct sock *sk, struct inet_cork *cork, rt = *rtp; if (unlikely(!rt)) return -EFAULT; + + rt = (struct rtable *)dst_check(&rt->dst, 0); + if (!rt) + return -EHOSTUNREACH; + /* * We steal reference to this route, caller should not release it */
Since commit 2c8cec5c (ipv4: Cache learned PMTU information in inetpeer) we cache the learned pmtu informations in inetpeer and propagate these informations with the dst_ops->check() functions. However, these functions might not be called for udp and raw packets. As a consequence, we don't use the learned pmtu informations in these cases. With this patch we call dst_check() from ip_setup_cork() to propagate the pmtu informations. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> --- net/ipv4/ip_output.c | 5 +++++ 1 files changed, 5 insertions(+), 0 deletions(-)