diff mbox

[1/4] ipv4: Fix pmtu propagating

Message ID 20111011110922.GD1830@secunet.com
State Rejected, archived
Delegated to: David Miller
Headers show

Commit Message

Steffen Klassert Oct. 11, 2011, 11:09 a.m. UTC
Since commit 2c8cec5c (ipv4: Cache learned PMTU information in inetpeer)
we cache the learned pmtu informations in inetpeer and propagate these
informations with the dst_ops->check() functions. However, these functions
might not be called for udp and raw packets. As a consequence, we don't
use the learned pmtu informations in these cases. With this patch we
call dst_check() from ip_setup_cork() to propagate the pmtu informations.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/ipv4/ip_output.c |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

Comments

David Miller Oct. 12, 2011, 9:02 p.m. UTC | #1
From: Steffen Klassert <steffen.klassert@secunet.com>
Date: Tue, 11 Oct 2011 13:09:22 +0200

> Since commit 2c8cec5c (ipv4: Cache learned PMTU information in inetpeer)
> we cache the learned pmtu informations in inetpeer and propagate these
> informations with the dst_ops->check() functions. However, these functions
> might not be called for udp and raw packets. As a consequence, we don't
> use the learned pmtu informations in these cases. With this patch we
> call dst_check() from ip_setup_cork() to propagate the pmtu informations.
> 
> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

This dst_check() call will only do something if dst->obsolete is non-zero.

If dst->obsolete can be set in these circumstances, that's a bug.  The
caller is responsible for providing either a freshly looked up route
or a cached route which has had dst_check() or sk_dst_check() invoked
upon it.

I am pretty sure these rules are followed by the current code.

Again, there are only two scenerios:

1) 'rt' is just looked up by caller (f.e. udp_sendmsg() in rt == NULL case),
   here dst->obsolete is very unlikely to be non-zero.

2) Connected case, and we use cached route from the socket, but here
   we'll use sk_dst_check() to validate the route.  sk_dst_check()
   makes the necessary dst->ops->check() call if dst->obsolete is
   non-zero, and in fact that is it's one and only job.

So I cannot see a case where your new check can be necessary.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Steffen Klassert Oct. 13, 2011, 10:09 a.m. UTC | #2
On Wed, Oct 12, 2011 at 05:02:18PM -0400, David Miller wrote:
> 
> This dst_check() call will only do something if dst->obsolete is non-zero.
> 
> If dst->obsolete can be set in these circumstances, that's a bug.  The
> caller is responsible for providing either a freshly looked up route
> or a cached route which has had dst_check() or sk_dst_check() invoked
> upon it.

dst->obsolete was -1 in all cases I observed, regardless if connected or not.

> 
> I am pretty sure these rules are followed by the current code.
> 
> Again, there are only two scenerios:
> 
> 1) 'rt' is just looked up by caller (f.e. udp_sendmsg() in rt == NULL case),
>    here dst->obsolete is very unlikely to be non-zero.
> 
> 2) Connected case, and we use cached route from the socket, but here
>    we'll use sk_dst_check() to validate the route.  sk_dst_check()
>    makes the necessary dst->ops->check() call if dst->obsolete is
>    non-zero, and in fact that is it's one and only job.
> 

At least it seems that raw_sendmsg() and ping_sendmsg() don't use
a cached route, they do the route lookup in any case. I don't see
where we check if we learned a new pmtu in this cases. 

I added some debugging output and saw that peer->pmtu_learned
has the correct pmtu value, but it never ends up in the metric
of the dst_entry.

With a ping -s 1400 to a destination where the pmtu is 1000
I get for each packet 'Frag needed and DF set (mtu = 1000)'.
That's how I noticed that something changed.

With the dst_check() in ip_setup_cork() I get
'Frag needed and DF set (mtu = 1000)' for the first packet and
all further packets reach the destination, as it was before
commmit 2c8cec5c.


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Oct. 13, 2011, 5:58 p.m. UTC | #3
From: Steffen Klassert <steffen.klassert@secunet.com>
Date: Thu, 13 Oct 2011 12:09:50 +0200

> At least it seems that raw_sendmsg() and ping_sendmsg() don't use
> a cached route, they do the route lookup in any case. I don't see
> where we check if we learned a new pmtu in this cases. 

A freshly looked up route should not have ->obsolete set.

That's why we don't do dst_check() in that part of the ip_output.c
helper code you're modifying.

Please find out exactly why dst->obsolete is non-zero on a freshly
looked up route.  It's unexpected.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Steffen Klassert Oct. 14, 2011, 5:54 a.m. UTC | #4
On Thu, Oct 13, 2011 at 01:58:08PM -0400, David Miller wrote:
> From: Steffen Klassert <steffen.klassert@secunet.com>
> Date: Thu, 13 Oct 2011 12:09:50 +0200
> 
> > At least it seems that raw_sendmsg() and ping_sendmsg() don't use
> > a cached route, they do the route lookup in any case. I don't see
> > where we check if we learned a new pmtu in this cases. 
> 
> A freshly looked up route should not have ->obsolete set.
> 
> That's why we don't do dst_check() in that part of the ip_output.c
> helper code you're modifying.
> 
> Please find out exactly why dst->obsolete is non-zero on a freshly
> looked up route.  It's unexpected.

Hm, on a slow path route lookup e.g. __mkroute_output() calls
rt_dst_alloc() which initializes dst->obsolete to -1. It seems
that ___dst_free() is the only function that ever changes the
initial obsolete value. After calling ___dst_free() dst->obsolete
is 2.

Btw. on a slow path route lookup, __mkroute_output() and friends
initialize the pmtu informations via rt_set_nexthop(). How do we
check if these informations are still valid if we get the route
via the routing hash cache? Do we need to check in this case?

The raw protocol uses ip4_datagram_connect() as it's connect function.
ip4_datagram_connect() uses sk_dst_set() to cache the dst_entry on
the socket, why we don't use this cached dst_entry on raw_sendmsg()
in the connected case?
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Steffen Klassert Oct. 17, 2011, 12:18 p.m. UTC | #5
On Fri, Oct 14, 2011 at 07:54:06AM +0200, Steffen Klassert wrote:
> On Thu, Oct 13, 2011 at 01:58:08PM -0400, David Miller wrote:
> > 
> > Please find out exactly why dst->obsolete is non-zero on a freshly
> > looked up route.  It's unexpected.
> 
> Hm, on a slow path route lookup e.g. __mkroute_output() calls
> rt_dst_alloc() which initializes dst->obsolete to -1.

Just a follow up:
git commit d11a4dc18 (ipv4: check rt_genid in dst_check)
changed the initialization value of dst->obsolete from
0 to -1.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Gao feng Oct. 19, 2011, 9:07 a.m. UTC | #6
2011.10.17 20:18, Steffen Klassert wrote:
> On Fri, Oct 14, 2011 at 07:54:06AM +0200, Steffen Klassert wrote:
>> On Thu, Oct 13, 2011 at 01:58:08PM -0400, David Miller wrote:
>>>
>>> Please find out exactly why dst->obsolete is non-zero on a freshly
>>> looked up route.  It's unexpected.
>>
>> Hm, on a slow path route lookup e.g. __mkroute_output() calls
>> rt_dst_alloc() which initializes dst->obsolete to -1.
> 
> Just a follow up:
> git commit d11a4dc18 (ipv4: check rt_genid in dst_check)
> changed the initialization value of dst->obsolete from
> 0 to -1.
> --

Anybody give any suggestion?
Actually,we can't suppose the dst->obsolete is non-zero.
maybe we should use both dst->obsolete and rt->rt_peer_genid to decide whether to do dst_check or not?
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Oct. 19, 2011, 7:32 p.m. UTC | #7
From: Gao feng <gaofeng@cn.fujitsu.com>
Date: Wed, 19 Oct 2011 17:07:50 +0800

> 2011.10.17 20:18, Steffen Klassert wrote:
>> On Fri, Oct 14, 2011 at 07:54:06AM +0200, Steffen Klassert wrote:
>>> On Thu, Oct 13, 2011 at 01:58:08PM -0400, David Miller wrote:
>>>>
>>>> Please find out exactly why dst->obsolete is non-zero on a freshly
>>>> looked up route.  It's unexpected.
>>>
>>> Hm, on a slow path route lookup e.g. __mkroute_output() calls
>>> rt_dst_alloc() which initializes dst->obsolete to -1.
>> 
>> Just a follow up:
>> git commit d11a4dc18 (ipv4: check rt_genid in dst_check)
>> changed the initialization value of dst->obsolete from
>> 0 to -1.
>> --
> 
> Anybody give any suggestion?

I'm really busy but will look at this soon.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Nov. 8, 2011, 7:19 p.m. UTC | #8
Steffen I look at this specific patch again.

The peer should be found and loaded up, and the PMTU propagated, when
the route cache entry is created.  Specifically rt_init_metrics() will
find any existing peer, and do the whole check_peer_pmtu() sequence
that ipv4_dst_check() does.

If we have some issue with UDP or RAW caching a route past the first
use after the route lookup, yes we have to introduce a dst_check()
call somewhere.

But unilaterally doing this on every CORK setup seems at least very
excessive.  Because CORK setup happens even for single sends.  One
such example is ip_make_skb().

ip_make_skb() is used by, f.e., udp_sendmsg() when corkreq is false.
And in this case 'rt' is used immediately after being looked up
in udp_sendmsg().

And, as far as I can tell, routines like udp_sendmsg() in fact already
handle the "cached route across multiple sendmsg() calls" case too.

Specifically, udp_sendmsg() does this:

	if (connected)
		rt = (struct rtable *)sk_dst_check(sk, 0);

otherwise it makes a completely fresh route lookup.

RAW sendmsg unconditionally makes a fresh route lookup on every
sendmsg call.  So it should be OK too.

So I really fail to see the problematic case.  Therefore, if it exists
you'll have to give me an exact sequence of events that leads to the
problem.

I suspect that your real problem has nothing to do with UDP or RAW,
but rather the issue is that entries already in the routing cache
with a NULL peer need to be refreshed with peer information created
in another context.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 8c65633..f682437 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -1063,6 +1063,11 @@  static int ip_setup_cork(struct sock *sk, struct inet_cork *cork,
 	rt = *rtp;
 	if (unlikely(!rt))
 		return -EFAULT;
+
+	rt = (struct rtable *)dst_check(&rt->dst, 0);
+	if (!rt)
+		return -EHOSTUNREACH;
+
 	/*
 	 * We steal reference to this route, caller should not release it
 	 */