diff mbox

[v2,net-next] ipv6: prevent useless neigh alloc on PTP or lo routes

Message ID 1347506158.13103.1365.camel@edumazet-glaptop
State Rejected, archived
Delegated to: David Miller
Headers show

Commit Message

Eric Dumazet Sept. 13, 2012, 3:15 a.m. UTC
From: Eric Dumazet <edumazet@google.com>

We have special handling of SIT devices in addrconf_prefix_route()
to avoid allocating a neighbour for each destination.

If routing entry is :

ip -6 route add 2001:db8::/64 dev sit1

Then the kernel will create a new route and neighbour for every new
address under 2001:db8::/64 that we send a packet to 
(potentially, 2^64 routes and neighbours).

Under load, we immediately get the infamous "Neighbour table overflow"
message and machine eventually crash.

This does not happen if we specify a next-hop explicitly, like so:

ip -6 route add 2001:db8::/64 via fe80:: dev sit1

Same problem happens if we use routes to loopback.

Idea of this patch is to move existing SIT related code from
addrconf_prefix_route() to a more generic one in ip6_route_add(). 

This permits ip6_pol_route() to clone route instead of calling
rt6_alloc_cow() and allocate a neighbour.

Many thanks to Lorenzo for his help and suggestions.

Reported-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Willem de Bruijn <willemb@google.com>
---
 net/ipv6/addrconf.c |   10 ----------
 net/ipv6/route.c    |    4 ++++
 2 files changed, 4 insertions(+), 10 deletions(-)



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

David Miller Sept. 13, 2012, 9:13 p.m. UTC | #1
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 13 Sep 2012 05:15:58 +0200

> From: Eric Dumazet <edumazet@google.com>
> 
> We have special handling of SIT devices in addrconf_prefix_route()
> to avoid allocating a neighbour for each destination.
> 
> If routing entry is :
> 
> ip -6 route add 2001:db8::/64 dev sit1
> 
> Then the kernel will create a new route and neighbour for every new
> address under 2001:db8::/64 that we send a packet to 
> (potentially, 2^64 routes and neighbours).
> 
> Under load, we immediately get the infamous "Neighbour table overflow"
> message and machine eventually crash.
> 
> This does not happen if we specify a next-hop explicitly, like so:
> 
> ip -6 route add 2001:db8::/64 via fe80:: dev sit1
> 
> Same problem happens if we use routes to loopback.
> 
> Idea of this patch is to move existing SIT related code from
> addrconf_prefix_route() to a more generic one in ip6_route_add(). 
> 
> This permits ip6_pol_route() to clone route instead of calling
> rt6_alloc_cow() and allocate a neighbour.
> 
> Many thanks to Lorenzo for his help and suggestions.
> 
> Reported-by: Lorenzo Colitti <lorenzo@google.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>

This patch lacks the desired effect without your clone-caching-removal
patch, which I will not apply.

Therefore it doesn't make any sense to apply this either, as it won't
fix the stated problem.

Doing a proper conversion of ipv6 to ref-count-less neigh's will solve
this problem and allow all of the clone/cow caching code to be elided
for the majority of cases and is the correct approach to these problems.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet Sept. 13, 2012, 9:51 p.m. UTC | #2
On Thu, 2012-09-13 at 17:13 -0400, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Thu, 13 Sep 2012 05:15:58 +0200
> 
> > From: Eric Dumazet <edumazet@google.com>
> > 
> > We have special handling of SIT devices in addrconf_prefix_route()
> > to avoid allocating a neighbour for each destination.
> > 
> > If routing entry is :
> > 
> > ip -6 route add 2001:db8::/64 dev sit1
> > 
> > Then the kernel will create a new route and neighbour for every new
> > address under 2001:db8::/64 that we send a packet to 
> > (potentially, 2^64 routes and neighbours).
> > 
> > Under load, we immediately get the infamous "Neighbour table overflow"
> > message and machine eventually crash.
> > 
> > This does not happen if we specify a next-hop explicitly, like so:
> > 
> > ip -6 route add 2001:db8::/64 via fe80:: dev sit1
> > 
> > Same problem happens if we use routes to loopback.
> > 
> > Idea of this patch is to move existing SIT related code from
> > addrconf_prefix_route() to a more generic one in ip6_route_add(). 
> > 
> > This permits ip6_pol_route() to clone route instead of calling
> > rt6_alloc_cow() and allocate a neighbour.
> > 
> > Many thanks to Lorenzo for his help and suggestions.
> > 
> > Reported-by: Lorenzo Colitti <lorenzo@google.com>
> > Signed-off-by: Eric Dumazet <edumazet@google.com>
> 
> This patch lacks the desired effect without your clone-caching-removal
> patch, which I will not apply.
> 
> Therefore it doesn't make any sense to apply this either, as it won't
> fix the stated problem.
> 
> Doing a proper conversion of ipv6 to ref-count-less neigh's will solve
> this problem and allow all of the clone/cow caching code to be elided
> for the majority of cases and is the correct approach to these problems.


This seems quite different patches to me. Addressing two problems.

This patch is about not allocating a neighbour for a given route, but
reusing one existing neighbour. What could be possibly wrong with this,
since all neighbours are exactly the sames ?

I understand we can make it better with big surgery later, but right now
it seems quite reasonable.

This is already done (in part) for SIT devices, which are a subclass of
PtP device.

For the other patch, it seems problem was introduced in 2.6.38 when
CLONE_OFFLINK_ROUTE was removed in commit d80bc0fd26.



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 1237d5d..c6837d2 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -1679,16 +1679,6 @@  addrconf_prefix_route(struct in6_addr *pfx, int plen, struct net_device *dev,
 	};
 
 	cfg.fc_dst = *pfx;
-
-	/* Prevent useless cloning on PtP SIT.
-	   This thing is done here expecting that the whole
-	   class of non-broadcast devices need not cloning.
-	 */
-#if defined(CONFIG_IPV6_SIT) || defined(CONFIG_IPV6_SIT_MODULE)
-	if (dev->type == ARPHRD_SIT && (dev->flags & IFF_POINTOPOINT))
-		cfg.fc_flags |= RTF_NONEXTHOP;
-#endif
-
 	ip6_route_add(&cfg);
 }
 
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 399613b..7df8dfc 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1540,6 +1540,10 @@  int ip6_route_add(struct fib6_config *cfg)
 	} else
 		rt->rt6i_prefsrc.plen = 0;
 
+	/* Prevent useless cloning on link types that don't have next hops. */
+	if (dev->flags & (IFF_POINTOPOINT | IFF_LOOPBACK))
+		cfg->fc_flags |= RTF_NONEXTHOP;
+
 	if (cfg->fc_flags & (RTF_GATEWAY | RTF_NONEXTHOP)) {
 		err = rt6_bind_neighbour(rt, dev);
 		if (err)