Message ID | 1347506158.13103.1365.camel@edumazet-glaptop |
---|---|
State | Rejected, archived |
Delegated to: | David Miller |
Headers | show |
From: Eric Dumazet <eric.dumazet@gmail.com> Date: Thu, 13 Sep 2012 05:15:58 +0200 > From: Eric Dumazet <edumazet@google.com> > > We have special handling of SIT devices in addrconf_prefix_route() > to avoid allocating a neighbour for each destination. > > If routing entry is : > > ip -6 route add 2001:db8::/64 dev sit1 > > Then the kernel will create a new route and neighbour for every new > address under 2001:db8::/64 that we send a packet to > (potentially, 2^64 routes and neighbours). > > Under load, we immediately get the infamous "Neighbour table overflow" > message and machine eventually crash. > > This does not happen if we specify a next-hop explicitly, like so: > > ip -6 route add 2001:db8::/64 via fe80:: dev sit1 > > Same problem happens if we use routes to loopback. > > Idea of this patch is to move existing SIT related code from > addrconf_prefix_route() to a more generic one in ip6_route_add(). > > This permits ip6_pol_route() to clone route instead of calling > rt6_alloc_cow() and allocate a neighbour. > > Many thanks to Lorenzo for his help and suggestions. > > Reported-by: Lorenzo Colitti <lorenzo@google.com> > Signed-off-by: Eric Dumazet <edumazet@google.com> This patch lacks the desired effect without your clone-caching-removal patch, which I will not apply. Therefore it doesn't make any sense to apply this either, as it won't fix the stated problem. Doing a proper conversion of ipv6 to ref-count-less neigh's will solve this problem and allow all of the clone/cow caching code to be elided for the majority of cases and is the correct approach to these problems. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, 2012-09-13 at 17:13 -0400, David Miller wrote: > From: Eric Dumazet <eric.dumazet@gmail.com> > Date: Thu, 13 Sep 2012 05:15:58 +0200 > > > From: Eric Dumazet <edumazet@google.com> > > > > We have special handling of SIT devices in addrconf_prefix_route() > > to avoid allocating a neighbour for each destination. > > > > If routing entry is : > > > > ip -6 route add 2001:db8::/64 dev sit1 > > > > Then the kernel will create a new route and neighbour for every new > > address under 2001:db8::/64 that we send a packet to > > (potentially, 2^64 routes and neighbours). > > > > Under load, we immediately get the infamous "Neighbour table overflow" > > message and machine eventually crash. > > > > This does not happen if we specify a next-hop explicitly, like so: > > > > ip -6 route add 2001:db8::/64 via fe80:: dev sit1 > > > > Same problem happens if we use routes to loopback. > > > > Idea of this patch is to move existing SIT related code from > > addrconf_prefix_route() to a more generic one in ip6_route_add(). > > > > This permits ip6_pol_route() to clone route instead of calling > > rt6_alloc_cow() and allocate a neighbour. > > > > Many thanks to Lorenzo for his help and suggestions. > > > > Reported-by: Lorenzo Colitti <lorenzo@google.com> > > Signed-off-by: Eric Dumazet <edumazet@google.com> > > This patch lacks the desired effect without your clone-caching-removal > patch, which I will not apply. > > Therefore it doesn't make any sense to apply this either, as it won't > fix the stated problem. > > Doing a proper conversion of ipv6 to ref-count-less neigh's will solve > this problem and allow all of the clone/cow caching code to be elided > for the majority of cases and is the correct approach to these problems. This seems quite different patches to me. Addressing two problems. This patch is about not allocating a neighbour for a given route, but reusing one existing neighbour. What could be possibly wrong with this, since all neighbours are exactly the sames ? I understand we can make it better with big surgery later, but right now it seems quite reasonable. This is already done (in part) for SIT devices, which are a subclass of PtP device. For the other patch, it seems problem was introduced in 2.6.38 when CLONE_OFFLINK_ROUTE was removed in commit d80bc0fd26. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c index 1237d5d..c6837d2 100644 --- a/net/ipv6/addrconf.c +++ b/net/ipv6/addrconf.c @@ -1679,16 +1679,6 @@ addrconf_prefix_route(struct in6_addr *pfx, int plen, struct net_device *dev, }; cfg.fc_dst = *pfx; - - /* Prevent useless cloning on PtP SIT. - This thing is done here expecting that the whole - class of non-broadcast devices need not cloning. - */ -#if defined(CONFIG_IPV6_SIT) || defined(CONFIG_IPV6_SIT_MODULE) - if (dev->type == ARPHRD_SIT && (dev->flags & IFF_POINTOPOINT)) - cfg.fc_flags |= RTF_NONEXTHOP; -#endif - ip6_route_add(&cfg); } diff --git a/net/ipv6/route.c b/net/ipv6/route.c index 399613b..7df8dfc 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -1540,6 +1540,10 @@ int ip6_route_add(struct fib6_config *cfg) } else rt->rt6i_prefsrc.plen = 0; + /* Prevent useless cloning on link types that don't have next hops. */ + if (dev->flags & (IFF_POINTOPOINT | IFF_LOOPBACK)) + cfg->fc_flags |= RTF_NONEXTHOP; + if (cfg->fc_flags & (RTF_GATEWAY | RTF_NONEXTHOP)) { err = rt6_bind_neighbour(rt, dev); if (err)