diff mbox

ipv4: Flush per-ns routing cache more sanely.

Message ID 20101026.103428.179941457.davem@davemloft.net
State Changes Requested, archived
Delegated to: David Miller
Headers show

Commit Message

David Miller Oct. 26, 2010, 5:34 p.m. UTC
Flush the routing cache only of entries that match the
network namespace in which the purge event occurred.

Signed-off-by: David S. Miller <davem@davemloft.net>
---

Could I get some testing and performance analysis feedback
on this change?  I don't want it to just die :-)

THanks!

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Eric Dumazet Oct. 26, 2010, 6:57 p.m. UTC | #1
Le mardi 26 octobre 2010 à 10:34 -0700, David Miller a écrit :
> Flush the routing cache only of entries that match the
> network namespace in which the purge event occurred.
> 
> Signed-off-by: David S. Miller <davem@davemloft.net>
> ---
> 
> Could I get some testing and performance analysis feedback
> on this change?  I don't want it to just die :-)
> 

I believe its fine, thanks !

Acked-by: Eric Dumazet <eric.dumazet@gmail.com>

I'll respin my __rcu patch, of course.



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Daniel Lezcano Oct. 26, 2010, 6:57 p.m. UTC | #2
On 10/26/2010 07:34 PM, David Miller wrote:
> Flush the routing cache only of entries that match the
> network namespace in which the purge event occurred.
>
> Signed-off-by: David S. Miller<davem@davemloft.net>
> ---
>
> Could I get some testing and performance analysis feedback
> on this change?  I don't want it to just die :-)
>    

Ok, will do in a couple of days.

Thanks
   -- Daniel

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric W. Biederman Oct. 26, 2010, 7:05 p.m. UTC | #3
David Miller <davem@davemloft.net> writes:

> Flush the routing cache only of entries that match the
> network namespace in which the purge event occurred.
>
> Signed-off-by: David S. Miller <davem@davemloft.net>
> ---
>
> Could I get some testing and performance analysis feedback
> on this change?  I don't want it to just die :-)
>
> THanks!
>
> diff --git a/include/net/route.h b/include/net/route.h
> index 7e5e73b..8d24761 100644
> --- a/include/net/route.h
> +++ b/include/net/route.h
> @@ -106,7 +106,7 @@ extern int		ip_rt_init(void);
>  extern void		ip_rt_redirect(__be32 old_gw, __be32 dst, __be32 new_gw,
>  				       __be32 src, struct net_device *dev);
>  extern void		rt_cache_flush(struct net *net, int how);
> -extern void		rt_cache_flush_batch(void);
> +extern void		rt_cache_flush_batch(struct net *net);
>  extern int		__ip_route_output_key(struct net *, struct rtable **, const struct flowi *flp);
>  extern int		ip_route_output_key(struct net *, struct rtable **, struct flowi *flp);
>  extern int		ip_route_output_flow(struct net *, struct rtable **rp, struct flowi *flp, struct sock *sk, int flags);
> diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
> index 36e27c2..828d471 100644
> --- a/net/ipv4/fib_frontend.c
> +++ b/net/ipv4/fib_frontend.c
> @@ -999,7 +999,7 @@ static int fib_netdev_event(struct notifier_block *this, unsigned long event, vo
>  		rt_cache_flush(dev_net(dev), 0);
>  		break;
>  	case NETDEV_UNREGISTER_BATCH:
> -		rt_cache_flush_batch();
> +		rt_cache_flush_batch(dev_net(dev));

It still has this incorrect conversion in it.

Eric
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Oct. 26, 2010, 7:20 p.m. UTC | #4
From: ebiederm@xmission.com (Eric W. Biederman)
Date: Tue, 26 Oct 2010 12:05:39 -0700

>> @@ -999,7 +999,7 @@ static int fib_netdev_event(struct notifier_block *this, unsigned long event, vo
>>  		rt_cache_flush(dev_net(dev), 0);
>>  		break;
>>  	case NETDEV_UNREGISTER_BATCH:
>> -		rt_cache_flush_batch();
>> +		rt_cache_flush_batch(dev_net(dev));
> 
> It still has this incorrect conversion in it.

Sorry I missed that, what's the exact problem with it?
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet Oct. 26, 2010, 7:30 p.m. UTC | #5
Le mardi 26 octobre 2010 à 12:20 -0700, David Miller a écrit :
> From: ebiederm@xmission.com (Eric W. Biederman)
> Date: Tue, 26 Oct 2010 12:05:39 -0700
> 
> >> @@ -999,7 +999,7 @@ static int fib_netdev_event(struct notifier_block *this, unsigned long event, vo
> >>  		rt_cache_flush(dev_net(dev), 0);
> >>  		break;
> >>  	case NETDEV_UNREGISTER_BATCH:
> >> -		rt_cache_flush_batch();
> >> +		rt_cache_flush_batch(dev_net(dev));
> > 
> > It still has this incorrect conversion in it.
> 
> Sorry I missed that, what's the exact problem with it?

Because the way _BATCH operation is performed, we call it once...

rollback_registered_many() calls it for the first dev queued in the
list.

So it should be net independant.



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Daniel Lezcano Oct. 29, 2010, 9:41 p.m. UTC | #6
On 10/26/2010 09:30 PM, Eric Dumazet wrote:
> Le mardi 26 octobre 2010 à 12:20 -0700, David Miller a écrit :
>    
>> From: ebiederm@xmission.com (Eric W. Biederman)
>> Date: Tue, 26 Oct 2010 12:05:39 -0700
>>
>>      
>>>> @@ -999,7 +999,7 @@ static int fib_netdev_event(struct notifier_block *this, unsigned long event, vo
>>>>   		rt_cache_flush(dev_net(dev), 0);
>>>>   		break;
>>>>   	case NETDEV_UNREGISTER_BATCH:
>>>> -		rt_cache_flush_batch();
>>>> +		rt_cache_flush_batch(dev_net(dev));
>>>>          
>>> It still has this incorrect conversion in it.
>>>        
>> Sorry I missed that, what's the exact problem with it?
>>      
> Because the way _BATCH operation is performed, we call it once...
>
> rollback_registered_many() calls it for the first dev queued in the
> list.
>
> So it should be net independant.
>    

Dave,

do you plan to send another version of this patch ? Or can I test it as 
it is ?
Without removing a network device, I can check the routine, no ?

Thanks
   -- Daniel
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller Oct. 29, 2010, 10:21 p.m. UTC | #7
From: Daniel Lezcano <daniel.lezcano@free.fr>
Date: Fri, 29 Oct 2010 23:41:40 +0200

> do you plan to send another version of this patch ? Or can I test it
> as it is ?  Without removing a network device, I can check the
> routine, no ?

I'm backlogged with the current merge window work and fixing bugs,
so feel free to test what I posted if you have the time.

I'll post an updated version when time permits.

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/include/net/route.h b/include/net/route.h
index 7e5e73b..8d24761 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -106,7 +106,7 @@  extern int		ip_rt_init(void);
 extern void		ip_rt_redirect(__be32 old_gw, __be32 dst, __be32 new_gw,
 				       __be32 src, struct net_device *dev);
 extern void		rt_cache_flush(struct net *net, int how);
-extern void		rt_cache_flush_batch(void);
+extern void		rt_cache_flush_batch(struct net *net);
 extern int		__ip_route_output_key(struct net *, struct rtable **, const struct flowi *flp);
 extern int		ip_route_output_key(struct net *, struct rtable **, struct flowi *flp);
 extern int		ip_route_output_flow(struct net *, struct rtable **rp, struct flowi *flp, struct sock *sk, int flags);
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 36e27c2..828d471 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -999,7 +999,7 @@  static int fib_netdev_event(struct notifier_block *this, unsigned long event, vo
 		rt_cache_flush(dev_net(dev), 0);
 		break;
 	case NETDEV_UNREGISTER_BATCH:
-		rt_cache_flush_batch();
+		rt_cache_flush_batch(dev_net(dev));
 		break;
 	}
 	return NOTIFY_DONE;
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index d6cb2bf..f6860eb 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -712,13 +712,14 @@  static inline int rt_is_expired(struct rtable *rth)
  * Can be called by a softirq or a process.
  * In the later case, we want to be reschedule if necessary
  */
-static void rt_do_flush(int process_context)
+static void rt_do_flush(struct net *net, int process_context)
 {
 	unsigned int i;
 	struct rtable *rth, *next;
-	struct rtable * tail;
 
 	for (i = 0; i <= rt_hash_mask; i++) {
+		struct rtable *list, **pprev;
+
 		if (process_context && need_resched())
 			cond_resched();
 		rth = rt_hash_table[i].chain;
@@ -726,41 +727,28 @@  static void rt_do_flush(int process_context)
 			continue;
 
 		spin_lock_bh(rt_hash_lock_addr(i));
-#ifdef CONFIG_NET_NS
-		{
-		struct rtable ** prev, * p;
 
-		rth = rt_hash_table[i].chain;
+		list = NULL;
+		pprev = &rt_hash_table[i].chain;
+		rth = *pprev;
+		while (rth) {
+			next = rth->dst.rt_next;
+			if (net_eq(dev_net(rth->dst.dev), net)) {
+				*pprev = next;
 
-		/* defer releasing the head of the list after spin_unlock */
-		for (tail = rth; tail; tail = tail->dst.rt_next)
-			if (!rt_is_expired(tail))
-				break;
-		if (rth != tail)
-			rt_hash_table[i].chain = tail;
-
-		/* call rt_free on entries after the tail requiring flush */
-		prev = &rt_hash_table[i].chain;
-		for (p = *prev; p; p = next) {
-			next = p->dst.rt_next;
-			if (!rt_is_expired(p)) {
-				prev = &p->dst.rt_next;
+				rth->dst.rt_next = list;
+				list = rth;
 			} else {
-				*prev = next;
-				rt_free(p);
+				pprev = &rth->dst.rt_next;
 			}
+			rth = next;
 		}
-		}
-#else
-		rth = rt_hash_table[i].chain;
-		rt_hash_table[i].chain = NULL;
-		tail = NULL;
-#endif
+
 		spin_unlock_bh(rt_hash_lock_addr(i));
 
-		for (; rth != tail; rth = next) {
-			next = rth->dst.rt_next;
-			rt_free(rth);
+		for (; list; list = next) {
+			next = list->dst.rt_next;
+			rt_free(list);
 		}
 	}
 }
@@ -906,13 +894,13 @@  void rt_cache_flush(struct net *net, int delay)
 {
 	rt_cache_invalidate(net);
 	if (delay >= 0)
-		rt_do_flush(!in_softirq());
+		rt_do_flush(net, !in_softirq());
 }
 
 /* Flush previous cache invalidated entries from the cache */
-void rt_cache_flush_batch(void)
+void rt_cache_flush_batch(struct net *net)
 {
-	rt_do_flush(!in_softirq());
+	rt_do_flush(net, !in_softirq());
 }
 
 static void rt_emergency_hash_rebuild(struct net *net)