net: allow netdev_wait_allrefs() to run faster

Message ID 4AE28429.6040608@gmail.com
State Changes Requested, archived
Delegated to: David Miller

Commit Message

Eric Dumazet Oct. 24, 2009, 4:35 a.m. UTC
Paul E. McKenney wrote:
> On Wed, Oct 21, 2009 at 05:40:07PM +0200, Eric Dumazet wrote:
>> [PATCH] net: allow netdev_wait_allrefs() to run faster
>>
>> netdev_wait_allrefs() waits until all references to a device vanish.
>>
>> It currently uses a _very_ pessimistic 250 ms delay between each probe.
>> Some users report that no more than 4 devices can be dismantled per second;
>> this is a pretty serious problem for extreme setups.
>>
>> Most likely, references only wait for an RCU grace period that should come
>> fast, so use a schedule_timeout_uninterruptible(1) to allow faster recovery.
> 
> Is this a place where synchronize_rcu_expedited() is appropriate?
> (It went in to 2.6.32-rc1.)
> 

Thanks for the tip Paul

I believe netdev_wait_allrefs() is not a perfect candidate, because 
synchronize_sched_expedited() seems really expensive.

Maybe we could call it only once, after we have already gone through
the one-jiffy delay once?


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Paul E. McKenney Oct. 24, 2009, 5:49 a.m. UTC | #1
On Sat, Oct 24, 2009 at 06:35:53AM +0200, Eric Dumazet wrote:
> Paul E. McKenney wrote:
> > On Wed, Oct 21, 2009 at 05:40:07PM +0200, Eric Dumazet wrote:
> >> [PATCH] net: allow netdev_wait_allrefs() to run faster
> >>
> >> netdev_wait_allrefs() waits until all references to a device vanish.
> >>
> >> It currently uses a _very_ pessimistic 250 ms delay between each probe.
> >> Some users report that no more than 4 devices can be dismantled per second;
> >> this is a pretty serious problem for extreme setups.
> >>
> >> Most likely, references only wait for an RCU grace period that should come
> >> fast, so use a schedule_timeout_uninterruptible(1) to allow faster recovery.
> > 
> > Is this a place where synchronize_rcu_expedited() is appropriate?
> > (It went in to 2.6.32-rc1.)
> 
> Thanks for the tip Paul
> 
> I believe netdev_wait_allrefs() is not a perfect candidate, because 
> synchronize_sched_expedited() seems really expensive.

It does indeed keep the CPUs quite busy for a bit.  ;-)

> Maybe we could call it only once, after we have already gone through
> the one-jiffy delay once?

This could be a very useful approach!

However, please keep in mind that although synchronize_rcu_expedited()
forces a grace period, it does nothing to speed the invocation of other
RCU callbacks.  In short, synchronize_rcu_expedited() is a faster version
of synchronize_rcu(), but doesn't necessarily help other synchronize_rcu()
or call_rcu() invocations.

The reason I point this out is that it looks to me that the code below is
waiting for some other task which is in turn waiting on a grace period.
But I don't know this code, so could easily be confused.

						Thanx, paul

> diff --git a/net/core/dev.c b/net/core/dev.c
> index fa88dcd..9b04b9a 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -4970,6 +4970,7 @@ EXPORT_SYMBOL(register_netdev);
>  static void netdev_wait_allrefs(struct net_device *dev)
>  {
>  	unsigned long rebroadcast_time, warning_time;
> +	unsigned int count = 0;
> 
>  	rebroadcast_time = warning_time = jiffies;
>  	while (atomic_read(&dev->refcnt) != 0) {
> @@ -4995,7 +4996,10 @@ static void netdev_wait_allrefs(struct net_device *dev)
>  			rebroadcast_time = jiffies;
>  		}
> 
> -		msleep(250);
> +		if (count++ == 1)
> +			synchronize_rcu_expedited();
> +		else
> +			schedule_timeout_uninterruptible(1);
> 
>  		if (time_after(jiffies, warning_time + 10 * HZ)) {
>  			printk(KERN_EMERG "unregister_netdevice: "
> 
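A minimal user-space sketch of the control flow in the quoted hunk, for illustration only. The names model_wait_loop() and struct wait_stats are made up here, and the two kernel calls are replaced by plain counters. It shows that the "count++ == 1" test is true on the second pass only, so the expedited grace period is requested at most once per device and every other pass takes the one-jiffy sleep:

```c
#include <assert.h>

/* Hypothetical user-space model; these names do not exist in the kernel. */
struct wait_stats {
	unsigned int short_sleeps;    /* stands in for schedule_timeout_uninterruptible(1) */
	unsigned int expedited_calls; /* stands in for synchronize_rcu_expedited() */
};

/*
 * Model the wait loop of the quoted hunk. "passes" is the number of times
 * the reference count is still found non-zero, i.e. how often the loop
 * body runs before the device can be freed.
 */
static struct wait_stats model_wait_loop(unsigned int passes)
{
	struct wait_stats st = { 0, 0 };
	unsigned int count = 0;
	unsigned int i;

	for (i = 0; i < passes; i++) {
		/* Same test as in the patch: true only on the second pass. */
		if (count++ == 1)
			st.expedited_calls++;
		else
			st.short_sleeps++;
	}

	/* Every pass takes exactly one of the two branches. */
	assert(st.short_sleeps + st.expedited_calls == passes);
	return st;
}
```

With five passes the model records four short sleeps and one expedited call; a device whose references vanish during the first sleep never pays for the expedited grace period.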
Eric Dumazet Oct. 24, 2009, 8:49 a.m. UTC | #2
Paul E. McKenney wrote:
> On Sat, Oct 24, 2009 at 06:35:53AM +0200, Eric Dumazet wrote:
> 
>> Maybe we could call it only once, after we have already gone through
>> the one-jiffy delay once?
> 
> This could be a very useful approach!
> 
> However, please keep in mind that although synchronize_rcu_expedited()
> forces a grace period, it does nothing to speed the invocation of other
> RCU callbacks.  In short, synchronize_rcu_expedited() is a faster version
> of synchronize_rcu(), but doesn't necessarily help other synchronize_rcu()
> or call_rcu() invocations.
> 
> The reason I point this out is that it looks to me that the code below is
> waiting for some other task which is in turn waiting on a grace period.
> But I don't know this code, so could easily be confused.
> 

Normally we would need a synchronize_rcu() call, but I feel that is a bit more
than is really needed here.

On my dev machine, a synchronize_rcu() lasts between 2 and 12 ms


messages:Oct 21 19:13:14 svivoipvnx001-00 kernel: [ 2515.580259] synchronize_net() 4045596 ns
messages:Oct 21 19:13:14 svivoipvnx001-00 kernel: [ 2515.588262] synchronize_net() 7769327 ns
messages:Oct 21 19:13:14 svivoipvnx001-00 kernel: [ 2515.625014] synchronize_net() 4772052 ns
messages:Oct 21 19:13:14 svivoipvnx001-00 kernel: [ 2515.633008] synchronize_net() 7773896 ns
messages:Oct 21 19:13:14 svivoipvnx001-00 kernel: [ 2515.669260] synchronize_net() 3958141 ns
messages:Oct 21 19:13:14 svivoipvnx001-00 kernel: [ 2515.677259] synchronize_net() 7755817 ns
messages:Oct 21 19:13:14 svivoipvnx001-00 kernel: [ 2515.712011] synchronize_net() 2502544 ns
messages:Oct 21 19:13:14 svivoipvnx001-00 kernel: [ 2515.720011] synchronize_net() 7767748 ns
messages:Oct 21 19:13:14 svivoipvnx001-00 kernel: [ 2515.754259] synchronize_net() 2087946 ns
messages:Oct 21 19:13:14 svivoipvnx001-00 kernel: [ 2515.762258] synchronize_net() 7738054 ns
messages:Oct 21 19:13:14 svivoipvnx001-00 kernel: [ 2515.796011] synchronize_net() 3392760 ns
messages:Oct 21 19:13:15 svivoipvnx001-00 kernel: [ 2515.808025] synchronize_net() 11814619 ns
messages:Oct 21 19:13:15 svivoipvnx001-00 kernel: [ 2515.848010] synchronize_net() 8970220 ns
messages:Oct 21 19:13:15 svivoipvnx001-00 kernel: [ 2515.856015] synchronize_net() 7800782 ns
messages:Oct 21 19:13:15 svivoipvnx001-00 kernel: [ 2515.893008] synchronize_net() 6650174 ns
messages:Oct 21 19:13:15 svivoipvnx001-00 kernel: [ 2515.897012] synchronize_net() 3744808 ns
messages:Oct 21 19:13:15 svivoipvnx001-00 kernel: [ 2515.940202] synchronize_net() 8354366 ns
messages:Oct 21 19:13:15 svivoipvnx001-00 kernel: [ 2515.952137] synchronize_net() 11693215 ns
messages:Oct 21 19:13:15 svivoipvnx001-00 kernel: [ 2515.985010] synchronize_net() 2355970 ns
messages:Oct 21 19:13:15 svivoipvnx001-00 kernel: [ 2515.989009] synchronize_net() 3771419 ns
messages:Oct 21 19:13:15 svivoipvnx001-00 kernel: [ 2516.028137] synchronize_net() 7661195 ns
messages:Oct 21 19:13:15 svivoipvnx001-00 kernel: [ 2516.036152] synchronize_net() 7800056 ns
messages:Oct 21 19:13:15 svivoipvnx001-00 kernel: [ 2516.083135] synchronize_net() 6774026 ns
messages:Oct 21 19:13:15 svivoipvnx001-00 kernel: [ 2516.089145] synchronize_net() 5727189 ns
messages:Oct 21 19:13:15 svivoipvnx001-00 kernel: [ 2516.130385] synchronize_net() 10133932 ns
messages:Oct 21 19:13:15 svivoipvnx001-00 kernel: [ 2516.134399] synchronize_net() 3773058 ns
messages:Oct 21 19:13:15 svivoipvnx001-00 kernel: [ 2516.170136] synchronize_net() 4479194 ns
messages:Oct 21 19:13:15 svivoipvnx001-00 kernel: [ 2516.178138] synchronize_net() 7710466 ns
messages:Oct 21 19:13:15 svivoipvnx001-00 kernel: [ 2516.217198] synchronize_net() 4323437 ns
messages:Oct 21 19:13:15 svivoipvnx001-00 kernel: [ 2516.226206] synchronize_net() 8723108 ns
messages:Oct 21 19:13:15 svivoipvnx001-00 kernel: [ 2516.268013] synchronize_net() 6221155 ns
messages:Oct 21 19:13:15 svivoipvnx001-00 kernel: [ 2516.280007] synchronize_net() 11719297 ns
messages:Oct 21 19:13:15 svivoipvnx001-00 kernel: [ 2516.324008] synchronize_net() 11654511 ns
messages:Oct 21 19:13:15 svivoipvnx001-00 kernel: [ 2516.332009] synchronize_net() 7744182 ns
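
A small user-space sketch for summarising lines like the ones above, in case someone wants to compare runs. The helper names (parse_sync_ns(), summary_add(), struct ns_summary) are made up for this example, and the parser assumes the "synchronize_net() <value> ns" layout shown in the log:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helpers; the line layout is taken from the log above. */
struct ns_summary {
	unsigned long long min, max, sum;
	unsigned int n;
};

/* Extract the duration from one log line; returns 1 on success, 0 otherwise. */
static int parse_sync_ns(const char *line, unsigned long long *ns)
{
	static const char tag[] = "synchronize_net() ";
	const char *p;

	assert(line != NULL && ns != NULL);
	p = strstr(line, tag);
	if (p == NULL)
		return 0;
	/* Skip the tag itself and read the decimal nanosecond count. */
	return sscanf(p + sizeof(tag) - 1, "%llu", ns) == 1;
}

/* Fold one sample into the running minimum, maximum and sum. */
static void summary_add(struct ns_summary *s, unsigned long long ns)
{
	if (s->n == 0 || ns < s->min)
		s->min = ns;
	if (s->n == 0 || ns > s->max)
		s->max = ns;
	s->sum += ns;
	s->n++;
}
```

Over the 34 lines above, the smallest and largest samples are 2087946 ns and 11814619 ns, which is where the "2 to 12 ms" range comes from.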

Paul E. McKenney Oct. 24, 2009, 1:52 p.m. UTC | #3
On Sat, Oct 24, 2009 at 10:49:55AM +0200, Eric Dumazet wrote:
> Paul E. McKenney wrote:
> > On Sat, Oct 24, 2009 at 06:35:53AM +0200, Eric Dumazet wrote:
> > 
> >> Maybe we could call it only once, after we have already gone through
> >> the one-jiffy delay once?
> > 
> > This could be a very useful approach!
> > 
> > However, please keep in mind that although synchronize_rcu_expedited()
> > forces a grace period, it does nothing to speed the invocation of other
> > RCU callbacks.  In short, synchronize_rcu_expedited() is a faster version
> > of synchronize_rcu(), but doesn't necessarily help other synchronize_rcu()
> > or call_rcu() invocations.
> > 
> > The reason I point this out is that it looks to me that the code below is
> > waiting for some other task which is in turn waiting on a grace period.
> > But I don't know this code, so could easily be confused.
> > 
> 
> Normally we would need a synchronize_rcu() call, but I feel that is a bit more
> than is really needed here.
> 
> On my dev machine, a synchronize_rcu() lasts between 2 and 12 ms

That sounds like the right range, depending on what else is happening
on the machine at the time.

The synchronize_rcu_expedited() primitive would run in the 10s-100s
of microseconds.  It involves a pair of wakeups and a pair of context
switches on each CPU.

							Thanx, Paul

Eric Dumazet Oct. 24, 2009, 2:24 p.m. UTC | #4
Paul E. McKenney wrote:
> On Sat, Oct 24, 2009 at 10:49:55AM +0200, Eric Dumazet wrote:
>>
>> On my dev machine, a synchronize_rcu() lasts between 2 and 12 ms
> 
> That sounds like the right range, depending on what else is happening
> on the machine at the time.
> 
> The synchronize_rcu_expedited() primitive would run in the 10s-100s
> of microseconds.  It involves a pair of wakeups and a pair of context
> switches on each CPU.
> 

Hmm... I'll make some experiments Monday and post results, but it seems very
promising.

Do you think the "on_each_cpu(flush_backlog, dev, 1);"
we perform right before calling netdev_wait_allrefs() could be changed
somehow to speed up RCU callbacks? Maybe we could avoid sending IPIs twice to
the CPUs?

Thanks

Paul E. McKenney Oct. 24, 2009, 2:46 p.m. UTC | #5
On Sat, Oct 24, 2009 at 04:24:27PM +0200, Eric Dumazet wrote:
> Paul E. McKenney wrote:
> > On Sat, Oct 24, 2009 at 10:49:55AM +0200, Eric Dumazet wrote:
> >>
> >> On my dev machine, a synchronize_rcu() lasts between 2 and 12 ms
> > 
> > That sounds like the right range, depending on what else is happening
> > on the machine at the time.
> > 
> > The synchronize_rcu_expedited() primitive would run in the 10s-100s
> > of microseconds.  It involves a pair of wakeups and a pair of context
> > switches on each CPU.
> 
> Hmm... I'll make some experiments Monday and post results, but it seems very
> promising.

I should hasten to add that synchronize_rcu_expedited() goes fast for
TREE_RCU but not yet for TREE_PREEMPT_RCU (where it maps safely but
slowly to synchronize_rcu()).

> Do you think the "on_each_cpu(flush_backlog, dev, 1);"
> we perform right before calling netdev_wait_allrefs() could be changed
> somehow to speed up RCU callbacks? Maybe we could avoid sending IPIs twice to
> the CPUs?

This is an interesting possibility, and might fit in with some of the
changes that I am thinking about to reduce OS jitter for the heavy-duty
numerical-computing guys.

In the meantime, you could try doing the following from flush_backlog():

	unsigned long flags;

	/* Emulate an extra timer tick, but for RCU only. */
	local_irq_save(flags);
	rcu_check_callbacks(smp_processor_id(), 0);
	local_irq_restore(flags);

This would emulate a much-faster HZ value, but only for RCU.  This works
better in TREE_RCU than it does in TREE_PREEMPT_RCU at the moment (on my
todo list!).  In older kernels, this should also work for CLASSIC_RCU.
Of course, in TINY_RCU, synchronize_rcu() is a no-op anyway.  ;-)

And just to be clear, synchronize_rcu_expedited() currently just does
wakeups, not explicit IPIs.

							Thanx, Paul
stephen hemminger Oct. 24, 2009, 8:22 p.m. UTC | #6
On Sat, 24 Oct 2009 06:35:53 +0200
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> Paul E. McKenney wrote:
> > On Wed, Oct 21, 2009 at 05:40:07PM +0200, Eric Dumazet wrote:
> >> [PATCH] net: allow netdev_wait_allrefs() to run faster
> >>
> >> netdev_wait_allrefs() waits until all references to a device vanish.
> >>
> >> It currently uses a _very_ pessimistic 250 ms delay between each probe.
> >> Some users report that no more than 4 devices can be dismantled per second;
> >> this is a pretty serious problem for extreme setups.
> >>
> >> Most likely, references only wait for an RCU grace period that should come
> >> fast, so use a schedule_timeout_uninterruptible(1) to allow faster recovery.
> > 
> > Is this a place where synchronize_rcu_expedited() is appropriate?
> > (It went in to 2.6.32-rc1.)
> > 
> 
> Thanks for the tip Paul
> 
> I believe netdev_wait_allrefs() is not a perfect candidate, because 
> synchronize_sched_expedited() seems really expensive.
> 
> Maybe we could call it only once, after we have already gone through
> the one-jiffy delay once?
> 
> diff --git a/net/core/dev.c b/net/core/dev.c
> index fa88dcd..9b04b9a 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -4970,6 +4970,7 @@ EXPORT_SYMBOL(register_netdev);
>  static void netdev_wait_allrefs(struct net_device *dev)
>  {
>  	unsigned long rebroadcast_time, warning_time;
> +	unsigned int count = 0;
>  
>  	rebroadcast_time = warning_time = jiffies;
>  	while (atomic_read(&dev->refcnt) != 0) {
> @@ -4995,7 +4996,10 @@ static void netdev_wait_allrefs(struct net_device *dev)
>  			rebroadcast_time = jiffies;
>  		}
>  
> -		msleep(250);
> +		if (count++ == 1)
> +			synchronize_rcu_expedited();
> +		else
> +			schedule_timeout_uninterruptible(1);
>  
>  		if (time_after(jiffies, warning_time + 10 * HZ)) {
>  			printk(KERN_EMERG "unregister_netdevice: "

Actually, anything that requires more than one pass through the loop is
broken. Devices and protocols should be cleaning up on the first notifier.
The worst offender seems to be the dst cache gc code.
Octavian Purdila Oct. 24, 2009, 11:49 p.m. UTC | #7
On Saturday 24 October 2009 17:24:27 you wrote:
> Paul E. McKenney wrote:
> > On Sat, Oct 24, 2009 at 10:49:55AM +0200, Eric Dumazet wrote:
> >> On my dev machine, a synchronize_rcu() lasts between 2 and 12 ms
> >
> > That sounds like the right range, depending on what else is happening
> > on the machine at the time.
> >
> > The synchronize_rcu_expedited() primitive would run in the 10s-100s
> > of microseconds.  It involves a pair of wakeups and a pair of context
> > switches on each CPU.
> 
> Hmm... I'll make some experiments Monday and post results, but it seems
>  very promising.
> 

Got some time today and did some experiments myself. The test is deleting 1000 
dummy interfaces (interface status down, no IP/IPv6 addresses assigned) on a 
UP non-preempt ppc750 @ 800 MHz system.

1. Ben's patch:

real    0m 3.42s
user    0m 0.00s
sys     0m 0.00s

2. Eric's schedule_timeout_uninterruptible(1);

real    0m 3.00s
user    0m 0.00s
sys     0m 0.00s

3. Simple synchronize_rcu_expedited()

This doesn't seem to work well with the UP non-preempt case since 
synchronize_rcu_expedited() is a noop in this case - turning 
netdev_wait_allrefs() into a while(1) loop.
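
For comparing these timings with the original 250 ms poll, a rough user-space sketch of the arithmetic follows. The function names are made up for this example, and the model assumes that every device costs at least one poll sleep; for the schedule_timeout_uninterruptible(1) case it further assumes HZ=1000, so that one jiffy is 1 ms:

```c
#include <assert.h>

/*
 * Upper bound on teardowns per second when each device costs at least
 * one poll sleep of poll_interval_ms milliseconds.
 */
static unsigned int max_teardowns_per_sec(unsigned int poll_interval_ms)
{
	assert(poll_interval_ms > 0);
	return 1000u / poll_interval_ms;
}

/*
 * Average cost per device in microseconds, from the wall-clock total in
 * milliseconds (the "real" line of time) and the number of devices.
 */
static unsigned int avg_us_per_device(unsigned int total_ms, unsigned int devices)
{
	assert(devices > 0);
	return total_ms * 1000u / devices;
}
```

A 250 ms poll caps the rate at 4 devices per second, which matches the figure in the patch description. The 3.00 s and 3.42 s totals for 1000 interfaces work out to 3000 us and 3420 us per device.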

tavi




Paul E. McKenney Oct. 25, 2009, 4:47 a.m. UTC | #8
On Sun, Oct 25, 2009 at 02:49:00AM +0300, Octavian Purdila wrote:
> On Saturday 24 October 2009 17:24:27 you wrote:
> > Paul E. McKenney wrote:
> > > On Sat, Oct 24, 2009 at 10:49:55AM +0200, Eric Dumazet wrote:
> > >> On my dev machine, a synchronize_rcu() lasts between 2 and 12 ms
> > >
> > > That sounds like the right range, depending on what else is happening
> > > on the machine at the time.
> > >
> > > The synchronize_rcu_expedited() primitive would run in the 10s-100s
> > > of microseconds.  It involves a pair of wakeups and a pair of context
> > > switches on each CPU.
> > 
> > Hmm... I'll make some experiments Monday and post results, but it seems
> >  very promising.
> > 
> 
> Got some time today and did some experiments myself. The test is deleting 1000 
> dummy interfaces (interface status down, no IP/IPv6 addresses assigned) on a 
> UP non-preempt ppc750 @ 800 MHz system.
> 
> 1. Ben's patch:
> 
> real    0m 3.42s
> user    0m 0.00s
> sys     0m 0.00s
> 
> 2. Eric's schedule_timeout_uninterruptible(1);
> 
> real    0m 3.00s
> user    0m 0.00s
> sys     0m 0.00s
> 
> 3. Simple synchronize_rcu_expedited()
> 
> This doesn't seem to work well with the UP non-preempt case since 
> synchronize_rcu_expedited() is a noop in this case - turning 
> netdev_wait_allrefs() into a while(1) loop.

Indeed -- but then again, in the UP case, synchronize_rcu() itself
is pretty much a no-op.  So if your main target is UP, you should
be able to have seriously fast RCU updates.

(I know, I know, you want SMP to run fast as well...)

						Thanx, Paul
Eric Dumazet Oct. 25, 2009, 8:35 a.m. UTC | #9
Octavian Purdila wrote:
> 
> Got some time today and did some experiments myself. The test is deleting 1000 
> dummy interfaces (interface status down, no IP/IPv6 addresses assigned) on a 
> UP non-preempt ppc750 @ 800 MHz system.
> 
> 1. Ben's patch:
> 
> real    0m 3.42s
> user    0m 0.00s
> sys     0m 0.00s
> 
> 2. Eric's schedule_timeout_uninterruptible(1);
> 
> real    0m 3.00s
> user    0m 0.00s
> sys     0m 0.00s
> 
> 3. Simple synchronize_rcu_expedited()
> 
> This doesn't seem to work well with the UP non-preempt case since 
> synchronize_rcu_expedited() is a noop in this case - turning 
> netdev_wait_allrefs() into a while(1) loop.
> 

Thanks for these numbers. I presume the HZ value is 1000 on this platform?

Could you give us your scripts so that we can use the same "benchmark"?

BTW, I found I could not use IPv6 with many devices on x86_32, because of
the huge per_cpu allocations (on IPv6, each device has percpu SNMP counters).


Octavian Purdila Oct. 25, 2009, 3:19 p.m. UTC | #10
On Sunday 25 October 2009 10:35:10 you wrote:
> > Got some time today and did some experiments myself. The test is deleting
> > 1000 dummy interfaces (interface status down, no IP/IPv6 addresses
> > assigned) on a UP non-preempt ppc750 @ 800 MHz system.
> >
> > 1. Ben's patch:
> >
> > real    0m 3.42s
> > user    0m 0.00s
> > sys     0m 0.00s
> >
> > 2. Eric's schedule_timeout_uninterruptible(1);
> >
> > real    0m 3.00s
> > user    0m 0.00s
> > sys     0m 0.00s
> >
> > 3. Simple synchronize_rcu_expedited()
> >
> > This doesn't seem to work well with the UP non-preempt case since
> > synchronize_rcu_expedited() is a noop in this case - turning
> > netdev_wait_allrefs() into a while(1) loop.
> 
> Thanks for these numbers. I presume the HZ value is 1000 on this platform?
> 

Yes. I've attached the full config to this email as well.

> Could you give us your scripts so that we can use the same "benchmark"?
> 

Sure, I've attached the hack module code I've used. 

For creating interfaces: echo 1000 > /proc/sys/net/ndst/add
For deleting interfaces: echo start_ifindex stop_ifindex > /proc/sys/net/ndst/del

Some more information:

- on our old and optimized kernel I am getting 0.4s for creating 128000 
interfaces and 0.57s for deleting them

- the 2.6.31 kernel on which I got the 3 s numbers already has some patches to
speed up interface creation and deletion (removal of the per-device sysctl and
dev_snmp6 entries)

I'll start posting the patches we have as RFC.

Thanks,
tavi
#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.31
# Sat Oct 24 20:54:34 2009
#
# CONFIG_PPC64 is not set

#
# Processor support
#
CONFIG_PPC_BOOK3S_32=y
# CONFIG_PPC_85xx is not set
# CONFIG_PPC_8xx is not set
# CONFIG_40x is not set
# CONFIG_44x is not set
# CONFIG_E200 is not set
CONFIG_PPC_BOOK3S=y
CONFIG_6xx=y
CONFIG_PPC_FPU=y
# CONFIG_ALTIVEC is not set
CONFIG_PPC_STD_MMU=y
CONFIG_PPC_STD_MMU_32=y
# CONFIG_PPC_MM_SLICES is not set
CONFIG_PPC_HAVE_PMU_SUPPORT=y
# CONFIG_SMP is not set
CONFIG_PPC32=y
CONFIG_WORD_SIZE=32
# CONFIG_ARCH_PHYS_ADDR_T_64BIT is not set
CONFIG_MMU=y
CONFIG_GENERIC_CMOS_UPDATE=y
CONFIG_GENERIC_TIME=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_HARDIRQS=y
CONFIG_GENERIC_HARDIRQS_NO__DO_IRQ=y
# CONFIG_HAVE_SETUP_PER_CPU_AREA is not set
CONFIG_IRQ_PER_CPU=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_HAVE_LATENCYTOP_SUPPORT=y
CONFIG_TRACE_IRQFLAGS_SUPPORT=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_ARCH_HAS_ILOG2_U32=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_GENERIC_FIND_NEXT_BIT=y
# CONFIG_ARCH_NO_VIRT_TO_BUS is not set
CONFIG_PPC=y
CONFIG_EARLY_PRINTK=y
CONFIG_GENERIC_NVRAM=y
CONFIG_SCHED_OMIT_FRAME_POINTER=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_PPC_OF=y
CONFIG_OF=y
# CONFIG_PPC_UDBG_16550 is not set
# CONFIG_GENERIC_TBSYNC is not set
CONFIG_AUDIT_ARCH=y
CONFIG_GENERIC_BUG=y
CONFIG_DTC=y
# CONFIG_DEFAULT_UIMAGE is not set
# CONFIG_PPC_DCR_NATIVE is not set
# CONFIG_PPC_DCR_MMIO is not set
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"
CONFIG_CONSTRUCTORS=y

#
# General setup
#
CONFIG_EXPERIMENTAL=y
CONFIG_BROKEN_ON_SMP=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
# CONFIG_POSIX_MQUEUE is not set
# CONFIG_BSD_PROCESS_ACCT is not set
# CONFIG_TASKSTATS is not set
# CONFIG_AUDIT is not set

#
# RCU Subsystem
#
CONFIG_CLASSIC_RCU=y
# CONFIG_TREE_RCU is not set
# CONFIG_PREEMPT_RCU is not set
# CONFIG_TREE_RCU_TRACE is not set
# CONFIG_PREEMPT_RCU_TRACE is not set
# CONFIG_IKCONFIG is not set
CONFIG_LOG_BUF_SHIFT=17
# CONFIG_GROUP_SCHED is not set
# CONFIG_CGROUPS is not set
# CONFIG_RELAY is not set
# CONFIG_NAMESPACES is not set
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE=""
CONFIG_RD_GZIP=y
# CONFIG_RD_BZIP2 is not set
# CONFIG_RD_LZMA is not set
# CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
CONFIG_SYSCTL=y
CONFIG_ANON_INODES=y
CONFIG_EMBEDDED=y
CONFIG_SYSCTL_SYSCALL=y
CONFIG_KALLSYMS=y
# CONFIG_KALLSYMS_ALL is not set
# CONFIG_KALLSYMS_EXTRA_PASS is not set
# CONFIG_HOTPLUG is not set
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_EPOLL=y
# CONFIG_SIGNALFD is not set
CONFIG_TIMERFD=y
# CONFIG_EVENTFD is not set
# CONFIG_SHMEM is not set
# CONFIG_AIO is not set
CONFIG_HAVE_PERF_COUNTERS=y

#
# Performance Counters
#
# CONFIG_PERF_COUNTERS is not set
CONFIG_VM_EVENT_COUNTERS=y
# CONFIG_PCI_QUIRKS is not set
# CONFIG_STRIP_ASM_SYMS is not set
CONFIG_COMPAT_BRK=y
CONFIG_SLAB=y
# CONFIG_SLUB is not set
# CONFIG_SLOB is not set
CONFIG_PROFILING=y
CONFIG_TRACEPOINTS=y
CONFIG_MARKERS=y
CONFIG_OPROFILE=y
CONFIG_HAVE_OPROFILE=y
# CONFIG_KPROBES is not set
CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y
CONFIG_HAVE_IOREMAP_PROT=y
CONFIG_HAVE_KPROBES=y
CONFIG_HAVE_KRETPROBES=y
CONFIG_HAVE_ARCH_TRACEHOOK=y

#
# GCOV-based kernel profiling
#
# CONFIG_GCOV_KERNEL is not set
# CONFIG_SLOW_WORK is not set
# CONFIG_HAVE_GENERIC_DMA_COHERENT is not set
CONFIG_SLABINFO=y
CONFIG_RT_MUTEXES=y
CONFIG_BASE_SMALL=0
CONFIG_MODULES=y
# CONFIG_MODULE_FORCE_LOAD is not set
CONFIG_MODULE_UNLOAD=y
# CONFIG_MODULE_FORCE_UNLOAD is not set
# CONFIG_MODVERSIONS is not set
# CONFIG_MODULE_SRCVERSION_ALL is not set
# CONFIG_BLOCK is not set
# CONFIG_FREEZER is not set

#
# Platform support
#
# CONFIG_PPC_CHRP is not set
# CONFIG_MPC5121_ADS is not set
# CONFIG_MPC5121_GENERIC is not set
# CONFIG_PPC_MPC52xx is not set
# CONFIG_PPC_PMAC is not set
# CONFIG_PPC_CELL is not set
# CONFIG_PPC_CELL_NATIVE is not set
# CONFIG_PPC_82xx is not set
# CONFIG_PQ2ADS is not set
# CONFIG_PPC_83xx is not set
# CONFIG_PPC_86xx is not set
# CONFIG_EMBEDDED6xx is not set
# CONFIG_AMIGAONE is not set
CONFIG_PPC_IXIA=y
# CONFIG_PPC_OF_BOOT_TRAMPOLINE is not set
# CONFIG_IPIC is not set
# CONFIG_MPIC is not set
# CONFIG_MPIC_WEIRD is not set
# CONFIG_PPC_I8259 is not set
# CONFIG_PPC_RTAS is not set
# CONFIG_MMIO_NVRAM is not set
# CONFIG_PPC_MPC106 is not set
# CONFIG_PPC_970_NAP is not set
# CONFIG_PPC_INDIRECT_IO is not set
# CONFIG_GENERIC_IOMAP is not set
# CONFIG_CPU_FREQ is not set
# CONFIG_TAU is not set
# CONFIG_FSL_ULI1575 is not set
# CONFIG_SIMPLE_GPIO is not set

#
# Kernel options
#
# CONFIG_HIGHMEM is not set
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ=y
CONFIG_HIGH_RES_TIMERS=y
CONFIG_GENERIC_CLOCKEVENTS_BUILD=y
# CONFIG_HZ_100 is not set
# CONFIG_HZ_250 is not set
# CONFIG_HZ_300 is not set
CONFIG_HZ_1000=y
CONFIG_HZ=1000
CONFIG_SCHED_HRTICK=y
CONFIG_PREEMPT_NONE=y
# CONFIG_PREEMPT_VOLUNTARY is not set
# CONFIG_PREEMPT is not set
CONFIG_BINFMT_ELF=y
# CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS is not set
# CONFIG_HAVE_AOUT is not set
# CONFIG_BINFMT_MISC is not set
# CONFIG_IOMMU_HELPER is not set
# CONFIG_SWIOTLB is not set
CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG=y
CONFIG_ARCH_HAS_WALK_MEMORY=y
CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE=y
# CONFIG_KEXEC is not set
# CONFIG_CRASH_DUMP is not set
CONFIG_ARCH_FLATMEM_ENABLE=y
CONFIG_ARCH_POPULATES_NODE_MAP=y
CONFIG_SELECT_MEMORY_MODEL=y
CONFIG_FLATMEM_MANUAL=y
# CONFIG_DISCONTIGMEM_MANUAL is not set
# CONFIG_SPARSEMEM_MANUAL is not set
CONFIG_FLATMEM=y
CONFIG_FLAT_NODE_MEM_MAP=y
CONFIG_PAGEFLAGS_EXTENDED=y
CONFIG_SPLIT_PTLOCK_CPUS=4
# CONFIG_MIGRATION is not set
# CONFIG_PHYS_ADDR_T_64BIT is not set
CONFIG_ZONE_DMA_FLAG=1
CONFIG_VIRT_TO_BUS=y
CONFIG_HAVE_MLOCK=y
CONFIG_HAVE_MLOCKED_PAGE_BIT=y
CONFIG_DEFAULT_MMAP_MIN_ADDR=4096
CONFIG_PPC_4K_PAGES=y
# CONFIG_PPC_16K_PAGES is not set
# CONFIG_PPC_64K_PAGES is not set
# CONFIG_PPC_256K_PAGES is not set
CONFIG_FORCE_MAX_ZONEORDER=11
# CONFIG_PROC_DEVICETREE is not set
CONFIG_CMDLINE_BOOL=y
CONFIG_CMDLINE="console=ttyS0 rootfstype=ramfs powersave=off"
CONFIG_EXTRA_TARGETS=""
# CONFIG_PM is not set
# CONFIG_SECCOMP is not set
CONFIG_ISA_DMA_API=y

#
# Bus options
#
CONFIG_ZONE_DMA=y
CONFIG_GENERIC_ISA_DMA=y
# CONFIG_PPC_INDIRECT_PCI is not set
CONFIG_PCI=y
CONFIG_PCI_DOMAINS=y
CONFIG_PCI_SYSCALL=y
# CONFIG_PCIEPORTBUS is not set
CONFIG_ARCH_SUPPORTS_MSI=y
# CONFIG_PCI_MSI is not set
# CONFIG_PCI_LEGACY is not set
# CONFIG_PCI_DEBUG is not set
# CONFIG_PCI_STUB is not set
# CONFIG_PCI_IOV is not set
# CONFIG_HAS_RAPIDIO is not set

#
# Advanced setup
#
CONFIG_ADVANCED_OPTIONS=y
CONFIG_LOWMEM_SIZE_BOOL=y
CONFIG_LOWMEM_SIZE=0x70000000
CONFIG_PAGE_OFFSET_BOOL=y
CONFIG_PAGE_OFFSET=0x80000000
CONFIG_KERNEL_START_BOOL=y
CONFIG_KERNEL_START=0x80000000
CONFIG_PHYSICAL_START=0x00000000
CONFIG_TASK_SIZE_BOOL=y
CONFIG_TASK_SIZE=0x70000000
CONFIG_NET=y

#
# Networking options
#
# CONFIG_NET_SYSCTL_DEV is not set
CONFIG_PACKET=y
CONFIG_PACKET_MMAP=y
CONFIG_UNIX=y
CONFIG_XFRM=y
CONFIG_XFRM_USER=m
# CONFIG_XFRM_SUB_POLICY is not set
# CONFIG_XFRM_MIGRATE is not set
# CONFIG_XFRM_STATISTICS is not set
CONFIG_XFRM_IPCOMP=m
CONFIG_NET_KEY=m
# CONFIG_NET_KEY_MIGRATE is not set
CONFIG_INET=y
CONFIG_IP_MULTICAST=y
CONFIG_IP_ADVANCED_ROUTER=y
CONFIG_ASK_IP_FIB_HASH=y
# CONFIG_IP_FIB_TRIE is not set
CONFIG_IP_FIB_HASH=y
CONFIG_IP_MULTIPLE_TABLES=y
CONFIG_IP_IXIA_ROUTING=y
# CONFIG_IP_ROUTE_MULTIPATH is not set
# CONFIG_IP_ROUTE_VERBOSE is not set
# CONFIG_IP_PNP is not set
# CONFIG_NET_IPIP is not set
# CONFIG_NET_IPGRE is not set
# CONFIG_IP_MROUTE is not set
# CONFIG_ARPD is not set
# CONFIG_SYN_COOKIES is not set
CONFIG_INET_AH=m
CONFIG_INET_ESP=m
CONFIG_INET_IPCOMP=m
CONFIG_INET_XFRM_TUNNEL=m
CONFIG_INET_TUNNEL=y
CONFIG_INET_XFRM_MODE_TRANSPORT=y
CONFIG_INET_XFRM_MODE_TUNNEL=y
CONFIG_INET_XFRM_MODE_BEET=y
# CONFIG_INET_LRO is not set
CONFIG_INET_DIAG=y
CONFIG_INET_TCP_DIAG=y
# CONFIG_TCP_CONG_ADVANCED is not set
CONFIG_TCP_CONG_CUBIC=y
CONFIG_DEFAULT_TCP_CONG="cubic"
# CONFIG_TCP_MD5SIG is not set
CONFIG_IPV6=y
# CONFIG_IPV6_PRIVACY is not set
# CONFIG_IPV6_ROUTER_PREF is not set
# CONFIG_IPV6_OPTIMISTIC_DAD is not set
CONFIG_INET6_AH=m
CONFIG_INET6_ESP=m
# CONFIG_INET6_IPCOMP is not set
# CONFIG_IPV6_MIP6 is not set
# CONFIG_INET6_XFRM_TUNNEL is not set
# CONFIG_INET6_TUNNEL is not set
CONFIG_INET6_XFRM_MODE_TRANSPORT=y
CONFIG_INET6_XFRM_MODE_TUNNEL=y
CONFIG_INET6_XFRM_MODE_BEET=y
# CONFIG_INET6_XFRM_MODE_ROUTEOPTIMIZATION is not set
CONFIG_IPV6_SIT=y
CONFIG_IPV6_NDISC_NODETYPE=y
# CONFIG_IPV6_TUNNEL is not set
# CONFIG_IPV6_MULTIPLE_TABLES is not set
# CONFIG_IPV6_MROUTE is not set
# CONFIG_NETWORK_SECMARK is not set
CONFIG_NETFILTER=y
# CONFIG_NETFILTER_DEBUG is not set
# CONFIG_NETFILTER_ADVANCED is not set

#
# Core Netfilter Configuration
#
# CONFIG_NETFILTER_NETLINK_LOG is not set
# CONFIG_NF_CONNTRACK is not set
CONFIG_NETFILTER_XTABLES=m
# CONFIG_NETFILTER_XT_TARGET_MARK is not set
# CONFIG_NETFILTER_XT_TARGET_NFLOG is not set
# CONFIG_NETFILTER_XT_TARGET_TCPMSS is not set
# CONFIG_NETFILTER_XT_MATCH_MARK is not set
# CONFIG_NETFILTER_XT_MATCH_POLICY is not set
# CONFIG_IP_VS is not set

#
# IP: Netfilter Configuration
#
# CONFIG_NF_DEFRAG_IPV4 is not set
CONFIG_IP_NF_IPTABLES=m
CONFIG_IP_NF_FILTER=m
# CONFIG_IP_NF_TARGET_REJECT is not set
# CONFIG_IP_NF_TARGET_LOG is not set
# CONFIG_IP_NF_TARGET_ULOG is not set
# CONFIG_IP_NF_MANGLE is not set

#
# IPv6: Netfilter Configuration
#
CONFIG_IP6_NF_IPTABLES=m
# CONFIG_IP6_NF_MATCH_IPV6HEADER is not set
# CONFIG_IP6_NF_TARGET_LOG is not set
CONFIG_IP6_NF_FILTER=m
# CONFIG_IP6_NF_TARGET_REJECT is not set
# CONFIG_IP6_NF_MANGLE is not set
# CONFIG_IP_DCCP is not set
# CONFIG_IP_SCTP is not set
# CONFIG_TIPC is not set
# CONFIG_ATM is not set
# CONFIG_BRIDGE is not set
# CONFIG_NET_DSA is not set
# CONFIG_VLAN_8021Q is not set
# CONFIG_DECNET is not set
CONFIG_LLC=y
CONFIG_LLC2=y
# CONFIG_IPX is not set
# CONFIG_ATALK is not set
# CONFIG_X25 is not set
# CONFIG_LAPB is not set
# CONFIG_ECONET is not set
# CONFIG_WAN_ROUTER is not set
# CONFIG_PHONET is not set
# CONFIG_IEEE802154 is not set
CONFIG_NET_SCHED=y

#
# Queueing/Scheduling
#
# CONFIG_NET_SCH_CBQ is not set
# CONFIG_NET_SCH_HTB is not set
# CONFIG_NET_SCH_HFSC is not set
# CONFIG_NET_SCH_PRIO is not set
# CONFIG_NET_SCH_MULTIQ is not set
# CONFIG_NET_SCH_RED is not set
# CONFIG_NET_SCH_SFQ is not set
# CONFIG_NET_SCH_TEQL is not set
CONFIG_NET_SCH_TBF=m
# CONFIG_NET_SCH_GRED is not set
# CONFIG_NET_SCH_DSMARK is not set
# CONFIG_NET_SCH_NETEM is not set
# CONFIG_NET_SCH_DRR is not set
CONFIG_NET_SCH_INGRESS=m

#
# Classification
#
CONFIG_NET_CLS=y
# CONFIG_NET_CLS_BASIC is not set
CONFIG_NET_CLS_TCINDEX=m
CONFIG_NET_CLS_ROUTE4=m
CONFIG_NET_CLS_ROUTE=y
CONFIG_NET_CLS_FW=m
CONFIG_NET_CLS_U32=m
# CONFIG_CLS_U32_PERF is not set
# CONFIG_CLS_U32_MARK is not set
# CONFIG_NET_CLS_RSVP is not set
# CONFIG_NET_CLS_RSVP6 is not set
# CONFIG_NET_CLS_FLOW is not set
# CONFIG_NET_EMATCH is not set
CONFIG_NET_CLS_ACT=y
CONFIG_NET_ACT_POLICE=y
# CONFIG_NET_ACT_GACT is not set
# CONFIG_NET_ACT_MIRRED is not set
# CONFIG_NET_ACT_IPT is not set
# CONFIG_NET_ACT_NAT is not set
# CONFIG_NET_ACT_PEDIT is not set
# CONFIG_NET_ACT_SIMP is not set
# CONFIG_NET_ACT_SKBEDIT is not set
# CONFIG_NET_CLS_IND is not set
CONFIG_NET_SCH_FIFO=y
# CONFIG_DCB is not set

#
# Network testing
#
# CONFIG_NET_PKTGEN is not set
# CONFIG_NET_DROP_MONITOR is not set
CONFIG_NET_TXTIMESTAMP=y
# CONFIG_HAMRADIO is not set
# CONFIG_CAN is not set
# CONFIG_IRDA is not set
# CONFIG_BT is not set
# CONFIG_AF_RXRPC is not set
CONFIG_FIB_RULES=y
# CONFIG_WIRELESS is not set
# CONFIG_WIMAX is not set
# CONFIG_RFKILL is not set
# CONFIG_NET_9P is not set

#
# Device Drivers
#

#
# Generic Driver Options
#
CONFIG_STANDALONE=y
CONFIG_PREVENT_FIRMWARE_BUILD=y
# CONFIG_DEBUG_DRIVER is not set
# CONFIG_DEBUG_DEVRES is not set
# CONFIG_SYS_HYPERVISOR is not set
# CONFIG_CONNECTOR is not set
# CONFIG_MTD is not set
CONFIG_OF_DEVICE=y
# CONFIG_PARPORT is not set
# CONFIG_MISC_DEVICES is not set
CONFIG_HAVE_IDE=y

#
# SCSI device support
#
# CONFIG_SCSI_DMA is not set
# CONFIG_SCSI_NETLINK is not set
# CONFIG_FUSION is not set

#
# IEEE 1394 (FireWire) support
#

#
# You can enable one or both FireWire driver stacks.
#

#
# See the help texts for more information.
#
# CONFIG_FIREWIRE is not set
# CONFIG_IEEE1394 is not set
# CONFIG_I2O is not set
# CONFIG_MACINTOSH_DRIVERS is not set
CONFIG_NETDEVICES=y
# CONFIG_IFB is not set
# CONFIG_DUMMY is not set
# CONFIG_BONDING is not set
# CONFIG_MACVLAN is not set
# CONFIG_EQUALIZER is not set
# CONFIG_TUN is not set
# CONFIG_VETH is not set
# CONFIG_ARCNET is not set
# CONFIG_NET_ETHERNET is not set
# CONFIG_NETDEV_1000 is not set
# CONFIG_NETDEV_10000 is not set
# CONFIG_TR is not set

#
# Wireless LAN
#
# CONFIG_WLAN_PRE80211 is not set
# CONFIG_WLAN_80211 is not set

#
# Enable WiMAX (Networking options) to see the WiMAX drivers
#
# CONFIG_WAN is not set
# CONFIG_FDDI is not set
# CONFIG_HIPPI is not set
# CONFIG_PPP is not set
# CONFIG_SLIP is not set
# CONFIG_NETCONSOLE is not set
# CONFIG_NETPOLL is not set
# CONFIG_NET_POLL_CONTROLLER is not set
# CONFIG_ISDN is not set
# CONFIG_PHONE is not set

#
# Input device support
#
CONFIG_INPUT=y
# CONFIG_INPUT_FF_MEMLESS is not set
# CONFIG_INPUT_POLLDEV is not set

#
# Userland interfaces
#
# CONFIG_INPUT_MOUSEDEV is not set
# CONFIG_INPUT_JOYDEV is not set
# CONFIG_INPUT_EVDEV is not set
# CONFIG_INPUT_EVBUG is not set

#
# Input Device Drivers
#
# CONFIG_INPUT_KEYBOARD is not set
# CONFIG_INPUT_MOUSE is not set
# CONFIG_INPUT_JOYSTICK is not set
# CONFIG_INPUT_TABLET is not set
# CONFIG_INPUT_TOUCHSCREEN is not set
# CONFIG_INPUT_MISC is not set

#
# Hardware I/O ports
#
# CONFIG_SERIO is not set
# CONFIG_GAMEPORT is not set

#
# Character devices
#
# CONFIG_VT is not set
CONFIG_DEVKMEM=y
# CONFIG_SERIAL_NONSTANDARD is not set
# CONFIG_NOZOMI is not set

#
# Serial drivers
#
# CONFIG_SERIAL_8250 is not set

#
# Non-8250 serial port support
#
# CONFIG_SERIAL_UARTLITE is not set
# CONFIG_SERIAL_JSM is not set
CONFIG_UNIX98_PTYS=y
# CONFIG_DEVPTS_MULTIPLE_INSTANCES is not set
# CONFIG_LEGACY_PTYS is not set
# CONFIG_HVC_UDBG is not set
# CONFIG_IPMI_HANDLER is not set
CONFIG_HW_RANDOM=m
# CONFIG_HW_RANDOM_TIMERIOMEM is not set
# CONFIG_NVRAM is not set
# CONFIG_GEN_RTC is not set
# CONFIG_R3964 is not set
# CONFIG_APPLICOM is not set
# CONFIG_TCG_TPM is not set
CONFIG_DEVPORT=y
CONFIG_IXIA_CONSOLE=y
# CONFIG_I2C is not set
# CONFIG_SPI is not set

#
# PPS support
#
# CONFIG_PPS is not set
CONFIG_ARCH_WANT_OPTIONAL_GPIOLIB=y
# CONFIG_GPIOLIB is not set
# CONFIG_W1 is not set
# CONFIG_POWER_SUPPLY is not set
# CONFIG_HWMON is not set
# CONFIG_THERMAL is not set
# CONFIG_THERMAL_HWMON is not set
# CONFIG_WATCHDOG is not set
CONFIG_SSB_POSSIBLE=y

#
# Sonics Silicon Backplane
#
# CONFIG_SSB is not set

#
# Multifunction device drivers
#
# CONFIG_MFD_CORE is not set
# CONFIG_MFD_SM501 is not set
# CONFIG_HTC_PASIC3 is not set
# CONFIG_MFD_TMIO is not set
# CONFIG_REGULATOR is not set
# CONFIG_MEDIA_SUPPORT is not set

#
# Graphics support
#
# CONFIG_AGP is not set
# CONFIG_DRM is not set
# CONFIG_VGASTATE is not set
# CONFIG_VIDEO_OUTPUT_CONTROL is not set
# CONFIG_FB is not set
# CONFIG_BACKLIGHT_LCD_SUPPORT is not set

#
# Display device support
#
# CONFIG_DISPLAY_SUPPORT is not set
# CONFIG_SOUND is not set
# CONFIG_HID_SUPPORT is not set
# CONFIG_USB_SUPPORT is not set
# CONFIG_UWB is not set
# CONFIG_MMC is not set
# CONFIG_MEMSTICK is not set
# CONFIG_NEW_LEDS is not set
# CONFIG_ACCESSIBILITY is not set
# CONFIG_INFINIBAND is not set
# CONFIG_EDAC is not set
# CONFIG_RTC_CLASS is not set
# CONFIG_DMADEVICES is not set
# CONFIG_AUXDISPLAY is not set
# CONFIG_UIO is not set

#
# TI VLYNQ
#
# CONFIG_STAGING is not set

#
# File systems
#
CONFIG_FILE_LOCKING=y
CONFIG_FSNOTIFY=y
# CONFIG_DNOTIFY is not set
# CONFIG_INOTIFY is not set
CONFIG_INOTIFY_USER=y
# CONFIG_QUOTA is not set
# CONFIG_AUTOFS_FS is not set
# CONFIG_AUTOFS4_FS is not set
# CONFIG_FUSE_FS is not set

#
# Caches
#
# CONFIG_FSCACHE is not set

#
# Pseudo filesystems
#
CONFIG_PROC_FS=y
CONFIG_PROC_KCORE=y
CONFIG_PROC_SYSCTL=y
# CONFIG_PROC_PAGE_MONITOR is not set
# CONFIG_SYSFS is not set
CONFIG_TMPFS=y
# CONFIG_TMPFS_POSIX_ACL is not set
# CONFIG_HUGETLB_PAGE is not set
# CONFIG_MISC_FILESYSTEMS is not set
CONFIG_NETWORK_FILESYSTEMS=y
CONFIG_NFS_FS=y
CONFIG_NFS_V3=y
# CONFIG_NFS_V3_ACL is not set
# CONFIG_NFS_V4 is not set
# CONFIG_NFSD is not set
CONFIG_LOCKD=y
CONFIG_LOCKD_V4=y
CONFIG_NFS_COMMON=y
CONFIG_SUNRPC=y
# CONFIG_RPCSEC_GSS_KRB5 is not set
# CONFIG_RPCSEC_GSS_SPKM3 is not set
# CONFIG_SMB_FS is not set
# CONFIG_CIFS is not set
# CONFIG_NCP_FS is not set
# CONFIG_CODA_FS is not set
# CONFIG_AFS_FS is not set
# CONFIG_NLS is not set
CONFIG_BINARY_PRINTF=y

#
# Library routines
#
CONFIG_GENERIC_FIND_LAST_BIT=y
# CONFIG_CRC_CCITT is not set
# CONFIG_CRC16 is not set
# CONFIG_CRC_T10DIF is not set
# CONFIG_CRC_ITU_T is not set
# CONFIG_CRC32 is not set
# CONFIG_CRC7 is not set
# CONFIG_LIBCRC32C is not set
CONFIG_ZLIB_INFLATE=y
CONFIG_ZLIB_DEFLATE=y
CONFIG_DECOMPRESS_GZIP=y
CONFIG_HAS_IOMEM=y
CONFIG_HAS_IOPORT=y
CONFIG_HAS_DMA=y
CONFIG_HAVE_LMB=y
CONFIG_NLATTR=y
CONFIG_GENERIC_ATOMIC64=y

#
# Kernel hacking
#
CONFIG_PRINTK_TIME=y
CONFIG_ENABLE_WARN_DEPRECATED=y
CONFIG_ENABLE_MUST_CHECK=y
CONFIG_FRAME_WARN=1024
# CONFIG_MAGIC_SYSRQ is not set
# CONFIG_UNUSED_SYMBOLS is not set
CONFIG_DEBUG_FS=y
# CONFIG_HEADERS_CHECK is not set
CONFIG_DEBUG_KERNEL=y
# CONFIG_DEBUG_SHIRQ is not set
# CONFIG_DETECT_SOFTLOCKUP is not set
# CONFIG_DETECT_HUNG_TASK is not set
# CONFIG_SCHED_DEBUG is not set
# CONFIG_SCHEDSTATS is not set
# CONFIG_TIMER_STATS is not set
# CONFIG_DEBUG_OBJECTS is not set
# CONFIG_DEBUG_SLAB is not set
# CONFIG_DEBUG_RT_MUTEXES is not set
# CONFIG_RT_MUTEX_TESTER is not set
# CONFIG_DEBUG_SPINLOCK is not set
# CONFIG_DEBUG_MUTEXES is not set
# CONFIG_DEBUG_LOCK_ALLOC is not set
# CONFIG_PROVE_LOCKING is not set
# CONFIG_LOCK_STAT is not set
# CONFIG_DEBUG_SPINLOCK_SLEEP is not set
# CONFIG_DEBUG_LOCKING_API_SELFTESTS is not set
CONFIG_STACKTRACE=y
# CONFIG_DEBUG_KOBJECT is not set
CONFIG_DEBUG_BUGVERBOSE=y
CONFIG_DEBUG_INFO=y
# CONFIG_DEBUG_VM is not set
# CONFIG_DEBUG_WRITECOUNT is not set
# CONFIG_DEBUG_MEMORY_INIT is not set
# CONFIG_DEBUG_LIST is not set
# CONFIG_DEBUG_SG is not set
# CONFIG_DEBUG_NOTIFIERS is not set
# CONFIG_RCU_TORTURE_TEST is not set
# CONFIG_RCU_CPU_STALL_DETECTOR is not set
# CONFIG_BACKTRACE_SELF_TEST is not set
# CONFIG_FAULT_INJECTION is not set
# CONFIG_LATENCYTOP is not set
CONFIG_SYSCTL_SYSCALL_CHECK=y
# CONFIG_DEBUG_PAGEALLOC is not set
CONFIG_NOP_TRACER=y
CONFIG_HAVE_FUNCTION_TRACER=y
CONFIG_HAVE_FUNCTION_GRAPH_TRACER=y
CONFIG_HAVE_DYNAMIC_FTRACE=y
CONFIG_HAVE_FTRACE_MCOUNT_RECORD=y
CONFIG_RING_BUFFER=y
CONFIG_EVENT_TRACING=y
CONFIG_CONTEXT_SWITCH_TRACER=y
CONFIG_TRACING=y
CONFIG_TRACING_SUPPORT=y
# CONFIG_FTRACE is not set
# CONFIG_DYNAMIC_DEBUG is not set
# CONFIG_SAMPLES is not set
CONFIG_HAVE_ARCH_KGDB=y
# CONFIG_KGDB is not set
# CONFIG_KMEMCHECK is not set
# CONFIG_PPC_DISABLE_WERROR is not set
CONFIG_PPC_WERROR=y
CONFIG_PRINT_STACK_DEPTH=64
# CONFIG_DEBUG_STACKOVERFLOW is not set
# CONFIG_DEBUG_STACK_USAGE is not set
# CONFIG_PPC_EMULATED_STATS is not set
# CONFIG_CODE_PATCHING_SELFTEST is not set
# CONFIG_FTR_FIXUP_SELFTEST is not set
# CONFIG_MSI_BITMAP_SELFTEST is not set
# CONFIG_XMON is not set
# CONFIG_IRQSTACKS is not set
# CONFIG_VIRQ_DEBUG is not set
# CONFIG_BDI_SWITCH is not set
# CONFIG_BOOTX_TEXT is not set
# CONFIG_PPC_EARLY_DEBUG is not set

#
# Security options
#
# CONFIG_KEYS is not set
# CONFIG_SECURITYFS is not set
# CONFIG_SECURITY_FILE_CAPABILITIES is not set
CONFIG_CRYPTO=y

#
# Crypto core or helper
#
# CONFIG_CRYPTO_FIPS is not set
CONFIG_CRYPTO_ALGAPI=y
CONFIG_CRYPTO_ALGAPI2=y
CONFIG_CRYPTO_AEAD=m
CONFIG_CRYPTO_AEAD2=y
CONFIG_CRYPTO_BLKCIPHER=y
CONFIG_CRYPTO_BLKCIPHER2=y
CONFIG_CRYPTO_HASH=y
CONFIG_CRYPTO_HASH2=y
CONFIG_CRYPTO_RNG2=y
CONFIG_CRYPTO_PCOMP=y
CONFIG_CRYPTO_MANAGER=y
CONFIG_CRYPTO_MANAGER2=y
# CONFIG_CRYPTO_GF128MUL is not set
# CONFIG_CRYPTO_NULL is not set
CONFIG_CRYPTO_WORKQUEUE=y
# CONFIG_CRYPTO_CRYPTD is not set
CONFIG_CRYPTO_AUTHENC=m
# CONFIG_CRYPTO_TEST is not set

#
# Authenticated Encryption with Associated Data
#
# CONFIG_CRYPTO_CCM is not set
# CONFIG_CRYPTO_GCM is not set
# CONFIG_CRYPTO_SEQIV is not set

#
# Block modes
#
CONFIG_CRYPTO_CBC=y
# CONFIG_CRYPTO_CTR is not set
# CONFIG_CRYPTO_CTS is not set
# CONFIG_CRYPTO_ECB is not set
# CONFIG_CRYPTO_LRW is not set
# CONFIG_CRYPTO_PCBC is not set
# CONFIG_CRYPTO_XTS is not set

#
# Hash modes
#
CONFIG_CRYPTO_HMAC=y
# CONFIG_CRYPTO_XCBC is not set

#
# Digest
#
# CONFIG_CRYPTO_CRC32C is not set
# CONFIG_CRYPTO_MD4 is not set
CONFIG_CRYPTO_MD5=y
# CONFIG_CRYPTO_MICHAEL_MIC is not set
# CONFIG_CRYPTO_RMD128 is not set
# CONFIG_CRYPTO_RMD160 is not set
# CONFIG_CRYPTO_RMD256 is not set
# CONFIG_CRYPTO_RMD320 is not set
CONFIG_CRYPTO_SHA1=y
# CONFIG_CRYPTO_SHA256 is not set
# CONFIG_CRYPTO_SHA512 is not set
# CONFIG_CRYPTO_TGR192 is not set
# CONFIG_CRYPTO_WP512 is not set

#
# Ciphers
#
# CONFIG_CRYPTO_AES is not set
# CONFIG_CRYPTO_ANUBIS is not set
# CONFIG_CRYPTO_ARC4 is not set
# CONFIG_CRYPTO_BLOWFISH is not set
# CONFIG_CRYPTO_CAMELLIA is not set
# CONFIG_CRYPTO_CAST5 is not set
# CONFIG_CRYPTO_CAST6 is not set
CONFIG_CRYPTO_DES=y
# CONFIG_CRYPTO_FCRYPT is not set
# CONFIG_CRYPTO_KHAZAD is not set
# CONFIG_CRYPTO_SALSA20 is not set
# CONFIG_CRYPTO_SEED is not set
# CONFIG_CRYPTO_SERPENT is not set
# CONFIG_CRYPTO_TEA is not set
# CONFIG_CRYPTO_TWOFISH is not set

#
# Compression
#
CONFIG_CRYPTO_DEFLATE=y
# CONFIG_CRYPTO_ZLIB is not set
# CONFIG_CRYPTO_LZO is not set

#
# Random Number Generation
#
# CONFIG_CRYPTO_ANSI_CPRNG is not set
# CONFIG_CRYPTO_HW is not set
# CONFIG_PPC_CLOCK is not set
# CONFIG_VIRTUALIZATION is not set
Eric Dumazet Oct. 25, 2009, 7:28 p.m. UTC | #11
Octavian Purdila wrote:
> On Sunday 25 October 2009 10:35:10 you wrote:
>>> Got some time today and did some experiments myself. The test is deleting
>>> 1000 dummy interfaces (interface status down, no IP/IPv6 addresses
>>> assigned) on a UP non-preempt ppc750 @ 800 MHz system.
>>>
>>> 1. Ben's patch:
>>>
>>> real    0m 3.42s
>>> user    0m 0.00s
>>> sys     0m 0.00s
>>>
>>> 2. Eric's schedule_timeout_uninterruptible(1);
>>>
>>> real    0m 3.00s
>>> user    0m 0.00s
>>> sys     0m 0.00s
>>>
>>> 3. Simple synchronize_rcu_expedited()
>>>
>>> This doesn't seem to work well with the UP non-preempt case since
>>> synchronize_rcu_expedited() is a noop in this case - turning
>>> netdev_wait_allrefs() into a while(1) loop.
>> Thanks for these numbers. I presume the HZ value is 1000 on this platform?
>>
> 
> Yes. I've attached the full config to this email as well.
> 
>> Could you give us your scripts so that we can use the same "benchmark"?
>>
> 
> Sure, I've attached the hack module code I've used. 
> 
> For creating interfaces: echo 1000 > /proc/sys/net/ndst/add
> For deleting interfaces: echo start_ifindex stop_ifindex > /proc/sys/net/ndst/del
> 
> Some more information:
> 
> - on our old and optimized kernel I am getting 0.4s for creating 128000 
> interfaces and 0.57s for deleting them
> 
> - the 2.6.31 kernel on which I got the 3s numbers does have some patches to
> speed up interface creation and deletion (removal of per-device sysctl and
> dev_snmp6 entries)
> 
> I'll start posting the patches we have as RFC.
> 

OK thanks, I thought you were using the dummy module.

$ time insmod drivers/net/dummy.ko numdummies=100

real    0m2.493s
user    0m0.001s
sys     0m0.021s

$ time rmmod dummy

real    0m1.610s
user    0m0.000s
sys     0m0.001s

$ time insmod drivers/net/dummy.ko numdummies=200

real    0m10.118s
user    0m0.000s
sys     0m0.015s

$ time rmmod dummy

real    0m3.218s
user    0m0.000s
sys     0m0.001s

$ time insmod drivers/net/dummy.ko numdummies=300

real    0m22.564s
user    0m0.000s
sys     0m0.034s

$ time rmmod dummy

real    0m4.755s
user    0m0.000s
sys     0m0.006s

$ perf record -f insmod drivers/net/dummy.ko numdummies=300
$ perf report
# Samples: 898
#
# Overhead  Command           Shared Object  Symbol
# ........  .......  ......................  ......
#
    41.65%   insmod  [kernel]                [k] __register_sysctl_paths
    22.83%   insmod  [kernel]                [k] strcmp
     5.46%   insmod  [kernel]                [k] pcpu_alloc
     2.23%   insmod  [kernel]                [k] sysfs_find_dirent
     1.56%   insmod  [kernel]                [k] __sysfs_add_one
     1.11%   insmod  [kernel]                [k] pcpu_alloc_area
     1.11%   insmod  [kernel]                [k] _spin_lock
     1.00%   insmod  [kernel]                [k] kmemdup
     1.00%   insmod  [kernel]                [k] kmem_cache_alloc
     0.67%   insmod  [kernel]                [k] find_symbol_in_section
     0.67%   insmod  [kernel]                [k] find_next_zero_bit
     0.67%   insmod  [kernel]                [k] idr_get_empty_slot
     0.67%   insmod  [kernel]                [k] mutex_lock
     0.67%   insmod  [kernel]                [k] mutex_unlock
     0.56%   insmod  [kernel]                [k] vunmap_page_range
     0.56%   insmod  [kernel]                [k] __slab_alloc

Patch

diff --git a/net/core/dev.c b/net/core/dev.c
index fa88dcd..9b04b9a 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4970,6 +4970,7 @@  EXPORT_SYMBOL(register_netdev);
 static void netdev_wait_allrefs(struct net_device *dev)
 {
 	unsigned long rebroadcast_time, warning_time;
+	unsigned int count = 0;
 
 	rebroadcast_time = warning_time = jiffies;
 	while (atomic_read(&dev->refcnt) != 0) {
@@ -4995,7 +4996,10 @@  static void netdev_wait_allrefs(struct net_device *dev)
 			rebroadcast_time = jiffies;
 		}
 
-		msleep(250);
+		if (count++ == 1)
+			synchronize_rcu_expedited();
+		else
+			schedule_timeout_uninterruptible(1);
 
 		if (time_after(jiffies, warning_time + 10 * HZ)) {
 			printk(KERN_EMERG "unregister_netdevice: "