[RFC] make per interface sysctl entries configurable

Message ID 200910251954.49700.opurdila@ixiacom.com
State RFC, archived
Delegated to: David Miller

Commit Message

Octavian Purdila Oct. 25, 2009, 5:54 p.m. UTC
RFC patches are attached.

Another possible approach: add an interface flag and use it to decide whether 
we want per-interface sysctl entries or not.

Benchmarks for creating 1000 interfaces (with the ndst module previously posted 
on the list, ppc750 @ 800 MHz machine):

- without the patches:

real    4m 38.27s
user    0m 0.00s
sys     2m 18.90s

- with the patches:

real    0m 0.10s
user    0m 0.00s
sys     0m 0.05s

Thanks,
tavi
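
The interface-flag alternative mentioned above could look roughly like the
user-space sketch below. It is only an illustration under stated assumptions:
DEMO_IFF_NO_SYSCTL, struct demo_netdev and both functions are hypothetical
names invented here, not existing kernel symbols and not part of the attached
patches.

```c
#include <stdbool.h>

/* Hypothetical flag bit; NOT a real kernel IFF_* value. */
#define DEMO_IFF_NO_SYSCTL 0x1u

/* Hypothetical, heavily reduced stand-in for a network device. */
struct demo_netdev {
	const char *name;
	unsigned int flags;
	bool sysctl_registered;
};

/* Stand-in for the expensive per-interface sysctl registration. */
static int demo_register_dev_sysctl(struct demo_netdev *dev)
{
	dev->sysctl_registered = true;
	return 0;
}

/* Create per-interface entries only when the flag is clear. */
static int demo_netdev_setup(struct demo_netdev *dev)
{
	if (dev->flags & DEMO_IFF_NO_SYSCTL)
		return 0;	/* skipped on request; still a success */
	return demo_register_dev_sysctl(dev);
}
```

With such a flag the decision would be made per interface at creation time,
whereas a config option makes it once for the whole kernel build.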

Comments

Denys Fedoryshchenko Oct. 25, 2009, 6:07 p.m. UTC | #1
Very interesting patch: I run PPPoE, and sysctl locking is my number one 
issue (according to perf and oprofile), both on massive PPPoE logins and 
during normal operation.
I will probably try applying it to one of my loaded PPPoE servers (a redundant 
one, in case it crashes).


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet Oct. 25, 2009, 9:37 p.m. UTC | #2
Octavian Purdila wrote:
> RFC patches are attached.
> 
> Another possible approach: add an interface flag and use it to decide whether 
> we want per interface sysctl entries or not.
> 

Hmm, could we speed up sysctl instead, by adding an rbtree or something?
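
To picture the idea: if the cost comes from scanning every already-registered
entry on each new registration (an assumption, the thread does not show the
profile), then keeping the entries ordered turns each lookup into O(log n).
The sketch below uses a sorted array with binary search as a simplified
stand-in for an rbtree; struct demo_table, demo_find() and demo_register()
are hypothetical names, not the kernel's sysctl code.

```c
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_ENTRIES 4096

/* Sorted table of entry names; a simplified stand-in for an rbtree. */
struct demo_table {
	const char *names[DEMO_MAX_ENTRIES];
	size_t count;
};

/*
 * Binary search. Returns the index of name if present, otherwise the
 * index where it would have to be inserted. *found is set to 1 or 0.
 */
static size_t demo_find(const struct demo_table *t, const char *name,
			int *found)
{
	size_t lo = 0, hi = t->count;

	*found = 0;
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		int cmp = strcmp(t->names[mid], name);

		if (cmp == 0) {
			*found = 1;
			return mid;
		}
		if (cmp < 0)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo;
}

/* Returns 0 on success, -1 if the name is a duplicate or the table is full. */
static int demo_register(struct demo_table *t, const char *name)
{
	int found;
	size_t pos;

	if (t->count == DEMO_MAX_ENTRIES)
		return -1;
	pos = demo_find(t, name, &found);
	if (found)
		return -1;
	/* Shift the tail up by one slot to keep the array sorted. */
	memmove(&t->names[pos + 1], &t->names[pos],
		(t->count - pos) * sizeof(t->names[0]));
	t->names[pos] = name;
	t->count++;
	return 0;
}
```

The memmove still makes an insert O(n) in the worst case; a real rbtree would
avoid that too, at the price of per-node overhead.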


Octavian Purdila Oct. 25, 2009, 10:21 p.m. UTC | #3
On Sunday 25 October 2009 23:37:19 you wrote:
> Octavian Purdila wrote:
> > RFC patches are attached.
> >
> > Another possible approach: add an interface flag and use it to decide
> > whether we want per interface sysctl entries or not.
> 
> Hmm, could we speedup sysctl instead, adding rbtree or something ?
> 

Very good point, I think this is the best solution for people using a 
moderately high number of interfaces (a few thousand).

But for really large setups there is another issue: memory consumption. In 
fact, in order to scale to 128K interfaces and still leave a significant 
amount of memory available to applications, we also had to disable sysfs and 
put the struct device member of net_device under #ifdef CONFIG_SYSFS.

I would also argue that when you have such a large number of interfaces you 
don't need to change settings on a per-interface basis. Or at least this is 
our case :)  and I suspect that the case with a large number of PPP interfaces 
is similar.
Denys Fedoryshchenko Oct. 25, 2009, 10:32 p.m. UTC | #4
On Monday 26 October 2009 00:21:48 Octavian Purdila wrote:

> Very good point, I think this is the best solution for people using a
> moderately high number of interfaces (a few thousand).
>
> But for really large setups there is another issue: memory consumption. In
> fact, in order to be able to scale to 128K interfaces and still have a
> significant amount of memory available to applications we also had to
> disable sysfs and #ifdef CONFIG_SYSFS struct device from net_device.
>
> I would also argue that when you have such a large number of interfaces you
> don't need to change setting on a per interface basis. Or at least this is
> our case :)  and I suspect that the case with a large number of PPP
> interfaces is similar.
I will also add that sysctl -a (via busybox) on a PPPoE server with 2k 
interfaces takes ages to complete.
stephen hemminger Oct. 26, 2009, 4:31 a.m. UTC | #5
On Sun, 25 Oct 2009 19:54:49 +0200
Octavian Purdila <opurdila@ixiacom.com> wrote:

> 
> RFC patches are attached.
> 
> Another possible approach: add an interface flag and use it to decide whether 
> we want per interface sysctl entries or not.
> 
> Benchmarks for creating 1000 interface (with the ndst module previously posted 
> on the list, ppc750 @800Mhz machine):
> 
> - without the patches:
> 
> real    4m 38.27s
> user    0m 0.00s
> sys     2m 18.90s
> 
> - with the patches:
> 
> real    0m 0.10s
> user    0m 0.00s
> sys     0m 0.05s
> 
> Thanks,
> tavi

I would rather optimize the algorithm than give up and make it
not available. It should be possible to do better by just using some
better programming.
Cosmin Ratiu Oct. 26, 2009, 9:01 a.m. UTC | #6
On Monday 26 October 2009 00:21:48 Octavian Purdila wrote:
> On Sunday 25 October 2009 23:37:19 you wrote:
> > Octavian Purdila wrote:
> > > RFC patches are attached.
> > >
> > > Another possible approach: add an interface flag and use it to decide
> > > whether we want per interface sysctl entries or not.
> >
> > Hmm, could we speedup sysctl instead, adding rbtree or something ?
> 
> Very good point, I think this is the best solution for people using a
> moderately high number of interfaces (a few thousand).
> 
> But for really large setups there is another issue: memory consumption. In
> fact, in order to be able to scale to 128K interfaces and still have a
> significant amount of memory available to applications we also had to
>  disable sysfs and #ifdef CONFIG_SYSFS struct device from net_device.
> 
> I would also argue that when you have such a large number of interfaces you
> don't need to change setting on a per interface basis. Or at least this is
>  our case :)  and I suspect that the case with a large number of PPP
>  interfaces is similar.
> 

Another possible approach: shared settings for an interface group. If you have 
a large number of interfaces of the same type it would be nice if you could 
change some setting for the whole group instead of globally or individually. 

Is this approach feasible at all, or am I talking rubbish?

Cosmin.
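
A rough user-space sketch of what shared group settings could mean: each
interface points at a settings block owned by its group and falls back to the
global defaults when it has neither a group nor an override of its own. All
names and fields here (struct demo_settings, struct demo_iface, demo_global,
demo_effective) are hypothetical and only illustrate the idea; nothing like
this is in the posted patches.

```c
#include <stddef.h>

/* Hypothetical settings block; the fields are illustrative only. */
struct demo_settings {
	int forwarding;
	int mtu;
};

struct demo_iface {
	const char *name;
	const struct demo_settings *group;	/* shared by the group, may be NULL */
	const struct demo_settings *own;	/* per-interface override, may be NULL */
};

/* Global defaults, used when neither override nor group is set. */
static const struct demo_settings demo_global = { .forwarding = 0, .mtu = 1500 };

/* Most specific setting wins: own, then group, then global. */
static const struct demo_settings *demo_effective(const struct demo_iface *i)
{
	if (i->own)
		return i->own;
	if (i->group)
		return i->group;
	return &demo_global;
}
```

Changing one field in the group's block then takes effect for every member at
once, and the memory cost is one block per group instead of one per interface.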
Denys Fedoryshchenko Oct. 26, 2009, 3:24 p.m. UTC | #7
I tested it on a PPPoE server with 1k customers. It works flawlessly.

When there is a problem on the network and I get a massive user disconnect 
followed by re-login, the bottleneck is a lock somewhere in sysctl creation 
(according to perf). After 200-300 interfaces PPPoE starts dying: the 
connection rate drops to 20-50 customers per minute and the load average 
jumps to 70-100 (I guess pppd processes waiting their turn). With this patch I 
am able to sustain a login rate of 200-300 customers per minute, and perf top 
is "clear" now.

This option is definitely optional and does not remove any functionality by 
default; it just gives more choice. For a PPP (PPPoE/PPTP) NAS it is very 
useful.



Patch

net: CONFIG_NET_SYSCTL_DEV: make per interface dev_snmp6 proc entries optional
        
Use the same CONFIG_NET_SYSCTL_DEV config option (we should probably
give it a better name) to enable/disable the per-interface dev_snmp6
proc entries.

--- //packages/linux_2.6.31/rc7/src/include/net/ipv6.h
+++ //packages/linux_2.6.31/rc7/src/include/net/ipv6.h
@@ -604,8 +604,14 @@ 
 extern void udplite6_proc_exit(void);
 extern int  ipv6_misc_proc_init(void);
 extern void ipv6_misc_proc_exit(void);
+
+#ifdef CONFIG_NET_SYSCTL_DEV
 extern int snmp6_register_dev(struct inet6_dev *idev);
 extern int snmp6_unregister_dev(struct inet6_dev *idev);
+#else
+static inline int snmp6_register_dev(struct inet6_dev *idev) { return 0; }
+static inline int snmp6_unregister_dev(struct inet6_dev *idev) { return 0; }
+#endif
 
 #else
 static inline int ac6_proc_init(struct net *net) { return 0; }
--- //packages/linux_2.6.31/rc7/src/net/ipv6/proc.c
+++ //packages/linux_2.6.31/rc7/src/net/ipv6/proc.c
@@ -232,6 +232,7 @@ 
 	.release = single_release,
 };
 
+#ifdef CONFIG_NET_SYSCTL_DEV
 int snmp6_register_dev(struct inet6_dev *idev)
 {
 	struct proc_dir_entry *p;
@@ -266,6 +267,7 @@ 
 	idev->stats.proc_dir_entry = NULL;
 	return 0;
 }
+#endif
 
 static int ipv6_proc_init_net(struct net *net)
 {
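
The header hunk above relies on the usual stub pattern: when the option is
off, the two functions become static inline no-ops that return 0, so callers
need no #ifdef of their own. Below is a self-contained user-space sketch of
that pattern; DEMO_PER_DEV_ENTRIES, struct demo_dev and the demo_* functions
are hypothetical stand-ins for CONFIG_NET_SYSCTL_DEV, struct inet6_dev and
snmp6_register_dev()/snmp6_unregister_dev().

```c
/* Hypothetical, reduced stand-in for struct inet6_dev. */
struct demo_dev {
	int registered;
};

/* Build with -DDEMO_PER_DEV_ENTRIES to get the real implementations. */
#ifdef DEMO_PER_DEV_ENTRIES
static int demo_register_dev(struct demo_dev *d)
{
	d->registered = 1;	/* would create the per-device entry */
	return 0;
}

static int demo_unregister_dev(struct demo_dev *d)
{
	d->registered = 0;	/* would remove the per-device entry */
	return 0;
}
#else
/* Option off: no-op stubs that report success. */
static inline int demo_register_dev(struct demo_dev *d) { (void)d; return 0; }
static inline int demo_unregister_dev(struct demo_dev *d) { (void)d; return 0; }
#endif

/* The caller is identical in both configurations. */
static int demo_add_dev(struct demo_dev *d)
{
	return demo_register_dev(d);
}
```

Because the stubs return 0, error handling in the callers keeps working
unchanged, and the compiler can drop the calls entirely when the option is
disabled.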