Message ID | 4ADF2B57.4030708@gmail.com |
---|---|
State | RFC, archived |
Delegated to: | David Miller |
Headers | show |
Eric Dumazet a écrit : > Octavian Purdila a écrit : >> On Sunday 18 October 2009 21:21:44 you wrote: >>>> The msleep(250) should be tuned first. Then if this is really necessary >>>> to dismantle 100.000 netdevices per second, we might have to think a bit >>>> more. >>>> Just try msleep(1 or 2), it should work quite well. >>> My goal is tearing down 100,000 interfaces in a few seconds, which really >>> is necessary. Right now we're running about 40,000 interfaces on a not >>> yet saturated 10Gbps link. Going to dual 10Gbps links means pushing more >>> than 100,000 subscriber interfaces, and it looks like a modern dual socket >>> system can handle that. >>> >> I would also like to see this patch in, we are running into scalability issues >> with creating/deleting lots of interfaces as well. > > Ben patch only address interface deletion, and one part of the problem, > maybe the more visible one for the current kernel. > > Adding lots of interfaces only needs several threads to run concurently. > > Before applying/examining his patch I suggest identifying all dev_put() spots than > can be deleted and replaced by something more scalable. I began this job > but others can help me. > > RTNL and rcu grace periods are going to hurt anyway, so you probably need > to use many tasks to be able to delete lots of interfaces in parallel. > > netdev_run_todo() should also use a better algorithm to allow parallelism. > > Following patch doesnt slow down dev_put() users and real scalability > problems will surface and might be addressed. > Here are typical timings (on current kernel, but on following example netdev_wait_allrefs() doesnt wait at all, because my netdevice has no refs) # time ip link add link eth3 address 00:1E:0B:8F:D0:D6 mv161 type macvlan real 0m0.001s user 0m0.000s sys 0m0.001s # time ip link set mv161 up real 0m0.001s user 0m0.000s sys 0m0.001s # time ip link set mv161 down real 0m0.021s user 0m0.000s sys 0m0.001s # time ip link del mv161 real 0m0.022s user 0m0.000s sys 0m0.001s # time ip link add link eth3 address 00:1E:0B:8F:D0:D6 mv161 type macvlan real 0m0.001s user 0m0.001s sys 0m0.001s # time ip link set mv161 up real 0m0.001s user 0m0.000s sys 0m0.001s # time ip link del mv161 real 0m0.036s user 0m0.000s sys 0m0.001s 22 ms (or 36 ms) delay are also problematic if you want to dismantle 1.000.000 netdevices at once. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Oct 21, 2009 at 05:40:07PM +0200, Eric Dumazet wrote: > Ben patch only address interface deletion, and one part of the problem, > maybe the more visible one for the current kernel. The first part I've been tackling has been the overhead in procfs, sysctl and sysfs. I've got patches for some of the issues, hacks for others, and should have something to post in a few days. Getting rid of those overheads is enough to get to decent interface creation times for the first 20 or 30k interfaces. On the interface deletion side of things, within the network code, fib_hash has a few linear scans that really start hurting. trie is a bit better, but I haven't started digging too deeply into its flush/remove overhead yet, aside from noticing that trie has a linear scan if CONFIG_IP_ROUTE_MULTIPATH is set since it sets the hash size to 1. fn_trie_flush() is currently the worst offender during deletion. -ben -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wednesday 21 October 2009 18:40:07 you wrote: > > > > I would also like to see this patch in, we are running into scalability > > issues with creating/deleting lots of interfaces as well. > > Ben patch only address interface deletion, and one part of the problem, > maybe the more visible one for the current kernel. > > Adding lots of interfaces only needs several threads to run concurently. > > Before applying/examining his patch I suggest identifying all dev_put() > spots than can be deleted and replaced by something more scalable. I began > this job but others can help me. > Yes, I agree with you, there are multiple places which needs to be touched to allow for better scaling with regard to the number of interfaces. We do have patches that addresses some of these issues, but unfortunately they are based on 2.6.7 and some of them are quite ugly hacks :) However, we are in the process of switching to 2.6.31 so I hope we will be able to contribute on this effort. > RTNL and rcu grace periods are going to hurt anyway, so you probably need > to use many tasks to be able to delete lots of interfaces in parallel. > Hmm, how would multiple tasks help here? Isn't the RTNL mutex global? > netdev_run_todo() should also use a better algorithm to allow parallelism. > > Following patch doesnt slow down dev_put() users and real scalability > problems will surface and might be addressed. > > [PATCH] net: allow netdev_wait_allrefs() to run faster > Thanks, I am going to test it on our platform and send back the results. tavi -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Benjamin LaHaise a écrit : > On Wed, Oct 21, 2009 at 05:40:07PM +0200, Eric Dumazet wrote: >> Ben patch only address interface deletion, and one part of the problem, >> maybe the more visible one for the current kernel. > > The first part I've been tackling has been the overhead in procfs, sysctl > and sysfs. I've got patches for some of the issues, hacks for others, and > should have something to post in a few days. Getting rid of those overheads > is enough to get to decent interface creation times for the first 20 or 30k > interfaces. > > On the interface deletion side of things, within the network code, fib_hash > has a few linear scans that really start hurting. trie is a bit better, > but I haven't started digging too deeply into its flush/remove overhead yet, > aside from noticing that trie has a linear scan if > CONFIG_IP_ROUTE_MULTIPATH is set since it sets the hash size to 1. > fn_trie_flush() is currently the worst offender during deletion. Well, there are many things to change... # ip -o link | wc -l 13097 # time ip -o link show mv22248 13045: mv22248@eth3: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN \ link/ether 00:1e:0b:8e:c8:08 brd ff:ff:ff:ff:ff:ff real 0m0.840s user 0m0.473s sys 0m0.368s almost one second to get link status of one particular interface :( -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Oct 21, 2009 at 05:40:07PM +0200, Eric Dumazet wrote: > Octavian Purdila a écrit : > > On Sunday 18 October 2009 21:21:44 you wrote: > >>> The msleep(250) should be tuned first. Then if this is really necessary > >>> to dismantle 100.000 netdevices per second, we might have to think a bit > >>> more. > >>> Just try msleep(1 or 2), it should work quite well. > >> My goal is tearing down 100,000 interfaces in a few seconds, which really > >> is necessary. Right now we're running about 40,000 interfaces on a not > >> yet saturated 10Gbps link. Going to dual 10Gbps links means pushing more > >> than 100,000 subscriber interfaces, and it looks like a modern dual socket > >> system can handle that. > >> > > > > I would also like to see this patch in, we are running into scalability issues > > with creating/deleting lots of interfaces as well. > > Ben patch only address interface deletion, and one part of the problem, > maybe the more visible one for the current kernel. > > Adding lots of interfaces only needs several threads to run concurently. > > Before applying/examining his patch I suggest identifying all dev_put() spots than > can be deleted and replaced by something more scalable. I began this job > but others can help me. > > RTNL and rcu grace periods are going to hurt anyway, so you probably need > to use many tasks to be able to delete lots of interfaces in parallel. > > netdev_run_todo() should also use a better algorithm to allow parallelism. > > Following patch doesnt slow down dev_put() users and real scalability > problems will surface and might be addressed. > > [PATCH] net: allow netdev_wait_allrefs() to run faster > > netdev_wait_allrefs() waits that all references to a device vanishes. > > It currently uses a _very_ pessimistic 250 ms delay between each probe. > Some users report that no more than 4 devices can be dismantled per second, > this is a pretty serious problem for extreme setups. > > Most likely, references only wait for a rcu grace period that should come > fast, so use a schedule_timeout_uninterruptible(1) to allow faster recovery. Is this a place where synchronize_rcu_expedited() is appropriate? (It went in to 2.6.32-rc1.) Thanx, Paul > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> > --- > net/core/dev.c | 2 +- > 1 files changed, 1 insertion(+), 1 deletion(-) > > diff --git a/net/core/dev.c b/net/core/dev.c > index 28b0b9e..fca2e4a 100644 > --- a/net/core/dev.c > +++ b/net/core/dev.c > @@ -4983,7 +4983,7 @@ static void netdev_wait_allrefs(struct net_device *dev) > rebroadcast_time = jiffies; > } > > - msleep(250); > + schedule_timeout_uninterruptible(1); > > if (time_after(jiffies, warning_time + 10 * HZ)) { > printk(KERN_EMERG "unregister_netdevice: " > -- > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Benjamin LaHaise <bcrl@lhnet.ca> writes: > On Wed, Oct 21, 2009 at 05:40:07PM +0200, Eric Dumazet wrote: >> Ben patch only address interface deletion, and one part of the problem, >> maybe the more visible one for the current kernel. > > The first part I've been tackling has been the overhead in procfs, sysctl > and sysfs. Could you keep me in the loop with that. I have some pending cleanups for all of those pieces of code and may be able to help/advice/review. Eric -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/net/core/dev.c b/net/core/dev.c index 28b0b9e..fca2e4a 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -4983,7 +4983,7 @@ static void netdev_wait_allrefs(struct net_device *dev) rebroadcast_time = jiffies; } - msleep(250); + schedule_timeout_uninterruptible(1); if (time_after(jiffies, warning_time + 10 * HZ)) { printk(KERN_EMERG "unregister_netdevice: "