Message ID | 87k3ozm5v1.fsf@xmission.com |
---|---|
State | RFC, archived |
Delegated to: | stephen hemminger |
Headers | show |
The whole ifindex map is a design mistake at this point. Better off to do a lazy cache or something like that. On Fri, Mar 22, 2013 at 3:23 PM, Eric W. Biederman <ebiederm@xmission.com> wrote: > > Because ip link add, set, and delete map the interface name to the > interface index by dumping all of the interfaces before performing > their respective commands. Operations that should be constant time > slow down when lots of network interfaces are in use. Resulting > in O(N^2) time to work with O(N) devices. > > Make the work that iproute does constant time by passing the interface > name to the kernel instead. > > In small scale testing on my system this shows dramatic performance > increases of ip link add from 120s to just 11s to add 5000 network > devices. And from longer than I cared to wait to just 58s to delete > all of those interfaces again. > > Cc: Serge Hallyn <serge.hallyn@ubuntu.com> > Reported-by: Benoit Lourdelet <blourdel@juniper.net> > Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> > --- > > I think I am bungling the case where people specify an ifindex as ifNNNN > but does anyone care? > > ip/iplink.c | 19 +------------------ > 1 files changed, 1 insertions(+), 18 deletions(-) > > diff --git a/ip/iplink.c b/ip/iplink.c > index ad33611..6dffbf0 100644 > --- a/ip/iplink.c > +++ b/ip/iplink.c > @@ -533,8 +533,6 @@ static int iplink_modify(int cmd, unsigned int flags, int argc, char **argv) > } > } > > - ll_init_map(&rth); > - > if (!(flags & NLM_F_CREATE)) { > if (!dev) { > fprintf(stderr, "Not enough information: \"dev\" " > @@ -542,27 +540,12 @@ static int iplink_modify(int cmd, unsigned int flags, int argc, char **argv) > exit(-1); > } > > - req.i.ifi_index = ll_name_to_index(dev); > - if (req.i.ifi_index == 0) { > - fprintf(stderr, "Cannot find device \"%s\"\n", dev); > - return -1; > - } > + name = dev; > } else { > /* Allow "ip link add dev" and "ip link add name" */ > if (!name) > name = dev; > > - if (link) { > - int ifindex; > - > - ifindex = ll_name_to_index(link); > - if (ifindex == 0) { > - fprintf(stderr, "Cannot find device \"%s\"\n", > - link); > - return -1; > - } > - addattr_l(&req.n, sizeof(req), IFLA_LINK, &ifindex, 4); > - } > } > > if (name) { > -- > 1.7.5.4 > -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hello, I re-tested with the patch and got the following results on a 32x 2Ghz core system. # veth add delete 1000 36 34 3000 259 137 4000 462 195 5000 729 N/A The script to create is the following : for i in `seq 1 5000`; do sudo ip link add type veth Done The script to delete: for d in /sys/class/net/veth*; do ip link del `basename $d` 2>/dev/null || true Done There is a very good improvement in deletion. iproute2 does not seems to be well multithread as I get time divided by a factor of 2 with a 8x 3.2 Ghz core system. I don¹t know if that is the improvement you expected ? Would the iproute2 redesign you mentioned help improve performance even further ? As a reference : Iproute2 baseline w/o patch: # veth add delete 1000 57 70 2000 193 250 3000 435 510 4000 752 824 5000 1123 1185 Regards Benoit On 22/03/2013 23:27, "Stephen Hemminger" <stephen@networkplumber.org> wrote: >The whole ifindex map is a design mistake at this point. >Better off to do a lazy cache or something like that. > > >On Fri, Mar 22, 2013 at 3:23 PM, Eric W. Biederman ><ebiederm@xmission.com> wrote: >> >> Because ip link add, set, and delete map the interface name to the >> interface index by dumping all of the interfaces before performing >> their respective commands. Operations that should be constant time >> slow down when lots of network interfaces are in use. Resulting >> in O(N^2) time to work with O(N) devices. >> >> Make the work that iproute does constant time by passing the interface >> name to the kernel instead. >> >> In small scale testing on my system this shows dramatic performance >> increases of ip link add from 120s to just 11s to add 5000 network >> devices. And from longer than I cared to wait to just 58s to delete >> all of those interfaces again. >> >> Cc: Serge Hallyn <serge.hallyn@ubuntu.com> >> Reported-by: Benoit Lourdelet <blourdel@juniper.net> >> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> >> --- >> >> I think I am bungling the case where people specify an ifindex as ifNNNN >> but does anyone care? >> >> ip/iplink.c | 19 +------------------ >> 1 files changed, 1 insertions(+), 18 deletions(-) >> >> diff --git a/ip/iplink.c b/ip/iplink.c >> index ad33611..6dffbf0 100644 >> --- a/ip/iplink.c >> +++ b/ip/iplink.c >> @@ -533,8 +533,6 @@ static int iplink_modify(int cmd, unsigned int >>flags, int argc, char **argv) >> } >> } >> >> - ll_init_map(&rth); >> - >> if (!(flags & NLM_F_CREATE)) { >> if (!dev) { >> fprintf(stderr, "Not enough information: >>\"dev\" " >> @@ -542,27 +540,12 @@ static int iplink_modify(int cmd, unsigned int >>flags, int argc, char **argv) >> exit(-1); >> } >> >> - req.i.ifi_index = ll_name_to_index(dev); >> - if (req.i.ifi_index == 0) { >> - fprintf(stderr, "Cannot find device \"%s\"\n", >>dev); >> - return -1; >> - } >> + name = dev; >> } else { >> /* Allow "ip link add dev" and "ip link add name" */ >> if (!name) >> name = dev; >> >> - if (link) { >> - int ifindex; >> - >> - ifindex = ll_name_to_index(link); >> - if (ifindex == 0) { >> - fprintf(stderr, "Cannot find device >>\"%s\"\n", >> - link); >> - return -1; >> - } >> - addattr_l(&req.n, sizeof(req), IFLA_LINK, >>&ifindex, 4); >> - } >> } >> >> if (name) { >> -- >> 1.7.5.4 >> > -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Benoit Lourdelet <blourdel@juniper.net> writes: > Hello, > > I re-tested with the patch and got the following results on a 32x 2Ghz > core system. > > # veth add delete > 1000 36 34 > 3000 259 137 > 4000 462 195 > 5000 729 N/A > > The script to create is the following : > for i in `seq 1 5000`; do > sudo ip link add type veth > Done Which performs horribly as I mentioned earlier because you are asking the kernel to create the names. If you want performance you need to specify the names of the network devices you are creating. aka ip link add a$i type veth name b$i > The script to delete: > for d in /sys/class/net/veth*; do > ip link del `basename $d` 2>/dev/null || true > Done > > There is a very good improvement in deletion. > > > > iproute2 does not seems to be well multithread as I get time divided by a > factor of 2 with a 8x 3.2 Ghz core system. All netlink traffic and all network stack configuration is serialized by the rtnl_lock in the kernel. This is the slow path in the kernel, not the fast path. > I don¹t know if that is the improvement you expected ? > > Would the iproute2 redesign you mentioned help improve performance even > further ? Specifing the names would dramatically improve your creation performance. It should only take you about 10s for 5000 veth pairs. But you have to specify the names. Anyway I have exhausted my time, and inclination in this matter. Good luck with whatever your problem is. Eric -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Quoting Eric W. Biederman (ebiederm@xmission.com): > Specifing the names would dramatically improve your creation > performance. It should only take you about 10s for 5000 veth pairs. > But you have to specify the names. Thanks, Eric. I'm going to update lxc to always specify names for the veth pairs, rather than only when they are requested by the user's configuration file. -serge -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Actually, lxc is using random names now, so it's ok. Benoit, can you use the patches from Eric with lxc (or use the script you were using before but specify names as he said)? -serge -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, 2013-03-26 at 11:51 +0000, Benoit Lourdelet wrote: > The script to delete: > for d in /sys/class/net/veth*; do > ip link del `basename $d` 2>/dev/null || true > Done > > There is a very good improvement in deletion. I can do better ;) If you are really doing this kind of things, you could use : rmmod veth Note that "ip" command supports a batch mode ip -batch filename In this case, the caching is done only once. Eric, Stephen, one possibility would be to use the cache only in batch mode. Anyway caching is wrong because several users can use ip command at the same time. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hello Serge, I am indeed using Eric patch with lxc. It solves the initial problem of slowness to start around 1600 containers. I am now able to start more than 2000 without having new containers slower and slower to start. thanks Benoit On 26/03/2013 15:33, "Serge Hallyn" <serge.hallyn@ubuntu.com> wrote: >Actually, lxc is using random names now, so it's ok. > >Benoit, can you use the patches from Eric with lxc (or use the script >you were using before but specify names as he said)? > >-serge > -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Benoit Lourdelet <blourdel@juniper.net> writes: > Hello Serge, > > I am indeed using Eric patch with lxc. > > It solves the initial problem of slowness to start around 1600 > containers. Good so now we just need a production ready patch for iproute. > I am now able to start more than 2000 without having new containers > slower and slower to start. May I ask how large a box you are running and how complex your containers are. I am trying to get a feel for how common it is likely to be to find people running thousands of containers on a single machine. Eric -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hello Eric, I am running simple containers (2 network interfaces, 10MB of RAM, default routing) and want to test scalability. Out test platform is a 32x 2Ghz cores x86. Regards Benoit On 27/03/2013 16:11, "Eric W. Biederman" <ebiederm@xmission.com> wrote: >Benoit Lourdelet <blourdel@juniper.net> writes: > >> Hello Serge, >> >> I am indeed using Eric patch with lxc. >> >> It solves the initial problem of slowness to start around 1600 >> containers. > >Good so now we just need a production ready patch for iproute. > >> I am now able to start more than 2000 without having new containers >> slower and slower to start. > >May I ask how large a box you are running and how complex your >containers are. I am trying to get a feel for how common it is likely >to be to find people running thousands of containers on a single >machine. > >Eric > -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/ip/iplink.c b/ip/iplink.c index ad33611..6dffbf0 100644 --- a/ip/iplink.c +++ b/ip/iplink.c @@ -533,8 +533,6 @@ static int iplink_modify(int cmd, unsigned int flags, int argc, char **argv) } } - ll_init_map(&rth); - if (!(flags & NLM_F_CREATE)) { if (!dev) { fprintf(stderr, "Not enough information: \"dev\" " @@ -542,27 +540,12 @@ static int iplink_modify(int cmd, unsigned int flags, int argc, char **argv) exit(-1); } - req.i.ifi_index = ll_name_to_index(dev); - if (req.i.ifi_index == 0) { - fprintf(stderr, "Cannot find device \"%s\"\n", dev); - return -1; - } + name = dev; } else { /* Allow "ip link add dev" and "ip link add name" */ if (!name) name = dev; - if (link) { - int ifindex; - - ifindex = ll_name_to_index(link); - if (ifindex == 0) { - fprintf(stderr, "Cannot find device \"%s\"\n", - link); - return -1; - } - addattr_l(&req.n, sizeof(req), IFLA_LINK, &ifindex, 4); - } } if (name) {
Because ip link add, set, and delete map the interface name to the interface index by dumping all of the interfaces before performing their respective commands. Operations that should be constant time slow down when lots of network interfaces are in use. Resulting in O(N^2) time to work with O(N) devices. Make the work that iproute does constant time by passing the interface name to the kernel instead. In small scale testing on my system this shows dramatic performance increases of ip link add from 120s to just 11s to add 5000 network devices. And from longer than I cared to wait to just 58s to delete all of those interfaces again. Cc: Serge Hallyn <serge.hallyn@ubuntu.com> Reported-by: Benoit Lourdelet <blourdel@juniper.net> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> --- I think I am bungling the case where people specify an ifindex as ifNNNN but does anyone care? ip/iplink.c | 19 +------------------ 1 files changed, 1 insertions(+), 18 deletions(-)