Message ID | 1277790959-28075-1-git-send-email-greearb@candelatech.com |
---|---|
State | Accepted, archived |
Delegated to: | stephen hemminger |
Headers | show |
From: greearb@gmail.com Date: Mon, 28 Jun 2010 22:55:59 -0700 > From: Ben Greear <greearb@candelatech.com> > > The default remains at 10 for backwards compatibility. > > For instance: > # ip addr flush dev eth2 > *** Flush remains incomplete after 10 rounds. *** > # ip -l 20 addr flush dev eth2 > *** Flush remains incomplete after 20 rounds. *** > # ip -loops 0 addr flush dev eth2 > # > > This is useful for getting rid of large numbers of IP > addresses in scripts. > > Signed-off-by: Ben Greear <greearb@candelatech.com> I would suggest to instead add some logic to this code to detect that forward progress is being made. I really don't see any value in having a hard limit that triggers on a bulk delete when no other address changing activity is happening in the system. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 06/28/2010 11:12 PM, David Miller wrote: > From: greearb@gmail.com > Date: Mon, 28 Jun 2010 22:55:59 -0700 > >> From: Ben Greear<greearb@candelatech.com> >> >> The default remains at 10 for backwards compatibility. >> >> For instance: >> # ip addr flush dev eth2 >> *** Flush remains incomplete after 10 rounds. *** >> # ip -l 20 addr flush dev eth2 >> *** Flush remains incomplete after 20 rounds. *** >> # ip -loops 0 addr flush dev eth2 >> # >> >> This is useful for getting rid of large numbers of IP >> addresses in scripts. >> >> Signed-off-by: Ben Greear<greearb@candelatech.com> > > I would suggest to instead add some logic to this code to detect that > forward progress is being made. > > I really don't see any value in having a hard limit that triggers on a > bulk delete when no other address changing activity is happening in > the system. I'm not sure I understand how this loop could have run forever anyway, unless some other process(es) was constantly adding addresses at the same time? Or maybe some ipv6 auto config thing? It appears there is already code to detect when the loop is done (flushing ~70 IPv4 addresses with -l 0 was one of my test cases, and worked as expected). Thanks, Ben
From: Ben Greear <greearb@candelatech.com> Date: Mon, 28 Jun 2010 23:27:39 -0700 > I'm not sure I understand how this loop could have run forever > anyway, unless some other process(es) was constantly adding > addresses at the same time? Or maybe some ipv6 auto config thing? > > It appears there is already code to detect when the loop > is done (flushing ~70 IPv4 addresses with -l 0 was one of my > test cases, and worked as expected). What happens is that we are simply limited by how many addresses we can delete in one go, and that limit is 4096 bytes of netlink message size. So we have to iterate, reusing that buffer each time, to get them all done. The limit exists because meanwhile it is possible that some other entity could add addresses and thus cause us to loop forever and never actually delete all of the addresses because every time we delete a bunch the other entity adds more. I can understand the reasoning behind the limit, because if this is run by something automated it's not like someone is at the command line and hit Ctrl-C to break out of a looping instance. But practically speaking I bet this never happens. So what makes sense to me is: 1) Loop forever by default. 2) When the number of loops exceeds a threshold (calculated by the number of addresses we see the first dump, divided by the number of deletes we can squeeze into the 4096 byte message), we emit a warning. 3) A hard limit, off by default, it available via your "-l" new option. But seriously we can determine forward progress quite easily I think. Each loop, we see if the dump returns a smaller number of addresses than the last iteration. If so, we just keep going. If the number of addresses increases, I think we can bail in this case. This logic would only ever trigger iff another entity is adding a large number of addresses simultaneously with our flush. And frankly speaking the person doing the flush probably doesn't expect that to be happening. You're flushing all of the addresses so you can start with a clean slate and then add specific addresses back, or whatever. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
greearb@gmail.com wrote: > > The default remains at 10 for backwards compatibility. > > For instance: > # ip addr flush dev eth2 > *** Flush remains incomplete after 10 rounds. *** > # ip -l 20 addr flush dev eth2 > *** Flush remains incomplete after 20 rounds. *** > # ip -loops 0 addr flush dev eth2 > # > > This is useful for getting rid of large numbers of IP > addresses in scripts. > Maybe I am missing a trick, but what is wrong with putting this trivial logic into the script: ip addr show ${DEV} | awk '/inet6? / { print $2 }' | xargs -I{} ip addr del '{}' dev ${DEV} You can probably speed things up with '-P' too, '-P 2' gives me a huge huge speed up for the work I do with 'ip route'. If you still have addresses on your interface after the above command, your looping approach probably would have failed also. Why the need to cram more functionality and options into iproute when it is something that can be pushed into the wrapper script? Cheers
Hello all! I'm sorry if I forgot to CC someone in this reply. I'm not subscribed and all list archives seems to be very scared of showing recipiets these days. [...] > > I can understand the reasoning behind the limit, because if this is > run by something automated it's not like someone is at the command > line and hit Ctrl-C to break out of a looping instance. > > But practically speaking I bet this never happens. I'm sorry to bring bad news, but your bet is wrong! There are atleast two different places (IIRC route flush, addr flush) in iproute2 which have these limits because they've been preventing people from booting their systems in the past! I know atleast ubuntu users has been having problems booting their computers because firewalling scripts executed by init use iproute2 commands and expect them to finish. > > So what makes sense to me is: > > 1) Loop forever by default. I think this is a completely insane default. The iproute2 tools are low-level and will be executed by higher level tools. Authors of these higher level tools expect iproute2 commands to always finish. Since iproute2 will *usually* finish, it's hard for these authors that they need to add some switch to always get this behaviour. > > 2) When the number of loops exceeds a threshold (calculated by the > number of addresses we see the first dump, divided by the number > of deletes we can squeeze into the 4096 byte message), we emit > a warning. > > 3) A hard limit, off by default, it available via your "-l" new option. > > But seriously we can determine forward progress quite easily I think. > > Each loop, we see if the dump returns a smaller number of addresses > than the last iteration. If so, we just keep going. This would be a much better solution! > > If the number of addresses increases, I think we can bail in this > case. > > This logic would only ever trigger iff another entity is adding a > large number of addresses simultaneously with our flush. And frankly > speaking the person doing the flush probably doesn't expect that to be > happening. You're flushing all of the addresses so you can start with > a clean slate and then add specific addresses back, or whatever. > How about implementing it in the kernel so iproute2 can tell the kernel via netlink "flush <addresses|routes> on interface X" with a single netlink message? I guess the kernel side has some kind of lock here that will prevent addresses being added and removed at the same time? Please think about this more before re-introducing the same problems again! I guess with modern distributions now using parallell boot the issue will not block the entire bootup anymore, but firewalls being blocked forever by iproute2 and not coming up might be very bad in some circumstances.
On 06/29/2010 02:58 AM, Andreas Henriksson wrote: > Hello all! > > I'm sorry if I forgot to CC someone in this reply. I'm not subscribed > and all list archives seems to be very scared of showing recipiets > these days. > > [...] >> >> I can understand the reasoning behind the limit, because if this is >> run by something automated it's not like someone is at the command >> line and hit Ctrl-C to break out of a looping instance. >> >> But practically speaking I bet this never happens. > > I'm sorry to bring bad news, but your bet is wrong! > > There are atleast two different places (IIRC route flush, addr flush) > in iproute2 which have these limits because they've been preventing > people from booting their systems in the past! I know atleast > ubuntu users has been having problems booting their computers because > firewalling scripts executed by init use iproute2 commands and > expect them to finish. First, my patch doesn't change the default behaviour, so whatever bugs the old hack was working around, the new hack should work around as well. Second, these hacks were put in 4+ years ago..maybe the underlying issues have been resolved since then? If not, it would be good to figure out the root cause and fix it, because having 'ip addr flush' not actually flush all the addresses isn't much better than having it spin forever. >> If the number of addresses increases, I think we can bail in this >> case. >> >> This logic would only ever trigger iff another entity is adding a >> large number of addresses simultaneously with our flush. And frankly >> speaking the person doing the flush probably doesn't expect that to be >> happening. You're flushing all of the addresses so you can start with >> a clean slate and then add specific addresses back, or whatever. >> > > How about implementing it in the kernel so iproute2 can tell the kernel > via netlink "flush<addresses|routes> on interface X" with a single > netlink message? > I guess the kernel side has some kind of lock here that will prevent > addresses being added and removed at the same time? I like this idea. It should help with performance issues with deleting very large numbers of addresses as well. (I'm hoping for 5k+ virtual IPs per interface.) Unfortunately, I generally make a mess when trying to write such kernel patches, but I will see if I can find someone who wants to give it a try. Thanks, Ben
On 06/29/2010 01:03 AM, Alexander Clouter wrote: > greearb@gmail.com wrote: >> >> The default remains at 10 for backwards compatibility. >> >> For instance: >> # ip addr flush dev eth2 >> *** Flush remains incomplete after 10 rounds. *** >> # ip -l 20 addr flush dev eth2 >> *** Flush remains incomplete after 20 rounds. *** >> # ip -loops 0 addr flush dev eth2 >> # >> >> This is useful for getting rid of large numbers of IP >> addresses in scripts. >> > Maybe I am missing a trick, but what is wrong with putting this trivial > logic into the script: > > ip addr show ${DEV} | awk '/inet6? / { print $2 }' | xargs -I{} ip addr del '{}' dev ${DEV} This isn't going to be fast if you have thousands of addresses. > You can probably speed things up with '-P' too, '-P 2' gives me a huge > huge speed up for the work I do with 'ip route'. Where are you using the -P at? It's not a supported option of 'ip' as far as I can tell. > Why the need to cram more functionality and options into iproute when > it is something that can be pushed into the wrapper script? Speed and ease of use. Thanks, Ben
On 06/28/2010 11:36 PM, David Miller wrote: > From: Ben Greear<greearb@candelatech.com> > Date: Mon, 28 Jun 2010 23:27:39 -0700 > >> I'm not sure I understand how this loop could have run forever >> anyway, unless some other process(es) was constantly adding >> addresses at the same time? Or maybe some ipv6 auto config thing? >> >> It appears there is already code to detect when the loop >> is done (flushing ~70 IPv4 addresses with -l 0 was one of my >> test cases, and worked as expected). > > What happens is that we are simply limited by how many addresses > we can delete in one go, and that limit is 4096 bytes of netlink > message size. > > So we have to iterate, reusing that buffer each time, to get them all > done. > > The limit exists because meanwhile it is possible that some other > entity could add addresses and thus cause us to loop forever and > never actually delete all of the addresses because every time we > delete a bunch the other entity adds more. > > I can understand the reasoning behind the limit, because if this is > run by something automated it's not like someone is at the command > line and hit Ctrl-C to break out of a looping instance. > > But practically speaking I bet this never happens. > > So what makes sense to me is: > > 1) Loop forever by default. > > 2) When the number of loops exceeds a threshold (calculated by the > number of addresses we see the first dump, divided by the number > of deletes we can squeeze into the 4096 byte message), we emit > a warning. > > 3) A hard limit, off by default, it available via your "-l" new option. > > But seriously we can determine forward progress quite easily I think. > > Each loop, we see if the dump returns a smaller number of addresses > than the last iteration. If so, we just keep going. > > If the number of addresses increases, I think we can bail in this > case. > > This logic would only ever trigger iff another entity is adding a > large number of addresses simultaneously with our flush. And frankly > speaking the person doing the flush probably doesn't expect that to be > happening. You're flushing all of the addresses so you can start with > a clean slate and then add specific addresses back, or whatever. If I understand your proposal properly, this would seem to be somewhat O(N^2) if we have large numbers of addresses, and I'm hoping to support thousands of IPs with decent performance. What do you think about improving the kernel side so that we can send a single netlink msg to delete all addresses on an interface, and just let the kernel do the looping/locking needed to make it happen? Thanks, Ben
Ben Greear <greearb@candelatech.com> wrote: > >>> This is useful for getting rid of large numbers of IP >>> addresses in scripts. >>> >> Maybe I am missing a trick, but what is wrong with putting this trivial >> logic into the script: >> >> ip addr show ${DEV} | awk '/inet6? / { print $2 }' | xargs -I{} ip addr del '{}' dev ${DEV} > > This isn't going to be fast if you have thousands of addresses. > Obviously not like-for-like but: ---- maru:/home/alex# cat /proc/cpuinfo Processor : Feroceon 88FR131 rev 1 (v5l) BogoMIPS : 1192.75 Features : swp half thumb fastmult edsp CPU implementer : 0x56 CPU architecture: 5TE CPU variant : 0x2 CPU part : 0x131 CPU revision : 1 Hardware : Marvell SheevaPlug Reference Board Revision : 0000 Serial : 0000000000000000 maru:/home/alex# ip route show realm filthpit | wc 8464 76176 533812 maru:/home/alex# time ip route list realm filthpit | xargs -I{} sh -c 'ip route del {}' real 1m7.590s user 0m0.650s sys 0m3.010s ---- >> You can probably speed things up with '-P' too, '-P 2' gives me a huge >> huge speed up for the work I do with 'ip route'. > > Where are you using the -P at? It's not a supported option of 'ip' > as far as I can tell. > xargs, it works well on SMP systems, on my SheevaPlug you do not get much gain. >> Why the need to cram more functionality and options into iproute when >> it is something that can be pushed into the wrapper script? > > Speed and ease of use. > Annoying 'ip addr show dev dummy0' craps out after 54 assignments in it's output (bug?), however ifconfig works so: ---- truffle:/home/ac56# cat /proc/cpuinfo processor : 0 vendor_id : GenuineIntel cpu family : 15 model : 4 model name : Intel(R) Xeon(TM) CPU 2.80GHz stepping : 3 cpu MHz : 2793.177 cache size : 2048 KB physical id : 0 siblings : 2 core id : 0 cpu cores : 1 apicid : 0 initial apicid : 0 fpu : yes fpu_exception : yes cpuid level : 5 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc pebs bts pni monitor ds_cpl cid cx16 xtpr bogomips : 5591.21 clflush size : 64 cache_alignment : 128 address sizes : 36 bits physical, 48 bits virtual power management: [snipped other real and two ht cpus] truffle:/home/ac56# time for I in $(seq 1 4085); do ip -6 addr add fe80::$(printf "%x" ${I})/64 dev dummy0; done real 0m17.013s user 0m4.004s sys 0m11.417s truffle:/home/ac56# ifconfig dummy0 | awk '/inet6? / { print $3 }' | wc 4085 4085 52835 truffle:/home/ac56# time ifconfig dummy0 | awk '/inet6? / { print $3 }' | xargs -I{} ip addr del {} dev dummy0 real 0m5.897s user 0m1.272s sys 0m4.456s # no idea why but, only 24 entries, re-running the above cleans up # properly in 4ms truffle:/home/ac56# ifconfig dummy0 | awk '/inet6? / { print $3 }' | wc 24 24 310 # from the top again (but for '-P2' action): truffle:/home/ac56# time ifconfig dummy0 | awk '/inet6? / { print $3 }' | xargs -I{} -P2 ip addr del {} dev dummy0 real 0m4.013s user 0m1.268s sys 0m5.072s truffle:/home/ac56# ifconfig dummy0 | awk '/inet6? / { print $3 }' | wc 40 40 516 ---- Apart from the cleanup glitches in both cases, things are rather fast already? Probably worth focusing at the 'ip (route|addr) add' slowness? Bah, just my £0.02. Cheers [1] and anything more than 4085 for the sequence gives 'RTNETLINK answers: Cannot allocate memory'
On 06/29/2010 08:48 AM, Alexander Clouter wrote: > Ben Greear<greearb@candelatech.com> wrote: >> >>>> This is useful for getting rid of large numbers of IP >>>> addresses in scripts. >>>> >>> Maybe I am missing a trick, but what is wrong with putting this trivial >>> logic into the script: >>> >>> ip addr show ${DEV} | awk '/inet6? / { print $2 }' | xargs -I{} ip addr del '{}' dev ${DEV} >> >> This isn't going to be fast if you have thousands of addresses. >> > Obviously not like-for-like but: You might try comparing against -batch mode v/s individual calls to ip. I'd enjoy adding/removing thousands of IP addrs in < 1 second, but got a bit of work to do before I get there I think. Thanks, Ben
From: Ben Greear <greearb@candelatech.com> Date: Tue, 29 Jun 2010 08:10:43 -0700 > If I understand your proposal properly, this would seem to be > somewhat O(N^2) if we have large numbers of addresses, and I'm > hoping to support thousands of IPs with decent performance. I am not modifying the computational complexity or cost of the operation at all. I'm just changing under what circumstances it gives up. > What do you think about improving the kernel side so that we can send > a single netlink msg to delete all addresses on an interface, and just > let the kernel do the looping/locking needed to make it happen? I think the current scheme isn't that bad. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, 28 Jun 2010 22:55:59 -0700 greearb@gmail.com wrote: > From: Ben Greear <greearb@candelatech.com> > > The default remains at 10 for backwards compatibility. > > For instance: > # ip addr flush dev eth2 > *** Flush remains incomplete after 10 rounds. *** > # ip -l 20 addr flush dev eth2 > *** Flush remains incomplete after 20 rounds. *** > # ip -loops 0 addr flush dev eth2 > # > > This is useful for getting rid of large numbers of IP > addresses in scripts. > > Signed-off-by: Ben Greear <greearb@candelatech.com> The current userspace driven flush behavior is crap. There really should be an atomic flush netlink message with kernel support. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 08/04/2010 02:00 PM, Stephen Hemminger wrote: > On Mon, 28 Jun 2010 22:55:59 -0700 > greearb@gmail.com wrote: > >> From: Ben Greear<greearb@candelatech.com> >> >> The default remains at 10 for backwards compatibility. >> >> For instance: >> # ip addr flush dev eth2 >> *** Flush remains incomplete after 10 rounds. *** >> # ip -l 20 addr flush dev eth2 >> *** Flush remains incomplete after 20 rounds. *** >> # ip -loops 0 addr flush dev eth2 >> # >> >> This is useful for getting rid of large numbers of IP >> addresses in scripts. >> >> Signed-off-by: Ben Greear<greearb@candelatech.com> > > The current userspace driven flush behavior is crap. > There really should be an atomic flush netlink message > with kernel support. No argument from me, but I've no time to figure out how to do that properly. Thanks, Ben
diff --git a/include/utils.h b/include/utils.h index f7ef939..3da6998 100644 --- a/include/utils.h +++ b/include/utils.h @@ -17,6 +17,7 @@ extern int resolve_hosts; extern int oneline; extern int timestamp; extern char * _SL_; +extern int max_flush_loops; #ifndef IPPROTO_ESP #define IPPROTO_ESP 50 diff --git a/ip/ip.c b/ip/ip.c index 9f29533..b127d57 100644 --- a/ip/ip.c +++ b/ip/ip.c @@ -32,6 +32,8 @@ int timestamp = 0; char * _SL_ = NULL; char *batch_file = NULL; int force = 0; +int max_flush_loops = 10; + struct rtnl_handle rth = { .fd = -1 }; static void usage(void) __attribute__((noreturn)); @@ -45,6 +47,7 @@ static void usage(void) " tunnel | tuntap | maddr | mroute | mrule | monitor | xfrm }\n" " OPTIONS := { -V[ersion] | -s[tatistics] | -d[etails] | -r[esolve] |\n" " -f[amily] { inet | inet6 | ipx | dnet | link } |\n" +" -l[oops] { maximum-addr-flush-attempts } |\n" " -o[neline] | -t[imestamp] | -b[atch] [filename] |\n" " -rc[vbuf] [size]}\n"); exit(-1); @@ -157,7 +160,13 @@ int main(int argc, char **argv) break; if (opt[1] == '-') opt++; - if (matches(opt, "-family") == 0) { + if (matches(opt, "-loops") == 0) { + argc--; + argv++; + if (argc <= 1) + usage(); + max_flush_loops = atoi(argv[1]); + } else if (matches(opt, "-family") == 0) { argc--; argv++; if (argc <= 1) diff --git a/ip/ipaddress.c b/ip/ipaddress.c index 3a411b1..5f0789c 100644 --- a/ip/ipaddress.c +++ b/ip/ipaddress.c @@ -33,7 +33,6 @@ #include "ll_map.h" #include "ip_common.h" -#define MAX_ROUNDS 10 static struct { @@ -818,7 +817,7 @@ static int ipaddr_list_or_flush(int argc, char **argv, int flush) filter.flushp = 0; filter.flushe = sizeof(flushb); - while (round < MAX_ROUNDS) { + while ((max_flush_loops == 0) || (round < max_flush_loops)) { const struct rtnl_dump_filter_arg a[3] = { { .filter = print_addrinfo_secondary, @@ -867,7 +866,8 @@ static int ipaddr_list_or_flush(int argc, char **argv, int flush) fflush(stdout); } } - fprintf(stderr, "*** Flush remains incomplete after %d rounds. ***\n", MAX_ROUNDS); fflush(stderr); + fprintf(stderr, "*** Flush remains incomplete after %d rounds. ***\n", max_flush_loops); + fflush(stderr); return 1; } diff --git a/man/man8/ip.8 b/man/man8/ip.8 index 1a73efa..d0146a5 100644 --- a/man/man8/ip.8 +++ b/man/man8/ip.8 @@ -730,6 +730,12 @@ appears twice or more, the amount of information increases. As a rule, the information is statistics or some time values. .TP +.BR "\-l" , " \-loops" +Specify maximum number of loops the 'ip addr flush' logic +will attempt before giving up. The default is 10. +Zero (0) means loop until all addresses are removed. + +.TP .BR "\-f" , " \-family" followed by protocol family identifier: .BR "inet" , " inet6"