Message ID | 1440489266-31127-3-git-send-email-raghavendra.kt@linux.vnet.ibm.com |
---|---|
State | RFC, archived |
Delegated to: | David Miller |
On Tue, 2015-08-25 at 13:24 +0530, Raghavendra K T wrote:
> Docker container creation linearly increased from around 1.6 sec to 7.5 sec
> (at 1000 containers) and perf data showed 50% overhead in snmp_fold_field.
>
> reason: currently __snmp6_fill_stats64 calls snmp_fold_field that walks
> through per cpu data of an item (iteratively for around 90 items).
>
> idea: This patch tries to aggregate the statistics by going through
> all the items of each cpu sequentially which is reducing cache
> misses.
>
> Docker creation got faster by more than 2x after the patch.
>
> Result:
>                         Before    After
> Docker creation time    6.836s    3.357s
> cache miss              2.7%      1.38%
>
> perf before:
>     50.73%  docker   [kernel.kallsyms]  [k] snmp_fold_field
>      9.07%  swapper  [kernel.kallsyms]  [k] snooze_loop
>      3.49%  docker   [kernel.kallsyms]  [k] veth_stats_one
>      2.85%  swapper  [kernel.kallsyms]  [k] _raw_spin_lock
>
> perf after:
>     10.56%  swapper  [kernel.kallsyms]  [k] snooze_loop
>      8.72%  docker   [kernel.kallsyms]  [k] snmp_get_cpu_field
>      7.59%  docker   [kernel.kallsyms]  [k] veth_stats_one
>      3.65%  swapper  [kernel.kallsyms]  [k] _raw_spin_lock
>
> Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
> ---
>  net/ipv6/addrconf.c | 14 +++++++++++---
>  1 file changed, 11 insertions(+), 3 deletions(-)
>
> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
> index 21c2c81..2ec905f 100644
> --- a/net/ipv6/addrconf.c
> +++ b/net/ipv6/addrconf.c
> @@ -4624,16 +4624,24 @@ static inline void __snmp6_fill_statsdev(u64 *stats, atomic_long_t *mib,
>  }
>
>  static inline void __snmp6_fill_stats64(u64 *stats, void __percpu *mib,
> -				      int items, int bytes, size_t syncpoff)
> +					int items, int bytes, size_t syncpoff)
>  {
> -	int i;
> +	int i, c;
> +	u64 *tmp;
>  	int pad = bytes - sizeof(u64) * items;
>  	BUG_ON(pad < 0);
>
> +	tmp = kcalloc(items, sizeof(u64), GFP_KERNEL);
> +

This is a great idea, but kcalloc()/kmalloc() can fail and you'll crash
the whole kernel at this point.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

On 08/25/2015 07:58 PM, Eric Dumazet wrote:
> On Tue, 2015-08-25 at 13:24 +0530, Raghavendra K T wrote:
>> Docker container creation linearly increased from around 1.6 sec to 7.5 sec
>> (at 1000 containers) and perf data showed 50% overhead in snmp_fold_field.
>>
>> reason: currently __snmp6_fill_stats64 calls snmp_fold_field that walks
>> through per cpu data of an item (iteratively for around 90 items).
>>
>> idea: This patch tries to aggregate the statistics by going through
>> all the items of each cpu sequentially which is reducing cache
>> misses.
>>
>> Docker creation got faster by more than 2x after the patch.
>>
>> Result:
>>                         Before    After
>> Docker creation time    6.836s    3.357s
>> cache miss              2.7%      1.38%
>>
>> perf before:
>>     50.73%  docker   [kernel.kallsyms]  [k] snmp_fold_field
>>      9.07%  swapper  [kernel.kallsyms]  [k] snooze_loop
>>      3.49%  docker   [kernel.kallsyms]  [k] veth_stats_one
>>      2.85%  swapper  [kernel.kallsyms]  [k] _raw_spin_lock
>>
>> perf after:
>>     10.56%  swapper  [kernel.kallsyms]  [k] snooze_loop
>>      8.72%  docker   [kernel.kallsyms]  [k] snmp_get_cpu_field
>>      7.59%  docker   [kernel.kallsyms]  [k] veth_stats_one
>>      3.65%  swapper  [kernel.kallsyms]  [k] _raw_spin_lock
>>
>> Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
>> ---
>>  net/ipv6/addrconf.c | 14 +++++++++++---
>>  1 file changed, 11 insertions(+), 3 deletions(-)
>>
>> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
>> index 21c2c81..2ec905f 100644
>> --- a/net/ipv6/addrconf.c
>> +++ b/net/ipv6/addrconf.c
>> @@ -4624,16 +4624,24 @@ static inline void __snmp6_fill_statsdev(u64 *stats, atomic_long_t *mib,
>>  }
>>
>>  static inline void __snmp6_fill_stats64(u64 *stats, void __percpu *mib,
>> -				      int items, int bytes, size_t syncpoff)
>> +					int items, int bytes, size_t syncpoff)
>>  {
>> -	int i;
>> +	int i, c;
>> +	u64 *tmp;
>>  	int pad = bytes - sizeof(u64) * items;
>>  	BUG_ON(pad < 0);
>>
>> +	tmp = kcalloc(items, sizeof(u64), GFP_KERNEL);
>> +
>
>
> This is a great idea, but kcalloc()/kmalloc() can fail and you'll crash
> the whole kernel at this point.
>

Good catch, and my bad. Though system is in bad memory condition,
since fill_stat is not critical for the system do you think silently
returning from here is a good idea?
or do you think we should handle with -ENOMEM way up.?

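The second option in the question above, reporting -ENOMEM to the caller rather than dereferencing a NULL buffer, can be sketched in user space. This is only an illustration under stated assumptions, not the kernel code: `fill_stats` and its parameters are hypothetical names, `calloc` stands in for `kcalloc`, and a flat `percpu` array stands in for the per-CPU MIB.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical user-space stand-in for the patched function.
 * percpu holds ncpus rows of `items` counters each.
 * On success, stats[0] receives the item count and stats[1..items-1]
 * receive the per-item sums across all rows.
 * Returns 0 on success, -EINVAL for a bad item count, or -ENOMEM if
 * the scratch buffer cannot be allocated (stats[] is then untouched).
 */
int fill_stats(uint64_t *stats, const uint64_t *percpu, int ncpus, int items)
{
	uint64_t *tmp;
	int c, i;

	if (items < 1)
		return -EINVAL;

	/* calloc stands in for kcalloc here (assumption of this sketch). */
	tmp = calloc((size_t)items, sizeof(*tmp));
	if (!tmp)
		return -ENOMEM;	/* propagate the failure instead of crashing */

	for (c = 0; c < ncpus; c++)
		for (i = 1; i < items; i++)
			tmp[i] += percpu[(size_t)c * (size_t)items + (size_t)i];

	stats[0] = (uint64_t)items;
	memcpy(&stats[1], &tmp[1], sizeof(*tmp) * (size_t)(items - 1));

	free(tmp);
	return 0;
}
```

With an error return like this, every caller up the chain has to check and forward the value, which is the cost the question is weighing against silently returning.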
On Tue, 2015-08-25 at 21:17 +0530, Raghavendra K T wrote:
> On 08/25/2015 07:58 PM, Eric Dumazet wrote:
> >
> >
> > This is a great idea, but kcalloc()/kmalloc() can fail and you'll crash
> > the whole kernel at this point.
> >
>
> Good catch, and my bad. Though system is in bad memory condition,
> since fill_stat is not critical for the system do you think silently
> returning from here is a good idea?
> or do you think we should handle with -ENOMEM way up.?

Hmm... presumably these 288 bytes could be allocated in
inet6_fill_ifla6_attrs() stack frame.

Also it is weird we fill all these stats for a device we just created
and never enabled : initial stats are all 0 for them.

On 08/25/2015 09:30 PM, Eric Dumazet wrote:
> On Tue, 2015-08-25 at 21:17 +0530, Raghavendra K T wrote:
>> On 08/25/2015 07:58 PM, Eric Dumazet wrote:
>
>>>
>>>
>>> This is a great idea, but kcalloc()/kmalloc() can fail and you'll crash
>>> the whole kernel at this point.
>>>
>>
>> Good catch, and my bad. Though system is in bad memory condition,
>> since fill_stat is not critical for the system do you think silently
>> returning from here is a good idea?
>> or do you think we should handle with -ENOMEM way up.?
>
> Hmm... presumably these 288 bytes could be allocated in
> inet6_fill_ifla6_attrs() stack frame.
>
> Also it is weird we fill all these stats for a device we just created
> and never enabled : initial stats are all 0 for them.
>

Yes it is.. Initially I was even thinking

1. if we could disable the stat filling just after creation (only
allocate the space for statistics but do not fill).

2. should we have a PROC_FS_NET_SNMP config which we can disable if not
necessary.

3. should we defer this snmp_fold_walk to a workqueue. (unfortunately
there is not much to do after this stat filling which can run in
parallel before we wait for completion.. or may be there is a way).

On 08/25/2015 09:30 PM, Eric Dumazet wrote:
> On Tue, 2015-08-25 at 21:17 +0530, Raghavendra K T wrote:
>> On 08/25/2015 07:58 PM, Eric Dumazet wrote:
>
>>>
>>>
>>> This is a great idea, but kcalloc()/kmalloc() can fail and you'll crash
>>> the whole kernel at this point.
>>>
>>
>> Good catch, and my bad. Though system is in bad memory condition,
>> since fill_stat is not critical for the system do you think silently
>> returning from here is a good idea?
>> or do you think we should handle with -ENOMEM way up.?
>
> Hmm... presumably these 288 bytes could be allocated in
> inet6_fill_ifla6_attrs() stack frame.

Correct, since we need to allocate for IPSTATS_MIB_MAX, we could do this
in even snmp6_fill_stats() stack frame.

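The stack-frame suggestion can be sketched the same way in user space. The names here are hypothetical, and `STATS_MAX` is an assumption derived only from the 288-byte figure quoted above (288 / 8 = 36 slots); it is not taken from the kernel headers. The point of the sketch is that a fixed-size local array removes the allocation, and with it the failure path.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: 36 slots, i.e. the 288 bytes quoted above / sizeof(uint64_t). */
#define STATS_MAX 36

/*
 * Hypothetical stand-in: sums ncpus rows of `items` counters into
 * stats[1..items-1] using a scratch array that lives in this
 * function's stack frame, so no allocation can fail.
 * Returns 0 on success, -1 if items does not fit the scratch array.
 */
int fill_stats_on_stack(uint64_t *stats, const uint64_t *percpu,
			int ncpus, int items)
{
	uint64_t buff[STATS_MAX];
	int c, i;

	if (items < 1 || items > STATS_MAX)
		return -1;

	memset(buff, 0, sizeof(buff));

	for (c = 0; c < ncpus; c++)
		for (i = 1; i < items; i++)
			buff[i] += percpu[(size_t)c * (size_t)items + (size_t)i];

	stats[0] = (uint64_t)items;	/* slot 0 carries the item count */
	for (i = 1; i < items; i++)
		stats[i] = buff[i];

	return 0;
}
```

The trade-off is stack usage: a few hundred bytes is modest, but the bound has to be a compile-time constant that covers the largest item count passed in.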
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 21c2c81..2ec905f 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -4624,16 +4624,24 @@ static inline void __snmp6_fill_statsdev(u64 *stats, atomic_long_t *mib,
 }
 
 static inline void __snmp6_fill_stats64(u64 *stats, void __percpu *mib,
-				      int items, int bytes, size_t syncpoff)
+					int items, int bytes, size_t syncpoff)
 {
-	int i;
+	int i, c;
+	u64 *tmp;
 	int pad = bytes - sizeof(u64) * items;
 	BUG_ON(pad < 0);
 
+	tmp = kcalloc(items, sizeof(u64), GFP_KERNEL);
+
 	/* Use put_unaligned() because stats may not be aligned for u64. */
 	put_unaligned(items, &stats[0]);
+
+	for_each_possible_cpu(c)
+		for (i = 1; i < items; i++)
+			tmp[i] += snmp_get_cpu_field64(mib, c, i, syncpoff);
+
 	for (i = 1; i < items; i++)
-		put_unaligned(snmp_fold_field64(mib, i, syncpoff), &stats[i]);
+		put_unaligned(tmp[i], &stats[i]);
 
 	memset(&stats[items], 0, pad);
 }

Docker container creation linearly increased from around 1.6 sec to 7.5 sec
(at 1000 containers) and perf data showed 50% overhead in snmp_fold_field.

reason: currently __snmp6_fill_stats64 calls snmp_fold_field that walks
through per cpu data of an item (iteratively for around 90 items).

idea: This patch tries to aggregate the statistics by going through
all the items of each cpu sequentially which is reducing cache
misses.

Docker creation got faster by more than 2x after the patch.

Result:
                        Before    After
Docker creation time    6.836s    3.357s
cache miss              2.7%      1.38%

perf before:
    50.73%  docker   [kernel.kallsyms]  [k] snmp_fold_field
     9.07%  swapper  [kernel.kallsyms]  [k] snooze_loop
     3.49%  docker   [kernel.kallsyms]  [k] veth_stats_one
     2.85%  swapper  [kernel.kallsyms]  [k] _raw_spin_lock

perf after:
    10.56%  swapper  [kernel.kallsyms]  [k] snooze_loop
     8.72%  docker   [kernel.kallsyms]  [k] snmp_get_cpu_field
     7.59%  docker   [kernel.kallsyms]  [k] veth_stats_one
     3.65%  swapper  [kernel.kallsyms]  [k] _raw_spin_lock

Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
---
 net/ipv6/addrconf.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)
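
The two traversal orders described in the "reason" and "idea" paragraphs can be compared in a small user-space sketch. Everything here is an assumption made for illustration: `NCPUS`, `NITEMS`, and both function names are hypothetical, and a flat array with one row per CPU stands in for the per-CPU data. The sketch only shows that both orders produce the same sums; it does not measure cache behaviour.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NCPUS  4	/* hypothetical CPU count */
#define NITEMS 8	/* hypothetical number of counters per CPU */

/*
 * Item-major order: for each item, visit every CPU's row. Each inner
 * step jumps to a different row, the access pattern the commit
 * message attributes to calling snmp_fold_field per item.
 */
void fold_item_major(const uint64_t *percpu, uint64_t *out)
{
	for (int i = 0; i < NITEMS; i++) {
		uint64_t sum = 0;

		for (int c = 0; c < NCPUS; c++)
			sum += percpu[c * NITEMS + i];
		out[i] = sum;
	}
}

/*
 * CPU-major order: walk each CPU's row once, front to back, and
 * accumulate into out[]. This is the order the patch switches to.
 */
void fold_cpu_major(const uint64_t *percpu, uint64_t *out)
{
	memset(out, 0, sizeof(uint64_t) * NITEMS);

	for (int c = 0; c < NCPUS; c++)
		for (int i = 0; i < NITEMS; i++)
			out[i] += percpu[c * NITEMS + i];
}
```

In the CPU-major version the inner loop reads consecutive memory within one row, which is the property the commit message credits for the lower cache-miss rate.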