| Message ID | 1264721088.10385.1.camel@jschopp-laptop (mailing list archive) |
|---|---|
| State | Rejected |
On Thu, 2010-01-28 at 17:24 -0600, Joel Schopp wrote:
> On Power7 processors running in SMT4 mode with 2, 3, or 4 idle threads
> there is performance benefit to idling the higher numbered threads in
> the core.
>
> This patch implements arch_scale_smt_power to dynamically update smt
> thread power in these idle cases in order to prefer threads 0,1 over
> threads 2,3 within a core.

Almost there :-) Joel, Peter, can you help me figure something out tho ?

On machines that don't have SMT, I would like to avoid calling
arch_scale_smt_power() at all if possible (in addition to not compiling
it in if SMT is not enabled in .config).

Now, I must say I'm utterly confused by how the domains are set up and I
haven't quite managed to sort it out... it looks to me that
SD_SHARE_CPUPOWER is always going to be set on all CPUs when the config
option is set (though each CPU will have its own domain), or am I
misguided? IE. is there any sense in having at least a fast exit path
out of arch_scale_smt_power() for non-SMT CPUs?

Joel, can you look at compiling it out when SMT is not set? We don't
want to bloat SMP kernels for 32-bit non-SMT embedded platforms.

Oh, and one minor nit:

> Signed-off-by: Joel Schopp <jschopp@austin.ibm.com>
> ---
> Version 2 addresses style and optimization, same basic functionality
> Version 3 adds a comment, resent due to mailing format error
> Index: linux-2.6.git/arch/powerpc/kernel/smp.c
> ===================================================================
> --- linux-2.6.git.orig/arch/powerpc/kernel/smp.c
> +++ linux-2.6.git/arch/powerpc/kernel/smp.c
> @@ -620,3 +620,59 @@ void __cpu_die(unsigned int cpu)
>  	smp_ops->cpu_die(cpu);
>  }
>  #endif

^^^ Please add the /* CONFIG_CPU_HOTPLUG */ (or whatever it is) that's
missing after that #endif :-)

Cheers,
Ben.
On Fri, 2010-01-29 at 12:23 +1100, Benjamin Herrenschmidt wrote:
> On machines that don't have SMT, I would like to avoid calling
> arch_scale_smt_power() at all if possible (in addition to not compiling
> it in if SMT is not enabled in .config).
>
> Now, I must say I'm utterly confused by how the domains are set up and I
> haven't quite managed to sort it out... it looks to me that
> SD_SHARE_CPUPOWER is always going to be set on all CPUs when the config
> option is set (though each CPU will have its own domain), or am I
> misguided? IE. is there any sense in having at least a fast exit path
> out of arch_scale_smt_power() for non-SMT CPUs?

The sched_domain creation code is a f'ing stink pile that hurts
everybody's brain. The AMD magny-cours people sort of cleaned it up a
bit but didn't go nearly far enough. Doing so is somewhere on my todo
list, but sadly that thing is way larger than my spare time.

Now SD_SHARE_CPUPOWER _should_ only be set for SMT domains, because only
SMT siblings share cpupower.

SD_SHARE_PKG_RESOURCES _should_ be set for both SMT and MC, because they
all share the same cache domain.

Whether it all works out that way in practice on powerpc is another
question entirely ;-)

That said, I'm still not entirely convinced I like this usage of
cpupower; it's supposed to be a normalization scale for load-balancing,
not a placement hook. I'd be much happier with a SD_GROUP_ORDER or
something like that, that works together with SD_PREFER_SIBLING to pack
active tasks to cpus in ascending group order.
> That said, I'm still not entirely convinced I like this usage of
> cpupower; it's supposed to be a normalization scale for load-balancing,
> not a placement hook.

Even if you do a placement hook you'll need to address it in the load
balancing as well. Consider a single 4-thread SMT core with 4 running
tasks. If 2 of them exit, the remaining 2 will need to be load balanced
within the core in a way that takes into account the dynamic nature of
the thread power. This patch does that.

> I'd be much happier with a SD_GROUP_ORDER or something like that, that
> works together with SD_PREFER_SIBLING to pack active tasks to cpus in
> ascending group order.

I don't see this load-balancing patch as mutually exclusive with a patch
to fix placement. But even if it is a mutually exclusive solution, there
is no reason we can't fix things now with this patch and then later take
it out when it's fixed another way. This patch series is
straightforward, non-intrusive, and without it the scheduler is broken
on this processor.
Benjamin Herrenschmidt wrote:
> On Thu, 2010-01-28 at 17:24 -0600, Joel Schopp wrote:
>> On Power7 processors running in SMT4 mode with 2, 3, or 4 idle threads
>> there is performance benefit to idling the higher numbered threads in
>> the core.
>>
>> This patch implements arch_scale_smt_power to dynamically update smt
>> thread power in these idle cases in order to prefer threads 0,1 over
>> threads 2,3 within a core.
>
> Almost there :-) Joel, Peter, can you help me figure something out tho ?
>
> On machines that don't have SMT, I would like to avoid calling
> arch_scale_smt_power() at all if possible (in addition to not compiling
> it in if SMT is not enabled in .config).
>
> Now, I must say I'm utterly confused by how the domains are set up and I
> haven't quite managed to sort it out... it looks to me that
> SD_SHARE_CPUPOWER is always going to be set on all CPUs when the config
> option is set (though each CPU will have its own domain), or am I
> misguided? IE. is there any sense in having at least a fast exit path
> out of arch_scale_smt_power() for non-SMT CPUs?
>
> Joel, can you look at compiling it out when SMT is not set? We don't
> want to bloat SMP kernels for 32-bit non-SMT embedded platforms.

I can wrap the powerpc definition of arch_scale_smt_power in an #ifdef;
if it's not there the scheduler uses the default, which is the same as
it uses if SMT isn't compiled in.
Index: linux-2.6.git/arch/powerpc/kernel/smp.c
===================================================================
--- linux-2.6.git.orig/arch/powerpc/kernel/smp.c
+++ linux-2.6.git/arch/powerpc/kernel/smp.c
@@ -620,3 +620,59 @@ void __cpu_die(unsigned int cpu)
 	smp_ops->cpu_die(cpu);
 }
 #endif
+
+unsigned long arch_scale_smt_power(struct sched_domain *sd, int cpu)
+{
+	int sibling;
+	int idle_count = 0;
+	int thread;
+
+	/* Setup the default weight and smt_gain used by most cpus for SMT
+	 * Power.  Doing this right away covers the default case and can be
+	 * used by cpus that modify it dynamically.
+	 */
+	struct cpumask *sibling_map = sched_domain_span(sd);
+	unsigned long weight = cpumask_weight(sibling_map);
+	unsigned long smt_gain = sd->smt_gain;
+
+	if (cpu_has_feature(CPU_FTR_ASYNC_SMT4) && weight == 4) {
+		for_each_cpu(sibling, sibling_map) {
+			if (idle_cpu(sibling))
+				idle_count++;
+		}
+
+		/* the following section attempts to tweak cpu power based
+		 * on current idleness of the threads dynamically at runtime
+		 */
+		if (idle_count > 1) {
+			thread = cpu_thread_in_core(cpu);
+			if (thread < 2) {
+				/* add 75 % to thread power */
+				smt_gain += (smt_gain >> 1) + (smt_gain >> 2);
+			} else {
+				/* subtract 75 % from thread power */
+				smt_gain = smt_gain >> 2;
+			}
+		}
+	}
+
+	/* default smt gain is 1178, weight is # of SMT threads */
+	switch (weight) {
+	case 1:
+		/* divide by 1, do nothing */
+		break;
+	case 2:
+		smt_gain = smt_gain >> 1;
+		break;
+	case 4:
+		smt_gain = smt_gain >> 2;
+		break;
+	default:
+		smt_gain /= weight;
+		break;
+	}
+
+	return smt_gain;
+}
Index: linux-2.6.git/arch/powerpc/include/asm/cputable.h
===================================================================
--- linux-2.6.git.orig/arch/powerpc/include/asm/cputable.h
+++ linux-2.6.git/arch/powerpc/include/asm/cputable.h
@@ -195,6 +195,7 @@ extern const char *powerpc_base_platform
 #define CPU_FTR_SAO			LONG_ASM_CONST(0x0020000000000000)
 #define CPU_FTR_CP_USE_DCBTZ		LONG_ASM_CONST(0x0040000000000000)
 #define CPU_FTR_UNALIGNED_LD_STD	LONG_ASM_CONST(0x0080000000000000)
+#define CPU_FTR_ASYNC_SMT4		LONG_ASM_CONST(0x0100000000000000)
 
 #ifndef __ASSEMBLY__
 
@@ -409,7 +410,7 @@ extern const char *powerpc_base_platform
 		CPU_FTR_MMCRA | CPU_FTR_SMT | \
 		CPU_FTR_COHERENT_ICACHE | CPU_FTR_LOCKLESS_TLBIE | \
 		CPU_FTR_PURR | CPU_FTR_SPURR | CPU_FTR_REAL_LE | \
-		CPU_FTR_DSCR | CPU_FTR_SAO)
+		CPU_FTR_DSCR | CPU_FTR_SAO | CPU_FTR_ASYNC_SMT4)
 #define CPU_FTRS_CELL	(CPU_FTR_USE_TB | CPU_FTR_LWSYNC | \
 		CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | \
 		CPU_FTR_ALTIVEC_COMP | CPU_FTR_MMCRA | CPU_FTR_SMT | \
On Power7 processors running in SMT4 mode with 2, 3, or 4 idle threads
there is performance benefit to idling the higher numbered threads in
the core.

This patch implements arch_scale_smt_power to dynamically update smt
thread power in these idle cases in order to prefer threads 0,1 over
threads 2,3 within a core.

Signed-off-by: Joel Schopp <jschopp@austin.ibm.com>
---
Version 2 addresses style and optimization, same basic functionality
Version 3 adds a comment, resent due to mailing format error