From patchwork Mon Jun 7 15:06:51 2010
Date: Mon, 7 Jun 2010 20:36:51 +0530
From: Srivatsa Vaddagiri
Reply-To: vatsa@in.ibm.com
To: Peter Zijlstra
Cc: Michael Neuling, Suresh Siddha, Gautham R Shenoy, linux-kernel@vger.kernel.org,
    linuxppc-dev@ozlabs.org, Ingo Molnar
Subject: Re: [PATCH 1/5] sched: fix capacity calculations for SMT4
Message-ID: <20100607150651.GA13993@linux.vnet.ibm.com>
References: <20100409062118.D4096CBB6C@localhost.localdomain>
 <1271161766.4807.1280.camel@twins>
 <2906.1271219317@neuling.org>
 <1271426308.1674.429.camel@laptop>
 <1275294796.27810.21554.camel@twins>
In-Reply-To: <1275294796.27810.21554.camel@twins>

On Mon, May 31, 2010 at 10:33:16AM +0200, Peter Zijlstra wrote:
> On Fri, 2010-04-16 at 15:58 +0200, Peter Zijlstra wrote:
> >
> > Hrmm, my brain seems muddled but I might have another solution, let me
> > ponder this for a bit..
>
> Right, so the thing I was thinking about is taking the group capacity
> into account when determining the capacity for a single cpu.

Peter,

We are exploring an alternate solution which seems to be working as
expected. Basically, allow a capacity of 1 for SMT threads provided there
is no significant influence from RT tasks or frequency scaling. Note that
capacity at the core level is unchanged, so this affects only how tasks
are distributed within a core.

Mike Neuling should post an updated patchset containing this patch (with
more comments added, of course!).
Signed-off-by: Srivatsa Vaddagiri

---
 include/linux/sched.h |    2 +-
 kernel/sched_fair.c   |   30 +++++++++++++++++++++++-------
 2 files changed, 24 insertions(+), 8 deletions(-)

Index: linux-2.6-ozlabs/include/linux/sched.h
===================================================================
--- linux-2.6-ozlabs.orig/include/linux/sched.h
+++ linux-2.6-ozlabs/include/linux/sched.h
@@ -860,7 +860,7 @@ struct sched_group {
	 * CPU power of this group, SCHED_LOAD_SCALE being max power for a
	 * single CPU.
	 */
-	unsigned int cpu_power;
+	unsigned int cpu_power, cpu_power_orig;

	/*
	 * The CPUs this group covers.
Index: linux-2.6-ozlabs/kernel/sched_fair.c
===================================================================
--- linux-2.6-ozlabs.orig/kernel/sched_fair.c
+++ linux-2.6-ozlabs/kernel/sched_fair.c
@@ -2285,13 +2285,6 @@ static void update_cpu_power(struct sche
	unsigned long power = SCHED_LOAD_SCALE;
	struct sched_group *sdg = sd->groups;

-	if (sched_feat(ARCH_POWER))
-		power *= arch_scale_freq_power(sd, cpu);
-	else
-		power *= default_scale_freq_power(sd, cpu);
-
-	power >>= SCHED_LOAD_SHIFT;
-
	if ((sd->flags & SD_SHARE_CPUPOWER) && weight > 1) {
		if (sched_feat(ARCH_POWER))
			power *= arch_scale_smt_power(sd, cpu);
@@ -2301,6 +2294,15 @@ static void update_cpu_power(struct sche
		power >>= SCHED_LOAD_SHIFT;
	}

+	sdg->cpu_power_orig = power;
+
+	if (sched_feat(ARCH_POWER))
+		power *= arch_scale_freq_power(sd, cpu);
+	else
+		power *= default_scale_freq_power(sd, cpu);
+
+	power >>= SCHED_LOAD_SHIFT;
+
	power *= scale_rt_power(cpu);
	power >>= SCHED_LOAD_SHIFT;

@@ -2333,6 +2335,22 @@ static void update_group_power(struct sc
	sdg->cpu_power = power;
 }

+static inline int
+rt_freq_influence(struct sched_group *group, struct sched_domain *sd)
+{
+	if (sd->child)
+		return 1;
+
+	/*
+	 * Check to see if the final cpu power was reduced by more
+	 * than 10% by frequency or rt tasks
+	 */
+	if (group->cpu_power * 100 < group->cpu_power_orig * 90)
+		return 1;
+
+	return 0;
+}
+
 /**
  * update_sg_lb_stats - Update sched_group's statistics for load balancing.
  * @sd: The sched_domain whose statistics are to be updated.
@@ -2426,6 +2444,8 @@ static inline void update_sg_lb_stats(st
	sgs->group_capacity =
		DIV_ROUND_CLOSEST(group->cpu_power, SCHED_LOAD_SCALE);
+	if (!sgs->group_capacity && !rt_freq_influence(group, sd))
+		sgs->group_capacity = 1;
 }

 /**
@@ -2725,7 +2745,8 @@ ret:
  */
 static struct rq *
 find_busiest_queue(struct sched_group *group, enum cpu_idle_type idle,
-		   unsigned long imbalance, const struct cpumask *cpus)
+		   unsigned long imbalance, const struct cpumask *cpus,
+		   struct sched_domain *sd)
 {
	struct rq *busiest = NULL, *rq;
	unsigned long max_load = 0;
@@ -2736,6 +2757,9 @@ find_busiest_queue(struct sched_group *g
		unsigned long capacity = DIV_ROUND_CLOSEST(power,
							   SCHED_LOAD_SCALE);
		unsigned long wl;

+		if (!capacity && !rt_freq_influence(group, sd))
+			capacity = 1;
+
		if (!cpumask_test_cpu(i, cpus))
			continue;
@@ -2852,7 +2876,7 @@ redo:
		goto out_balanced;
	}

-	busiest = find_busiest_queue(group, idle, imbalance, cpus);
+	busiest = find_busiest_queue(group, idle, imbalance, cpus, sd);
	if (!busiest) {
		schedstat_inc(sd, lb_nobusyq[idle]);
		goto out_balanced;
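
For illustration, here is a minimal standalone sketch of the arithmetic the
patch is aimed at. This is userspace C, not part of the patch; the
SCHED_LOAD_SCALE of 1024 and the default smt_gain of 1178 are assumed, and
rt_freq_influence() is simplified to take the two power values directly
rather than a sched_group:

/*
 * Standalone sketch: why an SMT4 thread's capacity rounds to 0, and how
 * the 10% heuristic restores it to 1. Values are assumptions mirroring
 * the scheduler defaults of this era, for illustration only.
 */
#include <stdio.h>

#define SCHED_LOAD_SCALE	1024UL
#define SMT_GAIN		1178UL	/* assumed default sd->smt_gain */

#define DIV_ROUND_CLOSEST(x, d)	(((x) + ((d) / 2)) / (d))

/* Simplified stand-in for rt_freq_influence(): did RT tasks or
 * frequency scaling cut this cpu's power by more than 10%? */
static int rt_freq_influence(unsigned long cpu_power,
			     unsigned long cpu_power_orig)
{
	return cpu_power * 100 < cpu_power_orig * 90;
}

int main(void)
{
	unsigned long threads = 4;
	/* per-thread power after SMT scaling: smt_gain spread over siblings */
	unsigned long cpu_power_orig = SMT_GAIN / threads;	/* 294 */
	unsigned long cpu_power = cpu_power_orig;		/* no RT/freq loss */

	unsigned long capacity = DIV_ROUND_CLOSEST(cpu_power, SCHED_LOAD_SCALE);
	printf("raw capacity of one SMT4 thread: %lu\n", capacity);	/* 0 */

	/* the patch bumps a zero capacity back to 1 when RT/freq are quiet */
	if (!capacity && !rt_freq_influence(cpu_power, cpu_power_orig))
		capacity = 1;
	printf("adjusted capacity: %lu\n", capacity);			/* 1 */

	return 0;
}

With four siblings the per-thread capacity rounds down to 0; the check bumps
it back to 1 unless RT tasks or frequency scaling have reduced the thread's
power by more than 10%, which is what lets tasks spread across all the
threads of a core while leaving core-level capacity untouched.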