[RFC] sched/numa: do load balance between remote nodes

Message ID 1338965571-9812-1-git-send-email-alex.shi@intel.com (mailing list archive)
State Not Applicable
Commit Message

alex.shi June 6, 2012, 6:52 a.m. UTC
Commit cb83b629b removed the NODE sched domain and added a check on
whether the node distance in the SLIT table is farther than
REMOTE_DISTANCE; if so, the domain loses its chance to load balance at
the exec/fork/wake_affine points.

But even when the node distance is farther than REMOTE_DISTANCE,
modern CPUs have QPI-like interconnects that keep memory access between
nodes from being too slow. So losing that balancing on NUMA machines
causes a huge performance regression on benchmarks such as hackbench,
tbench, netperf and oltp.

This patch restores the old scheduler behavior on all my Intel
platforms: NHM EP/EX, WSM EP, SNB EP/EP4S, and so removes the
performance regressions. (All of them have just two distance values,
10 and 21.)

Signed-off-by: Alex Shi <alex.shi@intel.com>
---
 kernel/sched/core.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)
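
To illustrate why the threshold matters on the two-distance platforms
mentioned in the commit message, here is a minimal user-space sketch of
the check. It assumes the generic defaults REMOTE_DISTANCE = 20 and
RECLAIM_DISTANCE = 30 (architectures may override them). The function
name sd_local_flags_sketch(), its explicit threshold parameter, and the
SD_* bit values are placeholders for illustration, not the kernel's
definitions.

```c
#include <assert.h>

/* Assumed generic defaults; an architecture may define its own values. */
#define REMOTE_DISTANCE   20
#define RECLAIM_DISTANCE  30

/* Placeholder bit values for illustration only, not the kernel's. */
#define SD_BALANCE_EXEC  0x1
#define SD_BALANCE_FORK  0x2
#define SD_WAKE_AFFINE   0x4

/*
 * Hypothetical stand-in for sd_local_flags(): takes the node distance and
 * the threshold directly instead of indexing sched_domains_numa_distance[].
 */
static int sd_local_flags_sketch(int distance, int threshold)
{
	/* Nodes farther away than the threshold get no local balancing flags. */
	if (distance > threshold)
		return 0;

	return SD_BALANCE_EXEC | SD_BALANCE_FORK | SD_WAKE_AFFINE;
}

/* Self-check for a platform whose SLIT reports only distances 10 and 21. */
static void check_two_distance_platform(void)
{
	int all = SD_BALANCE_EXEC | SD_BALANCE_FORK | SD_WAKE_AFFINE;

	/* Old threshold: 21 > 20, so the remote level loses every flag. */
	assert(sd_local_flags_sketch(21, REMOTE_DISTANCE) == 0);

	/* New threshold: 21 <= 30, so the remote level keeps every flag. */
	assert(sd_local_flags_sketch(21, RECLAIM_DISTANCE) == all);

	/* The local distance (10) passes under either threshold. */
	assert(sd_local_flags_sketch(10, REMOTE_DISTANCE) == all);
}
```

Under these assumptions, a remote distance of 21 sits just above
REMOTE_DISTANCE but below RECLAIM_DISTANCE, so only the new comparison
keeps exec/fork/wake_affine balancing enabled across nodes.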

Comments

Peter Zijlstra June 6, 2012, 9:01 a.m. UTC | #1
On Wed, 2012-06-06 at 14:52 +0800, Alex Shi wrote:
> -       if (sched_domains_numa_distance[level] > REMOTE_DISTANCE)
> +       if (sched_domains_numa_distance[level] > RECLAIM_DISTANCE) 

I actually considered this.. I just felt a little uneasy re-purposing
the RECLAIM_DISTANCE for this, but I guess its all the same anyway. Both
mean expensive-away-distance.

So I've taken this.

thanks!
Sergei Shtylyov June 6, 2012, 10:53 a.m. UTC | #2
Hello.

On 06-06-2012 10:52, Alex Shi wrote:

> commit cb83b629b

    Please also specify that commit's summary in parens.

> remove the NODE sched domain and check if the node
> distance in SLIT table is farther than REMOTE_DISTANCE, if so, it will
> lose the load balance chance at exec/fork/wake_affine points.

> But actually, even the node distance is farther than REMOTE_DISTANCE,
> Modern CPUs also has QPI like connections, that make memory access is

    "Is" not needed here.

> not too slow between nodes.  So above losing on NUMA machine make a
> huge performance regression on benchmark: hackbench, tbench, netperf
> and oltp etc.

> This patch will recover the scheduler behavior to old mode on all my
> Intel platforms: NHM EP/EX, WSM EP, SNB EP/EP4S, and so remove the
> perfromance regressions. (all of them just has 2 kinds distance, 10 21)

> Signed-off-by: Alex Shi<alex.shi@intel.com>

WBR, Sergei
alex.shi June 7, 2012, 12:33 a.m. UTC | #3
On 06/06/2012 05:01 PM, Peter Zijlstra wrote:

> On Wed, 2012-06-06 at 14:52 +0800, Alex Shi wrote:
>> -       if (sched_domains_numa_distance[level] > REMOTE_DISTANCE)
>> +       if (sched_domains_numa_distance[level] > RECLAIM_DISTANCE) 
> 
> I actually considered this.. I just felt a little uneasy re-purposing
> the RECLAIM_DISTANCE for this, but I guess its all the same anyway. Both
> mean expensive-away-distance.
> 

I understand. The BIOS folks are not well aligned with us on this.

> So I've taken this.
> 
> thanks!
Patch

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 39eb601..b2ee41a 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6286,7 +6286,7 @@  static int sched_domains_curr_level;
 
 static inline int sd_local_flags(int level)
 {
-	if (sched_domains_numa_distance[level] > REMOTE_DISTANCE)
+	if (sched_domains_numa_distance[level] > RECLAIM_DISTANCE)
 		return 0;
 
 	return SD_BALANCE_EXEC | SD_BALANCE_FORK | SD_WAKE_AFFINE;