
SLW: Increase stop4-5 residency by 10x

Message ID 1521602856-387-1-git-send-email-akshay.adiga@linux.vnet.ibm.com
State Accepted

Commit Message

Akshay Adiga March 21, 2018, 3:27 a.m. UTC
Using the DGEMM benchmark, we observed a 5-9% drop in throughput with
stop4/5 enabled compared to running without stop4/5. In this benchmark the
GPU waits for the CPU to wake up and provide the subsequent data block to
compute. The wakeup latency accumulates over the run and shows up as a
performance drop.

Linux enters stop4/5 too aggressively given their wakeup latency. Increasing
the residency by 10x (stop4 from 1ms to 10ms, stop5 from 2ms to 20ms)
reduces the performance drop to <1%.

Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com>
---
 hw/slw.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

Vaidyanathan Srinivasan March 21, 2018, 7:29 a.m. UTC | #1
* Akshay Adiga <akshay.adiga@linux.vnet.ibm.com> [2018-03-21 08:57:36]:

> Using DGEMM benchmark we observed there was a drop of 5-9% throughput with
> and without stop4/5. In this benchmark the GPU waits on the CPU to wake up
> and provide the subsequent data block to compute. The wakeup latency
> accumulates over the run and shows up as a performance drop.
> 
> Linux enters stop4/5 more aggressively for its wakeup latency. Increasing
> the residency from 1ms to 10ms makes the performance drop <1%
> 
> Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com>

Acked-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Tested-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>


> ---
>  hw/slw.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/slw.c b/hw/slw.c
> index db238ec..515582b 100644
> --- a/hw/slw.c
> +++ b/hw/slw.c
> @@ -598,7 +598,7 @@ static struct cpu_idle_states power9_cpu_idle_states[] = {
>  	{
>  		.name = "stop4",
>  		.latency_ns = 100000,
> -		.residency_ns = 1000000,
> +		.residency_ns = 10000000,
>  		.flags = 0*OPAL_PM_DEC_STOP \
>  		       | 0*OPAL_PM_TIMEBASE_STOP  \
>  		       | 1*OPAL_PM_LOSE_USER_CONTEXT \
> @@ -614,7 +614,7 @@ static struct cpu_idle_states power9_cpu_idle_states[] = {
>  	{
>  		.name = "stop5",
>  		.latency_ns = 200000,
> -		.residency_ns = 2000000,
> +		.residency_ns = 20000000,
>  		.flags = 0*OPAL_PM_DEC_STOP \
>  		       | 0*OPAL_PM_TIMEBASE_STOP  \
>  		       | 1*OPAL_PM_LOSE_USER_CONTEXT \


Tuning the thresholds reduces stop4/5 entry throughout the runtime
of the GPU benchmark/workload and recovers performance.  Power savings
are barely affected, since in the GPU workload scenario the GPU
power and utilization dominate the overall runtime efficiency.

This threshold/setting is a performance trade-off due to the wakeup
latency of these deep idle states.

--Vaidy

Stewart Smith March 23, 2018, 5:43 a.m. UTC | #2
Akshay Adiga <akshay.adiga@linux.vnet.ibm.com> writes:
> Using DGEMM benchmark we observed there was a drop of 5-9% throughput with
> and without stop4/5. In this benchmark the GPU waits on the CPU to wake up
> and provide the subsequent data block to compute. The wakeup latency
> accumulates over the run and shows up as a performance drop.
>
> Linux enters stop4/5 more aggressively for its wakeup latency. Increasing
> the residency from 1ms to 10ms makes the performance drop <1%
>
> Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com>
> ---
>  hw/slw.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

Merged to master as of 87f33f4990612116306ab42fbd7c163a2f90c89c

Patch

diff --git a/hw/slw.c b/hw/slw.c
index db238ec..515582b 100644
--- a/hw/slw.c
+++ b/hw/slw.c
@@ -598,7 +598,7 @@  static struct cpu_idle_states power9_cpu_idle_states[] = {
 	{
 		.name = "stop4",
 		.latency_ns = 100000,
-		.residency_ns = 1000000,
+		.residency_ns = 10000000,
 		.flags = 0*OPAL_PM_DEC_STOP \
 		       | 0*OPAL_PM_TIMEBASE_STOP  \
 		       | 1*OPAL_PM_LOSE_USER_CONTEXT \
@@ -614,7 +614,7 @@  static struct cpu_idle_states power9_cpu_idle_states[] = {
 	{
 		.name = "stop5",
 		.latency_ns = 200000,
-		.residency_ns = 2000000,
+		.residency_ns = 20000000,
 		.flags = 0*OPAL_PM_DEC_STOP \
 		       | 0*OPAL_PM_TIMEBASE_STOP  \
 		       | 1*OPAL_PM_LOSE_USER_CONTEXT \
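
To illustrate why a larger residency_ns reduces stop4/5 entry, here is a
minimal sketch of residency-based idle state selection. It is not the Linux
cpuidle governor or skiboot code: struct idle_state and pick_state() are
hypothetical names, the selection rule (deepest state whose residency fits
the predicted idle time) is a simplifying assumption, and only the
latency_ns/residency_ns values are taken from the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the entries in hw/slw.c. */
struct idle_state {
	const char *name;
	uint64_t latency_ns;   /* wakeup latency */
	uint64_t residency_ns; /* minimum idle time for the state to pay off */
};

/* Values before the patch (from the removed lines of the diff). */
static const struct idle_state old_states[] = {
	{ "stop4", 100000, 1000000 },
	{ "stop5", 200000, 2000000 },
};

/* Values after the patch (from the added lines of the diff). */
static const struct idle_state new_states[] = {
	{ "stop4", 100000, 10000000 },
	{ "stop5", 200000, 20000000 },
};

/*
 * Assumed selection rule: return the index of the deepest state whose
 * residency_ns does not exceed the predicted idle time, or -1 if no
 * state in the table qualifies. States are ordered shallow to deep.
 */
static int pick_state(const struct idle_state *states, size_t n,
		      uint64_t predicted_idle_ns)
{
	int best = -1;

	for (size_t i = 0; i < n; i++) {
		if (states[i].residency_ns <= predicted_idle_ns)
			best = (int)i;
	}
	return best;
}
```

Under this rule, a predicted idle period of 5ms selects stop5 with the old
table, while the new table selects neither stop4 nor stop5, since they now
require at least 10ms and 20ms respectively.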