Message ID | 1521602856-387-1-git-send-email-akshay.adiga@linux.vnet.ibm.com |
---|---|
State | Accepted |
Series | SLW: Increase stop4-5 residency by 10x |
* Akshay Adiga <akshay.adiga@linux.vnet.ibm.com> [2018-03-21 08:57:36]:

> Using the DGEMM benchmark we observed a 5-9% drop in throughput with
> stop4/5 enabled compared to with stop4/5 disabled. In this benchmark the
> GPU waits on the CPU to wake up and provide the subsequent data block to
> compute. The wakeup latency accumulates over the run and shows up as a
> performance drop.
>
> Linux enters stop4/5 too aggressively given their wakeup latency.
> Increasing the residency from 1ms to 10ms for stop4 (and from 2ms to
> 20ms for stop5) brings the performance drop below 1%.
>
> Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com>

Acked-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Tested-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>

> ---
>  hw/slw.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/hw/slw.c b/hw/slw.c
> index db238ec..515582b 100644
> --- a/hw/slw.c
> +++ b/hw/slw.c
> @@ -598,7 +598,7 @@ static struct cpu_idle_states power9_cpu_idle_states[] = {
>  	{
>  		.name = "stop4",
>  		.latency_ns = 100000,
> -		.residency_ns = 1000000,
> +		.residency_ns = 10000000,
>  		.flags = 0*OPAL_PM_DEC_STOP \
>  		       | 0*OPAL_PM_TIMEBASE_STOP \
>  		       | 1*OPAL_PM_LOSE_USER_CONTEXT \
> @@ -614,7 +614,7 @@ static struct cpu_idle_states power9_cpu_idle_states[] = {
>  	{
>  		.name = "stop5",
>  		.latency_ns = 200000,
> -		.residency_ns = 2000000,
> +		.residency_ns = 20000000,
>  		.flags = 0*OPAL_PM_DEC_STOP \
>  		       | 0*OPAL_PM_TIMEBASE_STOP \
>  		       | 1*OPAL_PM_LOSE_USER_CONTEXT \

Tuning the thresholds reduces stop4/5 entry throughout the runtime of
the GPU benchmark/workload and recovers performance. The impact on power
savings is very small in the GPU workload scenario, since GPU power and
utilization dominate the overall runtime efficiency.

This threshold setting is a performance trade-off due to the wakeup
latency of these deep idle states.

--Vaidy
Akshay Adiga <akshay.adiga@linux.vnet.ibm.com> writes:

> Using the DGEMM benchmark we observed a 5-9% drop in throughput with
> stop4/5 enabled compared to with stop4/5 disabled. In this benchmark the
> GPU waits on the CPU to wake up and provide the subsequent data block to
> compute. The wakeup latency accumulates over the run and shows up as a
> performance drop.
>
> Linux enters stop4/5 too aggressively given their wakeup latency.
> Increasing the residency from 1ms to 10ms for stop4 (and from 2ms to
> 20ms for stop5) brings the performance drop below 1%.
>
> Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com>
> ---
>  hw/slw.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

Merged to master as of 87f33f4990612116306ab42fbd7c163a2f90c89c
diff --git a/hw/slw.c b/hw/slw.c
index db238ec..515582b 100644
--- a/hw/slw.c
+++ b/hw/slw.c
@@ -598,7 +598,7 @@ static struct cpu_idle_states power9_cpu_idle_states[] = {
 	{
 		.name = "stop4",
 		.latency_ns = 100000,
-		.residency_ns = 1000000,
+		.residency_ns = 10000000,
 		.flags = 0*OPAL_PM_DEC_STOP \
 		       | 0*OPAL_PM_TIMEBASE_STOP \
 		       | 1*OPAL_PM_LOSE_USER_CONTEXT \
@@ -614,7 +614,7 @@ static struct cpu_idle_states power9_cpu_idle_states[] = {
 	{
 		.name = "stop5",
 		.latency_ns = 200000,
-		.residency_ns = 2000000,
+		.residency_ns = 20000000,
 		.flags = 0*OPAL_PM_DEC_STOP \
 		       | 0*OPAL_PM_TIMEBASE_STOP \
 		       | 1*OPAL_PM_LOSE_USER_CONTEXT \
Using the DGEMM benchmark we observed a 5-9% drop in throughput with
stop4/5 enabled compared to with stop4/5 disabled. In this benchmark the
GPU waits on the CPU to wake up and provide the subsequent data block to
compute. The wakeup latency accumulates over the run and shows up as a
performance drop.

Linux enters stop4/5 too aggressively given their wakeup latency.
Increasing the residency from 1ms to 10ms for stop4 (and from 2ms to
20ms for stop5) brings the performance drop below 1%.

Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com>
---
 hw/slw.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
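To illustrate why a larger residency_ns keeps the CPU out of stop4/5 during
short idle periods, here is a minimal sketch of a residency-based state
selection. It is an assumption-laden model, not the actual Linux cpuidle
governor or skiboot code: struct idle_state, pick_state() and the "shallow"
entry (with its values) are hypothetical. Only the stop4/stop5 latency and
residency numbers are taken from the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for an idle-state table entry (hypothetical type);
 * only the two fields discussed in this patch are modelled. */
struct idle_state {
	const char *name;
	uint64_t latency_ns;	/* time needed to wake up from the state */
	uint64_t residency_ns;	/* minimum idle time that justifies entry */
};

/* Tables ordered from shallowest to deepest state. The "shallow" entry
 * and its values are hypothetical placeholders for lighter stop states;
 * the stop4/stop5 values are the ones before and after this patch. */
static const struct idle_state states_old[] = {
	{ "shallow", 1000, 10000 },
	{ "stop4", 100000, 1000000 },
	{ "stop5", 200000, 2000000 },
};

static const struct idle_state states_new[] = {
	{ "shallow", 1000, 10000 },
	{ "stop4", 100000, 10000000 },
	{ "stop5", 200000, 20000000 },
};

/* Return the index of the deepest state whose residency fits within the
 * predicted idle time and whose wakeup latency is within the allowed
 * limit, or -1 if no state qualifies. */
static int pick_state(const struct idle_state *tbl, size_t n,
		      uint64_t predicted_idle_ns, uint64_t latency_limit_ns)
{
	int best = -1;

	for (size_t i = 0; i < n; i++) {
		if (tbl[i].residency_ns > predicted_idle_ns)
			continue;	/* idle period too short to pay off */
		if (tbl[i].latency_ns > latency_limit_ns)
			continue;	/* wakeup would be too slow */
		best = (int)i;
	}
	return best;
}
```

Under these assumptions, a predicted idle period of 5ms selects stop5 with the
old table (pick_state(states_old, 3, 5000000, UINT64_MAX) returns 2) but only
the shallow state with the new one (the same call on states_new returns 0), so
the 200us stop5 wakeup latency is no longer paid on that wakeup.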