[5/5] cpu: throttle: fix throttle time slice

Message ID 1490599288-11751-6-git-send-email-peterx@redhat.com
State New

Commit Message

Peter Xu March 27, 2017, 7:21 a.m. UTC
IIUC the throttle idea is that we split a CPU_THROTTLE_TIMESLICE_NS
time slice into two parts - one for the vcpu, one for the throttle thread
(which suspends the vcpu thread with a sleep). However, the current
algorithm for calculating the working piece and the sleeping piece is
strange.

Assume a 99% throttle: what we want is to nearly stop the vcpu from
running, but the old logic will first let the vcpu run for a very long
time (which is "CPU_THROTTLE_TIMESLICE_NS / (1-pct)" = 1 second) before
doing anything else.

Fix it up with the simplest, but imho accurate, logic.

CC: Paolo Bonzini <pbonzini@redhat.com>
CC: Richard Henderson <rth@twiddle.net>
CC: Jason J. Herne <jjherne@linux.vnet.ibm.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 cpus.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)
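
As a quick sanity check of the numbers above (an illustration only, not
part of the patch; it assumes the 10ms CPU_THROTTLE_TIMESLICE_NS default
from cpus.c):

    #include <stdio.h>

    #define CPU_THROTTLE_TIMESLICE_NS 10000000LL    /* 10ms, as in cpus.c */

    int main(void)
    {
        double pct = 0.99;  /* 99% throttle */

        /* old logic: timer period and sleep time per period */
        double period_ns = CPU_THROTTLE_TIMESLICE_NS / (1 - pct);
        double sleep_ns  = pct / (1 - pct) * CPU_THROTTLE_TIMESLICE_NS;

        printf("period = %.0fms, sleep = %.0fms, run = %.0fms\n",
               period_ns / 1e6, sleep_ns / 1e6, (period_ns - sleep_ns) / 1e6);
        /* prints: period = 1000ms, sleep = 990ms, run = 10ms */
        return 0;
    }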

Comments

Peter Xu March 27, 2017, 7:40 a.m. UTC | #1
On Mon, Mar 27, 2017 at 03:21:28PM +0800, Peter Xu wrote:
> IIUC the throttle idea is that: we split a CPU_THROTTLE_TIMESLICE_NS
> time slice into two parts - one for vcpu, one for throttle thread (which
> will suspend the thread by a sleep). However current algorithm on
> calculating the working piece and sleeping piece is strange.
> 
> Assume a 99% throttle, what we want is to merely stop vcpu from running,
> but the old logic will just first let the vcpu run for a very long
> time (which is "CPU_THROTTLE_TIMESLICE_NS / (1-pct)" = 1 second) before
> doing anything else.
> 
> Fixing it up to the simplest but imho accurate logic.

Oh, looks like I need to switch the two pct below... :)

> 
> CC: Paolo Bonzini <pbonzini@redhat.com>
> CC: Richard Henderson <rth@twiddle.net>
> CC: Jason J. Herne <jjherne@linux.vnet.ibm.com>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  cpus.c | 6 ++----
>  1 file changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/cpus.c b/cpus.c
> index 167d961..7976ce4 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -633,7 +633,6 @@ static const VMStateDescription vmstate_timers = {
>  static void cpu_throttle_thread(CPUState *cpu, run_on_cpu_data opaque)
>  {
>      double pct;
> -    double throttle_ratio;
>      long sleeptime_ns;
>  
>      if (!cpu_throttle_get_percentage()) {
> @@ -641,8 +640,7 @@ static void cpu_throttle_thread(CPUState *cpu, run_on_cpu_data opaque)
>      }
>  
>      pct = (double)cpu_throttle_get_percentage()/100;
> -    throttle_ratio = pct / (1 - pct);
> -    sleeptime_ns = (long)(throttle_ratio * CPU_THROTTLE_TIMESLICE_NS);
> +    sleeptime_ns = (long)((1 - pct) * CPU_THROTTLE_TIMESLICE_NS);
                             ^^^^^^^^^
                             here it should be "pct", while...

>  
>      qemu_mutex_unlock_iothread();
>      atomic_set(&cpu->throttle_thread_scheduled, 0);
> @@ -668,7 +666,7 @@ static void cpu_throttle_timer_tick(void *opaque)
>  
>      pct = (double)cpu_throttle_get_percentage()/100;
>      timer_mod(throttle_timer, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL_RT) +
> -                                   CPU_THROTTLE_TIMESLICE_NS / (1-pct));
> +                                   CPU_THROTTLE_TIMESLICE_NS * pct);
                                                                  ^^^
                                                here it should be (1 - pct)

I'll wait for review comments on the raw idea, to see whether I'll need
a repost. Sorry for the misunderstanding.

-- peterx
Paolo Bonzini March 31, 2017, 4:38 p.m. UTC | #2
On 27/03/2017 09:21, Peter Xu wrote:
> @@ -641,8 +640,7 @@ static void cpu_throttle_thread(CPUState *cpu, run_on_cpu_data opaque)
>      }
>  
>      pct = (double)cpu_throttle_get_percentage()/100;
> -    throttle_ratio = pct / (1 - pct);
> -    sleeptime_ns = (long)(throttle_ratio * CPU_THROTTLE_TIMESLICE_NS);

Say pct = 0.25, then throttle_ratio = 0.25/0.75 = 1/3.

> +    sleeptime_ns = (long)((1 - pct) * CPU_THROTTLE_TIMESLICE_NS);
>  
>      qemu_mutex_unlock_iothread();
>      atomic_set(&cpu->throttle_thread_scheduled, 0);
> @@ -668,7 +666,7 @@ static void cpu_throttle_timer_tick(void *opaque)
>  
>      pct = (double)cpu_throttle_get_percentage()/100;
>      timer_mod(throttle_timer, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL_RT) +
> -                                   CPU_THROTTLE_TIMESLICE_NS / (1-pct));

And the timer is running every 1/0.75 = 4/3 * CPU_THROTTLE_TIMESLICE_NS.

Of that, 1/3 of a timeslice (3.33 ms) is spent sleeping, while one full
timeslice (10 ms) is spent not sleeping.

When pct = 0.75, throttle_ratio = 3 and the timer is running every 4 *
CPU_THROTTLE_TIMESLICE_NS (40 ms).  Of these, 3 slices (30 ms) are spent
sleeping, while 10 ms are spent not sleeping.

The rationale _could_ be (I don't remember) that a CPU with a very high
throttling frequency leaves little time for the migration thread to do
any work.  So QEMU keeps the "on" phase always the same and lengthens
the "off" phase, which as you found out can be unsatisfactory.

However, I think your patch has the opposite problem: the frequency is
constant, but with high throttling all time reserved for the CPU will be
lost in overhead.  For example, at 99% throttling you only have 100
microseconds to wake up, do work and go back to sleep.

So I'm inclined _not_ to take your patch.  One possibility could be to
do the following:

- for throttling between 0% and 80%, use the current algorithm.  At 66%,
the CPU will work for 10 ms and sleep for 40 ms.

- for throttling above 80% adapt your algorithm to have a variable
timeslice, going from 50 ms at 66% to 100 ms at 100%.  This way, the CPU
time will shrink below 10 ms and the sleep time will grow.

It looks like this: http://i.imgur.com/lyFie04.png

So at 99% the timeslice will be 97.5 ms; the CPU will work for 975 us
and sleep for the rest (10x more than with just your patch).  But I'm
not sure it's really worth it.
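
For concreteness, a rough sketch of how that hybrid could be computed
(untested, illustration only; it uses 80% as the switch point and a
linear 50ms -> 100ms growth of the timeslice above it, which is inferred
from the 97.5ms/975us figures above):

    #define CPU_THROTTLE_TIMESLICE_NS 10000000LL    /* 10ms */

    /* Per-period CPU time and sleep time for a given throttle fraction. */
    static void throttle_piece(double pct, double *run_ns, double *sleep_ns)
    {
        if (pct <= 0.8) {
            /* current algorithm: constant 10ms of CPU time per period */
            *run_ns   = CPU_THROTTLE_TIMESLICE_NS;
            *sleep_ns = pct / (1 - pct) * CPU_THROTTLE_TIMESLICE_NS;
        } else {
            /* variable timeslice: 50ms at 80%, growing linearly to 100ms at 100% */
            double slice_ns = 50e6 + (pct - 0.8) / 0.2 * 50e6;
            *run_ns   = (1 - pct) * slice_ns;   /* 975us at 99% */
            *sleep_ns = pct * slice_ns;         /* the rest of the slice */
        }
    }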

Paolo

> +                                   CPU_THROTTLE_TIMESLICE_NS * pct);
>  }
Dr. David Alan Gilbert March 31, 2017, 7:13 p.m. UTC | #3
Ignoring the details below for a minute, this patch belongs in a separate
series; all the rest of the patches in this set are nice simple ones.

* Paolo Bonzini (pbonzini@redhat.com) wrote:
> 
> 
> On 27/03/2017 09:21, Peter Xu wrote:
> > @@ -641,8 +640,7 @@ static void cpu_throttle_thread(CPUState *cpu, run_on_cpu_data opaque)
> >      }
> >  
> >      pct = (double)cpu_throttle_get_percentage()/100;
> > -    throttle_ratio = pct / (1 - pct);
> > -    sleeptime_ns = (long)(throttle_ratio * CPU_THROTTLE_TIMESLICE_NS);
> 
> Say pct = 0.25, then throttle_ratio = 0.25/0.75 = 1/3.
> 
> > +    sleeptime_ns = (long)((1 - pct) * CPU_THROTTLE_TIMESLICE_NS);
> >  
> >      qemu_mutex_unlock_iothread();
> >      atomic_set(&cpu->throttle_thread_scheduled, 0);
> > @@ -668,7 +666,7 @@ static void cpu_throttle_timer_tick(void *opaque)
> >  
> >      pct = (double)cpu_throttle_get_percentage()/100;
> >      timer_mod(throttle_timer, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL_RT) +
> > -                                   CPU_THROTTLE_TIMESLICE_NS / (1-pct));
> 
> And the timer is running every 1/0.75 = 4/3 * CPU_THROTTLE_TIMESLICE_NS.
> 
> Of these, 1/3 is spent sleeping (3.33 ms), while 1 (10 ms) is spent not
> sleeping.
> 
> When pct = 0.75, throttle_ratio = 3 and the timer is running every 4 *
> CPU_THROTTLE_TIMESLICE_NS (40 ms).  Of these, 3 slices (30 ms) are spent
> sleeping, while 10 ms are spent not sleeping.
> 
> The rationale _could_ be (I don't remember) that a CPU with a very high
> throttling frequency leaves little time for the migration thread to do
> any work.  So QEMU keeps the "on" phase always the same and lengthens
> the "off" phase, which as you found out can be unsatisfactory.
> 
> However, I think your patch has the opposite problem: the frequency is
> constant, but with high throttling all time reserved for the CPU will be
> lost in overhead.  For example, at 99% throttling you only have 100
> microseconds to wake up, do work and go back to sleep.

Yes, and I'm worried that with the 10ms timeslice there is a lot of overhead,
especially if the timer that wakes you doesn't have much higher resolution
than that.

> So I'm inclined _not_ to take your patch.  One possibility could be to
> do the following:
> 
> - for throttling between 0% and 80%, use the current algorithm.  At 66%,
> the CPU will work for 10 ms and sleep for 40 ms.
> 
> - for throttling above 80% adapt your algorithm to have a variable
> timeslice, going from 50 ms at 66% to 100 ms at 100%.  This way, the CPU
> time will shrink below 10 ms and the sleep time will grow.

It seems odd to have a threshold like that on something that's supposedly
a linear scale.

> It looks like this: http://i.imgur.com/lyFie04.png
> 
> So at 99% the timeslice will be 97.5 ms; the CPU will work for 975 us
> and sleep for the rest (10x more than with just your patch).  But I'm
> not sure it's really worth it.

Can you really run a CPU for 975us?

Dave

> Paolo
> 
> > +                                   CPU_THROTTLE_TIMESLICE_NS * pct);
> >  }
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Paolo Bonzini March 31, 2017, 7:46 p.m. UTC | #4
> > So I'm inclined _not_ to take your patch.  One possibility could be to
> > do the following:
> > 
> > - for throttling between 0% and 80%, use the current algorithm.  At 66%,
> > the CPU will work for 10 ms and sleep for 40 ms.
> > 
> > - for throttling above 80% adapt your algorithm to have a variable
> > timeslice, going from 50 ms at 66% to 100 ms at 100%.  This way, the CPU
> > time will shrink below 10 ms and the sleep time will grow.

Oops, all the 66% should be 80%.

> It seems odd to have a threshold like that on something that's supposedly
> a linear scale.

I futzed a bit with the threshold until the first derivative of the CPU
time was zero at the threshold, and the result was 80%.  That is, if you
switch before 80%, the CPU time slice can grow to more than 10 ms right
after the threshold, and then start shrinking.

> > It looks like this: http://i.imgur.com/lyFie04.png
> > 
> > So at 99% the timeslice will be 97.5 ms; the CPU will work for 975 us
> > and sleep for the rest (10x more than with just your patch).  But I'm
> > not sure it's really worth it.
> 
> Can you really run a CPU for 975us ?

It's 2-3 million clock cycles, should be doable.

Paolo
Peter Xu April 1, 2017, 7:52 a.m. UTC | #5
On Fri, Mar 31, 2017 at 06:38:50PM +0200, Paolo Bonzini wrote:
> 
> 
> On 27/03/2017 09:21, Peter Xu wrote:
> > @@ -641,8 +640,7 @@ static void cpu_throttle_thread(CPUState *cpu, run_on_cpu_data opaque)
> >      }
> >  
> >      pct = (double)cpu_throttle_get_percentage()/100;
> > -    throttle_ratio = pct / (1 - pct);
> > -    sleeptime_ns = (long)(throttle_ratio * CPU_THROTTLE_TIMESLICE_NS);
> 
> Say pct = 0.25, then throttle_ratio = 0.25/0.75 = 1/3.
> 
> > +    sleeptime_ns = (long)((1 - pct) * CPU_THROTTLE_TIMESLICE_NS);
> >  
> >      qemu_mutex_unlock_iothread();
> >      atomic_set(&cpu->throttle_thread_scheduled, 0);
> > @@ -668,7 +666,7 @@ static void cpu_throttle_timer_tick(void *opaque)
> >  
> >      pct = (double)cpu_throttle_get_percentage()/100;
> >      timer_mod(throttle_timer, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL_RT) +
> > -                                   CPU_THROTTLE_TIMESLICE_NS / (1-pct));
> 
> And the timer is running every 1/0.75 = 4/3 * CPU_THROTTLE_TIMESLICE_NS.
> 
> Of these, 1/3 is spent sleeping (3.33 ms), while 1 (10 ms) is spent not
> sleeping.
> 
> When pct = 0.75, throttle_ratio = 3 and the timer is running every 4 *
> CPU_THROTTLE_TIMESLICE_NS (40 ms).  Of these, 3 slices (30 ms) are spent
> sleeping, while 10 ms are spent not sleeping.
> 
> The rationale _could_ be (I don't remember) that a CPU with a very high
> throttling frequency leaves little time for the migration thread to do
> any work.  So QEMU keeps the "on" phase always the same and lengthens
> the "off" phase, which as you found out can be unsatisfactory.

Yes. Sorry, I must have done the math incorrectly that day. The current
algorithm is correct; it just assumes that the running time per period is
constant, namely CPU_THROTTLE_TIMESLICE_NS (10ms), so with a sleep of
pct/(1-pct) * 10ms the sleeping fraction works out to exactly pct. I just
didn't realize it at first glance.

> 
> However, I think your patch has the opposite problem: the frequency is
> constant, but with high throttling all time reserved for the CPU will be
> lost in overhead.  For example, at 99% throttling you only have 100
> microseconds to wake up, do work and go back to sleep.
> 
> So I'm inclined _not_ to take your patch.  One possibility could be to
> do the following:
> 
> - for throttling between 0% and 80%, use the current algorithm.  At 66%,
> the CPU will work for 10 ms and sleep for 40 ms.
> 
> - for throttling above 80% adapt your algorithm to have a variable
> timeslice, going from 50 ms at 66% to 100 ms at 100%.  This way, the CPU
> time will shrink below 10 ms and the sleep time will grow.
> 
> It looks like this: http://i.imgur.com/lyFie04.png
> 
> So at 99% the timeslice will be 97.5 ms; the CPU will work for 975 us
> and sleep for the rest (10x more than with just your patch).  But I'm
> not sure it's really worth it.

Yeah. It may not be that worth it. Thanks for the analysis. :-)

Will drop this patch in next post.

Thanks,

-- peterx
Dr. David Alan Gilbert April 4, 2017, 3:44 p.m. UTC | #6
* Paolo Bonzini (pbonzini@redhat.com) wrote:
> 
> 
> > > So I'm inclined _not_ to take your patch.  One possibility could be to
> > > do the following:
> > > 
> > > - for throttling between 0% and 80%, use the current algorithm.  At 66%,
> > > the CPU will work for 10 ms and sleep for 40 ms.
> > > 
> > > - for throttling above 80% adapt your algorithm to have a variable
> > > timeslice, going from 50 ms at 66% to 100 ms at 100%.  This way, the CPU
> > > time will shrink below 10 ms and the sleep time will grow.
> 
> Oops, all the 66% should be 80%.
> 
> > It seems odd to have a threshold like that on something that's supposedly
> > a linear scale.
> 
> I futzed a bit with the threshold until the first derivative of the CPU
> time was zero at the threshold, and the result was 80%.  That is, if you
> switch before 80%, the CPU time slice can grow to more than 10 ms right
> after the threshold, and then start shrinking.
> 
> > > It looks like this: http://i.imgur.com/lyFie04.png
> > > 
> > > So at 99% the timeslice will be 97.5 ms; the CPU will work for 975 us
> > > and sleep for the rest (10x more than with just your patch).  But I'm
> > > not sure it's really worth it.
> > 
> > Can you really run a CPU for 975us ?
> 
> It's 2-3 million clock cycles, should be doable.

I wasn't really worried about the CPU running, I was more worried
about the timer resolution in stopping it.  If your timer isn't that accurate
in starting/stopping the CPU then the errors might be so large as to
make that a bit odd.

Dave

> Paolo
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

Patch

diff --git a/cpus.c b/cpus.c
index 167d961..7976ce4 100644
--- a/cpus.c
+++ b/cpus.c
@@ -633,7 +633,6 @@  static const VMStateDescription vmstate_timers = {
 static void cpu_throttle_thread(CPUState *cpu, run_on_cpu_data opaque)
 {
     double pct;
-    double throttle_ratio;
     long sleeptime_ns;
 
     if (!cpu_throttle_get_percentage()) {
@@ -641,8 +640,7 @@  static void cpu_throttle_thread(CPUState *cpu, run_on_cpu_data opaque)
     }
 
     pct = (double)cpu_throttle_get_percentage()/100;
-    throttle_ratio = pct / (1 - pct);
-    sleeptime_ns = (long)(throttle_ratio * CPU_THROTTLE_TIMESLICE_NS);
+    sleeptime_ns = (long)((1 - pct) * CPU_THROTTLE_TIMESLICE_NS);
 
     qemu_mutex_unlock_iothread();
     atomic_set(&cpu->throttle_thread_scheduled, 0);
@@ -668,7 +666,7 @@  static void cpu_throttle_timer_tick(void *opaque)
 
     pct = (double)cpu_throttle_get_percentage()/100;
     timer_mod(throttle_timer, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL_RT) +
-                                   CPU_THROTTLE_TIMESLICE_NS / (1-pct));
+                                   CPU_THROTTLE_TIMESLICE_NS * pct);
 }
 
 void cpu_throttle_set(int new_throttle_pct)