[v2,1/3] cpu: Provide vcpu throttling interface

Message ID 87pp5drzrl.fsf@neno.neno
State New

Commit Message

Juan Quintela June 3, 2015, 7:56 a.m. UTC
"Jason J. Herne" <jjherne@linux.vnet.ibm.com> wrote:
> Provide a method to throttle guest cpu execution. CPUState is augmented with
> timeout controls and throttle start/stop functions. To throttle the guest cpu
> the caller simply has to call the throttle start function and provide a ratio of
> sleep time to normal execution time.
>
> Signed-off-by: Jason J. Herne <jjherne@linux.vnet.ibm.com>
> Reviewed-by: Matthew Rosato <mjrosato@linux.vnet.ibm.com>



Hi

Sorry that I already replied to your previous submission; I had this
mail "half done" since yesterday, and I still think all of it applies.


> ---
>  cpus.c            | 62 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  include/qom/cpu.h | 46 +++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 108 insertions(+)
>
> diff --git a/cpus.c b/cpus.c
> index de6469f..7568357 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -64,6 +64,9 @@
>  
>  #endif /* CONFIG_LINUX */
>  
> +/* Number of ms between cpu throttle operations */
> +#define CPU_THROTTLE_TIMESLICE 10


We are checking for throttling on each cpu every 10ms.
But in patch 2 we can see that we only change the throttling each time
we call migration_bitmap_sync(), which only happens once per round
through all the pages. Normally auto-converge only matters for machines
with lots of memory, so each round is going to take much longer than
10ms (the old code changed it every 4 passes; you changed it to every 2
passes, and you add 0.2 each time).  I think I would prefer to just
change it every single pass, but add 0.1 per pass?  Simpler, and the end
result would be the same?


This is what the old code did (the new code does exactly the same, just
in the previous patch):

-static void mig_sleep_cpu(void *opq)
-{
-    qemu_mutex_unlock_iothread();
-    g_usleep(30*1000);
-    qemu_mutex_lock_iothread();
-}

So what we are doing is calling async_run_on_cpu() with this function.

To make things short, we end up here:

static void flush_queued_work(CPUState *cpu)
{
    struct qemu_work_item *wi;

    if (cpu->queued_work_first == NULL) {
        return;
    }

    while ((wi = cpu->queued_work_first)) {
        cpu->queued_work_first = wi->next;
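        /* wi->func() is where mig_sleep_cpu() ends up running; it drops
         * and re-takes the iothread lock in the middle of this walk */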
        wi->func(wi->data);
        wi->done = true;
        if (wi->free) {
            g_free(wi);
        }
    }
    cpu->queued_work_last = NULL;
    qemu_cond_broadcast(&qemu_work_cond);
}

So we are walking a list that is protected by the iothread lock, and we
are dropping the lock in the middle of the walk... just hoping that
nothing else is calling async_run_on_cpu() from a different place...


Could we change this to something like:

diff --git a/kvm-all.c b/kvm-all.c
index 17a3771..ae9635f 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1787,6 +1787,7 @@ int kvm_cpu_exec(CPUState *cpu)
 {
     struct kvm_run *run = cpu->kvm_run;
     int ret, run_ret;
+    long throttling_time;
 
     DPRINTF("kvm_cpu_exec()\n");
 
@@ -1813,8 +1814,15 @@ int kvm_cpu_exec(CPUState *cpu)
              */
             qemu_cpu_kick_self();
         }
+        throttling_time = cpu->throttling_time;
+        cpu->throttling_time = 0;
+        cpu->sleeping_time += throttling_time;
         qemu_mutex_unlock_iothread();
 
+        if (throttling_time) {
+            g_usleep(throttling_time);
+        }
         run_ret = kvm_vcpu_ioctl(cpu, KVM_RUN, 0);
 
         qemu_mutex_lock_iothread();

And the code that decides the throttling now just sets what
throttling_time and sleeping_time should be?  No need to drop the lock
or anything like that.
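
Something like this, say (completely untested, and the helper name here
is invented):

static void cpu_throttle_all(double throttle_ratio)
{
    CPUState *cpu;

    /* runs in the migration code, e.g. from migration_bitmap_sync(),
       with the iothread lock held */
    CPU_FOREACH(cpu) {
        /* sleep throttle_ratio of each 10ms timeslice; the value is
           in microseconds, as g_usleep() expects */
        cpu->throttling_time = throttle_ratio * CPU_THROTTLE_TIMESLICE * 1000;
    }
}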

It adds an if to the fast path, but we do a call into the hypervisor
each time anyway, so it shouldn't be too bad, no?

For TCG/Xen we would have to search for a different place to put this
code, but you get the idea.  The reason for putting it here is that this
is where the iothread lock is already dropped, so we don't lose any
locking; any other place we put it needs review in that respect.


Jason, I am really, really sorry to have opened this big can of worms on
your patch.  Now you know why I was saying that "improving"
auto-converge was not as easy as it looked.

With respect to the last comment, I also think that we can include
Jason's patches.  They are an improvement over what we have now; it is
just that they need further improvements.

Later, Juan.

Comments

Jason J. Herne June 3, 2015, 6:02 p.m. UTC | #1
On 06/03/2015 03:56 AM, Juan Quintela wrote:
> "Jason J. Herne" <jjherne@linux.vnet.ibm.com> wrote:
>> ...
>
> We are checking for throttling on each cpu every 10ms.
> But in patch 2 we can see that we only change the throttling each time
> we call migration_bitmap_sync(), which only happens once per round
> through all the pages. Normally auto-converge only matters for machines
> with lots of memory, so each round is going to take much longer than
> 10ms (the old code changed it every 4 passes; you changed it to every 2
> passes, and you add 0.2 each time).  I think I would prefer to just
> change it every single pass, but add 0.1 per pass?  Simpler, and the end
> result would be the same?
>
>

Well, we certainly could make it run every pass, but I think it would
get a little too aggressive then. The reason is, we do not increase the
throttle rate by adding 0.2 each time; we increase it by multiplying the
current rate by 2. So doing that every pass would double the exponential
growth rate. I will admit the numbers I chose are hardly scientific... I
chose them because they seemed to provide a decent balance of "throttling
aggression" in my workloads.
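
(Concretely: starting from 0.2 and doubling every 2 passes gives 0.2,
0.4, 0.8 after 0, 2 and 4 passes, i.e. 0.2 * 2^(n/2) after n passes;
doubling every single pass would already be at 0.8 after only 2 passes,
i.e. 0.2 * 2^n.)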

> Could we change this to something like:
>
> [...]
>
> And the code that decides the throttling now just sets what
> throttling_time and sleeping_time should be?  No need to drop the lock
> or anything like that.
>
> [...]
>
> With respect to the last comment, I also think that we can include
> Jason's patches.  They are an improvement over what we have now; it is
> just that they need further improvements.
>

Yes, I understand what you are suggesting here. I can certainly look
into it if you would prefer that to accepting the current design. The
reason I did things with the timer and thread is that they fit into the
old auto-converge code very nicely. Does it make sense to go forward
with my current design (modified as per your earlier suggestions), and
then follow up with your proposal at a later date?
Dr. David Alan Gilbert June 3, 2015, 6:03 p.m. UTC | #2
* Jason J. Herne (jjherne@linux.vnet.ibm.com) wrote:
> On 06/03/2015 03:56 AM, Juan Quintela wrote:
> >"Jason J. Herne" <jjherne@linux.vnet.ibm.com> wrote:
> >>Provide a method to throttle guest cpu execution. CPUState is augmented with
> >>timeout controls and throttle start/stop functions. To throttle the guest cpu
> >>the caller simply has to call the throttle start function and provide a ratio of
> >>sleep time to normal execution time.
> >>
> >>Signed-off-by: Jason J. Herne <jjherne@linux.vnet.ibm.com>
> >>Reviewed-by: Matthew Rosato <mjrosato@linux.vnet.ibm.com>
> >
> >
> >
> >Hi
> >
> >sorry that I replied to your previous submission, I had "half done" the
> >mail from yesterday, I still think that all applies.
> >
> >
> >>---
> >>  cpus.c            | 62 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>  include/qom/cpu.h | 46 +++++++++++++++++++++++++++++++++++++++++
> >>  2 files changed, 108 insertions(+)
> >>
> >>diff --git a/cpus.c b/cpus.c
> >>index de6469f..7568357 100644
> >>--- a/cpus.c
> >>+++ b/cpus.c
> >>@@ -64,6 +64,9 @@
> >>
> >>  #endif /* CONFIG_LINUX */
> >>
> >>+/* Number of ms between cpu throttle operations */
> >>+#define CPU_THROTTLE_TIMESLICE 10
> >
> >
> >We are checking for throotling on each cpu each 10ms.
> >But on patch 2 we can see that we only change the throotling each
> >time that we call migration_bitmap_sync(), that only happens each round
> >through all the pages. Normally auto-converge only matters for machines
> >with lots of memory, so this is going to happen each more than 10ms (we
> >change it each 4 passes).  You changed it to each 2 passes, and you add
> >it a 0.2.  I think  that I would preffer to just have it each single
> >pass, but add a 0.1 each pass?  simpler and end result would be the same?
> >
> >
> 
> Well, we certainly could make it run every pass but I think it would get
> a little too aggressive then. The reason is, we do not increment the
> throttle
> rate by adding 0.2 each time. We increment it by multiplying the current
> rate
> by 2. So by doing that every pass we are doubling the exponential growth
> rate. I will admit the numbers I chose are hardly scientific... I chose them
> because they seemed to provide a decent balance of "throttling aggression"
> in
> my workloads.

That's the advantage of making them parameters.

Dave

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Jason J. Herne June 3, 2015, 6:11 p.m. UTC | #3
On 06/03/2015 02:03 PM, Dr. David Alan Gilbert wrote:
> * Jason J. Herne (jjherne@linux.vnet.ibm.com) wrote:
>> On 06/03/2015 03:56 AM, Juan Quintela wrote:
>>> ...
>>
>> Well, we certainly could make it run every pass, but I think it would
>> get a little too aggressive then. ...
>
> That's the advantage of making them parameters.

I see your point. Expecting the user to configure these parameters
seems a bit much. But I guess, in theory, it is better to have the
ability to change them and not need it than to need it and not have it,
right?

So, as you stated earlier, these should hook into MigrationParams
somehow? I'll admit this is the first time I've seen this construct. If
that is the optimal location for the two controls (x-throttle-initial,
x-throttle-multiplier?) I can add them there. Shall I keep defaults of
0.2 for initial and 2.0 for multiplier (is there a better name?)?

...
Jason J. Herne June 8, 2015, 7:07 p.m. UTC | #4
On 06/03/2015 02:11 PM, Jason J. Herne wrote:
> On 06/03/2015 02:03 PM, Dr. David Alan Gilbert wrote:
>> * Jason J. Herne (jjherne@linux.vnet.ibm.com) wrote:
> ...
>
>> That's the advantage of making them parameters.
>
> I see your point. Expecting the user to configure these parameters
> seems a bit much. ...
>
> So, as you stated earlier, these should hook into MigrationParams
> somehow? I'll admit this is the first time I've seen this construct. If
> that is the optimal location for the two controls (x-throttle-initial,
> x-throttle-multiplier?) I can add them there. Shall I keep defaults of
> 0.2 for initial and 2.0 for multiplier (is there a better name?)?
>

So I'm attempting to add the initial throttle value and the multiplier
to MigrationParameters and I've come across a problem:
hmp_migrate_set_parameter assumes all parameters are ints. Apparently
floating point is not allowed...

     void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict)
     {
         const char *param = qdict_get_str(qdict, "parameter");
         int value = qdict_get_int(qdict, "value");

Also, from hmp-commands.hx:

     {
         .name       = "migrate_set_parameter",
         .args_type  = "parameter:s,value:i",
         .params     = "parameter value",
         .help       = "Set the parameter for migration",
         .mhandler.cmd = hmp_migrate_set_parameter,
         .command_completion = migrate_set_parameter_completion,
     },

I'm hoping someone already has an idea for dealing with this problem? If 
not, I suppose this is a good add-on for Dave's discussion on 
redesigning MigrationParameters.
Dr. David Alan Gilbert June 9, 2015, 8:06 a.m. UTC | #5
* Jason J. Herne (jjherne@linux.vnet.ibm.com) wrote:
> On 06/03/2015 02:11 PM, Jason J. Herne wrote:
> ...
>
> So I'm attempting to add the initial throttle value and the multiplier
> to MigrationParameters and I've come across a problem:
> hmp_migrate_set_parameter assumes all parameters are ints. Apparently
> floating point is not allowed...
>
> ...
>
> I'm hoping someone already has an idea for dealing with this problem? If
> not, I suppose this is a good add-on for Dave's discussion on redesigning
> MigrationParameters.

Oh, that's yet another problem; I hadn't thought about this one.
I don't think the suggestions I had in the previous mail would help with
that one either; it might work if you flipped the type to 's' and then
parsed that in the HMP code.
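
Something like this, maybe (untested sketch):

    {
        .name       = "migrate_set_parameter",
        .args_type  = "parameter:s,value:s",
        ...
    },

and then in hmp_migrate_set_parameter():

    const char *valuestr = qdict_get_str(qdict, "value");
    char *endp;
    double value = strtod(valuestr, &endp);

    if (*endp != '\0') {
        monitor_printf(mon, "Invalid value '%s'\n", valuestr);
        return;
    }
    /* then range-check per parameter, as the int code does now */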

(cc'ing Markus in)
Dave

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Jason J. Herne June 9, 2015, 3:14 p.m. UTC | #6
On 06/09/2015 04:06 AM, Dr. David Alan Gilbert wrote:
> * Jason J. Herne (jjherne@linux.vnet.ibm.com) wrote:
>> ...
>>
>> So I'm attempting to add the initial throttle value and the multiplier
>> to MigrationParameters and I've come across a problem:
>> hmp_migrate_set_parameter assumes all parameters are ints. Apparently
>> floating point is not allowed...
>>
>> ...
>
> Oh, that's yet another problem; I hadn't thought about this one.
> I don't think the suggestions I had in the previous mail would help with
> that one either; it might work if you flipped the type to 's' and then
> parsed that in the HMP code.
>

I could change it to a string, and then parse the data on a case-by-case 
basis in the switch/case logic. I feel like this is making a bad 
situation worse... But I don't see an easy way around it.
Dr. David Alan Gilbert June 10, 2015, 8:45 a.m. UTC | #7
* Jason J. Herne (jjherne@linux.vnet.ibm.com) wrote:
> On 06/09/2015 04:06 AM, Dr. David Alan Gilbert wrote:
> ...
>
> I could change it to a string, and then parse the data on a case-by-case
> basis in the switch/case logic. I feel like this is making a bad situation
> worse... But I don't see an easy way around it.

I think the easiest 'solution' for this is to make the parameter an
integer percentage rather than a float.  Not that pretty, but easier
than fighting the interface code.
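
(E.g. x-throttle-initial=20 instead of 0.2, with the multiplier staying
a plain integer 2.)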

Dave


--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Jason J. Herne June 10, 2015, 12:03 p.m. UTC | #8
On 06/10/2015 04:45 AM, Dr. David Alan Gilbert wrote:
> * Jason J. Herne (jjherne@linux.vnet.ibm.com) wrote:
>> ...
>>
>> I could change it to a string, and then parse the data on a case-by-case
>> basis in the switch/case logic. I feel like this is making a bad situation
>> worse... But I don't see an easy way around it.
>
> I think the easiest 'solution' for this is to make the parameter an
> integer percentage rather than a float.  Not that pretty, but easier
> than fighting the interface code.
>

Yes, I'm starting to feel this way :). Another option I'd like to
collect opinions on is changing hmp's migrate_set_parameter to accept
argument type O. As per monitor.c:

  * 'O'          option string of the form NAME=VALUE,...
  *              parsed according to QemuOptsList given by its name
  *              Example: 'device:O' uses qemu_device_opts.
  *              Restriction: only lists with empty desc are supported

This would allow arbitrary data types and allow several parameters to be
set at once, right? It looks like it would be a relatively
straightforward change to make. Opinions?
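
Roughly (untested, and assuming we register a QemuOptsList with an empty
desc for migration parameters; the "migrate_params" name is made up):

    {
        .name       = "migrate_set_parameter",
        .args_type  = "migrate_params:O",
        .params     = "parameter=value,...",
        .help       = "Set parameters for migration",
        .mhandler.cmd = hmp_migrate_set_parameter,
    },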
Markus Armbruster June 11, 2015, 6:38 a.m. UTC | #9
"Jason J. Herne" <jjherne@linux.vnet.ibm.com> writes:

> On 06/10/2015 04:45 AM, Dr. David Alan Gilbert wrote:
> ...
>>> I could change it to a string, and then parse the data on a case-by-case
>>> basis in the switch/case logic. I feel like this is making a bad situation
>>> worse... But I don't see an easy way around it.

Ugh.

Differently ugh: make value's declared type a double, then reject
unwanted values depending on the parameter string.
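
I.e., untested, something like

    {
        .name       = "migrate_set_parameter",
        .args_type  = "parameter:s,value:T",
        ...
    },

with

    double value = qdict_get_double(qdict, "value");

in hmp_migrate_set_parameter(), and per-parameter checks that reject
values outside each parameter's acceptable range.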

>> I think the easiest 'solution' for this is to make the parameter an
>> integer percentage rather than a float.  Not that pretty, but easier
>> than fighting the interface code.

If you can make do with int, you don't have to change the HMP command.

If I understand your problem correctly (I only skimmed, sorry), we're
talking about two parameters: an initial throttling factor (range (0,1],
percentage should do) and a "throttling multiplier", but I didn't quite
get what it's supposed to do, or what its acceptable range would be.

> Yes, I'm starting to feel this way :). Another option I'd like to
> collect opinions on is changing hmp's migrate_set_parameter to accept
> argument type O. As per monitor.c:
>
>  * 'O'          option string of the form NAME=VALUE,...
>  *              parsed according to QemuOptsList given by its name
>  *              Example: 'device:O' uses qemu_device_opts.
>  *              Restriction: only lists with empty desc are supported
>
> This would allow arbitrary data types and allow several parameters to
> be set at once, right? It looks like it would be a relatively
> straightforward change to make. Opinions?

Incompatible change.  "migrate_set_parameter P V" becomes
"migrate_set_parameter P=V".  I guess we could accept that.

I'd prefer that to doing the parsing yourself, because it has a better
chance at keeping the parsing consistent.

Patch

diff --git a/kvm-all.c b/kvm-all.c
index 17a3771..ae9635f 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1787,6 +1787,7 @@  int kvm_cpu_exec(CPUState *cpu)
 {
     struct kvm_run *run = cpu->kvm_run;
     int ret, run_ret;
+    long throttling_time;

     DPRINTF("kvm_cpu_exec()\n");

@@ -1813,8 +1814,15 @@  int kvm_cpu_exec(CPUState *cpu)
              */
             qemu_cpu_kick_self();
         }
+        throttling_time = cpu->throttling_time;
+        cpu->throttling_time = 0;
+        cpu->sleeping_time += throttling_time;
         qemu_mutex_unlock_iothread();

+
+        if (throttling_time) {
+            g_usleep(throttling_time);
+        }
         run_ret = kvm_vcpu_ioctl(cpu, KVM_RUN, 0);

         qemu_mutex_lock_iothread();