diff mbox

[qemu,2/2] kvmclock: reduce kvmclock difference on migration

Message ID 20161114154015.GA30048@amt.cnet
State New
Headers show

Commit Message

Marcelo Tosatti Nov. 14, 2016, 3:40 p.m. UTC
On Mon, Nov 14, 2016 at 04:00:38PM +0100, Paolo Bonzini wrote:
> 
> 
> On 14/11/2016 15:50, Marcelo Tosatti wrote:
> > Well, i didnt want to mix the meaning of the variables:
> > 
> > +    /* whether machine supports reliable KVM_GET_CLOCK */
> > +    bool mach_use_reliable_get_clock;
> > +
> > +    /* whether source host supported reliable KVM_GET_CLOCK */
> > +    bool src_use_reliable_get_clock;
> > 
> > See the comments on top (later if you look at the variable, 
> > then have to think: well it has one name, but its disabled 
> > by that other path as well, so its more than its 
> > name,etc...).
> > 
> >> I'm thinking "mach_use_reliable_get_clock is just for migration,
> > 
> > Thats whether the machine supports it. New machines have it enabled,
> > olders don't.
> 
> Yes.
> 
> >> src_use_reliable_get_clock is the state". 
> > 
> > Thats whether the migration source supported it.
> 
> But it's not used only for migration.  It's used on every vmstate change
> (running->stop and stop->running, isn't it?  

No. Its used only if s->clock_valid == false, which means either:

migration/savevm/init.

Now for savevm there is a bug: 

> I think that, apart from
> the migration case, it's better to use s->clock if kvmclock is stable,
> even on older machine types.

Yes, its already doing that:

        /* local (running VM) restore */
        if (s->clock_valid) {
            /*
             * if host does not support reliable KVM_GET_CLOCK,
             * read kvmclock value from memory
             */
            if (!kvm_has_adjust_clock_stable()) {
                time_at_migration = kvmclock_current_nsec(s);
            }

It only uses kvmclock_current_nsec() if kvm_has_adjust_clock_stable is 
false.

> >>>>> +static bool kvmclock_src_use_reliable_get_clock(void *opaque)
> >>>>> +{
> >>>>> +    KVMClockState *s = opaque;
> >>>>> +
> >>>>> +    /*
> >>>>> +     * On machine types that support reliable KVM_GET_CLOCK,
> >>>>> +     * if host kernel does provide reliable KVM_GET_CLOCK,
> >>>>> +     * set src_use_reliable_get_clock=true so that destination
> >>>>> +     * avoids reading kvmclock from memory.
> >>>>> +     */
> >>>>> +    if (s->mach_use_reliable_get_clock && kvm_has_adjust_clock_stable()) {
> >>>>> +        s->src_use_reliable_get_clock = true;
> >>>>> +    }
> >>>>> +
> >>>>> +    return s->src_use_reliable_get_clock;
> >>>>> +}
> >>>>
> >>>> Here you can just return s->mach_use_reliable_get_clock. 
> >>>
> >>> mach_use_reliable_get_clock can be true but host might not support it.
> >>
> >> Yes, but the "needed" function is only required to avoid breaking
> >> pc-i440fx-2.7 and earlier. 
> > 
> > "needed" is required so that the migration between:
> > 
> > SRC             DEST                BEHAVIOUR
> > ~support        supports            on migration read from guest,
> >                                     on stop/cont use
> >                                     kvm_get_clock/kvm_set_clock
> > 
> > Destination does not use KVM_GET_CLOCK value (which is
> > broken and should not be used).
> 
> If needed returns false, the destination will see
> src_use_reliable_get_clock = false anyway.

Causing it to read the clock from memory, thats what 
should happen.

> If needed returns true, the destination can still see
> src_use_reliable_get_clock = false if that's the value.

You are asking me to change from:

+static bool kvmclock_src_use_reliable_get_clock(void *opaque)
+{
+    KVMClockState *s = opaque;
+
+    /*
+     * On machine types that support reliable KVM_GET_CLOCK,
+     * if host kernel does provide reliable KVM_GET_CLOCK,
+     * set src_use_reliable_get_clock=true so that destination
+     * avoids reading kvmclock from memory.
+     */
+    if (s->mach_use_reliable_get_clock && kvm_has_adjust_clock_stable()) {
+        s->src_use_reliable_get_clock = true;
+    }
+
+    return s->src_use_reliable_get_clock;
+}

to

static bool kvmclock_src_use_reliable_get_clock(void *opaque)
{
    KVMClockState *s = opaque;

    /*
     * On machine types that support reliable KVM_GET_CLOCK,
     * if host kernel does provide reliable KVM_GET_CLOCK,
     * set src_use_reliable_get_clock=true so that destination
     * avoids reading kvmclock from memory.
     */
    if (s->mach_use_reliable_get_clock && kvm_has_adjust_clock_stable())
    {
        s->src_use_reliable_get_clock = true;
    }

    return s->mach_use_reliable_get_clock;
}


Ah, OK, done.


> needed	src_use_reliable_get_clock	effect
> false	false				use kvmclock_current_nsec
> false	true				use kvmclock_current_nsec
> true	false				use kvmclock_current_nsec
> true	true				use s->clock
> 
> 
> So the idea is:
> 
> - set s->clock and s->reliable_get_clock on every KVM_GET_CLOCK

Already do that (apart from s->clock_valid which is necessary).

> - on migration source, use subsections and
> s->mach_use_reliable_get_clock to avoid breaking migration on pre-2.8
> machine types

Migration is already broken when you use different machine types.

So s->src_use_reliable_get_clock is only used to indicate 
to the destination that: "you can use KVM_GET_CLOCK value, 
its safe".

> - on migration destination, use .pre_load so that s->reliable_get_clock
> is initialized to false on older machine types

Thats what mach_use_reliable_get_clock does already:

static Property kvmclock_properties[] = {
    DEFINE_PROP_BOOL("mach_use_reliable_get_clock", KVMClockState,
                      mach_use_reliable_get_clock, true),
    DEFINE_PROP_END_OF_LIST(),
};

+        .driver   = "kvmclock",\
+        .property = "mach_use_reliable_get_clock",\
+        .value    = "off",\
+    },\
+    {\

> 
> >> If you return true here, you can still
> >> migrate a "false" value for src_use_reliable_get_clock.
> > 
> > But the source only uses a reliable KVM_GET_CLOCK if 
> > both conditions are true.
> > 
> > And the subsection is only needed if the source
> > uses a reliable KVM_GET_CLOCK.
> 
> Yes, but the point of the subsection is just to avoid breaking migration
> format.

Its a new creative use of the subsection. 

> >> It is the same as "is using masterclock", which is actually a stricter
> >> condition than the KVM_CHECK_EXTENSION return value.  The right check to
> >> use is whether masterclock is in use, 
> > 
> > Actually its "has a reliable KVM_GET_CLOCK" (which returns 
> > get_kernel_clock() + (rdtsc() - tsc_timestamp), 
> > 
> > "broken KVM_GET_CLOCK" =  get_kernel_clock()
> 
> Yes.  And these end up being the same.

No. You can have masterclock enabled, KVM_TSC_STABLE set in pvclock
flags, and unreliable KVM_GET_CLOCK (without your patch to
KVM_GET_CLOCK).


Paolo not sure if there is anything further you want me to 
change at this point ?

Comments

Paolo Bonzini Nov. 14, 2016, 4:43 p.m. UTC | #1
On 14/11/2016 16:40, Marcelo Tosatti wrote:
> static bool kvmclock_src_use_reliable_get_clock(void *opaque)
> {
>     KVMClockState *s = opaque;
> 
>     /*
>      * On machine types that support reliable KVM_GET_CLOCK,
>      * if host kernel does provide reliable KVM_GET_CLOCK,
>      * set src_use_reliable_get_clock=true so that destination
>      * avoids reading kvmclock from memory.
>      */
>     if (s->mach_use_reliable_get_clock && kvm_has_adjust_clock_stable())
>     {
>         s->src_use_reliable_get_clock = true;
>     }
> 
>     return s->mach_use_reliable_get_clock;
> }
> 
> 
> Ah, OK, done.

s->src_use_reliable_get_clock should not be set with
KVM_CHECK_EXTENSION, but rather from the flags returned by KVM_GET_CLOCK.

> So s->src_use_reliable_get_clock is only used to indicate 
> to the destination that: "you can use KVM_GET_CLOCK value, 
> its safe".

Yes, we agree.  I was listing all the points, not just those where we
disagree.  Actually I'm not sure where we disagree, except on using
flags from KVM_CHECK_EXTENSION vs. flags from KVM_GET_CLOCK...

Paolo
Marcelo Tosatti Nov. 14, 2016, 5:13 p.m. UTC | #2
On Mon, Nov 14, 2016 at 05:43:33PM +0100, Paolo Bonzini wrote:
> 
> 
> On 14/11/2016 16:40, Marcelo Tosatti wrote:
> > static bool kvmclock_src_use_reliable_get_clock(void *opaque)
> > {
> >     KVMClockState *s = opaque;
> > 
> >     /*
> >      * On machine types that support reliable KVM_GET_CLOCK,
> >      * if host kernel does provide reliable KVM_GET_CLOCK,
> >      * set src_use_reliable_get_clock=true so that destination
> >      * avoids reading kvmclock from memory.
> >      */
> >     if (s->mach_use_reliable_get_clock && kvm_has_adjust_clock_stable())
> >     {
> >         s->src_use_reliable_get_clock = true;
> >     }
> > 
> >     return s->mach_use_reliable_get_clock;
> > }
> > 
> > 
> > Ah, OK, done.
> 
> s->src_use_reliable_get_clock should not be set with
> KVM_CHECK_EXTENSION, but rather from the flags returned by KVM_GET_CLOCK.

Well, thats not right: What matters is the presence of get_kvmclock_ns 
which returns a value that the guest sees. 

                get_kernel_monotonic_clock() + kvmclock_offset +
                (rdtsc() - tsc_timestamp)

IOW what the guest sees. And you changed that in 

commit 108b249c453dd7132599ab6dc7e435a7036c193f
Author: Paolo Bonzini <pbonzini@redhat.com>
Date:   Thu Sep 1 14:21:03 2016 +0200

    KVM: x86: introduce get_kvmclock_ns

And the correct behaviour (once KVM_GET_CLOCK is fixed per 
previous message to return rdtsc - tsc_timestamp for the 
non masterclock case) depends on this commit above, 
not on masterclock.

> > So s->src_use_reliable_get_clock is only used to indicate 
> > to the destination that: "you can use KVM_GET_CLOCK value, 
> > its safe".
> 
> Yes, we agree.  I was listing all the points, not just those where we
> disagree.  Actually I'm not sure where we disagree, except on using
> flags from KVM_CHECK_EXTENSION vs. flags from KVM_GET_CLOCK...
> 
> Paolo
Paolo Bonzini Nov. 14, 2016, 5:20 p.m. UTC | #3
On 14/11/2016 18:13, Marcelo Tosatti wrote:
> On Mon, Nov 14, 2016 at 05:43:33PM +0100, Paolo Bonzini wrote:
>>
>>
>> On 14/11/2016 16:40, Marcelo Tosatti wrote:
>>> static bool kvmclock_src_use_reliable_get_clock(void *opaque)
>>> {
>>>     KVMClockState *s = opaque;
>>>
>>>     /*
>>>      * On machine types that support reliable KVM_GET_CLOCK,
>>>      * if host kernel does provide reliable KVM_GET_CLOCK,
>>>      * set src_use_reliable_get_clock=true so that destination
>>>      * avoids reading kvmclock from memory.
>>>      */
>>>     if (s->mach_use_reliable_get_clock && kvm_has_adjust_clock_stable())
>>>     {
>>>         s->src_use_reliable_get_clock = true;
>>>     }
>>>
>>>     return s->mach_use_reliable_get_clock;
>>> }
>>>
>>>
>>> Ah, OK, done.
>>
>> s->src_use_reliable_get_clock should not be set with
>> KVM_CHECK_EXTENSION, but rather from the flags returned by KVM_GET_CLOCK.
> 
> Well, thats not right: What matters is the presence of get_kvmclock_ns 
> which returns a value that the guest sees. 
> 
>                 get_kernel_monotonic_clock() + kvmclock_offset +
>                 (rdtsc() - tsc_timestamp)
> 
> IOW what the guest sees. And you changed that in 
> 
> commit 108b249c453dd7132599ab6dc7e435a7036c193f
> Author: Paolo Bonzini <pbonzini@redhat.com>
> Date:   Thu Sep 1 14:21:03 2016 +0200
> 
>     KVM: x86: introduce get_kvmclock_ns
> 
> And the correct behaviour (once KVM_GET_CLOCK is fixed per 
> previous message to return rdtsc - tsc_timestamp for the 
> non masterclock case) depends on this commit above, 
> not on masterclock.

This commit in turn only gets the correct behavior if 
"vcpu->arch.hv_clock.flags & PVCLOCK_TSC_STABLE_BIT" (and it will be 
changed soon to ka->use_masterclock).  KVM_CHECK_EXTENSION can still 
return KVM_CLOCK_TSC_STABLE even if the masterclock is disabled, 
because KVM_CHECK_EXTENSION only tells you which flags are known for
this version of the KVM module.

To see if the masterclock is enabled _now_, you need to check what
KVM_GET_CLOCK sets in the flags.  From the KVM_CLOCK_TSC_STABLE patch:

		user_ns.flags = kvm->arch.use_master_clock ? KVM_CLOCK_TSC_STABLE : 0;

Thanks,

Paolo

>>> So s->src_use_reliable_get_clock is only used to indicate 
>>> to the destination that: "you can use KVM_GET_CLOCK value, 
>>> its safe".
>>
>> Yes, we agree.  I was listing all the points, not just those where we
>> disagree.  Actually I'm not sure where we disagree, except on using
>> flags from KVM_CHECK_EXTENSION vs. flags from KVM_GET_CLOCK...
>>
>> Paolo
Marcelo Tosatti Nov. 14, 2016, 6:15 p.m. UTC | #4
On Mon, Nov 14, 2016 at 06:20:29PM +0100, Paolo Bonzini wrote:
> 
> 
> On 14/11/2016 18:13, Marcelo Tosatti wrote:
> > On Mon, Nov 14, 2016 at 05:43:33PM +0100, Paolo Bonzini wrote:
> >>
> >>
> >> On 14/11/2016 16:40, Marcelo Tosatti wrote:
> >>> static bool kvmclock_src_use_reliable_get_clock(void *opaque)
> >>> {
> >>>     KVMClockState *s = opaque;
> >>>
> >>>     /*
> >>>      * On machine types that support reliable KVM_GET_CLOCK,
> >>>      * if host kernel does provide reliable KVM_GET_CLOCK,
> >>>      * set src_use_reliable_get_clock=true so that destination
> >>>      * avoids reading kvmclock from memory.
> >>>      */
> >>>     if (s->mach_use_reliable_get_clock && kvm_has_adjust_clock_stable())
> >>>     {
> >>>         s->src_use_reliable_get_clock = true;
> >>>     }
> >>>
> >>>     return s->mach_use_reliable_get_clock;
> >>> }
> >>>
> >>>
> >>> Ah, OK, done.
> >>
> >> s->src_use_reliable_get_clock should not be set with
> >> KVM_CHECK_EXTENSION, but rather from the flags returned by KVM_GET_CLOCK.
> > 
> > Well, thats not right: What matters is the presence of get_kvmclock_ns 
> > which returns a value that the guest sees. 
> > 
> >                 get_kernel_monotonic_clock() + kvmclock_offset +
> >                 (rdtsc() - tsc_timestamp)
> > 
> > IOW what the guest sees. And you changed that in 
> > 
> > commit 108b249c453dd7132599ab6dc7e435a7036c193f
> > Author: Paolo Bonzini <pbonzini@redhat.com>
> > Date:   Thu Sep 1 14:21:03 2016 +0200
> > 
> >     KVM: x86: introduce get_kvmclock_ns
> > 
> > And the correct behaviour (once KVM_GET_CLOCK is fixed per 
> > previous message to return rdtsc - tsc_timestamp for the 
> > non masterclock case) depends on this commit above, 
> > not on masterclock.
> 
> This commit in turn only gets the correct behavior if 
> "vcpu->arch.hv_clock.flags & PVCLOCK_TSC_STABLE_BIT" (and it will be 
> changed soon to ka->use_masterclock).  KVM_CHECK_EXTENSION can still 
> return KVM_CLOCK_TSC_STABLE even if the masterclock is disabled, 
> because KVM_CHECK_EXTENSION only tells you which flags are known for
> this version of the KVM module.

What QEMU wants is to use KVM_GET_CLOCK at pre_save independently
of whether masterclock is enabled or not... it just depends
on KVM_GET_CLOCK being correct for the masterclock case
(108b249c453dd7132599ab6dc7e435a7036c193f).

So a "reliable KVM_GET_CLOCK" (that does not timebackward
when masterclock is enabled) is much simpler to userspace
than "whether masterclock is enabled or not".

If you have a reason why that should not be the case,
let me know.

> To see if the masterclock is enabled _now_, you need to check what
> KVM_GET_CLOCK sets in the flags.  From the KVM_CLOCK_TSC_STABLE patch:
> 
> 		user_ns.flags = kvm->arch.use_master_clock ? KVM_CLOCK_TSC_STABLE : 0;

Again, whether masterclock is enable is independent of 
being able to use KVM_GET_CLOCK at pre_save.
Marcelo Tosatti Nov. 17, 2016, 12:16 p.m. UTC | #5
On Mon, Nov 14, 2016 at 04:15:18PM -0200, Marcelo Tosatti wrote:
> On Mon, Nov 14, 2016 at 06:20:29PM +0100, Paolo Bonzini wrote:
> > 
> > 
> > On 14/11/2016 18:13, Marcelo Tosatti wrote:
> > > On Mon, Nov 14, 2016 at 05:43:33PM +0100, Paolo Bonzini wrote:
> > >>
> > >>
> > >> On 14/11/2016 16:40, Marcelo Tosatti wrote:
> > >>> static bool kvmclock_src_use_reliable_get_clock(void *opaque)
> > >>> {
> > >>>     KVMClockState *s = opaque;
> > >>>
> > >>>     /*
> > >>>      * On machine types that support reliable KVM_GET_CLOCK,
> > >>>      * if host kernel does provide reliable KVM_GET_CLOCK,
> > >>>      * set src_use_reliable_get_clock=true so that destination
> > >>>      * avoids reading kvmclock from memory.
> > >>>      */
> > >>>     if (s->mach_use_reliable_get_clock && kvm_has_adjust_clock_stable())
> > >>>     {
> > >>>         s->src_use_reliable_get_clock = true;
> > >>>     }
> > >>>
> > >>>     return s->mach_use_reliable_get_clock;
> > >>> }
> > >>>
> > >>>
> > >>> Ah, OK, done.
> > >>
> > >> s->src_use_reliable_get_clock should not be set with
> > >> KVM_CHECK_EXTENSION, but rather from the flags returned by KVM_GET_CLOCK.
> > > 
> > > Well, thats not right: What matters is the presence of get_kvmclock_ns 
> > > which returns a value that the guest sees. 
> > > 
> > >                 get_kernel_monotonic_clock() + kvmclock_offset +
> > >                 (rdtsc() - tsc_timestamp)
> > > 
> > > IOW what the guest sees. And you changed that in 
> > > 
> > > commit 108b249c453dd7132599ab6dc7e435a7036c193f
> > > Author: Paolo Bonzini <pbonzini@redhat.com>
> > > Date:   Thu Sep 1 14:21:03 2016 +0200
> > > 
> > >     KVM: x86: introduce get_kvmclock_ns
> > > 
> > > And the correct behaviour (once KVM_GET_CLOCK is fixed per 
> > > previous message to return rdtsc - tsc_timestamp for the 
> > > non masterclock case) depends on this commit above, 
> > > not on masterclock.
> > 
> > This commit in turn only gets the correct behavior if 
> > "vcpu->arch.hv_clock.flags & PVCLOCK_TSC_STABLE_BIT" (and it will be 
> > changed soon to ka->use_masterclock).  KVM_CHECK_EXTENSION can still 
> > return KVM_CLOCK_TSC_STABLE even if the masterclock is disabled, 
> > because KVM_CHECK_EXTENSION only tells you which flags are known for
> > this version of the KVM module.
> 
> What QEMU wants is to use KVM_GET_CLOCK at pre_save independently
> of whether masterclock is enabled or not... it just depends
> on KVM_GET_CLOCK being correct for the masterclock case
> (108b249c453dd7132599ab6dc7e435a7036c193f).
> 
> So a "reliable KVM_GET_CLOCK" (that does not timebackward
> when masterclock is enabled) is much simpler to userspace
> than "whether masterclock is enabled or not".
> 
> If you have a reason why that should not be the case,
> let me know.
> 
> > To see if the masterclock is enabled _now_, you need to check what
> > KVM_GET_CLOCK sets in the flags.  From the KVM_CLOCK_TSC_STABLE patch:
> > 
> > 		user_ns.flags = kvm->arch.use_master_clock ? KVM_CLOCK_TSC_STABLE : 0;
> 
> Again, whether masterclock is enable is independent of 
> being able to use KVM_GET_CLOCK at pre_save.

Is this point OK ?

Using 

                break;
+   case KVM_CAP_ADJUST_CLOCK:
+               r = KVM_CLOCK_TSC_STABLE;
+               break;

To infer whether KVM_GET_CLOCK is fixed for the monotonic case.
Paolo Bonzini Nov. 17, 2016, 1:03 p.m. UTC | #6
On 17/11/2016 13:16, Marcelo Tosatti wrote:
> On Mon, Nov 14, 2016 at 04:15:18PM -0200, Marcelo Tosatti wrote:
>> On Mon, Nov 14, 2016 at 06:20:29PM +0100, Paolo Bonzini wrote:
>>>
>>>
>>> On 14/11/2016 18:13, Marcelo Tosatti wrote:
>>>> On Mon, Nov 14, 2016 at 05:43:33PM +0100, Paolo Bonzini wrote:
>>>>>
>>>>>
>>>>> On 14/11/2016 16:40, Marcelo Tosatti wrote:
>>>>>> static bool kvmclock_src_use_reliable_get_clock(void *opaque)
>>>>>> {
>>>>>>     KVMClockState *s = opaque;
>>>>>>
>>>>>>     /*
>>>>>>      * On machine types that support reliable KVM_GET_CLOCK,
>>>>>>      * if host kernel does provide reliable KVM_GET_CLOCK,
>>>>>>      * set src_use_reliable_get_clock=true so that destination
>>>>>>      * avoids reading kvmclock from memory.
>>>>>>      */
>>>>>>     if (s->mach_use_reliable_get_clock && kvm_has_adjust_clock_stable())
>>>>>>     {
>>>>>>         s->src_use_reliable_get_clock = true;
>>>>>>     }
>>>>>>
>>>>>>     return s->mach_use_reliable_get_clock;
>>>>>> }
>>>>>>
>>>>>>
>>>>>> Ah, OK, done.
>>>>>
>>>>> s->src_use_reliable_get_clock should not be set with
>>>>> KVM_CHECK_EXTENSION, but rather from the flags returned by KVM_GET_CLOCK.
>>>>
>>>> Well, thats not right: What matters is the presence of get_kvmclock_ns 
>>>> which returns a value that the guest sees. 
>>>>
>>>>                 get_kernel_monotonic_clock() + kvmclock_offset +
>>>>                 (rdtsc() - tsc_timestamp)
>>>>
>>>> IOW what the guest sees. And you changed that in 
>>>>
>>>> commit 108b249c453dd7132599ab6dc7e435a7036c193f
>>>> Author: Paolo Bonzini <pbonzini@redhat.com>
>>>> Date:   Thu Sep 1 14:21:03 2016 +0200
>>>>
>>>>     KVM: x86: introduce get_kvmclock_ns
>>>>
>>>> And the correct behaviour (once KVM_GET_CLOCK is fixed per 
>>>> previous message to return rdtsc - tsc_timestamp for the 
>>>> non masterclock case) depends on this commit above, 
>>>> not on masterclock.
>>>
>>> This commit in turn only gets the correct behavior if 
>>> "vcpu->arch.hv_clock.flags & PVCLOCK_TSC_STABLE_BIT" (and it will be 
>>> changed soon to ka->use_masterclock).  KVM_CHECK_EXTENSION can still 
>>> return KVM_CLOCK_TSC_STABLE even if the masterclock is disabled, 
>>> because KVM_CHECK_EXTENSION only tells you which flags are known for
>>> this version of the KVM module.
>>
>> What QEMU wants is to use KVM_GET_CLOCK at pre_save independently
>> of whether masterclock is enabled or not... it just depends
>> on KVM_GET_CLOCK being correct for the masterclock case
>> (108b249c453dd7132599ab6dc7e435a7036c193f).
>>
>> So a "reliable KVM_GET_CLOCK" (that does not timebackward
>> when masterclock is enabled) is much simpler to userspace
>> than "whether masterclock is enabled or not".
>>
>> If you have a reason why that should not be the case,
>> let me know.
>>
>>> To see if the masterclock is enabled _now_, you need to check what
>>> KVM_GET_CLOCK sets in the flags.  From the KVM_CLOCK_TSC_STABLE patch:
>>>
>>> 		user_ns.flags = kvm->arch.use_master_clock ? KVM_CLOCK_TSC_STABLE : 0;
>>
>> Again, whether masterclock is enable is independent of 
>> being able to use KVM_GET_CLOCK at pre_save.
> 
> Is this point OK ?
> 
> Using 
> 
>                 break;
> +   case KVM_CAP_ADJUST_CLOCK:
> +               r = KVM_CLOCK_TSC_STABLE;
> +               break;
> 
> To infer whether KVM_GET_CLOCK is fixed for the monotonic case.

Yes, I still haven't digested why it is correct (I need to read again
what you wrote), but it is indeed correct to use KVM_CAP_ADJUST_CLOCK
this way.

Paolo
diff mbox

Patch

Index: qemu-mig-advance-clock/hw/i386/kvm/clock.c
===================================================================
--- qemu-mig-advance-clock.orig/hw/i386/kvm/clock.c	2016-11-14 10:40:39.748116312 -0200
+++ qemu-mig-advance-clock/hw/i386/kvm/clock.c	2016-11-14 13:38:29.299955042 -0200
@@ -36,6 +36,12 @@ 
 
     uint64_t clock;
     bool clock_valid;
+
+    /* whether machine supports reliable KVM_GET_CLOCK */
+    bool mach_use_reliable_get_clock;
+
+    /* whether source host supported reliable KVM_GET_CLOCK */
+    bool src_use_reliable_get_clock;
 } KVMClockState;
 
 struct pvclock_vcpu_time_info {
@@ -81,6 +87,19 @@ 
     return nsec + time.system_time;
 }
 
+static uint64_t kvm_get_clock(void)
+{
+    struct kvm_clock_data data;
+    int ret;
+
+    ret = kvm_vm_ioctl(kvm_state, KVM_GET_CLOCK, &data);
+    if (ret < 0) {
+        fprintf(stderr, "KVM_GET_CLOCK failed: %s\n", strerror(ret));
+                abort();
+    }
+    return data.clock;
+}
+
 static void kvmclock_vm_state_change(void *opaque, int running,
                                      RunState state)
 {
@@ -91,15 +110,37 @@ 
 
     if (running) {
         struct kvm_clock_data data = {};
-        uint64_t time_at_migration = kvmclock_current_nsec(s);
+        uint64_t pvclock_via_mem = 0;
 
-        s->clock_valid = false;
+        /* local (running VM) restore */
+        if (s->clock_valid) {
+            /*
+             * if host does not support reliable KVM_GET_CLOCK,
+             * read kvmclock value from memory
+             */
+            if (!kvm_has_adjust_clock_stable()) {
+                pvclock_via_mem = kvmclock_current_nsec(s);
+            }
+        /* migration/savevm/init restore */
+        } else {
+            /*
+             * use s->clock in case machine uses reliable
+             * get clock and source host supported
+             * reliable get clock
+             */
+            if (!(s->mach_use_reliable_get_clock &&
+                  s->src_use_reliable_get_clock)) {
+                pvclock_via_mem = kvmclock_current_nsec(s);
+            }
+        }
 
-        /* We can't rely on the migrated clock value, just discard it */
-        if (time_at_migration) {
-            s->clock = time_at_migration;
+        /* We can't rely on the saved clock value, just discard it */
+        if (pvclock_via_mem) {
+            s->clock = pvclock_via_mem;
         }
 
+        s->clock_valid = false;
+
         data.clock = s->clock;
         ret = kvm_vm_ioctl(kvm_state, KVM_SET_CLOCK, &data);
         if (ret < 0) {
@@ -120,8 +161,6 @@ 
             }
         }
     } else {
-        struct kvm_clock_data data;
-        int ret;
 
         if (s->clock_valid) {
             return;
@@ -129,13 +168,7 @@ 
 
         kvm_synchronize_all_tsc();
 
-        ret = kvm_vm_ioctl(kvm_state, KVM_GET_CLOCK, &data);
-        if (ret < 0) {
-            fprintf(stderr, "KVM_GET_CLOCK failed: %s\n", strerror(ret));
-            abort();
-        }
-        s->clock = data.clock;
-
+        s->clock = kvm_get_clock();
         /*
          * If the VM is stopped, declare the clock state valid to
          * avoid re-reading it on next vmsave (which would return
@@ -152,22 +185,69 @@ 
     qemu_add_vm_change_state_handler(kvmclock_vm_state_change, s);
 }
 
+static bool kvmclock_src_use_reliable_get_clock(void *opaque)
+{
+    KVMClockState *s = opaque;
+
+    /*
+     * On machine types that support reliable KVM_GET_CLOCK,
+     * if host kernel does provide reliable KVM_GET_CLOCK,
+     * set src_use_reliable_get_clock=true so that destination
+     * avoids reading kvmclock from memory.
+     */
+    if (s->mach_use_reliable_get_clock && kvm_has_adjust_clock_stable()) {
+        s->src_use_reliable_get_clock = true;
+    }
+
+    return s->mach_use_reliable_get_clock;
+}
+
+static const VMStateDescription kvmclock_reliable_get_clock = {
+    .name = "kvmclock/src_use_reliable_get_clock",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .needed = kvmclock_src_use_reliable_get_clock,
+    .fields = (VMStateField[]) {
+        VMSTATE_BOOL(src_use_reliable_get_clock, KVMClockState),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
+static void kvmclock_pre_save(void *opaque)
+{
+    KVMClockState *s = opaque;
+
+    s->clock = kvm_get_clock();
+}
+
 static const VMStateDescription kvmclock_vmsd = {
     .name = "kvmclock",
     .version_id = 1,
     .minimum_version_id = 1,
+    .pre_save = kvmclock_pre_save,
     .fields = (VMStateField[]) {
         VMSTATE_UINT64(clock, KVMClockState),
         VMSTATE_END_OF_LIST()
+    },
+    .subsections = (const VMStateDescription * []) {
+        &kvmclock_reliable_get_clock,
+        NULL
     }
 };
 
+static Property kvmclock_properties[] = {
+    DEFINE_PROP_BOOL("mach_use_reliable_get_clock", KVMClockState,
+                      mach_use_reliable_get_clock, true),
+    DEFINE_PROP_END_OF_LIST(),
+};
+
 static void kvmclock_class_init(ObjectClass *klass, void *data)
 {
     DeviceClass *dc = DEVICE_CLASS(klass);
 
     dc->realize = kvmclock_realize;
     dc->vmsd = &kvmclock_vmsd;
+    dc->props = kvmclock_properties;
 }
 
 static const TypeInfo kvmclock_info = {
Index: qemu-mig-advance-clock/include/hw/i386/pc.h
===================================================================
--- qemu-mig-advance-clock.orig/include/hw/i386/pc.h	2016-11-14 10:40:39.748116312 -0200
+++ qemu-mig-advance-clock/include/hw/i386/pc.h	2016-11-14 10:40:48.059128649 -0200
@@ -370,6 +370,11 @@ 
 #define PC_COMPAT_2_7 \
     HW_COMPAT_2_7 \
     {\
+        .driver   = "kvmclock",\
+        .property = "mach_use_reliable_get_clock",\
+        .value    = "off",\
+    },\
+    {\
         .driver   = TYPE_X86_CPU,\
         .property = "l3-cache",\
         .value    = "off",\
Index: qemu-mig-advance-clock/target-i386/kvm.c
===================================================================
--- qemu-mig-advance-clock.orig/target-i386/kvm.c	2016-11-14 10:40:39.750116314 -0200
+++ qemu-mig-advance-clock/target-i386/kvm.c	2016-11-14 10:40:48.061128652 -0200
@@ -117,6 +117,13 @@ 
     return kvm_check_extension(kvm_state, KVM_CAP_X86_SMM);
 }
 
+bool kvm_has_adjust_clock_stable(void)
+{
+    int ret = kvm_check_extension(kvm_state, KVM_CAP_ADJUST_CLOCK);
+
+    return (ret == KVM_CLOCK_TSC_STABLE);
+}
+
 bool kvm_allows_irq0_override(void)
 {
     return !kvm_irqchip_in_kernel() || kvm_has_gsi_routing();
Index: qemu-mig-advance-clock/target-i386/kvm_i386.h
===================================================================
--- qemu-mig-advance-clock.orig/target-i386/kvm_i386.h	2016-11-14 10:40:39.751116316 -0200
+++ qemu-mig-advance-clock/target-i386/kvm_i386.h	2016-11-14 10:40:48.061128652 -0200
@@ -17,6 +17,7 @@ 
 
 bool kvm_allows_irq0_override(void);
 bool kvm_has_smm(void);
+bool kvm_has_adjust_clock_stable(void);
 void kvm_synchronize_all_tsc(void);
 void kvm_arch_reset_vcpu(X86CPU *cs);
 void kvm_arch_do_init_vcpu(X86CPU *cs);