diff mbox

kvmclock: clarify usage of cpu_clean_all_dirty

Message ID 20140916151437.GA27819@amt.cnet
State New
Headers show

Commit Message

Marcelo Tosatti Sept. 16, 2014, 3:14 p.m. UTC
Explain the cpu_clean_all_dirty call.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

---
 hw/i386/kvm/clock.c |    4 ++++
 1 file changed, 4 insertions(+)

Comments

Paolo Bonzini Sept. 16, 2014, 3:42 p.m. UTC | #1
Il 16/09/2014 17:14, Marcelo Tosatti ha scritto:
> +	/*
> +	 * Make sure that CPU state is synchronized from KVM
> + 	 * once every VM state change callback has finished.

Which other callback could affect the in-kernel state, and should that
call cpu_clean_all_dirty instead?

Paolo

> + 	 */
>          cpu_clean_all_dirty();
Marcelo Tosatti Sept. 16, 2014, 3:55 p.m. UTC | #2
On Tue, Sep 16, 2014 at 05:42:16PM +0200, Paolo Bonzini wrote:
> Il 16/09/2014 17:14, Marcelo Tosatti ha scritto:
> > +	/*
> > +	 * Make sure that CPU state is synchronized from KVM
> > + 	 * once every VM state change callback has finished.
> 
> Which other callback could affect the in-kernel state, 

Marcin mentioned that APIC state was the culprit.

Perhaps 

     bdrv_drain_all(); 
     ret = bdrv_flush_all();

Can change the interrupt state ?

Then that should read "once VM stop has finished".

> call cpu_clean_all_dirty instead?
> 
> Paolo
> 
> > + 	 */
> >          cpu_clean_all_dirty();
Paolo Bonzini Sept. 16, 2014, 4:03 p.m. UTC | #3
Il 16/09/2014 17:55, Marcelo Tosatti ha scritto:
> On Tue, Sep 16, 2014 at 05:42:16PM +0200, Paolo Bonzini wrote:
>> Il 16/09/2014 17:14, Marcelo Tosatti ha scritto:
>>> +	/*
>>> +	 * Make sure that CPU state is synchronized from KVM
>>> + 	 * once every VM state change callback has finished.
>>
>> Which other callback could affect the in-kernel state, 
> 
> Marcin mentioned that APIC state was the culprit.
> 
> Perhaps 
> 
>      bdrv_drain_all(); 
>      ret = bdrv_flush_all();
> 
> Can change the interrupt state ?

Ah, I thought Marcin was checking on the destination, not the source.

> Then that should read "once VM stop has finished".

But I still do not understand.

The cpu_synchronize_all_states() call in kvmclock_vm_state_change() is
needed to make env->tsc up to date with the value on the source, right?

But if the synchronize_all_states+clean_all_dirty pair runs on the
source, why is the cpu_synchronize_all_states() call in
qemu_savevm_state_complete() not enough?  It runs even later than
kvmclock_vm_state_change.

I don't understand even the original patch without cpu_clean_all_dirty()...

Paolo
Marcelo Tosatti Sept. 16, 2014, 4:03 p.m. UTC | #4
On Tue, Sep 16, 2014 at 05:42:16PM +0200, Paolo Bonzini wrote:
> Il 16/09/2014 17:14, Marcelo Tosatti ha scritto:
> > +	/*
> > +	 * Make sure that CPU state is synchronized from KVM
> > + 	 * once every VM state change callback has finished.
> 
> Which other callback could affect the in-kernel state, 

Do you want me to find out exactly which callback is changing 
the in-kernel interrupt state?
To be honest: i don't know.

> and should that call cpu_clean_all_dirty instead?

Does not make sense: any callback which modifies any in-kernel register
state should clean dirty state because its possibly set.
Marcelo Tosatti Sept. 16, 2014, 4:07 p.m. UTC | #5
On Tue, Sep 16, 2014 at 06:03:40PM +0200, Paolo Bonzini wrote:
> Il 16/09/2014 17:55, Marcelo Tosatti ha scritto:
> > On Tue, Sep 16, 2014 at 05:42:16PM +0200, Paolo Bonzini wrote:
> >> Il 16/09/2014 17:14, Marcelo Tosatti ha scritto:
> >>> +	/*
> >>> +	 * Make sure that CPU state is synchronized from KVM
> >>> + 	 * once every VM state change callback has finished.
> >>
> >> Which other callback could affect the in-kernel state, 
> > 
> > Marcin mentioned that APIC state was the culprit.
> > 
> > Perhaps 
> > 
> >      bdrv_drain_all(); 
> >      ret = bdrv_flush_all();
> > 
> > Can change the interrupt state ?
> 
> Ah, I thought Marcin was checking on the destination, not the source.
> 
> > Then that should read "once VM stop has finished".
> 
> But I still do not understand.
> 
> The cpu_synchronize_all_states() call in kvmclock_vm_state_change() is
> needed to make env->tsc up to date with the value on the source, right?

Its there to make sure the pair

env->tsc, s->clock = data.clock

are relative to point close in time.

> But if the synchronize_all_states+clean_all_dirty pair runs on the
> source, why is the cpu_synchronize_all_states() call in
> qemu_savevm_state_complete() not enough?  It runs even later than
> kvmclock_vm_state_change.

Because of the "pair of time values" explanation above.

> I don't understand even the original patch without cpu_clean_all_dirty()...
> 
> Paolo
Paolo Bonzini Sept. 16, 2014, 4:22 p.m. UTC | #6
Il 16/09/2014 18:07, Marcelo Tosatti ha scritto:
>> > The cpu_synchronize_all_states() call in kvmclock_vm_state_change() is
>> > needed to make env->tsc up to date with the value on the source, right?
> Its there to make sure the pair
> 
> env->tsc, s->clock = data.clock
> 
> are relative to point close in time.

Ok.  But why are they not close in time?

Could we have the opposite situation where env->tsc is loaded a long
time _after_ s->clock, and something breaks?

Also, there is no reason to do kvmclock_current_nsec() during normal
execution of the VM.  Is the s->clock sent by the source ever good
across migration, and could the source send kvmclock_current_nsec()
instead of whatever KVM_GET_CLOCK returns?

I don't understand this code very well, but it seems to me that the
migration handling and VM state change handler are mixed up...

Paolo

>> > But if the synchronize_all_states+clean_all_dirty pair runs on the
>> > source, why is the cpu_synchronize_all_states() call in
>> > qemu_savevm_state_complete() not enough?  It runs even later than
>> > kvmclock_vm_state_change.
> Because of the "pair of time values" explanation above.
Marcelo Tosatti Sept. 16, 2014, 4:48 p.m. UTC | #7
On Tue, Sep 16, 2014 at 06:22:15PM +0200, Paolo Bonzini wrote:
> Il 16/09/2014 18:07, Marcelo Tosatti ha scritto:
> >> > The cpu_synchronize_all_states() call in kvmclock_vm_state_change() is
> >> > needed to make env->tsc up to date with the value on the source, right?
> > Its there to make sure the pair
> > 
> > env->tsc, s->clock = data.clock
> > 
> > are relative to point close in time.
> 
> Ok.  But why are they not close in time?

Scenario 1

A. s->clock = get_clock()
B. env->tsc = rdtsc()

Scenario 2

A. s->clock = get_clock()
C. VM callbacks, bdrv_flush, ...
B. env->tsc = rdtsc()

They are not "close in time" because of C.

> Could we have the opposite situation where env->tsc is loaded a long
> time _after_ s->clock, and something breaks?

This particular read avoids an overflow. See 

+    assert(time.tsc_timestamp <= migration_tsc);


About your question, perhaps, would have to make up an
env->tsc, s->clock pair which breaks the code or a guest
application.

> Also, there is no reason to do kvmclock_current_nsec() during normal
> execution of the VM.  Is the s->clock sent by the source ever good
> across migration, and could the source send kvmclock_current_nsec()
> instead of whatever KVM_GET_CLOCK returns?

Yes it could. What difference does it make?

> I don't understand this code very well, but it seems to me that the
> migration handling and VM state change handler are mixed up...

I don't see what the problem is. I am sure you can understand the code.
Marcelo Tosatti Sept. 16, 2014, 4:54 p.m. UTC | #8
On Tue, Sep 16, 2014 at 01:48:24PM -0300, Marcelo Tosatti wrote:
> On Tue, Sep 16, 2014 at 06:22:15PM +0200, Paolo Bonzini wrote:
> > Il 16/09/2014 18:07, Marcelo Tosatti ha scritto:
> > >> > The cpu_synchronize_all_states() call in kvmclock_vm_state_change() is
> > >> > needed to make env->tsc up to date with the value on the source, right?
> > > Its there to make sure the pair
> > > 
> > > env->tsc, s->clock = data.clock
> > > 
> > > are relative to point close in time.
> > 
> > Ok.  But why are they not close in time?
> 
> Scenario 1
> 
> A. s->clock = get_clock()
> B. env->tsc = rdtsc()
> 
> Scenario 2
> 
> A. s->clock = get_clock()
> C. VM callbacks, bdrv_flush, ...
> B. env->tsc = rdtsc()
> 
> They are not "close in time" because of C.
> 
> > Could we have the opposite situation where env->tsc is loaded a long
> > time _after_ s->clock, and something breaks?
> 
> This particular read avoids an overflow. See 
> 
> +    assert(time.tsc_timestamp <= migration_tsc);
> 
> 
> About your question, perhaps, would have to make up an
> env->tsc, s->clock pair which breaks the code or a guest
> application.
> 
> > Also, there is no reason to do kvmclock_current_nsec() during normal
> > execution of the VM.  Is the s->clock sent by the source ever good
> > across migration, and could the source send kvmclock_current_nsec()
> > instead of whatever KVM_GET_CLOCK returns?
> 
> Yes it could. What difference does it make?
> 
> > I don't understand this code very well, but it seems to me that the
> > migration handling and VM state change handler are mixed up...
> 
> I don't see what the problem is. I am sure you can understand the code.

If you have a suggestion to improve the code, i am all ears. But IMHO,
its not terrible (migration and VM state change handling are mixed in 
other drivers as well).
Marcelo Tosatti Sept. 16, 2014, 6:10 p.m. UTC | #9
On Tue, Sep 16, 2014 at 06:22:15PM +0200, Paolo Bonzini wrote:
> Il 16/09/2014 18:07, Marcelo Tosatti ha scritto:
> >> > The cpu_synchronize_all_states() call in kvmclock_vm_state_change() is
> >> > needed to make env->tsc up to date with the value on the source, right?
> > Its there to make sure the pair
> > 
> > env->tsc, s->clock = data.clock
> > 
> > are relative to point close in time.
> 
> Ok.  But why are they not close in time?
> 
> Could we have the opposite situation where env->tsc is loaded a long
> time _after_ s->clock, and something breaks?
> 
> Also, there is no reason to do kvmclock_current_nsec() during normal
> execution of the VM.  Is the s->clock sent by the source ever good
> across migration, and could the source send kvmclock_current_nsec()
> instead of whatever KVM_GET_CLOCK returns?

guest clock read = pvclock.system_timestamp + (rdtsc() - pvclock.tsc)

Host kernel updates pvclock.system_timestamp in certain situations,
such as guest initialization. With master clock scheme,
pvclock.system_timestamp is only updated on guest initialization.

In case TSC runs faster than the host system clock, you cannot do 
the following on destination:

pvclock.system_timestamp = ioctl(KVM_GET_CLOCK)
povclock.tsc = rdtsc

guest clock read = pvclock.system_timestampOLD + (rdtsc() - pvclock.tsc)

Because the effective clock was not pvclock.system_timestamp but the
TSC, which is running at a higher frequency. If you do that, the time 
goes backward.

Q: could the source send kvmclock_current_nsec() 
instead of whatever KVM_GET_CLOCK returns?

Well no because there are other users of KVM_GET_CLOCK such as 
Hyper-V clock.

> I don't understand this code very well, but it seems to me that the
> migration handling and VM state change handler are mixed up...

Again, suggestions are welcome.

> 
> Paolo
> 
> >> > But if the synchronize_all_states+clean_all_dirty pair runs on the
> >> > source, why is the cpu_synchronize_all_states() call in
> >> > qemu_savevm_state_complete() not enough?  It runs even later than
> >> > kvmclock_vm_state_change.
> > Because of the "pair of time values" explanation above.
>
diff mbox

Patch

Index: qemu/hw/i386/kvm/clock.c
===================================================================
--- qemu.orig/hw/i386/kvm/clock.c	2014-09-16 12:12:00.622537799 -0300
+++ qemu/hw/i386/kvm/clock.c	2014-09-16 12:12:44.696555868 -0300
@@ -127,6 +127,10 @@ 
         }
 
         cpu_synchronize_all_states();
+	/*
+	 * Make sure that CPU state is synchronized from KVM
+ 	 * once every VM state change callback has finished.
+ 	 */
         cpu_clean_all_dirty();
         ret = kvm_vm_ioctl(kvm_state, KVM_GET_CLOCK, &data);
         if (ret < 0) {