kvmclock: clarify usage of cpu_clean_all_dirty

Message ID	20140916151437.GA27819@amt.cnet
State	New
Headers	show Return-Path: <qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org> Date: Tue, 16 Sep 2014 12:14:37 -0300 From: Marcelo Tosatti <mtosatti@redhat.com> To: Paolo Bonzini <pbonzini@redhat.com>, qemu-devel <qemu-devel@nongnu.org> Message-ID: <20140916151437.GA27819@amt.cnet> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.5.21 (2010-09-15) Subject: [Qemu-devel] [PATCH] kvmclock: clarify usage of cpu_clean_all_dirty Precedence: list Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org

Marcelo Tosatti Sept. 16, 2014, 3:14 p.m. UTC

Explain the cpu_clean_all_dirty call.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

---
 hw/i386/kvm/clock.c |    4 ++++
 1 file changed, 4 insertions(+)

Paolo Bonzini Sept. 16, 2014, 3:42 p.m. UTC | #1

Il 16/09/2014 17:14, Marcelo Tosatti ha scritto:
> +	/*
> +	 * Make sure that CPU state is synchronized from KVM
> + 	 * once every VM state change callback has finished.

Which other callback could affect the in-kernel state, and should that
call cpu_clean_all_dirty instead?

Paolo

> + 	 */
>          cpu_clean_all_dirty();

Marcelo Tosatti Sept. 16, 2014, 3:55 p.m. UTC | #2

On Tue, Sep 16, 2014 at 05:42:16PM +0200, Paolo Bonzini wrote:
> Il 16/09/2014 17:14, Marcelo Tosatti ha scritto:
> > +	/*
> > +	 * Make sure that CPU state is synchronized from KVM
> > + 	 * once every VM state change callback has finished.
> 
> Which other callback could affect the in-kernel state, 

Marcin mentioned that APIC state was the culprit.

Perhaps 

     bdrv_drain_all(); 
     ret = bdrv_flush_all();

Can change the interrupt state ?

Then that should read "once VM stop has finished".

> call cpu_clean_all_dirty instead?
> 
> Paolo
> 
> > + 	 */
> >          cpu_clean_all_dirty();

Paolo Bonzini Sept. 16, 2014, 4:03 p.m. UTC | #3

Il 16/09/2014 17:55, Marcelo Tosatti ha scritto:
> On Tue, Sep 16, 2014 at 05:42:16PM +0200, Paolo Bonzini wrote:
>> Il 16/09/2014 17:14, Marcelo Tosatti ha scritto:
>>> +	/*
>>> +	 * Make sure that CPU state is synchronized from KVM
>>> + 	 * once every VM state change callback has finished.
>>
>> Which other callback could affect the in-kernel state, 
> 
> Marcin mentioned that APIC state was the culprit.
> 
> Perhaps 
> 
>      bdrv_drain_all(); 
>      ret = bdrv_flush_all();
> 
> Can change the interrupt state ?

Ah, I thought Marcin was checking on the destination, not the source.

> Then that should read "once VM stop has finished".

But I still do not understand.

The cpu_synchronize_all_states() call in kvmclock_vm_state_change() is
needed to make env->tsc up to date with the value on the source, right?

But if the synchronize_all_states+clean_all_dirty pair runs on the
source, why is the cpu_synchronize_all_states() call in
qemu_savevm_state_complete() not enough?  It runs even later than
kvmclock_vm_state_change.

I don't understand even the original patch without cpu_clean_all_dirty()...

Paolo

Marcelo Tosatti Sept. 16, 2014, 4:03 p.m. UTC | #4

On Tue, Sep 16, 2014 at 05:42:16PM +0200, Paolo Bonzini wrote:
> Il 16/09/2014 17:14, Marcelo Tosatti ha scritto:
> > +	/*
> > +	 * Make sure that CPU state is synchronized from KVM
> > + 	 * once every VM state change callback has finished.
> 
> Which other callback could affect the in-kernel state, 

Do you want me to find out exactly which callback is changing 
the in-kernel interrupt state?
To be honest: i don't know.

> and should that call cpu_clean_all_dirty instead?

Does not make sense: any callback which modifies any in-kernel register
state should clean dirty state because its possibly set.

Marcelo Tosatti Sept. 16, 2014, 4:07 p.m. UTC | #5

On Tue, Sep 16, 2014 at 06:03:40PM +0200, Paolo Bonzini wrote:
> Il 16/09/2014 17:55, Marcelo Tosatti ha scritto:
> > On Tue, Sep 16, 2014 at 05:42:16PM +0200, Paolo Bonzini wrote:
> >> Il 16/09/2014 17:14, Marcelo Tosatti ha scritto:
> >>> +	/*
> >>> +	 * Make sure that CPU state is synchronized from KVM
> >>> + 	 * once every VM state change callback has finished.
> >>
> >> Which other callback could affect the in-kernel state, 
> > 
> > Marcin mentioned that APIC state was the culprit.
> > 
> > Perhaps 
> > 
> >      bdrv_drain_all(); 
> >      ret = bdrv_flush_all();
> > 
> > Can change the interrupt state ?
> 
> Ah, I thought Marcin was checking on the destination, not the source.
> 
> > Then that should read "once VM stop has finished".
> 
> But I still do not understand.
> 
> The cpu_synchronize_all_states() call in kvmclock_vm_state_change() is
> needed to make env->tsc up to date with the value on the source, right?

Its there to make sure the pair

env->tsc, s->clock = data.clock

are relative to point close in time.

> But if the synchronize_all_states+clean_all_dirty pair runs on the
> source, why is the cpu_synchronize_all_states() call in
> qemu_savevm_state_complete() not enough?  It runs even later than
> kvmclock_vm_state_change.

Because of the "pair of time values" explanation above.

> I don't understand even the original patch without cpu_clean_all_dirty()...
> 
> Paolo

Paolo Bonzini Sept. 16, 2014, 4:22 p.m. UTC | #6

Il 16/09/2014 18:07, Marcelo Tosatti ha scritto:
>> > The cpu_synchronize_all_states() call in kvmclock_vm_state_change() is
>> > needed to make env->tsc up to date with the value on the source, right?
> Its there to make sure the pair
> 
> env->tsc, s->clock = data.clock
> 
> are relative to point close in time.

Ok.  But why are they not close in time?

Could we have the opposite situation where env->tsc is loaded a long
time _after_ s->clock, and something breaks?

Also, there is no reason to do kvmclock_current_nsec() during normal
execution of the VM.  Is the s->clock sent by the source ever good
across migration, and could the source send kvmclock_current_nsec()
instead of whatever KVM_GET_CLOCK returns?

I don't understand this code very well, but it seems to me that the
migration handling and VM state change handler are mixed up...

Paolo

>> > But if the synchronize_all_states+clean_all_dirty pair runs on the
>> > source, why is the cpu_synchronize_all_states() call in
>> > qemu_savevm_state_complete() not enough?  It runs even later than
>> > kvmclock_vm_state_change.
> Because of the "pair of time values" explanation above.

Marcelo Tosatti Sept. 16, 2014, 4:48 p.m. UTC | #7

On Tue, Sep 16, 2014 at 06:22:15PM +0200, Paolo Bonzini wrote:
> Il 16/09/2014 18:07, Marcelo Tosatti ha scritto:
> >> > The cpu_synchronize_all_states() call in kvmclock_vm_state_change() is
> >> > needed to make env->tsc up to date with the value on the source, right?
> > Its there to make sure the pair
> > 
> > env->tsc, s->clock = data.clock
> > 
> > are relative to point close in time.
> 
> Ok.  But why are they not close in time?

Scenario 1

A. s->clock = get_clock()
B. env->tsc = rdtsc()

Scenario 2

A. s->clock = get_clock()
C. VM callbacks, bdrv_flush, ...
B. env->tsc = rdtsc()

They are not "close in time" because of C.

> Could we have the opposite situation where env->tsc is loaded a long
> time _after_ s->clock, and something breaks?

This particular read avoids an overflow. See 

+    assert(time.tsc_timestamp <= migration_tsc);


About your question, perhaps, would have to make up an
env->tsc, s->clock pair which breaks the code or a guest
application.

> Also, there is no reason to do kvmclock_current_nsec() during normal
> execution of the VM.  Is the s->clock sent by the source ever good
> across migration, and could the source send kvmclock_current_nsec()
> instead of whatever KVM_GET_CLOCK returns?

Yes it could. What difference does it make?

> I don't understand this code very well, but it seems to me that the
> migration handling and VM state change handler are mixed up...

I don't see what the problem is. I am sure you can understand the code.

Marcelo Tosatti Sept. 16, 2014, 4:54 p.m. UTC | #8

On Tue, Sep 16, 2014 at 01:48:24PM -0300, Marcelo Tosatti wrote:
> On Tue, Sep 16, 2014 at 06:22:15PM +0200, Paolo Bonzini wrote:
> > Il 16/09/2014 18:07, Marcelo Tosatti ha scritto:
> > >> > The cpu_synchronize_all_states() call in kvmclock_vm_state_change() is
> > >> > needed to make env->tsc up to date with the value on the source, right?
> > > Its there to make sure the pair
> > > 
> > > env->tsc, s->clock = data.clock
> > > 
> > > are relative to point close in time.
> > 
> > Ok.  But why are they not close in time?
> 
> Scenario 1
> 
> A. s->clock = get_clock()
> B. env->tsc = rdtsc()
> 
> Scenario 2
> 
> A. s->clock = get_clock()
> C. VM callbacks, bdrv_flush, ...
> B. env->tsc = rdtsc()
> 
> They are not "close in time" because of C.
> 
> > Could we have the opposite situation where env->tsc is loaded a long
> > time _after_ s->clock, and something breaks?
> 
> This particular read avoids an overflow. See 
> 
> +    assert(time.tsc_timestamp <= migration_tsc);
> 
> 
> About your question, perhaps, would have to make up an
> env->tsc, s->clock pair which breaks the code or a guest
> application.
> 
> > Also, there is no reason to do kvmclock_current_nsec() during normal
> > execution of the VM.  Is the s->clock sent by the source ever good
> > across migration, and could the source send kvmclock_current_nsec()
> > instead of whatever KVM_GET_CLOCK returns?
> 
> Yes it could. What difference does it make?
> 
> > I don't understand this code very well, but it seems to me that the
> > migration handling and VM state change handler are mixed up...
> 
> I don't see what the problem is. I am sure you can understand the code.

If you have a suggestion to improve the code, i am all ears. But IMHO,
its not terrible (migration and VM state change handling are mixed in 
other drivers as well).

Marcelo Tosatti Sept. 16, 2014, 6:10 p.m. UTC | #9

On Tue, Sep 16, 2014 at 06:22:15PM +0200, Paolo Bonzini wrote:
> Il 16/09/2014 18:07, Marcelo Tosatti ha scritto:
> >> > The cpu_synchronize_all_states() call in kvmclock_vm_state_change() is
> >> > needed to make env->tsc up to date with the value on the source, right?
> > Its there to make sure the pair
> > 
> > env->tsc, s->clock = data.clock
> > 
> > are relative to point close in time.
> 
> Ok.  But why are they not close in time?
> 
> Could we have the opposite situation where env->tsc is loaded a long
> time _after_ s->clock, and something breaks?
> 
> Also, there is no reason to do kvmclock_current_nsec() during normal
> execution of the VM.  Is the s->clock sent by the source ever good
> across migration, and could the source send kvmclock_current_nsec()
> instead of whatever KVM_GET_CLOCK returns?

guest clock read = pvclock.system_timestamp + (rdtsc() - pvclock.tsc)

Host kernel updates pvclock.system_timestamp in certain situations,
such as guest initialization. With master clock scheme,
pvclock.system_timestamp is only updated on guest initialization.

In case TSC runs faster than the host system clock, you cannot do 
the following on destination:

pvclock.system_timestamp = ioctl(KVM_GET_CLOCK)
povclock.tsc = rdtsc

guest clock read = pvclock.system_timestampOLD + (rdtsc() - pvclock.tsc)

Because the effective clock was not pvclock.system_timestamp but the
TSC, which is running at a higher frequency. If you do that, the time 
goes backward.

Q: could the source send kvmclock_current_nsec() 
instead of whatever KVM_GET_CLOCK returns?

Well no because there are other users of KVM_GET_CLOCK such as 
Hyper-V clock.

> I don't understand this code very well, but it seems to me that the
> migration handling and VM state change handler are mixed up...

Again, suggestions are welcome.

> 
> Paolo
> 
> >> > But if the synchronize_all_states+clean_all_dirty pair runs on the
> >> > source, why is the cpu_synchronize_all_states() call in
> >> > qemu_savevm_state_complete() not enough?  It runs even later than
> >> > kvmclock_vm_state_change.
> > Because of the "pair of time values" explanation above.
>

kvmclock: clarify usage of cpu_clean_all_dirty

Commit Message

Comments

Patch