Message ID | 20140916151437.GA27819@amt.cnet |
---|---|
State | New |
Headers | show |
Il 16/09/2014 17:14, Marcelo Tosatti ha scritto: > + /* > + * Make sure that CPU state is synchronized from KVM > + * once every VM state change callback has finished. Which other callback could affect the in-kernel state, and should that call cpu_clean_all_dirty instead? Paolo > + */ > cpu_clean_all_dirty();
On Tue, Sep 16, 2014 at 05:42:16PM +0200, Paolo Bonzini wrote: > Il 16/09/2014 17:14, Marcelo Tosatti ha scritto: > > + /* > > + * Make sure that CPU state is synchronized from KVM > > + * once every VM state change callback has finished. > > Which other callback could affect the in-kernel state, Marcin mentioned that APIC state was the culprit. Perhaps bdrv_drain_all(); ret = bdrv_flush_all(); Can change the interrupt state ? Then that should read "once VM stop has finished". > call cpu_clean_all_dirty instead? > > Paolo > > > + */ > > cpu_clean_all_dirty();
Il 16/09/2014 17:55, Marcelo Tosatti ha scritto: > On Tue, Sep 16, 2014 at 05:42:16PM +0200, Paolo Bonzini wrote: >> Il 16/09/2014 17:14, Marcelo Tosatti ha scritto: >>> + /* >>> + * Make sure that CPU state is synchronized from KVM >>> + * once every VM state change callback has finished. >> >> Which other callback could affect the in-kernel state, > > Marcin mentioned that APIC state was the culprit. > > Perhaps > > bdrv_drain_all(); > ret = bdrv_flush_all(); > > Can change the interrupt state ? Ah, I thought Marcin was checking on the destination, not the source. > Then that should read "once VM stop has finished". But I still do not understand. The cpu_synchronize_all_states() call in kvmclock_vm_state_change() is needed to make env->tsc up to date with the value on the source, right? But if the synchronize_all_states+clean_all_dirty pair runs on the source, why is the cpu_synchronize_all_states() call in qemu_savevm_state_complete() not enough? It runs even later than kvmclock_vm_state_change. I don't understand even the original patch without cpu_clean_all_dirty()... Paolo
On Tue, Sep 16, 2014 at 05:42:16PM +0200, Paolo Bonzini wrote: > Il 16/09/2014 17:14, Marcelo Tosatti ha scritto: > > + /* > > + * Make sure that CPU state is synchronized from KVM > > + * once every VM state change callback has finished. > > Which other callback could affect the in-kernel state, Do you want me to find out exactly which callback is changing the in-kernel interrupt state? To be honest: i don't know. > and should that call cpu_clean_all_dirty instead? Does not make sense: any callback which modifies any in-kernel register state should clean dirty state because its possibly set.
On Tue, Sep 16, 2014 at 06:03:40PM +0200, Paolo Bonzini wrote: > Il 16/09/2014 17:55, Marcelo Tosatti ha scritto: > > On Tue, Sep 16, 2014 at 05:42:16PM +0200, Paolo Bonzini wrote: > >> Il 16/09/2014 17:14, Marcelo Tosatti ha scritto: > >>> + /* > >>> + * Make sure that CPU state is synchronized from KVM > >>> + * once every VM state change callback has finished. > >> > >> Which other callback could affect the in-kernel state, > > > > Marcin mentioned that APIC state was the culprit. > > > > Perhaps > > > > bdrv_drain_all(); > > ret = bdrv_flush_all(); > > > > Can change the interrupt state ? > > Ah, I thought Marcin was checking on the destination, not the source. > > > Then that should read "once VM stop has finished". > > But I still do not understand. > > The cpu_synchronize_all_states() call in kvmclock_vm_state_change() is > needed to make env->tsc up to date with the value on the source, right? Its there to make sure the pair env->tsc, s->clock = data.clock are relative to point close in time. > But if the synchronize_all_states+clean_all_dirty pair runs on the > source, why is the cpu_synchronize_all_states() call in > qemu_savevm_state_complete() not enough? It runs even later than > kvmclock_vm_state_change. Because of the "pair of time values" explanation above. > I don't understand even the original patch without cpu_clean_all_dirty()... > > Paolo
Il 16/09/2014 18:07, Marcelo Tosatti ha scritto: >> > The cpu_synchronize_all_states() call in kvmclock_vm_state_change() is >> > needed to make env->tsc up to date with the value on the source, right? > Its there to make sure the pair > > env->tsc, s->clock = data.clock > > are relative to point close in time. Ok. But why are they not close in time? Could we have the opposite situation where env->tsc is loaded a long time _after_ s->clock, and something breaks? Also, there is no reason to do kvmclock_current_nsec() during normal execution of the VM. Is the s->clock sent by the source ever good across migration, and could the source send kvmclock_current_nsec() instead of whatever KVM_GET_CLOCK returns? I don't understand this code very well, but it seems to me that the migration handling and VM state change handler are mixed up... Paolo >> > But if the synchronize_all_states+clean_all_dirty pair runs on the >> > source, why is the cpu_synchronize_all_states() call in >> > qemu_savevm_state_complete() not enough? It runs even later than >> > kvmclock_vm_state_change. > Because of the "pair of time values" explanation above.
On Tue, Sep 16, 2014 at 06:22:15PM +0200, Paolo Bonzini wrote: > Il 16/09/2014 18:07, Marcelo Tosatti ha scritto: > >> > The cpu_synchronize_all_states() call in kvmclock_vm_state_change() is > >> > needed to make env->tsc up to date with the value on the source, right? > > Its there to make sure the pair > > > > env->tsc, s->clock = data.clock > > > > are relative to point close in time. > > Ok. But why are they not close in time? Scenario 1 A. s->clock = get_clock() B. env->tsc = rdtsc() Scenario 2 A. s->clock = get_clock() C. VM callbacks, bdrv_flush, ... B. env->tsc = rdtsc() They are not "close in time" because of C. > Could we have the opposite situation where env->tsc is loaded a long > time _after_ s->clock, and something breaks? This particular read avoids an overflow. See + assert(time.tsc_timestamp <= migration_tsc); About your question, perhaps, would have to make up an env->tsc, s->clock pair which breaks the code or a guest application. > Also, there is no reason to do kvmclock_current_nsec() during normal > execution of the VM. Is the s->clock sent by the source ever good > across migration, and could the source send kvmclock_current_nsec() > instead of whatever KVM_GET_CLOCK returns? Yes it could. What difference does it make? > I don't understand this code very well, but it seems to me that the > migration handling and VM state change handler are mixed up... I don't see what the problem is. I am sure you can understand the code.
On Tue, Sep 16, 2014 at 01:48:24PM -0300, Marcelo Tosatti wrote: > On Tue, Sep 16, 2014 at 06:22:15PM +0200, Paolo Bonzini wrote: > > Il 16/09/2014 18:07, Marcelo Tosatti ha scritto: > > >> > The cpu_synchronize_all_states() call in kvmclock_vm_state_change() is > > >> > needed to make env->tsc up to date with the value on the source, right? > > > Its there to make sure the pair > > > > > > env->tsc, s->clock = data.clock > > > > > > are relative to point close in time. > > > > Ok. But why are they not close in time? > > Scenario 1 > > A. s->clock = get_clock() > B. env->tsc = rdtsc() > > Scenario 2 > > A. s->clock = get_clock() > C. VM callbacks, bdrv_flush, ... > B. env->tsc = rdtsc() > > They are not "close in time" because of C. > > > Could we have the opposite situation where env->tsc is loaded a long > > time _after_ s->clock, and something breaks? > > This particular read avoids an overflow. See > > + assert(time.tsc_timestamp <= migration_tsc); > > > About your question, perhaps, would have to make up an > env->tsc, s->clock pair which breaks the code or a guest > application. > > > Also, there is no reason to do kvmclock_current_nsec() during normal > > execution of the VM. Is the s->clock sent by the source ever good > > across migration, and could the source send kvmclock_current_nsec() > > instead of whatever KVM_GET_CLOCK returns? > > Yes it could. What difference does it make? > > > I don't understand this code very well, but it seems to me that the > > migration handling and VM state change handler are mixed up... > > I don't see what the problem is. I am sure you can understand the code. If you have a suggestion to improve the code, i am all ears. But IMHO, its not terrible (migration and VM state change handling are mixed in other drivers as well).
On Tue, Sep 16, 2014 at 06:22:15PM +0200, Paolo Bonzini wrote: > Il 16/09/2014 18:07, Marcelo Tosatti ha scritto: > >> > The cpu_synchronize_all_states() call in kvmclock_vm_state_change() is > >> > needed to make env->tsc up to date with the value on the source, right? > > Its there to make sure the pair > > > > env->tsc, s->clock = data.clock > > > > are relative to point close in time. > > Ok. But why are they not close in time? > > Could we have the opposite situation where env->tsc is loaded a long > time _after_ s->clock, and something breaks? > > Also, there is no reason to do kvmclock_current_nsec() during normal > execution of the VM. Is the s->clock sent by the source ever good > across migration, and could the source send kvmclock_current_nsec() > instead of whatever KVM_GET_CLOCK returns? guest clock read = pvclock.system_timestamp + (rdtsc() - pvclock.tsc) Host kernel updates pvclock.system_timestamp in certain situations, such as guest initialization. With master clock scheme, pvclock.system_timestamp is only updated on guest initialization. In case TSC runs faster than the host system clock, you cannot do the following on destination: pvclock.system_timestamp = ioctl(KVM_GET_CLOCK) povclock.tsc = rdtsc guest clock read = pvclock.system_timestampOLD + (rdtsc() - pvclock.tsc) Because the effective clock was not pvclock.system_timestamp but the TSC, which is running at a higher frequency. If you do that, the time goes backward. Q: could the source send kvmclock_current_nsec() instead of whatever KVM_GET_CLOCK returns? Well no because there are other users of KVM_GET_CLOCK such as Hyper-V clock. > I don't understand this code very well, but it seems to me that the > migration handling and VM state change handler are mixed up... Again, suggestions are welcome. > > Paolo > > >> > But if the synchronize_all_states+clean_all_dirty pair runs on the > >> > source, why is the cpu_synchronize_all_states() call in > >> > qemu_savevm_state_complete() not enough? It runs even later than > >> > kvmclock_vm_state_change. > > Because of the "pair of time values" explanation above. >
Index: qemu/hw/i386/kvm/clock.c =================================================================== --- qemu.orig/hw/i386/kvm/clock.c 2014-09-16 12:12:00.622537799 -0300 +++ qemu/hw/i386/kvm/clock.c 2014-09-16 12:12:44.696555868 -0300 @@ -127,6 +127,10 @@ } cpu_synchronize_all_states(); + /* + * Make sure that CPU state is synchronized from KVM + * once every VM state change callback has finished. + */ cpu_clean_all_dirty(); ret = kvm_vm_ioctl(kvm_state, KVM_GET_CLOCK, &data); if (ret < 0) {
Explain the cpu_clean_all_dirty call. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> --- hw/i386/kvm/clock.c | 4 ++++ 1 file changed, 4 insertions(+)