Patchwork Re: [PATCH v3 07/10] qemu-kvm: Cleanup/fix TSC and PV clock writeback

login
register
mail settings
Submitter Jan Kiszka
Date Feb. 25, 2010, 8:48 a.m.
Message ID <4B86396F.8070208@web.de>
Download mbox | patch
Permalink /patch/46227/
State New
Headers show

Comments

Jan Kiszka - Feb. 25, 2010, 8:48 a.m.
Marcelo Tosatti wrote:
> On Thu, Feb 25, 2010 at 12:58:26AM +0100, Jan Kiszka wrote:
>> Marcelo Tosatti wrote:
>>> On Thu, Feb 25, 2010 at 12:45:55AM +0100, Jan Kiszka wrote:
>>>> Marcelo Tosatti wrote:
>>>>> On Wed, Feb 24, 2010 at 03:17:55PM +0100, Jan Kiszka wrote:
>>>>>> Drop kvm_load_tsc in favor of level-dependent writeback in
>>>>>> kvm_arch_load_regs. KVM's PV clock MSRs fall in the same category and
>>>>>> should therefore only be written back on full sync.
>>>>>>
>>>>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
>>>>>> ---
>>>>>>  qemu-kvm-x86.c        |   19 +++++--------------
>>>>>>  qemu-kvm.h            |    4 ----
>>>>>>  target-i386/machine.c |    5 -----
>>>>>>  3 files changed, 5 insertions(+), 23 deletions(-)
>>>>>>
>>>>>> diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c
>>>>>> index 840c1c9..84fd7fa 100644
>>>>>> --- a/qemu-kvm-x86.c
>>>>>> +++ b/qemu-kvm-x86.c
>>>>>> @@ -965,8 +965,11 @@ void kvm_arch_load_regs(CPUState *env, int level)
>>>>>>          set_msr_entry(&msrs[n++], MSR_LSTAR  ,           env->lstar);
>>>>>>      }
>>>>>>  #endif
>>>>>> -    set_msr_entry(&msrs[n++], MSR_KVM_SYSTEM_TIME,  env->system_time_msr);
>>>>>> -    set_msr_entry(&msrs[n++], MSR_KVM_WALL_CLOCK,  env->wall_clock_msr);
>>>>>> +    if (level == KVM_PUT_FULL_STATE) {
>>>>>> +        set_msr_entry(&msrs[n++], MSR_IA32_TSC, env->tsc);
>>>>>> +        set_msr_entry(&msrs[n++], MSR_KVM_SYSTEM_TIME, env->system_time_msr);
>>>>>> +        set_msr_entry(&msrs[n++], MSR_KVM_WALL_CLOCK, env->wall_clock_msr);
>>>>>> +    }
>>>>> As things stand today, the TSC should only be written on migration. See
>>>>> 53f658b3c33616a4997ee254311b335e59063289 in the kernel.
>>>> Migration and power-up - that's what this patch ensures (=>
>>>> KVM_PUT_FULL_STATE). Or where do you see any problem?
>>>>
>>>> Jan
>>>>
>>> The problem is it should not write on power up (the kernel attempts
>>> to synchronize the TSCs in that case, see the commit).
>>>
>> OK, need to read this more carefully.
>>
>> I do not yet understand the difference from user space POV: it tries to
>> transfer the identical TSC values to all currently stopped VCPU threads.
> 
> guest tsc = host tsc + offset
> 
> So at the time you set_msr(TSC), the guest visible TSC starts ticking. 
> For SMP guests, this does not happen exactly at the same time for all
> vcpus.

Ouch.

> 
>> That should not be different if we are booting a fresh VM or loading a
>> complete state of a migrated image. If it does, it looks like a KVM
>> kernel deficit on first glance.
> 
> Yes it is a deficit. After migration TSCs of SMP guests go out of sync.
> Zachary is working on that.
> 

OK, so we need a workaround, ideally without reintroducing hooks. Is
this one acceptable?




Jan
Marcelo Tosatti - Feb. 25, 2010, 3:07 p.m.
On Thu, Feb 25, 2010 at 09:48:47AM +0100, Jan Kiszka wrote:
> Marcelo Tosatti wrote:
> > On Thu, Feb 25, 2010 at 12:58:26AM +0100, Jan Kiszka wrote:
> >> Marcelo Tosatti wrote:
> >>> On Thu, Feb 25, 2010 at 12:45:55AM +0100, Jan Kiszka wrote:
> >>>> Marcelo Tosatti wrote:
> >>>>> On Wed, Feb 24, 2010 at 03:17:55PM +0100, Jan Kiszka wrote:
> >>>>>> Drop kvm_load_tsc in favor of level-dependent writeback in
> >>>>>> kvm_arch_load_regs. KVM's PV clock MSRs fall in the same category and
> >>>>>> should therefore only be written back on full sync.
> >>>>>>
> >>>>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> >>>>>> ---
> >>>>>>  qemu-kvm-x86.c        |   19 +++++--------------
> >>>>>>  qemu-kvm.h            |    4 ----
> >>>>>>  target-i386/machine.c |    5 -----
> >>>>>>  3 files changed, 5 insertions(+), 23 deletions(-)
> >>>>>>
> >>>>>> diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c
> >>>>>> index 840c1c9..84fd7fa 100644
> >>>>>> --- a/qemu-kvm-x86.c
> >>>>>> +++ b/qemu-kvm-x86.c
> >>>>>> @@ -965,8 +965,11 @@ void kvm_arch_load_regs(CPUState *env, int level)
> >>>>>>          set_msr_entry(&msrs[n++], MSR_LSTAR  ,           env->lstar);
> >>>>>>      }
> >>>>>>  #endif
> >>>>>> -    set_msr_entry(&msrs[n++], MSR_KVM_SYSTEM_TIME,  env->system_time_msr);
> >>>>>> -    set_msr_entry(&msrs[n++], MSR_KVM_WALL_CLOCK,  env->wall_clock_msr);
> >>>>>> +    if (level == KVM_PUT_FULL_STATE) {
> >>>>>> +        set_msr_entry(&msrs[n++], MSR_IA32_TSC, env->tsc);
> >>>>>> +        set_msr_entry(&msrs[n++], MSR_KVM_SYSTEM_TIME, env->system_time_msr);
> >>>>>> +        set_msr_entry(&msrs[n++], MSR_KVM_WALL_CLOCK, env->wall_clock_msr);
> >>>>>> +    }
> >>>>> As things stand today, the TSC should only be written on migration. See
> >>>>> 53f658b3c33616a4997ee254311b335e59063289 in the kernel.
> >>>> Migration and power-up - that's what this patch ensures (=>
> >>>> KVM_PUT_FULL_STATE). Or where do you see any problem?
> >>>>
> >>>> Jan
> >>>>
> >>> The problem is it should not write on power up (the kernel attempts
> >>> to synchronize the TSCs in that case, see the commit).
> >>>
> >> OK, need to read this more carefully.
> >>
> >> I do not yet understand the difference from user space POV: it tries to
> >> transfer the identical TSC values to all currently stopped VCPU threads.
> > 
> > guest tsc = host tsc + offset
> > 
> > So at the time you set_msr(TSC), the guest visible TSC starts ticking. 
> > For SMP guests, this does not happen exactly at the same time for all
> > vcpus.
> 
> Ouch.
> 
> > 
> >> That should not be different if we are booting a fresh VM or loading a
> >> complete state of a migrated image. If it does, it looks like a KVM
> >> kernel deficit on first glance.
> > 
> > Yes it is a deficit. After migration TSCs of SMP guests go out of sync.
> > Zachary is working on that.
> > 
> 
> OK, so we need a workaround, ideally without reintroducing hooks. Is
> this one acceptable?
> 
> 
> diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c
> index 84fd7fa..285c05a 100644
> --- a/qemu-kvm-x86.c
> +++ b/qemu-kvm-x86.c
> @@ -966,7 +966,15 @@ void kvm_arch_load_regs(CPUState *env, int level)
>      }
>  #endif
>      if (level == KVM_PUT_FULL_STATE) {
> -        set_msr_entry(&msrs[n++], MSR_IA32_TSC, env->tsc);
> +        /*
> +         * KVM is yet unable to synchronize TSC values of multiple VCPUs on
> +         * writeback. Until this is fixed, we only write the offset to SMP
> +         * guests after migration, desynchronizing the VCPUs, but avoiding
> +         * huge jump-backs that would occur without any writeback at all.
> +         */
> +        if (smp_cpus == 1 || env->tsc != 0) {
> +            set_msr_entry(&msrs[n++], MSR_IA32_TSC, env->tsc);
> +        }
>          set_msr_entry(&msrs[n++], MSR_KVM_SYSTEM_TIME, env->system_time_msr);
>          set_msr_entry(&msrs[n++], MSR_KVM_WALL_CLOCK, env->wall_clock_msr);
>      }

Well, migration of the SMP TSCs is not precise, but it is better than
zeroing the TSCs on migration. So something like a new state (migration
only), or a hack that mimicks that behaviour, is needed :(
Jan Kiszka - Feb. 25, 2010, 3:17 p.m.
Marcelo Tosatti wrote:
> On Thu, Feb 25, 2010 at 09:48:47AM +0100, Jan Kiszka wrote:
>> Marcelo Tosatti wrote:
>>> On Thu, Feb 25, 2010 at 12:58:26AM +0100, Jan Kiszka wrote:
>>>> Marcelo Tosatti wrote:
>>>>> On Thu, Feb 25, 2010 at 12:45:55AM +0100, Jan Kiszka wrote:
>>>>>> Marcelo Tosatti wrote:
>>>>>>> On Wed, Feb 24, 2010 at 03:17:55PM +0100, Jan Kiszka wrote:
>>>>>>>> Drop kvm_load_tsc in favor of level-dependent writeback in
>>>>>>>> kvm_arch_load_regs. KVM's PV clock MSRs fall in the same category and
>>>>>>>> should therefore only be written back on full sync.
>>>>>>>>
>>>>>>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
>>>>>>>> ---
>>>>>>>>  qemu-kvm-x86.c        |   19 +++++--------------
>>>>>>>>  qemu-kvm.h            |    4 ----
>>>>>>>>  target-i386/machine.c |    5 -----
>>>>>>>>  3 files changed, 5 insertions(+), 23 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c
>>>>>>>> index 840c1c9..84fd7fa 100644
>>>>>>>> --- a/qemu-kvm-x86.c
>>>>>>>> +++ b/qemu-kvm-x86.c
>>>>>>>> @@ -965,8 +965,11 @@ void kvm_arch_load_regs(CPUState *env, int level)
>>>>>>>>          set_msr_entry(&msrs[n++], MSR_LSTAR  ,           env->lstar);
>>>>>>>>      }
>>>>>>>>  #endif
>>>>>>>> -    set_msr_entry(&msrs[n++], MSR_KVM_SYSTEM_TIME,  env->system_time_msr);
>>>>>>>> -    set_msr_entry(&msrs[n++], MSR_KVM_WALL_CLOCK,  env->wall_clock_msr);
>>>>>>>> +    if (level == KVM_PUT_FULL_STATE) {
>>>>>>>> +        set_msr_entry(&msrs[n++], MSR_IA32_TSC, env->tsc);
>>>>>>>> +        set_msr_entry(&msrs[n++], MSR_KVM_SYSTEM_TIME, env->system_time_msr);
>>>>>>>> +        set_msr_entry(&msrs[n++], MSR_KVM_WALL_CLOCK, env->wall_clock_msr);
>>>>>>>> +    }
>>>>>>> As things stand today, the TSC should only be written on migration. See
>>>>>>> 53f658b3c33616a4997ee254311b335e59063289 in the kernel.
>>>>>> Migration and power-up - that's what this patch ensures (=>
>>>>>> KVM_PUT_FULL_STATE). Or where do you see any problem?
>>>>>>
>>>>>> Jan
>>>>>>
>>>>> The problem is it should not write on power up (the kernel attempts
>>>>> to synchronize the TSCs in that case, see the commit).
>>>>>
>>>> OK, need to read this more carefully.
>>>>
>>>> I do not yet understand the difference from user space POV: it tries to
>>>> transfer the identical TSC values to all currently stopped VCPU threads.
>>> guest tsc = host tsc + offset
>>>
>>> So at the time you set_msr(TSC), the guest visible TSC starts ticking. 
>>> For SMP guests, this does not happen exactly at the same time for all
>>> vcpus.
>> Ouch.
>>
>>>> That should not be different if we are booting a fresh VM or loading a
>>>> complete state of a migrated image. If it does, it looks like a KVM
>>>> kernel deficit on first glance.
>>> Yes it is a deficit. After migration TSCs of SMP guests go out of sync.
>>> Zachary is working on that.
>>>
>> OK, so we need a workaround, ideally without reintroducing hooks. Is
>> this one acceptable?
>>
>>
>> diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c
>> index 84fd7fa..285c05a 100644
>> --- a/qemu-kvm-x86.c
>> +++ b/qemu-kvm-x86.c
>> @@ -966,7 +966,15 @@ void kvm_arch_load_regs(CPUState *env, int level)
>>      }
>>  #endif
>>      if (level == KVM_PUT_FULL_STATE) {
>> -        set_msr_entry(&msrs[n++], MSR_IA32_TSC, env->tsc);
>> +        /*
>> +         * KVM is yet unable to synchronize TSC values of multiple VCPUs on
>> +         * writeback. Until this is fixed, we only write the offset to SMP
>> +         * guests after migration, desynchronizing the VCPUs, but avoiding
>> +         * huge jump-backs that would occur without any writeback at all.
>> +         */
>> +        if (smp_cpus == 1 || env->tsc != 0) {
>> +            set_msr_entry(&msrs[n++], MSR_IA32_TSC, env->tsc);
>> +        }
>>          set_msr_entry(&msrs[n++], MSR_KVM_SYSTEM_TIME, env->system_time_msr);
>>          set_msr_entry(&msrs[n++], MSR_KVM_WALL_CLOCK, env->wall_clock_msr);
>>      }
> 
> Well, migration of the SMP TSCs is not precise, but it is better than
> zeroing the TSCs on migration. So something like a new state (migration
> only), or a hack that mimicks that behaviour, is needed :(

It is not precise, but the above code shouldn't behave differently
compared to the existing one: tcp != 0 => we are loading values from
some VM that ran before, i.e. we are migrating.

If somehow possible, I do not want to introduce a new writeback level
for an x86-only kernel quirk.

Jan
Marcelo Tosatti - Feb. 25, 2010, 3:48 p.m.
On Thu, Feb 25, 2010 at 04:17:22PM +0100, Jan Kiszka wrote:
> Marcelo Tosatti wrote:
> > On Thu, Feb 25, 2010 at 09:48:47AM +0100, Jan Kiszka wrote:
> >> Marcelo Tosatti wrote:
> >>> On Thu, Feb 25, 2010 at 12:58:26AM +0100, Jan Kiszka wrote:
> >>>> Marcelo Tosatti wrote:
> >>>>> On Thu, Feb 25, 2010 at 12:45:55AM +0100, Jan Kiszka wrote:
> >>>>>> Marcelo Tosatti wrote:
> >>>>>>> On Wed, Feb 24, 2010 at 03:17:55PM +0100, Jan Kiszka wrote:
> >>>>>>>> Drop kvm_load_tsc in favor of level-dependent writeback in
> >>>>>>>> kvm_arch_load_regs. KVM's PV clock MSRs fall in the same category and
> >>>>>>>> should therefore only be written back on full sync.
> >>>>>>>>
> >>>>>>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> >>>>>>>> ---
> >>>>>>>>  qemu-kvm-x86.c        |   19 +++++--------------
> >>>>>>>>  qemu-kvm.h            |    4 ----
> >>>>>>>>  target-i386/machine.c |    5 -----
> >>>>>>>>  3 files changed, 5 insertions(+), 23 deletions(-)
> >>>>>>>>
> >>>>>>>> diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c
> >>>>>>>> index 840c1c9..84fd7fa 100644
> >>>>>>>> --- a/qemu-kvm-x86.c
> >>>>>>>> +++ b/qemu-kvm-x86.c
> >>>>>>>> @@ -965,8 +965,11 @@ void kvm_arch_load_regs(CPUState *env, int level)
> >>>>>>>>          set_msr_entry(&msrs[n++], MSR_LSTAR  ,           env->lstar);
> >>>>>>>>      }
> >>>>>>>>  #endif
> >>>>>>>> -    set_msr_entry(&msrs[n++], MSR_KVM_SYSTEM_TIME,  env->system_time_msr);
> >>>>>>>> -    set_msr_entry(&msrs[n++], MSR_KVM_WALL_CLOCK,  env->wall_clock_msr);
> >>>>>>>> +    if (level == KVM_PUT_FULL_STATE) {
> >>>>>>>> +        set_msr_entry(&msrs[n++], MSR_IA32_TSC, env->tsc);
> >>>>>>>> +        set_msr_entry(&msrs[n++], MSR_KVM_SYSTEM_TIME, env->system_time_msr);
> >>>>>>>> +        set_msr_entry(&msrs[n++], MSR_KVM_WALL_CLOCK, env->wall_clock_msr);
> >>>>>>>> +    }
> >>>>>>> As things stand today, the TSC should only be written on migration. See
> >>>>>>> 53f658b3c33616a4997ee254311b335e59063289 in the kernel.
> >>>>>> Migration and power-up - that's what this patch ensures (=>
> >>>>>> KVM_PUT_FULL_STATE). Or where do you see any problem?
> >>>>>>
> >>>>>> Jan
> >>>>>>
> >>>>> The problem is it should not write on power up (the kernel attempts
> >>>>> to synchronize the TSCs in that case, see the commit).
> >>>>>
> >>>> OK, need to read this more carefully.
> >>>>
> >>>> I do not yet understand the difference from user space POV: it tries to
> >>>> transfer the identical TSC values to all currently stopped VCPU threads.
> >>> guest tsc = host tsc + offset
> >>>
> >>> So at the time you set_msr(TSC), the guest visible TSC starts ticking. 
> >>> For SMP guests, this does not happen exactly at the same time for all
> >>> vcpus.
> >> Ouch.
> >>
> >>>> That should not be different if we are booting a fresh VM or loading a
> >>>> complete state of a migrated image. If it does, it looks like a KVM
> >>>> kernel deficit on first glance.
> >>> Yes it is a deficit. After migration TSCs of SMP guests go out of sync.
> >>> Zachary is working on that.
> >>>
> >> OK, so we need a workaround, ideally without reintroducing hooks. Is
> >> this one acceptable?
> >>
> >>
> >> diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c
> >> index 84fd7fa..285c05a 100644
> >> --- a/qemu-kvm-x86.c
> >> +++ b/qemu-kvm-x86.c
> >> @@ -966,7 +966,15 @@ void kvm_arch_load_regs(CPUState *env, int level)
> >>      }
> >>  #endif
> >>      if (level == KVM_PUT_FULL_STATE) {
> >> -        set_msr_entry(&msrs[n++], MSR_IA32_TSC, env->tsc);
> >> +        /*
> >> +         * KVM is yet unable to synchronize TSC values of multiple VCPUs on
> >> +         * writeback. Until this is fixed, we only write the offset to SMP
> >> +         * guests after migration, desynchronizing the VCPUs, but avoiding
> >> +         * huge jump-backs that would occur without any writeback at all.
> >> +         */
> >> +        if (smp_cpus == 1 || env->tsc != 0) {
> >> +            set_msr_entry(&msrs[n++], MSR_IA32_TSC, env->tsc);
> >> +        }
> >>          set_msr_entry(&msrs[n++], MSR_KVM_SYSTEM_TIME, env->system_time_msr);
> >>          set_msr_entry(&msrs[n++], MSR_KVM_WALL_CLOCK, env->wall_clock_msr);
> >>      }
> > 
> > Well, migration of the SMP TSCs is not precise, but it is better than
> > zeroing the TSCs on migration. So something like a new state (migration
> > only), or a hack that mimicks that behaviour, is needed :(
> 
> It is not precise, but the above code shouldn't behave differently
> compared to the existing one: tcp != 0 => we are loading values from
> some VM that ran before, i.e. we are migrating.
> 
> If somehow possible, I do not want to introduce a new writeback level
> for an x86-only kernel quirk.
> 
> Jan

I can't read. Looks OK.

Patch

diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c
index 84fd7fa..285c05a 100644
--- a/qemu-kvm-x86.c
+++ b/qemu-kvm-x86.c
@@ -966,7 +966,15 @@  void kvm_arch_load_regs(CPUState *env, int level)
     }
 #endif
     if (level == KVM_PUT_FULL_STATE) {
-        set_msr_entry(&msrs[n++], MSR_IA32_TSC, env->tsc);
+        /*
+         * KVM is yet unable to synchronize TSC values of multiple VCPUs on
+         * writeback. Until this is fixed, we only write the offset to SMP
+         * guests after migration, desynchronizing the VCPUs, but avoiding
+         * huge jump-backs that would occur without any writeback at all.
+         */
+        if (smp_cpus == 1 || env->tsc != 0) {
+            set_msr_entry(&msrs[n++], MSR_IA32_TSC, env->tsc);
+        }
         set_msr_entry(&msrs[n++], MSR_KVM_SYSTEM_TIME, env->system_time_msr);
         set_msr_entry(&msrs[n++], MSR_KVM_WALL_CLOCK, env->wall_clock_msr);
     }