Patchwork [v2,14/17] kvm: x86: Introduce kvmclock device to save/restore its state

login
register
mail settings
Submitter Jan Kiszka
Date Jan. 3, 2011, 8:33 a.m.
Message ID <d35c17614fa7646ff48ef57f7922aec01e1044a6.1294043582.git.jan.kiszka@web.de>
Download mbox | patch
Permalink /patch/77226/
State New
Headers show

Comments

Jan Kiszka - Jan. 3, 2011, 8:33 a.m.
From: Jan Kiszka <jan.kiszka@siemens.com>

If kvmclock is used, which implies the kernel supports it, register a
kvmclock device with the sysbus. Its main purpose is to save and restore
the kernel state on migration, but this will also allow to visualize it
one day.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
CC: Glauber Costa <glommer@redhat.com>
---
 target-i386/kvm.c |   71 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 files changed, 70 insertions(+), 1 deletions(-)
Avi Kivity - Jan. 3, 2011, 4:04 p.m.
On 01/03/2011 10:33 AM, Jan Kiszka wrote:
> From: Jan Kiszka<jan.kiszka@siemens.com>
>
> If kvmclock is used, which implies the kernel supports it, register a
> kvmclock device with the sysbus. Its main purpose is to save and restore
> the kernel state on migration, but this will also allow to visualize it
> one day.
>

kvmclock is a per-cpu affair.

>
> @@ -534,6 +599,10 @@ int kvm_arch_init(int smp_cpus)
>       int ret;
>       struct utsname utsname;
>
> +#ifdef KVM_CAP_ADJUST_CLOCK
> +    sysbus_register_withprop(&kvmclock_info);
> +#endif
> +

So this doesn't look right.  I think we're fine with just migrating the 
MSRs, like we migrate anything else that has to do with the cpu.
Jan Kiszka - Jan. 3, 2011, 4:30 p.m.
Am 03.01.2011 17:04, Avi Kivity wrote:
> On 01/03/2011 10:33 AM, Jan Kiszka wrote:
>> From: Jan Kiszka<jan.kiszka@siemens.com>
>>
>> If kvmclock is used, which implies the kernel supports it, register a
>> kvmclock device with the sysbus. Its main purpose is to save and restore
>> the kernel state on migration, but this will also allow to visualize it
>> one day.
>>
> 
> kvmclock is a per-cpu affair.

Nope, it's state (the one save/restored here) is per VM.

> 
>>
>> @@ -534,6 +599,10 @@ int kvm_arch_init(int smp_cpus)
>>       int ret;
>>       struct utsname utsname;
>>
>> +#ifdef KVM_CAP_ADJUST_CLOCK
>> +    sysbus_register_withprop(&kvmclock_info);
>> +#endif
>> +
> 
> So this doesn't look right.  I think we're fine with just migrating the
> MSRs, like we migrate anything else that has to do with the cpu.
> 

The kvmclock state is not contained in any MSR. It's an independent
machine state that can be indirectly obtained via MSR access. Therefore,
qemu-kvm currently registers only one vmstate entry per machine, and
this patch just turns this into a clean device - because that's what
kvmclock is in the end, something like an HPET.

Jan
Glauber Costa - Jan. 3, 2011, 4:37 p.m.
On Mon, 2011-01-03 at 09:33 +0100, Jan Kiszka wrote:
> From: Jan Kiszka <jan.kiszka@siemens.com>
> 
> If kvmclock is used, which implies the kernel supports it, register a
> kvmclock device with the sysbus. Its main purpose is to save and restore
> the kernel state on migration, but this will also allow to visualize it
> one day.
> 
> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> CC: Glauber Costa <glommer@redhat.com>

Hi Jan.

I've just recently posted a patch (not sure what was made from it), that
fixes a bug that you reintroduce here.

The bug is: if we call KVM_GET_CLOCK ioctl in pre_save, this means that
this value will change every time we issue savevm, even if the machine
is not run in between.

Ideally, we'd like to have two consecutive savevms returning the exact
same thing in that situation.

I like the general approach of moving it to sysbus, but please move the
ioctl to change state notifiers.

Cheers!
Glauber Costa - Jan. 3, 2011, 4:38 p.m.
On Mon, 2011-01-03 at 18:04 +0200, Avi Kivity wrote:
> On 01/03/2011 10:33 AM, Jan Kiszka wrote:
> > From: Jan Kiszka<jan.kiszka@siemens.com>
> >
> > If kvmclock is used, which implies the kernel supports it, register a
> > kvmclock device with the sysbus. Its main purpose is to save and restore
> > the kernel state on migration, but this will also allow to visualize it
> > one day.
> >
> 
> kvmclock is a per-cpu affair.
> 
> >
> > @@ -534,6 +599,10 @@ int kvm_arch_init(int smp_cpus)
> >       int ret;
> >       struct utsname utsname;
> >
> > +#ifdef KVM_CAP_ADJUST_CLOCK
> > +    sysbus_register_withprop(&kvmclock_info);
> > +#endif
> > +
> 
> So this doesn't look right.  I think we're fine with just migrating the 
> MSRs, like we migrate anything else that has to do with the cpu.

The ioctl jan is handling here is a global one, that adjusts the base
offset for the clock over migration. It is okay.
Glauber Costa - Jan. 3, 2011, 4:39 p.m.
On Mon, 2011-01-03 at 17:30 +0100, Jan Kiszka wrote:
> Am 03.01.2011 17:04, Avi Kivity wrote:
> > On 01/03/2011 10:33 AM, Jan Kiszka wrote:
> >> From: Jan Kiszka<jan.kiszka@siemens.com>
> >>
> >> If kvmclock is used, which implies the kernel supports it, register a
> >> kvmclock device with the sysbus. Its main purpose is to save and restore
> >> the kernel state on migration, but this will also allow to visualize it
> >> one day.
> >>
> > 
> > kvmclock is a per-cpu affair.
> 
> Nope, it's state (the one save/restored here) is per VM.
> 
> > 
> >>
> >> @@ -534,6 +599,10 @@ int kvm_arch_init(int smp_cpus)
> >>       int ret;
> >>       struct utsname utsname;
> >>
> >> +#ifdef KVM_CAP_ADJUST_CLOCK
> >> +    sysbus_register_withprop(&kvmclock_info);
> >> +#endif
> >> +
> > 
> > So this doesn't look right.  I think we're fine with just migrating the
> > MSRs, like we migrate anything else that has to do with the cpu.
> > 
> 
> The kvmclock state is not contained in any MSR. It's an independent
> machine state that can be indirectly obtained via MSR access. Therefore,
> qemu-kvm currently registers only one vmstate entry per machine, and
> this patch just turns this into a clean device - because that's what
> kvmclock is in the end, something like an HPET.

Jan is right, the per-cpu information only cares about which piece of
memory will be written to.
Avi Kivity - Jan. 3, 2011, 4:41 p.m.
On 01/03/2011 06:30 PM, Jan Kiszka wrote:
> Am 03.01.2011 17:04, Avi Kivity wrote:
> >  On 01/03/2011 10:33 AM, Jan Kiszka wrote:
> >>  From: Jan Kiszka<jan.kiszka@siemens.com>
> >>
> >>  If kvmclock is used, which implies the kernel supports it, register a
> >>  kvmclock device with the sysbus. Its main purpose is to save and restore
> >>  the kernel state on migration, but this will also allow to visualize it
> >>  one day.
> >>
> >
> >  kvmclock is a per-cpu affair.
>
> Nope, it's state (the one save/restored here) is per VM.
>
> >
> >>
> >>  @@ -534,6 +599,10 @@ int kvm_arch_init(int smp_cpus)
> >>        int ret;
> >>        struct utsname utsname;
> >>
> >>  +#ifdef KVM_CAP_ADJUST_CLOCK
> >>  +    sysbus_register_withprop(&kvmclock_info);
> >>  +#endif
> >>  +
> >
> >  So this doesn't look right.  I think we're fine with just migrating the
> >  MSRs, like we migrate anything else that has to do with the cpu.
> >
>
> The kvmclock state is not contained in any MSR. It's an independent
> machine state that can be indirectly obtained via MSR access. Therefore,
> qemu-kvm currently registers only one vmstate entry per machine, and
> this patch just turns this into a clean device - because that's what
> kvmclock is in the end, something like an HPET.

Okay, thanks for the correction.
Jan Kiszka - Jan. 3, 2011, 4:48 p.m.
Am 03.01.2011 17:37, Glauber Costa wrote:
> On Mon, 2011-01-03 at 09:33 +0100, Jan Kiszka wrote:
>> From: Jan Kiszka <jan.kiszka@siemens.com>
>>
>> If kvmclock is used, which implies the kernel supports it, register a
>> kvmclock device with the sysbus. Its main purpose is to save and restore
>> the kernel state on migration, but this will also allow to visualize it
>> one day.
>>
>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
>> CC: Glauber Costa <glommer@redhat.com>
> 
> Hi Jan.
> 
> I've just recently posted a patch (not sure what was made from it), that
> fixes a bug that you reintroduce here.

Hmm, must have missed it.

> 
> The bug is: if we call KVM_GET_CLOCK ioctl in pre_save, this means that
> this value will change every time we issue savevm, even if the machine
> is not run in between.
> 
> Ideally, we'd like to have two consecutive savevms returning the exact
> same thing in that situation.
> 
> I like the general approach of moving it to sysbus, but please move the
> ioctl to change state notifiers.

OK, will look into this.

Jan

Patch

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 26f459c..c4b6813 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -29,6 +29,7 @@ 
 #include "hw/apic.h"
 #include "ioport.h"
 #include "kvm_x86.h"
+#include "hw/sysbus.h"
 
 #ifdef CONFIG_KVM_PARA
 #include <linux/kvm_para.h>
@@ -309,6 +310,64 @@  void kvm_inject_x86_mce(CPUState *cenv, int bank, uint64_t status,
 #endif
 }
 
+#ifdef KVM_CAP_ADJUST_CLOCK
+typedef struct KVMClockState {
+    SysBusDevice busdev;
+    uint64_t clock;
+    struct kvm_clock_data data;
+} KVMClockState;
+
+static void kvmclock_pre_save(void *opaque)
+{
+    KVMClockState *s = opaque;
+    struct kvm_clock_data data;
+    int ret;
+
+    ret = kvm_vm_ioctl(KVM_GET_CLOCK, &data);
+    if (ret < 0) {
+        fprintf(stderr, "KVM_GET_CLOCK failed: %s\n", strerror(ret));
+        data.clock = 0;
+    }
+    s->clock = data.clock;
+}
+
+static int kvmclock_post_load(void *opaque, int version_id)
+{
+    KVMClockState *s = opaque;
+    struct kvm_clock_data data;
+
+    data.clock = s->clock;
+    data.flags = 0;
+    return kvm_vm_ioctl(KVM_SET_CLOCK, &data);
+}
+
+static int kvmclock_init(SysBusDevice *dev)
+{
+    return 0;
+}
+
+static const VMStateDescription kvmclock_vmsd= {
+    .name = "kvmclock",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .minimum_version_id_old = 1,
+    .pre_save = kvmclock_pre_save,
+    .post_load = kvmclock_post_load,
+    .fields = (VMStateField []) {
+        VMSTATE_UINT64(clock, KVMClockState),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
+static SysBusDeviceInfo kvmclock_info = {
+    .qdev.name = "kvmclock",
+    .qdev.size = sizeof(KVMClockState),
+    .qdev.vmsd = &kvmclock_vmsd,
+    .qdev.no_user = 1,
+    .init = kvmclock_init,
+};
+#endif /* KVM_CAP_ADJUST_CLOCK */
+
 int kvm_arch_init_vcpu(CPUState *env)
 {
     struct {
@@ -335,7 +394,6 @@  int kvm_arch_init_vcpu(CPUState *env)
     env->cpuid_svm_features  &= kvm_x86_get_supported_cpuid(0x8000000A,
                                                             0, R_EDX);
 
-
     cpuid_i = 0;
 
 #ifdef CONFIG_KVM_PARA
@@ -442,6 +500,13 @@  int kvm_arch_init_vcpu(CPUState *env)
     }
 #endif
 
+#ifdef KVM_CAP_ADJUST_CLOCK
+    if (cpu_is_bsp(env) &&
+        (env->cpuid_kvm_features & (1ULL << KVM_FEATURE_CLOCKSOURCE))) {
+        sysbus_create_simple("kvmclock", -1, NULL);
+    }
+#endif
+
     return kvm_vcpu_ioctl(env, KVM_SET_CPUID2, &cpuid_data);
 }
 
@@ -534,6 +599,10 @@  int kvm_arch_init(int smp_cpus)
     int ret;
     struct utsname utsname;
 
+#ifdef KVM_CAP_ADJUST_CLOCK
+    sysbus_register_withprop(&kvmclock_info);
+#endif
+
     ret = kvm_get_supported_msrs();
     if (ret < 0) {
         return ret;