diff mbox

[v2,14/17] kvm: x86: Introduce kvmclock device to save/restore its state

Message ID d35c17614fa7646ff48ef57f7922aec01e1044a6.1294043582.git.jan.kiszka@web.de
State New
Headers show

Commit Message

Jan Kiszka Jan. 3, 2011, 8:33 a.m. UTC
From: Jan Kiszka <jan.kiszka@siemens.com>

If kvmclock is used, which implies the kernel supports it, register a
kvmclock device with the sysbus. Its main purpose is to save and restore
the kernel state on migration, but this will also allow to visualize it
one day.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
CC: Glauber Costa <glommer@redhat.com>
---
 target-i386/kvm.c |   71 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 files changed, 70 insertions(+), 1 deletions(-)

Comments

Avi Kivity Jan. 3, 2011, 4:04 p.m. UTC | #1
On 01/03/2011 10:33 AM, Jan Kiszka wrote:
> From: Jan Kiszka<jan.kiszka@siemens.com>
>
> If kvmclock is used, which implies the kernel supports it, register a
> kvmclock device with the sysbus. Its main purpose is to save and restore
> the kernel state on migration, but this will also allow to visualize it
> one day.
>

kvmclock is a per-cpu affair.

>
> @@ -534,6 +599,10 @@ int kvm_arch_init(int smp_cpus)
>       int ret;
>       struct utsname utsname;
>
> +#ifdef KVM_CAP_ADJUST_CLOCK
> +    sysbus_register_withprop(&kvmclock_info);
> +#endif
> +

So this doesn't look right.  I think we're fine with just migrating the 
MSRs, like we migrate anything else that has to do with the cpu.
Jan Kiszka Jan. 3, 2011, 4:30 p.m. UTC | #2
Am 03.01.2011 17:04, Avi Kivity wrote:
> On 01/03/2011 10:33 AM, Jan Kiszka wrote:
>> From: Jan Kiszka<jan.kiszka@siemens.com>
>>
>> If kvmclock is used, which implies the kernel supports it, register a
>> kvmclock device with the sysbus. Its main purpose is to save and restore
>> the kernel state on migration, but this will also allow to visualize it
>> one day.
>>
> 
> kvmclock is a per-cpu affair.

Nope, it's state (the one save/restored here) is per VM.

> 
>>
>> @@ -534,6 +599,10 @@ int kvm_arch_init(int smp_cpus)
>>       int ret;
>>       struct utsname utsname;
>>
>> +#ifdef KVM_CAP_ADJUST_CLOCK
>> +    sysbus_register_withprop(&kvmclock_info);
>> +#endif
>> +
> 
> So this doesn't look right.  I think we're fine with just migrating the
> MSRs, like we migrate anything else that has to do with the cpu.
> 

The kvmclock state is not contained in any MSR. It's an independent
machine state that can be indirectly obtained via MSR access. Therefore,
qemu-kvm currently registers only one vmstate entry per machine, and
this patch just turns this into a clean device - because that's what
kvmclock is in the end, something like an HPET.

Jan
Glauber Costa Jan. 3, 2011, 4:37 p.m. UTC | #3
On Mon, 2011-01-03 at 09:33 +0100, Jan Kiszka wrote:
> From: Jan Kiszka <jan.kiszka@siemens.com>
> 
> If kvmclock is used, which implies the kernel supports it, register a
> kvmclock device with the sysbus. Its main purpose is to save and restore
> the kernel state on migration, but this will also allow to visualize it
> one day.
> 
> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> CC: Glauber Costa <glommer@redhat.com>

Hi Jan.

I've just recently posted a patch (not sure what was made from it), that
fixes a bug that you reintroduce here.

The bug is: if we call KVM_GET_CLOCK ioctl in pre_save, this means that
this value will change every time we issue savevm, even if the machine
is not run in between.

Ideally, we'd like to have two consecutive savevms returning the exact
same thing in that situation.

I like the general approach of moving it to sysbus, but please move the
ioctl to change state notifiers.

Cheers!
Glauber Costa Jan. 3, 2011, 4:38 p.m. UTC | #4
On Mon, 2011-01-03 at 18:04 +0200, Avi Kivity wrote:
> On 01/03/2011 10:33 AM, Jan Kiszka wrote:
> > From: Jan Kiszka<jan.kiszka@siemens.com>
> >
> > If kvmclock is used, which implies the kernel supports it, register a
> > kvmclock device with the sysbus. Its main purpose is to save and restore
> > the kernel state on migration, but this will also allow to visualize it
> > one day.
> >
> 
> kvmclock is a per-cpu affair.
> 
> >
> > @@ -534,6 +599,10 @@ int kvm_arch_init(int smp_cpus)
> >       int ret;
> >       struct utsname utsname;
> >
> > +#ifdef KVM_CAP_ADJUST_CLOCK
> > +    sysbus_register_withprop(&kvmclock_info);
> > +#endif
> > +
> 
> So this doesn't look right.  I think we're fine with just migrating the 
> MSRs, like we migrate anything else that has to do with the cpu.

The ioctl jan is handling here is a global one, that adjusts the base
offset for the clock over migration. It is okay.
Glauber Costa Jan. 3, 2011, 4:39 p.m. UTC | #5
On Mon, 2011-01-03 at 17:30 +0100, Jan Kiszka wrote:
> Am 03.01.2011 17:04, Avi Kivity wrote:
> > On 01/03/2011 10:33 AM, Jan Kiszka wrote:
> >> From: Jan Kiszka<jan.kiszka@siemens.com>
> >>
> >> If kvmclock is used, which implies the kernel supports it, register a
> >> kvmclock device with the sysbus. Its main purpose is to save and restore
> >> the kernel state on migration, but this will also allow to visualize it
> >> one day.
> >>
> > 
> > kvmclock is a per-cpu affair.
> 
> Nope, it's state (the one save/restored here) is per VM.
> 
> > 
> >>
> >> @@ -534,6 +599,10 @@ int kvm_arch_init(int smp_cpus)
> >>       int ret;
> >>       struct utsname utsname;
> >>
> >> +#ifdef KVM_CAP_ADJUST_CLOCK
> >> +    sysbus_register_withprop(&kvmclock_info);
> >> +#endif
> >> +
> > 
> > So this doesn't look right.  I think we're fine with just migrating the
> > MSRs, like we migrate anything else that has to do with the cpu.
> > 
> 
> The kvmclock state is not contained in any MSR. It's an independent
> machine state that can be indirectly obtained via MSR access. Therefore,
> qemu-kvm currently registers only one vmstate entry per machine, and
> this patch just turns this into a clean device - because that's what
> kvmclock is in the end, something like an HPET.

Jan is right, the per-cpu information only cares about which piece of
memory will be written to.
Avi Kivity Jan. 3, 2011, 4:41 p.m. UTC | #6
On 01/03/2011 06:30 PM, Jan Kiszka wrote:
> Am 03.01.2011 17:04, Avi Kivity wrote:
> >  On 01/03/2011 10:33 AM, Jan Kiszka wrote:
> >>  From: Jan Kiszka<jan.kiszka@siemens.com>
> >>
> >>  If kvmclock is used, which implies the kernel supports it, register a
> >>  kvmclock device with the sysbus. Its main purpose is to save and restore
> >>  the kernel state on migration, but this will also allow to visualize it
> >>  one day.
> >>
> >
> >  kvmclock is a per-cpu affair.
>
> Nope, it's state (the one save/restored here) is per VM.
>
> >
> >>
> >>  @@ -534,6 +599,10 @@ int kvm_arch_init(int smp_cpus)
> >>        int ret;
> >>        struct utsname utsname;
> >>
> >>  +#ifdef KVM_CAP_ADJUST_CLOCK
> >>  +    sysbus_register_withprop(&kvmclock_info);
> >>  +#endif
> >>  +
> >
> >  So this doesn't look right.  I think we're fine with just migrating the
> >  MSRs, like we migrate anything else that has to do with the cpu.
> >
>
> The kvmclock state is not contained in any MSR. It's an independent
> machine state that can be indirectly obtained via MSR access. Therefore,
> qemu-kvm currently registers only one vmstate entry per machine, and
> this patch just turns this into a clean device - because that's what
> kvmclock is in the end, something like an HPET.

Okay, thanks for the correction.
Jan Kiszka Jan. 3, 2011, 4:48 p.m. UTC | #7
Am 03.01.2011 17:37, Glauber Costa wrote:
> On Mon, 2011-01-03 at 09:33 +0100, Jan Kiszka wrote:
>> From: Jan Kiszka <jan.kiszka@siemens.com>
>>
>> If kvmclock is used, which implies the kernel supports it, register a
>> kvmclock device with the sysbus. Its main purpose is to save and restore
>> the kernel state on migration, but this will also allow to visualize it
>> one day.
>>
>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
>> CC: Glauber Costa <glommer@redhat.com>
> 
> Hi Jan.
> 
> I've just recently posted a patch (not sure what was made from it), that
> fixes a bug that you reintroduce here.

Hmm, must have missed it.

> 
> The bug is: if we call KVM_GET_CLOCK ioctl in pre_save, this means that
> this value will change every time we issue savevm, even if the machine
> is not run in between.
> 
> Ideally, we'd like to have two consecutive savevms returning the exact
> same thing in that situation.
> 
> I like the general approach of moving it to sysbus, but please move the
> ioctl to change state notifiers.

OK, will look into this.

Jan
diff mbox

Patch

diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 26f459c..c4b6813 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -29,6 +29,7 @@ 
 #include "hw/apic.h"
 #include "ioport.h"
 #include "kvm_x86.h"
+#include "hw/sysbus.h"
 
 #ifdef CONFIG_KVM_PARA
 #include <linux/kvm_para.h>
@@ -309,6 +310,64 @@  void kvm_inject_x86_mce(CPUState *cenv, int bank, uint64_t status,
 #endif
 }
 
+#ifdef KVM_CAP_ADJUST_CLOCK
+typedef struct KVMClockState {
+    SysBusDevice busdev;
+    uint64_t clock;
+    struct kvm_clock_data data;
+} KVMClockState;
+
+static void kvmclock_pre_save(void *opaque)
+{
+    KVMClockState *s = opaque;
+    struct kvm_clock_data data;
+    int ret;
+
+    ret = kvm_vm_ioctl(KVM_GET_CLOCK, &data);
+    if (ret < 0) {
+        fprintf(stderr, "KVM_GET_CLOCK failed: %s\n", strerror(ret));
+        data.clock = 0;
+    }
+    s->clock = data.clock;
+}
+
+static int kvmclock_post_load(void *opaque, int version_id)
+{
+    KVMClockState *s = opaque;
+    struct kvm_clock_data data;
+
+    data.clock = s->clock;
+    data.flags = 0;
+    return kvm_vm_ioctl(KVM_SET_CLOCK, &data);
+}
+
+static int kvmclock_init(SysBusDevice *dev)
+{
+    return 0;
+}
+
+static const VMStateDescription kvmclock_vmsd= {
+    .name = "kvmclock",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .minimum_version_id_old = 1,
+    .pre_save = kvmclock_pre_save,
+    .post_load = kvmclock_post_load,
+    .fields = (VMStateField []) {
+        VMSTATE_UINT64(clock, KVMClockState),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
+static SysBusDeviceInfo kvmclock_info = {
+    .qdev.name = "kvmclock",
+    .qdev.size = sizeof(KVMClockState),
+    .qdev.vmsd = &kvmclock_vmsd,
+    .qdev.no_user = 1,
+    .init = kvmclock_init,
+};
+#endif /* KVM_CAP_ADJUST_CLOCK */
+
 int kvm_arch_init_vcpu(CPUState *env)
 {
     struct {
@@ -335,7 +394,6 @@  int kvm_arch_init_vcpu(CPUState *env)
     env->cpuid_svm_features  &= kvm_x86_get_supported_cpuid(0x8000000A,
                                                             0, R_EDX);
 
-
     cpuid_i = 0;
 
 #ifdef CONFIG_KVM_PARA
@@ -442,6 +500,13 @@  int kvm_arch_init_vcpu(CPUState *env)
     }
 #endif
 
+#ifdef KVM_CAP_ADJUST_CLOCK
+    if (cpu_is_bsp(env) &&
+        (env->cpuid_kvm_features & (1ULL << KVM_FEATURE_CLOCKSOURCE))) {
+        sysbus_create_simple("kvmclock", -1, NULL);
+    }
+#endif
+
     return kvm_vcpu_ioctl(env, KVM_SET_CPUID2, &cpuid_data);
 }
 
@@ -534,6 +599,10 @@  int kvm_arch_init(int smp_cpus)
     int ret;
     struct utsname utsname;
 
+#ifdef KVM_CAP_ADJUST_CLOCK
+    sysbus_register_withprop(&kvmclock_info);
+#endif
+
     ret = kvm_get_supported_msrs();
     if (ret < 0) {
         return ret;