Message ID: d35c17614fa7646ff48ef57f7922aec01e1044a6.1294043582.git.jan.kiszka@web.de
State: New
On 01/03/2011 10:33 AM, Jan Kiszka wrote:
> From: Jan Kiszka <jan.kiszka@siemens.com>
>
> If kvmclock is used, which implies the kernel supports it, register a
> kvmclock device with the sysbus. Its main purpose is to save and restore
> the kernel state on migration, but this will also allow to visualize it
> one day.

kvmclock is a per-cpu affair.

> @@ -534,6 +599,10 @@ int kvm_arch_init(int smp_cpus)
>      int ret;
>      struct utsname utsname;
>
> +#ifdef KVM_CAP_ADJUST_CLOCK
> +    sysbus_register_withprop(&kvmclock_info);
> +#endif
> +

So this doesn't look right. I think we're fine with just migrating the
MSRs, like we migrate anything else that has to do with the cpu.
Am 03.01.2011 17:04, Avi Kivity wrote:
> On 01/03/2011 10:33 AM, Jan Kiszka wrote:
>> From: Jan Kiszka <jan.kiszka@siemens.com>
>>
>> If kvmclock is used, which implies the kernel supports it, register a
>> kvmclock device with the sysbus. Its main purpose is to save and restore
>> the kernel state on migration, but this will also allow to visualize it
>> one day.
>
> kvmclock is a per-cpu affair.

Nope, its state (the one saved/restored here) is per VM.

>> @@ -534,6 +599,10 @@ int kvm_arch_init(int smp_cpus)
>>      int ret;
>>      struct utsname utsname;
>>
>> +#ifdef KVM_CAP_ADJUST_CLOCK
>> +    sysbus_register_withprop(&kvmclock_info);
>> +#endif
>> +
>
> So this doesn't look right. I think we're fine with just migrating the
> MSRs, like we migrate anything else that has to do with the cpu.

The kvmclock state is not contained in any MSR. It's an independent
machine state that can be indirectly obtained via MSR access. Therefore,
qemu-kvm currently registers only one vmstate entry per machine, and
this patch just turns this into a clean device - because that's what
kvmclock is in the end, something like an HPET.

Jan
On Mon, 2011-01-03 at 09:33 +0100, Jan Kiszka wrote:
> From: Jan Kiszka <jan.kiszka@siemens.com>
>
> If kvmclock is used, which implies the kernel supports it, register a
> kvmclock device with the sysbus. Its main purpose is to save and restore
> the kernel state on migration, but this will also allow to visualize it
> one day.
>
> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> CC: Glauber Costa <glommer@redhat.com>

Hi Jan.

I've just recently posted a patch (not sure what was made of it) that
fixes a bug which you reintroduce here.

The bug is: if we call the KVM_GET_CLOCK ioctl in pre_save, the saved
value will change every time we issue savevm, even if the machine is not
run in between.

Ideally, we'd like two consecutive savevms to return exactly the same
thing in that situation.

I like the general approach of moving it to sysbus, but please move the
ioctl to the VM change state notifiers.

Cheers!
On Mon, 2011-01-03 at 18:04 +0200, Avi Kivity wrote:
> On 01/03/2011 10:33 AM, Jan Kiszka wrote:
>> From: Jan Kiszka <jan.kiszka@siemens.com>
>>
>> If kvmclock is used, which implies the kernel supports it, register a
>> kvmclock device with the sysbus. Its main purpose is to save and restore
>> the kernel state on migration, but this will also allow to visualize it
>> one day.
>
> kvmclock is a per-cpu affair.
>
>> @@ -534,6 +599,10 @@ int kvm_arch_init(int smp_cpus)
>>      int ret;
>>      struct utsname utsname;
>>
>> +#ifdef KVM_CAP_ADJUST_CLOCK
>> +    sysbus_register_withprop(&kvmclock_info);
>> +#endif
>> +
>
> So this doesn't look right. I think we're fine with just migrating the
> MSRs, like we migrate anything else that has to do with the cpu.

The ioctl Jan is handling here is a global one that adjusts the base
offset for the clock over migration. It is okay.
On Mon, 2011-01-03 at 17:30 +0100, Jan Kiszka wrote:
> Am 03.01.2011 17:04, Avi Kivity wrote:
>> On 01/03/2011 10:33 AM, Jan Kiszka wrote:
>>> From: Jan Kiszka <jan.kiszka@siemens.com>
>>>
>>> If kvmclock is used, which implies the kernel supports it, register a
>>> kvmclock device with the sysbus. Its main purpose is to save and restore
>>> the kernel state on migration, but this will also allow to visualize it
>>> one day.
>>
>> kvmclock is a per-cpu affair.
>
> Nope, its state (the one saved/restored here) is per VM.
>
>>> @@ -534,6 +599,10 @@ int kvm_arch_init(int smp_cpus)
>>>      int ret;
>>>      struct utsname utsname;
>>>
>>> +#ifdef KVM_CAP_ADJUST_CLOCK
>>> +    sysbus_register_withprop(&kvmclock_info);
>>> +#endif
>>> +
>>
>> So this doesn't look right. I think we're fine with just migrating the
>> MSRs, like we migrate anything else that has to do with the cpu.
>
> The kvmclock state is not contained in any MSR. It's an independent
> machine state that can be indirectly obtained via MSR access. Therefore,
> qemu-kvm currently registers only one vmstate entry per machine, and
> this patch just turns this into a clean device - because that's what
> kvmclock is in the end, something like an HPET.

Jan is right, the per-cpu information only cares about which piece of
memory will be written to.
On 01/03/2011 06:30 PM, Jan Kiszka wrote:
> Am 03.01.2011 17:04, Avi Kivity wrote:
>> On 01/03/2011 10:33 AM, Jan Kiszka wrote:
>>> From: Jan Kiszka <jan.kiszka@siemens.com>
>>>
>>> If kvmclock is used, which implies the kernel supports it, register a
>>> kvmclock device with the sysbus. Its main purpose is to save and restore
>>> the kernel state on migration, but this will also allow to visualize it
>>> one day.
>>
>> kvmclock is a per-cpu affair.
>
> Nope, its state (the one saved/restored here) is per VM.
>
>>> @@ -534,6 +599,10 @@ int kvm_arch_init(int smp_cpus)
>>>      int ret;
>>>      struct utsname utsname;
>>>
>>> +#ifdef KVM_CAP_ADJUST_CLOCK
>>> +    sysbus_register_withprop(&kvmclock_info);
>>> +#endif
>>> +
>>
>> So this doesn't look right. I think we're fine with just migrating the
>> MSRs, like we migrate anything else that has to do with the cpu.
>
> The kvmclock state is not contained in any MSR. It's an independent
> machine state that can be indirectly obtained via MSR access. Therefore,
> qemu-kvm currently registers only one vmstate entry per machine, and
> this patch just turns this into a clean device - because that's what
> kvmclock is in the end, something like an HPET.

Okay, thanks for the correction.
Am 03.01.2011 17:37, Glauber Costa wrote:
> On Mon, 2011-01-03 at 09:33 +0100, Jan Kiszka wrote:
>> From: Jan Kiszka <jan.kiszka@siemens.com>
>>
>> If kvmclock is used, which implies the kernel supports it, register a
>> kvmclock device with the sysbus. Its main purpose is to save and restore
>> the kernel state on migration, but this will also allow to visualize it
>> one day.
>>
>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
>> CC: Glauber Costa <glommer@redhat.com>
>
> Hi Jan.
>
> I've just recently posted a patch (not sure what was made of it) that
> fixes a bug which you reintroduce here.

Hmm, must have missed it.

> The bug is: if we call the KVM_GET_CLOCK ioctl in pre_save, the saved
> value will change every time we issue savevm, even if the machine is not
> run in between.
>
> Ideally, we'd like two consecutive savevms to return exactly the same
> thing in that situation.
>
> I like the general approach of moving it to sysbus, but please move the
> ioctl to the VM change state notifiers.

OK, will look into this.

Jan
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 26f459c..c4b6813 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -29,6 +29,7 @@
 #include "hw/apic.h"
 #include "ioport.h"
 #include "kvm_x86.h"
+#include "hw/sysbus.h"
 
 #ifdef CONFIG_KVM_PARA
 #include <linux/kvm_para.h>
@@ -309,6 +310,64 @@ void kvm_inject_x86_mce(CPUState *cenv, int bank, uint64_t status,
 #endif
 }
 
+#ifdef KVM_CAP_ADJUST_CLOCK
+typedef struct KVMClockState {
+    SysBusDevice busdev;
+    uint64_t clock;
+    struct kvm_clock_data data;
+} KVMClockState;
+
+static void kvmclock_pre_save(void *opaque)
+{
+    KVMClockState *s = opaque;
+    struct kvm_clock_data data;
+    int ret;
+
+    ret = kvm_vm_ioctl(KVM_GET_CLOCK, &data);
+    if (ret < 0) {
+        fprintf(stderr, "KVM_GET_CLOCK failed: %s\n", strerror(ret));
+        data.clock = 0;
+    }
+    s->clock = data.clock;
+}
+
+static int kvmclock_post_load(void *opaque, int version_id)
+{
+    KVMClockState *s = opaque;
+    struct kvm_clock_data data;
+
+    data.clock = s->clock;
+    data.flags = 0;
+    return kvm_vm_ioctl(KVM_SET_CLOCK, &data);
+}
+
+static int kvmclock_init(SysBusDevice *dev)
+{
+    return 0;
+}
+
+static const VMStateDescription kvmclock_vmsd = {
+    .name = "kvmclock",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .minimum_version_id_old = 1,
+    .pre_save = kvmclock_pre_save,
+    .post_load = kvmclock_post_load,
+    .fields = (VMStateField []) {
+        VMSTATE_UINT64(clock, KVMClockState),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
+static SysBusDeviceInfo kvmclock_info = {
+    .qdev.name = "kvmclock",
+    .qdev.size = sizeof(KVMClockState),
+    .qdev.vmsd = &kvmclock_vmsd,
+    .qdev.no_user = 1,
+    .init = kvmclock_init,
+};
+#endif /* KVM_CAP_ADJUST_CLOCK */
+
 int kvm_arch_init_vcpu(CPUState *env)
 {
     struct {
@@ -335,7 +394,6 @@ int kvm_arch_init_vcpu(CPUState *env)
 
     env->cpuid_svm_features &= kvm_x86_get_supported_cpuid(0x8000000A,
                                                            0, R_EDX);
-
     cpuid_i = 0;
 
 #ifdef CONFIG_KVM_PARA
@@ -442,6 +500,13 @@ int kvm_arch_init_vcpu(CPUState *env)
     }
 #endif
 
+#ifdef KVM_CAP_ADJUST_CLOCK
+    if (cpu_is_bsp(env) &&
+        (env->cpuid_kvm_features & (1ULL << KVM_FEATURE_CLOCKSOURCE))) {
+        sysbus_create_simple("kvmclock", -1, NULL);
+    }
+#endif
+
     return kvm_vcpu_ioctl(env, KVM_SET_CPUID2, &cpuid_data);
 }
 
@@ -534,6 +599,10 @@ int kvm_arch_init(int smp_cpus)
     int ret;
     struct utsname utsname;
 
+#ifdef KVM_CAP_ADJUST_CLOCK
+    sysbus_register_withprop(&kvmclock_info);
+#endif
+
     ret = kvm_get_supported_msrs();
     if (ret < 0) {
         return ret;