Patchwork [RFC PATCH 00/11 Allow PR and HV KVM to coexist in one kernel

login
register
mail settings
Submitter Aneesh Kumar K.V
Date Oct. 1, 2013, 11:26 a.m.
Message ID <87ob79cjvm.fsf@linux.vnet.ibm.com>
Download mbox | patch
Permalink /patch/279419/
State Not Applicable
Headers show

Comments

Aneesh Kumar K.V - Oct. 1, 2013, 11:26 a.m.
Alexander Graf <agraf@suse.de> writes:

> On 09/30/2013 03:09 PM, Aneesh Kumar K.V wrote:
>> Alexander Graf<agraf@suse.de>  writes:
>>
>>> On 27.09.2013, at 12:52, Aneesh Kumar K.V wrote:
>>>
>>>> "Aneesh Kumar K.V"<aneesh.kumar@linux.vnet.ibm.com>  writes:
>>>>
>>>>> Hi All,
>>>>>
>>>>> This patch series support enabling HV and PR KVM together in the same kernel. We
>>>>> extend machine property with new property "kvm_type". A value of 1 will force HV
>>>>> KVM and 2 PR KVM. The default value is 0 which will select the fastest KVM mode.
>>>>> ie, HV if that is supported otherwise PR.
>>>>>
>>>>> With Qemu command line having
>>>>>
>>>>> -machine pseries,accel=kvm,kvm_type=1
>>>>>
>>>>> [root@llmp24l02 qemu]# bash ../qemu
>>>>> failed to initialize KVM: Invalid argument
>>>>> [root@llmp24l02 qemu]# modprobe kvm-pr
>>>>> [root@llmp24l02 qemu]# bash ../qemu
>>>>> failed to initialize KVM: Invalid argument
>>>>> [root@llmp24l02 qemu]# modprobe  kvm-hv
>>>>> [root@llmp24l02 qemu]# bash ../qemu
>>>>>
>>>>> now with
>>>>>
>>>>> -machine pseries,accel=kvm,kvm_type=2
>>>>>
>>>>> [root@llmp24l02 qemu]# rmmod kvm-pr
>>>>> [root@llmp24l02 qemu]# bash ../qemu
>>>>> failed to initialize KVM: Invalid argument
>>>>> [root@llmp24l02 qemu]#
>>>>> [root@llmp24l02 qemu]# modprobe kvm-pr
>>>>> [root@llmp24l02 qemu]# bash ../qemu
>>>>>
>>>>> if don't specify kvm_type machine property, it will take a default value 0,
>>>>> which means fastest supported.
>>>> Related qemu patch
>>>>
>>>> commit 8d139053177d48a70cb710b211ea4c2843eccdfb
>>>> Author: Aneesh Kumar K.V<aneesh.kumar@linux.vnet.ibm.com>
>>>> Date:   Mon Sep 23 12:28:37 2013 +0530
>>>>
>>>>     kvm: Add a new machine property kvm_type
>>>>
>>>>     Targets like ppc64 support different type of KVM, one which use
>>>>     hypervisor mode and the other which doesn't. Add a new machine
>>>>     property kvm_type that helps in selecting the respective ones
>>>>
>>>>     Signed-off-by: Aneesh Kumar K.V<aneesh.kumar@linux.vnet.ibm.com>
>>> This really is too early, as we can't possibly run in HV mode for
>>> non-pseries machines, so the interpretation (or at least sanity
>>> checking) of what values are reasonable should occur in the
>>> machine. That's why it's a variable in the "machine opts".
>> With the current code CREATE_VM will fail, because we won't have
>> kvm-hv.ko loaded and trying to create a vm with type 1 will fail.
>> Now the challenge related to moving that to machine_init or later is, we
>> depend on HV or PR callback early in CREATE_VM. With the changes we have
>>
>> int kvmppc_core_init_vm(struct kvm *kvm)
>> {
>>
>> #ifdef CONFIG_PPC64
>> 	INIT_LIST_HEAD(&kvm->arch.spapr_tce_tables);
>> 	INIT_LIST_HEAD(&kvm->arch.rtas_tokens);
>> #endif
>>
>> 	return kvm->arch.kvm_ops->init_vm(kvm);
>> }
>>
>> Also the mmu notifier callback do end up calling kvm_unmap_hva etc which
>> are all HV/PR dependent.
>
> Yes, so we should verify in the machine models that we're runnable with 
> the currently selected type at least, to give the user a sensible error 
> message.

Something like the below


>
>>
>>
>>
>>> Also, users don't want to say type=0. They want to say type=PR or
>>> type=HV or type=HV,PR. In fact, can't you make this a property of
>>> -accel? Then it's truly accel specific and everything should be well.
>> If we are doing this as machine property, we can't specify string,
>> because "HV"/"PR" are all powerpc dependent, so parsing that is not
>> possible in kvm_init in qemu. But, yes ideally it would be nice to be
>
> Well, we could do the "name to integer" conversion in an arch specific 
> function, no?
>
>> able to speicy the type using string. I thought accel is a machine
>> property, hence was not sure whether I can have additional properties
>> against that. I was using it as below.
>>
>>   -machine pseries,accel=kvm,kvm_type=1

Can we really specific -accel ? I check and I am finding that as machine
property. 

-aneesh
Alexander Graf - Oct. 1, 2013, 11:36 a.m.
On 10/01/2013 01:26 PM, Aneesh Kumar K.V wrote:
> Alexander Graf<agraf@suse.de>  writes:
>
>> On 09/30/2013 03:09 PM, Aneesh Kumar K.V wrote:
>>> Alexander Graf<agraf@suse.de>   writes:
>>>
>>>> On 27.09.2013, at 12:52, Aneesh Kumar K.V wrote:
>>>>
>>>>> "Aneesh Kumar K.V"<aneesh.kumar@linux.vnet.ibm.com>   writes:
>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>> This patch series support enabling HV and PR KVM together in the same kernel. We
>>>>>> extend machine property with new property "kvm_type". A value of 1 will force HV
>>>>>> KVM and 2 PR KVM. The default value is 0 which will select the fastest KVM mode.
>>>>>> ie, HV if that is supported otherwise PR.
>>>>>>
>>>>>> With Qemu command line having
>>>>>>
>>>>>> -machine pseries,accel=kvm,kvm_type=1
>>>>>>
>>>>>> [root@llmp24l02 qemu]# bash ../qemu
>>>>>> failed to initialize KVM: Invalid argument
>>>>>> [root@llmp24l02 qemu]# modprobe kvm-pr
>>>>>> [root@llmp24l02 qemu]# bash ../qemu
>>>>>> failed to initialize KVM: Invalid argument
>>>>>> [root@llmp24l02 qemu]# modprobe  kvm-hv
>>>>>> [root@llmp24l02 qemu]# bash ../qemu
>>>>>>
>>>>>> now with
>>>>>>
>>>>>> -machine pseries,accel=kvm,kvm_type=2
>>>>>>
>>>>>> [root@llmp24l02 qemu]# rmmod kvm-pr
>>>>>> [root@llmp24l02 qemu]# bash ../qemu
>>>>>> failed to initialize KVM: Invalid argument
>>>>>> [root@llmp24l02 qemu]#
>>>>>> [root@llmp24l02 qemu]# modprobe kvm-pr
>>>>>> [root@llmp24l02 qemu]# bash ../qemu
>>>>>>
>>>>>> if don't specify kvm_type machine property, it will take a default value 0,
>>>>>> which means fastest supported.
>>>>> Related qemu patch
>>>>>
>>>>> commit 8d139053177d48a70cb710b211ea4c2843eccdfb
>>>>> Author: Aneesh Kumar K.V<aneesh.kumar@linux.vnet.ibm.com>
>>>>> Date:   Mon Sep 23 12:28:37 2013 +0530
>>>>>
>>>>>      kvm: Add a new machine property kvm_type
>>>>>
>>>>>      Targets like ppc64 support different type of KVM, one which use
>>>>>      hypervisor mode and the other which doesn't. Add a new machine
>>>>>      property kvm_type that helps in selecting the respective ones
>>>>>
>>>>>      Signed-off-by: Aneesh Kumar K.V<aneesh.kumar@linux.vnet.ibm.com>
>>>> This really is too early, as we can't possibly run in HV mode for
>>>> non-pseries machines, so the interpretation (or at least sanity
>>>> checking) of what values are reasonable should occur in the
>>>> machine. That's why it's a variable in the "machine opts".
>>> With the current code CREATE_VM will fail, because we won't have
>>> kvm-hv.ko loaded and trying to create a vm with type 1 will fail.
>>> Now the challenge related to moving that to machine_init or later is, we
>>> depend on HV or PR callback early in CREATE_VM. With the changes we have
>>>
>>> int kvmppc_core_init_vm(struct kvm *kvm)
>>> {
>>>
>>> #ifdef CONFIG_PPC64
>>> 	INIT_LIST_HEAD(&kvm->arch.spapr_tce_tables);
>>> 	INIT_LIST_HEAD(&kvm->arch.rtas_tokens);
>>> #endif
>>>
>>> 	return kvm->arch.kvm_ops->init_vm(kvm);
>>> }
>>>
>>> Also the mmu notifier callback do end up calling kvm_unmap_hva etc which
>>> are all HV/PR dependent.
>> Yes, so we should verify in the machine models that we're runnable with
>> the currently selected type at least, to give the user a sensible error
>> message.
> Something like the below

I like that one a lot. Andreas, Paolo, what do you think?


Alex

>
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 004184d..7d59ac1 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -1337,6 +1337,21 @@ static void ppc_spapr_init(QEMUMachineInitArgs *args)
>       assert(spapr->fdt_skel != NULL);
>   }
>
> +static int spapr_get_vm_type(const char *vm_type)
> +{
> +    if (!vm_type)
> +        return 0;
> +
> +    if (!strcmp(vm_type, "HV"))
> +        return 1;
> +
> +    if (!strcmp(vm_type, "PR"))
> +        return 2;
> +
> +    hw_error("qemu: unknown kvm_type specified '%s'", vm_type);
> +    exit(1);
> +}
> +
>   static QEMUMachine spapr_machine = {
>       .name = "pseries",
>       .desc = "pSeries Logical Partition (PAPR compliant)",
> @@ -1347,6 +1362,7 @@ static QEMUMachine spapr_machine = {
>       .max_cpus = MAX_CPUS,
>       .no_parallel = 1,
>       .default_boot_order = NULL,
> +    .get_vm_type = spapr_get_vm_type,
>   };
>
>   static void spapr_machine_init(void)
> diff --git a/include/hw/boards.h b/include/hw/boards.h
> index 5a7ae9f..2130488 100644
> --- a/include/hw/boards.h
> +++ b/include/hw/boards.h
> @@ -21,6 +21,8 @@ typedef void QEMUMachineResetFunc(void);
>
>   typedef void QEMUMachineHotAddCPUFunc(const int64_t id, Error **errp);
>
> +typedef int QEMUMachineGetVmTypeFunc(const char *arg);
> +
>   typedef struct QEMUMachine {
>       const char *name;
>       const char *alias;
> @@ -28,6 +30,7 @@ typedef struct QEMUMachine {
>       QEMUMachineInitFunc *init;
>       QEMUMachineResetFunc *reset;
>       QEMUMachineHotAddCPUFunc *hot_add_cpu;
> +    QEMUMachineGetVmTypeFunc *get_vm_type;
>       BlockInterfaceType block_default_type;
>       int max_cpus;
>       unsigned int no_serial:1,
> diff --git a/include/hw/xen/xen.h b/include/hw/xen/xen.h
> index e1f88bf..acc3d74 100644
> --- a/include/hw/xen/xen.h
> +++ b/include/hw/xen/xen.h
> @@ -36,7 +36,8 @@ void xen_cmos_set_s3_resume(void *opaque, int irq, int level);
>
>   qemu_irq *xen_interrupt_controller_init(void);
>
> -int xen_init(void);
> +typedef struct QEMUMachine QEMUMachine;
> +int xen_init(QEMUMachine *machine);
>   int xen_hvm_init(MemoryRegion **ram_memory);
>   void xenstore_store_pv_console_info(int i, struct CharDriverState *chr);
>
> diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
> index 9bbe3db..f25caec 100644
> --- a/include/sysemu/kvm.h
> +++ b/include/sysemu/kvm.h
> @@ -142,8 +142,8 @@ typedef struct KVMState KVMState;
>   extern KVMState *kvm_state;
>
>   /* external API */
> -
> -int kvm_init(void);
> +typedef struct QEMUMachine QEMUMachine;
> +int kvm_init(QEMUMachine *machine);
>
>   int kvm_has_sync_mmu(void);
>   int kvm_has_vcpu_events(void);
> diff --git a/include/sysemu/qtest.h b/include/sysemu/qtest.h
> index 9a0c6b3..d71343d 100644
> --- a/include/sysemu/qtest.h
> +++ b/include/sysemu/qtest.h
> @@ -31,7 +31,8 @@ static inline int qtest_available(void)
>       return 1;
>   }
>
> -int qtest_init(void);
> +typedef struct QEMUMachine QEMUMachine;
> +int qtest_init(QEMUMachine *machine);
>   #else
>   static inline bool qtest_enabled(void)
>   {
> @@ -43,7 +44,7 @@ static inline int qtest_available(void)
>       return 0;
>   }
>
> -static inline int qtest_init(void)
> +static inline int qtest_init(QEMUMachine *machine)
>   {
>       return 0;
>   }
> diff --git a/kvm-all.c b/kvm-all.c
> index b87215c..3863abd 100644
> --- a/kvm-all.c
> +++ b/kvm-all.c
> @@ -35,6 +35,8 @@
>   #include "qemu/event_notifier.h"
>   #include "trace.h"
>
> +#include "hw/boards.h"
> +
>   /* This check must be after config-host.h is included */
>   #ifdef CONFIG_EVENTFD
>   #include<sys/eventfd.h>
> @@ -1342,7 +1344,7 @@ static int kvm_max_vcpus(KVMState *s)
>       return 4;
>   }
>
> -int kvm_init(void)
> +int kvm_init(QEMUMachine *machine)
>   {
>       static const char upgrade_note[] =
>           "Please upgrade to at least kernel 2.6.29 or recent kvm-kmod\n"
> @@ -1350,7 +1352,7 @@ int kvm_init(void)
>       KVMState *s;
>       const KVMCapabilityInfo *missing_cap;
>       int ret;
> -    int i;
> +    int i, kvm_type = 0;
>       int max_vcpus;
>
>       s = g_malloc0(sizeof(KVMState));
> @@ -1407,7 +1409,11 @@ int kvm_init(void)
>           goto err;
>       }
>
> -    s->vmfd = kvm_ioctl(s, KVM_CREATE_VM, 0);
> +    if (machine->get_vm_type) {
> +        kvm_type = machine->get_vm_type(qemu_opt_get(qemu_get_machine_opts(),
> +                                                     "kvm_type"));
> +    }
> +    s->vmfd = kvm_ioctl(s, KVM_CREATE_VM, kvm_type);
>       if (s->vmfd<  0) {
>   #ifdef TARGET_S390X
>           fprintf(stderr, "Please add the 'switch_amode' kernel parameter to "
> diff --git a/kvm-stub.c b/kvm-stub.c
> index 548f471..ccb7b8c 100644
> --- a/kvm-stub.c
> +++ b/kvm-stub.c
> @@ -19,6 +19,8 @@
>   #include "hw/pci/msi.h"
>   #endif
>
> +#include "hw/boards.h"
> +
>   KVMState *kvm_state;
>   bool kvm_kernel_irqchip;
>   bool kvm_async_interrupts_allowed;
> @@ -33,7 +35,7 @@ int kvm_init_vcpu(CPUState *cpu)
>       return -ENOSYS;
>   }
>
> -int kvm_init(void)
> +int kvm_init(QEMUMachine *machine)
>   {
>       return -ENOSYS;
>   }
> diff --git a/qtest.c b/qtest.c
> index 584c707..ef3c473 100644
> --- a/qtest.c
> +++ b/qtest.c
> @@ -502,7 +502,7 @@ static void qtest_event(void *opaque, int event)
>       }
>   }
>
> -int qtest_init(void)
> +int qtest_init(QEMUMachine *machine)
>   {
>       CharDriverState *chr;
>
> diff --git a/vl.c b/vl.c
> index 4e709d5..7ecc581 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -427,7 +427,12 @@ static QemuOptsList qemu_machine_opts = {
>               .name = "usb",
>               .type = QEMU_OPT_BOOL,
>               .help = "Set on/off to enable/disable usb",
> +        },{
> +            .name = "kvm_type",
> +            .type = QEMU_OPT_STRING,
> +            .help = "Set to kvm type to be used in create vm ioctl",
>           },
> +
>           { /* End of list */ }
>       },
>   };
> @@ -2608,7 +2613,7 @@ static QEMUMachine *machine_parse(const char *name)
>       exit(!name || !is_help_option(name));
>   }
>
> -static int tcg_init(void)
> +static int tcg_init(QEMUMachine *machine)
>   {
>       tcg_exec_init(tcg_tb_size * 1024 * 1024);
>       return 0;
> @@ -2618,7 +2623,7 @@ static struct {
>       const char *opt_name;
>       const char *name;
>       int (*available)(void);
> -    int (*init)(void);
> +    int (*init)(QEMUMachine *);
>       bool *allowed;
>   } accel_list[] = {
>       { "tcg", "tcg", tcg_available, tcg_init,&tcg_allowed },
> @@ -2627,7 +2632,7 @@ static struct {
>       { "qtest", "QTest", qtest_available, qtest_init,&qtest_allowed },
>   };
>
> -static int configure_accelerator(void)
> +static int configure_accelerator(QEMUMachine *machine)
>   {
>       const char *p;
>       char buf[10];
> @@ -2654,7 +2659,7 @@ static int configure_accelerator(void)
>                       continue;
>                   }
>                   *(accel_list[i].allowed) = true;
> -                ret = accel_list[i].init();
> +                ret = accel_list[i].init(machine);
>                   if (ret<  0) {
>                       init_failed = true;
>                       fprintf(stderr, "failed to initialize %s: %s\n",
> @@ -4037,10 +4042,10 @@ int main(int argc, char **argv, char **envp)
>           exit(0);
>       }
>
> -    configure_accelerator();
> +    configure_accelerator(machine);
>
>       if (!qtest_enabled()&&  qtest_chrdev) {
> -        qtest_init();
> +        qtest_init(machine);
>       }
>
>       machine_opts = qemu_get_machine_opts();
> diff --git a/xen-all.c b/xen-all.c
> index 839f14f..ac3654b 100644
> --- a/xen-all.c
> +++ b/xen-all.c
> @@ -1000,7 +1000,7 @@ static void xen_exit_notifier(Notifier *n, void *data)
>       xs_daemon_close(state->xenstore);
>   }
>
> -int xen_init(void)
> +int xen_init(QEMUMachine *machine)
>   {
>       xen_xc = xen_xc_interface_open(0, 0, 0);
>       if (xen_xc == XC_HANDLER_INITIAL_VALUE) {
> diff --git a/xen-stub.c b/xen-stub.c
> index ad189a6..59927cb 100644
> --- a/xen-stub.c
> +++ b/xen-stub.c
> @@ -47,7 +47,7 @@ qemu_irq *xen_interrupt_controller_init(void)
>       return NULL;
>   }
>
> -int xen_init(void)
> +int xen_init(QEMUMachine *machine)
>   {
>       return -ENOSYS;
>   }
>
>>>
>>>
>>>> Also, users don't want to say type=0. They want to say type=PR or
>>>> type=HV or type=HV,PR. In fact, can't you make this a property of
>>>> -accel? Then it's truly accel specific and everything should be well.
>>> If we are doing this as machine property, we can't specify string,
>>> because "HV"/"PR" are all powerpc dependent, so parsing that is not
>>> possible in kvm_init in qemu. But, yes ideally it would be nice to be
>> Well, we could do the "name to integer" conversion in an arch specific
>> function, no?
>>
>>> able to speicy the type using string. I thought accel is a machine
>>> property, hence was not sure whether I can have additional properties
>>> against that. I was using it as below.
>>>
>>>    -machine pseries,accel=kvm,kvm_type=1
> Can we really specific -accel ? I check and I am finding that as machine
> property.
>
> -aneesh
>
Paolo Bonzini - Oct. 1, 2013, 11:41 a.m.
Il 01/10/2013 13:36, Alexander Graf ha scritto:
>>>>
>>> Yes, so we should verify in the machine models that we're runnable with
>>> the currently selected type at least, to give the user a sensible error
>>> message.
>> Something like the below
> 
> I like that one a lot. Andreas, Paolo, what do you think?

Yes, it's fine.

Paolo
Alexander Graf - Oct. 1, 2013, 11:43 a.m.
On 10/01/2013 01:36 PM, Alexander Graf wrote:
> On 10/01/2013 01:26 PM, Aneesh Kumar K.V wrote:
>> Alexander Graf<agraf@suse.de>  writes:
>>
>>> On 09/30/2013 03:09 PM, Aneesh Kumar K.V wrote:
>>>> Alexander Graf<agraf@suse.de>   writes:
>>>>
>>>>> On 27.09.2013, at 12:52, Aneesh Kumar K.V wrote:
>>>>>
>>>>>> "Aneesh Kumar K.V"<aneesh.kumar@linux.vnet.ibm.com>   writes:
>>>>>>
>>>>>>> Hi All,
>>>>>>>
>>>>>>> This patch series support enabling HV and PR KVM together in the 
>>>>>>> same kernel. We
>>>>>>> extend machine property with new property "kvm_type". A value of 
>>>>>>> 1 will force HV
>>>>>>> KVM and 2 PR KVM. The default value is 0 which will select the 
>>>>>>> fastest KVM mode.
>>>>>>> ie, HV if that is supported otherwise PR.
>>>>>>>
>>>>>>> With Qemu command line having
>>>>>>>
>>>>>>> -machine pseries,accel=kvm,kvm_type=1
>>>>>>>
>>>>>>> [root@llmp24l02 qemu]# bash ../qemu
>>>>>>> failed to initialize KVM: Invalid argument
>>>>>>> [root@llmp24l02 qemu]# modprobe kvm-pr
>>>>>>> [root@llmp24l02 qemu]# bash ../qemu
>>>>>>> failed to initialize KVM: Invalid argument
>>>>>>> [root@llmp24l02 qemu]# modprobe  kvm-hv
>>>>>>> [root@llmp24l02 qemu]# bash ../qemu
>>>>>>>
>>>>>>> now with
>>>>>>>
>>>>>>> -machine pseries,accel=kvm,kvm_type=2
>>>>>>>
>>>>>>> [root@llmp24l02 qemu]# rmmod kvm-pr
>>>>>>> [root@llmp24l02 qemu]# bash ../qemu
>>>>>>> failed to initialize KVM: Invalid argument
>>>>>>> [root@llmp24l02 qemu]#
>>>>>>> [root@llmp24l02 qemu]# modprobe kvm-pr
>>>>>>> [root@llmp24l02 qemu]# bash ../qemu
>>>>>>>
>>>>>>> if don't specify kvm_type machine property, it will take a 
>>>>>>> default value 0,
>>>>>>> which means fastest supported.
>>>>>> Related qemu patch
>>>>>>
>>>>>> commit 8d139053177d48a70cb710b211ea4c2843eccdfb
>>>>>> Author: Aneesh Kumar K.V<aneesh.kumar@linux.vnet.ibm.com>
>>>>>> Date:   Mon Sep 23 12:28:37 2013 +0530
>>>>>>
>>>>>>      kvm: Add a new machine property kvm_type
>>>>>>
>>>>>>      Targets like ppc64 support different type of KVM, one which use
>>>>>>      hypervisor mode and the other which doesn't. Add a new machine
>>>>>>      property kvm_type that helps in selecting the respective ones
>>>>>>
>>>>>>      Signed-off-by: Aneesh Kumar 
>>>>>> K.V<aneesh.kumar@linux.vnet.ibm.com>
>>>>> This really is too early, as we can't possibly run in HV mode for
>>>>> non-pseries machines, so the interpretation (or at least sanity
>>>>> checking) of what values are reasonable should occur in the
>>>>> machine. That's why it's a variable in the "machine opts".
>>>> With the current code CREATE_VM will fail, because we won't have
>>>> kvm-hv.ko loaded and trying to create a vm with type 1 will fail.
>>>> Now the challenge related to moving that to machine_init or later 
>>>> is, we
>>>> depend on HV or PR callback early in CREATE_VM. With the changes we 
>>>> have
>>>>
>>>> int kvmppc_core_init_vm(struct kvm *kvm)
>>>> {
>>>>
>>>> #ifdef CONFIG_PPC64
>>>>     INIT_LIST_HEAD(&kvm->arch.spapr_tce_tables);
>>>>     INIT_LIST_HEAD(&kvm->arch.rtas_tokens);
>>>> #endif
>>>>
>>>>     return kvm->arch.kvm_ops->init_vm(kvm);
>>>> }
>>>>
>>>> Also the mmu notifier callback do end up calling kvm_unmap_hva etc 
>>>> which
>>>> are all HV/PR dependent.
>>> Yes, so we should verify in the machine models that we're runnable with
>>> the currently selected type at least, to give the user a sensible error
>>> message.
>> Something like the below
>
> I like that one a lot. Andreas, Paolo, what do you think?

To clarify this a bit more, this patch is missing a few critical pieces:

   1) The default implementation for the kvm_type callback needs to 
error out when kvm_type is set to anything
   2) All non-papr-non-e500-PPC machines should force "PR only" mode and 
bail out on anything else
   3) We're missing sensible error messages for the user still when the 
VM could not get created

but I like the overall concept :).


Alex

Patch

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 004184d..7d59ac1 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1337,6 +1337,21 @@  static void ppc_spapr_init(QEMUMachineInitArgs *args)
     assert(spapr->fdt_skel != NULL);
 }
 
+static int spapr_get_vm_type(const char *vm_type)
+{
+    if (!vm_type)
+        return 0;
+
+    if (!strcmp(vm_type, "HV"))
+        return 1;
+
+    if (!strcmp(vm_type, "PR"))
+        return 2;
+
+    hw_error("qemu: unknown kvm_type specified '%s'", vm_type);
+    exit(1);
+}
+
 static QEMUMachine spapr_machine = {
     .name = "pseries",
     .desc = "pSeries Logical Partition (PAPR compliant)",
@@ -1347,6 +1362,7 @@  static QEMUMachine spapr_machine = {
     .max_cpus = MAX_CPUS,
     .no_parallel = 1,
     .default_boot_order = NULL,
+    .get_vm_type = spapr_get_vm_type,
 };
 
 static void spapr_machine_init(void)
diff --git a/include/hw/boards.h b/include/hw/boards.h
index 5a7ae9f..2130488 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -21,6 +21,8 @@  typedef void QEMUMachineResetFunc(void);
 
 typedef void QEMUMachineHotAddCPUFunc(const int64_t id, Error **errp);
 
+typedef int QEMUMachineGetVmTypeFunc(const char *arg);
+
 typedef struct QEMUMachine {
     const char *name;
     const char *alias;
@@ -28,6 +30,7 @@  typedef struct QEMUMachine {
     QEMUMachineInitFunc *init;
     QEMUMachineResetFunc *reset;
     QEMUMachineHotAddCPUFunc *hot_add_cpu;
+    QEMUMachineGetVmTypeFunc *get_vm_type;
     BlockInterfaceType block_default_type;
     int max_cpus;
     unsigned int no_serial:1,
diff --git a/include/hw/xen/xen.h b/include/hw/xen/xen.h
index e1f88bf..acc3d74 100644
--- a/include/hw/xen/xen.h
+++ b/include/hw/xen/xen.h
@@ -36,7 +36,8 @@  void xen_cmos_set_s3_resume(void *opaque, int irq, int level);
 
 qemu_irq *xen_interrupt_controller_init(void);
 
-int xen_init(void);
+typedef struct QEMUMachine QEMUMachine;
+int xen_init(QEMUMachine *machine);
 int xen_hvm_init(MemoryRegion **ram_memory);
 void xenstore_store_pv_console_info(int i, struct CharDriverState *chr);
 
diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index 9bbe3db..f25caec 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -142,8 +142,8 @@  typedef struct KVMState KVMState;
 extern KVMState *kvm_state;
 
 /* external API */
-
-int kvm_init(void);
+typedef struct QEMUMachine QEMUMachine;
+int kvm_init(QEMUMachine *machine);
 
 int kvm_has_sync_mmu(void);
 int kvm_has_vcpu_events(void);
diff --git a/include/sysemu/qtest.h b/include/sysemu/qtest.h
index 9a0c6b3..d71343d 100644
--- a/include/sysemu/qtest.h
+++ b/include/sysemu/qtest.h
@@ -31,7 +31,8 @@  static inline int qtest_available(void)
     return 1;
 }
 
-int qtest_init(void);
+typedef struct QEMUMachine QEMUMachine;
+int qtest_init(QEMUMachine *machine);
 #else
 static inline bool qtest_enabled(void)
 {
@@ -43,7 +44,7 @@  static inline int qtest_available(void)
     return 0;
 }
 
-static inline int qtest_init(void)
+static inline int qtest_init(QEMUMachine *machine)
 {
     return 0;
 }
diff --git a/kvm-all.c b/kvm-all.c
index b87215c..3863abd 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -35,6 +35,8 @@ 
 #include "qemu/event_notifier.h"
 #include "trace.h"
 
+#include "hw/boards.h"
+
 /* This check must be after config-host.h is included */
 #ifdef CONFIG_EVENTFD
 #include <sys/eventfd.h>
@@ -1342,7 +1344,7 @@  static int kvm_max_vcpus(KVMState *s)
     return 4;
 }
 
-int kvm_init(void)
+int kvm_init(QEMUMachine *machine)
 {
     static const char upgrade_note[] =
         "Please upgrade to at least kernel 2.6.29 or recent kvm-kmod\n"
@@ -1350,7 +1352,7 @@  int kvm_init(void)
     KVMState *s;
     const KVMCapabilityInfo *missing_cap;
     int ret;
-    int i;
+    int i, kvm_type = 0;
     int max_vcpus;
 
     s = g_malloc0(sizeof(KVMState));
@@ -1407,7 +1409,11 @@  int kvm_init(void)
         goto err;
     }
 
-    s->vmfd = kvm_ioctl(s, KVM_CREATE_VM, 0);
+    if (machine->get_vm_type) {
+        kvm_type = machine->get_vm_type(qemu_opt_get(qemu_get_machine_opts(),
+                                                     "kvm_type"));
+    }
+    s->vmfd = kvm_ioctl(s, KVM_CREATE_VM, kvm_type);
     if (s->vmfd < 0) {
 #ifdef TARGET_S390X
         fprintf(stderr, "Please add the 'switch_amode' kernel parameter to "
diff --git a/kvm-stub.c b/kvm-stub.c
index 548f471..ccb7b8c 100644
--- a/kvm-stub.c
+++ b/kvm-stub.c
@@ -19,6 +19,8 @@ 
 #include "hw/pci/msi.h"
 #endif
 
+#include "hw/boards.h"
+
 KVMState *kvm_state;
 bool kvm_kernel_irqchip;
 bool kvm_async_interrupts_allowed;
@@ -33,7 +35,7 @@  int kvm_init_vcpu(CPUState *cpu)
     return -ENOSYS;
 }
 
-int kvm_init(void)
+int kvm_init(QEMUMachine *machine)
 {
     return -ENOSYS;
 }
diff --git a/qtest.c b/qtest.c
index 584c707..ef3c473 100644
--- a/qtest.c
+++ b/qtest.c
@@ -502,7 +502,7 @@  static void qtest_event(void *opaque, int event)
     }
 }
 
-int qtest_init(void)
+int qtest_init(QEMUMachine *machine)
 {
     CharDriverState *chr;
 
diff --git a/vl.c b/vl.c
index 4e709d5..7ecc581 100644
--- a/vl.c
+++ b/vl.c
@@ -427,7 +427,12 @@  static QemuOptsList qemu_machine_opts = {
             .name = "usb",
             .type = QEMU_OPT_BOOL,
             .help = "Set on/off to enable/disable usb",
+        },{
+            .name = "kvm_type",
+            .type = QEMU_OPT_STRING,
+            .help = "Set to kvm type to be used in create vm ioctl",
         },
+
         { /* End of list */ }
     },
 };
@@ -2608,7 +2613,7 @@  static QEMUMachine *machine_parse(const char *name)
     exit(!name || !is_help_option(name));
 }
 
-static int tcg_init(void)
+static int tcg_init(QEMUMachine *machine)
 {
     tcg_exec_init(tcg_tb_size * 1024 * 1024);
     return 0;
@@ -2618,7 +2623,7 @@  static struct {
     const char *opt_name;
     const char *name;
     int (*available)(void);
-    int (*init)(void);
+    int (*init)(QEMUMachine *);
     bool *allowed;
 } accel_list[] = {
     { "tcg", "tcg", tcg_available, tcg_init, &tcg_allowed },
@@ -2627,7 +2632,7 @@  static struct {
     { "qtest", "QTest", qtest_available, qtest_init, &qtest_allowed },
 };
 
-static int configure_accelerator(void)
+static int configure_accelerator(QEMUMachine *machine)
 {
     const char *p;
     char buf[10];
@@ -2654,7 +2659,7 @@  static int configure_accelerator(void)
                     continue;
                 }
                 *(accel_list[i].allowed) = true;
-                ret = accel_list[i].init();
+                ret = accel_list[i].init(machine);
                 if (ret < 0) {
                     init_failed = true;
                     fprintf(stderr, "failed to initialize %s: %s\n",
@@ -4037,10 +4042,10 @@  int main(int argc, char **argv, char **envp)
         exit(0);
     }
 
-    configure_accelerator();
+    configure_accelerator(machine);
 
     if (!qtest_enabled() && qtest_chrdev) {
-        qtest_init();
+        qtest_init(machine);
     }
 
     machine_opts = qemu_get_machine_opts();
diff --git a/xen-all.c b/xen-all.c
index 839f14f..ac3654b 100644
--- a/xen-all.c
+++ b/xen-all.c
@@ -1000,7 +1000,7 @@  static void xen_exit_notifier(Notifier *n, void *data)
     xs_daemon_close(state->xenstore);
 }
 
-int xen_init(void)
+int xen_init(QEMUMachine *machine)
 {
     xen_xc = xen_xc_interface_open(0, 0, 0);
     if (xen_xc == XC_HANDLER_INITIAL_VALUE) {
diff --git a/xen-stub.c b/xen-stub.c
index ad189a6..59927cb 100644
--- a/xen-stub.c
+++ b/xen-stub.c
@@ -47,7 +47,7 @@  qemu_irq *xen_interrupt_controller_init(void)
     return NULL;
 }
 
-int xen_init(void)
+int xen_init(QEMUMachine *machine)
 {
     return -ENOSYS;
 }