Patchwork [2/2] Expose tsc deadline timer cpuid to guest

login
register
mail settings
Submitter Liu, Jinsong
Date Feb. 28, 2012, 10:30 a.m.
Message ID <DE8DF0795D48FD4CA783C40EC82923350B8532@SHSMSX101.ccr.corp.intel.com>
Download mbox | patch
Permalink /patch/143393/
State New
Headers show

Comments

Liu, Jinsong - Feb. 28, 2012, 10:30 a.m.
> 
> My point is that
> 
>   qemu-version-A [-cpu whatever]
> 
> should provide the same VM as
> 
>   qemu-version-B -machine pc-A [-cpu whatever]
> 
> specifically if you leave out the cpu specification.
> 
> So the compat machine could establish a feature mask (e.g. append some
> "-tsc_deadline" in this case). But, indeed, we need a new channel for
> this. 
> 

Yes, if such requirement need to be satisfied, I agree we need a new channel to solve this kind of common issue.

As for tsc deadline timer feature exposing, I write an updated patch as attached.
1). It exposes tsc deadline timer feature to guest if in-kernel irqchip is used and kvm has emulated tsc deadline timer;
2). It also authorizes user to control the feature exposing via a cpu feature flag;

Thanks,
Jinsong
Liu, Jinsong - March 6, 2012, 7:49 a.m.
Jan,

Any comments? I feel some confused about your point 'disable cpuid feature for older machine types by default': are you planning a common approach for this common issue, or, you just ask me a specific solution for the tsc deadline timer case?

Thanks,
Jinsong


Liu, Jinsong wrote:
>> My point is that
>> 
>>   qemu-version-A [-cpu whatever]
>> 
>> should provide the same VM as
>> 
>>   qemu-version-B -machine pc-A [-cpu whatever]
>> 
>> specifically if you leave out the cpu specification.
>> 
>> So the compat machine could establish a feature mask (e.g. append
>> some "-tsc_deadline" in this case). But, indeed, we need a new
>> channel for this. 
>> 
> 
> Yes, if such requirement need to be satisfied, I agree we need a new
> channel to solve this kind of common issue. 
> 
> As for tsc deadline timer feature exposing, I write an updated patch
> as attached. 1). It exposes tsc deadline timer feature to guest if
> in-kernel irqchip is used and kvm has emulated tsc deadline timer;
> 2). It also authorizes user to control the feature exposing via a cpu
> feature flag;  
> 
> Thanks,
> Jinsong
> 
> ====================
> From 5b7d5f459b621686e78e437010ce34748bcb9e8e Mon Sep 17 00:00:00 2001
> From: Liu, Jinsong <jinsong.liu@intel.com>
> Date: Wed, 29 Feb 2012 01:53:15 +0800
> Subject: [PATCH] Expose tsc deadline timer feature to guest
> 
> It exposes tsc deadline timer feature to guest if in-kernel irqchip
> is used 
> and kvm has emulated tsc deadline timer.
> It also authorizes user to control the feature exposing via a cpu
> feature flag. 
> 
> Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com>
> ---
>  target-i386/cpu.h   |    1 +
>  target-i386/cpuid.c |    2 +-
>  target-i386/kvm.c   |    4 ++++
>  3 files changed, 6 insertions(+), 1 deletions(-)
> 
> diff --git a/target-i386/cpu.h b/target-i386/cpu.h
> index d92be5d..3409afe 100644
> --- a/target-i386/cpu.h
> +++ b/target-i386/cpu.h
> @@ -399,6 +399,7 @@
>  #define CPUID_EXT_X2APIC   (1 << 21)
>  #define CPUID_EXT_MOVBE    (1 << 22)
>  #define CPUID_EXT_POPCNT   (1 << 23)
> +#define CPUID_EXT_TSC_DEADLINE_TIMER (1 << 24)
>  #define CPUID_EXT_XSAVE    (1 << 26)
>  #define CPUID_EXT_OSXSAVE  (1 << 27)
>  #define CPUID_EXT_HYPERVISOR  (1 << 31)
> diff --git a/target-i386/cpuid.c b/target-i386/cpuid.c
> index b9bfeaf..ac4b79c 100644
> --- a/target-i386/cpuid.c
> +++ b/target-i386/cpuid.c
> @@ -50,7 +50,7 @@ static const char *ext_feature_name[] = {
>      "fma", "cx16", "xtpr", "pdcm",
>      NULL, NULL, "dca", "sse4.1|sse4_1",
>      "sse4.2|sse4_2", "x2apic", "movbe", "popcnt",
> -    NULL, "aes", "xsave", "osxsave",
> +    "tsc_deadline", "aes", "xsave", "osxsave",
>      "avx", NULL, NULL, "hypervisor",
>  };
>  static const char *ext2_feature_name[] = {
> diff --git a/target-i386/kvm.c b/target-i386/kvm.c
> index 7079e87..2639699 100644
> --- a/target-i386/kvm.c
> +++ b/target-i386/kvm.c
> @@ -370,6 +370,10 @@ int kvm_arch_init_vcpu(CPUState *env)
>      i = env->cpuid_ext_features & CPUID_EXT_HYPERVISOR;
>      env->cpuid_ext_features &= kvm_arch_get_supported_cpuid(s, 1, 0,
>      R_ECX); env->cpuid_ext_features |= i;
> +    if (!kvm_irqchip_in_kernel() ||
> +        !kvm_check_extension(s, KVM_CAP_TSC_DEADLINE_TIMER)) {
> +        env->cpuid_ext_features &= ~CPUID_EXT_TSC_DEADLINE_TIMER;
> +    }
> 
>      env->cpuid_ext2_features &= kvm_arch_get_supported_cpuid(s,
>                                                              
> 0x80000001, 0, R_EDX);
Jan Kiszka - March 6, 2012, 10:14 a.m.
On 2012-03-06 08:49, Liu, Jinsong wrote:
> Jan,
> 
> Any comments? I feel some confused about your point 'disable cpuid feature for older machine types by default': are you planning a common approach for this common issue, or, you just ask me a specific solution for the tsc deadline timer case?

I think a generic solution for this can be as simple as passing a
feature exclusion mask to cpu_init. You could simple append a string of
"-feature1,-feature2" to the cpu model that is specified on creation.
And that string could be defined in the compat machine descriptions.
Does this make sense?

Jan
Liu, Jinsong - March 9, 2012, 6:27 p.m.
Jan Kiszka wrote:
> On 2012-03-06 08:49, Liu, Jinsong wrote:
>> Jan,
>> 
>> Any comments? I feel some confused about your point 'disable cpuid
>> feature for older machine types by default': are you planning a
>> common approach for this common issue, or, you just ask me a
>> specific solution for the tsc deadline timer case?   
> 
> I think a generic solution for this can be as simple as passing a
> feature exclusion mask to cpu_init. You could simple append a string
> of "-feature1,-feature2" to the cpu model that is specified on
> creation. And that string could be defined in the compat machine
> descriptions. Does this make sense?
> 

Jan, to prevent misunderstanding, I elaborate my understanding of your points below (if any misunderstanding please point out to me):
=====================
Your target is, to migrate from A(old qemu) to B(new qemu) by
1. at A: qemu-version-A [-cpu whatever]      // currently the default machine type is pc-A
2. at B: qemu-version-B -machine pc-A [-cpu whatever] -feature1 -feature2

B run new qemu-version-B (w/ new features 'feature1' and 'feature2'), but when B runs w/ compat '-machine pc-A', vm should not see 'feature1' and 'feature2', so commandline append string to cpu model '-cpu whatever -feature1 -feature2' to hidden new feature1 and feature2 to vm, hence vm can see same cpuid features (at B) as those at A (which means, no feature1, no feature2)
=====================

If my understanding of your thoughts is right, I think currently qemu has satisfied your target, code refer to
     pc_cpus_init(cpu_model)
     ......
     cpu_init(cpu_model)
--> cpu_x86_register(*env, cpu_model)
--> cpu_x86_find_by_name(*def, cpu_model)     // parse '+/- features', generate feature masks plus_features... 
                                                                     // and minus_features...(this is feature exclusion masks you want)
I think your point 'define in the compat machine description' is unnecessary.

As for 'tsc deadline' feature exposing, my patch (as attached) just obey qemu general cpuid exposing method, and also satisfied your target I think.


Thanks,
Jinsong
Jan Kiszka - March 9, 2012, 6:56 p.m.
On 2012-03-09 19:27, Liu, Jinsong wrote:
> Jan Kiszka wrote:
>> On 2012-03-06 08:49, Liu, Jinsong wrote:
>>> Jan,
>>>
>>> Any comments? I feel some confused about your point 'disable cpuid
>>> feature for older machine types by default': are you planning a
>>> common approach for this common issue, or, you just ask me a
>>> specific solution for the tsc deadline timer case?   
>>
>> I think a generic solution for this can be as simple as passing a
>> feature exclusion mask to cpu_init. You could simple append a string
>> of "-feature1,-feature2" to the cpu model that is specified on
>> creation. And that string could be defined in the compat machine
>> descriptions. Does this make sense?
>>
> 
> Jan, to prevent misunderstanding, I elaborate my understanding of your points below (if any misunderstanding please point out to me):
> =====================
> Your target is, to migrate from A(old qemu) to B(new qemu) by
> 1. at A: qemu-version-A [-cpu whatever]      // currently the default machine type is pc-A
> 2. at B: qemu-version-B -machine pc-A [-cpu whatever] -feature1 -feature2
> 
> B run new qemu-version-B (w/ new features 'feature1' and 'feature2'), but when B runs w/ compat '-machine pc-A', vm should not see 'feature1' and 'feature2', so commandline append string to cpu model '-cpu whatever -feature1 -feature2' to hidden new feature1 and feature2 to vm, hence vm can see same cpuid features (at B) as those at A (which means, no feature1, no feature2)
> =====================
> 
> If my understanding of your thoughts is right, I think currently qemu has satisfied your target, code refer to
>      pc_cpus_init(cpu_model)
>      ......
>      cpu_init(cpu_model)
> --> cpu_x86_register(*env, cpu_model)
> --> cpu_x86_find_by_name(*def, cpu_model)     // parse '+/- features', generate feature masks plus_features... 
>                                                                      // and minus_features...(this is feature exclusion masks you want)
> I think your point 'define in the compat machine description' is unnecessary.

The user would have to specify the new feature as exclusions *manually*
on the command line if -machine pc-A doesn't inject them
*automatically*. So it is necessary to enhance qemu in this regard.

Jan
Liu, Jinsong - March 9, 2012, 7:09 p.m.
Jan Kiszka wrote:
> On 2012-03-09 19:27, Liu, Jinsong wrote:
>> Jan Kiszka wrote:
>>> On 2012-03-06 08:49, Liu, Jinsong wrote:
>>>> Jan,
>>>> 
>>>> Any comments? I feel some confused about your point 'disable cpuid
>>>> feature for older machine types by default': are you planning a
>>>> common approach for this common issue, or, you just ask me a
>>>> specific solution for the tsc deadline timer case?
>>> 
>>> I think a generic solution for this can be as simple as passing a
>>> feature exclusion mask to cpu_init. You could simple append a string
>>> of "-feature1,-feature2" to the cpu model that is specified on
>>> creation. And that string could be defined in the compat machine
>>> descriptions. Does this make sense?
>>> 
>> 
>> Jan, to prevent misunderstanding, I elaborate my understanding of
>> your points below (if any misunderstanding please point out to me):
>> =====================  
>> Your target is, to migrate from A(old qemu) to B(new qemu) by
>> 1. at A: qemu-version-A [-cpu whatever]      // currently the
>> default machine type is pc-A 
>> 2. at B: qemu-version-B -machine pc-A [-cpu whatever] -feature1
>> -feature2 
>> 
>> B run new qemu-version-B (w/ new features 'feature1' and
>> 'feature2'), but when B runs w/ compat '-machine pc-A', vm should
>> not see 'feature1' and 'feature2', so commandline append string to
>> cpu model '-cpu whatever -feature1 -feature2' to hidden new feature1
>> and feature2 to vm, hence vm can see same cpuid features (at B) as
>> those at A (which means, no feature1, no feature2)
>> =====================      
>> 
>> If my understanding of your thoughts is right, I think currently
>>      qemu has satisfied your target, code refer to     
>>      pc_cpus_init(cpu_model) ...... cpu_init(cpu_model)
>> --> cpu_x86_register(*env, cpu_model)
>> --> cpu_x86_find_by_name(*def, cpu_model)     // parse '+/-
>>                                                                     
>> features', generate feature masks plus_features... // and
>> minus_features...(this is feature exclusion masks you want)  
>> I think your point 'define in the compat machine description' is
>> unnecessary. 
> 
> The user would have to specify the new feature as exclusions
> *manually* on the command line if -machine pc-A doesn't inject them
> *automatically*. So it is necessary to enhance qemu in this regard.
> 

... You suggest 'append a string of "-feature1,-feature2" to the cpu model that is specified on creation' at your last email.
Could you tell me other way user exclude features? I only know qemu command line :-(

Thanks,
Jinsong
Liu, Jinsong - March 9, 2012, 7:29 p.m.
Liu, Jinsong wrote:
> Jan Kiszka wrote:
>> On 2012-03-06 08:49, Liu, Jinsong wrote:
>>> Jan,
>>> 
>>> Any comments? I feel some confused about your point 'disable cpuid
>>> feature for older machine types by default': are you planning a
>>> common approach for this common issue, or, you just ask me a
>>> specific solution for the tsc deadline timer case?
>> 
>> I think a generic solution for this can be as simple as passing a
>> feature exclusion mask to cpu_init. You could simple append a string
>> of "-feature1,-feature2" to the cpu model that is specified on
>> creation. And that string could be defined in the compat machine
>> descriptions. Does this make sense?
>> 
> 
> Jan, to prevent misunderstanding, I elaborate my understanding of
> your points below (if any misunderstanding please point out to me):
> ===================== 
> Your target is, to migrate from A(old qemu) to B(new qemu) by
> 1. at A: qemu-version-A [-cpu whatever]      // currently the default
> machine type is pc-A 
> 2. at B: qemu-version-B -machine pc-A [-cpu whatever] -feature1
> -feature2 
> 
> B run new qemu-version-B (w/ new features 'feature1' and 'feature2'),
> but when B runs w/ compat '-machine pc-A', vm should not see
> 'feature1' and 'feature2', so commandline append string to cpu model
> '-cpu whatever -feature1 -feature2' to hidden new feature1 and
> feature2 to vm, hence vm can see same cpuid features (at B) as those
> at A (which means, no feature1, no feature2) =====================    
> 

BTW, any misunderstanding or something wrong about my understanding of your target? please help me confirm. I want to make sure we are talking same thing. 

Thanks,
Jinsong

> If my understanding of your thoughts is right, I think currently qemu
>      has satisfied your target, code refer to pc_cpus_init(cpu_model)
>      ......
>      cpu_init(cpu_model)
> --> cpu_x86_register(*env, cpu_model)
> --> cpu_x86_find_by_name(*def, cpu_model)     // parse '+/-
>                                                                     
> features', generate feature masks plus_features... // and
> minus_features...(this is feature exclusion masks you want)  
> I think your point 'define in the compat machine description' is
> unnecessary. 
> 
> As for 'tsc deadline' feature exposing, my patch (as attached) just
> obey qemu general cpuid exposing method, and also satisfied your
> target I think.  
> 
> 
> Thanks,
> Jinsong
Rik van Riel - March 19, 2012, 10:35 p.m.
On 03/09/2012 01:27 PM, Liu, Jinsong wrote:

> As for 'tsc deadline' feature exposing, my patch (as attached) just obey qemu general cpuid exposing method, and also satisfied your target I think.

One question.

Why is TSC_DEADLINE not exposed in the cpuid allowed feature
bits in do_cpuid_ent() in arch/x86/kvm/x86.c ?

         /* cpuid 1.ecx */
         const u32 kvm_supported_word4_x86_features =
                 F(XMM3) | F(PCLMULQDQ) | 0 /* DTES64, MONITOR */ |
                 0 /* DS-CPL, VMX, SMX, EST */ |
                 0 /* TM2 */ | F(SSSE3) | 0 /* CNXT-ID */ | 0 /* 
Reserved */ |
                 F(FMA) | F(CX16) | 0 /* xTPR Update, PDCM */ |
                 0 /* Reserved, DCA */ | F(XMM4_1) |
                 F(XMM4_2) | F(X2APIC) | F(MOVBE) | F(POPCNT) |
                 0 /* Reserved*/ | F(AES) | F(XSAVE) | 0 /* OSXSAVE */ | 
F(AVX) |
                 F(F16C) | F(RDRAND);

Would it make sense to expose F(TSC_DEADLINE) above?

Or is there something truly special about tsc deadline
that means it should be different from everything else?
Liu, Jinsong - March 20, 2012, 12:53 p.m.
Rik van Riel wrote:
> On 03/09/2012 01:27 PM, Liu, Jinsong wrote:
> 
>> As for 'tsc deadline' feature exposing, my patch (as attached) just
>> obey qemu general cpuid exposing method, and also satisfied your
>> target I think.  
> 
> One question.
> 
> Why is TSC_DEADLINE not exposed in the cpuid allowed feature
> bits in do_cpuid_ent() in arch/x86/kvm/x86.c ?
> 
>          /* cpuid 1.ecx */
>          const u32 kvm_supported_word4_x86_features =
>                  F(XMM3) | F(PCLMULQDQ) | 0 /* DTES64, MONITOR */ |
>                  0 /* DS-CPL, VMX, SMX, EST */ |
>                  0 /* TM2 */ | F(SSSE3) | 0 /* CNXT-ID */ | 0 /*
> Reserved */ |
>                  F(FMA) | F(CX16) | 0 /* xTPR Update, PDCM */ |
>                  0 /* Reserved, DCA */ | F(XMM4_1) |
>                  F(XMM4_2) | F(X2APIC) | F(MOVBE) | F(POPCNT) |
>                  0 /* Reserved*/ | F(AES) | F(XSAVE) | 0 /* OSXSAVE
> */ | F(AVX) |
>                  F(F16C) | F(RDRAND);
> 
> Would it make sense to expose F(TSC_DEADLINE) above?
> 
> Or is there something truly special about tsc deadline
> that means it should be different from everything else?

Because the feature depends on KVM_CREATE_IRQCHIP, which we cannot guarantee will be called, we expose it via a KVM_CAP_TSC_DEADLINE_TIMER and not KVM_GET_SUPPORTED_CPUID.
Refer changeset 4d25a066b69fb749a39d0d4c610689dd765a0b0e.

BTW, your question remind me qemu-kvm side patch, and I update it a little (would be sent out later).

Thanks,
Jinsong
Eduardo Habkost - March 20, 2012, 1:33 p.m.
On Tue, Mar 20, 2012 at 12:53:57PM +0000, Liu, Jinsong wrote:
> Rik van Riel wrote:
> > On 03/09/2012 01:27 PM, Liu, Jinsong wrote:
> > 
> >> As for 'tsc deadline' feature exposing, my patch (as attached) just
> >> obey qemu general cpuid exposing method, and also satisfied your
> >> target I think.  
> > 
> > One question.
> > 
> > Why is TSC_DEADLINE not exposed in the cpuid allowed feature
> > bits in do_cpuid_ent() in arch/x86/kvm/x86.c ?
> > 
> >          /* cpuid 1.ecx */
> >          const u32 kvm_supported_word4_x86_features =
> >                  F(XMM3) | F(PCLMULQDQ) | 0 /* DTES64, MONITOR */ |
> >                  0 /* DS-CPL, VMX, SMX, EST */ |
> >                  0 /* TM2 */ | F(SSSE3) | 0 /* CNXT-ID */ | 0 /*
> > Reserved */ |
> >                  F(FMA) | F(CX16) | 0 /* xTPR Update, PDCM */ |
> >                  0 /* Reserved, DCA */ | F(XMM4_1) |
> >                  F(XMM4_2) | F(X2APIC) | F(MOVBE) | F(POPCNT) |
> >                  0 /* Reserved*/ | F(AES) | F(XSAVE) | 0 /* OSXSAVE
> > */ | F(AVX) |
> >                  F(F16C) | F(RDRAND);
> > 
> > Would it make sense to expose F(TSC_DEADLINE) above?
> > 
> > Or is there something truly special about tsc deadline
> > that means it should be different from everything else?
> 
> Because the feature depends on KVM_CREATE_IRQCHIP, which we cannot guarantee will be called, we expose it via a KVM_CAP_TSC_DEADLINE_TIMER and not KVM_GET_SUPPORTED_CPUID.

We have many other features that depend on proper support from userspace
otherwise they wouldn't work, but are listed on GET_SUPPORTED_CPUID,
don't we? Why is TSC-deadline special?

GET_SUPPORTED_CPUID just means "KVM supports it as long as userspace
supports it too and enables it", it doesn't mean "CPUID bit that will be
enabled by default"[1].

> Refer changeset 4d25a066b69fb749a39d0d4c610689dd765a0b0e.

That changeset was necessary because the kernel was doing this on
update_cpu

	if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL &&
		best->function == 0x1) {
		best->ecx |= bit(X86_FEATURE_TSC_DEADLINE_TIMER);

And that was really wrong, because it enabled the bit unconditionally.
But I don't understand why KVM_CAP_TSC_DEADLINE_TIMER was created if we
already have KVM_GET_SUPPORTED_CPUID to tell userspace which bits are
supported by KVM.


[1] From Documentation/virtual/kvm/api.txt:

"KVM_GET_SUPPORTED_CPUID
[...]
This ioctl returns x86 cpuid features which are supported by both the
hardware and kvm.  Userspace can use the information returned by this
ioctl to construct cpuid information (for KVM_SET_CPUID2) that is
consistent with hardware, kernel, and userspace capabilities, and with
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^
user requirements (for example, the user may wish to constrain cpuid to
emulate older hardware, or for feature consistency across a cluster)."
Liu, Jinsong - March 23, 2012, 3:49 a.m.
Eduardo Habkost wrote:
> On Tue, Mar 20, 2012 at 12:53:57PM +0000, Liu, Jinsong wrote:
>> Rik van Riel wrote:
>>> On 03/09/2012 01:27 PM, Liu, Jinsong wrote:
>>> 
>>>> As for 'tsc deadline' feature exposing, my patch (as attached) just
>>>> obey qemu general cpuid exposing method, and also satisfied your
>>>> target I think.
>>> 
>>> One question.
>>> 
>>> Why is TSC_DEADLINE not exposed in the cpuid allowed feature
>>> bits in do_cpuid_ent() in arch/x86/kvm/x86.c ?
>>> 
>>>          /* cpuid 1.ecx */
>>>          const u32 kvm_supported_word4_x86_features =
>>>                  F(XMM3) | F(PCLMULQDQ) | 0 /* DTES64, MONITOR */ |
>>>                  0 /* DS-CPL, VMX, SMX, EST */ |
>>>                  0 /* TM2 */ | F(SSSE3) | 0 /* CNXT-ID */ | 0 /*
>>>                  Reserved */ | F(FMA) | F(CX16) | 0 /* xTPR Update,
>>>                  PDCM */ | 0 /* Reserved, DCA */ | F(XMM4_1) |
>>>                  F(XMM4_2) | F(X2APIC) | F(MOVBE) | F(POPCNT) |
>>>                  0 /* Reserved*/ | F(AES) | F(XSAVE) | 0 /* OSXSAVE
>>>                  */ | F(AVX) | F(F16C) | F(RDRAND);
>>> 
>>> Would it make sense to expose F(TSC_DEADLINE) above?
>>> 
>>> Or is there something truly special about tsc deadline
>>> that means it should be different from everything else?
>> 
>> Because the feature depends on KVM_CREATE_IRQCHIP, which we cannot
>> guarantee will be called, we expose it via a
>> KVM_CAP_TSC_DEADLINE_TIMER and not KVM_GET_SUPPORTED_CPUID.  
> 
> We have many other features that depend on proper support from
> userspace otherwise they wouldn't work, but are listed on
> GET_SUPPORTED_CPUID, don't we? Why is TSC-deadline special?
> 
> GET_SUPPORTED_CPUID just means "KVM supports it as long as userspace
> supports it too and enables it", it doesn't mean "CPUID bit that will
> be enabled by default"[1].
> 
>> Refer changeset 4d25a066b69fb749a39d0d4c610689dd765a0b0e.
> 
> That changeset was necessary because the kernel was doing this on
> update_cpu
> 
> 	if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL &&
> 		best->function == 0x1) {
> 		best->ecx |= bit(X86_FEATURE_TSC_DEADLINE_TIMER);
> 
> And that was really wrong, because it enabled the bit unconditionally.
> But I don't understand why KVM_CAP_TSC_DEADLINE_TIMER was created if
> we already have KVM_GET_SUPPORTED_CPUID to tell userspace which bits
> are supported by KVM.
> 

Yes, exactly. That's why we need this patch.

> 
> [1] From Documentation/virtual/kvm/api.txt:
> 
> "KVM_GET_SUPPORTED_CPUID
> [...]
> This ioctl returns x86 cpuid features which are supported by both the
> hardware and kvm.  Userspace can use the information returned by this
> ioctl to construct cpuid information (for KVM_SET_CPUID2) that is
> consistent with hardware, kernel, and userspace capabilities, and with
>                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^
> user requirements (for example, the user may wish to constrain cpuid
> to emulate older hardware, or for feature consistency across a
> cluster)." 

The fixbug patch is implemented by Jan and Avi, I reply per my understanding.

I think for tsc deadline timer feature, KVM_CAP_TSC_DEADLINE_TIMER is slightly better than KVM_GET_SUPPORTED_CPUID. If use KVM_GET_SUPPORTED_CPUID, it means tsc deadline features bind to host cpuid, while it fact it could be pure software emulated by kvm (though currently it implemented as bound to hareware). For the sake of extension, it choose KVM_CAP_TSC_DEADLINE_TIMER.

Thanks,
Jinsong
Eduardo Habkost - March 23, 2012, 1:46 p.m.
On Fri, Mar 23, 2012 at 03:49:27AM +0000, Liu, Jinsong wrote:
> Eduardo Habkost wrote:
> > [1] From Documentation/virtual/kvm/api.txt:
> > 
> > "KVM_GET_SUPPORTED_CPUID
> > [...]
> > This ioctl returns x86 cpuid features which are supported by both the
> > hardware and kvm.  Userspace can use the information returned by this
> > ioctl to construct cpuid information (for KVM_SET_CPUID2) that is
> > consistent with hardware, kernel, and userspace capabilities, and with
> >                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^
> > user requirements (for example, the user may wish to constrain cpuid
> > to emulate older hardware, or for feature consistency across a
> > cluster)." 
> 
> The fixbug patch is implemented by Jan and Avi, I reply per my understanding.

No problem. I hope Jan or Avi can clarify this.

> 
> I think for tsc deadline timer feature, KVM_CAP_TSC_DEADLINE_TIMER is
> slightly better than KVM_GET_SUPPORTED_CPUID. If use
> KVM_GET_SUPPORTED_CPUID, it means tsc deadline features bind to host
> cpuid, while it fact it could be pure software emulated by kvm (though
> currently it implemented as bound to hareware). For the sake of
> extension, it choose KVM_CAP_TSC_DEADLINE_TIMER.

There's no requirement for GET_SUPPORTED_CPUID to be a subset of the
host CPU features. If KVM can completely emulate the feature by
software, then it can return the feature on GET_SUPPORTED_CPUID even if
the host CPU doesn't have the feature. That's the case for x2apic, for
example (see commit 0d1de2d901f4ba0972a3886496a44fb1d3300dbd).
Liu, Jinsong - March 23, 2012, 2:17 p.m.
Eduardo Habkost wrote:
> On Fri, Mar 23, 2012 at 03:49:27AM +0000, Liu, Jinsong wrote:
>> Eduardo Habkost wrote:
>>> [1] From Documentation/virtual/kvm/api.txt:
>>> 
>>> "KVM_GET_SUPPORTED_CPUID
>>> [...]
>>> This ioctl returns x86 cpuid features which are supported by both
>>> the hardware and kvm.  Userspace can use the information returned
>>> by this ioctl to construct cpuid information (for KVM_SET_CPUID2)
>>> that is consistent with hardware, kernel, and userspace
>>>                                   capabilities, and with
>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^ 
>>> user requirements (for example, the user may wish to constrain cpuid
>>> to emulate older hardware, or for feature consistency across a
>>> cluster)."
>> 
>> The fixbug patch is implemented by Jan and Avi, I reply per my
>> understanding. 
> 
> No problem. I hope Jan or Avi can clarify this.
> 
>> 
>> I think for tsc deadline timer feature, KVM_CAP_TSC_DEADLINE_TIMER is
>> slightly better than KVM_GET_SUPPORTED_CPUID. If use
>> KVM_GET_SUPPORTED_CPUID, it means tsc deadline features bind to host
>> cpuid, while it fact it could be pure software emulated by kvm
>> (though currently it implemented as bound to hareware). For the sake
>> of 
>> extension, it choose KVM_CAP_TSC_DEADLINE_TIMER.
> 
> There's no requirement for GET_SUPPORTED_CPUID to be a subset of the
> host CPU features. If KVM can completely emulate the feature by
> software, then it can return the feature on GET_SUPPORTED_CPUID even
> if the host CPU doesn't have the feature. That's the case for x2apic,
> for example (see commit 0d1de2d901f4ba0972a3886496a44fb1d3300dbd).


Jan/Avi,

Could you elaborate more your thought? 

Thanks,
Jinsong
Eduardo Habkost - April 19, 2012, 8:03 p.m.
Jan/Avi: ping?

I would like to get this ABI detail clarified so it can be implemented
the right way on Qemu and KVM.

My proposal is to simply add tsc-deadline to the data returned by
GET_SUPPORTED_CPUID, making KVM_CAP_TSC_DEADLINE_TIMER unnecessary.


On Fri, Mar 23, 2012 at 02:17:52PM +0000, Liu, Jinsong wrote:
> Eduardo Habkost wrote:
> > On Fri, Mar 23, 2012 at 03:49:27AM +0000, Liu, Jinsong wrote:
> >> Eduardo Habkost wrote:
> >>> [1] From Documentation/virtual/kvm/api.txt:
> >>> 
> >>> "KVM_GET_SUPPORTED_CPUID
> >>> [...]
> >>> This ioctl returns x86 cpuid features which are supported by both
> >>> the hardware and kvm.  Userspace can use the information returned
> >>> by this ioctl to construct cpuid information (for KVM_SET_CPUID2)
> >>> that is consistent with hardware, kernel, and userspace
> >>>                                   capabilities, and with
> >>> ^^^^^^^^^^^^^^^^^^^^^^^^^^ 
> >>> user requirements (for example, the user may wish to constrain cpuid
> >>> to emulate older hardware, or for feature consistency across a
> >>> cluster)."
> >> 
> >> The fixbug patch is implemented by Jan and Avi, I reply per my
> >> understanding. 
> > 
> > No problem. I hope Jan or Avi can clarify this.
> > 
> >> 
> >> I think for tsc deadline timer feature, KVM_CAP_TSC_DEADLINE_TIMER is
> >> slightly better than KVM_GET_SUPPORTED_CPUID. If use
> >> KVM_GET_SUPPORTED_CPUID, it means tsc deadline features bind to host
> >> cpuid, while it fact it could be pure software emulated by kvm
> >> (though currently it implemented as bound to hareware). For the sake
> >> of 
> >> extension, it choose KVM_CAP_TSC_DEADLINE_TIMER.
> > 
> > There's no requirement for GET_SUPPORTED_CPUID to be a subset of the
> > host CPU features. If KVM can completely emulate the feature by
> > software, then it can return the feature on GET_SUPPORTED_CPUID even
> > if the host CPU doesn't have the feature. That's the case for x2apic,
> > for example (see commit 0d1de2d901f4ba0972a3886496a44fb1d3300dbd).
> 
> 
> Jan/Avi,
> 
> Could you elaborate more your thought? 
> 
> Thanks,
> Jinsong
Jan Kiszka - April 20, 2012, 10:12 a.m.
On 2012-04-19 22:03, Eduardo Habkost wrote:
> Jan/Avi: ping?
> 
> I would like to get this ABI detail clarified so it can be implemented
> the right way on Qemu and KVM.
> 
> My proposal is to simply add tsc-deadline to the data returned by
> GET_SUPPORTED_CPUID, making KVM_CAP_TSC_DEADLINE_TIMER unnecessary.
> 

IIRC, there were problems with this model to exclude that the feature
gets reported this way without ensuring that in-kernel irqchip is
actually activated. Please browse the discussions, it should be
documented there.

Jan
Eduardo Habkost - April 20, 2012, 3 p.m.
On Fri, Apr 20, 2012 at 12:12:38PM +0200, Jan Kiszka wrote:
> On 2012-04-19 22:03, Eduardo Habkost wrote:
> > Jan/Avi: ping?
> > 
> > I would like to get this ABI detail clarified so it can be implemented
> > the right way on Qemu and KVM.
> > 
> > My proposal is to simply add tsc-deadline to the data returned by
> > GET_SUPPORTED_CPUID, making KVM_CAP_TSC_DEADLINE_TIMER unnecessary.
> > 
> 
> IIRC, there were problems with this model to exclude that the feature
> gets reported this way without ensuring that in-kernel irqchip is
> actually activated. Please browse the discussions, it should be
> documented there.

The flag was never added to GET_SUPPORTED_CPUID on any of the git
repositories I have checked, and I don't see that being done anywhere on
my KVM mailing list archives, either. So I don't see how we could have
had issues with GET_SUPPORTED_CPUID, if it was never present on the
code.

What was present on the code before the fix, was coded that
unconditinally enabled the flag even if Qemu never asked for it, and
that really was wrong.

GET_SUPPORTED_CPUID doesn't enable any flag: it just tells userspace
that the hardware and KVM supports the feature (and, surprise, this is
exactly what KVM_CAP_TSC_DEADLINE_TIMER means too, isn't it?). It's up
to userspace to enable the CPUID bits according to user requirements and
userspace capabilities (in other words: only when userspace knows it is
safe because the in-kernel irqchip is enabled).

If the above is not what GET_SUPPORTED_CPUID means, I would like to get
that clarified, so I can submit an updated to KVM API documentation.
Jan Kiszka - April 20, 2012, 3:19 p.m.
On 2012-04-20 17:00, Eduardo Habkost wrote:
> On Fri, Apr 20, 2012 at 12:12:38PM +0200, Jan Kiszka wrote:
>> On 2012-04-19 22:03, Eduardo Habkost wrote:
>>> Jan/Avi: ping?
>>>
>>> I would like to get this ABI detail clarified so it can be implemented
>>> the right way on Qemu and KVM.
>>>
>>> My proposal is to simply add tsc-deadline to the data returned by
>>> GET_SUPPORTED_CPUID, making KVM_CAP_TSC_DEADLINE_TIMER unnecessary.
>>>
>>
>> IIRC, there were problems with this model to exclude that the feature
>> gets reported this way without ensuring that in-kernel irqchip is
>> actually activated. Please browse the discussions, it should be
>> documented there.
> 
> The flag was never added to GET_SUPPORTED_CPUID on any of the git
> repositories I have checked, and I don't see that being done anywhere on
> my KVM mailing list archives, either. So I don't see how we could have
> had issues with GET_SUPPORTED_CPUID, if it was never present on the
> code.
> 
> What was present on the code before the fix, was coded that
> unconditinally enabled the flag even if Qemu never asked for it, and
> that really was wrong.
> 
> GET_SUPPORTED_CPUID doesn't enable any flag: it just tells userspace
> that the hardware and KVM supports the feature (and, surprise, this is
> exactly what KVM_CAP_TSC_DEADLINE_TIMER means too, isn't it?). It's up
> to userspace to enable the CPUID bits according to user requirements and
> userspace capabilities (in other words: only when userspace knows it is
> safe because the in-kernel irqchip is enabled).
> 
> If the above is not what GET_SUPPORTED_CPUID means, I would like to get
> that clarified, so I can submit an updated to KVM API documentation.

Does old userspace filter out flags from GET_SUPPORTED_CPUID that it
does not understand?

Jan
Eduardo Habkost - April 20, 2012, 3:36 p.m.
On Fri, Apr 20, 2012 at 05:19:17PM +0200, Jan Kiszka wrote:
> On 2012-04-20 17:00, Eduardo Habkost wrote:
> > On Fri, Apr 20, 2012 at 12:12:38PM +0200, Jan Kiszka wrote:
> >> On 2012-04-19 22:03, Eduardo Habkost wrote:
> >>> Jan/Avi: ping?
> >>>
> >>> I would like to get this ABI detail clarified so it can be implemented
> >>> the right way on Qemu and KVM.
> >>>
> >>> My proposal is to simply add tsc-deadline to the data returned by
> >>> GET_SUPPORTED_CPUID, making KVM_CAP_TSC_DEADLINE_TIMER unnecessary.
> >>>
> >>
> >> IIRC, there were problems with this model to exclude that the feature
> >> gets reported this way without ensuring that in-kernel irqchip is
> >> actually activated. Please browse the discussions, it should be
> >> documented there.
> > 
> > The flag was never added to GET_SUPPORTED_CPUID on any of the git
> > repositories I have checked, and I don't see that being done anywhere on
> > my KVM mailing list archives, either. So I don't see how we could have
> > had issues with GET_SUPPORTED_CPUID, if it was never present on the
> > code.
> > 
> > What was present on the code before the fix, was coded that
> > unconditinally enabled the flag even if Qemu never asked for it, and
> > that really was wrong.
> > 
> > GET_SUPPORTED_CPUID doesn't enable any flag: it just tells userspace
> > that the hardware and KVM supports the feature (and, surprise, this is
> > exactly what KVM_CAP_TSC_DEADLINE_TIMER means too, isn't it?). It's up
> > to userspace to enable the CPUID bits according to user requirements and
> > userspace capabilities (in other words: only when userspace knows it is
> > safe because the in-kernel irqchip is enabled).
> > 
> > If the above is not what GET_SUPPORTED_CPUID means, I would like to get
> > that clarified, so I can submit an updated to KVM API documentation.
> 
> Does old userspace filter out flags from GET_SUPPORTED_CPUID that it
> does not understand?

It's even more strict than that: it only enables what was explicitly
enabled on the CPU model asked by the user.

But:

The only exception is "-cpu host", that tries to enable everything, even
flags Qemu never heard of, and that is something that may require some
changes on the API design (or at least documentation clarifications).

Today "-cpu host" can't differentiate (A) "a feature that KVM supports
and emulate by itself and can be enabled without any support from
userspace" and (B) "a feature that KVM supports but need support from
userspace to be enabled". I am sure we will be able to find multiple
examples of (B) inside the flags returned by GET_SUPPORTED_CPUID today.

We could decide to never add any new flag to GET_SUPPORTED_CPUID if it
requires additional userspace support to work, from now on, and create
new KVM_CAP_* flags for them. But, we really want to do that for almost
every new CPUID feature bit in the future?

We also have the problem of defining what "requires support from
userspace to work" means: I am not sure this has the same meaning for
everybody. Many new features require userspace support only for
migration, and nothing else, but some users don't need migration to
work. Do those features qualify as (A), or as (B)?
Jan Kiszka - April 21, 2012, 7:23 a.m.
On 2012-04-20 17:36, Eduardo Habkost wrote:
> On Fri, Apr 20, 2012 at 05:19:17PM +0200, Jan Kiszka wrote:
>> On 2012-04-20 17:00, Eduardo Habkost wrote:
>>> On Fri, Apr 20, 2012 at 12:12:38PM +0200, Jan Kiszka wrote:
>>>> On 2012-04-19 22:03, Eduardo Habkost wrote:
>>>>> Jan/Avi: ping?
>>>>>
>>>>> I would like to get this ABI detail clarified so it can be implemented
>>>>> the right way on Qemu and KVM.
>>>>>
>>>>> My proposal is to simply add tsc-deadline to the data returned by
>>>>> GET_SUPPORTED_CPUID, making KVM_CAP_TSC_DEADLINE_TIMER unnecessary.
>>>>>
>>>>
>>>> IIRC, there were problems with this model to exclude that the feature
>>>> gets reported this way without ensuring that in-kernel irqchip is
>>>> actually activated. Please browse the discussions, it should be
>>>> documented there.
>>>
>>> The flag was never added to GET_SUPPORTED_CPUID on any of the git
>>> repositories I have checked, and I don't see that being done anywhere on
>>> my KVM mailing list archives, either. So I don't see how we could have
>>> had issues with GET_SUPPORTED_CPUID, if it was never present on the
>>> code.
>>>
>>> What was present on the code before the fix, was coded that
>>> unconditinally enabled the flag even if Qemu never asked for it, and
>>> that really was wrong.
>>>
>>> GET_SUPPORTED_CPUID doesn't enable any flag: it just tells userspace
>>> that the hardware and KVM supports the feature (and, surprise, this is
>>> exactly what KVM_CAP_TSC_DEADLINE_TIMER means too, isn't it?). It's up
>>> to userspace to enable the CPUID bits according to user requirements and
>>> userspace capabilities (in other words: only when userspace knows it is
>>> safe because the in-kernel irqchip is enabled).
>>>
>>> If the above is not what GET_SUPPORTED_CPUID means, I would like to get
>>> that clarified, so I can submit an updated to KVM API documentation.
>>
>> Does old userspace filter out flags from GET_SUPPORTED_CPUID that it
>> does not understand?
> 
> It's even more strict than that: it only enables what was explicitly
> enabled on the CPU model asked by the user.
> 
> But:
> 
> The only exception is "-cpu host", that tries to enable everything, even
> flags Qemu never heard of, and that is something that may require some
> changes on the API design (or at least documentation clarifications).
> 
> Today "-cpu host" can't differentiate (A) "a feature that KVM supports
> and emulate by itself and can be enabled without any support from
> userspace" and (B) "a feature that KVM supports but need support from
> userspace to be enabled". I am sure we will be able to find multiple
> examples of (B) inside the flags returned by GET_SUPPORTED_CPUID today.

The problem remains that exposing TSC_DEADLINE via SUPPORTED_CPUID is
wrong in case userspace doesn't select the in-kernel APIC. The kernel
does _nothing_ about emulating the flag if userspace provides the APIC,
so it must not expose this feature. Even if this had no practical impact
(but it has: old qemu[-kvm] + userspace APIC + -cpu host), it would
still be semantically broken.

> 
> We could decide to never add any new flag to GET_SUPPORTED_CPUID if it
> requires additional userspace support to work, from now on, and create
> new KVM_CAP_* flags for them. But, we really want to do that for almost
> every new CPUID feature bit in the future?

Most CPU features do not depend on our in-kernel irqchips. But if there
is something to simplify in retrieving information about such
"conditional features", let's do it.

> 
> We also have the problem of defining what "requires support from
> userspace to work" means: I am not sure this has the same meaning for
> everybody. Many new features require userspace support only for
> migration, and nothing else, but some users don't need migration to
> work. Do those features qualify as (A), or as (B)?

"Works for most user" also means "breaks for some". And that is a bug,
isn't it?

Jan
Eduardo Habkost - April 23, 2012, 2:48 p.m.
On Sat, Apr 21, 2012 at 09:23:50AM +0200, Jan Kiszka wrote:
> On 2012-04-20 17:36, Eduardo Habkost wrote:
> > On Fri, Apr 20, 2012 at 05:19:17PM +0200, Jan Kiszka wrote:
> >> On 2012-04-20 17:00, Eduardo Habkost wrote:
> >>> On Fri, Apr 20, 2012 at 12:12:38PM +0200, Jan Kiszka wrote:
> >>>> On 2012-04-19 22:03, Eduardo Habkost wrote:
> >>>>> Jan/Avi: ping?
> >>>>>
> >>>>> I would like to get this ABI detail clarified so it can be implemented
> >>>>> the right way on Qemu and KVM.
> >>>>>
> >>>>> My proposal is to simply add tsc-deadline to the data returned by
> >>>>> GET_SUPPORTED_CPUID, making KVM_CAP_TSC_DEADLINE_TIMER unnecessary.
> >>>>>
> >>>>
> >>>> IIRC, there were problems with this model to exclude that the feature
> >>>> gets reported this way without ensuring that in-kernel irqchip is
> >>>> actually activated. Please browse the discussions, it should be
> >>>> documented there.
> >>>
> >>> The flag was never added to GET_SUPPORTED_CPUID on any of the git
> >>> repositories I have checked, and I don't see that being done anywhere on
> >>> my KVM mailing list archives, either. So I don't see how we could have
> >>> had issues with GET_SUPPORTED_CPUID, if it was never present on the
> >>> code.
> >>>
> >>> What was present on the code before the fix, was coded that
> >>> unconditinally enabled the flag even if Qemu never asked for it, and
> >>> that really was wrong.
> >>>
> >>> GET_SUPPORTED_CPUID doesn't enable any flag: it just tells userspace
> >>> that the hardware and KVM supports the feature (and, surprise, this is
> >>> exactly what KVM_CAP_TSC_DEADLINE_TIMER means too, isn't it?). It's up
> >>> to userspace to enable the CPUID bits according to user requirements and
> >>> userspace capabilities (in other words: only when userspace knows it is
> >>> safe because the in-kernel irqchip is enabled).
> >>>
> >>> If the above is not what GET_SUPPORTED_CPUID means, I would like to get
> >>> that clarified, so I can submit an updated to KVM API documentation.
> >>
> >> Does old userspace filter out flags from GET_SUPPORTED_CPUID that it
> >> does not understand?
> > 
> > It's even more strict than that: it only enables what was explicitly
> > enabled on the CPU model asked by the user.
> > 
> > But:
> > 
> > The only exception is "-cpu host", that tries to enable everything, even
> > flags Qemu never heard of, and that is something that may require some
> > changes on the API design (or at least documentation clarifications).
> > 
> > Today "-cpu host" can't differentiate (A) "a feature that KVM supports
> > and emulate by itself and can be enabled without any support from
> > userspace" and (B) "a feature that KVM supports but need support from
> > userspace to be enabled". I am sure we will be able to find multiple
> > examples of (B) inside the flags returned by GET_SUPPORTED_CPUID today.
> 
> The problem remains that exposing TSC_DEADLINE via SUPPORTED_CPUID is
> wrong in case userspace doesn't select the in-kernel APIC. The kernel
> does _nothing_ about emulating the flag if userspace provides the APIC,
> so it must not expose this feature. Even if this had no practical impact
> (but it has: old qemu[-kvm] + userspace APIC + -cpu host), it would
> still be semantically broken.

How is that different from any other feature that requires userspace to
"do the right thing" to make the feature work? GET_SUPPORTED_CPUID just
tells userspace that the flag is supported, but userspace has to be sure
it will really work, before enabling it.

In other words, I always assumed that features from the (B) group were
always included on GET_SUPPORTED_CPUID, too. Yes, that risks breaking
-cpu host, so we will probably need a better interface to let "-cpu
host" know what can be enabled or not, anyway.


> 
> > 
> > We could decide to never add any new flag to GET_SUPPORTED_CPUID if it
> > requires additional userspace support to work, from now on, and create
> > new KVM_CAP_* flags for them. But, we really want to do that for almost
> > every new CPUID feature bit in the future?
> 
> Most CPU features do not depend on our in-kernel irqchips. But if there
> is something to simplify in retrieving information about such
> "conditional features", let's do it.

But many CPU features on GET_SUPPORTED_CPUID depend on other userspace
capabilities, need userspace to do "the right thing" to make it work,
and won't work if userspace doesn't do what's required. Why configuring
the in-kernel irqchip is special?


> 
> > 
> > We also have the problem of defining what "requires support from
> > userspace to work" means: I am not sure this has the same meaning for
> > everybody. Many new features require userspace support only for
> > migration, and nothing else, but some users don't need migration to
> > work. Do those features qualify as (A), or as (B)?
> 
> "Works for most user" also means "breaks for some". And that is a bug,
> isn't it?

It is, surely, but -cpu host is the only case where CPU features are
enabled blindly, and migration is not expected to be available when
using -cpu host, so in that case "breaks only migration" would still
mean "works for all" (becasue except for -cpu host, Qemu won't enable
any feature if it still doesn't have the proper migration support
working).

So, it's still not clear to me: "breaks only migration" qualifies as (A)
or (B)? In other words, a "breaks only migration" feature should be
added to GET_SUPPORTED_CPUID, or not? There are many features already
inside GET_SUPPORTED_CPUID that require specific migration support to
work (e.g. many of them require additional registers to be
saved/loaded). I would bet that _most_ of them require some additional
stated to be saved/loaded.



Trying to summarize the points above:

Groups (A) and (B) are:

A) a feature that KVM supports and emulate by itself and can be enabled
   by userspace blindly, without requiring any additional userspace
   code to work.
B) a feature that KVM supports but need support from userspace to work.

We have to differentiate those two groups somehow, otherwise "-cpu host"
will always risk being unstable (in case we can't identify group (B) and
end up enabling a feature that will break) or useless (if group (A) is
considered always empty).

(If you think this two-group model is not sufficient, please let me know.)

Note that I am discussing two things above:

- Whether GET_SUPPORTED_CPUID should expose only features from group
  (A), or group (B) too.
  - One problem here is that today GET_SUPPORTED_CPUIDS have many
    examples of (B) features inside it. Should we stop doing that?
    - TSC-deadline is the first case where we are _not_ doing that. It
      is the first CPU feature in KVM that can be enabled by userspace
      (as long as userspace does the proper setup), but it's not
      included on GET_SUPPORTED_CPUIDs.
  - Even the current documentation implies that group (B) is included:

    "This ioctl returns x86 cpuid features which are supported by both
    the hardware and kvm.  Userspace can use the information returned by
    this ioctl to construct cpuid information (for KVM_SET_CPUID2) that
    is consistent with hardware, kernel, and userspace capabilities, and
                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^
    with user requirements (for example, the user may wish to constrain
    cpuid to emulate older hardware, or for feature consistency across a
    cluster)."

    In the specific case of TSC-deadline, I consider "Qemu knowing that
    TSC-deadline can be enabled only if in-kernel irqchip is enabled" as
    an "userpace capability".

- How to precisely define the groups (A) and (B)?
  - "requires additional code only if migration is required" qualifies
    as (B) or (A)?
  - "requires in-kernel irqchip to be enabled to work" qualifies as (B),
    doesn't it?
Jan Kiszka - April 23, 2012, 4:31 p.m.
On 2012-04-23 16:48, Eduardo Habkost wrote:
> Trying to summarize the points above:
> 
> Groups (A) and (B) are:
> 
> A) a feature that KVM supports and emulate by itself and can be enabled
>    by userspace blindly, without requiring any additional userspace
>    code to work.
> B) a feature that KVM supports but need support from userspace to work.
> 
> We have to differentiate those two groups somehow, otherwise "-cpu host"
> will always risk being unstable (in case we can't identify group (B) and
> end up enabling a feature that will break) or useless (if group (A) is
> considered always empty).
> 
> (If you think this two-group model is not sufficient, please let me know.)
> 
> Note that I am discussing two things above:
> 
> - Whether GET_SUPPORTED_CPUID should expose only features from group
>   (A), or group (B) too.
>   - One problem here is that today GET_SUPPORTED_CPUIDS have many
>     examples of (B) features inside it. Should we stop doing that?

We have exactly two for the category that I was concerned about: TSC
deadline and X2APIC. The latter is already exposed unconditionally, even
if the kernel does not provide emulation. So, you are right, TSC
deadline is not a new scenario.

>     - TSC-deadline is the first case where we are _not_ doing that. It
>       is the first CPU feature in KVM that can be enabled by userspace
>       (as long as userspace does the proper setup), but it's not
>       included on GET_SUPPORTED_CPUIDs.
>   - Even the current documentation implies that group (B) is included:
> 
>     "This ioctl returns x86 cpuid features which are supported by both
>     the hardware and kvm.  Userspace can use the information returned by
>     this ioctl to construct cpuid information (for KVM_SET_CPUID2) that
>     is consistent with hardware, kernel, and userspace capabilities, and
>                                          ^^^^^^^^^^^^^^^^^^^^^^^^^^
>     with user requirements (for example, the user may wish to constrain
>     cpuid to emulate older hardware, or for feature consistency across a
>     cluster)."
> 
>     In the specific case of TSC-deadline, I consider "Qemu knowing that
>     TSC-deadline can be enabled only if in-kernel irqchip is enabled" as
>     an "userpace capability".
> 
> - How to precisely define the groups (A) and (B)?
>   - "requires additional code only if migration is required" qualifies
>     as (B) or (A)?
>   - "requires in-kernel irqchip to be enabled to work" qualifies as (B),
>     doesn't it?

My problem is that features like X2APIC and TSC deadline are exposed by
the kernel without "contributing" to them (if in-kernel irqchip is off).
However, that was how I interpreted this GET_SUPPORTED_CPUID. In fact,
it is used as "kernel or hardware does not _prevent_" already. And in
that sense, it's ok to enable even features that are not in
kernel/hardware hands. We should point out this fact in the documentation.

Jan
Eduardo Habkost - April 23, 2012, 8:02 p.m.
On Mon, Apr 23, 2012 at 06:31:25PM +0200, Jan Kiszka wrote:
> On 2012-04-23 16:48, Eduardo Habkost wrote:
> > Trying to summarize the points above:
> > 
> > Groups (A) and (B) are:
> > 
> > A) a feature that KVM supports and emulate by itself and can be enabled
> >    by userspace blindly, without requiring any additional userspace
> >    code to work.
> > B) a feature that KVM supports but need support from userspace to work.
> > 
> > We have to differentiate those two groups somehow, otherwise "-cpu host"
> > will always risk being unstable (in case we can't identify group (B) and
> > end up enabling a feature that will break) or useless (if group (A) is
> > considered always empty).
> > 
> > (If you think this two-group model is not sufficient, please let me know.)
> > 
> > Note that I am discussing two things above:
> > 
> > - Whether GET_SUPPORTED_CPUID should expose only features from group
> >   (A), or group (B) too.
> >   - One problem here is that today GET_SUPPORTED_CPUIDS have many
> >     examples of (B) features inside it. Should we stop doing that?
> 
> We have exactly two for the category that I was concerned about: TSC
> deadline and X2APIC. The latter is already exposed unconditionally, even
> if the kernel does not provide emulation. So, you are right, TSC
> deadline is not a new scenario.

Oh, that explains why you were seing it differently: if the kernel
doesn't control anything about the feature exposure, there was no
obvious need to add it to GET_SUPPORTED_CPUID.

> 
> >     - TSC-deadline is the first case where we are _not_ doing that. It
> >       is the first CPU feature in KVM that can be enabled by userspace
> >       (as long as userspace does the proper setup), but it's not
> >       included on GET_SUPPORTED_CPUIDs.
> >   - Even the current documentation implies that group (B) is included:
> > 
> >     "This ioctl returns x86 cpuid features which are supported by both
> >     the hardware and kvm.  Userspace can use the information returned by
> >     this ioctl to construct cpuid information (for KVM_SET_CPUID2) that
> >     is consistent with hardware, kernel, and userspace capabilities, and
> >                                          ^^^^^^^^^^^^^^^^^^^^^^^^^^
> >     with user requirements (for example, the user may wish to constrain
> >     cpuid to emulate older hardware, or for feature consistency across a
> >     cluster)."
> > 
> >     In the specific case of TSC-deadline, I consider "Qemu knowing that
> >     TSC-deadline can be enabled only if in-kernel irqchip is enabled" as
> >     an "userpace capability".
> > 
> > - How to precisely define the groups (A) and (B)?
> >   - "requires additional code only if migration is required" qualifies
> >     as (B) or (A)?
> >   - "requires in-kernel irqchip to be enabled to work" qualifies as (B),
> >     doesn't it?
> 
> My problem is that features like X2APIC and TSC deadline are exposed by
> the kernel without "contributing" to them (if in-kernel irqchip is off).

They are not simply "exposed by the kernel": they are enabled by
userspace, after userspace deciding whether it's safe or not (based on
multiple factors).


> However, that was how I interpreted this GET_SUPPORTED_CPUID. In fact,
> it is used as "kernel or hardware does not _prevent_" already. And in
> that sense, it's ok to enable even features that are not in
> kernel/hardware hands. We should point out this fact in the documentation.

I see GET_SUPPORTED_CPUID as just a "what userspace can enable because
the kernel and the hardware support it (= don't prevent it), as long as
userspace has the required support" (meaning A+B). It's a bit like
KVM_CHECK_EXTENSION, but with the nice feature that that the
capabilities map directly to CPUID bits.

So, it's not clear to me: now you are OK with adding TSC_DEADLINE to
GET_SUPPORTED_CPUID?

But we still have the issue of "-cpu host" not knowing what can be
safely enabled (without userspace feature-specific setup code), or not.
Do you have any suggestion for that? Avi, do you have any suggestion?

And I still don't know the answer to:

> > - How to precisely define the groups (A) and (B)?
> >   - "requires additional code only if migration is required" qualifies
> >     as (B) or (A)?


Re: documentation, isn't the following paragraph (already present on
api.txt) sufficient?

"The entries returned are the host cpuid as returned by the cpuid
instruction, with unknown or unsupported features masked out.  Some
features (for example, x2apic), may not be present in the host cpu, but
are exposed by kvm if it can emulate them efficiently."
Jan Kiszka - April 24, 2012, 4:06 p.m.
On 2012-04-23 22:02, Eduardo Habkost wrote:
> On Mon, Apr 23, 2012 at 06:31:25PM +0200, Jan Kiszka wrote:
>> On 2012-04-23 16:48, Eduardo Habkost wrote:
>>> Trying to summarize the points above:
>>>
>>> Groups (A) and (B) are:
>>>
>>> A) a feature that KVM supports and emulate by itself and can be enabled
>>>    by userspace blindly, without requiring any additional userspace
>>>    code to work.
>>> B) a feature that KVM supports but need support from userspace to work.
>>>
>>> We have to differentiate those two groups somehow, otherwise "-cpu host"
>>> will always risk being unstable (in case we can't identify group (B) and
>>> end up enabling a feature that will break) or useless (if group (A) is
>>> considered always empty).
>>>
>>> (If you think this two-group model is not sufficient, please let me know.)
>>>
>>> Note that I am discussing two things above:
>>>
>>> - Whether GET_SUPPORTED_CPUID should expose only features from group
>>>   (A), or group (B) too.
>>>   - One problem here is that today GET_SUPPORTED_CPUIDS have many
>>>     examples of (B) features inside it. Should we stop doing that?
>>
>> We have exactly two for the category that I was concerned about: TSC
>> deadline and X2APIC. The latter is already exposed unconditionally, even
>> if the kernel does not provide emulation. So, you are right, TSC
>> deadline is not a new scenario.
> 
> Oh, that explains why you were seing it differently: if the kernel
> doesn't control anything about the feature exposure, there was no
> obvious need to add it to GET_SUPPORTED_CPUID.
> 
>>
>>>     - TSC-deadline is the first case where we are _not_ doing that. It
>>>       is the first CPU feature in KVM that can be enabled by userspace
>>>       (as long as userspace does the proper setup), but it's not
>>>       included on GET_SUPPORTED_CPUIDs.
>>>   - Even the current documentation implies that group (B) is included:
>>>
>>>     "This ioctl returns x86 cpuid features which are supported by both
>>>     the hardware and kvm.  Userspace can use the information returned by
>>>     this ioctl to construct cpuid information (for KVM_SET_CPUID2) that
>>>     is consistent with hardware, kernel, and userspace capabilities, and
>>>                                          ^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>     with user requirements (for example, the user may wish to constrain
>>>     cpuid to emulate older hardware, or for feature consistency across a
>>>     cluster)."
>>>
>>>     In the specific case of TSC-deadline, I consider "Qemu knowing that
>>>     TSC-deadline can be enabled only if in-kernel irqchip is enabled" as
>>>     an "userpace capability".
>>>
>>> - How to precisely define the groups (A) and (B)?
>>>   - "requires additional code only if migration is required" qualifies
>>>     as (B) or (A)?
>>>   - "requires in-kernel irqchip to be enabled to work" qualifies as (B),
>>>     doesn't it?
>>
>> My problem is that features like X2APIC and TSC deadline are exposed by
>> the kernel without "contributing" to them (if in-kernel irqchip is off).
> 
> They are not simply "exposed by the kernel": they are enabled by
> userspace, after userspace deciding whether it's safe or not (based on
> multiple factors).
> 
> 
>> However, that was how I interpreted this GET_SUPPORTED_CPUID. In fact,
>> it is used as "kernel or hardware does not _prevent_" already. And in
>> that sense, it's ok to enable even features that are not in
>> kernel/hardware hands. We should point out this fact in the documentation.
> 
> I see GET_SUPPORTED_CPUID as just a "what userspace can enable because
> the kernel and the hardware support it (= don't prevent it), as long as
> userspace has the required support" (meaning A+B). It's a bit like
> KVM_CHECK_EXTENSION, but with the nice feature that that the
> capabilities map directly to CPUID bits.
> 
> So, it's not clear to me: now you are OK with adding TSC_DEADLINE to
> GET_SUPPORTED_CPUID?
> 
> But we still have the issue of "-cpu host" not knowing what can be
> safely enabled (without userspace feature-specific setup code), or not.
> Do you have any suggestion for that? Avi, do you have any suggestion?

First of all, I bet this was already broken with the introduction of
x2apic. So TSC deadline won't make it worse. I guess we need to address
this in userspace, first by masking those features out, later by
actually emulating them.

> 
> And I still don't know the answer to:
> 
>>> - How to precisely define the groups (A) and (B)?
>>>   - "requires additional code only if migration is required" qualifies
>>>     as (B) or (A)?
> 
> 
> Re: documentation, isn't the following paragraph (already present on
> api.txt) sufficient?
> 
> "The entries returned are the host cpuid as returned by the cpuid
> instruction, with unknown or unsupported features masked out.  Some
> features (for example, x2apic), may not be present in the host cpu, but
> are exposed by kvm if it can emulate them efficiently."

That suggests such features are always emulated - which is not true.
They are either emulated, or nothing _prevents_ their emulation by user
space.

Jan
Eduardo Habkost - April 24, 2012, 5:19 p.m.
(CCing Andre Przywara, in case he can help to clarify what's the
expected meaning of "-cpu host")

On Tue, Apr 24, 2012 at 06:06:55PM +0200, Jan Kiszka wrote:
> On 2012-04-23 22:02, Eduardo Habkost wrote:
> > On Mon, Apr 23, 2012 at 06:31:25PM +0200, Jan Kiszka wrote:
> >> However, that was how I interpreted this GET_SUPPORTED_CPUID. In fact,
> >> it is used as "kernel or hardware does not _prevent_" already. And in
> >> that sense, it's ok to enable even features that are not in
> >> kernel/hardware hands. We should point out this fact in the documentation.
> > 
> > I see GET_SUPPORTED_CPUID as just a "what userspace can enable because
> > the kernel and the hardware support it (= don't prevent it), as long as
> > userspace has the required support" (meaning A+B). It's a bit like
> > KVM_CHECK_EXTENSION, but with the nice feature that that the
> > capabilities map directly to CPUID bits.
> > 
> > So, it's not clear to me: now you are OK with adding TSC_DEADLINE to
> > GET_SUPPORTED_CPUID?
> > 
> > But we still have the issue of "-cpu host" not knowing what can be
> > safely enabled (without userspace feature-specific setup code), or not.
> > Do you have any suggestion for that? Avi, do you have any suggestion?
> 
> First of all, I bet this was already broken with the introduction of
> x2apic. So TSC deadline won't make it worse. I guess we need to address
> this in userspace, first by masking those features out, later by
> actually emulating them.

I am not sure I understand what you are proposing. Let me explain the
use case I am thinking about:

- Feature FOO is of type (A) (e.g. just a new instruction set that
  doesn't require additional userspace support)
- User has a Qemu vesion that doesn't know anything about feature FOO
- User gets a new CPU that supports feature FOO
- User gets a new kernel that supports feature FOO (i.e. has FOO in
  GET_SUPPORTED_CPUID)
- User does _not_ upgrade Qemu.
- User expects to get feature FOO enabled if using "-cpu host", without
  upgrading Qemu.

The problem here is: to support the above use-case, userspace need a
probing mechanism that can differentiate _new_ (previously unknown)
features that are in group (A) (safe to blindly enable) from features
that are in group (B) (that can't be enabled without an userspace
upgrade).

In short, it becomes a problem if we consider the following case:

- Feature BAR is of type (B) (it can't be enabled without extra
  userspace support)
- User has a Qemu version that doesn't know anything about feature BAR
- User gets a new CPU that supports feature BAR
- User gets a new kernel that supports feature BAR (i.e. has BAR in
  GET_SUPPORTED_CPUID)
- User does _not_ upgrade Qemu.
- User simply shouldn't get feature BAR enabled, even if using "-cpu
  host", otherwise Qemu would break.

If userspace always limited itself to features it knows about, it would
be really easy to implement the feature without any new probing
mechanism from the kernel. But that's not how I think users expect "-cpu
host" to work. Maybe I am wrong, I don't know. I am CCing Andre, who
introduced the "-cpu host" feature, in case he can explain what's the
expected semantics on the cases above.

> 
> > 
> > And I still don't know the answer to:
> > 
> >>> - How to precisely define the groups (A) and (B)?
> >>>   - "requires additional code only if migration is required" qualifies
> >>>     as (B) or (A)?
> > 
> > 
> > Re: documentation, isn't the following paragraph (already present on
> > api.txt) sufficient?
> > 
> > "The entries returned are the host cpuid as returned by the cpuid
> > instruction, with unknown or unsupported features masked out.  Some
> > features (for example, x2apic), may not be present in the host cpu, but
> > are exposed by kvm if it can emulate them efficiently."
> 
> That suggests such features are always emulated - which is not true.
> They are either emulated, or nothing _prevents_ their emulation by user
> space.

Well... it's a bit more complicated than that: the current semantics are
a bit more than "doesn't prevent", as in theory every single feature can
be emulated by userspace, without any help from the kernel. So, if
"doesn't prevent" were the only criteria, the kernel would set every
single feature bit on GET_SUPPORTED_CPUID, making it not very useful.

At least in the case of x2apic, the kernel is using GET_SUPPORTED_CPUID
to expose a _capability_ too: when x2apic is present on
GET_SUPPORTED_CPUID, userspace knows that in addition to "not
preventing" the feature from being enabled, the kernel is now able to
emulate x2apic (if proper setup is made by userspace). A kernel that
can't emulate x2apic (even if userspace was allowed to emulate it
completely in userspace) would never have x2apic enabled on
GET_SUPPORTED_CPUID.

Like I said previously, in the end GET_SUPPORTED_CPUID is just a
capability querying mechanism like KVM_CHECK_EXTENSION (where each
extension have a specific kernel-capability meaning), but with the nice
feature of being automatically mapped to CPUID bits (with no need for
additional KVM_CAP_* => CPUID translation in userspace).
Eduardo Habkost - May 7, 2012, 6:21 p.m.
Andre? Are you able to help to answer the question below?

I would like to clarify what's the expected behavior of "-cpu host" to
be able to continue working on it. I believe the code will need to be
fixed on either case, but first we need to figure out what are the
expectations/requirements, to know _which_ changes will be needed.


On Tue, Apr 24, 2012 at 02:19:25PM -0300, Eduardo Habkost wrote:
> (CCing Andre Przywara, in case he can help to clarify what's the
> expected meaning of "-cpu host")
> 
[...]
> I am not sure I understand what you are proposing. Let me explain the
> use case I am thinking about:
> 
> - Feature FOO is of type (A) (e.g. just a new instruction set that
>   doesn't require additional userspace support)
> - User has a Qemu vesion that doesn't know anything about feature FOO
> - User gets a new CPU that supports feature FOO
> - User gets a new kernel that supports feature FOO (i.e. has FOO in
>   GET_SUPPORTED_CPUID)
> - User does _not_ upgrade Qemu.
> - User expects to get feature FOO enabled if using "-cpu host", without
>   upgrading Qemu.
> 
> The problem here is: to support the above use-case, userspace need a
> probing mechanism that can differentiate _new_ (previously unknown)
> features that are in group (A) (safe to blindly enable) from features
> that are in group (B) (that can't be enabled without an userspace
> upgrade).
> 
> In short, it becomes a problem if we consider the following case:
> 
> - Feature BAR is of type (B) (it can't be enabled without extra
>   userspace support)
> - User has a Qemu version that doesn't know anything about feature BAR
> - User gets a new CPU that supports feature BAR
> - User gets a new kernel that supports feature BAR (i.e. has BAR in
>   GET_SUPPORTED_CPUID)
> - User does _not_ upgrade Qemu.
> - User simply shouldn't get feature BAR enabled, even if using "-cpu
>   host", otherwise Qemu would break.
> 
> If userspace always limited itself to features it knows about, it would
> be really easy to implement the feature without any new probing
> mechanism from the kernel. But that's not how I think users expect "-cpu
> host" to work. Maybe I am wrong, I don't know. I am CCing Andre, who
> introduced the "-cpu host" feature, in case he can explain what's the
> expected semantics on the cases above.
>
Alexander Graf - May 8, 2012, 12:58 a.m.
On 07.05.2012, at 20:21, Eduardo Habkost wrote:

> 
> Andre? Are you able to help to answer the question below?
> 
> I would like to clarify what's the expected behavior of "-cpu host" to
> be able to continue working on it. I believe the code will need to be
> fixed on either case, but first we need to figure out what are the
> expectations/requirements, to know _which_ changes will be needed.
> 
> 
> On Tue, Apr 24, 2012 at 02:19:25PM -0300, Eduardo Habkost wrote:
>> (CCing Andre Przywara, in case he can help to clarify what's the
>> expected meaning of "-cpu host")
>> 
> [...]
>> I am not sure I understand what you are proposing. Let me explain the
>> use case I am thinking about:
>> 
>> - Feature FOO is of type (A) (e.g. just a new instruction set that
>>  doesn't require additional userspace support)
>> - User has a Qemu vesion that doesn't know anything about feature FOO
>> - User gets a new CPU that supports feature FOO
>> - User gets a new kernel that supports feature FOO (i.e. has FOO in
>>  GET_SUPPORTED_CPUID)
>> - User does _not_ upgrade Qemu.
>> - User expects to get feature FOO enabled if using "-cpu host", without
>>  upgrading Qemu.
>> 
>> The problem here is: to support the above use-case, userspace need a
>> probing mechanism that can differentiate _new_ (previously unknown)
>> features that are in group (A) (safe to blindly enable) from features
>> that are in group (B) (that can't be enabled without an userspace
>> upgrade).
>> 
>> In short, it becomes a problem if we consider the following case:
>> 
>> - Feature BAR is of type (B) (it can't be enabled without extra
>>  userspace support)
>> - User has a Qemu version that doesn't know anything about feature BAR
>> - User gets a new CPU that supports feature BAR
>> - User gets a new kernel that supports feature BAR (i.e. has BAR in
>>  GET_SUPPORTED_CPUID)
>> - User does _not_ upgrade Qemu.
>> - User simply shouldn't get feature BAR enabled, even if using "-cpu
>>  host", otherwise Qemu would break.
>> 
>> If userspace always limited itself to features it knows about, it would
>> be really easy to implement the feature without any new probing
>> mechanism from the kernel. But that's not how I think users expect "-cpu
>> host" to work. Maybe I am wrong, I don't know. I am CCing Andre, who
>> introduced the "-cpu host" feature, in case he can explain what's the
>> expected semantics on the cases above.

Can you think of any feature that'd go into category B?

All features I'm aware of work fine (without migration, but that one is moot for -cpu host anyway) as long as the host kvm implementation is fine with it (GET_SUPPORTED_CPUID). So they'd be category A.


Alex
Eduardo Habkost - May 8, 2012, 8:14 p.m.
On Tue, May 08, 2012 at 02:58:11AM +0200, Alexander Graf wrote:
> On 07.05.2012, at 20:21, Eduardo Habkost wrote:
> 
> > 
> > Andre? Are you able to help to answer the question below?
> > 
> > I would like to clarify what's the expected behavior of "-cpu host" to
> > be able to continue working on it. I believe the code will need to be
> > fixed on either case, but first we need to figure out what are the
> > expectations/requirements, to know _which_ changes will be needed.
> > 
> > 
> > On Tue, Apr 24, 2012 at 02:19:25PM -0300, Eduardo Habkost wrote:
> >> (CCing Andre Przywara, in case he can help to clarify what's the
> >> expected meaning of "-cpu host")
> >> 
> > [...]
> >> I am not sure I understand what you are proposing. Let me explain the
> >> use case I am thinking about:
> >> 
> >> - Feature FOO is of type (A) (e.g. just a new instruction set that
> >>  doesn't require additional userspace support)
> >> - User has a Qemu vesion that doesn't know anything about feature FOO
> >> - User gets a new CPU that supports feature FOO
> >> - User gets a new kernel that supports feature FOO (i.e. has FOO in
> >>  GET_SUPPORTED_CPUID)
> >> - User does _not_ upgrade Qemu.
> >> - User expects to get feature FOO enabled if using "-cpu host", without
> >>  upgrading Qemu.
> >> 
> >> The problem here is: to support the above use-case, userspace need a
> >> probing mechanism that can differentiate _new_ (previously unknown)
> >> features that are in group (A) (safe to blindly enable) from features
> >> that are in group (B) (that can't be enabled without an userspace
> >> upgrade).
> >> 
> >> In short, it becomes a problem if we consider the following case:
> >> 
> >> - Feature BAR is of type (B) (it can't be enabled without extra
> >>  userspace support)
> >> - User has a Qemu version that doesn't know anything about feature BAR
> >> - User gets a new CPU that supports feature BAR
> >> - User gets a new kernel that supports feature BAR (i.e. has BAR in
> >>  GET_SUPPORTED_CPUID)
> >> - User does _not_ upgrade Qemu.
> >> - User simply shouldn't get feature BAR enabled, even if using "-cpu
> >>  host", otherwise Qemu would break.
> >> 
> >> If userspace always limited itself to features it knows about, it would
> >> be really easy to implement the feature without any new probing
> >> mechanism from the kernel. But that's not how I think users expect "-cpu
> >> host" to work. Maybe I am wrong, I don't know. I am CCing Andre, who
> >> introduced the "-cpu host" feature, in case he can explain what's the
> >> expected semantics on the cases above.
> 
> Can you think of any feature that'd go into category B?

- TSC-deadline: can't be enabled unless userspace takes care to enable
  the in-kernel irqchip.
- x2apic: ditto.

I am not sure about XSAVE: an old qemu version would call kvm_put_fpu()
instead of kvm_put_xsave() on kvm_arch_put_registers(), but I don't know
if this would have unexpected side-effects or not.

I wouldn't be surprised if we find many other cases, as even the
GET_SUPPORTED_CPUID documentation is explicit about that: "Userspace can
use the information returned by this ioctl [GET_SUPPORTED_CPUID] to
construct cpuid information (for KVM_SET_CPUID2) that is consistent with
hardware, kernel, and userspace capabilities, [...]"
                      ^^^^^^^^^^^^^^^^^^^^^^

> 
> All features I'm aware of work fine (without migration, but that one
> is moot for -cpu host anyway) as long as the host kvm implementation
> is fine with it (GET_SUPPORTED_CPUID). So they'd be category A.

So, you would argue that GET_SUPPORTED_CPUID should include only
features of type A? That's the opposite of what we have today.

Maybe we could change the semantics to type-A-only if we define "type A"
as:

- Don't require any extra userspace support except for:
  - Migration support.
  - Enabling the in-kernel irqchip.

If we agree on that semantics, "-cpu host" could safely enable all the
fetures returned by GET_SUPPORTED_CPUID blindly, after making sure that
migration support is disabled and the in-kernel irqchip is enabled. Then
all type-B features will require defining a KVM_CAP_* capability instead
of using GET_SUPPORTED_CPUID. It's the opposite of the direction I was
proposing earlier in this thread, but it is starting to look like a
better idea (otherwise "-cpu host" would never be reliable).

If we agree on that semantics, it won't require any code change on the
current code, just a documentation update.
Alexander Graf - May 8, 2012, 10:07 p.m.
On 08.05.2012, at 22:14, Eduardo Habkost wrote:

> On Tue, May 08, 2012 at 02:58:11AM +0200, Alexander Graf wrote:
>> On 07.05.2012, at 20:21, Eduardo Habkost wrote:
>> 
>>> 
>>> Andre? Are you able to help to answer the question below?
>>> 
>>> I would like to clarify what's the expected behavior of "-cpu host" to
>>> be able to continue working on it. I believe the code will need to be
>>> fixed on either case, but first we need to figure out what are the
>>> expectations/requirements, to know _which_ changes will be needed.
>>> 
>>> 
>>> On Tue, Apr 24, 2012 at 02:19:25PM -0300, Eduardo Habkost wrote:
>>>> (CCing Andre Przywara, in case he can help to clarify what's the
>>>> expected meaning of "-cpu host")
>>>> 
>>> [...]
>>>> I am not sure I understand what you are proposing. Let me explain the
>>>> use case I am thinking about:
>>>> 
>>>> - Feature FOO is of type (A) (e.g. just a new instruction set that
>>>> doesn't require additional userspace support)
>>>> - User has a Qemu vesion that doesn't know anything about feature FOO
>>>> - User gets a new CPU that supports feature FOO
>>>> - User gets a new kernel that supports feature FOO (i.e. has FOO in
>>>> GET_SUPPORTED_CPUID)
>>>> - User does _not_ upgrade Qemu.
>>>> - User expects to get feature FOO enabled if using "-cpu host", without
>>>> upgrading Qemu.
>>>> 
>>>> The problem here is: to support the above use-case, userspace need a
>>>> probing mechanism that can differentiate _new_ (previously unknown)
>>>> features that are in group (A) (safe to blindly enable) from features
>>>> that are in group (B) (that can't be enabled without an userspace
>>>> upgrade).
>>>> 
>>>> In short, it becomes a problem if we consider the following case:
>>>> 
>>>> - Feature BAR is of type (B) (it can't be enabled without extra
>>>> userspace support)
>>>> - User has a Qemu version that doesn't know anything about feature BAR
>>>> - User gets a new CPU that supports feature BAR
>>>> - User gets a new kernel that supports feature BAR (i.e. has BAR in
>>>> GET_SUPPORTED_CPUID)
>>>> - User does _not_ upgrade Qemu.
>>>> - User simply shouldn't get feature BAR enabled, even if using "-cpu
>>>> host", otherwise Qemu would break.
>>>> 
>>>> If userspace always limited itself to features it knows about, it would
>>>> be really easy to implement the feature without any new probing
>>>> mechanism from the kernel. But that's not how I think users expect "-cpu
>>>> host" to work. Maybe I am wrong, I don't know. I am CCing Andre, who
>>>> introduced the "-cpu host" feature, in case he can explain what's the
>>>> expected semantics on the cases above.
>> 
>> Can you think of any feature that'd go into category B?
> 
> - TSC-deadline: can't be enabled unless userspace takes care to enable
>  the in-kernel irqchip.

The kernel can check if in-kernel irqchip has it enabled and otherwise mask it out, no?

> - x2apic: ditto.

Same here. For user space irqchip the kernel side doesn't care. If in-kernel APIC is enabled, check for its capabilities.

> 
> I am not sure about XSAVE: an old qemu version would call kvm_put_fpu()
> instead of kvm_put_xsave() on kvm_arch_put_registers(), but I don't know
> if this would have unexpected side-effects or not.

Then XSAVE awareness should be manually enabled by user space. That's what we have ENABLE_CAP for :). Do ENABLE_CAP(XSAVE) -> get XSAVE as a bit in GET_SUPPORTED_CPUID.

> 
> I wouldn't be surprised if we find many other cases, as even the
> GET_SUPPORTED_CPUID documentation is explicit about that: "Userspace can
> use the information returned by this ioctl [GET_SUPPORTED_CPUID] to
> construct cpuid information (for KVM_SET_CPUID2) that is consistent with
> hardware, kernel, and userspace capabilities, [...]"
>                      ^^^^^^^^^^^^^^^^^^^^^^

Yeah, so the intent is that kvm is aware of all the bits user space would care about.

> 
>> 
>> All features I'm aware of work fine (without migration, but that one
>> is moot for -cpu host anyway) as long as the host kvm implementation
>> is fine with it (GET_SUPPORTED_CPUID). So they'd be category A.
> 
> So, you would argue that GET_SUPPORTED_CPUID should include only
> features of type A? That's the opposite of what we have today.
> 
> Maybe we could change the semantics to type-A-only if we define "type A"
> as:
> 
> - Don't require any extra userspace support except for:
>  - Migration support.
>  - Enabling the in-kernel irqchip.
> 
> If we agree on that semantics, "-cpu host" could safely enable all the
> fetures returned by GET_SUPPORTED_CPUID blindly, after making sure that
> migration support is disabled and the in-kernel irqchip is enabled. Then
> all type-B features will require defining a KVM_CAP_* capability instead

not instead. In addition. Define a KVM_CAP_ and do an ENABLE_CAP on that one to have it exposed.

> of using GET_SUPPORTED_CPUID. It's the opposite of the direction I was
> proposing earlier in this thread, but it is starting to look like a
> better idea (otherwise "-cpu host" would never be reliable).
> 
> If we agree on that semantics, it won't require any code change on the
> current code, just a documentation update.

Life is simple, eh? :)


Alex
Andre Przywara - May 9, 2012, 7:16 a.m.
On 05/07/2012 08:21 PM, Eduardo Habkost wrote:
>
> Andre? Are you able to help to answer the question below?

Sorry for the delay, the easy answers first:

> I would like to clarify what's the expected behavior of "-cpu host" to
> be able to continue working on it.

The purpose of -cpu host is to let guests use ISA features that the host 
CPU provides. Those would not need any KVM/QEMU intervention, because 
they work out of the box. Things like AES or SSE4.2, which are used by 
Linux and glibc, for instance. Users would expect those to be usable, 
and actually we only need to set the bits and are done.

A second goal is to get rid of the awkward and artificial 
family/model/stepping settings and just let the guest see the actual CPU 
model. This has some strings attached, though, but other virtualization 
solution do it the same way and they can cope with it.

A third thing currently not really addressed are the more informational 
bits like cache size and organization and TLB layout. Those are 
properties of the (core) CPU (except shared L3, cache maybe) and apply 
to guests as well. I don't see any reason why this should not be exposed 
to the guest. From the top of my head I don't know any prominent user 
(just glibc reading the cache size, but not really using it), but I 
guess there are applications which benefit from it.

What clearly is not the intention of -cpu host is to provide access to 
every feature a certain CPU model provides. So things which require 
emulation or hypervisor intervention are not in the primary use case. 
That does not mean that they don't need to implemented, but that was not 
the purpose of -cpu host and they should be handled independently.

Regards,
Andre.

I believe the code will need to be
> fixed on either case, but first we need to figure out what are the
> expectations/requirements, to know _which_ changes will be needed.
>
>
> On Tue, Apr 24, 2012 at 02:19:25PM -0300, Eduardo Habkost wrote:
>> (CCing Andre Przywara, in case he can help to clarify what's the
>> expected meaning of "-cpu host")
>>
> [...]
>> I am not sure I understand what you are proposing. Let me explain the
>> use case I am thinking about:
>>
>> - Feature FOO is of type (A) (e.g. just a new instruction set that
>>    doesn't require additional userspace support)
>> - User has a Qemu vesion that doesn't know anything about feature FOO
>> - User gets a new CPU that supports feature FOO
>> - User gets a new kernel that supports feature FOO (i.e. has FOO in
>>    GET_SUPPORTED_CPUID)
>> - User does _not_ upgrade Qemu.
>> - User expects to get feature FOO enabled if using "-cpu host", without
>>    upgrading Qemu.
>>
>> The problem here is: to support the above use-case, userspace need a
>> probing mechanism that can differentiate _new_ (previously unknown)
>> features that are in group (A) (safe to blindly enable) from features
>> that are in group (B) (that can't be enabled without an userspace
>> upgrade).
>>
>> In short, it becomes a problem if we consider the following case:
>>
>> - Feature BAR is of type (B) (it can't be enabled without extra
>>    userspace support)
>> - User has a Qemu version that doesn't know anything about feature BAR
>> - User gets a new CPU that supports feature BAR
>> - User gets a new kernel that supports feature BAR (i.e. has BAR in
>>    GET_SUPPORTED_CPUID)
>> - User does _not_ upgrade Qemu.
>> - User simply shouldn't get feature BAR enabled, even if using "-cpu
>>    host", otherwise Qemu would break.
>>
>> If userspace always limited itself to features it knows about, it would
>> be really easy to implement the feature without any new probing
>> mechanism from the kernel. But that's not how I think users expect "-cpu
>> host" to work. Maybe I am wrong, I don't know. I am CCing Andre, who
>> introduced the "-cpu host" feature, in case he can explain what's the
>> expected semantics on the cases above.
>>
>
Gleb Natapov - May 9, 2012, 8:14 a.m.
On Wed, May 09, 2012 at 12:07:04AM +0200, Alexander Graf wrote:
> 
> On 08.05.2012, at 22:14, Eduardo Habkost wrote:
> 
> > On Tue, May 08, 2012 at 02:58:11AM +0200, Alexander Graf wrote:
> >> On 07.05.2012, at 20:21, Eduardo Habkost wrote:
> >> 
> >>> 
> >>> Andre? Are you able to help to answer the question below?
> >>> 
> >>> I would like to clarify what's the expected behavior of "-cpu host" to
> >>> be able to continue working on it. I believe the code will need to be
> >>> fixed on either case, but first we need to figure out what are the
> >>> expectations/requirements, to know _which_ changes will be needed.
> >>> 
> >>> 
> >>> On Tue, Apr 24, 2012 at 02:19:25PM -0300, Eduardo Habkost wrote:
> >>>> (CCing Andre Przywara, in case he can help to clarify what's the
> >>>> expected meaning of "-cpu host")
> >>>> 
> >>> [...]
> >>>> I am not sure I understand what you are proposing. Let me explain the
> >>>> use case I am thinking about:
> >>>> 
> >>>> - Feature FOO is of type (A) (e.g. just a new instruction set that
> >>>> doesn't require additional userspace support)
> >>>> - User has a Qemu vesion that doesn't know anything about feature FOO
> >>>> - User gets a new CPU that supports feature FOO
> >>>> - User gets a new kernel that supports feature FOO (i.e. has FOO in
> >>>> GET_SUPPORTED_CPUID)
> >>>> - User does _not_ upgrade Qemu.
> >>>> - User expects to get feature FOO enabled if using "-cpu host", without
> >>>> upgrading Qemu.
> >>>> 
> >>>> The problem here is: to support the above use-case, userspace need a
> >>>> probing mechanism that can differentiate _new_ (previously unknown)
> >>>> features that are in group (A) (safe to blindly enable) from features
> >>>> that are in group (B) (that can't be enabled without an userspace
> >>>> upgrade).
> >>>> 
> >>>> In short, it becomes a problem if we consider the following case:
> >>>> 
> >>>> - Feature BAR is of type (B) (it can't be enabled without extra
> >>>> userspace support)
> >>>> - User has a Qemu version that doesn't know anything about feature BAR
> >>>> - User gets a new CPU that supports feature BAR
> >>>> - User gets a new kernel that supports feature BAR (i.e. has BAR in
> >>>> GET_SUPPORTED_CPUID)
> >>>> - User does _not_ upgrade Qemu.
> >>>> - User simply shouldn't get feature BAR enabled, even if using "-cpu
> >>>> host", otherwise Qemu would break.
> >>>> 
> >>>> If userspace always limited itself to features it knows about, it would
> >>>> be really easy to implement the feature without any new probing
> >>>> mechanism from the kernel. But that's not how I think users expect "-cpu
> >>>> host" to work. Maybe I am wrong, I don't know. I am CCing Andre, who
> >>>> introduced the "-cpu host" feature, in case he can explain what's the
> >>>> expected semantics on the cases above.
> >> 
> >> Can you think of any feature that'd go into category B?
> > 
> > - TSC-deadline: can't be enabled unless userspace takes care to enable
> >  the in-kernel irqchip.
> 
> The kernel can check if in-kernel irqchip has it enabled and otherwise mask it out, no?
> 
How kernel should know that userspace does not emulate it?

> > - x2apic: ditto.
> 
> Same here. For user space irqchip the kernel side doesn't care. If in-kernel APIC is enabled, check for its capabilities.
> 
Same here.

Well, technically both of those features can't be implemented in
userspace right now since MSRs are terminated in the kernel, but I
wouldn't make it into ABI.


--
			Gleb.
Alexander Graf - May 9, 2012, 8:42 a.m.
On 09.05.2012, at 10:14, Gleb Natapov <gleb@redhat.com> wrote:

> On Wed, May 09, 2012 at 12:07:04AM +0200, Alexander Graf wrote:
>> 
>> On 08.05.2012, at 22:14, Eduardo Habkost wrote:
>> 
>>> On Tue, May 08, 2012 at 02:58:11AM +0200, Alexander Graf wrote:
>>>> On 07.05.2012, at 20:21, Eduardo Habkost wrote:
>>>> 
>>>>> 
>>>>> Andre? Are you able to help to answer the question below?
>>>>> 
>>>>> I would like to clarify what's the expected behavior of "-cpu host" to
>>>>> be able to continue working on it. I believe the code will need to be
>>>>> fixed on either case, but first we need to figure out what are the
>>>>> expectations/requirements, to know _which_ changes will be needed.
>>>>> 
>>>>> 
>>>>> On Tue, Apr 24, 2012 at 02:19:25PM -0300, Eduardo Habkost wrote:
>>>>>> (CCing Andre Przywara, in case he can help to clarify what's the
>>>>>> expected meaning of "-cpu host")
>>>>>> 
>>>>> [...]
>>>>>> I am not sure I understand what you are proposing. Let me explain the
>>>>>> use case I am thinking about:
>>>>>> 
>>>>>> - Feature FOO is of type (A) (e.g. just a new instruction set that
>>>>>> doesn't require additional userspace support)
>>>>>> - User has a Qemu vesion that doesn't know anything about feature FOO
>>>>>> - User gets a new CPU that supports feature FOO
>>>>>> - User gets a new kernel that supports feature FOO (i.e. has FOO in
>>>>>> GET_SUPPORTED_CPUID)
>>>>>> - User does _not_ upgrade Qemu.
>>>>>> - User expects to get feature FOO enabled if using "-cpu host", without
>>>>>> upgrading Qemu.
>>>>>> 
>>>>>> The problem here is: to support the above use-case, userspace need a
>>>>>> probing mechanism that can differentiate _new_ (previously unknown)
>>>>>> features that are in group (A) (safe to blindly enable) from features
>>>>>> that are in group (B) (that can't be enabled without an userspace
>>>>>> upgrade).
>>>>>> 
>>>>>> In short, it becomes a problem if we consider the following case:
>>>>>> 
>>>>>> - Feature BAR is of type (B) (it can't be enabled without extra
>>>>>> userspace support)
>>>>>> - User has a Qemu version that doesn't know anything about feature BAR
>>>>>> - User gets a new CPU that supports feature BAR
>>>>>> - User gets a new kernel that supports feature BAR (i.e. has BAR in
>>>>>> GET_SUPPORTED_CPUID)
>>>>>> - User does _not_ upgrade Qemu.
>>>>>> - User simply shouldn't get feature BAR enabled, even if using "-cpu
>>>>>> host", otherwise Qemu would break.
>>>>>> 
>>>>>> If userspace always limited itself to features it knows about, it would
>>>>>> be really easy to implement the feature without any new probing
>>>>>> mechanism from the kernel. But that's not how I think users expect "-cpu
>>>>>> host" to work. Maybe I am wrong, I don't know. I am CCing Andre, who
>>>>>> introduced the "-cpu host" feature, in case he can explain what's the
>>>>>> expected semantics on the cases above.
>>>> 
>>>> Can you think of any feature that'd go into category B?
>>> 
>>> - TSC-deadline: can't be enabled unless userspace takes care to enable
>>> the in-kernel irqchip.
>> 
>> The kernel can check if in-kernel irqchip has it enabled and otherwise mask it out, no?
>> 
> How kernel should know that userspace does not emulate it?

You have to enable the in-kernel apic to use it, at which point the kernel knows it's in use, right?

> 
>>> - x2apic: ditto.
>> 
>> Same here. For user space irqchip the kernel side doesn't care. If in-kernel APIC is enabled, check for its capabilities.
>> 
> Same here.
> 
> Well, technically both of those features can't be implemented in
> userspace right now since MSRs are terminated in the kernel, but I

Doesn't sound like the greatest design - unless you deprecate the non-in-kernel apic case.

> wouldn't make it into ABI.
> 
> 
> --
>            Gleb.
Gleb Natapov - May 9, 2012, 8:51 a.m.
On Wed, May 09, 2012 at 10:42:26AM +0200, Alexander Graf wrote:
> 
> 
> On 09.05.2012, at 10:14, Gleb Natapov <gleb@redhat.com> wrote:
> 
> > On Wed, May 09, 2012 at 12:07:04AM +0200, Alexander Graf wrote:
> >> 
> >> On 08.05.2012, at 22:14, Eduardo Habkost wrote:
> >> 
> >>> On Tue, May 08, 2012 at 02:58:11AM +0200, Alexander Graf wrote:
> >>>> On 07.05.2012, at 20:21, Eduardo Habkost wrote:
> >>>> 
> >>>>> 
> >>>>> Andre? Are you able to help to answer the question below?
> >>>>> 
> >>>>> I would like to clarify what's the expected behavior of "-cpu host" to
> >>>>> be able to continue working on it. I believe the code will need to be
> >>>>> fixed on either case, but first we need to figure out what are the
> >>>>> expectations/requirements, to know _which_ changes will be needed.
> >>>>> 
> >>>>> 
> >>>>> On Tue, Apr 24, 2012 at 02:19:25PM -0300, Eduardo Habkost wrote:
> >>>>>> (CCing Andre Przywara, in case he can help to clarify what's the
> >>>>>> expected meaning of "-cpu host")
> >>>>>> 
> >>>>> [...]
> >>>>>> I am not sure I understand what you are proposing. Let me explain the
> >>>>>> use case I am thinking about:
> >>>>>> 
> >>>>>> - Feature FOO is of type (A) (e.g. just a new instruction set that
> >>>>>> doesn't require additional userspace support)
> >>>>>> - User has a Qemu vesion that doesn't know anything about feature FOO
> >>>>>> - User gets a new CPU that supports feature FOO
> >>>>>> - User gets a new kernel that supports feature FOO (i.e. has FOO in
> >>>>>> GET_SUPPORTED_CPUID)
> >>>>>> - User does _not_ upgrade Qemu.
> >>>>>> - User expects to get feature FOO enabled if using "-cpu host", without
> >>>>>> upgrading Qemu.
> >>>>>> 
> >>>>>> The problem here is: to support the above use-case, userspace need a
> >>>>>> probing mechanism that can differentiate _new_ (previously unknown)
> >>>>>> features that are in group (A) (safe to blindly enable) from features
> >>>>>> that are in group (B) (that can't be enabled without an userspace
> >>>>>> upgrade).
> >>>>>> 
> >>>>>> In short, it becomes a problem if we consider the following case:
> >>>>>> 
> >>>>>> - Feature BAR is of type (B) (it can't be enabled without extra
> >>>>>> userspace support)
> >>>>>> - User has a Qemu version that doesn't know anything about feature BAR
> >>>>>> - User gets a new CPU that supports feature BAR
> >>>>>> - User gets a new kernel that supports feature BAR (i.e. has BAR in
> >>>>>> GET_SUPPORTED_CPUID)
> >>>>>> - User does _not_ upgrade Qemu.
> >>>>>> - User simply shouldn't get feature BAR enabled, even if using "-cpu
> >>>>>> host", otherwise Qemu would break.
> >>>>>> 
> >>>>>> If userspace always limited itself to features it knows about, it would
> >>>>>> be really easy to implement the feature without any new probing
> >>>>>> mechanism from the kernel. But that's not how I think users expect "-cpu
> >>>>>> host" to work. Maybe I am wrong, I don't know. I am CCing Andre, who
> >>>>>> introduced the "-cpu host" feature, in case he can explain what's the
> >>>>>> expected semantics on the cases above.
> >>>> 
> >>>> Can you think of any feature that'd go into category B?
> >>> 
> >>> - TSC-deadline: can't be enabled unless userspace takes care to enable
> >>> the in-kernel irqchip.
> >> 
> >> The kernel can check if in-kernel irqchip has it enabled and otherwise mask it out, no?
> >> 
> > How kernel should know that userspace does not emulate it?
> 
> You have to enable the in-kernel apic to use it, at which point the kernel knows it's in use, right?
> 
> > 
> >>> - x2apic: ditto.
> >> 
> >> Same here. For user space irqchip the kernel side doesn't care. If in-kernel APIC is enabled, check for its capabilities.
> >> 
> > Same here.
> > 
> > Well, technically both of those features can't be implemented in
> > userspace right now since MSRs are terminated in the kernel, but I
> 
> Doesn't sound like the greatest design - unless you deprecate the non-in-kernel apic case.
> 
You mean terminating MSRs in kernel does not sound like the greatest
design? I do not disagree. That is why IMO kernel can't filter out
TSC-deadline and x2apic like you suggest.

> > wouldn't make it into ABI.
> > 
> > 
> > --
> >            Gleb.

--
			Gleb.
Alexander Graf - May 9, 2012, 9:05 a.m.
On 09.05.2012, at 10:51, Gleb Natapov wrote:

> On Wed, May 09, 2012 at 10:42:26AM +0200, Alexander Graf wrote:
>> 
>> 
>> On 09.05.2012, at 10:14, Gleb Natapov <gleb@redhat.com> wrote:
>> 
>>> On Wed, May 09, 2012 at 12:07:04AM +0200, Alexander Graf wrote:
>>>> 
>>>> On 08.05.2012, at 22:14, Eduardo Habkost wrote:
>>>> 
>>>>> On Tue, May 08, 2012 at 02:58:11AM +0200, Alexander Graf wrote:
>>>>>> On 07.05.2012, at 20:21, Eduardo Habkost wrote:
>>>>>> 
>>>>>>> 
>>>>>>> Andre? Are you able to help to answer the question below?
>>>>>>> 
>>>>>>> I would like to clarify what's the expected behavior of "-cpu host" to
>>>>>>> be able to continue working on it. I believe the code will need to be
>>>>>>> fixed on either case, but first we need to figure out what are the
>>>>>>> expectations/requirements, to know _which_ changes will be needed.
>>>>>>> 
>>>>>>> 
>>>>>>> On Tue, Apr 24, 2012 at 02:19:25PM -0300, Eduardo Habkost wrote:
>>>>>>>> (CCing Andre Przywara, in case he can help to clarify what's the
>>>>>>>> expected meaning of "-cpu host")
>>>>>>>> 
>>>>>>> [...]
>>>>>>>> I am not sure I understand what you are proposing. Let me explain the
>>>>>>>> use case I am thinking about:
>>>>>>>> 
>>>>>>>> - Feature FOO is of type (A) (e.g. just a new instruction set that
>>>>>>>> doesn't require additional userspace support)
>>>>>>>> - User has a Qemu vesion that doesn't know anything about feature FOO
>>>>>>>> - User gets a new CPU that supports feature FOO
>>>>>>>> - User gets a new kernel that supports feature FOO (i.e. has FOO in
>>>>>>>> GET_SUPPORTED_CPUID)
>>>>>>>> - User does _not_ upgrade Qemu.
>>>>>>>> - User expects to get feature FOO enabled if using "-cpu host", without
>>>>>>>> upgrading Qemu.
>>>>>>>> 
>>>>>>>> The problem here is: to support the above use-case, userspace need a
>>>>>>>> probing mechanism that can differentiate _new_ (previously unknown)
>>>>>>>> features that are in group (A) (safe to blindly enable) from features
>>>>>>>> that are in group (B) (that can't be enabled without an userspace
>>>>>>>> upgrade).
>>>>>>>> 
>>>>>>>> In short, it becomes a problem if we consider the following case:
>>>>>>>> 
>>>>>>>> - Feature BAR is of type (B) (it can't be enabled without extra
>>>>>>>> userspace support)
>>>>>>>> - User has a Qemu version that doesn't know anything about feature BAR
>>>>>>>> - User gets a new CPU that supports feature BAR
>>>>>>>> - User gets a new kernel that supports feature BAR (i.e. has BAR in
>>>>>>>> GET_SUPPORTED_CPUID)
>>>>>>>> - User does _not_ upgrade Qemu.
>>>>>>>> - User simply shouldn't get feature BAR enabled, even if using "-cpu
>>>>>>>> host", otherwise Qemu would break.
>>>>>>>> 
>>>>>>>> If userspace always limited itself to features it knows about, it would
>>>>>>>> be really easy to implement the feature without any new probing
>>>>>>>> mechanism from the kernel. But that's not how I think users expect "-cpu
>>>>>>>> host" to work. Maybe I am wrong, I don't know. I am CCing Andre, who
>>>>>>>> introduced the "-cpu host" feature, in case he can explain what's the
>>>>>>>> expected semantics on the cases above.
>>>>>> 
>>>>>> Can you think of any feature that'd go into category B?
>>>>> 
>>>>> - TSC-deadline: can't be enabled unless userspace takes care to enable
>>>>> the in-kernel irqchip.
>>>> 
>>>> The kernel can check if in-kernel irqchip has it enabled and otherwise mask it out, no?
>>>> 
>>> How kernel should know that userspace does not emulate it?
>> 
>> You have to enable the in-kernel apic to use it, at which point the kernel knows it's in use, right?
>> 
>>> 
>>>>> - x2apic: ditto.
>>>> 
>>>> Same here. For user space irqchip the kernel side doesn't care. If in-kernel APIC is enabled, check for its capabilities.
>>>> 
>>> Same here.
>>> 
>>> Well, technically both of those features can't be implemented in
>>> userspace right now since MSRs are terminated in the kernel, but I
>> 
>> Doesn't sound like the greatest design - unless you deprecate the non-in-kernel apic case.
>> 
> You mean terminating MSRs in kernel does not sound like the greatest
> design? I do not disagree. That is why IMO kernel can't filter out
> TSC-deadline and x2apic like you suggest.

I still don't see why it can't.

Imagine we would filter TSC-deadline and x2apic by default in the kernel - they are not known to exist yet.
Now, we implement TSC-deadline in the kernel. We still filter TSC-deadline out in GET_SUPORTED_CPUID in the kernel. But we provide an interface to user space that says "call me to enable TSC-deadline CPUID, but only if you're using the in-kernel apic"
New user space calls that ioctl when it's using the in-kernel apic, it doesn't when it's using the user space apic.
Old user space doesn't call that ioctl.

So at the end all bits in GET_SUPPORTED_CPUID are consistent with what user space is capable of.


Alex
Gleb Natapov - May 9, 2012, 9:38 a.m.
On Wed, May 09, 2012 at 11:05:58AM +0200, Alexander Graf wrote:
> 
> On 09.05.2012, at 10:51, Gleb Natapov wrote:
> 
> > On Wed, May 09, 2012 at 10:42:26AM +0200, Alexander Graf wrote:
> >> 
> >> 
> >> On 09.05.2012, at 10:14, Gleb Natapov <gleb@redhat.com> wrote:
> >> 
> >>> On Wed, May 09, 2012 at 12:07:04AM +0200, Alexander Graf wrote:
> >>>> 
> >>>> On 08.05.2012, at 22:14, Eduardo Habkost wrote:
> >>>> 
> >>>>> On Tue, May 08, 2012 at 02:58:11AM +0200, Alexander Graf wrote:
> >>>>>> On 07.05.2012, at 20:21, Eduardo Habkost wrote:
> >>>>>> 
> >>>>>>> 
> >>>>>>> Andre? Are you able to help to answer the question below?
> >>>>>>> 
> >>>>>>> I would like to clarify what's the expected behavior of "-cpu host" to
> >>>>>>> be able to continue working on it. I believe the code will need to be
> >>>>>>> fixed on either case, but first we need to figure out what are the
> >>>>>>> expectations/requirements, to know _which_ changes will be needed.
> >>>>>>> 
> >>>>>>> 
> >>>>>>> On Tue, Apr 24, 2012 at 02:19:25PM -0300, Eduardo Habkost wrote:
> >>>>>>>> (CCing Andre Przywara, in case he can help to clarify what's the
> >>>>>>>> expected meaning of "-cpu host")
> >>>>>>>> 
> >>>>>>> [...]
> >>>>>>>> I am not sure I understand what you are proposing. Let me explain the
> >>>>>>>> use case I am thinking about:
> >>>>>>>> 
> >>>>>>>> - Feature FOO is of type (A) (e.g. just a new instruction set that
> >>>>>>>> doesn't require additional userspace support)
> >>>>>>>> - User has a Qemu vesion that doesn't know anything about feature FOO
> >>>>>>>> - User gets a new CPU that supports feature FOO
> >>>>>>>> - User gets a new kernel that supports feature FOO (i.e. has FOO in
> >>>>>>>> GET_SUPPORTED_CPUID)
> >>>>>>>> - User does _not_ upgrade Qemu.
> >>>>>>>> - User expects to get feature FOO enabled if using "-cpu host", without
> >>>>>>>> upgrading Qemu.
> >>>>>>>> 
> >>>>>>>> The problem here is: to support the above use-case, userspace need a
> >>>>>>>> probing mechanism that can differentiate _new_ (previously unknown)
> >>>>>>>> features that are in group (A) (safe to blindly enable) from features
> >>>>>>>> that are in group (B) (that can't be enabled without an userspace
> >>>>>>>> upgrade).
> >>>>>>>> 
> >>>>>>>> In short, it becomes a problem if we consider the following case:
> >>>>>>>> 
> >>>>>>>> - Feature BAR is of type (B) (it can't be enabled without extra
> >>>>>>>> userspace support)
> >>>>>>>> - User has a Qemu version that doesn't know anything about feature BAR
> >>>>>>>> - User gets a new CPU that supports feature BAR
> >>>>>>>> - User gets a new kernel that supports feature BAR (i.e. has BAR in
> >>>>>>>> GET_SUPPORTED_CPUID)
> >>>>>>>> - User does _not_ upgrade Qemu.
> >>>>>>>> - User simply shouldn't get feature BAR enabled, even if using "-cpu
> >>>>>>>> host", otherwise Qemu would break.
> >>>>>>>> 
> >>>>>>>> If userspace always limited itself to features it knows about, it would
> >>>>>>>> be really easy to implement the feature without any new probing
> >>>>>>>> mechanism from the kernel. But that's not how I think users expect "-cpu
> >>>>>>>> host" to work. Maybe I am wrong, I don't know. I am CCing Andre, who
> >>>>>>>> introduced the "-cpu host" feature, in case he can explain what's the
> >>>>>>>> expected semantics on the cases above.
> >>>>>> 
> >>>>>> Can you think of any feature that'd go into category B?
> >>>>> 
> >>>>> - TSC-deadline: can't be enabled unless userspace takes care to enable
> >>>>> the in-kernel irqchip.
> >>>> 
> >>>> The kernel can check if in-kernel irqchip has it enabled and otherwise mask it out, no?
> >>>> 
> >>> How kernel should know that userspace does not emulate it?
> >> 
> >> You have to enable the in-kernel apic to use it, at which point the kernel knows it's in use, right?
> >> 
> >>> 
> >>>>> - x2apic: ditto.
> >>>> 
> >>>> Same here. For user space irqchip the kernel side doesn't care. If in-kernel APIC is enabled, check for its capabilities.
> >>>> 
> >>> Same here.
> >>> 
> >>> Well, technically both of those features can't be implemented in
> >>> userspace right now since MSRs are terminated in the kernel, but I
> >> 
> >> Doesn't sound like the greatest design - unless you deprecate the non-in-kernel apic case.
> >> 
> > You mean terminating MSRs in kernel does not sound like the greatest
> > design? I do not disagree. That is why IMO kernel can't filter out
> > TSC-deadline and x2apic like you suggest.
> 
> I still don't see why it can't.
> 
> Imagine we would filter TSC-deadline and x2apic by default in the kernel - they are not known to exist yet.
> Now, we implement TSC-deadline in the kernel. We still filter TSC-deadline out in GET_SUPORTED_CPUID in the kernel. But we provide an interface to user space that says "call me to enable TSC-deadline CPUID, but only if you're using the in-kernel apic"
> New user space calls that ioctl when it's using the in-kernel apic, it doesn't when it's using the user space apic.
> Old user space doesn't call that ioctl.
First of all we already have TSC-deadline in GET_SUPORTED_CPUID without
any additional ioctls. And second I do not see why we need additional
iostls here.

Hmm, so may be I misunderstood you. You propose to mask TSC-deadline and
x2apic out from GET_SUPORTED_CPUID if irq chip is not in kernel, not
from KVM_SET_CPUID? For those two features it may make sense indeed. Not
sure there won't be others that are not dependent on irq chip presence.
You propose to add additional ioctls to enable them if they appear?

> 
> So at the end all bits in GET_SUPPORTED_CPUID are consistent with what user space is capable of.
> 
GET_SUPPORTED_CPUID should not be necessary consistent with what user
space is capable of. Userspace may emulate features that are not in
GET_SUPPORTED_CPUID.

--
			Gleb.
Alexander Graf - May 9, 2012, 9:54 a.m.
On 05/09/2012 11:38 AM, Gleb Natapov wrote:
> On Wed, May 09, 2012 at 11:05:58AM +0200, Alexander Graf wrote:
>> On 09.05.2012, at 10:51, Gleb Natapov wrote:
>>
>>> On Wed, May 09, 2012 at 10:42:26AM +0200, Alexander Graf wrote:
>>>>
>>>> On 09.05.2012, at 10:14, Gleb Natapov<gleb@redhat.com>  wrote:
>>>>
>>>>> On Wed, May 09, 2012 at 12:07:04AM +0200, Alexander Graf wrote:
>>>>>> On 08.05.2012, at 22:14, Eduardo Habkost wrote:
>>>>>>
>>>>>>> On Tue, May 08, 2012 at 02:58:11AM +0200, Alexander Graf wrote:
>>>>>>>> On 07.05.2012, at 20:21, Eduardo Habkost wrote:
>>>>>>>>
>>>>>>>>> Andre? Are you able to help to answer the question below?
>>>>>>>>>
>>>>>>>>> I would like to clarify what's the expected behavior of "-cpu host" to
>>>>>>>>> be able to continue working on it. I believe the code will need to be
>>>>>>>>> fixed on either case, but first we need to figure out what are the
>>>>>>>>> expectations/requirements, to know _which_ changes will be needed.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Apr 24, 2012 at 02:19:25PM -0300, Eduardo Habkost wrote:
>>>>>>>>>> (CCing Andre Przywara, in case he can help to clarify what's the
>>>>>>>>>> expected meaning of "-cpu host")
>>>>>>>>>>
>>>>>>>>> [...]
>>>>>>>>>> I am not sure I understand what you are proposing. Let me explain the
>>>>>>>>>> use case I am thinking about:
>>>>>>>>>>
>>>>>>>>>> - Feature FOO is of type (A) (e.g. just a new instruction set that
>>>>>>>>>> doesn't require additional userspace support)
>>>>>>>>>> - User has a Qemu vesion that doesn't know anything about feature FOO
>>>>>>>>>> - User gets a new CPU that supports feature FOO
>>>>>>>>>> - User gets a new kernel that supports feature FOO (i.e. has FOO in
>>>>>>>>>> GET_SUPPORTED_CPUID)
>>>>>>>>>> - User does _not_ upgrade Qemu.
>>>>>>>>>> - User expects to get feature FOO enabled if using "-cpu host", without
>>>>>>>>>> upgrading Qemu.
>>>>>>>>>>
>>>>>>>>>> The problem here is: to support the above use-case, userspace need a
>>>>>>>>>> probing mechanism that can differentiate _new_ (previously unknown)
>>>>>>>>>> features that are in group (A) (safe to blindly enable) from features
>>>>>>>>>> that are in group (B) (that can't be enabled without an userspace
>>>>>>>>>> upgrade).
>>>>>>>>>>
>>>>>>>>>> In short, it becomes a problem if we consider the following case:
>>>>>>>>>>
>>>>>>>>>> - Feature BAR is of type (B) (it can't be enabled without extra
>>>>>>>>>> userspace support)
>>>>>>>>>> - User has a Qemu version that doesn't know anything about feature BAR
>>>>>>>>>> - User gets a new CPU that supports feature BAR
>>>>>>>>>> - User gets a new kernel that supports feature BAR (i.e. has BAR in
>>>>>>>>>> GET_SUPPORTED_CPUID)
>>>>>>>>>> - User does _not_ upgrade Qemu.
>>>>>>>>>> - User simply shouldn't get feature BAR enabled, even if using "-cpu
>>>>>>>>>> host", otherwise Qemu would break.
>>>>>>>>>>
>>>>>>>>>> If userspace always limited itself to features it knows about, it would
>>>>>>>>>> be really easy to implement the feature without any new probing
>>>>>>>>>> mechanism from the kernel. But that's not how I think users expect "-cpu
>>>>>>>>>> host" to work. Maybe I am wrong, I don't know. I am CCing Andre, who
>>>>>>>>>> introduced the "-cpu host" feature, in case he can explain what's the
>>>>>>>>>> expected semantics on the cases above.
>>>>>>>> Can you think of any feature that'd go into category B?
>>>>>>> - TSC-deadline: can't be enabled unless userspace takes care to enable
>>>>>>> the in-kernel irqchip.
>>>>>> The kernel can check if in-kernel irqchip has it enabled and otherwise mask it out, no?
>>>>>>
>>>>> How kernel should know that userspace does not emulate it?
>>>> You have to enable the in-kernel apic to use it, at which point the kernel knows it's in use, right?
>>>>
>>>>>>> - x2apic: ditto.
>>>>>> Same here. For user space irqchip the kernel side doesn't care. If in-kernel APIC is enabled, check for its capabilities.
>>>>>>
>>>>> Same here.
>>>>>
>>>>> Well, technically both of those features can't be implemented in
>>>>> userspace right now since MSRs are terminated in the kernel, but I
>>>> Doesn't sound like the greatest design - unless you deprecate the non-in-kernel apic case.
>>>>
>>> You mean terminating MSRs in kernel does not sound like the greatest
>>> design? I do not disagree. That is why IMO kernel can't filter out
>>> TSC-deadline and x2apic like you suggest.
>> I still don't see why it can't.
>>
>> Imagine we would filter TSC-deadline and x2apic by default in the kernel - they are not known to exist yet.
>> Now, we implement TSC-deadline in the kernel. We still filter TSC-deadline out in GET_SUPORTED_CPUID in the kernel. But we provide an interface to user space that says "call me to enable TSC-deadline CPUID, but only if you're using the in-kernel apic"
>> New user space calls that ioctl when it's using the in-kernel apic, it doesn't when it's using the user space apic.
>> Old user space doesn't call that ioctl.
> First of all we already have TSC-deadline in GET_SUPORTED_CPUID without
> any additional ioctls. And second I do not see why we need additional
> iostls here.

Yeah, some times our ABI is already broken :(. What a shame...

> Hmm, so may be I misunderstood you. You propose to mask TSC-deadline and
> x2apic out from GET_SUPORTED_CPUID if irq chip is not in kernel, not
> from KVM_SET_CPUID? For those two features it may make sense indeed. Not
> sure there won't be others that are not dependent on irq chip presence.
> You propose to add additional ioctls to enable them if they appear?

That's the only backwards compatible way to design this without putting 
a plethora of knowledge into user space I can see, yes.

>
>> So at the end all bits in GET_SUPPORTED_CPUID are consistent with what user space is capable of.
>>
> GET_SUPPORTED_CPUID should not be necessary consistent with what user
> space is capable of. Userspace may emulate features that are not in
> GET_SUPPORTED_CPUID.

If it does, it can OR them in. GET_SUPPORTED_CPUID should return "these 
are the bits that your currently configured kvm is capable of".


Alex
Eduardo Habkost - May 9, 2012, 7:38 p.m.
On Wed, May 09, 2012 at 12:38:37PM +0300, Gleb Natapov wrote:
> On Wed, May 09, 2012 at 11:05:58AM +0200, Alexander Graf wrote:
> > 
> > On 09.05.2012, at 10:51, Gleb Natapov wrote:
> > 
> > > On Wed, May 09, 2012 at 10:42:26AM +0200, Alexander Graf wrote:
> > >> 
> > >> 
> > >> On 09.05.2012, at 10:14, Gleb Natapov <gleb@redhat.com> wrote:
> > >> 
> > >>> On Wed, May 09, 2012 at 12:07:04AM +0200, Alexander Graf wrote:
> > >>>> 
> > >>>> On 08.05.2012, at 22:14, Eduardo Habkost wrote:
> > >>>> 
> > >>>>> On Tue, May 08, 2012 at 02:58:11AM +0200, Alexander Graf wrote:
> > >>>>>> On 07.05.2012, at 20:21, Eduardo Habkost wrote:
> > >>>>>> 
> > >>>>>>> 
> > >>>>>>> Andre? Are you able to help to answer the question below?
> > >>>>>>> 
> > >>>>>>> I would like to clarify what's the expected behavior of "-cpu host" to
> > >>>>>>> be able to continue working on it. I believe the code will need to be
> > >>>>>>> fixed on either case, but first we need to figure out what are the
> > >>>>>>> expectations/requirements, to know _which_ changes will be needed.
> > >>>>>>> 
> > >>>>>>> 
> > >>>>>>> On Tue, Apr 24, 2012 at 02:19:25PM -0300, Eduardo Habkost wrote:
> > >>>>>>>> (CCing Andre Przywara, in case he can help to clarify what's the
> > >>>>>>>> expected meaning of "-cpu host")
> > >>>>>>>> 
> > >>>>>>> [...]
> > >>>>>>>> I am not sure I understand what you are proposing. Let me explain the
> > >>>>>>>> use case I am thinking about:
> > >>>>>>>> 
> > >>>>>>>> - Feature FOO is of type (A) (e.g. just a new instruction set that
> > >>>>>>>> doesn't require additional userspace support)
> > >>>>>>>> - User has a Qemu vesion that doesn't know anything about feature FOO
> > >>>>>>>> - User gets a new CPU that supports feature FOO
> > >>>>>>>> - User gets a new kernel that supports feature FOO (i.e. has FOO in
> > >>>>>>>> GET_SUPPORTED_CPUID)
> > >>>>>>>> - User does _not_ upgrade Qemu.
> > >>>>>>>> - User expects to get feature FOO enabled if using "-cpu host", without
> > >>>>>>>> upgrading Qemu.
> > >>>>>>>> 
> > >>>>>>>> The problem here is: to support the above use-case, userspace need a
> > >>>>>>>> probing mechanism that can differentiate _new_ (previously unknown)
> > >>>>>>>> features that are in group (A) (safe to blindly enable) from features
> > >>>>>>>> that are in group (B) (that can't be enabled without an userspace
> > >>>>>>>> upgrade).
> > >>>>>>>> 
> > >>>>>>>> In short, it becomes a problem if we consider the following case:
> > >>>>>>>> 
> > >>>>>>>> - Feature BAR is of type (B) (it can't be enabled without extra
> > >>>>>>>> userspace support)
> > >>>>>>>> - User has a Qemu version that doesn't know anything about feature BAR
> > >>>>>>>> - User gets a new CPU that supports feature BAR
> > >>>>>>>> - User gets a new kernel that supports feature BAR (i.e. has BAR in
> > >>>>>>>> GET_SUPPORTED_CPUID)
> > >>>>>>>> - User does _not_ upgrade Qemu.
> > >>>>>>>> - User simply shouldn't get feature BAR enabled, even if using "-cpu
> > >>>>>>>> host", otherwise Qemu would break.
> > >>>>>>>> 
> > >>>>>>>> If userspace always limited itself to features it knows about, it would
> > >>>>>>>> be really easy to implement the feature without any new probing
> > >>>>>>>> mechanism from the kernel. But that's not how I think users expect "-cpu
> > >>>>>>>> host" to work. Maybe I am wrong, I don't know. I am CCing Andre, who
> > >>>>>>>> introduced the "-cpu host" feature, in case he can explain what's the
> > >>>>>>>> expected semantics on the cases above.
> > >>>>>> 
> > >>>>>> Can you think of any feature that'd go into category B?
> > >>>>> 
> > >>>>> - TSC-deadline: can't be enabled unless userspace takes care to enable
> > >>>>> the in-kernel irqchip.
> > >>>> 
> > >>>> The kernel can check if in-kernel irqchip has it enabled and otherwise mask it out, no?
> > >>>> 
> > >>> How kernel should know that userspace does not emulate it?
> > >> 
> > >> You have to enable the in-kernel apic to use it, at which point the kernel knows it's in use, right?
> > >> 
> > >>> 
> > >>>>> - x2apic: ditto.
> > >>>> 
> > >>>> Same here. For user space irqchip the kernel side doesn't care. If in-kernel APIC is enabled, check for its capabilities.
> > >>>> 
> > >>> Same here.
> > >>> 
> > >>> Well, technically both of those features can't be implemented in
> > >>> userspace right now since MSRs are terminated in the kernel, but I
> > >> 
> > >> Doesn't sound like the greatest design - unless you deprecate the non-in-kernel apic case.
> > >> 
> > > You mean terminating MSRs in kernel does not sound like the greatest
> > > design? I do not disagree. That is why IMO kernel can't filter out
> > > TSC-deadline and x2apic like you suggest.
> > 
> > I still don't see why it can't.
> > 
> > Imagine we would filter TSC-deadline and x2apic by default in the kernel - they are not known to exist yet.
> > Now, we implement TSC-deadline in the kernel. We still filter
> > TSC-deadline out in GET_SUPORTED_CPUID in the kernel. But we provide
> > an interface to user space that says "call me to enable TSC-deadline
> > CPUID, but only if you're using the in-kernel apic"

We have that interface already, it is called KVM_SET_CPUID.  :-)

> > New user space calls that ioctl when it's using the in-kernel apic, it doesn't when it's using the user space apic.
> > Old user space doesn't call that ioctl.
> First of all we already have TSC-deadline in GET_SUPORTED_CPUID without
> any additional ioctls. And second I do not see why we need additional
> iostls here.

We don't have TSC-deadline set today (and that's what started this
thread), but we have x2apic. So what you say is true for x2apic, anyway.

> Hmm, so may be I misunderstood you. You propose to mask TSC-deadline and
> x2apic out from GET_SUPORTED_CPUID if irq chip is not in kernel, not
> from KVM_SET_CPUID? For those two features it may make sense indeed.

It makes sense to me.

It looks like my assumptions were wrong. They were:

- GET_SUPPORTED_CPUID simply can't know if the in-kernel irqchip is
  going to be enabled or not.
- GET_SUPPORTED_CPUID output has to be a function of the kernel code
  capabilitie and host CPU, and not depend on any input from userspace.

Are those assumptions incorrect? If we break them, we may try what
Alexander is proposing. It would be much more flexible than the options
I was considering.

I didn't know ENABLE_CAP existed. Even if GET_SUPPORTED_CPUID can't
check for the in-kernel irqchip setup for some reason, ENABLE_CAP could
be used by userpace to tell the kernel "I will enable the in-kernel
irqchip, so feel free to return features that depend on it on
GET_SUPPORTED_CPUID".

In other words, we would return only the type-A features on
GET_SUPPORTED_CPUID (i.e. safe to be blindly enabled by -cpu host as
long as migration is not required), but if we use ENABLE_CAP we can make
group A safely grow, as long as userspace first tells the kernel what it
supports.

Anybody is against doing that? Otherwise I plan to work on this.
Probably I will start by making GET_SUPPORTED_CPUID not return x2apic
unless userspace tells the kernel (using ENABLE_CAP) that it will enable
the in-kernel irqchip. Then we can do the same with TSC-deadline.

Note that all this work is to allow the kernel to let userspace blindly
enable features it _doesn't know yet_. If we limit ourselves to features
userspace already knows about, we could simply remove x2apic and
TSC-deadline from GET_SUPPORTED_CPUID completely, not use ENABLE_CAP for
that, and let userspace set x2apic or TSC-deadline on KVM_SET_CPUID only
if it knows it is safe (either because it checked for the corresponding
KVM_CAP_* capability is present and it will enable the in-kernel
irqchip, or because it will emulated it in userspace).

In case anybody is against the proposal above: note that the current
documented GET_SUPPORTED_CPUID semantics (unconditionally returning bits
that depend on specific userspace behavior/capabilities) is simply
unusable by "-cpu host". If the above proposal gets rejected, my Plan B
is to update the GET_SUPPORTED_CPUID documentation to note that it
returns only type-A features (features that userspace can safely enable
even if it doesn't know what it does), remove x2apic from
GET_SUPPORTED_CPUID forever, and use KVM_CAP_* for discovery of all
type-B features (features that depend on specific userspace
capabilities/behavior).


> Not
> sure there won't be others that are not dependent on irq chip presence.
> You propose to add additional ioctls to enable them if they appear?

I am sure there will be new features in the future that don't depend on
any userspace support, so they would be enabled on GET_SUPPORTED_CPUID
unconditionally.

But if we have new features that depend on specific userspace
capabilities/behavior (i.e. enabling the irqchip, or something else), we
could also add them as long as we check if that capability/behavior was
enabled using ENABLE_CAP.


> > 
> > So at the end all bits in GET_SUPPORTED_CPUID are consistent with what user space is capable of.
> > 
> GET_SUPPORTED_CPUID should not be necessary consistent with what user
> space is capable of. Userspace may emulate features that are not in
> GET_SUPPORTED_CPUID.

True. We don't need to make the interface too complex just to make
GET_SUPPORTED_CPUID match exactly what userspace is going to enable. If
userspace wants to enable a feature because it can emulate it by its
own, it can just enable it using SET_CPUID.
Alexander Graf - May 9, 2012, 8:30 p.m.
On 09.05.2012, at 21:38, Eduardo Habkost wrote:

> On Wed, May 09, 2012 at 12:38:37PM +0300, Gleb Natapov wrote:
>> On Wed, May 09, 2012 at 11:05:58AM +0200, Alexander Graf wrote:
>>> 
>>> On 09.05.2012, at 10:51, Gleb Natapov wrote:
>>> 
>>>> On Wed, May 09, 2012 at 10:42:26AM +0200, Alexander Graf wrote:
>>>>> 
>>>>> 
>>>>> On 09.05.2012, at 10:14, Gleb Natapov <gleb@redhat.com> wrote:
>>>>> 
>>>>>> On Wed, May 09, 2012 at 12:07:04AM +0200, Alexander Graf wrote:
>>>>>>> 
>>>>>>> On 08.05.2012, at 22:14, Eduardo Habkost wrote:
>>>>>>> 
>>>>>>>> On Tue, May 08, 2012 at 02:58:11AM +0200, Alexander Graf wrote:
>>>>>>>>> On 07.05.2012, at 20:21, Eduardo Habkost wrote:
>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Andre? Are you able to help to answer the question below?
>>>>>>>>>> 
>>>>>>>>>> I would like to clarify what's the expected behavior of "-cpu host" to
>>>>>>>>>> be able to continue working on it. I believe the code will need to be
>>>>>>>>>> fixed on either case, but first we need to figure out what are the
>>>>>>>>>> expectations/requirements, to know _which_ changes will be needed.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On Tue, Apr 24, 2012 at 02:19:25PM -0300, Eduardo Habkost wrote:
>>>>>>>>>>> (CCing Andre Przywara, in case he can help to clarify what's the
>>>>>>>>>>> expected meaning of "-cpu host")
>>>>>>>>>>> 
>>>>>>>>>> [...]
>>>>>>>>>>> I am not sure I understand what you are proposing. Let me explain the
>>>>>>>>>>> use case I am thinking about:
>>>>>>>>>>> 
>>>>>>>>>>> - Feature FOO is of type (A) (e.g. just a new instruction set that
>>>>>>>>>>> doesn't require additional userspace support)
>>>>>>>>>>> - User has a Qemu vesion that doesn't know anything about feature FOO
>>>>>>>>>>> - User gets a new CPU that supports feature FOO
>>>>>>>>>>> - User gets a new kernel that supports feature FOO (i.e. has FOO in
>>>>>>>>>>> GET_SUPPORTED_CPUID)
>>>>>>>>>>> - User does _not_ upgrade Qemu.
>>>>>>>>>>> - User expects to get feature FOO enabled if using "-cpu host", without
>>>>>>>>>>> upgrading Qemu.
>>>>>>>>>>> 
>>>>>>>>>>> The problem here is: to support the above use-case, userspace need a
>>>>>>>>>>> probing mechanism that can differentiate _new_ (previously unknown)
>>>>>>>>>>> features that are in group (A) (safe to blindly enable) from features
>>>>>>>>>>> that are in group (B) (that can't be enabled without an userspace
>>>>>>>>>>> upgrade).
>>>>>>>>>>> 
>>>>>>>>>>> In short, it becomes a problem if we consider the following case:
>>>>>>>>>>> 
>>>>>>>>>>> - Feature BAR is of type (B) (it can't be enabled without extra
>>>>>>>>>>> userspace support)
>>>>>>>>>>> - User has a Qemu version that doesn't know anything about feature BAR
>>>>>>>>>>> - User gets a new CPU that supports feature BAR
>>>>>>>>>>> - User gets a new kernel that supports feature BAR (i.e. has BAR in
>>>>>>>>>>> GET_SUPPORTED_CPUID)
>>>>>>>>>>> - User does _not_ upgrade Qemu.
>>>>>>>>>>> - User simply shouldn't get feature BAR enabled, even if using "-cpu
>>>>>>>>>>> host", otherwise Qemu would break.
>>>>>>>>>>> 
>>>>>>>>>>> If userspace always limited itself to features it knows about, it would
>>>>>>>>>>> be really easy to implement the feature without any new probing
>>>>>>>>>>> mechanism from the kernel. But that's not how I think users expect "-cpu
>>>>>>>>>>> host" to work. Maybe I am wrong, I don't know. I am CCing Andre, who
>>>>>>>>>>> introduced the "-cpu host" feature, in case he can explain what's the
>>>>>>>>>>> expected semantics on the cases above.
>>>>>>>>> 
>>>>>>>>> Can you think of any feature that'd go into category B?
>>>>>>>> 
>>>>>>>> - TSC-deadline: can't be enabled unless userspace takes care to enable
>>>>>>>> the in-kernel irqchip.
>>>>>>> 
>>>>>>> The kernel can check if in-kernel irqchip has it enabled and otherwise mask it out, no?
>>>>>>> 
>>>>>> How kernel should know that userspace does not emulate it?
>>>>> 
>>>>> You have to enable the in-kernel apic to use it, at which point the kernel knows it's in use, right?
>>>>> 
>>>>>> 
>>>>>>>> - x2apic: ditto.
>>>>>>> 
>>>>>>> Same here. For user space irqchip the kernel side doesn't care. If in-kernel APIC is enabled, check for its capabilities.
>>>>>>> 
>>>>>> Same here.
>>>>>> 
>>>>>> Well, technically both of those features can't be implemented in
>>>>>> userspace right now since MSRs are terminated in the kernel, but I
>>>>> 
>>>>> Doesn't sound like the greatest design - unless you deprecate the non-in-kernel apic case.
>>>>> 
>>>> You mean terminating MSRs in kernel does not sound like the greatest
>>>> design? I do not disagree. That is why IMO kernel can't filter out
>>>> TSC-deadline and x2apic like you suggest.
>>> 
>>> I still don't see why it can't.
>>> 
>>> Imagine we would filter TSC-deadline and x2apic by default in the kernel - they are not known to exist yet.
>>> Now, we implement TSC-deadline in the kernel. We still filter
>>> TSC-deadline out in GET_SUPORTED_CPUID in the kernel. But we provide
>>> an interface to user space that says "call me to enable TSC-deadline
>>> CPUID, but only if you're using the in-kernel apic"
> 
> We have that interface already, it is called KVM_SET_CPUID.  :-)
> 
>>> New user space calls that ioctl when it's using the in-kernel apic, it doesn't when it's using the user space apic.
>>> Old user space doesn't call that ioctl.
>> First of all we already have TSC-deadline in GET_SUPORTED_CPUID without
>> any additional ioctls. And second I do not see why we need additional
>> iostls here.
> 
> We don't have TSC-deadline set today (and that's what started this
> thread), but we have x2apic. So what you say is true for x2apic, anyway.
> 
>> Hmm, so may be I misunderstood you. You propose to mask TSC-deadline and
>> x2apic out from GET_SUPORTED_CPUID if irq chip is not in kernel, not
>> from KVM_SET_CPUID? For those two features it may make sense indeed.
> 
> It makes sense to me.
> 
> It looks like my assumptions were wrong. They were:
> 
> - GET_SUPPORTED_CPUID simply can't know if the in-kernel irqchip is
>  going to be enabled or not.
> - GET_SUPPORTED_CPUID output has to be a function of the kernel code
>  capabilitie and host CPU, and not depend on any input from userspace.
> 
> Are those assumptions incorrect? If we break them, we may try what
> Alexander is proposing. It would be much more flexible than the options
> I was considering.
> 
> I didn't know ENABLE_CAP existed. Even if GET_SUPPORTED_CPUID can't
> check for the in-kernel irqchip setup for some reason, ENABLE_CAP could
> be used by userpace to tell the kernel "I will enable the in-kernel
> irqchip, so feel free to return features that depend on it on
> GET_SUPPORTED_CPUID".
> 
> In other words, we would return only the type-A features on
> GET_SUPPORTED_CPUID (i.e. safe to be blindly enabled by -cpu host as
> long as migration is not required), but if we use ENABLE_CAP we can make
> group A safely grow, as long as userspace first tells the kernel what it
> supports.
> 
> Anybody is against doing that? Otherwise I plan to work on this.

I think it makes a lot of sense to go this route. But get your blessing from Avi first - ultimately he owns the API :).


Alex
Gleb Natapov - May 10, 2012, 12:53 p.m.
On Wed, May 09, 2012 at 04:38:02PM -0300, Eduardo Habkost wrote:
> On Wed, May 09, 2012 at 12:38:37PM +0300, Gleb Natapov wrote:
> > On Wed, May 09, 2012 at 11:05:58AM +0200, Alexander Graf wrote:
> > > 
> > > On 09.05.2012, at 10:51, Gleb Natapov wrote:
> > > 
> > > > On Wed, May 09, 2012 at 10:42:26AM +0200, Alexander Graf wrote:
> > > >> 
> > > >> 
> > > >> On 09.05.2012, at 10:14, Gleb Natapov <gleb@redhat.com> wrote:
> > > >> 
> > > >>> On Wed, May 09, 2012 at 12:07:04AM +0200, Alexander Graf wrote:
> > > >>>> 
> > > >>>> On 08.05.2012, at 22:14, Eduardo Habkost wrote:
> > > >>>> 
> > > >>>>> On Tue, May 08, 2012 at 02:58:11AM +0200, Alexander Graf wrote:
> > > >>>>>> On 07.05.2012, at 20:21, Eduardo Habkost wrote:
> > > >>>>>> 
> > > >>>>>>> 
> > > >>>>>>> Andre? Are you able to help to answer the question below?
> > > >>>>>>> 
> > > >>>>>>> I would like to clarify what's the expected behavior of "-cpu host" to
> > > >>>>>>> be able to continue working on it. I believe the code will need to be
> > > >>>>>>> fixed on either case, but first we need to figure out what are the
> > > >>>>>>> expectations/requirements, to know _which_ changes will be needed.
> > > >>>>>>> 
> > > >>>>>>> 
> > > >>>>>>> On Tue, Apr 24, 2012 at 02:19:25PM -0300, Eduardo Habkost wrote:
> > > >>>>>>>> (CCing Andre Przywara, in case he can help to clarify what's the
> > > >>>>>>>> expected meaning of "-cpu host")
> > > >>>>>>>> 
> > > >>>>>>> [...]
> > > >>>>>>>> I am not sure I understand what you are proposing. Let me explain the
> > > >>>>>>>> use case I am thinking about:
> > > >>>>>>>> 
> > > >>>>>>>> - Feature FOO is of type (A) (e.g. just a new instruction set that
> > > >>>>>>>> doesn't require additional userspace support)
> > > >>>>>>>> - User has a Qemu vesion that doesn't know anything about feature FOO
> > > >>>>>>>> - User gets a new CPU that supports feature FOO
> > > >>>>>>>> - User gets a new kernel that supports feature FOO (i.e. has FOO in
> > > >>>>>>>> GET_SUPPORTED_CPUID)
> > > >>>>>>>> - User does _not_ upgrade Qemu.
> > > >>>>>>>> - User expects to get feature FOO enabled if using "-cpu host", without
> > > >>>>>>>> upgrading Qemu.
> > > >>>>>>>> 
> > > >>>>>>>> The problem here is: to support the above use-case, userspace need a
> > > >>>>>>>> probing mechanism that can differentiate _new_ (previously unknown)
> > > >>>>>>>> features that are in group (A) (safe to blindly enable) from features
> > > >>>>>>>> that are in group (B) (that can't be enabled without an userspace
> > > >>>>>>>> upgrade).
> > > >>>>>>>> 
> > > >>>>>>>> In short, it becomes a problem if we consider the following case:
> > > >>>>>>>> 
> > > >>>>>>>> - Feature BAR is of type (B) (it can't be enabled without extra
> > > >>>>>>>> userspace support)
> > > >>>>>>>> - User has a Qemu version that doesn't know anything about feature BAR
> > > >>>>>>>> - User gets a new CPU that supports feature BAR
> > > >>>>>>>> - User gets a new kernel that supports feature BAR (i.e. has BAR in
> > > >>>>>>>> GET_SUPPORTED_CPUID)
> > > >>>>>>>> - User does _not_ upgrade Qemu.
> > > >>>>>>>> - User simply shouldn't get feature BAR enabled, even if using "-cpu
> > > >>>>>>>> host", otherwise Qemu would break.
> > > >>>>>>>> 
> > > >>>>>>>> If userspace always limited itself to features it knows about, it would
> > > >>>>>>>> be really easy to implement the feature without any new probing
> > > >>>>>>>> mechanism from the kernel. But that's not how I think users expect "-cpu
> > > >>>>>>>> host" to work. Maybe I am wrong, I don't know. I am CCing Andre, who
> > > >>>>>>>> introduced the "-cpu host" feature, in case he can explain what's the
> > > >>>>>>>> expected semantics on the cases above.
> > > >>>>>> 
> > > >>>>>> Can you think of any feature that'd go into category B?
> > > >>>>> 
> > > >>>>> - TSC-deadline: can't be enabled unless userspace takes care to enable
> > > >>>>> the in-kernel irqchip.
> > > >>>> 
> > > >>>> The kernel can check if in-kernel irqchip has it enabled and otherwise mask it out, no?
> > > >>>> 
> > > >>> How kernel should know that userspace does not emulate it?
> > > >> 
> > > >> You have to enable the in-kernel apic to use it, at which point the kernel knows it's in use, right?
> > > >> 
> > > >>> 
> > > >>>>> - x2apic: ditto.
> > > >>>> 
> > > >>>> Same here. For user space irqchip the kernel side doesn't care. If in-kernel APIC is enabled, check for its capabilities.
> > > >>>> 
> > > >>> Same here.
> > > >>> 
> > > >>> Well, technically both of those features can't be implemented in
> > > >>> userspace right now since MSRs are terminated in the kernel, but I
> > > >> 
> > > >> Doesn't sound like the greatest design - unless you deprecate the non-in-kernel apic case.
> > > >> 
> > > > You mean terminating MSRs in kernel does not sound like the greatest
> > > > design? I do not disagree. That is why IMO kernel can't filter out
> > > > TSC-deadline and x2apic like you suggest.
> > > 
> > > I still don't see why it can't.
> > > 
> > > Imagine we would filter TSC-deadline and x2apic by default in the kernel - they are not known to exist yet.
> > > Now, we implement TSC-deadline in the kernel. We still filter
> > > TSC-deadline out in GET_SUPORTED_CPUID in the kernel. But we provide
> > > an interface to user space that says "call me to enable TSC-deadline
> > > CPUID, but only if you're using the in-kernel apic"
> 
> We have that interface already, it is called KVM_SET_CPUID.  :-)
> 
> > > New user space calls that ioctl when it's using the in-kernel apic, it doesn't when it's using the user space apic.
> > > Old user space doesn't call that ioctl.
> > First of all we already have TSC-deadline in GET_SUPORTED_CPUID without
> > any additional ioctls. And second I do not see why we need additional
> > iostls here.
> 
> We don't have TSC-deadline set today (and that's what started this
> thread), but we have x2apic. So what you say is true for x2apic, anyway.
> 
Hmm, strange. But s/TSC-deadline/x2apic/ and the point stands :)

> > Hmm, so may be I misunderstood you. You propose to mask TSC-deadline and
> > x2apic out from GET_SUPORTED_CPUID if irq chip is not in kernel, not
> > from KVM_SET_CPUID? For those two features it may make sense indeed.
> 
> It makes sense to me.
> 
> It looks like my assumptions were wrong. They were:
> 
> - GET_SUPPORTED_CPUID simply can't know if the in-kernel irqchip is
>   going to be enabled or not.
I think this is currently true. Before we changing that we have to make
sure that no existing userspace calls GET_SUPPORTED_CPUID before
creating irqchip. Not sure we can check all existing userspaces.

> - GET_SUPPORTED_CPUID output has to be a function of the kernel code
>   capabilitie and host CPU, and not depend on any input from userspace.
> 
> Are those assumptions incorrect? If we break them, we may try what
> Alexander is proposing. It would be much more flexible than the options
> I was considering.
> 
> I didn't know ENABLE_CAP existed. Even if GET_SUPPORTED_CPUID can't
What is ENABLE_CAP?

> check for the in-kernel irqchip setup for some reason, ENABLE_CAP could
> be used by userpace to tell the kernel "I will enable the in-kernel
> irqchip, so feel free to return features that depend on it on
> GET_SUPPORTED_CPUID".
> 
> In other words, we would return only the type-A features on
> GET_SUPPORTED_CPUID (i.e. safe to be blindly enabled by -cpu host as
> long as migration is not required), but if we use ENABLE_CAP we can make
> group A safely grow, as long as userspace first tells the kernel what it
> supports.
> 
> Anybody is against doing that? Otherwise I plan to work on this.
> Probably I will start by making GET_SUPPORTED_CPUID not return x2apic
> unless userspace tells the kernel (using ENABLE_CAP) that it will enable
> the in-kernel irqchip. Then we can do the same with TSC-deadline.
> 
> Note that all this work is to allow the kernel to let userspace blindly
> enable features it _doesn't know yet_. If we limit ourselves to features
> userspace already knows about, we could simply remove x2apic and
> TSC-deadline from GET_SUPPORTED_CPUID completely, not use ENABLE_CAP for
> that, and let userspace set x2apic or TSC-deadline on KVM_SET_CPUID only
> if it knows it is safe (either because it checked for the corresponding
> KVM_CAP_* capability is present and it will enable the in-kernel
> irqchip, or because it will emulated it in userspace).
> 
There is not KVM_CAP_* for x2apic as far as I see.

> In case anybody is against the proposal above: note that the current
> documented GET_SUPPORTED_CPUID semantics (unconditionally returning bits
> that depend on specific userspace behavior/capabilities) is simply
> unusable by "-cpu host". If the above proposal gets rejected, my Plan B
> is to update the GET_SUPPORTED_CPUID documentation to note that it
> returns only type-A features (features that userspace can safely enable
> even if it doesn't know what it does), remove x2apic from
> GET_SUPPORTED_CPUID forever, and use KVM_CAP_* for discovery of all
> type-B features (features that depend on specific userspace
> capabilities/behavior).
> 
This will break existing userspaces, no?

> 
> > Not
> > sure there won't be others that are not dependent on irq chip presence.
> > You propose to add additional ioctls to enable them if they appear?
> 
> I am sure there will be new features in the future that don't depend on
> any userspace support, so they would be enabled on GET_SUPPORTED_CPUID
> unconditionally.
> 
Those not the kind of features I am worry about though. I worry about
the features that can be emulated either by kernel or by userspace and
userspace may choose where it wants the feature to be emulated.

> But if we have new features that depend on specific userspace
> capabilities/behavior (i.e. enabling the irqchip, or something else), we
> could also add them as long as we check if that capability/behavior was
> enabled using ENABLE_CAP.
> 
> 
> > > 
> > > So at the end all bits in GET_SUPPORTED_CPUID are consistent with what user space is capable of.
> > > 
> > GET_SUPPORTED_CPUID should not be necessary consistent with what user
> > space is capable of. Userspace may emulate features that are not in
> > GET_SUPPORTED_CPUID.
> 
> True. We don't need to make the interface too complex just to make
> GET_SUPPORTED_CPUID match exactly what userspace is going to enable. If
> userspace wants to enable a feature because it can emulate it by its
> own, it can just enable it using SET_CPUID.
> 
> -- 
> Eduardo

--
			Gleb.
Alexander Graf - May 10, 2012, 1:21 p.m.
On 05/10/2012 02:53 PM, Gleb Natapov wrote:
> On Wed, May 09, 2012 at 04:38:02PM -0300, Eduardo Habkost wrote:
>> On Wed, May 09, 2012 at 12:38:37PM +0300, Gleb Natapov wrote:
>>> On Wed, May 09, 2012 at 11:05:58AM +0200, Alexander Graf wrote:
>>>> On 09.05.2012, at 10:51, Gleb Natapov wrote:
>>>>
>>>>> On Wed, May 09, 2012 at 10:42:26AM +0200, Alexander Graf wrote:
>>>>>>
>>>>>> On 09.05.2012, at 10:14, Gleb Natapov<gleb@redhat.com>  wrote:
>>>>>>
>>>>>>> On Wed, May 09, 2012 at 12:07:04AM +0200, Alexander Graf wrote:
>>>>>>>> On 08.05.2012, at 22:14, Eduardo Habkost wrote:
>>>>>>>>
>>>>>>>>> On Tue, May 08, 2012 at 02:58:11AM +0200, Alexander Graf wrote:
>>>>>>>>>> On 07.05.2012, at 20:21, Eduardo Habkost wrote:
>>>>>>>>>>
>>>>>>>>>>> Andre? Are you able to help to answer the question below?
>>>>>>>>>>>
>>>>>>>>>>> I would like to clarify what's the expected behavior of "-cpu host" to
>>>>>>>>>>> be able to continue working on it. I believe the code will need to be
>>>>>>>>>>> fixed on either case, but first we need to figure out what are the
>>>>>>>>>>> expectations/requirements, to know _which_ changes will be needed.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Apr 24, 2012 at 02:19:25PM -0300, Eduardo Habkost wrote:
>>>>>>>>>>>> (CCing Andre Przywara, in case he can help to clarify what's the
>>>>>>>>>>>> expected meaning of "-cpu host")
>>>>>>>>>>>>
>>>>>>>>>>> [...]
>>>>>>>>>>>> I am not sure I understand what you are proposing. Let me explain the
>>>>>>>>>>>> use case I am thinking about:
>>>>>>>>>>>>
>>>>>>>>>>>> - Feature FOO is of type (A) (e.g. just a new instruction set that
>>>>>>>>>>>> doesn't require additional userspace support)
>>>>>>>>>>>> - User has a Qemu vesion that doesn't know anything about feature FOO
>>>>>>>>>>>> - User gets a new CPU that supports feature FOO
>>>>>>>>>>>> - User gets a new kernel that supports feature FOO (i.e. has FOO in
>>>>>>>>>>>> GET_SUPPORTED_CPUID)
>>>>>>>>>>>> - User does _not_ upgrade Qemu.
>>>>>>>>>>>> - User expects to get feature FOO enabled if using "-cpu host", without
>>>>>>>>>>>> upgrading Qemu.
>>>>>>>>>>>>
>>>>>>>>>>>> The problem here is: to support the above use-case, userspace need a
>>>>>>>>>>>> probing mechanism that can differentiate _new_ (previously unknown)
>>>>>>>>>>>> features that are in group (A) (safe to blindly enable) from features
>>>>>>>>>>>> that are in group (B) (that can't be enabled without an userspace
>>>>>>>>>>>> upgrade).
>>>>>>>>>>>>
>>>>>>>>>>>> In short, it becomes a problem if we consider the following case:
>>>>>>>>>>>>
>>>>>>>>>>>> - Feature BAR is of type (B) (it can't be enabled without extra
>>>>>>>>>>>> userspace support)
>>>>>>>>>>>> - User has a Qemu version that doesn't know anything about feature BAR
>>>>>>>>>>>> - User gets a new CPU that supports feature BAR
>>>>>>>>>>>> - User gets a new kernel that supports feature BAR (i.e. has BAR in
>>>>>>>>>>>> GET_SUPPORTED_CPUID)
>>>>>>>>>>>> - User does _not_ upgrade Qemu.
>>>>>>>>>>>> - User simply shouldn't get feature BAR enabled, even if using "-cpu
>>>>>>>>>>>> host", otherwise Qemu would break.
>>>>>>>>>>>>
>>>>>>>>>>>> If userspace always limited itself to features it knows about, it would
>>>>>>>>>>>> be really easy to implement the feature without any new probing
>>>>>>>>>>>> mechanism from the kernel. But that's not how I think users expect "-cpu
>>>>>>>>>>>> host" to work. Maybe I am wrong, I don't know. I am CCing Andre, who
>>>>>>>>>>>> introduced the "-cpu host" feature, in case he can explain what's the
>>>>>>>>>>>> expected semantics on the cases above.
>>>>>>>>>> Can you think of any feature that'd go into category B?
>>>>>>>>> - TSC-deadline: can't be enabled unless userspace takes care to enable
>>>>>>>>> the in-kernel irqchip.
>>>>>>>> The kernel can check if in-kernel irqchip has it enabled and otherwise mask it out, no?
>>>>>>>>
>>>>>>> How kernel should know that userspace does not emulate it?
>>>>>> You have to enable the in-kernel apic to use it, at which point the kernel knows it's in use, right?
>>>>>>
>>>>>>>>> - x2apic: ditto.
>>>>>>>> Same here. For user space irqchip the kernel side doesn't care. If in-kernel APIC is enabled, check for its capabilities.
>>>>>>>>
>>>>>>> Same here.
>>>>>>>
>>>>>>> Well, technically both of those features can't be implemented in
>>>>>>> userspace right now since MSRs are terminated in the kernel, but I
>>>>>> Doesn't sound like the greatest design - unless you deprecate the non-in-kernel apic case.
>>>>>>
>>>>> You mean terminating MSRs in kernel does not sound like the greatest
>>>>> design? I do not disagree. That is why IMO kernel can't filter out
>>>>> TSC-deadline and x2apic like you suggest.
>>>> I still don't see why it can't.
>>>>
>>>> Imagine we would filter TSC-deadline and x2apic by default in the kernel - they are not known to exist yet.
>>>> Now, we implement TSC-deadline in the kernel. We still filter
>>>> TSC-deadline out in GET_SUPORTED_CPUID in the kernel. But we provide
>>>> an interface to user space that says "call me to enable TSC-deadline
>>>> CPUID, but only if you're using the in-kernel apic"
>> We have that interface already, it is called KVM_SET_CPUID.  :-)
>>
>>>> New user space calls that ioctl when it's using the in-kernel apic, it doesn't when it's using the user space apic.
>>>> Old user space doesn't call that ioctl.
>>> First of all we already have TSC-deadline in GET_SUPORTED_CPUID without
>>> any additional ioctls. And second I do not see why we need additional
>>> iostls here.
>> We don't have TSC-deadline set today (and that's what started this
>> thread), but we have x2apic. So what you say is true for x2apic, anyway.
>>
> Hmm, strange. But s/TSC-deadline/x2apic/ and the point stands :)
>
>>> Hmm, so may be I misunderstood you. You propose to mask TSC-deadline and
>>> x2apic out from GET_SUPORTED_CPUID if irq chip is not in kernel, not
>>> from KVM_SET_CPUID? For those two features it may make sense indeed.
>> It makes sense to me.
>>
>> It looks like my assumptions were wrong. They were:
>>
>> - GET_SUPPORTED_CPUID simply can't know if the in-kernel irqchip is
>>    going to be enabled or not.
> I think this is currently true. Before we changing that we have to make
> sure that no existing userspace calls GET_SUPPORTED_CPUID before
> creating irqchip. Not sure we can check all existing userspaces.
>
>> - GET_SUPPORTED_CPUID output has to be a function of the kernel code
>>    capabilitie and host CPU, and not depend on any input from userspace.
>>
>> Are those assumptions incorrect? If we break them, we may try what
>> Alexander is proposing. It would be much more flexible than the options
>> I was considering.
>>
>> I didn't know ENABLE_CAP existed. Even if GET_SUPPORTED_CPUID can't
> What is ENABLE_CAP?

It's an ioctl that you pass in a CAP. It's used on non-x86 already to 
easily turn on features inside kvm :).

>
>> check for the in-kernel irqchip setup for some reason, ENABLE_CAP could
>> be used by userpace to tell the kernel "I will enable the in-kernel
>> irqchip, so feel free to return features that depend on it on
>> GET_SUPPORTED_CPUID".
>>
>> In other words, we would return only the type-A features on
>> GET_SUPPORTED_CPUID (i.e. safe to be blindly enabled by -cpu host as
>> long as migration is not required), but if we use ENABLE_CAP we can make
>> group A safely grow, as long as userspace first tells the kernel what it
>> supports.
>>
>> Anybody is against doing that? Otherwise I plan to work on this.
>> Probably I will start by making GET_SUPPORTED_CPUID not return x2apic
>> unless userspace tells the kernel (using ENABLE_CAP) that it will enable
>> the in-kernel irqchip. Then we can do the same with TSC-deadline.
>>
>> Note that all this work is to allow the kernel to let userspace blindly
>> enable features it _doesn't know yet_. If we limit ourselves to features
>> userspace already knows about, we could simply remove x2apic and
>> TSC-deadline from GET_SUPPORTED_CPUID completely, not use ENABLE_CAP for
>> that, and let userspace set x2apic or TSC-deadline on KVM_SET_CPUID only
>> if it knows it is safe (either because it checked for the corresponding
>> KVM_CAP_* capability is present and it will enable the in-kernel
>> irqchip, or because it will emulated it in userspace).
>>
> There is not KVM_CAP_* for x2apic as far as I see.
>
>> In case anybody is against the proposal above: note that the current
>> documented GET_SUPPORTED_CPUID semantics (unconditionally returning bits
>> that depend on specific userspace behavior/capabilities) is simply
>> unusable by "-cpu host". If the above proposal gets rejected, my Plan B
>> is to update the GET_SUPPORTED_CPUID documentation to note that it
>> returns only type-A features (features that userspace can safely enable
>> even if it doesn't know what it does), remove x2apic from
>> GET_SUPPORTED_CPUID forever, and use KVM_CAP_* for discovery of all
>> type-B features (features that depend on specific userspace
>> capabilities/behavior).
>>
> This will break existing userspaces, no?

It would mean that new user space on old kernels don't get x2apic or 
tsc-deadline in cpuid, yes.

>
>>> Not
>>> sure there won't be others that are not dependent on irq chip presence.
>>> You propose to add additional ioctls to enable them if they appear?
>> I am sure there will be new features in the future that don't depend on
>> any userspace support, so they would be enabled on GET_SUPPORTED_CPUID
>> unconditionally.
>>
> Those not the kind of features I am worry about though. I worry about
> the features that can be emulated either by kernel or by userspace and
> userspace may choose where it wants the feature to be emulated.

Like? Usually CPUID features reflect new instructions. We don't notify 
user space on unknown instructions, right?


Alex
Gleb Natapov - May 10, 2012, 1:39 p.m.
On Thu, May 10, 2012 at 03:21:41PM +0200, Alexander Graf wrote:
> On 05/10/2012 02:53 PM, Gleb Natapov wrote:
> >On Wed, May 09, 2012 at 04:38:02PM -0300, Eduardo Habkost wrote:
> >>On Wed, May 09, 2012 at 12:38:37PM +0300, Gleb Natapov wrote:
> >>>On Wed, May 09, 2012 at 11:05:58AM +0200, Alexander Graf wrote:
> >>>>On 09.05.2012, at 10:51, Gleb Natapov wrote:
> >>>>
> >>>>>On Wed, May 09, 2012 at 10:42:26AM +0200, Alexander Graf wrote:
> >>>>>>
> >>>>>>On 09.05.2012, at 10:14, Gleb Natapov<gleb@redhat.com>  wrote:
> >>>>>>
> >>>>>>>On Wed, May 09, 2012 at 12:07:04AM +0200, Alexander Graf wrote:
> >>>>>>>>On 08.05.2012, at 22:14, Eduardo Habkost wrote:
> >>>>>>>>
> >>>>>>>>>On Tue, May 08, 2012 at 02:58:11AM +0200, Alexander Graf wrote:
> >>>>>>>>>>On 07.05.2012, at 20:21, Eduardo Habkost wrote:
> >>>>>>>>>>
> >>>>>>>>>>>Andre? Are you able to help to answer the question below?
> >>>>>>>>>>>
> >>>>>>>>>>>I would like to clarify what's the expected behavior of "-cpu host" to
> >>>>>>>>>>>be able to continue working on it. I believe the code will need to be
> >>>>>>>>>>>fixed on either case, but first we need to figure out what are the
> >>>>>>>>>>>expectations/requirements, to know _which_ changes will be needed.
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>On Tue, Apr 24, 2012 at 02:19:25PM -0300, Eduardo Habkost wrote:
> >>>>>>>>>>>>(CCing Andre Przywara, in case he can help to clarify what's the
> >>>>>>>>>>>>expected meaning of "-cpu host")
> >>>>>>>>>>>>
> >>>>>>>>>>>[...]
> >>>>>>>>>>>>I am not sure I understand what you are proposing. Let me explain the
> >>>>>>>>>>>>use case I am thinking about:
> >>>>>>>>>>>>
> >>>>>>>>>>>>- Feature FOO is of type (A) (e.g. just a new instruction set that
> >>>>>>>>>>>>doesn't require additional userspace support)
> >>>>>>>>>>>>- User has a Qemu vesion that doesn't know anything about feature FOO
> >>>>>>>>>>>>- User gets a new CPU that supports feature FOO
> >>>>>>>>>>>>- User gets a new kernel that supports feature FOO (i.e. has FOO in
> >>>>>>>>>>>>GET_SUPPORTED_CPUID)
> >>>>>>>>>>>>- User does _not_ upgrade Qemu.
> >>>>>>>>>>>>- User expects to get feature FOO enabled if using "-cpu host", without
> >>>>>>>>>>>>upgrading Qemu.
> >>>>>>>>>>>>
> >>>>>>>>>>>>The problem here is: to support the above use-case, userspace need a
> >>>>>>>>>>>>probing mechanism that can differentiate _new_ (previously unknown)
> >>>>>>>>>>>>features that are in group (A) (safe to blindly enable) from features
> >>>>>>>>>>>>that are in group (B) (that can't be enabled without an userspace
> >>>>>>>>>>>>upgrade).
> >>>>>>>>>>>>
> >>>>>>>>>>>>In short, it becomes a problem if we consider the following case:
> >>>>>>>>>>>>
> >>>>>>>>>>>>- Feature BAR is of type (B) (it can't be enabled without extra
> >>>>>>>>>>>>userspace support)
> >>>>>>>>>>>>- User has a Qemu version that doesn't know anything about feature BAR
> >>>>>>>>>>>>- User gets a new CPU that supports feature BAR
> >>>>>>>>>>>>- User gets a new kernel that supports feature BAR (i.e. has BAR in
> >>>>>>>>>>>>GET_SUPPORTED_CPUID)
> >>>>>>>>>>>>- User does _not_ upgrade Qemu.
> >>>>>>>>>>>>- User simply shouldn't get feature BAR enabled, even if using "-cpu
> >>>>>>>>>>>>host", otherwise Qemu would break.
> >>>>>>>>>>>>
> >>>>>>>>>>>>If userspace always limited itself to features it knows about, it would
> >>>>>>>>>>>>be really easy to implement the feature without any new probing
> >>>>>>>>>>>>mechanism from the kernel. But that's not how I think users expect "-cpu
> >>>>>>>>>>>>host" to work. Maybe I am wrong, I don't know. I am CCing Andre, who
> >>>>>>>>>>>>introduced the "-cpu host" feature, in case he can explain what's the
> >>>>>>>>>>>>expected semantics on the cases above.
> >>>>>>>>>>Can you think of any feature that'd go into category B?
> >>>>>>>>>- TSC-deadline: can't be enabled unless userspace takes care to enable
> >>>>>>>>>the in-kernel irqchip.
> >>>>>>>>The kernel can check if in-kernel irqchip has it enabled and otherwise mask it out, no?
> >>>>>>>>
> >>>>>>>How kernel should know that userspace does not emulate it?
> >>>>>>You have to enable the in-kernel apic to use it, at which point the kernel knows it's in use, right?
> >>>>>>
> >>>>>>>>>- x2apic: ditto.
> >>>>>>>>Same here. For user space irqchip the kernel side doesn't care. If in-kernel APIC is enabled, check for its capabilities.
> >>>>>>>>
> >>>>>>>Same here.
> >>>>>>>
> >>>>>>>Well, technically both of those features can't be implemented in
> >>>>>>>userspace right now since MSRs are terminated in the kernel, but I
> >>>>>>Doesn't sound like the greatest design - unless you deprecate the non-in-kernel apic case.
> >>>>>>
> >>>>>You mean terminating MSRs in kernel does not sound like the greatest
> >>>>>design? I do not disagree. That is why IMO kernel can't filter out
> >>>>>TSC-deadline and x2apic like you suggest.
> >>>>I still don't see why it can't.
> >>>>
> >>>>Imagine we would filter TSC-deadline and x2apic by default in the kernel - they are not known to exist yet.
> >>>>Now, we implement TSC-deadline in the kernel. We still filter
> >>>>TSC-deadline out in GET_SUPORTED_CPUID in the kernel. But we provide
> >>>>an interface to user space that says "call me to enable TSC-deadline
> >>>>CPUID, but only if you're using the in-kernel apic"
> >>We have that interface already, it is called KVM_SET_CPUID.  :-)
> >>
> >>>>New user space calls that ioctl when it's using the in-kernel apic, it doesn't when it's using the user space apic.
> >>>>Old user space doesn't call that ioctl.
> >>>First of all we already have TSC-deadline in GET_SUPORTED_CPUID without
> >>>any additional ioctls. And second I do not see why we need additional
> >>>iostls here.
> >>We don't have TSC-deadline set today (and that's what started this
> >>thread), but we have x2apic. So what you say is true for x2apic, anyway.
> >>
> >Hmm, strange. But s/TSC-deadline/x2apic/ and the point stands :)
> >
> >>>Hmm, so may be I misunderstood you. You propose to mask TSC-deadline and
> >>>x2apic out from GET_SUPORTED_CPUID if irq chip is not in kernel, not
> >>>from KVM_SET_CPUID? For those two features it may make sense indeed.
> >>It makes sense to me.
> >>
> >>It looks like my assumptions were wrong. They were:
> >>
> >>- GET_SUPPORTED_CPUID simply can't know if the in-kernel irqchip is
> >>   going to be enabled or not.
> >I think this is currently true. Before we changing that we have to make
> >sure that no existing userspace calls GET_SUPPORTED_CPUID before
> >creating irqchip. Not sure we can check all existing userspaces.
> >
> >>- GET_SUPPORTED_CPUID output has to be a function of the kernel code
> >>   capabilitie and host CPU, and not depend on any input from userspace.
> >>
> >>Are those assumptions incorrect? If we break them, we may try what
> >>Alexander is proposing. It would be much more flexible than the options
> >>I was considering.
> >>
> >>I didn't know ENABLE_CAP existed. Even if GET_SUPPORTED_CPUID can't
> >What is ENABLE_CAP?
> 
> It's an ioctl that you pass in a CAP. It's used on non-x86 already
> to easily turn on features inside kvm :).
> 
No wonder grep in arch/x86 told me nothing :) So for x86 this ioctl as
good as non-existent for now.

> >
> >>check for the in-kernel irqchip setup for some reason, ENABLE_CAP could
> >>be used by userpace to tell the kernel "I will enable the in-kernel
> >>irqchip, so feel free to return features that depend on it on
> >>GET_SUPPORTED_CPUID".
> >>
> >>In other words, we would return only the type-A features on
> >>GET_SUPPORTED_CPUID (i.e. safe to be blindly enabled by -cpu host as
> >>long as migration is not required), but if we use ENABLE_CAP we can make
> >>group A safely grow, as long as userspace first tells the kernel what it
> >>supports.
> >>
> >>Anybody is against doing that? Otherwise I plan to work on this.
> >>Probably I will start by making GET_SUPPORTED_CPUID not return x2apic
> >>unless userspace tells the kernel (using ENABLE_CAP) that it will enable
> >>the in-kernel irqchip. Then we can do the same with TSC-deadline.
> >>
> >>Note that all this work is to allow the kernel to let userspace blindly
> >>enable features it _doesn't know yet_. If we limit ourselves to features
> >>userspace already knows about, we could simply remove x2apic and
> >>TSC-deadline from GET_SUPPORTED_CPUID completely, not use ENABLE_CAP for
> >>that, and let userspace set x2apic or TSC-deadline on KVM_SET_CPUID only
> >>if it knows it is safe (either because it checked for the corresponding
> >>KVM_CAP_* capability is present and it will enable the in-kernel
> >>irqchip, or because it will emulated it in userspace).
> >>
> >There is not KVM_CAP_* for x2apic as far as I see.
> >
> >>In case anybody is against the proposal above: note that the current
> >>documented GET_SUPPORTED_CPUID semantics (unconditionally returning bits
> >>that depend on specific userspace behavior/capabilities) is simply
> >>unusable by "-cpu host". If the above proposal gets rejected, my Plan B
> >>is to update the GET_SUPPORTED_CPUID documentation to note that it
> >>returns only type-A features (features that userspace can safely enable
> >>even if it doesn't know what it does), remove x2apic from
> >>GET_SUPPORTED_CPUID forever, and use KVM_CAP_* for discovery of all
> >>type-B features (features that depend on specific userspace
> >>capabilities/behavior).
> >>
> >This will break existing userspaces, no?
> 
> It would mean that new user space on old kernels don't get x2apic or
> tsc-deadline in cpuid, yes.
> 
tsc-deadline actually exposed via capability, but disappearance of
x2apic would be serious regression.

> >
> >>>Not
> >>>sure there won't be others that are not dependent on irq chip presence.
> >>>You propose to add additional ioctls to enable them if they appear?
> >>I am sure there will be new features in the future that don't depend on
> >>any userspace support, so they would be enabled on GET_SUPPORTED_CPUID
> >>unconditionally.
> >>
> >Those not the kind of features I am worry about though. I worry about
> >the features that can be emulated either by kernel or by userspace and
> >userspace may choose where it wants the feature to be emulated.
> 
> Like? Usually CPUID features reflect new instructions. We don't
> notify user space on unknown instructions, right?
> 
No we do not. I can't give you concrete example, but think about some
apic unrelated device that is accessible via MSR and can be emulated
either in kernel or in userspace. Hmm, maybe something like ACPI
(Thermal Monitor and Software Controlled Clock Facilities), or PSN 
(Processor Serial Number).

--
			Gleb.
Eduardo Habkost - May 10, 2012, 2:12 p.m.
On Thu, May 10, 2012 at 04:39:45PM +0300, Gleb Natapov wrote:
> On Thu, May 10, 2012 at 03:21:41PM +0200, Alexander Graf wrote:
> > On 05/10/2012 02:53 PM, Gleb Natapov wrote:
> > >On Wed, May 09, 2012 at 04:38:02PM -0300, Eduardo Habkost wrote:
> > >>On Wed, May 09, 2012 at 12:38:37PM +0300, Gleb Natapov wrote:
> > >>>On Wed, May 09, 2012 at 11:05:58AM +0200, Alexander Graf wrote:
> > >>>>On 09.05.2012, at 10:51, Gleb Natapov wrote:
> > >>>>
> > >>>>>On Wed, May 09, 2012 at 10:42:26AM +0200, Alexander Graf wrote:
> > >>>>>>
> > >>>>>>On 09.05.2012, at 10:14, Gleb Natapov<gleb@redhat.com>  wrote:
> > >>>>>>
> > >>>>>>>On Wed, May 09, 2012 at 12:07:04AM +0200, Alexander Graf wrote:
> > >>>>>>>>On 08.05.2012, at 22:14, Eduardo Habkost wrote:
> > >>>>>>>>
> > >>>>>>>>>On Tue, May 08, 2012 at 02:58:11AM +0200, Alexander Graf wrote:
> > >>>>>>>>>>On 07.05.2012, at 20:21, Eduardo Habkost wrote:
> > >>>>>>>>>>
> > >>>>>>>>>>>Andre? Are you able to help to answer the question below?
> > >>>>>>>>>>>
> > >>>>>>>>>>>I would like to clarify what's the expected behavior of "-cpu host" to
> > >>>>>>>>>>>be able to continue working on it. I believe the code will need to be
> > >>>>>>>>>>>fixed on either case, but first we need to figure out what are the
> > >>>>>>>>>>>expectations/requirements, to know _which_ changes will be needed.
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>>On Tue, Apr 24, 2012 at 02:19:25PM -0300, Eduardo Habkost wrote:
> > >>>>>>>>>>>>(CCing Andre Przywara, in case he can help to clarify what's the
> > >>>>>>>>>>>>expected meaning of "-cpu host")
> > >>>>>>>>>>>>
> > >>>>>>>>>>>[...]
> > >>>>>>>>>>>>I am not sure I understand what you are proposing. Let me explain the
> > >>>>>>>>>>>>use case I am thinking about:
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>- Feature FOO is of type (A) (e.g. just a new instruction set that
> > >>>>>>>>>>>>doesn't require additional userspace support)
> > >>>>>>>>>>>>- User has a Qemu vesion that doesn't know anything about feature FOO
> > >>>>>>>>>>>>- User gets a new CPU that supports feature FOO
> > >>>>>>>>>>>>- User gets a new kernel that supports feature FOO (i.e. has FOO in
> > >>>>>>>>>>>>GET_SUPPORTED_CPUID)
> > >>>>>>>>>>>>- User does _not_ upgrade Qemu.
> > >>>>>>>>>>>>- User expects to get feature FOO enabled if using "-cpu host", without
> > >>>>>>>>>>>>upgrading Qemu.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>The problem here is: to support the above use-case, userspace need a
> > >>>>>>>>>>>>probing mechanism that can differentiate _new_ (previously unknown)
> > >>>>>>>>>>>>features that are in group (A) (safe to blindly enable) from features
> > >>>>>>>>>>>>that are in group (B) (that can't be enabled without an userspace
> > >>>>>>>>>>>>upgrade).
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>In short, it becomes a problem if we consider the following case:
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>- Feature BAR is of type (B) (it can't be enabled without extra
> > >>>>>>>>>>>>userspace support)
> > >>>>>>>>>>>>- User has a Qemu version that doesn't know anything about feature BAR
> > >>>>>>>>>>>>- User gets a new CPU that supports feature BAR
> > >>>>>>>>>>>>- User gets a new kernel that supports feature BAR (i.e. has BAR in
> > >>>>>>>>>>>>GET_SUPPORTED_CPUID)
> > >>>>>>>>>>>>- User does _not_ upgrade Qemu.
> > >>>>>>>>>>>>- User simply shouldn't get feature BAR enabled, even if using "-cpu
> > >>>>>>>>>>>>host", otherwise Qemu would break.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>If userspace always limited itself to features it knows about, it would
> > >>>>>>>>>>>>be really easy to implement the feature without any new probing
> > >>>>>>>>>>>>mechanism from the kernel. But that's not how I think users expect "-cpu
> > >>>>>>>>>>>>host" to work. Maybe I am wrong, I don't know. I am CCing Andre, who
> > >>>>>>>>>>>>introduced the "-cpu host" feature, in case he can explain what's the
> > >>>>>>>>>>>>expected semantics on the cases above.
> > >>>>>>>>>>Can you think of any feature that'd go into category B?
> > >>>>>>>>>- TSC-deadline: can't be enabled unless userspace takes care to enable
> > >>>>>>>>>the in-kernel irqchip.
> > >>>>>>>>The kernel can check if in-kernel irqchip has it enabled and otherwise mask it out, no?
> > >>>>>>>>
> > >>>>>>>How kernel should know that userspace does not emulate it?
> > >>>>>>You have to enable the in-kernel apic to use it, at which point the kernel knows it's in use, right?
> > >>>>>>
> > >>>>>>>>>- x2apic: ditto.
> > >>>>>>>>Same here. For user space irqchip the kernel side doesn't care. If in-kernel APIC is enabled, check for its capabilities.
> > >>>>>>>>
> > >>>>>>>Same here.
> > >>>>>>>
> > >>>>>>>Well, technically both of those features can't be implemented in
> > >>>>>>>userspace right now since MSRs are terminated in the kernel, but I
> > >>>>>>Doesn't sound like the greatest design - unless you deprecate the non-in-kernel apic case.
> > >>>>>>
> > >>>>>You mean terminating MSRs in kernel does not sound like the greatest
> > >>>>>design? I do not disagree. That is why IMO kernel can't filter out
> > >>>>>TSC-deadline and x2apic like you suggest.
> > >>>>I still don't see why it can't.
> > >>>>
> > >>>>Imagine we would filter TSC-deadline and x2apic by default in the kernel - they are not known to exist yet.
> > >>>>Now, we implement TSC-deadline in the kernel. We still filter
> > >>>>TSC-deadline out in GET_SUPORTED_CPUID in the kernel. But we provide
> > >>>>an interface to user space that says "call me to enable TSC-deadline
> > >>>>CPUID, but only if you're using the in-kernel apic"
> > >>We have that interface already, it is called KVM_SET_CPUID.  :-)
> > >>
> > >>>>New user space calls that ioctl when it's using the in-kernel apic, it doesn't when it's using the user space apic.
> > >>>>Old user space doesn't call that ioctl.
> > >>>First of all we already have TSC-deadline in GET_SUPORTED_CPUID without
> > >>>any additional ioctls. And second I do not see why we need additional
> > >>>iostls here.
> > >>We don't have TSC-deadline set today (and that's what started this
> > >>thread), but we have x2apic. So what you say is true for x2apic, anyway.
> > >>
> > >Hmm, strange. But s/TSC-deadline/x2apic/ and the point stands :)
> > >
> > >>>Hmm, so may be I misunderstood you. You propose to mask TSC-deadline and
> > >>>x2apic out from GET_SUPORTED_CPUID if irq chip is not in kernel, not
> > >>>from KVM_SET_CPUID? For those two features it may make sense indeed.
> > >>It makes sense to me.
> > >>
> > >>It looks like my assumptions were wrong. They were:
> > >>
> > >>- GET_SUPPORTED_CPUID simply can't know if the in-kernel irqchip is
> > >>   going to be enabled or not.
> > >I think this is currently true. Before we changing that we have to make
> > >sure that no existing userspace calls GET_SUPPORTED_CPUID before
> > >creating irqchip. Not sure we can check all existing userspaces.
> > >
> > >>- GET_SUPPORTED_CPUID output has to be a function of the kernel code
> > >>   capabilitie and host CPU, and not depend on any input from userspace.
> > >>
> > >>Are those assumptions incorrect? If we break them, we may try what
> > >>Alexander is proposing. It would be much more flexible than the options
> > >>I was considering.
> > >>
> > >>I didn't know ENABLE_CAP existed. Even if GET_SUPPORTED_CPUID can't
> > >What is ENABLE_CAP?
> > 
> > It's an ioctl that you pass in a CAP. It's used on non-x86 already
> > to easily turn on features inside kvm :).
> > 
> No wonder grep in arch/x86 told me nothing :) So for x86 this ioctl as
> good as non-existent for now.
> 
> > >
> > >>check for the in-kernel irqchip setup for some reason, ENABLE_CAP could
> > >>be used by userpace to tell the kernel "I will enable the in-kernel
> > >>irqchip, so feel free to return features that depend on it on
> > >>GET_SUPPORTED_CPUID".
> > >>
> > >>In other words, we would return only the type-A features on
> > >>GET_SUPPORTED_CPUID (i.e. safe to be blindly enabled by -cpu host as
> > >>long as migration is not required), but if we use ENABLE_CAP we can make
> > >>group A safely grow, as long as userspace first tells the kernel what it
> > >>supports.
> > >>
> > >>Anybody is against doing that? Otherwise I plan to work on this.
> > >>Probably I will start by making GET_SUPPORTED_CPUID not return x2apic
> > >>unless userspace tells the kernel (using ENABLE_CAP) that it will enable
> > >>the in-kernel irqchip. Then we can do the same with TSC-deadline.
> > >>
> > >>Note that all this work is to allow the kernel to let userspace blindly
> > >>enable features it _doesn't know yet_. If we limit ourselves to features
> > >>userspace already knows about, we could simply remove x2apic and
> > >>TSC-deadline from GET_SUPPORTED_CPUID completely, not use ENABLE_CAP for
> > >>that, and let userspace set x2apic or TSC-deadline on KVM_SET_CPUID only
> > >>if it knows it is safe (either because it checked for the corresponding
> > >>KVM_CAP_* capability is present and it will enable the in-kernel
> > >>irqchip, or because it will emulated it in userspace).
> > >>
> > >There is not KVM_CAP_* for x2apic as far as I see.
> > >
> > >>In case anybody is against the proposal above: note that the current
> > >>documented GET_SUPPORTED_CPUID semantics (unconditionally returning bits
> > >>that depend on specific userspace behavior/capabilities) is simply
> > >>unusable by "-cpu host". If the above proposal gets rejected, my Plan B
> > >>is to update the GET_SUPPORTED_CPUID documentation to note that it
> > >>returns only type-A features (features that userspace can safely enable
> > >>even if it doesn't know what it does), remove x2apic from
> > >>GET_SUPPORTED_CPUID forever, and use KVM_CAP_* for discovery of all
> > >>type-B features (features that depend on specific userspace
> > >>capabilities/behavior).
> > >>
> > >This will break existing userspaces, no?
> > 
> > It would mean that new user space on old kernels don't get x2apic or
> > tsc-deadline in cpuid, yes.
> > 
> tsc-deadline actually exposed via capability, but disappearance of
> x2apic would be serious regression.

To keep compatibility, we can let userspace tell the kernel that they
want the "safe" version of GET_SUPPORTED_CPUID (using SET_CAP?). If
userspace doesn't do that, x2apic would be kept enabled on
GET_SUPPORTED_CPUID for compatiblity.

The problem here is that the current behavior of GET_SUPPORTED_CPUID
(returning x2apic) is unusable by -cpu host, unless we make the
in-kernel irqchip mandatory for -cpu host (but that would be solving
just our immediate needs, and not any similar cases in the future).


> > >
> > >>>Not
> > >>>sure there won't be others that are not dependent on irq chip presence.
> > >>>You propose to add additional ioctls to enable them if they appear?
> > >>I am sure there will be new features in the future that don't depend on
> > >>any userspace support, so they would be enabled on GET_SUPPORTED_CPUID
> > >>unconditionally.
> > >>
> > >Those not the kind of features I am worry about though. I worry about
> > >the features that can be emulated either by kernel or by userspace and
> > >userspace may choose where it wants the feature to be emulated.
> > 
> > Like? Usually CPUID features reflect new instructions. We don't
> > notify user space on unknown instructions, right?
> > 
> No we do not. I can't give you concrete example, but think about some
> apic unrelated device that is accessible via MSR and can be emulated
> either in kernel or in userspace. Hmm, maybe something like ACPI
> (Thermal Monitor and Software Controlled Clock Facilities), or PSN 
> (Processor Serial Number).

Those cases are the easy ones (or maybe they're just the ones I don't
worry about, right now :).

- If the kernel can't emulate it, the feature would never appear on
  GET_SUPPORTED_CPUID, but userspace could enable it using SET_CPUID if
  userspace is going to emulate it.
  - In case userspace emulation don't depend on any extra kernel
    capability, userspace would simply set it on SET_CPUID.
    - Now, what happens here if a future kernel version becomes able to
      emulate it in-kernel?
  - In case userspace emulation depend on some kernel capability,
    userspace can use CHECK_EXTENSION before setting it on SET_CPUID.
- If the kernel can emulate it without depending on any userspace
  capability/behavior, the feature would appear on GET_SUPPORTED_CPUID.
  - If userspace wants to use the kernel emulation, it would just enable
    it on SET_CPUID.
  - If userspace wants to emulate it itself, it would set it on
    SET_CPUID, but it would need to ask the kernel to disable the
    in-kernel emulation (how?).

I am more worried about this case:

- If the kernel can emulate it, but depends on specific userspace
  capability/behavior to work. That feature can't appear on
  GET_SUPPORTED_CPUID by default, otherwise older qemu versions using
  "-cpu host" would try to enable it without doing what's necessary to
  make it work. This is the case for x2apic and TSC-deadline.
Liu, Jinsong - June 14, 2012, 7:02 p.m.
Eduardo, Jan

I will update tsc deadline timer patch (at qemu-kvm side) recently.
Have you made a final agreement of the issue 'KVM_CAP_TSC_DEADLINE_TIMER' vs. 'GET_SUPPORTED_CPUID'?

Thanks,
Jinsong


Eduardo Habkost wrote:
> (CCing Andre Przywara, in case he can help to clarify what's the
> expected meaning of "-cpu host")
> 
> On Tue, Apr 24, 2012 at 06:06:55PM +0200, Jan Kiszka wrote:
>> On 2012-04-23 22:02, Eduardo Habkost wrote:
>>> On Mon, Apr 23, 2012 at 06:31:25PM +0200, Jan Kiszka wrote:
>>>> However, that was how I interpreted this GET_SUPPORTED_CPUID. In
>>>> fact, 
>>>> it is used as "kernel or hardware does not _prevent_" already. And
>>>> in that sense, it's ok to enable even features that are not in
>>>> kernel/hardware hands. We should point out this fact in the
>>>> documentation. 
>>> 
>>> I see GET_SUPPORTED_CPUID as just a "what userspace can enable
>>> because the kernel and the hardware support it (= don't prevent
>>> it), as long as userspace has the required support" (meaning A+B).
>>> It's a bit like KVM_CHECK_EXTENSION, but with the nice feature that
>>> that the capabilities map directly to CPUID bits.
>>> 
>>> So, it's not clear to me: now you are OK with adding TSC_DEADLINE
>>> to GET_SUPPORTED_CPUID? 
>>> 
>>> But we still have the issue of "-cpu host" not knowing what can be
>>> safely enabled (without userspace feature-specific setup code), or
>>> not. Do you have any suggestion for that? Avi, do you have any
>>> suggestion? 
>> 
>> First of all, I bet this was already broken with the introduction of
>> x2apic. So TSC deadline won't make it worse. I guess we need to
>> address this in userspace, first by masking those features out,
>> later by actually emulating them.
> 
> I am not sure I understand what you are proposing. Let me explain the
> use case I am thinking about:
> 
> - Feature FOO is of type (A) (e.g. just a new instruction set that
>   doesn't require additional userspace support)
> - User has a Qemu vesion that doesn't know anything about feature FOO
> - User gets a new CPU that supports feature FOO
> - User gets a new kernel that supports feature FOO (i.e. has FOO in
>   GET_SUPPORTED_CPUID)
> - User does _not_ upgrade Qemu.
> - User expects to get feature FOO enabled if using "-cpu host",
>   without upgrading Qemu.
> 
> The problem here is: to support the above use-case, userspace need a
> probing mechanism that can differentiate _new_ (previously unknown)
> features that are in group (A) (safe to blindly enable) from features
> that are in group (B) (that can't be enabled without an userspace
> upgrade).
> 
> In short, it becomes a problem if we consider the following case:
> 
> - Feature BAR is of type (B) (it can't be enabled without extra
>   userspace support)
> - User has a Qemu version that doesn't know anything about feature BAR
> - User gets a new CPU that supports feature BAR
> - User gets a new kernel that supports feature BAR (i.e. has BAR in
>   GET_SUPPORTED_CPUID)
> - User does _not_ upgrade Qemu.
> - User simply shouldn't get feature BAR enabled, even if using "-cpu
>   host", otherwise Qemu would break.
> 
> If userspace always limited itself to features it knows about, it
> would be really easy to implement the feature without any new probing
> mechanism from the kernel. But that's not how I think users expect
> "-cpu host" to work. Maybe I am wrong, I don't know. I am CCing
> Andre, who introduced the "-cpu host" feature, in case he can explain
> what's the expected semantics on the cases above.
> 
>> 
>>> 
>>> And I still don't know the answer to:
>>> 
>>>>> - How to precisely define the groups (A) and (B)?
>>>>>   - "requires additional code only if migration is required"
>>>>>     qualifies as (B) or (A)?
>>> 
>>> 
>>> Re: documentation, isn't the following paragraph (already present
>>> on api.txt) sufficient? 
>>> 
>>> "The entries returned are the host cpuid as returned by the cpuid
>>> instruction, with unknown or unsupported features masked out.  Some
>>> features (for example, x2apic), may not be present in the host cpu,
>>> but are exposed by kvm if it can emulate them efficiently."
>> 
>> That suggests such features are always emulated - which is not true.
>> They are either emulated, or nothing _prevents_ their emulation by
>> user space.
> 
> Well... it's a bit more complicated than that: the current semantics
> are a bit more than "doesn't prevent", as in theory every single
> feature can be emulated by userspace, without any help from the
> kernel. So, if "doesn't prevent" were the only criteria, the kernel
> would set every single feature bit on GET_SUPPORTED_CPUID, making it
> not very useful. 
> 
> At least in the case of x2apic, the kernel is using
> GET_SUPPORTED_CPUID to expose a _capability_ too: when x2apic is
> present on GET_SUPPORTED_CPUID, userspace knows that in addition to
> "not preventing" the feature from being enabled, the kernel is now
> able to emulate x2apic (if proper setup is made by userspace). A
> kernel that can't emulate x2apic (even if userspace was allowed to
> emulate it completely in userspace) would never have x2apic enabled on
> GET_SUPPORTED_CPUID.
> 
> Like I said previously, in the end GET_SUPPORTED_CPUID is just a
> capability querying mechanism like KVM_CHECK_EXTENSION (where each
> extension have a specific kernel-capability meaning), but with the
> nice feature of being automatically mapped to CPUID bits (with no
> need for additional KVM_CAP_* => CPUID translation in userspace).
Eduardo Habkost - June 14, 2012, 7:12 p.m.
On Thu, Jun 14, 2012 at 07:02:03PM +0000, Liu, Jinsong wrote:
> Eduardo, Jan
> 
> I will update tsc deadline timer patch (at qemu-kvm side) recently.
> Have you made a final agreement of the issue 'KVM_CAP_TSC_DEADLINE_TIMER' vs. 'GET_SUPPORTED_CPUID'?

I don't think there's a final agreement, but I was convinced later that
it's probably better to _not_ have TSC-deadline on GET_SUPPORTED_CPUID,
at least not by default.

Even if this is changed in the future, it's a good idea to make qemu
support the KVM_CAP_TSC_DEADLINE_TIMER method if running on older
kernels, anyway.


> Eduardo Habkost wrote:
> > (CCing Andre Przywara, in case he can help to clarify what's the
> > expected meaning of "-cpu host")
> > 
> > On Tue, Apr 24, 2012 at 06:06:55PM +0200, Jan Kiszka wrote:
> >> On 2012-04-23 22:02, Eduardo Habkost wrote:
> >>> On Mon, Apr 23, 2012 at 06:31:25PM +0200, Jan Kiszka wrote:
> >>>> However, that was how I interpreted this GET_SUPPORTED_CPUID. In
> >>>> fact, 
> >>>> it is used as "kernel or hardware does not _prevent_" already. And
> >>>> in that sense, it's ok to enable even features that are not in
> >>>> kernel/hardware hands. We should point out this fact in the
> >>>> documentation. 
> >>> 
> >>> I see GET_SUPPORTED_CPUID as just a "what userspace can enable
> >>> because the kernel and the hardware support it (= don't prevent
> >>> it), as long as userspace has the required support" (meaning A+B).
> >>> It's a bit like KVM_CHECK_EXTENSION, but with the nice feature that
> >>> that the capabilities map directly to CPUID bits.
> >>> 
> >>> So, it's not clear to me: now you are OK with adding TSC_DEADLINE
> >>> to GET_SUPPORTED_CPUID? 
> >>> 
> >>> But we still have the issue of "-cpu host" not knowing what can be
> >>> safely enabled (without userspace feature-specific setup code), or
> >>> not. Do you have any suggestion for that? Avi, do you have any
> >>> suggestion? 
> >> 
> >> First of all, I bet this was already broken with the introduction of
> >> x2apic. So TSC deadline won't make it worse. I guess we need to
> >> address this in userspace, first by masking those features out,
> >> later by actually emulating them.
> > 
> > I am not sure I understand what you are proposing. Let me explain the
> > use case I am thinking about:
> > 
> > - Feature FOO is of type (A) (e.g. just a new instruction set that
> >   doesn't require additional userspace support)
> > - User has a Qemu vesion that doesn't know anything about feature FOO
> > - User gets a new CPU that supports feature FOO
> > - User gets a new kernel that supports feature FOO (i.e. has FOO in
> >   GET_SUPPORTED_CPUID)
> > - User does _not_ upgrade Qemu.
> > - User expects to get feature FOO enabled if using "-cpu host",
> >   without upgrading Qemu.
> > 
> > The problem here is: to support the above use-case, userspace need a
> > probing mechanism that can differentiate _new_ (previously unknown)
> > features that are in group (A) (safe to blindly enable) from features
> > that are in group (B) (that can't be enabled without an userspace
> > upgrade).
> > 
> > In short, it becomes a problem if we consider the following case:
> > 
> > - Feature BAR is of type (B) (it can't be enabled without extra
> >   userspace support)
> > - User has a Qemu version that doesn't know anything about feature BAR
> > - User gets a new CPU that supports feature BAR
> > - User gets a new kernel that supports feature BAR (i.e. has BAR in
> >   GET_SUPPORTED_CPUID)
> > - User does _not_ upgrade Qemu.
> > - User simply shouldn't get feature BAR enabled, even if using "-cpu
> >   host", otherwise Qemu would break.
> > 
> > If userspace always limited itself to features it knows about, it
> > would be really easy to implement the feature without any new probing
> > mechanism from the kernel. But that's not how I think users expect
> > "-cpu host" to work. Maybe I am wrong, I don't know. I am CCing
> > Andre, who introduced the "-cpu host" feature, in case he can explain
> > what's the expected semantics on the cases above.
> > 
> >> 
> >>> 
> >>> And I still don't know the answer to:
> >>> 
> >>>>> - How to precisely define the groups (A) and (B)?
> >>>>>   - "requires additional code only if migration is required"
> >>>>>     qualifies as (B) or (A)?
> >>> 
> >>> 
> >>> Re: documentation, isn't the following paragraph (already present
> >>> on api.txt) sufficient? 
> >>> 
> >>> "The entries returned are the host cpuid as returned by the cpuid
> >>> instruction, with unknown or unsupported features masked out.  Some
> >>> features (for example, x2apic), may not be present in the host cpu,
> >>> but are exposed by kvm if it can emulate them efficiently."
> >> 
> >> That suggests such features are always emulated - which is not true.
> >> They are either emulated, or nothing _prevents_ their emulation by
> >> user space.
> > 
> > Well... it's a bit more complicated than that: the current semantics
> > are a bit more than "doesn't prevent", as in theory every single
> > feature can be emulated by userspace, without any help from the
> > kernel. So, if "doesn't prevent" were the only criteria, the kernel
> > would set every single feature bit on GET_SUPPORTED_CPUID, making it
> > not very useful. 
> > 
> > At least in the case of x2apic, the kernel is using
> > GET_SUPPORTED_CPUID to expose a _capability_ too: when x2apic is
> > present on GET_SUPPORTED_CPUID, userspace knows that in addition to
> > "not preventing" the feature from being enabled, the kernel is now
> > able to emulate x2apic (if proper setup is made by userspace). A
> > kernel that can't emulate x2apic (even if userspace was allowed to
> > emulate it completely in userspace) would never have x2apic enabled on
> > GET_SUPPORTED_CPUID.
> > 
> > Like I said previously, in the end GET_SUPPORTED_CPUID is just a
> > capability querying mechanism like KVM_CHECK_EXTENSION (where each
> > extension have a specific kernel-capability meaning), but with the
> > nice feature of being automatically mapped to CPUID bits (with no
> > need for additional KVM_CAP_* => CPUID translation in userspace).
>
Liu, Jinsong - June 14, 2012, 7:18 p.m.
Eduardo Habkost wrote:
> On Thu, Jun 14, 2012 at 07:02:03PM +0000, Liu, Jinsong wrote:
>> Eduardo, Jan
>> 
>> I will update tsc deadline timer patch (at qemu-kvm side) recently.
>> Have you made a final agreement of the issue
>> 'KVM_CAP_TSC_DEADLINE_TIMER' vs. 'GET_SUPPORTED_CPUID'? 
> 
> I don't think there's a final agreement, but I was convinced later
> that it's probably better to _not_ have TSC-deadline on
> GET_SUPPORTED_CPUID, at least not by default.
> 
> Even if this is changed in the future, it's a good idea to make qemu
> support the KVM_CAP_TSC_DEADLINE_TIMER method if running on older
> kernels, anyway.

OK, so I will coding based on current KVM_CAP_TSC_DEADLINE_TIMER method.

Thanks for clarifying!


> 
> 
>> Eduardo Habkost wrote:
>>> (CCing Andre Przywara, in case he can help to clarify what's the
>>> expected meaning of "-cpu host")
>>> 
>>> On Tue, Apr 24, 2012 at 06:06:55PM +0200, Jan Kiszka wrote:
>>>> On 2012-04-23 22:02, Eduardo Habkost wrote:
>>>>> On Mon, Apr 23, 2012 at 06:31:25PM +0200, Jan Kiszka wrote:
>>>>>> However, that was how I interpreted this GET_SUPPORTED_CPUID. In
>>>>>> fact, it is used as "kernel or hardware does not _prevent_"
>>>>>> already. And in that sense, it's ok to enable even features that
>>>>>> are not in kernel/hardware hands. We should point out this fact
>>>>>> in the documentation.
>>>>> 
>>>>> I see GET_SUPPORTED_CPUID as just a "what userspace can enable
>>>>> because the kernel and the hardware support it (= don't prevent
>>>>> it), as long as userspace has the required support" (meaning A+B).
>>>>> It's a bit like KVM_CHECK_EXTENSION, but with the nice feature
>>>>> that that the capabilities map directly to CPUID bits.
>>>>> 
>>>>> So, it's not clear to me: now you are OK with adding TSC_DEADLINE
>>>>> to GET_SUPPORTED_CPUID? 
>>>>> 
>>>>> But we still have the issue of "-cpu host" not knowing what can be
>>>>> safely enabled (without userspace feature-specific setup code), or
>>>>> not. Do you have any suggestion for that? Avi, do you have any
>>>>> suggestion?
>>>> 
>>>> First of all, I bet this was already broken with the introduction
>>>> of x2apic. So TSC deadline won't make it worse. I guess we need to
>>>> address this in userspace, first by masking those features out,
>>>> later by actually emulating them.
>>> 
>>> I am not sure I understand what you are proposing. Let me explain
>>> the use case I am thinking about: 
>>> 
>>> - Feature FOO is of type (A) (e.g. just a new instruction set that
>>>   doesn't require additional userspace support)
>>> - User has a Qemu vesion that doesn't know anything about feature
>>> FOO 
>>> - User gets a new CPU that supports feature FOO
>>> - User gets a new kernel that supports feature FOO (i.e. has FOO in
>>> GET_SUPPORTED_CPUID) 
>>> - User does _not_ upgrade Qemu.
>>> - User expects to get feature FOO enabled if using "-cpu host",  
>>> without upgrading Qemu. 
>>> 
>>> The problem here is: to support the above use-case, userspace need a
>>> probing mechanism that can differentiate _new_ (previously unknown)
>>> features that are in group (A) (safe to blindly enable) from
>>> features that are in group (B) (that can't be enabled without an
>>> userspace upgrade). 
>>> 
>>> In short, it becomes a problem if we consider the following case:
>>> 
>>> - Feature BAR is of type (B) (it can't be enabled without extra  
>>> userspace support) 
>>> - User has a Qemu version that doesn't know anything about feature
>>> BAR 
>>> - User gets a new CPU that supports feature BAR
>>> - User gets a new kernel that supports feature BAR (i.e. has BAR in
>>> GET_SUPPORTED_CPUID) 
>>> - User does _not_ upgrade Qemu.
>>> - User simply shouldn't get feature BAR enabled, even if using "-cpu
>>>   host", otherwise Qemu would break.
>>> 
>>> If userspace always limited itself to features it knows about, it
>>> would be really easy to implement the feature without any new
>>> probing mechanism from the kernel. But that's not how I think users
>>> expect "-cpu host" to work. Maybe I am wrong, I don't know. I am
>>> CCing Andre, who introduced the "-cpu host" feature, in case he can
>>> explain what's the expected semantics on the cases above.
>>> 
>>>> 
>>>>> 
>>>>> And I still don't know the answer to:
>>>>> 
>>>>>>> - How to precisely define the groups (A) and (B)?
>>>>>>>   - "requires additional code only if migration is required"
>>>>>>>     qualifies as (B) or (A)?
>>>>> 
>>>>> 
>>>>> Re: documentation, isn't the following paragraph (already present
>>>>> on api.txt) sufficient? 
>>>>> 
>>>>> "The entries returned are the host cpuid as returned by the cpuid
>>>>> instruction, with unknown or unsupported features masked out. 
>>>>> Some features (for example, x2apic), may not be present in the
>>>>> host cpu, but are exposed by kvm if it can emulate them
>>>>> efficiently." 
>>>> 
>>>> That suggests such features are always emulated - which is not
>>>> true. They are either emulated, or nothing _prevents_ their
>>>> emulation by user space.
>>> 
>>> Well... it's a bit more complicated than that: the current semantics
>>> are a bit more than "doesn't prevent", as in theory every single
>>> feature can be emulated by userspace, without any help from the
>>> kernel. So, if "doesn't prevent" were the only criteria, the kernel
>>> would set every single feature bit on GET_SUPPORTED_CPUID, making
>>> it not very useful. 
>>> 
>>> At least in the case of x2apic, the kernel is using
>>> GET_SUPPORTED_CPUID to expose a _capability_ too: when x2apic is
>>> present on GET_SUPPORTED_CPUID, userspace knows that in addition to
>>> "not preventing" the feature from being enabled, the kernel is now
>>> able to emulate x2apic (if proper setup is made by userspace). A
>>> kernel that can't emulate x2apic (even if userspace was allowed to
>>> emulate it completely in userspace) would never have x2apic enabled
>>> on GET_SUPPORTED_CPUID. 
>>> 
>>> Like I said previously, in the end GET_SUPPORTED_CPUID is just a
>>> capability querying mechanism like KVM_CHECK_EXTENSION (where each
>>> extension have a specific kernel-capability meaning), but with the
>>> nice feature of being automatically mapped to CPUID bits (with no
>>> need for additional KVM_CAP_* => CPUID translation in userspace).

Patch

====================
From 5b7d5f459b621686e78e437010ce34748bcb9e8e Mon Sep 17 00:00:00 2001
From: Liu, Jinsong <jinsong.liu@intel.com>
Date: Wed, 29 Feb 2012 01:53:15 +0800
Subject: [PATCH] Expose tsc deadline timer feature to guest

It exposes tsc deadline timer feature to guest if in-kernel irqchip is used
and kvm has emulated tsc deadline timer.
It also authorizes user to control the feature exposing via a cpu feature flag.

Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com>
---
 target-i386/cpu.h   |    1 +
 target-i386/cpuid.c |    2 +-
 target-i386/kvm.c   |    4 ++++
 3 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index d92be5d..3409afe 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -399,6 +399,7 @@ 
 #define CPUID_EXT_X2APIC   (1 << 21)
 #define CPUID_EXT_MOVBE    (1 << 22)
 #define CPUID_EXT_POPCNT   (1 << 23)
+#define CPUID_EXT_TSC_DEADLINE_TIMER (1 << 24)
 #define CPUID_EXT_XSAVE    (1 << 26)
 #define CPUID_EXT_OSXSAVE  (1 << 27)
 #define CPUID_EXT_HYPERVISOR  (1 << 31)
diff --git a/target-i386/cpuid.c b/target-i386/cpuid.c
index b9bfeaf..ac4b79c 100644
--- a/target-i386/cpuid.c
+++ b/target-i386/cpuid.c
@@ -50,7 +50,7 @@  static const char *ext_feature_name[] = {
     "fma", "cx16", "xtpr", "pdcm",
     NULL, NULL, "dca", "sse4.1|sse4_1",
     "sse4.2|sse4_2", "x2apic", "movbe", "popcnt",
-    NULL, "aes", "xsave", "osxsave",
+    "tsc_deadline", "aes", "xsave", "osxsave",
     "avx", NULL, NULL, "hypervisor",
 };
 static const char *ext2_feature_name[] = {
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 7079e87..2639699 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -370,6 +370,10 @@  int kvm_arch_init_vcpu(CPUState *env)
     i = env->cpuid_ext_features & CPUID_EXT_HYPERVISOR;
     env->cpuid_ext_features &= kvm_arch_get_supported_cpuid(s, 1, 0, R_ECX);
     env->cpuid_ext_features |= i;
+    if (!kvm_irqchip_in_kernel() ||
+        !kvm_check_extension(s, KVM_CAP_TSC_DEADLINE_TIMER)) {
+        env->cpuid_ext_features &= ~CPUID_EXT_TSC_DEADLINE_TIMER;
+    }
 
     env->cpuid_ext2_features &= kvm_arch_get_supported_cpuid(s, 0x80000001,
                                                              0, R_EDX);