diff mbox series

[V2,2/3] target/i386: call cpu_exec_realizefn before x86_cpu_filter_features

Message ID 20240524200017.150339-3-zide.chen@intel.com
State New
Headers show
Series improve -overcommit cpu-pm=on|off | expand

Commit Message

Chen, Zide May 24, 2024, 8 p.m. UTC
cpu_exec_realizefn which calls the accel-specific realizefn may expand
features.  e.g., some accel-specific options may require extra features
to be enabled, and it's appropriate to expand these features in accel-
specific realizefn.

One such example is the cpu-pm option, which may add CPUID_EXT_MONITOR.

Thus, call cpu_exec_realizefn before x86_cpu_filter_features to ensure
that it won't expose features not supported by the host.

Fixes: 662175b91ff2 ("i386: reorder call to cpu_exec_realizefn")
Suggested-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Zide Chen <zide.chen@intel.com>
---
 target/i386/cpu.c         | 24 ++++++++++++------------
 target/i386/kvm/kvm-cpu.c |  1 -
 2 files changed, 12 insertions(+), 13 deletions(-)

Comments

Zhao Liu May 31, 2024, 6:30 a.m. UTC | #1
Hi Zide,

On Fri, May 24, 2024 at 01:00:16PM -0700, Zide Chen wrote:
> Date: Fri, 24 May 2024 13:00:16 -0700
> From: Zide Chen <zide.chen@intel.com>
> Subject: [PATCH V2 2/3] target/i386: call cpu_exec_realizefn before
>  x86_cpu_filter_features
> X-Mailer: git-send-email 2.34.1
> 
> cpu_exec_realizefn which calls the accel-specific realizefn may expand
> features.  e.g., some accel-specific options may require extra features
> to be enabled, and it's appropriate to expand these features in accel-
> specific realizefn.
> 
> One such example is the cpu-pm option, which may add CPUID_EXT_MONITOR.
> 
> Thus, call cpu_exec_realizefn before x86_cpu_filter_features to ensure
> that it won't expose features not supported by the host.
> 
> Fixes: 662175b91ff2 ("i386: reorder call to cpu_exec_realizefn")
> Suggested-by: Xiaoyao Li <xiaoyao.li@intel.com>
> Signed-off-by: Zide Chen <zide.chen@intel.com>
> ---
>  target/i386/cpu.c         | 24 ++++++++++++------------
>  target/i386/kvm/kvm-cpu.c |  1 -
>  2 files changed, 12 insertions(+), 13 deletions(-)
> 
> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> index bc2dceb647fa..a1c1c785bd2f 100644
> --- a/target/i386/cpu.c
> +++ b/target/i386/cpu.c
> @@ -7604,6 +7604,18 @@ static void x86_cpu_realizefn(DeviceState *dev, Error **errp)
>          }
>      }
>  
> +    /*
> +     * note: the call to the framework needs to happen after feature expansion,
> +     * but before the checks/modifications to ucode_rev, mwait, phys_bits.
> +     * These may be set by the accel-specific code,
> +     * and the results are subsequently checked / assumed in this function.
> +     */
> +    cpu_exec_realizefn(cs, &local_err);
> +    if (local_err != NULL) {
> +        error_propagate(errp, local_err);
> +        return;
> +    }
> +
>      x86_cpu_filter_features(cpu, cpu->check_cpuid || cpu->enforce_cpuid);

For your case, which sets cpu-pm=on via overcommit, then
x86_cpu_filter_features() will complain that mwait is not supported.

Such warning is not necessary, because the purpose of overcommit (from
code) is only to support mwait when possible, not to commit to support
mwait in Guest.

Additionally, I understand x86_cpu_filter_features() is primarily
intended to filter features configured by the user, and the changes of
CPUID after x86_cpu_filter_features() should by default be regarded like
"QEMU knows what it is doing".

I feel adding a check for the CPUID mwait bit in host_cpu_realizefn()
is enough, after all, this bit should be present if host supports mwait
and enable_cpu_pm (in kvm_arch_get_supported_cpuid()).

Thanks,
Zhao
Chen, Zide May 31, 2024, 5:13 p.m. UTC | #2
On 5/30/2024 11:30 PM, Zhao Liu wrote:
> Hi Zide,
> 
> On Fri, May 24, 2024 at 01:00:16PM -0700, Zide Chen wrote:
>> Date: Fri, 24 May 2024 13:00:16 -0700
>> From: Zide Chen <zide.chen@intel.com>
>> Subject: [PATCH V2 2/3] target/i386: call cpu_exec_realizefn before
>>  x86_cpu_filter_features
>> X-Mailer: git-send-email 2.34.1
>>
>> cpu_exec_realizefn which calls the accel-specific realizefn may expand
>> features.  e.g., some accel-specific options may require extra features
>> to be enabled, and it's appropriate to expand these features in accel-
>> specific realizefn.
>>
>> One such example is the cpu-pm option, which may add CPUID_EXT_MONITOR.
>>
>> Thus, call cpu_exec_realizefn before x86_cpu_filter_features to ensure
>> that it won't expose features not supported by the host.
>>
>> Fixes: 662175b91ff2 ("i386: reorder call to cpu_exec_realizefn")
>> Suggested-by: Xiaoyao Li <xiaoyao.li@intel.com>
>> Signed-off-by: Zide Chen <zide.chen@intel.com>
>> ---
>>  target/i386/cpu.c         | 24 ++++++++++++------------
>>  target/i386/kvm/kvm-cpu.c |  1 -
>>  2 files changed, 12 insertions(+), 13 deletions(-)
>>
>> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
>> index bc2dceb647fa..a1c1c785bd2f 100644
>> --- a/target/i386/cpu.c
>> +++ b/target/i386/cpu.c
>> @@ -7604,6 +7604,18 @@ static void x86_cpu_realizefn(DeviceState *dev, Error **errp)
>>          }
>>      }
>>  
>> +    /*
>> +     * note: the call to the framework needs to happen after feature expansion,
>> +     * but before the checks/modifications to ucode_rev, mwait, phys_bits.
>> +     * These may be set by the accel-specific code,
>> +     * and the results are subsequently checked / assumed in this function.
>> +     */
>> +    cpu_exec_realizefn(cs, &local_err);
>> +    if (local_err != NULL) {
>> +        error_propagate(errp, local_err);
>> +        return;
>> +    }
>> +
>>      x86_cpu_filter_features(cpu, cpu->check_cpuid || cpu->enforce_cpuid);
> 
> For your case, which sets cpu-pm=on via overcommit, then
> x86_cpu_filter_features() will complain that mwait is not supported.
> 
> Such warning is not necessary, because the purpose of overcommit (from
> code) is only to support mwait when possible, not to commit to support
> mwait in Guest.
> 
> Additionally, I understand x86_cpu_filter_features() is primarily
> intended to filter features configured by the user, 

Yes, that's why this patches intends to let x86_cpu_filter_features()
filter out the MWAIT bit which is set from the overcommit option.

> and the changes of
> CPUID after x86_cpu_filter_features() should by default be regarded like
> "QEMU knows what it is doing".

Sure, we can add feature bits after x86_cpu_filter_features(), but I
think moving cpu_exec_realizefn() before x86_cpu_filter_features() is
more generic, and actually this is what QEMU did before commit 662175b91ff2.

- Less redundant code. Specifically, no need to call
x86_cpu_get_supported_feature_word() again.
- Potentially there could be other features could be added from the
accel-specific realizefn, kvm_cpu_realizefn() for example.  And these
features need to be checked against the host availability.

> 
> I feel adding a check for the CPUID mwait bit in host_cpu_realizefn()
> is enough, after all, this bit should be present if host supports mwait
> and enable_cpu_pm (in kvm_arch_get_supported_cpuid()).

Besides the above reasons, it seems to me expanding env->features in
host-cpu.c is confusing.

> Thanks,
> Zhao
>
Zhao Liu June 1, 2024, 3:26 p.m. UTC | #3
On Fri, May 31, 2024 at 10:13:47AM -0700, Chen, Zide wrote:
> Date: Fri, 31 May 2024 10:13:47 -0700
> From: "Chen, Zide" <zide.chen@intel.com>
> Subject: Re: [PATCH V2 2/3] target/i386: call cpu_exec_realizefn before
>  x86_cpu_filter_features
> 
> On 5/30/2024 11:30 PM, Zhao Liu wrote:
> > Hi Zide,
> > 
> > On Fri, May 24, 2024 at 01:00:16PM -0700, Zide Chen wrote:
> >> Date: Fri, 24 May 2024 13:00:16 -0700
> >> From: Zide Chen <zide.chen@intel.com>
> >> Subject: [PATCH V2 2/3] target/i386: call cpu_exec_realizefn before
> >>  x86_cpu_filter_features
> >> X-Mailer: git-send-email 2.34.1
> >>
> >> cpu_exec_realizefn which calls the accel-specific realizefn may expand
> >> features.  e.g., some accel-specific options may require extra features
> >> to be enabled, and it's appropriate to expand these features in accel-
> >> specific realizefn.
> >>
> >> One such example is the cpu-pm option, which may add CPUID_EXT_MONITOR.
> >>
> >> Thus, call cpu_exec_realizefn before x86_cpu_filter_features to ensure
> >> that it won't expose features not supported by the host.
> >>
> >> Fixes: 662175b91ff2 ("i386: reorder call to cpu_exec_realizefn")
> >> Suggested-by: Xiaoyao Li <xiaoyao.li@intel.com>
> >> Signed-off-by: Zide Chen <zide.chen@intel.com>
> >> ---
> >>  target/i386/cpu.c         | 24 ++++++++++++------------
> >>  target/i386/kvm/kvm-cpu.c |  1 -
> >>  2 files changed, 12 insertions(+), 13 deletions(-)
> >>
> >> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> >> index bc2dceb647fa..a1c1c785bd2f 100644
> >> --- a/target/i386/cpu.c
> >> +++ b/target/i386/cpu.c
> >> @@ -7604,6 +7604,18 @@ static void x86_cpu_realizefn(DeviceState *dev, Error **errp)
> >>          }
> >>      }
> >>  
> >> +    /*
> >> +     * note: the call to the framework needs to happen after feature expansion,
> >> +     * but before the checks/modifications to ucode_rev, mwait, phys_bits.
> >> +     * These may be set by the accel-specific code,
> >> +     * and the results are subsequently checked / assumed in this function.
> >> +     */
> >> +    cpu_exec_realizefn(cs, &local_err);
> >> +    if (local_err != NULL) {
> >> +        error_propagate(errp, local_err);
> >> +        return;
> >> +    }
> >> +
> >>      x86_cpu_filter_features(cpu, cpu->check_cpuid || cpu->enforce_cpuid);
> > 
> > For your case, which sets cpu-pm=on via overcommit, then
> > x86_cpu_filter_features() will complain that mwait is not supported.
> > 
> > Such warning is not necessary, because the purpose of overcommit (from
> > code) is only to support mwait when possible, not to commit to support
> > mwait in Guest.
> > 
> > Additionally, I understand x86_cpu_filter_features() is primarily
> > intended to filter features configured by the user, 
> 
> Yes, that's why this patches intends to let x86_cpu_filter_features()
> filter out the MWAIT bit which is set from the overcommit option.

HMM, but in fact x86_cpu_filter_features() has already checked the MWAIT
bit set by "-overcommit cpu-pm=on". ;-)

(Pls correct me if I'm wrong) Revisiting what cpu-pm did to MWAIT:
* Firstly, it set MWAIT bit in x86_cpu_expand_features():
  x86_cpu_expand_features()
     -> x86_cpu_get_supported_feature_word()
        -> kvm_arch_get_supported_cpuid()
 This MWAIT is based on Host's MWAIT capability. This MWAIT enablement
 is fine for next x86_cpu_filter_features() and x86_cpu_filter_features()
 is working correctly here!

* Then, MWAIT was secondly set in host_cpu_enable_cpu_pm() regardless
  neither Host's support or previous MWAIT enablement result. This is
  the root cause of your issue.

Therefore, we should make cpu-pm honor his first MWAIT enablement result
instead of repeatly and unconditionally setting the MWAIT bit again in
host_cpu_enable_cpu_pm().

Additionally, I think the code in x86_cpu_realizefn():
  cpu->mwait.ecx |= CPUID_MWAIT_EMX | CPUID_MWAIT_IBE;
has the similar issue because it also should check MWAIT feature bit.

Further, it may be possible to remove cpu->mwait: just check the MWAIT
bit in leaf 5 of cpu_x86_cpuid(), and if MWAIT is present, use host's
mwait info plus CPUID_MWAIT_EMX | CPUID_MWAIT_IBE.

> > and the changes of
> > CPUID after x86_cpu_filter_features() should by default be regarded like
> > "QEMU knows what it is doing".
> 
> Sure, we can add feature bits after x86_cpu_filter_features(), but I
> think moving cpu_exec_realizefn() before x86_cpu_filter_features() is
> more generic, and actually this is what QEMU did before commit 662175b91ff2.
> 
> - Less redundant code. Specifically, no need to call
> x86_cpu_get_supported_feature_word() again.
> - Potentially there could be other features could be added from the
> accel-specific realizefn, kvm_cpu_realizefn() for example.  And these
> features need to be checked against the host availability.

Mainly I don't think this reorder is a direct fix for the problem (I
just analyse it above), also in your case x86_cpu_filter_features() will
print a WARNING when QEMU boots, which I don't think is cpu-pm's intention.
Igor Mammedov June 3, 2024, 9:30 a.m. UTC | #4
On Sat, 1 Jun 2024 23:26:55 +0800
Zhao Liu <zhao1.liu@intel.com> wrote:

> On Fri, May 31, 2024 at 10:13:47AM -0700, Chen, Zide wrote:
> > Date: Fri, 31 May 2024 10:13:47 -0700
> > From: "Chen, Zide" <zide.chen@intel.com>
> > Subject: Re: [PATCH V2 2/3] target/i386: call cpu_exec_realizefn before
> >  x86_cpu_filter_features
> > 
> > On 5/30/2024 11:30 PM, Zhao Liu wrote:  
> > > Hi Zide,
> > > 
> > > On Fri, May 24, 2024 at 01:00:16PM -0700, Zide Chen wrote:  
> > >> Date: Fri, 24 May 2024 13:00:16 -0700
> > >> From: Zide Chen <zide.chen@intel.com>
> > >> Subject: [PATCH V2 2/3] target/i386: call cpu_exec_realizefn before
> > >>  x86_cpu_filter_features
> > >> X-Mailer: git-send-email 2.34.1
> > >>
> > >> cpu_exec_realizefn which calls the accel-specific realizefn may expand
> > >> features.  e.g., some accel-specific options may require extra features
> > >> to be enabled, and it's appropriate to expand these features in accel-
> > >> specific realizefn.
> > >>
> > >> One such example is the cpu-pm option, which may add CPUID_EXT_MONITOR.
> > >>
> > >> Thus, call cpu_exec_realizefn before x86_cpu_filter_features to ensure
> > >> that it won't expose features not supported by the host.
> > >>
> > >> Fixes: 662175b91ff2 ("i386: reorder call to cpu_exec_realizefn")
> > >> Suggested-by: Xiaoyao Li <xiaoyao.li@intel.com>
> > >> Signed-off-by: Zide Chen <zide.chen@intel.com>
> > >> ---
> > >>  target/i386/cpu.c         | 24 ++++++++++++------------
> > >>  target/i386/kvm/kvm-cpu.c |  1 -
> > >>  2 files changed, 12 insertions(+), 13 deletions(-)
> > >>
> > >> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> > >> index bc2dceb647fa..a1c1c785bd2f 100644
> > >> --- a/target/i386/cpu.c
> > >> +++ b/target/i386/cpu.c
> > >> @@ -7604,6 +7604,18 @@ static void x86_cpu_realizefn(DeviceState *dev, Error **errp)
> > >>          }
> > >>      }
> > >>  
> > >> +    /*
> > >> +     * note: the call to the framework needs to happen after feature expansion,
> > >> +     * but before the checks/modifications to ucode_rev, mwait, phys_bits.
> > >> +     * These may be set by the accel-specific code,
> > >> +     * and the results are subsequently checked / assumed in this function.
> > >> +     */
> > >> +    cpu_exec_realizefn(cs, &local_err);
> > >> +    if (local_err != NULL) {
> > >> +        error_propagate(errp, local_err);
> > >> +        return;
> > >> +    }
> > >> +
> > >>      x86_cpu_filter_features(cpu, cpu->check_cpuid || cpu->enforce_cpuid);  
> > > 
> > > For your case, which sets cpu-pm=on via overcommit, then
> > > x86_cpu_filter_features() will complain that mwait is not supported.
> > > 
> > > Such warning is not necessary, because the purpose of overcommit (from
> > > code) is only to support mwait when possible, not to commit to support
> > > mwait in Guest.
> > > 
> > > Additionally, I understand x86_cpu_filter_features() is primarily
> > > intended to filter features configured by the user,   
> > 
> > Yes, that's why this patches intends to let x86_cpu_filter_features()
> > filter out the MWAIT bit which is set from the overcommit option.  
> 
> HMM, but in fact x86_cpu_filter_features() has already checked the MWAIT
> bit set by "-overcommit cpu-pm=on". ;-)
> 
> (Pls correct me if I'm wrong) Revisiting what cpu-pm did to MWAIT:
> * Firstly, it set MWAIT bit in x86_cpu_expand_features():
>   x86_cpu_expand_features()
>      -> x86_cpu_get_supported_feature_word()
>         -> kvm_arch_get_supported_cpuid()  
>  This MWAIT is based on Host's MWAIT capability. This MWAIT enablement
>  is fine for next x86_cpu_filter_features() and x86_cpu_filter_features()
>  is working correctly here!
> 
> * Then, MWAIT was secondly set in host_cpu_enable_cpu_pm() regardless
>   neither Host's support or previous MWAIT enablement result. This is
>   the root cause of your issue.
> 
> Therefore, we should make cpu-pm honor his first MWAIT enablement result
> instead of repeatly and unconditionally setting the MWAIT bit again in
> host_cpu_enable_cpu_pm().
> 
> Additionally, I think the code in x86_cpu_realizefn():
>   cpu->mwait.ecx |= CPUID_MWAIT_EMX | CPUID_MWAIT_IBE;
> has the similar issue because it also should check MWAIT feature bit.
> 
> Further, it may be possible to remove cpu->mwait: just check the MWAIT
> bit in leaf 5 of cpu_x86_cpuid(), and if MWAIT is present, use host's
> mwait info plus CPUID_MWAIT_EMX | CPUID_MWAIT_IBE.

Agreed with above analysis,
we shouldn't have host_cpu_enable_cpu_pm() as kvm_arch_get_supported_cpuid
gets us MWAIT already.

filling in cpu->mwait.ecx is benign mistake which likely doesn't
trigger anything if CPUID_EXT_MONITOR is not present.
But for clarity it's better to add an explicit check there as well.

> 
> > > and the changes of
> > > CPUID after x86_cpu_filter_features() should by default be regarded like
> > > "QEMU knows what it is doing".  
> > 
> > Sure, we can add feature bits after x86_cpu_filter_features(), but I
> > think moving cpu_exec_realizefn() before x86_cpu_filter_features() is
> > more generic, and actually this is what QEMU did before commit 662175b91ff2.
> > 
> > - Less redundant code. Specifically, no need to call
> > x86_cpu_get_supported_feature_word() again.
> > - Potentially there could be other features could be added from the
> > accel-specific realizefn, kvm_cpu_realizefn() for example.  And these
> > features need to be checked against the host availability.  
> 
> Mainly I don't think this reorder is a direct fix for the problem (I
> just analyse it above), also in your case x86_cpu_filter_features() will
> print a WARNING when QEMU boots, which I don't think is cpu-pm's intention.

There is no problem with warning, I'd even say it's a good thing.
But you are right reordering just masks the issue.

As for expected behavior, if user asked for "-overcommit cpu-pm=on"
there are 2 options:
   * it's working as expected (mwait exiting is enabled successfully with CPUID MONITOR bit set)
   * QEMU shall fail to start.
Chen, Zide June 3, 2024, 9:29 p.m. UTC | #5
On 6/3/2024 2:30 AM, Igor Mammedov wrote:
> On Sat, 1 Jun 2024 23:26:55 +0800
> Zhao Liu <zhao1.liu@intel.com> wrote:
> 
>> On Fri, May 31, 2024 at 10:13:47AM -0700, Chen, Zide wrote:
>>> Date: Fri, 31 May 2024 10:13:47 -0700
>>> From: "Chen, Zide" <zide.chen@intel.com>
>>> Subject: Re: [PATCH V2 2/3] target/i386: call cpu_exec_realizefn before
>>>  x86_cpu_filter_features
>>>
>>> On 5/30/2024 11:30 PM, Zhao Liu wrote:  
>>>> Hi Zide,
>>>>
>>>> On Fri, May 24, 2024 at 01:00:16PM -0700, Zide Chen wrote:  
>>>>> Date: Fri, 24 May 2024 13:00:16 -0700
>>>>> From: Zide Chen <zide.chen@intel.com>
>>>>> Subject: [PATCH V2 2/3] target/i386: call cpu_exec_realizefn before
>>>>>  x86_cpu_filter_features
>>>>> X-Mailer: git-send-email 2.34.1
>>>>>
>>>>> cpu_exec_realizefn which calls the accel-specific realizefn may expand
>>>>> features.  e.g., some accel-specific options may require extra features
>>>>> to be enabled, and it's appropriate to expand these features in accel-
>>>>> specific realizefn.
>>>>>
>>>>> One such example is the cpu-pm option, which may add CPUID_EXT_MONITOR.
>>>>>
>>>>> Thus, call cpu_exec_realizefn before x86_cpu_filter_features to ensure
>>>>> that it won't expose features not supported by the host.
>>>>>
>>>>> Fixes: 662175b91ff2 ("i386: reorder call to cpu_exec_realizefn")
>>>>> Suggested-by: Xiaoyao Li <xiaoyao.li@intel.com>
>>>>> Signed-off-by: Zide Chen <zide.chen@intel.com>
>>>>> ---
>>>>>  target/i386/cpu.c         | 24 ++++++++++++------------
>>>>>  target/i386/kvm/kvm-cpu.c |  1 -
>>>>>  2 files changed, 12 insertions(+), 13 deletions(-)
>>>>>
>>>>> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
>>>>> index bc2dceb647fa..a1c1c785bd2f 100644
>>>>> --- a/target/i386/cpu.c
>>>>> +++ b/target/i386/cpu.c
>>>>> @@ -7604,6 +7604,18 @@ static void x86_cpu_realizefn(DeviceState *dev, Error **errp)
>>>>>          }
>>>>>      }
>>>>>  
>>>>> +    /*
>>>>> +     * note: the call to the framework needs to happen after feature expansion,
>>>>> +     * but before the checks/modifications to ucode_rev, mwait, phys_bits.
>>>>> +     * These may be set by the accel-specific code,
>>>>> +     * and the results are subsequently checked / assumed in this function.
>>>>> +     */
>>>>> +    cpu_exec_realizefn(cs, &local_err);
>>>>> +    if (local_err != NULL) {
>>>>> +        error_propagate(errp, local_err);
>>>>> +        return;
>>>>> +    }
>>>>> +
>>>>>      x86_cpu_filter_features(cpu, cpu->check_cpuid || cpu->enforce_cpuid);  
>>>>
>>>> For your case, which sets cpu-pm=on via overcommit, then
>>>> x86_cpu_filter_features() will complain that mwait is not supported.
>>>>
>>>> Such warning is not necessary, because the purpose of overcommit (from
>>>> code) is only to support mwait when possible, not to commit to support
>>>> mwait in Guest.
>>>>
>>>> Additionally, I understand x86_cpu_filter_features() is primarily
>>>> intended to filter features configured by the user,   
>>>
>>> Yes, that's why this patches intends to let x86_cpu_filter_features()
>>> filter out the MWAIT bit which is set from the overcommit option.  
>>
>> HMM, but in fact x86_cpu_filter_features() has already checked the MWAIT
>> bit set by "-overcommit cpu-pm=on". ;-)
>>
>> (Pls correct me if I'm wrong) Revisiting what cpu-pm did to MWAIT:
>> * Firstly, it set MWAIT bit in x86_cpu_expand_features():
>>   x86_cpu_expand_features()
>>      -> x86_cpu_get_supported_feature_word()
>>         -> kvm_arch_get_supported_cpuid()  
>>  This MWAIT is based on Host's MWAIT capability. This MWAIT enablement
>>  is fine for next x86_cpu_filter_features() and x86_cpu_filter_features()
>>  is working correctly here!
>>
>> * Then, MWAIT was secondly set in host_cpu_enable_cpu_pm() regardless
>>   neither Host's support or previous MWAIT enablement result. This is
>>   the root cause of your issue.
>>
>> Therefore, we should make cpu-pm honor his first MWAIT enablement result
>> instead of repeatly and unconditionally setting the MWAIT bit again in
>> host_cpu_enable_cpu_pm().
>>
>> Additionally, I think the code in x86_cpu_realizefn():
>>   cpu->mwait.ecx |= CPUID_MWAIT_EMX | CPUID_MWAIT_IBE;
>> has the similar issue because it also should check MWAIT feature bit.
>>
>> Further, it may be possible to remove cpu->mwait: just check the MWAIT
>> bit in leaf 5 of cpu_x86_cpuid(), and if MWAIT is present, use host's
>> mwait info plus CPUID_MWAIT_EMX | CPUID_MWAIT_IBE.
> 
> Agreed with above analysis,
> we shouldn't have host_cpu_enable_cpu_pm() as kvm_arch_get_supported_cpuid
> gets us MWAIT already.

Yes, I agree don't need to set CPUID_EXT_MONITOR besides
kvm_arch_get_supported_cpuid().

> 
> filling in cpu->mwait.ecx is benign mistake which likely doesn't
> trigger anything if CPUID_EXT_MONITOR is not present.
> But for clarity it's better to add an explicit check there as well.

Yes, I agree without MWAIT available and advertised, it's meaningless to
set the EMX and IBE bits. Seems to me it's cleaner to remove cpu->mwait
all together, and in cpu_x86_cpuid(), just read from host_cpuid(5) if
MWAIT is available to the guest. But I don't understand the history of
why QEMU unconditionally advertises these two bits, and don't know it it
could break some thing if these two bits are removed.

Even if we want to fix these two bits, we can do it in another separate
patch.

e737b32a36 (" Core 2 Duo specification (Alexander Graf).")
unconditionally adds "CPUID_MWAIT_EMX | CPUID_MWAIT_IBE" to CPUID.5.ECX
with further explanation.

2266d44311 ("i386/cpu: make -cpu host support monitor/mwait") adds
comment "We always wake on interrupt even if host does not have the
capability" to CPUID_MWAIT_IBE.


> 
>>
>>>> and the changes of
>>>> CPUID after x86_cpu_filter_features() should by default be regarded like
>>>> "QEMU knows what it is doing".  
>>>
>>> Sure, we can add feature bits after x86_cpu_filter_features(), but I
>>> think moving cpu_exec_realizefn() before x86_cpu_filter_features() is
>>> more generic, and actually this is what QEMU did before commit 662175b91ff2.
>>>
>>> - Less redundant code. Specifically, no need to call
>>> x86_cpu_get_supported_feature_word() again.
>>> - Potentially there could be other features could be added from the
>>> accel-specific realizefn, kvm_cpu_realizefn() for example.  And these
>>> features need to be checked against the host availability.  
>>
>> Mainly I don't think this reorder is a direct fix for the problem (I
>> just analyse it above), also in your case x86_cpu_filter_features() will
>> print a WARNING when QEMU boots, which I don't think is cpu-pm's intention.
> 
> There is no problem with warning, I'd even say it's a good thing.

I agree it's good to have the warning as well.

> But you are right reordering just masks the issue.
> 
> As for expected behavior, if user asked for "-overcommit cpu-pm=on"
> there are 2 options:
>    * it's working as expected (mwait exiting is enabled successfully with CPUID MONITOR bit set)
>    * QEMU shall fail to start.

I like the idea that QEMU refuses to launch the guest if the asked CPU
features are not available, which is more user friendly.  But the
problem is, "-overcommit cpu-pm=on" is an umbrella which intends to
enable all the following CPUIDs and KVM features if it's possible.  So,
if QEMU fails the guest in this case, then it needs to fail the WAITPKG
feature as well. Additionally, it may need to provide individual options
to enable these individual features, which I doubt could be too complicated.

KVM_X86_DISABLE_EXITS_MWAI
KVM_X86_DISABLE_EXITS_HLTKVM_X86_DISABLE_EXITS_PAUSE
KVM_X86_DISABLE_EXITS_CSTATE
CPUID.7.0:ECX.WAITPKG
CPUID.1.ECX.MWAIT
Chen, Zide June 3, 2024, 9:29 p.m. UTC | #6
On 6/1/2024 8:26 AM, Zhao Liu wrote:
> On Fri, May 31, 2024 at 10:13:47AM -0700, Chen, Zide wrote:
>> Date: Fri, 31 May 2024 10:13:47 -0700
>> From: "Chen, Zide" <zide.chen@intel.com>
>> Subject: Re: [PATCH V2 2/3] target/i386: call cpu_exec_realizefn before
>>  x86_cpu_filter_features
>>
>> On 5/30/2024 11:30 PM, Zhao Liu wrote:
>>> Hi Zide,
>>>
>>> On Fri, May 24, 2024 at 01:00:16PM -0700, Zide Chen wrote:
>>>> Date: Fri, 24 May 2024 13:00:16 -0700
>>>> From: Zide Chen <zide.chen@intel.com>
>>>> Subject: [PATCH V2 2/3] target/i386: call cpu_exec_realizefn before
>>>>  x86_cpu_filter_features
>>>> X-Mailer: git-send-email 2.34.1
>>>>
>>>> cpu_exec_realizefn which calls the accel-specific realizefn may expand
>>>> features.  e.g., some accel-specific options may require extra features
>>>> to be enabled, and it's appropriate to expand these features in accel-
>>>> specific realizefn.
>>>>
>>>> One such example is the cpu-pm option, which may add CPUID_EXT_MONITOR.
>>>>
>>>> Thus, call cpu_exec_realizefn before x86_cpu_filter_features to ensure
>>>> that it won't expose features not supported by the host.
>>>>
>>>> Fixes: 662175b91ff2 ("i386: reorder call to cpu_exec_realizefn")
>>>> Suggested-by: Xiaoyao Li <xiaoyao.li@intel.com>
>>>> Signed-off-by: Zide Chen <zide.chen@intel.com>
>>>> ---
>>>>  target/i386/cpu.c         | 24 ++++++++++++------------
>>>>  target/i386/kvm/kvm-cpu.c |  1 -
>>>>  2 files changed, 12 insertions(+), 13 deletions(-)
>>>>
>>>> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
>>>> index bc2dceb647fa..a1c1c785bd2f 100644
>>>> --- a/target/i386/cpu.c
>>>> +++ b/target/i386/cpu.c
>>>> @@ -7604,6 +7604,18 @@ static void x86_cpu_realizefn(DeviceState *dev, Error **errp)
>>>>          }
>>>>      }
>>>>  
>>>> +    /*
>>>> +     * note: the call to the framework needs to happen after feature expansion,
>>>> +     * but before the checks/modifications to ucode_rev, mwait, phys_bits.
>>>> +     * These may be set by the accel-specific code,
>>>> +     * and the results are subsequently checked / assumed in this function.
>>>> +     */
>>>> +    cpu_exec_realizefn(cs, &local_err);
>>>> +    if (local_err != NULL) {
>>>> +        error_propagate(errp, local_err);
>>>> +        return;
>>>> +    }
>>>> +
>>>>      x86_cpu_filter_features(cpu, cpu->check_cpuid || cpu->enforce_cpuid);
>>>
>>> For your case, which sets cpu-pm=on via overcommit, then
>>> x86_cpu_filter_features() will complain that mwait is not supported.
>>>
>>> Such warning is not necessary, because the purpose of overcommit (from
>>> code) is only to support mwait when possible, not to commit to support
>>> mwait in Guest.
>>>
>>> Additionally, I understand x86_cpu_filter_features() is primarily
>>> intended to filter features configured by the user, 
>>
>> Yes, that's why this patches intends to let x86_cpu_filter_features()
>> filter out the MWAIT bit which is set from the overcommit option.
> 
> HMM, but in fact x86_cpu_filter_features() has already checked the MWAIT
> bit set by "-overcommit cpu-pm=on". ;-)
> 
> (Pls correct me if I'm wrong) Revisiting what cpu-pm did to MWAIT:
> * Firstly, it set MWAIT bit in x86_cpu_expand_features():
>   x86_cpu_expand_features()
>      -> x86_cpu_get_supported_feature_word()
>         -> kvm_arch_get_supported_cpuid()
>  This MWAIT is based on Host's MWAIT capability. This MWAIT enablement
>  is fine for next x86_cpu_filter_features() and x86_cpu_filter_features()
>  is working correctly here!
> 
> * Then, MWAIT was secondly set in host_cpu_enable_cpu_pm() regardless
>   neither Host's support or previous MWAIT enablement result. This is
>   the root cause of your issue.
> 
> Therefore, we should make cpu-pm honor his first MWAIT enablement result
> instead of repeatly and unconditionally setting the MWAIT bit again in
> host_cpu_enable_cpu_pm().

Yes, we don't need to set CPUID_EXT_MONITOR in host_cpu_enable_cpu_pm().

I'll drop this patch though I still think it makes sense to reorder
cpu_exec_realizefn () call.


> 
> Additionally, I think the code in x86_cpu_realizefn():
>   cpu->mwait.ecx |= CPUID_MWAIT_EMX | CPUID_MWAIT_IBE;
> has the similar issue because it also should check MWAIT feature bit.

As explained in another comment, I agree that these two bits need to be
checked against MWAIT availability, or even we don't need
cpu->mwait->ecx at all. But I don't understand why the code explicitly
states that "We always wake on interrupt even if host does not have the
capability", and don't know if it could cause problems if they are removed.

> 
> Further, it may be possible to remove cpu->mwait: just check the MWAIT
> bit in leaf 5 of cpu_x86_cpuid(), and if MWAIT is present, use host's
> mwait info plus CPUID_MWAIT_EMX | CPUID_MWAIT_IBE.
> 
>>> and the changes of
>>> CPUID after x86_cpu_filter_features() should by default be regarded like
>>> "QEMU knows what it is doing".
>>
>> Sure, we can add feature bits after x86_cpu_filter_features(), but I
>> think moving cpu_exec_realizefn() before x86_cpu_filter_features() is
>> more generic, and actually this is what QEMU did before commit 662175b91ff2.
>>
>> - Less redundant code. Specifically, no need to call
>> x86_cpu_get_supported_feature_word() again.
>> - Potentially there could be other features could be added from the
>> accel-specific realizefn, kvm_cpu_realizefn() for example.  And these
>> features need to be checked against the host availability.
> 
> Mainly I don't think this reorder is a direct fix for the problem (I
> just analyse it above), also in your case x86_cpu_filter_features() will
> print a WARNING when QEMU boots, which I don't think is cpu-pm's intention.
>
Igor Mammedov June 5, 2024, 3:07 p.m. UTC | #7
On Mon, 3 Jun 2024 14:29:50 -0700
"Chen, Zide" <zide.chen@intel.com> wrote:

> On 6/3/2024 2:30 AM, Igor Mammedov wrote:
> > On Sat, 1 Jun 2024 23:26:55 +0800
> > Zhao Liu <zhao1.liu@intel.com> wrote:
> >   
> >> On Fri, May 31, 2024 at 10:13:47AM -0700, Chen, Zide wrote:  
> >>> Date: Fri, 31 May 2024 10:13:47 -0700
> >>> From: "Chen, Zide" <zide.chen@intel.com>
> >>> Subject: Re: [PATCH V2 2/3] target/i386: call cpu_exec_realizefn before
> >>>  x86_cpu_filter_features
> >>>
> >>> On 5/30/2024 11:30 PM, Zhao Liu wrote:    
> >>>> Hi Zide,
> >>>>
> >>>> On Fri, May 24, 2024 at 01:00:16PM -0700, Zide Chen wrote:    
> >>>>> Date: Fri, 24 May 2024 13:00:16 -0700
> >>>>> From: Zide Chen <zide.chen@intel.com>
> >>>>> Subject: [PATCH V2 2/3] target/i386: call cpu_exec_realizefn before
> >>>>>  x86_cpu_filter_features
> >>>>> X-Mailer: git-send-email 2.34.1
> >>>>>
> >>>>> cpu_exec_realizefn which calls the accel-specific realizefn may expand
> >>>>> features.  e.g., some accel-specific options may require extra features
> >>>>> to be enabled, and it's appropriate to expand these features in accel-
> >>>>> specific realizefn.
> >>>>>
> >>>>> One such example is the cpu-pm option, which may add CPUID_EXT_MONITOR.
> >>>>>
> >>>>> Thus, call cpu_exec_realizefn before x86_cpu_filter_features to ensure
> >>>>> that it won't expose features not supported by the host.
> >>>>>
> >>>>> Fixes: 662175b91ff2 ("i386: reorder call to cpu_exec_realizefn")
> >>>>> Suggested-by: Xiaoyao Li <xiaoyao.li@intel.com>
> >>>>> Signed-off-by: Zide Chen <zide.chen@intel.com>
> >>>>> ---
> >>>>>  target/i386/cpu.c         | 24 ++++++++++++------------
> >>>>>  target/i386/kvm/kvm-cpu.c |  1 -
> >>>>>  2 files changed, 12 insertions(+), 13 deletions(-)
> >>>>>
> >>>>> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> >>>>> index bc2dceb647fa..a1c1c785bd2f 100644
> >>>>> --- a/target/i386/cpu.c
> >>>>> +++ b/target/i386/cpu.c
> >>>>> @@ -7604,6 +7604,18 @@ static void x86_cpu_realizefn(DeviceState *dev, Error **errp)
> >>>>>          }
> >>>>>      }
> >>>>>  
> >>>>> +    /*
> >>>>> +     * note: the call to the framework needs to happen after feature expansion,
> >>>>> +     * but before the checks/modifications to ucode_rev, mwait, phys_bits.
> >>>>> +     * These may be set by the accel-specific code,
> >>>>> +     * and the results are subsequently checked / assumed in this function.
> >>>>> +     */
> >>>>> +    cpu_exec_realizefn(cs, &local_err);
> >>>>> +    if (local_err != NULL) {
> >>>>> +        error_propagate(errp, local_err);
> >>>>> +        return;
> >>>>> +    }
> >>>>> +
> >>>>>      x86_cpu_filter_features(cpu, cpu->check_cpuid || cpu->enforce_cpuid);    
> >>>>
> >>>> For your case, which sets cpu-pm=on via overcommit, then
> >>>> x86_cpu_filter_features() will complain that mwait is not supported.
> >>>>
> >>>> Such warning is not necessary, because the purpose of overcommit (from
> >>>> code) is only to support mwait when possible, not to commit to support
> >>>> mwait in Guest.
> >>>>
> >>>> Additionally, I understand x86_cpu_filter_features() is primarily
> >>>> intended to filter features configured by the user,     
> >>>
> >>> Yes, that's why this patches intends to let x86_cpu_filter_features()
> >>> filter out the MWAIT bit which is set from the overcommit option.    
> >>
> >> HMM, but in fact x86_cpu_filter_features() has already checked the MWAIT
> >> bit set by "-overcommit cpu-pm=on". ;-)
> >>
> >> (Pls correct me if I'm wrong) Revisiting what cpu-pm did to MWAIT:
> >> * Firstly, it set MWAIT bit in x86_cpu_expand_features():
> >>   x86_cpu_expand_features()  
> >>      -> x86_cpu_get_supported_feature_word()
> >>         -> kvm_arch_get_supported_cpuid()    
> >>  This MWAIT is based on Host's MWAIT capability. This MWAIT enablement
> >>  is fine for next x86_cpu_filter_features() and x86_cpu_filter_features()
> >>  is working correctly here!
> >>
> >> * Then, MWAIT was secondly set in host_cpu_enable_cpu_pm() regardless
> >>   neither Host's support or previous MWAIT enablement result. This is
> >>   the root cause of your issue.
> >>
> >> Therefore, we should make cpu-pm honor his first MWAIT enablement result
> >> instead of repeatly and unconditionally setting the MWAIT bit again in
> >> host_cpu_enable_cpu_pm().
> >>
> >> Additionally, I think the code in x86_cpu_realizefn():
> >>   cpu->mwait.ecx |= CPUID_MWAIT_EMX | CPUID_MWAIT_IBE;
> >> has the similar issue because it also should check MWAIT feature bit.
> >>
> >> Further, it may be possible to remove cpu->mwait: just check the MWAIT
> >> bit in leaf 5 of cpu_x86_cpuid(), and if MWAIT is present, use host's
> >> mwait info plus CPUID_MWAIT_EMX | CPUID_MWAIT_IBE.  
> > 
> > Agreed with above analysis,
> > we shouldn't have host_cpu_enable_cpu_pm() as kvm_arch_get_supported_cpuid
> > gets us MWAIT already.  
> 
> Yes, I agree don't need to set CPUID_EXT_MONITOR besides
> kvm_arch_get_supported_cpuid().
> 
> > 
> > filling in cpu->mwait.ecx is benign mistake which likely doesn't
> > trigger anything if CPUID_EXT_MONITOR is not present.
> > But for clarity it's better to add an explicit check there as well.  
> 
> Yes, I agree without MWAIT available and advertised, it's meaningless to
> set the EMX and IBE bits. Seems to me it's cleaner to remove cpu->mwait
> all together, and in cpu_x86_cpuid(), just read from host_cpuid(5) if
> MWAIT is available to the guest. But I don't understand the history of
> why QEMU unconditionally advertises these two bits, and don't know it it
> could break some thing if these two bits are removed.
> 
> Even if we want to fix these two bits, we can do it in another separate
> patch.
> 
> e737b32a36 (" Core 2 Duo specification (Alexander Graf).")
> unconditionally adds "CPUID_MWAIT_EMX | CPUID_MWAIT_IBE" to CPUID.5.ECX
> with further explanation.
> 
> 2266d44311 ("i386/cpu: make -cpu host support monitor/mwait") adds
> comment "We always wake on interrupt even if host does not have the
> capability" to CPUID_MWAIT_IBE.
> 
> 
> >   
> >>  
> >>>> and the changes of
> >>>> CPUID after x86_cpu_filter_features() should by default be regarded like
> >>>> "QEMU knows what it is doing".    
> >>>
> >>> Sure, we can add feature bits after x86_cpu_filter_features(), but I
> >>> think moving cpu_exec_realizefn() before x86_cpu_filter_features() is
> >>> more generic, and actually this is what QEMU did before commit 662175b91ff2.
> >>>
> >>> - Less redundant code. Specifically, no need to call
> >>> x86_cpu_get_supported_feature_word() again.
> >>> - Potentially there could be other features could be added from the
> >>> accel-specific realizefn, kvm_cpu_realizefn() for example.  And these
> >>> features need to be checked against the host availability.    
> >>
> >> Mainly I don't think this reorder is a direct fix for the problem (I
> >> just analyse it above), also in your case x86_cpu_filter_features() will
> >> print a WARNING when QEMU boots, which I don't think is cpu-pm's intention.  
> > 
> > There is no problem with warning, I'd even say it's a good thing.  
> 
> I agree it's good to have the warning as well.
> 
> > But you are right reordering just masks the issue.
> > 
> > As for expected behavior, if user asked for "-overcommit cpu-pm=on"
> > there are 2 options:
> >    * it's working as expected (mwait exiting is enabled successfully with CPUID MONITOR bit set)
> >    * QEMU shall fail to start.  
> 
> I like the idea that QEMU refuses to launch the guest if the asked CPU
> features are not available, which is more user friendly.  But the
> problem is, "-overcommit cpu-pm=on" is an umbrella which intends to
> enable all the following CPUIDs and KVM features if it's possible.  So,
> if QEMU fails the guest in this case, then it needs to fail the WAITPKG
> feature as well. Additionally, it may need to provide individual options
> to enable these individual features, which I doubt could be too complicated.

how about

kvm_arch_init()
    ...
    if (enable_cpu_pm) {                                                         
        int disable_exits = kvm_check_extension(s, KVM_CAP_X86_DISABLE_EXITS);   

/* Work around for kernel header with a typo. TODO: fix header and drop. */      
#if defined(KVM_X86_DISABLE_EXITS_HTL) && !defined(KVM_X86_DISABLE_EXITS_HLT)    
#define KVM_X86_DISABLE_EXITS_HLT KVM_X86_DISABLE_EXITS_HTL                      
#endif                                   
                                        
above comment probably needs to be cleaned up

        if (disable_exits) {                                                     
            disable_exits &= (KVM_X86_DISABLE_EXITS_MWAIT |                      
                              KVM_X86_DISABLE_EXITS_HLT |                        
                              KVM_X86_DISABLE_EXITS_PAUSE |                      
                              KVM_X86_DISABLE_EXITS_CSTATE);                     
        }  

fail here if filtered disable_exits is an empty set?                                                                
                                                                                 
        ret = kvm_vm_enable_cap(s, KVM_CAP_X86_DISABLE_EXITS, 0,                 
                                disable_exits);                                  
        if (ret < 0) {                                                           
            error_report("kvm: guest stopping CPU not supported: %s",            
                         strerror(-ret));                                        
        }                                                                        
    }                                                                            
      


> KVM_X86_DISABLE_EXITS_MWAI
> KVM_X86_DISABLE_EXITS_HLTKVM_X86_DISABLE_EXITS_PAUSE
> KVM_X86_DISABLE_EXITS_CSTATE
> CPUID.7.0:ECX.WAITPKG
> CPUID.1.ECX.MWAIT
>
Chen, Zide June 5, 2024, 5:58 p.m. UTC | #8
On 6/5/2024 8:07 AM, Igor Mammedov wrote:
> On Mon, 3 Jun 2024 14:29:50 -0700
> "Chen, Zide" <zide.chen@intel.com> wrote:
> 
>> On 6/3/2024 2:30 AM, Igor Mammedov wrote:
>>> On Sat, 1 Jun 2024 23:26:55 +0800
>>> Zhao Liu <zhao1.liu@intel.com> wrote:
>>>   
>>>> On Fri, May 31, 2024 at 10:13:47AM -0700, Chen, Zide wrote:  
>>>>> Date: Fri, 31 May 2024 10:13:47 -0700
>>>>> From: "Chen, Zide" <zide.chen@intel.com>
>>>>> Subject: Re: [PATCH V2 2/3] target/i386: call cpu_exec_realizefn before
>>>>>  x86_cpu_filter_features
>>>>>
>>>>> On 5/30/2024 11:30 PM, Zhao Liu wrote:    
>>>>>> Hi Zide,
>>>>>>
>>>>>> On Fri, May 24, 2024 at 01:00:16PM -0700, Zide Chen wrote:    
>>>>>>> Date: Fri, 24 May 2024 13:00:16 -0700
>>>>>>> From: Zide Chen <zide.chen@intel.com>
>>>>>>> Subject: [PATCH V2 2/3] target/i386: call cpu_exec_realizefn before
>>>>>>>  x86_cpu_filter_features
>>>>>>> X-Mailer: git-send-email 2.34.1
>>>>>>>
>>>>>>> cpu_exec_realizefn which calls the accel-specific realizefn may expand
>>>>>>> features.  e.g., some accel-specific options may require extra features
>>>>>>> to be enabled, and it's appropriate to expand these features in accel-
>>>>>>> specific realizefn.
>>>>>>>
>>>>>>> One such example is the cpu-pm option, which may add CPUID_EXT_MONITOR.
>>>>>>>
>>>>>>> Thus, call cpu_exec_realizefn before x86_cpu_filter_features to ensure
>>>>>>> that it won't expose features not supported by the host.
>>>>>>>
>>>>>>> Fixes: 662175b91ff2 ("i386: reorder call to cpu_exec_realizefn")
>>>>>>> Suggested-by: Xiaoyao Li <xiaoyao.li@intel.com>
>>>>>>> Signed-off-by: Zide Chen <zide.chen@intel.com>
>>>>>>> ---
>>>>>>>  target/i386/cpu.c         | 24 ++++++++++++------------
>>>>>>>  target/i386/kvm/kvm-cpu.c |  1 -
>>>>>>>  2 files changed, 12 insertions(+), 13 deletions(-)
>>>>>>>
>>>>>>> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
>>>>>>> index bc2dceb647fa..a1c1c785bd2f 100644
>>>>>>> --- a/target/i386/cpu.c
>>>>>>> +++ b/target/i386/cpu.c
>>>>>>> @@ -7604,6 +7604,18 @@ static void x86_cpu_realizefn(DeviceState *dev, Error **errp)
>>>>>>>          }
>>>>>>>      }
>>>>>>>  
>>>>>>> +    /*
>>>>>>> +     * note: the call to the framework needs to happen after feature expansion,
>>>>>>> +     * but before the checks/modifications to ucode_rev, mwait, phys_bits.
>>>>>>> +     * These may be set by the accel-specific code,
>>>>>>> +     * and the results are subsequently checked / assumed in this function.
>>>>>>> +     */
>>>>>>> +    cpu_exec_realizefn(cs, &local_err);
>>>>>>> +    if (local_err != NULL) {
>>>>>>> +        error_propagate(errp, local_err);
>>>>>>> +        return;
>>>>>>> +    }
>>>>>>> +
>>>>>>>      x86_cpu_filter_features(cpu, cpu->check_cpuid || cpu->enforce_cpuid);    
>>>>>>
>>>>>> For your case, which sets cpu-pm=on via overcommit, then
>>>>>> x86_cpu_filter_features() will complain that mwait is not supported.
>>>>>>
>>>>>> Such warning is not necessary, because the purpose of overcommit (from
>>>>>> code) is only to support mwait when possible, not to commit to support
>>>>>> mwait in Guest.
>>>>>>
>>>>>> Additionally, I understand x86_cpu_filter_features() is primarily
>>>>>> intended to filter features configured by the user,     
>>>>>
>>>>> Yes, that's why this patches intends to let x86_cpu_filter_features()
>>>>> filter out the MWAIT bit which is set from the overcommit option.    
>>>>
>>>> HMM, but in fact x86_cpu_filter_features() has already checked the MWAIT
>>>> bit set by "-overcommit cpu-pm=on". ;-)
>>>>
>>>> (Pls correct me if I'm wrong) Revisiting what cpu-pm did to MWAIT:
>>>> * Firstly, it set MWAIT bit in x86_cpu_expand_features():
>>>>   x86_cpu_expand_features()  
>>>>      -> x86_cpu_get_supported_feature_word()
>>>>         -> kvm_arch_get_supported_cpuid()    
>>>>  This MWAIT is based on Host's MWAIT capability. This MWAIT enablement
>>>>  is fine for next x86_cpu_filter_features() and x86_cpu_filter_features()
>>>>  is working correctly here!
>>>>
>>>> * Then, MWAIT was secondly set in host_cpu_enable_cpu_pm() regardless
>>>>   neither Host's support or previous MWAIT enablement result. This is
>>>>   the root cause of your issue.
>>>>
>>>> Therefore, we should make cpu-pm honor his first MWAIT enablement result
>>>> instead of repeatly and unconditionally setting the MWAIT bit again in
>>>> host_cpu_enable_cpu_pm().
>>>>
>>>> Additionally, I think the code in x86_cpu_realizefn():
>>>>   cpu->mwait.ecx |= CPUID_MWAIT_EMX | CPUID_MWAIT_IBE;
>>>> has the similar issue because it also should check MWAIT feature bit.
>>>>
>>>> Further, it may be possible to remove cpu->mwait: just check the MWAIT
>>>> bit in leaf 5 of cpu_x86_cpuid(), and if MWAIT is present, use host's
>>>> mwait info plus CPUID_MWAIT_EMX | CPUID_MWAIT_IBE.  
>>>
>>> Agreed with above analysis,
>>> we shouldn't have host_cpu_enable_cpu_pm() as kvm_arch_get_supported_cpuid
>>> gets us MWAIT already.  
>>
>> Yes, I agree don't need to set CPUID_EXT_MONITOR besides
>> kvm_arch_get_supported_cpuid().
>>
>>>
>>> filling in cpu->mwait.ecx is benign mistake which likely doesn't
>>> trigger anything if CPUID_EXT_MONITOR is not present.
>>> But for clarity it's better to add an explicit check there as well.  
>>
>> Yes, I agree without MWAIT available and advertised, it's meaningless to
>> set the EMX and IBE bits. Seems to me it's cleaner to remove cpu->mwait
>> all together, and in cpu_x86_cpuid(), just read from host_cpuid(5) if
>> MWAIT is available to the guest. But I don't understand the history of
>> why QEMU unconditionally advertises these two bits, and don't know it it
>> could break some thing if these two bits are removed.
>>
>> Even if we want to fix these two bits, we can do it in another separate
>> patch.
>>
>> e737b32a36 (" Core 2 Duo specification (Alexander Graf).")
>> unconditionally adds "CPUID_MWAIT_EMX | CPUID_MWAIT_IBE" to CPUID.5.ECX
>> with further explanation.
>>
>> 2266d44311 ("i386/cpu: make -cpu host support monitor/mwait") adds
>> comment "We always wake on interrupt even if host does not have the
>> capability" to CPUID_MWAIT_IBE.
>>
>>
>>>   
>>>>  
>>>>>> and the changes of
>>>>>> CPUID after x86_cpu_filter_features() should by default be regarded like
>>>>>> "QEMU knows what it is doing".    
>>>>>
>>>>> Sure, we can add feature bits after x86_cpu_filter_features(), but I
>>>>> think moving cpu_exec_realizefn() before x86_cpu_filter_features() is
>>>>> more generic, and actually this is what QEMU did before commit 662175b91ff2.
>>>>>
>>>>> - Less redundant code. Specifically, no need to call
>>>>> x86_cpu_get_supported_feature_word() again.
>>>>> - Potentially there could be other features could be added from the
>>>>> accel-specific realizefn, kvm_cpu_realizefn() for example.  And these
>>>>> features need to be checked against the host availability.    
>>>>
>>>> Mainly I don't think this reorder is a direct fix for the problem (I
>>>> just analyse it above), also in your case x86_cpu_filter_features() will
>>>> print a WARNING when QEMU boots, which I don't think is cpu-pm's intention.  
>>>
>>> There is no problem with warning, I'd even say it's a good thing.  
>>
>> I agree it's good to have the warning as well.
>>
>>> But you are right reordering just masks the issue.
>>>
>>> As for expected behavior, if user asked for "-overcommit cpu-pm=on"
>>> there are 2 options:
>>>    * it's working as expected (mwait exiting is enabled successfully with CPUID MONITOR bit set)
>>>    * QEMU shall fail to start.  
>>
>> I like the idea that QEMU refuses to launch the guest if the asked CPU
>> features are not available, which is more user friendly.  But the
>> problem is, "-overcommit cpu-pm=on" is an umbrella which intends to
>> enable all the following CPUIDs and KVM features if it's possible.  So,
>> if QEMU fails the guest in this case, then it needs to fail the WAITPKG
>> feature as well. Additionally, it may need to provide individual options
>> to enable these individual features, which I doubt could be too complicated.
> 
> how about
> 
> kvm_arch_init()
>     ...
>     if (enable_cpu_pm) {                                                         
>         int disable_exits = kvm_check_extension(s, KVM_CAP_X86_DISABLE_EXITS);   
> 
> /* Work around for kernel header with a typo. TODO: fix header and drop. */      
> #if defined(KVM_X86_DISABLE_EXITS_HTL) && !defined(KVM_X86_DISABLE_EXITS_HLT)    
> #define KVM_X86_DISABLE_EXITS_HLT KVM_X86_DISABLE_EXITS_HTL                      
> #endif                                   
>                                         
> above comment probably needs to be cleaned up

Zhao already has a patch to clean it up.
https://lists.nongnu.org/archive/html/qemu-devel/2024-05/msg00889.html

> 
>         if (disable_exits) {                                                     
>             disable_exits &= (KVM_X86_DISABLE_EXITS_MWAIT |                      
>                               KVM_X86_DISABLE_EXITS_HLT |                        
>                               KVM_X86_DISABLE_EXITS_PAUSE |                      
>                               KVM_X86_DISABLE_EXITS_CSTATE);                     
>         }  
> 
> fail here if filtered disable_exits is an empty set?                                                                
>                                                                                  
>         ret = kvm_vm_enable_cap(s, KVM_CAP_X86_DISABLE_EXITS, 0,                 
>                                 disable_exits);                                  
>         if (ret < 0) {                                                           
>             error_report("kvm: guest stopping CPU not supported: %s",            
>                          strerror(-ret));                                        
>         }                                                                        
>     }                                                                            

We may do the following. But it's still confusing:

- If any of these cpu-pm features are available, we should not fail the
guest.  Meanwhile, users are not able to easily identify which features
are available and enabled, so that user may still get false assumptions.

- We may print out the capabilities that are enabled from
kvm_vm_enable_cap(), but seems this is not QEMU style.

- Since KVM commit 0abcc8f65cc ("KVM: VMX: enable X86_FEATURE_WAITPKG in
KVM capabilities"), WAITPKG is advertised to the guest by default, thus
it's not much meaningful to tight enable_cpu_pm with this feature.  But
before this logic is removed from QEMU, look at this case: WAITPKS is
available while none of KVM_X86_DISABLE_EXITS_xxx are available, we
don't fail the guest, but the behavior of the guest is exactly the same
with or without "cpu-pm=on".

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 891ce287e473..a4da23e2cd07 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -2708,11 +2708,16 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
                               KVM_X86_DISABLE_EXITS_CSTATE);
         }

-        ret = kvm_vm_enable_cap(s, KVM_CAP_X86_DISABLE_EXITS, 0,
-                                disable_exits);
-        if (ret < 0) {
-            error_report("kvm: guest stopping CPU not supported: %s",
+        if (disable_exits)
+            ret = kvm_vm_enable_cap(s, KVM_CAP_X86_DISABLE_EXITS, 0,
+                                    disable_exits);
+       else
+            ret = -ENOTSUP;
+
+        if (ret < 0 && !has_msr_umwait) {
+            error_report("kvm: Failed to enable cpu-pm overcommit: %s",
                          strerror(-ret));
+           return ret;
         }
     }

> 
>> KVM_X86_DISABLE_EXITS_MWAI
>> KVM_X86_DISABLE_EXITS_HLTKVM_X86_DISABLE_EXITS_PAUSE
>> KVM_X86_DISABLE_EXITS_CSTATE
>> CPUID.7.0:ECX.WAITPKG
>> CPUID.1.ECX.MWAIT
>>
>
diff mbox series

Patch

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index bc2dceb647fa..a1c1c785bd2f 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -7604,6 +7604,18 @@  static void x86_cpu_realizefn(DeviceState *dev, Error **errp)
         }
     }
 
+    /*
+     * note: the call to the framework needs to happen after feature expansion,
+     * but before the checks/modifications to ucode_rev, mwait, phys_bits.
+     * These may be set by the accel-specific code,
+     * and the results are subsequently checked / assumed in this function.
+     */
+    cpu_exec_realizefn(cs, &local_err);
+    if (local_err != NULL) {
+        error_propagate(errp, local_err);
+        return;
+    }
+
     x86_cpu_filter_features(cpu, cpu->check_cpuid || cpu->enforce_cpuid);
 
     if (cpu->enforce_cpuid && x86_cpu_have_filtered_features(cpu)) {
@@ -7625,18 +7637,6 @@  static void x86_cpu_realizefn(DeviceState *dev, Error **errp)
 
     x86_cpu_set_sgxlepubkeyhash(env);
 
-    /*
-     * note: the call to the framework needs to happen after feature expansion,
-     * but before the checks/modifications to ucode_rev, mwait, phys_bits.
-     * These may be set by the accel-specific code,
-     * and the results are subsequently checked / assumed in this function.
-     */
-    cpu_exec_realizefn(cs, &local_err);
-    if (local_err != NULL) {
-        error_propagate(errp, local_err);
-        return;
-    }
-
     if (xcc->host_cpuid_required && !accel_uses_host_cpuid()) {
         g_autofree char *name = x86_cpu_class_get_model_name(xcc);
         error_setg(&local_err, "CPU model '%s' requires KVM or HVF", name);
diff --git a/target/i386/kvm/kvm-cpu.c b/target/i386/kvm/kvm-cpu.c
index f76972e47e61..3adcedf0dbc3 100644
--- a/target/i386/kvm/kvm-cpu.c
+++ b/target/i386/kvm/kvm-cpu.c
@@ -50,7 +50,6 @@  static bool kvm_cpu_realizefn(CPUState *cs, Error **errp)
      * nothing else has been set by the user (or by accelerators) in
      * cpu->ucode_rev and cpu->phys_bits, and updates the CPUID results in
      * mwait.ecx.
-     * This accel realization code also assumes cpu features are already expanded.
      *
      * realize order:
      *