diff mbox series

i386/kvm: Do not sync nested state during runtime

Message ID bdd53f40-4e60-f3ae-7ec6-162198214953@siemens.com
State New
Headers show
Series i386/kvm: Do not sync nested state during runtime | expand

Commit Message

Jan Kiszka July 22, 2019, 4 a.m. UTC
Writing the nested state e.g. after a vmport access can invalidate
important parts of the kernel-internal state, and it is not needed as
well. So leave this out from KVM_PUT_RUNTIME_STATE.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 target/i386/kvm.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

Comments

Liran Alon July 22, 2019, 9:44 a.m. UTC | #1
> On 22 Jul 2019, at 7:00, Jan Kiszka <jan.kiszka@siemens.com> wrote:
> 
> Writing the nested state e.g. after a vmport access can invalidate
> important parts of the kernel-internal state, and it is not needed as
> well. So leave this out from KVM_PUT_RUNTIME_STATE.
> 
> Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>

As QEMU never modifies vCPU nested-state in userspace besides in vmload and vCPU creation,
shouldn’t this be under KVM_PUT_FULL_STATE? Same as the call to kvm_arch_set_tsc_khz().

-Liran 

> ---
> target/i386/kvm.c | 10 +++++-----
> 1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/target/i386/kvm.c b/target/i386/kvm.c
> index ec7870c6af..da98b2cbca 100644
> --- a/target/i386/kvm.c
> +++ b/target/i386/kvm.c
> @@ -3577,12 +3577,12 @@ int kvm_arch_put_registers(CPUState *cpu, int level)
> 
>     assert(cpu_is_stopped(cpu) || qemu_cpu_is_self(cpu));
> 
> -    ret = kvm_put_nested_state(x86_cpu);
> -    if (ret < 0) {
> -        return ret;
> -    }
> -
>     if (level >= KVM_PUT_RESET_STATE) {
> +        ret = kvm_put_nested_state(x86_cpu);
> +        if (ret < 0) {
> +            return ret;
> +        }
> +
>         ret = kvm_put_msr_feature_control(x86_cpu);
>         if (ret < 0) {
>             return ret;
> -- 
> 2.16.4
Jan Kiszka July 22, 2019, 10:20 a.m. UTC | #2
On 22.07.19 11:44, Liran Alon wrote:
> 
> 
>> On 22 Jul 2019, at 7:00, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>>
>> Writing the nested state e.g. after a vmport access can invalidate
>> important parts of the kernel-internal state, and it is not needed as
>> well. So leave this out from KVM_PUT_RUNTIME_STATE.
>>
>> Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> 
> As QEMU never modifies vCPU nested-state in userspace besides in vmload and vCPU creation,
> shouldn’t this be under KVM_PUT_FULL_STATE? Same as the call to kvm_arch_set_tsc_khz().

Reset is a relevant modification as well. If we do not write back a state that
is disabling virtualization, we break.

Jan
Liran Alon July 22, 2019, 10:31 a.m. UTC | #3
> On 22 Jul 2019, at 13:20, Jan Kiszka <jan.kiszka@siemens.com> wrote:
> 
> On 22.07.19 11:44, Liran Alon wrote:
>> 
>> 
>>> On 22 Jul 2019, at 7:00, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>>> 
>>> Writing the nested state e.g. after a vmport access can invalidate
>>> important parts of the kernel-internal state, and it is not needed as
>>> well. So leave this out from KVM_PUT_RUNTIME_STATE.
>>> 
>>> Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
>> 
>> As QEMU never modifies vCPU nested-state in userspace besides in vmload and vCPU creation,
>> shouldn’t this be under KVM_PUT_FULL_STATE? Same as the call to kvm_arch_set_tsc_khz().
> 
> Reset is a relevant modification as well. If we do not write back a state that
> is disabling virtualization, we break.
> 
> Jan

Currently QEMU writes to userspace maintained nested-state only at kvm_arch_init_vcpu() and
when loading vmstate_nested_state vmstate subsection.
kvm_arch_reset_vcpu() do not modify userspace maintained nested-state.

I would expect KVM internal nested-state to be reset as part of KVM’s vmx_vcpu_reset().
Are we missing a call to vmx_leave_nested() there? 

-Liran
Jan Kiszka July 22, 2019, 10:43 a.m. UTC | #4
On 22.07.19 12:31, Liran Alon wrote:
>> On 22 Jul 2019, at 13:20, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>>
>> On 22.07.19 11:44, Liran Alon wrote:
>>>
>>>
>>>> On 22 Jul 2019, at 7:00, Jan Kiszka <jan.kiszka@siemens.com> wrote:
>>>>
>>>> Writing the nested state e.g. after a vmport access can invalidate
>>>> important parts of the kernel-internal state, and it is not needed as
>>>> well. So leave this out from KVM_PUT_RUNTIME_STATE.
>>>>
>>>> Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
>>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
>>>
>>> As QEMU never modifies vCPU nested-state in userspace besides in vmload and vCPU creation,
>>> shouldn’t this be under KVM_PUT_FULL_STATE? Same as the call to kvm_arch_set_tsc_khz().
>>
>> Reset is a relevant modification as well. If we do not write back a state that
>> is disabling virtualization, we break.
>>
>> Jan
> 
> Currently QEMU writes to userspace maintained nested-state only at kvm_arch_init_vcpu() and
> when loading vmstate_nested_state vmstate subsection.
> kvm_arch_reset_vcpu() do not modify userspace maintained nested-state.

Hmm, then we probably achieve that effect by clearing the related bit in CR4. If
doing that implies an invalidation of the nested state by KVM, we are fine.
Otherwise, I would expect userspace to reset the state to VMCLEAR and purge any
traces of prior use.

> 
> I would expect KVM internal nested-state to be reset as part of KVM’s vmx_vcpu_reset().

vmx_vcpu_reset is not issued on userspace initiated VM reset, only on INIT/SIPI
cycles. It's misleading, and I remember myself running into that when I hacked
on KVM back then.

Jan

> Are we missing a call to vmx_leave_nested() there? 
> 
> -Liran
>
Paolo Bonzini July 22, 2019, 11:23 a.m. UTC | #5
On 22/07/19 12:43, Jan Kiszka wrote:
>> Currently QEMU writes to userspace maintained nested-state only at kvm_arch_init_vcpu() and
>> when loading vmstate_nested_state vmstate subsection.
>> kvm_arch_reset_vcpu() do not modify userspace maintained nested-state.
> Hmm, then we probably achieve that effect by clearing the related bit in CR4.

Almost: by clearing the VMX enable bit in MSR_IA32_FEATURE_CONTROL.
Actually I think you contributed that. :)

I think we could in principle skip that MSR write if env->nested_state
!= NULL, but it doesn't hurt either, and it makes sense that nested virt
state goes together with MSR_IA32_FEATURE_CONTROL since the latter is
indede nested virtualization related.

Paolo

> If doing that implies an invalidation of the nested state by KVM, we
> are fine. Otherwise, I would expect userspace to reset the state to
> VMCLEAR and purge any traces of prior use.
diff mbox series

Patch

diff --git a/target/i386/kvm.c b/target/i386/kvm.c
index ec7870c6af..da98b2cbca 100644
--- a/target/i386/kvm.c
+++ b/target/i386/kvm.c
@@ -3577,12 +3577,12 @@  int kvm_arch_put_registers(CPUState *cpu, int level)
 
     assert(cpu_is_stopped(cpu) || qemu_cpu_is_self(cpu));
 
-    ret = kvm_put_nested_state(x86_cpu);
-    if (ret < 0) {
-        return ret;
-    }
-
     if (level >= KVM_PUT_RESET_STATE) {
+        ret = kvm_put_nested_state(x86_cpu);
+        if (ret < 0) {
+            return ret;
+        }
+
         ret = kvm_put_msr_feature_control(x86_cpu);
         if (ret < 0) {
             return ret;