
[1/1,SRU,B,F] KVM: x86: Properly reset MMU context at vCPU RESET/INIT

Message ID 20211027171516.313906-2-halves@canonical.com
State New
Series KVM emulation failure when booting into VM crash kernel with multiple CPUs

Commit Message

Heitor Alves de Siqueira Oct. 27, 2021, 5:15 p.m. UTC
From: Sean Christopherson <seanjc@google.com>

BugLink: https://bugs.launchpad.net/bugs/1948862

Reset the MMU context at vCPU INIT (and RESET for good measure) if CR0.PG
was set prior to INIT.  Simply re-initializing the current MMU is not
sufficient as the current root HPA may not be usable in the new context.
E.g. if TDP is disabled and INIT arrives while the vCPU is in long mode,
KVM will fail to switch to the 32-bit pae_root and bomb on the next
VM-Enter due to running with a 64-bit CR3 in 32-bit mode.
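
The series subject suggests the real-world trigger: a multi-vCPU guest
kexecing into its crash kernel delivers INIT to vCPUs that are still in
long mode, which then surfaces as the emulation failure tracked in the
BugLink above. (This connection is inferred from the series subject, not
stated in the upstream commit message.)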

This bug was papered over in both VMX and SVM, but still managed to rear
its head in the MMU role on VMX.  Because EFER.LMA=1 requires CR0.PG=1,
kvm_calc_shadow_mmu_root_page_role() checks for EFER.LMA without first
checking CR0.PG.  VMX's RESET/INIT flow writes CR0 before EFER, and so
an INIT with the vCPU in 64-bit mode will cause the hack-a-fix to
generate the wrong MMU role.
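
For illustration, a minimal sketch of the ordering hazard described above
(hypothetical helpers, not the actual kvm_calc_shadow_mmu_root_page_role()
code):

	/*
	 * Buggy shape: trusts EFER.LMA on its own.  VMX's RESET/INIT flow
	 * writes CR0 before EFER, so a role computed in that window sees
	 * CR0.PG=0 together with a stale EFER.LMA=1 and comes out 64-bit
	 * for a vCPU that is no longer in long mode.
	 */
	static bool sketch_role_is_64bit(struct kvm_vcpu *vcpu)
	{
		return !!(vcpu->arch.efer & EFER_LMA);
	}

	/*
	 * Safe shape: EFER.LMA only means anything while paging is
	 * enabled, so gate it on CR0.PG.
	 */
	static bool sketch_role_is_64bit_fixed(struct kvm_vcpu *vcpu)
	{
		return (kvm_read_cr0(vcpu) & X86_CR0_PG) &&
		       (vcpu->arch.efer & EFER_LMA);
	}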

In VMX, the INIT issue is specific to running without unrestricted guest
since unrestricted guest is available if and only if EPT is enabled.
Commit 8668a3c468ed ("KVM: VMX: Reset mmu context when entering real
mode") resolved the issue by forcing a reset when entering emulated real
mode.

In SVM, commit ebae871a509d ("kvm: svm: reset mmu on VCPU reset") forced
an MMU reset on every INIT to work around the flaw in common x86.  Note, at
the time the bug was fixed, the SVM problem was exacerbated by a complete
lack of a CR4 update.

The vendor resets will be reverted in future patches, primarily to aid
bisection in case there are non-INIT flows that rely on the existing VMX
logic.

Because CR0.PG is unconditionally cleared on INIT, and because CR0.WP and
all CR4/EFER paging bits are ignored if CR0.PG=0, simply checking that
CR0.PG was '1' prior to INIT/RESET is sufficient to detect a required MMU
context reset.
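
Spelled out, the three cases the check distinguishes (a worked reading of
the hunk below, not text from the upstream commit):

  - RESET: CR0 is architecturally '0' beforehand, so old_cr0 & X86_CR0_PG
    is clear and the new check stays out of the way.
  - INIT with CR0.PG=1: INIT clears CR0.PG, so the paging mode changed and
    kvm_mmu_reset_context() must run.
  - INIT with CR0.PG=0: paging was already off, and with CR0.PG=0 the
    CR0.WP/CR4/EFER paging bits were already ignored, so nothing the MMU
    role depends on can have changed.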

Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210622175739.3610207-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(backported from 0aa1837533e5f4be8cc21bbc06314c23ba2c5447)
Signed-off-by: Heitor Alves de Siqueira <halves@canonical.com>
Signed-off-by: Matthew Ruffell <matthew.ruffell@canonical.com>
---
 arch/x86/kvm/x86.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

Comments

Thadeu Lima de Souza Cascardo Oct. 27, 2021, 6:32 p.m. UTC | #1
On Wed, Oct 27, 2021 at 02:15:16PM -0300, Heitor Alves de Siqueira wrote:
> [full patch quoted above -- trimmed]

Makes sense, is upstream and fixes a real issue.

Thanks.

Acked-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>

Patch

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0b6edc63103e..3f6a5630f2c0 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8109,6 +8109,8 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
 
 void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 {
+	unsigned long old_cr0 = kvm_read_cr0(vcpu);
+
 	kvm_lapic_reset(vcpu, init_event);
 
 	vcpu->arch.hflags = 0;
@@ -8178,6 +8180,17 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 	vcpu->arch.ia32_xss = 0;
 
 	kvm_x86_ops->vcpu_reset(vcpu, init_event);
+
+	/*
+	 * Reset the MMU context if paging was enabled prior to INIT (which is
+	 * implied if CR0.PG=1 as CR0 will be '0' prior to RESET).  Unlike the
+	 * standard CR0/CR4/EFER modification paths, only CR0.PG needs to be
+	 * checked because it is unconditionally cleared on INIT and all other
+	 * paging related bits are ignored if paging is disabled, i.e. CR0.WP,
+	 * CR4, and EFER changes are all irrelevant if CR0.PG was '0'.
+	 */
+	if (old_cr0 & X86_CR0_PG)
+		kvm_mmu_reset_context(vcpu);
 }
 
 void kvm_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)