diff mbox series

[SRU,J,1/1] KVM: x86/mmu: Track the number of TDP MMU pages, but not the actual pages

Message ID 20230912060246.150003-2-chengen.du@canonical.com
State New
Series NULL Pointer Dereference During KVM MMU Page Invalidation

Commit Message

Chengen Du Sept. 12, 2023, 6:02 a.m. UTC
From: Sean Christopherson <seanjc@google.com>

Track the number of TDP MMU "shadow" pages instead of tracking the pages
themselves. With the NX huge page list manipulation moved out of the common
linking flow, eliminating the list-based tracking means the happy path of
adding a shadow page doesn't need to acquire a spinlock and can instead
inc/dec an atomic.

Keep the tracking, as the WARN during TDP MMU teardown on leaked shadow
pages is very, very useful for detecting KVM bugs.

Tracking the number of pages will also make it trivial to expose the
counter to userspace as a stat in the future, which may or may not be
desirable.

Note, the TDP MMU needs to use a separate counter (and stat if that ever
comes to be) from the existing n_used_mmu_pages. The TDP MMU doesn't bother
supporting the shrinker nor does it honor KVM_SET_NR_MMU_PAGES (because the
TDP MMU consumes so few pages relative to shadow paging), and including TDP
MMU pages in that counter would break both the shrinker and shadow MMUs,
e.g. if a VM is using nested TDP.

Cc: Yan Zhao <yan.y.zhao@intel.com>
Reviewed-by: Mingwei Zhang <mizhang@google.com>
Reviewed-by: David Matlack <dmatlack@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Message-Id: <20221019165618.927057-6-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(backported from commit d25ceb9264364dca0683748b301340097fdab6c7)
[chengen - Apply atomic64_t-related changes following
the upstream patch's execution order]
Signed-off-by: Chengen Du <chengen.du@canonical.com>
---
 arch/x86/include/asm/kvm_host.h | 11 +++--------
 arch/x86/kvm/mmu/tdp_mmu.c      | 20 +++++++++++---------
 2 files changed, 14 insertions(+), 17 deletions(-)

Comments

Thadeu Lima de Souza Cascardo Sept. 12, 2023, 11:06 a.m. UTC | #1
On Tue, Sep 12, 2023 at 02:02:46PM +0800, Chengen Du wrote:
> From: Sean Christopherson <seanjc@google.com>
> 
> Track the number of TDP MMU "shadow" pages instead of tracking the pages
> themselves. With the NX huge page list manipulation moved out of the common
> linking flow, eliminating the list-based tracking means the happy path of
> adding a shadow page doesn't need to acquire a spinlock and can instead
> inc/dec an atomic.
> 
> Keep the tracking, as the WARN during TDP MMU teardown on leaked shadow
> pages is very, very useful for detecting KVM bugs.
> 
> Tracking the number of pages will also make it trivial to expose the
> counter to userspace as a stat in the future, which may or may not be
> desirable.
> 
> Note, the TDP MMU needs to use a separate counter (and stat if that ever
> comes to be) from the existing n_used_mmu_pages. The TDP MMU doesn't bother
> supporting the shrinker nor does it honor KVM_SET_NR_MMU_PAGES (because the
> TDP MMU consumes so few pages relative to shadow paging), and including TDP
> MMU pages in that counter would break both the shrinker and shadow MMUs,
> e.g. if a VM is using nested TDP.
> 
> Cc: Yan Zhao <yan.y.zhao@intel.com>
> Reviewed-by: Mingwei Zhang <mizhang@google.com>
> Reviewed-by: David Matlack <dmatlack@google.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
> Message-Id: <20221019165618.927057-6-seanjc@google.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> (backported from commit d25ceb9264364dca0683748b301340097fdab6c7)
> [chengen - Apply atomic64_t-related changes following
> the upstream patch's execution order]
> Signed-off-by: Chengen Du <chengen.du@canonical.com>

Missing the BugLink here.

Also, note that upstream decided to make the TDP MMU not the default in 5.15,
given some bugs they decided not to fix there. We have gone through and
applied that change as well.

Still, it's worth trying to make this better for when use cases ask for the
TDP MMU to be used.

I looked at the history here, and noticed the rename from link_page to link_sp
at c298a30c2821cb03274dfc81840f7bd3eb59d5a3 and the account functions added at
43a063cab325ee7cc50349967e536b3cd4e57f03.

Taking those into consideration, your backport looks correct to me.

Acked-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>

> ---
>  arch/x86/include/asm/kvm_host.h | 11 +++--------
>  arch/x86/kvm/mmu/tdp_mmu.c      | 20 +++++++++++---------
>  2 files changed, 14 insertions(+), 17 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 55d791ad4787..9b7a401a4174 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1168,6 +1168,9 @@ struct kvm_arch {
>  	 */
>  	bool tdp_mmu_enabled;
>  
> +	/* The number of TDP MMU pages across all roots. */
> +	atomic64_t tdp_mmu_pages;
> +
>  	/*
>  	 * List of struct kvm_mmu_pages being used as roots.
>  	 * All struct kvm_mmu_pages in the list should have
> @@ -1188,18 +1191,10 @@ struct kvm_arch {
>  	 */
>  	struct list_head tdp_mmu_roots;
>  
> -	/*
> -	 * List of struct kvmp_mmu_pages not being used as roots.
> -	 * All struct kvm_mmu_pages in the list should have
> -	 * tdp_mmu_page set and a tdp_mmu_root_count of 0.
> -	 */
> -	struct list_head tdp_mmu_pages;
> -
>  	/*
>  	 * Protects accesses to the following fields when the MMU lock
>  	 * is held in read mode:
>  	 *  - tdp_mmu_roots (above)
> -	 *  - tdp_mmu_pages (above)
>  	 *  - the link field of struct kvm_mmu_pages used by the TDP MMU
>  	 *  - lpage_disallowed_mmu_pages
>  	 *  - the lpage_disallowed_link field of struct kvm_mmu_pages used
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index 7a64fb238044..04d35439113e 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -24,7 +24,6 @@ bool kvm_mmu_init_tdp_mmu(struct kvm *kvm)
>  
>  	INIT_LIST_HEAD(&kvm->arch.tdp_mmu_roots);
>  	spin_lock_init(&kvm->arch.tdp_mmu_pages_lock);
> -	INIT_LIST_HEAD(&kvm->arch.tdp_mmu_pages);
>  
>  	return true;
>  }
> @@ -43,7 +42,7 @@ void kvm_mmu_uninit_tdp_mmu(struct kvm *kvm)
>  	if (!kvm->arch.tdp_mmu_enabled)
>  		return;
>  
> -	WARN_ON(!list_empty(&kvm->arch.tdp_mmu_pages));
> +	WARN_ON(atomic64_read(&kvm->arch.tdp_mmu_pages));
>  	WARN_ON(!list_empty(&kvm->arch.tdp_mmu_roots));
>  
>  	/*
> @@ -278,11 +277,12 @@ static void handle_changed_spte_dirty_log(struct kvm *kvm, int as_id, gfn_t gfn,
>  static void tdp_mmu_link_page(struct kvm *kvm, struct kvm_mmu_page *sp,
>  			      bool account_nx)
>  {
> -	spin_lock(&kvm->arch.tdp_mmu_pages_lock);
> -	list_add(&sp->link, &kvm->arch.tdp_mmu_pages);
> -	if (account_nx)
> +	atomic64_inc(&kvm->arch.tdp_mmu_pages);
> +	if (account_nx) {
> +		spin_lock(&kvm->arch.tdp_mmu_pages_lock);
>  		account_huge_nx_page(kvm, sp);
> -	spin_unlock(&kvm->arch.tdp_mmu_pages_lock);
> +		spin_unlock(&kvm->arch.tdp_mmu_pages_lock);
> +	}
>  }
>  
>  /**
> @@ -297,14 +297,16 @@ static void tdp_mmu_link_page(struct kvm *kvm, struct kvm_mmu_page *sp,
>  static void tdp_mmu_unlink_page(struct kvm *kvm, struct kvm_mmu_page *sp,
>  				bool shared)
>  {
> +	atomic64_dec(&kvm->arch.tdp_mmu_pages);
> +	if (!sp->lpage_disallowed)
> +		return;
> +
>  	if (shared)
>  		spin_lock(&kvm->arch.tdp_mmu_pages_lock);
>  	else
>  		lockdep_assert_held_write(&kvm->mmu_lock);
>  
> -	list_del(&sp->link);
> -	if (sp->lpage_disallowed)
> -		unaccount_huge_nx_page(kvm, sp);
> +	unaccount_huge_nx_page(kvm, sp);
>  
>  	if (shared)
>  		spin_unlock(&kvm->arch.tdp_mmu_pages_lock);
> -- 
> 2.39.2
> 
> 
> -- 
> kernel-team mailing list
> kernel-team@lists.ubuntu.com
> https://lists.ubuntu.com/mailman/listinfo/kernel-team

Patch

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 55d791ad4787..9b7a401a4174 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1168,6 +1168,9 @@ struct kvm_arch {
 	 */
 	bool tdp_mmu_enabled;
 
+	/* The number of TDP MMU pages across all roots. */
+	atomic64_t tdp_mmu_pages;
+
 	/*
 	 * List of struct kvm_mmu_pages being used as roots.
 	 * All struct kvm_mmu_pages in the list should have
@@ -1188,18 +1191,10 @@ struct kvm_arch {
 	 */
 	struct list_head tdp_mmu_roots;
 
-	/*
-	 * List of struct kvmp_mmu_pages not being used as roots.
-	 * All struct kvm_mmu_pages in the list should have
-	 * tdp_mmu_page set and a tdp_mmu_root_count of 0.
-	 */
-	struct list_head tdp_mmu_pages;
-
 	/*
 	 * Protects accesses to the following fields when the MMU lock
 	 * is held in read mode:
 	 *  - tdp_mmu_roots (above)
-	 *  - tdp_mmu_pages (above)
 	 *  - the link field of struct kvm_mmu_pages used by the TDP MMU
 	 *  - lpage_disallowed_mmu_pages
 	 *  - the lpage_disallowed_link field of struct kvm_mmu_pages used
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 7a64fb238044..04d35439113e 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -24,7 +24,6 @@ bool kvm_mmu_init_tdp_mmu(struct kvm *kvm)
 
 	INIT_LIST_HEAD(&kvm->arch.tdp_mmu_roots);
 	spin_lock_init(&kvm->arch.tdp_mmu_pages_lock);
-	INIT_LIST_HEAD(&kvm->arch.tdp_mmu_pages);
 
 	return true;
 }
@@ -43,7 +42,7 @@ void kvm_mmu_uninit_tdp_mmu(struct kvm *kvm)
 	if (!kvm->arch.tdp_mmu_enabled)
 		return;
 
-	WARN_ON(!list_empty(&kvm->arch.tdp_mmu_pages));
+	WARN_ON(atomic64_read(&kvm->arch.tdp_mmu_pages));
 	WARN_ON(!list_empty(&kvm->arch.tdp_mmu_roots));
 
 	/*
@@ -278,11 +277,12 @@ static void handle_changed_spte_dirty_log(struct kvm *kvm, int as_id, gfn_t gfn,
 static void tdp_mmu_link_page(struct kvm *kvm, struct kvm_mmu_page *sp,
 			      bool account_nx)
 {
-	spin_lock(&kvm->arch.tdp_mmu_pages_lock);
-	list_add(&sp->link, &kvm->arch.tdp_mmu_pages);
-	if (account_nx)
+	atomic64_inc(&kvm->arch.tdp_mmu_pages);
+	if (account_nx) {
+		spin_lock(&kvm->arch.tdp_mmu_pages_lock);
 		account_huge_nx_page(kvm, sp);
-	spin_unlock(&kvm->arch.tdp_mmu_pages_lock);
+		spin_unlock(&kvm->arch.tdp_mmu_pages_lock);
+	}
 }
 
 /**
@@ -297,14 +297,16 @@ static void tdp_mmu_link_page(struct kvm *kvm, struct kvm_mmu_page *sp,
 static void tdp_mmu_unlink_page(struct kvm *kvm, struct kvm_mmu_page *sp,
 				bool shared)
 {
+	atomic64_dec(&kvm->arch.tdp_mmu_pages);
+	if (!sp->lpage_disallowed)
+		return;
+
 	if (shared)
 		spin_lock(&kvm->arch.tdp_mmu_pages_lock);
 	else
 		lockdep_assert_held_write(&kvm->mmu_lock);
 
-	list_del(&sp->link);
-	if (sp->lpage_disallowed)
-		unaccount_huge_nx_page(kvm, sp);
+	unaccount_huge_nx_page(kvm, sp);
 
 	if (shared)
 		spin_unlock(&kvm->arch.tdp_mmu_pages_lock);