[V2,3/10] KVM/MMU: Add last_level in the struct kvm_mmu_page

Message ID 20190202013825.51261-4-Tianyu.Lan@microsoft.com
State Not Applicable
Series
  • X86/KVM/Hyper-V: Add HV ept tlb range list flush support in KVM

Commit Message

Tianyu Lan Feb. 2, 2019, 1:38 a.m.
From: Lan Tianyu <Tianyu.Lan@microsoft.com>

This patch adds a last_level flag to struct kvm_mmu_page. When building
the flush tlb range list, last_level is used to identify whether the
page should be added to the list.

Signed-off-by: Lan Tianyu <Tianyu.Lan@microsoft.com>
---
 arch/x86/include/asm/kvm_host.h | 1 +
 arch/x86/kvm/mmu.c              | 3 +++
 2 files changed, 4 insertions(+)

Comments

Paolo Bonzini Feb. 14, 2019, 4:12 p.m. | #1
On 02/02/19 02:38, lantianyu1986@gmail.com wrote:
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index ce770b446238..70cafd3f95ab 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -2918,6 +2918,9 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
>  
>  	if (level > PT_PAGE_TABLE_LEVEL)
>  		spte |= PT_PAGE_SIZE_MASK;
> +
> +	sp->last_level = is_last_spte(spte, level);

sp->last_level is always true here.

Paolo

>  	if (tdp_enabled)
>  		spte |= kvm_x86_ops->get_mt_mask(vcpu, gfn,
>  			kvm_is_mmio_pfn(pfn));
Paolo Bonzini Feb. 14, 2019, 4:32 p.m. | #2
On 02/02/19 02:38, lantianyu1986@gmail.com wrote:
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index ce770b446238..70cafd3f95ab 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -2918,6 +2918,9 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
>  
>  	if (level > PT_PAGE_TABLE_LEVEL)
>  		spte |= PT_PAGE_SIZE_MASK;
> +
> +	sp->last_level = is_last_spte(spte, level);

Wait, I wasn't thinking straight.  If a struct kvm_mmu_page exists, it
is never the last level.  Page table entries for the last level do not
have a struct kvm_mmu_page.

Therefore you don't need the flag after all.  I suspect your
calculations in patch 2 are off by one, and you actually need

	hlist_for_each_entry(sp, range->flush_list, flush_link) {
		int pages = KVM_PAGES_PER_HPAGE(sp->role.level + 1);
		...
	}

For example, if sp->role.level is 1 then the struct kvm_mmu_page is for
a page containing PTEs and covers an area of 2 MiB.

Thanks,

Paolo

>  	if (tdp_enabled)
>  		spte |= kvm_x86_ops->get_mt_mask(vcpu, gfn,
>  			kvm_is_mmio_pfn(pfn));
Tianyu Lan Feb. 15, 2019, 3:05 p.m. | #3
On Fri, Feb 15, 2019 at 12:32 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 02/02/19 02:38, lantianyu1986@gmail.com wrote:
> > diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> > index ce770b446238..70cafd3f95ab 100644
> > --- a/arch/x86/kvm/mmu.c
> > +++ b/arch/x86/kvm/mmu.c
> > @@ -2918,6 +2918,9 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
> >
> >       if (level > PT_PAGE_TABLE_LEVEL)
> >               spte |= PT_PAGE_SIZE_MASK;
> > +
> > +     sp->last_level = is_last_spte(spte, level);
>
> Wait, I wasn't thinking straight.  If a struct kvm_mmu_page exists, it
> is never the last level.  Page table entries for the last level do not
> have a struct kvm_mmu_page.
>
> Therefore you don't need the flag after all.  I suspect your
> calculations in patch 2 are off by one, and you actually need
>
>         hlist_for_each_entry(sp, range->flush_list, flush_link) {
>                 int pages = KVM_PAGES_PER_HPAGE(sp->role.level + 1);
>                 ...
>         }
>
> For example, if sp->role.level is 1 then the struct kvm_mmu_page is for
> a page containing PTEs and covers an area of 2 MiB.

Yes, you are right. Thanks for pointing that out; I will fix it. The
last_level flag is meant to avoid adding middle page nodes (e.g., PGD, PMD)
into the flush list. The address range would be duplicated if both
leaf and middle nodes were added to the flush list.

>
> Thanks,
>
> Paolo
>
> >       if (tdp_enabled)
> >               spte |= kvm_x86_ops->get_mt_mask(vcpu, gfn,
> >                       kvm_is_mmio_pfn(pfn));
>
Paolo Bonzini Feb. 15, 2019, 3:22 p.m. | #4
On 15/02/19 16:05, Tianyu Lan wrote:
> Yes, you are right. Thanks for pointing that out; I will fix it. The
> last_level flag is meant to avoid adding middle page nodes (e.g., PGD, PMD)
> into the flush list. The address range would be duplicated if both
> leaf and middle nodes were added to the flush list.

Hmm, that's not easy to track.  One kvm_mmu_page can contain both leaf
and non-leaf entries (for example a huge page mapping 0 to 2 MB and a
page table covering 2 MB to 4 MB).

Is this really needed?  First, your benchmarks so far have been done
with sp->last_level always set to true.  Second, you would only
encounter this optimization in kvm_mmu_commit_zap_page when zapping a 1
GB region (which then would be invalidated twice, at both the PMD and
PGD level) or bigger.

Paolo
Tianyu Lan Feb. 22, 2019, 3:16 p.m. | #5
On Fri, Feb 15, 2019 at 11:23 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 15/02/19 16:05, Tianyu Lan wrote:
> > Yes, you are right. Thanks for pointing that out; I will fix it. The
> > last_level flag is meant to avoid adding middle page nodes (e.g., PGD, PMD)
> > into the flush list. The address range would be duplicated if both
> > leaf and middle nodes were added to the flush list.
>
> Hmm, that's not easy to track.  One kvm_mmu_page can contain both leaf
> and non-leaf entries (for example a huge page mapping 0 to 2 MB and a
> page table covering 2 MB to 4 MB).
>
> Is this really needed?  First, your benchmarks so far have been done
> with sp->last_level always set to true.  Second, you would only
> encounter this optimization in kvm_mmu_commit_zap_page when zapping a 1
> GB region (which then would be invalidated twice, at both the PMD and
> PGD level) or bigger.
>
> Paolo

Hi Paolo:
             Sorry for the late response; I was tracking down a bug
caused by defining the wrong max flush count. I have just sent out V3.
I still put the last_level flag patch at the end of the patchset;
please see the change log for details. As you said, this is an
optimization and is not strictly required. If you still have concerns,
you can drop it, and the other patches in this patchset should be
good. Thanks.

Patch

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 4a3d3e58fe0a..9d858d68c17a 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -325,6 +325,7 @@  struct kvm_mmu_page {
 	struct hlist_node flush_link;
 	struct hlist_node hash_link;
 	bool unsync;
+	bool last_level;
 
 	/*
 	 * The following two entries are used to key the shadow page in the
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index ce770b446238..70cafd3f95ab 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2918,6 +2918,9 @@  static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 
 	if (level > PT_PAGE_TABLE_LEVEL)
 		spte |= PT_PAGE_SIZE_MASK;
+
+	sp->last_level = is_last_spte(spte, level);
+
 	if (tdp_enabled)
 		spte |= kvm_x86_ops->get_mt_mask(vcpu, gfn,
 			kvm_is_mmio_pfn(pfn));