Patchwork [5/6] KVM: Separate rmap_pde from kvm_lpage_info->write_count

login
register
mail settings
Submitter Takuya Yoshikawa
Date June 28, 2012, 2:01 a.m.
Message ID <20120628110141.7b908c91.yoshikawa.takuya@oss.ntt.co.jp>
Download mbox | patch
Permalink /patch/167781/
State New
Headers show

Comments

Takuya Yoshikawa - June 28, 2012, 2:01 a.m.
This makes it possible to loop over rmap_pde arrays in the same way as
we do over rmap so that we can optimize kvm_handle_hva_range() easily in
the following patch.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
---
 arch/x86/include/asm/kvm_host.h |    2 +-
 arch/x86/kvm/mmu.c              |    6 +++---
 arch/x86/kvm/x86.c              |   11 +++++++++++
 3 files changed, 15 insertions(+), 4 deletions(-)
Xiao Guangrong - June 28, 2012, 3:12 a.m.
On 06/28/2012 10:01 AM, Takuya Yoshikawa wrote:
> This makes it possible to loop over rmap_pde arrays in the same way as
> we do over rmap so that we can optimize kvm_handle_hva_range() easily in
> the following patch.
> 
> Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
> ---
>  arch/x86/include/asm/kvm_host.h |    2 +-
>  arch/x86/kvm/mmu.c              |    6 +++---
>  arch/x86/kvm/x86.c              |   11 +++++++++++
>  3 files changed, 15 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 5aab8d4..aea1673 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -499,11 +499,11 @@ struct kvm_vcpu_arch {
>  };
> 
>  struct kvm_lpage_info {
> -	unsigned long rmap_pde;
>  	int write_count;
>  };
> 
>  struct kvm_arch_memory_slot {
> +	unsigned long *rmap_pde[KVM_NR_PAGE_SIZES - 1];
>  	struct kvm_lpage_info *lpage_info[KVM_NR_PAGE_SIZES - 1];
>  };
> 

It looks little complex than before - need manage more alloc-ed/freed buffers.

Why not just introduce a function to get the next rmap? Something like this:

static unsigned long *get_next_rmap(unsigned long *rmap, int level)
{
	struct kvm_lpage_info *linfo

	if (level == 1)
		return rmap++
	
	linfo = container_of(rmap, struct kvm_lpage_info, rmap_pde);
	
	return &(++linfo)->rmap_pde
}

[ Completely untested ]

--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Takuya Yoshikawa - June 28, 2012, 3:45 a.m.
On Thu, 28 Jun 2012 11:12:51 +0800
Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> wrote:

> >  struct kvm_arch_memory_slot {
> > +	unsigned long *rmap_pde[KVM_NR_PAGE_SIZES - 1];
> >  	struct kvm_lpage_info *lpage_info[KVM_NR_PAGE_SIZES - 1];
> >  };
> > 
> 
> It looks little complex than before - need manage more alloc-ed/freed buffers.

Actually I want to integrate rmap and rmap_pde in the future:

	rmap[KVM_NR_PAGE_SIZES]

For this we need to modify some unrelated ppc code, so I just
avoided the integration in this series.

Note: write_count: 4 bytes, rmap_pde: 8 bytes.  So we are wasting
extra paddings by packing them into lpage_info.

> Why not just introduce a function to get the next rmap? Something like this:

I want to eliminate this kind of overheads.

Thanks,
	Takuya

> static unsigned long *get_next_rmap(unsigned long *rmap, int level)
> {
> 	struct kvm_lpage_info *linfo
> 
> 	if (level == 1)
> 		return rmap++
> 	
> 	linfo = container_of(rmap, struct kvm_lpage_info, rmap_pde);
> 	
> 	return &(++linfo)->rmap_pde
> }
> 
> [ Completely untested ]
--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Avi Kivity - June 28, 2012, 5:39 p.m.
On 06/28/2012 06:45 AM, Takuya Yoshikawa wrote:
> On Thu, 28 Jun 2012 11:12:51 +0800
> Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> wrote:
> 
>> >  struct kvm_arch_memory_slot {
>> > +	unsigned long *rmap_pde[KVM_NR_PAGE_SIZES - 1];
>> >  	struct kvm_lpage_info *lpage_info[KVM_NR_PAGE_SIZES - 1];
>> >  };
>> > 
>> 
>> It looks little complex than before - need manage more alloc-ed/freed buffers.
> 
> Actually I want to integrate rmap and rmap_pde in the future:
> 
> 	rmap[KVM_NR_PAGE_SIZES]

That's a good direction.

> 
> For this we need to modify some unrelated ppc code, so I just
> avoided the integration in this series.
> 
> Note: write_count: 4 bytes, rmap_pde: 8 bytes.  So we are wasting
> extra paddings by packing them into lpage_info.

The wastage is quite low since it's just 4 bytes per 2MB.

> 
>> Why not just introduce a function to get the next rmap? Something like this:
> 
> I want to eliminate this kind of overheads.

I don't think the overhead is significant.  rmap walk speed is largely a
function of cache misses IMO, and we may even be adding cache misses by
splitting lpage_info.

But I still think it's the right thing since it simplifies the code.
Maybe we should add a prefetch() on write_count do mitigate the
overhead, if it starts showing up in profiles.
Takuya Yoshikawa - June 29, 2012, 1:44 a.m.
On Thu, 28 Jun 2012 20:39:55 +0300
Avi Kivity <avi@redhat.com> wrote:

> > Note: write_count: 4 bytes, rmap_pde: 8 bytes.  So we are wasting
> > extra paddings by packing them into lpage_info.
> 
> The wastage is quite low since it's just 4 bytes per 2MB.

Yes.

> >> Why not just introduce a function to get the next rmap? Something like this:
> > 
> > I want to eliminate this kind of overheads.
> 
> I don't think the overhead is significant.  rmap walk speed is largely a
> function of cache misses IMO, and we may even be adding cache misses by
> splitting lpage_info.

Maybe.  But as far as I can see, write_count does not gain much from
being close to rmap_pde.

> But I still think it's the right thing since it simplifies the code.

After the rmap integration, we can remove
	if (likely(level == PT_PAGE_TABLE_LEVEL))
heuristics from __gfn_to_rmap().

As a bonus, the helper will become enough simple to be always inlined
which reduces some function calls.

> Maybe we should add a prefetch() on write_count do mitigate the
> overhead, if it starts showing up in profiles.

OK, I will post rmap integration work as soon as possible, but it
still needs to be synced with unrelated ppc works at some point in
the future: so please take that separately from this series.

Thanks,
	Takuya
--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Xiao Guangrong - July 2, 2012, 2:22 a.m.
On 06/29/2012 01:39 AM, Avi Kivity wrote:

> But I still think it's the right thing since it simplifies the code.
> Maybe we should add a prefetch() on write_count do mitigate the
> overhead, if it starts showing up in profiles.
> 

Long time ago, there was a discussion about dropping prefetch in the
operation of list walking:
	http://lwn.net/Articles/444336/
IIRC, the conclusion is that it is better to let CPU prefetch memory
by itself.

Actually, when i developed lockless spte walking, i measure the performance if
prefetch was used, but i did not see the improvement.


--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 5aab8d4..aea1673 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -499,11 +499,11 @@  struct kvm_vcpu_arch {
 };
 
 struct kvm_lpage_info {
-	unsigned long rmap_pde;
 	int write_count;
 };
 
 struct kvm_arch_memory_slot {
+	unsigned long *rmap_pde[KVM_NR_PAGE_SIZES - 1];
 	struct kvm_lpage_info *lpage_info[KVM_NR_PAGE_SIZES - 1];
 };
 
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index edf54ae..477b3da 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -931,13 +931,13 @@  static void pte_list_walk(unsigned long *pte_list, pte_list_walk_fn fn)
 static unsigned long *__gfn_to_rmap(gfn_t gfn, int level,
 				    struct kvm_memory_slot *slot)
 {
-	struct kvm_lpage_info *linfo;
+	unsigned long idx;
 
 	if (likely(level == PT_PAGE_TABLE_LEVEL))
 		return &slot->rmap[gfn - slot->base_gfn];
 
-	linfo = lpage_info_slot(gfn, slot, level);
-	return &linfo->rmap_pde;
+	idx = gfn_to_index(gfn, slot->base_gfn, level);
+	return &slot->arch.rmap_pde[level - PT_DIRECTORY_LEVEL][idx];
 }
 
 /*
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8eacb2e..9136f2c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6313,6 +6313,10 @@  void kvm_arch_free_memslot(struct kvm_memory_slot *free,
 	int i;
 
 	for (i = 0; i < KVM_NR_PAGE_SIZES - 1; ++i) {
+		if (!dont || free->arch.rmap_pde[i] != dont->arch.rmap_pde[i]) {
+			kvm_kvfree(free->arch.rmap_pde[i]);
+			free->arch.rmap_pde[i] = NULL;
+		}
 		if (!dont || free->arch.lpage_info[i] != dont->arch.lpage_info[i]) {
 			kvm_kvfree(free->arch.lpage_info[i]);
 			free->arch.lpage_info[i] = NULL;
@@ -6332,6 +6336,11 @@  int kvm_arch_create_memslot(struct kvm_memory_slot *slot, unsigned long npages)
 		lpages = gfn_to_index(slot->base_gfn + npages - 1,
 				      slot->base_gfn, level) + 1;
 
+		slot->arch.rmap_pde[i] =
+			kvm_kvzalloc(lpages * sizeof(*slot->arch.rmap_pde[i]));
+		if (!slot->arch.rmap_pde[i])
+			goto out_free;
+
 		slot->arch.lpage_info[i] =
 			kvm_kvzalloc(lpages * sizeof(*slot->arch.lpage_info[i]));
 		if (!slot->arch.lpage_info[i])
@@ -6360,7 +6369,9 @@  int kvm_arch_create_memslot(struct kvm_memory_slot *slot, unsigned long npages)
 
 out_free:
 	for (i = 0; i < KVM_NR_PAGE_SIZES - 1; ++i) {
+		kvm_kvfree(slot->arch.rmap_pde[i]);
 		kvm_kvfree(slot->arch.lpage_info[i]);
+		slot->arch.rmap_pde[i] = NULL;
 		slot->arch.lpage_info[i] = NULL;
 	}
 	return -ENOMEM;