Patchwork [6/6] KVM: MMU: Avoid handling same rmap_pde in kvm_handle_hva_range()

login
register
mail settings
Submitter Takuya Yoshikawa
Date June 28, 2012, 2:02 a.m.
Message ID <20120628110235.7410445e.yoshikawa.takuya@oss.ntt.co.jp>
Download mbox | patch
Permalink /patch/167782/
State New
Headers show

Comments

Takuya Yoshikawa - June 28, 2012, 2:02 a.m.
When we invalidate a THP page, we call the handler with the same
rmap_pde argument 512 times in the following loop:

  for each guest page in the range
    for each level
      unmap using rmap

This patch avoids these extra handler calls by changing the loop order
like this:

  for each level
    for each rmap in the range
      unmap using rmap

With the preceding patches in the patch series, this made THP page
invalidation more than 5 times faster on our x86 host: the host became
more responsive during swapping the guest's memory as a result.

Note: in the new code we could not use trace_kvm_age_page(), so we just
dropped the point from kvm_handle_hva_range().

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
---
 arch/x86/kvm/mmu.c |   37 +++++++++++++++++++------------------
 1 files changed, 19 insertions(+), 18 deletions(-)
Avi Kivity - June 28, 2012, 5:53 p.m.
On 06/28/2012 05:02 AM, Takuya Yoshikawa wrote:
> When we invalidate a THP page, we call the handler with the same
> rmap_pde argument 512 times in the following loop:
> 
>   for each guest page in the range
>     for each level
>       unmap using rmap
> 
> This patch avoids these extra handler calls by changing the loop order
> like this:
> 
>   for each level
>     for each rmap in the range
>       unmap using rmap
> 
> With the preceding patches in the patch series, this made THP page
> invalidation more than 5 times faster on our x86 host: the host became
> more responsive during swapping the guest's memory as a result.
> 
> Note: in the new code we could not use trace_kvm_age_page(), so we just
> dropped the point from kvm_handle_hva_range().
> 

Can't it be pushed to handler()?
Takuya Yoshikawa - June 29, 2012, 1:46 a.m.
On Thu, 28 Jun 2012 20:53:47 +0300
Avi Kivity <avi@redhat.com> wrote:

> > Note: in the new code we could not use trace_kvm_age_page(), so we just
> > dropped the point from kvm_handle_hva_range().
> > 
> 
> Can't it be pushed to handler()?

Yes, but it will be changed to print rmap, not hva and gfn.
I will do in the next version.

Thanks,
	Takuya
--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Avi Kivity - July 1, 2012, 7:41 a.m.
On 06/29/2012 04:46 AM, Takuya Yoshikawa wrote:
> On Thu, 28 Jun 2012 20:53:47 +0300
> Avi Kivity <avi@redhat.com> wrote:
> 
>> > Note: in the new code we could not use trace_kvm_age_page(), so we just
>> > dropped the point from kvm_handle_hva_range().
>> > 
>> 
>> Can't it be pushed to handler()?
> 
> Yes, but it will be changed to print rmap, not hva and gfn.
> I will do in the next version.

We should try to avoid changing the tracepoint ABI.  If we pass the
memslot to handler(), we should be able to reconstruct the gfn and hva.
Takuya Yoshikawa - July 1, 2012, 1:18 p.m.
On Sun, 01 Jul 2012 10:41:05 +0300
Avi Kivity <avi@redhat.com> wrote:

> >> > Note: in the new code we could not use trace_kvm_age_page(), so we just
> >> > dropped the point from kvm_handle_hva_range().
> >> > 
> >> 
> >> Can't it be pushed to handler()?
> > 
> > Yes, but it will be changed to print rmap, not hva and gfn.
> > I will do in the next version.
> 
> We should try to avoid changing the tracepoint ABI.  If we pass the
> memslot to handler(), we should be able to reconstruct the gfn and hva.

That's not enough because we only know rmapp in the handler: we also need
level or something.  I agree that passing memslot for keeping the ABI is
good, but how about two more arguments?

BTW, trace_kvm_age_page() is tracing every handler, not just kvm_age_rmapp().
Is this intentional?

If we trace in the caller, in kvm_age_hva(), we know hva, but not gfn.

Thanks,
	Takuya
--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Avi Kivity - July 1, 2012, 1:31 p.m.
On 07/01/2012 04:18 PM, Takuya Yoshikawa wrote:
> On Sun, 01 Jul 2012 10:41:05 +0300
> Avi Kivity <avi@redhat.com> wrote:
> 
>> >> > Note: in the new code we could not use trace_kvm_age_page(), so we just
>> >> > dropped the point from kvm_handle_hva_range().
>> >> > 
>> >> 
>> >> Can't it be pushed to handler()?
>> > 
>> > Yes, but it will be changed to print rmap, not hva and gfn.
>> > I will do in the next version.
>> 
>> We should try to avoid changing the tracepoint ABI.  If we pass the
>> memslot to handler(), we should be able to reconstruct the gfn and hva.
> 
> That's not enough because we only know rmapp in the handler: we also need
> level or something.  I agree that passing memslot for keeping the ABI is
> good, but how about two more arguments?

I don't see a problem with that.  But it would be good to move all the
address transformations into tp_fast_assign so we only perform them if
the tracepoint is active.

> BTW, trace_kvm_age_page() is tracing every handler, not just kvm_age_rmapp().
> Is this intentional?

No, it's a bug.  Pushing the trace into the handler will fix it though.

> If we trace in the caller, in kvm_age_hva(), we know hva, but not gfn.

gfn is useful to correlate to later events with the same gfn.

Patch

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 477b3da..524f7c0 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1192,8 +1192,7 @@  static int kvm_handle_hva_range(struct kvm *kvm,
 					       unsigned long data))
 {
 	int j;
-	int ret;
-	int retval = 0;
+	int ret = 0;
 	struct kvm_memslots *slots;
 	struct kvm_memory_slot *memslot;
 
@@ -1201,7 +1200,7 @@  static int kvm_handle_hva_range(struct kvm *kvm,
 
 	kvm_for_each_memslot(memslot, slots) {
 		unsigned long hva_start, hva_end;
-		gfn_t gfn, gfn_end;
+		gfn_t gfn_start, gfn_end;
 
 		hva_start = max(start, memslot->userspace_addr);
 		hva_end = min(end, memslot->userspace_addr +
@@ -1210,29 +1209,31 @@  static int kvm_handle_hva_range(struct kvm *kvm,
 			continue;
 		/*
 		 * {gfn(page) | page intersects with [hva_start, hva_end)} =
-		 * {gfn, gfn+1, ..., gfn_end-1}.
+		 * {gfn_start, gfn_start+1, ..., gfn_end-1}.
 		 */
-		gfn = hva_to_gfn_memslot(hva_start, memslot);
+		gfn_start = hva_to_gfn_memslot(hva_start, memslot);
 		gfn_end = hva_to_gfn_memslot(hva_end + PAGE_SIZE - 1, memslot);
 
-		for (; gfn < gfn_end; ++gfn) {
-			ret = 0;
+		for (j = PT_PAGE_TABLE_LEVEL;
+		     j < PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES; ++j) {
+			unsigned long idx, idx_end;
+			unsigned long *rmapp;
 
-			for (j = PT_PAGE_TABLE_LEVEL;
-			     j < PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES; ++j) {
-				unsigned long *rmapp;
+			/*
+			 * {idx(page_j) | page_j intersects with
+			 *  [hva_start, hva_end)} ={idx, idx+1, ..., idx_end}.
+			 */
+			idx = gfn_to_index(gfn_start, memslot->base_gfn, j);
+			idx_end = gfn_to_index(gfn_end - 1, memslot->base_gfn, j);
 
-				rmapp = __gfn_to_rmap(gfn, j, memslot);
-				ret |= handler(kvm, rmapp, data);
-			}
-			trace_kvm_age_page(memslot->userspace_addr +
-					(gfn - memslot->base_gfn) * PAGE_SIZE,
-					memslot, ret);
-			retval |= ret;
+			rmapp = __gfn_to_rmap(gfn_start, j, memslot);
+
+			for (; idx <= idx_end; ++idx)
+				ret |= handler(kvm, rmapp++, data);
 		}
 	}
 
-	return retval;
+	return ret;
 }
 
 static int kvm_handle_hva(struct kvm *kvm, unsigned long hva,