
[3/4] KVM: MMU: Make kvm_handle_hva() handle range of addresses

Message ID 20120615203230.2c577652.yoshikawa.takuya@oss.ntt.co.jp
State New, archived

Commit Message

Takuya Yoshikawa June 15, 2012, 11:32 a.m. UTC
When a guest's memory is backed by THP pages, the MMU notifier needs to
call kvm_unmap_hva(), which in turn leads to kvm_handle_hva(), in a loop
to invalidate the range of pages that constitutes one huge page:

  for each guest page
    for each memslot
      if page is in memslot
        unmap using rmap

This means that although every page in that range is expected to be found
in the same memslot, we are forced to check unrelated memslots many times.
The more memslots the guest has, the worse the situation becomes.

This patch, together with the following patch, solves this problem by
introducing kvm_handle_hva_range() which makes the loop look like this:

  for each memslot
    for each guest page in memslot
      unmap using rmap

With this new processing, the actual work becomes a loop over the rmap
array, which is much more cache friendly than before.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Cc: Alexander Graf <agraf@suse.de>
Cc: Paul Mackerras <paulus@samba.org>
---
 arch/powerpc/kvm/book3s_64_mmu_hv.c |   25 +++++++++++++++++++------
 arch/x86/kvm/mmu.c                  |   32 ++++++++++++++++++++++++--------
 2 files changed, 43 insertions(+), 14 deletions(-)
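
For context, the "for each guest page" level of the loop described above comes
from the MMU notifier side. Below is a simplified sketch of how
kvm_mmu_notifier_invalidate_range_start() drove it at the time (abridged; SRCU
and tlbs_dirty handling trimmed, so treat it as an illustration rather than the
exact upstream code):

  static void kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn,
						      struct mm_struct *mm,
						      unsigned long start,
						      unsigned long end)
  {
	struct kvm *kvm = mmu_notifier_to_kvm(mn);
	int need_tlb_flush = 0;

	spin_lock(&kvm->mmu_lock);
	kvm->mmu_notifier_count++;
	/* One kvm_unmap_hva() call per page: each call re-scans every memslot. */
	for (; start < end; start += PAGE_SIZE)
		need_tlb_flush |= kvm_unmap_hva(kvm, start);
	if (need_tlb_flush)
		kvm_flush_remote_tlbs(kvm);
	spin_unlock(&kvm->mmu_lock);
  }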

Comments

Avi Kivity June 18, 2012, 12:11 p.m. UTC | #1
On 06/15/2012 02:32 PM, Takuya Yoshikawa wrote:
> When a guest's memory is backed by THP pages, the MMU notifier needs to
> call kvm_unmap_hva(), which in turn leads to kvm_handle_hva(), in a loop
> to invalidate the range of pages that constitutes one huge page:
> 
>   for each guest page
>     for each memslot
>       if page is in memslot
>         unmap using rmap
> 
> This means that although every page in that range is expected to be found
> in the same memslot, we are forced to check unrelated memslots many times.
> The more memslots the guest has, the worse the situation becomes.
> 
> This patch, together with the following patch, solves this problem by
> introducing kvm_handle_hva_range() which makes the loop look like this:
> 
>   for each memslot
>     for each guest page in memslot
>       unmap using rmap
> 
> With this new processing, the actual work becomes a loop over the rmap
> array, which is much more cache friendly than before.


Moreover, if the pages are in no slot (munmap of some non-guest memory),
then we're iterating over all those pages for no purpose.

> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index ba57b3b..3629f9b 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -1185,10 +1185,13 @@ static int kvm_set_pte_rmapp(struct kvm *kvm, unsigned long *rmapp,
>  	return 0;
>  }
>  
> -static int kvm_handle_hva(struct kvm *kvm, unsigned long hva,
> -			  unsigned long data,
> -			  int (*handler)(struct kvm *kvm, unsigned long *rmapp,
> -					 unsigned long data))
> +static int kvm_handle_hva_range(struct kvm *kvm,
> +				unsigned long start_hva,
> +				unsigned long end_hva,
> +				unsigned long data,
> +				int (*handler)(struct kvm *kvm,
> +					       unsigned long *rmapp,
> +					       unsigned long data))
>  {
>  	int j;
>  	int ret;
> @@ -1199,10 +1202,13 @@ static int kvm_handle_hva(struct kvm *kvm, unsigned long hva,
>  	slots = kvm_memslots(kvm);
>  
>  	kvm_for_each_memslot(memslot, slots) {
> -		gfn_t gfn = hva_to_gfn(hva, memslot);
> +		gfn_t gfn = hva_to_gfn(start_hva, memslot);
> +		gfn_t end_gfn = hva_to_gfn(end_hva, memslot);

These will return random results which you then use in min/max later, no?

> +
> +		gfn = max(gfn, memslot->base_gfn);
> +		end_gfn = min(end_gfn, memslot->base_gfn + memslot->npages);
>  
> -		if (gfn >= memslot->base_gfn &&
> -		    gfn < memslot->base_gfn + memslot->npages) {
> +		for (; gfn < end_gfn; gfn++) {
>  			ret = 0;
>  
>  			for (j = PT_PAGE_TABLE_LEVEL;
> @@ -1212,7 +1218,9 @@ static int kvm_handle_hva(struct kvm *kvm, unsigned long hva,
>  				rmapp = __gfn_to_rmap(gfn, j, memslot);
>  				ret |= handler(kvm, rmapp, data);

Potential for improvement: don't do 512 iterations on same large page.

Something like

    if ((gfn ^ prev_gfn) & mask(level))
        ret |= handler(...)

with clever selection of the first prev_gfn so it always matches (~gfn
maybe).
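
A rough illustration of that skip inside the gfn loop (an editorial sketch, not
code from this series; prev_gfn is a hypothetical local, and
KVM_HPAGE_GFN_SHIFT(level) stands in for mask(level)):

  gfn_t prev_gfn = ~gfn;	/* differs in every bit, so the first pass never skips */

  for (; gfn < end_gfn; gfn++) {
	ret = 0;

	for (j = PT_PAGE_TABLE_LEVEL;
	     j < PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES; ++j) {
		unsigned long *rmapp;

		/*
		 * gfns inside the same large page share one rmap head at
		 * this level; only call the handler when gfn crosses into
		 * a new large-page-aligned region.
		 */
		if (!((gfn ^ prev_gfn) >> KVM_HPAGE_GFN_SHIFT(j)))
			continue;

		rmapp = __gfn_to_rmap(gfn, j, memslot);
		ret |= handler(kvm, rmapp, data);
	}

	prev_gfn = gfn;
	retval |= ret;
  }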
Takuya Yoshikawa June 18, 2012, 1:20 p.m. UTC | #2
On Mon, 18 Jun 2012 15:11:42 +0300
Avi Kivity <avi@redhat.com> wrote:


> >  	kvm_for_each_memslot(memslot, slots) {
> > -		gfn_t gfn = hva_to_gfn(hva, memslot);
> > +		gfn_t gfn = hva_to_gfn(start_hva, memslot);
> > +		gfn_t end_gfn = hva_to_gfn(end_hva, memslot);
> 
> These will return random results which you then use in min/max later, no?

Yes, I will follow your advice: check-then-convert (or check-and-convert).

> > @@ -1212,7 +1218,9 @@ static int kvm_handle_hva(struct kvm *kvm, unsigned long hva,
> >  				rmapp = __gfn_to_rmap(gfn, j, memslot);
> >  				ret |= handler(kvm, rmapp, data);
> 
> Potential for improvement: don't do 512 iterations on same large page.
> 
> Something like
> 
>     if ((gfn ^ prev_gfn) & mask(level))
>         ret |= handler(...)
> 
> with clever selection of the first prev_gfn so it always matches (~gfn
> maybe).

Really nice.
I'm sure that will make this much faster!

Thanks,
	Takuya
Takuya Yoshikawa June 19, 2012, 1:46 p.m. UTC | #3
On Mon, 18 Jun 2012 15:11:42 +0300
Avi Kivity <avi@redhat.com> wrote:

> Potential for improvement: don't do 512 iterations on same large page.
> 
> Something like
> 
>     if ((gfn ^ prev_gfn) & mask(level))
>         ret |= handler(...)
> 
> with clever selection of the first prev_gfn so it always matches (~gfn
> maybe).


I thought up a better solution:

1. Separate rmap_pde from lpage_info->write_count and
   make this a simple array. (I once tried this.)

2. Use gfn_to_index() and loop over rmap array:
 ...
  /* intersection check */
  start = max(start, memslot->userspace_addr);
  end = min(end, memslot->userspace_addr +
                 (memslot->npages << PAGE_SHIFT));
  if (start > end)
      continue;

  /* hva to gfn conversion */
  gfn_start = hva_to_gfn_memslot(start);
  gfn_end = hva_to_gfn_memslot(end);

  /* main part */
  for each level {
      rmapp = __gfn_to_rmap(gfn_start, level, memslot);
      for (idx = gfn_to_index(gfn_start, memslot->base_gfn, level);
           idx < gfn_to_index(gfn_end, memslot->base_gfn, level); idx++) {
              ...
          /* loop over rmap array */
          ret |= handler(kvm, rmapp + idx, data);
      }
  }


Thanks,
	Takuya
Avi Kivity June 21, 2012, 8:24 a.m. UTC | #4
On 06/19/2012 04:46 PM, Takuya Yoshikawa wrote:
> On Mon, 18 Jun 2012 15:11:42 +0300
> Avi Kivity <avi@redhat.com> wrote:
> 
>> Potential for improvement: don't do 512 iterations on same large page.
>> 
>> Something like
>> 
>>     if ((gfn ^ prev_gfn) & mask(level))
>>         ret |= handler(...)
>> 
>> with clever selection of the first prev_gfn so it always matches (~gfn
>> maybe).
> 
> 
> I thought up a better solution:
> 
> 1. Separate rmap_pde from lpage_info->write_count and
>    make this a simple array. (I once tried this.)
> 

This has the potential to increase cache misses, but I don't think it's
a killer.  The separation can simplify other things as well.


> 2. Use gfn_to_index() and loop over rmap array:
>  ...
>   /* intersection check */
>   start = max(start, memslot->userspace_addr);
>   end = min(end, memslot->userspace_addr +
>                  (memslot->npages << PAGE_SHIFT));
>   if (start > end)
>       continue;
> 
>   /* hva to gfn conversion */
>   gfn_start = hva_to_gfn_memslot(start);
>   gfn_end = hva_to_gfn_memslot(end);
> 
>   /* main part */
>   for each level {
>       rmapp = __gfn_to_rmap(gfn_start, level, memslot);
>       for (idx = gfn_to_index(gfn_start, memslot->base_gfn, level);
>            idx < gfn_to_index(gfn_end, memslot->base_gfn, level); idx++) {
>               ...
>           /* loop over rmap array */
>           ret |= handler(kvm, rmapp + idx, data);
>       }
>   }
> 

Probably want idx <= gfn_to_index(gfn_end-1, ...) otherwise we fail on
small slots.
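
Combining that outline with the off-by-one fix, one possible shape of the x86
inner loop (an editorial sketch which assumes rmap_pde has been split out into
a flat per-level array and reuses the series' hva_to_gfn() helper; retval and
tracepoint bookkeeping omitted):

  kvm_for_each_memslot(memslot, slots) {
	unsigned long hva_start, hva_end;
	gfn_t gfn_start, gfn_end;

	/* Intersect [start, end) with the memslot in hva space first... */
	hva_start = max(start, memslot->userspace_addr);
	hva_end = min(end, memslot->userspace_addr +
			   (memslot->npages << PAGE_SHIFT));
	if (hva_start >= hva_end)
		continue;

	/* ...then convert only addresses known to lie inside the slot. */
	gfn_start = hva_to_gfn(hva_start, memslot);
	gfn_end = hva_to_gfn(hva_end - 1, memslot) + 1;	/* exclusive */

	for (j = PT_PAGE_TABLE_LEVEL;
	     j < PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES; ++j) {
		unsigned long idx, idx_end;
		unsigned long *rmapp;

		/*
		 * idx_end is computed from gfn_end - 1 so the last large
		 * page is included even when the whole range fits inside
		 * a single one (the "small slots" case).
		 */
		idx = gfn_to_index(gfn_start, memslot->base_gfn, j);
		idx_end = gfn_to_index(gfn_end - 1, memslot->base_gfn, j);

		rmapp = __gfn_to_rmap(gfn_start, j, memslot);
		for (; idx <= idx_end; ++idx)
			ret |= handler(kvm, rmapp++, data);
	}
  }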
Takuya Yoshikawa June 21, 2012, 1:41 p.m. UTC | #5
I should have read this before sending v2...

On Thu, 21 Jun 2012 11:24:59 +0300
Avi Kivity <avi@redhat.com> wrote:

> > 1. Separate rmap_pde from lpage_info->write_count and
> >    make this a simple array. (I once tried this.)
> > 
> 
> This has the potential to increase cache misses, but I don't think it's
> a killer.  The separation can simplify other things as well.

Yes, I think so too.

IIRC, write_count and rmap_pde are not used together that often.

> > 2. Use gfn_to_index() and loop over rmap array:

...

> >   /* main part */
> >   for each level {
> >       rmapp = __gfn_to_rmap(gfn_start, level, memslot);
> >       for (idx = gfn_to_index(gfn_start, memslot->base_gfn, level);
> >            idx < gfn_to_index(gfn_end, memslot->base_gfn, level); idx++) {
> >               ...
> >           /* loop over rmap array */
> >           ret |= handler(kvm, rmapp + idx, data);
> >       }
> >   }
> > 
> 
> Probably want idx <= gfn_to_index(gfn_end-1, ...) otherwise we fail on
> small slots.

I was thinking the same thing when making v2.
But I will check the boundary condition again.

(mmu_notifier + memslot + lpage + rmap...) * alignment...
Very confusing.

Thanks,
	Takuya

Patch

diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 53716dd..97465ba 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -756,9 +756,12 @@  int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
 	goto out_put;
 }
 
-static int kvm_handle_hva(struct kvm *kvm, unsigned long hva,
-			  int (*handler)(struct kvm *kvm, unsigned long *rmapp,
-					 unsigned long gfn))
+static int kvm_handle_hva_range(struct kvm *kvm,
+				unsigned long start_hva,
+				unsigned long end_hva,
+				int (*handler)(struct kvm *kvm,
+					       unsigned long *rmapp,
+					       unsigned long gfn))
 {
 	int ret;
 	int retval = 0;
@@ -767,10 +770,13 @@  static int kvm_handle_hva(struct kvm *kvm, unsigned long hva,
 
 	slots = kvm_memslots(kvm);
 	kvm_for_each_memslot(memslot, slots) {
-		gfn_t gfn = hva_to_gfn(hva, memslot);
+		gfn_t gfn = hva_to_gfn(start_hva, memslot);
+		gfn_t end_gfn = hva_to_gfn(end_hva, memslot);
 
-		if (gfn >= memslot->base_gfn &&
-		    gfn < memslot->base_gfn + memslot->npages) {
+		gfn = max(gfn, memslot->base_gfn);
+		end_gfn = min(end_gfn, memslot->base_gfn + memslot->npages);
+
+		for (; gfn < end_gfn; gfn++) {
 			gfn_t gfn_offset = gfn - memslot->base_gfn;
 
 			ret = handler(kvm, &memslot->rmap[gfn_offset], gfn);
@@ -781,6 +787,13 @@  static int kvm_handle_hva(struct kvm *kvm, unsigned long hva,
 	return retval;
 }
 
+static int kvm_handle_hva(struct kvm *kvm, unsigned long hva,
+			  int (*handler)(struct kvm *kvm, unsigned long *rmapp,
+					 unsigned long gfn))
+{
+	return kvm_handle_hva_range(kvm, hva, hva + PAGE_SIZE, handler);
+}
+
 static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp,
 			   unsigned long gfn)
 {
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index ba57b3b..3629f9b 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1185,10 +1185,13 @@  static int kvm_set_pte_rmapp(struct kvm *kvm, unsigned long *rmapp,
 	return 0;
 }
 
-static int kvm_handle_hva(struct kvm *kvm, unsigned long hva,
-			  unsigned long data,
-			  int (*handler)(struct kvm *kvm, unsigned long *rmapp,
-					 unsigned long data))
+static int kvm_handle_hva_range(struct kvm *kvm,
+				unsigned long start_hva,
+				unsigned long end_hva,
+				unsigned long data,
+				int (*handler)(struct kvm *kvm,
+					       unsigned long *rmapp,
+					       unsigned long data))
 {
 	int j;
 	int ret;
@@ -1199,10 +1202,13 @@  static int kvm_handle_hva(struct kvm *kvm, unsigned long hva,
 	slots = kvm_memslots(kvm);
 
 	kvm_for_each_memslot(memslot, slots) {
-		gfn_t gfn = hva_to_gfn(hva, memslot);
+		gfn_t gfn = hva_to_gfn(start_hva, memslot);
+		gfn_t end_gfn = hva_to_gfn(end_hva, memslot);
+
+		gfn = max(gfn, memslot->base_gfn);
+		end_gfn = min(end_gfn, memslot->base_gfn + memslot->npages);
 
-		if (gfn >= memslot->base_gfn &&
-		    gfn < memslot->base_gfn + memslot->npages) {
+		for (; gfn < end_gfn; gfn++) {
 			ret = 0;
 
 			for (j = PT_PAGE_TABLE_LEVEL;
@@ -1212,7 +1218,9 @@  static int kvm_handle_hva(struct kvm *kvm, unsigned long hva,
 				rmapp = __gfn_to_rmap(gfn, j, memslot);
 				ret |= handler(kvm, rmapp, data);
 			}
-			trace_kvm_age_page(hva, memslot, ret);
+			trace_kvm_age_page(memslot->userspace_addr +
+					(gfn - memslot->base_gfn) * PAGE_SIZE,
+					memslot, ret);
 			retval |= ret;
 		}
 	}
@@ -1220,6 +1228,14 @@  static int kvm_handle_hva(struct kvm *kvm, unsigned long hva,
 	return retval;
 }
 
+static int kvm_handle_hva(struct kvm *kvm, unsigned long hva,
+			  unsigned long data,
+			  int (*handler)(struct kvm *kvm, unsigned long *rmapp,
+					 unsigned long data))
+{
+	return kvm_handle_hva_range(kvm, hva, hva + PAGE_SIZE, data, handler);
+}
+
 int kvm_unmap_hva(struct kvm *kvm, unsigned long hva)
 {
 	return kvm_handle_hva(kvm, hva, 0, kvm_unmap_rmapp);