Message ID | 20120628110235.7410445e.yoshikawa.takuya@oss.ntt.co.jp |
---|---
State | New, archived |
On 06/28/2012 05:02 AM, Takuya Yoshikawa wrote:
> When we invalidate a THP page, we call the handler with the same
> rmap_pde argument 512 times in the following loop:
>
>   for each guest page in the range
>     for each level
>       unmap using rmap
>
> This patch avoids these extra handler calls by changing the loop order
> like this:
>
>   for each level
>     for each rmap in the range
>       unmap using rmap
>
> With the preceding patches in the patch series, this made THP page
> invalidation more than 5 times faster on our x86 host: the host became
> more responsive during swapping the guest's memory as a result.
>
> Note: in the new code we could not use trace_kvm_age_page(), so we just
> dropped the point from kvm_handle_hva_range().

Can't it be pushed to handler()?
On Thu, 28 Jun 2012 20:53:47 +0300
Avi Kivity <avi@redhat.com> wrote:

> > Note: in the new code we could not use trace_kvm_age_page(), so we just
> > dropped the point from kvm_handle_hva_range().
>
> Can't it be pushed to handler()?

Yes, but it will be changed to print rmap, not hva and gfn.
I will do so in the next version.

Thanks,
	Takuya
On 06/29/2012 04:46 AM, Takuya Yoshikawa wrote:
> On Thu, 28 Jun 2012 20:53:47 +0300
> Avi Kivity <avi@redhat.com> wrote:
>
>>> Note: in the new code we could not use trace_kvm_age_page(), so we just
>>> dropped the point from kvm_handle_hva_range().
>>
>> Can't it be pushed to handler()?
>
> Yes, but it will be changed to print rmap, not hva and gfn.
> I will do so in the next version.

We should try to avoid changing the tracepoint ABI.  If we pass the
memslot to handler(), we should be able to reconstruct the gfn and hva.
On Sun, 01 Jul 2012 10:41:05 +0300
Avi Kivity <avi@redhat.com> wrote:

>>>> Note: in the new code we could not use trace_kvm_age_page(), so we just
>>>> dropped the point from kvm_handle_hva_range().
>>>
>>> Can't it be pushed to handler()?
>>
>> Yes, but it will be changed to print rmap, not hva and gfn.
>> I will do so in the next version.
>
> We should try to avoid changing the tracepoint ABI.  If we pass the
> memslot to handler(), we should be able to reconstruct the gfn and hva.

That's not enough because we only know rmapp in the handler: we also need
the level, or something equivalent.  I agree that passing the memslot to
keep the ABI is good, but how about adding two more arguments?

BTW, trace_kvm_age_page() is tracing every handler, not just
kvm_age_rmapp().  Is this intentional?

If we trace in the caller, in kvm_age_hva(), we know the hva but not
the gfn.

Thanks,
	Takuya
On 07/01/2012 04:18 PM, Takuya Yoshikawa wrote:
> On Sun, 01 Jul 2012 10:41:05 +0300
> Avi Kivity <avi@redhat.com> wrote:
>
>>>>> Note: in the new code we could not use trace_kvm_age_page(), so we just
>>>>> dropped the point from kvm_handle_hva_range().
>>>>
>>>> Can't it be pushed to handler()?
>>>
>>> Yes, but it will be changed to print rmap, not hva and gfn.
>>> I will do so in the next version.
>>
>> We should try to avoid changing the tracepoint ABI.  If we pass the
>> memslot to handler(), we should be able to reconstruct the gfn and hva.
>
> That's not enough because we only know rmapp in the handler: we also need
> the level, or something equivalent.  I agree that passing the memslot to
> keep the ABI is good, but how about adding two more arguments?

I don't see a problem with that.  But it would be good to move all the
address transformations into TP_fast_assign so we only perform them if
the tracepoint is active.

> BTW, trace_kvm_age_page() is tracing every handler, not just
> kvm_age_rmapp().  Is this intentional?

No, it's a bug.  Pushing the trace into the handler will fix it, though.

> If we trace in the caller, in kvm_age_hva(), we know the hva but not
> the gfn.

The gfn is useful for correlating later events with the same gfn.
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 477b3da..524f7c0 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1192,8 +1192,7 @@ static int kvm_handle_hva_range(struct kvm *kvm,
 					       unsigned long data))
 {
 	int j;
-	int ret;
-	int retval = 0;
+	int ret = 0;
 	struct kvm_memslots *slots;
 	struct kvm_memory_slot *memslot;

@@ -1201,7 +1200,7 @@ static int kvm_handle_hva_range(struct kvm *kvm,

 	kvm_for_each_memslot(memslot, slots) {
 		unsigned long hva_start, hva_end;
-		gfn_t gfn, gfn_end;
+		gfn_t gfn_start, gfn_end;

 		hva_start = max(start, memslot->userspace_addr);
 		hva_end = min(end, memslot->userspace_addr +
@@ -1210,29 +1209,31 @@ static int kvm_handle_hva_range(struct kvm *kvm,
 			continue;
 		/*
 		 * {gfn(page) | page intersects with [hva_start, hva_end)} =
-		 * {gfn, gfn+1, ..., gfn_end-1}.
+		 * {gfn_start, gfn_start+1, ..., gfn_end-1}.
 		 */
-		gfn = hva_to_gfn_memslot(hva_start, memslot);
+		gfn_start = hva_to_gfn_memslot(hva_start, memslot);
 		gfn_end = hva_to_gfn_memslot(hva_end + PAGE_SIZE - 1, memslot);

-		for (; gfn < gfn_end; ++gfn) {
-			ret = 0;
+		for (j = PT_PAGE_TABLE_LEVEL;
+		     j < PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES; ++j) {
+			unsigned long idx, idx_end;
+			unsigned long *rmapp;

-			for (j = PT_PAGE_TABLE_LEVEL;
-			     j < PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES; ++j) {
-				unsigned long *rmapp;
+			/*
+			 * {idx(page_j) | page_j intersects with
+			 *  [hva_start, hva_end)} = {idx, idx+1, ..., idx_end}.
+			 */
+			idx = gfn_to_index(gfn_start, memslot->base_gfn, j);
+			idx_end = gfn_to_index(gfn_end - 1, memslot->base_gfn, j);

-				rmapp = __gfn_to_rmap(gfn, j, memslot);
-				ret |= handler(kvm, rmapp, data);
-			}
-			trace_kvm_age_page(memslot->userspace_addr +
-					(gfn - memslot->base_gfn) * PAGE_SIZE,
-					memslot, ret);
-			retval |= ret;
+			rmapp = __gfn_to_rmap(gfn_start, j, memslot);
+
+			for (; idx <= idx_end; ++idx)
+				ret |= handler(kvm, rmapp++, data);
 		}
 	}

-	return retval;
+	return ret;
 }

 static int kvm_handle_hva(struct kvm *kvm, unsigned long hva,
When we invalidate a THP page, we call the handler with the same
rmap_pde argument 512 times in the following loop:

  for each guest page in the range
    for each level
      unmap using rmap

This patch avoids these extra handler calls by changing the loop order
like this:

  for each level
    for each rmap in the range
      unmap using rmap

With the preceding patches in the patch series, this made THP page
invalidation more than 5 times faster on our x86 host: the host became
more responsive during swapping the guest's memory as a result.

Note: in the new code we could not use trace_kvm_age_page(), so we just
dropped the point from kvm_handle_hva_range().

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
---
 arch/x86/kvm/mmu.c |   37 +++++++++++++++++++------------------
 1 files changed, 19 insertions(+), 18 deletions(-)