From patchwork Thu Apr 4 05:58:03 2013
From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
To: benh@kernel.crashing.org, paulus@samba.org
Cc: linux-mm@kvack.org, linuxppc-dev@lists.ozlabs.org,
 "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Subject: [PATCH -V5 25/25] powerpc: Handle hugepages in kvm
Date: Thu, 4 Apr 2013 11:28:03 +0530
Message-Id: <1365055083-31956-26-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
In-Reply-To: <1365055083-31956-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
References: <1365055083-31956-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>
X-Mailer: git-send-email 1.7.10
X-Patchwork-Id: 233646
X-Patchwork-Delegate: michael@ellerman.id.au

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>

We could possibly avoid some of these changes, because most of the huge
PMD bits map to the corresponding PTE bits.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
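Most of the pmd-specific plumbing below exists because the hypervisor now
has to read and update huge PMDs as well as PTEs. As the commit message
notes, much of it could be avoided if the huge-PMD flag bits sit in the
same positions as the PTE flag bits, since the existing pte helpers could
then operate on the raw value of a cast pmd. A minimal stand-alone C
sketch of that idea; the MY_* bit positions are made-up examples, not the
real ppc64 definitions:

	#include <assert.h>

	/* assumed, illustrative bit positions -- not the real ppc64 values */
	#define MY_PAGE_ACCESSED	0x100UL	/* PTE referenced bit */
	#define MY_PMD_ACCESSED		0x100UL	/* huge-PMD referenced bit */

	/* one helper can serve both levels while the layouts agree */
	static unsigned long raw_mkyoung(unsigned long val)
	{
		return val | MY_PAGE_ACCESSED;
	}

	int main(void)
	{
		/* the shortcut is only sound while the layouts stay in sync */
		assert(MY_PAGE_ACCESSED == MY_PMD_ACCESSED);
		return raw_mkyoung(0) == MY_PMD_ACCESSED ? 0 : 1;
	}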
 arch/powerpc/include/asm/kvm_book3s_64.h | 31 ++++++++++++
 arch/powerpc/kvm/book3s_64_mmu_hv.c      | 12 ++++-
 arch/powerpc/kvm/book3s_hv_rm_mmu.c      | 75 ++++++++++++++++++++--------
 3 files changed, 97 insertions(+), 21 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index 38bec1d..1c5c799 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -110,6 +110,7 @@ static inline unsigned long compute_tlbie_rb(unsigned long v, unsigned long r,
 	return rb;
 }
 
+/* FIXME!! should we use hpte_actual_psize or hpte decode? */
 static inline unsigned long hpte_page_size(unsigned long h, unsigned long l)
 {
 	/* only handle 4k, 64k and 16M pages for now */
@@ -189,6 +190,36 @@ static inline pte_t kvmppc_read_update_linux_pte(pte_t *p, int writing)
 	return pte;
 }
 
+/*
+ * Lock and read a linux hugepage PMD.  If it's present and writable, atomically
+ * set dirty and referenced bits and return the PMD, otherwise return 0.
+ */
+static inline pmd_t kvmppc_read_update_linux_hugepmd(pmd_t *p, int writing)
+{
+	pmd_t pmd, tmp;
+
+	/* wait until PMD_HUGE_BUSY is clear, then set it atomically */
+	__asm__ __volatile__ (
+		"1:	ldarx	%0,0,%3\n"
+		"	andi.	%1,%0,%4\n"
+		"	bne-	1b\n"
+		"	ori	%1,%0,%4\n"
+		"	stdcx.	%1,0,%3\n"
+		"	bne-	1b"
+		: "=&r" (pmd), "=&r" (tmp), "=m" (*p)
+		: "r" (p), "i" (PMD_HUGE_BUSY)
+		: "cc");
+
+	if (pmd_large(pmd)) {
+		pmd = pmd_mkyoung(pmd);
+		if (writing && pmd_write(pmd))
+			pmd = pmd_mkdirty(pmd);
+	}
+
+	*p = pmd;	/* clears PMD_HUGE_BUSY */
+	return pmd;
+}
+
 /* Return HPTE cache control bits corresponding to Linux pte bits */
 static inline unsigned long hpte_cache_bits(unsigned long pte_val)
 {
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 4f2a7dc..da006da 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -675,6 +675,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
 	}
 	/* if the guest wants write access, see if that is OK */
 	if (!writing && hpte_is_writable(r)) {
+		int hugepage;
 		pte_t *ptep, pte;
 
 		/*
@@ -683,11 +684,18 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
 		 */
 		rcu_read_lock_sched();
 		ptep = find_linux_pte_or_hugepte(current->mm->pgd,
-						 hva, NULL, NULL);
-		if (ptep && pte_present(*ptep)) {
+						 hva, NULL, &hugepage);
+		if (!hugepage && ptep && pte_present(*ptep)) {
 			pte = kvmppc_read_update_linux_pte(ptep, 1);
 			if (pte_write(pte))
 				write_ok = 1;
+		} else if (hugepage && ptep) {
+			pmd_t pmd = *(pmd_t *)ptep;
+			if (pmd_large(pmd)) {
+				pmd = kvmppc_read_update_linux_hugepmd((pmd_t *)ptep, 1);
+				if (pmd_write(pmd))
+					write_ok = 1;
+			}
 		}
 		rcu_read_unlock_sched();
 	}
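The ldarx/stdcx. loop in kvmppc_read_update_linux_hugepmd above follows the
same busy-bit locking protocol as the existing PTE variant: spin until the
busy bit is clear, set it atomically to take ownership of the entry, update
the referenced/dirty bits, then publish the result with a plain store that
also drops the busy bit. A rough userspace C rendition of that protocol,
using GCC atomic builtins; BUSY_BIT, lock_entry and unlock_entry are
stand-ins for the real PMD_HUGE_BUSY machinery, not kernel code:

	#include <stdint.h>

	#define BUSY_BIT (1UL << 9)	/* stand-in for PMD_HUGE_BUSY */

	/* spin until BUSY_BIT is clear, then set it atomically; returns the
	 * entry as it was before we locked it, like %0 after the ldarx */
	static uint64_t lock_entry(uint64_t *p)
	{
		uint64_t old;

		do {
			old = __atomic_load_n(p, __ATOMIC_RELAXED);
		} while ((old & BUSY_BIT) ||
			 !__atomic_compare_exchange_n(p, &old, old | BUSY_BIT,
						      1 /* weak */,
						      __ATOMIC_ACQUIRE,
						      __ATOMIC_RELAXED));
		return old;
	}

	/* a plain release store of the updated entry also clears BUSY_BIT,
	 * which is what "*p = pmd" does at the end of the kernel helper */
	static void unlock_entry(uint64_t *p, uint64_t newval)
	{
		__atomic_store_n(p, newval & ~BUSY_BIT, __ATOMIC_RELEASE);
	}

	int main(void)
	{
		uint64_t entry = 0x100;		/* some present entry */
		uint64_t val = lock_entry(&entry);

		val |= 0x1;			/* e.g. mark it referenced */
		unlock_entry(&entry, val);
		return entry == 0x101 ? 0 : 1;
	}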
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index 7c8e1ed..e9d4e3a 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -146,24 +146,37 @@ static void remove_revmap_chain(struct kvm *kvm, long pte_index,
 }
 
 static pte_t lookup_linux_pte(pgd_t *pgdir, unsigned long hva,
-			      int writing, unsigned long *pte_sizep)
+			      int writing, unsigned long *pte_sizep,
+			      int *hugepage)
 {
 	pte_t *ptep;
 	unsigned long ps = *pte_sizep;
 	unsigned int shift;
 
-	ptep = find_linux_pte_or_hugepte(pgdir, hva, &shift, NULL);
+	ptep = find_linux_pte_or_hugepte(pgdir, hva, &shift, hugepage);
 	if (!ptep)
 		return __pte(0);
-	if (shift)
-		*pte_sizep = 1ul << shift;
-	else
-		*pte_sizep = PAGE_SIZE;
+	if (*hugepage) {
+		*pte_sizep = 1ul << 24;	/* hugepages are always 16MB here */
+	} else {
+		if (shift)
+			*pte_sizep = 1ul << shift;
+		else
+			*pte_sizep = PAGE_SIZE;
+	}
 	if (ps > *pte_sizep)
 		return __pte(0);
+
+	if (*hugepage) {
+		pmd_t *pmdp = (pmd_t *)ptep;
+		if (!pmd_large(*pmdp))
+			return __pmd(0);
+		return kvmppc_read_update_linux_hugepmd(pmdp, writing);
+	} else {
+		if (!pte_present(*ptep))
+			return __pte(0);
+		return kvmppc_read_update_linux_pte(ptep, writing);
+	}
 }
 
 static inline void unlock_hpte(unsigned long *hpte, unsigned long hpte_v)
@@ -239,18 +252,34 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags,
 		pte_size = PAGE_SIZE << (pa & KVMPPC_PAGE_ORDER_MASK);
 		pa &= PAGE_MASK;
 	} else {
+		int hugepage;
+
 		/* Translate to host virtual address */
 		hva = __gfn_to_hva_memslot(memslot, gfn);
 
 		/* Look up the Linux PTE for the backing page */
 		pte_size = psize;
-		pte = lookup_linux_pte(pgdir, hva, writing, &pte_size);
-		if (pte_present(pte)) {
-			if (writing && !pte_write(pte))
-				/* make the actual HPTE be read-only */
-				ptel = hpte_make_readonly(ptel);
-			is_io = hpte_cache_bits(pte_val(pte));
-			pa = pte_pfn(pte) << PAGE_SHIFT;
+		pte = lookup_linux_pte(pgdir, hva, writing, &pte_size, &hugepage);
+		if (hugepage) {
+			pmd_t pmd = (pmd_t)pte;
+			if (pmd_large(pmd)) {
+				if (writing && !pmd_write(pmd))
+					/* make the actual HPTE be read-only */
+					ptel = hpte_make_readonly(ptel);
+				/*
+				 * we support hugepages only for RAM
+				 */
+				is_io = 0;
+				pa = pmd_pfn(pmd) << PAGE_SHIFT;
+			}
+		} else {
+			if (pte_present(pte)) {
+				if (writing && !pte_write(pte))
+					/* make the actual HPTE be read-only */
+					ptel = hpte_make_readonly(ptel);
+				is_io = hpte_cache_bits(pte_val(pte));
+				pa = pte_pfn(pte) << PAGE_SHIFT;
+			}
 		}
 	}
 
@@ -645,10 +674,18 @@ long kvmppc_h_protect(struct kvm_vcpu *vcpu, unsigned long flags,
 			gfn = ((r & HPTE_R_RPN) & ~(psize - 1)) >> PAGE_SHIFT;
 			memslot = __gfn_to_memslot(kvm_memslots(kvm), gfn);
 			if (memslot) {
+				int hugepage;
 				hva = __gfn_to_hva_memslot(memslot, gfn);
-				pte = lookup_linux_pte(pgdir, hva, 1, &psize);
-				if (pte_present(pte) && !pte_write(pte))
-					r = hpte_make_readonly(r);
+				pte = lookup_linux_pte(pgdir, hva, 1,
+						       &psize, &hugepage);
+				if (hugepage) {
+					pmd_t pmd = (pmd_t)pte;
+					if (pmd_large(pmd) && !pmd_write(pmd))
+						r = hpte_make_readonly(r);
+				} else {
+					if (pte_present(pte) && !pte_write(pte))
+						r = hpte_make_readonly(r);
+				}
 			}
 		}
 	}
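One detail worth calling out in lookup_linux_pte() above: a hugepage
backing is assumed to always be the 16MB size, hence the hard-coded
1ul << 24, and the lookup fails unless the backing page is at least as
large as the page size requested for the HPTE (the "ps > *pte_sizep"
check). A stand-alone sketch of that size check; backing_size and the 4K
MY_PAGE_SIZE are illustrative assumptions, not the kernel definitions:

	#include <stdio.h>

	#define MY_PAGE_SIZE 4096UL

	/* mirror of the *pte_sizep computation in lookup_linux_pte() */
	static unsigned long backing_size(int hugepage, unsigned int shift)
	{
		if (hugepage)
			return 1UL << 24;	/* 16MB hugepage */
		return shift ? 1UL << shift : MY_PAGE_SIZE;
	}

	int main(void)
	{
		unsigned long requested = 1UL << 24;	/* guest asks for 16MB */

		/* lookup_linux_pte() returns an empty PTE when the request
		 * is larger than the backing page (ps > *pte_sizep) */
		printf("16M backing ok: %d\n", backing_size(1, 0) >= requested);
		printf("64K backing ok: %d\n", backing_size(0, 16) >= requested);
		return 0;
	}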