From patchwork Fri Mar 8 16:34:51 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Leonardo Bras X-Patchwork-Id: 1053596 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=lists.ubuntu.com (client-ip=91.189.94.19; helo=huckleberry.canonical.com; envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=linux.ibm.com Received: from huckleberry.canonical.com (huckleberry.canonical.com [91.189.94.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 44GCnv1nF2z9s9y; Sat, 9 Mar 2019 03:37:07 +1100 (AEDT) Received: from localhost ([127.0.0.1] helo=huckleberry.canonical.com) by huckleberry.canonical.com with esmtp (Exim 4.86_2) (envelope-from ) id 1h2IUX-0004dk-Vq; Fri, 08 Mar 2019 16:37:01 +0000 Received: from mx0b-001b2d01.pphosted.com ([148.163.158.5] helo=mx0a-001b2d01.pphosted.com) by huckleberry.canonical.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1h2IUV-0004cm-Jo for kernel-team@lists.ubuntu.com; Fri, 08 Mar 2019 16:36:59 +0000 Received: from pps.filterd (m0098421.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.16.0.27/8.16.0.27) with SMTP id x28GZVPu138001 for ; Fri, 8 Mar 2019 11:36:58 -0500 Received: from e13.ny.us.ibm.com (e13.ny.us.ibm.com [129.33.205.203]) by mx0a-001b2d01.pphosted.com with ESMTP id 2r3s7fgqxs-1 (version=TLSv1.2 cipher=AES256-GCM-SHA384 bits=256 verify=NOT) for ; Fri, 08 Mar 2019 11:36:58 -0500 Received: from localhost by e13.ny.us.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted for from ; Fri, 8 Mar 2019 16:36:57 -0000 Received: from b01cxnp22034.gho.pok.ibm.com (9.57.198.24) by e13.ny.us.ibm.com (146.89.104.200) with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted; (version=TLSv1/SSLv3 cipher=AES256-GCM-SHA384 bits=256/256) Fri, 8 Mar 2019 16:36:53 -0000 Received: from b01ledav005.gho.pok.ibm.com (b01ledav005.gho.pok.ibm.com [9.57.199.110]) by b01cxnp22034.gho.pok.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id x28GaqRP22413562 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Fri, 8 Mar 2019 16:36:53 GMT Received: from b01ledav005.gho.pok.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id D8B78AE067; Fri, 8 Mar 2019 16:36:52 +0000 (GMT) Received: from b01ledav005.gho.pok.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id BB4BEAE062; Fri, 8 Mar 2019 16:36:47 +0000 (GMT) Received: from LeoBras.ibmmodules.com (unknown [9.85.169.115]) by b01ledav005.gho.pok.ibm.com (Postfix) with ESMTP; Fri, 8 Mar 2019 16:36:45 +0000 (GMT) From: Leonardo Bras To: kernel-team@lists.ubuntu.com Subject: [PATCH 05/20] KVM: PPC: Book3S HV: Use __gfn_to_pfn_memslot() in page fault handler Date: Fri, 8 Mar 2019 13:34:51 -0300 X-Mailer: git-send-email 2.20.1 In-Reply-To: <20190308163505.13021-1-leonardo@linux.ibm.com> References: <20190308163505.13021-1-leonardo@linux.ibm.com> MIME-Version: 1.0 X-TM-AS-GCONF: 00 x-cbid: 19030816-0064-0000-0000-000003B646A8 X-IBM-SpamModules-Scores: X-IBM-SpamModules-Versions: BY=3.00010727; HX=3.00000242; KW=3.00000007; PH=3.00000004; SC=3.00000281; SDB=6.01171458; UDB=6.00612324; IPR=6.00952101; MB=3.00025894; MTD=3.00000008; XFM=3.00000015; UTC=2019-03-08 16:36:55 X-IBM-AV-DETECTION: SAVI=unused REMOTE=unused XFE=unused x-cbparentid: 19030816-0065-0000-0000-00003CA7CD35 Message-Id: <20190308163505.13021-6-leonardo@linux.ibm.com> X-Proofpoint-Virus-Version: vendor=fsecure engine=2.50.10434:, , definitions=2019-03-08_14:, , signatures=0 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 priorityscore=1501 malwarescore=0 suspectscore=1 phishscore=0 bulkscore=0 spamscore=0 clxscore=1015 lowpriorityscore=0 mlxscore=0 impostorscore=0 mlxlogscore=599 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1810050000 definitions=main-1903080115 X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Kernel team discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Paul Mackerras Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: "kernel-team" From: Paul Mackerras This changes the hypervisor page fault handler for radix guests to use the generic KVM __gfn_to_pfn_memslot() function instead of using get_user_pages_fast() and then handling the case of VM_PFNMAP vmas specially. The old code missed the case of VM_IO vmas; with this change, VM_IO vmas will now be handled correctly by code within __gfn_to_pfn_memslot. Currently, __gfn_to_pfn_memslot calls hva_to_pfn, which only uses __get_user_pages_fast for the initial lookup in the cases where either atomic or async is set. Since we are not setting either atomic or async, we do our own __get_user_pages_fast first, for now. This also adds code to check for the KVM_MEM_READONLY flag on the memslot. If it is set and this is a write access, we synthesize a data storage interrupt for the guest. In the case where the page is not normal RAM (i.e. page == NULL in kvmppc_book3s_radix_page_fault(), we read the PTE from the Linux page tables because we need the mapping attribute bits as well as the PFN. (The mapping attribute bits indicate whether accesses have to be non-cacheable and/or guarded.) Signed-off-by: Paul Mackerras --- arch/powerpc/kvm/book3s_64_mmu_radix.c | 148 +++++++++++++++---------- 1 file changed, 88 insertions(+), 60 deletions(-) diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c index 233b3d169121..a57eafec4dc2 100644 --- a/arch/powerpc/kvm/book3s_64_mmu_radix.c +++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c @@ -395,11 +395,11 @@ int kvmppc_book3s_radix_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu, unsigned long mmu_seq, pte_size; unsigned long gpa, gfn, hva, pfn; struct kvm_memory_slot *memslot; - struct page *page = NULL, *pages[1]; - long ret, npages; - unsigned int writing; - struct vm_area_struct *vma; - unsigned long flags; + struct page *page = NULL; + long ret; + bool writing; + bool upgrade_write = false; + bool *upgrade_p = &upgrade_write; pte_t pte, *ptep; unsigned long pgflags; unsigned int shift, level; @@ -439,12 +439,17 @@ int kvmppc_book3s_radix_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu, dsisr & DSISR_ISSTORE); } - /* used to check for invalidations in progress */ - mmu_seq = kvm->mmu_notifier_seq; - smp_rmb(); - writing = (dsisr & DSISR_ISSTORE) != 0; - hva = gfn_to_hva_memslot(memslot, gfn); + if (memslot->flags & KVM_MEM_READONLY) { + if (writing) { + /* give the guest a DSI */ + dsisr = DSISR_ISSTORE | DSISR_PROTFAULT; + kvmppc_core_queue_data_storage(vcpu, ea, dsisr); + return RESUME_GUEST; + } + upgrade_p = NULL; + } + if (dsisr & DSISR_SET_RC) { /* * Need to set an R or C bit in the 2nd-level tables; @@ -473,69 +478,92 @@ int kvmppc_book3s_radix_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu, return RESUME_GUEST; } - ret = -EFAULT; - pfn = 0; - pte_size = PAGE_SIZE; - pgflags = _PAGE_READ | _PAGE_EXEC; - level = 0; - npages = get_user_pages_fast(hva, 1, writing, pages); - if (npages < 1) { - /* Check if it's an I/O mapping */ - down_read(¤t->mm->mmap_sem); - vma = find_vma(current->mm, hva); - if (vma && vma->vm_start <= hva && hva < vma->vm_end && - (vma->vm_flags & VM_PFNMAP)) { - pfn = vma->vm_pgoff + - ((hva - vma->vm_start) >> PAGE_SHIFT); - pgflags = pgprot_val(vma->vm_page_prot); - } - up_read(¤t->mm->mmap_sem); - if (!pfn) - return -EFAULT; - } else { - page = pages[0]; + /* used to check for invalidations in progress */ + mmu_seq = kvm->mmu_notifier_seq; + smp_rmb(); + + /* + * Do a fast check first, since __gfn_to_pfn_memslot doesn't + * do it with !atomic && !async, which is how we call it. + * We always ask for write permission since the common case + * is that the page is writable. + */ + hva = gfn_to_hva_memslot(memslot, gfn); + if (upgrade_p && __get_user_pages_fast(hva, 1, 1, &page) == 1) { pfn = page_to_pfn(page); - if (PageCompound(page)) { - pte_size <<= compound_order(compound_head(page)); - /* See if we can insert a 1GB or 2MB large PTE here */ - if (pte_size >= PUD_SIZE && - (gpa & (PUD_SIZE - PAGE_SIZE)) == - (hva & (PUD_SIZE - PAGE_SIZE))) { - level = 2; - pfn &= ~((PUD_SIZE >> PAGE_SHIFT) - 1); - } else if (pte_size >= PMD_SIZE && - (gpa & (PMD_SIZE - PAGE_SIZE)) == - (hva & (PMD_SIZE - PAGE_SIZE))) { - level = 1; - pfn &= ~((PMD_SIZE >> PAGE_SHIFT) - 1); - } + upgrade_write = true; + } else { + /* Call KVM generic code to do the slow-path check */ + pfn = __gfn_to_pfn_memslot(memslot, gfn, false, NULL, + writing, upgrade_p); + if (is_error_noslot_pfn(pfn)) + return -EFAULT; + page = NULL; + if (pfn_valid(pfn)) { + page = pfn_to_page(pfn); + if (PageReserved(page)) + page = NULL; } - /* See if we can provide write access */ - if (writing) { - pgflags |= _PAGE_WRITE; - } else { - local_irq_save(flags); - ptep = find_current_mm_pte(current->mm->pgd, - hva, NULL, NULL); - if (ptep && pte_write(*ptep)) - pgflags |= _PAGE_WRITE; - local_irq_restore(flags); + } + + /* See if we can insert a 1GB or 2MB large PTE here */ + level = 0; + if (page && PageCompound(page)) { + pte_size = PAGE_SIZE << compound_order(compound_head(page)); + if (pte_size >= PUD_SIZE && + (gpa & (PUD_SIZE - PAGE_SIZE)) == + (hva & (PUD_SIZE - PAGE_SIZE))) { + level = 2; + pfn &= ~((PUD_SIZE >> PAGE_SHIFT) - 1); + } else if (pte_size >= PMD_SIZE && + (gpa & (PMD_SIZE - PAGE_SIZE)) == + (hva & (PMD_SIZE - PAGE_SIZE))) { + level = 1; + pfn &= ~((PMD_SIZE >> PAGE_SHIFT) - 1); } } /* * Compute the PTE value that we need to insert. */ - pgflags |= _PAGE_PRESENT | _PAGE_PTE | _PAGE_ACCESSED; - if (pgflags & _PAGE_WRITE) - pgflags |= _PAGE_DIRTY; - pte = pfn_pte(pfn, __pgprot(pgflags)); + if (page) { + pgflags = _PAGE_READ | _PAGE_EXEC | _PAGE_PRESENT | _PAGE_PTE | + _PAGE_ACCESSED; + if (writing || upgrade_write) + pgflags |= _PAGE_WRITE | _PAGE_DIRTY; + pte = pfn_pte(pfn, __pgprot(pgflags)); + } else { + /* + * Read the PTE from the process' radix tree and use that + * so we get the attribute bits. + */ + local_irq_disable(); + ptep = __find_linux_pte(vcpu->arch.pgdir, hva, NULL, &shift); + pte = *ptep; + local_irq_enable(); + if (shift == PUD_SHIFT && + (gpa & (PUD_SIZE - PAGE_SIZE)) == + (hva & (PUD_SIZE - PAGE_SIZE))) { + level = 2; + } else if (shift == PMD_SHIFT && + (gpa & (PMD_SIZE - PAGE_SIZE)) == + (hva & (PMD_SIZE - PAGE_SIZE))) { + level = 1; + } else if (shift && shift != PAGE_SHIFT) { + /* Adjust PFN */ + unsigned long mask = (1ul << shift) - PAGE_SIZE; + pte = __pte(pte_val(pte) | (hva & mask)); + } + if (!(writing || upgrade_write)) + pte = __pte(pte_val(pte) & ~ _PAGE_WRITE); + pte = __pte(pte_val(pte) | _PAGE_EXEC); + } /* Allocate space in the tree and write the PTE */ ret = kvmppc_create_pte(kvm, pte, gpa, level, mmu_seq); if (page) { - if (!ret && (pgflags & _PAGE_WRITE)) + if (!ret && (pte_val(pte) & _PAGE_WRITE)) set_page_dirty_lock(page); put_page(page); }