From patchwork Tue Sep 4 08:16:01 2018
X-Patchwork-Submitter: Nicholas Piggin
X-Patchwork-Id: 965827
From: Nicholas Piggin
To: kvm-ppc@vger.kernel.org
Cc: Nicholas Piggin, Paul Mackerras, David Gibson,
 "Aneesh Kumar K.V", linuxppc-dev@lists.ozlabs.org
Subject: [PATCH] KVM: PPC: Book3S HV: Don't use compound_order to determine
 host mapping size
Date: Tue, 4 Sep 2018 18:16:01 +1000
Message-Id: <20180904081601.32703-1-npiggin@gmail.com>
X-Mailer: git-send-email 2.18.0

THP paths can defer splitting compound pages until after the actual
remap and TLB flushes to split a huge PMD/PUD. This causes radix
partition scope page table mappings to get out of sync with the host
qemu page table mappings.

This results in random memory corruption in the guest when running
with THP. The easiest way to reproduce is to use the KVM balloon to
free up a lot of memory in the guest and then shrink the balloon to
give the memory back, while some work is being done in the guest.

Cc: Paul Mackerras
Cc: David Gibson
Cc: "Aneesh Kumar K.V"
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Nicholas Piggin
Reviewed-by: Aneesh Kumar K.V
Tested-by: David Gibson
---
 arch/powerpc/kvm/book3s_64_mmu_radix.c | 88 ++++++++++----------------
 1 file changed, 34 insertions(+), 54 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index 0af1c0aea1fe..d8792445d95a 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -525,8 +525,8 @@ int kvmppc_book3s_radix_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
 				   unsigned long ea, unsigned long dsisr)
 {
 	struct kvm *kvm = vcpu->kvm;
-	unsigned long mmu_seq, pte_size;
-	unsigned long gpa, gfn, hva, pfn;
+	unsigned long mmu_seq;
+	unsigned long gpa, gfn, hva;
 	struct kvm_memory_slot *memslot;
 	struct page *page = NULL;
 	long ret;
@@ -623,9 +623,10 @@ int kvmppc_book3s_radix_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
 	 */
 	hva = gfn_to_hva_memslot(memslot, gfn);
 	if (upgrade_p && __get_user_pages_fast(hva, 1, 1, &page) == 1) {
-		pfn = page_to_pfn(page);
 		upgrade_write = true;
 	} else {
+		unsigned long pfn;
+
 		/* Call KVM generic code to do the slow-path check */
 		pfn = __gfn_to_pfn_memslot(memslot, gfn, false, NULL,
 					   writing, upgrade_p);
@@ -639,63 +640,42 @@ int kvmppc_book3s_radix_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
 		}
 	}
 
-	/* See if we can insert a 1GB or 2MB large PTE here */
-	level = 0;
-	if (page && PageCompound(page)) {
-		pte_size = PAGE_SIZE << compound_order(compound_head(page));
-		if (pte_size >= PUD_SIZE &&
-		    (gpa & (PUD_SIZE - PAGE_SIZE)) ==
-		    (hva & (PUD_SIZE - PAGE_SIZE))) {
-			level = 2;
-			pfn &= ~((PUD_SIZE >> PAGE_SHIFT) - 1);
-		} else if (pte_size >= PMD_SIZE &&
-			   (gpa & (PMD_SIZE - PAGE_SIZE)) ==
-			   (hva & (PMD_SIZE - PAGE_SIZE))) {
-			level = 1;
-			pfn &= ~((PMD_SIZE >> PAGE_SHIFT) - 1);
-		}
-	}
-
 	/*
-	 * Compute the PTE value that we need to insert.
+	 * Read the PTE from the process' radix tree and use that
+	 * so we get the shift and attribute bits.
 	 */
-	if (page) {
-		pgflags = _PAGE_READ | _PAGE_EXEC | _PAGE_PRESENT | _PAGE_PTE |
-			_PAGE_ACCESSED;
-		if (writing || upgrade_write)
-			pgflags |= _PAGE_WRITE | _PAGE_DIRTY;
-		pte = pfn_pte(pfn, __pgprot(pgflags));
+	local_irq_disable();
+	ptep = __find_linux_pte(vcpu->arch.pgdir, hva, NULL, &shift);
+	pte = *ptep;
+	local_irq_enable();
+
+	/* Get pte level from shift/size */
+	if (shift == PUD_SHIFT &&
+	    (gpa & (PUD_SIZE - PAGE_SIZE)) ==
+	    (hva & (PUD_SIZE - PAGE_SIZE))) {
+		level = 2;
+	} else if (shift == PMD_SHIFT &&
+		   (gpa & (PMD_SIZE - PAGE_SIZE)) ==
+		   (hva & (PMD_SIZE - PAGE_SIZE))) {
+		level = 1;
 	} else {
-		/*
-		 * Read the PTE from the process' radix tree and use that
-		 * so we get the attribute bits.
-		 */
-		local_irq_disable();
-		ptep = __find_linux_pte(vcpu->arch.pgdir, hva, NULL, &shift);
-		pte = *ptep;
-		local_irq_enable();
-		if (shift == PUD_SHIFT &&
-		    (gpa & (PUD_SIZE - PAGE_SIZE)) ==
-		    (hva & (PUD_SIZE - PAGE_SIZE))) {
-			level = 2;
-		} else if (shift == PMD_SHIFT &&
-			   (gpa & (PMD_SIZE - PAGE_SIZE)) ==
-			   (hva & (PMD_SIZE - PAGE_SIZE))) {
-			level = 1;
-		} else if (shift && shift != PAGE_SHIFT) {
-			/* Adjust PFN */
-			unsigned long mask = (1ul << shift) - PAGE_SIZE;
-			pte = __pte(pte_val(pte) | (hva & mask));
-		}
-		pte = __pte(pte_val(pte) | _PAGE_EXEC | _PAGE_ACCESSED);
-		if (writing || upgrade_write) {
-			if (pte_val(pte) & _PAGE_WRITE)
-				pte = __pte(pte_val(pte) | _PAGE_DIRTY);
-		} else {
-			pte = __pte(pte_val(pte) & ~(_PAGE_WRITE | _PAGE_DIRTY));
+		level = 0;
+
+		/* Can not cope with unknown page shift */
+		if (shift && shift != PAGE_SHIFT) {
+			WARN_ON_ONCE(1);
+			return -EFAULT;
+		}
 	}
 
+	pte = __pte(pte_val(pte) | _PAGE_EXEC | _PAGE_ACCESSED);
+	if (writing || upgrade_write) {
+		if (pte_val(pte) & _PAGE_WRITE)
+			pte = __pte(pte_val(pte) | _PAGE_DIRTY);
+	} else {
+		pte = __pte(pte_val(pte) & ~(_PAGE_WRITE | _PAGE_DIRTY));
+	}
+
 	/* Allocate space in the tree and write the PTE */
 	ret = kvmppc_create_pte(kvm, pte, gpa, level, mmu_seq);
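
As a reading aid, the level selection above reduces to a pure function
of the host PTE's page-size shift plus a congruence check between gpa
and hva. Below is a minimal standalone C sketch of that rule; the
shift constants are the radix MMU values assumed here for
illustration, and pick_level() is a hypothetical helper made up for
this sketch, not a kernel function:

#include <stdio.h>

#define PAGE_SHIFT 12
#define PMD_SHIFT  21			/* 2MB huge page */
#define PUD_SHIFT  30			/* 1GB huge page */
#define PAGE_SIZE  (1UL << PAGE_SHIFT)
#define PMD_SIZE   (1UL << PMD_SHIFT)
#define PUD_SIZE   (1UL << PUD_SHIFT)

/*
 * Return the partition-scope level to map at: 2 for 1GB, 1 for 2MB,
 * 0 for a normal page, or -1 for a host page size the fault handler
 * cannot cope with (where the patch WARNs and returns -EFAULT).
 * A large mapping is only usable when gpa and hva are congruent
 * modulo the large page size, i.e. their offset bits above
 * PAGE_SHIFT agree.
 */
static int pick_level(unsigned int shift, unsigned long gpa,
		      unsigned long hva)
{
	if (shift == PUD_SHIFT &&
	    (gpa & (PUD_SIZE - PAGE_SIZE)) == (hva & (PUD_SIZE - PAGE_SIZE)))
		return 2;		/* 1GB mapping */
	if (shift == PMD_SHIFT &&
	    (gpa & (PMD_SIZE - PAGE_SIZE)) == (hva & (PMD_SIZE - PAGE_SIZE)))
		return 1;		/* 2MB mapping */
	if (shift && shift != PAGE_SHIFT)
		return -1;		/* unknown size: warn and fault */
	return 0;			/* normal small page */
}

int main(void)
{
	/* 2MB host PTE, gpa/hva congruent mod 2MB: level 1 */
	printf("%d\n", pick_level(PMD_SHIFT, 0x40200000UL,
				  0x7f5e00200000UL));
	/* same host PTE but hva off by one page: refuse (-1) */
	printf("%d\n", pick_level(PMD_SHIFT, 0x40200000UL,
				  0x7f5e00201000UL));
	return 0;
}

The point of deriving the level this way, rather than from
compound_order(), is that the shift comes from the same host PTE the
fault handler is about to mirror, so a THP split that has not yet
reached the page tables cannot leave the partition-scope table
mapping a larger range than the host really has.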