From patchwork Mon Apr 16 04:32:40 2018
X-Patchwork-Submitter: Nicholas Piggin <npiggin@gmail.com>
X-Patchwork-Id: 898454
From: Nicholas Piggin <npiggin@gmail.com>
To: kvm-ppc@vger.kernel.org
Cc: Nicholas Piggin <npiggin@gmail.com>, linuxppc-dev@lists.ozlabs.org
Subject: [PATCH v2 5/5] KVM: PPC: Book3S HV: radix do not clear partition
 scoped page table when page fault races with other vCPUs.
Date: Mon, 16 Apr 2018 14:32:40 +1000
Message-Id: <20180416043240.8796-6-npiggin@gmail.com>
In-Reply-To: <20180416043240.8796-1-npiggin@gmail.com>
References: <20180416043240.8796-1-npiggin@gmail.com>
X-Mailing-List: kvm-ppc@vger.kernel.org

When running an SMP radix guest, KVM can get into page fault / tlbie
storms -- hundreds of thousands of faults to the same address from
different threads -- because the partition scoped page fault handler
invalidates the page table entry if it finds the entry already set up
by a racing CPU.

Guest threads can hit page faults for the same address when KSM or THP
takes out a commonly used page; gRA zero (the interrupt vectors and
important kernel text) was a common one. Multiple CPUs page fault and
contend on the same lock. When one CPU sets up the page table and
releases the lock, the next finds the new entry and invalidates it
before installing its own, which causes further page faults, which
invalidate that entry, and so on.

The solution is to avoid invalidating the entry or flushing TLBs when a
race is detected. The pte may still need bits updated, but those
updates only add R/C bits or relax access restrictions, so no flush is
required. This solves the page fault / tlbie storms.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/kvm/book3s_64_mmu_radix.c | 52 ++++++++++++++++----------
 1 file changed, 33 insertions(+), 19 deletions(-)
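For illustration (not part of the patch): below is a rough, stand-alone
user-space sketch of the decision the fault handler makes after this
change. The pte_t type, the PAGE_* bit values, and the fault_old() /
fault_new() helpers are all invented for this sketch, not the kernel's
implementation; only the shape of the logic mirrors the patch.

/*
 * Hypothetical user-space mock of the update-vs-invalidate decision.
 * All names here are made up for illustration.
 */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define PAGE_PRESENT	(1UL << 0)	/* illustrative bit layout */
#define PAGE_READ	(1UL << 1)
#define PAGE_WRITE	(1UL << 2)
#define PFN_SHIFT	12

typedef struct { uint64_t val; } pte_t;

static uint64_t pte_pfn(pte_t pte) { return pte.val >> PFN_SHIFT; }

static unsigned long tlbies;	/* counts simulated tlbie broadcasts */

/* Old scheme: tear down any valid entry and flush, then install ours. */
static void fault_old(pte_t *ptep, pte_t new)
{
	if (ptep->val & PAGE_PRESENT) {
		ptep->val = 0;	/* invalidate the racing CPU's entry */
		tlbies++;	/* ... and flush: this is what storms */
	}
	*ptep = new;
}

/* New scheme: a valid entry is only ever updated in place, no flush. */
static void fault_new(pte_t *ptep, pte_t new)
{
	pte_t old = *ptep;

	if (old.val & PAGE_PRESENT) {
		if (old.val == new.val)
			return;	/* racing CPU already did the work */
		/* a race may add bits, but never take these away */
		assert(pte_pfn(old) == pte_pfn(new));
		assert(!((old.val & ~new.val) &
			 (PAGE_PRESENT | PAGE_READ | PAGE_WRITE)));
		ptep->val |= new.val;	/* update in place, no tlbie */
		return;
	}
	*ptep = new;
}

int main(void)
{
	pte_t pte = { 0 };
	pte_t ro = { (1UL << PFN_SHIFT) | PAGE_PRESENT | PAGE_READ };
	pte_t rw = { ro.val | PAGE_WRITE };

	/* two vCPUs fault on the same address, read then write */
	fault_new(&pte, ro);
	fault_new(&pte, rw);
	printf("new scheme tlbies: %lu\n", tlbies);	/* prints 0 */

	pte.val = 0;
	fault_old(&pte, ro);
	fault_old(&pte, rw);
	printf("old scheme tlbies: %lu\n", tlbies);	/* prints 1 */
	return 0;
}

The asserts stand in for the patch's WARN_ON_ONCE checks: a losing
racer may add R/C or access bits to the existing pte, but must never
remove _PAGE_PRESENT, _PAGE_READ, or _PAGE_WRITE, which is what makes
skipping the invalidate and tlbie safe.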
diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index dab6b622011c..2d3af22f90dd 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -199,7 +199,6 @@ static int kvmppc_create_pte(struct kvm *kvm, pte_t pte, unsigned long gpa,
 	pud_t *pud, *new_pud = NULL;
 	pmd_t *pmd, *new_pmd = NULL;
 	pte_t *ptep, *new_ptep = NULL;
-	unsigned long old;
 	int ret;
 
 	/* Traverse the guest's 2nd-level tree, allocate new levels needed */
@@ -243,6 +242,7 @@ static int kvmppc_create_pte(struct kvm *kvm, pte_t pte, unsigned long gpa,
 	pmd = pmd_offset(pud, gpa);
 	if (pmd_is_leaf(*pmd)) {
 		unsigned long lgpa = gpa & PMD_MASK;
+		pte_t old_pte = *pmdp_ptep(pmd);
 
 		/*
 		 * If we raced with another CPU which has just put
@@ -252,18 +252,22 @@ static int kvmppc_create_pte(struct kvm *kvm, pte_t pte, unsigned long gpa,
 			ret = -EAGAIN;
 			goto out_unlock;
 		}
-		/* Valid 2MB page here already, remove it */
-		old = kvmppc_radix_update_pte(kvm, pmdp_ptep(pmd),
-					      ~0UL, 0, lgpa, PMD_SHIFT);
-		kvmppc_radix_tlbie_page(kvm, lgpa, PMD_SHIFT);
-		if (old & _PAGE_DIRTY) {
-			unsigned long gfn = lgpa >> PAGE_SHIFT;
-			struct kvm_memory_slot *memslot;
-			memslot = gfn_to_memslot(kvm, gfn);
-			if (memslot && memslot->dirty_bitmap)
-				kvmppc_update_dirty_map(memslot,
-							gfn, PMD_SIZE);
+
+		/* PTE was previously valid, so update it */
+		if (pte_val(old_pte) == pte_val(pte)) {
+			ret = -EAGAIN;
+			goto out_unlock;
 		}
+
+		/* Make sure we weren't trying to take bits away */
+		WARN_ON_ONCE(pte_pfn(old_pte) != pte_pfn(pte));
+		WARN_ON_ONCE((pte_val(old_pte) & ~pte_val(pte)) &
+			(_PAGE_PRESENT | _PAGE_READ | _PAGE_WRITE));
+
+		kvmppc_radix_update_pte(kvm, pmdp_ptep(pmd),
+					0, pte_val(pte), lgpa, PMD_SHIFT);
+		ret = 0;
+		goto out_unlock;
 	} else if (level == 1 && !pmd_none(*pmd)) {
 		/*
 		 * There's a page table page here, but we wanted
@@ -274,6 +278,8 @@ static int kvmppc_create_pte(struct kvm *kvm, pte_t pte, unsigned long gpa,
 		goto out_unlock;
 	}
 	if (level == 0) {
+		pte_t old_pte;
+
 		if (pmd_none(*pmd)) {
 			if (!new_ptep)
 				goto out_unlock;
@@ -281,13 +287,21 @@ static int kvmppc_create_pte(struct kvm *kvm, pte_t pte, unsigned long gpa,
 			new_ptep = NULL;
 		}
 		ptep = pte_offset_kernel(pmd, gpa);
-		if (pte_present(*ptep)) {
-			/* PTE was previously valid, so invalidate it */
-			old = kvmppc_radix_update_pte(kvm, ptep, _PAGE_PRESENT,
-						      0, gpa, 0);
-			kvmppc_radix_tlbie_page(kvm, gpa, 0);
-			if (old & _PAGE_DIRTY)
-				mark_page_dirty(kvm, gpa >> PAGE_SHIFT);
+		old_pte = *ptep;
+		if (pte_present(old_pte)) {
+			/* PTE was previously valid, so update it */
+			if (pte_val(old_pte) == pte_val(pte)) {
+				ret = -EAGAIN;
+				goto out_unlock;
+			}
+
+			/* Make sure we weren't trying to take bits away */
+			WARN_ON_ONCE(pte_pfn(old_pte) != pte_pfn(pte));
+			WARN_ON_ONCE((pte_val(old_pte) & ~pte_val(pte)) &
+				(_PAGE_PRESENT | _PAGE_READ | _PAGE_WRITE));
+
+			kvmppc_radix_update_pte(kvm, ptep, 0,
+						pte_val(pte), gpa, 0);
 		}
 		kvmppc_radix_set_pte_at(kvm, gpa, ptep, pte);
 	} else {