From patchwork Wed Jun 13 09:58:58 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nicholas Piggin X-Patchwork-Id: 928786 Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Received: from lists.ozlabs.org (lists.ozlabs.org [IPv6:2401:3900:2:1::3]) (using TLSv1.2 with cipher ADH-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 415Mhy3QPrz9rxs for ; Wed, 13 Jun 2018 20:01:22 +1000 (AEST) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=gmail.com Authentication-Results: ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="nJ1lFaW6"; dkim-atps=neutral Received: from lists.ozlabs.org (lists.ozlabs.org [IPv6:2401:3900:2:1::3]) by lists.ozlabs.org (Postfix) with ESMTP id 415Mhy1twZzF10l for ; Wed, 13 Jun 2018 20:01:22 +1000 (AEST) Authentication-Results: lists.ozlabs.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: lists.ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="nJ1lFaW6"; dkim-atps=neutral X-Original-To: linuxppc-dev@lists.ozlabs.org Delivered-To: linuxppc-dev@lists.ozlabs.org Authentication-Results: lists.ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=gmail.com (client-ip=2607:f8b0:400e:c00::243; helo=mail-pf0-x243.google.com; envelope-from=npiggin@gmail.com; receiver=) Authentication-Results: lists.ozlabs.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: lists.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="nJ1lFaW6"; dkim-atps=neutral Received: from mail-pf0-x243.google.com (mail-pf0-x243.google.com [IPv6:2607:f8b0:400e:c00::243]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by lists.ozlabs.org (Postfix) with ESMTPS id 415MfP232bzF0Pd for ; Wed, 13 Jun 2018 19:59:09 +1000 (AEST) Received: by mail-pf0-x243.google.com with SMTP id y5-v6so1141400pfn.4 for ; Wed, 13 Jun 2018 02:59:08 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id; bh=fp1hQB31BfNGxg20Skg2rOWK12DM7sYK9MpCH4RuYJM=; b=nJ1lFaW695Yn9fRmnI2QlRvlNgck2WbbMXv30q2Dq1U6AbK4h/2Hq9sXZssK2LJU5M dDY+hBWEUDYc/vI7zS98zC0bZIoL+zLT1XS4xowFMNlmiipFR/n7/YvwKBPZZC2PCCLt QbxFC4Ga34pp6JxKNc0x25w8DpWpZFS35hZd9pRunKPfpBJgdX5+zO2LUi33aGWd93pv acFepoEs7GIztur84hfaL2DUv6Otgg+71AzJ/bEH+5Tvj1/P34Xi2nnmdAhc7g2v819E Iv1zMgRnYe9Ky1p72b+J427oR/JmzyVQkhcaCW9RwiKlO22fCbbJqW4g/dXGHA9rb0OS pkPg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id; bh=fp1hQB31BfNGxg20Skg2rOWK12DM7sYK9MpCH4RuYJM=; b=rf7HWXGtP4BrKIDXQ66tWjRb1tjofY9O51boLvPE0oFtkpvkmSAszZ6khMdLPiMUe5 PCKqt/EVqoEh2jRcUgWbS+Llqi7TXe3hOGCl24JESg+RZ0VMJu12ERJP4F9ZHm6Db8C/ jvMi5L34a7FVyB9JEjrFertQsHhHC6T4KubeBRid0KPQX87ETj+H480Vuk2Xpy0DNMpN wCOv65h3rBtLSc28ABILUjEkZnlZziz7uVfoV7dTGvDw1gXcJK0ITw2BgIQIySYGvqq+ +Jmj7a3IM/x+GuiYF8QN5WdCzhz81OjxDb2lQztOQBKDoqMdsAQX7jwmOCY9yPCCMV5A phsg== X-Gm-Message-State: APt69E2dV+gUlylUxY1CPWv8WXQtMDblXzD/wdXjOEHkxhsYjr4Cok6u c7ThLZCIYLsG4lyzUYGxyVSwZg== X-Google-Smtp-Source: ADUXVKKcq0Oej5HVun4Um3xuojbVqLe27GjDS1FoQKAmV9+CHW/AQvGIuPH0VVfKakWc1uUvsC52tA== X-Received: by 2002:a63:8dca:: with SMTP id z193-v6mr3460367pgd.451.1528883946954; Wed, 13 Jun 2018 02:59:06 -0700 (PDT) Received: from roar.au.ibm.com (59-102-70-78.tpgi.com.au. [59.102.70.78]) by smtp.gmail.com with ESMTPSA id l71-v6sm3401379pge.8.2018.06.13.02.59.04 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 13 Jun 2018 02:59:06 -0700 (PDT) From: Nicholas Piggin To: linuxppc-dev@lists.ozlabs.org Subject: [PATCH] powerpc/64s/radix: Fix MADV_[FREE|DONTNEED] TLB flush miss problem with THP Date: Wed, 13 Jun 2018 19:58:58 +1000 Message-Id: <20180613095858.31078-1-npiggin@gmail.com> X-Mailer: git-send-email 2.17.0 X-BeenThere: linuxppc-dev@lists.ozlabs.org X-Mailman-Version: 2.1.26 Precedence: list List-Id: Linux on PowerPC Developers Mail List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: "Aneesh Kumar K . V" , Nicholas Piggin Errors-To: linuxppc-dev-bounces+patchwork-incoming=ozlabs.org@lists.ozlabs.org Sender: "Linuxppc-dev" The patch 99baac21e4 ("mm: fix MADV_[FREE|DONTNEED] TLB flush miss problem") added a force flush mode to the mmu_gather flush, which unconditionally flushes the entire address range being invalidated (even if actual ptes only covered a smaller range), to solve a problem with concurrent threads invalidating the same PTEs causing them to miss TLBs that need flushing. This does not work with powerpc that invalidates mmu_gather batches according to page size. Have powerpc flush all possible page sizes in the range if it encounters the concurrency condition. Hash does not have a problem because it invalidates TLBs inside the page table locks. Reported-by: Aneesh Kumar K.V Signed-off-by: Nicholas Piggin --- Since RFC: - Account for hugetlb pages that can be mixed with the tlb_flush range. arch/powerpc/mm/tlb-radix.c | 85 ++++++++++++++++++++++++++++--------- 1 file changed, 66 insertions(+), 19 deletions(-) diff --git a/arch/powerpc/mm/tlb-radix.c b/arch/powerpc/mm/tlb-radix.c index 67a6e86d3e7e..9dbccda651d7 100644 --- a/arch/powerpc/mm/tlb-radix.c +++ b/arch/powerpc/mm/tlb-radix.c @@ -689,22 +689,17 @@ EXPORT_SYMBOL(radix__flush_tlb_kernel_range); static unsigned long tlb_single_page_flush_ceiling __read_mostly = 33; static unsigned long tlb_local_single_page_flush_ceiling __read_mostly = POWER9_TLB_SETS_RADIX * 2; -void radix__flush_tlb_range(struct vm_area_struct *vma, unsigned long start, - unsigned long end) +static inline void __radix__flush_tlb_range(struct mm_struct *mm, + unsigned long start, unsigned long end, + bool flush_all_sizes) { - struct mm_struct *mm = vma->vm_mm; unsigned long pid; unsigned int page_shift = mmu_psize_defs[mmu_virtual_psize].shift; unsigned long page_size = 1UL << page_shift; unsigned long nr_pages = (end - start) >> page_shift; bool local, full; -#ifdef CONFIG_HUGETLB_PAGE - if (is_vm_hugetlb_page(vma)) - return radix__flush_hugetlb_tlb_range(vma, start, end); -#endif - pid = mm->context.id; if (unlikely(pid == MMU_NO_CONTEXT)) return; @@ -738,18 +733,27 @@ void radix__flush_tlb_range(struct vm_area_struct *vma, unsigned long start, _tlbie_pid(pid, RIC_FLUSH_TLB); } } else { - bool hflush = false; + bool hflush = flush_all_sizes; + bool gflush = flush_all_sizes; unsigned long hstart, hend; + unsigned long gstart, gend; -#ifdef CONFIG_TRANSPARENT_HUGEPAGE - hstart = (start + HPAGE_PMD_SIZE - 1) >> HPAGE_PMD_SHIFT; - hend = end >> HPAGE_PMD_SHIFT; - if (hstart < hend) { - hstart <<= HPAGE_PMD_SHIFT; - hend <<= HPAGE_PMD_SHIFT; + if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) hflush = true; + + if (hflush) { + hstart = (start + HPAGE_PMD_SIZE - 1) & HPAGE_PMD_MASK; + hend = end & HPAGE_PMD_MASK; + if (hstart == hend) + hflush = false; + } + + if (gflush) { + gstart = (start + HPAGE_PUD_SIZE - 1) & HPAGE_PUD_MASK; + gend = end & HPAGE_PUD_MASK; + if (gstart == gend) + gflush = false; } -#endif asm volatile("ptesync": : :"memory"); if (local) { @@ -757,18 +761,36 @@ void radix__flush_tlb_range(struct vm_area_struct *vma, unsigned long start, if (hflush) __tlbiel_va_range(hstart, hend, pid, HPAGE_PMD_SIZE, MMU_PAGE_2M); + if (gflush) + __tlbiel_va_range(gstart, gend, pid, + HPAGE_PUD_SIZE, MMU_PAGE_1G); asm volatile("ptesync": : :"memory"); } else { __tlbie_va_range(start, end, pid, page_size, mmu_virtual_psize); if (hflush) __tlbie_va_range(hstart, hend, pid, HPAGE_PMD_SIZE, MMU_PAGE_2M); + if (gflush) + __tlbie_va_range(gstart, gend, pid, + HPAGE_PUD_SIZE, MMU_PAGE_1G); fixup_tlbie(); asm volatile("eieio; tlbsync; ptesync": : :"memory"); } } preempt_enable(); } + +void radix__flush_tlb_range(struct vm_area_struct *vma, unsigned long start, + unsigned long end) + +{ +#ifdef CONFIG_HUGETLB_PAGE + if (is_vm_hugetlb_page(vma)) + return radix__flush_hugetlb_tlb_range(vma, start, end); +#endif + + __radix__flush_tlb_range(vma->vm_mm, start, end, false); +} EXPORT_SYMBOL(radix__flush_tlb_range); static int radix_get_mmu_psize(int page_size) @@ -837,6 +859,8 @@ void radix__tlb_flush(struct mmu_gather *tlb) int psize = 0; struct mm_struct *mm = tlb->mm; int page_size = tlb->page_size; + unsigned long start = tlb->start; + unsigned long end = tlb->end; /* * if page size is not something we understand, do a full mm flush @@ -847,15 +871,38 @@ void radix__tlb_flush(struct mmu_gather *tlb) */ if (tlb->fullmm) { __flush_all_mm(mm, true); +#if defined(CONFIG_TRANSPARENT_HUGEPAGE) || defined(CONFIG_HUGETLB_PAGE) + } else if (mm_tlb_flush_pending(mm)) { + /* + * If there is a concurrent invalidation that is clearing ptes, + * then it's possible this invalidation will miss one of those + * cleared ptes and miss flushing the TLB. If this invalidate + * returns before the other one flushes TLBs, that can result + * in it returning while there are still valid TLBs inside the + * range to be invalidated. + * + * See mm/memory.c:tlb_finish_mmu() for more details. + * + * The solution to this is ensure the entire range is always + * flushed here. The problem for powerpc is that the flushes + * are page size specific, so this "forced flush" would not + * do the right thing if there are a mix of page sizes in + * the range to be invalidated. So use __flush_tlb_range + * which invalidates all possible page sizes in the range. + * + * PWC flush probably is not be required because the core code + * shouldn't free page tables in this path, but accounting + * for the possibility makes us a bit more robust. + */ + WARN_ON_ONCE(tlb->need_flush_all); + __radix__flush_tlb_range(mm, start, end, true); +#endif } else if ( (psize = radix_get_mmu_psize(page_size)) == -1) { if (!tlb->need_flush_all) radix__flush_tlb_mm(mm); else radix__flush_all_mm(mm); } else { - unsigned long start = tlb->start; - unsigned long end = tlb->end; - if (!tlb->need_flush_all) radix__flush_tlb_range_psize(mm, start, end, psize); else