From patchwork Sat Oct 13 01:31:58 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Joel Fernandes X-Patchwork-Id: 983411 Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Received: from lists.ozlabs.org (lists.ozlabs.org [IPv6:2401:3900:2:1::3]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 42X8NX0ZLzz9s3l for ; Sat, 13 Oct 2018 13:51:24 +1100 (AEDT) Authentication-Results: ozlabs.org; dmarc=none (p=none dis=none) header.from=joelfernandes.org Authentication-Results: ozlabs.org; dkim=fail reason="signature verification failed" (1024-bit key; unprotected) header.d=joelfernandes.org header.i=@joelfernandes.org header.b="lwmRep2s"; dkim-atps=neutral Received: from lists.ozlabs.org (lists.ozlabs.org [IPv6:2401:3900:2:1::3]) by lists.ozlabs.org (Postfix) with ESMTP id 42X8NW641SzF3TT for ; Sat, 13 Oct 2018 13:51:23 +1100 (AEDT) Authentication-Results: lists.ozlabs.org; dmarc=none (p=none dis=none) header.from=joelfernandes.org Authentication-Results: lists.ozlabs.org; dkim=fail reason="signature verification failed" (1024-bit key; unprotected) header.d=joelfernandes.org header.i=@joelfernandes.org header.b="lwmRep2s"; dkim-atps=neutral X-Original-To: linuxppc-dev@lists.ozlabs.org Delivered-To: linuxppc-dev@lists.ozlabs.org Authentication-Results: lists.ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=joelfernandes.org (client-ip=2607:f8b0:4864:20::441; helo=mail-pf1-x441.google.com; envelope-from=joel@joelfernandes.org; receiver=) Authentication-Results: lists.ozlabs.org; dmarc=none (p=none dis=none) header.from=joelfernandes.org Authentication-Results: lists.ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=joelfernandes.org header.i=@joelfernandes.org header.b="lwmRep2s"; dkim-atps=neutral Received: from mail-pf1-x441.google.com (mail-pf1-x441.google.com [IPv6:2607:f8b0:4864:20::441]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by lists.ozlabs.org (Postfix) with ESMTPS id 42X6dP0cWKzF3Sp for ; Sat, 13 Oct 2018 12:32:24 +1100 (AEDT) Received: by mail-pf1-x441.google.com with SMTP id g21-v6so7857pfi.7 for ; Fri, 12 Oct 2018 18:32:24 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=joelfernandes.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=TuTZZG6t7Vpxk6hICMjuKsRejg5EuhIlnOB0TcQ6zaw=; b=lwmRep2sGNPNtJhgB9RsOPKTOOSB/krJ8anCRbmRRsLRPZ/Xgg8wtftrtImybZPJ9o ta6TKRrl0gc5jms/MCsxSVJ+O1Hrax3boWXjxUDfy0mAOjOPdatt+VtwlcaiJ8EZAXsz W1Bl/HdY1fIvlWc7La4wrjjn5Dnmh5o9mNT2I= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=TuTZZG6t7Vpxk6hICMjuKsRejg5EuhIlnOB0TcQ6zaw=; b=eFHSFP+zGI+k5+hscu+3wZvXYIo0Zh+ue7x1iz88UqFoyLgRIaeTq3vxDp7CPeUiCB CYuYKqgs84MgFUe54GtnW30+qUlaONsnpWLmvk2SlL7TaIydlomB0P0OF/wD7/GUZhOf rebmgTlRiJ82kRPPUdlregvC1o+9E7mxrwehqBu7qQyIL2hN7PFNtXDH7EEq1YhGzwqS ExFFyGtwDDWOEuFsECkWVQE4TiBcac/VYw7FH1daZBGLmn1JUrvlDjFVbMmudsmve/mK oNEi1JMA1tFgkx+9Oblt/DB0mdGcl0GZH/PwXSnt4yVlqCmJHipbfk9Diz6G9EByKz3+ IVHQ== X-Gm-Message-State: ABuFfogwUMgBj7Y4PEuP0z3sj8Yt/crDSP8E7rbT3seUZcRubAzVG47y iAkuAj28biBUjB0osBNN5Mg6jdCBois= X-Google-Smtp-Source: ACcGV61+9EmPtOdQGzTHjVGwLpiqYsmqyZ1bXa4NAgT6WNqQkE33CkAc0nRdLLEL017wJ1a4KyTnew== X-Received: by 2002:a63:b08:: with SMTP id 8-v6mr7584652pgl.130.1539394343099; Fri, 12 Oct 2018 18:32:23 -0700 (PDT) Received: from joelaf.mtv.corp.google.com ([2620:0:1000:1601:3aef:314f:b9ea:889f]) by smtp.gmail.com with ESMTPSA id q7-v6sm6507828pfd.164.2018.10.12.18.32.20 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 12 Oct 2018 18:32:21 -0700 (PDT) From: "Joel Fernandes (Google)" To: linux-kernel@vger.kernel.org Subject: [PATCH 2/4] mm: speed up mremap by 500x on large regions (v2) Date: Fri, 12 Oct 2018 18:31:58 -0700 Message-Id: <20181013013200.206928-3-joel@joelfernandes.org> X-Mailer: git-send-email 2.19.0.605.g01d371f741-goog In-Reply-To: <20181013013200.206928-1-joel@joelfernandes.org> References: <20181013013200.206928-1-joel@joelfernandes.org> MIME-Version: 1.0 X-Mailman-Approved-At: Sat, 13 Oct 2018 13:44:51 +1100 X-BeenThere: linuxppc-dev@lists.ozlabs.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Linux on PowerPC Developers Mail List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: linux-mips@linux-mips.org, Rich Felker , linux-ia64@vger.kernel.org, linux-sh@vger.kernel.org, Peter Zijlstra , Catalin Marinas , Dave Hansen , Will Deacon , mhocko@kernel.org, linux-mm@kvack.org, lokeshgidra@google.com, "Joel Fernandes \(Google\)" , linux-riscv@lists.infradead.org, elfring@users.sourceforge.net, Jonas Bonn , kvmarm@lists.cs.columbia.edu, dancol@google.com, Yoshinori Sato , sparclinux@vger.kernel.org, linux-xtensa@linux-xtensa.org, linux-hexagon@vger.kernel.org, Helge Deller , "maintainer:X86 ARCHITECTURE 32-BIT AND 64-BIT" , hughd@google.com, "James E.J. Bottomley" , kasan-dev@googlegroups.com, anton.ivanov@kot-begemot.co.uk, Ingo Molnar , Geert Uytterhoeven , Andrey Ryabinin , linux-snps-arc@lists.infradead.org, kernel-team@android.com, Sam Creasey , Fenghua Yu , linux-s390@vger.kernel.org, Jeff Dike , linux-um@lists.infradead.org, Stefan Kristiansson , Julia Lawall , linux-m68k@lists.linux-m68k.org, Borislav Petkov , Andy Lutomirski , nios2-dev@lists.rocketboards.org, kirill@shutemov.name, Stafford Horne , Guan Xuetao , Chris Zankel , Tony Luck , Richard Weinberger , linux-parisc@vger.kernel.org, pantin@google.com, Max Filippov , minchan@kernel.org, Thomas Gleixner , linux-alpha@vger.kernel.org, Ley Foon Tan , akpm@linux-foundation.org, linuxppc-dev@lists.ozlabs.org, "David S. Miller" Errors-To: linuxppc-dev-bounces+patchwork-incoming=ozlabs.org@lists.ozlabs.org Sender: "Linuxppc-dev" Android needs to mremap large regions of memory during memory management related operations. The mremap system call can be really slow if THP is not enabled. The bottleneck is move_page_tables, which is copying each pte at a time, and can be really slow across a large map. Turning on THP may not be a viable option, and is not for us. This patch speeds up the performance for non-THP system by copying at the PMD level when possible. The speed up is three orders of magnitude. On a 1GB mremap, the mremap completion times drops from 160-250 millesconds to 380-400 microseconds. Before: Total mremap time for 1GB data: 242321014 nanoseconds. Total mremap time for 1GB data: 196842467 nanoseconds. Total mremap time for 1GB data: 167051162 nanoseconds. After: Total mremap time for 1GB data: 385781 nanoseconds. Total mremap time for 1GB data: 388959 nanoseconds. Total mremap time for 1GB data: 402813 nanoseconds. Incase THP is enabled, the optimization is skipped. I also flush the tlb every time we do this optimization since I couldn't find a way to determine if the low-level PTEs are dirty. It is seen that the cost of doing so is not much compared the improvement, on both x86-64 and arm64. Cc: minchan@kernel.org Cc: pantin@google.com Cc: hughd@google.com Cc: lokeshgidra@google.com Cc: dancol@google.com Cc: mhocko@kernel.org Cc: kirill@shutemov.name Cc: akpm@linux-foundation.org Cc: kernel-team@android.com Signed-off-by: Joel Fernandes (Google) Signed-off-by: Joel Fernandes (Google) --- arch/Kconfig | 5 ++++ mm/mremap.c | 65 ++++++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 70 insertions(+) diff --git a/arch/Kconfig b/arch/Kconfig index 6801123932a5..9724fe39884f 100644 --- a/arch/Kconfig +++ b/arch/Kconfig @@ -518,6 +518,11 @@ config HAVE_IRQ_TIME_ACCOUNTING Archs need to ensure they use a high enough resolution clock to support irq time accounting and then call enable_sched_clock_irqtime(). +config HAVE_MOVE_PMD + bool + help + Archs that select this are able to move page tables at the PMD level. + config HAVE_ARCH_TRANSPARENT_HUGEPAGE bool diff --git a/mm/mremap.c b/mm/mremap.c index 9e68a02a52b1..2fd163cff406 100644 --- a/mm/mremap.c +++ b/mm/mremap.c @@ -191,6 +191,54 @@ static void move_ptes(struct vm_area_struct *vma, pmd_t *old_pmd, drop_rmap_locks(vma); } +static bool move_normal_pmd(struct vm_area_struct *vma, unsigned long old_addr, + unsigned long new_addr, unsigned long old_end, + pmd_t *old_pmd, pmd_t *new_pmd, bool *need_flush) +{ + spinlock_t *old_ptl, *new_ptl; + struct mm_struct *mm = vma->vm_mm; + + if ((old_addr & ~PMD_MASK) || (new_addr & ~PMD_MASK) + || old_end - old_addr < PMD_SIZE) + return false; + + /* + * The destination pmd shouldn't be established, free_pgtables() + * should have release it. + */ + if (WARN_ON(!pmd_none(*new_pmd))) + return false; + + /* + * We don't have to worry about the ordering of src and dst + * ptlocks because exclusive mmap_sem prevents deadlock. + */ + old_ptl = pmd_lock(vma->vm_mm, old_pmd); + if (old_ptl) { + pmd_t pmd; + + new_ptl = pmd_lockptr(mm, new_pmd); + if (new_ptl != old_ptl) + spin_lock_nested(new_ptl, SINGLE_DEPTH_NESTING); + + /* Clear the pmd */ + pmd = *old_pmd; + pmd_clear(old_pmd); + + VM_BUG_ON(!pmd_none(*new_pmd)); + + /* Set the new pmd */ + set_pmd_at(mm, new_addr, new_pmd, pmd); + if (new_ptl != old_ptl) + spin_unlock(new_ptl); + spin_unlock(old_ptl); + + *need_flush = true; + return true; + } + return false; +} + unsigned long move_page_tables(struct vm_area_struct *vma, unsigned long old_addr, struct vm_area_struct *new_vma, unsigned long new_addr, unsigned long len, @@ -239,7 +287,24 @@ unsigned long move_page_tables(struct vm_area_struct *vma, split_huge_pmd(vma, old_pmd, old_addr); if (pmd_trans_unstable(old_pmd)) continue; + } else if (extent == PMD_SIZE && IS_ENABLED(CONFIG_HAVE_MOVE_PMD)) { + /* + * If the extent is PMD-sized, try to speed the move by + * moving at the PMD level if possible. + */ + bool moved; + + if (need_rmap_locks) + take_rmap_locks(vma); + moved = move_normal_pmd(vma, old_addr, new_addr, + old_end, old_pmd, new_pmd, + &need_flush); + if (need_rmap_locks) + drop_rmap_locks(vma); + if (moved) + continue; } + if (pte_alloc(new_vma->vm_mm, new_pmd)) break; next = (new_addr + PMD_SIZE) & PMD_MASK;