From patchwork Mon Jul 29 11:19:19 2019
X-Patchwork-Submitter: Colin Ian King
X-Patchwork-Id: 1138315
From: Colin King
To: kernel-team@lists.ubuntu.com
Subject: [PATCH 1/4][SRU][B-HWE][D][V3] x86/mm: Check for pfn instead of page in vmalloc_sync_one()
Date: Mon, 29 Jul 2019 12:19:19 +0100
Message-Id: <20190729111922.8611-2-colin.king@canonical.com>
In-Reply-To: <20190729111922.8611-1-colin.king@canonical.com>
References: <20190729111922.8611-1-colin.king@canonical.com>

From: Joerg Roedel

BugLink: https://bugs.launchpad.net/bugs/1838115

Do not require a struct page for the mapped memory location because it
might not exist. This can happen when an ioremapped region is mapped with
2MB pages.

Fixes: 5d72b4fba40ef ('x86, mm: support huge I/O mapping capability I/F')
Signed-off-by: Joerg Roedel
Signed-off-by: Thomas Gleixner
Reviewed-by: Dave Hansen
Link: https://lkml.kernel.org/r/20190719184652.11391-2-joro@8bytes.org
(cherry picked from commit 51b75b5b563a2637f9d8dc5bd02a31b2ff9e5ea0)
Signed-off-by: Colin Ian King
---
 arch/x86/mm/fault.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 9d5c75f..2d61c7b 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -199,7 +199,7 @@ static inline pmd_t *vmalloc_sync_one(pgd_t *pgd, unsigned long address)
 	if (!pmd_present(*pmd))
 		set_pmd(pmd, *pmd_k);
 	else
-		BUG_ON(pmd_page(*pmd) != pmd_page(*pmd_k));
+		BUG_ON(pmd_pfn(*pmd) != pmd_pfn(*pmd_k));
 
 	return pmd_k;
 }
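[Editorial note] For readers less familiar with this corner of the mm code,
the sketch below is a minimal userspace model, not kernel code; the names
toy_pmd_t, TOY_RAM_PFNS, TOY_PAGE_SHIFT and the toy mem_map[] are invented
for illustration. It shows why comparing pfns stays well-defined for
ioremapped (device) memory, while deriving a struct page pointer assumes
the pfn is covered by mem_map:

/*
 * Toy model: an entry stores pfn << 12 plus flag bits. mem_map[] only
 * covers RAM pfns, so a struct page lookup for a device pfn is invalid,
 * while a raw pfn comparison is always meaningful.
 */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

typedef uint64_t toy_pmd_t;          /* entry = pfn << 12 | flags */

#define TOY_PAGE_SHIFT 12
#define TOY_RAM_PFNS   1024          /* toy mem_map covers pfns 0..1023 */

struct toy_page { int refcount; };
static struct toy_page mem_map[TOY_RAM_PFNS];

static uint64_t toy_pmd_pfn(toy_pmd_t pmd)
{
        return pmd >> TOY_PAGE_SHIFT;
}

/* Only valid when the pfn is backed by RAM, i.e. has a struct page. */
static struct toy_page *toy_pmd_page(toy_pmd_t pmd)
{
        uint64_t pfn = toy_pmd_pfn(pmd);

        assert(pfn < TOY_RAM_PFNS);  /* blows up for device memory */
        return &mem_map[pfn];
}

int main(void)
{
        toy_pmd_t ram_a  = (7 << TOY_PAGE_SHIFT) | 1;
        toy_pmd_t ram_b  = (7 << TOY_PAGE_SHIFT) | 1;
        /* "Ioremapped" MMIO entries: this pfn has no struct page. */
        toy_pmd_t mmio_a = ((uint64_t)0x100000 << TOY_PAGE_SHIFT) | 1;
        toy_pmd_t mmio_b = ((uint64_t)0x100000 << TOY_PAGE_SHIFT) | 1;

        /* pfn comparison works for both RAM and MMIO entries ... */
        printf("ram  pfns equal: %d\n", toy_pmd_pfn(ram_a) == toy_pmd_pfn(ram_b));
        printf("mmio pfns equal: %d\n", toy_pmd_pfn(mmio_a) == toy_pmd_pfn(mmio_b));

        /* ... and page comparison is fine for RAM ... */
        printf("ram  pages equal: %d\n", toy_pmd_page(ram_a) == toy_pmd_page(ram_b));

        /* ... but toy_pmd_page(mmio_a) would trip the assert here,
         * mirroring the failure mode this patch avoids. */
        return 0;
}

Compiled and run, the pfn comparisons succeed for both entry types, while
the struct page lookup is only safe for the RAM-backed entries.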
From patchwork Mon Jul 29 11:19:20 2019
X-Patchwork-Submitter: Colin Ian King
X-Patchwork-Id: 1138317
From: Colin King
To: kernel-team@lists.ubuntu.com
Subject: [PATCH 2/4][SRU][B-HWE][D][V3] x86/mm: Sync also unmappings in vmalloc_sync_all()
Date: Mon, 29 Jul 2019 12:19:20 +0100
Message-Id: <20190729111922.8611-3-colin.king@canonical.com>
In-Reply-To: <20190729111922.8611-1-colin.king@canonical.com>
References: <20190729111922.8611-1-colin.king@canonical.com>

From: Joerg Roedel

BugLink: https://bugs.launchpad.net/bugs/1838115

With huge-page ioremap areas, the unmappings also need to be synced between
all page-tables. Otherwise this can cause data corruption when a region is
unmapped and later re-used.

Make the vmalloc_sync_one() function ready to sync unmappings and make sure
vmalloc_sync_all() iterates over all page-tables even when an unmapped PMD
is found.

Fixes: 5d72b4fba40ef ('x86, mm: support huge I/O mapping capability I/F')
Signed-off-by: Joerg Roedel
Signed-off-by: Thomas Gleixner
Reviewed-by: Dave Hansen
Link: https://lkml.kernel.org/r/20190719184652.11391-3-joro@8bytes.org
(cherry picked from commit 8e998fc24de47c55b47a887f6c95ab91acd4a720)
Signed-off-by: Colin Ian King
---
 arch/x86/mm/fault.c | 13 +++++--------
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 2d61c7b..df25c9c 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -193,11 +193,12 @@ static inline pmd_t *vmalloc_sync_one(pgd_t *pgd, unsigned long address)
 	pmd = pmd_offset(pud, address);
 	pmd_k = pmd_offset(pud_k, address);
-	if (!pmd_present(*pmd_k))
-		return NULL;
 
-	if (!pmd_present(*pmd))
+	if (pmd_present(*pmd) != pmd_present(*pmd_k))
 		set_pmd(pmd, *pmd_k);
+
+	if (!pmd_present(*pmd_k))
+		return NULL;
 	else
 		BUG_ON(pmd_pfn(*pmd) != pmd_pfn(*pmd_k));
 
@@ -219,17 +220,13 @@ void vmalloc_sync_all(void)
 		spin_lock(&pgd_lock);
 		list_for_each_entry(page, &pgd_list, lru) {
 			spinlock_t *pgt_lock;
-			pmd_t *ret;
 
 			/* the pgt_lock only for Xen */
 			pgt_lock = &pgd_page_get_mm(page)->page_table_lock;
 
 			spin_lock(pgt_lock);
-			ret = vmalloc_sync_one(page_address(page), address);
+			vmalloc_sync_one(page_address(page), address);
 			spin_unlock(pgt_lock);
-
-			if (!ret)
-				break;
 		}
 		spin_unlock(&pgd_lock);
 	}
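[Editorial note] To see what the reordered checks in vmalloc_sync_one()
buy us, here is a small userspace model (invented names again, not the
kernel implementation): the entry is copied whenever the two tables
disagree on presence, so a cleared entry in the reference table now
propagates to a stale copy instead of being skipped:

/*
 * Toy model of the new vmalloc_sync_one() behaviour: sync_one() copies
 * the reference entry on any presence mismatch, covering both the
 * map (absent -> present) and unmap (present -> absent) directions.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define PRESENT ((uint64_t)1)

static bool entry_present(uint64_t e) { return e & PRESENT; }

/* Returns false when the reference entry is absent (nothing left to check). */
static bool sync_one(uint64_t *pmd, const uint64_t *pmd_k)
{
        if (entry_present(*pmd) != entry_present(*pmd_k))
                *pmd = *pmd_k;       /* sync mappings *and* unmappings */

        if (!entry_present(*pmd_k))
                return false;

        return true;
}

int main(void)
{
        uint64_t init_pmd = (42 << 12) | PRESENT; /* reference page-table */
        uint64_t proc_pmd = init_pmd;             /* a process's copy */

        init_pmd = 0;                             /* region unmapped */

        /* The old code bailed out here and left proc_pmd stale. */
        sync_one(&proc_pmd, &init_pmd);
        printf("stale entry cleared: %d\n", !entry_present(proc_pmd));
        return 0;
}

Note also why the patch drops the early-out in vmalloc_sync_all(): a
NULL return now means "reference entry absent", not "nothing more to do",
so every page-table in the list must still be visited.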
From patchwork Mon Jul 29 11:19:21 2019
X-Patchwork-Submitter: Colin Ian King
X-Patchwork-Id: 1138319
From: Colin King
To: kernel-team@lists.ubuntu.com
Subject: [PATCH 3/4][SRU][B-HWE][D][V3] mm/vmalloc.c: add priority threshold to __purge_vmap_area_lazy()
Date: Mon, 29 Jul 2019 12:19:21 +0100
Message-Id: <20190729111922.8611-4-colin.king@canonical.com>
In-Reply-To: <20190729111922.8611-1-colin.king@canonical.com>
References: <20190729111922.8611-1-colin.king@canonical.com>
From: "Uladzislau Rezki (Sony)"

BugLink: https://bugs.launchpad.net/bugs/1838115

Commit 763b218ddfaf ("mm: add preempt points into __purge_vmap_area_lazy()")
introduced some preempt points, one of which prioritizes an allocation over
the lazy freeing of vmap areas. Prioritizing allocation over freeing does
not work well all the time; it should rather be a compromise:

1) The number of lazy pages directly influences the length of the busy
list, and thus the cost of operations like allocation, lookup, unmap,
remove, etc.

2) Under heavy stress on the vmalloc subsystem I ran into a situation
where memory usage kept increasing until hitting an out_of_memory -> panic
state, because the logic that frees vmap areas in the
__purge_vmap_area_lazy() function was completely blocked.

Establish a threshold beyond which freeing is prioritized back over
allocation, creating a balance between the two.

Using the vmalloc test driver in "stress mode", i.e. when all available
test cases are run simultaneously on all online CPUs to put pressure on
the vmalloc subsystem, my HiKey 960 board runs out of memory because the
__purge_vmap_area_lazy() logic simply is not able to free pages in time.

How to run it:
1) Build the kernel with CONFIG_TEST_VMALLOC=m
2) ./tools/testing/selftests/vm/test_vmalloc.sh stress

During this test, "vmap_lazy_nr" pages go far beyond the acceptable
lazy_max_pages() threshold, which leads to an enormous busy list size and
other problems, including long allocation times.

Link: http://lkml.kernel.org/r/20190124115648.9433-3-urezki@gmail.com
Signed-off-by: Uladzislau Rezki (Sony)
Reviewed-by: Andrew Morton
Cc: Michal Hocko
Cc: Matthew Wilcox
Cc: Thomas Garnier
Cc: Oleksiy Avramchenko
Cc: Steven Rostedt
Cc: Joel Fernandes
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: Tejun Heo
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
(cherry picked from commit 68571be99f323c3c3db62a8513a43380ccefe97c)
Signed-off-by: Colin Ian King
---
 mm/vmalloc.c | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 1bf8fa9..07c9a18 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -661,23 +661,27 @@ static bool __purge_vmap_area_lazy(unsigned long start, unsigned long end)
 	struct llist_node *valist;
 	struct vmap_area *va;
 	struct vmap_area *n_va;
-	bool do_free = false;
+	int resched_threshold;
 
 	lockdep_assert_held(&vmap_purge_lock);
 
 	valist = llist_del_all(&vmap_purge_list);
+	if (unlikely(valist == NULL))
+		return false;
+
+	/*
+	 * TODO: to calculate a flush range without looping.
+	 * The list can be up to lazy_max_pages() elements.
+	 */
 	llist_for_each_entry(va, valist, purge_list) {
 		if (va->va_start < start)
 			start = va->va_start;
 		if (va->va_end > end)
 			end = va->va_end;
-		do_free = true;
 	}
 
-	if (!do_free)
-		return false;
-
 	flush_tlb_kernel_range(start, end);
+	resched_threshold = (int) lazy_max_pages() << 1;
 
 	spin_lock(&vmap_area_lock);
 	llist_for_each_entry_safe(va, n_va, valist, purge_list) {
@@ -685,7 +689,9 @@ static bool __purge_vmap_area_lazy(unsigned long start, unsigned long end)
 		__free_vmap_area(va);
 		atomic_sub(nr, &vmap_lazy_nr);
-		cond_resched_lock(&vmap_area_lock);
+
+		if (atomic_read(&vmap_lazy_nr) < resched_threshold)
+			cond_resched_lock(&vmap_area_lock);
 	}
 	spin_unlock(&vmap_area_lock);
 	return true;
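[Editorial note] A toy model of the new purge behaviour may help here.
This is a userspace sketch under assumed values (lazy_max_pages() is
stubbed to a constant, cond_resched() is a stand-in for
cond_resched_lock()); it is not mm/vmalloc.c itself. The point: the loop
only offers to reschedule once the lazy-page backlog has drained below
twice the threshold, so under pressure freeing keeps making progress
instead of being preempted away:

/*
 * Toy purge loop: free areas one by one, but only yield the CPU once
 * the global backlog counter is below resched_threshold.
 */
#include <stdatomic.h>
#include <stdio.h>

static atomic_long vmap_lazy_nr;

static long lazy_max_pages(void) { return 32; /* toy value */ }

static void cond_resched(void) { /* stand-in for cond_resched_lock() */ }

static void purge(long areas, long pages_per_area)
{
        long resched_threshold = lazy_max_pages() << 1;

        for (long i = 0; i < areas; i++) {
                atomic_fetch_sub(&vmap_lazy_nr, pages_per_area);

                /* Yield only once the backlog has drained far enough. */
                if (atomic_load(&vmap_lazy_nr) < resched_threshold)
                        cond_resched();
        }
}

int main(void)
{
        atomic_store(&vmap_lazy_nr, 1000);
        purge(100, 10);
        printf("remaining lazy pages: %ld\n", atomic_load(&vmap_lazy_nr));
        return 0;
}

With a backlog of 1000 pages and a threshold of 64, the first ~93
iterations run without yielding; the pre-patch behaviour was to offer a
reschedule on every single iteration, which is what starved the freeing
path on the HiKey 960 described above.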
From patchwork Mon Jul 29 11:19:22 2019
X-Patchwork-Submitter: Colin Ian King
X-Patchwork-Id: 1138318
From: Colin King
To: kernel-team@lists.ubuntu.com
Subject: [PATCH 4/4][SRU][B-HWE][D][V3] mm/vmalloc: Sync unmappings in __purge_vmap_area_lazy()
Date: Mon, 29 Jul 2019 12:19:22 +0100
Message-Id: <20190729111922.8611-5-colin.king@canonical.com>
In-Reply-To: <20190729111922.8611-1-colin.king@canonical.com>
References: <20190729111922.8611-1-colin.king@canonical.com>

From: Joerg Roedel

BugLink: https://bugs.launchpad.net/bugs/1838115

On x86-32 with PTI enabled, parts of the kernel page-tables are not shared
between processes. This can cause mappings in the vmalloc/ioremap area to
persist in some page-tables after the region is unmapped and released.

When the region is re-used, the processes with the old mappings do not
fault in the new mappings but still access the old ones. This causes
undefined behavior; in practice that often means data corruption, kernel
oopses and panics, and even spontaneous reboots.
Fix this problem by actively syncing unmaps in the vmalloc/ioremap area to
all page-tables in the system before the regions can be re-used.

References: https://bugzilla.suse.com/show_bug.cgi?id=1118689
Fixes: 5d72b4fba40ef ('x86, mm: support huge I/O mapping capability I/F')
Signed-off-by: Joerg Roedel
Signed-off-by: Thomas Gleixner
Reviewed-by: Dave Hansen
Link: https://lkml.kernel.org/r/20190719184652.11391-4-joro@8bytes.org
(cherry picked from commit 3f8fd02b1bf1d7ba964485a56f2f4b53ae88c167)
Signed-off-by: Colin Ian King
---
 mm/vmalloc.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 07c9a18..ddf5ee0 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -670,6 +670,12 @@ static bool __purge_vmap_area_lazy(unsigned long start, unsigned long end)
 		return false;
 
 	/*
+	 * First make sure the mappings are removed from all page-tables
+	 * before they are freed.
+	 */
+	vmalloc_sync_all();
+
+	/*
 	 * TODO: to calculate a flush range without looping.
 	 * The list can be up to lazy_max_pages() elements.
 	 */
@@ -2308,6 +2314,9 @@ EXPORT_SYMBOL(remap_vmalloc_range);
 /*
  * Implement a stub for vmalloc_sync_all() if the architecture chose not to
  * have one.
+ *
+ * The purpose of this function is to make sure the vmalloc area
+ * mappings are identical in all page-tables in the system.
 */
 void __weak vmalloc_sync_all(void)
 {
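[Editorial note] For completeness, the __weak stub pattern relied on above
can be demonstrated in isolation. This is a generic GCC/Clang C
illustration, not kernel code: the weak no-op definition is picked at link
time unless a strong symbol of the same name (as an architecture like
x86-32 would provide in its own translation unit) overrides it:

/*
 * Weak-symbol demo: the weak definition is a fallback; linking in a
 * strong definition of the same name silently replaces it.
 */
#include <stdio.h>

/* Generic no-op default, like the __weak stub in mm/vmalloc.c. */
__attribute__((weak)) void vmalloc_sync_all(void)
{
        puts("generic stub: nothing to sync");
}

/*
 * Uncommenting this strong definition models an arch override:
 *
 * void vmalloc_sync_all(void)
 * {
 *         puts("arch override: syncing all page-tables");
 * }
 */

int main(void)
{
        vmalloc_sync_all();
        return 0;
}

This is why the fix is safe on every architecture: where no real
implementation exists, the call added to __purge_vmap_area_lazy() resolves
to the no-op stub and costs nothing.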