From patchwork Wed Aug 22 06:40:15 2018
X-Patchwork-Submitter: Juerg Haefliger
X-Patchwork-Id: 960793
From: Juerg Haefliger
To: kernel-team@lists.ubuntu.com
Cc: juergh@canonical.com
Subject: [SRU][Trusty][PATCH 1/7] mm: x86 pgtable: drop unneeded preprocessor ifdef
Date: Wed, 22 Aug 2018 08:40:15 +0200
Message-Id: <20180822064021.17216-2-juergh@canonical.com>
In-Reply-To: <20180822064021.17216-1-juergh@canonical.com>
References: <20180822064021.17216-1-juergh@canonical.com>
From: Cyrill Gorcunov

_PAGE_BIT_FILE (bit 6) is always less than _PAGE_BIT_PROTNONE (bit 8), so
drop the redundant #ifdef.

Signed-off-by: Cyrill Gorcunov
Cc: Linus Torvalds
Cc: Mel Gorman
Cc: Peter Anvin
Cc: Ingo Molnar
Cc: Steven Noonan
Cc: Rik van Riel
Cc: David Vrabel
Cc: Peter Zijlstra
Cc: Pavel Emelyanov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

CVE-2018-3620
CVE-2018-3646

(backported from commit 2373eaecff33db5972bde9418f92d6401b4a945c)
[juergh:
 - Added additional comment from commit bcd11afa7ada ("x86/speculation/l1tf:
   Change order of offset/type in swap entry").
 - Added a compile-time error for _PAGE_BIT_FILE > _PAGE_BIT_PROTNONE.]
Signed-off-by: Juerg Haefliger
---
 arch/x86/include/asm/pgtable-2level.h | 10 ----------
 arch/x86/include/asm/pgtable_64.h     | 21 +++++++++++++++------
 2 files changed, 15 insertions(+), 16 deletions(-)

diff --git a/arch/x86/include/asm/pgtable-2level.h b/arch/x86/include/asm/pgtable-2level.h
index c3625ecf5e3e..b405a0e5f053 100644
--- a/arch/x86/include/asm/pgtable-2level.h
+++ b/arch/x86/include/asm/pgtable-2level.h
@@ -105,13 +105,8 @@ static inline pmd_t native_pmdp_get_and_clear(pmd_t *xp)
  */
 #define PTE_FILE_MAX_BITS 29
 #define PTE_FILE_SHIFT1 (_PAGE_BIT_PRESENT + 1)
-#if _PAGE_BIT_FILE < _PAGE_BIT_PROTNONE
 #define PTE_FILE_SHIFT2 (_PAGE_BIT_FILE + 1)
 #define PTE_FILE_SHIFT3 (_PAGE_BIT_PROTNONE + 1)
-#else
-#define PTE_FILE_SHIFT2 (_PAGE_BIT_PROTNONE + 1)
-#define PTE_FILE_SHIFT3 (_PAGE_BIT_FILE + 1)
-#endif
 #define PTE_FILE_BITS1 (PTE_FILE_SHIFT2 - PTE_FILE_SHIFT1 - 1)
 #define PTE_FILE_BITS2 (PTE_FILE_SHIFT3 - PTE_FILE_SHIFT2 - 1)
@@ -135,13 +130,8 @@ static inline pmd_t native_pmdp_get_and_clear(pmd_t *xp)
 #endif /* CONFIG_MEM_SOFT_DIRTY */

 /* Encode and de-code a swap entry */
-#if _PAGE_BIT_FILE < _PAGE_BIT_PROTNONE
 #define SWP_TYPE_BITS (_PAGE_BIT_FILE - _PAGE_BIT_PRESENT - 1)
 #define SWP_OFFSET_SHIFT (_PAGE_BIT_PROTNONE + 1)
-#else
-#define SWP_TYPE_BITS (_PAGE_BIT_PROTNONE - _PAGE_BIT_PRESENT - 1)
-#define SWP_OFFSET_SHIFT (_PAGE_BIT_FILE + 1)
-#endif

 #define MAX_SWAPFILES_CHECK() BUILD_BUG_ON(MAX_SWAPFILES_SHIFT > SWP_TYPE_BITS)

diff --git a/arch/x86/include/asm/pgtable_64.h b/arch/x86/include/asm/pgtable_64.h
index 975962a32a20..a39a0afe65be 100644
--- a/arch/x86/include/asm/pgtable_64.h
+++ b/arch/x86/include/asm/pgtable_64.h
@@ -165,19 +165,28 @@ static inline int pgd_large(pgd_t pgd) { return 0; }
 #define pte_offset_map(dir, address) pte_offset_kernel((dir), (address))
 #define pte_unmap(pte) ((void)(pte))/* NOP */

+#if _PAGE_BIT_FILE > _PAGE_BIT_PROTNONE
+#error "Unsupported PTE bit arrangement"
+#endif
+
 /*
  * Encode and de-code a swap entry
  *
+ * |     ...       | 11| 10|  9|8|7|6|5| 4| 3|2|1|0| <- bit number
+ * |     ...       |SW3|SW2|SW1|G|L|D|A|CD|WT|U|W|P| <- bit names
+ * | TYPE (59-63) | ~OFFSET (9-58)  |0|X|X|X| X| X|X|X|0| <- swp entry
+ *
+ * G (8) is aliased and used as a PROT_NONE indicator for
+ * !present ptes.  We need to start storing swap entries above
+ * there.  We also need to avoid using A and D because of an
+ * erratum where they can be incorrectly set by hardware on
+ * non-present PTEs.
+ *
  * The offset is inverted by a binary not operation to make the high
  * physical bits set.
-*/
-#if _PAGE_BIT_FILE < _PAGE_BIT_PROTNONE
+ */
 #define SWP_TYPE_BITS (_PAGE_BIT_FILE - _PAGE_BIT_PRESENT - 1)
 #define SWP_OFFSET_FIRST_BIT (_PAGE_BIT_PROTNONE + 1)
-#else
-#define SWP_TYPE_BITS (_PAGE_BIT_PROTNONE - _PAGE_BIT_PRESENT - 1)
-#define SWP_OFFSET_FIRST_BIT (_PAGE_BIT_FILE + 1)
-#endif

 /* We always extract/encode the offset by shifting it all the way up, and then down again */
 #define SWP_OFFSET_SHIFT (SWP_OFFSET_FIRST_BIT+SWP_TYPE_BITS)
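As an aside for readers following the new swap-entry layout comment, the
user-space sketch below mirrors the encode/decode arithmetic under the bit
positions named in the hunk (_PAGE_BIT_PRESENT = 0, _PAGE_BIT_FILE = 6,
_PAGE_BIT_PROTNONE = 8). It is an illustration only, not the kernel's
__swp_entry()/__swp_offset() macros; the helper names are made up for the
example.

#include <stdio.h>
#include <stdint.h>

#define _PAGE_BIT_PRESENT	0
#define _PAGE_BIT_FILE		6
#define _PAGE_BIT_PROTNONE	8

#define SWP_TYPE_BITS		(_PAGE_BIT_FILE - _PAGE_BIT_PRESENT - 1)  /* 5 */
#define SWP_OFFSET_FIRST_BIT	(_PAGE_BIT_PROTNONE + 1)                  /* 9 */
#define SWP_OFFSET_SHIFT	(SWP_OFFSET_FIRST_BIT + SWP_TYPE_BITS)    /* 14 */

/* Type goes in the top 5 bits, the offset is stored inverted at bits 9-58. */
static uint64_t swp_entry(uint64_t type, uint64_t offset)
{
	return (type << (64 - SWP_TYPE_BITS)) |
	       ((~offset << SWP_OFFSET_SHIFT) >> SWP_TYPE_BITS);
}

static uint64_t swp_type(uint64_t entry)
{
	return entry >> (64 - SWP_TYPE_BITS);
}

static uint64_t swp_offset(uint64_t entry)
{
	/* Invert first, then shift all the way up and back down again. */
	return (~entry << SWP_TYPE_BITS) >> SWP_OFFSET_SHIFT;
}

int main(void)
{
	uint64_t e = swp_entry(3, 0x1234);

	/* The high "physical address" bits of the entry end up set, while the
	 * low flag bits (P, A, D, G) stay clear. */
	printf("entry  = %#018llx\n", (unsigned long long)e);
	printf("type   = %llu\n", (unsigned long long)swp_type(e));
	printf("offset = %#llx\n", (unsigned long long)swp_offset(e));
	return 0;
}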
From patchwork Wed Aug 22 06:40:16 2018
X-Patchwork-Submitter: Juerg Haefliger
X-Patchwork-Id: 960795
From: Juerg Haefliger
To: kernel-team@lists.ubuntu.com
Cc: juergh@canonical.com
Subject: [SRU][Trusty][PATCH 2/7] mm: Move change_prot_numa outside CONFIG_ARCH_USES_NUMA_PROT_NONE
Date: Wed, 22 Aug 2018 08:40:16 +0200
Message-Id: <20180822064021.17216-3-juergh@canonical.com>
In-Reply-To: <20180822064021.17216-1-juergh@canonical.com>
References: <20180822064021.17216-1-juergh@canonical.com>

From: "Aneesh Kumar K.V"

change_prot_numa should work even if _PAGE_NUMA != _PAGE_PROTNONE. On
architectures like ppc64 that don't use _PAGE_PROTNONE and also have a
separate page table outside the Linux page table, we just need to make sure
that change_prot_numa flushes the hardware page table entry so that the next
page access results in a NUMA fault.

We still need to make sure we use the NUMA faulting logic only when
CONFIG_NUMA_BALANCING is set. This implies that migrate-on-fault (lazy
migration) via mbind will only work if CONFIG_NUMA_BALANCING is set.

Signed-off-by: Aneesh Kumar K.V
Reviewed-by: Rik van Riel
Acked-by: Mel Gorman
Signed-off-by: Benjamin Herrenschmidt

CVE-2018-3620
CVE-2018-3646

(cherry picked from commit 5877231f646bbd6d1d545e7af83aaa6e6b746013)
Signed-off-by: Juerg Haefliger
---
 include/linux/mm.h | 2 +-
 mm/mempolicy.c     | 5 ++---
 2 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index c954fcac4c44..08c4eb046642 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1962,7 +1962,7 @@ static inline pgprot_t vm_get_page_prot(unsigned long vm_flags)
 }
 #endif

-#ifdef CONFIG_ARCH_USES_NUMA_PROT_NONE
+#ifdef CONFIG_NUMA_BALANCING
 unsigned long change_prot_numa(struct vm_area_struct *vma,
 			unsigned long start, unsigned long end);
 #endif

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index f8e170ec6086..a629171a93fb 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -617,7 +617,7 @@ static inline int queue_pages_pgd_range(struct vm_area_struct *vma,
 	return 0;
 }

-#ifdef CONFIG_ARCH_USES_NUMA_PROT_NONE
+#ifdef CONFIG_NUMA_BALANCING
 /*
  * This is used to mark a range of virtual addresses to be inaccessible.
  * These are later cleared by a NUMA hinting fault. Depending on these
@@ -631,7 +631,6 @@ unsigned long change_prot_numa(struct vm_area_struct *vma,
 			unsigned long addr, unsigned long end)
 {
 	int nr_updated;
-	BUILD_BUG_ON(_PAGE_NUMA != _PAGE_PROTNONE);

 	nr_updated = change_protection(vma, addr, end, vma->vm_page_prot, 0, 1);
 	if (nr_updated)
@@ -645,7 +644,7 @@ static unsigned long change_prot_numa(struct vm_area_struct *vma,
 {
 	return 0;
 }
-#endif /* CONFIG_ARCH_USES_NUMA_PROT_NONE */
+#endif /* CONFIG_NUMA_BALANCING */

 /*
  * Walk through page tables and collect pages to be migrated.
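For context on the "migrate-on-fault (lazy migration) via mbind" remark above,
here is a hedged user-space sketch of what such a request looks like.
MPOL_MF_LAZY is assumed to carry the value from uapi/linux/mempolicy.h in case
the libc headers do not expose it, the target node (1) is assumed to exist,
and whether the request is honoured depends on CONFIG_NUMA_BALANCING, as the
commit message notes. Build with -lnuma.

#include <numaif.h>        /* mbind(), MPOL_* */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#ifndef MPOL_MF_LAZY
#define MPOL_MF_LAZY (1 << 3)   /* assumed value from uapi/linux/mempolicy.h */
#endif

int main(void)
{
	size_t len = 4UL << 20;
	char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return 1;
	memset(buf, 0x5a, len);               /* fault the pages in */

	unsigned long nodemask = 1UL << 1;    /* bind to node 1 (assumed present) */
	/* Request lazy migration: pages move on the next NUMA hinting fault
	 * rather than being migrated synchronously here. */
	if (mbind(buf, len, MPOL_BIND, &nodemask, 8 * sizeof(nodemask),
		  MPOL_MF_MOVE | MPOL_MF_LAZY))
		perror("mbind");
	return 0;
}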
From patchwork Wed Aug 22 06:40:17 2018
X-Patchwork-Submitter: Juerg Haefliger
X-Patchwork-Id: 960796
From: Juerg Haefliger
To: kernel-team@lists.ubuntu.com
Cc: juergh@canonical.com
Subject: [SRU][Trusty][PATCH 3/7] x86: require x86-64 for automatic NUMA balancing
Date: Wed, 22 Aug 2018 08:40:17 +0200
Message-Id: <20180822064021.17216-4-juergh@canonical.com>
In-Reply-To: <20180822064021.17216-1-juergh@canonical.com>
References: <20180822064021.17216-1-juergh@canonical.com>
From: Mel Gorman

32-bit support for NUMA is an oddity on its own, but with automatic NUMA
balancing on top there is a reasonable risk that the CPUPID information
cannot be stored in the page flags. This patch removes support for automatic
NUMA balancing on 32-bit x86.

Signed-off-by: Mel Gorman
Cc: David Vrabel
Cc: Ingo Molnar
Cc: Peter Anvin
Cc: Fengguang Wu
Cc: Linus Torvalds
Cc: Steven Noonan
Cc: Rik van Riel
Cc: Peter Zijlstra
Cc: Andrea Arcangeli
Cc: Dave Hansen
Cc: Srikar Dronamraju
Cc: Cyrill Gorcunov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

CVE-2018-3620
CVE-2018-3646

(cherry picked from commit 4468dd76f51f8be75d4f04f1d721e379596e7262)
Signed-off-by: Juerg Haefliger
---
 arch/x86/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 15fa347d3ced..5b7ab646d6a6 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -25,7 +25,7 @@ config X86
 	select ARCH_MIGHT_HAVE_PC_PARPORT
 	select HAVE_AOUT if X86_32
 	select HAVE_UNSTABLE_SCHED_CLOCK
-	select ARCH_SUPPORTS_NUMA_BALANCING
+	select ARCH_SUPPORTS_NUMA_BALANCING if X86_64
 	select ARCH_SUPPORTS_INT128 if X86_64
 	select ARCH_WANTS_PROT_NUMA_PROT_NONE
 	select HAVE_IDE
From patchwork Wed Aug 22 06:40:18 2018
X-Patchwork-Submitter: Juerg Haefliger
X-Patchwork-Id: 960799
From: Juerg Haefliger
To: kernel-team@lists.ubuntu.com
Cc: juergh@canonical.com
Subject: [SRU][Trusty][PATCH 4/7] x86: define _PAGE_NUMA by reusing software bits on the PMD and PTE levels
Date: Wed, 22 Aug 2018 08:40:18 +0200
Message-Id: <20180822064021.17216-5-juergh@canonical.com>
In-Reply-To: <20180822064021.17216-1-juergh@canonical.com>
References: <20180822064021.17216-1-juergh@canonical.com>

From: Mel Gorman

_PAGE_NUMA is currently an alias of _PAGE_PROTNONE to trap NUMA hinting
faults on x86. Care is taken such that _PAGE_NUMA is used only in situations
where the VMA flags distinguish between NUMA hinting faults and prot_none
faults. This decision was x86-specific and, conceptually, it is difficult:
it requires special casing to distinguish between PROTNONE and NUMA ptes
based on context.

Fundamentally, we only need the _PAGE_NUMA bit to tell the difference between
an entry that is really unmapped and a page that is protected for NUMA
hinting faults, since if the PTE is not present then a fault will be trapped.

Swap PTEs on x86-64 use the bits after _PAGE_GLOBAL for the offset. This
patch shrinks the maximum possible swap size and uses the bit to uniquely
distinguish between NUMA hinting ptes and swap ptes.

Signed-off-by: Mel Gorman
Cc: David Vrabel
Cc: Ingo Molnar
Cc: Peter Anvin
Cc: Fengguang Wu
Cc: Linus Torvalds
Cc: Steven Noonan
Cc: Rik van Riel
Cc: Peter Zijlstra
Cc: Andrea Arcangeli
Cc: Dave Hansen
Cc: Srikar Dronamraju
Cc: Cyrill Gorcunov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

CVE-2018-3620
CVE-2018-3646

(backported from commit c46a7c817e662a820373bb76b88d0ad67d6abe5d)
[juergh:
 - Dropped powerpc changes (Trusty powerpc doesn't support NUMA).
 - Adjusted context due to L1TF changes.]
Signed-off-by: Juerg Haefliger --- arch/x86/include/asm/pgtable.h | 15 +++++-- arch/x86/include/asm/pgtable_64.h | 5 +++ arch/x86/include/asm/pgtable_types.h | 66 +++++++++++++++------------- arch/x86/mm/pageattr-test.c | 2 +- include/asm-generic/pgtable.h | 8 +++- include/linux/swapops.h | 2 +- mm/memory.c | 17 +++---- 7 files changed, 66 insertions(+), 49 deletions(-) diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index 6d26abf00939..aa0b02bd1855 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -140,7 +140,8 @@ static inline int pte_exec(pte_t pte) static inline int pte_special(pte_t pte) { - return pte_flags(pte) & _PAGE_SPECIAL; + return (pte_flags(pte) & (_PAGE_PRESENT|_PAGE_SPECIAL)) == + (_PAGE_PRESENT|_PAGE_SPECIAL); } /* Entries that were set to PROT_NONE are inverted */ @@ -500,6 +501,12 @@ static inline int pte_present(pte_t a) _PAGE_NUMA); } +#define pte_present_nonuma pte_present_nonuma +static inline int pte_present_nonuma(pte_t a) +{ + return pte_flags(a) & (_PAGE_PRESENT | _PAGE_PROTNONE); +} + #define pte_accessible pte_accessible static inline bool pte_accessible(struct mm_struct *mm, pte_t a) { @@ -926,19 +933,19 @@ static inline void update_mmu_cache_pmd(struct vm_area_struct *vma, static inline pte_t pte_swp_mksoft_dirty(pte_t pte) { - VM_BUG_ON(pte_present(pte)); + VM_BUG_ON(pte_present_nonuma(pte)); return pte_set_flags(pte, _PAGE_SWP_SOFT_DIRTY); } static inline int pte_swp_soft_dirty(pte_t pte) { - VM_BUG_ON(pte_present(pte)); + VM_BUG_ON(pte_present_nonuma(pte)); return pte_flags(pte) & _PAGE_SWP_SOFT_DIRTY; } static inline pte_t pte_swp_clear_soft_dirty(pte_t pte) { - VM_BUG_ON(pte_present(pte)); + VM_BUG_ON(pte_present_nonuma(pte)); return pte_clear_flags(pte, _PAGE_SWP_SOFT_DIRTY); } diff --git a/arch/x86/include/asm/pgtable_64.h b/arch/x86/include/asm/pgtable_64.h index a39a0afe65be..a9b88fe94bfa 100644 --- a/arch/x86/include/asm/pgtable_64.h +++ b/arch/x86/include/asm/pgtable_64.h @@ -186,7 +186,12 @@ static inline int pgd_large(pgd_t pgd) { return 0; } * physical bits set. 
*/ #define SWP_TYPE_BITS (_PAGE_BIT_FILE - _PAGE_BIT_PRESENT - 1) +#ifdef CONFIG_NUMA_BALANCING +/* Automatic NUMA balancing needs to be distinguishable from swap entries */ +#define SWP_OFFSET_FIRST_BIT (_PAGE_BIT_PROTNONE + 2) +#else #define SWP_OFFSET_FIRST_BIT (_PAGE_BIT_PROTNONE + 1) +#endif /* We always extract/encode the offset by shifting it all the way up, and then down again */ #define SWP_OFFSET_SHIFT (SWP_OFFSET_FIRST_BIT+SWP_TYPE_BITS) diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h index a0c024c7478e..8512719a6704 100644 --- a/arch/x86/include/asm/pgtable_types.h +++ b/arch/x86/include/asm/pgtable_types.h @@ -16,15 +16,26 @@ #define _PAGE_BIT_PSE 7 /* 4 MB (or 2MB) page */ #define _PAGE_BIT_PAT 7 /* on 4KB pages */ #define _PAGE_BIT_GLOBAL 8 /* Global TLB entry PPro+ */ -#define _PAGE_BIT_UNUSED1 9 /* available for programmer */ -#define _PAGE_BIT_IOMAP 10 /* flag used to indicate IO mapping */ -#define _PAGE_BIT_HIDDEN 11 /* hidden by kmemcheck */ +#define _PAGE_BIT_SOFTW1 9 /* available for programmer */ +#define _PAGE_BIT_SOFTW2 10 /* " */ +#define _PAGE_BIT_SOFTW3 11 /* " */ #define _PAGE_BIT_PAT_LARGE 12 /* On 2MB or 1GB pages */ -#define _PAGE_BIT_SPECIAL _PAGE_BIT_UNUSED1 -#define _PAGE_BIT_CPA_TEST _PAGE_BIT_UNUSED1 -#define _PAGE_BIT_SPLITTING _PAGE_BIT_UNUSED1 /* only valid on a PSE pmd */ +#define _PAGE_BIT_SPECIAL _PAGE_BIT_SOFTW1 +#define _PAGE_BIT_CPA_TEST _PAGE_BIT_SOFTW1 +#define _PAGE_BIT_SPLITTING _PAGE_BIT_SOFTW2 /* only valid on a PSE pmd */ +#define _PAGE_BIT_IOMAP _PAGE_BIT_SOFTW2 /* flag used to indicate IO mapping */ +#define _PAGE_BIT_HIDDEN _PAGE_BIT_SOFTW3 /* hidden by kmemcheck */ +#define _PAGE_BIT_SOFT_DIRTY _PAGE_BIT_SOFTW3 /* software dirty tracking */ #define _PAGE_BIT_NX 63 /* No execute: only valid after cpuid check */ +/* + * Swap offsets on configurations that allow automatic NUMA balancing use the + * bits after _PAGE_BIT_GLOBAL. To uniquely distinguish NUMA hinting PTEs from + * swap entries, we use the first bit after _PAGE_BIT_GLOBAL and shrink the + * maximum possible swap space from 16TB to 8TB. + */ +#define _PAGE_BIT_NUMA (_PAGE_BIT_GLOBAL+1) + /* If _PAGE_BIT_PRESENT is clear, we use these: */ /* - if the user mapped it with PROT_NONE; pte_present gives true */ #define _PAGE_BIT_PROTNONE _PAGE_BIT_GLOBAL @@ -40,7 +51,7 @@ #define _PAGE_DIRTY (_AT(pteval_t, 1) << _PAGE_BIT_DIRTY) #define _PAGE_PSE (_AT(pteval_t, 1) << _PAGE_BIT_PSE) #define _PAGE_GLOBAL (_AT(pteval_t, 1) << _PAGE_BIT_GLOBAL) -#define _PAGE_UNUSED1 (_AT(pteval_t, 1) << _PAGE_BIT_UNUSED1) +#define _PAGE_SOFTW1 (_AT(pteval_t, 1) << _PAGE_BIT_SOFTW1) #define _PAGE_IOMAP (_AT(pteval_t, 1) << _PAGE_BIT_IOMAP) #define _PAGE_PAT (_AT(pteval_t, 1) << _PAGE_BIT_PAT) #define _PAGE_PAT_LARGE (_AT(pteval_t, 1) << _PAGE_BIT_PAT_LARGE) @@ -61,14 +72,27 @@ * they do not conflict with each other. */ -#define _PAGE_BIT_SOFT_DIRTY _PAGE_BIT_HIDDEN - #ifdef CONFIG_MEM_SOFT_DIRTY #define _PAGE_SOFT_DIRTY (_AT(pteval_t, 1) << _PAGE_BIT_SOFT_DIRTY) #else #define _PAGE_SOFT_DIRTY (_AT(pteval_t, 0)) #endif +/* + * _PAGE_NUMA distinguishes between a numa hinting minor fault and a page + * that is not present. The hinting fault gathers numa placement statistics + * (see pte_numa()). The bit is always zero when the PTE is not present. + * + * The bit picked must be always zero when the pmd is present and not + * present, so that we don't lose information when we set it while + * atomically clearing the present bit. 
+ */ +#ifdef CONFIG_NUMA_BALANCING +#define _PAGE_NUMA (_AT(pteval_t, 1) << _PAGE_BIT_NUMA) +#else +#define _PAGE_NUMA (_AT(pteval_t, 0)) +#endif + /* * Tracking soft dirty bit when a page goes to a swap is tricky. * We need a bit which can be stored in pte _and_ not conflict @@ -94,26 +118,6 @@ #define _PAGE_FILE (_AT(pteval_t, 1) << _PAGE_BIT_FILE) #define _PAGE_PROTNONE (_AT(pteval_t, 1) << _PAGE_BIT_PROTNONE) -/* - * _PAGE_NUMA indicates that this page will trigger a numa hinting - * minor page fault to gather numa placement statistics (see - * pte_numa()). The bit picked (8) is within the range between - * _PAGE_FILE (6) and _PAGE_PROTNONE (8) bits. Therefore, it doesn't - * require changes to the swp entry format because that bit is always - * zero when the pte is not present. - * - * The bit picked must be always zero when the pmd is present and not - * present, so that we don't lose information when we set it while - * atomically clearing the present bit. - * - * Because we shared the same bit (8) with _PAGE_PROTNONE this can be - * interpreted as _PAGE_NUMA only in places that _PAGE_PROTNONE - * couldn't reach, like handle_mm_fault() (see access_error in - * arch/x86/mm/fault.c, the vma protection must not be PROT_NONE for - * handle_mm_fault() to be invoked). - */ -#define _PAGE_NUMA _PAGE_PROTNONE - #define _PAGE_TABLE (_PAGE_PRESENT | _PAGE_RW | _PAGE_USER | \ _PAGE_ACCESSED | _PAGE_DIRTY) #define _KERNPG_TABLE (_PAGE_PRESENT | _PAGE_RW | _PAGE_ACCESSED | \ @@ -122,8 +126,8 @@ /* Set of bits not changed in pte_modify */ #define _PAGE_CHG_MASK (PTE_PFN_MASK | _PAGE_PCD | _PAGE_PWT | \ _PAGE_SPECIAL | _PAGE_ACCESSED | _PAGE_DIRTY | \ - _PAGE_SOFT_DIRTY) -#define _HPAGE_CHG_MASK (_PAGE_CHG_MASK | _PAGE_PSE) + _PAGE_SOFT_DIRTY | _PAGE_NUMA) +#define _HPAGE_CHG_MASK (_PAGE_CHG_MASK | _PAGE_PSE | _PAGE_NUMA) /* The ASID is the lower 12 bits of CR3 */ #define X86_CR3_PCID_ASID_MASK (_AC((1<<12)-1,UL)) diff --git a/arch/x86/mm/pageattr-test.c b/arch/x86/mm/pageattr-test.c index d0b1773d9d2e..8232b05dfd9b 100644 --- a/arch/x86/mm/pageattr-test.c +++ b/arch/x86/mm/pageattr-test.c @@ -36,7 +36,7 @@ enum { static int pte_testbit(pte_t pte) { - return pte_flags(pte) & _PAGE_UNUSED1; + return pte_flags(pte) & _PAGE_SOFTW1; } struct split_state { diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h index 3d237b67be43..02db1813d83d 100644 --- a/include/asm-generic/pgtable.h +++ b/include/asm-generic/pgtable.h @@ -220,6 +220,10 @@ static inline int pmd_same(pmd_t pmd_a, pmd_t pmd_b) # define pte_accessible(mm, pte) ((void)(pte), 1) #endif +#ifndef pte_present_nonuma +#define pte_present_nonuma(pte) pte_present(pte) +#endif + #ifndef flush_tlb_fix_spurious_fault #define flush_tlb_fix_spurious_fault(vma, address) flush_tlb_page(vma, address) #endif @@ -657,7 +661,7 @@ static inline int pmd_trans_unstable(pmd_t *pmd) static inline int pte_numa(pte_t pte) { return (pte_flags(pte) & - (_PAGE_NUMA|_PAGE_PRESENT)) == _PAGE_NUMA; + (_PAGE_NUMA|_PAGE_PROTNONE|_PAGE_PRESENT)) == _PAGE_NUMA; } #endif @@ -665,7 +669,7 @@ static inline int pte_numa(pte_t pte) static inline int pmd_numa(pmd_t pmd) { return (pmd_flags(pmd) & - (_PAGE_NUMA|_PAGE_PRESENT)) == _PAGE_NUMA; + (_PAGE_NUMA|_PAGE_PROTNONE|_PAGE_PRESENT)) == _PAGE_NUMA; } #endif diff --git a/include/linux/swapops.h b/include/linux/swapops.h index d7f3b3f443a3..e288d5c016a7 100644 --- a/include/linux/swapops.h +++ b/include/linux/swapops.h @@ -54,7 +54,7 @@ static inline pgoff_t swp_offset(swp_entry_t entry) /* check whether a pte 
points to a swap entry */
 static inline int is_swap_pte(pte_t pte)
 {
-	return !pte_none(pte) && !pte_present(pte) && !pte_file(pte);
+	return !pte_none(pte) && !pte_present_nonuma(pte) && !pte_file(pte);
 }
 #endif

diff --git a/mm/memory.c b/mm/memory.c
index 112d1feed5aa..c8140b854dd6 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -745,7 +745,7 @@ struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
 	unsigned long pfn = pte_pfn(pte);

 	if (HAVE_PTE_SPECIAL) {
-		if (likely(!pte_special(pte)))
+		if (likely(!pte_special(pte) || pte_numa(pte)))
 			goto check_pfn;
 		if (vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP))
 			return NULL;
@@ -771,14 +771,15 @@ struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
 		}
 	}

-	if (is_zero_pfn(pfn))
-		return NULL;
 check_pfn:
 	if (unlikely(pfn > highest_memmap_pfn)) {
 		print_bad_pte(vma, addr, pte, NULL);
 		return NULL;
 	}

+	if (is_zero_pfn(pfn))
+		return NULL;
+
 	/*
 	 * NOTE! We still have PageReserved() pages in the page tables.
 	 * eg. VDSO mappings can cause them to exist.
@@ -1716,13 +1717,9 @@ long __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
 			(VM_MAYREAD | VM_MAYWRITE) : (VM_READ | VM_WRITE);

 	/*
-	 * If FOLL_FORCE and FOLL_NUMA are both set, handle_mm_fault
-	 * would be called on PROT_NONE ranges. We must never invoke
-	 * handle_mm_fault on PROT_NONE ranges or the NUMA hinting
-	 * page faults would unprotect the PROT_NONE ranges if
-	 * _PAGE_NUMA and _PAGE_PROTNONE are sharing the same pte/pmd
-	 * bitflag. So to avoid that, don't set FOLL_NUMA if
-	 * FOLL_FORCE is set.
+	 * If FOLL_FORCE is set then do not force a full fault as the hinting
+	 * fault information is unrelated to the reference behaviour of a task
+	 * using the address space
 	 */
 	if (!(gup_flags & FOLL_FORCE))
 		gup_flags |= FOLL_NUMA;
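To make the new layout concrete, the sketch below mimics the distinction this
patch introduces between a NUMA hinting PTE, a PROT_NONE PTE, and a swap PTE,
using the x86 bit positions from the hunks above (_PAGE_NUMA is the software
bit right after _PAGE_GLOBAL). The constants and helpers are simplified
stand-ins for illustration, not the kernel's implementations.

#include <stdio.h>
#include <stdint.h>

#define _PAGE_PRESENT	(1ULL << 0)
#define _PAGE_PROTNONE	(1ULL << 8)   /* aliases _PAGE_GLOBAL for !present PTEs */
#define _PAGE_NUMA	(1ULL << 9)   /* _PAGE_BIT_GLOBAL + 1, a software bit */

static int pte_numa(uint64_t flags)
{
	/* NUMA hinting PTE: _PAGE_NUMA set, PRESENT and PROTNONE both clear. */
	return (flags & (_PAGE_NUMA | _PAGE_PROTNONE | _PAGE_PRESENT)) == _PAGE_NUMA;
}

static int pte_present_nonuma(uint64_t flags)
{
	return !!(flags & (_PAGE_PRESENT | _PAGE_PROTNONE));
}

int main(void)
{
	uint64_t numa_pte  = _PAGE_NUMA;        /* present bit atomically cleared */
	uint64_t prot_none = _PAGE_PROTNONE;    /* mprotect(PROT_NONE) mapping */
	uint64_t swap_pte  = 0x42ULL << 10;     /* swap entries now start above bit 9 */

	printf("numa_pte:  numa=%d present_nonuma=%d\n",
	       pte_numa(numa_pte), pte_present_nonuma(numa_pte));
	printf("prot_none: numa=%d present_nonuma=%d\n",
	       pte_numa(prot_none), pte_present_nonuma(prot_none));
	printf("swap_pte:  numa=%d present_nonuma=%d\n",
	       pte_numa(swap_pte), pte_present_nonuma(swap_pte));
	return 0;
}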
From patchwork Wed Aug 22 06:40:19 2018
X-Patchwork-Submitter: Juerg Haefliger
X-Patchwork-Id: 960800
From: Juerg Haefliger
To: kernel-team@lists.ubuntu.com
Cc: juergh@canonical.com
Subject: [SRU][Trusty][PATCH 5/7] x86,mm: fix pte_special versus pte_numa
Date: Wed, 22 Aug 2018 08:40:19 +0200
Message-Id: <20180822064021.17216-6-juergh@canonical.com>
In-Reply-To: <20180822064021.17216-1-juergh@canonical.com>
References: <20180822064021.17216-1-juergh@canonical.com>

From: Hugh Dickins

Sasha Levin has shown oopses on ffffea0003480048 and ffffea0003480008 at
mm/memory.c:1132, running Trinity on different 3.16-rc-next kernels: where
zap_pte_range() checks page->mapping to see if PageAnon(page).

Those addresses fit struct pages for pfns d2001 and d2000, and in each dump a
register or a stack slot showed d2001730 or d2000730: pte flags 0x730 are PCD
ACCESSED PROTNONE SPECIAL IOMAP; and Sasha's e820 map has a hole between
cfffffff and 100000000, which would need special access.

Commit c46a7c817e66 ("x86: define _PAGE_NUMA by reusing software bits on the
PMD and PTE levels") has broken vm_normal_page(): a PROTNONE SPECIAL pte no
longer passes the pte_special() test, so zap_pte_range() goes on to try to
access a non-existent struct page.

Fix this by refining pte_special() (SPECIAL with PRESENT or PROTNONE) to
complement pte_numa() (SPECIAL with neither PRESENT nor PROTNONE).

A hint that this was a problem was that c46a7c817e66 added a pte_numa() test
to vm_normal_page(), and moved its is_zero_pfn() test from slow to fast path:
this was papering over a pte_special() snag when the zero page was
encountered during zap. This patch reverts vm_normal_page() to how it was
before, relying on pte_special().

It still appears that this patch may be incomplete: aren't there other places
which need to be handling PROTNONE along with PRESENT? For example,
pte_mknuma() clears _PAGE_PRESENT and sets _PAGE_NUMA, but on a PROT_NONE
area, that would make it pte_special().
This is side-stepped by the fact that NUMA hinting faults skipped PROT_NONE VMAs and there are no grounds where a NUMA hinting fault on a PROT_NONE VMA would be interesting. Fixes: c46a7c817e66 ("x86: define _PAGE_NUMA by reusing software bits on the PMD and PTE levels") Reported-by: Sasha Levin Tested-by: Sasha Levin Signed-off-by: Hugh Dickins Signed-off-by: Mel Gorman Cc: "Kirill A. Shutemov" Cc: Peter Zijlstra Cc: Rik van Riel Cc: Johannes Weiner Cc: Cyrill Gorcunov Cc: Matthew Wilcox Cc: [3.16] Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds CVE-2018-3620 CVE-2018-3646 (cherry picked from commit b38af4721f59d0b564468f623b3e52a638195015) Signed-off-by: Juerg Haefliger --- arch/x86/include/asm/pgtable.h | 9 +++++++-- mm/memory.c | 7 +++---- 2 files changed, 10 insertions(+), 6 deletions(-) diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index aa0b02bd1855..0eeaaf82a299 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -140,8 +140,13 @@ static inline int pte_exec(pte_t pte) static inline int pte_special(pte_t pte) { - return (pte_flags(pte) & (_PAGE_PRESENT|_PAGE_SPECIAL)) == - (_PAGE_PRESENT|_PAGE_SPECIAL); + /* + * See CONFIG_NUMA_BALANCING pte_numa in include/asm-generic/pgtable.h. + * On x86 we have _PAGE_BIT_NUMA == _PAGE_BIT_GLOBAL+1 == + * __PAGE_BIT_SOFTW1 == _PAGE_BIT_SPECIAL. + */ + return (pte_flags(pte) & _PAGE_SPECIAL) && + (pte_flags(pte) & (_PAGE_PRESENT|_PAGE_PROTNONE)); } /* Entries that were set to PROT_NONE are inverted */ diff --git a/mm/memory.c b/mm/memory.c index c8140b854dd6..337dfc000343 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -745,7 +745,7 @@ struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr, unsigned long pfn = pte_pfn(pte); if (HAVE_PTE_SPECIAL) { - if (likely(!pte_special(pte) || pte_numa(pte))) + if (likely(!pte_special(pte))) goto check_pfn; if (vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP)) return NULL; @@ -771,15 +771,14 @@ struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr, } } + if (is_zero_pfn(pfn)) + return NULL; check_pfn: if (unlikely(pfn > highest_memmap_pfn)) { print_bad_pte(vma, addr, pte, NULL); return NULL; } - if (is_zero_pfn(pfn)) - return NULL; - /* * NOTE! We still have PageReserved() pages in the page tables. * eg. VDSO mappings can cause them to exist. 
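A compact way to see what the refined pte_special()/pte_numa() pair does: on
x86 the SPECIAL and NUMA indicators share software bit 9, so the presence of
_PAGE_PRESENT or _PAGE_PROTNONE is what tells them apart. The sketch below is
illustrative only (simplified flag values, stand-in helpers) and mirrors the
checks added by this patch and the previous one.

#include <assert.h>
#include <stdint.h>

#define _PAGE_PRESENT	(1ULL << 0)
#define _PAGE_PROTNONE	(1ULL << 8)
#define _PAGE_SPECIAL	(1ULL << 9)   /* same software bit as _PAGE_NUMA */
#define _PAGE_NUMA	(1ULL << 9)

static int pte_special(uint64_t f)
{
	/* SPECIAL only counts when the PTE is PRESENT or PROTNONE. */
	return (f & _PAGE_SPECIAL) && (f & (_PAGE_PRESENT | _PAGE_PROTNONE));
}

static int pte_numa(uint64_t f)
{
	/* NUMA hinting PTE: bit 9 set with neither PRESENT nor PROTNONE. */
	return (f & (_PAGE_NUMA | _PAGE_PROTNONE | _PAGE_PRESENT)) == _PAGE_NUMA;
}

int main(void)
{
	/* Special mapping under PROT_NONE, like the flags in Sasha's oops. */
	uint64_t protnone_special = _PAGE_PROTNONE | _PAGE_SPECIAL;
	uint64_t present_special  = _PAGE_PRESENT | _PAGE_SPECIAL;
	uint64_t numa_hint        = _PAGE_NUMA;

	assert(pte_special(protnone_special) && !pte_numa(protnone_special));
	assert(pte_special(present_special)  && !pte_numa(present_special));
	assert(!pte_special(numa_hint)       &&  pte_numa(numa_hint));
	return 0;
}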
From patchwork Wed Aug 22 06:40:20 2018
X-Patchwork-Submitter: Juerg Haefliger
X-Patchwork-Id: 960797
From: Juerg Haefliger
To: kernel-team@lists.ubuntu.com
Cc: juergh@canonical.com
Subject: [SRU][Trusty][PATCH 6/7] Revert "UBUNTU: [Config] disable NUMA_BALANCING"
Date: Wed, 22 Aug 2018 08:40:20 +0200
Message-Id: <20180822064021.17216-7-juergh@canonical.com>
In-Reply-To: <20180822064021.17216-1-juergh@canonical.com>
References: <20180822064021.17216-1-juergh@canonical.com>
This reverts commit c68375ad13b90e33dcf9d5008957ecd0e9d2c331.

We can re-enable NUMA balancing now that _PAGE_NUMA is no longer aliased to
_PAGE_PROTNONE, which was causing issues with the PTE offset inversion
introduced by the L1TF patches.

CVE-2018-3620
CVE-2018-3646

Signed-off-by: Juerg Haefliger
---
 debian.master/config/config.common.ubuntu | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/debian.master/config/config.common.ubuntu b/debian.master/config/config.common.ubuntu
index aa9d16a0d011..3689f20ba78f 100644
--- a/debian.master/config/config.common.ubuntu
+++ b/debian.master/config/config.common.ubuntu
@@ -334,6 +334,7 @@ CONFIG_ARCH_TEGRA_124_SOC=y
 CONFIG_ARCH_TEGRA_2x_SOC=y
 CONFIG_ARCH_TEGRA_3x_SOC=y
 # CONFIG_ARCH_U8500 is not set
+CONFIG_ARCH_USES_NUMA_PROT_NONE=y
 CONFIG_ARCH_USES_PG_UNCACHED=y
 CONFIG_ARCH_USE_BUILTIN_BSWAP=y
 CONFIG_ARCH_USE_CMPXCHG_LOCKREF=y
@@ -4599,7 +4600,8 @@ CONFIG_NTB_NETDEV=m
 # CONFIG_NTFS_DEBUG is not set
 CONFIG_NTFS_FS=m
 # CONFIG_NTFS_RW is not set
-# CONFIG_NUMA_BALANCING is not set
+CONFIG_NUMA_BALANCING=y
+CONFIG_NUMA_BALANCING_DEFAULT_ENABLED=y
 # CONFIG_NUMA_EMU is not set
 CONFIG_NVEC_PAZ00=m
 CONFIG_NVEC_POWER=m
From patchwork Wed Aug 22 06:40:21 2018
X-Patchwork-Submitter: Juerg Haefliger
X-Patchwork-Id: 960798
From: Juerg Haefliger
To: kernel-team@lists.ubuntu.com
Cc: juergh@canonical.com
Subject: [SRU][Trusty][PATCH 7/7] UBUNTU: SAUCE: x86/fremap: Invert the offset when converting to/from a PTE
Date: Wed, 22 Aug 2018 08:40:21 +0200
Message-Id: <20180822064021.17216-8-juergh@canonical.com>
In-Reply-To: <20180822064021.17216-1-juergh@canonical.com>
References: <20180822064021.17216-1-juergh@canonical.com>

The 3.13 kernel still uses non-emulated code for the remap_file_pages
syscall, which makes use of macros to convert between page offsets and PTEs.
These macros need to invert the offset for L1TF protection. Without this, the
page table entries of a remapped and swapped-out file page look like this:

[28865.660359] virtual user addr: 00007fe49ea9b000
[28865.660360] page: ffffeddf927aa6c0
[28865.660361] pgd: ffff8802605267f8 (8000000260229067) | USR RW NX | pgd
[28865.660365] pud: ffff880260229c90 (000000025f9d1067) | USR RW PAT x | pud 1G
[28865.660368] pmd: ffff88025f9d17a8 (00000002602d8067) | USR RW x | pmd 2M
[28865.660371] pte: ffff8802602d84d8 (00000000001f4040) | ro x | pte 4K
                                           ^^^^^^ non-inverted offset

With this commit, they look like:

[ 2564.508511] virtual user addr: 00007f728c787000
[ 2564.508514] page: ffffedddca31e1c0
[ 2564.508518] pgd: ffff8802603207f0 (800000026036b067) | USR RW NX | pgd
[ 2564.508531] pud: ffff88026036be50 (0000000260ee6067) | USR RW x | pud 1G
[ 2564.508543] pmd: ffff880260ee6318 (0000000260360067) | USR RW x | pmd 2M
[ 2564.508554] pte: ffff880260360c38 (00003fffffe0b040) | ro x | pte 4K
                                           ^^^^^^ inverted offset

Also make sure that the number of bits for the maximum offset of a remap is
limited to 1 bit less than the number of actual physical bits, so that the
highest bit can be inverted by the conversion macros.
CVE-2018-3620 CVE-2018-3646 Signed-off-by: Juerg Haefliger --- arch/x86/include/asm/pgtable_64.h | 22 ++++++++++++++++++---- mm/fremap.c | 6 ++++++ 2 files changed, 24 insertions(+), 4 deletions(-) diff --git a/arch/x86/include/asm/pgtable_64.h b/arch/x86/include/asm/pgtable_64.h index a9b88fe94bfa..7342c233e9ca 100644 --- a/arch/x86/include/asm/pgtable_64.h +++ b/arch/x86/include/asm/pgtable_64.h @@ -154,10 +154,24 @@ static inline int pgd_large(pgd_t pgd) { return 0; } /* PUD - Level3 access */ /* PMD - Level 2 access */ -#define pte_to_pgoff(pte) ((pte_val((pte)) & PHYSICAL_PAGE_MASK) >> PAGE_SHIFT) -#define pgoff_to_pte(off) ((pte_t) { .pte = ((off) << PAGE_SHIFT) | \ - _PAGE_FILE }) -#define PTE_FILE_MAX_BITS __PHYSICAL_MASK_SHIFT +#define pte_to_pgoff(pte) ((~pte_val((pte)) & PHYSICAL_PAGE_MASK) >> PAGE_SHIFT) +#define pgoff_to_pte(off) ((pte_t) { .pte = \ + ((~(off) & (PHYSICAL_PAGE_MASK >> PAGE_SHIFT)) \ + << PAGE_SHIFT) | _PAGE_FILE }) +/* + * Set the highest allowed nonlinear pgoff to 1 bit less than + * x86_phys_bits to guarantee the inversion of the highest bit + * in the pgoff_to_pte conversion. The lowest x86_phys_bits is + * 36 so x86 implementations with 36 bits will find themselves + * unable to keep using remap_file_pages() with file offsets + * above 128TiB (calculated as 1<<(36-1+PAGE_SHIFT)). More + * recent CPUs will retain much higher max file offset limits. + */ +#ifdef PTE_FILE_MAX_BITS +#error "Huh? PTE_FILE_MAX_BITS shouldn't be defined here" +#endif +#define L1TF_PTE_FILE_MAX_BITS min(__PHYSICAL_MASK_SHIFT, \ + boot_cpu_data.x86_phys_bits - 1) /* PTE - Level 1 access. */ diff --git a/mm/fremap.c b/mm/fremap.c index fd94a867cda0..9959bad4ec55 100644 --- a/mm/fremap.c +++ b/mm/fremap.c @@ -153,10 +153,16 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size, return err; /* Can we represent this offset inside this architecture's pte's? */ +#ifdef L1TF_PTE_FILE_MAX_BITS + if (L1TF_PTE_FILE_MAX_BITS < BITS_PER_LONG && + (pgoff + (size >> PAGE_SHIFT) >= (1UL << L1TF_PTE_FILE_MAX_BITS))) + return err; +#else #if PTE_FILE_MAX_BITS < BITS_PER_LONG if (pgoff + (size >> PAGE_SHIFT) >= (1UL << PTE_FILE_MAX_BITS)) return err; #endif +#endif /* L1TF_PTE_FILE_MAX_BITS */ /* We need down_write() to change vma->vm_flags. */ down_read(&mm->mmap_sem);
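As a closing illustration of the inverted file-PTE encoding, the user-space
sketch below reproduces the pte_to_pgoff()/pgoff_to_pte() arithmetic with an
assumed 46-bit physical address width (the real kernel derives the mask from
boot_cpu_data.x86_phys_bits). Under that assumption, encoding the offset
0x1f4 from the "before" dump yields 0x3fffffe0b040, which is exactly the
inverted value shown in the "after" dump above.

#include <stdio.h>
#include <stdint.h>

#define PAGE_SHIFT	12
#define PHYS_BITS	46   /* assumed x86_phys_bits for this example */
#define PHYSICAL_PAGE_MASK \
	((((uint64_t)1 << PHYS_BITS) - 1) & ~(((uint64_t)1 << PAGE_SHIFT) - 1))
#define _PAGE_FILE	((uint64_t)1 << 6)

static uint64_t pgoff_to_pte(uint64_t off)
{
	/* Store the inverted offset in the physical-address bits of the PTE. */
	return ((~off & (PHYSICAL_PAGE_MASK >> PAGE_SHIFT)) << PAGE_SHIFT) |
	       _PAGE_FILE;
}

static uint64_t pte_to_pgoff(uint64_t pte)
{
	/* Undo the inversion when reading the offset back out. */
	return (~pte & PHYSICAL_PAGE_MASK) >> PAGE_SHIFT;
}

int main(void)
{
	uint64_t off = 0x1f4;   /* the offset from the "before" dump */
	uint64_t pte = pgoff_to_pte(off);

	printf("pgoff %#llx -> pte %#llx -> pgoff %#llx\n",
	       (unsigned long long)off, (unsigned long long)pte,
	       (unsigned long long)pte_to_pgoff(pte));
	return 0;
}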