From patchwork Wed Oct 19 15:09:02 2016
X-Patchwork-Submitter: Tim Gardner
X-Patchwork-Id: 684172
From: Tim Gardner
To: kernel-team@lists.ubuntu.com
Subject: [PATCH 02/10] mm: Implement new pkey_mprotect() system call
Date: Wed, 19 Oct 2016 08:09:02 -0700
Message-Id: <1476889750-20956-3-git-send-email-tim.gardner@canonical.com>

From: Dave Hansen

BugLink: http://bugs.launchpad.net/bugs/1591804

pkey_mprotect() is just like mprotect(), except that it also takes a
protection key as an argument. On systems that do not support
protection keys, it still works, but requires that key=0. Otherwise,
it does exactly what mprotect() does.
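For illustration only, the key check implied by the paragraph above can
be modelled in user space. This is a hypothetical sketch, not kernel
code. The model_* names are made up. It assumes the x86 numbers from
this patch: arch_max_pkey() returns PKEY_DEDICATED_EXECUTE_ONLY (15)
when the CPU supports protection keys and 1 otherwise. It also assumes
the range test of the validate_pkey() helper that the patch removes.
It describes the intended behaviour. The patch by itself still rejects
every key other than -1.

```c
#include <stdbool.h>

/* Mirrors PKEY_DEDICATED_EXECUTE_ONLY from the x86 header in this patch. */
#define MODEL_XONLY_PKEY 15

/*
 * Hypothetical stand-in for arch_max_pkey(): the number of keys user
 * space may pass. Key 15 is held back for execute-only mappings.
 */
static int model_max_pkey(bool cpu_has_pkeys)
{
	return cpu_has_pkeys ? MODEL_XONLY_PKEY : 1;
}

/*
 * Hypothetical check modelled on the removed validate_pkey():
 * -1 selects the legacy mprotect() path, otherwise the key must
 * satisfy 0 <= pkey < model_max_pkey().
 */
static bool model_pkey_acceptable(int pkey, bool cpu_has_pkeys)
{
	if (pkey == -1)
		return true;
	return pkey >= 0 && pkey < model_max_pkey(cpu_has_pkeys);
}
```

Under this model, a system without protection-key support accepts only
key 0 (or the legacy -1). A pkey-capable x86 CPU accepts keys 0 through
14.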
I expect it to be used like this, if you want to guarantee that any
mapping you create can *never* be accessed without the right
protection keys set up:

	int real_prot = PROT_READ|PROT_WRITE;
	int pkey = pkey_alloc(0, PKEY_DISABLE_ACCESS);
	void *ptr = mmap(NULL, PAGE_SIZE, PROT_NONE,
			 MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
	int ret = pkey_mprotect(ptr, PAGE_SIZE, real_prot, pkey);

This way, there is *no* window where the mapping is accessible, since
it was always either PROT_NONE or had a protection key set that denied
all access.

We settled on 'int' for the type of the key here (see the
SYSCALL_DEFINE4() below). We only need 4 bits on x86 today, but I
figured that other architectures might need some more space.

Semantically, we have a bit of a problem if we combine this syscall
with our previously-introduced execute-only support: what do we do
when we mix execute-only pkey use with pkey_mprotect() use? For
instance:

	pkey_mprotect(ptr, PAGE_SIZE, PROT_WRITE, 6); // set pkey=6
	mprotect(ptr, PAGE_SIZE, PROT_EXEC);          // set pkey=X_ONLY_PKEY?
	mprotect(ptr, PAGE_SIZE, PROT_WRITE);         // is pkey=6 again?

To solve that, we make the plain-mprotect()-initiated execute-only
support apply only to VMAs that have the default protection key (0)
set on them.

Proposed semantics:

1. Protection key 0 is special and represents the default,
   "unassigned" protection key. It is always allocated.

2. mprotect() never affects a mapping's pkey_mprotect()-assigned
   protection key. A protection key of 0 (even if set explicitly)
   represents an unassigned protection key.

   2a. mprotect(PROT_EXEC) on a mapping with an assigned protection
       key may or may not result in a mapping with execute-only
       properties. If that is not a strong enough guarantee, use
       pkey_mprotect() plus pkey_set() on all threads to _guarantee_
       execute-only semantics.

3. mprotect(PROT_EXEC) may result in an "execute-only" mapping. The
   kernel will internally attempt to allocate and dedicate a
   protection key for the purpose of execute-only mappings.
   This may not be possible when there are no free protection keys
   available. It is also not possible, of course, when there is no
   hardware support for protection keys.

Signed-off-by: Dave Hansen
Acked-by: Mel Gorman
Cc: linux-arch@vger.kernel.org
Cc: Dave Hansen
Cc: arnd@arndb.de
Cc: linux-api@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: luto@kernel.org
Cc: akpm@linux-foundation.org
Cc: torvalds@linux-foundation.org
Link: http://lkml.kernel.org/r/20160729163012.3DDD36C4@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner
(cherry picked from commit 7d06d9c9bd813fc956b9c7bffc1b9724009983eb)
Signed-off-by: Tim Gardner
---
 arch/x86/include/asm/mmu_context.h | 15 ++++++++++-----
 arch/x86/include/asm/pkeys.h       | 11 +++++++++--
 include/linux/pkeys.h              | 12 ------------
 mm/mprotect.c                      | 30 ++++++++++++++++++++++++++----
 4 files changed, 45 insertions(+), 23 deletions(-)

diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index d8abfcf..af0251f 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -4,6 +4,7 @@
 #include
 #include
 #include
+#include
 
 #include
 
@@ -195,16 +196,20 @@ static inline void arch_unmap(struct mm_struct *mm, struct vm_area_struct *vma,
 		mpx_notify_unmap(mm, vma, start, end);
 }
 
+#ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
 static inline int vma_pkey(struct vm_area_struct *vma)
 {
-	u16 pkey = 0;
-#ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
 	unsigned long vma_pkey_mask = VM_PKEY_BIT0 | VM_PKEY_BIT1 |
 				      VM_PKEY_BIT2 | VM_PKEY_BIT3;
-	pkey = (vma->vm_flags & vma_pkey_mask) >> VM_PKEY_SHIFT;
-#endif
-	return pkey;
+
+	return (vma->vm_flags & vma_pkey_mask) >> VM_PKEY_SHIFT;
+}
+#else
+static inline int vma_pkey(struct vm_area_struct *vma)
+{
+	return 0;
 }
+#endif
 
 static inline bool __pkru_allows_pkey(u16 pkey, bool write)
 {
diff --git a/arch/x86/include/asm/pkeys.h b/arch/x86/include/asm/pkeys.h
index 7b84565..33777c2 100644
--- a/arch/x86/include/asm/pkeys.h
+++ b/arch/x86/include/asm/pkeys.h
@@ -1,7 +1,12 @@
 #ifndef _ASM_X86_PKEYS_H
 #define _ASM_X86_PKEYS_H
 
-#define arch_max_pkey() (boot_cpu_has(X86_FEATURE_OSPKE) ? 16 : 1)
+#define PKEY_DEDICATED_EXECUTE_ONLY 15
+/*
+ * Consider the PKEY_DEDICATED_EXECUTE_ONLY key unavailable.
+ */
+#define arch_max_pkey() (boot_cpu_has(X86_FEATURE_OSPKE) ? \
+		PKEY_DEDICATED_EXECUTE_ONLY : 1)
 
 extern int arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
 		unsigned long init_val);
@@ -10,7 +15,6 @@ extern int arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
  * Try to dedicate one of the protection keys to be used as an
  * execute-only protection key.
  */
-#define PKEY_DEDICATED_EXECUTE_ONLY 15
 extern int __execute_only_pkey(struct mm_struct *mm);
 static inline int execute_only_pkey(struct mm_struct *mm)
 {
@@ -31,4 +35,7 @@ static inline int arch_override_mprotect_pkey(struct vm_area_struct *vma,
 	return __arch_override_mprotect_pkey(vma, prot, pkey);
 }
 
+extern int __arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
+		unsigned long init_val);
+
 #endif /*_ASM_X86_PKEYS_H */
diff --git a/include/linux/pkeys.h b/include/linux/pkeys.h
index 1d405a2..0030b40 100644
--- a/include/linux/pkeys.h
+++ b/include/linux/pkeys.h
@@ -18,16 +18,4 @@
 #define PKEY_DEDICATED_EXECUTE_ONLY 0
 #endif /* ! CONFIG_ARCH_HAS_PKEYS */
 
-/*
- * This is called from mprotect_pkey().
- *
- * Returns true if the protection keys is valid.
- */
-static inline bool validate_pkey(int pkey)
-{
-	if (pkey < 0)
-		return false;
-	return (pkey < arch_max_pkey());
-}
-
 #endif /* _LINUX_PKEYS_H */
diff --git a/mm/mprotect.c b/mm/mprotect.c
index a4830f0..dd3f40a 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -352,8 +352,11 @@ fail:
 	return error;
 }
 
-SYSCALL_DEFINE3(mprotect, unsigned long, start, size_t, len,
-		unsigned long, prot)
+/*
+ * pkey==-1 when doing a legacy mprotect()
+ */
+static int do_mprotect_pkey(unsigned long start, size_t len,
+		unsigned long prot, int pkey)
 {
 	unsigned long nstart, end, tmp, reqprot;
 	struct vm_area_struct *vma, *prev;
@@ -361,6 +364,12 @@ SYSCALL_DEFINE3(mprotect, unsigned long, start, size_t, len,
 	const int grows = prot & (PROT_GROWSDOWN|PROT_GROWSUP);
 	const bool rier = (current->personality & READ_IMPLIES_EXEC) &&
 				(prot & PROT_READ);
+	/*
+	 * A temporary safety check since we are not validating
+	 * the pkey before we introduce the allocation code.
+	 */
+	if (pkey != -1)
+		return -EINVAL;
 
 	prot &= ~(PROT_GROWSDOWN|PROT_GROWSUP);
 	if (grows == (PROT_GROWSDOWN|PROT_GROWSUP)) /* can't be both */
@@ -409,7 +418,7 @@ SYSCALL_DEFINE3(mprotect, unsigned long, start, size_t, len,
 
 	for (nstart = start ; ; ) {
 		unsigned long newflags;
-		int pkey = arch_override_mprotect_pkey(vma, prot, -1);
+		int new_vma_pkey;
 
 		/* Here we know that vma->vm_start <= nstart < vma->vm_end. */
 
@@ -417,7 +426,8 @@ SYSCALL_DEFINE3(mprotect, unsigned long, start, size_t, len,
 		if (rier && (vma->vm_flags & VM_MAYEXEC))
 			prot |= PROT_EXEC;
 
-		newflags = calc_vm_prot_bits(prot, pkey);
+		new_vma_pkey = arch_override_mprotect_pkey(vma, prot, pkey);
+		newflags = calc_vm_prot_bits(prot, new_vma_pkey);
 		newflags |= (vma->vm_flags & ~(VM_READ | VM_WRITE | VM_EXEC));
 
 		/* newflags >> 4 shift VM_MAY% in place of VM_% */
@@ -454,3 +464,15 @@ out:
 	up_write(&current->mm->mmap_sem);
 	return error;
 }
+
+SYSCALL_DEFINE3(mprotect, unsigned long, start, size_t, len,
+		unsigned long, prot)
+{
+	return do_mprotect_pkey(start, len, prot, -1);
+}
+
+SYSCALL_DEFINE4(pkey_mprotect, unsigned long, start, size_t, len,
+		unsigned long, prot, int, pkey)
+{
+	return do_mprotect_pkey(start, len, prot, pkey);
+}
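The proposed semantics in the commit message can be condensed into one
decision function. This is a sketch under assumptions, not the x86
helper itself. model_new_vma_pkey() and the M_* constants are
hypothetical. req_pkey == -1 stands for a plain mprotect() call, as in
do_mprotect_pkey(). The rule that a mapping leaving execute-only drops
back to key 0 is our assumption, since the list of semantics does not
spell it out.

```c
#include <stdbool.h>

/* Protection bits with the usual Linux values, redefined so the model
 * builds without system headers. */
#define M_PROT_READ  0x1
#define M_PROT_WRITE 0x2
#define M_PROT_EXEC  0x4

#define M_DEFAULT_PKEY 0   /* key 0: the "unassigned" key (semantic 1) */
#define M_XONLY_PKEY   15  /* PKEY_DEDICATED_EXECUTE_ONLY on x86 */

/*
 * Hypothetical model of the key a VMA ends up with after mprotect()
 * (req_pkey == -1) or pkey_mprotect() (req_pkey >= 0).
 * xonly_available says whether the kernel could dedicate the
 * execute-only key (false when no key is free or there is no
 * hardware support, semantic 3).
 */
static int model_new_vma_pkey(int vma_pkey, int prot, int req_pkey,
			      bool xonly_available)
{
	/* pkey_mprotect(): the caller's key always wins. */
	if (req_pkey != -1)
		return req_pkey;

	/* Assumption: leaving execute-only drops back to the default key. */
	if (vma_pkey == M_XONLY_PKEY &&
	    (prot & (M_PROT_READ | M_PROT_WRITE)))
		return M_DEFAULT_PKEY;

	/* Execute-only is applied only to VMAs still on the default key. */
	if (prot == M_PROT_EXEC && vma_pkey == M_DEFAULT_PKEY &&
	    xonly_available)
		return M_XONLY_PKEY;

	/* Plain mprotect() never changes an assigned key (semantic 2). */
	return vma_pkey;
}
```

When the three calls from the earlier example are replayed through this
model, the key stays 6 throughout. So the answer to "is pkey=6 again?"
is that it never stopped being 6.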
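With protection keys enabled, vma_pkey() just extracts four bits from
vma->vm_flags. The following is a stand-alone sketch of that
extraction. It assumes the x86-64 layout of the time, where
VM_PKEY_BIT0..3 are vm_flags bits 32..35 and VM_PKEY_SHIFT is 32.
Those values, the use of uint64_t, and the model_set_vma_pkey()
inverse are assumptions for illustration only.

```c
#include <stdint.h>

/* Assumed layout: four pkey bits starting at vm_flags bit 32. */
#define M_VM_PKEY_SHIFT 32
#define M_VM_PKEY_MASK  (UINT64_C(0xf) << M_VM_PKEY_SHIFT)

/* Same extraction as the CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS=y
 * variant of vma_pkey(); the other variant simply returns 0. */
static int model_vma_pkey(uint64_t vm_flags)
{
	return (int)((vm_flags & M_VM_PKEY_MASK) >> M_VM_PKEY_SHIFT);
}

/* Hypothetical inverse: store a key without touching other flags. */
static uint64_t model_set_vma_pkey(uint64_t vm_flags, int pkey)
{
	vm_flags &= ~M_VM_PKEY_MASK;
	return vm_flags |
	       (((uint64_t)pkey << M_VM_PKEY_SHIFT) & M_VM_PKEY_MASK);
}
```

Four bits give keys 0 to 15. That matches the note that x86 needs only
4 bits today.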
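After this patch, mprotect() and pkey_mprotect() are thin wrappers
around do_mprotect_pkey(), where -1 means "legacy mprotect()". Below
is a minimal user-space model of that dispatch, including the temporary
safety check. It shows that at this point in the series, every key
other than -1 is refused with -EINVAL, including 0. The model_* names
are hypothetical. A return value of 0 merely stands in for the real
work of changing protections.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for do_mprotect_pkey(): only the temporary
 * pkey check from this patch is modelled. */
static int model_do_mprotect_pkey(unsigned long start, size_t len,
				  unsigned long prot, int pkey)
{
	(void)start;
	(void)len;
	(void)prot;
	if (pkey != -1)
		return -EINVAL;	/* no key validation/allocation yet */
	return 0;		/* would go on to change protections */
}

/* Legacy entry point: always passes pkey == -1. */
static int model_sys_mprotect(unsigned long start, size_t len,
			      unsigned long prot)
{
	return model_do_mprotect_pkey(start, len, prot, -1);
}

/* New entry point: forwards the caller's key unchanged. */
static int model_sys_pkey_mprotect(unsigned long start, size_t len,
				   unsigned long prot, int pkey)
{
	return model_do_mprotect_pkey(start, len, prot, pkey);
}
```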