From patchwork Wed Dec 12 00:03:51 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Edgecombe, Rick P" <rick.p.edgecombe@intel.com>
X-Patchwork-Id: 1011461
X-Patchwork-Delegate: bpf@iogearbox.net
Return-Path: <netdev-owner@vger.kernel.org>
X-Original-To: patchwork-incoming-netdev@ozlabs.org
Delivered-To: patchwork-incoming-netdev@ozlabs.org
Authentication-Results: ozlabs.org;
	spf=none (mailfrom) smtp.mailfrom=vger.kernel.org
	(client-ip=209.132.180.67; helo=vger.kernel.org;
	envelope-from=netdev-owner@vger.kernel.org;
	receiver=<UNKNOWN>)
Authentication-Results: ozlabs.org;
	dmarc=fail (p=none dis=none) header.from=intel.com
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
	by ozlabs.org (Postfix) with ESMTP id 43Dy1J2Q9Lz9s7T
	for <patchwork-incoming-netdev@ozlabs.org>;
	Wed, 12 Dec 2018 11:12:20 +1100 (AEDT)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1726301AbeLLAML (ORCPT
	<rfc822;patchwork-incoming-netdev@ozlabs.org>);
	Tue, 11 Dec 2018 19:12:11 -0500
Received: from mga04.intel.com ([192.55.52.120]:56048 "EHLO mga04.intel.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1726218AbeLLAMJ (ORCPT <rfc822;netdev@vger.kernel.org>);
	Tue, 11 Dec 2018 19:12:09 -0500
X-Amp-Result: SKIPPED(no attachment in message)
X-Amp-File-Uploaded: False
Received: from orsmga005.jf.intel.com ([10.7.209.41])
	by fmsmga104.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384;
	11 Dec 2018 16:12:07 -0800
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.56,343,1539673200"; d="scan'208";a="282839400"
Received: from rpedgeco-desk5.jf.intel.com ([10.54.75.141])
	by orsmga005.jf.intel.com with ESMTP; 11 Dec 2018 16:12:07 -0800
From: Rick Edgecombe <rick.p.edgecombe@intel.com>
To: akpm@linux-foundation.org, luto@kernel.org, will.deacon@arm.com,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	kernel-hardening@lists.openwall.com,
	naveen.n.rao@linux.vnet.ibm.com, anil.s.keshavamurthy@intel.com,
	davem@davemloft.net, mhiramat@kernel.org, rostedt@goodmis.org,
	mingo@redhat.com, ast@kernel.org, daniel@iogearbox.net,
	jeyu@kernel.org, namit@vmware.com, netdev@vger.kernel.org,
	ard.biesheuvel@linaro.org, jannh@google.com
Cc: kristen@linux.intel.com, dave.hansen@intel.com, deneen.t.dock@intel.com,
	Rick Edgecombe <rick.p.edgecombe@intel.com>
Subject: [PATCH v2 1/4] vmalloc: New flags for safe vfree on special perms
Date: Tue, 11 Dec 2018 16:03:51 -0800
Message-Id: <20181212000354.31955-2-rick.p.edgecombe@intel.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20181212000354.31955-1-rick.p.edgecombe@intel.com>
References: <20181212000354.31955-1-rick.p.edgecombe@intel.com>
Sender: netdev-owner@vger.kernel.org
Precedence: bulk
List-ID: <netdev.vger.kernel.org>
X-Mailing-List: netdev@vger.kernel.org

This adds two new flags VM_IMMEDIATE_UNMAP and VM_HAS_SPECIAL_PERMS, for
enabling vfree operations to immediately clear executable TLB entries to freed
pages, and handle freeing memory with special permissions.

In order to support vfree being called on memory that might be RO, the vfree
deferred list node is moved to a kmalloc allocated struct, from where it is
today, reusing the allocation being freed.

arch_vunmap is a new __weak function that implements the actual unmapping and
resetting of the direct map permissions. It can be overridden by more efficient
architecture specific implementations.

For the default implementation, it uses architecture agnostic methods which are
equivalent to what most usages do before calling vfree. So now it is just
centralized here.

This implementation derives from two sketches from Dave Hansen and Andy
Lutomirski.

Suggested-by: Dave Hansen <dave.hansen@intel.com>
Suggested-by: Andy Lutomirski <luto@kernel.org>
Suggested-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---
 include/linux/vmalloc.h |  2 ++
 mm/vmalloc.c            | 73 +++++++++++++++++++++++++++++++++++++----
 2 files changed, 69 insertions(+), 6 deletions(-)
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 398e9c95cd61..872bcde17aca 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -21,6 +21,8 @@ struct notifier_block;		/* in notifier.h */
 #define VM_UNINITIALIZED	0x00000020	/* vm_struct is not fully initialized */
 #define VM_NO_GUARD		0x00000040      /* don't add guard page */
 #define VM_KASAN		0x00000080      /* has allocated kasan shadow memory */
+#define VM_IMMEDIATE_UNMAP	0x00000200	/* flush before releasing pages */
+#define VM_HAS_SPECIAL_PERMS	0x00000400	/* may be freed with special perms */
 /* bits [20..32] reserved for arch specific ioremap internals */
 
 /*
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 97d4b25d0373..02b284d2245a 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -18,6 +18,7 @@
 #include <linux/interrupt.h>
 #include <linux/proc_fs.h>
 #include <linux/seq_file.h>
+#include <linux/set_memory.h>
 #include <linux/debugobjects.h>
 #include <linux/kallsyms.h>
 #include <linux/list.h>
@@ -38,6 +39,11 @@
 
 #include "internal.h"
 
+struct vfree_work {
+	struct llist_node node;
+	void *addr;
+};
+
 struct vfree_deferred {
 	struct llist_head list;
 	struct work_struct wq;
@@ -50,9 +56,13 @@ static void free_work(struct work_struct *w)
 {
 	struct vfree_deferred *p = container_of(w, struct vfree_deferred, wq);
 	struct llist_node *t, *llnode;
+	struct vfree_work *cur;
 
-	llist_for_each_safe(llnode, t, llist_del_all(&p->list))
-		__vunmap((void *)llnode, 1);
+	llist_for_each_safe(llnode, t, llist_del_all(&p->list)) {
+		cur = container_of(llnode, struct vfree_work, node);
+		__vunmap(cur->addr, 1);
+		kfree(cur);
+	}
 }
 
 /*** Page table manipulation functions ***/
@@ -1494,6 +1504,48 @@ struct vm_struct *remove_vm_area(const void *addr)
 	return NULL;
 }
 
+/*
+ * This function handles unmapping and resetting the direct map as efficiently
+ * as it can with cross arch functions. The three categories of architectures
+ * are:
+ *   1. Architectures with no set_memory implementations and no direct map
+ *      permissions.
+ *   2. Architectures with set_memory implementations but no direct map
+ *      permissions
+ *   3. Architectures with set_memory implementations and direct map permissions
+ */
+void __weak arch_vunmap(struct vm_struct *area, int deallocate_pages)
+{
+	unsigned long addr = (unsigned long)area->addr;
+	int immediate = area->flags & VM_IMMEDIATE_UNMAP;
+	int special = area->flags & VM_HAS_SPECIAL_PERMS;
+
+	/*
+	 * In case of 2 and 3, use this general way of resetting the permissions
+	 * on the directmap. Do NX before RW, in case of X, so there is no W^X
+	 * violation window.
+	 *
+	 * For case 1 these will be noops.
+	 */
+	if (immediate)
+		set_memory_nx(addr, area->nr_pages);
+	if (deallocate_pages && special)
+		set_memory_rw(addr, area->nr_pages);
+
+	/* Always actually remove the area */
+	remove_vm_area(area->addr);
+
+	/*
+	 * Need to flush the TLB before freeing pages in the case of this flag.
+	 * As long as that's happening, unmap aliases.
+	 *
+	 * For 2 and 3, this will not be needed because of the set_memory_nx
+	 * above, because the stale TLBs will be NX.
+	 */
+	if (immediate && !IS_ENABLED(ARCH_HAS_SET_MEMORY))
+		vm_unmap_aliases();
+}
+
 static void __vunmap(const void *addr, int deallocate_pages)
 {
 	struct vm_struct *area;
@@ -1515,7 +1567,8 @@ static void __vunmap(const void *addr, int deallocate_pages)
 	debug_check_no_locks_freed(area->addr, get_vm_area_size(area));
 	debug_check_no_obj_freed(area->addr, get_vm_area_size(area));
 
-	remove_vm_area(addr);
+	arch_vunmap(area, deallocate_pages);
+
 	if (deallocate_pages) {
 		int i;
 
@@ -1542,8 +1595,15 @@ static inline void __vfree_deferred(const void *addr)
 	 * nother cpu's list.  schedule_work() should be fine with this too.
 	 */
 	struct vfree_deferred *p = raw_cpu_ptr(&vfree_deferred);
+	struct vfree_work *w = kmalloc(sizeof(struct vfree_work), GFP_ATOMIC);
+
+	/* If no memory for the deferred list node, give up */
+	if (!w)
+		return;
 
-	if (llist_add((struct llist_node *)addr, &p->list))
+	w->addr = (void *)addr;
+
+	if (llist_add(&w->node, &p->list))
 		schedule_work(&p->wq);
 }
 
@@ -1925,8 +1985,9 @@ EXPORT_SYMBOL(vzalloc_node);
 
 void *vmalloc_exec(unsigned long size)
 {
-	return __vmalloc_node(size, 1, GFP_KERNEL, PAGE_KERNEL_EXEC,
-			      NUMA_NO_NODE, __builtin_return_address(0));
+	return __vmalloc_node_range(size, 1, VMALLOC_START, VMALLOC_END,
+			GFP_KERNEL, PAGE_KERNEL_EXEC, VM_IMMEDIATE_UNMAP,
+			NUMA_NO_NODE, __builtin_return_address(0));
 }
 
 #if defined(CONFIG_64BIT) && defined(CONFIG_ZONE_DMA32)

From patchwork Wed Dec 12 00:03:52 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Edgecombe, Rick P" <rick.p.edgecombe@intel.com>
X-Patchwork-Id: 1011462
X-Patchwork-Delegate: bpf@iogearbox.net
Return-Path: <netdev-owner@vger.kernel.org>
X-Original-To: patchwork-incoming-netdev@ozlabs.org
Delivered-To: patchwork-incoming-netdev@ozlabs.org
Authentication-Results: ozlabs.org;
	spf=none (mailfrom) smtp.mailfrom=vger.kernel.org
	(client-ip=209.132.180.67; helo=vger.kernel.org;
	envelope-from=netdev-owner@vger.kernel.org;
	receiver=<UNKNOWN>)
Authentication-Results: ozlabs.org;
	dmarc=fail (p=none dis=none) header.from=intel.com
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
	by ozlabs.org (Postfix) with ESMTP id 43Dy1M4SWWz9s7T
	for <patchwork-incoming-netdev@ozlabs.org>;
	Wed, 12 Dec 2018 11:12:23 +1100 (AEDT)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1726283AbeLLAMK (ORCPT
	<rfc822;patchwork-incoming-netdev@ozlabs.org>);
	Tue, 11 Dec 2018 19:12:10 -0500
Received: from mga04.intel.com ([192.55.52.120]:56048 "EHLO mga04.intel.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1726211AbeLLAMJ (ORCPT <rfc822;netdev@vger.kernel.org>);
	Tue, 11 Dec 2018 19:12:09 -0500
X-Amp-Result: SKIPPED(no attachment in message)
X-Amp-File-Uploaded: False
Received: from orsmga005.jf.intel.com ([10.7.209.41])
	by fmsmga104.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384;
	11 Dec 2018 16:12:08 -0800
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.56,343,1539673200"; d="scan'208";a="282839403"
Received: from rpedgeco-desk5.jf.intel.com ([10.54.75.141])
	by orsmga005.jf.intel.com with ESMTP; 11 Dec 2018 16:12:07 -0800
From: Rick Edgecombe <rick.p.edgecombe@intel.com>
To: akpm@linux-foundation.org, luto@kernel.org, will.deacon@arm.com,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	kernel-hardening@lists.openwall.com,
	naveen.n.rao@linux.vnet.ibm.com, anil.s.keshavamurthy@intel.com,
	davem@davemloft.net, mhiramat@kernel.org, rostedt@goodmis.org,
	mingo@redhat.com, ast@kernel.org, daniel@iogearbox.net,
	jeyu@kernel.org, namit@vmware.com, netdev@vger.kernel.org,
	ard.biesheuvel@linaro.org, jannh@google.com
Cc: kristen@linux.intel.com, dave.hansen@intel.com, deneen.t.dock@intel.com,
	Rick Edgecombe <rick.p.edgecombe@intel.com>
Subject: [PATCH v2 2/4] modules: Add new special vfree flags
Date: Tue, 11 Dec 2018 16:03:52 -0800
Message-Id: <20181212000354.31955-3-rick.p.edgecombe@intel.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20181212000354.31955-1-rick.p.edgecombe@intel.com>
References: <20181212000354.31955-1-rick.p.edgecombe@intel.com>
Sender: netdev-owner@vger.kernel.org
Precedence: bulk
List-ID: <netdev.vger.kernel.org>
X-Mailing-List: netdev@vger.kernel.org

Add new flags for handling freeing of special permissioned memory in vmalloc,
and remove places where the handling was done in module.c.

This will enable this flag for all architectures.

Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---
 kernel/module.c | 43 ++++++++++++-------------------------------
 1 file changed, 12 insertions(+), 31 deletions(-)

diff --git a/kernel/module.c b/kernel/module.c
index 49a405891587..910f92b402f8 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -1941,11 +1941,23 @@ void module_disable_ro(const struct module *mod)
 	frob_rodata(&mod->init_layout, set_memory_rw);
 }
 
+static void module_set_vm_flags(const struct module_layout *layout)
+{
+	struct vm_struct *vm = find_vm_area(layout->base);
+
+	if (vm) {
+		vm->flags |= VM_HAS_SPECIAL_PERMS;
+		vm->flags |= VM_IMMEDIATE_UNMAP;
+	}
+}
+
 void module_enable_ro(const struct module *mod, bool after_init)
 {
 	if (!rodata_enabled)
 		return;
 
+	module_set_vm_flags(&mod->core_layout);
+	module_set_vm_flags(&mod->init_layout);
 	frob_text(&mod->core_layout, set_memory_ro);
 	frob_rodata(&mod->core_layout, set_memory_ro);
 	frob_text(&mod->init_layout, set_memory_ro);
@@ -1964,15 +1976,6 @@ static void module_enable_nx(const struct module *mod)
 	frob_writable_data(&mod->init_layout, set_memory_nx);
 }
 
-static void module_disable_nx(const struct module *mod)
-{
-	frob_rodata(&mod->core_layout, set_memory_x);
-	frob_ro_after_init(&mod->core_layout, set_memory_x);
-	frob_writable_data(&mod->core_layout, set_memory_x);
-	frob_rodata(&mod->init_layout, set_memory_x);
-	frob_writable_data(&mod->init_layout, set_memory_x);
-}
-
 /* Iterate through all modules and set each module's text as RW */
 void set_all_modules_text_rw(void)
 {
@@ -2016,23 +2019,8 @@ void set_all_modules_text_ro(void)
 	}
 	mutex_unlock(&module_mutex);
 }
-
-static void disable_ro_nx(const struct module_layout *layout)
-{
-	if (rodata_enabled) {
-		frob_text(layout, set_memory_rw);
-		frob_rodata(layout, set_memory_rw);
-		frob_ro_after_init(layout, set_memory_rw);
-	}
-	frob_rodata(layout, set_memory_x);
-	frob_ro_after_init(layout, set_memory_x);
-	frob_writable_data(layout, set_memory_x);
-}
-
 #else
-static void disable_ro_nx(const struct module_layout *layout) { }
 static void module_enable_nx(const struct module *mod) { }
-static void module_disable_nx(const struct module *mod) { }
 #endif
 
 #ifdef CONFIG_LIVEPATCH
@@ -2163,7 +2151,6 @@ static void free_module(struct module *mod)
 	mutex_unlock(&module_mutex);
 
 	/* This may be empty, but that's OK */
-	disable_ro_nx(&mod->init_layout);
 	module_arch_freeing_init(mod);
 	module_memfree(mod->init_layout.base);
 	kfree(mod->args);
@@ -2173,7 +2160,6 @@ static void free_module(struct module *mod)
 	lockdep_free_key_range(mod->core_layout.base, mod->core_layout.size);
 
 	/* Finally, free the core (containing the module structure) */
-	disable_ro_nx(&mod->core_layout);
 	module_memfree(mod->core_layout.base);
 }
 
@@ -3497,7 +3483,6 @@ static noinline int do_init_module(struct module *mod)
 #endif
 	module_enable_ro(mod, true);
 	mod_tree_remove_init(mod);
-	disable_ro_nx(&mod->init_layout);
 	module_arch_freeing_init(mod);
 	mod->init_layout.base = NULL;
 	mod->init_layout.size = 0;
@@ -3812,10 +3797,6 @@ static int load_module(struct load_info *info, const char __user *uargs,
 	module_bug_cleanup(mod);
 	mutex_unlock(&module_mutex);
 
-	/* we can't deallocate the module until we clear memory protection */
-	module_disable_ro(mod);
-	module_disable_nx(mod);
-
  ddebug_cleanup:
 	ftrace_release_mod(mod);
 	dynamic_debug_remove(mod, info->debug);

From patchwork Wed Dec 12 00:03:53 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Edgecombe, Rick P" <rick.p.edgecombe@intel.com>
X-Patchwork-Id: 1011463
X-Patchwork-Delegate: bpf@iogearbox.net
Return-Path: <netdev-owner@vger.kernel.org>
X-Original-To: patchwork-incoming-netdev@ozlabs.org
Delivered-To: patchwork-incoming-netdev@ozlabs.org
Authentication-Results: ozlabs.org;
	spf=none (mailfrom) smtp.mailfrom=vger.kernel.org
	(client-ip=209.132.180.67; helo=vger.kernel.org;
	envelope-from=netdev-owner@vger.kernel.org;
	receiver=<UNKNOWN>)
Authentication-Results: ozlabs.org;
	dmarc=fail (p=none dis=none) header.from=intel.com
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
	by ozlabs.org (Postfix) with ESMTP id 43Dy1Z2wLqz9s7T
	for <patchwork-incoming-netdev@ozlabs.org>;
	Wed, 12 Dec 2018 11:12:34 +1100 (AEDT)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1726349AbeLLAM0 (ORCPT
	<rfc822;patchwork-incoming-netdev@ozlabs.org>);
	Tue, 11 Dec 2018 19:12:26 -0500
Received: from mga04.intel.com ([192.55.52.120]:56050 "EHLO mga04.intel.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1726267AbeLLAMJ (ORCPT <rfc822;netdev@vger.kernel.org>);
	Tue, 11 Dec 2018 19:12:09 -0500
X-Amp-Result: SKIPPED(no attachment in message)
X-Amp-File-Uploaded: False
Received: from orsmga005.jf.intel.com ([10.7.209.41])
	by fmsmga104.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384;
	11 Dec 2018 16:12:08 -0800
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.56,343,1539673200"; d="scan'208";a="282839406"
Received: from rpedgeco-desk5.jf.intel.com ([10.54.75.141])
	by orsmga005.jf.intel.com with ESMTP; 11 Dec 2018 16:12:07 -0800
From: Rick Edgecombe <rick.p.edgecombe@intel.com>
To: akpm@linux-foundation.org, luto@kernel.org, will.deacon@arm.com,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	kernel-hardening@lists.openwall.com,
	naveen.n.rao@linux.vnet.ibm.com, anil.s.keshavamurthy@intel.com,
	davem@davemloft.net, mhiramat@kernel.org, rostedt@goodmis.org,
	mingo@redhat.com, ast@kernel.org, daniel@iogearbox.net,
	jeyu@kernel.org, namit@vmware.com, netdev@vger.kernel.org,
	ard.biesheuvel@linaro.org, jannh@google.com
Cc: kristen@linux.intel.com, dave.hansen@intel.com, deneen.t.dock@intel.com,
	Rick Edgecombe <rick.p.edgecombe@intel.com>
Subject: [PATCH v2 3/4] bpf: switch to new vmalloc vfree flags
Date: Tue, 11 Dec 2018 16:03:53 -0800
Message-Id: <20181212000354.31955-4-rick.p.edgecombe@intel.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20181212000354.31955-1-rick.p.edgecombe@intel.com>
References: <20181212000354.31955-1-rick.p.edgecombe@intel.com>
Sender: netdev-owner@vger.kernel.org
Precedence: bulk
List-ID: <netdev.vger.kernel.org>
X-Mailing-List: netdev@vger.kernel.org

This switches to use the new vmalloc flags to control freeing memory with
special permissions.

Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---
 include/linux/filter.h | 26 ++++++++++++--------------
 kernel/bpf/core.c      |  1 -
 2 files changed, 12 insertions(+), 15 deletions(-)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index 795ff0b869bb..2aeb93d3337d 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -20,6 +20,7 @@
 #include <linux/set_memory.h>
 #include <linux/kallsyms.h>
 #include <linux/if_vlan.h>
+#include <linux/vmalloc.h>
 
 #include <net/sch_generic.h>
 
@@ -487,7 +488,6 @@ struct bpf_prog {
 	u16			pages;		/* Number of allocated pages */
 	u16			jited:1,	/* Is our filter JIT'ed? */
 				jit_requested:1,/* archs need to JIT the prog */
-				undo_set_mem:1,	/* Passed set_memory_ro() checkpoint */
 				gpl_compatible:1, /* Is filter GPL compatible? */
 				cb_access:1,	/* Is control block accessed? */
 				dst_needed:1,	/* Do we need dst entry? */
@@ -699,24 +699,23 @@ bpf_ctx_narrow_access_ok(u32 off, u32 size, u32 size_default)
 
 static inline void bpf_prog_lock_ro(struct bpf_prog *fp)
 {
-	fp->undo_set_mem = 1;
-	set_memory_ro((unsigned long)fp, fp->pages);
-}
+	struct vm_struct *vm = find_vm_area(fp);
 
-static inline void bpf_prog_unlock_ro(struct bpf_prog *fp)
-{
-	if (fp->undo_set_mem)
-		set_memory_rw((unsigned long)fp, fp->pages);
+	if (vm)
+		vm->flags |= VM_HAS_SPECIAL_PERMS;
+	set_memory_ro((unsigned long)fp, fp->pages);
 }
 
 static inline void bpf_jit_binary_lock_ro(struct bpf_binary_header *hdr)
 {
-	set_memory_ro((unsigned long)hdr, hdr->pages);
-}
+	struct vm_struct *vm = find_vm_area(hdr);
 
-static inline void bpf_jit_binary_unlock_ro(struct bpf_binary_header *hdr)
-{
-	set_memory_rw((unsigned long)hdr, hdr->pages);
+	if (vm) {
+		vm->flags |= VM_HAS_SPECIAL_PERMS;
+		vm->flags |= VM_IMMEDIATE_UNMAP;
+	}
+
+	set_memory_ro((unsigned long)hdr, hdr->pages);
 }
 
 static inline struct bpf_binary_header *
@@ -746,7 +745,6 @@ void __bpf_prog_free(struct bpf_prog *fp);
 
 static inline void bpf_prog_unlock_free(struct bpf_prog *fp)
 {
-	bpf_prog_unlock_ro(fp);
 	__bpf_prog_free(fp);
 }
 
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index b1a3545d0ec8..bd3efd7ce526 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -663,7 +663,6 @@ void __weak bpf_jit_free(struct bpf_prog *fp)
 	if (fp->jited) {
 		struct bpf_binary_header *hdr = bpf_jit_binary_hdr(fp);
 
-		bpf_jit_binary_unlock_ro(hdr);
 		bpf_jit_binary_free(hdr);
 
 		WARN_ON_ONCE(!bpf_prog_kallsyms_verify_off(fp));

From patchwork Wed Dec 12 00:03:54 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Edgecombe, Rick P" <rick.p.edgecombe@intel.com>
X-Patchwork-Id: 1011464
X-Patchwork-Delegate: bpf@iogearbox.net
Return-Path: <netdev-owner@vger.kernel.org>
X-Original-To: patchwork-incoming-netdev@ozlabs.org
Delivered-To: patchwork-incoming-netdev@ozlabs.org
Authentication-Results: ozlabs.org;
	spf=none (mailfrom) smtp.mailfrom=vger.kernel.org
	(client-ip=209.132.180.67; helo=vger.kernel.org;
	envelope-from=netdev-owner@vger.kernel.org;
	receiver=<UNKNOWN>)
Authentication-Results: ozlabs.org;
	dmarc=fail (p=none dis=none) header.from=intel.com
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
	by ozlabs.org (Postfix) with ESMTP id 43Dy1d6DPMz9s7T
	for <patchwork-incoming-netdev@ozlabs.org>;
	Wed, 12 Dec 2018 11:12:37 +1100 (AEDT)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1726333AbeLLAMZ (ORCPT
	<rfc822;patchwork-incoming-netdev@ozlabs.org>);
	Tue, 11 Dec 2018 19:12:25 -0500
Received: from mga04.intel.com ([192.55.52.120]:56048 "EHLO mga04.intel.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1726269AbeLLAMJ (ORCPT <rfc822;netdev@vger.kernel.org>);
	Tue, 11 Dec 2018 19:12:09 -0500
X-Amp-Result: SKIPPED(no attachment in message)
X-Amp-File-Uploaded: False
Received: from orsmga005.jf.intel.com ([10.7.209.41])
	by fmsmga104.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384;
	11 Dec 2018 16:12:08 -0800
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.56,343,1539673200"; d="scan'208";a="282839408"
Received: from rpedgeco-desk5.jf.intel.com ([10.54.75.141])
	by orsmga005.jf.intel.com with ESMTP; 11 Dec 2018 16:12:07 -0800
From: Rick Edgecombe <rick.p.edgecombe@intel.com>
To: akpm@linux-foundation.org, luto@kernel.org, will.deacon@arm.com,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	kernel-hardening@lists.openwall.com,
	naveen.n.rao@linux.vnet.ibm.com, anil.s.keshavamurthy@intel.com,
	davem@davemloft.net, mhiramat@kernel.org, rostedt@goodmis.org,
	mingo@redhat.com, ast@kernel.org, daniel@iogearbox.net,
	jeyu@kernel.org, namit@vmware.com, netdev@vger.kernel.org,
	ard.biesheuvel@linaro.org, jannh@google.com
Cc: kristen@linux.intel.com, dave.hansen@intel.com, deneen.t.dock@intel.com,
	Rick Edgecombe <rick.p.edgecombe@intel.com>
Subject: [PATCH v2 4/4] x86/vmalloc: Add TLB efficient x86 arch_vunmap
Date: Tue, 11 Dec 2018 16:03:54 -0800
Message-Id: <20181212000354.31955-5-rick.p.edgecombe@intel.com>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20181212000354.31955-1-rick.p.edgecombe@intel.com>
References: <20181212000354.31955-1-rick.p.edgecombe@intel.com>
Sender: netdev-owner@vger.kernel.org
Precedence: bulk
List-ID: <netdev.vger.kernel.org>
X-Mailing-List: netdev@vger.kernel.org

This adds a more efficient x86 architecture specific implementation of
arch_vunmap, that can free any type of special permission memory with only 1 TLB
flush.

In order to enable this, _set_pages_p and _set_pages_np are made non-static and
renamed set_pages_p_noflush and set_pages_np_noflush to better communicate
their different (non-flushing) behavior from the rest of the set_pages_*
functions.

The method for doing this with only 1 TLB flush was suggested by Andy
Lutomirski.

Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---
 arch/x86/include/asm/set_memory.h |  2 +
 arch/x86/mm/Makefile              |  3 +-
 arch/x86/mm/pageattr.c            | 11 +++--
 arch/x86/mm/vmalloc.c             | 71 +++++++++++++++++++++++++++++++
 4 files changed, 80 insertions(+), 7 deletions(-)
 create mode 100644 arch/x86/mm/vmalloc.c

diff --git a/arch/x86/include/asm/set_memory.h b/arch/x86/include/asm/set_memory.h
index 07a25753e85c..70ee81e8914b 100644
--- a/arch/x86/include/asm/set_memory.h
+++ b/arch/x86/include/asm/set_memory.h
@@ -84,6 +84,8 @@ int set_pages_x(struct page *page, int numpages);
 int set_pages_nx(struct page *page, int numpages);
 int set_pages_ro(struct page *page, int numpages);
 int set_pages_rw(struct page *page, int numpages);
+int set_pages_np_noflush(struct page *page, int numpages);
+int set_pages_p_noflush(struct page *page, int numpages);
 
 extern int kernel_set_to_readonly;
 void set_kernel_text_rw(void);
diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
index 4b101dd6e52f..189681f863a6 100644
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -13,7 +13,8 @@ CFLAGS_REMOVE_mem_encrypt_identity.o	= -pg
 endif
 
 obj-y	:=  init.o init_$(BITS).o fault.o ioremap.o extable.o pageattr.o mmap.o \
-	    pat.o pgtable.o physaddr.o setup_nx.o tlb.o cpu_entry_area.o
+	    pat.o pgtable.o physaddr.o setup_nx.o tlb.o cpu_entry_area.o \
+	    vmalloc.o
 
 # Make sure __phys_addr has no stackprotector
 nostackp := $(call cc-option, -fno-stack-protector)
diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index db7a10082238..db0a4dfb5a7f 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -2248,9 +2248,7 @@ int set_pages_rw(struct page *page, int numpages)
 	return set_memory_rw(addr, numpages);
 }
 
-#ifdef CONFIG_DEBUG_PAGEALLOC
-
-static int __set_pages_p(struct page *page, int numpages)
+int set_pages_p_noflush(struct page *page, int numpages)
 {
 	unsigned long tempaddr = (unsigned long) page_address(page);
 	struct cpa_data cpa = { .vaddr = &tempaddr,
@@ -2269,7 +2267,7 @@ static int __set_pages_p(struct page *page, int numpages)
 	return __change_page_attr_set_clr(&cpa, 0);
 }
 
-static int __set_pages_np(struct page *page, int numpages)
+int set_pages_np_noflush(struct page *page, int numpages)
 {
 	unsigned long tempaddr = (unsigned long) page_address(page);
 	struct cpa_data cpa = { .vaddr = &tempaddr,
@@ -2288,6 +2286,7 @@ static int __set_pages_np(struct page *page, int numpages)
 	return __change_page_attr_set_clr(&cpa, 0);
 }
 
+#ifdef CONFIG_DEBUG_PAGEALLOC
 void __kernel_map_pages(struct page *page, int numpages, int enable)
 {
 	if (PageHighMem(page))
@@ -2303,9 +2302,9 @@ void __kernel_map_pages(struct page *page, int numpages, int enable)
 	 * and hence no memory allocations during large page split.
 	 */
 	if (enable)
-		__set_pages_p(page, numpages);
+		set_pages_p_noflush(page, numpages);
 	else
-		__set_pages_np(page, numpages);
+		set_pages_np_noflush(page, numpages);
 
 	/*
 	 * We should perform an IPI and flush all tlbs,
diff --git a/arch/x86/mm/vmalloc.c b/arch/x86/mm/vmalloc.c
new file mode 100644
index 000000000000..be9ea42c3dfe
--- /dev/null
+++ b/arch/x86/mm/vmalloc.c
@@ -0,0 +1,71 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * vmalloc.c: x86 arch version of vmalloc.c
+ *
+ * (C) Copyright 2018 Intel Corporation
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; version 2
+ * of the License.
+ */
+
+#include <linux/mm.h>
+#include <linux/set_memory.h>
+#include <linux/vmalloc.h>
+
+static void set_area_direct_np(struct vm_struct *area)
+{
+	int i;
+
+	for (i = 0; i < area->nr_pages; i++)
+		set_pages_np_noflush(area->pages[i], 1);
+}
+
+static void set_area_direct_prw(struct vm_struct *area)
+{
+	int i;
+
+	for (i = 0; i < area->nr_pages; i++)
+		set_pages_p_noflush(area->pages[i], 1);
+}
+
+void arch_vunmap(struct vm_struct *area, int deallocate_pages)
+{
+	int immediate = area->flags & VM_IMMEDIATE_UNMAP;
+	int special = area->flags & VM_HAS_SPECIAL_PERMS;
+
+	/* Unmap from vmalloc area */
+	remove_vm_area(area->addr);
+
+	/* If no need to reset directmap perms, just check if need to flush */
+	if (!(deallocate_pages || special)) {
+		if (immediate)
+			vm_unmap_aliases();
+		return;
+	}
+
+	/* From here we need to make sure to reset the direct map perms */
+
+	/*
+	 * If the area being freed does not have any extra capabilities, we can
+	 * just reset the directmap to RW before freeing.
+	 */
+	if (!immediate) {
+		set_area_direct_prw(area);
+		vm_unmap_aliases();
+		return;
+	}
+
+	/*
+	 * If the vm being freed has security sensitive capabilities such as
+	 * executable we need to make sure there is no W window on the directmap
+	 * before removing the X in the TLB. So we set not present first so we
+	 * can flush without any other CPU picking up the mapping. Then we reset
+	 * RW+P without a flush, since NP prevented it from being cached by
+	 * other cpus.
+	 */
+	set_area_direct_np(area);
+	vm_unmap_aliases();
+	set_area_direct_prw(area);
+}