
[1/1] vmalloc: Fix issues with flush flag

Message ID 20190517210123.5702-2-rick.p.edgecombe@intel.com
State Not Applicable
Delegated to: BPF Maintainers
Series [1/1] vmalloc: Fix issues with flush flag

Commit Message

Edgecombe, Rick P May 17, 2019, 9:01 p.m. UTC
Meelis Roos reported issues with the new VM_FLUSH_RESET_PERMS flag on the
sparc architecture.

When freeing many BPF JITs at once, the free operations can become stuck
waiting for locks as each one tries to vm_unmap_aliases(). Calls to this
function happen frequently on some archs, but in vmalloc itself the lazy
purge operation happens more rarely, and only in extreme cases would
multiple purges be happening at once. Since this is cross-platform code,
don't trigger the purge here where it could happen concurrently in a
burst, and instead just flush the TLB. Also, add a little logic to skip
calls to page_address() when possible to further speed this up, since
they may involve locking on some archs.

Lastly, it appears that the calculation of the address range to flush
was broken at some point, so fix that as well.
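
For reference, a minimal sketch of the corrected range calculation (the
helper name is made up for illustration; the real change is in the
vm_remove_mappings() hunk below): the flush range starts as the vmalloc
area itself and is widened to cover the direct map address of every page
that has one.

/*
 * Illustration only, not the literal kernel code: compute a flush
 * range covering both the vmalloc mapping and its direct map
 * aliases. PAGE_SIZE is added so the last page is fully covered.
 */
static void flush_vm_and_direct_map_range(struct vm_struct *area)
{
	unsigned long addr = (unsigned long)area->addr;
	unsigned long start = addr, end = addr + get_vm_area_size(area);
	int i;

	for (i = 0; i < area->nr_pages; i++) {
		unsigned long page_addr =
			(unsigned long)page_address(area->pages[i]);

		/* Skip pages that have no direct map address */
		if (page_addr) {
			start = min(page_addr, start);
			end = max(page_addr + PAGE_SIZE, end);
		}
	}

	flush_tlb_kernel_range(start, end);
}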

Fixes: 868b104d7379 ("mm/vmalloc: Add flag for freeing of special permsissions")
Reported-by: Meelis Roos <mroos@linux.ee>
Cc: Meelis Roos <mroos@linux.ee>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Nadav Amit <namit@vmware.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---
 mm/vmalloc.c | 23 +++++++++++++----------
 1 file changed, 13 insertions(+), 10 deletions(-)

Comments

Edgecombe, Rick P May 20, 2019, 3:54 a.m. UTC | #1
Hi,

After investigating this more, I am not positive why this fixes the
issue on sparc. I will continue to investigate as best I can, but would
like to request help from some sparc experts in evaluating my line of
thinking. I still think the changes in this patch are generally
worthwhile, though.


Besides fixing the sparc issue:

1. The fixes for the calculation of the direct map address range are
important on x86 in case a RO direct map alias ever gets loaded into
the TLB. This shouldn't normally happen, but if it did, the broken
calculation could leave the permissions on the direct map alias unreset,
and the page would then return from the page allocator to some other
component as RO and cause a crash. This was mostly broken while
implementing a style suggestion late in development. As best I can
tell, it shouldn't have any effect on sparc.

2. Simply flushing the TLB instead of doing the whole vm_unmap_aliases()
operation makes the frees faster and pushes the heavy work to happen at
allocation time, where it would be more expected. vm_unmap_aliases()
takes some locks, including a long hold of vmap_purge_lock, which makes
all other VM_FLUSH_RESET_PERMS vfrees wait while the purge operation
happens.
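
For context, the purge path that vm_unmap_aliases() funnels into looks
roughly like the following (a heavily simplified sketch with details
elided; see mm/vmalloc.c for the real code), which is where the
serialization on vmap_purge_lock comes from:

/*
 * Simplified sketch of the existing purge path: every caller that
 * wants aliases flushed serializes on vmap_purge_lock while the
 * lazily-freed vmap areas are purged. With many concurrent
 * VM_FLUSH_RESET_PERMS vfrees, each one waits here in turn.
 */
static void _vm_unmap_aliases(unsigned long start, unsigned long end, int flush)
{
	/* ... walk the per-cpu vmap blocks, widening start/end ... */

	mutex_lock(&vmap_purge_lock);
	purge_fragmented_blocks_allcpus();
	if (!__purge_vmap_area_lazy(start, end) && flush)
		flush_tlb_kernel_range(start, end);
	mutex_unlock(&vmap_purge_lock);
}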


The issue observed on an UltraSparc III system was a hang on boot. The
only significant difference I can find in how sparc works in this area
is that there is actually a special optimization in the TLB flush for
handling vmalloc lazy purge operations.

Some firmware mappings live between the modules and vmalloc ranges, and
flushing their translations can cause "hard hangs and crashes" [1].
Additionally in the mix, the "sparc64 kernel learns about
openfirmware's dynamic mappings in this region early in the boot, and
then services TLB misses in this area" [1]. The firmware protection
logic seems to be in place, however a later change was made in the
low-level asm to do a "flush all" if the range was big enough on this
cpu [2]. With the advent of the change this patch addresses, the purge
operations would be happening much earlier than before, at the first
special-permission vfree, instead of after the machine has been
running for some time and the vmalloc space has become fragmented.

So my best theory is that the history of vmalloc lazy purges causing
hangs on sparc has come into play here somehow, triggered by the fact
that we are now doing the purges much earlier. If it is something like
this, then the fact that after this patch we instead only flush the
small allocation itself on sparc would be the reason why it fixes the
issue.

Admittedly, there are some missing pieces in this theory. If any sparc
architecture experts can help me evaluate whether it sounds reasonable
at all, I would really appreciate it.

Thanks,

Rick

[1] https://patchwork.ozlabs.org/patch/376523/
[2] https://patchwork.ozlabs.org/patch/687780/
Edgecombe, Rick P May 20, 2019, 7:13 p.m. UTC | #2
On Fri, 2019-05-17 at 14:01 -0700, Rick Edgecombe wrote:
> Meelis Roos reported issues with the new VM_FLUSH_RESET_PERMS flag on
> the
> sparc architecture.
> 
Argh, this patch is not correct in the flush range for non-x86. I'll
send a revision.

Patch

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 67bbb8d2a0a8..5daa7ec8950f 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1531,9 +1531,10 @@  static inline void set_area_direct_map(const struct vm_struct *area,
 /* Handle removing and resetting vm mappings related to the vm_struct. */
 static void vm_remove_mappings(struct vm_struct *area, int deallocate_pages)
 {
+	const bool has_set_direct = IS_ENABLED(CONFIG_ARCH_HAS_SET_DIRECT_MAP);
+	const bool flush_reset = area->flags & VM_FLUSH_RESET_PERMS;
 	unsigned long addr = (unsigned long)area->addr;
-	unsigned long start = ULONG_MAX, end = 0;
-	int flush_reset = area->flags & VM_FLUSH_RESET_PERMS;
+	unsigned long start = addr, end = addr + get_vm_area_size(area);
 	int i;
 
 	/*
@@ -1542,7 +1543,7 @@  static void vm_remove_mappings(struct vm_struct *area, int deallocate_pages)
 	 * This is concerned with resetting the direct map any an vm alias with
 	 * execute permissions, without leaving a RW+X window.
 	 */
-	if (flush_reset && !IS_ENABLED(CONFIG_ARCH_HAS_SET_DIRECT_MAP)) {
+	if (flush_reset && !has_set_direct) {
 		set_memory_nx(addr, area->nr_pages);
 		set_memory_rw(addr, area->nr_pages);
 	}
@@ -1555,22 +1556,24 @@  static void vm_remove_mappings(struct vm_struct *area, int deallocate_pages)
 
 	/*
 	 * If not deallocating pages, just do the flush of the VM area and
-	 * return.
+	 * return. If the arch doesn't have set_direct_map_(), also skip the
+	 * below work.
 	 */
-	if (!deallocate_pages) {
-		vm_unmap_aliases();
+	if (!deallocate_pages || !has_set_direct) {
+		flush_tlb_kernel_range(addr, get_vm_area_size(area));
 		return;
 	}
 
 	/*
 	 * If execution gets here, flush the vm mapping and reset the direct
 	 * map. Find the start and end range of the direct mappings to make sure
-	 * the vm_unmap_aliases() flush includes the direct map.
+	 * the flush_tlb_kernel_range() includes the direct map.
 	 */
 	for (i = 0; i < area->nr_pages; i++) {
-		if (page_address(area->pages[i])) {
+		addr = (unsigned long)page_address(area->pages[i]);
+		if (addr) {
 			start = min(addr, start);
-			end = max(addr, end);
+			end = max(addr + PAGE_SIZE, end);
 		}
 	}
 
@@ -1580,7 +1583,7 @@  static void vm_remove_mappings(struct vm_struct *area, int deallocate_pages)
 	 * reset the direct map permissions to the default.
 	 */
 	set_area_direct_map(area, set_direct_map_invalid_noflush);
-	_vm_unmap_aliases(start, end, 1);
+	flush_tlb_kernel_range(start, end);
 	set_area_direct_map(area, set_direct_map_default_noflush);
 }