From patchwork Thu Feb 1 18:18:19 2018
X-Patchwork-Submitter: Balbir Singh
X-Patchwork-Id: 868365
From: Balbir Singh <bsingharora@gmail.com>
To: linuxppc-dev@lists.ozlabs.org
Subject: [RFC] powerpc/mm/pgtable: Split mappings on hot-unplug
Date: Thu, 1 Feb 2018 23:48:19 +0530
Message-Id: <20180201181819.5723-1-bsingharora@gmail.com>
X-Mailer: git-send-email 2.13.6
Cc: aneesh.kumar@linux.vnet.ibm.com, david@gibson.dropbear.id.au

This patch splits a linear mapping when the hot-unplug range is smaller than the mapping size. The code detects whether the mapping needs to be split into a smaller size and, if so, uses the stop-machine infrastructure to remap the current linear mapping with smaller-sized mappings, after which the requested area is unmapped.

A new API for a local TLB flush is introduced and exported. It is used under stop-machine, after the larger mapping is cleared and before the smaller mappings are created. There is also a check that the region being split does not overlap the kernel text, since splitting it could cause the executing text to fault.

There are some caveats with this approach: the unmapping can free reserved memory areas (memblock-reserved mappings and allocations), which can be a problem. radix__map_kernel_page() may also fault if the area selected for unmapping contains the process page tables. This is not a new problem created by the patch, but something to watch out for while testing.

The right approach to solving the hot-unplug issue is to mark the node/DIMM hotpluggable, so that all the memory on it is movable. Currently there is no method of enforcing the hotpluggable property via libvirt or qemu.

I've tested these changes in a KVM guest with 2 vcpus from a split-mapping point of view; some of the caveats mentioned above applied to the testing I did.

TODOs
1. In addition to checking for overlapping kernel text, we should check for reserved memory allocations in the unmapped region.
2. The code is duplicated across the remove_*_table() functions (pgd/pud/pmd); helper functions could simplify it.
Signed-off-by: Balbir Singh
---
 .../powerpc/include/asm/book3s/64/tlbflush-radix.h |   1 +
 arch/powerpc/mm/pgtable-radix.c                    | 130 +++++++++++++++++----
 arch/powerpc/mm/tlb-radix.c                        |   6 +
 3 files changed, 116 insertions(+), 21 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h b/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h
index 6a9e68003387..d1d4104e676d 100644
--- a/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h
+++ b/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h
@@ -20,6 +20,7 @@ extern void radix__flush_pmd_tlb_range(struct vm_area_struct *vma,
 extern void radix__flush_tlb_range(struct vm_area_struct *vma, unsigned long start,
				    unsigned long end);
 extern void radix__flush_tlb_kernel_range(unsigned long start, unsigned long end);
+extern void radix__flush_tlb_local_kernel_range(unsigned long start, unsigned long end);
 
 extern void radix__local_flush_tlb_mm(struct mm_struct *mm);
 extern void radix__local_flush_all_mm(struct mm_struct *mm);
diff --git a/arch/powerpc/mm/pgtable-radix.c b/arch/powerpc/mm/pgtable-radix.c
index cfbbee941a76..10b6a1c433ff 100644
--- a/arch/powerpc/mm/pgtable-radix.c
+++ b/arch/powerpc/mm/pgtable-radix.c
@@ -17,6 +17,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
@@ -671,6 +672,35 @@ static void free_pmd_table(pmd_t *pmd_start, pud_t *pud)
 	pud_clear(pud);
 }
 
+struct change_mapping_params {
+	pte_t *pte;
+	unsigned long start;
+	unsigned long end;
+	unsigned long size;
+	unsigned long aligned_start;
+	unsigned long aligned_end;
+};
+
+static int stop_machine_change_mapping(void *data)
+{
+	struct change_mapping_params *params =
+			(struct change_mapping_params *)data;
+	unsigned long addr;
+
+	if (!data)
+		return -1;
+
+	spin_unlock(&init_mm.page_table_lock);
+	pte_clear(&init_mm, params->aligned_start, params->pte);
+	radix__flush_tlb_local_kernel_range(params->aligned_start,
+					    params->aligned_end);
+	for (addr = params->aligned_start; addr < params->aligned_end;
+	     addr += params->size)
+		create_physical_mapping(addr, addr + params->size);
+	spin_lock(&init_mm.page_table_lock);
+	return 0;
+}
+
 static void remove_pte_table(pte_t *pte_start, unsigned long addr,
			      unsigned long end)
 {
@@ -699,12 +729,65 @@ static void remove_pte_table(pte_t *pte_start, unsigned long addr,
 	}
 }
 
+/*
+ * clear the pte and potentially split the mapping helper
+ * Return values are as follows
+ *
+ * 0: pte cleared, no changes required
+ * 1: pte cleared and mapping split
+ * -1: We can't clear this pte range
+ */
+static int clear_and_maybe_split_kernel_mapping(unsigned long addr,
+		unsigned long end, unsigned long size, unsigned long prev_size,
+		pte_t *pte)
+{
+	unsigned long mask = ~(size - 1);
+	unsigned long aligned_start = addr & mask;
+	unsigned long aligned_end = addr + size;
+	struct change_mapping_params params;
+	bool split_region = false;
+
+	if ((end - addr) < size) {
+		/*
+		 * We're going to clear the PTE, but have not yet
+		 * flushed the mapping; time to remap and flush. The
+		 * effects are visible outside the processor, and if
+		 * we are running in code close to the mapping we
+		 * cleared, we are in trouble.
+		 */
+		if (overlaps_kernel_text(aligned_start, addr) ||
+		    overlaps_kernel_text(end, aligned_end)) {
+			/*
+			 * Hack, just return, don't pte_clear
+			 */
+			return -1;
+		}
+		split_region = true;
+	}
+
+	if (split_region) {
+		params.pte = pte;
+		params.start = addr;
+		params.end = end;
+		params.size = prev_size;
+		params.aligned_start = aligned_start;
+		params.aligned_end = min_t(unsigned long, aligned_end,
+				(unsigned long)__va(memblock_end_of_DRAM()));
+		stop_machine(stop_machine_change_mapping, &params, NULL);
+		return 1;
+	}
+
+	pte_clear(&init_mm, addr, pte);
+	return 0;
+}
+
 static void remove_pmd_table(pmd_t *pmd_start, unsigned long addr,
			      unsigned long end)
 {
 	unsigned long next;
 	pte_t *pte_base;
 	pmd_t *pmd;
+	int rc = 0;
 
 	pmd = pmd_start + pmd_index(addr);
 	for (; addr < end; addr = next, pmd++) {
@@ -714,14 +797,15 @@ static void remove_pmd_table(pmd_t *pmd_start, unsigned long addr,
 			continue;
 
 		if (pmd_huge(*pmd)) {
-			if (!IS_ALIGNED(addr, PMD_SIZE) ||
-			    !IS_ALIGNED(next, PMD_SIZE)) {
-				WARN_ONCE(1, "%s: unaligned range\n", __func__);
+			rc = clear_and_maybe_split_kernel_mapping(addr, end,
+					PMD_SIZE, PAGE_SIZE, (pte_t *)pmd);
+			if (rc == 0)
 				continue;
-			}
-
-			pte_clear(&init_mm, addr, (pte_t *)pmd);
-			continue;
+			if (rc == 1) {
+				pte_base = (pte_t *)pmd_page_vaddr(*pmd);
+				remove_pte_table(pte_base, addr, end);
+			} else
+				return;
 		}
 
 		pte_base = (pte_t *)pmd_page_vaddr(*pmd);
@@ -736,6 +820,7 @@ static void remove_pud_table(pud_t *pud_start, unsigned long addr,
 	unsigned long next;
 	pmd_t *pmd_base;
 	pud_t *pud;
+	int rc = 0;
 
 	pud = pud_start + pud_index(addr);
 	for (; addr < end; addr = next, pud++) {
@@ -745,14 +830,15 @@ static void remove_pud_table(pud_t *pud_start, unsigned long addr,
 			continue;
 
 		if (pud_huge(*pud)) {
-			if (!IS_ALIGNED(addr, PUD_SIZE) ||
-			    !IS_ALIGNED(next, PUD_SIZE)) {
-				WARN_ONCE(1, "%s: unaligned range\n", __func__);
+			rc = clear_and_maybe_split_kernel_mapping(addr, end,
+					PUD_SIZE, PMD_SIZE, (pte_t *)pud);
+			if (rc == 0)
 				continue;
-			}
-
-			pte_clear(&init_mm, addr, (pte_t *)pud);
-			continue;
+			if (rc == 1) {
+				pmd_base = (pmd_t *)pud_page_vaddr(*pud);
+				remove_pmd_table(pmd_base, addr, end);
+			} else
+				return;
 		}
 
 		pmd_base = (pmd_t *)pud_page_vaddr(*pud);
@@ -766,6 +852,7 @@ static void remove_pagetable(unsigned long start, unsigned long end)
 	unsigned long addr, next;
 	pud_t *pud_base;
 	pgd_t *pgd;
+	int rc = 0;
 
 	spin_lock(&init_mm.page_table_lock);
@@ -777,14 +864,15 @@ static void remove_pagetable(unsigned long start, unsigned long end)
 			continue;
 
 		if (pgd_huge(*pgd)) {
-			if (!IS_ALIGNED(addr, PGDIR_SIZE) ||
-			    !IS_ALIGNED(next, PGDIR_SIZE)) {
-				WARN_ONCE(1, "%s: unaligned range\n", __func__);
+			rc = clear_and_maybe_split_kernel_mapping(addr, end,
+					PGD_SIZE, PUD_SIZE, (pte_t *)pgd);
+			if (rc == 0)
 				continue;
-			}
-
-			pte_clear(&init_mm, addr, (pte_t *)pgd);
-			continue;
+			if (rc == 1) {
+				pud_base = (pud_t *)pgd_page_vaddr(*pgd);
+				remove_pud_table(pud_base, addr, end);
+			} else
+				return;
 		}
 
 		pud_base = (pud_t *)pgd_page_vaddr(*pgd);
diff --git a/arch/powerpc/mm/tlb-radix.c b/arch/powerpc/mm/tlb-radix.c
index 884f4b705b57..9bbeaef1ee9b 100644
--- a/arch/powerpc/mm/tlb-radix.c
+++ b/arch/powerpc/mm/tlb-radix.c
@@ -323,6 +323,12 @@ void radix__flush_tlb_kernel_range(unsigned long start, unsigned long end)
 }
 EXPORT_SYMBOL(radix__flush_tlb_kernel_range);
 
+void radix__flush_tlb_local_kernel_range(unsigned long start, unsigned long end)
+{
+	_tlbie_pid(0, RIC_FLUSH_ALL);
+}
+EXPORT_SYMBOL(radix__flush_tlb_local_kernel_range);
+
 #define TLB_FLUSH_ALL -1UL
 
 /*