From patchwork Tue Nov 3 13:51:24 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Thomas Gleixner X-Patchwork-Id: 1393046 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.infradead.org (client-ip=2001:8b0:10b:1231::1; helo=merlin.infradead.org; envelope-from=linux-snps-arc-bounces+incoming=patchwork.ozlabs.org@lists.infradead.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=linutronix.de Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; secure) header.d=lists.infradead.org header.i=@lists.infradead.org header.a=rsa-sha256 header.s=merlin.20170209 header.b=Ao6LqKlj; dkim=fail reason="signature verification failed" (2048-bit key; secure) header.d=linutronix.de header.i=@linutronix.de header.a=rsa-sha256 header.s=2020 header.b=HrIvto5P; dkim=fail reason="signature verification failed" header.d=linutronix.de header.i=@linutronix.de header.a=ed25519-sha256 header.s=2020e header.b=66P0BI0k; dkim-atps=neutral Received: from merlin.infradead.org (merlin.infradead.org [IPv6:2001:8b0:10b:1231::1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4CQWRB0Rd3z9sVW for ; Wed, 4 Nov 2020 00:51:34 +1100 (AEDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=merlin.20170209; h=Sender:Content-Transfer-Encoding: Content-Type:Cc:List-Subscribe:List-Help:List-Post:List-Archive: List-Unsubscribe:List-Id:MIME-Version:Message-ID:Date:In-Reply-To:Subject:To: From:Reply-To:Content-ID:Content-Description:Resent-Date:Resent-From: Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:References:List-Owner; bh=mKMa18xHi5Gfr/otC9bUNBt2gx5187FKGKj+4ls1Zoo=; b=Ao6LqKlj0OyVqpA4D4e9KqdVg OMiD31nbyus0DTwQKqvnko3ZKlFcfR7YR54HZKSZ1G43mGu4vLoMT6B2O2b7U93+brkdMr4DGigug D3v7HbvXnNIO2rifbAFivX6AGTv71W743qQlpYWxVBWBY8ngsL2NYjnVXc59ZiK3QoKuVuViem0l8 LyK60QrFiPmpLe0LZxyKSRzfoqjR5IjgH85e7DEJFflAduPzVXX3l9+rndwQhNk3hEvnlM+B3HwVI WQ5OwiD9ahoFpzn1KaYsNNhqw65W0yZunrSRApu+CcFgzHS6ReIoH3ISut1oUEi1Y33UnXjgHxurP fkJqAX0+Q==; Received: from localhost ([::1] helo=merlin.infradead.org) by merlin.infradead.org with esmtp (Exim 4.92.3 #3 (Red Hat Linux)) id 1kZwif-000058-RG; Tue, 03 Nov 2020 13:51:29 +0000 Received: from galois.linutronix.de ([193.142.43.55]) by merlin.infradead.org with esmtps (Exim 4.92.3 #3 (Red Hat Linux)) id 1kZwic-0008V6-Dz; Tue, 03 Nov 2020 13:51:27 +0000 From: Thomas Gleixner DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020; t=1604411485; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to; bh=IS0H0PmYwF+g21mCG5Xo/19/WQ/0DRdzMedmtdA3coA=; b=HrIvto5PjENGZ2sJaNLHD5fU5x7ktTkUsQiiDyfOlfNVG6i2QfDeoLNWiuBZ8UuMqdNhX2 fB4hQZYW9XLjZyQzXu32J97iL+4BOXERJUUmfYUCqJbELJ/WBmwK+DuqZ6BSQGoZXks/KC aDoQ57uSGJVq4g4B3MLZakzs1shvqXk/dnG4UwyHY3B13uTe41oYG3HvCVER6K38Ja+gJt 7Usp+noPsRZGIC1+DKDd7s1qqy+Nv55wtBL5RImMxzkF+RjknCJjAvMn0ydLpnI8IyVMmU hhwOYGNoGIQLOU3q/1OXFTFUdc4C4oO5CKnfyMrPpfB98PwG0AkBsHS1Z2hGyg== DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020e; t=1604411485; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to; bh=IS0H0PmYwF+g21mCG5Xo/19/WQ/0DRdzMedmtdA3coA=; b=66P0BI0kspaCcqHZR5ZhOyEE21HuNZfhFXd12L2xM9Ri7Lmpu3Br/scECIFCm7xMgiFzUo sskAL7tUf7GfYKDg== To: LKML Subject: [patch V4 24/37] sched: highmem: Store local kmaps in task struct In-Reply-To: <20201103095859.038791330@linutronix.de> Date: Tue, 03 Nov 2020 14:51:24 +0100 Message-ID: <877dr235wj.fsf@nanos.tec.linutronix.de> MIME-Version: 1.0 X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20201103_085126_806019_5D93C007 X-CRM114-Status: GOOD ( 25.29 ) X-Spam-Score: -2.5 (--) X-Spam-Report: SpamAssassin version 3.4.4 on merlin.infradead.org summary: Content analysis details: (-2.5 points) pts rule name description ---- ---------------------- -------------------------------------------------- -2.3 RCVD_IN_DNSWL_MED RBL: Sender listed at https://www.dnswl.org/, medium trust [193.142.43.55 listed in list.dnswl.org] 0.0 SPF_HELO_NONE SPF: HELO does not publish an SPF Record -0.0 SPF_PASS SPF: sender matches SPF record 0.1 DKIM_SIGNED Message has a DKIM or DK signature, not necessarily valid -0.1 DKIM_VALID_EF Message has a valid DKIM or DK signature from envelope-from domain -0.1 DKIM_VALID Message has at least one valid DKIM or DK signature -0.1 DKIM_VALID_AU Message has a valid DKIM or DK signature from author's domain X-BeenThere: linux-snps-arc@lists.infradead.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Linux on Synopsys ARC Processors List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Juri Lelli , linux-aio@kvack.org, Peter Zijlstra , Benjamin Herrenschmidt , Sebastian Andrzej Siewior , Joonas Lahtinen , dri-devel@lists.freedesktop.org, virtualization@lists.linux-foundation.org, Ben Segall , Chris Mason , Huang Rui , Paul Mackerras , Gerd Hoffmann , Daniel Bristot de Oliveira , sparclinux@vger.kernel.org, Vincent Chen , Christoph Hellwig , Vincent Guittot , Paul McKenney , Max Filippov , Michael Ellerman , x86@kernel.org, Russell King , linux-csky@vger.kernel.org, Ingo Molnar , David Airlie , VMware Graphics , Mel Gorman , nouveau@lists.freedesktop.org, Dave Airlie , linux-snps-arc@lists.infradead.org, Ben Skeggs , linux-xtensa@linux-xtensa.org, Arnd Bergmann , intel-gfx@lists.freedesktop.org, Roland Scheidegger , Josef Bacik , Steven Rostedt , Linus Torvalds , Alexander Viro , spice-devel@lists.freedesktop.org, David Sterba , Rodrigo Vivi , Dietmar Eggemann , linux-arm-kernel@lists.infradead.org, Jani Nikula , Chris Zankel , Michal Simek , Thomas Bogendoerfer , Nick Hu , linux-mm@kvack.org, Vineet Gupta , linux-mips@vger.kernel.org, Christian Koenig , Benjamin LaHaise , Daniel Vetter , linux-fsdevel@vger.kernel.org, Andrew Morton , linuxppc-dev@lists.ozlabs.org, "David S. Miller" , linux-btrfs@vger.kernel.org, Greentime Hu Sender: "linux-snps-arc" Errors-To: linux-snps-arc-bounces+incoming=patchwork.ozlabs.org@lists.infradead.org Instead of storing the map per CPU provide and use per task storage. That prepares for local kmaps which are preemptible. The context switch code is preparatory and not yet in use because kmap_atomic() runs with preemption disabled. Will be made usable in the next step. The context switch logic is safe even when an interrupt happens after clearing or before restoring the kmaps. The kmap index in task struct is not modified so any nesting kmap in an interrupt will use unused indices and on return the counter is the same as before. Also add an assert into the return to user space code. Going back to user space with an active kmap local is a nono. Signed-off-by: Thomas Gleixner --- V4: Use the version which actually compiles and works V3: Handle the debug case correctly --- include/linux/highmem-internal.h | 10 +++ include/linux/sched.h | 9 +++ kernel/entry/common.c | 2 kernel/fork.c | 1 kernel/sched/core.c | 18 +++++++ mm/highmem.c | 99 +++++++++++++++++++++++++++++++++++---- 6 files changed, 129 insertions(+), 10 deletions(-) --- a/include/linux/highmem-internal.h +++ b/include/linux/highmem-internal.h @@ -9,6 +9,16 @@ void *__kmap_local_pfn_prot(unsigned long pfn, pgprot_t prot); void *__kmap_local_page_prot(struct page *page, pgprot_t prot); void kunmap_local_indexed(void *vaddr); +void kmap_local_fork(struct task_struct *tsk); +void __kmap_local_sched_out(void); +void __kmap_local_sched_in(void); +static inline void kmap_assert_nomap(void) +{ + DEBUG_LOCKS_WARN_ON(current->kmap_ctrl.idx); +} +#else +static inline void kmap_local_fork(struct task_struct *tsk) { } +static inline void kmap_assert_nomap(void) { } #endif #ifdef CONFIG_HIGHMEM --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -34,6 +34,7 @@ #include #include #include +#include /* task_struct member predeclarations (sorted alphabetically): */ struct audit_context; @@ -629,6 +630,13 @@ struct wake_q_node { struct wake_q_node *next; }; +struct kmap_ctrl { +#ifdef CONFIG_KMAP_LOCAL + int idx; + pte_t pteval[KM_MAX_IDX]; +#endif +}; + struct task_struct { #ifdef CONFIG_THREAD_INFO_IN_TASK /* @@ -1294,6 +1302,7 @@ struct task_struct { unsigned int sequential_io; unsigned int sequential_io_avg; #endif + struct kmap_ctrl kmap_ctrl; #ifdef CONFIG_DEBUG_ATOMIC_SLEEP unsigned long task_state_change; #endif --- a/kernel/entry/common.c +++ b/kernel/entry/common.c @@ -2,6 +2,7 @@ #include #include +#include #include #include @@ -194,6 +195,7 @@ static void exit_to_user_mode_prepare(st /* Ensure that the address limit is intact and no locks are held */ addr_limit_user_check(); + kmap_assert_nomap(); lockdep_assert_irqs_disabled(); lockdep_sys_exit(); } --- a/kernel/fork.c +++ b/kernel/fork.c @@ -930,6 +930,7 @@ static struct task_struct *dup_task_stru account_kernel_stack(tsk, 1); kcov_task_init(tsk); + kmap_local_fork(tsk); #ifdef CONFIG_FAULT_INJECTION tsk->fail_nth = 0; --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -4053,6 +4053,22 @@ static inline void finish_lock_switch(st # define finish_arch_post_lock_switch() do { } while (0) #endif +static inline void kmap_local_sched_out(void) +{ +#ifdef CONFIG_KMAP_LOCAL + if (unlikely(current->kmap_ctrl.idx)) + __kmap_local_sched_out(); +#endif +} + +static inline void kmap_local_sched_in(void) +{ +#ifdef CONFIG_KMAP_LOCAL + if (unlikely(current->kmap_ctrl.idx)) + __kmap_local_sched_in(); +#endif +} + /** * prepare_task_switch - prepare to switch tasks * @rq: the runqueue preparing to switch @@ -4075,6 +4091,7 @@ prepare_task_switch(struct rq *rq, struc perf_event_task_sched_out(prev, next); rseq_preempt(prev); fire_sched_out_preempt_notifiers(prev, next); + kmap_local_sched_out(); prepare_task(next); prepare_arch_switch(next); } @@ -4141,6 +4158,7 @@ static struct rq *finish_task_switch(str finish_lock_switch(rq); finish_arch_post_lock_switch(); kcov_finish_switch(current); + kmap_local_sched_in(); fire_sched_in_preempt_notifiers(current); /* --- a/mm/highmem.c +++ b/mm/highmem.c @@ -365,8 +365,6 @@ EXPORT_SYMBOL(kunmap_high); #include -static DEFINE_PER_CPU(int, __kmap_local_idx); - /* * With DEBUG_HIGHMEM the stack depth is doubled and every second * slot is unused which acts as a guard page @@ -379,23 +377,21 @@ static DEFINE_PER_CPU(int, __kmap_local_ static inline int kmap_local_idx_push(void) { - int idx = __this_cpu_add_return(__kmap_local_idx, KM_INCR) - 1; - WARN_ON_ONCE(in_irq() && !irqs_disabled()); - BUG_ON(idx >= KM_MAX_IDX); - return idx; + current->kmap_ctrl.idx += KM_INCR; + BUG_ON(current->kmap_ctrl.idx >= KM_MAX_IDX); + return current->kmap_ctrl.idx - 1; } static inline int kmap_local_idx(void) { - return __this_cpu_read(__kmap_local_idx) - 1; + return current->kmap_ctrl.idx - 1; } static inline void kmap_local_idx_pop(void) { - int idx = __this_cpu_sub_return(__kmap_local_idx, KM_INCR); - - BUG_ON(idx < 0); + current->kmap_ctrl.idx -= KM_INCR; + BUG_ON(current->kmap_ctrl.idx < 0); } #ifndef arch_kmap_local_post_map @@ -461,6 +457,7 @@ void *__kmap_local_pfn_prot(unsigned lon pteval = pfn_pte(pfn, prot); set_pte_at(&init_mm, vaddr, kmap_pte - idx, pteval); arch_kmap_local_post_map(vaddr, pteval); + current->kmap_ctrl.pteval[kmap_local_idx()] = pteval; preempt_enable(); return (void *)vaddr; @@ -505,10 +502,92 @@ void kunmap_local_indexed(void *vaddr) arch_kmap_local_pre_unmap(addr); pte_clear(&init_mm, addr, kmap_pte - idx); arch_kmap_local_post_unmap(addr); + current->kmap_ctrl.pteval[kmap_local_idx()] = __pte(0); kmap_local_idx_pop(); preempt_enable(); } EXPORT_SYMBOL(kunmap_local_indexed); + +/* + * Invoked before switch_to(). This is safe even when during or after + * clearing the maps an interrupt which needs a kmap_local happens because + * the task::kmap_ctrl.idx is not modified by the unmapping code so a + * nested kmap_local will use the next unused index and restore the index + * on unmap. The already cleared kmaps of the outgoing task are irrelevant + * because the interrupt context does not know about them. The same applies + * when scheduling back in for an interrupt which happens before the + * restore is complete. + */ +void __kmap_local_sched_out(void) +{ + struct task_struct *tsk = current; + pte_t *kmap_pte = kmap_get_pte(); + int i; + + /* Clear kmaps */ + for (i = 0; i < tsk->kmap_ctrl.idx; i++) { + pte_t pteval = tsk->kmap_ctrl.pteval[i]; + unsigned long addr; + int idx; + + /* With debug all even slots are unmapped and act as guard */ + if (IS_ENABLED(CONFIG_DEBUG_HIGHMEM) && !(i & 0x01)) { + WARN_ON_ONCE(!pte_none(pteval)); + continue; + } + if (WARN_ON_ONCE(pte_none(pteval))) + continue; + + /* + * This is a horrible hack for XTENSA to calculate the + * coloured PTE index. Uses the PFN encoded into the pteval + * and the map index calculation because the actual mapped + * virtual address is not stored in task::kmap_ctrl. + * For any sane architecture this is optimized out. + */ + idx = arch_kmap_local_map_idx(i, pte_pfn(pteval)); + + addr = __fix_to_virt(FIX_KMAP_BEGIN + idx); + arch_kmap_local_pre_unmap(addr); + pte_clear(&init_mm, addr, kmap_pte - idx); + arch_kmap_local_post_unmap(addr); + } +} + +void __kmap_local_sched_in(void) +{ + struct task_struct *tsk = current; + pte_t *kmap_pte = kmap_get_pte(); + int i; + + /* Restore kmaps */ + for (i = 0; i < tsk->kmap_ctrl.idx; i++) { + pte_t pteval = tsk->kmap_ctrl.pteval[i]; + unsigned long addr; + int idx; + + /* With debug all even slots are unmapped and act as guard */ + if (IS_ENABLED(CONFIG_DEBUG_HIGHMEM) && !(i & 0x01)) { + WARN_ON_ONCE(!pte_none(pteval)); + continue; + } + if (WARN_ON_ONCE(pte_none(pteval))) + continue; + + /* See comment in __kmap_local_sched_out() */ + idx = arch_kmap_local_map_idx(i, pte_pfn(pteval)); + addr = __fix_to_virt(FIX_KMAP_BEGIN + idx); + set_pte_at(&init_mm, addr, kmap_pte - idx, pteval); + arch_kmap_local_post_map(addr, pteval); + } +} + +void kmap_local_fork(struct task_struct *tsk) +{ + if (WARN_ON_ONCE(tsk->kmap_ctrl.idx)) + memset(&tsk->kmap_ctrl, 0, sizeof(tsk->kmap_ctrl)); +} + #endif #if defined(HASHED_PAGE_VIRTUAL)