
[3/3] powerpc: fix reschedule bug in KUAP-unlocked user copy

Message ID 20221013151647.1857994-3-npiggin@gmail.com (mailing list archive)
State Accepted
Series [1/3] powerpc/64s: Disable preemption in hash lazy mmu mode

Checks

Context Check Description
snowpatch_ozlabs/github-powerpc_ppctests success Successfully ran 10 jobs.
snowpatch_ozlabs/github-powerpc_selftests success Successfully ran 10 jobs.
snowpatch_ozlabs/github-powerpc_sparse success Successfully ran 4 jobs.
snowpatch_ozlabs/github-powerpc_clang success Successfully ran 6 jobs.

Commit Message

Nicholas Piggin Oct. 13, 2022, 3:16 p.m. UTC
schedule() must not be called explicitly while KUAP is unlocked, because
the AMR register will not be saved across the context switch on 64s
(preemption is allowed because that is driven by interrupts, which do
save the AMR).

exit_vmx_usercopy() runs inside an unlocked user access region, and it
calls preempt_enable(), which will call schedule() if need_resched() was
set while non-preemptible. This can cause tasks to run unprotected when
they should not, and can cause the user copy to be improperly blocked
when scheduling back to it.

Fix this by avoiding the explicit resched; for preempt kernels, generate
an interrupt to reschedule the context if need_resched() got set.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/lib/vmx-helper.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)
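
For readers unfamiliar with this path, the following is a simplified,
illustrative sketch of the window the patch is about. It is not code from
the patch or the kernel: the real caller is the Power7 VMX copy assembly,
do_vmx_copy() is a hypothetical stand-in for that copy loop, and the KUAP
helpers are shown only to mark where the AMR is opened and closed.

/* Illustrative sketch only -- not actual kernel code. */
static unsigned long vmx_user_copy_sketch(void *to, const void *from,
					  unsigned long n)
{
	unsigned long left;

	allow_read_write_user(to, from, n);	/* KUAP unlocked: AMR opened */
	enter_vmx_usercopy();			/* preempt and pagefaults disabled,
						 * VMX enabled */

	left = do_vmx_copy(to, from, n);	/* hypothetical copy loop; an
						 * interrupt here may set
						 * need_resched() */

	exit_vmx_usercopy();			/* before this patch, preempt_enable()
						 * here could call schedule() while
						 * KUAP was still unlocked, so the AMR
						 * was not preserved across the switch */
	prevent_read_write_user(to, from, n);	/* KUAP locked again */

	return left;
}

With the patch, exit_vmx_usercopy() only re-enables preemption without
rescheduling and, on CONFIG_PREEMPT kernels, arms the decrementer so the
pending reschedule happens from interrupt context, where the AMR is saved
and restored.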

Comments

Guenter Roeck Oct. 13, 2022, 7:58 p.m. UTC | #1
On Fri, Oct 14, 2022 at 01:16:47AM +1000, Nicholas Piggin wrote:
> schedule() must not be called explicitly while KUAP is unlocked, because
> the AMR register will not be saved across the context switch on 64s
> (preemption is allowed because that is driven by interrupts, which do
> save the AMR).
> 
> exit_vmx_usercopy() runs inside an unlocked user access region, and it
> calls preempt_enable(), which will call schedule() if need_resched() was
> set while non-preemptible. This can cause tasks to run unprotected when
> they should not, and can cause the user copy to be improperly blocked
> when scheduling back to it.
> 
> Fix this by avoiding the explicit resched; for preempt kernels, generate
> an interrupt to reschedule the context if need_resched() got set.
> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

With:

0cbbc95b12ac (HEAD -> ppc) powerpc: fix reschedule bug in KUAP-unlocked user copy
28f6b1f174f4 powerpc/64s: Fix hash__change_memory_range preemption warning
20cbdcbb2b09 powerpc/64s: Disable preemption in hash lazy mmu mode
a8b305a95113 powerpc/pseries: Fix CONFIG_DTL=n build
9c55696dd278 powerpc/64s/interrupt: Fix lost interrupts when returning to soft-masked context
f2b220ef93ea (local/master, master) Merge tag 'docs-6.1-2' of git://git.lwn.net/linux

I observed the traceback below. This is with KFENCE and various module
tests (re-)enabled, so the problem may well be unrelated to any of the
above patches.

Guenter

---
WARNING: inconsistent lock state
6.0.0-11845-g0cbbc95b12ac #1 Tainted: G                 N
--------------------------------
inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
swapper/0/1 [HC0[0]:SC0[0]:HE1:SE1] takes:
c000000002734de8 (native_tlbie_lock){+.?.}-{2:2}, at: .native_hpte_updateboltedpp+0x1a4/0x600
{IN-SOFTIRQ-W} state was registered at:
  .lock_acquire+0x20c/0x520
  ._raw_spin_lock+0x4c/0x70
  .native_hpte_invalidate+0x62c/0x840
  .hash__kernel_map_pages+0x450/0x640
  .kfence_protect+0x58/0xc0
  .kfence_guarded_free+0x374/0x5a0
  .__slab_free+0x3d0/0x630
  .put_cred_rcu+0xcc/0x120
  .rcu_core+0x3c4/0x14e0
  .__do_softirq+0x1dc/0x7dc
  .do_softirq_own_stack+0x40/0x60
  0xc00000000869f6d0
  .irq_exit+0x1e8/0x220
  .timer_interrupt+0x284/0x700
  decrementer_common_virt+0x214/0x220
  .rwsem_wake+0x88/0xe0
  .crypto_alg_tested+0x26c/0x370
  .cryptomgr_test+0x34/0x70
  .kthread+0x154/0x180
  .ret_from_kernel_thread+0x58/0x60
irq event stamp: 162973
hardirqs last  enabled at (162973): [<c0000000010cb724>] ._raw_spin_unlock_irqrestore+0xa4/0x120
hardirqs last disabled at (162972): [<c0000000010cb2e0>] ._raw_spin_lock_irqsave+0x40/0xa0
softirqs last  enabled at (162280): [<c0000000010cea2c>] .__do_softirq+0x7ac/0x7dc
softirqs last disabled at (162271): [<c000000000014a60>] .do_softirq_own_stack+0x40/0x60

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(native_tlbie_lock);
  <Interrupt>
    lock(native_tlbie_lock);

 *** DEADLOCK ***

no locks held by swapper/0/1.

stack backtrace:
CPU: 0 PID: 1 Comm: swapper/0 Tainted: G                 N 6.0.0-11845-g0cbbc95b12ac #1
Hardware name: PowerMac3,1 PPC970FX 0x3c0301 PowerMac
Call Trace:
[c000000005607520] [c0000000010853a8] .dump_stack_lvl+0xa4/0x100 (unreliable)
[c0000000056075b0] [c00000000017e34c] .print_usage_bug+0x2dc/0x310
[c000000005607690] [c00000000017ec8c] .mark_lock.part.42+0x90c/0x9d0
[c0000000056077c0] [c00000000018071c] .__lock_acquire+0x58c/0x2070
[c000000005607910] [c00000000017f3bc] .lock_acquire+0x20c/0x520
[c000000005607a20] [c0000000010cb05c] ._raw_spin_lock+0x4c/0x70
[c000000005607ab0] [c00000000007af94] .native_hpte_updateboltedpp+0x1a4/0x600
[c000000005607b70] [c0000000000721cc] .hash__change_memory_range+0xec/0x200
[c000000005607c20] [c000000000072904] .hash__mark_initmem_nx+0x54/0x90
[c000000005607ca0] [c00000000006e490] .mark_initmem_nx+0x30/0x70
[c000000005607d10] [c00000000006d440] .free_initmem+0x30/0xf0
[c000000005607d90] [c0000000000116ec] .kernel_init+0x5c/0x1c0
[c000000005607e10] [c00000000000ca3c] .ret_from_kernel_thread+0x58/0x60

Guenter Roeck Oct. 14, 2022, 12:18 a.m. UTC | #2
On Fri, Oct 14, 2022 at 01:16:47AM +1000, Nicholas Piggin wrote:
> schedule() must not be called explicitly while KUAP is unlocked, because
> the AMR register will not be saved across the context switch on 64s
> (preemption is allowed because that is driven by interrupts, which do
> save the AMR).
> 
> exit_vmx_usercopy() runs inside an unlocked user access region, and it
> calls preempt_enable(), which will call schedule() if need_resched() was
> set while non-preemptible. This can cause tasks to run unprotected when
> they should not, and can cause the user copy to be improperly blocked
> when scheduling back to it.
> 
> Fix this by avoiding the explicit resched; for preempt kernels, generate
> an interrupt to reschedule the context if need_resched() got set.
> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

Tested-by: Guenter Roeck <linux@roeck-us.net>

> ---
>  arch/powerpc/lib/vmx-helper.c | 12 +++++++++++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/lib/vmx-helper.c b/arch/powerpc/lib/vmx-helper.c
> index f76a50291fd7..d491da8d1838 100644
> --- a/arch/powerpc/lib/vmx-helper.c
> +++ b/arch/powerpc/lib/vmx-helper.c
> @@ -36,7 +36,17 @@ int exit_vmx_usercopy(void)
>  {
>  	disable_kernel_altivec();
>  	pagefault_enable();
> -	preempt_enable();
> +	preempt_enable_no_resched();
> +	/*
> +	 * Must never explicitly call schedule (including preempt_enable())
> +	 * while in a kuap-unlocked user copy, because the AMR register will
> +	 * not be saved and restored across context switch. However preempt
> +	 * kernels need to be preempted as soon as possible if need_resched is
> +	 * set and we are preemptible. The hack here is to schedule a
> +	 * decrementer to fire here and reschedule for us if necessary.
> +	 */
> +	if (IS_ENABLED(CONFIG_PREEMPT) && need_resched())
> +		set_dec(1);
>  	return 0;
>  }
>  
> -- 
> 2.37.2
>

Patch

diff --git a/arch/powerpc/lib/vmx-helper.c b/arch/powerpc/lib/vmx-helper.c
index f76a50291fd7..d491da8d1838 100644
--- a/arch/powerpc/lib/vmx-helper.c
+++ b/arch/powerpc/lib/vmx-helper.c
@@ -36,7 +36,17 @@  int exit_vmx_usercopy(void)
 {
 	disable_kernel_altivec();
 	pagefault_enable();
-	preempt_enable();
+	preempt_enable_no_resched();
+	/*
+	 * Must never explicitly call schedule (including preempt_enable())
+	 * while in a kuap-unlocked user copy, because the AMR register will
+	 * not be saved and restored across context switch. However preempt
+	 * kernels need to be preempted as soon as possible if need_resched is
+	 * set and we are preemptible. The hack here is to schedule a
+	 * decrementer to fire here and reschedule for us if necessary.
+	 */
+	if (IS_ENABLED(CONFIG_PREEMPT) && need_resched())
+		set_dec(1);
 	return 0;
 }