[3/5] arm64: early ISB at exit from extended quiescent state

Message ID 20180405171800.5648-4-ynorov@caviumnetworks.com
State Not Applicable
Headers show
Series
  • smp: don't kick CPUs running idle or nohz_full tasks
Related show

Commit Message

Yury Norov April 5, 2018, 5:17 p.m.
This series enables delaying of kernel memory synchronization
for CPUs running in extended quiescent state (EQS) till the exit
of that state.

ARM64 uses IPI mechanism to notify all cores in  SMP system that
kernel text is changed; and IPI handler calls isb() to synchronize.

If we don't deliver IPI to EQS CPUs anymore, we should add ISB early
in EQS exit path.

There are 2 such paths. One starts in do_idle() loop, and other
in el0_svc entry. For do_idle(), isb() is added in
arch_cpu_idle_exit() hook. And for SVC handler, isb is called in
el0_svc_naked.

Suggested-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Yury Norov <ynorov@caviumnetworks.com>
---
 arch/arm64/kernel/entry.S   | 16 +++++++++++++++-
 arch/arm64/kernel/process.c |  7 +++++++
 2 files changed, 22 insertions(+), 1 deletion(-)

Comments

James Morse April 6, 2018, 10:06 a.m. | #1
Hi Yury,

On 05/04/18 18:17, Yury Norov wrote:
> This series enables delaying of kernel memory synchronization
> for CPUs running in extended quiescent state (EQS) till the exit
> of that state.
> 
> ARM64 uses IPI mechanism to notify all cores in  SMP system that
> kernel text is changed; and IPI handler calls isb() to synchronize.
> 
> If we don't deliver IPI to EQS CPUs anymore, we should add ISB early
> in EQS exit path.
> 
> There are 2 such paths. One starts in do_idle() loop, and other
> in el0_svc entry. For do_idle(), isb() is added in
> arch_cpu_idle_exit() hook. And for SVC handler, isb is called in
> el0_svc_naked.

(I know nothing about this EQS stuff, but) there is a third path that might be
relevant.
From include/linux/context_tracking.h:guest_enter_irqoff():
|	 * KVM does not hold any references to rcu protected data when it
|	 * switches CPU into a guest mode. In fact switching to a guest mode
|	 * is very similar to exiting to userspace from rcu point of view. In
|	 * addition CPU may stay in a guest mode for quite a long time (up to
|	 * one time slice). Lets treat guest mode as quiescent state, just like
|	 * we do with user-mode execution.

For non-VHE systems guest_enter_irqoff()() is called just before we jump to EL2.
Coming back gives us an exception-return, so we have a context-synchronisation
event there, and I assume we will never patch the hyp-text on these systems.

But with VHE on the upcoming kernel version we still go on to run code at the
same EL. Do we need an ISB on the path back from the guest once we've told RCU
we've 'exited user-space'?
If this code can be patched, do we have a problem here?


> diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
> index c8d9ec363ddd..b1e1c19b4432 100644
> --- a/arch/arm64/kernel/entry.S
> +++ b/arch/arm64/kernel/entry.S
> @@ -48,7 +48,7 @@
>  	.endm
>  
>  	.macro el0_svc_restore_syscall_args
> -#if defined(CONFIG_CONTEXT_TRACKING)
> +#if !defined(CONFIG_TINY_RCU) || defined(CONFIG_CONTEXT_TRACKING)
>  	restore_syscall_args
>  #endif
>  	.endm
> @@ -483,6 +483,19 @@ __bad_stack:
>  	ASM_BUG()
>  	.endm
>  
> +/*
> + * If CPU is in extended quiescent state we need isb to ensure that
> + * possible change of kernel text is visible by the core.
> + */
> +	.macro	isb_if_eqs
> +#ifndef CONFIG_TINY_RCU
> +	bl	rcu_is_watching
> +	cbnz	x0, 1f
> +	isb 					// pairs with aarch64_insn_patch_text
> +1:
> +#endif
> +	.endm
> +
>  el0_sync_invalid:
>  	inv_entry 0, BAD_SYNC
>  ENDPROC(el0_sync_invalid)
> @@ -949,6 +962,7 @@ alternative_else_nop_endif
>  
>  el0_svc_naked:					// compat entry point
>  	stp	x0, xscno, [sp, #S_ORIG_X0]	// save the original x0 and syscall number
> +	isb_if_eqs
>  	enable_daif
>  	ct_user_exit
>  	el0_svc_restore_syscall_args

Shouldn't this be at the point that RCU knows we've exited user-space? Otherwise
there is a gap where RCU thinks we're in user-space, we're not, and we're about
to tell it. Code-patching occurring in this gap would be missed.

This gap only contains 'enable_daif', and any exception that occurs here is
safe, but its going to give someone a nasty surprise...

Mark points out this ISB needs to be after RCU knows we're not quiescent:
https://lkml.org/lkml/2018/4/3/378

Can't this go in the rcu exit-quiescence code? Isn't this what your
rcu_dynticks_eqs_exit_sync() hook does?


Thanks,

James
--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Mark Rutland April 6, 2018, 11:07 a.m. | #2
On Thu, Apr 05, 2018 at 08:17:58PM +0300, Yury Norov wrote:
> This series enables delaying of kernel memory synchronization
> for CPUs running in extended quiescent state (EQS) till the exit
> of that state.
> 
> ARM64 uses IPI mechanism to notify all cores in  SMP system that
> kernel text is changed; and IPI handler calls isb() to synchronize.
> 
> If we don't deliver IPI to EQS CPUs anymore, we should add ISB early
> in EQS exit path.
> 
> There are 2 such paths. One starts in do_idle() loop, and other
> in el0_svc entry. For do_idle(), isb() is added in
> arch_cpu_idle_exit() hook. And for SVC handler, isb is called in
> el0_svc_naked.
> 
> Suggested-by: Will Deacon <will.deacon@arm.com>
> Signed-off-by: Yury Norov <ynorov@caviumnetworks.com>
> ---
>  arch/arm64/kernel/entry.S   | 16 +++++++++++++++-
>  arch/arm64/kernel/process.c |  7 +++++++
>  2 files changed, 22 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
> index c8d9ec363ddd..b1e1c19b4432 100644
> --- a/arch/arm64/kernel/entry.S
> +++ b/arch/arm64/kernel/entry.S
> @@ -48,7 +48,7 @@
>  	.endm
>  
>  	.macro el0_svc_restore_syscall_args
> -#if defined(CONFIG_CONTEXT_TRACKING)
> +#if !defined(CONFIG_TINY_RCU) || defined(CONFIG_CONTEXT_TRACKING)
>  	restore_syscall_args
>  #endif
>  	.endm
> @@ -483,6 +483,19 @@ __bad_stack:
>  	ASM_BUG()
>  	.endm
>  
> +/*
> + * If CPU is in extended quiescent state we need isb to ensure that
> + * possible change of kernel text is visible by the core.
> + */
> +	.macro	isb_if_eqs
> +#ifndef CONFIG_TINY_RCU
> +	bl	rcu_is_watching
> +	cbnz	x0, 1f
> +	isb 					// pairs with aarch64_insn_patch_text
> +1:
> +#endif
> +	.endm
> +
>  el0_sync_invalid:
>  	inv_entry 0, BAD_SYNC
>  ENDPROC(el0_sync_invalid)
> @@ -949,6 +962,7 @@ alternative_else_nop_endif
>  
>  el0_svc_naked:					// compat entry point
>  	stp	x0, xscno, [sp, #S_ORIG_X0]	// save the original x0 and syscall number
> +	isb_if_eqs

As I mentioned before, this is too early.

If we only kick active CPUs, then until we exit a quiescent state, we
can race with concurrent modification, and cannot reliably ensure that
instructions are up-to-date. Practically speaking, that means that we
cannot patch any code used on the path to exit a quiescent state.

Also, if this were needed in the SVC path, it would be necessary for all
exceptions from EL0. Buggy userspace can always trigger a data abort,
even if it doesn't intend to.

>  	enable_daif
>  	ct_user_exit
>  	el0_svc_restore_syscall_args
> diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
> index f08a2ed9db0d..74cad496b07b 100644
> --- a/arch/arm64/kernel/process.c
> +++ b/arch/arm64/kernel/process.c
> @@ -88,6 +88,13 @@ void arch_cpu_idle(void)
>  	trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, smp_processor_id());
>  }
>  
> +void arch_cpu_idle_exit(void)
> +{
> +	/* Pairs with aarch64_insn_patch_text() for EQS CPUs. */
> +	if (!rcu_is_watching())
> +		isb();
> +}

Likewise, this is too early as we haven't left the extended quiescent
state yet.

Thanks,
Mark.
--
To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index c8d9ec363ddd..b1e1c19b4432 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -48,7 +48,7 @@ 
 	.endm
 
 	.macro el0_svc_restore_syscall_args
-#if defined(CONFIG_CONTEXT_TRACKING)
+#if !defined(CONFIG_TINY_RCU) || defined(CONFIG_CONTEXT_TRACKING)
 	restore_syscall_args
 #endif
 	.endm
@@ -483,6 +483,19 @@  __bad_stack:
 	ASM_BUG()
 	.endm
 
+/*
+ * If CPU is in extended quiescent state we need isb to ensure that
+ * possible change of kernel text is visible by the core.
+ */
+	.macro	isb_if_eqs
+#ifndef CONFIG_TINY_RCU
+	bl	rcu_is_watching
+	cbnz	x0, 1f
+	isb 					// pairs with aarch64_insn_patch_text
+1:
+#endif
+	.endm
+
 el0_sync_invalid:
 	inv_entry 0, BAD_SYNC
 ENDPROC(el0_sync_invalid)
@@ -949,6 +962,7 @@  alternative_else_nop_endif
 
 el0_svc_naked:					// compat entry point
 	stp	x0, xscno, [sp, #S_ORIG_X0]	// save the original x0 and syscall number
+	isb_if_eqs
 	enable_daif
 	ct_user_exit
 	el0_svc_restore_syscall_args
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index f08a2ed9db0d..74cad496b07b 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -88,6 +88,13 @@  void arch_cpu_idle(void)
 	trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, smp_processor_id());
 }
 
+void arch_cpu_idle_exit(void)
+{
+	/* Pairs with aarch64_insn_patch_text() for EQS CPUs. */
+	if (!rcu_is_watching())
+		isb();
+}
+
 #ifdef CONFIG_HOTPLUG_CPU
 void arch_cpu_idle_dead(void)
 {