diff mbox series

[1/5] arm64: entry: isb in el1_irq

Message ID 20180405171800.5648-2-ynorov@caviumnetworks.com (mailing list archive)
State Not Applicable
Headers show
Series smp: don't kick CPUs running idle or nohz_full tasks | expand

Commit Message

Yury Norov April 5, 2018, 5:17 p.m. UTC
The kernel text patching framework relies on an IPI to ensure that other
SMP cores observe the change. The target core calls isb() in the IPI
handler path, but not at the beginning of the el1_irq entry. There's a
chance that the modified instruction will be fetched prior to the isb(),
and so the change will not be observed.

This patch inserts an isb early in the el1_irq entry to avoid that chance.

Signed-off-by: Yury Norov <ynorov@caviumnetworks.com>
---
 arch/arm64/kernel/entry.S | 1 +
 1 file changed, 1 insertion(+)

Comments

James Morse April 6, 2018, 10:02 a.m. UTC | #1
Hi Yury,

On 05/04/18 18:17, Yury Norov wrote:
> Kernel text patching framework relies on IPI to ensure that other
> SMP cores observe the change. Target core calls isb() in IPI handler

(Odd: if it's just to synchronize the CPU, taking the IPI should be enough.)


> path, but not at the beginning of el1_irq entry. There's a chance
> that modified instruction will appear prior isb(), and so will not be
> observed.
> 
> This patch inserts isb early at el1_irq entry to avoid that chance.


> diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
> index ec2ee720e33e..9c06b4b80060 100644
> --- a/arch/arm64/kernel/entry.S
> +++ b/arch/arm64/kernel/entry.S
> @@ -593,6 +593,7 @@ ENDPROC(el1_sync)
>  
>  	.align	6
>  el1_irq:
> +	isb					// pairs with aarch64_insn_patch_text
>  	kernel_entry 1
>  	enable_da_f
>  #ifdef CONFIG_TRACE_IRQFLAGS
> 

An ISB at the beginning of the vectors? This is odd: taking an IRQ to get in
here is a context synchronization event too, so the ISB is superfluous.

The ARM ARM has a list of 'context synchronization events' (Glossary on page
6480 of DDI0487B.b); paraphrasing:
* ISB
* Taking an exception
* ERET
* (...loads of debug stuff...)


Thanks,

James
Mark Rutland April 6, 2018, 10:57 a.m. UTC | #2
On Thu, Apr 05, 2018 at 08:17:56PM +0300, Yury Norov wrote:
> Kernel text patching framework relies on IPI to ensure that other
> SMP cores observe the change. Target core calls isb() in IPI handler
> path, but not at the beginning of el1_irq entry. There's a chance
> that modified instruction will appear prior isb(), and so will not be
> observed.
> 
> This patch inserts isb early at el1_irq entry to avoid that chance.

As James pointed out, taking an exception is context synchronizing, so
this looks unnecessary.

Also, it's important to realise that the exception entry is not tied to a
specific interrupt. We might take an EL1 IRQ because of a timer interrupt,
then an IPI could be taken before we get to gic_handle_irq().

This means that we can race:

	CPU0				CPU1
	<take IRQ>
	ISB
					<patch text>
					<send IPI>
	<discover IPI pending>

... and thus the ISB is too early.

Only once we're in the interrupt handler can we pair an ISB with the IPI, and
any code executed before that is not guaranteed to be up-to-date.
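For reference, the sender-side half of this pairing is kick_all_cpus_sync(); its shape is sketched below (kernel-style pseudocode paraphrasing kernel/smp.c of that era, not a standalone compilable unit; comments added):

```c
/* Paraphrase of kick_all_cpus_sync() (kernel/smp.c); not self-contained. */
static void do_nothing(void *unused)
{
}

void kick_all_cpus_sync(void)
{
	/* Make the patched text visible before the IPIs are sent. */
	smp_mb();
	/*
	 * Wait until every other CPU has taken (and returned from) the
	 * IPI. The exception entry/return for the IPI itself is the
	 * context synchronization event the timeline above requires; an
	 * ISB at el1_irq runs before the IPI is even acknowledged, so it
	 * cannot substitute for it.
	 */
	smp_call_function(do_nothing, NULL, 1);
}
```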

Thanks,
Mark.

> 
> Signed-off-by: Yury Norov <ynorov@caviumnetworks.com>
> ---
>  arch/arm64/kernel/entry.S | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
> index ec2ee720e33e..9c06b4b80060 100644
> --- a/arch/arm64/kernel/entry.S
> +++ b/arch/arm64/kernel/entry.S
> @@ -593,6 +593,7 @@ ENDPROC(el1_sync)
>  
>  	.align	6
>  el1_irq:
> +	isb					// pairs with aarch64_insn_patch_text
>  	kernel_entry 1
>  	enable_da_f
>  #ifdef CONFIG_TRACE_IRQFLAGS
> -- 
> 2.14.1
>
Yury Norov April 6, 2018, 4:54 p.m. UTC | #3
On Fri, Apr 06, 2018 at 11:02:56AM +0100, James Morse wrote:
> Hi Yury,
> 
> An ISB at the beginning of the vectors? This is odd, taking an IRQ to get in
> here would be a context-synchronization-event too, so the ISB is superfluous.
> 
> The ARM-ARM  has a list of 'Context-Synchronization event's (Glossary on page
> 6480 of DDI0487B.b), paraphrasing:
> * ISB
> * Taking an exception
> * ERET
> * (...loads of debug stuff...)

Hi James, Mark,

I completely forgot that taking an exception is a context synchronization
event. Sorry for wasting your time reviewing this. It means that patches 1,
2 and 3 are not needed, except for the chunk that adds an ISB in the
do_idle() path.

It also means that on arm64 we can safely skip delivering the IPI to CPUs
running any userspace code, not only nohz_full tasks.

In general, kick_all_cpus_sync() is needed to synchronize contexts. But
exiting userspace is itself a context synchronization event, and while in
userspace a CPU cannot be executing stale kernel text. To me this means we
can safely drop the IPI for all userspace modes - both normal and nohz_full.

If that's correct, for v3 I would suggest:
 - in kick_all_cpus_sync(), skip all CPUs running is_idle_task() or
   user_mode() code;
 - add an isb() for arm64 in the do_idle() path only - this path doesn't
   imply a context switch.
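The suggested filtering might look as follows (a hypothetical sketch only: cpus_to_kick_mask() is an illustrative name, not an existing kernel helper, and the filtering details are assumptions):

```c
/*
 * Hypothetical v3 shape of kick_all_cpus_sync(): IPI only CPUs that may
 * be running kernel text, on the theory that idle/user CPUs will pass
 * through a context synchronization event (exception entry) before
 * executing any patched kernel code. cpus_to_kick_mask() would exclude
 * CPUs in is_idle_task() or user_mode() state.
 */
void kick_all_cpus_sync(void)
{
	smp_mb();	/* make the change visible before kicking */
	smp_call_function_many(cpus_to_kick_mask(), do_nothing, NULL, 1);
}
```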

What do you think?

Yury
Mark Rutland April 6, 2018, 5:22 p.m. UTC | #4
On Fri, Apr 06, 2018 at 07:54:02PM +0300, Yury Norov wrote:
> In general, kick_all_cpus_sync() is needed to switch contexts. But exit from
> userspace is anyway the switch of context. And while in userspace, we cannot
> do something wrong on kernel side. For me it means that we can safely drop
> IPI for all userspace modes - both normal and nohz_full. 

This *may* be true, but only if we never have to patch text in the
windows:

* between exception entry and eqs_exit()

* between eqs_enter() and exception return

* between eqs_enter() and eqs_exit() in the idle loop.

If it's possible that we need to execute patched text in any of those
paths, we must IPI all CPUs in order to correctly serialize things.

Digging a bit, I also think that our ct_user_exit and ct_user_enter
usage is on dodgy ground today.

For example, in el0_dbg we call do_debug_exception() *before* calling
ct_user_exit, which I believe means we'd use RCU while supposedly in an
extended quiescent period, which would be bad.

In other paths, we unmask all DAIF bits before calling ct_user_exit, so
we could similarly take an EL1 debug exception without having exited the
extended quiescent period.
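The problematic ordering can be sketched from the el0_dbg path (paraphrased from entry.S of that era; surrounding instructions elided):

```
el0_dbg:
	/* ... */
	bl	do_debug_exception	// may use RCU here ...
	enable_daif
	ct_user_exit			// ... but the RCU extended quiescent
					// state is only exited here
	b	ret_to_user
```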

I think something similar applies to SDEI: we don't negotiate with RCU
prior to invoking handlers, which might need RCU.

> If it's correct, for v3 I would suggest:
>  - in kick_all_cpus_sync() mask all is_idle_task() and user_mode() CPUs;
>  - add isb() for arm64 in do_idle() path only - this path doesn't imply
>    context switch.

As mentioned in my other reply, I don't think the ISB in do_idle()
makes sense, unless that occurs *after* we exit the extended quiescent
state.

Thanks,
Mark.
James Morse April 6, 2018, 5:30 p.m. UTC | #5
Hi Mark,

On 06/04/18 18:22, Mark Rutland wrote:
> Digging a bit, I also think that our ct_user_exit and ct_user_enter
> usage is on dodgy ground today.

[...]

> I think similar applies to SDEI; we don't negotiate with RCU prior to
> invoking handlers, which might need RCU.

The arch code's __sdei_handler() calls nmi_enter() -> rcu_nmi_enter(), which
does a rcu_dynticks_eqs_exit().


Thanks,

James
Mark Rutland April 6, 2018, 5:34 p.m. UTC | #6
On Fri, Apr 06, 2018 at 06:30:50PM +0100, James Morse wrote:
> Hi Mark,
> 
> On 06/04/18 18:22, Mark Rutland wrote:
> > Digging a bit, I also think that our ct_user_exit and ct_user_enter
> > usage is on dodgy ground today.
> 
> [...]
> 
> > I think similar applies to SDEI; we don't negotiate with RCU prior to
> > invoking handlers, which might need RCU.
> 
> The arch code's __sdei_handler() calls nmi_enter() -> rcu_nmi_enter(), which
> does a rcu_dynticks_eqs_exit().

Ah, sorry. I missed that!

Thanks,
Mark.
Mark Rutland April 6, 2018, 5:50 p.m. UTC | #7
On Fri, Apr 06, 2018 at 06:22:11PM +0100, Mark Rutland wrote:
> Digging a bit, I also think that our ct_user_exit and ct_user_enter
> usage is on dodgy ground today.
> 
> For example, in el0_dbg we call do_debug_exception() *before* calling
> ct_user_exit. Which I believe means we'd use RCU while supposedly in an
> extended quiescent period, which would be bad.

It seems this is the case. I can trigger the following by having GDB
place a SW breakpoint:

[   51.217947] =============================
[   51.217953] WARNING: suspicious RCU usage
[   51.217961] 4.16.0 #4 Not tainted
[   51.217966] -----------------------------
[   51.217974] ./include/linux/rcupdate.h:632 rcu_read_lock() used illegally while idle!
[   51.217980]
[   51.217980] other info that might help us debug this:
[   51.217980]
[   51.217987]
[   51.217987] RCU used illegally from idle CPU!
[   51.217987] rcu_scheduler_active = 2, debug_locks = 1
[   51.217992] RCU used illegally from extended quiescent state!
[   51.217999] 1 lock held by ls/2412:
[   51.218004]  #0:  (rcu_read_lock){....}, at: [<0000000092efbdd5>] brk_handler+0x0/0x198
[   51.218041]
[   51.218041] stack backtrace:
[   51.218049] CPU: 2 PID: 2412 Comm: ls Not tainted 4.16.0 #4
[   51.218055] Hardware name: ARM Juno development board (r1) (DT)
[   51.218061] Call trace:
[   51.218070]  dump_backtrace+0x0/0x1c8
[   51.218078]  show_stack+0x14/0x20
[   51.218087]  dump_stack+0xac/0xe4
[   51.218096]  lockdep_rcu_suspicious+0xcc/0x110
[   51.218103]  brk_handler+0x144/0x198
[   51.218110]  do_debug_exception+0x9c/0x190
[   51.218116]  el0_dbg+0x14/0x20

We will need to fix this before we can fiddle with kick_all_cpus_sync().

Thanks,
Mark.

Patch

diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index ec2ee720e33e..9c06b4b80060 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -593,6 +593,7 @@ ENDPROC(el1_sync)
 
 	.align	6
 el1_irq:
+	isb					// pairs with aarch64_insn_patch_text
 	kernel_entry 1
 	enable_da_f
 #ifdef CONFIG_TRACE_IRQFLAGS