
powerpc/64: allow compiler to cache 'current'

Message ID 20190612140317.24490-1-npiggin@gmail.com (mailing list archive)
State Accepted
Commit e354d7dc81d0e81bea33165f381aff1eda45f5d9
Series powerpc/64: allow compiler to cache 'current'

Checks

Context Check Description
snowpatch_ozlabs/apply_patch success Successfully applied on branch next (a3bf9fbdad600b1e4335dd90979f8d6072e4f602)
snowpatch_ozlabs/build-ppc64le success Build succeeded
snowpatch_ozlabs/build-ppc64be success Build succeeded
snowpatch_ozlabs/build-ppc64e success Build succeeded
snowpatch_ozlabs/build-pmac32 success Build succeeded
snowpatch_ozlabs/checkpatch success total: 0 errors, 0 warnings, 0 checks, 9 lines checked

Commit Message

Nicholas Piggin June 12, 2019, 2:03 p.m. UTC
current may be cached by the compiler, so remove the volatile asm
restriction. This results in better generated code: it is smaller, has
fewer dependent loads, and can avoid store-hit-load flushes like this
one that shows up in irq_exit():

    preempt_count_sub(HARDIRQ_OFFSET);
    if (!in_interrupt() && ...)

Which ends up as:

    ((struct thread_info *)current)->preempt_count -= HARDIRQ_OFFSET;
    if (((struct thread_info *)current)->preempt_count ...

Evaluating current twice presently means it has to be loaded twice, and
here gcc happens to pick a different register each time, so
preempt_count is accessed via two different base registers:

    1058:       ld      r10,2392(r13)     <-- current
    105c:       lwz     r9,0(r10)         <-- preempt_count
    1060:       addis   r9,r9,-1
    1064:       stw     r9,0(r10)         <-- preempt_count
    1068:       ld      r9,2392(r13)      <-- current
    106c:       lwz     r9,0(r9)          <-- preempt_count
    1070:       rlwinm. r9,r9,0,11,23
    1074:       bne     1090 <irq_exit+0x60>

This can frustrate store-hit-load detection heuristics and cause
flushes. Allowing the compiler to cache current in a register with this
patch results in the same base register being used for all accesses,
making the store and subsequent load more likely to be detected as
aliases:

    1058:       ld      r31,2392(r13)
    ...
    1070:       lwz     r9,0(r31)
    1074:       addis   r9,r9,-1
    1078:       stw     r9,0(r31)
    107c:       lwz     r9,0(r31)
    1080:       rlwinm. r9,r9,0,11,23
    1084:       bne     10a0 <irq_exit+0x60>

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/include/asm/current.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Michael Ellerman Aug. 22, 2019, 1:08 p.m. UTC | #1
On Wed, 2019-06-12 at 14:03:17 UTC, Nicholas Piggin wrote:
> current may be cached by the compiler, so remove the volatile asm
> restriction. This results in better generated code: it is smaller, has
> fewer dependent loads, and can avoid store-hit-load flushes like this
> one that shows up in irq_exit():
>
> [...]
>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/e354d7dc81d0e81bea33165f381aff1eda45f5d9

cheers

Patch

diff --git a/arch/powerpc/include/asm/current.h b/arch/powerpc/include/asm/current.h
index 297827b76169..bbfb94800415 100644
--- a/arch/powerpc/include/asm/current.h
+++ b/arch/powerpc/include/asm/current.h
@@ -16,7 +16,8 @@  static inline struct task_struct *get_current(void)
 {
 	struct task_struct *task;
 
-	__asm__ __volatile__("ld %0,%1(13)"
+	/* get_current can be cached by the compiler, so no volatile */
+	asm ("ld %0,%1(13)"
 	: "=r" (task)
 	: "i" (offsetof(struct paca_struct, __current)));