Message ID | 20190612140317.24490-1-npiggin@gmail.com (mailing list archive) |
---|---|
State | Accepted |
Commit | e354d7dc81d0e81bea33165f381aff1eda45f5d9 |
Series | powerpc/64: allow compiler to cache 'current' |
Context | Check | Description |
---|---|---|
snowpatch_ozlabs/apply_patch | success | Successfully applied on branch next (a3bf9fbdad600b1e4335dd90979f8d6072e4f602) |
snowpatch_ozlabs/build-ppc64le | success | Build succeeded |
snowpatch_ozlabs/build-ppc64be | success | Build succeeded |
snowpatch_ozlabs/build-ppc64e | success | Build succeeded |
snowpatch_ozlabs/build-pmac32 | success | Build succeeded |
snowpatch_ozlabs/checkpatch | success | total: 0 errors, 0 warnings, 0 checks, 9 lines checked |
On Wed, 2019-06-12 at 14:03:17 UTC, Nicholas Piggin wrote:
> current may be cached by the compiler, so remove the volatile asm
> restriction. This results in better generated code, as well as being
> smaller and fewer dependent loads, it can avoid store-hit-load flushes
> like this one that shows up in irq_exit():
>
>     preempt_count_sub(HARDIRQ_OFFSET);
>     if (!in_interrupt() && ...)
>
> Which ends up as:
>
>     ((struct thread_info *)current)->preempt_count -= HARDIRQ_OFFSET;
>     if (((struct thread_info *)current)->preempt_count ...
>
> Evaluating current twice presently means it has to be loaded twice, and
> here gcc happens to pick a different register each time, then
> preempt_count is accessed via that base register:
>
>     1058: ld r10,2392(r13) <-- current
>     105c: lwz r9,0(r10) <-- preempt_count
>     1060: addis r9,r9,-1
>     1064: stw r9,0(r10) <-- preempt_count
>     1068: ld r9,2392(r13) <-- current
>     106c: lwz r9,0(r9) <-- preempt_count
>     1070: rlwinm. r9,r9,0,11,23
>     1074: bne 1090 <irq_exit+0x60>
>
> This can frustrate store-hit-load detection heuristics and cause
> flushes. Allowing the compiler to cache current in a register with this
> patch results in the same base register being used for all accesses,
> which is more likely to be detected as an alias:
>
>     1058: ld r31,2392(r13)
>     ...
>     1070: lwz r9,0(r31)
>     1074: addis r9,r9,-1
>     1078: stw r9,0(r31)
>     107c: lwz r9,0(r31)
>     1080: rlwinm. r9,r9,0,11,23
>     1084: bne 10a0 <irq_exit+0x60>
>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/e354d7dc81d0e81bea33165f381aff1eda45f5d9

cheers
diff --git a/arch/powerpc/include/asm/current.h b/arch/powerpc/include/asm/current.h
index 297827b76169..bbfb94800415 100644
--- a/arch/powerpc/include/asm/current.h
+++ b/arch/powerpc/include/asm/current.h
@@ -16,7 +16,8 @@ static inline struct task_struct *get_current(void)
 {
 	struct task_struct *task;
 
-	__asm__ __volatile__("ld %0,%1(13)"
+	/* get_current can be cached by the compiler, so no volatile */
+	asm ("ld %0,%1(13)"
 	: "=r" (task)
 	: "i" (offsetof(struct paca_struct, __current)));
 
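The only functional change in the diff above is dropping the volatile qualifier from the asm statement. As a rough, portable illustration of what that qualifier controls, the sketch below uses an empty asm template in place of the real `ld` from the paca. The names `paca_slot_model`, `load_volatile_model`, `load_cacheable_model`, `twice_volatile` and `twice_cacheable` are hypothetical and not part of the patch, and the sketch assumes a GCC-compatible compiler.

```c
#include <assert.h>

/* Hypothetical stand-in for the paca slot that holds 'current'. */
static long paca_slot_model = 42;

/*
 * Volatile asm: the compiler has to keep every instance of the
 * statement, so two calls produce two asm statements in the output.
 */
static inline long load_volatile_model(void)
{
	long v;

	__asm__ __volatile__("" : "=r" (v) : "0" (paca_slot_model));
	return v;
}

/*
 * Plain asm: the compiler may treat the statement as a pure function
 * of its inputs, so it is allowed (not required) to merge repeated
 * instances or drop ones whose output is unused.
 */
static inline long load_cacheable_model(void)
{
	long v;

	__asm__ ("" : "=r" (v) : "0" (paca_slot_model));
	return v;
}

/* Two back-to-back reads through each variant. */
static long twice_volatile(void)
{
	return load_volatile_model() + load_volatile_model();
}

static long twice_cacheable(void)
{
	return load_cacheable_model() + load_cacheable_model();
}
```

Both variants return the same values; any difference shows up only in the generated code, for example by comparing the output of `gcc -O2 -S` for the two `twice_*` functions.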
current may be cached by the compiler, so remove the volatile asm restriction. This results in better generated code: as well as being smaller and having fewer dependent loads, it can avoid store-hit-load flushes like this one that shows up in irq_exit():

    preempt_count_sub(HARDIRQ_OFFSET);
    if (!in_interrupt() && ...)

Which ends up as:

    ((struct thread_info *)current)->preempt_count -= HARDIRQ_OFFSET;
    if (((struct thread_info *)current)->preempt_count ...

Evaluating current twice presently means it has to be loaded twice, and here gcc happens to pick a different register each time. preempt_count is then accessed via that base register:

    1058: ld r10,2392(r13) <-- current
    105c: lwz r9,0(r10) <-- preempt_count
    1060: addis r9,r9,-1
    1064: stw r9,0(r10) <-- preempt_count
    1068: ld r9,2392(r13) <-- current
    106c: lwz r9,0(r9) <-- preempt_count
    1070: rlwinm. r9,r9,0,11,23
    1074: bne 1090 <irq_exit+0x60>

This can frustrate store-hit-load detection heuristics and cause flushes. Allowing the compiler to cache current in a register with this patch results in the same base register being used for all accesses, which is more likely to be detected as an alias:

    1058: ld r31,2392(r13)
    ...
    1070: lwz r9,0(r31)
    1074: addis r9,r9,-1
    1078: stw r9,0(r31)
    107c: lwz r9,0(r31)
    1080: rlwinm. r9,r9,0,11,23
    1084: bne 10a0 <irq_exit+0x60>

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/include/asm/current.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
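For readers who want to experiment with the irq_exit() pattern outside the kernel, here is a minimal userspace model of the two accesses through current. It is only a sketch: `task_model`, `current_slot`, `get_current_model` and `irq_exit_model` are hypothetical names, an empty non-volatile asm stands in for the real `ld` from the paca, and the two constants are read off the `addis` and `rlwinm.` operands in the listings above rather than taken from kernel headers.

```c
#include <assert.h>
#include <stdbool.h>

/* "addis r9,r9,-1" subtracts 0x10000 from preempt_count. */
#define HARDIRQ_OFFSET_MODEL	0x10000
/* "rlwinm. r9,r9,0,11,23" keeps big-endian bits 11..23, i.e. 0x1fff00. */
#define IN_INTERRUPT_MASK_MODEL	0x1fff00

/* Hypothetical stand-in for the task/thread_info pair. */
struct task_model {
	int preempt_count;
};

static struct task_model the_task;

/* Hypothetical stand-in for the paca field that holds 'current'. */
static struct task_model *current_slot = &the_task;

/*
 * Non-volatile asm, as in the patch: the compiler may reuse the result
 * of an earlier call instead of evaluating the statement again.
 */
static inline struct task_model *get_current_model(void)
{
	struct task_model *task;

	__asm__ ("" : "=r" (task) : "0" (current_slot));
	return task;
}

/*
 * Models the two statements from irq_exit(): subtract the hardirq
 * offset, then test the interrupt bits. Returns true when no
 * interrupt bits remain set.
 */
static bool irq_exit_model(void)
{
	get_current_model()->preempt_count -= HARDIRQ_OFFSET_MODEL;
	return (get_current_model()->preempt_count &
		IN_INTERRUPT_MASK_MODEL) == 0;
}
```

The result is the same whether or not the compiler merges the two get_current_model() calls. What the patch changes is whether the store and the following load are allowed to share one base register, which can be checked by inspecting the disassembly of an optimized build.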