Message ID | b27f1b13f1d5fca6fb0ad5e90692667c56528583.1523950415.git.christophe.leroy@c-s.fr (mailing list archive) |
---|---|
State | Superseded |
Series | [1/7] powerpc/lib: move PPC32 specific functions out of string.S |
diff --git a/arch/powerpc/lib/string.S b/arch/powerpc/lib/string.S
index 89af53b08b4a..9e96f1c102c6 100644
--- a/arch/powerpc/lib/string.S
+++ b/arch/powerpc/lib/string.S
@@ -25,7 +25,9 @@ _GLOBAL(strncpy)
 	mtctr r5
 	addi r6,r3,-1
 	addi r4,r4,-1
+#ifdef CONFIG_PPC64
 	.balign 16
+#endif
 1:	lbzu r0,1(r4)
 	cmpwi 0,r0,0
 	stbu r0,1(r6)
@@ -47,7 +49,9 @@ _GLOBAL(strncmp)
 	mtctr r5
 	addi r5,r3,-1
 	addi r4,r4,-1
+#ifdef CONFIG_PPC64
 	.balign 16
+#endif
 1:	lbzu r3,1(r5)
 	cmpwi 1,r3,0
 	lbzu r0,1(r4)
@@ -68,7 +72,9 @@ _GLOBAL(memchr)
 #endif
 	mtctr r5
 	addi r3,r3,-1
+#ifdef CONFIG_PPC64
 	.balign 16
+#endif
 1:	lbzu r0,1(r3)
 	cmpw 0,r0,r4
 	bdnzf 2,1b
commit 87a156fb18fe1 ("Align hot loops of some string functions")
degraded the performance of string functions by adding useless nops.

A simple benchmark on an 8xx calling memchr() 100000 times with a match
on the first byte runs in 41668 TB ticks before this patch and in 35986
TB ticks after this patch, an improvement of 5682 TB ticks (roughly 14%).

Another benchmark doing the same with a memchr() matching the 128th
byte runs in 1011365 TB ticks before this patch and 1005682 TB ticks
after this patch. So regardless of the number of loop iterations,
removing those useless nops improves the test by about 5683 TB ticks.

Fixes: 87a156fb18fe1 ("Align hot loops of some string functions")
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
 arch/powerpc/lib/string.S | 6 ++++++
 1 file changed, 6 insertions(+)
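
For readers who want to try a comparable measurement, here is a minimal userspace sketch of the kind of benchmark described in the commit message. It is not the author's test code, which is not included in the posting. It assumes a POSIX system and uses clock_gettime(CLOCK_MONOTONIC) in place of the PowerPC timebase, so the numbers it produces are nanoseconds rather than TB ticks. The names now_ns(), bench_memchr() and improvement_percent() are hypothetical helpers introduced only for this illustration.

```c
#define _POSIX_C_SOURCE 199309L

#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Monotonic time in nanoseconds; an assumed stand-in for the TB register. */
static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Call memchr() `iterations` times on a buffer whose only match is at
 * offset `match_pos`, and return the elapsed time in nanoseconds.
 */
static uint64_t bench_memchr(size_t match_pos, unsigned int iterations)
{
	static unsigned char buf[256];
	/* The volatile pointer keeps the compiler from folding the calls away. */
	void *(*volatile find)(const void *, int, size_t) = memchr;
	uint64_t start, end;
	unsigned int i, hits = 0;

	assert(match_pos < sizeof(buf));
	memset(buf, 0, sizeof(buf));
	buf[match_pos] = 0xff;

	start = now_ns();
	for (i = 0; i < iterations; i++)
		hits += (find(buf, 0xff, sizeof(buf)) == buf + match_pos);
	end = now_ns();

	/* Every call must have found the marker byte at the expected offset. */
	assert(hits == iterations);
	return end - start;
}

/* Relative improvement, in percent, between two measurements. */
static double improvement_percent(uint64_t before, uint64_t after)
{
	return 100.0 * (double)(before - after) / (double)before;
}
```

Calling bench_memchr(0, 100000) and bench_memchr(127, 100000) mirrors the two cases in the commit message (match on the first byte and on the 128th byte). Applied to the first-byte figures quoted there, improvement_percent(41668, 35986) evaluates to about 13.6.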