[2/2] powerpc: Align hot loops of some string functions

Message ID 20160526083955.3f7deda4@kryten (mailing list archive)
State Accepted
Commit Message

Unknown sender due to SPF May 25, 2016, 10:39 p.m. UTC
Align the hot loops in our assembly implementation of strncpy(),
strncmp() and memchr().

Signed-off-by: Anton Blanchard <anton@samba.org>
---

Comments

Christophe Leroy May 26, 2016, 7:24 a.m. UTC | #1
On 26/05/2016 at 00:39, Anton Blanchard via Linuxppc-dev wrote:
> Align the hot loops in our assembly implementation of strncpy(),
> strncmp() and memchr().
Wouldn't it be better to add nops before the function entry in order to 
get the hot loop aligned, instead of adding nops in the middle of the 
function?

Christophe
>
> Signed-off-by: Anton Blanchard <anton@samba.org>
> ---
>
> Index: linux.junk/arch/powerpc/lib/string.S
> ===================================================================
> --- linux.junk.orig/arch/powerpc/lib/string.S
> +++ linux.junk/arch/powerpc/lib/string.S
> @@ -24,6 +24,7 @@ _GLOBAL(strncpy)
>   	mtctr	r5
>   	addi	r6,r3,-1
>   	addi	r4,r4,-1
> +	.balign 16
>   1:	lbzu	r0,1(r4)
>   	cmpwi	0,r0,0
>   	stbu	r0,1(r6)
> @@ -42,6 +43,7 @@ _GLOBAL(strncmp)
>   	mtctr	r5
>   	addi	r5,r3,-1
>   	addi	r4,r4,-1
> +	.balign 16
>   1:	lbzu	r3,1(r5)
>   	cmpwi	1,r3,0
>   	lbzu	r0,1(r4)
> @@ -73,6 +75,7 @@ _GLOBAL(memchr)
>   	beq-	2f
>   	mtctr	r5
>   	addi	r3,r3,-1
> +	.balign 16
>   1:	lbzu	r0,1(r3)
>   	cmpw	0,r0,r4
>   	bdnzf	2,1b
Segher Boessenkool May 26, 2016, 7:37 p.m. UTC | #2
On Thu, May 26, 2016 at 09:24:51AM +0200, Christophe Leroy wrote:
> Wouldn't it be better to add nops before the function entry in order to 
> get the hot loop aligned, instead of adding nops in the middle of the 
> function?

Why would that be better?  The nops are executed once per function call
in either case, there are the same number of nops in either case, and
on most CPUs nops aren't actually executed anyway (they are decoded and
then thrown away).


Segher
Christophe Leroy May 27, 2016, 5:45 a.m. UTC | #3
On 26/05/2016 at 21:37, Segher Boessenkool wrote:
> On Thu, May 26, 2016 at 09:24:51AM +0200, Christophe Leroy wrote:
>> Wouldn't it be better to add nops before the function entry in order to
>> get the hot loop aligned, instead of adding nops in the middle of the
>> function?
> Why would that be better?  The nops are executed once per function call
> in either case, there are the same number of nops in either case, and
> on most CPUs nops aren't actually executed anyway (they are decoded and
> then thrown away).
>
The idea was to not execute them:

	.balign 16
	nop
	nop
_GLOBAL(strcpy)
	addi	r5,r3,-1
	addi	r4,r4,-1
1:	lbzu	r0,1(r4)
	cmpwi	0,r0,0
	stbu	r0,1(r5)
	bne	1b
	blr

Christophe
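
The padding arithmetic behind the two placements can be checked with a small C sketch. It is illustrative only and not part of the patch: the helper names are hypothetical, and it assumes 4-byte instructions and a function entry that would otherwise sit on a 16-byte boundary. For the two-instruction strcpy prologue in the example above, both placements need the same two nops; only their position relative to the entry point differs.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: every PowerPC instruction, including nop, is 4 bytes long. */
enum { INSN_BYTES = 4 };

/* Bytes of padding that ".balign align" emits at the given byte offset. */
static uint32_t balign_pad_bytes(uint32_t offset, uint32_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0); /* power of two */
	return (align - (offset & (align - 1))) & (align - 1);
}

/*
 * nops emitted when ".balign 16" sits directly in front of the loop label.
 * entry is the byte address of the function entry, prologue_insns the
 * number of instructions between the entry and the loop label.
 */
static uint32_t nops_before_loop(uint32_t entry, uint32_t prologue_insns)
{
	uint32_t loop_offset = entry + prologue_insns * INSN_BYTES;

	return balign_pad_bytes(loop_offset, 16) / INSN_BYTES;
}

/*
 * nops needed after an aligned boundary but ahead of the entry point so
 * that (entry + prologue) lands on a 16-byte boundary.
 */
static uint32_t nops_before_entry(uint32_t prologue_insns)
{
	return balign_pad_bytes(prologue_insns * INSN_BYTES, 16) / INSN_BYTES;
}
```

With a 16-byte-aligned entry, nops_before_loop(0, 2) and nops_before_entry(2) both evaluate to 2, matching the two nops in the example.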
Segher Boessenkool May 27, 2016, 6:26 a.m. UTC | #4
On Fri, May 27, 2016 at 07:45:18AM +0200, Christophe Leroy wrote:
> >>Wouldn't it be better to add nops before the function entry in order to
> >>get the hot loop aligned, instead of adding nops in the middle of the
> >>function?
> >Why would that be better?  The nops are executed once per function call
> >in either case, there are the same number of nops in either case, and
> >on most CPUs nops aren't actually executed anyway (they are decoded and
> >then thrown away).
> >
> The idea was to not execute them:
> 
> 	.balign 16
> 	nop
> 	nop
> _GLOBAL(strcpy)
> 	addi	r5,r3,-1
> 	addi	r4,r4,-1
> 1:	lbzu	r0,1(r4)
> 	cmpwi	0,r0,0
> 	stbu	r0,1(r5)
> 	bne	1b
> 	blr

That performs _worse_ on most modern CPUs (the first decode will decode
less, so instructions are available for execution later).  That's why
functions are aligned in the first place!


Segher
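
A rough way to picture the "first decode will decode less" point is to count how many instructions fit between the entry address and the end of its fetch block. The C sketch below is a deliberately simplified model, not a description of any particular core: the 16-byte aligned fetch block, the 4-byte instruction size, and the helper name are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumptions for illustration only: 4-byte instructions and an aligned
 * 16-byte fetch block. Real cores use different widths.
 */
enum { INSN_BYTES = 4, FETCH_BYTES = 16 };

/*
 * Number of instructions available from the first fetch block when
 * execution enters at the given byte address.
 */
static uint32_t insns_in_first_fetch(uint32_t entry)
{
	assert(entry % INSN_BYTES == 0); /* instructions are word aligned */
	return (FETCH_BYTES - (entry % FETCH_BYTES)) / INSN_BYTES;
}
```

Under this model an aligned entry yields four instructions from the first block, while an entry shifted by two nops (offset 8) yields only two.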
Michael Ellerman June 15, 2016, 12:39 p.m. UTC | #5
On Wed, 2016-25-05 at 22:39:55 UTC, Unknown sender due to SPF wrote:
> Align the hot loops in our assembly implementation of strncpy(),
> strncmp() and memchr().
> 
> Signed-off-by: Anton Blanchard <anton@samba.org>

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/87a156fb18fe15d012c3db506b

cheers
Patch

Index: linux.junk/arch/powerpc/lib/string.S
===================================================================
--- linux.junk.orig/arch/powerpc/lib/string.S
+++ linux.junk/arch/powerpc/lib/string.S
@@ -24,6 +24,7 @@  _GLOBAL(strncpy)
 	mtctr	r5
 	addi	r6,r3,-1
 	addi	r4,r4,-1
+	.balign 16
 1:	lbzu	r0,1(r4)
 	cmpwi	0,r0,0
 	stbu	r0,1(r6)
@@ -42,6 +43,7 @@  _GLOBAL(strncmp)
 	mtctr	r5
 	addi	r5,r3,-1
 	addi	r4,r4,-1
+	.balign 16
 1:	lbzu	r3,1(r5)
 	cmpwi	1,r3,0
 	lbzu	r0,1(r4)
@@ -73,6 +75,7 @@  _GLOBAL(memchr)
 	beq-	2f
 	mtctr	r5
 	addi	r3,r3,-1
+	.balign 16
 1:	lbzu	r0,1(r3)
 	cmpw	0,r0,r4
 	bdnzf	2,1b