Message ID | 20160526083955.3f7deda4@kryten (mailing list archive) |
---|---|
State | Accepted |
On 26/05/2016 at 00:39, Anton Blanchard via Linuxppc-dev wrote:
> Align the hot loops in our assembly implementation of strncpy(),
> strncmp() and memchr().

Wouldn't it be better to add nops before the function entry in order to
get the hot loop aligned, instead of adding nops in the middle of the
function?

Christophe

>
> Signed-off-by: Anton Blanchard <anton@samba.org>
> ---
>
> Index: linux.junk/arch/powerpc/lib/string.S
> ===================================================================
> --- linux.junk.orig/arch/powerpc/lib/string.S
> +++ linux.junk/arch/powerpc/lib/string.S
> @@ -24,6 +24,7 @@ _GLOBAL(strncpy)
>  	mtctr	r5
>  	addi	r6,r3,-1
>  	addi	r4,r4,-1
> +	.balign 16
>  1:	lbzu	r0,1(r4)
>  	cmpwi	0,r0,0
>  	stbu	r0,1(r6)
> @@ -42,6 +43,7 @@ _GLOBAL(strncmp)
>  	mtctr	r5
>  	addi	r5,r3,-1
>  	addi	r4,r4,-1
> +	.balign 16
>  1:	lbzu	r3,1(r5)
>  	cmpwi	1,r3,0
>  	lbzu	r0,1(r4)
> @@ -73,6 +75,7 @@ _GLOBAL(memchr)
>  	beq-	2f
>  	mtctr	r5
>  	addi	r3,r3,-1
> +	.balign 16
>  1:	lbzu	r0,1(r3)
>  	cmpw	0,r0,r4
>  	bdnzf	2,1b
On Thu, May 26, 2016 at 09:24:51AM +0200, Christophe Leroy wrote:
> Wouldn't it be better to add nops before the function entry in order to
> get the hot loop aligned, instead of adding nops in the middle of the
> function?

Why would that be better?  The nops are executed once per function call
in either case, there are the same number of nops in either case, and
on most CPUs nops aren't actually executed anyway (they are decoded and
then thrown away).

Segher
On 26/05/2016 at 21:37, Segher Boessenkool wrote:
> On Thu, May 26, 2016 at 09:24:51AM +0200, Christophe Leroy wrote:
>> Wouldn't it be better to add nops before the function entry in order to
>> get the hot loop aligned, instead of adding nops in the middle of the
>> function?
> Why would that be better?  The nops are executed once per function call
> in either case, there are the same number of nops in either case, and
> on most CPUs nops aren't actually executed anyway (they are decoded and
> then thrown away).
>

The idea was to not execute them:

	.balign 16
	nop
	nop
_GLOBAL(strcpy)
	addi	r5,r3,-1
	addi	r4,r4,-1
1:	lbzu	r0,1(r4)
	cmpwi	0,r0,0
	stbu	r0,1(r5)
	bne	1b
	blr

Christophe
On Fri, May 27, 2016 at 07:45:18AM +0200, Christophe Leroy wrote:
> >>Wouldn't it be better to add nops before the function entry in order to
> >>get the hot loop aligned, instead of adding nops in the middle of the
> >>function?
> >Why would that be better?  The nops are executed once per function call
> >in either case, there are the same number of nops in either case, and
> >on most CPUs nops aren't actually executed anyway (they are decoded and
> >then thrown away).
> >
> The idea was to not execute them:
>
> 	.balign 16
> 	nop
> 	nop
> _GLOBAL(strcpy)
> 	addi	r5,r3,-1
> 	addi	r4,r4,-1
> 1:	lbzu	r0,1(r4)
> 	cmpwi	0,r0,0
> 	stbu	r0,1(r5)
> 	bne	1b
> 	blr

That performs _worse_ on most modern CPUs: the first decode group will
contain fewer instructions, so instructions become available for
execution later.  That's why functions are aligned in the first place!

Segher
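
To make the "same number of nops in either case" point concrete, here is a minimal C sketch of the padding arithmetic. It is only an illustration: the helper names and the model (a function whose loop label sits a fixed number of 4-byte instructions after the entry point) are assumptions made for this example and are not part of the patch or of the kernel.

```c
#include <assert.h>

enum { INSN_BYTES = 4 };  /* PowerPC instructions are a fixed 4 bytes */

/* Padding bytes a `.balign align` directive emits when the location
 * counter is at byte offset `offset`. */
unsigned balign_pad(unsigned offset, unsigned align)
{
    return (align - offset % align) % align;
}

/* Layout from the patch: the entry point is aligned and `.balign`
 * sits directly in front of the loop label. */
unsigned nops_before_loop(unsigned prologue_insns, unsigned align)
{
    assert(align % INSN_BYTES == 0);
    return balign_pad(prologue_insns * INSN_BYTES, align) / INSN_BYTES;
}

/* Alternative layout: nops follow `.balign` but precede the entry
 * point.  Search for the smallest nop count that puts the loop label
 * on an `align`-byte boundary. */
unsigned nops_before_entry(unsigned prologue_insns, unsigned align)
{
    assert(align % INSN_BYTES == 0);
    for (unsigned n = 0; n < align / INSN_BYTES; n++) {
        unsigned loop_offset = (n + prologue_insns) * INSN_BYTES;
        if (loop_offset % align == 0)
            return n;
    }
    return 0;  /* not reached when align is a multiple of INSN_BYTES */
}
```

For the two-instruction strcpy prologue in Christophe's example, both helpers return 2, matching the two nops he wrote. The count is the same in both layouts; what differs is whether the entry point itself stays on a 16-byte boundary, which is the property Segher argues matters for decode.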
On Wed, 2016-25-05 at 22:39:55 UTC, Unknown sender due to SPF wrote:
> Align the hot loops in our assembly implementation of strncpy(),
> strncmp() and memchr().
>
> Signed-off-by: Anton Blanchard <anton@samba.org>

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/87a156fb18fe15d012c3db506b

cheers
Align the hot loops in our assembly implementation of strncpy(),
strncmp() and memchr().

Signed-off-by: Anton Blanchard <anton@samba.org>
---

Index: linux.junk/arch/powerpc/lib/string.S
===================================================================
--- linux.junk.orig/arch/powerpc/lib/string.S
+++ linux.junk/arch/powerpc/lib/string.S
@@ -24,6 +24,7 @@ _GLOBAL(strncpy)
 	mtctr	r5
 	addi	r6,r3,-1
 	addi	r4,r4,-1
+	.balign 16
 1:	lbzu	r0,1(r4)
 	cmpwi	0,r0,0
 	stbu	r0,1(r6)
@@ -42,6 +43,7 @@ _GLOBAL(strncmp)
 	mtctr	r5
 	addi	r5,r3,-1
 	addi	r4,r4,-1
+	.balign 16
 1:	lbzu	r3,1(r5)
 	cmpwi	1,r3,0
 	lbzu	r0,1(r4)
@@ -73,6 +75,7 @@ _GLOBAL(memchr)
 	beq-	2f
 	mtctr	r5
 	addi	r3,r3,-1
+	.balign 16
 1:	lbzu	r0,1(r3)
 	cmpw	0,r0,r4
 	bdnzf	2,1b
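
For readers less familiar with the mnemonics, the memchr() hot loop in the last hunk can be read roughly as the C below. This is a hedged sketch, not the kernel's code: the name `memchr_sketch` is made up for illustration, the cast of `c` to `unsigned char` follows standard memchr() semantics rather than the `cmpw` instruction literally, and the return path after the loop is not visible in the hunk, so returning NULL on no match is an assumption.

```c
#include <stddef.h>

/* Illustrative C reading of the three-instruction loop:
 *   lbzu  r0,1(r3)   load the next byte, advancing the pointer
 *   cmpw  0,r0,r4    compare it with the byte being searched for
 *   bdnzf 2,1b       decrement the count; loop while count != 0
 *                    and the compare was not equal
 */
const void *memchr_sketch(const void *s, int c, size_t n)
{
    const unsigned char *p = s;

    /* A zero count skips the loop entirely (the `beq- 2f` above it). */
    for (size_t i = 0; i < n; i++) {
        if (p[i] == (unsigned char)c)
            return p + i;
    }
    return NULL;  /* assumed: the tail of the function is not shown */
}
```

Each iteration is one load, one compare and one branch, so the position of those three instructions relative to a 16-byte boundary is the thing the patch adjusts.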