Message ID | 484bcfaccc1ec3d91b74aeaaa26a0ae66fe0955a.1527160868.git.christophe.leroy@c-s.fr (mailing list archive) |
---|---|
State | Accepted |
Commit | 373e098e1e788d7b89ec0f31765a6c08e2ea0f7c |
Series | powerpc/32: Optimise __csum_partial() |
On Thu, May 24, 2018 at 11:22:27AM +0000, Christophe Leroy wrote:
> Improve __csum_partial by interleaving loads and adds.
>
> On a 8xx, it brings neither improvement nor degradation.
> On a 83xx, it brings a 25% improvement.

Thanks! Looks fine to me.

> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>

Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org>

> ---
>  arch/powerpc/lib/checksum_32.S | 13 +++++++++++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/lib/checksum_32.S b/arch/powerpc/lib/checksum_32.S
> index d2238ea82209..aa224069f93a 100644
> --- a/arch/powerpc/lib/checksum_32.S
> +++ b/arch/powerpc/lib/checksum_32.S
> @@ -47,16 +47,25 @@ _GLOBAL(__csum_partial)
>  	bdnz	2b
>  21:	srwi.	r6,r4,4		/* # blocks of 4 words to do */
>  	beq	3f
> +	lwz	r0,4(r3)
>  	mtctr	r6
> -22:	lwz	r0,4(r3)
>  	lwz	r6,8(r3)
> +	adde	r5,r5,r0
>  	lwz	r7,12(r3)
> +	adde	r5,r5,r6
>  	lwzu	r8,16(r3)
> +	adde	r5,r5,r7
> +	bdz	23f
> +22:	lwz	r0,4(r3)
> +	adde	r5,r5,r8
> +	lwz	r6,8(r3)
>  	adde	r5,r5,r0
> +	lwz	r7,12(r3)
>  	adde	r5,r5,r6
> +	lwzu	r8,16(r3)
>  	adde	r5,r5,r7
> -	adde	r5,r5,r8
>  	bdnz	22b
> +23:	adde	r5,r5,r8
>  3:	andi.	r0,r4,2
>  	beq+	4f
>  	lhz	r0,4(r3)
> --
> 2.13.3
On Thu, 2018-05-24 at 11:22:27 UTC, Christophe Leroy wrote:
> Improve __csum_partial by interleaving loads and adds.
>
> On a 8xx, it brings neither improvement nor degradation.
> On a 83xx, it brings a 25% improvement.
>
> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
> Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org>

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/373e098e1e788d7b89ec0f31765a6c

cheers
diff --git a/arch/powerpc/lib/checksum_32.S b/arch/powerpc/lib/checksum_32.S
index d2238ea82209..aa224069f93a 100644
--- a/arch/powerpc/lib/checksum_32.S
+++ b/arch/powerpc/lib/checksum_32.S
@@ -47,16 +47,25 @@ _GLOBAL(__csum_partial)
 	bdnz	2b
 21:	srwi.	r6,r4,4		/* # blocks of 4 words to do */
 	beq	3f
+	lwz	r0,4(r3)
 	mtctr	r6
-22:	lwz	r0,4(r3)
 	lwz	r6,8(r3)
+	adde	r5,r5,r0
 	lwz	r7,12(r3)
+	adde	r5,r5,r6
 	lwzu	r8,16(r3)
+	adde	r5,r5,r7
+	bdz	23f
+22:	lwz	r0,4(r3)
+	adde	r5,r5,r8
+	lwz	r6,8(r3)
 	adde	r5,r5,r0
+	lwz	r7,12(r3)
 	adde	r5,r5,r6
+	lwzu	r8,16(r3)
 	adde	r5,r5,r7
-	adde	r5,r5,r8
 	bdnz	22b
+23:	adde	r5,r5,r8
 3:	andi.	r0,r4,2
 	beq+	4f
 	lhz	r0,4(r3)
Improve __csum_partial by interleaving loads and adds.

On a 8xx, it brings neither improvement nor degradation.
On a 83xx, it brings a 25% improvement.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
 arch/powerpc/lib/checksum_32.S | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)
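
For readers less familiar with PowerPC assembly, the reordering in this patch can be modelled roughly in C. This is only an illustrative sketch, not kernel code: the names `adde`, `csum_blocks_grouped` and `csum_blocks_interleaved` are hypothetical, the carry bit is assumed to start at zero, and the final carry is folded inside each function for self-containment. The first function mirrors the old loop (four loads, then four adds per block); the second mirrors the patched loop, where each add is paired with the next load and the add of the block's last word is deferred to the next iteration or to the exit path (label 23 in the patch). Both perform the adds in the same order, so they should return identical results.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of the PowerPC "adde" instruction:
 * sum = sum + x + CA, with CA updated to the carry out. */
static void adde(uint32_t *sum, uint32_t x, unsigned *ca)
{
    uint64_t t = (uint64_t)*sum + x + *ca;
    *sum = (uint32_t)t;
    *ca = (unsigned)(t >> 32);
}

/* Old loop shape: load four words, then add four words. */
uint32_t csum_blocks_grouped(const uint32_t *p, size_t nblocks, uint32_t sum)
{
    unsigned ca = 0;              /* assumption: carry is clear on entry */
    assert(p != NULL || nblocks == 0);
    while (nblocks--) {
        uint32_t a = p[0], b = p[1], c = p[2], d = p[3];
        p += 4;
        adde(&sum, a, &ca);
        adde(&sum, b, &ca);
        adde(&sum, c, &ca);
        adde(&sum, d, &ca);
    }
    adde(&sum, 0, &ca);           /* fold the last carry back in */
    return sum;
}

/* Patched loop shape: each add sits between two loads, and the add of
 * the fourth word is carried over into the next iteration. */
uint32_t csum_blocks_interleaved(const uint32_t *p, size_t nblocks, uint32_t sum)
{
    unsigned ca = 0;              /* assumption: carry is clear on entry */
    uint32_t a, b, c, d;
    assert(p != NULL || nblocks == 0);
    if (nblocks == 0)             /* the assembly skips this section (beq 3f) */
        return sum;
    a = p[0];                     /* prologue: first block */
    b = p[1]; adde(&sum, a, &ca);
    c = p[2]; adde(&sum, b, &ca);
    d = p[3]; adde(&sum, c, &ca);
    p += 4;
    while (--nblocks) {           /* remaining blocks (label 22) */
        a = p[0]; adde(&sum, d, &ca);
        b = p[1]; adde(&sum, a, &ca);
        c = p[2]; adde(&sum, b, &ca);
        d = p[3]; adde(&sum, c, &ca);
        p += 4;
    }
    adde(&sum, d, &ca);           /* epilogue: pending fourth word (label 23) */
    adde(&sum, 0, &ca);           /* fold the last carry back in */
    return sum;
}
```

A C compiler is free to reschedule these loads and adds itself, so the model documents only the instruction ordering and says nothing about the performance figures quoted in the commit message.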