Message ID | DB8PR08MB503633698FAA56D1D8E3AF35835A0@DB8PR08MB5036.eurprd08.prod.outlook.com |
---|---|
State | New |
Series | AArch64: Improve backwards memmove performance |
On 20/08/2020 08:46, Wilco Dijkstra wrote:
> On some microarchitectures performance of the backwards memmove improves if
> the stores use STR with decreasing addresses. So change the memmove loop
> in memcpy_advsimd.S to use 2x STR rather than STP.
>
> Passes GLIBC regression test, OK for commit?

LGTM, thanks. Does it make any difference to use the same strategy on the
last iteration at L(copy64_from_start) as well?

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>

>
> ---
> diff --git a/sysdeps/aarch64/multiarch/memcpy_advsimd.S b/sysdeps/aarch64/multiarch/memcpy_advsimd.S
> index d4ba74777744c8bb5a83e43ab2d63ad8dab35203..48bb6d7ca425197907eaef2307fb3939e69baa15 100644
> --- a/sysdeps/aarch64/multiarch/memcpy_advsimd.S
> +++ b/sysdeps/aarch64/multiarch/memcpy_advsimd.S
> @@ -223,12 +223,13 @@ L(copy_long_backwards):
>  	b.ls	L(copy64_from_start)
>
>  L(loop64_backwards):
> -	stp	A_q, B_q, [dstend, -32]
> +	str	B_q, [dstend, -16]
> +	str	A_q, [dstend, -32]
>  	ldp	A_q, B_q, [srcend, -96]
> -	stp	C_q, D_q, [dstend, -64]
> +	str	D_q, [dstend, -48]
> +	str	C_q, [dstend, -64]!
>  	ldp	C_q, D_q, [srcend, -128]
>  	sub	srcend, srcend, 64
> -	sub	dstend, dstend, 64
>  	subs	count, count, 64
>  	b.hi	L(loop64_backwards)
>
diff --git a/sysdeps/aarch64/multiarch/memcpy_advsimd.S b/sysdeps/aarch64/multiarch/memcpy_advsimd.S
index d4ba74777744c8bb5a83e43ab2d63ad8dab35203..48bb6d7ca425197907eaef2307fb3939e69baa15 100644
--- a/sysdeps/aarch64/multiarch/memcpy_advsimd.S
+++ b/sysdeps/aarch64/multiarch/memcpy_advsimd.S
@@ -223,12 +223,13 @@ L(copy_long_backwards):
 	b.ls	L(copy64_from_start)
 
 L(loop64_backwards):
-	stp	A_q, B_q, [dstend, -32]
+	str	B_q, [dstend, -16]
+	str	A_q, [dstend, -32]
 	ldp	A_q, B_q, [srcend, -96]
-	stp	C_q, D_q, [dstend, -64]
+	str	D_q, [dstend, -48]
+	str	C_q, [dstend, -64]!
 	ldp	C_q, D_q, [srcend, -128]
 	sub	srcend, srcend, 64
-	sub	dstend, dstend, 64
 	subs	count, count, 64
 	b.hi	L(loop64_backwards)
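
For readers less familiar with AArch64 addressing modes, the store pattern in the patched loop can be modelled in C. This is only a rough sketch of the addressing, not the glibc code: the names copy_backwards_model and store16 are hypothetical, the 64-byte temporary stands in for the A_q..D_q registers, and the software pipelining of the real loop (loads issued one iteration ahead of the stores) is not reproduced. It assumes count is a multiple of 64 and that the destination lies above the source. The point it illustrates is that the four 16-byte stores go to strictly decreasing addresses, and that the pre-index writeback on the last store ([dstend, -64]!) does the job of the removed "sub dstend, dstend, 64".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a single 16-byte Q-register store (STR). */
static void store16(unsigned char *dst, const unsigned char *src)
{
    memcpy(dst, src, 16);
}

/* Model of the patched loop's addressing only.
   Assumptions: count is a multiple of 64, and dstend > srcend
   (the overlapping case that requires a backwards copy). */
static void copy_backwards_model(unsigned char *dstend,
                                 const unsigned char *srcend, size_t count)
{
    unsigned char block[64];   /* models C_q, D_q, A_q, B_q in address order */

    while (count >= 64) {
        /* Read the whole 64-byte block before storing (stands in for the LDPs). */
        memcpy(block, srcend - 64, 64);

        /* Four single stores, highest address first. */
        store16(dstend - 16, block + 48);   /* str B_q, [dstend, -16]  */
        store16(dstend - 32, block + 32);   /* str A_q, [dstend, -32]  */
        store16(dstend - 48, block + 16);   /* str D_q, [dstend, -48]  */

        /* Pre-index writeback: dstend is updated first, then used as the
           store address, so no separate "sub dstend" is needed. */
        dstend -= 64;
        store16(dstend, block);             /* str C_q, [dstend, -64]! */

        srcend -= 64;
        count -= 64;
    }
}
```

Calling copy_backwards_model(buf + 40 + 192, buf + 192, 192) on a 256-byte buffer should leave the same contents as memmove(buf + 40, buf, 192). Whether the descending order is actually faster depends on the microarchitecture, as the commit message notes; the model says nothing about performance.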