mbox series

[0/2] aarch64,falkor: memcpy/memmove performance improvements

Message ID 20180503175209.2943-1-siddhesh@sourceware.org
Headers show
Series aarch64,falkor: memcpy/memmove performance improvements | expand

Message

Siddhesh Poyarekar May 3, 2018, 5:52 p.m. UTC
Hi,

Here are a couple of patches to improve performance of the falkor memcpy
and memmove implementations based on testing on the latest hardware.
The theme of the optimization is to avoid trying to train the hardware
prefetcher for smaller sizes and in the loop tail since that just
mis-trains the prefetcher.  Instead, use multiple registers to aid
reordering wherever possible.  Testing showed that regressions in these
sizes compared to generic memcpy are resolved with this patch.

Siddhesh

Siddhesh Poyarekar (2):
  aarch64,falkor: Ignore prefetcher hints for memmove tail
  Ignore prefetcher tagging for smaller copies

 sysdeps/aarch64/multiarch/memcpy_falkor.S  | 68 ++++++++++++++++++------------
 sysdeps/aarch64/multiarch/memmove_falkor.S | 48 ++++++++++++---------
 2 files changed, 70 insertions(+), 46 deletions(-)