diff mbox series

[v13,7/7] riscv: Add and use alignment-ignorant memcpy

Message ID 20240227225644.724901-8-evan@rivosinc.com
State New
Headers show
Series RISC-V: ifunced memcpy using new kernel hwprobe interface | expand

Commit Message

Evan Green Feb. 27, 2024, 10:56 p.m. UTC
For CPU implementations that can perform unaligned accesses with little
or no performance penalty, create a memcpy implementation that does not
bother aligning buffers. It will use a block of integer registers, a
single integer register, and fall back to bytewise copy for the
remainder.

Signed-off-by: Evan Green <evan@rivosinc.com>
Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com>

---

(no changes since v11)

Changes in v11:
 - Update copyrights to 2024 (Adhemerval)
 - Avoid implicit check in select_memcpy_ifunc (Adhemerval)
 - Remove hidden_def (__memcpy_noalignment) (Adhemerval)

Changes in v10:
 - One line per function in Makefile for memcpy (Adhemerval)
 - Space before argument-like things (Adhemerval)

Changes in v7:
 - Use new helper function in memcpy ifunc selector (Richard)

Changes in v6:
 - Fix a couple regressions in the assembly from v5 :/
 - Use passed hwprobe pointer in memcpy ifunc selector.

Changes in v5:
 - Do unaligned word access for final trailing bytes (Richard)

Changes in v4:
 - Fixed comment style (Florian)

Changes in v3:
 - Word align dest for large memcpy()s.
 - Add tags
 - Remove spurious blank line from sysdeps/riscv/memcpy.c

Changes in v2:
 - Used _MASK instead of _FAST value itself.


---
 sysdeps/riscv/memcopy.h                       |  26 ++++
 sysdeps/riscv/memcpy.c                        |  63 ++++++++
 sysdeps/riscv/memcpy_noalignment.S            | 136 ++++++++++++++++++
 sysdeps/unix/sysv/linux/riscv/Makefile        |   9 ++
 .../unix/sysv/linux/riscv/memcpy-generic.c    |  24 ++++
 5 files changed, 258 insertions(+)
 create mode 100644 sysdeps/riscv/memcopy.h
 create mode 100644 sysdeps/riscv/memcpy.c
 create mode 100644 sysdeps/riscv/memcpy_noalignment.S
 create mode 100644 sysdeps/unix/sysv/linux/riscv/memcpy-generic.c

Comments

Andreas Schwab March 2, 2024, 10:33 a.m. UTC | #1
On Feb 27 2024, Evan Green wrote:

> For CPU implementations that can perform unaligned accesses with little
> or no performance penalty, create a memcpy implementation that does not
> bother aligning buffers. It will use a block of integer registers, a
> single integer register, and fall back to bytewise copy for the
> remainder.

How has that been tested?  It causes memory corruption.
Andreas Schwab March 2, 2024, 1:52 p.m. UTC | #2
FAIL: string/test-bcopy
FAIL: string/test-memccpy
FAIL: string/test-memcmp
FAIL: string/test-memcmpeq
FAIL: string/test-memcpy
FAIL: string/test-memcpy-large
FAIL: string/test-memmove
FAIL: string/test-mempcpy
FAIL: string/test-stpncpy
FAIL: string/test-strcasecmp
FAIL: string/test-strcat
FAIL: string/test-strcmp
FAIL: string/test-strncasecmp
FAIL: string/test-strncat
FAIL: string/test-strncmp
FAIL: string/test-strncpy
FAIL: string/tst-strlcat2
FAIL: string/tst-strlcpy2
FAIL: wcsmbs/test-wcpcpy
FAIL: wcsmbs/test-wcpncpy
FAIL: wcsmbs/test-wcscat
FAIL: wcsmbs/test-wcscmp
FAIL: wcsmbs/test-wcscpy
FAIL: wcsmbs/test-wcsncat
FAIL: wcsmbs/test-wcsncmp
FAIL: wcsmbs/test-wcsncpy
FAIL: wcsmbs/test-wmemcmp
FAIL: wcsmbs/tst-btowc
FAIL: wcsmbs/tst-wcslcat2
FAIL: wcsmbs/tst-wcslcpy2
FAIL: wcsmbs/tst-wcstod-nan-locale
FAIL: wcsmbs/tst-wcstol-locale
FAIL: wcsmbs/wcsatcliff
Adhemerval Zanella Netto March 4, 2024, 6:30 p.m. UTC | #3
On 02/03/24 07:33, Andreas Schwab wrote:
> On Feb 27 2024, Evan Green wrote:
> 
>> For CPU implementations that can perform unaligned accesses with little
>> or no performance penalty, create a memcpy implementation that does not
>> bother aligning buffers. It will use a block of integer registers, a
>> single integer register, and fall back to bytewise copy for the
>> remainder.
> 
> How has that been tested?  It causes memory corruption.
> 

The memcpy optimization has multiple issues:

 1. The implementation is wrong: the chunk size calculation is wrong
 leading to invalid memory access.

 2. It adds ifunc supports as default, so --disable-multi-arch does
 not work as expected for riscv.

 3. It mixes Linux files (memcpy ifunc selection which requires the
 vDSO/syscall mechanism) with generic support (the memcpy
 optimization itself).

 4. There is no __libc_ifunc_impl_list, which makes testing only
 check the selected implementation instead of all supported
 by the system.

The 2., 3., and 4. are not really a problem since they came most likely
from code base inexperience. However, the 1. is *really* a red
flag since it means that you did not run a 'make check' to certify
no regressions were found.

I could trigger the same failures on qemu-system/qemu-user by forcing the 
implementation to be used as default, even if your hardware does have 
RISCV_HWPROBE_MISALIGNED_FAST, you could also check for correctness.

I sent a newer version that [1] should fix all the issues I have found. 
The C implementation is also easier to follow and it should generate 
code as good as the assembly version, so please check if on real 
hardware (I have only tested on qemu). As per the previous discussion,
I would like to use this as a base for a generic implementation; to 
avoid each architecture to reimplement the same code (as I did for some 
string implementations).

[1] https://patchwork.sourceware.org/project/glibc/patch/20240304175902.1479562-1-adhemerval.zanella@linaro.org/
Evan Green March 4, 2024, 7:35 p.m. UTC | #4
On Mon, Mar 4, 2024 at 10:30 AM Adhemerval Zanella Netto
<adhemerval.zanella@linaro.org> wrote:
>
>
>
> On 02/03/24 07:33, Andreas Schwab wrote:
> > On Feb 27 2024, Evan Green wrote:
> >
> >> For CPU implementations that can perform unaligned accesses with little
> >> or no performance penalty, create a memcpy implementation that does not
> >> bother aligning buffers. It will use a block of integer registers, a
> >> single integer register, and fall back to bytewise copy for the
> >> remainder.
> >
> > How has that been tested?  It causes memory corruption.
> >
>
> The memcpy optimization has multiple issues:
>
>  1. The implementation is wrong: the chunk size calculation is wrong
>  leading to invalid memory access.
>
>  2. It adds ifunc supports as default, so --disable-multi-arch does
>  not work as expected for riscv.
>
>  3. It mixes Linux files (memcpy ifunc selection which requires the
>  vDSO/syscall mechanism) with generic support (the memcpy
>  optimization itself).
>
>  4. There is no __libc_ifunc_impl_list, which makes testing only
>  check the selected implementation instead of all supported
>  by the system.
>
> The 2., 3., and 4. are not really a problem since they came most likely
> from code base inexperience. However, the 1. is *really* a red
> flag since it means that you did not run a 'make check' to certify
> no regressions were found.

This is my bad. My testing for this series involved boot testing, plus
some tests I had written specifically for this, as well as
build-many-glibcs. Make check was not on my radar, and clearly should
have been. Thanks for coming in with a quick fix. I'll take a look at
what you've done and holler if I see any issues.
-Evan
diff mbox series

Patch

diff --git a/sysdeps/riscv/memcopy.h b/sysdeps/riscv/memcopy.h
new file mode 100644
index 0000000000..27675964b0
--- /dev/null
+++ b/sysdeps/riscv/memcopy.h
@@ -0,0 +1,26 @@ 
+/* memcopy.h -- definitions for memory copy functions. RISC-V version.
+   Copyright (C) 2024 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdeps/generic/memcopy.h>
+
+/* Redefine the generic memcpy implementation to __memcpy_generic, so
+   the memcpy ifunc can select between generic and special versions.
+   In rtld, don't bother with all the ifunciness. */
+#if IS_IN (libc)
+#define MEMCPY __memcpy_generic
+#endif
diff --git a/sysdeps/riscv/memcpy.c b/sysdeps/riscv/memcpy.c
new file mode 100644
index 0000000000..20f9548c44
--- /dev/null
+++ b/sysdeps/riscv/memcpy.c
@@ -0,0 +1,63 @@ 
+/* Multiple versions of memcpy.
+   All versions must be listed in ifunc-impl-list.c.
+   Copyright (C) 2017-2024 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#if IS_IN (libc)
+/* Redefine memcpy so that the compiler won't complain about the type
+   mismatch with the IFUNC selector in strong_alias, below.  */
+# undef memcpy
+# define memcpy __redirect_memcpy
+# include <stdint.h>
+# include <string.h>
+# include <ifunc-init.h>
+# include <riscv-ifunc.h>
+# include <sys/hwprobe.h>
+
+# define INIT_ARCH()
+
+extern __typeof (__redirect_memcpy) __libc_memcpy;
+
+extern __typeof (__redirect_memcpy) __memcpy_generic attribute_hidden;
+extern __typeof (__redirect_memcpy) __memcpy_noalignment attribute_hidden;
+
+static inline __typeof (__redirect_memcpy) *
+select_memcpy_ifunc (uint64_t dl_hwcap, __riscv_hwprobe_t hwprobe_func)
+{
+  unsigned long long int value;
+
+  INIT_ARCH ();
+
+  if (__riscv_hwprobe_one (hwprobe_func, RISCV_HWPROBE_KEY_CPUPERF_0, &value) != 0)
+    return __memcpy_generic;
+
+  if ((value & RISCV_HWPROBE_MISALIGNED_MASK) == RISCV_HWPROBE_MISALIGNED_FAST)
+    return __memcpy_noalignment;
+
+  return __memcpy_generic;
+}
+
+riscv_libc_ifunc (__libc_memcpy, select_memcpy_ifunc);
+
+# undef memcpy
+strong_alias (__libc_memcpy, memcpy);
+# ifdef SHARED
+__hidden_ver1 (memcpy, __GI_memcpy, __redirect_memcpy)
+  __attribute__ ((visibility ("hidden"))) __attribute_copy__ (memcpy);
+# endif
+
+#endif
diff --git a/sysdeps/riscv/memcpy_noalignment.S b/sysdeps/riscv/memcpy_noalignment.S
new file mode 100644
index 0000000000..621f8d028f
--- /dev/null
+++ b/sysdeps/riscv/memcpy_noalignment.S
@@ -0,0 +1,136 @@ 
+/* memcpy for RISC-V, ignoring buffer alignment
+   Copyright (C) 2024 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library.  If not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <sysdep.h>
+#include <sys/asm.h>
+
+/* void *memcpy(void *, const void *, size_t) */
+ENTRY (__memcpy_noalignment)
+	move t6, a0  /* Preserve return value */
+
+	/* Bail if 0 */
+	beqz a2, 7f
+
+	/* Jump to byte copy if size < SZREG */
+	li a4, SZREG
+	bltu a2, a4, 5f
+
+	/* Round down to the nearest "page" size */
+	andi a4, a2, ~((16*SZREG)-1)
+	beqz a4, 2f
+	add a3, a1, a4
+
+	/* Copy the first word to get dest word aligned */
+	andi a5, t6, SZREG-1
+	beqz a5, 1f
+	REG_L a6, (a1)
+	REG_S a6, (t6)
+
+	/* Align dst up to a word, move src and size as well. */
+	addi t6, t6, SZREG-1
+	andi t6, t6, ~(SZREG-1)
+	sub a5, t6, a0
+	add a1, a1, a5
+	sub a2, a2, a5
+
+	/* Recompute page count */
+	andi a4, a2, ~((16*SZREG)-1)
+	beqz a4, 2f
+
+1:
+	/* Copy "pages" (chunks of 16 registers) */
+	REG_L a4,       0(a1)
+	REG_L a5,   SZREG(a1)
+	REG_L a6, 2*SZREG(a1)
+	REG_L a7, 3*SZREG(a1)
+	REG_L t0, 4*SZREG(a1)
+	REG_L t1, 5*SZREG(a1)
+	REG_L t2, 6*SZREG(a1)
+	REG_L t3, 7*SZREG(a1)
+	REG_L t4, 8*SZREG(a1)
+	REG_L t5, 9*SZREG(a1)
+	REG_S a4,       0(t6)
+	REG_S a5,   SZREG(t6)
+	REG_S a6, 2*SZREG(t6)
+	REG_S a7, 3*SZREG(t6)
+	REG_S t0, 4*SZREG(t6)
+	REG_S t1, 5*SZREG(t6)
+	REG_S t2, 6*SZREG(t6)
+	REG_S t3, 7*SZREG(t6)
+	REG_S t4, 8*SZREG(t6)
+	REG_S t5, 9*SZREG(t6)
+	REG_L a4, 10*SZREG(a1)
+	REG_L a5, 11*SZREG(a1)
+	REG_L a6, 12*SZREG(a1)
+	REG_L a7, 13*SZREG(a1)
+	REG_L t0, 14*SZREG(a1)
+	REG_L t1, 15*SZREG(a1)
+	addi a1, a1, 16*SZREG
+	REG_S a4, 10*SZREG(t6)
+	REG_S a5, 11*SZREG(t6)
+	REG_S a6, 12*SZREG(t6)
+	REG_S a7, 13*SZREG(t6)
+	REG_S t0, 14*SZREG(t6)
+	REG_S t1, 15*SZREG(t6)
+	addi t6, t6, 16*SZREG
+	bltu a1, a3, 1b
+	andi a2, a2, (16*SZREG)-1  /* Update count */
+
+2:
+	/* Remainder is smaller than a page, compute native word count */
+	beqz a2, 7f
+	andi a5, a2, ~(SZREG-1)
+	andi a2, a2, (SZREG-1)
+	add a3, a1, a5
+	/* Jump directly to last word if no words. */
+	beqz a5, 4f
+
+3:
+	/* Use single native register copy */
+	REG_L a4, 0(a1)
+	addi a1, a1, SZREG
+	REG_S a4, 0(t6)
+	addi t6, t6, SZREG
+	bltu a1, a3, 3b
+
+	/* Jump directly out if no more bytes */
+	beqz a2, 7f
+
+4:
+	/* Copy the last word unaligned */
+	add a3, a1, a2
+	add a4, t6, a2
+	REG_L a5, -SZREG(a3)
+	REG_S a5, -SZREG(a4)
+	ret
+
+5:
+	/* Copy bytes when the total copy is <SZREG */
+	add a3, a1, a2
+
+6:
+	lb a4, 0(a1)
+	addi a1, a1, 1
+	sb a4, 0(t6)
+	addi t6, t6, 1
+	bltu a1, a3, 6b
+
+7:
+	ret
+
+END (__memcpy_noalignment)
diff --git a/sysdeps/unix/sysv/linux/riscv/Makefile b/sysdeps/unix/sysv/linux/riscv/Makefile
index 04abf226ad..398ff7418b 100644
--- a/sysdeps/unix/sysv/linux/riscv/Makefile
+++ b/sysdeps/unix/sysv/linux/riscv/Makefile
@@ -15,6 +15,15 @@  ifeq ($(subdir),stdlib)
 gen-as-const-headers += ucontext_i.sym
 endif
 
+ifeq ($(subdir),string)
+sysdep_routines += \
+  memcpy \
+  memcpy-generic \
+  memcpy_noalignment \
+  # sysdep_routines
+
+endif
+
 abi-variants := ilp32 ilp32d lp64 lp64d
 
 ifeq (,$(filter $(default-abi),$(abi-variants)))
diff --git a/sysdeps/unix/sysv/linux/riscv/memcpy-generic.c b/sysdeps/unix/sysv/linux/riscv/memcpy-generic.c
new file mode 100644
index 0000000000..f06f4bda15
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/riscv/memcpy-generic.c
@@ -0,0 +1,24 @@ 
+/* Re-include the default memcpy implementation.
+   Copyright (C) 2024 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <string.h>
+
+extern __typeof (memcpy) __memcpy_generic;
+hidden_proto (__memcpy_generic)
+
+#include <string/memcpy.c>