From patchwork Tue Feb 27 22:56:43 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Evan Green X-Patchwork-Id: 1905483 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=rivosinc-com.20230601.gappssmtp.com header.i=@rivosinc-com.20230601.gappssmtp.com header.a=rsa-sha256 header.s=20230601 header.b=P1+rMMUV; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=sourceware.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org; envelope-from=libc-alpha-bounces+incoming=patchwork.ozlabs.org@sourceware.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4TktF74jL8z23d3 for ; Wed, 28 Feb 2024 09:58:19 +1100 (AEDT) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 93CDD3858286 for ; Tue, 27 Feb 2024 22:58:17 +0000 (GMT) X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from mail-pg1-x532.google.com (mail-pg1-x532.google.com [IPv6:2607:f8b0:4864:20::532]) by sourceware.org (Postfix) with ESMTPS id D215E3858C32 for ; Tue, 27 Feb 2024 22:57:09 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org D215E3858C32 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=rivosinc.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=rivosinc.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org D215E3858C32 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2607:f8b0:4864:20::532 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1709074634; cv=none; b=AmQtKPVmgUYUal2X0JLDmDt7zqyI7wTs/F7/qsiVHofLlpx6oaVsDRpOqM9JZGYkTh977sHCWwO9cSXQvGcdBBYfe9jUa80/NCICn2wV6NhG000g2e4qdZSQ02V06gl7wpEZUUCuAoPYW5EIIqqvCCsZO+3S+t7lvPv//dosbX8= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1709074634; c=relaxed/simple; bh=RQeCsTdLzL95jS0ANE2Z4YVo2AodGkOejdkJeicqBls=; h=DKIM-Signature:From:To:Subject:Date:Message-Id:MIME-Version; b=JUSx6sMQ6zalBeQ2ipSHBuFT/ZDA7+gPHJq2A7j5R5TCtCOYzVFaEsx6qN3iGAiloKE94RrjaR35lNEUSAs8FgWmuS89XXAOwzIB7XerzTt7wZCx0AsUVWOYy8Y47Ox7CchPYTyMTuU1a4NBhCI/r3C+4Rg5dEmn5mle7URrp88= ARC-Authentication-Results: i=1; server2.sourceware.org Received: by mail-pg1-x532.google.com with SMTP id 41be03b00d2f7-5cdbc4334edso3556859a12.3 for ; Tue, 27 Feb 2024 14:57:09 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=rivosinc-com.20230601.gappssmtp.com; s=20230601; t=1709074628; x=1709679428; darn=sourceware.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=d0YVHUk/lddbP97jJ8T2SQlEKI7riLRj+3PwAqGif+0=; b=P1+rMMUVfKJT1xxB+Jm/8Pksbb4Ib7AL8uGqyYywx/OsFTbd5nsFo/tdB+Dp8ve8GZ 9zi+gi86BnxsY93ppjGVf2hZlMYi48AX8Qz/05xizjn7wRDy7zMeenJIyXIiYI4bF6eW 0uUeeAxcAlwb0oM0ozKAQIDtQ31XijP4wnEAxn/hsjlGV3d/qtmgA8mHDVOIAyseNb8h eg7GOOfAqvOEgmjAh/RnVeGau54Ujiwm7Mxc8PRiDMEH2aOThn5t0c/umOpR2mNMPYDS BG1tvg2uLKUwqIv3fmNcEY/6T8VJRkCjuxJLJNlTMx/qHVLvOWE6ANXngA/eLN3tdvlv 1WZQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1709074628; x=1709679428; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=d0YVHUk/lddbP97jJ8T2SQlEKI7riLRj+3PwAqGif+0=; b=oUZM9tT1gGxxLw1qhPs1LUa8lQ7QjK7hkm2e0Yhh5lliWPv/AwnUqtr00l6A6SBOUO GMAfZY2RxjaHNS0bocvdao9/eELRMU0NvPmPwBmlsvRD2Kbw1+cgAySXbAoqcWShtwXz B3xGly4siQJsqn2dVlnOxDtGKORzbzIaxm6EEJqYAYiFRM/v86GKwyqUIuh0wl/trs5u yX7vYJPiC8KAxwVzI7EdSng4zBAPii6yKPc1hPcYIN4HZ9ONOFskI/akFN7N1LqGD9Ih EwdURLr09jujEvzvxglVJZu8lkDc/j6/jMlOz/1tnPmpqM2tg+FYe+sWhib6dli7Lm4B xTBg== X-Gm-Message-State: AOJu0YyCDRJHRKehXyfjEV9SBvh4Ew/GeWI4Pp9WX5fq1PE1IwUn2N4x thEnYmNiQFv33KcIC54nX8n1/U6brE3r2mZGZs3xMgV0tEYYIpJ3Yxou4uaAn5tm6SpMwXH3Lum N X-Google-Smtp-Source: AGHT+IHfOy7a8KA1yM4XLI23oCo7kdntgbCBkbOvv6TjeG/qyO56K+8ei10uRXH1Xq12rERFHjHpMQ== X-Received: by 2002:a05:6a20:94c9:b0:1a0:9115:48c8 with SMTP id ht9-20020a056a2094c900b001a0911548c8mr4200980pzb.51.1709074627944; Tue, 27 Feb 2024 14:57:07 -0800 (PST) Received: from evan.ba.rivosinc.com ([64.71.180.162]) by smtp.gmail.com with ESMTPSA id li3-20020a170903294300b001dc90ac1cecsm2029225plb.284.2024.02.27.14.57.07 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 27 Feb 2024 14:57:07 -0800 (PST) From: Evan Green To: libc-alpha@sourceware.org Cc: vineetg@rivosinc.com, slewis@rivosinc.com, Florian Weimer , palmer@rivosinc.com, Evan Green Subject: [PATCH v13 7/7] riscv: Add and use alignment-ignorant memcpy Date: Tue, 27 Feb 2024 14:56:43 -0800 Message-Id: <20240227225644.724901-8-evan@rivosinc.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240227225644.724901-1-evan@rivosinc.com> References: <20240227225644.724901-1-evan@rivosinc.com> MIME-Version: 1.0 X-Spam-Status: No, score=-12.5 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, GIT_PATCH_0, KAM_SHORT, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: libc-alpha-bounces+incoming=patchwork.ozlabs.org@sourceware.org For CPU implementations that can perform unaligned accesses with little or no performance penalty, create a memcpy implementation that does not bother aligning buffers. It will use a block of integer registers, a single integer register, and fall back to bytewise copy for the remainder. Signed-off-by: Evan Green Reviewed-by: Palmer Dabbelt --- (no changes since v11) Changes in v11: - Update copyrights to 2024 (Adhemerval) - Avoid implicit check in select_memcpy_ifunc (Adhemerval) - Remove hidden_def (__memcpy_noalignment) (Adhemerval) Changes in v10: - One line per function in Makefile for memcpy (Adhemerval) - Space before argument-like things (Adhemerval) Changes in v7: - Use new helper function in memcpy ifunc selector (Richard) Changes in v6: - Fix a couple regressions in the assembly from v5 :/ - Use passed hwprobe pointer in memcpy ifunc selector. Changes in v5: - Do unaligned word access for final trailing bytes (Richard) Changes in v4: - Fixed comment style (Florian) Changes in v3: - Word align dest for large memcpy()s. - Add tags - Remove spurious blank line from sysdeps/riscv/memcpy.c Changes in v2: - Used _MASK instead of _FAST value itself. --- sysdeps/riscv/memcopy.h | 26 ++++ sysdeps/riscv/memcpy.c | 63 ++++++++ sysdeps/riscv/memcpy_noalignment.S | 136 ++++++++++++++++++ sysdeps/unix/sysv/linux/riscv/Makefile | 9 ++ .../unix/sysv/linux/riscv/memcpy-generic.c | 24 ++++ 5 files changed, 258 insertions(+) create mode 100644 sysdeps/riscv/memcopy.h create mode 100644 sysdeps/riscv/memcpy.c create mode 100644 sysdeps/riscv/memcpy_noalignment.S create mode 100644 sysdeps/unix/sysv/linux/riscv/memcpy-generic.c diff --git a/sysdeps/riscv/memcopy.h b/sysdeps/riscv/memcopy.h new file mode 100644 index 0000000000..27675964b0 --- /dev/null +++ b/sysdeps/riscv/memcopy.h @@ -0,0 +1,26 @@ +/* memcopy.h -- definitions for memory copy functions. RISC-V version. + Copyright (C) 2024 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#include + +/* Redefine the generic memcpy implementation to __memcpy_generic, so + the memcpy ifunc can select between generic and special versions. + In rtld, don't bother with all the ifunciness. */ +#if IS_IN (libc) +#define MEMCPY __memcpy_generic +#endif diff --git a/sysdeps/riscv/memcpy.c b/sysdeps/riscv/memcpy.c new file mode 100644 index 0000000000..20f9548c44 --- /dev/null +++ b/sysdeps/riscv/memcpy.c @@ -0,0 +1,63 @@ +/* Multiple versions of memcpy. + All versions must be listed in ifunc-impl-list.c. + Copyright (C) 2017-2024 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#if IS_IN (libc) +/* Redefine memcpy so that the compiler won't complain about the type + mismatch with the IFUNC selector in strong_alias, below. */ +# undef memcpy +# define memcpy __redirect_memcpy +# include +# include +# include +# include +# include + +# define INIT_ARCH() + +extern __typeof (__redirect_memcpy) __libc_memcpy; + +extern __typeof (__redirect_memcpy) __memcpy_generic attribute_hidden; +extern __typeof (__redirect_memcpy) __memcpy_noalignment attribute_hidden; + +static inline __typeof (__redirect_memcpy) * +select_memcpy_ifunc (uint64_t dl_hwcap, __riscv_hwprobe_t hwprobe_func) +{ + unsigned long long int value; + + INIT_ARCH (); + + if (__riscv_hwprobe_one (hwprobe_func, RISCV_HWPROBE_KEY_CPUPERF_0, &value) != 0) + return __memcpy_generic; + + if ((value & RISCV_HWPROBE_MISALIGNED_MASK) == RISCV_HWPROBE_MISALIGNED_FAST) + return __memcpy_noalignment; + + return __memcpy_generic; +} + +riscv_libc_ifunc (__libc_memcpy, select_memcpy_ifunc); + +# undef memcpy +strong_alias (__libc_memcpy, memcpy); +# ifdef SHARED +__hidden_ver1 (memcpy, __GI_memcpy, __redirect_memcpy) + __attribute__ ((visibility ("hidden"))) __attribute_copy__ (memcpy); +# endif + +#endif diff --git a/sysdeps/riscv/memcpy_noalignment.S b/sysdeps/riscv/memcpy_noalignment.S new file mode 100644 index 0000000000..621f8d028f --- /dev/null +++ b/sysdeps/riscv/memcpy_noalignment.S @@ -0,0 +1,136 @@ +/* memcpy for RISC-V, ignoring buffer alignment + Copyright (C) 2024 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library. If not, see + . */ + +#include +#include + +/* void *memcpy(void *, const void *, size_t) */ +ENTRY (__memcpy_noalignment) + move t6, a0 /* Preserve return value */ + + /* Bail if 0 */ + beqz a2, 7f + + /* Jump to byte copy if size < SZREG */ + li a4, SZREG + bltu a2, a4, 5f + + /* Round down to the nearest "page" size */ + andi a4, a2, ~((16*SZREG)-1) + beqz a4, 2f + add a3, a1, a4 + + /* Copy the first word to get dest word aligned */ + andi a5, t6, SZREG-1 + beqz a5, 1f + REG_L a6, (a1) + REG_S a6, (t6) + + /* Align dst up to a word, move src and size as well. */ + addi t6, t6, SZREG-1 + andi t6, t6, ~(SZREG-1) + sub a5, t6, a0 + add a1, a1, a5 + sub a2, a2, a5 + + /* Recompute page count */ + andi a4, a2, ~((16*SZREG)-1) + beqz a4, 2f + +1: + /* Copy "pages" (chunks of 16 registers) */ + REG_L a4, 0(a1) + REG_L a5, SZREG(a1) + REG_L a6, 2*SZREG(a1) + REG_L a7, 3*SZREG(a1) + REG_L t0, 4*SZREG(a1) + REG_L t1, 5*SZREG(a1) + REG_L t2, 6*SZREG(a1) + REG_L t3, 7*SZREG(a1) + REG_L t4, 8*SZREG(a1) + REG_L t5, 9*SZREG(a1) + REG_S a4, 0(t6) + REG_S a5, SZREG(t6) + REG_S a6, 2*SZREG(t6) + REG_S a7, 3*SZREG(t6) + REG_S t0, 4*SZREG(t6) + REG_S t1, 5*SZREG(t6) + REG_S t2, 6*SZREG(t6) + REG_S t3, 7*SZREG(t6) + REG_S t4, 8*SZREG(t6) + REG_S t5, 9*SZREG(t6) + REG_L a4, 10*SZREG(a1) + REG_L a5, 11*SZREG(a1) + REG_L a6, 12*SZREG(a1) + REG_L a7, 13*SZREG(a1) + REG_L t0, 14*SZREG(a1) + REG_L t1, 15*SZREG(a1) + addi a1, a1, 16*SZREG + REG_S a4, 10*SZREG(t6) + REG_S a5, 11*SZREG(t6) + REG_S a6, 12*SZREG(t6) + REG_S a7, 13*SZREG(t6) + REG_S t0, 14*SZREG(t6) + REG_S t1, 15*SZREG(t6) + addi t6, t6, 16*SZREG + bltu a1, a3, 1b + andi a2, a2, (16*SZREG)-1 /* Update count */ + +2: + /* Remainder is smaller than a page, compute native word count */ + beqz a2, 7f + andi a5, a2, ~(SZREG-1) + andi a2, a2, (SZREG-1) + add a3, a1, a5 + /* Jump directly to last word if no words. */ + beqz a5, 4f + +3: + /* Use single native register copy */ + REG_L a4, 0(a1) + addi a1, a1, SZREG + REG_S a4, 0(t6) + addi t6, t6, SZREG + bltu a1, a3, 3b + + /* Jump directly out if no more bytes */ + beqz a2, 7f + +4: + /* Copy the last word unaligned */ + add a3, a1, a2 + add a4, t6, a2 + REG_L a5, -SZREG(a3) + REG_S a5, -SZREG(a4) + ret + +5: + /* Copy bytes when the total copy is . */ + +#include + +extern __typeof (memcpy) __memcpy_generic; +hidden_proto (__memcpy_generic) + +#include