From patchwork Wed May 8 05:17:53 2024
X-Patchwork-Submitter: Christoph Müllner
X-Patchwork-Id: 1932780
From: Christoph Müllner
To: gcc-patches@gcc.gnu.org, Kito Cheng, Jim Wilson, Palmer Dabbelt, Andrew Waterman, Philipp Tomsich, Jeff Law, Vineet Gupta
Cc: Christoph Müllner
Subject: [PATCH 1/4] RISC-V: Add test cases for cpymem expansion
Date: Wed, 8 May 2024 07:17:53 +0200
Message-ID: <20240508051756.3999080-2-christoph.muellner@vrull.eu>
In-Reply-To: <20240508051756.3999080-1-christoph.muellner@vrull.eu>
We have two mechanisms in the RISC-V backend that expand the cpymem pattern: a) the by-pieces infrastructure and b) riscv_expand_block_move() in riscv-string.cc. The by-pieces framework has higher priority and emits a sequence of up to 15 instructions (see use_by_pieces_infrastructure_p() for more details). As a rule of thumb, by-pieces emits alternating load/store sequences, while the cpymem expansion in the backend emits a sequence of loads followed by a sequence of stores.

Let's add some test cases to document the current behaviour and to identify regressions.

Signed-off-by: Christoph Müllner

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/cpymem-32-ooo.c: New test.
	* gcc.target/riscv/cpymem-32.c: New test.
	* gcc.target/riscv/cpymem-64-ooo.c: New test.
	* gcc.target/riscv/cpymem-64.c: New test.

--- .../gcc.target/riscv/cpymem-32-ooo.c | 131 +++++++++++++++++ gcc/testsuite/gcc.target/riscv/cpymem-32.c | 138 ++++++++++++++++++ .../gcc.target/riscv/cpymem-64-ooo.c | 129 ++++++++++++++++ gcc/testsuite/gcc.target/riscv/cpymem-64.c | 138 ++++++++++++++++++ 4 files changed, 536 insertions(+) create mode 100644 gcc/testsuite/gcc.target/riscv/cpymem-32-ooo.c create mode 100644 gcc/testsuite/gcc.target/riscv/cpymem-32.c create mode 100644 gcc/testsuite/gcc.target/riscv/cpymem-64-ooo.c create mode 100644 gcc/testsuite/gcc.target/riscv/cpymem-64.c diff --git a/gcc/testsuite/gcc.target/riscv/cpymem-32-ooo.c b/gcc/testsuite/gcc.target/riscv/cpymem-32-ooo.c new file mode 100644 index 00000000000..33fb9891d82 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/cpymem-32-ooo.c @@ -0,0 +1,131 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target rv32 } */ +/* { dg-options "-march=rv32gc -mabi=ilp32d -mtune=generic-ooo" } */ +/* { dg-skip-if "" { *-*-* } {"-O0" "-Os" "-Og" "-Oz" "-flto" } } */ +/* { dg-final { check-function-bodies "**" "" } } */ +/* { dg-allow-blank-lines-in-output
1 } */ + +#define COPY_N(N) \ +void copy_##N (void *to, void *from) \ +{ \ + __builtin_memcpy (to, from, N); \ +} + +#define COPY_ALIGNED_N(N) \ +void copy_aligned_##N (void *to, void *from) \ +{ \ + to = __builtin_assume_aligned(to, sizeof(long)); \ + from = __builtin_assume_aligned(from, sizeof(long)); \ + __builtin_memcpy (to, from, N); \ +} + +/* +**copy_7: +** ... +** lw\t[at][0-9],0\([at][0-9]\) +** sw\t[at][0-9],0\([at][0-9]\) +** ... +** lbu\t[at][0-9],6\([at][0-9]\) +** sb\t[at][0-9],6\([at][0-9]\) +** ... +*/ +COPY_N(7) + +/* +**copy_aligned_7: +** ... +** lw\t[at][0-9],0\([at][0-9]\) +** sw\t[at][0-9],0\([at][0-9]\) +** ... +** lbu\t[at][0-9],6\([at][0-9]\) +** sb\t[at][0-9],6\([at][0-9]\) +** ... +*/ +COPY_ALIGNED_N(7) + +/* +**copy_8: +** ... +** lw\ta[0-9],0\(a[0-9]\) +** sw\ta[0-9],0\(a[0-9]\) +** ... +*/ +COPY_N(8) + +/* +**copy_aligned_8: +** ... +** lw\ta[0-9],0\(a[0-9]\) +** sw\ta[0-9],0\(a[0-9]\) +** ... +*/ +COPY_ALIGNED_N(8) + +/* +**copy_11: +** ... +** lbu\t[at][0-9],0\([at][0-9]\) +** ... +** lbu\t[at][0-9],10\([at][0-9]\) +** ... +** sb\t[at][0-9],0\([at][0-9]\) +** ... +** sb\t[at][0-9],10\([at][0-9]\) +** ... +*/ +COPY_N(11) + +/* +**copy_aligned_11: +** ... +** lw\t[at][0-9],0\([at][0-9]\) +** ... +** sw\t[at][0-9],0\([at][0-9]\) +** ... +** lbu\t[at][0-9],10\([at][0-9]\) +** sb\t[at][0-9],10\([at][0-9]\) +** ... +*/ +COPY_ALIGNED_N(11) + +/* +**copy_15: +** ... +** (call|tail)\tmemcpy +** ... +*/ +COPY_N(15) + +/* +**copy_aligned_15: +** ... +** lw\t[at][0-9],0\([at][0-9]\) +** ... +** sw\t[at][0-9],0\([at][0-9]\) +** ... +** lbu\t[at][0-9],14\([at][0-9]\) +** sb\t[at][0-9],14\([at][0-9]\) +** ... +*/ +COPY_ALIGNED_N(15) + +/* +**copy_27: +** ... +** (call|tail)\tmemcpy +** ... +*/ +COPY_N(27) + +/* +**copy_aligned_27: +** ... +** lw\t[at][0-9],20\([at][0-9]\) +** ... +** sw\t[at][0-9],20\([at][0-9]\) +** ... +** lbu\t[at][0-9],26\([at][0-9]\) +** sb\t[at][0-9],26\([at][0-9]\) +** ... 
+*/ +COPY_ALIGNED_N(27) diff --git a/gcc/testsuite/gcc.target/riscv/cpymem-32.c b/gcc/testsuite/gcc.target/riscv/cpymem-32.c new file mode 100644 index 00000000000..44ba14a1d51 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/cpymem-32.c @@ -0,0 +1,138 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target rv32 } */ +/* { dg-options "-march=rv32gc -mabi=ilp32d -mtune=rocket" } */ +/* { dg-skip-if "" { *-*-* } {"-O0" "-Os" "-Og" "-Oz" "-flto" } } */ +/* { dg-final { check-function-bodies "**" "" } } */ +/* { dg-allow-blank-lines-in-output 1 } */ + +#define COPY_N(N) \ +void copy_##N (void *to, void *from) \ +{ \ + __builtin_memcpy (to, from, N); \ +} + +#define COPY_ALIGNED_N(N) \ +void copy_aligned_##N (void *to, void *from) \ +{ \ + to = __builtin_assume_aligned(to, sizeof(long)); \ + from = __builtin_assume_aligned(from, sizeof(long)); \ + __builtin_memcpy (to, from, N); \ +} + +/* +**copy_7: +** ... +** lbu\t[at][0-9],0\([at][0-9]\) +** ... +** lbu\t[at][0-9],6\([at][0-9]\) +** ... +** sb\t[at][0-9],0\([at][0-9]\) +** ... +** sb\t[at][0-9],6\([at][0-9]\) +** ... +*/ +COPY_N(7) + +/* +**copy_aligned_7: +** ... +** lw\t[at][0-9],0\([at][0-9]\) +** sw\t[at][0-9],0\([at][0-9]\) +** ... +** lbu\t[at][0-9],6\([at][0-9]\) +** sb\t[at][0-9],6\([at][0-9]\) +** ... +*/ +COPY_ALIGNED_N(7) + +/* +**copy_8: +** ... +** lbu\t[at][0-9],0\([at][0-9]\) +** ... +** lbu\t[at][0-9],7\([at][0-9]\) +** ... +** sb\t[at][0-9],0\([at][0-9]\) +** ... +** sb\t[at][0-9],7\([at][0-9]\) +** ... +*/ +COPY_N(8) + +/* +**copy_aligned_8: +** ... +** lw\ta[0-9],0\(a[0-9]\) +** sw\ta[0-9],0\(a[0-9]\) +** ... +*/ +COPY_ALIGNED_N(8) + +/* +**copy_11: +** ... +** lbu\t[at][0-9],0\([at][0-9]\) +** ... +** lbu\t[at][0-9],10\([at][0-9]\) +** ... +** sb\t[at][0-9],0\([at][0-9]\) +** ... +** sb\t[at][0-9],10\([at][0-9]\) +** ... +*/ +COPY_N(11) + +/* +**copy_aligned_11: +** ... +** lw\t[at][0-9],0\([at][0-9]\) +** ... +** sw\t[at][0-9],0\([at][0-9]\) +** ... 
+** lbu\t[at][0-9],10\([at][0-9]\) +** sb\t[at][0-9],10\([at][0-9]\) +** ... +*/ +COPY_ALIGNED_N(11) + +/* +**copy_15: +** ... +** (call|tail)\tmemcpy +** ... +*/ +COPY_N(15) + +/* +**copy_aligned_15: +** ... +** lw\t[at][0-9],0\([at][0-9]\) +** ... +** sw\t[at][0-9],0\([at][0-9]\) +** ... +** lbu\t[at][0-9],14\([at][0-9]\) +** sb\t[at][0-9],14\([at][0-9]\) +** ... +*/ +COPY_ALIGNED_N(15) + +/* +**copy_27: +** ... +** (call|tail)\tmemcpy +** ... +*/ +COPY_N(27) + +/* +**copy_aligned_27: +** ... +** lw\t[at][0-9],20\([at][0-9]\) +** ... +** sw\t[at][0-9],20\([at][0-9]\) +** ... +** lbu\t[at][0-9],26\([at][0-9]\) +** sb\t[at][0-9],26\([at][0-9]\) +** ... +*/ +COPY_ALIGNED_N(27) diff --git a/gcc/testsuite/gcc.target/riscv/cpymem-64-ooo.c b/gcc/testsuite/gcc.target/riscv/cpymem-64-ooo.c new file mode 100644 index 00000000000..8e40e52fa91 --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/cpymem-64-ooo.c @@ -0,0 +1,129 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target rv64 } */ +/* { dg-options "-march=rv64gc -mabi=lp64d -mtune=generic-ooo" } */ +/* { dg-skip-if "" { *-*-* } {"-O0" "-Os" "-Og" "-Oz" "-flto" } } */ +/* { dg-final { check-function-bodies "**" "" } } */ +/* { dg-allow-blank-lines-in-output 1 } */ + +#define COPY_N(N) \ +void copy_##N (void *to, void *from) \ +{ \ + __builtin_memcpy (to, from, N); \ +} + +#define COPY_ALIGNED_N(N) \ +void copy_aligned_##N (void *to, void *from) \ +{ \ + to = __builtin_assume_aligned(to, sizeof(long)); \ + from = __builtin_assume_aligned(from, sizeof(long)); \ + __builtin_memcpy (to, from, N); \ +} + +/* +**copy_7: +** ... +** lw\t[at][0-9],0\([at][0-9]\) +** sw\t[at][0-9],0\([at][0-9]\) +** ... +** lbu\t[at][0-9],6\([at][0-9]\) +** sb\t[at][0-9],6\([at][0-9]\) +** ... +*/ +COPY_N(7) + +/* +**copy_aligned_7: +** ... +** lw\t[at][0-9],0\([at][0-9]\) +** sw\t[at][0-9],0\([at][0-9]\) +** ... +** lbu\t[at][0-9],6\([at][0-9]\) +** sb\t[at][0-9],6\([at][0-9]\) +** ... +*/ +COPY_ALIGNED_N(7) + +/* +**copy_8: +** ... 
+** ld\ta[0-9],0\(a[0-9]\) +** sd\ta[0-9],0\(a[0-9]\) +** ... +*/ +COPY_N(8) + +/* +**copy_aligned_8: +** ... +** ld\ta[0-9],0\(a[0-9]\) +** sd\ta[0-9],0\(a[0-9]\) +** ... +*/ +COPY_ALIGNED_N(8) + +/* +**copy_11: +** ... +** ld\t[at][0-9],0\([at][0-9]\) +** sd\t[at][0-9],0\([at][0-9]\) +** ... +** lbu\t[at][0-9],10\([at][0-9]\) +** sb\t[at][0-9],10\([at][0-9]\) +** ... +*/ +COPY_N(11) + +/* +**copy_aligned_11: +** ... +** ld\t[at][0-9],0\([at][0-9]\) +** ... +** sd\t[at][0-9],0\([at][0-9]\) +** ... +** lbu\t[at][0-9],10\([at][0-9]\) +** sb\t[at][0-9],10\([at][0-9]\) +** ... +*/ +COPY_ALIGNED_N(11) + +/* +**copy_15: +** ... +** (call|tail)\tmemcpy +** ... +*/ +COPY_N(15) + +/* +**copy_aligned_15: +** ... +** ld\t[at][0-9],0\([at][0-9]\) +** ... +** sd\t[at][0-9],0\([at][0-9]\) +** ... +** lbu\t[at][0-9],14\([at][0-9]\) +** sb\t[at][0-9],14\([at][0-9]\) +** ... +*/ +COPY_ALIGNED_N(15) + +/* +**copy_27: +** ... +** (call|tail)\tmemcpy +** ... +*/ +COPY_N(27) + +/* +**copy_aligned_27: +** ... +** ld\t[at][0-9],16\([at][0-9]\) +** ... +** sd\t[at][0-9],16\([at][0-9]\) +** ... +** lbu\t[at][0-9],26\([at][0-9]\) +** sb\t[at][0-9],26\([at][0-9]\) +** ... 
+*/ +COPY_ALIGNED_N(27) diff --git a/gcc/testsuite/gcc.target/riscv/cpymem-64.c b/gcc/testsuite/gcc.target/riscv/cpymem-64.c new file mode 100644 index 00000000000..bdfaca0d46a --- /dev/null +++ b/gcc/testsuite/gcc.target/riscv/cpymem-64.c @@ -0,0 +1,138 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target rv64 } */ +/* { dg-options "-march=rv64gc -mabi=lp64d -mtune=rocket" } */ +/* { dg-skip-if "" { *-*-* } {"-O0" "-Os" "-Og" "-Oz" "-flto" } } */ +/* { dg-final { check-function-bodies "**" "" } } */ +/* { dg-allow-blank-lines-in-output 1 } */ + +#define COPY_N(N) \ +void copy_##N (void *to, void *from) \ +{ \ + __builtin_memcpy (to, from, N); \ +} + +#define COPY_ALIGNED_N(N) \ +void copy_aligned_##N (void *to, void *from) \ +{ \ + to = __builtin_assume_aligned(to, sizeof(long)); \ + from = __builtin_assume_aligned(from, sizeof(long)); \ + __builtin_memcpy (to, from, N); \ +} + +/* +**copy_7: +** ... +** lbu\t[at][0-9],0\([at][0-9]\) +** ... +** lbu\t[at][0-9],6\([at][0-9]\) +** ... +** sb\t[at][0-9],0\([at][0-9]\) +** ... +** sb\t[at][0-9],6\([at][0-9]\) +** ... +*/ +COPY_N(7) + +/* +**copy_aligned_7: +** ... +** lw\t[at][0-9],0\([at][0-9]\) +** sw\t[at][0-9],0\([at][0-9]\) +** ... +** lbu\t[at][0-9],6\([at][0-9]\) +** sb\t[at][0-9],6\([at][0-9]\) +** ... +*/ +COPY_ALIGNED_N(7) + +/* +**copy_8: +** ... +** lbu\t[at][0-9],0\([at][0-9]\) +** ... +** lbu\t[at][0-9],7\([at][0-9]\) +** ... +** sb\t[at][0-9],0\([at][0-9]\) +** ... +** sb\t[at][0-9],7\([at][0-9]\) +** ... +*/ +COPY_N(8) + +/* +**copy_aligned_8: +** ... +** ld\ta[0-9],0\(a[0-9]\) +** sd\ta[0-9],0\(a[0-9]\) +** ... +*/ +COPY_ALIGNED_N(8) + +/* +**copy_11: +** ... +** lbu\t[at][0-9],0\([at][0-9]\) +** ... +** lbu\t[at][0-9],10\([at][0-9]\) +** ... +** sb\t[at][0-9],0\([at][0-9]\) +** ... +** sb\t[at][0-9],10\([at][0-9]\) +** ... +*/ +COPY_N(11) + +/* +**copy_aligned_11: +** ... +** ld\t[at][0-9],0\([at][0-9]\) +** ... +** sd\t[at][0-9],0\([at][0-9]\) +** ... 
+** lbu\t[at][0-9],10\([at][0-9]\) +** sb\t[at][0-9],10\([at][0-9]\) +** ... +*/ +COPY_ALIGNED_N(11) + +/* +**copy_15: +** ... +** (call|tail)\tmemcpy +** ... +*/ +COPY_N(15) + +/* +**copy_aligned_15: +** ... +** ld\t[at][0-9],0\([at][0-9]\) +** ... +** sd\t[at][0-9],0\([at][0-9]\) +** ... +** lbu\t[at][0-9],14\([at][0-9]\) +** sb\t[at][0-9],14\([at][0-9]\) +** ... +*/ +COPY_ALIGNED_N(15) + +/* +**copy_27: +** ... +** (call|tail)\tmemcpy +** ... +*/ +COPY_N(27) + +/* +**copy_aligned_27: +** ... +** ld\t[at][0-9],16\([at][0-9]\) +** ... +** sd\t[at][0-9],16\([at][0-9]\) +** ... +** lbu\t[at][0-9],26\([at][0-9]\) +** sb\t[at][0-9],26\([at][0-9]\) +** ... +*/ +COPY_ALIGNED_N(27)

From patchwork Wed May 8 05:17:54 2024
X-Patchwork-Submitter: Christoph Müllner
X-Patchwork-Id: 1932781
From: Christoph Müllner
To: gcc-patches@gcc.gnu.org, Kito Cheng, Jim Wilson, Palmer Dabbelt, Andrew Waterman, Philipp Tomsich, Jeff Law, Vineet Gupta
Cc: Christoph Müllner
Subject: [PATCH 2/4] RISC-V: Allow unaligned accesses in cpymemsi expansion
Date: Wed, 8 May 2024 07:17:54 +0200
Message-ID: <20240508051756.3999080-3-christoph.muellner@vrull.eu>
In-Reply-To: <20240508051756.3999080-1-christoph.muellner@vrull.eu>

The RISC-V cpymemsi expansion is called whenever the by-pieces infrastructure does not take care of the builtin expansion. The code emitted by the by-pieces infrastructure may include unaligned accesses if riscv_slow_unaligned_access_p is false.

The RISC-V cpymemsi expansion is handled via riscv_expand_block_move(). The current implementation of this function does not check riscv_slow_unaligned_access_p and never emits unaligned accesses.

Since by-pieces emits unaligned accesses, it is reasonable to implement the same behaviour in the cpymemsi expansion.
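To make the affected scenario concrete, here is a minimal standalone sketch (not part of the patch; the helpers mirror the COPY_N/COPY_ALIGNED_N macros from the cpymem tests) of the two kinds of copies involved:

```c
#include <string.h>

/* Unaligned variant: without alignment information, the current
   cpymemsi expansion only emits accesses as wide as the known
   alignment, so a copy like this ends up as byte accesses or a
   call to memcpy.  */
void copy_15 (void *to, void *from)
{
  __builtin_memcpy (to, from, 15);
}

/* Aligned variant: the alignment hints already allow word-sized
   loads/stores to be emitted inline.  */
void copy_aligned_15 (void *to, void *from)
{
  to = __builtin_assume_aligned (to, sizeof (long));
  from = __builtin_assume_aligned (from, sizeof (long));
  __builtin_memcpy (to, from, 15);
}
```

On a target where riscv_slow_unaligned_access_p is false (e.g. -mtune=generic-ooo), there is no reason for the two variants to be expanded differently.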
That is what this patch does: it checks riscv_slow_unaligned_access_p on entry and sets the allowed alignment accordingly. This alignment is then propagated down to the routines that emit the actual instructions. The changes introduced by this patch can be seen in the adjustments of the cpymem tests.

gcc/ChangeLog:

	* config/riscv/riscv-string.cc (riscv_block_move_straight): Add
	parameter align.
	(riscv_adjust_block_mem): Replace parameter length by align.
	(riscv_block_move_loop): Add parameter align.
	(riscv_expand_block_move_scalar): Set alignment properly if the
	target has fast unaligned access.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/cpymem-32-ooo.c: Adjust for unaligned access.
	* gcc.target/riscv/cpymem-64-ooo.c: Likewise.

Signed-off-by: Christoph Müllner

--- gcc/config/riscv/riscv-string.cc | 53 +++++++++++-------- .../gcc.target/riscv/cpymem-32-ooo.c | 20 +++++-- .../gcc.target/riscv/cpymem-64-ooo.c | 14 ++++- 3 files changed, 59 insertions(+), 28 deletions(-) diff --git a/gcc/config/riscv/riscv-string.cc b/gcc/config/riscv/riscv-string.cc index b09b51d7526..8fc0877772f 100644 --- a/gcc/config/riscv/riscv-string.cc +++ b/gcc/config/riscv/riscv-string.cc @@ -610,11 +610,13 @@ riscv_expand_strlen (rtx result, rtx src, rtx search_char, rtx align) return false; } -/* Emit straight-line code to move LENGTH bytes from SRC to DEST. +/* Emit straight-line code to move LENGTH bytes from SRC to DEST + with accesses that are ALIGN bytes aligned. Assume that the areas do not overlap.
*/ static void -riscv_block_move_straight (rtx dest, rtx src, unsigned HOST_WIDE_INT length) +riscv_block_move_straight (rtx dest, rtx src, unsigned HOST_WIDE_INT length, + unsigned HOST_WIDE_INT align) { unsigned HOST_WIDE_INT offset, delta; unsigned HOST_WIDE_INT bits; @@ -622,8 +624,7 @@ riscv_block_move_straight (rtx dest, rtx src, unsigned HOST_WIDE_INT length) enum machine_mode mode; rtx *regs; - bits = MAX (BITS_PER_UNIT, - MIN (BITS_PER_WORD, MIN (MEM_ALIGN (src), MEM_ALIGN (dest)))); + bits = MAX (BITS_PER_UNIT, MIN (BITS_PER_WORD, align)); mode = mode_for_size (bits, MODE_INT, 0).require (); delta = bits / BITS_PER_UNIT; @@ -648,21 +649,20 @@ riscv_block_move_straight (rtx dest, rtx src, unsigned HOST_WIDE_INT length) { src = adjust_address (src, BLKmode, offset); dest = adjust_address (dest, BLKmode, offset); - move_by_pieces (dest, src, length - offset, - MIN (MEM_ALIGN (src), MEM_ALIGN (dest)), RETURN_BEGIN); + move_by_pieces (dest, src, length - offset, align, RETURN_BEGIN); } } /* Helper function for doing a loop-based block operation on memory - reference MEM. Each iteration of the loop will operate on LENGTH - bytes of MEM. + reference MEM. Create a new base register for use within the loop and point it to the start of MEM. Create a new memory reference that uses this - register. Store them in *LOOP_REG and *LOOP_MEM respectively. */ + register and has an alignment of ALIGN. Store them in *LOOP_REG + and *LOOP_MEM respectively. */ static void -riscv_adjust_block_mem (rtx mem, unsigned HOST_WIDE_INT length, +riscv_adjust_block_mem (rtx mem, unsigned HOST_WIDE_INT align, rtx *loop_reg, rtx *loop_mem) { *loop_reg = copy_addr_to_reg (XEXP (mem, 0)); @@ -670,15 +670,17 @@ riscv_adjust_block_mem (rtx mem, unsigned HOST_WIDE_INT length, /* Although the new mem does not refer to a known location, it does keep up to LENGTH bytes of alignment. 
*/ *loop_mem = change_address (mem, BLKmode, *loop_reg); - set_mem_align (*loop_mem, MIN (MEM_ALIGN (mem), length * BITS_PER_UNIT)); + set_mem_align (*loop_mem, align); } /* Move LENGTH bytes from SRC to DEST using a loop that moves BYTES_PER_ITER - bytes at a time. LENGTH must be at least BYTES_PER_ITER. Assume that - the memory regions do not overlap. */ + bytes at a time. LENGTH must be at least BYTES_PER_ITER. The alignment + of the access can be set by ALIGN. Assume that the memory regions do not + overlap. */ static void riscv_block_move_loop (rtx dest, rtx src, unsigned HOST_WIDE_INT length, + unsigned HOST_WIDE_INT align, unsigned HOST_WIDE_INT bytes_per_iter) { rtx label, src_reg, dest_reg, final_src, test; @@ -688,8 +690,8 @@ riscv_block_move_loop (rtx dest, rtx src, unsigned HOST_WIDE_INT length, length -= leftover; /* Create registers and memory references for use within the loop. */ - riscv_adjust_block_mem (src, bytes_per_iter, &src_reg, &src); - riscv_adjust_block_mem (dest, bytes_per_iter, &dest_reg, &dest); + riscv_adjust_block_mem (src, align, &src_reg, &src); + riscv_adjust_block_mem (dest, align, &dest_reg, &dest); /* Calculate the value that SRC_REG should have after the last iteration of the loop. */ @@ -701,7 +703,7 @@ riscv_block_move_loop (rtx dest, rtx src, unsigned HOST_WIDE_INT length, emit_label (label); /* Emit the loop body. */ - riscv_block_move_straight (dest, src, bytes_per_iter); + riscv_block_move_straight (dest, src, bytes_per_iter, align); /* Move on to the next block. */ riscv_emit_move (src_reg, plus_constant (Pmode, src_reg, bytes_per_iter)); @@ -713,7 +715,7 @@ riscv_block_move_loop (rtx dest, rtx src, unsigned HOST_WIDE_INT length, /* Mop up any left-over bytes. 
*/ if (leftover) - riscv_block_move_straight (dest, src, leftover); + riscv_block_move_straight (dest, src, leftover, align); else emit_insn(gen_nop ()); } @@ -730,8 +732,16 @@ riscv_expand_block_move_scalar (rtx dest, rtx src, rtx length) unsigned HOST_WIDE_INT hwi_length = UINTVAL (length); unsigned HOST_WIDE_INT factor, align; - align = MIN (MIN (MEM_ALIGN (src), MEM_ALIGN (dest)), BITS_PER_WORD); - factor = BITS_PER_WORD / align; + if (riscv_slow_unaligned_access_p) + { + align = MIN (MIN (MEM_ALIGN (src), MEM_ALIGN (dest)), BITS_PER_WORD); + factor = BITS_PER_WORD / align; + } + else + { + align = hwi_length * BITS_PER_UNIT; + factor = 1; + } if (optimize_function_for_size_p (cfun) && hwi_length * factor * UNITS_PER_WORD > MOVE_RATIO (false)) @@ -739,7 +749,7 @@ riscv_expand_block_move_scalar (rtx dest, rtx src, rtx length) if (hwi_length <= (RISCV_MAX_MOVE_BYTES_STRAIGHT / factor)) { - riscv_block_move_straight (dest, src, INTVAL (length)); + riscv_block_move_straight (dest, src, hwi_length, align); return true; } else if (optimize && align >= BITS_PER_WORD) @@ -759,7 +769,8 @@ riscv_expand_block_move_scalar (rtx dest, rtx src, rtx length) iter_words = i; } - riscv_block_move_loop (dest, src, bytes, iter_words * UNITS_PER_WORD); + riscv_block_move_loop (dest, src, bytes, align, + iter_words * UNITS_PER_WORD); return true; } diff --git a/gcc/testsuite/gcc.target/riscv/cpymem-32-ooo.c b/gcc/testsuite/gcc.target/riscv/cpymem-32-ooo.c index 33fb9891d82..946a773f77a 100644 --- a/gcc/testsuite/gcc.target/riscv/cpymem-32-ooo.c +++ b/gcc/testsuite/gcc.target/riscv/cpymem-32-ooo.c @@ -64,12 +64,12 @@ COPY_ALIGNED_N(8) /* **copy_11: ** ... -** lbu\t[at][0-9],0\([at][0-9]\) ** ... -** lbu\t[at][0-9],10\([at][0-9]\) +** lw\t[at][0-9],0\([at][0-9]\) ** ... -** sb\t[at][0-9],0\([at][0-9]\) +** sw\t[at][0-9],0\([at][0-9]\) ** ... +** lbu\t[at][0-9],10\([at][0-9]\) ** sb\t[at][0-9],10\([at][0-9]\) ** ... */ @@ -91,7 +91,12 @@ COPY_ALIGNED_N(11) /* **copy_15: ** ... 
-** (call|tail)\tmemcpy +** lw\t[at][0-9],0\([at][0-9]\) +** ... +** sw\t[at][0-9],0\([at][0-9]\) +** ... +** lbu\t[at][0-9],14\([at][0-9]\) +** sb\t[at][0-9],14\([at][0-9]\) ** ... */ COPY_N(15) @@ -112,7 +117,12 @@ COPY_ALIGNED_N(15) /* **copy_27: ** ... -** (call|tail)\tmemcpy +** lw\t[at][0-9],20\([at][0-9]\) +** ... +** sw\t[at][0-9],20\([at][0-9]\) +** ... +** lbu\t[at][0-9],26\([at][0-9]\) +** sb\t[at][0-9],26\([at][0-9]\) ** ... */ COPY_N(27) diff --git a/gcc/testsuite/gcc.target/riscv/cpymem-64-ooo.c b/gcc/testsuite/gcc.target/riscv/cpymem-64-ooo.c index 8e40e52fa91..08a927b9483 100644 --- a/gcc/testsuite/gcc.target/riscv/cpymem-64-ooo.c +++ b/gcc/testsuite/gcc.target/riscv/cpymem-64-ooo.c @@ -89,7 +89,12 @@ COPY_ALIGNED_N(11) /* **copy_15: ** ... -** (call|tail)\tmemcpy +** ld\t[at][0-9],0\([at][0-9]\) +** ... +** sd\t[at][0-9],0\([at][0-9]\) +** ... +** lbu\t[at][0-9],14\([at][0-9]\) +** sb\t[at][0-9],14\([at][0-9]\) ** ... */ COPY_N(15) @@ -110,7 +115,12 @@ COPY_ALIGNED_N(15) /* **copy_27: ** ... -** (call|tail)\tmemcpy +** ld\t[at][0-9],16\([at][0-9]\) +** ... +** sd\t[at][0-9],16\([at][0-9]\) +** ... +** lbu\t[at][0-9],26\([at][0-9]\) +** sb\t[at][0-9],26\([at][0-9]\) ** ... 
*/
COPY_N(27)

From patchwork Wed May 8 05:17:55 2024
X-Patchwork-Submitter: Christoph Müllner
X-Patchwork-Id: 1932782
From: Christoph Müllner
To: gcc-patches@gcc.gnu.org, Kito Cheng, Jim Wilson, Palmer Dabbelt, Andrew Waterman, Philipp Tomsich, Jeff Law, Vineet Gupta
Cc: Christoph Müllner
Subject: [PATCH 3/4] RISC-V: tune: Add setting for overlapping mem ops to tuning struct
Date: Wed, 8 May 2024 07:17:55 +0200
Message-ID: <20240508051756.3999080-4-christoph.muellner@vrull.eu>
In-Reply-To: <20240508051756.3999080-1-christoph.muellner@vrull.eu>
References: <20240508051756.3999080-1-christoph.muellner@vrull.eu>

This patch adds the field overlap_op_by_pieces to struct
riscv_tune_param.  It is used by the TARGET_OVERLAP_OP_BY_PIECES_P()
hook, which the by-pieces infrastructure consults to decide whether
overlapping memory accesses should be emitted.

The new property is set to false in all tune structs except
generic-ooo.

The changes in the expansion can be seen in the adjustments of the
cpymem test cases.  These tests also reveal a limitation in the
current RISC-V cpymem expansion that prevents this optimization:
only by-pieces cpymem expansions emit overlapping memory accesses.

gcc/ChangeLog:

	* config/riscv/riscv.cc (struct riscv_tune_param): New field
	overlap_op_by_pieces.
	(riscv_overlap_op_by_pieces): New function.
	(TARGET_OVERLAP_OP_BY_PIECES_P): Connect to
	riscv_overlap_op_by_pieces.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/cpymem-32-ooo.c: Adjust for overlapping access.
	* gcc.target/riscv/cpymem-64-ooo.c: Likewise.
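To make the effect of the hook concrete: with overlapping accesses enabled, a 7-byte copy can be done with two 4-byte accesses covering bytes [0,4) and [3,7), overlapping at byte 3, instead of one 4-byte access plus narrower tail accesses. The following is a plain-C sketch of that access pattern only (illustration, not GCC internals; the function name is made up here):

```c
#include <stdint.h>
#include <string.h>

/* Copy exactly 7 bytes with two 4-byte accesses that overlap at
   byte 3 -- the pattern by-pieces emits when the overlap hook
   returns true (cf. the lw/sw at offsets 0 and 3 in the adjusted
   copy_7 test above).  */
static void
copy7_overlapping (void *dst, const void *src)
{
  uint32_t lo, hi;
  memcpy (&lo, src, 4);                     /* bytes [0,4) */
  memcpy (&hi, (const char *) src + 3, 4);  /* bytes [3,7) */
  memcpy (dst, &lo, 4);
  memcpy ((char *) dst + 3, &hi, 4);
}
```

The second store rewrites byte 3 with the same value, which is harmless for a copy and avoids the lbu/sb tail sequence.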
Signed-off-by: Christoph Müllner --- gcc/config/riscv/riscv.cc | 20 +++++++++++ .../gcc.target/riscv/cpymem-32-ooo.c | 20 +++++------ .../gcc.target/riscv/cpymem-64-ooo.c | 33 +++++++------------ 3 files changed, 40 insertions(+), 33 deletions(-) diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc index 44945d47fd6..793ec3155b9 100644 --- a/gcc/config/riscv/riscv.cc +++ b/gcc/config/riscv/riscv.cc @@ -286,6 +286,7 @@ struct riscv_tune_param unsigned short memory_cost; unsigned short fmv_cost; bool slow_unaligned_access; + bool overlap_op_by_pieces; bool use_divmod_expansion; unsigned int fusible_ops; const struct cpu_vector_cost *vec_costs; @@ -425,6 +426,7 @@ static const struct riscv_tune_param rocket_tune_info = { 5, /* memory_cost */ 8, /* fmv_cost */ true, /* slow_unaligned_access */ + false, /* overlap_op_by_pieces */ false, /* use_divmod_expansion */ RISCV_FUSE_NOTHING, /* fusible_ops */ NULL, /* vector cost */ @@ -442,6 +444,7 @@ static const struct riscv_tune_param sifive_7_tune_info = { 3, /* memory_cost */ 8, /* fmv_cost */ true, /* slow_unaligned_access */ + false, /* overlap_op_by_pieces */ false, /* use_divmod_expansion */ RISCV_FUSE_NOTHING, /* fusible_ops */ NULL, /* vector cost */ @@ -459,6 +462,7 @@ static const struct riscv_tune_param sifive_p400_tune_info = { 3, /* memory_cost */ 4, /* fmv_cost */ true, /* slow_unaligned_access */ + false, /* overlap_op_by_pieces */ false, /* use_divmod_expansion */ RISCV_FUSE_LUI_ADDI | RISCV_FUSE_AUIPC_ADDI, /* fusible_ops */ &generic_vector_cost, /* vector cost */ @@ -476,6 +480,7 @@ static const struct riscv_tune_param sifive_p600_tune_info = { 3, /* memory_cost */ 4, /* fmv_cost */ true, /* slow_unaligned_access */ + false, /* overlap_op_by_pieces */ false, /* use_divmod_expansion */ RISCV_FUSE_LUI_ADDI | RISCV_FUSE_AUIPC_ADDI, /* fusible_ops */ &generic_vector_cost, /* vector cost */ @@ -493,6 +498,7 @@ static const struct riscv_tune_param thead_c906_tune_info = { 5, /* memory_cost */ 8, /* 
fmv_cost */ false, /* slow_unaligned_access */ + false, /* overlap_op_by_pieces */ false, /* use_divmod_expansion */ RISCV_FUSE_NOTHING, /* fusible_ops */ NULL, /* vector cost */ @@ -510,6 +516,7 @@ static const struct riscv_tune_param xiangshan_nanhu_tune_info = { 3, /* memory_cost */ 3, /* fmv_cost */ true, /* slow_unaligned_access */ + false, /* overlap_op_by_pieces */ false, /* use_divmod_expansion */ RISCV_FUSE_ZEXTW | RISCV_FUSE_ZEXTH, /* fusible_ops */ NULL, /* vector cost */ @@ -527,6 +534,7 @@ static const struct riscv_tune_param generic_ooo_tune_info = { 4, /* memory_cost */ 4, /* fmv_cost */ false, /* slow_unaligned_access */ + true, /* overlap_op_by_pieces */ false, /* use_divmod_expansion */ RISCV_FUSE_NOTHING, /* fusible_ops */ &generic_vector_cost, /* vector cost */ @@ -544,6 +552,7 @@ static const struct riscv_tune_param optimize_size_tune_info = { 2, /* memory_cost */ 8, /* fmv_cost */ false, /* slow_unaligned_access */ + false, /* overlap_op_by_pieces */ false, /* use_divmod_expansion */ RISCV_FUSE_NOTHING, /* fusible_ops */ NULL, /* vector cost */ @@ -9923,6 +9932,14 @@ riscv_slow_unaligned_access (machine_mode, unsigned int) return riscv_slow_unaligned_access_p; } +/* Implement TARGET_OVERLAP_OP_BY_PIECES_P. */ + +static bool +riscv_overlap_op_by_pieces (void) +{ + return tune_param->overlap_op_by_pieces; +} + /* Implement TARGET_CAN_CHANGE_MODE_CLASS. 
*/ static bool @@ -11340,6 +11357,9 @@ riscv_get_raw_result_mode (int regno) #undef TARGET_SLOW_UNALIGNED_ACCESS #define TARGET_SLOW_UNALIGNED_ACCESS riscv_slow_unaligned_access +#undef TARGET_OVERLAP_OP_BY_PIECES_P +#define TARGET_OVERLAP_OP_BY_PIECES_P riscv_overlap_op_by_pieces + #undef TARGET_SECONDARY_MEMORY_NEEDED #define TARGET_SECONDARY_MEMORY_NEEDED riscv_secondary_memory_needed diff --git a/gcc/testsuite/gcc.target/riscv/cpymem-32-ooo.c b/gcc/testsuite/gcc.target/riscv/cpymem-32-ooo.c index 946a773f77a..947d58c30fa 100644 --- a/gcc/testsuite/gcc.target/riscv/cpymem-32-ooo.c +++ b/gcc/testsuite/gcc.target/riscv/cpymem-32-ooo.c @@ -24,9 +24,8 @@ void copy_aligned_##N (void *to, void *from) \ ** ... ** lw\t[at][0-9],0\([at][0-9]\) ** sw\t[at][0-9],0\([at][0-9]\) -** ... -** lbu\t[at][0-9],6\([at][0-9]\) -** sb\t[at][0-9],6\([at][0-9]\) +** lw\t[at][0-9],3\([at][0-9]\) +** sw\t[at][0-9],3\([at][0-9]\) ** ... */ COPY_N(7) @@ -36,9 +35,8 @@ COPY_N(7) ** ... ** lw\t[at][0-9],0\([at][0-9]\) ** sw\t[at][0-9],0\([at][0-9]\) -** ... -** lbu\t[at][0-9],6\([at][0-9]\) -** sb\t[at][0-9],6\([at][0-9]\) +** lw\t[at][0-9],3\([at][0-9]\) +** sw\t[at][0-9],3\([at][0-9]\) ** ... */ COPY_ALIGNED_N(7) @@ -66,11 +64,10 @@ COPY_ALIGNED_N(8) ** ... ** ... ** lw\t[at][0-9],0\([at][0-9]\) -** ... ** sw\t[at][0-9],0\([at][0-9]\) ** ... -** lbu\t[at][0-9],10\([at][0-9]\) -** sb\t[at][0-9],10\([at][0-9]\) +** lw\t[at][0-9],7\([at][0-9]\) +** sw\t[at][0-9],7\([at][0-9]\) ** ... */ COPY_N(11) @@ -79,11 +76,10 @@ COPY_N(11) **copy_aligned_11: ** ... ** lw\t[at][0-9],0\([at][0-9]\) -** ... ** sw\t[at][0-9],0\([at][0-9]\) ** ... -** lbu\t[at][0-9],10\([at][0-9]\) -** sb\t[at][0-9],10\([at][0-9]\) +** lw\t[at][0-9],7\([at][0-9]\) +** sw\t[at][0-9],7\([at][0-9]\) ** ... 
*/ COPY_ALIGNED_N(11) diff --git a/gcc/testsuite/gcc.target/riscv/cpymem-64-ooo.c b/gcc/testsuite/gcc.target/riscv/cpymem-64-ooo.c index 08a927b9483..108748690cd 100644 --- a/gcc/testsuite/gcc.target/riscv/cpymem-64-ooo.c +++ b/gcc/testsuite/gcc.target/riscv/cpymem-64-ooo.c @@ -24,9 +24,8 @@ void copy_aligned_##N (void *to, void *from) \ ** ... ** lw\t[at][0-9],0\([at][0-9]\) ** sw\t[at][0-9],0\([at][0-9]\) -** ... -** lbu\t[at][0-9],6\([at][0-9]\) -** sb\t[at][0-9],6\([at][0-9]\) +** lw\t[at][0-9],3\([at][0-9]\) +** sw\t[at][0-9],3\([at][0-9]\) ** ... */ COPY_N(7) @@ -36,9 +35,8 @@ COPY_N(7) ** ... ** lw\t[at][0-9],0\([at][0-9]\) ** sw\t[at][0-9],0\([at][0-9]\) -** ... -** lbu\t[at][0-9],6\([at][0-9]\) -** sb\t[at][0-9],6\([at][0-9]\) +** lw\t[at][0-9],3\([at][0-9]\) +** sw\t[at][0-9],3\([at][0-9]\) ** ... */ COPY_ALIGNED_N(7) @@ -66,9 +64,8 @@ COPY_ALIGNED_N(8) ** ... ** ld\t[at][0-9],0\([at][0-9]\) ** sd\t[at][0-9],0\([at][0-9]\) -** ... -** lbu\t[at][0-9],10\([at][0-9]\) -** sb\t[at][0-9],10\([at][0-9]\) +** lw\t[at][0-9],7\([at][0-9]\) +** sw\t[at][0-9],7\([at][0-9]\) ** ... */ COPY_N(11) @@ -77,11 +74,9 @@ COPY_N(11) **copy_aligned_11: ** ... ** ld\t[at][0-9],0\([at][0-9]\) -** ... ** sd\t[at][0-9],0\([at][0-9]\) -** ... -** lbu\t[at][0-9],10\([at][0-9]\) -** sb\t[at][0-9],10\([at][0-9]\) +** lw\t[at][0-9],7\([at][0-9]\) +** sw\t[at][0-9],7\([at][0-9]\) ** ... */ COPY_ALIGNED_N(11) @@ -90,11 +85,9 @@ COPY_ALIGNED_N(11) **copy_15: ** ... ** ld\t[at][0-9],0\([at][0-9]\) -** ... ** sd\t[at][0-9],0\([at][0-9]\) -** ... -** lbu\t[at][0-9],14\([at][0-9]\) -** sb\t[at][0-9],14\([at][0-9]\) +** ld\t[at][0-9],7\([at][0-9]\) +** sd\t[at][0-9],7\([at][0-9]\) ** ... */ COPY_N(15) @@ -103,11 +96,9 @@ COPY_N(15) **copy_aligned_15: ** ... ** ld\t[at][0-9],0\([at][0-9]\) -** ... ** sd\t[at][0-9],0\([at][0-9]\) -** ... -** lbu\t[at][0-9],14\([at][0-9]\) -** sb\t[at][0-9],14\([at][0-9]\) +** ld\t[at][0-9],7\([at][0-9]\) +** sd\t[at][0-9],7\([at][0-9]\) ** ... 
*/
COPY_ALIGNED_N(15)

From patchwork Wed May 8 05:17:56 2024
X-Patchwork-Submitter: Christoph Müllner
X-Patchwork-Id: 1932783
From: Christoph Müllner
To: gcc-patches@gcc.gnu.org, Kito Cheng, Jim Wilson, Palmer Dabbelt, Andrew Waterman, Philipp Tomsich, Jeff Law, Vineet Gupta
Cc: Christoph Müllner
Subject: [PATCH 4/4] RISC-V: Allow by-pieces to do overlapping accesses in block_move_straight
Date: Wed, 8 May 2024 07:17:56 +0200
Message-ID: <20240508051756.3999080-5-christoph.muellner@vrull.eu>
In-Reply-To: <20240508051756.3999080-1-christoph.muellner@vrull.eu>
References: <20240508051756.3999080-1-christoph.muellner@vrull.eu>

The current implementation of riscv_block_move_straight() emits a
sequence of loads/stores of maximum width (e.g. 8 bytes on RV64).
The remainder is handed over to move_by_pieces().
The by-pieces framework utilizes target hooks to decide about the
emitted instructions (e.g. unaligned accesses or overlapping
accesses).

Since the current implementation always requests fewer than XLEN bytes
to be handled by the by-pieces infrastructure, overlapping memory
accesses can never be emitted there (the by-pieces code does not know
of any previous instructions that were emitted by the backend).

This patch changes the implementation of riscv_block_move_straight()
such that it hands the data over to the by-pieces framework once less
than 2*XLEN bytes remain, which is sufficient to enable overlapping
memory accesses (provided their requirements are met).

The changes in the expansion can be seen in the adjustments of the
cpymem-NN-ooo test cases.  The changes in the cpymem-NN tests are
caused by the different instruction ordering of the code emitted by
the by-pieces infrastructure, which emits alternating load/store
sequences.

gcc/ChangeLog:

	* config/riscv/riscv-string.cc (riscv_block_move_straight): Hand
	over up to 2xXLEN bytes to move_by_pieces().

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/cpymem-32-ooo.c: Adjustments for overlapping
	access.
	* gcc.target/riscv/cpymem-32.c: Adjustments for code emitted by
	by-pieces.
	* gcc.target/riscv/cpymem-64-ooo.c: Adjustments for overlapping
	access.
	* gcc.target/riscv/cpymem-64.c: Adjustments for code emitted by
	by-pieces.
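The adjusted loop bound can be mimicked in plain C. This is a hedged sketch of the strategy only, not the GCC code (the function name and the `uint64_t` word type standing in for an RV64 register are assumptions here): word-sized moves run while at least two full words remain, and a tail of at least one word is covered by two possibly overlapping word accesses, the way by-pieces would emit them.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sketch of the copy strategy: straight word moves while at least two
   full words remain (mirrors "offset + 2 * delta <= length"), then a
   tail of [w, 2w) bytes done as two word accesses that may overlap,
   or a plain byte tail when less than one word is left.  */
static void
block_move_straight_sketch (char *dst, const char *src, size_t n)
{
  const size_t w = sizeof (uint64_t);   /* word size, XLEN/8 on RV64 */
  size_t off = 0;
  while (off + 2 * w <= n)
    {
      uint64_t tmp;
      memcpy (&tmp, src + off, w);
      memcpy (dst + off, &tmp, w);
      off += w;
    }
  if (n - off >= w)
    {
      /* Tail is in [w, 2w): one aligned word plus one word ending at
         byte n; the two accesses overlap unless n - off == w.  */
      uint64_t a, b;
      memcpy (&a, src + off, w);
      memcpy (&b, src + n - w, w);
      memcpy (dst + off, &a, w);
      memcpy (dst + n - w, &b, w);
    }
  else
    memcpy (dst + off, src + off, n - off);   /* tail shorter than a word */
}
```

For n = 27 and w = 8 this performs word moves at offsets 0, 8, and 16, then an overlapping word at offset 19 -- the same shape as the ld/sd at 16 followed by an access ending at byte 27 in the adjusted copy_27 tests.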
Signed-off-by: Christoph Müllner --- gcc/config/riscv/riscv-string.cc | 6 +++--- gcc/testsuite/gcc.target/riscv/cpymem-32-ooo.c | 16 ++++++++-------- gcc/testsuite/gcc.target/riscv/cpymem-32.c | 10 ++++------ gcc/testsuite/gcc.target/riscv/cpymem-64-ooo.c | 8 ++++---- gcc/testsuite/gcc.target/riscv/cpymem-64.c | 9 +++------ 5 files changed, 22 insertions(+), 27 deletions(-) diff --git a/gcc/config/riscv/riscv-string.cc b/gcc/config/riscv/riscv-string.cc index 8fc0877772f..38cf60eb9cf 100644 --- a/gcc/config/riscv/riscv-string.cc +++ b/gcc/config/riscv/riscv-string.cc @@ -630,18 +630,18 @@ riscv_block_move_straight (rtx dest, rtx src, unsigned HOST_WIDE_INT length, delta = bits / BITS_PER_UNIT; /* Allocate a buffer for the temporary registers. */ - regs = XALLOCAVEC (rtx, length / delta); + regs = XALLOCAVEC (rtx, length / delta - 1); /* Load as many BITS-sized chunks as possible. Use a normal load if the source has enough alignment, otherwise use left/right pairs. */ - for (offset = 0, i = 0; offset + delta <= length; offset += delta, i++) + for (offset = 0, i = 0; offset + 2 * delta <= length; offset += delta, i++) { regs[i] = gen_reg_rtx (mode); riscv_emit_move (regs[i], adjust_address (src, mode, offset)); } /* Copy the chunks to the destination. */ - for (offset = 0, i = 0; offset + delta <= length; offset += delta, i++) + for (offset = 0, i = 0; offset + 2 * delta <= length; offset += delta, i++) riscv_emit_move (adjust_address (dest, mode, offset), regs[i]); /* Mop up any left-over bytes. */ diff --git a/gcc/testsuite/gcc.target/riscv/cpymem-32-ooo.c b/gcc/testsuite/gcc.target/riscv/cpymem-32-ooo.c index 947d58c30fa..2a48567353a 100644 --- a/gcc/testsuite/gcc.target/riscv/cpymem-32-ooo.c +++ b/gcc/testsuite/gcc.target/riscv/cpymem-32-ooo.c @@ -91,8 +91,8 @@ COPY_ALIGNED_N(11) ** ... ** sw\t[at][0-9],0\([at][0-9]\) ** ... 
-** lbu\t[at][0-9],14\([at][0-9]\) -** sb\t[at][0-9],14\([at][0-9]\) +** lw\t[at][0-9],11\([at][0-9]\) +** sw\t[at][0-9],11\([at][0-9]\) ** ... */ COPY_N(15) @@ -104,8 +104,8 @@ COPY_N(15) ** ... ** sw\t[at][0-9],0\([at][0-9]\) ** ... -** lbu\t[at][0-9],14\([at][0-9]\) -** sb\t[at][0-9],14\([at][0-9]\) +** lw\t[at][0-9],11\([at][0-9]\) +** sw\t[at][0-9],11\([at][0-9]\) ** ... */ COPY_ALIGNED_N(15) @@ -117,8 +117,8 @@ COPY_ALIGNED_N(15) ** ... ** sw\t[at][0-9],20\([at][0-9]\) ** ... -** lbu\t[at][0-9],26\([at][0-9]\) -** sb\t[at][0-9],26\([at][0-9]\) +** lw\t[at][0-9],23\([at][0-9]\) +** sw\t[at][0-9],23\([at][0-9]\) ** ... */ COPY_N(27) @@ -130,8 +130,8 @@ COPY_N(27) ** ... ** sw\t[at][0-9],20\([at][0-9]\) ** ... -** lbu\t[at][0-9],26\([at][0-9]\) -** sb\t[at][0-9],26\([at][0-9]\) +** lw\t[at][0-9],23\([at][0-9]\) +** sw\t[at][0-9],23\([at][0-9]\) ** ... */ COPY_ALIGNED_N(27) diff --git a/gcc/testsuite/gcc.target/riscv/cpymem-32.c b/gcc/testsuite/gcc.target/riscv/cpymem-32.c index 44ba14a1d51..2030a39ca97 100644 --- a/gcc/testsuite/gcc.target/riscv/cpymem-32.c +++ b/gcc/testsuite/gcc.target/riscv/cpymem-32.c @@ -24,10 +24,10 @@ void copy_aligned_##N (void *to, void *from) \ ** ... ** lbu\t[at][0-9],0\([at][0-9]\) ** ... -** lbu\t[at][0-9],6\([at][0-9]\) -** ... ** sb\t[at][0-9],0\([at][0-9]\) ** ... +** lbu\t[at][0-9],6\([at][0-9]\) +** ... ** sb\t[at][0-9],6\([at][0-9]\) ** ... */ @@ -50,10 +50,9 @@ COPY_ALIGNED_N(7) ** ... ** lbu\t[at][0-9],0\([at][0-9]\) ** ... -** lbu\t[at][0-9],7\([at][0-9]\) -** ... ** sb\t[at][0-9],0\([at][0-9]\) ** ... +** lbu\t[at][0-9],7\([at][0-9]\) ** sb\t[at][0-9],7\([at][0-9]\) ** ... */ @@ -73,10 +72,9 @@ COPY_ALIGNED_N(8) ** ... ** lbu\t[at][0-9],0\([at][0-9]\) ** ... -** lbu\t[at][0-9],10\([at][0-9]\) -** ... ** sb\t[at][0-9],0\([at][0-9]\) ** ... +** lbu\t[at][0-9],10\([at][0-9]\) ** sb\t[at][0-9],10\([at][0-9]\) ** ... 
*/ diff --git a/gcc/testsuite/gcc.target/riscv/cpymem-64-ooo.c b/gcc/testsuite/gcc.target/riscv/cpymem-64-ooo.c index 108748690cd..147324093cb 100644 --- a/gcc/testsuite/gcc.target/riscv/cpymem-64-ooo.c +++ b/gcc/testsuite/gcc.target/riscv/cpymem-64-ooo.c @@ -110,8 +110,8 @@ COPY_ALIGNED_N(15) ** ... ** sd\t[at][0-9],16\([at][0-9]\) ** ... -** lbu\t[at][0-9],26\([at][0-9]\) -** sb\t[at][0-9],26\([at][0-9]\) +** lw\t[at][0-9],23\([at][0-9]\) +** sw\t[at][0-9],23\([at][0-9]\) ** ... */ COPY_N(27) @@ -123,8 +123,8 @@ COPY_N(27) ** ... ** sd\t[at][0-9],16\([at][0-9]\) ** ... -** lbu\t[at][0-9],26\([at][0-9]\) -** sb\t[at][0-9],26\([at][0-9]\) +** lw\t[at][0-9],23\([at][0-9]\) +** sw\t[at][0-9],23\([at][0-9]\) ** ... */ COPY_ALIGNED_N(27) diff --git a/gcc/testsuite/gcc.target/riscv/cpymem-64.c b/gcc/testsuite/gcc.target/riscv/cpymem-64.c index bdfaca0d46a..37b8ef0e020 100644 --- a/gcc/testsuite/gcc.target/riscv/cpymem-64.c +++ b/gcc/testsuite/gcc.target/riscv/cpymem-64.c @@ -24,10 +24,9 @@ void copy_aligned_##N (void *to, void *from) \ ** ... ** lbu\t[at][0-9],0\([at][0-9]\) ** ... -** lbu\t[at][0-9],6\([at][0-9]\) -** ... ** sb\t[at][0-9],0\([at][0-9]\) ** ... +** lbu\t[at][0-9],6\([at][0-9]\) ** sb\t[at][0-9],6\([at][0-9]\) ** ... */ @@ -50,10 +49,9 @@ COPY_ALIGNED_N(7) ** ... ** lbu\t[at][0-9],0\([at][0-9]\) ** ... -** lbu\t[at][0-9],7\([at][0-9]\) -** ... ** sb\t[at][0-9],0\([at][0-9]\) ** ... +** lbu\t[at][0-9],7\([at][0-9]\) ** sb\t[at][0-9],7\([at][0-9]\) ** ... */ @@ -73,10 +71,9 @@ COPY_ALIGNED_N(8) ** ... ** lbu\t[at][0-9],0\([at][0-9]\) ** ... -** lbu\t[at][0-9],10\([at][0-9]\) -** ... ** sb\t[at][0-9],0\([at][0-9]\) ** ... +** lbu\t[at][0-9],10\([at][0-9]\) ** sb\t[at][0-9],10\([at][0-9]\) ** ... */