From patchwork Tue Jan 30 14:31:30 2024
X-Patchwork-Submitter: "Andre Vieira (lists)"
X-Patchwork-Id: 1893010
From: Andre Vieira
To: gcc-patches@gcc.gnu.org
Cc: Richard.Sandiford@arm.com, rguenther@suse.de, Andre Vieira
Subject: [PATCH 1/3] vect: Pass stmt_vec_info to TARGET_SIMD_CLONE_USABLE
Date: Tue, 30 Jan 2024 14:31:30 +0000
Message-Id: <20240130143132.9575-2-andre.simoesdiasvieira@arm.com>
In-Reply-To: <20240130143132.9575-1-andre.simoesdiasvieira@arm.com>
References: <20240130143132.9575-1-andre.simoesdiasvieira@arm.com>
List-Id: Gcc-patches mailing list

This patch passes the stmt_vec_info to TARGET_SIMD_CLONE_USABLE so that the
target can reject a simd clone based on the vector mode in use.  This is
needed because, for VLS SVE vectorization, the vectorizer accepts Advanced
SIMD simd clones when vectorizing with SVE types, since the simdlens might
match.  This causes type errors later on.

Other targets do not currently need to use this argument.

gcc/ChangeLog:

	* target.def (TARGET_SIMD_CLONE_USABLE): Add argument.
	* tree-vect-stmts.cc (vectorizable_simd_clone_call): Pass stmt_info to
	call TARGET_SIMD_CLONE_USABLE.
	* config/aarch64/aarch64.cc (aarch64_simd_clone_usable): Add argument
	and use it to reject the use of Advanced SIMD simd clones with SVE
	modes.
	* config/gcn/gcn.cc (gcn_simd_clone_usable): Add unused argument.
	* config/i386/i386.cc (ix86_simd_clone_usable): Likewise.

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index a37d47b243e..31617510160 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -28694,13 +28694,16 @@ aarch64_simd_clone_adjust (struct cgraph_node *node)
 /* Implement TARGET_SIMD_CLONE_USABLE.  */

 static int
-aarch64_simd_clone_usable (struct cgraph_node *node)
+aarch64_simd_clone_usable (struct cgraph_node *node, stmt_vec_info stmt_vinfo)
 {
   switch (node->simdclone->vecsize_mangle)
     {
     case 'n':
       if (!TARGET_SIMD)
         return -1;
+      if (STMT_VINFO_VECTYPE (stmt_vinfo)
+          && aarch64_sve_mode_p (TYPE_MODE (STMT_VINFO_VECTYPE (stmt_vinfo))))
+        return -1;
       return 0;
     default:
       gcc_unreachable ();
diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc
index e80de2ce056..c48b212d9e6 100644
--- a/gcc/config/gcn/gcn.cc
+++ b/gcc/config/gcn/gcn.cc
@@ -5658,7 +5658,8 @@ gcn_simd_clone_adjust (struct cgraph_node *ARG_UNUSED (node))
 /* Implement TARGET_SIMD_CLONE_USABLE.  */

 static int
-gcn_simd_clone_usable (struct cgraph_node *ARG_UNUSED (node))
+gcn_simd_clone_usable (struct cgraph_node *ARG_UNUSED (node),
+                       stmt_vec_info ARG_UNUSED (stmt_vinfo))
 {
   /* We don't need to do anything here because
      gcn_simd_clone_compute_vecsize_and_simdlen currently only returns one
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index b3e7c74846e..63e6b9d2643 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -25193,7 +25193,8 @@ ix86_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
    slightly less desirable, etc.).  */

 static int
-ix86_simd_clone_usable (struct cgraph_node *node)
+ix86_simd_clone_usable (struct cgraph_node *node,
+                        stmt_vec_info ARG_UNUSED (stmt_vinfo))
 {
   switch (node->simdclone->vecsize_mangle)
     {
diff --git a/gcc/target.def b/gcc/target.def
index fdad7bbc93e..4fade9c4eec 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -1648,7 +1648,7 @@ DEFHOOK
 in vectorized loops in current function, or non-negative number if it is\n\
 usable.  In that case, the smaller the number is, the more desirable it is\n\
 to use it.",
-int, (struct cgraph_node *), NULL)
+int, (struct cgraph_node *, _stmt_vec_info *), NULL)

 HOOK_VECTOR_END (simd_clone)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 1dbe1115da4..da02082c034 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -4074,7 +4074,7 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
           this_badness += floor_log2 (num_calls) * 4096;
         if (n->simdclone->inbranch)
           this_badness += 8192;
-        int target_badness = targetm.simd_clone.usable (n);
+        int target_badness = targetm.simd_clone.usable (n, stmt_info);
         if (target_badness < 0)
           continue;
         this_badness += target_badness * 512;

From patchwork Tue Jan 30 14:31:31 2024
X-Patchwork-Submitter: "Andre Vieira (lists)"
X-Patchwork-Id: 1893011
From: Andre Vieira
To: gcc-patches@gcc.gnu.org
Cc: Richard.Sandiford@arm.com, rguenther@suse.de, Andre Vieira
Subject: [PATCH 2/3] vect: disable multiple calls of poly simdclones
Date: Tue, 30 Jan 2024 14:31:31 +0000
Message-Id: <20240130143132.9575-3-andre.simoesdiasvieira@arm.com>
In-Reply-To: <20240130143132.9575-1-andre.simoesdiasvieira@arm.com>
References: <20240130143132.9575-1-andre.simoesdiasvieira@arm.com>

The current codegen support for VFs that are multiples of a simdclone's
simdlen relies on BIT_FIELD_REF to create multiple input vectors.  This does
not work for simdclones with a non-constant simdlen, so we should disable
using such clones when the VF is a multiple (greater than one) of the
non-constant simdlen until we change the codegen to support those.

gcc/ChangeLog:

	* tree-vect-stmts.cc (vectorizable_simd_clone_call): Reject simdclones
	with non-constant simdlen when VF is not exactly the same.

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index da02082c034..9bfb898683d 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -4068,7 +4068,10 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
         if (!constant_multiple_p (vf * group_size, n->simdclone->simdlen,
                                   &num_calls)
             || (!n->simdclone->inbranch && (masked_call_offset > 0))
-            || (nargs != simd_nargs))
+            || (nargs != simd_nargs)
+            /* Currently we do not support multiple calls of non-constant
+               simdlen as poly vectors can not be accessed by BIT_FIELD_REF.  */
+            || (!n->simdclone->simdlen.is_constant () && num_calls != 1))
           continue;
         if (num_calls != 1)
           this_badness += floor_log2 (num_calls) * 4096;

From patchwork Tue Jan 30 14:31:32 2024
X-Patchwork-Submitter: "Andre Vieira (lists)"
X-Patchwork-Id: 1893012
From: Andre Vieira
To: gcc-patches@gcc.gnu.org
Cc: Richard.Sandiford@arm.com, rguenther@suse.de, Andre Vieira
Subject: [PATCH 3/3] aarch64: Add SVE support for simd clones [PR 96342]
Date: Tue, 30 Jan 2024 14:31:32 +0000
Message-Id: <20240130143132.9575-4-andre.simoesdiasvieira@arm.com>
In-Reply-To: <20240130143132.9575-1-andre.simoesdiasvieira@arm.com>
References: <20240130143132.9575-1-andre.simoesdiasvieira@arm.com>

This patch finalizes adding support for the generation of SVE simd clones when
no simdlen is provided, following the ABI rules where the widest data type
determines the minimum number of elements in a length-agnostic vector.

gcc/ChangeLog:

	* config/aarch64/aarch64-protos.h (add_sve_type_attribute): Declare.
	* config/aarch64/aarch64-sve-builtins.cc (add_sve_type_attribute): Make
	visibility global and support use for non_acle types.
	* config/aarch64/aarch64.cc
	(aarch64_simd_clone_compute_vecsize_and_simdlen): Create VLA simd clone
	when no simdlen is provided, according to ABI rules.
	(simd_clone_adjust_sve_vector_type): New helper function.
	(aarch64_simd_clone_adjust): Add '+sve' attribute to SVE simd clones
	and modify types to use SVE types.
	* omp-simd-clone.cc (simd_clone_mangle): Print 'x' for VLA simdlen.
	(simd_clone_adjust): Adapt safelen check to be compatible with VLA
	simdlen.

gcc/testsuite/ChangeLog:

	* c-c++-common/gomp/declare-variant-14.c: Make i?86 and x86_64 target
	only test.
	* gfortran.dg/gomp/declare-variant-14.f90: Likewise.
	* gcc.target/aarch64/declare-simd-2.c: Add SVE clone scan.
	* gcc.target/aarch64/vect-simd-clone-1.c: New test.

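As a reading aid for the clone names scanned for in the new test, here is a
minimal standalone sketch in plain C.  It is not GCC code: min_sve_lanes and
mangle_clone are hypothetical helpers invented for illustration, and the sketch
assumes the 128-bit architectural minimum SVE vector length.  It mirrors
exact_div (aarch64_sve_vg * 64, wds_elt_bits) evaluated at that minimum length,
and the 'x' that simd_clone_mangle prints for a non-constant simdlen.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of GCC): minimum lane count of an SVE clone
   whose widest data type is WDS_BITS wide, assuming the 128-bit minimum
   SVE vector length.  */
static unsigned
min_sve_lanes (unsigned wds_bits)
{
  assert (wds_bits != 0 && 128 % wds_bits == 0);
  return 128 / wds_bits;
}

/* Hypothetical helper (not part of GCC): write
   "_ZGV<isa><mask><len><one 'v' per vector argument>_<name>" into BUF.
   SIMDLEN == 0 stands for a length-agnostic clone and is printed as 'x'.
   Only plain vector arguments are modelled here.  */
static void
mangle_clone (char *buf, size_t size, char isa, char mask,
              unsigned simdlen, unsigned nvecargs, const char *name)
{
  char len[16];
  if (simdlen == 0)
    snprintf (len, sizeof len, "x");
  else
    snprintf (len, sizeof len, "%u", simdlen);

  /* Zero-initialised, so the string stays NUL-terminated.  */
  char args[32] = "";
  for (unsigned i = 0; i < nvecargs && i + 1 < sizeof args; ++i)
    args[i] = 'v';

  snprintf (buf, size, "_ZGV%c%c%s%s_%s", isa, mask, len, args, name);
}
```

For instance, mangle_clone (buf, sizeof buf, 's', 'M', 0, 2, "fn1") produces
_ZGVsMxvv_fn1, the symbol scanned for in vect-simd-clone-1.c.  The mask letter
is 'M' because SVE clones always set inbranch.  With int as the widest type,
min_sve_lanes (32) gives 4 lanes.
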
diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index a0b142e0b94..207396de0ff 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -1031,6 +1031,8 @@ namespace aarch64_sve {
 #ifdef GCC_TARGET_H
   bool verify_type_context (location_t, type_context_kind, const_tree, bool);
 #endif
+  void add_sve_type_attribute (tree, unsigned int, unsigned int,
+                               const char *, const char *);
 }

 extern void aarch64_split_combinev16qi (rtx operands[3]);
diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc b/gcc/config/aarch64/aarch64-sve-builtins.cc
index 11f5c5c500c..747131e684e 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins.cc
@@ -953,14 +953,16 @@ static bool reported_missing_registers_p;
 /* Record that TYPE is an ABI-defined SVE type that contains NUM_ZR SVE vectors
    and NUM_PR SVE predicates.  MANGLED_NAME, if nonnull, is the ABI-defined
    mangling of the type.  ACLE_NAME is the name of the type.  */
-static void
+void
 add_sve_type_attribute (tree type, unsigned int num_zr, unsigned int num_pr,
                         const char *mangled_name, const char *acle_name)
 {
   tree mangled_name_tree
     = (mangled_name ? get_identifier (mangled_name) : NULL_TREE);
+  tree acle_name_tree
+    = (acle_name ? get_identifier (acle_name) : NULL_TREE);

-  tree value = tree_cons (NULL_TREE, get_identifier (acle_name), NULL_TREE);
+  tree value = tree_cons (NULL_TREE, acle_name_tree, NULL_TREE);
   value = tree_cons (NULL_TREE, mangled_name_tree, value);
   value = tree_cons (NULL_TREE, size_int (num_pr), value);
   value = tree_cons (NULL_TREE, size_int (num_zr), value);
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 31617510160..cba8879ab33 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -28527,7 +28527,7 @@ aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
                                         int num, bool explicit_p)
 {
   tree t, ret_type;
-  unsigned int nds_elt_bits;
+  unsigned int nds_elt_bits, wds_elt_bits;
   unsigned HOST_WIDE_INT const_simdlen;

   if (!TARGET_SIMD)
@@ -28572,10 +28572,14 @@ aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
   if (TREE_CODE (ret_type) != VOID_TYPE)
     {
       nds_elt_bits = lane_size (SIMD_CLONE_ARG_TYPE_VECTOR, ret_type);
+      wds_elt_bits = nds_elt_bits;
       vec_elts.safe_push (std::make_pair (ret_type, nds_elt_bits));
     }
   else
-    nds_elt_bits = POINTER_SIZE;
+    {
+      nds_elt_bits = POINTER_SIZE;
+      wds_elt_bits = 0;
+    }

   int i;
   tree type_arg_types = TYPE_ARG_TYPES (TREE_TYPE (node->decl));
@@ -28583,44 +28587,72 @@ aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
   for (t = (decl_arg_p ? DECL_ARGUMENTS (node->decl) : type_arg_types), i = 0;
        t && t != void_list_node; t = TREE_CHAIN (t), i++)
     {
-      tree arg_type = decl_arg_p ? TREE_TYPE (t) : TREE_VALUE (t);
+      tree type = decl_arg_p ? TREE_TYPE (t) : TREE_VALUE (t);
       if (clonei->args[i].arg_type != SIMD_CLONE_ARG_TYPE_UNIFORM
-          && !supported_simd_type (arg_type))
+          && !supported_simd_type (type))
         {
           if (!explicit_p)
             ;
-          else if (COMPLEX_FLOAT_TYPE_P (ret_type))
+          else if (COMPLEX_FLOAT_TYPE_P (type))
             warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
                         "GCC does not currently support argument type %qT "
-                        "for simd", arg_type);
+                        "for simd", type);
           else
             warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
                         "unsupported argument type %qT for simd",
-                        arg_type);
+                        type);
           return 0;
         }
-      unsigned lane_bits = lane_size (clonei->args[i].arg_type, arg_type);
+      unsigned lane_bits = lane_size (clonei->args[i].arg_type, type);
       if (clonei->args[i].arg_type == SIMD_CLONE_ARG_TYPE_VECTOR)
-        vec_elts.safe_push (std::make_pair (arg_type, lane_bits));
+        vec_elts.safe_push (std::make_pair (type, lane_bits));
       if (nds_elt_bits > lane_bits)
         nds_elt_bits = lane_bits;
+      if (wds_elt_bits < lane_bits)
+        wds_elt_bits = lane_bits;
     }

-  clonei->vecsize_mangle = 'n';
+  /* If we could not determine the WDS type from available parameters/return,
+     then fallback to using uintptr_t.  */
+  if (wds_elt_bits == 0)
+    wds_elt_bits = POINTER_SIZE;
+
   clonei->mask_mode = VOIDmode;
   poly_uint64 simdlen;
-  auto_vec<poly_uint64> simdlens (2);
+  auto_vec<poly_uint64> simdlens (3);
+  auto_vec<char> simdmangle (3);
   /* Keep track of the possible simdlens the clones of this function can have,
      and check them later to see if we support them.  */
   if (known_eq (clonei->simdlen, 0U))
     {
       simdlen = exact_div (poly_uint64 (64), nds_elt_bits);
       if (maybe_ne (simdlen, 1U))
-        simdlens.safe_push (simdlen);
+        {
+          simdlens.safe_push (simdlen);
+          simdmangle.safe_push ('n');
+        }
       simdlens.safe_push (simdlen * 2);
+      simdmangle.safe_push ('n');
+      /* Only create a SVE simd clone if we aren't dealing with an unprototyped
+         function.
+         We have also disabled support for creating SVE simdclones for functions
+         with function bodies and any simdclones when -msve-vector-bits is used.
+         TODO: add support for these.  */
+      if ((DECL_ARGUMENTS (node->decl) != 0
+           || type_arg_types != 0)
+          && !node->definition
+          && !aarch64_sve_vg.is_constant ())
+        {
+          poly_uint64 sve_simdlen = aarch64_sve_vg * 64;
+          simdlens.safe_push (exact_div (sve_simdlen, wds_elt_bits));
+          simdmangle.safe_push ('s');
+        }
     }
   else
-    simdlens.safe_push (clonei->simdlen);
+    {
+      simdlens.safe_push (clonei->simdlen);
+      simdmangle.safe_push ('n');
+    }

   clonei->vecsize_int = 0;
   clonei->vecsize_float = 0;
@@ -28638,7 +28670,8 @@ aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
     {
       bool remove_simdlen = false;
       for (auto elt : vec_elts)
-        if (known_gt (simdlens[j] * elt.second, 128U))
+        if (simdmangle[j] == 'n'
+            && known_gt (simdlens[j] * elt.second, 128U))
           {
             /* Don't issue a warning for every simdclone when there is no
                specific simdlen clause.  */
@@ -28651,12 +28684,14 @@ aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
             break;
           }
       if (remove_simdlen)
-        simdlens.ordered_remove (j);
+        {
+          simdlens.ordered_remove (j);
+          simdmangle.ordered_remove (j);
+        }
       else
         j++;
     }

-
   int count = simdlens.length ();
   if (count == 0)
     {
@@ -28675,20 +28710,107 @@ aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
   gcc_assert (num < count);

   clonei->simdlen = simdlens[num];
+  clonei->vecsize_mangle = simdmangle[num];
+  /* SVE simdclones always have a Mask, so set inbranch to 1.  */
+  if (clonei->vecsize_mangle == 's')
+    clonei->inbranch = 1;
   return count;
 }

+static tree
+simd_clone_adjust_sve_vector_type (tree type, bool is_mask, poly_uint64 simdlen)
+{
+  unsigned int num_zr = 0;
+  unsigned int num_pr = 0;
+  machine_mode vector_mode;
+  type = TREE_TYPE (type);
+  scalar_mode scalar_m = as_a <scalar_mode> (TYPE_MODE (type));
+  gcc_assert (aarch64_sve_data_mode (scalar_m,
+                                     simdlen).exists (&vector_mode));
+  type = build_vector_type_for_mode (type, vector_mode);
+  if (is_mask)
+    {
+      type = truth_type_for (type);
+      num_pr = 1;
+    }
+  else
+    num_zr = 1;
+
+  aarch64_sve::add_sve_type_attribute (type, num_zr, num_pr, NULL,
+                                       NULL);
+  return type;
+}
+
 /* Implement TARGET_SIMD_CLONE_ADJUST.  */

 static void
 aarch64_simd_clone_adjust (struct cgraph_node *node)
 {
-  /* Add aarch64_vector_pcs target attribute to SIMD clones so they
-     use the correct ABI.  */
-
   tree t = TREE_TYPE (node->decl);
-  TYPE_ATTRIBUTES (t) = make_attribute ("aarch64_vector_pcs", "default",
-                                        TYPE_ATTRIBUTES (t));
+  cl_target_option cur_target;
+  bool m_old_have_regs_of_mode[MAX_MACHINE_MODE];
+
+  if (node->simdclone->vecsize_mangle == 's')
+    {
+      tree target = build_string (strlen ("+sve"), "+sve");
+      aarch64_option_valid_attribute_p (node->decl, NULL_TREE, target, 0);
+      cl_target_option_save (&cur_target, &global_options, &global_options_set);
+      tree new_target = DECL_FUNCTION_SPECIFIC_TARGET (node->decl);
+      cl_target_option_restore (&global_options, &global_options_set,
+                                TREE_TARGET_OPTION (new_target));
+      aarch64_override_options_internal (&global_options);
+      memcpy (m_old_have_regs_of_mode, have_regs_of_mode,
+              sizeof (have_regs_of_mode));
+      for (int i = 0; i < NUM_MACHINE_MODES; ++i)
+        if (aarch64_sve_mode_p ((machine_mode) i))
+          have_regs_of_mode[i] = true;
+    }
+  else
+    {
+      /* Add aarch64_vector_pcs target attribute to SIMD clones so they
+         use the correct ABI.  */
+      TYPE_ATTRIBUTES (t) = make_attribute ("aarch64_vector_pcs", "default",
+                                            TYPE_ATTRIBUTES (t));
+    }
+  cgraph_simd_clone *sc = node->simdclone;
+
+  for (unsigned i = 0; i < sc->nargs; ++i)
+    {
+      bool is_mask = false;
+      tree type;
+      switch (sc->args[i].arg_type)
+        {
+        case SIMD_CLONE_ARG_TYPE_MASK:
+          is_mask = true;
+          gcc_fallthrough ();
+        case SIMD_CLONE_ARG_TYPE_VECTOR:
+        case SIMD_CLONE_ARG_TYPE_LINEAR_VAL_CONSTANT_STEP:
+        case SIMD_CLONE_ARG_TYPE_LINEAR_VAL_VARIABLE_STEP:
+          type = sc->args[i].vector_type;
+          gcc_assert (VECTOR_TYPE_P (type));
+          if (node->simdclone->vecsize_mangle == 's')
+            type = simd_clone_adjust_sve_vector_type (type, is_mask,
+                                                      sc->simdlen);
+          else if (is_mask)
+            type = truth_type_for (type);
+          sc->args[i].vector_type = type;
+        default:
+          continue;
+        }
+    }
+  if (node->simdclone->vecsize_mangle == 's')
+    {
+      tree ret_type = TREE_TYPE (t);
+      if (VECTOR_TYPE_P (ret_type))
+        TREE_TYPE (t)
+          = simd_clone_adjust_sve_vector_type (ret_type, false,
+                                               node->simdclone->simdlen);
+      /* Restore current options.  */
+      cl_target_option_restore (&global_options, &global_options_set, &cur_target);
+      aarch64_override_options_internal (&global_options);
+      memcpy (have_regs_of_mode, m_old_have_regs_of_mode,
+              sizeof (have_regs_of_mode));
+    }
 }

 /* Implement TARGET_SIMD_CLONE_USABLE.  */
@@ -28705,6 +28827,10 @@ aarch64_simd_clone_usable (struct cgraph_node *node, stmt_vec_info stmt_vinfo)
           && aarch64_sve_mode_p (TYPE_MODE (STMT_VINFO_VECTYPE (stmt_vinfo))))
         return -1;
       return 0;
+    case 's':
+      if (!TARGET_SVE)
+        return -1;
+      return 0;
     default:
       gcc_unreachable ();
     }
diff --git a/gcc/omp-simd-clone.cc b/gcc/omp-simd-clone.cc
index 864586207ee..066b6217253 100644
--- a/gcc/omp-simd-clone.cc
+++ b/gcc/omp-simd-clone.cc
@@ -541,9 +541,12 @@ simd_clone_mangle (struct cgraph_node *node,
   pp_string (&pp, "_ZGV");
   pp_character (&pp, vecsize_mangle);
   pp_character (&pp, mask);
-  /* For now, simdlen is always constant, while variable simdlen pp 'n'.  */
-  unsigned int len = simdlen.to_constant ();
-  pp_decimal_int (&pp, (len));
+
+  unsigned long long len = 0;
+  if (simdlen.is_constant (&len))
+    pp_decimal_int (&pp, (int) (len));
+  else
+    pp_character (&pp, 'x');

   for (n = 0; n < clone_info->nargs; ++n)
     {
@@ -1533,8 +1536,8 @@ simd_clone_adjust (struct cgraph_node *node)
          below).  */
       loop = alloc_loop ();
       cfun->has_force_vectorize_loops = true;
-      /* For now, simlen is always constant.  */
-      loop->safelen = node->simdclone->simdlen.to_constant ();
+      /* We can assert that safelen is the 'minimum' simdlen.  */
+      loop->safelen = constant_lower_bound (node->simdclone->simdlen);
       loop->force_vectorize = true;
       loop->header = body_bb;
     }
diff --git a/gcc/testsuite/c-c++-common/gomp/declare-variant-14.c b/gcc/testsuite/c-c++-common/gomp/declare-variant-14.c
index e3668893afe..2b71869787e 100644
--- a/gcc/testsuite/c-c++-common/gomp/declare-variant-14.c
+++ b/gcc/testsuite/c-c++-common/gomp/declare-variant-14.c
@@ -1,6 +1,6 @@
-/* { dg-do compile { target vect_simd_clones } } */
+/* { dg-do compile { target { { i?86-*-* x86_64-*-* } && vect_simd_clones } } } */
 /* { dg-additional-options "-fdump-tree-gimple -fdump-tree-optimized" } */
-/* { dg-additional-options "-mno-sse3" { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-additional-options "-mno-sse3" } */

 int f01 (int);
 int f02 (int);
@@ -15,15 +15,13 @@ int
 test1 (int x)
 {
   /* At gimplification time, we can't decide yet which function to call.  */
-  /* { dg-final { scan-tree-dump-times "f04 \\\(x" 2 "gimple" { target { !aarch64*-*-* } } } } */
+  /* { dg-final { scan-tree-dump-times "f04 \\\(x" 2 "gimple" } } */
   /* After simd clones are created, the original non-clone test1 shall
      call f03 (score 6), the sse2/avx/avx2 clones too, but avx512f clones
      shall call f01 with score 8.  */
   /* { dg-final { scan-tree-dump-not "f04 \\\(x" "optimized" } } */
-  /* { dg-final { scan-tree-dump-times "f03 \\\(x" 14 "optimized" { target { !aarch64*-*-* } } } } */
-  /* { dg-final { scan-tree-dump-times "f03 \\\(x" 10 "optimized" { target { aarch64*-*-* } } } } */
-  /* { dg-final { scan-tree-dump-times "f01 \\\(x" 4 "optimized" { target { !aarch64*-*-* } } } } */
-  /* { dg-final { scan-tree-dump-times "f01 \\\(x" 0 "optimized" { target { aarch64*-*-* } } } } */
+  /* { dg-final { scan-tree-dump-times "f03 \\\(x" 14 "optimized" } } */
+  /* { dg-final { scan-tree-dump-times "f01 \\\(x" 4 "optimized" } } */
   int a = f04 (x);
   int b = f04 (x);
   return a + b;
diff --git a/gcc/testsuite/gcc.target/aarch64/declare-simd-2.c b/gcc/testsuite/gcc.target/aarch64/declare-simd-2.c
index e2e80f0c663..2f4d3a866e5 100644
--- a/gcc/testsuite/gcc.target/aarch64/declare-simd-2.c
+++ b/gcc/testsuite/gcc.target/aarch64/declare-simd-2.c
@@ -43,6 +43,7 @@ float f04 (double a)
 }
 /* { dg-final { scan-assembler {_ZGVnN2v_f04:} } } */
 /* { dg-final { scan-assembler {_ZGVnM2v_f04:} } } */
+/* { dg-final { scan-assembler-not {_ZGVs[0-9a-z]*_f04:} } } */

 #pragma omp declare simd uniform(a) linear (b)
 void f05 (short a, short *b, short c)
diff --git a/gcc/testsuite/gcc.target/aarch64/vect-simd-clone-1.c b/gcc/testsuite/gcc.target/aarch64/vect-simd-clone-1.c
new file mode 100644
index 00000000000..71fd361acec
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vect-simd-clone-1.c
@@ -0,0 +1,52 @@
+/* { dg-do compile }  */
+/* { dg-options "-std=c99" } */
+/* { dg-additional-options "-O3 -march=armv8-a+sve -mcpu=neoverse-n2" } */
+extern int __attribute__ ((simd, const)) fn0 (int);
+
+void test_fn0 (int *a, int *b, int n)
+{
+  for (int i = 0; i < n; ++i)
+    a[i] += fn0 (b[i]);
+}
+
+/* { dg-final { scan-assembler {_ZGVsMxv_fn0} } } */
+
+extern int __attribute__ ((simd, const)) fn1 (short, int);
+
+void test_fn1 (int *a, int *b, short *c, int n)
+{
+  for (int i = 0; i < n; ++i)
+    a[i] = fn1 (c[i], b[i]);
+}
+
+/* { dg-final { scan-assembler {_ZGVsMxvv_fn1} } } */
+
+extern short __attribute__ ((simd, const)) fn2 (short, int);
+
+void test_fn2 (short *a, int *b, short *c, int n)
+{
+  for (int i = 0; i < n; ++i)
+    a[i] = fn2 (c[i], b[i]);
+}
+
+/* { dg-final { scan-assembler {_ZGVsMxvv_fn2} } } */
+
+extern char __attribute__ ((simd, const)) fn3 (int, char);
+
+void test_fn3 (int *a, int *b, char *c, int n)
+{
+  for (int i = 0; i < n; ++i)
+    a[i] = (int) (fn3 (b[i], c[i]) + c[i]);
+}
+
+/* { dg-final { scan-assembler {_ZGVsMxvv_fn3} } } */
+
+extern short __attribute__ ((simd, const)) fn4 (int, short);
+
+void test_fn4 (int *a, int *b, short *c, int n)
+{
+  for (int i = 0; i < n; ++i)
+    a[i] = (int) (fn4 (b[i], c[i]) + c[i]);
+}
+
+/* { dg-final { scan-assembler {_ZGVsMxvv_fn4} } } */
diff --git a/gcc/testsuite/gfortran.dg/gomp/declare-variant-14.f90 b/gcc/testsuite/gfortran.dg/gomp/declare-variant-14.f90
index 6319df0558f..3c7d093c5c6 100644
--- a/gcc/testsuite/gfortran.dg/gomp/declare-variant-14.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/declare-variant-14.f90
@@ -1,6 +1,6 @@
-! { dg-do compile { target vect_simd_clones } }
+! { dg-do compile { target { { i?86-*-* x86_64-*-* } && vect_simd_clones } } } */
 ! { dg-additional-options "-O0 -fdump-tree-gimple -fdump-tree-optimized" }
-! { dg-additional-options "-mno-sse3" { target { i?86-*-* x86_64-*-* } } }
+! { dg-additional-options "-mno-sse3" }

 module main
   implicit none
@@ -41,7 +41,7 @@ contains
     ! shall call f01 with score 8.
     ! { dg-final { scan-tree-dump-not "f04 \\\(x" "optimized" } }
     ! { dg-final { scan-tree-dump-times "f03 \\\(x" 14 "optimized" { target { !aarch64*-*-* } } } }
-    ! { dg-final { scan-tree-dump-times "f03 \\\(x" 6 "optimized" { target { aarch64*-*-* } } } }
+    ! { dg-final { scan-tree-dump-times "f03 \\\(x" 8 "optimized" { target { aarch64*-*-* } } } }
     ! { dg-final { scan-tree-dump-times "f01 \\\(x" 4 "optimized" { target { !aarch64*-*-* } } } }
     ! { dg-final { scan-tree-dump-times "f01 \\\(x" 0 "optimized" { target { aarch64*-*-* } } } }
     a = f04 (x)